Multi-Modal LLMs, Vector Stores, Embeddings, Retriever, and Query Engine#

Multi-Modal large language model (LLM) is a Multi-Modal reasoning engine that can complete text and image chat with users, and follow instructions.

Multi-Modal LLM Implementations#

Multi-Modal LLM Interface#

class llama_index.core.multi_modal_llms.base.MultiModalLLM#

Multi-Modal LLM interface.

abstract async achat(messages: Sequence[ChatMessage], **kwargs: Any) ChatResponse#

Async chat endpoint for Multi-Modal LLM.

abstract async acomplete(prompt: str, image_documents: Sequence[ImageDocument], **kwargs: Any) CompletionResponse#

Async completion endpoint for Multi-Modal LLM.

as_query_component(partial: Optional[Dict[str, Any]] = None, **kwargs: Any) QueryComponent#

Get query component.

abstract async astream_chat(messages: Sequence[ChatMessage], **kwargs: Any) AsyncGenerator[ChatResponse, None]#

Async streaming chat endpoint for Multi-Modal LLM.

abstract async astream_complete(prompt: str, image_documents: Sequence[ImageDocument], **kwargs: Any) AsyncGenerator[CompletionResponse, None]#

Async streaming completion endpoint for Multi-Modal LLM.

abstract chat(messages: Sequence[ChatMessage], **kwargs: Any) ChatResponse#

Chat endpoint for Multi-Modal LLM.

classmethod class_name() str#

Get the class name, used as a unique ID in serialization.

This provides a key that makes serialization robust against actual class name changes.

abstract complete(prompt: str, image_documents: Sequence[ImageDocument], **kwargs: Any) CompletionResponse#

Completion endpoint for Multi-Modal LLM.

classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) Model#

Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = ‘allow’ was set since it adds all passed values

copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None, deep: bool = False) Model#

Duplicate a model, optionally choose which fields to include, exclude and change.

Parameters
  • include – fields to include in new model

  • exclude – fields to exclude from new model, as with values this takes precedence over include

  • update – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data

  • deep – set to True to make a deep copy of the model

Returns

new model instance

dict(**kwargs: Any) Dict[str, Any]#

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

json(**kwargs: Any) str#

Generate a JSON representation of the model, include and exclude arguments as per dict().

encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().

abstract property metadata: MultiModalLLMMetadata#

Multi-Modal LLM metadata.

abstract stream_chat(messages: Sequence[ChatMessage], **kwargs: Any) Generator[ChatResponse, None, None]#

Stream chat endpoint for Multi-Modal LLM.

abstract stream_complete(prompt: str, image_documents: Sequence[ImageDocument], **kwargs: Any) Generator[CompletionResponse, None, None]#

Streaming completion endpoint for Multi-Modal LLM.

classmethod update_forward_refs(**localns: Any) None#

Try to update ForwardRefs on fields based on this Model, globalns and localns.

Multi-Modal Embedding#

class llama_index.core.embeddings.multi_modal_base.MultiModalEmbedding(*, model_name: str = 'unknown', embed_batch_size: ConstrainedIntValue = 10, callback_manager: CallbackManager = None)#

Base class for Multi Modal embeddings.

async acall(nodes: List[BaseNode], **kwargs: Any) List[BaseNode]#

Async transform nodes.

async aget_agg_embedding_from_queries(queries: List[str], agg_fn: Optional[Callable[[...], List[float]]] = None) List[float]#

Async get aggregated embedding from multiple queries.

async aget_image_embedding(img_file_path: Union[str, BytesIO]) List[float]#

Get image embedding.

async aget_image_embedding_batch(img_file_paths: List[Union[str, BytesIO]], show_progress: bool = False) List[List[float]]#

Asynchronously get a list of image embeddings, with batching.

async aget_query_embedding(query: str) List[float]#

Get query embedding.

async aget_text_embedding(text: str) List[float]#

Async get text embedding.

async aget_text_embedding_batch(texts: List[str], show_progress: bool = False) List[List[float]]#

Asynchronously get a list of text embeddings, with batching.

classmethod class_name() str#

Get the class name, used as a unique ID in serialization.

This provides a key that makes serialization robust against actual class name changes.

classmethod construct(_fields_set: Optional[SetStr] = None, **values: Any) Model#

Creates a new model setting __dict__ and __fields_set__ from trusted or pre-validated data. Default values are respected, but no other validation is performed. Behaves as if Config.extra = ‘allow’ was set since it adds all passed values

copy(*, include: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, exclude: Optional[Union[AbstractSetIntStr, MappingIntStrAny]] = None, update: Optional[DictStrAny] = None, deep: bool = False) Model#

Duplicate a model, optionally choose which fields to include, exclude and change.

Parameters
  • include – fields to include in new model

  • exclude – fields to exclude from new model, as with values this takes precedence over include

  • update – values to change/add in the new model. Note: the data is not validated before creating the new model: you should trust this data

  • deep – set to True to make a deep copy of the model

Returns

new model instance

dict(**kwargs: Any) Dict[str, Any]#

Generate a dictionary representation of the model, optionally specifying which fields to include or exclude.

get_agg_embedding_from_queries(queries: List[str], agg_fn: Optional[Callable[[...], List[float]]] = None) List[float]#

Get aggregated embedding from multiple queries.

get_image_embedding(img_file_path: Union[str, BytesIO]) List[float]#

Embed the input image.

get_image_embedding_batch(img_file_paths: List[Union[str, BytesIO]], show_progress: bool = False) List[List[float]]#

Get a list of image embeddings, with batching.

get_query_embedding(query: str) List[float]#

Embed the input query.

When embedding a query, depending on the model, a special instruction can be prepended to the raw query string. For example, “Represent the question for retrieving supporting documents: “. If you’re curious, other examples of predefined instructions can be found in embeddings/huggingface_utils.py.

get_text_embedding(text: str) List[float]#

Embed the input text.

When embedding text, depending on the model, a special instruction can be prepended to the raw text string. For example, “Represent the document for retrieval: “. If you’re curious, other examples of predefined instructions can be found in embeddings/huggingface_utils.py.

get_text_embedding_batch(texts: List[str], show_progress: bool = False, **kwargs: Any) List[List[float]]#

Get a list of text embeddings, with batching.

json(**kwargs: Any) str#

Generate a JSON representation of the model, include and exclude arguments as per dict().

encoder is an optional function to supply as default to json.dumps(), other arguments as per json.dumps().

similarity(embedding1: List[float], embedding2: List[float], mode: SimilarityMode = SimilarityMode.DEFAULT) float#

Get embedding similarity.

classmethod update_forward_refs(**localns: Any) None#

Try to update ForwardRefs on fields based on this Model, globalns and localns.

Multi-Modal Vector Store Index#

class llama_index.core.indices.multi_modal.base.MultiModalVectorStoreIndex(nodes: Optional[Sequence[BaseNode]] = None, index_struct: Optional[MultiModelIndexDict] = None, embed_model: Optional[BaseEmbedding] = None, storage_context: Optional[StorageContext] = None, use_async: bool = False, store_nodes_override: bool = False, show_progress: bool = False, image_vector_store: Optional[VectorStore] = None, image_embed_model: Union[BaseEmbedding, LCEmbeddings, str] = 'clip', is_image_to_text: bool = False, is_image_vector_store_empty: bool = False, is_text_vector_store_empty: bool = False, service_context: Optional[ServiceContext] = None, **kwargs: Any)#

Multi-Modal Vector Store Index.

Parameters
  • use_async (bool) – Whether to use asynchronous calls. Defaults to False.

  • show_progress (bool) – Whether to show tqdm progress bars. Defaults to False.

  • store_nodes_override (bool) – set to True to always store Node objects in index store and document store even if vector store keeps text. Defaults to False

as_query_engine(llm: Optional[Union[str, LLM, BaseLanguageModel]] = None, **kwargs: Any) BaseQueryEngine#

As query engine.

build_index_from_nodes(nodes: Sequence[BaseNode], **insert_kwargs: Any) IndexDict#

Build the index from nodes.

NOTE: Overrides BaseIndex.build_index_from_nodes.

VectorStoreIndex only stores nodes in document store if vector store does not store text

delete(doc_id: str, **delete_kwargs: Any) None#

Delete a document from the index. All nodes in the index related to the index will be deleted.

Parameters

doc_id (str) – A doc_id of the ingested document

delete_nodes(node_ids: List[str], delete_from_docstore: bool = False, **delete_kwargs: Any) None#

Delete a list of nodes from the index.

Parameters

node_ids (List[str]) – A list of node_ids from the nodes to delete

delete_ref_doc(ref_doc_id: str, delete_from_docstore: bool = False, **delete_kwargs: Any) None#

Delete a document and it’s nodes by using ref_doc_id.

property docstore: BaseDocumentStore#

Get the docstore corresponding to the index.

classmethod from_documents(documents: Sequence[Document], storage_context: Optional[StorageContext] = None, show_progress: bool = False, callback_manager: Optional[CallbackManager] = None, transformations: Optional[List[TransformComponent]] = None, service_context: Optional[ServiceContext] = None, **kwargs: Any) IndexType#

Create index from documents.

Parameters

documents (Optional[Sequence[BaseDocument]]) – List of documents to build the index from.

property index_id: str#

Get the index struct.

property index_struct: IS#

Get the index struct.

index_struct_cls#

alias of MultiModelIndexDict

insert(document: Document, **insert_kwargs: Any) None#

Insert a document.

insert_nodes(nodes: Sequence[BaseNode], **insert_kwargs: Any) None#

Insert nodes.

NOTE: overrides BaseIndex.insert_nodes.

VectorStoreIndex only stores nodes in document store if vector store does not store text

property ref_doc_info: Dict[str, RefDocInfo]#

Retrieve a dict mapping of ingested documents and their nodes+metadata.

refresh(documents: Sequence[Document], **update_kwargs: Any) List[bool]#

Refresh an index with documents that have changed.

This allows users to save LLM and Embedding model calls, while only updating documents that have any changes in text or metadata. It will also insert any documents that previously were not stored.

refresh_ref_docs(documents: Sequence[Document], **update_kwargs: Any) List[bool]#

Refresh an index with documents that have changed.

This allows users to save LLM and Embedding model calls, while only updating documents that have any changes in text or metadata. It will also insert any documents that previously were not stored.

set_index_id(index_id: str) None#

Set the index id.

NOTE: if you decide to set the index_id on the index_struct manually, you will need to explicitly call add_index_struct on the index_store to update the index store.

Parameters

index_id (str) – Index id to set.

update(document: Document, **update_kwargs: Any) None#

Update a document and it’s corresponding nodes.

This is equivalent to deleting the document and then inserting it again.

Parameters
  • document (Union[BaseDocument, BaseIndex]) – document to update

  • insert_kwargs (Dict) – kwargs to pass to insert

  • delete_kwargs (Dict) – kwargs to pass to delete

update_ref_doc(document: Document, **update_kwargs: Any) None#

Update a document and it’s corresponding nodes.

This is equivalent to deleting the document and then inserting it again.

Parameters
  • document (Union[BaseDocument, BaseIndex]) – document to update

  • insert_kwargs (Dict) – kwargs to pass to insert

  • delete_kwargs (Dict) – kwargs to pass to delete

Multi-Modal Vector Index Retriever#

class llama_index.core.indices.multi_modal.retriever.MultiModalVectorIndexRetriever(index: MultiModalVectorStoreIndex, similarity_top_k: int = 2, image_similarity_top_k: int = 2, vector_store_query_mode: VectorStoreQueryMode = VectorStoreQueryMode.DEFAULT, filters: Optional[MetadataFilters] = None, alpha: Optional[float] = None, node_ids: Optional[List[str]] = None, doc_ids: Optional[List[str]] = None, sparse_top_k: Optional[int] = None, callback_manager: Optional[CallbackManager] = None, **kwargs: Any)#

Multi Modal Vector index retriever.

Parameters
  • index (MultiModalVectorIndexRetriever) – Multi Modal vector store index for images and texts.

  • similarity_top_k (int) – number of top k results to return.

  • vector_store_query_mode (str) – vector store query mode See reference for VectorStoreQueryMode for full list of supported modes.

  • filters (Optional[MetadataFilters]) – metadata filters, defaults to None

  • alpha (float) – weight for sparse/dense retrieval, only used for hybrid query mode.

  • doc_ids (Optional[List[str]]) – list of documents to constrain search.

  • vector_store_kwargs (dict) – Additional vector store specific kwargs to pass through to the vector store at query time.

async aimage_to_image_retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]#

Async Retrieve image nodes given image query.

Implemented by the user.

as_query_component(partial: Optional[Dict[str, Any]] = None, **kwargs: Any) QueryComponent#

Get query component.

async atext_retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]#

Async Retrieve text nodes given text query.

Implemented by the user.

async atext_to_image_retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]#

Async Retrieve image nodes given text query.

Implemented by the user.

get_prompts() Dict[str, BasePromptTemplate]#

Get a prompt.

get_service_context() Optional[ServiceContext]#

Attempts to resolve a service context. Short-circuits at self.service_context, self._service_context, or self._index.service_context.

property image_similarity_top_k: int#

Return image similarity top k.

image_to_image_retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]#

Retrieve image nodes given image query.

Implemented by the user.

retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]#

Retrieve nodes given query.

Parameters

str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.

property similarity_top_k: int#

Return similarity top k.

text_retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]#

Retrieve text nodes given text query.

Implemented by the user.

text_to_image_retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]#

Retrieve image nodes given text query.

Implemented by the user.

update_prompts(prompts_dict: Dict[str, BasePromptTemplate]) None#

Update prompts.

Other prompts will remain in place.

Multi-Modal Retriever Interface#

class llama_index.core.base.base_multi_modal_retriever.MultiModalRetriever(callback_manager: Optional[CallbackManager] = None, object_map: Optional[Dict] = None, objects: Optional[List[IndexNode]] = None, verbose: bool = False)#

Multi Modal base retriever.

abstract async aimage_to_image_retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]#

Async Retrieve image nodes given image query.

Implemented by the user.

as_query_component(partial: Optional[Dict[str, Any]] = None, **kwargs: Any) QueryComponent#

Get query component.

abstract async atext_retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]#

Async Retrieve text nodes given text query.

Implemented by the user.

abstract async atext_to_image_retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]#

Async Retrieve image nodes given text query.

Implemented by the user.

get_prompts() Dict[str, BasePromptTemplate]#

Get a prompt.

get_service_context() Optional[ServiceContext]#

Attempts to resolve a service context. Short-circuits at self.service_context, self._service_context, or self._index.service_context.

abstract image_to_image_retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]#

Retrieve image nodes given image query.

Implemented by the user.

retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]#

Retrieve nodes given query.

Parameters

str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.

abstract text_retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]#

Retrieve text nodes given text query.

Implemented by the user.

abstract text_to_image_retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]#

Retrieve image nodes given text query.

Implemented by the user.

update_prompts(prompts_dict: Dict[str, BasePromptTemplate]) None#

Update prompts.

Other prompts will remain in place.

Multi-Modal Simple Query Engine#

class llama_index.core.query_engine.multi_modal.SimpleMultiModalQueryEngine(retriever: MultiModalVectorIndexRetriever, multi_modal_llm: Optional[MultiModalLLM] = None, text_qa_template: Optional[BasePromptTemplate] = None, image_qa_template: Optional[BasePromptTemplate] = None, node_postprocessors: Optional[List[BaseNodePostprocessor]] = None, callback_manager: Optional[CallbackManager] = None, **kwargs: Any)#

Simple Multi Modal Retriever query engine.

Assumes that retrieved text context fits within context window of LLM, along with images.

Parameters
as_query_component(partial: Optional[Dict[str, Any]] = None, **kwargs: Any) QueryComponent#

Get query component.

get_prompts() Dict[str, BasePromptTemplate]#

Get a prompt.

image_query(image_path: Union[str, QueryBundle], prompt_str: str) Union[Response, StreamingResponse, PydanticResponse]#

Answer a image query.

property retriever: MultiModalVectorIndexRetriever#

Get the retriever object.

update_prompts(prompts_dict: Dict[str, BasePromptTemplate]) None#

Update prompts.

Other prompts will remain in place.