Knowledge Graph Index#

Building the Knowledge Graph Index

KG-based data structures.

llama_index.indices.knowledge_graph.GPTKnowledgeGraphIndex#

alias of KnowledgeGraphIndex

class llama_index.indices.knowledge_graph.KGTableRetriever(index: KnowledgeGraphIndex, query_keyword_extract_template: Optional[BasePromptTemplate] = None, max_keywords_per_query: int = 10, num_chunks_per_query: int = 10, include_text: bool = True, retriever_mode: Optional[KGRetrieverMode] = KGRetrieverMode.KEYWORD, similarity_top_k: int = 2, graph_store_query_depth: int = 2, use_global_node_triplets: bool = False, max_knowledge_sequence: int = 30, callback_manager: Optional[CallbackManager] = None, object_map: Optional[dict] = None, verbose: bool = False, **kwargs: Any)#

KG Table Retriever.

Arguments are shared among subclasses.

Parameters
  • query_keyword_extract_template (Optional[QueryKGExtractPrompt]) – A Query KG Extraction Prompt (see Prompt Templates).

  • refine_template (Optional[BasePromptTemplate]) – A Refinement Prompt (see Prompt Templates).

  • text_qa_template (Optional[BasePromptTemplate]) – A Question Answering Prompt (see Prompt Templates).

  • max_keywords_per_query (int) – Maximum number of keywords to extract from query.

  • num_chunks_per_query (int) – Maximum number of text chunks to query.

  • include_text (bool) – Use the document text source from each relevant triplet during queries.

  • retriever_mode (KGRetrieverMode) – Specifies whether to use keywords, embeddings, or both to find relevant triplets. Should be one of “keyword”, “embedding”, or “hybrid”.

  • similarity_top_k (int) – The number of top embeddings to use (if embeddings are used).

  • graph_store_query_depth (int) – The depth of the graph store query.

  • use_global_node_triplets (bool) – Whether to get more keywords (entities) from the text chunks matched by keywords. This helps introduce more global knowledge, but it is more expensive, so it is turned off by default.

  • max_knowledge_sequence (int) – The maximum length of the knowledge sequence to include in the response. Defaults to 30.
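
A minimal usage sketch, assuming an existing KnowledgeGraphIndex named index built with include_embeddings=True (required for the “hybrid” mode shown); the query text is a placeholder.

    from llama_index.indices.knowledge_graph import KGTableRetriever
    from llama_index.query_engine import RetrieverQueryEngine

    retriever = KGTableRetriever(
        index=index,                  # an existing KnowledgeGraphIndex
        retriever_mode="hybrid",      # "keyword", "embedding", or "hybrid"
        include_text=True,            # attach the source text chunk of each triplet
        similarity_top_k=2,           # only used when embeddings are available
        graph_store_query_depth=2,
    )
    query_engine = RetrieverQueryEngine.from_args(retriever)
    response = query_engine.query("Tell me about Interleaf.")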

as_query_component(partial: Optional[Dict[str, Any]] = None, **kwargs: Any) QueryComponent#

Get query component.

get_prompts() Dict[str, BasePromptTemplate]#

Get a dictionary of prompts.

get_service_context() Optional[ServiceContext]#

Attempts to resolve a service context. Short-circuits at self.service_context, self._service_context, or self._index.service_context.

retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]#

Retrieve nodes given query.

Parameters

str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.
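
A short sketch of both accepted call forms, assuming a retriever as constructed above; the query text is a placeholder, and score may be None when embeddings are not used.

    from llama_index.schema import QueryBundle

    nodes = retriever.retrieve("Who founded Interleaf?")               # plain string
    nodes = retriever.retrieve(QueryBundle("Who founded Interleaf?"))  # QueryBundle
    for n in nodes:
        print(n.score, n.node.get_content()[:100])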

update_prompts(prompts_dict: Dict[str, BasePromptTemplate]) None#

Update prompts.

Other prompts will remain in place.

class llama_index.indices.knowledge_graph.KnowledgeGraphIndex(nodes: Optional[Sequence[BaseNode]] = None, objects: Optional[Sequence[IndexNode]] = None, index_struct: Optional[KG] = None, service_context: Optional[ServiceContext] = None, storage_context: Optional[StorageContext] = None, kg_triple_extract_template: Optional[BasePromptTemplate] = None, max_triplets_per_chunk: int = 10, include_embeddings: bool = False, show_progress: bool = False, max_object_length: int = 128, kg_triplet_extract_fn: Optional[Callable] = None, **kwargs: Any)#

Knowledge Graph Index.

Build a knowledge graph by extracting triplets from text, and leverage the KG at query time.

Parameters
  • kg_triple_extract_template (BasePromptTemplate) – The prompt to use for extracting triplets.

  • max_triplets_per_chunk (int) – The maximum number of triplets to extract.

  • service_context (Optional[ServiceContext]) – The service context to use.

  • storage_context (Optional[StorageContext]) – The storage context to use.

  • graph_store (Optional[GraphStore]) – The graph store to use.

  • show_progress (bool) – Whether to show tqdm progress bars. Defaults to False.

  • include_embeddings (bool) – Whether to include embeddings in the index. Defaults to False.

  • max_object_length (int) – The maximum length of the object in a triplet. Defaults to 128.

  • kg_triplet_extract_fn (Optional[Callable]) – The function to use for extracting triplets. Defaults to None.
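
A minimal construction sketch, assuming documents live in a local ./data directory (placeholder path) and using the in-memory SimpleGraphStore.

    from llama_index import KnowledgeGraphIndex, SimpleDirectoryReader, StorageContext
    from llama_index.graph_stores import SimpleGraphStore

    documents = SimpleDirectoryReader("./data").load_data()   # placeholder path
    graph_store = SimpleGraphStore()
    storage_context = StorageContext.from_defaults(graph_store=graph_store)

    index = KnowledgeGraphIndex.from_documents(
        documents,
        storage_context=storage_context,
        max_triplets_per_chunk=10,    # cap the number of triplets extracted per chunk
        include_embeddings=False,     # set True to enable embedding/hybrid retrieval
        show_progress=True,
    )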

add_node(keywords: List[str], node: BaseNode) None#

Add node.

Used for manual insertion of nodes (keyed by keywords).

Parameters
  • keywords (List[str]) – Keywords to index the node.

  • node (Node) – Node to be indexed.

build_index_from_nodes(nodes: Sequence[BaseNode]) IS#

Build the index from nodes.

delete_nodes(node_ids: List[str], delete_from_docstore: bool = False, **delete_kwargs: Any) None#

Delete a list of nodes from the index.

Parameters

node_ids (List[str]) – A list of node IDs identifying the nodes to delete.

delete_ref_doc(ref_doc_id: str, delete_from_docstore: bool = False, **delete_kwargs: Any) None#

Delete a document and its nodes by using ref_doc_id.

classmethod from_documents(documents: Sequence[Document], storage_context: Optional[StorageContext] = None, service_context: Optional[ServiceContext] = None, show_progress: bool = False, **kwargs: Any) IndexType#

Create index from documents.

Parameters

documents (Optional[Sequence[BaseDocument]]) – List of documents to build the index from.

get_networkx_graph(limit: int = 100) Any#

Get networkx representation of the graph structure.

Parameters

limit (int) – Number of starting nodes to be included in the graph.

NOTE: This function requires networkx to be installed. This is a beta feature.
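
A minimal sketch, assuming an existing index and that networkx and matplotlib are installed; the drawing parameters are arbitrary.

    import networkx as nx
    import matplotlib.pyplot as plt

    g = index.get_networkx_graph(limit=100)   # include up to 100 starting nodes
    nx.draw_networkx(g, with_labels=True, node_size=300, font_size=8)
    plt.show()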

property index_id: str#

Get the index struct ID.

insert(document: Document, **insert_kwargs: Any) None#

Insert a document.

insert_nodes(nodes: Sequence[BaseNode], **insert_kwargs: Any) None#

Insert nodes.

property ref_doc_info: Dict[str, RefDocInfo]#

Retrieve a dict mapping of ingested documents and their nodes+metadata.

refresh(documents: Sequence[Document], **update_kwargs: Any) List[bool]#

Refresh an index with documents that have changed.

This allows users to save LLM and Embedding model calls, while only updating documents that have any changes in text or metadata. It will also insert any documents that previously were not stored.

refresh_ref_docs(documents: Sequence[Document], **update_kwargs: Any) List[bool]#

Refresh an index with documents that have changed.

This allows users to save LLM and Embedding model calls, while only updating documents that have any changes in text or metadata. It will also insert any documents that previously were not stored.
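
A minimal sketch, assuming documents are re-created with stable doc_ids so changed documents can be matched against what is already stored; the texts and IDs below are placeholders.

    from llama_index import Document

    documents = [
        Document(text="Updated text for the first document.", doc_id="doc_1"),
        Document(text="Unchanged text for the second document.", doc_id="doc_2"),
    ]
    # Returns one bool per document: True if it was inserted or updated.
    refreshed = index.refresh_ref_docs(documents)
    print(refreshed)   # e.g. [True, False]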

set_index_id(index_id: str) None#

Set the index id.

NOTE: if you decide to set the index_id on the index_struct manually, you will need to explicitly call add_index_struct on the index_store to update the index store.

Parameters

index_id (str) – Index id to set.

update(document: Document, **update_kwargs: Any) None#

Update a document and its corresponding nodes.

This is equivalent to deleting the document and then inserting it again.

Parameters
  • document (Union[BaseDocument, BaseIndex]) – document to update

  • insert_kwargs (Dict) – kwargs to pass to insert

  • delete_kwargs (Dict) – kwargs to pass to delete

update_ref_doc(document: Document, **update_kwargs: Any) None#

Update a document and its corresponding nodes.

This is equivalent to deleting the document and then inserting it again.

Parameters
  • document (Union[BaseDocument, BaseIndex]) – document to update

  • insert_kwargs (Dict) – kwargs to pass to insert

  • delete_kwargs (Dict) – kwargs to pass to delete

upsert_triplet(triplet: Tuple[str, str, str]) None#

Insert a triplet.

Used for manual insertion of KG triplets (in the form of (subject, relationship, object)).

Parameters

triplet (Tuple[str, str, str]) – Knowledge triplet to insert.

upsert_triplet_and_node(triplet: Tuple[str, str, str], node: BaseNode) None#

Upsert KG triplet and node.

Calls both upsert_triplet and add_node. Behavior is idempotent; if the node already exists, only the triplet will be added.

Parameters
  • triplet (Tuple[str, str, str]) – Knowledge triplet in the form (subject, relationship, object).

  • node (Node) – Node to be indexed.
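
A minimal sketch of manual insertion, assuming an existing index; the triplet and node text below are made-up examples.

    from llama_index.schema import TextNode

    node = TextNode(text="Guido van Rossum created the Python programming language.")
    index.upsert_triplet_and_node(
        ("Guido van Rossum", "created", "Python"),   # (subject, relationship, object)
        node,
    )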

class llama_index.indices.knowledge_graph.KnowledgeGraphRAGRetriever(service_context: Optional[ServiceContext] = None, storage_context: Optional[StorageContext] = None, entity_extract_fn: Optional[Callable] = None, entity_extract_template: Optional[BasePromptTemplate] = None, entity_extract_policy: Optional[str] = 'union', synonym_expand_fn: Optional[Callable] = None, synonym_expand_template: Optional[BasePromptTemplate] = None, synonym_expand_policy: Optional[str] = 'union', max_entities: int = 5, max_synonyms: int = 5, retriever_mode: Optional[str] = 'keyword', with_nl2graphquery: bool = False, graph_traversal_depth: int = 2, max_knowledge_sequence: int = 30, verbose: bool = False, callback_manager: Optional[CallbackManager] = None, **kwargs: Any)#

Knowledge Graph RAG retriever.

Retriever that performs subgraph RAG against a knowledge graph.

Parameters
  • service_context (Optional[ServiceContext]) – A service context to use.

  • storage_context (Optional[StorageContext]) – A storage context to use.

  • entity_extract_fn (Optional[Callable]) – A function to extract entities.

  • entity_extract_template (Optional[BasePromptTemplate]) – A Query Key Entity Extraction Prompt (see Prompt Templates).

  • entity_extract_policy (Optional[str]) – The entity extraction policy to use. Defaults to “union”. Possible values: “union”, “intersection”.

  • synonym_expand_fn (Optional[Callable]) – A function to expand synonyms.

  • synonym_expand_template (Optional[QueryKeywordExpandPrompt]) – A Query Key Entity Expansion Prompt (see Prompt Templates).

  • synonym_expand_policy (Optional[str]) – The synonym expansion policy to use. Defaults to “union”. Possible values: “union”, “intersection”.

  • max_entities (int) – The maximum number of entities to extract. Defaults to 5.

  • max_synonyms (int) – The maximum number of synonyms to expand per entity. Defaults to 5.

  • retriever_mode (Optional[str]) – The retriever mode to use. Defaults to “keyword”. Possible values: “keyword”, “embedding”, “keyword_embedding”.

  • with_nl2graphquery (bool) – Whether to also include NL2GraphQuery results in the context. Defaults to False.

  • graph_traversal_depth (int) – The depth of graph traversal. Defaults to 2.

  • max_knowledge_sequence (int) – The maximum length of the knowledge sequence to include in the response. Defaults to 30.

  • verbose (bool) – Whether to print out debug info.
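
A minimal sketch, assuming a storage_context whose graph store is already populated; the query text is a placeholder.

    from llama_index.indices.knowledge_graph import KnowledgeGraphRAGRetriever
    from llama_index.query_engine import RetrieverQueryEngine

    graph_rag_retriever = KnowledgeGraphRAGRetriever(
        storage_context=storage_context,   # must wrap an already-populated graph store
        retriever_mode="keyword",          # or "embedding" / "keyword_embedding"
        graph_traversal_depth=2,
        with_nl2graphquery=False,          # set True to also run a NL-to-graph query
        verbose=True,
    )
    query_engine = RetrieverQueryEngine.from_args(graph_rag_retriever)
    response = query_engine.query("Tell me about Peter Quill.")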

as_query_component(partial: Optional[Dict[str, Any]] = None, **kwargs: Any) QueryComponent#

Get query component.

get_prompts() Dict[str, BasePromptTemplate]#

Get a dictionary of prompts.

get_service_context() Optional[ServiceContext]#

Attempts to resolve a service context. Short-circuits at self.service_context, self._service_context, or self._index.service_context.

retrieve(str_or_query_bundle: Union[str, QueryBundle]) List[NodeWithScore]#

Retrieve nodes given query.

Parameters

str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.

update_prompts(prompts_dict: Dict[str, BasePromptTemplate]) None#

Update prompts.

Other prompts will remain in place.