Table Index#

Building the Keyword Table Index

Keyword Table Index Data Structures.

llama_index.core.indices.keyword_table.GPTKeywordTableIndex#

alias of KeywordTableIndex

llama_index.core.indices.keyword_table.GPTRAKEKeywordTableIndex#

alias of RAKEKeywordTableIndex

llama_index.core.indices.keyword_table.GPTSimpleKeywordTableIndex#

alias of SimpleKeywordTableIndex

class llama_index.core.indices.keyword_table.KeywordTableGPTRetriever(index: BaseKeywordTableIndex, keyword_extract_template: Optional[BasePromptTemplate] = None, query_keyword_extract_template: Optional[BasePromptTemplate] = None, max_keywords_per_query: int = 10, num_chunks_per_query: int = 10, llm: Optional[LLM] = None, callback_manager: Optional[CallbackManager] = None, object_map: Optional[dict] = None, verbose: bool = False, **kwargs: Any)#

Keyword Table Index GPT Retriever.

Extracts keywords using GPT. Set when using retriever_mode=”default”.

See BaseGPTKeywordTableQuery for arguments.

as_query_component(partial: Optional[Dict[str, Any]] = None, **kwargs: Any) β†’ QueryComponent#

Get query component.

get_prompts() β†’ Dict[str, BasePromptTemplate]#

Get a prompt.

get_service_context() β†’ Optional[ServiceContext]#

Attempts to resolve a service context. Short-circuits at self.service_context, self._service_context, or self._index.service_context.

retrieve(str_or_query_bundle: Union[str, QueryBundle]) β†’ List[NodeWithScore]#

Retrieve nodes given query.

Parameters

str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.

update_prompts(prompts_dict: Dict[str, BasePromptTemplate]) β†’ None#

Update prompts.

Other prompts will remain in place.

class llama_index.core.indices.keyword_table.KeywordTableIndex(nodes: Optional[Sequence[BaseNode]] = None, objects: Optional[Sequence[IndexNode]] = None, index_struct: Optional[KeywordTable] = None, llm: Optional[LLM] = None, service_context: Optional[ServiceContext] = None, keyword_extract_template: Optional[BasePromptTemplate] = None, max_keywords_per_chunk: int = 10, use_async: bool = False, show_progress: bool = False, **kwargs: Any)#

Keyword Table Index.

This index uses a GPT model to extract keywords from the text.

build_index_from_nodes(nodes: Sequence[BaseNode]) β†’ IS#

Build the index from nodes.

delete(doc_id: str, **delete_kwargs: Any) β†’ None#

Delete a document from the index. All nodes in the index related to the index will be deleted.

Parameters

doc_id (str) – A doc_id of the ingested document

delete_nodes(node_ids: List[str], delete_from_docstore: bool = False, **delete_kwargs: Any) β†’ None#

Delete a list of nodes from the index.

Parameters

doc_ids (List[str]) – A list of doc_ids from the nodes to delete

delete_ref_doc(ref_doc_id: str, delete_from_docstore: bool = False, **delete_kwargs: Any) β†’ None#

Delete a document and it’s nodes by using ref_doc_id.

property docstore: BaseDocumentStore#

Get the docstore corresponding to the index.

classmethod from_documents(documents: Sequence[Document], storage_context: Optional[StorageContext] = None, show_progress: bool = False, callback_manager: Optional[CallbackManager] = None, transformations: Optional[List[TransformComponent]] = None, service_context: Optional[ServiceContext] = None, **kwargs: Any) β†’ IndexType#

Create index from documents.

Parameters

documents (Optional[Sequence[BaseDocument]]) – List of documents to build the index from.

property index_id: str#

Get the index struct.

property index_struct: IS#

Get the index struct.

index_struct_cls#

alias of KeywordTable

insert(document: Document, **insert_kwargs: Any) β†’ None#

Insert a document.

insert_nodes(nodes: Sequence[BaseNode], **insert_kwargs: Any) β†’ None#

Insert nodes.

property ref_doc_info: Dict[str, RefDocInfo]#

Retrieve a dict mapping of ingested documents and their nodes+metadata.

refresh(documents: Sequence[Document], **update_kwargs: Any) β†’ List[bool]#

Refresh an index with documents that have changed.

This allows users to save LLM and Embedding model calls, while only updating documents that have any changes in text or metadata. It will also insert any documents that previously were not stored.

refresh_ref_docs(documents: Sequence[Document], **update_kwargs: Any) β†’ List[bool]#

Refresh an index with documents that have changed.

This allows users to save LLM and Embedding model calls, while only updating documents that have any changes in text or metadata. It will also insert any documents that previously were not stored.

set_index_id(index_id: str) β†’ None#

Set the index id.

NOTE: if you decide to set the index_id on the index_struct manually, you will need to explicitly call add_index_struct on the index_store to update the index store.

Parameters

index_id (str) – Index id to set.

update(document: Document, **update_kwargs: Any) β†’ None#

Update a document and it’s corresponding nodes.

This is equivalent to deleting the document and then inserting it again.

Parameters
  • document (Union[BaseDocument, BaseIndex]) – document to update

  • insert_kwargs (Dict) – kwargs to pass to insert

  • delete_kwargs (Dict) – kwargs to pass to delete

update_ref_doc(document: Document, **update_kwargs: Any) β†’ None#

Update a document and it’s corresponding nodes.

This is equivalent to deleting the document and then inserting it again.

Parameters
  • document (Union[BaseDocument, BaseIndex]) – document to update

  • insert_kwargs (Dict) – kwargs to pass to insert

  • delete_kwargs (Dict) – kwargs to pass to delete

class llama_index.core.indices.keyword_table.KeywordTableRAKERetriever(index: BaseKeywordTableIndex, keyword_extract_template: Optional[BasePromptTemplate] = None, query_keyword_extract_template: Optional[BasePromptTemplate] = None, max_keywords_per_query: int = 10, num_chunks_per_query: int = 10, callback_manager: Optional[CallbackManager] = None, object_map: Optional[dict] = None, verbose: bool = False, **kwargs: Any)#

Keyword Table Index RAKE Retriever.

Extracts keywords using RAKE keyword extractor. Set when retriever_mode=”rake”.

See BaseGPTKeywordTableQuery for arguments.

as_query_component(partial: Optional[Dict[str, Any]] = None, **kwargs: Any) β†’ QueryComponent#

Get query component.

get_prompts() β†’ Dict[str, BasePromptTemplate]#

Get a prompt.

get_service_context() β†’ Optional[ServiceContext]#

Attempts to resolve a service context. Short-circuits at self.service_context, self._service_context, or self._index.service_context.

retrieve(str_or_query_bundle: Union[str, QueryBundle]) β†’ List[NodeWithScore]#

Retrieve nodes given query.

Parameters

str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.

update_prompts(prompts_dict: Dict[str, BasePromptTemplate]) β†’ None#

Update prompts.

Other prompts will remain in place.

class llama_index.core.indices.keyword_table.KeywordTableSimpleRetriever(index: BaseKeywordTableIndex, keyword_extract_template: Optional[BasePromptTemplate] = None, query_keyword_extract_template: Optional[BasePromptTemplate] = None, max_keywords_per_query: int = 10, num_chunks_per_query: int = 10, callback_manager: Optional[CallbackManager] = None, object_map: Optional[dict] = None, verbose: bool = False, **kwargs: Any)#

Keyword Table Index Simple Retriever.

Extracts keywords using simple regex-based keyword extractor. Set when retriever_mode=”simple”.

See BaseGPTKeywordTableQuery for arguments.

as_query_component(partial: Optional[Dict[str, Any]] = None, **kwargs: Any) β†’ QueryComponent#

Get query component.

get_prompts() β†’ Dict[str, BasePromptTemplate]#

Get a prompt.

get_service_context() β†’ Optional[ServiceContext]#

Attempts to resolve a service context. Short-circuits at self.service_context, self._service_context, or self._index.service_context.

retrieve(str_or_query_bundle: Union[str, QueryBundle]) β†’ List[NodeWithScore]#

Retrieve nodes given query.

Parameters

str_or_query_bundle (QueryType) – Either a query string or a QueryBundle object.

update_prompts(prompts_dict: Dict[str, BasePromptTemplate]) β†’ None#

Update prompts.

Other prompts will remain in place.

class llama_index.core.indices.keyword_table.RAKEKeywordTableIndex(nodes: Optional[Sequence[BaseNode]] = None, objects: Optional[Sequence[IndexNode]] = None, index_struct: Optional[KeywordTable] = None, llm: Optional[LLM] = None, service_context: Optional[ServiceContext] = None, keyword_extract_template: Optional[BasePromptTemplate] = None, max_keywords_per_chunk: int = 10, use_async: bool = False, show_progress: bool = False, **kwargs: Any)#

RAKE Keyword Table Index.

This index uses a RAKE keyword extractor to extract keywords from the text.

build_index_from_nodes(nodes: Sequence[BaseNode]) β†’ IS#

Build the index from nodes.

delete(doc_id: str, **delete_kwargs: Any) β†’ None#

Delete a document from the index. All nodes in the index related to the index will be deleted.

Parameters

doc_id (str) – A doc_id of the ingested document

delete_nodes(node_ids: List[str], delete_from_docstore: bool = False, **delete_kwargs: Any) β†’ None#

Delete a list of nodes from the index.

Parameters

doc_ids (List[str]) – A list of doc_ids from the nodes to delete

delete_ref_doc(ref_doc_id: str, delete_from_docstore: bool = False, **delete_kwargs: Any) β†’ None#

Delete a document and it’s nodes by using ref_doc_id.

property docstore: BaseDocumentStore#

Get the docstore corresponding to the index.

classmethod from_documents(documents: Sequence[Document], storage_context: Optional[StorageContext] = None, show_progress: bool = False, callback_manager: Optional[CallbackManager] = None, transformations: Optional[List[TransformComponent]] = None, service_context: Optional[ServiceContext] = None, **kwargs: Any) β†’ IndexType#

Create index from documents.

Parameters

documents (Optional[Sequence[BaseDocument]]) – List of documents to build the index from.

property index_id: str#

Get the index struct.

property index_struct: IS#

Get the index struct.

index_struct_cls#

alias of KeywordTable

insert(document: Document, **insert_kwargs: Any) β†’ None#

Insert a document.

insert_nodes(nodes: Sequence[BaseNode], **insert_kwargs: Any) β†’ None#

Insert nodes.

property ref_doc_info: Dict[str, RefDocInfo]#

Retrieve a dict mapping of ingested documents and their nodes+metadata.

refresh(documents: Sequence[Document], **update_kwargs: Any) β†’ List[bool]#

Refresh an index with documents that have changed.

This allows users to save LLM and Embedding model calls, while only updating documents that have any changes in text or metadata. It will also insert any documents that previously were not stored.

refresh_ref_docs(documents: Sequence[Document], **update_kwargs: Any) β†’ List[bool]#

Refresh an index with documents that have changed.

This allows users to save LLM and Embedding model calls, while only updating documents that have any changes in text or metadata. It will also insert any documents that previously were not stored.

set_index_id(index_id: str) β†’ None#

Set the index id.

NOTE: if you decide to set the index_id on the index_struct manually, you will need to explicitly call add_index_struct on the index_store to update the index store.

Parameters

index_id (str) – Index id to set.

update(document: Document, **update_kwargs: Any) β†’ None#

Update a document and it’s corresponding nodes.

This is equivalent to deleting the document and then inserting it again.

Parameters
  • document (Union[BaseDocument, BaseIndex]) – document to update

  • insert_kwargs (Dict) – kwargs to pass to insert

  • delete_kwargs (Dict) – kwargs to pass to delete

update_ref_doc(document: Document, **update_kwargs: Any) β†’ None#

Update a document and it’s corresponding nodes.

This is equivalent to deleting the document and then inserting it again.

Parameters
  • document (Union[BaseDocument, BaseIndex]) – document to update

  • insert_kwargs (Dict) – kwargs to pass to insert

  • delete_kwargs (Dict) – kwargs to pass to delete

class llama_index.core.indices.keyword_table.SimpleKeywordTableIndex(nodes: Optional[Sequence[BaseNode]] = None, objects: Optional[Sequence[IndexNode]] = None, index_struct: Optional[KeywordTable] = None, llm: Optional[LLM] = None, service_context: Optional[ServiceContext] = None, keyword_extract_template: Optional[BasePromptTemplate] = None, max_keywords_per_chunk: int = 10, use_async: bool = False, show_progress: bool = False, **kwargs: Any)#

Simple Keyword Table Index.

This index uses a simple regex extractor to extract keywords from the text.

build_index_from_nodes(nodes: Sequence[BaseNode]) β†’ IS#

Build the index from nodes.

delete(doc_id: str, **delete_kwargs: Any) β†’ None#

Delete a document from the index. All nodes in the index related to the index will be deleted.

Parameters

doc_id (str) – A doc_id of the ingested document

delete_nodes(node_ids: List[str], delete_from_docstore: bool = False, **delete_kwargs: Any) β†’ None#

Delete a list of nodes from the index.

Parameters

doc_ids (List[str]) – A list of doc_ids from the nodes to delete

delete_ref_doc(ref_doc_id: str, delete_from_docstore: bool = False, **delete_kwargs: Any) β†’ None#

Delete a document and it’s nodes by using ref_doc_id.

property docstore: BaseDocumentStore#

Get the docstore corresponding to the index.

classmethod from_documents(documents: Sequence[Document], storage_context: Optional[StorageContext] = None, show_progress: bool = False, callback_manager: Optional[CallbackManager] = None, transformations: Optional[List[TransformComponent]] = None, service_context: Optional[ServiceContext] = None, **kwargs: Any) β†’ IndexType#

Create index from documents.

Parameters

documents (Optional[Sequence[BaseDocument]]) – List of documents to build the index from.

property index_id: str#

Get the index struct.

property index_struct: IS#

Get the index struct.

index_struct_cls#

alias of KeywordTable

insert(document: Document, **insert_kwargs: Any) β†’ None#

Insert a document.

insert_nodes(nodes: Sequence[BaseNode], **insert_kwargs: Any) β†’ None#

Insert nodes.

property ref_doc_info: Dict[str, RefDocInfo]#

Retrieve a dict mapping of ingested documents and their nodes+metadata.

refresh(documents: Sequence[Document], **update_kwargs: Any) β†’ List[bool]#

Refresh an index with documents that have changed.

This allows users to save LLM and Embedding model calls, while only updating documents that have any changes in text or metadata. It will also insert any documents that previously were not stored.

refresh_ref_docs(documents: Sequence[Document], **update_kwargs: Any) β†’ List[bool]#

Refresh an index with documents that have changed.

This allows users to save LLM and Embedding model calls, while only updating documents that have any changes in text or metadata. It will also insert any documents that previously were not stored.

set_index_id(index_id: str) β†’ None#

Set the index id.

NOTE: if you decide to set the index_id on the index_struct manually, you will need to explicitly call add_index_struct on the index_store to update the index store.

Parameters

index_id (str) – Index id to set.

update(document: Document, **update_kwargs: Any) β†’ None#

Update a document and it’s corresponding nodes.

This is equivalent to deleting the document and then inserting it again.

Parameters
  • document (Union[BaseDocument, BaseIndex]) – document to update

  • insert_kwargs (Dict) – kwargs to pass to insert

  • delete_kwargs (Dict) – kwargs to pass to delete

update_ref_doc(document: Document, **update_kwargs: Any) β†’ None#

Update a document and it’s corresponding nodes.

This is equivalent to deleting the document and then inserting it again.

Parameters
  • document (Union[BaseDocument, BaseIndex]) – document to update

  • insert_kwargs (Dict) – kwargs to pass to insert

  • delete_kwargs (Dict) – kwargs to pass to delete