Multi-Document Agents (V1)¶
In this guide, you learn towards setting up a multi-document agent over the LlamaIndex documentation.
This is an extension of V0 multi-document agents with the additional features:
- Reranking during document (tool) retrieval
- Query planning tool that the agent can use to plan
We do this with the following architecture:
- setup a "document agent" over each Document: each doc agent can do QA/summarization within its doc
- setup a top-level agent over this set of document agents. Do tool retrieval and then do CoT over the set of tools to answer a question.
If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-agent-openai
%pip install llama-index-readers-file
%pip install llama-index-postprocessor-cohere-rerank
%pip install llama-index-llms-openai
%pip install llama-index-embeddings-openai
!pip install llama-index llama-hub
%load_ext autoreload
%autoreload 2
Setup and Download Data¶
In this section, we'll load in the LlamaIndex documentation.
domain = "docs.llamaindex.ai"
docs_url = "https://docs.llamaindex.ai/en/latest/"
!wget -e robots=off --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains {domain} --no-parent {docs_url}
from llama_index.readers.file import UnstructuredReader
reader = UnstructuredReader()
from pathlib import Path
all_files_gen = Path("./docs.llamaindex.ai/").rglob("*")
all_files = [f.resolve() for f in all_files_gen]
all_html_files = [f for f in all_files if f.suffix.lower() == ".html"]
len(all_html_files)
638
from llama_index.core import Document
# TODO: set to higher value if you want more docs
doc_limit = 100
docs = []
for idx, f in enumerate(all_html_files):
if idx > doc_limit:
break
print(f"Idx {idx}/{len(all_html_files)}")
loaded_docs = reader.load_data(file=f, split_documents=True)
# Hardcoded Index. Everything before this is ToC for all pages
start_idx = 72
loaded_doc = Document(
text="\n\n".join([d.get_content() for d in loaded_docs[72:]]),
metadata={"path": str(f)},
)
print(loaded_doc.metadata["path"])
docs.append(loaded_doc)
Define Global LLM + Embeddings
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
import nest_asyncio
nest_asyncio.apply()
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings
Settings.llm = OpenAI(model="gpt-3.5-turbo")
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
Building Multi-Document Agents¶
In this section we show you how to construct the multi-document agent. We first build a document agent for each document, and then define the top-level parent agent with an object index.
Build Document Agent for each Document¶
In this section we define "document agents" for each document.
We define both a vector index (for semantic search) and summary index (for summarization) for each document. The two query engines are then converted into tools that are passed to an OpenAI function calling agent.
This document agent can dynamically choose to perform semantic search or summarization within a given document.
We create a separate document agent for each city.
from llama_index.agent.openai import OpenAIAgent
from llama_index.core import (
load_index_from_storage,
StorageContext,
VectorStoreIndex,
)
from llama_index.core import SummaryIndex
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.node_parser import SentenceSplitter
import os
from tqdm.notebook import tqdm
import pickle
async def build_agent_per_doc(nodes, file_base):
print(file_base)
vi_out_path = f"./data/llamaindex_docs/{file_base}"
summary_out_path = f"./data/llamaindex_docs/{file_base}_summary.pkl"
if not os.path.exists(vi_out_path):
Path("./data/llamaindex_docs/").mkdir(parents=True, exist_ok=True)
# build vector index
vector_index = VectorStoreIndex(nodes)
vector_index.storage_context.persist(persist_dir=vi_out_path)
else:
vector_index = load_index_from_storage(
StorageContext.from_defaults(persist_dir=vi_out_path),
)
# build summary index
summary_index = SummaryIndex(nodes)
# define query engines
vector_query_engine = vector_index.as_query_engine(llm=llm)
summary_query_engine = summary_index.as_query_engine(
response_mode="tree_summarize", llm=llm
)
# extract a summary
if not os.path.exists(summary_out_path):
Path(summary_out_path).parent.mkdir(parents=True, exist_ok=True)
summary = str(
await summary_query_engine.aquery(
"Extract a concise 1-2 line summary of this document"
)
)
pickle.dump(summary, open(summary_out_path, "wb"))
else:
summary = pickle.load(open(summary_out_path, "rb"))
# define tools
query_engine_tools = [
QueryEngineTool(
query_engine=vector_query_engine,
metadata=ToolMetadata(
name=f"vector_tool_{file_base}",
description=f"Useful for questions related to specific facts",
),
),
QueryEngineTool(
query_engine=summary_query_engine,
metadata=ToolMetadata(
name=f"summary_tool_{file_base}",
description=f"Useful for summarization questions",
),
),
]
# build agent
function_llm = OpenAI(model="gpt-4")
agent = OpenAIAgent.from_tools(
query_engine_tools,
llm=function_llm,
verbose=True,
system_prompt=f"""\
You are a specialized agent designed to answer queries about the `{file_base}.html` part of the LlamaIndex docs.
You must ALWAYS use at least one of the tools provided when answering a question; do NOT rely on prior knowledge.\
""",
)
return agent, summary
async def build_agents(docs):
node_parser = SentenceSplitter()
# Build agents dictionary
agents_dict = {}
extra_info_dict = {}
# # this is for the baseline
# all_nodes = []
for idx, doc in enumerate(tqdm(docs)):
nodes = node_parser.get_nodes_from_documents([doc])
# all_nodes.extend(nodes)
# ID will be base + parent
file_path = Path(doc.metadata["path"])
file_base = str(file_path.parent.stem) + "_" + str(file_path.stem)
agent, summary = await build_agent_per_doc(nodes, file_base)
agents_dict[file_base] = agent
extra_info_dict[file_base] = {"summary": summary, "nodes": nodes}
return agents_dict, extra_info_dict
agents_dict, extra_info_dict = await build_agents(docs)
0%| | 0/101 [00:00<?, ?it/s]
latest_search latest_genindex latest_index community_frequently_asked_questions community_integrations community_full_stack_projects integrations_tonicvalidate integrations_using_with_langchain integrations_trulens integrations_deepeval integrations_managed_indices integrations_chatgpt_plugins integrations_graphsignal integrations_lmformatenforcer integrations_graph_stores integrations_vector_stores integrations_fleet_libraries_context llama_packs_root faq_vector_database faq_query_engines faq_embeddings faq_chat_engines faq_llms getting_started_reading getting_started_discover_llamaindex getting_started_starter_example getting_started_customization getting_started_installation getting_started_concepts api_reference_storage api_reference_multi_modal api_reference_response api_reference_callbacks api_reference_node api_reference_readers api_reference_agents api_reference_query api_reference_example_notebooks api_reference_node_postprocessor api_reference_composability api_reference_llm_predictor api_reference_service_context api_reference_prompts api_reference_struct_store api_reference_indices api_reference_evaluation api_reference_index api_reference_memory api_reference_llms langchain_integrations_base storage_kv_store storage_vector_store storage_index_store storage_docstore storage_indices_save_load multi_modal_openai query_query_transform query_query_bundle query_retrievers query_response_synthesizer query_query_engines retrievers_vector_store retrievers_empty retrievers_transform retrievers_tree retrievers_kg retrievers_table query_engines_graph_query_engine query_engines_citation_query_engine query_engines_pandas_query_engine query_engines_retriever_query_engine query_engines_knowledge_graph_query_engine query_engines_retriever_router_query_engine query_engines_sub_question_query_engine query_engines_flare_query_engine query_engines_sql_query_engine query_engines_transform_query_engine query_engines_sql_join_query_engine query_engines_router_query_engine chat_engines_condense_question_chat_engine chat_engines_simple_chat_engine chat_engines_condense_plus_context_chat_engine indices_vector_store indices_tree indices_kg indices_list indices_table indices_struct_store service_context_embeddings service_context_prompt_helper llms_xinference llms_langchain llms_replicate llms_llama_cpp llms_azure_openai llms_openai llms_predibase llms_gradient_model_adapter llms_anthropic llms_openllm llms_huggingface
Build Retriever-Enabled OpenAI Agent¶
We build a top-level agent that can orchestrate across the different document agents to answer any user query.
This RetrieverOpenAIAgent
performs tool retrieval before tool use (unlike a default agent that tries to put all tools in the prompt).
Improvements from V0: We make the following improvements compared to the "base" version in V0.
- Adding in reranking: we use Cohere reranker to better filter the candidate set of documents.
- Adding in a query planning tool: we add an explicit query planning tool that's dynamically created based on the set of retrieved tools.
# define tool for each document agent
all_tools = []
for file_base, agent in agents_dict.items():
summary = extra_info_dict[file_base]["summary"]
doc_tool = QueryEngineTool(
query_engine=agent,
metadata=ToolMetadata(
name=f"tool_{file_base}",
description=summary,
),
)
all_tools.append(doc_tool)
print(all_tools[0].metadata)
ToolMetadata(description='This document provides examples and instructions for searching within the Llama Index documentation.', name='tool_latest_search', fn_schema=<class 'llama_index.tools.types.DefaultToolFnSchema'>)
# define an "object" index and retriever over these tools
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import (
ObjectIndex,
SimpleToolNodeMapping,
ObjectRetriever,
)
from llama_index.core.retrievers import BaseRetriever
from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.llms.openai import OpenAI
llm = OpenAI(model_name="gpt-4-0613")
tool_mapping = SimpleToolNodeMapping.from_objects(all_tools)
obj_index = ObjectIndex.from_objects(
all_tools,
tool_mapping,
VectorStoreIndex,
)
vector_node_retriever = obj_index.as_node_retriever(similarity_top_k=10)
# define a custom retriever with reranking
class CustomRetriever(BaseRetriever):
def __init__(self, vector_retriever, postprocessor=None):
self._vector_retriever = vector_retriever
self._postprocessor = postprocessor or CohereRerank(top_n=5)
super().__init__()
def _retrieve(self, query_bundle):
retrieved_nodes = self._vector_retriever.retrieve(query_bundle)
filtered_nodes = self._postprocessor.postprocess_nodes(
retrieved_nodes, query_bundle=query_bundle
)
return filtered_nodes
# define a custom object retriever that adds in a query planning tool
class CustomObjectRetriever(ObjectRetriever):
def __init__(self, retriever, object_node_mapping, all_tools, llm=None):
self._retriever = retriever
self._object_node_mapping = object_node_mapping
self._llm = llm or OpenAI("gpt-4-0613")
def retrieve(self, query_bundle):
nodes = self._retriever.retrieve(query_bundle)
tools = [self._object_node_mapping.from_node(n.node) for n in nodes]
sub_question_engine = SubQuestionQueryEngine.from_defaults(
query_engine_tools=tools, llm=self._llm
)
sub_question_description = f"""\
Useful for any queries that involve comparing multiple documents. ALWAYS use this tool for comparison queries - make sure to call this \
tool with the original query. Do NOT use the other tools for any queries involving multiple documents.
"""
sub_question_tool = QueryEngineTool(
query_engine=sub_question_engine,
metadata=ToolMetadata(
name="compare_tool", description=sub_question_description
),
)
return tools + [sub_question_tool]
custom_node_retriever = CustomRetriever(vector_node_retriever)
# wrap it with ObjectRetriever to return objects
custom_obj_retriever = CustomObjectRetriever(
custom_node_retriever, tool_mapping, all_tools, llm=llm
)
tmps = custom_obj_retriever.retrieve("hello")
print(len(tmps))
6
from llama_index.agent.openai_legacy import FnRetrieverOpenAIAgent
from llama_index.core.agent import ReActAgent
top_agent = FnRetrieverOpenAIAgent.from_retriever(
custom_obj_retriever,
system_prompt=""" \
You are an agent designed to answer queries about the documentation.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\
""",
llm=llm,
verbose=True,
)
# top_agent = ReActAgent.from_tools(
# tool_retriever=custom_obj_retriever,
# system_prompt=""" \
# You are an agent designed to answer queries about the documentation.
# Please always use the tools provided to answer a question. Do not rely on prior knowledge.\
# """,
# llm=llm,
# verbose=True,
# )
Define Baseline Vector Store Index¶
As a point of comparison, we define a "naive" RAG pipeline which dumps all docs into a single vector index collection.
We set the top_k = 4
all_nodes = [
n for extra_info in extra_info_dict.values() for n in extra_info["nodes"]
]
base_index = VectorStoreIndex(all_nodes)
base_query_engine = base_index.as_query_engine(similarity_top_k=4)
Running Example Queries¶
Let's run some example queries, ranging from QA / summaries over a single document to QA / summarization over multiple documents.
response = top_agent.query(
"Tell me about the different types of evaluation in LlamaIndex"
)
=== Calling Function === Calling function: tool_api_reference_evaluation with args: { "input": "types of evaluation" } === Calling Function === Calling function: vector_tool_api_reference_evaluation with args: { "input": "types of evaluation" } Got output: The types of evaluation can include correctness evaluation, faithfulness evaluation, guideline evaluation, hit rate evaluation, MRR (Mean Reciprocal Rank) evaluation, pairwise comparison evaluation, relevancy evaluation, and response evaluation. ======================== Got output: The types of evaluation mentioned in the `api_reference_evaluation.html` part of the LlamaIndex docs include: 1. Correctness Evaluation 2. Faithfulness Evaluation 3. Guideline Evaluation 4. Hit Rate Evaluation 5. MRR (Mean Reciprocal Rank) Evaluation 6. Pairwise Comparison Evaluation 7. Relevancy Evaluation 8. Response Evaluation ========================
print(response)
There are several types of evaluation in LlamaIndex: 1. Correctness Evaluation: This type of evaluation measures the accuracy of the retrieval results. It checks if the retrieved documents are correct and relevant to the query. 2. Faithfulness Evaluation: Faithfulness evaluation measures how faithfully the retrieved documents represent the original data. It checks if the retrieved documents accurately reflect the information in the original documents. 3. Guideline Evaluation: Guideline evaluation involves comparing the retrieval results against a set of guidelines or ground truth. It checks if the retrieval results align with the expected or desired outcomes. 4. Hit Rate Evaluation: Hit rate evaluation measures the percentage of queries that return at least one relevant document. It is a binary evaluation metric that indicates the effectiveness of the retrieval system in finding relevant documents. 5. MRR (Mean Reciprocal Rank) Evaluation: MRR evaluation measures the average rank of the first relevant document in the retrieval results. It provides a single value that represents the effectiveness of the retrieval system in ranking relevant documents. 6. Pairwise Comparison Evaluation: Pairwise comparison evaluation involves comparing the retrieval results of different systems or algorithms. It helps determine which system performs better in terms of retrieval accuracy and relevance. 7. Relevancy Evaluation: Relevancy evaluation measures the relevance of the retrieved documents to the query. It can be done using various metrics such as precision, recall, and F1 score. 8. Response Evaluation: Response evaluation measures the quality of the response generated by the retrieval system. It checks if the response is informative, accurate, and helpful to the user. These evaluation types help assess the performance and effectiveness of the retrieval system in LlamaIndex.
# baseline
response = base_query_engine.query(
"Tell me about the different types of evaluation in LlamaIndex"
)
print(str(response))
LlamaIndex utilizes various types of evaluation methods to assess its performance and effectiveness. These evaluation methods include RelevancyEvaluator, RetrieverEvaluator, SemanticSimilarityEvaluator, PairwiseComparisonEvaluator, CorrectnessEvaluator, FaithfulnessEvaluator, and GuidelineEvaluator. Each of these evaluators serves a specific purpose in evaluating different aspects of the LlamaIndex system.
response = top_agent.query(
"Compare the content in the contributions page vs. index page."
)
=== Calling Function === Calling function: compare_tool with args: { "input": "content in the contributions page vs. index page" } Generated 2 sub questions. [tool_development_contributing] Q: What is the content of the contributions page? [tool_latest_index] Q: What is the content of the index page? === Calling Function === Calling function: summary_tool_development_contributing with args: { "input": "development_contributing.html" } === Calling Function === Calling function: vector_tool_latest_index with args: { "input": "content of the index page" } Got output: The development_contributing.html file provides information on how to contribute to LlamaIndex. It includes guidelines on what to work on, such as extending core modules, fixing bugs, adding usage examples, adding experimental features, and improving code quality and documentation. The file also provides details on each module, including data loaders, node parsers, text splitters, document/index/KV stores, managed index, vector stores, retrievers, query engines, query transforms, token usage optimizers, node postprocessors, and output parsers. Additionally, the file includes a development guideline section that covers environment setup, validating changes, formatting/linting, testing, creating example notebooks, and creating a pull request. ======================== Got output: The content of the index page provides information about LlamaIndex, a data framework for LLM applications. It explains why LlamaIndex is useful for augmenting LLM models with private or domain-specific data that may be distributed across different applications and data stores. LlamaIndex offers tools such as data connectors, data indexes, engines, and data agents to ingest, structure, and access data. It is designed for beginners as well as advanced users who can customize and extend its modules. The page also provides installation instructions, tutorials, and links to the LlamaIndex ecosystem and associated projects. ======================== [tool_latest_index] A: The content of the `latest_index.html` page provides comprehensive information about LlamaIndex, a data framework for LLM applications. It explains the utility of LlamaIndex in augmenting LLM models with private or domain-specific data that may be distributed across different applications and data stores. The page details the tools offered by LlamaIndex, such as data connectors, data indexes, engines, and data agents, which are used to ingest, structure, and access data. It is designed to cater to both beginners and advanced users, with the flexibility to customize and extend its modules. Additionally, the page provides installation instructions and tutorials for users. It also includes links to the LlamaIndex ecosystem and associated projects for further exploration and understanding. [tool_development_contributing] A: The `development_contributing.html` page of the LlamaIndex docs provides comprehensive information on how to contribute to the project. It includes guidelines on the areas to focus on, such as extending core modules, fixing bugs, adding usage examples, adding experimental features, and improving code quality and documentation. The page also provides detailed information on each module, including data loaders, node parsers, text splitters, document/index/KV stores, managed index, vector stores, retrievers, query engines, query transforms, token usage optimizers, node postprocessors, and output parsers. In addition, there is a development guideline section that covers various aspects of the development process, including environment setup, validating changes, formatting/linting, testing, creating example notebooks, and creating a pull request. Got output: The content in the contributions page of the LlamaIndex documentation provides comprehensive information on how to contribute to the project, including guidelines on areas to focus on and detailed information on each module. It also covers various aspects of the development process. On the other hand, the content in the index page of the LlamaIndex documentation provides comprehensive information about LlamaIndex itself, explaining its utility in augmenting LLM models with private or domain-specific data. It details the tools offered by LlamaIndex and provides installation instructions, tutorials, and links to the LlamaIndex ecosystem and associated projects. ========================
print(response)
The contributions page of the LlamaIndex documentation provides guidelines for contributing to LlamaIndex, including extending core modules, fixing bugs, adding usage examples, adding experimental features, and improving code quality and documentation. It also includes information on the environment setup, validating changes, formatting and linting, testing, creating example notebooks, and creating a pull request. On the other hand, the index page of the LlamaIndex documentation provides information about LlamaIndex itself. It explains that LlamaIndex is a data framework that allows LLM applications to ingest, structure, and access private or domain-specific data. It provides tools such as data connectors, data indexes, engines, data agents, and application integrations. The index page also mentions that LlamaIndex is designed for beginners, advanced users, and everyone in between, and offers both high-level and lower-level APIs for customization. It provides installation instructions, links to the GitHub and PyPi repositories, and information about the LlamaIndex community on Twitter and Discord. In summary, the contributions page focuses on contributing to LlamaIndex, while the index page provides an overview of LlamaIndex and its features.
response = top_agent.query(
"Can you compare the tree index and list index at a very high-level?"
)
print(str(response))
At a high level, the Tree Index and List Index are two different types of indexes used in the system. The Tree Index is a tree-structured index that is built specifically for each query. It allows for the construction of a query-specific tree from leaf nodes to return a response. The Tree Index is designed to provide a more optimized and efficient way of retrieving nodes based on a query. On the other hand, the List Index is a keyword table index that supports operations such as inserting and deleting documents, retrieving nodes based on a query, and refreshing the index with updated documents. The List Index is a simpler index that uses a keyword table approach for retrieval. Both indexes have their own advantages and use cases. The choice between them depends on the specific requirements and constraints of the system.