Multi-Document Agents (V1)
In this guide, you will learn how to set up a multi-document agent over the LlamaIndex documentation.
This is an extension of the V0 multi-document agents, with the following additional features:
- Reranking during document (tool) retrieval
- Query planning tool that the agent can use to plan
We do this with the following architecture:
- setup a "document agent" over each Document: each doc agent can do QA/summarization within its doc
- setup a top-level agent over this set of document agents. Do tool retrieval and then do CoT over the set of tools to answer a question.
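Before the full implementation, here is a condensed, illustrative sketch of the two levels. The names and parameters are placeholders; the complete version below adds persistence, summary extraction, Cohere reranking, and the query planning tool.
from llama_index.core import VectorStoreIndex, SummaryIndex
from llama_index.core.tools import QueryEngineTool
from llama_index.core.objects import ObjectIndex
from llama_index.agent.openai import OpenAIAgent

def build_doc_agent(nodes, name):
    # per-document agent: one tool for factual QA, one for summarization
    vector_tool = QueryEngineTool.from_defaults(
        query_engine=VectorStoreIndex(nodes).as_query_engine(),
        name=f"vector_tool_{name}",
        description=f"Answers specific questions about {name}",
    )
    summary_tool = QueryEngineTool.from_defaults(
        query_engine=SummaryIndex(nodes).as_query_engine(
            response_mode="tree_summarize"
        ),
        name=f"summary_tool_{name}",
        description=f"Summarizes {name}",
    )
    return OpenAIAgent.from_tools([vector_tool, summary_tool])

# top level: wrap each doc agent as a tool, index the tools, and let the
# top-level agent retrieve only the relevant ones at query time, e.g.:
# obj_index = ObjectIndex.from_objects(doc_tools, index_cls=VectorStoreIndex)
# top_agent = OpenAIAgent.from_tools(
#     tool_retriever=obj_index.as_retriever(similarity_top_k=10)
# )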
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-core
%pip install llama-index-agent-openai
%pip install llama-index-readers-file
%pip install llama-index-postprocessor-cohere-rerank
%pip install llama-index-llms-openai
%pip install llama-index-embeddings-openai
%pip install unstructured[html]
%load_ext autoreload
%autoreload 2
Setup and Download Data
In this section, we'll load in the LlamaIndex documentation.
domain = "docs.llamaindex.ai"
docs_url = "https://docs.llamaindex.ai/en/latest/"
!wget -e robots=off --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains {domain} --no-parent {docs_url}
from llama_index.readers.file import UnstructuredReader
reader = UnstructuredReader()
from pathlib import Path
all_files_gen = Path("./docs.llamaindex.ai/").rglob("*")
all_files = [f.resolve() for f in all_files_gen]
all_html_files = [f for f in all_files if f.suffix.lower() == ".html"]
len(all_html_files)
1219
from llama_index.core import Document
# TODO: set to higher value if you want more docs
doc_limit = 100
docs = []
for idx, f in enumerate(all_html_files):
    if idx > doc_limit:
        break
    print(f"Idx {idx}/{len(all_html_files)}")
    loaded_docs = reader.load_data(file=f, split_documents=True)
    # Hardcoded index: everything before this point is the ToC shared by all pages
    start_idx = 72
    loaded_doc = Document(
        text="\n\n".join([d.get_content() for d in loaded_docs[start_idx:]]),
        metadata={"path": str(f)},
    )
    print(loaded_doc.metadata["path"])
    docs.append(loaded_doc)
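Note that start_idx = 72 is a heuristic: it assumes every page starts with the same number of navigation/ToC elements. A quick, illustrative spot check confirms the parsed text actually begins at the page content:
# sanity check: the start of the parsed document should be page content,
# not the shared navigation / table of contents
print(docs[0].metadata["path"])
print(docs[0].text[:500])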
Define Global LLM + Embeddings
import os
os.environ["OPENAI_API_KEY"] = "sk-..."
import nest_asyncio
nest_asyncio.apply()
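We also use a Cohere reranker later in this notebook. CohereRerank can take an api_key argument, and (to the best of my knowledge) otherwise falls back to the COHERE_API_KEY environment variable, so set that here as well (placeholder value shown):
# needed later for the CohereRerank postprocessor
os.environ["COHERE_API_KEY"] = "..."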
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings
llm = OpenAI(model="gpt-3.5-turbo")
Settings.llm = llm
Settings.embed_model = OpenAIEmbedding(
model="text-embedding-3-small", embed_batch_size=256
)
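As an optional sanity check that the API key and global models are wired up correctly (this makes two small paid API calls):
# optional: verify the global LLM and embedding model respond
print(Settings.llm.complete("Say hello in one word."))
print(len(Settings.embed_model.get_text_embedding("hello world")))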
Building Multi-Document Agents
In this section we show you how to construct the multi-document agent. We first build a document agent for each document, and then define the top-level parent agent with an object index.
Build Document Agent for each Document
In this section we define "document agents" for each document.
We define both a vector index (for semantic search) and summary index (for summarization) for each document. The two query engines are then converted into tools that are passed to an OpenAI function calling agent.
This document agent can dynamically choose to perform semantic search or summarization within a given document.
We create a separate document agent for each document.
from llama_index.agent.openai import OpenAIAgent
from llama_index.core import (
load_index_from_storage,
StorageContext,
VectorStoreIndex,
)
from llama_index.core import SummaryIndex
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.node_parser import SentenceSplitter
import os
from tqdm.notebook import tqdm
import pickle
async def build_agent_per_doc(nodes, file_base):
    print(file_base)

    vi_out_path = f"./data/llamaindex_docs/{file_base}"
    summary_out_path = f"./data/llamaindex_docs/{file_base}_summary.pkl"
    if not os.path.exists(vi_out_path):
        Path("./data/llamaindex_docs/").mkdir(parents=True, exist_ok=True)
        # build vector index
        vector_index = VectorStoreIndex(nodes)
        vector_index.storage_context.persist(persist_dir=vi_out_path)
    else:
        vector_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=vi_out_path),
        )

    # build summary index
    summary_index = SummaryIndex(nodes)

    # define query engines
    vector_query_engine = vector_index.as_query_engine(llm=llm)
    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize", llm=llm
    )

    # extract a summary
    if not os.path.exists(summary_out_path):
        Path(summary_out_path).parent.mkdir(parents=True, exist_ok=True)
        summary = str(
            await summary_query_engine.aquery(
                "Extract a concise 1-2 line summary of this document"
            )
        )
        pickle.dump(summary, open(summary_out_path, "wb"))
    else:
        summary = pickle.load(open(summary_out_path, "rb"))

    # define tools
    query_engine_tools = [
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name=f"vector_tool_{file_base}",
                description="Useful for questions related to specific facts",
            ),
        ),
        QueryEngineTool(
            query_engine=summary_query_engine,
            metadata=ToolMetadata(
                name=f"summary_tool_{file_base}",
                description="Useful for summarization questions",
            ),
        ),
    ]

    # build agent
    function_llm = OpenAI(model="gpt-4")
    agent = OpenAIAgent.from_tools(
        query_engine_tools,
        llm=function_llm,
        verbose=True,
        system_prompt=f"""\
You are a specialized agent designed to answer queries about the `{file_base}.html` part of the LlamaIndex docs.
You must ALWAYS use at least one of the tools provided when answering a question; do NOT rely on prior knowledge.\
""",
    )

    return agent, summary
async def build_agents(docs):
    node_parser = SentenceSplitter()

    # Build agents dictionary
    agents_dict = {}
    extra_info_dict = {}

    # # this is for the baseline
    # all_nodes = []

    for idx, doc in enumerate(tqdm(docs)):
        nodes = node_parser.get_nodes_from_documents([doc])
        # all_nodes.extend(nodes)

        # ID will be base + parent
        file_path = Path(doc.metadata["path"])
        file_base = str(file_path.parent.stem) + "_" + str(file_path.stem)
        agent, summary = await build_agent_per_doc(nodes, file_base)

        agents_dict[file_base] = agent
        extra_info_dict[file_base] = {"summary": summary, "nodes": nodes}

    return agents_dict, extra_info_dict
agents_dict, extra_info_dict = await build_agents(docs)
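Building agents for ~100 docs takes a while. Once it finishes, you can optionally spot-check a single document agent directly; the dictionary keys depend on which pages were downloaded, so list them first (the commented-out key below is hypothetical):
# inspect which document agents were built
print(list(agents_dict.keys())[:5])

# chat with one agent directly (replace the key with one printed above)
# sample_agent = agents_dict["latest_index"]
# print(sample_agent.chat("What is this page about?"))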
Build Retriever-Enabled OpenAI Agent
We build a top-level agent that can orchestrate across the different document agents to answer any user query.
This retriever-enabled OpenAIAgent performs tool retrieval before tool use (unlike a default agent, which tries to fit all tools into the prompt).
Improvements from V0: compared to the "base" version in V0, we make the following improvements.
- Adding in reranking: we use Cohere reranker to better filter the candidate set of documents.
- Adding in a query planning tool: we add an explicit query planning tool that's dynamically created based on the set of retrieved tools.
# define tool for each document agent
all_tools = []
for file_base, agent in agents_dict.items():
    summary = extra_info_dict[file_base]["summary"]
    doc_tool = QueryEngineTool(
        query_engine=agent,
        metadata=ToolMetadata(
            name=f"tool_{file_base}",
            description=summary,
        ),
    )
    all_tools.append(doc_tool)
print(all_tools[0].metadata)
ToolMetadata(description='This document provides examples and documentation for an agent on the llama index platform.', name='tool_latest_index', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>)
# define an "object" index and retriever over these tools
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import (
ObjectIndex,
ObjectRetriever,
)
from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.schema import QueryBundle
from llama_index.llms.openai import OpenAI
llm = OpenAI(model="gpt-4-0613")
obj_index = ObjectIndex.from_objects(
all_tools,
index_cls=VectorStoreIndex,
)
vector_node_retriever = obj_index.as_node_retriever(
similarity_top_k=10,
)
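Each tool is serialized into a node containing its name and description, so the retriever matches queries against tool descriptions. A quick, illustrative way to see what it returns before reranking:
# peek at the raw tool nodes retrieved for a sample query (pre-reranking)
retrieved = vector_node_retriever.retrieve("agents")
for node_with_score in retrieved[:3]:
    print(node_with_score.score, node_with_score.node.get_content()[:80])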
# define a custom object retriever that adds in a query planning tool
class CustomObjectRetriever(ObjectRetriever):
    def __init__(
        self,
        retriever,
        object_node_mapping,
        node_postprocessors=None,
        llm=None,
    ):
        self._retriever = retriever
        self._object_node_mapping = object_node_mapping
        self._llm = llm or OpenAI("gpt-4-0613")
        self._node_postprocessors = node_postprocessors or []

    def retrieve(self, query_bundle):
        if isinstance(query_bundle, str):
            query_bundle = QueryBundle(query_str=query_bundle)

        nodes = self._retriever.retrieve(query_bundle)
        for processor in self._node_postprocessors:
            nodes = processor.postprocess_nodes(
                nodes, query_bundle=query_bundle
            )
        tools = [self._object_node_mapping.from_node(n.node) for n in nodes]

        sub_question_engine = SubQuestionQueryEngine.from_defaults(
            query_engine_tools=tools, llm=self._llm
        )
        sub_question_description = """\
Useful for any queries that involve comparing multiple documents. ALWAYS use this tool for comparison queries - make sure to call this \
tool with the original query. Do NOT use the other tools for any queries involving multiple documents.
"""
        sub_question_tool = QueryEngineTool(
            query_engine=sub_question_engine,
            metadata=ToolMetadata(
                name="compare_tool", description=sub_question_description
            ),
        )

        return tools + [sub_question_tool]
# wrap it with ObjectRetriever to return objects
custom_obj_retriever = CustomObjectRetriever(
vector_node_retriever,
obj_index.object_node_mapping,
node_postprocessors=[CohereRerank(top_n=5)],
llm=llm,
)
tmps = custom_obj_retriever.retrieve("hello")
# should be 5 + 1 -- 5 from reranker, 1 from subquestion
print(len(tmps))
6
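You can also print the retrieved tool names to confirm that the reranked document tools plus the dynamically added compare_tool are returned:
# tool names follow the `tool_<file_base>` pattern, plus the query planning tool
print([t.metadata.name for t in tmps])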
from llama_index.agent.openai import OpenAIAgent
from llama_index.core.agent import ReActAgent
top_agent = OpenAIAgent.from_tools(
tool_retriever=custom_obj_retriever,
system_prompt=""" \
You are an agent designed to answer queries about the documentation.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\
""",
llm=llm,
verbose=True,
)
# top_agent = ReActAgent.from_tools(
# tool_retriever=custom_obj_retriever,
# system_prompt=""" \
# You are an agent designed to answer queries about the documentation.
# Please always use the tools provided to answer a question. Do not rely on prior knowledge.\
# """,
# llm=llm,
# verbose=True,
# )
Define Baseline Vector Store Index
As a point of comparison, we define a "naive" RAG pipeline which dumps all docs into a single vector index collection.
We set similarity_top_k = 4.
all_nodes = [
n for extra_info in extra_info_dict.values() for n in extra_info["nodes"]
]
base_index = VectorStoreIndex(all_nodes)
base_query_engine = base_index.as_query_engine(similarity_top_k=4)
Running Example Queries
Let's run some example queries, ranging from QA and summarization over a single document to QA and summarization across multiple documents.
response = top_agent.query(
"What types of agents are available in LlamaIndex?",
)
Added user message to memory: What types of agents are available in LlamaIndex?
=== Calling Function ===
Calling function: tool_agents_index with args: {"input":"types of agents"}
Added user message to memory: types of agents
=== Calling Function ===
Calling function: vector_tool_agents_index with args: { "input": "types of agents" }
Got output: The types of agents mentioned in the provided context are ReActAgent, Native OpenAIAgent, OpenAIAgent with Query Engine Tools, OpenAIAgent Query Planning, OpenAI Assistant, OpenAI Assistant Cookbook, Forced Function Calling, Parallel Function Calling, and Context Retrieval.
========================
Got output: The types of agents mentioned in the `agents_index.html` part of the LlamaIndex docs are:
1. ReActAgent
2. Native OpenAIAgent
3. OpenAIAgent with Query Engine Tools
4. OpenAIAgent Query Planning
5. OpenAI Assistant
6. OpenAI Assistant Cookbook
7. Forced Function Calling
8. Parallel Function Calling
9. Context Retrieval
========================
print(response)
The types of agents available in LlamaIndex include ReActAgent, Native OpenAIAgent, OpenAIAgent with Query Engine Tools, OpenAIAgent Query Planning, OpenAI Assistant, OpenAI Assistant Cookbook, Forced Function Calling, Parallel Function Calling, and Context Retrieval.
# baseline
response = base_query_engine.query(
"What types of agents are available in LlamaIndex?",
)
print(str(response))
The types of agents available in LlamaIndex are ReActAgent, Native OpenAIAgent, and OpenAIAgent.
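The baseline answer is less complete because it only sees the top-4 chunks from a single flat index. To see which pages those chunks came from, you can inspect the response's source nodes (a small illustrative check; the "path" metadata was attached when loading the docs):
# which pages did the baseline's top-k retrieval actually come from?
for node_with_score in response.source_nodes:
    print(node_with_score.score, node_with_score.node.metadata.get("path"))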
response = top_agent.query(
"Compare the content in the agents page vs. tools page."
)
Added user message to memory: Compare the content in the agents page vs. tools page.
=== Calling Function ===
Calling function: compare_tool with args: {"input":"agents vs tools"}
Generated 2 sub questions.
[tool_understanding_index] Q: What are the functionalities of agents in the Llama Index platform?
Added user message to memory: What are the functionalities of agents in the Llama Index platform?
[tool_understanding_index] Q: How do agents differ from tools in the Llama Index platform?
Added user message to memory: How do agents differ from tools in the Llama Index platform?
=== Calling Function ===
Calling function: vector_tool_understanding_index with args: { "input": "difference between agents and tools" }
=== Calling Function ===
Calling function: vector_tool_understanding_index with args: { "input": "functionalities of agents" }
Got output: Agents are typically individuals or entities that act on behalf of others, making decisions and taking actions based on predefined rules or instructions. On the other hand, tools are instruments or devices used to carry out specific functions or tasks, often under the control or direction of an agent.
========================
Got output: Agents typically have a range of functionalities that allow them to perform tasks autonomously or semi-autonomously. These functionalities may include data collection, analysis, decision-making, communication with other systems or users, and executing specific actions based on predefined rules or algorithms.
========================
[tool_understanding_index] A: In the context of the Llama Index platform, agents are entities that make decisions and take actions based on predefined rules or instructions. They are designed to interact with users, understand their queries, and provide appropriate responses. On the other hand, tools are instruments or devices that are used to perform specific functions or tasks. They are typically controlled or directed by an agent and do not make decisions on their own. They are used to assist the agents in providing accurate and relevant responses to user queries.
[tool_understanding_index] A: In the Llama Index platform, agents have a variety of functionalities. They can perform tasks autonomously or semi-autonomously. These tasks include data collection and analysis, making decisions, communicating with other systems or users, and executing specific actions. These actions are based on predefined rules or algorithms.
Got output: Agents in the Llama Index platform are responsible for making decisions and taking actions based on predefined rules or instructions. They interact with users, understand queries, and provide appropriate responses. On the other hand, tools in the platform are instruments or devices used to perform specific functions or tasks. Unlike agents, tools are typically controlled or directed by an agent and do not make decisions independently. Their role is to assist agents in delivering accurate and relevant responses to user queries.
========================
print(response)
The comparison between the content in the agents page and the tools page highlights the difference in their roles and functionalities. Agents on the Llama Index platform are responsible for decision-making and interacting with users, while tools are instruments used to perform specific functions or tasks, controlled by agents to assist in providing responses.
response = top_agent.query(
"Can you compare the compact and tree_summarize response synthesizer response modes at a very high-level?"
)
Added user message to memory: Can you compare the compact and tree_summarize response synthesizer response modes at a very high-level?
=== Calling Function ===
Calling function: compare_tool with args: {"input":"Compare the compact and tree_summarize response synthesizer response modes at a very high-level."}
Generated 4 sub questions.
[tool_querying_index] Q: What are the key differences between the compact and tree_summarize response synthesizer response modes?
Added user message to memory: What are the key differences between the compact and tree_summarize response synthesizer response modes?
[tool_querying_index] Q: How does the compact response synthesizer response mode optimize query logic and response quality?
Added user message to memory: How does the compact response synthesizer response mode optimize query logic and response quality?
[tool_querying_index] Q: How does the tree_summarize response synthesizer response mode optimize query logic and response quality?
Added user message to memory: How does the tree_summarize response synthesizer response mode optimize query logic and response quality?
[tool_evaluating_index] Q: What are the guidelines for evaluating retrievals in the context of response synthesizer response modes?
Added user message to memory: What are the guidelines for evaluating retrievals in the context of response synthesizer response modes?
=== Calling Function ===
Calling function: vector_tool_querying_index with args: { "input": "compact response synthesizer response mode" }
=== Calling Function ===
Calling function: summary_tool_querying_index with args: { "input": "tree_summarize response synthesizer response mode" }
=== Calling Function ===
Calling function: vector_tool_querying_index with args: { "input": "compact vs tree_summarize response synthesizer response modes" }
=== Calling Function ===
Calling function: vector_tool_evaluating_index with args: { "input": "evaluating retrievals response synthesizer response modes" }
Got output: The response modes for the response synthesizer include "compact" and "tree_summarize".
========================
Got output: The response mode "tree_summarize" in the response synthesizer configures the system to recursively construct a tree from a set of Node objects and the query, returning the root node as the final response. This mode is particularly useful for summarization purposes.
========================
Got output: "compact" the prompt during each LLM call by stuffing as many Node text chunks that can fit within the maximum prompt size. If there are too many chunks to stuff in one prompt, "create and refine" an answer by going through multiple prompts.
========================
=== Calling Function ===
Calling function: summary_tool_querying_index with args: { "input": "compact vs tree_summarize response synthesizer response modes" }
Got output: Response synthesizer response modes can be evaluated by comparing what was retrieved for a query to a set of nodes that were expected to be retrieved. This evaluation process typically involves analyzing metrics such as Mean Reciprocal Rank (MRR) and Hit Rate. It is important to evaluate a batch of retrievals to get a comprehensive understanding of the performance. If you are making calls to a hosted, remote LLM, you may also want to consider analyzing the cost implications of your application.
========================
Got output: The response modes for the response synthesizer include "compact" and "tree_summarize".
========================
[tool_querying_index] A: The compact response synthesizer response mode optimizes query logic and response quality by compacting the prompt during each LLM call. It does this by stuffing as many Node text chunks that can fit within the maximum prompt size. If there are too many chunks to fit in one prompt, it will "create and refine" an answer by going through multiple prompts. This approach allows for a more efficient use of the prompt space and can lead to more refined and accurate responses.
[tool_querying_index] A: The "tree_summarize" response synthesizer response mode optimizes query logic and response quality by recursively constructing a tree from a set of Node objects and the query. This approach allows the system to handle complex queries and generate comprehensive responses. The root node, which is returned as the final response, contains a summarized version of the information, making it easier for users to understand the response. This mode is particularly useful for summarization purposes, where the goal is to provide a concise yet comprehensive answer to a query.
[tool_evaluating_index] A: When evaluating retrievals in the context of response synthesizer response modes, you should compare what was retrieved for a query to a set of nodes that were expected to be retrieved. This evaluation process typically involves analyzing metrics such as Mean Reciprocal Rank (MRR) and Hit Rate. It's crucial to evaluate a batch of retrievals to get a comprehensive understanding of the performance. If you are making calls to a hosted, remote LLM, you may also want to consider analyzing the cost implications of your application.
[tool_querying_index] A: The "compact" and "tree_summarize" are two different response modes for the response synthesizer in LlamaIndex. The "compact" mode provides a more concise response, focusing on delivering the most relevant information in a compact format. This mode is useful when you want a brief and direct answer to your query. On the other hand, the "tree_summarize" mode provides a more detailed and structured response. It breaks down the information into a tree-like structure, making it easier to understand the relationships and hierarchy of the information. This mode is useful when you want a comprehensive understanding of the query topic.
Got output: The "compact" response synthesizer mode focuses on providing a concise and direct response, while the "tree_summarize" mode offers a more detailed and structured response by breaking down information into a tree-like structure. The compact mode aims to deliver the most relevant information in a compact format, suitable for brief answers, whereas the tree_summarize mode is designed to provide a comprehensive understanding of the query topic by presenting information in a hierarchical manner.
========================
print(str(response))
The "compact" response synthesizer mode provides concise and direct responses, while the "tree_summarize" mode offers detailed and structured responses in a tree-like format. The compact mode is suitable for brief answers, while the tree_summarize mode presents information hierarchically for a comprehensive understanding of the query topic.