
Multi-Document Agents (V1)#

In this guide, you will learn how to set up a multi-document agent over the LlamaIndex documentation.

This is an extension of V0 multi-document agents, with the following additional features:

  • Reranking during document (tool) retrieval

  • Query planning tool that the agent can use to plan

We do this with the following architecture:

  • set up a “document agent” over each Document: each doc agent can do QA/summarization within its doc

  • set up a top-level agent over this set of document agents: it performs tool retrieval and then chain-of-thought reasoning over the retrieved tools to answer a question

If you’re opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

%pip install llama-index-agent-openai
%pip install llama-index-readers-file
%pip install llama-index-postprocessor-cohere-rerank
%pip install llama-index-llms-openai
%pip install llama-index-embeddings-openai
!pip install llama-index llama-hub
%load_ext autoreload
%autoreload 2

Setup and Download Data#

In this section, we’ll load in the LlamaIndex documentation.

domain = "docs.llamaindex.ai"
docs_url = "https://docs.llamaindex.ai/en/latest/"
!wget -e robots=off --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --domains {domain} --no-parent {docs_url}
from llama_index.readers.file import UnstructuredReader

reader = UnstructuredReader()
from pathlib import Path

all_files_gen = Path("./docs.llamaindex.ai/").rglob("*")
all_files = [f.resolve() for f in all_files_gen]
all_html_files = [f for f in all_files if f.suffix.lower() == ".html"]
len(all_html_files)
638
from llama_index.core import Document

# TODO: set to higher value if you want more docs
doc_limit = 100

docs = []
for idx, f in enumerate(all_html_files):
    if idx >= doc_limit:
        break
    print(f"Idx {idx}/{len(all_html_files)}")
    loaded_docs = reader.load_data(file=f, split_documents=True)
    # hardcoded index: everything before this is the ToC shared by all pages
    start_idx = 72
    loaded_doc = Document(
        text="\n\n".join([d.get_content() for d in loaded_docs[start_idx:]]),
        metadata={"path": str(f)},
    )
    print(loaded_doc.metadata["path"])
    docs.append(loaded_doc)
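
As a quick sanity check (an optional, illustrative cell), you can confirm how many documents were loaded and inspect a path:

print(len(docs))
print(docs[0].metadata["path"])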

Define Global LLM + Embeddings#

import os

os.environ["OPENAI_API_KEY"] = "sk-..."

import nest_asyncio

nest_asyncio.apply()
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

# define a global llm; it is also referenced directly by the document agents below
llm = OpenAI(model="gpt-3.5-turbo")

Settings.llm = llm
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

Building Multi-Document Agents#

In this section we show you how to construct the multi-document agent. We first build a document agent for each document, and then define the top-level parent agent with an object index.

Build Document Agent for each Document#

In this section we define “document agents” for each document.

We define both a vector index (for semantic search) and summary index (for summarization) for each document. The two query engines are then converted into tools that are passed to an OpenAI function calling agent.

This document agent can dynamically choose to perform semantic search or summarization within a given document.

We create a separate document agent for each page of the documentation.

from llama_index.agent.openai import OpenAIAgent
from llama_index.core import (
    load_index_from_storage,
    StorageContext,
    VectorStoreIndex,
)
from llama_index.core import SummaryIndex
from llama_index.core.tools import QueryEngineTool, ToolMetadata
from llama_index.core.node_parser import SentenceSplitter
import os
from tqdm.notebook import tqdm
import pickle


async def build_agent_per_doc(nodes, file_base):
    print(file_base)

    vi_out_path = f"./data/llamaindex_docs/{file_base}"
    summary_out_path = f"./data/llamaindex_docs/{file_base}_summary.pkl"
    if not os.path.exists(vi_out_path):
        Path("./data/llamaindex_docs/").mkdir(parents=True, exist_ok=True)
        # build vector index
        vector_index = VectorStoreIndex(nodes)
        vector_index.storage_context.persist(persist_dir=vi_out_path)
    else:
        vector_index = load_index_from_storage(
            StorageContext.from_defaults(persist_dir=vi_out_path),
        )

    # build summary index
    summary_index = SummaryIndex(nodes)

    # define query engines
    vector_query_engine = vector_index.as_query_engine(llm=llm)
    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize", llm=llm
    )

    # extract a summary
    if not os.path.exists(summary_out_path):
        Path(summary_out_path).parent.mkdir(parents=True, exist_ok=True)
        summary = str(
            await summary_query_engine.aquery(
                "Extract a concise 1-2 line summary of this document"
            )
        )
        pickle.dump(summary, open(summary_out_path, "wb"))
    else:
        summary = pickle.load(open(summary_out_path, "rb"))

    # define tools
    query_engine_tools = [
        QueryEngineTool(
            query_engine=vector_query_engine,
            metadata=ToolMetadata(
                name=f"vector_tool_{file_base}",
                description=f"Useful for questions related to specific facts",
            ),
        ),
        QueryEngineTool(
            query_engine=summary_query_engine,
            metadata=ToolMetadata(
                name=f"summary_tool_{file_base}",
                description=f"Useful for summarization questions",
            ),
        ),
    ]

    # build agent
    function_llm = OpenAI(model="gpt-4")
    agent = OpenAIAgent.from_tools(
        query_engine_tools,
        llm=function_llm,
        verbose=True,
        system_prompt=f"""\
You are a specialized agent designed to answer queries about the `{file_base}.html` part of the LlamaIndex docs.
You must ALWAYS use at least one of the tools provided when answering a question; do NOT rely on prior knowledge.\
""",
    )

    return agent, summary


async def build_agents(docs):
    node_parser = SentenceSplitter()

    # Build agents dictionary
    agents_dict = {}
    extra_info_dict = {}

    # # this is for the baseline
    # all_nodes = []

    for idx, doc in enumerate(tqdm(docs)):
        nodes = node_parser.get_nodes_from_documents([doc])
        # all_nodes.extend(nodes)

        # build an ID from the parent directory name + file base name
        file_path = Path(doc.metadata["path"])
        file_base = str(file_path.parent.stem) + "_" + str(file_path.stem)
        agent, summary = await build_agent_per_doc(nodes, file_base)

        agents_dict[file_base] = agent
        extra_info_dict[file_base] = {"summary": summary, "nodes": nodes}

    return agents_dict, extra_info_dict
agents_dict, extra_info_dict = await build_agents(docs)
latest_search
latest_genindex
latest_index
community_frequently_asked_questions
community_integrations
community_full_stack_projects
integrations_tonicvalidate
integrations_using_with_langchain
integrations_trulens
integrations_deepeval
integrations_managed_indices
integrations_chatgpt_plugins
integrations_graphsignal
integrations_lmformatenforcer
integrations_graph_stores
integrations_vector_stores
integrations_fleet_libraries_context
llama_packs_root
faq_vector_database
faq_query_engines
faq_embeddings
faq_chat_engines
faq_llms
getting_started_reading
getting_started_discover_llamaindex
getting_started_starter_example
getting_started_customization
getting_started_installation
getting_started_concepts
api_reference_storage
api_reference_multi_modal
api_reference_response
api_reference_callbacks
api_reference_node
api_reference_readers
api_reference_agents
api_reference_query
api_reference_example_notebooks
api_reference_node_postprocessor
api_reference_composability
api_reference_llm_predictor
api_reference_service_context
api_reference_prompts
api_reference_struct_store
api_reference_indices
api_reference_evaluation
api_reference_index
api_reference_memory
api_reference_llms
langchain_integrations_base
storage_kv_store
storage_vector_store
storage_index_store
storage_docstore
storage_indices_save_load
multi_modal_openai
query_query_transform
query_query_bundle
query_retrievers
query_response_synthesizer
query_query_engines
retrievers_vector_store
retrievers_empty
retrievers_transform
retrievers_tree
retrievers_kg
retrievers_table
query_engines_graph_query_engine
query_engines_citation_query_engine
query_engines_pandas_query_engine
query_engines_retriever_query_engine
query_engines_knowledge_graph_query_engine
query_engines_retriever_router_query_engine
query_engines_sub_question_query_engine
query_engines_flare_query_engine
query_engines_sql_query_engine
query_engines_transform_query_engine
query_engines_sql_join_query_engine
query_engines_router_query_engine
chat_engines_condense_question_chat_engine
chat_engines_simple_chat_engine
chat_engines_condense_plus_context_chat_engine
indices_vector_store
indices_tree
indices_kg
indices_list
indices_table
indices_struct_store
service_context_embeddings
service_context_prompt_helper
llms_xinference
llms_langchain
llms_replicate
llms_llama_cpp
llms_azure_openai
llms_openai
llms_predibase
llms_gradient_model_adapter
llms_anthropic
llms_openllm
llms_huggingface
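
Before building the top-level agent, you can optionally sanity-check an individual document agent (the key below is taken from the build output above; any key in agents_dict works):

test_agent = agents_dict["getting_started_installation"]
response = test_agent.query("How do I install LlamaIndex?")
print(str(response))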

Build Retriever-Enabled OpenAI Agent#

We build a top-level agent that can orchestrate across the different document agents to answer any user query.

This FnRetrieverOpenAIAgent performs tool retrieval before tool use (unlike a default agent, which tries to fit all tools into the prompt).

Improvements from V0: we make the following changes compared to the “base” version.

  • Adding in reranking: we use the Cohere reranker to better filter the candidate set of documents.

  • Adding in a query planning tool: we add an explicit query planning tool that’s dynamically created based on the set of retrieved tools.

# define tool for each document agent
all_tools = []
for file_base, agent in agents_dict.items():
    summary = extra_info_dict[file_base]["summary"]
    doc_tool = QueryEngineTool(
        query_engine=agent,
        metadata=ToolMetadata(
            name=f"tool_{file_base}",
            description=summary,
        ),
    )
    all_tools.append(doc_tool)
print(all_tools[0].metadata)
ToolMetadata(description='This document provides examples and instructions for searching within the Llama Index documentation.', name='tool_latest_search', fn_schema=<class 'llama_index.tools.types.DefaultToolFnSchema'>)
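
As a quick check, there should be one tool per document agent:

print(len(all_tools))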
# define an "object" index and retriever over these tools
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import (
    ObjectIndex,
    SimpleToolNodeMapping,
    ObjectRetriever,
)
from llama_index.core.retrievers import BaseRetriever
from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4-0613")

tool_mapping = SimpleToolNodeMapping.from_objects(all_tools)
obj_index = ObjectIndex.from_objects(
    all_tools,
    tool_mapping,
    VectorStoreIndex,
)
vector_node_retriever = obj_index.as_node_retriever(similarity_top_k=10)


# define a custom retriever with reranking
class CustomRetriever(BaseRetriever):
    def __init__(self, vector_retriever, postprocessor=None):
        self._vector_retriever = vector_retriever
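        # NOTE: CohereRerank needs a Cohere API key (typically set via the
        # COHERE_API_KEY environment variable)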
        self._postprocessor = postprocessor or CohereRerank(top_n=5)
        super().__init__()

    def _retrieve(self, query_bundle):
        retrieved_nodes = self._vector_retriever.retrieve(query_bundle)
        filtered_nodes = self._postprocessor.postprocess_nodes(
            retrieved_nodes, query_bundle=query_bundle
        )

        return filtered_nodes


# define a custom object retriever that adds in a query planning tool
class CustomObjectRetriever(ObjectRetriever):
    def __init__(self, retriever, object_node_mapping, all_tools, llm=None):
        self._retriever = retriever
        self._object_node_mapping = object_node_mapping
        self._llm = llm or OpenAI("gpt-4-0613")

    def retrieve(self, query_bundle):
        nodes = self._retriever.retrieve(query_bundle)
        tools = [self._object_node_mapping.from_node(n.node) for n in nodes]

        sub_question_engine = SubQuestionQueryEngine.from_defaults(
            query_engine_tools=tools, llm=self._llm
        )
        sub_question_description = """\
Useful for any queries that involve comparing multiple documents. ALWAYS use this tool for comparison queries - make sure to call this \
tool with the original query. Do NOT use the other tools for any queries involving multiple documents.
"""
        sub_question_tool = QueryEngineTool(
            query_engine=sub_question_engine,
            metadata=ToolMetadata(
                name="compare_tool", description=sub_question_description
            ),
        )

        return tools + [sub_question_tool]
custom_node_retriever = CustomRetriever(vector_node_retriever)

# wrap it with ObjectRetriever to return objects
custom_obj_retriever = CustomObjectRetriever(
    custom_node_retriever, tool_mapping, all_tools, llm=llm
)
tmps = custom_obj_retriever.retrieve("hello")
print(len(tmps))
6
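
The six results are the five reranked document tools plus the dynamically added compare_tool. A quick, optional way to confirm which tools came back:

for tool in tmps:
    print(tool.metadata.name)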
from llama_index.agent.openai_legacy import FnRetrieverOpenAIAgent
from llama_index.core.agent import ReActAgent

top_agent = FnRetrieverOpenAIAgent.from_retriever(
    custom_obj_retriever,
    system_prompt=""" \
You are an agent designed to answer queries about the documentation.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\

""",
    llm=llm,
    verbose=True,
)

# top_agent = ReActAgent.from_tools(
#     tool_retriever=custom_obj_retriever,
#     system_prompt=""" \
# You are an agent designed to answer queries about the documentation.
# Please always use the tools provided to answer a question. Do not rely on prior knowledge.\

# """,
#     llm=llm,
#     verbose=True,
# )

Define Baseline Vector Store Index#

As a point of comparison, we define a “naive” RAG pipeline which dumps all docs into a single vector index collection.

We set similarity_top_k = 4.

all_nodes = [
    n for extra_info in extra_info_dict.values() for n in extra_info["nodes"]
]
base_index = VectorStoreIndex(all_nodes)
base_query_engine = base_index.as_query_engine(similarity_top_k=4)

Running Example Queries#

Let’s run some example queries, ranging from QA/summarization over a single document to QA/summarization across multiple documents.

response = top_agent.query(
    "Tell me about the different types of evaluation in LlamaIndex"
)
=== Calling Function ===
Calling function: tool_api_reference_evaluation with args: {
  "input": "types of evaluation"
}
=== Calling Function ===
Calling function: vector_tool_api_reference_evaluation with args: {
  "input": "types of evaluation"
}
Got output: The types of evaluation can include correctness evaluation, faithfulness evaluation, guideline evaluation, hit rate evaluation, MRR (Mean Reciprocal Rank) evaluation, pairwise comparison evaluation, relevancy evaluation, and response evaluation.
========================
Got output: The types of evaluation mentioned in the `api_reference_evaluation.html` part of the LlamaIndex docs include:

1. Correctness Evaluation
2. Faithfulness Evaluation
3. Guideline Evaluation
4. Hit Rate Evaluation
5. MRR (Mean Reciprocal Rank) Evaluation
6. Pairwise Comparison Evaluation
7. Relevancy Evaluation
8. Response Evaluation
========================
print(response)
There are several types of evaluation in LlamaIndex:

1. Correctness Evaluation: This type of evaluation measures the accuracy of the retrieval results. It checks if the retrieved documents are correct and relevant to the query.

2. Faithfulness Evaluation: Faithfulness evaluation measures how faithfully the retrieved documents represent the original data. It checks if the retrieved documents accurately reflect the information in the original documents.

3. Guideline Evaluation: Guideline evaluation involves comparing the retrieval results against a set of guidelines or ground truth. It checks if the retrieval results align with the expected or desired outcomes.

4. Hit Rate Evaluation: Hit rate evaluation measures the percentage of queries that return at least one relevant document. It is a binary evaluation metric that indicates the effectiveness of the retrieval system in finding relevant documents.

5. MRR (Mean Reciprocal Rank) Evaluation: MRR evaluation measures the average rank of the first relevant document in the retrieval results. It provides a single value that represents the effectiveness of the retrieval system in ranking relevant documents.

6. Pairwise Comparison Evaluation: Pairwise comparison evaluation involves comparing the retrieval results of different systems or algorithms. It helps determine which system performs better in terms of retrieval accuracy and relevance.

7. Relevancy Evaluation: Relevancy evaluation measures the relevance of the retrieved documents to the query. It can be done using various metrics such as precision, recall, and F1 score.

8. Response Evaluation: Response evaluation measures the quality of the response generated by the retrieval system. It checks if the response is informative, accurate, and helpful to the user.

These evaluation types help assess the performance and effectiveness of the retrieval system in LlamaIndex.
# baseline
response = base_query_engine.query(
    "Tell me about the different types of evaluation in LlamaIndex"
)
print(str(response))
LlamaIndex utilizes various types of evaluation methods to assess its performance and effectiveness. These evaluation methods include RelevancyEvaluator, RetrieverEvaluator, SemanticSimilarityEvaluator, PairwiseComparisonEvaluator, CorrectnessEvaluator, FaithfulnessEvaluator, and GuidelineEvaluator. Each of these evaluators serves a specific purpose in evaluating different aspects of the LlamaIndex system.
response = top_agent.query(
    "Compare the content in the contributions page vs. index page."
)
=== Calling Function ===
Calling function: compare_tool with args: {
  "input": "content in the contributions page vs. index page"
}
Generated 2 sub questions.
[tool_development_contributing] Q: What is the content of the contributions page?
[tool_latest_index] Q: What is the content of the index page?
=== Calling Function ===
Calling function: summary_tool_development_contributing with args: {
  "input": "development_contributing.html"
}
=== Calling Function ===
Calling function: vector_tool_latest_index with args: {
  "input": "content of the index page"
}
Got output: The development_contributing.html file provides information on how to contribute to LlamaIndex. It includes guidelines on what to work on, such as extending core modules, fixing bugs, adding usage examples, adding experimental features, and improving code quality and documentation. The file also provides details on each module, including data loaders, node parsers, text splitters, document/index/KV stores, managed index, vector stores, retrievers, query engines, query transforms, token usage optimizers, node postprocessors, and output parsers. Additionally, the file includes a development guideline section that covers environment setup, validating changes, formatting/linting, testing, creating example notebooks, and creating a pull request.
========================
Got output: The content of the index page provides information about LlamaIndex, a data framework for LLM applications. It explains why LlamaIndex is useful for augmenting LLM models with private or domain-specific data that may be distributed across different applications and data stores. LlamaIndex offers tools such as data connectors, data indexes, engines, and data agents to ingest, structure, and access data. It is designed for beginners as well as advanced users who can customize and extend its modules. The page also provides installation instructions, tutorials, and links to the LlamaIndex ecosystem and associated projects.
========================
[tool_latest_index] A: The content of the `latest_index.html` page provides comprehensive information about LlamaIndex, a data framework for LLM applications. It explains the utility of LlamaIndex in augmenting LLM models with private or domain-specific data that may be distributed across different applications and data stores. 

The page details the tools offered by LlamaIndex, such as data connectors, data indexes, engines, and data agents, which are used to ingest, structure, and access data. It is designed to cater to both beginners and advanced users, with the flexibility to customize and extend its modules.

Additionally, the page provides installation instructions and tutorials for users. It also includes links to the LlamaIndex ecosystem and associated projects for further exploration and understanding.
[tool_development_contributing] A: The `development_contributing.html` page of the LlamaIndex docs provides comprehensive information on how to contribute to the project. It includes guidelines on the areas to focus on, such as extending core modules, fixing bugs, adding usage examples, adding experimental features, and improving code quality and documentation.

The page also provides detailed information on each module, including data loaders, node parsers, text splitters, document/index/KV stores, managed index, vector stores, retrievers, query engines, query transforms, token usage optimizers, node postprocessors, and output parsers.

In addition, there is a development guideline section that covers various aspects of the development process, including environment setup, validating changes, formatting/linting, testing, creating example notebooks, and creating a pull request.
Got output: The content in the contributions page of the LlamaIndex documentation provides comprehensive information on how to contribute to the project, including guidelines on areas to focus on and detailed information on each module. It also covers various aspects of the development process. 

On the other hand, the content in the index page of the LlamaIndex documentation provides comprehensive information about LlamaIndex itself, explaining its utility in augmenting LLM models with private or domain-specific data. It details the tools offered by LlamaIndex and provides installation instructions, tutorials, and links to the LlamaIndex ecosystem and associated projects.
========================
print(response)
The contributions page of the LlamaIndex documentation provides guidelines for contributing to LlamaIndex, including extending core modules, fixing bugs, adding usage examples, adding experimental features, and improving code quality and documentation. It also includes information on the environment setup, validating changes, formatting and linting, testing, creating example notebooks, and creating a pull request.

On the other hand, the index page of the LlamaIndex documentation provides information about LlamaIndex itself. It explains that LlamaIndex is a data framework that allows LLM applications to ingest, structure, and access private or domain-specific data. It provides tools such as data connectors, data indexes, engines, data agents, and application integrations. The index page also mentions that LlamaIndex is designed for beginners, advanced users, and everyone in between, and offers both high-level and lower-level APIs for customization. It provides installation instructions, links to the GitHub and PyPi repositories, and information about the LlamaIndex community on Twitter and Discord.

In summary, the contributions page focuses on contributing to LlamaIndex, while the index page provides an overview of LlamaIndex and its features.
response = top_agent.query(
    "Can you compare the tree index and list index at a very high-level?"
)
print(str(response))
At a high level, the Tree Index and List Index are two different types of indexes used in the system. 

The Tree Index is a tree-structured index that is built specifically for each query. It allows for the construction of a query-specific tree from leaf nodes to return a response. The Tree Index is designed to provide a more optimized and efficient way of retrieving nodes based on a query.

On the other hand, the List Index is a keyword table index that supports operations such as inserting and deleting documents, retrieving nodes based on a query, and refreshing the index with updated documents. The List Index is a simpler index that uses a keyword table approach for retrieval.

Both indexes have their own advantages and use cases. The choice between them depends on the specific requirements and constraints of the system.