Defining a Unified Query Interface over your Data

This notebook shows how to build a unified query interface that can handle:

  1. heterogeneous data sources (e.g. data about multiple cities) and

  2. complex queries (e.g. compare and contrast).

If you’re opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.

!pip install llama-index
import logging
import sys

# logging.basicConfig(stream=sys.stdout, level=logging.INFO)
# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

# Temporarily disable the root logger; comment out the next two lines to keep it enabled
logger = logging.getLogger()
logger.disabled = True
from llama_index import (
    VectorStoreIndex,
    SimpleKeywordTableIndex,
    SimpleDirectoryReader,
    ServiceContext,
)

Load Datasets

Load Wikipedia pages about different cities.

wiki_titles = ["Toronto", "Seattle", "Chicago", "Boston", "Houston"]
from pathlib import Path

import requests

for title in wiki_titles:
    response = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "format": "json",
            "titles": title,
            "prop": "extracts",
            # 'exintro': True,
            "explaintext": True,
        },
    ).json()
    page = next(iter(response["query"]["pages"].values()))
    wiki_text = page["extract"]

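    data_path = Path("data")
    # create the output directory if it does not exist yet
    data_path.mkdir(exist_ok=True)

    # write the article text as UTF-8 so non-ASCII characters are preserved
    with open(data_path / f"{title}.txt", "w", encoding="utf-8") as fp:
        fp.write(wiki_text)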
# Load all wiki documents
city_docs = {}
for wiki_title in wiki_titles:
    city_docs[wiki_title] = SimpleDirectoryReader(
        input_files=[f"data/{wiki_title}.txt"]
    ).load_data()
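
The download loop above assumes that every title resolves to a page with an `extract` field. As a minimal sketch of a more defensive variant, the helper below (the name `extract_page_text` is hypothetical and not part of the original notebook) pulls the text out of a response payload and raises a clear error when the page is missing. The two sample payloads are hand-written to mimic the response shape used in the loop; they are not real API output.

```python
from typing import Any, Dict


def extract_page_text(payload: Dict[str, Any]) -> str:
    """Return the plain-text extract from an `action=query` response payload.

    Hypothetical helper: raises ValueError when the page is missing or
    carries no extract, instead of failing later with a KeyError.
    """
    pages = payload.get("query", {}).get("pages", {})
    if not pages:
        raise ValueError("response contains no pages")
    # The notebook requests one title at a time, so take the first page.
    page = next(iter(pages.values()))
    if "missing" in page or not page.get("extract"):
        title = page.get("title", "<unknown>")
        raise ValueError(f"no extract for page {title!r}")
    return page["extract"]


# Hand-written payloads that mimic the response shape used in the loop above.
ok_payload = {
    "query": {"pages": {"64646": {"title": "Toronto", "extract": "Toronto is a city."}}}
}
missing_payload = {"query": {"pages": {"-1": {"title": "Torontoo", "missing": ""}}}}

print(extract_page_text(ok_payload))
try:
    extract_page_text(missing_payload)
except ValueError as err:
    print(f"skipped: {err}")
```

In the loop, `wiki_text = extract_page_text(response)` could stand in for the two lines that index into `response["query"]["pages"]`.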

Building Vector Indices

Build a vector index for the wiki pages about cities.

from llama_index.llms import OpenAI


# gpt-3.5-turbo is used later for query decomposition
chatgpt = OpenAI(temperature=0, model="gpt-3.5-turbo")

# gpt-4 backs the service context passed to the indices, query engines, and selector
gpt4 = OpenAI(temperature=0, model="gpt-4")
service_context = ServiceContext.from_defaults(llm=gpt4, chunk_size=1024)
# Build city document index
vector_indices = {}
for wiki_title in wiki_titles:
    # build vector index
    vector_indices[wiki_title] = VectorStoreIndex.from_documents(
        city_docs[wiki_title], service_context=service_context
    )

    # set id for vector index
    vector_indices[wiki_title].set_index_id(wiki_title)
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 20744 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 16942 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 26082 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 18648 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 21844 tokens
index_summaries = {
    wiki_title: (
        f"This content contains Wikipedia articles about {wiki_title}. Use"
        " this index if you need to lookup specific facts about"
        f" {wiki_title}.\nDo not use this index if you want to analyze"
        " multiple cities."
    )
    for wiki_title in wiki_titles
}

Test Querying the Vector Index

query_engine = vector_indices["Toronto"].as_query_engine()
response = query_engine.query("What are the sports teams in Toronto?")
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 8 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 1904 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
print(str(response))
The sports teams in Toronto include:

1. Toronto Maple Leafs (NHL - ice hockey)
2. Toronto Blue Jays (MLB - baseball)
3. Toronto Raptors (NBA - basketball)
4. Toronto Argonauts (CFL - Canadian football)
5. Toronto FC (MLS - soccer)
6. Toronto Marlies (AHL - ice hockey)
7. Toronto Six (NWHL - women's ice hockey)
8. Toronto Rock (NLL - lacrosse)
9. Toronto Rush (AUDL - ultimate frisbee)
10. Toronto Wolfpack (Rugby league, playing in the North American Rugby League tournament)

Build a Graph for Compare/Contrast Queries

We build a graph by composing a keyword table index on top of all the vector indices. We use this graph for compare/contrast queries.

from llama_index.indices.composability import ComposableGraph

graph = ComposableGraph.from_indices(
    SimpleKeywordTableIndex,
    [index for _, index in vector_indices.items()],
    [summary for _, summary in index_summaries.items()],
    max_keywords_per_chunk=50,
)
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [build_index_from_nodes] Total embedding token usage: 0 tokens
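
To give a rough intuition for how a keyword table over the index summaries can pick the relevant child indices, here is a simplified, dependency-free sketch. It is not LlamaIndex's implementation: `STOPWORDS`, `simple_keywords`, and `route_by_keywords` are hypothetical names, the stopword list is deliberately tiny, and the summaries are shortened versions of the ones defined earlier.

```python
import re
from typing import Dict, List, Set

# Tiny illustrative stopword list; a real keyword extractor uses a fuller one.
STOPWORDS: Set[str] = {
    "a", "an", "and", "the", "of", "to", "in", "this", "if", "you", "do", "not", "about",
}


def simple_keywords(text: str) -> Set[str]:
    """Lowercase the text, split it into alphabetic tokens, and drop stopwords."""
    return {tok for tok in re.findall(r"[a-z]+", text.lower()) if tok not in STOPWORDS}


def route_by_keywords(query: str, summaries: Dict[str, str]) -> List[str]:
    """Return the names whose summary shares at least one keyword with the query."""
    query_kw = simple_keywords(query)
    return [
        name
        for name, summary in summaries.items()
        if query_kw & simple_keywords(summary)
    ]


titles = ["Toronto", "Seattle", "Chicago", "Boston", "Houston"]
summaries = {
    title: (
        f"This content contains Wikipedia articles about {title}. "
        f"Use this index if you need to lookup specific facts about {title}."
    )
    for title in titles
}

matched = route_by_keywords(
    "Compare and contrast the arts and culture of Houston and Boston.", summaries
)
print(matched)
```

Only the city names overlap between the query and the summaries, so the sketch selects the Boston and Houston entries and skips the other three.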
# get root index
root_index = graph.get_index(graph.root_id)

# set id of root index
root_index.set_index_id("compare_contrast")
# define decompose_transform
from llama_index.indices.query.query_transform.base import (
    DecomposeQueryTransform,
)

decompose_transform = DecomposeQueryTransform(llm=chatgpt, verbose=True)
# define custom query engines
from llama_index.query_engine.transform_query_engine import (
    TransformQueryEngine,
)


custom_query_engines = {}
for index in vector_indices.values():
    query_engine = index.as_query_engine(service_context=service_context)
    query_engine = TransformQueryEngine(
        query_engine,
        query_transform=decompose_transform,
        transform_metadata={"index_summary": index.index_struct.summary},
    )
    custom_query_engines[index.index_id] = query_engine

custom_query_engines[graph.root_id] = graph.root_index.as_query_engine(
    retriever_mode="simple",
    response_mode="tree_summarize",
    service_context=service_context,
    verbose=True,
)
# define graph query engine
graph_query_engine = graph.as_query_engine(
    custom_query_engines=custom_query_engines
)
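
The wiring above follows a wrap-then-compose pattern: each per-city engine is wrapped so that the incoming query is rewritten first, and the root combines the partial answers. The sketch below illustrates only that pattern with plain callables. Every name in it (`make_stub_engine`, `make_stub_transform`, `with_transform`, `query_graph`) is hypothetical; the stub transform returns a fixed sub-question rather than calling an LLM, and joining the answers with newlines stands in for the `tree_summarize` step.

```python
from typing import Callable, Dict, List

Engine = Callable[[str], str]


def make_stub_engine(city: str) -> Engine:
    """Stand-in for a per-city vector index engine; it only echoes the question."""
    def engine(query: str) -> str:
        return f"[{city}] {query}"
    return engine


def make_stub_transform(city: str) -> Callable[[str], str]:
    """Stand-in for LLM-based decomposition: ignores the wording of the
    incoming query and returns a fixed sub-question about one city."""
    def transform(query: str) -> str:
        return f"What are some notable cultural institutions or events in {city}?"
    return transform


def with_transform(engine: Engine, transform: Callable[[str], str]) -> Engine:
    """Wrap an engine so that each query is rewritten before it is answered."""
    def wrapped(query: str) -> str:
        return engine(transform(query))
    return wrapped


def query_graph(query: str, engines: Dict[str, Engine], selected: List[str]) -> str:
    """Ask each selected child engine, then merge the partial answers."""
    partial_answers = [engines[name](query) for name in selected]
    return "\n".join(partial_answers)


cities = ["Houston", "Boston"]
engines = {
    city: with_transform(make_stub_engine(city), make_stub_transform(city))
    for city in cities
}
answer = query_graph(
    "Compare and contrast the arts and culture of Houston and Boston.", engines, cities
)
print(answer)
```

Each child sees only its own narrowed sub-question, which is why a broad compare/contrast query can still be answered from single-city indices.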

Test Querying the Graph

query_str = "Compare and contrast the arts and culture of Houston and Boston. "
response = graph_query_engine.query(query_str)
INFO:llama_index.indices.keyword_table.retrievers:> Starting query: Compare and contrast the arts and culture of Houston and Boston. 
INFO:llama_index.indices.keyword_table.retrievers:query keywords: ['contrast', 'houston', 'arts', 'boston', 'culture', 'compare']
INFO:llama_index.indices.keyword_table.retrievers:> Extracted keywords: ['houston', 'boston']
> Current query: Compare and contrast the arts and culture of Houston and Boston. 
> New query: What are some notable cultural institutions or events in Houston?

INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 11 tokens
> Current query: Compare and contrast the arts and culture of Houston and Boston. 
> New query: What are some notable cultural institutions or events in Houston?

INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 1877 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
> Current query: Compare and contrast the arts and culture of Houston and Boston. 
> New query: What are some notable cultural institutions or events in Boston?

INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 11 tokens
> Current query: Compare and contrast the arts and culture of Houston and Boston. 
> New query: What are some notable cultural institutions or events in Boston?

INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 2130 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 885 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 885 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
print(response)
Houston and Boston both have rich arts and culture scenes, with a variety of cultural institutions and events that cater to diverse interests. Both cities have a strong presence of performing arts organizations, such as the Houston Grand Opera and Houston Ballet in Houston, and the Boston Ballet and Boston Lyric Opera Company in Boston. They also have renowned symphony orchestras, with the Houston Symphony Orchestra and the Boston Symphony Orchestra.

Both cities host annual events that celebrate their unique cultural identities, such as the Houston Livestock Show and Rodeo, Houston Gay Pride Parade, and Houston Greek Festival in Houston, and the Boston Gay Pride Parade and Festival, Italian Summer Feasts, and Fourth of July events in Boston. Additionally, both cities have thriving theater districts, with Houston's Theater District and Boston's Theater District housing several historic and modern theaters.

In terms of visual arts, both Houston and Boston have notable art museums, such as the Museum of Fine Arts in both cities, as well as the Houston Museum of Natural Science and the Contemporary Arts Museum Houston in Houston, and the Isabella Stewart Gardner Museum and the Institute of Contemporary Art in Boston. Houston also has unique institutions like the Menil Collection, Rothko Chapel, and the Byzantine Fresco Chapel Museum, while Boston has historic sites related to the American Revolution preserved in the Boston National Historical Park and along the Freedom Trail.

While both cities have a strong focus on arts and culture, Houston's cultural scene tends to be more diverse, with events like the Art Car Parade, Houston International Festival, and Bayou City Art Festival showcasing the city's eclectic mix of cultures. On the other hand, Boston's cultural scene is deeply rooted in its history and traditions, with events like the Boston Early Music Festival and historic sites along the Freedom Trail reflecting the city's rich past.

Build a router to automatically choose between indices and graph

We can use a RouterQueryEngine to automatically route to the vector indices and the graph.

To do this, first build the query engines, and give each a description to obtain a QueryEngineTool.

from llama_index.tools.query_engine import QueryEngineTool

query_engine_tools = []

# add vector index tools
for wiki_title in wiki_titles:
    index = vector_indices[wiki_title]
    summary = index_summaries[wiki_title]

    query_engine = index.as_query_engine(service_context=service_context)
    vector_tool = QueryEngineTool.from_defaults(
        query_engine, description=summary
    )
    query_engine_tools.append(vector_tool)


# add graph tool
graph_description = (
    "This tool contains Wikipedia articles about multiple cities. "
    "Use this tool if you want to compare multiple cities. "
)
graph_tool = QueryEngineTool.from_defaults(
    graph_query_engine, description=graph_description
)
query_engine_tools.append(graph_tool)

Then, define the RouterQueryEngine with a desired selector module. Here, we use the LLMSingleSelector, which uses an LLM to choose an underlying query engine to route the query to.

from llama_index.query_engine.router_query_engine import RouterQueryEngine
from llama_index.selectors.llm_selectors import LLMSingleSelector


router_query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(service_context=service_context),
    query_engine_tools=query_engine_tools,
)
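
Conceptually, a single-selector router asks the selector for one tool index and forwards the query to that tool. The sketch below shows that control flow without any LLM or LlamaIndex dependency. `Tool`, `rule_based_select`, and `route` are hypothetical names, and the selection rule (one city mentioned means that city's tool, otherwise the last tool) is a hand-written stand-in for the choice the LLM makes from the tool descriptions.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]


def make_runner(name: str) -> Callable[[str], str]:
    """Build a stub query engine that reports which tool handled the query."""
    def run(query: str) -> str:
        return f"{name} engine handled: {query}"
    return run


def rule_based_select(query: str, tools: Sequence[Tool]) -> int:
    """Pick a tool index (rule-based stand-in for an LLM selector).

    If exactly one city tool is mentioned in the query, choose it;
    otherwise fall back to the last tool, assumed to be the graph tool.
    """
    lowered = query.lower()
    mentioned = [i for i, tool in enumerate(tools[:-1]) if tool.name.lower() in lowered]
    return mentioned[0] if len(mentioned) == 1 else len(tools) - 1


def route(query: str, tools: Sequence[Tool]) -> str:
    """Forward the query to the single tool chosen by the selector."""
    return tools[rule_based_select(query, tools)].run(query)


cities = ["Toronto", "Seattle", "Chicago", "Boston", "Houston"]
tools: List[Tool] = [
    Tool(city, f"Lookup specific facts about {city}.", make_runner(city)) for city in cities
]
tools.append(Tool("graph", "Compare multiple cities.", make_runner("graph")))

print(route("What are the sports teams in Toronto?", tools))
print(route("Compare and contrast the arts and culture of Houston and Boston.", tools))
```

With the tools in this order, the single-city question goes to tool 0 and the comparison goes to tool 5, the graph tool.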

Asking a compare and contrast question should route the query to the graph.

# ask a compare/contrast question
response = router_query_engine.query(
    "Compare and contrast the arts and culture of Houston and Boston.",
)
INFO:llama_index.query_engine.router_query_engine:Selecting query engine 5: This tool contains Wikipedia articles about multiple cities, which allows for comparison and analysis of different cities, such as Houston and Boston..
INFO:llama_index.indices.keyword_table.retrievers:> Starting query: Compare and contrast the arts and culture of Houston and Boston.
INFO:llama_index.indices.keyword_table.retrievers:query keywords: ['contrast', 'houston', 'arts', 'boston', 'culture', 'compare']
INFO:llama_index.indices.keyword_table.retrievers:> Extracted keywords: ['houston', 'boston']
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 11 tokens
> Current query: Compare and contrast the arts and culture of Houston and Boston.
> New query: What are some notable cultural institutions or events in Houston?
> Current query: Compare and contrast the arts and culture of Houston and Boston.
> New query: What are some notable cultural institutions or events in Houston?

INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 1835 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
> Current query: Compare and contrast the arts and culture of Houston and Boston.
> New query: What are some notable cultural institutions or events in Boston?

INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 11 tokens
> Current query: Compare and contrast the arts and culture of Houston and Boston.
> New query: What are some notable cultural institutions or events in Boston?

INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 2134 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 772 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 772 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
print(response)
Based on the context information provided, both Houston and Boston have rich arts and cultural scenes, with a variety of institutions and events catering to diverse interests.

Houston's cultural institutions and events include the Houston Theater District, the Museum District, the Houston Livestock Show and Rodeo, the Houston Gay Pride Parade, the Houston Greek Festival, the Art Car Parade, the Houston Auto Show, the Houston International Festival, and the Bayou City Art Festival.

In contrast, Boston's cultural institutions and events include the Boston Symphony Hall, New England Conservatory's Jordan Hall, Boston Ballet, various performing-arts organizations, contemporary classical music groups, the Theater District, First Night, Boston Early Music Festival, Boston Arts Festival, Boston Gay Pride Parade and Festival, Italian Summer Feasts, Fourth of July events, art museums such as the Museum of Fine Arts and Isabella Stewart Gardner Museum, the Institute of Contemporary Art, art gallery destinations like the South End Art and Design District (SoWa) and Newbury St, and the Boston National Historical Park.

Both cities have theater districts, gay pride parades, and arts festivals. However, Houston has unique events such as the Livestock Show and Rodeo, the Greek Festival, the Art Car Parade, and the Houston Auto Show. On the other hand, Boston has a strong focus on classical music with venues like the Symphony Hall and Jordan Hall, as well as historical sites related to the American Revolution.

Asking a question about a specific city should route the query to the specific vector index query engine.

response = router_query_engine.query("What are the sports teams in Toronto?")
INFO:llama_index.query_engine.router_query_engine:Selecting query engine 0: This content contains Wikipedia articles about Toronto, which can provide information about the sports teams in the city..
INFO:llama_index.token_counter.token_counter:> [retrieve] Total LLM token usage: 0 tokens
INFO:llama_index.token_counter.token_counter:> [retrieve] Total embedding token usage: 8 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total LLM token usage: 1905 tokens
INFO:llama_index.token_counter.token_counter:> [get_response] Total embedding token usage: 0 tokens
print(response)
The sports teams in Toronto include:

1. Toronto Maple Leafs (NHL - ice hockey)
2. Toronto Blue Jays (MLB - baseball)
3. Toronto Raptors (NBA - basketball)
4. Toronto Argonauts (CFL - Canadian football)
5. Toronto FC (MLS - soccer)
6. Toronto Marlies (AHL - ice hockey)
7. Toronto Six (NWHL - women's ice hockey)
8. Toronto Rock (NLL - lacrosse)
9. Toronto Rush (AUDL - ultimate frisbee)
10. Toronto Wolfpack (Rugby league, currently playing in the North American Rugby League tournament)