Usage Pattern#

Get Started#

Build a query engine from an index:

query_engine = index.as_query_engine()


To learn how to build an index, see Indexing.

Ask a question over your data:

response = query_engine.query("Who is Paul Graham?")

Configuring a Query Engine#

High-Level API#

You can build and configure a query engine directly from an index in one line of code:

query_engine = index.as_query_engine(
    response_mode="tree_summarize",  # illustrative choice of response mode
)

Note: While the high-level API optimizes for ease of use, it does NOT expose the full range of configurability.

See Response Modes for a full list of response modes and what they do.

Low-Level Composition API#

You can use the low-level composition API if you need more granular control. Concretely speaking, you would explicitly construct a QueryEngine object instead of calling index.as_query_engine(...).

Note: You may need to consult the API references or example notebooks for the available options.

from llama_index.core import VectorStoreIndex, get_response_synthesizer
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.query_engine import RetrieverQueryEngine

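# build index (assumes `documents` has already been loaded)
index = VectorStoreIndex.from_documents(documents)

# configure retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=2,  # illustrative value
)

# configure response synthesizer
response_synthesizer = get_response_synthesizer(
    response_mode="tree_summarize",  # illustrative value
)

# assemble query engine
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)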

# query
response = query_engine.query("What did the author do growing up?")
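
To make the division of labor in the composition above concrete, here is a minimal pure-Python sketch of the same retrieve-then-synthesize flow. It does not use LlamaIndex: ToyRetriever, ToySynthesizer, and ToyQueryEngine are hypothetical stand-ins, the word-overlap scoring is an assumption chosen for brevity (a real vector retriever would compare embeddings), and no LLM is involved.

```python
from dataclasses import dataclass


@dataclass
class ToyRetriever:
    """Hypothetical stand-in for a retriever: ranks chunks by word overlap."""

    chunks: list[str]
    top_k: int = 2

    def retrieve(self, query: str) -> list[str]:
        query_words = set(query.lower().split())
        # Score each chunk by how many query words it shares.
        scored = [
            (len(query_words & set(chunk.lower().split())), chunk)
            for chunk in self.chunks
        ]
        # sorted() is stable, so equal scores keep their original order.
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [chunk for score, chunk in ranked[: self.top_k] if score > 0]


@dataclass
class ToySynthesizer:
    """Hypothetical stand-in for a response synthesizer (no LLM involved)."""

    def synthesize(self, query: str, chunks: list[str]) -> str:
        if not chunks:
            return "No relevant context found."
        return " ".join(chunks)


@dataclass
class ToyQueryEngine:
    """Wires a retriever and a synthesizer together."""

    retriever: ToyRetriever
    response_synthesizer: ToySynthesizer

    def query(self, query: str) -> str:
        chunks = self.retriever.retrieve(query)
        return self.response_synthesizer.synthesize(query, chunks)


chunks = [
    "the author wrote short stories growing up",
    "the author later studied painting",
    "lisp is a programming language",
]
engine = ToyQueryEngine(
    retriever=ToyRetriever(chunks=chunks, top_k=1),
    response_synthesizer=ToySynthesizer(),
)
answer = engine.query("what did the author do growing up")
print(answer)
```

Because the engine only depends on the retrieve and synthesize methods, either component can be swapped out independently, which is the main reason to reach for the low-level API.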


Streaming#

To enable streaming, pass in the streaming=True flag:

query_engine = index.as_query_engine(
    streaming=True,
)
streaming_response = query_engine.query(
    "What did the author do growing up?",
)

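A streaming response is consumed incrementally rather than read as a finished string. In LlamaIndex the streaming response object exposes a token generator (response_gen) and a print_response_stream() helper. The sketch below imitates that consumption loop in plain Python so it runs without an index or an LLM; fake_token_stream and consume_stream are hypothetical names introduced only for illustration.

```python
from typing import Iterator


def fake_token_stream(text: str) -> Iterator[str]:
    """Hypothetical stand-in for a streaming response's token generator."""
    for word in text.split():
        yield word + " "


def consume_stream(token_gen: Iterator[str]) -> str:
    """Print tokens as they arrive and return the assembled text."""
    pieces = []
    for token in token_gen:
        # Flush so each token appears immediately instead of being buffered.
        print(token, end="", flush=True)
        pieces.append(token)
    print()
    return "".join(pieces).strip()


full_text = consume_stream(fake_token_stream("The author wrote short stories."))
```

With a real streaming response, the same loop would iterate over the response's token generator instead of fake_token_stream.
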
Defining a Custom Query Engine#

You can also define a custom query engine. Simply subclass the CustomQueryEngine class, define any attributes you need (similar to defining a Pydantic class), and implement a custom_query method that returns either a Response object or a string.

from llama_index.core.query_engine import CustomQueryEngine
from llama_index.core.retrievers import BaseRetriever
from llama_index.core import get_response_synthesizer
from llama_index.core.response_synthesizers import BaseSynthesizer

class RAGQueryEngine(CustomQueryEngine):
    """RAG Query Engine."""

    retriever: BaseRetriever
    response_synthesizer: BaseSynthesizer

    def custom_query(self, query_str: str):
        nodes = self.retriever.retrieve(query_str)
        response_obj = self.response_synthesizer.synthesize(query_str, nodes)
        return response_obj
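
The contract described above (custom_query may return either a response object or a plain string) can be illustrated with a small pure-Python sketch. This is an assumption-laden toy, not the library's implementation: ToyResponse and ToyCustomQueryEngine are hypothetical stand-ins for Response and CustomQueryEngine, and the wrapping of strings in query() is only a plausible way to honor that contract.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class ToyResponse:
    """Hypothetical stand-in for a response object."""

    response: str


class ToyCustomQueryEngine:
    """Hypothetical base class: query() delegates to custom_query()."""

    def query(self, query_str: str) -> ToyResponse:
        raw = self.custom_query(query_str)
        # Normalize: wrap plain strings so callers always get one type.
        return ToyResponse(raw) if isinstance(raw, str) else raw

    def custom_query(self, query_str: str) -> Union[str, ToyResponse]:
        raise NotImplementedError


class EchoQueryEngine(ToyCustomQueryEngine):
    """Returns a plain string from custom_query."""

    def custom_query(self, query_str: str) -> str:
        return f"You asked: {query_str}"


class WrappedQueryEngine(ToyCustomQueryEngine):
    """Returns a response object from custom_query."""

    def custom_query(self, query_str: str) -> ToyResponse:
        return ToyResponse(f"Answer to: {query_str}")


echo_result = EchoQueryEngine().query("hello")
wrapped_result = WrappedQueryEngine().query("hello")
print(echo_result.response)
print(wrapped_result.response)
```

Either return style gives the caller the same interface, so a subclass can start with a plain string and move to a richer response object later without changing calling code.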

See the Custom Query Engine guide for more details.