Usage Pattern#
Get Started#
Build a chat engine from index:
chat_engine = index.as_chat_engine()
Tip
To learn how to build an index, see Indexing
Have a conversation with your data:
response = chat_engine.chat("Tell me a joke.")
Reset chat history to start a new conversation:
chat_engine.reset()
Enter an interactive chat REPL:
chat_engine.chat_repl()
Configuring a Chat Engine#
Configuring a chat engine is very similar to configuring a query engine.
High-Level API#
You can directly build and configure a chat engine from an index in 1 line of code:
chat_engine = index.as_chat_engine(chat_mode="condense_question", verbose=True)
Note: you can access different chat engines by specifying the
chat_mode
as a kwarg.condense_question
corresponds toCondenseQuestionChatEngine
,react
corresponds toReActChatEngine
,context
corresponds to aContextChatEngine
.Note: While the high-level API optimizes for ease-of-use, it does NOT expose full range of configurability.
Available Chat Modes#
best
- Turn the query engine into a tool, for use with aReAct
data agent or anOpenAI
data agent, depending on what your LLM supports.OpenAI
data agents requiregpt-3.5-turbo
orgpt-4
as they use the function calling API from OpenAI.condense_question
- Look at the chat history and re-write the user message to be a query for the index. Return the response after reading the response from the query engine.context
- Retrieve nodes from the index using every user message. The retrieved text is inserted into the system prompt, so that the chat engine can either respond naturally or use the context from the query engine.condense_plus_context
- A combination ofcondense_question
andcontext
. Look at the chat history and re-write the user message to be a retrieval query for the index. The retrieved text is inserted into the system prompt, so that the chat engine can either respond naturally or use the context from the query engine.simple
- A simple chat with the LLM directly, no query engine involved.react
- Same asbest
, but forces aReAct
data agent.openai
- Same asbest
, but forces anOpenAI
data agent.
Low-Level Composition API#
You can use the low-level composition API if you need more granular control.
Concretely speaking, you would explicitly construct ChatEngine
object instead of calling index.as_chat_engine(...)
.
Note: You may need to look at API references or example notebooks.
Here's an example where we configure the following:
- configure the condense question prompt,
- initialize the conversation with some existing history,
- print verbose debug message.
from llama_index.core import PromptTemplate
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.chat_engine import CondenseQuestionChatEngine
custom_prompt = PromptTemplate(
"""\
Given a conversation (between Human and Assistant) and a follow up message from Human, \
rewrite the message to be a standalone question that captures all relevant context \
from the conversation.
<Chat History>
{chat_history}
<Follow Up Message>
{question}
<Standalone question>
"""
)
# list of `ChatMessage` objects
custom_chat_history = [
ChatMessage(
role=MessageRole.USER,
content="Hello assistant, we are having a insightful discussion about Paul Graham today.",
),
ChatMessage(role=MessageRole.ASSISTANT, content="Okay, sounds good."),
]
query_engine = index.as_query_engine()
chat_engine = CondenseQuestionChatEngine.from_defaults(
query_engine=query_engine,
condense_question_prompt=custom_prompt,
chat_history=custom_chat_history,
verbose=True,
)
Streaming#
To enable streaming, you simply need to call the stream_chat
endpoint instead of the chat
endpoint.
Warning
This somewhat inconsistent with query engine (where you pass in a streaming=True
flag). We are working on making the behavior more consistent!
chat_engine = index.as_chat_engine()
streaming_response = chat_engine.stream_chat("Tell me a joke.")
for token in streaming_response.response_gen:
print(token, end="")
See an end-to-end tutorial