Structured Input#
The other side of structured data, beyond the output, is the input: many prompting guides and best practices, indeed, include some techniques such as XML tagging of the input prompt to boost the LLM's understanding of the input.
LlamaIndex offers you the possibility of natively formatting your inputs as XML snippets, leveraging banks and Jinja (make sure to have llama-index>=0.12.34
installed).
Using Structured Input Alone#
Here is a simple example of how to use structured inputs with Pydantic models:
from pydantic import BaseModel
from llama_index.core.prompts import RichPromptTemplate
from llama_index.llms.openai import OpenAI
template_str = "Please extract from the following XML code the contact details of the user:\n\n```xml\n{{ user | to_xml }}\n```\n\n"
prompt = RichPromptTemplate(template_str)
class User(BaseModel):
name: str
surname: str
age: int
email: str
phone: str
social_accounts: Dict[str, str]
user = User(
name="John",
surname="Doe",
age=30,
email="[email protected]",
phone="123-456-7890",
social_accounts={"bluesky": "john.doe", "instagram": "johndoe1234"},
)
## check how the prompt would look like
prompt.format(user=user)
llm = OpenAI()
response = llm.chat(prompt.format_messages(user=user))
print(response.message.content)
As you can see, in order to employ the structured output, we need to use a Jinja expression (delimited by {{}}
) with the to_xml
filter (the filtering operator is |
).
Combining Structured Input with Structured Output#
The combination of structured input and structured output can really boost the consistency (and thus reliability) of your LLM's output.
With this code snippet below, you can see how you can chain these two step of data structuring.
from pydantic import Field
from typing import Optional
class SocialAccounts(BaseModel):
instagram: Optional[str] = Field(default=None)
bluesky: Optional[str] = Field(default=None)
x: Optional[str] = Field(default=None)
mastodon: Optional[str] = Field(default=None)
class ContactDetails(BaseModel):
email: str
phone: str
social_accounts: SocialAccounts
sllm = llm.as_structured_llm(ContactDetails)
structured_response = await sllm.achat(prompt.format_messages(user=user))
print(structured_response.raw.email)
print(structured_response.raw.phone)
print(structured_response.raw.social_accounts.instagram)
print(structured_response.raw.social_accounts.bluesky)
If you want a more in-depth guide to structured input, check out this example notebook.