Structured Prediction#
Structured Prediction gives you more granular control over how your application calls the LLM and uses Pydantic. We will use the same Invoice
class, load the PDF as we did in the previous example, and use OpenAI as before. Instead of creating a structured LLM, we will call structured_predict
on the LLM itself; this is a method of every LLM class.
structured_predict takes a Pydantic class and a PromptTemplate as arguments, along with keyword arguments for any variables in the prompt template.
from llama_index.core.prompts import PromptTemplate
prompt = PromptTemplate(
    "Extract an invoice from the following text. If you cannot find an invoice ID, use the company name '{company_name}' and the date as the invoice ID: {text}"
)
response = llm.structured_predict(
    Invoice, prompt, text=text, company_name="Uber"
)
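To see what the LLM actually receives, it can help to render the template by hand. The sketch below uses plain Python str.format as a stand-in for the template rendering, on the assumption that the {company_name} and {text} placeholders are filled in format-string style. The sample text is made up for illustration; in this tutorial the real value comes from the loaded PDF.

```python
# Stand-in for prompt template rendering, using only the standard library.
template = (
    "Extract an invoice from the following text. If you cannot find an "
    "invoice ID, use the company name '{company_name}' and the date as "
    "the invoice ID: {text}"
)

# Hypothetical sample input, not taken from the tutorial's PDF.
sample_text = "Trip fare $12.18, Access for All Fee $0.10, Oct 10 2024"

# Each keyword argument fills the placeholder of the same name.
rendered = template.format(company_name="Uber", text=sample_text)
print(rendered)
```

Every placeholder in the template needs a matching keyword argument, which is why the call above passes both text and company_name.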
As you can see, this allows us to include additional prompt direction for what the LLM should do if Pydantic isn’t quite enough to parse the data correctly. The response object in this case is the Pydantic object itself. We can get the output as JSON if we want:
import json

json_output = response.model_dump_json()
print(json.dumps(json.loads(json_output), indent=2))
{
  "invoice_id": "Uber-2024-10-10",
  "date": "2024-10-10T19:49:00",
  "line_items": [
    {"item_name": "Trip fare", "price": 12.18},
    {"item_name": "Access for All Fee", "price": 0.1},
    ...,
  ],
}
structured_predict has several variants for different use cases, including async (astructured_predict) and streaming (stream_structured_predict, astream_structured_predict).
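As a rough illustration of where the async variant helps, the sketch below fans out one astructured_predict call per document and waits for all of them together. The helper name extract_all is hypothetical and not part of LlamaIndex; it only assumes that astructured_predict accepts the same arguments as structured_predict.

```python
import asyncio


async def extract_all(llm, output_cls, prompt, texts, **prompt_args):
    """Run one astructured_predict call per text, concurrently.

    Hypothetical helper for illustration; not part of LlamaIndex.
    """
    tasks = [
        llm.astructured_predict(output_cls, prompt, text=t, **prompt_args)
        for t in texts
    ]
    # gather returns results in the same order as the input tasks
    return await asyncio.gather(*tasks)
```

With the objects from this tutorial, a call might look like asyncio.run(extract_all(llm, Invoice, prompt, [text], company_name="Uber")), which returns a list of Invoice objects.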
Under the hood#
Depending on which LLM you use, structured_predict uses one of two different classes to handle calling the LLM and parsing the output.
FunctionCallingProgram#
If the LLM you are using has a function calling API, FunctionCallingProgram will:
- Convert the Pydantic object into a tool
- Prompt the LLM while forcing it to use this tool
- Return the Pydantic object generated
This is generally a more reliable method and will be used by preference if available. However, some LLMs are text-only and they will use the other method.
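The three steps can be pictured with Pydantic alone. This is a conceptual sketch, not LlamaIndex's internal code: the Invoice model is a cut-down stand-in for the tutorial's class, the tool dictionary follows the common OpenAI-style function shape as an assumption, and the LLM call is replaced by hand-written arguments.

```python
from typing import List

from pydantic import BaseModel


class LineItem(BaseModel):
    item_name: str
    price: float


class Invoice(BaseModel):
    # Cut-down stand-in for the tutorial's Invoice class.
    invoice_id: str
    line_items: List[LineItem]


# Step 1: describe the model as a tool via its JSON schema.
# The dict layout is an assumed OpenAI-style shape, for illustration only.
tool = {
    "type": "function",
    "function": {
        "name": Invoice.__name__,
        "description": "Extract an invoice",
        "parameters": Invoice.model_json_schema(),
    },
}

# Step 2 would send the tool to the LLM and force a call to it.
# Here the returned arguments are faked by hand.
fake_tool_arguments = {
    "invoice_id": "Uber-2024-10-10",
    "line_items": [{"item_name": "Trip fare", "price": 12.18}],
}

# Step 3: validate the arguments into the Pydantic object.
invoice = Invoice.model_validate(fake_tool_arguments)
print(invoice.invoice_id)
```

Because the model is constrained to fill in the tool's parameters, the arguments arrive already structured, which is a large part of why this path tends to be more reliable.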
LLMTextCompletionProgram#
If the LLM is text-only, LLMTextCompletionProgram will:
- Output the Pydantic schema as JSON
- Send the schema and the data to the LLM with prompt instructions to respond in a form that conforms to the schema
- Call model_validate_json() on the Pydantic class, passing in the raw text returned from the LLM
This is notably less reliable, but supported by all text-based LLMs.
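The text-only path can be sketched the same way. Again this is an illustration under stated assumptions rather than the library's implementation: the Invoice model is a simplified stand-in, the prompt wording is invented, and raw_response plays the part of the text an LLM might return.

```python
import json
from typing import List

from pydantic import BaseModel, ValidationError


class LineItem(BaseModel):
    item_name: str
    price: float


class Invoice(BaseModel):
    # Simplified stand-in for the tutorial's Invoice class.
    invoice_id: str
    line_items: List[LineItem]


# Step 1: output the schema as JSON.
schema_json = json.dumps(Invoice.model_json_schema())

# Step 2: combine schema and data into one prompt (wording is illustrative).
full_prompt = (
    "Respond only with JSON that conforms to this schema:\n"
    f"{schema_json}\n\nText: <invoice text goes here>"
)

# Stand-in for the raw text returned by the LLM.
raw_response = (
    '{"invoice_id": "Uber-2024-10-10", '
    '"line_items": [{"item_name": "Trip fare", "price": 12.18}]}'
)

# Step 3: parse and validate the raw text.
invoice = Invoice.model_validate_json(raw_response)

# Output that breaks the schema raises ValidationError; nothing forces
# the model to comply, which is the source of the lower reliability.
try:
    Invoice.model_validate_json('{"invoice_id": "only-an-id"}')
    failed = False
except ValidationError:
    failed = True
print(invoice.invoice_id, failed)
```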
Calling prediction classes directly#
In practice, structured_predict should work well for any LLM, but if you need lower-level control you can call FunctionCallingProgram and LLMTextCompletionProgram directly and further customize what’s happening:
from llama_index.core.program import LLMTextCompletionProgram

textCompletion = LLMTextCompletionProgram.from_defaults(
    output_cls=Invoice,
    llm=llm,
    prompt=PromptTemplate(
        "Extract an invoice from the following text. If you cannot find an invoice ID, use the company name '{company_name}' and the date as the invoice ID: {text}"
    ),
)
# Calling the program fills the template variables and returns an Invoice
output = textCompletion(company_name="Uber", text=text)
The above is identical to calling structured_predict on an LLM without function calling APIs, and returns a Pydantic object just as structured_predict does. However, you can customize how the output is parsed by subclassing PydanticOutputParser:
from llama_index.core.output_parsers import PydanticOutputParser


class MyOutputParser(PydanticOutputParser):
    # parse() receives the raw text returned by the LLM
    def parse(self, text: str):
        # do something more clever than this
        return self.output_cls.model_validate_json(text)
textCompletion = LLMTextCompletionProgram.from_defaults(
    llm=llm,
    prompt=PromptTemplate(
        "Extract an invoice from the following text. If you cannot find an invoice ID, use the company name '{company_name}' and the date as the invoice ID: {text}"
    ),
    output_parser=MyOutputParser(output_cls=Invoice),
)
This is useful if you are using a low-powered LLM that needs help with the parsing.
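One example of the kind of help a weaker model may need: it might wrap its JSON in conversational text. The function below is a hypothetical helper, written with the standard library only, that a custom parser could apply to the raw text before validating it. It is a minimal sketch and does not handle every malformed response.

```python
import json


def extract_json_object(text: str) -> str:
    """Return the outermost {...} span found in text.

    Hypothetical helper: drops chatter that a model may place before or
    after its JSON answer.
    """
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in LLM output")
    candidate = text[start : end + 1]
    # json.loads raises json.JSONDecodeError if the span is still malformed
    json.loads(candidate)
    return candidate


noisy = 'Sure! Here is the invoice: {"invoice_id": "Uber-2024-10-10"} Anything else?'
print(extract_json_object(noisy))
```

The cleaned string can then be passed to model_validate_json on the output class in place of the raw LLM text.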
In the final section we will take a look at even lower-level calls to extract structured data, including extracting multiple structures in the same call.