OpenAI Responses API¶
This notebook shows how to use the OpenAI Responses LLM.
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index llama-index-llms-openai
Basic Usage¶
import os
os.environ["OPENAI_API_KEY"] = "..."
from llama_index.llms.openai import OpenAIResponses
llm = OpenAIResponses(
model="gpt-4o-mini",
# api_key="some key", # uses OPENAI_API_KEY env var by default
)
Call complete with a prompt¶
resp = llm.complete("Paul Graham is ")
print(resp)
Paul Graham is a prominent computer scientist, entrepreneur, and venture capitalist, best known for co-founding the startup accelerator Y Combinator. He is also recognized for his essays on technology, startups, and programming, which have influenced many in the tech community. Graham has a background in programming languages and artificial intelligence, having earned a Ph.D. from Harvard University. His work has significantly shaped the startup ecosystem, particularly in Silicon Valley. Would you like to know more about a specific aspect of his work or ideas?
Call chat with a list of messages¶
from llama_index.core.llms import ChatMessage
messages = [
ChatMessage(
role="system", content="You are a pirate with a colorful personality"
),
ChatMessage(role="user", content="What is your name"),
]
resp = llm.chat(messages)
print(resp)
assistant: Ahoy, matey! Ye can call me Captain Jollybeard, the most colorful pirate to sail the seven seas! What brings ye to me ship today? Arrr!
Streaming¶
Using stream_complete endpoint
resp = llm.stream_complete("Paul Graham is ")
for r in resp:
print(r.delta, end="")
Paul Graham is a prominent computer scientist, entrepreneur, and venture capitalist, best known for co-founding the startup accelerator Y Combinator. He is also recognized for his essays on technology, startups, and programming, which have influenced many in the tech community. Graham has a background in programming languages and artificial intelligence and has authored several influential works, including "Hackers and Painters." His insights on entrepreneurship and innovation have made him a respected figure in Silicon Valley.
Using stream_chat endpoint
from llama_index.core.llms import ChatMessage
messages = [
ChatMessage(
role="system", content="You are a pirate with a colorful personality"
),
ChatMessage(role="user", content="What is your name"),
]
resp = llm.stream_chat(messages)
for r in resp:
print(r.delta, end="")
Ahoy there! Ye can call me Captain Jollybeard, the most colorful pirate to sail the seven seas! What brings ye to me ship today?
Configure Parameters¶
The Responses API supports many options:
- Setting the model name
- Generation parameters like temperature, top_p, max_output_tokens
- Enabling built-in tool calling
- Setting the reasoning effort for O-series models
- Tracking previous responses for automatic conversation history (see the sketch after this list)
- and more!
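As a quick illustration of the last point, here is a minimal sketch of automatic conversation tracking. It assumes the constructor accepts a track_previous_responses flag; check the signature of your installed version.
from llama_index.llms.openai import OpenAIResponses
from llama_index.core.llms import ChatMessage

# Assumed flag: chains each request to the previous response ID so the
# API maintains conversation history across calls automatically.
llm = OpenAIResponses(model="gpt-4o-mini", track_previous_responses=True)

resp = llm.chat([ChatMessage(role="user", content="My favorite color is blue.")])
resp = llm.chat([ChatMessage(role="user", content="What is my favorite color?")])
print(resp.message.content)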
Basic Parameters¶
from llama_index.llms.openai import OpenAIResponses
llm = OpenAIResponses(
model="gpt-4o-mini",
temperature=0.5, # default is 0.1
max_output_tokens=100, # default is None
top_p=0.95, # default is 1.0
)
Built-in Tool Calling¶
The responses API supports built-in tool calling, which you can read more about here.
Configuring this means that the LLM will automatically call the tool and use it to augment the response.
Tools are defined as a list of dictionaries, each containing settings for a tool.
Below is an example of using the built-in web search tool.
from llama_index.llms.openai import OpenAIResponses
from llama_index.core.llms import ChatMessage
llm = OpenAIResponses(
model="gpt-4o-mini",
built_in_tools=[{"type": "web_search_preview"}],
)
resp = llm.chat(
[ChatMessage(role="user", content="What is the weather in San Francisco?")]
)
print(resp)
print("========" * 2)
print(resp.additional_kwargs)
assistant: As of 12:18 AM on Friday, March 28, 2025, in San Francisco, the current weather is partly sunny with a temperature of 61°F (16°C).

## Weather for San Francisco, CA:
Current Conditions: Partly sunny, 61°F (16°C)

Daily Forecast:
* Thursday, March 27: Low: 52°F (11°C), High: 61°F (16°C), Description: Periods of rain and drizzle beginning in the late morning; breezy this afternoon
* Friday, March 28: Low: 47°F (8°C), High: 61°F (16°C), Description: A shower in the area in the morning; otherwise, clouds giving way to some sun
* Saturday, March 29: Low: 50°F (10°C), High: 60°F (15°C), Description: Mostly sunny
* Sunday, March 30: Low: 51°F (11°C), High: 59°F (15°C), Description: Cloudy; periods of rain in the morning followed by a shower in spots in the afternoon
* Monday, March 31: Low: 49°F (10°C), High: 58°F (14°C), Description: Cloudy and cool; a couple of showers in the afternoon
* Tuesday, April 01: Low: 53°F (12°C), High: 58°F (14°C), Description: Some sunshine giving way to clouds, breezy and cool; occasional rain in the afternoon
* Wednesday, April 02: Low: 52°F (11°C), High: 56°F (13°C), Description: A couple of showers in the morning; otherwise, cloudy and remaining cool

In March, San Francisco typically experiences daytime temperatures around 61°F (16°C) and nighttime temperatures around 47°F (8°C). The city usually receives about 3.5 inches (89 mm) of rainfall over approximately 11 days during the month. ([weather2visit.com](https://www.weather2visit.com/north-america/united-states/san-francisco-march.htm?utm_source=openai))
================
{'built_in_tool_calls': [ResponseFunctionWebSearch(id='ws_67e5eaecce088191ab2edce452ef25420a24041ef7e917b2', status='completed', type='web_search_call')], 'annotations': [AnnotationURLCitation(end_index=1561, start_index=1439, title='San Francisco Weather in March 2025 | United States Averages | Weather-2-Visit', type='url_citation', url='https://www.weather2visit.com/north-america/united-states/san-francisco-march.htm?utm_source=openai')], 'usage': ResponseUsage(input_tokens=327, output_tokens=462, output_tokens_details=OutputTokensDetails(reasoning_tokens=0), total_tokens=789, input_tokens_details={'cached_tokens': 0})}
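The web-search call and its URL citations are surfaced in resp.additional_kwargs, as shown above. As a small illustration (attribute names taken from the AnnotationURLCitation output above), you can list the cited sources like this:
# Each annotation is an AnnotationURLCitation with `title` and `url`
# attributes (see the output above).
for annotation in resp.additional_kwargs.get("annotations", []):
    print(f"{annotation.title}: {annotation.url}")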
Reasoning Effort¶
For O-series models, you can set the reasoning effort to control the amount of time the model will spend reasoning.
The possible values are low, medium, and high.
from llama_index.llms.openai import OpenAIResponses
from llama_index.core.llms import ChatMessage
llm = OpenAIResponses(
model="o3-mini",
reasoning="high",
)
resp = llm.chat(
[ChatMessage(role="user", content="What is the meaning of life?")]
)
print(resp)
print("========" * 2)
print(resp.additional_kwargs)
assistant: The question “What is the meaning of life?” has intrigued humanity for centuries, and there isn’t one universally accepted answer. Different perspectives offer different insights:

• Religious and Spiritual Views: Many traditions propose that life’s meaning is connected to a divine purpose, spiritual growth, or fulfilling the will of a higher power. For example, some religions teach that life is about serving God, others emphasize enlightenment or unity with the universe.

• Philosophical Perspectives: Philosophers have long debated the issue. Existentialists, such as Jean-Paul Sartre or Albert Camus, suggest that life doesn’t come with an inherent meaning—that instead, each person must create their own purpose through choices and actions. Other philosophical traditions, like those found in ancient Greek thought, propose that the pursuit of virtue or wisdom is central to a meaningful life.

• Scientific and Evolutionary Insights: From a scientific standpoint, life can be seen as the product of natural processes like evolution. In this view, the “meaning” is less about cosmic purpose and more about survival, reproduction, and the development of complex societies. Many find purpose in understanding the universe and our place within it.

• Personal and Existential Meanings: For many people today, meaning is deeply personal. It might be found in relationships, love, creative expression, learning, or contributing to something larger than oneself—be it community, art, science, or social progress. This view suggests that meaning isn’t handed to us; it’s something we create over the course of our lives.

In essence, the meaning of life is a multifaceted question that can lead to introspection about what matters most to you. Whether you lean toward religious faith, philosophical inquiry, scientific curiosity, or personal fulfillment, the idea is that meaning often emerges from how we engage with the world, form connections, and choose to live our lives.
================
{'built_in_tool_calls': [], 'reasoning': ResponseReasoningItem(id='rs_67e5eb6de5a881918ffb8aabe12eb8da0859b64a1dc4ba8f', summary=[], type='reasoning', status=None), 'annotations': [], 'usage': ResponseUsage(input_tokens=72, output_tokens=828, output_tokens_details=OutputTokensDetails(reasoning_tokens=448), total_tokens=900, input_tokens_details={'cached_tokens': 0})}
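The usage object in resp.additional_kwargs reports how many output tokens were spent on reasoning. A small illustration based on the fields shown above:
# `usage` is a ResponseUsage object; the reasoning token count lives
# under output_tokens_details (448 of 828 output tokens in this run).
usage = resp.additional_kwargs["usage"]
print(f"Reasoning tokens: {usage.output_tokens_details.reasoning_tokens}")
print(f"Total output tokens: {usage.output_tokens}")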
Image Support¶
OpenAI has support for images in the input of chat messages for many models.
Using the content blocks feature of chat messages, you can easily combine text and images in a single LLM prompt.
!wget https://cdn.pixabay.com/photo/2016/07/07/16/46/dice-1502706_640.jpg -O image.png
from llama_index.core.llms import ChatMessage, TextBlock, ImageBlock
from llama_index.llms.openai import OpenAIResponses
llm = OpenAIResponses(model="gpt-4o")
messages = [
ChatMessage(
role="user",
blocks=[
ImageBlock(path="image.png"),
TextBlock(text="Describe the image in a few sentences."),
],
)
]
resp = llm.chat(messages)
print(resp.message.content)
The image shows three white dice with black dots, captured in mid-air above a checkered surface. The dice are in various orientations, displaying different numbers of dots. The background is dark, with a subtle light illuminating the dice, creating a dramatic effect. The checkered surface resembles a chess or checkerboard.
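ImageBlock can also reference a remote image directly. A minimal sketch, assuming your installed version supports the url field (the URL below is the same dice image downloaded earlier):
from llama_index.core.llms import ChatMessage, TextBlock, ImageBlock

# Assumption: ImageBlock accepts a `url` field, so no local download is needed.
messages = [
    ChatMessage(
        role="user",
        blocks=[
            ImageBlock(
                url="https://cdn.pixabay.com/photo/2016/07/07/16/46/dice-1502706_640.jpg"
            ),
            TextBlock(text="Describe the image in a few sentences."),
        ],
    )
]
print(llm.chat(messages).message.content)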
Using Function/Tool Calling¶
OpenAI models have native support for function calling. This conveniently integrates with LlamaIndex tool abstractions, letting you plug in any arbitrary Python function to the LLM.
In the example below, we define a function to generate a Song object.
from pydantic import BaseModel
from llama_index.core.tools import FunctionTool
class Song(BaseModel):
"""A song with name and artist"""
name: str
artist: str
def generate_song(name: str, artist: str) -> Song:
"""Generates a song with provided name and artist."""
return Song(name=name, artist=artist)
tool = FunctionTool.from_defaults(fn=generate_song)
The strict parameter tells OpenAI whether or not to use constrained sampling when generating tool calls/structured outputs. This means that the generated tool call schema will always contain the expected fields. Since this seems to increase latency, it defaults to False.
from llama_index.llms.openai import OpenAIResponses
llm = OpenAIResponses(model="gpt-4o-mini", strict=True)
response = llm.predict_and_call(
[tool],
"Write a random song for me",
# strict=True # can also be set at the function level to override the class
)
print(str(response))
name='Chasing Stars' artist='Luna Sky'
We can also do multiple function calling.
llm = OpenAIResponses(model="gpt-4o-mini")
response = llm.predict_and_call(
[tool],
"Generate five songs from the Beatles",
allow_parallel_tool_calls=True,
)
for s in response.sources:
print(f"Name: {s.tool_name}, Input: {s.raw_input}, Output: {str(s)}")
Name: generate_song, Input: {'args': (), 'kwargs': {'name': 'Hey Jude', 'artist': 'The Beatles'}}, Output: name='Hey Jude' artist='The Beatles'
Name: generate_song, Input: {'args': (), 'kwargs': {'name': 'Let It Be', 'artist': 'The Beatles'}}, Output: name='Let It Be' artist='The Beatles'
Name: generate_song, Input: {'args': (), 'kwargs': {'name': 'Come Together', 'artist': 'The Beatles'}}, Output: name='Come Together' artist='The Beatles'
Name: generate_song, Input: {'args': (), 'kwargs': {'name': 'Yesterday', 'artist': 'The Beatles'}}, Output: name='Yesterday' artist='The Beatles'
Name: generate_song, Input: {'args': (), 'kwargs': {'name': 'Twist and Shout', 'artist': 'The Beatles'}}, Output: name='Twist and Shout' artist='The Beatles'
Manual Tool Calling¶
If you want to control how a tool is called, you can also split the tool calling and tool selection into their own steps.
First, let's select a tool.
from llama_index.core.llms import ChatMessage
from llama_index.llms.openai import OpenAIResponses
llm = OpenAIResponses(model="gpt-4o-mini")
chat_history = [ChatMessage(role="user", content="Write a random song for me")]
resp = llm.chat_with_tools([tool], chat_history=chat_history)
Now, let's call the tool the LLM selected (if any).
If there was a tool call, we should send the results to the LLM to generate the final response (or another tool call!).
tools_by_name = {t.metadata.name: t for t in [tool]}
tool_calls = llm.get_tool_calls_from_response(
resp, error_on_no_tool_call=False
)
while tool_calls:
# add the LLM's response to the chat history
chat_history.append(resp.message)
for tool_call in tool_calls:
tool_name = tool_call.tool_name
tool_kwargs = tool_call.tool_kwargs
print(f"Calling {tool_name} with {tool_kwargs}")
tool_output = tool(**tool_kwargs)
chat_history.append(
ChatMessage(
role="tool",
content=str(tool_output),
# most LLMs like OpenAI need to know the tool call id
additional_kwargs={"call_id": tool_call.tool_id},
)
)
resp = llm.chat_with_tools([tool], chat_history=chat_history)
tool_calls = llm.get_tool_calls_from_response(
resp, error_on_no_tool_call=False
)
Calling generate_song with {'name': 'Chasing Stars', 'artist': 'Luna Sky'}
Now, we should have a final response!
print(resp.message.content)
Here's a song for you titled **"Chasing Stars"** by **Luna Sky**!

### Chasing Stars

**Verse 1**
In the midnight glow, we wander free,
With dreams like fireflies, lighting up the sea.
Whispers of the night, calling out our names,
Together we’ll ignite, this wild, untamed flame.

**Chorus**
We’re chasing stars, through the endless night,
With every heartbeat, we’ll take flight.
Hand in hand, we’ll break the dark,
In this cosmic dance, we’ll leave our mark.

**Verse 2**
Underneath the moon, secrets softly shared,
Every glance a promise, every touch a dare.
The universe is ours, let the journey start,
With every step we take, we’re painting art.

**Chorus**
We’re chasing stars, through the endless night,
With every heartbeat, we’ll take flight.
Hand in hand, we’ll break the dark,
In this cosmic dance, we’ll leave our mark.

**Bridge**
And when the dawn arrives, we’ll still be here,
With the echoes of our laughter, crystal clear.
No matter where we go, no matter how far,
Forever in our hearts, we’ll chase those stars.

**Chorus**
We’re chasing stars, through the endless night,
With every heartbeat, we’ll take flight.
Hand in hand, we’ll break the dark,
In this cosmic dance, we’ll leave our mark.

**Outro**
So let’s chase the stars, let’s light the way,
In this beautiful journey, we’ll never stray.
With dreams as our compass, love as our guide,
Together we’ll soar, side by side.

Feel free to let me know if you'd like any changes or another song!
Structured Prediction¶
An important use case for function calling is extracting structured objects. LlamaIndex provides an intuitive interface for converting any LLM into a structured LLM - simply define the target Pydantic class (can be nested), and given a prompt, we extract out the desired object.
from llama_index.llms.openai import OpenAIResponses
from llama_index.core.prompts import PromptTemplate
from pydantic import BaseModel
from typing import List
class MenuItem(BaseModel):
"""A menu item in a restaurant."""
course_name: str
is_vegetarian: bool
class Restaurant(BaseModel):
"""A restaurant with name, city, and cuisine."""
name: str
city: str
cuisine: str
menu_items: List[MenuItem]
llm = OpenAIResponses(model="gpt-4o-mini")
prompt_tmpl = PromptTemplate(
"Generate a restaurant in a given city {city_name}"
)
# Option 1: Use `as_structured_llm`
restaurant_obj = (
llm.as_structured_llm(Restaurant)
.complete(prompt_tmpl.format(city_name="Dallas"))
.raw
)
# Option 2: Use `structured_predict`
# restaurant_obj = llm.structured_predict(Restaurant, prompt_tmpl, city_name="Miami")
restaurant_obj
Restaurant(name='Tex-Mex Delight', city='Dallas', cuisine='Tex-Mex', menu_items=[MenuItem(course_name='Tacos', is_vegetarian=False), MenuItem(course_name='Vegetarian Enchiladas', is_vegetarian=True), MenuItem(course_name='Fajitas', is_vegetarian=False), MenuItem(course_name='Chips and Salsa', is_vegetarian=True), MenuItem(course_name='Queso Dip', is_vegetarian=True)])
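Structured prediction can also stream partial objects as they are generated. A minimal sketch, assuming your LlamaIndex version exposes stream_structured_predict (each iteration yields a progressively more complete Restaurant):
# Hedged usage sketch: stream partial Restaurant objects as they arrive.
for partial_obj in llm.stream_structured_predict(
    Restaurant, prompt_tmpl, city_name="Miami"
):
    print(partial_obj)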
Async¶
from llama_index.llms.openai import OpenAIResponses
llm = OpenAIResponses(model="gpt-4o")
resp = await llm.acomplete("Paul Graham is ")
print(resp)
Paul Graham is a British-American entrepreneur, venture capitalist, and essayist. He is best known for co-founding Viaweb, one of the first web-based applications, which was later sold to Yahoo and became Yahoo Store. Graham is also a co-founder of Y Combinator, a highly influential startup accelerator that has funded and supported numerous successful startups, including Dropbox, Airbnb, and Reddit. In addition to his work in technology and startups, Graham is known for his insightful essays on topics such as programming, entrepreneurship, and the philosophy of work.
resp = await llm.astream_complete("Paul Graham is ")
async for delta in resp:
print(delta.delta, end="")
Paul Graham is a British-American entrepreneur, venture capitalist, and essayist. He is best known for co-founding Viaweb, one of the first web-based applications, which was later sold to Yahoo and became Yahoo Store. Graham is also a co-founder of Y Combinator, a highly influential startup accelerator that has funded and supported numerous successful startups, including Dropbox, Airbnb, and Reddit. In addition to his work in technology and startups, Graham is known for his insightful essays on topics related to entrepreneurship, technology, and society.
Async function calling is also supported.
llm = OpenAIResponses(model="gpt-4o-mini")
response = await llm.apredict_and_call([tool], "Generate a random song")
print(str(response))
name='Chasing Stars' artist='Luna Sky'
Additional kwargs¶
If there are additional kwargs not present in the constructor, you can set them at a per-instance level with additional_kwargs.
These will be passed into every call to the LLM.
from llama_index.llms.openai import OpenAIResponses
llm = OpenAIResponses(
model="gpt-4o-mini", additional_kwargs={"user": "your_user_id"}
)
resp = llm.complete("Paul Graham is ")
print(resp)