Anthropic¶
Anthropic offers a range of state-of-the-art models across the Haiku, Sonnet, and Opus families.
Read on to learn how to use these models with LlamaIndex!
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-llms-anthropic
Set Tokenizer¶
First we want to set the tokenizer, which is slightly different from tiktoken. This ensures that token counting is accurate throughout the library.
NOTE: Anthropic recently updated their token counting API. Older models like claude-2.1 are no longer supported for token counting in the latest versions of the Anthropic Python client.
from llama_index.llms.anthropic import Anthropic
from llama_index.core import Settings
tokenizer = Anthropic().tokenizer
Settings.tokenizer = tokenizer
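As a quick sanity check, you can count tokens by calling the tokenizer's encode method (the interface the global tokenizer setting expects). A minimal sketch, assuming your installed version exposes encode; the sample text is arbitrary:
# Sketch: counting tokens with the Anthropic tokenizer.
# Assumes the tokenizer exposes an `encode` method, as the LlamaIndex
# tokenizer protocol requires; the sample text is arbitrary.
sample_text = "LlamaIndex is a data framework for LLM applications."
print(len(tokenizer.encode(sample_text)))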
Basic Usage¶
import os
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."
You can call complete with a prompt:
from llama_index.llms.anthropic import Anthropic
# To customize your API key, pass it in explicitly;
# otherwise it will be looked up from the ANTHROPIC_API_KEY environment variable
# llm = Anthropic(api_key="<api_key>")
llm = Anthropic(model="claude-3-7-sonnet-latest")
resp = llm.complete("Who is Paul Graham?")
print(resp)
Paul Graham is a computer scientist, entrepreneur, venture capitalist, and essayist. He co-founded Viaweb (one of the first web application companies, later sold to Yahoo! and became Yahoo! Store), and later co-founded Y Combinator, an influential startup accelerator that has helped launch companies like Airbnb, Dropbox, Stripe, and Reddit. Graham is also well-known for his essays on technology, startups, and programming, which are published on his website. He created the Lisp dialect called Arc, and authored books including "On Lisp," "ANSI Common Lisp," and "Hackers & Painters." He has a PhD in Computer Science from Harvard and studied painting at the Rhode Island School of Design and in Florence, Italy.
You can also call chat with a list of chat messages:
from llama_index.core.llms import ChatMessage
from llama_index.llms.anthropic import Anthropic
messages = [
ChatMessage(
role="system", content="You are a pirate with a colorful personality"
),
ChatMessage(role="user", content="Tell me a story"),
]
llm = Anthropic(model="claude-3-7-sonnet-latest")
resp = llm.chat(messages)
print(resp)
assistant: # THE TREASURE OF CRIMSON COVE *Arrr, gather 'round, ye curious soul, for I be havin' a tale that'll chill yer very bones!* 'Twas fifteen years ago when me and me crew aboard the Salty Vengeance caught wind of a treasure most rare - the Sapphire of Poseidon, said to control the very tides themselves! The map came to me hands after a particularly spirited game o' cards with a one-eyed merchant who'd had far too much rum. We set sail under the cover of a moonless night, navigatin' by stars alone to reach the dreaded Crimson Cove - a place where the water turns red as blood when the sun sets, on account of the strange coral beneath the waves. Three days into our journey, the skies turned black as pitch! A storm like none I'd ever seen! Waves tall as mountains threatened to swallow us whole! "HOLD FAST, YE MANGY DOGS!" I bellowed over the howlin' winds. When we finally reached the cove, half me crew was convinced the treasure was cursed. Bah! Superstitious bilge rats! But I'll not be lyin' to ye... when we found that hidden cave behind the waterfall, and saw them skeletons arranged in a circle 'round an empty chest... well, even ME beard seemed to tremble of its own accord! The real treasure weren't no sapphire at all, but a map to somethin' far greater... somethin' I still be searchin' for to this very day! *Leans in closer, voice dropping to a whisper* And perhaps, if ye prove yerself worthy, I might be persuaded to let ye join the hunt! HARR HARR HARR!
Streaming Support¶
Every method supports streaming through the stream_ prefix.
from llama_index.llms.anthropic import Anthropic
llm = Anthropic(model="claude-3-7-sonnet-latest")
resp = llm.stream_complete("Who is Paul Graham?")
for r in resp:
    print(r.delta, end="")
Paul Graham is a computer scientist, entrepreneur, venture capitalist, and essayist. He's best known for: 1. Co-founding Viaweb (later sold to Yahoo and became Yahoo Store) 2. Creating the programming language Arc 3. Co-founding Y Combinator, an influential startup accelerator that has funded companies like Airbnb, Dropbox, and Stripe 4. Writing influential essays on startups, programming, and technology that are published on his website 5. His work on Lisp programming language Graham is widely respected in the tech and startup communities for his insights on building companies and technology development.
from llama_index.core.llms import ChatMessage
messages = [
ChatMessage(role="user", content="Who is Paul Graham?"),
]
resp = llm.stream_chat(messages)
for r in resp:
    print(r.delta, end="")
Paul Graham is a computer scientist, entrepreneur, venture capitalist, and essayist. He's best known for: 1. Co-founding Viaweb (later sold to Yahoo and became Yahoo Store) 2. Creating the programming language Arc 3. Co-founding Y Combinator, an influential startup accelerator that has funded companies like Airbnb, Dropbox, and Stripe 4. Writing influential essays on startups, programming, and technology that are published on his website 5. His work on Lisp programming language Graham is widely respected in the tech and startup communities for his insights on building companies and technology development.
Async Usage¶
Every synchronous method has an async counterpart.
from llama_index.llms.anthropic import Anthropic
llm = Anthropic(model="claude-3-7-sonnet-latest")
resp = await llm.astream_complete("Who is Paul Graham?")
async for r in resp:
    print(r.delta, end="")
Paul Graham is a computer scientist, entrepreneur, venture capitalist, and essayist. He's best known for: 1. Co-founding Viaweb (later sold to Yahoo and became Yahoo Store) 2. Creating the programming language Arc 3. Co-founding Y Combinator, an influential startup accelerator that has funded companies like Airbnb, Dropbox, Stripe, and Reddit 4. Writing influential essays on startups, programming, and technology that are published on his website 5. His work on Lisp programming language Graham is widely respected in the tech and startup communities for his insights on building companies and technology development.
messages = [
ChatMessage(role="user", content="Who is Paul Graham?"),
]
resp = await llm.achat(messages)
print(resp)
assistant: Paul Graham is a computer scientist, entrepreneur, venture capitalist, and essayist. He's best known for: 1. Co-founding Viaweb (later sold to Yahoo and became Yahoo Store) 2. Creating the programming language Arc 3. Co-founding Y Combinator, an influential startup accelerator that has funded companies like Airbnb, Dropbox, Stripe, and Reddit 4. Writing influential essays on startups, programming, and technology that are published on his website 5. His work on Lisp programming language Graham is widely respected in the tech and startup communities for his insights on building companies and technology development.
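The remaining async counterparts work the same way. A minimal sketch using acomplete and astream_chat with the same llm and messages as above:
# Sketch: the other async counterparts, reusing `llm` and `messages` from above
resp = await llm.acomplete("Who is Paul Graham?")
print(resp)

resp = await llm.astream_chat(messages)
async for r in resp:
    print(r.delta, end="")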
Vertex AI Support¶
By providing the region and project_id parameters (either through environment variables or directly), you can use an Anthropic model through Vertex AI.
import os
os.environ["ANTHROPIC_PROJECT_ID"] = "YOUR PROJECT ID HERE"
os.environ["ANTHROPIC_REGION"] = "YOUR PROJECT REGION HERE"
Keep in mind that setting region and project_id here will make Anthropic use the Vertex AI client. You can also pass them directly to the constructor, as sketched below.
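A minimal sketch of passing the values directly instead of through environment variables. This assumes your installed version of llama-index-llms-anthropic accepts region and project_id keyword arguments, and note that the Vertex AI model identifier may differ from the Anthropic API model name:
from llama_index.llms.anthropic import Anthropic

# Sketch: configuring Vertex AI directly in the constructor.
# Assumes `region` and `project_id` keyword arguments are supported by your
# installed version; the Vertex model identifier may differ from the
# Anthropic API name (it is often date-pinned on Vertex).
llm = Anthropic(
    model="claude-3-7-sonnet-latest",
    region="YOUR PROJECT REGION HERE",
    project_id="YOUR PROJECT ID HERE",
)
resp = llm.complete("Who is Paul Graham?")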
Bedrock Support¶
LlamaIndex also supports Anthropic models through AWS Bedrock.
from llama_index.llms.anthropic import Anthropic
# Note: this assumes you have standard AWS credentials configured in your environment
llm = Anthropic(
model="anthropic.claude-3-7-sonnet-20250219-v1:0",
aws_region="us-east-1",
)
resp = llm.complete("Who is Paul Graham?")
Multi-Modal Support¶
Using ChatMessage objects, you can pass in images and text to the LLM.
!wget https://cdn.pixabay.com/photo/2021/12/12/20/00/play-6865967_640.jpg -O image.jpg
from llama_index.core.llms import ChatMessage, TextBlock, ImageBlock
from llama_index.llms.anthropic import Anthropic
llm = Anthropic(model="claude-3-7-sonnet-latest")
messages = [
ChatMessage(
role="user",
blocks=[
ImageBlock(path="image.jpg"),
TextBlock(text="What is in this image?"),
],
)
]
resp = llm.chat(messages)
print(resp)
assistant: The image shows four wooden dice arranged on a dark blue or black textured surface. The dice appear to be made of light-colored wood with black dots representing the numbers. Each die shows a different face value, with various combinations of dots visible. The dice have a natural wooden finish and the classic cubic shape with rounded edges that's typical of gaming dice. This type of dice would commonly be used for board games, tabletop games, or various games of chance.
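Images can also be referenced by URL rather than a local path. A minimal sketch, assuming your installed version of llama-index-core supports the url field on ImageBlock:
# Sketch: passing the image by URL instead of a local path.
# Assumes ImageBlock supports a `url` field in your llama-index-core version.
messages = [
    ChatMessage(
        role="user",
        blocks=[
            ImageBlock(
                url="https://cdn.pixabay.com/photo/2021/12/12/20/00/play-6865967_640.jpg"
            ),
            TextBlock(text="What is in this image?"),
        ],
    )
]
resp = llm.chat(messages)
print(resp)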
Prompt Caching¶
Anthropic models support prompt caching: if a prompt is repeated multiple times, or the start of a prompt is repeated, the LLM can reuse pre-computed attention results to speed up the response and lower costs.
To enable prompt caching, you can set cache_control on your ChatMessage objects, or set cache_idx on the LLM to always cache the first X messages (with -1 meaning all messages).
from llama_index.core.llms import ChatMessage
from llama_index.llms.anthropic import Anthropic
llm = Anthropic(model="claude-3-7-latest")
# cache individual message(s)
messages = [
ChatMessage(
role="user",
content="<some very long prompt>",
additional_kwargs={"cache_control": {"type": "ephemeral"}},
),
]
resp = llm.chat(messages)
# cache first X messages (with -1 being all messages)
llm = Anthropic(model="claude-3-7-latest", cache_idx=-1)
resp = llm.chat(messages)
Structured Prediction¶
LlamaIndex provides an intuitive interface for converting any Anthropic LLM into a structured LLM through structured_predict: simply define the target Pydantic class (which can be nested), and given a prompt, we extract the desired object.
from llama_index.llms.anthropic import Anthropic
from llama_index.core.prompts import PromptTemplate
from llama_index.core.bridge.pydantic import BaseModel
from typing import List
class MenuItem(BaseModel):
    """A menu item in a restaurant."""

    course_name: str
    is_vegetarian: bool


class Restaurant(BaseModel):
    """A restaurant with name, city, and cuisine."""

    name: str
    city: str
    cuisine: str
    menu_items: List[MenuItem]
llm = Anthropic("claude-3-5-sonnet-20240620")
prompt_tmpl = PromptTemplate(
"Generate a restaurant in a given city {city_name}"
)
# Option 1: Use `as_structured_llm`
restaurant_obj = (
llm.as_structured_llm(Restaurant)
.complete(prompt_tmpl.format(city_name="Miami"))
.raw
)
# Option 2: Use `structured_predict`
# restaurant_obj = llm.structured_predict(Restaurant, prompt_tmpl, city_name="Miami")
restaurant_obj
Restaurant(name='Ocean Breeze Bistro', city='Miami', cuisine='Seafood', menu_items=[MenuItem(course_name='Grilled Mahi-Mahi', is_vegetarian=False), MenuItem(course_name='Coconut Shrimp', is_vegetarian=False), MenuItem(course_name='Key Lime Pie', is_vegetarian=True), MenuItem(course_name='Vegetable Paella', is_vegetarian=True)])
Structured Prediction with Streaming¶
Any LLM wrapped with as_structured_llm supports streaming through stream_chat.
from llama_index.core.llms import ChatMessage
from IPython.display import clear_output
from pprint import pprint
input_msg = ChatMessage.from_str("Generate a restaurant in San Francisco")
sllm = llm.as_structured_llm(Restaurant)
stream_output = sllm.stream_chat([input_msg])
for partial_output in stream_output:
    clear_output(wait=True)
    pprint(partial_output.raw.dict())
restaurant_obj = partial_output.raw
restaurant_obj
{'city': 'San Francisco', 'cuisine': 'California Fusion', 'menu_items': [{'course_name': 'Sourdough Avocado Toast', 'is_vegetarian': True}, {'course_name': 'Dungeness Crab Cioppino', 'is_vegetarian': False}, {'course_name': 'Mission-style Veggie Burrito', 'is_vegetarian': True}, {'course_name': 'Grilled Napa Valley Lamb Chops', 'is_vegetarian': False}, {'course_name': 'Vegan Ghirardelli Chocolate Mousse', 'is_vegetarian': True}], 'name': 'Golden Gate Grill'}
Restaurant(name='Golden Gate Grill', city='San Francisco', cuisine='California Fusion', menu_items=[MenuItem(course_name='Sourdough Avocado Toast', is_vegetarian=True), MenuItem(course_name='Dungeness Crab Cioppino', is_vegetarian=False), MenuItem(course_name='Mission-style Veggie Burrito', is_vegetarian=True), MenuItem(course_name='Grilled Napa Valley Lamb Chops', is_vegetarian=False), MenuItem(course_name='Vegan Ghirardelli Chocolate Mousse', is_vegetarian=True)])
Model Thinking¶
With Claude 3.7 Sonnet, you can enable the model to "think" harder about a task, generating a chain-of-thought response before writing out the final answer.
You can enable this by passing the thinking_dict parameter to the constructor, specifying the number of tokens to reserve for the thinking process.
from llama_index.llms.anthropic import Anthropic
from llama_index.core.llms import ChatMessage
llm = Anthropic(
model="claude-3-7-sonnet-latest",
# max_tokens must be greater than budget_tokens
max_tokens=64000,
# temperature must be 1.0 for thinking to work
temperature=1.0,
thinking_dict={"type": "enabled", "budget_tokens": 1600},
)
messages = [
ChatMessage(role="user", content="(1234 * 3421) / (231 + 2341) = ?")
]
resp_gen = llm.stream_chat(messages)
for r in resp_gen:
    print(r.delta, end="")
print()
print(r.message.content)
# Evaluating (1234 * 3421) / (231 + 2341) I'll solve this step by step. ## Step 1: Calculate the numerator (1234 * 3421) 1234 * 3421 = 4,221,514 ## Step 2: Calculate the denominator (231 + 2341) 231 + 2341 = 2,572 ## Step 3: Divide the numerator by the denominator 4,221,514 ÷ 2,572 = 1,641.335... Therefore: (1234 * 3421) / (231 + 2341) = 1,641.335... The exact answer is 1,641 + 862/2,572, which can be simplified to 1,641.335... # Evaluating (1234 * 3421) / (231 + 2341) I'll solve this step by step. ## Step 1: Calculate the numerator (1234 * 3421) 1234 * 3421 = 4,221,514 ## Step 2: Calculate the denominator (231 + 2341) 231 + 2341 = 2,572 ## Step 3: Divide the numerator by the denominator 4,221,514 ÷ 2,572 = 1,641.335... Therefore: (1234 * 3421) / (231 + 2341) = 1,641.335... The exact answer is 1,641 + 862/2,572, which can be simplified to 1,641.335...
print(r.message.additional_kwargs["thinking"]["signature"])
ErUBCkYIARgCIkA2LmXlUq2Lmkrlw4yPTpMD2I688kow8bnUjgP8DaEg0jXSgnTBjx0MWOJGpxQJA6Y3RVT/fGFm/X8ZDa7JXC0jEgybB8Sb5YUDH8RsEKcaDAFQAYIlE+97QPbA8yIwUaJV4/6oPFzx6PHC8ZZn8P05tcGdcR/Vp1z4mlLmjfaikz3mHzAOvQp1wunx0sa0Kh0TIbmx80VaWeU/RgFk0yIIZmkKXtCVI27VFVu8nw==
We can also expose the exact thinking process:
print(r.message.additional_kwargs["thinking"]["thinking"])
I need to calculate (1234 * 3421) / (231 + 2341) Let's start by calculating the numerator: 1234 * 3421 1234 * 3421 = (1234 * 3000) + (1234 * 400) + (1234 * 20) + (1234 * 1) = 3702000 + 493600 + 24680 + 1234 = 4221514 Now let's calculate the denominator: 231 + 2341 231 + 2341 = 2572 Finally, let's calculate the division: 4221514 / 2572 Actually, let me just double-check my calculation of 1234 * 3421. 1234 * 3421 = 1234 * 3421 Let me do this calculation differently. 1234 × 3421 ------ 1234 24680 493600 3702000 ------ 4221514 So the numerator is 4221514. Now let's calculate the denominator: 231 + 2341 = 2572 Finally, let's calculate the division: 4221514 / 2572 4221514 / 2572 = ? Let me try long division. 4221514 / 2572 = 1640.94... Actually, let me verify this with another approach. 4221514 / 2572 ≈ 4200000 / 2600 ≈ 1615.38... That's not matching my earlier calculation. Let me try the division again. 4221514 / 2572 2572 goes into 4221 about 1.64 times, which is about 1 time. 4221 - 2572 = 1649 Bring down the 5: 16495 2572 goes into 16495 about 6.41 times, which is about 6 times. 16495 - (6 * 2572) = 16495 - 15432 = 1063 Bring down the 1: 10631 2572 goes into 10631 about 4.13 times, which is about 4 times. 10631 - (4 * 2572) = 10631 - 10288 = 343 Bring down the 4: 3434 2572 goes into 3434 about 1.33 times, which is about 1 time. 3434 - 2572 = 862 Actually, I'm going to try one more approach. I'll use polynomial long division. 4221514 / 2572 = (4221514/2572) Let me calculate this directly. 4221514 / 2572 = 1641.3351... Let me double-check this by multiplying: 1641.3351 * 2572 ≈ 4221514? Let's see. That's approximately 1641 * 2572 = 4,220,652. That seems close enough (1641 * 2572 is a bit less than 4221514, which makes sense since 1641 is a bit less than 1641.3351). So our answer is 4221514 / 2572 = 1641.3351...
Tool/Function Calling¶
Anthropic supports direct tool/function calling through the API. Using LlamaIndex, we can implement some core agentic tool calling patterns.
from llama_index.core.tools import FunctionTool
from llama_index.core.llms import ChatMessage
from llama_index.llms.anthropic import Anthropic
from datetime import datetime
llm = Anthropic(model="claude-3-7-sonnet-latest")
def get_current_time() -> dict:
    """Get the current time"""
    return {"time": datetime.now().strftime("%Y-%m-%d %H:%M:%S")}
# uses the tool name, any type annotations, and docstring to describe the tool
tool = FunctionTool.from_defaults(fn=get_current_time)
We can simply do a single pass to call the tool and get the result:
resp = llm.predict_and_call([tool], "What is the current time?")
print(resp)
{'time': '2025-03-06 12:36:25'}
We can also use lower-level APIs to implement an agentic tool-calling loop!
chat_history = [ChatMessage(role="user", content="What is the current time?")]
tools_by_name = {t.metadata.name: t for t in [tool]}
resp = llm.chat_with_tools([tool], chat_history=chat_history)
tool_calls = llm.get_tool_calls_from_response(
resp, error_on_no_tool_call=False
)
if not tool_calls:
    print(resp)
else:
    while tool_calls:
        # add the LLM's response to the chat history
        chat_history.append(resp.message)

        for tool_call in tool_calls:
            tool_name = tool_call.tool_name
            tool_kwargs = tool_call.tool_kwargs

            print(f"Calling {tool_name} with {tool_kwargs}")
            tool_output = tools_by_name[tool_name].call(**tool_kwargs)
            print("Tool output: ", tool_output)
            chat_history.append(
                ChatMessage(
                    role="tool",
                    content=str(tool_output),
                    # most LLMs like Anthropic, OpenAI, etc. need to know the tool call id
                    additional_kwargs={"tool_call_id": tool_call.tool_id},
                )
            )

        resp = llm.chat_with_tools([tool], chat_history=chat_history)
        tool_calls = llm.get_tool_calls_from_response(
            resp, error_on_no_tool_call=False
        )

    print("Final response: ", resp.message.content)
Calling get_current_time with {} Tool output: {'time': '2025-03-06 12:43:36'} Final response: The current time is 12:43:36 PM on March 6, 2025.