Anthropic¶
Anthropic offers a range of state-of-the-art models across the Haiku, Sonnet, and Opus families.
Read on to learn how to use these models with LlamaIndex!
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-llms-anthropic
Set Tokenizer¶
First we want to set the tokenizer, which is slightly different from tiktoken. This ensures that token counting is accurate throughout the library.
NOTE: Anthropic recently updated their token counting API. Older models like claude-2.1 are no longer supported for token counting in the latest versions of the Anthropic Python client.
from llama_index.llms.anthropic import Anthropic
from llama_index.core import Settings
tokenizer = Anthropic().tokenizer
Settings.tokenizer = tokenizer
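As a quick sanity check, you can count tokens by calling the tokenizer's encode method (the interface the global tokenizer setting expects). A minimal sketch, assuming your installed version exposes encode; the sample text is arbitrary:
# Sketch: counting tokens with the Anthropic tokenizer.
# Assumes the tokenizer exposes an `encode` method, as the LlamaIndex
# tokenizer protocol requires; the sample text is arbitrary.
sample_text = "LlamaIndex is a data framework for LLM applications."
print(len(tokenizer.encode(sample_text)))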
Basic Usage¶
import os
os.environ["ANTHROPIC_API_KEY"] = "sk-ant-..."
You can call complete with a prompt:
from llama_index.llms.anthropic import Anthropic
# To customize your API key, pass it in explicitly;
# otherwise it will be looked up from the ANTHROPIC_API_KEY environment variable
# llm = Anthropic(api_key="<api_key>")
llm = Anthropic(model="claude-3-7-sonnet-latest")
resp = llm.complete("Who is Paul Graham?")
print(resp)
Paul Graham is a computer scientist, entrepreneur, venture capitalist, and essayist. He co-founded Viaweb (one of the first web application companies, later sold to Yahoo! and became Yahoo! Store), and later co-founded Y Combinator, an influential startup accelerator that has helped launch companies like Airbnb, Dropbox, Stripe, and Reddit. Graham is also well-known for his essays on technology, startups, and programming, which are published on his website. He created the Lisp dialect called Arc, and authored books including "On Lisp," "ANSI Common Lisp," and "Hackers & Painters." He has a PhD in Computer Science from Harvard and studied painting at the Rhode Island School of Design and in Florence, Italy.
You can also call chat with a list of chat messages:
from llama_index.core.llms import ChatMessage
from llama_index.llms.anthropic import Anthropic
messages = [
ChatMessage(
role="system", content="You are a pirate with a colorful personality"
),
ChatMessage(role="user", content="Tell me a story"),
]
llm = Anthropic(model="claude-3-7-sonnet-latest")
resp = llm.chat(messages)
print(resp)
assistant: # THE TREASURE OF CRIMSON COVE *Arrr, gather 'round, ye curious soul, for I be havin' a tale that'll chill yer very bones!* 'Twas fifteen years ago when me and me crew aboard the Salty Vengeance caught wind of a treasure most rare - the Sapphire of Poseidon, said to control the very tides themselves! The map came to me hands after a particularly spirited game o' cards with a one-eyed merchant who'd had far too much rum. We set sail under the cover of a moonless night, navigatin' by stars alone to reach the dreaded Crimson Cove - a place where the water turns red as blood when the sun sets, on account of the strange coral beneath the waves. Three days into our journey, the skies turned black as pitch! A storm like none I'd ever seen! Waves tall as mountains threatened to swallow us whole! "HOLD FAST, YE MANGY DOGS!" I bellowed over the howlin' winds. When we finally reached the cove, half me crew was convinced the treasure was cursed. Bah! Superstitious bilge rats! But I'll not be lyin' to ye... when we found that hidden cave behind the waterfall, and saw them skeletons arranged in a circle 'round an empty chest... well, even ME beard seemed to tremble of its own accord! The real treasure weren't no sapphire at all, but a map to somethin' far greater... somethin' I still be searchin' for to this very day! *Leans in closer, voice dropping to a whisper* And perhaps, if ye prove yerself worthy, I might be persuaded to let ye join the hunt! HARR HARR HARR!
Streaming Support¶
Every method supports streaming through the stream_ prefix.
from llama_index.llms.anthropic import Anthropic
llm = Anthropic(model="claude-3-7-sonnet-latest")
resp = llm.stream_complete("Who is Paul Graham?")
for r in resp:
    print(r.delta, end="")
Paul Graham is a computer scientist, entrepreneur, venture capitalist, and essayist. He's best known for: 1. Co-founding Viaweb (later sold to Yahoo and became Yahoo Store) 2. Creating the programming language Arc 3. Co-founding Y Combinator, an influential startup accelerator that has funded companies like Airbnb, Dropbox, and Stripe 4. Writing influential essays on startups, programming, and technology that are published on his website 5. His work on Lisp programming language Graham is widely respected in the tech and startup communities for his insights on building companies and technology development.
from llama_index.core.llms import ChatMessage
messages = [
ChatMessage(role="user", content="Who is Paul Graham?"),
]
resp = llm.stream_chat(messages)
for r in resp:
    print(r.delta, end="")
Paul Graham is a computer scientist, entrepreneur, venture capitalist, and essayist. He's best known for: 1. Co-founding Viaweb (later sold to Yahoo and became Yahoo Store) 2. Creating the programming language Arc 3. Co-founding Y Combinator, an influential startup accelerator that has funded companies like Airbnb, Dropbox, and Stripe 4. Writing influential essays on startups, programming, and technology that are published on his website 5. His work on Lisp programming language Graham is widely respected in the tech and startup communities for his insights on building companies and technology development.
Async Usage¶
Every synchronous method has an async counterpart.
from llama_index.llms.anthropic import Anthropic
llm = Anthropic(model="claude-3-7-sonnet-latest")
resp = await llm.astream_complete("Who is Paul Graham?")
async for r in resp:
    print(r.delta, end="")
Paul Graham is a computer scientist, entrepreneur, venture capitalist, and essayist. He's best known for: 1. Co-founding Viaweb (later sold to Yahoo and became Yahoo Store) 2. Creating the programming language Arc 3. Co-founding Y Combinator, an influential startup accelerator that has funded companies like Airbnb, Dropbox, Stripe, and Reddit 4. Writing influential essays on startups, programming, and technology that are published on his website 5. His work on Lisp programming language Graham is widely respected in the tech and startup communities for his insights on building companies and technology development.
messages = [
ChatMessage(role="user", content="Who is Paul Graham?"),
]
resp = await llm.achat(messages)
print(resp)
assistant: Paul Graham is a computer scientist, entrepreneur, venture capitalist, and essayist. He's best known for: 1. Co-founding Viaweb (later sold to Yahoo and became Yahoo Store) 2. Creating the programming language Arc 3. Co-founding Y Combinator, an influential startup accelerator that has funded companies like Airbnb, Dropbox, Stripe, and Reddit 4. Writing influential essays on startups, programming, and technology that are published on his website 5. His work on Lisp programming language Graham is widely respected in the tech and startup communities for his insights on building companies and technology development.
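The remaining async counterparts work the same way. A minimal sketch using acomplete and astream_chat with the same llm and messages as above:
# Sketch: the other async counterparts, reusing `llm` and `messages` from above
resp = await llm.acomplete("Who is Paul Graham?")
print(resp)

resp = await llm.astream_chat(messages)
async for r in resp:
    print(r.delta, end="")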
Vertex AI Support¶
By providing the region and project_id parameters (either through environment variables or directly), you can use an Anthropic model through Vertex AI.
import os
os.environ["ANTHROPIC_PROJECT_ID"] = "YOUR PROJECT ID HERE"
os.environ["ANTHROPIC_REGION"] = "YOUR PROJECT REGION HERE"
Keep in mind that setting region and project_id here will make Anthropic use the Vertex AI client. You can also pass them directly to the constructor, as sketched below.
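A minimal sketch of passing the values directly instead of through environment variables. This assumes your installed version of llama-index-llms-anthropic accepts region and project_id keyword arguments, and note that the Vertex AI model identifier may differ from the Anthropic API model name:
from llama_index.llms.anthropic import Anthropic

# Sketch: configuring Vertex AI directly in the constructor.
# Assumes `region` and `project_id` keyword arguments are supported by your
# installed version; the Vertex model identifier may differ from the
# Anthropic API name (it is often date-pinned on Vertex).
llm = Anthropic(
    model="claude-3-7-sonnet-latest",
    region="YOUR PROJECT REGION HERE",
    project_id="YOUR PROJECT ID HERE",
)
resp = llm.complete("Who is Paul Graham?")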
Bedrock Support¶
LlamaIndex also supports Anthropic models through AWS Bedrock.
from llama_index.llms.anthropic import Anthropic
# Note: this assumes you have standard AWS credentials configured in your environment
llm = Anthropic(
model="anthropic.claude-3-7-sonnet-20250219-v1:0",
aws_region="us-east-1",
)
resp = llm.complete("Who is Paul Graham?")
Multi-Modal Support¶
Using ChatMessage objects, you can pass in images and text to the LLM.
!wget https://cdn.pixabay.com/photo/2021/12/12/20/00/play-6865967_640.jpg -O image.jpg
from llama_index.core.llms import ChatMessage, TextBlock, ImageBlock
from llama_index.llms.anthropic import Anthropic
llm = Anthropic(model="claude-3-7-sonnet-latest")
messages = [
ChatMessage(
role="user",
blocks=[
ImageBlock(path="image.jpg"),
TextBlock(text="What is in this image?"),
],
)
]
resp = llm.chat(messages)
print(resp)
assistant: The image shows four wooden dice arranged on a dark blue or black textured surface. The dice appear to be made of light-colored wood with black dots representing the numbers. Each die shows a different face value, with various combinations of dots visible. The dice have a natural wooden finish and the classic cubic shape with rounded edges that's typical of gaming dice. This type of dice would commonly be used for board games, tabletop games, or various games of chance.
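Images can also be referenced by URL rather than a local path. A minimal sketch, assuming your installed version of llama-index-core supports the url field on ImageBlock:
# Sketch: passing the image by URL instead of a local path.
# Assumes ImageBlock supports a `url` field in your llama-index-core version.
messages = [
    ChatMessage(
        role="user",
        blocks=[
            ImageBlock(
                url="https://cdn.pixabay.com/photo/2021/12/12/20/00/play-6865967_640.jpg"
            ),
            TextBlock(text="What is in this image?"),
        ],
    )
]
resp = llm.chat(messages)
print(resp)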
Prompt Caching¶
Anthropic models support prompt caching: if a prompt is repeated multiple times, or the start of a prompt is repeated, the LLM can reuse pre-computed attention results to speed up the response and lower costs.
To enable prompt caching, you can set cache_control on your ChatMessage objects, or set cache_idx on the LLM to always cache the first X messages (with -1 meaning all messages).
from llama_index.core.llms import ChatMessage
from llama_index.llms.anthropic import Anthropic
llm = Anthropic(model="claude-3-7-latest")
# cache individual message(s)
messages = [
ChatMessage(
role="user",
content="<some very long prompt>",
additional_kwargs={"cache_control": {"type": "ephemeral"}},
),
]
resp = llm.chat(messages)
# cache first X messages (with -1 being all messages)
llm = Anthropic(model="claude-3-7-latest", cache_idx=-1)
resp = llm.chat(messages)
Structured Prediction¶
LlamaIndex provides an intuitive interface for converting any Anthropic LLM into a structured LLM through structured_predict: simply define the target Pydantic class (which can be nested), and given a prompt, we extract the desired object.
from llama_index.llms.anthropic import Anthropic
from llama_index.core.prompts import PromptTemplate
from llama_index.core.bridge.pydantic import BaseModel
from typing import List
class MenuItem(BaseModel):
    """A menu item in a restaurant."""

    course_name: str
    is_vegetarian: bool


class Restaurant(BaseModel):
    """A restaurant with name, city, and cuisine."""

    name: str
    city: str
    cuisine: str
    menu_items: List[MenuItem]
llm = Anthropic("claude-3-5-sonnet-20240620")
prompt_tmpl = PromptTemplate(
"Generate a restaurant in a given city {city_name}"
)
# Option 1: Use `as_structured_llm`
restaurant_obj = (
llm.as_structured_llm(Restaurant)
.complete(prompt_tmpl.format(city_name="Miami"))
.raw
)
# Option 2: Use `structured_predict`
# restaurant_obj = llm.structured_predict(Restaurant, prompt_tmpl, city_name="Miami")
restaurant_obj
Restaurant(name='Ocean Breeze Bistro', city='Miami', cuisine='Seafood', menu_items=[MenuItem(course_name='Grilled Mahi-Mahi', is_vegetarian=False), MenuItem(course_name='Coconut Shrimp', is_vegetarian=False), MenuItem(course_name='Key Lime Pie', is_vegetarian=True), MenuItem(course_name='Vegetable Paella', is_vegetarian=True)])
Structured Prediction with Streaming¶
Any LLM wrapped with as_structured_llm supports streaming through stream_chat.
from llama_index.core.llms import ChatMessage
from IPython.display import clear_output
from pprint import pprint
input_msg = ChatMessage.from_str("Generate a restaurant in San Francisco")
sllm = llm.as_structured_llm(Restaurant)
stream_output = sllm.stream_chat([input_msg])
for partial_output in stream_output:
    clear_output(wait=True)
    pprint(partial_output.raw.dict())
restaurant_obj = partial_output.raw
restaurant_obj
{'city': 'San Francisco', 'cuisine': 'California Fusion', 'menu_items': [{'course_name': 'Sourdough Avocado Toast', 'is_vegetarian': True}, {'course_name': 'Dungeness Crab Cioppino', 'is_vegetarian': False}, {'course_name': 'Mission-style Veggie Burrito', 'is_vegetarian': True}, {'course_name': 'Grilled Napa Valley Lamb Chops', 'is_vegetarian': False}, {'course_name': 'Vegan Ghirardelli Chocolate Mousse', 'is_vegetarian': True}], 'name': 'Golden Gate Grill'}
Restaurant(name='Golden Gate Grill', city='San Francisco', cuisine='California Fusion', menu_items=[MenuItem(course_name='Sourdough Avocado Toast', is_vegetarian=True), MenuItem(course_name='Dungeness Crab Cioppino', is_vegetarian=False), MenuItem(course_name='Mission-style Veggie Burrito', is_vegetarian=True), MenuItem(course_name='Grilled Napa Valley Lamb Chops', is_vegetarian=False), MenuItem(course_name='Vegan Ghirardelli Chocolate Mousse', is_vegetarian=True)])
Model Thinking¶
With Claude 3.7 Sonnet, you can enable the model to "think" harder about a task, generating a chain-of-thought response before writing out the final answer.
You can enable this by passing the thinking_dict parameter to the constructor, specifying the number of tokens to reserve for the thinking process.
from llama_index.llms.anthropic import Anthropic
from llama_index.core.llms import ChatMessage
llm = Anthropic(
model="claude-3-7-sonnet-latest",
# max_tokens must be greater than budget_tokens
max_tokens=64000,
# temperature must be 1.0 for thinking to work
temperature=1.0,
thinking_dict={"type": "enabled", "budget_tokens": 1600},
)
messages = [
ChatMessage(role="user", content="(1234 * 3421) / (231 + 2341) = ?")
]
resp_gen = llm.stream_chat(messages)
for r in resp_gen:
    print(r.delta, end="")
print()
print(r.message.content)
# Evaluating (1234 * 3421) / (231 + 2341) I'll solve this step by step. ## Step 1: Calculate the numerator (1234 * 3421) 1234 * 3421 = 4,221,514 ## Step 2: Calculate the denominator (231 + 2341) 231 + 2341 = 2,572 ## Step 3: Divide the numerator by the denominator 4,221,514 ÷ 2,572 = 1,641.335... Therefore: (1234 * 3421) / (231 + 2341) = 1,641.335... The exact answer is 1,641 + 862/2,572, which can be simplified to 1,641.335... # Evaluating (1234 * 3421) / (231 + 2341) I'll solve this step by step. ## Step 1: Calculate the numerator (1234 * 3421) 1234 * 3421 = 4,221,514 ## Step 2: Calculate the denominator (231 + 2341) 231 + 2341 = 2,572 ## Step 3: Divide the numerator by the denominator 4,221,514 ÷ 2,572 = 1,641.335... Therefore: (1234 * 3421) / (231 + 2341) = 1,641.335... The exact answer is 1,641 + 862/2,572, which can be simplified to 1,641.335...
print(r.message.additional_kwargs["thinking"]["signature"])
ErUBCkYIARgCIkA2LmXlUq2Lmkrlw4yPTpMD2I688kow8bnUjgP8DaEg0jXSgnTBjx0MWOJGpxQJA6Y3RVT/fGFm/X8ZDa7JXC0jEgybB8Sb5YUDH8RsEKcaDAFQAYIlE+97QPbA8yIwUaJV4/6oPFzx6PHC8ZZn8P05tcGdcR/Vp1z4mlLmjfaikz3mHzAOvQp1wunx0sa0Kh0TIbmx80VaWeU/RgFk0yIIZmkKXtCVI27VFVu8nw==
We can also expose the exact thinking process:
print(r.message.additional_kwargs["thinking"]["thinking"])
I need to calculate (1234 * 3421) / (231 + 2341) Let's start by calculating the numerator: 1234 * 3421 1234 * 3421 = (1234 * 3000) + (1234 * 400) + (1234 * 20) + (1234 * 1) = 3702000 + 493600 + 24680 + 1234 = 4221514 Now let's calculate the denominator: 231 + 2341 231 + 2341 = 2572 Finally, let's calculate the division: 4221514 / 2572 Actually, let me just double-check my calculation of 1234 * 3421. 1234 * 3421 = 1234 * 3421 Let me do this calculation differently. 1234 × 3421 ------ 1234 24680 493600 3702000 ------ 4221514 So the numerator is 4221514. Now let's calculate the denominator: 231 + 2341 = 2572 Finally, let's calculate the division: 4221514 / 2572 4221514 / 2572 = ? Let me try long division. 4221514 / 2572 = 1640.94... Actually, let me verify this with another approach. 4221514 / 2572 ≈ 4200000 / 2600 ≈ 1615.38... That's not matching my earlier calculation. Let me try the division again. 4221514 / 2572 2572 goes into 4221 about 1.64 times, which is about 1 time. 4221 - 2572 = 1649 Bring down the 5: 16495 2572 goes into 16495 about 6.41 times, which is about 6 times. 16495 - (6 * 2572) = 16495 - 15432 = 1063 Bring down the 1: 10631 2572 goes into 10631 about 4.13 times, which is about 4 times. 10631 - (4 * 2572) = 10631 - 10288 = 343 Bring down the 4: 3434 2572 goes into 3434 about 1.33 times, which is about 1 time. 3434 - 2572 = 862 Actually, I'm going to try one more approach. I'll use polynomial long division. 4221514 / 2572 = (4221514/2572) Let me calculate this directly. 4221514 / 2572 = 1641.3351... Let me double-check this by multiplying: 1641.3351 * 2572 ≈ 4221514? Let's see. That's approximately 1641 * 2572 = 4,220,652. That seems close enough (1641 * 2572 is a bit less than 4221514, which makes sense since 1641 is a bit less than 1641.3351). So our answer is 4221514 / 2572 = 1641.3351...
Tool/Function Calling¶
Anthropic supports direct tool/function calling through the API. Using LlamaIndex, we can implement some core agentic tool calling patterns.
from llama_index.core.tools import FunctionTool
from llama_index.core.llms import ChatMessage
from llama_index.llms.anthropic import Anthropic
from datetime import datetime
llm = Anthropic(model="claude-3-7-sonnet-latest")
def get_current_time() -> dict:
    """Get the current time"""
    return {"time": datetime.now().strftime("%Y-%m-%d %H:%M:%S")}
# uses the tool name, any type annotations, and docstring to describe the tool
tool = FunctionTool.from_defaults(fn=get_current_time)
We can simply do a single pass to call the tool and get the result:
resp = llm.predict_and_call([tool], "What is the current time?")
print(resp)
{'time': '2025-03-06 12:36:25'}
We can also use lower-level APIs to implement an agentic tool-calling loop!
chat_history = [ChatMessage(role="user", content="What is the current time?")]
tools_by_name = {t.metadata.name: t for t in [tool]}
resp = llm.chat_with_tools([tool], chat_history=chat_history)
tool_calls = llm.get_tool_calls_from_response(
resp, error_on_no_tool_call=False
)
if not tool_calls:
    print(resp)
else:
    while tool_calls:
        # add the LLM's response to the chat history
        chat_history.append(resp.message)

        for tool_call in tool_calls:
            tool_name = tool_call.tool_name
            tool_kwargs = tool_call.tool_kwargs

            print(f"Calling {tool_name} with {tool_kwargs}")
            tool_output = tools_by_name[tool_name].call(**tool_kwargs)
            print("Tool output: ", tool_output)
            chat_history.append(
                ChatMessage(
                    role="tool",
                    content=str(tool_output),
                    # most LLMs like Anthropic, OpenAI, etc. need to know the tool call id
                    additional_kwargs={"tool_call_id": tool_call.tool_id},
                )
            )

        resp = llm.chat_with_tools([tool], chat_history=chat_history)
        tool_calls = llm.get_tool_calls_from_response(
            resp, error_on_no_tool_call=False
        )

    print("Final response: ", resp.message.content)
Calling get_current_time with {} Tool output: {'time': '2025-03-06 12:43:36'} Final response: The current time is 12:43:36 PM on March 6, 2025.