Using LLMs#

Tip

For a list of our supported LLMs and a comparison of their functionality, check out our LLM module guide.

One of the first steps when building an LLM-based application is deciding which LLM to use; they have different strengths and price points, and you may wish to use more than one.

LlamaIndex provides a single interface to a large number of different LLMs. Using an LLM can be as simple as installing the appropriate integration:

pip install llama-index-llms-openai

And then calling it in a one-liner:

from llama_index.llms.openai import OpenAI

response = OpenAI().complete("William Shakespeare is ")
print(response)

Note that this requires an API key called OPENAI_API_KEY in your environment; see the starter tutorial for more details.

complete is also available as an async method, acomplete.
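
For example, a minimal sketch of calling it from async code (assuming the same OPENAI_API_KEY setup as above):

import asyncio

from llama_index.llms.openai import OpenAI


async def main():
    # Await the async variant of complete
    response = await OpenAI().acomplete("William Shakespeare is ")
    print(response)


asyncio.run(main())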

You can also get a streaming response by calling stream_complete, which returns a generator that yields tokens as they are produced:

handle = OpenAI().stream_complete("William Shakespeare is ")

for token in handle:
    print(token.delta, end="", flush=True)

stream_complete is also available as an async method, astream_complete.
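
For example, a minimal async streaming sketch (same setup as above):

import asyncio

from llama_index.llms.openai import OpenAI


async def main():
    # Awaiting astream_complete returns an async generator of responses
    handle = await OpenAI().astream_complete("William Shakespeare is ")
    async for token in handle:
        print(token.delta, end="", flush=True)


asyncio.run(main())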

Chat interface#

The LLM class also implements a chat method, which allows you to have more sophisticated interactions:

from llama_index.core.llms import ChatMessage
from llama_index.llms.openai import OpenAI

llm = OpenAI()
messages = [
    ChatMessage(role="system", content="You are a helpful assistant."),
    ChatMessage(role="user", content="Tell me a joke."),
]
chat_response = llm.chat(messages)
print(chat_response.message.content)

stream_chat and astream_chat are also available.
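
For example, continuing with the llm and messages defined above, a streaming chat loop looks much like streaming a completion (a minimal sketch):

handle = llm.stream_chat(messages)

for token in handle:
    print(token.delta, end="", flush=True)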

Specifying models#

Many LLM integrations provide more than one model. You can specify a model by passing the model parameter to the LLM constructor:

llm = OpenAI(model="gpt-4o-mini")
response = llm.complete("Who is Laurie Voss?")
print(response)

Multi-Modal LLMs#

Some LLMs support multi-modal chat messages. This means that you can pass in a mix of text and other modalities (images, audio, video, etc.) and the LLM will handle it.

Currently, LlamaIndex supports text, images, and audio inside ChatMessages using content blocks.

from llama_index.core.llms import ChatMessage, TextBlock, ImageBlock
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o")

messages = [
    ChatMessage(
        role="user",
        blocks=[
            ImageBlock(path="image.png"),
            TextBlock(text="Describe the image in a few sentences."),
        ],
    )
]

resp = llm.chat(messages)
print(resp.message.content)

Tool Calling#

Some LLMs (OpenAI, Anthropic, Gemini, Ollama, etc.) support tool calling directly over API calls -- this means tools and functions can be called without specific prompts and parsing mechanisms.

from pydantic import BaseModel
from llama_index.core.tools import FunctionTool
from llama_index.llms.openai import OpenAI


class Song(BaseModel):
    name: str
    artist: str


def generate_song(name: str, artist: str) -> Song:
    """Generates a song with provided name and artist."""
    return Song(name=name, artist=artist)


tool = FunctionTool.from_defaults(fn=generate_song)

llm = OpenAI(model="gpt-4o")
response = llm.predict_and_call(
    [tool],
    "Pick a random song for me",
)
print(str(response))

For more details on advanced tool calling, check out the in-depth guide using OpenAI. The same approaches work for any LLM that supports tools/functions (e.g. Anthropic, Gemini, Ollama, etc.).
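
Many function-calling LLMs also expose lower-level helpers that return the raw tool calls so you can run the tools yourself; the sketch below assumes the chat_with_tools and get_tool_calls_from_response helpers available on function-calling LLMs, and reuses the llm and tool defined above:

# Sketch: assumes a function-calling LLM; inspects the raw tool calls
# instead of executing them automatically.
chat_response = llm.chat_with_tools([tool], user_msg="Pick a random song for me")
tool_calls = llm.get_tool_calls_from_response(
    chat_response, error_on_no_tool_call=False
)
for tool_call in tool_calls:
    print(tool_call.tool_name, tool_call.tool_kwargs)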

You can learn more about tools and agents in the tools guide.

Available LLMs#

We support integrations with OpenAI, Anthropic, Mistral, DeepSeek, Hugging Face, and dozens more. Check out our module guide to LLMs for a full list, including how to run a local model.

Tip

A general note on privacy and LLM usage can be found on the privacy page.

Using a local LLM#

LlamaIndex doesn't just support hosted LLM APIs; you can also run a local model such as Meta's Llama 3. For example, if you have Ollama installed and running:

from llama_index.llms.ollama import Ollama

llm = Ollama(model="llama3.3", request_timeout=60.0)
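
Once Ollama is serving and the model has been pulled, you can use it like any other LLM (a minimal sketch; the prompt is just an example):

response = llm.complete("Who is Laurie Voss?")
print(response)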

See the custom LLM how-to for more details on using and configuring LLM models.