Ollama - Llama 3.1¶
Setup¶
First, follow the readme to set up and run a local Ollama instance.
When the Ollama app is running on your local machine:
- All of your local models are automatically served on localhost:11434
- Select your model when setting llm = Ollama(..., model="<model family>:<version>")
- Increase the default timeout (30 seconds) if needed by setting Ollama(..., request_timeout=300.0) (see the sketch after this list)
- If you set llm = Ollama(..., model="<model family>") without a version, it will simply look for the latest
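As a quick illustration of these options, here is a minimal sketch of constructing the client; the base_url and timeout values shown are assumptions for a default local install, not requirements:

from llama_index.llms.ollama import Ollama

# Minimal sketch (assumes a default local Ollama install on localhost:11434).
llm = Ollama(
    model="llama3.1:latest",  # "<model family>:<version>"; ":latest" if omitted
    base_url="http://localhost:11434",  # where Ollama serves local models
    request_timeout=300.0,  # raise the 30-second default for slow models
)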
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-llms-ollama
from llama_index.llms.ollama import Ollama
llm = Ollama(model="llama3.1:latest", request_timeout=120.0)
resp = llm.complete("Who is Paul Graham?")
print(resp)
Paul Graham is a British-American computer scientist, entrepreneur, and writer. He's best known for co-founding several successful startups, including viaweb (which later became Yahoo!'s shopping site), O'Reilly Media's online bookstore, and Y Combinator, a well-known startup accelerator.

Here are some interesting facts about Paul Graham:

1. **Computer science background**: Graham has a Ph.D. in computer science from Harvard University.
2. **Startup success**: He co-founded viaweb, which was acquired by Yahoo! for $49 million, and later became the foundation of Yahoo!'s shopping site.
3. **Y Combinator**: In 2005, Graham co-founded Y Combinator, a startup accelerator that has funded over 2,000 companies, including Dropbox, Airbnb, Reddit, and Stripe.
4. **Writing career**: Graham is also a talented writer and has published several essays on entrepreneurship, startups, and programming. His writing is known for its clarity, humor, and insight.
5. **Philosophical views**: Graham has expressed interest in philosophical ideas related to startup culture, such as the importance of experimentation, iteration, and individual freedom.

Some popular writings by Paul Graham include:

* "How To Make Wealth" (essay on building wealth through startups)
* "The Three Colors of Money" (essay on how money influences people's behavior)
* "Startup = Growth" (essay on the key characteristics of successful startups)

Overall, Paul Graham is a respected figure in the tech industry and startup world, known for his entrepreneurial spirit, writing talent, and commitment to helping others succeed.
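Printing the response is convenient, but the generated text is also available programmatically; a small sketch, assuming the standard CompletionResponse interface:

# The completion object exposes the generated text directly.
print(resp.text[:100])  # first 100 characters of the answer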
Call chat with a list of messages¶
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.chat(messages)
print(resp)
assistant: Me hearty! Me name be Captain Zara "Blackheart" McSnazz, the most feared and infamous pirate to ever sail the Seven Seas! *adjusts eye patch*

Me ship, the "Maverick's Revenge", be a sturdy galleon with three masts and a hull as black as me heart. She's fast, she's fierce, and she's got more cannons than a small army!

So, what brings ye to these fair waters? Are ye lookin' for adventure, treasure, or just a good swabbin' of the decks?
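The chat result wraps a ChatMessage, so you can pull out the role and content fields instead of printing the whole response; a sketch, assuming the same ChatResponse interface used later in this notebook:

# Access the reply's role and text via the wrapped ChatMessage.
print(resp.message.role)
print(resp.message.content)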
Streaming¶
Using stream_complete endpoint¶
response = llm.stream_complete("Who is Paul Graham?")
for r in response:
    print(r.delta, end="")
Paul Graham is a British-American entrepreneur, programmer, and essayist. He's best known for co-founding the online startup accelerator Y Combinator (YC) with his partner Jessica Livingston in 2005.

Graham was born in London, England in 1964. He developed an interest in computer programming at a young age and attended the University of California, Berkeley, where he earned a degree in Applied Math. After college, he worked as a programmer for several companies, including Bell Labs.

In the early 1990s, Graham became interested in online communities and started a website called "The Daily WTF" (an acronym for "There's Probably Not A God"). However, it was his essay "How to Make Wealth History," written in 2002, that really caught attention. In the essay, he argued that the Internet had made it possible for entrepreneurs to create wealth without needing to be wealthy themselves.

Encouraged by this idea, Graham and Livingston started Y Combinator (YC) as a way to support and fund startups with innovative ideas. The program's goal was to provide seed funding, mentorship, and resources to help young companies grow quickly. Since its inception, YC has invested in over 2,000 companies, including well-known successes like Airbnb, Dropbox, Reddit, and Twitch.

Today, Graham is a respected voice on the topic of entrepreneurship, innovation, and startup success. His essays and writings have been widely read and discussed online, and he's often invited to speak at conferences and events around the world.

Some popular essays by Paul Graham include:

* "How to Make Wealth History" (2002)
* "The 100-Year Buy" (2013) - an essay about the impact of Moore's Law on innovation
* "What You'll Do"
* "Startup = Growth"
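An async variant is available as well; a minimal sketch, assuming astream_complete mirrors the delta interface above (top-level await works in notebooks):

# Async streaming: each chunk carries the newly generated text in r.delta.
response = await llm.astream_complete("Who is Paul Graham?")
async for r in response:
    print(r.delta, end="")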
Using stream_chat endpoint¶
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.stream_chat(messages)
for r in resp:
    print(r.delta, end="")
Yer lookin' fer me name, eh? Well, matey, I be Captain Calico Blackbeak, the most feared and infamous pirate to ever sail the seven seas! Me name's as colorful as me parrot, Polly, and me reputation's as black as me trusty cutlass.

Now, don't ye be thinkin' that just 'cause me name's got "Blackbeak" in it, I'm a scurvy dog with a heart o' stone. No sir! I've got a heart o' gold, hidden deep beneath me tough exterior, and I'd do anything to protect me crew and me ship, the "Maverick's Revenge".

So, what be yer business here, matey? Are ye lookin' fer a swashbucklin' adventure or just wantin' to hear tales o' the high seas?
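Chat streaming also has an async counterpart; a sketch, assuming astream_chat yields the same delta chunks over the messages list defined above:

# Async chat streaming over the same messages list.
resp = await llm.astream_chat(messages)
async for r in resp:
    print(r.delta, end="")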
JSON Mode¶
Ollama also supports a JSON mode, which tries to ensure all responses are valid JSON.
This is particularly useful when trying to run tools that need to parse structured outputs.
llm = Ollama(model="llama3.1:latest", request_timeout=120.0, json_mode=True)
response = llm.complete(
    "Who is Paul Graham? Output as a structured JSON object."
)
print(str(response))
{ "Name": "Paul Graham", "Wikipedia_URL": "https://en.wikipedia.org/wiki/Paul_Graham_(programmer)", "Brief_Description": "American computer programmer, entrepreneur, venture capitalist, and essayist.", "Occupations": [ {"Year":null,"Job":"Programmer","Company":null}, {"Year":1997,"Job":"Founder","Company":"Viaweb"}, {"Year":2005,"Job":"Founder","Company":"Y Combinator"} ], "Education": [ {"Institution": "University of California, Berkeley", "Degree": "Bachelor of Arts"}, {"Institution": "Harvard University", "Degree": "Master of Arts"} ], "Awards": [ {"Name": null,"Year":null} ], "Notable_Algorithms": [ {"Algorithm_name":"Viaweb algorithm","Year":1997} ] }
Structured Outputs¶
We can also attach a Pydantic class to the LLM to ensure structured outputs.
from llama_index.core.bridge.pydantic import BaseModel
from llama_index.core.tools import FunctionTool


class Song(BaseModel):
    """A song with name and artist."""

    name: str
    artist: str
llm = Ollama(model="llama3.1:latest", request_timeout=120.0)
sllm = llm.as_structured_llm(Song)
response = sllm.chat([ChatMessage(role="user", content="Name a random song!")])
print(response.message.content)
{"name": "Yesterday", "artist": "The Beatles"}
Or with async
response = await sllm.achat(
    [ChatMessage(role="user", content="Name a random song!")]
)
print(response.message.content)
{"name": "Happy Birthday to You", "artist": "Traditional"}
Currently, Ollama does not support streaming structured objects. But hopefully soon!