Ollama - Llama 3.1¶
Setup¶
First, follow the readme to set up and run a local Ollama instance.
When the Ollama app is running on your local machine:
- All of your local models are automatically served on localhost:11434
- Select your model when setting llm = Ollama(..., model="<model family>:<version>")
- Increase the default timeout (30 seconds) if needed by setting Ollama(..., request_timeout=300.0) (see the sketch after this list)
- If you set llm = Ollama(..., model="<model family>") without a version, it will simply look for the latest
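As a quick illustration of these options, here is a minimal sketch of constructing the client; the base_url and timeout values shown are assumptions for a default local install, not requirements:

from llama_index.llms.ollama import Ollama

# Minimal sketch (assumes a default local Ollama install on localhost:11434).
llm = Ollama(
    model="llama3.1:latest",  # "<model family>:<version>"; ":latest" if omitted
    base_url="http://localhost:11434",  # where Ollama serves local models
    request_timeout=300.0,  # raise the 30-second default for slow models
)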
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-llms-ollama
from llama_index.llms.ollama import Ollama
llm = Ollama(model="llama3.1:latest", request_timeout=120.0)
resp = llm.complete("Who is Paul Graham?")
print(resp)
Paul Graham is a British-American computer scientist, entrepreneur, and writer. He's best known for co-founding several successful startups, including viaweb (which later became Yahoo!'s shopping site), O'Reilly Media's online bookstore, and Y Combinator, a well-known startup accelerator.

Here are some interesting facts about Paul Graham:

1. **Computer science background**: Graham has a Ph.D. in computer science from Harvard University.
2. **Startup success**: He co-founded viaweb, which was acquired by Yahoo! for $49 million, and later became the foundation of Yahoo!'s shopping site.
3. **Y Combinator**: In 2005, Graham co-founded Y Combinator, a startup accelerator that has funded over 2,000 companies, including Dropbox, Airbnb, Reddit, and Stripe.
4. **Writing career**: Graham is also a talented writer and has published several essays on entrepreneurship, startups, and programming. His writing is known for its clarity, humor, and insight.
5. **Philosophical views**: Graham has expressed interest in philosophical ideas related to startup culture, such as the importance of experimentation, iteration, and individual freedom.

Some popular writings by Paul Graham include:

* "How To Make Wealth" (essay on building wealth through startups)
* "The Three Colors of Money" (essay on how money influences people's behavior)
* "Startup = Growth" (essay on the key characteristics of successful startups)

Overall, Paul Graham is a respected figure in the tech industry and startup world, known for his entrepreneurial spirit, writing talent, and commitment to helping others succeed.
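Printing the response is convenient, but the generated text is also available programmatically; a small sketch, assuming the standard CompletionResponse interface:

# The completion object exposes the generated text directly.
print(resp.text[:100])  # first 100 characters of the answer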
Call chat with a list of messages¶
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.chat(messages)
print(resp)
assistant: Me hearty! Me name be Captain Zara "Blackheart" McSnazz, the most feared and infamous pirate to ever sail the Seven Seas! *adjusts eye patch*

Me ship, the "Maverick's Revenge", be a sturdy galleon with three masts and a hull as black as me heart. She's fast, she's fierce, and she's got more cannons than a small army!

So, what brings ye to these fair waters? Are ye lookin' for adventure, treasure, or just a good swabbin' of the decks?
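The chat result wraps a ChatMessage, so you can pull out the role and content fields instead of printing the whole response; a sketch, assuming the same ChatResponse interface used later in this notebook:

# Access the reply's role and text via the wrapped ChatMessage.
print(resp.message.role)
print(resp.message.content)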
Streaming¶
Using stream_complete endpoint¶
response = llm.stream_complete("Who is Paul Graham?")
for r in response:
    print(r.delta, end="")
Paul Graham is a British-American entrepreneur, programmer, and essayist. He's best known for co-founding the online startup accelerator Y Combinator (YC) with his partner Jessica Livingston in 2005.

Graham was born in London, England in 1964. He developed an interest in computer programming at a young age and attended the University of California, Berkeley, where he earned a degree in Applied Math. After college, he worked as a programmer for several companies, including Bell Labs.

In the early 1990s, Graham became interested in online communities and started a website called "The Daily WTF" (an acronym for "There's Probably Not A God"). However, it was his essay "How to Make Wealth History," written in 2002, that really caught attention. In the essay, he argued that the Internet had made it possible for entrepreneurs to create wealth without needing to be wealthy themselves.

Encouraged by this idea, Graham and Livingston started Y Combinator (YC) as a way to support and fund startups with innovative ideas. The program's goal was to provide seed funding, mentorship, and resources to help young companies grow quickly. Since its inception, YC has invested in over 2,000 companies, including well-known successes like Airbnb, Dropbox, Reddit, and Twitch.

Today, Graham is a respected voice on the topic of entrepreneurship, innovation, and startup success. His essays and writings have been widely read and discussed online, and he's often invited to speak at conferences and events around the world.

Some popular essays by Paul Graham include:

* "How to Make Wealth History" (2002)
* "The 100-Year Buy" (2013) - an essay about the impact of Moore's Law on innovation
* "What You'll Do"
* "Startup = Growth"
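An async variant is available as well; a minimal sketch, assuming astream_complete mirrors the delta interface above (top-level await works in notebooks):

# Async streaming: each chunk carries the newly generated text in r.delta.
response = await llm.astream_complete("Who is Paul Graham?")
async for r in response:
    print(r.delta, end="")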
Using stream_chat endpoint¶
from llama_index.core.llms import ChatMessage

messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.stream_chat(messages)
for r in resp:
    print(r.delta, end="")
Yer lookin' fer me name, eh? Well, matey, I be Captain Calico Blackbeak, the most feared and infamous pirate to ever sail the seven seas! Me name's as colorful as me parrot, Polly, and me reputation's as black as me trusty cutlass.

Now, don't ye be thinkin' that just 'cause me name's got "Blackbeak" in it, I'm a scurvy dog with a heart o' stone. No sir! I've got a heart o' gold, hidden deep beneath me tough exterior, and I'd do anything to protect me crew and me ship, the "Maverick's Revenge".

So, what be yer business here, matey? Are ye lookin' fer a swashbucklin' adventure or just wantin' to hear tales o' the high seas?
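Chat streaming also has an async counterpart; a sketch, assuming astream_chat yields the same delta chunks over the messages list defined above:

# Async chat streaming over the same messages list.
resp = await llm.astream_chat(messages)
async for r in resp:
    print(r.delta, end="")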
JSON Mode¶
Ollama also supports a JSON mode, which tries to ensure all responses are valid JSON.
This is particularly useful when trying to run tools that need to parse structured outputs.
llm = Ollama(model="llama3.1:latest", request_timeout=120.0, json_mode=True)
response = llm.complete(
    "Who is Paul Graham? Output as a structured JSON object."
)
print(str(response))
{ "Name": "Paul Graham", "Wikipedia_URL": "https://en.wikipedia.org/wiki/Paul_Graham_(programmer)", "Brief_Description": "American computer programmer, entrepreneur, venture capitalist, and essayist.", "Occupations": [ {"Year":null,"Job":"Programmer","Company":null}, {"Year":1997,"Job":"Founder","Company":"Viaweb"}, {"Year":2005,"Job":"Founder","Company":"Y Combinator"} ], "Education": [ {"Institution": "University of California, Berkeley", "Degree": "Bachelor of Arts"}, {"Institution": "Harvard University", "Degree": "Master of Arts"} ], "Awards": [ {"Name": null,"Year":null} ], "Notable_Algorithms": [ {"Algorithm_name":"Viaweb algorithm","Year":1997} ] }
Structured Outputs¶
We can also attach a Pydantic class to the LLM to ensure structured outputs.
from llama_index.core.bridge.pydantic import BaseModel
from llama_index.core.tools import FunctionTool


class Song(BaseModel):
    """A song with name and artist."""

    name: str
    artist: str
llm = Ollama(model="llama3.1:latest", request_timeout=120.0)
sllm = llm.as_structured_llm(Song)
response = sllm.chat([ChatMessage(role="user", content="Name a random song!")])
print(response.message.content)
{"name": "Yesterday", "artist": "The Beatles"}
Or with async
response = await sllm.achat(
    [ChatMessage(role="user", content="Name a random song!")]
)
print(response.message.content)
{"name": "Happy Birthday to You", "artist": "Traditional"}
Currently, Ollama does not support streaming structured objects. But hopefully soon!