Ollama¶
Setup¶
First, follow the readme to set up and run a local Ollama instance.
When the Ollama app is running on your local machine:
- All of your local models are automatically served on localhost:11434
- Select your model when setting llm = Ollama(..., model="<model family>:<version>")
- Increase the default timeout (30 seconds) if needed by setting Ollama(..., request_timeout=300.0)
- If you set llm = Ollama(..., model="<model family>") without a version, it will simply look for latest
- By default, the maximum context window for your model is used. You can manually set the context_window to limit memory usage.
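The tag-defaulting rule in the list above can be sketched in a few lines. The helper below is hypothetical (it is not part of LlamaIndex or Ollama) and only mirrors the behavior described: a model name without a version resolves to the latest tag.

```python
def with_default_tag(model: str) -> str:
    """Return `model` with an explicit tag, assuming `latest` when none is given.

    Hypothetical helper for illustration only; Ollama performs this
    resolution itself when it receives a bare model name.
    """
    # Only inspect the last path segment so a registry host with a port
    # (e.g. "localhost:5000/llama3.1") is not mistaken for a tag.
    last_segment = model.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return model
    return f"{model}:latest"


print(with_default_tag("llama3.1"))     # llama3.1:latest
print(with_default_tag("llama3.1:8b"))  # llama3.1:8b
```

Pinning an explicit tag in your own code keeps runs reproducible, since latest can point to a different build after a later ollama pull.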
If you're opening this notebook on Colab, you will probably need to install LlamaIndex 🦙.
%pip install llama-index-llms-ollama
from llama_index.llms.ollama import Ollama
llm = Ollama(
    model="llama3.1:latest",
    request_timeout=120.0,
    # Manually set the context window to limit memory usage
    context_window=8000,
)
resp = llm.complete("Who is Paul Graham?")
print(resp)
Paul Graham is a British-American entrepreneur, programmer, and writer. He's a prominent figure in the technology industry, known for his insights on entrepreneurship, programming, and culture. Here are some key facts about Paul Graham: 1. **Founder of Y Combinator**: In 2005, Graham co-founded Y Combinator (YC), a startup accelerator program that provides seed funding to early-stage companies. YC has since become one of the most successful and influential startup accelerators in the world. 2. **Successful entrepreneur**: Before starting YC, Graham had already founded several successful startups, including Viaweb (acquired by Yahoo! in 2000), which developed an online store for eBay sellers, and PCGenie (sold to Apple). 3. **Writer and essayist**: Graham has written numerous essays on topics such as entrepreneurship, programming, and technology culture, which have been published on his website, paulgraham.com. His writing often explores the intersection of business, technology, and human behavior. 4. **Thought leader in startup community**: Through Y Combinator and his writings, Graham has become a respected thought leader among entrepreneurs, investors, and programmers. He's known for his straightforward advice on starting a successful company and his critiques of various aspects of the tech industry. 5. **Education and background**: Paul Graham earned his bachelor's degree in philosophy from the University of Cambridge and later attended Harvard University, where he studied computer science. Graham is widely regarded as one of the most influential figures in the startup world, with many entrepreneurs and investors looking to him for guidance on how to build successful companies.
Call chat with a list of messages¶
from llama_index.core.llms import ChatMessage
messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.chat(messages)
print(resp)
assistant: Ye be wantin' to know me name, eh? Well, matey, I be Captain Calico Jack "Blackbeak" McCoy, the most infamous buccaneer to ever sail the Seven Seas! *adjusts eye patch* Me ship, the "Maverick's Revenge", be a sturdy galleon with three masts and a hull black as coal. She be me home, me best mate, and me ticket to riches and adventure on the high seas! And don't ye be forgettin' me trusty parrot sidekick, Polly! She be squawkin' out sea shanties and insults to any landlubber who gets too close to our ship. *winks* So, what brings ye to these waters? Be ye lookin' for a bit o' treasure, or just wantin' to hear tales of me swashbucklin' exploits?
Streaming¶
Using the stream_complete endpoint
response = llm.stream_complete("Who is Paul Graham?")
for r in response:
    print(r.delta, end="")
Paul Graham is a British-American programmer, writer, and entrepreneur. He's best known for co-founding the online startup accelerator Y Combinator (YC) in 2005, which has become one of the most successful and influential startup accelerators in the world. Graham was born in 1964 in Cambridge, England. He studied philosophy at Durham University and later moved to the United States to work as a programmer. In the early 1990s, he co-founded several startups, including Viaweb (later renamed to PayPal), which was acquired by eBay in 2002 for $1.5 billion. In 2005, Graham co-founded Y Combinator with his fellow entrepreneurs Ron Conway and Robert Targ. The accelerator's goal is to help early-stage startups succeed by providing them with funding, mentorship, and networking opportunities. Over the years, YC has invested in over 2,000 companies, including notable successes like Dropbox, Airbnb, Reddit, and Stripe. Graham is also a prolific writer and blogger on topics related to technology, entrepreneurship, and business. His essays have been widely read and shared online, and he's known for his insightful commentary on the tech industry. Some of his most popular essays include "The 4 Types of Startup Advice" and "What You'll Wish You Had Done." In addition to his work with Y Combinator, Graham has also written several books on programming, business, and philosophy. He's a sought-after speaker at conferences and events, and has been recognized for his contributions to the tech industry. Overall, Paul Graham is a respected figure in the startup world, known for his entrepreneurial spirit, insightful writing, and commitment to helping early-stage companies succeed.
Using the stream_chat endpoint
from llama_index.core.llms import ChatMessage
messages = [
    ChatMessage(
        role="system", content="You are a pirate with a colorful personality"
    ),
    ChatMessage(role="user", content="What is your name"),
]
resp = llm.stream_chat(messages)
for r in resp:
    print(r.delta, end="")
Me hearty! Me name be Captain Cutlass "Blackheart" McCoy, the most feared and revered pirate to ever sail the Seven Seas! *adjusts bandana* Me ship, the "Maverick's Revenge", be me home sweet home, and me crew, the "Misfits of Mayhem", be me trusty mates in plunderin' and pillagin'! We sail the seas in search of treasure, adventure, and a good swig o' grog! So, what brings ye to these waters? Are ye lookin' fer a swashbucklin' good time, or maybe just wantin' to know how to find yer lost parrot, Polly?
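Both loops above print only the delta of each chunk, meaning the newly generated text, so the caller has to accumulate the pieces if the full response is needed afterwards. A minimal sketch of that pattern follows. Here fake_stream is a stand-in (an assumption, not a LlamaIndex API) for llm.stream_complete(...) or llm.stream_chat(...), so the example runs without a model.

```python
from types import SimpleNamespace
from typing import Iterable, Iterator


def fake_stream() -> Iterator[SimpleNamespace]:
    """Stand-in for a streaming call; each chunk exposes a `.delta` string."""
    for piece in ["Paul ", "Graham ", "is ", "a ", "programmer."]:
        yield SimpleNamespace(delta=piece)


def collect_stream(chunks: Iterable) -> str:
    """Print each delta as it arrives and return the concatenated text."""
    parts = []
    for chunk in chunks:
        # Some chunks may carry no new text; treat a missing delta as empty.
        delta = chunk.delta or ""
        print(delta, end="", flush=True)
        parts.append(delta)
    print()
    return "".join(parts)


full_text = collect_stream(fake_stream())
```

With a real model, the same helper would presumably be called as collect_stream(llm.stream_complete("Who is Paul Graham?")).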
JSON Mode¶
Ollama also supports a JSON mode, which tries to ensure all responses are valid JSON.
This is particularly useful when trying to run tools that need to parse structured outputs.
llm = Ollama(
    model="llama3.1:latest",
    request_timeout=120.0,
    json_mode=True,
    # Manually set the context window to limit memory usage
    context_window=8000,
)
response = llm.complete(
    "Who is Paul Graham? Output as a structured JSON object."
)
print(str(response))
{ "name": "Paul Graham", " occupation": ["Computer Programmer", "Entrepreneur", "Venture Capitalist"], "bestKnownFor": ["Co-founder of Y Combinator (YC)", "Creator of Hacker News"], "books": ["Hackers & Painters: Big Ideas from the Computer Age", "The Lean Startup"], "education": ["University College London (UCL)", "Harvard University"], "awards": ["PC Magazine's Programmer of the Year award"], "netWorth": ["estimated to be around $500 million"], "personalWebsite": ["https://paulgraham.com/"] }
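Because JSON mode only tries to produce valid JSON, it is prudent to parse the response defensively before handing it to other tools. The sketch below uses only the standard library. Here parse_json_response is a hypothetical helper, and sample is a shortened stand-in for str(response).

```python
import json
from typing import Any, Optional


def parse_json_response(text: str) -> Optional[Any]:
    """Return the decoded JSON value, or None if `text` is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# Shortened stand-in for the model output shown above.
sample = '{"name": "Paul Graham", "bestKnownFor": ["Co-founder of Y Combinator (YC)"]}'
data = parse_json_response(sample)
if data is not None:
    print(data["name"])  # Paul Graham
```

Valid JSON is not the same as correct content: the keys and values are whatever the model chose, so downstream code should still check for the fields it expects.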
Structured Outputs¶
We can also attach a Pydantic class to the LLM to ensure structured outputs. This will use Ollama's built-in structured output capabilities for a given Pydantic class.
from llama_index.core.bridge.pydantic import BaseModel
class Song(BaseModel):
    """A song with name and artist."""

    name: str
    artist: str
llm = Ollama(
    model="llama3.1:latest",
    request_timeout=120.0,
    # Manually set the context window to limit memory usage
    context_window=8000,
)
sllm = llm.as_structured_llm(Song)
from llama_index.core.llms import ChatMessage
response = sllm.chat([ChatMessage(role="user", content="Name a random song!")])
print(response.message.content)
{"name":"Hey Ya!","artist":"OutKast"}
Or with async
response = await sllm.achat(
    [ChatMessage(role="user", content="Name a random song!")]
)
print(response.message.content)
{"name":"Mr. Blue Sky","artist":"Electric Light Orchestra (ELO)"}
You can also stream structured outputs! Streaming a structured output is a little different from streaming a normal string: at each step, the generator yields the most up-to-date version of the structured object.
response_gen = sllm.stream_chat(
    [ChatMessage(role="user", content="Name a random song!")]
)
for r in response_gen:
    print(r.message.content)
{"name":null,"artist":null}
{"name":null,"artist":null}
{"name":null,"artist":null}
{"name":null,"artist":null}
{"name":null,"artist":null}
{"name":"","artist":null}
{"name":"Mr","artist":null}
{"name":"Mr.","artist":null}
{"name":"Mr. Blue","artist":null}
{"name":"Mr. Blue Sky","artist":null}
{"name":"Mr. Blue Sky","artist":null}
{"name":"Mr. Blue Sky","artist":null}
{"name":"Mr. Blue Sky","artist":null}
{"name":"Mr. Blue Sky","artist":null}
{"name":"Mr. Blue Sky","artist":null}
{"name":"Mr. Blue Sky","artist":""}
{"name":"Mr. Blue Sky","artist":"Electric"}
{"name":"Mr. Blue Sky","artist":"Electric Light"}
{"name":"Mr. Blue Sky","artist":"Electric Light Orchestra"}
{"name":"Mr. Blue Sky","artist":"Electric Light Orchestra"}
{"name":"Mr. Blue Sky","artist":"Electric Light Orchestra"}
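As the output above shows, early snapshots contain null or partially filled fields, so a consumer usually wants either live updates or just the final object. The sketch below picks the last snapshot in which every field is populated. The snapshots list is a shortened copy of the output above, standing in for the r.message.content values, and last_complete_snapshot is a hypothetical helper, not a LlamaIndex function.

```python
import json
from typing import Iterable, Optional


def last_complete_snapshot(snapshots: Iterable[str]) -> Optional[dict]:
    """Return the last decoded snapshot whose fields are all non-null."""
    result = None
    for raw in snapshots:
        obj = json.loads(raw)
        # Keep overwriting so the final, most complete object wins.
        if all(value is not None for value in obj.values()):
            result = obj
    return result


snapshots = [
    '{"name":null,"artist":null}',
    '{"name":"Mr. Blue","artist":null}',
    '{"name":"Mr. Blue Sky","artist":""}',
    '{"name":"Mr. Blue Sky","artist":"Electric Light"}',
    '{"name":"Mr. Blue Sky","artist":"Electric Light Orchestra"}',
]
song = last_complete_snapshot(snapshots)
print(song)
```

A non-null field is not necessarily finished (the artist grows from "Electric" to "Electric Light Orchestra"), so only the value obtained after the stream ends should be treated as final.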
Multi-Modal Support¶
Ollama supports multi-modal models, and the Ollama LLM class natively supports images out of the box.
This leverages the content blocks feature of the chat messages.
Here, we leverage the llama3.2-vision model to answer a question about an image. If you don't have this model yet, you'll want to run ollama pull llama3.2-vision.
!wget "https://pbs.twimg.com/media/GVhGD1PXkAANfPV?format=jpg&name=4096x4096" -O ollama_image.jpg
from llama_index.core.llms import ChatMessage, TextBlock, ImageBlock
from llama_index.llms.ollama import Ollama
llm = Ollama(
    model="llama3.2-vision",
    request_timeout=120.0,
    # Manually set the context window to limit memory usage
    context_window=8000,
)
messages = [
    ChatMessage(
        role="user",
        blocks=[
            TextBlock(text="What type of animal is this?"),
            ImageBlock(path="ollama_image.jpg"),
        ],
    ),
]
resp = llm.chat(messages)
print(resp)
assistant: The image depicts a cartoon alpaca wearing VR goggles and a headset, with the Google logo displayed prominently in front of it. The alpaca is characterized by its distinctive long neck, soft wool, and a pair of VR goggles perched atop its head. It is also equipped with a headset that is connected to the goggles. The alpaca's body is depicted as a cloud, which is a common visual representation of the Google brand. The overall design of the image is playful and humorous, with the alpaca's VR goggles and headset giving it a futuristic and tech-savvy appearance.
Close enough ;)