
Starter Tutorial (Local Models)


Make sure you've followed the custom installation steps first.

This is our famous "5 lines of code" starter example with local LLM and embedding models. We will use BAAI/bge-base-en-v1.5 as our embedding model and Llama3 served through Ollama.

Download data

This example uses the text of Paul Graham's essay, "What I Worked On". This and many other examples can be found in the examples folder of our repo.

The easiest way to get it is to download it via this link and save it in a folder called data.


Ollama is a tool that helps you run LLMs locally. It is currently supported on macOS and Linux; on Windows, you can install it through WSL 2.

Follow the README to learn how to install it.

To download the Llama3 model, run ollama pull llama3.

NOTE: You will need a machine with at least 32GB of RAM.
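Before moving on, it can help to confirm that the model was actually pulled. The sketch below is one way to check, assuming Ollama is running at its default local address (http://localhost:11434) and using its /api/tags endpoint, which lists downloaded models. The helper names has_model and model_is_pulled are hypothetical and not part of any library.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434"


def has_model(tags_payload: dict, model: str) -> bool:
    """Return True if `model` appears in an /api/tags response payload."""
    names = [m.get("name", "") for m in tags_payload.get("models", [])]
    # Ollama reports tagged names such as "llama3:latest",
    # so compare against the part before the colon as well.
    return any(n == model or n.split(":")[0] == model for n in names)


def model_is_pulled(model: str = "llama3", base_url: str = OLLAMA_URL) -> bool:
    """Ask the local Ollama server whether `model` has been downloaded."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
            return has_model(json.load(resp), model)
    except (OSError, ValueError):
        # Server not running, unreachable, or returned something unexpected.
        return False


if __name__ == "__main__":
    print("llama3 available:", model_is_pulled())
```

If this prints False, check that the Ollama server is running and that the pull completed.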

To import llama_index.llms.ollama, you should run pip install llama-index-llms-ollama.

To import llama_index.embeddings.huggingface, you should run pip install llama-index-embeddings-huggingface.
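A quick way to confirm both packages are installed is to look the modules up without importing them. This is a minimal sketch using only the standard library; missing_modules is a hypothetical helper name.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package (for example llama_index) is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


required = ["llama_index.llms.ollama", "llama_index.embeddings.huggingface"]
print("missing:", missing_modules(required))
```

An empty list means both integrations are importable; any name listed still needs its pip install.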

More integrations are all listed on

Load data and build an index

In the same folder where you created the data folder, create a Python file with the following:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

documents = SimpleDirectoryReader("data").load_data()

# bge-base embedding model
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

# ollama
Settings.llm = Ollama(model="llama3", request_timeout=360.0)

index = VectorStoreIndex.from_documents(documents)

This builds an index over the documents in the data folder (which in this case just consists of the essay text, but could contain many documents).

Your directory structure should look like this:

└── data
    └── paul_graham_essay.txt

We use the BAAI/bge-base-en-v1.5 model through our HuggingFaceEmbedding class and our Ollama LLM wrapper to load in the Llama3 model. Learn more in the Local Embedding Models page.

Query your data

Add the following lines to the same file:

query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)

This creates an engine for Q&A over your index and asks a simple question. You should get back a response similar to the following: The author wrote short stories and tried to program on an IBM 1401.
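Conceptually, a query engine over a vector index embeds the question, retrieves the stored chunks whose embeddings are most similar, and passes them to the LLM to compose an answer. The toy sketch below illustrates only the retrieval step. It is not LlamaIndex's implementation: the 3-dimensional vectors are hand-made stand-ins for real embeddings, and cosine and top_k are hypothetical helper names.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to query_vec."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hand-made vectors stand in for real embeddings (illustrative only).
chunks = [
    ("wrote short stories", [0.9, 0.1, 0.0]),
    ("programmed an IBM 1401", [0.8, 0.3, 0.1]),
    ("started a company", [0.0, 0.2, 0.9]),
]

print(top_k([1.0, 0.2, 0.0], chunks))
```

In the real pipeline the vectors come from the embedding model, and the retrieved text is placed into the prompt sent to Llama3.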

You can view logs and persist or load the index in the same way as in our starter example.
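Persisting is worthwhile with local models because re-embedding the documents on every run is slow. The following is a minimal sketch, assuming Settings.embed_model and Settings.llm have already been set as in the script above and that the index is stored in a ./storage folder (an assumed location). has_saved_index and load_or_build_index are hypothetical helper names; the LlamaIndex calls used are persist, StorageContext.from_defaults, and load_index_from_storage.

```python
import os

PERSIST_DIR = "./storage"  # assumed location for the saved index


def has_saved_index(persist_dir: str = PERSIST_DIR) -> bool:
    """Return True if persist_dir exists and contains at least one entry."""
    return os.path.isdir(persist_dir) and bool(os.listdir(persist_dir))


def load_or_build_index(data_dir: str = "data", persist_dir: str = PERSIST_DIR):
    """Load a saved index if present; otherwise build one and save it."""
    # Imported lazily so has_saved_index works without llama-index installed.
    from llama_index.core import (
        SimpleDirectoryReader,
        StorageContext,
        VectorStoreIndex,
        load_index_from_storage,
    )

    if has_saved_index(persist_dir):
        storage_context = StorageContext.from_defaults(persist_dir=persist_dir)
        return load_index_from_storage(storage_context)

    documents = SimpleDirectoryReader(data_dir).load_data()
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=persist_dir)
    return index
```

Calling load_or_build_index() in place of the from_documents line builds the index on the first run and reloads it on later runs.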