For a list of our supported LLMs and a comparison of their functionality, check out our LLM module guide.
One of the first steps when building an LLM-based application is deciding which LLM to use; you can also use more than one if you wish.
LLMs are used at multiple different stages of your pipeline:
During Indexing you may use an LLM to determine the relevance of data (whether to index it at all) or you may use an LLM to summarize the raw data and index the summaries instead.
During Querying LLMs can be used in two ways:
During Retrieval (fetching data from your index) LLMs can be given an array of options (such as multiple different indices) and make decisions about where best to find the information you’re looking for. An agentic LLM can also use tools at this stage to query different data sources.
During Response Synthesis (turning the retrieved data into an answer) an LLM can combine answers to multiple sub-queries into a single coherent answer, or it can transform data, such as from unstructured text to JSON or another programmatic output format.
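The stages above can be sketched in plain Python. This is a minimal illustration, not LlamaIndex code: `fake_llm`, `should_index`, `route_query`, and `synthesize` are hypothetical names, and the stand-in model answers from simple keyword rules so that the example runs offline.

```python
from typing import Callable, Dict, List

# Assumption: an "LLM" is any callable mapping prompt text to completion text.
LLM = Callable[[str], str]


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model (keyword rules only)."""
    task, _, body = prompt.partition("\n")
    if task == "RELEVANT?":
        return "yes" if "llama" in body.lower() else "no"
    if task.startswith("ROUTE"):
        return "docs" if "install" in body.lower() else "blog"
    if task == "COMBINE":
        return " ".join(body.splitlines())
    return ""


def should_index(llm: LLM, text: str) -> bool:
    """Indexing: ask the LLM whether a piece of raw data is worth indexing."""
    return llm("RELEVANT?\n" + text).strip().lower() == "yes"


def route_query(llm: LLM, question: str, indices: Dict[str, List[str]]) -> str:
    """Retrieval: let the LLM choose which index to search."""
    choice = llm("ROUTE " + ", ".join(indices) + "\n" + question).strip()
    # Fall back to the first index if the model names one that does not exist.
    return choice if choice in indices else next(iter(indices))


def synthesize(llm: LLM, partial_answers: List[str]) -> str:
    """Response synthesis: merge sub-query answers into one answer."""
    return llm("COMBINE\n" + "\n".join(partial_answers))


raw = ["LlamaIndex connects LLMs to your data.", "Unrelated cookie banner text."]
kept = [text for text in raw if should_index(fake_llm, text)]

indices = {"docs": ["pip install llama-index"], "blog": ["Release notes"]}
chosen = route_query(fake_llm, "How do I install it?", indices)

answer = synthesize(fake_llm, ["Install with pip.", "Then load your documents."])
print(kept, chosen, answer)
```

In a real pipeline each call to `fake_llm` would be a call to a hosted or local model, and the prompts would be far more carefully written.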
LlamaIndex provides a single interface to a large number of different LLMs, allowing you to pass in any LLM you choose to any stage of the pipeline. It can be as simple as this:
    from llama_index.llms import OpenAI

    response = OpenAI().complete("Paul Graham is ")
    print(response)
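To illustrate why a single interface makes models interchangeable, here is a small offline sketch. It is not how LlamaIndex is implemented: `CompletionLLM`, `EchoLLM`, `UpperLLM`, and `finish_sentence` are hypothetical names, and the two toy "models" only manipulate the prompt string.

```python
from typing import Protocol


class CompletionLLM(Protocol):
    """Hypothetical minimal interface: anything with complete(prompt) -> str."""

    def complete(self, prompt: str) -> str: ...


class EchoLLM:
    """Toy model that appends a fixed marker to the prompt."""

    def complete(self, prompt: str) -> str:
        return prompt + "[echo]"


class UpperLLM:
    """Toy model that upper-cases the prompt."""

    def complete(self, prompt: str) -> str:
        return prompt.upper()


def finish_sentence(llm: CompletionLLM, prompt: str) -> str:
    """A pipeline stage that depends only on the interface, not on a vendor."""
    return llm.complete(prompt)


# The same stage runs unchanged with either model.
outputs = [finish_sentence(model, "Paul Graham is ") for model in (EchoLLM(), UpperLLM())]
print(outputs)
```

Because `finish_sentence` only relies on `complete`, swapping one model for another requires no change to the stage itself.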
Usually you will instantiate an LLM and pass it to a ServiceContext, which you then pass to other stages of the pipeline, as in this example:
    from llama_index.llms import OpenAI
    from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext

    llm = OpenAI(temperature=0.1, model="gpt-4")
    service_context = ServiceContext.from_defaults(llm=llm)

    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(
        documents, service_context=service_context
    )
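In this case, you’ve instantiated OpenAI and customized it to use the gpt-4 model instead of the default gpt-3.5-turbo, and also modified the temperature. Anything built on this index, such as a query engine, will now use gpt-4 to generate responses. Encoding, or embedding, your documents for later querying is handled by the embedding model configured on the ServiceContext rather than by gpt-4 itself.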
A ServiceContext is a bundle of configuration data that you pass into different parts of LlamaIndex. You can learn more about ServiceContext and how to customize it, including using multiple ServiceContexts to use multiple LLMs.
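The idea of one configuration bundle per LLM can be sketched as follows. This is an illustrative stand-in, not the real ServiceContext class: `ConfigBundle`, its fields, and `describe_stage` are hypothetical, and no model is actually called.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class ConfigBundle:
    """Hypothetical stand-in for a configuration bundle (not ServiceContext)."""

    llm_model: str
    temperature: float = 0.1


def describe_stage(stage: str, config: ConfigBundle) -> str:
    """Report which settings a pipeline stage would run with."""
    return f"{stage} uses {config.llm_model} (temperature={config.temperature})"


# One bundle per LLM: a cheaper model for summarization, a stronger one for answers.
summarize_config = ConfigBundle(llm_model="gpt-3.5-turbo")
answer_config = replace(summarize_config, llm_model="gpt-4")

plan: List[str] = [
    describe_stage("summarization", summarize_config),
    describe_stage("response synthesis", answer_config),
]
print("\n".join(plan))
```

Each stage receives the bundle it needs, so two stages of the same pipeline can use different models without sharing global state.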
We support integrations with OpenAI, Hugging Face, PaLM, and more. Check out our module guide to LLMs for a full list, including how to run a local model.
Using a local LLM
LlamaIndex doesn’t just support hosted LLM APIs; you can also run a model such as Llama 2 locally.
Once you have a local LLM such as Llama 2 installed, you can use it like this:
    from llama_index import ServiceContext

    service_context = ServiceContext.from_defaults(llm="local")
This will use llama2-chat-13B with LlamaCPP, and assumes you have llama-cpp-python installed. A full LlamaCPP usage guide is available.
See the custom LLM’s How-To for more details.