🚀 Contributing to LlamaIndex#
Welcome to LlamaIndex! We’re excited that you want to contribute and become part of our growing community. Whether you're interested in building integrations, fixing bugs, or adding exciting new features, we've made it easy for you to get started.
🎯 Quick Start Guide#
If you're ready to dive in, here’s a quick setup guide to get you going:
- Fork the repo and clone your fork.
- Navigate to the project folder:
cd llama_index
- Set up a new virtual environment with Poetry:
poetry shell
- Install development (and/or docs) dependencies:
poetry install --only dev,docs --no-root
- Install the package(s) you want to work on. You will always need llama-index-core:
pip install -e llama-index-core
From there, you can install specific integrations that you want to work on:
pip install -e llama-index-integrations/llms/llama-index-llms-openai
That’s it! If anything seems unclear, scroll down to the Development Guidelines for more details.
🛠️ What Can You Work On?#
There are plenty of ways to contribute. Whether you’re a seasoned Python developer or just starting out, your contributions are welcome! Here are some ideas:
1. 🆕 Extend Core Modules#
Help us extend LlamaIndex's functionality by contributing to any of our core modules. Think of this as unlocking new superpowers for LlamaIndex!
- New Integrations (e.g., connecting new LLMs, storage systems, or data sources)
- Data Loaders, Vector Stores, and more!
Explore the different modules below to get inspired!
2. 📦 Contribute Tools, Readers, Packs, or Datasets#
Create new Packs, Readers, or Tools that simplify how others use LlamaIndex with various platforms.
3. 🧠 Add New Features#
Have an idea for a feature that could make LlamaIndex even better? Go for it! We love innovative contributions.
4. 🐛 Fix Bugs#
Fixing bugs is a great way to start contributing. Head over to our GitHub Issues page and find bugs tagged as good first issue.
5. 🎉 Share Usage Examples#
If you’ve used LlamaIndex in a unique or creative way, consider sharing guides or notebooks. This helps other developers learn from your experience.
6. 🧪 Experiment#
Got an out-there idea? We’re open to experimental features—test it out and make a PR!
7. 📄 Improve Documentation & Code Quality#
Help make the project easier to navigate by refining the docs or cleaning up the codebase. Every improvement counts!
🔥 How to Extend LlamaIndex’s Core Modules#
Data Loaders#
A data loader ingests data from any source and converts it into Document objects that LlamaIndex can parse and index.
- Interface:
  - load_data: Returns a list of Document objects.
  - lazy_load_data: Returns an iterable of Document objects (useful for large datasets).
Example: MongoDB Reader
💡 Ideas: Want to load data from a source not yet supported? Build a new data loader and submit a PR!
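To make the load_data / lazy_load_data interface concrete, here is a minimal sketch of a loader that turns each non-empty line of a text blob into a document. It is an illustration only: Document is a stand-in dataclass and LineReader is a hypothetical name, chosen so the snippet runs without any dependencies. A real loader would typically subclass BaseReader from llama_index.core.readers.base and return LlamaIndex Document objects.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class Document:
    """Stand-in for LlamaIndex's Document (assumption: text plus metadata)."""
    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class LineReader:
    """Hypothetical loader: one Document per non-empty line of a text blob."""

    def lazy_load_data(self, raw: str, source: str = "memory") -> Iterable[Document]:
        # Yield documents one at a time instead of building the whole list up front.
        for line_no, line in enumerate(raw.splitlines(), start=1):
            if line.strip():
                yield Document(
                    text=line.strip(),
                    metadata={"source": source, "line": line_no},
                )

    def load_data(self, raw: str, source: str = "memory") -> List[Document]:
        # Eager variant: materialize the lazy iterator into a list.
        return list(self.lazy_load_data(raw, source))


docs = LineReader().load_data("first line\n\nsecond line\n")
```

Implementing load_data on top of lazy_load_data keeps the two methods consistent: callers with small inputs get a list, while callers with large inputs can stream.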
Node Parsers#
A node parser converts Document objects into Node objects: atomic chunks of data that LlamaIndex works with.
- Interface:
  - get_nodes_from_documents: Returns a list of Node objects.
Example: Hierarchical Node Parser
💡 Ideas: Add new ways to structure hierarchical relationships in documents, like play-act-scene or chapter-section formats.
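The get_nodes_from_documents interface can be sketched as follows. This is a simplified illustration, not the library's implementation: Document and Node are stand-in dataclasses, and ParagraphNodeParser is a hypothetical parser that emits one node per blank-line-separated paragraph and records which document it came from. A real parser would build on the base classes in llama_index.core.node_parser.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Document:
    """Stand-in for LlamaIndex's Document."""
    doc_id: str
    text: str


@dataclass
class Node:
    """Stand-in for a LlamaIndex node: a chunk plus a pointer to its source."""
    text: str
    ref_doc_id: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class ParagraphNodeParser:
    """Hypothetical parser: one Node per blank-line-separated paragraph."""

    def get_nodes_from_documents(self, documents: List[Document]) -> List[Node]:
        nodes: List[Node] = []
        for doc in documents:
            paragraphs = [p.strip() for p in doc.text.split("\n\n") if p.strip()]
            for position, paragraph in enumerate(paragraphs):
                nodes.append(
                    Node(
                        text=paragraph,
                        ref_doc_id=doc.doc_id,
                        metadata={"position": position},
                    )
                )
        return nodes


nodes = ParagraphNodeParser().get_nodes_from_documents(
    [Document(doc_id="doc-1", text="Act I opens.\n\nScene 1 begins.")]
)
```

A hierarchical parser would extend the same idea by storing parent and child references between nodes rather than only a position.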
Text Splitters#
A text splitter breaks down large text blocks into smaller chunks—this is key for working with LLMs that have limited context windows.
- Interface:
  - split_text: Takes a string and returns smaller strings (chunks).
Example: Token Text Splitter
💡 Ideas: Build specialized text splitters for different content types, like code, dialogues, or dense data!
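As an illustration of split_text, the sketch below slices text into overlapping windows. It makes two simplifying assumptions: whitespace-separated words stand in for tokens, and the class name WordWindowSplitter and its chunk_size / chunk_overlap parameters are hypothetical. A token-based splitter would count tokenizer tokens instead of words.

```python
from typing import List


class WordWindowSplitter:
    """Hypothetical splitter: fixed-size windows of whitespace-separated words."""

    def __init__(self, chunk_size: int = 5, chunk_overlap: int = 1) -> None:
        if not 0 <= chunk_overlap < chunk_size:
            raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap

    def split_text(self, text: str) -> List[str]:
        words = text.split()
        step = self.chunk_size - self.chunk_overlap  # how far each window advances
        chunks: List[str] = []
        for start in range(0, len(words), step):
            chunks.append(" ".join(words[start:start + self.chunk_size]))
            if start + self.chunk_size >= len(words):
                break  # this window already reached the end of the text
        return chunks


chunks = WordWindowSplitter(chunk_size=4, chunk_overlap=1).split_text(
    "one two three four five six seven"
)
# chunks == ["one two three four", "four five six seven"]
```

The overlap repeats the tail of one chunk at the head of the next, so a sentence cut at a chunk boundary still appears with some context on both sides.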
Vector Stores#
Store embeddings and retrieve them via similarity search with vector stores.
- Interface: add, delete, query, get_nodes, delete_nodes, clear
Example: Pinecone Vector Store
💡 Ideas: Create support for vector databases that aren't yet integrated!
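To show how a few of these methods fit together, here is a minimal in-memory sketch covering add, delete, query, and clear. The signatures are deliberately simplified and are assumptions for illustration, not those of the real vector store base class. Node is a stand-in dataclass, InMemoryVectorStore is a hypothetical name, and brute-force cosine similarity is used in place of a real vector database.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Node:
    """Stand-in for an embedded LlamaIndex node."""
    node_id: str
    ref_doc_id: str
    embedding: Sequence[float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical store with simplified signatures (not the real base class)."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Node] = {}

    def add(self, nodes: List[Node]) -> List[str]:
        for node in nodes:
            self._nodes[node.node_id] = node
        return [node.node_id for node in nodes]

    def delete(self, ref_doc_id: str) -> None:
        # Drop every node that came from the given source document.
        self._nodes = {
            i: n for i, n in self._nodes.items() if n.ref_doc_id != ref_doc_id
        }

    def query(self, embedding: Sequence[float], top_k: int = 2) -> List[Tuple[str, float]]:
        # Brute-force scan: score every node, return the top_k most similar.
        scored = [(n.node_id, cosine(embedding, n.embedding)) for n in self._nodes.values()]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

    def clear(self) -> None:
        self._nodes.clear()


store = InMemoryVectorStore()
store.add([
    Node("a", "doc-1", [1.0, 0.0]),
    Node("b", "doc-1", [0.0, 1.0]),
    Node("c", "doc-2", [0.9, 0.1]),
])
top = store.query([1.0, 0.0], top_k=2)
```

An integration for an external database would keep the same shape but delegate storage and similarity search to that database's client.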
Query Engines & Retrievers#
- Query Engines implement query to return structured responses.
- Retrievers retrieve relevant nodes based on queries.
💡 Ideas: Design fancy query engines that combine retrievers or add intelligent processing layers!
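The relationship between the two can be sketched as follows: a retriever returns scored nodes, and a query engine combines one or more retrievers and returns a structured response. Everything here is hypothetical and simplified for illustration (ScoredNode, KeywordRetriever, MergingQueryEngine, and the dict-shaped response); the real base classes live in llama_index.core and a real engine would usually hand the retrieved nodes to an LLM for synthesis.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ScoredNode:
    """Stand-in for a retrieved node with a relevance score."""
    text: str
    score: float


class KeywordRetriever:
    """Hypothetical retriever: scores nodes by how many query words they contain."""

    def __init__(self, corpus: List[str]) -> None:
        self._corpus = corpus

    def retrieve(self, query: str) -> List[ScoredNode]:
        terms = set(query.lower().split())
        hits = []
        for text in self._corpus:
            overlap = len(terms & set(text.lower().split()))
            if overlap:
                hits.append(ScoredNode(text=text, score=float(overlap)))
        return sorted(hits, key=lambda node: node.score, reverse=True)


class MergingQueryEngine:
    """Hypothetical engine: merges several retrievers into one structured response."""

    def __init__(self, retrievers: List[KeywordRetriever]) -> None:
        self._retrievers = retrievers

    def query(self, query: str) -> dict:
        best: Dict[str, ScoredNode] = {}  # de-duplicate by text, keep the highest score
        for retriever in self._retrievers:
            for node in retriever.retrieve(query):
                if node.text not in best or node.score > best[node.text].score:
                    best[node.text] = node
        sources = sorted(best.values(), key=lambda node: node.score, reverse=True)
        # A real engine would pass the sources to an LLM; here we echo the top hit.
        answer = sources[0].text if sources else ""
        return {"response": answer, "source_nodes": sources}


engine = MergingQueryEngine([
    KeywordRetriever(["vector stores hold embeddings", "loaders read files"]),
    KeywordRetriever(["vector stores hold embeddings", "retrievers rank nodes"]),
])
result = engine.query("how do vector stores work")
```

Returning the source nodes alongside the answer lets callers inspect which chunks the response was based on.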
✨ Steps to Contribute#
- Fork the repository on GitHub.
- Clone your fork to your local machine.
git clone https://github.com/your-username/llama_index.git
- Create a branch for your work.
git checkout -b your-feature-branch
- Set up your environment (follow the Quick Start Guide).
- Work on your feature or bugfix, ensuring you have unit tests covering your code.
- Commit your changes, then push them to your fork.
git push origin your-feature-branch
- Open a pull request on GitHub.
And voilà—your contribution is ready for review!
🧑‍💻 Development Guidelines#
Repo Structure#
LlamaIndex is organized as a monorepo, meaning different packages live within this single repository. You can focus on a specific package depending on your contribution:
- Core package: llama-index-core/
- Integrations: e.g., llama-index-integrations/
Setting Up Your Environment#
- Install Poetry (if you don’t already have it):
curl -sSL https://install.python-poetry.org | python3 -
- Activate the environment:
poetry shell
- Install dependencies:
poetry install --only dev,docs --no-root
- Install the package(s) you want to work on. You will always need llama-index-core:
pip install -e llama-index-core
From there, you can install specific integrations that you want to work on:
pip install -e llama-index-integrations/llms/llama-index-llms-openai
Running Tests#
We use pytest for testing. Make sure you run tests in each package you modify:
pytest
If you’re integrating with a remote system, mock it to prevent test failures from external changes.
By default, CI/CD will fail if test coverage is less than 50%, so please add tests for your code!
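One way to mock a remote system is to pass the client into your code and substitute a MagicMock in the test. The sketch below assumes a hypothetical client with a list_issues method and a hypothetical count_open_issues function; neither is part of LlamaIndex. The test never touches the network, so it cannot fail because the remote service changed.

```python
from unittest.mock import MagicMock


def count_open_issues(client, repo: str) -> int:
    """Hypothetical integration code that depends on a remote API client."""
    issues = client.list_issues(repo)
    return sum(1 for issue in issues if issue["state"] == "open")


def test_count_open_issues_with_mocked_client() -> None:
    # Replace the remote client with a mock that returns canned data.
    client = MagicMock()
    client.list_issues.return_value = [
        {"state": "open"},
        {"state": "closed"},
        {"state": "open"},
    ]

    assert count_open_issues(client, "owner/repo") == 2
    # Verify the code under test called the client as expected.
    client.list_issues.assert_called_once_with("owner/repo")
```

Because the function name starts with test_, pytest collects it automatically when the file name matches its discovery pattern (for example test_*.py).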
👥 Join the Community#
We’d love to hear from you and collaborate! Join our Discord community to ask questions, share ideas, or just chat with fellow developers.
Join us on Discord: https://discord.gg/dGcwcsnxhU
🌟 Acknowledgements#
Thank you for considering contributing to LlamaIndex! Every contribution—whether it’s code, documentation, or ideas—helps make this project better for everyone.
Happy coding! 😊