Open In Colab

Simple Directory Reader

The SimpleDirectoryReader is the most commonly used data connector that just works.
Simply pass in a input directory or a list of files.
It will select the best file reader based on the file extensions.

Get Started

If you’re opening this Notebook on colab, you will probably need to install LlamaIndex 🦙.

!pip install llama-index
from llama_index import SimpleDirectoryReader

Download Data

!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'

Load specific files

reader = SimpleDirectoryReader(
    input_files=["./data/paul_graham/paul_graham_essay.txt"]
)
docs = reader.load_data()
print(f"Loaded {len(docs)} docs")
Loaded 1 docs

Load all (top-level) files from directory

reader = SimpleDirectoryReader(input_dir="../../end_to_end_tutorials/")
docs = reader.load_data()
print(f"Loaded {len(docs)} docs")
Loaded 72 docs

Load all (recursive) files from directory

# only load markdown files
required_exts = [".md"]

reader = SimpleDirectoryReader(
    input_dir="../../end_to_end_tutorials",
    required_exts=required_exts,
    recursive=True,
)
docs = reader.load_data()
print(f"Loaded {len(docs)} docs")
Loaded 174 docs

Full Configuration

This is the full list of arguments that can be passed to the SimpleDirectoryReader:

class SimpleDirectoryReader(BaseReader):
    """Simple directory reader.

    Load files from file directory. 
    Automatically select the best file reader given file extensions.


    Args:
        input_dir (str): Path to the directory.
        input_files (List): List of file paths to read
            (Optional; overrides input_dir, exclude)
        exclude (List): glob of python file paths to exclude (Optional)
        exclude_hidden (bool): Whether to exclude hidden files (dotfiles).
        encoding (str): Encoding of the files.
            Default is utf-8.
        errors (str): how encoding and decoding errors are to be handled,
                see https://docs.python.org/3/library/functions.html#open
        recursive (bool): Whether to recursively search in subdirectories.
            False by default.
        filename_as_id (bool): Whether to use the filename as the document id.
            False by default.
        required_exts (Optional[List[str]]): List of required extensions.
            Default is None.
        file_extractor (Optional[Dict[str, BaseReader]]): A mapping of file
            extension to a BaseReader class that specifies how to convert that file
            to text. If not specified, use default from DEFAULT_FILE_READER_CLS.
        num_files_limit (Optional[int]): Maximum number of files to read.
            Default is None.
        file_metadata (Optional[Callable[str, Dict]]): A function that takes
            in a filename and returns a Dict of metadata for the Document.
            Default is None.
"""