Docling
DoclingReader
Bases: BasePydanticReader
Docling Reader.
Extracts PDF, DOCX, and other document formats into LlamaIndex Documents as either Markdown or JSON-serialized Docling native format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
export_type | Literal[markdown, json] | The type to export to. Defaults to "markdown". | required |
doc_converter | DocumentConverter | The Docling converter to use. Default factory: | required |
md_export_kwargs | Dict[str, Any] | Kwargs to use in case of markdown export. Defaults to | required |
id_func | DocIDGenCallable | Optional. Doc ID generation function to use. Default: | required |
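
The parameters above can be assembled as in the following minimal sketch. `build_reader_kwargs` and `make_reader` are hypothetical helpers introduced here for illustration; they are not part of the integration. The import path `llama_index.readers.docling` is inferred from the source path listed on this page, and passing `export_type` as a plain string is assumed from the `Literal[markdown, json]` type shown in the table.

```python
from typing import Any, Dict, Optional

# The two export types named in the parameter table.
ALLOWED_EXPORT_TYPES = ("markdown", "json")


def build_reader_kwargs(
    export_type: str = "markdown",
    md_export_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate and collect DoclingReader keyword arguments."""
    if export_type not in ALLOWED_EXPORT_TYPES:
        raise ValueError(
            f"export_type must be one of {ALLOWED_EXPORT_TYPES}, got {export_type!r}"
        )
    kwargs: Dict[str, Any] = {"export_type": export_type}
    if md_export_kwargs is not None:
        # Per the table, these are only used for markdown export.
        kwargs["md_export_kwargs"] = dict(md_export_kwargs)
    return kwargs


def make_reader(**kwargs: Any) -> Any:
    """Hypothetical helper: construct the reader from the collected kwargs."""
    # Imported lazily so the validation helper above works without
    # the Docling integration installed. Import path is an assumption.
    from llama_index.readers.docling import DoclingReader

    return DoclingReader(**kwargs)
```

With the integration installed, `make_reader(**build_reader_kwargs(export_type="json"))` would be one way to obtain a reader that emits JSON-serialized Docling documents; omitting all arguments leaves every parameter at its default.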
Source code in llama-index-integrations/readers/llama-index-readers-docling/llama_index/readers/docling/base.py
lazy_load_data
lazy_load_data(file_path: str | Path | Iterable[str] | Iterable[Path], extra_info: dict | None = None) -> Iterable[Document]
Lazily load from given source.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str \| Path \| Iterable[str] \| Iterable[Path] | Document source as a single str (URL or local file) or pathlib.Path, or an iterable thereof. | required |
extra_info | dict \| None | Any pre-existing metadata to include. Defaults to None. | None |
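
Because `file_path` accepts either one source or an iterable of sources, callers who build their own wrappers often normalize the argument first. The sketch below shows one way to do that. `as_source_list` is a hypothetical helper, not the reader's actual implementation. Note that a `str` is itself iterable, so it has to be checked explicitly before the general iterable case.

```python
from pathlib import Path
from typing import Iterable, List, Union

Source = Union[str, Path]


def as_source_list(file_path: Union[Source, Iterable[Source]]) -> List[Source]:
    """Hypothetical helper: return the sources as a list, wrapping a single one."""
    # A lone str (URL or local file) or Path counts as one source.
    if isinstance(file_path, (str, Path)):
        return [file_path]
    # Anything else is treated as an iterable of sources (list, tuple, generator...).
    return list(file_path)
```

For example, `as_source_list("report.pdf")` yields a one-element list, while a list that mixes URLs and `Path` objects is returned with its elements unchanged.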
Returns:
Type | Description |
---|---|
Iterable[Document] | Iterable over the created LlamaIndex documents. |
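
Since the method returns an iterable rather than a list, a caller can stop early. The sketch below takes only the first `n` documents. `first_n_documents` is a hypothetical helper, and it assumes (as "lazily" suggests, but this page does not state outright) that the reader produces each document only when the iterable is advanced.

```python
from itertools import islice
from typing import Any, Dict, Iterable, List, Optional


def first_n_documents(
    reader: Any,
    sources: Iterable[Any],
    n: int,
    extra_info: Optional[Dict[str, Any]] = None,
) -> List[Any]:
    """Hypothetical helper: collect at most n documents from lazy_load_data."""
    # Keyword names follow the lazy_load_data signature shown above.
    docs = reader.lazy_load_data(file_path=sources, extra_info=extra_info)
    # islice stops pulling from the iterable once n items have been taken.
    return list(islice(docs, n))
```

Here `reader` is expected to be a `DoclingReader` instance, though any object exposing a compatible `lazy_load_data` method works with this helper.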
Source code in llama-index-integrations/readers/llama-index-readers-docling/llama_index/readers/docling/base.py