File
CSVReader #
Bases: BaseReader
CSV parser.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
concat_rows | bool | Whether to concatenate all rows into one document. If set to False, a Document will be created for each row. True by default. | True |
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/tabular/base.py
load_data #
load_data(file: Path, extra_info: Optional[Dict] = None) -> List[Document]
Parse file.
Returns:
Type | Description |
---|---|
List[Document] | A list of documents: a single Document if concat_rows is True, otherwise one Document per row. |
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/tabular/base.py
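The effect of concat_rows can be sketched with the standard library alone. This is an illustration of the described behavior, not the reader's implementation: the helper name csv_to_texts and the ", " and newline separators are assumptions, and plain strings stand in for Document objects.

```python
import csv
import io
from typing import List


def csv_to_texts(csv_text: str, concat_rows: bool = True) -> List[str]:
    """Return one text for the whole file, or one text per row."""
    # Join the cells of each row with ", " (an assumed separator).
    rows = [", ".join(row) for row in csv.reader(io.StringIO(csv_text))]
    if concat_rows:
        # A single text holding every row, one row per line.
        return ["\n".join(rows)]
    # One text per row, mirroring "a Document will be created for each row".
    return rows


sample = "name,age\nAda,36\nAlan,41\n"
print(csv_to_texts(sample))
print(csv_to_texts(sample, concat_rows=False))
```

With concat_rows=True the result has exactly one entry; with False it has one entry per CSV row, header included.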
DocxReader #
Bases: BaseReader
Docx parser.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/docs/base.py
load_data #
load_data(file: Path, extra_info: Optional[Dict] = None, fs: Optional[AbstractFileSystem] = None) -> List[Document]
Parse file.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/docs/base.py
EpubReader #
Bases: BaseReader
Epub Parser.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/epub/base.py
load_data #
load_data(file: Path, extra_info: Optional[Dict] = None, fs: Optional[AbstractFileSystem] = None) -> List[Document]
Parse file.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/epub/base.py
FlatReader #
Bases: BaseReader
Flat reader.
Extracts raw text from a file and saves the file type in the metadata.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/flat/base.py
load_data #
load_data(file: Path, extra_info: Optional[Dict] = None) -> List[Document]
Parse file into string.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/flat/base.py
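As a rough sketch of what "raw text plus file type in the metadata" can look like, the snippet below reads a file and records its name and suffix. The helper read_flat and the metadata key names are hypothetical and are not taken from the library.

```python
import tempfile
from pathlib import Path
from typing import Dict


def read_flat(file: Path) -> Dict[str, object]:
    """Read a file as raw text and record its name and suffix."""
    text = file.read_text(encoding="utf-8")
    # Metadata key names are illustrative, not the library's.
    metadata = {"filename": file.name, "extension": file.suffix}
    return {"text": text, "metadata": metadata}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.txt"
    path.write_text("plain text body", encoding="utf-8")
    record = read_flat(path)

print(record)
```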
HTMLTagReader #
Bases: BaseReader
Read HTML files and extract text from a specific tag with BeautifulSoup.
By default, reads the text from the <section> tag.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/html/base.py
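The idea of pulling text out of one specific tag can be sketched without third-party packages. The reader itself is documented as using BeautifulSoup; the version below swaps in the standard library's html.parser so it runs anywhere, and the names TagTextExtractor and extract_tag_text are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class TagTextExtractor(HTMLParser):
    """Collect the text found inside each occurrence of one tag."""

    def __init__(self, tag: str = "section") -> None:
        super().__init__()
        self.tag = tag
        self.depth = 0  # how many target tags are currently open
        self.chunks: List[str] = []
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Outermost target tag closed: emit its accumulated text.
                self.chunks.append(" ".join(self._buffer))
                self._buffer = []

    def handle_data(self, data):
        # Keep only non-blank text that sits inside the target tag.
        if self.depth > 0 and data.strip():
            self._buffer.append(data.strip())


def extract_tag_text(html: str, tag: str = "section") -> List[str]:
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.chunks


page = (
    "<html><body><h1>Title</h1>"
    "<section><p>First</p><p>part</p></section>"
    "<section>Second</section></body></html>"
)
print(extract_tag_text(page))
print(extract_tag_text(page, tag="h1"))
```

In this sketch, nested occurrences of the target tag are merged into the text of the outermost one.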
HWPReader #
Bases: BaseReader
Hwp Parser.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/docs/base.py
load_data #
load_data(file: Path, extra_info: Optional[Dict] = None, fs: Optional[AbstractFileSystem] = None) -> List[Document]
Load data and extract table from Hwp file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file | Path | Path for the Hwp file. | required |
Returns:
Type | Description |
---|---|
List[Document] | List of documents extracted from the Hwp file. |
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/docs/base.py
IPYNBReader #
Bases: BaseReader
IPYNB (Jupyter notebook) parser.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/ipynb/base.py
load_data #
load_data(file: Path, extra_info: Optional[Dict] = None, fs: Optional[AbstractFileSystem] = None) -> List[Document]
Parse file.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/ipynb/base.py
ImageCaptionReader #
Bases: BaseReader
Image parser.
Caption image using Blip.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/image_caption/base.py
load_data #
load_data(file: Path, extra_info: Optional[Dict] = None) -> List[Document]
Parse file.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/image_caption/base.py
ImageReader #
Bases: BaseReader
Image parser.
Extract text from images using DONUT or pytesseract.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/image/base.py
load_data #
load_data(file: Path, extra_info: Optional[Dict] = None, fs: Optional[AbstractFileSystem] = None) -> List[Document]
Parse file.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/image/base.py
ImageTabularChartReader #
Bases: BaseReader
Image parser.
Extract tabular data from a chart or figure.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/image_deplot/base.py
load_data #
load_data(file: Path, extra_info: Optional[Dict] = None) -> List[Document]
Parse file.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/image_deplot/base.py
ImageVisionLLMReader #
Bases: BaseReader
Image parser.
Caption image using Blip2 (a multimodal VisionLLM similar to GPT4).
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/image_vision_llm/base.py
load_data #
load_data(file: Path, extra_info: Optional[Dict] = None) -> List[Document]
Parse file.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/image_vision_llm/base.py
MarkdownReader #
Bases: BaseReader
Markdown parser.
Extracts text from markdown files. The content is split into (header, text) tuples, where each text is the content between two headers.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/markdown/base.py
markdown_to_tups #
markdown_to_tups(markdown_text: str) -> List[Tuple[Optional[str], str]]
Convert markdown text to a list of tuples.
Each tuple pairs a header (or None when there is no header) with the text under that header.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/markdown/base.py
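The header-to-text pairing can be sketched as follows. This is a simplified stand-in, not the library's code: the name split_by_headers is hypothetical, only ATX-style headers ("#", "##", ...) are recognized, and lines starting with "#" inside fenced code blocks are not treated specially.

```python
import re
from typing import List, Optional, Tuple


def split_by_headers(markdown_text: str) -> List[Tuple[Optional[str], str]]:
    """Pair each header with the text that follows it."""
    tups: List[Tuple[Optional[str], str]] = []
    header: Optional[str] = None
    lines: List[str] = []
    for line in markdown_text.split("\n"):
        match = re.match(r"^#+\s+(.*)", line)
        if match:
            # A new header starts: flush whatever was gathered so far.
            if header is not None or lines:
                tups.append((header, "\n".join(lines).strip()))
            header = match.group(1).strip()
            lines = []
        else:
            lines.append(line)
    # Flush the final section (header is None if no header was ever seen).
    tups.append((header, "\n".join(lines).strip()))
    return tups


doc = "intro\n# A\ntext a\n## B\ntext b\n"
print(split_by_headers(doc))
```

Text that appears before the first header is paired with None, which matches the Optional[str] in the return type.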
remove_images #
remove_images(content: str) -> str
Remove images in markdown content.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/markdown/base.py
remove_hyperlinks #
remove_hyperlinks(content: str) -> str
Remove hyperlinks in markdown content.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/markdown/base.py
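One way to strip images and hyperlinks from markdown is with regular expressions. The patterns below are assumptions chosen for illustration (standard inline `![alt](url)` and `[label](url)` syntax) and may differ from the patterns the reader uses; strip_images and strip_hyperlinks are hypothetical names.

```python
import re


def strip_images(content: str) -> str:
    """Remove inline images written as ![alt](url)."""
    return re.sub(r"!\[[^\]]*\]\([^)]*\)", "", content)


def strip_hyperlinks(content: str) -> str:
    """Replace inline links [label](url) with just their label."""
    # The negative lookbehind skips images, which start with "!".
    return re.sub(r"(?<!!)\[([^\]]*)\]\([^)]*\)", r"\1", content)


print(strip_images("See ![logo](a.png) here"))
print(strip_hyperlinks("Read [the docs](https://example.com)."))
```

Keeping the link label while dropping the URL preserves the readable text, which is usually what matters for indexing.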
parse_tups #
parse_tups(filepath: Path, errors: str = 'ignore', fs: Optional[AbstractFileSystem] = None) -> List[Tuple[Optional[str], str]]
Parse file into tuples.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/markdown/base.py
load_data #
load_data(file: Path, extra_info: Optional[Dict] = None, fs: Optional[AbstractFileSystem] = None) -> List[Document]
Parse file into string.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/markdown/base.py
MboxReader #
Bases: BaseReader
Mbox parser.
Extracts messages from mailbox files. Returns a string including the date, subject, sender, receiver, and content of each message.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/mbox/base.py
load_data #
load_data(file: Path, extra_info: Optional[Dict] = None, fs: Optional[AbstractFileSystem] = None) -> List[Document]
Parse file into string.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/mbox/base.py
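A minimal sketch of turning mailbox messages into per-message strings, using the standard library's mailbox module. It assumes single-part, plain-text messages; the helper name mbox_to_texts and the exact output layout are illustrative rather than the reader's actual format.

```python
import mailbox
import tempfile
from pathlib import Path
from typing import List

RAW_MBOX = (
    "From alice@example.com Mon Jan  1 10:00:00 2024\n"
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "Date: Mon, 01 Jan 2024 10:00:00 +0000\n"
    "\n"
    "First message body.\n"
)


def mbox_to_texts(path: Path) -> List[str]:
    """Build one string per message with its headers and content."""
    texts = []
    box = mailbox.mbox(str(path))
    try:
        for msg in box:
            # Assumes a single-part message whose payload is plain text.
            body = str(msg.get_payload()).strip()
            texts.append(
                f"Date: {msg['date']}\nFrom: {msg['from']}\nTo: {msg['to']}\n"
                f"Subject: {msg['subject']}\nContent: {body}"
            )
    finally:
        box.close()
    return texts


with tempfile.TemporaryDirectory() as tmp:
    mbox_path = Path(tmp) / "sample.mbox"
    mbox_path.write_text(RAW_MBOX, encoding="utf-8")
    messages = mbox_to_texts(mbox_path)

print(messages[0])
```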
PDFReader #
Bases: BaseReader
PDF parser.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/docs/base.py
load_data #
load_data(file: Path, extra_info: Optional[Dict] = None, fs: Optional[AbstractFileSystem] = None) -> List[Document]
Parse file.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/docs/base.py
PagedCSVReader #
Bases: BaseReader
Paged CSV parser.
Displays each row in an LLM-friendly format, with one document per row.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
encoding | str | Encoding used to open the file. utf-8 by default. | 'utf-8' |
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/paged_csv/base.py
load_data #
load_data(file: Path, extra_info: Optional[Dict] = None, delimiter: str = ',', quotechar: str = '"') -> List[Document]
Parse file.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/paged_csv/base.py
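To make "one row per document in an LLM-friendly format" concrete, here is a small sketch that renders each row as "column: value" lines. The layout is an assumption about what such a format could look like, rows_as_pages is a hypothetical helper, and strings stand in for Document objects. The delimiter and quotechar arguments mirror the load_data signature.

```python
import csv
import io
from typing import List


def rows_as_pages(csv_text: str, delimiter: str = ",", quotechar: str = '"') -> List[str]:
    """Render every CSV row as its own block of 'column: value' lines."""
    reader = csv.DictReader(
        io.StringIO(csv_text), delimiter=delimiter, quotechar=quotechar
    )
    # One "column: value" line per cell, one text block per row.
    return [
        "\n".join(f"{key}: {value}" for key, value in row.items())
        for row in reader
    ]


pages = rows_as_pages('city;note\nOslo;"cold; dry"\n', delimiter=";")
print(pages[0])
```

Repeating the column name next to every value keeps each row self-describing once it is separated from the header line.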
PandasCSVReader #
Bases: BaseReader
Pandas-based CSV parser.
Parses CSVs using the separator detection from the Pandas read_csv function.
If special parameters are required, use the pandas_config dict.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
concat_rows | bool | Whether to concatenate all rows into one document. If set to False, a Document will be created for each row. True by default. | True |
col_joiner | str | Separator to use for joining cols per row. Set to ", " by default. | ', ' |
row_joiner | str | Separator to use for joining each row. Only used when concat_rows=True. | '\n' |
pandas_config | dict | Options for the pandas.read_csv function call. | {} |
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/tabular/base.py
load_data #
load_data(file: Path, extra_info: Optional[Dict] = None, fs: Optional[AbstractFileSystem] = None) -> List[Document]
Parse file.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/tabular/base.py
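How col_joiner, row_joiner, and concat_rows interact can be shown on already-parsed rows, without pandas. This is a sketch of the documented parameters only; join_rows is a hypothetical helper and not part of the reader.

```python
from typing import List, Sequence


def join_rows(
    rows: Sequence[Sequence[object]],
    concat_rows: bool = True,
    col_joiner: str = ", ",
    row_joiner: str = "\n",
) -> List[str]:
    """Join cells per row, then optionally join rows into one text."""
    texts = [col_joiner.join(str(cell) for cell in row) for row in rows]
    if concat_rows:
        # row_joiner only matters on this branch.
        return [row_joiner.join(texts)]
    return texts


table = [["a", 1], ["b", 2]]
print(join_rows(table, col_joiner=" | ", row_joiner=" ; "))
print(join_rows(table, concat_rows=False, row_joiner="ignored"))
```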
PandasExcelReader #
Bases: BaseReader
Pandas-based Excel parser.
Parses Excel files using the Pandas read_excel function.
If special parameters are required, use the pandas_config dict.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
concat_rows | bool | Whether to concatenate all rows into one document. If set to False, a Document will be created for each row. True by default. | True |
sheet_name | Optional[Union[str, int]] | Defaults to None, which reads all sheets; otherwise pass a str or int to specify the sheet to read. | None |
pandas_config | dict | Options for the pandas.read_excel function call. | {} |
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/tabular/base.py
load_data #
load_data(file: Path, extra_info: Optional[Dict] = None, fs: Optional[AbstractFileSystem] = None) -> List[Document]
Parse file.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/tabular/base.py
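The three forms of sheet_name (None, str, int) can be illustrated with a plain dict standing in for a workbook. The sketch assumes an int is a zero-based sheet position, which is how pandas read_excel interprets it; select_sheets and the dict-based workbook are hypothetical.

```python
from typing import Dict, List, Optional, Union

Workbook = Dict[str, List[list]]


def select_sheets(
    workbook: Workbook, sheet_name: Optional[Union[str, int]] = None
) -> Workbook:
    """Pick all sheets (None), one by name (str), or one by position (int)."""
    if sheet_name is None:
        return dict(workbook)
    if isinstance(sheet_name, int):
        # Dicts preserve insertion order, so position maps to a sheet name.
        name = list(workbook)[sheet_name]
        return {name: workbook[name]}
    return {sheet_name: workbook[sheet_name]}


wb = {"Q1": [[1]], "Q2": [[2]]}
print(select_sheets(wb))
print(select_sheets(wb, 1))
print(select_sheets(wb, "Q1"))
```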
PptxReader #
Bases: BaseReader
Powerpoint parser.
Extract text, caption images, and specify slides.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/slides/base.py
caption_image #
caption_image(tmp_image_file: str) -> str
Generate text caption of image.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/slides/base.py
load_data #
load_data(file: Path, extra_info: Optional[Dict] = None, fs: Optional[AbstractFileSystem] = None) -> List[Document]
Parse file.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/slides/base.py
PyMuPDFReader #
Bases: BaseReader
Read PDF files using PyMuPDF library.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/pymu_pdf/base.py
load_data #
load_data(file_path: Union[Path, str], metadata: bool = True, extra_info: Optional[Dict] = None) -> List[Document]
Loads list of documents from PDF file and also accepts extra information in dict format.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/pymu_pdf/base.py
load #
load(file_path: Union[Path, str], metadata: bool = True, extra_info: Optional[Dict] = None) -> List[Document]
Loads list of documents from PDF file and also accepts extra information in dict format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | Union[Path, str] | Path to the PDF file (accepts string or Path). | required |
metadata | bool | Whether to include metadata. Defaults to True. | True |
extra_info | Optional[Dict] | Extra information related to each document, in dict format. Defaults to None. | None |
Raises:
Type | Description |
---|---|
TypeError | If extra_info is not a dictionary. |
TypeError | If file_path is not a string or Path. |
Returns:
Type | Description |
---|---|
List[Document] | List of documents. |
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/pymu_pdf/base.py
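The two documented TypeError conditions can be sketched as a small argument check. This is not the reader's code: check_load_args is a hypothetical helper and the error messages are made up for illustration.

```python
from pathlib import Path
from typing import Dict, Optional, Union


def check_load_args(
    file_path: Union[Path, str], extra_info: Optional[Dict] = None
) -> Path:
    """Validate arguments the way the Raises table describes."""
    if not isinstance(file_path, (str, Path)):
        raise TypeError("file_path must be a string or Path.")
    if extra_info is not None and not isinstance(extra_info, dict):
        raise TypeError("extra_info must be a dictionary.")
    # Normalize to Path so later code handles a single type.
    return Path(file_path)


print(check_load_args("report.pdf", {"source": "upload"}))
```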
RTFReader #
Bases: BaseReader
RTF (Rich Text Format) reader. Reads an RTF file and converts it to a Document.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/rtf/base.py
load_data #
load_data(input_file: Union[Path, str], extra_info: Optional[Dict[str, Any]] = None, **load_kwargs: Any) -> List[Document]
Load data from RTF file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_file | Union[Path, str] | Path for the RTF file. | required |
extra_info | Dict[str, Any] | Additional information. Defaults to None. | None |
Returns:
Type | Description |
---|---|
List[Document] | List of documents. |
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/rtf/base.py
UnstructuredReader #
Bases: BaseReader
General unstructured text reader for a variety of files.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/unstructured/base.py
from_api classmethod #
from_api(api_key: str, url: str = None)
Set the server url and api key.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/unstructured/base.py
load_data #
load_data(file: Optional[Path] = None, unstructured_kwargs: Optional[Dict] = None, document_kwargs: Optional[Dict] = None, extra_info: Optional[Dict] = None, split_documents: Optional[bool] = False, excluded_metadata_keys: Optional[List[str]] = None) -> List[Document]
Load data using Unstructured.io.
Depending on the configuration, if url is set or use_api is True, it'll parse the file using an API call, otherwise it parses it locally. extra_info is extended by the returned metadata if split_documents is True.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file | Optional[Path] | Path to the file to be loaded. | None |
unstructured_kwargs | Optional[Dict] | Additional arguments for unstructured partitioning. | None |
document_kwargs | Optional[Dict] | Additional arguments for document creation. | None |
extra_info | Optional[Dict] | Extra information to add to the document metadata. | None |
split_documents | Optional[bool] | Whether to split the documents. | False |
excluded_metadata_keys | Optional[List[str]] | Keys to exclude from the metadata. | None |
Returns:
Type | Description |
---|---|
List[Document] | List of parsed documents. |
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/unstructured/base.py
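The decision logic described for load_data (API call versus local parsing, and metadata extension when split_documents is True) can be sketched with two small functions. Both pick_backend and build_metadata are hypothetical helpers, and filtering out excluded_metadata_keys by simply dropping those keys is an assumption about how the exclusion works.

```python
from typing import Dict, List, Optional


def pick_backend(url: Optional[str], use_api: bool) -> str:
    """Use the API when a url is set or use_api is True, else parse locally."""
    return "api" if (url is not None or use_api) else "local"


def build_metadata(
    extra_info: Optional[Dict],
    element_metadata: Dict,
    split_documents: bool,
    excluded_metadata_keys: Optional[List[str]] = None,
) -> Dict:
    """Combine caller metadata with parser metadata for one document."""
    metadata = dict(extra_info or {})
    if split_documents:
        # Only split documents are extended with the returned metadata.
        metadata.update(element_metadata)
    for key in excluded_metadata_keys or []:
        metadata.pop(key, None)
    return metadata


print(pick_backend(None, use_api=False))
print(build_metadata({"owner": "ops"}, {"page_number": 2}, split_documents=True))
```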
VideoAudioReader #
Bases: BaseReader
Video audio parser.
Extract text from transcript of video/audio files.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/video_audio/base.py
load_data #
load_data(file: Path, extra_info: Optional[Dict] = None, fs: Optional[AbstractFileSystem] = None) -> List[Document]
Parse file.
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/video_audio/base.py
XMLReader #
Bases: BaseReader
XML reader.
Reads XML documents with options to help suss out relationships between nodes.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
tree_level_split | int | Level of the XML tree at which documents are split. Defaults to 0. | 0 |
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/xml/base.py
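Splitting an XML tree at a given level can be sketched with xml.etree.ElementTree. The sketch assumes level 0 means the root element, so the whole tree becomes one chunk, and that childless nodes above the requested level are kept rather than dropped. split_at_level is a hypothetical helper, not the reader's implementation.

```python
import xml.etree.ElementTree as ET
from typing import List


def split_at_level(xml_text: str, tree_level_split: int = 0) -> List[str]:
    """Serialize the subtrees found at the requested depth."""
    nodes = [ET.fromstring(xml_text)]
    for _ in range(tree_level_split):
        # Descend one level; childless nodes are kept so no content is lost.
        next_nodes = []
        for node in nodes:
            children = list(node)
            next_nodes.extend(children if children else [node])
        nodes = next_nodes
    return [ET.tostring(node, encoding="unicode").strip() for node in nodes]


doc = "<a><b>1</b><c><d>2</d></c></a>"
print(split_at_level(doc, 0))
print(split_at_level(doc, 1))
print(split_at_level(doc, 2))
```

A deeper split level yields more, smaller chunks, at the cost of losing the parent context around each one.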
load_data #
load_data(file: Path, extra_info: Optional[Dict] = None) -> List[Document]
Load data from the input file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file | Path | Path to the input file. | required |
extra_info | Optional[Dict] | Additional information. Default is None. | None |
Returns:
Type | Description |
---|---|
List[Document] | List of documents. |
Source code in llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/xml/base.py