Pdf table
PDFTableReader
Bases: BaseReader
PDF Table Reader. Reads tables from a PDF file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
row_separator | str | Row separator used to join rows of a DataFrame. | '\n' |
col_separator | str | Column separator used to join columns of a DataFrame. | ', ' |
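The effect of the two separators can be illustrated with a minimal sketch. It assumes each extracted table is flattened by joining the cells of a row with `col_separator` and then joining the rows with `row_separator`. The helper `table_to_text` and the sample rows are hypothetical and use plain lists rather than a DataFrame; this is not the reader's actual implementation.

```python
from typing import List, Sequence


def table_to_text(
    rows: Sequence[Sequence[object]],
    row_separator: str = "\n",
    col_separator: str = ", ",
) -> str:
    """Flatten a table (a sequence of rows) into a single string.

    Hypothetical helper: it mirrors the documented defaults of
    row_separator and col_separator, not the library's internals.
    """
    # Join the cells of each row first, then join the resulting row strings.
    row_strings: List[str] = [
        col_separator.join(str(cell) for cell in row) for row in rows
    ]
    return row_separator.join(row_strings)


# Illustrative data only.
rows = [["Name", "Qty"], ["Widget", "3"], ["Gadget", "12"]]
print(table_to_text(rows))
```

With the defaults, every table row ends up on its own line with comma-separated cells. Passing other separators (for example a tab for `col_separator`) changes only how the flattened text is delimited.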
Source code in llama-index-integrations/readers/llama-index-readers-pdf-table/llama_index/readers/pdf_table/base.py
load_data
load_data(file: Path, pages: str = '1', extra_info: Optional[Dict] = None) -> List[Document]
Load data and extract tables from a PDF file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file | Path | Path for the PDF file. | required |
pages | str | Pages to read tables from. | '1' |
extra_info | Optional[Dict] | Extra information. | None |
Returns:
Type | Description |
---|---|
List[Document] | List of documents. |
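A call to `load_data` might look like the sketch below. It assumes the `llama-index-readers-pdf-table` package and its PDF-parsing dependencies are installed, and that the class can be imported from the `base` module named in the source path. The wrapper `load_pdf_tables`, the file name `report.pdf`, and the `extra_info` contents are hypothetical, and reading `doc.text` assumes the returned documents expose their content through that attribute.

```python
from pathlib import Path
from typing import Dict, List, Optional


def load_pdf_tables(
    pdf_path: Path,
    pages: str = "1",
    extra_info: Optional[Dict] = None,
) -> List:
    """Hypothetical wrapper around PDFTableReader.load_data."""
    # Fail early with a clear error instead of handing a bad path to the reader.
    if not pdf_path.is_file():
        raise FileNotFoundError(f"No PDF found at {pdf_path}")

    # Imported lazily so this module can be imported without the package installed.
    from llama_index.readers.pdf_table.base import PDFTableReader

    # The separators shown are the documented defaults.
    reader = PDFTableReader(row_separator="\n", col_separator=", ")
    return reader.load_data(file=pdf_path, pages=pages, extra_info=extra_info)


if __name__ == "__main__":
    sample = Path("report.pdf")  # hypothetical file name
    if sample.is_file():
        docs = load_pdf_tables(sample, pages="1", extra_info={"source": sample.name})
        for doc in docs:
            print(doc.text)
```

Note that `pages` is a string rather than an integer, so the first page is requested as `'1'`.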
Source code in llama-index-integrations/readers/llama-index-readers-pdf-table/llama_index/readers/pdf_table/base.py