UnstructuredElementNodeParser

pydantic model llama_index.core.node_parser.UnstructuredElementNodeParser

Unstructured element node parser.

Splits a document into Text Nodes and Index Nodes corresponding to embedded objects (e.g. tables).

JSON schema
{
   "title": "UnstructuredElementNodeParser",
   "description": "Unstructured element node parser.\n\nSplits a document into Text Nodes and Index Nodes corresponding to embedded objects\n(e.g. tables).",
   "type": "object",
   "properties": {
      "include_metadata": {
         "title": "Include Metadata",
         "description": "Whether or not to consider metadata when splitting.",
         "default": true,
         "type": "boolean"
      },
      "include_prev_next_rel": {
         "title": "Include Prev Next Rel",
         "description": "Include prev/next node relationships.",
         "default": true,
         "type": "boolean"
      },
      "callback_manager": {
         "title": "Callback Manager",
         "type": "object",
         "default": {}
      },
      "llm": {
         "title": "Llm",
         "description": "LLM model to use for summarization.",
         "allOf": [
            {
               "$ref": "#/definitions/LLM"
            }
         ]
      },
      "summary_query_str": {
         "title": "Summary Query Str",
         "description": "Query string to use for summarization.",
         "default": "What is this table about? Give a very concise summary (imagine you are adding a new caption and summary for this table), and output the real/existing table title/caption if context provided.and output the real/existing table id if context provided.and also output whether or not the table should be kept.",
         "type": "string"
      },
      "num_workers": {
         "title": "Num Workers",
         "description": "Num of workers for async jobs.",
         "default": 4,
         "type": "integer"
      },
      "show_progress": {
         "title": "Show Progress",
         "description": "Whether to show progress.",
         "default": true,
         "type": "boolean"
      },
      "nested_node_parser": {
         "title": "Nested Node Parser",
         "description": "Other types of node parsers to handle some types of nodes.",
         "allOf": [
            {
               "$ref": "#/definitions/NodeParser"
            }
         ]
      },
      "partitioning_parameters": {
         "title": "Partitioning Parameters",
         "description": "Extra dictionary representing parameters of the partitioning process.",
         "default": {},
         "type": "object"
      },
      "class_name": {
         "title": "Class Name",
         "type": "string",
         "default": "UnstructuredElementNodeParser"
      }
   },
   "definitions": {
      "PydanticProgramMode": {
         "title": "PydanticProgramMode",
         "description": "Pydantic program mode.",
         "enum": [
            "default",
            "openai",
            "llm",
            "guidance",
            "lm-format-enforcer"
         ],
         "type": "string"
      },
      "BasePromptTemplate": {
         "title": "BasePromptTemplate",
         "description": "Chainable mixin.\n\nA module that can produce a `QueryComponent` from a set of inputs through\n`as_query_component`.\n\nIf plugged in directly into a `QueryPipeline`, the `ChainableMixin` will be\nconverted into a `QueryComponent` with default parameters.",
         "type": "object",
         "properties": {
            "metadata": {
               "title": "Metadata",
               "type": "object"
            },
            "template_vars": {
               "title": "Template Vars",
               "type": "array",
               "items": {
                  "type": "string"
               }
            },
            "kwargs": {
               "title": "Kwargs",
               "type": "object",
               "additionalProperties": {
                  "type": "string"
               }
            },
            "output_parser": {
               "title": "Output Parser",
               "type": "object",
               "default": {}
            },
            "template_var_mappings": {
               "title": "Template Var Mappings",
               "description": "Template variable mappings (Optional).",
               "type": "object"
            }
         },
         "required": [
            "metadata",
            "template_vars",
            "kwargs"
         ]
      },
      "LLM": {
         "title": "LLM",
         "description": "LLM interface.",
         "type": "object",
         "properties": {
            "callback_manager": {
               "title": "Callback Manager",
               "type": "object",
               "default": {}
            },
            "system_prompt": {
               "title": "System Prompt",
               "description": "System prompt for LLM calls.",
               "type": "string"
            },
            "output_parser": {
               "title": "Output Parser",
               "description": "Output parser to parse, validate, and correct errors programmatically.",
               "type": "object",
               "default": {}
            },
            "pydantic_program_mode": {
               "default": "default",
               "allOf": [
                  {
                     "$ref": "#/definitions/PydanticProgramMode"
                  }
               ]
            },
            "query_wrapper_prompt": {
               "title": "Query Wrapper Prompt",
               "description": "Query wrapper prompt for LLM calls.",
               "allOf": [
                  {
                     "$ref": "#/definitions/BasePromptTemplate"
                  }
               ]
            },
            "class_name": {
               "title": "Class Name",
               "type": "string",
               "default": "base_component"
            }
         }
      },
      "NodeParser": {
         "title": "NodeParser",
         "description": "Base interface for node parser.",
         "type": "object",
         "properties": {
            "include_metadata": {
               "title": "Include Metadata",
               "description": "Whether or not to consider metadata when splitting.",
               "default": true,
               "type": "boolean"
            },
            "include_prev_next_rel": {
               "title": "Include Prev Next Rel",
               "description": "Include prev/next node relationships.",
               "default": true,
               "type": "boolean"
            },
            "callback_manager": {
               "title": "Callback Manager",
               "type": "object",
               "default": {}
            },
            "class_name": {
               "title": "Class Name",
               "type": "string",
               "default": "base_component"
            }
         }
      }
   }
}
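
The schema above declares a default for most fields, while `llm` and `nested_node_parser` list none. As a minimal sketch of how to read those defaults programmatically, the snippet below walks a hand-copied excerpt of the schema. The `schema` literal and the `collect_defaults` helper are illustrative assumptions and are not part of llama_index.

```python
from typing import Any, Dict

# Hand-copied excerpt of the schema above (only a few properties are included).
schema: Dict[str, Any] = {
    "title": "UnstructuredElementNodeParser",
    "properties": {
        "include_metadata": {"default": True, "type": "boolean"},
        "num_workers": {"default": 4, "type": "integer"},
        "show_progress": {"default": True, "type": "boolean"},
        "llm": {"allOf": [{"$ref": "#/definitions/LLM"}]},  # no default listed
        "partitioning_parameters": {"default": {}, "type": "object"},
    },
}


def collect_defaults(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return {field name: default} for every property that declares a default."""
    return {
        name: spec["default"]
        for name, spec in schema.get("properties", {}).items()
        if "default" in spec
    }


defaults = collect_defaults(schema)
print(defaults)
```

Fields that do not appear in the result, such as `llm` here, are the ones for which the schema lists no default.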

Config
  • arbitrary_types_allowed: bool = True

Fields
  • partitioning_parameters (Optional[Dict[str, Any]])

Validators
  • _validate_id_func » id_func

field partitioning_parameters: Optional[Dict[str, Any]] = {}

Extra dictionary representing parameters of the partitioning process.

classmethod class_name() → str

Get the class name, used as a unique ID in serialization.

This provides a key that makes serialization robust against actual class name changes.
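
To illustrate why a stable serialization key helps, here is a minimal sketch of a registry keyed by `class_name()`. The classes `BaseComponent` and `RenamedParserV2`, the `REGISTRY` dictionary, and `from_dict` are hypothetical stand-ins written for this example. They do not reproduce the library's actual serialization code.

```python
from typing import Dict, Type


class BaseComponent:
    """Hypothetical stand-in for a serializable component."""

    @classmethod
    def class_name(cls) -> str:
        return "base_component"

    def to_dict(self) -> Dict[str, str]:
        # Store the stable ID rather than the Python class name.
        return {"class_name": self.class_name()}


class RenamedParserV2(BaseComponent):
    """The Python class was renamed, but its serialization ID was kept."""

    @classmethod
    def class_name(cls) -> str:
        return "UnstructuredElementNodeParser"


# Look classes up by their stable ID, not by cls.__name__.
REGISTRY: Dict[str, Type[BaseComponent]] = {
    cls.class_name(): cls for cls in (BaseComponent, RenamedParserV2)
}


def from_dict(data: Dict[str, str]) -> BaseComponent:
    """Rebuild a component from its serialized form."""
    return REGISTRY[data["class_name"]]()


payload = RenamedParserV2().to_dict()
restored = from_dict(payload)
print(payload, type(restored).__name__)
```

Because the payload stores `"UnstructuredElementNodeParser"` rather than `RenamedParserV2`, data written before the rename can still be loaded afterwards.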

extract_elements(text: str, table_filters: Optional[List[Callable]] = None, **kwargs: Any) → List[Element]

Extract elements from text.
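
The following standard-library sketch shows the general idea of splitting HTML into text and table elements and applying a list of table filters. It does not call the unstructured library. `SketchElement`, `_Splitter`, `extract_elements_sketch`, and `has_multiple_columns` are hypothetical names. The `id_<n>` naming, the rule that every filter must accept a table, and the demotion of rejected tables to text are assumptions made for illustration.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class SketchElement:
    """Hypothetical stand-in for the parser's Element type."""
    id: str
    type: str  # "text" or "table"
    text: str = ""
    rows: Optional[List[List[str]]] = None


class _Splitter(HTMLParser):
    """Collects ("text", str) and ("table", rows) blocks in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[Tuple[str, Any]] = []
        self._rows: Optional[List[List[str]]] = None  # rows of the open table
        self._cell: Optional[List[str]] = None  # text fragments of the open cell

    def handle_starttag(self, tag: str, attrs: Any) -> None:
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._rows.append([])
        elif tag in ("td", "th") and self._rows:
            self._cell = []

    def handle_endtag(self, tag: str) -> None:
        if tag in ("td", "th") and self._cell is not None:
            self._rows[-1].append("".join(self._cell).strip())
            self._cell = None
        elif tag == "table" and self._rows is not None:
            self.blocks.append(("table", self._rows))
            self._rows = None

    def handle_data(self, data: str) -> None:
        if self._cell is not None:
            self._cell.append(data)
        elif self._rows is None and data.strip():
            self.blocks.append(("text", data.strip()))


def extract_elements_sketch(
    html: str, table_filters: Optional[List[Callable]] = None
) -> List[SketchElement]:
    table_filters = table_filters or []
    splitter = _Splitter()
    splitter.feed(html)
    splitter.close()
    out: List[SketchElement] = []
    for idx, (kind, payload) in enumerate(splitter.blocks):
        if kind == "table" and all(f(payload) for f in table_filters):
            out.append(SketchElement(id=f"id_{idx}", type="table", rows=payload))
        elif kind == "table":
            # Assumption: a rejected table is kept as plain text to preserve context.
            flat = " ".join(cell for row in payload for cell in row)
            out.append(SketchElement(id=f"id_{idx}", type="text", text=flat))
        else:
            out.append(SketchElement(id=f"id_{idx}", type="text", text=payload))
    return out


def has_multiple_columns(rows: List[List[str]]) -> bool:
    """Example filter: accept only tables with more than one column."""
    return bool(rows) and len(rows[0]) > 1


html = (
    "<p>Revenue by year.</p>"
    "<table><tr><th>Year</th><th>Revenue</th></tr>"
    "<tr><td>2023</td><td>10</td></tr></table>"
    "<table><tr><td>note</td></tr></table>"
)
elements = extract_elements_sketch(html, table_filters=[has_multiple_columns])
for el in elements:
    print(el)
```

In this sketch the single-column table fails the filter and is returned as a text element, so only the two-column table keeps the `"table"` type.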

filter_table(table_element: Any) → bool

Filter tables.

get_nodes_from_node(node: TextNode) → List[BaseNode]

Get nodes from node.
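
The class description says a document is split into Text Nodes and Index Nodes that correspond to embedded objects such as tables. The sketch below illustrates that shape with plain dataclasses. `SketchTextNode`, `SketchIndexNode`, `nodes_from_blocks`, and the ID scheme are hypothetical. The first-row "summary" is a placeholder for the LLM-generated summary that the `llm` and `summary_query_str` fields suggest.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class SketchTextNode:
    """Hypothetical stand-in for a text node."""
    id: str
    text: str


@dataclass
class SketchIndexNode:
    """Hypothetical stand-in for an index node that points at another node."""
    id: str
    text: str  # short summary used for retrieval
    index_id: str  # ID of the node that holds the full object


SketchNode = Union[SketchTextNode, SketchIndexNode]


def nodes_from_blocks(blocks: List[Tuple[str, str]]) -> List[SketchNode]:
    """Turn ("text" | "table", content) blocks into text and index nodes."""
    nodes: List[SketchNode] = []
    for i, (kind, content) in enumerate(blocks):
        if kind == "table":
            table_id = f"id_{i}_table"
            # Placeholder summary: the table's first line (assumption for the sketch).
            summary = (content.splitlines() or [""])[0]
            nodes.append(
                SketchIndexNode(id=f"id_{i}_ref", text=summary, index_id=table_id)
            )
            nodes.append(SketchTextNode(id=table_id, text=content))
        else:
            nodes.append(SketchTextNode(id=f"id_{i}", text=content))
    return nodes


blocks = [
    ("text", "Revenue by year."),
    ("table", "Year | Revenue\n2023 | 10"),
]
nodes = nodes_from_blocks(blocks)
for node in nodes:
    print(node)
```

The index node carries only the short summary and a pointer, so a retriever can match on the summary and then follow `index_id` to the full table content.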