HTMLNodeParser

pydantic model llama_index.node_parser.HTMLNodeParser

HTML node parser.

Splits a document into Nodes by extracting the text of the HTML elements named in the tags field.

Parameters

include_metadata (bool) – whether to include metadata in nodes
include_prev_next_rel (bool) – whether to include prev/next relationships
tags (List[str]) – HTML tags to extract text from
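To make the two boolean flags concrete, here is a minimal sketch in plain Python. It assumes that include_metadata means "copy the parent document's metadata onto each node" and that include_prev_next_rel means "link each node to its neighbours". SketchNode and build_nodes are hypothetical names, not part of llama_index, and this is not the library's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SketchNode:
    # Hypothetical stand-in for a llama_index TextNode.
    node_id: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)
    prev_id: Optional[str] = None
    next_id: Optional[str] = None


def build_nodes(
    chunks: List[str],
    parent_metadata: Dict[str, str],
    include_metadata: bool = True,
    include_prev_next_rel: bool = True,
) -> List[SketchNode]:
    nodes = []
    for i, chunk in enumerate(chunks):
        # Assumption: include_metadata copies the parent's metadata onto every node.
        meta = dict(parent_metadata) if include_metadata else {}
        nodes.append(SketchNode(node_id=f"node-{i}", text=chunk, metadata=meta))
    if include_prev_next_rel:
        # Link each adjacent pair in both directions; the ends keep None.
        for prev, nxt in zip(nodes, nodes[1:]):
            prev.next_id = nxt.node_id
            nxt.prev_id = prev.node_id
    return nodes
```

With both flags at their default of True, the first node has no previous link, the last has no next link, and every node carries a copy of the parent's metadata.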

JSON schema

{
   "title": "HTMLNodeParser",
   "description": "HTML node parser.\n\nSplits a document into Nodes using custom HTML splitting logic.\n\nArgs:\n    include_metadata (bool): whether to include metadata in nodes\n    include_prev_next_rel (bool): whether to include prev/next relationships",
   "type": "object",
   "properties": {
      "include_metadata": {
         "title": "Include Metadata",
         "description": "Whether or not to consider metadata when splitting.",
         "default": true,
         "type": "boolean"
      },
      "include_prev_next_rel": {
         "title": "Include Prev Next Rel",
         "description": "Include prev/next node relationships.",
         "default": true,
         "type": "boolean"
      },
      "callback_manager": {
         "title": "Callback Manager"
      },
      "id_func": {
         "title": "Id Func"
      },
      "tags": {
         "title": "Tags",
         "description": "HTML tags to extract text from.",
         "default": [
            "p",
            "h1",
            "h2",
            "h3",
            "h4",
            "h5",
            "h6",
            "li",
            "b",
            "i",
            "u",
            "section"
         ],
         "type": "array",
         "items": {
            "type": "string"
         }
      },
      "class_name": {
         "title": "Class Name",
         "type": "string",
         "default": "HTMLNodeParser"
      }
   }
}
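The schema above declares defaults for some properties (include_metadata, include_prev_next_rel, tags, class_name) but not for others (callback_manager, id_func). The following sketch reads those defaults out of a trimmed copy of the schema using only the standard library; schema_defaults is a hypothetical helper written for illustration.

```python
import json

# Trimmed copy of the schema above: titles and descriptions removed,
# defaults and types kept as shown.
SCHEMA_JSON = """
{
  "title": "HTMLNodeParser",
  "type": "object",
  "properties": {
    "include_metadata": {"default": true, "type": "boolean"},
    "include_prev_next_rel": {"default": true, "type": "boolean"},
    "callback_manager": {"title": "Callback Manager"},
    "id_func": {"title": "Id Func"},
    "tags": {
      "default": ["p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "b", "i", "u", "section"],
      "type": "array",
      "items": {"type": "string"}
    },
    "class_name": {"type": "string", "default": "HTMLNodeParser"}
  }
}
"""


def schema_defaults(schema: dict) -> dict:
    """Return {property_name: default} for every property that declares a default."""
    return {
        name: spec["default"]
        for name, spec in schema["properties"].items()
        if "default" in spec
    }


defaults = schema_defaults(json.loads(SCHEMA_JSON))
```

Properties without a declared default, such as callback_manager, simply do not appear in the result.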

Config

arbitrary_types_allowed: bool = True

Fields

tags (List[str])

field tags: List[str] = ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'li', 'b', 'i', 'u', 'section']

HTML tags to extract text from.
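What "extract text from" a set of tags can mean is easiest to see in code. The sketch below uses the standard library's html.parser instead of whatever parser llama_index uses internally, and assumes reasonably well-formed HTML. TagTextExtractor and extract_tag_text are hypothetical names. Each tracked element yields one (tag, text) pair in document order; text inside a nested tracked element (for example b inside p) counts toward both.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class TagTextExtractor(HTMLParser):
    def __init__(self, tags: List[str]) -> None:
        super().__init__()
        self.tags = set(tags)
        self.results: List[list] = []  # [tag, text_pieces] per tracked element, in start order
        self.open: List[int] = []  # indices into results for currently open tracked elements

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.results.append([tag, []])
            self.open.append(len(self.results) - 1)

    def handle_endtag(self, tag):
        # Close the most recently opened tracked element with this name
        # (and anything still open inside it).
        for pos in range(len(self.open) - 1, -1, -1):
            if self.results[self.open[pos]][0] == tag:
                del self.open[pos:]
                break

    def handle_data(self, data):
        # Text counts toward every tracked element that is currently open.
        for idx in self.open:
            self.results[idx][1].append(data)


def extract_tag_text(html: str, tags: List[str]) -> List[Tuple[str, str]]:
    parser = TagTextExtractor(tags)
    parser.feed(html)
    parser.close()
    return [(tag, "".join(parts).strip()) for tag, parts in parser.results]
```

Text inside elements not listed in tags (a div, in the test below) is ignored, which is why narrowing tags narrows what ends up in the nodes.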

classmethod class_name() → str

Get class name.

classmethod from_defaults(include_metadata: bool = True, include_prev_next_rel: bool = True, callback_manager: Optional[CallbackManager] = None, tags: Optional[List[str]] = ['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'li', 'b', 'i', 'u', 'section']) → HTMLNodeParser

Create an HTMLNodeParser from the given arguments; any argument left out takes the default shown in the signature.
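A factory classmethod like from_defaults can be sketched with a plain dataclass that mirrors the fields documented on this page. HTMLParserConfig and DEFAULT_TAGS are hypothetical names. One deliberate difference from the documented signature is that this sketch uses None as the default for tags and substitutes a fresh copy of the default list. This is the common Python idiom for avoiding a mutable default argument, not a claim about how llama_index behaves.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DEFAULT_TAGS = ["p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "b", "i", "u", "section"]


@dataclass
class HTMLParserConfig:
    # Hypothetical stand-in mirroring the documented fields.
    include_metadata: bool = True
    include_prev_next_rel: bool = True
    tags: List[str] = field(default_factory=lambda: list(DEFAULT_TAGS))

    @classmethod
    def from_defaults(
        cls,
        include_metadata: bool = True,
        include_prev_next_rel: bool = True,
        tags: Optional[List[str]] = None,
    ) -> "HTMLParserConfig":
        # None means "use the default tag list"; copying means no two
        # instances (and no caller) ever share one list object.
        return cls(
            include_metadata=include_metadata,
            include_prev_next_rel=include_prev_next_rel,
            tags=list(tags) if tags is not None else list(DEFAULT_TAGS),
        )
```

For example, HTMLParserConfig.from_defaults(tags=["p", "h1"]) keeps both boolean defaults and restricts extraction to paragraphs and top-level headings.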

get_nodes_from_node(node: BaseNode) → List[TextNode]

Split a single node (typically a Document) into a list of TextNodes.
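Once text has been pulled out of the tracked tags, it still has to be grouped into node-sized chunks. The sketch below shows one plausible grouping strategy: runs of adjacent sections that share a tag are merged into one chunk, and empty elements are skipped. This strategy is an assumption made for illustration, not a description of what get_nodes_from_node does internally. group_consecutive is a hypothetical helper that takes (tag, text) pairs as input.

```python
from typing import List, Tuple


def group_consecutive(sections: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge runs of adjacent sections that share a tag into one (tag, text) chunk."""
    grouped: List[Tuple[str, str]] = []
    for tag, text in sections:
        if not text:
            continue  # skip elements with no text
        if grouped and grouped[-1][0] == tag:
            # Same tag as the previous chunk: append this text to it.
            prev_tag, prev_text = grouped[-1]
            grouped[-1] = (prev_tag, prev_text + "\n" + text)
        else:
            grouped.append((tag, text))
    return grouped
```

Under this strategy, a heading followed by several paragraphs yields one chunk for the heading and one for the paragraph run, so consecutive paragraphs stay together rather than becoming many tiny nodes.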