Defining and Customizing Nodes#
Nodes represent “chunks” of source Documents, whether that is a text chunk, an image, or more. They also contain metadata and relationship information with other nodes and index structures.
Nodes are a first-class citizen in LlamaIndex. You can choose to define Nodes and all its attributes directly. You may also choose to “parse” source Documents into Nodes through our NodeParser
classes.
For instance, you can do
from llama_index.node_parser import SentenceSplitter
parser = SentenceSplitter()
nodes = parser.get_nodes_from_documents(documents)
You can also choose to construct Node objects manually and skip the first section. For instance,
from llama_index.schema import TextNode, NodeRelationship, RelatedNodeInfo
node1 = TextNode(text="<text_chunk>", id_="<node_id>")
node2 = TextNode(text="<text_chunk>", id_="<node_id>")
# set relationships
node1.relationships[NodeRelationship.NEXT] = RelatedNodeInfo(
node_id=node2.node_id
)
node2.relationships[NodeRelationship.PREVIOUS] = RelatedNodeInfo(
node_id=node1.node_id
)
nodes = [node1, node2]
The RelatedNodeInfo
class can also store additional metadata
if needed:
node2.relationships[NodeRelationship.PARENT] = RelatedNodeInfo(
node_id=node1.node_id, metadata={"key": "val"}
)
Customizing the ID#
Each node has an node_id
property that is automatically generated if not manually specified. This ID can be used for
a variety of purposes; this includes being able to update nodes in storage, being able to define relationships
between nodes (through IndexNode
), and more.
You can also get and set the node_id
of any TextNode
directly.
print(node.node_id)
node.node_id = "My new node_id!"