Kùzu Graph Store¶
This notebook walks through configuring Kùzu
to be the backend for graph storage in LlamaIndex.
%pip install llama-index
%pip install llama-index-llms-openai
%pip install llama-index-graph-stores-kuzu
%pip install pyvis
# My OpenAI Key
import os
os.environ["OPENAI_API_KEY"] = "API_KEY_HERE"
Prepare for Kùzu¶
# Clean up all the directories used in this notebook
import shutil
shutil.rmtree("./test1", ignore_errors=True)
shutil.rmtree("./test2", ignore_errors=True)
shutil.rmtree("./test3", ignore_errors=True)
import kuzu
db = kuzu.Database("test1")
Using Knowledge Graph with KuzuGraphStore¶
from llama_index.graph_stores.kuzu import KuzuGraphStore
graph_store = KuzuGraphStore(db)
Building the Knowledge Graph¶
from llama_index.core import SimpleDirectoryReader, KnowledgeGraphIndex
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings
from IPython.display import Markdown, display
import kuzu
documents = SimpleDirectoryReader(
"../../../examples/data/paul_graham"
).load_data()
# define LLM
llm = OpenAI(temperature=0, model="gpt-3.5-turbo")
Settings.llm = llm
Settings.chunk_size = 512
from llama_index.core import StorageContext
storage_context = StorageContext.from_defaults(graph_store=graph_store)
# NOTE: can take a while!
index = KnowledgeGraphIndex.from_documents(
documents,
max_triplets_per_chunk=2,
storage_context=storage_context,
)
# # To reload from an existing graph store without recomputing each time, use:
# index = KnowledgeGraphIndex(nodes=[], storage_context=storage_context)
Querying the Knowledge Graph¶
First, we can query and send only the triplets to the LLM.
query_engine = index.as_query_engine(
include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
"Tell me more about Interleaf",
)
display(Markdown(f"<b>{response}</b>"))
Interleaf was involved in making software, added a scripting language, was inspired by Emacs, taught what not to do, built impressive technology, and made software that became obsolete and was replaced by a service. Additionally, Interleaf made software that could launch as soon as it was done and was affected by rapid changes in the industry.
For more detailed answers, we can also send the text from where the retrieved tripets were extracted.
query_engine = index.as_query_engine(
include_text=True, response_mode="tree_summarize"
)
response = query_engine.query(
"Tell me more about Interleaf",
)
display(Markdown(f"<b>{response}</b>"))
Interleaf was a company that made software for creating documents. They added a scripting language inspired by Emacs, making it a dialect of Lisp. Despite having smart people and building impressive technology, Interleaf ultimately faced challenges due to the rapid advancements in technology, as they were affected by Moore's Law. The software they created could be launched as soon as it was done, and they made use of software that was considered slick in 1996. Additionally, Interleaf's experience taught valuable lessons about the importance of being run by product people rather than sales people in technology companies, the risks of editing code by too many people, the significance of office environment on productivity, and the impact of conventional office hours on optimal hacking times.
Query with embeddings¶
# NOTE: can take a while!
db = kuzu.Database("test2")
graph_store = KuzuGraphStore(db)
storage_context = StorageContext.from_defaults(graph_store=graph_store)
new_index = KnowledgeGraphIndex.from_documents(
documents,
max_triplets_per_chunk=2,
storage_context=storage_context,
include_embeddings=True,
)
# query using top 3 triplets plus keywords (duplicate triplets are removed)
query_engine = index.as_query_engine(
include_text=True,
response_mode="tree_summarize",
embedding_mode="hybrid",
similarity_top_k=5,
)
response = query_engine.query(
"Tell me more about what the author worked on at Interleaf",
)
display(Markdown(f"<b>{response}</b>"))
The author worked on software at Interleaf, a company that made software for creating documents. The software the author worked on was an online store builder, which required a private launch before a public launch to recruit an initial set of users. The author also learned valuable lessons at Interleaf, such as the importance of having technology companies run by product people, the pitfalls of editing code by too many people, the significance of office environment on productivity, and the impact of big bureaucratic customers. Additionally, the author discovered that low-end software tends to outperform high-end software, emphasizing the importance of being the "entry level" option in the market.
Visualizing the Graph¶
## create graph
from pyvis.network import Network
g = index.get_networkx_graph()
net = Network(notebook=True, cdn_resources="in_line", directed=True)
net.from_nx(g)
net.show("kuzugraph_draw.html")
kuzugraph_draw.html
[Optional] Try building the graph and manually add triplets!¶
from llama_index.core.node_parser import SentenceSplitter
node_parser = SentenceSplitter()
nodes = node_parser.get_nodes_from_documents(documents)
# initialize an empty database
db = kuzu.Database("test3")
graph_store = KuzuGraphStore(db)
storage_context = StorageContext.from_defaults(graph_store=graph_store)
index = KnowledgeGraphIndex(
[],
storage_context=storage_context,
)
# add keyword mappings and nodes manually
# add triplets (subject, relationship, object)
# for node 0
node_0_tups = [
("author", "worked on", "writing"),
("author", "worked on", "programming"),
]
for tup in node_0_tups:
index.upsert_triplet_and_node(tup, nodes[0])
# for node 1
node_1_tups = [
("Interleaf", "made software for", "creating documents"),
("Interleaf", "added", "scripting language"),
("software", "generate", "web sites"),
]
for tup in node_1_tups:
index.upsert_triplet_and_node(tup, nodes[1])
query_engine = index.as_query_engine(
include_text=False, response_mode="tree_summarize"
)
response = query_engine.query(
"Tell me more about Interleaf",
)
str(response)
'Interleaf was involved in creating documents and also added a scripting language to its software.'