Ingestion Pipeline¶
This notebook demonstrates how to use the Ingestion Pipeline when building RAG applications.
Installation¶
In [ ]:
!pip install llama-index llama-index-vector-stores-qdrant
Set API Key¶
In [ ]:
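# Allow nested event loops so async code can run inside the notebook
import nest_asyncio
nest_asyncio.apply()
import os
# Replace the placeholder with your own OpenAI API key
os.environ["OPENAI_API_KEY"] = "sk-..."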
Download Data¶
In [ ]:
!mkdir -p 'data/paul_graham/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'
--2024-04-26 13:35:44-- https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 75042 (73K) [text/plain]
Saving to: ‘data/paul_graham/paul_graham_essay.txt’
data/paul_graham/pa 100%[===================>] 73.28K --.-KB/s in 0.009s
2024-04-26 13:35:44 (8.36 MB/s) - ‘data/paul_graham/paul_graham_essay.txt’ saved [75042/75042]
Load Data¶
In [ ]:
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader("./data/paul_graham/").load_data()
Ingestion Pipeline - Apply Transformations¶
In [ ]:
from llama_index.core import Document
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.node_parser import TokenTextSplitter
from llama_index.core.extractors import TitleExtractor
from llama_index.core.ingestion import IngestionPipeline, IngestionCache
Text Splitters¶
In [ ]:
pipeline = IngestionPipeline(
    transformations=[
        TokenTextSplitter(chunk_size=1024, chunk_overlap=100),
    ]
)
nodes = pipeline.run(documents=documents)
In [ ]:
nodes[0]
Out[ ]:
TextNode(id_='c6856f07-73bc-44ce-bd0b-5e27271f9f0f', embedding=None, metadata={'file_path': '/content/data/paul_graham/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75042, 'creation_date': '2024-04-26', 'last_modified_date': '2024-04-26'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='244aec5e-98e0-48d1-81fd-9c12c2fe4c5c', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'file_path': '/content/data/paul_graham/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75042, 'creation_date': '2024-04-26', 'last_modified_date': '2024-04-26'}, hash='952e9dc1a243648316292b0771f0f024a059072e500f7da0092671800767f543'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='fe681b68-998e-4f5f-b113-2684fbaa543a', node_type=<ObjectType.TEXT: '1'>, metadata={}, hash='d3386e1e52a73d6920911fc30d0592648217874646641b9e7c64ef1c1f4cc82b')}, text='What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn\'t write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.\n\nThe first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district\'s 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. 
It was like a mini Bond villain\'s lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.\n\nThe language we used was an early version of Fortran. You had to type programs on punch cards, then stack them in the card reader and press a button to load the program into memory and run it. The result would ordinarily be to print something on the spectacularly loud printer.\n\nI was puzzled by the 1401. I couldn\'t figure out what to do with it. And in retrospect there\'s not much I could have done with it. The only form of input to programs was data stored on punched cards, and I didn\'t have any data stored on punched cards. The only other option was to do things that didn\'t rely on any input, like calculate approximations of pi, but I didn\'t know enough math to do anything interesting of that type. So I\'m not surprised I can\'t remember any programs I wrote, because they can\'t have done much. My clearest memory is of the moment I learned it was possible for programs not to terminate, when one of mine didn\'t. On a machine without time-sharing, this was a social as well as a technical error, as the data center manager\'s expression made clear.\n\nWith microcomputers, everything changed. Now you could have a computer sitting right in front of you, on a desk, that could respond to your keystrokes as it was running instead of just churning through a stack of punch cards and then stopping. [1]\n\nThe first of my friends to get a microcomputer built it himself. It was sold as a kit by Heathkit. I remember vividly how impressed and envious I felt watching him sitting in front of it, typing programs right into the computer.\n\nComputers were expensive in those days and it took me years of nagging before I convinced my father to buy one, a TRS-80, in about 1980. The gold standard then was the Apple II, but a TRS-80 was good enough. 
This was when I really started programming. I wrote simple games, a program to predict how high my model rockets would fly, and a word processor that my father used to write at least one book. There was only room in memory for about 2 pages of text, so he\'d write 2 pages at a time and then print them out, but it was a lot better than a typewriter.\n\nThough I liked programming, I didn\'t plan to study it in college. In college I was going to study philosophy, which sounded much more powerful. It seemed, to my naive high school self, to be the study of the ultimate truths, compared to which the things studied in other fields would be mere domain knowledge. What I discovered when I got to college was that the other fields took up so much of the space of ideas that there wasn\'t much left for these supposed ultimate truths. All that seemed left for philosophy were edge cases that people in other fields felt could safely be ignored.\n\nI couldn\'t have put this into words when I was 18. All I knew at the time was that I kept taking philosophy courses and they kept being boring. So I decided to switch to AI.\n\nAI was in the air in the mid 1980s, but there were two things especially that made me want to work on it: a novel by Heinlein called The Moon is a Harsh Mistress, which featured an intelligent computer called Mike, and a PBS documentary that showed Terry Winograd using SHRDLU. I haven\'t tried rereading The Moon is a Harsh Mistress, so I don\'t know how well it has aged, but when I read it I was drawn entirely into its world. It seemed only a matter of time before we\'d have Mike, and when I saw Winograd using SHRDLU, it seemed like that time would be a few years at most. 
All you had to do was teach SHRDLU more words.\n\nThere weren\'t any classes in AI at Cornell then, not even graduate classes, so I started trying to teach', start_char_idx=2, end_char_idx=4473, text_template='{metadata_str}\n\n{content}', metadata_template='{key}: {value}', metadata_seperator='\n')
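The effect of `chunk_size` and `chunk_overlap` can be illustrated with a small standalone sketch. This is not the `TokenTextSplitter` implementation: the helper `split_with_overlap` is hypothetical, and plain list items stand in for real tokens. It only shows the sliding-window idea, where each chunk repeats the tail of the previous one.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows of at most chunk_size tokens.

    Each window starts chunk_size - chunk_overlap tokens after the previous
    one, so consecutive windows share chunk_overlap tokens.
    Hypothetical helper for illustration only.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the current window reaches the end of the input
        if start + chunk_size >= len(tokens):
            break
        start += step
    return chunks


# Placeholder "tokens" (an assumption; a real splitter uses a tokenizer)
words = [f"w{i}" for i in range(10)]
chunks = split_with_overlap(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

With these toy values, the last token of each chunk reappears as the first token of the next, which is the role `chunk_overlap=100` plays in the pipeline above at a larger scale.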
Text Splitter + Metadata Extractor¶
In [ ]:
pipeline = IngestionPipeline(
    transformations=[
        TokenTextSplitter(chunk_size=1024, chunk_overlap=100),
        TitleExtractor(),
    ]
)
nodes = pipeline.run(documents=documents)
100%|██████████| 5/5 [00:01<00:00, 3.71it/s]
In [ ]:
nodes[0].metadata["document_title"]
Out[ ]:
'From Painting to Programming: A Journey through Writing, AI, and Fine Arts'
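The pipeline applies its transformations in the order they are listed, feeding the output of one into the next. The sketch below imitates that chaining in plain Python. Everything in it is a hypothetical stand-in (`ToyNode`, `toy_splitter`, `toy_title_extractor`, `run_pipeline`); none of these are llama-index classes, and the toy title step simply reuses the first line of text instead of calling an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNode:
    # Minimal stand-in for a text node: some text plus a metadata dict
    text: str
    metadata: dict = field(default_factory=dict)


def toy_splitter(nodes):
    # Split each node on blank lines, copying metadata to every piece
    return [
        ToyNode(part.strip(), dict(node.metadata))
        for node in nodes
        for part in node.text.split("\n\n")
        if part.strip()
    ]


def toy_title_extractor(nodes):
    # Attach a shared title, here just the first line of the first node
    title = nodes[0].text.splitlines()[0] if nodes else ""
    for node in nodes:
        node.metadata["document_title"] = title
    return nodes


def run_pipeline(nodes, transformations):
    # Apply each transformation in order; each takes and returns a node list
    for transform in transformations:
        nodes = transform(nodes)
    return nodes


docs = [ToyNode("What I Worked On\n\nBefore college I wrote short stories.")]
result = run_pipeline(docs, [toy_splitter, toy_title_extractor])
print(len(result), result[0].metadata["document_title"])
```

Because the splitter runs first, the title step sees the chunks rather than the whole document, which mirrors why `nodes[0].metadata["document_title"]` is available on each chunk in the cell above.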
Text Splitter + Metadata Extractor + OpenAI Embedding¶
In [ ]:
pipeline = IngestionPipeline(
    transformations=[
        TokenTextSplitter(chunk_size=1024, chunk_overlap=100),
        TitleExtractor(),
        OpenAIEmbedding(),
    ]
)
nodes = pipeline.run(documents=documents)
100%|██████████| 5/5 [00:01<00:00, 4.31it/s]
In [ ]:
nodes[0].metadata["document_title"]
Out[ ]:
'Journeys in Writing, Programming, and Art: Exploring the Evolution of Artificial Intelligence and the Intersection of Technology and Creativity'
In [ ]:
nodes[0]
Out[ ]:
TextNode(id_='0a6d8435-cc5c-4100-b12c-e22d175190c6', embedding=[0.004466439131647348, -0.01828564889729023, -0.007774787023663521, -0.02322954684495926, 0.005550032947212458, 0.034214481711387634, -0.02435377426445484, -0.005089505575597286, -0.017676126211881638, -0.024462133646011353, 0.02787545509636402, 0.03118041716516018, 1.217588214785792e-05, -0.0046763853169977665, -0.0014975607628002763, 0.01750004291534424, 0.022349126636981964, 0.014804602600634098, 0.003238930134102702, -0.01607782579958439, -0.02095399796962738, -0.009826842695474625, 0.008926105685532093, -0.007693517487496138, 0.006928229238837957, -0.0003453955869190395, 0.019382787868380547, -0.04445444419980049, -0.003735013073310256, -0.014249260537326336, 0.01731041446328163, -0.008208224549889565, -0.006948546506464481, -0.006609923206269741, -0.032670360058546066, -0.0017185123870149255, -0.014587883837521076, -0.005929290782660246, 0.011431916616857052, -0.012617097236216068, 0.03632748872041702, 0.026669956743717194, -0.005414583720266819, -0.0037891927640885115, -0.019491147249937057, 0.014493069611489773, 0.019098343327641487, 0.004730565007776022, -0.0012046517804265022, 0.0229451023042202, 0.011337102390825748, 0.037898700684309006, -0.00988102238625288, -0.0025955461896955967, -0.023893248289823532, 0.00570579944178462, 0.02282319776713848, 0.024746578186750412, 0.005512784235179424, 0.012596780434250832, 0.01019255630671978, 0.005699027329683304, -0.00018285648548044264, 0.012583235278725624, -0.012705139815807343, 0.003738399362191558, -0.024042241275310516, 0.008113410323858261, 0.011709587648510933, -0.019220247864723206, -0.000280422274954617, 0.005583895370364189, -0.026033345609903336, 0.00767997233197093, 0.03275162726640701, 0.012312336824834347, 0.004513846244663, -0.007178809959441423, -0.003839986165985465, -0.006813097279518843, -0.01393772754818201, 0.0021214738953858614, 0.0014315292937681079, 0.009975837543606758, 0.0035047493875026703, -0.011418371461331844, 
-0.000737351831048727, 0.026250064373016357, -0.01476396806538105, -0.03185766190290451, 0.014018997550010681, 0.01748649775981903, 0.001982638379558921, 0.027631646022200584, -0.0066844201646745205, 0.019436966627836227, -0.004263265058398247, 0.009786208160221577, -0.0037959651090204716, -0.030638620257377625, 0.007077223155647516, -0.003252475056797266, 0.011939851567149162, -0.015197405591607094, -0.029338307678699493, 0.028146354481577873, 0.0037079232279211283, -0.007585158105939627, 0.01535994466394186, -0.02077791467308998, 0.020032944157719612, 0.03459373861551285, 0.013334978371858597, -0.03936155140399933, -0.021360347047448158, -0.03351014479994774, 0.02118426188826561, -0.03088242933154106, -0.02141452580690384, -0.02066955529153347, -0.0006708970759063959, 0.014967141672968864, 0.016443539410829544, -0.009393405169248581, 0.012508737854659557, 0.03066571056842804, -0.023473354056477547, -0.01846173219382763, 0.014967141672968864, -0.02164478972554207, 0.01326048094779253, 0.0006459236028604209, 0.030638620257377625, 0.00215025688521564, -0.014587883837521076, 0.014208626002073288, -0.023717163130640984, 0.0009058168507181108, -0.0075445231050252914, -0.012461330741643906, 0.009041237644851208, 0.018028294667601585, -0.010226418264210224, -0.013842913322150707, -0.008336901664733887, 0.01607782579958439, 0.009081872180104256, 0.0020588284824043512, 0.017513588070869446, 0.018664905801415443, -0.0009396791574545205, -0.013423020951449871, -0.02191568911075592, 0.0028850689996033907, 0.014289896003901958, -0.0029155451338738203, 0.01659253239631653, 0.028769420459866524, -0.012501965276896954, 0.0029273969121277332, 0.0208998192101717, 0.025924986228346825, 0.005170775111764669, 0.002055442426353693, -0.0014924814458936453, 0.02635842375457287, 0.026602232828736305, 0.014262805692851543, -0.014804602600634098, -0.02198341302573681, -0.0058547938242554665, 0.025884350761771202, -0.04190799593925476, 0.0012943869223818183, 0.010355095379054546, 
0.01913897879421711, 0.02202404849231243, 0.004815220832824707, -0.02583017200231552, -0.012332653626799583, 0.007740924600511789, 0.0017168192425742745, 0.027496198192238808, 0.01901707425713539, -0.010361867025494576, 0.004578184802085161, 0.012542600743472576, -0.01061922125518322, -0.005397652741521597, -0.0023212614469230175, 0.019057709723711014, 0.011310012079775333, -0.018272103741765022, -0.02100817859172821, -0.6514567136764526, -0.032291099429130554, 0.010409275069832802, -0.010057106614112854, -0.002747926628217101, 0.00908864475786686, -0.0016516342293471098, 0.008912560530006886, 0.010781760327517986, 0.03532516211271286, -0.016511263325810432, -0.005079346708953381, 0.013585560023784637, 0.006298390217125416, 0.003443797118961811, -0.0016236978117376566, 0.02435377426445484, -0.017283324152231216, -0.011039113625884056, -0.0017185123870149255, -0.0190306194126606, 0.024326685816049576, -0.0029866560362279415, -0.01993812993168831, -0.009373088367283344, 0.021658334881067276, 0.001068355981260538, -0.00684018712490797, 0.0008097325335256755, 0.026209428906440735, -0.02249811962246895, 0.019464056938886642, 0.009393405169248581, -0.004337762016803026, 0.05393588915467262, -0.011743449606001377, -0.0113845095038414, 0.016050735488533974, 0.026764770969748497, 0.024800756946206093, -0.03061152994632721, -0.026886675506830215, -0.010043561458587646, 0.004493528977036476, -0.0018234854796901345, 0.012325881980359554, 0.022701293230056763, -0.00826240424066782, 0.007659655064344406, -0.0037858064752072096, -0.005441673565655947, 0.014777513220906258, -0.0028376616537570953, -0.010165465995669365, 0.006427066866308451, 0.0021248601842671633, 0.021373892202973366, -0.009312136098742485, 0.0003913213440682739, -0.01198048610240221, -0.005590667948126793, 0.009610123932361603, -0.029907194897532463, 0.014154446311295033, -0.009603351354598999, 0.007463253568857908, -0.009075099602341652, -0.005807386711239815, -0.0023043302353471518, -0.020710190758109093, 
0.020628919824957848, 0.02522064931690693, -0.015495394356548786, -0.019260883331298828, 0.00263956724666059, 0.012793181464076042, 0.0201006680727005, 0.0014069790486246347, -0.007977960631251335, 0.0013104714453220367, 0.0095153097063303, -0.004937124904245138, -0.007314259186387062, -0.008912560530006886, 0.02844434231519699, -0.006308548618108034, -0.02901322953402996, 0.0016220047837123275, 0.008404625579714775, 0.00023407323169521987, -0.0043580797500908375, 0.016199730336666107, -0.029690474271774292, -0.04095985367894173, -0.007077223155647516, 0.0012757625663653016, -0.0065184952691197395, -0.013382385484874249, 0.007009498775005341, -0.03399776294827461, 0.005495853256434202, 0.0007686744793318212, 0.023202456533908844, 0.01611846126616001, 0.018082475289702415, -0.00888547021895647, 0.006982408929616213, 0.020479926839470863, 0.01104588620364666, -0.03879266604781151, -0.0004931199364364147, 0.002641260391101241, -0.018922260031104088, 0.003147501964122057, -0.008614571765065193, -0.018705541267991066, 0.011560593731701374, 0.0017185123870149255, 0.016538353636860847, -0.006630240473896265, 0.025342553853988647, -0.014222171157598495, -0.018935805186629295, -0.0055331019684672356, -0.007009498775005341, 0.002561683999374509, 0.00133840786293149, -0.03283289819955826, -0.04640491306781769, 0.012014348059892654, -0.010226418264210224, -0.013287571258842945, 0.021170716732740402, -0.008391081355512142, 0.0014730105176568031, -0.008871925994753838, 0.013097941875457764, -0.012556144967675209, 0.0030882428400218487, -0.0020351249258965254, -0.0329141691327095, -0.003735013073310256, 0.007172037847340107, 0.008722931146621704, -0.006897753104567528, -0.024448588490486145, -0.0010395729914307594, -0.00010412660776637495, 0.0012588314712047577, 0.006897753104567528, 0.009610123932361603, -0.007876373827457428, -0.007151720114052296, 0.009291818365454674, -0.013450110331177711, -0.0071381754241883755, -0.014872327446937561, -0.022240767255425453, 
-0.0172562338411808, -0.018597181886434555, -0.028417252004146576, 0.04247688502073288, -0.004043160006403923, 0.006071512587368488, 0.007077223155647516, 0.003254168201237917, 0.005421356298029423, 0.036137860268354416, -0.029392486438155174, -0.03459373861551285, 0.012203977443277836, 0.007578385528177023, -0.005035325884819031, 0.0017202054150402546, -0.005709185730665922, 0.018258558586239815, -0.015698567032814026, 0.0017405227990821004, 0.0038873935118317604, -0.012454558163881302, -0.005688868463039398, 0.024733033031225204, 0.021604154258966446, -0.014046086929738522, 0.020141303539276123, 0.0019064481602981687, 0.0227690190076828, 0.016782162711024284, -0.001776078250259161, 0.003575860057026148, 0.016145549714565277, 0.012258157134056091, -0.010686946101486683, 0.0013020058395341039, -0.007713834755122662, 0.029284127056598663, -0.008066002279520035, -0.0037722615525126457, 0.009386632591485977, 0.024123510345816612, 0.008912560530006886, 0.009027692489326, 0.003938186913728714, -0.009488219395279884, 0.011357419192790985, -0.03497299551963806, -0.0008656053687445819, -0.031261686235666275, 0.028498521074652672, -0.01011128630489111, -0.004080408718436956, -0.007632565218955278, -0.015468304045498371, -0.03730272129178047, 0.011296466924250126, 0.015563118271529675, -0.016199730336666107, 0.009318908676505089, -0.05171452462673187, 0.011079748161137104, 0.0025938530452549458, 0.036706745624542236, -0.0011623238679021597, -0.009738801047205925, 0.00826240424066782, 0.037167273461818695, 0.004842310678213835, 0.0020842254161834717, -0.005001463461667299, -0.027455562725663185, 0.006115533411502838, -0.00809309259057045, 0.020886274054646492, 0.001722745131701231, 0.020859183743596077, 0.015468304045498371, 0.019640140235424042, -0.01629454456269741, 0.040174245834350586, -0.002385599771514535, -0.023933881893754005, 0.028200533241033554, 0.03248072788119316, -0.03692346438765526, 0.00809309259057045, 0.004845696967095137, 0.05014331266283989, 
-3.158718755003065e-05, -0.01221075002104044, 0.020141303539276123, -0.021211352199316025, 0.007212672382593155, -0.0011157632106915116, -0.005160616245120764, 0.007768014445900917, -0.009041237644851208, 0.005282520782202482, 0.007558068260550499, 0.01750004291534424, 0.02726593427360058, -0.015861107036471367, 0.004080408718436956, -0.02253875508904457, -0.002495652297511697, -0.0026361809577792883, -0.013646511361002922, -0.0019403104670345783, 0.003663902170956135, -0.015197405591607094, -0.009244411252439022, -0.005881883669644594, -0.02589789591729641, 0.00019365009211469442, -0.0063254800625145435, 0.032561998814344406, 0.003167819231748581, 0.015034866519272327, 0.004825379233807325, 0.029663385823369026, 0.015048411674797535, -0.04139329120516777, -0.026277154684066772, 0.013951272703707218, -0.008194679394364357, -0.003988980315625668, 0.0109849339351058, -0.008851608261466026, 0.006653944496065378, 0.007321031764149666, -0.0007365052588284016, -0.011127155274152756, -0.004286968614906073, 0.005211409647017717, -0.0010090968571603298, -0.017053060233592987, 0.007530978415161371, 0.0314784049987793, -0.01890871487557888, 0.006237437948584557, -0.016511263325810432, 0.019260883331298828, 0.013409475795924664, -0.018136654049158096, -0.01954532600939274, 0.0433979406952858, -0.02833598293364048, -0.009901340119540691, 0.0006806324818171561, 0.009061554446816444, -0.01778448559343815, 0.005059029441326857, 0.012373289093375206, 0.004286968614906073, 0.01200757548213005, 0.00908864475786686, 0.011513185687363148, -0.004198926500976086, -0.005800614133477211, 0.017649037763476372, 0.02226785570383072, 0.002195970853790641, -0.028471432626247406, -0.02413705550134182, 0.0019470829283818603, 0.09681912511587143, 0.03413321077823639, -0.013666829094290733, 0.005759979132562876, 0.009488219395279884, -0.007178809959441423, -0.018096020445227623, -0.03020518273115158, 0.03288707882165909, 0.002785175107419491, 0.01800120435655117, -0.007781559135764837, 
0.011167790740728378, 0.0009997847955673933, 0.014371165074408054, -0.005197864957153797, 0.00034984000376425683, -0.04196217656135559, 0.010612448677420616, -0.011174563318490982, 0.02616879530251026, 0.008621344342827797, 0.0015949149383231997, 0.006982408929616213, -0.0021400980185717344, -0.029148677363991737, 0.021793784573674202, 0.00016698353283572942, 0.014831692911684513, -0.01624036394059658, -0.0032084539998322725, 0.008871925994753838, 0.02584371715784073, 0.0007902617217041552, -0.00888547021895647, 0.03077406994998455, 0.015563118271529675, 0.036652565002441406, 0.006667489185929298, -0.01164186280220747, 0.008330129086971283, 0.02287737838923931, 0.0024143827613443136, -0.02755037695169449, -0.006284845061600208, -0.022687749937176704, -0.0006819022819399834, 0.04616110399365425, -0.014980686828494072, -0.041880909353494644, 0.006904525216668844, 0.020290296524763107, -0.025748902931809425, -0.005455218255519867, 0.0011106837773695588, 0.02545091323554516, 0.021739603951573372, 0.0026344878133386374, 0.004324217326939106, 0.014452435076236725, -8.820074435789138e-05, -0.01981622539460659, 0.004320831038057804, -0.004815220832824707, 0.0050624157302081585, -0.024435045197606087, -0.00607489887624979, 0.015468304045498371, -0.003965276759117842, 0.03600240871310234, 0.012014348059892654, -0.0024008378386497498, -0.04388555511832237, 0.003333744592964649, 0.015928830951452255, -0.003338824026286602, 0.022118862718343735, -0.0026717365253716707, -0.004429190419614315, 0.01942342147231102, -0.011628317646682262, -0.037681981921195984, 0.012515510432422161, -0.021373892202973366, 0.0075445231050252914, 0.001223276020027697, -0.017527133226394653, -0.0043580797500908375, -0.00783573929220438, 0.007652882486581802, 0.011418371461331844, 0.014330530539155006, 0.003575860057026148, -0.016267454251646996, 0.024231869727373123, 0.0005155536928214133, 0.005506012123078108, 0.018353372812271118, 0.009562716819345951, -0.024123510345816612, 0.002543059643357992, 
-0.011377736926078796, -0.02885068953037262, -0.03145131468772888, 0.005285907071083784, 0.011567365378141403, 0.02129262126982212, 0.030909517779946327, -0.003643584670498967, -0.02043929137289524, 0.01691761054098606, -0.0010378798469901085, 0.015454758889973164, 0.007131402846425772, 0.031315866857767105, 0.03004264272749424, 0.006379659753292799, -0.0026209428906440735, -0.0017015811754390597, -0.005831090267747641, -0.00885838083922863, -0.04402100667357445, 0.018434641882777214, 0.008140499703586102, -0.007063678465783596, 0.012705139815807343, 0.010151920840144157, -0.009643986821174622, -0.024177690967917442, -0.01450661476701498, -0.00027386145666241646, 0.02333790622651577, 0.0014035928761586547, -0.023933881893754005, -0.014777513220906258, 0.0009464516188018024, -0.017811575904488564, 0.005387493874877691, -0.007598702795803547, 0.01326048094779253, -0.000933753268327564, 0.003745171707123518, -0.002834275597706437, -0.01559020858258009, -0.007490343414247036, -0.02488202601671219, -0.006135851144790649, -0.003150888020172715, 0.01970786601305008, 0.024895571172237396, 0.005743048153817654, 0.006806324701756239, -0.008926105685532093, 0.006108761299401522, -0.007111085578799248, -0.02736074849963188, -0.008296266198158264, 0.013348523527383804, 0.05106436833739281, 0.006112147122621536, 0.038305047899484634, -0.01970786601305008, 0.03169512376189232, 0.013551697134971619, 0.0010666628368198872, 0.005617757793515921, 0.012739001773297787, 0.003518294310197234, -0.013497517444193363, 0.0002874063793569803, 0.02248457446694374, -0.006322093773633242, 0.01217688713222742, -0.013592331670224667, 0.016660258173942566, 0.020073577761650085, -0.0027005192823708057, 0.010429591871798038, -0.017703216522932053, -0.04900553822517395, -0.014018997550010681, 0.026588687673211098, 0.014059632085263729, 0.016226820647716522, -0.013389158062636852, -0.0023263408802449703, 0.023378539830446243, 0.0017388297710567713, 0.011330329813063145, 0.008391081355512142, 
0.032345280051231384, -0.014953597448766232, -0.011086520738899708, 0.008052458055317402, -0.0041650645434856415, -0.022010503336787224, -0.005157229956239462, -0.04288323223590851, 0.02639905922114849, 0.0137413265183568, -0.004507073666900396, 0.009339225478470325, 0.004246334079653025, -0.024380864575505257, 0.001368037424981594, -0.0001503698294982314, -0.009847160428762436, -0.0018539616139605641, 0.01800120435655117, -0.00708399573341012, -0.0077341520227491856, -0.04326248914003372, 0.0027106781490147114, -0.008303038775920868, -0.009102189913392067, -0.005763365421444178, 0.008235313929617405, 0.015874652191996574, -0.02066955529153347, -0.03180348500609398, 0.021509340032935143, -0.013125032186508179, 0.04981823265552521, -0.00219427770934999, 0.026764770969748497, 0.03215565159916878, 0.01663316786289215, -0.010206100530922413, 0.0071178581565618515, 0.0012435934040695429, 0.005004849750548601, 0.031613852828741074, 0.023243090137839317, -0.021048814058303833, 0.006511722691357136, -0.023012828081846237, -0.006691192742437124, -0.017635492607951164, -0.04402100667357445, 0.004903262946754694, 0.01602364517748356, -0.01584756188094616, -0.007740924600511789, -0.01155382115393877, -0.04865337163209915, 0.016091370955109596, -0.003518294310197234, -0.0009616896859370172, -0.01096461620181799, -0.0070501333102583885, -0.00945435743778944, -0.004466439131647348, -0.0009777742670848966, 0.014140901155769825, 0.011018795892596245, 0.011960168369114399, -0.003447183407843113, -0.03088242933154106, 0.009921657852828503, 0.018813900649547577, -0.0002296288002980873, 0.005699027329683304, -0.021875053644180298, 0.008025367744266987, -0.004080408718436956, -0.006620082072913647, -0.008235313929617405, -0.02289092354476452, 0.012312336824834347, 0.030746979638934135, -0.002769937040284276, 0.0042497203685343266, -0.015319310128688812, -0.006769075989723206, -0.00027068686904385686, -0.007808648981153965, -0.0015322696417570114, -0.008580709807574749, 
-0.019260883331298828, -0.009027692489326, 0.019044164568185806, 0.015563118271529675, -0.006989181041717529, -0.012122707441449165, -0.007903463207185268, -0.009637214243412018, 0.010036788880825043, 0.0024431657511740923, 0.0032355438452214003, -0.043072860687971115, -0.006843573413789272, -0.00954239908605814, 0.00434453459456563, -0.010768215171992779, 0.007470026146620512, -0.0016389358788728714, 0.0003136496525257826, 0.048003215342760086, -0.014953597448766232, 0.009549171663820744, 0.004398714285343885, -0.010402502492070198, -0.023188911378383636, -0.001190260285511613, -0.00832335650920868, -0.006511722691357136, 0.023419175297021866, -0.024800756946206093, -0.025315463542938232, -0.016497718170285225, 0.007720607332885265, 0.021766694262623787, -0.024123510345816612, -0.0006201036158017814, 0.012400378473103046, 0.006200189236551523, -0.008553620427846909, -0.008391081355512142, -0.005245272070169449, -0.018041839823126793, -0.004808448255062103, 0.001276609138585627, 0.003999139182269573, 0.0009015840478241444, 0.010084196925163269, -0.01013160403817892, 0.02486848272383213, -0.011621545068919659, -0.04775940626859665, 0.008194679394364357, 0.017472952604293823, 0.02061537466943264, -0.005133526399731636, -0.0015889890491962433, -0.01691761054098606, 0.00633225217461586, -0.02077791467308998, -0.0013138577342033386, 0.009975837543606758, -0.013158894143998623, -0.033753953874111176, 0.022525209933519363, -0.004432576708495617, 0.00391786964610219, 0.0012960799504071474, 0.0014501535333693027, -0.022064682096242905, 0.005201251246035099, -0.0496286042034626, 0.011425144039094448, -0.007747697178274393, 0.04819284379482269, -0.0071178581565618515, -0.010402502492070198, -0.02550509385764599, -0.03464791923761368, -0.029880104586482048, -0.019856858998537064, 0.007558068260550499, 0.010314459912478924, 0.03562315180897713, 0.00570579944178462, 0.013470428064465523, 0.030313542112708092, -0.02874233014881611, -0.014018997550010681, -0.005509397946298122, 
-0.023039916530251503, 0.005123367998749018, 0.009840387850999832, 0.008160817436873913, -0.003633426036685705, -0.015183860436081886, -0.020466381683945656, -0.005390880163758993, -0.007612247951328754, -0.004412259440869093, 0.008993830531835556, -0.005773524288088083, 0.0006882515153847635, 0.01030768733471632, -0.004987918771803379, 0.009664303623139858, 0.011804401874542236, -0.0035961775574833155, -0.030638620257377625, -0.01104588620364666, 0.002988348947837949, -0.025410279631614685, -0.010998479090631008, -0.008363991044461727, -0.00024338536604773253, 0.006904525216668844, -0.0017193588428199291, 0.0037011506501585245, 0.0013985134428367019, 0.009610123932361603, -0.012935402803122997, 0.03795287758111954, -0.021793784573674202, -0.005908973515033722, 0.011933078989386559, 0.016958246007561684, -0.0137413265183568, -0.02680540643632412, 0.009020919911563396, -0.004341148305684328, -0.0178793016821146, 0.008506212383508682, -0.010856256820261478, -0.002473641885444522, 0.010869801975786686, 0.041366200894117355, -0.011933078989386559, -0.009975837543606758, -0.017513588070869446, 0.00011163981253048405, -0.02282319776713848, 0.005147071555256844, 0.01181117445230484, 0.0033032684586942196, -0.016958246007561684, -0.001860734075307846, 0.011817947030067444, 0.004581570625305176, -0.034106120467185974, 0.000609098351560533, -0.01750004291534424, 0.004537549801170826, 0.025125835090875626, 0.008932878263294697, -0.009325680322945118, -0.015089046210050583, 0.024001607671380043, -0.015657933428883553, -0.006467701401561499, 0.21455161273479462, -0.01680925115942955, 0.012258157134056091, 0.007219444960355759, -0.026669956743717194, -0.0010040175402536988, 0.028065083548426628, -0.0029951215256005526, -0.013619421981275082, 0.022186586633324623, -0.009894567541778088, 0.02947375550866127, -0.021888598799705505, -0.004398714285343885, 0.00709754042327404, -0.004012683872133493, -0.038575947284698486, -0.028525611385703087, -0.018421098589897156, 
-0.029771745204925537, 0.00922409351915121, 0.003670674515888095, -0.027225298807024956, -0.015468304045498371, 0.029825923964381218, 0.0069620911963284016, -0.012901540845632553, -0.004242947790771723, 0.018840990960597992, 0.008167590014636517, -0.01021964568644762, 0.0007521666120737791, -0.0015644388040527701, -0.008865153416991234, -0.012691594660282135, 0.007950871251523495, 0.010104513727128506, 0.009251183830201626, 0.00792378094047308, -0.008871925994753838, 0.017445862293243408, -0.018637817353010178, 0.00315935374237597, -0.03191184252500534, -9.476156265009195e-05, 0.032399460673332214, -0.011452234350144863, -0.019613051787018776, -0.02113008312880993, 0.02520710416138172, -0.014601428993046284, -0.011350646615028381, 0.02885068953037262, 0.021847963333129883, -0.005445059854537249, -0.00246348325163126, -0.012000803835690022, 0.011066203936934471, -3.261892925365828e-05, 0.01976204477250576, 0.00013185138232074678, 0.01189921610057354, -0.01901707425713539, 0.03367268294095993, 0.0060274917632341385, -0.0029138519894331694, -0.008296266198158264, 0.008194679394364357, -0.0014543862780556083, -0.019680775701999664, -0.005089505575597286, -0.01573920249938965, -0.025396734476089478, -0.015156771056354046, -0.030015554279088974, -0.025586362928152084, 0.02618234045803547, 0.012650960125029087, 0.012650960125029087, 0.02522064931690693, -0.013118259608745575, -0.01450661476701498, -0.0023178751580417156, -0.015766292810440063, -0.01317243929952383, -0.03838631510734558, 0.01053795125335455, -0.019152523949742317, 0.003636812325567007, -0.005804000422358513, -0.01544121466577053, -0.00015290950250346214, -0.012021120637655258, -0.016159094870090485, 0.005238499492406845, -0.014777513220906258, 0.002856286009773612, 0.024109967052936554, 9.322717960458249e-05, 0.012684822082519531, -0.022728383541107178, 0.006234051659703255, 0.0274826530367136, 0.0040973396971821785, -0.0007572459289804101, 0.003035756293684244, 0.007842511869966984, -0.010341550223529339, 
0.007212672382593155, -0.009129279293119907, -0.013138577342033386, -0.02906740829348564, 0.008025367744266987, -0.0016220047837123275, 0.008905787952244282, 0.01812310889363289, -0.019179614260792732, -0.027956724166870117, 0.020358022302389145, -0.013714236207306385, -0.021279076114296913, -0.006968863774091005, -0.013450110331177711, 0.017933480441570282, -0.01862427219748497, -0.009034465067088604, -0.0364629365503788, 0.007287169340997934, -0.022213676944375038, -0.0005900507676415145, 0.01720205508172512, -0.027685826644301414, 0.0008110023918561637, 0.004439349286258221, -0.008479123003780842, -0.011492868885397911, 0.013226618990302086, -0.007199127692729235, 0.008472350426018238, -0.0039923666045069695, -0.0040567051619291306, 0.010402502492070198, -0.012793181464076042, 0.008052458055317402, 0.004547708667814732, -0.034945905208587646, 0.02225431054830551, 0.025816626846790314, -0.0033015753142535686, -0.02726593427360058, -0.007693517487496138, 0.0133417509496212, 0.003839986165985465, -0.0014222171157598495, 0.020588286221027374, -0.00734134903177619, -0.016497718170285225, -0.01476396806538105, -0.02106235735118389, 0.006037650164216757, -0.05453186854720116, 0.028606880456209183, 0.015305764973163605, -0.016619622707366943, -0.01510259136557579, -0.02146870642900467, -0.17370010912418365, 0.01981622539460659, 0.015414124354720116, -0.02925703674554825, 0.03621912747621536, 0.017567766830325127, 0.017649037763476372, -0.002563376910984516, 0.0031237981747835875, -0.004788130987435579, 0.0010692025534808636, 0.0035995638463646173, -0.010883347131311893, -0.01664671301841736, -0.01479105744510889, 0.016213275492191315, -0.02339208498597145, 0.01947760209441185, 0.03088242933154106, -0.0015618990873917937, 0.007984733209013939, -0.030448991805315018, 0.0025295147206634283, -0.0016871896805241704, 0.02043929137289524, 0.016145549714565277, -0.002195970853790641, 0.0031745918095111847, -0.013030217960476875, -0.02646678313612938, 0.00726685207337141, 
0.015495394356548786, 0.011953395791351795, -0.0035216803662478924, 0.05512784421443939, -0.014465979300439358, 0.003154274309054017, -0.02072373405098915, -0.036598388105630875, 0.01624036394059658, 0.03323924541473389, 0.025369644165039062, 0.03605658933520317, -0.02236267179250717, -0.018543001264333725, 0.0033997760619968176, 0.02792963571846485, -0.03004264272749424, 0.0005765058449469507, -0.00925795640796423, -0.003370993072167039, -0.031830571591854095, -0.004700088873505592, -0.008391081355512142, 0.002686974359676242, 0.005844634957611561, -0.005665164906531572, 0.031370047479867935, -0.004520618822425604, 0.009244411252439022, -0.011926306411623955, -0.014046086929738522, 0.01535994466394186, 0.013605876825749874, -0.0005502625717781484, -0.026101069524884224, -0.020967543125152588, -0.00208761147223413, -0.031315866857767105, 0.01782512106001377, -0.008269176818430424, -0.02822762355208397, 0.0026852814480662346, -0.0020588284824043512, -0.0020385112147778273, 0.010578586719930172, -0.022051136940717697, 0.02061537466943264, -0.00925795640796423, 0.0019860246684402227, -0.006200189236551523, 0.023486899212002754, -0.00875002145767212, -0.0038873935118317604, 0.002060521626845002, -0.007212672382593155, 0.010653083212673664, -0.00015915287076495588, -0.0011360805947333574, -0.0256134532392025, 0.01078853290528059, -0.005526329390704632, -0.015698567032814026, -0.007822194136679173, -0.0032135334331542253, 0.013558469712734222, 0.008018595166504383, 0.0035961775574833155, -0.0020943840499967337, -0.021726058796048164, 0.02367652766406536, -0.007903463207185268, 0.004371624439954758, -0.006237437948584557, 0.04350629821419716, 0.014344075694680214, -0.014696243219077587, 0.016484173014760017, 0.050116222351789474, -0.0013350216904655099, -0.03174930438399315, 0.015427669510245323, 0.004070249851793051, 0.02662932127714157, 0.0038670760113745928, 0.01908479817211628, 0.016660258173942566, -0.031668033450841904, 0.012508737854659557, -0.016226820647716522, 
0.033022526651620865, 0.0017676126444712281, -0.011120383627712727, 0.018217923119664192, -0.015915285795927048, -0.019003529101610184, -0.11626963317394257, -0.014046086929738522, -0.003252475056797266, 0.010273825377225876, -0.0075038885697722435, 0.027902545407414436, -0.026656411588191986, 0.03391649201512337, -0.029202857986092567, 0.034675005823373795, -0.007977960631251335, -0.042043447494506836, 0.012102390639483929, -0.00354199786670506, -0.002629408612847328, -0.024854937568306923, -0.009671076200902462, -0.015752747654914856, -0.020628919824957848, 0.028498521074652672, 0.015346399508416653, -0.0091157341375947, -0.0043783970177173615, -0.0001617983652977273, 0.0019420036114752293, 0.002756392117589712, -0.03220983222126961, 0.011398054659366608, 0.017134329304099083, 0.0114725511521101, -0.010273825377225876, -0.01579338312149048, 0.02209177240729332, -0.011431916616857052, 0.023554624989628792, -0.024827847257256508, -0.0003678293724078685, 0.009908112697303295, 0.02079145982861519, -0.005133526399731636, 0.0029240106232464314, 0.019342152401804924, 0.02424541488289833, -0.01172990445047617, 0.00494051119312644, 0.0010226417798548937, -0.026250064373016357, 0.021495794877409935, -0.002822423819452524, -0.04025551676750183, -0.03638166934251785, 0.006846959702670574, -0.030421901494264603, -0.005543260369449854, 0.01839400827884674, 0.0016897293971851468, 0.0295279361307621, 0.0051030502654612064, 0.004828765522688627, 0.008512984961271286, -0.019843315705657005, -0.013639739714562893, -0.003147501964122057, -0.007849283516407013, 0.020249662920832634, -0.008980285376310349, -0.026371968910098076, -0.008729703724384308, -0.0002124860038748011, -0.010747897438704967, -0.007429391145706177, 0.022511664777994156, -0.035352252423763275, 0.011743449606001377, 0.01005033403635025, -0.012630642391741276, -0.01468269806355238, 0.0012622176436707377, 0.02049347199499607, -1.540999801363796e-05, -0.016660258173942566, -0.011201652698218822, 
-4.653422365663573e-05, -0.016944700852036476, -0.000882113236002624, 0.0042023127898573875, -0.023364994674921036, 0.0025312078651040792, 0.0016440151957795024, -0.017337502911686897, 0.005231727380305529, 0.026033345609903336, 0.013517835177481174, 0.005512784235179424, 0.020886274054646492, -0.0007551295566372573, -0.0042903549037873745, -0.0008575630490668118, -0.00337268621660769, -0.01021964568644762, -0.020398655906319618, 0.00800505094230175, -0.05818899720907211, 0.025735357776284218, 0.009183458983898163, -0.028633970767259598, -0.0027868682518601418, 0.0038806209340691566, 0.013185984455049038, -0.026818951591849327, -0.02385261282324791, -0.010497316718101501, -0.05152489244937897, 0.01789284497499466, -0.014005452394485474, -0.004622205626219511, -0.010734353214502335, 0.007673199754208326, 0.009481447748839855, 0.004845696967095137, -0.0007157646468840539, -0.0071584926918148994, 0.003951731603592634, -0.00022073995205573738, 0.014520158991217613, 0.006335638463497162, -0.0032964961137622595, -0.003531839232891798, -0.03248072788119316, 0.008783883415162563, -0.011533503420650959, -0.018272103741765022, -0.008912560530006886, -0.007910235784947872, 0.015292219817638397, -0.0035860189236700535, -0.009759118780493736, 0.024827847257256508, 0.012325881980359554, 0.028823599219322205, -0.00954239908605814, 0.07617665827274323, -0.031613852828741074, -0.021793784573674202, 0.014926507137715816, -0.010829167440533638, 0.0007678279071114957, 0.005763365421444178, -0.00800505094230175, 0.0013739633141085505, 0.01970786601305008, 0.002937555545940995, 0.026209428906440735, 0.02049347199499607, -0.03754653036594391, -0.021306166425347328, -0.006674261763691902, -0.0015585129149258137, 0.0070298160426318645, -0.013761643320322037, -0.014032541774213314, 0.01567147858440876, 0.02226785570383072, -0.013551697134971619, 0.011831492185592651, -0.014046086929738522, 0.01479105744510889, -0.032670360058546066, -0.00866197980940342, 0.024096421897411346, 
0.003829827532172203, -0.009833615273237228, -0.014018997550010681, -0.01206852775067091, 0.007551295682787895, 0.027414927259087563, 0.023134730756282806, -0.007564840372651815, -0.004317444749176502, 0.018082475289702415, -0.0035250666551291943, -0.004862627945840359, 0.019464056938886642, 0.005719344597309828, -0.007693517487496138, 0.012386833317577839, 0.035921141505241394, 0.010097741149365902, 0.02232203632593155, 0.01141159888356924, -0.001872585853561759, 0.017337502911686897, 0.01670089177787304, 0.004239561501890421, 0.005164002534002066, 0.0029189311899244785, 0.02164478972554207, 0.005465377122163773, -0.014046086929738522, 0.007652882486581802, 0.026222974061965942, 0.015048411674797535, 0.016836341470479965, 0.0026226360350847244, 0.009603351354598999, -0.014831692911684513, -0.04253106564283371, 0.015400579199194908, -0.018217923119664192, -0.030286451801657677, 0.00532315531745553, 0.009332452900707722, 0.01306407991796732, 0.010625993832945824, 0.0006065586931072176, 0.014493069611489773, -0.02844434231519699, 0.01634872332215309, -0.008472350426018238, -0.01670089177787304, -0.02073727920651436, 0.03388940170407295, 0.01675507239997387, 0.011784084141254425, 0.05057675018906593, -0.03516262397170067, 0.007998278364539146, 0.034675005823373795, 0.02874233014881611, -0.005367176607251167, -0.011865354143083096, -0.0023280340246856213, 0.006741986144334078, -0.01376841589808464, -0.0023060233797878027, -0.005272361915558577, 0.0029138519894331694, -0.014587883837521076, -0.01668734662234783, 0.02249811962246895, -0.004998077172785997, 0.04952024668455124, 0.015481849201023579, -0.002592159900814295, 0.007530978415161371, -0.018068930134177208, 0.004564639646559954, 0.01907125487923622, -0.002937555545940995, -0.012732229195535183, -0.011228743009269238, 0.00610537501052022, -0.0064338394440710545, 0.019206702709197998, -0.05092891678214073, -0.009589807130396366, 0.00818113423883915, -0.007876373827457428, 0.02385261282324791, 
-0.0008232774562202394, -0.0026073979679495096, 0.012590007856488228, 0.018949350342154503, 0.0219698678702116, 0.003907710779458284, -0.03077406994998455, -0.0021045426838099957, 0.017527133226394653, 0.001386661664582789, -0.015576663427054882, -0.03283289819955826, 0.019897494465112686, -0.018583636730909348, -0.040797311812639236, -0.014750422909855843, 0.008160817436873913, -0.012217522598803043, -0.0008249706006608903, -0.01703951507806778, 0.025288375094532967, -0.00039428428863175213, 0.001479782979004085, 0.006620082072913647, -0.02253875508904457, -0.018299194052815437, -0.009461130015552044, -0.005807386711239815, -0.024096421897411346, 0.0017676126444712281, -0.02941957674920559], metadata={'file_path': '/content/data/paul_graham/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75042, 'creation_date': '2024-04-26', 'last_modified_date': '2024-04-26', 'document_title': 'Journeys in Writing, Programming, and Art: Exploring the Evolution of Artificial Intelligence and the Intersection of Technology and Creativity'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='244aec5e-98e0-48d1-81fd-9c12c2fe4c5c', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'file_path': '/content/data/paul_graham/paul_graham_essay.txt', 'file_name': 'paul_graham_essay.txt', 'file_type': 'text/plain', 'file_size': 75042, 'creation_date': '2024-04-26', 'last_modified_date': '2024-04-26'}, hash='952e9dc1a243648316292b0771f0f024a059072e500f7da0092671800767f543'), <NodeRelationship.NEXT: '3'>: RelatedNodeInfo(node_id='6107b60b-a45c-44f4-a3ea-e146d0151f47', node_type=<ObjectType.TEXT: '1'>, metadata={}, 
hash='d3386e1e52a73d6920911fc30d0592648217874646641b9e7c64ef1c1f4cc82b')}, text='What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn\'t write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.\n\nThe first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district\'s 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain\'s lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.\n\nThe language we used was an early version of Fortran. You had to type programs on punch cards, then stack them in the card reader and press a button to load the program into memory and run it. The result would ordinarily be to print something on the spectacularly loud printer.\n\nI was puzzled by the 1401. I couldn\'t figure out what to do with it. And in retrospect there\'s not much I could have done with it. The only form of input to programs was data stored on punched cards, and I didn\'t have any data stored on punched cards. The only other option was to do things that didn\'t rely on any input, like calculate approximations of pi, but I didn\'t know enough math to do anything interesting of that type. So I\'m not surprised I can\'t remember any programs I wrote, because they can\'t have done much. My clearest memory is of the moment I learned it was possible for programs not to terminate, when one of mine didn\'t. 
On a machine without time-sharing, this was a social as well as a technical error, as the data center manager\'s expression made clear.\n\nWith microcomputers, everything changed. Now you could have a computer sitting right in front of you, on a desk, that could respond to your keystrokes as it was running instead of just churning through a stack of punch cards and then stopping. [1]\n\nThe first of my friends to get a microcomputer built it himself. It was sold as a kit by Heathkit. I remember vividly how impressed and envious I felt watching him sitting in front of it, typing programs right into the computer.\n\nComputers were expensive in those days and it took me years of nagging before I convinced my father to buy one, a TRS-80, in about 1980. The gold standard then was the Apple II, but a TRS-80 was good enough. This was when I really started programming. I wrote simple games, a program to predict how high my model rockets would fly, and a word processor that my father used to write at least one book. There was only room in memory for about 2 pages of text, so he\'d write 2 pages at a time and then print them out, but it was a lot better than a typewriter.\n\nThough I liked programming, I didn\'t plan to study it in college. In college I was going to study philosophy, which sounded much more powerful. It seemed, to my naive high school self, to be the study of the ultimate truths, compared to which the things studied in other fields would be mere domain knowledge. What I discovered when I got to college was that the other fields took up so much of the space of ideas that there wasn\'t much left for these supposed ultimate truths. All that seemed left for philosophy were edge cases that people in other fields felt could safely be ignored.\n\nI couldn\'t have put this into words when I was 18. All I knew at the time was that I kept taking philosophy courses and they kept being boring. 
So I decided to switch to AI.\n\nAI was in the air in the mid 1980s, but there were two things especially that made me want to work on it: a novel by Heinlein called The Moon is a Harsh Mistress, which featured an intelligent computer called Mike, and a PBS documentary that showed Terry Winograd using SHRDLU. I haven\'t tried rereading The Moon is a Harsh Mistress, so I don\'t know how well it has aged, but when I read it I was drawn entirely into its world. It seemed only a matter of time before we\'d have Mike, and when I saw Winograd using SHRDLU, it seemed like that time would be a few years at most. All you had to do was teach SHRDLU more words.\n\nThere weren\'t any classes in AI at Cornell then, not even graduate classes, so I started trying to teach', start_char_idx=2, end_char_idx=4473, text_template='[Excerpt from document]\n{metadata_str}\nExcerpt:\n-----\n{content}\n-----\n', metadata_template='{key}: {value}', metadata_seperator='\n')
Cache¶
In [ ]:
pipeline = IngestionPipeline(
    transformations=[
        TokenTextSplitter(chunk_size=1024, chunk_overlap=100),
        TitleExtractor(),
    ]
)
nodes = pipeline.run(documents=documents)
100%|██████████| 5/5 [00:01<00:00, 4.76it/s]
In [ ]:
# save and load
pipeline.cache.persist("./llama_cache.json")
new_cache = IngestionCache.from_persist_path("./llama_cache.json")
In [ ]:
new_pipeline = IngestionPipeline(
    transformations=[
        TokenTextSplitter(chunk_size=1024, chunk_overlap=100),
        TitleExtractor(),
    ],
    cache=new_cache,
)
Now the pipeline runs almost instantly because of the cache.¶
This is especially useful for slow or costly steps such as metadata extraction and embedding generation.
In [ ]:
nodes = new_pipeline.run(documents=documents)
Now let's add embeddings to the pipeline. The node parsing and title extraction results are loaded from the cache, so only the OpenAI embeddings are computed in this run.
In [ ]:
pipeline = IngestionPipeline(
    transformations=[
        TokenTextSplitter(chunk_size=1024, chunk_overlap=100),
        TitleExtractor(),
        OpenAIEmbedding(),
    ],
    cache=new_cache,
)
nodes = pipeline.run(documents=documents)
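The reuse described above can be pictured with a toy cache. The notebook does not show how the cache derives its keys, so the sketch below assumes a key built from a step's name, its settings, and its input; `ToyCache`, `cache_key`, `run_pipeline`, and the three stand-in steps are hypothetical names for illustration, not part of LlamaIndex.

```python
import hashlib
import json


class ToyCache:
    """In-memory cache mapping a hash key to a step's output (illustrative only)."""

    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0


def cache_key(step_name, step_config, inputs):
    """Assumed scheme: hash the step name, its settings, and its inputs together."""
    payload = json.dumps([step_name, step_config, inputs], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_pipeline(steps, inputs, cache):
    """Run each (name, config, fn) step, reusing a cached output when the key matches."""
    data = inputs
    for name, config, fn in steps:
        key = cache_key(name, config, data)
        if key in cache.store:
            cache.hits += 1
            data = cache.store[key]
        else:
            cache.misses += 1
            data = fn(data, **config)
            cache.store[key] = data
    return data


def split(texts, chunk_size):
    """Stand-in splitter: chunks of at most chunk_size whitespace-separated words."""
    chunks = []
    for text in texts:
        words = text.split()
        for i in range(0, len(words), chunk_size):
            chunks.append(" ".join(words[i:i + chunk_size]))
    return chunks


def add_title(chunks):
    """Stand-in for a costly metadata step: the first word becomes the title."""
    return [{"title": c.split()[0], "text": c} for c in chunks]


def embed(nodes):
    """Stand-in for an embedding call: a one-number 'vector' per node."""
    return [dict(n, embedding=[float(len(n["text"]))]) for n in nodes]


docs = ["alpha beta gamma delta", "epsilon zeta"]
cache = ToyCache()

two_steps = [("split", {"chunk_size": 2}, split), ("title", {}, add_title)]
run_pipeline(two_steps, docs, cache)   # first run: both steps are computed
run_pipeline(two_steps, docs, cache)   # second run: both steps come from the cache

three_steps = two_steps + [("embed", {}, embed)]
result = run_pipeline(three_steps, docs, cache)  # only the new embed step is computed

print(cache.hits, cache.misses)
```

Because each step's key depends on its input, changing an earlier step (for example a different `chunk_size`) would also invalidate everything downstream in this sketch.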
In [ ]:
# save and load
pipeline.cache.persist("./nodes_embedding.json")
nodes_embedding_cache = IngestionCache.from_persist_path(
    "./nodes_embedding.json"
)
In [ ]:
pipeline = IngestionPipeline(
    transformations=[
        TokenTextSplitter(chunk_size=1024, chunk_overlap=100),
        TitleExtractor(),
        OpenAIEmbedding(),
    ],
    cache=nodes_embedding_cache,
)
# Results are loaded from the cache because the transformations are the same.
nodes = pipeline.run(documents=documents)
In [ ]:
nodes[0].text
Out[ ]:
'What I Worked On\n\nFebruary 2021\n\nBefore college the two main things I worked on, outside of school, were writing and programming. I didn\'t write essays. I wrote what beginning writers were supposed to write then, and probably still are: short stories. My stories were awful. They had hardly any plot, just characters with strong feelings, which I imagined made them deep.\n\nThe first programs I tried writing were on the IBM 1401 that our school district used for what was then called "data processing." This was in 9th grade, so I was 13 or 14. The school district\'s 1401 happened to be in the basement of our junior high school, and my friend Rich Draves and I got permission to use it. It was like a mini Bond villain\'s lair down there, with all these alien-looking machines — CPU, disk drives, printer, card reader — sitting up on a raised floor under bright fluorescent lights.\n\nThe language we used was an early version of Fortran. You had to type programs on punch cards, then stack them in the card reader and press a button to load the program into memory and run it. The result would ordinarily be to print something on the spectacularly loud printer.\n\nI was puzzled by the 1401. I couldn\'t figure out what to do with it. And in retrospect there\'s not much I could have done with it. The only form of input to programs was data stored on punched cards, and I didn\'t have any data stored on punched cards. The only other option was to do things that didn\'t rely on any input, like calculate approximations of pi, but I didn\'t know enough math to do anything interesting of that type. So I\'m not surprised I can\'t remember any programs I wrote, because they can\'t have done much. My clearest memory is of the moment I learned it was possible for programs not to terminate, when one of mine didn\'t. 
On a machine without time-sharing, this was a social as well as a technical error, as the data center manager\'s expression made clear.\n\nWith microcomputers, everything changed. Now you could have a computer sitting right in front of you, on a desk, that could respond to your keystrokes as it was running instead of just churning through a stack of punch cards and then stopping. [1]\n\nThe first of my friends to get a microcomputer built it himself. It was sold as a kit by Heathkit. I remember vividly how impressed and envious I felt watching him sitting in front of it, typing programs right into the computer.\n\nComputers were expensive in those days and it took me years of nagging before I convinced my father to buy one, a TRS-80, in about 1980. The gold standard then was the Apple II, but a TRS-80 was good enough. This was when I really started programming. I wrote simple games, a program to predict how high my model rockets would fly, and a word processor that my father used to write at least one book. There was only room in memory for about 2 pages of text, so he\'d write 2 pages at a time and then print them out, but it was a lot better than a typewriter.\n\nThough I liked programming, I didn\'t plan to study it in college. In college I was going to study philosophy, which sounded much more powerful. It seemed, to my naive high school self, to be the study of the ultimate truths, compared to which the things studied in other fields would be mere domain knowledge. What I discovered when I got to college was that the other fields took up so much of the space of ideas that there wasn\'t much left for these supposed ultimate truths. All that seemed left for philosophy were edge cases that people in other fields felt could safely be ignored.\n\nI couldn\'t have put this into words when I was 18. All I knew at the time was that I kept taking philosophy courses and they kept being boring. 
So I decided to switch to AI.\n\nAI was in the air in the mid 1980s, but there were two things especially that made me want to work on it: a novel by Heinlein called The Moon is a Harsh Mistress, which featured an intelligent computer called Mike, and a PBS documentary that showed Terry Winograd using SHRDLU. I haven\'t tried rereading The Moon is a Harsh Mistress, so I don\'t know how well it has aged, but when I read it I was drawn entirely into its world. It seemed only a matter of time before we\'d have Mike, and when I saw Winograd using SHRDLU, it seemed like that time would be a few years at most. All you had to do was teach SHRDLU more words.\n\nThere weren\'t any classes in AI at Cornell then, not even graduate classes, so I started trying to teach'
RAG using Ingestion Pipeline¶
In [ ]:
import qdrant_client
from llama_index.vector_stores.qdrant import QdrantVectorStore

client = qdrant_client.QdrantClient(location=":memory:")
vector_store = QdrantVectorStore(
    client=client, collection_name="llama_index_vector_store"
)
pipeline = IngestionPipeline(
    transformations=[
        TokenTextSplitter(chunk_size=1024, chunk_overlap=100),
        TitleExtractor(),
        OpenAIEmbedding(),
    ],
    cache=nodes_embedding_cache,
    vector_store=vector_store,
)
# Ingest directly into a vector db
nodes = pipeline.run(documents=documents)
In [ ]:
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_vector_store(vector_store)
In [ ]:
query_engine = index.as_query_engine()
In [ ]:
response = query_engine.query("What did paul graham do growing up?")
print(response)
Paul Graham skipped a step in the evolution of computers and went straight from batch processing to microcomputers, which made microcomputers seem all the more exciting to him.
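Conceptually, a query against a vector-store-backed index embeds the question, retrieves the stored nodes whose embeddings are closest to it, and hands their text to the LLM. The retrieval step can be sketched in plain Python. The three-dimensional vectors, the labels, and the `top_k` helper below are made up for illustration; real embeddings have far more dimensions, and this is not how Qdrant implements its search.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=2):
    """Return the k (text, score) pairs most similar to the query vector."""
    scored = [(text, cosine_similarity(query_vector, vec)) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up (text, embedding) pairs standing in for ingested nodes.
stored_nodes = [
    ("chunk about stories", [0.9, 0.1, 0.0]),
    ("chunk about programming", [0.1, 0.9, 0.1]),
    ("chunk about painting", [0.0, 0.2, 0.9]),
]

# Made-up embedding of the user's question.
query = [0.2, 0.8, 0.0]

matches = top_k(query, stored_nodes, k=2)
print([text for text, _ in matches])
```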
Custom Transformations¶
Implementing custom transformations is pretty easy.
Let's include a transformation that removes special characters from the text before generating embeddings.
The main requirement for a transformation is that it accepts a list of nodes and returns a list of nodes, usually the same nodes after modification.
In [ ]:
from llama_index.core.schema import TransformComponent
import re


class TextCleaner(TransformComponent):
    def __call__(self, nodes, **kwargs):
        for node in nodes:
            node.text = re.sub(r"[^0-9A-Za-z ]", "", node.text)
        return nodes


pipeline = IngestionPipeline(
    transformations=[
        TokenTextSplitter(chunk_size=1024, chunk_overlap=100),
        TextCleaner(),
        OpenAIEmbedding(),
    ],
    cache=nodes_embedding_cache,
)
nodes = pipeline.run(documents=documents)
In [ ]:
nodes[0].text
Out[ ]:
'What I Worked OnFebruary 2021Before college the two main things I worked on outside of school were writing and programming I didnt write essays I wrote what beginning writers were supposed to write then and probably still are short stories My stories were awful They had hardly any plot just characters with strong feelings which I imagined made them deepThe first programs I tried writing were on the IBM 1401 that our school district used for what was then called data processing This was in 9th grade so I was 13 or 14 The school districts 1401 happened to be in the basement of our junior high school and my friend Rich Draves and I got permission to use it It was like a mini Bond villains lair down there with all these alienlooking machines CPU disk drives printer card reader sitting up on a raised floor under bright fluorescent lightsThe language we used was an early version of Fortran You had to type programs on punch cards then stack them in the card reader and press a button to load the program into memory and run it The result would ordinarily be to print something on the spectacularly loud printerI was puzzled by the 1401 I couldnt figure out what to do with it And in retrospect theres not much I could have done with it The only form of input to programs was data stored on punched cards and I didnt have any data stored on punched cards The only other option was to do things that didnt rely on any input like calculate approximations of pi but I didnt know enough math to do anything interesting of that type So Im not surprised I cant remember any programs I wrote because they cant have done much My clearest memory is of the moment I learned it was possible for programs not to terminate when one of mine didnt On a machine without timesharing this was a social as well as a technical error as the data center managers expression made clearWith microcomputers everything changed Now you could have a computer sitting right in front of you on a desk that could respond to 
your keystrokes as it was running instead of just churning through a stack of punch cards and then stopping 1The first of my friends to get a microcomputer built it himself It was sold as a kit by Heathkit I remember vividly how impressed and envious I felt watching him sitting in front of it typing programs right into the computerComputers were expensive in those days and it took me years of nagging before I convinced my father to buy one a TRS80 in about 1980 The gold standard then was the Apple II but a TRS80 was good enough This was when I really started programming I wrote simple games a program to predict how high my model rockets would fly and a word processor that my father used to write at least one book There was only room in memory for about 2 pages of text so hed write 2 pages at a time and then print them out but it was a lot better than a typewriterThough I liked programming I didnt plan to study it in college In college I was going to study philosophy which sounded much more powerful It seemed to my naive high school self to be the study of the ultimate truths compared to which the things studied in other fields would be mere domain knowledge What I discovered when I got to college was that the other fields took up so much of the space of ideas that there wasnt much left for these supposed ultimate truths All that seemed left for philosophy were edge cases that people in other fields felt could safely be ignoredI couldnt have put this into words when I was 18 All I knew at the time was that I kept taking philosophy courses and they kept being boring So I decided to switch to AIAI was in the air in the mid 1980s but there were two things especially that made me want to work on it a novel by Heinlein called The Moon is a Harsh Mistress which featured an intelligent computer called Mike and a PBS documentary that showed Terry Winograd using SHRDLU I havent tried rereading The Moon is a Harsh Mistress so I dont know how well it has aged but when I read 
it I was drawn entirely into its world It seemed only a matter of time before wed have Mike and when I saw Winograd using SHRDLU it seemed like that time would be a few years at most All you had to do was teach SHRDLU more wordsThere werent any classes in AI at Cornell then not even graduate classes so I started trying to teach'
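The output above shows words fused across paragraph breaks (for example `deepThe` and `AIAI`), because the pattern `[^0-9A-Za-z ]` deletes newlines along with punctuation. If that is not wanted, one option is to turn whitespace into spaces before stripping the other characters. The sketch below uses a hypothetical `FakeNode` stand-in so that it runs without LlamaIndex installed; in the notebook, the same `__call__` body would presumably sit in a `TransformComponent` subclass like `TextCleaner`.

```python
import re
from dataclasses import dataclass


@dataclass
class FakeNode:
    """Hypothetical stand-in for a node; only the text attribute is modelled."""

    text: str


class WhitespaceAwareCleaner:
    """Callable that takes a list of nodes and returns the same list, cleaned."""

    def __call__(self, nodes, **kwargs):
        for node in nodes:
            # Turn newlines and tabs into plain spaces so words stay separated.
            text = re.sub(r"\s", " ", node.text)
            # Drop everything that is not a letter, a digit, or a space.
            text = re.sub(r"[^0-9A-Za-z ]", "", text)
            # Collapse the runs of spaces left behind by the two steps above.
            node.text = re.sub(r" +", " ", text).strip()
        return nodes


sample = [FakeNode("made them deep.\n\nThe machines - CPU, printer - didn't stop.")]
cleaned = WhitespaceAwareCleaner()(sample)
print(cleaned[0].text)
```

Whether to keep paragraph boundaries at all depends on the embedding model and the downstream use, so this is a design choice rather than a fix that always applies.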