import sys
from pathlib import Path
sys.path.insert(1, str(Path.cwd().parent))
30 Vectorstores and Embeddings
Recall the overall workflow for retrieval augmented generation (RAG):
ls = []
ls.extend(["a"])
ls.extend(["b"])
ls
['a', 'b']
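The list.extend call matters in the loading code below: each loader.load() returns a list of documents (one per page for PyPDFLoader), and we want one flat list rather than a list of lists. A quick comparison with append, using plain strings as placeholders for documents:

```python
# Placeholder "pages" standing in for the Document objects a loader returns.
pages_a = ["page 1", "page 2"]
pages_b = ["page 3"]

nested = []
nested.append(pages_a)  # append adds the whole list as ONE element
nested.append(pages_b)

flat = []
flat.extend(pages_a)    # extend adds each element of the list individually
flat.extend(pages_b)

print(nested)  # [['page 1', 'page 2'], ['page 3']]
print(flat)    # ['page 1', 'page 2', 'page 3']
```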
We just discussed Document Loading and Splitting.
from langchain.document_loaders import PyPDFLoader
# Load PDF
loaders = [
    # Duplicate documents on purpose: messy data
    PyPDFLoader("docs/MachineLearning-Lecture01.pdf"),
    PyPDFLoader("docs/MachineLearning-Lecture01.pdf"),
    PyPDFLoader("docs/MachineLearning-Lecture02.pdf"),
    PyPDFLoader("docs/MachineLearning-Lecture03.pdf")
]
docs = []
for loader in loaders:
    docs.extend(loader.load())
# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 150
)
splits = text_splitter.split_documents(docs)
len(splits)
151
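The chunk_size and chunk_overlap parameters above can be pictured with a much simpler splitter. RecursiveCharacterTextSplitter prefers to break at separators such as blank lines, newlines and spaces before falling back to single characters; the hypothetical split_with_overlap below ignores separators entirely and cuts fixed windows of characters, so treat it only as an illustration of how size and overlap interact. Each chunk starts chunk_size - chunk_overlap characters after the previous one, so neighbouring chunks share chunk_overlap characters.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical fixed-window splitter (NOT LangChain's recursive algorithm).

    Cuts text into windows of at most chunk_size characters; each window starts
    chunk_size - chunk_overlap characters after the previous one.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The shared characters ("hij", "opq", "vwx") give text near a chunk boundary some surrounding context in both neighbouring chunks, at the cost of storing a little text twice.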
30.1 Embeddings
Let’s take our splits and embed them.
from langchain.embeddings.openai import OpenAIEmbeddings
= OpenAIEmbeddings() embedding
Note: recent LangChain versions emit a LangChainDeprecationWarning here. The OpenAIEmbeddings class was deprecated in LangChain 0.0.9 and will be removed in 1.0; an updated version lives in the langchain-openai package. To use it, run pip install -U langchain-openai and import it with from langchain_openai import OpenAIEmbeddings.
sentence1 = "i like dogs"
sentence2 = "i like canines"
sentence3 = "the weather is ugly outside"
embedding1 = embedding.embed_query(sentence1)
embedding2 = embedding.embed_query(sentence2)
embedding3 = embedding.embed_query(sentence3)
[-0.005669887070811415,
-0.007792916467724385,
0.00620382220901969,
0.006839460098766079,
0.00394413100574398,
...]
import numpy as np
np.dot(embedding1, embedding2)
0.9631510802407719
np.dot(embedding1, embedding3)
0.7702031204123156
np.dot(embedding2, embedding3)
0.7590539714454778
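OpenAI's embedding vectors are normalized to length 1, so the dot products above are cosine similarities: values closer to 1 mean the vectors point in more similar directions. That is why "i like dogs" and "i like canines" score 0.963, while each scores only about 0.76 to 0.77 against the weather sentence. For vectors that are not normalized, you divide by their lengths. A small sketch with made-up 3-dimensional vectors (the function name cosine_similarity is our own, not a LangChain API):

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Dot product divided by the product of the two vector lengths."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Made-up 3-dimensional vectors; real embeddings have many more dimensions.
u = np.array([3.0, 4.0, 0.0])
v = np.array([4.0, 3.0, 0.0])

print(float(np.dot(u, v)))      # 24.0, grows with the vectors' lengths
print(cosine_similarity(u, v))  # 0.96, depends only on the angle between them

# Once both vectors are scaled to length 1, the plain dot product is the cosine.
u_hat = u / np.linalg.norm(u)
v_hat = v / np.linalg.norm(v)
print(float(np.dot(u_hat, v_hat)))  # about 0.96 (up to floating-point rounding)
```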
30.2 Vectorstores
# ! pip install chromadb
from langchain.vectorstores import Chroma
persist_directory = 'docs/chroma/'
!rm -rf ./docs/chroma # remove old database files if any
vectordb = Chroma.from_documents(
    documents=splits,
    embedding=embedding,
    persist_directory=persist_directory
)
print(vectordb._collection.count())
151
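Conceptually, the vector store keeps each split's text next to its embedding and, at query time, embeds the question and returns the stored texts whose vectors are closest. A minimal sketch of that idea, assuming tiny hand-made 3-dimensional vectors in place of real embeddings; the class and its methods are hypothetical and are not Chroma's API, and real vector stores typically use an approximate nearest-neighbour index rather than scanning every vector as this does.

```python
import numpy as np


class InMemoryVectorStore:
    """Hypothetical toy vector store: keeps each text next to its unit-length
    vector and returns the k texts closest to a query vector."""

    def __init__(self) -> None:
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, text: str, vector) -> None:
        v = np.asarray(vector, dtype=float)
        self.texts.append(text)
        self.vectors.append(v / np.linalg.norm(v))  # normalize once, at insert time

    def similarity_search(self, query_vector, k: int = 4) -> list[str]:
        if not self.vectors:
            return []
        q = np.asarray(query_vector, dtype=float)
        q = q / np.linalg.norm(q)
        scores = np.stack(self.vectors) @ q  # cosine similarity with every stored vector
        top = np.argsort(-scores)[:k]        # indices of the k highest scores
        return [self.texts[i] for i in top]


# Made-up texts and vectors, for illustration only.
store = InMemoryVectorStore()
store.add("email the TAs at the course address", [0.9, 0.1, 0.0])
store.add("homework is due on Wednesdays", [0.2, 0.9, 0.1])
store.add("the weather is ugly outside", [0.0, 0.1, 0.9])

print(store.similarity_search([1.0, 0.2, 0.0], k=2))
# ['email the TAs at the course address', 'homework is due on Wednesdays']
```

The k argument plays the same role as k in vectordb.similarity_search: it caps how many of the highest-scoring texts come back, whether or not they are actually relevant.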
30.2.1 Similarity Search
question = "is there an email i can ask for help"
docs = vectordb.similarity_search(question,k=3)
len(docs)
3
docs[0].page_content
"cs229-qa@cs.stanford.edu. This goes to an account that's read by all the TAs and me. So \nrather than sending us email individually, if you send email to this account, it will \nactually let us get back to you maximally quickly with answers to your questions. \nIf you're asking questions about homework problems, please say in the subject line which \nassignment and which question the email refers to, since that will also help us to route \nyour question to the appropriate TA or to me appropriately and get the response back to \nyou quickly. \nLet's see. Skipping ahead — let's see — for homework, one midterm, one open and term \nproject. Notice on the honor code. So one thing that I think will help you to succeed and \ndo well in this class and even help you to enjoy this class more is if you form a study \ngroup. \nSo start looking around where you're sitting now or at the end of class today, mingle a \nlittle bit and get to know your classmates. I strongly encourage you to form study groups \nand sort of have a group of people to study with and have a group of your fellow students \nto talk over these concepts with. You can also post on the class newsgroup if you want to \nuse that to try to form a study group. \nBut some of the problems sets in this class are reasonably difficult. People that have \ntaken the class before may tell you they were very difficult. And just I bet it would be \nmore fun for you, and you'd probably have a better learning experience if you form a"
Let’s save this so we can use it later!
vectordb.persist()
30.3 Failure modes
This seems great, and basic similarity search will get you 80% of the way there very easily.
But there are some failure modes that can creep up.
Here are some edge cases that can arise; we’ll fix them in the next lecture.
question = "what did they say about matlab?"
docs = vectordb.similarity_search(question,k=5)
Notice that we’re getting duplicate chunks (because of the duplicate MachineLearning-Lecture01.pdf in the index).
Semantic search fetches all similar documents, but does not enforce diversity.
docs[0] and docs[1] are identical.
docs[0]
docs[1]
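A crude patch for exact duplicates is to drop repeated texts from the results while keeping the ranking order. The sketch below uses a hypothetical Doc dataclass as a stand-in for LangChain's Document and placeholder texts rather than real search output; it is not one of the approaches from the next lecture.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def drop_exact_duplicates(docs: list[Doc]) -> list[Doc]:
    """Keep the first occurrence of each distinct page_content, preserving rank order."""
    seen: set[str] = set()
    unique: list[Doc] = []
    for doc in docs:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique.append(doc)
    return unique


# Simulated top-5 results in which two chunks were indexed twice
# (placeholder texts, not real search output).
results = [
    Doc("chunk A", {"source": "docs/MachineLearning-Lecture01.pdf"}),
    Doc("chunk A", {"source": "docs/MachineLearning-Lecture01.pdf"}),
    Doc("chunk B", {"source": "docs/MachineLearning-Lecture01.pdf"}),
    Doc("chunk B", {"source": "docs/MachineLearning-Lecture01.pdf"}),
    Doc("chunk C", {"source": "docs/MachineLearning-Lecture02.pdf"}),
]
unique = drop_exact_duplicates(results)
print(len(results), "->", len(unique))  # 5 -> 3
```

This shows the cost of the failure mode: we asked for 5 results but only 3 carry distinct information, and near-duplicates whose text differs slightly would slip through the filter anyway.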
We can see a new failure mode.
The question below asks about the third lecture, but the results include chunks from other lectures as well.
question = "what did they say about regression in the third lecture?"
docs = vectordb.similarity_search(question,k=5)
for doc in docs:
    print(doc.metadata)
print(docs[4].page_content)
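Similarity ranking only compares the meaning of the text; it has no notion that "the third lecture" refers to a chunk's metadata. One way to picture the fix is to filter candidates on metadata first and rank only the survivors. A minimal sketch, assuming each chunk's metadata has a source key holding the PDF path (as PyPDFLoader records it); the Chunk class, filtered_search function and 2-dimensional vectors are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Chunk:
    """Hypothetical stand-in for a stored chunk: text, metadata and embedding."""
    text: str
    metadata: dict
    vector: np.ndarray


def filtered_search(chunks: list[Chunk], query_vector, k: int, where: dict) -> list[Chunk]:
    """Keep chunks whose metadata matches every key/value pair in `where`,
    then rank the survivors by cosine similarity and return the top k."""
    candidates = [
        c for c in chunks
        if all(c.metadata.get(key) == value for key, value in where.items())
    ]
    q = np.asarray(query_vector, dtype=float)
    q = q / np.linalg.norm(q)

    def cosine(c: Chunk) -> float:
        return float(c.vector @ q / np.linalg.norm(c.vector))

    return sorted(candidates, key=cosine, reverse=True)[:k]


lec1 = "docs/MachineLearning-Lecture01.pdf"
lec2 = "docs/MachineLearning-Lecture02.pdf"
lec3 = "docs/MachineLearning-Lecture03.pdf"
chunks = [
    Chunk("regression, lecture 1", {"source": lec1}, np.array([0.9, 0.1])),
    Chunk("regression, lecture 2", {"source": lec2}, np.array([0.95, 0.05])),
    Chunk("regression, lecture 3", {"source": lec3}, np.array([0.7, 0.3])),
    Chunk("other topic, lecture 3", {"source": lec3}, np.array([0.1, 0.9])),
]
query = [1.0, 0.0]  # made-up embedding of the regression question

unfiltered = filtered_search(chunks, query, k=2, where={})
filtered = filtered_search(chunks, query, k=2, where={"source": lec3})
print([c.text for c in unfiltered])  # ['regression, lecture 2', 'regression, lecture 1']
print([c.text for c in filtered])    # ['regression, lecture 3', 'other topic, lecture 3']
```

Filtering before ranking guarantees up to k results from the requested lecture; filtering the top k after ranking could leave nothing at all. Working out the filter value from the question itself is a separate problem.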
Approaches discussed in the next lecture can be used to address both!