All right, so we're going to talk about Canopy. Canopy is Pinecone's open-source RAG framework, and Pinecone, if you don't know, is a cloud-native vector database. Canopy is written in Python, it comes from Pinecone, and what it does is abstract away all the non-trivial tasks involved in creating a RAG GenAI app. The point is that by abstracting away all of these tedious tasks, Canopy frees up your time as a developer to concentrate on what we think really matters: the end product.

So Canopy abstracts away all the work involved with data chunking, generating the vectors, or embeddings (we'll use the two terms interchangeably in this presentation), setting up your vector database, query optimization, context generation, and LLM orchestration and management. All you need is a Pinecone API key and a Pinecone environment, which you can sign up for very easily at pinecone.io, an OpenAI API key (we currently use OpenAI embedding models), and text files to populate your Canopy index with; those can be Parquet, JSONL, TXT, or CSV files. And since it's all backed by Pinecone, you get free storage and compute up to 100,000 vectors. That's about 15 million words, or about 30,000 pages of text, and you have the ability to scale to meet production demands seamlessly. You also get the Pinecone dashboard; if you've used Pinecone before, it's the exact same dashboard, and your Canopy index sits right alongside all of your regular Pinecone indexes. We're HIPAA, GDPR, and SOC 2 Type II compliant and certified, and all of Pinecone's infrastructure and algorithms are purpose-built for vector computation and storage.

So this is something I'm sure you're all familiar with: the typical GenAI workflow. A user issues a query to an LLM, and the LLM answers. They might ask, "What is the most colorful bird in the world?" That query goes to an LLM, likely one made by OpenAI or Cohere, or hosted on Hugging Face, et cetera, and the LLM says the most colorful bird in the world is a parrot.

When we talk about RAG, retrieval-augmented generation, we add two more steps to that workflow. Your user query first goes to a vector database; hopefully that would be Pinecone. From the vector database you get vector search results, which we call context. Then the vector search results and your user query go together to the large language model, and the LLM uses that context to answer the user.

So we have the same scenario as before, but this time let's say we have some proprietary, very cool, very private bird documents. Now we know that the most colorful bird in the world is not just a parrot; it's actually a Wilson's bird-of-paradise. This is not public information. The LLM could never have known this, and it can only know it because you used RAG in your GenAI application. The yellow highlighted bit on the slide is the combined user query and context; they go to the LLM together so that the LLM has access to information it otherwise would never have had. This is all well and good, but the vector database and context retrieval aspect, while conceptually easy, is tedious.
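Before getting into Canopy itself, here's what that two-extra-steps workflow looks like as code. This is a minimal sketch, not Canopy: it assumes the pre-1.0 `openai` client and the pod-based `pinecone-client` of this era, a hypothetical existing index named "bird-docs", and that each record stores its chunk text in a `text` metadata field.

```python
import os
import openai
import pinecone

openai.api_key = os.environ["OPENAI_API_KEY"]
pinecone.init(api_key=os.environ["PINECONE_API_KEY"],
              environment=os.environ["PINECONE_ENVIRONMENT"])
index = pinecone.Index("bird-docs")  # hypothetical existing index

def answer_with_rag(user_query: str, top_k: int = 5) -> str:
    # Step 1: embed the query with the same model used at indexing time.
    emb = openai.Embedding.create(
        model="text-embedding-ada-002", input=[user_query]
    )["data"][0]["embedding"]

    # Step 2: retrieve the top-k most relevant chunks, the "context".
    hits = index.query(vector=emb, top_k=top_k, include_metadata=True)
    context = "\n\n".join(m["metadata"]["text"] for m in hits["matches"])

    # Step 3: send the context and the original query to the LLM together.
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "Answer using only the provided context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {user_query}"},
        ],
    )
    return resp["choices"][0]["message"]["content"]

print(answer_with_rag("What is the most colorful bird in the world?"))
```

Everything that follows is Canopy taking each of those steps off your plate.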
That retrieval plumbing requires dev hours, and research as well, to actually do well. So some of the non-trivial steps involved in RAG applications (we'll go through what all of these mean later) are: chunking up your data; choosing whichever embedding model you want to create vectors out of your text data; actually using that model, which is a whole other beast; setting up your vector database; querying that database; choosing your LLM, which is what will generate your answers; and finally prompting that LLM, with system-level prompts, context prompts, all the prompts. All of these things just take a lot of time, and with Canopy, you don't have to do any of it. I will note, though, that Canopy is fully, completely open source. So if you have the know-how and you want more control over any of these things, there's an extremely easy configuration file: a YAML where you can tweak all the knobs and make Canopy as fancy as you want, if you don't want to go with our defaults.

There are two deployment options for Canopy. One, you can deploy it as a service: you spin up the whole Canopy stack in the Canopy server. Our app.py file has all of our endpoints, and they're all wrapped in FastAPI decorators, so it's super easy to use out of the box. Two, you can use Canopy's modular components, all of them, or two or three, or just one, as libraries embedded in your own GenAI stack, if you'd rather cherry-pick what you want out of the Canopy library.

And this is my favorite part of Canopy, probably a developer's first touch point when using it: the CLI. This is where pretty much everything gets done, with just a few commands. We'll have a live demo of this later, but all you have to do is run canopy chat, and essentially you're chatting with all the documents that Canopy has indexed into your index for you. Notably, the CLI is a development tool. You use it to compare and contrast different configurations of your RAG pipeline. So if you set a higher temperature on your LLM, or a different chunk size or token overlap, things like that, you'd chat with your documents after each of those iterations, figure out which one is best in the POC phase, and then go to production.

All right, so if we go below the surface of the CLI, there are three main components, each embedded within the other, but modular, so they can be decoupled. First is the chat engine; that's what you interact with primarily when you're hitting the CLI. The chat engine handles your prompts and saves your chat history, and it can also do multi-turn and multi-agent conversations. It's really wonderful; it saves your state and all that. The context engine builds and manages your context from your Pinecone vector database. And the knowledge base orchestrates all the scary things: the chunking, the vectorization, building the actual Pinecone index, and upserting your documents to Pinecone.
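To give you a feel for that library route, here's roughly what wiring the three components together looks like. This is a sketch based on Canopy's quickstart, so treat the exact import paths, method names, and response shape as version-dependent rather than gospel.

```python
from canopy.tokenizer import Tokenizer
from canopy.knowledge_base import KnowledgeBase
from canopy.context_engine import ContextEngine
from canopy.chat_engine import ChatEngine
from canopy.models.data_models import Document, UserMessage

Tokenizer.initialize()  # global tokenizer shared by all components

# Knowledge base: chunking, embedding, index management, upserts.
kb = KnowledgeBase(index_name="ai-dev-demo")  # becomes "canopy--ai-dev-demo"
kb.connect()  # or kb.create_canopy_index() on first run

kb.upsert([Document(id="1711.05101",
                    text="Decoupled Weight Decay Regularization ...",
                    source="https://arxiv.org/abs/1711.05101")])

# Context engine: turns queries into context pulled from the index.
context_engine = ContextEngine(kb)

# Chat engine: history, prompting, and the LLM call.
chat_engine = ChatEngine(context_engine)
response = chat_engine.chat(
    messages=[UserMessage(content="What is decoupled weight decay?")],
    stream=False,
)
print(response.choices[0].message.content)
```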
All right, so now we're gonna go through a little before-and-after, to show you just how much work Canopy can save you.

First, chunking. Without Canopy, as I'm sure a lot of you know if you've tried to build a RAG pipeline, getting the data into a good state is very annoying. It creates a lot of ambiguity, and you have to do a lot of research. There are really good tools out there now, LangChain, LlamaIndex, et cetera, but it's still time, and we don't have time for these things. Without Canopy, you'd have to figure out your chunk size, and there are lots of strategies for this. You can divide your text document into chunks by paragraph, by sentence, or by some n number of tokens. You can configure how many tokens overlap across chunks, that is, how many tokens at the end of one chunk and the beginning of the next are the same. And then there's just random stuff to worry about, like whether you want to keep punctuation, whether you want to split on line breaks, whether you want to keep those line breaks. There are lots of things I don't want to think about; I just want to build cool LLM applications.

With Canopy, all of this is abstracted away for you, and the heuristics we've put into the default YAML file were developed by a team dedicated solely to researching this product, so it's all with best practices in mind. They all have PhDs and they're very fancy, so we should all trust them. By default, Canopy sets the maximum tokens in each of your chunks to 256. That's pretty small, and you can change it if you want, but we've found that for most text-based LLM RAG use cases it's a good chunk size. We extend LangChain's recursive character text splitter, and some other methods from LangChain, to chunk up your documents intelligently: we keep the markdown separators and split on things like headers and new lines. And for each chunk, Canopy formats it into an object that's ideal for indexing into Pinecone (the data you put into Pinecone needs to be in a particular format), and it automatically includes any custom metadata fields your data has when you upload it.

Next, vectorization. Without Canopy, figuring out what model to use for vectorization is difficult. There are hundreds of embedding models out there, each one pre-trained on a specific task or ideal for a certain domain, and it's really hard to figure out which one to use. Even when you do figure that out, actually hitting the model is confusing too. Do you hit it through an API? Do you host it locally? If you host it locally, will it crash your computer, like it did mine a bunch of times? All of these things are very complicated.

Canopy, though, does it all for you. By default, we use Ada-002, OpenAI's text embedding model, but as of today, literally at like 9 a.m. this morning, as of version 0.5.0, we support Cohere and Anyscale embedding models. That's really exciting, especially if you want multilingual RAG applications: Cohere's embedding models are multilingual, which OpenAI's are not. We also generate the embeddings for you in batches, with a default batch size of 400 sent to the OpenAI API (that's why you need your API key), and we do it all with fancy Python syntax, generator objects, all of that good stuff, error handling, retries, all that jazz.
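To make those two steps concrete, first the chunking. This standalone sketch uses LangChain's splitter directly (the talk says Canopy extends it internally); the overlap value and the separator list here are illustrative choices, not Canopy's exact defaults.

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Token-based recursive splitting: try big separators (headers,
# blank lines) first, and fall back to smaller ones.
splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=256,    # max tokens per chunk, matching the default above
    chunk_overlap=30,  # tokens shared between neighboring chunks (illustrative)
    separators=["\n## ", "\n\n", "\n", ". ", " "],
)

text = open("paper.txt").read()
chunks = splitter.split_text(text)

# Each chunk becomes one record destined for the vector index,
# carrying its source metadata along with it.
records = [
    {"id": f"paper-1_{i}", "text": chunk, "metadata": {"source": "paper.txt"}}
    for i, chunk in enumerate(chunks)
]
```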
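And then the embedding step: a sketch of batched embedding calls with retries and backoff. The batch size of 400 is the default mentioned above; the retry policy here is illustrative.

```python
import time
import openai  # pre-1.0 client

def embed_in_batches(texts, batch_size=400, max_retries=3):
    """Embed texts in batches, retrying failed calls with backoff."""
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                resp = openai.Embedding.create(
                    model="text-embedding-ada-002", input=batch
                )
                vectors.extend(d["embedding"] for d in resp["data"])
                break
            except openai.error.OpenAIError:
                if attempt == max_retries - 1:
                    raise  # give up after the last retry
                time.sleep(2 ** attempt)  # exponential backoff
    return vectors
```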
All right, indexing. Without Canopy, and I'm not gonna make this sound too bad because indexing with Pinecone is a breeze, it's still annoying; I don't want to have to deal with yet another thing. Every time I say, okay, I'll just do this one thing manually, that list gets up to 20 items long and then I've missed my dinner. So, configuring the index is annoying. You have to figure out the dimensionality you want for your index, and what similarity metric you think you need for your particular use case: hybrid search needs dot product, while regular search needs cosine, et cetera. You also, as I said before, need to format your data objects. Pinecone objects need a particular format, essentially a map of ID, vector, and metadata, where the metadata is a nested dictionary with lots of fields. And then upserting these objects into Pinecone, if you're dealing with really large scale, is non-trivial: you need to do it in batches, or with multiprocessing in parallel, keeping bottlenecks, retries, et cetera in mind.

With Canopy, we do all that junk for you. By default, the Pinecone index Canopy spins up for you is the one we see most people using. It has cosine similarity as the similarity metric, and it runs on a P1 pod type. We have three pod types, and P1 is the one with the best balance between cost and performance; it's the most popular pod type we have. Canopy will also intelligently pull the dimensionality from whatever text embedding model it knows you're using to generate your vectors, so you don't have to worry about "my vectors have a dimension of 1536 but my index has a dimension of 768, what am I gonna do?" It doesn't matter; Canopy takes care of all of that alignment for you. It also, as I said before, formats your data objects with all the fields and nested dictionaries that Pinecone needs for indexing, and then it batch-upserts your objects, again with fancy, wonderful Python syntax, this time in batches of 200.

Lastly, prompting. There's a whole field now, prompt engineering, and without Canopy it's a little bit confusing what you should do and what format your prompts should be in. For most LLMs, you should really be writing a custom system prompt. This tells the LLM how to act: whether or not it's okay to hallucinate a little bit, or whether it has to be 100% accurate because you're dealing with medical cases, anything like that. And it's really hard to know which words matter, which line breaks matter, all of those things. Luckily, again, Canopy does this all for you. The picture on the right-hand side is a screenshot of the YAML, which I'll actually show you live soon. Basically, we ship a system prompt for you, again super easy to change, that says: don't hallucinate, only use the context, answer as if it's your own knowledge, don't act robotic and weird.

And then there's something unique to Canopy, which I hadn't heard of anybody doing before I started working on the Canopy product: we issue a retrieval prompt. If you've worked with vector databases and LLMs at all, you know that whatever question I as a user might ask, it might not be optimized for vector retrieval. So on the back end of Canopy, when I ask a question, let's say it's really long, or really complex, or has lots of entities that are confusing to an LLM at face value, Canopy will actually use GPT-3.5 Turbo to optimize that query, splitting it into sub-questions that are optimized for vector retrieval. So you never have to worry about whether your prompt is good for both an LLM and a vector database; Canopy takes care of that for you. And as I said before, it formats it all: line breaks and the miscellaneous things that can change the behavior of the LLM are taken care of for you.
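Here's the shape of that query-reformulation trick. To be clear, this is an illustration of the idea, not Canopy's actual retrieval prompt, and it assumes the model cooperates and returns valid JSON.

```python
import json
import openai  # pre-1.0 client

REWRITE_PROMPT = (
    "Rewrite the user's question as a JSON array of short, self-contained "
    "search queries optimized for semantic (vector) retrieval."
)

def to_search_queries(question: str) -> list[str]:
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": REWRITE_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    # Assumes the model returned valid JSON, e.g.
    # ["Aitchison researcher", "Aitchison weight decay"].
    return json.loads(resp["choices"][0]["message"]["content"])
```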
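And circling back to the indexing step for a second, here's a sketch of the index setup and batched upserts Canopy automates, using the pod-based `pinecone-client` of this era.

```python
import os
import pinecone

pinecone.init(api_key=os.environ["PINECONE_API_KEY"],
              environment=os.environ["PINECONE_ENVIRONMENT"])

# Dimensionality must match the embedding model (1536 for ada-002);
# this is the alignment Canopy handles for you.
pinecone.create_index("canopy--ai-dev-demo", dimension=1536,
                      metric="cosine", pod_type="p1.x1")
index = pinecone.Index("canopy--ai-dev-demo")

def upsert_in_batches(records, batch_size=200):
    """records: (id, vector, metadata) tuples, upserted 200 at a time."""
    for start in range(0, len(records), batch_size):
        index.upsert(vectors=records[start:start + batch_size])
```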
All right, we're gonna do a live demo. I'm really scared to do this, I've literally never done a live demo, so we're gonna see how this goes. And with this double-screen thing, it might be even harder.

Okay, so I made a little Pinecone index, and all I had to do in the CLI was run canopy new. I had an environment variable with my index name, ai-dev-demo, and Canopy prefixes everything with "canopy--". So we have zero vectors in here right now. I'm using a dataset from my coworker James, who is here; he put it onto Hugging Face, and it's just a bunch of arXiv research papers. Behind the scenes, I took 10 of these research papers and put them into the format that Canopy likes, and we're going to play with it.

So, let me drag my terminal over here. I'm in a Poetry environment right now; this is what you'd be in if you were contributing to the library, but otherwise you can just pip install everything. And I'm going to run canopy start, which starts up the Canopy server, a Gunicorn server. All right, cool, so we're up and running. Canopy is alive, the server is up, and I can do a bunch of stuff with it.

Oh, actually, I just realized one thing; hold on, let me quit this for a moment. We have no vectors in our index, and obviously I want to put vectors in there, which I did not do. So I'm going to run canopy upsert ("upsert" is our Pinecone word for inserting and/or updating vectors) with the path to the JSONL file that has those 10 arXiv papers, which looks like this. I'm going to copy the path and bring it back over here. So I'm telling it: grab that JSONL file, chunk it, vectorize it, upsert it, do all the fancy things, and we'll see what happens.

And this is one of my favorite parts as well: it shows you a preview of how it has understood your file and asks, does this data look right? And you can say yes or no. We're going to say yes, it does look right. Now we have a little progress bar on the side here, and we can watch things trickle in. We got 200... we're at 40%, you can see it going... now it's 289. Do we think it auto-refreshes? Maybe, let's see. Oh, we're at 80%, 635... almost there... ding, ding, ding. Cool, we have a success message. If it were not successful, it would fail immediately and tell you, so you don't have to wait like two hours just to get an error message.

So we have a total of 805 vectors. Those vectors represent chunks of those arXiv articles; just 10 articles generate nearly 1,000 vectors, which is kind of cool.
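For reference, the file I upserted holds one JSON object per line, roughly in the id / text / source / metadata shape Canopy's docs describe. Here's a sketch of producing it; treat the exact field names as version-dependent.

```python
import json

papers = [
    {"id": "1711.05101",
     "text": "Decoupled Weight Decay Regularization ... full paper text ...",
     "source": "https://arxiv.org/abs/1711.05101",
     "metadata": {"authors": "Loshchilov, Hutter"}},
    # ... nine more papers ...
]

# One JSON object per line is what makes it JSONL.
with open("papers.jsonl", "w") as f:
    for paper in papers:
        f.write(json.dumps(paper) + "\n")
```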
On the dashboard, you can see previews of all these vectors: the vector values, the document ID, the source, et cetera. And one of these articles is this guy, which I loaded up here: "Decoupled Weight Decay Regularization." Doesn't sound super fun, but arXiv research papers are not always that fun, and I don't feel like reading this. I just want to know, essentially, what it's about. And I see this name, Aitchison, mentioned everywhere. Let me search for it, a-i-t-c-h... yeah, this dude, Aitchison. Who is he? What does he say? I don't know.

All right, cool. So now that we have all of our stuff in Canopy, we're gonna run canopy start again, like we did before, to start up the Canopy server; pretend we didn't already do that. And you can see here it says this is for debugging only. Again, this is for POCing stuff; you're not gonna launch into production through the CLI, that would be crazy.

Now I'm gonna run canopy chat, and this --no-rag flag here means the answers I get will show me both with and without RAG. By default it's just with RAG, but I wanna show you how cool Canopy is, so we're gonna do both. So we start this little chat interface up, and I have some questions here that I purposefully made pretty long and pretty complex, with referenced entities. "Who is this Aitchison guy? Has he said anything about weight decay? If so, what has he said?" This part isn't immediate; it has to go through two API calls, one to GPT-3.5 to reformat my query into sub-queries, and another to the LLM itself.

All right, so with RAG we have: Aitchison is a person who has discussed weight decay in the context of Bayesian filtering, yada, yada, yada. Cool, this is all from that article. But without RAG, ChatGPT says: "without more context, it's difficult to provide a specific answer; there are several individuals named this; blah, blah, blah; if you provide more details, I might be able to assist you further." That is annoying. I don't want to assist you further; you're supposed to be an AI, just tell me things.

All right, what if I say, "please tell me more about Aitchison's theoretical framework"? From what I remember of playing with this before, this is gonna be a biggie, which I like. So this answer is all about his theoretical framework, which is super cool, and it specifically says "as mentioned in the context," which I like as an end user, because I know it's actually reading my context. I don't have to go back into the article and copy-paste answers to check whether the answer my RAG pipeline gave me is actually mentioned in the paper. It also gives me the arXiv citation, the citation for my paper, so again, I can confirm that what I'm doing is actually making an impact on the end product. And without RAG, again: "given the lack of specific information, it's challenging to provide a perfect answer, blah, blah, blah." It just asks for more and more details about Aitchison, but listen, I already put them in the vector database; sorry you don't know about it.

And that is our live demo. I think it's really exciting, and everybody should use it.
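One quick aside on the demo: the CLI isn't the only way to chat. The Canopy server's chat endpoint is designed to be OpenAI-compatible, so pointing the pre-1.0 `openai` client at the locally running server should look something like this. The base URL, port, and the empty model name are assumptions from the docs of this era, so verify against your version.

```python
import openai

openai.api_base = "http://localhost:8000/v1"  # the local Canopy server
openai.api_key = "not-used"  # placeholder; the server holds the real keys

resp = openai.ChatCompletion.create(
    model="",  # the server picks the underlying LLM from its own config
    messages=[{"role": "user",
               "content": "What has Aitchison said about weight decay?"}],
)
print(resp["choices"][0]["message"]["content"])
```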
And more good things are coming in the future of Canopy. We have more model options that we're in the process of deploying; I personally am working on an Azure deployment right now, so you'll be able to take your Azure models, hit their endpoints, and host them on Azure through Canopy. We're also trying to get local deployments to work with Canopy. I personally would love to see, and I know the team is working on, direct file uploads, so you could just drag and drop a text file or a PDF directly into the CLI. Also evaluation support, whether that's native or through a third party; as a search engineer, I think it'd be really exciting to have ranking differentials between different configurations of Canopy. And advanced retrieval features, like adding in diversity, which is especially important if you're doing things like e-commerce search. And much more.

And that is the end. My name is Audrey, again, and our forum, if anybody has questions and wants to join the community, is community.pinecone.io. That is it; sorry for the weird presentation format, but thank you. I don't know if we have question time. Go ahead and ask a question.

[Audience question about how retrieved context is used.] Yes, so this again is a configuration parameter you can tune, but basically the top five most relevant chunks for your query get packaged with your query. So, "what is a dog?": if I send that to the vector database, I get five chunks about dogs, and it sends that query, "what is a dog," plus the five chunks I got from Pinecone, to the LLM. Correct.

[Audience question about data security.] The answer to that: I know Pinecone has security measures for the vector data in the index itself, but when we send it to a third-party LLM endpoint, I'm not sure. I know the data's encrypted in transit, but beyond that, I don't know. James, do you know? Yeah, that would be the local model integration, if you never wanted your data to leave your ecosystem. Cool; we do lots of fintech stuff, but not with the third-party LLMs yet.

All right, thank you.