Good morning, everyone. I'm Karthik. I work on the search relevance team at Salesforce; my team maintains the search backend and handles relevance for CRM backend search. Today, I'm excited to dig into an innovative area in AI and natural language processing: grounding LLMs so they are actually usable in enterprise settings. We'll be talking about a concept called Retrieval Augmented Generation, commonly known as RAG, which is a significant leap in how AI systems interact with data and generate human-like text. By the end of this talk, you'll have a clear understanding of what RAG is, how it works, and how it can be a game changer for your enterprise applications.

RAG is a technique that provides an LLM with additional context from external sources. That could be your enterprise datasets, other external systems, or a knowledge base you already have and don't want the LLM to be trained on.

Before going into more detail, let's look at the drawbacks of today's LLMs. Most LLMs are trained on static datasets, so they are frozen in time. Many of you have probably used ChatGPT or another LLM system. If you ask about something very recent, say a sports event from the recent past, it will simply say, "I'm only trained up until this date, I don't have that information," and throw up its hands. You can't do that in a production application; you can't say that to your customer. So we need some kind of solution.

These LLMs are also trained by only a few companies. If you're a small company, you obviously can't train your own LLM, because you can't spend millions and millions of dollars on that training. So the models lack context on the private data your company or your team has. If you ask a question the LLM doesn't understand, it will give a hallucinated answer, which is very difficult to surface back to your application. That's where we need to provide more context and do more prompt engineering, so these LLMs can answer your company-specific or domain-specific questions.

An even bigger challenge in an enterprise setting is that these LLMs are black boxes. You give them something and they give you a result. You might ask for citations, but those citations are questionable; you can't use them for auditing. You may well be in a scenario where you need to cite exactly which sources the LLM used. With a framework like RAG, you can cite the documents you fed into the prompt and use them for auditing purposes. So this is a hybrid approach, the best of both worlds: you use the LLM to do the work and your homegrown knowledge base at the same time.

Now let's get into the technicalities of the architecture and why RAG is fascinating. It essentially merges two components: a neural retrieval model and an LLM. Let's walk through how this works. For any enterprise, the knowledge base sits at the center of the whole system, so the first step is to convert that knowledge base into a form the retriever can search.
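Before getting into the indexing details, here is a minimal sketch of the retrieve-then-generate flow in Python, just to make the grounding and citation idea concrete. The names here are illustrative: `retrieve_top_k` and `call_llm` are hypothetical stand-ins for whatever vector search and LLM API you actually use. The point is how retrieved passages and their source IDs end up in the prompt, so the final answer can be audited against them.

```python
from dataclasses import dataclass

@dataclass
class Passage:
    source_id: str  # e.g. a knowledge-article ID, kept for citation/auditing
    text: str

def retrieve_top_k(query: str, k: int = 3) -> list[Passage]:
    """Hypothetical stand-in for a vector-database search."""
    return [Passage("KB-001", "Password resets require admin approval."),
            Passage("KB-042", "Password reset links expire after 24 hours.")][:k]

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call."""
    return "A reset requires admin approval, and the link expires in 24 hours [KB-001][KB-042]."

def build_grounded_prompt(query: str, passages: list[Passage]) -> str:
    # Label each passage with its source ID so the model can cite it.
    context = "\n".join(f"[{p.source_id}] {p.text}" for p in passages)
    return ("Answer the question using only the context below. "
            "Cite the source IDs you used.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

def answer(query: str) -> str:
    passages = retrieve_top_k(query)
    prompt = build_grounded_prompt(query, passages)
    return call_llm(prompt)

print(answer("How do I reset a customer's password?"))
```

Because the prompt carries the source IDs, the response can be traced back to specific documents, which is exactly what the auditing requirement asks for.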
The knowledge base could be knowledge articles, documents, chat histories, anything of that sort. We convert those into something called embeddings. Embeddings are high-dimensional vectors, but they don't just encode keywords; they capture the essence of the text in a numerical form.

Now, say you have a large document. You don't want to convert the entire document into a single numerical representation, because if you do, you boil it down too far and lose that essence. Instead, you split the document into smaller chunks and convert each chunk into an embedding. There are a couple of chunking techniques. You could do uniform chunking, where you split by a fixed number of characters, say 200 characters at a time, but then you lose context at the boundaries. A more nuanced approach is to go by sentences: a well-written document has sentences, headings, and context around them, so sentence-based chunking keeps more of that context together. Or you could use something more sophisticated like a parent document retriever, where each chunk keeps a link back to its original document and the surrounding chunks.

Once you have these embeddings, you store them in a vector database. In a vector database you can search by nearest neighbors. Say the query is "apple": the database doesn't just do a keyword search and return documents containing the word "apple"; it also looks for the nearest concepts, like fruit or grapes, so you get all the documents closest to the query in meaning. From those results you take the top-k, rank them, and pass them as context to the LLM along with the query the user is asking or your app is prompting. With that context, the LLM is grounded and can respond with cited knowledge articles, which is a much better response than asking the LLM directly.

RAG is used extensively in Q&A chatbots, customer-service applications, and content creation. Hope you liked this talk. Thanks, everyone.
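To tie the indexing and retrieval steps together, here is a rough sketch of sentence-based chunking, embedding, and nearest-neighbor search. It assumes the sentence-transformers library with a small open model, and uses a plain numpy cosine-similarity search as a stand-in for a real vector database; the model name, sample documents, and helper functions are illustrative, not a specific recommendation.

```python
import re
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed available

# Illustrative embedding model; swap in whatever your stack uses.
model = SentenceTransformer("all-MiniLM-L6-v2")

def sentence_chunks(document: str, sentences_per_chunk: int = 3) -> list[str]:
    """Naive sentence-based chunking: split on sentence boundaries, group a few per chunk."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return [" ".join(sentences[i:i + sentences_per_chunk])
            for i in range(0, len(sentences), sentences_per_chunk)]

# 1. Index: chunk the documents and embed every chunk.
documents = [
    "Our refund policy allows returns within 30 days. Items must be unused. Refunds post in 5 business days.",
    "Password resets require admin approval. Reset links expire after 24 hours. Contact IT for locked accounts.",
]
chunks = [c for doc in documents for c in sentence_chunks(doc)]
chunk_vectors = model.encode(chunks, normalize_embeddings=True)  # shape: (num_chunks, dim)

# 2. Query: embed the question and take the top-k nearest chunks by cosine similarity.
def search(query: str, k: int = 2) -> list[str]:
    query_vector = model.encode([query], normalize_embeddings=True)[0]
    scores = chunk_vectors @ query_vector        # cosine similarity (vectors are normalized)
    top = np.argsort(scores)[::-1][:k]           # indices of the k most similar chunks
    return [chunks[i] for i in top]

print(search("How long do customers have to return an item?"))
```

In production you would replace the in-memory numpy search with an actual vector database and feed the returned chunks, with their source IDs, into the grounded prompt shown earlier.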