Hey, everyone. Welcome. Today my talk is "Beyond Naive RAG: Towards More Complex Question Answering." I'm Jerry, co-founder and CEO of LlamaIndex. We've got a lot of content in here, and if it's too much, Laurie Voss, our VP of DevRel, is doing a workshop at 11:50, so please check that out. It's a much deeper dive. I only have nine minutes and forty seconds left to get through the rest of the slides. OK. For those of you who aren't familiar, LlamaIndex is a framework for connecting your data to your LLM to build a production-grade LLM application. If you've been hacking around with LLMs over the past year, you know a lot of use cases have emerged, and LlamaIndex is one of the core frameworks for building LLM apps on top of your private data. Whether it's documents, PDFs, databases, APIs, or services, you can connect all of those to your language model to build a ChatGPT-like experience over your data, from simple to complex. So let's talk about retrieval-augmented generation. Chances are, if you've played around with LLMs this past year, you've heard this term, abbreviated as RAG. What is RAG? RAG is basically a conceptual framework for building a QA system that consists of two main stages: data ingestion and parsing, and data querying. You start off with some source document, like a PDF, and let's say you want the LLM to understand this PDF. So you take this PDF, process it, and put it into a vector database, like Cassandra. The vector database then contains embeddings as well as the source documents. During data querying, when the user actually asks a question or has a task, you do retrieval and then synthesis to pull information from the storage system into the context window of the model.
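The two stages just described (ingestion and querying) can be sketched in a few lines of Python. This is a toy illustration, not LlamaIndex's actual implementation: the `embed` function is a stand-in bag-of-words vectorizer where a real pipeline would call an embedding model, the "vector database" is just an in-memory list, and the final LLM call is replaced by assembling the prompt.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for a real embedding model: a bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# --- Data ingestion: chunk the source document, embed each chunk, store the pairs ---
chunks = [
    "Tesla's 2021 risk factors include supply chain disruption and competition.",
    "Tesla reported strong vehicle delivery growth in 2021.",
]
vector_store = [(embed(c), c) for c in chunks]

# --- Data querying: retrieve top-k chunks, then stuff them into the LLM's context window ---
def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(vector_store, key=lambda pair: cosine(q, pair[0]), reverse=True)
    return [text for _, text in ranked[:k]]

question = "What are the risk factors for Tesla?"
context = retrieve(question)[0]
prompt = f"Context:\n{context}\n\nQuestion: {question}"  # this is what the LLM would see
```

The point of the sketch is the shape of the system: retrieval selects the relevant stored text, and synthesis is just the LLM answering with that text placed in its context window.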
So it's this emerging framework that captures the fact that you don't really need to train these models. You just need to prompt them the right way, by pulling data from your storage system and putting it into the context window of the language model. In LlamaIndex, you can do this through the quickstart in five lines of code, but of course we have much deeper tutorials to help you understand how this works at both a very simple level and an advanced level. Many of you might have already built RAG pipelines, whether on your own or as part of a company effort. We've seen that this is one of the biggest use cases for enterprises this year. But of course there are a lot of challenges in building RAG, and many of you might have run into them. Naive RAG is limited. If you just do what I described, putting stuff in a vector database and doing some very basic retrieval strategies, you tend to find that it works well for simple questions over a simple, constrained set of documents. You can ask things like "What are the risk factors for Tesla?" over the Tesla 2021 10-K, or "What did the author do during his time at Y Combinator?" over a simple essay. These are questions about specific facts within a document, and this setup tends to work well for them. So if you're an enterprise with a bunch of documents and you want to ask questions about specific facts, the naive RAG setup tends to be OK. But there are a lot of other questions. If you just open up ChatGPT and ask more complex questions, you find that it tends to be able to answer as long as the knowledge is in its domain. So how do you replicate that experience over your own external sources of data? There are certain questions where top-k retrieval, this naive stuff, will fail. This includes summarization questions, like getting a summary of a document; comparison questions, like comparing A versus B; and structured analytics plus semantic search.
For instance, you might want to ask questions that involve structured information in addition to unstructured text. Take "Tell me the risk factors of the highest-performing ride-share company in the US": you need to execute a SQL query to actually find the highest-performing ride-share company in the US. And then there are general multi-part, agentic questions. This is kind of the holy grail: you ask some arbitrarily complex task, like tell me about this, then tell me about that, then compare the two, and then do some task based on that. That is much more complex than what the naive RAG setup can handle. So here's a very general cheat sheet for how we think about this. LlamaIndex helps you set up the basic RAG stack, and it also helps you solve a lot of the pain points you're facing, and many of the solutions we propose are based on specific pain points. If your pain point is questions over unstructured plus structured data, we have a certain set of solutions. Other pain points include questions over complex documents, where your document contains a lot of embedded tables and charts, like SEC filings; this is something we have a bunch of abstractions for. And as I just said, if you want to ask more complex questions over your data, like multi-part questions, there are a variety of techniques we can go through. So for this keynote, maybe the one lesson I want to drive home is thinking about how you actually compose more sophisticated and interesting LLM-powered systems that carry the core concepts of RAG but extend them to become more capable at question answering, search, and retrieval. So let's address these pain points. Specifically, I'm going to talk about agents, and agents as an extension of RAG to help you answer more complex questions over your data.
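The structured-plus-unstructured pattern above can be sketched as a two-step pipeline. This is a hedged toy example: `sqlite3` stands in for the analytics database, a plain dict of canned text stands in for the per-company RAG pipeline, and the table contents are made up for illustration.

```python
import sqlite3

# --- Structured step: run SQL to find the highest-performing ride-share company ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rideshare (company TEXT, revenue_usd_b REAL)")
conn.executemany(
    "INSERT INTO rideshare VALUES (?, ?)",
    [("Uber", 31.8), ("Lyft", 4.1)],  # illustrative figures only
)
(top_company,) = conn.execute(
    "SELECT company FROM rideshare ORDER BY revenue_usd_b DESC LIMIT 1"
).fetchone()

# --- Unstructured step: route the follow-up question to that company's RAG pipeline ---
risk_factor_docs = {  # stand-in for retrieval over each company's 10-K filing
    "Uber": "Uber's 10-K lists regulatory risk and driver classification as risk factors.",
    "Lyft": "Lyft's 10-K lists insurance costs and competition as risk factors.",
}
answer = risk_factor_docs[top_company]
```

The design point is that neither step alone can answer the question: the SQL query resolves "highest-performing ride-share company" to a concrete entity, and only then can semantic retrieval pull that entity's risk factors.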
So RAG, you can think of it as a black box: given a user query, you get back a response, and this response contains the answer to your question over your data. You can see agents as a layer, and this is the way we look at it, a kind of understanding and query layer in front of any RAG pipeline over your documents. It's an LLM-powered abstraction that crunches any input task you give it and is able to dynamically understand the query, generating, for instance, sub-questions, decompositions, or reasoning loops to really try to take that task and solve it. So we see agents as an extension of RAG: a layer in front of your RAG pipeline. A popular framework that came out last year is ReAct, reasoning and acting with LLMs. Since then, there have been a ton of papers at ICLR, ICML, and NeurIPS right now that have extended beyond this with different types of reasoning, but this was one of the first reasoning papers where you prompt the LLM to generate both chain of thought and tool use. You can dynamically break down a question into, for instance, multiple iterations, and then solve each of those by calling a tool, a function, or a RAG pipeline. We've baked some of these abstractions into LlamaIndex. As a general framework for thinking about how agents really interact with your data system, the core components you need are an agent reasoning loop, a thought loop that does chain of thought and tool selection, and then, as one of the underlying tools, a RAG pipeline over your documents. So you have that agent reasoning layer at the front, plus the RAG pipeline, on top of an underlying storage system you can use to query your data. An example we show here is financial analysis with agents plus RAG.
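The agent reasoning loop just described can be sketched as follows. This is a skeleton, not a real agent: `llm_decide` is a mock standing in for an LLM prompted to emit a thought plus either a tool call or a final answer, and `rag_tool` stands in for a RAG pipeline; a real ReAct loop would parse these decisions out of model output.

```python
def rag_tool(question):
    # Stand-in for a RAG pipeline over your documents.
    return f"[retrieved answer to: {question}]"

def llm_decide(task, observations):
    # Mock LLM policy: first call a tool, then finish. A real agent would
    # prompt the model with the task plus all prior observations.
    if not observations:
        return {"thought": "I should look this up.", "tool": "rag", "input": task}
    return {"thought": "I have enough information.", "answer": observations[-1]}

def react_loop(task, max_steps=5):
    observations = []
    for _ in range(max_steps):
        step = llm_decide(task, observations)
        if "answer" in step:  # the chain of thought ended in a final answer
            return step["answer"]
        observations.append(rag_tool(step["input"]))  # tool use -> new observation
    return "gave up after max_steps"

result = react_loop("What was Uber's revenue growth?")
```

The loop structure (think, pick a tool, observe, repeat until a final answer) is the part that carries over to real systems; everything inside `llm_decide` is where the actual LLM prompting lives.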
So a question might be, for instance, "Compare and contrast Uber and Lyft's revenue growth." This is a bit of a complex question because it's a comparison question; it's multi-part. If you actually want to answer it, you probably want to break it down into two sub-questions: "What is Uber's revenue growth?" if you have financial reports about Uber, and "What is Lyft's revenue growth?" if you have financial reports about Lyft. The agent layer, powered by ReAct or any other thought loop, will break down this question into a set of sub-questions over your tools. And once each question is broken out, you can have some RAG pipeline answer it over a given document via your standard retrieval setup. So the lesson here is that you can add agents as a layer in front of the setup to handle these types of comparison questions, or any general multi-part questions. Even if your existing RAG pipeline can't handle the overall task you want to give it, you can add a more sophisticated query reasoning layer on top to handle it and route each piece down to the weaker retrieval system to answer. So in the last few slides, let's put it all together. These are just some resources to help you get started, especially if you're interested in building RAG in the enterprise. And this applies no matter what data you're building on top of, right? This includes financial documents, legal briefs, general invoice processing, and multi-modal data; these are all efforts we're investing in. The first thing I'll show is SEC Insights, which is an agent layer on top of a RAG setup that allows you to ask questions over your financial documents.
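The sub-question pattern for that comparison query can be sketched like this. Everything here is a stand-in: the decomposition is hard-coded where a real agent would ask the LLM to generate it, the per-document tools return canned answers where real ones would run retrieval over each company's filings, and the final synthesis is a string join where a real system would hand the sub-answers back to the LLM.

```python
# Stand-ins for per-document RAG pipelines (tools) over each company's filings.
tools = {
    "uber_10k": lambda q: "Uber's revenue grew 57% year over year.",  # canned answer
    "lyft_10k": lambda q: "Lyft's revenue grew 36% year over year.",  # canned answer
}

def decompose(task):
    # A real agent would prompt the LLM to generate these sub-questions
    # and match them to tools; here the decomposition is hard-coded.
    return [
        ("uber_10k", "What is Uber's revenue growth?"),
        ("lyft_10k", "What is Lyft's revenue growth?"),
    ]

def answer(task):
    sub_answers = [tools[tool](q) for tool, q in decompose(task)]
    # Final synthesis step: join the sub-answers (an LLM would compare them).
    return " ".join(sub_answers)

summary = answer("Compare and contrast Uber and Lyft's revenue growth.")
```

The key idea is that each sub-question is individually answerable by a plain RAG pipeline; only the decomposition and synthesis require the agent layer.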
We built this in-house within LlamaIndex, mostly to dogfood the framework, but also to create a full-stack application template that, by the way, is fully open source under the MIT license, so you can just clone it, rip it apart, and do whatever you want with it. You can even commercialize it if you wanted to. It contains a PDF viewer, intermediate sources and citations, and roughly follows the architecture I just laid out. Another resource I want to shout out: the DataStax folks are good friends of ours, and you can build a chat server over any documentation with LlamaIndex plus DataStax Astra. There's a GitHub link as well as a QR code in the slides, but this is a nice way to generate a full-stack API server that does RAG for you. Great, thank you. And in the last bit, if you're building LLMs in an enterprise setting, we'd love to chat. We are also actually looking for backend data engineers, as well as full-stack and frontend engineers. So. Cool, thank you.