All right. Hello, everyone. I am stoked to see that so many of you made it all the way to the end of the conference. I figured if anyone showed up, I'd be happy, and there's more than one person here, so that's excellent. My name is Marcus Helberg. I'm a nerd. I like coding, and that's why I'm here. I'm a really curious person in general: when I see something cool, I want to understand how it works and how I can build something like it. So this talk is going to be very much the practical part of this conference. It's going to be hands-on coding. I'm sure that in the past 20 or so sessions you've been to, you've heard a lot of the basic terms and concepts in slides. We're going to turn all of that into code, and we're going to build a little RAG-powered application. It's going to access some of our data, access some of our APIs, and do meaningful things. And hopefully we'll manage to do all of this in the 30 minutes we have allotted. What I'm going to use for this demo is LangChain — specifically the Java version, LangChain4j, because I'm a Java developer and I enjoy working with Java. The same concepts will work with any flavor of LangChain, so if you're more of a Python or JavaScript person, just try to translate the code in your head. I'm going to use a framework called Hilla for the orchestration of the front end and the back end — we're going to build a full-stack web application here. It takes a React front end and a Spring Boot back end and gives us a really seamless, type-safe way of calling the back end, and that's going to make it really easy for us to build this. The app we're going to build looks like this: we're going to simulate that we're renting cars. And I want to give big credit to the LangChain4j team for this example. I added the UI part of it, but the idea came from them.
And I think it's a really powerful example of how we can put together all of these concepts we've heard about over the past two days into an actual working application. What I have here on one side is a live view of my database, with some bookings made for different people. And what we have on the other side is a simulated customer service agent chat that we can talk to. We want to be able to ask it meaningful questions, like asking it to pull up our reservation details, or to cancel our reservation. It should then determine whether or not we're within an allowable cancellation window. If we are, it goes ahead and cancels; if not, it just flatly tells me no, that can't be done. Good. All right, so let's get going. My application is, like I mentioned, a Spring Boot application, so we have a src/main/java folder here with an application. And on the client, we have a React front end. We're not going to go too deep into this, but we can see that we have the message list here and the grid here, and you can see that we're using some of the components in our framework to simplify that. So we have a message list component that's bound to a list of messages up here. The really interesting part of this whole thing is that when we send a new message, we're calling this assistantService.chat method and subscribing to the response. It's going to start streaming chunks of the response to us, and we append those to the output so that we get that ChatGPT-like streaming response. The assistant service looks like this: it's a Java class annotated with @BrowserCallable. That's how we're able to call it here as a method, as opposed to calling a URL like we would with a REST endpoint or something like that. So that's what we have here.
If I try to interact with this right now — let me bring this up a little bit and say "hey" — it's just going to say, sorry, my brain's not hooked up. So that's our task for today: we need to provide it with a brain that's hopefully even functioning. For that, I'm going to mostly work within the application class, where we'll configure a whole bunch of small parts that together make up this application. So let's go ahead and start. The first thing we need to do is define the model we want to work with. The way I'm going to do this is, for each piece, define a Spring bean and then configure how it should work. So I'll define a new bean, which will be a StreamingChatLanguageModel, and it will return an OpenAI streaming chat model. LangChain4j uses a builder pattern for almost everything, so I'm going to complete the build here and then go in and configure things. One of the things we need for OpenAI is, of course, our API key. I have that as an environment variable, so I want to inject it here. I'll have a String apiKey parameter, and the way I get it from my environment is with the @Value annotation, pointing at the OpenAI API key variable. So now we have the API key from our environment; we say we want to use that API key, and we want to use a specific model, GPT-4, because that's a good model to use. The next thing we need to do is define what tokenizer that model uses, because we'll need to count tokens later. So we define a bean that returns the Tokenizer, and it will return a new OpenAiTokenizer for the same model. That way we're using the tokenizer that's meant for the specific model we're running.
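Put together, the two beans described above might look like this. This is a sketch against the LangChain4j API of that era (`OpenAiStreamingChatModel`, `OpenAiTokenizer`); exact builder options and package locations can differ between library versions:

```java
import dev.langchain4j.model.Tokenizer;
import dev.langchain4j.model.chat.StreamingChatLanguageModel;
import dev.langchain4j.model.openai.OpenAiStreamingChatModel;
import dev.langchain4j.model.openai.OpenAiTokenizer;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
class ModelConfig {

    // Streaming model so the UI can render chunks as they arrive
    @Bean
    StreamingChatLanguageModel model(@Value("${OPENAI_API_KEY}") String apiKey) {
        return OpenAiStreamingChatModel.builder()
                .apiKey(apiKey)
                .modelName("gpt-4")
                .build();
    }

    // Tokenizer matching the model, used later for counting tokens
    @Bean
    Tokenizer tokenizer() {
        return new OpenAiTokenizer("gpt-4");
    }
}
```

Because the tokenizer is its own bean, it can later be injected anywhere tokens need counting (chat memory, document splitting) without being redefined.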
Now, what's pretty cool about LangChain4j is that it works very similarly to Spring. If you've ever used Spring Data, you know that you provide an interface — say, a JPA repository — and Spring provides you with the implementation; you don't have to type in all the boilerplate. This works the same way: we provide an interface describing how we want to interact with the agent, and LangChain4j provides the implementation. So let's create a public interface and call it our CustomerSupportAgent. This interface will have just one method that returns a TokenStream — meaning we're going to stream the response as it comes in — and we'll call it chat. It takes in two things: a String chat ID, so it can keep track of the different chats we're having (each one will have a separate memory attached to it), and a String that corresponds to the actual user message coming in. Now, I'm going to annotate these so LangChain4j knows what they are: this one is the memory ID, keeping track of each chat separately, and this one is the user message, which gets passed in as the user message. We can also define a system message, essentially telling the LLM how it should behave — what its role is. So I'm going to copy over a message I have prepared, and we'll go through it real quick. What we're telling this LLM is: this is how we want you to behave — you're a customer support agent at a car rental company called Miles of Smiles. Be friendly and helpful. Before changing a booking in any way, you need to get the following information: booking number, first name, last name. And before changing a booking, make sure the terms of service actually allow that change.
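The interface described above could be sketched like this. The annotation names (`@SystemMessage`, `@MemoryId`, `@UserMessage`) are LangChain4j's; the prompt text is paraphrased from the talk rather than copied from the actual project, and `{{current_date}}` is a template variable LangChain4j substitutes at call time:

```java
import dev.langchain4j.service.MemoryId;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.TokenStream;
import dev.langchain4j.service.UserMessage;

public interface CustomerSupportAgent {

    @SystemMessage("""
            You are a customer support agent of a car rental company named 'Miles of Smiles'.
            Be friendly and helpful.
            Before changing a booking in any way, get the booking number,
            customer first name and last name.
            Before changing a booking, make sure the terms of service allow it.
            Today is {{current_date}}.
            """)
    TokenStream chat(@MemoryId String chatId, @UserMessage String userMessage);
}
```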
And then finally, we inject today's date, because a lot of the date calculations require knowing what day it is today, and that's not something the LLM knows. So we now have a description of what the interface needs to look like; now we need LangChain4j to actually provide it for us. We'll follow the same pattern: we define a bean that returns a CustomerSupportAgent. In this definition, we start injecting some of the things we just created: the StreamingChatLanguageModel, and then the Tokenizer. And then we return what we get from calling the LangChain4j AiServices builder. The builder takes a class — in this case, the interface we want implemented — and then it's a builder again, like everything else we've used here. We're essentially saying we want this interface implemented, and we give it some more information about how it should work. First of all, it needs a language model, which is something we already configured, so we say: use the streaming chat language model we created. Then we need a way of managing the history of all these chats, so we use a chat memory provider — essentially a lambda that takes in a unique ID per chat and provides a memory that corresponds to it. For that, I'm going to use a TokenWindowChatMemory, which means I want to keep a certain number of tokens' worth of chat in my memory. That's something I remember from the first time I built a RAG application: I spent like 400 lines of code just trying to count tokens myself. And that was, I don't know, five months ago. Now I can do it in a line of code, which is pretty cool. So we say that the ID for this should be the chat ID, and we want a maximum of, say, 500 tokens, using that GPT-4 tokenizer.
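The bean described above, as a sketch against the pre-1.0 LangChain4j `AiServices` API (method names like `streamingChatLanguageModel` and `chatMemoryProvider` match that era and may differ in newer releases):

```java
import dev.langchain4j.memory.chat.TokenWindowChatMemory;
import dev.langchain4j.model.Tokenizer;
import dev.langchain4j.model.chat.StreamingChatLanguageModel;
import dev.langchain4j.service.AiServices;
import org.springframework.context.annotation.Bean;

// ...inside the application/configuration class
@Bean
CustomerSupportAgent customerSupportAgent(StreamingChatLanguageModel model,
                                          Tokenizer tokenizer) {
    return AiServices.builder(CustomerSupportAgent.class)
            .streamingChatLanguageModel(model)
            // one memory per chat ID, capped at 500 tokens,
            // counted with the model's own tokenizer
            .chatMemoryProvider(chatId -> TokenWindowChatMemory.builder()
                    .id(chatId)
                    .maxTokens(500, tokenizer)
                    .build())
            .build();
}
```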
Now, once we have this customer support agent, we can go into our service class and replace the "sorry, my brain is not hooked up" message with an actual call to our LLM. The way this works is: we create a field for our customer support agent, tell our IDE that we want to inject it through our constructor, and then return from it here. Now, LangChain4j's TokenStream is not a plain Java type that Spring necessarily understands, so we're going to convert it into a Flux, which comes from the Project Reactor library — something Spring uses internally. For that, we create a sink, which is a programmatic way of creating a Flux. We'll create a unicast sink with an onBackpressureBuffer. This means that if we're getting more messages than we can handle at any moment, we keep track of them — we're not going to drop words in the middle of a sentence; that would not make any kind of sense at all. And at the end of all of this, we return what we get from calling sink.asFlux(). So really, what we need to do here is call this customer support agent and pipe things between the two. We call our customer support agent, call the method we defined in the interface, pass in the chat ID, and pass in — I guess I called it a question here. Then we say that on the next chunk that comes along, we pass it to sink.tryEmitNext. When the stream completes, we get an event and call sink.tryEmitComplete. And if there's an error, we call sink.tryEmitError. Finally, we get it all started by calling start. Now, if we managed to do things correctly, we might be able to interact with this. So let's see: "Hello there." And we can see.
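The TokenStream-to-Flux bridge described above might look like the following sketch. The `customerSupportAgent` field and the `chat` signature follow the talk; the `onComplete` callback argument type has varied across LangChain4j versions:

```java
import dev.langchain4j.service.TokenStream;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Sinks;

// ...inside the @BrowserCallable AssistantService
public Flux<String> chat(String chatId, String question) {
    // Unicast sink with a buffer: chunks queue up under load
    // instead of being dropped mid-sentence
    Sinks.Many<String> sink = Sinks.many().unicast().onBackpressureBuffer();

    customerSupportAgent.chat(chatId, question)
            .onNext(sink::tryEmitNext)                    // each streamed chunk
            .onComplete(response -> sink.tryEmitComplete())
            .onError(sink::tryEmitError)
            .start();                                     // kick off the stream

    return sink.asFlux();
}
```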
Can you see the text here in the back? All good. Let's make it a little bit bigger. We can see that it now understands from that system prompt that it's working at Miles of Smiles. That's great. But there's really a big problem with this if we ask it something very specific about our business. Let's see if this works. Come on. "Can you explain the cancellation policy to me, please?" Let's see what it answers. All right: you can cancel a reservation online up to 48 hours before. All of this seems reasonable, but the problem is that it's in no way grounded in our reality. If we look at our actual terms of service, it says you can cancel up to seven days prior. So: it sounds plausible, but it's completely made up. That gets us to the next step of our adventure: how do we teach this LLM to stay in its lane? This is where retrieval-augmented generation — RAG — comes in. We need to take this document, ingest it into a vector database as embeddings, and then use that to pull in the right information. Who here does not know how vector embeddings work? OK, great. Vector embeddings essentially take a piece of text and convert it — the meaning of the text — into a vector, essentially a multidimensional array. It works very similarly to a color picker. You can take any color in the world and get an RGB value for it: that's a three-value vector. And intuitively, if you've used a color picker, you know that similar colors have very similar RGB values. It's the same idea, just that we're working with text here. Now, because we're working with a document, we want to split it into small sections. The reason is that we want very specific meanings for each of these embeddings. Otherwise it's like asking, "what's the color of this entire painting?" That doesn't make a whole lot of sense.
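The color-picker analogy can be made concrete with plain Java. This small example (not from the talk's codebase) treats RGB colors as 3-dimensional vectors and compares them with cosine similarity, the same distance measure vector stores typically use on embedding vectors with hundreds of dimensions:

```java
public class VectorSimilarity {

    // Cosine similarity: close to 1.0 means the vectors point the same way
    // (similar color / similar meaning); near 0 means unrelated.
    public static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] red     = {255, 0, 0};
        double[] darkRed = {180, 20, 20};
        double[] blue    = {0, 0, 255};

        // Similar colors score high, dissimilar ones score low
        System.out.printf("red vs darkRed: %.3f%n", cosineSimilarity(red, darkRed)); // ~0.988
        System.out.printf("red vs blue:    %.3f%n", cosineSimilarity(red, blue));    // 0.000
    }
}
```

An embedding model does the hard part — mapping text to a vector whose direction captures meaning — but the retrieval step is essentially this comparison, repeated against every stored segment.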
Whereas if we ask, "what's the color of this house?", it's going to be able to answer. Likewise, "what's the meaning of this entire document?" doesn't make as much sense as "what's the meaning of this specific paragraph?" Good. So that's what we're going to do. For this, we need to define a couple more beans. We'll have an embedding model — that's how we convert from text to a vector. For this we're going to use an in-memory one, but you can plug in any other. We'll return a new AllMiniLmL6V2 embedding model — this is just an in-process Java embedding engine that we can run in our JVM. You could use OpenAI or any other embedding model here, but this is a simple one. Likewise, we create an embedding store — the vector store where we keep all of these. We'll make one for text segments and return an in-memory embedding store. Again, we're using the in-memory one; you could use Pinecone, you could use — what's yours called? Astra. Yeah, you could use Astra as well. You can plug in basically any of them; the API stays the same. But for simplicity, we'll do this. Then, in order for our agent to interact with these, we define a retriever, which combines the two into one tool for it. So we define a bean, a Retriever of text segments, called retriever. It takes in two things: the embedding model and the text-segment embedding store. And it returns an EmbeddingStoreRetriever.from, where we pass in a couple of things. If we look at the parameters it takes: the store, the model, how many results, and the minimum score for something to qualify. So we pass the embedding store, the embedding model, and at least one result.
And we want that result to have a minimum score of 0.6. In real life, you'd play around with these settings; for this demo I happen to know these are exactly what we need today, so, good. Once we've defined the retriever, we need to tell our agent how to use it. So we auto-wire it in here, and we go into the builder and say: your retriever is this. Now, of course, this would work really great if the vector store we just bound to had data in it — which it doesn't right now. For the sake of simplicity, I'm going to do the ingestion — essentially turning that text into vectors — in the same application. In a real-world application you probably wouldn't consume and create the vectors in the same application, but we'll do it here so we can see the full end-to-end flow. So I'm going to create a Spring CommandLineRunner just so we can run a task here, and we'll call it docsToEmbeddings. In here, we'll need the embedding model so we can turn text into vectors, an embedding store so we can put those vectors somewhere, a tokenizer so we can count tokens, and a resource loader so we can load things from our classpath. This returns a lambda with some arguments, and whatever's in there gets run. Let's first get the resource that corresponds to our terms of service: resource = loader.getResource with the classpath location of terms-of-service.txt. Then we need to turn that into a LangChain4j document, so I'll say our doc equals loadDocument — we import that — take the resource, get the file path, and tell LangChain4j to turn it into a document. Then we need to split it into those small sections, as we said.
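The three RAG beans described above — embedding model, embedding store, and the retriever that combines them — might look like this. It uses the older LangChain4j `Retriever`/`EmbeddingStoreRetriever` API from the talk's era (newer versions renamed this to `ContentRetriever`), and package locations may vary:

```java
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.AllMiniLmL6V2EmbeddingModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.retriever.EmbeddingStoreRetriever;
import dev.langchain4j.retriever.Retriever;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.inmemory.InMemoryEmbeddingStore;
import org.springframework.context.annotation.Bean;

// ...inside the application/configuration class

@Bean
EmbeddingModel embeddingModel() {
    // Small local model that runs inside the JVM -- no external service needed
    return new AllMiniLmL6V2EmbeddingModel();
}

@Bean
EmbeddingStore<TextSegment> embeddingStore() {
    // In-memory store; swap in Pinecone, Astra, etc. -- the API stays the same
    return new InMemoryEmbeddingStore<>();
}

@Bean
Retriever<TextSegment> retriever(EmbeddingStore<TextSegment> embeddingStore,
                                 EmbeddingModel embeddingModel) {
    // Return at most 1 matching segment, with a minimum relevance score of 0.6
    return EmbeddingStoreRetriever.from(embeddingStore, embeddingModel, 1, 0.6);
}
```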
So we create a splitter, and we can again use LangChain4j here: we call DocumentSplitters.recursive and say we want 100-token chunks with no overlap. Again, this is something where you'd play around with values that work for you. We give it the tokenizer so it knows what a token is for this specific model. Once we have that, we can create an ingestor: an EmbeddingStoreIngestor — a builder pattern again, so I'm going to close it and then start configuring it. It takes in the embedding model, which we have. Great. It takes in the embedding store so it knows where to store the vectors, and it takes that splitter. Once we have this ingestor, we can call ingest and pass in as many documents as we want — we could tell it to ingest an entire folder if we wanted to. For now, we just have one document, so that's all we're going to ingest. So again, if all the demo gods are on our side, we should now be able to have a more meaningful conversation. Let's see once this reloads — I just built the project — if it can tell us something more: "Hey, can you tell me about the cancellation policy?" Let's see. All right. Excellent. Now you can see that it's not giving us some made-up story about canceling 48 hours before; it's telling us exactly what was in our document, with some additional text because we told it to be nice and polite and courteous. So it's doing a pretty good job here. But what's really missing for this to actually be useful in a business setting is that it should be able to access our database with these bookings, pull up information, and do something with it. That's something we can do with LangChain4j tools. For that, I'm going to go into my service package here and create a new class.
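The ingestion runner described above, as a sketch; `FileSystemDocumentLoader`'s package and the exact loader helpers have moved between LangChain4j versions, and `terms-of-service.txt` is the classpath name assumed from the talk:

```java
import dev.langchain4j.data.document.Document;
import dev.langchain4j.data.document.DocumentSplitter;
import dev.langchain4j.data.document.FileSystemDocumentLoader;
import dev.langchain4j.data.document.splitter.DocumentSplitters;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.Tokenizer;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingStore;
import dev.langchain4j.store.embedding.EmbeddingStoreIngestor;
import org.springframework.boot.CommandLineRunner;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.Resource;
import org.springframework.core.io.ResourceLoader;

@Bean
CommandLineRunner docsToEmbeddings(EmbeddingModel embeddingModel,
                                   EmbeddingStore<TextSegment> embeddingStore,
                                   Tokenizer tokenizer,
                                   ResourceLoader loader) {
    return args -> {
        // Load the terms of service from the classpath
        Resource resource = loader.getResource("classpath:terms-of-service.txt");
        Document doc = FileSystemDocumentLoader.loadDocument(resource.getFile().toPath());

        // 100-token chunks, no overlap, counted with the model's tokenizer
        DocumentSplitter splitter = DocumentSplitters.recursive(100, 0, tokenizer);

        // Split, embed, and store in one step
        EmbeddingStoreIngestor ingestor = EmbeddingStoreIngestor.builder()
                .embeddingModel(embeddingModel)
                .embeddingStore(embeddingStore)
                .documentSplitter(splitter)
                .build();

        ingestor.ingest(doc);
    };
}
```

As noted later in the Q&A, in production this ingestion step would typically run on a build server whenever the source documents change, not inside the consuming app.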
I'm going to call this our BookingTools, and it will just be a plain Spring component. Let me hide the sidebar here. What I want to do is inject my back-end service — I have a plain Spring service class that's not very interesting for this talk; we just want to call methods on it. So we have a private field for our car rental service, call it service, and again inject it through our constructor. Then we create a public method that returns booking details, and we'll call it getBookingDetails. We want to be very descriptive in our naming here, because that's going to help the LLM understand what it should do with this. getBookingDetails takes three parameters: a String booking number, a String first name, and a String last name. And here I'm going to turn on Copilot, because I'm a very lazy typer and we're in a hurry. We return what we get from calling the service with those same values. That's going to go into my database and figure out who this is, and it throws an exception if that person doesn't exist. Likewise, we'll have another method — a void one this time — that cancels the booking with the same information. The way we make these available to LangChain4j is by adding a @Tool annotation: that marks a method it's allowed to call. We don't want to give it access to our entire back-end service or anything; we want to be very mindful — these are the only two methods you're allowed to call. Just having those annotations there isn't enough yet, though. We also need to go into our agent configuration, inject our booking tools, and tell the builder: here is a tool you can use. And if we build this now and things went well, we should again be able to have a meaningful conversation with our assistant.
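The tools class described above might look like this. The `CarRentalService` and `BookingDetails` names are assumed from the talk's description of its back end; `@Tool` is LangChain4j's annotation:

```java
import dev.langchain4j.agent.tool.Tool;
import org.springframework.stereotype.Component;

@Component
public class BookingTools {

    private final CarRentalService service;

    public BookingTools(CarRentalService service) {
        this.service = service;
    }

    // Descriptive method and parameter names help the LLM
    // map a user's request to the right function
    @Tool
    public BookingDetails getBookingDetails(String bookingNumber,
                                            String firstName, String lastName) {
        return service.getBookingDetails(bookingNumber, firstName, lastName);
    }

    @Tool
    public void cancelBooking(String bookingNumber,
                              String firstName, String lastName) {
        service.cancelBooking(bookingNumber, firstName, lastName);
    }
}
```

Only the two annotated methods are exposed to the model; everything else on `CarRentalService` stays out of reach, which is the deliberate safety boundary the talk mentions.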
So again, this is a live view of our database; whenever the chat completion completes, it updates. Let's try two different paths here. The first booking, by John Doe, is too close to the cancellation — it's within the window where we're not allowed to cancel. Let's see if it allows us to do it. "Hi, my name is John Doe. My booking number is 101. Can you please cancel it?" So it's going to — hmm. What is going on here? Let's try that again; that has not happened in my trials so far. "Hi, my name is John Doe. My booking number is 101. Can you please cancel it?" All right, I don't know what happened a moment ago — these things can happen — but what happened right now is what we wanted: it says, because your booking is from today, we're not within that seven-days-prior window, so we cannot do it. That's exactly what we wanted to happen. So let's try the last one and see — we should be able to see this one turn into canceled. "Hi, my name is Robert Taylor. My booking number is 105. Can you please cancel it?" Let's see if that works. "You may cancel up to seven days prior. Would you like me to proceed?" Yes, please do. Successfully canceled — and we can see that it now turned into canceled in our database. All right, that was all I wanted to show you; we have very limited time. If you want to dig into this on your own time, you can find it on my GitHub: github.com slash Marcus Helberg slash spring-boot-langchain-rag. You can find all the code for it there. The only thing you need to make sure you have before running it is that OpenAI API key environment variable. I believe I documented that in the README after a couple of people sent me angry messages telling me it didn't run. So go ahead and try it out. I think we have some time for questions, so if you have any questions, anything I can clarify, please ask. Yes, sir?
Okay, so the question was: how did it know to call that particular cancelBooking method? What this essentially does under the hood is use the functions API in OpenAI. In this case, clear naming of the function was enough — a cancelBooking method with booking number, first name, and last name parameters. By telling the model there's a function with this name that takes these named parameters, it was able to just figure it out. We could add annotations to give much deeper descriptions of what the tools are and how the data should be passed in, but this was purely based on convention. — "I remember you saying that you specifically chose that name because you knew it would be matched by name. So if you changed that name, it wouldn't have worked, right?" — Yeah, if I changed it to, I don't know, foo, it probably would not know what it was. Then, in that @Tool annotation, I would have to say that this method foo is used for canceling bookings. Cool. The second question is: what are these Spring beans? Are they like Enterprise JavaBeans — session beans that run in an app server? Why not just make them regular classes or functions; what's the benefit of making them beans? The benefit of making them beans in this case was just to define one small thing at a time, so we could talk through each piece. You could just define them as variables and put them together. Beans in Spring are quite similar to enterprise beans in the Jakarta EE stack, for sure. The benefit is that you can inject them in different places. For instance, remember the tokenizer: we ended up needing it in a whole bunch of places, and this way we only had to define it once and could use it everywhere. — "Okay, so the whole thing runs in just the same JVM? It's not an app server or anything?" — No, this is all running in one JVM, yeah. Okay, cool.
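The fallback the speaker mentions — an uninformative method name rescued by an explicit description — could look like this. The `foo` name comes from the answer above; the description string is illustrative, and `@Tool`'s ability to carry a description is a LangChain4j feature:

```java
import dev.langchain4j.agent.tool.Tool;

// If the method name alone carries no meaning, a description on the
// @Tool annotation tells the model what the function is for:
@Tool("Cancels a car rental booking, identified by booking number, first name and last name")
public void foo(String bookingNumber, String firstName, String lastName) {
    service.cancelBooking(bookingNumber, firstName, lastName);
}
```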
All right, we have a question back there, I believe. Let me check — I'm not entirely sure how long we're supposed to go on; I think we're already over time, maybe. But yeah, go ahead. — "Sure, so when we created the vectors using the text file data, are there limitations — does it have to be a text file, or can you ingest larger volumes of data, like a PDF?" — Yeah, LangChain4j supports a whole bunch of different document types: plain text files, Markdown, PDF, Word docs. I think there are plugins you can add, or you can define your own handlers for different file types. So for any type it supports, or that you teach it to support, you could give it a big old folder of content and have it go through all of it. Typically, though, the embedding — the content-to-vector creation — would run on a build server or something: whenever your documentation changes, you go and update the vectors for it. You wouldn't normally do it within the app that consumes them, if that makes sense. — "But on that note, you made a comment that the data is more effective in bite-size pieces, as represented in that text file — you want small representations of the data so it's more clear. Is that correct?" — Yeah, you definitely want to chunk up the data, and that's something you need to play around with as you develop the application, figuring out what a meaningful chunk size is. The recursive splitter we used starts by trying to fit a whole paragraph into a chunk, and then goes into recursively smaller pieces until it fits. So if you give it a meaningfully large size, it can do that. For our own documentation, we have a chat you can use for interacting with it, and because it's in Markdown format —
— I essentially split it by headings, because I know those headings are meaningful for that content. The size is definitely still — so the question was, is the size limited by the context window? Certainly. We always need to be mindful of how much we're sending over to the LLM. In this case, we didn't really pay attention to how much we're sending through the retriever; the only place we managed our context was by limiting the size of the history. But I believe the retriever does take parameters where you can configure not only how many results you want, but also how many tokens' worth of results you want at most. All right, let's wrap up here. I'll be around if you want to ask further questions. Thank you so much for sticking around all the way to the end of the conference. I hope you had a good time. Thank you so much for coming, and bye.