...and the stage is all yours, Gage.

Awesome, okay, thank you. Thank you all for coming out to one of the last sessions of the conference; I hope everybody had a good time. I'd like to introduce myself: I'm Gage Crumbach, an intern at Red Hat, part of the AI COE. Today I'm going to talk to you about stateful sessions for intelligent apps. What we're going to be looking at is an OpenShift-powered application that uses a lot of model services, in particular an audio decoding model service.

If we look at the roadmap of our presentation: first, I'm going to define what statefulness is, why this talk is about it, and what we need it for. Then I'm going to talk about how we build a model service that requires statefulness, which will be the audio decoding model service. Then we're going to put this model service to the test, put it into an architecture, and solve a little problem along the way. And then we'll show a little demo of it actually working, hopefully.

So first, what is statefulness? In an application, statefulness is probably something we see every day: it's data that persists and is used in future interactions with the application. A major example would be something like a shopping cart on an e-commerce site. You go to the site, put stuff in the shopping cart, come back a couple of days later, and it's still in that shopping cart. That, generally, is what state is for an application. State is used everywhere for so many things; that's just one broad example.

Today, though, we're going to look at state inside a model service, and state inside a model service is not too common a thing. What it's really used for is session-based statefulness: you need state to persist through maybe an hour of use, or maybe minutes of use, with an API service. An example would be chained models like audio decoding, where you can't give the model an entire audio file. You have to give the model small chunks of the audio file, which it then chains together, and that state needs to be saved for the model to be able to chain them all together.

So now that we know what statefulness is, let's look at a small example of how it can be used in a model service. Here is an example of a puzzle builder API: it takes in puzzle pieces and builds them up into an actual puzzle. If we want to use this API, we're going to need state. But first, let's look at what happens without state, meaning we don't store state in the API. We give it a puzzle piece and it does its job: the model correctly puts the piece on the puzzle board. But because the API isn't saving state, when we add another piece, the first one is no longer there. We can only ever add one piece to the puzzle; we can't see the entire puzzle completed as a whole.

So let's try it with state. Once we add state, the puzzle pieces go to the right spots just as before, but we can add all the puzzle pieces without the other ones disappearing, because we stored the state inside the actual API. Now, there are multiple reasons why this is not a great way to do things, mainly that you can't really scale it up: because we're storing state in there, it's no longer a RESTful API. There are a lot of reasons why we might want to avoid this, and there are easy ways to avoid it.
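To make that concrete, here's a minimal sketch of the "state inside the API" version of the puzzle builder, assuming a hypothetical Flask service. The endpoint, the request shape, and the in-memory dict are illustrative, not the actual demo code:

```python
# Hypothetical "state inside the API" puzzle builder: the placed pieces
# live in a module-level dict, so this only behaves with exactly one replica.
from flask import Flask, request, jsonify

app = Flask(__name__)

boards: dict[str, list[str]] = {}  # puzzle_id -> pieces placed so far

@app.post("/puzzles/<puzzle_id>/pieces")
def add_piece(puzzle_id: str):
    piece = request.get_json()["piece"]
    board = boards.setdefault(puzzle_id, [])
    board.append(piece)  # the "model" puts the piece on the board
    return jsonify({"board": board})  # the whole puzzle so far, not one piece
```

The `boards` dict is the crux: it lives in one process, so a second replica of this service would start with its own empty `boards`. That's exactly the scaling problem we'll run into later.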
We can serialize the data out into a database. Take a puzzle piece: we can hash it out or transform it in whatever way gets it into a database. What this does is keep our API stateless, and we're able to scale it a lot more easily. And that's great.

But there are some situations where we just can't serialize the data, and that can come from various issues or situations you have to overcome. One of them might be that you're working with a third-party model that just doesn't give you the option to serialize the model's progress out. And that is exactly the situation with audio decoding. With audio decoding, you have to chunk the data up into multiple audio chunks, and those small, raw audio chunks are given to the API. You can't serialize this model's progress easily; in particular, with the model I was using, I couldn't serialize it out at all. So I had no option but to store the model's progress, the state, inside the API.

So what we have so far is an actual working API, but we have to store the state inside it. We can give it raw audio and it stores state, as you can see: it's trying to say "have a nice..." and so on, and it gives the model's progress back to the caller. We can scale up the inputs a little, and the single server can handle this load; it just stores more pieces of state inside itself.

Now we have this working API, and you might be able to guess the issue that's coming up, but there's not really a good way to see it by just looking at this diagram. I thought a good way to test this API, to see what its big flaw might be, is to actually put it into an application. So that's what we're going to do: we're going to test this model by solving a problem. And the problem we're going to solve, one where audio is a big factor, is call centers.

Call centers have a bunch of phone lines, so the source of our raw audio becomes those phone lines. What we have is a bunch of phone lines giving raw audio to this API, and the API is creating a bunch of text. Now say we're the manager of this call center. Maybe we want to see if these calls are good or bad or neutral, or if there are common things being talked about. We can do this with another model service: we can add a sentiment analysis model. When the text flows through this model, we're going to get a lot of data: is the call good? Is the call bad? Are they talking about cars, or whatever? This model service will package that up into a little data object for us that we can then visualize in a web app. So as a manager, I can look at all these phone lines cumulatively through a little web app.

Now, how do we connect the pieces of this puzzle? We need to somehow get the data from our audio decoder all the way to our web app, and especially if we add more phone lines, we're going to need to be able to scale this. One way we can do this is to use Kafka. Kafka is an open source stream processing framework that allows us to handle real-time data feeds. Essentially, we can have service A, which is the service we want to push data from, and service B, which is the service we want to push data to, and Kafka is going to be the connection point between them. Service A does something called a Kafka produce, which pushes data onto something called a Kafka topic. You can think of a Kafka topic as a stream with data in it, a little river: service A produces, placing data onto the stream. Service B does something else: it consumes, taking that data off the stream. So what you have now is data flowing through the stream and essentially connecting our two services together.
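Stepping back for a second, here's a minimal sketch of what that session-based decoding API might look like. The `StreamingDecoder` class is a stand-in for a third-party model whose internal state can't be exported, and the endpoints are illustrative, not the actual demo code:

```python
# Hypothetical session-based decoding API: each session keeps a live decoder
# object in process memory, because its progress can't be serialized out.
import uuid
from flask import Flask, request, jsonify

app = Flask(__name__)

class StreamingDecoder:
    """Stand-in for a third-party model whose progress we cannot export."""
    def __init__(self):
        self.transcript = ""

    def feed(self, audio_chunk: bytes) -> str:
        # A real model would decode the chunk using its accumulated state;
        # here we just record that a chunk arrived.
        self.transcript += f"[{len(audio_chunk)}-byte chunk decoded]"
        return self.transcript

sessions: dict[str, StreamingDecoder] = {}  # the state we're stuck holding

@app.post("/sessions")
def start_session():
    session_id = str(uuid.uuid4())
    sessions[session_id] = StreamingDecoder()
    return jsonify({"session_id": session_id})

@app.post("/sessions/<session_id>/chunks")
def push_chunk(session_id: str):
    decoder = sessions.get(session_id)       # per-session state in this process
    if decoder is None:
        return jsonify({"error": "unknown session"}), 404
    transcript = decoder.feed(request.data)  # raw audio bytes in the body
    return jsonify({"transcript": transcript})
```

And here's roughly what the Kafka connection point looks like in code, using kafka-python as one possible client library (the talk doesn't prescribe one; the broker address, topic name, and record shape are placeholders):

```python
# Service A (the audio decoder) produces decoded text onto a topic;
# service B (sentiment analysis) consumes from that same topic.
import json
from kafka import KafkaConsumer, KafkaProducer

BROKER = "my-cluster-kafka-bootstrap:9092"  # placeholder bootstrap address

producer = KafkaProducer(
    bootstrap_servers=BROKER,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("decoded-speech", {"phone_line": "555-0100", "text": "have a good day"})
producer.flush()

consumer = KafkaConsumer(
    "decoded-speech",
    bootstrap_servers=BROKER,
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    record = message.value
    # ...run sentiment analysis here, then produce the result onward...
```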
Now we can put this into our architecture and start making producers and consumers out of these elements of our puzzle. The audio decoder is going to produce to a topic called decoded speech. Decoded speech is going to have all that text that was converted from raw audio, and our sentiment model service is the service that needs it, so it's going to consume off that specific topic. But that model service is also going to produce: its newly analyzed data gets produced out to another topic, sentiment text. So it both consumes and produces, and the sentiment text topic will have all the data about whether this call is good, whether this call is going badly, and what some common topics in the call are. The management web app then consumes off sentiment text, and that completes the flow of data through our architecture.

Now, remember that earlier we talked about testing this API by building it into an app. Well, we have this theoretical app here and it looks like it works: we can see the data flow, we can see everything working. But there's still an issue, and we haven't found it yet, so we have to keep going, making this app more production-ish. We should look at how it scales, and to do that, we add more phones.

With more phones, we have an obvious issue: the API is going to be overloaded. It's not going to be able to produce any sort of real-time predictions, especially if we're trying to look at all this data in real time, with thousands of phone lines putting in data constantly. The management web app won't be able to see the most recent data coming in; they might see it five or ten minutes later. So this is a big issue, and the obvious solution is just to scale the API.

To do this, we're going to use something called OpenShift. OpenShift is a flavor of Kubernetes, which is going to help facilitate scaling for us, and to make that easier we'll use OpenShift's auto-scaling feature. You can apply this to our service, the audio decoding API, and when the API starts using a lot of resources, when its CPU percentage goes up, the OpenShift autoscaler will increase the number of containers running it. That solves one of our problems: we no longer have an API that's being overloaded by all these phone lines. We're able to distribute the load.
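As a rough sketch of that autoscaling step (the deployment name, replica bounds, and CPU threshold are placeholders, not values from the demo), here's an autoscaling/v2 HorizontalPodAutoscaler expressed as a Python dict you could dump to YAML and apply:

```python
# HorizontalPodAutoscaler for the decoder: when average CPU utilization
# crosses the target, OpenShift adds replicas, up to maxReplicas.
import yaml  # pip install pyyaml

hpa = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "audio-decoder"},
    "spec": {
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "audio-decoder",
        },
        "minReplicas": 1,
        "maxReplicas": 10,
        "metrics": [{
            "type": "Resource",
            "resource": {
                "name": "cpu",
                "target": {"type": "Utilization", "averageUtilization": 80},
            },
        }],
    },
}

print(yaml.safe_dump(hpa))  # pipe this into `oc apply -f -`
```

Something like `oc autoscale deployment/audio-decoder --min 1 --max 10 --cpu-percent 80` should produce roughly the same object in one command.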
But the way we're distributing the load is a new issue we've just created. You might notice this new element in the diagram called the Ingress Controller. The Ingress Controller is intercepting the requests from the phone lines and redirecting each request to a specific audio decoding API, and the way it chooses which one to redirect to is the issue. We can look at this a little more closely, starting with one sentence.

We're looking at just one phone line right now, and our autoscaler has scaled us up to three APIs. Our phone line is trying to decode a small sentence, "have a good day," and we can walk through what's going to happen. First we send the first chunk of audio, "have," which goes to the Ingress Controller, which says, okay, we're going to push you to one of these APIs: the first one. The API does its job: it gives the chunk to the model, the model runs, and it stores the result in state. We do the same thing with the next chunk, "a": the Ingress Controller takes it and redirects it to the second API, which starts its own state there. You might see there's already a problem. We go through with "good," which does the same thing and goes to the next one. Then "day," same thing, and it goes back to API zero. So we have three separate containers and three separate states, all for one phone line. And this is just a small sentence and it's already messed up; imagine a large, hour-long conversation that you're trying to decode. It would be split across three separate states: one container has "have day," another has "a," another has "good." It doesn't make any sense.

So what do we do about that? We use OpenShift sticky sessions. Sticky sessions are really the hero here that's going to fix our entire API: this is the issue we ran into, and this is going to be the solution. What you're looking at now is what we had going on originally, just a little closer up. The phone line sends a request, and the Ingress Controller redirects it to the audio decoder API. You can see it's also generating a cookie. What's special about these cookies is that each cookie is a link to an actual container: the audio decoder API has, say, three different containers in there after scaling up, and one cookie is linked to one of them while another cookie is linked to another. That's really good for us, because when the Ingress Controller sends that cookie back to the phone line, we can use it. But as you can see, right now we're not using it: we're just sending another request and ignoring the cookie, so the controller generates a new one and sends us to a different container.

So let's start using the cookies. We save the cookie on the client side and then send that same cookie back. The Ingress Controller remembers the cookie and sends us to the same container. We can look at this with the same example as before, "have a good day." We send "have," and the controller sends back a cookie that's linked to API zero. When we go to make our next request, "a," we've stored the cookie on the client and we send it along with the request to the Ingress Controller. The Ingress Controller says, oh, I know that cookie, I made that cookie, and it's linked to API zero, so that's where you should go. It does that for all the chunks until we have the entire sentence without any split among the containers. And that's our scaling fixed. It's great: all our phone lines can store cookies, send them to the Ingress Controller, and the controller routes them to the correct audio decoding API.
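On the client side, the cookie handling can be as simple as reusing one HTTP session. A minimal sketch, assuming the hypothetical chunk endpoint from earlier and that the router sets a sticky-session cookie on the route (the hostname is a placeholder):

```python
# Client-side sticky sessions: requests.Session() stores whatever cookie the
# Ingress Controller sets on the first response and resends it on every
# subsequent request, so all four chunks land on the same decoder pod.
import requests

API = "http://audio-decoder.apps.example.com"  # placeholder route hostname
http = requests.Session()  # persists cookies across requests automatically

session_id = http.post(f"{API}/sessions").json()["session_id"]
for chunk in [b"have", b" a", b" good", b" day"]:
    resp = http.post(f"{API}/sessions/{session_id}/chunks", data=chunk)
    print(resp.json()["transcript"])  # grows chunk by chunk, in one state
```

Without the shared `Session` (four bare `requests.post` calls), each request would drop the cookie and could land on a different replica, which is exactly the split-state failure from the walkthrough.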
And with that, we can scale the audio decoder as much as we want, and we can add as many phone lines as we want, without there being any issues. OpenShift will auto-scale up for us, so we don't have to worry about any of that.

So let's look at it in action a little bit. Here we go. Okay, here we go. What you have here is an OpenShift topology view of our application. Right here are our phone lines; usually there would be many, but this is just a simulator that's going to simulate a bunch of them. I think right now we just have one running, but this is all our phone lines, our simulator. This is our API, and our API has the autoscaler on it, which autoscales to however many phone lines we have. This goes to the sentiment analysis, our sentiment analysis model. And then this goes over to our web application: to our web backend, which then goes to our client.

And through this we can see, hopefully it's still doing what I want it to... yes, it's starting decoding with certain IDs, which you can think of as something like a phone number. The audio decoder, let's see if it's doing its thing. Yep, looks like it's producing: sending to the decoded speech topic that we talked about. And from there, it's going to the sentiment analysis, which is consuming and then producing. You see here it's consuming, great. And then we have our sentences right here: "The question..." and so on. And then we can see some more data. Yeah, so this is the consumed data, and then it produces out to our web app, and that goes to the client. So that was it running on OpenShift.

To review what we talked about: we talked about statefulness and why our model service might need it. We talked about the issue with a stateful API, a stateful model service, which is scaling it. And the solution is to use OpenShift and use cookies, which do that matching between clients and containers that we talked about back here. And that's it: that is stateful sessions for intelligent apps. So if there are any questions, I will take them. Otherwise, I hope you all had a great conference. Hope to see you at the next one.

Thank you, Gage, so much for a very engaging talk, and very nice pictures of cookies. We'll take any questions now if anybody has any; we can give it maybe five more seconds. After that, at 4 p.m. Eastern time, we will have a closing session with trivia, which you'll find under the tracks. I guess there are no questions. Thank you again so much, Gage; that was a very nice presentation, very easy to follow. We'll see you all at the closing session, everyone. Thank you so much.