Hi everybody, I'm Gage Krumbach, and I'm an intern at Red Hat in the forward deployed engineering group. Today I'm going to talk to you about stateful sessions for intelligent applications, using OpenShift to power a multi-user model service in a Kappa architecture.

First we need to talk about statefulness. Statefulness in an application is something you've probably seen before: data being saved for future visits. If you go to Amazon or any e-commerce site, you have a shopping cart; you put an item in it, and a day or two later that item is still there. That's statefulness, and we rely on it for logins, e-commerce sites, and plenty more.

In an API this looks a little different. It's shorter term, more of a session than a day-to-day thing, so we're looking at a time frame of five or ten minutes instead of days or months. In a stateful API, data is stored for persistent interaction: past data is needed to handle future requests. That's not always the case, but it's usually why an API needs to be stateful.

To see why we might actually need a stateful API, look at this puzzle builder example. The puzzle builder is an API that holds a board, and on the right we have puzzle pieces. We want to POST these puzzle pieces to the puzzle builder API and have it correctly place them on its board. We'll start without state, so the API isn't storing anything between requests. It successfully places one piece, then another, but since it never saves the state of the board, we can never assemble the entire puzzle.
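As a toy sketch of that difference (the class names and the four-piece board size are mine, purely for illustration), compare a builder that forgets its board between calls with one that keeps the board in memory:

```python
# Toy contrast between a stateless and a stateful puzzle builder.
# Class names and the four-piece board size are illustrative only.

class StatelessBuilder:
    """Forgets everything between calls: the caller must pass the board in."""
    def place(self, board: dict, piece: str, pos: int) -> dict:
        updated = dict(board)      # nothing is kept inside the "API"
        updated[pos] = piece
        return updated             # state travels with the client

class StatefulBuilder:
    """Keeps the board in memory, like state held inside a container."""
    def __init__(self, size: int = 4):
        self.board = {}            # state lives inside the "API"
        self.size = size

    def place(self, piece: str, pos: int) -> bool:
        self.board[pos] = piece
        return self.is_complete()

    def is_complete(self) -> bool:
        return len(self.board) == self.size

builder = StatefulBuilder()
for pos, piece in enumerate(["corner", "edge", "edge", "corner"]):
    done = builder.place(piece, pos)
print(done)  # True: all four pieces survived across separate calls
```

The stateless version still works, but only if the client carries the whole board around with every request, which is exactly the serialize-everything-out trade-off.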
In contrast, if we save the state of the board inside the API, we can place all four puzzle pieces and complete the set. We can also do a little data manipulation here and serialize the state out to a database. The pro is that we no longer keep state inside the puzzle builder API itself, which makes it stateless; we can make it properly RESTful, scale it, and get all the other nice features of RESTful APIs.

The issue is that we can only serialize data that is actually serializable. If our data isn't easily serializable, say we're working with third-party tools, some objects just aren't worth serializing, or we're doing some kind of live encoding or decoding, then we don't want to spend time pushing it up to a database. There are a lot of reasons to keep state inside the API, and that comes with difficulties: you now have to manage state inside the API, multiple users mean multiple different states to manage, and those two things go hand in hand with scaling, because a stateful API isn't easy to scale right out of the box.

To look further at why, let me introduce a real-time call center management use case. Imagine you're working as a call center manager and you need to monitor multiple conversations for quality assurance. Say there are 10, 15, or 100 call lines; you need to know whether each line is going well, going badly, or hitting some common problem, and you don't want to keep spot checking each one. There are a couple of ways to look at them as a whole. One would be listening to every line at the same time. It sounds something like this.
You can kind of pick out a little bit, but probably not enough, and it's not the best way to do it. Another way is to use machine learning: a whole model service and data flow that transcribes the audio, runs it through some natural language processing, and sorts and groups the live calls so you can manage them from a web application.

Doing this isn't too bad; you just need a few pieces. We'll use OpenShift to deploy three services: two model services and a web application. The first model service is the audio decoding API, which decodes audio coming in from the phone lines on the left. The phone lines call this API and post audio chunks to it live, and the API stores the state of that audio in the container. It needs to store state because live audio decoding needs past audio to decode properly; without past state, it's just looking at small, isolated audio chunks. The sentiment analysis model then takes the decoded text from all the phone lines and groups it: good, bad, or sharing a common issue, and we can extract the nouns, verbs, and so on. Finally, we need a web application to manage all of this.

Now the issue is that all of these are running just fine in OpenShift, but we need to connect them somehow, and to do that we're going to use Kafka. Open Data Hub is a group of machine learning tools that utilizes upstream efforts, and one of those upstream efforts is Kafka, via Strimzi. We'll use it to connect all of our services together.
Through OpenShift we'll deploy Kafka. If you're not familiar with Kafka, you can think of it as data streaming; at least, that's one of its use cases. Say we have service A and, on the right, service B, and we want to move data from A to B. Service A produces data onto a stream called a Kafka topic. It's a one-way stream: one or many services can put data onto it, and service B sits at the other end and consumes data off of it by watching the topic. You can have multiple Kafka topics, and they're all independent of each other.

Now we'll use this to tie our solution together. Data from the phone lines goes into the decoder, and the decoder produces its text onto the decoded-speech topic. From there, our sentiment model service consumes that text and produces its analysis onto the sentiment-text topic. That gives us a little object holding the sentence and its sentiment analysis, maybe some more natural language processing analysis too, and all of that is consumed by our web application. This flow, where data comes in from one source and streams through a series of services to reach an end destination, is the Kappa architecture.
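As a rough sketch of that produce/consume pattern, here is what the two ends of a topic might look like with the kafka-python client. The bootstrap address and the record fields are assumptions for illustration; the topic name matches the diagram:

```python
import json

def encode(record: dict) -> bytes:
    """Serialize a record before it goes onto a Kafka topic."""
    return json.dumps(record).encode("utf-8")

def decode(raw: bytes) -> dict:
    """Deserialize a record coming off a Kafka topic."""
    return json.loads(raw.decode("utf-8"))

def service_a(bootstrap: str = "my-cluster-kafka-bootstrap:9092") -> None:
    # Service A: produce decoded text onto the one-way stream.
    from kafka import KafkaProducer
    producer = KafkaProducer(bootstrap_servers=bootstrap,
                             value_serializer=encode)
    producer.send("decoded-speech", {"id": 1, "text": "hi, how can I help?"})
    producer.flush()

def service_b(bootstrap: str = "my-cluster-kafka-bootstrap:9092") -> None:
    # Service B: sit at the other end of the stream and consume.
    from kafka import KafkaConsumer
    consumer = KafkaConsumer("decoded-speech",
                             bootstrap_servers=bootstrap,
                             value_deserializer=decode)
    for message in consumer:
        print(message.value["text"])
```

Notice the two services never talk to each other directly; the topic decouples them, which is what lets us swap out or scale each service independently.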
Now the issue is that as we start adding more phone lines, and obviously we're going to want more than four, the load on the decoding API increases. Under a larger load it can't produce real-time predictions anymore, because each request has to wait for the previous prediction to finish. It's no longer live, which defeats the purpose of the whole project: watching our phone lines live instead of just listening to recordings of them.

So the obvious solution would be: just scale the decoding API, right? That doesn't work all that well here, but first let me introduce a couple of OpenShift concepts. OpenShift has an autoscaler that automatically scales your API based on CPU usage, so as we add more phone lines it scales the API up, and now we might have tens or even hundreds of API pods. The ingress controller on the left has, as one of its main uses, taking external traffic, which is our phone lines, and routing it to internal services such as our API. So a phone line calls our API, and the ingress controller intercepts that call and routes it to one of the API pods.
The reason we can't just scale is that the ingress controller doesn't always send a given phone line to the same container. For example, say I'm talking on one phone line and my audio goes to API 1. That's fine, my state lives there. But as I keep talking, the ingress controller routes me to API 2 instead. Now I have two states, and the second one knows nothing about the past, so it produces a different prediction than the first, and when the two try to combine, it doesn't work well. What we want is for each phone line to be linked to the same audio decoding container: one phone line to one decoder.

A really cool solution for this is OpenShift sticky sessions. Sticky sessions are an easier way to deal with this scaling issue: they use cookies for session persistence, which works because each cookie is linked to a container, and the cookies are generated in the ingress controller. Here's how it plays out in our flow. A phone line makes a request to the audio decoding API, and that request is intercepted by the ingress controller. The ingress controller generates a cookie linked to the container it routed the request to, and sends that cookie back on the API's response. So the phone line receives its response plus a cookie, and that cookie is the link between itself and the container. Right now our client is choosing to ignore the cookie, so each new request gets handed a different API, over and over again, which messes up our system. But if the client, our phone line, starts saving its cookies, then when it makes a request it sends the cookie along, the ingress controller remembers which container that cookie belongs to, and it routes the request to that container. Now our phone lines are linked to containers, so we're no longer losing state; the state always stays linked together.

If we actually put this into our solution, with our phone lines on the left holding cookies, the ingress controller maps each cookie to its container and everything works in unison, which is pretty cool. All it really required was the client saving cookies. That can be done a lot of different ways; I did it in Python, and all I had to do was change a couple of lines of code to say, hey, start saving cookies, and it just starts working. OpenShift's ingress controller manages the rest.

Now let's see what this actually looks like in OpenShift. We have all our parts in here. The call simulator represents all of our phone lines. The audio decoder takes the calls from the simulators; that's our API. And then we have our minimal notebook, which is our sentiment analysis model. It's not a service yet because we're still in development, so we're running it in a Jupyter notebook where we can keep developing it. First let's spin up some simulators, say four, five, six of them; they come up quite quickly. Then, over in the audio decoder, that's a little too much for one decoder to handle, so let's get another pod going. Now let's go over here and start developing our model. First, we've got to
go to the route, which takes us to JupyterHub. In the notebook we install a few things: kafka, because we're going to have to produce and consume; Flair and NLTK to do some sentiment analysis; and some pre-trained models, just for testing purposes. Then we set up Kafka. We have a from-topic and a to-topic, as in the diagram, because this service both consumes and produces, so we need both a consumer and a producer. All we really have to do is iterate over the consumer: we grab each piece of data off the stream, run some sentiment analysis, take the nouns out of the sentence, package it up in a little object with an identification number, and produce it back out onto the other topic, where our consumer, the web application, will pick it up.

If we do a quick run here, it starts pushing out a bunch of sentences. Over in our web application, this little note app, you can see results coming in in real time. It might not seem like the quickest right now, but that's not because of our API; it's because we're in a Jupyter notebook, bottlenecked by a Python script that can only process one message at a time at the speed of the notebook. Even so, it's still producing real-time results, and for six simulators it's doing pretty well. You can see positive quality for some lines, negative quality for others, the top words being extracted, and the call line ID.

Let's review what we did to get to this point. We had an issue where we needed to store state in the API, and we couldn't take it out
because our audio decoder relies on third-party tools whose state we can't serialize out. Scaling became the next problem once we needed more phone lines, because the state had to stay persistent inside one container rather than being spread across different containers; everything had to stay linked. To link them, we used cookies, OpenShift session stickiness, and with those cookies there's a back-and-forth handshake between the ingress controller and the phone line that keeps the session persistent. As long as the ingress controller is up and the actual container is up, everything works just fine.

Now, this is one solution, a quick way to get statefulness up and running, but it's not the only one. There are a lot of other solutions; one I was actually looking at before this was using OpenShift operators. You make a custom operator and resource, and instead of just having APIs, you have a bunch of different containers, so each call line gets its own little pod. That's a bit more complicated, which is the issue; there are better ways to do this, but what I've shown is a really simple way to get it set up and running without changing a lot of your main build. It's a good proof of concept, and it could be applied in a lot of different ways as well. So I think that's it; if there are any questions, I'll take them.

Thanks, Gage, great talk. It does appear there's one question in the chat from Kasia: where does the ingress controller store the state, all the cookies? Yes, so, where does the ingress controller store the state? I'm not a hundred percent sure where it stores the mapping, but the ingress controller doesn't store the state itself; it stores a link between the cookie and the container, so the state is being
stored in the API still. If we go back and look at our first example, the puzzle builder API, we can think of it as our audio decoder API and the puzzle pieces as the audio chunks coming in. The state is stored in the API itself, in the actual container: the container holds the state in memory. The benefit is that it's a bit quicker to grab the state there than to go out to a database, and in our case we couldn't go to a database anyway, because the audio state isn't serializable. So yes, it's just in memory inside the container. The ingress controller doesn't store any state; it stores a link between the phone line and the container, or more generally, between the client and the container.
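Earlier I said that making the client save cookies was only a couple of lines of Python. Here is a self-contained sketch of that idea using only the standard library. The little local server is a stand-in for the ingress controller, and the cookie name is made up; the client side, a CookieJar attached to the opener, is essentially the whole change:

```python
import http.cookiejar
import http.server
import threading
import urllib.request

class FakeIngress(http.server.BaseHTTPRequestHandler):
    """Stand-in for the ingress controller: hands out a session cookie
    on the first request, then reports which backend the cookie pins us to."""
    def do_GET(self):
        cookie = self.headers.get("Cookie", "")
        body = cookie.encode() if "backend=" in cookie else b"no cookie yet"
        self.send_response(200)
        if "backend=" not in cookie:
            self.send_header("Set-Cookie", "backend=decoder-1")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass

def fetch(opener, url: str) -> str:
    with opener.open(url) as resp:
        return resp.read().decode()

def demo():
    server = http.server.HTTPServer(("127.0.0.1", 0), FakeIngress)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/"

    # The "couple of lines" on the client: keep a cookie jar and resend
    # its cookies automatically on every request.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))

    first = fetch(opener, url)    # response carries the session cookie
    second = fetch(opener, url)   # cookie goes back, so we stay pinned
    server.shutdown()
    return first, second

first, second = demo()
print(first)    # no cookie yet
print(second)   # includes backend=decoder-1
```

In the real demo the client was the call simulator; the point is just that once the client keeps and resends cookies, the router's session stickiness does the rest.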
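To round out the notebook piece of the demo, here is a toy sketch of its consume, analyze, produce loop. The sentiment and noun-extraction functions below are crude stand-ins for the Flair and NLTK models, the word lists and record fields are invented for illustration, and the Kafka wiring is left as an unexecuted function:

```python
import json

# Crude stand-ins for the Flair sentiment model and NLTK noun extraction;
# the word lists here are invented for illustration only.
POSITIVE = {"great", "good", "thanks", "happy", "resolved"}
NEGATIVE = {"bad", "angry", "broken", "refund", "cancel"}
STOPWORDS = {"the", "a", "an", "is", "was", "to", "and", "my", "i"}

def sentiment(sentence: str) -> str:
    words = set(sentence.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return "POSITIVE" if score >= 0 else "NEGATIVE"

def top_words(sentence: str) -> list:
    # Stand-in for noun extraction: keep longer, non-stopword tokens.
    return [w for w in sentence.lower().split()
            if w not in STOPWORDS and len(w) > 3]

def package(call_id: int, sentence: str) -> dict:
    """Build the little object our web application consumes."""
    return {"id": call_id,
            "sentence": sentence,
            "sentiment": sentiment(sentence),
            "top_words": top_words(sentence)}

def run(consumer, producer, to_topic: str = "sentiment-text") -> None:
    # consumer/producer would be kafka-python clients wired to the
    # decoded-speech (from) and sentiment-text (to) topics.
    for message in consumer:
        record = json.loads(message.value)
        out = package(record["id"], record["text"])
        producer.send(to_topic, json.dumps(out).encode("utf-8"))
```

The shape of the loop is the important part: iterate over the consumer, analyze each sentence, and produce the packaged result onto the to-topic for the web application to pick up.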