So, hey everyone, my name is Mihai, and together with Muru, who unfortunately can't be here today, I'm going to be talking to you about data processing at scale with Knative and Benthos. A bit about us: I'm a principal software engineer at Optum, where I work on data streaming, and I'm also a Benthos contributor. Muru is a staff software engineer at Box, where he works on container platforms; he's an open source contributor and a member of the Knative Steering Committee, representing end users. For today, we're going to cover Knative autoscaling and how it works, Benthos itself, why it's useful and how you can use it, how to combine Benthos and Knative and what the benefits are, and I'll do a demo of sentiment analysis on Twitter data using these two components together. Now I have to play a video from Muru, who, like I said, unfortunately can't be here, and he'll cover the first part, on Knative autoscaling.

We're going to be talking about my favorite feature in Knative, which is the Knative Pod Autoscaler. My other favorite feature is the developer experience that you get out of Knative. What I mean by developer experience is that you create one Knative service and that creates all the Kubernetes artifacts for you. Apart from that, you also get auto TLS, custom domains, things like that. So how does Knative provide this autoscaling for you? There are two metrics it can use: one is concurrency, that is, the number of concurrent requests to your application, and the other is requests per second. The default is concurrency, and the default value is 100. For example, if you create a Knative service without changing any default values and you send 150 concurrent requests to your application, you should have two pods of your application running. And this can be changed based on your needs: the metric can be either concurrency or requests per second, and you can change the target value around.
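As a rough sketch of what that configuration looks like, a Knative Service that switches from the default concurrency metric to requests per second might be written like this (the service name and image are placeholders, not from the talk):

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello                    # hypothetical service name
spec:
  template:
    metadata:
      annotations:
        # The default metric is "concurrency" with a target of 100;
        # these annotations switch the revision to requests per second.
        autoscaling.knative.dev/metric: "rps"
        autoscaling.knative.dev/target: "150"
    spec:
      containers:
        - image: example.com/hello:latest   # hypothetical image
```

With the defaults left untouched (concurrency metric, target 100), sending 150 concurrent requests would be spread across two pods, as described above.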
And if you look at who's sending these requests, it could be end users (millions of end users accessing your application), it could be an application calling your APIs, or you could be leveraging Knative Eventing, which can source data from multiple event sources and send it over to your application. Since the requests are pushed to the Knative service, we call this push-based autoscaling.

There is another set of use cases: batch use cases, where the data sits in a database or a CSV file. Can we use Knative autoscaling to speed up that processing? Yes, you can, but you have to develop a bespoke component for each use case, one which pulls the data from, say, a database and sends it over to the Knative service. And since it pulls the data, we call this pull-based autoscaling. What I mean by a bespoke component is this: suppose you want to source data from a CSV file. You need to develop a component which can read it; say you want to process 100,000 records, it has to take them, chunk them, and call your Knative service in parallel, maybe 100 requests in parallel with 10 records per request, to get the pod autoscaling going and the processing done quickly.

Wouldn't it be nice if there were a component which could integrate with multiple data sources, was simply configurable, and could also integrate with multiple sinks? We don't even need many sinks; we just need an HTTP or gRPC sink to send the data over. It should be scalable based on the load and resilient to failures: if there's any failure during the processing, it should be able to catch up. And last but not least, observability: we should be able to observe the whole process. To talk about this component, I'm going to hand it over to my co-presenter, Mihai. Thank you all. Over to you, Mihai.

So that was Muru's part, and thank you for listening to that.
I'm sorry you had a bit of technical difficulties. So let me go to the next part of the talk, and I'm going to go into Benthos here. What is Benthos? As Ashley Jeffs, the Benthos creator, likes to describe it, it's fancy stream processing made operationally mundane. What he means by this is that Benthos is a very simple thing: it's a single static binary, it supports many sources and many sinks (real-time, batch, file storage, et cetera), and you drive it with a configuration. In the configuration, you specify your sources, your sinks, and then the various operations that have to happen on the messages as they are in flight from a source to a sink.

And what kind of operations do I mean? Well, you can do transformations; imagine a schema migration applied to various messages. You can do filtering: if you want to drop some messages you don't care about, you can just do that. Or hydration, if certain fields in the messages need to be adjusted or filled in somehow. Or enrichments, if you want to, for example, fetch data from some other source, put it into the original message to enhance it, and then send it to the output.

And why Benthos? Well, as you probably guessed by now, it's written in Go; there's the Go gopher here. It's performant and simple, so it's very good at the kind of small workflows where you launch it as a lambda and it does its thing. It supports YAML and CUE configuration; CUE is a fairly recent addition which can be used to generate the YAML config for Benthos. It's stateless, so it doesn't store any state locally about the messages that are in flight inside Benthos. You can, however, configure a cache for it, so if you want something like an in-process cache or an external cache, you can do that. And it's extendable: as I said, it's written in Go, so you can import it as a library and create a custom Benthos binary with your own additions as custom functionality. So, let's combine them.
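To make that concrete, a minimal Benthos config wires an input, optional processors, and an output together. This is only an illustrative sketch (the broker address, topic, and bucket name are made up): it filters messages in flight between a Kafka source and an S3 sink.

```yaml
input:
  kafka:
    addresses: [ "localhost:9092" ]   # hypothetical broker
    topics: [ "tweets" ]

pipeline:
  processors:
    # Filtering: drop any message that has no "text" field
    - bloblang: |
        root = if this.text == null { deleted() } else { this }

output:
  aws_s3:
    bucket: my-bucket                 # hypothetical bucket
    path: 'tweets/${! uuid_v4() }.json'
```

Swapping the input or output for a different connector is just a config change; the processing steps in the middle stay the same.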
We can take Benthos, we can take Knative Serving, put them together, and see how this works and how you can get Knative to do the autoscaling for the pull-based autoscaling model that Muru was introducing back in the video. For the demo that I'm presenting here, I'm going to have a local Kubernetes cluster, which is basically just kind, and it's going to run Knative Serving. Knative Serving is going to autoscale the sentiment analysis lambda, which I'm going to show in a minute. Benthos is going to run here on the side as a separate process, and there's going to be a database; in my case, I just have a Postgres instance running locally with a source and a sink table. Benthos is going to read data from the source, batch-load it, construct batches that it sends for processing to the sentiment analysis app, and then batch-store the data it gets back into the sink.

Having said that, I'm going to go into the demo, and please bear with me, because this is going to be kind of complex and I hope it won't blow up on me. Right. So I have a pretty long README here, and I'm going to try to go through it and hopefully everything is going to work. What I did in advance is deploy the Knative Serving cluster in kind, so everything there should already be running. Let's just go through that. Oops. Hopefully this is going to work. So yeah, we have Knative Serving running here, the kind stuff running in there. And I'm just going to launch a watch on my pods so I can see things coming up; there are no pods yet in here. So what I want to do now is create my lambda, and by the way, this is probably a good time to mention that my lambda is actually a custom Benthos instance which imports a sentiment analysis library.
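The source-to-sink flow just described could be sketched as a Benthos config roughly like the following, assuming the v4 SQL components; the DSN, table and column names, and the service URL are placeholders rather than the exact demo config.

```yaml
input:
  sql_select:
    driver: postgres
    dsn: postgres://user:pass@localhost:5432/demo?sslmode=disable
    table: source
    columns: [ id, text ]

pipeline:
  threads: 1        # more threads means more parallel calls to the lambda
  processors:
    # Enrichment: send each message to the sentiment analysis service
    - http:
        url: http://benthos-vader.default.example.com   # placeholder URL
        verb: POST

output:
  sql_insert:
    driver: postgres
    dsn: postgres://user:pass@localhost:5432/demo?sslmode=disable
    table: sink
    columns: [ id, compound, neg, neu, pos ]   # assumed sentiment fields
    args_mapping: root = [ this.id, this.compound, this.neg, this.neu, this.pos ]
    batching:
      count: 10     # batch the inserts so the output keeps up
```

Raising `pipeline.threads` increases the number of concurrent requests hitting the Knative service, which is what drives the pod autoscaling later in the demo.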
And to show you what I mean, there's this deploy-benthos-vader script here, which basically contains a pretty boring Knative configuration with a bunch of annotations. You can see I'm setting the target RPS to 200 and the autoscaling maximum to 10 instances, so I don't blow up my laptop, and it's checking the window every 6 seconds. Right, so let's do that. We created a config for our Benthos VADER service, VADER being the sentiment analysis library that I'm using, and now I'm launching Benthos VADER. So this is creating that Knative CRD and starting it up, and we can see here the first benthos-vader pod coming up. It takes about 20 seconds to start, so I'm just waiting for it. In the meantime, we have this curl command here prepared. You can see it came up, so it's running and healthy. And if I run my curl command... oops... you can see here, all the way at the bottom, that it did produce this output. So I said, hey, you know, I love Benthos, and it computed the sentiment for this and came up with a bunch of numbers that I'm not going to try to explain in detail, but it looks pretty positive; it's a positive number here. Right, so we have one pod running. What I'm going to do next is start my database, so I'm going to start Postgres in Docker and populate the database, and I managed to sneak in here another Benthos instance which actually loads the data into Postgres, just for fun.
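The annotations just mentioned (a target of 200 RPS, a maximum of 10 instances, a 6-second window) would look something like this on the Knative Service; the name and image are placeholders, not the exact manifest from the demo.

```yaml
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: benthos-vader
spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/metric: "rps"
        autoscaling.knative.dev/target: "200"
        autoscaling.knative.dev/max-scale: "10"
        autoscaling.knative.dev/window: "6s"   # 6s is the minimum stable window
    spec:
      containers:
        - image: example.com/benthos-vader:latest   # hypothetical image
```

The short window makes the autoscaler react quickly, which is handy for a live demo; production setups would typically use a longer window to avoid scaling on noise.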
So now let's look at our data. If we select count... oops... we see we have almost 1,500 records in here, and in our sink we don't have anything yet. And if we look at our source, let's just have a quick look-see at it: yeah, we have an ID and we have the text of the tweets. It doesn't matter too much, just a bunch of stuff. Right, and now I want to launch a Prometheus instance just to show you a bunch of metrics, and I have a URL ready-made for it here. As you can see, there's no data yet, so I'm not pulling anything out from under my sleeve.

Now what I want to do is run Benthos, and this is the Benthos that does the batch loading from the source table that I showed you in the diagram on the slides. The config I have for it is here... oops... it has an input, which is Postgres with the source table; it has a pipeline, where I'm reaching out to this lambda to get the sentiment analysis data; and then the output is here, which puts the data in the sink, just those four fields that are computed. And I'm applying a bunch of output batching, just so this output is fast enough to show you the autoscaling.

So if we go back here and I time my Benthos run, this is going to take about 30 seconds or so. We can see here on the right that our one pod is running, and it's spinning another one up, because it realized it's getting a bunch more requests than one pod can handle; if you remember, I configured this with 200 RPS. Yeah, so it's still churning along. We could actually look at our database if we wanted to, but I'm just going to leave it alone. Let's see the sink... right, so it's almost there. I hope it's going to work well, because it's a local demo, so my laptop is sometimes slower, sometimes faster; it depends on how these containers are initialized, and before it was a bit faster. But anyway, let's just bear with it. There we go, it finished, in one minute and a few seconds, so we're good. Now let's try to play
with the Benthos config a bit. Here I'm going to use six threads instead of one to talk to this Benthos VADER service. So if we run this again, hopefully this time it should be faster, and we should see a bunch more pods coming up here; two of them already came up, now three, so things are going better. And if we look at our output data, if I still have it... I closed it... okay, great, it's putting more data in the sink, and it should finish, hopefully, a bit faster than last time. Like I said, it's a bit environmental, depending on how Docker initializes the containers. And I see it took, well, 38 seconds. It was a bit faster previously, but I think we did better; we ran a few more pods here. And now, just to drive it home, we can look in Prometheus to see what we did. We see that initially we ran it here and we had about 260 messages processed per second by the lambda, and over here we reached a peak of 478. For some reason it dipped down a bit, I'm not sure why, but like I said, it's running locally, so it's a bit environmental.

So yeah, that kind of shows you what I wanted to demo. If I go back to my slides, I just want to say, in summary: Knative supports push-based autoscaling, but pull-based autoscaling requires bespoke components, and Benthos is one way of doing this. It provides all this functionality for free, it's configurable, it lets us use a lot of components, and we can write a bunch of Benthos DSL to transform our messages. And it's a Kubernetes-native application: it supports all the probes we need, all the metrics, all the things that make it an ideal choice for this kind of workflow. And with that, I want to say thank you, and I will take any questions you might have.

Audience: Thank you for the great talk. Quick question: concurrency or request-based autoscaling, what's your take on this? Can you hear me?

Mihai: Maybe speak a bit louder.

Audience: Okay, the question is on concurrency versus request-based autoscaling. I saw you using
request-based autoscaling. First question: why did you pick that one? And second, which one would you recommend for users?

Mihai: Yeah, it kind of depends. So the question is why I used RPS instead of anything else, and this depends on how you run the demo. I ran it here locally, and I kind of tried to fiddle with the parameters to make it show something useful, but I think in production things will be very different. So it really depends on your workload, and you'd want a more realistic test case to figure out the best way to configure Knative to autoscale the app. But I don't have a realistic setup, so I can't really play with it to give you informed advice. Hope that helps. Any other questions? Okay, well, thank you. Hi, I'm Muru, remotely. Yeah, thank you, thank you everyone, thank you.