Good afternoon, Prometheus Day. My name's Alex. I'm here to talk to you about OpenFaaS, specifically auto-scaling and how we're using it, along with Prometheus, to horizontally scale out functions for open source users and for customers alike. But first I'd like to know something from you: who writes code as mainly what you do? Put your hand up. Maybe 40% of the room. And who runs other people's code? Maybe 70%? Yeah, OK. That's good to know.

So we're going to start with a quick primer on what OpenFaaS is, because sometimes people may not realize the use cases for functions are pretty similar to microservices, especially when you combine them with Kubernetes. We'll look at the original auto-scaling. And I know Julius Volz couldn't be here in person, but he helped me a lot back in 2016 to sort of think and reason about this, and about Prometheus itself. And then we'll look at the new scaling that came in in 2022, earlier this year, why we needed to think about it again and overhaul it, and what had changed. You're also going to get, if we've got time, a little bit of something that you can take away, for all of those that raised their hands. You may be doing this already, but if you're not, recording rules, filtering queries, and rewriting labels are really powerful features of Prometheus and are probably going to be very useful for you at some point.

So serverless, what is the basic idea? Well, the way I like to think of this, and you may have seen this graph before, is as some sort of progression or evolution. And sometimes people get confused, because they think, well, microservices are usually better than monoliths, so everybody should just use microservices, and if you're not, that's bad. Then we have serverless, and if you're not using serverless, well, that's bad too. Really, the way I look at it is you use what is right for your team and your technology. And actually, functions work really well alongside microservices and monoliths. Traditionally, they'll take a request, process some data, and give you a response, just like a PromQL query takes in some text and gives you a graph or some statistics. You could even have a function that ran PromQL, as an example.

But really what we're trying to do is take ourselves from having to put a server in a room, connect the cables, really having to babysit it and monitor it and buy in the network stack, to where we're having to write a Dockerfile, write an HPA rule for our microservices, and then deal with all that duplication, to functions, where we can automate a lot of this and maybe even infer automatically what would be a good configuration. That's important for 100% of the people in the room, because developers have a bit less work to do, and operations as well.

So OpenFaaS started in 2016. I was checking out Lambda. I really liked the idea and the technology, this idea that you could write some code and just have it completely managed. I was using Alexa. But then I wanted to run it with Docker, because we weren't really into Kubernetes in the same way back then; it wasn't as popular as it is now. And there was just no way of doing it. So I did a bit of experimenting on my own, in my bedroom, after work, around a full-time job. And it turned out that it was very popular as an idea. There are now over 340 people who have contributed code to the project, and we've got full-time people working on it every day, improving it for open source users and customers.
So I'm really here almost as a developer, but also as an end user, as I'm kind of presenting what our end users have said to us. Now, what does this look like? When you look at that code, does it look familiar? Can someone shout out? It looks like middleware in Go. We've got a request and a response, we've got an HTTP status code, we've got a body. It's very familiar to us. And this is all a function is: it's an HTTP handler. But what you don't see here is the Dockerfile that your SRE team is maintaining. You don't see the entry point for the HTTP server, or the Prometheus metrics, because that's all done centrally. So you just fill in the blank, and you have a production-ready endpoint. That's the idea, at least. And you can work with any language that you like, as long as it can be packaged in Docker.

So when we think about this, what is a function in OpenFaaS? Well, it may be different in Lambda, but certainly in OpenFaaS, from the bottom up, we have Kubernetes. So eventually there's a pod. The pod can be scaled, which is what we're doing, and we have more replicas of it. It's kept as an OCI image, in a registry. And then, as we start to go up the stack, is where we give you a bit more value over doing it all yourself.

OK. So the OpenFaaS gateway is a RESTful API, which allows you to deploy your code programmatically. I quite often get asked: "So Alex, you've written a book about Go. I've got my microservice, but how do I deploy it remotely to this machine?" Even if they're not running Kubernetes, it's quite hard. Well, OpenFaaS is an API: we send a JSON request, and you've now deployed a new version of your function. Prometheus gives you centralized metrics, so we don't have this issue that Bartek was talking about, where our pod has come up and then we scale down, it's gone, and we've lost the metrics, because they're aggregated in the gateway. That also means we can scale with it. And that allows us to do asynchronous messaging. Who's ever done an integration with a queue or an event source, Kafka, RabbitMQ? We all have. And at every company you go to, if they don't have it yet, you've got to write it all. Well, we bundle that for you. You just send an HTTP request; it gets enqueued, dequeued, and scales as well. And then all of the normal things that you'd expect from Kubernetes will work, so Argo CD and GitHub Actions. So that's OpenFaaS in a nutshell. You can find out a lot more with the link I posted into the Slack channel.

So Patrick Chanezon was the chief storyteller at Docker. He'd seen OpenFaaS, and he said, wouldn't it be cool if we could use Alertmanager not to say whether the function was down, but to actually scale it horizontally? And it turned out that it was actually really easy to do. And so at DockerCon, I had my first test of this. 5,000 people, very kindly, got their phones out and starred the GitHub repository, which generated a webhook that downloaded their avatar in a function and uploaded it to S3. And so they got to be on the big screen. As an SRE or a DevOps engineer, it's probably quite hard to get production-ready traffic, and here we were able to get it quite quickly.

This is how it works, at least version one. The gateway, remember, is centralized, so all of our invocations are going through it, and it can scale out. That allows us to collect the data and aggregate it in Prometheus, with our function right on the other side. It's just a normal pod; it works with all your normal tools.
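To make the "fill in the blank" idea concrete, here is roughly what a function definition looks like on the developer's side. This is only a sketch of a faas-cli stack.yml; the function name, template, and image below are made up for illustration:

```yaml
# stack.yml - a minimal, hypothetical function definition
provider:
  name: openfaas
  gateway: http://127.0.0.1:8080   # the OpenFaaS gateway's REST API

functions:
  star-avatar:                     # illustrative function name
    lang: golang-middleware        # template supplying the HTTP server and metrics
    handler: ./star-avatar         # your handler code - the "blank" you fill in
    image: ghcr.io/example/star-avatar:0.1.0
```

Running `faas-cli up` against a file like this builds the image and then calls the gateway's REST API to deploy it, which is the same JSON request you could send yourself.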
And sitting down here is Alertmanager, with a nice little alert. And as it's firing, the gateway is able to handle the webhook and then call the Kubernetes API and say, I want a few more pods. Does that make sense? Yeah, I'm seeing nodding. And so this is a very basic query, and I myself couldn't really believe that this was useful in production. But it turned out that it was, for one reason or another, up until about 2021. And that's where we started to see a few more edge cases. So after the webhook had fired, we'd go proportionally up in our replicas. If the alert was still firing, we'd go proportionally up again and again until we hit the ceiling, and then we'd sit at maximum.

So this is what it looked like at DockerCon back then. We had a big spike in the RED metrics, so rate, errors, duration. And then, look at the right-hand side of the screen. Do you see what happened? We went immediately back to minimum replicas as soon as the traffic had abated. And that might not be what you want in production if your traffic is more sporadic. So bear that in mind.

The other thing that we had was scale to zero. A customer of OpenFaaS at the time had a GCP account; they had three of them, and they were using OpenFaaS in their product. And they just said, look, we're spending so much money with all of these nodes up. Could we just scale these functions back because we're not using them, and get the money back? Scale to zero had never really been that interesting for OpenFaaS; even today, the default is to have one replica all the time so there's no cold start. But we built it. And because Prometheus had been so useful, we just put a daemon in there. It ran PromQL periodically for the functions, and if there were no metrics, or the rate was at zero, we were pretty sure we could scale the function down. Except, and perhaps you are already thinking this, there is a bit of a lag when you scrape, isn't there? So what might happen is: we query the data and it says zero, then the function is invoked, then it gets scraped. If we make the scaling decision in that gap between the invocation and the scrape, we can get it wrong. So we've done quite a lot of work on graceful draining of requests. Now, anybody here done graceful draining? It's a pain in the neck, and Kubernetes doesn't make it any easier. So we automate a lot of that for you as well, so you're not going to suddenly drop production data.

Okay. So one of the things about OpenFaaS is that we're really trying to make the developer experience simple. Compare this to your average pod spec: two fields, the function name and the image, and that's pretty much all you need. And then we've got labels to help you tune the auto-scaling. This is the old way. But we had some issues, right? We only had that one generic rule. So I could say I want 10 requests per second, or 100, but I couldn't have both for two different functions. As I say, I was sort of surprised that this was useful in production, but it was. Long-running functions would never trigger the alert, because if the rule wants 10 requests per second and we're taking 20 seconds to do one, we've not really got much data there. CPU-bound functions like ML models just didn't really work with this either. And then HPA v2: you're probably thinking, well, why didn't you just use this, Alex? Someone said that in the break. We did have some customers using it, but they found it incredibly hard to integrate because they had to manage the objects themselves.
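Going back to that one generic rule: it was an Alertmanager alert along these lines. This is a sketch from memory of the rule bundled with the community edition, so the exact threshold, window, and label names may differ:

```yaml
groups:
  - name: openfaas
    rules:
      - alert: APIHighInvocationRate
        # One threshold for every function: fire when successful invocations
        # exceed ~5 requests per second; the gateway then receives the webhook
        # and bumps the replica count proportionally.
        expr: sum by (function_name) (rate(gateway_function_invocation_total{code="200"}[10s])) > 5
        for: 5s
        labels:
          service: gateway
          severity: major
        annotations:
          description: "High invocation total on {{ $labels.function_name }}"
```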
And then the last issue, as I mentioned: there are some edge cases on scale to zero that you have to be aware of. So we had another go at this. And this time, unlike in 2016 when I was in my bedroom having a cool idea, we actually had customers paying us who could tell us how they were using it.

So Kevin from Surge imports mortgage data, and he does it every day. They have a huge amount of mortgage data, and then they have a Salesforce integration, and they can give a credit rating for the customers of their customers. Makes sense? Sounds quite useful, doesn't it? And what he needed was to scale up as quickly as possible, but be limited so that a pod wouldn't do more than a couple of requests at a time; it was quite intensive. So we built an in-flight requests mode, which counts started but not finished requests, almost like a queue depth. And that serves his use case really well. The other one was requests per second, and we made that more tunable, because clearly one value for the entire system isn't going to be that useful. So William is importing security logs, DNS logs, and potentially any other alerts coming from all kinds of legacy systems, running them through different functions, and the requests-per-second scaling is really effective for them at Check Point Software. And then finally, CPU is one that I thought we just wouldn't need, because RED metrics are supposed to be the trendy new way of doing everything, and USE metrics are like the old Google way of doing it. But it turned out that they wanted CPU; we looked at the data, found out we could get it and do it all in one way, and they use that for their ML models.

So how does the new version work? Well, Alertmanager went away, and we have an autoscaler daemon. Its job is to periodically query Prometheus with PromQL. We also get metrics from the kubelet, which eventually gets them from cAdvisor, and that gives us usage and saturation; sometime down the line we might get errors as well, but we don't get resource errors right now. The labels change slightly as well. We have three modes: CPU, which you can probably understand quite well; capacity, which is started but not finished requests; and the last one is the RPS from before. And a bit like HPA, to provide that familiarity, we'll say the target is that a pod can only do 500 milli-CPU, like half a core. If it's doing any more, we might need two pods. And so that gives us a level of compatibility, and people can just adopt this without installing KEDA or HPA or a whole new load of software. And then scale to zero now has a specific time per function, because we had some functions that should scale down after an hour and some almost immediately.

Okay, so if you can't see this at the back, what we have is a single metric which comes from a recording rule. And we're able to say, for a function name and a scaling type, what is the load right now? The load is 1,800 milli-CPU. We divide that by the target of the function, which is 500 milli-CPU, and it gives us 3.6. On the other side, we know we have three pods at 500 each. So we're clearly slightly under-provisioned, and if we ceil that number, we get four. Therefore, we just schedule four pods. And that same algorithm and Prometheus metric work for all of the strategies we've built, and any we build in the future would work in the same way.
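As a sketch of how that looks to a developer, the new behaviour is driven by labels on the function. The label keys below follow the OpenFaaS Pro documentation as best I recall them, and the function name, image, and values are purely illustrative:

```yaml
functions:
  rank-model:                                  # hypothetical CPU-bound function
    image: ghcr.io/example/rank-model:0.2.0
    labels:
      com.openfaas.scale.type: cpu             # or "rps" / "capacity"
      com.openfaas.scale.target: "500"         # 500 milli-CPU (half a core) per replica
      com.openfaas.scale.min: "1"
      com.openfaas.scale.max: "10"
      com.openfaas.scale.zero: "true"          # allow scale to zero...
      com.openfaas.scale.zero-duration: "30m"  # ...after 30 minutes of no traffic

# Desired replicas = ceil(current load / target),
# e.g. ceil(1800m / 500m) = ceil(3.6) = 4
```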
And now our dashboard is so much more useful than the one I showed you from DockerCon, because we can correlate the CPU usage just for the functions, with no noise from the rest of the system, against the duration, any retries, the delays, and the scaling. And here, what I did is invoke a function that was concurrency-limited. It had to retry a whole bunch of times, which is this peak, and eventually it completed all the work and scaled back again. And so we're giving this dashboard to customers now, because otherwise they just have no clue how to run the system; it's very hard to run anything distributed. We give them this, and when there's a problem they can go to it, and we can ask some questions as well. And maybe you have similar things for your internal customers.

Okay, so pod_cpu_usage_seconds_total. If you recognize this, it comes eventually from cAdvisor, and what we'll find is that it's got stuff in it like Alertmanager and even OpenFaaS itself. We don't care about that; the customers only care about functions. But there's nothing in here, no label, that says it's a function. So how do you fix it? Perhaps you can filter it out. And if you've got at least one label in there that you can match with a data set that lists all your functions, you can then pare that down, and you won't have that kind of data spilling out on your dashboards. So, anybody here think that might be useful to them? Filtering, joining data, this is what you need. So we have our query, keyed by that label. Then we can add a vector match with on(...), and this is the important thing: we need to have something common in both sets, and then we can filter out all of the stuff that is not in gateway_service_count, which is our replicas per function.

And all that means is that we then need to think about this data we've got from cAdvisor and from OpenFaaS's gateway: how do we look at it in one view? Well, recording rules were really helpful here, because now I've just got one thing to look at. You won't be able to read this from the back, but that's quite a lot of PromQL; that's the PromQL for each of the three strategies. If I wanted to just run one query, I'd have to find some way of matching all of those tables together, all those views, and having one piece of PromQL. Now, that is just so difficult. And a friend of mine from the project said, why don't you use recording rules? Anybody using a recording rule? So, okay, I'm preaching to the choir. I only found out about these this year. And so now I emit a label for each of them, and look how simple that becomes. That gives us the requests-per-second metric that we needed. So: record, expr, and labels. Definitely have a look if you've not. And now we've got two, so we've got the RPS in there, and we have the other strategy.

Now, one thing I didn't expect is that by putting this data into the one chart, it would help customers immediately get value. Because we all know about kube-state-metrics, we've read all the blog posts, we've been on Stack Overflow, but Billy, who I mentioned earlier, forgot to call Body.Close(). Who else has done that? I've done it, yeah. It's an honest mistake to make. And so he detected a memory leak, not a big one, maybe five megabytes, but every megabyte counts. And you can see that when he redeployed it, it stayed very stable and dropped back off again. So thanks to Prometheus providing the new data, we were able to actually see things that we weren't seeing before.
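Pulling together the filtering join and the recording rules from a moment ago, a rough sketch of what such rules might look like is below. The record name, the metric names, and the windows are illustrative rather than the project's real rules file, and the function_name label on the CPU series is assumed to come from the relabelling described in the next part:

```yaml
groups:
  - name: openfaas-autoscaling
    interval: 10s
    rules:
      # CPU load per function, in milli-CPU. The "and on(function_name)" match
      # keeps only series that also exist in gateway_service_count, so
      # Alertmanager, the gateway and anything else that isn't a function drops out.
      - record: job:function_current_load:sum
        labels:
          scaling_type: cpu
        expr: |
          sum by (function_name) (rate(pod_cpu_usage_seconds_total[1m]) * 1000)
          and on (function_name)
          gateway_service_count

      # Requests-per-second load per function, from the gateway's own metrics.
      - record: job:function_current_load:sum
        labels:
          scaling_type: rps
        expr: |
          sum by (function_name) (rate(gateway_function_invocation_total[30s]))
```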
Not only that, but because we've got the data, we can now surface it in the OpenFaaS REST API. It's the same one you deploy functions with. You can list functions, and we can put it in our UI. I was a bit concerned about this before, because I thought, if we're going to run a PromQL query every time we show this data, could it be slow? Could it add overhead? Linkerd does it in their product, and other products do the same thing, so I thought we'd give it a go, and it seems okay. If it becomes an issue, maybe we'll cache the data. But what this now gives us is a live view of the aggregate amount of CPU and RAM for any of the functions in the cluster. Click on them, and immediately we know. We didn't have this data before.

Now on to a few of the tips, as we've got about 10 minutes left. Who's seen something like this before? It's node service discovery. Obviously pod service discovery is very common. Back before we were using Kubernetes, we were on Docker Swarm; there was one for that, and there was DNS discovery. This is really clever, and I think it came from kube-state-metrics, so thank you. What it allows us to do is get all the node names and then put them into this URL, /proxy/metrics/resource, and that goes straight to cAdvisor. Now, that means OpenFaaS can remain very lightweight. We just have our own copy of Prometheus; it only collects our metrics, it scrapes very frequently, and it throws data away very frequently as well. And so we sort of black-box that, or package it all in together in one stack, so there's just one Helm chart to install. So we added this, and that gives us that extra data.

But, as I said before, filtering can sometimes help you, but what if the data's in the wrong format? Now, a regex is very easy to write if you want to match a phone number, or a to z, zero to nine, but this was a bit harder. And what we have here is that I wanted sleep.openfaas-fn, that is, function.namespace, because I have to join a lot of data together. So somehow we had to throw away all of those characters at the end of the pod name and then combine what's left with the namespace as a new label. Originally I was going to write a daemon that would read the data, emit it again, be scraped, and we'd use that. But this was just so much easier to do. And so this is what it ends up looking like. If you think that would be useful, you can get the slides later, or come and ask me for it. We just match the first part and throw away the rest, and then, and this is quite cool, we can emit a new label by joining two together. So first of all we get roughly the deployment name, and then we emit a new label which is called function_name, and that matches everything that we've already got.

I want to ask you, if you've ever run into this problem, what do you think could have gone wrong? We've got a gauge, and the gauge is the replica count. We've got one replica of our gateway that emits this; then we scale the gateway to three. What do you think might go wrong in the data? Let's say we've got 20 functions, and for one function we've got 20 replicas. We then scale that data source to three, Prometheus is scraping it three times, and we're summing the value. Well, I made this mistake, and then I saw a customer make the same mistake when I was looking at their dashboard. Basically, we get the value three times, and sum is the wrong thing to use when you're going highly available, because each gauge is already an exact picture of how many replicas we've got; we end up counting it three times. We need average instead.
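Going back to the node service discovery and the relabelling just described, here is a sketch of what that scrape configuration could look like. The proxy-through-the-API-server pattern is a standard one; the job name and especially the pod-name regex are assumptions, since the regex depends on how your Deployment-generated pod names are structured:

```yaml
scrape_configs:
  - job_name: kubelet-resource-metrics        # illustrative name
    scheme: https
    kubernetes_sd_configs:
      - role: node                            # one target per node
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      # Scrape each node via the API server proxy, hitting the kubelet's
      # /metrics/resource endpoint.
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics/resource
    metric_relabel_configs:
      # Strip the two generated suffixes from the pod name ("sleep-7d4b9-abcde"
      # becomes "sleep"), then join it with the namespace to emit a
      # function_name label like "sleep.openfaas-fn".
      - source_labels: [pod, namespace]
        separator: ";"
        regex: "(.+)-[a-z0-9]+-[a-z0-9]+;(.+)"
        target_label: function_name
        replacement: "$1.$2"
```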
So watch out for that sum-versus-average trap. Especially if you're testing on your laptop, you're often not going to have an HA configuration; then you get to production, and this is the kind of thing you might see. But you still do need sum: for the function invocations and the RED metrics, we do need to sum them up across the gateways.

Okay, so to sum up. We now have request-based scaling on a per-function basis, where before it was just one rule with Alertmanager. Alertmanager turned out to be a little bit too hard to use to get the behaviour we wanted; we wanted something more like HPA. Capacity-based scaling is really useful for slow-running jobs; perhaps you're scraping a web page or running a machine learning model, and you really can't just give it 10 requests at once, because you'd run out of memory and you don't want that to happen. The CPU scaling is there; I don't know if it's going to be useful. I've got a feeling that capacity is almost a proxy for CPU, but I need to validate that. Now we've got quite a versatile system, so we can tune this as well. The recording rules, which it sounds like a lot of you are using, are incredibly useful for simplifying these metrics. Really, we should do more to tell people about them as a community; I certainly have found them very helpful. And then filtering: I had a lot of problems with this, and I think quite often people want something that they can relate to, like SQL, where you can inner-join two tables on common keys. Again, we could perhaps do a little bit more to explain this technique and help people get the best out of their data. Consider whether you need average or sum, because it's quite tempting to use sum: you test it locally, it works, you ship it to a customer, and perhaps you needed average in some circumstances. And then do factor in the scrape interval, because your function, or whatever data you're relying on to drive a control system, could be stale. There are things that you may need to do around that, such as what we're doing, which is a graceful shutdown: a big dance of taking the endpoint out of the connection pool, waiting a certain amount of time, everything you have to do. So bear that in mind.

Now, in terms of future work, Kevin, who I mentioned before, has now asked, could we scale based upon the queue depth in NATS? There isn't really such a thing as queue depth in NATS, or even in the new NATS JetStream. It's very difficult to do, but I've got some ideas. I've been having a look at it, and that might then give us a fourth strategy, and then we'd be able to emit the metrics and use the same algorithm without doing too much more work.

So that is OpenFaaS. You can find out more at openfaas.com. If you'd like to email me about anything that I've said, or any ideas you have for your system, it's alex@openfaas.com; I'd be more than happy to speak to any of you. All right, thank you. So we have seven minutes for questions. Also, Ian, if you want, you can already get that up in five or so. Questions?

Is there any way a function can populate a custom metric that will show up in Prometheus? So, this isn't something that's come up, although Bartek, when he joined the community call, asked the same question: how are you scraping the metrics? At the moment, the RED metrics have been sufficient, but I could potentially see that being an ask. What we're trying to do is keep OpenFaaS as simple as possible, and not add features that we just think might be useful, but really drive it on customer interest.
Now, as soon as we do get that ask, it's a very good question: how do you do it? And I think the talk we just heard before has all the answers to that, or at least the best answers we've got. More questions?

One of the questions that Bartek had, no, I think it was Damian, is: why not use HPA? And HPA is an obvious choice when you're doing this. So I didn't cover this in the talk, but we wanted to keep the installation requirements as simple as possible. We were already using Prometheus and had a pretty stable system that we're happy with. This just meant that we were able to keep the stack very lightweight and scale on exactly what we wanted, without prometheus-adapter, kube-state-metrics, and goodness knows what else.

Well, the last thing to say then: if you want to get a bit deeper with Go, perhaps you're thinking about writing an operator or a Kubernetes controller, or integrating with Prometheus, I have an ebook available. You can find it, just Google for "Everyday Go". It might be helpful. All right, thank you. Okay, thanks everybody.