Good afternoon, everyone. My name is Park, and I'm here to talk to you about building maintainable and observable applications on serverless architecture. To recap what I'm going to cover: an overview of serverless computing, what we did with serverless in terms of monitoring, the implementation of the feature that we built, how to optimize performance in serverless environments, which is slightly different from containers or traditional server architecture, and how to monitor the applications running on serverless architecture and make them observable.

A little bit about me. I'm a software engineer at SignalFx. I've been with the company for a couple of years. I've worked on the application side of the product as well as on serverless monitoring, and I've also done some feature development using serverless myself and monitored it as well.

Getting to know a little bit about you: show of hands, who of you has serverless running in production today? Cool. And who is planning to potentially use serverless architecture for production in the next three to six months? Nice, nice. Quite a few of you. All right, cool.

So what is serverless? Why do we use it, and why is it becoming popular? The reason it's picking up steam is that it requires minimal operational effort. Some of you may have been in the pancake talk this morning, where we talked about the split between development and operations. Serverless lets developers focus on development and minimizes operations. You don't have to care as much about where your code runs, or how to provision and configure all those servers or containers. You just build your function and you run it.

These are some of the serverless providers that are out there. They mostly have the same characteristics, but their pricing models differ slightly. All of them charge by execution time, but some also charge by allocated memory, some by consumed memory, and some by CPU consumed or allocated.

So why am I here? What did we do with serverless? At SignalFx, we use AWS a lot, and we decided to try out Lambda. Many aspects of serverless computing are the same across all the cloud providers; there are some gotchas here and there, but most of this is transferable. Like I said, we implemented a feature using Lambda, and it handles public traffic. It's minimal operational work, just development, and it's working great for the mostly long-tail traffic coming in that needs minimal care. But what we found is that our customers and some of our engineers reported slowness in request response time, so we decided to take a look and figure out what was going on.

This is an example of one of our charts, showing the durations of the functions, in milliseconds. As you can see, there's a nice graph over a period of time, with some spikes in the values, but nothing looks out of the ordinary. So what's going on there? To explain a bit further, we need to take a look at Monitoring 101, or Observability 101.
That is, if you have multiple instances of a function running at the same time, they're all reporting essentially the same data, because there are multiple serverless instances of the same function running at once, unlike servers or hosts, where you have a host dimension you can break things down by. If all of them send data to the service at exactly the same time, then in this case one instance might send the number 10, be it 10 seconds or 10 milliseconds, another sends 20, and the third sends 30. To visualize them or calculate analytics on them, the default rollup would average those to 20, which generally makes sense: if you have multiple values landing on the same data point, you roll them up into one, and it becomes 20.

The same thing happens horizontally, over time. If you zoom very far out and look at a whole day, each rendered data point might become a five-minute interval. If you had 10, 20, and 30 within the same small window, a matter of seconds, they get rolled up into an average of 20 in the same way.

Now, what this means for observability in our case is that the data points will be rolled up, and the rollup takes many forms. You can sum, basically adding all the data together, which is the usual case for counters and the like; you can average, which is usually the default and usually makes sense; or you can take the min, the max, or a count of what happened.

Now, let's take a look at the data again. Oops, PowerPoint just died on me, sorry about that. So, rollups. A rollup is the operation of combining multiple data points into one data point. It means many different things depending on the operation applied, and you have the flexibility of picking the one that fits your application.

So here's what we did. Take the average I showed before on each function duration data point, and change it to max. What happens is that at each point in time, you can now see that the maximum time spent in each of the functions actually spiked quite a lot, I would say. So we wanted to figure out what was going on there.
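To make rollups concrete, here's a toy Python sketch of the 10/20/30 example above. It's purely illustrative, not any monitoring vendor's actual API; it just shows how the choice of rollup changes the single value you see on a chart.

```python
# Three instances of the same function report into the same rollup window.
values = [10, 20, 30]

# The rollup decides which single number your chart shows for that window.
rollups = {
    "average": sum(values) / len(values),  # 20 -- the usual default
    "sum":     sum(values),                # 60 -- typical for counters
    "min":     min(values),                # 10
    "max":     max(values),                # 30 -- what exposed our spikes
    "count":   len(values),                # 3  -- how many datapoints arrived
}

for name, value in rollups.items():
    print(f"{name:>7}: {value}")
```

With average, the one slow instance disappears into the other two; with max, it's the only thing you see, which is exactly why switching the duration chart to max made the spikes show up.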
So serverless, as you may already know, and as I chatted with a couple of folks downstairs yesterday, doesn't mean that there's no server. Serverless just means you're no longer managing your own server; you're using someone else's. Behind the cloud, there are servers and containers provisioned for you, and not only that, they're configured and have their own lifecycle. This is important, and I'll say a little more in a moment about why. Once you have all of that in combination, you get your nice execution of your Lambda function.

So what's the difference? Why is serverless different from traditional container or server architecture? Execution time varies, because of the startup time and differences in the code's initialization. And it's stateless, which means there's no longer a way to maintain state, counters, or anything else inside the code. In a traditional container, at server startup you can initialize a lot of things, and then when a request comes in, you just handle that small request. Now there's no state, so you can't maintain those things inside the code itself. And one of the most important things I'm going to talk about here is cold starts, and why they matter for serverless performance.

So what does stateless mean? No state. Traditionally, if you start a server, it launches once, and you can initialize your code any way you want, for example your crypto package, your XML parser, those kinds of things. It has multiple threads, and as time goes on it handles incoming requests, serving them quite happily. In serverless, that's not what happens. In serverless, for every execution and incoming request, a new serverless instance may be spun up: it gets initialized, it handles the request, and that's it. When the next request comes in, the next one spins up, handles the request, and that's it. That is basically what happens in the serverless world.

So now, what are cold starts, and why do they happen? Cold starts describe what happens when a serverless instance is being started. It starts with the container that will run your serverless function being provisioned with your configuration, for example CPU and memory, in the region, location, and availability zone that you want. Then your container's network and security configurations are applied: if you put it inside a VPC, if you apply IAM roles, if you have permissioning, all of that is applied on the fly. Then your code is uploaded to the container: if your code runs on Python, Node, or Java, the code plus all of its dependencies is uploaded into the container, or mounted into it. Then the actual runtime launch happens: the Node, Java, Python, or Golang application starts up. Then comes the actual running of your function, the last part, where you handle the incoming request you want to process.

Now, cloud providers do optimize this, so you won't hit a cold start every time. If a function instance has just finished serving the previous request when the next request comes in, that request may be handled by the instance that's already running and execute right away. None of the cold-start steps are needed, because the live instance of the function is being reused.

Now, what does this mean? There's a concept called the law of leaky abstractions, which states that all non-trivial abstractions are, to some degree, leaky. In this case, serverless is meant to abstract away the infrastructure underneath so you don't have to care about it or operate it, but that infrastructure still impacts the performance and overall behavior of your function.

So what makes a cold start so cold? I mentioned a few of the pieces. Network configuration, like I mentioned, does get applied to your container when it's initialized, and the same goes for security configuration. The runtime matters a lot, and it varies from runtime to runtime: Python is very fast from what we measured, Node is okay, and Java is really slow, because starting a Java application is slow in general. So take your pick. These are the things you can potentially optimize for. Container provisioning and the like has to happen regardless, so there's nothing you can avoid there.
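To show what that lifecycle looks like from inside your code, here's a minimal Python sketch. This is a generic pattern, not our wrapper or any provider's API: module scope runs once per cold start, while the handler runs once per request, so anything expensive you hoist to module scope is only paid for when a new container spins up.

```python
import time

# Module scope runs once per cold start; warm invocations on the same
# container reuse everything initialized here.
_init_started = time.time()
_expensive_setup = {str(i): i for i in range(100_000)}  # stand-in for parsers, SDK clients, etc.
_init_seconds = time.time() - _init_started

_is_cold = True  # flips to False after the first invocation in this container


def handler(event, context):
    global _is_cold
    cold, _is_cold = _is_cold, False  # True only on the container's first request
    # Handle the request using the objects initialized above.
    return {"cold_start": cold, "init_seconds": _init_seconds}
```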
Now, let's talk about monitoring. How do you monitor these things? You can use the cloud provider's monitoring, like CloudWatch, and providers like GCP and Azure have their own monitoring systems too. That gives you the basic information about what's going on in your serverless environment and its executions. But if you want to instrument something different, be it custom metrics or what's going on inside an execution, it gets a bit more difficult. The cloud provider's data can be somewhat lacking and lagging, and it's actually hard to build charts and dashboards or do complex analytics on the built-in cloud provider monitoring.

So what do we do? Let's talk a little about how else you can monitor your serverless environment. In traditional old-school monitoring, your Lambda function logs everything out: you can print as much as you want of whatever you want to instrument, and it takes a little while for the logs to be piped through from your function. Then you put them on a stream; in AWS you can use Kinesis Streams or Kinesis Firehose, each with its own pros and cons. But in this case you have to manage that stream and make sure it stays performant; you have to scale with it, and to handle the scale you also have to monitor the stream. So that's the next thing you have to operate and maintain. Then you send it to your favorite log provider or aggregator, or you can use another Lambda function to aggregate and process your logs. A funny story here is that the Lambda function itself then produces logs, which you then process again, and you end up in an infinite loop, paying your cloud provider an exorbitant amount of money. Then, finally, you visualize the data you processed.

What we did with our Lambda function instead is wrap it in our own wrapper. It's basically another layer of the runtime execution that runs before and after your Lambda handler executes, so it can monitor the startup time, the execution time, cold starts, and all those kinds of things, and it sends the data directly to SignalFx, where we can visualize it. That happens in a matter of seconds. It's maintainable, because you don't have all the intermediate services in between that you'd otherwise also have to run and monitor; it's low maintenance, because you don't need to do any of that; and it's very low latency, because it pipes directly through to the monitoring service.

Now, what do we look for? What do you typically want to know about your Lambda functions or your serverless environment? Duration is very important: how long your serverless functions took to execute, with a breakdown per function. The invocation count, with a breakdown of errors versus total invocations. Very important, like I mentioned before, is the number of cold starts that happen. Then throttles: you can only run so many serverless functions at the same time over a short period for each cloud provider, so you might get throttled, depending on how high your limits are. And then errors: you want to know the error rate of your functions in real time. That's the most important thing.
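Going back to the wrapper for a second, here's a rough Python sketch of the idea: a decorator that captures a couple of those signals. To be clear, this is a simplified illustration, not the actual SignalFx wrapper, and send_datapoints() is a hypothetical stand-in for whatever client ships metrics to your monitoring backend.

```python
import time

_is_cold = True  # module scope survives warm invocations on the same container


def monitored(handler):
    """Wrap a Lambda handler to record duration, errors, and cold starts."""
    def wrapper(event, context):
        global _is_cold
        cold, _is_cold = _is_cold, False
        start = time.time()
        error = False
        try:
            return handler(event, context)
        except Exception:
            error = True
            raise
        finally:
            # Ship the datapoints straight to the monitoring service.
            send_datapoints({
                "function": context.function_name,
                "duration_ms": (time.time() - start) * 1000.0,
                "cold_start": cold,
                "error": error,
            })
    return wrapper


def send_datapoints(datapoint):
    # Hypothetical stand-in: replace with a real metrics client call.
    print(datapoint)


@monitored
def handler(event, context):
    return {"statusCode": 200}
```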
Now, how do you optimize the performance of your serverless functions? Reducing the impact of cold starts is one thing you can do. Like I mentioned before, for network configuration, security configuration, and memory configuration, try not to have things you don't need. Reduce the scope of the configurations: if your function doesn't need any private network configuration, don't give it one, because these things add to the complexity and the startup time of your function on every cold start. Likewise, remove unnecessary initialization from your code so it doesn't do things it doesn't need to.

The other thing we did, which a lot of people using serverless do, is have a warmer. A warmer is a periodic timer that keeps calling your Lambda function with something like a no-op flag, so your function gets bootstrapped and initialized without actually executing anything. When the time comes to do the real processing for customer traffic, the serverless function is already warm and live, and it can execute right away without going through all the provisioning and configuration. And the most important thing here is that the warmer can warm multiple functions, and multiple instances of each function, so you can also handle concurrent traffic.

This is an example of the warmer function in code; it's the Lambda code example on the slide. As you can see, we just loop through all the functions we want to warm, with the number of concurrent instances we want, and then we actually invoke each function with the parameters we want. The next example is the handler side of the function. In the Lambda function itself, using our wrapper, we check whether the event is a warmer event. If it is, we handle it as a warmer and don't execute anything real; if it's real traffic and not warming, we actually execute the function and move on.
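Since I can't paste the slide here, this is a minimal Python sketch of that shape. The function name, the "warmer" flag, and the event layout are all illustrative assumptions, and note that real warmer libraries do extra work (for example, having each warmed instance hold on briefly) to force genuinely concurrent instances, which this sketch glosses over.

```python
import json
import boto3

lambda_client = boto3.client("lambda")

# Which functions to warm, and how many concurrent instances of each
# (illustrative names, not our actual configuration).
TARGETS = {"my-public-function": 20}


def warmer_handler(event, context):
    """Runs on a schedule, e.g., a CloudWatch Events rule every 5 minutes."""
    for function_name, concurrency in TARGETS.items():
        for i in range(concurrency):
            lambda_client.invoke(
                FunctionName=function_name,
                InvocationType="Event",  # async fire-and-forget
                Payload=json.dumps({"warmer": True, "instance": i}),
            )


# And the handler side of the target function:
def handler(event, context):
    if event.get("warmer"):
        return {"warmed": True}  # no-op: bootstrapped, but does no real work
    # ... real request handling goes here ...
    return {"statusCode": 200}
```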
So with all that in place, what happened? We tested it with our experimental Lambda function, setting a warmer to run every five minutes, triggering 20 concurrent executions of the function, and this is what we got. To explain: the green line is the number of concurrent warmer requests we issue to the function, so every five minutes we're issuing 20 requests for concurrent executions of the Lambda function. The blue area below is the number of cold starts that happen every now and then. As you can see, every five minutes our 20 warmer requests resulted in 10 cold starts, which means that somehow 10 of our instances got destroyed and reclaimed every five minutes, or somewhere within that five-minute gap.

Now, with some public traffic going into the Lambda function, you can see the warmer still firing every five minutes with 20 concurrent executions, but the actual cold starts don't happen as frequently anymore. You can see the shape being different; it drops to eight. What that means is that some of the function instances were kept alive longer than they would have been without traffic. But you can still see spikes in the cold starts, which means some of the instances eventually got destroyed anyway.

The next example is a burst of traffic. This chart shows incoming requests to our Lambda function, with the number of requests on the y-axis: we have a burst of incoming traffic, and this is what the same chart for our Lambda function looks like. The warmer fires the same way, but you can see cold starts happening at the time of the burst. So no matter how warm you keep your Lambda function, if a burst happens, or the concurrency your function needs at a given moment is beyond the concurrency you've kept warm, cold starts are bound to happen. What happens afterwards, though, is that if the burst is sustained, no further cold starts happen for as long as the burst keeps going.

And here's normal traffic under ordinary circumstances. If you keep the Lambda function going with the warmer and everything is nice and cool, handling normal traffic, what we found is that the number of containers, or Lambda instances, that get reclaimed and have to restart is inconsistent. You can see some cold-start batches of 14, some periods where none happen at all, and some that happen less frequently. So in the end, there's no guaranteed behavior or visible algorithm for when instances get reclaimed and cold starts happen.

So in conclusion, what we found is that the containers that run your Lambda function get replaced over time. No matter how hard you try, they will be reclaimed and destroyed, so cold starts are bound to happen regardless. And most importantly, unused instances of a Lambda or serverless function will be destroyed, so if you don't keep them warm, they will for sure be reclaimed.

Now, to wrap up: why are we even talking about this? Because as I was composing these slides, it became apparent that serverless has a lot of problems. So why do we keep using Lambda? I have to keep reminding myself that, as a developer, it's easy to develop and maintain: you just write the code and there's a place to run it right away, with basically minimal operations, and you don't have to maintain it yourself. And the Lambda functions we've been using perform very well. We have a close-to-zero error rate, meaning errors that are not caused by our own code are close to zero; the bugs we produce ourselves, we have to fix. Minimal operations: you barely have to do anything. And, most importantly, you pay for what you use. You don't pay when no traffic is coming into your service; you only pay for the executions you need, plus the memory you allocated.

The thing you have to be careful about in the serverless world, like I mentioned before, is that it's stateless. You don't want to maintain any state in your function; in fact, you cannot maintain any state in your function. Instead, you'll want to use an external service that holds that information for your function, for example Redis or another in-memory cache, or a database outside the function that you can call out to.
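To make that concrete, here's a minimal sketch of pushing state out of the function, assuming a reachable Redis endpoint and the redis-py client packaged with your deployment; the environment variable names are made up for the example.

```python
import os

import redis  # assumes the redis-py package is bundled with the function

# Created once per cold start and reused on warm invocations, but the
# counter itself lives in Redis, so it survives instances being reclaimed.
_redis = redis.Redis(
    host=os.environ["REDIS_HOST"],  # hypothetical configuration
    port=int(os.environ.get("REDIS_PORT", "6379")),
)


def handler(event, context):
    # An in-process counter would reset on every cold start; INCR in Redis
    # is atomic and shared across every instance of the function.
    seen = _redis.incr("requests_seen")
    return {"requests_seen": seen}
```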
The other important thing we talked about is cold starts. Cold starts are bound to happen; you just need to manage them to get the optimal performance out of your functions. And at the end, the paper in the first link describes in detail the serverless performance of each cloud provider, how instances get reclaimed, and how tenants are shared across VMs and containers, if you want to write down the URL there. And then the last link is our serverless wrapper, with some of the examples I showed here. So that is it. Thank you very much. I will upload the slides afterwards as well. So thank you.