Hi, everyone. Today we are going to give a talk on fast execution for function compositions in serverless computing. It will be presented by myself, Ruichuan Chen, and my colleague, Istemi Ekin Akkus, from Nokia Bell Labs. I hope you will all enjoy our talk.

Cloud computing has been a huge success. In the past few years, there has been a trend in cloud computing to let application developers focus only on the business logic of their applications and to abstract away the actual infrastructure that runs them. This trend is so-called serverless computing. As the name indicates, serverless computing refers to the concept of building and running applications that do not require server management. It describes a very fine-grained development and deployment model, where applications are composed of a number of functions, and each function can be individually executed, scaled, and billed in response to the incoming traffic. All major cloud providers, including Amazon, Microsoft, Google, and IBM, have embraced the serverless computing concept. So far, the current serverless practices can deal with highly parallelizable tasks and can handle sporadic, unpredictable load in a stateless manner. Companies in different industries are migrating their applications onto serverless platforms, for use cases such as data analytics, video encoding and decoding, event processing, and so on. This certainly covers a lot of real-world use cases. However, the current serverless practices still can't live up to their full potential. For example, they can't support latency-sensitive applications, and they can't support applications that require fast state access. Those missing capabilities are critical to many modern applications: many applications at the network edge require short latency, and, for example, video processing applications require state management and fast state access. The current serverless platforms can't efficiently support those applications. In this talk, we will show you how we can extend the applicability of serverless computing to support these types of applications.

Here is the outline of this talk. First, we will give an overview of the serverless computing paradigm and its wide adoption. While it is quite promising, we will show you what the problems are with the current serverless practices regarding function invocation delays and application state management. Afterwards, we will dive a little deeper into the existing serverless platforms and then present our approaches to address their problems. Towards the end, we will give two demos to showcase the benefits of our approaches.

Let's first take a look at serverless computing in general and its adoption. Here is how serverless computing works at a very high level. With serverless computing, a developer doesn't upload the entire application onto the serverless platform. Instead, the developer uploads the code of each individual function of the application. Then the developer defines the events that can trigger the execution of the functions. The events can be anything: database triggers, timer events, detected network anomalies, sensor readings from IoT devices, and so on. Basically, developers can define any type of event. Whenever such an event comes in, the serverless platform creates an instance of the associated function to process the event and generate a result. Of course, if there are multiple concurrent events coming in, the serverless platform simply creates multiple instances of the function to process them in parallel. The results can be returned to the users or can serve as events to trigger other functions in the workflow or the application. This is how serverless computing works in general: the functions of an application can be executed, scaled, and billed individually.
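To make this model concrete, here is a minimal sketch of what such a function might look like, assuming the common `handler(event, context)` convention popularized by AWS Lambda; the exact signature and event format vary by platform:

```python
# A minimal, platform-agnostic sketch of a serverless function.
# The handler(event, context) signature follows the common convention
# popularized by AWS Lambda; the exact interface varies by provider.
import json

def handler(event, context):
    # 'event' carries the trigger payload: a database update,
    # a timer tick, a sensor reading from an IoT device, etc.
    payload = event.get("body", event)
    result = {"processed": True, "input": payload}
    # The return value can go back to the user or serve as the
    # event that triggers the next function in the workflow.
    return {"statusCode": 200, "body": json.dumps(result)}
```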
With serverless computing, there are several significant benefits for application developers. First, developers don't need to care about server resources and server management. They simply deploy their application functions onto the serverless platform, and the platform manages the applications for them. There is no server management for developers. The second benefit is that an application now consists of individual functions, so each function can be managed and scaled up and down separately and individually. It is not like before, where the entire application or a big component was the unit of scaling; with serverless computing, the function is the unit of scaling. Whenever there is an event coming in, the serverless platform creates a function instance to process the event, and once the event has been processed, the function instance is terminated. This leads to greater scalability. The third benefit is that the platform schedules the necessary number of function instances in real time to handle the incoming events, no more, no less. This leads to lower cost for users, because they pay only for what they have actually used. It is not like before with virtual machines or containers, where users usually have to pay for what they have reserved. With those three benefits, serverless computing further enables higher productivity, because developers no longer need to care much about server management, scaling, and cost. As a result, developers can focus more on application logic, which leads to higher productivity.

Let me give a concrete example to show that serverless computing is being widely adopted. As we know, AWS is one of the market leaders in cloud computing, and its AWS Lambda platform is also the pioneer in the serverless computing area. At the beginning of 2020, Datadog performed an analysis of its customers who use AWS services. From this Datadog diagram, you can see that nearly half of those customers have now adopted AWS Lambda. AWS Lambda is also quite popular among the companies running containers in AWS: from this diagram, we can see that nearly 80% of the companies in AWS that run containers have now adopted Lambda. These reports show that AWS Lambda, or more generally speaking, serverless computing platforms, are no longer limited to early adopters or niche use cases. Instead, serverless computing is widely adopted for various real-world use cases in many different industries.

Serverless computing seems quite promising so far and has gotten a lot of adoption, but it doesn't solve everything, at least not yet. It still has some significant problems which limit the applicability of serverless computing. Next, let's talk about those problems. As I just mentioned, serverless computing does not solve everything yet.
The current serverless practices are good at dealing with concurrent, easy-to-parallelize jobs and are good at handling sporadic, unpredictable traffic. This is simply because whenever the serverless platform receives an event, it creates an associated function instance to process the event, and many concurrent events can be processed in parallel. In addition, the serverless platforms usually work in a stateless manner: whenever a function instance has finished, it is terminated. But there is a big however. The current serverless practice is bad at dealing with two kinds of applications.

First, it cannot easily support latency-sensitive applications which need to react fast to events. For example, many applications at the network edge are latency-sensitive and cannot tolerate an extra delay of hundreds of milliseconds at start. To give you a better understanding of this problem, we have run a typical image processing application which is composed of four functions: the first function extracts the metadata from the original image, and the subsequent functions process the metadata, recognize objects in the image, and finally resize the image. We have run this application on two serverless platforms, AWS Step Functions and IBM Cloud Functions. Here are the results, averaged over 10 runs with warm starts. The blue bars indicate the total runtime of this application, and the purple bars indicate the actual compute time of the four functions which compose it. Ideally, for both platforms, the purple bars, which are the actual compute time of those four functions, should be the same, but they are a bit different because of the different hardware of those serverless platforms. The difference between the blue and the purple bars is the overhead introduced by those serverless platforms, which is fairly significant, as you can see. For example, for AWS, the actual compute time is a little over 400 milliseconds, and the serverless platform incurs an extra 200 milliseconds of overhead. This is an overhead of 50%. For IBM Cloud Functions, the overhead for this image processing application is even higher, over 100%. This substantial overhead severely affects the applicability of the existing serverless platforms to support latency-sensitive applications.

In addition, the current serverless practice doesn't support applications which need fast access to state. For example, in a video processing application, some state needs to be maintained across the processing of different video frames. As a result, fast access to the application state is also needed. However, in a typical serverless platform, functions are executed in a stateless manner, function instances are not addressable, and there is no direct access to a function's in-memory state. This leads to a situation where, if one function wants to transfer state to another function, it needs to trigger the next function and pass over the state; or, as an alternative, the first function externalizes the state to a storage service, and then the next function retrieves the state from that storage. This leads to slow access to the application state.
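As a rough illustration of this externalization pattern (the store, key names, and function bodies here are hypothetical, with Redis standing in for whatever external storage the platform offers):

```python
# Sketch of the state-externalization pattern between two stateless
# functions. Redis stands in for an external store; names are illustrative.
import json
import redis

state_store = redis.Redis(host="localhost", port=6379)

def process_frame(event, context):
    # First function: its in-memory state disappears when it terminates,
    # so it must write the state out before finishing.
    state = {"frame_index": event["frame"], "tracked_faces": []}
    state_store.set("app:state", json.dumps(state))
    return {"state_key": "app:state"}

def process_next_frame(event, context):
    # Next function: pays a network round trip to get the state back,
    # which is why this access is slow.
    state = json.loads(state_store.get(event["state_key"]))
    state["frame_index"] += 1
    return state
```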
Before I describe our approach to those two problems, let me first dive a little deeper into the existing serverless platforms. In a typical serverless platform, there are multiple hosts, and those hosts are interconnected via a message bus or other mechanisms like a service mesh or something similar. As I mentioned before, with serverless computing, developers upload functions individually. Let's say there are two functions: one is red and the other is blue. In a serverless platform, functions are usually isolated with containers or lightweight VMs. Here, let's say the red function is loaded into the container in red, and the blue function is loaded into the container in blue. Those function containers are then deployed onto hosts where there are enough resources available. For example, the red container which hosts the red function is deployed onto host one because this host has enough resources, and the blue container is also deployed onto host one, which still has enough resources. Of course, if there are not enough resources, the serverless platform deploys the function containers onto other hosts according to certain policies. Those function containers handle the associated events and stay deployed until a timeout. If multiple function containers want to interact with each other, the interaction usually goes through a distributed message bus, a service mesh, or other similar techniques.

So what are the implications of the existing serverless platforms? There are three aspects. The first is function execution. In the existing serverless platforms, there are a few options. The original approach is to start a new container for every function execution. This is the so-called cold start. Before the function actually gets executed, the cold-start process needs to set up a container, load the runtime, and load the function code as well as the libraries and dependencies required by the function code. Depending on the complexity of the function, this cold start can easily take seconds or even tens of seconds before the function actually gets executed. This is a significant overhead, because quite often the function per se takes only hundreds of milliseconds to execute. People have realized the overhead of the cold start and proposed the warm-start approach: instead of starting a new container or lightweight VM for every function execution and terminating it once the function has executed, we can keep the container warm and idle after the function execution has finished, and then reuse these idle containers for subsequent executions. This skips many of the steps in a cold start and reduces the invocation delay of a function significantly. But those warm containers unnecessarily occupy system resources, and in addition, either the platform or the users need to manage those warm containers in certain ways. The third approach is that, if the traffic is more regular or more predictable, users can provision the usage of containers and lightweight VMs to process the traffic. In this case, the users have to pay for the provisioned containers or VMs even if they are not being used. For example, AWS provisioned concurrency works in this manner.

The second aspect is function concurrency. If there are multiple concurrent events, then with the above three approaches, either you have to start new containers, which may be slow, or you can reuse the existing or provisioned containers if some of them are idle and available. Otherwise, you would still have to start new containers, or the concurrent events have to be queued until some containers become idle and available.

The third aspect is function interactions. In many existing serverless platforms, if functions want to interact with each other, they need to go through distributed messaging, with very little locality consideration.
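Going back to the warm-start approach for a moment, here is a minimal sketch of the reuse policy it implies. A single-function pool with made-up timings; real platforms keep per-function pools with far more elaborate policies:

```python
# Illustrative sketch of warm-start container reuse for one function.
# 'Container' only models the policy; the timings are made up.
import time
from collections import deque

WARM_TIMEOUT = 600  # keep an idle container warm for 10 minutes

class Container:
    def __init__(self, function_code):
        time.sleep(2.0)              # stand-in for the seconds-long cold start
        self.function = function_code
        self.idle_since = time.time()

warm_pool = deque()

def acquire(function_code):
    # Reuse a warm container if one is idle and not timed out.
    while warm_pool:
        c = warm_pool.popleft()
        if time.time() - c.idle_since < WARM_TIMEOUT:
            return c                 # warm start: skip the setup entirely
    return Container(function_code)  # cold start

def invoke(function_code, event):
    c = acquire(function_code)
    result = c.function(event)
    c.idle_since = time.time()
    warm_pool.append(c)              # stays warm, occupying resources
    return result
```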
In this talk, we propose a set of mechanisms to enable fast execution for function compositions in serverless computing. Our goal is to address the two pain points in the existing serverless practices. The first goal is to reduce the invocation and interaction delays of function executions, and the second goal is to enable fast access to the application state. To achieve those goals, we developed three mechanisms, namely application-level sandboxing, local messaging, and addressable function executions. We have designed and developed an open-source serverless platform called KNIX to realize these mechanisms. With this, I now hand over to Ekin, who will walk you through these mechanisms. Towards the end of this talk, Ekin will also use our KNIX serverless platform to showcase the benefits of the three mechanisms that we propose.

Let's start with how to reduce function invocation delays. The first approach that we propose is application-level sandboxing. Imagine there are four functions: the red one and the blue one compose application one, and the purple one and the yellow one compose application two. In many serverless platforms, each such function is treated independently, regardless of the application it belongs to. Our insight here is that different concepts like functions and applications should have different fault-isolation requirements. Strong isolation is desired between different applications; however, such strong isolation between the functions of the same application may not be needed. The rationale behind this is that the functions of the same application tend to interact with each other, and they are usually from the same developer. As a result, isolating them as if they were total strangers might actually be redundant. On the other hand, we do want strong isolation between different applications, because we don't want them to interfere with each other. With this insight of two-level isolation, we propose application-level sandboxing. We put different applications in separate containers. The functions of an application run as separate processes in the same container, and we load each function into its own function worker along with its dependencies. When there is a new event, the associated function worker forks a new process to handle that event. And when there are concurrent events, we simply fork multiple processes as function instances to process those concurrent events, as the sketch below illustrates. This application-level sandboxing has two beneficial implications. First, forking a process is much faster than starting a new container, so the invocation delays of function executions are significantly reduced. Second, once a function instance has finished processing, the operating system terminates the associated process; as a result, we benefit from the automatic allocation and deallocation of resources.
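Here is a minimal sketch of this fork-per-event idea (Unix-only; the worker loop and names are illustrative, not KNIX's actual worker implementation):

```python
# Minimal sketch of a function worker that forks one process per event.
# The structure is illustrative, not KNIX's actual worker code.
import os
import json
import signal

# Auto-reap finished children so the OS deallocates their resources.
signal.signal(signal.SIGCHLD, signal.SIG_IGN)

def handle(event):
    # Stand-in for the user function code loaded into this worker.
    return {"echo": event}

def serve(event_queue):
    # The long-lived function worker inside the application sandbox.
    while True:
        event = event_queue.get()    # blocks until an event arrives
        pid = os.fork()              # far cheaper than starting a container
        if pid == 0:
            result = handle(event)   # the child is one function instance
            print(json.dumps(result))
            os._exit(0)              # instance ends; OS reclaims resources
        # The parent loops immediately, so concurrent events are
        # handled by concurrent forked instances.
```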
Our second proposal is to use locality to reduce function interaction latencies. Recall that in existing platforms, functions usually interact through a message bus that is distributed across the infrastructure. As a result, even if two functions are running on the same host, their interaction has to go through this bus. Our insight here is that we know the functions of an application are going to interact with each other. As a result, we can create shortcuts for these interactions. In other words, we exploit the locality of these functions that was created by the application-sandboxing approach. Let me show you how this would look. Basically, we can have local message exchange mechanisms inside each sandbox, for example, a local message bus. When functions interact with each other, they do so locally, which is usually faster than accessing a global message bus distributed across the infrastructure. This approach has the following implications. First of all, these local shortcuts for the interacting functions of an application contribute to reducing invocation latencies, which in turn reduces function interaction latencies. And second, we have dedicated messaging per application sandbox, meaning that it cannot be interfered with by other applications.

Let's revisit the image processing application that we saw earlier. Recall that we have four interacting functions that compose this application. With the ideas of application-level sandboxing and local messaging, KNIX can achieve the results that you see on the right-hand side. As you can see, the overhead associated with facilitating the interactions among these functions is significantly reduced. As a result, the total runtime is much closer to the total compute time of the functions.

So far, we have presented ideas to make regular function executions and their interactions faster. These regular functions are stateless, and if an application requires keeping state, the developers are usually forced to store that state in an external storage. What I want to do next is to revisit that concept a bit. What if we could make these function executions addressable, which is not possible with regular function executions today? And what if we could manage these addressable function executions similar to the way we manage regular functions? We could start them on demand, use them as needed, and stop them when not needed anymore. In other words, we could still preserve the on-demand computing principle of serverless computing. But what does it mean to have addressable functions? It means that one can send multiple messages to the same function instance that has already started executing. And the biggest reason why this is desirable is that one can access that function instance's in-memory state, which could be keeping the application state. As a result, an application can benefit from fast state access and still enjoy the benefits of serverless computing.

Let me show you how this works. We still invoke an addressable function like a regular function. Upon receiving a message, the function worker still forks itself to create an instance. The difference is that a regular function execution finishes and disappears, whereas an addressable function instance stays running. To find this instance again, we keep some routing metadata. When there is a need, the platform uses this metadata to find the running instance and delivers the messages using the local message bus, as the sketch below illustrates. The implications are the following. First, the running function execution can keep the application state in memory and access it fast, and by being able to address it, the rest of the functions in the application can also access that state. Second, it is still a serverless function instance: when it is not needed anymore, it can be stopped by sending it another message.
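Here is a rough sketch of that routing idea (the queue-based routing table and the message format are illustrative assumptions, not KNIX's actual API):

```python
# Sketch of an addressable function instance: a process that stays
# alive, keeps state in memory, and receives further messages routed
# to it locally. Names and message format are illustrative.
import multiprocessing as mp

def addressable_instance(inbox):
    state = {}                      # in-memory application state
    while True:
        msg = inbox.get()           # delivered via the local message bus
        if msg.get("action") == "stop":
            break                   # stopped on demand, like any instance
        state.update(msg.get("update", {}))

# Routing metadata: instance id -> the local queue that reaches it.
routing_table = {}

def start_instance(instance_id):
    inbox = mp.Queue()
    mp.Process(target=addressable_instance, args=(inbox,)).start()
    routing_table[instance_id] = inbox

def send(instance_id, msg):
    routing_table[instance_id].put(msg)

if __name__ == "__main__":
    start_instance("coordinator")
    send("coordinator", {"update": {"tracked_faces": []}})
    send("coordinator", {"action": "stop"})
```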
Next, I'll show you two demo applications. The first one is a simple application that showcases the sandboxing and local messaging ideas; I'll show you how these choices lead to significant reductions in function interaction latencies. The second one is another serverless application that is slightly more sophisticated and is more of a proof of concept. I'll first show you a serverless application that uses regular, stateless functions to recognize faces in live video without using a GPU. Then, I'll show you how this application can benefit from using an addressable function for its latency-sensitive tasks, again without using a GPU. For these demos, I'll be using a single-host setup with all KNIX components running on bare metal.

Here is the first demo application. Like I said before, it is pretty simple: there are only two functions. The first function, F1, just takes a timestamp and triggers the second function with it. The second function, F2, takes another timestamp at the beginning of its execution. As a result, we can measure the latency between F1's end and F2's start, which is the function interaction latency. In other words, we'll measure how much latency overhead the platform adds to facilitate the interaction between these two functions; a rough sketch of the two functions follows below.

While showing you this demo, I'll also briefly walk you through the GUI dashboard of KNIX, just to give you an idea. In the dashboard, you can see the functions, the workflows, and the key-value object store, as well as some documentation about the API that we expose to user code and the usage of the SDK. Let me create these two simple functions. First, I add function one. I could actually type the code into the inline code editor, but I'm just going to upload it from my computer. Afterwards, I also add function two; similarly, I upload its code from my computer. Next, I need to stitch these two functions together in a workflow, so I go to the workflow tab. I add a new workflow that is going to be called WF interaction latency. Here I also get an inline editor where I could type in my workflow description, but I'm going to play it safe and upload it from my computer. After uploading it, I save the workflow description, and later on, I can see it again in the workflow editor. Here you can see that my workflow starts with the first state, which uses the resource F1, our first function. The next state, which is triggered with function one's output, is the second function, F2. Afterwards, our workflow ends. Let's deploy this workflow. As you may recall, this is running on a bare-metal setup, and therefore we directly get the endpoint of the container address of that application. We can copy this URL and use it in our scripts to trigger this application programmatically if we want to. Here, I'm just going to use the GUI and open the execution window, where I type in some sample input and execute it. As you can see, after a while, we get the execution visualization, and you can see how the message flows through the workflow: it went to function one, then to function two, and then the workflow ended.
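As promised, here is a rough sketch of what F1 and F2 might look like (the `(event, context)` handler signature is an illustrative assumption; the actual interface is documented in the KNIX SDK):

```python
# Illustrative sketch of the two demo functions. The (event, context)
# handler signature is a stand-in, not necessarily KNIX's exact API.
import time

def f1_handle(event, context):
    # F1 records a timestamp at its end and passes it to F2
    # as its output, which triggers F2 in the workflow.
    return {"f1_end_ms": time.time() * 1000}

def f2_handle(event, context):
    # F2 records a timestamp at its start; the difference is the
    # interaction latency the platform adds between the two functions.
    f2_start_ms = time.time() * 1000
    return {"interaction_latency_ms": f2_start_ms - event["f1_end_ms"]}
```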
In the execution timeline, you can see how long it took each function to finish its execution, and you can deduce the actual interaction latency between the two. I can execute this again, and in the execution output, I can again see the milliseconds it took from the first function's end to the second function's beginning. Just to give you a quick recap: the application sandboxing and local messaging ideas that we presented in this talk enable KNIX to significantly reduce function invocation and interaction latencies.

Let's see the other application. Here is the first version of the second demo application. It is a proof of concept that shows that one can recognize faces in live video with serverless functions. In this version, we try to recognize faces without any addressable functions. Let me briefly describe the workflow. We have two execution paths: training and recognition. The training path is for us to upload some pictures and attach names to them. We detect the faces and encode their signatures, which we store in our data store. These signatures are later used to recognize these faces in the video frames we receive. The second execution path is about recognition. For that, we launch a set of parallel recognizers, and each recognizer produces a binary response, basically a yes or a no: whether a detected face belongs to a known person in the dataset. For example, if there are three people in our dataset and we have a single detected face in the current frame, there will be three recognizers. When these recognizers finish, they send their results back, namely the coordinates of the faces and the names attached to them. For this application, I'll be switching to the terminal. Here, we can use an SDK to programmatically interact with the KNIX platform. Let me start the application and see how it looks. Using the SDK, we upload the functions as well as the workflow description, and when the application gets deployed, we upload some training pictures. As you can see, my face is being detected and recognized. However, the frame rate is not great; that's the reason why the video looks a bit choppy and the face box moves around in a jerky way.

Now I want to show you another version of our face recognition application that takes advantage of addressable functions. The biggest difference in the workflow is the addressable function called the coordinator. Before I describe what exactly the coordinator does, let me briefly explain the simple idea behind this approach. Recognition is computationally more expensive than tracking an already detected object. That is why the previous version's output was not very smooth and had a low frame rate: there was no tracking, and it was always trying to recognize faces from scratch. In this version, we improve that using an addressable function. Here, we recognize faces once in a while, and the rest of the time, we track the already detected and recognized faces. We achieve this by utilizing the addressable coordinator function. What the coordinator does is two-fold. Every 30 frames, it launches a group of recognizers similar to the previous case, where each recognizer produces a binary response on whether a detected face belongs to a known person in the dataset. When they finish, their responses are sent back to the addressable coordinator. The coordinator then starts tracking these detected and recognized faces. This tracking data structure is the application state that requires fast access in order to provide a timely response for each frame.
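Here is a rough sketch of the coordinator's per-frame logic (the class structure, the helper functions, and the threshold wiring are illustrative stand-ins for the demo's actual code):

```python
# Illustrative sketch of the addressable coordinator's logic.
# The helpers below stand in for the demo's real recognizers/trackers.
RECOGNITION_INTERVAL = 30   # run the expensive recognizers every 30 frames

def recognize_faces(frame):
    # Stand-in for launching the parallel recognizer functions.
    return {"alice": (0, 0, 100, 100)}

def track_faces(frame, faces):
    # Stand-in for a lightweight tracker updating the bounding boxes.
    return faces

class Coordinator:
    def __init__(self):
        # In-memory application state: faces currently being tracked.
        self.tracked_faces = {}     # name -> bounding box
        self.frame_count = 0

    def on_frame(self, frame):
        self.frame_count += 1
        if self.frame_count % RECOGNITION_INTERVAL == 1:
            # Expensive path: recognize faces from scratch.
            self.tracked_faces = recognize_faces(frame)
        else:
            # Cheap path: update the in-memory tracking state, which is
            # fast because the state never leaves this instance.
            self.tracked_faces = track_faces(frame, self.tracked_faces)
        return self.tracked_faces
```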
Let's see how that works. We now start the application version that uses the addressable function. We again upload the functions and the workflow, and we also upload the same training pictures. As you can see, the frame rate is much higher and the video output is much smoother. When a recognition task is done, the application can continue tracking the detected faces in successive frames, and as a result, it can produce a timely response for a better output. This is achieved by keeping the application state in a function's memory and by keeping that state up to date by addressing the function instance.

You can also try KNIX for yourselves. We have a test system running at knix.io that incorporates the ideas we presented in this talk to achieve lower startup and interaction latencies, as well as to enable addressable function executions. You can also install KNIX on your own infrastructure: we have Helm charts if you want to install it on Kubernetes with Knative, and we have Ansible scripts for bare-metal or VM clusters. Finally, here are some useful links. All the code is available on GitHub along with the installation instructions. We also have a Slack channel that you can join to learn more about KNIX or get help if you need any. And the SDK and the command-line tool are available on PyPI. Thank you.