I'm Ali Raza, a PhD student at Boston University, and I will be presenting different aspects of serverless computing.

Alright, before starting, let's look at what serverless is. Serverless decouples application development from back-end infrastructure management, so that developers can focus solely on writing code instead of worrying about servers; all back-end management is handled by the platform provider, for example Amazon Lambda, Google Cloud Functions, and so on.

Okay, that was the boring textbook definition; let's look at a more entertaining one. If you have watched the TV series Silicon Valley, there was a group of friends, all developers, who, after hearing about all these companies that started in a garage, built an application and tried to host it in their own basement. The application got popular, a lot of users started using it, and eventually the servers started heating up. They tried to scale, adding more servers so they could handle more users, but in the end they set the house on fire. So when I say that cloud computing, or serverless, decouples back-end management from application development, it means developers can just write code and hand it to the cloud provider: the provider deploys it, and in case it gets popular and more traffic comes in, they scale up and add more nodes; and if you are not getting a lot of visitors or users, you won't pay that much either. That is how serverless computing decouples writing code from back-end management.

Recently there has been a strong trend toward serverless platforms. Amazon Lambda came out in 2014, followed in 2016 by Azure Functions, IBM Cloud Functions, and Google Cloud Functions, and there have been a number of open-source implementations as well; one of them is Apache OpenWhisk, which I have personally worked with. Lambda, being the oldest, has the most support and features available; don't take that from me as a final verdict, but they are something of an industry leader in this area. And now there has been another advancement: deploying serverless computing on the edge. The edge is closer to the user, so for Internet of Things and similar applications it would be better to have serverless computing there.

So let's see how it works. A developer writes code in one of the available languages (we will talk about which languages are available), prepares a package along with its dependencies, and submits it to the serverless provider. Whenever a request comes in, we call it a trigger: when the function is triggered, the serverless provider spins up a container, executes the code inside it, and sends whatever the response is back to the user or application.

So let's look at the different aspects of this separately, first writing code. There are some options: you can pick the language. Here I'm listing the languages supported by the different serverless platforms. Amazon Lambda, again, supports a lot of them; Google Cloud Functions for now supports only two; and Azure Functions and IBM Cloud Functions each support a reasonable number.
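To make that workflow concrete, here is a minimal sketch of what such a function can look like in Python, following the Amazon Lambda handler convention; the event field and the greeting logic are illustrative assumptions of mine, not examples from the talk.

```python
import json

def lambda_handler(event, context):
    # The platform calls this entry point on every trigger:
    # 'event' carries the request payload, 'context' carries runtime metadata.
    name = event.get("name", "world")  # hypothetical input field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }
```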
Choosing a language is important, and we will look at that in detail later. For now, remember that whenever you have a function and you trigger it, the platform starts a container and executes your code. The time from starting the container to executing your code is called the cold start. Different studies have shown that different languages have different cold start delays: languages with interpreters, like Python or Node.js, have shorter cold starts, and a shorter cold start means better performance for your application. And there are other factors. Whenever you spin up a container, you give that container some resources, and those resources can also affect the performance of your serverless function; we'll look at them as well in a moment.

Another way to reduce the cold start is dummy inputs. Has anyone here ever deployed a serverless function? By dummy inputs I mean that you implement the whole logic of your serverless function, but you add one case where you don't do anything: you put in a check, and whenever that particular input is sent, the function does nothing and just returns or exits. In that way you keep your container, and hence your function, warm all the time, and you won't suffer many cold start delays (there is a small sketch of this at the end of this part).

Another thing about serverless functions: they are totally stateless. The function is executed and not a single value is stored; every time you execute a function, you have to pass everything in. You can't store any state, for example DB connections or even global variables. So if you want to store the state of any variable, you either have to use persistent storage from the platform, for example Amazon Lambda with DynamoDB or S3, and Google, Azure, and IBM Cloud Functions provide storage as well, or you have to use environment variables. There is always some way around it for storing values you want to use in the future, but you definitely can't store them in the function.

Another thing about writing code: standard libraries are always supported by the platform, but if you are using extra libraries, you somehow have to submit them to the platform along with the function. Amazon Lambda is slightly tricky: to prepare a package, you download those libraries on your own machine, prepare a zip archive, and then upload that archive. There can be problems with that. For example, if your local machine has a different architecture, you prepare the package and download the libraries there, but Amazon might run your container on a different architecture, and the resulting conflicts mean your function might not work. So there is some extra added effort there. Google is nice in that you can just specify the extra packages in a file: say you are using Python with NumPy, you just write NumPy in a requirements file and Google will download it for whatever architecture and operating system they are using.
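Coming back to the dummy-input trick from a moment ago, here is a hedged sketch of what it can look like; the 'warmup' field is an assumed name of mine, not a platform convention, and in practice you would pair this with a scheduled ping that sends it every few minutes.

```python
def lambda_handler(event, context):
    # Keep-warm check: if this is just the periodic dummy input,
    # return immediately so the container stays warm at minimal cost.
    if event.get("warmup"):  # hypothetical field name
        return {"status": "warm"}

    # Real application logic runs only for genuine requests.
    values = event.get("values", [])
    return {"status": "ok", "result": sum(values)}
```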
Recently this open-source platform, Apache OpenWhisk, came along, and IBM Cloud Functions is also based on OpenWhisk, so I'm assuming they work similarly. What happens in OpenWhisk is that you can specify a container image: every time your function spins up, the platform pulls that image from Docker Hub and then executes your function. But that adds the extra hassle of preparing container images.

So, dependency management: different platforms handle it differently, and the approaches vary significantly, from preparing your own packages, to just specifying dependencies in a file, to preparing container images.

Okay, this is the interesting part, because I'm working on this as well. If you remember, whenever a container is spun up, that container has some resources: memory, CPU, I/O bandwidth, network bandwidth. Most platforms allow you to specify the resources available to your serverless function, or indirectly to the container where the function executes. Here is a list of the configurable parameters that different platforms allow. Amazon Lambda lets you specify the memory available to the function. Google lets you specify memory and CPU power. Azure doesn't let you specify anything, and their pricing model is also different, but an Azure function can only use up to 1.5 GB of memory; beyond that it might throw errors or exceptions or not work at all. And in IBM Cloud Functions you can also specify memory.

Why are these things important? Because serverless platforms charge you based on these resources and the execution time. So for developing a serverless application, it is really important to know what resources to specify for each function, because that in turn affects both the performance and the cost you are going to pay. Here are the different pricing models. For example, in Google Cloud Functions, if you use 128 MB of memory and a certain CPU power and your function takes 100 milliseconds to run, you pay a certain price; as you increase the resources available to your function, performance will certainly improve, and in turn you have to pay more. Similarly for Amazon Lambda: with memory on the x-axis and price per 100 milliseconds on the y-axis, as you specify more memory, you pay more per 100 milliseconds. And IBM Cloud Functions and Azure Functions also have pricing models based on the memory and the execution time of the function.
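To make that memory-times-time pricing concrete, here is a small hedged sketch of the usual calculation; the rate constant and the 100 ms billing granularity are illustrative assumptions, not any provider's published numbers.

```python
import math

# Assumed illustrative rate: price per GB of memory per 100 ms of execution.
RATE_PER_GB_PER_100MS = 0.0000017

def invocation_cost(memory_mb, duration_ms, granularity_ms=100):
    """Estimate one invocation's cost under a memory-times-time model.

    Platforms typically round the duration up to a billing granularity
    and charge in proportion to the memory allocated to the function.
    """
    billed_units = math.ceil(duration_ms / granularity_ms)
    return billed_units * (memory_mb / 1024.0) * RATE_PER_GB_PER_100MS

# Example: the same 1-second run costs 8x more at 1 GB than at 128 MB.
print(invocation_cost(memory_mb=128, duration_ms=1000))
print(invocation_cost(memory_mb=1024, duration_ms=1000))
```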
So after that, the next thing is how you execute, or trigger, these functions. There are different sorts of triggers, which also illustrate the usability and applications of serverless functions. The first one: if you build an application, you just send an HTTP request, the function executes, you get the response, and you do whatever you want with the response and move forward. The second kind is database triggers. For example, you have storage somewhere in the cloud, and whenever something is added to the database you want to trigger some computation. Say you have an IoT application, and whenever a reading arrives you want some processing to happen, whether that's sending a text that your lights are on or something like that: the text-sending functionality is implemented as a serverless function, and whenever your IoT device submits data to the database, a database trigger fires the serverless function, and in this way the application flows. And then there are object storage triggers. This is mainly for Amazon Lambda, where different operations on S3 can trigger serverless functions. Which triggers are available also varies depending on the platform you are using; for example, I seem to remember that Azure doesn't allow these object storage triggers, so you have to check that.
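As an example of such an object storage trigger, here is a minimal sketch of a Python handler reading the event structure that S3 notifications deliver to Amazon Lambda; the per-object processing is just a placeholder.

```python
def lambda_handler(event, context):
    # S3 delivers a list of records, each describing one object event
    # (e.g., an upload) with the bucket name and object key.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Placeholder for whatever per-object computation the app needs.
        print(f"New object {key} in bucket {bucket}")
    return {"status": "processed"}
```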
Alright, so now I will talk about different studies: some done by academics, some by practitioners online, and some that we did ourselves. But there is a disclaimer: this serverless area is changing so fast that some measurements done 10 or 8 months ago turned out, when I did a little more digging, to no longer be valid. So in case you see anything that doesn't make sense or might have changed, feel free to help me.

The first thing: remember that every time you spin up a container, it happens inside some virtual machine. There was a study at USENIX ATC where the authors looked at the underlying CPUs and VM configurations used for serverless execution, and they found that even if you have the same function and execute it multiple times, it's not guaranteed that you will get the same environment or underlying resources every time. They fired 50,000 instances of different functions and concluded that the underlying infrastructure, in terms of CPUs, is very diverse, particularly on Amazon Lambda.

The second thing is how many functions you can execute in parallel. Amazon Lambda is best at scaling: you get approximately 200 concurrent requests, although, as I found out just today, you can configure the concurrency, so this number can go up; still, it's not in the millions, I think the maximum goes to about 1,000. So there is a question of how you can design real-time applications with Amazon Lambda or serverless functions, which we'll talk about later. The point is, if you have a serverless function and you send 2,000 requests, it executes in parallel up to whatever its maximum concurrency limit is, and the other requests are queued. This study also found that Azure claims to scale to 200 instances, but when the authors ran experiments, it wasn't really scaling that well.

Here is the cold start for different languages on Amazon Lambda. For example, if you have a Python function, the first time your container is spun up it takes the extra time, but if the second request follows the first quickly enough, the second time you won't see those delays. And you can see again that the compiled languages have higher delays, while the languages with interpreters have lower cold start delays. Here is another, more detailed cold start comparison for different functions, and here we see that Azure has a significantly higher cold start. The reason is that Azure doesn't let you specify any parameters; they say your function can use up to 1.5 GB of memory, so when they start the container they apparently try to allocate that memory beforehand. The more resources you want to allocate to your container, the longer it takes to spin it up. That's why Azure shows large fluctuations and longer cold start delays.

Okay, so what if you are not using your function; would they keep the container up all the time? Definitely not, because keeping the container warm or up would waste their resources. What Amazon Lambda does is that if you don't use your function for approximately half an hour, they kill the container, and the next request experiences the full cold start delay; Google does the same after approximately two hours. And there is another thing: if you make changes to your serverless function, there is some small probability that the following requests won't see the changes immediately. Depending on their consistency model, it might take some time for the updated function to start serving. In one experiment I did with Google Cloud Functions, updating a parameter could take ages, sometimes three or four minutes, so it was easier to deploy a new function with a different configuration than to update an existing one. I hope they have worked on this, but I'm not sure yet.

Okay, and since we are spinning up containers for serverless functions inside VMs, we see that there is a great contention for resources: co-residency and co-location are going to affect your I/O and network performance.

So that was mostly about the measurements. Now to what is more interesting for me as well, because I'm working on this; in case you have any ideas, or you see that any of my assumptions or anything I wrote can be improved or is wrong, feel free to tell me. The first question is what serverless should improve: for a developer or researcher, what are the interesting aspects we can work on?

The first point, which is more for the serverless platforms, is that they should add support for more languages, because right now I don't think any of them supports more than eight languages, and that's the best case, Amazon Lambda; Google supports only two. More language support matters because translating an application already written in some other language can be really painful, so if those languages were supported already, that would be pretty nice. Also, right now you don't know the underlying architecture. You know your code will be running in a container somewhere, but what if you want to implement something with GPUs or some specialized hardware? It would be nice to at least let users specify the underlying architecture or an accelerator; machine learning applications in particular tend to use those hardware components. And as we have seen, performance is not entirely consistent: because the underlying hardware and VM configurations change, performance can vary, and it can also be affected by co-location, for example when multiple requests arrive at the same time. So that is one limitation for designing real-time applications if you want to do something with serverless.
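As a hedged sketch of how one might observe the cold versus warm start gap and this run-to-run variability, here is a small timing script using boto3's Lambda client; the function name is a placeholder, and this measures end-to-end latency rather than container spin-up itself, so it only approximates the cold start.

```python
import json
import time
import boto3

client = boto3.client("lambda")
FUNCTION_NAME = "my-test-function"  # placeholder name

def timed_invoke(payload):
    # Time one synchronous invocation end to end.
    start = time.monotonic()
    client.invoke(
        FunctionName=FUNCTION_NAME,
        Payload=json.dumps(payload).encode(),
    )
    return time.monotonic() - start

# A first call after a long idle period likely pays the cold start;
# an immediate follow-up call should hit the same warm container.
cold = timed_invoke({"values": [1, 2, 3]})
warm = timed_invoke({"values": [1, 2, 3]})
print(f"first: {cold:.3f}s, second: {warm:.3f}s")
```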
Okay, so here is one experiment we did recently. We deployed a function on Amazon Lambda and, without changing anything else, we just varied the memory available to the function; for every memory value, we invoked it and plotted the runtime. As you can see, when the function had less memory, 128 MB, the runtime was pretty high, around 15 seconds, and as we started increasing the memory available to the function, the runtime decreased roughly exponentially. And here we have the price we paid for the corresponding runs. For 128 MB we paid a certain price, but at 3 GB the price was really high, because the per-100-millisecond price for 3 GB of memory is pretty high; that's why it shot up. But as we can see, there is a sweet spot between cost and performance for this function: if you pick a memory around 800 MB, your performance is still far better than with the minimum memory, and you are paying the least cost possible for that function to run. Yet there is no recommendation system as of now: even though this setting affects the performance and the cost you pay, no system exists that recommends parameters for a serverless function.

So over the last year we developed such a system. It's a machine-learning-based approach that takes your serverless function and, with minimal sampling, gives you the best parameters for it, which not only saves you cost but also improves performance. Here are the results; I can't go into detail because the paper is under submission. Our system is named COSY, for configuring serverless functions. We deployed the same function, and the cost paid by our system was less than what either the minimum-memory or the maximum-memory configuration would pay, while our performance was pretty close to whatever the maximum memory would give. So we got performance around here, let's say, and we paid cost around here. And it was all automated, with no human intervention: you give it the function, and it gives you the parameters that are suitable for your function.
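For flavor, here is a hedged sketch of the naive memory sweep behind an experiment like the one above, not of COSY itself: it reconfigures a Lambda function's memory via boto3, times one invocation per setting, and scores each point by estimated cost. The function name, the memory grid, and the rate constant are all illustrative assumptions.

```python
import math
import time
import boto3

client = boto3.client("lambda")
FUNCTION_NAME = "my-test-function"   # placeholder
RATE_PER_GB_PER_100MS = 0.0000017    # assumed rate, as in the earlier sketch

def run_at(memory_mb):
    # Reconfigure the function's memory, then time one invocation.
    client.update_function_configuration(
        FunctionName=FUNCTION_NAME, MemorySize=memory_mb)
    time.sleep(5)  # crude wait for the new configuration to propagate
    start = time.monotonic()
    client.invoke(FunctionName=FUNCTION_NAME, Payload=b"{}")
    duration_ms = (time.monotonic() - start) * 1000
    cost = math.ceil(duration_ms / 100) * (memory_mb / 1024) * RATE_PER_GB_PER_100MS
    return duration_ms, cost

# Sweep a grid of memory sizes and report the cheapest setting.
results = {m: run_at(m) for m in (128, 256, 512, 1024, 2048, 3008)}
best = min(results, key=lambda m: results[m][1])
print(f"cheapest setting: {best} MB -> runtime/cost {results[best]}")
```

In practice you would average several invocations per setting and trade cost off against a runtime target, which is exactly the sweet spot the plot above illustrates.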
Okay. Another thing that I thought might be novel, though this morning I found a blog where they are already doing it, is that different serverless platforms have different strong points. For example, one of them might have pretty good CPUs, so a CPU-heavy function can go there; Amazon might have diverse architectures, so if your functions somehow run better there, you deploy them there. So can we decompose applications? Given an application, you decide, for each serverless function, where that function should go, in connection with the memory or the databases you are using. This is what I am currently working on for my PhD, so in case you have any ideas, feel free to talk to me. And I also found a blog about a company, Shamrock I guess, an advising company, that uses both Amazon Lambda and Google Cloud Functions for different tasks. They found that machine-learning or image-related functionality ran better on Google Cloud Functions, so they took the part of the application related to that and deployed it on Google, and they kept the rest on Amazon Lambda, which was pretty cool for me.

So finally, I would say that serverless platforms are, I think, the future. People are using them a lot and there is a lot of development going on, but as we have seen, I'm sure there are more problems there, and more room for innovation; people can build on this and improve it a great deal. And as this paper from Berkeley predicted, serverless use will skyrocket over the next ten years; I think that's pretty much true. Thanks. If you have any questions, feel free to ask.

Yes, I have a question. You mentioned that different cloud providers have different rates of charging. Does that mean, for example, that Azure charges by the second and GCP, for example, charges by the hundred milliseconds? What is the granularity, and does your algorithm take this into account?

So the algorithm we designed does take that into account, because it only takes the runtime of the function and whatever cost was paid. That depends on the serverless platform: whatever cost they charge us, we use that cost. So it doesn't really matter if they have different costs or models; that would be on their side, and they just give our algorithm a number for that cost. Does that answer your question? Yes.

Another question, regarding the cold start. You mentioned, when you showed the research, that each time a function runs, it won't necessarily run on the same container, and then that will cause it to cold start again. Yes. Does this cold start reduce the attractiveness of using serverless, since we cannot be sure we keep running in the same container after it starts?

Yes, so that mainly depends on your application. For example, if your application isn't that popular and you are not getting a lot of requests, then it might happen that after some time the serverless provider kills the container and has to spin it up again. But if your application is popular and you are getting a lot of requests, then, as this study also showed, the cold start delays for a reasonable application would affect less than 1% of requests. So, yeah, it depends on the developer, whether they care about that 1% of users or not.

For the graph where you were showing the different performance as a function of memory use: was that a serverless function that needed a lot of memory, or was it something more in the infrastructure itself that made that difference?

So this was a CPU-heavy function. We took a million numbers and calculated the tangent of each, which is traditionally CPU-heavy, this trigonometric function. The thing is that Amazon Lambda assigns the CPU share in proportion to the memory, so the memory setting was directly affecting the performance of the function. Yeah. All right. Thank you.
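For reference, a minimal sketch of the kind of CPU-bound benchmark described in that last answer, assuming a simple loop over a million tangents; the exact workload used in the talk's experiment may have differed.

```python
import math

def lambda_handler(event, context):
    # CPU-heavy workload: the tangent of a million numbers.
    # Because Lambda scales CPU share with the memory setting,
    # this function runs faster as memory is increased.
    n = event.get("n", 1_000_000)  # hypothetical input field
    total = 0.0
    for i in range(1, n + 1):
        total += math.tan(i)
    return {"n": n, "checksum": total}
```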