So, hello everyone. Thanks for joining the talk. My name is Andrei. I am a PhD student at the University of Grenoble Alpes, in partnership with Ryax Technologies. Today I'm going to talk about scheduling policies for a serverless-based Edge-Cloud Continuum. More specifically, I will start with our container-layer scheduling policy, and then some new progress that we have made with a few other objectives.

This project is funded by the MIAI project and by the PHYSICS project, a European project. And talking about PHYSICS, its goal is to provide a visual programming environment to create serverless workflows. We had two other talks here at DevConf: one was a workshop yesterday, and tomorrow Yanis is going to present the other one. So if you are interested, please join it.

To start, I will define a few concepts, beginning with the Edge-Cloud Continuum. We understand the Edge-Cloud Continuum as an infrastructure composed of several different layers: cloud clusters, edge clusters, and edge resources. The cloud clusters form the global continuum layer, where we have several different clusters. At the local level, we have the edge clusters, or fog, and the edge resources. The idea is that we start from this big view of the clusters and go down to the local ones. On the cloud clusters we have the big machines with more powerful resources, and as we move toward the edge we get fewer resources and more mobile machines.

When we talk about serverless, I like to define it through the transition from the cloud to serverless. With cloud platforms in general, we have this generic scenario where developers need to build big applications, deal with all the settings of the platform, and then put their applications there to run. The application stays there, let's say, forever, and the user pays by machine hosting time: as long as it is there, they pay for it.

When we move to serverless, there are two main points. The first is that we split this scenario in half. Now the final user, who can be a data scientist, a developer, and so on, only sees one part of the scenario. The developer just needs to deal with functions, which are much smaller pieces of code than whole applications, and just a few settings, let's say the amount of CPU and the amount of memory. The second main point is that the other half is fully managed by the platform provider. The provider has to deliver everything about the platform: scalability, provisioning of machines, and so on. The user just deploys their functions, and the platform provides everything needed to run them. In the end, the final user is only charged by function execution time.

And these functions are not staying there as before. Now we are talking about stateless functions that are triggered by events, so they are executed just when needed: a function is triggered, it is deployed on the platform, and then it is executed. With the scheduling policies here, we want to work on a few objectives, which can be cost, energy, or time. Today I'm going to talk a bit about these three objectives.
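To make the function model concrete, here is a minimal sketch of what such a stateless, event-triggered function can look like, in the shape of a Python action for OpenWhisk, the platform used later in the talk. The `name` field of the event payload is a hypothetical example, not something from the talk.

```python
# Minimal OpenWhisk-style Python action: stateless, event-triggered code.
# The "name" event field below is hypothetical, for illustration only.
def main(args):
    # 'args' carries the payload of the event that triggered the function.
    name = args.get("name", "world")
    # The returned dict is the function's output; nothing persists afterwards.
    return {"greeting": f"Hello, {name}!"}
```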
In the first part of this presentation I'm going to show a few results from this paper that we just published at CCGrid in May of this year. If you are interested, here are the QR codes for our repository with all the reproducible artifacts; the paper is completely reproducible, so please check it out. I'm going to start with that, and then at the end present a new step, our recent progress.

As I was saying about serverless computing, here are the main points we focused on in this first step. First, when we talk about serverless, we talk a lot about containers: the functions are deployed and executed inside containers, and the deployment of these containers is not negligible. Through our studies and a few other papers, we understood that the time to deploy a container sometimes takes longer than the function execution itself. On serverless we are talking about fast functions, on the order of minutes, and sometimes deploying a container can take longer than that if we are downloading the image from afar, let's say from the cloud. But we can share the layers of the containers. So in this first project, the idea is to share container layers, even when we are not talking about the same container images.

Here we have the Cloud Continuum infrastructure that I presented, and our motivation: we want to reduce the amount of data downloaded to deploy the containers, and also the amount of data transferred to upload and download function inputs and outputs. In this scenario, functions are triggered from the edge and deployed here at the local level, but the containers come from the cloud, so we want to reduce the amount of data moved in both directions. For the containers, we want to reduce the total amount downloaded.

How do we do that? We proposed a function orchestration algorithm, which we call FOA, based on a linear program that minimizes the amount of data downloaded while respecting a makespan constraint. The idea is that the linear program optimizes the entire placement of the functions on the platform under that constraint. If we are not happy with the makespan of the linear program's output, we run this loop here: we take the output of the second step, go back to the linear program, and tighten our makespan restriction. So let's say we optimized the whole container placement, but the solution would still take hours to execute; we then take this output and say, no, I want this makespan cut in half, and compute a new solution. We repeat this until we are satisfied with the solution; you can see a small sketch of this loop just below. Then we arrive at step three, where we optimize the download of the layers, and we get the final schedule as output. As a reference, the linear program and the minimum-cost matching are based on a dual approximation algorithm by Shmoys and Tardos; here is the paper.

Now, our experimental protocol. We work through simulations. To do that, we first adapted functions from a benchmark called FunctionBench, deployed a serverless platform called OpenWhisk on top of an academic testbed called Grid'5000 in France, and with that we measured and calibrated the results.
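Here is a rough sketch of that outer loop as just described, not the real FOA implementation: `solve_placement_lp` is a stand-in for the actual linear program, and its dummy return values exist only so the loop is runnable.

```python
# Hypothetical sketch of FOA's outer loop: minimize data downloaded under a
# makespan constraint, then halve the constraint and re-solve until satisfied.

def solve_placement_lp(functions, machines, max_makespan):
    """Stand-in for FOA's linear program.

    A real solver would minimize the bytes downloaded subject to
    makespan <= max_makespan; dummy values keep this sketch runnable.
    """
    placement = {f: machines[0] for f in functions}   # dummy placement
    return placement, 60.0 * len(functions), 0.0      # (plan, makespan s, bytes)

def foa_schedule(functions, machines, acceptable_makespan, max_rounds=10):
    target = float("inf")      # round 1: unconstrained makespan, minimal download
    placement = None
    for _ in range(max_rounds):
        placement, makespan, data = solve_placement_lp(functions, machines, target)
        if makespan <= acceptable_makespan:
            break              # satisfied; step 3 (layer downloads) would follow
        target = makespan / 2  # not satisfied: demand half the makespan, re-solve
    return placement

plan = foa_schedule(["f1", "f2"], ["m1", "m2"], acceptable_makespan=300.0)
```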
So we ran this benchmark for a while, calibrated how long each function takes, and then we could build the workloads for our simulations. Why? Because we are running a linear program, which is an offline approach, so we need to know all the information in advance.

To evaluate our scheduling policies, we have a simulated environment on top of Batsim and SimGrid, a combination built around a batch scheduling simulator. As I mentioned, everything is open source and available. In our design of experiments, we varied the workload size, the platform size, and the heterogeneity level. Here we investigate the heterogeneity of the platforms by changing the CPU speed of the machines; right now we are talking about light levels of heterogeneity. So when I say we have different levels of heterogeneity, it means different numbers of clusters that share the same CPU speed.

We compared our two scheduling policies: FOA and Kubernetes image locality. We tried to reproduce the Kubernetes image locality policy, which is basically a first-come, first-served approach that takes the container image into account. Kubernetes checks whether the container image is already deployed on some machine of the platform, and if it is there, it will prefer that node. But it only looks at entire images. Our novelty is to also check the layers of the containers, not only whole images; a small sketch of this difference follows below.

Here is the table of all the functions we adapted from FunctionBench and the different input values we used. We have mathematical functions, like millions of float operations, Linpack, and matrix multiplication, and also other functions like image processing, video processing, and so on.

Here is the simulated infrastructure with Batsim and SimGrid. The idea is that Batsim runs the scheduling side of the simulation, the first big square here, and SimGrid simulates the underlying infrastructure; they communicate with each other. We needed to add a few layers on top of these tools to behave like a serverless platform, so we added an extra layer on the workload profiles of Batsim to model container layers. And to show a few open source contributions we made in this project: here is the FunctionBench repository, with our functions already merged there.

Then we come to our experimental results. The figures I'm going to show now follow this structure: the small facets, the small squares, combine workload size and platform size; the x-axis shows the makespan, and the y-axis shows the amount of data downloaded. What we have on this figure are Pareto curves, where we investigated the trade-off between reducing the amount of data downloaded and the makespan. What does that mean? It means we cannot optimally reduce both at the same time: if we want to reduce the makespan further, we need to relax a bit on the amount of data downloaded, and if we want to optimize the amount of data downloaded, we need to relax the makespan a bit. The colors show the repetitions of the algorithm I illustrated in the first step. All the first solutions are here on the extreme right, with the best cost, meaning the minimum amount of data downloaded, but with a big makespan. As I mentioned, if we are not satisfied with the makespan, we constrain it to half.
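To illustrate the difference between whole-image locality and layer sharing, here is a small hypothetical example: the image names, layer names, and sizes are made up, but they show how two different images that share base layers leave only a small amount of data to download.

```python
# Hypothetical illustration: whole-image locality (what Kubernetes scores)
# versus layer-level locality (FOA's idea). Sizes in MB; all names made up.

def bytes_to_download(required_layers, cached_layers, layer_sizes):
    """Only layers absent from the node's cache must be downloaded."""
    return sum(layer_sizes[l] for l in required_layers if l not in cached_layers)

layer_sizes = {"os-base": 80, "python": 120, "numpy": 60, "app-a": 5, "app-b": 5}
image_a = {"os-base", "python", "numpy", "app-a"}   # e.g. a matmul function image
image_b = {"os-base", "python", "numpy", "app-b"}   # a different function's image

node_cache = image_a                                # node already ran image A
# Whole-image view: image B is "not present", so locality sees no benefit.
# Layer view: only the 5 MB application layer is actually missing.
print(bytes_to_download(image_b, node_cache, layer_sizes))  # -> 5
```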
Then we go to step two and run the algorithm again. We can see that we reduce the makespan but increase the amount of data downloaded a bit, and then we go on, and on, and on. The last iterations show the best makespan but the worst amount of data downloaded. What we conclude is that in all scenarios, about three or four repetitions are enough for us; we don't go to the last one, because we would lose too much in terms of data downloaded.

Now I'm going to show the specific objectives we had. Here I'm comparing, again with the same structure combining workload and platform size, but now the x-axis shows the heterogeneity level of the platforms, and the y-axis shows the amount of data downloaded. So we are seeing the difference between our approach, FOA, and our baseline in terms of data downloaded, and what we can see is that FOA outperforms image locality by almost two orders of magnitude on this objective. So yes, FOA can do much better, because Kubernetes image locality is really a greedy approach, while we explicitly focus on reducing the amount of data downloaded. What we also see across the different size combinations is that on very loaded platforms we don't have much choice: if we have, for example, ten functions per machine, so a lot of functions on a small platform, we don't have many options to say, OK, I want this function to go here or there. But when we have a lot of options, FOA behaves much better.

The next result is the number of machines used. This was not one of the objectives; again, the objectives were to reduce the makespan and the amount of data downloaded. But analyzing the results, we saw that we achieved this optimization while at the same time using fewer machines. In serverless computing, where you pay as you go, this is a very good result: we use fewer machines and optimize both parameters.

So the conclusions for this part: greedy algorithms may not profit from heterogeneity. Sorry, I forgot to mention that: as we change the heterogeneity of the platforms, the greedy algorithm does not change its behavior, while FOA can adapt a bit better. FOA outperformed the baseline in terms of data transfers and makespan, in addition to system utilization, meaning the number of machines used, by up to two orders of magnitude. FOA also minimizes cold start delays: if we minimize the amount of data downloaded, it means we are reusing more and more container layers, so we reduce cold starts, and that speeds up function execution; if we don't spend too much time deploying containers, we start the functions as soon as possible.

However, at this step FOA is still very time consuming, because it is based on a linear program with many variables: it runs on the order of minutes, while our baseline runs on the order of seconds. Just to give an idea of why, I'll show a tiny illustration below. This is the biggest trade-off of our approach, and it is one of the main points we worked on in the next step that I'm going to present now. So this was the motivation and, let's say, the objective of the first part: we want to keep the good results, we want to improve FOA's performance, and we added one more objective to this multi-objective approach. We also want to reduce the energy consumption of the platform.
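As a rough, hypothetical back-of-the-envelope, assuming at least one placement variable per function-machine pair (the real FOA model is richer than this), the variable count grows with the product of the two, which is one intuition for why the solver takes minutes:

```python
# Rough, hypothetical illustration of LP growth: assuming one placement
# variable per (function, machine) pair, the model grows multiplicatively,
# before counting any layer- or locality-related variables.
for functions, machines in [(100, 10), (500, 50), (1000, 100)]:
    print(f"{functions:5d} functions x {machines:3d} machines "
          f"-> {functions * machines:7d} placement variables")
```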
So now we do that by splitting our scheduling policy into two phases; we are talking about two levels of scheduling policies. One runs on the global continuum: at this step we decide in which cluster each function is going to run. Then, at the local level, we decide on which machine inside that cluster the function will run. At the global level we reduce the energy consumption and the function execution time, and at the local level we use our layer-aware scheduling policy to optimize the data downloaded.

First, since we are going to work with energy consumption, we have started trying to use Kepler. We are in touch with the community, and the work to be able to use it is in progress. I'm going to show a bit of our investigations on energy consumption. Since we don't yet have something running on our Kubernetes platforms, for this first step we are using power wattmeters directly on the machines to analyze the energy consumption, again to model our workloads and prepare the inputs for our new linear program.

So the first step is to analyze the energy consumption. Here we have an instance with three nodes running OpenWhisk: the first node is the master, and we have two workers, but only the third one was used. We can see that the two nodes that are, let's say, idle still consume a lot of energy. Here we can see the functions running, and if we cut this out and zoom in, we get the figure on the right. Here I'm analyzing, not the energy consumption yet, sorry, but the instantaneous power over time for different kinds of functions. From that we computed the energy consumption by taking the integral of this area; I'll show a small sketch of that computation right after this part.

Even before that, if we zoom in a little more on a few functions, we can see their behavior. Our first investigations show, for example, a few spikes at the beginning of each execution, and these spikes may come from the container deployment: the preparation of the environment to execute the function may consume a bit more energy, and then we execute the function. Here I'm showing different input sizes for the same function, Linpack on the left and Chameleon on the right.

OK, so once we had that, we computed the energy consumption, remodeled our workloads, and came back to the same environment setup I showed previously: we executed everything on top of OpenWhisk on Grid'5000, modeled the workloads, and then ran the simulations with the new algorithm. It is still a linear program, and we still do the repetitions to optimize the makespan as long as we are not satisfied with the final makespan of the solution.

Now I'm going to show a few results, with the same plot structure combining workload size and platform size. First, we can see that we still do much better in terms of the amount of data downloaded. Our baseline is no longer plain Kubernetes image locality, because now we are talking about two levels of scheduling policy, so our baseline also has two levels: we implemented first-come, first-served at the global level and Kubernetes image locality at the local level. And the result is the same: we optimize a lot in terms of the amount of data downloaded.
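As promised, here is a minimal sketch of how energy can be computed from wattmeter samples: energy is the integral of instantaneous power over time, approximated here with the trapezoidal rule. The sample values are made up, loosely mimicking the deployment spike followed by execution.

```python
# Minimal sketch: energy (joules) as the time integral of instantaneous power
# (watts), approximated with the trapezoidal rule. Sample values are made up.
import numpy as np

timestamps = np.array([0.0, 1.0, 2.0, 3.0, 4.0])     # seconds
power = np.array([95.0, 140.0, 150.0, 120.0, 96.0])  # watts: spike, then decay

energy_joules = np.trapz(power, timestamps)  # integrate P(t) dt over the run
print(f"{energy_joules:.1f} J over {timestamps[-1]:.0f} s")  # -> 505.5 J over 4 s
```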
In scenarios that are not that loaded, there is no big difference, but whenever we can save resources and improve these objectives, we do. For the number of machines used, it is the same: we also reduce the number of machines used. Being greedy algorithms, our baselines tend to use all the machines available, but we do not.

And the most important part right now is the energy consumption. With our preliminary experiments in this direction, we already have good results showing that we can cut the median energy consumption roughly in half in all the scenarios, and we are going to continue working on that.

So the conclusions for the second part: again, the greedy algorithm does not follow the heterogeneity of the platform, while FOA does. The new FOA, FOA-E, outperforms the baseline for data transfers and energy consumption, in addition to still reducing system utilization, but now we lose in terms of makespan. We are not yet doing better than the baseline on makespan, so this is one of the directions we are going to work on. Again, we minimize cold start delays by minimizing the amount of data downloaded. And the very good news is that FOA now runs on the order of seconds, no longer minutes. That was one of the goals between our two steps, because now we can say that FOA is reasonable to run in real products on Kubernetes.

Now, future work. Continue improving FOA's model and also try other linear solvers. Study applications that can be modeled as workflows, because right now we are talking about batches of stateless functions. Include Kepler for continuous measurement of the energy consumption: if we can continuously measure the platform's energy consumption, maybe we can turn our offline approach into an online one. And continue this investigation toward reducing the energy consumption: as I showed, the platform by itself also consumes energy, and we did not take that into account in our model yet, we are only optimizing the functions, so optimizing the platform as well may be one of the directions. We are also working on real implementations on Kubernetes for both of our scheduling policies: the image locality one is already in progress, with talks already scheduled with the Kubernetes community to get it into the main branch of Kubernetes, and FOA is under development.

So, thank you very much. Again, this is the reference for the paper we published for the first part; please scan the QR code if you are interested. On my GitHub repository you can also find the results for the new steps I presented today. Thank you very much, and I will be happy to answer any questions.