Yeah, hello, welcome. I'm really glad that you decided to join our session today. I hope you're enjoying your stay in Amsterdam and that you're enjoying KubeCon. My name is Grzegorz Panek, and I'm a research engineer at Orange. Today, together with Piotr Matysiak, we are going to share with you the output of our project called edge relocation. Basically, we are going to talk about the seamless migration of cloud-native workloads across multiple edge clusters and multiple edge servers. First, I will give you some motivation from our side: how we see the edge and why we care about the edge. Later, I will give the floor to Piotr, who will give you a deeper view of the technical details: how we manage multiple clusters and how we implemented the edge relocation procedure. Finally, I will give you some notes on the future research directions of our project.

Currently, Orange is deploying its own 5G network, 5G connectivity, because we believe that edge computing alone is not enough to achieve low-latency communication. In order to profit from low-latency communication to the edge services, we also need a really fast radio access network, and we think the integration of the 5G and edge infrastructures will give us a whole new opportunity to benefit from low-latency communication for edge computing. Nowadays, more and more industrial partners are asking us: are you already able to deploy our services on your edge infrastructure? And during these exercises with industrial partners,
we noticed that a lot of, or the majority of, the use cases the partners are developing are characterized not only by low-latency communication but also by end-user mobility. So we noticed a functional gap: how to take care of the edge application when the end user is mobile, how to guarantee service continuity for users who are on the move. We identified a missing procedure in the ecosystem: migrating the edge application from one edge cluster to another to follow the end user.

Maybe I will give you a more visual view of the problem we are dealing with. Let's imagine an autonomous-vehicle steering system deployed at an edge cluster. However, as we know, the users, the cars, are moving, and they approach different geographical locations. It may turn out that once a user changes its position, it should be handed off to another gNodeB, to another access network station. However, the edge cluster that is closest to the target radio access network is not ready to handle the end user's requests, because the edge application is not there yet. So we implemented the edge relocation procedure to move this application from one cluster to another seamlessly, in order to guarantee service continuity for the end user.

But you may ask me: okay, why not just replicate the edge application everywhere? That would solve the problem. But we did exercises with our operational team, and it turns out that deploying edge infrastructure is really costly, and we need to optimize the utilization of the resources across the whole topology. That's why we decided to implement a procedure based on the follow-me approach. So if the user requests access to some application,
we create this application there on demand. But that is not the only reason to take care of edge relocation. We should also guarantee, for instance, load balancing of our resources in order to increase the capacity of the system. We can imagine that one of our edge servers goes down due to, I don't know, maintenance, a power outage, or something like that; or we may simply want to move our application, or our infrastructure, to another cloud provider. All the use cases that I mentioned require moving the application from one cluster to another with no service disruption for the end user.

Just before giving the floor to Piotr, who will give you a view of the real objectives and contributions of the project: in fact, we implemented two procedures. The first one we called edge relocation, and it is about the seamless migration of a containerized application between clusters. The second one we called LCM, and it is the lifecycle management of the edge application, together with user management, because this procedure observes, all the time, thanks to the 5G network, the user positioning and the radio conditions. It also observes the edge infrastructure utilization, and based on this observability we make a decision whether to migrate our application or to stay at the source edge cluster.

To implement that, we built several smaller proofs of concept. We can mention, for instance, the observability controller, which constantly monitors the resource utilization of the Kubernetes clusters, and we implemented a topology controller as well, which keeps track of where each edge cluster is currently located.
Maybe not just that: it is aware of the edge infrastructure, of whether a cluster is overloaded or not. Finally, we implemented a placement controller, which uses some mathematical modeling, some mathematical optimization, to analyze whether to migrate our application or not. I will give you more detail about all the controllers in the next slides. So now the floor is yours.

Thank you, Greg, and it's a pleasure to be here. I would like to talk about why we are at this event and how we fit into it. So, we talk about the edge. You can imagine that we have different types, different layers of edge: far edge, mid edge, close edge, or, if you will, cloud, fog, or just edge. You can have different layers; call them as you want. On top of that, you'd like to have some control and management plane, some kind of orchestrator. So we are at the edge, and to place us at a Kubernetes event: we assume that each edge cluster is a Kubernetes cluster. It's a separate cluster with its own management, with its separate control plane and, of course, separate worker nodes. And, if you will, the clusters may come from different providers and be of different kinds: bare-metal clusters, managed clusters, etc. For the orchestrator, I don't want to go too deep into why we took EMCO, which stands for Edge Multi-Cluster Orchestrator.
But what we like about EMCO is that it's intent-based. There are many types of intents, and the most important for us is the placement intent. So it's crucial to remember that we have placement intents in EMCO. It is also composed of many different controllers; again, the placement controllers are the most important for us. EMCO introduced a mechanism to bring many clusters, from different providers and of different types, under an umbrella called a logical cloud, and then we can deploy applications on those clusters using the EMCO APIs. You can deploy an application on a single cluster or on a set of clusters; it depends.

What we do not like about EMCO is the lack of the functionality that our use case needs. So we extended EMCO with the relocation workflow, as was said, and in fact the relocation workflow itself can be defined in EMCO via the endpoints as an intent. In the intent, we can specify which application should be relocated and where, to which cluster. If we invoke such a relocation workflow, the following procedure runs: we collect the information about the placement intent, we update it based on the relocation intent, of course the changes are applied, and reconciliation takes its turn. But we will not delete the old instance from the previous cluster before making sure that the new instance is ready. So we check for readiness, and then we steer the traffic from the user, who can be external to the cluster, so a service mesh is not always the solution. Only once we have steered the traffic, and under the condition that this is not a shared application, that no other user is using it, can we again update the placement intent and remove the old instance.

I don't want to show any demo; we are constrained on time, and the system keeps getting bigger and bigger.
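The relocation sequence just described can be sketched roughly in code. This is a minimal illustration only: the orchestrator client `orch` and all its method names are hypothetical stand-ins, not the actual EMCO API.

```python
# Illustrative sketch of the relocation workflow described above.
# The `orch` client and its methods are hypothetical, not the real EMCO API.
import time

def relocate(orch, app, source_cluster, target_cluster, shared=False):
    """Move `app` from source to target without service disruption."""
    # 1. Read the current placement intent and add the target cluster.
    intent = orch.get_placement_intent(app)
    intent.clusters.append(target_cluster)
    orch.apply_placement_intent(app, intent)      # reconciliation starts

    # 2. Wait until the new instance is ready before touching the old one.
    while not orch.is_ready(app, target_cluster):
        time.sleep(1)

    # 3. Steer user traffic to the new instance (users may be external
    #    to the cluster, so a service mesh alone is not always enough).
    orch.steer_traffic(app, target_cluster)

    # 4. Remove the old instance only if no other user still needs it.
    if not shared:
        intent.clusters.remove(source_cluster)
        orch.apply_placement_intent(app, intent)
```

The key design point from the talk is the ordering: the new instance is created and verified ready, and traffic is steered, before the old instance is deleted.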
So I will show it in an abstracted way on the slides. Imagine that we tell the EMCO APIs that we want to deploy application X, for example on cloud one, and it gets deployed. Then we can send a second request to the EMCO APIs with the relocation intent, saying that we would like to relocate application X, for example to fog two. Following the procedure I showed you on the previous slides, the application will be relocated from cloud one, as shown, to fog two, with an almost seamless migration. It will be almost seamless; the traffic-steering part is work in progress. We have a simple solution, but there is a lot, a lot left to do there.

But as you can see, here we have a manual invocation of the procedure, which does not work, for example, in the scenario with moving vehicles, because there will be no admin to define and trigger such a relocation. So we went deeper, and we developed at least two more components: another workflow, the LCM workflow, and our own placement controller, additional to EMCO.

The LCM workflow is almost the same as the relocation: I mean, it's also intent-based, so we can define an intent for the LCM workflow where we specify the application requirements, for example for latency, for example for resources, how many resources the application needs, and probably also the type of algorithm to take into account when we try to select the best cluster. It works as follows. When we start the LCM workflow, we first subscribe for notifications. In our use case, the telco use case, we had only one source, the AMF, which is a network function of the 5G control plane, and we received notifications about user movement from that point. We could subscribe to many types of notifications from many different sources of information; it depends on the use case.
This was enough for a proof of concept. So, once we subscribe for the notifications, we enter a loop which runs for the entire lifecycle of the application. We listen for relocation notifications, and when one occurs we call the placement controller and ask it to decide where the application should be moved, where the optimal place is, and we wait for the response. When the response is received, we can make use of the relocation workflow described earlier: we generate such a relocation workflow and invoke it. So much for the LCM workflow.

The placement controller requires some sources of knowledge to make the decision. The first source is the LCM workflow, where we define the requirements; the second and third are the topology controller and the observability controller, of which we built proof-of-concept versions. Going a bit deeper: the placement controller also receives an intent as a request, the intent to select the best cluster. In this intent we pass the information provided in the LCM workflow, so the application requirements and the type of algorithm to be used to compute the placement. Of course, we also utilize the topology controller information, for example location or latency information, but it's up to the edge provider offering such a topology controller to define what information we can get. It's the same for the observability controller.
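The LCM loop just described, subscribe, listen, ask the placement controller, invoke relocation, can be sketched as follows. The notification format and the `placement` and `relocate` callables are invented for illustration; the real workflow lives inside the EMCO extension.

```python
# Rough sketch of the LCM workflow loop described above. The event
# format and the `placement`/`relocate` interfaces are illustrative
# stand-ins, not the project's real interfaces.

def lcm_loop(app, requirements, notifications, placement, relocate):
    """Run for the application's whole lifecycle, following the user."""
    current = requirements["initial_cluster"]
    for event in notifications:                  # e.g. AMF mobility events
        if event.get("type") != "user-moved":
            continue                             # ignore other notifications
        # Ask the placement controller for the optimal cluster, given the
        # application requirements and the user's new position.
        target = placement(requirements, event["location"])
        if target != current:                    # only move when needed
            relocate(app, current, target)       # reuse relocation workflow
            current = target
    return current
```

One such loop runs per deployed application, so each application's placement decisions are made independently, even though they share the same infrastructure.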
Here we observe the resources on the Kubernetes clusters. So we have at least three sources of truth, and based on them we can select one of the algorithms we defined, for example optimal or heuristic, without going deep into the implementation details, and the placement controller will respond with the best placement for our application.

To show you an example: we deploy the application manually, as before, and we also define the LCM workflow. The LCM workflow is started at deployment time and runs until the application is killed, listening for notifications. If it receives a notification, it will automatically make the decision and trigger the relocation, in this case from cloud one to cloud two. It continues, and if another relocation request is received, it will again make the decision and perform the relocation. We can do this as many times as we want; I think you get the idea. The LCM workflow can be started for many applications, so we can deploy any number of applications, and the decisions will be made separately for each, of course on the same infrastructure, so the decision for one application will somehow affect the decision for another. But that's the idea. For us, this is not yet targeted at production; this is open source, and it is shown here as a proof of concept.
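As one illustration of what such a heuristic might look like, a placement controller could filter candidate clusters by a latency requirement from the intent and then score the survivors on observed load. All field names, weights, and thresholds below are invented for the sketch; they are not the project's actual algorithm.

```python
# Toy placement heuristic: pick the candidate cluster with the best
# weighted score over latency and free resources. Field names, weights,
# and thresholds are illustrative, not the project's real algorithm.

def best_cluster(candidates, max_latency_ms, w_latency=0.6, w_load=0.4):
    """candidates: {name: {"latency_ms": ..., "cpu_used": 0-1, "mem_used": 0-1}}"""
    best, best_score = None, float("-inf")
    for name, metrics in candidates.items():
        if metrics["latency_ms"] > max_latency_ms:
            continue                              # hard requirement from intent
        load = (metrics["cpu_used"] + metrics["mem_used"]) / 2
        # Lower latency and lower load are both better, so negate them.
        score = -w_latency * metrics["latency_ms"] - w_load * 100 * load
        if score > best_score:
            best, best_score = name, score
    return best                                   # None if nothing qualifies
```

In the talk's architecture, the latency and location figures would come from the topology controller and the CPU/memory figures from the observability controller.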
So it's still work in progress. As an example of further work, we are also considering a machine learning, reinforcement learning agent for the placement controller to interact with, and based on that we believe we can further optimize the decision about which cluster is the best. I think I have shown you the general view and our process, how we got here, and now I can hand back to Greg to continue.

Thank you. I don't know, Piotr, if you mentioned that the relocation procedure we implemented is already open source. So if you are dealing with similar problems of migrating applications across clusters, you can just take it from the open source; the link is included in the presentation. It is currently part of the EMCO, the Edge Multi-Cluster Orchestrator, project. In order to have some fun with the procedure, we decided to integrate an open-source 5G network with a MEC system that is partially open source and partially built by us in our project. As you can see, we have a 5G core that keeps informing us about the user position and about the radio conditions for the user. We can constantly monitor the status of our end user, whether there is a handoff, whether it is moving or not, because, as you can see, for a user to access that edge application, it needs to go through the 5G access network.
So we also implemented the observability controller, which constantly observes the resource consumption on the different edge clusters, and finally we implemented the placement controller, as Piotr mentioned. This is where we implemented the heuristic and reinforcement learning algorithms that decide whether to relocate or not, based on the intents received from the 5G and edge infrastructure.

Okay, topology mapping, how we did that. For the 5G control plane, we took advantage of the free5GC project; it also implements the data plane for the access network. For the simulator of the end user, we used the UERANSIM project. Our edge clusters, the edge hosts, were implemented as single, separate Kubernetes clusters. For the service orchestrator in this architecture, we used the EMCO project, which I mentioned before, and for the observability controller we implemented a Prometheus plus Grafana Mimir solution. The placement controller is our own child, I would say, and the same goes for the network and MEC topology.

This is the last slide. I'm not going to go too deep into it, because this is more mathematical modeling, but as Piotr and I mentioned before, we receive multiple notifications from the 5G control plane and from the edge infrastructure. So, first of all, we implemented the heuristic-based approach for deciding whether to migrate our application or not, based on some CPU and memory measurements and on some latency observations. Our current ongoing work is implementing a reinforcement learning approach: let the agent learn our edge topology, let it learn how to react in our edge mobility case. So thank you a lot. If you have any questions, please.

Hi. Thank you for the presentation. I have a question; maybe I missed it, but maybe you did not describe it: how do you describe the application, the services, in the first place? I mean, what kind of manifest format are you using?
Are you using Terraform or something like that?

Could you repeat? It's hard to hear on the stage.

Yes, sorry about that. So my question was: when you describe the application, what kind of manifest format, what kind of domain-specific language are you using, if any? Are you using Terraform or something like that?

Yeah, in fact, right now we do not care about the infrastructure creation, so we assume the infrastructure is already there; we just register the clusters. For the other manifests, I believe everything is in YAML format, which is then, you know, parsed in the programming languages.

Yeah, we also use Helm packages to describe the application, so the Helm package is the input for the EMCO project.