Hello everybody, I'm Sundar Nadathur from Intel, and I'm happy to co-present this talk with Sandeep Sharma from Aarna Networks. In this talk, we will first introduce EMCO and describe how it manages workflows. Then Sandeep will explain the work that Aarna has done in using EMCO workflows to automate policy-based service assurance for cloud network workloads.

The Edge Multi-Cluster Orchestrator (EMCO) is an open-source project to deploy and manage cloud-native applications and network functions across Kubernetes clusters. It is designed to handle geo-distributed, microservice-based workloads which need to be deployed at scale across geographic boundaries: in public or private clouds, in telco or co-location edges, and in on-prem data centers.

EMCO is unique among app orchestrators in that it can not only deploy apps as Helm charts but also automate the configuration of the environment around the application. For example, if the Kubernetes cluster has an ingress gateway, it may need to be provisioned with the HTTP routes for the application. Also, the enterprise deploying the app may have its own certificate authority. If so, an intermediate CA has to be set up for the cluster, and certificates for mTLS and other needs must be derived from it. EMCO automates all these scenarios.

EMCO has been designed to be extensible. It has a microservice-based architecture and can be extended by adding more controllers or custom workflows. EMCO can be used as a building block for larger stacks such as the 5G Super Blueprint from the Linux Foundation.

EMCO has been a separate open-source project since 2019, when it originated as a module of ONAP. Since last year, it has been a part of the Linux Foundation. We have participation from several leading industry players, including telcos. Commercial support is available, for example, from Aarna Networks as part of the AMCOP platform. EMCO is a part of the 5G Super Blueprint from Linux Foundation Networking. We have an active ecosystem, and there are ongoing engagements with some telcos for deployment.

EMCO can be extended by adding new controllers that execute as part of the app deployment lifecycle. But you can also extend EMCO functionality with workflows, for situations where actions need to be taken outside of application lifecycle events, or when you want to limit access to EMCO's shared resources and databases. EMCO has supported workflows using the Temporal workflow engine since its 22.03 release. You can launch workflows, monitor their status, and cancel them, all using EMCO APIs.

Temporal is a widely used open-source engine that offers resilience and proven scale. You can think of Temporal workflows as a set of distributed processes, each of which executes several complex stateful steps. These workflows may run for days or months, and may even be cron jobs that repeat periodically. The Temporal framework provides many benefits. One is resilience, since the Temporal server records the state of each task and restores that state in the event of a failure. It provides a standard way to set timeout values and retry policies for workflows and their constituent activities. It also helps with observability of workflows, with status checks and queries, and it is known to scale to a large number of workflows and tasks. These workflows can be written in many common languages.

Let us now look at how Temporal can be integrated with EMCO. In a cloud-native deployment, the Temporal server will be deployed as a container running in one or more Kubernetes pods.
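Before we get to the rest of the deployment, here is a minimal sketch, in Go, of what a workflow with the timeout and retry settings just described might look like. The names MigrateAppWorkflow, UndeployApp, and DeployApp are illustrative assumptions, not EMCO's actual code.

```go
// A minimal sketch of a Temporal workflow in Go, assuming the
// Temporal Go SDK. Workflow and activity names are hypothetical.
package workflows

import (
	"time"

	"go.temporal.io/sdk/temporal"
	"go.temporal.io/sdk/workflow"
)

// MigrateAppWorkflow is a hypothetical long-running workflow. Temporal
// persists its state after every step, so it survives worker restarts.
func MigrateAppWorkflow(ctx workflow.Context, app string) error {
	// Per-activity timeout and retry policy, set in a standard way.
	ao := workflow.ActivityOptions{
		StartToCloseTimeout: 10 * time.Minute,
		RetryPolicy: &temporal.RetryPolicy{
			InitialInterval:    time.Second,
			BackoffCoefficient: 2.0,
			MaximumAttempts:    5,
		},
	}
	ctx = workflow.WithActivityOptions(ctx, ao)

	// Each activity is a stateful, retriable step recorded by the server.
	if err := workflow.ExecuteActivity(ctx, "UndeployApp", app).Get(ctx, nil); err != nil {
		return err
	}
	return workflow.ExecuteActivity(ctx, "DeployApp", app).Get(ctx, nil)
}
```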
The worker processes will also be deployed as Kubernetes pods, perhaps in a different cluster. They may be deployed with a Helm chart across one or more clusters, possibly by EMCO itself. A single worker can execute many tasks from different workflows. The workflow client will also be deployed as a Kubernetes pod, and it may be in yet another cluster. One pod may contain more than one workflow client, and each client may possibly handle more than one workflow. EMCO needs a way to invoke this workflow client, so we expect the client to be packaged with an HTTP server. This HTTP server accepts an HTTP POST call to trigger the right workflow client; the workflow parameters are included in the body of the POST call.

EMCO is composed of many microservices, and it may be deployed in a different Kubernetes cluster than the workflow clients. We have added the workflow manager as a separate microservice that allows EMCO users to define intents for workflows, including the workflow client location as well as timeouts and retry policies for each activity and for the whole workflow. The workflow manager initiates a workflow by sending an HTTP POST to the workflow client. The workflow manager can also perform status checks and execute queries on a workflow by directly calling the Temporal server, and it can cancel or terminate a workflow. I now hand over the floor to Sandeep. Thank you.

Thank you, Sundar. Hi, all. My name is Sandeep, I'm from Aarna Networks, and I'm going to demonstrate a service assurance use case using EMCO. For this purpose, we have introduced a new controller in EMCO, called the policy controller. This controller enables users to define closed loops via EMCO intents and deploy those closed loops in the target clusters. The policy engine that we have used in this controller is OPA, the Open Policy Agent. OPA enables users to define policies declaratively, and it separates the policy code from the application or microservice code.

With that brief introduction, I'll describe what the EMCO policy controller is about. The EMCO policy controller is a new microservice introduced in EMCO, and it acts as the policy enforcement point. As I have already described, it uses OPA as the policy engine. There are two parts to this policy controller.

One part is the policy agents. These agents are deployed in the target clusters, and their sole responsibility is to collect metrics from different sources; Prometheus is an example, and in our demonstration we are using Prometheus. Each agent is implemented as a Kubernetes controller, and we define custom resources specifying the kinds of metrics that we want to collect, the intervals of metric collection, and so on. These agents collect metrics and enrich them with the specific information which the EMCO controller in the central data center requires in order to complete the closed loop.

The other part is the main policy controller, which sits in the central data center. Its main responsibilities are to serve the policy intent APIs, which the user specifies, and to call the policy engine itself to evaluate the policies against the KPIs and metrics that it receives from the agents in the target clusters. After the evaluation, the policy controller also calls the actors.
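Both the workflow manager and, as we will see next, the policy controller trigger a workflow the same way: an HTTP POST to the workflow client. Here is a minimal sketch of such a client packaged with an HTTP server, assuming the Temporal Go SDK; the route /invoke/migrate, the Temporal server address, the task queue name, and the parameter fields are illustrative assumptions, not EMCO's actual interface.

```go
// A sketch of a workflow client wrapped in an HTTP server: a POST with
// workflow parameters in the body starts a Temporal workflow.
package main

import (
	"encoding/json"
	"log"
	"net/http"

	"go.temporal.io/sdk/client"
)

type workflowParams struct {
	AppName       string `json:"appName"`
	TargetCluster string `json:"targetCluster"`
}

func main() {
	// Connect to the Temporal server (address is an assumption).
	c, err := client.Dial(client.Options{HostPort: "temporal-frontend:7233"})
	if err != nil {
		log.Fatal(err)
	}
	defer c.Close()

	http.HandleFunc("/invoke/migrate", func(w http.ResponseWriter, r *http.Request) {
		var p workflowParams
		if err := json.NewDecoder(r.Body).Decode(&p); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// Start the workflow; EMCO's workflow manager would issue this POST.
		opts := client.StartWorkflowOptions{TaskQueue: "MIGRATION_TASK_Q"}
		run, err := c.ExecuteWorkflow(r.Context(), opts, "MigrateAppWorkflow", p)
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		json.NewEncoder(w).Encode(map[string]string{"workflowID": run.GetID()})
	})
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```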
As we know, in a typical closed-loop scenario there is a component which evaluates the policy, and then there is a component which takes the actual auto-corrective action. In our case, the component doing the auto-correction is the Temporal workflow engine in EMCO, and the auto-corrective action we take is to migrate an application from one cluster to another. So the Temporal workflow engine, as the actor, is configured in the policy controller, as I will show in the demonstration. Those are the responsibilities of the policy controller, which sits in the central data center.

This slide explains the policy controller and Temporal workflow integration. As I have already explained, the actor in this case is the Temporal workflow engine in EMCO. After evaluating the policies, the policy controller calls a specific workflow in the Temporal workflow engine to perform the auto-corrective action.

This diagram explains the end-to-end closed-loop flow. The policy controller in the central data center consumes events from the agents in the target clusters and evaluates the policies against the metrics present in those events. If the policy condition is met, the policy controller calls the actor, which is the Temporal workflow.

With this explanation, I will jump to the demo and demonstrate the service assurance use case using EMCO. In this use case, we are going to consume the memory utilization of an application as a metric. The condition encoded in the policy is that if the memory utilization is above a certain value, we migrate that application from one cluster to the other. In this demonstration, we are going to show the policy intent specification and the workflow intent specification, since the workflow is the actor in this service assurance use case.

We will start by showing the custom resource that is created for collecting the metrics. As I explained, the agents in the target clusters are implemented as Kubernetes controllers. At this point, we can look at one of the custom resources that we have created. The spec is very simple: all we are saying is that we want to collect CPU- and memory-related metrics. Once this CR is specified, the controller reconciles it and starts collecting those metrics, through Prometheus, from all the pods that were created via EMCO. After collecting the metrics, the agent enriches the events by adding the deployment ID of the pods, so that EMCO in the central data center can relate the pods to their composite applications and deployment intent groups. This matters because the policy intent and the workflow intent are specified at the deployment intent group level; this is how the whole relationship is resolved and maintained.

After this, we are going to prepare the actor. By that I mean we have to specify the Temporal workflow intent, which Sundar has already described. This is the workflow definition. It is a very simple workflow: all it does is migrate an application from one cluster to another, and it will be triggered by the action taken after policy evaluation. Now we can start applying the intents. This YAML defines the workflow intent, and we are applying it.
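As an aside, here is a sketch of what the agent's custom resource from the first step might look like as Go types in a kubebuilder-style controller. The type and field names are hypothetical, not the actual EMCO agent code.

```go
// Hypothetical Go types for the metrics-collection custom resource
// reconciled by the agent in each target cluster.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// MetricsCollectorSpec declares which metrics to collect and how often.
type MetricsCollectorSpec struct {
	// Metrics lists the metric families to collect, e.g. "cpu", "memory".
	Metrics []string `json:"metrics"`
	// IntervalSeconds is the Prometheus query interval.
	IntervalSeconds int32 `json:"intervalSeconds"`
}

// MetricsCollector is the custom resource the agent reconciles; on each
// interval it queries Prometheus for pods deployed via EMCO and enriches
// every event with the pod's EMCO deployment ID before forwarding it to
// the policy controller in the central data center.
type MetricsCollector struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              MetricsCollectorSpec `json:"spec"`
}
```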
After the workflow intent, we now specify the policy intent. This is the definition of the policy intent. The intent has to know the endpoint of the policy controller and the name of the workflow which it has to trigger. And this is the policy definition itself. We are going to change this number dynamically; it is the memory usage in bytes that the metric is compared against. We will set it to a value low enough that the condition is hit.

After changing the memory-bytes threshold, we see that the policy has been hit, and it has already triggered the actor to take the action. On the right-hand side of the screen, we see that the application has already started migrating: it is terminating in cluster one and starting to come up in cluster two. That concludes my demo. Thank you.
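To make the evaluation step concrete, here is a minimal sketch of how a policy controller might evaluate the demo's memory condition with OPA's Go API. The Rego package name, the input field names, the threshold values, and the embedded policy (written in pre-1.0 Rego syntax, contemporary with this talk) are illustrative assumptions, not the actual EMCO policy.

```go
// A sketch of policy evaluation with the OPA Go API: migrate when the
// observed memory usage exceeds the configured threshold.
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/open-policy-agent/opa/rego"
)

// module mirrors the demo's condition (hypothetical package and fields).
const module = `
package assurance

default migrate = false

migrate {
    input.memory_usage_bytes > input.threshold_bytes
}
`

func main() {
	ctx := context.Background()
	query, err := rego.New(
		rego.Query("data.assurance.migrate"),
		rego.Module("assurance.rego", module),
	).PrepareForEval(ctx)
	if err != nil {
		log.Fatal(err)
	}

	// Input assembled from an enriched agent event (values are made up).
	input := map[string]interface{}{
		"memory_usage_bytes": 512_000_000,
		"threshold_bytes":    256_000_000,
	}
	rs, err := query.Eval(ctx, rego.EvalInput(input))
	if err != nil {
		log.Fatal(err)
	}
	if rs.Allowed() {
		// Here the controller would POST to the workflow client to
		// trigger the migration workflow (the actor).
		fmt.Println("policy hit: triggering migration workflow")
	}
}
```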