Hello, greetings. Good morning. Did anyone post the meeting notes into the chat yet? I'll do that. Thanks, Taylor. Yeah, you're welcome. Would you like to lead today? Just so people know, we'll probably wait till five after before we get started, so people have time to join. Please add yourself to the meeting notes, and if you have any topic you'd like to discuss, you can put it in the agenda area of the meeting notes. So thanks to everyone that's joining right now. If you can just add your name to the meeting minutes and add the agenda items you'd like to talk about, we'll get started in about a minute here. Okay, it's five after, so we can probably get started. So welcome, everybody, to the telecom user group meeting. We meet on the first Monday of every month, and if you want to add your name and any agenda items to the meeting minutes, you can always do that in this Google Doc. So with that being said, before we jump in, is there anything that anyone would like to add to the agenda? Okay. Thanks, everyone that is showing up. So just a reminder about upcoming events: the CNF Working Group meets every week on Mondays at 1600 UTC. KubeCon Europe just passed, and there are a bunch of great recordings about telecommunications and the cloud native telecommunications industry; the videos are now up on YouTube. KubeCon North America is coming up this fall, and co-located alongside it will be the Open Networking & Edge Summit and Kubernetes on Edge Day. There's more information in the links here. So with that, today we have Nitish to talk about Orkestra, which is a cloud native release orchestration and lifecycle management platform for fine-grained orchestration of groups of interdependent applications. Yeah, Nitish, would you like to take over? Thanks. Yeah, let me share my screen. Let me know if you can see the presentation. Yep, we can see it. Perfect. Hi folks. My name is Nitish.
I'm a senior software engineer at Microsoft, in a newly formed team called Azure for Operators. This talk is about an open source project that I've been working on for the past year along with the team; it's called Azure Orkestra. Orkestra is an orchestration platform that's built for system-level releases of complex, mission-critical applications, which can also be seen as tightly coupled Helm applications, because the delivery mechanism today is Helm charts when we deliver our apps to our customers. Previously I was a technology architect in the office of the CTO at Affirmed Networks; Affirmed Networks was acquired by Microsoft early last year. I'm a Kubernetes enthusiast, and I've been working with Kubernetes for the past three to four years now. Prior to joining the office of the CTO, I was working with service meshes, and this was Istio to support our 5G platform, using all the features from the service mesh. I'm also an avid gopher. And the funny thing about me is that I love to hoard swag; my entire wardrobe is just graphic tees from KubeCon and GopherCon. So, I'm talking about a couple of topics today, and I'll keep it short, because I do wish to show you a demo of Orkestra with a slightly less complex app as compared to the network functions that we ship to our customers. As an overview, Orkestra is an open source project, but it didn't start as something that was open source. It was tailored to, I'm going to say, Affirmed Networks' workloads, because that's where we started our journey; now it's Microsoft. It was tailored to our workloads and how we ship the network functions that we build, which are 5G core network functions that are sold and shipped to service providers. But over time we realized there were components in here that would be of some benefit to the rest of the Kubernetes community, so we started abstracting out the pieces that were not specific to the Affirmed workloads, and we built Orkestra out of those abstractions.
The primary goal of the platform is release orchestration of tightly coupled, highly dependent applications or groups of applications. Release orchestration would be the rollouts and rollbacks, and then you have the lifecycle management, which would be around watching the state of the system, our network functions and other components, and auto-remediation on failures, going back to the last successful deployment. And everything should be zero touch. Like I said, it's a group of Helm applications, so think of it as a bunch of Helm artifacts, Helm charts, that are your applications, and a lot of these Helm charts can also have sub-charts, which could be the microservices that power your network function. The operator is cluster-scoped; it doesn't work across clusters. It's not a multi-cluster solution, though there is some work in progress to build an abstraction on top of Orkestra to make this multi-homed across multiple clusters, but this one just focuses on a single cluster and deploying components on the one Kubernetes cluster. As for the architecture, the whole solution is built using some popular CNCF projects. We have Argo Workflows, which lets us do dependency-based, DAG-based workflows; you have ChartMuseum and Helm, which serve as your Helm registry, with Helm being your vehicle of delivery; and helm-controller to automate the whole set of Helm operations. helm-controller has some nifty features where it can do remediation for you; it's a declarative spec, so you no longer have to deal with imperative commands using the Helm CLI to deploy your applications. Orkestra itself is built using the Kubebuilder project. It's written in Go, and it leverages the official client-go and controller-runtime libraries, so no surprises there; it uses what Kubebuilder provides.
The input to Orkestra is a custom resource called ApplicationGroup, and it's a collection of applications with dependencies declared among them. The primary use case is the in-service upgrades of mission-critical systems, and especially as we're in the telco group, it's around the 5G core network functions. We build a bunch of network functions that we do not manage; these network functions are delivered into an air-gapped environment, and you no longer have any visibility, so you need something that does zero-touch deployment. You need it to be reliable and completely automated, so that's the main use case. When we build our applications, you have your network functions, but you also have a whole bunch of supporting components that, as ISVs, we have to provide, which may or may not be something that the customer is going to manage for us. So you have to ship an entire group of applications, supporting components and infra components, that go along with the NFs. I was listening to a session from a previous meeting, and there's a line in there that caught my attention: network functions are not just Helm applications. They're more than that; it's not like the enterprise world, where you can have a single chart for a single application. It's a little more complex. So another use case of Orkestra was to provide rollout strategies. You have your standard Kubernetes rollouts, but we want to enhance that. Especially in mission-critical systems like 5G, you want to provide canary and blue-green deployments, so Orkestra can leverage some service mesh features behind the scenes to do canary and blue-green deployments.
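As a rough sketch, an ApplicationGroup for two applications where one depends on the other might look like the following. The field names here follow the general shape of the Orkestra CRD but are illustrative and may differ across versions of the project; the registry URL is hypothetical.

```yaml
# Illustrative ApplicationGroup sketch; field names are approximate.
apiVersion: orkestra.azure.microsoft.com/v1alpha1
kind: ApplicationGroup
metadata:
  name: bookinfo-group
spec:
  applications:
    - name: ambassador            # no dependencies: deployed first
      spec:
        chart:
          url: http://chartmuseum.example/charts   # hypothetical registry URL
          name: ambassador
          version: 6.6.0
    - name: bookinfo
      dependencies: [ambassador]  # bookinfo waits for ambassador to succeed
      spec:
        chart:
          url: http://chartmuseum.example/charts
          name: bookinfo
          version: 1.0.0
```

Applying a resource of this shape is all a GitOps tool would need to do; the controller derives the rollout order from the `dependencies` lists.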
I think the plugin ecosystem allows users to build custom strategies for rollouts as well, so you can build your entire pipeline for how you want to roll your application out, including the kind of telemetry data that you want to look at that's not supported by the service meshes, or rather the progressive delivery frameworks. Another feature is the auto-remediation: as soon as a failure is detected, Orkestra is capable of remediating the errors and rolling back to the last successful spec, if any. Auto-remediation lets you fail fast, so you can catch errors quickly, and rather than cause a lot of disruption, you can just roll back to your last working set of applications. We have two planned components that we're working on; rather, we actually started on them last week. Orkestra will support quality gates. We would use quality gates for promotions, manual and automated, which gives you control over how you release your applications across the customer's network, so it could be the staging labs and canary labs. And continuous testing as an integrated part of Orkestra: rather than having your own test infrastructure running outside of your release orchestrator, you would have this continuous testing built into these rollouts, along with automated canary analysis, where Orkestra would also do its own analysis of any custom metrics that it needs to look at.
The reason network functions are so complex is because they rely on a lot of infra and PaaS components that are specific to the vendor's network function. Different vendors are going to deploy their network functions on a single cluster, or across a network of a customer's clusters, the service provider's clusters, and everyone has to bring their own supporting components. Just because of that, we start having tight dependencies, really complex dependencies, between these applications and all the supporting components. Here I just show an example of how we deploy our workloads. We have strong, hard dependencies on some RBAC policies that need to be configured before anything else can be deployed. These aren't pods or microservices; these are base resources that you need to set up before anything else can be started. Similarly, you have OPA, which you need for mutation and validation, so those webhooks need to be up and running before anything else can start. MetalLB is another case when you're doing this on bare metal servers, bare metal clusters. And then we do leverage the Istio service mesh, which means the service mesh needs to be up before you can start any of the network functions, because of the whole sidecar injection approach and the configuration of Envoy and so on. There are other components for observability and security that we need set up before our NFs can be spun up. So this kind of drove us to the point that we needed to automate the whole solution, deploying things in a particular order following a dependency graph. Some of the features of Orkestra: it's built for Kubernetes; it uses the operator pattern to deploy controllers to the Kubernetes cluster; and it's completely declarative because of the custom resource approach, where you provide the state that you want to be in and the controller reconciles and gets you to that state.
And it's GitOps compatible: it can plug into any GitOps framework, things like Argo CD or Flux CD. It's pretty much agnostic to that; all we need is for that custom resource to be deployed in the cluster, and Orkestra takes it from there. The way it works is by generating DAG-based workflows. This is where we leverage Argo Workflows. What Orkestra can do is, one, our applications, the Helm charts themselves, can have dependencies among themselves. You could think of the first layer, or the first set of layers, as the infra components, and then as you go through the DAG, the final thing is your network functions, or a set of dependent network functions, being deployed. So it renders a DAG-based workflow for these applications. As an optional feature, though for us it's kind of a mandatory feature, it also renders DAG-based workflows for the sub-charts that are contained within the application chart. Our applications are comprised of a lot of microservices within an NF chart, so it deploys those as Helm sub-charts, and they need to be coordinated as well. I should take that back: they don't need to be coordinated, but when you do in-service upgrades, it's good to have them ordered following a DAG, where the most crucial elements get updated first or last, however you want it, so that you can reduce the blast radius when things go wrong. You catch the issues early and then you roll back, rather than having everything blow up at the same time. The reason this is important in the service provider world is because they're not going to do continuous deployment as it's done in the enterprise world. It's not taking updates as they come in and rolling out applications.
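The DAG ordering described here can be sketched with a plain topological sort. This is only an illustration of the idea, not Orkestra's actual scheduler; the example graph is the Bookinfo sub-chart layout used later in the demo.

```python
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Bookinfo sub-chart dependency graph from the demo:
# details and ratings have no dependencies, reviews depends on both,
# and productpage depends on reviews.
deps = {
    "details": set(),
    "ratings": set(),
    "reviews": {"details", "ratings"},
    "productpage": {"reviews"},
}

# Forward (rollout) order: every chart appears after its dependencies.
forward = list(TopologicalSorter(deps).static_order())

# Reverse (delete/rollback) order: the same DAG walked backwards, so
# dependents are torn down before the charts they depend on.
reverse = list(reversed(forward))

print("rollout:", forward)
print("teardown:", reverse)
```

Orkestra renders the equivalent ordering as an Argo Workflows DAG rather than computing it in-process like this, but the dependency semantics are the same.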
Instead, what we have realized after talking to our partners is that they would do it on a schedule, which means more than one application or infra component could be ready with upgrades whenever they schedule the releases. So it's not just one application going in; there's a whole bunch of Helm charts being upgraded along with, say, the service mesh and the observability and security components. The application DAG helps with that: the DAG is always followed, you honor the dependency graph and deploy the infra components first, and then your applications come in. And with the sub-charts, again, we break it down so that you can catch any failures at the microservice level. I made a mention of the GitOps frameworks; it plugs into any CI/CD framework as well. As for the plugin ecosystem, you bring your own executor container image. The way it works today is that each of the DAG nodes is responsible for generating a custom resource that's picked up by helm-controller. The custom resource is called HelmRelease, and it's very similar to what you would do with a Helm CLI command: you provide the values, you provide any flags that Helm needs, and helm-controller reconciles and makes sure it does all the Helm actions that you need. Today it just deploys the HelmRelease object and waits for the status before the workflow node succeeds, but you can bring your own workflow container where you could do a lot of other things: you could go in and look at monitors, and monitor the state for a bit, before deeming the node a success and moving on to the next node in the DAG. So it provides a safe and reliable zero-touch system for in-service upgrades. And this is a simple example that I'll be demoing. We do our upgrades following the forward workflow, but the reversal is not a blanket rollback.
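The hand-off just described, a HelmRelease manifest passed to the executor as an encoded string, can be mimicked in a few lines. The manifest below is a hypothetical sketch, not the exact CR Orkestra emits; the real object is helm-controller's HelmRelease type and carries more fields, and the registry URL is made up.

```python
import base64

# Hypothetical HelmRelease-style manifest (illustrative fields only).
manifest = """\
apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: bookinfo
spec:
  chart:
    repository: http://chartmuseum.example/charts  # hypothetical registry URL
    name: bookinfo
    version: 1.0.0
"""

# The workflow node receives the manifest as a base64-encoded string...
encoded = base64.b64encode(manifest.encode()).decode()

# ...and the executor decodes it back into plain YAML before applying it
# to the cluster, where helm-controller picks it up.
decoded = base64.b64decode(encoded).decode()
print(decoded == manifest)
```

This mirrors what you see in the demo logs below: decoding the executor's input yields an ordinary HelmRelease object.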
Everything won't be rolled back to the previous versions all together; it instead reverses the workflow, again honoring the dependencies, and either, if this is a delete, starts deleting everything in the reverse order, or, if it's a rollback, does the same thing. Our ecosystem is becoming larger every day. We just spoke with the Keptn team and we've started the integration there; Keptn is another CNCF project that can do validation and quality gates. So Keptn would be an integral part of Orkestra over the next few months as we go on, and Keptn itself could leverage progressive delivery frameworks. Orkestra can do this without Keptn as well, so we're going to put up some examples of using Argo Rollouts, which leverages Istio, to do some progressive delivery and automated analysis to promote your applications. This is the roadmap: like I said, Keptn is the first item on our roadmap, and then we have Argo Rollouts as an example that'll be added as well. You can get to the GitHub repo following these links, and we do have official docs as well, especially for admins and for contributors who are interested in contributing; there's a whole bunch of resources on the website that you can use to get yourself set up and start contributing if you wish to. So let me quickly jump to a demo. Do we have enough time? Yeah, we do. Okay, perfect. Right, so what we have here is, I'm going to just quickly show you our application group. This is the custom resource that Orkestra picks up, and you can see that it just has a list of applications. This is your application group with two applications: you have Ambassador, and Bookinfo, which is based on the Istio example, where the Bookinfo app has a whole bunch of microservices, product page, reviews, ratings, and details, and these are treated as sub-charts in the application chart.
And in this example, Bookinfo is dependent on Ambassador, which means Ambassador is spun up and rolled out before Bookinfo can be started; we have to make sure that everything comes up, and only then does Bookinfo get kicked off. But then within the Bookinfo application we have dependencies among the sub-charts as well. In here you have details and ratings with no dependencies, which means they can spin up first; then you have reviews, which depends on those two sub-charts; and then product page, which depends on reviews. Everything is driven through Helm, so you have the standard Helm options. You can specify what target namespace to go to, and what override values you want to use; in this case we have the replica counts that we override on the default charts and sub-charts. So, let me create this. What we have here is that Orkestra picked up the application group and generated a workflow out of it. These DAG nodes are intermediate nodes, and within them are the executor containers, so you have Ambassador spinning up now; on the right you can see it's starting the Ambassador pods. The way Orkestra generated the object for these executor containers is that it passed in a HelmRelease object to our executor as a base64-encoded string, and when you decode that, it's just a HelmRelease object that helm-controller picks up. And this executor container down here, if you look at the logs, is just waiting for the HelmRelease to go into a succeeded state. So again, you have Ambassador, the primary application, that's succeeded, and now it's gone into a sub-chart DAG. Details, reviews, ratings, and product page are sub-charts within the Bookinfo application. Since the sub-charts are dependencies, the dependencies go first, before the application chart does.
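The sub-chart section of the demo's custom resource might look roughly like this. The field names are approximate and illustrative; the dependency graph (details and ratings first, then reviews, then product page) and the replica-count override are the ones from the demo.

```yaml
# Illustrative sub-chart dependency section for the Bookinfo application.
subcharts:
  - name: details        # no dependencies: spins up first
  - name: ratings        # no dependencies: spins up first
  - name: reviews
    dependencies: [details, ratings]
  - name: productpage
    dependencies: [reviews]
release:
  targetNamespace: bookinfo
  values:
    productpage:
      replicaCount: 1    # overrides the chart default
    reviews:
      replicaCount: 1
```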
The application chart may or may not have any resources of its own being deployed, but if it does, if it has deployments and config maps, all of those will be created after the sub-charts are deployed. This is different compared with Helm itself: when you do a helm install, everything gets started together, whereas Orkestra staggers everything out. So we have a successful workflow over here. I can go in and make a modification, and this would be similar to releases changing over time and the CD system picking them up. Let's go down here and change the replica counts, say for product page and reviews, so both of them should see changes. What's going to happen is that this workflow is going to kick off again. It's completely idempotent, which means that the HelmReleases that haven't changed will not be affected at all; the executor container will just see those HelmReleases in a succeeded state, do nothing, and move on to the next node. So now we have the sub-charts spinning up. Quickly, you don't have to watch the entire thing, but if I just look at the HelmRelease versions across the namespaces, you'll see that reviews and product page, the two sub-charts whose values we touched, have moved on to revision 2. Finally, the thing I wanted to show is the rollback, so let me show the remediation. If I go in and, let's say, set it to a chart version that doesn't exist, it should roll back to the last successful spec. This should fail because it couldn't find the chart, and it will spin the workflow up again just to make sure everything's looking right. And if you go back to the spec, you should see that it's rolled back to version six. So that's another one. Now let's let this roll out so that we can show the reverse workflow, which should be quick. Let this turn green and then succeed. Now I can go in and delete the application group.
What we should see, when we look at the workflows, is a reverse workflow that's kicked off. The forward workflow, if it was in the middle of reconciling, would be suspended, and the reversal would be started. If you go back and look at the pods, they'll be terminating in the opposite order of how they were deployed. We have the Bookinfo product page and reviews going first, in the reverse order, and being deleted. So yeah, that's what I have for the demo. We're building some more demos to show features like progressive delivery and quality gates; those should be out in a short while. But we are at our MVP stage, and things are a lot more stable than they used to be, so feel free to play around with the Bookinfo app, and try it with your own application as well; we'd love to get some feedback. And also, if you wish to contribute, there are a lot of open issues, small and large, and we would really appreciate anyone's contribution. All right, any questions? Hi, thank you, Nitish, it's a really great presentation. I have a lot of questions; I'll start maybe with one. You know, you started the presentation talking about the challenges of CNFs and telco workloads. I wonder how you see Orkestra directly connecting to them, and then I'll have another question. So you mean the orchestration of those CNFs? Well, you talked about some challenges of CNFs having to do with integration with the platform, with the host. Yeah, so let me go back to this picture. This is based on my experience at Affirmed. You could think of a data plane component like the UPF; in terms of the platform, the UPF would need DPDK on your platform as well. And when you get into the Kubernetes world, there's a whole bunch of CNI frameworks, and you have other operators, like Intel building features to help with some hardware acceleration, DPDK again.
I'm not the expert on that, but I have seen these dependencies handled through Kubernetes operators. This kind of becomes a prerequisite before our NFs can be deployed; you can't just go in and deploy a UPF and expect it to work. Things need to be set up first, and Orkestra honors that dependency through the DAG, so when you're rolling things out, it can follow that order where all the prerequisites are started. In this case, from the platform side, you have the service mesh that needs to be up, and you have OPA to do the mutations that we need; we don't leverage it for a lot of security features. And, like I said, from what I remember there were things like NUMA mappers and a whole dynamic set of controllers that Intel builds around this. So that's what I mean by Orkestra handling dependencies: it's not tied to what you're deploying, but as long as you have it packaged as a Helm chart, it'll follow the DAG. Does that answer your question? Oh, somewhat, yeah. I guess my follow-up is related to that. So, I mean, the DAG is a feature of Argo Workflows; this is not maybe specific to Orkestra, and of course, one of the things you can do with Argo is that you don't have to call Helm. Your tasks can be anything you want them to be, including configuration tasks, things that you absolutely cannot do with Helm, things like installing things on the host, installing CNI plugins, for example, and configuring them. Argo would also be very useful, or possibly useful, for solving issues with configuration, right? So it's not just about deploying lots of Kubernetes resources, because in the end, as you showed, what we have here are all these Kubernetes resources, deployments, pods, and so on, being deployed for us in order. So, two issues there. One:
I think we would all like our applications to be more cloud native, in the sense that they shouldn't depend on order; that is, things should just come up, and if they do have dependencies, they should be autonomous, in the sense that the component should be able to make sure that if its dependencies don't exist, it won't do any work, right? Ideally you'd want to be able to deploy everything declaratively, without a workflow, and have everything come up, but of course reality isn't always like that. One of the aspects has to do with configuration tasks, things like NETCONF configuration for various components, things that have to happen in order, and it's useful to sometimes break them into building blocks that you can indeed put in an acyclic graph. But one of the problems with Helm, and I think Helm is very problematic, to be honest, is that in the end, you know, it's a text templating tool, right? It creates a Kubernetes manifest for you in YAML, and those become the Kubernetes resources. But we need much more than just that. I mentioned that kind of configuration task, but there are all kinds of other things: assembling, for example, clusters of components, maybe putting a load balancer of some sort for a particular protocol among them. There's this problem space I've devoted a lot of time to. I'll share in the chat, some people know this, sorry for tooting my own horn here, but I'm working on an orchestrator which is based on TOSCA rather than Kubernetes manifests, and it lets you create topologies with all kinds of relationships; relationships themselves are typed in TOSCA. So what I see here is that basically there's one kind of relationship, which is a dependency, and that dependency defines order of execution, but I think there are a lot of other kinds of relationships that we want to create. Some of them are networking connections, and some of them are other kinds.
You could call them logical relationships; they have to do, for example, with, as I said, say you want to describe a load balancer. So, you know, I love this visual graph presentation, but this is a deployment graph; specifically, this is an order of deployment. But once you get to day-2 operations and changes, it's not so much about deployment; it's modifications, especially in configuration, that go beyond this. Not to say that this isn't an important contribution, but I think it addresses one corner of the problem. Hey, Tal, do you have some specific questions that we can look at? Like, if you're saying there's a specific area, how do you handle this, and you don't see that it's going to be handled, then we can hear some targeted questions about that. Well, so, one, yeah, I know I didn't end up really making a question there, but yeah. The question would be, more specifically, how does Orkestra handle tasks that are not Helm? So, I mean, it's integrated with Argo, but how well is it integrated with Argo? That is, if I wanted to give an example of a task that it wouldn't handle by default through Helm, then we could examine that. I thought I gave one: NETCONF configuration. Let's say some of these components are running a NETCONF agent, and they need to be configured as part of this entire product, right? So it's not just every individual sub-chart, but those components, those pods, those services, might need to be configured with NETCONF, right, according to a general plan. How would that integrate? Yes, it's a good question, but it's kind of outside the scope of Orkestra. You're right: we only address the whole deployment strategy, doing orchestration agnostic of what kind of deployments you're trying to leverage it for, so it doesn't know whether you're a telco system deploying 5G applications or not. That's why I said it started pretty much tailored.
It was built for the Affirmed workloads, and at that point there were a lot of tie-backs to configuration and day-2 deployments, but over time we saw that this could be applied to any kind of application. So, one, it's a generic tool; it's in no way tied to 5G. But it was built with these mission-critical systems in mind, systems that take multiple releases at a time rather than the continuous delivery that enterprises do. It was built with in-service upgrades in mind, and this is just around the deployment aspect of it. You're right about the day-2 configuration. A lot of people have their own proprietary stuff; we have our own proprietary stuff for how we do it. I'm not going to dive in, and I don't even know too much about it because I'm not the main person working on that, but there are cases where we leverage Helm and Helm operations to do that configuration. That said, the selling point of Orkestra, in my mind, is the in-service upgrades and the kind of progressive delivery that it brings. So it's defense in layers. By that, what I mean is that you could start at the lowest layer, and again, we are tied to Helm today, so I'm going to speak from that aspect: you can have Helm tests, which are part of the Helm charts that the developers built. You can bring in a progressive delivery framework like Argo Rollouts or Flagger, in which case, if you're using canary deployments, you have automated canary analysis. It would leverage service meshes to redirect traffic in steps and do the validation as it redirects traffic to the canary pods, before doing a promotion or rollback. That's another layer, and then the third layer is the introduction of Keptn, which lets you do quality gates. I think the other feature that we love about Keptn is that it can do continuous testing for you.
And in the 5G world, you could imagine that as the deployment is happening, for every application going in, Keptn can do a callback to, say, an external server, or whatever your testing framework is for your call flows. It could kick that off, do some validation, and only then promote the node in the DAG. So it will turn the node in the DAG green before moving to the next node in the workflow DAG. That's three layers. The fourth one would be: bring your own container executor. So now you could build your own; you could have your own script, in whatever runtime you want to use, that could query, if you're using Azure, say, Azure Monitor, or any other kind of metrics or behaviors that you want to test for across your system or application group, just to make sure nothing else is affected. So you can build your own validation as well. So, to answer your question, it's not tailored to 5G specifically, or rather to the configuration aspect of 5G, which on its own is a really complex task, and Orkestra is not trying to address that at all. Yeah, no, it answers my question fully, I think. So I guess that it really depends: for these tasks to turn green, right, it means that you have to write a Helm chart that not only succeeds in deploying but actually has some tests to make sure that this component is up, right, that this sub-chart is up. So it's up to you to make sure that your Helm chart does the verification that it needs for the task to turn green before it just moves on. Yeah, and additionally, not just checking its own state and whether it's up; a lot of those NFs are interdependent, so it can also go and query the state of, or make some sample calls, or whatever it needs to do between those NFs to make sure things are looking good.
If all the calls are succeeding, it could do smaller tests there, like integration tests, and then you have the system-level tests with the call flows happening using Keptn. So it can look at the entire topology and do some verification across other applications running on the cluster. Okay, thank you. You're welcome. Any other questions? Right, so let me just share the link with all of you. It would be great to see people come in and give some feedback, and we'd love for contributors to come in and pick up some tasks as well. The space is right here. Please add the link, and if there's an online version of the slides, or if you can upload those somewhere. Should I put it on the Google Drive? Yeah, put it on that Google Doc; you can add the links to the line item for you. I'll do that. Yeah, that's all good. Thanks for letting me speak, appreciate it. Yeah, absolutely, thanks for coming and presenting to us today. That's all we had on the agenda for today. So, Nitish, if you can add the links to the doc so people can find them after the meeting, that'd be great. Is there anything anyone else would like to discuss or chat about, any more questions for Nitish about Orkestra? I have a general question now regarding the agenda and the KubeCon videos that are up. There are a lot of KubeCon videos; I wonder if anybody in this group can point us to some good ones having to do with topics related to telco. Yeah, so I think Taylor put a couple in. One that I recommend was the keynote from a telecom operator talking about how they're using CNCF technologies to build out their 5G infrastructure. Taylor also gave a talk about the CNF Working Group. Did anybody else see any talks at KubeCon that they liked or would recommend to other people? Tal, were you able to attend KubeCon? I've watched exactly the two videos recommended, and nothing more.
Okay. I know there were also some at the Kubernetes on Edge Day, too, but I will add a link to those afterwards because I wasn't able to watch them. There are a lot of things going on in the day-zero events, but I'll add a link to those videos after the call too. Yeah, it's a bit overwhelming. Yeah. Cool. Is there anything else anyone wants to discuss? And I see Nitish dropped the link to the slides in the chat too. So with that, thanks, everyone, for coming. If you're joining the CNF Working Group meeting, we're going to be starting in about eight minutes on the other Zoom call. So thanks, everyone, for joining. Cheers. Cheers.