Hello everyone, and welcome to our session. We are truly excited to present how we are moving from GitOps to the Kubernetes resource model in our 5G core. Before we dive into the world of telecommunications and its intersection with the GitOps and Kubernetes worlds, I would like to get a feel for the audience first. Could you please raise your hands if you're from the telco domain or work in the telco area? Okay, it's really nice to see we have a diverse audience today.

Let's start with an analogy. Imagine you're going on a cross-country road trip from point A to point B, and you have two options. The first is a static paper map. The other is an application like Apple Maps or Google Maps. The paper map is, well, static: the information in it is fixed and unchanging, and for the end user it's quite overwhelming. On the other end, you have Apple Maps. It's dynamic and adapts to external conditions: if there's traffic, it will recommend different routes. And because it's more focused, it gives you a simple way to navigate your path. These two examples really mirror what we are going through on our journey in the 5G core. What we have now is our GitOps implementation, and our configuration is quite static, very similar to this static paper map. What we want to achieve in the end is dynamic configuration, with GitOps and the Kubernetes resource model combined.

With that, I would like to introduce myself. My name is Ashan Senevirathne. I am the product owner of the mobile cloud-native automation team.

My name is Joel Studler. I'm a DevOps engineer in the same team. You can find our contact info on the slides if you want to contact us. We both work for Swisscom. Swisscom is one of the leading telco and IT providers in Switzerland. But not only that: we're also moving from a telco company to more of a tech company. We call this the journey from telco to techco.

When we talk about the 5G core, it's important to distinguish what it means. 5G consists of two components. One component is the RAN, the radio access network, which basically makes sure that your cell phone can connect to the tower wirelessly, and from there your traffic is forwarded onward. The other part is the centralized part, the core, which handles user requests, user data management, authentication, and actually forwarding the data to the right place. We're focusing on the core part today. That's our journey: the automation of mobile network cores.

Another question to all of you: who already had a smartphone or a cell phone in the late 90s or early 2000s? Please raise your hand. Not so many, interesting. Chances are that if you already had a cell phone back then, it was based on 2G, which was the first mainstream technology used for mobile networking. There was 1G, but it never reached the mainstream. In the core part, 2G was built on custom, really vendor-provided hardware, and the configuration management was done using Excel spreadsheets. The next iteration didn't introduce much revolution on these two ends. It basically extended the configuration management with a few scripts; you can imagine Python scripts or VBA inside your Excel sheets, ugly stuff. We call these the physical network functions.
The next iteration introduced virtualization using x86 VMs, which paved the way to commodity hardware. The automation was extended with more sophisticated tools such as Ansible. This era of network functions is called VNFs. And where we are now with 5G, we're moving from VMs to containers on Kubernetes. Those are called cloud-native network functions, and we also leverage concepts like infrastructure as code and GitOps. Ashan is now going to talk about GitOps and what it looks like in practice.

Thank you. So let's talk about GitOps. What is GitOps? First of all, GitOps is not a tool; it's a way of working, a way of implementation. The OpenGitOps project has defined four principles, and I would like to go through them now. The first is declarative: a system managed by GitOps must have its desired state expressed declaratively. The second is that it's continuously reconciled: software agents run on your system, observe the actual system state, and try to bring it to the desired state. The third is that it's versioned and immutable: your desired state is stored in a way that enforces immutability and versioning, and retains the complete version history. The last is pulled automatically: software agents automatically pull the desired state from the source.

These four principles give you a powerful way to set up your system or your automation framework. But in practice, not all tools embrace all four principles, so it's really necessary to look at what's been done in practice. Here I would like to go through four examples. The first two are quite similar. The first is infrastructure as code using Terraform, applied manually; I think most of you have used Terraform configurations. The second, which is very similar to our own setup that we'll go through next, is triggering Ansible playbooks from a Jenkins pipeline. With these two examples, the configuration is done in a declarative way and it's under version control, but it's not harnessing the full GitOps power, right? It's missing the automation and reconciliation aspects. The third example is a GitLab CI pipeline acting on commits. This also misses the reconciliation aspect. The last example is Flux or Argo CD, which you may have used. These do the automation and the continuous reconciliation, it's done in a declarative way, and it's version-controlled. So Flux and Argo CD adhere to all four principles.

Now, with these four examples in mind, I would like to show you the current state of our automation. This is a very high-level overview of the automation we've done in the 5G core, because we don't want to overwhelm you with the networking topics. What you see here is the first phase, or stage, of our pipeline. We use Flux as the CD tool, we do continuous integration in GitLab, and we use Vault as our secret management tool. At a very high level, Flux deploys the 5G core and, according to the previous slide, this adheres to all four principles.
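To give you a rough idea of what that first phase looks like in Flux terms, here is a minimal sketch. The repository URL and path are illustrative, not our actual layout:

```yaml
# Sketch of a Flux v2 setup that pulls and reconciles the 5G core
# deployment manifests. Repository URL and path are illustrative.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: 5g-core
  namespace: flux-system
spec:
  interval: 1m            # pulled automatically from the source
  url: https://git.example.com/mobile/5g-core-deployment
  ref:
    branch: main          # versioned, with full history in Git
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: 5g-core
  namespace: flux-system
spec:
  interval: 10m           # continuously reconciled against the cluster
  sourceRef:
    kind: GitRepository
    name: 5g-core
  path: ./clusters/site-a
  prune: true             # resources removed from Git are removed from the cluster
```

The four principles map directly onto these fields: the manifests in Git are the declarative desired state, the branch gives versioning and history, and the two interval fields drive the automatic pull and the reconciliation loop.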
But the deployment is not everything for the 5G core, so next let's look at how we do the config management.

What you see here is the second phase of our pipeline. It uses Ansible playbooks, and a Jenkins pipeline triggers these playbooks to do the configuration of the core. In the first phase, we deploy the 5G core; that's a bunch of Helm charts and Docker images and such. The next phase is the really important part for a network core: the network configuration. Lastly, we do some system integrations; that's more about integration tools, monitoring tools, and network management tools.

As I mentioned on the previous slide, we have quite different flavors of GitOps implementation: we use a mix of Flux, Ansible, and Jenkins. Our implementation is missing the reconciliation aspect on the Ansible and Jenkins side, and we are doing it out-of-band with Kubernetes, so we are not harnessing the full power of Kubernetes here. It also takes a fire-and-forget approach for the network configuration part.

Next, let's look at what our GitOps manifests look like. This is one of the config files. Obviously you can't read it; there are too many lines. So let's zoom in a little. What you see in our average GitOps manifest is a lot of IP addresses, subnets, VLANs, secret references, and such. It's quite similar to the static map we described at the start, and most importantly, it lacks abstraction. Because of this, it's quite hard for us to scale our 5G core to many sites.

So in our implementation we have identified some key issues. Often there is no reconciliation; it's a fire-and-forget approach. It's out-of-band with Kubernetes, so we don't fully leverage Kubernetes orchestration. And it's missing simplification: for the network engineers, the automation framework we've implemented is quite complex to scale. Having said that, we would also like to keep some of the good stuff we found in our GitOps implementation: the source of truth, the review process we've established, and the declarative aspects. With that, I would like to hand over to Joel. He will go through how we're going to improve our automation framework using the Kubernetes resource model.

Yeah, thank you. Before we talk about how we can solve these issues, we need to talk about configuration in general. Let's assume you have a cloud-native app. Normally there's a Deployment, you have a ConfigMap and Secrets, and then you expose it through a Service, which can be consumed by the app consumer. That's the very simple case: you have one single source of configuration, which is Kubernetes Secrets and ConfigMaps. If we look at a slightly more complex example, say a SQL database, you would still have all of that, but in addition you now have a SQL command that needs to instantiate your database, in this case "create database myDB". So we have the first configuration source coming through the ConfigMaps and Secrets, but we also have a second config source coming through the app interface. You could work around that using init containers, operators, or Kubernetes CronJobs or Jobs, but it's not cloud-native by design.
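As an illustration of that workaround, a one-shot Job pushing the second config source might look roughly like this. The host, user, and secret names here are made up for the example:

```yaml
# Sketch: a one-shot Job that applies the "second config source"
# (the SQL command) after the database is up. Host, user, and
# secret names are hypothetical.
apiVersion: batch/v1
kind: Job
metadata:
  name: mydb-init
spec:
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: init-db
          image: postgres:16
          command: ["psql"]
          args: ["-h", "db", "-U", "admin", "-c", "CREATE DATABASE mydb;"]
          env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: password
```

Note that this is exactly the fire-and-forget pattern again: the Job runs once, and nothing re-applies the statement if the database is rebuilt later.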
And similar to that, in a telco app we have almost the exact same issue. In this case it's not a SQL command, but a configuration interface that the telco industry has defined as a standard. This standard is called NetConf, and it's defined in an RFC (RFC 6241). So here we have the exact same issue: we have the first config source, which is cloud-native, and we have a second, almost proprietary config source. The issue with that, we think, is that it's not cloud-native; it's not adhering to cloud-native practices. In the case of telco apps, we would really like to see a NetConf-free 5G core, but we are also quite sure that NetConf is not going to disappear anytime soon, so we need to solve these issues and work around them.

When we go back to the picture Ashan showed, a bit simplified here: we have a Jenkins pipeline on top, which triggers an Ansible playbook, which then reads the config from Git, like the paper map, the very detailed configuration, and pushes the configuration down to the application. That's the NetConf procedure, which comes after the cloud-native stuff. The first issue is that there's no reconciliation. Jenkins usually runs in fire-and-forget mode, a one-off task; maybe there's a retry, but nothing like a reconciler that constantly tries to enforce the desired state. The second issue is that it's out-of-band with Kubernetes: we cannot see the state of the pipeline from within Kubernetes.

The way we think we will solve this is by converting this manifest in Git into a Kubernetes resource, a Kubernetes manifest, a CR, in this case of type Config, which can then be synced using Flux down to the cluster, so it becomes an active CR in the cluster. And then we introduce a piece that we call the config sync operator, a Kubernetes operator that syncs the desired configuration down to the network function through this NetConf interface. With that we solve reconciliation, by introducing an operator that is constantly reconciling, and we solve the in-band problem: we now have visibility from within the cluster across these resources, because everything lives as a custom resource. What we still haven't solved is abstraction: we still have this static paper map in Git, which is something we need to solve later on.

The tool we currently think is going to solve this for us is called SDC, schema-driven configuration. It's a fairly new tool that was published on GitHub a few weeks ago. It implements protocols like gNMI and NetConf, so telco industry standards. It supports physical, virtual, and cloud-native endpoints. It's vendor-agnostic, so you don't need to worry about whether it works, as long as your vendor adheres to the standards. It's declarative, as in it uses Kubernetes resources, Kubernetes operators, and so on. And on the roadmap there's schema validation and fixing of config drift. If you want to know more, please visit their website, give feedback, and try it out yourself.

I would like to demo this tool very briefly now. Please keep in mind we don't show any of the Flux stuff; it's really just a Kubernetes cluster and a demo specific to this operator. In the view on the bottom, you can see the operator logs, which is the controller that reconciles these resources. On the upper right, you can see the logs of my NetConf server. I've just deployed a simple NetConf server; it's not a full-blown 5G core, I think that wouldn't run on my laptop, but it's a simple example app. If we get the targets, we can see there is already a target there. A target is like an endpoint, which needs to be referenced by my config, and it represents my NetConf server in the upper right. If I have a look at the config I will apply, I can see the target is referenced using a label selector, and in the manifest, in the spec of my config, there is the NetConf-compatible configuration, which will be pushed down to the CNF.
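A config CR of that shape might look roughly like this. The group, version, and field names here are assumptions based on the demo narration, not the real SDC schema; check the SDC documentation for the actual CRDs:

```yaml
# Hypothetical sketch of such a Config CR. apiVersion and field
# names are assumed, not taken from the real SDC schema.
apiVersion: config.sdcio.dev/v1alpha1   # assumed group/version
kind: Config
metadata:
  name: cnf-config
spec:
  targetSelector:                 # the target is referenced via a label selector
    matchLabels:
      sdcio.dev/target: netconf-server
  config:                         # NetConf-compatible payload pushed to the CNF
    - path: /system
      value:
        ntp:
          admin-state: enable
```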
So let me apply that now. You can see immediately that the reconciler is reconciling it, and in the logs of my NetConf server we can see that we are getting messages of type edit-config. Edit-config is the type of message that edits a config; I'm not going to bore you with NetConf internals, but that's all you need to know right now. And then you can see the XML representation of the config we had in the spec earlier. So that's a very small demo of how this SDC tool works, which will basically solve the reconciliation topic for us.

So remember, we solved reconciliation, and we're now in-band with Kubernetes, so everything is within the cluster. But we still don't have abstraction. The way we think this will be solved in our case is by introducing a new custom resource. In this example it would be a DNN; DNN stands for data network name, a piece of the 5G core, basically, which contains only abstract information. So instead of having, for instance, literal IPv4 addresses like we have on the left side, we now just have a flag that tells the system that we need an IP. This resource can then be synced by Flux down to the cluster, and in addition we also sync other resources: an ExternalSecret, which can fetch secrets from Vault, and a resource of type IP address claim, which, similar to a PVC, a PersistentVolumeClaim, tells Kubernetes or the ecosystem to claim an IP address. That's actually how we integrate with our IPAM system. And then we introduce a CNF config operator, which assembles these resources and, based on custom business logic, renders a final config that can then be consumed by the config sync operator I showed earlier. With that system, we've solved reconciliation, it stays in-band with Kubernetes, which we already had, and in addition we now have abstraction through this CNF config operator.

I would like to briefly showcase how this IPAM integration works. We're not going to show you the whole setup, just one piece of the puzzle. We're using NetBox for IP address management; that's an IPAM tool, and we need to integrate with it. If we list the CRDs that contain NetBox in their name, we have IP addresses and prefixes. Those are just NetBox concepts; a prefix is like a subnet. And for each of these, we have claim resources, similar to PVC and PV. First, I need to apply the prefix. There is a new prefix I would like to create; let me show you in this NetBox UI that there is currently no prefix with that name or with that CIDR. So I will apply this, and if I refresh now, hopefully, it's a bit small, I can see there is a prefix now. There's no IP address assigned yet. So I will now assign the IP address, which is of type IP address claim. I don't specify the exact IP; I just specify the parent prefix to get an IP from. We then apply this resource, and hopefully we will get an IP, which is the case: we get the .1 IP. And if I refresh the UI, I can see it was reserved.
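Written out, the two resources from the demo might look roughly like this. The API group and field names are assumptions; the operator's actual CRDs may differ:

```yaml
# Sketch of the two demo resources. apiVersion and field names
# are assumed; consult the NetBox operator's CRDs for the real ones.
apiVersion: netbox.dev/v1          # assumed group/version
kind: Prefix
metadata:
  name: demo-prefix
spec:
  prefix: 192.168.10.0/24          # the CIDR registered in NetBox
---
apiVersion: netbox.dev/v1
kind: IpAddressClaim               # claim semantics, analogous to a PVC
metadata:
  name: demo-ip
spec:
  parentPrefix: 192.168.10.0/24    # only the pool is given, not the IP
```

The operator reserves the next free address in NetBox (the .1 in the demo) and writes it back into the claim's status, where other resources can pick it up.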
So that's just one small piece of the puzzle: integrating IPs into Kubernetes using Kubernetes resources. With that, I think we now have the prerequisites for showing you the entire ecosystem as we envision it, and I will hand over to Ashan for this.

Yeah, thanks. So we started with the current state, which is our static paper map example. You can see it here; this is our implementation now, with these long lines of static configuration. And we want to move towards this future state with dynamic configuration and more simplicity for the user, for the engineers. As Joel mentioned on the previous slides, it starts with a high-level intent. If you compare with the left-hand side, there you have these long lines of configuration, which are quite hard to understand and very overwhelming. Here it's a high-level intent, which is then translated by the configuration management into a resource-level intent. This is taken care of by the Kubernetes layer. You can see the configuration operator here: it renders the configuration by talking to the surrounding operators, and then, using the config sync operator, we apply the configuration to the 5G core. And the whole concept we've introduced lives within the Kubernetes layer.

So what have we solved? In the end, three things. We solved reconciliation by introducing operators. We introduced simplification of our complex configuration through dynamic config generation. And by introducing the Kubernetes resource model and extending the Kubernetes API, we managed to orchestrate everything in-band with the Kubernetes layer.
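To make that high-level intent concrete, such a DNN resource could look something like the sketch below. The group, kind, and every field are invented for illustration; only the idea of flags and references instead of literal values comes from the talk:

```yaml
# Entirely illustrative intent resource: group, kind, and fields
# are invented for this sketch.
apiVersion: mobile.example.com/v1alpha1
kind: DNN
metadata:
  name: internet
spec:
  allocateIPv4: true                       # "we need an IP" flag instead of a literal address
  parentPrefixRef: site-a-userplane        # indirection into the IPAM layer
  credentialsFrom:
    externalSecretRef: dnn-internet-creds  # resolved from Vault via External Secrets
```

The CNF config operator would watch such a resource, create the IP address claim and ExternalSecret behind it, and render the full low-level Config for the config sync operator.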
Yeah, so the only thing left is to share our learnings from this journey of thinking beyond GitOps when it comes to Kubernetes workloads; hopefully there's something interesting for you as well. What we learned is that we should really avoid using CI tools for CD: they're usually fire-and-forget, which is not what we want on Kubernetes. We should avoid checking low-level configuration into Git, things like IP addresses, VLAN IDs, stuff like that; if possible, we should not have them in Git. We learned that tools that are out-of-band with Kubernetes, especially when it comes to automation, really complicate the whole process and the whole ecosystem. And we would like to see a NetConf-free 5G core, which will never happen.

What we aim for instead is to reuse cloud-native tools. All the tools on the CNCF landscape, we can use them; they're there, and they're usually cloud-native, as in they adhere to GitOps practices and cloud-native principles. If you don't find a tool that does the job, so if you cannot reuse, then create your own Kubernetes operator, and share it, obviously. We also found it's important to validate the GitOps principles in your system every now and then, because as Ashan showed earlier, we found many places where not all of the GitOps principles were implemented while we thought they were, while we thought we were doing GitOps. Then we learned that it makes sense to introduce abstractions: get the low-level config out of your Git repo and abstract it away using custom software, operators in our case. And last but not least, please contribute. Share your findings, share your projects. We're also working on open-sourcing this NetBox operator, so if that's of interest, please approach us, and also give feedback to existing projects.

The very last slide contains some further material. We've talked about the GitOps principles and the SDC config sync tooling. There was a talk half a year ago at Container Days, which is basically the predecessor of what we are showing today. And there is a white paper about accelerating cloud-native in telco, co-written by Ashan, which will hopefully be very interesting for you if you are in telco. And with that, thank you very much. Thanks for being here, thanks for your interest. It's time for questions if you have any. Maybe you can pass the mic.

Hello. Hey, thanks for a nice presentation. I just want to clarify, maybe it's a matter of perspective: at the beginning of the presentation you said you're moving away from GitOps, but in the end you showed that your final solution adheres to all the principles of a GitOps approach. So in my view, you're going all in on GitOps, not moving away from it.

That's very true, yes. I think GitOps is still a very valid concept, but the bottom line for us was that we find it's most often not properly implemented. So the de facto GitOps is not actually GitOps for us. And also, especially with telco, it's not truly cloud-native, so you need to introduce these operators and such using the Kubernetes resource model. There are quite a few things you need to do in addition to bring the whole lifecycle aspect in.

Hi, I was wondering what you meant on your avoid slide by checking low-level configuration into Git. Because if you're using GitOps, everything is managed in Git; it's Git-centric. So I'm not sure what you meant.

Yes, we feel that if you have all the low-level configuration, all the details you need in your final deployment, checked into Git, managing it is going to be quite cumbersome. Either you have a tool that manages the Git repo for you, which is not the usual practice, or you will have huge merge requests with many lines, which can be quite cumbersome for engineers to manage in the end. Yeah, like big manifests. Our take on this is really to simplify down to the relevant parameters and to fetch as much as possible dynamically from outside systems, because usually GitOps is about integrating the app you want to deploy into your ecosystem, right? And there, GitOps is quite static in the end: if, let's say, the IP or the network changes, you usually need to go through it manually. And to add to that, the way we prepare our low-level configuration now is based on an Ansible inventory, and the low-level design is in the end an inventory with quite an amount of IP addresses and references and such. So the data changes, but the configuration itself is static in the end. It's not scalable at all as it is now. Thank you.

I would like to follow up on that, because it was probably the most interesting slide in this presentation, to be honest. What you say is basically that you suggest people not store the state in Git, only the configuration. Is that basically the same thing? You still have another place where all those low-level configurations are stored. In your case, are those CRDs, or where actually are they?

The CRs are stored in the Kubernetes API, or you can also have your own extension of the Kube API if etcd doesn't scale; that's possible.
But my question is, because you say you're not recommending storing low-level configuration like IP addresses or particular VLANs in Git, I assume you want to have only the high-level configuration, what you want to achieve, but you still need some place to store the information about the actual IP addresses. Yes. So where is the state in your case?

In etcd, in the Kube API.

Okay, so it's basically there.

But maybe to follow up on that: we're not 100% sure yet. Sometimes you want visibility of your configuration, especially when it comes to NetConf with these huge manifests; it can be useful to have them in a Git repo. For instance, the approach Nephio takes here is that Nephio's UI or CLI interface, using kpt, creates merge requests and renders files, and then you always have the Git log for your actual configuration. So that's something we imagine could also happen; we haven't taken a final decision yet.

Can you write that down? Nephio. Okay. Yeah, it's on the second or third slide.

Yeah, thank you for the presentation. I think you work mostly with Ericsson, and I think most CNFs are still not really cloud-native. How are you working with Ericsson to have them make more cloud-native-ready CNFs for you?

Yeah, we do collaborate with them a lot on this journey. And when you say it's not truly cloud-native, I guess you're alluding to the configuration aspects of it. One approach is to have this configuration as ConfigMaps or such; that is something we are discussing. And at the same time, we also collaborate with other operators to define what we really require from the CNFs. Here we collaborated with Deutsche Telekom, Orange, and such. That's what the white paper is about, so it might be interesting for you to look into. Yeah, thank you.

Hello, and thank you. So it seems like you're actually trying to solve how to get tickets out of the system, or at least that's my take on it. You have a very complex configuration and a lot of input. Do you see that you will need to get input from something other than Git? Because now you're saving your requests in Git, not your state. Do you see getting input from other sources, totally decoupled from Git or Kubernetes itself, but using Kubernetes to source these resources? Like thousands, millions of requests for IP addresses, for instance.

Yeah, I think the key message for us is that Git is suitable for many things, including GitOps; it's really a strong concept. But having everything in Git makes the system really static and cumbersome to manage. So everything that you know will be dynamic shouldn't be in Git, I'd say. That's a general statement, and you need to figure it out yourself, but we think there are certain parts of a configuration that should be more dynamic than what Git offers. Yeah, and it's also about passing this intent to the Kubernetes layer and letting it handle the orchestration of these dynamic components.

One question. I see storing dynamic information in Git as like a snapshot of the actual state. How are you bringing life to the system? How are you handling misconfiguration and rollbacks if your dynamic configuration is somehow wrong?
Yeah, so on the snapshot point: according to GitOps, it should be the desired state in Git, not the actual state, if you adhere to GitOps practices. But that's what I meant earlier: it might be useful to have an intermediate step, an additional GitOps layer, which is more automated. We don't know that yet. But yeah, rolling back resources that are dynamically generated is going to be hard if you don't have that additional layer. That's true, yeah.

I don't know if we still have time? Two minutes. Okay, cool.

Yeah, thank you. In network configuration you essentially have a lot of dependencies all the time: day-zero configuration, day-one configuration, underlays, overlays, and so on. What do you do with these dependencies? If you change something in the underlay, would a Kubernetes resource automatically do a rescheduling of the overlay as well, do post-checks, and so on? I assume that should eventually be part of the CRD, doing post-checks if something happens, right? Will this be built in, and if so, how?

Yeah, put simply, we want to bring the dependency mapping into the Kubernetes layer. Say, for example, we have a network function deployed at one site, but on top of that, from the telco's point of view, you could also have a geo-redundancy configuration. The idea is to introduce this next layer as a CR as well. So there could be a geo-redundant peering as a CR, and then you build the layers above; the next would be the 5G core as a CR. So it's about handing the dependency mapping over to the Kubernetes layer.

Okay, so if you have, say, a highly available connection, you would have a CR for both legs and an overall state in the status field as well?

Yes. But maybe it's also important to say: the ideal state would be to automate everything, but we're living in a reality where we have constraints, right? So you need to decide what to focus on, and then maybe some of these pieces don't need to be fully automated, while some surely should be. So it's always a compromise as well, I guess.

Yeah, thank you very much. We'll stay here, so please come to us if you have further questions.