Good morning, everybody. We are running a little late, but we promise to compress our presentation as much as possible. It's really a pleasure to be here, especially after these crazy COVID times, and to see people together again. That's really wonderful. My name is Michael Severa, and today, together with Semenicius, we will present our work on GitOps. The title is "GitOps-based Cloud-Native 5GC Lifecycle Management", and we will share a lot of insights from our experience during the deployment. But first, a short introduction. Once again, my name is Michael Severa. When I'm not talking about technology, I play guitar. I'm a physics enthusiast, and I'm also a member of the Open Networking Foundation. We did a lot of work around open software and the packet core, including the world's first deployment of an open-source EPC, and now we are trying to continue this work in the cloud-native area. Yeah, hello. My name is Semenicius. I'm based in Bremen, in northern Germany. I like coffee, and I like traveling and cycling. Actually, on an e-bike, but not because I'm lazy; I just like to cycle fast. I recently joined the 5GC cloud-native DevOps team. All right. Perfect. Very good. So let's get started. The agenda for today is a sort of journey, and we would like to take you along, focusing on how we started and where we are today. We will go quickly through the evolution of mobile core networks, then through the 5GC architecture we have, and finally focus on the transition and the technical platform deployment. That's the plan. Of course, we will finish with a Q&A session, so whatever questions you have, it will be a pleasure to discuss them with you. So let's start with the evolution of the mobile core.
I think we hear the keyword "transformation" every five to ten years. When I started working in the telecom industry some 20 years ago, it was a world of boxes. Everything was a box, and plenty of people from those times still say, "Yeah, that was the highest quality, that was very good telecommunications." It was a closed-shop box, standard TCA or ATCA, a completely self-contained ecosystem including the hardware. About a decade ago, we as an industry started a transformation to get the benefits of virtualization. That was the famous moment of virtualization. Even in our company, Deutsche Telekom, we started a program called PanNet with the ambition of moving the telco platforms onto this framework. But honestly, from my perspective, that didn't change things significantly, because we still ended up in the world of boxes. We had a box that was still pretty much owned by the supplier. The technology was different, but the processes behind it were pretty much the same. If you look around the industry, you will easily discover that a lot of operators across the world have a box-based virtualized solution, which means that from a contractual and process perspective, there is no significant difference compared to the previous generation of boxes. Right now, we are in front of a huge opportunity. There was a fantastic presentation from Cisco showing that 5G is actually an enabler, and we see it exactly the same way: we believe 5G is the first platform that really gives us the opportunity to do things differently. And that is the cloud-native approach.
Now, to understand what cloud-native means, I would like to start with a statement that I always try to repeat: running an application as a container does not mean that the application is cloud-native. I think that is fundamental. There are a lot of claims in the market like, "Hey, we are super cloud-native, we have a cloud-native platform." That's really not true. Cloud-native means it is no longer a box-based solution. And when I say that, I mean it is completely decomposed, not only from a technology perspective but also from a process perspective. As you can see in this picture, there are a lot of moving parts, and those moving parts have different owners. You have images, Helm charts, pipelines. You have things like Kubernetes network policies and RBAC resources. And of course, if you go to the bottom, you will see that there are also platform-as-a-service components that don't come from the supplier; they are generic. So the definition of the system is completely different. There is no longer a world of boxes; we are talking about a heavily distributed system. Each of those components has its own lifecycle, which means, and this is a super important statement, that there is no longer a concept of "Let's freeze everything, keep it as it is, don't touch it for the next half year, do a big round of testing, do one upgrade after that testing, and we're done for the year." No. We need to enable a process that allows individual changes to individual components. I love the movie Back to the Future. If you traveled back in time and spoke with the engineers of 20 years ago, this would be a total disaster for them, because it means that potentially every week or every month, one of those components will change.
For example, say you have a security vulnerability that you need to address at the Kubernetes layer. What does that mean? It means you need to do a rolling upgrade, and the system is in a different state after this change. To cope with this, we need to remember, and I really want to highlight it, that frequent changes are an inherent part of the system. If I tried to compile the single most important statement showing the difference, it would be this one: how do you cope with a system that can change at any point in time, while still being able to control quality and to really control the touch points? I mentioned Back to the Future, and I really like it because I think an essential part of the cloud-native ecosystem is having a sort of time machine. Sami will explain the technical details later, but I think the analogy works really well. You should be able to go back in time whenever you have an issue. For example, you are doing an upgrade, or one of the components is being upgraded, and you say, "OK, we have a problem." You need to be able to automatically revert all the changes, not only the configuration but everything: images, deployments, things like that. I think this shows the challenge. Now, to the 5GC architecture itself. So far I have only shown the framework. If we think about what it means in terms of complexity, we need to multiply this challenge by each of these platforms. The colleagues from Swisscom mentioned the combo core approach. A converged core means 5G plus 4G, and of course that is the natural way to produce the service. You cannot just say, "Hey, I'll only do the 5G part, and customers will be happy." Of course, that's not possible, right? We need a converged core.
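The "time machine" idea can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the speakers' implementation: the whole desired state (image tags, Helm chart versions, configuration) is treated as one versioned snapshot, so a revert restores all of it at once rather than just the configuration. The class names, component names, and version strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SystemState:
    """Hypothetical snapshot of everything that defines a deployment."""
    images: Dict[str, str]   # component -> container image tag
    charts: Dict[str, str]   # component -> Helm chart version
    config: Dict[str, str]   # configuration key -> value


class StateHistory:
    """Append-only history of desired states, much like commits in Git."""

    def __init__(self, initial: SystemState) -> None:
        self._states: List[SystemState] = [initial]

    @property
    def current(self) -> SystemState:
        return self._states[-1]

    def change(self, images: Optional[Dict[str, str]] = None,
               charts: Optional[Dict[str, str]] = None,
               config: Optional[Dict[str, str]] = None) -> SystemState:
        """Record a new state that differs from the current one only where given."""
        cur = self.current
        new = SystemState(
            images={**cur.images, **(images or {})},
            charts={**cur.charts, **(charts or {})},
            config={**cur.config, **(config or {})},
        )
        self._states.append(new)
        return new

    def revert(self, steps: int = 1) -> SystemState:
        """Restore the complete state from `steps` changes ago (N minus steps).

        The old snapshot is appended as a new entry, similar in spirit to
        `git revert`, so the history itself is never rewritten.
        """
        if steps < 1 or steps >= len(self._states):
            raise ValueError("no such earlier state in the history")
        target = self._states[-1 - steps]
        self._states.append(target)
        return target


history = StateHistory(SystemState(images={"amf": "1.0"},
                                   charts={"amf": "0.9.0"},
                                   config={"log_level": "info"}))
history.change(images={"amf": "1.1"}, config={"log_level": "debug"})
history.revert()
print(history.current.images["amf"], history.current.config["log_level"])  # 1.0 info
```

Appending the restored snapshot instead of deleting entries keeps a full audit trail, which is the property that makes a revert something you can trust.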
So we need to carry along the legacy, which we really hate, because legacy like LTE was not prepared at all for a cloud-native way of producing the service. That increases the complexity even more. We have components like the AMF, which are pretty much prepared for cloud-native, but we also need to support components like the MME. And going further, we have a lot of protocols that we really don't like. On the right side of the slide you have the fantastic SBI stack, which we really love because it is designed purely for cloud-native. HTTP-based REST interfaces are really fantastic: you can use Kubernetes Ingress and Services, and Kubernetes is basically designed for exactly that. But if you want to produce a service that covers both 5G and 4G, you still have interfaces like, for example, M3. And when we discuss the details of an interface like M3, it is really far from being Kubernetes-friendly. Things like GTP immediately turn on a light in our heads: OK, that means Multus, and probably a couple of other things. Maybe in the future we can do this in SmartNICs, but not today. So this complexity exists because we carry a lot of legacy in our backpack. Now, how do we do this transformation? I think the essence really is the transformation itself. I love the statement that the most dangerous phrase in the language is "We've always done it this way." I think one of the biggest mistakes the industry made with virtualization was that the people who implemented it had a box mindset. If we are smart enough as an industry, we should learn from that and not make the same mistake again. Trying to implement cloud-native while keeping the same box-based approach would be a disaster.
One example: I had a couple of colleagues who always asked me, "But where is the command line? Where can I log into a box?" And I would say, "There is no longer a box." As long as we don't get out of site-based thinking, like "OK, I need to upgrade site A, then site B, and I need to move the traffic," it's really not cloud-native. Of course, it is a journey and we will need to get there, but I think the holy grail of cloud-native is thinking beyond sites and beyond boxes. That's the essence. From our perspective, the way to make it happen was to bring together two types of personalities. On the left side you have the telco experts. They know the details of 3GPP, everything about architectures, complex interfaces, those kinds of things. On the right side you typically have young people, and I see it in this community too, with fantastic experience in cloud-native systems. The question is how to bring those two worlds together, because in our view, doing it separately, letting the application people and the telco experts work completely apart from the cloud people, will not work. So we built a team that merges the skills, putting experts from both domains together. And what is really fundamental: let's take a clean approach. When we built this new solution, our idea was to question every single thing inherited from the legacy. One of my favorite examples is IPv6. We always had the debate: is IPv6 the future or not? Yes, of course it is. So why are we using v4? In our deployment, we erased v4 completely. And looking at other examples, we were really successful at questioning legacy things. There are still a bunch of legacy things that need to be questioned.
One of my favorites is multipathing, which was designed conceptually about 20 years ago, in the world of boxes. In cloud-native, it's completely irrelevant. It's like putting a horse in front of a car just because we had horses some years ago. The concept of questioning everything, doing a clean design, and mixing the skills was the foundation of our approach. Going further and comparing those two worlds, we believe the fundamental thing is to be aware of the differences. In a cloud-native 5G ecosystem, we believe change is really an asset. In the traditional world, and I know it from years of experience, there was always the big day. You prepare for an upgrade, for half a year you run a thousand regression tests with Spirent, and then comes the big day: "Today we are doing a pilot, we are putting in the new software," and so on. Here it's completely the opposite. We make a lot of changes, sometimes a couple per week, but they are really small ones. The essence is to have a framework that really supports this. From my perspective, once you have a framework you can really trust, one that doesn't rely on a brain interface, a protein-based interface, meaning human beings, for every single work order or change, and once that is set up, you get a lot of other benefits: small improvements, dynamic scaling, system recovery. What I love about the cloud-native approach is this time travel: you can say, "I want to be back at Monday 9 a.m. with everything." And with GitOps, that's actually possible.
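Going back to "Monday 9 a.m." boils down to finding which revision of the desired-state repository was current at that moment, for every site. The sketch below assumes each site's commit history is available as a time-sorted list of (commit time, revision) pairs, for example parsed from `git log`; the function names, site names, timestamps, and revisions are hypothetical.

```python
import bisect
from datetime import datetime
from typing import Dict, List, Tuple

History = List[Tuple[datetime, str]]  # (commit time, revision), oldest first


def revision_at(history: History, when: datetime) -> str:
    """Return the revision that was the desired state at time `when`."""
    times = [t for t, _ in history]
    # Index of the last commit made at or before `when`.
    idx = bisect.bisect_right(times, when) - 1
    if idx < 0:
        raise LookupError(f"no revision exists at or before {when}")
    return history[idx][1]


def rollback_plan(site_histories: Dict[str, History], when: datetime) -> Dict[str, str]:
    """Pin every site to the revision it was running at `when`."""
    return {site: revision_at(h, when) for site, h in site_histories.items()}


sites = {
    "site-1": [(datetime(2022, 10, 17, 8, 0), "a1"), (datetime(2022, 10, 18, 14, 30), "c3")],
    "site-2": [(datetime(2022, 10, 14, 16, 0), "x9"), (datetime(2022, 10, 17, 10, 0), "y7")],
}
print(rollback_plan(sites, datetime(2022, 10, 17, 9, 0)))  # {'site-1': 'a1', 'site-2': 'x9'}
```

Applying such a plan would then be an ordinary GitOps change (for instance, pinning or reverting each site's branch), which the operator reconciles like any other change, so ten sites cost about as much effort as one.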
Imagine you have a system with 10 sites, you just changed something, and you would like to revert everything within 30 minutes. It's absolutely possible. In the legacy world, the typical reaction is: "OK, we need to do a redeployment," which means sending professional services on site or doing things remotely, but either way it takes a long time. I think the fundamental difference is exactly that you have built-in mechanisms to support these frequent changes. To understand how we did it, it's my pleasure to hand over to Sami. Sami was a significant contributor to the project; he pushed through a couple of solutions we are very proud of. Sami, the floor is yours. Thanks, Vita. Telecommunications hardware and software are specifically designed to meet very strict requirements like five nines availability. IT hardware components, on the other hand, are not specifically designed for telco applications, even if they are of high quality. DT's target is to achieve at least as good capacity, performance, and availability in the cloud as is provided today on traditional legacy or proprietary hardware. That's why DT decided to build its own CaaS solution, called Das Schiff, which means "the ship" in English. Das Schiff is a managed Kubernetes cluster service for telco applications, and it runs on any infrastructure. It is available in different regions, in different environments, and in every network segment where telco applications are located. It is built with infrastructure orchestration in mind, using the Cluster API framework, and there are two types of clusters: management clusters and workload, or tenant, clusters. The management clusters manage the lifecycle of the workload clusters. I would also like to take the chance to advertise the KubeCon talk from our colleagues on Friday afternoon. If you would like to know more, especially about the networking part, please make sure to attend it.
And this picture is actually from them, describing how they feel about telco workloads. OK, in the next section I would like to describe how we operate the network compared to the legacy era, using all the benefits we get from the cloud. This slide gives an overview of each layer used to build the foundation for the 5GC NFs. Due to the high demand of user-plane traffic, it is absolutely essential for us to run on bare-metal hardware. On top of that, the host OS runs with a container runtime, followed by the CaaS layer, which includes Kubernetes as the orchestration API. The PaaS layer is a mixed responsibility, and we have already been challenged several times when troubleshooting and identifying issues on the border between the application and the CaaS. Components like observability and monitoring are managed as SaaS components by Das Schiff. On the other hand, components like ingress or specific CNIs, like the already mentioned Multus, are brought in and managed by the 5GC DevOps team themselves. And on top of that, we run all our 5GC NFs. Now I would like to explain the concept of the rolling train. Rolling train means that we continuously keep the infrastructure, host OS, CaaS, and PaaS components on new software versions. This is a big challenge for us, as the environment changes frequently. We need to test nonstop and keep the impact on customer traffic to a minimum while doing the upgrades. The big advantage is that we can release up-to-date product versions more rapidly and get them fixed according to customer and security demands. The next topic I would like to address is GitOps, which is a major part of the new operational model. So what is GitOps? GitOps means that everything we apply to a Kubernetes cluster is derived from a Git repository. The Git repository defines the desired state, whereas the Kubernetes cluster itself represents the actual, or current, state.
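The split between desired state (Git) and current state (cluster) can be illustrated with a small diff function. This is a simplified sketch under the assumption that both states are plain dictionaries keyed by a hypothetical resource identifier such as "Deployment/amf"; a real GitOps operator compares full Kubernetes manifests, including defaulted and server-managed fields, which this sketch ignores.

```python
from typing import Any, Dict, List

Manifest = Dict[str, Any]


def diff_states(desired: Dict[str, Manifest],
                current: Dict[str, Manifest]) -> Dict[str, List[str]]:
    """Compare the desired state (from Git) with the current state (from the cluster)."""
    return {
        # In Git but not yet in the cluster.
        "create": sorted(desired.keys() - current.keys()),
        # In both, but the cluster has drifted from Git.
        "update": sorted(k for k in desired.keys() & current.keys()
                         if desired[k] != current[k]),
        # In the cluster but no longer in Git.
        "delete": sorted(current.keys() - desired.keys()),
    }


desired = {"Deployment/amf": {"image": "amf:1.1"}, "Service/amf": {"port": 8080}}
current = {"Deployment/amf": {"image": "amf:1.0"}, "ConfigMap/old": {"k": "v"}}
print(diff_states(desired, current))
# {'create': ['Service/amf'], 'update': ['Deployment/amf'], 'delete': ['ConfigMap/old']}
```

An empty result in all three lists means the cluster already matches Git.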
The task of the GitOps operator is to always make sure that the current state equals the desired state. This process is called reconciliation, and it is a fundamental concept of how Kubernetes works internally. As developers, everything we do is based on two principles: either we change the desired state in the Git repository, or we observe the current state through the Kubernetes API. The GitOps operator is the only entity that applies resources to the Kubernetes API, and all resources, including custom resources, must, and I repeat, must be stored in the Git repository. So what do we gain from this? With Git at the center of operations, we increase the deployment frequency while at the same time decreasing the change failure rate. We reduce lead times, and we increase productivity and maintainability by using best practices like infrastructure as code and don't repeat yourself. We reduce the potential for human error, because removing data silos improves data quality, accuracy, and transparency. This is the concept of a single source of truth. The DevOps team can use merge requests to accelerate and simplify application deployments, so we are able to merge our telco processes into the GitOps system. And there are many, many more advantages. Now I will hand back to Michael. Thanks a lot, Sami. I think the keyword of the part Sami presented is reconciliation. You remember Steve Ballmer chanting "Developers, developers, developers"? I think we should do the same with "Reconciliation, reconciliation, reconciliation." Honestly, I think reconciliation is the heart of this concept. If you have reconciliation for all the layers, for configuration, for images, for everything, then you are very close to the target cloud concept.
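A reconciliation pass goes one step further than detecting drift: it applies whatever is needed until the current state equals the desired state, and running it again changes nothing. The sketch below uses a hypothetical in-memory stand-in for the Kubernetes API (`FakeCluster`) and shows why manual changes do not survive when the GitOps operator is the only writer. Real operators such as Argo CD or Flux run a loop like this continuously against the real API and typically make deletion of resources missing from Git (pruning) an explicit option; this sketch always prunes.

```python
import copy
from typing import Any, Dict

Manifest = Dict[str, Any]


class FakeCluster:
    """Hypothetical in-memory stand-in for the Kubernetes API."""

    def __init__(self) -> None:
        self.resources: Dict[str, Manifest] = {}

    def apply(self, name: str, spec: Manifest) -> None:
        self.resources[name] = copy.deepcopy(spec)

    def delete(self, name: str) -> None:
        self.resources.pop(name, None)


def reconcile(desired: Dict[str, Manifest], cluster: FakeCluster) -> int:
    """One reconciliation pass; returns the number of changes made (0 = converged)."""
    changes = 0
    for name, spec in desired.items():
        if cluster.resources.get(name) != spec:   # missing or drifted
            cluster.apply(name, spec)
            changes += 1
    for name in list(cluster.resources):
        if name not in desired:                   # not in Git: prune it
            cluster.delete(name)
            changes += 1
    return changes


git_state = {"Deployment/amf": {"replicas": 3, "image": "amf:1.1"}}
cluster = FakeCluster()
print(reconcile(git_state, cluster))  # 1: initial deployment
print(reconcile(git_state, cluster))  # 0: already converged

# Someone bypasses Git: scales down by hand and starts a debug pod.
cluster.resources["Deployment/amf"]["replicas"] = 1
cluster.resources["Pod/debug"] = {"image": "busybox"}
print(reconcile(git_state, cluster))  # 2: both manual changes are undone
```

Because the pass is idempotent, it is safe to run on a timer and on every Git change, which is what makes "the cluster equals the repository" a property you can rely on.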
Because it basically means that sooner or later you can start thinking, "I don't care how many sites I have, five or ten; I have full automation. If I have a problem, I can revert everything to the day before." And that's exactly reconciliation. There is no longer a human being saying, "Oh shit, I need to do some manual work," or "Where is the documentation for that site? Oh, OK, maybe... and he's on holiday. Sorry, he can't do it today." With GitOps, everything is documented. You can always revert to state N minus one, and you need to be able to fully trust it. So reconciliation is the keyword. Now, about the challenges, because you might say, "Fantastic, wonderful, just go for it, everything is simple." There are a couple of challenges. This is the last slide, and after it we will go to the Q&A. But I would really like to underline these challenges, because we spent many months of hard work facing them and I'd like to share that experience. First of all, ISSU, in-service software upgrade, is a sort of holy grail. What does it mean? It means that colleagues from infrastructure, or we ourselves, can do a rolling upgrade of the infrastructure or an upgrade of the application without even thinking, "Hey, I need to move the traffic from one site to the other." That's the holy grail. Unfortunately, it's not so simple, because, as you remember from the earlier slides, we still have a lot of protocols that are not cloud-native. That means you will still have a certain interruption of traffic. For example, if you have an SCTP association carrying NGAP, the application-level interface of the AMF, NGAP will survive, but SCTP will have a glitch. The second challenge is how to do it statelessly.
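The in-place rolling upgrade that ISSU aims for can be sketched as upgrading nodes in small batches within one site, rather than moving traffic from one site to another. The sketch assumes a hypothetical `max_unavailable` batch size and a `min_serving` capacity floor (similar in spirit to `maxUnavailable` in a Kubernetes rolling update); it only models which nodes are drained when and does not model protocol effects such as the SCTP glitch described above.

```python
from typing import Dict, List


def rolling_upgrade(versions: Dict[str, str], target: str,
                    max_unavailable: int, min_serving: int) -> List[List[str]]:
    """Upgrade nodes batch by batch inside one site; returns the batches in order.

    `versions` maps a hypothetical node name to its software version and is
    updated in place. At most `max_unavailable` nodes are drained at a time,
    and at least `min_serving` nodes must keep carrying traffic meanwhile.
    """
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    nodes = sorted(versions)
    batch_size = min(max_unavailable, len(nodes))
    if len(nodes) - batch_size < min_serving:
        raise RuntimeError("not enough spare capacity for this batch size")

    batches = [nodes[i:i + batch_size] for i in range(0, len(nodes), batch_size)]
    for batch in batches:
        # Drain the batch: the remaining nodes keep serving subscribers.
        for node in batch:
            versions[node] = target
        # A real rollout would wait for health checks here before continuing.
    return batches


fleet = {f"node-{i}": "1.0" for i in range(1, 6)}
print(rolling_upgrade(fleet, "1.1", max_unavailable=2, min_serving=3))
# [['node-1', 'node-2'], ['node-3', 'node-4'], ['node-5']]
```

The capacity check runs before anything is touched, so a plan that would drop below the floor is rejected up front instead of failing halfway through the site.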
Imagine you have millions of subscribers connected to your network, and while they are using the service, you are doing an upgrade. You take node after node, which means that whenever a microservice is drained, the new one needs to fetch the latest subscriber state, and this needs to be smooth for the customer. Of course, that is a sort of holy grail. It would be fantastic: I could go to sleep and say, "OK, the rolling upgrade is happening by itself, I don't need to care." Unfortunately, we are not there yet. I think this is one of the holy grails for the industry, and once we achieve it, we will really see significant benefits from running in the cloud. The next part is configuration management. Configuration management really needs to follow the twelve-factor principles. At the beginning of the session, I think during the Q&A, someone posted about the twelve factors. I believe the twelve factors are the heart of this. If an application doesn't fulfill the twelve factors, it's really not cloud-native. Configuration needs to be completely separated from Helm charts and from code, and the application needs to be stateless. That's absolutely essential. The next one is our famous example, the UPF. We have a lot of discussions about how to build a UPF without relying on Multus, and we believe the holy grail is exactly to move the user plane out, towards the SmartNIC or maybe into the kernel, and use Kubernetes only for the control plane. I'm absolutely sure this topic will be touched on heavily in the upcoming presentations. And last but not least, something I call continuous testing. In the world of boxes, the question was, "Have we completed all the regression tests?" The project manager calls and asks what percentage of the tests you've completed, or there's the famous management question: "Guys, tell me how many of the test cases you have automated?" In cloud-native, this is no longer the question.
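The two requirements above, externalized subscriber state and twelve-factor configuration, can be combined in one small sketch. It assumes a hypothetical external `StateStore` shared by all replicas (a real deployment would use a database or similar) and reads its settings from environment variables rather than from code or the Helm chart; the variable names `STATE_STORE_URL` and `LOG_LEVEL`, the URL, and the subscriber identifier are made up for illustration. The point is that a replacement pod created after a drain can continue the subscriber's session because no state lived in the old pod.

```python
import os
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class Config:
    """Twelve-factor style: settings come from the environment, not the code."""
    state_store_url: str
    log_level: str

    @staticmethod
    def from_env(env: Mapping[str, str] = os.environ) -> "Config":
        return Config(
            state_store_url=env["STATE_STORE_URL"],  # required: no baked-in default
            log_level=env.get("LOG_LEVEL", "info"),
        )


class StateStore:
    """Hypothetical external store shared by all replicas."""

    def __init__(self) -> None:
        self._data: Dict[str, dict] = {}

    def get(self, key: str) -> Optional[dict]:
        return self._data.get(key)

    def put(self, key: str, value: dict) -> None:
        self._data[key] = dict(value)


class SessionWorker:
    """Keeps no subscriber state in its own memory between requests."""

    def __init__(self, name: str, store: StateStore, config: Config) -> None:
        self.name, self.store, self.config = name, store, config

    def handle(self, subscriber_id: str, event: str) -> dict:
        # Fetch the latest state, update it, and write it straight back.
        state = self.store.get(subscriber_id) or {"events": []}
        state = {"events": state["events"] + [event], "served_by": self.name}
        self.store.put(subscriber_id, state)
        return state


cfg = Config.from_env({"STATE_STORE_URL": "http://state-store:8080"})
store = StateStore()
SessionWorker("pod-a", store, cfg).handle("subscriber-1", "attach")
# pod-a is drained during the upgrade; its replacement picks up the session.
print(SessionWorker("pod-b", store, cfg).handle("subscriber-1", "handover"))
# {'events': ['attach', 'handover'], 'served_by': 'pod-b'}
```

Making `STATE_STORE_URL` mandatory means a pod with missing configuration fails at startup instead of silently running with a default baked into the image.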
In the cloud-native world, the question is: how many test cases have you automated for root cause analysis? Because you need to run the test automation continuously, all the time, and the fact that you have thousands of pcaps from the results does not mean that you know what's wrong. I'll stop here, and I would love to go to the Q&A now. Yeah, that's basically what we prepared. Thank you very much. It was really a pleasure. Thank you.