Thank you for coming to our presentation, "Testing Kubernetes Clusters and VNFs in a Telco Staging Environment." I'm Kentaro Okawa, a research engineer from NTT R&D, and this is my colleague Hiromu Asahina, one of the most skillful engineers at NTT. NTT is Japan's biggest and oldest telco; in short, the Japanese version of AT&T. Our mission at NTT is to contribute to open source that may help us in the future, and to propose the use of open source to other departments doing real business in our company, like the development team of the 5GC.

For the last two years we have worked on the OpenStack Tacker project to introduce Kubernetes into the telco and to help the migration from OpenStack-based VMs to Kubernetes-based containers. However, Kubernetes has now become quite common, so the important thing is no longer using Kubernetes, but how Kubernetes is used. We think GitOps provides a good answer to this question, so we now focus on proposing GitOps to our 5G development department. But our system still has legacy parts and we cannot change everything at once, so after internal interviews we decided that the staging environment would be our first target. If we succeed in changing our staging environment, we believe we can apply the same approach to the production environment in the near future. Now Asahina will go on to explain our challenges.

Okay, thank you. Again, I'm Asahina from NTT, a software engineer. As he said, we are trying to introduce GitOps to our 5G department, as you will see. Originally I didn't plan to do any live demo, but after watching this KubeCon, I felt we had to try it. Since we are doing GitOps, this may be the easiest demo you've ever seen. We have a configuration, a KRM-like thing, and I'm going to change just the Kubernetes version here. Okay, so something happens behind the scenes. You can see the current version of Kubernetes is 1.24.
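The KRM-like configuration edited in the demo might look something like the sketch below. This is purely illustrative: the API group, kind, and field names are hypothetical, not the actual NTT schema.

```yaml
# Hypothetical sketch of the abstracted values file edited in the demo.
# Kind and field names are illustrative, not the real schema.
apiVersion: example.ntt/v1alpha1
kind: EdgeClusterValues
metadata:
  name: edge1
spec:
  kubernetesVersion: "1.24"   # changing this to "1.25" triggers the upgrade
  networkFunctions:
    - name: oai-ran
      chartVersion: "2.0.0"   # illustrative version
```

Committing a one-field change like this to Git is the whole user-facing operation; everything downstream happens automatically.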
It's a little too old, but if this succeeds, we will see it upgraded. I'll check it later; it's going to take several minutes, so we might run out of time. Okay, let's go back to my slides.

As he said, we worked on OpenStack, but nowadays OpenStack is a bit legacy and we have to move on to Kubernetes, and just using Kubernetes is not important anymore; how to use it is more important. The answer to that question is GitOps, obviously. So we are now trying to introduce GitOps to our 5G department, and what I will show today is the prototype we are proposing now.

Okay, let's summarize what GitOps is. I don't believe it's really necessary because this is KubeCon, but I'd like to get everybody on the same page. GitOps is a methodology and practice that uses a Git repository as a single source of truth. The interesting part for me is "single source of truth," because that means if I put everything in the Git repository, we can deploy it to the actual environment, and we can reach every piece of information we need in the Git repository. In other words, we can gather all the information we need in the Git repository. That is the most interesting part for me; I'll come back to it later.

Next, I'd like to show you a good example of what we call the staging environment. By the way, this is a shared open lab provided by docomo R&D. It is not related to us; it belongs to another department and I just borrowed it, but it's a good example, so let's see. You can see the server here and the vRAN application running on it. This server is our target; specifically, the software running inside this server is our target: the Kubernetes cluster running inside it, and the network function, the vRAN, running inside it. What is out of scope? We don't care about devices like smartphones or Wi-Fi routers, whatever they are.
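The "single source of truth" loop described above can be sketched with standard Flux CD resources: Flux watches a Git repository and continuously applies whatever is committed there. The repository URL and path here are placeholders.

```yaml
# Minimal Flux CD sketch of "Git as the single source of truth".
# Flux syncs the repository and applies the manifests at the given path.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: staging-config
  namespace: flux-system
spec:
  interval: 1m
  url: https://example.com/telco/staging-config.git   # placeholder URL
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: staging
  namespace: flux-system
spec:
  interval: 5m
  sourceRef:
    kind: GitRepository
    name: staging-config
  path: ./clusters/staging                            # placeholder path
  prune: true
```

With this loop in place, "deploying" is just merging a commit, and the repository is also where you go to read the current intended state.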
And our radio stations are out of scope, because it is hard to control them from Kubernetes for now, so we are focusing on virtual resources. And although this example only shows the vRAN, we try to support both the 5GC and the vRAN.

Okay, so this is the procedure of our staging environment. When I say staging environment, it's a kind of mirror of the production environment. You can see two teams here. One team is in charge of creating the infrastructure for the testing environment: everything, the Kubernetes cluster and physical things like the network. The other team, the testing team, is in charge of running the actual tests. If all tests have passed, it's a sign that we are ready to promote to the production environment.

What I want to do is change it to this: I want to make this environment as self-service as possible, meaning that if the testing team wants to run their tests, they can create a Kubernetes cluster on their own and run the tests from CI. Right now they are doing tests by hand; a part of the tests is automated and the rest is manual, so I want to change it to CI-based tests. But we don't try to change everything at once, as he said. We accept that there are manual parts, but at least we can control them through GitOps.

Here I want to share a little bit about how I pitched this to our decision makers, because when you try to introduce GitOps to your department, someone may say, "Is it worth it? Is it beneficial for us?" We were facing that problem. In our environment, not all members are coders, and they have already automated some steps, so they may not be entirely welcoming to GitOps. When I said that GitOps is a good solution for progressing automation, they were not interested. So I changed my pitch: GitOps is something that improves information reachability. Our environment currently looks like this: there are many spreadsheets, and artifacts are scattered everywhere.
We can't reach them easily, so there is key-person dependency: only a specific key person knows where each artifact is. This is simple to fix by using GitOps. GitOps is not a portal, but by introducing GitOps we can do everything through Git, so our members are naturally led to the view that GitOps is the center of the world. In other words, we don't need an expensive robot that can automate one specific step; we need a conveyor belt, a single line.

So let's move on to how I did it, how I introduced GitOps to our staging environment. I added one thing to the picture I showed previously: the system design part. The system design is fundamental for us. It is decided at the beginning of our development process, and the staging environment and the production environment should follow that design. So we have to find a way to handle the system design, and a way to reflect it in the staging environment somehow with GitOps. The first challenge, then, is how to convert that 5G system design into Kubernetes cluster and network function configurations.

Let's look back a little at what the 5G system looks like. It's a nation-wide service, and there are many edges with complex configurations: networks, placement, accelerators. So we have to handle several deployment patterns, as I show here. One is one Kubernetes cluster per edge. Another pattern we have is one Kubernetes cluster per region, where a region means a set of edges. We have to handle those kinds of differences in the configuration.

Next I'd like to show you our development process. This is not the telco development process; this is what I believe is the usual development process, for something like a web service. In that case, everything happens within a single organization, so you can control the development team and the production team, and you can unify the development phase and the operating phase. But in telco it's slightly different.
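A system-design input covering the two deployment patterns above might be expressed as a small KRM-style resource like the following. The kind and fields are hypothetical, invented here for illustration.

```yaml
# Illustrative system-design resource; API group, kind, and fields
# are hypothetical, not the actual NTT design schema.
apiVersion: design.example.ntt/v1alpha1
kind: SystemDesign
metadata:
  name: region-east
spec:
  topology: cluster-per-edge   # alternative pattern: cluster-per-region
  edges:
    - name: edge1
      accelerator: gpu
    - name: edge2
      accelerator: none
```

The idea is that a CI step expands a compact design like this into the full per-cluster and per-network-function configuration, so the design document and the deployed environment cannot drift apart.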
We are separated by vendors. Maybe it's a slightly wrong word choice, but telecommunication service development looks more like car manufacturing. We design the system, but we don't build it ourselves. Instead, we order and buy software and hardware from multiple vendors, assemble them, test them, operate them, and sell them, maybe someday. So how does that affect us? It means we can't start from the development process. Instead, we have to start from the system design part, and make the system design, the staging environment, and even the production environment consistent; that is our single source of truth.

About testing: we do regression tests, of course, and those can be handled easily by a CI pipeline. But we also have complex scenario tests for specific operations, like upgrading a Kubernetes cluster on the edge. Such a scenario might consist of multiple operations, multiple steps, and has to be executed for specific changes. So we have to handle those two different kinds of tests on a Git service. Those are the challenges I just described, so let's see how we solved them.

This is the general architecture of our system; let me introduce it a little. There is Flux CD; we use Flux CD as the GitOps CD tool, and we use Cluster API. As I said, we try to create the Kubernetes cluster and the network functions at the same time. And we basically use Helm; in other words, we don't use other kinds of packaging systems like Kustomize or kpt packages, because we want to reduce the options we show to our development members. Flux CD is syncing this repository, and this repository contains HelmReleases. A HelmRelease is a kind of Flux CD resource; in it you can write any parameters you want to pass to the `helm install` or `helm upgrade` command. And we have tests: we define a test as a CI job config, and this job config is templated in another repository.
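A HelmRelease as described above looks roughly like this. The structure is the real Flux CD API, but the chart name, source, and values here are placeholders, not the actual vendor charts.

```yaml
# Sketch of a Flux CD HelmRelease; chart, source, and values are
# placeholders for illustration.
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: oai-ran
  namespace: ran
spec:
  interval: 10m
  chart:
    spec:
      chart: oai-ran            # placeholder chart name
      version: "2.0.0"
      sourceRef:
        kind: HelmRepository
        name: vendor-charts     # placeholder chart repository
  values:                       # passed to `helm install` / `helm upgrade`
    replicaCount: 1
```

When Flux reconciles this resource, it performs the Helm install or upgrade itself, so committing the manifest is equivalent to running the Helm command by hand.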
So the testing team can just configure their scenario here; they consume the predefined templates, so they can focus on the test itself. The last thing I want to share on this slide is the abstracted Helm values. This is a new thing I made, and it is what I showed at the beginning of this presentation. HelmReleases are a little too complex for our testing members, so we use this layer so that they can focus only on the parameters they actually have to change. As with the templating of the CI configuration, the most important thing for us is separation of concerns, so I applied it here as well.

This separates the system design part from the HelmReleases, like this. As I said, the single source of truth for us is the system design, so we define it as Kubernetes-like resources, and CI automatically converts it into configuration. Our members can just focus on the system design part.

So what does it look like? It looks like this. It's small, it's short, it's abstracted enough. The HelmRelease is a bit more complex; there are fields our members don't have to care about, and HelmReleases can vary depending on what you bought from which vendor. So we built an abstraction layer to absorb those differences.

Okay, the next challenge: how to manage test scenarios on a Git service. Honestly, we don't have a concrete answer to that question yet, but what we are trying now is something like GitHub flow. We create a branch at the beginning of a test scenario, do the operations and configuration for the test on that branch or pull request / merge request, so that we can easily find the unit of test later, and after the test has passed, we merge it. It's a really simple approach.
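The CI template consumption described above could look like the following, assuming a GitLab-style CI; the project path, template file, and job names are all hypothetical.

```yaml
# Hypothetical test-scenario config: the testing team includes a
# predefined pipeline template and only sets scenario parameters.
include:
  - project: telco/ci-templates        # placeholder template repository
    file: /templates/scenario-test.yml

variables:
  TARGET_EDGE: edge1
  SCENARIO: upgrade-connectivity

connectivity-test:
  extends: .scenario-test              # job template defined in the include
```

The tester edits only the variables and the scenario name; the mechanics of provisioning, running, and collecting artifacts live in the shared template.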
And we made rules: make test scenarios on Git; define CI templates to enable testers to focus on the test scenario; make a test branch before making a test; do not make multiple test branches at the same time. These rules are maybe not strictly necessary, but we always have options, and picking one option out of many is a hard question for most of our members, so we have to make rules even if they are not so sophisticated.

Okay, let's go to the demo part. What I did at the beginning is still running behind the scenes, so I'll show you a video first, and at the end I'll show you the result of what I did at the start of this talk. We have two scenarios in this demo. The first is creating the 5G system: deploying a Kubernetes cluster and OAI network functions on it, and running tests. The second is upgrading the Kubernetes clusters and checking that connectivity is maintained during the upgrade. We use OpenStack as the infrastructure and, as I said, OpenAirInterface as the workload.

This is the initial stage of the demo, and we do all of this in one shot: connectivity test and stress test. Here are the abstracted values I showed just before, and I am pushing those values to the abstracted-values repository. It's pushed, and as I said, it's automatically converted to a HelmRelease and promoted to the repository that Flux is watching. This is the repository Flux is watching, and this is the configuration of our test scenario; as I said, we imported some templates and just configured the parameters for the test. Now the HelmRelease has been promoted, and now we have a staging environment. So many things happened here: you can see the connectivity test, and it passed, as did the stress test. Now you can see the cluster is provisioned, and we can access the provisioned cluster with the kubeconfig automatically created by Cluster API. All the network functions are successfully deployed, we ran the tests for network connectivity and the stress test after that, and they passed. Like an ordinary CI setup, we can access the test artifacts easily. It's a dummy, but we made a CI job to promote our staging environment to the production environment, and as I said, if the test has passed, we merge it. That is the first scenario.

Let me show you the second one. Now we are trying to upgrade edges, and let me explain a little more, because this is a slightly more complex test. When we upgrade edges, we make a plan, like upgrading one edge while watching another edge running in the same area, so that connectivity for users is maintained. We are trying to automate that here: keep watching that the other edge can still connect to the 5GC. I think we are running out of time, so let me speed up. Again, I updated the abstracted values, so it's pretty easy; they are automatically converted to HelmReleases as I showed before and promoted to the staging environment, and the tests are automatically triggered. As I said, there are some tests that are specific to this scenario; for example, there is a connectivity test from adjacent edges, to verify the service does not go down. All jobs succeeded, and we do the same thing for the other edge; it's almost the same, just upgrading edge 2 in the same way with the same tests, applying each change toward the production environment step by step. And we can access the artifacts, of course.

Okay, that's it, so let me summarize this talk a little. As I said, in our environment, not all members and not all steps can be automated at once, so we need to make the design and the configuration consistent. I don't know if this suggestion works for you, but what we want first is to make an extensible place for future automation, not to try to automate everything at once, and to separate concerns; abstraction and templates can be good choices to start this kind of thing. And use OSS to keep maintainability and avoid key-person dependency.

Lastly, let's see what happened in the actual environment. Sorry, I'm really sorry, I forgot to promote it to the actual environment, so I missed that step, but I'll show you that the changes are automatically propagated here. It's exactly the same thing as in my video, and after merging it, I believe it starts to work. See, it's going to be 1.25. Unfortunately we can't see the end of this demo because it takes time, but maybe you can get some clue from here. Okay, thank you for listening.