Hi everyone, my name is Hyokyokui. I'm a software engineer at NTT Communications, and I'm in charge of developing network provisioning systems, especially for transport networks, to configure optical devices like transponders, ROADMs, and Layer 2 switches. I'm also a member of the DevOps platform team, where I'm working on developing a Kubernetes-based CI/CD platform. Now I'm really interested in bringing my knowledge and experience of Kubernetes and cloud-native application development to the modernization of transport network provisioning systems. So these days I'm mainly focusing on developing a new network provisioning system leveraging cloud-native open-source software and application development practices. Thank you for having me at this great conference. I really appreciate the Open Source Summit Japan community giving me this opportunity to share my recent activities around this cloud-native network provisioning system.

Okay, before we get into the main topic, let me share the scope of this presentation. The network provisioning system I will share with you is one that receives intent-based, higher-level network provisioning requests from the northbound API, compiles the requests to actual device configs, and deploys them to the southbound devices. It focuses on network configuration only, and its target is only the configuration plane, not the control plane. So topics like config generation from high-level intent, declarative configuration and programming, and GitOps are within the scope of this presentation, while topics related to the control plane, workflow engines or API orchestration to integrate with other systems, monitoring, and zero-touch provisioning are out of scope.

Today's session consists of the following four topics. First, I will share the legacy network provisioning system and its practical problems. Then I will introduce the cloud-native technologies that are helpful for developing network provisioning systems. Next, I will show you the brand-new network provisioning system leveraging Kubernetes and some other cloud-native open-source technologies. Finally, I will show you a demonstration of what I developed.

Okay, so let's get into the main session. I will show the following two cases and explain the practical problems we face in production-level system development: API automation, and Git-based continuous delivery.

So first, API automation. These days, most network devices or their management systems have a REST API, so we can configure them through the API automatically, without human operation. In addition to that, Ansible modules and Python libraries for network automation are available as open source. Thanks to these open technologies, we can develop a network element provisioner relatively easily. The network element provisioner receives an intent-based API request from the northbound, maps it to the actual device config, then sends a request to update the device config on the southbound devices. If you want to provide some kind of on-demand network service using these devices and the provisioner, you might also need an API orchestrator or a workflow engine to collaborate with other systems, like inventory management, accounting, and so on. And inside the network element provisioner, in most cases, some kind of configuration store is needed to store the northbound service intents and to cache the actual device config for data reconciliation, and in some cases for better performance.
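To give a concrete feel for this flow, here is a minimal, hypothetical sketch in Go of such a network element provisioner: it accepts an intent-based request on a northbound REST API, maps it to a device-level config, and would push it southbound. All names, the intent shape, and the config format are illustrative assumptions, not the actual system.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// VlanIntent is a hypothetical northbound service intent.
type VlanIntent struct {
	Device string `json:"device"`
	Port   string `json:"port"`
	VlanID int    `json:"vlanId"`
}

// mapToDeviceConfig maps the high-level intent to a device-level config.
// In a real provisioner this would be per-vendor mapping logic.
func mapToDeviceConfig(in VlanIntent) string {
	return fmt.Sprintf("interface %s\n  switchport access vlan %d\n", in.Port, in.VlanID)
}

func provision(w http.ResponseWriter, r *http.Request) {
	var in VlanIntent
	if err := json.NewDecoder(r.Body).Decode(&in); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	cfg := mapToDeviceConfig(in)
	// Here we would push cfg to the southbound device via its API;
	// for this sketch we only log it.
	log.Printf("would deploy to %s:\n%s", in.Device, cfg)
	w.WriteHeader(http.StatusNoContent)
}

func main() {
	http.HandleFunc("/vlans", provision)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```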
In my opinion, what I've just described is a fairly simple but typical basic architecture for a telecom network service. At NTT Communications, some products are structured like this architecture. However, you will face some critical problems if you want to develop production-quality network services on it.

First, Ansible is a great product, but if you want to integrate minor devices or develop complex use cases, there might be no Ansible module you can use as an out-of-the-box solution, and you might have to implement the network provisioning logic from scratch. If you get into transport network application development, this is exactly the case you face. And if you write a simple network provisioning script by yourself, you might choose text templating, like Jinja. Jinja text templating is useful, and it is enough when you implement small use cases where there are no relationships or dependencies between the requests from the northbound. But once you get into complex use cases, consisting of, say, underlay and overlay networks with lots of BGP connections, the strategy of using text templating breaks down.

In addition, if you want to check what is actually provisioned to the network devices, these configurations are generated dynamically by the network element provisioner, so you have to invoke a get command against the network devices directly to see what is actually deployed. And in terms of data sync, configuration drift can occur when a network operator configures network devices directly, without going through the network element provisioner, for some troubleshooting, or when the application version is updated and the mapping logic from northbound intent to device config is changed. You have to reconcile when such configuration drift occurs, but if you have thousands of intent-based service orders, this manual data sync and reconciliation is really troublesome.

Okay, so let's move on to the next use case. Git and Git repository services are good choices for content management of Infrastructure as Code. Most engineers are familiar with Git operations, so it improves the developer experience, and we can get lots of benefits from the Git-based tools and practices that have been elaborated for application development. Even in the network community, almost all engineers know the advantages of Git-based operation. So recently, lots of network operators are eager to introduce Git-based continuous delivery into network operation, using simple scripts as the provisioner and using an inventory system like NetBox to manage infrastructure data. But Git-based continuous delivery in the network community is a little bit different from the cloud-native GitOps approach, and there are some critical problems in this use case as well.

First, if there is no Ansible module you can use and you choose simple scripts and text templating, it is difficult to test whether the generated configuration is correct without deploying it to the actual device. The generated configuration is just plain text, and there is no model or schema, so you cannot perform any static analysis or any other model-based tests like policy checks. All you can do is golden file testing, where the reviewer checks the difference between the previous snapshot and the newly generated one. As you know, it is very painful for a reviewer to check all the config updates with his or her own eyes, and they might miss some critical errors. It just depends on the reviewer's skill, so it's very painful, and we should try to change this situation.
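To illustrate why plain text templating falls short, here is a small Go example in the same spirit as a Jinja template. It renders a config from variables, but nothing stops us from rendering nonsensical values; the output is just text, with no schema to validate against. The template and values are made up for illustration.

```go
package main

import (
	"os"
	"text/template"
)

// A Jinja-like text template for a BGP neighbor stanza.
// This is an illustrative snippet, not a real device config.
const bgpTmpl = `router bgp {{.ASN}}
 neighbor {{.Neighbor}} remote-as {{.PeerASN}}
`

func main() {
	t := template.Must(template.New("bgp").Parse(bgpTmpl))
	// Nothing validates these values: the ASN is out of range and the
	// neighbor address is malformed, but rendering still "succeeds".
	_ = t.Execute(os.Stdout, map[string]any{
		"ASN":      99999999999, // invalid: exceeds the 32-bit ASN space
		"Neighbor": "10.0.0",    // invalid: not a full IPv4 address
		"PeerASN":  65001,
	})
}
```

The only way to catch mistakes like these is a human reviewer diffing the rendered text, which is exactly the golden file situation I just described.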
Even if the reviewer checks perfectly and there are no errors, unfortunately the actually deployed config might differ from the staging test results. There are some reasons for this configuration drift. One is that the management data stored in NetBox has changed since the staging test was run. Configuration drift will also occur when the network element provisioner's application version is different from the one installed in the staging environment.

There is another problem. The config mapping logic of the network element provisioner depends on an external system like NetBox. So if you want to roll back to a previous configuration, it is not enough to perform a git checkout to the target revision; you also need to restore the external service to the corresponding version. The network operation is not closed within the Git operation, so operators must ensure the right version of the configuration is restored to both the Git repository and external services like NetBox. As for security, this architecture takes a push-based approach, so all secrets like login passwords and credentials are stored in the network element provisioner. As you know, this increases the security risk and should be corrected.

I've introduced some problems of the current network provisioning systems. From here, I will explain the cloud-native technologies that are helpful to solve these problems. There are lots of great technologies and OSS projects in the cloud-native landscape. Today I'd like to share four topics that are especially helpful for the development of network provisioning systems.

As you know, Kubernetes is the most famous container orchestration tool in the world, and it contains excellent mechanisms for automation: the reconciliation loop and the operator pattern. The Kubernetes reconciliation loop is an engine that converges the actual state of the delivery target infrastructure to the declaratively described manifests, by running the procedures implemented in reconcilers repeatedly. All Kubernetes resources are managed by this approach, which gives Kubernetes great features like resiliency and self-healing. In addition, we can extend Kubernetes' behavior by implementing custom resource definitions and custom controllers to manage external applications or our own resources, without modifying Kubernetes itself. The Kubernetes operator pattern ecosystem is growing rapidly, and there are some great frameworks for writing your own custom operators, like Kubebuilder and Operator SDK.

Next, GitOps. There are some important principles in GitOps that should be satisfied. The configuration of the system and the infrastructure must be described declaratively as Infrastructure as Code, and the system configuration should be deployed when a new Git commit is pushed or a pull request is merged to the main branch. Also, the concept of a single source of truth is really important for GitOps: the Git manifest repository must be the single source of truth, so configurations must be aggregated into the manifest repository. And GitOps controllers must support a pull-based approach to ensure that the configuration stored in the single-source-of-truth repository is deployed as it is, without any modifications.
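To make the operator pattern concrete, here is a minimal sketch of what a custom controller's reconciler looks like with controller-runtime, the library underlying Kubebuilder. The resource type and the reconciliation body are placeholders; the point is the shape of the loop: observe the desired state, compare it with the actual state, and converge.

```go
package controllers

import (
	"context"

	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// DeviceReconciler reconciles a hypothetical Device custom resource.
type DeviceReconciler struct {
	client.Client
}

func (r *DeviceReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// 1. Fetch the desired state: the Device custom resource named by req.
	//    var device v1alpha1.Device
	//    if err := r.Get(ctx, req.NamespacedName, &device); err != nil { ... }

	// 2. Fetch the actual state from the managed system
	//    (for a network operator, e.g. the running device config).

	// 3. If they differ, apply changes to converge actual -> desired,
	//    then update the resource status. Returning an error makes
	//    controller-runtime requeue and retry: the reconciliation loop.
	return ctrl.Result{}, nil
}
```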
And next, the CUE language. CUE is a powerful data configuration language, and it has great features that change our current practices for handling declarative configuration significantly. The following is a brief overview of CUE's features.

CUE is specialized in data unification. It enables us to unify multiple pieces of data at arbitrary layers. And since CUE is commutative and associative, we get the same result when composing multiple pieces of data regardless of the order of evaluation. In terms of types, CUE does not distinguish between values, types, and value constraints. Even types and constraints are a kind of value; the only difference is that types and constraints do not have a concrete value. Due to this novel approach, we can declare constraints and schemas simply and effectively in the configuration data itself. CUE is also highly programmable. It supports software coding practices like templating and modules, so we can get the same level of benefits as if we were using a general-purpose language. In addition, you can generate CUE type definitions from Go code, OpenAPI, and Protobuf. So CUE is a use-case-agnostic language, and it can support any data model by generating CUE types according to these API schemas.

Now, leveraging the cloud-native technologies I mentioned before and CUE, you can construct a modern CI/CD pipeline like this. The declaratively described manifests and system configuration are written in CUE, and since configurations written in CUE have types and schemas, you can perform static analysis like type validation. In addition to that, you can implement policy enforcement logic in CUE, so you can perform policy tests as well. You can set up pull-based GitOps using Argo CD or Flux CD: the controller polls the Git commits of the manifest repository, and when it detects changes, the latest CUE manifests are compiled to YAML manifests and then deployed to the Kubernetes cluster. In fact, as of now, CUE is not widely used as far as I know; Kustomize or Helm is commonly used as the configuration tool. But in my opinion, CUE has the capability to replace them in the near future.

In order to compare this modern CI/CD pipeline with the legacy network provisioning system, I have arranged the modern CI/CD pipeline above and the legacy network provisioning system below. They look similar, but there are some significant differences between them, and the legacy network provisioning system is behind on some points. As I mentioned before, there are no models or schemas in text-templated configuration, so there is no way to perform static analysis, and nothing beyond golden file testing. The configuration data stored in the manifest repository is not a single source of truth, so you need an extra operation in addition to running git checkout to execute a rollback. Since simple procedural scripts and text templating are included in the delivery flow, modification of the device config by those scripts might cause an error or configuration drift. And it uses a push-based approach, so all device secrets are stored in the network element provisioner, which increases security risk. All these issues are critical and should be addressed, and I think we can improve them considerably by adapting cloud-native technologies and practices to network provisioning systems as well.

So let me introduce what I have created: a new network provisioning system leveraging cloud-native technologies. These are the basic requirements of this new network provisioning system. To improve testability and programmability, I need typed programming of network configuration, not simple text templating.
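To make the difference from text templating concrete, here is a small runnable example using CUE's Go API (cuelang.org/go). It unifies a schema, which in CUE is just a value carrying constraints, with concrete configuration data, and validation fails when a value violates a constraint. The schema itself is a toy example of my own, not one from the system.

```go
package main

import (
	"fmt"

	"cuelang.org/go/cue/cuecontext"
)

const schema = `
// Types and constraints are values in CUE: vlanId must be an int in range.
#Interface: {
	name:   string
	vlanId: int & >=1 & <=4094
}
iface: #Interface
`

const data = `
iface: {
	name:   "eth0"
	vlanId: 99999 // out of range: unification will fail validation
}
`

func main() {
	ctx := cuecontext.New()
	// Unification is commutative and associative: order does not matter.
	v := ctx.CompileString(schema).Unify(ctx.CompileString(data))
	if err := v.Validate(); err != nil {
		fmt.Println("validation error:", err)
		return
	}
	fmt.Println("config is valid")
}
```

Running this prints a validation error for vlanId instead of silently emitting broken text, which is exactly what the text-templating approach could not give us.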
In the network configuration ecosystem, there is a good schema definition language called YANG, so it is better if the type definitions can be generated from YANG schemas automatically. In addition to the programmability of the network provisioning system itself, it is also important to improve the programmability of the northbound applications that call the northbound API. To achieve this, the system needs to be able to provide an abstract, intent-based, high-level CRUD API easily and automatically, without implementing a detailed merge or deletion strategy. In my opinion, if this provisioning system can expose an intent-based, high-level CRUD API, it is really helpful for the developers of the northbound applications, since they can perform domain-driven development by using this network provisioning system as the repository layer of a DDD architecture. In order to achieve auto-generation of the high-level CRUD API, the configuration language must have the ability to compose multiple typed document trees, and to be able to do that, the characteristics of commutativity and associativity are required. And, as you know, CUE meets these requirements. So I decided to select CUE as the configuration language for this network provisioning system.

And, of course, GitOps is a key concept of this network provisioning system as well, and I've implemented this feature using the Flux CD source controller. As for the other fundamental requirements of a network provisioning system, we need features like transactions across distributed network devices and support for multi-vendor, multi-version devices. I've implemented these features as Kubernetes custom operators, and I selected Kubebuilder to implement them. I've implemented several custom operators for this network provisioning system; Kubebuilder was really helpful for me, and it let me focus on the core logic of the custom operators.

This is the overall design of the new network provisioning system I've implemented. But this diagram is a bit too busy to explain at once, so I will explain the details in parts. This is the API server. When it receives an intent-based northbound API request, it compiles the high-level model data to the low-level actual device config by evaluating CUE configuration. Users of the system have to write the data mapper that maps the high-level model data to the low-level device config, and this is very easy thanks to CUE's high programmability. During the config compilation, CUE performs type validation, and even policy checks in the case that you write policy enforcement logic in the data mapper. The API server exposes a gNMI API to the northbound, so operators or northbound applications can interact with this system through gNMI. In order to generate the entire device config document from multiple northbound requests, it performs composition of the multiple compiled device configs. This composition is achieved by using CUE's characteristics of being commutative and associative.

The compilation result is sent to the Git manifest repository. The compilation result includes the entire network device configuration, so this manifest repository is the single source of truth: we can roll back the entire network configuration by only running git checkout, and we can also get the entire device config from the Git repository without invoking show commands on the network devices directly. As for staging tests, we can test with the entire actual device config, and we can also run static analysis like type validation and policy checks, which improves the testing quality.
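The data mapper idea can be sketched with CUE's Go API as well. Here a toy mapping template, embedded as a string, turns one northbound intent into per-device configs; the API server would evaluate it by filling in the input and reading the output. The template and field names are illustrative assumptions, not the actual ones from the system.

```go
package main

import (
	"fmt"

	"cuelang.org/go/cue"
	"cuelang.org/go/cue/cuecontext"
)

// A toy data mapper: the northbound intent (input) is mapped to
// device configs keyed by device name (output).
const mapper = `
input: {
	vlanId: int & >=1 & <=4094
}
output: {
	device1: interface: vlan: input.vlanId
	device2: interface: vlan: input.vlanId
}
`

func main() {
	ctx := cuecontext.New()
	v := ctx.CompileString(mapper)

	// Fill the northbound request into the input field; the per-device
	// configs in output are then derived by CUE evaluation.
	v = v.FillPath(cue.ParsePath("input"), map[string]int{"vlanId": 999})
	if err := v.Validate(cue.Concrete(true)); err != nil {
		fmt.Println("mapping failed:", err)
		return
	}

	out := v.LookupPath(cue.ParsePath("output"))
	iter, _ := out.Fields()
	for iter.Next() {
		fmt.Printf("%s: %v\n", iter.Selector(), iter.Value())
	}
}
```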
In this Kubernetes cluster, the Flux CD source controller, the DeviceRollout operator, and the device driver operators for each vendor and each version are installed in advance. When a pull request is merged, the Flux CD source controller detects the configuration update and pulls the latest configuration into the cluster. The Flux CD source controller then updates the corresponding DeviceRollout custom resource, and the configuration update transaction starts. The DeviceRollout controller is responsible for transaction management: it acts as the coordinator of the transaction across distributed network devices, so if provisioning fails on any device, all devices are rolled back to the previous state. All device drivers are implemented as Kubernetes custom operators, so you can easily extend the system to support multi-vendor and multi-version devices by implementing additional device drivers as Kubernetes custom operators.

As for secrets, since this network provisioning system uses a Kubernetes cluster as its infrastructure, it can use Kubernetes Secret resources to store the device secrets. As you know, there are some great open-source projects like External Secrets Operator and the Secrets Store CSI Driver, so we can easily integrate with the secret management services of public clouds. The secret manager integration is still at the conceptual stage; it is future work.

After all, by integrating all the components I've introduced, we can build the new network provisioning system like this. The new network provisioning system platform itself is already prepared, but in order to use this system, we need to create the device driver custom operators and the CUE type definitions for type validation. If the provisioning target device supports OpenConfig-style YANG modules and you want to configure the devices with the OpenConfig model, you can use the open-source ygot, the YANG-centric Go code generation tool provided by the OpenConfig community. By using ygot, you can generate Go structs from YANG modules, and you can also generate CUE type definitions from the Go structs by running the cue get go command. You can use the Go structs to implement the device driver custom operators, and the CUE type definitions to implement the mapping logic from the northbound model to the southbound device config inside the API server. As I mentioned, we can get lots of benefits from OpenConfig and ygot for the device driver and mapping logic implementation, but if you want to configure other network devices that do not support YANG and OpenConfig, we have to find another way to generate them. That is also future work.

Okay, so let me show you a demonstration of this network provisioning system. I'll show you a simple demonstration in which the network provisioning system configures OpenConfig-based gNMI emulators. All the network provisioning components are installed in a single Kubernetes cluster, so you can see all active containers by running kubectl get deployment. There are seven deployments: the API server, which carries the tentative name of this application; the Flux CD source controller; the Kubernetes custom controller for the DeviceRollout resource; the custom controller for the device driver; and two gNMI fake servers for the demonstration. You can also check the custom resource definitions like this. There are the DeviceRollout custom resource definition, one for the OpenConfig demo device driver, and the Flux CD GitRepository custom resource definition as well.
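As a rough illustration of what such a DeviceRollout resource definition could look like, here is a Kubebuilder-style Go type sketch. The field names and the set of states are my guesses for illustration, not the actual CRD of this system.

```go
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// RolloutPhase is a hypothetical transaction state.
type RolloutPhase string

const (
	RolloutHealthy RolloutPhase = "Healthy" // transaction completed successfully
	RolloutRunning RolloutPhase = "Running" // transaction in progress
	RolloutFailed  RolloutPhase = "Failed"  // rolled back to the previous state
)

// DeviceRolloutSpec references the desired per-device configs
// pulled from the Git manifest repository.
type DeviceRolloutSpec struct {
	// DeviceConfigMap maps a device name to its desired config revision.
	DeviceConfigMap map[string]string `json:"deviceConfigMap,omitempty"`
}

// DeviceRolloutStatus records the progress of the multi-device transaction.
type DeviceRolloutStatus struct {
	Phase RolloutPhase `json:"phase,omitempty"`
}

// DeviceRollout is a sketch of the custom resource coordinating
// the transaction across distributed network devices.
type DeviceRollout struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   DeviceRolloutSpec   `json:"spec,omitempty"`
	Status DeviceRolloutStatus `json:"status,omitempty"`
}
```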
There is one Flux CD GitRepository resource, named after the testbed, and it points to the testing Git repository. And there is also one DeviceRollout custom resource, which has the same name as the Flux CD GitRepository resource. Its current provision state is Healthy, which means that the provisioning transaction has been completed successfully. And two OpenConfig demo custom resources are installed as the device drivers in order to configure the fake devices, and these OpenConfig demo custom resources are registered to the DeviceRollout resource. In this manifest, there are two devices, device 01 and device 02, and some data to manage the multi-device transaction is also stored in this manifest.

So let's try to call the northbound API and configure the fake devices. First, let me introduce the data mapping logic, written in CUE, that converts the intent-based northbound service model to the southbound device model. This is the data mapping provisioning logic, and there are two fields: input and template. In the input field, there is a type definition, and the northbound interface is exposed using this schema. In the template field, the main mapping logic is written. It has input and output fields, and the output field stores key-value pairs: the key is the device name, and the value is the device config that should be applied to each device. This is just a conceptual demonstration, and the use case and the configuration are also fake; only a VLAN sub-interface and VLAN definitions are provisioned. And you can configure the devices through this interface and mapping logic from the northbound API by sending this request payload, which conforms to the defined input interface.

For the demonstration, let me set up port forwarding to the API server and also set a watch on the DeviceRollout custom resource to show the updates of its transaction state. I want to set VLAN 999 in this demonstration, so let me check that there is no existing configuration for VLAN 999. You can get the entire device configuration by using gNMI like this, and also check that there is no existing configuration with grep. Okay, fine. Everything is prepared, so let me send the set request to the northbound API of the API server. There are no errors; the API server makes a Git commit and pushes it to the Git repository, then the DeviceRollout custom resource receives the changes and conducts the transaction management. It changes to the Running state, and then the transaction is completed.

Okay, so let me check whether the VLAN 999 configuration is applied. The device config generated by evaluating the mapping template written in CUE has been provisioned to the devices successfully. And if you go to the Git repository, you can find the commit which has the same Git revision. The VLAN 999 OpenConfig service model is committed, and the device config generated from the northbound service model is also added. You can see what is provisioned to the devices from the Git console or by querying the devices directly. That is the end of this demonstration.
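For reference, since the northbound set request in this demo is plain gNMI, a client could be written with the openconfig/gnmi Go packages roughly like this. The target address, path, and payload are placeholders for illustration, not the actual API contract of this system.

```go
package main

import (
	"context"
	"log"
	"time"

	gpb "github.com/openconfig/gnmi/proto/gnmi"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Connect to the port-forwarded API server (placeholder address).
	conn, err := grpc.Dial("localhost:9339",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := gpb.NewGNMIClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	// An intent-based northbound payload (illustrative): set VLAN 999.
	payload := []byte(`{"vlanId": 999}`)
	req := &gpb.SetRequest{
		Update: []*gpb.Update{{
			Path: &gpb.Path{Elem: []*gpb.PathElem{{Name: "services"}, {Name: "vlan"}}},
			Val: &gpb.TypedValue{
				Value: &gpb.TypedValue_JsonVal{JsonVal: payload},
			},
		}},
	}
	resp, err := client.Set(ctx, req)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("set response: %v", resp)
}
```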
Okay, so let me check whether the requirements are satisfied by this network provisioning system. Leveraging CUE, we got great programmability and testability compared to text templating. In addition, CUE's characteristics of being commutative and associative gave us the ability to compose multiple partial document trees; as a result, it enables us to create a high-level abstraction model and expose a CRUD interface. All these requirements are satisfied by adopting CUE.

As for GitOps and the other requirements, fundamental requirements like a single source of truth, the pull-based approach, and multi-device transactions are satisfied by adopting Flux CD and the Kubernetes custom operators. But there are some remaining future works, like secret manager integration and testing with actual network devices. So now we are planning to perform integration tests with actual devices. At first, we are trying to do a field trial with transport whitebox transponders using OpenConfig and gNMI, and as you can see, we need to conduct many more integration test cases with actual network devices to improve the quality of this system. We'd also like to release this network provisioning system as open source, so we are actively developing the missing features. For now, the system is just PoC-level quality, so we have to put in much more effort.

These are the takeaways from my presentation. First, I've developed a new network provisioning system leveraging Kubernetes and some cloud-native open-source technologies. Second, the Kubernetes operator pattern is well designed for automation; it's really helpful for managing even external resources, and it can be applied even to network configuration. Finally, CUE is a great language that has the potential to be a game changer, and we might be able to change our network provisioning operations drastically by introducing CUE. This is the end of my presentation. Thank you for listening.