Hi, thank you for being here today. We are going to talk about our recent work with Cluster API. My name is Kotaro. I am a software engineer at LY Corporation. And I'm Shotaro. I'm also a software engineer at LY Corporation. Nice to meet you. These are today's topics.

First, let us briefly introduce our company and services. Then let us introduce our platform and its scale. LINE is a communication app that connects people, services, and information through various services, especially free messages, voice, and video calls. LINE was launched in 2011, and we now have over 171 million users across four major markets. Almost all of our services run in our own data centers.

Verda is our private cloud platform that helps LINE service developers build and run their services on our infrastructure. Our mission is to build a platform that enables infrastructure automation for both provisioning and operation. Verda consists of infrastructure as a service, like VMs or bare metal, platform as a service, and a set of managed services, just like the various public cloud platforms. This is the service catalog of Verda. We now have over 40 services in Verda, and users can use them easily via the Verda dashboard. Verda is built on the self-service principle: users can manage their resources via a graphical interface, but they can also use APIs to operate Verda services. Verda keeps growing. We have three regions, over 100,000 VMs, and over 30,000 bare metal machines in total.

Our team provides one of the managed services in Verda, called Verda Kubernetes Service. VKS is a managed Kubernetes platform for Verda. Our service aims not only to simplify the cluster lifecycle, but also to provide native integration with various corporate platforms to reduce engineering costs. Our team has seven members, and we manage over 1,000 clusters in total.

OK, then let's move on to the next section: the story of why we chose Cluster API and how we adopted it while minimizing users' effort as much as possible. This is the overview of our legacy system. We have a cluster provisioner that is responsible for bootstrapping Kubernetes clusters and the machines used by those clusters. Machines are turned into Kubernetes nodes by the cluster provisioner via SSH connections. In front of the provisioner, we have an API server to abstract basic operations on the cluster itself, like creation and deletion.

Since we launched VKS, we have faced several issues with our legacy provisioner. The first is the management of the cluster provisioner itself. Our cluster provisioner was forked from an upstream code base, including multiple patches for custom features and bug fixes. We've been maintaining six repositories and their patches, which makes it hard to backport upstream changes into our fork. This is bad in terms of both security and stability, so we needed to find another solution. The second is the node management of the cluster provisioner. Our provisioner uses SSH for the initial node bootstrap and establishes stateful connections to all nodes to manage them. The more users use our platform, the more clusters and nodes need to be managed, and as the scale of our platform grows, those connections often become unstable. We wanted to stop establishing stateful connections to all nodes and make the system more scalable. To overcome these issues, we needed a new cluster provisioner that meets our requirements.
Verda is built with the power of open source technologies, so the new cluster provisioner should be an open source project. Lower maintenance cost is also an important point: we don't want to maintain any fork, so a pluggable interface would be better to have. Scalability is also required for our next provisioner. Based on these requirements, we finally chose Cluster API.

So what is Cluster API? Cluster API is a project hosted by the Kubernetes community focusing on simplifying provisioning, upgrading, and operating multiple Kubernetes clusters. It has a declarative API and a set of controllers that are useful for cluster and node management. The key point here is that Cluster API has pluggable interfaces called providers. There are three types of providers in Cluster API. The infrastructure provider is in charge of preparing actual resources like load balancers or VMs. The bootstrap provider is in charge of turning a machine provided by the infrastructure provider into a Kubernetes node. The control plane provider is in charge of managing Kubernetes control planes. We can use community-provided providers for each, and we can also implement our own providers.

Okay, so which providers did we choose? Basically we want to use community-provided providers as much as possible. For the bootstrap provider and the control plane provider, we decided to choose the kubeadm-based providers, which are provided by the Cluster API community. So how about the infrastructure provider? Verda is built on top of OpenStack, and there is Cluster API Provider OpenStack (CAPO), which is maintained by the Cluster API community. So why don't we use it? We first tried adopting CAPO; however, we weren't able to adopt it directly. Most use cases should be covered by CAPO, but we have several reasons coming from company-specific limitations. Verda uses its own customized APIs for both infrastructure as a service and load balancer as a service, so forking CAPO would be required to use it. Also, we cannot use the community-provided image builder for building VM images because of our internal image management policy. We don't want to maintain an internal fork anymore, so what should we do?

Our decision was to implement our own infrastructure provider. Cluster API Provider Verda is the infrastructure provider for the Verda platform. The key point here is that we only need to implement three custom resources, cluster, machine template, and machine, based on the provider contracts defined by Cluster API. CAPV has VerdaCluster, VerdaMachineTemplate, and VerdaMachine respectively. Also, we have a load balancer in front of each cluster, so a VerdaLoadBalancer resource is included as well, but this isn't required from the perspective of Cluster API. Building our own infrastructure provider makes it easy to introduce custom features based on our internal requirements.

Let us share several examples of our custom features. One of them is the user script. Users often want to customize their worker nodes. One use case is configuring kernel parameters on such nodes; several users want to increase ARP table entries when they need to access a lot of hosts. Another use case is installing node-level software; some users installed a network emulator kernel module and chaosd for chaos testing purposes. Our VerdaMachine controller internally merges the user-provided script with the cloud-init config generated by the Cluster API bootstrap provider.
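To make the provider contract a bit more concrete, here is a minimal, hypothetical Go sketch of what an infrastructure machine resource like VerdaMachine could look like. The provider-specific fields (Flavor, Image, UserScript) are assumptions for illustration, not the actual CAPV API, which is internal to our company; only spec.providerID and status.ready are genuinely required by the Cluster API infrastructure provider contract.

```go
// Illustrative sketch only: the real cluster-api-provider-verda types are
// internal, so every provider-specific field below is an assumption. The
// Cluster API contract only requires spec.providerID and status.ready.
package v1alpha1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// VerdaMachineSpec defines the desired state of a single Verda VM.
type VerdaMachineSpec struct {
	// ProviderID identifies the backing VM once it has been created.
	ProviderID *string `json:"providerID,omitempty"`
	// Flavor and Image select the VM size and OS image (assumed fields).
	Flavor string `json:"flavor"`
	Image  string `json:"image"`
	// UserScript is the optional user-provided script that the controller
	// merges into the cloud-init config from the kubeadm bootstrap provider.
	UserScript string `json:"userScript,omitempty"`
}

// VerdaMachineStatus reports provisioning progress back to Cluster API.
type VerdaMachineStatus struct {
	// Ready signals that the underlying infrastructure is provisioned, so
	// the core Machine controller can continue the bootstrap flow.
	Ready bool `json:"ready"`
}

// VerdaMachine is the infrastructure machine custom resource of CAPV.
type VerdaMachine struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   VerdaMachineSpec   `json:"spec,omitempty"`
	Status VerdaMachineStatus `json:"status,omitempty"`
}
```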
Another example is the static IP node pool. This feature assigns predefined IPs to worker nodes. It is useful when working with IP ACL systems because nodes keep the same set of IPs even after being recreated. Everything discussed here was on our side, but how does it look from the user's point of view? Thanks to the VKS API in front of the provisioner, the user interface and user experience don't change before and after introducing Cluster API. Users can still use the same CLI or dashboard to operate clusters provisioned by either provider.

Thank you. From here, I will explain our failure domain design with Cluster API, specifically regarding how we treat regions and availability zones. Let me start with our region design. Our region design is simple: we provide our Kubernetes service independently in each region. In our private cloud, we have one region in Tokyo and another in Osaka, so we built our whole system in Tokyo and built another separate system in Osaka. By the way, this design implies that each cluster has nodes in only one region; in other words, we don't support multi-region clusters. Thanks to this simplicity, we can manage our product with a small team of seven members.

Next, let me explain our multi-AZ cluster design. We take different strategies for control plane nodes and worker nodes. For worker nodes, users can specify the availability zone per node pool. So if users, I mean application developers, create multiple node pools with different AZ settings, they can distribute their nodes across AZs. In Cluster API, our node pool is implemented with MachineDeployment custom resources, so multi-AZ is achieved with multiple MachineDeployments distributed across multiple AZs. On the other hand, control plane nodes belong to a single custom resource called KubeadmControlPlane, so we cannot take the same strategy as for worker nodes.

So how did we achieve AZ distribution for control plane nodes? Here's the answer: we utilized Cluster API's built-in failure domain support. In this example, VerdaCluster is our Cluster API infrastructure cluster resource, and we specify our availability zones in its failure domains. Here we specify tokyo1, tokyo2, and tokyo3 as the AZs. These values are automatically propagated to the Cluster resource, and eventually one AZ is picked per Machine resource, so that our control plane nodes are equally distributed across the three specified AZs. The key point is that this is done by the Cluster API core controllers. We didn't have to implement any controller for this feature ourselves, so we could reduce implementation and code maintenance cost. We believe this kind of pluggable, built-in feature is a good point of Cluster API.
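As a minimal illustration of this mechanism, the sketch below uses the real Cluster API types; the zone names mirror the example above, and the surrounding controller code is omitted, so treat it only as a sketch of the data that flows through the contract.

```go
// Minimal sketch of the failure domain data exposed by an infrastructure
// cluster resource, using the upstream Cluster API types.
package main

import (
	"fmt"

	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
)

func main() {
	// Availability zones exposed as failure domains by the infrastructure
	// cluster. The core Cluster controller copies them into
	// Cluster.Status.FailureDomains, and the KubeadmControlPlane controller
	// spreads control plane Machines across the domains marked
	// ControlPlane: true.
	failureDomains := clusterv1.FailureDomains{
		"tokyo1": clusterv1.FailureDomainSpec{ControlPlane: true},
		"tokyo2": clusterv1.FailureDomainSpec{ControlPlane: true},
		"tokyo3": clusterv1.FailureDomainSpec{ControlPlane: true},
	}
	for name, fd := range failureDomains {
		fmt.Printf("failure domain %s (control plane: %v)\n", name, fd.ControlPlane)
	}
}
```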
Next, let me move on to in-place migration to Cluster API. To begin, why do we need in-place migration? Right now, we manage two cluster provisioners, the legacy provisioner and the Cluster API-based provisioner. But as you can easily guess, it's highly costly to manage different components for the same purpose. So we wanted to resolve this situation. On the other hand, from the users' perspective, they want to keep using the existing clusters built with the legacy provisioner. They don't want to create another cluster, and they don't want to migrate their workloads by themselves. This is the motivation for our in-place cluster migration, or conversion, from the legacy provisioner to Cluster API. To achieve our goal, we referred to a KubeCon presentation last year from Mercedes-Benz. They had also worked on migrating clusters from their legacy system to a Cluster API-based system.

In that presentation, they explained how they handled Cluster API-related custom resources to replace legacy resources with Cluster API-based resources on top of their OpenStack-based on-premise platform, which is similar to Verda in our case. So it gave us great knowledge. However, we had another critical challenge to overcome. Let me explain it. That is the inconsistency of the Kubernetes installer between the legacy system and Cluster API with kubeadm. In the Cluster API context, we faced issues in node bootstrapping. Right now, with our Cluster API-based system, we use kubeadm. On the other hand, our legacy architecture used a Docker-based Kubernetes installer. At the beginning of this migration project, we tried to reproduce the approach taken by Mercedes-Benz; I mean, we naively tried to run the kubeadm join command to add nodes to a legacy cluster, but it failed. That was due to several Kubernetes setup inconsistencies.

Let me pick up some examples of the issues we faced in the process of migration. The first gap is API request load balancing. In a Kubernetes cluster, each node, or more specifically the kubelet, accesses the kube-apiserver. In an HA cluster, such Kubernetes API requests from the kubelet should be load-balanced among the replicas of the kube-apiserver. In our legacy clusters, that is done with client-side load balancing. On the other hand, our Cluster API and kubeadm-based clusters use cloud-based load balancers provided by Verda. To resolve this gap, we also prepared load balancers for the legacy clusters to be migrated.

The next gap came up after the first gap was resolved. That is subject alternative name (SAN) inconsistency in TLS certificates. Kubernetes API access is done with HTTPS requests, and such TLS connections are generally verified by comparing the requested host with the TLS certificate. In legacy clusters, Kubernetes API access is done with the cluster's own domain name. On the other hand, our new clusters use the domain of the load balancer as the target host to access the Kubernetes API, so request validation failed. In this example, an HTTPS request to capi.example.com was denied by the TLS validation mechanism because the host name capi.example.com isn't included in the SAN of the TLS certificate. So we added the expected domain name to the SAN before proceeding with migration.

The third gap is configuration management. This issue is more specific to kubeadm. In order to manage its configuration, kubeadm uses the Kubernetes API; I mean, it stores ConfigMaps in the kube-system namespace, like kubeadm-config and kubelet-config. On the flip side, kubeadm cannot properly handle nodes without such configurations. So to avoid the issue, we prepared such resources, I mean ConfigMaps, based on the cluster setup of the legacy cluster to be migrated.

The last issue I'll pick up here is etcd member discovery. In our system, we manage etcd clusters together with the control plane components with kubeadm. Unlike the kube-apiserver, where a load balancer is prepared in front, kubeadm has to know every member of the managed etcd cluster in a kubeadm-based cluster. etcd members are managed as kubeadm static pods, which can be seen from the Kubernetes API. Like this, kubeadm lists the pods with the label filter component=etcd and tier=control-plane, and by this, kubeadm gets the etcd members. However, our legacy etcd members are managed just as Docker containers, independently from the Kubernetes API, so they cannot be seen from the Kubernetes API. So we needed a way to tell kubeadm the etcd member locations. Our solution is to prepare dummy pods on the nodes where the legacy etcd instances are running.
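The pod-label-based discovery can be illustrated with a short client-go sketch: listing pods in kube-system with component=etcd,tier=control-plane is roughly how kubeadm finds the etcd members, so the dummy pods for the legacy etcd instances only need to satisfy the same selector. This is an illustration under that assumption, not kubeadm's actual code.

```go
// Sketch of pod-label-based etcd member discovery using client-go. It lists
// pods with the same label selector kubeadm uses; dummy pods created for
// legacy etcd instances would show up in this listing as well.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig (illustrative only).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// List etcd member pods in kube-system by the same label selector.
	pods, err := client.CoreV1().Pods("kube-system").List(context.TODO(), metav1.ListOptions{
		LabelSelector: "component=etcd,tier=control-plane",
	})
	if err != nil {
		panic(err)
	}
	for _, p := range pods.Items {
		// kubeadm derives each member's endpoint from the pod and node info.
		fmt.Printf("etcd member pod %s on node %s\n", p.Name, p.Spec.NodeName)
	}
}
```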
Like these gaps, we had to overcome several node bootstrapping and cluster bootstrapping issues for in-place migration. To sum up our cluster migration project, there are two things to do for in-place cluster migration. The first is to resolve the cluster setup inconsistencies, specifically around node bootstrapping with kubeadm. The second is to handle the Cluster API custom resources, in the way Mercedes-Benz explained in their past KubeCon session. If you are interested in the details, please contact us and let's discuss.

Okay, thank you, Shotaro. Next, we're going to share some of the obstacles we faced and the insights we got from them. Today, let us share two topics. The first topic is controller scalability. The number of clusters and nodes is increasing in both environments. In the production environment, over 2,000 nodes are managed by a single management cluster. Since we have a lot of nodes, the machine controller work queue is constantly long, and operations on a cluster, like cluster creation or node provisioning, take time. One solution would be scaling up the controllers. Another would be scaling them out; however, that wouldn't be easy since the controllers have a leader-follower architecture. The scalability of Cluster API is one of our interests, and the community also seems to be working on scale testing. Our target for now is 1,000 nodes per single cluster, so we'll continue working on scalability. By the way, there was a maintainer session yesterday about recent improvements to the performance of the Cluster API controllers, which were introduced in 1.5.0, so check it out. Thanks for the great work and for sharing effective ways of improving controller scalability.

The next topic is etcd snapshot and restore. Disaster recovery is one of the crucial parts for cluster operators. Taking etcd snapshots is easy to implement. In our case, we have a job that takes snapshots using etcdctl and uploads them to our internal object storage. These jobs are deployed for each cluster and run periodically. However, it's more complicated when we try to restore a cluster from snapshots. For clusters managed by Cluster API, we need to work together with the Cluster API controllers to restore a cluster from snapshots. For example, when we try to restore a cluster from a snapshot, we need to prevent the Cluster API controllers from reconciling the control plane and worker nodes. We currently perform the entire process manually, but we plan to automate the whole process. If this use case is common among Cluster API users, it would be nice to have such functionality on the Cluster API side. I was wondering if there is any good architecture to include this as part of the Cluster API ecosystem. If you're interested, I'd really appreciate it if you could join this proposal.
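As a rough illustration of the snapshot side, here is a minimal Go sketch of a job that shells out to etcdctl and would then hand the snapshot to an uploader. The etcd endpoint, the certificate paths, and the upload step are placeholder assumptions, not our actual job.

```go
// Minimal sketch of a periodic etcd snapshot job: take a snapshot with
// etcdctl and hand the file to an uploader. Paths and endpoints are
// placeholders.
package main

import (
	"fmt"
	"log"
	"os"
	"os/exec"
	"time"
)

// takeSnapshot writes an etcd snapshot to the given path using etcdctl.
func takeSnapshot(path string) error {
	cmd := exec.Command("etcdctl",
		"--endpoints", "https://127.0.0.1:2379",
		"--cacert", "/etc/kubernetes/pki/etcd/ca.crt",
		"--cert", "/etc/kubernetes/pki/etcd/server.crt",
		"--key", "/etc/kubernetes/pki/etcd/server.key",
		"snapshot", "save", path)
	cmd.Env = append(os.Environ(), "ETCDCTL_API=3")
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	path := fmt.Sprintf("/tmp/etcd-%s.db", time.Now().Format("20060102-150405"))
	if err := takeSnapshot(path); err != nil {
		log.Fatalf("snapshot failed: %v", err)
	}
	// uploadToObjectStorage(path) would push the snapshot to internal object
	// storage; omitted here because that API is internal.
	log.Printf("snapshot saved to %s", path)
}
```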
This is all of our presentation. In conclusion, let me summarize today's contents. I would like to say that Cluster API is great for a private cloud platform environment. We can reduce the cost of cluster management because of its pluggable provider concept. It allows us to have custom features, which is great for such a large platform with its own needs. Also, Cluster API works well with a multi-AZ model thanks to the failure domain concept. Adopting Cluster API as a next-generation cluster provisioner can be a long journey for a large-scale environment. However, we minimized various costs for cluster operators with platform engineering. We provide consistent UI and UX through API abstraction, which allows users to operate their clusters without knowing the details of the underlying technologies. Also, we support in-place migration with Cluster API and kubeadm to minimize the migration cost. Thank you very much for your time and attention. Questions are welcome, and we can also talk in person, so feel free to ask anything about our session today. Thank you very much.

So thank you for a great presentation. I have a question related to the custom feature you mentioned, the static IP node pool. The first one is: when are the port resources created, and are they tied to the cluster or node pool lifecycle? Sorry, could you repeat that? When are the port resources created in OpenStack Neutron? So the static IP resources are created when the user creates a node pool. But also, sometimes users want to expand the node pool size, so that also triggers the creation of static IPs. Internally, OpenStack port resources are used. Is this answering your question? And is this tied to the cluster or node pool lifecycle? Sorry? Is this tied to the cluster or node pool lifecycle? Earlier you said the ports are created after the user creates a node pool. When the user creates the cluster, the static IP node pool is not created; when the user calls our APIs, it's created. And I have another question about this: does the number of ports need to be equal to the number of nodes? Yes, so, oh wait, sorry, it's more complicated: each node can have one private IP and one public IP, and users can specify the policy independently. So if a user specifies both a public static IP and a private static IP, two ports are created for one node. Okay, thank you. Yeah, thank you.

First of all, great presentation. I have a question about the capability for users to customize their init scripts when creating their nodes. Is that something that's actually exposed? Like, if I were a user and I wanted to, I don't know, increase the number of TCP ports, is that something I can do? So how often do people shoot themselves in the foot and do wrong things with those scripts? And if it happens very frequently, how do you deal with that? Yeah, some users have actually used that feature, but in that case the node will fail to provision, so maybe that's not harmful, and the user can notice it and fix it themselves. Also, another piece of background is that our Kubernetes nodes are built as the user's own infrastructure-as-a-service VMs, so it's the user's VM, and we treat it as the user's responsibility. All right, so I'm assuming you have metrics internally to check that the nodes being created by CAPI are healthy. Does that not impact those metrics? How do you differentiate between "there's an issue with the nodes we're creating" versus "it's just people doing the wrong things with their nodes"? Yeah, that's a good point. Actually, we cannot distinguish the cause, but fortunately the number of users using such features is not so large, so we can manually help such users when they face issues. All right, thank you. Yeah, thank you. Thank you.

Fantastic talk. Arigato. I have a few questions. You mentioned in the beginning that your private cloud is a bit different from what Cluster API Provider OpenStack supports, and you don't use the Octavia API.
Have you considered writing your own provider so that it can be used? Because we are in a similar situation: we also run OpenStack clusters with 8,000 nodes, let's say, and we implemented our own Octavia provider so that we can actually use the Cluster API provider for OpenStack. Have you considered that, writing your own? Actually, we haven't considered that. That's mostly because of our organizational structure: this Cluster API adoption was decided only by our team. Also, the code base is not so large, so that's why we decided to write our own. Okay, perfect. A second question, if I may. Can you share what kind of CNI you use in your clusters? Is it Calico? Yeah, we use Calico. Yeah, it's the same. And have you considered using Crossplane? I'm kind of new to this, but it seems to me these two options are similar in a way. Can you say more about this, whether you have or have not? Or maybe just say what you think is better for us, because we are now in a situation where we are trying to gather as much information as possible and decide which way we will go. Honestly, Crossplane was not on the list when we considered other solutions, so maybe it would be better to consider it when you decide what the final solution will be. But as a result of our adoption, Cluster API could be one of the options; I'd say it's one of the good options. Okay, thank you so much. Yeah, thank you.

Hi, I'll try to make my question pretty quick, but I really liked how you built the infrastructure provider, how you made your own custom CRDs, which is really cool. Do you ever have any problems with your development teams trying to take those and make them a little too fine-grained for their particular application or use case? It feels like you built a really nice framework for people to do that, but I don't know if that's the intention or if people take it too far. Sorry, I'm not grasping the details of your question; could you repeat it? Yeah, no worries. So, in the case of the VerdaMachineTemplate, do you ever have problems with people saying, hey, I need to deploy my application for X, Y, or Z, but I need this really complex config added to it? So they narrow that template down to their specific use case and break the formula you made for them. Actually, I'm not sure if I understood correctly, but users cannot create the VerdaMachine resource directly; we abstract that layer via the API, so users just request something like "create node", and the VKS API translates that into creating the VerdaMachine resource. I think there is no special request to add functionality like "I need this feature in VerdaMachine", so it's completely independent. Also, another good piece of information is that this Cluster API cluster is just a kubeadm cluster, so sometimes we allow users to run kubeadm commands by themselves, for example to add some PEMs to the cluster, so that part is handled manually. Okay, yeah, that makes sense. I appreciate your time, guys. Thank you. Maybe that's all? Okay, again, thanks for being here today. Yeah. Thank you.