Deploying a production Kubernetes platform on OpenStack Magnum with Zuul: an Open edX case study. Hi, my name is Navrata, and I will be presenting our story of how we run our distributed, highly available learning platform, Cleura Cloud Academy, with OpenStack, Kubernetes, Ceph, and Zuul. A major release of the platform we run forced us to change our entire stack, and here is what we did.

I work at Cleura, in the education team, and what we do is run the learning platforms. Our platforms are based on Open edX, a free, open-source learning management system. Open edX has two releases a year, just like OpenStack, and they follow a naming convention based on tree names. So whenever I mention names such as Maple, Lilac, Nutmeg, or Olive during this presentation, I am referring to those specific releases of the Open edX platform.

Open edX Maple, the release with that name, changed everything. Starting from the Maple release of Open edX, upstream deprecated its prior method of deploying the platform using Ansible playbooks and instead moved towards a containerized deployment approach. On top of this change, we made the switch from OpenStack Heat to Kubernetes on OpenStack Magnum. This shift allowed us to deploy the Open edX platform in containers, which makes things more manageable, more scalable, and more flexible for us. Alongside these deployment changes, we also transitioned our CI-driven deployment approach: we moved from GitLab CI to Zuul. So this is a summary of the architecture before and after, which also covers some of the issues we ran into along the way.

As I mentioned earlier, our platform is built on Open edX, so let me give you a concise overview of what Open edX is and what its fundamental components are. Open edX is an open-source learning management system, an LMS: an online course platform that allows individuals and organizations to create, deliver, and manage online education content. The Open edX platform consists of three main components. First, the learning management system, the LMS, serves as the application through which learners access and engage with course materials, and it provides a user-friendly interface for interacting with the educational content. Second, the platform includes Open edX Studio, a course creation tool designed for instructors: it is the content management system for creating courses and course libraries on the Open edX platform. And lastly, the Django admin panel, which allows administrators to handle tasks such as managing data and setting permissions.

So how did we use to run Open edX? When we started building a learning platform on Open edX in 2015, we adopted a deployment strategy that aligned with the community's recommended approach: we deployed the platform on cloud server instances using Ansible playbooks. To achieve a cloud-centric, image-driven deployment, we incorporated OpenStack Heat into our workflow and managed our clusters as Heat stacks, and we used GitLab CI to invoke Heat to deploy the platform (a rough sketch of what that looked like follows shortly).

So what did we migrate to? We migrated to Tutor. Tutor is a community-supported, Docker-based distribution that simplifies deployment, customization, debugging, and scaling of the Open edX platform, starting from the Lilac release.
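For context, before going deeper into Tutor: the Heat-driven flow I just described, where GitLab CI invoked Heat and then Ansible, looked roughly like the sketch below. This is a minimal illustration only; the stack, template, environment, inventory, and playbook names are made up for this example, not our actual ones.

```shell
# Create (or update) the Heat stack describing the platform VMs, networks, etc.
# Stack, template, and environment file names here are purely illustrative.
openstack stack create --wait \
  --template openedx-cluster.yaml \
  --environment production-env.yaml \
  openedx-production

# GitLab CI then ran the Open edX Ansible playbooks against the new instances.
ansible-playbook -i inventory/production openedx_playbook.yml
```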
Tutor completely replaced the Open edX native method of installing Open edX with Ansible playbooks as of the Maple release.

So why did Tutor come into the picture? According to the community, it was difficult to install Open edX with the native installation scripts. For instance, there were no official instructions on how to upgrade an existing platform to a newer release; the recommended approach was to back up the data, trash the existing server, and create a new one. As a consequence, people tended to keep running older releases and were hesitant to upgrade to newer versions.

So how does Tutor aim to simplify the installation and upgrade process of Open edX platforms? First, application isolation: Tutor runs the Open edX processes inside Docker containers, which provides isolation and encapsulation of the application components. It comes with a CLI, a command-line interface, for common administrative tasks, making it easier to manage and control the Open edX platform. It has a plugin-driven system that allows users to extend or customize the Open edX environment without touching the core code base. And it comes with built-in Kubernetes integration, which facilitates deploying the Open edX platform to a Kubernetes cluster; under the hood, it simply wraps kubectl commands to interact with the cluster. Tutor also comes with comprehensive documentation that explains in detail how to use the distribution effectively, so if you are interested in learning more, you can explore the official Tutor documentation. A quick sketch of what a Tutor-based workflow looks like follows below.

In order to adopt the new Tutor deployment method, our aim was to identify a solution that aligned with the following key goals. First, a CI-driven deployment: our objective was to achieve a fully automated deployment of the Open edX platform to a Magnum-managed Kubernetes cluster through a CI/CD pipeline, similar to our previous setup. Second, CI-driven configuration changes: all configuration and setup changes to the Open edX environment are to be made through our Gerrit/Zuul tool chain. Third, data replication and redundancy: we wanted a deployment model that uses clusters distributed across two different regions while ensuring data replication between them, as we had done in our previous setup; this provides redundancy and failover to a different region for uninterrupted service in case of any incidents. Fourth, stateful configuration management: we sought to maintain the ability to manage our Open edX platform through a stateful configuration engine; before this transition we used Heat stacks, and now we wanted to use Kubernetes deployments as our orchestration mechanism. And lastly, Zuul/Gerrit integration: instead of GitLab CI, we aimed to use Zuul as our preferred CI/CD tool, and the Zuul/Gerrit integration provides the necessary capabilities for code review, job orchestration, and maintaining a smooth development and deployment workflow.

After the transition our tech stack changed, so what does it look like today? Our cloud infrastructure runs on OpenStack, and we use it both for the infrastructure of the learning platform and for our lab environments. In our previous setup, we used the Open edX native installation method, which deployed Open edX onto cloud VMs or bare metal servers using Ansible roles and playbooks.
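Here is the Tutor workflow sketch I referred to a moment ago. It is only meant to show the flavor of the CLI; the hostnames and the plugin name are placeholders, and the exact subcommands vary a bit between Tutor versions.

```shell
# Set platform configuration (hostnames are placeholders).
tutor config save \
  --set LMS_HOST=learn.example.com \
  --set CMS_HOST=studio.example.com

# Optionally enable a plugin to customize the platform without touching core code.
tutor plugins enable mfe

# Build the customized openedx image, then deploy; the k8s subcommands
# wrap kubectl under the hood, as mentioned above.
tutor images build openedx
tutor k8s start
tutor k8s init
```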
Coming back to the old setup: the method was called "native" in Open edX lingo because it deployed directly onto cloud VMs or bare metal servers rather than into Vagrant-managed VMs, and we used OpenStack Heat and Ansible to deploy our learning platforms to OpenStack VMs. After the Maple release, which meant transitioning to a Docker-based deployment, we containerized the services and ran them on Kubernetes, and we use OpenStack Magnum to deploy our Kubernetes clusters.

Additionally, we run a self-paced interactive learning platform where we give learners access to complex, realistic, distributed environments on demand, so they can learn things by actually doing them, and we use OpenStack Heat to achieve that. While both the deployment of the labs and the deployment of the Open edX infrastructure involved the use of Heat, their approaches are completely different: although we have replaced the Heat-driven approach for the platform with one that interacts with a Kubernetes cluster, we have not replaced the Heat-driven approach for deploying our interactive labs.

This is the OpenStack Magnum project logo. We use OpenStack Magnum because, in order to transition to a Docker-based deployment model with Kubernetes, we required a container orchestration service and a container registry offering for storing private container images. Within our cloud infrastructure, we had two container orchestration services available: one was Rancher and the other was Magnum. We chose OpenStack Magnum because it provided all the necessary components for a production-ready setup for deploying Tutor.

We needed a container orchestration service that could accommodate custom-built Docker images, so we decided to use the private Docker registry provided by Magnum. The built-in Docker registry runs locally on each Kubernetes worker node, on localhost. The Docker registry supported by Magnum is just the official Docker registry image with the OpenStack Swift storage driver, and when it is invoked by Magnum, it is Magnum that automatically populates the registry's storage configuration with the necessary parameters to match the Swift endpoint in the region it is running in. Since we have a CI-driven deployment, we duplicate this for a locally running registry instance backed by the same storage credentials, and that is what we push our images to. Once we push these images, they become available to our Magnum registries. In our case, it is not OpenStack Swift that backs the registry: we use the Ceph RADOS Gateway with Keystone integration. And to avoid a single point of failure, we don't rely on a single registry; instead we have two registries, one in each region. All images are available in both regions, ensuring redundancy and availability across our infrastructure.

OpenStack Magnum lets you create Kubernetes clusters via the OpenStack APIs. To do that, you base your configuration on a cluster template, and the template decides how your cluster will be created. In our production environment, we use a six-node Kubernetes cluster: three control plane nodes and three worker nodes. We deploy our clusters with the Cinder CSI driver enabled, because we want persistent volumes for our Kubernetes cluster backed by Cinder block storage. And we use Octavia, the OpenStack load balancing service, to expose our Kubernetes cluster to the outside world via load balancers.
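To make the cluster template idea concrete, here is roughly what creating such a cluster looks like with the Magnum CLI. This is a hedged sketch: the image, flavor, and network names, and the exact set of labels, are illustrative and depend on your cloud and Magnum version.

```shell
# Cluster template: Kubernetes COE, HA control plane behind a load balancer,
# and the Cinder CSI driver enabled for Cinder-backed persistent volumes.
openstack coe cluster template create k8s-prod-template \
  --coe kubernetes \
  --image fedora-coreos \
  --external-network public \
  --master-flavor m1.large \
  --flavor m1.large \
  --master-lb-enabled \
  --labels cinder_csi_enabled=true

# Three control plane nodes and three workers, as in our production setup.
openstack coe cluster create k8s-prod \
  --cluster-template k8s-prod-template \
  --master-count 3 \
  --node-count 3
```

Magnum then provisions the nodes, and with Octavia available in the cloud, the Kubernetes API and LoadBalancer-type services end up behind Octavia load balancers.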
We use the Ceph object gateway S3 API to achieve automatic replication between the primary and DR site; we don't use the multi-site replication provided natively by Ceph. In a Kubernetes environment, the backup and restore process is implemented using CronJobs, the same cron concept you know from Linux. A CronJob allows you to schedule and run recurring tasks within your Kubernetes cluster: you define a schedule using a cron-like syntax, specifying the desired frequency and timing for your task, and Kubernetes will then automatically create and manage the pods that execute those tasks on the schedule you have given. In our environment, the CronJob is responsible for capturing MySQL and MongoDB dumps on the primary site, uploading them to S3 buckets, and subsequently restoring from those S3 buckets at our disaster recovery site, so that in case of any incident we can simply fail over to our disaster recovery site and our services are up and running. A simplified sketch of such a backup CronJob is shown a bit further below.

Company-wide, we adopted Gerrit as our code review tool and Zuul for our CI/CD, to be aligned with OpenStack upstream. So we migrated from GitLab CI talking to the OpenStack APIs to Gerrit and Zuul doing the same. We make all configuration and setup changes to an Open edX environment directly from the Gerrit/Zuul review tool chain and don't make any manual changes to the Kubernetes environment. We have an end-to-end Tutor deployment to a Magnum-managed Kubernetes cluster, driven from Zuul.

Now let me explain our workflow in more detail, that is, how we do an end-to-end CI-driven deployment. We run multiple platforms, so we have a different topic branch for every platform, and each platform uses a separate Kubernetes namespace; that is how we isolate the platforms from each other. When making changes to a platform, I send a code review, and when it is accepted with the updated configuration, a Zuul job is triggered to build the corresponding Docker images with those changes. Subsequently, another Zuul job pushes the image to the Swift-backed registry that Magnum makes available to Kubernetes. Lastly, a Zuul job deploys these changes to the Kubernetes cluster.

So that is what our tech stack looks like. Now let's discuss the issues we ran into after the transition, some of which we are still facing.

First, I'd like to discuss a load balancer issue we hit in a recent incident where we lost access to our learning platforms. The region in our cloud infrastructure where we host our learning platform had a technical issue, and we lost control and compute nodes. This incident led to the unavailability of the Kubernetes nodes and the loss of all the load balancers and the Kubernetes API service, both of which sit behind Octavia load balancers. So we lost access to the services that expose our Open edX platforms, which are load balancers behind Octavia, and also to the Kubernetes API service. As a result, we couldn't execute any command against the cluster, and therefore we couldn't initiate a failover to our disaster recovery site, since that requires running scripts against the cluster to make the primary/DR switch. This incident highlighted flaws in our DR policy and our failover process: our systems will only work once Octavia is fixed, and Octavia always gets fixed last. There is no way around that, because for Octavia to work, Nova and Neutron must work.
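Before continuing with the incident, here is the simplified backup CronJob sketch referred to above. The namespace, image, secret, bucket, and host names are placeholders, and the real jobs also handle MongoDB and drive the restore on the DR side.

```shell
# Apply a nightly MySQL dump-and-upload CronJob into the platform's namespace.
kubectl -n openedx apply -f - <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mysql-backup
spec:
  schedule: "0 2 * * *"   # every night at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: registry.example.com/openedx/backup:latest
            envFrom:
            - secretRef:
                name: backup-credentials   # DB password and S3 credentials
            command: ["/bin/sh", "-c"]
            args:
            - >
              mysqldump --all-databases -h mysql -u root -p"$MYSQL_ROOT_PASSWORD" |
              gzip > /tmp/dump.sql.gz &&
              aws --endpoint-url "$S3_ENDPOINT" s3 cp
              /tmp/dump.sql.gz s3://openedx-backups/mysql/dump-$(date +%F).sql.gz
EOF
```

In our setup, a corresponding job on the DR side restores the latest dump from the same bucket.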
To come back to the incident: after you recover from a technical issue like that, you always get the controllers up first, then Nova and Neutron, and only then can you look into fixing Octavia and the broken load balancers. Even after those issues are solved and we can communicate with the Octavia API, the load balancers are often stuck in an error state that we can't fix on our own; we need someone with admin privileges to fix it, which adds to our downtime.

To mitigate the downtime during such an outage, we reconsidered our disaster recovery policy and looked for an alternative way to expose our Open edX platform, other than a load balancer, to make our platform Octavia-free. For background, we use Caddy as our web server, since Tutor uses it as a web proxy and for generating SSL/TLS certificates at runtime. So we were looking for a way to expose the Caddy service without a load balancer. As per the Kubernetes documentation, there are several ways to expose a service other than the LoadBalancer type.

The first approach is ClusterIP, which allows the service to be accessed only via an internal IP. That is not suitable for a production-ready environment because it restricts accessibility to within the cluster. The second option, NodePort, was also not suitable, since our Caddy service runs on ports 80 and 443 and the default NodePort range is 30000 to 32767. And there is an approach that is not commonly used in the Kubernetes community, because most people use their cloud provider's load balancers: ClusterIP with an external IP. This solution requires you to map an external IP address for the service onto the private IP addresses in the cluster yourself. Unfortunately, we were not able to make this work in our environment. We wanted to use a floating IP address as the external IP, but our floating IP was mapped to a different subnet than the one the ClusterIP private addresses were on. In the end, we were not able to figure out a way to expose our services without a load balancer, but we settled on an approach where, rather than doing anything with the APIs, we operate on a single DNS record that we can simply flip to the backup site without any API calls. This allows us to have at least a rudimentary service available on the backup site until the load balancers that failed at the primary site become available again or are fixed.

Another issue we faced was orphaned pods. Let me give you a bit of Kubernetes vocabulary: what are orphaned pods, and what issue did we face? Orphaned pods can occur in Kubernetes when their owner object, such as a Deployment, ReplicaSet, or replication controller, is deleted or modified. By default, Kubernetes deletes these dependent objects, and the responsibility for cleaning them up on the node lies with the kubelet. The kubelet is the primary node agent that runs on each node; it is responsible for creating and destroying containers on the Kubernetes node, and it also handles garbage collection of unused images and unused containers on those nodes. When the kubelet fails to clean up these dependent objects, orphaned pods remain dangling in the environment and interfere with the deployment process. We were hitting this issue while performing rolling updates of our deployments. So what happens in a rolling deployment?
During a rolling deployment, the controller deletes and recreates the pods with the updated configuration one by one, without causing downtime to the cluster. However, our rolling deployment gets stuck: the pod with the mounted persistent volumes running the older version remains in the Terminating state, and the one that needs to be created with the updated configuration gets stuck in the ContainerCreating state. This scenario is encountered with the ReadWriteOnce access mode for persistent volumes; if we could use ReadWriteMany, meaning multiple Kubernetes nodes can mount and write to the volume, we would not face this issue.

When we checked the kubelet logs on the nodes, we noticed a massive number of log entries stating that an orphaned pod was found but its volume paths were still present on the disk. We dug deeper into the logs and saw that when the pod is being deleted, it tries to unmount its persistent volumes, the CSI volumes. The unmount operation succeeds, but removing the volume paths from the node fails. This problem of pods getting stuck in the Terminating state due to the inability to clean up volume subpath mounts is a known issue across many Kubernetes versions. You can find the bug reports; the issue has appeared in many versions, but there is no definite root cause: it has been fixed in some versions and has reappeared in later ones. The workaround suggested in the bug reports is to manually delete the orphaned pod directory and restart the kubelet service, which works for us as well. But as I told you, we have a CI-driven deployment, so it is very inconvenient to manually shell into a node, delete those stale directories, and restart the kubelet while the pipeline is still waiting for pods that are stuck terminating. We have to shell into the Kubernetes node and delete those orphaned directories by hand. Ideally, the kubelet should deal with orphaned pods intelligently; cleaning up a stale directory manually should not be required. As per the bug reports, the orphaned pods issue is present in Kubernetes versions up to at least 1.24; it is possibly fixed in later releases, which brings us to our next issue.

The other issue is that the Kubernetes releases available in Magnum are frequently quite far behind upstream. For example, the most recent Kubernetes version supported in OpenStack Xena is 1.21. Even after we upgrade from OpenStack Xena to OpenStack Antelope, we only get official support for Kubernetes 1.24, and that probably won't fix our orphaned pod issue. So with OpenStack Magnum, we run into the limitation of using older Kubernetes versions, or even versions that are end of life and no longer maintained.

So that was the summary of the architecture, our transition, and the issues we ran into. If you are interested in trying out the learning platform: it consists of many courses related to OpenStack, Magnum, OpenStack deployment, Terraform, and Ceph, as well as some quick tutorials, and we launched it today. You can check it out at the link. And that's it. Thank you. If you have any questions, yes?

[Audience] Why do you not use multi-site for Ceph? Why do you not use Ceph multi-site?

I didn't get your question.

[Audience] I think in your presentation you said that for Ceph you don't use multi-site replication. Why is that?
Yeah, we don't use the Ceph native multi-site replication; it's not supported in our environment, so we don't use it.

[Audience] I mean, is there a reason?

I'm not sure about it; I can get back to you. But we use our own replication setup, not the Ceph native one. Okay, if there are no other questions, thank you.