Thank you, everyone, for taking the time to attend our session. Really appreciate it. My name is Sadeeg. I'm working as a cloud success architect for Asia Pacific, currently living in Singapore. So today, we are going to talk about the basic architecture of our cloud containerized deployment, with a focus on troubleshooting. Before we start, two things. We prepared our presentation using OpenStack Queens, so there could be something not in sync with the latest Rocky release; we apologize for that. The next thing: we wanted to focus on advanced troubleshooting. But without covering basic troubleshooting and the basic architecture first, there is no point in going to advanced troubleshooting, and we don't have enough time to cover everything, both basic and advanced. So we will focus our presentation on the basic architecture of the deployment process and the basic architecture of the overcloud, and then we will explain how to troubleshoot if you come across some issues. And over to you, Dev. Thanks, Sadeeg. Welcome, everyone, to the OpenStack Summit. My name is Devendra Shambad. I work as a senior consultant with Red Hat. A quick look at the agenda today: we've got two sections here, containerized deployment in the first part, and in the second part a deep dive into the containerized overcloud. In the interest of time, and because we've got a lot to cover, we're going to keep the Q&A to the end of the presentation. Thank you. A quick note about TripleO. With TripleO, we have two clouds: the undercloud, which is the operator-facing cloud, and the overcloud, which is the tenant-facing cloud. In a typical deployment, we deploy the undercloud first, and the undercloud is then used to deploy and manage the overcloud. Let's look at traditional deployment.
In a traditional deployment, you have pre-built images with all the packages installed and base configuration done. You have Heat, which drives the Puppet configuration on the overcloud nodes and the OpenStack services. All the OpenStack services share the same underlying libraries. Now, fast forward to the TripleO containerized deployment. In a containerized deployment, all the previously systemd-managed services are containerized. All the services remain the same, but they're in a containerized format. So instead of running package-based services managed by systemd, we have all the services inside containers running on the same hardware. The only obvious difference you see here is that the services are deployed in containers, in a container runtime managed by Docker. What does this bring us? Containers bring a great deal of flexibility. It's easier to move containers around. Upgrades and rollbacks are easier to do. It's easier to scale the deployment. And because containers give you an immutable infrastructure, they are more secure as well. A quick look at the deployment workflow, and the key thing to note is that all the containers require a registry to pull the images from. The overcloud can do that by connecting to a remote registry and pulling the images directly on each node. But this has to be done for each node, so a lot of network bandwidth is required, and all the nodes need an internet connection. Instead, we use something called a local registry: a local registry is created that syncs all the container images from the remote registry and keeps a copy. This in turn speeds up the deployment, and it also decreases network congestion. The only problem is that you get only basic functionality with a local registry.
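In the Queens-era TripleO CLI, that local-registry workflow is driven by a couple of commands. The sketch below is illustrative only: the registry namespace and file paths are hypothetical, and the commands need a real undercloud to actually run.

```shell
# Build the list of container images the overcloud will need, plus a Heat
# environment file that points the deployment at the local registry.
openstack overcloud container image prepare \
  --namespace=registry.example.com/rhosp12 \
  --output-images-file ~/local_registry_images.yaml \
  --output-env-file ~/docker_registry.yaml

# Sync every image from the remote registry into the undercloud's local
# registry, so each overcloud node pulls over the provisioning network.
openstack overcloud container image upload \
  --config-file ~/local_registry_images.yaml
```

The deploy command then just includes the generated environment file (here, ~/docker_registry.yaml) so the overcloud nodes pull from the local registry instead of the internet.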
So we recommend using Satellite to sync all the container images from the remote registry to a Satellite server. Now a quick deep dive into the key components of the container build. We have Kolla, which provides the container images and scripts. We have Paunch, a library that is used to start and deploy the containers. We have something called docker-puppet.py, a Python script that is responsible for generating the configuration. And finally, for updates and upgrades, we have Ansible within the tripleo-heat-templates. A quick look at docker-puppet.py. docker-puppet.py is responsible for generating the configuration for each of the services by running Puppet inside a container. It takes docker-puppet.json as input and mounts it into the container to generate the configuration. The way it works is you have two config trees: /var/lib/config-data, which holds the full config tree for each service, and the puppet-generated tree, which contains only the files actually modified by Puppet. Finally, it generates a checksum, which tells Paunch that the configuration has changed and the container needs to be restarted. So once docker-puppet.py generates the configuration, how do the containers get started? Kolla is the project that builds the container images and has all the container services and start-up scripts. Like I said before, the configuration and the config directory are put inside the container, and kolla_start copies the configuration into each container, sets permissions, and then starts the container process. Here, we're just taking a quick look at what config.json for a particular service looks like. We see the permissions being set on files in the container, the command that it's supposed to run, which starts the service, and the source, which is the Kolla config source files. Some important directories that we've listed here.
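As a concrete aside on the config.json just described, a hypothetical, abbreviated example of that shape (field names follow the Kolla convention; the service, command, and paths are examples only, not taken from a real deployment):

```json
{
  "command": "/usr/sbin/httpd -DFOREGROUND",
  "config_files": [
    {
      "source": "/var/lib/kolla/config_files/src/*",
      "dest": "/",
      "merge": true,
      "preserve_properties": true
    }
  ],
  "permissions": [
    {
      "path": "/var/log/keystone",
      "owner": "keystone:keystone",
      "recurse": true
    }
  ]
}
```

kolla_start reads a file like this, copies the config_files sources into the container's root filesystem, applies the permissions entries, and finally execs the command.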
So we have /var/lib/config-data/<service>, which has the full copy of the container's config tree, basically /etc and everything for the service. We have /var/lib/config-data/puppet-generated/<service>, which has only the configuration files modified by Puppet. And then we have /var/lib/tripleo-config and /var/lib/docker-puppet, which are managed by docker-puppet. Just a quick look at how the bind mounts look inside a container and on the host. What you see as /var/log/containers/keystone on the underlying host is bind-mounted inside the container as /var/log/keystone. Networking does not change with a containerized deployment: what you see on the host is what you see in the containers as well. Even the log files for all the containers are bind-mounted. So in the container, you'll find the logs in their usual location, which is /var/log and the service name, Neutron or Nova say, and on the bare metal host you'll see them under /var/log/containers and the service name. Finally, let's take a quick look at a stack update, where I'm trying to deploy OVN on the overcloud. Basically, my stack update has failed, complaining about a Docker image not being found. The first thing you'd probably look at is the config file you generated using openstack overcloud container image prepare and the containers in your local registry, which clearly show that the OVN containers are missing. So next, we include OVN in the container prepare command and rerun it. Here we have a diff of the container images list before, and the one with the OVN containers present in the local registry. This brings us to the end of the first part of the presentation. Over to Sadeeg. Thank you, Dev. So Dev clearly explained how TripleO deploys a containerized overcloud.
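To make that container-prepare example from the first part concrete, here is a rough recreation of the kind of diff that was shown, using hypothetical file contents and image names (the real files are generated by openstack overcloud container image prepare):

```shell
# Hypothetical "before" list: no OVN images in the local registry.
cat > /tmp/images-before.yaml <<'EOF'
container_images:
- imagename: 192.168.24.1:8787/rhosp12/openstack-nova-api:latest
- imagename: 192.168.24.1:8787/rhosp12/openstack-neutron-server:latest
EOF

# Hypothetical "after" list: rerun of container image prepare with OVN included.
cat > /tmp/images-after.yaml <<'EOF'
container_images:
- imagename: 192.168.24.1:8787/rhosp12/openstack-nova-api:latest
- imagename: 192.168.24.1:8787/rhosp12/openstack-neutron-server:latest
- imagename: 192.168.24.1:8787/rhosp12/openstack-ovn-controller:latest
- imagename: 192.168.24.1:8787/rhosp12/openstack-ovn-northd:latest
EOF

# The diff shows exactly which container images were missing the first time.
diff /tmp/images-before.yaml /tmp/images-after.yaml || true
```

Comparing the two generated files like this tells you immediately which images you still need to sync into the local registry before rerunning the stack update.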
So he first gave you an insight into how Kolla is used to build the images, then how you can configure various sources to push the images so that the overcloud can pull those images during the deployment. He then explained the purpose of Paunch and how Paunch is used to kick-start the various containers during the deployment process, both the one-shot containers and the other containers. Then he explained how the bind mounts work in the background. He also focused a little bit on how kolla_start is used to orchestrate the OpenStack services within the containers. And then he gave you an example of how to approach the basic troubleshooting when a deployment failure comes into the picture. So now let's shift our focus a little bit to how an overcloud deployment, and the end result of that deployment, looks from a containerized architecture perspective. The first thing I'm going to talk about is the high availability of the containers. For a lot of the various OpenStack supporting services, we need high availability, and we have been using Pacemaker to deliver that. The good news is that we still use Pacemaker, as we traditionally did, to make those services highly available. MySQL Galera, RabbitMQ and Redis are some of the examples. But we cannot manage the containers the way Pacemaker manages services directly. That is why Pacemaker introduced a new feature called bundles. What this means is that you pass a number of options into the Pacemaker start-up command: one, the location of the image; second, what type of networking is going to be used for these containers; third, the number of replicas; then how many masters and how many slaves, things like that.
Then finally, the command that is going to orchestrate the service when the container starts; in most cases this is kolla_start. Pacemaker takes all these arguments and, depending on how many replicas you have configured, creates a bundle of containers distributed across multiple bare metal control plane nodes. So this is what we have: various services, as I explained. One is Galera, MariaDB, then RabbitMQ and Redis. For these three services, Pacemaker cannot just manage the containers. Pacemaker needs a little bit of intelligence and visibility into the application inside the container, so that it can do various activities like bootstrapping and other steps directly on the application. We need a way to do that, and that is why Pacemaker starts pacemaker_remoted inside the containers; pacemaker_remoted then orchestrates the application, bootstrapping it inside the containers. This is not for all the containers. It applies only to the three containers that need special intelligence to bootstrap the application. I have another slide that explains this process step by step. So those are one type of container that Pacemaker makes highly available. The second type of containers are HAProxy and cinder-volume, which I will explain later. For HAProxy, Pacemaker does not need any kind of intelligence or access into the HAProxy process that is running inside the container. What it does need is simply to start, stop and make the container itself highly available. So we tell Pacemaker to go and directly manage the bundle of containers created for HAProxy. And cinder-volume still runs active/passive in a default deployment, so we configure the cinder-volume bundle with just one replica, so that Pacemaker ensures only one cinder-volume container is running at a time.
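Putting those bundle options together, a Galera bundle definition looks roughly like this. This is a hypothetical sketch modeled on a Queens-era deployment; the image name, control port, node names and paths will differ in any real environment:

```
Bundle: galera-bundle
 Docker: image=192.168.24.1:8787/rhosp12/openstack-mariadb:pcmklatest
         network=host options="--user=root --log-driver=journald"
         replicas=3 masters=3
         run-command="/bin/bash /usr/local/bin/kolla_start"
 Network: control-port=3123
 Storage Mapping:
  options=ro source-dir=/var/lib/kolla/config_files/mysql.json
             target-dir=/var/lib/kolla/config_files/config.json
 Resource: galera (class=ocf provider=heartbeat type=galera)
  Attributes: wsrep_cluster_address=gcomm://controller-0,controller-1,controller-2
```

You can see all the pieces discussed above: the image location, the networking mode, the replica and master counts, kolla_start as the run command, plus the control port and the resource agent that will be explained in a moment.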
The other thing is that the Pacemaker service itself is not containerized, and the virtual IPs through which we access the APIs are also not containerized. They run as standalone services on the control plane, just like before. Let's take just one example of how the container bundles are built, for MySQL Galera. So we have the container, and we have the config.json. This config.json and the config directory are generated during the deployment, as Dev explained, using docker-puppet. These are available on the bare metal system, and they are bind-mounted into the container along with all the other supporting files, like /var/log/containers and a lot more. Then the container takes this configuration, and we specify that kolla_start is going to be the process that orchestrates the application inside each container. So kolla_start is the first process that runs inside the container. What kolla_start actually does is read the config.json that is bind-mounted, and first copy all the configuration files from the bind-mounted config directory into the root directory of the container. The configuration files inside the bind-mounted config directory have a tree structure starting with /etc and so on, so once they are copied into the root directory, you get the full hierarchy of configuration files inside the container. The next step is setting various permissions on files and directories. If it is a MySQL container, that means /var/log/mysql needs its permissions and ownership set accordingly. Then, since these are the special container bundles, kolla_start cannot just start the MySQL or Galera process; what it does instead is start pacemaker_remoted inside the container. Okay, from here, what happens?
Once pacemaker_remoted is started, the Pacemaker on the bare metal control plane node talks to the pacemaker_remoted inside the container using the resource agents. The resource agent here is the Galera resource agent. It has a lot of configuration associated with how Galera needs to be started, and it sends that configuration to pacemaker_remoted. pacemaker_remoted then basically orchestrates the Galera service inside each and every container. We have some more details on what the resource agent looks like in the next slides. So this is how a pcs status looks on one of the control plane nodes. The first three bundles are the bundles created for MySQL, RabbitMQ and Redis; the VIPs are not containerized; and the last two are the container sets for HAProxy and cinder-volume. These are not the only containers managed by Pacemaker, because if you use OVN, then the OVN database is containerized too, but we haven't explored that in detail here. The online nodes you see are the physical controller nodes, and the guest-online entries are the containers with pacemaker_remoted running inside them. Then this is what pcs resource show looks like, an example for Galera. The first line, which I cannot quite see from here, is basically the configuration for the container: the location of the image, the number of replicas, and the process that needs to be started. The second one is the control port. The control port sets the port that pacemaker_remoted inside the container is going to listen on. There is a default port, but we cannot use the default port, because there would be a conflict: we have three different sets of containers running pacemaker_remoted, one each for Galera, RabbitMQ and Redis.
There cannot be a conflict between the ports in use on the same node, so we need to manually specify different ports for the different sets of bundles. Then we have the bind-mount configuration for that specific process. The last section is the resource agent configuration, where we tell Pacemaker how to build the Galera cluster: which container nodes are going to be in the cluster, and how pacemaker_remoted should orchestrate the service inside it. So that's the containers managed via pacemaker_remoted. As you can see, there aren't too many containers managed by Pacemaker. Then we have the standalone containers. Most of the containers fall into this category. They are basically stateless, most of them API services, and we tell Docker to manage these containers directly. The configuration is generated during the deployment, and we set the Docker restart policy to always, so that Docker will restart these containers. There is no dependency between these containers: if one container goes down, there is nothing to bring up except the default restart behaviour in Docker. What happens is we have HAProxy, which is managed by Pacemaker, and HAProxy load-balances API requests across these containers. HAProxy is responsible for ensuring it does not send API requests to a container that has gone down. So even if one container goes down, it doesn't matter; we still have the other two containers where HAProxy can send requests. Then another set of standalone containers communicate with each other using RabbitMQ. Some examples are nova-scheduler, the Neutron agents, nova-conductor, and there are a lot more. These are also standalone containers, and there is no API access into them.
Instead, they register themselves with RabbitMQ. For example, if we have three nova-scheduler containers, all three get registered with RabbitMQ. Even if one of them goes down, RabbitMQ still has two of the containers registered that can serve requests or do the task. That is why these containers are also standalone containers managed directly by Docker itself. Now, what I explained so far are the containers that exist on the control plane. Each compute node is also containerized to some extent, so most of the services on a compute node are containers: for example libvirt, nova-compute, and the Ceilometer agents. nova-migration-target is another container, where sshd runs listening on a port and accepts incoming cold migration requests from other hypervisors. So all of the supporting services are containerized, and likewise these are standalone containers directly managed by Docker. At the time of the Queens release, the Open vSwitch that runs on the compute node is not containerized; it just runs as a standalone service. And the VMs, by which I mean the QEMU/KVM processes backing the VMs, are also not containerized; they are normal Linux processes, just like in the previous releases. Now, let's cover a little bit about the Ceph OSD nodes. The Ceph OSD nodes are deployed by the director, and the director uses ceph-ansible to deploy them; Ceph 3.0 has support for containerization. The containers on Ceph OSD nodes work in a slightly different way. For each OSD, and each OSD is a disk, for example sda, sdb and sdc, a systemd service is created. Inside that systemd service, we call ceph-osd-run.sh with the name of the device. What that does is call the docker run command, passing the OSD details, the device, and everything else to the Docker command.
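A rough sketch of what such a per-OSD systemd unit can look like. The unit name, script path and stop command here are assumptions based on ceph-ansible's templated unit, not copied from a real deployment:

```ini
# ceph-osd@.service: one instance per disk, e.g. "systemctl start ceph-osd@sda"
[Unit]
Description=Ceph OSD
After=docker.service
Requires=docker.service

[Service]
# The templated script wraps "docker run", passing the device name (%i) through.
ExecStart=/usr/share/ceph-osd-run.sh %i
ExecStop=/usr/bin/docker stop ceph-osd-%i
Restart=always

[Install]
WantedBy=multi-user.target
```

The practical consequence is that you operate each OSD with systemctl (status, start, stop) rather than with Docker or Pacemaker commands.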
And Docker then actually creates a container for that OSD. So each OSD is a container inside the OSD node. If you have 10, 20 or 30 OSDs, you will see 10, 20 or 30 OSD containers, and each container has exclusive access to the specific disk on which that OSD is created. The important thing here is that you should not be managing these containers using Docker or Pacemaker or anything else; they are simple systemd services, and you manage each OSD through its systemd service. Now, if you think the control plane services are the only things containerized, you are wrong. We also have a number of services that spawn containers themselves. For example, Neutron. You know that Neutron is traditionally responsible for automating your networking, and that involves creating networks. When you create a network with DHCP enabled, the DHCP agent spawns a DHCP server, which in turn is a dnsmasq process, to serve that network. Just like that, you have the Neutron L3 agent. The L3 agent is responsible for creating a namespace on each relevant bare metal node, depending on the configuration, and then starting a keepalived process. The keepalived process manages hooking the IP address into the namespace and enabling routing for the virtual routers created by end users. This is one of my favourite topics, because it can make the scale of containers in the control plane really unpredictable, depending on how many users there are and how many networks and routers they are going to create. What happens with containerization is that each network's DHCP server, the dnsmasq process, is a container, spread across multiple control plane nodes. What this means is that we need high availability for the DHCP service: for a single network, depending on the configuration, the default is three DHCP servers.
We basically create three dnsmasq processes, one on each of the control plane nodes. That means with 100 networks, you end up with 300 containers just for the network DHCP services. And it's the same for routing: with keepalived, the IP address is hooked into only one container at a time. For each router, there is a container with the namespace and the IP hooked into that namespace, and the containers form a heartbeat network for keepalived to work between them. Then we have the metadata service. The metadata service is hooked into the router if you have one; if you do not have a router, it is hooked into the DHCP agent. The example here shows the metadata service hooked into the router, so each metadata service for each router is going to be another container. The way this is done is: we have the Neutron DHCP agent and the Neutron L3 agent. The L3 agent is a container, and this container has the Docker client installed in it. The Docker client can contact the Docker service on the control plane and spawn different containers for each network and router created by the end user. A separate container also gets created for the Neutron network namespace, and the Neutron router running in its container reaches out to the metadata agent container for metadata access, which is yet another container; they share the same namespace. If you look at the listing, and I don't have a full example here, the first one is the qrouter container from the L3 agent, I believe, and the second one is a metadata agent container. Inside the metadata agent container runs an HAProxy process. That HAProxy process listens on the metadata port, and once it gets a request, it just forwards it to the metadata proxy socket.
Then the last one is a different container spawned to provide DHCP services for a network. And as I said, the namespace for the routing and the metadata proxy is shared; they both run in the same namespace, so we use the shared flag to pass the namespace into both containers. Now let's focus a little bit on troubleshooting. These are simple tips and tricks for your troubleshooting. Just as I explained, it's very, very important to understand how the containers are started, stopped and managed, and for what purposes. Which containers are started, stopped and managed by Pacemaker? Which containers are managed by Docker itself? Which containers are managed by systemd? And which containers are managed by another container, like in the Neutron case I explained? There is a similar case in Cinder, too: if you use an NFS backend for Cinder, each mount request is served in a different container. So you need a clear understanding of the distinction between all these types of containers to troubleshoot efficiently. There's no point playing with the docker start and stop commands on a Pacemaker-managed container; it's not going to help you troubleshoot anything. Second, there are some Docker commands you can always use, like docker stats and docker top, which give you monitoring information from within the container, and most importantly docker inspect, which shows you the configuration of the container. As for the log files, you should not be looking at the default log file locations, because every log location is bind-mounted into the container. They reside under /var/log/containers and then the name of the service: nova, neutron, and so on, depending on the service. And if you want to enable debugging for troubleshooting purposes, there are multiple ways to do that.
One way is to log into the container directly and change things on the fly. This will not survive a restart. If you need to make changes that persist for the container, you can edit the bind-mounted file that resides in the puppet-generated directory; that is one option. But for configuration changes that should remain forever, you are recommended to make them through the TripleO orchestration. Then finally, if you find a bug and you want to rebuild the container with a fix, you have multiple ways to do that. You can create a Dockerfile and say: I'm going to change this file, and copy the changed file into the container image; or: I have a new RPM package, and install the fixed, patched RPM package into the container image. Then you rebuild the container image and run a TripleO stack update so that the container gets updated with the new code. But for a permanent fix, it's better to download the patched container image and then redo a stack update. So these are some troubleshooting steps for you to get started with. And with that, thank you, everyone, for your time. If you have any questions, we're happy to answer. Can you hear me? So, I've been playing around a bit with Red Hat OpenStack Platform 12, and we had to do some customizations to some of our services. We saw that there are basically three ways to change your container: either you rebuild the container from the Dockerfile, or you do it with Puppet and the /var/lib/config-data puppet-generated folder, or you mount some folder from the host. So basically, I'm a bit confused: what is the best way to customize your containers? Because there are many ways to do it, no? I didn't hear everything properly, but I understand your question. The configuration is generated using a special container, using docker-puppet, and docker-puppet copies that configuration into multiple locations.
One location gets the entire configuration for that specific service. The second location gets only the files changed by Puppet, and that is what is bind-mounted into the container. You're asking, is there a better way to manage this? Yeah, so what is the recommended way to customize the contents of a container? If you want to do something non-standard, I have some trouble finding the best way. I can't hear you properly, there's an audio problem. Yeah, so I was wondering what the best way is to customize a container: you can do it with mounts from the host, you can do it by changing the Dockerfile in your container registry, and you can also do it with Puppet. So it's a bit tricky for me to see. Yeah, so it depends on what the preferred way is for you. If you want a patch delivered by a vendor, and you want to run that container so you remain fully supported, then it's better that you inform the vendor and get a patched container that you can run. But if this is a test or staging environment, or you have the in-house expertise, you can develop the patch, rebuild the container image, and push it into the container registry so that the overcloud can pull it. It depends on what is comfortable for you. Hey, I'm just wondering, what's the value of running Galera and RabbitMQ through Pacemaker when they have their own clustering built in? The question is: Galera and RabbitMQ are run by Pacemaker, and what is the purpose of running them through Pacemaker, right? For one thing, these need special orchestration inside the container. For example, with Galera, if all three nodes are down, you need to decide, and you don't know, basically, which Galera node has the latest copy of the database.
If you just start the Galera service inside the container, then it depends on which node happens to bootstrap first. Whichever node you start first becomes the master, and it may not have the latest copy of the database, so you lose data in the process. What Pacemaker does is this: it has a resource agent, and once all the nodes are up, it automatically looks at the database to find out which node has the latest copy, and uses that node to bootstrap the Galera service, so that you will not lose any data. That is the main purpose of bringing Galera under Pacemaker: it needs special intelligence to automate the bootstrap process. The second thing is that if one of the Galera nodes goes down, there is no way to recover that node unless someone orchestrates the recovery. This is not applicable to the other standalone services in the same sense. But the first point, the special intelligence required to bootstrap the application inside the container, is the main reason to bring Galera under Pacemaker. There is a similar story for RabbitMQ, to work around some bugs (I don't recall which ones) that need special intelligence built into the orchestration. When Galera goes down entirely, you need to bootstrap it; it will not bootstrap automatically. When you do this manually, you can choose whichever node has the latest copy by looking at the sequence number. But in an actual deployment, you need automation for this: something that automatically goes and looks at which Galera node has the latest sequence number and uses that node to bootstrap. That is the special purpose of using Pacemaker here. And HAProxy is brought under Pacemaker because it has a dependency on the VIPs: HAProxy and the VIP must run together.
There's no point in failing over a VIP without ensuring that HAProxy is running on that node. That is why only these services are brought under Pacemaker control. We can discuss it in more detail afterwards. So the idea is to give Pacemaker some intelligence, a way to reach into the application and do the special orchestration. I think we are already out of time. If there aren't any other questions, thank you, everyone, for joining. I will be around. Let's talk.