Hello, everyone. Thanks for joining us today and welcome to OpenInfra Live, the Open Infrastructure Foundation's weekly hour-long show featuring production case studies, open source demos, industry conversations, and the latest updates from the global Open Infrastructure community, streaming every Thursday at 14:00 UTC to YouTube, LinkedIn, and Facebook. My name is Kendall Nelson, and I'll be your host today. Since we're streaming live, we will be answering questions throughout the show. Please feel free to drop questions for today's guests into the comments section of wherever you're watching the stream; we'll get through as many as we can throughout the show and try to save a little time at the end as well. We have a jam-packed episode today, so let's go ahead and get started. Kicking off today, we have Tom Barron from Red Hat, who is going to give us an overview of Ceph before rolling into how it fits with OpenStack via Cinder and Manila, and also touch on Kubernetes as well. Take it away, Tom.

Hi, Kendall, thank you. Next slide, please. I'm going to talk very quickly about Ceph and OpenStack and how they integrate, to set the stage for what follows. OpenStack, as everybody I'm sure knows, is an infrastructure-as-a-service cloud technology. And for storage infrastructure as a service, we have object, block, and file storage, offered by Swift, Cinder, and Manila. Now, the important thing about these is that they are self-service, of course, and that they are integrated with Keystone to provide tenant separation. So users belonging to separate Keystone projects get their own virtualized view of storage and basically do not step on one another, cannot impinge on each other in terms of security and so on. They also cannot mess with the cloud resources directly, so those are protected. This is perhaps most obviously relevant to people who run OpenStack as public clouds, but it's not just for public clouds. Large enterprises run tenant separation for departments and so on within their organizations, and similarly for government and the public sector. We see telcos who have regulatory requirements to run separate applications in separate tenants and so on. So this multi-tenancy, Keystone integration, and self-service storage infrastructure are the themes on this slide. Next slide, please.

Ceph is a natural fit here because Ceph offers object, block, and file storage, all based on RADOS technology. So a common low-level object store is used, via librados, by the RADOS Gateway (RGW), which can stand in for Swift, offer an S3- or Swift-compatible API, and integrate into OpenStack that way; by the RADOS Block Device (RBD), which offers virtual block devices and can be a backend for Cinder; and by CephFS, which offers a distributed POSIX-like file system and can be a backend, either natively or gatewayed to NFS, for the Manila service. Now, let's go to the next slide.

RBD has been consistently ranked as the most deployed backend for Cinder, and CephFS has been consistently ranked as the most deployed backend for Manila. Now, with our distribution — I work at Red Hat — we typically see RADOS Gateway also deployed as a stand-in for Swift when you're using Ceph for Cinder and Manila. And I'll mention it's typically used to back Nova ephemeral storage and Glance images as well. Next slide, please.
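To make the Cinder-on-RBD integration Tom describes concrete, here is a minimal, hedged sketch of what an RBD backend section in cinder.conf typically looks like. The pool name, CephX user, and libvirt secret UUID below are illustrative assumptions rather than anything shown on the slides; deployment tools like TripleO or Kolla-Ansible normally generate this for you.

```bash
# Hedged sketch: enabling a Ceph RBD backend for Cinder.
# Pool/user names and the secret UUID are illustrative assumptions.
cat >> /etc/cinder/cinder.conf <<'EOF'
[DEFAULT]
enabled_backends = ceph

[ceph]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = ceph
rbd_pool = volumes                 # pool created on the Ceph side
rbd_user = cinder                  # CephX identity Cinder connects as
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337  # libvirt secret holding the key
EOF
```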
So Ceph and OpenStack fit together naturally, as I've emphasized, but let me take a moment to talk about why the combination is popular — and we can talk about this more in discussion. Clearly there's an economic advantage in being able to deploy Ceph on commodity hardware and to handle all three storage services from the same hardware using the same software, rather than having to purchase separate appliances, potentially from separate vendors, for each service, and rather than having to train operators in separate skill sets to manage the storage technologies behind them. And nowadays, I'll mention, there have been real improvements in Ceph in terms of management: there's a common dashboard, a single pane of glass over all of this, so it's easier and easier to manage. When you combine Ceph with OpenStack, you get this multi-tenant separation with Ceph. I'm not saying you can't do it with Ceph without OpenStack, but you get it integrated with the rest of the identity system and so on that you're using for OpenStack. And you end up with a cloud infrastructure that's more like a public cloud such as AWS or GCP than, say, vSphere, which is more or less just a virtualization platform where you need special privileges, for example, just to bring up the equivalent of a compute instance, a VM.

The other thing I would quickly say about the popularity and fit here — pardon me for not drawing a strict connection — is that, of course, Ceph is open source. What does that mean? We have open source OpenStack and open source Ceph. You avoid vendor lock-in; you don't pay somebody for a license to run something you cannot even inspect the inside of. For people who don't like getting a software or hardware solution where they have to pay for a license, it's like buying a car where you can open the hood and hire a mechanic to work on it, or look at it yourself. And there's a vibrant Ceph community. So if you have a bug, you can see who else has the bug, you can talk about it, you can affect bug prioritization, you can affect new feature development, you can participate in the conversation. So it's a natural fit with OpenStack in that way. Thank you — and sorry, I just wanted to dwell on that point for a moment. We can go to the next slide.

Now we're going to talk about another big open source project: Kubernetes running on top of OpenStack with Ceph is the next topic here. In the popular press, you often see OpenStack and Kubernetes posed as an either-or proposition: when are people going to move from OpenStack to Kubernetes? Should you use one or the other? The point of view we have here is that they're complementary rather than exclusive. What Kubernetes brings to the table is simplified orchestration and scaling of applications. With a little bit of YAML — or, with distributions that build on top of it like OpenShift, even just point and click — you can deploy applications very simply, you can scale them out, you can combine them and run them in the cloud, meaning not just on a single node but scaled out over the whole cloud. So, for example, with our distribution's IPI — installer-provisioned infrastructure — a regular OpenStack user, a member of a project, nobody special, can effectively press a single button and it provisions all the OpenStack compute instances they need for their OpenShift nodes.
It builds the storage classes they need to do dynamic provisioning of storage for persistent volumes with Cinder and Manila, and it sets up all the networking, ingresses, load balancers, et cetera. So the bottom line here is that users don't have to be system administrators to run applications on top of OpenStack if they're running them on Kubernetes on top of OpenStack.

Now, in the early days of Kubernetes, people thought it was going to be stateless, but everybody knows now that you need volumes, and many of the popular applications that are often shown are things like WordPress. You need state, you need persistent volumes, and there are a couple of important concepts here that people should be aware of. One is volume mode. Traditionally, Kubernetes has worked with what's called filesystem volume mode: it gives you a persistent volume at a mount path, like /var/www, something like that, as opposed to at a device path, like /dev/vdb. Nowadays you can also run raw block volume mode, which was a later addition. What filesystem mode meant is that even with a block device provider like Cinder, Kubernetes would automate for you the job of: I have a raw block device, I'm going to attach it, I'm going to format it, I'm going to build a file system on it, and now I expose this path, /var/www or whatever, to an application. That was all done for you with filesystem mode. Now you can also do block, and you just get /dev/vdb or whatever, if you want. But most of the applications people run — if you look, for example, at the OpenShift catalog of applications — assume filesystem mode.

So Manila becomes an increasingly good fit in Kubernetes, relative to the role it had in OpenStack, I will assert, because it's so easy to deploy applications across multiple nodes. In OpenStack, that means you're running on multiple Kubernetes worker nodes, which are running on multiple compute instances or VMs — Nova instances, conceivably bare metal in the future. And having a shared file system that can span those nodes is handy and useful, and it's going to be something people do more and more.

This pertains to something called access modes in Kubernetes: ReadOnlyMany, ReadWriteOnce, and ReadWriteMany. I'll focus on the last two with respect to Cinder and Manila, because they are complementary rather than redundant. Manila can do ReadWriteMany, in addition to the other modes, in filesystem volume mode, but it doesn't support raw block volume mode. Cinder does ReadOnlyMany and ReadWriteOnce in filesystem volume mode, but not, safely, ReadWriteMany. We can talk about that at length if people want, but you can potentially corrupt your file system, and in any case you get inconsistent contents across different nodes, because Kubernetes uses XFS or, by default, ext4 — node-local file systems, not a clustered file system. So for Cinder, ReadWriteMany is for raw block volume mode: if you have multi-attach in Cinder, you can do RWX with raw block volume mode, and that's useful for certain databases that work with offsets rather than file paths, for ISOs and boot volumes, stuff like that.

Here, I'm going to quickly point to a picture: I'm showing two OpenShift clusters, with Manila in this case, running on top of OpenStack. The details aren't that important at the moment, but the point is that there's a Ceph public network
that we're sharing, via NFS-Ganesha in this case, to provide NFS service to these separate OpenShift clusters, which belong to regular users. There could be hundreds of these on the OpenStack cloud, all unrelated, all ships in the night to each other. I say OpenShift because of the Red Hat packaging of Kubernetes, but it's just like somebody who goes to Amazon or GCP, pays with their credit card and gets on: they can deploy and run a Kubernetes cluster on top of it themselves, and then another person can deploy one, and they don't touch each other. So this picture is the vision: hundreds of Kubernetes clusters running on a single OpenStack cloud.

People worry sometimes that, because of all the service layers involved in Manila and Cinder and provisioning, this would not work at, quote, container scale. I kind of laugh at "container scale", because if you run containers inside OpenStack on VMs, your VM scale is actually a little bigger than your container scale, even if you do nothing but run Kubernetes. But what's meant here is that it is so easy to orchestrate the provisioning of persistent volumes and of pods and containers with Kubernetes across the cloud that you can get rapid bursts of creation requests. So we have been studying this with, in this case, the in-tree Cinder driver, the newer Container Storage Interface (CSI) Cinder driver, and the Manila CSI driver — I say driver from the OpenStack side, but provisioner in Kubernetes — on, in this case, OpenShift on OpenStack. And so far we're seeing a nice flat growth pattern. What you don't want to see is a knee in the curve, where suddenly there's an inflection point and you can no longer handle increased load. What we're doing in these tests, basically, is creating a single new persistent volume, via a PVC, for each pod, and scaling up the number of pods. The pods never come up to the Running state if the provisioning is not successful. You stack up a whole bunch of requests right after one another; they're async, so there's a lot of parallelism. And we're able to drive, in this case, three worker nodes at 250 pods per worker node — there are some pods already running, so we're running it up to about 650 pods — and it all stays pretty flat. And we see the newer CSI provisioners doing a little better than the in-tree ones, which is good. I could talk about this stuff all day, but I'm already running over time, sorry about that. Let's hand it back to Kendall, and thank you for your time, folks. You know where to find me.

Yeah, that was a ton of really awesome, helpful information, Tom. It's awesome to see all the different open source communities working together — it's not just OpenStack, it's also Ceph and Kubernetes together. Do you know if there are any easy ways into the connection points in those communities, Tom?

Well, I know that the OpenStack Technical Committee has been working with Kubernetes — there's a liaison that works with them, I forget her name, and Kendall Nelson. So the communities are working together and talking about how to integrate. Obviously, Kubernetes wants to work on every cloud, not just OpenStack, so we have a special interest in that and in getting the story out that the two are complementary. With Ceph, you're going to hear a whole lot of Ceph integration stuff in the next presentation, I think, so it'll almost speak for itself on that. Yeah, awesome, thank you very much.
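Before moving on, here is a hedged sketch that makes Tom's volume-mode and access-mode distinction concrete: one RWX filesystem claim via a Manila CSI storage class, and one RWX raw-block claim via a multi-attach-capable Cinder CSI class. The storage class names are illustrative assumptions; the actual names depend entirely on how your cluster's classes were created.

```bash
# Hedged sketch of the two complementary cases discussed above.
# Storage class names (csi-manila-cephfs, csi-cinder-multiattach) are
# assumptions for illustration; use the classes your cluster defines.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-files            # Manila: shared POSIX file system
spec:
  accessModes: ["ReadWriteMany"]
  volumeMode: Filesystem        # mounted at a path, e.g. /var/www
  storageClassName: csi-manila-cephfs
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-block            # Cinder: raw block device, multi-attach
spec:
  accessModes: ["ReadWriteMany"]
  volumeMode: Block             # exposed as a device, e.g. /dev/vdb
  storageClassName: csi-cinder-multiattach
  resources:
    requests:
      storage: 10Gi
EOF
```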
So that's an excellent lead-in to our next presenters, Francesco Pantano and John Fulton, also from Red Hat, to talk about the TripleO integration with Ceph.

Hi, yeah, so I'm John Fulton, I work at Red Hat on OpenStack-Ceph integration. Let's talk about why we would focus on this. So Tom already covered — he already explained — that Ceph is the most popular storage backend for OpenStack, for both file and block storage, through Manila and Cinder. There's a reference to that in the OpenStack User Survey analytics; it's been the most popular this year, in '21, and in previous years. There's also interest in hyperconverged infrastructure, or HCI for short, and by that I mean running storage and compute on the same servers. So given all that interest, we want to make it easy for people to have these things with TripleO. TripleO is our preferred deployment platform for deploying OpenStack, but it can also deploy Ceph. So to make Ceph easily accessible to people, we just add a Ceph option to TripleO: "Oh, I'm deploying OpenStack — yeah, let me get Ceph with that too." Adding Ceph is as easy as adding some additional YAML files when you do your deployment. We also have the option for the OpenStack that is deployed by TripleO to use an existing Ceph cluster. So if you have an existing Ceph cluster and some properties about it, like its monitor IP and some CephX keys and pools, you can put those in a YAML file and also pass it to TripleO, so that when the OpenStack is deployed by TripleO it's ready to go — ready to use block, file, and object storage from that Ceph cluster. And then we also test Ceph and OpenStack together, like it's just one system, as if Ceph were a part of OpenStack, though it is its own separate project.

To give you some context and an overview of how a TripleO deployment works: TripleO can do everything for you. You can give it IPMI information about your hardware, and it will then use Ironic to provision the hardware. It will configure the network. It can then deploy Ceph and then deploy OpenStack. So it can do the whole thing for you. You have a deployment definition — a file or files, a couple of YAML files describing the end state of your system — and TripleO will make it so.

For Ceph deployment, ceph-ansible is a pretty popular tool for deploying Ceph, but with the Ceph Octopus release a new tool called cephadm was introduced. So in the TripleO Wallaby release, we switched from triggering ceph-ansible to triggering cephadm instead. And this change presented an opportunity to improve the user interface. One of the usability enhancements we made is faster client configuration: we no longer use ceph-ansible's client role; we wrote our own very light Ansible role in tripleo-ansible that focuses on getting the ceph.conf file and the keys out quickly to your OpenStack clients. We also get you to Ceph's native tools as quickly as possible. So we still have our deployment definition, which comes from our deployed metal and our composable roles, but then we have a new Ansible module that we feed that into, and it gets translated into a Ceph spec — and I'll get more into that. We generate the spec for you, and then you're free to apply it; but if there are issues with it, you can modify the spec, and you can drop right into the cephadm shell during deployment. We wanted you to be able to do that as quickly as possible.
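For reference, here is a hedged sketch of what a generated Ceph service spec of the kind John describes can look like. The hostnames and the catch-all OSD drive group are illustrative assumptions, not the exact output of TripleO's translation.

```bash
# Hedged sketch of a cephadm service spec (hostnames and the
# all-devices OSD drive group are illustrative assumptions).
cat > ceph_spec.yaml <<'EOF'
service_type: mon
placement:
  hosts:
    - controller-0
    - controller-1
    - controller-2
---
service_type: mgr
placement:
  hosts:
    - controller-0
    - controller-1
    - controller-2
---
service_type: osd
service_id: default_drive_group
placement:
  host_pattern: 'ceph-*'
data_devices:
  all: true                    # claim every available disk for OSDs
EOF
```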
And all day-two operations are done directly with Ceph's tools on the Ceph cluster itself, not with TripleO. For scale-up and scale-down, we still use TripleO to add new nodes, but once it has added the hardware, you can use cephadm to add your OSDs, for example, if you're adding more storage. And we also have more isolation: you can actually deploy Ceph before you deploy OpenStack, even if you're using a hyperconverged deployment — meaning even if they're co-located on the same hardware. I call this the "deployed Ceph" feature.

So the idea with deployed Ceph was strategic decoupling, based on user feedback. There were things we did want TripleO to do: do provision the hardware and network, do create the Ansible inventory for me, and do put the right Ceph service in the right place without my needing to know the details — which we cover in our composable role defaults, although you're free to change them; you just don't have to worry about them if you don't want to. Of course, configure OpenStack to talk to Ceph, and let me use Ceph's native tools as soon as possible. But don't make it harder to debug the Ceph deployment, and don't require all Ceph changes to be done through TripleO. So those are some of the changes.

To put that back in the earlier picture: for the deployed Ceph feature we introduced some new commands, so that it stays well integrated with TripleO and configures things so they'll work well together. It lets you focus on step three — the Ceph deployment — separately. After you've done steps one and two, you can focus on three without having to worry about four, even if you're doing a hyperconverged deployment. I have a link there to the documentation on how to use it. It's called the deployed Ceph feature because, by the time you're deploying OpenStack, Ceph is already deployed, even if it's internal or hyperconverged.

So I'm going to talk a little bit more about how this works. The cephadm tool is a nice little tool. The way it works is it bootstraps a tiny Ceph cluster for you, and then you just scale. To bootstrap the first node, you run one command, cephadm bootstrap, and you specify your monitor IP. Then you have a working Ceph monitor and a Ceph manager. And then, assuming you've distributed your SSH key pair to the other hosts that will be part of your Ceph cluster, it can add those other hosts if they are listed in a spec file. So I can do a ceph orch apply and reference a spec file. So those are the three things that happen. We wrote Ansible in TripleO so it does all this for you when you use the new OpenStack client command with the tripleoclient plugin and the tools we wrote for Ceph. It takes care of that for you: it knows the IP because it set up the network, it distributes the SSH key pairs, and it generates the spec based on your existing deployment definition. So if you're using TripleO and you're used to maintaining things in the existing YAML files from previous versions, those still work; we do a translation, so we have backwards compatibility, but it ends up in a spec file in the new format, which you can look at. So we apply all of this for you. Francesco is going to go into some more detail on the interface.

Yeah, thank you. Thank you, John, for the great presentation. So at this point, you can have deployed Ceph first, or you can just describe your cluster and push the button.
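Here is a hedged sketch of the underlying cephadm flow John just described, plus the TripleO command that wraps it in the Wallaby deployed-Ceph workflow. The IP address and hostnames are illustrative assumptions, and the exact TripleO flags may vary by release; check the deployed-Ceph documentation for your version.

```bash
# What cephadm does under the hood (IP and hostnames are illustrative):
cephadm bootstrap --mon-ip 192.0.2.10   # tiny cluster: one mon + one mgr

# With SSH keys distributed, grow the cluster from a service spec:
ceph orch host add ceph-1
ceph orch host add ceph-2
ceph orch apply -i ceph_spec.yaml       # mons, mgrs, OSDs per the spec

# The TripleO wrapper that drives all of the above for you
# (Wallaby deployed-Ceph workflow; exact flags may vary by release):
openstack overcloud ceph deploy deployed_metal.yaml \
    -o deployed_ceph.yaml
```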
This work is available from Wallaby, and we basically maintain the same tripleo-heat-templates interface, passing the new environment file and the other overrides where your Ceph cluster is described. And then, after the interface is provided, there are some tripleo-ansible roles and modules that do basically four steps. They prepare the context; the user SSH keys are created and distributed all over the overcloud; and then, instead of triggering ceph-ansible, which is the old tool, we trigger the new cephadm playbook. And at the end of this process, the client configuration is done by the new, quick tripleo-ansible role. The thing I want to highlight here is that from Wallaby, RGW is now deployed by default, which is a big change from the past — unless you want to exclude it from the first deployment and deploy just the first minimal cluster made of monitors, managers, and OSDs.

The other thing is that the tripleo-heat-templates interface is backward compatible, so we're maintaining the options that were there before and are defined via overrides. You can customize CRUSH rules and create different device classes according to the Ceph cluster and the hardware where it's deployed. You can customize Ceph pools, and they can be tied to these CRUSH rules, so you can define a different CRUSH map with a different hierarchy according to device classes that can be faster or slower. And then you have Ceph config overrides, which means that at deployment time you can pass extra settings to Ceph that are assimilated into ceph.conf, or translated into manager options.

Okay, the other two things I want to highlight at this point: the TripleO integration is not just deploying the first minimal cluster and then adding some daemons — we also have TLS everywhere and high availability. These are two key points for a TripleO deployment, because having services behind TLS and in a highly available way is one of the key points of TripleO. Ceph at this point provides native HA for NFS and for the dashboard, and for RGW you can define an ingress — the Kubernetes way, let's say. And everything can be deployed behind HAProxy and keepalived for the VIPs. But at the moment we have some things to solve at the TripleO level, which is basically the key and certificate generation and requests: we would have multiple HAProxy instances, and the co-location of Pacemaker and keepalived is problematic. So at this point TripleO does two steps. The first one is generating keys and certs using a new Ansible role and executing this role against the services that should go under TLS. And then, at step two, HAProxy is configured at the OpenStack level, and Pacemaker is used for the virtual IPs. Then this information is passed to cephadm, which is able to deploy and assimilate the certificates generated in the previous step. There is also a link with an example of that. As John said before, we have a spec where we can pass some information, and RGW is just an example of that. Next slide.

Okay, this is one of the other challenges we're trying to solve within the TripleO project, which is upgrades. When you upgrade your OpenStack environment, we can do the same for Ceph. And we used to do that using ceph-ansible, which was the official deployment tool and the most powerful one. And it happened in different phases.
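Before the upgrade phases, here is a hedged sketch of the kind of override environment file Francesco is describing. The parameter names follow tripleo-heat-templates conventions, but the pool names, CRUSH rule, and config values are illustrative assumptions, and formats vary by release — treat this as a sketch, not a definitive interface.

```bash
# Hedged sketch of a TripleO Ceph override environment file.
# Pool names, CRUSH rule, and values are illustrative assumptions;
# see the tripleo-heat-templates docs for your release.
cat > ceph_overrides.yaml <<'EOF'
parameter_defaults:
  CephPools:
    - name: fastvolumes
      rule_name: fast_rule       # tie the pool to a CRUSH rule
      application: rbd
  CephCrushRules:
    - name: fast_rule
      root: default
      type: host
      class: ssd                 # device-class based hierarchy
  CephConfigOverrides:
    global:
      osd_pool_default_size: 3   # assimilated into ceph.conf
EOF
```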
When we went from Luminous to Nautilus, we had the rolling-update playbook and the FileStore-to-BlueStore conversion, which was a nice feature. From Nautilus to Pacific, which is the target here, we still have the same rolling-update playbook, executed by ceph-ansible. And then we need to have the cluster adopted by the new backend, the cephadm daemon, which means that at the end of this process all the variables that were defined — the state of the cluster that was maintained in TripleO — is now maintained by cephadm, which is a really big change. And from Pacific onwards, the upgrade path is provided by the orchestrator, and it's an asynchronous process. This means that you can start your upgrade by defining the Ceph version that you want, you can monitor the upgrade, or even cancel the upgrade, which means we have a rollback path — and this means that more work can be done on the TripleO side.

And yeah, this brings us to what's next in TripleO. As I said, upgrades are one of the biggest chapters in terms of integration, and it will be an asynchronous process managed by the orchestrator. So the idea is to decouple more from TripleO and make the two processes separate, so that operators can work on starting the upgrade, monitoring the upgrade, and rolling back the upgrade if any problems occur in the Ceph cluster. And deployed Ceph, which John mentioned, at the moment just isolates the Ceph deployment in a different phase from the overcloud deployment, and it's just a minimal cluster made of monitors, managers, and OSDs. The idea is to add more services to this step and reduce the time spent during the overcloud deployment phase: your Ceph cluster is described, you push the button, and TripleO makes it so. So yeah, thank you. That's it.

Yeah, I had no idea that Ceph was so integrated with TripleO. So you kind of touched on this a little bit, but with Xena coming out next week and the start of Yoga development now, are there plans for continued improvements to the Ceph and TripleO integration — things actively being developed and discussed right now?

Yeah, definitely. There are a lot of topics we are going to discuss. For instance, we're collaborating with the Manila team to support NFS managed by cephadm, which is a big topic because we need to sort out some backwards compatibility and move forward with new features. So this is something we are going to discuss at the PTG for sure.

Awesome, cool. Well, I'll talk a little bit about the PTG closer to the end of the show, but we did have one question from the audience that maybe you both can talk about a little bit: how do you divide resources between Ceph and OpenStack in hyperconverged deployments? What rules do you use for it?

Yeah, we definitely have that in TripleO. The more you know about your workload, the better you can isolate the resources. The general philosophy is to split on the NUMA node in a hyperconverged deployment, so that you have one NUMA node for your Ceph deployment. On average, you want to allow about five gigabytes of RAM per OSD, and you might have multiple OSDs per device: if you have faster devices — if you're using NVMes — you might want to make four OSDs on each of them to get more performance out of them, and then you would multiply accordingly. It's a big counting game, right?
So a certain amount of CPU, like one or two CPUs per OSD, then five gigs of RAM per OSD, and then you have a certain amount of room left over for your Nova compute workload. And depending on the needs of that Nova compute workload, you can also schedule it with CPU pinning. We have done testing where we're doing network function virtualization on hyperconverged nodes, so that the Ceph processes don't preempt that low-latency workload, which we need. So it's definitely possible; it's just a matter of getting the system tuning right. We have that documented, and we have flags in TripleO so that you can deploy with that isolation out of the gate.

Awesome. Hopefully we can get some links to those resources and share them after the episode. Yeah. Thank you very much. So now that we have a basic understanding of Ceph and how it fits into OpenStack — both the integration with Manila and Cinder and with TripleO — let's dive into some of the use cases and real-world companies running these things together. First we have Luan Arun from Société Générale to talk about their deployment of OpenStack and Ceph together. Oh, I think you're muted. You are correct. That's better.

Thank you, Kendall. So, hello, I'm Luan Arun from Société Générale, and we're going to go over our experience with OpenStack and Ceph so far. Just a quick word about Société Générale: Société Générale is one of Europe's leading financial services groups. It has been part of the economy for about 150 years. It supports 30 million clients every day, with 133,000 staff in 61 countries. And as you can see on the screen, we have 127 data centers worldwide, mainly in Europe, of course, because that's where Société Générale started, but it has expanded into the US, Asia, and Africa.

So let's talk about the origin of our OpenStack and where it comes from. The OpenStack platform at Société Générale started in 2018, first on the Queens release. At the beginning we had only two regions available, one in Paris and one in the north, both in France, of course. We had just 20 hypervisors, and we used Kolla and Kolla-Ansible to deploy it, because it was very easy to use and also easy to integrate Ceph with Kolla-Ansible — and since we wanted Ceph as the backend, of course, it was a good tool to use. And today — it has been three years, almost exactly three years — we now have four regions: we still have Paris and the north in France, and we also have Hong Kong in Asia, and in the US we are in New York. And we now also have multiple availability zones: two in Paris, two in Hong Kong, and two in the US. On the Ceph side, we have one Ceph cluster per availability zone, which means that today we have a total of seven Ceph clusters. We have a worldwide total of 350 hypervisors, currently around 12,000 running VMs, and that represents around 43,000 vCPUs consumed and 210 terabytes of RAM consumed.

On the Ceph side: when we started with Ceph, we used ceph-ansible, of course. At that time it was the most useful tool — cephadm didn't exist — so it was the best tool to use. We were on the Mimic release. And one particularity we have at Société Générale is that we split the organization per service, and our team was more focused on the compute side. So for the backend, we use Ceph only for instances and images — only for VM disks and Glance images. As for the volume offering, because we decided we were more focused on the compute side, there is a team dedicated to the volume services.
And so they maintain Cinder on their side, and for the backend they decided to use Pure Storage instead of Ceph. The reason is that they were more confident using that product, having a vendor behind it for support, and having a built-in solution covering both the architecture and the hardware.

We also used the cache tiering feature. The reason is that when we started in 2018, we had hyperconverged infrastructure, and the hardware we ordered at that time was similar to hardware we were used to ordering for another private cloud at Société Générale running on VMware. So on this hardware we had a mix of SSDs and HDDs, and at the start, when we configured Ceph, the CRUSH rule — sorry — that we configured did not specify the device class of the disks, but targeted all the disks at the same time. That meant the images and the instances were running on SSD and HDD at the same time, so performance was kind of random, depending on your luck — whether your image or VM landed on PGs that were only on SSD or on HDD. So that wasn't really a perfect solution. To be certain that all clients get the same performance, and to use the SSDs more efficiently, we decided to use the cache tiering feature.

And the balancer module — I think everybody who runs Ceph in production knows that, by default, the balance of data between the OSDs is not perfect, and the balancer module is there to help you make sure the usage of your disks is optimized. We had some issues with the balancer module: for example, when we added some new OSDs, sometimes they weren't being targeted by the balancer, or the balancer didn't take them into account and didn't try to rebalance data onto them. So we sometimes had to work on the CRUSH map directly. But it's getting better: we previously used the crush-compat mode, and we moved to upmap, and it's a lot better now.

A quick word about the scale of Ceph: today we have 60 hypervisors worldwide running Ceph, with 650 OSDs. And worldwide — I didn't put the number of objects, but it's pretty high too, of course — it's about 500 terabytes used out of 1,000 terabytes available. It depends on the region, of course; some regions are more used than others. But we try to always keep some free space, because this is production, and it's always better to have some spare. And finally, we also use Prometheus and Grafana for monitoring purposes.

So, what's next? The first thing we want to do, of course, is upgrade Ceph. Like I said, we started on Mimic in 2018, and unfortunately we are still on the Mimic release. We have had some issues in the past because of that, so that's one reason we want to migrate to Pacific. The other obvious reasons are that it's more stable, performance has improved, and there are very useful new features like cephadm. Also, we want to get rid of cache tiering, because it adds a lot of complexity on the Ceph side, and it's not a feature that is widely used. So in case of issues or questions, it's more complicated to find people who have experience with it. And because Ceph is not the main part of what our team works on, we want to have the simplest Ceph infrastructure we can. So that's another thing we want to get rid of.
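For reference, here is a hedged sketch of the balancer-mode switch described above. These are standard Ceph CLI commands; the min-compat-client step is required before upmap can be enabled, and as always you would verify against the documentation for your Ceph release.

```bash
# Hedged sketch: moving the Ceph balancer from crush-compat to upmap.
# upmap requires all clients to be at least Luminous-aware:
ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap     # switch from crush-compat to upmap
ceph balancer on             # let the manager module rebalance PGs
ceph balancer status         # check plans and progress
```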
And of course, we want to move away from the hyperconverged model for the biggest regions. For example, in Paris, which is the biggest region so far, on one cluster we have around 160 OSDs split across 16 hosts. And because those hosts run OpenStack resources and Ceph resources at the same time, there are conflicts from time to time, and when you need to put a host into maintenance for some hardware modification or upgrade or anything, it's more complicated: you need to take care of migrating the VMs away, remove the host from the scheduler, and then on the Ceph side maybe set some flags so that Ceph doesn't start re-replicating the data when you just want to rebuild the host. So it adds complexity too. That's why, in Paris, the biggest region, we have ordered a lot of new hardware; it will allow us to move to dedicated hypervisors that will only host Ceph, with only SSD disks. At some point we want only SSDs backing Ceph — it would be a lot simpler to manage, and we will have fewer pools and less complexity in the CRUSH map.

And the last part is containerizing Ceph on the hosts. When we used Kolla-Ansible to deploy OpenStack, we of course decided to containerize the components, but for Ceph we didn't make that choice, so right now Ceph is running on the bare metal itself. And since we want to migrate to new infrastructure, it would be nice, at the same time, to containerize Ceph, because the team is more used to working with containers, and from an ops perspective it makes it easier to deploy and run Ceph using containers than running it on bare metal. And that concludes my presentation for Société Générale. Thank you.

That was very, very interesting — learning about all the details of the implementation. I'm glad you touched on upgrades and such as well. So if you're planning on upgrading from Mimic to Pacific for Ceph, what release are you currently running for OpenStack? Is it Ussuri?

Yeah, we are currently on Ussuri, and we plan to migrate to Xena maybe at the start of next year. For Ceph, you can't upgrade directly to Pacific, so we're going to need to run multiple upgrades. And at the moment, we have only performed minor upgrades on Mimic; we never did a big major upgrade. So we still have stuff to learn and to read, but it's in progress. The minor upgrades were quite easy to run, so I expect the others to be as easy as those.

Good luck with all of that — upgrades are always a fun topic to discuss, especially running a cloud at the size that your company does. Yeah. So thank you for all the information. We'll keep moving here. Our next deployment is at VEXXHOST, and here to talk about it today we have Mohammed Naser.

Thanks, Kendall. Hi, everyone. So I think so far it's been a really awesome show with a lot of different perspectives, but I want to start by sharing our perspective and giving a little intro about us. My name's Mohammed, I'm the CEO at VEXXHOST. I'm also a board member of what is now the OpenInfra Foundation — I probably should fix that slide. And I'm also on the OpenStack Technical Committee. I used to be the project team lead for Puppet OpenStack and OpenStack-Ansible, and I'm currently working a lot, hands-on, with OpenStack-Helm. Here's my Twitter for the Twitter folks.
And yeah, so I wanted to talk about our story and experience with Ceph and OpenStack — and we do a little bit of Kubernetes as well. So, what we run: currently all of our deployments are on Octopus at this point. We try to do as good a job as we can of keeping up with updates; we have some specific deployments, but Octopus is, I'd say, the oldest deployment in our fleet. Staying up to date on Ceph is something we find really worthwhile, because a lot of the time you think things are stable, then you run into some weird issue that is only solved in a future release, and so you're having to do an upgrade at a time when you really don't want to do an upgrade. In my experience with Ceph, the stability of the software is really, really good, so your risk of having an issue while doing an upgrade is probably far lower than the risk of some weird problem showing up down the line.

What do we use it for? Our OpenStack control plane for all of our deployments runs inside Kubernetes, and we actually use the Ceph RBD CSI driver that's maintained by the community to provision RBD volumes that are attached as persistent volumes to run our OpenStack control plane — things like RabbitMQ state and MariaDB storage. We also use RBD for virtual machines and for block storage: obviously for Cinder, anytime you create a volume in our public clouds, or in any private clouds that we deploy for customers, it's actually a Ceph volume underneath. And for a few clouds, we offer the Swift and S3-compatible API using RADOS Gateway. That's quite popular, and especially for a private cloud environment, having one storage solution that can cover your block and object storage needs, as well as file share storage needs, is really nice and useful.

So over our years of operating Ceph — which is probably approaching 10 years or so — we've gathered a few things that I wanted to share on what to do and what not to do. I'll start with: use the straightforward code paths. What I mean by that is, there are a lot of little features and interesting things that were maybe recently introduced, or recently added, or things that you could possibly do but that are not as well documented. Those are usually the places where you might start running into edge cases, and that's when you might be in a really bad spot, because now you're working with something that not a lot of people have experience with, or perhaps code that is not as rigorously tested as possible. When I say the straightforward code paths, I mean: simple pools, maybe different device classes if you need them, sticking to three replicas, three monitors, don't put OSDs on the monitors. Whatever is the de facto standard — if you go with that, you're probably going to have a really good experience, and you'll be able to sleep well at night. And when I say tread carefully, it's really important to realize that Ceph is a large distributed system, and every change that you make can have a lot of different effects across the entire system.
So be very mindful in every change that you make, because you could potentially cause problems for the cluster. I remember — this is a really, really old story — the folks at DreamHost, where Ceph first started, had botched the CRUSH map and ended up moving all of the data onto, like, the first rack of their entire system, and then having to rebalance everything again, when they made, I think, a CRUSH rule typo or something like that. It was a long time ago.

I really advise using at least three replicas. Two replicas is very tempting, especially with how reliable we feel NVMes and SSDs are. But the biggest problem with running fewer than three replicas is that if you get unlucky in how power is shut down or something like that, Ceph is going to have a hard time figuring out which copy is the authority, because it's impossible to have a quorum: you can't have two replicas agreeing with each other and one that doesn't, so that it can decide which is the right one. And sadly, the internet is full of horror stories of folks running two replicas who ran into problems when they had a crash or something. It also hinders you a little bit in your maintenance and in any issues that you have. I mentioned staying up to date — going back to that, if you're too worried about doing major upgrades, at least stick to minor upgrades. Ceph is relatively painless to upgrade: as much as it's a stateful storage service, its services are relatively stateless, so it's really just a package update and a restart of a service. And if you've got the rest of your infrastructure well designed, it should be pretty easy to do these rolling restarts.

Now, the things I want to ask people to be careful about. Cache tiering: it's one of those things that looks great on paper, but you really need to understand your workload, and you really need to understand the consequences of using cache tiering. It's not a straightforward code path; it's not something that people commonly use, so you might run into weird issues that can really rattle your deployment and get you worried. Erasure coding is the same thing — I feel like it's also not as commonly used. I've always been worried about it because we've seen some edge cases just reading the community's experiences. And what I find is: storage hardware has become pretty cheap, relatively speaking, and when you're running Ceph, which is an open source toolkit, you're saving a lot on appliances and software licenses and all that other stuff, which you can invest into reliable hardware — and just go for normal replication.

Be careful about hyperconverged. Hyperconverged is something that we do offer and have run in the past, but it really is a very tricky thing to keep running and to do right. You get questions like: what if, down the line, you don't need more compute resources, but you need more storage resources? But you've only allocated 40 or 50 or 60 gigs of memory on that compute node for Ceph. Do you add more disks? If you add more disks, you're going to start eating into your compute capacity. So you start to have that linking of compute capacity and storage capacity, as well as the noisy-neighbor issues that we all know about — and it's even worse when you have busy Ceph OSDs running there. So it's quite tricky.
You've got contention for network, contention for memory, contention for CPU resources. There's just a lot that you really need to get right to make it work. And overall, I would say the cost savings are probably trivial at the end of the day. Some of the Red Hat folks mentioned NFV — so unless you're working in an edge environment where space is very expensive, the one-time cost of purchasing a few more servers to run dedicated Ceph is going to significantly outweigh the advantages of going hyperconverged.

RAID controllers are another tricky thing. If you're going to run them, put battery backup units on them. Ceph assumes that when it writes to disk, things are written to disk for real, and RAID controllers can lie about that. So if you have a RAID controller with a bad battery, or no battery, that's doing write-back caching, you're probably in for a bad time if power goes out or something like that.

Finally, don't over-complicate and over-orchestrate your Ceph deployment. For example, OpenStack-Helm allows you to deploy Ceph inside Kubernetes. I am terrified of that. I think Ceph needs to be in as stable and as calm an environment as it can be. It's also very easy to deploy, so there's not really a need to implement a lot of orchestration tooling around it. Keep it simple: simple binaries, simple packages, get it up and running. And then, finally: don't cheap out on hardware. I know we all see Ceph as a great thing to run on white-box hardware, where you don't have to buy expensive appliances from vendors and whatnot. But take some of that difference in cost and put it towards really reliable hardware, because one of the things Ceph is great at is doing what it does; one of the things it can struggle with is when the underlying infrastructure is problematic. If your network is weirdly dropping things, if you're getting weird disconnects across the network, if your systems are running out of resources — those are the things that can introduce edge cases. And you really don't want to have issues with Ceph, because it's probably going to be at the heart of your entire cloud deployment, and then a Ceph issue turns into a widespread outage, and nobody wants to deal with that. So those are some of the things I wanted to share about what to do and what not to do. And I guess I'll hand it over to Kendall.

Yeah, so you talked a lot about things to consider when deploying your clusters and getting your cloud set up. And you definitely have a lot of experience with all the different deployment tools, having been PTL of two of the different projects and being involved in OpenStack-Helm now. So which one would you recommend, I guess, for first-time deployers?

So I think they're all pretty good. The only one that I would rule out — and I think it's a great project, I don't think it's not good — is Puppet OpenStack, which is designed as purely simple modules: you kind of have to go and glue them all together yourself. OpenStack-Ansible and OpenStack-Helm are a bit more integrated, where you can just go ahead and deploy, and TripleO as well. If you want to get something on a single node, I would look at Packstack, which is an interesting option provided by the RDO community — that'll get you something inside a CentOS system very easily. And DevStack is also an option.
Those are, in my opinion, the best single-server tooling, and for anything that goes a little bit beyond that, I would suggest seeing what works best for you between TripleO, OpenStack-Helm, and OpenStack-Ansible. Those are all very solid, community-based options. Sorry — and Kolla-Ansible. That's one I haven't interacted with that much, but I know a lot of people who deploy using it.

Yeah, I have definitely heard of a lot of good work being done on Kolla-Ansible, especially in the last couple of releases. It's definitely been one of the faster-growing deployment tool options in popularity. So, if we could bring everybody back on — we have a lot of questions from the audience, but unfortunately we're running out of time, so we'll try to get through a couple of them quickly. Earlier in the show, there was one from Danny on YouTube: why Ganesha into OpenShift, as opposed to, say, RBD or straight CephFS? I think, Tom, you're probably best poised to answer that one since you were talking about it, but if anyone else has an opinion, feel free.

So for RBD: basically, RBD is block and Ganesha provides file, so they're very different use cases. We briefly talked about the fact that when you write from multiple nodes — I was talking about it in the Kubernetes context, but it also applies just to VMs — if you have a node-local file system on your block device, it's not going to work out well. You need a shared file system or a clustered file system. So that's the reason for a file system, and Ganesha is one example of a file system gateway that works there. The other case in the question was native CephFS versus Ganesha. There it all comes down to whether your users, your tenants, are trusted. In public clouds, and in some large enterprises, in many cases your general users need to be able to use the storage but are not trusted to access the cloud infrastructure. With native CephFS, you run a CephFS client in a VM that a guest owns, and there's cooperation required between client and server. If the user isn't keeping the software up to date against CVEs, you might get in trouble; quota enforcement depends on clients, et cetera. So you really only want to use native CephFS if, for example, you're a cloud administrator deploying your own public Kubernetes service on your own VMs — then yeah, use native CephFS; you are trusted, you're yourself, that's clear. But at the other extreme — a public cloud where somebody gave you a credit card and you don't know whether they're running Windows or BSD or an old Linux of some sort or whatever — you don't know what kind of CephFS client is running there, you don't know if the kernel's been hacked, et cetera. So an NFS gateway is one way to protect against that. And I'll put in a plug for the PTG: we're also working on virtiofs as another way to protect against untrusted VMs and keep them from direct access to the Ceph public network.

Yeah, security is always a thing to keep in mind when making decisions on how you're going to set something up and deploy it. So we have another question from YouTube — YouTube's a popular place today — from Vlad: any tips to lower latency on all-flash Ceph for read operations? What are the main things to look at?

We run NVMes across all of our stuff; we recently finished switching everything over.
One of the things that I would suggest looking at, if you have the latest release of Ceph, is the recently introduced feature of a read cache that runs on the hypervisor itself. Whenever you're doing reads, it can actually cache the data locally, so the next time you do the read, it's reading from the local system. You can designate a drive on that same local system to cache reads, so that reads come back faster and don't have to go across the network. That's going to help reduce latency for reads — that's one of the things I might throw out as an idea. How's that? Any better?

I don't know if you're using multiple OSDs per NVMe — there's a nice graph showing how you can get more and more performance out of your NVMe by putting multiple OSDs on it. And of course, there's BlueStore, and then SeaStore is coming, which will use the NVMes even more efficiently.

Cool. Well, one last question before we wrap up here — there are so many good ones from the audience, I really wish we could get to them all — but generally: why are OpenStack and Ceph such a good combo? I'm sure everybody could come up with an opinion on this.

Well — sorry, I get excited about it. I mean, it's kind of like: OpenStack comes onto the hardware, builds your cloud, and you're like, okay, cool, the virtualization is there, but what about the storage? Well, the storage tools in OpenStack all provide interfaces to get to your storage, but what do you actually want to use to provide the storage? What is the commodity-hardware solution to that? And that's Ceph. That's where it fits so naturally: it's the open source way to take commodity hardware and make a great storage cluster out of it.

Yeah, I agree a lot with that as well, because when you think of all the other options, I think the biggest thing that draws people to Ceph is the availability features. I mean, if you use something like LVM through Cinder — great, it does the job, but if that node goes down, everything goes out. And Ceph is a really, really intelligent system that does all of that for you. As well, it's probably the most popular and, like I said, one of the most-used, most-tested code bases. Most people running block storage with Cinder are running it on Ceph, so you're probably going to have a really good experience using it — there's a reason everybody does.

Totally. And the integration on the tooling side, with Kolla-Ansible and everything else, is quite well done also. It's really easy to integrate Ceph with OpenStack because it's all open source, and the other open source tools like Kolla-Ansible are all well integrated together. So it makes a great combination to use.

Open source communities coming together is a beautiful thing — building open infrastructure, project after project, all of them supported by global communities. It's just my favorite part of my job. So, we've covered a ton of ground today, everything from the basics of Ceph and how it integrates with OpenStack, to large deployments globally with Société Générale and VEXXHOST as well. So thank you all for joining us. I want to give a huge thanks to our awesome speakers — we really, really appreciate you taking the time to be here with me today.
So a couple of things before we wrap up this week. If you're interested in getting involved or following the development of these integrations further, you can go directly to the Ceph documentation. We also have the formerly OpenStack, now Open Infrastructure Project Teams Gathering (PTG) happening October 18th through the 22nd, where we will have a lot of teams talking about these topics — Manila, Swift, TripleO, and also Cinder will all be meeting throughout the week. And the OpenStack Technical Committee is also coordinating time with the Kubernetes Steering Committee to talk about how our communities overlap and integrate. A quick shout-out to all of our member companies, several of whom were represented here by speakers today, for making OpenInfra Live possible. If you're interested in joining the OpenInfra Foundation as a member, you can learn more at openinfra.dev/join. If you haven't already heard about the OpenInfra Live: Keynotes, those will be coming up November 17th and 18th. Join us for a two-day special edition of this show — not this one specifically on Ceph, but OpenInfra Live generally. It'll be the best opportunity for the global community to get together this year and hear about all things related to open infrastructure. Registration is now live, so you can go sign up today and join us for those keynotes. Also, remember: if you have an idea for a future episode, we would love to hear from you. Go ahead and submit your ideas at ideas.openinfra.live. And lastly, we really hope you're able to join us next Thursday for an episode covering what will be our 24th on-time release of OpenStack, the Xena release. Join us to hear all about what's coming and hear from the developers who made it happen, right here next Thursday at 14:00 UTC. Thank you again for all of the wonderful input and information from today's guests, and we'll see you next week on OpenInfra Live.