Hi there. Hello to everyone listening and watching on the internet. My name's Jose, this is Ashik, and we're here to talk about hyperconverged persistent storage for containers with GlusterFS. I just came down from giving a five-minute version of this presentation a couple of hours ago, so I'm still a little jazzed from that; bear with me if I'm talking a little fast.

So, who are we? I'm Jose. I work for Red Hat. I'm a member of the Kubernetes Storage SIG and still a member of the Samba team — they haven't kicked me out yet. I work mostly on integration work with container platforms, distributed storage systems, network protocols, and things like that. Most recently I helped develop a tool for deploying GlusterFS in the hyperconverged scenario on Kubernetes, and we'll be demonstrating that later on in this presentation.

Hi, I am Ashik. I work mostly on Gluster containerization and the integration with Kubernetes and OpenShift, so I've contributed to all the components that are required for that integration. And that's about it.

All right. In this presentation we're going to go over, obviously, why we're doing this and what problem we're even trying to solve. Then we'll go over what it took to solve it, and that's going to be a fair amount of technical detail about how we created hyperconvergence this way. And then we're going to have, unfortunately, not a live demo — because, you know, networking and conferences — so I recorded a video that we're going to talk over instead.

Why are we doing this? Basically, we found a problem that we could solve effectively. Containers being ephemeral in nature, or wanting to be ephemeral in nature, conflicts with a lot of traditional and even some modern applications that just cannot be ephemeral and stateless. They need storage, and they need that storage to persist across service outages or restarts of the service. Doing persistent storage, especially in Kubernetes, usually involves investment in some sort of external infrastructure, whether it's an on-site storage solution like a NAS device, or a cloud-backed storage solution. Either way it requires a fair amount of investment: one big piece of hardware that you then have to service, or some bit of storage that you have to rent month after month. We wanted to do something that cuts that cost down as much as possible, while at the same time being relatively transparent and easy to use, both for the administrators providing the storage and for the users wanting to make use of it. And while we're at it, let's make it free and open source with the support of a community, because we're Red Hat.

All right, so our target platforms are Kubernetes and OpenShift, because Red Hat, and the technologies we're using to solve this problem are GlusterFS and Heketi. Those are their cute little logos up there with the URLs. You can catch these slides online right now, actually — I uploaded them yesterday.

All right, so GlusterFS is a distributed, software-defined file system. It creates what are called bricks, which are just what most people know as logical volumes on storage devices: you take a storage device, you put partitions on it, and those partitions usually become the bricks.
Those bricks are then put together into Gluster volumes, which can be accessed from any node in the Gluster cluster. So if you have a client that's trying to access your Gluster cluster, you can go to any one of the server nodes, and they will all have access to the volumes they know about. Some cool things about this: Gluster was designed to run on commodity hardware — there's a link up there to a blog post showing it running on a Raspberry Pi. It has a scale-out design, meaning it's easy to expand your storage just by adding more nodes. Obviously I don't need to talk too much about that here. And it provides useful features like cross-node replication, usage rebalancing, and iSCSI storage access.

Now, the part that most people probably won't know much about: Heketi. Heketi is the RESTful volume management interface for GlusterFS. It gives you a consistent, programmatic interface for performing most of the common Gluster volume management tasks — creating volumes, deleting volumes, expanding the size of volumes, and so on. It can manage multiple clusters from a single instance — multiple Gluster clusters specifically — and it's fairly lightweight, reliable, and simple by design.

When you put it all together, it looks something like this: you have your Kubernetes or OpenShift cluster running various pods, one of which is your Heketi pod for the RESTful API. If it goes down, Kubernetes will just move it around or spin up a new one somewhere in the cluster. Then you have various Gluster pods that are all logically joined together into one Gluster cluster on top of the Kubernetes cluster. Some nodes, as you can see there, have storage attached to them. It doesn't have to be a perfect mirror across the topology; it can vary — some nodes can have three disks, some can have two. You put a Gluster pod on whatever node has storage, and now you can access that storage via Gluster.

You can find the work we put in to glue all this together in the gluster-kubernetes project on GitHub; the URL is up there. It documents how we put all this together and how you can put it together in your own setups, provides an easy-to-use deployment tool — that thing I mentioned I worked on earlier — and has a quick-start guide for those who want to start playing with it right away, in VMs or on bare metal, whatever you choose.

All right, so that's the setup of what we're doing and what we decided to do about it. Now, here's how it all happened.

Hi. So, when we started hyperconverging the Gluster pods on Kubernetes or OpenShift, the first task for us was to containerize Gluster. Gluster was mostly system software: it's user-space software, but it was tied to the nodes, because of the devices we need to access to create the volumes. This is how the Docker command looks if you run it — it's pretty big — and that is the Kubernetes YAML file we have for it.

Now I'm going to talk about the issues we faced when we containerized Gluster. Gluster runs more than one process: it has its brick processes, which glusterd spawns separately, and it has its own NFS servers running. These processes need to run alongside glusterd.
So we needed a container that can hold more than one process, and we moved to systemd, with Dan Walsh's help. Because we were running a systemd container, we needed privileges. Then there's a startup script, which I'll explain a little later, and the bind mounts for the Gluster configuration. That configuration shouldn't move around along with the containers: when Gluster goes down and comes back up, it should come up on the same node with the same configuration. So we bind-mount it from the host, the devices are also bind-mounted from the host, and we prefer host networking for better performance.

So, the systemd container. We needed something to manage all of the Gluster processes we run and to clean them up, and we needed that extra support because a containerized Gluster runs more than just the one process. So we needed systemd, and privileges. Actually, we don't need privileges for systemd to run in a container anymore, because of the OCI systemd hooks; but we still need a privileged container for accessing the devices and creating the LVs from inside the container — we create logical volumes from the Gluster container for Gluster to use.

The startup script is just an initial script that does all the things an RPM installation does before running Gluster. In the case of an upgrade, this script takes care of the versioning steps for Gluster that an RPM install or upgrade would normally do. We needed a placeholder for those things, so we have an initial setup script that does them in the container.

And these are the configurations we need to persist from the host. /var/lib/glusterd is required to manage all the Gluster volume configuration; that's the working directory of Gluster, where all of Gluster's state is stored. /var/log/glusterfs is just the logs. And /etc/glusterfs holds the configuration for glusterd, which is the management daemon of Gluster.

Bind-mounting devices: we had to bind-mount /dev inside the container. The initial plan was to create the LVs on the host, hand them to the container, and use them there, so we could get rid of /dev. But that was really hard to scale, because when you want to create one more volume — another LV — it's painful to create it on the host and bind-mount it in again; you have to bring the container down and back up on the node. So we decided to put /dev inside the container. We had a lot of issues with udev, but they're solved now, so it works completely fine with /dev bind-mounted inside the container, and it does not mess up your host devices.

Host network: we could have kept the same hostname for the container and still used Gluster, but that would be one more network hop. Instead, we use the host network, since we're not moving the Gluster pods around anyway. So we basically use host networking for the Gluster containers, and it gives better performance, since all the storage traffic goes over the network anyway.
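To make that concrete, here is a trimmed-down sketch of what such a Gluster pod spec might look like. This is illustrative only — the actual gluster-kubernetes templates are longer, and the image name, labels, and node selector here are assumptions, not the exact values from the project:

```yaml
# Sketch only: a stripped-down GlusterFS DaemonSet along the lines described above.
# Image name, labels, and node selector are illustrative assumptions.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: glusterfs
spec:
  selector:
    matchLabels:
      app: glusterfs
  template:
    metadata:
      labels:
        app: glusterfs
    spec:
      nodeSelector:
        storagenode: glusterfs          # only run on nodes that actually have storage
      hostNetwork: true                 # host networking: no extra network hop
      containers:
      - name: glusterfs
        image: gluster/gluster-centos   # a systemd-based Gluster image (assumed name)
        securityContext:
          privileged: true              # needed to access devices and create LVs
        volumeMounts:
        - name: glusterfs-etc
          mountPath: /etc/glusterfs     # glusterd configuration
        - name: glusterfs-lib
          mountPath: /var/lib/glusterd  # volume/cluster state; must persist per node
        - name: glusterfs-logs
          mountPath: /var/log/glusterfs
        - name: dev
          mountPath: /dev               # raw devices for LVM
      volumes:
      - name: glusterfs-etc
        hostPath: { path: /etc/glusterfs }
      - name: glusterfs-lib
        hostPath: { path: /var/lib/glusterd }
      - name: glusterfs-logs
        hostPath: { path: /var/log/glusterfs }
      - name: dev
        hostPath: { path: /dev }
```

The gluster-kubernetes repository linked earlier carries the real templates, including the systemd-related settings and the startup script wiring.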
Then, containerizing Heketi. Containerizing Heketi was not that tough, but before getting to that, let me explain what Heketi does. Say there are three nodes in the cluster, you install Gluster on all three, and there are three devices on each. You hand those devices over to Heketi. When Heketi starts with this topology file, it goes and peer-probes all of those nodes — that's what we call, in Gluster terms, creating a trusted storage pool: you peer-probe from one node to another and form a pool. Then Heketi goes to those devices and creates the PVs and VGs required for them. And when you send Heketi a request for a volume, it goes to the Gluster nodes. It has a ring algorithm inside that decides where the bricks will land. It takes the devices and creates logical volumes from each one. Say you want a replica-three volume: it goes to three nodes, creates a brick on each, and creates a volume out of them, so that if one node goes down, the volume is still being served. That's why we mostly use — and prefer — replica three in this solution. Heketi does this volume creation for you; you don't have to worry about the Gluster commands at all. It's really easy with Heketi doing it.

Heketi has its own database to store this configuration — these are the nodes, these are the devices Gluster can make use of. It stores that in the DB, which was a real problem for us, because the DB would go away when the Heketi pod goes down. So what we thought of doing was creating a Gluster volume for it and putting the DB inside that volume, and when Heketi comes up, that's what we give it — a persistent volume — and that's what it uses. Heketi also used to work over SSH; now we've moved to kube exec, which needs a few secrets in the Heketi pod, and we also need to create a service account that Heketi uses to access the Gluster pods. That's all about Heketi containerization, deployment, and usage.

So, persistent storage. Coming to Kubernetes and OpenShift, they have a lot of volume plugins inside. If you don't know, volume plugins are just the way different kinds of storage providers make their volumes usable. In the case of a Gluster volume, there's a GlusterFS volume plugin inside Kubernetes. If you want to use a Gluster volume inside your pod, you mention the endpoint — the IP of the node — and the volume name in your volume section. What the Kubernetes volume plugin does internally is mount the GlusterFS volume on the host and then bind-mount that mount point into the container, wherever you specified. That's how our volume plugin works in the Kubernetes world.

And there are two ways to provision volumes. One is static provisioning. You request a volume in Kubernetes through a persistent volume claim, and admins create persistent volumes, which are backed by some network storage provider — it can be Amazon EBS, it can be Gluster, it can be anything. So for static provisioning, the admin has to go in, create the volume, and give it to the users in Kubernetes as a persistent volume. When a user asks with a persistent volume claim, if there's a persistent volume that satisfies that claim, it gets bound; if there isn't, you have to ask the admin, and he creates one and gives it to you. That's static provisioning.
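As a rough illustration of that static path, an admin-created PersistentVolume backed by an existing Gluster volume might look something like this — the endpoints name, volume name, and size are made up for the example:

```yaml
# Sketch: a statically provisioned PersistentVolume backed by an existing
# Gluster volume. "glusterfs-cluster" is an Endpoints object listing the
# Gluster node IPs, and "myvol1" is an assumed Gluster volume name.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gluster-pv-example
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteMany
  glusterfs:
    endpoints: glusterfs-cluster   # Endpoints with the Gluster server IPs
    path: myvol1                   # name of the Gluster volume to mount
    readOnly: false
```

A user's PersistentVolumeClaim with a matching size and access mode would then bind to a PV like this one.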
Dynamic provisioning is when you have a storage class defined in your Kubernetes, saying: this is the storage pool, and if someone requests storage, go and create it for them. That's what a storage class does. So now, when you create a persistent volume claim, you specify the storage class; Kubernetes goes to the network storage, creates a volume, and binds it to the persistent volume claim. That's what dynamic provisioning means, and that's what we do with Gluster in Kubernetes. When a persistent volume claim comes in with a request for a volume from Gluster, the admin doesn't have to do anything: it just goes and creates a volume from the Gluster pool, creates a PV for it, and the PVC is bound automatically to that PV.

So: persistent volume claim, persistent volume, and storage class. Dynamic provisioning is what I was just explaining. It has a storage class, and a storage class is a way to define your backend storage — this is your storage, this is your URL, and these are the options to use when creating a volume; that's mostly what you put in the storage class. I'll show you what the Gluster storage class looks like. A persistent volume claim is nothing but a user request for a volume, and a persistent volume is the actual volume backed by the network storage; the PVC is bound to a PV based on the volume size and the access modes. So this is how it works: with dynamic provisioning, you get a claim, it points to a storage class, Kubernetes goes to the backend storage, creates a persistent volume for you, and attaches it to the persistent volume claim.

As you can see there, the name of the storage class is "gluster", the provisioner is the GlusterFS provisioner, the endpoint is where you have all the Gluster IPs listed for using those volumes, and the rest URL is the URL where Heketi is running. All the requests from this storage class go to Heketi, and Heketi creates a volume for you. The rest user and the rest user key are also for Heketi: the user name for Heketi, and the key that user needs in order to access Heketi and create volumes. To use a Gluster volume inside a container, these two things are important: the endpoints and the service. The endpoints define where the Gluster volume is — say you've created a replica-three volume, so it's served from three nodes; we specify all three IPs in the endpoints file — and the service is used to keep those endpoints reachable.

And these are the options you can specify in the storage class. As Jose already said, Heketi can manage more than one Gluster cluster. Let's say you have two Gluster clusters, one with faster storage — SSDs — and one with slower storage devices. You can mention those cluster IDs here; the IDs come from Heketi. So you can create one storage class that creates volumes from the faster Gluster cluster, and another storage class for the slower one, and only the cluster whose ID you mention is used to create those volumes. The user name is again the same Heketi user name, and for security we have the GID options — options you can use if you want to secure the contents of a persistent volume.
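Putting those parameters together, a storage class for Heketi-managed Gluster might look roughly like this. The URL, secret, cluster ID, and GID range are placeholders, not values from the talk:

```yaml
# Sketch: a StorageClass that sends provisioning requests to Heketi.
# resturl, the secret, clusterid, and the GID range are placeholder values.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gluster
provisioner: kubernetes.io/glusterfs           # in-tree GlusterFS dynamic provisioner
parameters:
  resturl: "http://heketi.example.com:8080"    # where the Heketi service is listening
  restuser: "admin"                            # Heketi user allowed to create volumes
  secretNamespace: "default"                   # secret holding that user's key
  secretName: "heketi-secret"
  clusterid: "0123456789abcdef0123456789abcdef"  # optional: pin to one Gluster cluster
  gidMin: "40000"                              # optional: GID range used to secure volumes
  gidMax: "50000"
```

With a class like this in place, any claim that names it will trigger Heketi to carve out a new Gluster volume on demand.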
So, with the GID option, only if your pod runs with the exact GID the volume was given can you use that volume. If the GID you're using in your pod and the GID of the volume don't match, you cannot use the volume, and you cannot see its contents. So the user has a secure way to create a volume and put data inside it, and nobody else who happens to have access to these volumes can read or write that data. So, that's it from me.

Alrighty, thanks Ashik. At this point we've achieved effectively full hyperconvergence: as mentioned, GlusterFS and Heketi are now running in containers within Kubernetes. These are just some reiterations of things we said earlier in the presentation, so yeah, that's about right.

Now is where I would normally be showing you a live demo, but unfortunately I destroyed my demo cluster. So instead I'm going to try to talk over this video I recorded, which hopefully will be visible enough, especially for the people watching at home.

Alrighty, so we're starting with a three-node Kubernetes cluster — one master, three nodes — and we're running nothing other than the Kubernetes services. Now we're going to run our gk-deploy tool with a couple of options and a topology file. I should mention the topology file in question is just a properly formatted JSON file that describes which IP addresses correspond to the servers that are going to be running Gluster, along with a listing of the devices on those servers that Heketi is going to be co-opting for use with GlusterFS.

All right, so at this point we've started deploying Gluster on the nodes we specified; here the containers are spinning up on those nodes. This should take only a couple of seconds, and now we're deploying Heketi. As Ashik mentioned, one of the things we had to do is store Heketi's database somewhere persistent. So what we do is bring up Heketi once and use it to generate a database file, then go through Heketi to create a Gluster volume within the cluster it's managing. Here, in a couple of seconds, you'll see the output from loading the topology file and adding all the devices. Now we're creating a Gluster volume, then copying the contents of our database into the Gluster volume we just created, then killing the "deploy-heketi" pod, as we call it, and spinning up a new Heketi pod that uses that Gluster volume to hold its database. And now Heketi is running, and — let's see, to show you here, come on, there we go — we are running several Gluster pods, one Heketi pod, and an endpoints service.

All right, now I'm going to show you a quick demonstration to, air-quote, prove that we're using persistent storage underneath. Here we have a Gluster storage class — that's an example of what it looks like. Notice that we specify the endpoints and the rest URL in the storage class. This is all being done, by the way, in the guise of an administrator setting things up; this first step would be the administrator setting up the storage class. We kubectl create from the YAML file and it's done.

All right, so now we're moving into user land. We're going to create a PVC — a persistent volume claim — as someone trying to deploy an application on Kubernetes, rather than someone providing Kubernetes.
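The claim in the demo is along these lines — a sketch, with the name and size chosen just for illustration:

```yaml
# Sketch: a user-side PersistentVolumeClaim that asks the "gluster" storage
# class for a dynamically provisioned volume. Name and size are illustrative.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gluster-pvc
spec:
  storageClassName: gluster     # must match the class the administrator created
  accessModes:
  - ReadWriteMany               # Gluster volumes can be shared across pods
  resources:
    requests:
      storage: 5Gi
```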
So, in that persistent volume claim YAML file we have an access mode — how many consumers should be able to access this — and we want a size. That's all fairly standard persistent volume stuff, or just storage stuff in general. The only thing to notice there is that we're using a specific storage class name that corresponds to the storage class our administrator provided for us. So then we kubectl create the Gluster PVC, and it's done.

All right, now we're going to try to roll out an application. For this demonstration I'm just going to use a relatively simple NGINX application: it listens on port 80, it serves on port 80, it uses the nginx-slim container image, and it mounts a Gluster volume to store its main HTML files. So we go to create this resource, and it's underway. We created a service and a pod, and now, if we go look, they're all running. Now I just curl at this URL — or rather that IP address — just to show you that it's running, and sure enough, it's running. Now I have a little test file where I pre-wrote a quick line to insert some data into the index.html of the NGINX server we're now running, and sure enough, if we curl the same address again, we get "hello world from GlusterFS".

Now I'm going to, I believe, delete — yes, I'm going to delete the NGINX service I just created. Deleted. I'm going to delete the pod that I just created. Pod gone — I type so slowly. Okay, and then a get all, to show that it's not running anymore. Then we go back and create the thing again, and it's creating. Note that this time the service has a new IP address, so this is a new service and a new pod, using the same storage. And if we try to curl the old address again, it's not going to work, because the old service is gone and the new one has been given a new IP address. So we grab the new IP address, curl that, and: "hello world from GlusterFS".
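For those following along at home, the application in the demo is roughly the following pod and service — a sketch based on the narration; the image tag, labels, and mount path are assumptions:

```yaml
# Sketch: an NGINX pod serving its HTML root from the Gluster-backed PVC,
# plus a service in front of it. Image, labels, and paths are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod1
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx                        # the demo used an nginx-slim image; plain nginx works for the sketch
    ports:
    - containerPort: 80
    volumeMounts:
    - name: html
      mountPath: /usr/share/nginx/html  # index.html lives on the Gluster volume
  volumes:
  - name: html
    persistentVolumeClaim:
      claimName: gluster-pvc            # the claim created in the previous step
---
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
```

Because the HTML root lives on the Gluster volume, deleting and recreating the pod — as in the video — brings back the same index.html.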
Now, I wish I could show you the real sexy thing, which is where you take down a Gluster node and it's all still running, because, obviously, three-way replication. But trying to do exactly that is what killed my virtual environment, because I use libvirt snapshots and I forgot to undo the snapshot when I was taking down the node. So things got messed up, and I don't have the network bandwidth at this conference to redo my entire VM setup. Apologies for that. Let's see — five, fifteen. All right, that was the demo, and with 15 minutes left, we're done; that's it for the main presentation. You can find us on GitHub, and supposedly me on Twitter, though I don't use that as often as maybe I should. Here we have a couple of URLs to the various projects we've mentioned and are working on. So with that, we're ready to take your questions.

Yes, in the back there, in the gray shirt. I see — the question was: is there a Helm chart? Are we considering or working on a Helm chart to deploy all this, instead of using a deploy script? The answer is yes; that is currently being worked on. We are working on a Helm chart right now.

Okay. So, can we say a little bit about the chicken-and-egg problem — the fact that we're using a Gluster volume to store the database that manages our Gluster volumes? That is indeed a known chicken-and-egg problem. The only reason we're doing that is that we needed something to persist the database across restarts of Heketi that we didn't want to be bound to a particular host. We didn't just want to use local storage on any particular node. So we figured, hey, we're providing the storage — let's just throw the database in there. It's kind of awkward, but it works for now. We're currently exploring other ways to distribute the database, like using etcd as the database store, for instance, or, as a slightly longer-term project, getting rid of the database entirely and just reading the information from the nodes directly. But that requires some modifications on the Gluster side as well.

All right, any other questions? There in the back. All right, we've got one. A scaling preference? Excuse me? No — scheduling. Okay, so the question was: in Kubernetes, is there a scheduling preference when a node goes down and a new one needs to be spun up? Is that right? Okay — if the pod gets restarted and there's already a volume mounted for that pod. I don't know specifically, but from what I've seen, it just gets remounted to the new pod. The volume never gets detached from the persistent volume claim unless the PVC itself is deleted — so there's a slight distinction there between the PVC and the PV. Oh — again? Yes. Ah, like that. Okay, so: is there a scheduling preference for restarting the pod on the same node that already has its volume mounted? I don't know for certain; my observation, just as a Kubernetes user, indicates yes — it will get restarted on the node that already has the storage mounted.

All right, you there? I couldn't tell you — the question was: what are the pros and cons between CephFS and GlusterFS? I have no idea. Officially.

All right, there was something down here. Okay, yes — what? The same way? Same script? Yes, the same script handles both Kubernetes and OpenShift. One thing I don't show in there is that it will automatically detect whether you're running Kubernetes or OpenShift, and if for some reason it detects both, it'll ask whether you want to target Kubernetes or OpenShift. Ideally you shouldn't have both, but there have been some strange configuration cases where it can detect both, so it just lets you choose. But yes, the same script will do both.

Let's go back up there, since he's been waiting. Any non-containerized tools? Okay, so the question was: do you need any non-containerized tools for using the Gluster volumes? Is that right? Okay. Best I know, the only thing you'll need is what's called heketi-cli, the Heketi client. Say that again? And yes, you need the Gluster clients installed on all the nodes — on all the nodes that are running Gluster — and you also need to open a couple of firewall ports that Gluster needs in order to communicate with itself and its other member nodes. The Gluster processes run in containers, but they need access to the raw host system to do most of their actual job: the processes themselves are containerized, but the tasks they perform require access to the actual storage devices. I believe so.

All right, anyone else? This side of the room? Back over here — aha! The question was: can we do storage tiering? Not automatically, but you as an administrator can define this via storage classes. Of course, you have to communicate that information to your users somehow. Usually, in the standard demonstrations, you create a "gold" storage class and a "silver" storage class, and then you just tell your users: all right, gold is super-fast non-volatile memory, silver is SSDs, and then, say, "stone" is, you know, spinning rust.
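Tying that back to the clusterid parameter Ashik mentioned, a gold/silver setup might look roughly like this — two storage classes pointing the same Heketi at different Gluster clusters, with placeholder IDs, URL, and secret:

```yaml
# Sketch: simple tiering with two storage classes. Both talk to the same Heketi,
# but each pins provisioning to a different Gluster cluster ID (placeholders here).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gold                    # e.g. the SSD-backed Gluster cluster
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://heketi.example.com:8080"
  restuser: "admin"
  secretNamespace: "default"
  secretName: "heketi-secret"
  clusterid: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: silver                  # e.g. the spinning-disk Gluster cluster
provisioner: kubernetes.io/glusterfs
parameters:
  resturl: "http://heketi.example.com:8080"
  restuser: "admin"
  secretNamespace: "default"
  secretName: "heketi-secret"
  clusterid: "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"
```

Users then just pick the class name in their claims; nothing else about the workflow changes.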
Anything else? Doesn't seem like it — I'm calling it done. Thank you very much.