Hello everyone and welcome to a brief discussion on persisting your data in an ephemeral ecosystem. My name is Eric Zietlow and I'm just going to be going through a couple of things here today. Who am I? Why should you listen to me? Well, I'm a Data on Kubernetes Community ambassador. I'm the director of developer relations at MayaData, one of the biggest contributors to the OpenEBS project. I'm also an open source software contributor and committer on a couple of other projects as well, including the Apache Cassandra project. I've been a distributed systems solutions architect in a previous life. I've been a network engineer and a software engineer. I've kind of run the gauntlet. So I have a fair bit of experience with different aspects of this, and I was definitely here throughout the rise of Kubernetes as a technology. So why does this matter? Why should we care? Well, the CNCF did a survey back in 2020 that was really, really fascinating. They were basically asking people questions about containers, containerization, and what people are actually using in production. From the time they started the survey back in 2016 to 2020, they saw a 300% uptick: in 2020, 92% of respondents said they had used or were currently using containers in production. That's pretty wild, and it gets even crazier when they asked about stateful workloads, because 55% of respondents said they ran stateful applications in containers in production. Now for some of you, that might be a bit of a mind-blown situation, because containers typically are not used for stateful things. They're ephemeral objects, typically. So having a stateful workload running in a container generally requires some finessing, some extra technology. And today we're going to be taking a look at that, and maybe the right way to do it.
All right, so there are many, many solutions, and there actually is kind of an entire sub-industry of the storage industry that has sprung up: things like StorageOS, Gluster, Longhorn, and of course OpenEBS, which is what we're going to be talking about today. Now, as I mentioned before, OpenEBS is majorly contributed to by MayaData. It's actually one of the reasons I took a job at MayaData. It is one of the better solutions. Interestingly, Longhorn and OpenEBS actually have the same roots in the same project; they're both basically forks of the same codebase. OpenEBS itself is, I'd say, the more open source, community version of that, and it's really grown into its own thing and does a lot of really cool stuff. So that's what we're going to dig into today. Okay, first off, what is an elastic block store? So, EBS. EBS quite simply is just that: elastic block storage, which means it's a block storage device. It's not a file system. It's not anything else. It's just a block storage device. And things like AWS EBS volumes, if you've ever used those, are exactly the same thing: EBS volumes, elastic block storage. OpenEBS is just an open source implementation of the EBS idea. So, pretty simple. If you've ever used AWS, or Azure, which has its own version, same as GCP, pretty much every major cloud provider has some variation on the EBS volume. This is something you can run yourself, though, OpenEBS being an open source implementation. Okay, container attached storage is a term I'm going to be saying a lot, so I figured I should define it. Container attached storage simply means a storage device that is closely associated with, and basically bound to, a container. So in this case, we're binding a storage device, an actual physical piece of storage, to a pod, and we're doing that in a persistent way. We're creating a container attached storage architecture when we do that. Okay, simple Kubernetes; I'm just really quickly going to run through this.
We have masters, we have workers; everyone understands this. The master could be a control plane, it could be a single node, that's fine. Workers, there could be one, there could be very many workers. That's all cool. On our master, we have a couple of different parts. We have our API server; the API server is actually how we, as the operator, communicate with the Kubernetes cluster. We have the controller manager and scheduler, which basically handle all the tasks: actually getting work done is set up and delegated by these two components. We have etcd. etcd is actually kind of the secret sauce, or part of the secret sauce, of Kubernetes. It allows us to reference other Kubernetes components, so networking, other pods, by names, tags, all sorts of things. It allows us to basically abstract away a lot of the complexity we normally have to mess with when we deal with a full microservices architecture, for example. You can actually fold a lot of that into etcd using labels and the names of various things. So it's a very, very powerful tool if used correctly. Okay. Workers have a kubelet, which is basically the Kubernetes agent process running on the node; cAdvisor, which is a local resource management and monitoring tool that keeps track of everything on that specific worker; and then kube-proxy, which is the networking everything when it comes to Kubernetes. So all the communication between nodes, all of the traffic going in and out, say to hit a web page that you're hosting, those sorts of things, kube-proxy is involved in all of that. Then we have our pods. Now, you can have many, many pods on a worker, and pods are basically the containers for containers, if you want to think about it that way. Pods themselves are actually ephemeral. This is where the problem comes in when we start talking about data and persistence and statefulness and all those things.
A pod being ephemeral means that when that pod goes away, whether power goes down, someone shuts it down, or kubectl shuts it down, whatever it had, whatever it was doing, whatever it was storing is gone. It's just gone. And there's no way to get that back. That means we have to start thinking about other solutions when we want to actually store data, like when we're running a database inside Kubernetes. So, simply put, our most simple possible Kubernetes architecture looks something like this. We have our master, we have our workers, all these different components working together to create whatever the application is. Whatever workloads we have, it gets them running, it gets them running cleanly, and hopefully we don't have to manage too much. Now, as I mentioned earlier, kube-proxy helps us talk to the outside world. And as I also mentioned, we don't have data storage inside this Kubernetes cluster. So what are our options? Well, as a Kubernetes-native workload, what we generally, historically, had to do was delegate all of our data outside. Whether that was a database as a service we were hosting alongside our Kubernetes cluster, or maybe S3 buckets (say it's a web page and it's grabbing pictures or something for the website), generally all of that would be hosted outside of Kubernetes. And while that's very cool, it still adds complexity. It'd be really nice if we could manage everything in one place, because we're essentially simplifying our overall architecture when we do so. So what does that mean? Well, we're going to need a way to make pods unique. And to do that, we use this thing called a StatefulSet. Now, StatefulSets are kind of a large topic all unto themselves. So, simply put, just think of it for now as a way to give pods a unique identifier that persists through the pod's lifecycle.
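To make that a bit more concrete, here's a minimal sketch of what a StatefulSet manifest can look like. The names here (`web`, the nginx image, the replica count) are illustrative assumptions, not something from this talk:

```yaml
# Sketch of a minimal StatefulSet. Each replica gets a stable,
# ordinal identity (web-0, web-1) that persists across restarts
# and rescheduling, unlike pods created by a plain Deployment.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web          # headless Service providing stable network identity
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.21
```

Because web-0 keeps being web-0 through a power cycle, anything bound to that identity, such as a volume, can find its pod again.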
So even if we spin a pod down and spin that thing back up, it allows us to spin it back up with the same unique identifier, with any updates we've made to that pod persisting through that power cycling. So basically we're removing the ephemeral nature of the pod. Now, we're not adding any storage yet, but we're giving it something where, if we were to bind something to that pod (which we'll actually do in a moment), that's not going to go away just because the pod disappeared. So it's a really, really key piece: StatefulSets are the first piece of the puzzle when we're solving for persistence in Kubernetes' ephemeral world. All right. Second, we have these things called persistent volumes, and they are associated with persistent volume claims. Or I should say, the pod associates itself with a persistent volume claim, which then gloms on to a persistent volume that satisfies the claim's requirements. Now, this does not just happen automatically in Kubernetes. You actually have to set this up; you have to set up the pathing. There's some complexity that's introduced, actually a lot of complexity, when you do this. And because of that, we're really going to want a system to manage it; otherwise we're basically going to be setting up a spider web of our own management solution, trying to bind volumes to the right places and make sure things stick around the way they're supposed to. So how do we do that? Well, that's where OpenEBS comes in. Because OpenEBS takes all of the complexity of setting up your storage, binding your storage, and maintaining your storage, and puts it all in one nice little basket. Now, you'll notice here we've added this thing called the OpenEBS control plane. We'll get into what it does in just a moment. But essentially, understand right now that it runs on the master, and it changes the way that storage is handled, or rather, it improves the way storage is handled.
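The claim mechanism described above can be sketched as a pair of manifests. The names, size, and image are assumptions for illustration, and `openebs-hostpath` is assumed to be an installed storage class (it's one of the local PV classes OpenEBS can provide):

```yaml
# Sketch of a PersistentVolumeClaim. A pod references this claim by
# name, and Kubernetes binds the claim to a PersistentVolume that
# satisfies its requirements (access mode, capacity, storage class).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data
spec:
  accessModes:
    - ReadWriteOnce         # mountable read-write by a single node
  resources:
    requests:
      storage: 5Gi          # the bound PV must offer at least this much
  storageClassName: openebs-hostpath   # assuming an OpenEBS local PV class
---
# The pod side: mount the volume by referencing the claim, not the PV.
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
    - name: app
      image: nginx:1.21
      volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: demo-data
```

Note the indirection: the pod never names a specific volume, only the claim, which is what lets the storage system pick or provision a suitable PV behind the scenes.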
So what are the different parts? We have the node disk manager (NDM) operator, which is an operator just like any other operator in Kubernetes; if you want to think about it in simple terms, it's kind of like a custom control loop. We have the Maya API server, which is associated with the Kubernetes API server, the thing the operator or the user talks to, using, say, kubectl commands; the Maya API server extends it. And we have the local PV provisioner. So the Maya API server, just like I said, extends the Kubernetes-native API server. So when I'm interacting with OpenEBS, I'm not having to run a bunch of side commands using a different process. I'm actually running kubectl, and I'm running my commands all through there. So I'm setting up my storage classes, I'm setting up my persistent volumes, I'm setting up all those different pieces through kubectl. Really, really powerful. And any other tool you'd use to interact (obviously kubectl is just one) has access to that same API. So we're not having to custom-manage anything; it's all just built in as an extension of the normal API server. Okay, the local PV provisioner is actually really cool. What it does is work with NDM, which we'll talk about in just a second. NDM creates a pool of devices; the local PV provisioner pulls from that pool and creates the consumable resources. So when we use a persistent volume claim to glom onto a device, that device was created by the local PV provisioner, and that device can be all sorts of things. The local PV provisioner basically abstracts that away. So all we see is: I have a storage device, I use the storage device, the storage device is associated with a pod. Done. It makes it really, really simple. Okay, NDM, or node disk manager, is a DaemonSet that runs on each node. And what it does is essentially keep track of the physical devices that are attached to each node.
And it has a way to filter through those devices; it's actually configurable, so you can filter based on criteria, and it will create a pool. And that pool is your list of valid devices. It's all the places your local PV provisioner can pull from to get a device, to then associate with a claim, to then be used by a pod. So we're creating a resource pool that is actually referenceable through kubectl; Kubernetes handles it as a Kubernetes resource, which is really important. Essentially, when I say Kubernetes native, it means we're not doing anything wild with Kubernetes. We're simply using Kubernetes in the way it's intended to be used to manage our resources. NDM allows us to do that by creating this device pool. It also enables some cool hot-swapping stuff and some other really, really nifty features. Okay, putting it all together. As you'll see here, the pathing changed a little bit, because now we actually have a pod here on the left that is associated with a persistent volume claim, which is associated with a persistent volume, and the local PV provisioner is still preparing other persistent volumes, ready to go for when another persistent volume claim comes through. So this is, in simple terms, essentially the right way to do data on Kubernetes, and it's one of the ways you could then run, for example, a MySQL or Mongo or Cassandra database in those pods that have those persistent volumes, without the worry of losing your data. Really, really powerful tool; I highly recommend you check it out. As I mentioned before, it is an open source project. The GitHub code is out there. You can go download it, modify it, contribute to it. If you do contribute, put a shout out, tag me on LinkedIn. It's Eric Zietlow on LinkedIn. I would love to help promote it. The other thing that's really a good way to get involved is the OpenEBS newsletter. Go ahead and subscribe to that. It is curated by us at MayaData.
We are trying to put that out as a resource for the community, kind of all things storage related around Kubernetes. All right, with that, I'm going to leave you and have a great day.