Thank you. Thank you very much. I'm really glad to be here. This is actually my first KubeCon. I've done a little bit of workshop support and ancillary events in the past, but this is the first time actually attending a KubeCon, and I'm super excited. I'm not going to talk for five minutes about myself, but just so you know where I'm coming from: I started as a software engineer, got into architecture after that, and I've worked in defense and in hospitality. For some reason I always try to get myself onto whatever project the company is betting the business on; that's where I seem to end up. That has led me into some interesting spaces, a lot of distributed systems. So that got me into Apache Cassandra and Kubernetes, and now putting them together. That's the genesis of how I got involved in this kind of thing.

On the way over here I was grabbing a snack, and I went outside to the coffee station. I don't recognize anybody here, so maybe I'll be okay telling this. I'm walking past a table and there's a group of folks, and they're like, "Hey, did you see that DataStax is going to be doing a talk about how to put a database in Kubernetes?" And one guy goes, "Why would you do that?" There is still skepticism out there around this. Now, I'm in a world where I'm used to this and I've already bought into it. This is an article I'm showing here that a colleague of mine, Chris Bradford, wrote about his personal journey, from being very against even the idea of running databases in containers, through that progression, to running databases on Kubernetes. So what I thought I would do: this is not the "why you should run a database in Kubernetes" talk. It's more, assuming that you agree with the premise, how do we actually go about doing it? That's where I'm coming from. If you want to ask me questions about why at the end,
let's do it.

There's a whole community of people working on putting stateful workloads onto Kubernetes: the Data on Kubernetes (DoK) community. We had a great full day of sessions here on Tuesday, and you can go and watch a lot of those online. I may even have stolen a couple of the points you'll see later in this talk from things I heard on Tuesday, so I'm an active and avid learner in this space as well. There's a whole community of innovators doing great things here. One of the things I learned about recently is a survey the DoK community commissioned, talking to a lot of developers, architects, CIOs, the whole range of the IT workforce, looking at who is adopting Kubernetes for stateful workloads. The numbers that came through are encouraging and a little bit surprising. Now, who knows, is there confirmation bias from people who are willing to fill out a survey about data on Kubernetes? Maybe. But look at this: 70% of people using Kubernetes have at least some stateful workloads there, and 90% think that Kubernetes is ready for stateful workloads, which to me says you're at least thinking about doing it. So how do we get there?

I want to be clear and set expectations, and you probably saw this when you were looking for sessions: this is an introductory-level session, so I'm hitting the wave tops. What I want to present to you is a way of thinking about how you put databases, and possibly other stateful workloads, onto Kubernetes.
So I've tried to break it down into a few simple steps. It starts with making sure that you understand the Kubernetes primitives for stateful data, including the persistent volume subsystem. You want to pick a storage provider, because ultimately your data needs to end up somewhere, unless you're just doing caching. You need to pick a database. And then I'm going to highly recommend that you find an operator, assuming you're using a fairly common or popular database. Those are the steps.

We'll begin by making sure we understand these Kubernetes primitives, especially the ones for managing stateful workloads. But we're also going to look at some of the other primitives that are not exclusively for managing state; they're all involved in putting a database on Kubernetes. So here we go.

I want to start by demystifying something. When I was a junior developer, I was afraid of databases. There was one guy on the team who knew how to interact with the database; he was the DBA. He created the tables and managed all of that. If you wanted anything, you went to that guy, and this was a bad thing, because Mitch got stuck there for a while. He got pigeonholed. He wanted to go do other things, but he was the database guy; he was the only one who knew how to do it. So let's demystify: a database is an application. Applications, when we deploy them on Kubernetes or anywhere else, are really an assemblage of compute, network, and storage. Those are their needs, right? It's code: it needs somewhere to run, it needs to talk to other things, and it's got to have someplace to put its data. That fits a database just as well as any other application. Okay, so let's look at what Kubernetes gives us, and organize the Kubernetes primitives in terms of compute, network, and storage, okay?
So we have primitives for running pods on worker nodes. We have ReplicaSets and Deployments that we can use to run multiple copies of things, and we now have StatefulSets, which Kubernetes gives us to run stateful workloads. For exposing our capabilities as services, we have the Kubernetes Service, and we have things like Kubernetes Ingress. These are primitives Kubernetes gives us for allowing things to find each other and talk to each other. And then in terms of storage, we have a whole persistent volume subsystem, which we're definitely going to focus on. To deploy a database, we just need to pick the right pieces from this grab bag of resources that Kubernetes gives us.

I'm going to show you some code here; there will be YAML in this presentation. It's all available on GitHub. We have a repo that Patrick McFadin and I created for a book we're working on, which, yes, I'll plug at the end, of course. That's where the material is being drawn from, and most of the images you're going to see are also drawn from the book.

Okay, so I want to talk about the persistent volume subsystem portion of Kubernetes. Our pods can mount volumes, and the volumes can be of various types. In production systems, what we see most commonly is the use of persistent volume claims; a PVC mount is the most common type of volume that we see mounted for an application that's doing something stateful. Okay, so the way this breaks down is: I create my pod.
I create a persistent volume claim, which basically represents my pod's request for storage, and then Kubernetes leverages a storage class, which is managing a section of storage, in order to create persistent volumes. When we create pods and ReplicaSets and StatefulSets, that's when the process of creating those persistent volumes and associating them with persistent volume claims happens. Generally, we see administrators being involved with configuring the storage classes, while developers are more concerned with the consumption side, creating persistent volume claims. So that's the big picture, and we'll talk through some more of the details.

A persistent volume is the Kubernetes way of getting access to storage that outlives the lifecycle of a pod, and there are different types of persistent volumes. We have local persistent volumes, which leverage storage that lives on your Kubernetes worker nodes. Other persistent volume types provide access to storage outside the cluster: network storage, maybe storage provided by the cloud you're running on, or third-party services. Maybe you're running on-prem and you actually have storage arrays that you're trying to allocate storage from. All of these are legitimate types of persistent volumes that you can use to provide storage to your application, and we'll talk a little more about selecting a storage provider in a bit.

This is an example of a persistent volume declaration. It references a volume size, and it references allowed access modes: you can have read-only, read-write, volumes that can only be written by a single writer at a time, these kinds of parameters. Now, this particular definition is an example of a local volume that is mapped to a specific Kubernetes worker node using node affinity. Again, this is something that would typically be
configured by someone who's responsible for the administration of your Kubernetes cluster, so more on the ops side than the app-dev side.

Once a persistent volume has been made available for application use, either manually or dynamically created by storage classes as we'll see, we can reference it in our pod specifications by creating a persistent volume claim. This provides a really good separation of concerns: it allows us, as app developers, to just ask for the storage that we need and the characteristics it should have, without having to know about the specific provider that's in use. This also makes our applications more easily portable, so they can run in a different environment: as long as the persistent volume claim can be satisfied by some persistent volume available in the target environment, we should be good to go.

There's also a second layer of separation: a persistent volume claim is actually defined externally to the pod that references it, so a persistent volume claim just represents, in the abstract, a request for storage. On the left side we see the definition of a PVC. It includes a desired amount of storage and an access mode, similar to what we saw before with persistent volumes. It can optionally specify a desired storage class, for the case where you would like, or are okay with, your persistent volume claim being satisfied dynamically by the storage provider provisioning more storage on your behalf. On the right side, you see a pod that has been defined to reference that particular PVC. So pods link to PVCs, which link to persistent volumes, which are created by storage classes.

All right, so speaking of storage classes, this is where the idea of picking a provider comes into play. We understand the Kubernetes primitives at this point; now we're ready to take what we've learned and assemble things to deploy applications. Okay, so a StorageClass is
responsible for the dynamic provisioning of storage as persistent volumes in order to satisfy our PVCs. The storage class handles the details of interfacing with the provider we've requested or configured, so that the requested amount of storage can be set aside.

There are actually a ton of different storage providers. I did an informal survey, nothing scientific, but as I was going through the solution showcase, there are a lot of storage providers here at KubeCon. This is a rich area of competition and innovation, so there are a lot of options available, and that's just from third-party vendors; I'm not even counting what's available from the standard public cloud providers. One thing that's pretty cool is a little tool I recently discovered, new to me, at the bottom of the slide here: Kubestr. It's a tool that lets you see what storage classes are already available in your cluster and make sure they're configured correctly. That's a pretty fun way to educate yourself.

The example I'm showing here of declaring a storage class is a really simple one: the open-source local-path provisioner from Rancher Labs, which basically lets you use your desktop or laptop computer as a storage provider if you're just running Kubernetes locally for dev purposes. I use this one all the time.

If you want to peel back the covers a little bit, this is where I stray for one second into non-introductory material, but I think it's interesting. There is a specification called the Container Storage Interface.
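Before we get to CSI, here's roughly what that local-path setup looks like, as a sketch: a StorageClass backed by Rancher's local-path provisioner, plus a PVC requesting storage from it. The claim name and size here are just illustrative.

```yaml
# StorageClass backed by Rancher Labs' open-source local-path provisioner
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer   # provision only when a pod needs it
reclaimPolicy: Delete
---
# A claim a developer might write; the provisioner satisfies it dynamically
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: demo-data
spec:
  storageClassName: local-path
  accessModes:
    - ReadWriteOnce   # single-node read-write volume
  resources:
    requests:
      storage: 1Gi
```

Dynamic provisioners like this one are typically built on top of the Container Storage Interface.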
It's actually not unique to Kubernetes or tied to Kubernetes; you can use CSI-compliant storage providers on other container orchestration platforms as well. But it provides a specification, basically an API, for providing cloud native storage. Most of these CSI providers, not necessarily all but most, actually implement their control plane on Kubernetes, and I think it's really cool that you can have your storage itself managed on the Kubernetes platform. Anyway, I love geeking out and peeking under the covers for just a second.

Okay, so now we know about the primitives we have and the storage providers available to us, and we've picked a storage provider. Now we're going to pick a database, and I'm going to give you two options here, two different database deployments. We're going to look at a single-node deployment of MySQL. Now, I understand that multi-node deployments of MySQL are possible, so don't get upset with me; I know there's Vitess, which helps you do all that, and other operators that help you do it for different types of relational databases.
I'm just going to do a simple one-node example here. Then we'll look at a Cassandra deployment, which is multi-node, and we can compare and contrast, looking at some of the different compute primitives we introduced earlier, namely ReplicaSets, Deployments, and StatefulSets.

Okay, so here's a sample deployment of MySQL. This is based on an example you can find in the Kubernetes documentation; I've forked it into the repo I shared with you earlier, the data-on-Kubernetes book repo in our GitHub org, with relatively minor modifications to that standard example. What this does is deploy a single node of WordPress on top of a single node of MySQL. One thing that's kind of interesting about this example is that it shows not only MySQL creating a PVC and getting some storage allocated to it, but also WordPress, which uses MySQL, getting its own volume on top of that, where it wants to store some configuration data. So it's a good demonstration of the idea that applications can use databases, which use persistent volumes, and applications can also acquire volumes directly themselves.

Because we're only deploying a single node of MySQL in this example, a Kubernetes Deployment is a good choice. A Deployment is a compute construct that sits on top of ReplicaSets: Deployments manage the lifecycle of ReplicaSets, which in turn create pods according to the number of replicas we request. This is better than just running the database in a bare pod by itself, because when you create it as part of a Deployment, Kubernetes takes on responsibility for the lifecycle, for making sure that your desired number of replicas, in this case one, is running. This might not be super high availability, because we could have some downtime if a pod dies and has to be recreated; we're down
from a database perspective during that restart period, but it does give us some measure of availability.

The other thing that's curious to note here: you see on this slide that there are two replicas created by this ReplicaSet, and they're both pointing to the same PVC. This is a characteristic of ReplicaSets: there's only one PVC defined in the ReplicaSet, and if you create multiple replicas, they're all pointing to that one PVC. Now, this is a great configuration if you have read-only data; you could certainly get some efficiencies out of it. But if you want a situation with multiple nodes that you're writing to, like Cassandra, which we'll see later, this wouldn't be an appropriate configuration, and you'd want to use something other than Deployments and ReplicaSets.

Okay, so to deploy our single MySQL node, there are a couple of things we need to create to start out. The first thing, on the left, is security credentials. One of the things I love about working with Kubernetes is that things are secure by default, right? You can't get at a port unless you expose it. We want to apply the same principle when we're talking about databases. So the MySQL we're deploying has an administrator username and password, and we can control what that is by defining it in a Secret, which we then leverage in the definition of our Deployment for MySQL. On the right side, we see the definition of the PVC that's going to be referenced by our ReplicaSet. So these are two ingredients we create up front.

Now we're ready to specify the YAML for our MySQL Deployment. Again, we're not creating an individual pod; we're creating a Deployment that wraps it, so part of this definition is not the actual pod but a template for a pod. So every time the Deployment is going to create an additional pod,
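Put together, the Deployment looks roughly like this. It's a trimmed sketch in the spirit of the Kubernetes WordPress-and-MySQL tutorial; names like mysql-pass and mysql-pv-claim are assumed from that example.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wordpress-mysql
spec:
  selector:
    matchLabels:
      app: wordpress
      tier: mysql
  strategy:
    type: Recreate          # don't run old and new pods against the same PVC
  template:                 # the recipe for each pod the Deployment creates
    metadata:
      labels:
        app: wordpress
        tier: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:          # admin password pulled from the Secret
                  name: mysql-pass
                  key: password
          ports:
            - containerPort: 3306
          volumeMounts:
            - name: mysql-persistent-storage
              mountPath: /var/lib/mysql
      volumes:
        - name: mysql-persistent-storage
          persistentVolumeClaim:
            claimName: mysql-pv-claim  # the single PVC created up front
```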
it's going to use this template, kind of the recipe, for creating that pod. And again, you see in there the reference to the single PVC we declared earlier.

All right, the next piece we're going to talk about is how you actually make a database accessible to your applications. It would be kind of lame to point our applications at a single pod instance, a hard-coded instance or IP address, because that pod could die and get restarted. So we want to stick a Kubernetes Service in front of it, and this abstracts the details of where that database instance is actually living on the network. Even if we're only running a single pod, this is still useful.

There are different types of services defined in the Kubernetes world. You have a ClusterIP service, which is only reachable within the scope of the cluster. You can use LoadBalancers, where typically the implementation is tied to your cloud provider, and incoming calls might be round-robined across the instances behind the service; you might find that useful. We have other things like external ports, and Ingress can be defined. What we see most often for a database is a ClusterIP service, or maybe a LoadBalancer. Usually you have an application sitting on top of your database, and the application is what provides an interface outside of Kubernetes. Not that you couldn't expose the database directly; we just don't see that very often. So this is an example of a simple ClusterIP service.
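A sketch of that service, with labels assumed to match the MySQL pods from the tutorial example:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: wordpress-mysql
spec:
  clusterIP: None      # headless: DNS returns the pod IPs directly
  selector:
    app: wordpress
    tier: mysql
  ports:
    - port: 3306
```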
This is what's known as a headless service: when you do a DNS lookup on the service name, wordpress-mysql, what you get back is the IP addresses of everything sitting underneath it, and in this case that's just a single IP address. A headless service is a great thing to put in front of your database instance.

All right, so that was a quick fly-through of the MySQL deployment example. Again, I want to refer you to the GitHub repo and the book if you want the more blow-by-blow, detailed description. We try to go through the various options at a high level and then refer you to the points in the Kubernetes documentation where you can deep-dive into the lower-level details. So I'm just hitting the wave tops for you right now.

Now we want to talk about deploying Cassandra. The way Cassandra works, it's a multi-node architecture: no one runs one node of Cassandra in production, and not very many people run three nodes of Cassandra in production. Generally you have a lot of data if you're using Cassandra. There are two different ways to think about how Cassandra organizes itself and the data it's storing, so there are two viewpoints on this slide, and they both refer to the same cluster. One of them is a more physical layout, in terms of where the machines are
So a lot of times you'll have multiple data centers What can't go to Cassandra calls a data center and multiple racks So in cloud deployments, most people map a Cassandra data center to a cloud provider region and They map a rack to a particular availability zone So that's what you'll see in the if you look at the code details of the example So Cassandra is aware of where you are placing this nodes because you tell it where the nodes are in terms of the network topology and Then it's going to try to store multiple copies of your data So that they are distributed across the different availability zones and even regions if you have a multi region cluster So those are kind of the two viewpoints of the world Cassandra uses something called partitioning which is similar to the concept of sharding, but it's managed entirely by Cassandra. So you're never Interacting with what that kind of sharding our algorithm looks like when you're using Cassandra So I wanted to give you those details about the topology so that this slide would make sense if you have some familiarity with stateful sets The way that what's shown here is a Cassandra deployment That has three racks. So one data center consisting of three racks And there's a single pod that's shown here in each rack And so we have a stateful set that is managing each of the racks And then as you can see here, there's a there's a key difference from the MySQL example That we saw before in that each pod is actually getting its own persistent volume claim So this means each Cassandra node has its own dedicated storage and and that's what we want Okay, so we'll talk up front here About the idea of creating these standards a Service that is pretty much like the my the service that we put in front of my sequel. 
It's very similar, and this time we're exposing the standard Cassandra port, 9042. I'm showing it to you now because we're actually going to reference it on this next slide.

This is probably the most complicated YAML we're going to see, so trigger warning for anyone who doesn't like reading YAML on slides, possibly including me. But this is better than me scrolling through a terminal window and screwing it up. This is the definition of a StatefulSet for a Cassandra cluster. It's going to span a couple of slides; we'll walk through it a little at a time, and I'll try to guide you through.

On the left side, we see the name of the StatefulSet, and we reference the service we just created; we're telling Kubernetes that that is the service we want to put in front of our nodes. Also on the left side, we're defining which policies we want Kubernetes to use. There are some options for how it manages the lifecycle of the pods as it spins them up to scale up and destroys them to scale down in the StatefulSet. The options I've specified here are actually the defaults, and they represent a more conservative approach to managing the StatefulSet: Kubernetes will start one node at a time and wait for each node to report that it's ready before starting to spin up the next one. The restart policy here functions in a similar way, restarting a single node at a time. The StatefulSet does support the idea of a rolling update, so you can deploy updates to the StatefulSet that will be rolled out individually to the pods.

There are other things we see on the right side of the slide: exposing ports for the different interfaces Cassandra has, for client access with CQL (the Cassandra Query Language), management APIs, interfaces for talking to other nodes, and so on. And the last little thing
on the bottom right there is kind of cool: defining a preStop command. This helps each Cassandra node be a good citizen: instead of just ghosting the rest of the cluster when we scale down, it will actually communicate and offload its data nicely to the other nodes in the cluster. There are other hooks we can define, too; we can customize the liveness and readiness probes used on each node, as well as this preStop hook you see here.

Okay, we're halfway through the YAML. What we see here on the left side is overriding some environment variables. The particular Cassandra image we're using in this example actually allows configuration by providing a YAML file, which you can swap in to override Cassandra's built-in YAML configuration, and there are also several supported environment variables that let you override the location of various things and some different properties.

And then finally we need storage, so we define a PVC template. Every time the StatefulSet stamps out a new pod, it creates a new PVC according to the template we've defined here, and this functions much the same way as the other PVC definitions you've seen in the previous slides. That's the great thing about StatefulSets: they manage the creation of these pods and the creation of the storage they need at the same time. One thing they do not do: when you scale down a cluster and nodes are eliminated from the StatefulSet, it does not automatically delete the PVCs for you. So, you're welcome:
your data is still there even when the cluster scales down. You actually have to go and explicitly delete those PVCs in order to free the storage.

All right, so StatefulSets are pretty powerful, and you can see that even a simple example can involve quite a bit of YAML configuration. So you might ask: is that too complicated? Do I want to manage that complexity? You may or may not. And all I've shown you here is a brief look at the initial deployment of the database, plus a little bit about scaling up and scaling down, or at least hints at it. What about the things databases need: care and feeding, tuning, debugging, identifying long-running queries? There are all kinds of things that go into the operations of a database, on top of that initial deployment. This is where the idea of operators comes in.

There's a great quote from Tuesday. This is very likely a paraphrase of what was actually said, but I remember Rick Vasquez from Western Digital saying something like this in a great panel discussion that was part of DoK Day. His words of wisdom were basically: if you're going to deploy a database in Kubernetes, you should use an operator. And that was a word to everyone, not just noobs or people who have less experience doing this. You should be using an operator; it's really going to save you a lot of pain. And I would concur with that opinion.
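To give a flavor of what an operator buys you: with cass-operator, which we'll meet in a moment, the whole data-center-of-three-racks arrangement from the StatefulSet example collapses into a single custom resource, roughly like this (assuming the v1beta1 API; names and sizes are illustrative):

```yaml
apiVersion: cassandra.datastax.com/v1beta1
kind: CassandraDatacenter
metadata:
  name: dc1
spec:
  clusterName: demo-cluster
  serverType: cassandra
  serverVersion: "4.0.1"
  size: 3                           # total nodes, spread across the racks
  racks:
    - name: rack1
    - name: rack2
    - name: rack3
  storageConfig:
    cassandraDataVolumeClaimSpec:   # PVC template, one claim per node
      storageClassName: standard
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi
```

The operator takes care of creating the StatefulSets, services, and PVCs, and of the careful one-node-at-a-time lifecycle handling we configured by hand above.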
Okay, so this is where the operator pattern comes into play. This is a Kubernetes-native way of managing applications that takes advantage of the Kubernetes control loop. There is very likely an operator available for the database you're using. In particular, in the Cassandra world, we had five or six of them as of earlier this year, and we've reconciled as a community down to one, called cass-operator, which you can find at the address shown here.

And we've actually broadened beyond that. cass-operator manages the provisioning and running of your nodes, but you also need other things, and this is a common theme for other databases as well: you need to manage backups and restores, maybe you need secure provisioning of keys or access credentials. There's a lot that goes into it. An example of something innovative we're doing in the Cassandra community is the K8ssandra project, in which we're building an ecosystem of things around the core Cassandra project. It includes cass-operator to run Cassandra, but also tools called Medusa and Reaper that perform operational tasks, including backup and restore. We've integrated the kube-prometheus stack so that we have metrics reporting; you can use the kube-prometheus stack that comes with K8ssandra, or you can swap in your own instances if you like. And on top of that we've put Stargate, which is basically an API layer we've built on top of Cassandra. Now, this is not a plug for our database-as-a-service, but we do have Astra, our database-as-a-service, and a lot of the technology that runs it goes into Stargate and the K8ssandra projects. So when people ask, can you really run a database on Kubernetes?
Well, I mean, that's what we're doing. We have a whole database-as-a-service business that is running on Kubernetes.

If you want to hear more about this kind of stuff, there's a talk my colleague Chris Bradford is co-presenting with Ty from Google this afternoon. I recommend checking it out, especially if you want to hear about multi-cluster: having a database that spans multiple Kubernetes clusters. That's a really interesting and innovative area, and there's a lot of work going on there.

This is the book plug that I promised, that I know you really wanted to see. The first three chapters are out, and if you have an O'Reilly account, you can see them on the learning platform. I'm really grateful to Portworx, who have agreed to sponsor the book; you can actually get the first three chapters, the ones available right now, for free from them. They've been handing out cards, and I'm giving you an address here that you can use. We're not the world-class experts; no one has all of the knowledge. So I'm really happy to be corrected and to have fault found with things we've written and things that can be made better. I'd love to have feedback from people looking at the early release of the book.

I'm going to go hang out at the DataStax booth after this. We are giving away a video game machine, but I know you're not all about the swag and the prizes. I know you want to hear my colleague Rags come and give some demos; he's going to be doing some hands-on stuff with K8ssandra at the booth, and I think he also has a couple of t-shirts to give away if folks want to ask questions. And I'm sorry, virtual people, I cannot send you a virtual t-shirt.

Okay, I'm ready for questions if we have time. You know, I think we might be out, but yeah... Thank you very much, Jeffrey. All right. Thank you.