Hello? Anybody hear me? All righty. How's everybody doing today? Good. This is an interactive session. It's the first of, hopefully, many interactive sessions. But we've got to get the energy up in the room. It's a little bit different: it is stateful sets on Kubernetes, the guardians of your data galaxy. So already, you're starting to see a little bit of an out-of-the-norm talk here. So hopefully, it'll be a little bit of fun. So let me start. My name is Eddie Wasif. For those of you who don't know, that's a Dallas Cowboys uniform. I am from Dallas, Texas. Woo-hoo! All the Dallas folks out there. Tough game yesterday. We didn't pay off the refs enough, and we lost. It was unfortunate. But before tech, I was going to be a Dallas Cowboy. Fortunately, I had a career-ending injury: I was born too short, too slow, and not very good at football. So I got into tech. And that's how that happened. And I'm sticking to that story. I'm a husband to a beautiful wife, Besma. Thank you for letting me come; she's going to watch this later. I have two beautiful boys. I develop and I architect, and I've been doing it for about 23 years. So yeah, that's a little bit about me. I love open source. I didn't always love open source. I used to think that was how I made money, so why would I give it away for free? But I've changed my ways. I love open source. I contribute to open source. I speak about open source and different products and so on. So I've come a long way. I hunt and I fish, both very expensive hobbies, so I do this to pay for that. The holidays are coming up, so I've got to do the speaker circuit to buy more fishing lures. They're very expensive. What else? Being from Texas, I barbecue, I grill. We make a mean brisket. And like many of you during the pandemic, I picked up sourdough, but I was never very good at it, and I ended up doing sous vide instead.
And then finally, last but not least, I'm a VP and chief architect at Vonage, which is an Ericsson company. We deal with communication. Many of you have heard about us in the landline world, but we do SMS, video, voice, network APIs, and so many other things. We deal with a lot of data: billions of minutes of voice and video, SMSes, and so on. Do come scan this QR code; it's got all my information, so you can connect with me. And speaking of that, we all produce a ton of data every day, every minute. Each of us produces just shy of 2 megabytes of data a second. Think about that. That's a lot of data a second. That's almost 150 gigs a day. And it's estimated that the whole world collectively produces about 330 million terabytes of data every single day. And I didn't know this metric, but apparently 1,000 million terabytes is a zettabyte, so that's about a third of a zettabyte the world produces every day. And that is not just for shock value. Let's think about that for a second. You may not be producing contextual data about yourself, but you're producing logs. If you have a smartphone, smart watches, smart glasses, like the ones I'm wearing, they're calling home, and that's creating a log record. Your phone is getting GPS data. That's logs. Every time you go and you scan that QR code to look me up on LinkedIn, you're creating more data. You're creating more context. You're creating more things that are being stored somewhere. When you post pictures or you tweet, or I don't know if we're calling it a tweet or a post on X or whatever it is today, you're creating data. And that's a lot of data being produced. And I argue that because we're able to collect this data, turn it into information and knowledge, and add context to it, that's what's really driven the technological advances we've had in the last two decades. Because all that data isn't gone; we can use that data to really move us forward.
When your TV knows what shows you're going to like, before you know what shows you're going to like, it's because of data. When you're looking to see how you're going to get to your hotel and there's traffic on the map, that's data. All of this is data, and dealing with all that data is really what's driven the technology, and really our whole lives. Well, we are in a Kubernetes world. And we use Kubernetes every single day because it's great. We're at a Kubernetes conference. And yet you don't usually hear about data, or see people advocate for data, on Kubernetes. I don't know why. And if you look at Kubernetes, Kubernetes is great. It gives us efficiencies on our infrastructure because it lets us auto-scale. It has self-healing. It has the ability to pack machines properly. It really creates an abstraction that lets you deal with your infrastructure very efficiently. It also gives developers a lot of freedom and enhances their productivity. There's a whole BackstageCon and a whole developer experience track going on about all the things that have been enabled because you don't have to worry about infrastructure anymore. You don't have to worry about what kind of machine you're running on. If it runs in a container, you can define a YAML file and just worry about making your application the best it can be. It also lets you scale very easily. I'm going to do a quick poll here. How many of y'all say kube-cuddle? How many say kube-control? And how many say kube-C-T-L? Hands up for kube-cuddle. Hands up for kube-control. Hands up for kube-C-T-L. Wow. How many of you, like me, have given up and just created an alias called k, and just say k scale? All right, there we go. I think we've got a tie between the two. But really, it's a one-liner. You can, through the CLI, scale your stateless applications very easily. Scale them up and down. You have tools like KEDA. You can scale in and out on whatever you want. Very, very easy to do. It also has very high resiliency.
You can remove a pod, and it'll come back up somewhere else. You don't have to worry about it. You don't have to have alarms or anything. Half of the time when we're fixing a problem, we kill the pod and it just fixes itself. You don't have to worry about all the things that we used to worry about back in the day. It's also vendor-neutral. And that isn't necessarily to say you're going to forget about the big guys; it's about leveling the playing field. We have a dozen vendors that do the same thing, and you can really start looking at the merits and what fits you the best. And that is one of the things that Kubernetes has given us: the ability to be vendor-neutral and really level the playing field. Consistency between installs: your dev, QA, prod, demo, staging, whatever it is, it's the same YAML file. It works on your machine. It works on that machine. It works in every other environment. And then options, right? There are 5,000 of us here at KubeCon. Hundreds, maybe even more, vendors. You have lots of options. You can pick anything you want to do a lot of the same things, whatever fits your use case or your budget. You have so many different options to enhance your applications. So why have we not done this as much with stateful applications, right? Think about 10 years ago, how our databases were, how our data-intensive applications were. There was a server in that corner over there that had dust on it that nobody was allowed to come within 10 feet of, right? And there was a curmudgeon who sat in the closet, who couldn't take vacations, and he was the only person who knew how to run that machine. And that's how we did data for a very long time. I'll tell you a story: we had a Solaris database that we needed to move from Ann Arbor to Las Vegas.
And we literally hired a truck that had special shocks, a special AC unit, and a special battery, where they came and took this server while it was running, put it in this truck, and drove it like we were driving the president, all the way to Las Vegas, to plug it back in. You don't need to do that anymore, right? And I'm gonna give you a little bit of why I'm an advocate of not doing that anymore. And if you haven't gathered, I'm not conventional, right? I like to think about things in different ways and in different parallels, and hopefully you got a taste of that from the title of this talk. I'll tell you a funny story. I was speaking at Microsoft, and I'd never been to Seattle, and I saw a sign that said, respect the semaphore. And I was like, oh look, it's a dev town, they're using semaphores. Apparently, the semaphore comes from real life and we borrowed it in code, right? I had no idea that you could get a ticket for violating a semaphore. It's not a threading thing, it's a traffic thing. So there are parallels between the tech world and the real world, and I like to make parallels to the fantasy world. So I'm gonna take you on a little bit of a fantasy trip. We're gonna live in the Marvel world, or the Marvel Cinematic Universe, a little bit. And we're gonna talk about Kubernetes being our Marvel universe. You have your Iron Man or your Captain America or your Black Panther as your API server, your scheduler, your deployment. They're great, they do their job great. And we're all pods, and when they save us, we're just numbers, all right? So we need the unconventional Avenger that's gonna help us with our database or data-intensive applications. And that Avenger is our stateful sets. Any guesses on who you guys think the Avenger is that's gonna be stateful sets? Star-Lord. All right, Star-Lord, for those of you who are not familiar with the universe, is half human, half planet. His dad's a planet, I guess.
But he was stolen after his mom passed away. He lived with the Ravagers. He was uprooted. He has a very quirky personality. He does things a little bit differently, right? And what he does is he feels for the pods that have data. Just like him, they get uprooted and moved around. So he doesn't give them a number; he gives them a name, right? And he sticks that name to them, so they have an identity. They're not just some random A, B, C or X, Y, Z. They're pod zero, pod one, pod two. And that gives them a stable network identity, so your data applications can continually refer to each other through that name. It's not a number anymore; it's giving them personality. He also cares about seniority. If you've been with Star-Lord for a while and we need to move, or there's too many people in this room, he's not gonna kick the first one out. Seniority matters. So it's gonna be ordered in how pods are upgraded. It's gonna be ordered in how pods are evicted, right? And that's because he's treating pods as if they were people with names, very personable. And he understands that moving around and being uprooted sucks. It sucks when you've got big suitcases full of data. And so he tries to keep you as stable as possible. Evictions are not gonna happen the way they do with deployments. He's gonna try to keep you in that bunk bed as long as he can, with your bags. And so that's some of the cool things that stateful sets give you. The other thing is, he's still an Avenger. So all of your pod lifecycle events, everything else that you do with templates, everything that you're familiar with on your pods, is still applicable with your stateful set. Plus a little bit more: the stable network identities, the ordered deployments, the stickiness of those capabilities. So that's great. Now we've gotta deal with the data, right? We gotta deal with the suitcase that each of them brought with them, right?
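Before we get to the suitcases, here's what Star-Lord's naming and ordering look like in YAML. This is a minimal sketch; the names (`db`, `db-headless`), the image, and the sizes are illustrative assumptions, not anything from a real deployment:

```yaml
# Hypothetical StatefulSet: pods come up in order as db-0, db-1, db-2,
# each with a stable DNS name (db-0.db-headless, and so on)
# and its own PersistentVolumeClaim.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db
spec:
  serviceName: db-headless      # headless Service that provides per-pod DNS
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: db
          image: postgres:16    # stand-in for any data-bearing workload
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:         # one claim is stamped out per pod and sticks to it
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```

Scale it down and the highest ordinal, db-2, goes first; scale it back up and each pod reattaches the same claim it had before. That's the seniority, the name, and the suitcase, all in YAML form.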
It's not just a fanny pack or their wallet and their keys and they can go. We gotta deal with their data. And in the Kubernetes world, that's persistent volumes. Any idea who this Guardian is gonna be? I am Groot is who I think persistent volumes are represented by, right? And Groot, for those of you who are not familiar, is a big tree-like creature. Very strong, very adaptable: he can start off as a big tree, get broken down, start again from a splinter, grow in a pot, become little kid Groot, teenage Groot, adult Groot, and then grow to be the size of a building, whatever you wanna do. So I hope you're starting to see a little bit of how they're related in that aspect. He's very dependable. Groot is someone you can rely on; he's gonna be there. If the node is lost, persistent volumes don't care about the node. They're usually backed by some other storage, they can be part of the cloud infrastructure or your storage infrastructure, and they are separate. Your node can go away, and when the new node comes back, it'll attach that same data, so your data's safe, right? It's adaptable, right? You wanna have a little baby Groot, that's fine: a small amount of data, and that grows with you and becomes teenage Groot, becomes adult Groot. And Groot can be fast or he can be slow. It could be the very slow kind of storage, or it could be the really fast kind, like in the latest movie where he's just shooting everybody up, right? So it adapts to all different kinds of things to handle that kind of data and the needs of your application. The other thing is Groot can come back, right? You can't kill Groot; as long as you have a piece of Groot, you can grow another Groot, and that's some of the things that you'll see. But there's a problem with Groot, right? All he says is I am Groot, just like storage. Dealing with storage is very, very complicated, very difficult. Are you gonna do replication?
How are you gonna mount it? How are you gonna claim it? All of this stuff. So we need something to help us translate, and those are our storage classes. Last chance: who do y'all think this is gonna be? Huh? Rocket Raccoon, that's exactly right. He is the guardian that can understand Groot and can do all the automation, and helps us talk to Groot so that we can provision the storage wherever we want. He's a cool little buddy. And by the way, all of this artwork is AI generated, because of all the data that we've gathered through all the different movies and all the different comic books. I just asked Bing for a cool, epic Rocket Raccoon pose, and so on and so forth. So that's a little bit of what data can give us. But let's talk about the benefits of a storage class. It speaks Groot's language, right? I am Groot, I am Groot. But that could be: I am Groot, I need multi-replica data, or I need fast provisioning, or I need provisioned IOPS, or I need to be on a NAS, or whatever. We don't need to worry about that. We're talking to Rocket Raccoon. We just say, hey man, give me some fast storage, or hey, give me some replicated storage. And he's a master of automation. If you watch the movies, he can fly anything. He can tinker with any machine. He can provision you EBS volumes. He can provision you Azure disks. He can provision you NAS storage. You just ask him to do it, and the storage class will go ahead and figure out how that's done and hand you your volume on a plate, right? He's very flexible, like we mentioned. He can take a snapshot, right? He can take that splinter from Groot and turn it into a whole other persistent volume, and give you more storage or expand your storage, and it's all very simple and very transparent to the application. So hopefully we're not gonna go too much longer.
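As a rough illustration of talking to Rocket: you name the traits, and the provisioner figures out how to build the volume. The class name, CSI driver, and parameters below are assumptions that would vary by cluster; this is a sketch, not a recipe:

```yaml
# Hypothetical StorageClass: the "fast storage, please" request.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-expandable
provisioner: ebs.csi.aws.com     # swap in your cluster's CSI driver
parameters:
  type: gp3                      # an IOPS-capable volume type on AWS
allowVolumeExpansion: true       # lets baby Groot grow into adult Groot
---
# A claim just names the class; the provisioning details stay hidden.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim
spec:
  storageClassName: fast-expandable
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 100Gi
```

The application never sees any of the EBS, Azure disk, or NAS plumbing; it just mounts the claim.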
There's a lot of additional guardians and a lot of additional things, but in the interest of time, I really wanna sum back up and say: look, now we have the Guardians of the Galaxy available for us in our Kubernetes environment. We should be able to recognize that we can do everything that we've done with Kubernetes for our stateful applications, right? There is nothing preventing us from running our Postgres, MySQL, Microsoft SQL Server, CockroachDB, or any kind of RDBMS, right? When we use stateful sets and persistent volumes and so on, we see that it's safe. We have the Guardians of the Galaxy protecting that data. There's so many different kinds of things, right? You can have NoSQL databases, Mongo, Redis. You can have big data. You can have IoT, large language models. You can have anything that needs data. You should feel confident being able to do that, and I'll give you an example of why we should think about our databases like microservices. We have Black Friday coming up, right? Who's already started shopping? There you go, me and you both, right? But in a few weeks, things are gonna go nuts, and we usually only think about scaling our web servers, because our database servers are probably very much over-provisioned already. But you can take the same concepts now with Kubernetes and apply them to your stateful applications. Why can't we provision large, specialized machines just for the two weeks, a week before Black Friday and a week after? We can cordon and drain the nodes and move our data safely, because now we know it's safe to move it, right? And now you can scale up instead of just out, for two weeks. Save yourself the cost of having those really expensive servers all year long, just for the two weeks that you're running Black Friday, and you know about it. Use a KEDA autoscaler with a cron trigger, or use some other metric, and now you've handled and scaled up to Black Friday without having to bust your budget.
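That Black Friday idea can be sketched with KEDA's cron scaler. The target name, dates, and replica counts here are made up for illustration, assuming a stateful set named `db` like the one we've been talking about:

```yaml
# Hypothetical KEDA ScaledObject: scale the database stateful set up
# for the two weeks around Black Friday, then back down automatically.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: black-friday-scale
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: db
  minReplicaCount: 3
  maxReplicaCount: 8
  triggers:
    - type: cron
      metadata:
        timezone: America/Chicago
        start: 0 0 22 11 *       # midnight, Nov 22: scale up
        end: 0 0 6 12 *          # midnight, Dec 6: scale back down
        desiredReplicas: "8"
```

Pair that with cordoning and draining onto the big specialized nodes beforehand, and you pay for the expensive hardware for two weeks instead of fifty-two.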
You can do things like, because you're in Kubernetes, emit custom events. You can capture any kind of pod lifecycle event where you're publishing, hey, I'm going down. Instead of waiting for whoever the leader is in your database service to figure out who's going down and what to do, they're going to tell you through an emitted event, or through some kind of metric that you can emit onto the Kubernetes event backbone. So hopefully, like me, you're no longer scared to put data in Kubernetes. I actually encourage it, and I think it's the way to go, because we can use all of the things that all of us use every single day with our stateless applications to make our data applications even better, and we can use them in a way that is safe and resilient and, hopefully, fun for all of us. Okay, with that I'm going to wrap up and take questions. Thank you so much. Please go scan this. I hear there's a bug: you have to give five stars to this talk for it to close. That's what I heard. It may or may not be true. But we have five minutes left, and I'm more than happy to answer any questions. If you have a question, you can go up to the mic or raise your hand and I can bring the mic to you. I hope you all had fun. Who had fun? I had fun, thank you. There we have a question over here. Come on up, jump up to the mic. My question is, I want to ask, what's the largest amount of data... Can you speak up just a bit? My question is, I wanted to ask, what's the largest amount of data you've seen organizations storing on Kubernetes? The largest amount of data I have seen organizations store in Kubernetes? Terabytes of data. A lot of the newer-generation database systems can actually shard the data and store portions of it in different locations. So it's not like a traditional database where you just have one giant persistent volume; you can have dozens of 10-terabyte persistent volumes around.
We have a couple of different setups; we do micro-database structures, because we deploy a database with every team or every app. And probably our biggest is 15 terabytes worth of data, no problem. It's a really interesting question, because one of the things that I've personally discovered when we've introduced micro databases to our teams is that the way you think about sizing is very different. Because you're treating your database as a microservice, and each node has its own attached storage, you're not calculating the size of the database the same way. And if you need to scale, you don't necessarily scale your persistent volume; you add additional nodes or additional servers, right? And a lot of times that's hard to understand, right? So let's say, for example, we have a one-gig persistent volume per node, and you have a four-node cluster; that's four gigs. How do you increase that? Do we go to each one and make it two gigs, or do we just add another node by doing a k scale with replicas set to five? Both will work. One will give you an additional gig, and the other will give you four additional gigs. And so those are the kinds of things that, when you start thinking about data as a microservice, whether that's a database or a message queue or so on, really start changing the dynamic of how you size, how you tune, how you migrate data and schema across. You have another question back there. Yeah, thanks for bringing up Star-Lord and Rocket Raccoon. I see how stateful sets, persistent volumes, and storage classes really fit into the big picture of managing data on a specific cluster. But now we're looking at geo-distributed, multi-cluster databases, right? And like you mentioned, there are so many options. So what's your take on that? So you have options to go directly, over network load balancers. You have options to go over service meshes or virtual application networks.
My personal preference, and this is purely my personal opinion: I prefer virtual application networks. And we use tools like Linkerd and Skupper to create a transparent virtual application network between our database nodes. What that lets us do is effectively create a faux replica of the stateful set, right? The pods are pointed at a bunch of routers, and they all think they're on the same cluster, but they're really somewhere else in the world. We also use CockroachDB. We have geo-pinning for data. And so those are some of the things that we utilize for GDPR or CCPA or any other kind of compliance issues. You can use those tools in a cloud-native way with a virtual application network. So that's just my personal preference on that. We've looked at Istio service mesh. The question is, have we considered Istio service mesh? And we did. Istio is very powerful, very, very powerful, and very, very complicated, right? And we found that using a virtual application network, where basically you're just using DNS and a headless service, was much easier to configure. And because we're doing micro-database and micro-cluster segmentation, it was just a lot less effort for us. But Istio is a fantastic option. And I know that they do that. I know with Anthos you can do it. I know Azure Arc has a solution that is very similar as well. So no problem, thank you for the question. Thank you so much. Thank y'all. That was my time. Thank you.