 Alright. What's up nerds? Yeah. So I'd like to know did you come here because you play Minecraft? Raise your hand. And then keep your hand up please. Alternatively, did you come here because you have important customers who play Minecraft? Yeah, I'm both. I have some kids who are massively obsessed with Minecraft and then I also like to play it. And for me it's a resource management game. I collect all the resources and my children consume them on the game. And then I also use it as a three dimensional programming model so I get to automate things and that's super dope. So if you do Minecraft, that's great. How many of you came here because you have to manage commercial off the shelf software that's provided to you by others with strange constraints you don't always know, right? Cool. Good. Everyone's going to be happy. There's going to be a lot more of the Kubernetes than the Minecraft. I hope that's alright. Okay. This is my first talk in an actual conference in more than two years for undisclosed reasons. But I'm pretty excited. You can reach out to me. I'm at KC West everywhere. I recently bought the Instagram handle from the other person. And if you do go and follow me, you're probably going to see hiking and rock climbing and stuff more than computer nerd stuff. But if you're into that, that's where I'm at. Let's start with the Minecraft architecture diagram. So those of you who run a Minecraft server know that this is the architecture diagram. You've got a client on the left there. That's your computer. And if you're running a server that's separate from that computer, then you run server.jar and you connect to it, right? Obviously, the title of this talk is enterprise grade Minecraft on Kubernetes. And so I got to thinking, you know, is it possible to run this server on Kubernetes in a way that's resilient? And we can also do things like, you know, data backup so that if we need to recover the world from a previous snapshot, we can do something like that. 
And it turns out you can. So I work for a cloud company, so my Kubernetes clusters live in one place. Yours can live wherever they want. It doesn't matter. This isn't a provider-specific talk in any way. But this is roughly what the architecture is going to look like, you know? So you've got your client there on the left. There's going to be a load balancer just in front of an instance of this server. So that's fun. If you're running commercial-grade software that's provided to you by others, the one that I love for work is S/4HANA from SAP. Incredible, massive database, super performant. And there are a couple of different ways to scale it, but, you know, one of them is get the biggest computer you can find, right? And that's kind of like Minecraft. So another thing about this piece of software is that you can do modifications, right? So you can install modifications on the server. And that's pretty fun. How many of you play or know about modded Minecraft? Yes, we really do have a roomful of nerds. This is great. Yeah. So, you know, I like Fabric and some Vanilla Tweaks data packs and things like that. And I figure I shouldn't just go and, you know, curl them from the web. I should do that, but then put them in a storage solution that I control, right? One where I can also do things like version control. And then I can get them into my Minecraft instance. So that's one thing we're going to do with customizations. And then, of course, the other big thing is backups, right? So how do we do backups? Well, I'm going to choose super cold, version-controlled storage. It's pretty inexpensive. And I can always pull out a version from the past. And I can configure that however I want. So that's kind of the plan. But there are a couple of other little things to point out. One of them is, you know, being able to consistently know the address of your Minecraft server is handy. So we need like a static IP.
We can throw DNS in front of that. We could do, you know, a solution like that. There's also, you know, the server is running, and we might need to run remote commands against that server, right? This is something that happens a lot with commercial software that's given to us, is that, you know, usually we're required to log into the machine it's running on or the VM it's running on and do stuff, right? And we don't really want to do that for obvious reasons. I think for probably most of us in the Kubernetes world, in a containerized landscape, we don't want to be logging on and messing around. So we need some sort of remote admin. Thankfully in the gaming world, Valve released an open protocol called RCon, which lets you do remote control of your game server. And it's been baked into Minecraft. Go Minecraft devs. I also want to use a mod called Voice Chat because I play with my children and, you know, it's really fun for them to bark orders at me to collect certain types of resources for them so they can use them in the build. And we do that through Voice Chat. So we're going to install that as a client. And of course that requires a service and a protocol so that our clients can interact with the server-side Voice Chat system. And then for backups we're going to use a good old Cron, but we're going to do it the Kubernetes way. So, you know, that's going to be the fun. And I have, let's see, about a maximum of 30 minutes to get into it. So just to give you a heads up, I'm going to go ahead and create a cluster. Forget the thing on the left, that's a typo. That'll come in later. But a thing that is handy to know, if you don't know about our machine types, it doesn't really matter. But this is a high memory, high CPU machine type. So I'm just going to create a node pool with one instance because I'm running one Minecraft server. And I can only run one instance of that server. And we'll get into why in a moment. 
And this is roughly the configuration that I'm going to use for that. And I'm just going to set a few other things to try and keep my Kubernetes cluster up to date and in good shape. So let's start with state management. There are some requirements for this particular piece of commercial off-the-shelf software. I didn't write them down because I'm a bad product manager. But I am an all right engineer. So, you know, it worked out for me. But one of the requirements of Minecraft, one of the things to know about this application, is that it keeps its state on disk. So when you create a new world, a new world is a bunch of files on disk in a directory. Furthermore, you can configure your Minecraft server. You can set a bunch of parameters. Like, am I playing in creative mode or survival? Am I playing on hard mode? Who's allowed to connect to this server? What is my whitelist? Which individuals can be an operator, right? These sorts of things that are important to know. And the devs for Minecraft have decided that the way to do that is on disk, right? So that's also on disk. All of this is inside of the data directory. So we obviously need to keep that data directory. We have to manage that as state. We don't want it to disappear if our game crashes, right? If our pod disappears, we don't want our actual world to disappear. That makes the children literally cry. So we don't want to do that. So for state management, and you know, there's a lot of configuration in here, and I'm not going to go through every line. I'm going to assume that we can all Google this or maybe even know what all of these things are. But I'm just going to highlight a couple of things as we walk through what the configuration looks like and why I chose the things I did. So one of the things that I want to do is I want to use a fast disk. So Minecraft is going to work better if your disk is fast. It's going to perform a little better.
It's also going to perform better if you have more CPU and memory, or the most you can have, right? So up to a point, because Java. But it's going to perform pretty well if you have a lot of resources. And so I'm going to use solid-state disks. And in order to do that, I just need to set up a storage class first. I'm going to call it fast disk here. And there are a couple of things I'm going to do. The thing I want to point out is the reclaim policy here, which is to retain. So if something happens where this gets yeeted from your Kubernetes state for your cluster, we can retain the actual disk underneath. And the other thing is, of course, volume expansion. And you're going to see right now that I'm going to start with about 50 gigs of data. That's a pretty sizable world. But if you play the game for a while, you know that you can outpace that 50 gigs pretty easily. And you want to be able to expand that. But you don't want to over-provision; you don't want to have to pay for 500 gigs or something like that up front. So that's one of the things we're going to do. The other thing we're going to do is set this access mode. So like a lot of, again, commercial off-the-shelf software that wasn't designed to work in a Kubernetes world and sort of scale in a cloud-first manner, this application runs one instance against one set of data. So you can't run multiple instances against the same set of data. We would run into all of the classic problems that we're familiar with from the last 30 or 40 years with disk-based state and multiple instances of our applications, like file locking and concurrent reading and writing. So we want to avoid all of those issues. And we're just going to say ReadWriteOncePod. So only one pod on our cluster can have access to read or write to this disk. That's one of the key things here. And of course we're specifying that we're going to use this fast disk storage class. So that allows us to get away with having our solid-state disk.
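The storage setup described above can be sketched roughly as follows. This is a minimal illustration, not the speaker's actual slide: it builds the two manifests as Python dictionaries and prints them as JSON, which `kubectl apply -f` accepts. The names `fast-disk` and `minecraft-data` are hypothetical, and the provisioner and `pd-ssd` disk type are assumptions that only apply to Google Kubernetes Engine; other providers use different values.

```python
import json


def storage_class(name="fast-disk"):
    """StorageClass for SSD-backed volumes that survive claim deletion."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        # Assumption: GKE's persistent disk CSI driver with SSD disks.
        "provisioner": "pd.csi.storage.gke.io",
        "parameters": {"type": "pd-ssd"},
        # Keep the underlying disk even if the claim is removed.
        "reclaimPolicy": "Retain",
        # Allow the claim to grow later instead of over-provisioning now.
        "allowVolumeExpansion": True,
    }


def volume_claim(name="minecraft-data", storage_class_name="fast-disk", size="50Gi"):
    """PersistentVolumeClaim that only a single pod may mount."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # One pod in the whole cluster can read or write this volume.
            "accessModes": ["ReadWriteOncePod"],
            "storageClassName": storage_class_name,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    manifests = {
        "apiVersion": "v1",
        "kind": "List",
        "items": [storage_class(), volume_claim()],
    }
    print(json.dumps(manifests, indent=2))
```

Running the script and piping its output to `kubectl apply -f -` would create both objects; growing the world later means raising the `size` value on the claim.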
And we also happen to have a secret. And the secret is actually for that RCon tool. So I mentioned that we're going to set up the protocol for doing remote management of this application. And we're using the RCon protocol. Again, Valve open sourced the protocol, and it needs a password to connect to it. So this is not a secure password. I don't recommend you using it. It isn't my real password. Yeah, use it if you want. Actually, it doesn't matter to me. Alright, application deployment. So let's get into this a little bit. This is of course going to be a bit longer. So one of the things here is our recreate strategy. So by default, many of us may know that when we go to update a deployment, it's going to attempt to do a rolling deploy. That's not going to work out so well for us if we have multiple things trying to access this state on disk, right? So instead of that, in this particular case we have to deal with the downtime and just recreate the pod. So we're going to shut down the old one, terminate it, and then we're going to start up another one. Super important for our state here. Here's a little bit more of the deployment under the specification here. So for our container I'm actually standing on the shoulders of giants. This user here, that's also his username on GitHub, created a Minecraft server Docker container for us, which is really handy. And I'm going to pretend that in the world of enterprise-grade work this person is on my team and built me a kickin' Docker container, and in fact that is what happens. So we've got a great image to work from and we're going to start from there, and I'm using latest, which should be a red flag for all of you. Totally not enterprise grade, send your pull requests, but hopefully everybody knows. Yeah, I saw a lot of head nods. We should pick our versions. Okay, cool. Yep, so then we're going to request some resources.
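A rough sketch of the secret and the recreate strategy described above, again as Python dictionaries printed as JSON for `kubectl apply -f`. The object names and the secret key `rcon-password` are hypothetical. The image name `itzg/minecraft-server` is an assumption about which community image the talk refers to; it is not named in the transcript.

```python
import json


def rcon_secret(password, name="rcon"):
    """Secret holding the RCon password (key name is hypothetical)."""
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "stringData": {"rcon-password": password},
    }


def minecraft_deployment(image, name="minecraft"):
    """Single-replica Deployment that replaces its pod instead of rolling."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": 1,
            # Terminate the old pod before starting the new one, so two
            # servers never touch the world files at the same time.
            "strategy": {"type": "Recreate"},
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


if __name__ == "__main__":
    # Assumption: community image name; the talk uses the latest tag, but
    # pinning a specific version is the safer choice.
    deployment = minecraft_deployment("itzg/minecraft-server:latest")
    secret = rcon_secret("change-me")
    print(json.dumps({"apiVersion": "v1", "kind": "List",
                      "items": [secret, deployment]}, indent=2))
```

The trade-off is the one the talk names: every update costs a short outage, in exchange for never having two pods writing to the same world.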
So in this case I'm going to request as much CPU as I can get and about 5 gigs of memory, and this particular machine type class I think has 6 gigs of memory, so I'm leaving a little bit of room for the overhead but basically consuming it all with this application. And I already know, because I've designed this solution, that I'm going to run one instance on one machine. One node will run one pod and that's it. And in fact, because again of the way that Minecraft works, it's going to be a noisy neighbor. It's going to consume and aggressively use the resources that you try and offer it. I wouldn't necessarily want to run more than one on a node. You know, your mileage may vary, but I would probably choose one per node going forward. And then I'm going to set up these volume mounts. So I mentioned state is managed on disk. In this container image configuration that state is at slash data from root. So we're going to mount our fast disk volume claim into that location. So that's going to be how we keep our state inside of this pod. So that's our stateful container. The dev that built our image provided, through environment variables, a bunch of ways to configure the server. I mentioned that we have all kinds of configuration settings that we can use. You can see here, and this is actually pretty important for Minecraft if you want to tune its actual execution, that you can supply a bunch of JVM arguments. For the sake of this slide and just space, I've cut out a bunch of what's actually in my configuration. And, you know, that's something you can Google for yourself if you haven't, when it comes to running the Java version of Minecraft, but you can see I'm going to set the memory at four gigs. Now this is interesting because I've requested five gigs for the pod and I'm using four. I'm giving myself a little bit of headroom here just in case, because I really want things to run smoothly. I accept the end user license agreement.
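The container settings covered so far might look something like the sketch below. It assumes the `itzg/minecraft-server` image (not named in the transcript), whose documented environment variables include `EULA` and `MEMORY`. The volume and claim names are hypothetical, and the CPU request is left as a parameter because the talk only says "as much CPU as I can get".

```python
import json


def server_container(cpu_request, volume_name="data"):
    """Container spec: 5Gi requested, 4G Java heap, world state under /data."""
    return {
        "name": "minecraft",
        # Assumption: community image; pin a version tag in real use.
        "image": "itzg/minecraft-server",
        "resources": {"requests": {"cpu": cpu_request, "memory": "5Gi"}},
        # The image keeps the world and server configuration in /data.
        "volumeMounts": [{"name": volume_name, "mountPath": "/data"}],
        "env": [
            {"name": "EULA", "value": "TRUE"},
            # Heap is smaller than the pod's request to leave headroom.
            {"name": "MEMORY", "value": "4G"},
        ],
    }


def pod_volumes(claim_name="minecraft-data", volume_name="data"):
    """Pod-level volume that points at the persistent volume claim."""
    return [{"name": volume_name,
             "persistentVolumeClaim": {"claimName": claim_name}}]


if __name__ == "__main__":
    # "2" CPUs is a placeholder, not a value from the talk.
    pod_spec = {"containers": [server_container("2")], "volumes": pod_volumes()}
    print(json.dumps(pod_spec, indent=2))
```

The printed fragment would slot into the pod template of a Deployment; the JVM tuning flags the talk mentions are deliberately left out because the transcript does not list them.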
I'm enabling RCon and I'm also setting the password through the environment as well, from a secret, right? So we set up that secret earlier. Cool. Okay. So our server exposes multiple network services on multiple ports, using a couple of different protocols, so we need to expose those as well. So this is something that I've learned to appreciate about Kubernetes in general. I come from a more platform, higher-abstraction application development world, and of course in that world a lot of your apps are connecting over one protocol, or you're using them over one protocol, and you only have one port. You can have a whole bunch of constraints and everything is fine. But I didn't write this software, it was given to me, I've got to run it, and I need more flexibility than that, right? So I already have to do custom things with disk and state, and I have to do custom things with ports. I can do that with Kubernetes. It's really not that big of a deal, right? It's designed to allow for that, and that's why I think we can get away with doing stateful and COTS solutions on Kubernetes in a pretty easy-to-manage way. So we've got voice chat, that's UDP, we've got RCon, we've got Minecraft itself. These ports are all default from the configuration of the software directly. And then finally I'm going to set up some probes, right? So I want to know when the server is ready, and I also want to know if it's alive, so we're just going to do some probes to check that sort of thing. It doesn't hurt to do it, it's a decent idea. And this is where I had some fun. So how do I get my mods on disk, right?
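The RCon settings, ports, and probes described above could be sketched like this. Several details are assumptions rather than facts from the talk: the environment variable names come from the `itzg/minecraft-server` image, port 24454/UDP is the usual default of the Simple Voice Chat mod, the probes are plain TCP checks because the transcript does not say which probe type was used, and all timing values are placeholders.

```python
import json


def rcon_env(secret_name="rcon", key="rcon-password"):
    """Enable RCon and read its password from a Secret (names hypothetical)."""
    return [
        {"name": "ENABLE_RCON", "value": "true"},
        {"name": "RCON_PASSWORD",
         "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}}},
    ]


def container_ports():
    """Default ports: game and RCon over TCP, voice chat over UDP."""
    return [
        {"name": "minecraft", "containerPort": 25565, "protocol": "TCP"},
        {"name": "rcon", "containerPort": 25575, "protocol": "TCP"},
        # Assumption: default port of the Simple Voice Chat mod.
        {"name": "voicechat", "containerPort": 24454, "protocol": "UDP"},
    ]


def probes():
    """TCP probes against the game port; delays are illustrative only."""
    check = {"tcpSocket": {"port": "minecraft"}}
    return {
        "readinessProbe": dict(check, initialDelaySeconds=30, periodSeconds=10),
        "livenessProbe": dict(check, initialDelaySeconds=120, periodSeconds=30),
    }


if __name__ == "__main__":
    fragment = {"env": rcon_env(), "ports": container_ports()}
    fragment.update(probes())
    print(json.dumps(fragment, indent=2))
```

A TCP check only proves the port is open; a probe that actually speaks the Minecraft status protocol would be a stronger signal if the image provides one.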
How do I get my mods into the server instance before it spins up, so that it knows that it has mods? The solution that I came up with is using initialization containers, and this is pretty dope. You can actually have an array of them. So I just have one here, but you can add another dash-args entry down below and you can have a whole list, and they run in order. And so that's pretty neat, because then you can separate them out. So this is the one to install voice chat, and I've got one for Fabric and one for the Vanilla Tweaks data pack, and so on and so forth. But you can see here that I am storing the mods as jars in a bucket that I can access and pull from, and I'm just using curl, using the curl image here for this initialization container. So I don't have to initialize using the Minecraft container. All I have to do is make sure I mount the data directory and I make it writable, right? So as long as I do that, I can actually write the state into it, and I do that before the game ever starts up, and that's how I can get my mods in there. Now it does this every single time I restart the server, but I don't restart the server that often, and it's pretty fast, and I think that's okay. I could optimize this with additional checks to bail out early, check freshness or whatever, but it's good enough. Cool. So init containers, pretty neat. Alright, now service availability. So we obviously have the Minecraft service itself. If I have a client and I want to connect to the server, I'd have to connect to some IP address, or if I set up DNS it would be a DNS name. We also have the voice chat, which connects through the client as well if the client has that mod enabled, and we have the RCon remote control API, so we have to set up these services.
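The init container approach described above can be sketched as follows. The bucket URLs and file names are made up for illustration, and `curlimages/curl` is an assumption about which curl image was used. Each entry becomes one init container, and Kubernetes runs init containers in list order before the main container starts.

```python
import json


def mod_init_containers(mods, volume_name="data"):
    """One curl init container per (name, url, destination) tuple, in order."""
    containers = []
    for name, url, dest in mods:
        containers.append({
            "name": f"install-{name}",
            # Assumption: the public curl image; any image with curl works.
            "image": "curlimages/curl",
            "command": ["curl"],
            "args": [
                "--fail", "--silent", "--show-error", "--location",
                "--create-dirs",  # create the target directory if missing
                "--output", dest, url,
            ],
            # Same volume as the server, mounted writable at /data.
            "volumeMounts": [{"name": volume_name, "mountPath": "/data"}],
        })
    return containers


if __name__ == "__main__":
    # Hypothetical bucket and jar names, purely for illustration.
    base = "https://storage.googleapis.com/example-mods-bucket"
    mods = [
        ("fabric-api", f"{base}/fabric-api.jar", "/data/mods/fabric-api.jar"),
        ("voicechat", f"{base}/voicechat.jar", "/data/mods/voicechat.jar"),
    ]
    print(json.dumps({"initContainers": mod_init_containers(mods)}, indent=2))
```

Depending on who owns the files on the volume, the init containers may also need a security context that lets them write to `/data`; the sketch leaves that out.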
So the first thing that I need to do is have an IP address. This is how I get my IP address and how I make it static. You can do it however you like. It's worth noting that by default this first command gives you an IPv4 address. I actually don't know if Minecraft can speak over IPv6, or if the client knows IPv6. I see some nods, I see some maybes. So, you know, I stuck with IPv4. You could also do a global IP, but all of my users are within the United States, and I feel like it's not so bad if it's all in one zone, so I just stuck with that. And then of course if you do a describe, that'll essentially tell you what the IP address is. So we have three services. The first one is Minecraft. This is relatively straightforward. We're gonna set up the load balancer and give it the IP address that we have that's static, and in this case I set external traffic policy to cluster. I set it to cluster because I didn't mind the hop. I think we could do local, and honestly I didn't check it, so you could do local or cluster. But it is nice to have the load balancer in there, so I like to have it. And we're gonna do the same for RCon here. Again, it's TCP, same IP address, and we're gonna do the load balancer. And finally voice chat. So for voice chat it's UDP, which of course we can do through load balancers, so no big deal, and the same IP address. That's the services. So now we've got the services exposed, but here's the big one: how do we do backup? I found this personally a little overly complex for my taste, and those of you who have had to deal with identity management and access to external services and having access to manipulate Kubernetes through the API have maybe already had to deal with this. There are many people in this audience right now who are more expert than me, so I'm just gonna walk you through what I did. But basically what I wanted to do is, about twice a day, I just wanted to take a snapshot, right? Take a snapshot, throw it into a storage bucket, it's versioned, and I keep a certain number
of versions. I think I chose to keep five, so that's about two and a half days of play. So essentially if one of my children, you know, TNT griefs the heck out of one of my other kids' builds, I have two and a half days to make it right, right? So that's my concern there. So the first thing I had to do is create a service account, you know, just a backup runner. Then I had to create a role to interact with Kubernetes, right? With the Kubernetes API. So essentially I need to be able to get information out of this pod. The data is in the pod, the pod's ID gets generated, and I need to be able to access it in order to run some commands, and you'll see why in a second. And then I have to bind this role. These are all pretty straightforward. There's no sort of fancy thing here, no magic necessarily. But then, because I want to store this on cloud storage in my, you know, infrastructure provider of choice, I have to manage that as well. So I have to create an IAM service account to interact with my infrastructure provider, and then I have to add that service account to a policy that can manage storage objects. So that's this right here: be a storage admin, let me do stuff with storage, and do it in my project. Alrighty, so now we're at this policy, which allows a Kubernetes workload to have an identity. So we're going to run a cron job on Kubernetes, and that's going to initialize a pod. That pod is going to log on to this other pod, get the data out of it, and put it in a GCS object, a Google Cloud Storage object. And in order to do that we have to be some identity, right? This application has to take on an identity, and I want to take on the service account for GCP in this case. So that's a workload identity user here, and I'm setting this member, the Minecraft backup runner SA, so essentially it can be this other service account. Which leads us to the cron job, which I was relatively new to when I first set this up. I set this up initially about a year ago,
but I was initially new to it, and this is super cool. So it's just like cron jobs of old. You say you want to run it twice a day. I'm going to say specifically that I do not want to run this concurrently, right? I only ever want to run one instance of this thing at a time, so if my world was massive or I was taking snapshots very frequently, I wouldn't have these things running over one another. And this is what the cron job looks like. So I'm going to run a pod, and the image I'm going to pull is the cloud SDK, because I need some tools to interact with my infrastructure provider, and it turns out I can just get this super lightweight cloud SDK. It's on Alpine. And then I'm going to install these components with gcloud, because I get gcloud from the cloud SDK, so I can get my Kubernetes API interaction and I can get my storage object management. So I'm going to get my pod ID using this gnarly thing here, and once I have that I can start executing commands. Now, I'm running in the cluster, so I'm using the RCon CLI. I mentioned already that this is a remote control API that you can use to manage the server remotely. You can also use the CLI, just like you'd use the Kubernetes API directly or you'd use the command line. And so I'm going to turn off saving in the game, and then I'm going to save everything that's currently in memory, get our snapshot, snapshot it, and turn saving back on. So there's a little blip here where maybe something would glitch out and we would actually lose some state if people were playing while it was happening. I haven't seen that happen, but it could happen. It's a race. And then I'm going to use gsutil to copy that new snapshot into the bucket. And again, this bucket is just what's called archive storage, and it gets versioned, so old versions just go away automatically. It costs me a little bit extra, because I'm using a cloud provider here, to yank it back out, but it's
really, really, really cheap to put it in, so that's nice. But if it happens to fail, then we'll restart, we'll just try again. And that's backups, so that's not too bad. This is what the backup configuration roughly looks like. These are the important bits that you would need to know. Again, I think you can do this on any provider that provides this kind of storage solution. It's not unique, but the idea is to turn on versioning to keep a certain number of days or a certain number of revisions, so you can do that sort of thing. Alright, so does it run? So if I start the client, and I just took snapshots, I just took pictures to be super safe, but if I start the client I would see my Minecraft server running on Kubernetes. I've set a maximum of four players. I've only whitelisted four players, that's how many I have, and nobody was on when I took the snapshot. And then when I logged in last I had nothing and I was in a cave, but it did run. And just for the heck of it we can try. I last deployed this last night, because, you know, conference talks, so it should still be up, and we will find out. So if I go to multiplayer, family server, I should still be stuck in a scary, scary cave. Yeah, there I am. Alright, yeah, thank you, thank you. And you can see that we have our internet connectivity here, so that's cool, and the bars, like on your mobile phone, indicate how good your latency is, and we're not doing too bad, so that's pretty cool. By the way, I was here earlier than most because I was prepping to give this talk, and it felt like we should have music. I don't know how many of you know this, but if you do play Minecraft, and if some of your constituents are small or fun, there's a parody of Katy Perry's Last Friday Night. I know it's on Spotify because that's what I use, but it's called Don't Mine at Night and it is very funny. It's like, I know you're looking at that cave and you're feeling kind of brave, don't mine at night. It's pretty good, so highly recommended. It's a bop.
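The backup job walked through earlier can be sketched as a CronJob manifest, once more as a Python dictionary printed as JSON. This is a reconstruction under stated assumptions, not the speaker's configuration: the service account name, pod label `app=minecraft`, bucket name, and the world directory name `world` are hypothetical, `google/cloud-sdk:alpine` is assumed to be the lightweight SDK image mentioned, and `rcon-cli` is assumed to be available inside the server container (it ships with the `itzg/minecraft-server` image).

```python
import json

# Shell script run by the job. The trap is an addition beyond the talk: it
# turns saving back on even if a later step fails.
BACKUP_SCRIPT = """set -eu
gcloud components install kubectl --quiet
POD=$(kubectl get pods -l app=minecraft -o jsonpath='{.items[0].metadata.name}')
kubectl exec "$POD" -- rcon-cli save-off
trap 'kubectl exec "$POD" -- rcon-cli save-on' EXIT
kubectl exec "$POD" -- rcon-cli save-all
kubectl exec "$POD" -- tar -czf - -C /data world > /tmp/world.tar.gz
kubectl exec "$POD" -- rcon-cli save-on
gsutil cp /tmp/world.tar.gz "gs://$BUCKET/world.tar.gz"
"""


def backup_cronjob(bucket, service_account="backup-runner"):
    """CronJob that snapshots the world twice a day, never concurrently."""
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": "minecraft-backup"},
        "spec": {
            "schedule": "0 */12 * * *",     # every 12 hours
            "concurrencyPolicy": "Forbid",  # skip a run if one is active
            "jobTemplate": {"spec": {"template": {"spec": {
                # Kubernetes service account bound to the cloud identity.
                "serviceAccountName": service_account,
                "restartPolicy": "OnFailure",  # retry if the script fails
                "containers": [{
                    "name": "backup",
                    "image": "google/cloud-sdk:alpine",
                    "command": ["/bin/sh", "-c"],
                    "args": [BACKUP_SCRIPT],
                    "env": [{"name": "BUCKET", "value": bucket}],
                }],
            }}}},
        },
    }


if __name__ == "__main__":
    print(json.dumps(backup_cronjob("example-archive-bucket"), indent=2))
```

Because every run writes the same object name, the retention the talk describes comes from object versioning and lifecycle rules on the bucket, not from the job itself. If the server is scaled to zero, the pod lookup fails and the job errors out, which matches the behavior described in the Q&A.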
But you don't want to go find my slides on the schedule and piece this together yourself by hand. It turns out the same person who put together the awesome image for Minecraft also put together the server charts and threw them on GitHub, and that's actually what I do use to manage my instances, and they are great. He has in fact also done the backup management and stuff for you, so if you want specifically to run Minecraft and do it in a decent way, this is a pretty cool way to do it, I think. And it works on, you know, Kubernetes. It's not provider specific, but he does have some notes here and there if you're using a particular provider. And then I did a couple of things custom, most notably the storage class for persistent disks. It's a certain configuration that works with my chosen infrastructure location, so that's pretty cool. And hopefully, through this, my goal was to explain some of the things that Kubernetes has on hand that I use to run an off-the-shelf piece of software, things that I also recommend when I talk to big companies and people who are trying to figure out how to containerize their stuff. You know, when it comes to off-the-shelf, the first question is, are you really sure that's what you need to do? And if they are really, really sure that's what they need to do, a lot of these tips and tricks are quite useful for a lot of solutions that you have to manage that could go into a Kubernetes cluster but need a little finessing of the configuration. We've got options for you, so hopefully you can take these and run with them. There's four minutes, if there's a question or two. Yeah, so the question is, have I thought about using a job template to restore the data? Totally, yes, that is a good idea. However, I didn't write anything to do data restore, because that's how good of an engineer I am. No, just because I honestly was like, I can solve that problem if I have it. At least I have the backups, the snapshots. I trust them, and I know that my customer
would alert me to an issue with this application immediately and I would have time to fix it. Those are my circumstances; they may not be everybody's. That's a good question. One last question. Oh, we have a question right here. Yep, go for it. Totally. This cluster configuration, I think on my infrastructure provider it's about $106 a month. Yeah. Now, Minecraft is really important to my family. Yeah. So yeah, no doubt, you don't have to run a 4 gig server instance. And again, getting nerdy about Minecraft specifically for a moment, there are a lot of mods that hyper-optimize both memory and CPU consumption, and you can get a really high quality experience with reduced resource claims. So I think for four players you could probably cut that in half and be happy for a while. But it's not free, that's a good point. Yeah, totally. Awesome question. So the question is, is this running 24/7 or can you optimize it? I do not run it 24/7, because sometimes my children get grounded, which is basically the same as maybe you only need this during business hours, like, you know, in one time zone. You can shut this down. So I shut it down. I don't have it automated, I just go and scale replicas to zero. So I just say run zero replicas, and when I run one, because I have the persistent volume claim and it is persistent, it just pulls the world back up and everything is good to go. Yeah, that is an excellent question. So in this particular case it's pretty easy, and you could throw that into a cron job if you wanted to. Another good question? Yeah, alright, last one. Yeah, so do the backups still try, and do they fail? Yeah, that's a good question too. Yeah, so they fail because they can't get the pod ID, and then there's an error. Yeah, totally. Yeah, that's a good question. Alright, I really appreciate it, everybody. It's a long walk back to whatever talk you're going to next, so thank you very much for coming down here.