Our next speaker is Marcus Noble, who works as a platform engineer at Giant Swarm. His work focuses on, you guessed it, Kubernetes, containers, and DevOps. In his talk he'll be covering how to run Kubernetes clusters without going mad or insane, with practical tips for any experience level with Kubernetes. Thank you very much.

Let me just get my slides up one second. Shall we give it a few minutes before we kick off, or shall we start now? Whenever you're ready. There are still people coming in, so I'll give it another minute, until five past, and then I'll start. OK, thanks. OK, I'm going to kick it off.

So as I said, I'm going to be talking about some practical tips that I've learned on how to manage Kubernetes clusters without losing your cool. Before I get into that, just a little bit about me to give you some context. I'm a platform engineer at Giant Swarm. If you want to find me on the web, I'm usually Marcus, or Marcus at k8s.social on Mastodon. I've got about five, probably six years' experience now running Kubernetes in production, and throughout that time I've gone through the different roles that interact with Kubernetes. I started out as an app developer working on full-stack Node.js applications, deploying my applications to Kubernetes. I migrated into a support team that built out tooling to help other teams work with their Kubernetes clusters. I then moved on to building system applications to be deployed onto Kubernetes, so controllers, operators, that kind of thing. And finally I moved into full-blown operation and management of Kubernetes clusters. So I've experienced a few different ways that people work with Kubernetes, and throughout that time I've picked up a few things that I'd like to share with everybody today.
So these are going to be my top tips for working with Kubernetes, and I've broken them down: the first five, anybody can pick up today. Anybody that's working with Kubernetes can start using these today if you're not already. For six and seven it would be good to have a little bit more old-school ops knowledge, so being comfortable in the terminal, using some Linux commands, et cetera. And because of how much time we've got today, I'm going to just summarize at the end some additional things that are worth looking into: more advanced ways of working with Kubernetes that can save you time and headaches when things go wrong.

So without further ado, let's kick off with my first tip, and that's: love your terminal. If anybody has worked with Kubernetes for any length of time, you'll know that at some point you'll be using kubectl to access your cluster and figure out what's going on. So regardless of what you use, whether that's Bash, Zsh, Fish, PowerShell on Windows, whatever it may be, I recommend spending a bit of time getting comfortable with it and working out how to best use it for yourself. Whether that's learning the different aliases that are available, learning the different tools that are available, or setting the right font in your terminal so you can be more comfortable working with it. I also highly recommend leveraging the RC files, like .bashrc, .zshrc, things like that, where you can build in your own aliases: short commands that expand to more complex commands behind the scenes. I use these a lot for commands that I type a lot, because with how clumsy I am when it comes to typing, I make a lot of typos, so I want them as short as possible. I also recommend looking out for dotfiles repositories on GitHub and the like, where people have shared their terminal shortcuts that you can copy and tweak for your own usage.
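As a concrete sketch of the kind of aliases described above, a snippet like this could live in a .bashrc or .zshrc. The names here are personal preference for this example, not a standard:

```shell
# Illustrative aliases for ~/.bashrc or ~/.zshrc; pick whatever is
# easiest for you to type.
alias k='kubectl'
alias kgp='kubectl get pods'
alias kga='kubectl get all --all-namespaces'

# A shell function works better than an alias when an argument lands
# mid-command; this one tails logs of the first pod matching a pattern.
klogs() {
  kubectl logs -f "$(kubectl get pods -o name | grep "$1" | head -n 1)"
}
```

Sourcing that file makes `k get pods` and `klogs my-app` available in every new shell.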
And mine are available there if you're at all interested. A quick follow-up from that: also learn to love kubectl itself. The first thing I recommend, going back to tip number one, is to add an alias for kubectl to shorten it to just k, to speed up typing out all these different commands. You can add additional ones if you want; if you need a command to get all pods, you can have an alias for that, or whatever it may be. Whatever you type repeatedly, where you don't want to keep making mistakes while clumsily typing at 2 a.m. responding to an alert, make a little alias or helper script for it.

I also highly recommend the official documentation on the kubernetes.io website for the different commands. There's a really good single-page reference that gives you information on all the commands, all the flags, all the options you can use for each of them. It's a great way to go in, Ctrl-F, and find what you want to work with. But there's also kubectl explain. This is a command in kubectl that lets you access the OpenAPI schema of resources in your cluster. So if you want to know what properties a particular resource supports and what values they can have, you can use kubectl explain to look at it right from your terminal. You can see an example on the right here, where we run kubectl explain pods.spec.containers and get all the documentation for the containers property in the pod spec. We can then dig into that and fill out the values as we need, without having to go off to some external resource. And this also works with custom resource definitions, providing they publish an OpenAPI schema along with their resource in the cluster.

Tip number three: working with multiple kubeconfigs. At some point it's more than likely that you're going to have more than one Kubernetes cluster that you work with.
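The kubectl explain workflow just described looks roughly like this. The commands need a live cluster, so they're shown as comments here, and the small pager helper at the end is a made-up convenience, not a standard tool:

```shell
# Documentation for the containers field of a pod spec:
#   kubectl explain pods.spec.containers
# Drill into nested properties the same way:
#   kubectl explain pods.spec.containers.resources.limits
# Works for CRDs too, provided they publish an OpenAPI schema:
#   kubectl explain clusters.spec

# Hypothetical helper: page through long schema output.
kx() { kubectl explain "$@" | less; }
```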
If you're an app developer, this may be multiple environments: dev, staging, production, a local kind cluster, whatever that may be. And if you're more of a platform engineering team, it may be tens to hundreds of different clusters that you're working with. Switching between the different clusters is fairly easy with kubectl, but prone to mistakes; you don't always know which cluster you're pointing at. So I highly recommend looking at one of these three projects. I personally use kubeswitch, at the bottom, which allows me to structure all my different kubeconfigs in a hierarchical directory structure and will just search over all of them for me when I'm switching. It's very nice. But all three of them do pretty much the same sort of thing: they let you switch between clusters and namespaces quite easily, so that all your subsequent commands go to that cluster.

If terminals are not really your thing and you're not keen on typing out loads of commands, the next thing to look at is interactive UIs. There are two that I'll recommend. If you're still comfortable enough in the terminal, I highly, highly recommend the k9s tool, which you can see here. It's a terminal-based interactive display of your cluster, and it allows you to do so much: view the current state of things, your pods, nodes, deployments, ingresses, whatever it may be. It lets you go in there and edit things, patch resources, and so on. I spend probably a significant amount of my time using this tool to interact with the various clusters that I've got, and I highly recommend it. But if the terminal is not really your thing and you want something a bit more visual, a bit more mouse-driven, I can also recommend OpenLens. It's the same basic principle, but as a desktop application where you can click with your mouse and see live updates of graphs, logs, things like that.
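Under the hood, switching tools build on a kubectl feature you can also use directly: kubectl merges every file listed in the KUBECONFIG variable. A minimal sketch, with illustrative file names:

```shell
# Keep one kubeconfig file per cluster and let kubectl merge them.
# The paths here are examples only.
export KUBECONFIG="$HOME/.kube/config:$HOME/.kube/dev.yaml:$HOME/.kube/prod.yaml"

# With clusters configured, you could then (not run here):
#   kubectl config get-contexts        # list every merged context
#   kubectl config use-context dev     # point subsequent commands at dev
#   kubectl config current-context     # double-check before anything scary
```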
These tools also allow you to view the logs from your pods as they're happening, which is very nice, and they give you a way of actually executing into the containers running in your clusters to do some debugging. We'll come back to that in a little bit.

The next thing I want to talk about is kubectl plugins. These are fantastic. kubectl is really clever in the way it handles plugins: basically, anything on your path within your terminal session that is prefixed with kubectl- implicitly becomes a kubectl plugin, and whatever comes after the hyphen becomes the name of the plugin. So for example, on the right here, we have a file called kubectl-hello, and all it does is echo out "hello, kubectl". We can then call that just by typing kubectl hello, and it becomes a plugin that we can use. Now, by writing out these bash scripts we can build our own little tooling, but the real power behind this is that there's a whole community building different plugins for kubectl that allow you to do lots of different things. There is a kubectl plugin called Krew that is a plugin to manage plugins for kubectl; it's very meta. And there's a website that lists all of the different plugins available, and you can very easily install them with kubectl krew install and then the plugin name.

Very quickly, some of the ones that I use a lot and are my favorites. Stern is a fantastic plugin that allows you to tail the logs of multiple containers and multiple pods at the same time, using filtering and so on. So if you have an application that spans three different controllers, for example, with four different replicas, Stern allows you to view all of those log streams together and look out for errors, which is very nice.
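A reconstruction of the kubectl-hello example from the slides. The script can be run directly; with its directory on $PATH and kubectl installed, `kubectl hello` would invoke the same file. The Krew and Stern commands at the end need a cluster, so they're shown as comments:

```shell
# Any executable named kubectl-<name> on $PATH becomes `kubectl <name>`.
mkdir -p ./bin
cat > ./bin/kubectl-hello <<'EOF'
#!/usr/bin/env bash
echo "hello, kubectl"
EOF
chmod +x ./bin/kubectl-hello

# Prove the plugin script itself works by calling it directly:
./bin/kubectl-hello    # prints: hello, kubectl

# With a cluster available you could then, for example:
#   kubectl krew install stern
#   kubectl stern 'my-app-.*' --since 15m   # tail logs across matching pods
```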
Tree is very cool if you work with things like Cluster API, which has a long hierarchical tree of resources owned by other resources. It allows you to point at, say, your Cluster resource in Cluster API and see all its descendants: you see the machines, you see the control plane. One that's very topical at the moment is the community-images plugin. What this does is take a look in your cluster and let you know if you're referencing any container images still using the old k8s.gcr.io registry. If you're not already aware, that registry is being deprecated and replaced by a new one, so I highly recommend running this against the clusters you manage, to see if there are any you've missed, and getting those updated to use the new registry. And then finally, just a little shout-out to the gs plugin that we have at Giant Swarm. This is what we provide for our customers to make it easy for them to work with the clusters that we provide and manage for them. We've built this tooling that plugs straight into kubectl like everything else, and gives them some helper functions around getting access to their various workload clusters and things like that.

So that's section one done, nice and cool. As I said, the next two tips could ideally do with a little bit of ops background, but I think everybody's going to be comfortable enough with them if tips one to five weren't a problem. So: pod debugging. Anybody that's used Kubernetes for a reasonable amount of time will have had to debug a broken pod or a broken deployment at some point; it's inevitable, the way I see it. There are various good ways of doing this, and I have some helper tools that I like to use for this process. So, going back to tip number one again with our terminal, we're using these aliases and things like that.
I have this alias called kshell that is just a wrapper around a kubectl run command, and it lets me create a new temporary pod within my cluster that just has bash, or you can swap that for Alpine or Ubuntu or whatever tooling you need, for general cluster-wide debugging. I use it a lot when I'm seeing issues like cross-pod communication problems. If something looks a bit off in the cluster as a whole, this is my go-to tool to start figuring out what's going on. So for example, I'll launch this kshell and then do an nslookup against google.com to see if DNS resolution is working, so whether CoreDNS is actually working or not within my cluster. As you can see here, this returns a valid response, so I can then move on to whatever the next thing might be that could be causing the issue.

The next thing I want to talk about is kubectl exec. This allows you to drop into a shell within a running container in your cluster, to figure out what's going wrong inside that container that might be causing an error. There are some caveats. The container you're trying to execute into needs a valid shell environment for you to drop into, since it's an interactive terminal. So for example, if you're building containers from scratch that just contain a single Go binary, this approach won't work for those containers. You're also limited to what's available within the container itself, or what you're able to pull into the container. All the debugging tools you might want are likely not going to be in that container, because you don't necessarily want to be deploying those to production, so you may have to rely on installing them into a running container to start debugging these problems. And then the final caveat, which is quite a big one: the container needs to be running.
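A hedged sketch of a kshell-style wrapper like the alias described above. The function name, pod name, and default image are assumptions for this example, and actually running it needs a live cluster:

```shell
# Start a throwaway interactive pod for cluster-wide debugging.
# --rm deletes the pod again when the shell exits.
kshell() {
  kubectl run "tmp-shell-$RANDOM" --rm -it --restart=Never \
    --image="${1:-ubuntu}" -- bash
}

# Typical use (not run here):
#   kshell                      # ubuntu pod with bash
#   kshell nicolaka/netshoot    # then e.g. `nslookup google.com` inside it
```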
So if you've got a CrashLoopBackOff, this isn't going to work. As an example: if I'm trying to exec into a container that is just a Go binary, we see that first error. Basically, we don't have a shell environment to drop into, so this won't work for us.

Similarly, there is also kubectl debug, which became available in Kubernetes 1.23, and this is one of my go-to debugging tools that I use all the time. It has three different functions, but one of the main ones I use is that it allows you to create a new ephemeral container in an existing pod, with whatever container image you want. So I can create a new container inside the pod that's having issues, with a bunch of debugging tooling built into that container's image, and start trying to figure out what's going wrong. This is useful if we suspect something like a network policy blocking access, or an IP networking issue, where we want to figure out why this particular pod can't communicate with another pod. And a little example: if we've got a pod that's in CrashLoopBackOff, kubectl exec won't work. We can drop into the shell while it's running, but as soon as that container exits we get kicked out, so we potentially have a matter of seconds to do some debugging. Whereas kubectl debug lets us bring up another container alongside our broken container and then take our time figuring out what's going on. And there's a quick overview here of when to use each of the different approaches.

Node debugging: if all your pods are looking good, we may want to move on to the nodes themselves, because maybe something there is going wrong. We're going back to kubectl debug again; one of the other capabilities it has is node debugging. This allows you to say kubectl debug node/ and then the name of that node, and you can then give it a container image.
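Sketches of the kubectl debug invocations described above, including the node variant. Pod, container, node, and image names are all illustrative, and a live cluster is needed, so the commands are shown as comments; the wrapper at the end is an assumed convenience for the first form:

```shell
# Attach an ephemeral debug container to a running pod, using an image
# with your debugging tools, targeting the broken container's process
# namespace:
#   kubectl debug -it my-broken-pod --image=nicolaka/netshoot --target=my-app

# Copy a crash-looping pod and override its command so it stays alive:
#   kubectl debug my-broken-pod -it --copy-to=my-debug-copy \
#     --container=my-app -- sh

# The node variant: launch a debugging pod pinned to a given node:
#   kubectl debug node/my-worker-node -it --image=ubuntu

# Assumed convenience wrapper for the first form:
debug_pod() {
  kubectl debug -it "$1" --image="${3:-busybox}" --target="$2"
}
```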
And what this does is launch a debugging pod on that node specifically, sharing the node's process namespace and mounting the node's host filesystem at /host. So you get full access to the node, to an extent, and you can then start debugging whether there's something wrong on the actual node itself that's causing problems with your Kubernetes cluster. One of the questions I get with this is: why not just SSH onto the node? I personally prefer to have ephemeral instances where you don't have the ability to make changes to your running cluster, so no SSH, no port 22 open, things like that. But if you've got SSH access, that may be a better way of doing this. One of the other things this approach does provide, though, is that access control for this action is managed by RBAC, so you can have RBAC rules saying who can and can't perform it.

If you're on a version before Kubernetes 1.23, there is another workaround for this: you can launch a privileged container that runs the nsenter command. This command has even gone around as a tweet and as a sticker, from what I gather. It does have a caveat, though: the node you're trying to debug needs to have a valid shell. So if you're using Talos Linux, for example, for your Kubernetes cluster, this will not work, whereas kubectl debug will. I have an example of this on my GitHub, and Giant Swarm also has a similar kubectl plugin for this option.

So those are the tips that I'm going to cover today. I've been going over this pretty quickly, but I want to very briefly talk about, if you've covered all of those, what to look at next and where you can get some real power out of making your clusters work for you. So: webhooks. Those that know me know that I have a bit of a love-hate relationship with webhooks in Kubernetes.
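The pre-1.23 privileged-pod workaround just mentioned can be sketched as a manifest like this. The node name and image are illustrative; the file is written locally here and would be applied with kubectl against a real cluster:

```shell
# A privileged pod pinned to one node, using nsenter to join the host's
# namespaces (so this needs a node with a shell available).
cat > nsenter-debug.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: nsenter-debug
spec:
  nodeName: my-worker-node   # pin to the node you want to inspect
  hostPID: true              # share the host's process namespace
  containers:
  - name: shell
    image: alpine
    command: ["nsenter", "--target", "1", "--mount", "--uts",
              "--ipc", "--net", "--pid", "--", "sh"]
    stdin: true
    tty: true
    securityContext:
      privileged: true
EOF

# Then, against a real cluster:
#   kubectl apply -f nsenter-debug.yaml
#   kubectl attach -it nsenter-debug -c shell
```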
Webhooks are a very core piece of functionality in Kubernetes. They provide a lot of power and a lot of capabilities, but if not used correctly they can break your cluster, and I have another talk where I go through various scenarios where bad webhooks have taken down entire clusters. So yes, use with caution. But they allow you to do things like implement more advanced RBAC capabilities. RBAC is purely additive; webhooks let us take permissions away. There's a good example where we had an issue at Giant Swarm where we needed to take away one very particular permission, because we found a bug in one of our CLI tools and it would take too long to get the CLI update out to all of our customers. So we needed to block it in the meantime, and we did that using a validating webhook. Webhooks also let you do things like defaulting logic, and enforce policies like "you can't use the latest tag on your container images". And one of my favorite examples is using them to hot-fix security issues. I'm sure everyone remembers the Log4Shell incident from a few years back; there's actually a Kyverno policy that mitigates that vulnerability by setting a specific environment variable on all containers running within your cluster, so it actually shuts off that vulnerability cluster-wide. I recommend looking at Kyverno or OPA Gatekeeper or similar tools that give you a more abstracted way of working with webhooks and allow you to do it in a more declarative way. With Kyverno, for example, you create a policy resource within your cluster, and that is picked up by Kyverno and implemented for you.

Similarly, the Kubernetes API itself is very powerful. All of these tools are obviously using the API in some regard, and there's a lot of tooling, libraries, and general help out there to let you leverage it in your own projects.
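A Kyverno policy of the kind described above, for the "no latest tag" rule, looks roughly like this. The field names follow Kyverno's ClusterPolicy CRD as a best-effort sketch, so validate it against the Kyverno version you actually run:

```shell
# Written to a file here; with Kyverno installed in a cluster you would
# `kubectl apply -f` it, and pods using a :latest image would be rejected.
cat > disallow-latest.yaml <<'EOF'
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-latest-tag
spec:
  validationFailureAction: Enforce
  rules:
  - name: require-pinned-tag
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Using the :latest image tag is not allowed."
      pattern:
        spec:
          containers:
          - image: "!*:latest"
EOF

#   kubectl apply -f disallow-latest.yaml
```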
So if you're comfortable with Go, for example, there's client-go, which 80 to 90% of all operators seem to be using. But there is also an organization on GitHub called kubernetes-client, and under that there are a lot of officially supported clients for the Kubernetes API in all sorts of different languages: Node.js, Python, et cetera, whatever you're comfortable with. You can use these to build out your own tooling. So we could take those little bash scripts that we were building in tip one and turn them into more robust applications for our repetitive tooling, debugging, whatever it is. And then you may also want to look at building your own operators and extending Kubernetes itself, so that it does the work for you rather than you having to do too much yourself.

And on that note, let's talk about CRDs and operators. CRDs are custom resource definitions. Kubernetes is fantastic in the way it allows you to extend Kubernetes itself, extend what it offers, and extend the logic built into it. You can define your own resources that live within Kubernetes, and then build your own operators that do work against those resources. This allows you to basically put your cluster on autopilot: you can build in logic that uses the reconciliation loop to make sure that, based on the state of a particular resource, the other things you want to happen actually happen. The diagram here is taken from a brilliant article by Container Solutions; I highly recommend reading it if you're interested in CRDs and operators. But the basic idea is: a user, or something else, submits or updates a custom resource in the cluster. An operator running in that cluster watches for changes to those resources, whether creations, deletions, or updates. It takes those changes, performs some business logic, and then updates something else. A good example of this is the Cluster API project.
So you have a Cluster resource that you apply, and Cluster API watches for those. When it sees a new one, it goes off and creates a cluster on the provider, wherever it is that you've configured it to.

So let's recap. Love your terminal: if you're going to be using Kubernetes day in, day out, I highly recommend being comfortable with your terminal, your CLI, the kubectl commands, and these sorts of things. Love kubectl: make sure you're comfortable with the various things it offers, like kubectl explain, kubectl debug, and all the other options available to you, and take a look at that one-page documentation of all the different commands. If you work with multiple clusters, make it easy for yourself and use the tooling that lets you switch kubeconfigs easily. Have a look at the UI tools, k9s and OpenLens, and see how they can help your development and your debugging. If you want to extend kubectl, have a look at some kubectl plugins; Stern and Tree and things like that I highly recommend, but there's a big list of community-provided ones on the Krew website that I recommend looking through, picking out the ones that look like they'll solve your problems for you. And then pod debugging and node debugging: leverage the tools we've got, the aliases in our terminal, the kubectl debug command, and scripts and things like that, to make it as easy as possible to fix things at 2 a.m. when we've had an alert that our cluster's broken. And if you want to take it further, have a look into Kubernetes webhooks, mutating or validating; we can use the API to make our tooling more robust and more powerful; and there are CRDs and controllers for actually making our cluster do the work for us, so we don't have to. And with that, I'd like to say thank you.