If I can see around this curtain, I'm going to be talking about security hardening for databases using Kubernetes operators. This is a 25-minute talk, and we've just burned three or four minutes playing around with the video problems, so I'll try to get through it quickly, but without sounding like one of those lawyers on the commercials.
So, is it safe? I don't know if anybody here saw Marathon Man. If you did, you probably haven't been to a dentist since then, but that's where this title is from.
Just a little bit about me. I work for a company called Altinity, and I am a DOK ambassador, so I've been working on data on Kubernetes for a number of years. Part of the reason for that is that my company is a provider for ClickHouse, both enterprise support and cloud. We let people run it anywhere. About five years ago we made a decision to put all of our database processing onto Kubernetes, at a time when, to me, that seemed really stupid. We had a big argument about it. In fact, when I was first getting my job, which as a side note is to run the company, I thought it was the dumbest idea I'd ever heard, and I've been working with databases for a long time. Anyway, we did it, and it worked out great.
One of the things we've done is write probably the first operator for a data warehouse. It used to be called the ClickHouse operator, but now we call it the Altinity Kubernetes Operator for ClickHouse, just to be clear about it. So that's where our background comes from. In this talk I'm mostly going to use ClickHouse for examples, but I want to be absolutely clear that I'm doing that because those are the ones where I can easily get code that I know works. Everything I'm saying here applies generally to database operators, and I'll come back to that theme.
So first of all, I think everybody in this room probably knows that Kubernetes orchestrates container-based applications. In a very simple case, you have a server. In this case it's ClickHouse, but it could be MySQL, Postgres, something else. It wants to talk to a patch of storage and do useful things for you: process transactions, process queries. To set that up in Kubernetes, you create a resource model. In our particular case that's a StatefulSet, pod definitions, persistent volume claims, and PVs, and when those appear in Kubernetes, Kubernetes will, of course, rearrange the underlying storage and infrastructure to make it true.
For a variety of reasons, Kubernetes turns out to be an absolutely awesome platform for databases. One of them is the portability story. It's not just, hey, we might have to be multi-cloud. I'd like to be able to test something on my laptop and then also deploy it to Amazon EKS and run it in the cloud. Basically the same description of the application, and roughly the same configuration, will work in both of those locations.
The problem is that since Kubernetes has become a good place to run data, that's where people will go if they want to steal it. So what we're going to do is talk about security. When you dig into the details of protecting a database, it's complicated and ugly. I'm just picking ours as an example again, but this could be Cassandra, this could be MySQL with multiple nodes, this could be any database you choose. Kubernetes is a great system, but it is not simple. Databases are not simple either, so you combine two not-simple things and you have a fairly intimidating security problem. One of the things I recommend is to simplify the problem by framing it in a way that divides it into pieces you can solve separately.
So yes, this is the punchline on that picture. We can think about database protection on Kubernetes as having three parts. The first is just to protect the database as an application running inside Kubernetes. The second is to protect Kubernetes itself, so that people can't get into it in the first place and so that we are not, by inserting the database, somehow compromising protections that Kubernetes should be offering. The final one, which you might forget if you haven't dealt with this before, is to protect external data. It turns out databases increasingly talk to things outside of the environment, sometimes outside of your cloud domain, and certainly outside of the Kubernetes cluster itself. So we need to think about all three of these things. In this talk I won't say very much about protecting Kubernetes, because we don't have a lot of time and it's mostly standard stuff. We'll focus mostly on the first part and a little bit on the third.
The first step toward security is just to simplify things as much as possible. Complicated things are difficult to predict. Your eyes kind of glaze over, you miss things, and there are more attack surfaces. One of the reasons that Kubernetes is such a great environment for data is the presence of operators. What operators do is define what's called a custom resource definition, which allows you to reduce a potentially complex database configuration to a few inches of YAML. In the simple case of a server and storage, that's a teeny bit of YAML, maybe 12 lines long, something like that. The operator is a type of controller. When that YAML is submitted, Kubernetes will snag it, recognize from the resource type that there's an operator that takes care of it, and send it to the operator, which will then look at what's going on out in the world and make it happen. How does it do that?
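As a rough illustration of the "few inches of YAML" for one server plus storage, here is a minimal sketch. It assumes the `ClickHouseInstallation` resource type of the Altinity operator; the names `demo`, `cluster`, and `data` and the 10Gi size are placeholders that do not come from the talk, and the field names should be checked against the operator version you actually run.

```yaml
# Sketch only: a one-node ClickHouse definition for an operator to reconcile.
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: "demo"                        # hypothetical installation name
spec:
  configuration:
    clusters:
      - name: "cluster"               # hypothetical cluster name
        layout:
          shardsCount: 1              # one shard ...
          replicasCount: 1            # ... with one replica: a single server
  defaults:
    templates:
      dataVolumeClaimTemplate: data   # use the volume template defined below
  templates:
    volumeClaimTemplates:
      - name: data
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi           # placeholder size
```

The operator expands a definition like this into the StatefulSet, pods, and persistent volume claims, which is why the surface you have to review stays small.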
Again, I think most people here know this. It uses reconciliation. You have the definition of the database, and you have the resources that already exist in Kubernetes, and the operator just reconciles them. It makes the resource definitions that Kubernetes uses to manipulate infrastructure match what you have in your own resource definition. It's not quite as simple as that, because operators also do this in a certain order. If we have a bunch of servers and we make a change that affects all of them, we will typically apply it one at a time, or maybe in groups, so that we don't take the whole database down. In that sense an operator is kind of like an experienced administrator who does things in the best way, so that your applications are correctly configured but also don't change in ways that disrupt processing.
Where this becomes a really huge win is not for that little one-node database but for real databases, which often have a lot of nodes. One of our biggest customers runs what they call a Frankenstein cluster. It has 550 servers running in Amazon EKS. That is a lot. In our own cloud, we often manage clusters that have dozens of nodes. In cases like that, having this YAML definition that maps to the actual resources means that the surface we have to examine and think about protecting is much, much smaller.
So for me, the first thing you want to do is pick an operator, and then ask: hey, is this thing going to help me with security? Good operators do have built-in security features. This is just an example from our ClickHouse operator, but operators for many other databases do similar things. You know how, if you just install MySQL, you can actually set it up so that if you log in as root, you go straight into MySQL with no password?
That's actually bad. So a good operator will take those default accounts and stick a password on them. It might write the password into a secret so you can find out what it is. It'll do things like restrict that default user to certain IP addresses so that, by default, it can only be accessed from localhost, for example, or from IPs that belong to the cluster. And it might do things like secure your communications with other databases in the cluster. These are things that a good operator will just do out of the box.
Now, that's the easy stuff. One of the first things you have to do with databases is actually manage credentials, and most databases have a variety of ways of doing this, so we would expect operators to help us with it. In fact, with databases it's not just credentials for users logging into the database. It's also things like keys that are used to talk to S3 object storage or to Kafka. These are things that the operator should help us protect.
The standard way to manage credentials securely in Kubernetes is Secrets. That's a resource type that allows you to pass information between the user who creates the resource, that user being an administrator, and the process or container that actually needs to use it. While they're being passed around, they are stored reasonably securely, so that somebody can't see them and make off with them. So what you want to see is some sort of secret management in your operator that enables you to use Secrets effectively. In fact, most operators that I have seen do understand Secrets at some level now. It didn't used to be that way, but it's pretty common.
For example, our operator has some built-in syntax for passwords. If I create a secret named db-passwords, there's a little bit of extra syntax that this operator recognizes when it sees it in the resource definition.
It'll say, hey, I need to go fetch the password out of the secret. Now, that's one way to do it. In fact, there's a range of ways you can do this. Another thing you want to be able to do is use the operator to leverage the general mechanisms that Kubernetes gives you to manage credentials. One of the things that's kind of nice about Kubernetes is that if you can get in and manipulate the pod definition, you can pass things in as environment variables. This is a pretty standard way of talking to cloud resources, for example: you just stick the credentials into environment variables and they are read by the running process. In this case, that lets the server read S3 storage, which it needs in order to operate. What that means is that when you pick an operator, you don't want it to be a black box where you can't control what's going on. It's very nice if you can get in and manipulate the pod definitions it creates, so that you can inject passwords in whatever way works best for that process.
So passwords are sort of the base protection, but we still want to encrypt the communication. That's the next thing to look at. Databases have for many, many years supported what used to be called SSL encryption and is now typically called TLS. You have inbound connections coming from clients, and you want those to be TLS encrypted. Another thing that's really important, particularly in clusters, is to protect data moving between the nodes of the cluster. In the case of Postgres or MySQL, that might be replication, where a secondary is attached to a primary and fetching data. In an analytic database cluster, it might be a federated query that's getting blasted out over 10 servers. You want that to be protected as well, or at least to have the option.
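The environment-variable approach for cloud credentials can be sketched with plain Kubernetes objects. This is a minimal sketch, not the operator's own syntax: the Secret name `s3-credentials`, its keys, the pod name, and the image tag are all placeholders chosen for illustration. In an operator-managed deployment, the `env` entries would go into whatever pod template the operator lets you edit, and whether the database picks up these variable names depends on its own configuration.

```yaml
# Sketch only: S3 keys kept in a Secret and injected as environment variables.
apiVersion: v1
kind: Secret
metadata:
  name: s3-credentials                # hypothetical Secret name
type: Opaque
stringData:
  access-key-id: "REPLACE_ME"         # placeholder, never commit real keys
  secret-access-key: "REPLACE_ME"
---
apiVersion: v1
kind: Pod
metadata:
  name: db-server                     # hypothetical pod name
spec:
  containers:
    - name: clickhouse
      image: clickhouse/clickhouse-server:latest   # assumed image for the example
      env:
        - name: AWS_ACCESS_KEY_ID     # conventional AWS SDK variable names
          valueFrom:
            secretKeyRef:
              name: s3-credentials
              key: access-key-id
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: s3-credentials
              key: secret-access-key
```

The values never appear in the pod definition itself, only references to the Secret, which is what makes it reasonable to keep the pod definition in version control.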
So the protection mechanism is universally the same. You're going to have a certificate for the server and a matching private key. Then, either implicitly or explicitly, you will have a CA cert, or maybe even a chain of certificate authority certificates, that allows you to verify the server certificate. You need to get these files out to the servers so that your database servers can actually see them. Again, you're going to look for features in the operator that help you do this.
There are two aspects to this. First of all, since you're now going to be encrypting communications, you want to make sure that the encrypted ports are the only ones that are open. It's great to encrypt ports, but you'd like to close the non-encrypted ones so that they're just shut off. These are examples of the syntax that we use, but other operators have similar ideas. They will simply shut down the non-encrypted ports and not allow them to operate.
The other aspect is to actually get the certificate, the key, and the CA certs, as the case may be, into position on the file system so that the server can see them when it starts up. In that case, again, we can use Secrets. This behavior is very well supported in Kubernetes. As you can see from these little gray examples of Secrets, they're simply not very complicated. All you want is an operator that can consume them, and then it's easy enough to create and supply them.
So it's great to have in-flight data protected, but that's only half the battle. Another thing you want to do is make sure that the data you have in storage, in other words at rest, is also protected.
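Getting the certificate, key, and CA cert onto the server's file system can be sketched with a standard TLS Secret mounted as a volume. This is a minimal sketch using only core Kubernetes features; the Secret name, pod name, image, and mount path are placeholders, and the database still has to be configured separately to read the files from that path.

```yaml
# Sketch only: TLS material stored in a Secret and mounted read-only into the pod.
apiVersion: v1
kind: Secret
metadata:
  name: db-server-tls                       # hypothetical Secret name
type: kubernetes.io/tls
data:
  tls.crt: REPLACE_WITH_BASE64_PEM_CERT     # server certificate
  tls.key: REPLACE_WITH_BASE64_PEM_KEY      # matching private key
  ca.crt: REPLACE_WITH_BASE64_PEM_CA        # optional CA cert or chain
---
apiVersion: v1
kind: Pod
metadata:
  name: db-server                           # hypothetical pod name
spec:
  containers:
    - name: clickhouse
      image: clickhouse/clickhouse-server:latest   # assumed image for the example
      volumeMounts:
        - name: tls
          mountPath: /etc/db-tls            # hypothetical path
          readOnly: true
  volumes:
    - name: tls
      secret:
        secretName: db-server-tls           # each key becomes a file under mountPath
```

The same pattern works for any database container; only the mount path and the server-side TLS settings change.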
Many databases now have some form of encryption available within the database itself. Obviously proprietary databases like Oracle have done this for a long time. But there's also a very simple way of doing encryption at the file system or storage level in Kubernetes, and I just want to talk about that. The first thing is to be crystal clear about how Kubernetes allocates storage. Probably most people in here know how storage classes work, but in a word, a storage class defines a mapping from a name, or basically a set of properties, to a particular type of storage. For example, gp3-encrypted might map to Amazon EBS using the gp3 storage type with automatic encryption. You'll typically define a storage class, and luckily these are very simple objects, and insert it into your Kubernetes cluster by applying it with kubectl apply -f. It might be a document just like this one, which is pretty typical. This is EBS storage, and it also allows volume expansion, which is another very handy feature but not necessarily related to encryption. This will ensure that the block storage is actually encrypted wherever it lives on EBS.
At that point, if you're using an operator that gives you control over the storage definition, it's basically a one-word change to apply this. The key is that the operator gives you control over the volume definitions in such a way that you can leverage existing Kubernetes mechanisms, in this case storage classes. It's the same thing as what I was doing with Secrets, where the operator allows me to control how pod definitions are set up so that I can use these Kubernetes features.
So that's controlling the database itself. It's also useful to think about operational aspects: the code that we're actually running, whether it's safe and free of exploits, and configuration control.
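An encrypted storage class of the kind described here might look like the following. This is a sketch that assumes the AWS EBS CSI driver is installed in the cluster; the claim name and size are placeholders. The second document shows the "one-word change": the volume claim simply names the storage class.

```yaml
# Sketch only: encrypted gp3 EBS storage with volume expansion enabled.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-encrypted
provisioner: ebs.csi.aws.com          # assumes the AWS EBS CSI driver
parameters:
  type: gp3
  encrypted: "true"                   # volumes are encrypted at rest by EBS
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true            # handy, but unrelated to encryption
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data                          # hypothetical claim name
spec:
  storageClassName: gp3-encrypted     # the one-word change
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi                  # placeholder size
```

With an operator, the same `storageClassName` line would go into the operator's volume claim template rather than a hand-written claim.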
Configuration control is really important in security, because a lot of handling security is not preventing problems in the first place but remediating them quickly and safely when they occur. So let's talk about that for a minute.
When I'm looking at operators, the first thing I like to look at when I go out to the repos is, hey, what kind of build process do these folks have? In our build process, for example, we use Trivy and Docker Scout. Docker Scout is free. Trivy is an open source scanning tool for containers. It's great. It runs off the command line and installs in literally seconds. There's no excuse for not running stuff like this in your build pipeline. Moreover, you can think of operators as a conduit for other code, because when they start the database they're going to pull down the containers that hold the database images. Are those scanned as well? So you want to look at that and make sure that there's some sort of scanning and some sort of CVE management going on. If you see that, then you feel like, okay, if there were a problem, there's a chance it would be detected, or there's a way of reporting it, and if reported, it would be fixed. That just goes under the phrase good development hygiene, or good security hygiene.
The other thing that's really important for large systems is fairly tight configuration control. What has increasingly arisen, particularly in the world we work in, which is analytics stacks, is that these are large, complex services, and people are increasingly controlling them using GitOps. One of the most interesting developments of the past few years is projects like Argo CD.
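The build-time scanning described above could be wired into a pipeline roughly like this. It is a sketch of a GitHub Actions workflow using the publicly available Trivy action, not the actual build configuration of any operator; the workflow name, image name, and severity threshold are assumptions made for the example.

```yaml
# Sketch only: build an image and fail the build on serious known CVEs.
name: image-scan                      # hypothetical workflow name
on:
  push:
    branches: [main]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image
        run: docker build -t example/db-operator:${{ github.sha }} .   # placeholder image name
      - name: Scan image with Trivy
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: example/db-operator:${{ github.sha }}
          severity: CRITICAL,HIGH     # assumed threshold
          exit-code: "1"              # non-zero exit stops the pipeline
          ignore-unfixed: true        # skip CVEs that have no fix yet
```

The same scan can be pointed at the database images the operator pulls, which covers the "conduit for other code" concern.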
Argo CD has this wonderful property. Complex installations involve all these different types of services, like Grafana, Prometheus, ClickHouse, the ClickHouse operator, and MySQL, which might all have to work together to make an application work. Argo CD papers over the differences in how they are installed. It allows you to stick them into GitHub and basically build the entire application off the code that's sitting in GitHub, and not just build it but also change it and have those changes synced out to the deployment in Kubernetes. This is a really important development. In fact, we have a lot of users who are running things on Kubernetes themselves, and I would say at this point the majority of them are probably using Argo CD. There are other projects like Flux. How many people in here are using Flux? How many people are using Argo CD? Yeah, it's a great project. We're doing a lot of work with it. These are things that give you the ability to respond quickly to security problems when they arise.
This is a short talk, so we're not going too deeply into the details, but I think a final thing when you're protecting databases in Kubernetes is not to forget the data, those poor bits that go to live outside of Kubernetes. In modern databases there are at least three ways that this can happen, and they are actually integral to the operation. First of all, backups. If somebody can get your backups and they are not encrypted, well, they don't have to break in. That's particularly true of backups that are stuck in object storage. Object storage is wonderful, but anybody with a network connection can theoretically get to it. So you want to think really hard about what's happening to backups, because that is your production data. Another is tables, which, particularly in analytic systems, are increasingly backed by object storage.
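The GitOps flow described at the start of this passage, where the deployment is built from a repository and changes are synced out to Kubernetes, can be sketched as an Argo CD `Application`. The repository URL, path, and names below are placeholders invented for the example.

```yaml
# Sketch only: Argo CD keeps the cluster in sync with manifests stored in Git.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: analytics-db                  # hypothetical application name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/analytics-stack.git   # placeholder repository
    targetRevision: main
    path: clickhouse                  # directory holding the database manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: analytics              # hypothetical target namespace
  syncPolicy:
    automated:
      prune: true                     # remove resources deleted from Git
      selfHeal: true                  # revert manual drift in the cluster
    syncOptions:
      - CreateNamespace=true
```

With a setup like this, remediating a security problem becomes a reviewed commit, for example bumping an image tag, rather than a manual change on the cluster.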
Backing tables with object storage lets you store more data, because the economics are so much better. It can be 10 to 15 times cheaper than keeping things on block storage. So how is that data protected? Again, if somebody can get in and attack that, they have access to your data without even breaking into Kubernetes. The final one, which I think people tend to forget about but which we have seen come up, is log messages. When a query fails to compile, for example, you may get a message that contains the full text of the query that failed. It may include things like credit card IDs, if those were part of the query. So it's really important to think about where these messages are going, and either sanitize them or keep them local within Kubernetes. At least think about where this data is going, to make sure it's not being exfiltrated out of your systems and revealing secrets.
So speaking of lists, when you get down to it, there's a lot of stuff to think about. It's really not that bad, and I don't want to scare anybody, but there is a good list of things. I'm not going to go through all of them, because you don't want to have to remember this. We thought about this a few months ago in the Data on Kubernetes community. One of the projects we've been working on, in a somewhat desultory fashion, over the last six months is a Data on Kubernetes community operator security and hardening guide, which is now published in the SIG operator project on the DOKC GitHub. It has, first of all, an overview of what the security problem is and how to think about framing it, then some general principles, and then a bunch of practical, checklist-type things to look for if you want to protect data on Kubernetes. It ranges from, hey, clean up default passwords and don't have empty passwords, all the way to documentation.
You might not think of that as a security feature, but you can't protect things you don't understand. So that's out there and ready to go. We have had internal review on it. We would love to have people try it out and use it. You can file PRs, log issues, and join us in the DOKC Slack workspace. Tell us what we've missed. This is something we intend to build on and make as practical and down to earth as possible. The inspiration for it was the OWASP security guide that a lot of people have been using for years.
So that's it. Thank you very much, and good luck. If you have questions about this, you can grab me by the collar after this talk. You can connect with me on LinkedIn or send me email. Come visit us in the DOKC Slack workspace. We're really friendly, we don't bite, and we love visitors. So thank you very much. Do we have time for questions? Oh, okay. Got one back there. Everybody's on this, so this is very good.
So I've got a use case where I'm running multiple clusters and they all need to get the same data. A presentation earlier this morning talked about spreading my data across all my clusters. In your security best practices here, does any of this apply to multi-cluster? Are there words of wisdom on how that should be done?
Yeah, actually, I think there are two things I'd point to that are implicit in this presentation. One is that I talked about encrypting things between cluster members. If you have servers in multiple locations, they might not necessarily be processing queries, but they would be replicating data, in which case you absolutely want to have that encrypted. Another way to share data is through object storage. You may have multiple database clusters in different Kubernetes clusters that are talking to the same data.
In that case, protecting S3 or whatever object storage you're using, and ensuring those credentials don't get leaked, is another key practice. I would favor streaming replication anyway.
Scout, it's a wonderful concept, and very few of the registries do it the way I would like to see it done. Do you have a favorite registry that will scan the image and create risks so you have to sign off an application instead?
We are so boring. We're just on Docker Hub still, so that's why we're able to use Scout. I personally like Trivy because it's so lightweight and really quick. One of the interesting things we did find is that it's good to run multiple scanners, because they tend to give somewhat different answers. That was definitely a lesson learned over the past couple of years.
Is Trivy something we can run in a pipeline?
Yes, absolutely. It runs right off the command line. It's the easiest to use. Trivy, it actually has to be in a Docker repo.
Does your controller have support for federated identities?
Oh, okay, so you mean like LDAP or something like that?
Like OIDC endpoints on the cluster?
We don't, actually, but that is a really, really great question. I didn't touch on this, but sometimes one of the ways to handle security is to use these underlying mechanisms like OIDC. The key reason is that they're consistent across a bunch of different services. A real problem with Kubernetes is that everybody does it a little bit differently. I need to make a note to touch on that in our next talk.
When you talk about scanning, right?
Yes.
Does that happen in the operator, or where does the validation for scanning happen?
No, we do it through GitHub. Most of our build and test is done in GitHub, so we have it in the actions that are used to do builds. Basically, if we see severe or important CVEs, we'll just stop the process and fix it.
Oh, okay. So it's like a validating admission load.
It's a validation on the build. You could do it. I mean, it's the kind of thing you want to do at regular intervals, but as long as people are not messing with your software, like you have checksums and stuff like that, you don't really need to do it again.
Does your, sorry.
I was going to say, though, I said I didn't talk about protection of Kubernetes. That's a different matter. If you're running on Amazon, say, then you definitely want to be running things like GuardDuty, which is a really good piece of software, or Inspector. They will keep an eye on what's going on, for example if somebody's getting in and messing with your Kubernetes workers.
So really that's just a Kubernetes thing, not really handled by your operator.
Exactly. It's nothing specific to databases. It's just good for protecting Kubernetes. By the way, I'm gratified to see this level of interest, because whenever we do security presentations as webinars, we get like nine people showing up. When we talk about object storage, it's 150. So this is really great.
When you talked about giving users the possibility to mount the secrets, we do that as well. But I wondered if you have any considerations about the risk that users who are able to configure your ClickHouse operator, but are not able to see the secrets, can misuse this feature to mount secrets they should not see into the ClickHouse deployments, and in that way compromise those secrets.
Yeah. We don't generally assume that there are different classes of users inside Kubernetes itself. What we want to do is try to make it relatively hard for people to, like say somebody managed to get, it's an interesting question. Just trying to think what the right answer is. Yeah, actually, I need to think about that one a little bit, because within Kubernetes we do ensure that it's more or less protected. We don't want stuff just lying around where it's easy to find.
Here's the thing. If you get onto the pod, you can go scan the file system, and if your keys for S3 are sitting there, and that's the example, you would just see them. You can go find the stuff that's mounted on disk, and yes, you will get to it. We're just trying to make it a bit harder for people. So secrets aren't a complete protection. That's probably the fairest way to put it. One of the things I don't trust very much inside Kubernetes is the ability to have different classes of users inside a single cluster with different levels of protection. This is not a very good answer, because I haven't really thought it through, but it's a really good question. I think, can we take one more, Melissa?
Yeah, one more.
Oh. Yeah. Thank you so much. It's great to see this level of interest.
Thank you. So you spoke a lot about development best practices, how to do the build correctly. You also talked about operators and how to codify some of this stuff. So how much of the end-to-end process is code versus process versus systems? Any comment on that?
I think what you want is to have as much of the configuration of your service as possible be infrastructure as code, stored in GitHub. The stuff that lies outside that is, for example, performance settings and things like that, and the actual schema and the data that you load into the database. That's outside the scope of what we're describing here and is handled through different mechanisms. What we're talking about here is the base system and the credentials that allow you to hack into it, or potentially hack into it, and then also encryption, which protects data when it's moving between locations or sitting in locations where it might be seen by people who don't have privileges to see it.
Which brings me to part two of the question: the operators you talked about, are they reusable across databases?
No, every database operator is different. In fact, one of the things we did in the checklist was to specify not how people should protect things but what they should protect. Like, hey, put a password on default users, or otherwise protect them so that somebody cannot just come in and log in without any defense. How you do that is up to you, because it depends on the database.
Thank you.
Yeah, thank you. Thanks again, everybody.