All right, let's get started. Hi everyone. So we're going to be talking about deploying Kubernetes in a secure-ish way. And we're going to be talking about the stuff under Kubernetes and the Kubernetes install itself. So if you're here to learn how to run applications securely on Kubernetes, this isn't that talk. And if that's not what you're here for, feel free to leave; we won't be offended. My name is Paul Czarkowski. I'm a dev advocate at Pivotal, and I talk about Kubernetes and DevOps and operations and all sorts of stuff. Yeah. So my name is Major Hayden. I work at Rackspace, and a lot of my focus is around deploying OpenStack clouds, deploying Kubernetes on top of them, and trying to do all of that in a secure, documented way. And so we'll kick this off. This is what it feels like when you go to KubeCon. You kind of feel like, man, the sun is rising, the fog is starting to clear. Maybe the fog is a bad outage or operational problems or things like that, but you just have that feeling of euphoria: things are maybe going to get better. How many of you went and saw Kelsey's keynote this morning? I felt like things were awesome after that. I wasn't entirely sure what he did, but it was really exciting. And this is what it's like when you're back at the office and you talk to your friends and coworkers about what you saw at KubeCon. You're like, man, you'd be amazed by this. Look at all this cool technology we've got. We can deploy our applications in a better way than we did before. It's really exciting. And then once you decide, hey, maybe we should try to use some of that internally, you end up having a conversation with your corporate security team, and it looks a little bit more like this. So there's no natural lighting in this room. How many people like Breaking Bad? All right, it's a really fun show.
But yeah, when you go and talk to the corporate security team, all the excitement about the way you provision and the way you operate kind of falls flat, because the security team is not interested in what you're talking about. Do we have anyone from a corporate security team here? All right, cool. I'm a former corporate security team member myself, so I know how this feels. I've been on both sides of that table. So really, what it comes down to is that enterprise security teams care about a small subset of things. If you boil it down, they want changes that are not going to get in the way, that have some kind of value, that are well documented so they can go back and audit them (or you can audit them and provide proof of compliance), and that are easily understood. When you go to a corporate security team and say, oh, we've got this replication controller that takes care of this, this, and that, they don't hear any of it. But if you say, hey, you're concerned about availability, right? The security team says, yes, it must be up all the time. Okay, well, we have a system that makes sure we have a certain number of copies available at all times, and here's how our networking is set up. You kind of have to learn to speak their language. And so really what you want to do is find a way to get here. If you're already doing DevOps and you're looking at automating your infrastructure, whether that's Kubernetes or OpenStack or Cloud Foundry or what have you, you want to balance that with security. You want to find some way to make that relatable for your security team. And to do this, you have to push back on your security team a little bit and ask what the guardrails are. A lot of times those interactions with your security team feel like hitting a roadblock, like they're constantly throwing things up that you have to jump over or break through.
But if you've ever heard of the concept of managing your manager, helping your manager understand what you need, it's the same with a corporate security team. Say, hey look, you're giving me roadblocks; give me guardrails instead. Show me where I cannot go, tell me where the edges are, and then I'll make sure I stay in the middle of that path. You'll find over time that most corporate security teams will start to expand those guardrails out a little bit. And a quick public service announcement: always enable Linux security modules in your container deployments. I went to a talk yesterday and someone said, AppArmor was giving us trouble, so we turned it off. Don't. So seriously, stop disabling SELinux. If you go to stopdisablingselinux.com, there's some more information you can review there. I'm surprised you're not wearing your "setenforce 1" shirt. No, I didn't wear the shirt today. I've worn it so many times it's gotten pretty ragged. So luckily there are a lot of tools that can help with these challenges, and one of the ones that we both use on a regular basis is Ansible. So how many people have used Ansible before? Oh, fantastic, this can be a review for some of us, so we'll go pretty quick. Ansible does everything from deploying your software, restarting services, installing packages, running commands, changing config files. You name it, it'll take care of it. If you've never seen Ansible before, I'll explain it in three bullets. You have tasks; they each do one thing, maybe change a config file, restart a service, clone something from Git. Then you take all those tasks and group them together in a role, and you say, hey, I have a web server, and a web server has this set of tasks. And then finally you say, okay, I'm going to write a playbook that deploys my web servers and then my database servers and then maybe something else. And so you group those roles together in a playbook.
And so that's essentially Ansible explained in a quick minute. And why we love Ansible is that it's simple. Everything's written in YAML. It's very easy to read top down. Even security teams can take it apart and understand it. There's not a whole lot of code and markup and craziness to understand. The inventory system is very easy: if you can understand flat files or JSON, you should be fine. It's also very versatile. You can use it on containers, you can use it in VMs, you can use it to make your containers. You don't need any daemons installed on your nodes, so there's no security concern there; as soon as you go to your corporate security team and say, I'm going to run a daemon on every node, you'll watch them start to get nervous. They want to know how it's configured and who has access to it. And finally, it's repeatable. You can run a playbook over and over and over again and you'll get the same results. This is also handy when your corporate security team comes knocking and says, hey, are you doing all the things you said you were doing? You can just run the playbook right in front of them and say, see, nothing changed; everything is applied. And so auditing becomes very easy. It also has a dry-run capability, so maybe you come to a new system and you want to know what Ansible is going to do on it. You can just run it in check mode, and Ansible will say, hey look, you asked me to do 50 things; 40 of these are already complete or need no action, but these 10 I need to take action on. Yeah, so this is what a very basic Ansible playbook looks like, and this is a pattern that we use a lot. It's basically: install a package, run a template to configure it, and then make sure it's started, or restarted if that template changes.
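[The slide itself isn't captured in this transcript, but the install/template/service pattern described above might look something like this; the package and file names here are just illustrative placeholders, not the slide's exact content.]

```yaml
# Hypothetical sketch of the basic three-task pattern:
# install a package, template its config, restart on change.
- hosts: all
  become: true
  tasks:
    - name: Install the package
      package:
        name: chrony
        state: present

    - name: Configure it from a template
      template:
        src: chrony.conf.j2
        dest: /etc/chrony.conf
      notify: restart chrony

    - name: Make sure it is running and enabled at boot
      service:
        name: chrony
        state: started
        enabled: true

  handlers:
    - name: restart chrony
      service:
        name: chrony
        state: restarted
```

[Running the same playbook with `ansible-playbook --check` gives you the dry-run report mentioned above without changing anything on the host.]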
And we've done a ton of Ansible, and we've done a lot of complicated, stupid things with Ansible, and we keep finding ourselves simplifying back to this and making it super readable, super easy for anyone to pick up and understand. You know, even the NOC team at 3 a.m.: if they can read, understand, and run a playbook, they don't have to wake me up because I thought I was being smart and added a bunch of logic that wasn't really necessary. So this pattern of three tasks per thing you're deploying is super powerful, and you don't normally need to do a ton more. And then you can do other stuff. You can configure Cisco switches and other switches, which is really cool, right? So you can work with your security team and say, I need to change into this VLAN and set my MTU down so I can PXE boot, and we can do that straight from Ansible, and then it just works and we're all happy. You can deploy cloud instances. So this is deploying some Google Compute instances, and it's using variables that are pulled from an inventory, which is like a big YAML dictionary of values and settings. This will actually loop over an array, so I can say I want 15 VMs with these settings and it will go and do that. This is an instance where you start to add a little bit of complexity to Ansible and you maybe lose a little bit of readability, but you get extra functionality that makes it worth it. And then you can do really complex stuff like PXE booting servers, and it will go through and do your IPMI commands to reboot, drop files in the right place, delete them once the server has rebooted, and all that sort of stuff. So you can get very hands-off: pretty much anything you need to deploy a bare metal cloud, from the infrastructure to the network to the software, you can basically do it all with Ansible.
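[A hedged sketch of the loop-over-inventory idea described above; the `instances` variable, names, and machine types are made-up placeholders, and the module arguments are only an approximation of the legacy `gce` module, so treat this as shape, not gospel.]

```yaml
# Illustrative only: loop over an inventory-defined list of VMs.
- hosts: localhost
  vars:
    instances:
      - { name: kube-master-01, machine_type: n1-standard-2 }
      - { name: kube-node-01, machine_type: n1-standard-4 }
      - { name: kube-node-02, machine_type: n1-standard-4 }
  tasks:
    - name: Create each VM defined in the inventory dictionary
      gce:
        instance_names: "{{ item.name }}"
        machine_type: "{{ item.machine_type }}"
        zone: us-central1-a
      loop: "{{ instances }}"
```

[The trade-off mentioned in the talk shows up here: the loop and variable indirection cost some readability, but one task now creates as many VMs as the inventory lists.]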
And then there's Ansible Tower, which is a commercial product from Red Hat (I think they have an open source version now). I don't use it myself, but for large orgs that need a little bit of extra access control, or a NOC that has buttons they can press to run tasks, it's useful. You can also join several playbooks and several roles together in a pipeline-style workflow, and it gives you a bunch of visibility into what's going on. So a lot of people do find it useful. And so we've talked a little bit about Ansible and given an overview of some of the things that you could do, or even give your security team access to run for you. One of the things that we've worked hard on is ansible-hardening. It's an Ansible role that you can apply to any server. It got its start within the realm of OpenStack, and then we asked the question: why couldn't we just do this on any host? So we expanded into that. And then finally someone came to the IRC channel and said, well, have you done this on a Kubernetes host? And we said, well, shoot, we haven't done that. So we went and tried it, and after a couple of tweaks, it actually worked fine there as well. We've had a couple of folks come by who have used projects like Kubespray, which we'll get to in a little bit, and they've said, hey, I've just run ansible-hardening with that and it just worked. So what ansible-hardening does is take the Security Technical Implementation Guide (STIG) from the federal government, which provides a whole bunch of security hardening configurations for your system, and apply it. However, it does so in a more sensible way, where the changes that would cause significant problems in your environment are disabled, and you can go and enable them if you need to. Or, for example, if a 15-minute timeout is too tight for you, you could expand it.
Or if you want to tighten it down to five minutes, you could. Certain things, like disabling IPv6, would cause problems in my Kubernetes and OpenStack environments, so that is disabled by default. And so you can actually go and apply this. If you use Ansible on a regular basis, we are looking for contributors, so feel free to jump in. Yeah, and while Major was working on that, I was working on the other side using this tool called InSpec. It's a fairly simple DSL that lets you describe how a system should look, and then it goes and checks whether the system actually looks that way. So, for example, the STIG says that /etc/passwd shouldn't be writable by anyone but root. You can say to InSpec, hey, make sure /etc/passwd is only writable by root, and it will validate that and tell you everything's good, or everything's bad. The reason I got into InSpec was that when I joined IBM through an acquisition, we worked with the security team, and this is how we would check that a system matched the IBM compliance guidelines: we had these Excel sheets that were thousands upon thousands of lines of settings, and people would log into every server and go through this checklist in Excel and make sure everything was done. I made fun of them for a little bit, and then I realized that actually my job there was to help them. So I had to stop making fun of them and actually help them, and that's where I got into InSpec. I'm like, hey guys, this is pretty cool. We can take all of your rules in these Excel spreadsheets and write them in a fairly simple DSL, which looks like this. And then our monitoring software, Sensu, can run this once a day or once an hour or however often, and it will actually wake me up when my servers go out of security compliance, and we'll fix it.
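[The /etc/passwd check described above might look something like this in InSpec's DSL; the control name and impact value are made up for illustration.]

```ruby
# Hypothetical InSpec control for the /etc/passwd permissions check.
control 'passwd-perms' do
  impact 0.7
  title '/etc/passwd must be owned by root and not writable by others'
  describe file('/etc/passwd') do
    it { should be_owned_by 'root' }
    it { should_not be_writable.by('group') }
    it { should_not be_writable.by('other') }
  end
end
```

[InSpec evaluates each `describe` block against the live system and reports pass or fail per expectation, which is what gets shipped off to monitoring.]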
And so we worked with them a lot, and the security team basically took on all of that work and wrote all the InSpec rules, and they ended up being a really good, strong partner with me, which I was super happy about. And installing InSpec rules with Ansible is super easy: you can just git clone a repo of the STIG rules, and we actually have a Git repo up with all of the rules to audit basically all of your STIG settings. So when we got to this point, I saw the security hardening bit and we were like, oh, we can use ansible-hardening combined with this, combined with our own playbooks, and suddenly we have a method to secure our servers and also a method to audit our servers. And yeah, we had Sensu monitoring it, so we'd get alerts. And because we had a centralized logging service, once a day we had a report in our logging service of which servers were compliant and which weren't. So all of that auditing and compliance checking just magically became automated, saved a ton of time, and allowed those folks who were keyboard warriors before to work on things that added a lot more value than manually checking files on servers. And then another thing we worked on was called Cuddle, which was originally called Site Controller. That was basically all the other bits that give us a fairly secure environment, fairly secure infrastructure. It was an SSH bastion with two-factor authentication and some role-based access; it gave us an OAuth web portal, and it gave us centralized logging, monitoring, and a bunch of other stuff. We open sourced it, and it's actually just a big monolithic Ansible repo that installs Sensu and Logstash, sets up the bastion, and that kind of stuff. And it kind of looks like this.
I don't really need to go through the architecture, but you can run it locally, or it can spread out, and you can VPN between sites, and it forms a mesh, and you have one central dashboard going across maybe 30 data centers, which is what we had. It was super useful because it meant the operations team, and the security team doing the auditing, just went to one place and could see all of our infrastructure spread across 30 data centers. They had one bastion host they needed to remember that could get them an SSH session into any server. They had one web host they had to remember to get to the logs or monitoring or whatever for any of the servers. And that's what the dashboard looked like: it listed the data centers and the services that were enabled, and it had Uchiwa for Sensu, it had Grafana, it had Netdata for each server, a bunch of that sort of stuff. The bastion has two-factor authentication for SSH, either YubiKeys (pressing that button) or Google Authenticator. And it has a couple of tools that we open sourced called SSH agent proxy and TTY spy. TTY spy is basically as if you were running script and piping it to a curl request that ships it off to another service. So any time anyone logged into this machine, it logged their entire console input and output and shipped it off. And because that was their entry point, even when they SSHed into a different server, that recording still happened and went through. And the SSH agent proxy would basically fake an SSH agent and give them access to a key to SSH to the machine, without actually letting them see or edit or change that key. So it meant that once the bastion knew about them, and knew what servers they were allowed to go to, it gave them the keys in their agent to get to those servers.
And then from then on you could have a shared username among them, and all the auditing happened right there. We could see which individual user was doing what, and we didn't have to do a ton of crazy user management on every single server. And I kind of touched on it earlier, but you can also use stuff like ansible-hardening, or your own Ansible roles, along with Kubespray. Kubespray will deploy Kubernetes for you with various networking components and various numbers of hosts. It's very easy to set up your inventory and get going. And then you can apply security hardening before and after, to be able to prove to your security team that you have a deployment that meets certain standards. I think it's still in incubation status right now, but you can use it in production; it has upgrades built in, HA, all that kind of thing. Yeah, and so basically we took all of these parts and stitched them together, and you run three or four playbooks in a row and you end up with a Kubernetes infrastructure that has most of the primitives, if not all of the primitives, that your security team wants to see. You run Cuddle initially to set up a bastion and your logging and monitoring hosts. Then you run Cuddle again on the hosts that will actually run Kubernetes, and that sets up the users and whatever else you need; it sets up the Sensu client, the Logstash client, and all the bits and pieces that interact with Cuddle. Then you run Kubespray to get Kubernetes running, and then you run security hardening, and that hardens it. I actually did a bunch of experimentation with hardening and Kubespray and found that if I ran hardening before, then ran Kubespray, and then ran hardening again, it didn't change anything the second time. So I had a very high amount of confidence that the two playbooks weren't going to fight with each other, with one setting something and the other changing it back and forth. And that was really cool.
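[The run order described above, sketched as shell commands; the playbook paths and inventory names are made up for illustration and won't match the real repo layouts.]

```shell
# Illustrative only: the four-pass sequence described in the talk.
# 1. Cuddle: set up the bastion plus logging/monitoring hosts.
ansible-playbook -i inventory cuddle/site.yml --limit core
# 2. Cuddle again: users, Sensu/Logstash clients on the future k8s hosts.
ansible-playbook -i inventory cuddle/site.yml --limit kube
# 3. Kubespray: stand up the Kubernetes cluster itself.
ansible-playbook -i inventory kubespray/cluster.yml
# 4. ansible-hardening: STIG hardening; idempotent, so safe to re-run
#    and useful as an after-the-fact audit pass.
ansible-playbook -i inventory hardening.yml
```

[Because both Kubespray and ansible-hardening are idempotent, re-running step 4 after step 3 with zero reported changes is itself evidence the two playbooks don't fight.]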
So I think you must have already worked on that to make that work, right? Yeah, we've handled a few bugs within ansible-hardening to make sure that Kubespray won't be affected. So basically that gave us a pretty secure infrastructure. The security team was happy with it because it was the same set of tools we'd both used for OpenStack, which our security teams were happy with. But we haven't really touched on how to secure stuff on top of Kubernetes, and that was kind of on purpose, because we only have 25 minutes or so to talk. There's a lot still to consider. What does your build pipeline look like? Where are you going to host your containers? How do you make sure the image you want is actually the image you built? Image scanning, all that sort of stuff, secret management, authentication to Kubernetes. Like Kelsey was saying earlier: do you give each person or each group their own Kubernetes, or do you want to go down the RBAC route? So there are tons of other considerations to think about and to work through with the security team. But what we found is that when we go to the security team and bring these tools to them, they're not necessarily familiar with the sort of tools we use in, I guess, the DevOps space, right? They have different concerns. So we bring these tools to them and we say, hey, we can help solve the problems you're trying to solve, using the tools that we know. And then, as Major said earlier, it becomes a discussion about guardrails rather than a "can we? no" kind of conversation, which nobody really enjoys. We don't enjoy being told no, and the security team usually doesn't enjoy saying no. And so you join the conversation together and you get everyone involved in that DevOps mentality of, let's just improve everything together.
Some people like to call that DevSecOps or SecDevOps and such, but really DevOps itself already describes everyone working together to improve what we're doing for the business, improve the culture, and so on. And with that, that's the end of the presentation. We thank everyone for coming. If you have questions, we've got a little bit of time. So the question was, how are we handling the multi-factor for SSH? Yeah, so you can do YubiKeys, and we have a little service that syncs between bastions if you have more than one. In your Ansible inventory you have the YubiKey ID and a couple of other pieces of information per user, and every time you run Ansible, it just makes sure those are set. And then for Google Authenticator, they have a PAM module, so it installs that PAM module, and again you have a Google Authenticator ID and a couple of other pieces of information that it needs, and it drops that on every bastion server. And then the SSH agent proxy tool has a small SQLite database that tracks the users and what groups they're in, and gives them a key based on what group they're in so they can then get onto the other servers. Very good questions. Okay, so the first comment was that you can get Ansible Tower for free as AWX, so you can go download that today. And the second part was about ansible-hardening, and why it wasn't part of the MindPoint Group one. So we had a discussion in the beginning. Our first foray was into translating the STIG for Ubuntu. Well, there's now a STIG for Ubuntu; it still needs a little more work, it's getting there, but at the time there was not one, and all of our deployments internally at Rackspace were on Ubuntu. So we had to translate it over to Ubuntu, and that's where we went first.
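[A hedged sketch of wiring up the Google Authenticator PAM module on a bastion, as described above; the package name is the Debian/Ubuntu one and the host group is a placeholder, so adjust for your distro.]

```yaml
# Illustrative bastion two-factor setup with pam_google_authenticator.
- hosts: bastions
  become: true
  tasks:
    - name: Install the Google Authenticator PAM module
      package:
        name: libpam-google-authenticator
        state: present

    - name: Require the verification code for SSH logins
      lineinfile:
        path: /etc/pam.d/sshd
        line: auth required pam_google_authenticator.so
        insertafter: EOF

    - name: Enable challenge-response authentication in sshd
      lineinfile:
        path: /etc/ssh/sshd_config
        regexp: '^#?ChallengeResponseAuthentication'
        line: ChallengeResponseAuthentication yes
      notify: restart sshd

  handlers:
    - name: restart sshd
      service:
        name: sshd
        state: restarted
```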
And then we came back to CentOS and Red Hat after that, and then it expanded to SUSE and then Debian and then Fedora, and we keep expanding further. So I think the MindPoint Group one has been very focused on RHEL and CentOS, which totally makes sense because that's what the STIG is designed for, but we had some different usage requirements. So we support all those OSes right now from the same repo. Yeah, that's right, InSpec is written in Ruby. We were using Serverspec initially, and then InSpec came along with a much stronger DSL. The DSL is Ruby-ish, but you don't really need to know Ruby to use it unless you're going to start doing loops and things like that. And we didn't have any issues with anyone from our security team learning enough to write all of the rules for the STIG, plus all of the other IBM-related compliance that we had to make sure we were meeting. So there was a little bit of a learning curve, but it was not a barrier to entry for anyone to actually work with it. And then Ansible has some options too, for slurping config files and asserting that certain things are in there, or checking directories and things like that, so there are some auditing capabilities there as well. Yeah, and we actually could have done it with Ansible, but we wanted a little bit of separation between the tool that writes the configs and the tool that checks the configs, because I might have a bug in Ansible that gets it wrong, and if it gets it wrong putting a config down, it'll probably get it wrong checking it. That was why we specifically went for something that wasn't Ansible, and from the five minutes I spent researching online, InSpec seemed to be the one to use. Anything else about the audit tools? Okay, so the question was around what audit tools we're using. We talked a little bit about using Ansible as an auditing tool; it works okay for that. And then we talked about InSpec as another auditing tool.
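[The Ansible-as-auditor option mentioned above can be sketched like this: slurp a config file and assert on its contents. The file path and setting are just examples.]

```yaml
# Illustrative audit-only play: read a file and assert a setting.
- hosts: all
  become: true
  tasks:
    - name: Read sshd_config from the remote host
      slurp:
        src: /etc/ssh/sshd_config
      register: sshd_config

    - name: Assert that root login over SSH is disabled
      assert:
        that:
          # slurp returns base64, so decode before checking.
          - "'PermitRootLogin no' in (sshd_config.content | b64decode)"
        fail_msg: "PermitRootLogin is not set to 'no'"
```

[This makes no changes, so it's safe to run against production as a spot check; the talk's point stands, though, that a separate tool like InSpec avoids sharing bugs between the writer and the checker.]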
Of course, on the system itself you have auditd as well, so you can audit syscalls and things like that. And security hardening turns auditd and some of those Linux auditing tools on by default. If you've got a central log service and you're using Cuddle, it will set up a log forwarder to send all of those logs to your ELK service or whatever your centralized log service is. So on the verification side of things, your monitoring is going to alert you as soon as something goes out of spec, and then you've got your ELK logs that say whether everything InSpec ran came up good or not at a particular date and time. At first we were just taking screenshots of logs, and then eventually we wrote some Elasticsearch queries to be a little bit smarter about that, and we dumped them to a file system. So if someone said, oh, we need to check a server from this date (a random spot check, which they would do), we could just hand them a document that said everything's good, rather than sending them off to an Elasticsearch or Kibana dashboard. And so auditing can also be helpful. How many folks in here have used or tried seccomp at some point? All right, yeah. So all of us have great hair from that experience. The scary part about seccomp is that you might actually block something that you need, and then you have a container or an application that's not working. So one of the things I generally prefer to do is take the syscalls that I know are going to be problematic for me, the ones I might go and block with seccomp, and have auditd audit them for a while and trigger alerts on that.
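[The watch-before-you-block approach described above might look like this as an auditd rule; the syscall and key name are illustrative, not a recommendation of what to block.]

```
# Illustrative rule for /etc/audit/rules.d/: log every use of a syscall
# you are considering blocking in a seccomp profile.
-a always,exit -F arch=b64 -S ptrace -k seccomp-candidate
```

[You can then search the audit log by key (for example with `ausearch -k seccomp-candidate`) and alert on hits; if the application runs for a week with none, that syscall is a reasonable candidate to drop from the container's seccomp allowlist.]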
And then if we deploy an application and I don't see an alert on that syscall for a week, I'm like, okay, well then maybe we can just remove that syscall from the container. So, how do you manage SSH key configurations? It's not included in ansible-hardening because it's not a STIG requirement, but you can use Ansible to do that as well. Ansible can actually place the new key and then tear the old one off right after. Yeah, so we did exactly that with Cuddle. You had a list of all your users and their public keys, and it would drop them in. So when they're logging in, they would have their public key and their second factor. And then you can simply look at the Git change log and some other information to figure out whether a key is too old, and say, okay everyone, time to roll some new keys, and they would just make a pull request against the Ansible repo to say, here's my new key. So it was fairly self-service. I mean, they couldn't run Ansible themselves, because only a few people could run Ansible across the bastions, but anyone could update their key at any point and say, hey, can you just run the playbook for me so my new key shows up, and the same across all of the servers. So we had basically one or two access keys for all of our servers being managed by Cuddle, because we had the RBAC at the bastion. And so it was very simple: drop the new keys on the bastion, put the public keys in the authorized_keys files on every other server, and just run Ansible, and it would roll it all out everywhere. Depending on how many servers we were doing it on, it would take from five minutes to 20 or 30 minutes. Yeah, make sure you always add the new key first before you remove the old one. It seems like that would be self-explanatory. It's not always. All right, anything else?
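[The add-first, remove-after rotation described above can be sketched with Ansible's authorized_key module; the username and key file paths are placeholders.]

```yaml
# Illustrative key rotation: the new key goes in before the old comes out,
# so a failed run never locks anyone out.
- hosts: bastions
  become: true
  tasks:
    - name: Add the user's new public key
      authorized_key:
        user: deploy
        key: "{{ lookup('file', 'keys/deploy_new.pub') }}"
        state: present

    - name: Remove the old public key only after the new one is in place
      authorized_key:
        user: deploy
        key: "{{ lookup('file', 'keys/deploy_old.pub') }}"
        state: absent
```

[Task order inside a play is the ordering guarantee here: Ansible runs the add before the remove on each host, which encodes the "always add the new key first" advice from the talk.]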
Oh, okay, so the comment was that using SSH keys is kind of the old way, and SSH now has X.509-ish certificate support, so why don't we use that? I would definitely be up for it. I just think some of the challenges I've seen in the past were that you've got to make sure your revocation is on point and all that kind of stuff, though I know there's some technology to help with that. Yeah, and our repo for Cuddle is open source, and we have the links in the slides, which will be posted. So if someone wants to contribute that, we would be happy to take it. All right, if that's everything, thank y'all very much. Yeah, thanks all for coming.