But that's okay, it's not like I'm personally that big a draw. You know, what people mostly notice about me is my brilliance. I should say, the way the light reflects off of the lack of hair on my skull. So I'll be talking today about security automation with respect to DevOps. And I guess I'll start off by telling you a little about myself. I've spent a long time in IT, and I've done a lot of interesting things. Probably the thing I'm best known for is something that's nowadays called Pacemaker. How many people in the audience have heard of Pacemaker or Heartbeat or Linux-HA or one of those things? So I founded that project and led it for nine years, something like that. And then in 2010 I started this project, the Assimilation Project, in order to help people manage their computer systems better. And it's in a lot of ways a significantly more ambitious project. And we'll be talking about it from the security aspect today. Let's see, I don't remember what these slides say. So how many people think that good security staff are plentiful and easily available today? Nobody? Me neither. How about security? Do you think it's getting better soon? No, me neither. Do you think you have enough security staff to keep up with your changes at DevOps and Agile rates? I wonder why that is. So I think as you look at this with DevOps: if security is bad and going to get worse on its own, which I think is kind of the consensus, and the teams you have now have trouble keeping up with what you're doing at the rate you're doing it, then when you start doing it faster, as you get better at it, they're going to be farther behind. Anybody have a problem understanding that? Does that not make sense to anybody? Right. It's the kind of thing that's going to get worse. In other words, it takes a problem that's going to get worse and makes it get even more worse. So what we're going to be talking about today is ways that we can help keep abreast of that a little bit.
So here's a couple things to think about here. 30% of our break-ins come through systems that are not on the books. We've lost track of them one way or the other. And thanks for the sound improvement. 30% of them come through systems that have been lost track of, that are not on the books. 90% of everybody out there has had failures of services that they're not monitoring. In other words, most people are not monitoring all those services. 71% of the people, once they get in compliance, a year later, ah, they're not in compliance. The things that they spent all this time and all this effort to get going and get working and get in compliance where they had everything like it's supposed to be set up, a year later, it's not so. And honestly, about 30% of people admit, own up to the fact that they really only start monitoring after they see a problem. And the interesting thing about the Turnbull statistics on here is that they're from probably the leading edge type people. People who are running things like Chef and Puppet and so on. His surveys bias that way. So if you compare it to the world at large, I think you'd have no trouble at all seeing that the real world is worse than this, that this is sort of the best case for most of these things. Ah, and 30%, I forgot about this one. 30% of all systems are actually doing nothing useful. So those of you who came in late, I'm actually standing in the back, there are technical difficulties. And not only that, I thought, you know, we're going to do a yoga thing where you stretch and stretch your neck, but only in one direction today. So these are the kind of things that are real in the world we live in today. So I'm going to talk a little, in the process here, I'm going to give about 10 minutes of talk and about 20 minutes worth of demo. Because I promised to show you how to do some things. And the way to show you is to show you. 
And so we have this project, the Assimilation Project, which basically creates a database describing your entire environment and then does things based on certain knowledge about what's going on. And the architecture basically has two pieces, or three pieces. The central system, which manages everything, is written in Python and delegates most of the work out. In fact, the thing we do best is delegation. We do nothing better than anybody else. And doing nothing scales really well. You know, we can do twice as much nothing better than anybody else. So that's one of the reasons why we scale really well. And we have these things called nanoprobes, and you'll notice that there are plenty of Borg jokes hiding in here. And if you have a problem with geeky Borg jokes, you're probably in the wrong place anyway. So the nanoprobes are on every machine. They're agents, and they're policy free. They're written in C, so they're lightweight, and so on. They actually run scripts to do monitoring and discovery. And they basically send and receive heartbeats and listen for some other kinds of packets that they're allowed to listen to. That collected data goes into the central management authority, which then stores it in a graph database. There are significant advantages to graph databases. You can answer questions like, what all depends upon this, directly or indirectly, and get those answers quickly. Whereas doing that through a relational database is painful on a good day, if you can do it at all. So this is kind of the architecture that underlies the demo we'll be doing here as we go forward. Most of what we're talking about is security. The Assimilation Project does security, network management, monitoring, and this sort of configuration management database type function, pardon me. And it does all of those at once. But today we'll be talking primarily about the security perspectives.
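The "what all depends upon this, directly or indirectly" question mentioned above is a transitive traversal of a dependency graph. As a rough illustration only, here is a minimal Python sketch of that traversal over an in-memory dictionary. It is not the project's actual graph database query, and the function name `dependents_of` and the sample node names are hypothetical.

```python
from collections import deque


def dependents_of(depends_on, target):
    """Return every node that depends on `target`, directly or indirectly.

    `depends_on` maps each node to the set of nodes it depends on.
    """
    # Invert the edges so we can walk from a node to the things that need it.
    reverse = {}
    for node, deps in depends_on.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(node)

    # Breadth-first walk over the inverted edges.
    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for parent in reverse.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# Hypothetical sample environment, for illustration only.
depends_on = {
    "webapp": {"appserver"},
    "appserver": {"database", "dns"},
    "reporting": {"database"},
    "database": {"san"},
    "dns": set(),
    "san": set(),
}

print(sorted(dependents_of(depends_on, "san")))
# ['appserver', 'database', 'reporting', 'webapp']
```

A graph database can express this kind of walk as a single path query, which is the advantage described above; in a relational database the same question typically needs recursive queries or repeated joins.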
So we keep a database up to date with all IP and MAC addresses on any subnet we have an agent on. Listening to ARP packets is very effective that way. For the places where we're connected on bare metal, we'll hear CDP and LLDP packets and keep track of network connectivity. We also track versions of packages and keep that data up to date within minutes. Everything here happens in more or less real time. Services that you're using, services that you're offering, checksums of network-facing binaries, libraries, jars, things like that. Security settings, and on and on and on. Basically we can discover anything. And so we try and discover as much as we can and then do work based on that. And we're gonna be talking about the security work today. And then, in the case of all the security things, we have ways of comparing what you're doing to best practices and providing a risk score. The disadvantage of risk scoring is that nobody will ever agree on what scoring method to use. The advantage is that a not very good score is much better than no score at all. So we opted for, let's do a passably good job at scoring things. And the purpose of the scoring is to help you track how you're doing over time, to see whether you're actually making progress and, if so, how, and to help you manage that process better. We'll talk some more about how the scoring helps you triage through these processes as well. We find unknown IP addresses, we monitor services automatically, and we can tell you about unmonitored services just on the basis of a query. Most people, if you ask them, what all services are you offering but not monitoring? They would tell you, well, I don't know. And they'd say, I'll find out, and three or four days later or three or four weeks later, they can tell you. Depends on the place, right? But we can tell you that in seconds. And we can help you triage these security related risk scores.
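Once both the offered services and the monitored services have been discovered, the "offering but not monitoring" question reduces to a set difference. The following Python sketch shows the idea under that assumption. The function name, the tuple layout, and the host names are all hypothetical and are not taken from the project.

```python
def unmonitored_services(offered, monitored):
    """Return the services that are offered but have no monitoring, sorted."""
    return sorted(set(offered) - set(monitored))


# Hypothetical discovery results: (host, service, port).
offered = [
    ("web01", "sshd", 22),
    ("web01", "nginx", 443),
    ("db01", "sshd", 22),
    ("db01", "postgres", 5432),
]
monitored = [
    ("web01", "nginx", 443),
    ("db01", "postgres", 5432),
]

for host, service, port in unmonitored_services(offered, monitored):
    print(f"{host}: {service} on port {port} is not monitored")
```

The point of the sketch is that the answer is cheap once the inventory exists; the expensive part in most organizations is building and maintaining the inventory by hand.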
And we have ways of helping you do that. We'll show you what that is. And the best practice analysis, that's one of the interesting things. Sometimes people run best practice analyses and they run them maybe quarterly, and maybe they sample them. That's what audits do, they sample this. We don't sample it, we measure it on every machine that you put us on. Analysis occurs when the settings have changed. We don't look at it quarterly, annually, daily or weekly. We look at it when it changes. If it doesn't change, we don't look at it. If it changes, we look at it right away. So within seconds of someone making a change, you can come back to the person and say, what were you thinking? What was up with that? And they can still remember what they did. I mean, the problem usually is that you find out a month later or three months later and you say, who made that change? And they don't remember making the change, or at least they don't own up to it. You know, not that that would happen at your organization. I've certainly seen it happen in places that I worked at. And we can analyze anything we discover and you can write rules on anything we discover. They don't have to be security rules, but security is the thing with the biggest problem. That's why I'm talking primarily about security. Discover anything you want, write rules on whatever you want. And we have various alerts and reports, ways of letting you know right away that these things have happened. You know, basically APIs for that kind of thing. And we're gonna do a demo. And the demo here is basically, what I'll do is wipe out the database on this machine and start over. And nothing at all will be configured here. One of the advantages of discovering everything is there's hardly anything you have to configure at all. So, you know, why should you tell me what I already know more about than you do? So that's the kind of thing that we do here.
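The "look at it when it changes" approach can be sketched as follows: keep a digest of the last discovery result for each host and data type, and evaluate the rules only when the digest differs. This is an illustrative Python sketch under that assumption, not the project's implementation. The class name, the rule name, and the data layout are hypothetical; the sysctl key is a real Linux setting used only as an example.

```python
import hashlib
import json


class ChangeTriggeredAnalyzer:
    """Run best-practice rules only when discovered data actually changes."""

    def __init__(self, rules):
        # rules: mapping of rule name -> function(data) returning True on pass
        self.rules = rules
        self._last_digest = {}

    def submit(self, host, kind, data):
        """Return rule results if `data` changed since last time, else None."""
        encoded = json.dumps(data, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(encoded).hexdigest()
        key = (host, kind)
        if self._last_digest.get(key) == digest:
            return None  # nothing changed, so nothing to analyze
        self._last_digest[key] = digest
        return {
            name: "pass" if check(data) else "fail"
            for name, check in self.rules.items()
        }


# Hypothetical rule: ICMP redirects must not be accepted by default.
rules = {
    "no_icmp_redirects": lambda data: data.get(
        "net.ipv4.conf.default.accept_redirects"
    ) == 0,
}
analyzer = ChangeTriggeredAnalyzer(rules)

first = analyzer.submit("web01", "sysctl", {"net.ipv4.conf.default.accept_redirects": 1})
repeat = analyzer.submit("web01", "sysctl", {"net.ipv4.conf.default.accept_redirects": 1})
fixed = analyzer.submit("web01", "sysctl", {"net.ipv4.conf.default.accept_redirects": 0})
```

In this sketch `first` reports a failure, `repeat` is `None` because nothing changed, and `fixed` reports a pass as soon as the corrected setting is discovered.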
By the way, for those of you who are watching this on the livestream, the reason why I'm not facing the camera is because I'm back in the back corner. And, fortunately, he turned off this speaker, otherwise we'd have massive feedback. So if you wonder why I never face the camera, that's why. Well, except occasionally. So, we're gonna discover everything. We'll tell you what needs hardening. We'll show you how to triage these hardening issues, and how to demonstrate and track progress. All these things are on the screen here. And as a bonus that I didn't promise in the abstract, we'll tell you who has what package and what version. At least if the demo things go better for me than getting hooked up to this video system did. There we go. And I know it's readable in the back because I'm in the back and I can read it. So, let's start here. I'll explain what the demo is doing as we go forward. Basically, this is a, you know, command line demo, right? So, oh yeah, passwords. Funny thing about that. Let me sit down and type. I'm gonna have to sit down to type. My apologies for that. Nothing I can do about it. I can't type on this thing down here while standing up. Probably the screen is more interesting anyway for those video people. So, oh, it's already running. Let me stop the demo then. I ran the demo earlier to make sure it was gonna work here in this environment. So, I'll go back to the demo. My cleverly named demo script. So, what this does is erase the database. You can see up there, it says CMA, erase DB, foreground. So, it's gonna run more or less in the foreground, and basically the purpose of running it in the foreground is to get some debug output. So, now it also starts up a nanoprobe, which is the agent. And the nanoprobe discovers where the server is. Now, discovery begins to come in. The central system requests the discovery and then all of the discovery begins to happen. And oh, look here, we've failed some rules.
It looks like we're missing some stuff off the left on some of these. But basically, you can see we've got some things that pass and some that fail. The pass ones say SS and the fail ones say IL. And we've failed a number of different things. We've passed a number of different things. These are based on the NIST or DISA or NSA security rules. The ones I used as a basis here are the Red Hat 5 ones. And I've implemented about 70 of the 250, maybe 80 of the 250 rules now. And it fails about half of them. So, what you'd expect at the end (hello, I didn't lose my mic, he just changed something) is probably for your systems to fail about 100 rules. If you have 1,000 systems and they each fail 100 rules, that means you have 100,000 problems to fix. Anybody think that's depressing? Yeah, okay, the guys here who can see me, they think it's depressing. My friends here who can see me. I don't know them, but they're my friends now. So, what we've done here, we've failed a bunch of rules. A bunch more security stuff has come in, and we failed and passed some more rules. And this one, which one is this one? If you notice something that happened here, it says we're updating checksum data for 202 files. What you didn't see before was that I requested checksum data on about 80 files. And the reason why it comes back with checksum data for 202 is that it figured out what libraries all of those things were using, and also performed checksums on those as well. Everything here is discovered. And it figured out what to do checksums on based on what's running. In other words, what processes are talking on the network. We do checksums on all the ones that are talking on the network. And as you can see here, we've failed the number four of the past controls. One of the things I did there was tail the system logs. And if you look up here, you can see where, I didn't mean to scroll that, but I did, didn't I?
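The checksum step described above (checksum the network-facing binaries and their libraries, then notice when any of them change) can be sketched in Python as below. This is only an illustration: the talk does not say which hash algorithm the project uses, so SHA-256 here is an assumption, and the function names are hypothetical. Working out which libraries a binary uses is left out of the sketch.

```python
import hashlib


def checksum_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(paths, baseline):
    """Checksum `paths` and report which ones differ from `baseline`.

    `baseline` maps path -> previously recorded digest.
    Returns (current digests, sorted list of new or changed paths).
    """
    current = {path: checksum_file(path) for path in paths}
    changed = sorted(
        path for path, digest in current.items() if baseline.get(path) != digest
    )
    return current, changed
```

A caller would pass the list of binaries and libraries found by discovery, store the returned digests as the new baseline, and raise an alert for anything in the changed list.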
Well, anyway, you see where it says things like, warning, failed security rule, and then something which wraps around so that you can't see all of it, and it gives a URL. That URL points to the IT Best Practices project, where you can see an explanation of what that means. Let's see if we can bring that up here. And I've already gone to that website, because it turns out to be hard to type here at the moment anyway. Let's see if I can bring that up. So that's not really, let me make that bigger. Maybe. Well, I can't make it much bigger. I can't see my mouse. Oh, there it is. So let's make this bigger. You can't read the stuff on the left, but you get the idea. What we have here is the severity, which is medium, and the short description. It says we shouldn't allow ICMPv4 redirects by default, and then the long description says the same thing. And if you go on, it says how to check the configuration and how to fix it. So for each of the rules you violated here, here's the explanation of what it is and how to fix it, okay? So does anybody think knowing what these things mean helps? The interesting thing is I probably hear the noise the loudest of everybody in the entire room, because I'm right next to the people here. Maybe I should go back there and tell them to be quiet myself. Maybe that would work. Hey, I'm the speaker, shut up. But I don't have to hear me. I know what I'm gonna say, right? So let's go back to the demo. The point is that every single one of these rules has a URL that goes to the IT Best Practices project, where it will bring up the explanations for you as you try and fix these things, right? So is this, where is, oh that's, I passed it, didn't I, yeah, done, done. There we go. So these are the kind of things you see here. And I'm not sure whether this is a full screen thing or not. Let me figure out what's going on here. Oh, it's taken away all of my, let me close this. Okay, yeah, that's better.
Now I can at least tell what's going on. It looks like there's a couple of lines at the bottom of the screen which are not showing there. I'll see if I can raise this up a little. No, I didn't mean to make it larger. I just meant to raise it up. Don't you hate that? Yeah, if you can zoom out. What's up with this? Check sensors. This is not, oh, what happened here? I stopped the demo. Oops, didn't mean to do that. One way, the intent is, oh, I see what the problem is. I'm looking at the wrong screen. This one I've got to get rid of because it's confusing me. So let's do a few demo things, right? So let's bring this over here where we can see it. I think that's this way. So this is not very clear and I apologize for that. But because we know the services you're offering and the ports and IP addresses they're on, we can draw this. I have to explain to you what this is because you can't possibly read it from the back. So my apologies to you in the back, but I can't read it in the back either. At least I know what it says. At the top is this house thing, that's your server. And the reason why it's red is that it's violated 31 different best practices. This is actually off of my machine at home. And on the left and down below that are the services. Now, if you look at them there, there's one on the left which is, let me get a picture where I can see it better, actually. On the left is Dropbox, which is providing service on three different IP port combinations and going offsite, which is the box above it on the left, offsite. So these are the points of attack that your attacker can use to get at your machine. Next to it is the Neo4j graph database. And it is listening on three IP port combinations here as well, because this machine has three IP addresses on it at this time. And next to that are some others. Yeah, this is really washed out over here, isn't it?
dnsmasq, sshd, rpcbind, and I don't remember what the one on the right is and I can't read it. My apologies for this. But the point is it will create this graph for you, and then from this you can look at it and say, oh, what are we running on here? Where are we vulnerable? You get an idea of what this is like. It's much clearer on a piece of paper, and by the way I have copies of this at the sticker table. So if you wanna look at this and actually see it, it and two other graphs are at the sticker table. Those that are in red are services which are running as real user ID root. So they're potentially more interesting to you from a security perspective. There are also indications on here, which I won't go into because we're running a little behind, of which services are being monitored and which ones are not, because we know that, as I mentioned before. But we're not talking primarily about monitoring, we're talking primarily about security. Of course availability is part of security, but it's not usually what people talk about. So let me go back here and see if I can get this guy. It came to the front. Amazing. So can you shrink me down a little more there? I don't know if I can. Is there a prompt at the bottom? Yeah, there's a prompt. Oh, there's a prompt. Okay, I'm good. So let me find something here and cut and paste it onto that. I did the demo drawing. So I'm gonna do some cut and paste demo here, right? So what we have next is, let's see if I can control C over here. Alt, control, V. Yeah, okay. So this is the total security score of this machine. This machine is 57. The picture I drew the other day was not. How are we doing on time? You're getting close. Okay, well I'm just, yeah, okay. So that's just the overall score of how the system is doing right now.
That's the kind of thing that you can plot over time to help your management understand how you're doing better or worse over time, and to help them understand and justify, you know, the resources and the time and effort that you're putting into things. Let's do another, I can't see. So what this is here, this is your top 10 areas to go after. Let me explain what they are. So when you're triaging things, you want to minimize your own personal context switching. That is to say, you don't want to hop from fixing a /proc/sys problem to a PAM problem to a login.defs problem. You want to work in one area so that you get your brain wrapped around it. So what we've done here is, it says in this case 16 of our score is due to /proc/sys errors. And then to the right of that are the different rules, the ones I mentioned before. And then to the right of that is the score that that particular rule contributed across your entire infrastructure. So basically what you want to do is start from the top and work down here. And as you do, you'll solve all the /proc/sys problems and then you'll go to all the PAM problems, because that avoids context switching your brain back and forth from figuring out one to figuring out the other. And for example, going in and modifying the file, it's easier to do this way. And the purpose of sorting it in this way is to help you understand how to triage this. Because if you have 100,000 problems, you can't solve them one at a time. Make sense? You have to solve them in bulk. So you go into your Chef or Puppet or whatever you've got. You go after these rules, figure out what they are, follow the directions, learn about how /proc/sys works, learn about these things, fix the ones you're comfortable fixing and then go on to the next ones. But this will then keep you in one mindset without switching back and forth, and has you doing the most important first. So this is the overall score.
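The triage report described above groups failures by area, totals the score per area, and lists the contributing rules so that the biggest area is worked first. Here is a minimal Python sketch of that grouping. It is an illustration only: the function name, the rule identifiers, and the weights are hypothetical, and the project's real scoring method is not described in the talk.

```python
from collections import defaultdict


def triage(failures, top=10):
    """Group rule failures by area and rank areas by total score.

    `failures` is an iterable of (host, area, rule_id, score).
    Returns a list of (area, total_score, [(rule_id, rule_total), ...]),
    with the highest-scoring area first.
    """
    by_area = defaultdict(lambda: defaultdict(float))
    for _host, area, rule_id, score in failures:
        by_area[area][rule_id] += score

    report = []
    for area, rule_totals in by_area.items():
        # Within an area, list the costliest rule first.
        ranked = sorted(rule_totals.items(), key=lambda item: (-item[1], item[0]))
        report.append((area, sum(rule_totals.values()), ranked))

    # Highest total first; ties broken alphabetically for stable output.
    report.sort(key=lambda row: (-row[1], row[0]))
    return report[:top]


# Hypothetical failures across two hosts.
failures = [
    ("web01", "proc/sys", "rule-A", 2),
    ("db01", "proc/sys", "rule-A", 2),
    ("web01", "proc/sys", "rule-B", 1),
    ("web01", "pam", "rule-C", 3),
    ("db01", "login.defs", "rule-D", 1),
]

for area, total, rules in triage(failures):
    print(area, total, rules)
```

With this sample data the /proc/sys area comes out on top with a total of 5, so all of its rules would be fixed together through configuration management before moving on to PAM.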
We have not just security scores, but networking scores. And this is the overall score of everything. I think the other one was actually server by server. This one is across everything. I think I misspoke earlier. So, oh yeah, let's do this. This is sort of a bonus, as I said. I'm not sure what I skipped here, but this is what I'm gonna do next anyway. Okay, so this is all the packages on your machine. There are similar queries to look for particular packages. And you notice that it's not just one kind, these are pip packages. It also picks up all the other kinds of packages you have on the machine. Debs, RPMs, pips, RubyGems, PHPs and Node.js NPMs. And he's telling me I'm out of time here. I apologize for all of this stuff. But the point of all of this is that this is a tool which can help you understand how to triage your systems into a better security posture. And help you understand it and draw pictures of it too. And as I said, the pictures and stickers, lots of stickers, are out by the sticker table. So thanks a lot for your time.