Okay, I think that's time, so let's get started. Hello everyone, my name's Grant Murphy, I'm a security architect at IBM. My contributions to OpenStack have mainly been through the OpenStack Security Project, and I'm also a member of the Vulnerability Management Team. So today we're going to talk about osquery. Specifically, we're going to try to understand how it actually works and why it's useful for understanding the security status of a deployed host, and then look at some ways that you can extend it and make it applicable within an OpenStack environment.

For those that were around in the last session, you probably know that there are a lot of monitoring events that occur in a large-scale deployment. So monitoring an environment and knowing that you've been breached is a difficult problem to solve. And I think these kinds of figures really show that you need to have a good story, basically, as a software solution, which is where I think osquery comes in.

This was a project released by Facebook in 2014. It's released under a BSD license and effectively exposes the operating system as a high-performance relational database. It's got over 130 tables at the moment, and it has support for Linux, macOS and Windows. It's actually being used in production right now by these companies, and they're also contributors to the upstream project. I think that's really a testament to the maturity of the project and the strength of the concept.

So to kick things off, I thought I'd jump into a demonstration of what it looks like using osquery in interactive mode. If I can figure out how to actually... Okay. So here we just have a Linux host and I'm going to fire it up in interactive mode. The first thing you'll note is that it says that you're using a virtual database. The way that the query interface actually works is it uses SQLite virtual tables, which call back into high-performance native code. So the interactive console is very...
for anyone that's used SQLite before, it will feel very familiar, so you can do things like look at the tables, look at the schema of a table, and then you can execute queries. So for example, if I was to select timestamp from time, what I'd get back is the current time in the operating system, and that's all evaluated at the time that the query is run.

So to demonstrate why this is useful from a security point of view: you can probably already imagine that the ability to pull together a bunch of different tables of information and combine them to gain some insight about the system state is actually quite useful. But I've gone ahead and written some example malware. It's not really very advanced, but it basically is a remote shell that tries to retain its presence on the system at all times, so it tries to persist between reboots. The types of behaviors that you'll see from this malware are that it connects to a strange host. Can we read that down the back? Just thumbs up? Okay. When it executes, it immediately deletes itself from the file system and then starts itself as a daemon process and just basically loops around. Nothing really clever here. The way it tries to persist between reboots is to add an entry to the crontab. And basically it just tries to make sure that it's there all the time.

So to save me actually typing a bunch of commands here while you watch, I've written a little script that will show some of the queries that you might execute if you wanted to detect that particular malware. So this is just wrapping... Oops, wrong one. It would be the finding malware demo. It's wrapping osqueryi and executing queries. So here I've decided to look for any remote host... sorry, any remote address that is connected to by only a single process. Given that this is a DevStack instance that was running on my local machine, the address there looks a little bit... unsavory.
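A query of the kind just described can be sketched as follows. Because osquery speaks SQLite's SQL dialect, this sketch uses Python's built-in sqlite3 module with an ordinary table standing in for osquery's process_open_sockets virtual table. The rows, PIDs and addresses are invented for illustration, and the exact query used in the demo is not shown in the talk, so treat this as one plausible way to write it.

```python
import sqlite3

# Stand-in for osquery's process_open_sockets table. All rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE process_open_sockets "
    "(pid INTEGER, remote_address TEXT, remote_port INTEGER)"
)
conn.executemany(
    "INSERT INTO process_open_sockets VALUES (?, ?, ?)",
    [
        (1201, "10.0.2.15", 5672),     # two services share the message bus
        (1377, "10.0.2.15", 5672),
        (1201, "10.0.2.15", 3306),
        (4242, "203.0.113.66", 4444),  # only one process talks to this host
        (1377, "", 0),                 # listening socket with no remote peer
    ],
)

# Remote addresses that exactly one process is connected to.
UNIQUE_REMOTE_QUERY = """
SELECT remote_address, MIN(pid) AS pid
FROM process_open_sockets
WHERE remote_address NOT IN ('', '0.0.0.0', '::', '127.0.0.1', '::1')
GROUP BY remote_address
HAVING COUNT(DISTINCT pid) = 1;
"""

suspects = conn.execute(UNIQUE_REMOTE_QUERY).fetchall()
print(suspects)
```

On a real host the same SELECT statement could be pasted into osqueryi, where the table is populated live from the operating system.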
So I might want to pull together information from a couple of other tables and look at specific things about the process that's connecting there. Here I've done a join on the process_open_sockets table and the processes table, looking for that specific address, and you can see that it is in fact a shell that's connected to that remote host. Some other things that we might look for are indications of the persistence stuff that I was talking about earlier. So you can see there the bash command that tries to reinstall the malware on that machine. And there are other queries that you can come up with. For example, for the fact that it removed itself from having a physical presence on the file system, you could just run a query like this and set on_disk equal to false. And then we have an example of the supervisor process ensuring that it has an active connection to that remote host. But I think you will probably have the idea from there.

But in itself, that's not really that interesting, right? I mean, you could have run a bunch of shell commands and figured it out for yourself. What is interesting in a production environment is that you want to notice any changes or differences in your environment as they happen. The way you can do that is by configuring osqueryd and setting a number of queries. For example, for that particular malware, you're monitoring the crontab and looking for any changes to it that might trigger an alert and a logging event. So what it enables you to do is basically state-based intrusion detection. And it allows you to reason about the behaviors of malware, or of a malicious actor, in an environment, rather than looking for specific file hashes or whatever.

In addition to that, it's batteries included. It has a lot of great features. So you can do file integrity monitoring, process auditing, socket auditing. It has support for existing
indicators of compromise in the YARA format, which is a format from VirusTotal. And one of the best things about it, I think, is that it's so easy to configure, install and extend. Major Hayden actually called out at the Austin summit some of the features that any security automation for your stack might have, and I feel that osquery goes a long way to meeting those requirements.

Okay, so the first thing I'll walk through is the query schedule. Say, for whatever reason, you've decided you want to monitor any changes on a particular host to the sudo group. You want to add this query to the query schedule. So this is an example of what the osquery configuration file looks like. All you have to do is create a new named query (the name of this one is sudo group), slot in your query in JSON format and specify an interval for it to run.

The first time that it's executed, what will happen is osqueryd will look in the underlying persistence layer, which is RocksDB. The persistence layer is really used to maintain the query state between each execution of the scheduler. So the first time that it's run, it'll execute this query, and anything that was in the result set will be appended to that persistence layer. It'll also generate a logging event for each of those rows. So here's a truncated version of a logging event that's been reformatted so you can actually read it, and you can see that the row has been added. I should also note that in addition to the fields here, you can add custom decorators that annotate each log entry with whatever additional host information you'd like.

So in between scheduled runs, I'll make a change to the system and add a user to that group, the user someguy. The only event that's actually transmitted out into my log and event management system is this one here.
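The schedule entry and the differential behavior described above can be sketched like this. The layout with a top-level "schedule" key holding named queries with "query" and "interval" fields follows osquery's configuration format. The SQL itself, the 60-second interval and the user "grant" are assumptions made for illustration, since the slide contents are not in the transcript. The differential function is a simplified model of what osqueryd does with its stored state, not its actual implementation.

```python
import json

# Assumed query: members of the sudo group, joined from osquery's
# users, groups and user_groups tables. The talk's exact SQL is not shown.
SUDO_GROUP_QUERY = (
    "SELECT u.username FROM user_groups ug "
    "JOIN users u ON u.uid = ug.uid "
    "JOIN groups g ON g.gid = ug.gid "
    "WHERE g.groupname = 'sudo';"
)

# Scheduled query named sudo_group; the interval (in seconds) is an assumption.
config = {"schedule": {"sudo_group": {"query": SUDO_GROUP_QUERY, "interval": 60}}}
config_json = json.dumps(config, indent=2)


def differential(name, previous, current):
    """Return one log event per row added or removed since the last run."""
    prev = {tuple(sorted(row.items())) for row in previous}
    curr = {tuple(sorted(row.items())) for row in current}
    events = [
        {"name": name, "action": "added", "columns": dict(row)}
        for row in sorted(curr - prev)
    ]
    events += [
        {"name": name, "action": "removed", "columns": dict(row)}
        for row in sorted(prev - curr)
    ]
    return events


# First run: nothing stored yet, so every row is logged as added.
first_run = [{"username": "grant"}]
first_events = differential("sudo_group", [], first_run)

# Second run: only the newly added user produces an event.
second_run = [{"username": "grant"}, {"username": "someguy"}]
second_events = differential("sudo_group", first_run, second_run)

print(config_json)
print(first_events)
print(second_events)
```

Keeping the previous result set around is what lets the second run emit a single event instead of repeating the whole group membership.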
So what we're getting and what we're detecting is only the stateful changes within the system. But you might be sitting there thinking, and I'm guessing because this is a room full of security professionals: what would happen if, in between the first time the query is run and the next time, I was to add a user to that group and then remove them? Would the change be detected?

To solve that kind of problem, osquery has an event framework, and it basically operates as a publisher/subscriber sort of model. A good example of that is file integrity monitoring. There, you have a publisher which publishes events, and that runs outside of the normal query schedule. In the case of file integrity monitoring, it's using inotify to watch for specific changes on the file system, and it then publishes events to the subscriber table. The subscriber table basically acts as a buffer for those events. When you want to get that information into your logs and make sure that it's alerted on, you schedule a query against the subscriber table, as you'll see in a moment.

So this is an example of how you would configure it. For file integrity monitoring, what you need to do is add select all from file_events. Basically that will retrieve all of the file events that have been published and picked up by the file_events subscriber table, and that will make sure that those get logged out. And then you can configure the publisher to look at specific paths.

So to see that in action, what we'll do is tail, oops, if I could type, that would be good, tail the results log here, and move all of that buffered stuff out of the way. And then we're just going to make some changes to files within that SSH directory. So we're going to create a couple of files, change some attributes, and then remove them. And this is going to execute well within 30 seconds.
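A configuration along the lines just described, plus a toy model of why buffered events catch a change that a state comparison misses, could look like the sketch below. The "file_paths" section and the scheduled query against file_events follow osquery's configuration format. The watched path, the category name "ssh" and the file name are assumptions, and the EventBuffer class is a deliberately simplified stand-in for the subscriber table rather than how osquery stores events.

```python
import json
from collections import deque

# Scheduled query that reads buffered events, plus the paths to watch.
# The path pattern and the 30-second interval are assumptions.
fim_config = {
    "schedule": {
        "file_events": {"query": "SELECT * FROM file_events;", "interval": 30}
    },
    "file_paths": {"ssh": ["/root/.ssh/%%"]},
}
fim_json = json.dumps(fim_config, indent=2)


class EventBuffer:
    """Toy subscriber table: holds published events until they are queried."""

    def __init__(self):
        self._pending = deque()

    def publish(self, action, target_path):
        # Called by the publisher as soon as a change happens.
        self._pending.append({"action": action, "target_path": target_path})

    def drain(self):
        # Called by the scheduled query; returns events seen since last run.
        rows = list(self._pending)
        self._pending.clear()
        return rows


buffer = EventBuffer()
state_before = set()  # files present at the previous scheduled run

# A file is created and removed again between two scheduled runs.
buffer.publish("CREATED", "/root/.ssh/example_key")
buffer.publish("DELETED", "/root/.ssh/example_key")
state_after = set()  # files present at the next scheduled run

# Comparing state sees nothing, while the event buffer has both changes.
state_changed = state_before != state_after
logged = buffer.drain()

print(fim_json)
print(state_changed, logged)
```

The design point is that the publisher records changes as they happen, so the scheduled query only decides when they are written to the log, not whether they are seen.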
So we should see those events turn up in the log file eventually. If I'm lucky. Hopefully... oh, there we go. I was going to say hopefully we don't have to wait the whole 30 seconds for that to happen. So you can see that each of the changes to those files was picked up, the fact that they were created, and you've got additional details there. But basically that's an example of the eventing framework in action.

Okay, so say you wanted to target a specific set of queries at a specific set of hosts. Having everything in a single configuration file is less than ideal. The way that you can distribute, in a modular fashion, a certain set of queries that you want to run on a host is to use query packs. Now, the osquery community has a number of these already, specifically around incident response, IT compliance and vulnerability management. Clearly, all of these are operating-system focused and examine the operating system for abnormal behavior. If you want to enable any of these or all of these, basically all you have to do is add this section to your configuration and point it to a specific query pack file.

And if you wanted to write your own, all you really have to do is translate a query such as this. Say you've been reading the security guide and decided that one of the security checks at the end of the identity section was something that you wanted to monitor. You might use a query like this to see what the file permissions of the Keystone configuration files are on a regular basis. If you want to turn that into a query pack, the way that you would do that is simply... Well, there are a couple of things to note here. Apart from just adding the query as you would for a normal schedule, you want to add the discovery stage.
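A pack for the Keystone permissions check could be sketched as below. The "discovery", "platform" and "queries" keys in the pack, and the "packs" section in the main configuration, follow osquery's pack format; a pack only runs when its discovery queries return rows. The file locations, pack name, query name and interval are all assumptions for illustration, and the SQL is one plausible way to express the check rather than the query shown on the slide.

```python
import json

KEYSTONE_CONF = "/etc/keystone/keystone.conf"  # assumed install location

keystone_pack = {
    "platform": "linux",
    # Only run this pack on hosts where the Keystone config file exists.
    "discovery": [f"SELECT 1 FROM file WHERE path = '{KEYSTONE_CONF}';"],
    "queries": {
        "keystone_conf_permissions": {
            "query": (
                "SELECT path, mode, uid, gid FROM file "
                f"WHERE path = '{KEYSTONE_CONF}';"
            ),
            "interval": 3600,
            # Log the full result set on every run, not just differences.
            "snapshot": True,
            "description": "Ownership and mode of the Keystone config file",
        }
    },
}

# Fragment of the main osquery configuration that points at the pack file.
# Both the pack name and the path are hypothetical.
osquery_config = {
    "packs": {"openstack-keystone": "/etc/osquery/packs/openstack-keystone.conf"}
}

pack_json = json.dumps(keystone_pack, indent=2)
config_json = json.dumps(osquery_config, indent=2)
print(pack_json)
print(config_json)
```

Writing pack_json to the path named in the "packs" section is all that is needed to ship the check separately from the main configuration file.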
So the discovery stage is making sure that a query pack is only executed when it makes sense to execute it. For example, you don't want to run a bunch of queries about Keystone if Keystone is not installed on the system. You can also make queries platform-specific and dependent on specific versions of osquery. Additionally, you don't always have to do differential changes. So here I've set snapshot to true, and by doing that, effectively, every time that query is run, the entire result set will generate logging events.

I said earlier that one of the best features of osquery is how extensible it is. It's possible to write custom extensions for how it's configured, how events are logged, and even custom tables. Most of the examples that you'll see online are actually in C++, but thankfully for the OpenStack community, there's a project called osquery-python that makes this really, really easy. The way that extensions work, effectively, is that they run as a separate process and interact with the osqueryd and osqueryi extension manager via a Thrift interface over Unix domain sockets.

So as an example, this is probably the most brain-dead example that you can give of how to write a custom table, and you can see how easy this is. I've basically had to inherit from a table plugin, define a name method that returns the name of the table that's going to show up, and also define the columns that are in that table. And then the generate method is what actually does all the work when a query happens. In this case, I'm returning static values, which doesn't really make a lot of sense, but there are other things that we can do that are more interesting there.

So I've been playing around with this a bit, and I feel like this is an interesting way to look at the security state of a host that is running an OpenStack service. So I started playing around with some of the tables that might be interesting to look at.
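A static table of the kind just described can be sketched with osquery-python roughly as follows. The TablePlugin base class, the register_plugin decorator, TableColumn and start_extension are part of the osquery-python package. The table name, column names, values and extension name are made up, and the guarded import and the check for a --socket argument are additions for this sketch so that the row logic can be exercised on a machine without osquery installed.

```python
import sys

try:
    import osquery  # the osquery-python bindings (pip install osquery)
except ImportError:
    osquery = None  # lets static_rows() be tried without the bindings


def static_rows():
    """Rows the table returns; fixed values, purely for illustration."""
    return [{"foo": "bar", "baz": "qux"} for _ in range(2)]


if osquery is not None:

    @osquery.register_plugin
    class StaticExampleTable(osquery.TablePlugin):
        def name(self):
            # Table name as it will appear in osqueryi (hypothetical).
            return "static_example"

        def columns(self):
            return [
                osquery.TableColumn(name="foo", type=osquery.STRING),
                osquery.TableColumn(name="baz", type=osquery.STRING),
            ]

        def generate(self, context):
            # Called each time the table is queried.
            return static_rows()


if __name__ == "__main__":
    # osquery hands extensions a --socket argument pointing at the
    # extension manager; only try to connect when it is present.
    has_socket = any(arg.startswith("--socket") for arg in sys.argv[1:])
    if osquery is not None and has_socket:
        osquery.start_extension(name="static_example_extension", version="0.0.1")
    else:
        print(static_rows())
```

Replacing the body of static_rows with calls into an SDK or a configuration parser is all it takes to turn this into a table with real data.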
There's a lot of opportunity to do more than that, and obviously there are a lot of operator-specific queries that you'll be interested in running in production. So let's take a look at some of the things that I put together.

Okay, so the first thing I should do is start osquery in interactive mode. If you were using this in production and you had a lot of custom extensions, they can be automatically loaded any time the process starts. But just for the purpose of the demo, I'm going to start these manually.

So the first demo I have is just using the OpenStack SDK to communicate with the cloud and present that information in an osquery table. Here's the example table that I created for Keystone, and it's pretty much the brain-dead sort of stuff that you can get from running the OpenStack command line client. I mean, clearly this is not really that useful for intrusion detection purposes, but it is just an example of the kind of thing that you can do. So after I've started that table, we should be able to see here that a bunch of Keystone tables have been added to the ones that we can query. And you can query them as you would expect. There you go. And I guess that is the kind of thing that you can do.

The next example I had was OpenStack configurations. One of the artifacts that comes out of the OpenStack Security Project is security notes, and often they're based on particular configuration scenarios that might have a security impact on a deployment. So being able to look within your environment for specific settings is kind of useful. So if we start that one up, do the same sort of thing... so that did not work. Ah, thank you. There you go. And you can see I've used a secure password to connect to my RabbitMQ there. Pretty nice for a security presentation. But there you go.
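The row-producing part of a configuration table like the one in that demo could be sketched with the standard library alone. The column names, the sample file contents, the source path and the list of weak passwords are all assumptions, and configparser only approximates how OpenStack services parse their own configuration, so this is an illustration of the idea rather than the code from the demo.

```python
import configparser

# Invented sample, loosely shaped like an OpenStack service config file.
SAMPLE_CONF = """\
[DEFAULT]
debug = False

[oslo_messaging_rabbit]
rabbit_userid = stackrabbit
rabbit_password = password
"""


def config_rows(text, source):
    """Flatten an INI-style config into (source, section, option, value) rows."""
    parser = configparser.ConfigParser(
        interpolation=None,             # keep % characters literal
        strict=False,                   # tolerate repeated options
        default_section="__unused__",   # treat [DEFAULT] as a normal section
    )
    parser.optionxform = str            # preserve option name case
    parser.read_string(text)
    return [
        {"source": source, "section": section, "option": option, "value": value}
        for section in parser.sections()
        for option, value in parser.items(section)
    ]


rows = config_rows(SAMPLE_CONF, "/etc/nova/nova.conf")

# The kind of rule a query over such a table could express.
WEAK_PASSWORDS = {"password", "secret", "guest"}
weak = [
    row for row in rows
    if row["option"].endswith("password") and row["value"] in WEAK_PASSWORDS
]

print(rows)
print(weak)
```

Returned from a table plugin's generate method, rows of this shape would let a scheduled query flag a specific setting across every host that has the file.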
Like I say, you might be able to write specific rules to detect OSSNs in your environment. And the last one, because I'm interested in vulnerability management, basically looks for the versions of Python modules that are loaded by default within the Python environment and tries to fingerprint them. You can also query the OpenStack versions that are installed on the system. And as this is a DevStack instance, you've got all the different versions that are installed. So the idea here is that you could probably take OpenStack security advisories a step further and generate rules that detect hosts within the affected version range. But that's really as far as I got with custom tables. I think there's a lot more you can do there, so that would be the next step.

Hopefully by now you've got an idea of some of the things that you can do with osquery. There's a lot I haven't talked about, like the way it can be extended, the different logging mechanisms, the way it can be centrally configured. But basically what I want to do as a next step is look at how we can take some of the value that the OpenStack Security Project creates, unlock it from the documentation and the mailing lists, and make it easily consumable by operators, so that it turns up on their log management dashboard when something unusual happens. Some of the ways that I think you can do that are, as already mentioned, the affected versions for OSSAs, and maybe even looking at certain behavioral indicators of compromise. Major Hayden actually talked in the last session about the crown jewels, so you could add a lot of tables around what's connected to your RabbitMQ instance, what's talking to it, what hosts are talking to it. That kind of thing would be easy to add as an additional table.
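The affected-version idea mentioned above comes down to a range check against the installed versions that a table reports. A minimal sketch follows, assuming purely numeric dotted versions and using invented advisory entries; the identifiers and version numbers are placeholders and do not correspond to real OSSAs.

```python
def parse_version(text):
    """Turn '13.1.2' into (13, 1, 2). Pre-release suffixes are not handled."""
    return tuple(int(part) for part in text.split("."))


# Placeholder advisories: affected from 'introduced' up to, not including, 'fixed'.
ADVISORIES = [
    {"id": "EXAMPLE-0001", "package": "keystone", "introduced": "9.0.0", "fixed": "9.2.1"},
    {"id": "EXAMPLE-0002", "package": "nova", "introduced": "13.0.0", "fixed": "13.1.2"},
]


def affected(installed, advisories):
    """Return the ids of advisories whose range covers an installed version."""
    hits = []
    for advisory in advisories:
        version = installed.get(advisory["package"])
        if version is None:
            continue  # package not installed on this host
        low = parse_version(advisory["introduced"])
        high = parse_version(advisory["fixed"])
        if low <= parse_version(version) < high:
            hits.append(advisory["id"])
    return hits


# Versions as a custom table might report them for one host (invented).
installed = {"keystone": "9.2.0", "nova": "13.1.2", "glance": "12.0.0"}
hits = affected(installed, ADVISORIES)
print(hits)
```

In an osquery deployment the same comparison could be written as a scheduled query over a versions table, so that a match shows up as a log event.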
But I mean, yeah, that's kind of where I was going with that. I think it's a great platform. I don't know if I've sold it very well, but I think it provides a solid automation platform that you could use in an OpenStack deployment to monitor the security posture of that system. And I think it also meets the requirement of performance: it's very easy to configure and tune so that it doesn't actually have any impact on the control plane or any operational workloads.

Okay, I'm going to stop there. If you have any questions, feel free to ask them now. I put the code up under OpenStack Barcelona on my GitHub. And yeah, thanks, otherwise.