Good afternoon. My name is Andrew Gavin, and I'm here to talk to you about a tool I wrote about a year ago and have been updating ever since. It's called OpenDLP, and I'll show how you can use it to steal sensitive data from thousands of systems in less than an hour. So just the standard disclaimer: I'm here just representing myself. Even though I work for Verizon Business, they have nothing to do with the tool and nothing to do with the presentation. And also, if you use it and you get in trouble, not my fault.
So my outline here: I'm going to talk about what OpenDLP is, for those of you who aren't familiar. This is, by the way, building on a presentation I gave at ShmooCon earlier this year. Then my reasons for writing it and how the agent portion works. I'll show benchmarks between the agent and an agentless scanner, and you can see the drastic speed improvements that an agent offers. I'll give a live demo of the agent and also live demos of some new features. I've got four demos lined up. I plan on flying through these slides because I don't have much time, but I do have quite a few slides, and I hate slides. I like demos. And then at the end we'll do a conclusion, show my contact info, and we can have a few minutes for Q&A.
So what is OpenDLP? For those of you who don't know, it is a data discovery tool, and there are two components to it. There's a web app that kind of controls everything, and that's on the LAMP stack, so Apache, MySQL and Perl. And there's a Windows agent that runs on, obviously, Microsoft Windows. It's open source, released under the GPL version 3, and it is useful for compliance people. So if you're like a PCI guy and you want to find out where your PCI data is, you want to use this. It's also good for proactive network and system administrators, because we all know they are proactive, right?
And then finally, the coolest thing is what I do. I'm a pentester, so I really wrote this for myself. I use this after I get domain admin, and then I just let this thing rip on the entire network, and it's pretty cool.
So what was my reason for writing it? Well, there really was no free agent-based solution last year when I started this. The only solutions were really GUIs that you could run on your desktop, like Cornell Spider, and you could hack those to be an agentless scanner, where you would do a net use to the remote hard drive and mount it locally. But as you'll see with the benchmarks, that's not really ideal for a very, very large deployment. It's going to be very, very slow.
So how does it work for the agent-based scans? How do you get it going? Well, the first thing you want to do is create a policy, and this policy is going to be reusable. You're going to have your administrative credentials, because the agent runs as a service and you need to be an admin on the box to install a service. And then you can do other things like whitelist and blacklist files and directories. And then you want to configure the regular expressions that you're going to use. It uses PCREs. I assume we're all familiar with that here. And then a few other things that I'll show.
Then you're going to start a scan. It's going to be deployed over SMB, and it's going to get kicked off by the winexe program, which is like PsExec for Linux, and it can concurrently deploy the scanners, as many as you want in parallel. So instead of just sending out one at a time, you can send out maybe 30 or 50 at a time just to get it going faster. Now, when the agent is running on the Windows box, it's going to run as a service, as I said, but it'll run at low priority. So no one's really going to see or feel it. There's not going to be a little pop-up GUI box or anything in the system tray or anything like that. It's also going to limit itself to a percent of memory.
So if you want to scan some huge 10 gig file and the Windows box only has a gig, trying to load that 10 gig file into the one gig of memory is a bad idea. So what it will do is chop up that large file into smaller chunks, with the chunk size defined as a percent of system memory. So like 10 percent of system memory, or 20 percent, or whatever you decide to use. When it scans, it's going to go through the whitelist and blacklist and then scan the resulting files. And then every so often it's going to ping back to your web app with results, and it'll give little status updates and stuff. And this is done securely. It's over a two-way trusted SSL connection, so if someone tries to man-in-the-middle it, it's not going to do anything.
It's written in pure C. There are no .NET requirements. So if you want to run this on an old Windows 2000 or XP box that doesn't come with .NET by default, it's still going to work. And finally, when it's all done, it's going to uninstall itself automatically as a service. It's going to delete its directory completely. Really, the only way that you'd notice it was there is by looking at the logs. And certainly 99 percent of Windows users won't even notice it was there in the first place.
In the web app you can monitor the agents, and as I said before, each one is going to ping back with results every so often. You can see how many files and bytes it's processed. You can control the agents: pause, stop, uninstall, resume. And you can also view the results live as they're coming in. If you see a finding, you can download that file just to verify that it's actually there. There'll be a little hyperlink there, and it'll tell you the byte offset inside the file where it thinks it found whatever regular expression. Like, I found a social number at offset 500 in this file. So I know what you're thinking. Yeah, I invented multiplayer grep, but someone, I guess, had to do it.
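As an editorial aside, the chunked scan with byte offsets described above can be sketched in Python. This is a minimal sketch under stated assumptions, not OpenDLP's actual C code: the names `chunk_size_for` and `scan_file`, the `overlap` parameter, and the trick of carrying a small tail between chunks so that a match straddling a chunk boundary is still found are all made up for illustration. It also assumes every match is shorter than `overlap` bytes.

```python
import re


def chunk_size_for(total_memory_bytes, percent):
    """Chunk size as a percentage of system memory (both values supplied by the caller)."""
    return max(1, total_memory_bytes * percent // 100)


def scan_file(path, pattern, chunk_size, overlap=32):
    """Return (byte_offset, matched_bytes) pairs without loading the whole file.

    `pattern` must be a bytes pattern because the file is read in binary mode.
    """
    regex = re.compile(pattern)
    hits = []
    tail = b""       # last `overlap` bytes of the previous buffer
    buf_offset = 0   # absolute file offset of the first byte of the current buffer
    with open(path, "rb") as handle:  # read-only: the file is never modified
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            for match in regex.finditer(buf):
                # A match that ends inside the tail was already reported
                # when the previous buffer was scanned, so skip it.
                if match.end() > len(tail):
                    hits.append((buf_offset + match.start(), match.group()))
            keep = min(overlap, len(buf))
            buf_offset += len(buf) - keep
            tail = buf[len(buf) - keep:]
    return hits
```

For example, `scan_file(path, rb"\d{3}-\d{2}-\d{4}", chunk_size_for(total_memory_bytes, 10))` would scan one file in chunks of 10 percent of a caller-supplied memory size and report the offset of each hit, which is the kind of "finding at offset 500" result the web app displays.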
And just to go through some benchmarks, these are the specs. The machine is a couple of years old, but just for the sake of this benchmark, I ran it on two gigs with 13 regexes. It took just over an hour: an hour and seven minutes. I'm not going to go through the rest of this, but on the flip side, an agentless scanner doing the same exact thing took an hour and 20 minutes for 13 regexes. And for the agentless scanner, about 20% of that time was spent downloading the files, because with an agentless scanner you basically have to download the entire file system to your own box so you can process those files. So 20% of the time was spent on that and nearly 80% of it was spent on crunching the numbers.
Now, if you're going to do this for more than one box, more than one target, you're going to run into some bottlenecks. And probably the biggest bottleneck is going to be your own system's CPU, and that's what's really going to slow things down. So for just one system, it's only really 19% slower, but if we extrapolate this to more systems, we see here that the blue line, the OpenDLP agent, remains flat at just about one hour. And the agentless scanner with one core will take over a day for 25 systems. Just 25 systems takes over a day. Oh, sorry. So on the bottom, there's really not much information. It just says that this graph goes from 100 to 2,000. Sorry about that. So for 2,000 systems, which is way on the right, it'll take almost three months to scan them with the single-core system that I ran my benchmark on, but with the OpenDLP agent it just takes one hour. You can't see that line, but trust me, it's there. It just remains flat.
The upsides to an agent-based solution are that all the computations are done on those victim systems. It's basically a distributed project. It's like SETI, but instead of searching for aliens you're owning data.
And it also doesn't have much network traffic. It's only sending out about one meg initially with the agent, and then every so often it pings back with those results and the log files. So it's really not a whole lot of traffic. The downsides to the agentless scanner are, of course, that everything has to be processed by you, by your own laptop or your own system. So if you're going to do this 2,000 times in parallel, it's really going to crush your CPU, and of course you have to download everything to your system as well.
So I'm going to show a live demo of the agent. This is the interface. Let me make it a little bit bigger. What you first want to do is go to the profiles and create a new profile. So for this one, we'll just call it agent. And we'll select the Windows file system for the agent. You can mask or unmask sensitive data. I don't like to mask sensitive data because that's lame. So we want to use the local administrator account with the secure password of blah 1, 2, 3. You have to specify the domain or the workgroup. If you don't have the password, though, someone sent a patch so that you can put in the SMB hash. So even if they've got like a 64-character-long NTLM password that's super complex, that rainbow tables won't even touch, no problem. Just put in the SMB hash and you're good to go.
The install path: this is kind of important, because when the agent is uninstalled it will recursively and forcefully delete this directory. So please do not, do not, do not install it to the Windows directory or anything like that. You've been warned. This is the memory limit that you can set, where it will chop up the files. Here's where you can whitelist and blacklist directories. So I've got some sample data in this directory. And likewise, here's where you can whitelist and blacklist file extensions. So pictures, movies, EXEs, things you probably don't care about that wouldn't contain sensitive info. Here are the regexes. So we'll check some of these.
You can add your own regexes. As I said, they're based on PCREs. These options here tell the agent which regexes to treat as credit cards. So if it ran across a 16-digit number, you might think it's a Visa or Mastercard, but it's going to run that through the mod 10 check. Yeah. Yeah, exactly. That's what this is, exactly. So it will cut down on false positives. And with these options here, it will read inside zip files. So Office 2007, OpenOffice, just normal zip files. It will pass over them once as a normal file. Then it will try to unzip them and go through the contents a second time. This is the upload URL, and it takes basic authentication credentials in addition to the certs. I don't want to fat-finger it, so I'll copy and paste. This is the time between uploads, so how often it will ping back. And we just fill out this stuff and submit the new policy.
Now we want to start the actual scan. So we'll just name this agent. We select the profile that we just created. And we enter our guinea pig here. And it's going to start. If you were to scan maybe a thousand or two thousand systems, on this page you would see a live scroll here saying how many systems remain: 500 systems remaining, 400, 300. Once you get down to zero, then you know it's safe to leave this page. If you leave this page before then, it might interrupt the deployments.
So if we go back now to our guinea pig system, we can see that OpenDLP is running at below-normal priority. It's going to run as a service. And let me try to bring that up. Hope it's just done. We see it running as a service. And eventually, now it's gone. So when it's done, or even while it's running, you can view the results live. You just go to view scans and results. It's going to give you a summary of the scans here. You select one, and here it's going to give you all the systems in that one scan that I just launched. So there's only one system. And we can view the results here.
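As an editorial aside, the "mod 10 check" mentioned earlier in this part of the demo is commonly known as the Luhn checksum. The sketch below is a generic Python version of that checksum, not code taken from OpenDLP, and the function name `luhn_valid` is hypothetical.

```python
def luhn_valid(candidate):
    """Return True if the digits in `candidate` pass the Luhn (mod 10) check.

    Spaces and dashes are ignored, so "4111 1111 1111 1111" is accepted.
    """
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0
```

A regex hit on a 16-digit string would only be kept as a credit card finding if `luhn_valid` returns True for it, which is how a check like this cuts down on false positives from arbitrary 16-digit numbers.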
So we found possibly a social number, let's say, in this file here. We can click it, download it and open it. And we see, yeah, there's probably a social number: number ones, number twos, and then down here number threes. So we can verify that. If you think you found some false positives, you can check these guys, scroll to the bottom and just mark them as false positives. Go back, and they're gone. If you think you accidentally marked something as a false positive, you can manage your false positives here. Just drill down to the system and uncheck a couple. Now they're not false positives. We can go back to the results and refresh, and they're back now. So that's pretty much it for the agent scanner. Let me go back to my slides now.
Recently I added some new features, though. I gave a talk in Amsterdam in May, and I added an agentless database scan. So I've got support for Microsoft SQL Server and MySQL. And then most recently, right before this conference, I added agentless support for Windows and Unix. For the database scans, it's very, very similar to creating a policy for an agent scan. The only difference is that instead of whitelisting and blacklisting files and directories, you can whitelist and blacklist tables, databases and columns. That's pretty much the only difference. It's going to run as a shell script, a Perl script, on your own system in the background. And it's going to walk the database structure just like you would walk through a SQL injection. So it's going to enumerate the databases and the tables and the columns, and it's going to go after the data. And then you can control the scans too.
So I'll give a quick demo of that. We're going to create a new profile again. Call this MySQL. Test and test. And here's where you can whitelist and blacklist your databases, your tables, your columns. You can limit how many rows you grab. So if you want to grab all rows, just enter a zero.
But be aware that some tables are quite large. If there's a million rows, it would take a while. So we'll submit that. And we will launch our scan just like we did last time. Select the profile that we just created. I'm going to cheat and just do a loopback because I didn't bother to set up MySQL listening on 3306. So this is going to go pretty fast. In fact, it should be done, because there's not a lot of stuff. Here is the scan that we just ran. And we see that it's done. And we see, well, you guys really can't see that. There's five findings. Trust me. And they're all social numbers. It will give the database, the table and the column name. If we want to verify that, there's no option for me to verify that right now. But what we can do is just go into the database itself. And we see that here's what it found. All that stuff. So that's it for the MySQL demo.
Now what I'm going to do is demo the agentless OS scan. Let me talk about it first. The policy is again very similar. You don't need admin credentials for this scan. It's helpful, but obviously if you don't give it an admin account, it's most likely not going to be able to read all the files. It also honors the whitelisting, the blacklisting and the memory ceiling. It's going to be the memory ceiling on the guinea pigs. And then it's going to run in the background as a shell script, as a Perl script. I currently have support for Windows, the entire file system over SMB. Windows shares. You guys can't see that. And then also Unix over SSH, using the SSHFS method there.
So I'm going to do a demo of Unix real quick. I'll create a new profile and call it Unix. And I've got some test data in a directory somewhere. I've only got about five minutes left, so I don't want to scan my entire system. And again, the same file extension options, the regexes here. Credit cards, zips. And we're good to go. We'll start the scan. And it's now started. So we can view it as it's going.
And wow, okay, it's already done. And just like the last time, you can see the results and do all that good stuff. So then finally, what I'm going to do is demo a Windows share, because that's just slightly different. We'll create a new profile. This particular share is completely wide open. You don't need any credentials at all, so I'm not going to fill in anything. When you run your vulnerability scanners, you'll probably see that quite often. The directory here, though, is a little bit different. It's relative to the path of the share that you're going to give it. If you start doing backslash Windows, it's not going to know where that is, because it's got to be relative to the actual share. So we'll just leave that blank for now. And the file extensions and regexes again. And the same thing, credit cards and zip files. So we'll submit that.
And we'll start a new scan again. There's a slightly different thing here, where instead of giving it a list of IP addresses you have to give it the actual full path to the share, so it knows that. And if you were to whitelist or blacklist files or directories, it would just append them here like that. But you don't have to do that. Just give it the base path of the share. So we click start. And it's going to go in the background. It's going to download all those files over the share. And we can view the results as they're coming in. And that was faster. Usually, if you catch it in time, you're going to see that it'll give you something like "I'm 20% done," and it keeps estimating. But for the purpose of this demo, I don't have that much time. Here again you can see the same exact stuff. You can download the files, check them out. And good to go.
So, conclusion. For pen testers: OpenDLP is free. It's open source. After you get domain admin, or after you find some database credentials or Unix credentials, let it rip.
Because you can show the C-level executives, show your customers, that there's very much a risk to someone getting domain admin. A lot of those people don't really realize that. It's "oh, you got domain admin, okay, whatever." But if you show them, "okay, well, here's all your customers' social numbers," or "here's all your customers' credit card numbers that were on Peggy in HR's system or Bob in Finance's system," that's pretty damning. And then finally, for everybody else: if you're some sort of admin, this is free. And really, you should be using this to find those weird systems of your own that you don't know about, before people like Anonymous or LulzSec or our favorite, you know, nationally sanctioned hacking groups find them. And just to reiterate, it's multi-platform. It does file systems and databases, so really there's no excuse why you shouldn't be using this.
This is the project page. It's on Google Code, and the current version is 0.4. It's kind of a bit of a pain in the ass to install, so I made a VM about a year ago. The VM is a little outdated. I'm going to update it in the next few weeks, but it's based on 0.22. It's easy to upgrade. And then my contact info is there. And I believe we have time. Yeah, we have maybe five minutes for questions, if anybody wants to. Yeah, go ahead.
Sorry. The question is whether I've looked into using IFilters on Windows to look into different binary types. I do want to get into that, especially Outlook PST files, because those are a freaking gold mine.
Yeah. In fact, you can make your own regexes. By default it comes with 13, but here's an interface where you can create your own regexes. So just give it a name and some kind of pattern here or whatever, and then you're good to go.
Yeah. So the question was, how do I know that this tool won't modify data or harm data in any way, because people are leery about open source tools. I open the files read-only, so if they are modified after I open them, I am not sure what happens.
But it will not purposely modify the files at all. It's read-only, strictly read-only.
Yes, it would be listed in the logs here. There's a section here for the logs, and any file that I cannot open is going to be mentioned here in the logs. So there's not much here. I could open all the files that I tested in my demo, but it will mention it there.
Have I thought about enumerating cackles? Oh, the ACLs. Okay. Not so much right now, but perhaps down the line.
Yeah. Yeah. Great question. So as a consultant, I don't like to leave my systems on the job, and his question was how the agents deal with a lack of communication with the web app or your own server. There's that phone-home option, every five minutes or whatever you set. It's going to keep trying every five minutes. If it cannot contact your web app, it's going to keep running and keep doing its grep, and then every five minutes it's going to try to phone home. At the end, if it's completely done searching all the files, it will still try every five minutes just to phone home. So let's say you launch the scan on Tuesday, and you come back in Wednesday morning and plug in. The first five minutes are kind of cool to watch. But yeah, it will handle losing communication with the web server just fine.
Yeah. It depends on how many systems you're running, and also how many findings there are, and certainly you can set the log verbosity in the profile too. I haven't really investigated it too much, except that I know it can handle several thousand just fine on a decent, recently made laptop.
No, there are no agents on a database server. The database scan is agentless, so it's going to remotely connect and download all the tables and stuff.
Yeah. Negligible. It's just downloading the tables and stuff, just like a normal client would. It downloads the data locally and then does the processing locally. It doesn't do anything on the database except download the data. Yeah.
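As an editorial aside, the agentless database scan described in these answers (connect remotely, enumerate tables and columns, pull rows, and run the regexes locally) can be sketched as follows. This is an illustration only: it uses SQLite from Python's standard library as a stand-in so that it runs without a database server, whereas the talk covers MySQL and Microsoft SQL Server. The function name `scan_database` and its parameters are hypothetical; `row_limit=0` meaning "all rows" follows the convention mentioned in the demo.

```python
import re
import sqlite3


def scan_database(conn, pattern, row_limit=0, skip_tables=()):
    """Walk every table and column, returning (table, column, value) findings.

    Only SELECT and PRAGMA statements are issued; all matching happens client-side.
    """
    regex = re.compile(pattern)
    findings = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        if table in skip_tables:  # blacklist of table names
            continue
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        query = f'SELECT * FROM "{table}"'
        if row_limit > 0:  # 0 means grab all rows
            query += f" LIMIT {int(row_limit)}"
        for row in conn.execute(query):
            for column, value in zip(columns, row):
                if isinstance(value, str) and regex.search(value):
                    findings.append((table, column, value))
    return findings
```

Against a real MySQL server, the enumeration step would typically query `information_schema` instead of `sqlite_master`, but the overall shape (enumerate, select with an optional row limit, match locally) stays the same.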
Yeah, the question was about a self-destruct, like after a few days it'll just uninstall itself. I haven't thought of that, but the problem I'd run into is how Windows works. From what I understand, a running process can't uninstall itself because it's running. I might be wrong, but that's why, when OpenDLP uninstalls itself, it's the web app sending another one of those winexe commands to the system. But that is a really good idea, just to cover your tracks more.
Yeah. Another great question: what happens when the victim systems that you're scanning with the agent die or get rebooted or something? Since it runs as a service, it'll automatically restart when the system restarts, and OpenDLP keeps track of the last file it scanned, so it'll just go back and resume where it was before. I mean, if the system is completely dead I can't help that, but if it gets rebooted, or if no one's logged in, it'll run and it'll resume just fine.
Antivirus, good question. Right now OpenDLP is not labeled a virus by anybody, and I think if it ever is, it'd be quite interesting, because a lot of those AV companies also have DLP programs, so there's a little conflict of interest there. But right now it's not identified as a virus. If it tries to open a file that's identified as a virus, then something will pop up and the user will see that, because I've run into that with AVG on occasion.
Yeah, like a schedule. His question was, have I set up any sort of scheduling to do these systems at a particular time? Not yet, but that is on my to-do list, absolutely.
Anybody else? Otherwise I'm going to wrap up. Yeah, one more question. I'm sorry, it's really hard to hear you. How am I storing the data?
It's stored locally in a MySQL database, and you can select whether to mask or unmask that data. So if you select to mask it, because you're worried about becoming another risk yourself, it'll mask the first 75% of whatever string it finds and leave the last 25% unmasked. But it is stored in plain text. If you are really worried about it, you can set up a TrueCrypt volume for your MySQL stuff, but that's kind of outside the scope of my tool right now. But that's all the time I have. Thank you.
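As an editorial aside, the masking rule from the last answer (hide the first 75% of a finding, leave the last 25% visible) can be sketched in a few lines of Python. The function name `mask_finding`, the `X` mask character, and rounding the masked length down are assumptions for illustration. The talk does not say how OpenDLP rounds or which character it uses.

```python
def mask_finding(value, mask_char="X"):
    """Mask the first three quarters of `value`, leaving the rest readable."""
    masked_len = len(value) * 3 // 4  # integer math; rounds down (an assumption)
    return mask_char * masked_len + value[masked_len:]
```

With these assumptions, a 16-digit card number keeps only its last four digits, for example `mask_finding("4111111111111111")` gives `"XXXXXXXXXXXX1111"`.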