A little bit about who we are. I'm Jason Novak. We're both digital forensic examiners with Stroz Friedberg, which is a digital risk management firm. We get called in to handle SQL injection attacks after companies realize they've had a problem: "We see our databases on Pastebin. What happened? Who took our data? How much data did they take? Is that all the data they took?" But sometimes we also see SQL injection attacks in less obvious scenarios: "We have a large number of customers reporting that credit cards they used at our store, and only our store, have been compromised and used in fraudulent transactions. Were we attacked somehow, and was it a SQL injection attack?" So we see this in a couple of different ways, and we are trying to automate the response. SQL Reinjector is really the result of us trying to automate the response as attackers have grown more sophisticated in how they attack and have automated their tools and their attacks accordingly. That all being said, Drea and I are incident responders, so the framework in which we think about these attacks, and about responding to them, is the response mindset rather than preventative actions such as the Etsy guys were talking about on Friday, which we'll come back to. So in terms of how the presentation is going to go, we're going to talk a little bit about the problem and the historical solutions. We're going to talk about SQL Reinjector, which is the tool we are releasing today. It's on your CDs, and a newer, better version is on GitHub as of right now; we'll get to that in a little bit. We're going to do a demo, and actually, just so I have a sense for the demo, how many people here have responded to a data breach caused by a SQL injection in the past year? All right. All right, good show of hands. How many people here have caused a SQL injection data breach in the past year? Okay. I'm glad to see some honesty here.
So I'll actually show you Havij and sqlmap, just to give you a sense of what attackers are doing, for those of you who aren't familiar with them. So that's just as background. Then we'll go through where to get the tool and the most recent version of it. We'll take some questions, either here, in the Q&A room, or both. And then there'll be some slides about Stroz Friedberg and some works cited. So we're going to do a talk on SQL injection. We kind of have to show Bobby Tables. I'm sure everyone here has seen it more than once. The thing that I think is kind of sad is that it was released in, what, 2007 or 2008, and it's still showing up in slide decks like this one, and we're still talking about it in rooms like this. So it speaks to the fact that SQL injection attacks are still prevalent, still something we have to deal with, and something that we need to do a better job of dealing with. But this also speaks to the fact that it's a lot easier to launch a SQL injection attack now. Right? You no longer need to know how to craft the weird statement that causes the database to, you know, go crazy. It's no longer just sophisticated attackers, or even unsophisticated attackers acting in their basements. It's anyone. And now the issue is that anyone has to be able to respond to these attacks and quantify the records taken and what data was exfiltrated. So at this point I'm going to turn it over to Drea, who's going to talk a little about SQL injection writ large. Hey, y'all. So yeah, before we demonstrate our tool we just want to give you a little bit of background on SQL injection. I know that for a lot of you it's going to be very remedial. But one thing that Jason and I noticed when we came here was that the spectrum of the venue is very broad.
And so we like to give a little bit of background so that those of you who maybe don't know can understand a little bit more about what we're talking about. So if you look at the URL at the top of the slide, it looks very familiar to all of us. You have the URL followed by a question mark, and then there's a parameter and value pair. When you go to this URL, this is the page that you're going to see. If we change that parameter value pair, though, the website changes, and from this we can infer that it's a dynamically generated page, probably with a database back end. So what's actually happening is that this URL is sending this query to a database in the background, and the database response is presented to you as what you see on the screen. So for the sake of this demo we're assuming that the web app is going to go ahead and trust you, and it's going to pass along whatever you provide. And therefore it is not validating or sanitizing the input, which is kind of scary, but again, for the purposes of this demo that's what we're showing you. So I don't know if anybody here was at Dan Kaminsky's talk on Saturday, but mysql_real_escape_string is 25 characters long. Why would developers care to put it in? You know, it takes more time, so we still see this in the wild. Exactly. So interestingly enough, if we change that parameter value pair it will return more data than what was supposed to be requested. This is of course the breadth of the attack and the concern. And so if you look at this query here, when executed, instead of bringing back the details of one page for one movie, it grabs information about all actors in the database, and what we get as a result is this page, which is clearly very different.
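The difference between the normal request and the injected one can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses an in-memory SQLite database rather than the MySQL back end from the demo, and the `film` and `actor` tables, their rows, and both lookup functions are made up for the example.

```python
import sqlite3

# Hypothetical stand-in schema; the demo app in the talk sits on MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO film VALUES (1, 'Film One'), (2, 'Film Two');
    INSERT INTO actor VALUES (1, 'Actor A'), (2, 'Actor B');
""")

def lookup_vulnerable(film_id: str):
    # The parameter value is pasted straight into the SQL text,
    # so anything the visitor types becomes part of the query.
    query = "SELECT film_id, title FROM film WHERE film_id = " + film_id
    return conn.execute(query).fetchall()

def lookup_parameterized(film_id: str):
    # The driver binds the value, so it is compared as data and never parsed as SQL.
    return conn.execute(
        "SELECT film_id, title FROM film WHERE film_id = ?", (film_id,)
    ).fetchall()

payload = "-1 UNION SELECT actor_id, name FROM actor"

normal = lookup_vulnerable("1")          # one film, as the page intends
injected = lookup_vulnerable(payload)    # every row of a different table
safe = lookup_parameterized(payload)     # no rows: the payload matches no film_id
```

With the concatenated query, the page that was meant to show one film returns the whole `actor` table; with the bound parameter, the same input simply matches nothing.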
So as we see in the SQL injection query, once you figure out that the page is vulnerable you can exfiltrate data not only from the database tables that the web app knows it's using but also from other database tables in the same database system, which is essentially what this query is doing. And as a result, that's what we get. So from the page before, we've gone from film data to back-end payment data. And so obviously SQL injection is expensive, and there are, you know, some bullets here. 97% of data breaches worldwide involve SQL injection at one point or another. That doesn't mean it's the entirety of the breach, but it can be a part of it. And the average cost, and this has actually gone down recently this year, of the data breach response and remediation for a SQL injection attack is between $194 and $220 per record. And just to bring that to light, as of July 9th, when we were working on these slides, privacyrights.org cited 330 breaches in 2012 that affected 18.6 million records. And that was actually the most conservative number we could find. DataLossDB.org reports much higher, at 723 breaches so far this year. So just because this is a really old attack, I don't want to say old, but because it's been around for a while, doesn't mean that the old stuff is not still working. It clearly is. It's still around: just within the last month, very recently, Yahoo Voices had 450,000 users' credentials compromised through a union-based SQL injection attack. And, you know, just a caveat that a SQL injection attack doesn't necessarily mean you're taking data from something. You could also be putting data into something, similar to the Heartland and HBGary incidents in the past. So I'm going to give it back over to Jason. Thanks. So Drea, I think, has done a great job of establishing the problem, right? SQL injection attacks are still out there. They're costly. They exfiltrate a lot of records.
They're a well-known category of attack. And, you know, what's made it worse for the IR community, worse for people who have valuable records in databases, is that the cost of attacking has gone down because there are tools out there now, like Havij, like sqlmap, that automate the entire attack for you. So the skill needed to run these attacks has gone down. The cost of running the attacks has gone down. And the amount of data stored in databases has gone up, right? So this leads to cheaper costs for attackers and higher costs for defenders and responders. So my thinking is, if attackers can automate their attack, why can't we automate our response? You know, the problem is that the traditional model of responding to SQL injection attacks, and IR cases more generally, is something roughly like the following. I think people who raised their hand earlier about responding to SQL injection attacks will really appreciate this. It's, you know, you fly a bunch of consultants out to a data center. They image the server. They analyze the logs from the server to the extent they're still there. They determine what was exfiltrated through some analysis of those logs, which is typically running SQL commands against a SQL server that they've brought back to their lab, and then there's a lot of database work to get back up and running. And, you know, as I was saying earlier, there's only going to be more data in databases as time goes on. So it's only going to get costlier. We also have problems with queries like this. If you've dealt with a lot of SQL injection attacks, you know what this is doing. You know it's trying to map the database. But if you haven't dealt with a lot of SQL injection attacks, you don't know what this is doing. You think this might be exfiltrating everything. It gets really scary. It gets problematic. You need better training to deal with it, but you don't always have the luxury of better training.
And also, as a responder, looking at these queries takes a lot out of you. I'm really tired of it. I don't want to have to look at them again. So we're saying that that process lets you figure out what was exfiltrated, as if it was an easy process. And it's not, and it's not for a number of reasons that really all boil down to one thing. There's a lot of complex software involved: Havij, which is running the attack; ModSecurity, which is acting as a web application firewall between the attacker and the web app; PHP, or the web app language itself, which is interpreting these SQL injection queries; and the database server itself. So all of these pieces of software are interacting together to result in some sort of successful SQL injection against a database, and then that data gets passed back up the stack to the end user, or to the attacker, who then exfiltrates that data, right? And we've seen problems in the field with these tool sets interacting with each other. Havij, for those of you who don't know about it, and as we'll see in a little bit, doesn't bring back a table at a time. It brings back a cell at a time. And it brings back a cell at a time by pulling a limit from a database table and using that limit as a row ID that it then uses to reconstruct a row. That's all great. Except that those limits aren't guaranteed to be the same every time. So when you're selecting first name from table, order by first name, you know, limit one, you get Jason. You do that for last name, you get London. You do that for email address and you get who knows what. So you get commingled data in the exfiltrated data set. And then who do you notify? And how do you know that this actually happened, and that it happened reliably and repeatedly? Especially if you don't have the data that was exfiltrated and they weren't kind enough to post it to Pastebin. It's a problem. So you're lucky when they post it to Pastebin, because then you know what was taken on some level.
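The commingling problem can be reproduced on a toy table. The sketch below is an illustration of the failure mode only, not Havij's actual queries: the `staff` rows, the `pull_cell` helper, and the choice of SQLite are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE staff (first_name TEXT, last_name TEXT, email TEXT);
    INSERT INTO staff VALUES
        ('Jason', 'Young', 'jason@example.com'),
        ('Maria', 'Adams', 'maria@example.com');
""")

def pull_cell(column: str, row_index: int, order_by: str) -> str:
    # One request per cell: a single column at a single LIMIT offset.
    # Column names come from the fixed list below, never from user input.
    sql = f"SELECT {column} FROM staff ORDER BY {order_by} LIMIT 1 OFFSET {row_index}"
    return conn.execute(sql).fetchone()[0]

columns = ["first_name", "last_name", "email"]

# Each request is ordered by the column being pulled, so offset 0
# points at a different underlying row for each column.
commingled = [tuple(pull_cell(c, i, order_by=c) for c in columns) for i in range(2)]

# Ordering every request by the same key keeps the offsets aligned.
aligned = [tuple(pull_cell(c, i, order_by="email") for c in columns) for i in range(2)]
```

In the commingled result, Jason's first name ends up next to Maria's last name, which is exactly the situation where you no longer know whom to notify from the attacker's reconstructed rows alone.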
So we know that there are a number of steps that you have to go through in a traditional response to deal with these attacks. You know, you have the imaging of the server, you have the analysis, you have the further analysis of the image and logs, and standing up a SQL server. All of this can be leveraged in a new and unique way, we think. We think that if you virtualize the server and replay the attack against it to get the data that was exfiltrated, you're actually going to have a better sense of what was really exfiltrated. You're going to be able to ignore a lot of the exploratory queries that take up a lot of time while you figure out whether they're malicious or not. And you're going to be able to say, okay, here's the actual data set that was exfiltrated. And at this point, you've already made the forensic image as part of your preservation process. You already have it back in a lab where you have some sort of virtual machine software, whether it's VMware, VirtualBox, or an ESX server. You know that there are things like LiveView out there that you can use to virtualize the forensic image. So why not do it? It's a relatively small cost at this point in 2012, and it lets you do some interesting things. So what I'm about to show you is SQL Reinjector, which is a Python script that leverages the apachelog module to take a set of web server logs, parse them, play them against the virtualized web server, and then say, okay, this data is different. This data is what was exfiltrated. So we're going to demo that now, and we're going to move to sitting down. So give us a moment. Sorry, I'm not very good at typing at an angle. So what we've done is we've built a pretty insecure LAMP web app that looks like this. We've built a pretty insecure, is this my con? Yeah, okay. So we built a pretty insecure web app that looks like this. It takes an ID value, and it will change the movie that's displayed.
Whoops. And it will change what's displayed. So that's right there. So what we'll do is we'll run Havij against it, which is one of the automated SQL injection tools that we've been speaking about. It looks a little something like this, and you just point it at the web server; you point it at the website you think is vulnerable. Going to mirror mode. All right, thanks guys for putting up with that, sorry. So you give it the URL, you hit Analyze. It does all the database magic on the back end. It figures out that you have a database called Sakila in there and that you are vulnerable to SQL injection attacks. You can use this to get further information about the database, including the other databases that are on there. You see that three of these are probably not that interesting. So you get the tables off the one that you are interested in attacking. You run Get Tables. You see that there are a lot of interesting tables; the one we're going to attack here, because it has user names and passwords and also looks the most interesting, is staff. Get Columns. You get all the staff columns. You get all of these various fields. So you get the first name, last name, address ID, all this stuff. Get Data. And as you see, as I was saying earlier, it's pulling back each cell of the database one at a time. And it's doing this through a set of limits on the back end that we'll see. So in a minute we've SQL injected a database and gotten one specific table out. I mean, this tool is used in the wild on very significant data breaches because it's super easy, and it makes some pretty messy SQL on the back end, which I'll show you guys now, for when you're responding. I mean, this is what all those queries ended up looking like. So think about decoding this by hand. Think about how painful that would be. Think about running these one at a time against a SQL server after you've imaged it. I mean, those are the options you have.
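To give a sense of what decoding one of these requests by hand involves, here is a minimal sketch, assuming the request is percent-encoded and uses hex string literals. The request path is made up in the general style of an automated union-based probe; it is not copied from real Havij output, and `decode_request` is a hypothetical helper.

```python
import re
from urllib.parse import unquote_plus

# Made-up request path; real tool output will differ.
raw = ("/film.php?id=-1+UNION+SELECT+1%2Cconcat%280x7e%2Cusername%2C0x7e%29"
       "+FROM+staff+LIMIT+0%2C1")

def decode_request(path: str) -> str:
    # Undo percent-encoding and turn '+' back into spaces.
    decoded = unquote_plus(path)

    def hex_to_text(match: re.Match) -> str:
        # Render hex string literals such as 0x7e as quoted ASCII text.
        return "'" + bytes.fromhex(match.group(1)).decode("ascii", "replace") + "'"

    return re.sub(r"0x((?:[0-9a-fA-F]{2})+)", hex_to_text, decoded)

readable = decode_request(raw)
```

Even decoded, each request only asks for one cell at one offset, so a full table still means reading hundreds of near-identical lines.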
And you're pulling it back, you know, one at a time again, because that's how the queries are going: select your single field from the table at one limit. It's kind of lame. There are better ways to respond. And the way we think it's better to respond is to take your compromised server, virtualize it using, you know, VirtualBox in this case, and then take SQL Reinjector and play the logs against it. So SQL Reinjector has a couple of command line options that I glossed over in the last slide, but we will go into more detail now. At a very high level, it's going to take your input log file, which we've already pulled off the server for the purpose of this demo; a database file that it's going to write out for you, which will be a SQLite database that you can then review at your leisure; and the actual URL that you want to replay the attack logs against. To be clear, this should be the virtualized server you're running, not the website that is up and running. That would be bad. That would be replaying the attack on the attacked site. Just throwing that out there. We also have a couple of other command line options for more advanced parsing of the SQL injected return data. As I showed, Havij brings things back one cell at a time. We can reconstruct those cells as tables, as Havij presented them to the attacker. And then attackers aren't always so nice as to show you that they used Havij, or to always use Havij. And sometimes you have to compare the website that you have to a known good version of that website. And that functionality is also in here. There are a couple of other administrative arguments. There's a cookie flag for websites that require cookies. So, you know, if you have a web login type thing you have to get around, you have that possibility. And also, and this is a requirement of the apachelog module, it requires the log format from your Apache conf.
So you have to pull that off the server as well before you virtualize it and undertake this analysis. So we're going to pass it a set of log files from an attack very similar to the one I just did. And we are going to pass it the website that we want to attack. And I think it's important to note here that I'm giving it an IP address. If you are dealing with a machine that hosts multiple sites and it's using some sort of Apache rewrite, you're going to want this to be a host name, and you'll want to deal with DNS on your side, doing DNS spoofing, so that it's going to your virtualized host rather than the live website, and so that the virtual machine is also giving you traffic for the right website. So we're also going to give it a log file. We're going to show you the basic configuration first. And then we'll show you some of the cooler stuff. Log format. Sometimes you forget to give it an output location. So in the background it just took all the logs, parsed them, threw them against the web server, captured the output, and stored the output in a database. So when you open the database you'll see a table, SQL injected returns. It has three columns: an ID column, the actual request that was made to the web server, and then the actual data returned. So here you see the legitimate page that we showed you a couple of minutes ago. This is not very interesting. Where it does get interesting is down here, where you start to see the union selects and you have broken web pages, and then further on down, where you actually have injected data that doesn't look like anything a normal user would expect to see. So this is all great. You have all the SQL injections. You have the data that they resulted in. You have step one of your analysis. Great. But we can extend this. We can make this better. We can make this stronger. We know that this was Havij, but we're going to pretend for a moment that we don't.
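The parse, replay, and store loop described above can be sketched roughly as follows. This is not the SQL Reinjector source: it swaps the apachelog module for a plain regular expression, handles only GET requests, and the `replay` and `http_fetch` functions and the `sqli_returns` table name are hypothetical names chosen for the example.

```python
import re
import sqlite3
from urllib.parse import urljoin
from urllib.request import urlopen

# Matches the request portion of a Common Log Format line: "GET /path HTTP/1.1"
REQUEST_RE = re.compile(r'"GET (\S+) HTTP/[\d.]+"')

def http_fetch(url: str) -> str:
    # Fetch a page from the virtualized copy of the server (never the live site).
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", "replace")

def replay(log_lines, base_url, db_path, fetch=http_fetch):
    """Replay logged GET requests against base_url and store what comes back."""
    db = sqlite3.connect(db_path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS sqli_returns "
        "(id INTEGER PRIMARY KEY, request TEXT, response TEXT)"
    )
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip lines that are not GET requests
        url = urljoin(base_url, match.group(1))
        db.execute(
            "INSERT INTO sqli_returns (request, response) VALUES (?, ?)",
            (url, fetch(url)),
        )
    db.commit()
    return db
```

Passing the fetch function in as an argument is a design choice for the sketch: it lets you dry-run the parsing and storage against a stub before pointing the replay at the virtual machine, for example `replay(open("access.log"), "http://192.168.56.101/", "returns.db")`, where both the log name and the address are placeholders.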
We're going to say, compare it to a known good version of the website, which we've already saved down and have right here. So here it's going to replay the attack, capture all of the returned web pages, and then diff those web pages against the known good that you passed in earlier. And that's going to result in a database that looks like this. In addition to the SQL injected return data that we showed earlier, it's going to have this compare data table. And this compare data table is going to show you, for each row, what differs from the known good. And here you can see that Havij is pulling things out one cell at a time. And this lets you automate the response and say, okay, this is the exfiltrated data. But in the case of Havij, it's still a little bit messy, because how do these pieces of data associate with one another? And here, if you pass the J flag, it'll actually go through and rebuild the database tables as Havij returned them. So it just replayed the whole attack, grabbed all the data, and then parsed it using the Havij parser that we built in. And there you're going to get a set of tables that look like this. You're going to get the SQL injected returns, same as before. But you're also going to get back the tables that Havij brought out, which are going to be prefixed with Havij. They're going to have underscores in the name, so it should be fairly obvious: it's Havij, underscore, the database name, underscore, the table name. And here, if we open up this one, you see it's the two rows in the database, brought back as Havij brought them back. So that's the Havij parsing. It also brought back the information schema, which is less interesting, but it's there. So that is dealing with a Havij attack with this tool. The other possibilities are that you're dealing with a cookie-based service, or you're dealing with a sqlmap-based attack.
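The known-good comparison can be approximated with Python's standard `difflib`. This is a minimal sketch of the idea, assuming a line-level diff is good enough; it is not necessarily how SQL Reinjector computes its compare data table, and `added_lines` is a hypothetical helper.

```python
import difflib

def added_lines(known_good: str, replayed: str) -> list[str]:
    """Return the lines present in the replayed page but not in the known good page."""
    diff = difflib.ndiff(known_good.splitlines(), replayed.splitlines())
    # ndiff prefixes lines unique to the second input with "+ ".
    return [line[2:] for line in diff if line.startswith("+ ")]

known_good = "<h1>Film</h1>\n<p>Title: Film One</p>"
replayed = "<h1>Film</h1>\n<p>Title: ~Mike~</p>"

leaked = added_lines(known_good, replayed)
```

Everything the page template contributes drops out of the diff, and what remains is the content the injected query put on the page.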
And we're going to show how the tool can be used there. We have Damn Vulnerable Linux running in the background. So we're going to go here. We also have a copy of BackTrack 5 up. So how many people here have used sqlmap? Just out of curiosity. All right, so this isn't going to be too new for any of you guys, but hopefully it's entertaining for the rest. So sqlmap, you... Let me grab the URL from here. So this is sqlmap probing the database, trying to figure out what's going on there and what attack vectors it has. The ID is vulnerable, which we knew. So then you can figure out what databases it has. You can say, what does the database dvwa look like in terms of tables? And it doesn't want to play nicely. All right, we know there's a table of users. It helps if you spell things correctly. So you figure out what the users table has in terms of columns. You say, this looks interesting because it has user names and passwords. You dump it. You get something that looks like this in output. That results in a set of back-end logs that look like this. And if anything, it's even messier than the Havij attack, because you have to go through the whole process of logging in and getting the cookie, all of that. You see that there are a number of probes, a number of different queries being made. And interestingly, ultimately, you know there's only one query made that brought back the table of interest. So it's a lot more complicated to pull out what's relevant and what data was exfiltrated. And for the purposes of dealing with dvwa, I went ahead and just grepped out the things that I knew were SQL injections, based upon a SQL injection keyword list. And as we'll talk about later, we think there's a smarter way to do this in the pipeline.
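A keyword-based pre-filter of the kind described here might look like the sketch below. The keyword list and the `looks_like_sqli` helper are assumptions for illustration; the list used in the talk is not shown in the transcript.

```python
from urllib.parse import unquote_plus

# Hypothetical keyword list; a real one would be longer and tuned to the case.
SQLI_KEYWORDS = ("union", "select", "information_schema", "concat(")

def looks_like_sqli(log_line: str) -> bool:
    # Decode first so that percent-encoded or '+'-separated keywords still match.
    decoded = unquote_plus(log_line).lower()
    return any(keyword in decoded for keyword in SQLI_KEYWORDS)

lines = [
    '10.0.0.5 - - [09/Jul/2012:10:00:00 -0400] "GET /film.php?id=1 HTTP/1.1" 200 512',
    '10.0.0.9 - - [09/Jul/2012:10:02:13 -0400] "GET /film.php?id=-1+UNION+SELECT+1%2C2 HTTP/1.1" 200 640',
]
suspicious = [line for line in lines if looks_like_sqli(line)]
```

A filter like this is cheap but blunt: a harmless request for a page such as `/select-a-seat.php` would also be flagged, which is one reason a parsing-based detector is attractive.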
But you pull those out, you replay them with SQL Reinjector, and you end up getting a database that looks like this, where you compare it to known good and you have some very weird tables and very weird returns, including one that returns basically everything. And you've diffed this against your known good and you've automatically identified what your exfiltrated data looks like. So that's SQL Reinjector in a nutshell. Let's go back to our slides. There are a couple of things that we want to do. We're currently parsing the Havij attacks to get the tables back in a very manual, very string-parsing, very Havij-specific way. We think that there's probably a better way to do this using regex. The problem with regex is that once you try to solve a problem with regex, you have two problems. The other thing we're talking about is better SQL injection identification, instead of having it be based off a list of keywords that you grep out of your access logs. Did you see the talk by Nick Galbreath on Friday? Anyone? Yeah. I mean, that was pretty cool. There are some really good ideas there about SQL injections and SQL injection auto-detection through parsing. Find the video when you get home. It's 20 minutes. It's really cool. I want to integrate it into this somehow. Licensing is a little difficult. And then we're going to do speed and scale optimizations, which are, of course, on every project's future development list. Where can you get this? You can get this on GitHub, at github.com/strozfriedberg. The code is there. It's more up to date than what you saw on the CD. The slides are there. The slides are more up to date than what's on the CD. It's licensed under LGPL v3. So you can take it. You can modify it. You can give us back the results of your modifications or not. We'd appreciate it if you did. But we understand that there are situations where you may not be able to. But if you did, that would be really cool, and I'd like to talk to you later.
And I'll be here. So that literally got pushed up like an hour ago. So it should be there now, though. We're going to open the floor to questions now. So if you guys have questions, you know, feel free to come up here or shout them across the room and we'll repeat them. But, you know, that's it. I want to give some thanks to the various sources that we pulled data from, particularly the guys who did the apachelog module, because that parsing and those regexes are really hard to come up with, and we didn't have to. I also want to thank some people who are responsible for actually getting this presentation approved and getting the tool released: Erin Neely Cox, Scott Brown, Sherry Carr, Brett Padres, to whom we owe a lot of thanks. You know, this is where we're located. This is who we are. Questions. Yes. So the question was, what about POST requests? And that's a great question, and I don't have a great answer for it. If you have POST request logs, I think it's a fairly easy extension, but most people don't log them, as you said. So that's something that we want to work on in the future once we see it. Next question. Yes, in the back. I'm sorry. I can't hear you. Do you mind coming forward? I'm sorry. You're asking about parsing logs from MySQL. Yeah. So in our experience, we more typically have the access logs. We don't really have the MySQL logs to replay and look at that way. There are also some funny things that happen with MySQL logs that don't necessarily make them the best source. You know, this assumes that you have a full forensic image of the server close in time to when the attackers attacked. You may not have that. You may have other problems.
But, you know, the closer in time we can get a full forensic image and the closer in time we can get a complete set of logs, the better, because among other things, attackers may do funny things to the database, or in the response funny things may happen to the database. So, yes. So the question was, is it dependent upon particular SQL flavors? It is currently not dependent on any particular SQL flavor. It's not really dependent on any particular web server, either. Even IIS can use the Common Log Format, which you can trick apachelog into parsing. So if you know the syntax for CLF and you have IIS logs, you can pass a CLF string and it will parse those IIS logs. And we've done that in testing. So, anyone else? All right. Well, we'll be up here if you guys have questions, or want to talk further, or are doing cool things with SQL injections that you can't share with the room but can share one-on-one. So, thanks everybody. Really appreciate it. Oh, before everyone leaves, is everyone cool if I take a photo, so that my parents, my girlfriend, and my boss see that there are people who are interested in this topic and I'm not just crazy? All right. Cool. Thanks.