Thank you. Good morning and welcome to Backslash Powered Scanning: Automating Human Intuition. Imagine if you could conduct a pen test and only do the interesting bits. If you could skip the hours of repetitive fuzzing and trawling through meaningless results and jump straight to that one page that reacts in a mysterious way to your every input. The one that turns out to be vulnerable to code injection in a language you'd never heard of, or SQL injection behind a heavyweight filter that almost stops you from cracking open the database. The kind of page that no black-box vulnerability scanner could ever detect. In this session, I'll share with you the conception and development of a new type of scanner that can find research-grade vulnerabilities.
This all got started around four years ago. I'd just started a pen test on a company that I'm going to call Marketiser, and we had a problem that's probably familiar to many of you: they hadn't gotten around to giving me any login credentials. So I was just staring at the login page. This is quite a common occurrence, but this time it was particularly upsetting, because just from the 90s visual design of the login page I could hear the vulnerabilities screaming out at me from within the application. But I couldn't log in, and I didn't even have a valid username, so I couldn't even try to brute force passwords. Eventually they did get around to handing me the login credentials, and I logged in, found some SQL injection, and dumped all the passwords out of the database, which were naturally stored in plain text, and found that the password of the director was Marketiser 1. So in effect his username was harder to guess than his password. I thought, this is kind of embarrassing. I don't want to let an application get away with having such a bad vulnerability ever again. So next time I do a test and the client doesn't give me the login credentials, I want to just break my way in, and to help with that I wrote a tool called Let Me In.
With this tool you give it a long list of candidate usernames and it tries all of them over and over. It does some statistical analysis on the timing of the responses, and it also does analysis on the content of the responses, and tries to see if there are any clues that indicate whether any of them are valid usernames. Some of the time this tool was extremely effective, but most of the time it just failed horribly. But while I was trying to fix it, one particular time when it failed badly, I realised that I could actually use this technology and this approach for something far cooler, and that's what I'm here to share with you today.
So first I'm going to talk about three huge blind spots that all vulnerability scanners have when it comes to finding server-side injection issues, and how those blind spots combine to form the million payload problem. After that I will show a new technique that deals with all of these blind spots and lets us find really interesting vulnerabilities in an automated way, and I'll look at what happened when I ran that scanner on a bunch of sites. Then I'll look at what potential for further research there is using these techniques, and finally take maybe five minutes of questions. So by the end of this presentation you'll know why existing scanners suck, how to build and tweak a better scanner for your use case, and how to use that scanner for maximum effectiveness.
First of all, a little bit about me. As was mentioned, I work at PortSwigger, where we make a tool called Burp Suite that you've hopefully heard of, and my job there is largely to refine our vulnerability scanner. So although I'm about to spend the next few slides slagging off vulnerability scanners, all of my criticisms apply to Burp's vulnerability scanner too. I'm doing this from a position of respect. I think scanners are great. I just know that they aren't close to their full potential at the moment.
Something else that I do is research new types of vulnerability, such as server-side template injection, and one of the objectives of this research was to make a scanner that can basically do this research for me. Something that can find a new kind of vulnerability and point it out to me so I can investigate.
So the first blind spot is that scanners are really bad at dealing with security through obscurity. For example, how many types of server-side template injection does your scanner support? Probably not all of those, and that's just a list of the most popular ones from Wikipedia. ArtSploit recently got code execution on PayPal using Dust.js, which is a template engine not popular enough to be in that list. Of course, in 2014, no scanners detected any kind of server-side template injection. Vulnerable stuff was being scanned, I know that for a fact, but the scanners just weren't finding it. So any use of a slightly obscure technology can just break a vulnerability scanner's effectiveness on your site, and that can happen in quite unexpected ways too. For example, scanners typically find local file include vulnerabilities by reading the contents of the /etc/passwd file, and if you happen to be using SELinux on your application then the web server may well not actually have permission to read the /etc/passwd file, and so the scanner will miss all the local file include vulnerabilities on your site. You can have things failing in really unexpected ways if you use technology that's just a little bit rare. That's the first blind spot.
The second is, well, okay, as we've established, we're limited to languages that the scanner explicitly supports. So let's say we know the server is vulnerable to PHP code injection. A scanner will typically send a payload something like what you see here, and if the server is vulnerable it will sleep for 10 seconds before it replies. So far so good. But what if the server is filtering out parentheses?
Well, the scanner's payload won't find the vulnerability; it'll get a false negative. You can still exploit it using backticks or a million other methods, but the scanner won't find it. And if there's a web application firewall looking for the sleep keyword, once again that will just stop the scanner from finding it, even though it's probably still exploitable. I'm just using a Cyrillic E there, but that's one example of many potential WAF bypasses. And obviously if double quotes are filtered this payload won't work, but it's still exploitable. Of those three examples, two are from pen tests that I've done personally. Scanners will even miss SQL injection if your application is using double-quoted strings rather than single-quoted strings to encapsulate user input, simply because it's not that common, so scanners don't try to find it. So if there's any kind of variation or filter, the scanner's just not going to work very well.
So let's assume that the vulnerability is in a known technology and there are no filters and it's not some weird variant. All we need to do now is figure out where to put the payload. This is a real endpoint on eBay that was vulnerable to remote code execution, and the obvious place to put the payload is in the q parameter, but that doesn't work, and neither does putting it in the cookie, or in the Referer header, or in the User-Agent, or in the path, or even in various additional headers that you might try. In order to successfully trigger this vulnerability you need to specify a second q parameter containing the payload. Now what scanner is going to find that? This was found by David Vieira-Kurz, because he noticed that the normal q parameter was being spell checked, and he wrote up that in his experience spell checkers are often implemented insecurely using eval in PHP applications. That's why he tried specifying a second one and found this remote code execution vulnerability.
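To make the "second q parameter" idea concrete, here is a minimal Python sketch of one way a scanner could generate that kind of insertion point: for each query parameter, emit a copy of the URL with a duplicate of that parameter carrying the probe. The function name `duplicate_param_variants`, the example URL and the `PROBE` placeholder are assumptions made up for illustration; they are not taken from the talk or from any scanner's actual code.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def duplicate_param_variants(url, probe):
    """Yield one URL per query parameter, each with a second copy of
    that parameter appended whose value is the probe (hypothetical helper)."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    for name, _ in params:
        # Keep every original parameter and append the duplicate at the end.
        extended = params + [(name, probe)]
        yield urlunsplit(parts._replace(query=urlencode(extended)))


if __name__ == "__main__":
    for variant in duplicate_param_variants(
        "https://example.test/search?q=shoes&page=2", "PROBE"
    ):
        print(variant)
```

Only the duplicated-parameter location is covered here; headers, cookies and path segments would each need their own generator, which is exactly why the number of candidate locations grows so quickly.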
So the million payload problem is, simply put, that a scanner can send all of those payloads I've shown, but that's a tiny fraction of the potential payloads you have to send to find every vulnerability class with every kind of filter or variant or technology. If you try all of that, then you'll be sending over a million payloads per base request to the application and you'll never finish scanning anything. So scanners can't do that. They're reduced to sending best-effort payloads which, let's say, work most of the time, and that is what leads to people saying that scanners are good for finding low-hanging fruit, which is a statement that breaks my heart. What we deserve is a scanner that finds high-hanging fruit.
So how are we going to do that? Well, we need to harness the intuition that human testers have. Rather than sending a highly specific payload that says "find me injection into a double-quoted string that's being evaluated in a PHP context", we need to send a generic payload, just like a human would, that says "find me something suspicious". Then if we do find something suspicious, we can send a follow-up payload and gradually try to figure out exactly what's going on.
So I'm going to start with a probe that's about as simple as possible. We're going to take the input and just put an apostrophe on the end, and as you can see here this has caused a 500 error from the server, so that's interesting. There might be a vulnerability here, but the application might be saying "you've got an error in your SQL syntax", or it might just be saying "invalid input, please only use lowercase alphanumerics, thanks". As a human we can read the response and look at what's changed and come to a reasonable conclusion about whether this is worth following up on. But scanners can't do that, so what conventional scanners typically do is grep the response for known error messages, like that SQL syntax one there.
That fails horribly if the application is using an unknown language, if it's an unknown type of injection vulnerability, or even if the application is just doing proper error handling. So we're going to try something different. We're going to use a property of the vulnerability to send a payload that is syntactically almost identical but doesn't cause an exception in the application. So here we're just going to escape the quote, and if that makes the application return to giving the original response, that tells us there's potentially something interesting happening here.
Just to give a visual overview of how that's going to work in this specific case: we're going to append a quote to the input. If we get the same as the original response, that's not vulnerable and we can give up straight away. But if that doesn't match the original response, we're going to try escaping the quote, and then depending on whether that reverts the behaviour or not we can conclude whether there's potentially something interesting happening here.
So using this technique of probe pairs, where you've got a pair with one item that will cause an error and one that's syntactically very similar but shouldn't, we can effectively ask questions of the application. As you've seen, we can use a probe pair like that to ask "am I in a single-quoted string?" You can also ask "am I in a numeric context?" just by appending /0 and /1, much like a manual tester would. And you can ask "am I in a file path?" by providing something that normalises back to the original path and something that doesn't. You can also try to find some slightly more esoteric stuff. For example, maybe our input is the name of a function that gets invoked by the application, and by changing it we can change the function that gets invoked. We can test for that: just try changing the input to a valid function name like sprintf and an invalid function name like sprintg.
If those give consistently different responses from the application, we can be pretty sure there's something interesting going on. And then you can do even cooler stuff, like asking "is this input being embedded in a JSON string?", for example, by injecting something that breaks the JSON structure and something that doesn't. This is just a tiny sample of the questions that you can ask. And if you can think of a vulnerability that you can express as a probe pair like one of these, then because I've implemented this in an open source extension you can add it yourself, and it takes roughly five lines of code. So give it a go.
So that's cool. But the true power of this technique is that we can use the answers to these questions to decide what to do next. For example, having figured out that we're in a double-quoted string here, we can then efficiently and automatically try to figure out what character the server uses for concatenation. So here we've tried a few possibilities and found that plus works for concatenation. Knowing that, we can then easily try to inject a generic function call: the abs function. I've chosen that function specifically because it exists in almost every language out there. If that works then great, we pretty much know that we've got code execution in some kind of language. But it would be useful to know which language, so we can follow up with a bunch of probe pairs using functions that only exist in one language. Here the isFinite function call has worked, which tells us, great, we've got server-side JavaScript injection in a double-quoted string.
Being able to find that is kind of cool, but most scanners could just come straight up with that final payload. The strength of this technique is that, due to the way we've reached this conclusion, we've dealt with all three blind spots.
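The probe-pair questions and the follow-up step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ProbePair`, `ask`, `explore` and `toy_backend` are hypothetical names, the toy backend is a deliberately vulnerable stand-in for a real HTTP round trip (it evaluates the input inside a single-quoted Python string), and responses are compared with plain equality, which a real scanner cannot rely on.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProbePair:
    question: str
    breaker: str  # suffix expected to cause an error if the answer is "yes"
    fixer: str    # near-identical suffix that should restore normal behaviour
    follow_ups: List["ProbePair"] = field(default_factory=list)


def ask(send: Callable[[str], str], base: str, pair: ProbePair) -> bool:
    """Answer one question: does the breaker change the response and the fixer revert it?"""
    original = send(base)
    if send(base + pair.breaker) == original:
        return False  # the breaker had no effect, so give up straight away
    return send(base + pair.fixer) == original


def explore(send, base, pair, findings=None):
    """Walk the tree, only asking follow-up questions after a 'yes'."""
    findings = [] if findings is None else findings
    if ask(send, base, pair):
        findings.append(pair.question)
        for child in pair.follow_ups:
            explore(send, base, child, findings)
    return findings


def toy_backend(value: str) -> str:
    """Hypothetical, deliberately vulnerable target: evaluates the input
    inside a single-quoted string and reports only a status code."""
    try:
        eval("'" + value + "'", {"__builtins__": {}}, {})
        return "200"
    except Exception:
        return "500"


# Follow-up question: which operator, if any, concatenates strings here?
concat = [
    ProbePair(f"concatenation with {op}", f"'{op}", f"'{op}'")
    for op in ("+", ".", "||", "&")
]
tree = ProbePair("single-quoted string", "'", "\\'", follow_ups=concat)
numeric = ProbePair("numeric context", "/0", "/1")

if __name__ == "__main__":
    print(explore(toy_backend, "widget", tree))
    print(explore(toy_backend, "widget", numeric))
```

Against the toy backend, the string question is answered "yes" and only the plus operator survives the follow-up, while the numeric question is abandoned after a single probe. A function-call question such as abs would simply be another level of `follow_ups` under the concatenation node.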
For example, if the application is running an unknown language that I don't have a specific function for, we'll probably still get to this point, and the scanner will still report this and say "I've got code execution, I just don't know what the language is". And if the application is filtering out parentheses, well, the scanner will still get this far and show that it's injecting into something where it can concatenate strings, and it will show that to the user and they can investigate and figure out what's happening. And finally, if the input isn't actually vulnerable to anything, as you can see, the number of payloads that this technique ends up sending is absolutely tiny. It's really, really efficient, and that means we can afford to put payloads everywhere. We can afford to duplicate all parameters and put the payloads there, or put them in all the headers, and so on.
So that's really powerful, but I've gotten ahead of myself slightly, because I haven't explained how we know when two responses from the application are really different. The obvious approach of doing simple equality on the responses will fail horribly, because as you probably know, responses from applications are often full of meaningless junk that changes on every page load. Originally, back when I wrote Let Me In, I thought that I could address this problem by fetching several responses, identifying the static sequences within those, and then stitching them together to form a gigantic regular expression. But this failed horribly, for a huge number of different reasons that I don't even have time to list here.
What does work is viewing the response as a collection of attributes: taking properties of the response like the status code, the line count and the word count, and asking whether we have an attribute with the following two properties. It needs to be consistently different between probe 1 and probe 2, that is, the one that should cause an error and the one that shouldn't.
But it also needs to be consistently the same for a given probe across repeats. That technique is really effective. We've actually put an API in Burp so that you can use that technique very easily if you want to. The most important thing to note from this slide is that at no point do we predict what effect a specific payload will have on the application. We're just looking for any kind of consistent difference.
So, here's a simple example. We've got one valid attribute here. The status code never changes. The line count seems to change all the time, so that's not much use. But the word count, whenever we send an unescaped single quote, is always one less than the rest of the time. The scanner will represent that in a table, as you can see, and if there are multiple valid attributes they will also be put in the table. This scanner has, on real live websites, found SQL injection where the only indication was a single word vanishing like this.
Sometimes it's not quite that simple. Here the status code is behaving as we would expect, so that's a solid piece of evidence, but something a little bit different is happening with the word count. Whenever we send a division by zero it's exactly 27, but the rest of the time it changes randomly. That could be because there's some kind of interaction between our payloads and the application that we're not expecting, or it could just be a flaw with our payload. Previously this would not have been counted as valid evidence in any way, but as of the update that's going to be released today, or possibly next week, this will be counted as valid evidence, but it will be put in italics and starred so that you can see this is a tentative piece of evidence. Basically, if you're in a massive rush for time you could ignore things that only have shaky evidence like that, but it's probably worth investigating to see what's going on.
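The two attribute rules above (consistently different between the two probes, consistently the same across repeats of one probe) can be sketched as follows. This is a minimal Python illustration, not Burp's API and not the extension's actual code: `fingerprint` and `classify` are hypothetical names, the three attributes are just the ones mentioned in the talk, and the "tentative" rule is one reading of the 27-word example, where only one side of the pair is stable.

```python
from typing import Dict, List

Attributes = Dict[str, int]


def fingerprint(status: int, body: str) -> Attributes:
    """Reduce a response to a few coarse attributes (hypothetical helper)."""
    return {
        "status": status,
        "lines": body.count("\n") + 1,
        "words": len(body.split()),
    }


def classify(breaker_runs: List[Attributes], fixer_runs: List[Attributes]) -> Dict[str, str]:
    """Return {attribute: 'solid' | 'tentative'} for attributes that
    distinguish the breaking probe from the non-breaking one."""
    evidence = {}
    for name in breaker_runs[0]:
        broke = {run[name] for run in breaker_runs}
        fixed = {run[name] for run in fixer_runs}
        if broke & fixed:
            continue  # some value appears on both sides: no consistent difference
        if len(broke) == 1 and len(fixed) == 1:
            evidence[name] = "solid"      # stable for each probe, different between them
        elif len(broke) == 1 or len(fixed) == 1:
            evidence[name] = "tentative"  # only one probe gives a stable value
    return evidence


if __name__ == "__main__":
    # Word count is always one lower for the unescaped quote; line count is noise.
    quote = [{"status": 200, "lines": n, "words": 964} for n in (40, 43, 41)]
    escaped = [{"status": 200, "lines": n, "words": 965} for n in (41, 44, 42)]
    print(classify(quote, escaped))

    # Status is solid; word count is stable only for the division by zero.
    div_zero = [{"status": 500, "words": 27} for _ in range(3)]
    div_one = [{"status": 200, "words": w} for w in (310, 298, 305)]
    print(classify(div_zero, div_one))
```

Note that nothing in `classify` knows what a quote or a division by zero is supposed to do to the page; it only looks for a consistent difference, which is the point made above.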
Okay, so as we've seen, we can deal with the problem of having arbitrary random content within responses just by repeating our probes. Some applications, though, like to alternate between two different responses, and so if we alternate between the items in our probe pair we'll sync up with the application and falsely conclude all kinds of junk vulnerabilities. To address that, it's important to shuffle the order that you send the probes in.
Even worse, some particularly vicious applications have deterministic random content. For example, one site that I tested would show a quote on its homepage that told you how great their product was. This quote was taken from a pool of maybe ten different quotes, and the way it was selected seemed to be a random selection seeded with the current URL. So if you change the payload in the URL you'll get a different quote back, but if you repeat it over and over for a given URL you'll always get the same quote, and once again this can confuse the hell out of scanners. The way to address this is to use probe batches. Take each of the items in your probe pair and just make a few cosmetic variations of them. So for example 7/0 and 7/00 should have exactly the same effect on any application that's vulnerable. But if your input is just being fed into some random number generator, or maybe base64 encoded or suchlike, then those will get different results, and that will prevent this false positive. So hopefully that makes sense.
I'll just give a quick overview of the whole scanning process and what it looks like. So here we've sent a payload, 221, and we've got 965 words back. The scanner is going to try putting a single quote on there, and we get the same word count, so it's just like, OK, nothing's working there, and it's going to move on and try /0.
Here that changes the word count, so it follows up with /1 as per our tree, but that doesn't revert the word count, so we're just going to move on. And here we're trying to inject a function call in a way which typically works in SQL-like statements. That has changed the word count, because we've injected an invalid function call. So we're going to follow up and send a valid function call, abs, my favourite one. And that has reverted the word count to the original value. So that tells us, OK, there's probably something interesting going on here, but we just need to make sure, so the scanner will follow up with some cosmetic variations. Here, rather than calling a function that doesn't exist, it's still calling the abs function but it's using the wrong number of arguments. Once again that causes an error, but sending the right number of arguments reverts it. So now we know this is vulnerable to some kind of language injection, and we can try loads of language-specific functions and eventually find that the CURRENT_REQUEST_ID function works, which tells us that we've got Microsoft SQL injection. Lovely.
OK, that's enough about how the scanner works in theory. Let's look at what happens when you actually run it. I decided that I was going to scan everything that I could legally scan, and that means basically everything that has a bug bounty or a responsible disclosure policy that doesn't forbid automated testing. In order to do this in a reasonably efficient way I wrote an extension for Burp Suite called Distribute Damage, which implements a per-host rate limit, so that you can be scanning 50 servers at the same time but each given server will only see requests coming in at something like one per second. Just to be polite and avoid knocking stuff offline. That was the goal of that.
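The per-host rate limit idea can be sketched in Python as below. This is not the Distribute Damage extension's code: `PerHostThrottle` is a hypothetical, single-threaded illustration that assumes one scanning loop asks permission before every request, and the one-second interval is just the figure mentioned in the talk. The clock and sleep functions are injectable so the behaviour can be checked without actually waiting.

```python
import time
from collections import defaultdict
from urllib.parse import urlsplit


class PerHostThrottle:
    """Allow at most one request per `min_interval` seconds to each host,
    while leaving requests to other hosts unaffected (hypothetical sketch)."""

    def __init__(self, min_interval: float, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        # Earliest time at which each host may be contacted again.
        self.next_allowed = defaultdict(float)

    def wait(self, url: str) -> float:
        """Block until the URL's host may be contacted; return the delay applied."""
        host = urlsplit(url).hostname
        now = self.clock()
        delay = max(0.0, self.next_allowed[host] - now)
        if delay:
            self.sleep(delay)
        self.next_allowed[host] = now + delay + self.min_interval
        return delay


if __name__ == "__main__":
    throttle = PerHostThrottle(min_interval=1.0)
    for url in ("https://a.test/1", "https://b.test/1", "https://a.test/2"):
        waited = throttle.wait(url)
        print(f"{url}: waited {waited:.2f}s")
```

Interleaving targets, as in the usage loop, is what keeps overall throughput high: the scanner only ever sleeps when the next queued request is for a host it contacted less than one interval ago.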
A kind of side effect is that if you scan with only Backslash Powered Scanner enabled, with all other scanners turned off, and you use this extension, then from the server's point of view you're sending a tiny number of quite innocuous-looking payloads really slowly, so it might come in quite useful if you're on a red team or the like.
If you give the scanner a clear-cut vulnerability it will find it really easily, with loads of evidence, no problem. For example, here it found the server was vulnerable to MySQL injection in the User-Agent. That was a real site. It happened to be running WordPress. That's cool, but it's no better than any other scanner really. The strength is when it tells you "I found something interesting but I need you to do some work".
So what we've got here is a critical vulnerability that the scanner found, which made me extremely happy at the time. Would anyone like to take a guess as to what the server is vulnerable to? No? Okay. This is vulnerable to PHP code injection, but the server is filtering out parentheses, and it took me quite a lot of manual testing to figure out what was going on, but in the end I got code execution, and that was nice. I had manually tested this application before I ran the scan without finding this vulnerability, and that was because the input was in the path. If you put the payload in any query parameter or anywhere else in the URL it didn't work; the input had to be in the path. Calling PHP eval on the path is the kind of thing you expect from an Internet of Things device rather than a household-name website, but anyway, the scanner was patient, it would try payloads everywhere, so that's nice.
Whenever you see a partial issue, where the scanner hasn't told you exactly what's going on, you can be sure things won't be fully straightforward. Here it thinks it's found some kind of ORDER BY injection, but it hasn't found out what the back-end server was running. So I took this to the Repeater and tried to figure out what was going on, and it was extremely obvious: whenever I sent any request anywhere on the server I got a 403 Forbidden back, because I'd been IP banned. So I changed my IP and reran the scan and got my SQL injection, Microsoft SQL injection, so that was easy. The scanner actually found this vulnerability on over 100 military servers, I think because it was in a library that tons and tons of the applications were using. You have to wonder how you can have hundreds of SQL injection vulnerabilities on public-facing servers that have got a bug bounty and nobody else finds them. My guess is that other vulnerability scanners simply got IP banned before they finished scanning.
If the scanner finds something but doesn't tell you what it is, possibly the worst result after you spend hours investigating is injection into a regular expression, because most of the time this is pretty much useless. One way to recognise this is that if you inject \0 or \1 you'll see it being converted into a string as a back reference, or maybe if you inject \1 you'll get a completely different response from something like \999, which is too high to be a valid back reference. And obviously don't forget to look at the response and see if there are any clues. Sometimes, though, particularly on certain PHP applications, you can break out of the regular expression and specify flags, and provided they're running a slightly old version of PHP you can specify the execute flag, terminate it with a null byte, and get code execution. So that's cool, but this hardly ever works in real life and I got bored of manually doing this, so now Backslash Powered Scanner will try this technique for you. Let
me know if it works; that would be awesome.
Other times you will get false positives. False positives are generally caused by flaws in the probe pairs that you're sending, combined with something being slightly weird about the target environment. All the probe pairs used in Backslash Powered Scanner, as it will be if you just install it, have been refined over the last few months and should be pretty unlikely to get you any false positives, but if you add your own probe pairs then you'll probably find that you need to refine them. Here we can see there's something a bit weird happening, because the scanner thinks it's injecting a function call, but when we inject a valid function we get a 403 reply, which is just a bit weird. The reason is that there's a web application firewall, my favourite, grepping for the word substr. So I fixed this in Backslash Powered Scanner by using a slightly different cosmetic alteration. Here, rather than injecting a function call that doesn't exist, I'm still calling substr but I'm calling it with invalid arguments. A web application firewall won't know you're using invalid arguments, so it will still block this and we won't get the false positive. So this is a good example of why payloads should be as syntactically close to each other as possible: bringing them closer together avoids false positives.
Other times the scanner will get you not vulnerabilities but useful bits of intelligence. Here it's using a test that's supposed to find code injection into a numeric context by using an inline comment, and a follow-up technique that it will also try if that works shows that, weirdly, injecting HTML tags has exactly the same effect. The reason that's happening is that there's a web application firewall and it's rewriting requests to remove comments before it passes them on to the back end. So this isn't actually vulnerable to anything, but we have found some really interesting behaviour, because we can use that behaviour to bypass
browsers' cross-site scripting filters, because those filters use regexes, and those can't handle requests being rewritten mid-flight before they get passed on to the back-end application.
Other times you'll find things that will make you very sad, because they're destined to remain a mystery. For example, this is something that the scanner found several weeks ago. I spent several hours on this, Gareth Heyes spent several hours on this, several other people in the office spent several hours on this, and none of us could figure out what was actually going on on the back end. It looks almost like the application is calling a Java eval on a single-quoted regular expression, but not quite, and we couldn't really get any more information than that, so we just had to leave it. So there's a bug bounty program out there that has that behaviour, which I don't want to name in case it is vulnerable, but it's out there, and if you scan you'll find it. If you find something like this and you do figure out what's going on, even if you just ask the client and they tell you or give you the code, please let me know and I'll be happy to update Backslash Powered Scanner so it can identify what's happening automatically.
As of the upcoming update, Backslash Powered Scanner will also find back-end, server-side HTTP parameter pollution. Maybe you're already aware of this vulnerability, but just in case you're not, the way it works is that some applications take user input and embed it in a request they send to a back-end server that's not publicly accessible. That's fine unless they forget to URL-encode the user input. If they do that then, as you can see here, an attacker can specify additional parameters to get passed to the back end, or maybe override existing ones, and that back end may not be expecting those parameters to be coming from user input and therefore may be vulnerable to various stuff involving that. Backslash Powered Scanner will now find that. That's a
new test, so it's going to be a bit less reliable than the old ones, but it's still found a decent number of real issues in the wild. In fact it found so many issues in the wild that I got really bored of manually investigating them, so now if you right click on the request attached to such an issue there's an extra option called "identify back-end parameters", and that will use this diffing technique and run a list of the top 2000 server-side parameter names on the server and try to automatically find valid parameters for you. So here it's found that city is a valid parameter, and it gives you the evidence for that, and that means you can now follow up and just scan that and figure out exactly what's going on.
Okay, so what else can we do with this technique? Well, one thing we can do is cold-start brute force attacks. This is what I'm calling enumerating inputs where you've got no prior knowledge of what the response to a valid input looks like. The example right from the start, where you want to enumerate usernames but you don't have a single valid username to begin with, is the perfect example, and it's the same with parameters. As we've seen, we can do that; I've already implemented that on the back-end server. In fact you can brute force parameters really, really fast, because you can specify multiple parameters per request, like probably a few hundred, and then do a binary search to figure out which parameters actually cause the application to change behaviour. So that's something that I'll be looking to implement in the future. And yeah, Let Me In still doesn't actually work, because I haven't ported it to use this new diffing technique, so I'm going to be implementing that too, and hopefully you'll get something you can just point at a login page and it will get you some credentials. That would be nice.
Also, you can potentially do some really quite advanced stuff with this. For example, maybe you've found a Java deserialisation vulnerability, but you're on a black-box test, so
how are you going to build a gadget chain? Well, a scanner could potentially automatically try to build a gadget chain by searching for valid function names and object names and the like, and using this diffing technique to identify ones that are actually working.
With a bit of creativity we can actually go way beyond finding injection vulnerabilities. Take this vulnerability here. As a human you can read that URL and pretty much know that if you can increment the ID and edit someone else's profile, that's a critical vulnerability. But many scanners will not find that, like Burp won't find that, and that's because as a human you can read it and know what it does, but scanners can't read English and so they can't know. But using this diffing technique we can identify that this is an enumerable parameter. We can do that by verifying that if we increment this value once and increment it twice, we get three distinct responses from the application. If we only get two, you might think that's vulnerable, but it could just be that the first response is giving you your profile and the other two responses are just giving you a message that says you can't edit somebody else's profile. But if we get three different responses, we know there's something interesting going on.
That technique just by itself will find all kinds of entertaining enumerable things, but it will also find some quite boring stuff, like applications that are taking your input and feeding it into some kind of function, so you aren't iterating over a finite resource, you're just doing some transformation of your input. So to find stuff that's really interesting we can follow up and ask "is there a finite number of entries?", and we can do that by just adding a huge fixed value to the base input and verifying that the responses we get are the same. So in this example the application will just be saying that profile doesn't exist, and that
profile doesn't exist either, so they should both match. I've implemented this technique in an early prototype, and that's something that I'm hoping to code up next week if I have time.
So you can grab the code online. It's on GitHub, pull requests are welcome, and you can make your own personal tweaks to it of course if you prefer. You can grab the Distribute Damage code online too. There's a white paper as well, which details at great length a couple of the things that go wrong if you try to use regular expressions for response diffing. And this is absolutely not one of those security tools that gets released in the presentation and never updated. I know that this technique has huge potential and I'm planning plenty of improvements to it, and as things turn out to be stable and reliable we're going to port them over to core Burp over time.
So the three key things to take away are that scanners can find research-grade vulnerabilities, provided they focus on enhancing rather than replacing the pen tester, and that this is still just the beginning. I'll take five minutes of questions now, and if you've got any more after that please come and talk to me at the back or send me an email. If you want to talk to me about how much of a joke the new A7 in the OWASP Top 10 is, I'd love to talk about that too. Don't forget to follow me on Twitter. Thank you for listening.