So, yeah, this is my improving web vulnerability scanning talk. It is not about memory corruption, it is not about buffer overflows, it is about web scanning, but about cool web scanning. And we should probably start with the reasons for this talk, because obviously there are like 100 tools for penetration testing web applications out there already. I mean, we know skipfish, we know a little bit of Nessus, we know w3af, sqlmap, a lot of tools, a lot of good tools. So the actual reason for this talk is not to build a new tool that does exactly the same thing as all the others do. The reason for this talk is to improve all the other tools, and to build a new tool that acts as a proof of concept for the strategy that I'm going to present now. This talk is all about web applications and how the web has changed in the last few years. Obviously we're not dealing with static websites anymore, with, you know, a couple of static pages with some links and some forms and maybe some Java applets. We're still dealing with Java applets, but there's a lot of new stuff. There are WebSockets, WebGL, all that stuff. So the reason for this talk is simply that all the tools out there right now are using HTTP libraries and HTTP parsers to handle web content that they get through a TCP socket, and I had the impression that there's a better way to solve the problem. That's the reason for the talk. So, who am I? This is my first year at DEF CON, and it's my first talk. I do some software development, I did some KVM stuff in kernel space, and I'm trying to build a startup. So that's me, that's why I'm interested in this stuff and why I want to improve it. The structure of the talk is really simple: I'm going to talk about the problems of web scanning, how I, at least from my point of view, solved them, and I'm going to try to prove it with some demos.
And if you want to switch over to the Metasploit and Kinect talk, right after the demo is a really good time for that, because then I'm simply going to walk through a lot of code. So, that's it, and how I'm going to improve the problem. How many of you guys are penetration testers or actually worked in that space? Could you raise your hands? I thought so. A lot of you. So you know the tools out there. You know what they do, you know how automated they are. And you also know that during a penetration test we can't really rely on one tool as the final and only solution for web apps. Because, for example, if you use the w3af framework to penetration test a website, it is very likely that some encoding error occurs. And if you use sqlmap to exploit the SQL injection, it is very likely that that process fails. So what we do is we build tool chains, and we build really large tool chains. There's nothing wrong with that. But I had the impression that we build tool chains out of a lot of web scanners, and all the web scanners have the same problem. So in the end, we have ten tools that fire the exact same HTTP requests against a web server, and we get no benefit from that extra traffic. That's sort of why my feeling is that the penetration test is pain. You start a tool against a website and watch the progress. You have no idea what happens in the background, except that some injection attacks happen, and some crawling, and, you know, looking for backup files. It's like a black box in the end, and you can only hope that one scanner finds more vulnerabilities than the other scanners. So I just tried to handle that in a better way, a more reliable way, because in web scanning it all comes down to reliability in the end. So let's start with skipfish. And I want to say skipfish is a perfectly great tool. I mean, the code is great. It's super modular, it's plain C. Really cool.
But if you use it against JavaScript-rich applications or a website with Flash, it's completely useless. It's great for really static websites. It's perfect if you want to find hidden backups, or if you want to DDoS a web server. But, you know, to authenticate and to attack WebSockets, it's more or less useless. And the reason for that is that it's based on an HTTP library. I want to say again, it's a great tool. I mean, the guy even built his own HTTP library in C, and it's cool. But I think we have a lot of HTTP fuzzers out there already, and we really don't need to attack HTTP even more. We all know HTTP. We know what the requests look like, we know what the responses look like. But there's an additional logical layer of vulnerabilities and features in websites. And, you know, if the core of your scanner is an HTTP library and the website is based on Flash, you get a lot of binary content, and your HTML parser is going to try to parse that content and it's going to fail, because it's not HTML. So that's just not going to work. A second example for that problem is w3af. Perfectly modular, it's Python code, really cool plugins, really cool stuff. But the core of w3af is an HTTP library. In fact, it's just urllib from Python with an additional security layer. So it's the same problem in the very end. And the third example is a great tool as well: sqlmap. We all know it, we all love it for SQL injections. But when you try to actually use the internal heuristics of sqlmap to find some forms on a website and to perform injections against those HTML forms, it is likely that the internal HTML parser of sqlmap is going to fail. And I mean, that's okay, because as security guys, it's not really our job to write good HTML parsers and to create good HTTP libs.
But if you're a penetration tester and you run sqlmap against a website and sqlmap says the HTML is broken, that's sort of not the result we expected, because we're not interested in that. And when you start your Firefox or Chrome or your Burp Suite or whatever afterwards and try to inject into that form manually, you'll recognize that it works. The HTML is still broken, it's still a really broken website and it may not look nice, but your browser renders the form. You can use the web elements, you see the input fields, and you can simply fire your requests. So the summary of all that is: we have a lot of tools, all the tools are based on HTTP libraries and have some HTML parsers in them. They also have some, you know, regular expressions to extract some URLs out of JavaScript. Maybe they even have readers for Flash files. But as we have seen with the browser example, there is a better way to solve the problem. Because it is incredibly hard to extract information from a Turing-complete language like JavaScript, and it's incredibly hard to handle all the broken HTML and web stuff, the solution, from my point of view, is to simply use a web browser. So I jumped over a couple of slides, but I wanted to come to that final point very quickly. The fundamental reason for this talk is really simple: I created a Python web scanner and replaced the core component that we know from all the scanners out there with a web browser. I took WebKit. I used the PySide project to get the web browser in there. I don't build my HTTP requests and all that stuff myself; I simply use the web browser. I have full access to it. I modified WebKit a little bit, I patched it, I can fire JavaScript events and all that stuff. So I've talked a lot about HTTP libraries and other stuff now.
But it all comes down to, this whole talk comes down to, the web browser that I put in the middle of my scanner. I mean, there are some other issues that we have with web scanning. For example, URLs used to look like the example on the slide up there, you know, /news with a question mark and some parameter, whatever. And then the search engine optimization guys came, and they felt good with completely different URLs. They completely ignored the RFCs, they completely ignored everything. And it's not bad; the URLs look nicer, in fact. But we as scanner developers have no chance to actually extract the dynamic parts of such a URL. It's simply impossible to tell which part of a URL today is dynamic, injectable, or whatever. So putting all the problems of vulnerability scanning together looks like that: we can't handle JavaScript, we can't handle Flash, we have problems with authentication, we have web application firewalls out there, and all of a sudden we also have business logic, e-commerce systems and all that stuff. So now I come back to my little browser approach. It does not solve all the problems, but it is a good start, because it eliminates most of them. It eliminates the JavaScript problem. It eliminates the Flash problem. There are still some Java applets out there, so if we want to support that, that is eliminated as well. And from there we can start addressing a lot of different problems, and we can start to build actually better tools, because we don't have to care about all the stuff that we don't like. We don't have to care about redirects, about some HTML stuff. All of that is suddenly gone, because the web browser takes care of it. So I actually put a little example scanner into the slides. It's just Python. The code simply shows Python spawning a browser instance, loading google.com, crawling google.com, logging into google.com.
Killing the browser instance again, and starting the fuzzing. As you can see, that whole scanner has seven lines of code. Of course the actual scanner that I'm going to release today is larger, but the concept and the design of the scanner are exactly what that code shows. So the follow-up is: if we want to build penetration testing tools for the web that are more reliable and actually more useful for attacking today's web applications (not the web that was online 10 years ago, where the most dynamic element was a form and a POST request), then we actually need to start attacking web content, not HTTP. And that's what I wanted to achieve with the included web browser. As I said before, I did not want to build a new tool and a new solution for everything; I also wanted to improve the existing solutions, because we have great tools for a lot of different purposes already. We have sqlmap, we have w3af, we have skipfish, all those tools. And the only process that fails in all those tools is discovery. So when that process in all those tools is replaced by a process that works, and works better than the existing one, we can still use all those tools, and they are great, as I said. But we can also take a lot of pressure off the scanner developers, because now they suddenly have time to focus on security and not on, you know, HTML anymore. And that's good. I think that's a good solution for the really annoying problems that come with the web today. So the second and probably more interesting part of the talk is not about simply putting a browser engine into a Python script; it's about authentication. The authentication process that I implemented still relies on the browser engine, still relies on the rendering and the JavaScript execution. But it's also a different problem.
And the problem it solves is not, you know, HTTP Basic auth, which is supported by every HTTP library. We have a lot of different authentication schemes on the web, and they all look like that, for example: just forms, an e-mail address or a username, and what I try to do is to fingerprint these authentication schemes. And actually what I learned is that there are three fingerprintable elements in today's web authentication, as this slide shows. There are always two visible text fields. The first one is, you know, for some kind of username or an e-mail address; the second one is for a password. And if you have an HTTP library based scanner, the problem is that the scanner is actually blind. It sees a lot of HTML content, a lot of text, but it's hard to tell which of those text fields is actually used for the username, which one for the password, and which one for, I don't know, the description of your WordPress comment. When you have a browser engine that actually renders the CSS and, you know, the actual website, it's really easy to determine which fields are used for which purpose. So the three fingerprintable elements of access control on the web: the first one is that there are two visible text fields. The second one is that the web tries to protect you from the guy behind you who wants to see your password on your screen, and the fingerprintable element is the type of the input field: password. That's an easy one. So when the scanner has found a web form with a password field, the only thing missing is to find the text field that is used for the username. And actually that's pretty simple. But as I said, it all comes down to reliability, and so I came up with a third fingerprint: geometry. Because when you look at all those login forms on the web, you will recognize that companies try to make websites look nice, and to do that, the text fields are parallel. And that's more or less always the case.
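To make those three fingerprints concrete, here is a toy sketch in plain Python. The `Field` record and the pixel coordinates are invented for the example; a real scanner would pull the type, visibility, and rendered position of each input element out of the browser engine's DOM.

```python
# Toy illustration of the three login-form fingerprints: two visible
# text-like fields, exactly one of type "password", and roughly parallel
# geometry. The Field record and coordinates are made up for the example.
from dataclasses import dataclass

@dataclass
class Field:
    kind: str        # "text" or "password"
    visible: bool
    x: int           # rendered position, as a browser engine would report it
    y: int

def looks_like_login(fields, align_tolerance=8):
    """Return True if the form matches all three login fingerprints."""
    visible = [f for f in fields if f.visible and f.kind in ("text", "password")]
    if len(visible) != 2:
        return False                      # fingerprint 1 fails
    passwords = [f for f in visible if f.kind == "password"]
    if len(passwords) != 1:
        return False                      # fingerprint 2 fails
    a, b = visible
    # Fingerprint 3: the two fields are aligned vertically or horizontally.
    return abs(a.x - b.x) <= align_tolerance or abs(a.y - b.y) <= align_tolerance

login = [Field("text", True, 100, 40), Field("password", True, 100, 80)]
register = [Field("text", True, 100, 40), Field("text", True, 100, 80),
            Field("password", True, 100, 120)]
print(looks_like_login(login))     # True
print(looks_like_login(register))  # False: more than two visible fields
```

The registration example already hints at the distinction discussed later in the talk: a sign-up form usually carries more than two visible text fields, so fingerprint 1 rejects it.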
So even if the first and the second fingerprint fail, that one is still pretty reliable. So in the end, we have a login function. And the useful thing about the login function is that it is provided by the web browser itself. So when you build a vulnerability scanner, you don't have to care about finding login forms and building HTTP requests; you can simply call that function. And I built a lot of functions for different purposes, for example the search function of a website, the logout function of a website. You can check if you're still logged in, and all that stuff. With that, it is really easy to penetration test internal parts of websites and open source apps. And that's not a given, because it has been a problem for a long time. I think a lot of zero-day exploits for a lot of web applications are out there, and are still being released every day, because it was difficult to scan them in an automated way. And that's really easy now. So that's actually the first demo that I want to show, that whole login process, because it was kind of useful. And to actually prove that the scanner is able to log in and that it has a browser engine included, the scanner will take a screenshot as soon as it is logged in. So that's what I'm going to show. I don't really have a name for the scanner; I call it 360. That's the current name. It will be released on GitHub today. So, for example, let's use a website that is known for its complicated JavaScript: Facebook. We could also use Twitter or whatever. That's how the source code of the Facebook website looks. As you can see, there's really a lot of nasty stuff in there, and you really need to be able to execute JavaScript to handle that website in an automated way. So I'll show the login process. I created a test account. That's the password; I kind of have to show that. So we scan facebook.com. The username is Defcon20X. The password is here.
And the scan started. That's some Qt internal stuff; the site actually gets rendered, so there are some changes applied to the fonts. The more important thing is that the login seems to be finished, and a screenshot has been created. And as soon as I start my nginx here, you can actually see it. Yep, looks like that. The login failed. That's interesting, but not important for the demo. On which form? Because we have three fingerprints for login. Do you know any web login that requires you to enter two passwords? Yeah. Can I come back to that in a minute? I just want to show the successful login. That's a great question, of course. So let me just show the login. I think I just forgot the Gmail part of the mail address. Here we go. Logging in. Ready? Okay, so we're logged in. The actual reason why I said it was not important that the login failed is that the scanner tried to log in by itself, and I can prove that with the source code now. That's the source code of that example script. I simply get the browser object, sorry for the noise, I get the username and password from the command line, and simply call that function. And I'll walk you through the different functions afterwards if you want. There is no, you know, I did not prepare the scanner to log into Facebook, and I'm going to prove that afterwards. But to come back to your question: your question was that there are two different forms on that website. Thanks, yeah. The cool thing about web forms is that a form is a specific HTML tag. An HTML form in an HTML document starts with the form tag and ends with the form tag, and inside one form tag are specific input elements. If you want to submit a login form, you usually only submit the input fields in that one form tag. And in a login form, specific data are required: one, the username, one, the password.
And in a registration form, which is another function of the scanner, by the way, .register to sign up, there are different elements required. And I showed those three different fingerprints a couple of minutes ago, these ones. It is actually really easy to tell those different forms apart, because it is a problem, of course: there are password fields in a registration form as well. But usually there are fingerprints to detect that too, and the easiest one is that more than two visible text fields are required to register on a website, for example a birth date or an email address, you know, that stuff. And the scanner simply fingerprints the actual login form on Facebook by, you know, just two input fields, one password, one username. I can show some more demos for logins, on Twitter, on Google Plus, on pics.defcon.org, if you want to see that. It doesn't seem so. Yeah? Okay. So I don't have an account for pics.defcon.org, but we'll see if the login would work. If you have a random website in mind that has a really complicated login scheme, we can try that as well, just as a proof that I did not, you know, prepare it. Okay, the pics.defcon.org one is finished. Looks like that. And the login: that's good news, because, you know, the account is not valid, and the scanner at least tried to log in. I'll try Twitter now, or Google Plus. And as you know, Google is known for its really specific, really cool WebKit hacks, so I don't want to be the guy who builds a web scanner based on HTTP and HTML parsing that has to parse Google's JavaScript. So that one is finished as well. Seems to work. Good. Cool. And it segfaulted, but I'm going to talk about that later. So that was authentication. The other demos that I wanted to show are all about a little PHP test app that I built with very common vulnerabilities, running as FastCGI behind an nginx setup. Looks like that.
There's some JavaScript in it. For example, this link here redirects through two different kinds of redirection: one is HTTP header based, the other one comes with a delay of one second and is fired by JavaScript. Regular crawlers used to break on that kind of redirection, because they're simply not able to handle it just by parsing HTML and that stuff. So let me show that one really quickly as well. For example, there is a, you know, SQLi slash cross-site scripting slash whatever here. But that's not the important part. We have tools to exploit XSS. We have tools to exploit SQL injections, local file inclusions and whatever. The important thing about this talk is the detection of the fuzzable elements of the website. And this fuzzable element, the food.php, is not easy to find for a crawler. It's not that difficult, but it's not that easy either. This is the source code. The link is simply blood.html, and as I said, it simply redirects to food.php with some weird JavaScript slash header stuff. And I'm not going to tell the scanner about the food.php, just about localhost, and we'll see if it is able to find the vulnerable file. Yeah, the problem with that stuff is the captcha. It's not hard to detect the login field, because the heuristics still succeed: there's a username and a password. The difficulty is the captcha. And we have some captcha-solving approaches; I think one guy hacked the Google audio captcha a couple of weeks ago, which was pretty cool. But the problem there is not about detecting the login form, but about breaking the captcha. And I have an approach for that as well; I'm going to talk about that a little bit later. Great question as well. By the way, the Google captcha can be simply bypassed by ignoring it. I'm going to demonstrate that in a couple of seconds.
So, yeah, the scanner found the food.php and posted some kind of unique XSS string to detect a reflected XSS. Really boring, as I said. This talk is not about exploiting vulnerabilities. I could have built some cooler demos, OS commanding vulnerabilities, pushed a reverse shell in there, exploited it, exploited the kernel, I don't know. But this talk is really not about exploiting vulnerabilities; it's about getting to the point where you can exploit them. It is really simple to exploit the kernel and install a rootkit once you're in. The difficult part is to break the defenses and to build the heuristics. And once that has succeeded and you can start to fire your hundreds of thousands of HTTP requests to, you know, exploit the vulnerability, 99% of the work is done. Exploiting the vulnerability is really easy. But working through all the JavaScript stuff at sites like Facebook or at e-commerce systems, that's the really difficult part. So, yeah, that's why I chose a really simple example. I have a better example as well, a local file inclusion and some other stuff. I can show you all that, but as I said, this is about discovery. If somebody knows a cool online scanner benchmarking site, I'll show you more there as well. So that's how it works. Okay, good. So back to the slides for now. This part is going to be about software architecture, about breaking captchas, about telling registration forms apart from login forms and why that works. So this is not going to be about high-level stuff anymore; I'm actually going to show a lot of code now. If you don't want to see a lot of code, there are more high-level talks, like the Kinect talk, for example. Okay. So the architecture of web scanners is actually always the same. It's all about a core element, as I said, mostly an HTTP library and an HTML parser. Then there are a lot of plugins.
And the plugins can be files, they can be stored in a database, or they can simply be an array or a hash map, I don't know. I'll show mine. My plugins, for example the injector plugins, describe local file inclusion vulnerabilities, OS commanding stuff, file uploads, SQL. For example, the SQL plugin looks like that: there are some payloads, some ways to modify those payloads, and different techniques to verify a vulnerability. And this plugin, as you can see, is really, really simple. It's simply a lot of arrays. That's it. Probably cool to mention about the SQL plugin are the different payloads, because they are not that usual. I mean, we all know the single-quote payload. When we start penetration testing a website, we throw some single quotes in there and look what happens. That's how it starts. But there are a lot more things you can do to actually fingerprint the architecture behind the website. For example, if the website wants you to enter an integer, you can throw a zero in there, and if it acts normal, you can throw a minus zero in there. You can throw negative integers in there, really large integers, you can throw 10 megabytes of string in there and look what happens. And this is exactly what this SQL plugin does: not just fuzzing and generating error messages, but fingerprinting the SQL server behind the website. Because I recognized that SQL servers break in different ways. For example, if you throw a raw byte like \xFF in there, MySQL mostly says something like, yeah, that's not encodable by Unicode something foo, and the SQL statement fails. So if you throw a \xFF into a website and the website responds with nothing, you can be pretty sure that there's a MySQL server behind it. So I wrote documentation for all the payloads and the reasons for them and how to fingerprint with them.
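A minimal sketch of that payload-style fingerprinting might look like the following. The payload list and the response-matching rule are illustrative assumptions, not the actual plugin; the real plugin is just arrays of payloads plus verification techniques.

```python
# Sketch of payload-based backend fingerprinting: throw edge-case values at
# a parameter and watch how the backend breaks. Payloads and matching rules
# here are invented for the example.
PAYLOADS = [
    "'",                 # classic single quote
    "0", "-0", "-1",     # integer edge cases
    str(2**63),          # integer larger than a signed 64-bit column
    "A" * (10 * 1024),   # oversized string (10 KB here; the talk uses 10 MB)
    "\xff",              # byte that is invalid in some text encodings
]

def guess_backend(responses):
    """responses maps payload -> observed body ("" for an empty reply).
    A \\xff payload that yields an empty reply is treated as a MySQL hint,
    mirroring the heuristic described in the talk."""
    body = responses.get("\xff")
    if body == "" or (body is not None and "Incorrect string value" in body):
        return "mysql?"
    return "unknown"

print(guess_backend({"\xff": ""}))   # mysql?
print(guess_backend({"'": "error"})) # unknown
```

The trailing question mark is deliberate: a single payload only gives a hint, and a real plugin would combine several of these observations before committing to a fingerprint.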
But, you know, it's really nice to see that there's a lot of fingerprinting work that can be done with different payloads. However, I was talking about the architecture of the scanner. As I said, there's a core, there are different plugins for injections and for different types of applications, WordPress, I don't know. Wait a second. Yeah, there are some web shells to prove the exploit in the end, for different languages, Perl, Python, and so on. More important is probably the core element of the scanner, and the browser object in it, which is simply based on PySide. How much time is left? PySide is a really cool framework to use the Qt libraries from Python, and they did a great job with it. There's actually an alternative for that, called PyQt4, but PySide is much more stable. Although, as you have seen a couple of minutes ago, it's not that stable either; it still segfaults a couple of times, and I don't know why. I tried to strace it, to core-dump it, but it's really complicated stuff to make C++ and Python work together well on the web side. So there has actually been some work done on that before, a project called spynner. They used that PyQt4 framework and built a stateful web browser in Python. And I really tried to adopt a lot of their features, for example the snapshot feature to create nice screenshots, and using proxies and all that stuff. But the really important elements that I changed are the patches to WebKit itself, to the C++ code of QtWebKit. So a really difficult element of web scanning is JavaScript. And not just executing the JavaScript: there are events in JavaScript. You know them from all kinds of apps: onmouseover, onkeypress, onfocus, and so on. And if you want to crawl and spider all the possible locations of a website, you have to execute all these events.
And I tried to do that from Python, and I failed. So I patched WebKit and created a function in WebKit that fires all these event listeners for me. Because I looked for a function that would not just let me trigger an event listener on a web element, but list all the event listeners that are listening on a web element. And I learned that there is none. I mean, there was a specification for it a year ago or so, but it never really got implemented; I think Netscape 6.0 or something had it once. So it's actually a really cool fact that we're not forced to use an original web browser engine in our scanner; we can use our own one. And our JavaScript engine does not have to do everything that the JavaScript of the site we are scanning wants us to do; it can also call some callbacks in our scanner code. So it's really cool to have that kind of access. Another nice technical thing to mention is the fuzzable object. This is something the w3af author came up with; the idea is from w3af, obviously, and I think it's a really cool idea. What those guys did is they don't handle URLs and forms in specific ways; they create objects, and they fuzz those objects. And that's much more stable than just replacing different parts of strings, and it's really nice to handle. It's really nice to save fuzzable objects, because when your scanner finds a vulnerability, you can simply save that fuzzable object, the state of the fuzzable object, and you have an exploit ready to use, ready to release. When you find a zero-day exploit in WordPress for OS commanding with that scanner, or a scanner based on that structure, you can simply dump that fuzzable object and release it somewhere, and it's going to be really easy for you. Okay, fine. So I think there are some more slides left.
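To make the fuzzable-object idea concrete, here is a minimal sketch. All names here are hypothetical, not the actual w3af or scanner API: the point is that the scanner mutates a structured object instead of munging strings, and can serialize the exact state that triggered a finding.

```python
# Minimal sketch of the fuzzable-object idea borrowed from w3af: fuzz a
# structured request object rather than raw strings, and dump the state
# that triggered a finding as a ready-to-use PoC. Names are hypothetical.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class FuzzableRequest:
    method: str
    url: str
    params: dict = field(default_factory=dict)

    def mutate(self, name, payload):
        """Return a copy with one parameter replaced by a payload."""
        new_params = dict(self.params, **{name: payload})
        return FuzzableRequest(self.method, self.url, new_params)

    def dump(self):
        """Serialize the state, e.g. to release as a proof of concept."""
        return json.dumps(asdict(self), sort_keys=True)

base = FuzzableRequest("GET", "http://localhost/food.php", {"id": "1"})
poc = base.mutate("id", "' OR '1'='1")
print(poc.dump())
```

Because `mutate` returns a copy, the original object stays intact, so the scanner can try many payloads against the same base request and keep only the ones that hit.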
This one is about further research and other improvements to the actual concept: not just building a TCP socket, throwing an HTTP request in there, getting a response and parsing the HTML, which obviously doesn't work on today's web anymore, but taking a browser engine, modifying it, and actually starting to scan the web, not just fuzzing HTTP. Because in the meantime, after the last 10 years, we know HTTP well enough, and we don't need to do all that stuff just to fuzz HTTP. So the further research that I will do, and that makes sense, is to improve the binding to browser engines like Gecko or WebKit. I'm not sure if it makes sense to use Internet Explorer, but if someone wants to use it, that makes sense as well. Maybe. Not. The other thing is WebKit, of course. Maybe it makes sense to replace WebKit in that approach by Gecko. Gecko is the rendering engine of Firefox, for those who are not that familiar with web browser architecture. And Firefox and Gecko are, in the end, the higher quality projects. For those of you who are into exploits and browser vulnerabilities, you know that there is a remarkable amount of WebKit exploits released in short periods of time. And I'm not sure when the last Gecko exploit was actually released, or when the last Gecko vulnerability was reported, but there are not a lot of them out there, at least not public ones. So that's actually a good sign. Another idea: are you guys familiar with the wolframalpha.com engine, some of you? It's a mathematical approach to analyze, to calculate the web. And I thought it may be a good approach to analyze and calculate error messages on the web. Because, for example, as we have seen, a login fails, or the scanner's heuristics fail, and that may happen, I don't doubt that.
And the website answers with something like, hey, you need to enter the verification password for your registration, while the scanner actually wanted to trigger a web login. The scanner can parse that error message, extract the important words, and learn from the error message. So the next time you call the login function, the scanner will know that it should not use that form, but the other form, because the error message said it was a register/sign-up form. So the last and probably most weird thing that has to be fixed is the requirement to have an X server or a virtual framebuffer. I used Xvfb. That's a requirement to actually run the scanner that I'm going to release, because, you know, WebKit actually wants to render the website. Even if we don't want to see it, except when we want a screenshot, it needs an X server or framebuffer back end to draw the website. I'm still not sure if it is necessary or not, but the need for a graphical environment in a command-line based web scanner is just awkward. So, the content beyond this talk is going to be a framework for web scanning. It has a browser engine included. It has, I don't know, 2,000 different plugins for different vulnerabilities, old WordPress exploits, and, you know, Joomla and Drupal and all that stuff. There's going to be a mathematical engine to do some of the error message parsing. There's going to be a Google Translate engine to translate every website to English; even if the English is, you know, wrong and strange, it is still enough for the language parsing engine to analyze it. And the WebKit patches are included as well. The second component is the scanner itself. It's not as sophisticated as other scanners, but it's enough for a proof of concept. And I actually started to write a NASL script to include that into Nessus.
So Nessus will be able to do some more effective crawling in the future. More effective, because right now, as I said, it's just HTML. So, yeah. That's basically it.
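As a closing illustration, the error-message learning idea from the further-research section could be prototyped with a simple keyword classifier. The hint word lists below are invented for the example; a real implementation would learn them, or lean on a language engine as described above.

```python
# Toy sketch of the error-message learning idea: extract telling words from
# an error message and guess which kind of form produced it, so the scanner
# knows not to reuse that form for login. Keyword lists are invented.
REGISTER_HINTS = {"register", "registration", "sign", "verification", "confirm"}
LOGIN_HINTS = {"login", "log", "password", "incorrect", "invalid"}

def classify_error(message):
    """Guess whether an error message came from a registration form or a
    login form, based on simple keyword counting."""
    words = set(message.lower().replace(".", " ").split())
    reg = len(words & REGISTER_HINTS)
    log = len(words & LOGIN_HINTS)
    if reg > log:
        return "registration"
    if log > reg:
        return "login"
    return "unknown"

msg = "You need to enter the verification password for your registration"
print(classify_error(msg))  # registration
```

With a result like `"registration"`, the login function can mark that form as a sign-up form and try the other form on the page the next time it is called.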