Alright, cool. So, yes, as I mentioned earlier, I'm not Yan, but that doesn't really narrow down who I actually am; that could be a large number of people. For those who don't know me, though I see a lot of familiar faces in here: I'm Adam Doupé. I'm an assistant professor here in CIDSE, and I work really closely with Yan; we co-direct the lab together and do a lot of research together. I know you've been doing a lot with binary analysis, looking at different fuzzing techniques and how to find vulnerabilities in a binary. What Yan asked me to talk about today, because it's what my PhD was on and what I still do research in, is how we can do the same thing in a web application: how to automatically find vulnerabilities in a web application. So automated web vulnerability analysis is the name of the game.

Okay. So what is vulnerability analysis? Finding vulnerabilities in some program or software. What's a vulnerability? Someone says: anything that is unexpected by the original program. Well, if I make a typo in a dialog box, is that a vulnerability? Someone refines: a mistake that can be used maliciously. Maliciously, okay, what does malicious mean? Something which was not the intention of the developers and designers. But what if an unintentional behavior just pops up a dialog box or something? Another answer: anything that leads to an undesired state of confidentiality, integrity, or availability. Right, that's the more traditional definition: confidentiality, integrity, availability, the CIA triad, which I'm sure you've talked about.

So let's think about something like this. Say I know of a web application where anyone can change the content of any page in that application. Is that a vulnerability? I hear yes, no, and it depends. Why does it depend? Because maybe it's supposed to do that. Exactly: that's the core functionality of Wikipedia. The fact that anyone can edit any page at any time is a feature of Wikipedia. Obviously they have access control mechanisms and ways to lock pages, but fundamentally you can edit essentially anything on Wikipedia. Now, if you find that same behavior on CNN, where you can just edit the front page, that's a clear security vulnerability.

So when we think about vulnerabilities, I always want to keep in mind that context is incredibly important: if you don't understand the application and what it's supposed to do, you can't make that call. This is actually one of the main differences from binary analysis, at least as it currently stands. In a binary, if you can corrupt some buffer on the stack, that's highly likely to be a security vulnerability; nobody intends for an adversary to control the instruction pointer, that doesn't make any sense. In the web, we have to think about intent. And vulnerabilities fall into essentially two classes, especially when we think about web vulnerabilities.
And this gets conflated a lot when I talk to people; we'll see why in a second. Some vulnerabilities are known vulnerabilities: ones that are already reported, usually with a CVE number. What does CVE stand for? A student offers: I don't know what the E stands for, but exposure is in there. There you go: Common Vulnerabilities and Exposures. There's basically a corporation, I believe in partnership with the computer emergency response team in the US, and when you find a vulnerability you can report it to them. They'll give you a number so that you can describe it, and then they can help you, or you can decide what you want to do about it.

Some of the really cool ones: in 2013 there was a Ruby on Rails remote code execution. Ruby on Rails had functionality to do XML deserialization, and it would automatically perform that deserialization on any XML request you sent to the server. It turns out the way it did the deserialization let you instantiate arbitrary objects, which could then lead to arbitrary code execution. And this required no authentication: you could just send the request to a web server and get essentially full remote code execution.

The CVE will usually have more specific information about which versions are vulnerable. For instance, one CVE reads: Joomla 2.5.x before 2.5.10 and 3.0.x before 3.0.4 allow remote authenticated users to bypass intended trigger requirements. Why do they give all this information? This is part of the point of releasing these vulnerabilities: if you're the administrator of a system, say for a very large corporation that collects credit information on Americans, you can check: are we running Joomla 2.5.9? If so, we're vulnerable, and either I need to apply the patch and upgrade, or I need to put mitigations in place to make sure this doesn't happen.

These CVE IDs are timestamps, essentially: this one was found in 2013, and the rest is just an auto-incrementing number. So before 2013, when, say, version 2.5.0 was released, this was an unknown vulnerability that nobody knew about in Joomla. Afterward, it's a known vulnerability: if you see a Joomla 2.5.4 install, you know you can probably try this vulnerability against it.

The flip side is unknown vulnerabilities: some vulnerability that exists in, say, a custom web application. A student asks: like zero-day vulnerabilities? So what's a zero-day vulnerability, and why is it called a zero-day? Because nobody knows it's there; the good guys don't know. Right: there have been zero days for the defenders to fix the vulnerability. The known ones, by contrast, are n-day: the defenders have had n days, and in the Joomla case n is a very large number, five years to fix that vulnerability. (I'm sorry, I just noticed this tablet for writing on the screen. That's super cool. I have one too, but I didn't bring it out.)
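Anyway, back to the CVE: all an administrator (or a scanner) is really doing with that advisory is a version comparison. A toy sketch of that check, using the version ranges quoted above (this is illustrative, not a real scanner):

```python
# Sketch: does an installed Joomla version fall inside the ranges the
# CVE names, "2.5.x before 2.5.10 and 3.0.x before 3.0.4"?
def is_vulnerable(version: str) -> bool:
    major, minor, patch = (int(part) for part in version.split("."))
    if (major, minor) == (2, 5):
        return patch < 10
    if (major, minor) == (3, 0):
        return patch < 4
    return False  # versions outside the advisory's ranges

assert is_vulnerable("2.5.9")
assert not is_vulnerable("2.5.10")
assert is_vulnerable("3.0.3")
assert not is_vulnerable("3.0.4")
```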
So for an organization, the known/unknown distinction almost doesn't matter day to day. All you care about is: what software are you running, and is it vulnerable? If you're running known-vulnerable stuff, you're already in trouble, so you should basically be fixing that first. To me, the known vulnerabilities are less interesting, especially from our research standpoint of what we're trying to detect. What we want is to find unknown vulnerabilities: vulnerabilities in an application that nobody knows about. That's similar to binary analysis: there, you're not trying to trigger some known vulnerability, you're trying to find a unique crash, a unique vulnerability in the application.

So how do you go about finding these vulnerabilities? Hire someone, a student says. Hire someone, and then what do they do? They look for vulnerabilities, and it's going to be expensive. Yeah, it's going to be expensive. You can hire some undergrad; you can hire somebody in this class who has some security training; you can hire a company with security professionals. Another student: I'd start by figuring out what kind of software the web application is running. Sure, though right now we're talking about ways to find vulnerabilities in the context of an organization. Essentially it's a manual process: just like you can find vulnerabilities in a binary by reading the disassembly, trying to trigger them, and using tools like GDB to debug, here you just pay people. This is common in our modern capitalist society: you pay someone to make your problem go away. But it's super expensive. For a web app, depending on the length of the engagement, one day up to a few weeks, I'd say it costs anywhere from roughly $5,000 to $50,000 or even $100,000.

Because of that, if you're a manager in charge of securing your organization, how often are you going to run these engagements? As often as you're forced to? Essentially, yes: as often as you're forced to by law or by your trade group. That's where PCI compliance comes in, the Payment Card Industry standards. If you want to run a company that touches or stores credit card data, the payment card companies, I believe Visa, MasterCard, and American Express, have a list of standards you must follow. One of those is, I believe, a yearly penetration test of your systems, where you hire somebody to try to penetrate your network, and then you fix what they find.

So pen tests are expensive, you have them maybe once a year if you're in a good organization, and what do they actually tell you? You get a report: we found these five things. What do you do with that information? You fix those. Yes, you or somebody in your org should be able to fix those five things. And what else does that tell you about your system? There are at least five things. Exactly: there are at least five things.
It doesn't actually tell you anything about the overall state of the network, or the state of your system. And this is true of all types of vulnerability analysis: just because you ran a scanner against your system and it found nothing doesn't mean there are no vulnerabilities. Similarly, you pay someone to find bugs or do a pen test, and they may report five or ten findings, but that has no bearing on how many vulnerabilities actually exist in the application.

The flip side, which is what you're learning in this class for binaries, is to develop tools that can do these things, because humans are expensive; the good humans cost a lot of money. And how often do our systems change? Every two years, someone suggests. In some organizations I'd believe that's true, but think about a website you use: how much does Facebook change? Hourly, or at least daily. A student argues: the best moment is before freezing the code, before releasing the new version; you must have a penetration test before that. But Facebook releases every day. Do you want to tell Facebook to stop before every release and hire a professional firm to penetrate their network daily? Well, the student says, the penetration testers can be part of the testing team; in the software development life cycle, testing is an essential part before freezing the code. Sure, but even then, rather than hiring somebody external, you've brought them into your organization and you're saying: do a penetration test every day. Is that sustainable if we push code every day? Exactly, that's the problem. And that's what companies actually do: they test every six months or every year. So you get into this cycle where, yes, the penetration testers find vulnerabilities and bugs, you fix them, but then your code changes every day or every week, whatever your release cadence is, and your systems change.

To me, that's one of the key reasons you need automated tools: you can run them after every code change and make them part of your software development life cycle, so that you can't release code until the automated tools have run against it. This is why I find this area so interesting: there are intellectual challenges in building these tools, but they're also incredibly useful to an organization. And they're cheaper: all it takes is CPU cycles. Compared to a human, and for those who have never worked in a company, humans are much more expensive than you think, because you're not just paying salary but also benefits and all the infrastructure to support them. The rule of thumb I've heard is that a person costs the organization roughly double their salary.

So what we want from these tools is a list of vulnerabilities. And as we just discussed, when we fix all of those vulnerabilities, we are definitely not secure, because there's no guarantee that the human or the tool found all possible vulnerabilities in the system or software.
If I run some web scanner or fuzzer, even if it reports ten vulnerabilities and I fix them all, that guarantees nothing. And even if you had a super advanced tool that could theoretically find all possible cross-site scripting vulnerabilities, that doesn't mean there aren't logic-flaw vulnerabilities, like somebody being able to edit a page they're not supposed to. When you think of the class of all vulnerabilities, it's huge: inside it there's the class of memory corruption vulnerabilities, a class of SQL injection, a class of cross-site scripting, and then a whole lot of vulnerabilities that depend on the application itself. There are also insider attacks and social engineering: all types of attacks that, as an organization, you need to be thinking about.

Now, here's something I always have to explain. When I tell professionals that I do automated vulnerability analysis for the web, they say: oh great, so you build a better Nessus, a tool that looks for known vulnerabilities. Relating those to antivirus tools, they're like signature-based antivirus: they have a huge database of known vulnerabilities, and they say, if you make this request to the server and, like the previous example, it's Joomla 2.5.4, report that as a vulnerability. They have signatures, they scan your network looking for matches, and, this is the key, they can only find known vulnerabilities. They can only tell you that you're vulnerable to something that has already been released. They have no intelligence of their own: they're not actually reasoning about the application, they're just looking for signatures.

The other catch is that they can only find vulnerabilities that live on known or expected ports. Think about it: you're inside a network, or administering one, with a thousand machines, or let's scale it down to a hundred servers, and you're going to run a Nessus scan against all of them. Are you going to try every single combination of port and signature on every single machine? Developers stand things up all the time: there's a Ruby on Rails app running on port 40,000 on some server a developer created three years ago that nobody knows exists. Depending on your configuration, the vulnerability scanner might never find it. There are some tools listed here: Nessus, OpenVAS, and others.

This is a real example. I did some pen testing for a company, and I ran OpenVAS, because that's part of what you basically have to do in a pen test. I configured it to hit every port: UDP, TCP, everything. And it told me that their credit card processing server was vulnerable to an IPMI known username/password attack. IPMI is the management protocol on a lot of servers.
If you haven't worked with professional servers: servers usually live in a data center that's guarded, where you need multiple biometric checks to get in, so you can't just physically walk over and reboot a machine when it's stuck or needs an update. So a lot of servers have a separate processor that runs IPMI, and you can connect to it remotely, even when the actual server is completely wedged, to do remote administration. Through it, I could have done things like installing a new kernel, getting complete root on the system that was processing credit cards. But as soon as we told them what we found, they said: don't do any of that, it's in production, it's processing credit cards, that would be terrible for our business. They were shocked we found it, because they had set up Nessus to scan daily across their entire network; but their configuration was poor, and they weren't checking all the ports and protocols they could have. So they were a combination of glad we found it and upset that it happened in the first place.

Fundamentally, these scanners can't find an unknown vulnerability; it doesn't make sense, there are no smarts there. But they are definitely useful: especially run frequently, they can tell you, hey, you're running a known-vulnerable version on your systems.

And this is why I love automated vulnerability analysis: we want to automatically find completely unknown vulnerabilities in typical software. Stated like that it's an overstatement, though. We're not at a stage where we can find any arbitrary vulnerability; we usually have to narrow it down to a well-defined, specific vulnerability class. So we'll know the class: okay, we can develop a tool that detects cross-site scripting vulnerabilities in a web application. Finding arbitrary vulnerabilities is probably impossible, but who knows what happens in 20 or 30 years. That would be crazy.

So: cheaper, scalable. One of the downsides, though, is that the tool can be wrong. It's not a human, so it may say: I think there's a vulnerability here, and it may be up to a human to verify. Humans can also be wrong, and if you don't believe that, go look up examples of really poor bug reports on security bug bounty programs; there are horror stories. Actually, here's one of mine, from a class probably two or three years ago, so I think the statute of limitations has expired. I gave students access to a server and said: if you get root access on this server, as the root user, the admin user, I'll give you massive extra credit. Then a student emailed me: I got root on the server! They sent a screenshot showing they had run ls -la / ; they could see the root of the file system. I see why that's confusing, but it's not what we're talking about, so no, not eligible. That's the kind of thing I had to deal with.
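By the way, to pin down mechanically what those signature scanners are doing, here's a toy sketch; the host, the banner string, and the finding label are all made up, and real scanners have thousands of far more sophisticated checks:

```python
import socket

# Toy signature-based scan: connect to a port, grab whatever banner
# the service volunteers, and match it against a database of
# known-vulnerable version strings. Signature and label are made up.
SIGNATURES = {b"ExampleHTTPd/1.2.3": "known RCE in 1.2.3 (hypothetical)"}

def check(host: str, port: int) -> None:
    try:
        with socket.create_connection((host, port), timeout=2) as sock:
            sock.sendall(b"HEAD / HTTP/1.0\r\n\r\n")
            banner = sock.recv(4096)
    except OSError:
        return  # port closed or filtered: nothing to match against
    for signature, finding in SIGNATURES.items():
        if signature in banner:
            print(f"{host}:{port} -> {finding}")

# If you only probe "expected" ports, the forgotten app on port 40000
# (or the IPMI service) never even gets banner-grabbed.
for port in (80, 443, 8080, 40000):
    check("192.0.2.10", port)  # 192.0.2.0/24 is a documentation range
```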
Thinking about where the binary analysis field is right now, one of the best things it does is verify vulnerabilities by creating a proof-of-concept exploit. That's part of why that matters so much: you can say, look, I actually have a vulnerability, and I've verified that it works, because here's the proof of concept that exploits this bug.

So here's the key thing to think about. If a tool tells you there's some vulnerability of some class, SQL injection, cross-site scripting, what are the possible outcomes? Let's think it through. A tool tells you there's a vulnerability: either it's correct and there is a vulnerability there, or it's incorrect. Those are the two options for a report. There's also the possibility that a vulnerability exists somewhere else that the tool did not tell you about. This is how we classify these outcomes. A true positive is a true report: the tool reported a vulnerability and the application is actually vulnerable there. These are the best; they feel great, because you didn't have to spend human time finding it. A false positive is when the tool reports a vulnerability that's not there. This is terrible: as a developer it really erodes your trust in a tool, because you have to spend time verifying, tracing code paths, trying out exploits, only to realize it's not actually vulnerable. Why am I wasting my time tracking down these false positives? And finally, false negatives: real vulnerabilities that the tool missed. This is what we were just talking about: once you run any tool, you have no guarantees about the other vulnerabilities that exist in the system. (There's also the false positive rate, which we'll come back to.)

Okay. So when I think of web vulnerability analysis, there are two main types of tools: black box and white box. A black-box tool analyzes a web application by issuing HTTP requests to it, getting HTML responses back, and maybe making more requests to try to find the vulnerability. I think of black box as simulating, as much as possible, a human hacker who's external to your org. The key is: it has no access to source code. One of the beauties of the web, which we won't get deep into, is that you can write a web application in any language: Python, PHP, Ruby, C. One year, Yan coded the iCTF registration page as a bash script; that probably doesn't surprise you. The really awesome thing about black-box tools is that they can find vulnerabilities of a specific class in a remote web server without knowing or caring about the server-side language. In binary terms: a lot of binary analysis basically requires that the tool can analyze the binary code.
They need to understand the x86, x86-64, MIPS, or ARM code so they can fuzz it... or actually, maybe that's not quite true; let me flip back. Black box, I think, is very similar to straight fuzzing: throw input at a binary and see if it crashes. A crash is your signal that the input likely triggered some vulnerability. The newer techniques, where you're doing symbolic execution and reasoning about paths through the program, rely much more on being able to understand the binary: if you have a binary your tools can't analyze, your symbolic execution engine can't work on it. For black-box web tools, the key paradigm is that they have no access to the source code at all; they use the application just as a user would.

On the flip side are static analysis tools. Most web languages are scripting languages, so think about the difference between a scripting language like Python or PHP and C. You get the result right away? How? There's a runtime; Python's interpreted. Interpreted, yes. The key is that there's an interpreter: the python program is itself an interpreter, which takes in a Python program, creates the abstract syntax tree, and then either starts executing it or converts it into bytecode. It doesn't translate it directly into x86 assembly that the CPU executes; it interprets those bytecode instructions, similar to Java, where you compile a Java program into a class file that is then interpreted by the Java VM.

Why does this matter for the web? Web applications are accessed over the network. You need round trips just to set up a TCP connection to the server, and that network time, plus the database, dominates the time it takes to process a request. If you rewrite your application from PHP to C, you might make each request slightly faster, but overall it's going to be roughly the same, because you're mostly dominated by the network and by the database. So the server-side language doesn't matter for performance in the usual case, and it doesn't matter for compatibility either: anybody on any operating system running any web browser can access your web application, so you're not tied to a specific compiled language like C or C++ that users would run locally.

So white-box analysis means we look at the source code of the application to identify vulnerabilities, and this ties us to a specific programming language, and usually to a vulnerability class. You can write a tool that analyzes PHP code looking for SQL injection vulnerabilities, which is great if your web application uses PHP; but if you wrote your application in Bash, that tool can't find any vulnerabilities there.
The tool is essentially useless in that case. And there's a huge range of languages: PHP is still dominant, but there's a lot of JavaScript; Node.js is the up-and-comer, with JavaScript running on both the server side and the client side.

So let me briefly cover black box, which is what we're going to dive into today. How long do I have, until 1:15? Okay, cool. Black-box tools are, well, they have historically been very dumb. They act similar to a human: they crawl the web application. What does crawling mean? Exploring all the links. Exploring all the links in the application, preferably scoped to just that application. And why crawl all the links? To increase your knowledge of the attack surface, yes. Think of it like a binary: if a binary reads from the argument vector, that's a potential way your input gets into the application; if it reads from a file, or from a network socket, those are places you can put input. It all comes back to the concept of injection vectors: what are all the ways an attacker can get malicious data into this application? Same with a website: you want to know all the places where you can give input to the application. It turns out crawling alone is not enough, because in a black-box setting you don't really know what the application is doing on the other side. Maybe it takes tweets from a Twitter account and uses them in its output; maybe it talks to third-party APIs that you could inject data into, which eventually flows into the application. So crawling itself isn't sufficient, but it's the start: you have to do it to discover and explore the attack surface.

Then you essentially fuzz it. This is not that different from binary fuzzing, except that a lot of these tools carry payload lists: hey, you're looking for SQL injection, here are 20 candidate payloads, plus some way to detect from the response whether each one worked. Fuzzing is the same concept as with binaries: you're trying to trigger the vulnerability. In the binary case you want to trigger a crash. Here it depends on the class: with SQL injection you're usually trying to trigger some kind of SQL error; with cross-site scripting you're trying to get your input reflected in the output in exactly the right way to execute JavaScript.

Then you need to analyze: figure out whether the fuzz attempt was successful. With binaries this is essentially trivial: fuzz, fuzz, fuzz, and see if you get a segfault, then work from there. With the web it's a little more complicated. It really comes down to the vulnerability class; it depends on the intrinsic properties of that vulnerability. Remember the case of arbitrarily editing a web page's content: you could build a tool to detect that behavior, but deciding whether it's intended functionality or not is a very tricky problem. This analysis step is the difficult part.
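So the loop is crawl, fuzz, analyze. Here's a toy sketch of that whole loop, assuming a hypothetical local test app and the third-party requests library; a real scanner handles forms, cookies, encodings, and much better oracles than a naive string match:

```python
import re
import requests  # third-party; the target URL below is hypothetical

SEED = "http://localhost/testapp/"
PROBE = "<script>alert(31337)</script>"  # classic reflected-XSS probe

seen, queue = set(), [SEED]
while queue:
    url = queue.pop()
    if url in seen:
        continue
    seen.add(url)
    body = requests.get(url).text

    # Crawl: every same-site link is more attack surface we know about.
    for href in re.findall(r'href="([^"]+)"', body):
        target = requests.compat.urljoin(url, href)
        if target.startswith(SEED):
            queue.append(target)

    # Fuzz + analyze: swap each query parameter for the probe; seeing
    # it come back unescaped suggests a reflected XSS.
    base, _, query = url.partition("?")
    for pair in query.split("&") if query else []:
        name = pair.split("=")[0]
        if PROBE in requests.get(base, params={name: PROBE}).text:
            print(f"possible reflected XSS: {base} parameter {name!r}")
```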
The way I normally think about a web application is this super abstracted view. We have our tool, the web application vulnerability scanner; a web server running some server-side code, in this case PHP; and a back-end database, let's say MySQL. Anybody here do web programming, or have done it? What's missing from this diagram? Usually you have nginx in the front as a reverse proxy, and you get some caching. Caching, that's a huge one: there may be caches at any point in between, maybe a CDN and cache servers the application uses. And often the main one: you want multiple copies of your application so you can load-balance between them, with an nginx server or something else doing the load balancing across all of those. What else? JavaScript? Right: this diagram assumes the PHP code contains all of the application's logic and just spits out a web page. Modern web applications actually split their logic between client-side code delivered to the browser as JavaScript and server-side code running on the server. That's another aspect we're going to ignore for now. What else? Third-party API servers? Yes, any third party it talks to, plus the logging and monitoring around the database.

So this is the standard LAMP stack: Linux, Apache, MySQL, PHP. But you can have more than one MySQL, a cluster of servers for increased availability. You could be using, and I'm going to say something disparaging here, a NoSQL database like MongoDB. You could even not be using a database at all, just files on the local filesystem. The configurations are endless, and the user does not care at all, because all the user cares about is getting back an HTML web page that the browser understands. You can build all this crazy, complicated architecture, even multiple copies in different availability zones across different networks for greater availability. Anyways, that was a great digression. We just treat all of it as a black box.

The tool doesn't care what the server is running. All it does is make HTTP requests and get HTML responses back, usually with some style sheets and some JavaScript, then extract all the links in the HTML that point to other pages inside the web application, follow them, issue more requests, fuzz there, and eventually, not trigger exactly, but infer the likely existence of a bug in the application. One of the key problems, obviously, is false negatives, just like with the fuzzing we talked about: how do we know we've exercised all the functionality of the application?

A student asks: in that diagram, what do you mean when we move from scanning one place to another? For example, you made a request, and then you said to go make a request somewhere else to see if you can trigger the bug. Well, there are three things that are incredibly important to web security: URLs, HTTP, and HTML. The URL tells you how to make a request to the web server, specifically how to make an HTTP request; the HTTP response contains HTML; and the HTML contains links, including URLs, which tell you how to make new HTTP requests and get new HTML.
So it's all driven by URLs. In this sense, I like to think of a web application as an easier-to-explore GUI, in some sense, for a computer. It may be slightly counterintuitive, but building a tool that automatically exercises all the parts of a native application, say an Android app, or a GUI program like Excel, is incredibly difficult, because the developer has 100% freedom in how that dropdown menu works. You can try to use things like Windows COM to interrogate the interface, but fundamentally they can draw whatever they want; think of a game where you press escape and the game menu shows up. That's not using any Windows widget rendering; it's drawing all those pixels itself. Whereas here, for your website to actually work and for people to use it, how do you know how to interact with it? You've never thought about it, because you've been doing it for so long. As a human, what do you do? You look for things to click on, right? And when you dig into the implementation, it's all links and forms. That's essentially it: you can click on anything that's a link, you can fill out anything that's a form, and those make new web requests. The interesting thing is that from the web application's perspective, it doesn't actually care about any of those links: all it sees is the HTTP request, which contains the URL you're trying to access plus any form data you're sending. So that's what we're crawling. And as we'll see, this is how basic tools work: crawl the whole thing, get every single URL inside the application, and fuzz every single parameter with every payload they have.

Now, white-box tools. I want to talk about them briefly because I think they're super cool. This also gets at the difference between dynamic analysis and static analysis; you've talked about that, so can somebody refresh me? Static analysis looks at what the code says; dynamic analysis looks at what the code does. That's pretty good, I'd agree with that. I'd add that the black-box tool we just described is dynamic analysis in the sense that the application must be running somewhere, configured and set up properly with whatever databases it needs, because we're interacting with the actual application. Whereas with a static analysis tool, a white-box tool, we just need the PHP code. I don't care about instantiating the database or anything; I can just look at the code, and the whole idea is to not execute the code.
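To make that concrete, here's about the crudest white-box tool you could write, sketched in Python over a directory of PHP files. The pattern and paths are illustrative; it misses sanitized uses and any indirect data flow, which is exactly the "grep" level of sophistication described next:

```python
import pathlib
import re
import sys

# Flag the classic reflected-XSS pattern: a GET parameter flowing
# straight into output. Usage: python grep_xss.py /path/to/php/app
PATTERN = re.compile(r"echo\s+\$_GET\[", re.IGNORECASE)

for php_file in pathlib.Path(sys.argv[1]).rglob("*.php"):
    lines = php_file.read_text(errors="ignore").splitlines()
    for lineno, line in enumerate(lines, 1):
        if PATTERN.search(line):
            print(f"{php_file}:{lineno}: possible XSS: {line.strip()}")
```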
So yes, these can be super simple: you can implement a static analysis tool as a grep for things you know are vulnerable. Cross-site scripting is when user input is used in the HTML response, and the classic vulnerable pattern is just echoing a GET parameter. $_GET is the PHP way to access the GET parameters passed in the URL: after the question mark you have a series of key=value pairs separated by ampersands, and all of those end up in $_GET. If you look around GitHub, you can find multiple examples of code that echoes $_GET directly, and that's essentially always a vulnerability. So you can look at syntax, at vulnerable code patterns. There's a lot of work here; you can also look for crypto-style vulnerabilities, for example flagging the use of ECB mode rather than CBC, because ECB is pretty much inherently broken. Or you can statically analyze the code and try to understand whether it contains vulnerable behaviors. For cross-site scripting, the question you want to ask is: is it ever possible for user input to flow from the user to the HTML output without being sanitized properly? So you reason over essentially all program paths. Doing that well is complicated and super cool, but I'm not going to go into it here. It's also impossible, which is good; otherwise we'd just be done with security.

Actually, let's test that. Who's going to pay money for a tool that will tell them every single cross-site scripting vulnerability in their PHP application? How much would you pay? I guarantee I will find every vulnerability: zero false negatives, which I said was the worst kind of error to have. A student answers: probably nothing, because you're going to give me all false positives. Right, though I'll give you some true positives too. You should pay me nothing, because I can build that tool by flagging every single line of code: this line, cross-site scripting vulnerability; oh, this line is just a curly brace, cross-site scripting vulnerability there too. Within that output there will be every cross-site scripting vulnerability in your program, along with a huge percentage of false positives.

So the goal we actually want is 100% true positives, which inherently means 0% false positives, given how these are defined, and 0% false negatives. And that is provably impossible. I'll sketch it briefly: essentially, you reduce it to the halting problem. What's the halting problem? Will a program stop? Yes: given an arbitrary program P, can you tell me whether this program will halt, in the sense that it will not loop forever? You have a halting function H that takes in P and I, where P is the program and I is its input. A student objects: isn't it for a given input? Because if you could just ask "does this halt on this input," you could just execute it, and the proof that halting is undecidable feeds the program itself as the input. That's the key construction, yes, but the trick is that the other party gets to choose the input, so we'll just generalize it here, because I'm stealing this proof from a blog post.
Go read that post if you want the details, because I'm not a theorist or a prover. But here's the idea, and it's relevant for you because you're thinking about binaries. If you have C code, could you create a tool that tells you, with 100% accuracy, whether all array accesses are within bounds? Array out-of-bounds is the classic memory corruption vulnerability, so we want to decide, for every single array access in the program: is this safe? The claim is that if you could do this static analysis perfectly, you could solve the halting problem. Here's the outline, as a proof by contradiction.

Assume we have a perfect static analyzer for array out-of-bounds: given a program, it flags exactly the array accesses that can actually go out of bounds, with zero false positives and zero false negatives. Now take any program P and transform it as follows. First, replace every array access with a bounds check: if i is greater than or equal to zero and i is less than the length of the array, perform the access; otherwise, exit. After this step, an out-of-bounds index terminates the program instead of faulting, so the transformed program can never fault on an array access of its own. Second, transform every actual exit point of the program into a deliberate array out-of-bounds access. Now run the perfect static analyzer on the transformed program. If it says there are no reachable out-of-bounds accesses, then none of the original exits of the program are reachable, which means the original program does not halt. If it says there is a reachable out-of-bounds access, then the original program does halt. So we've solved the halting problem: give me any program P, I do this transformation, and the perfect static analyzer becomes a halting oracle. Since the halting problem is undecidable, we've reached a contradiction with our starting assumption: the perfect static analysis tool cannot exist.

A student asks: could you do the same trick as the halting proof itself, and have the program ask whether the static analysis says it's vulnerable, then run forever if it does? What we're doing here is reducing perfect static analysis to "will this program halt" by flipping the conditions: every exit becomes an out-of-bounds access, and every access is protected so that going out of bounds causes an exit. You could further reduce it the way you're describing, but the halting problem's own proof already does that work for us: we just assume halting is undecidable, reach the contradiction, and we're done.
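If it helps to see the shape of the reduction as code, here's a Python sketch. It is deliberately not runnable logic: the oracle is assumed to exist, which is exactly the assumption the proof contradicts, and the transformation body is elided:

```python
# Shape of the reduction. `perfect_oob_analyzer` is the assumed
# oracle; `transform` is the mechanical rewrite from the proof.
def perfect_oob_analyzer(source: str) -> bool:
    """Assumed oracle: True iff some array access in `source` can
    actually go out of bounds at runtime."""
    raise NotImplementedError  # cannot exist, as the argument shows

def transform(source: str) -> str:
    """Source-to-source rewrite described in the proof:
    1. guard every array access a[i] with a bounds check that exits
       cleanly when 0 <= i < len(a) fails, so the rewritten program
       can no longer fault on its own accesses;
    2. replace every original exit point with a deliberate
       out-of-bounds access, e.g. [0][1]."""
    raise NotImplementedError  # elided: purely mechanical rewriting

def halts(source: str) -> bool:
    # An out-of-bounds access is reachable in the transformed program
    # exactly when an original exit is reachable, i.e. exactly when
    # the original program halts -- so the oracle decides halting.
    return perfect_oob_analyzer(transform(source))
```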
Another student: sorry, I'm a little confused, wouldn't our static analysis tool always return that there are array out-of-bounds accesses, since we inserted them? No: a perfect tool doesn't just tell you one could be there textually. If you have an if-false branch containing an array out-of-bounds access, that's actually not a vulnerability, because it can never be triggered; it's about the paths through the program. Another way to think about it, especially when you're thinking about binaries, fuzzing, and program states: starting from the initial program state, is it ever possible for the program to reach a state where there's an out-of-bounds access? If that code never executes, it's fundamentally not possible. I agree you'd want to clean that code up, though, and that's where it gets tricky: a practical tool would flag it, but in strict terms it's not a vulnerability, because an attacker can't exploit it. It's the tree falling in the woods: it's potentially vulnerable, and that's why you want to fix it, in case the control flow ever changes to execute down that path, but as the program stands at that moment, there's no vulnerability there. Good question.

Alright, so we throw up our hands: we either have to have false positives or false negatives, and that's really just a design decision. I realize we're running out of time and I want to get to some research, but briefly: if you have no false negatives, your tool is sound, in a very specific sense, and usually with caveats. In a modern language it's very hard to make that claim; even analyzing Java code, you can have reflection that dynamically loads code at run time, and statically you can't tell what code is being loaded, so you no longer know the program's behavior at all. The other direction is no false positives, which means the tool is, essentially, complete in some sense: everything it reports is real. Tools have to pick which way to go. And this is what I meant about binary analysis: the nice thing there is that you get no false positives by verifying every single report with a proof of concept. In a practical tool you can even sort your results: these are the ones I've definitely confirmed with a proof of concept; these are the ones I think are vulnerable but couldn't prove, which may still be useful information.

So why not use both? Why not use a static analysis tool to tell you where the vulnerabilities might be, and then a dynamic tool to exercise them and generate proofs of concept? Some tools do that. But why not, in general?
Think about it in the web context, not the binary context; the binary case gets blurry because you have the binary code right there, so you can do some static analysis easily. But here, a black-box tool fundamentally can't see source. So you take a web black-box tool, which doesn't care about source code, and a static analysis tool, which does, and you use the static analysis to find candidate vulnerabilities and pass them to the black-box tool to verify. Exactly, yes. People have tried to coin the term gray-box tools for this, but they really get the worst of both worlds: you're still tied to one language and one runtime, and you also need this whole black-box component to verify. It's probably more practical in some sense, but in my opinion it's just lame; I feel like we should be improving both kinds of tools rather than mixing them in ways that don't make sense, where they just inherit each other's weaknesses.

Cool. So static analysis is really cool; I've done some work on it, and we're doing work here. It's really cool because you're trying to, essentially, over-approximate the behavior of a program: somebody gives you a program, what does it do, what are all the states it could be in? Any analysis must either under-approximate or over-approximate, so you end up saying: well, this could be a vulnerability, but maybe not, because I don't know whether this path through the control-flow graph is actually feasible.

Alright, I want to get into research. The goal of research in automated vulnerability analysis is basically: increase effectiveness by decreasing false negatives, decrease errors, and develop brand new tools that find new classes of vulnerabilities, or errors in new types of systems. We've got about 15 minutes, so let me talk about some of the research I've done in this area, to give you an idea of what some of the open questions are.

Black-box web application vulnerability scanners: these are point-and-click tools, where you put in a URL, click go, and they find the vulnerabilities in that website. There are a number of tools listed here, costing anywhere from free, Burp has a limited free version that's still super functional and that I actually use, while the full version is, I think, only about 300 dollars, up to tools like Acunetix and others that are in the tens of thousands. It depends on your license, too: you can buy a pen-testing license that lets you run it against any URL, which was about 10 grand, or a version that only works against a given URL, basically hard-coded to your own software, for less money. These prices may have gone up since.

So we wanted to ask the research question: how effective are these tools? When you think about it, it's a tricky problem. How would you say one tool is better than another? Hypothetical scenario: I tell you the Burp proxy found 5 vulnerabilities and Acunetix found 10. Which one's better? It depends, maybe, on the severity of the vulnerabilities they found. What else could it depend on? False positives: just because a vulnerability is reported doesn't mean anything.
What else? They're all zero-day, in the sense that they're all unknown vulnerabilities. What about speed, how long it took to find them? As an academic I'd say speed is less important, because it's an automated tool: you can always run it with more cores, and if it takes a day, you can live with that. But yes, in practice it matters. Code coverage. Code coverage in what sense? A student explains: the tools generate inputs, and you can measure how much of the program those inputs actually exercised. Right, and even with code coverage: say the tool that found 10 vulnerabilities only exercised 25% of the code, while the tool that found 5 covered far more. What about the total number of vulnerabilities that exist in whatever you're scanning, like how many were found out of the thousand that were there? Exactly. And the other thing nobody mentioned is overlap: did Burp find exactly five vulnerabilities that sit inside the set of ten the other one found? Are they completely disjoint sets, which would itself be interesting? Furthermore, it's hard to say whether 10 is even good. It sounds like a high number, but if 50 exist in the program, they found one fifth of them; if 100 exist, each tool barely scratched the surface. And with off-the-shelf software, it's essentially impossible to answer that question: if I hand you a version of WordPress and run all these tools against it, how do you evaluate the false negative rate? That's an important part.

So what we did is develop our own custom web application with vulnerabilities we inserted deliberately. This is WackoPicko, the website we created; it has, I'd have to check my notes, something like 16 intentional vulnerabilities. We also wanted to test the scanners across the axes we just talked about, so we built in crawling challenges. For instance, the what's-going-on-today page links to a calendar that's infinite: it keeps going forever. There's login functionality, with vulnerabilities behind the login form, to see whether the tools could get there; some of them actually did create user accounts, log in, and crawl and fuzz that part of the application. Some links were created dynamically by JavaScript, so if a tool didn't support JavaScript, it wouldn't find them. So there are all different types in there. (The pictures, by the way, were taken by a French guy who created them for the site.)

So, 16 vulnerabilities, and not just the standard SQL injection and cross-site scripting. We included command injection, where attacker input is used in something like system(), so they get remote code execution on your server; yes, this happens a lot. We had a weak session, where the session ID in one part of the site was not random but auto-incrementing, so you could access other people's sessions. We had an admin area where the username and password were admin/admin. And parameter manipulation: there were private parts of the website, but by changing the parameters you could access somebody else's information.
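The probe for parameter manipulation is almost embarrassingly simple. A hypothetical sketch (endpoint, parameter name, and IDs are all made up):

```python
import requests  # endpoint and IDs below are hypothetical

REPORT = "http://localhost/testapp/report.php"
MY_ID = 1000  # a record we legitimately own

# Nudge the identifier up and down and see whether the server hands
# back someone else's readable record instead of a 403/404.
mine = requests.get(REPORT, params={"id": MY_ID}).text
for other_id in (MY_ID - 1, MY_ID + 1):
    resp = requests.get(REPORT, params={"id": other_id})
    if resp.ok and resp.text != mine:
        print(f"id={other_id}: readable record that isn't ours (possible flaw)")
```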
That one is something we actually found at a company during a pen test. They had credit card records, and the link to the PDF of one company's credit card report ended in some long number. So we tried: what if you change that number up and down? And we started getting other people's credit card reports, their receipts basically, and we wrote all of those up for our report. We also had file exposure, where you could read files using directory traversal attacks, and a logic flaw: a coupon code that you could keep re-entering to drive the price down to zero. Of course we didn't expect the scanners to find that one, but it's a realistic vulnerability.

Before running the scanners, we gave the site to the hacking group at UC Santa Barbara, and they found, I believe, 15 of the 16 vulnerabilities, maybe even all of them, I honestly can't remember, which was awesome. Then we ran 11 tools against the website to see how they did. The lowest, a really terrible one, detected only around 10% of the vulnerabilities, and the highest not even 40%, and these numbers are after all false positives are removed. No tool found more than 40%, and I believe even all of them together, taking the union of everything every tool found, didn't pass 50%. So they missed a lot of vulnerabilities; I won't go through them individually here.

They also had false positives ranging from 0 to 200-plus, which was insanely annoying to go through; the average was about 25. One common false positive was a server path disclosure: the tool would say you're leaking server paths, like /var/something, but that path was input the scanner itself had sent, reflected back in the output during its cross-site scripting tests. They were injecting their own input and then concluding they had discovered local paths on the server, which was not true. Those were easy to screen out. The more interesting false positives were things like claiming there was a PHP eval injection vulnerability when there was not. And that's an interesting point: you'd actually expect the false positives from these tools to be low, because they're sending real input and trying to trigger the vulnerability, but the analysis component, deciding from the response whether it really was vulnerable, was not very good.

Then we looked at the problem of saying when one tool is better than another. I took this from a theory class where we learned about lattices; have you talked about lattices at all? Maybe not, but the basic idea is that we created a relation we called dominates. The question is: when can you definitively say that one tool is better than another? When it finds a superset? Exactly: when it finds a superset of the vulnerabilities the other tool found.
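As a toy sketch of that relation (made-up scanner names and vulnerability IDs; our real metric also weighs the configuration a vulnerability was found in, which I'll get to next):

```python
# Hypothetical per-scanner detections on a known-vulnerability testbed.
found = {
    "ScannerA": {"xss_reflected", "sqli_login", "cmd_injection"},
    "ScannerB": {"xss_reflected", "sqli_login"},
    "ScannerC": {"xss_stored"},
}

def dominates(a: set, b: set) -> bool:
    # A dominates B if A found everything B found. Scanners with
    # incomparable sets are simply unordered: this is a partial
    # order (a lattice-style structure), not a total ranking.
    return a >= b

for x, vulns_x in found.items():
    for y, vulns_y in found.items():
        if x != y and dominates(vulns_x, vulns_y):
            print(f"{x} dominates {y}")
# -> only "ScannerA dominates ScannerB"; C is incomparable to both.
```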
Anyway, that's the ordering shown here. So this is the dominates graph: the tools are at the top, and this number is, I believe, the number of vulnerabilities they... oh no, no, no, is that the number of vulnerabilities? That can't be right, that one definitely did the worst. Is it a rank ordering? Did Burp have the same percentage? I just want to check. Yep, yep, yep, alright, great. Okay. So the interesting thing is that those four tools all found the same six vulnerabilities, a complete overlap: none of them found more or better than the others. And N-Stalker is interesting because it found a vulnerability that I believe nobody else found, but it missed the stuff that Grendel-Scan found, because otherwise it would have dominated it. So that's super interesting, and it's kind of an interesting way to think about these tools and what they can do.

I'm not going to talk through all of it, but we also went through all of the requests the tools made and all the payloads they sent, to try to understand more about them. The biggest problem was the login page. What we saw is that a tool would be crawling and fuzzing the app, and even if we explicitly told it how to log in, it would end up logging itself out while it was doing its testing. What we realized is that these tools take a shotgun approach: they just blast requests at the application, get a bunch of responses back, and try to determine from those whether something has a vulnerability. But one of those requests logs you out of the application, and now you're no longer testing the application in the same state. If you think about code paths, everything has changed: all of these requests, even though you're testing the same input for the same vulnerability class, are now going through different code paths. And that's the fundamental problem: these tools were created in a way that treats the web application as a dumb thing that just takes input and gives output. Why are these tools state-agnostic? Because they are; that's what we're going to talk about in a second. That's what we identified as the main problem: they're essentially state-agnostic, which is a good term for it. They don't even think of themselves as talking to an application, in some sense. This is why I always use the term web application instead of website. To me, a website is just something that has static HTML pages and gives you output, whereas an application is something you interact with: it has a session, it has state, and all those nice things.

So think about state for a simple web application. Let's say there's an application where you access home.php, the main page, which has a link to login. After you access login, you can access the view.php page, which is now a link on home.php, and from there you can access view and log out. The idea we had is: can we create a black-box tool that can automatically infer this state machine, these state transitions?
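Here is that example written out concretely as a transition table, a minimal sketch using the page labels A through D that I'll use in a moment (the assumption that logout sends you back to page A is mine); this table is the object the tool has to infer without ever seeing it directly:

```python
# The example application's state machine, written out explicitly.
# Transitions: (state, request) -> (next_state, page_returned)
MACHINE = {
    ("guest", "home.php"):   ("guest", "A"),
    ("guest", "login.php"):  ("user",  "B"),
    ("user",  "home.php"):   ("user",  "C"),
    ("user",  "view.php"):   ("user",  "D"),
    ("user",  "logout.php"): ("guest", "A"),  # assumed: back to the guest page
}

def run(requests_made, state="guest"):
    """Replay a request sequence against the machine, collecting outputs."""
    outputs = []
    for req in requests_made:
        state, page = MACHINE[(state, req)]
        outputs.append(page)
    return outputs

print(run(["home.php", "login.php", "home.php", "view.php", "logout.php"]))
# ['A', 'B', 'C', 'D', 'A']
```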
We first thought about it abstractly: what is the state of the application? There's a guest state and a user state. You first access home, and you can do that as many times as you want, you stay in the same state. Then you access login, and that transitions you to a new state, where you can continue accessing home as many times as you want without transitioning out; you can now access the view page; and then you can log out and go back to the guest state. And here you can see that, A, if you can never log in, you can't fuzz or test view.php, so you'll never find vulnerabilities there. And it's possible, although we don't know, that home.php is executing different code in these different states, so we want to test it in both of those states.

What we realized is that we can use, has anyone heard of automata? There's a similar type of machine model called a Mealy machine, where in every state you have an input, let's say the home.php request, and an output, the page that comes back. So in the guest state, when you make a home.php request, you get output HTML page A, and when you do login.php you get output B. In the user state, home.php now gives you C, and you can access view.php, which gives you D. We realized that if we use this as our model, we can infer what state the application is in, in a completely black-box manner. And the reason is, why is the output for home.php different between guest and user? There's a link to view.php. The core idea of a Mealy machine is that if an application is in a given state and you make a given request, you will always get that same response back. So if I knew the machine in advance, I could access the home page and know which state I'm in by whether I get A or C. But we can't know the machine. What we can do is crawl the application: crawl the home page and get some HTML back, A; crawl the login page and get back B; crawl the home page again and get back C; crawl the view page and get D. And this is the core idea: we first crawl the application, getting a series of input/output pairs, and we then detect changes between those responses. But we don't know which request changed the state, so we need some heuristics to identify it, because there could be 20 requests in between, and we don't actually know which one caused the transition.

Couldn't you use the cookies? Maybe the session cookie changed when you logged in, so you get a new cookie. That's tricky, actually. We deliberately tried not to rely on cookies. But all those lines of HTML, a page can have thousands of lines, so isn't it very hard to find the difference? Oh, those are easy; we'll talk about that in a second, actually. The cookies are genuinely tricky because in most PHP applications, as soon as you call session_start() you get a cookie, and it never changes after that. So we developed a process that's agnostic of the mechanism by which state is kept; we actually don't care. And the interesting question is exactly this: deciding whether A and C are different. There are actually a lot of difficult problems here, for example when pages carry advertisements, which we weren't really concerned about. The other thing about this model is that we assume no external changes: the only thing that changes the state is our own requests. It's kind of a simplistic model, but it still works.
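A minimal sketch of the detection side, under my own simplifications: page identity is reduced to an abstraction function over the HTML (here, just the set of link targets, anticipating the links-and-forms insight discussed next), and a state change is flagged whenever the same request yields a different abstract page:

```python
import re

def abstract(html):
    """Stand-in page abstraction: the set of link targets on the page.
    (The real system uses links, forms, and more; this is a sketch.)"""
    return frozenset(re.findall(r'href="([^"]+)"', html))

def detect_state_changes(trace):
    """trace: (request, html) pairs in the order we made them. Yields
    indices where the same request produced a different abstract page,
    i.e. the state changed somewhere since we last saw that request."""
    seen = {}
    for i, (req, html) in enumerate(trace):
        page = abstract(html)
        if req in seen and seen[req] != page:
            yield i
        seen[req] = page

trace = [
    ("GET /home.php",  '<a href="login.php">login</a>'),
    ("GET /login.php", '<a href="home.php">home</a>'),
    ("GET /home.php",  '<a href="view.php">v</a><a href="logout.php">o</a>'),
]
print(list(detect_state_changes(trace)))  # [2]: home.php's output changed
```

Which of the intervening requests actually caused the transition still has to be guessed with the heuristics described below.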
Are you only looking at the response body? You're not looking at the full response, like the headers or anything? No, we're only considering the HTML. Do you have some kind of smarts to deal with, say, a page that prints the current time? Yes, that's a huge problem. We ignored it, just to see if we could make the approach work at all, so we actually modified some applications to remove time-based components, but I think it's one of those things you could add over time. And one of the ideas, actually the main insight, in detecting whether A and C are different is not to use the text of the page, because the text matters a lot less. What matters, the key difference between home in these two states, is the fact that there's a new link to view.php. The links and forms on the page define what you can do with the application. So when there's a new link, when the link structure changes, the application has changed: your GUI has changed, you now have a new button that you've never seen before. What's your approach for things like URL rewriting or parameter manipulation? It's tricky, but yes, we did cover that; we'll talk about it very briefly. I'm actually over time, but I'll just finish up these thoughts.

We need to do a couple of things. First, we need to figure out similar pages: whether A and C are actually the same page or different. Then, like I said, we know that the state must have changed, but we need to determine which request changed it; we use a lot of heuristics here, like a POST request being more likely to change state than a GET request, and a whole bunch of others. At that point what we have is essentially a chain: we know we were in state zero, and now we see state change, state change, state change. Are we flipping back and forth between two states? We need a way to collapse those states onto each other, and we use a graph coloring algorithm, based on trying to determine which states can't possibly be the same, using some techniques we developed. But here I'm only going to talk about clustering similar pages.

The idea is that we use the links and forms on the page, and we developed a way to treat the URL itself, along with all of its parameters, as a tree, and then we cluster all of the requests together based on this hierarchy. For example, for A, which is this tree, would that be A0, A, A1? Yes, yes: the parameters are part of this tree. There are a lot of HTML responses, and I believe requests as well, that all map to the same structure and differ only in one part, so you want to cluster those together, but you need heuristics based on how similar the rest of the parts are. The code is also open source, so you can dig into it, and then you can complain to FISA about it, because she did her master's project building on this. It's very complicated; that's not a brag, it's just complicated stuff. The idea here is that this page has a form with a POST method to add.php, whereas these other forms are all the same but differ in this id parameter, so we want our clustering algorithm to cluster those together but leave that one separate.

So we built our tool and ran it against wget, skipfish, and w3af, which are all open source tools. The w3af comparison is super interesting because we didn't create a new fuzzing engine: we used the exact same fuzzing capabilities of w3af, we just crawled the application differently.
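Here is a minimal sketch of that link-and-form fingerprinting idea, with my own simplifications: parsing via Python's standard html.parser, query strings stripped so pages differing only in a parameter value collapse together, and clustering reduced to exact fingerprint equality rather than the real heuristic hierarchy:

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class LinkFormExtractor(HTMLParser):
    """Collect link targets and form actions/methods, ignoring text."""
    def __init__(self):
        super().__init__()
        self.fp = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            # Keep only the path, so view.php?id=1 and view.php?id=2
            # contribute the same element to the fingerprint.
            self.fp.add(("link", urlsplit(attrs["href"]).path))
        elif tag == "form":
            self.fp.add(("form", attrs.get("method", "get").lower(),
                         urlsplit(attrs.get("action", "")).path))

def fingerprint(html):
    p = LinkFormExtractor()
    p.feed(html)
    return frozenset(p.fp)

page1 = '<a href="view.php?id=1">x</a>'
page2 = '<a href="view.php?id=2">y</a>'
page3 = '<form method="post" action="add.php"></form>'
print(fingerprint(page1) == fingerprint(page2))  # True: cluster together
print(fingerprint(page1) == fingerprint(page3))  # False: keep separate
```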
Then, once we have this graph of all the states of the application, we can test the application in every single one of those states. We evaluated against a number of apps, and one of the things I really like about this is that I think Vanilla Forums and Gallery were Ruby on Rails apps, so we weren't just testing on PHP apps. This is WackoPicko version two, where we had removed some of the time-based functionality so we could ignore it, and the WordPress stuff was all good. And then this was the key thing: what we're improving is the crawling engine, and the only thing that matters there is, are you executing more code? So we used code coverage as our core metric here, with a baseline approach, because code coverage by itself is meaningless. If I say I covered 10% of the application's code, is that good, is that bad? It depends. Is 90% of the rest in admin functionality? What we did is we used wget, just a recursive wget, which is the stupidest crawler you could ever write and not even a vulnerability analysis tool, as our baseline. So we measured the code coverage with wget, and then what all these tools get above wget.

Briefly, the results. Some interesting things to note here: skipfish, the green tool, actually had less code coverage than wget on something like WordPress, whereas on this PHP bulletin board we had more than a 100% increase in code coverage over wget. WackoPicko was obviously much better, because our tool could log in, and would even create accounts and then log in, which was pretty cool. Excuse me, for the testing, do you use any logs from the web server or even the SQL server to check the queries? No, no, no. How much do you dig into the code? No, this is just code coverage. But we do have vulnerability metrics as well, and we did find vulnerabilities that nobody else found. The other interesting thing, and this is one of the graphs, is that these state graphs are actually incredibly complicated, because of the way we defined state: if we make a comment on, let's say, a picture on the website, we've now literally, fundamentally changed the state of the application, and unless we can delete that comment, we can never go back. We didn't create a loop-detection mechanism to say, oh, this is actually just fetching the same thing again. But it does work, and it does increase the effectiveness. So you have all these interesting login and logout behaviors in the graph, and WackoPicko has this cart functionality, where you can take pictures and add them to a cart, so that shows up too, which is pretty cool.

Anyway, I wanted to give you an overview of the research I've done in automated vulnerability analysis and what kinds of cool, interesting research areas there are. So, I'm around.