I'm here with my colleague Etienne from SensePost, and this is mostly his work. I'm just the lucky, or unlucky, guy who got to read through 3,500 lines of his fuzzing grammar. What is fuzzing? Seeing as you're all in this room, I'd like to assume you know what fuzzing is, but if you're expecting a talk about furry animals or anything along those lines, sorry, wrong talk. Fuzzing is basically the process of applying or manipulating the inputs to an application in an automated fashion and seeing if you can change the application's state. We could simply call it throwing shit at the wall and seeing what sticks.

If we dig into the history of fuzzing, we have to go back to 1989: the golden years of Miami Vice, great hairstyles, awesome suits, rocking dance moves, and Boris Beizer, who presented work on syntax fuzzing, using syntax for robustness testing of applications. Fuzzing has continued since then, and we've seen big names like Charlie Miller, Dave Aitel, and others applying fuzzing across multiple applications, protocols, and platforms.

One thing that has stayed constant through the years is the fuzzing methodology. It basically consists of five steps. Step one: identify your target. What are you looking at fuzzing? Is it a browser, the network stack, or something else? Once you've identified your target, you need to identify the inputs to that target. This can be done in multiple ways: by using the application and noting its inputs, by reviewing documentation for your target, or by reverse engineering the application and finding hidden inputs that weren't known before. Once you've identified your inputs, you need to generate test data. Generating test data falls into two categories: dumb fuzzing or smart fuzzing. Dumb fuzzing is something we've all done before; it can be as simple as inserting a thousand A's into an input text box and seeing if the application crashes.
When you're doing dumb fuzzing, you apply data to your inputs without any prior knowledge of what the application is expecting or what the data should look like. With smart fuzzing, on the other hand, you use knowledge about the application or its inputs to generate your test data, usually by taking valid test data and mutating it in unexpected ways. This is where our fuzzer falls in. Once you've generated your test data, you need to start fuzzing. This part is simple: you just feed your test data to the target application and see what happens. While doing this, you monitor the application; in our case, we're monitoring it for memory errors.

There are some tools that help us with the fuzzing process, specifically fuzzing harnesses and memory error detectors. Fuzzing harnesses run the application, feed the test case to it, and monitor it for crashes. For Windows, we have Grinder by Stephen Fewer. For Linux and Mac OS X, we have NodeFuzz by Atte Kettunen of OUSPG. Memory error detectors detect memory errors early by hooking the allocation and free calls. They poison the memory around each allocation, and after a free they quarantine and poison the freed memory. Any access to these poisoned locations causes the application to crash with a signal. For Windows, we have PageHeap, which is part of the Debugging Tools for Windows. For Linux and Mac OS X, there is Google's AddressSanitizer.

But enough fuzzing theory. The real reason you're here is the Wadi fuzzer. What is Wadi? Well, apart from the obvious meaning in Arabic of a dry valley bed, or a river that may contain water in the high season, Wadi is a grammar-based browser fuzzer. What this means is that we use a well-defined grammar to generate our test cases and feed these into any browser that uses the same grammar to construct its parsing methods.
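As a toy illustration (not Wadi's code), the difference between dumb fuzzing and mutation-based smart fuzzing might look like this; the function names here are made up for the example:

```javascript
// Toy illustration, not part of Wadi: dumb vs. mutation-based fuzzing.

// Dumb fuzzing: no knowledge of the expected input format.
function dumbPayload() {
  return "A".repeat(1000); // a thousand A's into the input box
}

// Smart (mutation-based) fuzzing: start from valid data and
// flip a few characters in unexpected places.
function mutate(valid, rounds) {
  const chars = valid.split("");
  for (let i = 0; i < rounds; i++) {
    const pos = Math.floor(Math.random() * chars.length);
    chars[pos] = String.fromCharCode(Math.floor(Math.random() * 256));
  }
  return chars.join("");
}

const mutated = mutate('<a href="http://example.com">link</a>', 3);
```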
This approach has been very successful and has resulted in numerous high-severity crashes in browsers. The reason we created Wadi was that we wanted to be able to identify bugs in existing and new web APIs. The process for this has usually been a hard, manual one, and you normally focus on one browser. By using an LL(1) grammar to create our test cases, we are able to quickly and easily generate new test cases as new APIs come out and are implemented in browsers. By working from the specifications, we're also using the same material that the browser vendors are using. This approach allows us to create test cases that are standard across browsers: you can use the same test cases against Chrome, Firefox, and the new Spartan browser, and hopefully you get browser crashes.

So what does Wadi fuzz? At the moment it fuzzes DOM Levels 3 and 4, Mutation Observers, HTML5, CSS and CSS3, as well as the Web Animations API. But there are no limits on which APIs Wadi can fuzz: you simply need to feed it the correct information and it will generate correct test cases and fuzz them for you. We all know what fuzzing means: crashes, which mean money. No one has made it rain for us yet, but we have made some money, mostly from the CVEs, and yeah.

A simple intro to the DOM. The DOM, or Document Object Model, provides us with the standard objects that describe HTML and XML documents. It also provides interfaces for interacting with these objects and manipulating them. Then you have web APIs, which give us JavaScript interfaces for interacting with the Document Object Model. These web APIs include speech, Web Audio, animation, and most recently Web Crypto, and new APIs are being pushed out constantly and, hopefully, supported in all browsers.

Grammar time. What is a grammar? Grammar is something we use on an everyday basis.
If you apply grammar to a language such as English, the grammar defines how a sentence should be constructed: when to use a verb, when to use a noun, et cetera. Grammars in compiler theory are used by compilers to parse a programming language, verify that it is syntactically correct, and drive code generation. Wadi, on the other hand, uses a grammar in much the same way a parser or compiler would, but to generate our test cases for browsers. If you want to put grammar into one sentence, it's basically the difference between knowing your shit and knowing you're shit.

Grammar applied to WebIDL: the W3C, the web standards body, provides us with an interface definition language, WebIDL. This defines how browsers should implement new web APIs and the Document Object Model, and it can be described with an LL(1) grammar. Those of you who have done computer science and compiler theory will know what an LL(1) grammar is and how it relates to the specifications. Simply put, it allows parsing of the DOM and web API definitions in a standardized manner, so that browser vendors can all apply the same standards to their browser technology. Unless you're Microsoft, in which case you kind of follow your own specifications.

If we look at an interface: an interface in the DOM defines a structure that can contain attributes and functions that interact with the Document Object Model. Here we see the grammar that defines this interface object: a token called interface, followed by an identifier, possible inheritance, and then all the interface members, the objects belonging to this interface. If we create a simplified interface for the Text object, we can see that our identifier is Text and it inherits from the CharacterData class. We can also see that, in this case, we've got four interface members.
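For reference, a simplified Text interface along these lines could be written in WebIDL roughly as below. This sketch follows the DOM Level 3 Core definition; the exact members on the speakers' slide may differ slightly:

```
interface Text : CharacterData {
  readonly attribute boolean   isElementContentWhitespace;
  readonly attribute DOMString wholeText;
  Text splitText(unsigned long offset);
  Text replaceWholeText(DOMString content);
};
```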
These interface members can be described individually as well. If we look at the grammar for this, we see that an interface member can either be a constant, which we are not interested in here, or an attribute or an operation. Attributes and operations are either the values that describe the interface or the functions that interact with it. If we look at the attribute definition for simple attributes, we see possible inheritance, a read-only flag that can be set, a type, and an identifier. In this case, we've got one read-only attribute and one read-write attribute, one of Boolean type and the other of DOMString type. The same principle applies to our functions: a function can have a return type, an identifier, and possible inputs, and these inputs can themselves have different types and values.

When mapping the IDL into our fuzzing grammar to create our test cases, we map these constructs to JavaScript objects. What we've done is create an object for attributes, where each attribute has three members: the attribute identifier, the functions that generate data for that attribute's type, and whether the read-only flag is set. For our functions, the same principle applies: we create an array containing two members, one with an identifier for the function or method, and a second array containing functions that generate the test inputs for it. Here's the full JavaScript object we created from our initial Text interface. We can see that we've set the name to Text, that attributes have been defined along with generators for their expected values, and the same for the methods, where we can generate the expected inputs.
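A minimal sketch of what such an IDL-to-JavaScript mapping could look like; the property layout, member names, and generator functions here are illustrative assumptions, not Wadi's actual object:

```javascript
// Illustrative sketch of mapping an IDL interface to a fuzzing object.
// Layout and names are assumptions, not Wadi's real code.
function randomString(maxLen) {
  let s = "";
  const len = Math.floor(Math.random() * maxLen) + 1;
  for (let i = 0; i < len; i++) {
    s += String.fromCharCode(97 + Math.floor(Math.random() * 26));
  }
  return s;
}

const TextInterface = {
  name: "Text",
  attributes: [
    // [identifier, value generator, read-only flag]
    ["wholeText", () => randomString(64), true],
    ["data",      () => randomString(64), false],
  ],
  methods: [
    // [identifier, [argument generators]]
    ["splitText",        [() => Math.floor(Math.random() * 100)]],
    ["replaceWholeText", [() => randomString(64)]],
  ],
};
```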
We can also note that we've actually concatenated this with the CharacterData interface and its associated attributes and methods; this is due to the inheritance. We also have some helper functions that assist us throughout the fuzzing process. Most of them are used while generating input data, either for an attribute, governed by its input type, or for a method, whose parameters are generated dynamically. The most significant are the functions in front of you. One returns a random hex number, another returns a random integer within a supplied range, and a string helper returns a string of random length. Most importantly, there are three functions we'd like to focus on: an array-of-arrays helper; an array-walk helper, which walks through an array and processes its entries (if an entry is a string, it just returns the string; if it's a function, it executes it and returns the return value); and a helper that returns a reference to a randomly created element from the fuzzer, either referencing the element directly or referencing a nearby element such as element.firstChild, element.lastChild, element.parentNode, or whatever it may be.

The fuzzing model, how Wadi flows and creates its test cases, is: first element creation, then fuzzing the DOM and API interfaces, then string parsing and preparation, and finally outputting the HTML test case. Test cases everywhere, as you can see. Wadi works in two spaces, the fuzzer space and the browser space. The fuzzer space lets us access the interface objects in the fuzzer itself; the browser space lets us access the elements created in the browser by the test case. Next is element creation. You can see here that Wadi has three main functions for element creation: one to create elements, one to create text nodes, and one to mangle elements.
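Hedged versions of the helpers just described might look like this; the identifiers are assumptions for illustration, not necessarily Wadi's:

```javascript
// Illustrative versions of the helper functions described above;
// the names are assumptions, not necessarily Wadi's.
function randInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

function randHex(max) {
  return "0x" + randInt(0, max).toString(16);
}

function randString(maxLen) {
  return "A".repeat(randInt(1, maxLen));
}

// Walk an array of mixed strings and generator functions:
// strings pass through unchanged, functions are called for a value.
function arrayWalk(arr) {
  const item = arr[randInt(0, arr.length - 1)];
  return typeof item === "function" ? item() : item;
}
```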
The element-creation function basically chooses a random interface from the grammar we have, creates an element for it, and saves two references, one in the browser space and one in the fuzzer space; this is its output. The text-node function creates text nodes of random length and randomly attaches them to the created elements, and the mangle function mangles the DOM tree.

Next, we have the interface-fuzzing functions. Basically, Wadi calls a function named fuzz with a number as its parameter; this number dictates how many rounds the fuzz function will execute. Each round, the fuzz function randomly calls one of the functions we have in the fuzzer. These functions range from fuzzing the window, document, or element interfaces to fuzzing styles in two different ways, either using insertRule or using the normal DOM Level 2 .style interface. We have functions that dynamically create mutation observers, dynamically create events and dispatch them, and dynamically call random Range and NodeIterator functions and walk through them, as well as functions that spray attributes or trigger garbage collection, depending on the case.

The last thing is how we parse and prepare the string. First, a function generates random function names to be used as callbacks, for example for events or for mutation observers, and creates a simple function body containing however many statements you want to insert into it. Next, it gathers all the JavaScript statements related to element creation, be that the main elements themselves, the mutation observers, or elements created dynamically during fuzzing. After that, it inserts all the object-creation JavaScript statements, such as the node iterators and so on. At the end, it randomly inserts the JavaScript statements that fuzz these interfaces and objects. This is a sample of Wadi's output.
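The fuzz(n) driver described above can be sketched roughly as follows; the individual fuzzing functions here are stand-in stubs emitting hypothetical statements, not Wadi's real implementations:

```javascript
// Hedged sketch of the fuzz(n) driver; the fuzzing functions below are
// stand-in stubs, not Wadi's real implementations.
const testCase = []; // collected JavaScript statements for the test case

const fuzzFunctions = [
  function fuzzElementInterface() { testCase.push("el1.normalize();"); },
  function fuzzStyle()            { testCase.push("el1.style.color = 'red';"); },
  function fuzzMutationObserver() {
    testCase.push("new MutationObserver(cb1).observe(el1, {childList: true});");
  },
];

// Run `rounds` iterations, each picking one fuzzing function at random.
function fuzz(rounds) {
  for (let i = 0; i < rounds; i++) {
    const fn = fuzzFunctions[Math.floor(Math.random() * fuzzFunctions.length)];
    fn();
  }
}

fuzz(10); // testCase now holds 10 generated statements
```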
This sits between script tags in a full HTML document. All of it is dynamically created; none of it is written manually. We wanted to prepare a fuzzing demo, but we felt it might be a little boring, so we decided to give you a sneak peek at a new project we're working on, which is basically a Spartan fuzzer. This is a real fuzzing session in which we were testing the fuzzer. As you can see, Spartan is running and connected to the fuzzing server, and it doesn't take long for Spartan to crash. This one is a simple null pointer dereference, but it means we're going in the right direction.

Next, our findings. Using Wadi, we were able to find four bugs in Chromium: two were duplicates and two were confirmed. Of the confirmed ones, the first is CVE-2015-1243, a use-after-free in the DOM itself; this is the PoC for it, and we were rewarded $3,000. The second, as you can see, has a very simple PoC, an assertion error, and we were rewarded $1,500 for it. The third one is a duplicate, so I did not want to put the PoC up. The fourth and last one was also a duplicate. Unfortunately, I hit Google's ClusterFuzz: I found it 24 hours after ClusterFuzz did, so I was a bit pissed about that, but it's fine.

Basically, thank you. The code will be up on SensePost's GitHub. I hope you enjoyed our talk. These are the references we used during our research, if you need to look at them. If anyone has questions, we're happy to take them; we still have time, I believe. Yeah, sure. Sorry. Yeah, basically, there is an automated tool that we, or rather Etienne, created which takes the IDL and generates the JavaScript objects dynamically. And of course, there will be documentation explaining how this whole process is done. Anyone else want to ask anything?
Anyway, because we are generating JavaScript statements, if we applied mutation to them, we would be fuzzing the JavaScript engine itself. What you want to do instead is apply mutation to an already created HTML document. There are implementations of that from the OUSPG folks, Radamsa and Surku, which can be used to apply mutations to an HTML or SVG document and feed it to the browser. So for mutation-based fuzzing: unless we are targeting the browser's JavaScript engine itself, I wouldn't apply mutation to our generated JavaScript. Sorry. Well, we haven't tried yet, but we have some ideas, tested but not fully implemented, that worked against something like Ruby, for example. Anyway, guys, we'll be around if you want to ask us anything. Hope you enjoyed it. Have a great day.