It's my great pleasure to introduce Olivier Crameri, all the way from Switzerland. Give him a hand, ladies and gentlemen. Olivier is the co-founder of Bugbuster Incorporated, and this is going to be one of the highlights of the show, so watch out for this. This is a session on automated testing, and he's going to walk you through the product that he's built for it and show you some cool tricks that he can do with it. Let's thank Swissnex for bringing him down from Switzerland, and Bangalore JS, so Barbara and Jonathan, for getting us in touch with him. He's already done one great workshop yesterday and you're about to see what he's built. Over to you, Olivier.

Thank you very much. Thanks for the intro. Thanks for having me here. It's a great pleasure to be here in India and see this thriving community. As you said, I'm a co-founder of Bugbuster. I've been working with the web and with JavaScript for a long time now, and a few years ago I decided that I was actually sick of JavaScript, so I was like, okay, let's do something else. Let's do a PhD, some serious things. And obviously it worked very well, because here I am at the JavaScript conference speaking. So anyway, I did research in software testing and it ended up in the creation of Bugbuster, which is based in Lausanne, Switzerland. We're a spin-off of the school where I did my PhD, and our goal is to help you guys build better applications. For that you need tools that reduce the effort of testing these applications and increase the coverage, so that you can produce the quality applications that your users expect.

So back to the title of my talk, Generating Tests from Code. Actually, yesterday's talk was Automating Automated Tests, and it was pretty much the same idea. So if I talk about automated testing and I give you the definition, actually this is from Wikipedia: automated testing is about controlling the execution and the outcome of the test automatically.
Obviously this is much better than doing everything manually, right? So let's look a little bit more at what this means. You're going to use tools like Selenium WebDriver if you're doing front-end development, or Jasmine if you're doing front-end or back-end development in JavaScript. In practice, you're going to start by writing one or more test cases. A test case is a piece of code, a script that explains what your code is supposed to do and what the expectations are, right? Then you're going to run the tests, and hopefully you're going to find some bugs. So you're going to debug your application, and once you have no more bugs in your application you're going to release your program. Everything's going to work nicely, you're going to be happy, you're going to be able to go do something else. Then when you write new code, you have a new version of your application, you have a new commit, well, you start the cycle again: writing more tests, running more tests, and so on and so forth.

That's all very nice, but the thing is, if you look at your program, this is a graph showing the control flow of your program, with the if statements leading to different blocks of code, and it's actually going to be very, very big. Most non-trivial programs are going to have a huge control flow graph; sometimes it's actually infinite. When you write a test case, what you're doing is essentially encoding the expectation for one path, right? Then if you're lucky, that path will lead to a bug and you're going to be able to debug the program. But if you're not so lucky, well, maybe you should have written another test case, because the bugs were somewhere else, and then you'll be missing some problems.

So if you look at how automated testing really looks in practice, I mean in real life, well, you write your test cases, then you run the tests, then you debug your tests, because tests are actually more code, right?
Then you debug your app, then you release, and then what? You fix the bugs that were reported by your users, right? And because you were caught with your pants down, you don't want this to happen again, so you write additional test cases, right? And then you repeat the cycle for every new version or every new commit, right? So that's exhausting, and that's why testing has this reputation of being so tedious. Our goal at Bugbuster is to help you do better than that.

So how can we test better or faster? First idea: use static analysis. We've all dreamt of having a great IDE that will underline your code as you type it and tell you where the bugs are, right? Static analysis is the technique where you analyze the source code without executing it, and one could imagine that with such technology you would be able to do that and find the bugs before you actually run your code. In JavaScript, there are already some very cool static analysis tools like JSHint or JSLint. If you're not using them, you should definitely do that. They'll tell you about a lot of errors, syntax errors and that kind of stuff. There are other tools that use static analysis, like Closure or UglifyJS, for a different purpose: compiling or minifying your code.

Well, that's all very good and useful, but the problem is that static analysis is inherently imprecise. Especially for JavaScript, which is a very dynamic language, it's very hard to know what your code is going to do without executing it. Just take this statement here: if foo is greater than zero. I want to analyze it to make sure it's correct and there is no bug. Well, I have to figure out what type foo can be. It could be anything: a string, a number, a function. Then I have to figure out where foo is pointing to, so that I can have an idea of what value it can take.
And then I have to figure out what this statement is supposed to do in the first place, right? So the tool can make assumptions about all of that, but it's going to be difficult to be very precise. It's been done with other languages. It's used in other types of industries; in the aeronautics industry, for instance, it's used a lot. But for JavaScript it's extremely difficult, and all this imprecision is going to lead to false positives. And these false positives are going to slow you down when you debug. So it becomes really difficult to have something really efficient.

So what can you do if static analysis isn't enough? Well, we can do dynamic analysis. Dynamic analysis is the reverse of static analysis: the idea is to analyze the code while it runs, instead of just looking at it and thinking about it. And you're probably already using a lot of dynamic analysis tools, right? A debugger uses dynamic analysis to step through the code, inspect the code, and so on. You have profilers, you have various types of checkers. Actually, I've seen that Microsoft has a stand and they're showing modern.ie. I think they're using dynamic analysis to figure out, for instance, if you're using APIs that are unsafe or not compatible across browsers, and so on and so forth.

So since we're able to do all this analysis at runtime, here's an idea. What if, when we run a test case, we could record the path precisely and use that information to come up with other inputs that will lead the program down a different path? Well, that's actually an idea that's been around for a long time, and it's called symbolic execution. I think it was first formulated in the 70s. It's been a very hot topic in research in the last few years, because we now have the computing power and the algorithms to make it work. So symbolic execution can be used to generate test cases.
Microsoft, for instance, is using it to test, I think, Windows. They found a lot of bugs in the file system, for instance. NASA is using it. But these are all prototypes for C, C++, and Java. We at Bugbuster are building an implementation for JavaScript.

So how does symbolic execution work? I'm gonna give you an example, but this is the principle. Symbolic execution starts by recording all the constraints on inputs while you execute your program. So if you're processing input in one way or another, we record everything that's going on, and we formulate that as a mathematical equation. Then when you have a branch, an if statement in your program, we record the outcome of the branch, whether it went to the true side of the statement or the false side, and we record the constraint associated with that statement. We then solve that to come up with different inputs. And then we use these different inputs to rerun the program, but this time down a different path.

So let's go over an example. I have here a simple JavaScript function to validate an email. The input is a string, an email address, hopefully. It trims the white space from the email address. Then it uses a regular expression to see if the syntax of the email address is correct, right? So how is symbolic execution going to handle that? Well, first it will run this function with an arbitrary input, let's say just the string "a". Then, for each line, it analyzes what's going on. At line two, the code is trimming the white space, so it records that email is not just the variable anymore, it's now email.trim(). On the next line you have an if statement, and that if statement is looking at your input value. So we record a constraint that says, okay, I went to the false side of the branch, so my expression was false. And that's what I record.
Then I pick that expression, because I want to craft an email address that is actually valid. So I take my expression and I feed it to a solver, and the solver comes up with another input string for my program. I can then rerun the program with a valid email address. And that's how I cover the two paths of that simple statement.

So what is this good for? First, it's actually very useful for understanding your own code. We as developers, when we write code, generally have an idea of how it works and how it should work. But sometimes, and actually very often, we write things that are capable of doing things we have no idea about. By using this kind of technique, we can discover these other behaviors before our users do, and that's pretty useful. From there, we can generate or complete our test suites, which is also very useful. And we can also use it as a refactoring tool to compare the behaviors of two functions.

So we are very happy to introduce Unite today. Unite is a new toy, actually, that we built to demonstrate symbolic execution for JavaScript. We unveiled it yesterday at the workshop, and it does exactly what I just described. So I'm gonna go over a quick demo. Now if you can just stop breathing, stop tweeting, stop downloading, so that I have internet access, that would be useful.

So this is Unite. Unite works by solving puzzles. Here I have a very simple puzzle that takes one variable, x, which is a number, so I have to declare the type of the input. It's pretty trivial, right? If x is defined, it returns true; otherwise it returns false. So I can click on run here, and if I have an internet connection, I'll get the result. Yes, I have. Here you can see that Unite came up with two input values. First the input was x equals zero and the output of the function was false. And then x equals one and the output of the function was true.
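
The first puzzle is small enough to mimic the record, flip, and rerun loop by hand. The sketch below is only an illustration under loud assumptions: the `trace` hook, `runOnce`, and `explore` are hypothetical names, not Unite's API, the puzzle body is a guess at what was on screen, and a brute-force search over a few candidate numbers stands in for the constraint solver.

```javascript
// Toy illustration only. A real tool hooks the interpreter and calls a solver;
// here the function under test reports its branches through a `trace` callback.
function puzzle(x, trace) {
  if (trace('x is truthy', Boolean(x))) {
    return true;
  }
  return false;
}

// Runs fn once and returns its output plus the branch outcomes it took (the path).
function runOnce(fn, input) {
  const path = [];
  const trace = (label, outcome) => {
    path.push({ label, outcome });
    return outcome;
  };
  const output = fn(input, trace);
  return { input, output, path };
}

// Encodes a path as a string such as "TF" (true branch, then false branch).
function pathKey(path) {
  return path.map((b) => (b.outcome ? 'T' : 'F')).join('');
}

// Explores paths: after each run, flip every branch in turn and look for an
// input that reaches the other side. `candidates` is the stand-in "solver".
function explore(fn, seed, candidates) {
  const results = [];
  const seen = new Set();
  const queue = [seed];
  while (queue.length > 0) {
    const run = runOnce(fn, queue.shift());
    const key = pathKey(run.path);
    if (seen.has(key)) continue;
    seen.add(key);
    results.push(run);
    for (let i = 0; i < run.path.length; i++) {
      const wanted = key.slice(0, i) + (run.path[i].outcome ? 'F' : 'T');
      const next = candidates.find((c) =>
        pathKey(runOnce(fn, c).path).startsWith(wanted)
      );
      if (next !== undefined) queue.push(next);
    }
  }
  return results;
}

const runs = explore(puzzle, 0, [0, 1, 2]);
for (const r of runs) {
  console.log(`x = ${r.input} -> ${r.output}`);
}
```

Starting from the seed 0, the loop finds one input per side of the branch, which is the same shape of result as the two rows shown in the demo.
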
Now if I want to test my example about validating an email, I can do it very easily. So first I had a trim on the email, so that I remove the white space. And then I said, okay, if my email matches my regular expression, which says: I want to have at least one character, then an at symbol, then another character, then a dot, and then again at least one character, right? It's the first time I type a regular expression live, so I hope it's gonna work. Okay, so I have my validate email function, and now I'm gonna change my puzzle to call validate email. And now the input is not a number anymore, it's a string. Right? Now if I run... okay, you're still tweeting and downloading. The people at the workshop yesterday can attest that this actually works. Okay, so I'm gonna try to run it again. Okay, no internet connection.

So let me just explain what this was supposed to do. It was supposed to show the same as before, but instead of x equals zero, it would have shown x equals "a", a normal string. And then x equals an email address that was composed of whitespace characters: backslash t, backslash n, an at symbol, spaces, a dot, and some other spaces, right? And that would actually have been a syntactically valid email address as far as my regular expression over here tells me. But it's obviously not one that is valid in reality. So using a tool like this, you can figure out an input for the function or the unit you wanna test that is probably different from the one you would have thought of when writing a regular test case. And that is actually very interesting, because such an email address, such a wrong input, could be the starting point of an attack on a piece of code, right? Because if you can start inputting some weird characters in a text field, for instance, God knows what you can do with it as far as code injection goes and that kind of stuff.
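
To see why a whitespace-heavy string can pass, here is a reconstruction of the validator from the spoken description. The exact source typed in the demo is not in the transcript, so the regular expression, the stricter `\w` variant, and the sample input are assumptions made for illustration.

```javascript
// Reconstructed from the description: at least one character, "@",
// at least one character, a dot, at least one character.
function validateEmail(email) {
  email = email.trim();
  return /.+@.+\..+/.test(email);
}

// One possible tightening (an assumption): only word characters around "@" and ".".
function validateEmailStrict(email) {
  email = email.trim();
  return /\w+@\w+\.\w+/.test(email);
}

// Tabs inside the string survive trim(), and "." in a regex matches a tab.
const weird = ' a\t@\t.\tb ';

console.log(validateEmail('a'));                        // no "@" at all
console.log(validateEmail(weird));                      // accepted by the loose pattern
console.log(validateEmailStrict(weird));                // rejected by the strict one
console.log(validateEmailStrict('alice@example.com'));  // ordinary address
```

The loose pattern accepts `weird` because `trim()` only removes whitespace at the two ends, while the `.` wildcard happily matches the tabs left in the middle.
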
So it's pretty disappointing that it doesn't work. Oh, yes, it works. Let's put it down to the fact that it went to Ireland, to Amazon, and came back to India, so that's why it took some time. So here you can see backslash t, at, backslash t, dot, backslash t, backslash t. From there, you can figure out, for instance, that the email.trim() is actually completely useless, because you have lots of spaces. And that instead of dot, you should probably put backslash w, because you only want word characters. So backslash w plus, backslash dot. Right, so now if I fix that, then I rerun. We have time, right? Anyway, it should work.

So another thing that we can do with this, which I'm not gonna demo but I encourage you to try, is compare two implementations. That's what I was saying before. Let's say you're doing refactoring, because your code is never as nice as it should be, so you're improving it. And as part of doing this, you're writing a new function that's supposed to be functionally equivalent to the old one, just nicer. Here I have an example with string substr and substring. In JavaScript, you actually have two native methods: one is substr and the other one is substring. It's a cool example for refactoring, because this is the kind of mistake you can make very easily: they sound the same, they're supposed to do the same, and they mostly do the same, except for some parameters. So if I run that in Unite and the other example finishes... no, okay, no hope that this works, but you can try it for yourself. The address is Unite.bugbuster.com. What this will show you is that with this set of parameters, there are actually inputs to the substr function that produce a different result than substring. And you can use that with more complicated pieces of code. We have another example here.
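
The substr versus substring comparison can be tried without the tool. The sketch below is an assumption-laden stand-in: it enumerates a small range of arguments by brute force instead of solving constraints, and `findDifferences`, `viaSubstr`, and `viaSubstring` are hypothetical helper names.

```javascript
// substr(start, length) and substring(start, end) read their second
// argument differently, so they only agree for some argument pairs.
function viaSubstr(str, a, b) {
  return str.substr(a, b);
}

function viaSubstring(str, a, b) {
  return str.substring(a, b);
}

// Calls both functions on every (a, b) pair from `range` and collects
// the pairs where the results differ.
function findDifferences(f, g, str, range) {
  const diffs = [];
  for (const a of range) {
    for (const b of range) {
      const left = f(str, a, b);
      const right = g(str, a, b);
      if (left !== right) {
        diffs.push({ a, b, left, right });
      }
    }
  }
  return diffs;
}

const diffs = findDifferences(viaSubstr, viaSubstring, 'abcd', [-1, 0, 1, 2, 3]);
for (const d of diffs) {
  console.log(`(${d.a}, ${d.b}): substr -> "${d.left}", substring -> "${d.right}"`);
}
```

With a start index of 0 the two methods agree on this range, which is why a refactoring that swaps one for the other can survive a handful of hand-written tests; the differences only show up once the start index is non-zero or negative.
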
That other example uses some tag-parsing code: two functions that are equivalent, where the tool shows all the inputs for which the two functions are equivalent. So Unite is, as I said, a toy, a playground. We wanna get people used to the idea of dynamic analysis, the idea of generating inputs for your tests, and to help you get familiar with it, we also built a few challenges. So you can go on Unite, maybe not now because it doesn't work, and you can try to guess what a secret function does. It's the same idea as the compare mode, but you have to guess the implementation of the function. All right, that's it for the demo, which was a great success.

So beyond Unite, our main product is actually called Bugbuster, and what it does is integration testing. At the core you have the same idea with symbolic execution, but on top of that, we also detect the UI elements on your web page. We trigger events so that we can trigger the JavaScript code. And then when you have text values, we use symbolic execution to figure them out. So this is a way for you to explore your application as much as possible and see if your application behaves in odd ways. You also have a cool API that is timing insensitive, so if you compare that with Selenium, you're gonna write much less code.

I think I still have some time. So now I have a few slides explaining how this is all implemented, and what the challenges and the limitations are. As you can imagine, there are a few. The first one is how to go about tracing constraints. I showed you this thing where, when you execute the code, you analyze everything that is being done; that actually requires instrumenting every single statement of your JavaScript code. There are two ways to implement that.
One is you can use a JavaScript parser and insert hooks in the JavaScript code, so that you spy on everything that's being done. That's one way. It's difficult, because then you're also gonna have to reimplement a JavaScript virtual machine embedded in your JavaScript code. It's very tricky. Another way, which is the way we implemented our solution, is to take a JavaScript engine like Rhino or JavaScriptCore. In our case, we took JavaScriptCore from WebKit. JavaScriptCore still has a version that is a pure JavaScript interpreter, so you have all the code that executes every single statement of JavaScript. You can modify that so that you can spy on the values, understand how your code works, and trace the constraints. One thing that you're gonna get: I think if the JavaScriptCore developers were to look at what we did with JavaScriptCore, they'd probably start crying, because we have a slowdown of one to two orders of magnitude, because obviously we're messing with everything. So it's gonna be slow, but it's not a big deal, because this is a tool that's meant to run by itself. The modified JavaScriptCore interpreter is not gonna be used live by users, so the slowdown is not an issue.

The other issue is how to solve the constraints. Again, with this example with the email address, I have to somehow be able to solve that equation, and it does not really look like an equation, right? It's code. This is actually the most difficult part of the whole thing. In order to do that, you can use SMT solvers. SMT stands for Satisfiability Modulo Theories. It's a set of theories that map real-world problems like this one to logic formulas that can be solved. This is fairly challenging, and it is the main reason why symbolic execution has not been used so much until now, because this is really very hard. There are lots of smart people in research who are improving these things nowadays.
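
To make the tracing step concrete, here is a minimal sketch of the hook-based approach, not of the modified JavaScriptCore interpreter the talk describes. Everything in it is an assumption for illustration: the `Sym` wrapper, the `branch` hook, and the textual constraint format are hypothetical, and the final step of handing the negated constraint to an SMT solver is left out.

```javascript
// A value that carries both its concrete content and a symbolic
// expression describing how it was derived from the input.
class Sym {
  constructor(concrete, expr) {
    this.concrete = concrete;
    this.expr = expr;
  }
  trim() {
    return new Sym(this.concrete.trim(), `trim(${this.expr})`);
  }
  matches(regex) {
    return new Sym(regex.test(this.concrete), `matches(${this.expr}, ${regex})`);
  }
}

// Constraints collected along the path taken by the current run.
const pathCondition = [];

// Hook placed at every branch: records which side was taken and why.
function branch(cond) {
  pathCondition.push(cond.concrete ? cond.expr : `not ${cond.expr}`);
  return cond.concrete;
}

// The email validator, instrumented by hand with the hooks above.
function validateEmail(email) {
  email = email.trim();
  if (branch(email.matches(/.+@.+\..+/))) {
    return true;
  }
  return false;
}

const result = validateEmail(new Sym('a', 'email'));
console.log(result);
console.log(pathCondition);
```

Running with the input "a" takes the false side of the branch, so the recorded condition is the negated match on the trimmed input. Dropping the `not` gives the formula a solver would need to satisfy in order to produce an input for the other path.
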
In particular for JavaScript, you're gonna have specific issues. You have fairly interesting string-manipulating functions in JavaScript. We have regular expressions. These are very challenging, because you can express very complicated things with them, and then solving that is difficult. So some constraints cannot be solved. If you try it in Unite, you will see that some things can be solved in less than a second, and others will just time out. That's the reason: there are things that we cannot necessarily solve. But in general, for most types of programs, we can do a good job and we can progress well through the program.

Last problem: the path explosion problem. This is a problem that you're facing every day when you're trying to write test cases. It's the fact that there is most likely an unlimited number of paths in your program. You're writing if statements, loops, and things all over the place, so it's likely that your program can actually work in many different ways. This problem affects symbolic execution, because what we do is take this huge tree and try to follow one path at a time. So in a big program it's gonna take a long time. But the good news is that it's gonna be much faster than you doing it manually. You can explore thousands and thousands of paths with these techniques, whereas if you were to write test cases entirely manually, you'd probably only do a few dozen, if you're very, very motivated.

Okay, that's it. I'm getting to the end of the talk. I hope I was able to give you an overview of what dynamic analysis and symbolic execution can do for you. I really encourage you to start looking into these techniques. We strongly believe that this is the future. This is what's gonna help developers build all these fancy applications for the web, including Firefox OS and all these very nice application platforms. It's not something that you should think of as a replacement for manual testing or unit testing or static analysis.
It's really a very nice complement. So you can, for instance, use JSHint so that you get your warnings about your code, and then use Unite or Bugbuster to generate more test cases and complete the test suites that you would write manually. We currently have two products. One is Unite, which is this playground that I just showed you, and our main product is Bugbuster, which is currently in beta. So it's easy to use for everyone. So that's it. Thank you for your attention, and I'd be happy to take your questions.

Awesome. So we have plenty of time for questions.

Thanks for the talk. Is there any API which we can use from our application?

Yes, but it's not documented and public yet. It's coming. Both for Unite and Bugbuster, we're gonna have public REST APIs, so that you can use it all from the command line or build plugins for your IDEs and stuff. For Bugbuster we have, for instance, a Jenkins plugin that is pending and that uses the API, so that can be used as an example.

My second question is about the validate email example you showed us. It was generating just a couple of test cases. So how can we trust this system to be more exhaustive than a test suite that is prepared manually or through some other system?

So you'll see that it will do a lot of things, so it's gonna be obvious that it does more testing. Now, there is no way for us to tell you that we tested everything. The main reason is that in a program that has loops and stuff, it's very difficult to compute the number of paths in the program. So what we're doing is showing coverage, coverage of the statements. That gives you an idea of... where is it? Let's see if it works better now. Yeah, so with these two things, if I highlight here, I see the coverage. Can you see the green that changes, right? So this gives you an idea of how good the tests were. You should definitely use coverage.
I mean, not just with Unite; in general, that's, I guess, a comment that applies to everything. If you're doing testing, you need to measure the coverage. Otherwise you're in the dark; you have no idea if you've tested 1% or 99% of your code.

The mic is moving.

So to your point around coverage, doesn't the tool automatically give you all the paths, because it's able to get to the paths? So why is there a limitation on it not reaching all the paths? I'm still not able to get it.

It's just a matter of time.

Okay, so if you expanded the time horizon, will it give you all of them?

Yes, unless you have expressions that our solver cannot understand. There are things, for instance, that are extremely difficult. I can give you an example. Here I have a regular expression with a match, and it usually works just fine. If I have a regular expression with a replace, that's probably gonna blow up our solver. So there are limitations, but when you have constraints that our solver can understand, and if you leave it enough time, then in principle it can explore all paths.

And can this tool also be used for REST-based APIs and services?

No, right now it's pure JavaScript.

It's pure JavaScript.

Yes. So if your JavaScript interacts with the REST API, then, well, that will work. But typically, if you have input that is processed on the backend, that we don't see; it's only front-end code. Or, I mean, you can paste backend code in here, but not if you talk to a remote component that we don't see.

Hi. Excuse me. Where are we? Okay. Yeah, see, there can be cases of infinite recursion and all, right?

Yes.

So how do you deal with that?

Timeouts. If you run for too long, we just stop. But I mean, if you run for too long, you're in trouble, right? If you write a manual test that runs forever because you have an infinite recursion, that's a bug. So if you have a timeout, that's an indication that something's wrong.

Okay, one more thing.
The previous example that you showed, there was only one path that was tracked. There were two test cases, and both test cases returned false. So how do we get to know how much of the code is actually covered?

So that's what I showed before. You can see, for each path, what the covered code is, right?

Yeah, but that gives you the individual coverage of one test case, right?

Sorry?

That gives you the coverage of one test case.

Of your code, yes.

So is there any way to track down which particular lines of code were not covered at all?

In the web interface that is there on Unite, there is not. On Bugbuster, the main product, it actually runs and explores for a long time, and then you see the aggregate coverage on a file. So if you have a JavaScript library that has 1,000 lines, and it's been running and triggering hundreds of events that were using that library, then you will see the aggregate coverage. And so then you will see the lines that are not covered. And then you can decide whether those are things you should test in another way, or whether this is actually maybe dead code.

Oh, we have a question upstairs?

Yeah. So in addition to doing that, is there any way for me to, or does it automatically give garbage values and try to check those? So, exception cases and all that. For example, in this particular case you have one valid email, but emails can be of multiple formats, right? So would it actually look at your regular expression and try different versions of the valid input?

No, but we have a million ideas for doing that kind of thing. So far we've been focusing a lot on the exploration part of the code, and we report the bugs that pop up in the form of exceptions. And in Bugbuster, we also report all sorts of browser issues.
But yes, over time, we'd like to add more dynamic analysis checks that can look at various aspects of your code and give you hints or tips about what's going on and what should maybe be done differently.

One last question from Appian. Can you use this coverage information, or the instrumented code, to figure out dead parts in the code and hint to the programmers that these parts of the app are probably not being used anymore? Do you think that's also possible?

You can use that to get hints about what parts of your code are dead. Because if you run for a long time and it cannot figure out inputs, and you as a developer cannot figure out inputs to reach these parts either, then chances are that this is dead code. But it cannot tell you for sure that this is dead code.

Yeah, it doesn't have to. It can just hint at it.

Well, the hint is: you've been generating tons of tests and this code still hasn't been touched. That is the hint. It's difficult to go beyond that for sure.

Oh, that's new.

So let's say there are two functions, function one and function two, and they are modifying the same state. And the order switches sometimes, and that order has a bug because of Ajax or some async execution. Is it possible for you to figure out that in the normal case function one is always called before function two, but in situations when function two gets called first, we'll have issues? Can you find those issues?

So there are actually quite a few papers that have been published that look at timing issues. What they do is try to figure out the timing dependencies between the various parts of the code, and then influence these timing dependencies to trigger bugs. We don't do that. This is something we could potentially do someday. If there are a hundred of you who want to help us do that, please come and join us. There is quite a bit of work there.
But yes, these are very interesting things. So this here is just for unit testing. In Bugbuster, we run the full browser thing, and yes, there are lots of Ajax requests going on. We've had cases, especially with one customer, where there was a bug that was due to timing issues, and the bug would pop up with Bugbuster sometimes and sometimes it wouldn't, because we don't control the timing. So then we have the same issue as a normal user or tester: if the latency of the network changes, then you get a different result. But yes, this is a very interesting direction for future work.

Okay, the puzzle function, it'll take primitive types only, is it?

Yes, yes.

Okay, because in many cases our functions might take an actual object. So how do you test those cases?

So this is a very simplified API. Basically, this is fuzzing values. It's actually fuzzing in a smart way, because it's trying to solve equations to find the values. We can fuzz only primitive types, but we can go into an object to tell the tool that this part of the object, which is a primitive type, should be fuzzed. This is not exposed in this tool yet, but it's actually there.

Okay. Cool, all right. Thank you, Olivier. Round of applause, guys.

Thank you.

That's awesome.