My name's Howard Diner. I come from the United States, if you didn't guess. I live right outside of New York. My offices are in Redmond, near Seattle, right down the road from Microsoft. We also have a great office here in Bengaluru, and welcome. So I'm going to talk this afternoon about a particular tool. And I wonder, by a show of hands, how many people know about mutation testing? And a tool called PIT? OK, let's get started with this, because this is going to be pretty brief. I want to start by talking about this fellow. His name is Dijkstra. He was a professor, a big figure back in the early days. He made a really great comment back in 1988, and what he was talking about was the complexity of development. Back then, he was saying that programmers have to span everything from bits to megabytes, from microseconds to half an hour. He was talking about a complexity of about 1 to 10^9. Now, I've done a little extrapolation, and since computers are about 1,000 times more powerful than they were then, we're up around 10^12, which is roughly the number of neurons in your brain, OK? It'll come up, OK. I'm going to keep going, although it would be nice to have the projector at this point. There's a cost to the defects that are out there. There was a study done in 2013 by Cambridge University. They put the cost of fixing defects at about $312 billion, with a B, annually. They also said that developers spend approximately 50% of their time finding and fixing bugs. I looked up that number, $312 billion. In terms of gross domestic product (GDP), that would put software defects ahead of the total GDP of countries like Israel, Hong Kong, Egypt, and the Philippines. We're talking huge numbers. And the problem is that that $312 billion doesn't even include the biggest cost of defects, right? It's not the direct remediation of those defects; there are much larger problems, like reputation loss. So I'm going to play a little video.
I don't know if anybody has ever seen this out here. It was made by IBM probably about 15 years ago, and it talks about how we fix defects. "Sir Arthur, a giant sloth has attacked our service hub at Slough, bringing great unrest to the customer base. Consultant Ned, what do you advise?" "You must build a giant catapult to fling the greatest of projectiles, sir, at the sloth." "What kind of projectile? Are you suggesting we throw money at the problem?" "Precisely." And thus Sir Arthur saw the need for IBM consultants who actually are accountable for results. Well, aside from the plug for IBM, that's usually the way that we approach problems. We throw money at it. We throw people at it. And wouldn't it be great if we could save all of that cost? So how do we go about fixing all this stuff? Well, the first thing is that when we develop software, we call ourselves software engineers, right? That's the title everybody usually gives themselves. The problem is that the stuff we design is, well, kind of better than nothing, or better than the stuff we had before, right? But if we were structural engineers and we built a bridge and the bridge fell down, people wouldn't think very much of us. Or how about the person who designs the avionics on a plane, and you're at 35,000 feet and cruising, and all of a sudden all the avionics go out, okay? Would the engineer who designed that panel just say, oh well, it happens? If we want to mature as an industry, we've got to figure out better ways to deal with this stuff and adopt a zero-tolerance policy. So one of the ways to do that, of course, is to find defects early, right? Even large defects, if we can fix them quickly enough, the costs are usually really low, right? There aren't all those kinds of indirect costs, and we don't have people who are really upset at us, right? Little defects, though, that escape into the wild can be company killers, okay? We have to avoid that kind of stuff.
So it really comes down to turning testing, turning development, on its head, all right? We used to have a pyramid, and at the bottom of the pyramid we'd have things like analysis and design. We'd work our way up and do some coding, then we'd do some testing. And as everybody knows, the right way to do things nowadays is to start with tests, write code to make the tests pass, and then refactor, which is actually analysis and design, right? So that's extreme programming, okay? And when we get to really extreme programming with true unit tests, truly decoupled, right, we can run thousands and thousands of tests in a second. We can start to look at how we can get better at writing better code. However, test-driven development is only necessary; it's not sufficient for quality in code, okay? And the problem is that there are four things that keep us from using TDD to its fullest all the time, right? The first is, I like to say that TDD is like being on a diet. You know it's good for you. You know you have to do it, but wow, it's so easy not to, right? You also have managers out there pushing people, saying, look, just get it done, just write it. And then they do what I call code and pray, right? They put the code out there and pray nothing happens. It's bad, right? It's also really hard to do test-driven development when you're talking about dependencies, the evil dependencies, and figuring out how we can mock and fake our way around that stuff can be difficult at times. Finally, there's the problem of legacy code, right? Does anybody in this room not have problems with legacy code in their lives, right? The problem with legacy code is you can't just go and put new unit tests inside of it, okay? It takes so much refactoring that many times it's more expensive to take the legacy code and go back and unit test it than it is to write it again. And the problem, of course, is that that's a risky proposition. You've got code that kind of works, right?
Better than nothing, so maybe we'll just continue with it. So many managers like to hedge their bets. They'll say, you know what, do some unit tests, get me test coverage that's 100%. Then they feel kind of confident that they can go forward and not have problems in production. However, that's misplaced confidence. We'll talk about why in a second, all right? But if test coverage isn't the right way to tell that you have enough unit tests to give you real confidence, what is? So I want you to imagine that you have your code written, you have real unit tests going, you run your favorite coverage tool and it shows 100%, and then somebody leans over your shoulder and says to you, okay, this is great, you've got great code. Would you mind if I messed with it a little bit, right? I'm not gonna tell you what I did, but I'm gonna take some of your conditionals and maybe reverse them. Or I'll take a random void method call that you have and take it out, right? And if your tests all still pass, right, you don't have some kind of unit test failing, then there's something wrong with your unit tests. They didn't catch the fact that I messed with your code and you got different answers. That's mutation testing, okay? What we wanna do is scale that up. Instead of having one person leaning over your code, we want an army of mutants to come in and mess with it all over the place, okay? So let me first talk about test coverage, and how test coverage is really, in my opinion, a management report. To do that, we're gonna see a little code. The code we're gonna write is something I call bar world. And bar world is gonna take people that come into a bar, and they have a cover charge, right? You know, you have to pay so much money to enter. Okay, wanna make sure you got that, you never know. And there'll be three categories of people. There will be women, and there'll be two categories of women. Women over 35, well, they pay one price.
Women under that, well, over 21 at least, I think they get in for free. And men, they gotta pay $30 a person. Let's see this code, right? I'm first gonna define an interface for a person. The only thing that comes out of that interface is something that gets a cover charge. I'm gonna then implement this, okay, and define a person factory, a factory method for a person, right? It's gonna look at their gender and their age. If they're male, big M, right, then we return a male person. Otherwise, we return either a regular female person or a young female person. If none of that happens, then we tell them, send them home. Okay? You come in: the female person class implements our person interface and extends the person factory, right? We call super, and a regular woman pays 10 bucks. Not a bad deal. If you happen to be younger, right, and you're over 21, I guess, you don't pay anything. And guys, $30, okay, belly up. Okay, so that's our stuff. I'm gonna then write some unit tests. This is really simple code, right? So I'm not doing test-driven development, but I am writing the unit tests for it. All right, I test my rules, make sure that all my rules fire: a man who's 35, he's gotta pay $30; $10 for a woman who's 37; nothing for a woman who's 22. I test that the exception gets thrown for a woman who's trying to enter at 18. No, you gotta go home, right? And the coverage is 100%, pretty good. So now we take this and we give it to our DevOps people; "DevOps" is probably the word for today, right? I'm gonna simulate what would happen, okay, with this particular main, okay, where I'm gonna have a male who's 45, that should be 30 bucks. A female who's 37, that should be 10. A female who's 33, that should be 10. A male who's 30, that should be 30. Two young women come in; we don't charge anything for them. We then iterate over that collection that we have. We count the number of men and women, and the amount that we collected. We print it all out at the end. Looks pretty reasonable, right? Well, nope, train wreck.
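The bar-world source itself isn't reproduced in this transcript, so here's a minimal sketch of the kind of thing being described. The class names, the age thresholds, and especially the specific bug are my assumptions for illustration, not the talk's exact code. The idea: the factory matches only an uppercase "M", the unit tests all happen to use uppercase "M" too, so the tests pass with 100% coverage, but the production driver passes a lowercase "m" and every man silently falls through to the female pricing.

```java
// A minimal sketch of a bar-world-style demo. All names, thresholds, and
// the specific bug below are assumptions for illustration, not the talk's
// actual source. The factory matches only uppercase "M", so the lowercase
// "m" used by the driver sends every man down the female branches.
interface Person {
    int getCoverCharge();
}

class PersonMale implements Person {
    public int getCoverCharge() { return 30; }   // guys pay $30
}

class PersonFemale implements Person {
    public int getCoverCharge() { return 10; }   // regular women pay $10
}

class PersonFemaleYoung implements Person {
    public int getCoverCharge() { return 0; }    // young women get in free
}

class PersonFactory {
    static Person create(String gender, int age) {
        if (gender.equals("M")) {                // bug: never matches "m"
            return new PersonMale();
        } else if (age >= 35) {
            return new PersonFemale();
        } else if (age >= 21) {
            return new PersonFemaleYoung();
        }
        throw new IllegalArgumentException("Sorry, you have to go home");
    }
}

public class BarWorld {
    // The "DevOps" driver: the men arrive encoded as lowercase "m".
    public static int totalCollected() {
        String[] genders = { "m", "F", "F", "m", "F", "F" };
        int[]    ages    = {  45,  37,  33,  30,  22,  23 };
        int total = 0;
        for (int i = 0; i < genders.length; i++) {
            total += PersonFactory.create(genders[i], ages[i]).getCoverCharge();
        }
        return total;   // correct gender handling would give 70; this returns 20
    }

    public static void main(String[] args) {
        System.out.println("Collected: $" + totalCollected());
    }
}
```

The unit tests only ever probe uppercase genders, and that untested behavior is exactly the kind of gap mutation testing is designed to surface.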
Okay, when we run this code, it comes back and tells us that we don't have any men, that we have six women, and that we collected $30. And with 100% coverage, for God's sake. What went wrong? Yeah, okay, you got it, okay. And you're not even a mutant, okay? But, like I say, it's very easy to make a mistake. Okay, so we're gonna look and get into that. What I do wanna say, though, is that test coverage is not the same as quality. And I think it has to do with how we get mesmerized by that particular metric, right? It's very simple to capture, right? It's not a bad one, okay? But test coverage only tells us which code we didn't test at all. We happened to run through every statement in our unit tests, 100% coverage, right? But that's kind of a shallow compliment when it comes to quality. And it's easy to game, right? If you wanna impress your manager and they demand that you have 100% coverage, go into JUnit, write a big old integration test that exercises every kind of feature you can think of, and at the end say assertTrue(true). You run the thing, boom, high coverage rates, right? Metrics are supposed to start a conversation. They're not supposed to be everything. And when we stare at those metrics in the big fancy dashboard, right, we get overconfident that things are all fine, okay? How many tests do we need? Well, if you look at Martin Fowler, he'll tell you, if you rarely get bugs in production, you're fine. He expects something like 80 or 90%. Here are some of the mutants that you'll find. I'm not gonna go through these in depth. All right, there's a whole bunch of them, okay? There's stuff that changes boundaries on conditionals, negates conditionals, removes them completely; stuff that takes out the return of an object and returns a null instead, all right; stuff that takes out a void call, okay, so there are side effects going on, all right; stuff that changes around iterators and incrementers. Here's the whole list. Great, let's see how that really works, though, okay?
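One of those mutators, the one that removes a void method call, is worth a tiny illustration. This is a sketch with assumed names, not code from the talk: charge() does its real work through a void call with a side effect, and a coverage-only test that checks just the return value executes every line. If a mutation engine deletes the recordSale() call, that weak test still passes, so the mutant survives even at 100% coverage.

```java
// A sketch (all names assumed) of a surviving "remove void call" mutant.
// charge() records the sale via a void call. A test that only checks the
// returned amount covers every line, yet passes even if the void call is
// mutated away; only an assertion on salesRecorded() kills that mutant.
import java.util.ArrayList;
import java.util.List;

public class Register {
    private final List<Integer> auditLog = new ArrayList<>();

    public int charge(int amount) {
        recordSale(amount);      // the mutant deletes this line
        return amount;
    }

    private void recordSale(int amount) {
        auditLog.add(amount);
    }

    public int salesRecorded() {
        return auditLog.size();
    }

    public static void main(String[] args) {
        Register r = new Register();
        // The coverage-only check: passes on the original and the mutant.
        System.out.println(r.charge(30) == 30);
        // The check that actually kills the mutant: false once the
        // void call is removed.
        System.out.println(r.salesRecorded() == 1);
    }
}
```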
So here's a demonstration on some code, okay; it happens to be Quicksort. Okay, let's work through a not-so-trivial example, a Quicksort. Found this code, and it's kind of nice code actually, at mycsutorials.com, and it's the traditional CS-101 variety of Quicksort, all right, put into Java. You know, we're gonna be sorting an array; this is gonna be an integer array, which doesn't matter for us right now. You'll see the standard sort of pivot points, the standard sort of recursion going on, the swap area over here. I wrote a test for it, okay? We're gonna sort the first nine digits of pi, which is kind of interesting. All right, so the first thing that we wanna do, of course, is make sure that our test runs. Do that, run this as a JUnit test, and voila, our test runs. Let's check the coverage of this code now, okay? So we're gonna use the standard Eclipse plugin for code coverage, and we're gonna run it through JUnit, so we can actually see what it's looking at. When I'm doing the coverage, I'm only measuring the source, all right, so let's go ahead and run that. Our tests still work, of course, and our coverage is 100%, so everything inside of our code has been tested. Now, let's run PIT against this and see if there's actually some code sitting out here which could cause us some problems in the future. Of course, those yellow areas are pretty suspect; let me do something about that. So we're gonna run this again as a PIT mutation test. I've installed the Pitclipse plugin for that. So as you're seeing, it's taking the bytecode that we got during our testing and running mutations against it. There are a lot of mutations that it doesn't have to test. It starts finding some and testing them out. That timeout will occur when a mutation takes place and it makes the thing run forever, so it cuts it off after a certain amount of time, which is, you know, configurable. It's running through this stuff.
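The demo's source isn't reproduced in this transcript, so here's a textbook CS-101 quicksort in the same spirit. This is my own reconstruction, not the mycsutorials.com file from the talk, along with the pi-digits check the talk uses as its test.

```java
// A reconstruction in the spirit of the demo: a textbook quicksort with
// the usual pivot, recursion, and swap sections, plus the talk's test of
// sorting the first nine digits of pi. Not the talk's exact source.
import java.util.Arrays;

public class QuickSort {
    public static void sort(int[] a) {
        quicksort(a, 0, a.length - 1);
    }

    private static void quicksort(int[] a, int low, int high) {
        if (low >= high) return;
        int pivot = a[low + (high - low) / 2];
        int i = low, j = high;
        while (i <= j) {
            // Conditional-boundary mutants (< to <=, > to >=) attack
            // exactly these comparisons.
            while (a[i] < pivot) i++;
            while (a[j] > pivot) j--;
            if (i <= j) {
                int tmp = a[i]; a[i] = a[j]; a[j] = tmp;   // swap
                i++; j--;
            }
        }
        quicksort(a, low, j);      // recurse on both halves
        quicksort(a, i, high);
    }

    public static void main(String[] args) {
        int[] pi = {3, 1, 4, 1, 5, 9, 2, 6, 5};
        sort(pi);
        System.out.println(Arrays.toString(pi)); // [1, 1, 2, 3, 4, 5, 5, 6, 9]
    }
}
```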
Normally, you know, naturally, this is taking a lot longer because of the output and because of all the code that's being tested. All right, so let's wait another couple of seconds for it; it's still running. And we finally come out the other end. We can actually see what's happened with this. Let's take a look at the summary first. Now, interestingly, all of our line coverage is 100%, which is the way PIT wants it to be. We only have one class. Okay, and inside that class, we only have one method. Now we start looking at the method. Even though the code had 100% line coverage, these lines in here were tested, they were run through, but not every condition on them was exercised, of course. So one of the things that we're seeing, right, is that when the conditional boundary on this while loop was changed, changed from a greater-than to a greater-than-or-equal, it timed out, right, it never could complete. That's not so terrible. We start seeing, though, other things inside here. Here, conditional boundaries were changed, and because this is fairly complex, we'd have to look and see which ones were changed. And we've got a problem. All right, there are some mutations that didn't change the output, meaning that our tests didn't catch everything. All right, so there were not enough tests involved. The code still may be right, but our tests were inconclusive. This was a different one, where a different conditional boundary was changed. All right, we get a list of all these things inside here, line numbers and everything that's going on inside of it. So as you can see, this code, which is really quite good code, right, there's really nothing particularly wrong with it, it's right by the book, gives us some pause. Naturally, this area in here, we're pretty sure that it works. It's been written a lot before, but don't we want to test it so that we're sure it's going to work correctly all the time? All right, we want to make sure our bridge doesn't collapse.
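That conditional-boundary mutator deserves a small standalone illustration. This is a sketch with assumed names, not the quicksort itself: the mutator turns a greater-than-or-equal into a strict greater-than, and a test suite that only probes values away from the boundary passes on both the original and the mutant. Only a test at exactly the boundary value tells them apart and kills the mutant.

```java
// A sketch (assumed names) of why boundary mutants survive. The
// conditional-boundary mutator turns >= into >. Tests at 37 and 22 behave
// identically on the original and the mutant; only a test at exactly 35
// distinguishes them.
public class CoverCharge {
    public static int priceFor(int age) {
        if (age >= 35) {        // mutant: age > 35
            return 10;
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(priceFor(37));  // 10 on original and on mutant
        System.out.println(priceFor(22));  // 0 on original and on mutant
        System.out.println(priceFor(35));  // 10 on original, 0 on mutant
    }
}
```

This is why 100% line coverage can coexist with surviving mutants: the line with the conditional runs in every test, but the boundary itself was never probed.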
Now, I did the same thing for JUnit itself. For the sake of staying in this time box, I'm going to skip over it. I took the current head of JUnit from GitHub, ran PIT against it, and there are a lot of things that are exposed, okay, but we're going to skip over that. Let me get to the conclusion here, okay? First of all, writing code is hard, right? You have to always challenge the hypothesis, it's science, right, that as we modify the code, everything still works and our assumptions are still true, right? Unit testing is really the tool of choice for writing high-quality code, but it's hard, like being in a sweet shop when you're on a diet, or having a manager that wants us to code and pray, or having nasty legacy code, all right? We ultimately have to ask ourselves how many tests we really need. It's hard to answer, right? If our code never fails in production, there's a good chance we actually don't need more unit tests, okay? And that's actually not a bad thing to think about. But if our code hasn't been released yet and we want some indication that it's going to stay good, we might want to think about getting past just 100% coverage, because 100% coverage does not mean quality. It doesn't mean good code, right? It only shows us that there was no code that wasn't tested at all. To do that, you need that mutant army; you need a tool like PIT. PIT isn't really perfect, especially for non-trivial projects. It can take a lot of time to run. It presumes that you already have 100% coverage. It can give you false positives, right? And also, it'll tell you where the problem is, but it doesn't write the test that fixes it, okay? You know, I like to think of tests as the practice we put in before our code really gets run, right? It's like figuring out how to get to Carnegie Hall, right? You have to practice. Hey, do you know how to get to Carnegie Hall? Practice, man, practice. The old ones are the best ones. I want to thank you very much for your time, and I'll be around.
Are there any questions? We'll do answers, we'll debate, and then we'll depart. Find me on LinkedIn; I'm always happy to engage with people, and thank you very much. It's possible; I don't know of any tool that's done this stuff for JavaScript. But something like rapid application development, where we do testing for the line coverage, and for the branches, and then the functional? Right, I'm aware that inside of something like Jasmine, you can go through and get coverage data, and that's a step in the right direction, but it's not mutation testing. You could write a tool that does it, right? It's all the same; mutation testing actually isn't anything new, right? It was a doctoral thesis back in the 80s, okay? But it's just been too hard for people to go through and actually build the tools. So there's PIT for Java, which actually stands for Parallel Isolated Test, which is where it started, and then they said, hey, this is really cool for this other stuff, so they kept the name PIT. That exists, and there's some stuff for CLR languages that exists. Microsoft is actually big into something called Pex, right, which is actually going to not only do mutation testing but give you unit tests that then exercise that mutation. But for JavaScript, I know of nothing. Sorry. Perfect. Yes. Well, PIT works with Java bytecode, okay? And the reason they wanna work with bytecode is they don't wanna have to recompile everything. It's a very CPU-intensive thing. So they can look at the bytecode and figure out, this is a boundary, and what could we do about it. Same thing with the CLR and Pex and stuff. Right, but it brings you back into the source. Yeah. You know, let's go have some coffee.