Hi, everyone. My name is John. Today, I'm going to be talking to you about mutation testing. I'm the CTO at a small tech company in Palo Alto called Cognito. Sorry if there's a little flickering here. We're not sure what's going on, but all right. So before I get into it, I want to give you a quick outline of the talk. I'm going to give you an introduction to what mutation testing is. I'm going to show you how it can help you improve test coverage. Then I'm going to show you how it can teach you more about Ruby and the code that you rely on. I'm going to show you how it can be an x-ray for legacy code, how it can be a great tool for detecting dead code, how it can be probably the most thorough measure of test coverage, and how it can help simplify code. Then I'm going to wrap it up by talking about the practicality of mutation testing day-to-day and how you can incorporate it at your job. So before we talk about mutation coverage, we need to be on the same page about line coverage, or test coverage in general. Usually when we're talking about test coverage, we mean line coverage. Line coverage roughly means the number of lines of code run by your tests over the total lines of code in the project. There are different variations like branch coverage, but that's the gist of it. Mutation testing asks a different question. It says, how much of your code can I change without failing your tests? If you think about it, that makes sense. If I can remove a line of code or meaningfully modify a line of code in your project without breaking your tests, then something is probably wrong. You're either missing a test or that's dead code. So before we actually dive into how to do mutation testing automatically, with a tool, I want to give you a good intuition for what mutation testing is by doing it by hand. So I've got some sample code here. You can take a second to read it over. So I've got this class called Glutton's at the top.
I just initialize it with a Twitter client. Then I do a search on the Twitter API using that client. And then I get the first two results, grab the author from each, and return that. That's basically what the test specifies down here. It's got a fake client and then some fake tweets. On the left here, I've got the same code, but in Sublime Text. And then on the right, I've got a script that's going to run whenever I modify the file. That script is going to output a diff of the original code against the current code in Sublime Text, as well as the result of running the tests. So first, I'm going to go in and modify the hashtag, and that does not fail the test. I can also remove the search string entirely, and that doesn't fail it. And I can actually call it with zero arguments, and that also does not fail the test. If I change first(2) to first(1), that does fail the test, which is good. But if I change it to first(3), that does not fail the test. All right. So going over those again, I can basically change the input to the search method however I want. I can remove the hashtag, remove the entire search string, call it with a different number of arguments. It doesn't matter. If I change first(2) to first(1), that does fail it. That's because the test expects both of the two fake tweets that our fake client returns. But if I change it to first(3), then that does not fail the test. That's because we only have two fake tweets in our test. So that's manual mutation testing. You can imagine that doing that day to day at your job would be pretty tedious. This is just one method, but if we're adding 100 lines of code, trying to do this for every single part of the code that we're adding would be a lot of wasted time. And it's also going to be pretty hard to outsmart yourself. If you just did the best job you can writing this code and writing the tests for it, then it's going to be hard to come up with things you didn't think of 30 seconds later.
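To make the manual run concrete, here is a minimal sketch of the kind of class and fake client described above. The slide code is not in the transcript, so the class name TweetAuthors, the method name recent, the hashtag, and the shape of the tweets are all assumptions made for illustration. Each subclass is one hand-written mutant, and the script reports which of them the original expectation fails to catch.

```ruby
# Hypothetical reconstruction: names and hashtag are assumptions,
# not the code shown on the speaker's slide.
class TweetAuthors
  def initialize(client)
    @client = client
  end

  # Search, keep the first two results, and return their authors.
  def recent
    @client.search('#rubyconf').first(2).map { |tweet| tweet.fetch(:author) }
  end
end

# Fake client in the spirit of the talk's test: it ignores whatever
# it is called with and always returns the same two fake tweets.
class FakeClient
  def search(*_args)
    [{ author: 'alice' }, { author: 'bob' }]
  end
end

# Hand-written mutants, each changing one detail of #recent.
class NoHashtag < TweetAuthors
  def recent
    @client.search('rubyconf').first(2).map { |tweet| tweet.fetch(:author) }
  end
end

class NoArguments < TweetAuthors
  def recent
    @client.search.first(2).map { |tweet| tweet.fetch(:author) }
  end
end

class FirstOne < TweetAuthors
  def recent
    @client.search('#rubyconf').first(1).map { |tweet| tweet.fetch(:author) }
  end
end

class FirstThree < TweetAuthors
  def recent
    @client.search('#rubyconf').first(3).map { |tweet| tweet.fetch(:author) }
  end
end

EXPECTED_AUTHORS = %w[alice bob].freeze

# A mutant "survives" when the test expectation still passes against it.
SURVIVORS = [NoHashtag, NoArguments, FirstOne, FirstThree].select do |mutant|
  mutant.new(FakeClient.new).recent == EXPECTED_AUTHORS
end

puts "Surviving mutants: #{SURVIVORS.map(&:name).join(', ')}"
```

Only the first(1) mutant is caught here. The others survive because the fake client never looks at its arguments and only ever has two tweets to return.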
All right, now I'm going to show you how to do mutation testing with an automated tool. The main tool for this is called Mutant. It's been around for years. I learned about it about two years ago, and it's how I got into mutation testing. I've since become a large contributor to the project. A friend and I also recently started a fork of this project called Mutest. The two are pretty similar right now, and you'll notice throughout the presentation I'll probably refer to them interchangeably, but you can use either one. All right, so in this example here, I'm invoking the Mutant command line program and passing in a flag saying to use the RSpec integration. Then I'm telling it to mutate the class that we just saw. There's going to be a lot of noise in this output, so don't worry about it. We'll go over all the results again. Each diff here is a mutation that it found while running my tests. So it found some things that we also found during our manual mutation testing run. It can remove the entire argument and it doesn't matter. It also pointed out that we can pass in a different type of variable to the search. We can also pass in nil, which is interesting. Something that we didn't catch in our manual run is that it's going to change first(2) to last(2). And if this is the recent method that finds the most recent tweets, then this is probably a pretty bad change. If we care about finding the most recent tweets, then we probably don't want to return the oldest ones. It can also remove the first(2) call entirely, which is interesting. We probably want to specify that, because if we shipped this code to production without that call in there, you could see how we could quickly exhaust our API token and rate limit ourselves. So in this case, our mutation testing tool shows us how to improve the test. We give it three fake tweets instead of two, and we also explicitly specify the search that we expect to perform.
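The improved test described above can be sketched roughly like this. As before, TweetAuthors, recent, the hashtag, and the RecordingClient helper are hypothetical names chosen for illustration, not the speaker's code. The fake client now returns three tweets and records the query it was given, so the test can assert on both.

```ruby
# Hypothetical class under test (same assumptions as the talk's example).
class TweetAuthors
  def initialize(client)
    @client = client
  end

  def recent
    @client.search('#rubyconf').first(2).map { |tweet| tweet.fetch(:author) }
  end
end

# Stricter fake: requires exactly one argument, remembers it, and
# returns three tweets so first(2), first(3), and last(2) all differ.
class RecordingClient
  attr_reader :queries

  def initialize
    @queries = []
  end

  def search(query)
    @queries << query
    [{ author: 'alice' }, { author: 'bob' }, { author: 'carol' }]
  end
end

CLIENT  = RecordingClient.new
AUTHORS = TweetAuthors.new(CLIENT).recent

# Kills first(3), last(2), and removal of the first(2) call.
raise 'wrong authors' unless AUTHORS == %w[alice bob]

# Kills mutations of the search argument (different string, nil, no argument).
raise 'wrong search' unless CLIENT.queries == ['#rubyconf']

puts 'stricter expectations pass against the original implementation'
```

With three tweets in play, each of the mutations listed in the talk now produces a different result or a different recorded query, so none of them can slip through.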
So when we use Mutest, it's automated, it's quick, we don't have to think or expend much more effort, and it's probably going to be more clever than you are. Mutant has been accruing different mutations for years that all target very specific use cases and try to point out specific changes depending on the code that it's interacting with. So here's another example. In this case, I'd like you to imagine that you're working on an internal API. Here's some sample code. I'll give you a second to read it. Cool. So here we have the users controller, and we've got the show action. We're taking in the ID parameter, making sure that it's an integer, passing it to the user finder, and either rendering JSON for what we found or rendering an error, and that's pretty much what the tests specify. If we run this through our mutation testing tool, it's going to show us that we can replace the to_i method with the uppercase Integer method. That's actually pretty interesting. If you're not familiar with the difference, the to_i method will work on any string and on nil. If I don't have any digits in my string, it's still going to give me zero. If I call it on nil, it's going to give me zero. The Integer method is going to raise an error if I give it nil, and it's going to raise an error if it can't get a number out of the string. It's also going to change the hash [] method there to fetch, and the difference is that fetch is a little bit more strict about the presence of the key. In the original implementation, if the ID key was not there, this would silently return nil. Now, in this code, it's going to raise an error if that key isn't there. If we put those together, our tool is forcing us to write a slightly more strict implementation of this action. It's saying assert the presence of the key, and assert that the ID value is actually parsable as an integer. This has some interesting implications, too. We're modeling our problem a little bit better.
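The difference between the lenient and strict calls can be seen in plain Ruby. This is a small sketch using an ordinary Hash rather than Rails params, and the helper names strict_integer and strict_fetch are made up for the illustration.

```ruby
# String#to_i and NilClass#to_i never raise; they fall back to 0.
LENIENT_RESULTS = ['42'.to_i, 'abc'.to_i, nil.to_i]

# Kernel#Integer raises instead of guessing. This hypothetical helper
# returns the error class so the outcomes can be compared side by side.
def strict_integer(value)
  Integer(value)
rescue ArgumentError, TypeError => error
  error.class
end

STRICT_RESULTS = [strict_integer('42'), strict_integer('abc'), strict_integer(nil)]

# Hash#[] returns nil for a missing key, Hash#fetch raises KeyError.
def strict_fetch(hash, key)
  hash.fetch(key)
rescue KeyError => error
  error.class
end

PARAMS = { 'name' => 'alice' }.freeze # note: no 'id' key

LENIENT_LOOKUP = PARAMS['id']
STRICT_LOOKUP  = strict_fetch(PARAMS, 'id')

puts "to_i:    #{LENIENT_RESULTS.inspect}"
puts "Integer: #{STRICT_RESULTS.inspect}"
puts "[]:      #{LENIENT_LOOKUP.inspect}"
puts "fetch:   #{STRICT_LOOKUP.inspect}"
```

Chaining the lenient versions turns a missing ID into a lookup for user 0, while the strict versions stop at the first bad input.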
For example, before, if someone used the API incorrectly and did not pass in the ID key, then we would try to get the ID, we'd get nil, we'd coerce it to zero, pass that to the finder, and then return the user an error saying could not find the user with ID zero. So this is a better-fitted implementation for this problem, and we're also being forced to think about things like doing validation ahead of time instead of performing extraneous database queries. Here's another small example. In this case, we've got a created after action. I'll let you read it over real quick. Cool. So in this case, we're passing in a parameter called after. It's going to parse that input and then pass it to a class method on user called recent. If we run this through Mutest, it's going to show us that we can actually replace parse with a method called iso8601. If you're not familiar with the difference, that's okay. It's a pretty poorly named method. But basically, it's a more strict parsing method. Specifically, it specifies there's going to be four digits for the year, a dash, two digits for the month, a dash, and two digits for the day. And this is pretty significant compared to the parsing rules for Date.parse. That's basically going to try to do anything it can to parse the input. It's going to support all these different formats, as well as things that we might not want to parse. If it finds the name of a month inside of the input, then it's going to try to parse that. So on the left, we've got every valid iso8601 input for May 1, 2017. On the right, we have all the different inputs to parse that can produce May 1, 2017. Now we're going to talk about regular expressions. I'm particularly excited about this part of the presentation because this is a feature that I think no other tool in the Ruby ecosystem can really help you with. Mutant and Mutest can actually dig into a regular expression and show you that you're not covering branches within it, which is pretty cool.
So here's some sample code. Basically here, we are iterating over a list of usernames, presumably an array of strings, and we're selecting the ones that match this regular expression. The first thing we're going to see here is it's going to try to replace the caret and the dollar sign with \A and \z. And if you're not familiar with the difference, the caret and the dollar sign mean beginning and end of line, whereas \A and \z mean beginning and end of string. So in the first case, I could pass in Alice, newline, John, newline, Bob, and it would match. And so it's showing us in this case, hey, you don't provide any test input that shows that you want to handle these multi-line strings, so I can actually change this to the more strict format and everything still works. It's also going to try to remove each value in the alternation and make sure that we're actually testing each alternative, because inside the regular expression we're saying John or Alan are both valid matches, so we should be testing both cases. It's also going to try to put a question mark and colon at the beginning of the group, and that means that it's changing it to a passive (non-capturing) group. Basically, parentheses in regular expressions serve multiple purposes. They can be a mechanism for grouping expressions, like here where we have the pipe and we're saying John or Alan, but they also mean that we want to extract this value and preserve it in the match data. So in this case the question mark and colon mean we don't care about extracting this value, we're just grouping. And so it's recommending that we either test that we're capturing something or use this more intention-revealing syntax. Then finally, if we're running this on Ruby 2.4, it's going to say hey, I can use the new match? predicate method, and if you're not familiar with the difference, this method is new in Ruby 2.4, and it's about three times faster.
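These regular expression differences are easy to check in plain Ruby. The sketch below is an illustration only: the pattern is a guess at what the slide might have contained, and the helper sets_last_match? is a made-up name. It requires Ruby 2.4 or newer for match?.

```ruby
# Hypothetical patterns; the exact expression from the slide is not
# in the transcript.
LINE_ANCHORED   = /^(john|alan)$/         # ^ and $ anchor to lines
STRING_ANCHORED = /\A(?:john|alan)\z/     # \A and \z anchor to the whole string

MULTI_LINE = "alice\njohn\nbob"

# The line-anchored pattern finds "john" on the middle line,
# while the string-anchored pattern rejects the input as a whole.
LINE_RESULT   = LINE_ANCHORED.match?(MULTI_LINE)
STRING_RESULT = STRING_ANCHORED.match?(MULTI_LINE)

# A capturing group stores its match; a passive group only groups.
CAPTURES_WITH_GROUP    = LINE_ANCHORED.match('john').captures
CAPTURES_WITHOUT_GROUP = STRING_ANCHORED.match('john').captures

# Regexp#match sets the special variable $~ (last match) in the calling
# method; Regexp#match? returns a boolean and leaves $~ untouched.
def sets_last_match?(use_predicate)
  $~ = nil
  use_predicate ? LINE_ANCHORED.match?('john') : LINE_ANCHORED.match('john')
  !$~.nil?
end

MATCH_SETS_GLOBAL     = sets_last_match?(false)
PREDICATE_SETS_GLOBAL = sets_last_match?(true)

puts "line anchored matches multi-line input:   #{LINE_RESULT}"
puts "string anchored matches multi-line input: #{STRING_RESULT}"
puts "captures with (...):   #{CAPTURES_WITH_GROUP.inspect}"
puts "captures with (?:...): #{CAPTURES_WITHOUT_GROUP.inspect}"
puts "match sets $~:  #{MATCH_SETS_GLOBAL}"
puts "match? sets $~: #{PREDICATE_SETS_GLOBAL}"
```

A test that feeds in a multi-line string, exercises both names, and reads a capture would kill the corresponding mutations. Without such tests, the stricter forms are behaviorally identical as far as the suite can tell.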
The match? method only returns true or false, and the way it's faster is it doesn't do anything with global variables, whereas every other regular expression method will actually set variables regardless of whether you want them. And if we put all these together, we get something that is more strict about its input, better tested, more intention-revealing with the passive group, and more performant. And the cool thing here is that we didn't have to know about any of these features in Ruby in order to write this method. We wrote what we knew, and then the tool recommended all these changes, which resulted in a pretty different method, but one better fitted for our task. Now I'd like to talk about HTTP clients. Here we've got a method called stars_for, and it's using the popular HTTParty client. It's going to take in a repository name, hit the GitHub API, turn the result into a hash, and then get the value under the stargazers_count key. If we run this through our mutation testing tool, we're going to see that it can remove the to_h method, and everything still works. Now this might seem a little confusing at first, but what's going on here is the HTTParty client will actually look at the Content-Type response header and behave like a hash if the response is JSON. And as a result we can remove that to_h method and interact with the response object just like we were before, and it works the same. And the cool thing here is that Mutest does not have any specific HTTParty support within it. It just knows how to walk through your method definition and remove different method calls. And so as a result, even if we didn't know this before, we're going to see this mutation, read the documentation, and update our code. And we learn a little bit more in the process. All right. Now I'd like to talk about legacy code. This is the same code example we had before, that created after endpoint where we're passing a date.
In this case, I'd like you to imagine that instead of implementing this method yourself, you're being tasked with updating the method, maybe adding a new feature. And to make this more realistic, let's say that the original author wrote it two years ago, there isn't much documentation, there are only a few tests, and they no longer work at this company. When you run your mutation testing tool on this code before you actually modify it, you're going to see this mutation to iso8601. And if you don't know about it, you're probably going to look at the documentation and see what the difference is. Huh. This is a more strict date parsing format. Interesting. This leads to us asking a few questions about the code. What was the author's intent here? Did they mean for people to only use this very strict format? Or did they mean for people to be able to use any format, and they just didn't add tests for that? More importantly, how is this code actually being used today? If there are other services that are passing in other formats here, we probably want to update the tests to reflect that we support this. We don't want to break their integration. And so running our mutation testing tool on this code before we modify it is basically giving us a checklist of questions that we should answer before we modify it. In other words, it's giving us a list of hot spots where, if we modify this part of the code, we might introduce a regression and the tests won't fail. This probably isn't too surprising, but mutation testing can be a very thorough way to measure test coverage. Consider this method right here. If we invoke this method at all within our program, then a line coverage tool is going to say that we have 100% coverage. But even if we test it directly, we are still probably not testing it in the ideal way.
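As a rough illustration of why a single call is not enough, here is a sketch of an age check like the one described, with the comparison operator mutated by hand. The lambdas, the MUTANTS table, and the surviving_mutants helper are all hypothetical, and the two operator swaps are just examples of the kind of change a mutation testing tool might try.

```ruby
# Hypothetical method under test: 21 is the minimum age.
ORIGINAL = ->(age) { age >= 21 }

# Hand-written operator mutants of the comparison.
MUTANTS = {
  'age > 21'  => ->(age) { age > 21 },
  'age == 21' => ->(age) { age == 21 }
}.freeze

# Returns the descriptions of mutants that pass every test case.
def surviving_mutants(cases)
  MUTANTS.select do |_description, mutant|
    cases.all? { |age, expected| mutant.call(age) == expected }
  end.keys
end

# A casual test: one clearly allowed age, one clearly rejected age.
# Both cases execute the line, so line coverage is already 100%.
WEAK_CASES = { 30 => true, 10 => false }.freeze

# A boundary test: just below, exactly at, and just above the limit.
BOUNDARY_CASES = { 20 => false, 21 => true, 22 => true }.freeze

WEAK_SURVIVORS     = surviving_mutants(WEAK_CASES)
BOUNDARY_SURVIVORS = surviving_mutants(BOUNDARY_CASES)

puts "survive weak cases:     #{WEAK_SURVIVORS.inspect}"
puts "survive boundary cases: #{BOUNDARY_SURVIVORS.inspect}"
```

The case at 21 kills the swap to a strict greater-than, and the case at 22 kills the swap to equality, which is why all three ages around the boundary earn their place in the test.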
Our mutation testing tool is going to show us that it can actually fiddle with the boundaries here and say, hey, are you actually testing for off-by-one errors here? It's going to say, do you have a test specifying that 21 is the minimum age for buying alcohol, that 20 is rejected, and that 22 is allowed? And by fiddling with these boundaries, it's helping us improve our tests. And this very thorough modification of the code can be a very big help when we're dealing with very complex methods that seem pretty simple. This is only a nine-line method here, but in this case, we're dealing with a lot of complexity. So here, it's deciding whether a given user in the system, called an editor here, can edit a given post. We've got some different user roles here. By modifying each part of this code, each individual token, it's going to ask us, are you testing the case where the user is a guest? What about when they're muted? What about when they're a normal user and they are the author of the post? What about when that post is locked? Are you testing these conditions together? When the editor is the author of the post and it's locked? The same condition when it's not locked? When they're not the author and it's locked? What about when they're a moderator? Are you testing the condition where the author is and is not an admin? Are you testing the case where they're an admin? This might seem like a large number of tests to be writing for this pretty simple method, but the mutation testing tool is pushing us a little bit closer to the actual complexity here. If you think about it, the editor of the post can have five different roles according to this code. The author can have five different roles, and we also have the case where the editor is or is not the author of the post. And then finally we have the condition where the post is or is not locked. So we're actually dealing with at least 31 different conditions here.
This is a lot of complexity, and the tool is at least forcing us to embrace how complex it is and actually prove that we are handling all these different conditions. Here's another small example. In this case I'm taking a list of users, mapping over them and grabbing their email, and filtering out users that either don't have an email or have previously unsubscribed from our mailing list. This is the sort of test that I would usually write for what we just saw. In the first example we've got a valid user and then a user without an email, and we're asserting that the valid user's email is the only thing in the output. In the second case we have the same thing: we have a valid user and an unsubscribed user, and we're asserting that the valid user's email is the only thing in the output. Our mutation testing tool is showing us that we can change next to break here. That's pretty interesting, but it makes sense given the tests that we wrote if we look back at them. In each case we have the invalid user, the user that we're trying to filter out, at the end of our test input. So in this case skipping one iteration is the same as ending iteration. And so the way to correct these tests is to put the user that we want to skip at the beginning and have the good user at the end. That's just another small change that the mutation testing tool is able to make to help us improve our tests. Mutation testing is also a great tool for detecting dead code. Consider this example right here. Maybe I'm new to Rails and I don't know that ActiveRecord is going to do this for me if I have a column called name. Even if I don't know this, if I run the mutation testing tool on it, it's going to show me that I can replace the body of this method with super. This might seem a little weird at first. What it's saying is that the entire implementation of this method is already covered by the parent class.
So in other words, as a new user to Rails, I didn't have to read any documentation and I didn't have to talk to any co-workers in order to discover that I'm introducing a redundant method. Here's another example where I've got a posts controller and then a private method called user. It's got an optional argument called user, and that's going to default to the current user. If the usage looks like this, then it's going to find one mutation. It's going to say, hey, you're always passing in a user, so we actually don't need this default argument. But if the usage looks like this, it's going to do something different. It's going to try to apply that previous mutation, and the test is going to fail because we are calling it with zero arguments. Instead it's going to say, I'm going to assign current_user to user at the beginning of the method body. So in other words, no matter what you pass into this, I can overwrite the local variable with the value of current_user. In other words, the value of user is static here, and we can actually just inline current_user into the method and remove the argument entirely. This next one is a very small feature that I actually like a lot because I find myself running into it a lot. Maybe I'm doing a refactor and I had this code elsewhere, and I moved it into a method like this and forgot to update the constant. Well, Mutest is going to show us that we can replace ::MyApp:: with nothing. It's going to remove it. And we get this for free. It's just going to say, hey, I can simplify this constant reference and it's the same thing. Here's another small example. In this case, we are passing in an ID parameter to this controller, calling the post finder, rendering the response, and giving it an HTTP status of 200. The mutation testing tool is going to show us that we can actually remove that status: :ok entirely. And if we look at the documentation, it makes sense. In this case, the default status code is going to be 200.
So again, we're learning a little bit more about what Rails provides for us without actually having to read the documentation or learn from a co-worker. Similar to the dead code that I just showed you, mutation testing is also a great resource for simplifying your code. Here's another method that we might have inside of a controller. Basically, we're taking in a user IDs parameter, which is presumably an array of integers. We're calling the user finder and splatting the input. It's going to say, hey, you don't actually need the splat here. You can just pass in the array and it behaves the same. So again, we're learning a little bit more about ActiveRecord's interface. And it's basically zero cost. Here we have a user decorator. At the top, we have an attribute reader for the user, and then the greeting method just returns a welcome with the name of the user. It's using the instance variable. The mutation testing tool is going to show us that we can actually replace the user instance variable with the user method. This is a very small change, but I actually like it a lot. We have the attribute reader, so why not use it? Also, the method call has some nice properties that we don't get with the instance variable. If we typo the instance variable, we're silently going to get nil, and then we're going to get a slightly more cryptic error here. But if we typo the method call, we're going to get a slightly more clear error saying that we typoed something. Here's another small method where we're passing in a string, which is just a path on a Unix system, and we're going to replace the leading tilde with the HOME environment variable. Running this through Mutest is going to show us that we can replace gsub with sub, which makes sense. We don't need a global substitution here. We're doing one substitution, so it's recommending that we use the more intention-revealing and specific method sub, which only does one substitution.
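The gsub to sub mutation can be checked with a small sketch. The method names and the anchored pattern here are assumptions for illustration, and the home directory is passed in as an argument instead of being read from ENV so that the example behaves the same everywhere.

```ruby
# Hypothetical original: global substitution of a leading tilde.
def expand_home_gsub(path, home)
  path.gsub(/\A~/, home)
end

# Mutated version: a single substitution.
def expand_home_sub(path, home)
  path.sub(/\A~/, home)
end

HOME_DIR = '/home/me'

# A pattern anchored with \A can match at most once, so later tildes
# are left alone by both versions and the results are identical.
PATHS = ['~/projects', '~/projects/~backup', '/etc/hosts'].freeze

GSUB_RESULTS = PATHS.map { |path| expand_home_gsub(path, HOME_DIR) }
SUB_RESULTS  = PATHS.map { |path| expand_home_sub(path, HOME_DIR) }

puts GSUB_RESULTS.inspect
puts "same as sub: #{GSUB_RESULTS == SUB_RESULTS}"
```

Because no input can tell the two apart, no test can kill the mutation, and the simpler sub is the version that says what the code actually does.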
Here's another example that I run into pretty frequently. Maybe we would have this sort of method if we were a company like Imgur, in a delayed job or something. We are going to regularly look for images that haven't been viewed in the last two years. Then we're going to iterate over them and log a little bit of debug output, and then delete all of them and return the count. Well, it's going to show us here that we can remove the map and replace it with each. It makes sense. We're not iterating over this input and returning a new array, so we can just use the normal each method. This is something I run into a lot when I'm refactoring some code where previously I was mapping over the input and returning something new, then I moved that somewhere else, and I forgot to change it back to an each. And the nice thing is I don't have to always worry about making these little mistakes. I know that Mutest will catch them for me. Then finally, here's another small example where I'm using the Ruby standard library logger, and I am setting a formatter, which is going to take information about a log event, take the different data there, and format a string, and then that will be what's logged to the output stream here. Now, Mutest is going to show us that we can actually replace this proc with a lambda. Usually they're pretty similar, so I usually forget what the actual difference is here. But using this very simplified example, we've got a proc that takes in two arguments, forms an array, and then inspects that array. Now, if we call the proc version, I can call it with no arguments, one argument, two arguments, three arguments, it doesn't matter. If there are too few arguments, it's going to fill in the missing arguments with nil, and if there are too many arguments, it's going to silently drop them. And it actually has the same behavior if we pass in an array. It will silently splat that array and then behave the same as before.
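The argument handling described here can be reproduced with a few lines of Ruby. This is a simplified sketch, not the slide code: PAIR_PROC, PAIR_LAMBDA, and the call_or_error helper are illustrative names, and the blocks simply return the two arguments as an array.

```ruby
PAIR_PROC   = proc   { |first, second| [first, second] }
PAIR_LAMBDA = lambda { |first, second| [first, second] }

# Hypothetical helper: call the object and report the error class
# instead of raising, so the outcomes can be compared directly.
def call_or_error(callable, *args)
  callable.call(*args)
rescue ArgumentError => error
  error.class
end

ARGUMENT_LISTS = [[], [1], [1, 2], [1, 2, 3], [[1, 2]]].freeze

# Procs pad missing arguments with nil, drop extras, and splat a
# single array argument across their parameters.
PROC_RESULTS = ARGUMENT_LISTS.map { |args| call_or_error(PAIR_PROC, *args) }

# Lambdas check arity like methods do: exactly two arguments or an error.
LAMBDA_RESULTS = ARGUMENT_LISTS.map { |args| call_or_error(PAIR_LAMBDA, *args) }

ARGUMENT_LISTS.each_with_index do |args, index|
  puts "#{args.inspect}: proc #{PROC_RESULTS[index].inspect}, " \
       "lambda #{LAMBDA_RESULTS[index].inspect}"
end
```

If every caller in the test suite passes exactly the expected arguments, the proc and the lambda are indistinguishable, which is why the tool can swap one for the other.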
But if we use a lambda, it behaves a little bit more sanely. So you're probably thinking now, for one regular expression we get five mutations. That seems a little ridiculous. I usually open PRs that are hundreds of lines long, and my tests take hours to run. So how can this possibly be practical? Well, there are a few features that make this more manageable. First, it takes a --since flag where you can pass in a git revision, and that basically says only mutate code that has changed since this git revision. So in this case, if we have two commits and we specify since master, it's only going to select the code that has changed in those two commits. You can also pass in a test selector, which is like this constant name and then a method. And that's saying, maybe in my giant object where I have hundreds of methods, I only changed one thing, so only mutate that. And it also understands RSpec conventions. So if you're describing a class and then describing a method, it's actually only going to select for mutation that small method that you were working with, and it's only going to select the half-dozen tests that actually specify the behavior of that method. So mutation testing, I think, has been one of the most powerful sources of growth for me the last few years. And I think if you're not using mutation testing today, it can probably help you grow a lot, too. It helps you learn more about Ruby. There are dozens and dozens of special case mutations baked into the tool that only show up when they apply to your current task. And so you sort of learn about them just in time. Some examples from this presentation are how it changed parse to iso8601, all these different regular expression features, the new match? feature for regular expressions in Ruby 2.4, and also the proc and lambda change.
And with the generalized changes that it makes, the different removals of lines of code or arguments that you're passing into a method or default arguments, you'll learn more about the code that you actually rely on, and you'll be surprised by how frequently this results in you learning something new. Some examples from this are how we learned that HTTParty behaves differently if the content type is application/json, as well as all the different behaviors that we see from different Rails methods, things like the controller behavior with the default status code and ActiveRecord's interface. And the net result here is not just that you learn more. I think you also learn a little bit faster, at least that's what I've found. Whenever you do work, you're also going to be learning a little bit more about Ruby on average and learning a little bit more about the code you interact with. So this is sort of an amplifying effect, I think. And it's obviously going to improve your testing skills. You're going to start thinking more about what the expected behavior of the feature you're adding actually is. And then the net result here is you end up modeling your understanding of the code a little bit better and you end up shipping fewer bugs. You're understanding which tests are not doing anything and which cases are not being tested. You're removing dead code and you're removing unnecessary code. You're using simpler methods within Ruby. And if you do this mutation testing on code before you modify it, and you're unfamiliar with it, you're probably going to introduce fewer regressions too. As I mentioned before, you're going to get a list of hot spots in the application where you could break the code without failing the tests, the kind of breakage that only shows up in production.
And it gives you a sort of checklist, like, before I change this, I need to understand: is someone supposed to only pass in this date format, or are people now using it in different ways? It also results in writing simpler code, similar to the dead code detection I mentioned before. You're going to be surprised by how removing a few lines here, simplifying a method call there, and using a simpler Ruby method over there comes together as dramatically simpler code, and it's not much effort for you to arrive at that after writing the initial implementation. So I hope at least some of you are excited to use mutation testing on the job now. If your coworkers are not excited about using it as a team, you can still use Mutest before you push. If you do so, you're probably going to learn a little bit more about Ruby and a little bit more about the code you depend on, you're going to write better tests, and you're probably going to grow a little bit faster than your coworkers. And if you are a team lead, you should consider adding Mutest to your CI. You don't have to aim for 100% mutation coverage in order to benefit here. Just being able to see what code can be changed without failing the tests is a powerful tool for both the author and the code reviewer. For the author, it lets them review themselves and ask, should I change anything else here before I ship this? And the code reviewer doesn't have to understand the tests and the code involved as deeply in order to decide whether it's safe to go to production. They can at least look at CI and ask, what modifications can we make here? If there are a bunch, maybe we should add some more tests. If you like what you saw here today and you love writing great code, you should email us, jobs at CognitoHQ.com, and I hope that you all have a great day. Thanks.