Hello everyone. My name is Alex, and I have been testing open-source software for the last 10 years. This is how to reach me if you like, and this is how I look — pretty close. Today I'm going to talk about mutants and zombies, but not the ones from the movies — slightly different — and I have some code examples. First let me get an idea: how many of you are familiar with Python, at least to some degree? Okay, good. And how many of you are familiar with Ruby? Okay, great. Don't worry, the examples are not very hard to understand.

Let's take this piece of code. It's a simple class describing one model in a Rails application. There's only one field that goes into the database, and there is only one method, which is actually a really stupid method: it just returns an uppercase version of that field. As you can see from this screenshot, this class has 100% code coverage. The question I'm asking myself as a tester, and the question I expect everybody here should be asking themselves, is: is my test suite good enough? Good enough in the sense that whenever there's some change in the software under test, is my test suite able to detect whether the change breaks something — or does the change go undetected, possibly into production, where something bad really happens? Mutation testing can help you answer this question, but first I need to explain what a mutation is.

A mutation is usually a very small change in the software which somehow changes its behavior. Mutations can come from comparison operators and if statements. In the example on the screen, the mutation is replacing the greater-than operator with a less-than operator. If you apply this mutation to your code and read it, it becomes: if age is less than 18, then buy beer. Now imagine that you are an online store and suddenly you start doing this — selling beer to small children — and this is not good.

Another possible source of mutations is constant values. You can replace these values with something else; in this example we replace true with false, and because it's very early in the morning, if we hadn't had our coffees we would be looking like this. Yet another possible source of mutations is loops. We may, for example, modify the loop condition, or, as in this example, change break to continue and get an endless loop — and when you have an endless loop, this is what happens. This is again an example from production; all my examples today are taken from production environments. Well, not the beer one.

By definition, mutation testing is a technique — and also a family of tools — that modifies your software under test. It doesn't modify your test suite; you keep the test suite separate, and for each mutant produced in the software under test you execute the test suite again and again and again. At the end this gives you a pretty good idea of how good your test suite is. It can tell you places where you have tests, but they don't do a very good job of detecting all the possible things that may change; and it can also tell you places where you have missing tests — but you already know about those, because we use coverage. So mutation testing is really good at telling you where you need to write better tests.
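To make the operator mutation concrete, here is a tiny self-contained sketch (my own, not the speaker's code) of what a mutation tool effectively does: flip one comparison operator, rerun the checks, and see whether anything fails.

```python
# Original code under test: the age check from the beer example.
def can_buy_beer(age):
    return age >= 18

# A mutant: the ">=" operator replaced with "<".
def can_buy_beer_mutant(age):
    return age < 18

def suite_passes(fn):
    # A stand-in for a whole test suite: two assertions on behavior.
    return fn(16) is False and fn(21) is True

assert suite_passes(can_buy_beer)             # clean run: everything passes
assert not suite_passes(can_buy_beer_mutant)  # mutant run: suite fails, mutant killed
```

If the suite never asserted anything about the return value, the mutant would pass unnoticed — exactly the zombie case the talk turns to next.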
The idea behind this is that some of the mutations you see, and which the tools support, try to mimic errors which developers may make while writing code — for example plus-one/minus-one errors, or deleting something by accident and committing it to source control, stuff like that. Other mutations you may see are purely artificial, but somehow they help us validate the test conditions and the test environment we run our test suite in, and help us expose something that's missing.

The algorithm for mutation testing is actually very simple: we run three loops, one after the other. First we go through all the mutation operators that our particular tool supports — these are the things the tool knows how to replace with something else, whatever that something else may be. Then, for each operator, we find the places in the source code where that operator is used and replace it with something else. Most operators can lead to only one other type of mutation, but sometimes you can produce several different mutations for a single place in the code, as with the comparison operators. And then, of course, you execute the test suite.

There are three simple rules for killing mutants which you must remember. First: when you execute the test suite against the non-modified version of the program, everything should pass. This is a hard requirement; you cannot go without it. If something fails, obviously your software doesn't work or your test suite is broken, and you need to take measures. And if you have flaky tests that sometimes pass and sometimes fail and you have no idea why, your mutation results will also be unreliable, and you need to take measures to fix your flaky tests. The second rule: when you run the test suite against a mutant — that is, a modified version of your program — you expect the result to be a failure, and that's a good thing. When the test suite fails, we say the mutant was killed, or the mutant died. It means there was at least one assertion or one condition in the test suite which wasn't met, so we have at least one test able to detect this change and tell us the software was modified. And the last case, which you don't want to happen: you run your test suite against the mutant and the result is a pass. This is a bad thing. Then we say we have a zombie, or that the mutant survived. As you know from the movies, zombies are the things that go around and try to eat you. Now imagine you make some change in the code, run the test suite, it passes — it doesn't really understand that anything changed — and this change goes into production, suddenly becomes a bug, and tries to eat you.

So these are the rules: testing against the non-modified version should always pass, and testing against a mutant should fail. This is how mutation testing tools figure out when you kill a mutant: when something fails in the test suite, the tool says you killed the mutant.

Okay, now let's play a little game. I am going to show you all the possible mutations, and because you all know about testing, you are going to tell me what test cases I need to write to kill the mutants. I am using the code from earlier — the one with the hundred percent coverage — and only this method. There's a string variable, and we return the uppercase version of that variable; that's all it does. The first possible mutation: instead of returning the uppercase string, my method under test returns nil. So what test can you propose so I can kill this? Yep — yes, obviously "not nil".
Yes, correct. We execute the method under test and expect the result to not be nil. If we apply the mutation — I'll go back — and run the test, the method will return nil, the expectation will fail, everything fails, and we kill the mutant.

Okay, next possible mutation: instead of returning the uppercase string, I return self, because this method is part of the class and I have access to the self object, so I can do this. What test do I need here to kill this mutation? Okay — check that the type is String, correct. So I must be checking the type of the return value: if I expect only a string, then it must be a string and nothing else.

Another possible mutation: instead of returning the uppercase value, I just return the value of the variable as-is. How do I kill this? Okay, so I start with a value which is in lowercase, then I execute the method under test and expect the result to be in uppercase. I use a constant here because it's easier for me. This one, for example, will also kill the first and the second zombies. And if I start with nil — let me think — yeah, it might also kill the first one. So sometimes one test is enough to kill several mutants. But actually this is the order in which I discovered them and developed the tests for them, and I didn't go back to think about whether some test is no longer needed.

And the last example in the game is replacing the ampersand — this is the new safe navigation operator in Ruby — with a plain dot. For the folks who don't work with Ruby, the safe navigation operator works this way: if the object on the left is nil, then the result of the expression is nil and nothing else is executed; if the object on the left is not nil, execution continues to the right. When we replace it with a dot, then if the variable is nil we just get a runtime exception, because a nil object doesn't have a method called upcase. That's the difference. So we apply this mutation — what tests do I need to kill it? Sorry, what? "Should not raise" — okay, how do you assert on that? Well, at least I don't know how to assert that there was no exception. But I can set this variable to nil and execute the method under test: if there is an exception, the test will fail anyway, so I don't have to assert that there wasn't one. But I can assert that the result was nil. If your framework can assert that an exception was not raised, then of course that's a valid answer too, but I don't know how to do that in Ruby. Okay — it collides, yes, it collides if you read the examples as-is on the slides. What I haven't shown is that this variable has a non-nil value by default, so that works then. Good catch.

Now let's try to find some bugs with mutation testing. One bug I was able to discover just by using mutation testing in a project — not actively looking for the bug — is this. We have a class called Network, which represents the networking settings on your Linux computer, and there's an attribute called self.device, which is the name of the network interface. This equality method is obviously wrong, as highlighted here, and we'll see why. If self.device is None or an empty string, that part of the Boolean expression always evaluates to false, and the entire return value is always false. So it doesn't matter what other object you are comparing with: this method always returns false. And if self.device isn't None and isn't an empty string — which means it has some meaningful value — then that part evaluates to true and the Boolean expression always depends on the second part. And that's actually the fix: just remove the first part of the Boolean expression, and there we go. The reason this statement stayed undetected for, I guess, about seven or eight years is that there wasn't a single test in the test suite that tried to exercise this equality method when one of the attributes was an empty string or None. They were always testing with valid values, so this went undetected for many, many years. Also, in the software under test, under normal conditions this attribute always has some value, so it never returns false and nothing bad happens.

Another bug I was able to find is this. As shown, I have two classes; the second class inherits from the first one. Both classes have a parameter called speed_limit in the init method, and both parameters have default values. Now, this is perfectly valid Python code, so there's nothing wrong with the way it's written. I run this through mutation testing and I get a surviving mutant. The reason is a constant change: the tool for Python adds plus one to any integer constant to see what happens. I immediately know the cause: I don't have any test that creates an object from this class and asserts what the default values of these attributes should be. So I create my test like this — just create an object from the class under test and assert what the default value should be — and the test immediately fails, of course. The reason, if you look closely, is this parameter: I'm not sending it to the init method of the parent class, so it must go here, after self. When I don't do this, the parent class uses its own default value — it uses 15 instead of 90 — and that's why my test fails. This is also very subtle; it stayed undetected for many, many years. And again the reason for not being detected is that the attribute this value is assigned to is never used in the software under test — it was meant to be used by external clients of the software. The software under test in this case is a library.
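Here is a minimal reconstruction of that bug; the class names are made up, but the 15-versus-90 defaults match the talk.

```python
class Road:
    def __init__(self, speed_limit=15):
        self.speed_limit = speed_limit

class Motorway(Road):
    def __init__(self, speed_limit=90):
        # Bug: the parameter is accepted but never forwarded,
        # so the parent's default of 15 silently wins.
        super().__init__()

class FixedMotorway(Road):
    def __init__(self, speed_limit=90):
        # Fix: forward the argument to the parent initializer.
        super().__init__(speed_limit)

assert Motorway().speed_limit == 15        # the surprising behavior
assert FixedMotorway().speed_limit == 90   # what was intended
```

A test that simply constructs the object and asserts the default value — the test the "+1" constant mutant forces you to write — catches this immediately.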
So it's used by other tools, and apparently nobody bothered to check whether the default value is what it should be.

Another possible bug, very close to the previous one: again I have two classes, the second inherits from the first, and again I have a parameter with a default value — but this time, notice, I actually am sending it to the init method of the parent class, so everything must be fine. I run this through my mutation testing tool and I get the same surviving mutant. So again I write the same type of test: create an object and assert what the default values of the attributes should be. And this time I get another type of exception — an AttributeError telling me an object of the Motorway class does not have an attribute called speed_limit. At this point I start wondering why. I look here, everything looks cool; I look there, everything looks okay. Then I traverse back to the parent class, and immediately I notice: first, there's no parameter called speed_limit, and something starts to look not right. Then I look in the body of the parent init method, and nobody cares whether there's a parameter called speed_limit; nobody sets an attribute called self.speed_limit. So that's the problem: I just don't have this attribute. One possible fix, if you really need this, is to take care of it in the class under test and set the attribute yourself; another possible fix is to delete everything related to this parameter and not bother with it. And that was actually the fix in production.

Another thing mutation testing is really good at is forcing you to look at your source code and refactor it. The places where mutation testing really shines are where you have if statements and comparisons and lots of Boolean expressions, stuff like that.
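To see why such places breed mutants, here is a small illustration (my own, not from the slides) that enumerates the comparison mutants of a single `==` site and checks which ones a given set of test inputs can kill:

```python
import operator

# All comparison operators a mutation tool might swap in for "==".
OPS = {"!=": operator.ne, "<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge}

def surviving_mutants(test_inputs):
    """Return the operators that behave exactly like "==" on every
    tested input pair -- i.e. the mutants this suite cannot kill."""
    alive = []
    for name, op in OPS.items():
        if all(op(a, b) == operator.eq(a, b) for a, b in test_inputs):
            alive.append(name)
    return alive

# Testing only equal values cannot distinguish "==" from "<=" or ">=":
print(surviving_mutants([(3, 3)]))                   # ['<=', '>=']
# Inputs on both sides of the comparison kill everything:
print(surviving_mutants([(3, 3), (2, 5), (5, 2)]))   # []
```

Each of the handful of alternatives per operator multiplies across every comparison in a long condition, which is how one screenful of if statements can yield a hundred mutants.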
The reason is that we get many mutations in places like this. In the example on the screen we have around 100 different mutations: every comparison operator can lead to almost 10 different mutants. The equals here can be replaced with not-equals, with less-than or greater-than, less-than-or-equal, greater-than-or-equal; in Python also with is and is not, and the in and not in operators — so that many. A Boolean and can be replaced with a Boolean or, and we can negate entire Boolean expressions. In the Ruby tool you can also replace a Boolean expression with a true or false constant, and I think you can change only parts of it — replace one part with true or false and leave the rest as-is. The Python tool doesn't do this at the moment, but it's fairly easy to add.

So this goes through mutation testing and, you know, there are surviving mutants. When you start looking at them, you notice the pattern highlighted in red. My thought is: I can delete this and move the second block of if statements to the left, and it becomes a little bit clearer. Then I notice another pattern: whatever the value is, I'm looking up an attribute on self with the same name and doing something with it. So I can refine this even further and use the getattr function, and it becomes like this. In reality this fits on four lines instead of ten, and it's much easier to test and, actually, much easier to read — and that's a good thing.

So we've seen what mutation testing can do. It forces you to write better asserts, and in my opinion, when you have complex software under test, we should not only assert what the return values of our methods are, but also what intermediate state or side effects are produced by the functions under test. And you will all agree that it doesn't matter how much we try to write clean software.
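A minimal sketch of that kind of getattr refactor (the names here are invented for illustration):

```python
class Handler:
    def __init__(self):
        self.alpha = lambda: "A"
        self.beta = lambda: "B"

    def dispatch_verbose(self, name):
        # Before: a chain of near-identical branches, each one a
        # breeding ground for comparison and Boolean mutants.
        if name == "alpha":
            return self.alpha()
        elif name == "beta":
            return self.beta()

    def dispatch(self, name):
        # After: the pattern "look up the attribute with the same
        # name on self and call it" collapses the chain to one line.
        return getattr(self, name)()

h = Handler()
assert h.dispatch("alpha") == h.dispatch_verbose("alpha") == "A"
assert h.dispatch("beta") == h.dispatch_verbose("beta") == "B"
```

Fewer branches means fewer mutation sites, which is exactly why the refactored version is easier to test.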
We always have these methods that do more than one thing at a time: they do some calculations, return some value, and — oh, by the way — just set this attribute on the side, just so you know. That's what mutation testing is really helpful with: it helps you discover the things you are not testing, by mutating them, so you can write your tests better. We saw we can find some interesting bugs, and we saw we can find places to refactor. But the question still stands: is my test suite good enough? And the other side of this question: how do I measure how good my test suite is? Which metric do I use?

Metrics are a fairly controversial topic; I just want to mention some research that's been going on in the last few years. In 2015 at GTAC there was a lightning talk which said, you know, coverage is not a good metric because it doesn't give you a lot of information, so go for mutation score: use mutation testing and measure how many of the mutants are killed, and if you kill 100% of the mutants, then you're good. Then last year at GTAC there was another researcher who basically said: well, the guys from last year didn't do very good research — they didn't test on much software. We did better research, and we claim that coverage metrics, like line coverage and branch coverage, are still the best metric in practice. They say the problem with mutation testing is that it is very expensive to compute, and in their research it gave only an additional 4% of information which they didn't already know, compared to coverage. So they say: use coverage, don't use mutation testing. And I decided to do a small experiment and see which of these researchers is right.

My software under test is called pelican-ab, and this is a very small library.
It's a plugin for Pelican. Pelican is a static HTML generator for Python, which you can use to run your blog or your site, and pelican-ab gives you one additional tag for the templating markup, which you can use to encode varying layouts for your website — for example, you can change the color of links or the colors of buttons, stuff like that. The way to use this software is to define the AB experiment variable in the shell and run the make command. If you run make github, the site gets rendered and everything is published directly to GitHub. The way to render several versions of your website is to first start with the control version, then name each experiment, and run make in a sequence like the one shown on the screen. If you run these three commands, you get the control version of your website and an experiment called "123"; everything about that experiment goes into a directory with the same name, the URL structure is updated, and you can point your users to just that experiment, see how they respond, and so on.

In the version under test we have 100% branch and line coverage of the software — and we also have a bug. There is a setting for deleting the output directory, which is set to true by default, and this setting isn't something the software under test controls: it lives in an external file where your website configuration goes — your website name and your GitHub handle go into that file as well. The bug: when that setting is set to true, Pelican will go and delete the output directory with all the HTML files, and then start rendering into a clean directory. So the result of this command sequence, with the setting set to true, is that you have deleted your entire website and are left with only the last experiment. Imagine: you delete everything, type make github, and it all goes live. That's a pretty good way to destroy your website.

So, you know, I didn't have a test which would fail when the setting is set to true, and I decided to integrate mutation testing into the project. I wrote a few more tests to achieve 100% mutation coverage — mind you, this is a very small library, a very small plugin — and the bug was still present. This means I don't have any test which fails when the setting is set to true. I started thinking: okay, why do I have this many tests while this bug is still present? And the answer is: you cannot discover this type of bug without looking at the external environment, and that's why we need integration tests. So I added an integration test which simulates the external environment with the settings, simulates the make command, then tries to verify what content has been rendered and whether or not it's correct — and the test immediately failed, of course. So then I said I fixed the bug, but in reality I just changed this setting to false; also I added a check so that if it is set to true, my software just raises an exception. Now we have 100% mutation coverage, 100% branch coverage, and at least one integration test, and I'm thinking, okay —
I must be good then, you know? If I have so many tests and I'm using these techniques, then possibly my software is bug-free. And of course this is not true. I added pylint to the project, and pylint was immediately able to discover this bug. This is a problem with how we call the super method: instead of passing the class name, I was using a shortcut, which is self.__class__. This works perfectly fine when you use the software in its intended environment, because we have only one class, self.__class__ evaluates to that particular class, and everything's fine. It becomes a problem the moment you try to inherit from the class under test, create a new class, and do something different with it: when you call the init method of the class under test, Python goes into a loop between the init methods of the parent and the inherited class. If you want to learn more, head over to this pylint pull request number; there's a very detailed description of why this is a problem. I am guilty of using this shortcut in Python, but I've seen it in many, many projects on GitHub, and I've seen it in popular software used by many people. This tells me people don't have a very good understanding of what self.__class__ actually means and how it works, and that's why we keep using it in the wrong way.

To conclude the mutation-versus-coverage topic: this is a link to my blog describing my experiment in more detail, and there are links to the actual git commits, so you can see what code was changed and how. I think when we first start doing testing, what we look at is how much coverage we can get, so we strive to write more tests, to test as much of the software under test as possible — and that's a good thing — until at some point we reach 100% coverage.
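Going back to the super() pitfall for a moment, here is a minimal sketch of how the recursion happens (class names invented for illustration):

```python
class Plugin:
    def __init__(self, name):
        self.name = name

class Child(Plugin):
    def __init__(self, name):
        # The shortcut: self.__class__ resolves against the *runtime*
        # class of self, not the class this method is defined in.
        super(self.__class__, self).__init__(name)

class Grandchild(Child):
    def __init__(self, name):
        super(Grandchild, self).__init__(name)

Child("ok")             # works in the "intended environment"
try:
    # Inside Child.__init__, self.__class__ is Grandchild, so
    # super(Grandchild, self).__init__ resolves to Child.__init__
    # again, and the two init methods loop forever.
    Grandchild("boom")
except RecursionError:
    print("infinite recursion")
```

In Python 3 a plain `super().__init__(name)` avoids the problem entirely. That aside, back to the coverage-versus-mutation question.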
At that point we don't get any more information out of this metric, so it doesn't do us any good. Then we start looking at mutation testing, and mutation testing tells us: okay, now you have some coverage and you are testing some stuff, but there's a lot more you can test, and this is what you need to do. So we start doing mutation testing, and we get to 100% mutation coverage — and when we get to 100%, we don't get any more useful information out of that either. We also need integration tests, because of the external environment. As you see from the examples, we have different types of environments: one is the regular environment our users will be using, and another possible type is a developer just taking our software and trying to do something else with it, or building on top of it. We have no idea what these environments will be and how people will want to use our software. That's why we need different types of tests, and possibly also different types of tools, to detect the problems and deal with them. I do need more examples on this topic — I will be continuing to explore it throughout the year — so if you have examples you can share with me, or publish something to GitHub, please send me an email.

Now I'll move to something more practical: speed of execution. As you can imagine, mutation testing is very slow, and I wanted to measure how slow. I've taken a real-world project from Fedora called pykickstart. It is a text-parser library used by the Fedora installation program — a medium-sized project with a little over 100 files, I think. All of these files are Python modules, and they don't have many dependencies between them, so that's good. Each module does some checking, a few if statements; there are hardly any loops in the code, so it's very easy to understand, actually. Some text comes in, and they write some text as output. It's a library meant to be used by other programs, so it doesn't really do anything useful on its own. The project has a fairly good test suite — over 90% coverage, with a lot of tests. The other good thing about it is that the files under the source directory map almost one-to-one with the files under the tests directory; they have the same names.

The first thing I did was take Cosmic Ray, the mutation testing tool for Python, and tell it: here's the source directory, which means load everything you can find under this directory into memory in terms of modules, produce all the possible mutations you can on those modules, and here is the test directory — use the test runner to discover all the test cases you can, and just start running. I let this run on my computer, and it took over six days. Then I became smarter and wrote a small script to go through the source directory, take only one file, and load that into Cosmic Ray — which means: produce mutations, but only for that module, and don't bother with the rest of the software. And by the way, here's the one file in the test directory which contains the tests for it; use only that for testing, not everything else. That was faster. I also added an option called fail-fast, which means: whenever one of the tests fails, we know there was a failure and we'll kill the mutant, so don't bother executing the rest of the test cases.
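A rough sketch of that per-module driver; the directory names are assumptions, and the call into the mutation tool is left as a placeholder since the exact Cosmic Ray invocation isn't shown in the talk:

```python
from pathlib import Path

def paired_modules(src_dir, test_dir):
    """Yield (module, test_file) pairs by matching file names,
    relying on the near 1:1 source/tests layout described above."""
    for module in sorted(Path(src_dir).glob("*.py")):
        candidate = Path(test_dir) / module.name
        if candidate.exists():
            yield module, candidate

def run_per_module(src_dir, test_dir, run_mutation_tool):
    # run_mutation_tool stands in for the real tool invocation:
    # mutate one module, run only its matching test file.
    for module, test_file in paired_modules(src_dir, test_dir):
        run_mutation_tool(module, test_file)
```

Limiting each run to one module and its own tests, together with the fail-fast option just mentioned, is what brought the total time down from days to hours.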
Fail-fast is an option for the test runner. I also did some refactorings — stuff like: where the code said "if the length of a string is greater than zero", I replaced that with just "if string" — and this saved me about 1000 mutations which were very obvious things. I let this run, and the execution time was now a little over six hours. That's about a 20x improvement in speed of execution, but still quite slow for any practical purpose.

So, the way to use mutation testing at the moment, in my opinion, is this. If you're testing a very small library or a very small project, you can go full-on into CI — just say: okay, this is my command line, schedule my mutation testing jobs, and let them run for 10, 15, 20 minutes; that should be fine. If your project is anything bigger than 200 lines, this is not going to work very well. But what you can do is create a commit hook or pull-request hook which examines the payload. The first thing you can do is schedule mutation testing only against the files which have been modified by that particular commit or pull request, and that should be faster. The next thing you can do, depending on the tooling, is go down from the module level to the class and method which have been modified. The Python tool doesn't know anything about classes and methods — sorry, it only knows about modules — so regardless of whether you have one class or 1000 classes in one module, it loads the entire module and starts mutating everything. The Ruby tool, on the other hand, knows about classes and methods, I think, and you can tell it: mutate only that particular piece. You can try to go even further: because this is a pull-request or commit hook, you have access to the actual diff, and — assuming you've tested everything before and it's fine — you can schedule mutations only against the lines which were changed. You can do this, and it's not really impractical. Another thing you can do is go parallel, again depending on the tools. The Python tool is built around Celery, so you can very easily hook it to some messaging back end like RabbitMQ, schedule hundreds and thousands of messages, let your infrastructure deploy Docker instances or virtual machines in the cloud, run your tests in parallel, and just get back the results. You can do this if you have a lot of money, of course.

Here is a list of some mutation testing tools. I have been using only the first two: I mostly use Python and I contribute to Cosmic Ray, and I've used mutant for Ruby, but not very actively at the moment. The names to the right are GitHub repositories. As far as I know, the tools at the top are based on the abstract syntax tree of the language, so they are language-specific: whatever the tool does, it works on the AST and modifies its nodes. One thing I like to do is look at the other tools and see what they're doing, especially in terms of mutation operators, and try to bring those to Python. So if you want to actively use mutation testing, I really advise you to look at tools for other languages and see how they work, because many of these tools are very new and not very mature. At the bottom is another tool for mutation testing, the Mull project, which Alex Denisov will be presenting later today — so I'll definitely be checking it out. It's an LLVM-based tool, so it should work for several different languages; if you are into mutation testing, check it out as well.

Now, before I go further, I can take some questions from the audience if you like. Okay, go ahead. So the question is: if we have a zombie but we have 100% code coverage, doesn't that mean the coverage was wrong? Okay — no, it doesn't mean coverage was wrong; I mean, the measurement itself was probably correct. There are other problems with coverage. For example, if you have one line with a long Boolean expression, it still gets counted as covered regardless of how much of that expression is evaluated. We may be evaluating only the first part of the expression and still cover the line, while the next ten parts of the Boolean expression are never evaluated. As for how to count these cases — you should take that up with the authors of the coverage tools. But there are many, many publications online about the problems with coverage and why it isn't really a good metric. For me, coverage is a vanity metric: it tells you how much of the lines you've executed, but nothing more. If you have a zombie on a line which was covered, it simply means you executed the line but probably didn't assert on some condition — or you asserted on one condition and you needed to assert on two. It's unlikely to have 100% coverage and still have zombies, but most of the tools count line coverage, and very few count statement coverage.

Okay, we have a question there. Sorry, what? Okay — so the question is: what if we don't use primitives, we use some library for the business logic — how does mutation testing come into play?
Well, this is very dependent on the tools that you use. For example, the Python tool, until recently, was very poor on mutation operators, because it's a new tool and not many people use it. On the other hand, the Ruby tool called mutant, I think, is one of the best tools currently in existence. It has tons of mutation operators, tons of conditions it understands, because it's used by people who work on commercial software. They get paid to actually write test suites for commercial software, and they support the tool for that reason. About JavaScript, I really have no idea; I don't use JavaScript. But really, the key here is that you need to know your tools very well and you need to know the software very well, and that's when you can decide whether or not this is going to be useful. Maybe in your case you might write a plugin, or an extension to the tool, and produce the mutations based on the functions in that particular library.

Okay, question here. Okay, so the question was about equivalent mutants. Equivalent mutants are mutants which change the code but don't really change behavior in any practical way.
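To ground the definition, here is a hedged, made-up sketch of an equivalent mutant, assuming ages are always whole numbers: for integers, `age < 18` and `age <= 17` accept exactly the same values, so no test can ever kill the mutant.

```python
def is_minor(age: int) -> bool:
    return age < 18

def is_minor_mutant(age: int) -> bool:
    # Mutant: `< 18` changed to `<= 17`. For integer ages this is
    # behaviorally identical, so the mutant is *equivalent*: no test
    # that passes only integers can make the two functions disagree.
    return age <= 17

# Exhaustive check over a range of integer ages: the mutant survives.
assert all(is_minor(a) == is_minor_mutant(a) for a in range(0, 130))
```

The mutation tool reports such a mutant as surviving, but there is no assertion you could add to kill it, which is why equivalent mutants end up as noise in the score.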
So for example, we may have a less-than operator which in practice is equivalent to less-than-or-equal, depending on the values we accept in the application. I don't have any concrete measure of this in practice, but I think about 10 or 15 percent of the time I see equivalent mutants. Because the syntax changes but the behavior doesn't change, the test suite doesn't fail, so we cannot kill them and they just stay as they are. In the projects I use mutation testing for, I usually have some threshold, about 10 percent or something like that. If the mutant score is above a certain line, then I say we're good and the CI system goes green; if we drop under that line, then we go red and inspect what's happening. And that threshold is based on how many equivalent mutants I have, so I try to get some idea about that and go from there. I've talked to other people who do mutation testing and have been doing it for a long, long time, and this is a problem. But still, the benefits you get from asserting on all those different conditions and from looking at your code base and understanding the software much better, which is a result of mutation testing, are greater than the cost of having to deal with equivalent mutants.

Okay, more questions, or no questions? We have two minutes, so if there are no more questions, I'll just show you this very quickly. I have started to document my findings about mutation testing, first of all so I don't forget them, and some of them make really good examples. This is available on Read the Docs and it's also available on GitHub if you'd like to contribute, and since last week we also have a Chinese translation. I'll skip the trivial examples and show you the last one. We have this in Python.
It's fairly common to have methods for equality which compare two objects by checking that all of their attributes are equal. In this example, we have a class called Sandwich: you can modify the meat of your sandwich and the bread type of your sandwich, and we say that two sandwiches are equal if their meat and bread attributes are equal, and that's about it. We also have a safety check here: if the other object is None, then it returns False, and the not-equals method is just a negation of the equals.

The way to test this with mutation testing in mind is like so. We create two test objects, sandwich one and sandwich two; by default all of their attributes have a value of empty string, which is not shown on the screen. The first thing we do is test for equality: they should be equal, because everything is an empty string inside these objects. We also assert that when the two objects are compared for non-equality, it returns False, and that takes care of half of the testing. Then we test the safety check: compare to None and expect them to not be equal. And then comes the fun part: the mutation tool starts changing these into different comparison operators, and the way to test against that is like this. We need to modify only one of the attributes on one of the test objects and leave the rest of the attributes, and the other object, unchanged. And we need to do this for every single attribute. When all the attributes are of the same type, you can just do this in a loop, and if they're not of the same type, you can do grouping or something like that to adapt this block.

So the time is up. Thank you very much.
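The Sandwich example is only described verbally above, so here is a reconstruction as a sketch. The class name comes from the talk, but the exact attribute names and test spellings are assumptions, since the slide code is not in the transcript.

```python
class Sandwich:
    def __init__(self, meat="", bread=""):
        self.meat = meat    # both attributes default to empty strings
        self.bread = bread

    def __eq__(self, other):
        # Safety check from the talk: comparing against None is not equal.
        if other is None:
            return False
        return self.meat == other.meat and self.bread == other.bread

    def __ne__(self, other):
        # Not-equals is just the negation of equals.
        return not self.__eq__(other)


# Mutation-resistant tests: assert equality both ways, check the None
# guard, then flip exactly one attribute at a time on exactly one object.
one, two = Sandwich(), Sandwich()
assert one == two
assert not (one != two)
assert one != None  # exercises the safety check

for attr in ("meat", "bread"):
    setattr(one, attr, "changed")
    assert one != two        # kills mutants in the comparison of `attr`
    setattr(one, attr, "")   # restore before testing the next attribute
```

Changing one attribute at a time is what kills mutants that drop or swap a single term of the `and` expression; modifying both attributes at once would let such mutants survive.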