Alright, so I'm also going to be talking about testing, but specifically mutation testing, which is kind of a meta subject. A little bit of background: I wrote a tool called Vertigo, which implements mutation testing for smart contracts, and which I released about a year ago. In this presentation I'll first give a little bit of background on mutation testing, a crash course of sorts. Then I'd like to talk about the features that are currently in the tool, the features that I would like to add, and how other parts of the technology stack can enable these features, because mutation testing tools build on top of other tools, like the Solidity compiler.

When we talk about mutation testing, you always first have to talk about code coverage. Code coverage is the ubiquitous metric used to evaluate test suites at the moment; I think almost everyone uses it to see whether their test suite is adequate. There are a few issues with code coverage. Specifically, it doesn't really tell you anything about the code that you do cover, but it does tell you something, namely which code you don't cover yet. So it is actually a very useful metric, because it helps you improve: it tells you where to change your tests to get a better test suite with better guarantees. Mutation testing, on the other hand, tries to improve on this metric by telling you exactly how effective your test suite is at detecting bugs, rather than just telling you which parts of the code are covered. That is great, because it allows you to improve your test suite even more efficiently, and it also gives you a nicer measure of the guarantees given by your tests.

To give an overview of mutation testing, this is the general strategy. We start out by creating a lot of bugs that could be introduced into a Solidity project. We run the test suite for each bug and see whether the bug is detected. Then we know where the bugs that didn't get detected live, and we can improve the test suite.

How does generating these bugs happen? In mutation testing we call these bugs mutants, because they are mutations of the original program, and we use so-called mutation operators to take the original project and modify it to get a bug into the program. Mutation operators are basically rules, often substitution rules, which describe how to inject bugs. For Vertigo, I looked at existing tools and the mutation operators used there, as well as at previous research, and I also implemented some operators that are specific to the kinds of weaknesses that occur in smart contracts. To give you an example of what mutation operators do, I made a little table. For example, we swap an addition sign with a subtraction sign. Another one, which I really like, is the modifier-removal mutation operator, where we look at the source code and generate a bug by removing a modifier. What I like about this is that it emulates the case where a developer forgets to include some authorization or authentication logic, which is generally implemented using modifiers, such as the onlyOwner modifier.
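To make that concrete, here is a minimal sketch of what substitution-style mutation operators could look like. This is hypothetical Python, not Vertigo's actual implementation; the operator list and the generate_mutants helper are made up for illustration.

```python
import re

# Hypothetical substitution-style mutation operators, in the spirit of the
# table above. This is an illustrative sketch, not Vertigo's actual code.
OPERATORS = [
    (r"\+", "-"),                 # swap an addition sign for a subtraction sign
    (r"\bonlyOwner\b\s*", ""),    # remove an (assumed) onlyOwner modifier
]

def generate_mutants(source: str):
    """Yield one mutated copy of `source` per matched operator site."""
    for pattern, replacement in OPERATORS:
        for match in re.finditer(pattern, source):
            yield source[:match.start()] + replacement + source[match.end():]

# Every yielded string is one first-order mutant of the original contract text.
original = "function withdraw() public onlyOwner { balance = balance + 1; }"
for mutant in generate_mutants(original):
    print(mutant)
```

Each mutant changes exactly one site, so a surviving mutant points at one specific spot the test suite failed to check.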
So, having generated a bunch of bugs, we get to the part where we need to evaluate them, to know whether the tests detect them or not. That is the next step. The two basic things that can happen are that the test suite succeeds or fails. If the test suite succeeds, that means we were not actually able to find the introduced bug, in which case the mutant survived. The other case is where the test suite fails, in which case the mutant is detected and we say it's killed, or dead.

Then there are two additional categories to deal with some edge cases. For example, what can happen is that a mutation operator modifies a piece of code in a way that the Solidity compiler will not successfully compile, and we cannot even run the test suite. In that case we say the mutant errored. The fourth case is timed out, and that's included because mutants can also introduce infinite loops. So what we do is take a timeout based on the time the original test run took, and after that time has expired we kill the execution of the test suite and categorize the mutant as timed out.

Then there's a fifth category, which I put in brackets because it's not really a category of its own, but rather a specific kind of live mutant. What mutation operators can also do is modify code so that there is a syntactic change that doesn't change the underlying meaning of the code. To give you an example, I have an implementation of the max function here, and as you can see there's a slight difference in the comparison operator: the original returns something like a >= b ? a : b, while the mutant returns a > b ? a : b. The difference only shows up at the point where a and b are equal, but for the evaluation of the max function it doesn't matter which branch you take when a and b are equal, because both are the maximum. So this is an equivalent mutant, and we don't want to count it as a live mutant, because the test suite was not insufficient; it was doing fine, since this mutant is actually also correct. Unfortunately, identifying these is a somewhat manual process. We implemented a feature to automate it a little bit, but it's not possible to automate in general, so there will always be some manual categorization required when you're performing mutation testing.

Once we have categorized all the mutants, we can compute the mutation score, which is similar to the code coverage metric, except that in this case we're measuring how effective the test suite is at detecting bugs. The mutation score is basically the rate at which you detect mutants: you take the total number of mutants that were killed, or detected, and divide by the total number of valid mutants, which are the non-equivalent live mutants plus the killed mutants; the other categories are left out of the mutation score. This gives you a general metric of the quality of your test suite, and the specific surviving mutants give you really detailed information about which parts of the code, or rather of the test suite, you could improve.
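Written out, the formula comes down to this; this is my reconstruction from the description just given, with errored and timed-out mutants excluded entirely:

$$\text{mutation score} = \frac{|\text{killed}|}{|\text{killed}| + |\text{live}_{\text{non-equivalent}}|}$$

So a score of 1.0 means every valid mutant was detected by the test suite.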
So that was a bird's-eye view of mutation testing theory. Now I'd like to go over a few of the features that are in Vertigo, the tool, right now, then what's on my wish list, followed by how foundational technology such as the compiler or the test framework can enable or support the development of these features.

First, what do we have today? There are a few features listed here, which I think are the main ones. The first is parallel evaluation, which is pretty straightforward: instead of running the test suite sequentially for each mutant, we run them in parallel. Chances are there are going to be a lot of mutants, so we won't run all of them at once, but we'll be able to use whatever computational resources your machine has. The second optimization is mutant sampling, where instead of taking the entire set of mutants generated in the first phase, we take a random sample. This gives you an estimation of what the original mutation score would have been, while drastically reducing the time you need to compute it. There's a balance here, because there is of course some inaccuracy introduced by taking a random sample.

The third feature, which I think is a really nice one, is support for universalmutator-style rules. universalmutator is a project for general mutation testing whose authors designed a way of formulating mutation rules using regex patterns. This allows you to easily develop and try out different mutation rules, but it also enables you to write mutation rules specific to a certain project; for example, you might write mutation rules for a specific SafeMath library. Here I have a little example from the universalmutator repository itself, which describes some mutation rules for Solidity, specifically for the time keywords. That should give you some idea of what it is capable of.

Lastly, we have compiler equivalence, which is enabled by the Solidity compiler, and it has to do with those equivalent mutants. What Vertigo does is take the original program and the mutated program and compile both. It then compares the generated bytecode, and if the two are equal, it concludes that the source code must also have been equal in meaning, and it disregards the mutant entirely, so it won't even start the testing process. This assumes, of course, that the compiler is correct. Let's hope it is.

Okay, so what's next? There are two categories: the first is optimizations, the other is usability. There's no time to really do a demo, but these would really help. First the optimizations. The first is incremental evaluation, which would be really nice to have, because in a regular scenario you would run mutation testing every once in a while, and between those runs you can reuse a lot of information. For example, you can look at the previous analysis results, see which test killed which mutant, and when you encounter the same mutant again, try to find that test and run specifically that test first, instead of the entire test suite. That can save a lot of time: maybe 80% of the mutants will be killed by the same test, and that saves a ton of execution time, because you won't have to execute the entire test suite every time.

Then there's mutant clustering, which is similar to mutant sampling, but instead of taking a sample from the entire set of mutants, we group them first and take samples from the groups. This gives a more accurate estimation. The third optimization, which I would really like to have, is test selection based on code coverage. Given detailed coverage information, specifically which tests cover which lines of code, you select just the tests that cover a mutant, rather than running the entire test suite. Assume, for example, that each mutant is only covered by 10% of the test suite. Then this optimization gives you a performance improvement of ten times, which is huge: a run that would have taken ten days takes one.
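To sketch the test-selection idea in code — a hypothetical outline, not an existing Vertigo feature; the select_tests helper and the coverage map are assumptions about what a per-test coverage tool could provide:

```python
# Hypothetical sketch of coverage-based test selection. Assumes we already
# have a map from each test to the set of source lines it executes.
def select_tests(coverage: dict, mutated_line: int) -> list:
    """Return only the tests whose covered lines include the mutated line."""
    return [test for test, lines in coverage.items() if mutated_line in lines]

coverage = {
    "test_withdraw": {10, 11, 12},
    "test_deposit": {20, 21},
}
# A mutant on line 11 only needs test_withdraw; test_deposit can be skipped.
print(select_tests(coverage, mutated_line=11))  # ['test_withdraw']
```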
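And to give a rough idea of the universalmutator rule format mentioned a moment ago: each rule pairs a regex pattern with a replacement, separated by "==>". These lines are my own illustration of the style, not the exact rules from the repository:

```
# Illustrative universalmutator-style rules: regex pattern ==> replacement.
# Swap Solidity time units, in the spirit of the repository's Solidity rules:
 days ==> hours
 minutes ==> seconds
# Swap arithmetic, like the operator table earlier:
\+ ==> -
```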
Anyway, then there's the usability aspect. I didn't get to show this, but if you set up Vertigo to run parallel evaluation, you currently have to instantiate a bunch of Ganache networks yourself. That setup process shouldn't really be necessary, I think, and removing it would make setting up mutation testing a lot easier, because you won't have to set up, for example, 12 development networks by hand. The other part of this is that I believe Ganache is not made to run a test suite 500 times: on my machine it created a few million files, which used up the inodes and broke almost everything. With dynamic Ganache network creation you could clean up after each test suite run and prevent this problem from happening. Then we also have framework expansion, which basically means that I would extend Vertigo to work with other commonly used frameworks; currently we only really support Truffle.

So, then there are three areas of foundational technology that support the development of mutation testing frameworks in general, and Vertigo specifically: the compiler, the test framework, and then a little category of other things.

First the compiler. The Solidity compiler actually already does a bunch of things right for mutation testing. The first place where Vertigo really uses it is the AST it generates: instead of doing manual parsing and analysis of the source code, we use the Solidity-generated AST to find the exact locations we want to mutate, and based on the information in the AST we determine how we should modify the original file to get the mutant. Since a recent version of Solidity — I think it's 0.6.2 — it's also possible to recompile from a modified AST, which would make this process even easier. On the other hand, that would likely require tighter coupling between the mutation testing tool and the test framework, because you need to be part of the compilation process. The second place where the compiler really helps out is the compiler equivalence feature, which I mentioned previously, so I won't go into it again.

Then we have the test framework, which enables almost everything in the mutation testing process, because it's our interface to the project. The first part is the interaction with the unit tests, which is not optimal for mutation testing at the moment, because as far as I know it's not easy to directly execute a single test, or a list of tests, which a mutation testing framework will want to do. For example, the test selection optimization and the incremental evaluation optimization would both require this functionality. I do think it's possible to single out specific files to run, so that's already an improvement over generically running the entire test suite, but a more detailed interface would be super beneficial for these kinds of testing tools. The other part is automation of test network creation. I think this is handled by some of the test frameworks or IDE frameworks, but only to a limited extent, because I guess it's not a common use case to run your test suite in parallel five times; that's probably why creating and cleaning up multiple test networks all the time hasn't been implemented yet. But this is also a feature that could be handled outside of the test framework.
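To sketch the kind of network management I mean — a rough illustration that assumes ganache-cli is installed and accepts a --port flag; the helper functions are hypothetical, not Vertigo's current behavior:

```python
import subprocess

# Rough sketch: spin up one Ganache instance per parallel worker, then tear
# them down after the run, so stray processes and files don't pile up.
def start_networks(count: int, base_port: int = 8545) -> list:
    procs = []
    for i in range(count):
        procs.append(subprocess.Popen(
            ["ganache-cli", "--port", str(base_port + i)],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        ))
    return procs

def stop_networks(procs: list) -> None:
    for proc in procs:
        proc.terminate()  # clean up after the test suite run
        proc.wait()

networks = start_networks(4)
# ... evaluate one mutant's test run against each port ...
stop_networks(networks)
```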
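And going back to the AST point for a moment, the idea is roughly this — a minimal sketch that assumes solc's --ast-compact-json output, where nodes carry "nodeType", "operator", and a "src" range of the form "start:length:file"; the find_additions helper is made up for illustration:

```python
import json
import subprocess

# Minimal sketch of AST-driven mutation-site discovery, in the spirit of how
# Vertigo uses the compiler instead of hand-parsing Solidity source.
def find_additions(source_file: str):
    output = subprocess.run(
        ["solc", "--ast-compact-json", source_file],
        capture_output=True, text=True, check=True,
    ).stdout
    # The JSON follows a short header printed by solc; parse from the first '{'.
    ast, _ = json.JSONDecoder().raw_decode(output[output.index("{"):])

    sites = []
    def walk(node):
        if isinstance(node, dict):
            if node.get("nodeType") == "BinaryOperation" and node.get("operator") == "+":
                start, length, _file = node["src"].split(":")
                sites.append((int(start), int(length)))  # byte range to rewrite as "-"
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(ast)
    return sites
```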
And lastly, there's the item of test evaluation speed. Really, any improvement to evaluation speed — whether a compiler optimization, an EVM optimization, or some other optimization — also shows up in the performance of a mutation testing tool. Then the last category, miscellaneous. I put detailed code coverage here, because code coverage is currently handled by a separate tool, I think solidity-coverage. It's on the issue tracker for that project, but I'm not sure when it's going to be implemented. This is one thing that would enable the test selection optimization, and I believe some other features outside of mutation testing as well.

So that was the overview of mutation testing theory, what the tool does right now, what could be improved, and how other parts of the tool stack might help with the implementation of these optimizations. I think there's some time for questions. I'd also love to hear any suggestions, maybe improvements on usability, or other questions.

Yes, thank you for your talk. As was already outlined, we still have five minutes left for questions. So feel free to raise your hand if you have a question and you are attending here in the video conference. If you are watching the live stream, please put your question in the Gitter chat; we will be patient and wait a bit, because we know there's a delay. Any feedback here from the room? It does not seem like it so far, but let's give it a few more moments, to also give the people on the live stream a chance to react. In the meantime, just checking in: Leo and Martin, you're both there already, right? Yes, I can see you. And Nicholas wants to say something. Yes, go ahead.

I've tried using a couple of mutation testing tools on the OpenZeppelin Contracts library a couple of times, and the issue always is that the test suite is so large, and the number of contracts so large, that running the entire test suite for a single mutation doesn't make any sense. Instead of having that automated — like, only running the tests that need to be run based on coverage — if you could have a way to very simply say: only do mutations on this contract, and then only run these tests. That could be as simple as providing a way for the user of the library to write the test command, where they say to run only these specific tests. That would make it much more usable.

I think you could certainly do that. Vertigo already supports the inverse of this, by allowing you to ignore certain directories, so I guess what you could do right now is ignore everything except the thing you'd like to test. There's no selection of tests implemented yet, but I think I could add that. Your point about the duration on a project like yours is valid, though. I ran Vertigo on OpenZeppelin as well, and using 16 parallel processes it took between two and three hours. That is, as you said, a really long time, but I think that given these optimizations, for example the test selection and incremental evaluation optimizations, it could be cut down a lot. For example, test selection at a ten-times speed-up would decrease that to maybe 30 minutes or so. Of course, it would have to be evaluated to see whether that's actually the speed-up you would get, but I think with some optimizations it should certainly become a lot more usable and practical to run in a CI setup, or at least frequently.
That sounds great. Thanks.