Okay, so really sorry for the delay, but let's get started. My name is Moritz Kronbach. I work at Blue Yonder in Germany, and I'm going to talk about pretty much the same thing as Tom just did. I hope those who watch both talks will still find some interesting things.

First, a little bit about me and why I want to talk about this topic. My job is doing predictive analytics. That basically means predicting the future, but actually it doesn't have anything to do with this. It looks more like this: we have a machine learning pipeline with data pre-processing and external data sources, and then some super secret algorithms which produce a machine learning model.

We work with big data. If you do an image search for big data, you will find that big data is shiny, clean and, most importantly, blue. From my experience, big data is often not shiny and not clean. It can be very complex and dirty and nasty to work with. So it's more like this. But it's definitely blue.

So big data is really a breeding ground for weird edge cases and things that can go wrong. Of course we have to find ways to protect ourselves against these things and discover potential problems before production. Randomized testing, or as I call it, dynamic testing, is one such tool to help us find problems before going into production, to reduce stress for us and for our customers.

Okay, so what do I mean by dynamic testing?
Basically, I mean both property-based testing and fuzzing. These are usually seen as separate things, which sometimes makes sense because they are often used for very different purposes. But it's also sometimes sad, because there is lots of cool stuff in both fields, and often it doesn't make sense to separate them.

So by dynamic testing, I mean any testing where test cases are generated automatically: either by fuzzing, which means the user gives some example input, which is then mutated automatically, or by parameterized templates, which means the user gives some generative model from which test cases are generated. The function's behavior is then checked for properties like "does not crash", "does not time out", or other universal properties, such as mathematical relations that the function's result should fulfill.

In contrast to dynamic testing, there is of course traditional static testing, where the test cases are provided by the user and the function's behavior is also precisely defined by the user. I want to compare the two types of testing a bit, and for this I want to introduce two attributes of tests.
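To make the fuzzing half of that definition concrete, here is a minimal mutation-fuzzing sketch using only the standard library. It is not Hypothesis and not how any particular fuzzer works: the seed document, the three edit operations and the choice of json.loads as the target are assumptions for illustration. The only property checked is a universal one: the target either parses the input or rejects it with its documented error, and never fails in any other way.

```python
import json
import random
from typing import Tuple


def mutate(seed: str, rng: random.Random) -> str:
    """Apply one random edit (insert, delete or replace a character) to seed."""
    chars = list(seed)
    op = rng.choice(["insert", "delete", "replace"])
    if op == "insert" or not chars:
        chars.insert(rng.randrange(len(chars) + 1), chr(rng.randrange(32, 127)))
    elif op == "delete":
        del chars[rng.randrange(len(chars))]
    else:
        chars[rng.randrange(len(chars))] = chr(rng.randrange(32, 127))
    return "".join(chars)


def fuzz_json(seed: str, iterations: int = 1000, rng_seed: int = 0) -> Tuple[int, int]:
    """Mutation fuzzing of json.loads, starting from a user-supplied example.

    Property: every input either parses or raises ValueError
    (json.JSONDecodeError is a subclass). Any other exception propagates
    and signals a bug. Returns (accepted, rejected) counts.
    """
    rng = random.Random(rng_seed)  # fixed seed keeps runs reproducible
    accepted = rejected = 0
    for _ in range(iterations):
        candidate = seed
        for _ in range(rng.randint(1, 3)):  # stack a few edits per input
            candidate = mutate(candidate, rng)
        try:
            json.loads(candidate)
            accepted += 1
        except ValueError:
            rejected += 1
    return accepted, rejected


if __name__ == "__main__":
    print(fuzz_json('{"a": [1, 2.5, "x"], "b": null}'))
```

The generative alternative would replace mutate with a function that builds inputs from scratch according to a model of valid data; the property check stays the same.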
The first is precision, which means how closely the expected behavior of the function is defined. The other is case coverage, sometimes called input space coverage, which means what proportion of the input space is covered by test cases.

I've had some discussions about whether case coverage matters. For example, does it matter if we check 5 out of 2^64 cases or 5,000 out of 2^64 cases? In either case it's still a really, really small proportion of all possible test cases. But I think it does matter, and I'll give one example to illustrate this. Let's say you have implemented an algorithm, and there's a numerical instability in it that you don't know about, and only one in a thousand inputs is affected. If you use five test cases, you have a probability of about 0.5% to detect this instability. If you use 5,000 test cases, you have a probability of nearly 100% to detect it. There are other reasons too, as Tom said before: independent code path coverage can increase, which is much stronger than just branch coverage. So in general, dynamic tests help you find classes of test cases that you didn't think about before.

Usually it's like this: static tests have very high or even perfect precision, but low case coverage, and dynamic tests sometimes have lower precision, because we have to define general properties, but higher case coverage. Sadly, you usually can't have both high precision and high case coverage, so often each test gives you just one of the two. That's why it makes sense to use both static and dynamic testing if you want to maximize robustness.

Okay, now a bit more practical. I'm going to show some examples of dynamic testing in Python, and I'm going to use Hypothesis, which is QuickCheck-style testing for Python. I really like to work with Hypothesis.
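The detection numbers above follow from a simple model: if each test case independently hits the instability with probability p = 1/1000, then n cases all miss it with probability (1 - p)^n. A quick check of that arithmetic, where the independence assumption is ours (real inputs are rarely that uniform):

```python
def detection_probability(failure_rate: float, n_cases: int) -> float:
    """Chance that at least one of n independent test cases triggers the bug."""
    return 1 - (1 - failure_rate) ** n_cases


if __name__ == "__main__":
    for n in (5, 5000):
        print(f"{n:>5} cases: {detection_probability(1 / 1000, n):.2%}")
```

This prints 0.50% for five cases and 99.33% for 5,000 cases.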
It's stable, but still in steady development, and it has a lot of innovative features, some of which I am going to show.

Okay, first example. Your colleague went on holiday, and he left you this function, which calculates the Fibonacci numbers. Does everyone know what the Fibonacci numbers are? I think yes. Okay, so anyway, if you look at this code, it's not really obvious that this formula computes the Fibonacci numbers. The Fibonacci numbers are integers, and here there's some stuff with square roots and then an inverse, so we are not really sure if this is correct.

But our colleague also left us some tests. He's testing the base cases, that the first and second numbers are equal to one, then two more cases which are easy to compute in your head, and even another case, 50, which he probably looked up on Wolfram Alpha. We run these tests and everything is fine. But I'm still not really trusting this code, so I'm going to write a dynamic test.

For this, I will first set up Hypothesis. The most important piece of Hypothesis from the user's perspective is the given decorator, which provides a test function with automatically generated test cases. The other important pieces are strategies, which define how data is generated. So here we import a strategy for generating integers. Then we import some settings to derandomize Hypothesis, because for the presentation I don't want any random behavior, and we limit the number of iterations and set a timeout. Hypothesis has a built-in example database where it stores any examples that falsified an assertion before, and every time you run your tests again, these examples will be run too. Of course that makes the runs non-deterministic, so I'm disabling this for the presentation.
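The colleague's code isn't reproduced in this transcript, so the following is a hypothetical reconstruction: a closed-form Fibonacci function (Binet's formula) evaluated in floating point, plus static tests of the kind described. The exact formula and the two "easy" cases (3 and 10) are assumptions; 12586269025 is the true value of F(50).

```python
import math


def fib(n: int) -> int:
    """Closed-form Fibonacci via Binet's formula, evaluated in floating point.

    Hypothetical stand-in for the colleague's function: exact for small n,
    but the float rounding error grows with n.
    """
    sqrt5 = math.sqrt(5)
    phi = (1 + sqrt5) / 2
    return round(phi ** n / sqrt5)


def test_fib_static():
    # Base cases, two cases easy to compute by hand, one looked-up value.
    assert fib(1) == 1
    assert fib(2) == 1
    assert fib(3) == 2
    assert fib(10) == 55
    assert fib(50) == 12586269025


if __name__ == "__main__":
    test_fib_static()
    print("static tests pass")
```

All five static cases pass, which is exactly why they give false confidence: they say nothing about the inputs in between or beyond.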
This is my dynamic test. Let's look at this function first: test_fib_recurrence takes one parameter, n, and we check that the function fulfills the recurrence that defines the Fibonacci sequence together with the base cases. This recurrence should hold for all values of n of at least three.

Let's run this, and we find a falsifying example, namely n equals 71. So this assertion fails for n equals 71.

Let's take a closer look at what happened. I've added the settings parameter to given and increased the verbosity of the output, and we see that the first falsifying example is n equals 805. But Hypothesis doesn't stop at this point. It now tries to find the simplest falsifying example, and for integers this means finding the smallest counterexample. What it does is start with small values, n equals 3, 4, 5, and then pretty much a stochastic binary search to find the smallest counterexample. So it goes up to 67 here, which still works, and then 131, which doesn't work anymore. So it goes back one step and tries to find the counterexample somewhere between 67 and 131. After a few more iterations of this, it finds the example we have seen before, namely 71.

Okay, to summarize this: the first step when Hypothesis runs is sampling. It generates test cases until it finds a falsifying example or until the maximum number of iterations is reached. When it finds a falsifying example, it tries to shrink this example, meaning it tries to find the simplest falsifying example. For integers, this is the smallest example. For strings,
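With Hypothesis, the test is roughly a function decorated with given(integers(min_value=3)) that asserts fib(n) == fib(n - 1) + fib(n - 2). To show what the two phases do under the hood, here is a toy, standard-library-only sketch of sampling followed by shrinking. It is not Hypothesis's actual algorithm: the uniform sampling range, the greedy "halve the distance" shrinker, the helper names and the Binet-style fib are all assumptions. This shrinker only guarantees a locally minimal counterexample (n - 1 passes), whereas Hypothesis's run in the talk reported 71.

```python
import math
import random


def fib(n: int) -> int:
    """Hypothetical closed-form Fibonacci (Binet's formula in floating point)."""
    sqrt5 = math.sqrt(5)
    phi = (1 + sqrt5) / 2
    return round(phi ** n / sqrt5)


def recurrence_holds(n: int) -> bool:
    """The property under test: F(n) = F(n - 1) + F(n - 2) for n >= 3."""
    return fib(n) == fib(n - 1) + fib(n - 2)


def sample(prop, lo: int, hi: int, max_examples: int = 200, seed: int = 0):
    """Sampling phase: draw random n in [lo, hi] until prop fails.

    The fixed seed plays the role of derandomization. Returns None if no
    falsifying example is found within max_examples draws.
    """
    rng = random.Random(seed)
    for _ in range(max_examples):
        n = rng.randint(lo, hi)
        if not prop(n):
            return n
    return None


def shrink(prop, lo: int, failing: int) -> int:
    """Shrinking phase: greedily move to smaller inputs that still fail.

    Candidates go from most to least aggressive: lo itself, then points that
    halve the distance to the current counterexample, ending at current - 1.
    Stops when no candidate fails, so the result fails while result - 1
    passes (or the result equals lo).
    """
    current = failing
    improved = True
    while improved:
        improved = False
        step = current - lo
        while step > 0:
            candidate = current - step
            if not prop(candidate):
                current = candidate
                improved = True
                break
            step //= 2
    return current


if __name__ == "__main__":
    first = sample(recurrence_holds, 3, 1000)
    print("sampled counterexample:", first)
    if first is not None:
        print("shrunk counterexample:", shrink(recurrence_holds, 3, first))
```

Trying lo first mirrors the talk's observation that Hypothesis starts with small values before searching between a passing and a failing input.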
it is the shortest string, for example.

Yeah, so we've seen two key elements of Hypothesis: given, which is the decorator that supplies test functions with data, and strategies, which describe how data is generated. And I think we have seen that dynamic testing is very easy to implement and often very useful for math-heavy code, where you have nice, beautiful properties that describe your function's behavior.

But let's do a toy example with some less mathematically elegant code. It's really just a toy example: we want to test the quote function from the urllib package. The quote function prepares a string to be used in a URL, so you can see here that it encodes the space as %20. We can run this static test, and of course it passes.

To write a dynamic test for this, we have to think about what we expect from the quote function: what properties should it fulfill for any input? One thing we might expect is that no information is lost when quoting and then unquoting. So if you quote a string and then unquote it again, you would expect to get the original string back. Let's try this.

And we find a falsifying example. So have we found a bug in urllib? Sadly not, or maybe not sadly. Anyway, we just weren't careful about what kind of input we generate. The text strategy generates strings consisting of all possible Unicode characters, and the counterexample we found here is just a control character, so it doesn't make sense to encode this in URLs. We can easily fix this one.
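A standard-library sketch of both tests, assuming Python 3's urllib.parse (the talk's environment isn't stated, so the exact counterexample may differ). Instead of Hypothesis's text strategy, it draws random strings from a given alphabet with a seeded random.Random; the helper names are hypothetical. The round-trip check returns the first counterexample it finds, or None.

```python
import random
import string
from urllib.parse import quote, unquote


def test_quote_static():
    # The static test: a space is percent-encoded as %20.
    assert quote("hello world") == "hello%20world"


def random_text(rng: random.Random, alphabet: str, max_len: int = 20) -> str:
    """Draw a random string of length 0..max_len from alphabet."""
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))


def find_round_trip_failure(alphabet: str, examples: int = 500, seed: int = 0):
    """Property: unquote(quote(s)) == s. Returns a counterexample or None."""
    rng = random.Random(seed)
    for _ in range(examples):
        s = random_text(rng, alphabet)
        if unquote(quote(s)) != s:
            return s
    return None


if __name__ == "__main__":
    test_quote_static()
    # Restricting to printable characters mirrors the alphabet fix in the talk.
    print(find_round_trip_failure(string.printable))
```

Making the alphabet a parameter keeps the property itself unchanged while letting you decide which inputs are in scope.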
We import the string module, pass the alphabet parameter to the text strategy, and tell it that we only want printable characters. And now it seems to work fine.

Okay, but in general, handling impure objects like strings and datetimes can be a bit challenging, not only in data generation but also in your assertions, because often the underlying functions have a lot of special cases that you somehow need to handle. I just want to give you a quick overview of what kinds of tools are available to assist you with this. Hypothesis has some things built in to help you, for example the alphabets for string generation. For datetimes, there's the hypothesis-datetime package. Who knows what the fake-factory package does? Okay, two people. The fake-factory package is really cool: you can generate all kinds of everyday fake data, like random URLs, random phone numbers and random email addresses, and hypothesis-fakefactory exposes these random generators to Hypothesis. So you can, for example, test functions that take a URL as a parameter. Hypothesis also has built-in support for custom strategies, so you can create your own strategies, for example by mapping existing basic data types to your own custom data types.

Yeah, that was quick. Just to summarize: dynamic tests increase case coverage at the cost of precision. Dynamic and static testing complement each other; neither is a replacement for the other. Sometimes it makes sense to use both, and sometimes it's more pragmatic to just use one of them. For math-heavy code, dynamic tests are really great. If it gets less mathematical and more real-world, it can sometimes be difficult, but there are tools available and they are steadily improving. And if you have a problem with any object and you think maybe there should be some tool available for this?
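Hypothesis strategies have a map method for exactly this kind of mapping (and a builds strategy for constructing objects from several strategies). To show the idea without depending on the library, here is a toy strategy combinator in plain Python: a strategy wraps a draw function, and map derives a strategy for a custom type from one for basic types. The Strategy class, the integers and tuples helpers, the Order type and the value ranges are all hypothetical; real Hypothesis strategies also support shrinking, which this sketch omits.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Any, Callable


class Strategy:
    """Toy strategy: something that can draw a value from a random source."""

    def __init__(self, draw: Callable[[random.Random], Any]):
        self._draw = draw

    def example(self, rng: random.Random) -> Any:
        return self._draw(rng)

    def map(self, f: Callable[[Any], Any]) -> "Strategy":
        # Derive a new strategy by transforming every drawn value.
        return Strategy(lambda rng: f(self._draw(rng)))


def integers(lo: int, hi: int) -> Strategy:
    return Strategy(lambda rng: rng.randint(lo, hi))


def tuples(*parts: Strategy) -> Strategy:
    return Strategy(lambda rng: tuple(p.example(rng) for p in parts))


@dataclass(frozen=True)
class Order:
    """Hypothetical domain type built from basic values."""
    quantity: int
    delivery: date


START = date(2015, 1, 1)

# Basic strategies mapped to custom types: int -> date, (int, date) -> Order.
dates = integers(0, 365).map(lambda days: START + timedelta(days=days))
orders = tuples(integers(1, 100), dates).map(lambda t: Order(*t))


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(orders.example(rng))
```

Because map only transforms values, any property you write against Order automatically inherits the coverage of the underlying integer strategies.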
Then you should just ask on the Hypothesis mailing list, for example, because they want to know about it. Okay, yeah, thanks for listening, and you get out in time for lunch.

Okay, thank you. Do we have any questions?

Hey, very interesting talk. Maybe this is covered by the custom strategies, but are there provisions for providing templates for, for example, protocol parsing?

No, I think that doesn't exist yet.

Okay, thanks.

Thanks for the talk. Do any interesting bugs come to mind that you found in your production code base?

Yes. We had one combinatorial optimization algorithm which, as input, would take some distributions that we predicted, and it would only fail if there were two degenerate distributions and at least one non-degenerate distribution. In that case it would enter an infinite loop.

So was it data you weren't expecting, or could you have hardcoded that if you had thought really hard about it as an example? Sorry: was that data you were expecting, or, if you had been hardcoding your examples, might you have come up with that?

No, I think we would never have come up with that ourselves.

Oh.

Any more questions? Okay, please join me in thanking our speaker once again.