This talk is about escaping from manual testing. But first, can I just see a show of hands: who's ever written a test? Good. Has anybody here written a test and then discovered a bug after they wrote the test? And do you all feel that this is kind of unfair? Yeah, me too. So this is about a library that I work on, which aims to make your tests more powerful: to discover more bugs with less work from you, the programmer.

So let's get started. The idea is called property-based testing, and you'll understand why in a minute. I'm going to walk you through an example of a property-based test, from coming up with the property we want to test right through to getting a correct implementation.

We start with a simple example. I'm not claiming that this is a thing you actually should test, but it's a good example to teach you what this testing style looks like. We start with pretty standard unit testing: we have two tests. The property we're testing is that the sum of a list of numbers is greater than the maximum of the list of numbers.
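As a sketch of that starting point (the function names and example values here are mine, not taken from the slides), the two hand-written tests might look like:

```python
# Example-based tests for the (deliberately flawed) property that
# "the sum of a list of numbers is greater than its maximum".

def test_sum_of_small_numbers():
    xs = [1, 2, 3]
    assert sum(xs) > max(xs)

def test_sum_of_big_numbers():
    xs = [10_000, 20_000, 30_000]
    assert sum(xs) > max(xs)
```

Both pass, and yet, as the next steps show, the property as stated is wrong in several ways.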
You add up your numbers, and you expect the result to be bigger than any number you have in the list. In our initial tests, we add up a list of small numbers and check that that's true, and we add up a list of big numbers and check that it's true. But we can do better.

The simple way is a parameterized test: here are all the inputs we want to test, and for each of those we run the test. If it fails, pytest, or our other test runner of choice, will helpfully tell us which of our examples were no good. We can do better again with Hypothesis: we can say, instead of checking these specific lists of integers, please generate any possible list of integers, and in practice this will try a few hundred examples.

So if we run this test, we learn something. Specifically, we learn that you can't take the maximum of an empty list; instead you get a ValueError. OK, we didn't actually want to check that, so we tell Hypothesis: look, we want lists of integers, but the lists have to have at least one element in them. When we run that, we discover that of course the maximum of a one-element list is the same as the sum, so we change our assertion to greater than or equal to. And if we run this one, we discover that if we have negative numbers in our list, the sum might still be less than the maximum. So we ask for integers of at least zero, in a list with at least one element, and now the test passes.

This style of testing can be applied to a very wide range of things, and I'll walk you through some of how you can generate data, how Hypothesis works, and what to do with it for the rest of this talk. But the basic principle is that humans are not so good at thinking up thousands of slightly different tests. That's the kind of work we should leave to computers, and because we're programmers, we have a program to do that.

Hypothesis gives you a huge number of ways to describe your inputs.
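The final version of the test, as I understand the steps just described, comes out something like this (the test name is mine):

```python
from hypothesis import given, strategies as st

# Non-negative integers, in a list of at least one element:
#  - empty lists have no maximum (ValueError),
#  - a one-element list makes sum and max equal, hence >=,
#  - negative numbers can drag the sum below the maximum, hence min_value=0.
@given(st.lists(st.integers(min_value=0), min_size=1))
def test_sum_is_at_least_max(xs):
    assert sum(xs) >= max(xs)
```

Calling `test_sum_is_at_least_max()` runs the body against a few hundred generated lists.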
We've seen lists of integers, but there are a number of other strategies, as we call them, to generate values or collections of values. We have none, booleans, floats, integers, datetimes, times: all the standard library types that you might use. Where it makes sense, they all have min_value and max_value arguments, so if you have a more constrained test case you can support that quite easily, and collections always have min_size and max_size. Where it makes sense for a particular kind of thing, there will be additional keyword arguments: if you're generating floating-point numbers, as well as capping the values you can also specify that you don't want infinity, and that you really, really don't want NaN. Nobody wants NaN.

Strategies, of course, may not immediately give you what you want, so they also have map and filter methods. If you want even numbers, you just take any integer and multiply it by two. For filter you get more or less the same thing: you take your strategy and give it a function, and you only keep values from that strategy for which the filter returns true. So, some demonstrations: if you take an integer strategy and map the str type over it, you get strings composed of digits. You can map multiplication by two to get even numbers, and you can filter by modulo two to get odd numbers. The filter version is actually relatively inefficient, because Hypothesis will have to keep trying to generate numbers, and whenever it generates an even number the filter will tell it no, try again. It's smart.
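A minimal sketch of those map and filter demonstrations (variable names are mine):

```python
from hypothesis import given, strategies as st

digit_strings = st.integers(min_value=0).map(str)        # "0", "17", ...
evens = st.integers().map(lambda x: x * 2)               # never rejects a draw
odds_by_filter = st.integers().filter(lambda x: x % 2)   # rejects half the draws
odds_by_map = st.integers().map(lambda x: x * 2 + 1)     # the faster equivalent

@given(digit_strings)
def test_mapped_strings_are_digits(s):
    assert s.isdigit()

@given(evens)
def test_mapped_integers_are_even(n):
    assert n % 2 == 0
```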
So it does learn from that eventually, but it's more efficient to multiply by two and add one. And finally you can do fancy stuff: you can ask for a list with at least two unique elements, even though you allow repeated elements. There are various fancier things; I'm not going to go into them at great length, so if you want to know more, I invite you to come to the open space later.

You can have a deferred strategy, and deferred in this case means that the strategy can refer to itself, in a recursive kind of way. In this case, we're giving it either integers, or lists of integers, or lists mixing lists of integers with integers, and so on forever. This one's particularly cute, because you can define any value that's valid JSON in about two lines of Python. You can also draw data within your test, which becomes really powerful if you have a multi-step logic chain: you can, for example, do something, get the corresponding value, and then sample from parts of that. And the composite decorator lets you define your own strategies in a completely from-scratch, custom way.

The final thing, which I find really useful when I'm working out what values to test, is that often the kinds of values your code accepts can be inferred from the code itself. If you have a database schema, that tells you what's valid for that database. If you have a Django model, same thing. If you have a regular expression, we can infer what strings match it. If you have numpy or pandas data types, or classes defined with attrs, or a number of other things, even functions with type hints on them, Hypothesis can inspect this and work out what ought to be valid in every case. You can also override that with explicit arguments, but if we can work it out, it'll be done automatically for you.

So I'll talk briefly about cool stuff if you want to run this in production, testing at scale. There are config options.
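The "valid JSON in about two lines" strategy mentioned above is roughly the following; this is my reconstruction, not taken verbatim from the slides (the strategy name and the roundtrip test around it are mine):

```python
import json
from hypothesis import given, settings, strategies as st

# Any JSON value: a scalar base case, extended recursively
# into arrays and string-keyed objects of smaller JSON values.
json_values = st.recursive(
    st.none() | st.booleans()
    | st.floats(allow_nan=False, allow_infinity=False) | st.text(),
    lambda children: st.lists(children) | st.dictionaries(st.text(), children),
)

@given(json_values)
@settings(max_examples=50, deadline=None)  # recursive data can be slow to generate
def test_json_roundtrips(value):
    assert json.loads(json.dumps(value)) == value
```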
Of course there are config options. You can configure the verbosity, whether you get debug info, and you can turn on deterministic mode, though we generally recommend leaving Hypothesis to random exploration, because it caches any bugs it finds. So if you run these random tests and run them again, and it found a bug the first time, it will never lose that bug. The internals are complicated somewhat by this determination not to lose bugs and not to be flaky, but we think it's worth it. There are more options: you can define these in code, load them in profiles, get them from environment variables, and so on. It's all in the documentation.

Performance. Performance is a constant issue for these kinds of tests, and I'll say Hypothesis is fast. There's a statistics option: you can check how long it took to run your tests versus generate data for your tests, compare those, and find hotspots. Typically somewhere between zero and twenty percent of the time is spent generating data to feed into the test. However, Hypothesis will run your test at least a hundred times, and if your test is slow, running it a hundred times will not be faster. If your test is fast, running it under branch coverage, which Hypothesis does so it can tell when it's exploring new parts of the space, can also be slow. If it's pathologically slow, you can disable the coverage tracking, but it adds enough power that we recommend leaving it on in almost all cases.

The final thing I'm going to talk about for Hypothesis itself is how it shrinks values, because you might remember, at the start, if we go back a bit, each time it gives us an error.
It tells us the minimal falsifying example. Here the list is just the single element zero, even though the test would also fail for a list with any number of zeros in it. Further on, the minimal list with a negative number is zero and minus one.

How this works is kind of complicated, but I'm going to walk you through the fundamental idea, because it really helps when you're defining your own data and want to make sure you get good shrinking behavior. It's essentially a three-layer thing. You have the strategy function, or the strategy object, which you've composed out of all the parts Hypothesis provides. This basically translates up and down from a stream of bytes. When you're generating data initially, these streams of bytes are just random, or pseudo-random guided by coverage, and here they're represented as a stream of bits, because bits are easier to read on a slide than bytes. Those are then translated by the strategy into the actual values that we want.

In this case, we're asking for lists of at least three boolean values. We have this infinite byte stream, and we have our output. Let's see what minimal example Hypothesis finds, with the constraint that some value has to be true; that is, the predicate is the any function. The first thing Hypothesis does is observe that, well, the stream is not infinite, because the minimal value must have fewer bytes of entropy to draw from. So we can just chop it off, and if we ever try to generate something that needs more, we throw it away as a bad example. The next thing we do is try to set a byte to zero, or to a smaller number, but for bits you've only got one or zero, so it's just set to zero. We do this and the predicate still passes, so we keep going: we try setting the next bit to zero. The way this works is that for values we know we have to have, we treat each of those bits, or bytes, or collections of bytes in the real shrinker, as values directly. But once we get to things beyond the
constraint, we first draw a bit to say whether we want another value or not, and then we draw the bit for the value itself. So if we drop that first bit to zero, our list is now of length three, and Hypothesis goes: I wonder if I can chop off those last bits? It can, so now we only have three bits of entropy. The problem here is that when it tries to flip the last bit to zero, the predicate fails; in a real test, it runs the test with this new input and the test no longer fails with the same error. So instead it perturbs that byte string a bit: it just adds one to it and tries to shrink again. In this case it can shrink further once it's done that, and this is the minimal list, the smallest possible list according to the heuristic we use, which still passes the predicate any.

I have time now for some questions, if anybody has questions about what you do with this or what it's good for.

Two kind of related questions. First of all, do you have examples of, say, user inputs that you can use, like SQL injection strings that you could pass to, I don't know, your view, and say: try these things that are known to break lots of views? And secondly, in your examples you're basically looking for an exception, but what happens when you want to check the result? You obviously don't know what the answer should be, because the input is random. Do you write certain tests where you say: as long as it hasn't thrown an exception, I don't really care what the answer is? Or how do you deal with it in the real world? Let's say we have a function which tries to find the first day in a month, and you give it lots of different dates and it finds where the first day is. You don't know what the answer should be, because it depends on the date. How do you deal with that, or is it just no exception?
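You can watch the shrinker's end result directly with `hypothesis.find`, which returns a minimal example satisfying a predicate; a small sketch of the two cases from the walkthrough (the exact shrunk values are heuristic, so the assertions below only check the minimal shape):

```python
from hypothesis import find, strategies as st

# Minimal list of at least three booleans where any() holds:
# typically shrinks to three elements with a single True.
minimal = find(st.lists(st.booleans(), min_size=3), any)
assert len(minimal) == 3 and any(minimal)

# Minimal list of integers containing a negative number:
# typically a single element, as close to zero as possible.
minimal_neg = find(st.lists(st.integers()), lambda xs: any(x < 0 for x in xs))
assert len(minimal_neg) == 1 and minimal_neg[0] < 0
```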
So, first things first: we have heuristics for things like floating-point numbers or Unicode strings, which will throw nasty inputs at you more frequently than they would generally come up. But if you wanted to test something like injection strings specifically, I would do that with something like the sampled-from strategy, where you hand it a sequence of inputs and it will just give you one of those inputs. You could say: test any string, or a string sampled from this list of known problems. For the second part, the bit of my talk we'll get to next is on how you find properties to test, which may expose bugs even when you don't know the exact answer.

When a test fails, Hypothesis is somehow able to remember this, or record this, and I wonder how that's implemented and how it's exposed. One of the reasons I write a test is: oh, I've hit a corner case that I didn't know about, and I'll write a test to document the fact that I've had to handle this corner case, because it wasn't obvious. So how does Hypothesis expose these corner cases that have failed to the programmer? Are they documented anywhere?

Yeah, so there are two parts. The first is that the way we cache it is by storing that underlying byte string: we just have a file structure that we dump in a directory, which is in the standard path in .gitignore on GitHub now. If you rerun a test which has previously failed, it will first run all those failing examples to check whether they've been fixed. If you want to document a particular edge case or bug that you know is failing, sometimes we would just say: why not write a standard unit test for that? Hypothesis is great, but I'm not saying it's the only thing you should use. We also have an @example decorator, so where you say @given lists of integers, you can say @example some specific list of integers, and it will always try that one before it starts the random generation.

In the example you gave, we're dealing
with inputs in the form of strings or integers and so on. What if the things you might be testing are relationships, of a Django model in a database for example, that are not that kind; they're a different kind of thing altogether? How would you find out whether certain relationships are likely to produce anomalous results from your code?

Partly, if you can write an assert for it in Python code which can be reached by feeding in pseudo-random inputs, Hypothesis stands a decent chance of finding it. For more complex data types specifically, there are a number of strategies to help with that. For example the builds strategy: you pass it a callable, and a strategy for each of the positional or keyword arguments to that callable, and it will draw from those argument strategies, pass the results to your callable, and give you the object. For Django specifically, we have a Django plugin which will generate your database model, save it to the database appropriately, and so on. Any others? I'll take one more and then move on to some tactics for using this.

Just a quick one. How would you recommend testing against the multiple inputs provided by Hypothesis without reimplementing the exact thing you're trying to test in the test itself? For example, if you're testing something that converts a dict to JSON: in the implementation you would call json, but you don't want to do that in the test; you just want to compare with the actual string. So how would you recommend handling something like that?
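The three APIs named in these answers, sampled_from, the @example decorator, and builds, in one sketch; the injection strings and the Point class are my own illustrations, not from the talk:

```python
from dataclasses import dataclass
from hypothesis import example, given, strategies as st

KNOWN_NASTY = ["", "' OR 1=1 --", "Robert'); DROP TABLE Students;--"]

@dataclass
class Point:
    x: int
    y: int

# builds: draw each argument from a strategy, pass them to the callable.
points = st.builds(Point, x=st.integers(), y=st.integers())

@given(st.text() | st.sampled_from(KNOWN_NASTY))  # any string, or a known-bad one
@example("")  # always tried before the random generation starts
def test_handles_any_string(s):
    assert isinstance(s, str)

@given(points)
def test_builds_makes_points(p):
    assert isinstance(p.x, int) and isinstance(p.y, int)
```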
Perfect question to move on to the next part. The second half of my talk is on how you find, or choose, properties that might reveal bugs in your code when you test them.

The classical one, which is fantastic if you have it, is called a test oracle: an implementation that you know gives you the right answer. It's pretty simple: you run your version and their version with the same inputs, and then you compare them. If the outputs are the same, your version is correct, where correct is defined as "the same as the other version". It's great for rewrites or refactorings: new hotness equals legacy. It's great for fancy algorithms: if you have some linear-time sorting algorithm, congratulations, and you can compare it to a brute-force sort to check that it really is sorting. If you have a multi-threaded thing, you can test it against a single-threaded version. If you have a test oracle, use it.

If you don't have a test oracle, there are still things you can do. One of them is function inverses: for example, serialize your data, then deserialize it, and you should get the same data back; if you don't, you know you've got a bug somewhere. Add and subtract do a similar thing. These don't have to be strict inverses in the mathematical sense: you can also, for example, set some attribute of your object, then get the corresponding attribute, and expect to get back the thing that you set. Or append something to a list, then find its index in the list.

There's another class of problems where solving them is hard, but checking a solution is easy: solving a maze, finding the prime factors of a number, tokenizing an input language. It's relatively easy to work back to your input and check that you got something right, or at least something reasonable.

Last one: there are a lot of invariants in your code, where you can say "doing this operation shouldn't affect anything", so you do the operation and see if it did. Sorting a list twice should give you the same list you got the first time you sorted it. When you sort a list, you should also have the same
number of each element as before you sorted it. That's half the definition of sorting, the other half being that the elements are now ordered. And finally there are idempotent operations, where repeating them gives the same result as doing them once, so you have invariants of your system that you can check each time.

On that note, I'm going to stop giving examples and go back to some more specific questions.

So if you have this cache of bad inputs that might cause problems, and you have a bunch of developers, do you have them each keep their own cache, and CI its own cache as well, or can you share it somehow?

The cache is designed so that it does work correctly if you just check it into your git repository. That said, we find it doesn't usually help that much, so we recommend persisting the cache between CI runs, and persisting it on each local development machine, but not bothering to share it between them. Whether you do or not will depend largely on what your team is working on, but for the general case we wouldn't bother.

How would you get started adding Hypothesis to a system that has always been designed around many slower, artisanal, handcrafted tests? Where would you crack the nut?
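The property families from this half of the talk, oracles, inverses, and invariants, condensed into one sketch; `my_sort` here is a stand-in for whatever implementation you are really testing:

```python
import json
from collections import Counter
from hypothesis import given, strategies as st

def my_sort(xs):
    # Stand-in for the implementation under test.
    return sorted(xs)

@given(st.lists(st.integers()))
def test_sort_properties(xs):
    out = my_sort(xs)
    assert out == sorted(xs)            # oracle: agrees with a trusted version
    assert my_sort(out) == out          # idempotent: sorting twice changes nothing
    assert Counter(out) == Counter(xs)  # invariant: same elements, same counts
    assert all(a <= b for a, b in zip(out, out[1:]))  # the other half: ordered

@given(st.dictionaries(st.text(), st.integers()))
def test_json_roundtrip(d):
    assert json.loads(json.dumps(d)) == d  # inverse pair: serialize, deserialize
```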
It really depends on what your tests are checking, but one approach I've had great success with is just to find a function, somewhere in the core, that you know is called a lot. Work out what you think the valid inputs are, and just call the function. Don't assert anything about it; just see if you can call it with inputs that might come up, and whether that can ever trigger an exception. Once you've done that for a few functions, look for roundtrips, look for test oracles. dateutil has, I think, two tests with Hypothesis, both of which test that the various formats round-trip successfully: dumping something to an ISO-format string and reading it back gives you the same datetime value. There are a number of other things; I'll post my slides on the Slack channel in a few minutes, and there are links at the bottom of each slide to some more resources to follow up on. Or come to the open space and ask me in more detail.

When exactly are you doing the open space?

The slot after afternoon tea. I'll be upstairs, or just come find me anyway.

Any other questions? If that's it for questions, I guess the final pitch: Hypothesis is under the MPL, so it's open source and you can use it for anything. We love new contributors and will actually help you write patches. And if you're some corporate organization with more money than time, we also offer training, or sponsored feature development if there's something it doesn't do. So thank you very much.