Hi, everyone. Welcome back. Our next talk is Stop Writing Tests, by Zac Hatfield-Dodds. Zac, the stage is yours.

Thanks very much. Today I'm talking to you about why you should stop writing tests, and given that I work on a number of testing libraries, you might guess I'm going to be a little provocative. But I'm also giving a serious talk; this is not a joke. I don't think you should do less testing, but I do think you should stop writing quite so many tests by hand as you probably do at the moment.

The first thing I need to do is an Australian tradition. I'm giving this talk from the beautiful city of Canberra, Australia, where I was born just off to the right-hand side of this photo, live just off to the left, and work in the middle. Canberra is the land of the Ngunnawal people, the first Australians, who came here about 80,000 years ago. I want to start my talk by paying my respects to their elders past, present and emerging, and by acknowledging that Australia was colonised, not peacefully settled.

But back to the testing part of this talk. When I talk about testing, I mean an activity that we do as part of our software development process, where we execute our code in order to find bugs or check for regressions. Let's step through that line by line. First of all, testing is something that we do, as developers or as testers. The execution of code is what distinguishes it from the other very helpful things we can do to try to make our code correct: I'd encourage all of you to look at type checkers like mypy, at linters like flake8 or Pylint, which can read your code and check for errors automatically, or at tools like Black, which format your code and rule out a whole class of confusing style problems that can lead to errors. Those are great, but they're not what I'm talking about today. I mean testing where you actually run your code, see what it does, and check that it does the right thing.

The last part, to find bugs or check for regressions, speaks to a bit of a philosophical split in the testing community. Some people say that the reason we write tests is to actually look for bugs: we're not sure whether our code is correct, and we want to search for counterexamples that would demonstrate that there's some problem with it. The other view, which I've heard more often from companies with a move-fast-without-breaking-things attitude, is that the role of testing, at least in continuous integration, isn't necessarily to find bugs; it's just to confirm that you didn't break anything which was previously known to be working. That's a fairly deep distinction, but since the techniques and tools are more or less the same in each case, I'm going to set it aside until closer to the end of this talk.

The activity of testing I break up into four parts. First, we choose what inputs we're going to test on: which files do we feed in, what do we call our test function with, which websites or buttons do we click on? Then we run whatever code it is that we're testing; that's the core part. At the end, we check that it did the right thing, or at least that it didn't do anything we know to be wrong, like crash or raise an exception. And finally, we repeat that as often as we need to. For this talk, I'm going to split things into three rough parts.
In the first part, I'm going to talk about why we would automatically generate test cases, and how we get the computer to do that for us. In the second, I'll talk about how to test properties of your code: thinking about the more general things that should always be true, rather than specific inputs. And last of all, I'm going to introduce some exciting work I've been doing recently on fuzzing your test suite, and I'll explain what that means.

To start with, we often talk about automated testing, and I want to make the case that automated testing is actually still pretty much manual. When we say an automated test, what we usually mean is a test scenario which we've expressed in code, and the advantage is that that makes it much cheaper to rerun. After the first time you write out all the test inputs and the test logic, you can rerun it just by typing pytest, or unittest, or whatever else you prefer, and that does save a lot of time. But I think this is still much more like the Mechanical Turk. You might know the name because Amazon runs a service called Mechanical Turk, where humans do work that is then often attributed to computers, but it actually dates back to a chess-playing automaton in the 1800s. That machine caused a huge stir at the time, because no one could work out how you would make clockwork play chess. In fact, some people got a little overexcited and declared that machines would never be able to play chess, which turned out to be wrong, of course, but at the time it was true: the machine could not in fact play chess. There was simply a small human with a good grasp of chess and a complicated periscope tucked up inside the mechanism. So when we talk about automated testing, I reckon that what we're doing is still more like the Mechanical Turk than a modern chess-playing computer.

So let's get into some examples, but I do need to warn you: because they have to fit on a slide, the examples in this talk are going to be quite simple, with relatively simple code and relatively simple data inputs. I promise that these techniques do scale to more complicated things. If you have really complicated JSON structures, or Django models, or whatever else, you can still test them with the same techniques; what you can't do is put them on a single slide in a font small enough to fit and still have people able to read it. So forgive me. And for my friend David, I promise I'm not going to reverse a list; let's try sorting lists instead.

Sorting is a classic exercise. If you learn computer science, you'll probably be told there are many, many ways to sort lists: you've got bubble sort, quicksort, merge sort, heap sort, bogosort, bogobogosort, and a bunch of other, slower things. But I reckon, hypothetically, I've invented a faster way to sort lists, one that takes only constant time. A super fast list sorting mechanism. Without spoiling what it is, let's work out how I might test it.

The first thing I usually do when I'm testing is not automated at all: I just open up a Python session and run the code. In this case, I've thought of two basic cases I should test. The first is that if I sort a list which is already in sorted order, I get the expected output, that is, the sorted list [1, 2, 3].
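In a throwaway session, that first check might look something like this, where fastsort and constant_time_sort are just my stand-in names for the mystery implementation:

    >>> from fastsort import constant_time_sort   # hypothetical module and function names
    >>> constant_time_sort([1, 2, 3])              # an already-sorted list should come back unchanged
    [1, 2, 3]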
The other case is that if the list is not in order and I sort it, I should still get the sorted list back: [1, 2, 3] rather than [2, 3, 1]. To make this an automated unit test, I just write those checks as test functions, and then I can run pytest on the file and each of those assertions is checked automatically. That's a little more work the first time I run the test, but it makes it much less work afterwards, and as your project grows beyond the trivially small, that's a really good investment. Then, of course, we're told as software engineers that we shouldn't repeat ourselves, and because our assertions are basically the same, we can turn this into a parametrized test where we list out the arguments to the function and the expected results, and check them all with a single test body.

There's only one problem with this: when I told you that I'd implemented a sorting function, I lied. This sorting function always returns the list [1, 2, 3], no matter what the input is. So the output is sorted, and it passes all the tests, but we probably wouldn't say that it's actually working.

So what else could we do? Well, first, we need to come up with examples that aren't just [1, 2, 3] or [2, 3, 1], and that's going to be easier if we don't have to work out what the output should be for each of them. So we can write a test function that says: given any list, when we sort the list and then compare each pair of adjacent elements in the result, they should be in order. This function can detect a whole bunch of wrong implementations that our previous test couldn't. What it can't do is find the bug from before. For that, we also need to assert that we have the same elements in the output: the first assertion checks that our result is a sorted list, and when we add the Counter line, it checks that the result is a sorted version of the input list. Now all we need to do is run this test function on very many different kinds of lists and we're sure to find almost any bug we could have. And notably, to write this test we don't actually have to know how to sort a list ourselves. In the case of sorting we could just compare the result against Python's sorted() built-in, but we don't have to be able to do that to use this technique.

This is called property-based testing, because we're testing more general properties of our code. I find that, as well as being a really great way to find bugs, it helps me think more clearly about the design of my API: about what my code should do, as well as checking that it actually does it. And generating test inputs, once we've written a test that can accept any of them, is great because our tests are no longer limited by our imaginations, which means they can find bugs that we didn't even know were possible. That might sound a little crazy, but it has happened to me many times; I've learned things about how Python implements equality from tests that I wrote, and I'll demonstrate something like that later in the talk.
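Put together, the property-based test I've just described might look roughly like this. I've inlined the deliberately broken constant-time sort (the name is mine) so you can run the file and watch Hypothesis find the bug:

    from collections import Counter

    from hypothesis import given, strategies as st

    def constant_time_sort(items):
        # The talk's fake "sorting function": it ignores its input entirely.
        return [1, 2, 3]

    @given(st.lists(st.integers()))
    def test_sort_returns_sorted_version_of_input(items):
        result = constant_time_sort(items)
        # Every adjacent pair of elements in the result is in order...
        assert all(a <= b for a, b in zip(result, result[1:]))
        # ...and the result contains exactly the same elements as the input.
        assert Counter(result) == Counter(items)

Running this under pytest, Hypothesis quickly reports a minimal failing example, such as the empty list, which is exactly the bug that the hand-picked examples missed.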
So here's the foolproof plan. Step one is to install Hypothesis. Hypothesis is Python's leading, and I think in fact only, library for property-based testing, and it's just pip install hypothesis. If you're on Python 3.6 or later, you'll get the latest version; if you're on an older version, 3.5 or 2.7, you'll get whatever the latest version was when that Python went end of life. Then you read the documentation, and then: profit.

But let me walk you through an example, and I'm not going to talk about sorting a list again. Property-based testing is great for that kind of algorithmic library code, but you can also test things that look more like complicated business logic. As an example of that which I think will be familiar, let's think about git. Our business requirement is that if you check out a new branch in a repository, then the current branch is whatever branch you just checked out. This has a number of side effects: it's actually touching the disk, so it's having persistent effects on the file system; it's got state; it's got multiple steps. It's not just comparing inputs and outputs. But we can still write a property-based test, and in fact we can even migrate our existing tests to be more property-based.

The first step is just to say that if this should work for any branch name, we can make the branch name an argument to the test function. This is still the same test, but it expresses our constraints better; in particular, we're saying that the specific name of the branch shouldn't matter. Then, if we start to use Hypothesis, we can say: given the branch name, which is just the string "new branch", run the same test. We still haven't changed the logic, but now we're using Hypothesis, so that's a great start. And because we don't want to repeat ourselves, we can define a valid_branch_names() function which returns the strategy, that is, the data generator, describing what valid branch names are, and that means we can reuse it easily between the many different tests that do something with branches. But note that we're still running the same code; we haven't actually changed anything yet. We're just ready to.

And this is where the design feedback comes in. If you say a branch name can be any string, you're going to run into a lot of problems. I tried this briefly and discovered that git branch names can't have leading or trailing whitespace, and they can't have leading or trailing dashes either. It turns out that many services reject names which are too long, and unicode or right-to-left text can also cause a lot of problems. So we ended up discovering something about git just by trying to write a test for it. And we can easily tell Hypothesis to restrict the range of inputs: maybe in principle we should be able to handle all of those strings, but we're just going to restrict them for now, and that's a really useful thing to be able to do while you're migrating. Then, if we go back to the body of our test, we can see we're still only ever using new repositories. So the final version of this test, the one I'd migrate to, says: given any valid branch name and any repository, if we assume that the branch name isn't already one of the branches of that repository, and we check out that branch, then the active branch is the branch we just checked out. That's property-based testing in a nutshell.
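As a rough sketch, that final migrated test might look something like the following. I'm adding a couple of assumptions for illustration: I'm driving git with GitPython, since the talk doesn't say which library it uses, and my valid_branch_names() is deliberately far more conservative than what git actually allows:

    import string
    import tempfile

    import git  # GitPython; any way of driving git from Python would do
    from hypothesis import assume, given, settings, strategies as st

    def valid_branch_names():
        # Deliberately conservative: short, lowercase ASCII, no leading or trailing dashes.
        return st.text(
            alphabet=string.ascii_lowercase + "-", min_size=1, max_size=20
        ).filter(lambda name: not (name.startswith("-") or name.endswith("-")))

    @settings(max_examples=25, deadline=None)  # creating repositories on disk is slow
    @given(branch_name=valid_branch_names())
    def test_checked_out_branch_becomes_active(branch_name):
        with tempfile.TemporaryDirectory() as path:
            repo = git.Repo.init(path)
            author = git.Actor("Example", "example@example.com")
            repo.index.commit("initial commit", author=author, committer=author)
            # Only consider branch names that aren't already in this repository.
            assume(branch_name not in [head.name for head in repo.heads])
            repo.create_head(branch_name).checkout()
            assert repo.active_branch.name == branch_name

The assume() call is how the precondition is expressed: Hypothesis simply discards any generated name that violates it.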
But what happens if you don't have any existing tests that you can migrate, or you've migrated your whole test suite, good work, and now you want to move on? Well, this is where I introduce you to the Hypothesis Ghostwriter. The Ghostwriter is a project I've been working on for about six months earlier this year, and I announced it just a few weeks ago at PyCon Australia, so you're the second audience in the world to see it, and I am so excited to share it with you.

If you pip install hypothesis with all of the optional dependencies, you can use the Ghostwriter right away. All right, I'm updating my installation first, which is a good habit. Now we can ask Hypothesis to write us some tests; in fact, let's ask it how to write tests first of all. The Ghostwriter, the hypothesis write command, will write your property-based tests for you. If you have type annotations, it will inspect those to work out what the arguments should be, but even without them it can sometimes guess, and when it can't guess, it will just leave a little template for you to fill in. So let's start with the suggested example and test the re.compile function, regular expression compilation. You can see that there are a couple of parts here: first we import the functions that we're going to use, then there's a note that we need to specify what the pattern can be, and then we have our test function.

Let's try a more complicated example and think about JSON values. One test that we could write for JSON is that if we have a JSON object, serialise it to a string, and then read it back into Python objects, we should have an equal object at the end. That's the basic idea of saving and loading data, and it's one of the most powerful properties you can test, because almost every application translates data between different formats, whether that's sending it between the front end and the back end, or saving it in a database and loading it back. Testing that your data round-trips correctly is something which should always be true, so it's a great use case for Hypothesis.

So let's ask the Ghostwriter to test that json.dumps followed by json.loads round-trips: dumping to a string, then loading it back. I've got the code Hypothesis wrote over in my editor window this time, and you can see that there are way more options to these JSON functions than I would have naively expected when I first wrote this. Ultimately, our test function has been written for us: we dump to a string, then we load the string back into a value, and then we check that the two are equal. The comment says we just have to replace the nothing() strategy. JSON is a recursive format: a value is either None, or a boolean, an integer, a float, or text; and then, because it's recursive, a value can also be a list of JSON values, or a dictionary mapping text, that is, unicode strings, to JSON values. And that's all we needed to do to write our entire test.

Now let's run pytest on it and see if this actually passes. Any bets? I want you to think about whether you expect this to pass before I run it, and if you think it'll fail, why. All right, here it goes. We've actually got two distinct failures. The first is that if the allow_nan argument is True but our object is NaN, then the floating-point not-a-number value is not equal to itself. That makes sense. Our second failure, though, tells us something different: allow_nan=False is Python's option saying that the JSON spec doesn't actually allow not-a-number or infinity, and so if we pass infinity with that option, we get an error, because it isn't really a compliant JSON value.
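Hand-rolled rather than Ghostwritten, the whole test fits in a few lines. Note the restricted floats at the leaf level, which is the fix for exactly those two failures; leaving plain st.floats() in place reproduces them:

    import json

    from hypothesis import given, strategies as st

    # A hand-rolled version of the recursive JSON strategy described above.
    json_values = st.recursive(
        st.none()
        | st.booleans()
        | st.integers()
        # Plain st.floats() here reproduces the NaN and infinity failures above.
        | st.floats(allow_nan=False, allow_infinity=False)
        | st.text(),
        lambda children: st.lists(children) | st.dictionaries(st.text(), children),
    )

    @given(value=json_values)
    def test_json_round_trips(value):
        assert json.loads(json.dumps(value)) == value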
So the correct way to handle this is to tell our float strategy that we don't allow infinity and we don't allow NaN. If we run our test again, we'll discover that this time it passes. That's the power of property-based testing: it can teach you about your code, it can do so very quickly, and it finds errors that you might not have thought were even possible. The other fun side, of course, is that because I have these tools, I can conveniently write tests that, for example, hit the entire NumPy API, and this is going to be a big one. With one line of code, we've generated three and a half thousand lines of tests, and they're pretty good tests. I'm not going to promise that they're amazing tests, but they're pretty good.

So, final thing: that's the Ghostwriter, and now I want to tell you about fuzzing. Fuzzing is the practice of getting more out of your existing tests. Instead of just generating random data, you keep track of which inputs make your functions do something new: you run the whole thing under coverage, and every time you find an input that covers something new, you keep it and use it to generate more. This is a demo; it's not open source yet, but I've been working on it for a while. This is the live view of the Hypothesis fuzzer I've been building. What happens is that it automatically collects your tests and starts running them constantly, not one at a time for a few seconds each, but interleaved, working out which one is currently making the fastest progress, and running them indefinitely. You can run this overnight or over the weekend. And the cool bit is that whenever it finds an error, it saves that error in the same way Hypothesis does, so if you just run your tests with pytest again, it will replay all of the failing examples that the fancier fuzzer found.

And I think that gets me pretty much to time, so at this point I'm going to say: thank you very much, I hope you've learned something or maybe even decided to use Hypothesis, and I'm happy to take questions. I'm just loading up the chat so I can actually see them.

Awesome, Priyab, I am delighted that you are interested in Hypothesis. How does the Ghostwriter work with custom methods? It really depends on what your methods are. The short version is, let me mute this so I don't have to listen to myself talk, the Ghostwriter works perfectly well with custom methods as long as it can introspect them. The short pitch I can give is: try it, and I'm pretty sure it will work for you; if it doesn't, the details are on our documentation site. The fuzzer is not yet available, but it does have a website at hypofuzz.com, and I am hoping that it will be available for businesses in the next few weeks; there is a lot of accounting stuff to work out before I can release it, sorry. And yes, that is the link to my talks page, where all of these links and details are available. That's this link right here for anyone who is watching the video later; it's also in the chat.

Yes, Hypothesis does in fact integrate with Django natively. If you have a form or a model, you don't even have to specify all the arguments; you can just tell Hypothesis, hey, give me instances that fit this model, and it will even create them in the database for you. So it works very well with Django. It also works very well with pandas; we have a whole pandas extension, so you can define dataframes either by their rows or by their columns.
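As a small sketch of what the pandas extra looks like, where the column names, the summarise_orders function and the property being checked are all just made up for illustration:

    from hypothesis import given, strategies as st
    from hypothesis.extra.pandas import column, data_frames

    def summarise_orders(df):
        # Stand-in for whatever application code you actually want to test.
        return {"rows": len(df), "revenue": float((df["price"] * df["quantity"]).sum())}

    @given(data_frames(columns=[
        column("price", elements=st.floats(min_value=0, max_value=1e6, allow_nan=False)),
        column("quantity", dtype=int),
    ]))
    def test_summary_counts_every_row(df):
        assert summarise_orders(df)["rows"] == len(df)

The Django extra works along similar lines: from_model(SomeModel) gives you a strategy for saved instances of that model.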
So you can say, I want columns like this, or I want rows like that, and the same goes for the NumPy support.

Pankage has asked: does every test run a different set of cases with Hypothesis? The answer actually depends on the settings you choose. By default, each time you run pytest, we'll spend a couple of cases replaying things that we've seen before, so if your test failed before, we'll try to replay those examples so that they don't flakily start and stop failing. The idea is that if a test failed, you try something to fix the failure and then run your tests again; if it passes, you can be confident that it really is passing now. Otherwise, if you want your tests to be deterministic, so that they run the same set of test cases each time, there is an option for that, but we tend to recommend generating new ones each time you run the tests, and that's what it does by default.

Jamudar has asked whether this works for Keras machine learning, and another person has asked about PySpark support. The main answer is that if you want arrays of numbers, you can generate those via the NumPy support and then convert them into PySpark objects. I do know people who are doing that, and it works pretty well for them.

On licensing: Hypothesis is absolutely free for commercial use; it's under the Mozilla Public License, which is kind of like the LGPL. The fuzzer is probably not going to be free for commercial use, because we need to fund work on Hypothesis somehow, and also because I think for non-commercial users, most of the value actually comes from Hypothesis itself. To get a lot of value out of the fuzzer, you probably want professional software engineers and a couple of servers to run it on, and most hobby or even open-source projects just don't have the resources to take advantage of that.

And yes, you can integrate Hypothesis and the Ghostwriter into projects with existing pytest tests. In fact, as you saw on the stream, when I write tests with Hypothesis, I then run them with pytest, so they work really well together. If you want to use Hypothesis with unittest or with nose, that also works really well.

Mirage asks: does it support code coverage? Yes, it does; the Hypothesis test suite itself uses Hypothesis, coverage and pytest.

Babu asks: do we have metrics that define the success of the library? That's a bit of a tough question. One way of thinking about it would be, how do we see whether it's accomplished what we set out to accomplish? In the manifesto on our site, we define the goal of Hypothesis as dragging the world, kicking and screaming, into a new age of higher quality software with fewer bugs. I like to think we're making some progress, but we're certainly not there yet. The other way is the straightforward question of how many people are using it. According to the Python Software Foundation survey, four or five percent of Python users are already using Hypothesis, and a little less than fifty percent are using pytest, so we have about ten percent of the possible users, the people who use any testing tools at all. I think that's pretty good, but I'd still like to do better.

Perfumer asks: what does this mean for test-driven development? Property-based tests are not a different way for testing to fit into your development cycle, but rather a different kind of test you can write. So property-based tests fit just as well into test-driven development as unit tests or integration tests do.

Shashank asks: where can I learn more about testing in Python?
I'm afraid I don't have an easy answer for you; the way I learned was really just muddling my way through. I found the Hypothesis documentation helpful, and there are a couple of really good books about testing, but beyond that I would suggest finding some projects that you use, or projects which are similar to what you're working on, looking at their tests, and imitating the way they test things.

Pratnush asks whether I see this competing with any other libraries. There are similar libraries in many other languages, but it's rare for anyone to be deciding between, say, a Java QuickCheck and Hypothesis, because if you're writing Java you use the Java library, and if you're writing Python you use Hypothesis. Within Python, I don't know of any competitors, I think largely because, as an open-source project, Hypothesis is usable by anyone and, maybe I'm immodest here, too good to compete with.

Ankit asks whether my recommendation is unit testing plus Hypothesis, or just Hypothesis. I almost always do both. There are some edge cases that I want to make sure are always executed, and Hypothesis does have support for decorating a specific input onto a test and then generating more, but for some edge cases it's just not worth pulling out Hypothesis. For me, probably fifty to eighty percent of the tests I write are Hypothesis tests, which means ninety-nine point something percent of my test cases are generated by Hypothesis, but I wouldn't say never write anything else, that's for sure. And yes, you can totally use mocks and stubs with Hypothesis and with pytest.

We have completed our time, Zac. All right, thank you very much. It was a great talk. Thanks very much, everyone; I will be hanging out in the Zulip for a while, so thanks very much, and I look forward to seeing you around the conference.