Which track is this? This is software quality. Welcome to the Ensuring Software Quality track at DevCon US. Thank you, everyone, for being here. The next talk is property-based testing in Python and Rust, by Ann Mulhorn. I'll hand it over to Ann now.

Thank you. OK, thanks. All right, so I've been introduced, and now I'm going to talk about who I am. I'm primarily a software engineer, but I don't have a naive confidence in the constant correctness of everything I write, so I care a lot about using the tools that will help me catch my mistakes, design flaws, and so forth. If I'm writing a serious piece of software, I'll have some sort of continuous integration running, and I care a lot about deploying static and dynamic tools that help me get stuff right. Static tools can be many things; dynamic tools are often testing. And that's what I'm going to talk about today: a particular kind of testing, commonly called property-based.

OK, so why do I think I'm qualified to talk about property-based testing? Well, I discovered it a long time ago because I was using the language Haskell. Haskell has the very first implementation of this idea, called QuickCheck, and I was really impressed by it; it seemed very powerful. But as everybody here knows, Haskell has not taken over the world, and when I stopped using Haskell, I left this really cool idea behind. In fact, I didn't trouble myself much about that until I was working in Python in 2015. I was constructing a library that I knew was absolutely riddled with bugs, one that screamed for property-based testing. With very little hope, I typed "QuickCheck Python" into the Google search box. A few seconds later, I was looking at something that looked really, really promising. A few hours later, I was running a bunch of tests using this library, and the bugs that I had known were there were flying out everywhere. That's a somewhat sad story in its way, but I was very impressed with Hypothesis: I went from not knowing it existed to using it and getting really good results, all in a few hours.

Then I sank into the open-source quagmire that open-source advocates like so much to talk about: I ceased to be merely a user and became a contributor as well. At first it was the usual things. Their docs were maybe lagging their development a little, and I wanted to contribute to that. Then I had some actual needs for a Hypothesis strategy, as I'll mention later, so I enhanced that, and so forth and so on. Time went by; this is now the third talk I've given on the subject. And sometime in 2017, I was invited by the Hypothesis developer to become a co-maintainer, and I was actually quite proud and happy to accept. But from then on, I really haven't contributed to Hypothesis much at all, and that's a bit sad for me, because there's a lot of interesting work going on in Hypothesis. The reason is that I was working much more in Rust. With a little more optimism than before, when I started working in Rust, I looked for a property-based testing library, and I found one, and it was pretty nice. It was called QuickCheck.
We used it and very clearly got some benefit from it. But the one I'm going to talk about is proptest, which is younger and newer, and which is explicitly Hypothesis-alike, and from my point of view a much friendlier and easier thing to work with.

OK, so you know that I'm a Hypothesis booster. The question might be: do I actually use this stuff? And the answer is yes, I do. I've written a bunch of little Python libraries, and with all of them I've deployed Hypothesis. You can see that in some places my code coverage statistics are quite good; in other places, somewhat inexplicably to me, they're not so good; and in some places I just don't know. pyudev is something I took over from a previous developer, and I seem never to have followed up to get its coverage statistics. I'll talk a little more about the influence of pyudev later. The one below the double line is in Rust, unlike all the others above the double line, which are in Python. Getting coverage statistics in Rust is necessarily harder than it is in Python, so we just don't know that yet, although I guarantee you it is less than 100.

A point I want to make here: in the open-source world, I often hear people talking about code coverage as just a metric. People say it goes up, it goes down; we feel better if the number is higher and worse if it's lower. I tend to think it's something more than a metric, especially if you maintain a well-known coverage number. 100 is the most convenient, but others are good too. The reason is that when you're deploying the property-based testing I'm going to tell you about, it achieves this coverage in a very useful and interesting way, so that if your coverage suddenly drops, that doesn't just mean your tests are maybe not so good. It's a hint that you should look right now: find out why the tests are missing that spot they used to hit. You've introduced something new in your code. It could be a bug, or it could be a nice event: you've introduced an invariant that causes some code to be dead. And if that code is dead now, it's as good a time as any to remove it, in my opinion, and simplify what you've got. So I think this coverage stuff is nice and useful, not just as a metric but as a sort of debugging aid.

OK, so what am I going to give you here? Extremely simple code examples, so dead simple that you will hardly notice them, so that I can actually convey what I'm trying to convey without the code distracting you at all. I'm going to talk a little about things I learned, and I'm going to refer to other things, because this whole property-based testing world is really big and getting bigger, and I can only cover a smidge of it in the time allotted.

As promised, an extremely simple example. This is a function that takes a natural number, which in Python is just a non-negative integer, and, given a base, converts it to a string of digits that represents that number in that base. Here's the old way, so-called example-based testing, to contrast with property-based testing: I think of good examples, predict what they should do, and write all that down. Then I run my tests, and hopefully they all succeed.
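For concreteness, such example-based tests might look something like this. The function names convert_from_nat and convert_to_nat, and this particular implementation, are my assumptions for illustration, not the speaker's actual code:

```python
# A hypothetical implementation matching the talk's description:
# zero is represented by the empty string. Only sensible for bases
# 2 through 10, since digits are rendered as decimal characters.
def convert_from_nat(value, to_base):
    digits = ""
    while value > 0:
        digits = str(value % to_base) + digits
        value //= to_base
    return digits

def convert_to_nat(digits, to_base):
    value = 0
    for d in digits:
        value = value * to_base + int(d)
    return value

# Example-based tests: hand-picked inputs with predicted outputs.
def test_convert_from_nat_examples():
    assert convert_from_nat(0, 10) == ""    # zero gives the empty string
    assert convert_from_nat(7, 2) == "111"
    assert convert_from_nat(8, 2) == "1000"
    assert convert_from_nat(10, 10) == "10"
    # ... many more cases, to cover as much as possible
```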
So what you see here is checking that the function does the right thing for zero, which I've defined as giving you the empty string, and other things explained here. The ellipsis means that I would write a whole lot more of these to cover as much as possible.

OK, so that's example-based testing, and I'm not doctrinaire. I think that when that's what you can write, that's what you should write; it's better than nothing by a considerable margin. So what is good about these tests? Well, they catch bugs, and that's good; we want to do that. They catch surprises: when you do something new somewhere else and break something your old code previously relied on, they catch that too. And they are a partial specification of the code they test. They give some idea of what the person who wrote the code, or the tester, thinks it should do. But they're implicit: you have to look at these examples and try to figure out what the person who wrote the code was thinking it should do.

Now here's the property-based testing that I'm advocating. The idea is that you express something that should be true about your function, and what I've done is state what should be true about my function as a theorem. I happen to have an extra function called convert_to_nat, and all I'm saying is that if I run convert_from_nat on my natural number with a base, and then convert the result back with convert_to_nat, I should come back to the same value. The logical symbols say: for every value v greater than or equal to 0, that's all the natural numbers (I'm implicitly assuming they're whole numbers), and for every base greater than or equal to 2, because base 1 is uninteresting, this holds true. That's the statement I'm making about this function.

And here's what I've done: I've copied it into Hypothesis. I've used the Hypothesis functions and decorators and so forth to get this, and what you should see is that it really just is a copy. I took a mathematical expression and changed it into an expression written in code. It's more verbose and less nicely typeset, but it's almost the same thing. You can see that v and value are the same, and b and base are the same. I'm using the Hypothesis strategy that generates Python integers, and specifying that I'm only interested in ones that are greater than or equal to 0 for values, or greater than or equal to 2 for bases. And the test is just a Python function which asserts exactly what the theorem says. It's as straightforward as that.

Here's another example, which is simply a repetition of the same ideas; it just shows that you can turn them out pretty regularly. In this one I'm saying that the strings I generate will not have leading zeros. That's how I've defined it; I like it that way, it's nice and consistent. I didn't write out a mathematical expression for that one, but this is how I say it in Python. So there is variety even in this extremely simple example.
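In code, the two properties might look like this. This is a sketch against the hypothetical convert_from_nat and convert_to_nat above; @given and the integers strategy are real Hypothesis APIs:

```python
from hypothesis import given
from hypothesis import strategies as st

# The round-trip theorem, transcribed: for all v >= 0 and all bases
# b >= 2, convert_to_nat(convert_from_nat(v, b), b) == v. The base is
# capped at 10 only because the sketch implementation above emits
# decimal digit characters; the talk's real code need not cap it.
@given(value=st.integers(min_value=0),
       to_base=st.integers(min_value=2, max_value=10))
def test_round_trip(value, to_base):
    assert convert_to_nat(convert_from_nat(value, to_base), to_base) == value

# The second property: generated digit strings have no leading zeros.
@given(value=st.integers(min_value=0),
       to_base=st.integers(min_value=2, max_value=10))
def test_no_leading_zeros(value, to_base):
    assert not convert_from_nat(value, to_base).startswith("0")
```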
So I've written this code, and it's in a file and so forth. The question is: what benefit do I get from having written this thing? And the benefit is very nice. Assume I'm running this code under some Python testing framework, either pytest or unittest; those are the ones Hypothesis is compatible with. Hypothesis will go ahead and test that the specification holds true, over and over and over again. Hypothesis does the work of selecting arguments for the value and the base: it makes up those arguments, more or less randomly, runs the test function, and checks whether it failed or succeeded. If it succeeded, it keeps going, coming up with more possible inputs. It keeps going until the assertion fails, in which case it stops and reports that it has found a counterexample, or until your configuration parameters say it should stop. It keeps trying until it's told to stop or it finds a counterexample. And then, obligingly, if something failed, it hangs on to that information.

As I said, I'm not doctrinaire. I think example-based testing, when that's all you've got, is a good thing. But I think property-based testing is really great. If you write a Hypothesis test, you still have only a partial specification of your code; you may not have been able to say every single thing your code should do. But you've converted it from an implicit to an explicit statement about what you believe your code should do. The reader of your test doesn't have to intuit from a bunch of examples what you're thinking: it's right there in the code. And that's good.

You also get more real tests with less code. In a sense they are the same tests over and over again, but because they vary the inputs, they do something different each time. So you write less code to get the same number of tests, which are arguably just as good. For me personally, that's a nice benefit, because I'm one of those early testers; I'm testing as I'm writing. With example-based tests, maybe you've generated quite a few, and then you realize you want to change the arguments to your function, that you had somewhat misdesigned it. Practically speaking, it's annoying to change 50 example-based tests once you've discovered this. But with property-based testing, if you've written that one specification, that's one thing to change. Tidy and quick.

Another thing you get is code reuse in your choice of examples, and that's really important. Generally, if things are going well, you can use a Hypothesis strategy, and those strategies have been written by people who actually know about the random values they're selecting, so they often know more than you. For my first example, I don't know why, but I was using Python's Decimal class. So the first time I ever used Hypothesis, the strategy started producing random instances of Decimal, and it turned out that that exposed a naive idea in my head about what Decimal was and how it actually worked. The strategy, since it was written by a person who knew more about Decimal than I did, tried certain varieties of Decimal number that I didn't even know existed and broke things all over the place. Which was good; which was what I wanted. That kind of code reuse is very nice: you hand off all the work to some expert in Decimal or floating point or whatever, and you don't have to do it yourself.
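Today's Hypothesis does ship a real decimals() strategy. A minimal sketch of handing that work off; the round-trip property here is mine, purely illustrative:

```python
from decimal import Decimal
from hypothesis import given
from hypothesis import strategies as st

# st.decimals() is a real built-in strategy; it knows about NaN,
# infinities, and exotic exponents that a naive user might never try.
# NaN is excluded here because NaN != NaN would fail the assertion.
@given(st.decimals(allow_nan=False, allow_infinity=False))
def test_decimal_string_round_trip(d):
    # Illustrative property: converting through a string preserves the value.
    assert Decimal(str(d)) == d
```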
And then the other thing is the database of failures that Hypothesis keeps around. That's really nice too, because the next time you run after a failure, it goes straight to the example that failed last time and tries that first. So you can actually be less clever than you might have to be with example-based tests, where you'd probably do something to make sure the example that was interesting somehow got to the top. Hypothesis does that for you, so that's great. In fact, Hypothesis does a huge amount of other stuff, which I'll mention at the end, and you can decide if there's any particular thing you want me to talk about more.

OK, so I've been talking about how great property-based testing is and the advantages you get, but it's not actually easy to use. It's harder than example-based testing. Sometimes it's difficult to come up with a useful specification: you can't think of anything to say about the function you wrote. As a former academic, I'd say that's a bad sign already; you have no business writing that function if you don't know what it does, or something like that. And I still stand by that. But it does require some mental work to express things, and that makes property-based testing harder.

And you can always cheat with example-based tests, and people are tempted to do that. You write an example-based test, you think, oh, these are some interesting values, so you stick them in, and then you stick some random value in as the expected result. You run it, you see what the real value is, and you go, oh yeah, that's what my function does, and you plug that in, and you have a test. My point is that property-based testing pretty much makes it impossible for you to cheat in that way. And that's good.

Some things are just going to be harder to express using Hypothesis. The whole idea is that we have these universally quantified statements, and we can say a lot of stuff because Python is very expressive, but some things are difficult. Or you can sometimes use Hypothesis in a silly way; I think I've done that a couple of times, because I was already using Hypothesis for something, and it was cheaper to use all its infrastructure to get a string that didn't even have to be random, because randomness didn't affect the quality of my test. But somehow it ended up there. Sometimes time is a problem. I've certainly encountered that with more numerical things, where you want to test something but you can see that Hypothesis is cranking along and taking a long time to search even a few examples, because they're computationally expensive. Or you might run into a practical difficulty where you want random values, and Hypothesis doesn't have a way to generate those particular random values for you.

Hypothesis contains many strategies for generating typical Python values, and it also contains many higher-order strategies that let you build and assemble new strategies. Between those higher-order strategies and its built-in ones, you can build more powerful strategies, and that works great.
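For instance, a composite strategy might be assembled like this. A minimal sketch: these are all real Hypothesis APIs, but the shape of the data is invented for illustration:

```python
from hypothesis import strategies as st

# dictionaries(), one_of(), and lists() are higher-order strategies:
# they take strategies as arguments and return new strategies.
records = st.dictionaries(
    keys=st.text(min_size=1, max_size=16),      # short string keys
    values=st.one_of(
        st.none(),                              # absent value
        st.integers(),                          # numeric value
        st.lists(st.text(), max_size=4),        # multi-valued entry
    ),
)
```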
And I checked this: it took me about 18 months, a year and a half, from the point where I started using Hypothesis regularly to the point where I needed to write my own strategy from scratch. Before that, I could always just synthesize strategies using Hypothesis's higher-order strategies. So that's another vote, I think, a demonstration of the power of Hypothesis as it is.

I'm also going to mention that I had a prejudice about property-based testing, which I lost thanks to the kind offices of a person I never actually met. I was first introduced to property-based testing through Haskell, and anybody who's written Haskell knows that Haskell is the pure functional language. Usually, if you're writing in Haskell, the demonstrations involve abstract syntax trees and fun things like that, and it all has a nice functional feel. So I associated property-based testing with pure functional stuff. But I was really wrong about that, and I found that out when I took over the pyudev library, which is just a Python wrapper for libudev.

At this point, many of you could be forgiven, if you're not storage people, for not knowing what libudev is, so I'm going to mention briefly what it is, so that you understand what I mean by "not really a functional thing". You all know about devices. We have devices on our computers, like the mouse and the keyboard and the hard drive, and there's this device abstraction, which Unix introduced and which is very useful. Besides all those familiar devices we think of when we think "device", there are lots of subterranean devices that support the visible devices we care about. And if you're a storage person like me, there are also super-devices built from the devices people normally think about. There are lots of devices cooperating. The "dev" part of udev is about devices: udev is essentially a database, not in the sense of a relational database, just a table. You identify the device, and then there are some properties of the device.

I'm going to stop talking about pyudev now, and you can ask me about it later. But the point is that it's all about devices, and the original developer had actually invented his own system of property-based testing. I went ahead and hypothesized it up, for various reasons. And because of the combination of Hypothesis and the fact that I was running on funky machines, I found more bugs: bugs in pyudev, bugs in libudev, and a bug in the actual udev database. It couldn't agree with itself. That bug, I think, is permanently stuck there. But I found it. So there you go.

OK, Python to Rust. As I mentioned before, I used QuickCheck at first, because that was really what was available; proptest is the newer one. The difference between QuickCheck and proptest, which you can find out about because their authors had many cordial discussions on the subject, is about how counterexamples are minimized. You'll notice that in this short time I didn't even talk about counterexamples and minimization. What I can say is that proptest just works much more nicely for me. proptest is good, but it's not as mature as Hypothesis. What I did is rework the first example from Python into proptest, something like the sketch below. You'll notice there's a macro here instead of a decorator, because Rust supports macros, and there's this little test annotation.
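A sketch of that reworking; the two conversion functions here mirror my hypothetical Python versions above, not the speaker's actual code, while proptest's macro and range strategies are real:

```rust
use proptest::prelude::*;

// Hypothetical Rust versions of the Python example's functions.
fn convert_from_nat(mut value: u64, base: u64) -> String {
    let mut digits = String::new();
    while value > 0 {
        digits.insert_str(0, &(value % base).to_string());
        value /= base;
    }
    digits
}

fn convert_to_nat(digits: &str, base: u64) -> u64 {
    digits
        .chars()
        .fold(0, |acc, c| acc * base + u64::from(c.to_digit(10).unwrap()))
}

proptest! {
    // The same round-trip property, written with proptest's macro;
    // integer ranges like `0u64..1_000_000` are real proptest strategies.
    #[test]
    fn round_trip(value in 0u64..1_000_000, base in 2u64..=10) {
        prop_assert_eq!(convert_to_nat(&convert_from_nat(value, base), base), value);
    }
}
```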
And then inside the macro I specified my values a little bit differently. But it's really the same thing, just in a completely different language, done in a completely different way. And that gets me to the end. Now you can ask me questions, and you should feel free to contact me about this topic afterwards, because I would like to help you get moving on it if you're interested. OK, so questions? Yes?

Do they have prefab strategies for things like a positive definite matrix? I can't answer that definitively, but they talk about NumPy a lot. That's not really my area, which is why I can't say, but they may. Another thing I didn't mention is that the Hypothesis site lists external Hypothesis-related projects; I've actually contributed to that myself by writing my own strategy. If you were absolutely desperate and what you needed didn't exist, you could write your own and put it up there, and it would be featured. Can you just repeat the question? Oh, the question was about numerically heavy, algebraically heavy, specific things, and whether Hypothesis has strategies for that kind of stuff. I couldn't answer that directly, but I do see NumPy floating by. So there's hope, but I can't be specific. Yes?

Say you get an error in continuous integration that you didn't see locally, maybe because you're running something subtly differently. Is there some sort of output you get, so that instead of searching the whole space you can start with the thing that caused the problem in the other environment? Right, that's a good question. In the older, longer version of these slides, I explained that this can be a problem, because it is hard to reproduce precisely how you got there. With Hypothesis it's not so bad, because Hypothesis will actually report the failing examples in its output, and that can get you a reasonable amount of the way there. In fact, I should be more positive: Hypothesis will say, these are the arguments to the test function which caused it to fail, and that will all be visible in the CI output. I'll show a small sketch of this after the questions.

How do you deal with the inherent complexity sometimes in writing specifications for things? Do you run into situations where the tests themselves have bugs because of that, and how do you deal with it? Yes, so the question is: what if your tests have bugs? And that's a good question, because your tests will have bugs on occasion. That's an interesting rabbit hole, and I won't claim to have any advice. I once told somebody you have to test your tests, and then test those tests, in an infinite regression, but obviously that's not practical. There are two things that can happen. One is that your tests let bad stuff through, and that's unpleasant; that can definitely happen. The other is that your tests fail because they catch good stuff, essentially: either you wrote them and they're a bad match for what you meant to write, or else you were simply wrong about the property. Those are two different things, but both of them are less troubling than the first one, where you just let the bugs through.
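The promised sketch for the CI question: Hypothesis reports the falsifying arguments in its output, and its real @example decorator lets you pin a failure seen elsewhere so it is always tried first locally. The pinned values below are made up, standing in for whatever CI reported, and the functions are the hypothetical ones from earlier:

```python
from hypothesis import example, given
from hypothesis import strategies as st

# @example is a real Hypothesis decorator: the pinned arguments are
# always run, in addition to the randomly generated ones.
@example(value=255, to_base=2)  # hypothetical failing input from CI output
@given(value=st.integers(min_value=0),
       to_base=st.integers(min_value=2, max_value=10))
def test_round_trip_pinned(value, to_base):
    assert convert_to_nat(convert_from_nat(value, to_base), to_base) == value
```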
Thank you very much. Thanks.