I'm going to start a little bit early. You have questions? Great. OK, I'll start at half past then. Testing, one, two. All good? Perhaps the projector's even pushing the right resolution. See, that's just not right. I'm just pleased that I don't have to reboot the whole laptop. Well, the AV works too well, and the projector works too well, and this is distinctly un-miniconf. We're also on time, which is even more un-miniconf. So maybe a volcano will erupt and everyone will be stuck here for three months. But yes, welcome. Thank you.
Thanks, Stuart. So hi. Yes, I'm Fraser, and this is a talk about property-based testing. I'm a developer at Red Hat, where I work on the FreeIPA identity management system and the Dogtag certificate authority. At work I'm mainly using Python and Java, but in the real world, mainly Haskell. So a lot of Haskell side projects, and I'm really into Haskell, which is why I'm wearing this shirt.
So I'm going to introduce property-based testing and motivate it with some examples. Concepts will primarily be demonstrated in Haskell, but hopefully the examples are straightforward and comprehensible even if you're not already familiar with Haskell. We will have a brief look at property-based testing in other languages, and I'll conclude with a discussion of the limitations of property-based testing and a look at some alternative approaches.
So property-based testing is a testing paradigm where you state algebraic properties of functions or data structures. A property-based testing framework will give you a way to declare how to generate random data of your types, and typically it will provide a library of these generators for the types in the standard library of the language. And finally, when you run these tests, the framework will generate lots of random data and see if those properties hold. So it will attempt to falsify those properties, and should it succeed, it will report counter-examples. And the good property-based testing frameworks, upon finding a counter-example, will also try to simplify it to get you a minimal counter-example that falsifies the property.
So this is great for checking the laws and invariants of the functions, algorithms, data structures and abstractions that you're using in your code. And these exist for basically anything worth its salt in programming. As the programmer, all you have to do is find out or work out what these laws or invariants are and write them down, and if you write them down in the correct way, those are your tests. You're pretty much done.
It's also great for checking code against a model implementation. So if you have your function myFancySort that's intended to sort a list, maybe it has some particular time complexity properties that are different from the standard library sort. It's very easy to state a property that says that the behavior of myFancySort is the same as the standard library sort, and the framework can then check that.
Finally, properties are meaningful documentation. Users of your functions or your libraries will appreciate you stating what these properties are. A type is good, but often a type is not enough to fully specify the behavior of a function. However, stating properties that hold for the function is something that downstream developers can use. They can reason about your code and how your code interacts with their code.
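A minimal sketch of the model-implementation idea, assuming the QuickCheck library. The body of myFancySort here is a hypothetical stand-in (a plain insertion sort), and the property name propSortModel is an assumption for illustration, not something taken from the talk.

```haskell
import Data.List (insert, sort)
import Test.QuickCheck

-- Hypothetical stand-in for the talk's "myFancySort": a plain insertion sort.
myFancySort :: Ord a => [a] -> [a]
myFancySort = foldr insert []

-- Model property: the custom sort agrees with the standard library sort
-- on every generated list.
propSortModel :: [Int] -> Bool
propSortModel xs = myFancySort xs == sort xs

main :: IO ()
main = quickCheck propSortModel
```

The element type is fixed to Int only so that QuickCheck knows which generator to use; any type with Ord and Arbitrary instances would do.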
Or to put it another way, properties can be composed, and people can construct logical theorems about their code where they use the functions you've provided and the properties you've specified. So they're meaningful documentation that just happens to be machine-checkable.
So we'll start with some examples, and I'll just switch over to my editor. This is a Haskell module where we are going to test a list reverse function. This function here, rev, has the type list of a to list of a, for any type a. And the list is an inductively, or recursively, defined data type, so this is its implementation. The empty list, reversed, is the empty list. And if it's a non-empty list, so a cons cell x : xs, then we reverse the tail of the list and append to that the singleton list containing the thing that was on the front.
Now, the type list of a to list of a has many possible implementations. So we can state two additional properties that, along with the type, uniquely characterize list reversal. The first one is propRevUnit, which is parameterized over one value. In this case we have to concretely specify the type, Int, so the compiler knows which random generator to use, but you could equally put String or Float or something else there. The property itself, propRevUnit with a single parameter x, which corresponds to this Int, is defined as the equality of the reverse of the singleton list containing x and the singleton list containing x. In other words, the reverse of a single-element list is the same list.
The second property is parameterized over two arbitrary lists, which we introduce as xs and ys. This property is the equality of the reverse of xs and ys appended together and the reverse of ys prepended to the reverse of xs. So does that make sense to everyone? If you have two lists, reversing the two lists appended together is the same as reversing this one and reversing that one and putting them the other way around.
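A sketch of the module being described, assuming the QuickCheck library. The exact spelling of the property names is an assumption based on how they are spoken in the talk.

```haskell
import Test.QuickCheck

-- List reversal as described: reverse the tail, then append the element
-- that was on the front.
rev :: [a] -> [a]
rev []       = []
rev (x : xs) = rev xs ++ [x]

-- The reverse of a single-element list is the same list.
-- Int is chosen only so that QuickCheck knows which generator to use.
propRevUnit :: Int -> Bool
propRevUnit x = rev [x] == [x]

-- Reversing two appended lists equals the reversed second list followed
-- by the reversed first list.
propRevAppend :: [Int] -> [Int] -> Bool
propRevAppend xs ys = rev (xs ++ ys) == rev ys ++ rev xs

main :: IO ()
main = do
  quickCheck propRevUnit
  quickCheck propRevAppend
```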
So in the Haskell REPL now, GHCi, we can simply call quickCheck propRevUnit. That's going to generate 100 random integers x and check that the property holds for each of those. And we see in this case it passed 100 tests. 100 is the default, but that's tweakable. And similarly for propRevAppend, it's going to generate 100 random pairs of arbitrary lists xs and ys and check that the property holds for each of those pairs, and indeed it does.
So, another example. This one is a little more involved. It's an expression transformation, so we have here a very simple expression type, a sum type containing literal integers, or an Add value with two sub-expressions, or a Mul, a multiplication value containing two sub-expressions. We can define a function elimMul1 which simplifies the expression by removing cases where something is multiplied with the literal one, one being the multiplicative identity. That's a simplification that you can perform on an expression.
So we'll write a function to do this. If it's a literal x, the function is defined as just that same literal x. If it's an Add expression, the result is another Add expression, but we perform the elimination on both of the sub-expressions. If it's a multiplication and either of the sub-expressions is the literal one, then we simplify the expression, and the result is whatever the other branch was with the elimination having been called on that. And finally, if it's a Mul without the literal one in either branch, then we just return a Mul with the elimination being done on each of the branches.
Now we state the property propElimMul1Elim, so we're going to assert here that the function elimMul1 does actually eliminate these cases from the expression. It's parameterized over an arbitrary expression, and it's defined as false if the literal one appears in a Mul expression on either branch.
Or, if it's a Mul without a literal one, we just have to assert that the property holds on both of the sub-expressions, and likewise for an Add. And if it's a literal value, the property holds.
So if we call quickCheck on this property, propElimMul1Elim... hmm, what have I done? Ah, yes. We've got a problem: there's no instance of Arbitrary for the expression type. This is the mechanism by which we declare how to generate random values of the type Expr. So I'll just code this very quickly. It'll be sized gen, so we use the sized function to make sure that we don't build an infinite tree here. You can build infinite data structures in Haskell, but they tend to be quite difficult to test. So if it's gen 0, then it'll be oneof either an arbitrary literal or the pure literal one, just to make sure that we have a reasonable number of the literal one throughout our tree. And if it's gen n, then let n' equal n div 2 in oneof: again, we can have an arbitrary literal, or we can have an Add value with two arbitrary sub-expressions, and likewise we can have a Mul value with two arbitrary sub-expressions.
Now we can reload this and call quickCheck on our property. And this time it runs, but there's a problem. There's actually a subtle bug in my implementation. Specifically, the problem is that when you have a Mul with two literal ones inside another Mul, the inner Mul is simplified, but then the outer Mul doesn't check that neither of its sub-expressions is the literal one after doing the recursive call on each of its branches.
So some of you may have picked up that this bug was present in the implementation. Some of you may not have. If you were writing example-based tests for this function, as a programmer you would have to have been clever enough to realize that this corner case exists and test for it. But with property-based testing, as a developer, you are relieved of this burden. And that's a very good thing.
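The expression example can be reconstructed roughly as follows, assuming the QuickCheck library. This is a sketch: the constructor and function names follow the spoken description, while noMul1 and elimMul1Fixed are hypothetical helpers added here. The fix shown is one possible repair and is not taken from the talk.

```haskell
import Test.QuickCheck

data Expr
  = Lit Int
  | Add Expr Expr
  | Mul Expr Expr
  deriving (Eq, Show)

-- Buggy on purpose, as described: the results of the recursive calls in
-- the last clause are never re-checked for Lit 1.
elimMul1 :: Expr -> Expr
elimMul1 (Lit x)         = Lit x
elimMul1 (Add a b)       = Add (elimMul1 a) (elimMul1 b)
elimMul1 (Mul (Lit 1) b) = elimMul1 b
elimMul1 (Mul a (Lit 1)) = elimMul1 a
elimMul1 (Mul a b)       = Mul (elimMul1 a) (elimMul1 b)

-- True when no Mul node has Lit 1 as a direct operand.
noMul1 :: Expr -> Bool
noMul1 (Lit _)         = True
noMul1 (Add a b)       = noMul1 a && noMul1 b
noMul1 (Mul (Lit 1) _) = False
noMul1 (Mul _ (Lit 1)) = False
noMul1 (Mul a b)       = noMul1 a && noMul1 b

propElimMul1Elim :: Expr -> Bool
propElimMul1Elim = noMul1 . elimMul1

-- Sized generator: halving the size at each level keeps trees finite.
instance Arbitrary Expr where
  arbitrary = sized gen
    where
      gen 0 = oneof [Lit <$> arbitrary, pure (Lit 1)]
      gen n =
        let n' = n `div` 2
        in oneof
             [ Lit <$> arbitrary
             , Add <$> gen n' <*> gen n'
             , Mul <$> gen n' <*> gen n'
             ]

-- One possible fix (an assumption): simplify the operands first, then
-- inspect the simplified results.
elimMul1Fixed :: Expr -> Expr
elimMul1Fixed (Mul a b) =
  case (elimMul1Fixed a, elimMul1Fixed b) of
    (Lit 1, b') -> b'
    (a', Lit 1) -> a'
    (a', b')    -> Mul a' b'
elimMul1Fixed (Add a b) = Add (elimMul1Fixed a) (elimMul1Fixed b)
elimMul1Fixed e         = e

main :: IO ()
main = do
  quickCheck propElimMul1Elim          -- typically reports a counter-example
  quickCheck (noMul1 . elimMul1Fixed)
```

A small input that triggers the bug is Mul (Mul (Lit 1) (Lit 1)) (Lit 2): the inner Mul collapses to Lit 1, and the buggy version leaves it as an operand of the outer Mul.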
So most languages have at least one implementation of property-based testing. There is an incomplete list on Wikipedia, with the link up there, but there are some decent or popular implementations missing from that list, including one called pyqcy for Python, which I'll show on the next slide, and also the Functional Java test module.
So for the Python example, again, we're just looking at the list reverse example for the sake of familiarity. So there's this qc, for QuickCheck, and the statement of properties is, well, obviously the syntax is Python, but conceptually it's very similar. Instead of stating the types, in the case of Python you have to explicitly specify the generator for integers. And in the body of the function you just assert the equality. It's similar for propRevAppend, except this time we specify a generator for a list of int. And then if we run this test, propRevUnit passed 100 tests, and this one's a bit slower, but you can see that propRevAppend also passed. So we have a high degree of confidence that the implementation of rev is correct.
So, some of the limitations of property-based testing. Consider password hashing and verification. Here we have propVerifyEq, which states that if you verify the password s against the hash of s, that should be true. Okay, that makes sense. For propVerifyNotEqual, given two passwords s and s', with the precondition that those passwords are unequal, verifying s' against the hash of s must be false. Now, this is a perfectly sensible algebraic property. However, what if there's a bug in the hash function and it truncates its input to, say, eight characters before it does the hashing? Well, the problem is that when you generate s and s', you're very unlikely to get two random values with a long common prefix. So you're very unlikely to trigger this sort of bug with random values. And there are many kinds of bugs like this. So in some cases, random data isn't enough.
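A sketch of the two password properties against a deliberately broken hash, assuming the QuickCheck library. The Digest type and the bodies of hash and verify are hypothetical toys invented for illustration (this is not a real password hash); only the truncation to eight characters comes from the talk.

```haskell
import Test.QuickCheck

-- Toy digest type; a real system would use a proper password hash.
newtype Digest = Digest String
  deriving (Eq, Show)

-- Hypothetical hash with the bug under discussion: only the first eight
-- characters of the input take part in the result.
hash :: String -> Digest
hash = Digest . reverse . take 8

verify :: String -> Digest -> Bool
verify s h = hash s == h

-- Verifying a password against its own hash succeeds.
propVerifyEq :: String -> Bool
propVerifyEq s = verify s (hash s)

-- Verifying a different password against the hash fails.
propVerifyNotEqual :: String -> String -> Property
propVerifyNotEqual s s' = s /= s' ==> not (verify s' (hash s))

main :: IO ()
main = do
  quickCheck propVerifyEq
  -- Usually passes despite the bug: two independent random strings
  -- almost never share an eight-character prefix.
  quickCheck propVerifyNotEqual
  -- A hand-picked pair shows the bug directly.
  print (verify "password-two" (hash "password-one"))  -- True
```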
You do actually need to be a little bit clever and fuzz the data in order to hit the particular corner cases that could be present in whatever sort of code it is you're writing. Of course, this relies on developer cleverness, so it's not ideal. But, yeah, in some cases purely random testing is not quite enough. So to resolve the problem in Haskell, you might write a function fuzz which takes a password and returns a generator of new passwords, generating random truncations, extensions, permutations and so on. And then you can state a new property, propVerifyFuzz, that takes one password and returns a property, and explicitly uses this generator, fuzz s, to check the not-equal property.
Arbitrary, and its analogs in other property-based testing frameworks and other languages, is great for generating random data, sorry, random valid data. But what happens when you need to specify and test the behavior of your functions under invalid data? In that case, I recommend just using an example-based test. In this case it's Hspec, but it could be PyUnit or RSpec or whatever. So we describe load: it fails on a bogus input string, if it's JSON we're dealing with, and the result should be Nothing. But there are still useful properties we can test in this case, and that's that if we do a round trip of dumping an arbitrary value a and then loading it back in, we should get back the original value.
So in conclusion, property-based testing is true automated testing. It gives you more thorough testing in less time, which means less money, and it relieves developers of the burden of having to be clever, or of knowing about the corner cases and manually writing tests for those. Properties are also meaningful documentation that just happens to be machine-checkable. So the best test data is random test data, but sometimes a bit of domain-specific non-randomness is needed, and example-based testing still has its place.
Now a brief look at some alternative approaches.
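Before the alternatives, here is a sketch of the fuzz generator and the round-trip property described above, assuming the QuickCheck library. The toy hash is a hypothetical stand-in with the eight-character truncation bug, and dump and load are stand-ins built on show and readMaybe because no JSON library is assumed; none of these bodies come from the talk.

```haskell
import Test.QuickCheck
import Text.Read (readMaybe)

newtype Digest = Digest String
  deriving (Eq, Show)

-- Hypothetical toy hash that truncates its input to eight characters.
hash :: String -> Digest
hash = Digest . reverse . take 8

verify :: String -> Digest -> Bool
verify s h = hash s == h

-- Generates passwords "near" s: truncations, extensions and permutations.
fuzz :: String -> Gen String
fuzz s =
  oneof
    [ (\n -> take n s) <$> choose (0, length s)  -- random truncation
    , (s ++) <$> listOf1 arbitrary               -- random extension
    , shuffle s                                  -- random permutation
    ]

-- The not-equal property, checked against fuzzed variants of s.
propVerifyFuzz :: String -> Property
propVerifyFuzz s =
  forAll (fuzz s) $ \s' -> s /= s' ==> not (verify s' (hash s))

-- Stand-ins for a serializer and parser.
dump :: [Int] -> String
dump = show

load :: String -> Maybe [Int]
load = readMaybe

-- Round trip: loading a dumped value gives back the original.
propRoundTrip :: [Int] -> Bool
propRoundTrip a = load (dump a) == Just a

main :: IO ()
main = do
  quickCheck propVerifyFuzz  -- typically fails, exposing the truncation
  quickCheck propRoundTrip
  print (load "bogus")       -- Nothing
```

The invalid-input case stays an example-based check (the final print here, or an Hspec test as in the talk), while the round trip is the part that suits a property.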
Exhaustive testing is one, which says that the best test data is all of the data. So what that will do is generate all the data, or all the data up to a given size or depth, and check that the property holds in every case. This supports existential properties, so instead of saying, you know, for all x this property holds, you can say there exists some x for which this property holds, and that can sometimes be useful. Exhaustive testing is available in several languages.
And finally, proof. The best test data is no test data. So some languages have theorem-proving capabilities. In these systems, your properties become theorems, and if you don't provide a proof for those theorems, you don't have a program. Typically it's a compiler error. These also support extraction to mainstream languages like OCaml and Haskell, and also Java and JavaScript and C in the case of Idris. And I will show you a quick example in Idris now.
So again, we're going to look at the list reversal function, so hopefully that's fairly familiar now. The syntax is also reasonably similar to Haskell. So we have the definition of rev here. We have our rev unit property, and its type is actually an equality here. The implementation is a reflexivity tactic that, given the definition of rev, can deduce that this equality holds: the reverse of the singleton list containing a is the singleton list containing a. For reverse append this is a little more involved, and we introduce a metavariable here.
Let's fire this up in the Idris REPL. So you can see that we have an undefined metavariable here, the proof for rev append. So we can enter an interactive proof mode for it. We use intros to move all of the variables and assumptions into the proof context here, and this down the bottom is our proof goal. So we can perform induction on xs. We use the compute tactic to simplify the goal.
And then we can rewrite the goal using a theorem already provided in the standard library, appendNilRightNeutral. This is a theorem that states that if you append an empty list to a list, then the result is the original list. And we can apply that to rev ys. Right. And now you can see that both sides of the goal are the same, so we can just use the trivial tactic to discharge that. And we now have the goal for the second induction case. So again we use intros to move the induction hypothesis and the variables into the context. We use compute to simplify the expression. We use another theorem provided in the standard library to rewrite the expression. No, sorry, it's the induction hypothesis next. So we rewrite with symmetry using the induction hypothesis. And now you can see that the goal is almost the same on both sides. There are just these brackets. So we can use an associativity theorem to rewrite that expression: rewrite with appendAssociative applied to rev ys, rev l0 and t0, and then trivial. And there are no more goals. qed, addproof, quit. And now you can see that this proof has been added to our source file.
So in some cases it's not only possible but feasible to prove the correctness of your code. If you're working on very critical code, where if it doesn't work then people are going to die or things are going to blow up or something, this is something you might consider doing as well.
So, some resources. Right, so there's the QuickCheck paper from 2000, the original paper by Claessen and Hughes, which justifies and explains the mechanisms that make QuickCheck work in Haskell. There's a blog post by Tony Morris about automated unit testing of Java code using ScalaCheck. This is actually a really good post because it talks about some of the principled reasons why this sort of testing is good.
So things like how it moves the developer to formulating their tests in a way that involves the formation of a logical theorem, which Tony asserts is an essential part of what we do as programmers. There's a UC San Diego lecture which is a great introduction to QuickCheck. Again, the URL is up there. There's a talk by my mate Dave Laing, QuickCheck: Beyond the Basics, which talks about some of the gotchas in QuickCheck and how to work around them. And finally, if you're interested in learning Haskell, then bitemyapp's Haskell learning path is a great resource. So thanks for listening.
We have time for questions too. More than two questions. Yes.
How scalable is this property type thing? If you're looking to test user input validation, or transforms for data storage and things like that, how do you deal with all the properties?
Yeah. So there's a follow-up paper to the original QuickCheck paper, and I'm speaking specifically about QuickCheck here. I'm not sure what the situation is for other languages, but there is a follow-up paper that talks about how to use QuickCheck to test effectful code. So that's the sort of thing you're talking about: code where you're interacting with the real world using IO, or interacting with a database, and you're expecting to make assertions about the state of the world outside the pure code at the start of the test, at various stages throughout a computation, and then at the end. So I'd recommend that resource, and I'm sure there are plenty of blog posts and whatnot as well.
What about even just, say, a validation stage, where it might be quite complex, and if you're looking at traditional unit testing you might need dozens of specific tests to check the cases that you think might be in there? Is that something property testing works well for?
Well, it just depends on what you're testing.
So there'll be a number of properties that apply to what you're doing, to a particular algorithm or to a particular unit of functionality. So we looked at the list reverse example, where there are, along with the type, exactly two properties that uniquely characterize it. Well, if you're going to look at a queue, there are actually six properties for a queue. So the more complex the data structure, the more properties there may be, but then it's quite surprising how few properties there are for some fairly elaborate data structures.
So there's actually kind of a third way of viewing test data for testing, and that's that there is some bug in the program and the goal of the testing is a randomized search to find it. So you start off with some set of randomized data, and then, through a set of rules that are probably also applied randomly, you mutate that data and continue to iterate over your test set until you find the bugs, or give up because you've spent a day generating test data. I did a similar presentation to this at the X developer summit last year, and that was the approach that I took for the case I was working on. It is very similar, but kind of not really the flip side of the coin, maybe the edge side of the same coin.
Sure, so is that talking about something like having some known good data that you start with and then starting to mutate that in particular ways? Because it sounds like it's kind of random, but there's some known starting point where you would begin with those tests.
Right, so in the case that I was testing, I was checking that a compiler was generating data structure layouts that adhered to a particular ABI. So I would generate some random data structures, and then gradually generate more and more random data structures that moved things around or increased the nesting of data structures and did different things. The test cases would gradually grow until you had these several-thousand-line data structures, and then it would find a bug, and it would apply a different set of rules to gradually trim stuff out of the data structure until you had a minimal reproducing test case. Because looking at a data structure that has 100 levels of nesting and is 1000 lines long, where do I even look for that bug? But then it goes through and trims it down to, here's 12 lines, and now it's easy.
Yep, right, and so that's the purpose of the shrink function here, which I didn't talk about at all because it's a little bit more advanced. What shrink is designed to do is basically this: you have a counter-example that QuickCheck has found, and shrink is just designed to let the programmer give the tool some heuristics on how to reduce the complexity of the data structure. But there aren't a lot of smarts in the current implementation of shrinking in QuickCheck. It's just, here are some more examples that may or may not fail; if you find a simpler example that fails, go with that and try to simplify it again. Eventually it can't simplify it any more, and it declares that that is a minimal counter-example.
Cool, that sounds good. We're about done for questions, unless someone has a question that results in a 10-second answer.
How does the theorem proving with Idris compare with something like Coq?
They're quite similar, but syntactically they're quite different. I first began exploring proofs and certified programming in Coq. I actually learnt some Idris because I felt that, with the syntax being similar to Haskell, it would be a bit more accessible to show you guys, but I'm actually quite a bit more familiar with Coq than with Idris.
Thank you.
Great, thank you very much.