All right, folks, it's Friday. It's the last day of the conference. Don't be sad, we have the full day today, we have two days of sprints tomorrow, and lots of fun still to be had. I want to introduce three speakers for the first talk of Friday here in Brian. The talk is titled "Automatic Testing of Python Functions Based on Contracts", a very interesting subject. And we have here Marco, Philip, and Laurent. I hope I didn't pronounce your names too badly. We'll start with a video of Marco's presentation. He's here and might be able to participate in the Q&A, but there are some technical issues, so we go for a pre-recorded video, and then we will go through Laurent and then Philip. I believe that's the order for the live rest of the presentation. So folks, you have 45 minutes. It's a very interesting topic, I'm very excited to hear about it, and I will leave the floor first to the recorded video and then to the two of you. Thank you so much.

Thank you, Philip, very much for the introduction. Now let's see what contracts are. Think for a second: what's a good function? There are many qualities that a good function needs to fulfill: efficiency, readability, maintainability, evolvability, and so on. Pretty high on the list is correctness. A good function needs to be a correct one; it needs to do what it should. If it doesn't do what it should, then it's a buggy function, and then probably also not a good function. Now, if you look at this example where we have a function that takes two arguments, we have an implementation that tells you how something works, but there is also the specification of the behavior of the function, and that tells you what the function is doing. You need to specify this "what" if you want to check whether the implementation is correct, and our tools help you with these checks. Then you can add proper naming.
Instead of do_something, we call our function approximate_sqrt, and we also name the arguments: there's a number and there's a precision. For many of you, it is probably already pretty clear what this function does. Now we can add the docstring. We can describe in human-readable text what the function does, and we can also specify the details. Here we say that the number, the precision, and the result are related in terms of the absolute difference. The problem with naming and docstrings is that they are human-readable text; they cannot be automatically checked. So this documentation tends to rot in large systems. You write a function, you implement it, then you go back to other tasks, then you come back to the function. You change the implementation, but you often forget to change the documentation. For example, imagine here if you change the absolute difference to a relative one: the readers would still think the function is computing the absolute difference, but your function now does something else. The documentation is also ambiguous; this is one of the bigger problems. Here we did not specify whether the numbers can be negative or zero, and what about the precision? You can add type annotations. They already help a lot, and in Python you can even add ranges to type annotations. The problem with type annotations is that they cannot capture relations. So though you could add the ranges, for example you could say the number is non-negative, you cannot say that the result is related to the number and the precision in terms of absolute difference. Now you can write unit tests. You can pick a couple of input points and then assert that certain properties hold on the result. The problem with unit tests is that we often tend to forget about edge cases; we only pick a couple of obvious input points. For example, I often forget, almost always forget, to check for NaN (not a number).
And in this case, when you compute an approximation, NaN can even result, for example, in an endless loop. An alternative is to use property-based tests. Instead of picking only a couple of input points, you use a framework, here we use Hypothesis, you specify the whole input domain, and then you assert on the properties of the output. Maybe some of you also saw the talk by Zac Hatfield-Dodds, who showed the framework in more detail. Now the problem with property-based tests, of course, is that there's some learning curve. You need to use a framework, so you need to get familiar with it, and the strategies for defining and generating the inputs can become complex in certain cases. Another problem with property-based tests is that they only run in tests. They live in a separate module, separate from your code, so you cannot check properties during staging or production. And in our experience, a lot of bugs actually happen in these environments. You catch some of the bugs in your tests, but oftentimes it's actually the real users who will discover the bugs. Now you could write assertions. You write assertions at the beginning of the function, and you write assertions just before the end of the function. The first assertions are called preconditions, and the assertions just before your return are called postconditions. Assertions are good: they can be checked in production and staging, but they're hard to process by analysis tools. You need to parse the whole body of the function and do some inference on the code to figure out what your preconditions and postconditions are. You need to be careful about multiple returns: you need to repeat the assertions wherever you return from the function. And when you have inheritance and when you deal with instance methods, that's when assertions become really tedious, because you have to be careful not to break the Liskov substitution principle.
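As a concrete sketch of this assertion style (the Newton iteration here is my own filler implementation, not necessarily the exact code from the slides):

```python
def approximate_sqrt(number: float, precision: float) -> float:
    # Preconditions: checked at the top of the function.
    # Note: `number >= 0` is False for NaN, so NaN is rejected here too.
    assert number >= 0, "number must be non-negative"
    assert precision > 0, "precision must be positive"

    # Newton's method for the square root.
    guess = max(number, 1.0)
    while abs(guess * guess - number) > precision:
        guess = (guess + number / guess) / 2.0

    # Postcondition: checked just before the return -- and it would have to
    # be repeated before every other return, if there were any.
    assert abs(guess * guess - number) <= precision
    return guess
```

The duplication of the postcondition at every return is exactly the tedium that contract decorators remove.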
Now we present our solution. It's based on contracts. We use a contract library called icontract here, which introduces decorators for the function. So we write contracts here; we actually rewrite the assertions and turn them into contracts. There are preconditions, here listed with the @require decorator, and we specify the postconditions with the @ensure decorator. The preconditions need to hold before the function executes; these are the contracts that the caller of the function needs to fulfill. The postconditions need to hold after the function finishes; they need to be fulfilled by the function. The nice thing about contracts is that they live close to the code, so when you change the implementation, you can immediately also change the contracts. They can be processed by analysis tools: now we just need to parse the conditions, and that's much easier than parsing the whole body. They can also run in staging and production. They give you much better documentation, because you can use a Sphinx plugin, for example, and then all these conditions are also listed in your documentation, which is very practical when you, for example, write a library. And they're formal: the reader does not need to think hard about, you know, "a positive number" — does it include zero or not? The contract is really clear: number greater or equal to zero. So we used icontract; the tools are based on the icontract library, and you can find the repository on GitHub. It provides a rich ecosystem: there's a plugin for Sphinx, and there's also a plugin if you use FastAPI, so you can include contracts in your REST endpoints. We collected a list of recipes to help you get started, and there's more to this ecosystem. But please note that there are also other libraries, like deal and dpcontracts and others, so icontract is not the only contract library in Python. Please note at the end that you should combine all these approaches; they're all complementary.
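To show the shape of this decorator style without reproducing icontract's actual internals, here is a minimal stdlib-only stand-in for @require and @ensure. The real library does much more (for instance, it reports the source text of the violated condition and the argument values); this is only a sketch of the idea that condition lambdas name the arguments they need, with the return value available as `result`:

```python
import functools
import inspect


def require(condition):
    """Minimal stand-in for a precondition decorator."""
    def decorator(func):
        sig = inspect.signature(func)
        wanted = inspect.signature(condition).parameters

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            # Pass only the arguments the condition lambda mentions.
            assert condition(
                **{k: v for k, v in bound.arguments.items() if k in wanted}
            ), "precondition violated"
            return func(*args, **kwargs)
        return wrapper
    return decorator


def ensure(condition):
    """Minimal stand-in for a postcondition decorator; `result` is the return value."""
    def decorator(func):
        sig = inspect.signature(func)
        wanted = inspect.signature(condition).parameters

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            scope = dict(bound.arguments, result=result)
            assert condition(
                **{k: v for k, v in scope.items() if k in wanted}
            ), "postcondition violated"
            return result
        return wrapper
    return decorator


@require(lambda number: number >= 0)
@require(lambda precision: precision > 0)
@ensure(lambda number, precision, result: abs(result * result - number) <= precision)
def approximate_sqrt(number: float, precision: float) -> float:
    guess = max(number, 1.0)
    while abs(guess * guess - number) > precision:
        guess = (guess + number / guess) / 2.0
    return guess
```

Note how the conditions now sit next to the signature, stated once, instead of being scattered across the function body and every return.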
Name your functions and your arguments properly, use type annotations, write docstrings, write unit tests, write property-based tests, and also use contracts. You know, use whatever you can to make your functions correct. And note that there is no solution that fits them all. For example, oftentimes a complete specification is not possible. You can write property-based tests to cover some part of the input, but some part of the behavior also cannot be formally expressed. Maybe you can write only some contracts, so you don't have a 100% specification, but already these few contracts will probably reveal a lot of bugs in staging and production. For example, even if you only write "input must be positive", whenever you pass in a negative input, you will reveal a bug. And oftentimes there are bugs in the caller code as well. So do make sure that you also, you know, figure out something for staging and production. And then, to increase the coverage, write unit tests for all those cases where you cannot formally express and specify them. Thank you very much. And now Laurent will present icontract-hypothesis, a tool that infers the generative Hypothesis strategies based on the contracts. Laurent, the stage is all yours.

All right, thank you very much, Marco. So as Marco said, I'm Laurent, and I'm going to present to you the first of two tools, called icontract-hypothesis. icontract-hypothesis is an integration of icontract with Hypothesis, basically combining the design-by-contract idea from icontract with property-based testing in Hypothesis. For those who don't know Hypothesis, here is Hypothesis in a nutshell, or all you need to know for this presentation. Hypothesis is a property-based testing library in Python, where you can write parameterized tests and define data generators, called strategies in Hypothesis, which gives you an easy way of testing your functions against a lot of random inputs. So that's all you need to know for now.
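In code, that nutshell looks roughly like this, assuming the hypothesis package is installed (the property being tested, idempotence of sorting, is just an illustration):

```python
from hypothesis import given, strategies as st


# A property-based test: `st.lists(st.integers())` is the data generator
# ("strategy"), and Hypothesis calls the test with many generated inputs.
@given(xs=st.lists(st.integers()))
def test_sorting_is_idempotent(xs):
    once = sorted(xs)
    assert sorted(once) == once


test_sorting_is_idempotent()  # runs the whole battery of generated examples
```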
There's, of course, much more to Hypothesis, but then you should attend one of the talks or just go to the website. Why should you use something like icontract-hypothesis? It's because property-based testing is hard and tedious. Hypothesis already does a great job of making it much easier to write these tests, and icontract-hypothesis combines the power of Hypothesis with the power of icontract by integrating contracts and Hypothesis. So you can use icontract-hypothesis to generate property-based tests efficiently and automatically. And although it does a great job of assuring you of some properties in your code, it does not cover all situations, so mind that you still need extra tests for your code. What does icontract-hypothesis exactly do? You have your preconditions in icontract; you see them here on your left. And icontract-hypothesis will match common code patterns to Hypothesis strategies. For example, if you have a bounded integer balance, then it matches that to the integers strategy with a minimum and maximum value, or you can also match patterns on string arguments. So there are common code patterns that we can match, but there are still, of course, preconditions that cannot be matched, and in this case we add them as filters. Here you can see, for example, a filter that keeps only the integers where the value modulo two is zero, so even numbers. You have to pay attention if you have a lot of filters in your strategies, or very restrictive filters, because they may reject too much data, and in the end you will end up with a test that's not worth much, because you don't have any data to test against. So now I will give a short demo of what you can do with icontract-hypothesis. There are multiple ways in which you can use icontract-hypothesis, and the first way is by using it as a library in your Python code.
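A sketch of the strategy-plus-filter combination described here, assuming the hypothesis package is installed (the exact bounds on balance are made up for illustration):

```python
from hypothesis import given, strategies as st

# A precondition like `0 <= balance <= 100` maps directly onto strategy
# bounds; `balance % 2 == 0` has no direct strategy equivalent, so it
# becomes a filter that discards non-matching generated values.
even_balances = st.integers(min_value=0, max_value=100).filter(
    lambda balance: balance % 2 == 0
)


@given(balance=even_balances)
def test_balance_satisfies_preconditions(balance):
    assert 0 <= balance <= 100
    assert balance % 2 == 0


test_balance_satisfies_preconditions()
```

A mild filter like this one is fine; a filter that rejects almost everything is what starves the test of data.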
So we start here from the function that Marco has shown previously, approximate square root, with its preconditions and postconditions. We will start by just writing a simple test in Hypothesis, without taking into account any of the preconditions. It takes the same arguments as our original function, number and precision, and now we can just call our function approximate_sqrt, because we have our postcondition and also an assert statement that will tell us if our test fails. With the given decorator here, we specify the strategies in Hypothesis; we just want a bunch of floats, both for number and precision. And with these three lines, we have our test case, and we can just run it and easily see the results if anything breaks our code. Now, of course, we haven't taken into account the preconditions, and in one of the preconditions we said we want a precision that's strictly greater than zero, because of course a perfect approximation cannot be calculated. So next, we try to make sure that all the inputs do satisfy our preconditions. The first, simple way is by making sure that we just reject all the input that doesn't satisfy our preconditions. This is called assume_preconditions, basically a function that you can generate with icontract-hypothesis, and what it does is just reject all the input that doesn't satisfy our preconditions. Then we already get a better result: as you can see, our function apparently doesn't handle big numbers very well. But although it's already an all-right solution, it's still very inefficient, and we still need some boilerplate code to write the test case. icontract-hypothesis allows us to make it much easier, with just a single line of code, and to use a more efficient way of testing by inferring a strategy. So that's the inferred strategy: we infer the strategy for approximate square root and then test the function against those inputs.
So as you can see here, we already have a different output, a falsifying example, and that's because under the hood it's different from the previous solution. Now we have tested our code with the inferred strategy, but of course most of the time we will want to know what strategy was used, so we can also print the strategy that was used for the tested function. Here you can see the two arguments, number and precision, with their strategies. So this is the first way: you can use it as a library in Python. The next thing is the command-line tool. You can just install pyicontract-hypothesis, and then the first way you can use this tool is to test your code directly. You have the test command, and then you specify the file you want to test, which includes the same function as before. And as you can see, we get the same output as before as well. So it's just going to test your code against the inferred strategies, and then you get an overview of what input breaks your code. Also via the command line, you can inspect which strategy was used: it tells you the strategy, and you can also see for which function it was used. The next functionality of pyicontract-hypothesis on the command line is the ghostwriter. It's very similar to Hypothesis's ghostwriter, and what it does is write entire test suites for you, so it's basically the easiest way to write tests. It manages the imports, and then you get a class, a unittest test case, with the test for your function. If you have multiple functions in your file, you would also get multiple test cases. There are some flags you can pass to the ghostwriter. The first one is if you want to see which strategies were used: you have the explicit flag, and then you get a more verbose overview and a more verbose test case, and if you want, you can modify it to your needs. If you just want the test case without all the stuff around it, there's the bare flag.
So that was the command line, the second way. The third and final way you can use icontract-hypothesis is in your IDE. There already exist plugins for the PyCharm IDE, for Visual Studio Code, and for Vim, and it's very easy to use. I will show you here how I use it in PyCharm. You can just install it in PyCharm: go to the plugins and look for the icontract-hypothesis plugin. So we have the same function as before; just right-click on your function, and you get an overview of what's possible. It's the same functionality as before, but now very accessible in PyCharm. The first option is to just test your function: when you're developing, you can quickly run your test and see what's going on. The next is to see what strategy corresponds to your function. The third is to use the ghostwriter, so you get the test case or test suite right there. And the last option is to directly write your test suite to a file, so you have an external test suite that you can modify and use. So this is one of the easiest ways to quickly test your code with icontract-hypothesis, and that's how you can use it. To end my part, let's just look into the future of icontract-hypothesis, or what we envision as its future. Right now, if you have a function with multiple variables and relations defined between them, then you get a piece of code that's not very readable and not efficient. You need to use fixed dictionaries that you filter, and you just put a whole precondition into a filter, and this will often lead to very inefficient strategies that are not very readable. So what we would like to end up with are composite strategies. This is a functionality in Hypothesis where you can define your strategies procedurally, which makes it possible to define relations between multi-variable conditions.
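A composite strategy along these lines might look as follows, assuming the hypothesis package is installed; the particular relation between the two drawn values is made up for illustration:

```python
from hypothesis import given, strategies as st


@st.composite
def number_and_precision(draw):
    # Draw the values procedurally, so the relation between the arguments
    # is expressed directly instead of through a wasteful filter.
    number = draw(st.floats(min_value=0.0, max_value=1e6))
    # Hypothetical relation: precision is positive and at most number + 1.
    precision = draw(st.floats(min_value=1e-3, max_value=number + 1.0))
    return number, precision


@given(args=number_and_precision())
def test_relation_holds(args):
    number, precision = args
    assert 0.0 <= number
    assert 0.0 < precision <= number + 1.0


test_relation_holds()
```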
It's more efficient, because it often doesn't need a filter, and it also makes the test more readable: here you can quickly see the two input arguments, and you can just read it from top to bottom and see immediately what is going on. So that was icontract-hypothesis, and I will now give the stage to Philip, who will introduce his tool, CrossHair. Thank you very much.

Great, thank you, Laurent and Marco. So I will actually cover three topics. The first is CrossHair, which is a tool that's quite similar to icontract-hypothesis. I will also talk a little bit about a corpus that we've been developing for contracts, and I'll sort of field Q&A. So firstly, CrossHair. I am the primary maintainer of CrossHair, and CrossHair is a tool to check contracts, but it works a little bit differently than icontract-hypothesis. It works with symbolic execution. Hypothesis is a tool that will apply sort of random, heuristic inputs, but CrossHair attempts to verify your contract in a more formal way: we use a theorem prover to reason about all possible values when we run your code. To be a little bit more specific, we find one concrete path through your code and reason about all values along that path, and then we'll try other paths. So it's concrete paths, but with symbolic values. I won't go into too much about how symbolic execution works. If you don't already know much about it, that's fine; if you'd like to learn more, there are some links and information about how CrossHair works in more detail at the website. But today I am going to mostly just get into the demo. I'm going to attempt a completely live demo for us today, so I'm gonna describe a potential problem. In this problem, we are implementing an online shopping system, and the objective is to compute a total price for your online shopping order. The inputs to such a function are a set of items that the user is buying: a list of line items.
The line items each have an item ID for the item that you're purchasing and a quantity. In order to compute a total, you'll also need prices for all of those items, so this is a dictionary that maps a string, which is intended to be the item ID of the line item, to a float price. Of course, don't use floats for monetary units, but in this example we'll do this. And then we will return a total. The implementation for this function is fairly straightforward: we start with an empty total, we loop over the line items, we add the price for each item multiplied by the quantity, and then return the total. Now, one thing we would like to ensure for our online system is that nobody should be able to check out with a total of zero. You can already imagine many ways in which you can pass inputs that would not meet this, and we could start implementing preconditions with the @require decorator to ensure it, but one of the nice things about tooling like CrossHair and Hypothesis is that we can use these tools to just sort of tell us what those preconditions are. I'm using PyCharm, and there's a PyCharm plugin for CrossHair, so we can continuously run CrossHair via its watch command. This will sit here and think about the function and try to find inputs that will invalidate the postcondition. We already have one here, so we're gonna go through these one by one and start adding preconditions to see if we can make CrossHair happy. So one thing it's noticed is that there's a KeyError: here the item ID is an empty string. A little weird, an empty string, but okay. This does update live, so maybe we'll deal with this one first. The next one is that the prices dictionary has a zero price for this item ID, and so one thing we should ensure is that prices always has positive prices.
If you're like me, you write your comprehensions in reverse: for all of the prices in the values, we want to ensure that the price is greater than zero. Great, so we got that one, but here we see that items could be empty, so we should also require that the length of items is greater than zero. Let's see if that makes it happy. Not quite. Okay, here we don't have a... oops. Okay, here's a counterexample, and in this case we have a positive price, we have item IDs that match, but the quantities are zero. So quantities should be greater than zero. Again, we need `for i in items`. What do we need to ensure? Well, i.item_id should be in the prices dictionary. Oops, I forgot to add prices to the inputs of my precondition. And maybe we did it? Nope, not quite. Let's see: item ID, quantity... ah, the price is five, but the quantities are zero, so we need a positive quantity. I think now we've gotten it. So one thing about this example is that we didn't actually find any problem with compute_total. We did a whole lot of work, but we didn't find any bugs. And this is sort of a different way of thinking about contracts, and useful in a different way as well. One of the things this does for you is it makes a lot of your assumptions explicit. By adding all of these contracts to this function, all of the functions that call it will have to meet these requirements, and that in turn may require more preconditions on those functions. So these conditions spread throughout your system, in much the same way that, say, when you change the type of an argument, you have to change it in a bunch of other places. That may seem like work, but it's actually fairly important in making a lot of important things explicit. As an example, just to go through these one by one: we would like to ensure that when you change the quantity of a line item to zero, we remove it from the cart.
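Collected in one place, the function with all of the preconditions the demo converged on might look like the following plain-assert sketch (the exact shape of LineItem is my assumption, and the demo uses contract decorators rather than bare asserts):

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class LineItem:
    item_id: str
    quantity: int


def compute_total(items: List[LineItem], prices: Dict[str, float]) -> float:
    # The preconditions accumulated during the demo:
    assert len(items) > 0                               # cart is not empty
    assert all(price > 0 for price in prices.values())  # only positive prices
    assert all(i.quantity > 0 for i in items)           # only positive quantities
    assert all(i.item_id in prices for i in items)      # every item has a price

    total = 0.0
    for item in items:
        total += prices[item.item_id] * item.quantity

    assert total > 0  # postcondition: nobody checks out with a zero total
    return total
```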
We would like to ensure that you can't add something to your cart for which we don't have a price. We should also make sure that you cannot get to the checkout part of the system if you don't have any items in your cart. And finally, if you're parsing some third-party price feed, you should validate that all of the prices are greater than zero. So all of these contracts sort of propagate these requirements throughout your system. This is really sort of a beautiful thing about using contracts generally. Okay, so that is this example. I'm gonna stop CrossHair here, and I'm briefly gonna show also the approximate square root function that we've been using elsewhere. So we can watch this guy instead. I just pulled Newton's method for approximating square roots, and you can see here we also find a counterexample, and that's because the function I pulled uses less-than-or-equal-to, whereas our condition wanted strictly less than the precision. So we can just correct that in our implementation, and CrossHair gets happy. Okay, so let's stop that and let me flip back to the presentation. I do wanna talk a little bit about the limitations of CrossHair. The first is that CrossHair is probably in a beta-quality situation. It does not symbolically model everything in CPython. Basically, what's required in order to make CrossHair work is that we have to re-implement, in a formal logic way, everything that's implemented in C. We have a lot of it, but it's not complete, so there are a lot of parts of Python where CrossHair is not that effective simply because we haven't implemented them. Also, because of the way it works with the solver, as code complexity increases it may increasingly have trouble finding or solving the problems that it needs to solve. And then finally, there are some limitations that are built into the solvers that we use. We use a thing called SMT solvers, and there are certain problems that they don't handle very well.
So non-linear arithmetic in general is not decidable. To make that concrete, perhaps an easy example that CrossHair cannot find a counterexample for is this one, where we say that powers of two should all be small. But of course, we all know many powers of two are very big; it's very easy to find a big power of two. You can put just about any non-trivial positive value in for x and you get a very big result, but CrossHair can't tell you that. With icontract-hypothesis, for example, you just have to try some values, and it's easy to find one. So those are some of the limitations of the system. I wanna talk about some related tools. Down the center here, we have some other tools that work very similarly to CrossHair. Many of these are based on research papers and such; I believe CrossHair is the most feature-complete in terms of implementing as much of Python as possible, but you may also be interested in these tools. Some are more toolkit-like; CrossHair is a little bit more product-like. So depending on what you're looking for, some of those other projects can be the right thing to look at. On the left side, we have some other ways of analyzing Python programs that don't use symbolic execution: Hypothesis, and fuzz testing. Fuzz testing is a hugely active area right now, mostly security-based stuff, but those tactics can be used to demonstrate correctness as well. On the right side, we have some symbolic execution tools which are not analyzing Python; angr and KLEE are examples of this. It can be a little confusing, because angr is itself implemented in Python, but it analyzes binaries. So both of these tools analyze code at the machine-code level rather than at the Python-code level, whereas CrossHair is looking at your Python code and understands Python. So those are some of the differences with those tools. Okay, so that's CrossHair.
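The power-of-two limitation from a moment ago is easy to make concrete (the exact bound here is made up): the claimed invariant is false, and while non-linear terms like 2**x are hard for an SMT solver, plain enumeration of a few values falsifies it immediately.

```python
def claimed_invariant(x: int) -> bool:
    # The (false) postcondition from the slide: 2**x stays "small".
    return 2 ** x < 10 ** 6


# An SMT solver reasons poorly about non-linear terms like 2**x,
# but naively trying small values falsifies the claim right away:
counterexample = next(x for x in range(1, 100) if not claimed_invariant(x))
```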
I'm gonna spend a little time talking about our python-by-contract corpus now. The python-by-contract corpus is a set of problem solutions with contracts. Right now we have the 2020 Advent of Code problems, as well as the fall 2019 exercises from Introduction to Programming at ETH Zurich. Both contain good and buggy implementations. So why would we do this? Well, a corpus can be used to benchmark and compare contract analysis tools such as icontract-hypothesis and CrossHair, and that's why we built it. They are also good examples for education: if you're interested in contracts and want to see some examples, you can use it for that. We also strongly welcome contributions, because more code means that we can do more testing of our tools and improve them. So those are welcome, and you can find instructions at this link. By the way, the slide deck is linked in the talk notes in the online system. Okay, so that's it for my sections, and now we'll move on to some Q&A. I think perhaps all three of us are here online.

All right, well, thank you very much. It was a fantastic presentation; I really enjoyed it. It's a hot topic, and that's also reflected in the chat. I see lots of questions flying by, so I'll do my best and ask you the most upvoted questions, the ones that get the highest number of thumbs up, little hearts, that kind of stuff. And we'll start with one immediately; I'll put it here at the bottom of the screen and try to summarize it. People are asking: this is fantastic, but is there an easy switch, when you put your code into production, to maybe disable some of these checks, or all of them entirely? So Marco is actually the author of the icontract library. There is a way to disable it; I don't know if he wants to talk in more detail about that, or whether your connection is good enough, Marco? Yeah, let's try. If it breaks down, you can talk at the end. So yeah, there is a switch in the decorator.
You can, at load time of the module, turn off some of the contracts, or all of them, or whatever way you configure it. So you can be really fine-grained; you can control that at a very fine-grained level at load time, and you can also use implications, like "not A or B", if you want to have that at runtime as well. So there are ways, and they're also listed in the recipes on the icontract repository. Cool, excellent. Another question, related to this very topic, I guess; I will put it here. Let's see. So a lot of us are using a lot of other libraries that we don't write ourselves, which obviously don't have contracts. So what about those? Can you easily, I don't know, maybe add contracts on top of these libraries? Do you have any comments on that? I don't know whether Marco has any thoughts about this, but I don't know a way to do this. You can, of course, wrap the functions that you need with your own functions and apply contracts to those, but I'm not aware of a way in icontract to just apply... well, that's true, you could do some fancy Python things where you just sort of monkey-patch the functions with ones that are annotated. So at least that might be possible; I don't know if we'd recommend it. Yeah, I was thinking more or less along the same lines. Anybody else want to mention anything on this topic? Cool. Another question that flew by is about performance implications. I mean, I imagine all these checks have some performance hit. Do you have a bit of an idea of how much we can expect? So as Marco said, there's... Is this on icontract, or anything? Go ahead, Marco. Sorry, guys. So on the repository, we also present benchmarks. There's basically no efficiency penalty if you turn the contracts off. Of course, you need to pay for what you're using, so if you turn the contracts on, they will run as fast as they run as Python code, and then there's some marginal overhead due to the decoration.
So you have this, but it's also due to how Python works. Assertions are the fastest way if you just want to use assertions, but then you pay some marginal effort for the decoration, and that's it. We present benchmarks on the repository, so people can check them out. In most practical settings, like in all the production systems I worked with, for example, we didn't really observe any significant overhead that forced us to turn the contracts off. We usually turn off the postconditions, the @ensure decorators, but we leave the preconditions on to catch the bugs in the caller code. Fantastic. Anybody else want to comment on this? If not, we have lots of questions. We don't have a huge amount of time, but some of the questions, actually all of the questions, are fantastic. I'll show another one here, with the little sad face: what about Python 2? Some of us have large code bases still written in Python 2, and the porting, as you know, takes time and effort. Any love for Python 2, or should we really go to Python 3? I think all the tools need at least Python 3.6, even; or, I think icontract might work with 3.5, but yeah, I think it's 3.6 at the moment for all the tools. Yeah, makes sense, makes sense. Cool, a couple more questions. This one is quite interesting: do you folks have any thoughts about this relatively new PEP 647, about type guards, for Python 3.10? I don't know if you're familiar with that. Yes, so the type guards, if I remember correctly, it's that you can add value ranges on the variables, for example on input arguments, right? I hope I'm not mixing it up now. They're okay for single arguments, but they cannot model the relations. So, for example, if you take the approximate square root example, there you cannot say that the result is related to the inputs in terms of the absolute difference: the input is the result squared, up to the precision, in absolute terms and not in relative, et cetera. So I think they are serving different use cases.
The type guards are more about looking at single arguments, whereas contracts are really good when you have relations between the arguments, or even relations to external code, so for example if you have a singleton somewhere, et cetera. I would say the contracts are just more powerful. There was also a discussion on the python-ideas mailing list to introduce contracts in Python natively, but it has not been fruitful yet. Yeah, I had to refresh my memory on this PEP, but I am fairly certain that this one helps you narrow types. So the expressiveness of the typing system isn't changing; it helps you sort of say, when certain functions are producing things of certain types, you can do that more effectively, but it's not changing the expressivity of the type system itself. Whereas with contracts, you can say arbitrary things about your data, which I believe is not true for that PEP. Excellent, fantastic. Thank you. We are just at the break, so there were a couple more questions; I will copy them into the breakout room for Brian. So if you folks are hanging out a little bit longer, you can keep the discussion going in breakout Brian. And thank you again for the talk. It was a fantastic talk, very, very interesting, and lots of people enjoyed it, lots of discussions, very cool stuff. Thank you for taking the time.