Let me welcome everyone to this session on property-based testing and examples of properties from business applications, by Saurabh Dandha. We are glad you could join us today. So without further delay, over to you, Saurabh. Hey, thanks Dinesh. Hi folks. My name is Saurabh. I run a company called Vacation Labs. We are based out of Goa, where it's still raining like crazy. I'm an engineer at heart. I've built large-scale systems in many tech stacks: Rails, Angular, Java. But lately I've been having a love affair with Haskell, and I'm also trying to make learning Haskell simpler through a passion project I've started, haskelltutorials.com. It was through this love affair with Haskell that I discovered property-based testing. I heard about property-based testing at Functional Conf in 2016. Last year I gave version one of the talk I'm about to give right now, but that was primarily intended for a functional programming community. So it had a lot of detail about using Hedgehog, a Haskell library, hands-on code examples, the kind of problems you run into when actually writing property-based tests, stuff that people don't talk about. This is version two of the talk, so here I'll skim over a lot of slides with core Haskell examples, because I'm assuming not many people here are interested in Haskell. Actually, let's try that out. I always do this in my talks: audio and visual feedback. How many people in the room right now know Haskell? There are no thumbs up. One comes up. So I think that's just two thumbs up. So yeah, that's the thing. I've changed my slide now; I'm waiting for the slide to get synced. So when you're talking about PBT, you can take two angles on it.
The first angle is the infrastructure around setting up tests: the libraries you need to use to run property-based tests, the kind of problems you run into while generating random data, things like that. The other is that once you've set up the infrastructure, you have to think of good, meaningful properties. This particular talk is about the latter: thinking about meaningful properties. It will get into very little, almost no, detail about the first half, the infrastructure around property-based testing, which library to use, how to run parallel tests, how to generate random data, and so on, because all my experience there is with Haskell, which is pretty irrelevant for this audience. So keeping that in mind, I'll move on to the next slide. These are all from my previous talk. This is how I got started: John Hughes' talk at Functional Conf 2016, where I first learned about PBT. I was really excited. I went back and tried implementing it at Vacation Labs, and failed miserably. The thing about that particular talk, and he's given the same talk multiple times, is that it's very motivating, very inspirational, but it lacks sufficient detail. In fact, twice at Functional Conf I've uttered this cheeky quote of mine, and I'll do it again: in my experience, only two people know how to use QuickCheck effectively. One is John Hughes and the other is God. And the more I keep looking at this problem domain, the more I realize that the community is still figuring out how to use PBT. So if you're struggling like I was, that's par for the course. The term PBT itself has only become popular over the last couple of years, and people are still figuring out how to harness it and use it effectively. So let's continue the skimming of slides; all of this is not relevant.
This is about getting into details about Hedgehog, the library. And here is where we start with the meat of the current talk. Apart from the infra side, the other side of the PBT problem is: how do you think about properties? How do you write properties in real-life, line-of-business applications? Typical web apps, user registration systems, blogging systems, e-commerce checkout carts, POS systems, the kind of apps that a whole lot of us write to make a living. Because all the examples that I saw back in 2016, and even today the situation is only slightly better, were about the typical "if you reverse a list twice, you should get back the same list". Sure, theoretically it's right, but practically it's useless. The other class of tests everyone spoke about were round-tripping tests: if you have a JSON serializer and a deserializer, you take a data structure, serialize it, deserialize it, and you should get the same data structure back. Now, that actually is a very useful test; I'll also talk about it in one of the slides. But it's limiting. That's not the only thing you can do with PBT, so we have to move beyond that. Compared to 2016, today the situation is slightly better, but really only slightly. While preparing for this talk, I went to Hackage, the Haskell package repository, and looked at all packages that had a reverse dependency on either Hedgehog or QuickCheck, the two most popular PBT libraries in the Haskell ecosystem. I found a lot of them. A lot of them are trying to do property-based tests. I dug into about 80 or 90 of those open-source projects, and almost all of them were using property-based testing in a trivial manner: either it was a round-trip test, or some extremely small part of the entire program was being tested. Reinforcing my thought that, yes, PBT is getting popular, and it is useful.
But figuring out how to put it to use is still a challenge. So this is the typical example. I know this is Haskell code, please pardon me for that, but the idea here is the last line: you reverse a randomly generated list twice and you should get back the same original list. That's the whole point of this test. That's your typical theoretical reverse-a-list-twice sort of example, which I don't particularly like. So let's move on from here. Not interesting. So how do you think about properties in real-life apps? We'll start with some hand-waving examples, and then I'll follow that up with some examples from real-life code that I have written, first-hand experience, and I'll try to spend more time on the thought process that went into writing those kinds of properties. So the first rule for property-based tests. If you really think hard about it, there's a spectrum between property-based tests and example-based tests. Example-based tests are the kind most people are used to writing: you take a known set of inputs, process them through your system, and get an output. Because your input is known, your output is also known, so everything is hard-coded in your test: the input you feed into your system is hard-coded, and the output you compare against is also hard-coded. In property-based tests, you try to generate random inputs. But in real life, you cannot generate completely random inputs, because in any real-life business application there is a known range of inputs within which your application is expected to function. So it ends up being a gray scale. For example, that list example we always use, reverse a list twice and it should give you back the same list, can work for any and every list.
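For readers who want to see the shape of such a test without any Haskell, here is a minimal hand-rolled sketch in Python of that classic reverse-twice property. It uses only the standard library's random module rather than a real PBT framework, and the generator sizes are arbitrary choices for illustration:

```python
# Hand-rolled property test for the classic "reverse twice" example:
# generate many random lists and assert the property holds for each.
import random

def reverse_twice_property(xs):
    # Reversing a list twice should give back the original list.
    return list(reversed(list(reversed(xs)))) == xs

def run_property(prop, trials=200):
    for _ in range(trials):
        n = random.randint(0, 20)
        xs = [random.randint(-1000, 1000) for _ in range(n)]
        assert prop(xs), f"property failed for input: {xs}"

run_property(reverse_twice_property)
print("reverse-twice property held for 200 random lists")
```

A real PBT library adds shrinking (minimizing failing inputs) and smarter generators on top of this basic loop, but the core idea is just this: random inputs, one assertion that must hold for all of them.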
But in all probability, and in my experience, that is not how things work in real-life business cases. You want to generate a set of random inputs, but you want to apply some minor constraints on them so that the inputs are meaningful. However, if you constrain your inputs too much, then on that gray scale between example-based tests and property-based tests, the more constrained you make your inputs, the more you're moving towards example-based tests. So your thought process should be: I need to find a property y which should hold true for all inputs x, where there can be some minor constraints on x, depending upon the business scenario in question. Constrain it too much and you start moving towards example-based tests. That's rule number zero. Now the first one, let's get this out of the way: round-tripping. Basically everyone starts off with PBT using round-tripping tests, and that is not a bad idea. Round-tripping tests are extremely powerful. If you are handwriting any sort of codec, a JSON codec, a CSV codec, writing a record from the database to a file to YAML, once you have that codec, run it through a round-tripping test. Hang on, I'll just have to ask for a thumbs up again. Is this pace correct? I've assumed that people attending this talk have an idea about property-based tests, so I have not gotten into the absolute basic theory of PBT. Is that good enough? I can see four, five, six, seven thumbs up, eight. All right, I'll continue with this pace: not going through the basics, just focusing completely on examples. So this is perfect for serialization and deserialization. My own experience says that for any serializer-and-deserializer function pair, even one that seems very obvious, the round-trip test is literally one line, so write it. You will be surprised by the kind of edge cases that it catches.
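A round-trip property of this kind can be sketched in a few lines. This example uses Python's standard json module as the codec and a small hand-written generator of random nested values, including non-ASCII strings, to stand in for a PBT library's generators; the depth and size limits are arbitrary:

```python
# Round-trip property for a JSON codec: any generated value must survive
# serialize -> deserialize unchanged, including non-ASCII strings.
import json
import random

def random_string(max_len=8):
    # Mix printable ASCII with arbitrary non-surrogate BMP code points
    # so the Unicode handling of the codec actually gets exercised.
    return "".join(chr(random.choice([random.randint(32, 126),
                                      random.randint(0x00A0, 0xD7FF)]))
                   for _ in range(random.randint(0, max_len)))

def random_value(depth=0):
    choices = ["str", "int", "bool", "null"]
    if depth < 2:
        choices += ["list", "dict"]
    kind = random.choice(choices)
    if kind == "str":  return random_string()
    if kind == "int":  return random.randint(-10**6, 10**6)
    if kind == "bool": return random.choice([True, False])
    if kind == "null": return None
    if kind == "list": return [random_value(depth + 1) for _ in range(3)]
    return {random_string(): random_value(depth + 1) for _ in range(3)}

for _ in range(200):
    v = random_value()
    assert json.loads(json.dumps(v)) == v, f"round-trip failed for {v!r}"
print("JSON round-trip held for 200 random values")
```

With a hand-written codec in place of json.dumps/json.loads, exactly this loop is what flushes out the Unicode and escaping edge cases described below.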
So we wrote a simple round-trip test for one of our APIs, that's the third example, which can be used for simple REST APIs. You create an in-memory data structure, convert it to JSON, POST it to your API, and then call another API to get the same resource back. It gives it back to you as JSON, you convert it back into your in-memory data structure, and you compare the input in-memory data structure and the final output in-memory data structure. They should be exactly the same. With this simple test, we realized that somewhere in our API pipeline there was a library we were using which was not handling Unicode properly. When you generate that much random data across your entire input space, even simple strings are going to be all over the place. If you were writing a basic example-based test, for a name field you would use your own name, and for an email field you would use your own email. But in PBT, when you're generating random data, you will generate valid ASCII, valid Unicode, invalid Unicode. If you don't have a round-tripping test, you would never spend the time writing those kinds of edge cases, but a PBT will do that automatically for you. So even though these tests are very simple to write, literally just one line of code, you will be surprised by the kind of stuff they can catch for you. Some examples: perfect for serialization and deserialization functions. The third one we've just spoken about is round-tripping across a REST API. As much as possible in your REST APIs, you want your input JSONs and output JSONs to be symmetric, and that can be tested and validated across refactors and across multiple versions of your app with just one line of code. The other thing is the Unicode example I gave.
Somewhere along your text manipulation pipeline, you're not handling Unicode properly, or you're making assumptions that you will always be working in the English language. Something as simple as splitting a name into first name and last name by splitting on a space works in English, but it might not work for some other language. Conversions to lowercase and uppercase might work for English, but they might not work in other languages. Those kinds of things can be caught very quickly by writing simple tests. Moving on. Now, once you get interested in this problem and start reading about it, a lot of people use these mathematical terms: idempotency, associativity. I have also used them, but let's try to exemplify them. The idea of idempotency is that the result should not change if the same function is applied multiple times to its input. One example is changing the case of a string repeatedly: you call toUpperCase on a string repeatedly, and it should give you the same result, unless there is a bug in your Unicode handling. Very simple example, not very meaningful, but I just put it there. What about the second one? You make a PUT call to update a REST resource. As long as your PUT payload is the same, the resultant resource on the server side should be the same, if you've actually followed REST recommended best practices. So calling a PUT with the same payload repeatedly should not change the server-side representation of the resource: it should be idempotent. Third example: we actually wrote one of these tests and realized that in our Vacation Labs code there was a bug in the way sign-in was working. Otherwise you wouldn't end up thinking of testing these sorts of things. A user opens two tabs, and both tabs have the sign-in page open.
He does a sign-in in one tab and then forgets that he has signed in. Then he opens the other tab, where the sign-in page is still there, and he does a sign-in again. But the second time the POST to /sign-in happens, the auth token is already in place. Are you 100% sure that your current API implementations are handling this edge case? A simple idempotency PBT can help you catch things like this. So that's one way to think about it. The first one was round-tripping: look at your problem domain, and ask whether you can find certain data models, functions, or APIs which should satisfy the round-tripping property. Similarly, can you find certain processes, functions, data models, or APIs which should satisfy the idempotency property? Setting up the overall infrastructure is a bit of a pain, there's a whole lot of boilerplate involved, and that's not the topic of this talk. But once you've set that up, each of these is literally a one-line test, and the value they deliver is pretty high. This next one is a bit of a variation, I don't know the mathematical term for it, on the idempotency sort of property. You have a data transformation and an input. You transform the data, you get an output. Now, the result set between the input and output should be the same. I've seen this a number of times. The most common use case: almost every one of us writing web apps ends up writing a search-and-sort API, either written on top of Elasticsearch or as a wrapper on top of basic SQL. There are at least 20 times in a project that you end up writing some sort of search, sort, and pagination API. So without hard-coding input examples and output examples, one way to verify that it's working fine is to perform the same random search.
But you sort it by different random fields. The search part of the query is the same, but the sort is different. The exact lists will differ, because the order is different, but the distinct result set should be the same. It seems obvious, and you might say it's not a very useful test, but we've actually caught bugs with such a simple property. In our Vacation Labs code, we had a very complex search-and-sort API which was joining across multiple tables, doing a bunch of subqueries, inner joins, left joins, and so on, and using a whole lot of aggregations. And there was actually a bug: depending upon which field you order by, one of the subqueries had to change, and that was not implemented correctly. No amount of manual testing, no amount of example-based testing would have caught that, but this sort of test caught it. There are more variations of this; once we get to the deep-dive examples, one of my longer examples is about the search-and-sort API. This is not the only property; there are other properties around a search-and-sort API that you can think of. All right, that slide should have been deleted, moving on. The other two sorts of tests that people end up writing are associativity and commutativity. In the Haskell space, these are mostly used to verify monoid laws, not very interesting for this particular talk. But here are two interesting examples. Hang on, just quickly: do I need to explain associativity and commutativity, or does everyone understand them? Please thumbs up. Okay, just one thumbs up, I think I'll explain. Essentially, associativity in mathematical terms means that A + (B + C) should be equal to (A + B) + C.
Which means that if you've got three values and an operation which combines those values, it shouldn't matter in which order you combine them. If you combine B and C first and then combine A with that result, versus combining A and B first and then combining C with that result, you should get the same answer. The simplest example of that is addition. The second is commutativity, which means that the order of arguments should not matter: whether you add A to B or B to A, flipping the order should not change the result. Now, in terms of business applications, what does this mean? In terms of data structures, if you're actually handwriting a tree or some esoteric data structure that is required in your problem domain, then yes, you can apply these properties and think very mathematically about them, because then your problem domain is actually the data structure. Data structures like trees, tries, maps, and red-black balanced trees have formal mathematical properties that they need to obey. But let's take that idea and put it into a business application. Look at a hotel booking system. Generate a set of random bookings and cancellations, and keep that set constant. Now apply it to your system: new booking, cancel a previous booking, new booking, whatever. The set of bookings and cancellations is the same, but they are applied in different orders. At the end, the resulting room availability should be the same, irrespective of the order in which bookings and cancellations are applied. There will be an edge case around cancellations, in that you can't cancel a booking before it is made, but that's an implementation detail.
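The booking property just described can be sketched against a toy availability model. Everything here (the event shape, the date range, the initial room count) is invented for illustration; a real system would also have to handle the cancel-before-book ordering edge case mentioned above:

```python
# Order-independence sketch for a toy hotel availability model: apply the
# same fixed set of bookings and cancellations in two random orders and
# assert that the final per-date availability is identical.
import random
from collections import Counter

def apply_events(events, initial=10):
    # events: list of ("book" | "cancel", date) pairs; a booking takes a
    # room for that date, a cancellation releases one.
    avail = Counter({d: initial for d in range(1, 8)})
    for kind, date in events:
        avail[date] += 1 if kind == "cancel" else -1
    return dict(avail)

for _ in range(200):
    bookings = [("book", random.randint(1, 7))
                for _ in range(random.randint(0, 30))]
    cancels = [("cancel", d) for _, d in
               random.sample(bookings, k=random.randint(0, len(bookings)))]
    events = bookings + cancels
    order_a = random.sample(events, k=len(events))  # one random order
    order_b = random.sample(events, k=len(events))  # another random order
    assert apply_events(order_a) == apply_events(order_b)
print("availability is order-independent for 200 random event sets")
```

In this pure model the property is trivially true; the point is that running the same check against a real system, with caching and persistence in the way, is where it starts finding bugs.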
But conceptually, irrespective of the order in which bookings and cancellations are applied, the resulting room availability should be the same. Seems like a very simple property, but in a complex, real-life system, our availability was being cached, and this property uncovered bugs in the caching logic. Once you write this property down, realizing that it should hold for all sets of bookings and cancellations, and you let it run over a large random data set, you will be surprised by the kind of bugs it catches, bugs you would never think of when doing example-based testing. Another example, although I have not written something of this nature myself, follows the same principle: serialized versus concurrent. You have an algorithm, a certain process. You can either run it in a serialized manner, where it picks up one piece of data, processes it, picks up the next, processes it, single-threaded, or you can write a concurrent implementation of it. If your problem domain obeys these properties, you should get the same results whether the processing is serialized or concurrent. As soon as you realize that, you can have two versions: a simplified version which does its job serially, and an optimized version which does its job concurrently, and use PBT to test their results against each other. That is possible only if your inputs and outputs obey these laws, so that you can process them out of order but still get the same results. So that's taking those mathematical ideas, associativity and commutativity, and seeing how they map to typical business problems. Moving on. God, I have only 15 minutes left. All right, can I speed up, or is this pace good enough? Thumbs up. Okay, that was an ill-formed question.
Thumbs up for speed up? Speed up, everyone wants speed up. So one more thing: once you've got the infra in place, I tend to do this. I basically write a no-property test. You've written so much boilerplate, so much infra for generating random input; you let it run on your system and you just observe where your system breaks. There's no property, you don't assert anything. The only assertion, say if you're testing an API, might be that it should not throw a 500 error. Just let it generate random data, let it hit your API, and see where your system breaks. That can result in a lot of learning and a lot of unexpected bugs that you wouldn't have thought about. You can use this to discover potential SQL injection spots, spots where Unicode handling is incorrect, unhandled system states, things like that. Essentially there is no property here; it's just piggybacking on top of the testing infra. You've done so much work to write the infra; just use it to run your system across a large input set. The other one is comparing against another system or implementation. You've got two implementations of the same system. For example, you have a pipeline which delivers things from a cache, and a pipeline which delivers things not from the cache but by hitting the database directly. For any input, they should return the same results, with some constraints: if you make some changes, you may have to wait for an offline job which updates the cache to complete. You put that precondition in your test infra, but after it is met, your cached results and your non-cached results should be the same. Optimized algorithms should have the same results as brute force.
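That last point can be made concrete with a standard example of my own choosing (not one from the talk): a brute-force maximum-subarray sum checked against Kadane's optimized algorithm on random inputs. Any pair of slow-but-obvious and fast-but-subtle implementations fits the same pattern:

```python
# "Optimized should agree with brute force" as a property test.
import random

def max_subarray_brute(xs):
    # O(n^2): try every contiguous slice (empty subarray allowed, sum 0).
    best = 0
    for i in range(len(xs)):
        total = 0
        for j in range(i, len(xs)):
            total += xs[j]
            best = max(best, total)
    return best

def max_subarray_kadane(xs):
    # O(n): Kadane's running-best formulation (empty subarray allowed).
    best = cur = 0
    for x in xs:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best

for _ in range(300):
    xs = [random.randint(-50, 50) for _ in range(random.randint(0, 30))]
    assert max_subarray_brute(xs) == max_subarray_kadane(xs), xs
print("optimized and brute-force implementations agree on 300 random inputs")
```

The brute-force version is easy to convince yourself is correct, so it serves as the oracle for the optimized one, which is exactly the cached-versus-uncached pipeline situation in miniature.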
Specifically, if you're migrating legacy systems to a new implementation, you can use PBT to test results from the new implementation against results from the old one. So I'm speeding up. Next: a special case of the previous slide is to implement a simplified model of the system under test. On the previous slide, both implementations might already exist in your code: a cached implementation and a non-cached implementation both need to be there anyway. But in this case, you write a simplified implementation only for testing. If you want to use PBT and you have a complicated stateful system, this is sometimes important. It's a lot of hard work to get right, based on my personal experience. You end up with two implementations of the same system, and when you find bugs, sometimes the bug is not even in your original system but in your simplified model, so you really have to look at the intersection of the two to find bugs. Moving on to the more in-depth example; we have about 15 minutes, which should be enough. This time I should be able to finish my deck in time. This is the odd-jobs library, a database-backed queuing library which we wrote in Haskell and tested using property-based tests. A quick overview of what this looks like: this is the UI, where you can see that you have the ability to search, sort, and filter jobs. The backend for this is Postgres, not Redis, not anything else. So this search and sort is basically a Haskell API written on top of SQL, a very common problem across most business apps. Now, using PBT to test job filtering: is the API, written on top of simple SQL WHERE clauses to search and sort, implemented correctly or not? I'll skip over this Haskell code; this is from my previous talk.
So essentially the reason it was worth even writing a test for this is that I was not using any advanced SQL library. I was using string concatenation to generate raw SQL strings, because I wanted to keep the dependency footprint of the library low. These slides are all about actually setting up the test infrastructure; I'm going to skip over them, they're all from the previous talk. Now, the problem is that with normal tests you have known data. If I were writing this three years ago, to test filtering I would end up writing 20, 30, 50 known rows into the DB, making a bunch of known search-and-sort query calls, and then testing against a known set of good results. Basically example-based testing. But as soon as you move to PBT, the rows you are inserting into the DB are random, and the filter query you are issuing is also random. What do you compare it against? This is where you compare against a simplified model. In this case, Haskell alert again, the simplified model was this: it's a job queue, so we're searching and sorting over jobs. You generate a bunch of in-memory jobs, you write them to the database, and you retain them as a simple list in memory. You generate a random filter. The random filter would look like, I'll just show the data structure: a filter on the basis of created-after, created-before, updated-after. You randomly generate values which fit this data structure. You run the same filter via the system, which hits the database and gets you the results, and you run the same filter against the simplified implementation. The simplified implementation does basic list filtering, and limit/offset is done using the simple take and drop functions: drop offset elements and then take limit elements.
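The shape of this model-based comparison can be sketched outside Haskell too. In this sketch, both sides are plain Python functions (the "system" side is only a stand-in for the real SQL-backed implementation), and the job fields and filter shape are simplified inventions; slicing plays the role of Haskell's drop/take:

```python
# "Test against a simplified model": a system implementation of
# filter + limit/offset checked against a deliberately naive model.
import random

def system_filter(jobs, created_after, limit, offset):
    # Stand-in for the real SQL-backed implementation.
    matched = sorted((j for j in jobs if j["created_at"] >= created_after),
                     key=lambda j: j["created_at"])
    return matched[offset:offset + limit]

def model_filter(jobs, created_after, limit, offset):
    # Naive model: filter, sort, then drop `offset` and take `limit`.
    matched = [j for j in jobs if j["created_at"] >= created_after]
    matched.sort(key=lambda j: j["created_at"])
    return matched[offset:][:limit]

for _ in range(300):
    jobs = [{"id": i, "created_at": random.randint(0, 100)}
            for i in range(random.randint(0, 50))]
    f = (random.randint(0, 100),   # created_after
         random.randint(0, 10),    # limit
         random.randint(0, 10))    # offset
    assert system_filter(jobs, *f) == model_filter(jobs, *f)
print("system and simplified model agree on 300 random filters")
```

In the real test the left-hand side is a round trip through Postgres, so disagreements point at the hand-built SQL rather than at the model.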
Filtering by created-after is basically: give me all the jobs whose creation time is greater than or equal to the given time. So this is a simplified in-memory implementation of the same filtering. You apply the same filter to the in-memory list, you apply the same filter to the jobs you've just stored in the DB, and you compare them against each other. Now, with this simplified-model approach to the system under test, you end up having two implementations to define, and this is where the art lies: how simple or how detailed should your simplified implementation be? That only comes with practice; there is no one-size-fits-all answer. There is another example from the same case study, another simplified model to test against. The previous one was only for the search-and-sort API; the job monitor UI needs searching and sorting functionality, and that test was just for that. This one is for the core job runners. You have a job monitor which is monitoring the queue for new jobs. It spawns multiple threads and gives each job a timeout in minutes; if a job doesn't finish within that many minutes, it's assumed to have crashed and gets re-queued. So there's a whole lot of stateful logic going on here. The ultimate test that was written for this, it's kind of hard to explain the Haskell code right now, but I'll try to explain it conceptually. You generate tons of random jobs, you store them in the DB, and you start running the jobs. So your system is running, and as and when jobs are being picked up, you maintain an in-memory audit log. You basically use the hooks that the job queue itself provides. So whenever a job is started, you can call a function.
Whenever a job fails, you can call a function. So you use these hooks to maintain an in-memory audit log, which creates a serialized representation of what the job queue did. After some time, you stop the job queue and let it stop gracefully, and then you start running property-based tests against the serialized in-memory audit log you've collected. Here are the kinds of properties that I came up with. The first one, which was actually added a little later, is that no job should be in a locked state. This was added after graceful shutdown was implemented. If you have 10 jobs running in memory and you want to shut down, the system waits for the jobs to complete execution. After a wait time, if they are still running, it kills those jobs, but it needs to unlock them so that some other thread can pick them up. If that is implemented properly, at the end of this entire process no job should be in a locked state: if graceful shutdown is implemented correctly, irrespective of whether a job was in the middle of running, the locked-job count should be zero. The next one is that all jobs should have been picked up. Given the precondition that we've given the job queue enough time to run, every single job should have been attempted at least once. You can basically look at the audit trail; there is a column maintained for each job with the number of times it has been attempted, and that's what this property checks. The next one is that no job should have been simultaneously picked up by more than one worker. This is for ensuring that there are no race conditions, because the SQL which figures out what job to pick up next is complicated, and as we speak, that SQL is being changed even further to introduce more features which the community wants.
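Checking a property like "no simultaneous pickup" over a recorded audit log is itself just a small function. In this sketch the log format, a list of (job_id, worker, start, end) intervals, is invented for illustration; a real run would collect something equivalent via the queue's hooks:

```python
# After-the-fact property over an audit log: no job may be held by two
# workers at overlapping times.
from itertools import combinations

def no_simultaneous_pickup(log):
    # log: iterable of (job_id, worker, start, end) pickup intervals.
    by_job = {}
    for job_id, worker, start, end in log:
        by_job.setdefault(job_id, []).append((worker, start, end))
    for runs in by_job.values():
        for (w1, s1, e1), (w2, s2, e2) in combinations(runs, 2):
            # Two pickups of the same job must not overlap in time.
            if s1 < e2 and s2 < e1:
                return False
    return True

good_log = [(1, "w1", 0, 5), (1, "w2", 6, 9), (2, "w1", 0, 3)]
bad_log = [(1, "w1", 0, 5), (1, "w2", 4, 9)]
assert no_simultaneous_pickup(good_log)
assert not no_simultaneous_pickup(bad_log)
print("audit-log overlap check behaves as expected")
```

The other properties (no locked jobs at the end, every job attempted at least once, concurrency limit never exceeded) are similar small scans over the same log.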
So this sort of simple, one-line test at the end of it all says that, after running this entire system and observing its execution in the form of an audit log, at no point should a job have been picked up by more than one worker. The last one, the concurrency-control-violated check, is about concurrency: there is another feature wherein you can control how many concurrent jobs the job runners can run, and that can be dynamically controlled on the basis of your CPU load, your memory pressure, your IOPS pressure. So again, looking at the audit trail, you verify that at no point was the concurrency control violated. Now, these tests are still not perfect, we are still evolving them, but the amount of stuff they can catch, and the amount of confidence that this one test gives, is very high. So in my test suite, if I can quickly show you, it's not all property-based tests, by the way. Is this Git URL visible? Let me, yeah, all right. So if you see, there are these simple example-based tests: jobs should be created, jobs should be scheduled, failure handling, and so on. One test each, I didn't bother with any more than that. The bulk of the quality assurance is being done over here: I just wrote one test called "test everything". It goes crazy running jobs, actually stress-testing the system as a result, and then ensures a bunch of properties on the results. All right, moving on. I guess just three minutes are left. I don't think I can get into this particular example in depth, but this one is not from Haskell actually, this is from the Rails world. Once I started reading up and doing more on PBT, I started putting all of that learning back into the Rails world as well. In this particular example, we have a SaaS platform with metered usage: a certain part of the platform has metered usage, and it depends on a bunch of factors.
And those factors grow combinatorially. So we were testing whether the metered-usage billing code is implemented correctly across all the combinations of those factors. Earlier we had example-based tests, and we always ended up hitting corner cases which were not tested. So finally, again using the test-against-a-simplified-model kind of approach and randomizing the entire input, we wrote much better tests. So, Dinesh, I'm almost up, right? 4:15, this gets over, right? Yes, yes. All right, first time finishing this off on time. So if you're interested in using PBT outside of the mathematical, core-data-structure kind of environment, here's a bunch of stuff that I have read that has some interesting ideas. The first one is "Property-Based Testing in a Screencast Editor". This particular gentleman, Oscar, you should check out his work; this is actually a paid PDF, that's a Leanpub link. He basically wrote a screencast editor and wrote property-based tests for it. His journey is exactly the same: struggling with PBT, figuring out what the properties of his system are, and so on. It walks you through the entire thought process, and a number of times he came up with incorrect properties as well; towards the end of his project, he realizes that a property itself is incorrect, that his understanding of the spec is incorrect. His other project is even more interesting. It's called Quickstrom. It uses property-based tests for testing web-based UIs. It's still very much in a research phase, but it's very interesting: an actual real-life application of property-based testing. If he manages to make it work, it's going to be an amazing case study. The second link, "Thinking in Properties" by Susan Potter, is a talk; the third link is also a talk. The fourth one is a blog post from the F# world; not Haskell, but similar ideas. That's about it. Well, thanks, Saurabh, for sharing your experience with us today.
All right.