Hi, hi everybody, thank you for coming to my talk. My name is Dmitry and I create software that works in some cases. For the last few years I have been working in the field of API testing, and I build tools that aim to make it a simple and effective process. For example, I implemented one of the most performant JSON Schema validators in Rust, and I created Schemathesis, a project that reduces API testing to a single command. I have also contributed to popular packages like Hypothesis and django-money.

Today we are going to talk about effective API testing, and let's start from the opposite: what happens if it's ineffective? First of all, logical errors. For example, if you have bad input requirements, a user might be able to top up their account with a negative amount of money. Then, outdated API docs can be a huge source of confusion: teams working on the project get out of sync, and it becomes harder to grasp what's going on. Security issues are also possible. Last year's research from Microsoft shows that at least four security properties should be in place, but of course there are many more of them. For example, if your input validation doesn't work well, it's possible to craft a payload that will slow your application down, and eventually that may lead to denial of service. And of course, unexpected inputs might lead to crashes. Besides all of these issues, we get a lot of wasted time, higher cognitive load and potentially even financial losses.

But how did we get here? Modern systems are complex, and to test them well we need to balance resources, time and system complexity. And as practice shows, with all these trade-offs Murphy's law applies: anything that can go wrong will go wrong.

So we more or less understand the problem and how we got there. Here are some common "solutions". I've heard things like "let's hire more people", "let's do overtime", or "let's just buy more powerful hardware", and I think that is quite a superficial way to approach the issue. Hiring more people increases the amount of communication; Fred Brooks formulated it a long time ago: what one programmer can do in one month, two programmers can do in two months. As for overtime, the law of diminishing returns generally holds: the more you work, the less productive you become, and even worse, it often leads to burnout. Better hardware? Maybe, but it's better to know why exactly you need it and whether it will help in the long run. All of these practices may work short-term, but all of them are likely to have negative consequences if applied blindly.

The good thing is that we tend to formalize testing processes. It gives us precious structure, but people are incredibly good at forgetting things, and we are still limited by our imagination. Nowadays there are many tools that can automatically generate schemas or documentation from the application code, or vice versa. That surely removes a lot of headaches, but often it doesn't work well; for example, the FastAPI framework doesn't always generate valid Open API schemas, and there are many cases like that.

And there could be a more effective solution. It's better to start at the design stage, so we prevent some issues in the first place. We need to reduce the costs, because our time is the most valuable resource, and we need to automate as much as possible, which is closely connected to costs. In my opinion, the most important principle in API design is to make the API hard to misuse. Here are some examples of where we can prevent problems.
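To make the negative top-up example above concrete, here is a minimal sketch of ruling it out with an input constraint. It assumes the `jsonschema` package; the schema and field names are made up for illustration.

```python
# Rejecting negative top-up amounts at the API boundary with JSON Schema.
from jsonschema import ValidationError, validate

TOP_UP_SCHEMA = {
    "type": "object",
    "properties": {"amount": {"type": "number", "exclusiveMinimum": 0}},
    "required": ["amount"],
}

validate({"amount": 100}, TOP_UP_SCHEMA)  # passes silently
try:
    validate({"amount": -100}, TOP_UP_SCHEMA)  # raises ValidationError
except ValidationError as exc:
    print(exc.message)
```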
Let's take a look at one of the most popular serialization formats, YAML. Here is a list of countries, but when we try to deserialize it, we might not get what we expected. The same applies to strings like "yes", "no", "on" and so on. So we need to escape the strings, and then the list is parsed correctly. Or maybe we can just use a more reliable format like JSON: at least we will know that there were no transmission errors and that we are not working with a partially written document.

We can also prevent typos. Here we have a schema that accepts an object with a "success" key. Expectedly, this value passes, but "succes" without the last letter will also work, and that's not what we might expect. To solve this, we can add a list of required fields, and then everything works as expected and no typo is possible. Alternatively, we can restrict additional properties, and then the key remains optional. So by balancing strictness and flexibility, you can prevent some erroneous inputs.

Another source of problems is regular expressions. Do you remember the Cloudflare outage from 2019? It was caused by a certain kind of regular expression that is very resource-intensive. For example, this regular expression took about a minute and a half on my machine, and I bet that a string of 50 letters won't be processed in any reasonable time at all. So please pay extra attention to regular expressions; if possible, use alternative ways to express your API constraints, or use regular expression engines that guarantee linear complexity.

So now let's take a look at how we test things. For example, we can check that the commutative property of addition holds. With the example-based approach, we can write something like this in Python: a function that accepts two arguments and verifies the property, and our test cases with pytest, written manually. Of course, it's super straightforward. But again, we can only test the things we can think of, and often many edge cases are missed. In the end, we also need to maintain all these manually written test cases, which adds maintenance costs.

There is a different approach that looks at the data and generates test cases instead. The most popular library for property-based testing in Python is Hypothesis. In this example, we reuse the same test function, add some data generation strategies from Hypothesis, define our numbers as integers or floats, and connect them with the test function's arguments. Running this quickly reminds us that the commutative property of addition holds for real numbers, but not for every float, which is something we might not expect initially. With property-based testing you get a high variety of input data, which in many cases can be inferred from input types. Then there is test case minimization, called shrinking, so you get minimal failing examples instead. It also often expands your understanding of how your code actually works, by showing you many, many corner cases. But generally it requires some guidance: you need to define your data generation strategies, unless they can be automatically inferred. These two approaches work perfectly together, so use both. Runnable sketches of the YAML pitfall, the typo-prone schema, the pathological regex and the two testing styles follow below.
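First, the YAML pitfall as runnable code. This assumes PyYAML, which follows the YAML 1.1 rules that cause the surprise.

```python
import yaml

# Unquoted "NO" (Norway) is resolved to the boolean False by YAML 1.1 parsers.
print(yaml.safe_load("[DE, FR, NO]"))    # ['DE', 'FR', False]

# Escaping (quoting) the strings makes the list parse as intended.
print(yaml.safe_load("[DE, FR, 'NO']"))  # ['DE', 'FR', 'NO']
```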
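Next, a sketch of the typo example, assuming the `jsonschema` package; the "success" key is from the slide, the rest is made up.

```python
from jsonschema import Draft7Validator

# A lenient schema: unknown keys are silently accepted.
lenient = {"type": "object", "properties": {"success": {"type": "boolean"}}}
# A stricter variant: the key is required and unknown keys are rejected.
strict = {**lenient, "required": ["success"], "additionalProperties": False}

payload = {"succes": True}  # note the missing last letter
print(Draft7Validator(lenient).is_valid(payload))  # True: the typo slips through
print(Draft7Validator(strict).is_valid(payload))   # False: the typo is caught
```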
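Then the regular expression problem. The exact pattern from the slide isn't preserved in the transcript, so here is a classic catastrophic-backtracking pattern of the same kind; its runtime roughly doubles with every extra character.

```python
import re
import time

pattern = re.compile(r"^(a+)+$")  # nested quantifiers: the danger sign

for n in (20, 22, 24):
    start = time.perf_counter()
    pattern.match("a" * n + "!")  # the trailing "!" forces backtracking
    print(n, time.perf_counter() - start)
# Engines with linear-time guarantees (RE2, Rust's "regex" crate) reject or
# safely execute such patterns instead of blowing up.
```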
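Finally, the two testing styles side by side. This is a sketch of the code described above: the same test function, first with handwritten pytest cases, then with Hypothesis strategies.

```python
import pytest
from hypothesis import given, strategies as st

def add(a, b):
    return a + b

# Example-based: only the cases we could think of.
@pytest.mark.parametrize("a,b", [(1, 2), (0, 0), (-1, 1)])
def test_commutativity_examples(a, b):
    assert add(a, b) == add(b, a)

# Property-based: Hypothesis generates integers and floats for us.
@given(a=st.integers() | st.floats(), b=st.integers() | st.floats())
def test_commutativity_property(a, b):
    assert add(a, b) == add(b, a)
# This fails on inputs like b=nan, because nan != nan: commutativity holds
# for real numbers, but not for every float.
```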
Getting back to the API testing topic, there are actually a lot of properties we can use for property-based testing. For example, we expect that our API won't crash and that there will be no unhandled errors. In the general case, we expect that input matching the schema will be accepted and that invalid input will be rejected, with some general limitations, of course. Then we expect that responses match their definitions and that all documented examples work. And there could be many more properties, including expected response times, authorization on certain endpoints, and so on.

So some time ago I started working on a tool that combines API schemas with property-based testing. It's called Schemathesis, and here's how it works on a toy example. Let's say you can create a user and need to pass a name as the payload, something like that. Schemathesis takes the API schema, creates Hypothesis strategies for all API operations defined in the schema, makes requests to the API and verifies the responses. It supports three different schema formats: Open API 2, Open API 3, and also GraphQL. And generally you need only a valid API schema to make it work. But actually, not necessarily a valid one; here is some statistics on that. There is a project called OpenAPI Directory, which is a collection of various API schemas. At the time of writing, it had 3,225 Open API schemas, which are mostly syntactically and semantically valid in terms of individual keywords. Some of these schemas are incomplete: they contain references to other files that do not exist. But Schemathesis can process most of them, including the semantically invalid or incomplete ones; if the issue does not affect data generation, it works, no problem. In terms of API operations, Schemathesis works on more than 97% of them. The rest are edge cases, I would say: recursive schemas, logically unsatisfiable ones where you just can't generate a fitting example, regular expressions the engine can't handle, and similar issues. But there is room for improvement, for sure.

Let's see what the tests look like. Here I will show you a small Python test that utilizes our pytest integration; a sketch of it follows below. You can load your API schema from the network, a file, a dictionary or any readable buffer, and then this test will exercise all endpoints with the built-in checks. Schemathesis has five built-in checks at the moment; they cover server crashes and API schema conformance, but you can easily write your own. You run this example with pytest. In case of an error it will show you the failing operation, here POST /users, and it can record multiple failures per operation. So here's an error message, the response payload and a Python snippet that will help you reproduce the failure. Soon it will be possible to get a cURL command instead, as an option.

So these are unit tests. They are usually very fast to run and pretty easy to customize: you can use regular pytest fixtures, and the loaders and strategies here are regular Python objects. They are pretty easy to integrate with the rest of your test suite for the same reason. Schemathesis will also test the explicit examples defined in your schema, and if those examples are incomplete, it will fill in the missing places, so you don't have to write them out completely. Here's an example of a payload that the test from the previous slide generates. Or, if you don't want to write any Python code, you can use our command line interface: a single command will run the same tests for all available operations in the given schema.
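Here is a minimal sketch of that pytest integration. The loader and method names follow the Schemathesis 2.x API and may differ in other versions; the schema URL is made up.

```python
import schemathesis

# Load the schema over the network; from_path, from_dict and from_file
# loaders also exist for files, dictionaries and readable buffers.
schema = schemathesis.from_uri("http://localhost:8000/api/openapi.json")

@schema.parametrize()  # generates one parametrized test per API operation
def test_api(case):
    response = case.call()            # send the generated request
    case.validate_response(response)  # run the built-in checks on the response
```

The rough CLI equivalent is `schemathesis run http://localhost:8000/api/openapi.json`.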
There is also a Docker container for your convenience, and a lot of command line options to tune the behavior.

Schemathesis can also generate sequences of API calls. Let's extend our example API: we create a user and get an ID back, and then we can get information about this user by passing that ID as a path parameter to another operation. Schemathesis utilizes a very cool Open API feature called Open API links, where you can specify how different operations are connected. Let's say that when you get a 201 response on user creation, you have a link to another operation called "get user", and you need to pass the user ID parameter there, which we extract from the response body using JSON Pointer syntax; basically, you extract the value of the "id" key. A sketch of such a link definition follows at the end of this section. Schemathesis will then automatically generate API calls accordingly: it tries to create a user and, if that succeeds, takes the ID and calls the connected operations. The more connections you have, the more different random sequences of API calls will be generated, with all dependent calls in place. For example, if you have a connected update operation, the created user might be updated before the "get" call, or vice versa; all combinations are possible.

Let's take a look at how we can write such a test with our Python API; a sketch follows below as well. The implementation is based on Hypothesis state machines that describe all possible transitions within the schema. Let's say you have already loaded your schema with the code from the previous example. Then you create a state machine and expose a test class, which is a regular unittest test class, and you run it with pytest. In case of errors, you'll see a report like this: an error message, the response payload, things like that. Additionally, there will be a Python snippet that you can basically copy-paste to reproduce the failure. It's usually pretty verbose, depending on your API's complexity, but you should be able to run it as is.

So, integration tests. They are slower, because they utilize techniques like swarm testing, which is remarkably effective at finding bugs: it trades some performance for a higher defect discovery rate. You will need to specify the connections with the Open API links syntax in your schema, or you can do it programmatically in your tests; there is an API for that in Schemathesis. But this approach may uncover bugs much deeper in your application, bugs that are practically impossible to find otherwise.

So why might you consider using Schemathesis? If you want to spend less effort on finding bugs, the simplest way to use it is to run a short CLI command, and it may find a lot of complex problems and edge cases. For example, Schemathesis knows how to maximize arbitrary metrics like response time or response size, thanks to Hypothesis' targeted testing, and that allows you to find denial-of-service and amplification attack vectors. You also get a great diversity of generated examples that cover a lot of the valid and invalid inputs allowed by your API schema; I will tell you about invalid cases in a bit. And you can easily generate custom data like CSV files and so on. There are five built-in checks, as I said; they let you check the implementation for compliance with the API schema, for example status codes, response schemas, required headers, content types, and so on.
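Here is the Open API link from the example, shown as the relevant response fragment (as a Python dict to stay in one language; the operation ID and parameter name are assumptions).

```python
# Response definition for POST /users: a 201 links to the "getUser" operation.
create_user_responses = {
    "201": {
        "description": "User created",
        "links": {
            "GetUser": {
                "operationId": "getUser",
                # Extract the "id" key from the response body via JSON Pointer
                # and pass it as the "user_id" parameter of "getUser".
                "parameters": {"user_id": "$response.body#/id"},
            }
        },
    }
}
```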
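And a minimal sketch of the stateful test itself, again following the Schemathesis 2.x API:

```python
import schemathesis

schema = schemathesis.from_uri("http://localhost:8000/api/openapi.json")

# Build a Hypothesis state machine; its transitions come from Open API links.
APIWorkflow = schema.as_state_machine()

# Expose a regular unittest-compatible test class; running it with pytest
# executes random sequences of connected API calls.
TestAPI = APIWorkflow.TestCase
```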
And here we have pretty flexible data generation: you can use any existing Hypothesis strategy and tune individual request components like the body, query, headers, and so on. You can specify your explicit examples, and Schemathesis will fill in the gaps. There is a command line interface, and you can record your tests and rerun them later with VCR-style cassettes. For Python users, it's possible to test ASGI and WSGI apps directly, without hitting the network; a sketch of that follows at the end of this section.

There are a few features in progress at the moment. We are working on negative testing, so it will be possible to exercise scenarios with deliberately invalid input, and it will be possible to automatically generate XML and binary formats like images, and so on.

At the moment I'm also involved in academic research about API fuzzing, in cooperation with Zac, who is a Hypothesis core developer. The main question we want to answer is whether this approach is effective at all. The short answer is yes; the long version is that it depends on many factors, including schema complexity, how precisely the schema defines its inputs, and so on. We are still running our experiments and evaluating the results. The test suite includes around 20 open source projects and eight different tools, including Microsoft's recently open-sourced RESTler project. The results are in progress and cover only a subset of what we planned, and this part represents only Schemathesis, which works on all tested services in a reasonable amount of time and with reasonable hardware resources. So far we have tested 18 services with around 1,000 API operations in total. We crashed around 20% of those operations, and these projects are pretty popular, I would say, and found various non-conformance issues in around 30%. Some of the crashes are caused by the same application code paths, just as some non-conformance issues are caused by the same schema components; deduplication and categorization are in progress. These issues have different severity, of course, but they all become more visible with Schemathesis, and it's up to the project maintainers how to handle them. Anyway, I plan to report them in the future and to open-source the test suite itself.

So here's a little glimpse of what will happen next. Maybe we will port Hypothesis' core to Rust. Generally speaking, Hypothesis handles the general case outstandingly well, but for complex data structures, like complex API schemas, there are some limitations. There is already some work going on: there is a core engine written in Rust, but it's currently used only by the Ruby version of Hypothesis. The long-term plan is to make it usable from the Python version as well; in the end, it will benefit the whole ecosystem. Also, I'm building Schemathesis as a service, where you will get dynamic recommendations on improving your schema, and it's much faster. We will have automatic link inference, so you won't have to write Open API links by hand, plus testing of API callbacks, report scheduling and the usual things for a seamless CI integration.

So, to summarize: keep in mind that a good API is hard to misuse in the first place, and that you can prevent many problems upfront by designing things right. Property-based testing is effective at finding defects in web applications. And try out Schemathesis: let me know what you think, and please consider supporting our work; contributions and donations are welcome. Here are some links: you can find Schemathesis on GitHub, feel free to write me an email or ping me on Twitter, here's my handle, and Schemathesis as a service will be available here. That's all I have.
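For Python users, here is what the in-process testing can look like: a minimal sketch with an ASGI app (FastAPI here, purely as an assumed example), using the Schemathesis 2.x API.

```python
import schemathesis
from fastapi import FastAPI

app = FastAPI()

@app.get("/ping")
def ping():
    return {"pong": True}

# FastAPI serves its generated Open API schema at /openapi.json by default.
schema = schemathesis.from_asgi("/openapi.json", app)

@schema.parametrize()
def test_api(case):
    response = case.call_asgi()       # no network involved
    case.validate_response(response)
```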
Thank you. Thank you very much for having me.

Thank you very much, Dmitry. That was an excellent talk on an important topic. We had one question in the Q&A, which was from Kasai, and I apologize if I mispronounced that: is this only for testing Python apps? I know one other person put a comment in there, but I'd like to hear your view in the two minutes we have left.

Definitely not. You can use it with any application. Schemathesis itself is written in Python, but it works over the network, so you can run a Docker image with Schemathesis, point it at your API schema location, and it doesn't matter what language your API is written in.

And you mentioned a container, so for people who are not Python programmers that's an easy way to onboard and use Schemathesis?

Yeah, that's right. Also, you can use Microsoft RAFT, I believe it's called. It combines multiple different tools, and it includes Schemathesis; they use our Docker images directly.

Okay.