Hi, everyone. Very happy to be here. As you said, my name is Stefan, I'm a tech lead at Yelp, and in this talk I'm going to take you on a little journey of testing services at Yelp. This journey, or rather me talking about it, started two years ago, when I gave a talk at EuroPython that was also about testing services. So, a quick recap of what I talked about back then.

If we look at the testing pyramid, that talk focused on the top end of the pyramid, or what is supposed to be the top end: end-to-end tests, sometimes called system tests, which inside Yelp we usually call acceptance tests. These are the slowest and most expensive tests to run, since typically you spin up each of your dependencies, then their dependencies, and so on. That is obviously not always a fast or cheap thing to do.

So why do we do this? Why was I talking about just these tests? The reason is that when you have many services, and we do have many services at Yelp, they are all interconnected in a huge graph. Your code is split across repositories, and when you execute it, it is separated by a network layer. Verifying the correctness of this whole system is no trivial task, and end-to-end tests help significantly in that effort, because you try to replicate something as close to production as possible, and so you actually gain confidence that everything works well together.

Obviously it would be better to rely more on something like unit tests, but the problem is that as soon as you test something that interacts with an outside system, you have to mock it out, and with something as complicated as service API calls it is not always easy to get the mock right. Sometimes a test fails and you don't know whether your code is wrong or the mock is wrong; other times a test passes but your code is still buggy, because you just adapted the mock to fit your buggy code. That is why our testing pyramid doesn't really look much like a pyramid: it is a lot of unit tests, maybe a few integration tests, and then again quite a lot of end-to-end or acceptance tests. And that is why we invested in making end-to-end tests easier to write and easier to run, and why I gave a talk about running end-to-end tests more quickly and about managing state in all of the other services you include in your tests.

But it turns out these end-to-end tests are not just slow, they can also be flaky. This is an extreme example here, but we were genuinely running into issues because we spin up all of these other services in Docker containers and need to create state in some of them. We had Docker failures, we had request timeouts, and we had high hardware requirements; in this extreme example, the box we used to run the tests was simply too small. So this is not something that scales infinitely, especially since we also have a large monolith, and it turns out it is almost always used in some form, along with a lot of other services. So we continued our investment in making these runs better. And here, this is a different view from Jenkins.
We are actually able to run these acceptance tests, these end-to-end tests, in a reasonable amount of time and with no errors. We use mountebank, an open source project, to do this. When we do these full test runs with all of the dependencies, we record the communication between our service and the other services, or really between all of the services we spin up, and then we can play it back. Like many companies, we even augmented this a bit: when we record these requests and responses, we attach the context, meaning which test this belongs to. So we don't rely on executing the tests in order; we can parallelize them or shuffle them around, and when you call a service ten different times from ten different tests, it will always return the answer recorded within that test's context. I would say this is what, for example, Martin Fowler describes; self-initializing fakes is what he calls it, a form of contract testing. It eliminates most of the flakiness we saw due to timeout issues. There is still some Docker flakiness, and it only helps when you can run in replay mode; record mode still has all of the problems I mentioned, and we do need to run tests in record mode at least once while an engineer works on a feature branch, because if you change the behavior of your service or the pattern of services it calls, you typically have to re-record these interactions.

So maybe making end-to-end tests better and better is not the solution. What can we do? Contract testing. There are many articles and papers about it; I just picked this one definition: contract testing is writing tests to ensure that the explicit and implicit contracts of your microservices work as advertised. In this talk I'm going to focus specifically on consumer contract testing, that is, testing the contract from the consumer, the caller of another service. Often when people talk about contract testing, they mean testing against a real instance, or a real instance with a special isolated mode that doesn't require any of its dependencies, or maybe a stub or a fake. But what if we could use in-process testing like unit testing, with mocks, but with validated mocks, so that we can actually verify that the mocks we provide adhere to the contract? This is what I'm going to talk about. It turns out that at Yelp we can already do that, and we can do it for each and every request that is made between two services.

To explain that, let me very quickly describe how we do service-to-service requests. At Yelp, each service has an API specification, and this will be our contract; this actually is the contract. We use OpenAPI, or Swagger as it was called for a long time, but there are other systems you could use just as well. When we talk to another service, we use a client library: for each destination service, each producing service, there is a client library. It is versioned, and it ships with the Swagger/OpenAPI specification of the service we're calling. Every time that specification changes, we create a new version of the client library, and then people can use it. And when you make these API calls, what you really make is Python function calls.
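As an aside, here is a minimal sketch of what such a call could look like with the plain open source bravado client; the spec URL, the business resource, and the get_business operation are all hypothetical, and Yelp's internal, versioned client libraries wrap this a bit differently.

```python
# Minimal sketch: calling another service through bravado (names are made up).
from bravado.client import SwaggerClient

# Build a client from the destination service's Swagger 2.0 specification.
client = SwaggerClient.from_url("http://business-service.internal/swagger.json")

# The call reads like an ordinary Python function call; bravado uses the
# specification to construct, send, and unmarshal the HTTP request/response.
business = client.business.get_business(business_id=1234).response().result
print(business.name)
```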
Then this client library does all of the magic in the background. The main part of it is an open source library called bravado, which we maintain and which you can check out. So how does that help us get fast testing that provides confidence similar to end-to-end tests, at least in some respects?

If we look at how the client library works internally, there are several steps. First, it takes the function call and, from the information encoded in it, not only the arguments we pass in but also the name of the function, it constructs the request by looking at the API specification. It then marshals the data, which is the process of converting Python objects into the on-the-wire format, which for us is typically JSON. It then has the option of validating that marshaled, on-the-wire data against the specification to make sure it is valid, and finally it sends out the HTTP request. The response goes the other way around: first it is adapted for bravado, because you can use different HTTP clients, then optionally validated, and then unmarshaled, that is, turned back into Python objects.

So what could we do here? Typically, in a unit test, you would mock out the whole function call: you have this function call that translates into a service request, you mock it out and provide a return value. What if instead we let most of our client library machinery actually run? We just make sure it doesn't call the remote service, and the mock we provide is the on-the-wire format of the response, which then runs through the validation. The validation is done by a library called swagger-spec-validator, which in turn uses jsonschema, so it is pretty good at making sure the data is valid according to our contract. And since we know the contract, we can even provide a default mock for free: developers can specify their own, but we can look at the structure of the response, we know all of the fields, all of the values, all of the sub-objects, and we can generate default data for it.

If you look at a code example, as you can see here, all we need to do is patch the client lib. That is one function call, which is actually a context manager, and anywhere in our code where we use this specific client library, it is replaced with our mock machinery. Then we call our function here, which you can assume makes this service call, and you can see it returns data even though we never specified a mock. It is initialized to some defaults: in this case zero for integers, the empty string for strings, and so on. It knows the names of the days because that field is actually an enumeration. And that is it. It turns out that in the OpenAPI specification you can even provide an example response for each response, and if that is present, we use it, so instead of zeros and empty strings you get the example response that the developer of the other service provided.
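The internal patching helper is Yelp-specific, but the core idea can be sketched with nothing more than mock and jsonschema. Everything in the snippet below is hypothetical and heavily simplified: the response schema, the patch target myservice.clients.business.get_business, and the helper name; the real machinery derives the schema and the defaults from the service's Swagger specification and runs the payload through bravado's normal unmarshalling.

```python
# Simplified sketch of a "validated mock" helper (all names are hypothetical).
from contextlib import contextmanager
from unittest import mock

import jsonschema

# Response schema for a hypothetical GET /business/{id} operation, as it
# would appear in the destination service's Swagger 2.0 specification.
BUSINESS_RESPONSE_SCHEMA = {
    "type": "object",
    "required": ["id", "name"],
    "properties": {
        "id": {"type": "integer"},
        "name": {"type": "string"},
    },
}


def default_from_schema(schema):
    """Derive default mock data from the schema: 0 for integers, '' for strings, and so on."""
    if schema["type"] == "object":
        return {
            name: default_from_schema(sub)
            for name, sub in schema.get("properties", {}).items()
        }
    if schema["type"] == "array":
        return []
    return {"integer": 0, "number": 0.0, "string": "", "boolean": False}[schema["type"]]


@contextmanager
def patch_business_client(response=None):
    """Patch the client call, fill in schema defaults, and validate the mock."""
    payload = response if response is not None else default_from_schema(BUSINESS_RESPONSE_SCHEMA)
    # This is what makes the mock a *validated* mock: anything that does not
    # conform to the contract is rejected before the test body even runs.
    jsonschema.validate(payload, BUSINESS_RESPONSE_SCHEMA)
    with mock.patch("myservice.clients.business.get_business", return_value=payload) as patched:
        yield patched
```

The same sketch also shows what happens when a hand-written mock violates the contract, which is the situation described with the missing id field a little further on: the validation simply rejects it.

```python
# Using the sketch above: a payload that violates the (hypothetical) schema
# because the required "id" field is missing is rejected with a jsonschema error.
import jsonschema
import pytest


def test_missing_id_is_rejected():
    with pytest.raises(jsonschema.ValidationError):
        with patch_business_client(response={"name": "Some Business"}):
            pass  # never reached: the invalid mock fails contract validation
```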
But of course there are many test scenarios where you don't want some default data, so it is also possible to provide your own. In this case, you can see I set the return value and also the HTTP status code to be returned. Because internally we use mock, you can also provide a side effect, so you can do almost anything, raise an exception or whatever. This works, but so far it is no different from patching and mocking on your own. So what happens if the return value I provide is not valid according to our contract, according to the service specification? What happens is visually not that impressive, but I hope you can read it: you get a jsonschema error. In this specific case, the error I produced was that the specification said the response should have a field named id and my response didn't have it, so there was an error. The same thing happens if the id field is supposed to be an integer and I provide a string; it would also complain. This is actually something we deal with internally, because internally IDs are integers, but whenever we expose external APIs we provide them as strings, so there is always potential for confusion there. This means your mocks will actually be validated against your contract, and you don't have to spin up anything, you don't even need more than one process for testing, and it is extremely fast.

Let me summarize this a little and talk a bit more about it. End-to-end tests don't scale infinitely, as I mentioned, so we need a solution for that. Contract tests are supposed to help here. I didn't go into too much detail about what contract testing in general is, but what I mean by it at Yelp is that we can leverage existing infrastructure to do consumer-driven contract testing, and what this means for you as a developer is that you get much more confidence from the kind of testing you are used to from unit testing, as opposed to doing just plain unit testing. The goal was, and is, to gain much more confidence from these cheap tests, so that we have to do fewer of the expensive tests that are costly to run and sometimes a pain to develop.

One thing I will mention is that we rely on the service we're calling and testing against actually adhering to its specification. If that service makes a backwards incompatible change, we won't detect it. Other tools, such as Pact, or Postman's contract testing framework, deal better with this: they not only do consumer-driven testing but also run that testing against the producer, the destination service. The solution here doesn't help with that, but we have developed a tool that is supposed to help you detect backwards incompatible specification changes, because that is another pain point; I actually talked about this in the past as well, here at EuroPython. Developers inadvertently make changes to API specifications that are backwards incompatible. So if you're interested in that, there is a tool called swagger-spec-compatibility that is open sourced and that you can incorporate to hopefully make sure your service remains true to its contract.

And that is it already. I'll mention that we are hiring and we are sponsoring, so please find us at the booth; we'd love to talk to all of you.
We do have European offices at Yelp, in Hamburg, Germany, and in London. Thank you for your time; I'd love to hear your questions.

For questions, please come to the microphones or raise your hand and I can bring you a microphone. Maybe I can start it off: how do you deal with the consistency of values? The OpenAPI spec gives you types and structure, but how do you deal with values, especially consistency between different fields in a reply?

Yes, that's a very good question, and that's exactly one of the limitations of this approach. When you read about contract testing, it is not about specific values, so this won't test them. In Swagger/OpenAPI you are able to set some restrictions on values, and we will actually test for those, but fields that depend on each other, or strings with an elaborate meaning like locales, won't be covered here. You'll need more elaborate testing for that, and maybe something more like an end-to-end test, to do these sorts of verifications.

Hi, I was going to ask about the OpenAPI spec. I noticed you probably use Swagger, which is OpenAPI spec 2.0, right? So you have a chance to ruin something for my company, because we just finished writing our client library system for OpenAPI spec 3.0, and the last time we looked into bravado it didn't look like support was coming. Do you know anything about plans for bravado to support the newest OpenAPI spec?

Yeah, that's a good question. Unfortunately, there are currently no plans. If you want to use the newest spec, there are other tools out there. There was actually a talk by, I forgot the name, but there are people doing great Python development for OpenAPI 3. I think there is tooling similar to bravado, though I don't know if it is as well developed; there is the official generator, but no one likes code generation. I think somebody also did a Pyramid integration for OpenAPI 3. So yes, if you require that: currently at Yelp we don't have tooling around it, ours is still built on version 2 of the spec. Thanks.

You mentioned that you record your requests and responses in end-to-end tests. We do a fairly similar thing, and I'm just curious about your opinion. Essentially, as part of our tests, we run tests against the API of each service using a tool called Hoverfly, I don't know if you're familiar with it, which also records requests and responses. As part of our build artifact, we publish the recordings of that service's requests and responses, which are then consumed by other builds. So in our continuous integration platform, when one service is built and changes its interface, its recordings change, and any service that depends on it picks up the new recordings and runs its integration tests against them. Have you tried something like that? What's your opinion?

I think that's a great idea. Up to the point of sharing the recordings between services, that's pretty much what we do as well, but we haven't looked into sharing them between services yet. What we did, like I mentioned, is add things to pytest so that each request carries the context of the test it belongs to, because oftentimes you do multiple calls. But that sounds like a great idea to further enhance the system.
It does take several processes, though: you essentially spin it up as a separate Docker container and use it as a proxy that then records or mocks out the replies.

Yeah, that's the same for us. The test recording and replaying is still multi-process for sure, absolutely. Okay, great, thank you.

Any other questions? Thanks for the great talk. Have you considered Protobuf instead of a client library you built yourselves? Protobuf can generate clients without any effort.

Yeah, I think that is mostly a historical decision. I hope I'm answering the question correctly: your question was about why we don't use a different system for generating clients, right? That's absolutely something we looked into and that we might actually do in the future. The way bravado works internally is very dynamic, whereas inside Yelp we have switched to static client libraries that still use bravado internally but don't rely on its dynamic nature anymore. I think it's mainly the effort involved in changing that which is why we haven't done it yet, but that would definitely be a great improvement, like you suggested.

Hi, thanks for your talk. Do you use any strategy for sharing and enforcing schemas between different microservices, or for implementing API guidelines inside your company, so that these processes could be simplified or streamlined?

I don't know about simplified or streamlined. What we do have is a central repository where each version of each service's specification gets posted; you can imagine it as really just a versioned git repository. Whenever developers make a change to an API specification, a new version for that service is posted there. So we definitely have the tooling to interact with these API specifications and figure out what changed, things like that. That's actually how we implemented the tool that detects whether you're making a backwards incompatible change: it looks at the spec as it is now, just the YAML or JSON, fetches previous versions, and compares them. But in the end it doesn't, for example, help with making sure that all of our API specs are consistent, things like that.

Any more questions? That's not the case, so let's thank Stefan again for his nice talk. Thank you.