Our first talk in the Ni track today. Today, we have with us Haki Benita. He is a software developer and a technical lead. He takes special interest in databases, web development, software design, and performance tuning. He also writes about development and performance on his blog, which you can find in his bio in the schedule. Over to you, Haki. Hi. Thank you all for joining me for my talk, titled Taming Non-Determinism with Dependency Injection. I want to start by saying thank you to the organizers of EuroPython 2021. It's a great conference, and I'm very happy to be presenting here. So thank you, everyone, for attending and for organizing this conference. My name is Haki Benita. I'm a software developer, and I'm also leading the development of a large ticketing and payments platform, which I'm proud to say is based almost exclusively on open source technologies, such as Linux, PostgreSQL, Python, Django, and the list goes on. So today, I want to present to you something that I experience very often when I work with code, and that's non-determinism. In this presentation, I'm going to present what non-determinism is and how to identify it. We are also going to look at some examples that hopefully illustrate why it's challenging. And finally, I'm going to present a few ways to handle non-determinism or, as the title suggests, how to tame it. So dependency injection, whether you know it or not, is basically a design pattern that you can use to structure your code. A lot of talks about dependency injection end up talking about libraries and packages and all sorts of dependencies that are all-consuming and affect every aspect of your code. I really, really tried not to do that, OK? Because at the end of the day, this is just a pattern that you can use in your code. And you don't have to bring in a lot of packages and dependencies in order to do so.
In fact, you don't have to do anything special. You don't have to import anything special to use dependency injection in your code. So this is basically just Python. And I would go even further and say that this is not just Python. This is just simple programming. This is how you structure your code in order to address different problems that may come up when you code. So this is what we are going to talk about today. And to do that, I want to present a very simple problem: how would you write a function that returns tomorrow's date? There's no trick here. Simple. How would you write a function that returns tomorrow's date? A straightforward implementation, the naive implementation, might look something like this. You use the datetime module from Python, you take today's date, and you add one day to it. That's it, right? Simple. Now, how would you test this function? That's the catch. It was a very simple implementation, but how would you test it? As a first attempt, you might do something like this. You just compare the result of the function with the expected outcome. In this case, if today is the 28th and we call the tomorrow function, the expected outcome is the 29th. So that's OK. That's a valid test. However, it will pass today, but it will fail on any other date. This is what we call a flaky test. It might pass today, but then you come in tomorrow and poof, it fails. Another approach, as a second attempt, is to basically repeat the implementation in the test. So this test just asserts that the function does what the function does. This is not ideal. And in addition to that, we can't really test special cases, like what happens when we cross a year, or on a leap day. So this is also not a very good approach. Another approach is to try and patch datetime.date.today. Because we understand that we can't really control the outcome of datetime.date.today, we can try and patch it.
Now, if you try to do something like what I've done here, you'll see that you get this error saying that you can't patch built-in modules written in C. So this is also not a very feasible approach to controlling the value of datetime.date.today. And there are, just like for everything else in Python, third-party packages that can do it for you. Some of the most recognizable ones are freezegun, libfaketime, and time-machine, which can patch the built-in datetime module. And this works, but it requires a third-party dependency. Now, as we are going to see throughout this presentation, the thing that makes it so difficult to test this function can repeat in various ways throughout the code. So if you end up adding a dependency every time you need to test something like that, you are going to end up with a lot of dependencies. So that's also not ideal. So why is this really so hard? Why is this simple function so hard to test? All we wanted to do is get tomorrow's date, right? And the problem, which hopefully by now is clear, is that we are having a very hard time testing this function because we do not have control over the outcome of datetime.date.today, okay? This is what makes it so hard to test this function. So let's pause for a second and talk about non-determinism. To understand non-determinism, let's first define what a deterministic function is. A deterministic function is a function that, given the same inputs, always returns the same outputs. Take this function as an example, okay? This function, called add, accepts two parameters, a and b, and returns the sum of a and b. So we are in Python, this is not JavaScript. Every time we call add with one and two, we get the same result, right? So add of one and two equals three. Every time we send one and two, we get three. So this function is deterministic.
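A minimal sketch of the deterministic add function described here; the type annotations are an assumption, since the slide is not in the transcript.

```python
def add(a: int, b: int) -> int:
    """Deterministic: the result depends only on the arguments."""
    return a + b


# Calling it repeatedly with the same arguments gives the same result.
results = [add(1, 2) for _ in range(3)]
```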
On the other hand, if we look back at the function we used before, datetime.now, we can see that this function does not accept any arguments, and every time we execute it, we get a different outcome, right? Because it returns now, the current date and time. It changes every time we execute it, so it's not deterministic. Let's look at a few more examples of common sources of non-determinism. We have this function called add_jitter. It accepts an integer and adds some random factor to it. Is this function deterministic? Will it return the same result every time we send the same value for n? Let's see. If we send one, we get this. With the same argument, we get different outcomes. So this function is also not deterministic, because it uses random, okay? How about this one? This one is a function called inc. It accepts an argument and increments a global counter by the value that we passed for this parameter. Is this function deterministic? Let's try. Let's call this function with one three times, and we can see that every time we get a different result. So this function is also not deterministic, because it depends on some mutable global state, in this case, the counter. How about this one? This is actually a very common example. You have a function, in this case called get_username. It gets the user ID, opens a connection to the database, and executes a SQL statement to get the username for the user with this ID. If there is no result, it returns None, otherwise it returns the username. Is this function deterministic? Let's try. The first time we call get_username with one, we see that the username is Haki, and then someone, maybe even outside the system, deletes the user in the database. Now if we execute the same function again with the value one, we get None. So this function depends on data from the database, so it is also not deterministic. In the same way, we can check the status of some user in a remote API.
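The add_jitter and inc examples described above could be sketched roughly as follows. The names come from the talk; the use of random.random as the "random factor" and the exact return values are assumptions, since the slides are not in the transcript.

```python
import random


def add_jitter(n: int) -> float:
    """Not deterministic: the result depends on the random generator."""
    return n + random.random()


counter = 0  # mutable global state


def inc(n: int) -> int:
    """Not deterministic: the result depends on every earlier call."""
    global counter
    counter += n
    return counter
```

Calling inc(1) three times in a fresh interpreter returns a different value each time, even though the argument never changes.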
In this example, I'm using requests with some remote API. It's entirely possible that the status of the user changed in the remote system, and if I execute this function multiple times, I will get different statuses for the user. Now, another thing to note in this example is that even if the status of the user did not change in the remote service, it's possible that we have some network error, and then the same argument can result in an exception. So it's possible that I try to check the status of user number one and I get the status, and if I execute the function again, I get a network error. An exception is also a type of outcome of the function. So this function is also not deterministic. And finally, last example, I promise: this function checks whether a path is a directory or not. We know that this can be a directory, and I can change it to a file if I want to. So this function depends on the file system, so it is also not deterministic. So there are many sources of non-determinism. The ones that are very, very common are time, randomness, network, file system, and database access, as well as the environment and mutable global variables. So basically, non-determinism is everywhere. If you look at your code, you'll see that many sources of non-determinism find their way into it. So now that we understand that it's everywhere, and we understand that it makes it very difficult to test a function, how should we deal with it? We already know that in our tomorrow function, datetime.date.today is non-deterministic, which makes the entire function non-deterministic, which makes it very difficult to test because we don't control the outcome. So instead of trying to patch this in different ways, how about we change the API of the function and yank out the non-deterministic part?
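One way to answer that question is to pass the current date in as an argument. This is a minimal sketch rather than the speaker's exact slide: the parameter name asof and the specific test dates are assumptions.

```python
import datetime


def tomorrow(asof: datetime.date) -> datetime.date:
    """Deterministic: the date is injected by the caller."""
    return asof + datetime.timedelta(days=1)


def test_tomorrow() -> None:
    d = datetime.date
    assert tomorrow(d(2021, 7, 28)) == d(2021, 7, 29)   # plain next day
    assert tomorrow(d(2021, 12, 31)) == d(2022, 1, 1)   # crossing a year
    assert tomorrow(d(2020, 2, 28)) == d(2020, 2, 29)   # leap year
    assert tomorrow(d(2021, 2, 28)) == d(2021, 3, 1)    # not a leap year
```

Production code would call tomorrow(datetime.date.today()), so the non-deterministic call moves to the caller and the function itself stays easy to test.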
So one way to do that is to inject the current date as an argument, as a dependency, into the function, and then the function adds one day to the date that we passed to it. So this is now tomorrow as of some date, okay? Now, is this function deterministic? Let's check. If we take today's date, the 28th of July, and execute this function multiple times with the same date, we are always going to get the same outcome. So this function is now deterministic. Okay, we made it deterministic. The main benefit at the moment is that testing is now much, much easier. If you look at our tests, we can now test many different scenarios. For example, just the plain next day, crossing a year, right? A leap day, not a leap day. It's very easy to test now that we can inject the dependency into the function. So let's make it a bit harder, because this is a fairly simple example. Let's say that we now have this function that does some heavy processing, and we want to mark the time that it started and the time that it completed. Now, I know that this is a very simple example, but for the sake of discussion, let's say that these dates are meaningful, meaningful enough that you want to test them. If this is just for logging or audit purposes, you might not want to go through the trouble of doing all of this, but if these dates are actually meaningful and you want to test them, then you now have the same problem, because you don't control their values, okay? Using what we've done before, injecting a literal datetime value, would not be enough in this case, because we don't know in advance when the processing is going to complete, right? So one approach here, if we want to control this in tests, is to move this part into a function outside of our process function, and then use it inside process. Now, how is this better? We can now use patch, okay, from unittest.mock. We can patch this function and have it return predictable values in the tests, right?
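A minimal sketch of the patching approach just described, using only unittest.mock from the standard library. The names process and now follow the talk; the dictionary return value and the timestamps are assumptions. In a real project the patch target would usually be a dotted path such as "myapp.jobs.now" (a hypothetical name); here the module's own globals are patched so the sketch runs standalone.

```python
import datetime
from unittest import mock


def now() -> datetime.datetime:
    """Thin wrapper around the clock so that tests have something to patch."""
    return datetime.datetime.now()


def process() -> dict:
    started = now()
    # ... the heavy processing would happen here ...
    completed = now()
    return {"started": started, "completed": completed}


def test_process_with_patch() -> None:
    d1 = datetime.datetime(2021, 7, 28, 12, 0, 0)
    d2 = datetime.datetime(2021, 7, 28, 12, 5, 0)
    # The fake returns d1 on the first call and d2 on the second.
    fake_now = mock.Mock(side_effect=[d1, d2])
    # The test has to know that process() looks up a module-level `now`.
    with mock.patch.dict(globals(), {"now": fake_now}):
        result = process()
    assert result == {"started": d1, "completed": d2}
```

The test only works because it knows that process looks up a module-level name called now; nothing in the signature of process says so.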
And now we have full control over the outcome. So using patching, we solved the problem of being able to control the value in tests, right? So let's talk for a minute about patching. Patching is great, it works, but with patching, the dependencies are not clear, okay? If one person implemented the function and another one is using it or trying to test it, they can easily forget to patch the value of now, because it's not clear from the interface or the API of the function that it depends on now. And also, in order to patch the function, we needed to know something about the way it is implemented. In this case, we needed to know that the function calls now, and the path to the function that it calls. So this is very fragile. So can we lose the patch? Can we maybe implement this function without using patch? Just like we injected a date into the function before, we can inject a function. So this time, instead of declaring the function in the module and patching it, we pass the function as an argument. We can even give it a type: a callable that accepts no arguments and returns a datetime. And now the function uses it internally. So basically, we took this dependency and made it part of the function's API. To use the function, we can just pass datetime.now. This is what your production code could look like. And in tests, we can use a different function. Here I'm creating a function called fake_now that returns d1 the first time you call it and d2 the second time. This is how it works. And now to test the function, I can inject this into process, and I have full control over the outcome. So before we injected a value, now we injected a function, and we managed to ditch the patching. So we have a fairly nice solution without too much magic behind the scenes.
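The injected-function version described above might look like the following sketch. The names process and fake_now follow the talk; the helper make_fake_now, the dictionary return value, and the timestamps are assumptions made for illustration.

```python
import datetime
from typing import Callable

Clock = Callable[[], datetime.datetime]


def process(now: Clock) -> dict:
    """The clock is part of the signature, so the dependency is explicit."""
    started = now()
    # ... the heavy processing would happen here ...
    completed = now()
    return {"started": started, "completed": completed}


def make_fake_now(*values: datetime.datetime) -> Clock:
    """Build a clock that returns the given values, one per call, in order."""
    remaining = iter(values)
    return lambda: next(remaining)


def test_process() -> None:
    d1 = datetime.datetime(2021, 7, 28, 12, 0, 0)
    d2 = datetime.datetime(2021, 7, 28, 12, 5, 0)
    fake_now = make_fake_now(d1, d2)
    assert process(now=fake_now) == {"started": d1, "completed": d2}
```

Production code would call process(now=datetime.datetime.now). No patching is involved, and anyone calling process has to decide which clock to pass.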
Another benefit of embracing this pattern is that you can inject the same function into multiple functions, and then you can implement end-to-end testing much more easily. So to summarize injection versus patching: with injection, the dependencies are very clear because they are part of the function's API, which makes it harder to make mistakes. If we take the previous example, the other person who is trying to use the function is now forced to provide some function that produces the current date. They are forced to address this in the test, or in their code, or wherever they want. And since it's just a parameter, it does not break any abstraction, because we made it part of the function's API. So far, we identified non-determinism. We eliminated it in tests by injecting values and functions. And as a result, we got full control over the outcome. We made dependencies clear, and we made the code very easy to test, okay? So now I want to take a slightly different example, one that you may know from your code bases, and show you how you can apply this pattern to it. Let's say that we want to add some IP lookup capabilities to our web app, okay? We want to take a request, get the IP from this request, and use that IP to find the country that this request probably originated from, okay? This is called IP lookup. If you want to do that, you might end up with a function like this. The function is get_country_from_request, and I call it a top-level function because it orchestrates a process using different kinds of lower-level functions. The first thing this top-level function does is extract the IP from the request. I'm using a Django request here just to demonstrate, but that's not very important. So I'm getting the remote address from the request headers, and then I'm calling some remote API with the IP address, and if it did not fail, I take the data and get the country code.
I actually see these types of functions a lot when I read code, so let's try to write a test for this function. Now keep in mind that get_country_from_request is what I call a top-level function. You need to use your imagination a bit and imagine that this is a very complicated function that does a lot of things, okay? So if you want to test this function, and I will go through this very briefly because it's pretty much a lot of work for such a simple test, we use RequestFactory to create a request and add an IP to this request. So this is our fake request. Now we use a library called responses, which provides mocks for requests, and we add a fake response for a GET request to this URL that returns status 200, serialized as JSON, and this is the response. And now, after doing all of this setup, we can actually test the function and check that it returns the expected result. So this is a lot of work for a very, very simple test, and the main problem with this approach, in my opinion, is that I wanted to test the top-level function, and I should not care about the implementation details of the IP lookup, okay? I should not care about the URL structure, the status code, the serialization format. Right now it's using an HTTP request. Just imagine that I might do it over a web socket or some other transport mechanism where the response may be different, okay? I shouldn't care about all of these things. Another way of looking at it is to ask: if, for example, the URL of the remote service changes, or if instead of camel case the response uses title case, should the test for the top-level function fail? I think that it shouldn't. So what I like to do is wrap this in a very simple service. Let's call it IP lookup service. We're going to see some benefits of using this approach. So I'm implementing an IP lookup service. I instantiate it with a base URL.
This is very useful if you have different environments for the remote API. Then I implement a function on the instance of the IP lookup service to get the country from the IP. So this is the exact same function. We just yanked it into a class. Now, to use this function, we can inject the service into our top-level function, and the top-level function can use it, okay? Now if I want to test this function, all I have to do is provide the same interface, okay? Provide a class with the same interface, but one where I can control the outcome, one that does not make a call to the remote API. So this is an example of what a fake IP lookup service can look like, okay? It accepts a country code and returns it. And now the tests look much simpler, right? We create an instance of the fake IP lookup service and then inject it. In this case, we test two scenarios. One where the remote address is in fact from the US, and another scenario where the request does not have an IP, in which case, even though the fake IP lookup service would return US, the top-level function returns None because it did not find any IP. So now the top-level code does not care about the implementation details of the IP lookup, just the outcome. Before we move forward, I just want to say that we cheated. We cheated a bit, because the function get_country_from_request expects an IP lookup service, but we sent the fake IP lookup service. So if you're using typing, using mypy, then you're going to get some errors. To address this problem, you can define an abstract base class called IP lookup service. This abstract base class defines the interface that you require from an IP lookup service, okay? Now both of our services, the remote one and the fake one, can extend this base class, and we can use it for typing. So I'm using IP lookup service for typing, and I can inject either one of these services into my function, okay?
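The abstract base class and the fake service described above can be sketched with the standard library alone. The class and method names are assumptions modelled on the spoken names, and a SimpleNamespace with a META dictionary stands in for a Django request so that the sketch runs without Django installed.

```python
from abc import ABC, abstractmethod
from types import SimpleNamespace
from typing import Optional


class IpLookupService(ABC):
    """Interface that the top-level function requires (assumed name)."""

    @abstractmethod
    def get_country_from_ip(self, ip: str) -> Optional[str]:
        ...


class FakeIpLookupService(IpLookupService):
    """Test double: returns a fixed country code and makes no network call."""

    def __init__(self, country_code: Optional[str]) -> None:
        self.country_code = country_code

    def get_country_from_ip(self, ip: str) -> Optional[str]:
        return self.country_code


def get_country_from_request(
    request, ip_lookup_service: IpLookupService
) -> Optional[str]:
    # `request` only needs a META mapping, as a Django HttpRequest has.
    ip = request.META.get("REMOTE_ADDR")
    if not ip:
        return None
    return ip_lookup_service.get_country_from_ip(ip)


def test_get_country_from_request() -> None:
    service = FakeIpLookupService("US")
    with_ip = SimpleNamespace(META={"REMOTE_ADDR": "216.58.210.46"})
    without_ip = SimpleNamespace(META={})
    assert get_country_from_request(with_ip, service) == "US"
    # No IP on the request: the lookup service is never consulted.
    assert get_country_from_request(without_ip, service) is None
```

A remote implementation would subclass the same base class and take its base URL in __init__, so a type checker accepts either service wherever IpLookupService is expected.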
I don't have much time left, so let's look at another benefit of dependency injection, and that's switching implementations. I actually had this service in my code, and one day I found out that you can download the entire GeoIP database locally. In fact, I found that I already had the GeoIP database on my operating system, because I'm using a certain distribution that already comes with the file. So I said, why should I make a remote request when I can just search locally? So let's implement a local IP lookup service. Look how simple this is. You implement a local IP lookup service that extends the IP lookup service. Now, each service can have its own init logic. In this case, we provide it with the path to the file, and then we implement the same function, but this time we use the database that we set up during init to get the name of the country from the IP, okay? So all we have to do now is make this function, instead of getting an instance of a remote IP lookup service, get an instance of a local IP lookup service. And this is where you benefit the most, because the top-level function remains completely unchanged and the tests remain completely unchanged. So we basically switched implementations without making any changes to anything that is not related specifically to this service. So that's great. These are the final slides. I know there's not much time left, so let's recap. Dependency injection makes tests deterministic, which means there are no surprises, no flaky tests. Once the test passes on your machine, it will pass on any other machine, no matter what the date or the time is, and no matter what random returns. It makes things easier to test. You can avoid patching, and it's also harder to make mistakes. Think about the example where someone else is testing and using your code. It makes dependencies very clear because they are now part of the API call.
I mean, the API, the signature of the function. And as an added bonus, it's very easy to switch implementations. Now, I know that dependency injection can sound very intimidating. So I just want to remind you, this is just a pattern, and you can use it whenever it makes sense. You don't have to use it in an all-consuming way. You can start by moving things like datetime.now into an argument of a function, which makes that specific function easier to test, and move slowly towards proper dependency injection, or full dependency injection. Just to mention, a few well-loved projects use dependency injection extensively. The first one is pytest, maybe the most famous one. If you have ever used fixtures in pytest, then you know something about dependency injection. FastAPI also uses dependency injection a lot, which I think makes for a very, very nice API to work with. This whole talk is inspired by an article I wrote. You can read more about dependency injection there. It answers many questions, specifically the three that I listed here. So you can find some more information about dependency injection in this article. And this is it, right on time. So my name is Haki Benita. You can check out my blog. I'm writing about Python, SQL, performance, and scaling in Django. I'm pretty active on Twitter, so you can look me up. This is my handle. I also send a newsletter roughly once a month with new things that I write. And if you have any comments or questions, you can send me an email. So thank you very much. Thank you very much. All right, so despite all the troubles we had in the beginning, this was a great talk. Thank you, Haki. Hope to have you again soon at the next edition of EuroPython. Okay.