Hello, so here we're going to talk about testing in Rust, but specifically with a focus on mocking. Just a little bit about me: I'm a software engineer working for a quant hedge fund called Engineers Gate, and my time is split between building real-time, low-latency trading systems and scalable data infrastructure. I'm primarily a Python, C++ and Rust developer, although one day I'm hoping to scratch out C++ and just replace it with Rust, as I'm sure many people here are. The motivation for this talk is that Rust focuses on safety: on memory safety, on safety when running concurrent applications. And it does a great job at this without compromising on ease of use of APIs and whatnot. But even if our code is safe, we still need to make sure that it's doing the logically correct thing. So what we're going to cover here is an extremely brief introduction to running unit tests in Rust, then running behavior verification tests in Rust using a crate called double, and then some design considerations, because generating mock implementations in Rust is particularly challenging: it's a statically typed, compiled language, and the borrow checker makes things even more difficult. So, unit tests. For those of you who haven't actually written any Rust before: you create a library with Rust's package management tool, cargo, using cargo new. When you create a library, for example some_lib, it generates a single source file, and this contains this bit of code here, which is just an empty test case. So if we had some production code in this lib.rs file, say add_two, which just adds two to an integer, we can write tests for this function by defining a private test module in the same source file and annotating it with #[cfg(test)], which tells cargo to only build this code in the test build, not in the production binary.
Then you write various test functions inside, annotate each individual one with #[test], and when you run cargo test, it builds all the production code, builds the test code and runs all of those test functions. Rust also has native support for documentation tests, that is, running the example code embedded in your documentation as tests to make sure your examples and your library don't drift apart, as well as integration tests. But again, the focus here is on unit tests and mocking. The motivation behind mocking is that if you imagine any software system, it's basically a DAG of various components. And if we wanted to test one of these components, it may have many, many dependencies. In this case, to test the top-level component highlighted in red, we'd have to construct, instantiate and configure all three of its dependencies, and then their dependencies as well. So suddenly, if you just wanted to test five lines of code, you're writing dozens if not hundreds of lines of setup code. The solution is to simply mock out, or create fake implementations of, the direct dependencies to simplify the overall test fixture. We typically eliminate anything non-deterministic that can't be reliably controlled in a unit test: data sources, network connections, potentially libraries that have global state, or some horrible library you end up using. You can also eliminate large internal dependencies. There are advantages and disadvantages to doing that, but if you've got particularly large, complex dependencies that take a long time to set up, you can eliminate those as well. The tool for this is the test double, and that name comes from the notion of a stunt double in films. And I'm surprised he actually managed to get away with being the stunt double of Brad Pitt.
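To make the unit-test setup concrete, here's a minimal sketch of a lib.rs with the add_two example filled in; the test module layout mirrors what cargo generates for a new library:

```rust
// src/lib.rs -- production code plus a test-only module.
pub fn add_two(x: i32) -> i32 {
    x + 2
}

// Only compiled for `cargo test`, never in the production binary.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn adds_two_to_positive_number() {
        assert_eq!(add_two(3), 5);
    }

    #[test]
    fn adds_two_to_negative_number() {
        assert_eq!(add_two(-3), -1);
    }
}
```

Running cargo test then builds and executes both test functions.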
So a test double is basically a replacement for real production code that behaves the same way but is easier to set up. There are many types, but most people just refer to all of these doubles as mocks. What we're actually covering here are spies, a specific type that you can configure to behave in different ways, but which also records all the interaction the code under test has with it: all the times it was called and what it was called with. This style of unit testing is called behavior verification: you write tests by asserting on the code's interaction with its collaborators, or dependencies. In Rust, we can generate test doubles in a variety of ways. Two ways of doing this with the double crate are generating mock implementations of traits, and generating mock functions. You can flexibly configure their behavior, what they return, whether they run a function, whether they error, and make simple but also quite nuanced assertions about how they were called and how they were used. A classic example in the field I work in: let's say we were trying to predict the profit of a stock portfolio over time. Imagine we have a trait called ProfitModel. This has one method, profit_at, which takes some timestamp and returns the profit at that given timestamp. We have a function called predict_profit_over_time, whose goal is to generate a time series of profits. So we give it a start and end timestamp, and the model itself, which can be any model; we simply iterate through all of the individual timestamps, generating the profit at each timestamp and returning a vector of the profits. And we want to test this function. It's a simple function, but for illustration purposes, let's test it. We want our test to be repeatable and not rely on an external environment.
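A sketch of what that trait and function might look like, reconstructed from the description above; the concrete timestamp and profit types (u64 and f64), and the ConstantModel used to exercise it, are my own assumptions:

```rust
/// A model that can predict portfolio profit at a point in time.
trait ProfitModel {
    fn profit_at(&self, timestamp: u64) -> f64;
}

/// Generate a time series of profits for every timestamp in [start, end].
fn predict_profit_over_time(
    model: &dyn ProfitModel,
    start: u64,
    end: u64,
) -> Vec<f64> {
    (start..=end).map(|t| model.profit_at(t)).collect()
}

/// Trivial model used only to exercise the function here (hypothetical).
struct ConstantModel(f64);

impl ProfitModel for ConstantModel {
    fn profit_at(&self, _timestamp: u64) -> f64 {
        self.0
    }
}
```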
However, this is very challenging here because the profit model is a very complex beast. Predicting profit is really hard, so real implementations use a lot of different data sources and a lot of very complex mathematical models. If you just want to test this simple code, you probably don't want a ginormous setup of extremely complex proprietary code. So we mock it. In Rust, with double, we do this with two macros. The first is mock_trait!, which generates a struct with a bunch of boilerplate and bookkeeping inside to keep track of how it's been called. So in this case, we have this trait; we call mock_trait!, giving the name of our mock struct, and then we list the methods in the trait that we're mocking so it generates the right boilerplate code internally. Then we have to explicitly tell the Rust compiler that the mock model implements the ProfitModel trait. So we write impl ProfitModel for MockModel, and inside it use the mock_method! macro, which generates the actual profit_at function that calls into our internal bookkeeping struct. And that's all the code you need to generate a mock that has all the features I'm about to explain. Actually using this: imagine you wanted to run through three timestamps and assert that the generated profit time series was correct, and that the profit model was used in the correct manner. We instantiate the mock with MockModel::default(), which creates a default-initialized MockModel. We say profit_at.return_value(10), and then the profit_at method will just keep returning 10. So we see that the time series it returns is 10, 10, 10. And we can make assertions at the end of the test about how the mock was called: in this case, profit_at.num_calls() is 3. And there are various ways you can set mock behavior.
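Conceptually, the generated mock is roughly equivalent to the hand-rolled sketch below: one bookkeeping object per mocked method that records arguments and returns a configurable value. This is a std-only illustration of the idea, not double's actual generated code, which is considerably more featureful:

```rust
use std::cell::RefCell;

trait ProfitModel {
    fn profit_at(&self, timestamp: u64) -> f64;
}

/// Bookkeeping for one mocked method: configured return value plus call log.
#[derive(Default)]
struct MethodMock {
    ret: RefCell<f64>,
    calls: RefCell<Vec<u64>>,
}

impl MethodMock {
    fn return_value(&self, v: f64) {
        *self.ret.borrow_mut() = v;
    }
    fn call(&self, arg: u64) -> f64 {
        self.calls.borrow_mut().push(arg); // record the interaction
        *self.ret.borrow()
    }
    fn num_calls(&self) -> usize {
        self.calls.borrow().len()
    }
}

/// Roughly what mock_trait! generates: one bookkeeping object per method.
#[derive(Default)]
struct MockModel {
    profit_at: MethodMock,
}

/// Roughly what mock_method! generates: forward the call to the bookkeeping.
impl ProfitModel for MockModel {
    fn profit_at(&self, timestamp: u64) -> f64 {
        self.profit_at.call(timestamp)
    }
}
```

This is why the assertions work: the trait method and the bookkeeping object live side by side on the same struct, so the test can call the trait method and then inspect mock.profit_at.num_calls().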
If you don't specify anything, it just uses the default value of the return type. We can set a single return value for all calls, like we did before. We can set a sequence of return values, like 1, 5, 10. We can set return values for specific arguments: for timestamp 1 return this, otherwise fall back to some default behavior, in this case returning 0. And we can even use arbitrary functions or closures. The benefit of doing it like this, as opposed to manually writing a mock implementation with that code in it, is that you get all the boilerplate generated for you, so you can make the kinds of assertions I'm about to discuss. Once you've configured how the mock is supposed to behave, you want to assert that it was used as expected: called the right number of times, with the right arguments. There are fairly loose assertions: you can say the mock was called at least once, that it was called with timestamp 1, and that it was also called with timestamp 0. But often you want to tighten your call assertions. You might want to say that not only was it called with 1 and 0, but that it never had any more calls than that. So it has_calls_exactly 1, 0, 2, which will pass because in this case we're passing timestamps 0 to 2 inclusive. And you can even say it has_calls_exactly_in_order. It's up to you how tight or loose you want to make your assertions, and that's something I'm going to discuss in a second. We can also mock free functions. So if you're passing in some function, a boxed function for runtime polymorphism for example, you can generate a mock for that as well. Here we have mock_func!: we specify the mock object that stores all the bookkeeping, the function itself, which is just a closure, and then the return type and argument types.
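The behavior options and call assertions just described can be sketched like this, again as a hand-rolled, std-only illustration of the idea rather than double's actual API:

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// One mocked method with the behavior options described above.
#[derive(Default)]
struct MethodMock {
    default_ret: RefCell<i64>,
    queued: RefCell<Vec<i64>>,           // sequence of return values
    per_arg: RefCell<HashMap<u64, i64>>, // return values for specific args
    calls: RefCell<Vec<u64>>,
}

impl MethodMock {
    fn return_value(&self, v: i64) {
        *self.default_ret.borrow_mut() = v;
    }
    fn return_values(&self, vs: Vec<i64>) {
        *self.queued.borrow_mut() = vs;
    }
    fn return_value_for(&self, arg: u64, v: i64) {
        self.per_arg.borrow_mut().insert(arg, v);
    }
    fn call(&self, arg: u64) -> i64 {
        self.calls.borrow_mut().push(arg);
        if let Some(v) = self.per_arg.borrow().get(&arg) {
            return *v;
        }
        if !self.queued.borrow().is_empty() {
            return self.queued.borrow_mut().remove(0);
        }
        *self.default_ret.borrow()
    }
    /// Loose assertion: was it ever called with this argument?
    fn called_with(&self, arg: u64) -> bool {
        self.calls.borrow().contains(&arg)
    }
    /// Tight assertion: exactly these calls, in any order.
    fn has_calls_exactly(&self, expected: &[u64]) -> bool {
        let mut actual = self.calls.borrow().clone();
        let mut expected = expected.to_vec();
        actual.sort();
        expected.sort();
        actual == expected
    }
    /// Tighter still: exactly these calls, in this order.
    fn has_calls_exactly_in_order(&self, expected: &[u64]) -> bool {
        *self.calls.borrow() == expected
    }
}
```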
Then you can specify behavior just like you would with traits, say it will return 10, and also assert how it was used. Say we had some function that we expected to call the function we pass in twice; then we assert that it was called two times. So that's all well and good, but there are some serious disadvantages to mocking if you're not careful. Let's talk about another use case. Imagine we were trying to test how a robot makes decisions. Suppose we have some robot logic that takes some world state, so it has some perception of what the world looks like, and this is a value type, just a basic struct. Then we have a robot, which takes some internal state and decides what to do. Once it's decided, it acts on those decisions by calling this actuator component, saying "I want to move forward" or "I want to speak". Suppose we wanted to test the robot logic. This is very complex logic and it's the kind of thing we want to poke and prod. We want to mock the actuator in this case, because if the world state is just a simple value type, we can construct it directly as a struct and have various different unit tests with various different world states. The robot is the complex, hard part. And the actuator: if you imagine that this robot was, say, an entity in a video game, a renderable entity, or an actual physical set of hardware that you were sending orders to, obviously that's not very tractable to test in an automated fashion. So we mock it out. Again, say this actuator is a trait: it could have many actions, one of them being move_forward, which moves forward by some amount. Realistically it would have lots more nuanced actions, but for the purposes of illustration, it just has move_forward. We mock this as we did before: we use mock_trait! and then impl Actuator for MockActuator.
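The free-function mocking described a moment ago can be illustrated with a plain closure that shares its call log through an Rc, which is roughly the wiring mock_func! sets up for you. Here run_twice is a made-up caller, and make_mock_fn a hypothetical helper:

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Made-up code under test: calls the supplied boxed function twice.
fn run_twice(f: &dyn Fn(i32) -> i32) -> i32 {
    f(1) + f(2)
}

/// Build a mock function: returns the shared call log and the closure itself.
fn make_mock_fn(ret: i32) -> (Rc<RefCell<Vec<i32>>>, Box<dyn Fn(i32) -> i32>) {
    let calls = Rc::new(RefCell::new(Vec::new()));
    let log = Rc::clone(&calls);
    let f = Box::new(move |arg: i32| {
        log.borrow_mut().push(arg); // bookkeeping, like mock_func! generates
        ret
    });
    (calls, f)
}
```

After exercising the code under test, the test inspects the shared call log to verify the function was called twice, and with what.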
The robot itself, in this case, takes a reference to the actuator and has this take_action function, which receives some world state. Again, we don't care about the exact contents of it at this point; there's some business logic on how the robot decides what actions to take, and that's the thing we want to test. We can then test the robot by, again, instantiating a mock actuator, running the code under test, which is take_action, and asserting on what actions the robot actually took. One thing to note here is that realistically you would have many, many different test functions with different world states, to poke and prod and test how this robot works. Now, the issue with this specific example is that robot decision making is quite complicated and quite nuanced. Do we really care that the robot, in this case, moved forward exactly 100 units? Is that something we actually care about? Do we care that it moved forward a little bit, or within a range? We might not actually care that it's exactly 100 units. So if you imagine a space of all possible behavior that the robot, or the code under test in more generic terms, can take, and within it the expected behavior, what we've just done is tighten the assertion way too much, so we've artificially constrained the test. If someone slightly changes the implementation of how the robot makes decisions, so that instead of moving 100 units it moves, say, 120, that might still be within our realm of expected behavior, but it will break the test because it's moved out of that tightly asserted space. What we really want is for our expected and asserted behavior spaces to be the same. So basically, behavior verification can overfit the implementation, and a lack of tooling and good ways of matching argument values makes this more likely. So let's talk about pattern matching.
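The robot test can be sketched end to end like this. The names Actuator, Robot, take_action and move_forward follow the talk; the WorldState field and the robot's decision logic are invented purely for illustration, and the mock is hand-rolled to stand in for what mock_trait!/mock_method! would generate:

```rust
use std::cell::RefCell;

/// Value type describing the robot's perception of the world.
/// The field here is hypothetical.
struct WorldState {
    obstacle_ahead: bool,
}

trait Actuator {
    fn move_forward(&self, amount: u32);
}

/// The code under test: decision logic driving the actuator.
struct Robot<'a> {
    actuator: &'a dyn Actuator,
}

impl<'a> Robot<'a> {
    fn take_action(&self, world: &WorldState) {
        // Toy decision logic standing in for the real, complex business logic.
        if !world.obstacle_ahead {
            self.actuator.move_forward(100);
        }
    }
}

/// Hand-rolled spy: records every move_forward call for later assertions.
#[derive(Default)]
struct MockActuator {
    move_forward_calls: RefCell<Vec<u32>>,
}

impl Actuator for MockActuator {
    fn move_forward(&self, amount: u32) {
        self.move_forward_calls.borrow_mut().push(amount);
    }
}
```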
Rather than matching arguments to exact values, you match them to more generic patterns. So we have called_with_pattern: you can say actuator.move_forward called_with_pattern and pass in some matcher function, here "is greater than or equal to 100". This takes the argument being matched and runs some check, say "is it greater than or equal to 100", returning true if the value matches and false otherwise. Now, obviously handwriting a bunch of closures all the time is not ideal; it's very verbose and quite painful. You get around this with parameterized matcher functions. For example, you'd have a greater-than-or-equal-to function, ge. This is a generic function that can take any type implementing PartialEq and PartialOrd, in other words, any type you can compare with greater-than-or-equal. The first argument is the value being matched, and it takes a single parameter, the base value, which is what to compare against. So now, instead of manually writing all of our matchers, we can use a macro called p!, defined in double, to generate these matching closures for us on the fly: you say take this parameterized matcher ge, and then 100. The code looks pretty much the same: we have p!(ge, 100) and match on that. And there are loads of built-in matchers: wildcard matchers if you don't care about specific arguments, comparisons, floating point, string matchers, container matchers. You can also compose matchers together. Maybe you don't just care that the robot moved more than 100, but that it moved within some range: you can say it matches all_of greater-than-or-equal-to 100 and less-than-or-equal-to 200. You can also, and this is a particularly useful feature, do matching across individual elements in a collection.
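These matchers boil down to functions that build matching closures. A std-only sketch of the idea, including an element-wise matcher; double's real p! macro and built-in matchers differ in detail:

```rust
/// Parameterized matcher: "greater than or equal to base".
fn ge<T: PartialOrd>(base: T) -> impl Fn(&T) -> bool {
    move |arg| *arg >= base
}

/// Parameterized matcher: "less than or equal to base".
fn le<T: PartialOrd>(base: T) -> impl Fn(&T) -> bool {
    move |arg| *arg <= base
}

/// Compose matchers: succeeds only if every inner matcher succeeds.
fn all_of<T>(matchers: Vec<Box<dyn Fn(&T) -> bool>>) -> impl Fn(&T) -> bool {
    move |arg| matchers.iter().all(|m| m(arg))
}

/// Element-wise matcher: every element of the vector must match the pattern.
fn each<T, M: Fn(&T) -> bool>(matcher: M) -> impl Fn(&Vec<T>) -> bool {
    move |arg| arg.iter().all(|e| matcher(e))
}
```

A range check then reads as a composition: all_of over ge(100) and le(200) passes for 150 but fails for 99 or 201.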
For any iterable object, you can use this each matcher. Here we have some mock that records numbers: it takes a vector of integers, and you want to assert that this mock was called with a vector where each element matched this pattern, which is "not equal to zero". You can also define custom matchers. So, design considerations. There were two design goals in double. The first was Rust stable. Rust stable was a requirement, particularly for me, because I was working on software that required Rust stable, not nightly. And it's surprising how many mocking frameworks out there go with nightly; that's because mocking and code generation are often a lot more convenient when you're using various nightly features and compiler plugins. I didn't have that option, so I needed something that only used stable features. The second goal was no changes to production code. It's okay to refactor your code to make it more testable, but adding extra awkward boilerplate to the code just to make it testable has always been a bit icky to me. At the same time, if you require that, it means you can't just mock any arbitrary trait: you'd have to rely on the library developer adding a certain annotation to a trait, struct or function for you to be able to mock it. With double, you can mock any arbitrary trait from any library. And these are challenging goals. The second part of the original version of this talk was 20 minutes purely on why this is so challenging; sadly, we don't have time, but I'd love to discuss it. Basically, it's really, really difficult, partly because the Rust compiler is so strict that it makes all this automatic code generation and generic mock functionality quite hard. One thing this was inspired by was Google Mock.
If any of you have used C++, Google Mock is an amazing mocking framework, but it cheats in some ways that the borrow checker catches, which is quite frustrating, but also a good thing. As I mentioned, most mocking libraries require nightly, and pretty much all the ones I found require production code changes: you at least have to annotate the traits you want to mock. But there is a cost to achieving those two goals, and that is slightly more verbose mock generation. As you saw earlier, there were two macros, mock_trait! and mock_method!, so you basically have to repeat yourself twice. The limitations of the current stable version of Rust make it impossible to merge those two macros if you don't want any production code changes. There's a specific feature I'm waiting for, specialization, which will make this problem go away; once it lands, the macros can be merged, but until then it's slightly more verbose. And that's pretty much the talk. To summarize: mocking is often used to isolate unit tests from external resources or large dependencies, and one way of achieving this in Rust is by replacing traits and functions with mock implementations. However, mocking can also overfit the implementation, so you need to be very, very careful, and you need good tools that enable more nuanced assertions, not just "you were called with exactly this". You need these looser, pattern-based assertions so that your developers don't hate their lives a year into the project, constantly changing test code every time the implementation changes slightly. Double is a crate that generates these mock traits and functions. It supports a wide array of behaviors and setups; I've actually only covered a very small subset of its overall feature set. And it has first-class pattern matching support; that was the biggest reason I made this library: pattern matching on Rust stable, pretty much.
It requires no changes to production code, but that does come at a cost. These are some alternative mocking libraries that I recommend checking out after this talk. Depending on your use case, you might actually find these a lot easier to use; one of them in particular is quite good. Here's a bunch of links. And that's it. Get in touch if you're interested, or check out the double repo on GitHub if you want to contribute.