Thank you everyone for coming. As you heard, my name is Ryan Davis; I'm also known elsewhere as zenspider, and I'm talking today about making test frameworks from scratch. I took some advice from earlier and I am barefoot, to try to make this talk a little smoother, because I am more than a tad drunk. I'm a founding member of Seattle.rb, the first and oldest Ruby brigade in the world. I'm now an independent consultant in Seattle, and I'm available. I'm also the author of Minitest, which happens to be the most popular test framework in Ruby, and I only mention that because I'm a little astounded that I'm actually beating RSpec. No idea how. Apparently it's been that way since July.

So Toby's talk was a little amazing: in just 49 slides he described the physics of the solar system. That's kind of mind-blowing, and I feel like maybe I missed the mark here. I'm only describing a test framework and I've got 332 slides. So, to segue into setting expectations, something I like to do at the beginning of my talks: this is a very code-heavy talk. The focus is on the what and the why and the how of writing a test framework. It is 337 slides, about 9.5 slides per minute. That's 50% more than I've ever done before, so I'm going to be talking like this the entire time. The slides are published at the first URL, and the code (or something very close to it) is at the second URL. I do find that increasing the number of slides lets you connect the dots with smaller jumps and helps with understanding, so I think this will work out fine.

First, a famous quote not said by a famous person: "Tell me and I forget. Teach me and I may remember. Involve me and I learn." That was not actually said by Benjamin Franklin; who actually said it is a bit of a mystery. It's usually attributed to him, but whatever. Whether or not it's a legitimate quote doesn't really matter; I think it points out an important problem in code-walkthrough talks. But first, don't get me wrong: not all code walkthroughs are bad. Some are absolutely great, and they're absolutely necessary to do your job. I'm only really talking about talks, and some of those are still really good. But quite simply, I could write this talk in my sleep by working through the current implementation of Minitest and explaining each and every line. And you'd learn nothing. You'd forget it almost as quickly as you read my slides. Some of the many problems with code walkthroughs are that they're boring, they're top-down, and as a result they focus on the whats and not the whys. Those are all good reasons to tune out and not learn a thing. So I'm not going to do that.

That quote that isn't by Benjamin Franklin: the real quote that we think it's based on is much better. My contacts are not cooperating, so you're going to read it, not me. If you think that's too many words, here's the most concise version. Although he's not in the audience, I need to acknowledge that this was perhaps too many words as well. So for tenderlove, here's the emoji version. He is my most ferrety of friends.

Starting from scratch is the point of this talk. Working up from nothing is the closest I can get to letting you join me in building up a test framework from scratch. I will try to describe this in a way that you can literally code it at the same time that I'm describing it, and understand the steps that we went through to get there. Now would be a good time to open up your laptops if you want to attempt this.
I was told last week that a number of people got about halfway through with me and then started to lose me, so I've got some timing notes in here and I'm going to try to be a little more even-paced. But if you lose me, it's not that big of a deal: you can download the GitHub repository and follow along that way, or just deal with it afterwards. Further, this is not going to be Minitest by the end of the talk; it will be some sort of subset. I'm going to play the 80/20 rule to show you most of what Minitest can do in a minimal amount of code, and to emphasize that, I'm going to be referring to it as Microtest from here on out. Three or four of you got that. Awesome. And I encourage you to deviate from the path that I'm going to be presenting and experiment. If you do things differently, you might understand my choices a little better. Finally, this talk is an adaptation of a chapter of a book that I'm writing on Minitest. There will be more info on that at the end of the talk.

So where to begin? At the bottom. The atomic unit of any test framework is the assertion, so let's start with plain old assert. In its simplest form, assert is incredibly straightforward: it takes only one thing, the result of an expression, and it fails if that isn't truthy. And that's it. You have everything you need to test. Thank you. Are there any questions? No? I'd like to add a few more bells and whistles before I take those, though. Let's try to figure out what I wrote and why. In this case, I chose, quite arbitrarily, to raise an exception if the value of the test isn't truthy. I could have chosen to do something other than raising an exception, like throwing, or pushing some sort of result object onto a collection and returning it. And while there are tradeoffs to all of these choices, it really doesn't matter as long as you report the failed assertion at some point. I mostly chose to raise an exception because exceptions work well for my brain. There's an added benefit that it interrupts the execution of the test: it jumps out of whatever level we're currently running at, and we'll see more of that later.

Okay. So if you ran this expression, 1 == 1 would evaluate to true, that would get passed as the test argument to assert, and assert would do nothing in response. That's a pass. Rather boring. However, if you ran this expression, 1 == 2 would evaluate to false, which would get passed as the test argument to assert, and that would wind up raising an exception. At some point there will be mechanisms to deal with those exceptions and gracefully move on, but for now the whole test suite is going to grind to a halt on the first failed test, and that's fine.

One problem we have is that the raised exception reports the place where raise was called inside assert, not where the assertion was called, and that's a problem. I only want to talk about the place where the assertion failed, on line 5 of test.rb, so I'll clean up the exception a bit by changing the way raise is called. I learned this last week, or two weeks ago; there's always something you can learn about Ruby. Raise allows you to specify what exception class to use. That's not new, but you can also specify the backtrace that you'd like to show. So I'll use caller, which returns the current runtime stack, to generate that backtrace. And now we see where the test actually failed.
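Since the slides themselves aren't reproduced here, this is a minimal sketch of what that assert might look like; the method name and the message wording are my assumptions, not necessarily what the talk's slides used:

```ruby
# Minimal assert: raise if the value isn't truthy. Passing RuntimeError, a
# message, and an explicit backtrace (caller) makes the failure point at the
# assertion's call site instead of at this raise.
def assert test
  raise RuntimeError, "Failed test", caller unless test
end

assert 1 == 1   # passes: assert does nothing
assert 1 == 2   # fails: raises, and the backtrace starts at this line
```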
It's much more useful to the developer who's actually dealing with a real failure. Next, we're going to add a second assertion. Now that we have plain old assert, we can use it to build up any assertion anyone could possibly need. At least 90% of my own tests are handled by checking for equality, so let's do that next. Luckily, it's incredibly simple to implement: I just pass the result of a == b to assert, and assert does the rest. And here's how you use it. Where 2 + 2 equals 4, it passes true to assert. Where 2 + 2 does not equal 5, it passes false to assert, and assert deals with the rest. This is really all I need to do most of my work quite happily. But the way it stands right now, it has a pretty unhelpful error message when a failed assertion raises.

First, the backtrace is pointing at assert_equal. Didn't we just fix this? Kind of. I'm using caller for the backtrace, and that includes the entire call stack, including the other assertions, so we need to filter the assertions out of it. Let's fix that by deleting everything at the start of the backtrace that comes from the current implementation file. We wind up using drop_while and matching against the __FILE__ constant to drop anything in the call stack that's in the current implementation file. But it's still ugly. The failure just says "Failed test", and that makes it impossible for assert_equal to tell you anything useful. Let's make it possible for assert_equal to supply more information: we pull the error message up into an optional argument and use that argument in raise. Then we change assert_equal so that it provides a useful message. Now we get error messages that look like this. That's much more useful. It's probably not 100% resilient at this point, but it'll do for now.

Let's add one more assertion. One mistake people make time and again is misunderstanding how computers deal with floats. You guys are doing so much better than Salt Lake. The rule is really simple: never, ever, ever test floats for equality. There are exceptions to this rule, but if you stick to it, you'll be fine in all cases. So while we have assert_equal, we should not be using it for floats. What should we use then? We'll make a new assertion just for comparing floats, and it's going to check whether the difference between the two numbers is close enough, where we currently define close enough to be within 1/1000. We can make it fancier later, but for now, done is more important than good. So this is what it looks like. It's almost exactly the same as assert_equal, except we use the formula stated previously. And this is how it's used. Quite frankly, it works right out of the gate.

So what does that mean? It means that our assert method is finally general purpose enough to be used for all the other assertions you might want. What about those? Writing the other assertions you want is fine and good, necessary even, but that would take hours, and I'm only 25% through these slides, less now that I've added more. I'm going to consider this an exercise for you after the conference. What's your favorite type of thing to test? How would you write an assertion for that? Go do it, and then have a cookie.
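Here's a rough consolidation of where the assertions stand at this point. It's a sketch only: the default message, the assertion names, and the exact wording are assumptions on my part.

```ruby
# These would live in their own file (say, microtest.rb); the __FILE__ filter
# then strips only the framework's own frames from the backtrace.
def assert test, msg = "Failed test"
  return if test
  bt = caller.drop_while { |line| line.include? __FILE__ }
  raise RuntimeError, msg, bt
end

# assert_equal and the float assertion are thin layers over assert.
def assert_equal a, b
  assert a == b, "Expected #{a.inspect} to equal #{b.inspect}"
end

def assert_in_delta a, b
  delta = (a - b).abs
  assert delta <= 0.001, "Expected #{a} to be within 0.001 of #{b}"
end

assert_equal 2 + 2, 4          # passes
assert_in_delta 2.99999, 3.0   # passes: the difference is within 1/1000
```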
Once you write one test, you'll want to write many tests, and that starts to introduce problems of its own. It'd be nice to keep them separate. There are many reasons why you'd want to break up your tests and keep them separate: organization, refactoring, reuse, safety, prioritization, parallelization, which I can still enunciate, so I'm doing good. Less good now. It'd be nice if we could separate our tests from each other and organize them, but how would we go about that? We could do something really quick and easy like this: simply define a method called test that takes some sort of descriptor string and a block of code. That's it. It'd be really easy to implement: you take the argument and you yield. You're done. This gives us the benefit that you can name the tests and you can put them in blocks so you can see that they're separate, but they're leaky. And leaky tests infect results.

Now that we're writing multiple tests and keeping them organized, we need to be able to trust them. The problem is that these tests aren't actually all that separate from each other. Here we can see a as a local variable being set at the beginning; it's tested in the first test, modified and tested in the second, and then the third fails because the second one modified it. We really want those tests to be completely independent from each other. The fact that one test can mutate a local variable that is then used by another is simply a mistake. This goes against good testing practice, which states that a test should always pass regardless of what else was run, what order it was run in, or anything else. Otherwise you don't trust the tests, and trusting the tests is crucial.

We'll fix this by using methods. There are a number of ways we could try to patch this up. The simplest, perhaps, is just not to do it, and that's my favorite. Instead, we're just going to use Ruby. Ruby already provides a mechanism for separating blocks of code and making them immune to outer scopes: it's called a method. Here we can see that we have three methods, and each one has to introduce its own local variable, modify it, and test it. The nicest thing about this approach is that it's absolutely free. There's no cost to using this that you aren't already paying by using Ruby in the first place. It's also important to remember that by using plain Ruby, anyone can understand it. It does have some drawbacks, though. First, you have to run the methods yourself. That's fine for now, and we'll address it later. Another, perhaps more pressing, issue is that there is code duplication in the previous examples, but there are simple ways to get around that too, and I'm not going to bother going into them in order to save time. Stick to plain Ruby, and it should be easy.

Now that we have multiple tests separated by methods, how do we get them to run? The same way you run any method: you call it. We could come up with a more complicated way, and we will, but this will do for now. Methods are a good means to separate tests, but more problems arise when you do that: unique method names, and it's harder to organize, reuse, compose, et cetera. Luckily, Ruby comes with another mechanism: classes. Didn't I just say to keep them separate? Yeah, but it'd be nice to organize them, so how do we do that? Let's take the previous code and wrap it in a class. That's really all there is to it. But how do we run those? Wrapping the methods in a class breaks the current run. In order to fix it, we need an instance of the class before we can call the method. So we add that to each line, and we're passing again.
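In code, that stage might look something like this; the class and test names are just illustrative:

```ruby
# Tests are now plain methods, grouped inside a class so each one gets its
# own scope and nothing leaks between them. We still instantiate and call
# each test by hand.
class TestArithmetic
  def test_addition
    a = 2
    a += 2
    assert_equal 4, a
  end

  def test_subtraction
    a = 2
    a -= 2
    assert_equal 0, a
  end
end

TestArithmetic.new.test_addition
TestArithmetic.new.test_subtraction
```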
This change doesn't really do anything for us. It groups the tests in classes, but it does put us in an ideal position to make the tests run themselves. Right now, we manually instantiate and call a method for each test. Let's push that responsibility down to the instance and have it run its own test, by adding a run instance method that takes a name and invokes it via send. That doesn't look like much either. In fact, by adding the call to run, we've actually made it a bit more cumbersome; we've kind of stepped backwards. But this will make the next step super easy. It also gives us a location where we can extend what running a test even means. For example, the run method would be a good place to add setup or teardown features. What would you add?

Running tests manually is still pretty cumbersome, so let's address that next. I'm actually on time. Wow, add more liquor. I don't have the same goals that Ernie does. Now that an instance knows how to run itself, let's make the class know how to run all of its tests. We can use public_instance_methods and then filter on all of the methods that start with test_. public_instance_methods returns an array of all the public instance methods on that class or module, and then we can use Enumerable's grep method to filter those. Wrap that up in a class method that instantiates and runs each test. So here we can see that we're using public_instance_methods, we're grepping on that, we're enumerating on that, and then we're instantiating and calling run on each one just as before. That allows us to collapse this into this. Much better.

This would be a good point to pause and apply some refactoring. What we've got is well and good, but it's only in one test class. It would really benefit us if we could push it up to a common ancestor and make a parent class: a refactor. We simply reorganize the methods into a new class that we're gonna call Test, very uniquely. Note that we also scooped up all the assertions while we were at it. Now let's make all of the test classes that we have subclass the new Test. ("The new Test class" is starting to be a mouthful.) And that's all there is to it. Do that to all the test classes, and they all benefit from code reuse. This makes it super trivial to have a bunch of classes of tests that can all run themselves, so let's push that a bit further. The only thing left to address is where we manually tell each test class to run its tests, so let's automate that too.

Since we're using subclasses to organize our tests, we can use an underappreciated feature of Ruby: the inherited class hook. How many people know inherited? That's actually much better than I expected. Awesome. Again, I'm not in Salt Lake. Every time a new test subclass is created, we're going to record it automatically, and then Test will know to fire all of them off. First, we need some place to record the things we need to run. Then we use the inherited hook to record all classes that subclass Test. From there, it is trivial to enumerate the collection and tell each one to run its tests. This allows us to rewrite this into this. And that last call would be an ideal thing to put in a file that gets required, so that it kicks everything off. And that's all there is to it.
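Pulled together, the parent class at this stage might look roughly like this; method names such as run_all_tests are my guesses at what the slides used, not a verbatim copy:

```ruby
# Test knows how to run one test (instance #run), all of its own tests
# (class .run), and, via the inherited hook, every subclass that has been
# defined (.run_all_tests). The assertions would live in here as well.
class Test
  def run name
    send name
  end

  def self.run
    public_instance_methods.grep(/^test_/).each do |name|
      new.run name
    end
  end

  def self.classes
    @classes ||= []
  end

  def self.inherited subclass
    classes << subclass
  end

  def self.run_all_tests
    classes.each(&:run)
  end
end

class TestArithmetic < Test
  def test_addition
    assert_equal 4, 2 + 2
  end
end

Test.run_all_tests   # the line that would live in the file that kicks everything off
```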
So Microtest is kind of hamstrung at this point. We can run tests just fine, but we don't know what they did. Now that generalized testing is supported, it'd be nice to know what happens. On the one hand, silence is golden: if you don't see an exception raised, you know that everything worked. But it doesn't mean that the tests actually ran. I think this is one of those situations where the Russian proverb "trust, but verify" is a good policy to have. So let's give the framework a way of reporting its results and see if we can't enhance things while we're in there.

How do we know what the results of the run are? I'd be pretty happy just seeing that something ran, so let's start with that as a minimal goal: let's print a dot for every test run. As a side note, this is my favorite slide I've ever written across all of my talks. I think Tufte would love it. Something about ink density or information density; it's fantastic. I have extra time, so I'm gonna wait here. Deal with it. Okay, so we're gonna add a print and a puts, and then we're done. It's a stupid simple thing to do; the emphasis, perhaps, is on stupid. Quite simply, we print a dot for each test, and then we add a newline at the end of the run in order to keep it pretty. Oh, there goes the enunciation. Gotta love whiskey. Doing so, we see this. Now that we can see that we ran three tests and they passed, we have a better understanding of what actually happened and we can trust it more. That's good to know; we have valuable information about the test run. I am going to over-enunciate everything. But what about the failures? I'm not gonna talk like that, I'm sorry.

What happens when a test fails? Currently, if a test fails, it immediately quits, since it's raising an unhandled exception. That's not too terrible, but it does mean that you only see the first problem that raises, and that might not provide as much insight as seeing all of the failures at once, so let's clean that up. We'll rescue exceptions and print out what happened. Now we'll see all the tests regardless of failures. We also don't see loads of backtrace; we just see the failed assertion. We've been able to handle that and print it out better than it would have been by default. Perhaps this is not the prettiest output, but it is much better than what we had before.

But there are several things that I do not like about this code. I don't like that the logic for running a test and the IO are mixed into the same method. It's just messy, so I wanna address that, and in the process I wanna refactor the code to be more maintainable and capable. The problem I have is that the run class method is doing way more than just running. Here we can see about four categories of stuff that it's doing. Those colors are actually working out pretty well. The first thing I wanna do is separate the exception handling from the test run. I really don't like that the test run is handling both printing and exception handling, but I especially don't like that it's doing the exception handling, so let's address that first. Since the test class run calls the test instance run, it's two hops up from where any actual exceptions are getting raised. We should refactor this and break up the responsibilities, and I should stop drinking. No. I want run all tests to only deal with running the test classes from the top. I want each class to run its individual tests. I want each test instance to run a single test and handle any failures. And I want something else entirely to deal with showing the test results. So let's move forward with that goal in mind. First, let's push the rescue down so that the instance run returns the raised exception, or false if there is no failure. Then we change the test class run to print appropriately based on the return value.
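Continuing the sketch, the refactored run might look roughly like this; the output format and exact wording are approximations of what's on the slides:

```ruby
class Test
  # Instance run now handles its own failure: it returns the raised
  # exception, or false when the test passes.
  def run name
    send name
    false
  rescue => e
    e
  end

  # Class run prints based on that return value: a dot for a pass, the
  # failure details otherwise.
  def self.run
    public_instance_methods.grep(/^test_/).each do |name|
      e = new.run name
      if e then
        puts "Failure: #{self}##{name}: #{e.message}"
        puts "  #{e.backtrace.first}"
      else
        print "."
      end
    end
  end

  def self.run_all_tests
    classes.each(&:run)
    puts   # finish the row of dots with a newline
  end
end
```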
Now we have exception handling pushed down to the thing that's actually running the test. Having exceptions rescued only one level up usually means that you're in a better place to deal with them when they happen. And by doing this, we've also converted some exception handling code into a simple conditional, which makes it easier to deal with.

Next, let's look at the IO. Let's extract the conditional responsible for IO into its own method, and we'll call it report. By doing this, we've put ourselves in a better position to move that out entirely, and we'll do exactly that by extracting it to a class. We'll extract the report method into its own class called Reporter, and we're gonna grab the puts too while we're at it and call that done. This lets us rewrite run all tests into something that's much cleaner. We create a reporter instance in run all tests and use it throughout: we pass the reporter instance down into run, and we use that to call report instead. And because name was a block variable, we need to pass that to the reporter's report as well. By doing this, we've removed all IO from the test class and delegated it elsewhere.

Throughout all these changes, we should be rerunning the tests to ensure that everything works the same, but in this case, it doesn't. The class name is wrong now that we've pushed reporting into a separate class. For now, we're gonna go the quick-fix route by passing in the class. I'm intentionally focusing on fixing this bug, not on using the right abstraction. Sometimes that's the right thing to do because you need to get shit done, but you pay the price in doing so, which we're gonna see. So let's add a new argument to report. We're gonna call it k and pass in the current class, which is self in any class method, to report. This fixes our output back to what we expect, but we had to add a third argument to do it, and that should be a hint that we're doing things wrong. That's the type of thing that would make Sandi sad. So let's try to address that now.

We don't need to pass the actual exception to report. We can pass anything that has all the information we need to report the test result, and what better thing than the test instance itself? All we need to do is make the test record any failure that it might have, and then make that accessible to the reporter. Let's add a failure attribute to test and default it to false. Then we modify the test's run to record the exception in failure and report the test instance instead of the exception. Now we can use that accessor in the reporter to get the message and the backtrace. Now we can clean up the third argument: now that e is a test instance, we're able to get rid of the k argument, and that gets us back to two arguments, which in my opinion is still one too many. So let's try to remove name. With a little tweaking, a test instance can know the name of the test that it ran: first by adding an accessor and storing it off in initialize, then passing the name to the initializer and not to run. We've swapped the arg over. Finally, we remove the argument from run and use the accessor method instead. This means that a test instance is storing everything that the reporter needs to do its job, and we can get rid of the name argument.
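Here's approximately where the code stands after those changes; again a sketch, with details like the failure output format assumed:

```ruby
# The Reporter owns all the IO. It receives either false (a pass) or the
# failed test instance, which now carries its own name and failure.
class Reporter
  def report e
    if e then
      puts "Failure: #{e.class}##{e.name}: #{e.failure.message}"
      puts "  #{e.failure.backtrace.first}"
    else
      print "."
    end
  end

  def done
    puts
  end
end

class Test
  attr_accessor :name, :failure

  def initialize name
    self.name    = name
    self.failure = false
  end

  # Returns false on a pass, or the test instance itself on a failure.
  def run
    send name
    false
  rescue => e
    self.failure = e
    self
  end

  def self.run reporter
    public_instance_methods.grep(/^test_/).each do |name|
      reporter.report new(name).run
    end
  end

  def self.run_all_tests
    reporter = Reporter.new
    classes.each { |klass| klass.run reporter }
    reporter.done
  end
end
```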
One more thing that I don't like is mixed types when you don't need them. Right now, e is either false or the instance of a failed test. But tests now know whether they've passed or not, so the false isn't helping, and it can be the source of pesky bugs since it only exists to feed a logical expression. So let's get rid of false. By adding an ensure block, we can get rid of that false and make sure that the run method always returns self. I should also point out that you need that explicit return; otherwise it might return the value of one of the other blocks. Next, let's add an alias for failure as a predicate. That's the Canadian pronunciation. And switch the reporter over to using this new predicate method on the test. This looks pretty good now. I just don't like the name e anymore, since it's no longer an exception but a test instance, so let's rename it. Let's call it result. This makes the code much more descriptive, albeit a bit longer.

Okay, at this point I could call it a night, walk off the stage, and I'd actually be six minutes under and I'd beat Ernie. But the output is still a bit crufty, so let's fix that up. I want to enhance it. It would be nicer if we separated the run overview, meaning the dots and any failure indicators, from the failure details. Something like this. Let's change report to store off the failures and print them in done. This is pretty easy to do. We need a new failures array to store all the failures, so we make an accessor and initialize it to an empty array. Then we need to print an F whenever we have a failure, and we need to store off the result. And finally, we need to move the printing code down to done and turn it into an enumeration over that array. Now our output looks much better.

But we're still not quite done. There are a couple of things left that are getting on my nerves. We've changed both report and done quite a bit; they're no longer doing what their names say they do, so let's rename them. Report becomes shovel (<<), and done becomes summary. And then we update the callers to use those names. At this point, I'm pretty happy with the code, but that does not mean that we're done yet. I'd like to add more enhancements. One common problem is that people will often write a test that depends on the side effects of a previous test in order to pass. If that test is run by itself, or in a different order, the tests are gonna fail. This goes against our rule of testing: that tests should pass regardless of their order. An easy way to enforce this is to run the tests in a random order. It's not guaranteed, but it will tease out any problems that we have over time. This is pretty easy to do in our current setup, but I'd rather not mix too many things into this method either, so let's start by extracting the code that generates all the tests to run. Now the test class's run only deals with enumerating the test names and firing them off, and so we're in a better place to randomize those tests. And that's really all there is to it. We could get fancier; we could push the responsibility up to run all tests so that we randomize across both classes and methods, but this is a good compromise, and I'll leave that as an exercise for you.
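Putting the last few changes together, the finished sketch looks something like this. It continues the earlier sketches (the inherited hook and the assertions are unchanged), and names like test_names and the exact summary format are my reconstruction, not the talk's verbatim code:

```ruby
class Test
  attr_accessor :name, :failure

  def initialize name
    self.name    = name
    self.failure = false
  end

  # run always returns self now; the explicit return inside ensure is what
  # guarantees that, as noted above.
  def run
    send name
  rescue => e
    self.failure = e
  ensure
    return self
  end

  alias failure? failure

  # Test names are shuffled so tests can't depend on running in any order.
  def self.test_names
    public_instance_methods.grep(/^test_/).shuffle
  end

  def self.run reporter
    test_names.each do |name|
      reporter << new(name).run
    end
  end

  def self.run_all_tests
    reporter = Reporter.new
    classes.each { |klass| klass.run reporter }
    reporter.summary
  end
end

class Reporter
  attr_accessor :failures

  def initialize
    self.failures = []
  end

  # The run overview: a dot per pass, an F per failure; failed results are
  # stored for the summary.
  def << result
    if result.failure? then
      print "F"
      failures << result
    else
      print "."
    end
  end

  # The failure details, printed after the overview.
  def summary
    puts
    failures.each do |result|
      puts "Failure: #{result.class}##{result.name}: #{result.failure.message}"
      puts "  #{result.failure.backtrace.first}"
    end
  end
end
```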
So we're done for now, and I have three and a half minutes. Go me. What did we wind up with? We wound up with about 70 lines of Ruby. It does a good portion of what Minitest actually does. It's well factored; it has zero duplication of any kind. The complexity score is incredibly low: it flogs at about 70, or about five per method, which is about half of the industry average outside of Rails, you guys know what I mean. And even without any comments, the code is incredibly readable. The reporter, the test class methods, and the instance methods, each one being a column, all fit on one slide, and that's not bad. It actually runs about twice as fast as Minitest, because it's doing less. The worst thing about this talk is that I spent about nine slides per two lines of code. But that's a price that I'm willing to pay in order to explain this as clearly as I can, despite the slurring.

So how did we get there? We started with the atom and worked up to molecules. We gathered tests into methods and methods into classes. We taught the class how to run one method, we taught the class how to run all of its methods, we taught the system how to run all of its classes, and then we bothered to add reporting, error handling, and randomization as a cherry on top.

I'm hoping to soon publish a small book under Michael Hartl's Learn Enough to Be Dangerous series. If that works out well, I'm hoping to do a more complete book on Minitest. I will have a sample chapter coming out for review soon, but that's not ready quite yet, so please follow me on Twitter for announcements. Thank you, and I'm available for hire.

Do I need whiskey to use Minitest? No. No whiskey was actually involved in the creation of Minitest, because I only started drinking about three, four... Aja, how many years ago? Four years ago, at Madison Ruby, where they did a cheese and whiskey pairing, and it was downhill from there. But I'd already written Minitest five years before that.

Would I recommend writing Microtest as a team exercise? Fuck yeah, I totally would. This is actually pretty similar to how Minitest does its work, and I think it's really good for you to demystify the tools that you use, to peel the onion back and understand what's going on under the covers. And really, this is not far off from what Minitest is doing. It is very far off from what RSpec is doing, and that's why, in many ways, Minitest is winning: it's a lot faster because it's doing less.

So the question is, where am I spending my time, besides writing the extra assertions, to make up the difference between Microtest and Minitest? The plug-in system is definitely part of that. The thing that gives Minitest the flexibility it has while staying mini is part of the complexity that makes the difference between Minitest and Microtest. Then there are the hooks, the setup and teardown and the other stuff that I described as an exercise for the listener. Other than that, there's not much of a difference. And the fact that Minitest does multiple test types: it does BDD, TDD, and benchmarking tests. It also has a really pitiful stubbing and mocking framework. So it just does more categories of testing. Minitest is focused on multiple styles of testing, whereas Microtest is focused on just TDD, and no plug-ins.

Anything else? Oh, that's a good question. The question was, what would be the next feature to implement in Microtest if we were going to continue its development? I would say that the reporter is incredibly stuck-together and clumsy. Minitest has a reporter in the good sense of the word: an abstract reporter interface, if you want to call it that, and then a composite reporter, which allows you to plug in multiple reporters to do multiple things. So I would say that would probably be the next thing I worked on, to make it able to talk to CIs, or output differently, or do Pride, that sort of thing.
And then after that, probably the plug-in system to make it flexible. And then, obviously (sorry, before both of those), assertions to make it useful. And thank you.