Looks like that's everyone. So hello, thank you so much for joining me. I'm Ryan Davis and today I'm going to be talking about Minitest 6 and what's coming to help you test feistier.

Setting expectations up front, this is something that I always like to do in my talks: this is an overview idea talk, also known as a hand-wavy talk. There will be a little code in it. It is targeting testers at all levels. And, my number is correct, I actually updated it: 35 slides within 40 minutes. That's about seven slides a minute, which is an okay pace.

Something I don't normally do, but I'm going to go outside of my comfort zone here and toot my own horn for a second. I figured out last night, a bit drunkenly, that I have released more than a thousand gems now, and I'm thrilled to pieces to have realized and hit that milestone.

So now that that's taken care of, let's go meta for a second. What's the narrative for a talk about a test framework? Yeah, I can't think of one either. As such, I'm stuck doing what is known as a presentation sandwich. Sandwich... close enough. Turns out the emoji search is quite strange. A presentation sandwich is where you tell the audience what you're going to tell them, then you tell them, and then right after that you tell them what you just told them. And this makes me sad. I do not like these talks. I don't like giving these talks because it always feels basic and a bit insulting, as if you can't keep track without the extra reminders. As an aside, I thought it'd be kind of fun to do my whole talk in emoji, but you wind up having to learn how to speak Neanderthal in order to do it: "hamburger, bubble, eagle, bad." It was just taking too much time. So this is about the last one. I certainly don't like giving these types of talks; that would feel hypocritical to me. I really prefer a nice story arc or some device that ties everything together. But there is no story here, just the goal of telling you what's happening with Minitest 6. So for that, I am incredibly sorry. This just isn't the type of talk where a narrative is easy to come by. I'll do my best to keep you engaged.

To help with that, I'd like to know why you are here. So raise your hands if the following categories apply to you. Maybe you're brand new to testing and you want to see what the options are. All right, about five or so. You're a Minitest user already and you want to know what's new. That looked to be about two-thirds of you, half to two-thirds. You're an RSpec user and you want to know how Minitest differs. Nice, awesome. You're an entrenched RSpec user and you want to heckle. No? All are welcome, but just know that I heckle back.

So, Minitest is my baby. This project is a labor of love for me. Testing is something that I've been very passionate about. I've been doing unit testing in one form or another since about 1995, which is actually before Kent Beck's seminal XP book came out that really popularized the practice. For me, TDD stands for test-driven design, or at least that as well as test-driven development. Coupled with YAGNI and other practices from XP, I believe that TDD leads to better software every time. Almost. And that's where Minitest comes in. I wrote Minitest using those practices and I wrote Minitest to help support those practices.

So what am I talking about today? I'm gonna give a quick introduction to Minitest so that you understand what it's about, in case you don't know.
I'm gonna give a very quick history of Minitest so you understand where it's come from. I'm gonna talk about the philosophy and purpose of Minitest so that you better understand the decisions that I've made over time. And then I'm gonna detail the major changes coming in Minitest 6: both architectural, mostly invisible changes that you may not notice as a user, and code changes, the changes that you're more likely to see.

Contrasting that, I am not trying to convince you to test. That ship has sailed. If you missed it, you can't be convinced; this talk is not for you and you'd probably be better off in a different talk. I am not trying to convince you that you should use Minitest. If you do, that's fantastic, and if you don't, that's okay too. Hopefully this talk will be informative for you. And I'm not here to bash on RSpec, as much as it's fun to do sometimes. After all, the maintainers like to get drunk together. Though I might provide statistics that you might not like, because the following holds true for all x: whether x is test time or memory impact, but also the number of users and downloads. But really, at the end of the day, I don't care what you use at all, as long as you test. Please understand that in its entirety: I do not care, as long as you test. That said, Minitest is faster and more direct, and it always will be. I'm also not here to explain how Minitest works, just the what and the why about Minitest and how it's changing. If you do want to know more about how Minitest works, Nate Berkopec gave a really good talk, I want to say three years ago at RailsConf, walking through Minitest. And I've given a couple talks on how to make a test framework from scratch that builds up to the design of Minitest pretty closely. Mine comes more bottom-up, his is more top-down, so if you watch both of them, you're covered.

Thus concludes the first slice of bread. With that done, let's start with a quick introduction to Minitest. This feels like a bit of a dull moment, but Minitest is a testing framework. It supports unit and functional testing using regular classes, methods, and assertions; this is known as xUnit style, for old geezers like me, or test style. It also supports using describe blocks, it blocks, and expectations; this is known as spec style. It supports benchmark tests for hardware-agnostic performance testing. And it has simple stubbing and mocking. Some might say too simple, but that is actually intentional, as we'll see. It supports both TDD and BDD, which has nothing to do with whether you prefer spec style or not; that's a common misconception. It has a very clean architecture; you can understand all the code in one short reading, which usually takes about an hour or two, literally. And it's fast. It's very fast. It's the fastest out there of the full-fledged test frameworks, and its job is to get out of your way. It has a flexible reporting architecture that allows you to customize the output and integrate with other systems like CIs. And it's highly extensible using plugins from RubyGems. There are over 200 gems for Minitest. I didn't know that.
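To give a flavor of that extensibility: Minitest discovers plugins by looking for files named minitest/*_plugin.rb in your loaded gems, then calls their option and init hooks. Here is a minimal sketch of a made-up plugin; "myci" and MyCIReporter are names I've invented for illustration, not a real gem:

    # lib/minitest/myci_plugin.rb -- discovered automatically by Minitest
    module Minitest
      def self.plugin_myci_options opts, options # optional: add CLI flags
        opts.on "--myci", "Report results to MyCI" do
          options[:myci] = true
        end
      end

      def self.plugin_myci_init options # called once options are parsed
        reporter << MyCIReporter.new if options[:myci]
      end

      class MyCIReporter < AbstractReporter
        def record result # called once per finished test
          # ship result off to your CI service here
        end
      end
    end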
Now, without further ado, let's do the briefest history of Minitest that I can muster without being sarcastic and skipping the whole thing. History talks bore me to death, so this is gonna be very brief. I became the maintainer of test/unit, which shipped in Ruby's standard library, in about 2004. It was big, it was complex. I just didn't understand it, and, quite frankly, I was scared of it. Especially as a new maintainer with a core commit bit, that was terrifying. It had a lot of confusing files, a fair amount of code. It was overly complex for what little it did. It had five different runners, including four different GUI libraries. It had two collectors, or ways of discovering the tests to run. And it used every design pattern that you could find. Some of this was not the fault of test/unit; it was following in JUnit's footsteps, which was following in Smalltalk's SUnit footsteps. It even had its own text templating engine for generating error messages, which it did on every assertion, even if they passed. And everything boiled up to something called assert_block instead of assert, and I could never figure out why. Generating all those extra blocks and closures was expensive and needlessly slow.

And I thought, why? What does it take to implement only what I use? What is the simplest thing that I can do to ensure that my code is correct? That was failure-driven development at its most extreme. I ran my tests bare, and everything that was undefined, I defined. So first Test was undefined, and then Unit was undefined, and then TestCase was undefined, and so on and so forth, until very quickly it ran and eventually passed all the tests across all my projects. Which, as best I can spelunk doing software archeology, was about 25 projects. Perhaps best of all, it was only 75 lines of code. I don't use that diverse a set of assertions. I'm a big fan of assert_equal, and I use other things now and then. So really I had to implement just what I was using. That wasn't much, hence the "mini" in Minitest. You can actually sort of read this. That's not bad. On the left, you have the entire framework in one method: find all the classes, run their tests, setup and teardown. On the right is the test case class with setup and teardown stubs, assert, and all the other varied asserts beneath it, built on it. That's it.
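That slide isn't reproduced here, but here's a rough, hypothetical reconstruction in the same spirit; my sketch of the idea, not the actual 75 lines:

    # Toy reconstruction: the "entire framework in one method" idea.
    class MicroTest
      def setup; end
      def teardown; end

      def assert test, msg = "assertion failed"
        raise msg unless test
      end

      def assert_equal exp, act
        assert exp == act, "Expected #{exp.inspect}, got #{act.inspect}"
      end

      # Find every subclass, run its test_* methods,
      # with setup/teardown around each one.
      def self.run
        ObjectSpace.each_object(Class) do |klass|
          next unless klass < self
          names = klass.public_instance_methods(false)
          names.select { |m| m.to_s.start_with? "test_" }.each do |name|
            test = klass.new
            test.setup
            begin
              test.send name
              print "."
            rescue => e
              puts "\nF: #{klass}##{name}: #{e.message}"
            ensure
              test.teardown
            end
          end
        end
        puts
      end
    end

Subclass it, define test_whatever methods, call MicroTest.run at the bottom of the file, and you have a tiny (terrible) test framework.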
So here's the history in a nutshell. Version 1 came around 2006. 1.2 added all the assertions that I didn't use, and then specs and mocks came in 2008. 2010 was benchmarks. Yada, yada, yada. 5 comes along in 2013, which is really where we're at now. In February I released the latest version, which is 5.11.3. That's the current functionality. The only real difference between that and 5.0 is that I've expanded the reporters a bit, and it's mostly an internal refactoring to make things easier for plugin writers. Two things of note. First, Minitest 5 has lasted for five years and has changed very little other than the reporters I was talking about. It's API stable. And second, Minitest has existed for 11 years. I hadn't thought about that until I wrote this talk, and that feels like an obligatory Grosse Pointe Blank reference to me, which is much better looking than it is on my screen. My screen is really dark. Cool. 11 years. This stuff really sneaks up on you.

So, I think this is more interesting data than a table of dates and descriptions. This is Minitest visualized over time. The blue line is the lines of code, and the red line is a complexity metric from a tool I wrote called flog. By comparison, here's RSpec. Unfortunately, because of a repo split when it changed hands, I could only gather numbers from version 2.0 to 3.5, which I think is the latest. And here's Minitest's numbers on the same scale.

So, Minitest has had a number of firsts as a test framework, possibly in any language; that's hard to verify. It was the first test framework to shuffle tests by default. This prevents test-order-dependency bugs that can lead to production bugs. It was the first to include both test and spec frameworks, as well as TDD and BDD style support, providing options for a developer to choose any style that they want to work from. Mike Moore, who used to run the MountainWest conference, really likes to do something he calls spec-assert style: he uses describe and it, but then he uses asserts inside of there instead of expectations. And it was the first framework that I know of to provide benchmark testing, meaning platform- and hardware-agnostic performance testing. The nice thing about that is it means that your very fast laptop and your very slow CI can both pass, and you don't have to have fixed numbers or try to figure out the math to adjust between the two. That'll never work.
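As a sketch of what that looks like: Minitest::Benchmark asserts the shape of the performance curve rather than absolute times. The algorithm under test here is a stand-in of my own:

    require "minitest/autorun"
    require "minitest/benchmark"

    class BenchAlgorithm < Minitest::Benchmark
      def self.bench_range
        bench_exp 1, 10_000 # input sizes: 1, 10, 100, ..., 10_000
      end

      # Passes if runtime fits a linear curve with correlation >= 0.99,
      # no matter how fast or slow the machine is.
      def bench_squares
        assert_performance_linear 0.99 do |n|
          n.times { |i| i * i } # stand-in for your real algorithm
        end
      end
    end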
Okay, now that you know a bit about the what and the when of Minitest, let's look at the purpose and the philosophy behind it. As I said before, I subscribe to a lot of XP. Write one failing test at a time, then make it pass. It seems unintuitive at first glance, but it's the perfect carrot to keep you going without leading too far ahead of your actual needs. Do the simplest thing that could possibly work. This pair... I'm not going to try saying that, I've tried like ten times and I can't. This pairs well with test-first: you're never more ignorant than at the beginning of a project, so why pretend that you can design for all of its complexities upfront? And: you aren't gonna need it. At least YAGNI is pronounceable. Don't implement anything until you actually need it. This avoids the what-abouts, which kill off projects. And all of this tries to focus on being incredibly simple, at least as simple as is reasonable for your needs.

Ruby has one of the most complex grammars out of today's programming languages. And don't even get me started on the semantics; they're crazy. There is a lot to learn just to get up and running, let alone semi-productive. Both Minitest and RSpec look like magic to a beginner, and that's because, to a beginner, Ruby itself is magic. Beginners don't know about singleton methods on instances or why you'd want to use them. Sometimes we lose touch and forget that. I don't want Minitest to contribute to the bewilderment, or at least I want it to contribute as little as possible. Your test framework shouldn't get in the way of learning or doing. But once they do learn a new construct, that beginner can start to use that new tool anywhere. Minitest tries to expose Ruby with little magic involved. To use Minitest, all you need is a basic understanding of classes and objects, methods and method calls, and the ability to read some basic API docs. That's it. There's nothing extra to learn just to use it. Once you learn a new tool from Minitest, like singleton methods, you can use it anywhere, not just in Minitest. Not to say that you should, because singleton methods will bite you, but that's beside the point.

Here's a minimal test that will actually run as is, and then fail, because you haven't implemented Calculator yet. There's a require of minitest/autorun, which is the standard way of starting a file. You have a class opening up that has a descriptive name; it doesn't have to have Test in there, but I usually do. It subclasses Minitest::Test. Then you have a method that starts with the word test. That's important. And you have an assertion with an expected value and the calculation of the actual value. That's it. And here's the equivalent in spec style. Same require. You have a describe this time; you're naming the class under test or putting in a string, it doesn't matter. You're opening up a block, you're saying it "should handle the basics" do. You're pretending to ignore that underscore. You're doing the calculation, and you're calling must_equal with 4, your expected value.
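Reconstructed from that description; Calculator and its method are stand-ins of my own, not from the actual slides:

    require "minitest/autorun"

    # test style
    class TestCalculator < Minitest::Test
      def test_basics
        assert_equal 4, Calculator.new.add(2, 2)
      end
    end

    # spec style, one-to-one with the above
    describe "Calculator" do
      it "should handle the basics" do
        _(Calculator.new.add(2, 2)).must_equal 4
      end
    end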
You probably noticed the test style and spec style are one-to-one. There's only one extra construct in the spec style, and that's basically incidental. They're actually the same thing, and that's because spec things are just bridges over to test things. I don't want to reinvent the wheel. I don't want extra things to learn or teach. I don't want to document them. I don't want to implement them. I don't want to support them. I just want to use Ruby. By learning Ruby, you learn everything you need to write tests in Minitest.

As well as being incredibly simple, Minitest strives to be incredibly fast. It's been a while since I ran my benchmarks, but Minitest is the fastest of the full-fledged test frameworks out there. By full-fledged, I mean it has the full functionality you would expect and is test-safe. There is a library out there called Riot that doesn't instantiate an object to run your test in, which means state can infect everything, and I don't consider that to be full-fledged. It is slightly faster than Minitest. I'll never be able to catch up with that, just because I'm doing the real thing and it's not. Pretty sure that Minitest will always be the fastest out there. It achieves that by doing nothing extra. In short, autorun at the top trips off run, which gathers up all the test classes, which tells each class to run its own tests, and they go off and do their thing. Seen another way, here's a call diagram to run tests. The call stack's pretty clean. It's easy to understand. We'll see later that I'm thinking of ways to clean this up even more.

One way to stay incredibly fast is to store nothing extra. We do this because we want to reduce the impact on GC. Don't create any unnecessary objects. Let go of everything as soon as you can. You'd expect memory to be linear to the amount of testing you're doing, but in Minitest, memory is linear to the number of failures and errors that you have, because Minitest's statistics reporter class only hangs on to failures and errors. In RSpec, memory is exponential to the amount of testing that you're doing. I'm not sure why, but it's easy to see with generated tests and basic UNIX tools. I think that's one of the main reasons why it's so much slower than Minitest: it's just doing a lot more and holding on to all of it.

Still here? So, story time. There's a gem called minitest_to_rspec that ironically uses my ruby_parser, sexp_processor, and ruby2ruby gems to convert Minitest test suites to RSpec. So for fun, I ran this against one of my projects that was simpler and cleaner, and didn't have any shared tests or anything. The tests that were in Minitest were already fast, which is what I wanted; I wanted something that would pretty much benchmark the frameworks against each other, not the test content. But in RSpec, they were 30% slower. Is that definitive? Of course not. This is not science. This is funny statistics.

So, Minitest is opinionated in order to improve testing overall. I have this idea of a testing knob. The idea is that your framework either helps or hinders increasing your testing mojo, or your feistiness. So: you start with an incredibly fast framework. You provide only meaningful assertions, to reduce the nonsensical tests that provide a false sense of security to testers. I'm looking at you, assert_nothing_raised. You give the user the choice of test or spec style. You provide test randomization to prevent test-order-dependency bugs; that's a big one. Design for testing your results and side effects. Encourage a mock-less strategy, making good things easy and questionable things harder. Write benchmark tests against your algorithms. Refactor to custom assertions; it increases the expressiveness of your tests. Refactor to abstract test classes and/or modules. Require assertions, to ensure that you're actually checking something in every test. This was a user contribution that I hadn't even thought of. I was like, of course I'm asserting in all my tests. So I turned this thing on, I ran all my test suites, and I got failures. Cause sometimes shit just slips through the cracks. And then use test parallelization, and I'm going to try to enunciate this well through the talk, to drive you to solid and thread-safe strategies. Later we'll see my ideas for trying to get to 11.

Reproducibility. Minitest is pretty much just a fancy tool to reproducibly run things. All it does is call methods. That's it. In testing, reproducibility is paramount. Without it, there's really no point in doing any of this stuff at all. You might as well just throw out your tests, and throw out your implementation while you're at it. Running with a fixed seed should always reproduce the output, allowing you to see the failures again in the same order that they ran before. Being able to selectively run failures allows you to reduce the problem set to a minimal reproduction. And if you really have to, you can declare that your tests need to run in order.
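Concretely, in Minitest 5 today that looks something like this; the flags are real Minitest options, and the embarrassingly named class method is real too, on purpose:

    # Replay a failing run in the same order, then narrow it down:
    #
    #   $ ruby test/test_calculator.rb --seed 42    # same seed, same order
    #   $ ruby test/test_calculator.rb -n /basics/  # run only matching tests

    require "minitest/autorun"

    class TestLegacyOrder < Minitest::Test
      # The opt-out for shuffling is deliberately shameful to type:
      i_suck_and_my_tests_are_order_dependent!
    end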
Finally... do we need that again? Finally, Minitest does not push a testing style, strategy, process, or pedagogy. Personally, I like the flexibility of this approach, but some people feel lost by this, so maybe it should. If I were to push a testing style, it would be something like test-first. I love to do quick and dirty sprints, but when I'm writing anything that I know I'm gonna keep, or if I'm doing exploratory programming, I'll always write a failing test first. The design is always better in the end. Especially when you're still figuring out what you want this thing you're making to be, sticking to the essence of what you know keeps complexity, and therefore bugs, at bay. Like premature optimization, I think over-mocking is the root of all evil. I've seen tests mocked so heavily that they can't fail even if you remove the entire implementation, and that's just sad. So: set up a thing, call a method, verify the result or the side effect. Nothing can be simpler. The simplest of tests. And if you're new to this game and you don't know what to test, I think this is a perfect rule of thumb. I've talked to Kent a number of times about testing strategies, and one of the things he told me that I really liked is: being green just means I don't have enough tests yet. You have a gut feel as to when you do, and it's just in your stomach.

So, to understand where Minitest 6 is trying to go, it might help to see Minitest 5 a bit. One of the biggest incompatibilities going from Minitest 4 to 5 was namespace changes. I added a compatibility layer for users, but it still bit plugin authors. I moved benchmark testing to its own class so it could be isolated. I removed all manager classes and I made tests run themselves. You know, simple objects with responsibilities for doing things. So here's Minitest 4. Don't bother trying to understand this much of it; it's not important. What is important is the difference, not the content. Here's Minitest 5. The latest version, 5.11, has been expanded on the reporters side, but that doesn't matter for this talk. Here's the interesting slide. All the new classes are in green and bold; deletions are in red and dashed. There were two major namespace changes and a lot of additional architecture. I want to avoid this in Minitest 6.

All right, finally moving forward. Looking forward, sorry. Let's start with the architectural changes. For starters, there are no major structural changes. Normal users and plugin authors will be mostly unaffected. I've deleted all the Minitest 4 and 5 cruft: old compatibility namespacing and assertions that have been deprecated for ages are all gone. If you do need some of that cruft, if you're still, sorry, working on the upgrade path, here's a compatibility chart. If you still need your test suites unshuffled, you need to pin to 5.3.3. 5.11.x is gonna be the last real content release, and then 5.12.0 is gonna be the last release, period, and it's just gonna turn deprecations on. Well and good, but who cares? Let's get to the meat.

Okay, parallelism. Minitest has had parallel testing for quite a while, but I'm considering an architectural overhaul. Right now you use parallelize_me! to opt into parallelization on a class-by-class basis, and I'm seriously thinking about making this the default. And then maybe we can use serialize_me! to opt out, or maybe use a different superclass for serial testing. Parallelization is the next step to ensure that you're making solid, thread-safe designs. It's the next thing to do to make your implementations really scale. I'm also seriously thinking about running all tests completely interleaved, rather than on a class-by-class basis. I'll talk about that more in a bit.
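Here's what opting in looks like in Minitest 5 today; parallelize_me! is real, while serialize_me! is still hypothetical:

    require "minitest/autorun"

    class TestThreadSafeThing < Minitest::Test
      parallelize_me! # this class's tests run concurrently, in-process

      def test_addition
        assert_equal 4, 2 + 2
      end

      def test_multiplication
        assert_equal 9, 3 * 3
      end
    end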
Easy worker-based distributed testing. You can distribute locally and run all your workers on the same machine, which lets you use all your CPUs and avoid locks. It also helps with tests that stub global methods like Time.now. And there are some serious speed benefits. Here are some numbers, not faked, from Jeremy Evans on his gem minitest-parallel_fork. The only difference, I think, between that project and what I'm looking at is to actually go across multiple machines, where his just forks. You can also spread your test run across machines or VMs. And worker-based distributed testing avoids the problem of having one file that runs longer than all the others. This will minimize run times automatically and help you on your CIs and any other machines that are resource-constrained.

So how do you do the config and setup for distributed testing? Well, I don't know yet. And what "easy" means at this point is up for debate. Aaron Patterson has contributed a patch that does provide distribution without any config whatsoever, and I'm working over on the config side trying to make it easy to do, and we're gonna meet somewhere in the middle. But a lot of that's up for debate right now. Minitest will make some assumptions about how to access your tests across machines, such as SSH access, assuming that you're gonna have the proper ports open or tunneled, not sure which. And it'll assume that you have the same source layout across all the machines. So if you're doing container-based testing, that should be pretty safe to assume. I have no idea if these are safe things to expect on non-Unix platforms, though. I will definitely need some help here. And if you wanna look at existing libraries for extending parallelization or distributed testing, you can check these projects out. I know that test-queue is being used by GitHub. Knapsack is apparently a pretty popular project, and it has a paid service. And there's some others, I'm sure. I know that Shopify has a really crazy setup.

So right now, the result of a test is the test instance itself. But a test can assign instance variables that can't be serialized, like a hash with a default block. Minitest is switching over to a new setup that guarantees safe serialization. This deletes all the custom Marshal code. I added a Result class that pulls the bare minimum needed from a test result in order to do the reporting properly. That guarantees that there are no extra instance variables hanging onto unmarshalable data like procs. This makes it easier to do parallel and distributed testing, but it also means that things are easier on the GC, because we're not lugging around closures accidentally. Remember, closures see everything above them that they can touch, and they hold onto that until they get GC'd. So if one's live, all those other objects are live. But Ruby 2.5 added the ability for an exception to point back to its previous exception, so if you have something that raises, rescues, and raises again, both of those exceptions are live. That leaves a serialization hole open, so I need better 2.5 tests to expose and fix this.

Minitest 6 will have a simpler project structure. I'm falling behind on time. Some of the files are getting a bit hefty, so, for example, reporters have been broken out into their own file. As hinted before, I would really like to make the call stack cleaner. I'd love to make it even lighter, but I'm not sure this is possible without wreaking complete fucking havoc. Here's the current call stack. I kind of did a walkthrough before, but basically autorun fires off run, which fires off __run, which walks through all the runnables and tells them to run, and they walk through all their runnable methods and tell them to run. If we were to refactor some things, again with the green bold: pull the filtering out of __run, and then extend runnables to return both runnable classes and their methods. Then we can shuffle across all tests. And here's my final crazy dream version, with everything just clustered together and chaotic. I'm not settled on how far to take this, though. This could break a lot of current tests, but maybe they're meant to break. I don't know. There's a lot to say about parallel by default. It's incredibly powerful in driving you to a safer design, but it's also incredibly problematic with some test strategies, such as non-transactional database testing, or stubbing class methods, because class methods are just global functions.
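To see why class-method stubbing and in-process parallelism fight: minitest/mock's stub swaps a method out for the duration of a block, and on a class that is process-global state. A sketch:

    require "minitest/autorun"
    require "minitest/mock"

    class TestFrozenClock < Minitest::Test
      def test_now_is_stubbed
        epoch = Time.at 0
        # Every thread sees this stub while the block runs, which is
        # why forked workers dodge the problem and threads don't.
        Time.stub :now, epoch do
          assert_equal epoch, Time.now
        end
      end
    end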
So that's all I have for the architectural changes. It's not much, really, but in reality it's a lot to consider. On to the code changes. Some people really want a command-line runner, so I'm adding one. I'd like to add all the usual bells and whistles without pulling in extra dependencies, but running by line number might be tricky without that. One thing that I really like about this is that it leads to better integration with other tools like minitest-bisect or autotest. Rake's built-in test task is cumbersome at best. I actually hate it, and I don't use it; I have my own. And it makes it very hard to pass flags around, so maybe I should stop hoarding my own and provide one. There are a lot more bells and whistles, but it'll also be simpler just to pass options into the test run.

There are some minor but important improvements to Minitest::Mock. I improved stub to clean up testing stubbed methods that take blocks. This gets really hairy, but it's a lot more consistent now than it was in the past. And I'm considering adding fake, which is just a fancy wrapper around Object.new and that singleton-method initializer I was talking about. It seems so simple, but maybe that means that it isn't needed. I don't know. That's up for debate.

And assertions have been improved quite a bit. assert_equal with a nil value is no longer allowed; that will raise an error. Use assert_nil to be explicit. This makes your tests more explicit, and it signals your intent more clearly. It also drives out tautological testing, tests that can't fail. Randy Coleman has a really good blog post about this, and I highly suggest you read it; I can't explain it in six minutes. Assertions reuse themselves a lot more. For example, assert_operator and assert_predicate will call assert_respond_to first. This improves the error messages drastically, from an error that is kind of hard to read to a failure that's more to the point. I've improved failure message construction: if the message passed is itself a proc, then that proc overrides all other output. That allows top-level user-defined messages to win when passed down to wrapped assertions. I removed the last of the nonsensical assertions, like assert_send. They can be replaced with better assertions like assert_predicate, assert_operator, assert_includes, et cetera.

Specs have gotten cleaned up a bit. Since expectations are just assertions, all those improvements to the assertions also apply. But I've also deleted many of the spec expectations from Object. It's pretty easy to switch to the new setup: whereas before you just called must_equal directly, now you have to use a wrapper to call it. I prefer _ to wrap the value, simply because it's short and gets out of my way visually. But the word value is a bit more descriptive and might be preferred. And honestly, I think expect is terrible, but it's there too. This cleans up the global namespace a lot.
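Side by side, as described; the wrapped forms are what the mt6 branch expects:

    require "minitest/autorun"

    describe "expectations without monkeypatching Object" do
      it "wraps the value under test" do
        # old style, a global monkeypatch on Object, going away:
        #   (2 + 2).must_equal 4

        _(2 + 2).must_equal 4      # terse, visually out of the way
        value(2 + 2).must_equal 4  # more descriptive
        expect(2 + 2).must_equal 4 # terrible, but it's there too
      end
    end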
So, the last piece of that sandwich: the review. First, I gave enough background to understand where Minitest is and how it got there. I went into the major changes coming in Minitest 6. Architectural ones, like the parallelism overhaul, possibly on by default; distributed testing and the serialization guarantees needed for it to run smoothly; and the possible overhaul of the call stack and the repercussions that could have. Then I went into detail on the code changes coming, things that will impact test and plugin writers: the command-line runner, better Rake integration, cleaning up stub and adding fake, improving assertions, like not allowing nil in assert_equal and providing better failure messages, and removing expectations from the global namespace.

Remember that testing knob? That's really what Minitest and this talk are about. Increase it as much as possible, be feisty, but choose the setting that works for you and your project. And if you wanna help Minitest get to 12, I'm always open to more ways to encourage better testing practices. No one's got a good one going yet; keep reading.

All right, so that's basically it. Big features coming, not too many incompatibilities, and most of those are no surprise. So when is this coming out? That's a good question. A lot of this is done in an experimental mode, and I could put out an alpha quite soon. But some of it, not so much. There's still more debate to be had, more evaluating of different approaches, and just baking it in to see how well it works. I'm thinking weeks or months, not really beyond that. As you know, stalling on releases is not really my style. But some of that depends on you. I could really use feedback and contributions. My current draft of this code is available on GitHub at the above URL, on the mt6 branch. I'll also post these slides today to my website and tweet it. Thank you. And if you appreciate and benefit from my work, I would appreciate it if you or your company would consider sponsoring me. Thank you.