So our next speaker is Ryan Davis. I was really excited, once we put him on the schedule, about introducing him, because how do you introduce Ryan Davis, right? So I've got a little spiel in my head. I missed that joke, was that a joke? I can't exit fast enough. So Rails 1 dropped 5.5 years ago, something like that. I lived in Salt Lake City at the time, and Eli Duke, who's sitting back there, he and I had made several websites in PHP and MySQL, and we had this feeling that, you know, that wasn't right. But we didn't really know enough about programming or other languages to know what wasn't right, what we didn't want to do. And we were about to embark on rebuilding the site that we had built a few times already. And I saw the blog, the 15-minute screencast, and I was like, oh, this Rails thing is awesome. Let's learn a whole new thing when we rebuild this website. And then a month later, he and I and four of our good friends moved up here to Seattle. We shared a house together, and we built a site. It was my first Rails project, and it's just adorable to go back and look at. I still pull it up sometimes. It has the most hilarious models. And a month after that, he was building a list-making site called List Your List. And we've rebuilt that, basically, with every version of Rails. And so we were having some problem that we couldn't have figured out, because we knew fuck all about anything. And we went to Seattle.rb back when it was at Red Line Pizza, so it was fun. And Inky.net was giving a presentation about Gamble, and there was like a functional group, and we were like, what the hell is this Ruby thing? I don't even care about it. Why is that guy's name a URL? Why does that have Unicode in it? And it's kind of weird that we came back the next week. Back then Seattle.rb did one week a month of presentations, like a mini-conf, and the rest of the weeks were just hack nights.
And we came back the next week, and it was Jeff, and Ryan, and Eric, and later we would meet Evan, and John Barnette, and Aaron Patterson. But I remember the first time we came back, we were like, we've got this problem, I don't even remember what it was, and we need some help, can you take a look at it? And the first thing Ryan said to us was like, your code's shit, just delete it. That sucks. And that's sort of been the ongoing relationship we've had for years. But, you know, we've always gotten better for it. And we basically knew nothing about ActiveRecord, and we were just doing find_by_sql for everything. He showed us the dynamic finders, and we were like, whatever. So we met Ryan and Eric, and they really shaped us in the early days of learning Ruby and Rails. So this has nothing to do with that. Ryan, you're next. Long, tangential stories. So I'm going to talk today about whether size matters, or the ins and outs of minitest. I just want to say right now that despite the new window, this talk is best approved. So a tiny amount about myself. I'm one of the founders of Seattle.rb. My email address is there. My website is there. I've been doing Ruby for almost 11 years now; 11 years in October of 2011. And I've got 84 slides to do in 30 minutes, so we have to keep up on the timer. So for those of you who don't know, what is minitest? It's a replacement that I wrote for Ruby 1.8's test/unit, because it scared me. It was originally 90 lines of code. It has since grown into essentially three test frameworks in one. And it's considerably more than 90 lines, although a lot of that has to do with the fact that we actually documented our code this time. It is available as a gem. And if you have Ruby 1.9, you already have it; it ships with Ruby. The test/unit that ships with Ruby 1.9 is a wrapper around minitest that provides some of the old functionality that we didn't want to implement.
It's meant to be small, clean, and very fast. And it provides a lot more than test/unit did. As I already intimated, there are three test frameworks in there. I'm going to go into that. So there are six parts to minitest. There's the runner, which nearly every test framework has; it's just the engine that picks things up, does the work, and then reports the results afterwards. Then there are unit tests, specs, mocks, something called pride, and benchmarks. So the runner: a very simple system. It's exactly how test/unit works. It picks up all subclasses of TestCase and runs everything that starts with test. That's it. The design mandates that there's almost no magic allowed. It even avoids the regular ObjectSpace mechanism to find test classes. We did this in order to bootstrap Rubinius's tests; we wanted to have the cleanest, simplest thing we could have to get what was, at the time, a very partial implementation of Ruby up and running tests. Minimal metaprogramming, and just plain classes and methods to do all the work. It provides test randomization by default to prevent test-order dependency issues. We have flushed out a lot of bugs by doing this. A lot of people will inadvertently write tests that have to run in order for them to work. If you run one alone, or you run them in a random order, they'll break. And those are bad tests. Every single test should run in isolation and pass. Verbose mode prints in a sortable format so that you can find the slowest tests in your test suite. And you'll notice that I am using the advanced notion of pipes: sort -k2, on the second field separated by equals signs, sort numerically and then reverse so the biggest is at the top, and grab the top three of those methods. And then the test summary provides useful statistics. It'll tell you the tests per second that you're running and the assertions per second that you're running. And we use this across our projects to see where we should spend our time. So let's talk about unit tests.
Here is a very simple example, nothing special to it. We have a TestThingy. We have a test called test_do_the_thing. We assert_equal 42, thingy.do_the_thing. It is a simple subclass and it is a simple method. That's it. It's magic-free. So that's not much. So what assertions are available in minitest/unit? We have positive assertions and negative assertions. We have almost everything you would expect to see in test/unit. And then the bolded assertions are those assertions that we've added that are new relative to test/unit. Unfortunately, I don't have enough time to go into detail on all of these. They are well documented, easy to look up with ri, or you can look them up online in the rdocs or whatever. And assert_equal is italicized, so let's go into the differences in assert_equal. assert_equal now diffs. For simple object differences, you'll simply see a single line that says expected: the expected value, the actual value, and you can see them side by side; that's fine. For anything more complex than that, something that has multiple lines or complex structure or whatever, where we deem it important, we actually pass the results through diff and show you unified diffs so that you can focus on the parts that are different. This has saved us countless hours. And we are using this new assert_equal to replace the unit_diff pipe tool that ships with ZenTest, because it's gasping. It's much more efficient for us to do this inside assert_equal rather than trying to filter the output. On the negative assertion side, we have a brand new complement of negative assertions that you just didn't see in test/unit. Things like refute_in_delta, refute_includes, et cetera. Again, I don't have time to go into these, but they're mostly self-explanatory. And then we have utility methods. skip: we can now skip things in test unit, or any test, sorry; well, in test/unit 1.9 as well. flunk, as usual.
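The slide itself isn't in the transcript, but the example described reconstructs to something like this minimal sketch, where Thingy is a hypothetical class under test (at the time of the talk the base class was spelled MiniTest::Unit::TestCase; modern minitest calls it Minitest::Test):

```ruby
require "minitest/autorun"

# Hypothetical class under test, reconstructed from the talk's example.
class Thingy
  def do_the_thing
    42
  end
end

# A plain subclass with a plain method: no magic, just a class and a
# test method whose name starts with "test".
class TestThingy < Minitest::Test
  def test_do_the_thing
    assert_equal 42, Thingy.new.do_the_thing
  end
end
```

Running the file executes the test via minitest's autorun hook; nothing else is required.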
A new one that I really, really like, from ZenTest, called capture_io. It takes a block and returns the standard out and standard error. And then I've wrapped those up in assertions, assert_silent and assert_output, to make it a little cleaner. And then we have these ugly-named mu_pp methods that are for customizing how differences should be output in cases like assert_equal. So why all these extra assertions? Because it's more expressive. It's enriching our testing vocabulary and helping make our tests more self-descriptive. As you can see, you look through a regular test and you see a lot of assert-nots. Now we can just simply say refute. And then something that you couldn't do in test/unit without writing it yourself: you used to have to either use your own StringIOs and swap out IO, or use capture_io from ZenTest. Now you can simply say assert_output, run a block of code, and know that it output what you expected. So the question that I hear a fair amount is: where is assert_nothing_raised? This is a usual tirade for me, so you may have heard this before, but I've reformulated it. It is in the same place as refute_silent. With refute_silent, we can infer from the name that this block of code must print something. But what it is, we don't care. So that assertion is of no value to us. It's not doing anything other than saying, yeah, something was output. Of course that output could be a crash; it could be anything. What you should be doing is asserting for the specific output that you need. In the same sense, assert_nothing_raised says that this block of code must do something, but what it is, we don't care. So there's no value in that assertion whatsoever. And instead you should be asserting for the specific result that you need. This code, with nothing in the block, passes. It's clear that it does nothing for us. In the same sense, this code simply calls do_the_thing, but that method might be empty.
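The capture_io and assert_output/assert_silent helpers described above can be sketched like this (the test names here are made up for illustration):

```ruby
require "minitest/autorun"

class TestOutputHelpers < Minitest::Test
  def test_capture_io
    # capture_io returns [stdout, stderr] captured while the block runs
    out, err = capture_io do
      puts "to stdout"
      warn "to stderr"
    end
    assert_equal "to stdout\n", out
    assert_equal "to stderr\n", err
  end

  def test_assert_output
    # assert_output(stdout, stderr) wraps that capture in an assertion
    assert_output("hello\n") { puts "hello" }
  end

  def test_assert_silent
    # assert_silent requires the block to print nothing at all
    assert_silent { 1 + 1 }
  end
end
```

This is the "assert for the specific output that you need" point in code: you name the exact output, not merely that something happened.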
This passes its assertion, and people are going to count that as a pass, and it's useless. So what you should be doing instead is this. If do_the_thing raises, every test framework out there says that an unhandled exception is an error. It's already implied in the contract. So we assert_equal that do_the_thing returns 42, and therefore we have tested the behavior of the method, and we know that it's doing the right thing. So let's move on to specs. Here's an equivalent example in specs. Line for line it's almost exactly the same, except that we're using BDD language. We're describing Thingy: it must do the thing; thingy.do_the_thing must_equal 42. The nice thing about the way minitest's spec is written is that it uses simple reflection to transform the previous example into this example, where it is a simple subclass, a simple method, and there is no magic. We have a full complement of positive expectations mapping one-to-one onto the assertions. Negative expectations map one-to-one onto the negative assertions. It's all free, because must_equal is assert_equal, wont_equal is refute_equal, et cetera, et cetera, et cetera. We get all of that through simple code reuse, simple reflection, and that's it. minitest/mock was written as a 15-ish line example from Steven Baker. I think that's about right; it might have been closer to 10. It was a proof of concept that the mock frameworks available today are bloated and complex. It's grown up to be a whopping 50-ish lines. Here's an example: Minitest::Mock.new; expect that meaning_of_life will be called and will return 42. It can also take args in various formats. The call to the mock's meaning_of_life does return 42, and then we call verify to ensure that the mock did everything that we expected it to do. But I want to plead that you do not use this if you don't need it. Don't use any mock framework if you don't need it, because over-mocking is evil. I want this word in your vocabulary.
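The spec example and the mock example described above reconstruct to roughly this (Thingy is again hypothetical; note that at the time of the talk you could call must_equal directly on any object, whereas current minitest wants the _() wrapper shown here):

```ruby
require "minitest/autorun"
require "minitest/mock"

class Thingy
  def do_the_thing
    42
  end
end

# Spec style: the same test, in BDD language, derived by reflection
# from the assertion side.
describe Thingy do
  it "must do the thing" do
    _(Thingy.new.do_the_thing).must_equal 42
  end
end

# Mock style: expect a call, use the mock, then verify it happened.
mock = Minitest::Mock.new
mock.expect :meaning_of_life, 42
raise "mock did not return 42" unless mock.meaning_of_life == 42
mock.verify # raises MockExpectationError if expectations weren't met
```

verify is the step that makes the mock an assertion rather than just a stub.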
What we need to do is mock last, and this needs to be said time and time again: mocks should be the last tool that you grab. The test should already be written, it should already pass, and you only use mocks to detach it from slower, unstable external resources. If it's fast enough, it's fast enough; don't bother mocking. We need to mock high. Don't mock your sockets; mock your readers. Mock your library, not the protocol. The deeper you dive down in your stack, the more you're assuming about what you're interacting with, and the more you're going to have to mock. And mock smart. Make sure first that your test can fail. I don't know how many times I've ripped a bunch of mocks out of tests and found bugs that had never been seen, because the tests were self-validating. I say this a lot, and when I say it, I think of this one. To quote Phil Hagelberg, a now-emeritus member of Seattle.rb: def is my stub framework. Ruby provides you everything you need, built into the language, to do stubs like that. In this case, we create a new Thing, and we override the timestamp method to be in the future, and then you can refute that the object is done. And then Aaron really wanted me to point out that we should be using the Liskov Substitution Principle more in our tests. Now, LSP says that any subclass is a valid replacement for its superclass in all cases. So if we design our code that way, we can use that in our tests. We see here that we have an example, a bullshit example, of an IRC class with a read method. And presumably that's going to read something off the socket and return it back to us. We subclass that with TestIRC, and we use that class in our tests. And we override read to simply return a fixed string. And you can then test that the next_line method, which is not read, will return happy, because it's properly using read and we're using the subclass. So resist mocks by design.
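Both techniques above can be sketched in a few lines of plain Ruby. Thing, IRC, and FakeIRC are hypothetical reconstructions of the slide's examples; the "def is my stub framework" trick is just a singleton method:

```ruby
require "minitest/autorun"

# Hypothetical class whose done? check depends on a timestamp.
class Thing
  def timestamp
    Time.now
  end

  def done?
    timestamp <= Time.now
  end
end

# The "bullshit" IRC example: read would really hit a socket.
class IRC
  def read
    # would really read from a socket
  end

  def next_line
    read
  end
end

# LSP: any subclass is a valid replacement for its superclass,
# so a subclass that cans read is a legitimate test double.
class FakeIRC < IRC
  def read
    "happy"
  end
end

class TestDoubles < Minitest::Test
  def test_def_is_my_stub_framework
    thing = Thing.new
    def thing.timestamp   # override just this one object's method
      Time.now + 60       # pretend the timestamp is in the future
    end
    refute thing.done?
  end

  def test_lsp_subclass
    # next_line is exercised for real; only read is replaced
    assert_equal "happy", FakeIRC.new.next_line
  end
end
```

No mock framework appears anywhere, which is the point.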
As you can see there, we're able to do that very easily. Designs that don't need mocking are always better than designs that do. They're more flexible, they're more testable, they're better, period. Now, we've got something called minitest/pride that's probably out of place, and it's kind of curious that it ships with minitest, but I'm going to allow it because I committed it. It's a simple example of IO pipelining that I wrote on Pride weekend one year. And all it does is replace the dots with colored characters. I think it's a 35-line example of how you can plug into the IO system of minitest and make it do what you want. In this case, we initialize with an IO object, we hold on to it, we override print, and when it gets a dot to print, we print colored dots, or stars. And actually, initially it was printing out all sorts of Unicode characters, and I'm just going to leave it at that. And everything else hits method_missing and passes it on to the IO object that we're holding on to. You can see below, we're setting the output to a wrapper around the old output, and you can pipeline those, pipeline those, pipeline those, and make them do whatever you want. So you could use this as a really simple example, a cheat to start off on plugging this into a GUI or into your IDE. You could use it to emit to Growl the way we used to with autotest, or any other notifier. You could use it to record test statistics and track those over time. Whatever you want to do. Finally, my absolute favorite: minitest/benchmark. Here's a very simple example, again a bullshit example, for minitest/unit. What we have here is we're saying that the performance of this block of code over the domain should be linear to a 0.99 fit. We call obj.algorithm. The benchmark framework takes care of everything else for us. Spec... I should go back and say something first.
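The pride idea described above, wrapping the runner's output IO, colorizing the dots, and delegating everything else via method_missing, can be sketched like this (this is a simplified illustration, not the real pride source; the class name ColoredIO is made up):

```ruby
require "stringio"

# Sketch of the pride approach: wrap an IO, recolor the status chars,
# delegate everything else to the wrapped IO so wrappers can pipeline.
class ColoredIO
  ESC = "\e["

  def initialize(io)
    @io = io
  end

  def print(o)
    case o
    when "." then @io.print "#{ESC}32m*#{ESC}0m" # green star for a pass
    when "F" then @io.print "#{ESC}41mF#{ESC}0m" # red background for a failure
    else          @io.print o
    end
  end

  # Everything we don't override falls through to the wrapped IO.
  def method_missing(msg, *args, &block)
    @io.send(msg, *args, &block)
  end

  def respond_to_missing?(msg, include_private = false)
    @io.respond_to?(msg, include_private) || super
  end
end

# Demo against a StringIO instead of the real runner output.
out = ColoredIO.new(StringIO.new)
out.print "."
out.print "F"
```

In the talk's version, the wrapper was installed by reassigning the runner's output to a wrapper around the old output, and because the wrapper delegates everything else, you can stack as many of these as you like.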
The nice thing is we simply prefix a method with bench_ and it's going to be picked up by the benchmark framework. We avoid them getting picked up in regular runs by not requiring the benchmark file. And by wrapping that in an environment-variable check around the benchmark, we're able to have it so that only our CI defines that environment variable. And so our CI is the one that runs our benchmarks, so we can have benchmark testing over time and consistent results on it. Spec is a little bit different. It's a little bit uglier, by two lines. That's because where before we had a nice passive method, here we have an active loop. And so we have to wrap up the verb, which may not be defined, with the extra check. But otherwise it's exactly the same, except that it's in more active language. So what is benchmark doing? Well, it's running that block of code over a domain. It's gathering the data, and then it's curve-fitting that and testing that fit against the curve you said it should be. Whether it's linear or exponential or power doesn't matter. You can have a test for that and then say whatever it should be. But there's something to say about that. Hardware, memory, runtime changes, whether you're running iTunes or not, whatever, are going to cause differences in your timings from run to run. So you can't test those values for the specific curve. You shouldn't test those values for the specific curve. You upgrade your RAM, you upgrade your CI server, or you put it out on a VM or whatever: it's going to change those values. But what won't change is that if you test the fit of the curve, you're going to have consistent results over time. So we provide assertions that extend the language of minitest for constant, exponential, linear, power, or a generic sort of performance, and you can write your own fit. We provide the similar minitest/spec methods, of course, and you can just go nuts on this. It's a really wonderful way to test your performance over time. So let's talk for a second about extending.
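Putting the pieces together, a benchmark guarded by an environment variable that only CI defines might look like this sketch (BENCH and the bench_algorithm workload are assumptions for illustration; assert_performance_linear and its fit threshold are the real minitest/benchmark API):

```ruby
require "minitest/autorun"

# Only run benchmarks when the CI environment defines BENCH, as the
# talk describes, so regular test runs stay fast.
if ENV["BENCH"]
  require "minitest/benchmark"

  class BenchAlgorithm < Minitest::Benchmark
    # bench_-prefixed methods are picked up by the benchmark framework.
    def bench_algorithm
      # Assert the block's run time fits a linear curve over the
      # default domain of n values, with a fit threshold of 0.99.
      assert_performance_linear 0.99 do |n|
        n.times { |i| i * i } # stand-in for obj.algorithm(n)
      end
    end
  end
end
```

Because the assertion tests the fit of the curve rather than absolute timings, the result stays stable across hardware and runtime changes.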
I saw this really awesome example in Bacon's readme. In Bacon you can define a lambda that says that a given object should be equal to its reverse, call it palindrome, and then you can say something as simple as it should be a palindrome, and you're done. Extending Bacon is beautiful. That's about as good as it gets. In minitest/unit it's not that much different. You say assert_palindrome: give it an object and assert_equal it to its reverse. In minitest/spec you can write something very similar to that: must_be_palindrome, wrap up an object, and say that self must equal self.reverse. Or you can use the reflection mechanisms that are used to derive all the rest of the expectations and simply pull it in from the test side and write it only once. Other examples, simple examples, we have in RubyGems all over the place, because we refactor the tests a lot. We have assert_path_exists, refute_path_exists. We have assert_satisfied_by for a given version and a requirement. assert_result for a given set of specs and what it should result to. It makes our tests much more readable, much more high-level. Something a business person could review. So let's talk for a second about the design rationale. I've got just under 12 minutes left, hoping to fit all this in, but we can skip over some of the numbers if we don't have time. Specifically: less is more. I started off very deliberately writing 90 lines of code and making it work for what we needed, making it do nothing more. And it worked great for quite some time, and then I finally started filling it out over time with functionality, and it's still not much more. And it's still a nice small library in comparison to its competitors. And importantly: indirection is the enemy. I want my failures to be at the point of failure. I don't want them to be six miles down the road. I want an exact stack trace. I want to go straight to the code and fix it. So let's look at a specific example.
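The palindrome extension described above, written once as an assertion and then derived as an expectation via minitest's own reflection mechanism, can be sketched like this (assert_palindrome is the made-up example assertion; infect_an_assertion is the real mechanism minitest uses to turn assert_equal into must_equal):

```ruby
require "minitest/autorun"

# Write the assertion once, on the test side.
module Minitest::Assertions
  def assert_palindrome(obj)
    assert_equal obj, obj.reverse, "expected #{obj.inspect} to be a palindrome"
  end
end

# Derive the spec-side expectation from it, exactly the way minitest
# derives must_equal from assert_equal.
module Minitest::Expectations
  infect_an_assertion :assert_palindrome, :must_be_palindrome, :unary
end

class TestPalindrome < Minitest::Test
  def test_assert_side
    assert_palindrome "racecar"
  end
end

describe "palindromes" do
  it "works on the spec side too" do
    _("racecar").must_be_palindrome
  end
end
```

One definition, two vocabularies, which is the "write it only once" point.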
assert_in_delta is what we use to test floats for equality-ish. You can't test for equality on floats, because they're inexact. It's testing that the difference between the expected value and the actual value is close enough: it's within a certain delta. So you can see here that we calculate the difference and take the absolute value of that. We defer the calculation of the message string until there's been an error. And we did that based on numbers: we profiled the tests of all of Ruby core and realized we were spending a lot of time building up those strings for no reason, so we wrapped that up in a block. It has a helper method to help format the strings, and all that helper method is really doing is concatenating and ensuring that it ends in a period. It's not much work. And then we're doing a simple assertion, because that's all that's needed. It's very straightforward. You can look at this and, at a glance, know what it does. There are only two methods that need to be understood besides assert_in_delta itself. We read that; there's no mystery to anything that it's doing. On the other side, must_be_close_to, there's nothing to talk about, because it's the exact same thing, because it's fully reduced. You already understand it. Now let's look at perhaps my most complex slide. Bacon's example, n.should.be.close m, delta, is more complex. I'm not expecting you to actually read this code all that much. The first thing we do is we say n.should, and that's going to be instantiating a new Should object. be, in this case, is a no-op. We end up calling close on the Should object, which doesn't exist, so we hit method_missing. That wraps everything up in a satisfy that takes a block that calls send on the original object. And it's not entirely obvious, but the first line of that method_missing is putting a question mark on the end of the name of the method we're calling. So that winds up calling Numeric#close?, and that's it.
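The shape being described is roughly this (a paraphrase from memory, not minitest's exact source; the key detail is deferring the message string inside a block so it is only built when the assertion actually fails):

```ruby
require "minitest/autorun"

class TestDeltaShape < Minitest::Test
  # Paraphrase of the assert_in_delta shape the talk walks through:
  # compute the absolute difference, defer the message, assert.
  def assert_in_delta_sketch(exp, act, delta = 0.001, msg = nil)
    n = (exp - act).abs
    # message() is minitest's helper: it wraps the block in a proc, so
    # the string interpolation only happens on failure.
    msg = message(msg) { "Expected |#{exp} - #{act}| (#{n}) to be <= #{delta}" }
    assert delta >= n, msg
  end

  def test_close_floats
    assert_in_delta_sketch 1.0, 1.0001, 0.001
  end
end
```

Two small helpers plus one assert: that's the whole mechanism.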
It's actually quite equivalent to what we're doing. The really neat thing is that, despite the extra complexity over my previous example, once you understand this, you understand almost all of Bacon. Bacon's only 300 lines long, and that's including all the extensions like Numeric#close?, and that's it. You would now understand Bacon. It's a really pretty framework and you should look at it. There are only seven methods to be understood in this example. It's about 50 lines of critical code, and once you read that, the mechanism is there; it just makes sense. So there's a top-level describe, it, should, be, satisfy, method_missing, and close?. That's everything. Other equivalencies or expectations, you just have to go study their implementations and you're done, because all of it is the same mechanism. assert_in_delta in test/unit 1 is a bit of a mess. I don't want you to read this; look more at the shape. We have everything wrapped up in this wrap_assertion. It's not really clear why. And we have a lot of work being done with a hash of only three items for only three lines of code. A lot of work is being done building up that message, calling build_message, and if you trace that down, you realize there's an entire template system inside test/unit. And then we have this assert_block, despite there being an assert_operator. We have an assert_block that does something fairly... well, I'm not going to trace that. It's gross. All in all, there's about a hundred and sixty lines behind this one assertion. And I'm not going to call those out either. Version 2, which is a gem that you can use in 1.8 or 1.9, is refactored and cleaned up a lot. But once you start looking into those refactorings, you realize that they've actually doubled in size, and it's quite a bit uglier. But the top-level methods are a lot more readable, and I think that's worth calling out. I think that's important. But I don't like this level of complexity. Here's how I would have written it.
You don't need a hash and a layer of indirection when you only have three lines in the first place. So I would just simply do three assert_respond_tos, making sure the operands respond to the operator, and finally do my work using assert_operator, because it builds up a proper message for you. When you do it this way, there are only 50 lines of code to understand. It's a lot more straightforward and it's a lot more performant. For RSpec's n.should be_close, I wasn't smart enough to figure it out. I spent about two hours trying to trace through the code, and the levels of indirection are so vast that I would love to sit down with someone later today who can walk me through it, but really, it kind of blew my mind. So I think we actually have time, so I'm going to go over some test framework comparisons. After you've seen the code, this makes a little more sense. Bacon and minitest are very performant. They're very similar in performance. They're very similar in size, once you realize that minitest is three frameworks. And Cucumber is always on the bottom line of these. So if we take the column for the positive assertions and plot it, it looks like this. I think that's a better visualization. With Bacon being one unit, and Bacon is always one unit, you can see that minitest and test/unit 1 are almost equally performant. This is performance against running a thousand tests with a simple assertion in each one. And then everything else is very quickly much slower. This is where you spend your time every day, with regular assertions. If we look at the number of lines of code executed on a simple single-line test, you can see that minitest's unit and spec are very, very low, and the simpler frameworks are near the bottom, or smallest. What's interesting is that Bacon is doing a lot more work than I am, and it's actually performing about the same.
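The rewrite described at the top of this passage, three assert_respond_tos followed by an assert_operator, reconstructs to something like this sketch (the helper name assert_in_delta_simple is made up; assert_respond_to and assert_operator are real minitest assertions):

```ruby
require "minitest/autorun"

class TestDeltaViaOperator < Minitest::Test
  # The simpler shape: check the operands respond to what we need,
  # then let assert_operator do the comparison. assert_operator
  # builds a proper failure message for us.
  def assert_in_delta_simple(exp, act, delta)
    assert_respond_to exp, :-
    assert_respond_to act, :-
    assert_respond_to delta, :>=
    assert_operator delta, :>=, (exp - act).abs
  end

  def test_pi_ish
    assert_in_delta_simple 3.14159, 22.0 / 7, 0.01
  end
end
```

No hash, no template system, no wrap_assertion: three precondition checks and one operator assertion.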
I'm looking at those performance differences and seeing what he's doing; besides the fact that he doesn't really have a runner, he executes the tests as soon as they parse. And RSpec 2, as you might suspect, is the second worst, and Cucumber is by far the worst. Startup time: this is where we're really spending our time on test duration. This is how long it takes just to get up and running. Again, test/unit 1 cheats a little bit, because it ships with Ruby 1.8 and it bypasses RubyGems, so it's a little bit more performant there. It's followed by Bacon, then minitest shows up, then test/unit 2. RSpec 2 is way out there, and then Cucumber is just ridiculous. Lines of code, including their dependencies: I'm not even counting the lines of code that I found yesterday. There's just a lot of stuff there in Cucumber and RSpec 2, and it's something that you need to parse and execute every single time you fire off a test. And finally, flog. You can see that there's an absolute correlation here between the number of lines of code, how much it's executing on a single test, and the flog score. The complexity of Bacon and the complexity of minitest are nearly on the floor, and everyone else is just going through the roof, still not counting the C code. So to wrap up: I think the numbers speak for themselves about where you're spending your time, where your company is spending its money. Every man-hour spent waiting for this stuff is wasted, but I'll let you figure out on your own what's appropriate for you. All this is really important to me, but I think if you take anything from this talk, the number one thing that you can do to improve your life is not technology, and there are no conferences on this topic: please stop using pre-ground spices, because fresh is vastly better.
We're about four blocks away from a place called World Spice, which you may have seen on Good Eats. It's an excellent place and you should check it out later today; it's right below Pike Place Market. And finally, a special thanks to Gregory Brown and to my foster kittens. They kept me sane during the RubyGems drama over the last couple of months, and without them I probably wouldn't be here today.