So it's a little bit later in the day; I hope the coffee is kicking in for everyone. It's starting to work for me. Today I'm going to talk about what I call real-world Ruby testing, and that is a lot of the things we've learned from writing and maintaining a sort of freakishly large test suite. But a little bit about myself first. My name, there we go, that's me. I have a Twitter and a blog. If you want to follow along, the slides are up on Heroku, and I work at Puppet Labs.

So when you run this on our project, you get this, and that is over 9,000. But what it's not actually showing are all of the tests that didn't run because I don't have a Solaris box or an AIX box, or a Mac with MacPorts or Fink. If you run all of the tests in the Puppet Labs suite, it's probably well over 10,000. That seems like a lot of tests, and we've had a lot of experience wrangling those tests together, trying to make them work well with each other and run quickly enough that they don't impose too much of a burden to actually use them to drive development. So I'm going to talk to you today about what I've learned about using tests to drive development and manage change.

There are a lot of testing frameworks, and I first want to give you guys a brief overview of the most popular ones, as far as I'm aware, and help you possibly decide which one may be right for your next project. We'll start with Test::Unit, which is the classic Ruby xUnit testing framework. It's been in Ruby for quite a while, actually since before 1.8, but it's in the Ruby 1.8 standard library, so everyone has access to it. And you're probably all familiar with the way its tests look. On top of Test::Unit, we have, oops, I'll get to that in a second. The next one is RSpec, which is the de facto standard BDD framework, I think. I would say it's probably the majority testing framework that people are using these days, so I think most people are familiar with it, and with the way it looks.

Actually, before I go any further, let me ask you guys a few questions. How many of you have never written any tests before? No hands, that is fantastic. That's actually surprising. I don't have as much to talk about as I thought I would, so I'll just wrap it up. How many of you use tests to drive your development? Test-driven development, BDD? 80, 90%? That's a good job, guys, really. This is unusual. This is a very singular phenomenon in the community, I think.

So you guys are all familiar with Test::Unit and RSpec. You're probably also familiar with Shoulda, which is a BDD framework bolted on top of Test::Unit: it adds the context and should of BDD-style development to Test::Unit, and it has a lot of helpful matchers and assertions, many of which implement Rails testing idioms. Those matchers can also be used in RSpec; they're drop-in compatible with RSpec, which is nice. It was written by Tammer Saleh and the Thoughtbot guys, and its tests look a lot like this. You can see some of the Rails matchers in use there.
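Since the slides aren't reproduced in this transcript, here is a minimal sketch, against an invented User class, of how the same check might look in each of the three styles just described. Each snippet runs under its own framework's runner, and the Rails macro is only illustrative:

    # A throwaway class to test against, invented for this sketch.
    class User
      def name
        'anonymous'
      end
    end

    # Test::Unit: the classic xUnit style from the Ruby 1.8 standard library.
    require 'test/unit'
    class UserTest < Test::Unit::TestCase
      def test_name_defaults_to_anonymous
        assert_equal 'anonymous', User.new.name
      end
    end

    # RSpec: BDD style, describe/it blocks with matchers (run with the spec binary).
    describe User do
      it 'defaults the name to anonymous' do
        User.new.name.should == 'anonymous'
      end
    end

    # Shoulda: context/should bolted on top of Test::Unit.
    class UserShouldaTest < Test::Unit::TestCase
      context 'a new user' do
        should 'default the name to anonymous' do
          assert_equal 'anonymous', User.new.name
        end
      end
      # On a Rails model test case, Shoulda also gives you one-line macros,
      # e.g. should_validate_presence_of :email (name varies by version).
    end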
Some of the less popular frameworks are, I think, still interesting to look at for how they differ in their philosophy and their design. You may learn something new about testing just by experimenting with a new framework. MiniTest, which is actually not that different, is Ryan Davis's, and it is now the testing framework that's in Ruby 1.9, as of 1.9.0. So when you're writing Ruby tests in 1.9, if you're not using another framework, you're using MiniTest. MiniTest is a Test::Unit replacement, but it also has a BDD syntax and its own mock objects. And it looks like this; if you guys want to follow along with the code, there's not a lot of code, but you can look at the slides on Heroku. This is the spec style.

This next one is an interesting BDD framework. It's lighter weight than RSpec, and it's designed to be API compatible with RSpec, but it adds some interesting features like focused examples and arbitrary metadata. With focused examples, you can have the spec suite run only the examples you've designated as focused: you can run the entire suite or just the focused examples, and that helps you trim down run time when you know you only want to focus on a certain area of your tests. And you can use custom metadata to implement your own filters, or whatever else you would like. It has its own syntax that should look relatively familiar. It's a pretty simple BDD framework; it's designed to be simple, and that's one of its pluses.

Bacon, though, is even more lightweight. It's less than 350 lines of code, and that's hardly any code at all. I'm surprised it even runs, but it actually runs really fast too; it's at least an order of magnitude faster than RSpec, typically. Ryan Davis gave me some benchmarks that I'll be showing in a minute here. Bacon has a similar but not identical syntax to RSpec. It doesn't have a lot of RSpec's more advanced features, but I think that's by design: it's meant to be a very simple, bare-bones testing framework, and its speed is important, which I'll get into a little later.

Finally, and this is a newer framework, BareTest. BareTest is very interesting. It was written by Stefan Rusterholz, and it's a BDD-style framework with a lot of formatters, so you can use TAP-style formatting, which, for instance, Vim and I believe Emacs can parse to take you to your test results. (By the way, Vim at least can also mostly parse Test::Unit and RSpec output to determine file and line numbers; that's in the vim-ruby bundle.) BareTest has XML- and TAP-compatible output formats, and it has the concept of a suite, and in a suite you can declare dependencies. At Puppet we bolted dependency declarations and that kind of context management on top of RSpec, but it's baked into BareTest. It has assertion helpers for things like float comparisons and unordered comparison of collections, and it also has an interactive mode, which is sort of like a debugger inside your test suite, which is pretty cool. And the dev branch of BareTest is not exactly a BDD-style framework. It's very different from other frameworks that I've seen before, and I definitely suggest you look at the development branch of BareTest for a new take, or an old take that's been, I guess, rediscovered in the Ruby community, on test-driven development.
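The MiniTest spec style and the Bacon style were only shown on slides; minimal sketches of what they look like, again against a stand-in User class:

    # MiniTest's spec (BDD) syntax, as shipped with Ruby 1.9.
    require 'minitest/autorun'

    class User
      def name
        'anonymous'
      end
    end

    describe User do
      before { @user = User.new }

      it 'defaults the name to anonymous' do
        @user.name.must_equal 'anonymous'
      end
    end

    # Bacon: similar but not identical to RSpec; note the dot after should.
    # Run with the bacon executable: bacon user_spec.rb
    require 'bacon'

    describe 'A new user' do
      it 'defaults the name to anonymous' do
        User.new.name.should.equal 'anonymous'
      end
    end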
So these are the benchmarks, and Ryan can tell you a bit more about how they're calculated. I should point out that Shoulda is slow here because of a bug; its real performance, with the bug that Ryan fixed, is somewhere between Test::Unit 1 and Test::Unit 2. But if we look up at the top, MiniTest and Bacon are way faster, way, way faster. I was wrong, apparently, about the order of magnitude, but they're way faster, at least 200% faster than most of the other frameworks, and that's actually maybe more important than you might think when you're running your tests. Ryan, do you have anything else to say about the benchmarks? [Ryan answers from the audience.] I'm sorry? He didn't test BareTest because I didn't tell him about my talk, and he didn't test it on his own, so I don't have numbers for it, but I bet he can have them pretty soon.

So given the possible choice paralysis involved with choosing a testing framework, how should we go about deciding, first of all, how I can get my screen back, and second of all, there we go, how we should choose a framework for our company projects and our personal projects? The answer is actually a lot simpler. By the way, here's an example for, I forgot one, sorry guys, back up. Riot is a take on BDD that by design doesn't have general-purpose setup or teardown blocks. The idea is that it enforces test independence by not allowing you to do things in your setup that might cause state leak, or to forget to tear something down. Instead it has a setup that defines a topic, which is used like RSpec's subject block, and its code looks like this. Okay, now I'm back on track, sorry about that.

Choosing a testing framework boils down to two things: familiarity and expressiveness. What I mean by familiarity is how familiar you are with the testing framework, how familiar your team is, how much experience you have collectively, and how much experience your community has. The ease of getting answers about your testing framework of choice is going to be critical, because being blocked writing a test is, I think, one of the worst possible ways to be blocked: you're not making any progress, you're not learning about your system, you're just stuck. So choose a testing framework that you're familiar with, or can become familiar with quickly, and one that has a community active enough to give you feedback when you need it to solve the problems you'll have as you learn it. And expressiveness basically means how informative your tests are, how much feedback they give you about the state of the system under test. Some frameworks are more expressive than others, but you can write your tests to be more expressive in any of them.

So something that's maybe surprising, but I think is true: I don't think it matters which framework you use. The difference in test quality between a good tester and a bad tester is way, way, way more important than the difference in test quality between any of these testing frameworks. I just have anecdotal evidence for this, but it's pretty overwhelming for me. So use a framework that you like, use a framework that your team likes, use a framework that has a community around it that can provide the support you need, and find one you like enough that you'll really dig into it and make the best possible use of it. I'm not going to recommend one, because I don't think it matters which one you use. Pick the one that you're most comfortable with and the one that you understand the best.
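For Riot, since that slide isn't reproduced here either, a minimal sketch of the topic-based style described a moment ago; the User class is again a stand-in:

    require 'riot'

    class User
      def name
        'anonymous'
      end
    end

    context 'a new user' do
      # There's no general-purpose setup/teardown; the setup block's
      # return value becomes the topic, like RSpec's subject.
      setup { User.new }

      asserts_topic.kind_of(User)
      asserts(:name).equals('anonymous')
    end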
So after talking about testing frameworks and sneakily disregarding that topic, I may have pulled a bit of a bait and switch on you guys when I said that I would be talking about testing frameworks. What I really want to talk about is why we test and how we test. And for a lot of you in here who do TDD and BDD, the first half, why we test, may come naturally to you, but I think I still have something that may be of interest.

Ron Jeffries said a while ago that we test because it gives us clean code that works. I think that's a strong argument, but I don't think it's the best argument. In Extreme Programming Explained, Kent Beck says that change is inevitable, but change creates the need for feedback. And this is critical. I think it's interesting that in Evan's talk, two of the salient points were trust and communication. I've found that development, psychologically, is fractal in nature: values like trust and communication apply across all levels of scope and across all the different aspects of development. They're applied at low levels and high levels; they apply everywhere. And it's interesting that my talk, at its core, is also about trust and communication.

We test for two reasons: we want confidence, and we want better communication about our code. We want confidence both now and in the future. Confidence in the now allows us to drive development, and confidence in the future allows us to manage change. And we want communication because feedback, as I said, is essential for dealing with change. Without feedback, you can't change a course once it's set. Feedback is critical. Communication comes in many forms, and one of the more commonly expressed benefits of BDD-style testing in particular is that the tests themselves become living documentation of the code. That form of communication is also valuable, but maybe not as valuable as tests that create feedback, and I'll get into more of what I mean by that in a minute.

The other reasons we write tests are that tests allow teams to relax and develop trust: trust in each other, trust in their code. Customers look forward to releases of well-tested software because they have a greater expectation that it will actually work when they get it. This is all anecdotal evidence; I don't have research studies to back it up, because I couldn't get a grant, and I couldn't get a grant because I didn't apply for a grant. But this has been my experience, and not just mine but the experience of a lot of other people. So I probably don't have to sell you guys very much on the benefits of testing, and I'm going to move ahead a bit.

So I was talking about why we ought to test, and that's an easy sell with this crowd, which is surprising but good. I want to talk about why we do test. What are our goals in writing tests? What is their purpose? The purpose of writing tests, I think, is twofold: one is to drive development, and the other is to help us manage change. And this isn't a pure dichotomy, because development causes change, so driving development is also managing change. But these two tensions express most of the value that we get out of testing. So now that we know why we test and what our goals are, I want to talk about how we write tests that help us achieve those goals, and I want to try to formulate a definition of what a good test is that will help you write better tests.
I want to talk about some common testing smells, or anti-patterns, that I've experienced that can really hurt both the confidence or trust I have in my tests and their ability to communicate. And I want to talk about testing practices that help improve trust and communication, and that also help us manage change.

So, good tests. I struggled for a long time with a concise definition of what a good test is, and I think I might have found one. A good test, for me, is something that provides fast, focused feedback. All of those words are important. Test suites that are slow don't get run enough. If each test on average takes more than a tenth of a second, it's too slow. A tenth of a second doesn't seem like a lot of time, but across 10,000 specs that's over fifteen minutes. Our almost 9,000 tests actually run in about a minute, which lets us run them often enough to get the feedback we need to drive development.

It's also important that your tests only test one thing at a time and that they localize the problems you do have quickly. When a test fails, you want to know why it failed, where the code is that caused the failure, and how the test describes that failure. If you can't figure that out from your test, it's not a good test. That's what focused means. And finally, feedback is about minimizing the distance between your understanding of your code and your understanding of your tests. Your tests provide understanding of the behavior of your code and how it responds to change. Having a test suite in place allows you to manage change by making you aware of the changes you do make, so that when you make changes, you know you're making them where you intend to and not anywhere else, and that the changes you make are actually what you intended them to be. So if you take one thing away from this talk, I hope it's that the hallmark of a good test is that it provides fast, focused feedback. That's been very helpful for me in guiding my own test writing.

Let's talk a little bit about some common problems in tests, how they manifest themselves, and how we can solve them. One of the most common problems, especially in large test suites, is state leak. This is when two different tests share data and possibly mutate it. For instance, at Puppet we had, and still somewhat have, though we're making a concerted effort to remove this technical debt, significant portions of our test suite that would fail depending on the order in which the tests were run. Does anyone see a problem with that? We could literally switch the spec run order from mtime or ctime to ctime reversed and get 30 new failures, on a regular basis. This is terrible for confidence in the quality of our tests and the feedback they provide. If tests just fail on a whim, how do you know that a given failure you get when you write code is a real failure, one that gives you real feedback and localizes an actual change or problem? You don't. It's crucial to make sure your tests are encapsulated: if they create state, that state isn't shared, and if the state has to be shared, you have to ensure it's returned to a known state at the beginning of each test.
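To make that failure mode concrete, here is a contrived sketch, not from the Puppet suite, of two specs coupled through shared mutable state, where the second one passes or fails depending on run order:

    # Shared, class-level state that both specs can see.
    class Settings
      def self.config
        @config ||= {}
      end
    end

    describe 'the first spec' do
      it 'runs in verbose mode' do
        Settings.config[:verbose] = true   # mutates shared state, never resets it
        Settings.config[:verbose].should == true
      end
    end

    describe 'the second spec' do
      it 'assumes the default, quiet configuration' do
        # Passes when run before the spec above, fails when run after it.
        Settings.config[:verbose].should be_nil
      end
    end

    # The fix: don't share Settings at all, or reset it to a known state
    # in a before block so that order no longer matters.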
Some frameworks, like Rails, will help you do this by, for instance, resetting the database between tests, but that might not always be enough. The first thing I would suggest to minimize state leak is to minimize state. If you have less shared state, or no shared state, there can be no state leak. It won't happen. So anytime you run into state leak, the first question to ask yourself is: where is my shared state, and what's changing in that state? Once you can answer those questions, you can start to mitigate the problem and eventually remove it entirely.

The next problem I want to talk about is long setup or teardown blocks. The problem with a long setup block is that it tells you the code you're trying to test is in some way too complicated to be tested efficiently. Either your classes are too big or their interactions are too many; something is going on that forces you to go to unnecessary lengths simply to get the code you want to test into a test harness, and that shouldn't be happening. It's hard to provide a metric, but if setup blocks are more than half a dozen lines, they begin to be questionable. One of the ways I try to minimize setup and teardown blocks, if I'm using a BDD-style framework, which we do at Puppet, is that the context block that creates the scenario under test should have a correspondence with the setup block it contains. If my context is "when a user is logged in," my setup block should only log a user in. If it does anything else, it's doing the wrong thing. That helps us keep a mental correspondence between what the tests say they're doing and what we're actually causing them to do; there's a sketch of this below. And that's important, because if our tests lie to us about what they're testing or how they're testing it, that is extremely detrimental to our confidence in, our trust in, our tests.

The next issue I want to talk about is long-running tests. And, no, it's not: it's setup duplication. If you have the same setup block repeated in multiple places, it can be an indication that the code under test requires unnecessary steps to create objects and state, and that you may have a design problem. And none of these test smells are really problems in your tests; they're indicative of design problems in your code under test. If the code were well designed, well written, these smells would largely disappear, assuming you're not introducing them yourselves. So keep these principles in mind: if you find yourself writing long setup blocks, or seeing a lot of duplication between your setups, the feedback your tests are giving you is that you may have a design problem in the code under test. And you should listen to the feedback provided by your tests on all levels: not just spec failures, but also your difficulty in writing tests, your difficulty in getting classes into a test harness, your difficulty in providing dependencies to objects. All of these things are telling you about potential design problems in your code.
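Picking up the "when a user is logged in" example, a minimal sketch of that context/setup correspondence; log_in and users(:alice) are hypothetical Rails-style helpers, not a real API:

    describe 'the dashboard' do
      context 'when a user is logged in' do
        before :each do
          # One line, matching the context's description exactly.
          # log_in is a hypothetical helper; anything more than logging
          # a user in does not belong in this setup block.
          log_in users(:alice)
        end

        it 'shows the user their own projects' do
          # ...
        end
      end
    end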
So after setup duplication, I'll get to what I thought was next, which is long-running tests. I really want to stress that if your tests take more than a minute or two to run, people won't run them enough. They won't get the feedback they need, and you won't have confidence in the code provided by your other team members.

I've got a question. [Audience: what about CI and version control that forces a test run when things are committed?] So the question is: what about continuous integration and version control and running tests on commit? I think that's fantastic. I think all projects should have some way of verifying the functionality of code that's committed. But what I'm talking about today is developer tests specifically: tests that developers write for developers. I'm not talking about QA tests or acceptance tests. So CI is definitely something you should be doing, but it's somewhat outside the scope of my talk. The reason I'm talking about developer tests is that they're the front line of defense. They are the first feedback we get, and feedback in seconds and minutes is far more valuable than feedback in hours or weeks. The time value of feedback is what matters here. The shorter the feedback cycles provided by your tests, the more feedback you get, the more quickly you can respond to change, and the more quickly you can take action. Having a continuous integration suite that takes an hour to build across different platforms, possibly across different versions, is valuable, but it doesn't provide the same kind of value that a short-running, developer-written test suite provides. Does that answer the question?

So, long-running tests probably haven't even been run in a while. We had a problem with our spec suite where it took, I think, 15 minutes to run. 15 minutes doesn't seem like a super long time for people who are familiar with two-hour compile cycles and things like that, but in the Ruby world, when you're trying to run an agile shop, when you're trying to respond to the kind of change we see with our community and our customers, making a developer wait 15 minutes to determine whether the code he just spent five minutes writing actually works is an atrocious and unnecessary waste of time. It's a huge productivity killer. It takes you out of the rhythm of writing tests and writing code, and one of the great values of driving development with tests is that the cycle of writing tests and writing code creates a rhythm that helps developers stay in flow for longer. So long-running tests take developers out of flow, they disrupt rhythm, and tests that run long can themselves be indicative of complex code, or code that's hard to test, which is itself a design problem in your code.

And the last problem is fragile tests, and I think I could probably give an entirely separate talk on the fragility of tests. We have had, in the past, tests where changing a line in a file on one side of the codebase would cause a breakage in a completely different spec that tests a completely different file, for no discernible reason except that we had global state we weren't managing properly in the tests. So state leak can cause fragility. Another thing that can cause fragility is improperly scoped tests: tests that are focused at the wrong level. The underlying problem with fragile tests is that they're a failure of feedback, right? A failure of your tests to provide the focused feedback you need to accurately assess how your code has changed. And one of the ways you can avoid fragile tests, and again, I really feel like this deserves its own talk, is to be aware of the trade-offs imposed by mocking versus stubbing versus faking, and to understand that the more invasive your use of those tools, the more fragile your tests tend to be.
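A sketch of that trade-off using old RSpec-style doubles; the Report class and its ledger are invented for illustration:

    # A small class whose collaborator we'll replace with test doubles.
    class Report
      def initialize(ledger)
        @ledger = ledger
      end

      def render
        "Total: #{@ledger.total}"
      end
    end

    describe 'a report over a ledger' do
      it 'renders the total, with a stub' do
        # A stub just supplies a canned answer; the test survives
        # refactorings that change how, or how often, total is called.
        ledger = stub('ledger', :total => 100)
        Report.new(ledger).render.should include('100')
      end

      it 'renders the total, with a mock' do
        # A mock asserts the exact interaction; the test breaks if the
        # call count or sequence changes, even when behavior is still fine.
        ledger = mock('ledger')
        ledger.should_receive(:total).once.and_return(100)
        Report.new(ledger).render.should include('100')
      end
    end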
Kent Beck's advice to do the simplest thing that could possibly work is nowhere more applicable than in testing. Writing the simplest test that could possibly work is one of the tenets of testing and development that should be true of all of your tests. You should always be looking for ways to simplify your tests, because simple tests mean that your code is simple. And if you can write simple tests that aren't fragile, it's an indicator, a perfume if you will, that your code under test is simple, that it isn't fragile, that it doesn't have state leak or other problems that cause failures where you don't expect them. That communication gives you better trust in the quality of your tests and the quality of your code.

So fragile tests are, I would say, as important as the rest of these, if not more important; I should put them up there. Because the surest way to undermine your developers' trust in their testing framework is to make their tests break all the time for no apparent reason. There is nothing more frustrating than writing a line of code in one file and having a test break in another file without understanding why. Nothing. So if you see fragile tests, now is the time to repay that technical debt. Don't wait. Evan said that premature process is the root of all frustration, but I'm pretty sure this one is exempt.

So I don't have time for Q&A; we can do one question. Okay, let me go over my list of testing practices very quickly, and then I will wrap up with a question. I want to talk about isolation between tests, which is to say that your tests should not affect one another; this is the antithesis of state leak. A test list, as I'm sure many of you who use test-driven development know, is essential to keep work focused when you're writing tests to drive development; it helps prevent scope creep. Regression tests are a critical way to provide feedback about, and improve the trust in, your code: when you fix a bug, write a test first, and then ask yourself why you didn't write that test before you wrote the code, so that next time you may be able to anticipate the problem. The first test you write at the beginning of the day should be a test from the list that you're confident you can implement, and one that, when you implement it, will teach you something about the system you're trying to test. Finally, I talked briefly about mocking and stubbing and the perils therein, so I will wrap it up and try to get in one question. No? Thank you all very much.