Doors are closing, and now we're covered. All right, great. Can we get my slides up on the monitors? All right, great. Let me start my timer. Where's my phone? Uh-oh. Who has my phone? Where's my timer? All right, we'll start.

So there's this funny thing where every year, conference season lines up with Apple's operating system release schedule, and I'm a big Apple fanboy. So on one hand I really want to upgrade, and on the other hand I really want my slide deck to work. This year, because they announced the iPad Pro, I was pretty excited. I thought, maybe this year, finally, Mac OS 9 is going to be ready for me to build an entire real talk out of. So this talk was built entirely in Mac OS 9. Let's just start it up and see how it goes. I'm a little nervous. It's a little retro, and it takes a while to start up. I built my entire presentation in AppleWorks, so I've got to open up my AppleWorks presentation. OK, there it is. I've got to find the play button. And here we go. Good, all right.

So this talk is How to Stop Hating Your Tests. My name is Justin. I play a guy named Searls on the internet, and I work at the best software agency in the world, Test Double.

So why do people hate their tests? Well, I think a lot of teams start off in experimentation mode. Everything's fun and free, they're pivoting all the time, and a big test suite would just slow down their rate of change and discovery. But eventually we get to a point where we're worried that a new change might break things, and it's important that things stay working. So people start writing test suites, and they stand up a build, so that when they push new code, they know whether they just broke stuff. But if we write our tests in a haphazard, unorganized way, they tend to be slow and convoluted, and every time we want to change a thing, we spend all day just updating tests. Eventually, teams get to the point where they yearn for the good old days, when they got to change stuff and move quickly.

I see this pattern repeat so much that I'm starting to believe an ounce of prevention is worth a pound of cure here, because once you get to the end, there's not much you can do. You can say, hey, our test approach isn't working, and a lot of people will respond, well, I guess we're just not testing hard enough. When you see a problem over and over again, I personally don't believe the "work harder, comrade" approach is appropriate. If you keep running into the same issue, you should be inspecting your workflow and your tools and trying to make them better. Other people might say, OK, let's buckle down and remediate: testing is job one, let's really focus on testing for a while. But from the perspective of the people who pay us to build stuff, testing is not job one. It's at best job two. From their perspective, they want to see us shipping new features, and the longer we go with that impedance mismatch, the more friction and tension we're going to have. So that's not sustainable either.

I said we're talking about prevention, but if you're working in a big legacy monolithic application and you're not greenfield, that's not a problem at all, because I've got this cool thing to show you: one weird trick for starting fresh with your test suite. That's right, you're going to learn what the one weird trick is. Basically, you move your existing tests into a new directory, and then you make another directory, and then you have two directories. And you can write this thing called a shell script (get this) that runs both test suites. Eventually you port the tests over, and you're able to decommission the old suite.
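A minimal sketch of that trick, written as a Rakefile rather than a raw shell script since this is Ruby-land; the directory names here are hypothetical.

```ruby
# Rakefile: run the quarantined old suite and the fresh one side by side.
# "test_legacy" and "test" are hypothetical names; use whatever fits.
require "rake/testtask"

Rake::TestTask.new(:legacy) do |t|
  t.libs << "test_legacy"
  t.pattern = "test_legacy/**/*_test.rb"
end

Rake::TestTask.new(:fresh) do |t|
  t.libs << "test"
  t.pattern = "test/**/*_test.rb"
end

# One command runs both, until the legacy suite is empty enough to delete
task default: [:legacy, :fresh]
```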
But I hesitate to even give a talk about testing, because I am the worst kind of expert. I have too much experience: navel-gazing about testing, building and open-sourcing tools around testing, being on many, many teams as the guy who cared just a little bit more about testing than everyone else, and lots of highfalutin, philosophical, nuanced Twitter arguments that are not pertinent to anyone's life. So my advice is toxic. I'm overly cynical. I'm very risk-averse. And if I told you what I really thought about testing, it would just discourage all of you.

So instead, my goal here today is to distill my advice down into just a few component parts. First, we're going to talk about structure: the physicality of our tests, what the lines and files look like on disk. We're going to talk about isolation, because I really believe that how we choose to isolate the code we're testing is the best way to communicate the concept and the value we hope to get out of a test. And we're going to talk about feedback. Do our tests make us happy or sad? Are they fast or slow? Do they make us more or less productive? And keep in mind, we're thinking about this from the perspective of prevention, because these are all things that are much easier to do on day one than to shoehorn in on day 100.

At this point, in keeping with the Apple theme, my brother dug up this Apple II copy of Family Feud. It turns out it's really hard to make custom artwork in AppleWorks 6, so I just ripped off the artwork from this Family Feud board, and we're going to use it to organize our slides. It's a working board: if I point at the screen and say "show me potato salad," I get an X. Unfortunately, I didn't have 100 people to survey; I just surveyed myself 100 times, so I know all the answers already.

So, first round, we're going to talk about test structure. And I'm going to say: show me "too big to fail." People hate tests of big code. In fact, have you ever noticed that the people who are really into testing and TDD seem to hate big objects and big functions more than normal people? We all understand that big objects are harder to deal with than small objects, but one thing I've learned over the years is that tests actually make big objects even harder to manage, which is counterintuitive; you'd expect the opposite. Part of the reason is that big objects might have many dependencies, which means lots of test setup. They might have multiple side effects in addition to whatever they return, which means lots of verifications. But what's most interesting is that they have lots of logical branches: depending on the arguments and the state, there are a lot of test cases you have to write. And that's the one I think is most significant.

So let's take a look at some code. At this point, I realized that Mac OS 9 is not Unix. So I found a new terminal. It's actually a cool new one; it just came out this week. Let's boot that up. Yep, here we go. We're almost there. It's a little slow. All right, so this is a fully operational terminal.
All right, so we're going to type in an arbitrary Unix command, and that works fine. Now I'm going to start a new test of a validation method on a timesheet object, to see whether or not people have notes entered. The rule is something like: whether you have notes, whether you're an admin, whether it's an invoice week or an off week, and whether you've entered time or not. All four of those boolean attributes factor into whether the record is considered valid. I wrote the first test, and then realized, oh, I've got a lot of other contexts to write, so let's start planning those out. And then: damn, this is a lot of tests to write just to cover four booleans.

What I fell victim to there is a thing called the rule of product, which comes from combinatorics in math. It's a real math thing, because it has a Wikipedia page. What it says, essentially, is that if you've got a method with four arguments, you take the number of possible values of each argument and multiply them together, and that gives you the total number of potential combinations, the upper bound on the test cases you might need to write. In this case, with all booleans, it's two to the fourth, so there are 16 test cases we may have to write. And if you're a team that's used to writing big objects and big functions, you're probably in the habit of thinking, oh, I have some new functionality, I'll just add one more little argument. What more harm could that do, other than doubling the number of test cases you have to write? So, as somebody who trains people on testing a lot, I'm not surprised at all to see teams who are used to big objects decide to get serious about testing, and then go, wow, this is really hard, I quit.

So if you want to get serious about testing and have a lot of tests of some code, I encourage you: stop the bleeding. Don't keep adding on to your big objects. I try to limit new objects to one public method and at most three dependencies, which, to that particular audience, is shocking. The first thing they all say is, but then we'll have too many small things! How will we possibly deal with all the well-organized, carefully named, comprehensible small things? People get off on their own complexity, right? They think that what makes them a serious software developer is how hard their job is. They say, that sounds like programming on easy mode. And I say, it is easy. It's not rocket science to build an enterprise CRUD application, but you're making it that way. Just write small stuff. It works.

Next up, I want to talk about how we hate it when our tests go off script. Code can do anything. Our programs should be unique and creative, special unicorns of awesomeness. But tests can and should only do three things. They all follow the same script: every test ever sets stuff up, invokes a thing, and verifies behavior. We're writing the same program over and over again, and it has these three phases: arrange, act, and assert. A more natural-English way to say that is given, when, then. When I'm writing a test, I always intentionally call out those three phases really clearly and consistently. For example, if I'm writing a Minitest method, I put exactly two blank lines in every single xUnit-style test that I write: one after my arrange, one after my act. Something like this sketch:
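A minimal sketch of that convention in Minitest; the Timesheet class and its arguments are hypothetical, and the point is the two blank lines marking off arrange, act, and assert.

```ruby
require "minitest/autorun"

class TimesheetTest < Minitest::Test
  def test_invalid_without_notes
    timesheet = Timesheet.new(notes: nil, admin: false,
                              invoice_week: true, entered_time: true)

    result = timesheet.valid?

    refute result
  end
end
```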
That way, it's really clear at a glance what's my arrange, what's my act, what's my assert. I always make sure they go in the correct order as well, which is something people get wrong a lot. If I'm using something like RSpec, I've got a lot of constructs available to specify what the intent is. I can use let to give a value a name, so let says I'm setting up a new thing. I can use before to call out an action with a side effect: this is my act. And that allows me to split up my assertions, if I so choose, into separate blocks. Now, at a glance, anyone who knows RSpec knows exactly which phase each of those lines belongs to. I also try to minimize each phase to just one action per line, so that test-scoped logic doesn't sneak in.

The late, great Jim Weirich wrote an awesome Ruby gem that I hope you check out, called rspec-given. I help maintain it now. He and Mike Moore ported it to Minitest as well; I ported it a few years ago to Jasmine, and somebody else has taken it on and ported it to Mocha. It's a really cool Given/When/Then-conscious testing API. You start from the same place you may have been with RSpec, but you say Given instead of let, because that's more straightforward, and When instead of before, so it's clear. Where it really shines is that Then is just a little one-liner, and there's no custom assertions API, because it actually interprets the Ruby inside that block and splits it up to give you great error messages. So it's a really terse and yet successfully expressive testing API; there's a sketch of it below.

You don't have to use that tool, though, to write your tests in a way that's conscious of given, when, then. They're easier to read regardless. They point out superfluous bits of test code that don't fit one of those three phases, and they can highlight certain design smells. For instance, if you've got a lot of given steps, maybe your subject has too many dependencies or too-complex arguments. If it takes more than one when step, it's probably the case that your API is confusing or hard to invoke; there's something awkward in how you use that object. And if you've got many then steps, your code is probably doing too much, or returning too complex a type.
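Here's roughly what that shape looks like in rspec-given, reusing the hypothetical Timesheet from before:

```ruby
require "rspec/given"

describe Timesheet do
  Given(:timesheet) { Timesheet.new(notes: "Worked on the API", admin: false) }

  When(:result) { timesheet.valid? }

  Then { result == true }
end
```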
Next, I want to talk about hard-to-read, hard-to-skim code. Some people are fond of saying that test code is code. But, you know, test code is untested code, so I try to minimize it. I try to make it as boring as possible, because what I find is that a good test tells me a story of what the code under test should look like, and if there's logic in the test, it confuses that story. I end up spending most of my time reading that logic and making sure I got it right, because I know there's no test of that test. So test-scoped logic is not only hard to read, but if there are any errors, they're very easy to miss. Maybe the test is passing green for fantasy reasons. Maybe only the last item in a loop of data is actually being exercised, over and over again.

A lot of times, though, people have this impulse: hey, I've got a lot of redundancy in my tests, I could really DRY this up by generating all of my test cases. For example, one person doing a Roman numeral kata could see very clearly: oh, I could just have a data structure, loop over it to make this much more terse, and use define_method to generate a test method with a good failure message for each case. It's a perfectly reasonable test, and in this case it totally works fine. But I still think it's problematic, and the reason is that this person experienced test pain, and their reaction was to go make the test cleaner. When I experience test pain, the first thing I look for is whether there's something wrong with my production code that led me there. And if you look at that person's production code, you can see all that data hiding in ifs and elses. They've got all this really dense logic in there. I would much rather extract the same sort of data structure from the production code, so that instead of all those ifs and elses, I'm looping over the data structure and applying whatever rule I have to. Now I only need a few test cases. In fact, I can keep adding keys to that hash and cover a lot of cases without a whole bunch of really explicit test cases. It's much cleaner this way; here's roughly what I mean:
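A sketch of the idea with the Roman numeral example: the data that was hiding in the ifs and elses becomes a structure the production code loops over, and the tests shrink accordingly.

```ruby
class RomanNumeral
  # The extracted data structure: add a pair here, cover another case
  NUMERALS = { 1000 => "M", 900 => "CM", 500 => "D", 400 => "CD",
               100 => "C", 90 => "XC", 50 => "L", 40 => "XL",
               10 => "X", 9 => "IX", 5 => "V", 4 => "IV", 1 => "I" }.freeze

  def self.convert(number)
    NUMERALS.each_with_object(+"") do |(value, numeral), result|
      while number >= value
        result << numeral
        number -= value
      end
    end
  end
end

# Now a handful of explicit test cases suffices; no generated tests needed:
#   assert_equal "XIV", RomanNumeral.convert(14)
#   assert_equal "MCMXCIV", RomanNumeral.convert(1994)
```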
Sandy Metz, who's around... is she here? Sandy, where are you at? Hey, Sandy. So she's got a thing called the squint test. It helps her understand and cope with really big file listings, and she can draw a few conclusions from it. I don't have anything nearly so fancy, but when I'm reading your test suite, I really hope I'm able to understand at a glance what the thing under test is, and specifically: where are all the methods, are they in order, are they symmetrical, is it easy for me to find all the tests of just one method? If I'm using RSpec, for example, I like to use context to point out every logical branch, with all the subordinate behavior underneath each logical branch. It's very easy to organize things this way, and when you do it consistently, the tests are easy to read. Additionally, like I said, arrange, act, and assert should really pop in a consistent way. If I'm using an xUnit-style testing tool like Minitest, at the very least I want to see arrange, act, assert laid out straightforwardly throughout every single file listing. And the names of the tests should mean something.

All right, next up, let's talk about tests that are too magic. A lot of people hate tests that are too magic, or, as it turns out, not magic enough, because all software is a balancing act, and test libraries are no different. The expressiveness of our testing APIs exists on a long spectrum. Smaller APIs are generally slightly less expressive than larger ones; larger ones have more features, but you have to learn those features. If you look at something like Minitest, it's very cool because it's just classes and methods, and we know those: every test suite is a class, we override setup and teardown to customize behavior, every new test is another method, and assert is very easy to use. Ryan's a funny guy, so he's got some fun ones like i_suck_and_my_tests_are_order_dependent! for custom behavior. But compare that to RSpec and it's night and day. RSpec has describe and context and their synonyms; subject and let, which are similar; before, after, and around hooks, with their each, all, and suite variants; it and specify, which are similar; object.should with all of those matchers; expect(object).to with the mostly similar matchers; shared example groups; tagging; advanced CLI features. There's a lot to learn in RSpec.

Jim tried to have it both ways when he designed rspec-given. He wanted a terse API, just Given, When, Then, with a handful of other things he came across, like Invariant, and his natural assertions API. So it's very, very terse, but it's also sufficiently expressive for most people's tests. Now, because it's not a standalone testing library (you're still standing on top of Minitest or RSpec), it is still physically complicated, but it's really nice to live in on a day-to-day basis. And I'm not here to say there's some right or wrong testing library or level of expressiveness. You just have to keep yourself aware of the trade-offs: smaller testing APIs are easier to learn, but they might encourage more one-off test helpers, and you carry that complexity; a bigger testing API like RSpec might yield really terse tests, but to an uninitiated person they just look like magic, and you eat that onboarding cost whenever somebody doesn't know RSpec.

Finally in this category, people hate tests that are accidentally creative, because in testing, consistency is golden. If we look at a test similar to what we had before, we might use let to set up an author and a blog and a comment, but it's not clear at all what the thing under test is. So I rename it subject. I always call the thing under test subject, and I always call the thing I get back, the thing I'm going to assert on, result (or results), 100% of the time. That way, if I'm reading a really big, nasty test, at least I know what's being tested and what's being asserted on. This is a surprisingly daunting task in a lot of people's test suites, so if you learn one thing today, and you just start calling the thing that you're testing subject, this will have been worth all of this preparation. And when you're consistent, inconsistency can actually carry nuanced meaning. For example, if I've got a handful of tests and I look at them and go, oh, wait, there's something weird about test C, that implies there's probably something interesting about object C, and I should look into it. That's really useful; it speeds me up. But when every test is inconsistent, when every test looks wildly different, I have to bring that same level of scrutiny to each and every test, and read very carefully to understand the story of the test. So, as a result, if I'm adopting your test suite, I would much rather see hundreds of very, very consistent tests, even if they're mediocre, even if they're crappy, than a handful of beautifully crafted, brilliant, artisanal custom tests that are all way different, because then every fix is a one-off thing.

Also, readers are silly, right? They've got this funny habit of assuming that all of our code has meaning. But especially in testing, very often the stuff we put in our tests is just plumbing to make the code execute properly. So I try to point out meaningless stuff to help my reader out; in particular, I make unimportant test code look obviously silly and meaningless to the reader. Say I'm setting up a new author object, and he's got a fancy name and a phone number and an email, and they're all validatable, but none of that is necessary for this method. So I'll change his name to "Pants," and I'll remove his phone number because it's not necessary, and I'll change his email to "pants mail," and then I'll update my assertion. Before, everyone in the room might have assumed you needed a realistic author, but you didn't. Now everyone in the room could implement this method understanding exactly what it really needs to do. Test data should be minimal, but also minimally meaningful, right? For example:
Hey, we're already through section one. We're through test structure. Congratulations, we did it. Let's move on to round two: test isolation.

The first thing I want to talk about, a thing that really cheeses me off, is unfocused test suites. Most teams define success in boolean terms when it comes to testing. They have one question: is it tested? And if the answer is yes, they feel pretty good about themselves. But I think we can dig deeper. My question is: is the purpose of each test readily apparent, and does its test suite promote consistency? Very few teams can answer yes to that. When I raise the issue, a lot of people say, consistent? I've got tons of tests, all with different purposes, testing all kinds of different things, all inside one test suite. And I say, yeah, that's true, but you could probably boil them down to four or five types. In fact, what I do is create a separate test suite for each type of test I define, each with its own set of conventions, and those conventions lovingly reinforced with their own spec helpers or test helpers, to encourage consistency. I actually did a whole talk just on that, called Breaking Up with Your Test Suite. It's up on our blog, and there's a short URL for it.

Now, in Agile-land there's this illustration people like called the testing pyramid. TL;DR: stuff at the top is drawn as more integrated, stuff at the bottom as less integrated. When I look at most people's test suites, they're all over the place. Some tests call through to other units; other tests fake out the relationships to other units. Some tests hit a database but fake third-party APIs; other tests hit all the real APIs but operate beneath the user interface. Which means every time I open up a test, I have to read it carefully to understand: OK, what's the plan here? What's real, what's fake, what are they trying to get out of this test? It's a huge waste of time.

So instead, I start with just two suites in every application that I write. One suite I make maximally realistic, as integrated as I can possibly manage; the other I make as isolated as possible. Part of the reason I do this is that then I can intuitively answer "should I fake this, yes or no?" and careen toward one of those two extremes instead of landing all over the place. The bottom suite's job is to make sure that every little thing in your system works; the top suite's job is to make sure that when it's all plugged together, nothing blows up. It's pretty straightforward and very comprehensible. Here's one way that split might look:
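A sketch with hypothetical paths: each suite gets its own helper that locks in its conventions.

```ruby
# spec/isolated/spec_helper.rb: the maximally isolated suite.
# No Rails, no database; dependencies get faked by convention.
require "rspec/given"

# spec/integrated/rails_helper.rb: the maximally realistic suite.
# Boots the whole app and uses a real database by convention.
require_relative "../../config/environment"
RSpec.configure do |config|
  config.use_transactional_fixtures = true  # an rspec-rails setting
end
```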
Now, as the need arises, it might be the case that you need to define some kind of semi-integrated test suite in between, and what's important is that you establish a clear set of norms and conventions for it. For instance, I was on an Ember team recently, and we agreed we were going to start writing Ember component tests, but up front we all had to get on board with some rules: we're going to fake our APIs, we're not going to use testable objects, we're going to trigger actions instead of UI events, and we're going to verify app state, not HTML templates. These were arbitrary decisions, but we relished the opportunity to lock in those arbitrary decisions, because we knew it would bias us toward consistency.

Next, I want to talk about how too-realistic tests bum us out. When I ask somebody, hey, how realistic do you think this test should be, they don't really have a good answer other than "maximally realistic; I want to make sure my thing works, so, as realistic as possible." So they might be proud of their very realistic web test: there's a browser, and it talks to the real server and a real database, and in their mind, this is as realistic as it gets. To poke holes in that, I might ask, well, does it talk to your production DNS server? And they're like, no. Well, does it talk to your CDN and verify that your cache invalidation strategy is working? Well, no. So it's not a maximally realistic test at all. In fact, there were very definite boundaries here, but the boundaries were totally implicit. And that kind of implicit shakiness is a problem, because now, if something blows up, anyone on the team is liable to ask, why didn't we write a test for that? It puts these teams in a trap: they write some tests, stuff blows up in production, the managers come over, they all have a come-to-Jesus moment, they say never again, and their only available reaction is to increase the realism, the integratedness, of all of their tests. And that would be fine, except for the fact that realistic tests are slower. They take more time to write, to change, to debug. They require a higher cognitive load; we have to keep more in our heads at once. They have a real cost.

So instead, think this way: if you have really clear boundaries, then you can be focused about what's being tested, and focused and consistent about how you control everything outside it. The same team, with that clarity of mind, goes through the same thing: they write tests, something blows up in production. But now they can have a backbone. They can stand tall and have a grown-up conversation about how, up front, they all agreed that that type of test was too expensive. Or, hey, they didn't intentionally break production; they were unable to anticipate that particular failure, and it's really hard to automate a test for something you can't predict. And additionally, maybe they can write a targeted test of just that one concern off to the side, without making all of their tests slower in some broad-based way. Aside from having a high cost, realism in tests isn't some kind of universal ideal or virtue. Less integrated tests are useful too: they offer much richer design feedback about what it's like to use our objects, and any failures they have are much easier for us to understand and reason about. I just said "reason about." Damn. Sorry, slip of the tongue.
All right, next up, let's talk about redundant code coverage. Suppose you've got a lot of tests in your test suite: browser tests, view tests, controller tests, and those all call through to a model. Maybe that model has relationships with other models, and everything's tested eight ways to Tuesday, so you're very proud of your very thorough test suite. In fact, you're a test-first team, and you need to make a change to that model. So the first thing you do is write a failing test, then you make that test pass, and you feel pretty good. You push it up to your continuous integration platform, and then what happens? Well, all those things depend on all those other things. Your controller tests, your view tests, your browser tests: they all broke. Those related models call through to that model, so they incidentally depend on it; those all broke too. And what took you half an hour on Monday morning, you're now spending two days on, just cleaning up all these tests you didn't anticipate breaking. So it was thorough, yeah, but it was redundant too. I've found that redundant coverage can really kill a team's morale, and it's the sort of thing that doesn't bite you on day one, when everything's fast and easy to run all in one place. But once things get slow, having a lot of redundant coverage can really, really kill your productivity.

So how do you detect redundant coverage? The same way you detect any coverage: run a coverage report. But instead of looking at the only thing we ever look at in a coverage report, the easy targets for bumping our coverage percentage, look at the other columns. There are a lot of columns, and we never look at the others. The last column is the average number of hits per line, and I think that's pretty interesting, because if that top method got hit 256 times as I ran my tests, that tells me that if I change it, I'm going to have tests breaking everywhere. That's an important thing to think about.

One thing we can do is identify a clear set of layers that we test through. For instance, that same team might agree: the browser tests are valuable, but these view and controller tests are mostly redundant, so we'll just test through the browser and the models, and reduce the amount of redundant code coverage. Or, a totally different strategy: you could try your hand at outside-in test-driven development, where you test from the outside in, but you isolate each thing from the stuff it depends on underneath, so you don't have that incidental dependency on other objects in your tests. Some people call that London-school TDD. Martin Fowler called it mockist TDD; I don't love that term. Or you may have heard of the book GOOS, Growing Object-Oriented Software, Guided by Tests. I realize now that I've iterated enough on it that I just call it my own thing: discovery testing. I recently did a free screencast series on our blog about it, just to explain the concept and my workflow. I'd love it if you checked that out, if this interests you, but we don't have time to talk about it today.

However, I did bring up test doubles and fake stuff, so it would only be fair to talk about something people hate in their tests: careless mocking. Like I said, "test double" is a catch-all term for anything that fakes out another thing for the purpose of writing our tests, like a stunt double. A test double could be a fake object or a stub or a mock or a spy, something you get from a mocking library. And what's funny here is that I happened to co-found a company named Test Double, and I maintain several test double libraries, so when I go out and talk about testing, people are normally like, oh, Justin, you're probably pretty pro-mocking, right? It's actually a little more complicated than that. I have a nuanced relationship with mock objects, with test doubles, because the way I use them is a very careful and rigid process. I start with the subject that I want to write, and I think, I'm going to need these three dependencies. So I start with the test of that subject, and I create fakes of those three things, because they don't exist yet, and I use the test as a sounding board. Are those APIs easy to use, or are they awkward? Does the data flow, do the contracts between those three things, all make sense? And if not, I can very easily change the fake, because the real thing doesn't even exist yet. So it's a very, very easy time to catch design problems. Roughly like this:
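A sketch of that process with rspec-mocks, where none of these classes exist yet; the test is the first place their contracts get tried out, and all the names here are hypothetical.

```ruby
describe InvitesUser do
  subject { InvitesUser.new(finds_user, sends_email) }

  let(:finds_user) { double("FindsUser") }
  let(:sends_email) { spy("SendsEmail") }

  it "emails the user it finds" do
    user = double("User", email: "pants@pants.mail")
    allow(finds_user).to receive(:find).with(42).and_return(user)

    subject.invite(42)

    expect(sends_email).to have_received(:deliver_to).with("pants@pants.mail")
  end
end
```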
That's not how 99.9% of the world uses their mock objects. Most people are trying to write a realistic test, and they've got dependencies. Some are easy to set up, maybe some are hard to set up, maybe one fails intermittently. So they use mocking frameworks as a cudgel, just shutting up whichever dependencies are causing them pain, and then they try to get their test to pass, and as soon as the test is done, they're exhausted, and they push it. But on day two and onward, we realize that those types of tests just treat the symptoms of test pain, not the root cause. They greatly confuse future readers: what's the value of this test? What's real, what's fake, what's going on here, what's the point? And they make me really sad, because they give test doubles a bad name, and I've got to protect my brand, y'all. So if you see someone abuse a test double, say something. Really, please.

Before we wrap up on test isolation, I want to talk about application frameworks, because frameworks are cool. They provide repeatable solutions to common problems that we have. But the most common category of problem they deal with is "how do I get my app to talk to X thing?" They're integration concerns, usually. So if we visualize our application as some juicy plain old code in the middle and some framework-coupled code around the periphery, then maybe your framework is providing you with an easy way to talk HTTP or email or other cool stuff. The way I visualize applications, some have maybe a default amount of coupling to the framework; I've been on some projects where literally every single line of code is coupled to some framework-given type or asset; and some have very intentionally framework-dodging designs, where they try to skirt away into a nice little domain-driven land off to the side. Regardless, frameworks raise a dilemma when it comes to testing, because they focus mostly on integration problems, and as a result, when the framework provides you with test helpers, those test helpers assume the same level of integration, because they want to make sure you use the framework correctly. And that's completely fair; the frameworks aren't messing up here. But when we, as framework consumers, look at our framework as the giver of all the things we need, that means we're going to end up only writing integration tests. When some of our code doesn't rely on a framework, why should our tests? The answer is, they shouldn't. You might still have a first test suite that does call through all the framework stuff, that overly integrated suite that makes sure everything's plugged together right. But if you've got a lot of juicy domain logic, then by all means, test it without the coupling to your framework. Not only will it be faster, but you're going to get much tighter feedback, much better messages, and a much better sense of how the test can help improve your design.
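A sketch of what that looks like: a plain old Ruby object with domain logic, tested with nothing but Minitest and no framework loaded at all. The LateFee rule is invented for illustration.

```ruby
require "minitest/autorun"

class LateFee
  DAILY_RATE = 0.50

  def self.for(days_overdue)
    [days_overdue, 0].max * DAILY_RATE
  end
end

class LateFeeTest < Minitest::Test
  def test_charges_fifty_cents_per_day
    result = LateFee.for(4)

    assert_equal 2.00, result
  end

  def test_never_charges_a_negative_fee
    result = LateFee.for(-3)

    assert_equal 0, result
  end
end
```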
So that was a little bit on test isolation. Congratulations, we got through round two. Just one round to go: test feedback.

We're going to start with another thing people hate about their tests: bad error messages. So, let's talk about error messages. Oh, crap, I broke the build. Now what? Let's pull down this gem that I wrote. This is a real gem, with a real build failure, so naturally it's going to have an awesome error message. Let's take a look: "Failed assertion, no message given," on line 25. What's my workflow to fix this? Well, I see the failure, then I open up the test to find that line, then I put in a print statement, or I debug, to figure out what the expectation was and what the actual value was. Then I can change my code, and then I can see it pass. And at that point, I need a coffee break, because it's been 20 minutes. That's my workflow, and it's super wasteful, every single time I see a failure in that particular project. So even if a test is fast, and we pride ourselves on fast tests, bad failure messages provide so much friction and waste that they can easily offset how fast your test suite is.

Now let's look at a good error message. This is an rspec-given example. We say: Then { user.name == "Sterling Archer" }. We run that test, and, Jim designed this so well, you can see the assertion right there in the failure: it expected "Sterling Mallory Archer" to equal "Sterling Archer," and you can see that you tripped the failure by the whole expression evaluating to false. What it does is keep evaluating sub-expressions until it can't anymore. So the thing on the left, user.name, evaluated to "Sterling Mallory Archer," but then it knew it could also just evaluate user, and, oh look, user is an ActiveRecord object, so it prints that whole ActiveRecord object there for me. So now, most of the time when I see a failure in rspec-given, my workflow is: see the failure, realize what I did wrong, change the code, and then earn a big juicy promotion, because I'm so much faster than that other guy with the bad assertion library stuff. So, in my opinion, judge assertion libraries, as well as how you use assertions, on the quality of their failure messages, not just on how cool and snazzy their APIs are. I think this is really important and overlooked. For instance, here's the same check written two ways:
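A sketch in Minitest; the user is stubbed with OpenStruct just to make it runnable, and both tests fail on purpose, to show the difference in what the failure tells you. (The exact wording of the messages varies by Minitest version.)

```ruby
require "minitest/autorun"
require "ostruct"

class AssertionMessageTest < Minitest::Test
  def setup
    @user = OpenStruct.new(name: "Sterling Mallory Archer")
  end

  def test_bare_assert_fails_opaquely
    # Fails with only a generic message: no expected value, no actual value
    assert @user.name == "Sterling Archer"
  end

  def test_assert_equal_fails_helpfully
    # Fails with something like:
    #   Expected: "Sterling Archer"
    #   Actual: "Sterling Mallory Archer"
    assert_equal "Sterling Archer", @user.name
  end
end
```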
Next up, since we just talked about productivity a little, let's talk about slow feedback loops. 480 is an interesting number: it's the number of minutes in an eight-hour workday, and I think about this number a lot when I'm looking at my own feedback loops. Say it takes me 15 seconds to change some code, five seconds to run a test, and 10 seconds to decide what I'm going to do next. That's a 30-second feedback loop, which means in an eight-hour workday, I have an upper bound of 960 thoughts I'm allowed to have. If you're like me and you have some non-coding responsibilities, though, you've got additional overhead: non-code time, context switching. Call it a 60-second effective loop, which ties back nicely to 480: that allows for roughly two hours of non-code time in an eight-hour workday.

But pretend we've been very successful. We have a lot of tests, and running a single test now takes about 30 seconds. Now we're looking at an 85-second loop, so just 338 actions a day, cut by almost a third. And that non-code time? That's two hours, fixed. It doesn't care how fast your tests are, so it has to get bumped up too, and the effective loop gets slower still. Now imagine you've also got really bad error messages, like we just talked about, so instead of seeing what's going on in 10 seconds, you're debugging for 60 seconds. Now your feedback loop is 155 seconds, and you only have 185 useful thoughts you can have in a day. And that sucks. But if you've ever been on a team with a lot of integration tests, and you draw the short straw for a given iteration and your job is to update all those integration tests... I was on a team once where it literally took four minutes, as a baseline, to run an empty Cucumber test. That was really, really slow. In that case, my feedback loop might have been 422 seconds, yielding only 68 actions in a day. And I don't know about you, but if I'm running a four-minute test, what actually happens is I start the test, then go check Twitter or Reddit or email or something, then come back and realize, oh damn, the test finished three minutes ago. So my real feedback loop was more like 660 seconds. Eleven minutes. That's 43 actions a day. By the end of those six months, my brain was literally rotting. I could feel my skills atrophy. I was miserable, even though I got to spend a lot of time on Reddit.

Now, 43, you'll note, is significantly smaller than 480. And you may not realize it, but we just did something really significant together here today: we found the mythical 10x developer, right here in the room. This stuff matters. A few seconds here and there really add up. I encourage you: use a stopwatch, profile, monitor your activity, and seriously try to optimize your own feedback loops. And if you're not able to, if your app is just too slow, you can always write more isolated tests, iterate quickly, and integrate later if you have to. It's really important. Here's the arithmetic, if you want to play with it:
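A back-of-the-envelope script for the numbers in this section:

```ruby
# How many think-change-verify loops fit in an eight-hour workday,
# at the feedback-loop lengths mentioned above
WORKDAY_SECONDS = 480 * 60

[30, 60, 85, 155, 422, 660].each do |loop_seconds|
  actions = WORKDAY_SECONDS / loop_seconds
  puts format("%3ds loop => %3d actions per day", loop_seconds, actions)
end
#  30s loop => 960 actions per day
#  60s loop => 480 actions per day
#  85s loop => 338 actions per day
# 155s loop => 185 actions per day
# 422s loop =>  68 actions per day
# 660s loop =>  43 actions per day
```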
Next up, I want to talk about one of the contributors to slowness that we just mentioned: painful test data. How much control each of our tests has depends on the test data strategy we apply. For example, you might use inline model creation in every test, so you have a lot of control over how your test data is set up. Some people use fixtures, where you have a pretty good starting point for your schema every time you run your tests. If you have a lot of complex relationships, or if you need a boatload of data to do anything interesting in your application, you might curate a SQL dump that you can load to prime the database at the beginning of each test run. And then other places, which either can't or choose not to control their data, have to write tests that are self-priming: if I want to test some behavior that requires an account, I have to use the app to first create an account, and then I can run my test. None of these is good or bad per se, but it's important to note that you don't have to pick just one means of setting up data in your application, and you're allowed to change it midstream. So if we look at the testing pyramid: maybe we agree inline creation is a good way to test models, because it's very explicit and we have a lot of control. Maybe fixtures are good for integration tests, because we don't want to keep creatively creating users when we could just have a default one. Data dumps, I think, make a lot of sense for smoke tests, so that we don't watch four minutes of execution inside our factories.rb file on every single test run. And you probably have no option other than self-priming if you're going to write any tests against staging or production, because you probably don't want direct database access there.

What I've found is that in slow test suites, data setup is normally the biggest contributor to the slowness. I don't have proof of that, but it feels truthy, so I made a slide. I do encourage everyone to profile those slow tests, and to use git bisect to figure out exactly what made them slower, and, if necessary, change your approach to how you control your test data.

Speaking of stuff getting slower, let's talk about one of my favorite phenomena: superlinear build slowdown. Our intuition about how long our tests are going to take to run really betrays us, because if we write one integration test and it takes five seconds to run, we assume that if we write 25 of them, the build is going to take 25 times as long, and if we write 50, 50 times as long. The reason we assume that is that we think the duration of the test means we're spending five seconds in test code, because it's a five-second test. But that's not how it really is, because we also spend some time in app code, and some time in setup and teardown. In fact, we probably spend more time in app code than in test code (the test is pretty small), and we probably spend a couple of seconds setting up our database. So maybe, in that five-second test, we're only spending one second in test code.
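Here's a toy model of that arithmetic. The per-phase splits below are assumptions for illustration, but the totals match the example numbers coming up next.

```ruby
# Per-test seconds: time in test code stays flat at ~1s, while time in
# app code and in data setup/teardown creeps upward as the suite grows
def seconds_per_test(app:, setup:, test: 1)
  app + setup + test
end

puts 1  * seconds_per_test(app: 2, setup: 2)  # 1 test:   5s each =>   5s build
puts 5  * seconds_per_test(app: 3, setup: 3)  # 5 tests:  7s each =>  35s build
puts 25 * seconds_per_test(app: 4, setup: 6)  # 25 tests: 11s each => 275s build
puts 50 * seconds_per_test(app: 8, setup: 9)  # 50 tests: 18s each => 900s build
# Intuition said 50 five-second tests should cost 250s; they cost 900s.
```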
Now, if we add five tests, the app is getting bigger, and those features are starting to interact with one another, so the time spent in app code grows a little, and we spend more and more time in setup and teardown as our models get more complicated. And that first test, which we did not change at all, is now taking, what's that? Like, six plus one... seven. Seven. I'm not really great at math. Seven seconds, where it was five seconds. And that's not a big deal, right? It's just a couple of seconds of deviation. It's not that big, until we start talking about more tests. At 25 tests, maybe instead of three seconds per test in app code it's now four, and maybe six seconds are spent in setup, because we have a lot of data setup now, a lot of factories or something. So the same one-second test, the first test we wrote, is taking 11 seconds instead of seven, and we start to see this geometric-looking curve bend way up. Take it to 50 tests, where things have gotten really complicated, all tangled up, and now we're looking at 18 seconds per test, and we have to zoom out the graph, because now the build is 900 seconds. Halfway through our journey, by 25 tests, we had added 150 seconds to our build above and beyond what our intuition told us we should have had, and the second half of those tests added another 500 seconds beyond intuition. As a consultant, it's shocking to me how often I hear from teams who say, yeah, our build's a little too slow, and then three months later it's, oh my god, our build is nine hours, please help us. Out of nowhere. They don't feel it coming, because it's counterintuitive. So track this stuff.

In fact, what I encourage everyone to do is avoid the urge to create a new integration test by rote for each and every new feature you write. Instead, I try to maintain just a handful of integration tests that zigzag their way through all of my different features. That's actually a better way to test the interactions real users are going to have, as opposed to having this model and this CRUD and this CRUD and this CRUD, feature by feature, without any interaction between them. Early on, too, a fun thing you can do as a team is make any arbitrary decision you want. You might decide, let's cap our build at five minutes, or ten minutes. And once you start creeping up to nine minutes, you can all say, OK, now we've got to delete a test, or make stuff faster, before we can add this next test people want to write. It's really effective. It's drawing a line in the sand, and I've seen a lot of teams have a lot of success with it.
On to our last topic. This is my favorite: false negatives. What are false negatives about? Well, it gets at this question: what does it mean when a build fails? Immediately, someone will answer, well, it means the code's broken. Nope, because the follow-up question is: what file needs to change to fix the build? Well, usually we have to update a test to fix it. Oh, so a test was broken, not the code. And they scratch their head, and I have to define what true and false negatives are. A true negative: a red build means something was actually broken in a meaningful way, and the fix is to fix our code, to make our code work again. A false negative: a red build means we're unfinished, we forgot to update a test somewhere, and the fix is to go update some tests somewhere. True negatives are great, because they reinforce the value of our tests. When our managers pay us to write tests, they don't know false negatives exist; they think every test failure is a bug that didn't escape into production, so true negatives make us feel really good. But what I've found is that, in practice, on big teams, they are depressingly rare. I can count on one hand, like three or four in the last several months, the failures that were really true negatives, where we really caught a bug with our gigantic test suite. Bummer.

False negatives, meanwhile, erode our confidence in our tests. They're the reason our build bums people out: every time we see a build failure now, it's, oh, sure, I've got to go update all these tests. We start to feel like we're slaves to our tests, and that's really, really negative and draining and demoralizing. Now, oddly enough, the top causes of false-negative test failures are redundant code coverage (we updated that model and its test, and forgot it had a whole bunch of incidental dependencies on other stuff) and slow tests, because if the suite is so slow that I'm not running all of my tests before I push, where I could catch a failure early, then I'm pushing up to CI, outsourcing it, and creating a lot of work for myself later, when I find out, oh shoot, I broke a bunch of other stuff. That is, when you have a lot of integration tests, you tend to deal with a lot of cleanup from false negatives. If you've been tuning out for most of this talk, the TL;DR is: please write fewer integration tests. That's really all there is to it, and you'll be a lot happier. And I encourage you, especially if this is a new concept to you, to track whether every build failure was a true or a false negative, and then how long it took you to fix, because that's the kind of data you can use to root-cause problems in your test suite, and then to justify your investment in making broad-based improvements to your tests.

So, you know what? We just talked about five things about test feedback. That means we won. We did it. We reached the end of our journey together here this morning. If this talk bummed you out, if it hit a little too close to home, remember: no matter how bad your tests are, this guy right here probably hates AppleWorks more than you hate your tests. I'm here from Test Double, like I mentioned. If you're trying to hire senior developers onto your team, you're probably having a really bad time right about now. At Test Double, we've got awesome senior developer consultants, and we love working with existing teams, and we can help you with this kind of stuff, so get a hold of me. I'm going to be here all week. We've got Josh Greenwood here somewhere, as well as Jerry D'Antonio, who's giving a great talk on concurrent-ruby tomorrow afternoon; I hope you check that out. If you want to be like us, and focus with a mission on improving how the world builds software and helping others get better, consider joining us. You can hit us up at join@testdouble.com. I've got lots of stickers and business cards and stuff to give out, and I would love to meet you and hear your story. But most importantly, thanks for sharing your precious time with me. I really appreciate it.