 So, I have two items of business to take care of before we talk about Cucumber. These slides, this talk is open source. The slides that I'm using right now are available at tinyurl.com slash Cucumbers have layers. So if you have any trouble seeing the screen, hearing my voice, following my train of thought or if you like spoilers, you can download a PDF with both my slides and my script written on your very own device. You can even download the slides now and walk out and see a different talk, and I will be in no way offended. It's been really hard to choose from this program. It's been really great. The other thing that I want to mention is that I work for a company called Real Geeks. We sell web-based tools for real estate agents, real estate, real geeks, yeah. Anyway, I'm sure you will be shocked, just shocked, to learn that we are hiring. You may also want to know that our office is in Hawaii. I live in Portland, Oregon, and I work from home, but here's a photo from my last office visit. This is my daughter and me looking at the full moon rising, and my partner who took this photo has it as a lock screen on her phone. In my opinion, this is the best perk of the job. But anyway, we've got stuff in Ruby, Python, and JavaScript, and we especially need some help with React.js. So if you're interested in writing software in Hawaii, come say hello. So I have stickers. So with that out of the way, welcome. I'd like to get a quick show of hands. How many of you have used Cucumber on any project, large or small? Wow, almost everybody, cool. How many of you have used Cucumber more than one time? Okay, cool. And regardless of whether you've used it once or more than once, how many of you would use it again? More than I thought, okay, let me see some maybes, but yeah. Well, this talk is directly aimed at people who may have used Cucumber in the past and decided not to use it again. I'm glad to see that less of you are in that camp than I thought there would be. 
But for those people, I hope to offer at least a different perspective and convince you to maybe give Cucumber another look. For those of you, like three of you, who haven't used Cucumber, you should go get the Cucumber book before you start. It will give you a much better introduction than I could, even if this were a full 45 minutes of Cucumber 101. But just so you're not completely lost, in Cucumber, you describe features of your software in a language called Gherkin. Gherkin is a DSL for writing acceptance tests. Because this is a Ruby conference and we have a tendency to say DSL when we mean API, I have to clarify. When I say Gherkin is a DSL, I mean it is an actual domain-specific language with its own grammar and semantics. Gherkin is not Turing complete, but it can be used to tell a Turing-equivalent machine what to do. As I was saying, Gherkin is a DSL for describing software. Each separate Gherkin file is called a feature. Here's a feature directly from the Cucumber website. A feature has one or more scenarios. Each scenario has one or more steps. Those are the Given, When, Then that you see at the bottom left of the screen. And aside from a few keywords, which I've highlighted here in green, everything else is written in whatever natural language works for you. Gherkin's grammar is really simple. Everything from a keyword to the end of the line is treated essentially as a single token by the Gherkin parser. And Gherkin is quite useful just as documentation for your project. But Cucumber, of course, also lets you use these feature files to automate tests, which is why the people behind Cucumber like to talk about executable specifications. To go from human-readable documents to running tests, you do have to write a bunch of step definitions. A step definition is just a regular expression plus a block of code. This is how you translate from those human-friendly blobs of text to something that Ruby can actually execute. 
When Cucumber wants to run a step, it tests that step against every one of the regular expressions that you've given it. When it finds a match, it executes the block that was associated with that regular expression. There's also a mechanism for taking captures from the regex and passing them as arguments to the blocks, which is how you can get interesting data into your tests. And in my mind, Gherkin and Cucumber are almost two separate things. Gherkin gives you a human friendly way to describe software, and Cucumber processes your Gherkin files and uses them as a script for automating tests. And Cucumber can be kind of unwieldy. I basically put up with Cucumber because I really like Gherkin. As I said, Gherkin is a DSL, or rather it's a domain-specific language where the domain is talking to other humans about software. That's very free-form, so it lets you talk about your application's domain using whatever language makes sense to you and your team. And Gherkin has just enough structure that it can be used to drive a lot of machinery for automating tests, but Gherkin is not a programming language, as mentioned. This is a critically important point that I think a lot of people overlook or can overlook when they're starting out with Cucumber. And I draw this distinction because programming languages are great, but when we're using them, they require us to get all of the details right up front, and that process of writing code and fighting with a very nit-picky parser tends to shift our focus onto how to do a thing. Gherkin, on the other hand, exists to help us think about what thing to do, why we're doing it, and even who we're doing it for. And it's also important to realize that Cucumber is not really a tool for doing test-driven development. Cucumber and TDD complement each other nicely, but it's been my experience that Cucumber works on a very different rhythm than TDD does. 
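To make the step-matching mechanism described earlier concrete, here is a minimal sketch in plain Ruby. It is a toy, not Cucumber's actual implementation: the `StepRegistry` class, its `define` and `run` methods, and the shopping-cart step are all hypothetical names invented for illustration. It only shows the idea of pairing a regular expression with a block and passing the captures in as arguments.

```ruby
# Toy illustration of regex-based step matching; NOT Cucumber's real code.
# StepRegistry, define, and run are hypothetical names.
class StepRegistry
  def initialize
    @definitions = [] # list of [regexp, block] pairs
  end

  # Register a step definition: a pattern plus the code to run on a match.
  def define(pattern, &block)
    @definitions << [pattern, block]
  end

  # Test the step text against every registered pattern, in order.
  # On the first match, call the block with the regex captures as arguments.
  def run(step_text)
    @definitions.each do |pattern, block|
      match = pattern.match(step_text)
      return block.call(*match.captures) if match
    end
    raise ArgumentError, "Undefined step: #{step_text}"
  end
end

registry = StepRegistry.new
cart = []

registry.define(/^I add (\d+) (\w+) to my cart$/) do |count, item|
  # Captures always arrive as strings, so convert before using them.
  count.to_i.times { cart << item }
end

registry.run("I add 3 widgets to my cart")
```

This toy simply takes the first pattern that matches. A real test runner has more to decide, such as what to do when several patterns match the same step.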
I think of Cucumber as a set of guide rails for TDD, and my workflow for using it goes something like this. I start with a Cucumber scenario, run the scenario, and watch it fail. And I look at the error message to figure out why it failed and use that information to go write a unit test. I watch the unit test fail, make it pass, and refactor. At this point, I have a choice. If I know what the next unit test I need to write is, then I do that and I go back around the red-green-refactor cycle, maybe several times. But if and when I get stuck, I go back to the Cucumber test again. It's probably still failing, but it's failing for a different reason. And that helps me figure out what to do next. I go back into the TDD cycle again, go round and round that inner loop. And at some point, I run the Cucumber test and it passes. Do a little dance, refactor, move to the next scenario. When I'm working this way, I spend most of my time in that tight inner loop, red-green-refactor, doing TDD. And the fact that the tests and the code are both written in the same language makes for less context switching, and it makes it very easy to go around that inner loop very quickly. Sometimes I'm writing a new test every minute or so if I'm doing really well. And this is where I'm focusing on how the thing works. It's good, satisfying, detail-oriented work, but when I start to lose sight of the forest for the trees, I jump back up to Cucumber. And that shift from writing Ruby back to writing Gherkin helps remind me to get out of that hyper-focused "how" mode and come back to thinking about what, why, and who. And that helps me figure out what the next thing to do is. Tom Stuart, who I really wish were here this year, I'd love to talk with him about this stuff, wrote something about Cucumber that really resonated with me. He described Cucumber as more like a mind hack than a testing tool because it helps him think about the big picture rather than the details. 
And as I was putting this talk together, I asked Matt Wynne, co-author of the Cucumber book, if there was anything that he wanted people to know about Cucumber. And he tweeted back that he wished more people knew Cucumber as a thinking and collaboration tool, not just something for test automation. Both of these quotes lead back to something I said a few slides ago. I asserted that Gherkin is not code and Cucumber is not for TDD, but negative definitions are not very useful. Or to put that another way, positive definitions are much more useful than negative ones. So what are these things for? Well, I already talked about how I use Cucumber as guide rails around TDD, but let's talk about Gherkin a little bit more. And everything I say in this talk, unless I'm quoting somebody else, is based on my own experience. I definitely do not speak for the Cucumber team. This is just what I think. I think that Gherkin is for describing software at the level of user intent. And you might choose to use Cucumber to turn your Gherkin artifacts into automated tests, but you don't have to. Now by describing software, I mean that Gherkin lets you capture acceptance criteria. And by acceptance criteria, I mean the system has to do this stuff or you don't get paid. By user intent, I mean that your Gherkin paints its picture in fairly broad strokes without getting bogged down in a lot of details. Details are what TDD is all about. I've worked with scenarios that look like this. And what I've found is that every time I tweak my user interface when I've got steps that look like this, 20 of my Cucumber scenarios will explode and then I get to spend the next hour or two editing them, which is not the best use of my time. And last, just because Cucumber is pitched as a tool for writing automated tests, you are not obligated to use it that way. 
Personally, I think Cucumber's greatest value comes from Gherkin and using it as a tool for facilitating conversations between developers and the people who pay us. I've written Gherkin files, thrown them away, and felt like my time had been very well spent because it helped me figure out what to do and what not to do. So this is what I now think Cucumber should be used for, but it took me years and years to figure this out and I made a lot of mistakes along the way. Some of those mistakes were very painful and embarrassing and in the hope that you can learn from them, I'm gonna share them with you. Yeah, this is the part of the talk where you get to laugh at me. This is by no means an exhaustive list of my Cucumber fuck ups. These are just some of the more interesting, entertaining or educational ones. I'm gonna show you a Cucumber scenario that I helped write in a real live code base and before I click the magic button, I want to reiterate it is okay for you to laugh at me for this. Here we go. So when we automated this, it would visit a route that was only defined in the Rails test environment. That route rendered a static view that in turn required a JavaScript file and the JavaScript file contained all of the unit tests for our front end helper functions. And that number 99, by the way, started out somewhere around 40 or 50, but we kept adding more JavaScript tests and every time we did that we had to go back in and update this Cucumber scenario with a new number. So how did this happen? Well, we had a fairly large Rails project with an extensive Cucumber test suite and we needed to run some unit tests that were most convenient to run in a browser and we looked at that and we thought, well, hey, we've already got this thing set up to drive Selenium which drives a web browser, right? So let's just put it in there and apparently there weren't any grownups around us to stop us. Here's something else I did in an actual project. 
Oh, that's a great look, I love that look. What this did was visit the new page for the widgets controller, fill out the form with randomly generated data, submit the form, check that it's on the show page and the show page is showing the same data we just submitted, click the edit link, change each value on the edit form, click save, make sure that the changes were visible, then delete the record and make sure that it was gone from the list. Now to make this work, we wrote something that could automatically mutate values in the form fields. In order to make that work, we had to add some CSS classes to our markup to indicate that a given field contained numbers or names or addresses so we knew how to mutate them. But that's just good semantic markup, right? At least that's what we told ourselves. And like seriously, this was a lot of fun to write. But while we were gold plating our Cucumber Suite, we were avoiding writing actual features that our actual customer actually cared about and pretty soon we were actually fired. So when you get a bug report, it's a good idea in my opinion to add an automated test to reproduce that bug so that you can prevent any future regressions of it. But also in my opinion, it is not a good idea to put these tests in Cucumber. Gherkin is a great way for you to tell the story of your application. Ideally, you should be able to print out all of your feature files and hand a small stack of paper to a new developer or a new manager. And they in turn should be able to read through that stack of paper in an hour or two and come away with a fairly good, high-level idea of what your software does. And cluttering up that story with a bunch of regression tests has a tendency to turn your Hemingway into Charles Dickens. And doing this is also a good way to commit the next Blunder, which is just plain having too many scenarios. Opinions vary on how long is a reasonable time to wait for a test suite. 
Personally, I'm willing to wait for about five minutes, up to three or four times a day. Any more than that, and my tests just may as well not be there because I'm not using them. Now, if you do find yourself in this situation, you might consider tagging a critical subset of your tests to run before every commit and then let your CI server run the whole suite after you push code. That helps mitigate the problem. It doesn't really address it, but it helps. Another way to commit the too many scenario Blunders is to automate every feature that you write. It's perfectly okay to use Gherkin to facilitate a conversation with somebody, possibly even just yourself. Like I sometimes talk to myself out loud, sometimes I talk to myself in Gherkin. And then you can throw away the feature file once you've learned what you needed to learn from it. Now, if you do feel the need to hang onto it for posterity, you can go ahead and check it into your features directory, but tag it as like FYI or TBD or something and change your cucumber configuration so that scenarios with that tag never get run. And step definitions. There are so many ways that step definitions can make you hate life. If you get the cucumber book and you should, as I mentioned, get the cucumber book, it will tell you to define a bunch of helper functions and then call those helper functions from your step definitions. And this does help. But I started using Cucumber years before the cucumber book was published, which means that I've made mistakes like having too many step definitions, having huge step definitions, sometimes up to 100 lines of code in a block, having too many huge step definitions, having step definitions that call other step definitions. Really, just putting about, just about any logic whatsoever in a step definition is not a good idea. After making all of those mistakes and more, I found myself feeling very conflicted about Cucumber. 
On the one hand, I really loved the expressiveness of Gherkin and I wanted to believe in this idea that programmers and managers could sit down together in a room and write acceptance tests together in universal harmony. But I struggled to reconcile that with my experience, with the project I was working on at the time, which had hundreds of scenarios backed by like 750 step definitions, which together made up about 5,000 lines of code in step definitions. And this whole test suite took about 90 minutes to run. And eventually, as I was struggling with this conflict, I found myself asking an interesting question. How would you write scenarios if you didn't know what your user interface was gonna be? And I think that if you can tell from reading your Cucumber features whether you're using a web application or a desktop application or a command line interface, you're probably letting too much detail leak into your features. Here's an example of the difference that I'm talking about. Here's another one. So this question floated around in my head for a while as I worked on other things, until once upon a time I was brought in to work on one part of a rather large monolithic Rails app that calculated salesperson commissions. Now you might hear salesperson commissions and, first off, fall asleep, and second, think, okay, so what you do is you add up how much each person sold and you multiply that by a percentage and you cut them a check. But that would be far too simple. At this place there were usually about half a dozen compensation schemes in effect at any one time. These schemes changed a couple of times a year, sometimes quite dramatically. 
And they were worked out by the sales department, who would put together something like 15 pages of dense, confusing, quasi-legalese to describe each scheme, which we then had to read through as developers and somehow translate into working code, and tests usually went out the window as we did this. So one of my goals in working on this project as I trudged through it was to be able to describe every aspect of these compensation schemes using Gherkin so that it would be easier for me to think about it and so that I could take those Gherkin files back to the sales department and say, this is what we would really like to have. Can we use this to maybe work together a little bit more smoothly? So I'm gonna talk about a simplified, fictionalized version of how just one of those compensation schemes worked. And really, there are some walls of text here in the next couple of slides. It's not super important for you to catch the details. They're not really all that relevant to the talk. They're just here to give you some concrete examples of how I wrote and organized my features for this project. We'll start with the concept of a sales target and a target bonus. Basically, the company says that if you sell $100,000 worth of widgets in a month, that's the sales target, we will pay you $100 over your base salary. That's the target bonus. And these numbers are not realistic. I just picked them to make it very easy to convert between dollars and percentages in my head, and the different orders of magnitude helped me keep track of which was which. Anyway, there's a scaling factor here. If you miss your target, you get paid less. If you exceed your target, you get paid more. That's pretty straightforward, but there's a catch. The catch is this little thing called a pay curve. The pay curve is a simple function. 
You put in the percentage of your sales target that you hit and you get back the percentage of your target bonus that you're gonna get paid. Now to describe the pay curve, the sales department actually gave us a spreadsheet with example rows for every possible input value from zero up to 250% of your target. We had some overachievers, I guess. Now fortunately, since this was already in a spreadsheet, it was very easy to build a chart so we could see what the pay curve looked like. Looking at the chart shows that it was a piecewise linear function and that helped me relax and figure out what was going on and get my head around what was happening. So I went back to the spreadsheet and from there it was very straightforward to convert the rows in that spreadsheet into rows in a cucumber table, excuse me, and use them to drive a scenario outline. In case you're not familiar with this feature of Gherkin, a scenario outline is basically a template for a scenario followed by a table. And the template gets executed once for each row in that table with values from the appropriate column filled in wherever you put a placeholder value. This is a very easy way to automate a lot of tests that look the same. Now because I knew that this was a piecewise linear function, I was able to get away with not pasting all 251 rows of that spreadsheet into a giant cucumber table. I just put in a few examples around each of those inflection points to make sure I got the boundary cases right. And then I moved on to describing the compensation scheme. At first glance, when I first wrote this, it looked remarkably similar to the scenarios for the pay curves. And it felt like I was repeating myself, but when I stopped and looked at it again, I realized this was actually introducing quite a few new concepts. There's the idea that there is at least one and probably more than one compensation scheme. You get sales, bonus, or sales target and target bonus in dollars instead of percentages. 
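As a rough sketch of the pay curve idea described above, a piecewise linear function can be written as interpolation between a handful of breakpoints, with a few example rows clustered around each inflection point, much like the rows of a scenario outline table. The breakpoints and example values below are invented for illustration and are not the real numbers from the project, and the names `PAY_CURVE_POINTS`, `pay_curve`, and `EXAMPLES` are hypothetical.

```ruby
# Hypothetical pay curve: each pair is
# [percent of sales target hit, percent of target bonus paid].
# These breakpoints are made up; the talk does not give the real ones.
PAY_CURVE_POINTS = [
  [0, 0],
  [50, 25],
  [100, 100],
  [250, 400]
].freeze

# Linear interpolation between the two breakpoints surrounding the input.
# Inputs outside the table's range are clamped to its ends.
def pay_curve(percent_of_target)
  x = percent_of_target.clamp(PAY_CURVE_POINTS.first[0], PAY_CURVE_POINTS.last[0])
  PAY_CURVE_POINTS.each_cons(2) do |(x0, y0), (x1, y1)|
    next unless x <= x1
    return y0 + (y1 - y0) * (x - x0).to_f / (x1 - x0)
  end
end

# A few rows around each inflection point, in the spirit of an examples table.
EXAMPLES = [
  [0, 0.0],
  [49, 24.5], [50, 25.0], [51, 26.5],
  [99, 98.5], [100, 100.0], [101, 102.0],
  [250, 400.0]
].freeze

EXAMPLES.each do |percent_hit, expected_payout|
  actual = pay_curve(percent_hit)
  puts format("%3d%% of target -> %.1f%% of bonus (expected %.1f)",
              percent_hit, actual, expected_payout)
end
```

Because the function is linear between breakpoints, checking a point on either side of each breakpoint covers the boundary cases without listing every possible input.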
This has actual sales and bonus amount also in dollars. With those concepts in place, I was then set up to introduce the next feature of this scheme, which is the safety net. And this is the last new thing, I promise. The safety net is this feature to help out new hires as they're getting up to speed for the first few months. This is basically a guarantee that you're always gonna get paid at least the amount of your target bonus. If you don't hit your sales target, we'll kick in the difference so that you can count on some income as you're getting up to speed. If you do better than your sales target, we'll still pay you more than your target bonus, but you're never gonna make less, at least until we take those training wheels away. Anyway, that's enough to give you a sense, I think, for how I organized and wrote the Gherkin features for this project. But I do wanna talk about an underutilized element of Gherkin's grammar. I left this out on earlier slides so that I could make the text bigger. But Gherkin gives you some space at the top of the file where you can write whatever you want. A lot of the examples and tutorials that I've seen show that space being used for "As a..., I want..., so that...". But in practice, I find that people tend to fill in that template without really thinking about it. So sometimes I just skip this part. For this project, though, I used that space to provide some context about why this feature exists, or what makes this feature interesting in comparison to other features that may be similar. For this scheme, I was worried that the examples of how the safety net worked in specific cases might not fully communicate or explain what that aspect of the scheme was for, why it was there. So I just took a few lines to explain it using the simplest language that I could. 
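The dollar amounts and the safety net described above can be sketched as one small function. This is an illustration under stated assumptions, not the project's code: the `bonus_amount` name and its keyword arguments are hypothetical, and the pay curve is passed in as any callable so the sketch stays self-contained. The placeholder curve used here simply pays in direct proportion.

```ruby
# Hypothetical helper: bonus in dollars for one salesperson in one month.
# pay_curve is any callable mapping "percent of target hit" to
# "percent of target bonus paid".
def bonus_amount(actual_sales:, sales_target:, target_bonus:, pay_curve:, safety_net: false)
  percent_of_target = 100.0 * actual_sales / sales_target
  percent_of_bonus = pay_curve.call(percent_of_target)
  earned = target_bonus * percent_of_bonus / 100.0
  # With the safety net on, never pay less than the target bonus.
  safety_net ? [earned, target_bonus].max : earned
end

# Placeholder curve (an assumption): pays in direct proportion to sales.
proportional = ->(percent) { percent }

# Using the talk's round numbers: $100,000 target, $100 target bonus.
without_net = bonus_amount(actual_sales: 50_000, sales_target: 100_000,
                           target_bonus: 100, pay_curve: proportional)
with_net = bonus_amount(actual_sales: 50_000, sales_target: 100_000,
                        target_bonus: 100, pay_curve: proportional,
                        safety_net: true)
puts "Half of target, no safety net:   $#{without_net}"
puts "Half of target, with safety net: $#{with_net}"
```

The safety net only ever raises the payout. Someone who beats their target is paid whatever the pay curve says, with or without it.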
And after I handed this project off to some other developers, one of the bits of feedback that I got from them was that this documentation in particular was extremely helpful in making sense of these frankly ridiculous compensation schemes. And again, this is the sort of thing that you may overlook if you're trying to treat Gherkin as a programming language. If you're thinking of Gherkin as code, then this section of the file just feels like a large block comment. But if you're thinking about Gherkin as a medium for communicating with other people about your project, this free-form text area can be really useful because it gives you a place to talk about things without having to squeeze your thoughts down into that step-by-step Given/When/Then recipe. The last thing I want you to notice about these feature files is that absolutely every word of these is expressed in terms of the domain, not the interface. If you sat down and you read through all of these features, you would learn a lot about how this organization thinks it can motivate its salespeople. But you won't have any idea what kind of app they're using to do it. So let's talk briefly about the architecture that I settled on for this application. I did choose to use Rails, but I wanted to try a slightly more disciplined approach than I usually see in Rails apps. So I organized the code into three main layers. The user interface is just standard Rails controllers and views. That layer in turn talks to a set of Active Record objects and some service objects. Those then interact with a set of plain old Ruby objects, or POROs, that model the rules for the compensation schemes themselves. This is slightly unorthodox for Rails, but it's nothing really earth-shattering. If you do want to explore these ideas a little bit more, Bob Martin gave a talk in 2011 called Architecture: The Lost Years. 
Now personally, I found this talk kind of hard to watch because I felt that a lot of Bob's rhetorical techniques actually detracted from the really good things he had to say. But if you can wade through it, it's worth trying to listen. And while I was working on this sales commission project, I also ran across a really good presentation that Jim Weirich gave to Cincinnati RB called Decoupling from Rails. This is a talk with some good solid ideas in it. Pun unintended, by the way. And it's well worth watching just for the main topic. But right at the very end of the video, after he finished the talk and really at the end of the Q&A, Jim said something that was really interesting. "Let me tell you a goal that I have. You can do integration testing coming in at this level and test all the way down to the database and back. That one's pretty doggone fast. The only thing you're not testing is the controllers, the webby stuff, the views and things. I would like to demonstrate that if you do your Cucumber test right, you could run the Cucumber test at this level or throw a switch and run it at this level. So if you want fast integration tests, you can run them here. If you want complete, include the web integration test, you can run them at this level. I think that would be an interesting experiment." Now, when I heard that, I just about fell out of my chair because that was basically what I was doing in this project. Now, unfortunately for me, I never met Jim, so I didn't get a chance to talk to him about this interesting experiment, but I do get to share it today with all of you. So that's pretty cool. Now, my initial idea was basically what Jim had described. Using the tagging feature of Cucumber, I wanted to be able to mark a scenario as being tested at the UI, the model layer, or both. And the scenarios that were tagged with UI would use Capybara to interact with the full Rails stack in the way that most people think about using Cucumber with Rails. 
But the scenarios that were tagged with model would run directly against the Active Record layer so that they could be faster. And the Active Record layer, I initially thought, would exercise the POROs indirectly, and I thought that would be good enough. But I discovered very quickly that that core PORO layer was complicated enough on its own that I didn't want to have to think about implementing it and mapping it to a relational data model at the same time, so almost immediately I added a core tag as well. Here's how that played out in one of the feature files that I showed earlier. Once I'd written the scenario, I would start by tagging it with the name of the layer that I wanted it to run at, plus a wip suffix to indicate that it was work in progress. Run the scenario, watch it fail. And from there, I'd drop down to RSpec and do the usual small, fast TDD cycles until the scenario passed. Once the scenario was passing, I could take away the wip suffix. And if I wanted to reuse the scenario at the next layer up, I would then add another tag for that layer, again with the wip suffix. And again, I'd write up a bunch of RSpec tests until I got that same scenario running at the new higher layer, at which point I could remove the wip suffix from that layer. So that's how I approached this from the Gherkin side of things. But it took me a while to figure out a good way to implement this in Cucumber. I wanted Cucumber initially to run all of the core scenarios with one set of step definitions in place, then run all of the model scenarios with a completely different set of step definitions, and then run all of the UI scenarios with a third set of step definitions. And manipulating the load path was a little bit painful, but it basically worked. 
The real problem was that anytime I changed the text of a step, that meant that the step definitions fell out of sync, and I had to go and edit three different regular expressions in three different files and make sure they all matched, which, no. So instead, I wound up consolidating down to one set of step definitions that invoked something I called a step driver. And this has the wonderful side effect of making step definitions small and simple. With this change, I was able to then define three different step drivers to interface with each layer of my application, put all three of them on the load path so they all get parsed and loaded at the same time. I don't have to worry about that. And then I used an environment variable to decide which step driver to instantiate. Further details about this are beyond the scope of this talk, but feel free to ask me later if you're curious. I have a few observations to make about how this project went. First off, I have a piece of advice that comes to you from my inner five-year-old. The step definitions are lava. Because of the way Cucumber works, the step definitions pretty much have to be there, but they should be, in my opinion, a very thin adapter between gherkin steps and a custom driver written to interface with your application. Ideally, a step definition should be one annoyingly obvious line of code. I don't like step definitions because they exist in this sort of PHP-style flat name space. Everything, you know, there's no way to qualify that this is related to a particular topic. And they have one global scope for sharing variables between them, which is all kinds of fun. So moving all of the interesting logic out of a step definition and into a step driver then lets me use Ruby's full object-oriented tool set for organizing and refactoring that code for that driver. And it also helps me keep the step definitions so simple that I never have to think about them. 
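The step driver arrangement described above might look roughly like the following sketch. The talk does not show the real code, so every name here is an assumption made for illustration: the `CoreDriver`, `ModelDriver`, and `UiDriver` classes, the `STEP_DRIVER` environment variable, and the `record_sale` and `total_sales` methods. The model and UI drivers are placeholders that inherit the in-memory behavior so that the sketch runs on its own.

```ruby
# Hypothetical step drivers: one class per application layer, all
# answering the same messages so step definitions never need to change.
class CoreDriver
  def initialize
    @sales = Hash.new(0) # in-memory stand-in for the core domain objects
  end

  def record_sale(salesperson, dollars)
    @sales[salesperson] += dollars
  end

  def total_sales(salesperson)
    @sales[salesperson]
  end
end

# Placeholders: a real ModelDriver would go through Active Record and a
# real UiDriver through Capybara. They inherit CoreDriver only to keep
# this sketch runnable.
class ModelDriver < CoreDriver; end
class UiDriver < CoreDriver; end

DRIVERS = { "core" => CoreDriver, "model" => ModelDriver, "ui" => UiDriver }.freeze

# Pick the driver from an environment variable (STEP_DRIVER is an assumed
# name). env is a parameter so the choice can be exercised without
# touching the real environment.
def build_step_driver(env = ENV)
  DRIVERS.fetch(env.fetch("STEP_DRIVER", "core")).new
end

driver = build_step_driver({ "STEP_DRIVER" => "model" })
driver.record_sale("Ann", 40_000)
driver.record_sale("Ann", 60_000)
puts "#{driver.class} reports $#{driver.total_sales('Ann')} for Ann"

# With Cucumber loaded, a step definition could then be a one-line
# delegation, roughly:
#   When(/^(\w+) sells \$(\d+) of widgets$/) do |name, dollars|
#     driver.record_sale(name, dollars.to_i)
#   end
```

Since all three drivers respond to the same messages, the regular expressions live in exactly one place, and changing the text of a step means editing one step definition instead of three.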
This worked so well for me on this project that were I to use Cucumber again, and I will if I find the right project for it, I would write a step driver for that project even if I wasn't gonna use this multi-layered approach. And as I mentioned, these commission plans came to us as dense, like 15 pages of legalese that we then had to translate into code. And it was just too complicated for me to understand that and hold all of it in my head at once. Breaking this problem down across architectural layers helped me cut it down to focus on just the core logic, and then figure out, once that was working, how to adapt that to Active Record, and then once that was working, I could worry about having the user interface display and interact with that information. Honestly, looking back at it, this was probably the only way that I could have completed this app in a reasonable amount of time. The developers who took over this project from me, it would be an understatement to say they were not as enthusiastic about this setup as I was. But after they'd worked with it for a while, they did say some things to me like, you know, yeah, I can see why you did it that way, with the implication that they would not have, but they could understand where I was coming from. And they also said it was really nice to have such clear documentation about what these business terms meant. So I'll call that a win. I do have to talk about performance. At Cascadia RubyConf in 2011, Ryan Davis gave a talk called Size Doesn't Matter in which he talked about the speed and relative sizes of various testing frameworks in Ruby. Cucumber shows up in nine different slides from Ryan's slide deck, and they basically all look like this. In every single metric that Ryan chose to present, Cucumber came in dead last. And when I saw him give this talk, I sort of laughed and I also winced because at the time I was dealing with that 90-minute test suite. 
And so when I started this project, I was fully expecting to pay a huge performance penalty. I didn't care, it was worth it to me. With that in mind, here are the numbers from this project. There were 64 scenarios that were tagged with core. They ran in under a second. At the model layer, there were 118 scenarios. They ran in under eight seconds. And at the user interface layer, I did use Capybara to drive the application. I didn't care about JavaScript for these, so I was able to get away with using rack_test instead of driving a full graphical browser. But even with that advantage, these ones were by far the slowest. I had 11 scenarios, they ran in three seconds. And of course, these are the times reported by Cucumber. So they don't include the time that it takes to load all of Rails. I had a rake task set up to run the wip versions of all three layers and then run the passing versions of all three layers for a total of six separate test runs. And because I was using an environment variable to choose the step driver, I had to load Rails six times. The total wall clock time for that rake task came in at about 40 seconds. So yeah, Cucumber is no Minitest. It's never gonna run thousands of tests per second, but I was pretty happy with these numbers. In conclusion, if you do decide to try Cucumber again, and you haven't been happy with the way it's worked in the past, I would suggest that you keep just these two things in mind. First, your features should describe your domain, not your interface. And second, the step definitions are lava. I would like to clear the stage as quickly as possible for the next presenter, so I'm not gonna keep you around here for Q and A, but if you do have questions or feedback or if you would like some more stickers for your collection, please come talk to me after I pack up my stuff. Thank you.