about your test suite. So just to kick things off, I want to ask: how many of you wish that your test suite ran faster? There we go. That's almost everybody. Some people didn't raise their hands, so I'm going to rephrase that: how many of you are very happy with how fast it runs and don't want it to go any faster? It's infinitely fast? I don't know. There you go. Okay, so some of you would be dividing by zero — an infinite test suite. All right. So continuous integration is great: working, testing, and then committing code that always works is real nice. The problem is when we work for 30 minutes and then our tests take 30 minutes — it's just not a very efficient process. So a lot of people at that point say, hey, let's set up a CI server to run our tests for us. Now, CI servers are useful for replicating production environments, but this usually ends up getting you out of sync with your actual testing process. Your feedback loop is still 30, 40 minutes behind. Some people have told me that their feedback loop is a day behind, and that can really cut into your productivity. So instead of that, let's make our tests faster. So I'm going to talk about the project. The project is a real production project. It has been running for a couple of months now. It is for a large company. This is real code doing real stuff — you're not going to see any synthetic benchmarks here. It's running on Rails. I've heard a couple of people say, "oh no, not Rails." For all intents and purposes of this talk, Rails is just a large Ruby application. I'm not actually going to show anything that's specific to Rails. I'm barely even going to show stuff that's specific to Ruby. A lot of this applies all the way down to the kernel and file system layers. We are using Factory Girl. Factory Girl is awesome for building objects. 
We're using Shoulda, which is a layer on top of Test::Unit, but the stuff I'm going to talk about today applies to Test::Unit, Shoulda, RSpec, Cucumber, even JSLint. We're using Paperclip for images on our site. It's a really great tool. And we do something called empty-DB testing. What that means is that when we load up our test environment, we don't have fixtures. We have an empty database, and all of our tests are responsible for creating their own scenario in which to run, testing inside that scenario, and then tearing it back down. So just to start things off: I am going to be showing a lot of benchmarks, so I thought I would let you know what I was running them on. It was a 2.4 GHz quad-core machine, 4 GB of RAM, and a standard spinning-platter hard disk. I actually did all these benchmarks in one day on one git commit, by rolling up all the little tips and tricks that I had, so these are actually pretty solid. It's not like I did them over the course of a couple of months as the code was changing. These are the right stats. It's not a huge project; we did it in about two months. We've got 4,000 lines of tests and 3,000 lines of code. So just to give you an idea, if you wanted to benchmark your own test suite and you had, you know, 8,000 lines of code, you could look at my numbers, double them, and see where I'm at versus where you're at. Okay, so the vanilla test suite — this is before I really did anything to it; this is where we were saying, "hey, maybe we should get a CI server," and I said, "hey, maybe we should just speed up our tests" — was taking 13 minutes and 15 seconds. So at this point I'm going to ask you guys to participate a little bit more. We're going to play The Price Is Right. As I go through this presentation, we're going to be knocking that test suite number down, and what I want you to do now is think about how fast you think I can get this test suite to run. So think of a number, write it down. 
If you really want it set in stone, tweet it, and later on you can see how good your prediction was — or post it in the back channel, something like that. Really think about it. Each time we get a number, we'll see who's still left. Okay, so... total tests, right — how many total tests? I'm not sure exactly. I think we had near a thousand assertions, but I don't remember, sorry. Okay, so at this point everyone should have guessed something lower than 13 minutes and 15 seconds. Is anyone getting knocked out already? Okay, all right. I figure I might as well go for the low-hanging fruit first, right? Okay, so fast_context. fast_context is a faster context for Shoulda. In this case, we have a context that has a setup, and then we have two should blocks that run inside that context. Both of those should blocks — both assertions — run after that setup is run. This is also a lot like before(:all) in RSpec, and Before blocks in Cucumber. So the idea is that you're setting up some stuff, you're doing some tests, and then you set it up again and do your other tests. All you have to do with fast_context is drop it right on the front there — say fast_context — and that's it. What this becomes equivalent to is this, where the setup goes first, and then both assertions run after running the setup only once. Okay? So in this case, I saved doing that setup twice, because I only had to do the setup one time for my two assertions. So, big warning on this one: you must have side-effect-free should blocks. That's just good practice in general. Do your work in your setup, and check it in your should blocks with assertions. Don't do any work in those assertion sections. If that holds, you could even just alias context to fast_context, and you should be okay. 
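The mechanics can be sketched in plain Ruby. This is a hypothetical toy, not the gem's actual internals: a classic context re-runs its setup before every should block, while a "fast" context runs the setup once for the whole group.

```ruby
# Toy model of the fast_context idea (hypothetical, not the real gem).
class TinyContext
  attr_reader :setup_runs

  def initialize(fast)
    @fast = fast
    @setup_runs = 0
    @shoulds = []
  end

  def setup(&block); @setup = block; end
  def should(&block); @shoulds << block; end

  def run
    if @fast
      @setup.call               # fast_context: setup once for the group
      @setup_runs += 1
      @shoulds.each { |s| s.call }
    else
      @shoulds.each do |s|      # classic context: setup before every should
        @setup.call
        @setup_runs += 1
        s.call
      end
    end
  end
end

slow = TinyContext.new(false)
slow.setup { @user = "alice" }
slow.should { raise unless @user == "alice" }
slow.should { raise unless @user.size == 5 }
slow.run
puts slow.setup_runs   # 2 -- setup ran once per should block

fast = TinyContext.new(true)
fast.setup { @user = "alice" }
fast.should { raise unless @user == "alice" }
fast.should { raise unless @user.size == 5 }
fast.run
puts fast.setup_runs   # 1 -- setup ran once for both assertions
```

This is also why the side-effect-free warning matters: in the fast version, the second assertion sees whatever state the first one left behind.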
This is really the only big warning that I've got so far, because this does change how your tests work a little, but if you've got side-effect-free should blocks, then you should not see a change in your coverage. Okay, so that was a quick one, really easy. I mean, you do have to go through your files one at a time and flip that around, but let's see what we get. Okay, so I'm down to five minutes and 32 seconds by combining setups like that. The reason is that we had a lot of heavy functional tests. In a functional test, you have a get for, let's say, the index, and then you have like five assertions: I got a bunch of users, there are a bunch of these tags in my view, it set up this query, maybe it posts, maybe it destroys something here and there. So by combining that pretty heavy get request into one test with a bunch of assertions, we were able to save a lot of time. So at this point, who's still playing the game? Who have I still got? Good, okay. Don't want it to be too easy. Here's a pretty graph — it'll get prettier. So can we do better? Well, I've only been up here for about 10 minutes, so let's hope so. All right, Paperclip. Let's talk about Paperclip a little bit. Paperclip calls ImageMagick. ImageMagick is awesome, it's powerful, it's incredibly useful, but like many powerful, incredibly useful things, ImageMagick is slow. So when you ask for the geometry of a file, to see what its dimensions are, Paperclip actually shells out to the identify command to identify that file. So not only are you running ImageMagick, you're shelling out to it. When you make a thumbnail, you're shelling out to the convert command. And as we're running these tests, we've got some models that require images on them, and we're creating all these images all the time. So, there was a great talk yesterday by Aman Gupta about debugging. 
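A rough, self-contained sketch of the Paperclip mock described next. In the real suite this would reopen the gem's own classes — `Paperclip::Geometry.from_file` and `Paperclip::Thumbnail#make` are the two spots that shell out to `identify` and `convert` — but the stand-in module below carries the same idea so the sketch runs on its own; names and paths are illustrative.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical stand-in for a monkey-patched Paperclip (not the real patch).
module MockedPaperclip
  Geometry = Struct.new(:width, :height)

  # Instead of shelling out to ImageMagick's `identify`: every file is 100x100.
  def self.geometry_for(_path)
    Geometry.new(100, 100)
  end

  # Instead of shelling out to `convert`: copy a fixture image into the place
  # the app expects, so a real file ends up on disk without running ImageMagick.
  def self.make_thumbnail(fixture_path, destination)
    FileUtils.cp(fixture_path, destination)
    destination
  end
end

# Tiny demonstration, with temp files standing in for image fixtures:
Dir.mktmpdir do |dir|
  fixture = File.join(dir, "fixture_thumb.jpg")
  File.write(fixture, "fake image bytes")

  geom = MockedPaperclip.geometry_for(fixture)
  puts "#{geom.width}x#{geom.height}"        # "100x100", no `identify` run

  thumb = MockedPaperclip.make_thumbnail(fixture, File.join(dir, "out.jpg"))
  puts File.exist?(thumb)                    # true -- a real file, no `convert`
end
```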
perftools.rb was one of the ways that I found this issue — by looking at where time was being taken in the test suite, it really keyed me into the solution. So if you didn't see that talk yesterday, definitely check out the video. So I made a little mock, a monkey-patch thing. It just says: if you want to know the dimensions of a file, it's 100 by 100; and when you want to make a thumbnail, I'm going to copy my fixture file in where you expect it to be, so you even have a real thumbnail there. It's a real file on the file system, but you don't actually have to run ImageMagick. So, a little mock, also very simple. And the results: three minutes and 34 seconds. So who's still here? Good, good — because that was the easy stuff. That was the "oh hey, this is slow, let's mock that out" stuff. Now we're going to get a little bit more hardcore. So of course we can do better. Next, I looked at my system usage graph. This comes up in the toolbar in Ubuntu — I love this little thing. This is my CPU usage during a test suite run. I told you I had a quad-core machine, and it's really not being utilized: one core is doing all my testing, and three cores are for me to goof off on while my test suite is running. And I do really love internet videos, but even those only need one core, so I could probably switch this around, give the tests a little more of my processing power, and get by on one core for YouTube. So let's talk about multi-core testing. There are a lot of existing solutions: parallel_specs, Tickle, DeepTest, Specjour, and probably more that I've left off this list. There are some downfalls that I found with a bunch of them. For example, Tickle, parallel_specs, and Specjour pregroup your files. They will take all 100 of your test files and say, hey, you've got four cores: 25 here, 25 here, 25 here, 25 here. If they're in alphabetical order, guess what? 
All your Cucumber stuff is going on the first core, and all your unit tests are going on the last core. It's not a very good balancing solution. Actually, I'm not sure about Specjour. (Oh, it doesn't pregroup? Awesome, I will change that. Good, that's great.) So, Tickle only does Test::Unit. For me it was kind of like, okay, we've got Test::Unit, but what if we want to do some Cucumber stuff, what if we want to do RSpec? We're kind of stuck there. parallel_specs, which I believed was RSpec — it was local only, and it required a multiple-database setup, which isn't that bad, but it's just one more thing to do. Oh, sorry, Specjour is the RSpec one. So I kind of wanted something that was a lot more robust and could do a lot more stuff. Yeah, and Specjour used Bonjour for networking, so it only works over the LAN; if you wanted a cloud computing solution you'd have to tunnel ports and stuff, and you also have to run a daemon on the server. This led me to DeepTest, by ThoughtWorks. It uses sockets, it uses multiple databases, you run remote daemons. It has a very difficult setup, and it's very powerful. I was going to put the readme on here, but it's like this long on my screen — it's huge. They use it extremely well, but I wanted to make something that was more useful for those of us who are doing this every day and don't need a big crazy system, but would like to have some power available if we need it. So that's why I wrote Hydra. Hydra does Test::Unit, it does Cucumber, it does RSpec, and it will do JSLint. I think Jasmine's probably going to be a nice next step there. It does active balancing: after it's booted up all the workers that it's going to be working with, it sends out messages to those workers, gets the results back, sends out more tests — it tries to balance that as much as it can. 
It will even learn from your test suite — it records metrics on your test times and runs the slow ones first, to try and keep your cores as balanced as it possibly can. And there's a simple setup: there are no sockets, there are no daemons that you have to run on a server. This is how to use Hydra in your Rakefile. You'll see it's all one task, and all I'm doing is adding my files in. This is the one task that I added: my unit, my functional, my integration, my specs, and my JavaScript stuff. That's it — there's just one task. You add it all in like this and you can run Hydra on your system. At this point, it'll only run on one core, so you have to configure it. In this case, I have one local worker that has two runners, which means the worker corresponds to my computer and the runners correspond to my cores, so this is a dual-core setup. But still: three lines of YAML, a little bit of Rake task, and you're going. One thing I want to talk about is environment loading. If you've ever looked at Test::Unit or Cucumber, you'll see that they actually double-load your environment. With Test::Unit, you can end up with four environment loads: you have one that boots up your Rails project, then it shells out to your unit tests, which load the environment again, and then it shells out to functional, and then it shells out to integration — so you get four environment loads. Cucumber is a bit better: since it doesn't have those three types of tests, it boots up your environment and then shells out once more. It does have a fork option, though, which can help. RSpec: one environment load. Hydra has one environment load for all of your frameworks, so it will only boot your environment once, and then it will run your unit tests, your Cucumber stuff, and your RSpec, all within that same environment, without having to reload it. 
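The setup described above — one Rake task plus a few lines of YAML — might look roughly like this, based on Hydra's documented usage (the glob paths are examples, not the actual project layout):

```ruby
# Rakefile -- one Hydra task covering every framework (example globs)
require 'hydra'
require 'hydra/tasks'

Hydra::TestTask.new('hydra') do |t|
  t.add_files 'test/unit/**/*_test.rb'
  t.add_files 'test/functional/**/*_test.rb'
  t.add_files 'test/integration/**/*_test.rb'
  t.add_files 'spec/**/*_spec.rb'
  t.add_files 'features/**/*.feature'
end
```

And the worker configuration, e.g. in `config/hydra.yml`:

```yaml
# one worker (this machine) with two runners (one per core)
workers:
  - type: local
    runners: 2
```

Then `rake hydra` runs the whole suite, with a single environment load shared by every framework.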
It does that by forking workers, so you get great copy-on-write memory savings, and that's how you avoid rebooting your workers — the environment just gets copied along with the fork. So, a couple of warnings on this one. You need to have transactional tests, and you need to have independent tests. What I mean by that is: this is a concurrent system. Can any two of your tests run any part of their testing at the same time as each other? You need to be able to say yes. If you've got fixtures, if you have fixed records in the database, or you have the same path hard-coded into a bunch of tests, you'll end up colliding with your own tests while they're running. But if you're using something like Factory Girl, and you've got empty databases, then you just spin up whatever it is that you want to work with: you build a bunch of objects, you work with them, you do your tests — yes, in the same database. You don't even have to set up multiple databases, because when Factory Girl builds different objects, they'll have different IDs. So the results of going multi-core: we're down to 1 minute and 26 seconds on the quad core. So who is still with me? Good, good. There we go, it's getting smaller. Can we do better? Of course. This is what my system monitor looks like now. Up top is the CPU, and on the bottom is the hard disk. Anybody know what the red stuff is? Right — the red stuff is IO wait. So I was looking at file systems a little bit. I'm running Linux, so I've got ext4. I looked at the journaling options. If you're running data=journal, all data is committed into the journal before being written to the main file system. This is a very safe and excellent way to run your production servers. But there's another option, data=writeback. Here, data can be written into the main file system after its metadata has been committed to the journal. 
So that means it can write some metadata into the journal saying "I'm going to write a file there," then go back to the process and say "we're good," and then whenever it feels like it, the kernel can flush the data to disk. Like the docs say: this may increase throughput; however, it may allow old data to appear in files after a crash. So: "may increase throughput" — thumbs up, that's what we're here for. "Old data after a crash" — ah, it's just my fixtures, right? I don't really care about that. If somebody pulls the power plug and I lose my test fixtures, that's fine with me. Don't try this in production. You never want to have to say, "oh no, my credit card transactions." So please — this is for testing. Don't deploy this. Also, on Linux, the file system will record the access time when you read a file. That means that whenever you read a file, you also have to write to that file's metadata. This is really poor in terms of performance, but it's still around because of old-school mail clients like Mutt, which detect whether you've read a message by checking the access time on the file system. So there's a noatime option, which means "don't update the access time." If you're using Mutt, please don't do this. If you are not using Mutt, this is a very nice way to improve your file system performance. So here's where it was before, and here it is afterwards. There are two things to notice. The first is that the red stuff is a lot less prominent. But also look at the hard drive — see what the minimum amount of usage is across the test suite run. Before, we had more gaps: more stretches where we don't really write anything, and then spike one big write. Afterwards, the kernel is able to go straight back to the process and say "your data is good, keep going," and the process can continue. Also, you'll notice that the average height of that CPU blob is a lot higher, because it's able to keep the CPU busy the whole time. So the data's not flushed immediately. 
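For reference, this is roughly what those two tweaks look like on an ext4 test box. Device, mount point, and pass numbers are illustrative — and again, this is for test machines only, never production:

```shell
# Hypothetical /etc/fstab entry for a test machine:
# data=writeback -- only metadata is journaled; data hits disk when the
#                   kernel gets around to it (stale data possible after a crash)
# noatime       -- don't write an access-time update on every file read
/dev/sda2  /home  ext4  defaults,data=writeback,noatime  0  2

# noatime can also be applied to an already-mounted filesystem:
#   sudo mount -o remount,noatime /home
# (data=writeback generally has to be set at mount time, not via remount.)
```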
It's going to stay in the kernel's page cache and get flushed out when it's convenient. This is as fast as a RAM disk — I tried it: the code base, the working directories, and the database itself, purely in RAM, and I did not get a speedup beyond maybe a 2% or 3% margin of error. This is as fast as a solid-state disk — my company actually bought me one to see if it would help. It didn't, but they let me keep it. So you don't really need to drop all that money on one; just do a couple of tweaks and you're good. (That said, SSDs are really awesome and they make a lot of other stuff fast.) So, the results of these tweaks: 57 seconds. Who got that far? Good. Wait, hold on, keep your hands up for a second. One, two, three, four, five, six, seven, eight, nine. All right. Okay. Can we do better? They're clapping — I'm not done yet! All right: Ruby Enterprise Edition. The best way to install it is with RVM. I'm sure all of you know about RVM at this point — RVM is awesome. Ruby Enterprise Edition comes with tcmalloc. It's the fastest malloc we've seen, and it works particularly well with threads. So, Hydra and Factory Girl mean a lot of memory allocation: we're building up a lot of objects and tearing them back down, and the database driver is allocating memory for all this new stuff too. And there are a lot of threads and forks — I'm forking to run my test processes across those workers. So in addition to tcmalloc, we'll do some garbage collection tuning. These are Twitter's settings. They turn it way up, because with Rails you actually need a lot more memory than with normal Ruby processes. The Ruby garbage collector is tuned for small applications, to get going quickly and use a small amount of RAM. If you're booting Rails, you get a whole bunch of allocations right off the bat. So the first thing Twitter did is turn it up; my settings are anywhere between two and twenty times greater than Twitter's. 
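The "Twitter settings" mentioned above are the values Twitter published for Ruby Enterprise Edition's patched garbage collector. Exported as environment variables before the suite runs, every forked worker inherits them (values shown are Twitter's published baseline, which the talk then multiplies by 2-20x):

```shell
# Twitter's published GC settings for Ruby Enterprise Edition (test shell):
export RUBY_HEAP_MIN_SLOTS=500000          # start with a big heap...
export RUBY_HEAP_SLOTS_INCREMENT=250000    # ...grow it in big steps...
export RUBY_HEAP_SLOTS_GROWTH_FACTOR=1     # ...linearly, not geometrically
export RUBY_GC_MALLOC_LIMIT=50000000       # allow ~50 MB of mallocs between GC runs
```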
I usually end up with a couple hundred megs of memory allocated when I boot my environment, and the GC only runs once or twice. So I just give it a huge slab and let it go. So, the results of tcmalloc for memory allocation plus the garbage collection tuning: eighteen seconds. All right — let's see it. One, two, three people. How many of you have seen this talk before? Any cheaters? Oh, good, no cheaters. All right, who guessed one second? No? Sometimes I get people who've seen The Price Is Right: "well, I've played this game before." All right. Eighteen seconds — not bad. Thirteen minutes, fifteen seconds, down to eighteen seconds. That's 44.17 times faster, which, when you multiply it by a hundred to make it look bigger, is 4417%. I never touched my app code, not once. I didn't change my test coverage, assuming that you have side-effect-free should blocks in the case of fast_context, and that you're happy with mocking out ImageMagick. But for all intents and purposes, I'm still loading my pages, still reading this stuff, still writing this stuff. I actually don't like to mock — I like my unit tests to hit the database, so these are all hitting the database. What if you earned 4417% more this year? That's a big number. Think about it in a different context, right? You could buy a few for everybody, like at the party last night. All right — I see so many MacBooks around, so I figured I'd toss this in: I had one of my coworkers run this on a MacBook Pro, and it was fifty-seven seconds. And actually, it's a little bit faster now, because the garbage collection stuff wasn't in place at the time — so it may even be a little quicker, but I didn't want to fudge it. So, yes? [Question: what kind of database was it?] In this case, it was Postgres. [Really? On the same server?] Yes. Postgres has a great option: it lets you do write-back at the Postgres level. 
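That option — assuming a reasonably recent PostgreSQL (8.3 or later) — is asynchronous commit. A hypothetical `postgresql.conf` fragment for a test-only database:

```conf
# Test database only: commits return before the WAL hits disk, and the
# WAL writer flushes in the background on a delay. A crash can lose the
# last few hundred milliseconds of transactions -- fine for fixtures,
# never for production data.
synchronous_commit = off     # don't wait for WAL fsync on commit
wal_writer_delay = 200ms     # background WAL flush window
```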
So I've got write-back going on at the file system level but also in Postgres, with a 200-millisecond write-back window, so the performance is excellent. So, can we do better? Well, now it's kind of cheating. So you guys win — the three of you, and the big guy; we're going to stick with that. Distributed Hydra. If you can open an SSH session to another machine — let's say you've got another laptop sitting around — you can run a test file on it. So why can't it help? That's really the minimum requirement for having a computer help you run tests, so that's what I decided to make the minimum requirement for Hydra. I took the pipe-based messaging system that it uses locally and ran it up and down the SSH connection. So the configuration gets a little more complicated. You'll see there's a local worker with four runners — that's for your machine. Then what you do is add in an SSH worker: you tell it how to connect, where your test code is, and how many runners to fork on it. Then there's a sync entry that you can put in your YAML file that will use rsync over the SSH connection to synchronize your directories and copy the code over. And while it's copying the code over, Hydra gets started locally on your tests, so by the time the remote worker boots up, it starts helping as quickly as it can. So, really simple. Results? 18 seconds. I didn't go anywhere. And that's really just because there's a lot of overhead at this point. I crunched the numbers, and about six or eight seconds of that 18 is Rails loading. And how long does it take to SSH into a machine? Maybe two seconds. So the remote worker wasn't able to actually help much. So what I decided to do was get a different benchmark. I went to another project. We were having a lot of trouble making it concurrent, because it was using a more old-school, fixture-style way of testing. 
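Putting that together, a distributed `hydra.yml` might look like this. Host names and paths are made up, and the key names follow Hydra's README as best as can be reconstructed — treat it as a sketch:

```yaml
workers:
  - type: local
    runners: 4
  - type: ssh                    # any box you can `ssh` into can help
    connect: user@other-laptop   # hypothetical host
    directory: /home/user/app    # where the synced code lives remotely
    runners: 4
sync:
  directory: .                   # rsync'd over the SSH connection
  exclude:
    - tmp
    - log
```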
So we threw another box at it. With one machine it took eight minutes and 47 seconds; we gave it another machine, and it was 5:33. So it is faster. There's a wiki page on Hydra of success stories — we've got people showing they used the distributed stuff to get 6, 7, 800% speedups. Obviously, you can just keep throwing machines at it. So I wanted to end with this: can you do better? That's not a challenge; I think it's more of a call to arms. I was really surprised that I was able to do this well. I've found that in a lot of the areas I look into in Ruby and Rails, somebody else has already done a better job, so the fact that I was able to actually accomplish something kind of made me think, hey, we're not really trying very hard at this, because I can't be this great at it. So I think there's a lot of stuff we can do to make our test suites faster, and I think tightening that feedback loop is really going to help the quality of our projects. So please try to make your test suites faster — bookmark this and come back to it. Any questions? Yeah — would it be worth doing on a single-core machine? Hydra? Yes, because it will immediately get rid of all those environment loads. I actually created a Rails project where I had three tests: a unit test that asserted true, a functional test that asserted true, and an integration test that asserted true. If you run that with just rake test, it took me 14 seconds. But when I did it with Hydra, with only one environment load, it took me four seconds. So you can at least cut off the environment loads using Hydra. And because people often have a lot of different frameworks — you'll run the RSpec rake task, then it shuts down, then it'll do the Cucumber task, whatever — you can run all of that together in Hydra, and that also gives you gains. Although honestly, Hydra's probably the most difficult of these solutions. 
So I would definitely suggest that you start at the front of the presentation and work your way back. Anything else? Oh, I've just been cut off. If you want to talk, we can talk afterwards.