Those of you who've been in the Ruby community for a while have watched the evolution of testing methodologies and frameworks. Five or seven years ago, relative to where we are today, testing kind of sucked. We were still in the throes of discovering dynamic languages and hadn't really figured out how to make up for the loss of static type checking in the compilers; we were just realizing we needed to. Now we've gone well past that, into the realm of testing things we never used to test, and we model user behavior with frameworks like RSpec and Cucumber, and it's awesome.

The thing is, the responsiveness of our applications is arguably just another feature if you think about it in terms of the user's perception. A slow application might as well not have any features at all. So now we're into the realm of load testing. Many, many people worked on the solutions that got us to where we are today with functional and unit testing, and we're hoping to make a small step forward for load testing. We're going to try to drop some knowledge here, we've got an interesting demo if everything goes well, and then we'll open it up. I'll hand it off.

So, I'm an ops guy. Unit and functional testing have really taken a step up, and maybe more importantly, we can tie them to CI, and continuous integration is wonderful. I think we can all agree that load testing is as critical as unit and functional tests: we've got to know our application's limits, because we have to actually care about the user experience. And it should be tied to CI too. The only way we can be confident about our stacks is if we load test as part of every commit we ever do, as far as possible. Maybe most importantly, as an ops guy, nothing's more frustrating than a developer telling you, "Well, it worked locally, so surely it's an operational problem." And then we get in fights.

Luckily there are plenty of tools, maybe not as many as for unit and functional testing, but there are some. There's ab, there's Siege, and Bees with Machine Guns is a wonderful app from the Chicago Tribune news apps folks that spins up many small micro instances to hammer some server. You've seen this before: here's the output from ab, lots of numbers; here's the output from Siege, lots of numbers. When you're looking at a lot of these runs, it's very hard to read them. We get some numbers: successes, errors, what have you. We can drive these tools via the CLI, so I can tie them into CI easily; there's a sketch of that below. But what am I really measuring? I don't care how Apache responds; I'm fairly confident in Apache. I care about going through a real user experience: maybe I'm bound by the database, maybe by IO, maybe by logging. You don't know until you actually test it. And since some folks find that boring, there's httperf (I love httperf), and there's autobench, which is a wrapper around it. The idea is to ramp up connections and see where our connections, or our user experience, fall off the table.
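Roughly, that CI sketch looks like this. It's a made-up example, with a placeholder URL and an arbitrary throughput target, just to show the shape of a CI gate around a CLI load tool:

```javascript
// Hypothetical CI gate: shell out to ab and fail the build if throughput
// drops below a target. URL and threshold are placeholders, not real values.
var exec = require("child_process").exec;

var TARGET_RPS = 500; // assumption: whatever your performance goal is

exec("ab -n 1000 -c 50 http://staging.example.com/", function (err, stdout) {
  if (err) { console.error(err); process.exit(1); }
  // ab prints a line like: "Requests per second:    612.03 [#/sec] (mean)"
  var match = stdout.match(/Requests per second:\s+([\d.]+)/);
  var rps = match ? parseFloat(match[1]) : 0;
  console.log("measured:", rps, "req/s; target:", TARGET_RPS);
  process.exit(rps >= TARGET_RPS ? 0 : 1); // nonzero exit fails the build
});
```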
These give nice output too, but what they add is standard deviation: the idea that I don't care about the mean, I care about the users in the extremes, the edge cases. We get connections versus requests versus latency, and that's one possibility.

This is one of those places where, for those of you who haven't run into it, maybe we can help you avoid a pitfall. Connections are a great example of an area where you can go badly wrong if you don't control the whole stack of your testing. If you're trying to simulate user behavior (and this is something httperf, for example, handles really well), you want to avoid pooling connections and reusing sockets, things that in many other contexts you actually want to do. I think Jeremy talked earlier about wanting to keep a connection alive as long as possible. Normally you want that, but if you're trying to simulate independent users, not only are your users not on localhost, they're also not going to be sharing a socket, except in very strange and rare circumstances. So it's an area where you want to make sure you have control over what's going on; there's a concrete sketch of this below. Do you want keep-alive or don't you? Do you want separate connections opened, or do you want to reuse sockets?

Perhaps one of the bigger problems is being bound by the tester: my tester falls over before my stack does. So, of course, let's distribute it. Autobench comes back up here; it can farm httperf out to many machines, so we have many testers hammering the server. And your friendly sysadmin probably likes plain old SSH loops. Here's an example where we're running httperf across eight boxes: I just run it through an SSH loop, a for loop over a list that backgrounds the processes, and I get a big dump of shit on my screen. Again, you're trying to make sense of it and figure out: is architecture A better than architecture B? I don't know. I can't tell. It's confusing; the numbers all run together after a while, and is that 1,414? So the idea is: I push the results up to a Gist, I curl that Gist, I pipe it through some ugly one-liner, and I get output I can iterate over. All I care about is the numbers, plus knowing I can go back to the Gist for the really specific data; I just want something I can pull and iterate over quickly.

Some folks have extended this idea: there's Funkload, there's Tsung, there's JMeter. Dan's played with a few of these. Well, not necessarily personally, but as an organization we did look at both Tsung and Funkload; the short version is, or not Funkload, JMeter. We didn't really want to deal with Java to run JMeter, and we didn't want to deal with all the XML. But they're on the right track, and this reminds me again of the evolution of these kinds of tool sets. If we look at performance and responsiveness as a usability feature, this is really just an extension of what we already do with TDD and RSpec and Cucumber. These tools are going in the right direction: we want to distribute the load just like real users are distributed, and we want to model actual user behavior just as we do with Cucumber. We just didn't quite feel they fit, and personally, I'm not casting stones.
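Here, by the way, is that socket-control sketch: what explicit control over connection reuse looks like in node. The host is a placeholder, and this is a sketch rather than our actual harness:

```javascript
// Sketch: simulating independent users means no shared sockets. Passing
// agent:false opts this request out of node's connection pooling, so every
// request opens (and closes) its own TCP connection.
var http = require("http");

function independentUser(path, done) {
  http.get({
    host: "staging.example.com",       // placeholder host
    path: path,
    agent: false,                      // fresh socket per request, no pooling
    headers: { "Connection": "close" } // and don't keep it alive afterwards
  }, function (res) {
    res.on("data", function () {});    // drain the body so "end" fires
    res.on("end", done);
  });
}
```

Swap agent: false for a shared keep-alive agent and you're measuring a completely different thing, which is exactly the point: you should be the one choosing.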
There's a lot of great work that's gone into these, but we didn't feel they quite fit our requirements. For example, I wanted to love Funkload. It's called Funkload! But I spent more time trying to decipher the documentation than testing, and I went back to what I knew. There are also some folks starting to treat load testing as a service, some hosted JMeter providers. Blitz.io is one; they provide these nice charts, response times, hit rates, lots of JavaScript. But perhaps the most important thing to remember is that the ultimate number we care about is how many users fit on a box, and therefore how we're going to scale our stack. From that data we infer dollars per user.

And it's so cheap (we still haven't written a line of code) and so easy that it's shocking how many people don't do this. It reminds me again of where we were five or seven years ago, depending on how early you adopted this stuff, with our functional and unit tests. Now most people would view it as a violation of best practices if you didn't write those tests; some would even say if you didn't start with them. And I read people bragging all the time about the latest benchmark of this Ruby server, or that Ruby server, or what node is doing, or whatever the hell. So clearly people get off on these numbers, but we're not yet at the stage where we say: let's start with our performance goals, and let's have a testing strategy in place from the beginning. And I say that as somebody who, I think, does it a little earlier than a lot of people; we're still not defining it really up front, but we're trying to get there. And while these tools aren't quite all the way there, we can treat this much like the testing problem years ago: identify it, then make it better.

So, a gift for you. We have an AMI. It's public, and it has all these tools and many more. I like Gists; Shane Becker put one up just a couple of months ago about how he did load testing, and it's interesting. So you've got a bunch of stuff in there, a bunch of projects, and it's yours. We also tuned the kernel, so you're not going to get blocked by a ulimit, and you're not going to get blocked by, I don't know, max backlogs or TIME_WAIT sockets. This is the product of many, many iterations and a lot of learning, packaged up: it's the AMI we actually use when we spin up a box to do load testing, so we thought we'd share it and lower the barrier a little. Mind you, with this kernel tuning, this is not a production server. This is a load tester, where the kernel will not save you: your shit will fall over. But that's the idea. I want to make my stack fall over first, and if it doesn't, I simply need more load testers. You can get this from the AMI community repos, and I'll put it up along with some repos on spire.io, maybe some Chef recipes, so you can build that box from the repos. And you should use it.
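To show the kind of thing that tuning is about, here's a minimal pre-flight sketch a tester box might run before a test: check the knobs that commonly cap a load generator. The threshold numbers are illustrative, not recommendations:

```javascript
// Hypothetical pre-flight check for a load-tester box. Each command is run
// in a shell; the "want" values are illustrative floors, not recommendations.
var exec = require("child_process").exec;

var checks = [
  { cmd: "ulimit -n",                       want: 20000 }, // open file descriptors
  { cmd: "sysctl -n net.core.somaxconn",    want: 1024  }, // max listen backlog
  { cmd: "sysctl -n net.ipv4.tcp_tw_reuse", want: 1     }  // reuse TIME_WAIT sockets
];

checks.forEach(function (check) {
  exec(check.cmd, function (err, stdout) {
    var got = err ? NaN : parseInt(stdout, 10);
    console.log(check.cmd, "=>", got,
                got >= check.want ? "ok" : "LOW, this will cap your load");
  });
});
```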
And once you use it, especially in a cloud, you'll realize that AWS, if you're lucky, won't completely screw you. Let me give you an example. Before doing a minimal viable product, we went through a sprint where for about two weeks I tweaked everything I could in the kernel. We got to the point of very small incremental improvements, with numbers flying everywhere. And I started doubting my own numbers, because a test run at three o'clock would beat the same test run earlier in the day, and there's a moment of going: well, then what am I measuring? How am I supposed to iterate on this? What am I supposed to do? Here's an example: same box, same stack, I didn't reboot anything, I didn't move to a better neighborhood, and I got over twice the concurrency out of my stack from Friday to Sunday. I don't know what happened in between. "At one point you thought you were a genius or something, and it turned out that you weren't." Yeah, exactly. This happens a lot.

And when you talk to AWS about this, they'll tell you ridiculous things. "You're in a bad neighborhood." Yeah, noisy neighbors. But my favorite was the last one: "You can use as much bandwidth as is available." Well, thank you for that information. These are real responses from AWS support, by the way; we're not making them up. So we realized: there are a bunch of folks on a Xen instance, there's buffering at the NIC, maybe multiple NICs, and we're all competing for that resource, on someone else's network. Which is fine; this is why we went to the cloud in the first place. We can spin boxes up, we can spin boxes down, and we trade for this convenience.

But the thing to remember is, I'm talking about an expensive box. It's not cheap, like in that benchmark I showed you. I get an extra large because I know I'm going to be CPU bound, but I never get CPU bound, because I'm blocked at the NIC. I can get a big-memory box, but I'm never going to actually use the memory, because I can't get that many connections into the box. So I can't saturate my resources, and coming back to dollars per user, there's a moment where you have to stop and think: what's the point of the huge server? And how can you even know this? How can you know how you're going to scale, or how much your users cost, unless you test it and get some of this data?

Ops are around for a reason. We know boxes degrade, we monitor, we maybe watch our logs, and we see how much pain we've already caused users: oh great, I've got a user sitting around forever waiting for a response. And ultimately, maybe the thing that aggravates me the most is that we accept magic. On AWS we do stupid shit like: let's just bring down this box and spin up a new one; I don't know what's happening, reboot it three times. That's ridiculous. But it works. So we're going to get hosed either way. Let's figure out how badly we're getting hosed, and at least get some data. At this stage, let's understand what's happening and just see. I don't know how many people here are actively using the cloud for their applications, right?
So maybe a third of the audience is currently getting hosed. Again, a good, agile load testing strategy, consistent with what we already do with our functional and unit tests, at least gives you visibility into precisely how you're getting hosed, or the likely ways it could happen.

So now we're going to shift over into the developer part of the talk. What? Thank you for not letting me just ramble on when no one could understand what I was saying. I'm just paranoid now, that's all. I want to take a step back and talk a little bit about this area. One of the things we do really well with our functional and unit tests is model user behavior; we're getting better and better at that, and we've got great frameworks to do it. We wanted to make sure we could do that with our load tests as well. One of the best ways is to actually look at your logs, or look at the Cukes you've already written if you're doing that kind of thing, and make sure the scenarios you build into your load tests aren't just users sitting there refreshing the home page as fast as they possibly can, like they would in an ab-style test. You've already got good data about what your users actually do, so use it; there's a small sketch of this idea below.

Modeling behavior is a complex problem, but fortunately we already have programming languages that help us model complex problems. One of the issues we ran into (and again, I don't want to knock other approaches) is what you might think of as premature frameworkization. I literally just made that up, but you can use it if you want. Instead of saying, let's start with the basics, let's see if we can just script up user behavior in a dynamic language, there's this tendency of: okay, we're going to have this configuration file. We just ended up feeling that's premature for where we're at.

Another issue, certainly in our case, is that we've segregated the problem into our client apps and our APIs; a lot of you probably do the same. We know the client side runs at the scale of a user in a web browser session, so all we really need to do is make sure our APIs scale, and we'll be fine. And if you're using REST APIs, or even just lightweight HTTP APIs, that means you really need a solid HTTP client. By that I specifically mean low-level control. Jeremy earlier mentioned keep-alive, which is a great example of low-level control; there's also error handling, and making sure we're not accidentally reusing sockets when we're trying to simulate separate users. And then, of course, the big buzz, the evented I/O stuff. One of the nice things about this segregated approach, where you're slamming an API instead of using something like Cucumber driving a web browser, is that you can run a lot more users on a single machine.
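Here's that small sketch of mining your logs for a scenario mix. The weights and the helper functions are hypothetical stand-ins; in practice you'd pull the mix from your access logs, and the helpers would be your real encapsulated behaviors:

```javascript
// Hypothetical encapsulated behaviors; real ones would hit your API.
function getMessages(done) { /* GET the channel */ done(); }
function browse(done)      { /* wander a few pages */ done(); }
function postMessage(done) { /* publish something */ done(); }

// Scenario mix: weights are made up here; derive yours from access logs.
var actions = [
  { weight: 0.6, run: getMessages }, // most users mostly read
  { weight: 0.3, run: browse },
  { weight: 0.1, run: postMessage }  // few users write
];

// Weighted random choice over the actions above.
function pickAction() {
  var r = Math.random(), sum = 0;
  for (var i = 0; i < actions.length; i++) {
    sum += actions[i].weight;
    if (r <= sum) return actions[i];
  }
  return actions[actions.length - 1];
}

// One simulated user: a bounded session with think time between actions,
// instead of refreshing the home page as fast as possible.
function simulateUser(steps, done) {
  if (steps === 0) return done();
  pickAction().run(function () {
    setTimeout(function () { simulateUser(steps - 1, done); },
               1000 + Math.random() * 4000); // 1-5s pause, like a human
  });
}
```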
But we wanted to make sure, obviously, that the more users you can simulate on a box, the better, up to a point. Only up to a point, because your users are spread out, so you also want to spread out the source of the testing. Up to that point, though, we want as much bang per box as we can get in terms of load, because that brings down the cost of running our load tests, which in turn makes it easier to run them more often and make them part of, say, a continuous integration or deployment process.

So, we're braced for the boos: in our defense, we do use Ruby in the actual implementation of our stack, but we decided that, where things are right now, node is a little more baked in this regard. There's some really interesting work; I'm really interested, for example, in Tony Arcieri's work on Celluloid::IO in terms of helping Ruby get to this point. But when we started this effort, we felt node was a little further ahead and a little more baked. In particular, it has a solid HTTP core library. We wrapped that with something we call shred, a very REST-friendly HTTP wrapper that can run as a node library or in the browser. Here's a quick example; I'm not going to go through it in detail, but when I say REST friendly, I mean we wanted to be able to use HTTP in all its glory: easily set headers, deal with all kinds of different response codes, and so on. That gives us the foundation we need for scripting our scenarios.

The next piece was to start encapsulating bits and pieces of user behavior, and we just did it as functions. Here's an example where we're encapsulating a very simple behavior, testing our spire.io messaging service: we're just encapsulating a subscriber. You can see we're saying, okay, this is where the test is finished. There's nothing complicated or fancy about it; we embed right in the scripts where tests start and finish, and that gives us a great deal of flexibility in how we set up our test scenarios. A rough sketch of the flavor of both pieces follows below.

So, Carlo denies coining this term, but I think he actually did; now he's inclined to take the credit, depending on how people respond, I guess. Distributedly. We wanted to run our tests distributedly, like our users are distributed, so our tests run all over the place. We've been working on this thing called Dolphin. The reason we called it Dolphin is that the internal system we had built before was called Shark, and we figured it would be kind of cool to have dolphins attacking a shark. The idea is that we run these pods all over the world. Where are we running them today? Today we're running them from Tokyo; São Paulo, Brazil; Ireland; AWS East; and AWS West. And mind you, one of the great things about this is that a pod is just subscribing to the spire.io service: I'm not opening any ports to the pod. It's just a client, and we push data into that client. The pods then work with an overlord through a mechanism I'll describe in a second.
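Before the overlord bit, here's that rough sketch of the two pieces from a moment ago: a shred-style request with per-status-code handlers, wrapped in a plain function with explicit start and finish markers. This shows the flavor rather than the exact API (check the shred README for that), and the test object is a hypothetical harness:

```javascript
// Sketch in the style of shred: headers are easy to set and each response
// code gets its own handler, which is what "REST friendly" buys you.
var Shred = require("shred");
var shred = new Shred();

// One bit of user behavior as a plain function, with start/finish markers
// embedded right in the script. The `test` harness object is hypothetical.
function subscriber(channelUrl, test) {
  test.start();
  shred.get({
    url: channelUrl,                          // placeholder channel URL
    headers: { Accept: "application/json" },
    on: {
      200: function (response) {              // got messages: we're done
        test.finish();
      },
      response: function (response) {         // catch-all for anything else
        test.error(response.status);
      }
    }
  });
}
```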
Right now, the cool thing about the overlord is that it controls all the pods, and as a result I can run and control the tests from anywhere. I can control them from my Mac, even though I'm not constrained to what my Mac can do as far as generating load. So what I'm going to do here is tell all these pods, all over the world, to go ahead and start running these tests. And with that segue, while that's happening, we're going to come back over here: this is the last test we ran, and we'll see the new results show up here.

While that's running, I'll talk for a second about the architecture we're testing, so you have an idea of what's under test, and also about how the Dolphin architecture works. Without going into a lot of detail (we put a lot of time into this diagram, but we don't really have time to walk through it), it's a worker-based architecture. We have workers, we use Redis as our transport, if you will, and we have ELB with HAProxy for failover and more intelligent load balancing. Then we've got these dispatchers, and the dispatchers' job (this again goes to the point Jeremy made in his presentation) is to process incoming requests as quickly as possible: either turn them into tasks or return an error, and from there the workers take over. That way, especially in combination with the evented I/O, we never have long requests blocking anything; there's a rough sketch of this dispatcher and worker shape below.

Now, let's see how we're doing with our test. It's still cranking away over here; hopefully you'll see the chart update in the background once the results are in. A good use case for this: we're running in AWS because we have an AWS account and it's trivial to spin up these instances. But say we have a potential customer in China. I can have pods running on client machines, on customer- or consumer-facing networks, with very little overhead: you just have a box there, and we can actually infer how much latency each network contributes. Even simpler, obviously on AWS I don't have control over the routing, I can't tune any of that, but I might be able to do something as simple as, I don't know, ping from AWS to AWS.

Well, the demo goblins have attacked it. I swear to God, we ran this about thirty times before the presentation. Let me just see how we're doing on time, Toby. Five minutes, all right. We're going to give all the pods a little breather; that's more superstition than an actual fix. Hopefully this time it will work. We've got eight pods, okay, go ahead.

So we can do something as simple as actually understanding the user experience. In the case of California, I can have enough pods to get a decent sample on Road Runner, and a bunch more on Comcast. And maybe it turns out Comcast has a shitty connection to Northern California but is great over to Oregon. So let's move the service; let's put these users where they get their service fastest. That's what's nice about this particular load testing tool: again, very little overhead, and I don't need any ports open on the machine.
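Here's that dispatcher and worker sketch: a minimal, hypothetical rendering of the shape with Redis as the transport. The queue name, the validity check, and the handle() function are placeholders, not our actual code:

```javascript
// Minimal dispatcher/worker sketch over Redis (node_redis client).
var redis = require("redis");

// Dispatcher side: validate fast and enqueue; reply immediately, never block.
function dispatch(client, request, reply) {
  if (!request.valid) return reply(400);             // fail fast (placeholder check)
  client.lpush("tasks", JSON.stringify(request.task), function (err) {
    reply(err ? 500 : 202);                          // accepted; work happens later
  });
}

// Placeholder for the real work a worker would do.
function handle(task) { console.log("working on", task); }

// Worker side: block on the queue, handle one task, repeat. Blocking pops
// tie up the connection, so each worker loop gets its own client.
function workLoop() {
  var client = redis.createClient();
  (function next() {
    client.brpop("tasks", 0, function (err, item) {  // item = [queue, payload]
      if (!err) handle(JSON.parse(item[1]));
      next();
    });
  })();
}
```

The dispatcher never waits on the work itself, so one slow task can't back up the front of the stack; that's the whole point of the split.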
And since the pod is purely a client, for ops this is actually really, really cool. Again, the idea was to keep our tests agile, and the graph, as you can see, is a little easier to read. The cool thing about the architecture here is that we're using spire.io to test spire.io: we use the messaging service to coordinate. The overlord sends a message out to any pods that happen to be listening on the channel, which means we can just spin pods up without registering them in a configuration file or anything like that. When a test starts, the pods just sort of raise their hands and say, yeah, I'm participating, and we know, okay, we've got X number of pods running, and we go from there. Once the results come back in, instead of having Carlo write a wonky Perl script (sorry, Carlo, I'm sure it's beautiful), we munge the results together. Then we do one last step, which is how the browser updates: we send out another message to any Dolphin clients who have said, I'm interested in whatever load test results you have. And when this comes back, hopefully we'll actually see it.

Okay, so does everybody see that this looks different from the last graph? The cool thing is that instead of looking at a wall of numbers and getting a headache (and believe me, when you're trying to work in an agile fashion, it's very easy to make a colossal mistake and say, oh, architecture A is better, when actually it's an order of magnitude worse, just because you got freaked out staring at all the numbers), this makes it much easier. I can look at it and go: hey, this is weird, why did it get faster at higher concurrency? By the way, this is concurrency along the bottom and the response time for that particular scenario on the vertical axis, and we can do different variations of the graph. The regions are color coded, so we can see, hey, this is about where we're at, and no one region in this particular run looks terrible. South America, let's see; US East is a tiny bit slow; US West looks odd this week; everyone else is right there. That goes back to the variance you're stuck with in a cloud: you've got to run a lot of these, which again makes the point that it's really good to be able to run a lot of them cheaply. And for ops, the win is that it's cheap and easy to tie into our CI environment. The most important thing about load testing is that you have to actually load test, so you've got to drive down the pain of executing it, just like we did with functional and unit testing.

One last point: we're planning to release this as open source, for those of you who might be interested in how we're doing this, or in looking at the code or playing with it. We didn't quite get to the point where we were ready to do that for this conference, unfortunately. But if you're interested in kicking the tires on this down the road, playing with it, maybe using it for your own purposes, please don't hesitate to let us know. These are all the ways to get hold of us, so please don't hesitate to ping us if you're interested in working on this project. Thank you. Thanks.