Well, hello there. My name is Mike Panisi and I work at Boku. I train people on testing front-end applications. In this screencast, we're going to look at how you can use Amazon Web Services to run a distributed stress test. Before we get started, I wanted to point you to the article I wrote for Boku.com that served to inspire this screencast. It has a lot more detail about specifics, and it has a lot of references that you'll want to check out if you decide to go through with this whole process yourself. But this screencast should be good for you if you just want to see, generally, what the process looks like and what it might mean to stress test your own app. So we'll get started by looking at the Bees with Machine Guns tool. This is a tool written in Python by the folks over at the Chicago Tribune, and it's hosted on GitHub. It was built to work over the ApacheBench tool, and that's not quite appropriate for a Node app. So I've forked the project and generalized it to work with arbitrary command line commands. So you can use this fork of the project to work specifically with your Node.js app, and that's what we'll be using in this screencast. So it's defined here as bees, and you see we have a bunch of commands that we can use. Right now we're interested in starting up EC2 instances. So we're going to start up, I think we'll do 10 instances, and have them run our stress test, our client simulator. So that means we're going to start with this bees up command. Now, bees up actually takes a lot of arguments, so I'm going to cheat a bit and just paste these in. I'll step through what they mean, generally, because it takes a lot to tell this tool what it is that you want to do. So here you can see that we're saying where our key is. This is the key that we associated when we created our instance. Then the security group that it belongs to. We made this so that we could SSH into these machines, and this is how bees is operating under the hood.
It's SSHing into each one of these and forwarding our commands along. Then the zone that they're all in, and the Amazon Machine Image ID. You can get that from the dashboard here. We go over to AMIs and we'll see that it has an AMI ID. And so you can imagine if you had a lot of AMIs, it would be really important to be able to specify which specific AMI you wanted to use. Then the login, and then finally the number of servers that you want to start up. And so, like I said, we'll do 10. So I'm going to go ahead and start that. And so now they're all cutely loading their machine guns. We can go back to our instance listing here and refresh this page. And we'll see that indeed, yes, 10 new instances are starting up. They're all pending. And so this is going to take a few minutes while they all get started. So we'll wait for those to be ready and then keep going. Okay, so they're all started up now. We're ready to go. Note though that we're not actually doing anything yet. They're just sitting there idly, not attacking anything or shooting anything with machine guns. They're just up and ready to respond to our commands. So our server, which we have two windows open onto here to keep track, is here unsuspecting. No idea what's about to happen to it. The bees are pretty much idle right now. So what we can do to convince ourselves that this is working, and that the bees are actually going to do whatever we tell them to, is run a kind of test command to see what that looks like. So we'll use bees exec this time. And you can use help to find information about any of these subcommands. In this case, we'll do bees exec. We're going to put the results into results.txt and then a dash. And everything that comes after the dash is the command. So I'll put it in quotes and we'll just do date. And so this command just prints out the current date in terms of nanoseconds.
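To make the mechanism a little more concrete, here is a minimal Python sketch of that kind of fan-out: one command goes out to every host over SSH, and whatever comes back is appended to a single results file. This is not the actual bees code. The names `run_over_ssh` and `fan_out` and the host names are hypothetical, and the sketch assumes an `ssh` client that can already log in to each host with your key.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def run_over_ssh(host: str, command: str) -> str:
    """Run `command` on `host` with the system ssh client and return its stdout.

    Assumption: key-based login to `host` is already configured.
    """
    completed = subprocess.run(
        ["ssh", host, command],
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


def fan_out(
    hosts: Iterable[str],
    command: str,
    results_path: str,
    runner: Callable[[str, str], str] = run_over_ssh,
) -> List[str]:
    """Send `command` to every host in parallel and append all output to one file."""
    host_list = list(hosts)
    # One worker per host so that every host is contacted at roughly the same time.
    with ThreadPoolExecutor(max_workers=max(1, len(host_list))) as pool:
        outputs = list(pool.map(lambda host: runner(host, command), host_list))
    # Append rather than overwrite, so repeated runs accumulate in the same file.
    with open(results_path, "a", encoding="utf-8") as results:
        for output in outputs:
            results.write(output)
    return outputs
```

Calling `fan_out(["bee-1", "bee-2"], "date", "results.txt")` with real host names would leave one line of output per host in results.txt. The `runner` parameter is there only so the SSH step can be swapped for a stand-in when trying the sketch without any machines.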
So I'll send that command out to all 10 bees. All 10 bees are going to execute that command and then print the result to standard out. That standard out is going to pipe back through SSH to this local machine. And then this machine is going to append all 10 results to the same file which is, as we said, results.txt. So I can take a look at results.txt and see that yes, all 10 of the responses have been added onto this file, so there are no slouches out there. All those bees are hard at work and ready to do whatever we want. In this case, it's going to be to attack that server. And so what we'll do to get that server ready is run the server itself, or run the Node process, I should say. So we'll change into our project directory and from there into the back end code. And I'm going to export an environment variable. This is just specific to my server, so you don't have to worry too much about that. And then run server.js. All right, so the Socket.io service is started. Over in this tab, we'll use sar, as we covered in the last screencast, to track system information over the course of this stress test. So with sar, we want information about sockets. We're going to print out to results.sar and we want to do this once every second. Good, okay. Looks like we're idling at 9 open socket connections. And now we're ready to initiate the attack. So we'll use bees exec. This time, we'll actually have this do something meaningful. So each bee is going to change into the project directory. Specifically, they're going to change into the back end, into the stress testing code. From there, they're going to use the forever utility. This is a globally installed npm module that backgrounds a Node process. They're going to start it, and the client code is run from the client directory.
So my client simulator has a number of command line flags that I can pass it so that I can be a little bit more flexible when I'm running the tests and change some of the parameters. So in this case, I'm going to specify how many clients I want each one of these instances to simulate. So each one is going to simulate 500. And since there are 10 instances, we should expect to see a total of 5,000 connections coming into our back end service. And then finally, I'll specify the number of seconds that these connections should be dispersed over. Without this, all the connections would be made immediately, and that's a little bit unrealistic. So we'll have these connections occur over the course of 10 seconds. So to recap, we have our server running with the Node service. We have our statistics collecting. So we're ready to go. I'll initiate this command and, looking at the number of open socket connections, we can see that this number is steadily rising as each one of those instances attempts to connect 500 clients over the course of 10 seconds. But it's already been more than 10 seconds and we still don't have the 5,000 that we're expecting to see. So the best explanation I can come up with for this is that our server, as we covered in an earlier screencast, is running on EC2 itself. And our server is a micro instance, and I'm guessing that it's not really intended for this heavy usage. To its credit, it's trying to handle all these connections, and although we can't be sure it's not dropping them, it is steadily increasing the number of open socket connections. So there's definitely some sort of bottleneck going on here.
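The numbers behind that expectation can be sketched in a few lines of Python. This is only an illustration of the arithmetic and of what spreading connections over a ramp-up window might look like. The function names are hypothetical, and the even spacing is an assumption: the actual client simulator may disperse its connections differently.

```python
from typing import List


def connection_offsets(clients: int, ramp_seconds: float) -> List[float]:
    """Return the delay, in seconds, before each simulated client connects.

    Assumption: connections are spaced evenly across the ramp-up window.
    """
    if clients <= 0:
        return []
    if ramp_seconds <= 0:
        # No ramp-up window: every client connects immediately.
        return [0.0] * clients
    step = ramp_seconds / clients
    return [index * step for index in range(clients)]


def expected_total(instances: int, clients_per_instance: int) -> int:
    """Total connections the server should see once every instance has ramped up."""
    return instances * clients_per_instance


offsets = connection_offsets(clients=500, ramp_seconds=10)
print(len(offsets))              # 500 connections scheduled on one instance
print(offsets[1])                # 0.02, i.e. one new connection every 20 ms
print(expected_total(10, 500))   # 5000 connections across all 10 instances
```

With these parameters each instance opens a new connection roughly every 20 milliseconds, so a server that keeps up should be near 5,000 open sockets, plus its idle baseline, shortly after the 10-second mark.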
Our server is failing, but our stress test is successful in that we've demonstrated some sort of bottleneck in our system. And through sar we're now collecting a lot of data that we can come back to and pore through to look at what it is that is limiting us, and that will give us some insight into how we can better our code and our server to handle the connections that we want to handle. So given that things look about done here, what we'll do, oh, one thing: we can look over at the Node service and see that it's freaking out. This is a little bit my bad because I didn't turn off debugging, and I highly recommend that in a production environment you turn off Socket.io's debugging information. But that's all right just for this stress test and just for our learning purposes here. So now that we have our data collected and we want to start analyzing it, we should probably stand down these bees, and there are two ways we could do it. We could just kill all the instances, and then they would all be shut off and that would be done. Or we might want to leave the actual EC2 instances on and hanging around so that we can perform a few more tests before we're truly done. So we'll go through that step first. So that's just going to be another call to bees exec, except this time we're going to use the forever utility directly. Again, it's installed globally, so we don't need to change into any specific directory for this. And we're just going to say stopall. So now all processes that were backgrounded with forever will be killed. We'll go back over to the server and watch as the number of open TCP sockets drops, drops down to around 10 maybe. See if I'm right. All right, cool, 9. So we're back to idling at 9 open socket connections, and this has calmed down too. It's no longer serving anyone. So now we could run this command again. We could change some of the parameters and get more data, but we'll say that we're done.
Probably the most important thing you do when you're done is actually stand down the bees, that is, turn off the instances. And this is a very easy command from Bees with Machine Guns. It's just bees down. And so it just stood down 10 bees. We can jump back over to the control panel here and hit refresh. We'll see that these are now all shutting down. So cool. It'll take a few minutes before they're listed as terminated, and then don't freak out, because they're going to hang around for a little bit. As far as I can tell, this is just for accounting purposes, to verify that yes, you did run these instances and yes, they are shut down. And then in about 20 minutes they'll actually be cleared out from this view here. So from here you would basically just cancel your statistics gathering. You'll find that your results are all here. You can use sar, like we talked about in the last screencast, to pore over this data and start playing with the numbers and seeing what is going wrong. And that just about covers what the stress testing procedure basically looks like. So as I said at the beginning of the screencast, I recommend that you check out this article for more detail and more references. And if you have any questions about the process, then feel free to leave a comment at the bottom of it. But that'll just about do it for us today, so don't forget: with great power comes great responsibility. Use it wisely and have fun.