Good afternoon everybody, my name is David Czarnecki, and thanks for sticking around for the last session here at Ruby Midwest. By virtue of the title of my talk I was probably destined to have the last slot, so if I ever reprise this talk I'm certainly going to change the title so I can go up first. I'm going to be talking about Ruby and Rails, how we use them in the game industry, and some of the challenges we face in making that work well. Before I get started, I want to thank all the presenters for coming up here, but I'd really like it if we could all give a round of applause to the organizers over on the side. You can find me on Twitter at CzarneckiD if you're interested in video games or Ruby. I take pictures of a lot of food and sometimes tweet about beekeeping, so if you're into those things, follow me, but don't feel obliged just because I'm up here talking words at you.

I work for a company called Agora Games. We're in upstate New York, basically two and a half hours directly north of New York City. We got our start developing community sites for video games, whether that's a handheld DS title like Transformers or Transformers 2 all the way up to console titles on the Xbox and PS3. We also have a middleware product that game companies can integrate into their games to pull data out of them; we do interesting stuff with that data and can feed it back to the game, to a community site, to the web, to a mobile device. Right now our middleware platform is Python based, but really the ideas and the genesis of that work came out of all the Ruby and Rails we were doing, and that still continues to this day for certain titles. We've worked with titles you may have heard of: Call of Duty: World at War; this year we worked on Brink, Mortal Kombat, and Batman: Arkham City — I see you've got the Batman sticker on your laptop — and we've been involved with the entire Guitar Hero franchise since the beginning. So again, many of the ideas I'm going to talk about were born out of what we had to do to scale Ruby, to scale Rails and the services we have, and to scale our infrastructure. Unfortunately this isn't one of the titles we work on, but at some point I'd really like to get in contact with the folks at Covella to work on the Safari series, and not just because it's the meme of the Ruby Midwest conference.

We're also part of Major League Gaming. If you're into competitive gaming — competitive StarCraft, competitive Halo 3 and Halo: Reach — then you may know MLG as the MLB or NHL or NFL of eSports. We handle all the development for their online properties, like MLG TV and GameBattles, where you can go online and compete against other gamers, and for all their live events. Just to give you an idea of some of the data we're handling that goes through our network: 60 million plus players across the different titles we work with, and 1.3 petabytes of data in 2011 — we'll probably be somewhere close to 2 by the end of the year, as the numbers go. Here's kind of an interesting little tidbit: we talk directly to our Rails stack for all of the Guitar Hero titles prior to Guitar Hero 5. So for Guitar Hero 3 or Guitar Hero: Van Halen, if you're playing on the Wii, when you finish a song, or when you want to get the game code to link the account in your game to the community site, all of that talks directly to our Rails infrastructure. Kind of awesome, right?
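Just to give you a rough feel for what "talking directly to our Rails infrastructure" means in practice, a request from the console ends up hitting an ordinary controller action. This is only a minimal sketch, not our actual endpoint — the controller, model, and parameter names here are all made up:

```ruby
# Hypothetical sketch of a console posting a finished song to a Rails app.
# ScoresController, Score, and the parameter names are invented for illustration.
class ScoresController < ApplicationController
  # Consoles don't send CSRF tokens, so forgery protection is skipped here.
  skip_before_filter :verify_authenticity_token

  def create
    score = Score.create!(
      :player_id => params[:player_id],
      :song_id   => params[:song_id],
      :points    => params[:points].to_i,
      :platform  => params[:platform]   # e.g. "wii", "ps3", "xbox360"
    )
    render :json => { :id => score.id, :status => "ok" }
  end
end
```

The interesting part isn't the code, it's that a consumer console is sitting on the other end of that request.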
We get to write Rails and, in some small way, really shape how the online experience happens for a major title on a major console. So one of the problems we ran up against was scaling Rails. The approach I've taken in this talk — which is quite a bit different from a similar talk I gave two years ago at RailsConf — is to walk through a few problem areas and talk about how we went about working with different services and with large amounts of data. Hopefully you'll be able to extrapolate from the ideas presented here and use them to come up with solutions or approaches for your own applications, because quite honestly every application, every code base, is different. The needs you're going to have in terms of scaling are not necessarily the same ones we have.

So let's tackle the easy stuff first. If the internet is working for you, Google right now for Ola Bini on scaling. It was a really great blog post he wrote a couple of years ago about the fact that scaling isn't present in any language. You can't look at the bytecode for Java or at what gets generated in the Ruby standard library — there's not a module you can add, as in "I'm going to add the scaling module and all of a sudden my Ruby is magically going to work faster, better, harder, stronger." That's just not a thing. We have to be smarter about how we approach scaling. For us in the video game industry, it would probably be okay if we could cancel all the major holidays, if kids were in school every day of the year, if we never launched titles at all. But quite honestly, that's not the case. People never stop playing video games. Data is always coming in.

Just to drive home the point about Christmas being canceled, here's a quote from our CEO: "After Guitar Hero 3's launch and the associated influx of data that was burying us, I remember telling a random person that had just bought Guitar Hero 3 at GameStop not to go online." You cannot do that, okay? That is not within the realm of possibilities. People are going to play the game. People are going to want to use your site. The point I want to drive home with this slide is that when your company may go under, when your product is not having a successful launch, you are going to find the solution to scaling your systems that works for you. And guess what? That's fine, because you haven't necessarily done it the wrong way. Again, hopefully with what I talk about here, you'll be able to take these approaches and apply them to your own challenges.

I'll talk a little bit about process — not too much, because I don't want to inundate you with it. The problem is that we need some process for how we go about developing our software. It's nice if you can start with nothing and then build up, adding in the processes that work for your organization, whether that's standups or how you do flow in development, like GitHub flow — whatever it is. These are just a few that have worked for our team. I like to do things early: when I deploy, I like to deploy early in the day and early in the week. When we launched Guitar Hero: Aerosmith and Guitar Hero World Tour, Activision wanted to do midnight launches on the weekend. That sucks, because you're up all day getting ready for the launch.
And then the launch happens at midnight: you click the button, you deploy, you make sure all your servers are in order, and then you're up for another two or three hours fixing whatever issues came up from actually launching. So if you can, it's nice to launch early, so that you can fix bugs or make whatever changes you need to make with a clear head.

I can't stress enough that documentation is always key to any large project. Guitar Hero, as a Rails application, is a single application that handles all of the Guitar Hero titles, and starting with Guitar Hero: Metallica I made a concerted effort to codify that into a document. It lives in our version control and it's called "creating a new Guitar Hero title dot txt," and it's about 13 or 14 steps covering everything you need to do to the code base to bring in a new Guitar Hero title. It was great, because for Guitar Hero Greatest Hits, for Guitar Hero: Van Halen, and for Guitar Hero 5, I just followed the steps in that document. Usually what happened was I'd get an Excel spreadsheet from the developer of the game, sometimes from Activision itself, saying, hey, these are the 96 songs that are going to be in Guitar Hero 5. I'd do a little munging of the data to get it into our configuration files, and usually by the end of the day I had the title ready on staging, so you could play one of the builds of the game and see stats being collected from it. And anybody in the organization could use that document and basically do what I did.

For us, with community sites and I guess the web in general, web applications are never done, right? A new browser comes out, there may be incompatibilities with the latest release of Firefox, so you're constantly making changes to your software. And it's easier for us, as a middleware provider or as web developers, to make those changes — it's very costly for games to do title updates. There's actually a bit of code in the Guitar Hero application, for Guitar Hero World Tour, where we interface with a third party for doing leaderboards on the PS3 and 360, and the identifier in the database for pulling out the song Livin' on a Prayer is different from the song identifier we get from the game. So we have to do a little data munging in the application to make that work. It's cheap for us, because it's a couple of lines to change the identifier; it would probably cost tens of thousands of dollars for a game developer to go through QA, go through recertification, and push the patch out to the world. So as web developers we can be the smart ones and fix problems for other folks — I'll show a small sketch of that kind of remapping in a second.

I also definitely find value in employing overhead. This is our VP of production, and producers provide just as valid a role in the organization as developers do, just as designers do, in that they're managing the client. I could spend part of my day working with the client, interacting with them, but that's their forte, that's their expertise, so having them around is just as valuable as the work that I do.
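Going back to the Livin' on a Prayer example for a second, here's roughly the shape of that kind of cheap, web-side fix. This is a sketch, not the actual Guitar Hero code, and the identifiers are invented:

```ruby
# Hypothetical sketch: translate the song identifier the game sends us into
# the identifier the third-party leaderboard service expects, so the game
# never needs a title update for a data mismatch.
module SongIdRemap
  # game-supplied id => id the leaderboard integration wants (examples invented)
  OVERRIDES = {
    "livin_on_a_prayer_ghwt" => "livin_on_a_prayer"
  }.freeze

  def self.translate(game_song_id)
    OVERRIDES.fetch(game_song_id, game_song_id)
  end
end

# At the point where we look the song up for the leaderboard call:
song_key = SongIdRemap.translate(params[:song_id])
```

A couple of lines like that on our side, versus QA, recertification, and a patch on theirs.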
Let me talk a little bit about game integration. We get a lot of data, all of the time, as you may have seen from the numbers I showed before. So how do we process that data? Because we have to process it all the time.

In 2008, when Guitar Hero World Tour launched in the fall, a lot of people didn't buy the game right away, or waited to buy it until Christmas, because it was something like a $200 bundle. So on Christmas Day our systems crashed and burned because of the way we were handling data. I remember driving back from my parents' place, heading home a couple of hours north upstate, getting back, seeing that the systems were basically not doing anything, and spending the entire night getting us back to a nominal state so that we could move forward. That wasn't fun. So starting with Guitar Hero: Metallica we had to re-engineer our approach and take what we'd been doing — this weird threading and forking, collecting threads, sending them to the application server, this weird data processing — and get smart about it. It was 2009, and there are things like queues we could use to shunt all the data processing into a pipeline for the different games, taking data from the game and having workers churn through it all the time. Guitar Hero has been running with Sparrow, an old queuing system that uses memcache, and we built a simple controller that lets us monitor the events. You're probably using Resque right now to do job processing; we use that now as well. We got smarter, because it's awesome. But why is it awesome? resque-web. You have a view into what is happening in your system, and the more you can monitor and measure things in your system, the better positioned you are to handle them and to know what you're going to need to optimize for.

So what do we optimize around? We have to optimize around school vacations — the summer, obviously, and Christmas, obviously; we can't cancel those holidays — so we need to do what we can to be as performant as possible during the times when we're going to see more data. I'd like to think that all queues are equal, but really they're not. What we've found is that when a new game comes out, people are always playing the new hotness, so we'll throw more processing power at a game when it launches to process all the data for the latest title, because people aren't going to be playing the older games as much.

One big problem we have is leaderboards. I remember interviewing and being asked, how would you implement leaderboards? One approach we used for quite a while was MySQL, and that will take you so far — if you have a few thousand scores to collect. But for a game like Guitar Hero we have seven or eight different titles, and then every single song in the game, and then for the games with separate instruments you have to keep all of those too. So it's a morass of data, and like what's happening in the picture there, when I look at the MySQL leaderboard code we have, I just want to shoot myself in the head because it's so convoluted.

So, a short interlude. This past New Year's I went out to dinner and decided to take a cab. When I got done with dinner, after midnight, I called the cab company and tried to get a cab on New Year's — not going to happen. So they said, call the free ride. Free ride? Sure, yeah, we'll take you back home, that's fine. And I ended up riding the short bus back home on New Year's. I woke up and thought: it's 2011 now, we don't have flying cars, and I just rode the short bus home from dinner. So I'm going to make something awesome for the universe. That something is doing leaderboards with Redis sorted sets, and we've been migrating our infrastructure over to use that. Those two images, by the way, were from Cobra, probably the greatest Stallone movie ever made — I mean, we can argue about that afterward, but it really is.
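The core of the Redis sorted-set approach is small enough to show here. This is a minimal sketch using the redis-rb client directly — the key names and scores are made up, and our production setup has more to it than this:

```ruby
require 'redis'

# Minimal sketch of a leaderboard backed by a Redis sorted set.
redis = Redis.new

board = "leaderboard:gh:some_song:guitar"   # invented key name

# Recording a score is a single O(log N) write.
redis.zadd(board, 987_650, "player:42")

# Rank and score lookups are cheap -- no convoluted SQL.
rank  = redis.zrevrank(board, "player:42")   # 0-based rank, highest score first
score = redis.zscore(board, "player:42")

# First page of the leaderboard: the top 25 members with their scores.
top_25 = redis.zrevrange(board, 0, 24, :with_scores => true)
```

Every song, every instrument, every title just becomes another key, instead of another branch in a giant SQL query.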
So, external services. The problem is that you need to integrate with third-party data services. How do we do that, and how do we do it internally? Well, we can use brute force: we can force ourselves to be performant. We cap our Unicorn requests at 30 seconds, and 30 seconds is a long time for a request to be running in your application. If a request is going to take more than 30 seconds, find a way to get that data back to the client, back to the web, in an asynchronous fashion. Otherwise you're just going to kill yourself — you'll never have enough processing power if you let requests run on and on without imposing some limits on their execution.

Timeouts. At the beginning of this year we launched a site for MLG called MyMLG, which was basically like Facebook but for MLG: you could go and post your status, and we'd post statuses from GameBattles — hey, this person just won a match. When we launched, I couldn't keep the site up for more than a second or two, or at all. It turned out that GameBattles, which we talked to in order to render some of the information on the page, was being DDoSed. When I went and talked to one of our production folks at about four in the afternoon, he said, oh, GameBattles is being DDoSed. I would have liked to have known that at 9am, as the site was crashing and burning. So impose constraints on talking to services. I think the default timeout when you're using something like Net::HTTP to talk to a service is 60 seconds — way too much time. Start with something like 10 seconds and slowly ramp back, and see what you need to do to not kill your systems. And obviously, cache data: we cache queries, we cache view snippets. That's really easy to do.

Monitoring and infrastructure: I probably can't stress enough that you do need some way of looking at the overall health of your system. I won't go deep here; these are just a few approaches we use. We started with monit, and our systems folks have gravitated toward using runit. It's just another tool that can monitor a process, look for it being errant in terms of memory usage, or notice that it isn't started and start it up for you. It's also great to have graphs of the state of your system. We use Munin, and it has a lot of great plugins — out of the box it will monitor a lot of things about your system — and you'll have different thresholds for seeing where the state of your system should be. Maybe after you deploy, you take a look at the Munin graphs and go, huh, for some reason our processes or our forks are spiking; why is that? We should investigate.

We also do infrastructure validation. This was an interesting exercise that one of our systems guys did: he wrote a whole set of infrastructure validations using Cucumber. It checks things like: is MySQL running? Is the MySQL slave running? How far behind is the MySQL slave — five minutes is probably okay, but if it gets to an hour behind, somebody should probably check why the slave is behind. Is memcached running? However you want to do it, it's nice to be able to know what's going on and to continuously run those validations so you know everything is operating normally.
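To make the Cucumber idea concrete, a validation like the slave-lag check might look roughly like this. It's a sketch, not our actual suite; the hostname and credentials are placeholders:

```ruby
# features/step_definitions/infrastructure_steps.rb
#
# Hypothetical Cucumber step for an infrastructure validation. The scenario
# might read:
#
#   Scenario: MySQL replication is healthy
#     Then the MySQL slave should be no more than 300 seconds behind
#
require 'mysql2'

Then /^the MySQL slave should be no more than (\d+) seconds behind$/ do |max_lag|
  client = Mysql2::Client.new(:host     => "db-slave.example.internal",  # placeholder host
                              :username => "monitor")                    # placeholder user
  status = client.query("SHOW SLAVE STATUS").first
  lag    = status && status["Seconds_Behind_Master"]

  lag.should_not be_nil            # nil means replication isn't running at all
  lag.to_i.should <= max_lag.to_i
end
```

Run something like that continuously and a slave that quietly falls an hour behind stops being a surprise.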
This bit us in the ass just this Tuesday. The machine we have running Redis, memcached, and our mail server shit the bed, and all of our applications, instead of using a service name like redis.internal or whatever, were referencing the machine name directly. Moving to a name-services approach, where you have internal DNS names pointing to those services, helps us now in that if a machine goes down, we can retarget the service to another machine. And anything worth doing once is probably worth doing a second time — when they built the Death Star, they didn't stop at one. If you have the ability to have a backup infrastructure, that's great.

Let's see — does anybody follow StarCraft II, and the whole Barcraft thing? Okay, not many people here. I had a little treat for the Ruby Midwest folks. We've got MLG Providence, the finals of the Pro Circuit event we run throughout the year, coming up in two or three weeks, middle of November. For the first 25 of you who @ me about it, I'll give you an HD code to watch the streams for free. You can watch competitive StarCraft, competitive Halo. It's actually really interesting — I didn't think it was going to be that compelling, but it's fascinating that these guys make a living playing video games. How much more time do I have for questions? Two minutes. Thank you very much, and I'm happy to answer any questions.

Question: you talked about storage and player counts — how many simultaneous requests do you get at peak? Let's see: for the middleware platform now, I think we can handle eight or nine thousand requests a second, so we probably have capacity that would far exceed any kind of launch, even an upcoming launch like Modern Warfare 3. On average we have maybe 200 to 500 requests a second internally that we're processing for game data.

Question: maybe it's just a misperception, but it seems like there were a lot of failures, and you're working with other companies — how did you handle that, or was it just not a very big part of the big picture? Okay, so the question is that I talked about a lot of failures. To go back to what Joe talked about earlier, I always put a positive spin on what we can learn from them. We failed initially in that we just didn't have enough capacity to handle the data, and then we learned from that — we stopped doing crazy-town processing of all the game data and got smart about handling it in a queued fashion.

Follow-up: so you just have a good relationship with the publishers, and that carries over with the other companies? Oh yeah, we definitely have a good relationship. I mean, if we're not successful, then they're not successful. They made sure of that by sending folks out for two weeks to kind of oversee us — are you doing the right thing? Yes, we're doing the right thing.
But over time we got to the point where it wasn't an issue in terms of handling the data or doing the integration — they trusted us at that point, after the initial hiccups. Anybody else?

Question: can you speak to your hardware capacity and infrastructure — how many machines are the games running on? Yeah, I can talk about that really quickly. For the past month, on the MLG side of things, I've been doing some stack smashing: we had all these VMs running at Rackspace, probably over 100, and I took that down to two physical boxes. Physical hardware is great if you have the money to throw at it, and we have the money to throw at that problem, so physical boxes are awesome. I think it's two 24-core machines with huge amounts of memory and disk. So we know our I/O is going to be set; we don't have to worry about I/O contention. I guess we still do if an application is errant, but for the most part that's a non-issue for us.

And I think that's it for the questions. Thank you very much.