Okay. Can y'all hear me alright? Alright, cool. My name is Ian Warshak. I'm a developer, and I work at RightScale, which has a cloud platform management tool for y'all to use. I have some code that I'm going to be showing you, and there's the GitHub account where you can check it out. I won't be showing a whole bunch of it, so you can look at it on your own.

So I'm going to be talking about Rails in the cloud. I'm going to be going over mostly, almost exclusively, Amazon's web services: their cloud platform and their cloud-based services. And I'm going to walk you through a simple, kind of arbitrary app that I wrote specifically for this. It doesn't do a whole bunch, and it's kind of silly. But if you're writing Rails applications and you want to leverage some of these tools, hopefully this will give you some ideas about how you can do that.

The first thing I wanted to do was differentiate. When we talk about the cloud, it's a big, nebulous term that can mean a lot of things to a lot of different people. So I separate stuff into two buckets. There's cloud infrastructure, which means servers. There's Amazon EC2, of course; they're the big one. There's Rackspace, and there are the up-and-coming ones, FlexiScale and GoGrid. And then there are cloud services, like Amazon's S3 or Amazon's Simple Queue Service. Those are services, not infrastructure.

So I wrote this app specifically for this talk. It's called Pictor, and it's a play on Flickr. I thought about labeling it the Flickr killer, because it can just scale, and then I thought, that's so stupid, that's pretentious. It was just a ridiculous idea. So that's the app I'm going to be showing you, and it's a simple photo application.
You upload a photo. The application transforms it in a couple of ways and does some operations on it, and then it displays the results on the home page. Like I said, it's not real impressive. But one of the interesting things is that the photos are converted and processed asynchronously. Again, it's a simple app, and I'm going to be talking about the technologies behind it, but hopefully it gives you some ideas about how you can use some of these technologies for your own stuff.

Why did I pick this? I picked the photo application idea because it has a lot of things that lend themselves naturally to cloud computing and to some of the tools that are out there. Lots of disk storage: pictures can obviously start to take up a lot of disk space, and there's Amazon S3. Lots of bandwidth: people are downloading these pictures. Offline processing: processing these photos, which I'm doing with ImageMagick, can take time. When someone uploads a picture, you don't want them waiting around for a response while the photo is converted or resized or whatever it is you want to do. And of course, it's going to be an instant hit, and we're going to need this to scale right out of the gate, like everyone thinks, right? Yeah, so don't get your hopes up. It's nothing very impressive.

So if I were writing this for real, why would I want to use this cloud infrastructure and these cloud services? Here's a list of reasons. One is that I don't want to build or configure hardware. I don't want to be calling a place and saying, oh, I'm going to sign the lease for 10 servers because I'm anticipating that I'm going to need 10 servers. I don't want to anticipate what my servers are going to need to be, or my disk usage. I want to offload as much as possible because, let's say, I'm a small team or I'm one developer.
I don't want to be managing disks or storage solutions, or I want to be doing it as little as possible. So that's why this kind of app lends itself to the cloud.

So what did I use to build this app? I'll show you the app in just a little bit. It's a Rails application. It's very small: 300 lines of code. No tests. Sorry. It's running on Amazon's EC2 servers. There are actually two pools of servers. I have a web front end, which is a pool of web servers that run the web app, with a load balancer sitting in front of them. And then there's a pool of what I'm calling processing servers, which actually do the work. Again, here it's processing photos, but this can be applied to a bunch of different domains and problems if you have any offline processing needs.

All the photos are stored on Amazon S3, meaning I don't have to store them on my servers and figure out where to put these files. Do I put them on servers? Do I need a SAN? All that kind of stuff. I'm also using a CDN, a content distribution network, and I'm using Amazon's CloudFront. I'll talk more about these technologies later.

I'm using the RightScale AWS gem for pretty much everything. RightScale, which is the company I work for, has a pretty extensive Ruby library for using Amazon's web services. It's what we use for our own infrastructure, and it's what we distribute as open source. It covers Amazon's EC2, S3, basically all of the web services that they offer. You can use the gem to access these pretty easily, and it's pretty simple to use.

So again, why did I pick Amazon to deploy this demo app? A lot of it is that they are the market leader in a lot of things. There's a lot of good competition, but right now it's pretty indisputable. They are the big dog.
There's a lot of integration between the services, meaning if I have an EC2 server and I'm transferring data between the servers, the bandwidth is free. Normally they charge you for bandwidth, but bandwidth between the services is usually free. And there are APIs for all this stuff, and there's lots of documentation and lots of libraries. I told you I'm using RightScale's AWS gem, but there are actually a lot of libraries out there. So it makes it a lot more accessible if you're considering using some of these technologies.

Amazon's EC2, just in case you're not familiar with it; I'm sure a lot of you are, but I'm sure some of you aren't. It's basically servers on demand. With an API call, you can start up a server somewhere in Amazon's data centers, and you pay by the hour, which is nice because, like I said earlier, I don't want to pay for anything up front. I just want to pay for what I need, and this is exactly what Amazon's EC2 does.

One of the big caveats about EC2 is that it's not persistent. Let's say you start a server up, you configure Apache, you configure your Rails application, and you get it nice and perfect. The second that thing shuts down, it's gone. It's not persistent at all. So this forces you to automate your configuration a lot, which is actually good. It's kind of a pain, and it costs a lot up front to figure out how to get these servers to configure themselves when they come online. You can build an image of a server once it's configured, but that's not very configurable: if you have five images for five different web servers and all of a sudden you need a sixth one, you're going to have to build a brand new image for it, and that doesn't scale very well. So there are a lot of tools out there.
That's one of the things that RightScale does, but there's also Chef and Puppet, all kinds of systems configuration tools that you can use. Amazon also has the Elastic Block Store, which is persistent disk storage that you can use with EC2 servers. I don't know as much about that.

Amazon S3: I'm sure more of you have heard of or used Amazon S3. It's basically online disk storage for you to use. You pay for the data that you're storing, and there's some more up there for you to read. I'm sure you're more familiar with it.

CloudFront. CloudFront is the content distribution network that I'm using to get these images and this content to the end users quickly, just like some of the other content distribution networks such as Limelight and Akamai. Amazon has distribution servers all over the world, all over the country. When I register and get a CloudFront domain name, I'm basically registering an S3 bucket to be available to the CDN. So I get a special domain, and any time content is served from that special domain, it's actually coming from the CloudFront distribution network, which means that it's going to be closer to my users, and they're going to see the images quicker, which is something that I want. This is Amazon's simplest API to use, and you can see the two lines of code that I actually used to create my CloudFront distribution. It's towards the bottom and a little bit cut off, but all I'm doing is associating an S3 bucket, which is a place for you to put your files, and registering that as a CloudFront distribution. I get back a domain name, which is longsetofstrings.cloudfront.aws.com, or something like that, and I can use that domain name to serve my images from.

SQS. Michael was talking about this earlier. This is Amazon's Simple Queue Service, which is a basic queuing service.
I put messages in a queue, and I pull messages off of the queue. They can be up to 8K of text, and this is what I'm using for synchronizing my jobs. Once a user uploads a picture to Pictor, I put some jobs onto the queue, and the processing servers then pull these jobs off, process the images, and do whatever it is they need to do with them. One of the neat things, which I think most messaging queues do, is that when my processing daemon pulls a message off the queue, it reads it, does some action, and then explicitly deletes the message at the end, and then that message is gone from the queue forever. That means that if something dies and my job doesn't run properly, the message doesn't get deleted at the end, and it goes back on the queue after a certain amount of time. So you won't lose the job you're running; it gets put back on.

This is Amazon's SimpleDB. It's the non-relational database that they offer, and again, it's a simple database, and this was actually one of the tougher things for me to figure out. It's kind of confusing if you come from a relational background, using MySQL or whatever, because there are no tables. What you would call a table is actually a domain, and you can think of it maybe as a table or as a spreadsheet or something like that, but you can't do joins across two domains. That forces you to really denormalize your data and keep all the data you're probably going to need for one page or for one action in the same domain. That sounds really dirty and it doesn't sound very clean, and it kind of isn't, but it actually works out pretty well. There's also no schema, so you don't have to plan up front what you want your, not your table, I almost said table, what you want your domain to look like. You define it as you go, which is pretty powerful as well.
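The delete-at-the-end behavior described for the queue above can be sketched with a small in-memory stand-in. This is not the SQS API or the RightScale gem: `SketchQueue`, its methods, the 30-second timeout, and the job fields are all hypothetical, and time is passed in as plain seconds so the behavior is easy to follow.

```ruby
require "json"

# In-memory stand-in for a queue with SQS-style visibility timeouts.
# NOT the SQS API: the class, its methods, and the job fields are hypothetical.
class SketchQueue
  Message = Struct.new(:id, :body, :invisible_until)

  def initialize(visibility_timeout: 30)
    @visibility_timeout = visibility_timeout
    @messages = []
    @next_id = 0
  end

  def send_message(body)
    @next_id += 1
    @messages << Message.new(@next_id, body, 0)
    @next_id
  end

  # Hand out the first visible message and hide it for the timeout window.
  def receive(now)
    msg = @messages.find { |m| m.invisible_until <= now }
    return nil unless msg
    msg.invisible_until = now + @visibility_timeout
    msg
  end

  # Only an explicit delete removes a message for good.
  def delete(id)
    @messages.reject! { |m| m.id == id }
  end

  def size
    @messages.size
  end
end

queue = SketchQueue.new(visibility_timeout: 30)
queue.send_message(JSON.generate("file" => "F1", "effect" => "paint"))

first   = queue.receive(0)   # worker A takes the job, then dies without deleting
hidden  = queue.receive(10)  # nil: the job is still invisible to other workers
retried = queue.receive(31)  # the timeout has passed, so the job is handed out again
queue.delete(retried.id)     # worker B finishes and deletes it explicitly
```

The point of the design is that a crash costs a delay rather than a lost job, at the price that a job may occasionally run twice, so the work itself should be safe to repeat.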
There are some caveats to it. All data is stored as a string, which means if you're storing stuff like dates or money or integers or any kind of numeric values, you sometimes have to change the way you do that a little bit. SimpleDB automatically indexes all your data for you, so you don't have to run any kind of explain or analyze or whatever. It does its own thing for you. I also put on here that it's not a speed demon, meaning that you have to make an internet call every time you want to pull data from the database, so it's not that fast. As far as the speed of the database itself goes, the speed of returning data when you run queries is pretty consistent: whether you have 10,000 or a million or 10 million records in there, as the number of records goes up, the time it takes to query them stays fairly constant. So that's good.

Here's an example that I pulled right from Amazon's website of what a SimpleDB domain may look like. You can envision it like a spreadsheet with a bunch of columns that you can think of as attributes. So you have an ID, a category, subcategory, name, color, and then size, and then we have make and model, which is kind of weird. This is for a sample store that Amazon uses as an example. They have several rows of stuff that they're selling that's related to clothing, some kind of clothing this store decided to sell. The really interesting part, I think, is that with Amazon SimpleDB, for one attribute, you can store one or multiple values, which is completely different from what you would see in a relational database. There, this would probably have to be a separate table that you would have to join against.
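The "all data is stored as a string" caveat mentioned above matters because strings compare character by character, so "100" sorts before "47". A common workaround is to store numbers zero-padded to a fixed width and dates in ISO 8601. The helper names and the padding width below are assumptions for illustration, not part of SimpleDB or the RightScale gem.

```ruby
require "time"

# Encode a non-negative integer as a fixed-width, zero-padded string so that
# string comparison matches numeric comparison. The width is an assumption.
def encode_int(n, width = 10)
  raise ArgumentError, "negative numbers need an offset first" if n < 0
  n.to_s.rjust(width, "0")
end

# Store money as integer cents (no floating point), then pad it.
def encode_cents(dollars_string)
  dollars, cents = dollars_string.split(".")
  encode_int(dollars.to_i * 100 + (cents || "0").ljust(2, "0")[0, 2].to_i)
end

# ISO 8601 timestamps in UTC sort chronologically as plain strings.
def encode_time(t)
  t.getutc.iso8601
end

naive  = %w[9 47 100].sort                                # string order, not numeric
padded = [9, 47, 100].map { |n| encode_int(n, 5) }.sort   # numeric order preserved
```

With padding in place, range comparisons on the stored strings behave the same way they would on the underlying numbers.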
But in Amazon SimpleDB, you can store multiple values for one particular attribute, and it works. We'll talk about this a little bit more. Down here, what's interesting is that we're dealing with clothing, and then all of a sudden we switch to cars and motorcycle parts. So instead of creating a separate table like you would think you'd do for different kinds of items, you can just start adding attributes like make and model to your domain and fill them in as it makes sense.

Here's some code that ties in with the previous slide. In the RightScale AWS gem, we have a class called ActiveSdb, which is similar to ActiveRecord for manipulating and grabbing the data. When SimpleDB first came out, it had its own query language that was very algebraic and kind of hard to understand if you're not used to that kind of stuff. They're supporting more and more SQL-type queries.

So here's an example. I have a class called Item which inherits from ActiveSdb. The first thing I do is call create domain, which actually makes a call to SimpleDB to create this domain, and you only do that one time. The next thing I do is create an item: Item.create, and I have some attributes in here that are not predefined. I'm just adding them in there, and they get added to the SimpleDB domain as I call it. So I create an item that's called some kind of TV. It's in the electronics category. Later on in my code, I decide, oh hey, I want to add a new attribute to this, and I want to call it illumination. So is it an LCD TV or is it an LED TV? I can use the brace syntax here and just add a brand new attribute to this SimpleDB domain, and when I save it, it gets added. It's something similar for storing multiple values for an attribute.
When I created this, my category was electronics, but let's say I decide I want to change the way I categorize stuff, and I want televisions to have their own category. Well, then I can just append televisions to the category, and it works. Then I select an item using Item.select, which should look very familiar if you've used ActiveRecord before. You can see down here that my size, of course, is 47, but for category I get an array of values, which is electronics and televisions. And if an attribute has multiple values, you can query against those multiple values, and it works.

Here are some of the caveats of using SimpleDB. The data is stored as strings. A response cannot be greater than a megabyte, and if it is, then you get a token back, which you use in your next query to retrieve the next set of results, so they really limit you on that. Amazon also times the amount of CPU time, however they figure it out, that your queries take, and you can only use a certain amount of time per month, depending on your plan. The maximum execution time for any query is five seconds, so if you hit that limit, then you have to rethink the way you're doing stuff.

The big thing about SimpleDB is that it's eventually consistent, meaning if you put some records in a SimpleDB domain right now, they may not be there immediately. You can put some records in and then query for those records, and they may not be there, but they will be there eventually, and that's the trade-off for availability.
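The schema-less, multi-valued behavior walked through above (create an item, add an attribute later, append a second category, query on any of the values) can be modeled in a few lines of plain Ruby. This is only an in-memory illustration: `SketchDomain` and its methods are hypothetical and are not the ActiveSdb API from the slides, and the item name `tv-1` is made up.

```ruby
# In-memory stand-in for one SimpleDB-style domain.
# NOT ActiveSdb: SketchDomain and its methods are hypothetical.
class SketchDomain
  def initialize
    @items = {} # item name => { attribute name => [values, all stored as strings] }
  end

  # Replace the values of the given attributes, creating the item if needed.
  def put(name, attributes)
    item = (@items[name] ||= {})
    attributes.each { |attr, value| item[attr.to_s] = Array(value).map(&:to_s) }
    item
  end

  # Add another value to an attribute without dropping the existing ones.
  def append(name, attr, value)
    item = (@items[name] ||= {})
    values = (item[attr.to_s] ||= [])
    values << value.to_s unless values.include?(value.to_s)
    item
  end

  # An item matches when ANY value of the attribute equals the wanted value.
  def select(attr, value)
    @items.select { |_, item| item.fetch(attr.to_s, []).include?(value.to_s) }.keys
  end

  def get(name)
    @items[name]
  end
end

domain = SketchDomain.new
domain.put("tv-1", name: "Some TV", category: "electronics", size: 47)
domain.put("tv-1", illumination: "LCD")          # new attribute, no schema change
domain.append("tv-1", :category, "televisions")  # second value for category
```

Note that the size comes back as the string "47", which is the same string-only storage caveat the talk lists for SimpleDB.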
The idea is you can always write to SimpleDB. A write to SimpleDB should never fail. Of course your network, that kind of stuff, can fail, but the actual writing should never fail. It's just a trade-off that they made, and I think a lot of the other non-relational databases that we see coming up are making some of these trade-offs, maybe not necessarily eventual consistency, but people are starting to make some trade-offs. As for speed, every call that you make is of course over the network, and in some of the benchmarks that I was doing, I was getting about a hundred milliseconds per call, which is not bad. But if you're displaying a page that takes five calls, that can start to add up, so you really have to think about this beforehand if you're doing a lot of queries.

So I went over all this, but what does all this mean? You can worry a little bit less about scaling. Not a whole lot less, because I do have to worry about how this all fits together in my configuration management and automation. Like I said before, the automation is important. Being able to get your systems to boot up and configure themselves appropriately is pretty important, and if you don't do that, then it could be kind of a headache. The database performance is very consistent, like I talked about with SimpleDB. If you have 10 million records, the performance of your queries and whatnot should be fairly consistent, which is not like MySQL or whatever, where if you start having a ton of records in there, your performance can start to degrade as you're writing to the database. That's what SimpleDB excels at. S3 and CloudFront are going to be handling all my static file serving, which means I don't have to store the files on my servers, and I don't have to worry about the performance hit my web servers take from serving all these static files, because they're actually all going to be done
from CloudFront, which is Amazon's servers. My investment up front is minimal, and I'm only paying for what I need.

Here's a diagram of what this application looks like. SQS, SimpleDB, and S3 are the cloud services that I'm using. My web servers and my converter servers, which are the processing servers, are in two separate pools. My users are the people who are uploading pictures to Pictor. So what happens is they upload the picture, and I've even offloaded the uploading to S3. You can actually upload files directly to S3. The old way of doing it, when S3 first came out, was that if you were handling someone's upload, you had to store that file locally on your own server and then somehow create a job or background process or whatever to then upload that file to S3. But now you can create your upload form in such a way, with the right signatures and IDs, that users can upload directly to S3, and that actually works out pretty well. So I'm really trying to involve my servers as little as possible.

Sure. Now, what happens is, when you construct your upload form, the form where the user clicks the upload button, it's actually going to be uploading to Amazon directly. Your account ID, which here is my account ID, and some other credential information are embedded as hidden inputs in that form for authorization. So the user pulls the HTML page down from me, but when they click upload, it uploads directly to my S3 bucket.

I know what you mean. I think that's on the next slide. So what actually happens is, when Amazon receives that file, one of the hidden values in the form that the user submits is a redirect URL. When Amazon's S3 receives the file, it looks for that redirect URL and sends that back to my user. So what happens is the user uploads the file, and if it's successful, they get a
redirect to my servers, which then, in this case, since I'm processing pictures, create some jobs and start working on that photo. And since it's stored on S3, I can get it immediately, right from S3. Does that answer your question? Before that, you had to do it yourself: you had to either spawn a thread or do it in the background or something like that, which kind of made it a pain.

So this is how Pictor works. That was step one. Step two is: once that redirect comes back and the request hits my servers, there is a picture ID from S3, and what I do is create jobs based on that event, that picture. In this case, I have a class called a convert job, which is a simple class that has a few instance variables. All it's doing is storing the name of the file and what I want done to it. Here I'm going to be applying an effect to this picture: I'm resizing it to 400 pixels, and I want to have the paint effect applied to it. I'm creating two of them, and the second one has the monochrome effect applied to it. So I create these convert job instances, and I'm serializing them to JSON and putting that JSON into the SQS messaging queue. That's what I'm doing right here: I'm taking this object, serializing it, and putting it into the queue.

And the convert job, I'll show it to you here. Let me see. Actually, I don't have it open. This convert job class has a method on it called run, which knows how to do the conversion itself. So yeah, this is it. I store the name of the file, the suffix of the file, and the size, and there's a run method, which actually knows how to call ImageMagick and do all of it itself. So the message that I'm putting into the queue is actually an instance of this convert job class. That's what I'm doing there, so step two was to
create those two jobs so that my processing servers can pull them off the queue and run them.

The third step is creating a SimpleDB record. In addition to creating these two jobs, I'm creating a record in the database saying, hey, this picture is either about to be processed, or it's in the middle of processing, or it has already been processed. So I create a picture, a row in the SimpleDB domain, and I'm storing the image key, which is essentially the file name. I'm also creating an attribute called total conversions with the values paint and monochrome. What I'm doing here is letting SimpleDB know that I actually created two jobs, one that I'm calling paint and one that I'm calling monochrome. Later on in my code, I look for attributes named paint and monochrome in that SimpleDB domain, and the values for those are going to be the converted image file names. That way I can tell if a picture is done processing or not. So I upload one picture, I want to convert it into two separate pictures, and once it's done doing the paint job, there's a new column in the database, or a new attribute, called paint, and the value is the name of the file. If both the paint and monochrome attributes have a value, then I know that it's done processing.

So here's what I was talking about. Before the picture goes through the process, this is what the SimpleDB record looks like, and this is coming back from the command line. The key is called upload/F1. Upload is my bucket name, and F1 is just the file name of the original file. My total conversions column contains two values, like I talked about a second ago, called paint and monochrome. After each one of these jobs gets processed, it adds the appropriate file name to the appropriate attribute. So after it's processed and converted into two separate
pictures, here's what it looks like. I now have a paint attribute, and the value is the file name, and I have a monochrome attribute, and the value is the file name. So what I can do is run a query saying, hey, show me all the pictures: do a select where the attributes named in total conversions, in this case paint and monochrome, are both not nil. If they're both not nil, or not null, then I know that the picture is done. If they're empty, then I know that the picture's not done, and that the converter daemon is probably still running.

And here's my processor daemon, which is a loop that runs on a server. It's pulling these convert jobs from the SQS queue, and as it finds them, it runs them. The running, like I said earlier, is what actually does the work. In this case, for each uploaded picture, two separate pictures are created. One of them has the monochrome effect added to it, and the other one has the paint effect added to it, and there's a little watermark that gets applied. Yeah, that's it.

So let me show you the demo real quick. If you have any questions right now while I'm getting this ready, I'd be happy to answer them.

Yeah, so right now there's one daemon per server. I can show you my deployment here. Ideally it would be probably 20 or 30 per server, but for this demonstration I'm running one daemon per server, and I have two of those converter daemons running.

Right, that's why I'm using SQS, the queuing service. I put these messages into the queue, and they only get pulled down by one process. Once a message is pulled down from Amazon SQS, it can't get pulled down again. Yeah, that's what that does.

So I have two web servers and two processing servers. You can see right here I have the hostname of the server, or what I'm calling the hostname, and if I refresh this
it's going to change back and forth between the two servers.

So what I'm going to do is upload a couple of pictures. A quick story: I did a mission trip with my church a couple of years ago. We went to Guatemala, and we took a medical team with us, and there was a Guatemalan dentist who was doing dental work on people, which there meant pulling teeth out. I mean, that's all they could really do. He was practicing his English and I was practicing my Spanish, so we got to talking, and I was asking about what he does and how all this works, and he got very excited. I think he thought I might have wanted to become a dentist or something. So he was like, hey, if you want to help me pull out a tooth, let me know. And I'm like, okay, thinking that he was joking, but he wasn't. So he showed me how to pull out this lady's tooth, and I got all gloved up in the middle of a pasture, and there I am. He was very serious; I really think he thought I was going to become a dentist or something. And I pulled out this poor lady's tooth. It looks pretty grotesque, but this lady was very thankful and didn't bat an eye when I showed up with a camera and put on a mask and started pulling out her tooth. I would have taken off, but she didn't think much of it, and she thanked me profusely afterwards. So that was kind of cool.

So here's my application. Like I said, it's pretty bare bones. How am I on time? Yeah, so let me upload a picture. I'm going to upload this picture here, and while we're at it, I can show you the form I was talking about. Here's my form, and my action is to picture.s3.amazonaws.com, so it's going to be posting directly to there. I have a bunch of attributes and values that are signatures and access IDs and whatnot. So this is going directly to S3, and part of that is that after it's uploaded, I get redirected
to my servers, which redirect me again back to the home page. What's happening right now is that two jobs are getting created and put into the Amazon queue, and a SimpleDB record was created, and as those jobs get processed, that SimpleDB record gets updated. So let's see if this works. It should. While this is uploading, any other questions?

Doesn't have any what? Yeah, yeah, it's on GitHub, and I'll show you the slide at the end. My username is iwarshag.

So here we go. It took a long time to upload. So, file upload: here's the key of whatever was uploaded. What I have running on this screen right here is some JavaScript that's polling every 10 seconds to see if these pictures are updated, and as they become available, they get put on the screen. You can see that I have this colorize effect, whatever you want to call it, and at the bottom I have a watermark which imprints the name of the server that processed this image. There were two jobs, and I have two servers, and one happened to process one and one happened to process the other. I think that's about it. Do you have any other questions?

Was it using HTTPS? I didn't even notice. Okay, so I guess it is using HTTPS to upload to S3. I think that may be optional. I'm not real sure. Let me check. Yeah, I guess it was using HTTPS, so the upload to S3 was secure. And right in here, in this form, here's my redirect. This URL is getting sent back to the user upon completion, and the browser is actually getting redirected to my server. When their browser redirects to my server, that's when all this action actually happens to kick the process off. The rest of this is keys and signatures for the form to make sure it wasn't tampered with. Any other questions?
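The "keys and signatures" in the upload form can be illustrated with S3's browser-based POST scheme as it existed at the time: a JSON policy document is Base64-encoded, then signed with HMAC-SHA1 using the account's secret key, and both go into hidden fields. The transcript does not show the actual form source, so this is a sketch under that assumption; the helper name, the policy conditions, and every value in the usage example are placeholders.

```ruby
require "json"
require "openssl"

# Build hidden form fields for a browser-based POST upload to S3 using the
# original HMAC-SHA1 (signature version 2) scheme. The helper name and the
# choice of policy conditions are assumptions for illustration.
def s3_post_fields(access_key_id:, secret_key:, bucket:, key_prefix:, redirect_url:, expires_at:)
  policy = {
    "expiration" => expires_at,
    "conditions" => [
      { "bucket" => bucket },
      ["starts-with", "$key", key_prefix],
      { "success_action_redirect" => redirect_url }
    ]
  }
  # pack("m0") is Base64 without line breaks.
  encoded_policy = [JSON.generate(policy)].pack("m0")
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new("sha1"), secret_key, encoded_policy)
  {
    "key" => "#{key_prefix}${filename}", # S3 substitutes the uploaded file's name
    "AWSAccessKeyId" => access_key_id,
    "success_action_redirect" => redirect_url,
    "policy" => encoded_policy,
    "signature" => [digest].pack("m0")
  }
end

# Placeholder credentials and URLs, not real ones.
fields = s3_post_fields(
  access_key_id: "AKIDEXAMPLE",
  secret_key: "not-a-real-secret",
  bucket: "example-bucket",
  key_prefix: "upload/",
  redirect_url: "http://example.com/uploaded",
  expires_at: "2030-01-01T00:00:00Z"
)
```

Because the signature covers the encoded policy, changing the bucket, key prefix, or redirect URL in the browser invalidates the form, which is what makes it safe to hand the upload off to the user's browser.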
To get upload status? Right. I don't believe so, but I can't say with 100% certainty.

I used RightScale to get these servers going. That's actually not part of it. It could be, but I used RightScale to set up templates for these servers, and then I launched them. That's actually a separate part of it.

Yes. Right, so one of the things is Amazon has their, I just forgot the name of it, their persistent EBS, the Elastic Block Store. You can create a block of disk that you can mount on any server. It can only be mounted on one server at a time. So what we do is we have an Elastic Block Store volume that gets mounted on our database server, and all of our MySQL data is put on that Elastic Block Store, so that if the server dies, that data is still available. That's what a lot of people are doing. There have been some people that use periodic dumps of MySQL to S3, but I think an Elastic Block Store is probably a better idea.

Anyone else? Well, thank you very much.