So today we're going to talk a little bit about Merb and why it may be cool, or why it may not be cool. My name is Ezra Zygmuntowicz. I work at Engine Yard. We build clusters and stuff.

Louder? Okay.

I'm going to go through a couple of the guiding principles of Merb development, and then we're going to take a look at some of the more advanced features.

One of the main goals here is to prefer simplicity over magic as much as possible. Simple code runs better and it's easier for other people to understand, and when you're building framework-style code that lots of other people are going to use, you really need to keep this in mind. So, you know, none of these really special tricks, Symbol#to_proc, returning, alias_method_chain, all this kind of stuff: I don't believe it belongs in framework code. It's fine in your own application, because you're in control of it. But if you're going to make this big ball of code and hand it to other people to build on top of, you need to make it pretty sane.

When in doubt, benchmark and profile. You know, Merb is fast, and it's fast because I haven't just made random assumptions about how Ruby works; I've actually benchmarked and profiled. You can't really get performance numbers better without doing this, because you're almost always going to be wrong if you're guessing at what's taking the time in your program.

Another big, important issue is knowing your runtime and how it acts. Ruby is a real complex beast, and there are all kinds of little corners of the language that you might not expect. So I have a little folder on my Mac here where I keep all kinds of little idioms you can do in Ruby a number of different ways, and a benchmark of them all, so you can kind of get a picture. Yeah, I'll put it up on my blog after this, and some of the simple benchmarks are in the repositories, where you can see them.

So this is my big motto: no code is faster than no code. Tom kind of stole my thunder on that one already, but, you know, if you just have as little code as possible, it's going to execute faster than some big behemoth monster, you know, monuments of personal cleverness.

So let's talk about why Merb. You know, we're all hackers here. Merb's all about no sausage: you should be able to go look in your framework and figure stuff out and hack it, to bend it to your will. A framework should not, in my opinion, be a black box that people never go spelunking in because it's scary.

Another big factor: don't leave any broken windows. You know, as a project gains momentum, there are all kinds of people committing to it and you're getting more and more code in there. If you start to leave little things that annoy you, or little parts of the code base that you know could be cleaner or need to be tested better or whatever, that builds up and you build a debt, and anybody else working on the project ends up going, "Oh well, it's kind of janky over here, so I don't mind if my code is janky too," right? You got a...

So Merb is all about web services. This is kind of what I originally wrote it to deal with: file upload services, little REST servlets, all kinds of different services. A lot of people using Merb in production right now have their main app, their main UI tier, in a Rails app, and they have Merb services on the back end that do certain things. And Merb is really efficient at making small-memory-footprint servlets, so you can have a lot of them doing different things.

This example here (hopefully the code is big enough) is Merb's provides API, which is kind of a different take on the respond_to format thing that's in Rails. In your controller you can declare that this Posts controller provides JSON, YAML and XML, and all controllers provide HTML by default. So what happens here is, up at the top, we're registering MIME types.
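The slide's code is not in the transcript. As a rough, plain-Ruby sketch of the idea behind the provides API as it is described in this part of the talk (declare formats, then pick a to_format method from the Accept header), the following uses hypothetical names (`SketchController`, `MIME_TYPES`, a `Post` struct) and is not Merb's implementation:

```ruby
require "json"
require "yaml"

# Hypothetical stand-in for a model object; not a Merb or ORM class.
Post = Struct.new(:id, :title) do
  def to_json(*args)
    to_h.to_json(*args)
  end

  def to_yaml
    { "id" => id, "title" => title }.to_yaml
  end
end

# Toy controller base (an assumption for illustration): maps Accept header
# values to a format, then calls the matching to_<format> method.
class SketchController
  MIME_TYPES = {
    "application/json"   => :json,
    "application/x-yaml" => :yaml,
    "text/yaml"          => :yaml
  }.freeze

  # Class-level declaration of the formats this controller can serve.
  def self.provides(*formats)
    provided.concat(formats)
  end

  def self.provided
    @provided ||= []
  end

  def initialize(accept)
    @accept = accept
  end

  # Serialize the object in whatever declared format the client asked for.
  def display(object)
    format = MIME_TYPES[@accept]
    unless self.class.provided.include?(format)
      raise ArgumentError, "format not provided: #{@accept}"
    end
    object.public_send("to_#{format}")
  end
end

class Posts < SketchController
  provides :json, :yaml

  def show(id)
    display Post.new(id, "Hello")
  end
end

puts Posts.new("application/json").show(1)
```

The point of the design is that the action body stays a single `display` call, and the format decision lives in one place instead of in a per-action block.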
So for a YAML MIME type, which will be if the Accept headers come in as application/x-yaml or text/yaml, then... okay, I'm getting ahead of myself. See how we're finding a post and we say display post, and that's based on the MIME types we provided. So when a request comes in, say for YAML, with the Accept headers set to what's in the top line, the YAML one, display post will call to_yaml on the post object. Same thing for XML or JSON or whatever. So it's just a nice, simpler way to declare that this controller provides these formats, and based on the Accept header we'll automatically call the right to_whatever method on your post object. So this is pretty clean; it cleans up a lot of the code that you do with the respond_to format.

Another cool feature of Merb is that we use ParseTree and Ruby2Ruby on server boot to walk all of your controller classes, and any methods they mix in from modules or whatever, and look at what arguments they have. If you'll notice, this action takes an id parameter. So rather than just having the magic params hash that would have params[:id] or params[:user] or whatever in it (you still have access to that), you can define your actions with arguments. You know, every show action usually just looks at params[:id]. So here, instead, what happens is that on server boot we walk all your controllers, find out all the arguments to their methods and their default values, and memoize that away in a hash.
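A minimal sketch of that boot-time memoization and the dispatch that uses it. It swaps in Ruby's built-in `Method#parameters` for the ParseTree walk the talk describes, so it records argument names but leaves default values to Ruby itself; `PostsSketch`, `action_arguments` and `dispatch` are hypothetical names:

```ruby
# Toy controller; the action declares what it needs as plain arguments.
class PostsSketch
  def show(id, format = "html")
    "post #{id} as #{format}"
  end
end

# Boot-time step: record each action's argument list once.
# Method#parameters is a substitution for the ParseTree walk in the talk.
def action_arguments(controller_class)
  controller_class.public_instance_methods(false).each_with_object({}) do |name, memo|
    memo[name] = controller_class.instance_method(name).parameters
  end
end

ACTION_ARGS = action_arguments(PostsSketch)

# Dispatch-time step: pull the declared arguments out of the params hash.
def dispatch(controller, action, params)
  args = []
  ACTION_ARGS.fetch(action).each do |kind, name|
    if params.key?(name.to_s)
      args << params[name.to_s]
    elsif kind == :req
      raise ArgumentError, "missing required parameter: #{name}"
    else
      break # leave this and any later optional arguments at their defaults
    end
  end
  controller.public_send(action, *args)
end

puts dispatch(PostsSketch.new, :show, { "id" => "42" })
```

Because the lookup table is built once at boot, each request only pays for a hash fetch and a few key lookups.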
So when we're doing a dispatch for this Posts show action, Merb will know that it takes an id parameter, and it will stuff the id in there out of either the query string or the POST body or whatever. So it's just a little bit different way of doing things; it tends to make things a little bit more Ruby-ish, in my opinion.

Another thing about Merb is that the return value from your actions is what gets sent to the client. So there is no auto-render; you're always going to have to call render or display. Or you can just return a string, and that string will get sent back to the client. Or you can return an open IO handle, and that will get streamed to the client. Or you can return a Proc object that takes the response, and then you can write directly to the response and do a bunch of other stuff that you might have trouble doing if you didn't have such direct access.

Returning a proc kind of ties into this next stuff. There's a bunch of advanced streaming technology in Merb. This is a simple example: you could run this Merb app and watch Merb tail its own logs. So this is a render_chunked function that basically uses Transfer-Encoding: chunked to keep the connection to the client open and send little chunks down the stream. So we call render_chunked, we IO.popen the log file, and then we get a line and we send it down. That'll keep the stream open and just allow you to stream chunks. That works pretty well for doing, you know, Ajax upload progress; you can stream chunks of JavaScript down to an iframe and have them execute when they hit.

Another advanced streaming technique: a lot of people are using S3 these days for file stores. If you have, say, a file on S3 that is private, that you need to authenticate to get, but you want to use your Merb application to authenticate the user first before you let them download the file, what this stream_file method will let you do is this: the user hits your site requesting the file, you authenticate them through your normal, standard authentication mechanism, and then we call this stream_file method and tell it the file name, the type and the content length. That yields a response to the block, and then we can open an AWS::S3 object stream, and each chunk that comes out from S3 we can write directly to the response. So what that means is you could have a Merb app as a proxy between your client and S3, authenticate in the middle, and when the download starts it streams directly through Merb all the way to the client, rather than, you know, waiting for the download from S3 to hit your server and fully finish and then start downloading to the client. So it makes a big difference for any kind of streaming stuff you need to do. It's very flexible.

Merb is all based on Rack, which is a web server abstraction library, kind of like mod_wsgi for Python. It basically distills a web request down into a proc, or any Ruby object that has a call method, that takes an environment hash. The environment hash contains all the standard CGI headers, and it contains the body IO stream if there's a POST body on the request. So with this abstraction layer we get to have the same code run on any of these web servers.

This is a simple example of a proc that could be a web app. Basically, with Rack you have anything that responds to call and takes an environment argument, and then you return a tuple, a three-element array of the status code, a hash of the headers to send, and then the string of the body or, if it's not a string, anything that responds to each and yields strings. So this is a very simple example of a Rack application that returns a 200 OK, Content-Type text/xml, and then just sends back the inspected CGI headers. About as simple as you can get.

In all Merb applications, if you look, there's a config/rack.rb file that just has a simple one-liner: run Merb::Rack::Application.new. What this does is set up the Merb Rack application, which has its call method that does all the dispatching into the Merb code. And it's simple. This is the default scenario, but if you have some more complex things you want to do, having Rack exposed like this in your Merb app is really powerful.

So this is kind of a trivial, simple example. Say we have some kind of API in our application, for XML or for RSS feeds or for JSON like this, and it's a significant amount of our traffic and doesn't require any of the UI tier of Merb or anything. So we can just stick this little API handler in front of the Merb Rack application and any requests that come in.

So here's what we're doing. We've got this ApiHandler. When you initialize it, it takes in a Rack application, which is our Merb Rack application. And when you call it, it sets up a request object and checks the path against /api/*. If the path matches, it returns 200 OK, Content-Type text/json, and then calls our API get_json on the match from the request URI. And if it doesn't match the path, it falls back and calls our Merb Rack application with the environment.

Rack has an idea of middleware, and a bunch of different handlers and stuff. So if you look at the bottom here, where it says use ApiHandler, run Merb::Rack::Application.new, what that does is automatically instantiate the ApiHandler with an instance of the Merb Rack application that you ran, and then set that up as the default environment to be called on a request. So it's basically wrapping a request in another simple API.
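The two Rack ideas just described (an app is anything with `call(env)` returning a status, headers and body; middleware wraps another app) can be sketched without loading Rack or Merb at all. This is an illustration under assumptions: the `/api/` prefix, the JSON payload and the `FULL_APP` stand-in are made up, and unlike the slide it reads `PATH_INFO` directly instead of building a request object:

```ruby
require "json"

# A Rack-style app: responds to call(env) and returns
# [status, headers, body], where body responds to each and yields strings.
INSPECT_APP = lambda do |env|
  [200, { "Content-Type" => "text/plain" }, [env.inspect]]
end

# Stand-in for the full framework application (hypothetical).
FULL_APP = lambda do |_env|
  [200, { "Content-Type" => "text/html" }, ["full app"]]
end

# Middleware: answers API paths itself, passes everything else down.
class ApiHandler
  def initialize(app)
    @app = app
  end

  def call(env)
    match = env["PATH_INFO"].to_s.match(%r{\A/api/(.+)\z})
    if match
      body = JSON.generate("resource" => match[1])
      [200, { "Content-Type" => "application/json" }, [body]]
    else
      @app.call(env) # fall through to the wrapped application
    end
  end
end

# Roughly what "use ApiHandler; run app" wires up by hand.
STACK = ApiHandler.new(FULL_APP)

status, headers, body = STACK.call({ "PATH_INFO" => "/api/posts" })
puts status, headers["Content-Type"], body.join
```

Since the handler never touches the wrapped app on API paths, those requests skip the framework's dispatch entirely, which is the "close to the metal" benefit the talk is after.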
There's a bunch of other cool stuff you can do with this. There's a Rack::Cascade, where you can say, "Here's an array of Rack applications to try," and the first one that returns a non-404 gets sent to the client. So you could have multiple applications in the same process and have it try each one until it finds one it likes. There's also a bunch of middleware for doing pretty stack traces and logging. Merb has one that wraps a profiler around your request, so in development mode you could wrap a profiler around it and have it output profiling information for you. A bunch of really useful stuff. And basically what I like a lot about this is that you can have the full power of Merb for building anything that needs UI, or the provides API, or whatever part of your application, but you can get really close to the metal when you need to have some kind of small API handler or something that needs to be really, really fast.

I'm going to talk about web servers for a couple of minutes. Mongrel has kind of been the standard thing that everybody runs their Rails and Merb applications on. It's rock-solid stable. It's a threaded server, which means that every request that comes in spawns a new thread. There's a bunch of new servers, like Thin and Ebb, that people are working hard on. They're event-driven servers: Thin uses EventMachine and Ebb uses libev, and they're both Rack-based servers, so they dispatch Rack requests.

The event-driven servers are quite a bit faster than Mongrel, but they have a downside: if you have slow actions, or you have a file upload that does a bunch of RMagick calls or whatever, then while that's running you basically block the event loop and no other requests can get served. So they're really fast if you have a well-behaved application with no long actions, but as soon as you have long actions intermixed in there, they start to fall down and the response times get way, way worse. Threaded servers aren't as fast by default as event-driven servers, but they won't fall down when there are long requests, because each request is running in its own thread.

I worked with the authors of Ebb and Thin to get a new API put into their Rack handlers. On your Rack application, our Merb Rack application, you can define a deferred? method that takes the environment and lets the framework figure out whether you're going to have a long request or not, and if it's a long request, you can spawn a thread. So basically, in your init.rb in your Merb app, you set a couple of regexes for actions you know are going to be slow, like a file upload or some long-ass action or whatever. When the dispatch comes through, if it matches one of these actions, it gets spawned in a thread; if not, it's just standard event-driven style. What this does is allow you to get on the event-driven servers and utilize their much better performance, but also not have them kind of fall on their ass when you have long requests. So this kind of gives you the best of both worlds, and it's pretty sweet.

Since you can return a proc from your actions in Merb and have it called later, there's a bunch of cool stuff that sets up for us: deferred callables. Merb has a method called render_deferred, and what it does is allow you to set up a proc and then kind of drop that out of your action in the return, so that the server can go on to serve the next request. And what will happen is that Mongrel, or whichever server you're using, will call this block later.
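A small sketch of the deferred? idea under stated assumptions: `DeferringApp` and `serve` are hypothetical, the regex list stands in for whatever the app configures at boot, and the `serve` function only imitates what an event-driven server might do with the answer.

```ruby
# App wrapper that knows which paths are expected to be slow.
class DeferringApp
  def initialize(deferred_patterns, &handler)
    @deferred_patterns = deferred_patterns
    @handler = handler
  end

  # Asked by the server before dispatch: is this request likely to be slow?
  def deferred?(env)
    path = env["PATH_INFO"].to_s
    @deferred_patterns.any? { |pattern| pattern.match?(path) }
  end

  def call(env)
    @handler.call(env)
  end
end

# Imitation of the server side of the contract (not Thin's or Ebb's code).
def serve(app, env)
  if app.deferred?(env)
    # A real server would write the response from the thread and keep its
    # event loop running; waiting on the value here keeps the sketch simple.
    [:threaded, Thread.new { app.call(env) }.value]
  else
    [:inline, app.call(env)]
  end
end

SLOW_ACTIONS = [%r{\A/uploads}, %r{\A/reports/export}].freeze
SKETCH_APP = DeferringApp.new(SLOW_ACTIONS) { |_env| [200, {}, ["ok"]] }

p serve(SKETCH_APP, { "PATH_INFO" => "/uploads/1" }).first
p serve(SKETCH_APP, { "PATH_INFO" => "/posts" }).first
```

Keeping the decision in a cheap predicate means fast requests never pay for thread creation, while known-slow ones cannot stall the event loop.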
So say we were doing an RSS feed and we're wrapping it up nicely or something. We take a URL in our action, we parse it with URI.parse, and then we say render_deferred: make this Net::HTTP get request out to get the RSS feed and return it wrapped in our RSS pretty-print. Everything inside of the render_deferred block will not be called until after the action returns, and other actions can come through and get dispatched. So what this allows you to do, since maybe it'll take a while to get this URL, is drop this proc out of the action so it can continue to process other actions. And this thing will, in a background thread, go out and make the request and bring it back and print it back to that client, without blocking other clients from getting through.

We've also got render_then_call, and this does just what it sounds like. It allows you to render your response to the client and close the client's connection, so they're done and they've got the response already, and then it calls the block after that finishes. So say you have this ping action, where somebody posts a URL and you're supposed to ping it, but they don't care about the return value; they just want you to ping it, like for a trackback or something. So what you do is render_then_call, and this is going to return, you know, "got ping for the URL" back to the client, and then the client's done. They got their response back, and then in a background thread Merb is going to go ahead and ping the URL for you.

So, you know, this isn't a general replacement for a messaging queue or some kind of background daemon, but it's really helpful for small things. If you have some action where you need to call out to another web service, or you need to do some calculation, or you need to do something that takes five or ten seconds and you don't want to block the request loop, you can just drop this out and have it go in a background thread.

Merb's got a really powerful router. It does about 95% of the stuff the Rails router does for RESTful resources. You can have nested resources, namespaces, all that kind of stuff. You get nice helpers to generate these, and they work very similarly to how Rails works, so that you can have ActiveResource-compatible little servlets that are much less resource-intensive than a Rails app.

But it actually goes quite a bit further than that as far as the power you get. So if we take a look: up here we've got a posts resource, which is like a RESTful controller. But if you look at this top one here, we can say, okay, when the request comes in and it matches this posts regex, and the user agent is Mobile Safari, then we're going to go to the iPhone posts controller instead of the normal posts controller. You can see here, in the posts route, that we capture whatever comes after posts in the URL in a regex, and then if you look at title, we've got a little [1], which says take the, you know, first match from the regex and fill this in. So we'll get the title coming into our action. So this is an easy way to, say, have an iPhone controller for requests coming in from an iPhone, and have it dispatched at the router layer rather than having to do it in your own controller.

And then we've also got deferred routes, which are even more powerful. What these do is, if the route matches this first part, we say defer_to and we pass in the request object and the params hash, and then you can do any arbitrary logic you want in here. Here I'm looking up a subdomain out of the database, and then returning whichever controller the subdomain account says is the admin controller, and whichever action the token from the route says is the action. So what this basically does is allow you to run any arbitrary Ruby code as part of route matching.
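As a sketch of deferred routing, here is a toy router that is not Merb's implementation. It assumes a deferred block returns a params hash on a match and nil or false otherwise, with the action defaulting to index; `SketchRouter`, `ACCOUNTS` and the request hash shape are all hypothetical, and the hash lookup stands in for the database query.

```ruby
# Toy router: each route is a regexp plus something callable that turns
# the request into a params hash, or into nil/false to decline the match.
class SketchRouter
  def initialize
    @routes = []
  end

  # Plain route: a regexp with fixed params.
  def match(pattern, params)
    @routes << [pattern, ->(_request, _captures) { params }]
    self
  end

  # Deferred route: the block decides at request time.
  def defer(pattern, &block)
    @routes << [pattern, block]
    self
  end

  def recognize(request)
    @routes.each do |pattern, resolver|
      matched = pattern.match(request[:path])
      next unless matched
      params = resolver.call(request, matched.captures)
      next unless params # nil or false: keep trying later routes
      return { action: "index" }.merge(params)
    end
    nil
  end
end

# Stands in for the per-subdomain account lookup in the database.
ACCOUNTS = { "acme" => "admin/acme" }.freeze

ROUTER = SketchRouter.new
ROUTER.defer(%r{\A/admin/(\w+)\z}) do |request, captures|
  controller = ACCOUNTS[request[:subdomain]]
  controller && { controller: controller, action: captures.first }
end
ROUTER.match(%r{\A/posts\z}, { controller: "posts" })

p ROUTER.recognize({ path: "/admin/stats", subdomain: "acme" })
p ROUTER.recognize({ path: "/posts", subdomain: "acme" })
```

Returning a falsy value to decline is what lets a deferred route sit anywhere in the list without swallowing requests it cannot handle.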
And if it does match, you just return a hash that has to have at least a controller key in it. The action, if you leave it out, will be index, and any other params that you want to return will show up in the params hash. If it doesn't match, like if the subdomain isn't found, then it just returns nil or false, and that tells the router, "Okay, that deferred route didn't match, so let's go on to the next one and keep trying from there." So it allows you to inject arbitrary code at any point in your route matching and go from there. It allows for really flexible routes.

If you want to contribute, there is merb-core up on GitHub and Lighthouse. We've got an IRC channel and Google Groups. We just have a new wiki up at wiki.merbivore.com that is written in Merb and that's starting to get quite a bit of nice stuff on there. And we use Defensio, which is kind of like Akismet spam protection. It turns out those guys are all written in Merb, so they donated some accounts to us to keep spam off the wiki, which has worked pretty well so far.

One more thing here. Merb is heavily inspired by Rails; that's pretty obvious. And I've kind of taken it as a chance to rearchitect how a Rails-like framework works. So I've spent this whole last week porting some of the cool stuff from Merb back into Rails, specifically all the Rack machinery. Right now it's still on my GitHub fork of Rails, but it will eventually get pushed back into Rails itself. I've ported all the Rack adapters in here and added a script/rackup script to a freshly generated Rails application.
So what this does now is allow us to use any of the web servers that Rack supports, but with a standardized command line for daemonizing and clustering and all that kind of stuff. So here we can say script/rackup -a thin, which means use the Thin adapter, and it will boot up. We could say script/rackup -a thin -c 10 and it'll start a cluster of 10 servers with PID files and all that kind of good stuff.

So this is a work in progress. But while I was digging through the Rails source there in Action Pack, the changelog says some of it hasn't been touched in like 20 months or something, and it kind of shows. It's grown quite a bit of cruft in there, and it took me a while to trace down how a request goes through Action Pack from the web server and everything. It turns out that this is what it was doing when I first started investigating it: you get a raw request from your web server that comes in; that gets wrapped in a Rack environment (that should be rack env, not end), which is the header hash and the request body; that got wrapped in a Rack request object; that then got wrapped in a CGI wrapper object; and that finally got wrapped in a CGI request object, all before you get the request object in your Rails controller. And at each step along the way it was duping the CGI headers. I don't know if anybody's ever done an inspect on the request object from a Rails app, and you get a dump that's like this long. It's because there are like five layers of abstraction there and they're duping the headers along the way.

So I was able to trim that down so that the raw request just comes in as a Rack environment and gets wrapped in the final ActionController Rack request that you get in your controllers, and it doesn't dupe the headers anymore. It's actually a pretty good little memory saving, because on every request to every Rails app they're duping this big set of headers for no reason, and that's a lot of extra garbage for the garbage collector to work on. So getting rid of that stuff is pretty good.

I've also been able to rearrange the way Rails does its mutex lock. Before, when a request came in, it would lock the mutex, run the dispatcher callbacks, recognize the routes, instantiate the controller and then call the action, and then do the after-dispatch hooks, and that was all inside the lock. I've changed it so that the only thing that's inside the lock is the dispatch to the controller. So dispatcher hooks, routing recognition and controller instantiation all happen outside the lock, in a thread-safe way. And this is nice. It doesn't actually make Rails thread safe, but it makes it perform quite a bit better under heavy load, because there's quite a bit smaller lock now.

So I think there are some pretty big wins for Rails, and it shows how you can really approach a problem from a different angle if you step back and rewrite it from scratch. But it's also kind of cool to put this stuff back in Rails, I think, because Rails is, you know, the dominant platform. Everybody uses it; it pays my wages and everything. So I'd like to make Rails quite a bit better. I'm going to be spending the next couple of weeks working heavily on refactoring Action Pack and getting a bunch of that stuff cleaned up that hasn't been touched in so long.

So, anybody got questions?

Two things. I know I asked you this over lunch, but just so that everyone can hear: does this mean that Rails running on Rack doesn't use cgi.rb for parameter parsing anymore?

It still does in my current branch of it, but it won't after a couple more days or whatever.

Which is awesome. A second thing.
Just a quick one: I don't know if everyone knows this, but if you are using S3, instead of streaming like you suggested (which is definitely better than downloading it and then starting it up again), you can use S3's query string authentication to just set up a redirect to a one-time URL and let Amazon do all the serving for you.

Sure. Yeah, that's just an example. There's a whole number of other cases where you want to grab a URL and only stream chunks, rather than waiting for the whole thing to come down. So, anybody else?

I have a question for you about performance. It seems that whenever I ask anyone who's in the position of actually having a Rails app in production about performance, they always say memcache, memcache, memcache. Right now I'm dealing with, and can imagine dealing with again in the future, problems where maybe an application involves many users, and with my Java head on I think, "Well, that's okay, I'll just keep this stuff in memory and keep it around in a cache right in process," which is different from memcache. Does Merb have any answers or opinion on that question?

It's not a hugely different picture from how it works in Rails. You're still going to want to use memcache if you're using multiple processes. But since Merb is thread safe and can run without a lock if you don't use ActiveRecord, you can get a lot more out of one process. So if you just have something small, or even medium-sized, that you need some kind of servlet for, Merb could probably do that without needing multiple processes, and keep a cache right in memory. But it's still Ruby green threading at this point. So maybe JRuby and Rubinius will eventually fix that with some native threads, and then we'll have true parallel thread execution. But for now, yeah, you're still going to need memcache, pretty much, or something similar.

Thank you.
Yeah?

How long could you expect to have a Merb process running under Thin or Mongrel or something like that?

How long? Yeah, before it leaks out?

It depends. If you haven't written any code that leaks yourself, then Merb doesn't leak, and I've seen Merb daemons running for months and months.

Well, the VM leaks back to the OS, though. I mean, I know you run a lot of these processes, right? So what's the feasible limit on how long?

It depends. I usually set up a monitor that just watches the memory, and if the memory goes above a certain threshold, it restarts. But Merb processes on average restart a lot less than Rails processes do, because the framework is a lot less code. There are a lot fewer objects created on every request, so there's a lot less work for the garbage collector. And that's one of the big performance issues with Rails: the garbage collector in Matz's Ruby is kind of weak, and Rails creates a lot of garbage, so the GC can get called once or twice on a request sometimes. That takes a lot of time, because it stops the world to collect garbage. So by keeping the amount of garbage down, you can really increase the life of a daemon and increase its stability.

Can you give us a little bit of insight into which aspects of Action Pack, or what other aspects of Rails, you'll be looking at in the next couple of weeks? You said you're working on some more revisions. Can you give us a little preview of what kind of stuff you're looking at?

Yeah, so I plan on completely tearing out any dependencies on cgi.rb, because that's an old library that I hate and that should just be gone. So I'd expect all the parameter and form and upload processing to get much better in Rails in the future. I will also probably be reworking the session implementation, because right now it relies on CGI::Session, which is part of cgi.rb, and I'd like to not even have that as a dependency anymore.

So in general, the main things I'm looking at are to make the dispatcher code path as quick and clean as possible, and to have all this Rack stuff polished, with one single abstraction for starting and clustering daemons, independent of what kind of web server it is, plus a bunch of general refactoring inside of Action Pack. I have some ideas for Action View as well; we'll see if I get to that right now or not.

And then there's also some work I'd like to do just to make Rails thread safe. Basically, we need a mode for Rails in production that can preload everything that you'll ever require, because the dependencies mechanism in Rails right now is not thread safe. If you had two threads running, and one thread sees a missing constant Foo and goes to try to load the file, and the other thread sees the same missing constant and tries to load it, they start stepping on each other. So I'd like to fix that up and make a preload-my-app mode that just goes through and loads everything before it ever dispatches a request. And there's probably just a bunch of general stuff in there that kind of needs to be touched up a bit. We need to be really careful of all the class variables in there, because those are really bad for thread safety. In general, I'm just going to try to do a little cleanup and see how we can make it better.

Sure. All right. Well, thanks, everyone.