Happy to introduce John Crosby, who's going to be speaking to you about, I don't know, some cloud stuff, and scaling, and using all those wonderful distributed web services out there. So here we go, John Crosby, and I don't even know your title. I can't see it. What do you call it? CloudKit. Yeah, CloudKit. CloudKit. There we go. All right. You guys hear me okay? All right. So first and foremost, thanks for all the plus-ones and the votes that brought me up here today. I wouldn't be here without the votes. So again, thank you. I appreciate it. My name is John Crosby, and if you'd like to get in touch with me after the talk or just check out projects and things like that, you can go to my site here. My email address is on there as well. I work for Engine Yard, on the Engine Yard Solo team. I think Carl mentioned this project yesterday, but this is our platform for on-demand management of Ruby on Rails applications. It's not really limited to Rails; you can run Merb apps there, Rack apps, et cetera. Find me afterwards, I can talk all day about that. And as Carl mentioned, we're also hiring on our team, so if you'd like to work on some interesting technologies, let us know. So, CloudKit. With your permission, I will do this presentation basically as a lightning talk, at that fast pace for the entire 30 minutes, and then collapse in a chair after this is done. There's a lot of information to present for a new framework and why you might be interested in it, so to give you as much bang for the buck as I can, this is going to be rather rapid. So what is CloudKit? It is an Open Web JSON appliance, and we'll spend the rest of our time parsing that statement. JSON appliance: what does that actually mean? Well, it means CloudKit provides the ability to quickly and easily spin up an API for any sort of RESTful collections that you would like to host on service endpoints.
These collections are of JSON documents, and you can spin them up without using anti-patterns like code generation or object-relational mapping. It's philosophically similar to CouchDB and Persevere. There are several differences, but one of the main ones that's relevant for this conference is that it's implemented in Ruby, so if you want to hack on it, it should be more accessible to all of you. Easy to install: gem install cloudkit, and you're ready to go. And it doesn't represent so much a new way of thinking about building web applications as it represents a web way of thinking about building web applications. So here we are in 2009, and unfortunately the decision tree for building web frameworks usually looks like this: if someone's building one, they're building a new MVC abstraction that's been done before. One of the things I love about Rails is that in Rails 3 we're going to have an API for this type of thing that conceptually resembles this: we'll be able to take the defaults that Rails provides for us, that we all know and love, and lay over the top of those the individual things that we've implemented ourselves. Hopefully not another one, though. So in what types of situations would you not be using an MVC-style approach to development, or more specifically, not MVC all the way at the back of your stack? That's the most common question I get about CloudKit, so I'll try to answer it here. Basically, any time you're trying to build a RESTful application architecture. Here are a few examples. An early one, proposed by Dave Thomas way back in 2007, was this RADAR architecture. It's already showing its age, because he basically separated out a smart client talking to a RESTful application server from a browser-based client, and browser-based clients are actually much smarter these days. We'll get into that.
If you're trying to build something where you're composing resources and their representations within the browser — this is my favorite use case for this architecture, actually. Let's say you have a user interface that bootstraps itself by loading a single static HTML page, and then you let JavaScript take it from there. If you have an area of focus where the user is performing some tasks, it has vastly different scaling requirements and semantics than, say, another widget on the right-hand side that has an activity stream in it. Totally different problems. Now you can point these things at two different services and scale and cache them differently. Examples of doing this kind of thing are new web frameworks like Cappuccino — and if you haven't checked out 280slides.com, it's kind of an interesting implementation of Keynote in the browser. They're using something called Objective-J, which is bringing Objective-C to the browser. SproutCore does the same thing essentially, using real JavaScript, and it's actually very Rails-friendly, so it's also worth checking out. Apple uses it for their MobileMe service. What these kinds of things need are just data stores that you can plug in and use from the browser, instead of re-rendering the layout, re-rendering everything, and reloading the page. It's more efficient that way. Obviously desktop and mobile apps can benefit from this style of architecture as well. One other architecture that fewer of us are familiar with is actually composing resources in the caching layer: if you're using edge side includes, you can have an architecture where your client is actually passing through a caching layer that has essentially been bootstrapped by your services on the back end. These include what look kind of like old-school SSIs, only they're cached includes, and then your caching layer can compose those page fragments and configure their scaling properties right there.
Other JavaScript frameworks, of course, need these types of things. So, about this web way of thinking about things: it starts with Rack. CloudKit is built on Rack. I absolutely love Rack, because essentially it's modeling the web for us in Ruby-land. We all know what the web looks like, right? You've got HTTP coming in the front of the stack here. It passes through zero or more intermediaries, and finally you hit this app at the back of your stack. The app is where we spend most of our time doing our development, and that's where Rack comes into play. Almost every Ruby framework I can think of that runs on the web is running on Rack these days. Inside, Rack actually looks like this: HTTP is coming in the front of your stack again, and there are zero or more intermediaries it can pass through. In HTTP-land these might be auth proxies, firewalls, things like that. Same deal in Rack; in Rack we call these middleware. Quick side note: if you're interested in a deep dive into middleware, I gave a 30-minute talk on it at MountainWest this year, and if you go to the conference site you can dig into that. Finally, there's the app at the back of your Rack stack. So Rack is the web, which is what makes it so awesome. It takes a set of constraints, it uses those constraints to build an interface, and that interface is documented in its spec. The spec is both runnable and readable, and even cooler is that you can run the spec as a piece of middleware in your stack to verify that what you're developing actually follows the spec. Very cool. So as Rack is to the web, CloudKit strives to be to REST. And if you haven't taken the time to read Roy Fielding's dissertation, where he derives the REST architectural style — that's chapter 5, linked right here — don't take secondhand information from blogs and what people think REST is. He actually derives it by starting with no constraints and then layering on constraints until he arrives at this very useful architectural style. So enough theory. Let's build an app.
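The Rack contract described here — an app is just a callable taking an env hash, and middleware wraps another app behind the same interface — can be illustrated without any gems at all. This is a generic sketch, not code from the talk:

```ruby
require 'time'

# A Rack app is anything that responds to #call(env) and returns
# [status, headers, body]; a lambda is enough to illustrate the contract.
app = lambda do |env|
  [200, { 'Content-Type' => 'text/plain' }, ["Hello from #{env['PATH_INFO']}"]]
end

# Middleware wraps an app and exposes the same interface, so pieces stack.
class TimestampHeader
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    headers['X-Handled-At'] = Time.now.utc.iso8601 # decorate on the way out
    [status, headers, body]
  end
end

stack = TimestampHeader.new(app)
status, headers, body = stack.call('PATH_INFO' => '/todos')
```

Because every layer speaks the same interface, any number of these middleware pieces can be chained in front of the app — which is exactly how the auth filters described later in this talk slot into the stack.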
Create a file called config.ru — that stands for rackup. You require the CloudKit gem, and then say we want to expose a set of resource collections for to-dos and profiles. So this would be an example app where maybe we're building a to-do list that people who have profiles can share — maybe it's a team-based to-do list. We're now done building our app, so we can boot this thing up: rackup config.ru will boot this in Rack-land. We're going to explore this using curl — and if you can't curl your REST API, you lose. So CloudKit boots up and bootstraps everything from the top down. It's completely discoverable, so if you latch on to this meta URI at the top level, that's how we can get started. Curling it, we're asking the server, basically: what do you do? And it returns a set of URIs as JSON here, pointing at the to-dos and profiles RESTful collections. And in HTTP, how do you ask a URI what you can do with it? Well, you use the OPTIONS method. So we ask what we can do with these, and, just following the spec, it returns an Allow header with the types of things you can do with these collections. So let's try some of them out. We can do a GET on this to-dos collection, and we see that it returns an empty collection of URIs, because we haven't actually put anything there yet. It's got some API paging information, which we'll get into later. There's also a Link header, which is less familiar to most. There's an IETF draft spec going through right now by Mark Nottingham called Link Relations and HTTP Header Linking, and the idea is basically trying to get at this thing that Fielding talks about: hypermedia as the engine of application state. Actually, an even better way to draw the parallel is that if you're creating a link tag in an HTML page, you can use the rel attribute to say how two resources relate. This is the same way of doing that for data APIs like JSON.
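The whole app described above fits in a couple of lines. The exact DSL here is paraphrased from memory of CloudKit's examples, so treat it as an approximation rather than the definitive syntax:

```ruby
# config.ru -- boot with: rackup config.ru
require 'cloudkit'
expose :todos, :profiles
```

That `expose` keyword comes back later in the talk, where swapping it for `contain` turns on the OpenID/OAuth filters.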
Leonard Richardson and Sam Ruby summarized this better as just "connectedness" in their RESTful Web Services book. Highly recommended. Let's keep going. Let's POST a to-do document here, with a title of "foo", at our to-dos collection. We get a result back; according to the HTTP spec, you have to include some kind of text in the body of your message indicating something went well, so I sheepishly include "ok: true" here. But otherwise you can notice that it's created a URI for this document, and also an ETag and Last-Modified. These things are generated on write. CloudKit is actually read-optimized: in real-world web services you're going to do way more GETs than updates, et cetera, so if you do the work on write, you can get some efficiency gains from that. Like generating ETags — it shouldn't be something that you just MD5 on the way out of your response, because then you're having to calculate it every time. You should just do it on write. I do this by what I casually refer to as HTTP-oriented storage. So on the back end, internally, CloudKit is just storing metadata and the document — and the metadata I'm talking about is HTTP metadata: ETags, Last-Modified, et cetera. What this also means is no SQL, no ORM — happy day. It uses Tokyo Cabinet instead. And if you subscribe to Ilya's blog — he was speaking here yesterday; yeah, he's here — you're probably already familiar with Tokyo Cabinet as a very fast key-value store that's persistent. CloudKit uses what are called Tokyo Cabinet tables, which end up being like keys with hashes on the back end. So the real storage mechanism on the back end looks sort of like this: we have a record key — a random ID — with various metadata, the flat document stored, and then some other magic in there. I'm using the rufus-tokyo gem to pull this off. So you can ask Tokyo Cabinet to query this store, which in this case will be a collection of to-dos, and you can stack conditions on it that you want to use.
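The "HTTP-oriented storage" idea — compute the HTTP metadata once, on write, and store it next to the flat document — can be sketched in a few lines. This is a toy illustration, with a plain Hash standing in for a Tokyo Cabinet table, and none of the names are CloudKit's actual internals:

```ruby
require 'digest/md5'
require 'json'
require 'time'

# ETag and Last-Modified are computed once, at write time, and stored
# alongside the flat JSON document (a Hash stands in for a Tokyo Cabinet
# table here).
STORE = {}

def put_document(uri, json)
  STORE[uri] = {
    'uri'           => uri,
    'etag'          => Digest::MD5.hexdigest(json + Time.now.to_f.to_s),
    'last_modified' => Time.now.httpdate,
    'json'          => json
  }
end

record = put_document('/todos/abc', JSON.generate('title' => 'foo'))
# A later GET just returns the stored metadata; nothing is recomputed,
# which is where the read-optimized efficiency gain comes from.
```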
There are tons of comparators; equals is just one of them. So CloudKit is schema-free — I'll put a little red star up here that we'll discuss later. It's schema-free if you want it to be, and basically HTTP and JSON are the schema. So back to curl. There are two kinds of PUT in HTTP, right? There's the one that most people are familiar with from Rails, which means update an existing resource that's already defined. The other kind of PUT means create this resource, with this representation, at the URI that I'm asking for. So here we are putting a new to-do at /abc. It's successful — same type of data. Now we can run OPTIONS against this individual resource to ask what we can do with it. These are things you could adjust if you want to give different permissions to different users, for example. So here we can see we can update or delete this document, and do the usual GETs. If we were to fetch this document, it obviously just returns the document itself, with a bunch of HTTP stuff: Last-Modified and ETag, which are useful for caching and optimization. Again we have the Link header thing; this time the relationship is called "versions", because CloudKit is auto-versioning. Any time you update a resource or a document in CloudKit, its previous version is archived. So the overall URI space ends up looking like this: you've got your top-level collections, and then the unique resources that you stored within each collection, which we've seen. Then, for every item in your collection, you can ask for its list of versions. And finally, a specific older version you can access using the ETag that it had at the time it was archived. So looking at our list again, you can see the URIs. The most recently added or updated thing is at the top — it's reverse-ordered by Last-Modified, feed style — which allows us to do some cool things. One of which is solving the lost update problem. If you're unfamiliar with this, it looks like this: two users, two browsers, one document. Fight.
And it's usually either the first one or the slowest one that wins, depending on how people have implemented it. The outcome is also hand-waved if you're a framework author, most of the time. But HTTP has already solved this problem for us, so let's just use its solution. We try to naively update a resource that exists — this /abc thing. CloudKit will respond with a 400 error, Bad Request, because you have not specified the exact version that you think is the current version you're trying to update. So now we can be specific: in HTTP you say If-Match, meaning the precondition is "I want to perform this operation if the ETag matches this one that I have." So this is the current ETag, everything's good, 200, everything worked, and you can see a new ETag was generated here. Client two has the stale data. They're trying to be good; they're being specific; they have an ETag — but it's stale. So now they will get the proper response, which is 412, meaning the precondition for this operation failed, that precondition being that the ETag was not matched. So now this client can actually fall back on the version history that's stored remotely and get a listing of URIs to see what it's missing, and since they're ordered by Last-Modified, it can find its starting point and move forward. And here we have a link — a relationship — to something called the "resolved" version of this resource. The reason this is added, rather unconventionally, to this basic REST API is to solve a big-O-of-N problem. If you had 1,000 documents in your history, or even your top-level resource collection, that's 1,001 GETs: one at minimum, if you're not using API paging, to get the list of URIs, and then 1,000 GETs to fetch each one individually — which obviously does not scale. So we can either rewrite this in Scala, or we can choose to solve the problem. I'll pick option B here.
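The conditional-update rules just described — no If-Match gets a 400, a stale ETag gets a 412, a matching ETag succeeds and rotates the ETag — reduce to a few lines of logic. The status codes follow the talk; everything else here is an illustrative sketch, not CloudKit's implementation:

```ruby
require 'securerandom'

# One document with its current ETag; updates demand an If-Match.
current = { json: '{"title":"foo"}', etag: SecureRandom.hex(8) }

def update(doc, new_json, if_match: nil)
  return 400 if if_match.nil?          # Bad Request: no precondition given
  return 412 if if_match != doc[:etag] # Precondition Failed: stale ETag
  doc[:json] = new_json
  doc[:etag] = SecureRandom.hex(8)     # new version, new ETag
  200
end

good_etag = current[:etag]
naive = update(current, '{"title":"bar"}')                      # 400
fresh = update(current, '{"title":"bar"}', if_match: good_etag) # 200
stale = update(current, '{"title":"baz"}', if_match: good_etag) # 412
```

The third call fails with 412 because the successful second call already rotated the ETag out from under it — which is exactly the "client two has the stale data" scenario above.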
So if you append this resolved modifier to the end of any collection, you will get what looks like a batch HTTP response. Those don't actually exist in HTTP — there's no way to communicate ETags and Last-Modified values in the headers for multiple resources — so here they're just an array of JSON documents, each one embedded with its metadata and the flat document inside. And again, we've got a link relationship back to the index of this thing that we're looking at here. This trick also works on the top-level resource, so you can see the whole collection at once if you'd like. Finally, of course, we want to delete things. Same deal: you can't delete something that's out of date, so you're passing in a specific ETag here, and okay, it worked. If you notice, the canonical URI for this resource — /abc — is now stored at this archive URI where the ETag exists. So if I were to try to fetch this again, what would we expect to get? Not a 404. We get a 410, which is a more specific version of 404 in HTTP: something was there but has been removed. We actually know this in CloudKit because we versioned, so we can say the entity was previously deleted. What this allows is for other clients, again, if they want to, to go back and see that the latest version is archived, and you can make good decisions about what to do when something's been deleted but someone else is modifying it — we'll get to that later. So we've got this uniform interface that we expect from REST. We're easily hosting RESTful collections with addressable nouns, we're manipulating them with HTTP methods, and we're exposing relationships with links, giving us this fully discoverable API that we're able to walk with curl. So what exactly is missing? Things like the ability to ask questions, or do common things like pagination. If we want to query, these are one-off, snowflake-style things that everyone does differently, because there's no standard for them in their APIs.
So you end up having to rewrite user agent code to speak to different APIs, and that still doesn't realize this vision of generic user agents on the web. For example, say I wanted to query for my OpenID, or only extract titles from to-dos — these random sorts of things. So what I've been toying with here is a uniform interface for querying. There's something available now — there's a Google group and so on for it — called JSONQuery. It was introduced by Kris Zyp on the SitePen blog; he works for them and works on Dojo. The evolution is basically: we had XPath in the past, that became JSONPath — you can find the original article here — and this evolved into JSONQuery, which is like XPath and JSONPath with more stuff in it. You have an array slice operator, so now we have a consistent way, across multiple services, to actually ask for API pages. Here we're asking for the second set of ten in this collection. It works like Python array slicing, so you can even use a step value of two: start, end, and step — every other item in that collection, if you needed that. You can say "I want all to-dos with priority greater than or equal to three." You can chain expressions: "I want all priorities greater than three, and only the first five elements of that." You can extract things: "I want a JSON array that only has to-do names." Sorting — ascending, descending, chained. Recursive finders: they have this object ID literal, which means the thing that you're looking at when you're iterating a collection. So if we had, in theory, a to-do JSON document with embedded users, this would recursively find the first-name properties of those users that are embedded. And unions. So these are all available and implemented completely in a library called JSONQuery.js. I maintain my own fork of this on GitHub, where I keep bug fixes. And it didn't have any tests before.
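JSONQuery itself is a JavaScript library; to make the semantics of a few of the operators above concrete, here is a rough Ruby rendering of what slicing, filtering, extraction, and chaining mean — purely illustrative, not the library's syntax:

```ruby
todos = [
  { 'name' => 'a', 'priority' => 1 },
  { 'name' => 'b', 'priority' => 3 },
  { 'name' => 'c', 'priority' => 5 },
  { 'name' => 'd', 'priority' => 4 }
]

slice = todos[1...3]                            # [1:3]  array slice
high  = todos.select { |t| t['priority'] >= 3 } # [?priority>=3] filter
names = todos.map { |t| t['name'] }             # [=name] extraction

first_two_high = todos
  .select { |t| t['priority'] >= 3 }            # chaining:
  .first(2)                                     # [?priority>=3][0:2]
```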
I've added a complete test suite for all the features, mostly to learn about the API when deciding whether or not I wanted to use it. So there's a jQuery plugin for CloudKit, also under my GitHub profile, and this allows you to point a store at basically whatever service you're using. The boot method is sort of like the on-page-ready method that you use in Ajax apps: it's going to bootstrap off that metadata URI that we saw earlier, figure out what the host has and what it can do, load one local store per remote collection, and then load the data that you own in that collection into this sort of in-memory, in-browser, queryable JSON store. And if things fail, you get meaningful status codes so that you can fail gracefully. So, inserting to-dos: on the store object we created, we can just say, in the to-dos collection, create something with the name of "foo". And then when it comes back, we can do something with this object — updating it. Just call the update method on the object and give it the new JSON. Once the success callback is hit, you know that the remote service has been successfully updated, and you can use the object. If it fails, again, you get a good status code so you can fail gracefully or recover. And by recovery, I mean like the 410 thing: you can make a decision as an app developer, if someone's updating something that's been deleted, whether you want to recreate it at a new URI or just fail with some kind of error message to them. If you got a 412, you know that you're out of date, so you can go to the version history, and you can progressively diff and merge forward through the version history until your local thing is in sync, and then put that back to the remote service. So it's sort of like Git in that respect. And incidentally, that's how you would implement online/offline synchronization: when you come online again, or if you've got a flaky connection, you can do these things in small increments while someone's using the app.
Finally, deleting: you can just destroy the object. That's a pretty obvious one. And even cooler, you can query these collections. All the same syntax that we just saw used on the service side is also available locally in your browser; you can query with the exact same syntax. And if you look at the JSONQuery branch of CloudKit on GitHub, you'll see that that JavaScript file is actually inside my Ruby implementation — that's made possible by Johnson, which is a Ruby-JavaScript bridge. I think the author might be here this weekend. So this is something that's on the 1.0 roadmap but is on a branch at the moment. Something else, called JSON Schema, also exists. This is total vaporware right now — it's something that's planned but not implemented — where you can say, for any collection, "I want to see the schema for it." So if you're operating in a schema-less environment, who cares? But if you want to be able to know the things that you can include in a query across services, then you could ask for schemas, and they look like this. They define what you have, and they also define the constraints on it. So validating what you're going to try to update on a server is no longer this mystery where you have to try and then see what error comes back, or require human interaction in the browser. We can know ahead of time, and that also allows us to validate with the same code on the servers and the clients, which is very powerful. So, to wrap up here: the "open web" part of this means that we can take the Rack application we built before and change this "expose" keyword for our resources to "contain". Now what we get for free is OpenID filters and OAuth filters, including OAuth Discovery, in our Rack stack. So it looks like this if you're hitting it from a browser — unfortunately browsers don't understand OAuth yet; I hope that changes. If you fail OAuth authentication, it's going to drop a discovery challenge header in the Rack environment.
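Since the schema feature is explicitly unimplemented here, the following is purely a guess at the shape — a schema that describes properties and constraints, plus a toy client-side check. None of these property names come from CloudKit:

```ruby
# Illustrative only: a JSON Schema-style description of a todo document.
schema = {
  'description' => 'A todo item',
  'type'        => 'object',
  'properties'  => {
    'title'    => { 'type' => 'string',  'required' => true },
    'priority' => { 'type' => 'integer', 'minimum' => 1, 'maximum' => 5 }
  }
}

# A client that has fetched the schema can validate before sending,
# instead of PUTting and waiting to see what error comes back.
def missing_required(schema, doc)
  schema['properties']
    .select { |_, spec| spec['required'] }
    .keys
    .reject { |key| doc.key?(key) }
end

missing = missing_required(schema, 'priority' => 2) # title is absent
```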
You'll fall through to the OpenID filter and then finally dump out a login page for the user. So it's a typical OpenID sign-in — I just have a very bare-minimum implementation of this in CloudKit at the moment. So again, you can do your dance with your provider: come through, fail OAuth, succeed on OpenID, and get access to the service. Same API, same stack, same URI. Coming from a service or desktop app that supports OAuth Discovery: it's going to fail on the OAuth filter, get the challenge header in the Rack environment, fail on the OpenID filter because it's not a browser, and finally the same page gets dumped out. But if we were to curl this, we'd see that alongside the same thing that was written to the browser, we're getting the proper 401 status, but now we have the OAuth discovery challenge header here. Since browsers don't recognize it, they don't pop up the gray HTTP auth dialog — which is cool; they'll just ignore it — but discovery clients can see that our provider metadata is hosted at this URI. They can hit it and then discover where we describe how to interact with the service. So OAuth clients that understand discovery — if you squint really hard — will get this XML document that basically explains how to get request tokens and access tokens, how to authorize them, and how to identify yourself to the remote service so you can sign every subsequent request. So finally, now these OAuth clients can do that dance and then succeed. The OpenID filter knows what's going on with the one upstream, so it passes everything on through to the service. So, getting to 1.0: there's a bunch of jQuery work on that branch in GitHub right now that needs to get merged into master and released. There's the JSON Schema work that still needs to be done, and a bit of templating, so people can plug in their own auth pages and things like that for OAuth. That's it. We're done. That was very quick.
I've got time for questions, and if you would like to participate or help hack on this, there's the site, and my site as well. Any questions? Thanks. Any questions? Yeah. So with the querying, that's going to be really slow if you don't index, right? Are you doing an index on demand, or how does that happen? Yeah. Actually, the question was about the querying model, that it could be potentially very slow. So I'm basically following the process that Carl outlined yesterday, where this is very thoroughly tested at the moment, and I'm just trying to write clean code around it, and having good specs has allowed me to do some experimentation around there. I've got two layouts I'm working with for indexing various JSON properties on write, and ways to optimize finding things and storing facts about them. It's definitely a work in progress at the moment. It could potentially be slow. There are some smart things that Tokyo Cabinet can do, though, to also search things — it's actually got regular expression matching in the collections. So by letting it manage that, I can be smart about chopping up requests that come in and saying: these are things Tokyo Cabinet can do for me; these things it can't do, so we pass those through the Johnson bridge. Just one hop back and forth at most. Other questions? Yes. Not that I know of. I've been in communication with a couple of people and one company that have been trying this, but there's nothing in the real world that's currently using it. What kind of apps do you think it's a good fit for? Well, I'll tell you what I'm personally using it for. There's an old-school Mac app that I showed a screenshot of, called ActionTastic, that I built several years ago in Objective-C, and the goal is to have it synchronize with a web service, so if you weren't around your computer you could just use it in the browser and it would look and act exactly the same. That's what I'm using it for: a back end for that.
I've got some ideas for iPhone apps as well that just need a REST service, so this will allow them to just use it, basically. Other questions? Any plans to extract Tokyo Cabinet, making it easier to use another store like Redis or something else like that? Yes, certainly. Yehuda has a library on GitHub called Moneta that's an abstraction for key-value stores, and I've been discussing this with him. Basically, most operations I perform could be abstracted through that API. The querying thing doesn't fit in the current API, and it's not totally clear right now what things should be moved into that API or not. So once we define those responsibilities clearly, I would actually love to move to just using Moneta and allow pluggability. As it stands today, there is an in-memory store that you can use — when you install the gem, that's what gets used by default, so you don't have to compile Tokyo Cabinet, which actually is pretty trivial — but if you implemented the same interface that that in-memory store implements, if you wanted to try writing an adapter, that would actually take care of everything at this point. Other questions? Yes? Sure — the question was a comparison between CouchDB's MapReduce for querying versus JSONQuery, and how that would work out.
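The adapter idea mentioned here — a swappable store behind a small key-value interface, with querying as the awkward part to abstract — might look something like this. The class and method names are hypothetical, not CloudKit's or Moneta's actual contract:

```ruby
# Hypothetical shape of a swappable storage adapter. An in-memory table
# like this is trivial; the query method is the part that plain
# key-value abstractions did not cover at the time.
class MemoryTable
  def initialize
    @records = {}
  end

  def get(key)
    @records[key]
  end

  def put(key, attributes)
    @records[key] = attributes
  end

  def delete(key)
    @records.delete(key)
  end

  # Table-style querying over stored attributes.
  def query
    @records.values.select { |record| yield(record) }
  end
end

table = MemoryTable.new
table.put('abc', 'kind' => 'todo',    'title' => 'foo')
table.put('def', 'kind' => 'profile', 'name'  => 'jan')
matches = table.query { |r| r['kind'] == 'todo' }
```

A Redis or Tokyo Cabinet adapter would implement the same four methods against its own backend, which is the pluggability being discussed.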
CloudKit actually started — well, it started as all kinds of different crazy ideas I trashed, but one of the most recent plausible ideas it went through was a Sinatra app that acted as an OpenID and OAuth proxy to CouchDB. I found it to be a little slow, because of just the hops in and out of JSON and Ruby, and in and out of HTTP on the back end. It also felt hackish to me, because CouchDB doesn't natively support — at least at the moment; I know they're working on this — the concept of identity, so I had to add my own metadata, like an underscore-user field, and then write a view to extract things by user, and it just felt like a lot of configuration, and I wanted it to be more automatic. So there are some things it will clearly be better at than this, but it kind of depends on what sort of things you're looking for. Other questions? All right, that looks like that's it. Thank you very much for your time. I appreciate it.