Okay, I'm glad that we actually got the book giveaway given, because nothing warms up a room like a giveaway with no giveaways. Okay, so my name is Jesse Wolgamott and I work in Houston, but really I'm just a developer for ChaiONE. And I like to say that I'm the Team America lead developer there. That's from my son. He's about two. And July 4th happened with the fireworks and he just kept yelling out, America! You know, instead of America. And so instead of the US lead developer, I like to say that I'm the Team America lead developer. The NoSQL stars that we'll talk about today: CouchDB, which was created and then brought into the Apache system, 2007. MongoDB, the new kid, although not the newest, in 2009. SimpleDB from Amazon in 2007. And RavenDB by Ayende this year. Okay, so there is an evil nugget of information about the NoSQL systems. And that's that they all share an internal structure that was inspired by Lotus Notes. But I don't think that means we should all run screaming from them, because it seems like there's some interest. In fact, who here has programmed against and connected to MongoDB? Awesome. What about CouchDB? Cool. What about both of them? Nice. Okay. And then finally, anybody use Amazon SimpleDB? Yeah, nice. And then RavenDB, I'm expecting very little. But anybody use it at all yet? Okay. So how can we determine what's the best to use? And I'll go ahead and say, subjectively, that there is no one right answer. I think that it all depends on what you want to get done with it. But what I want to do with this talk is go over how they're different. And then that may influence what you want to use. So we'll talk about the languages that they're each written in, and the API and maybe why that matters. Whether there's versioning, how queries and inserts are done, and then any intangibles. So MongoDB is the cool kid nowadays. So it's new. And I think that anytime you have a video written about you, you have achieved a status of fame. 
And if you don't know what this is, I just saw it this morning. It's "Mongo is web scale." And it's pretty cool. Okay. So the APIs themselves. CouchDB, the API that you talk to it with is a full REST API. And it is written in Erlang. MongoDB is C++. And each connection to it is language dependent. RavenDB is a .NET system. And it runs on Windows and has both a .NET API and a REST API. And SimpleDB is written in Erlang. And it's a name-value store. And it has a SOAP interface and a REST, what I'm going to call a REST-unlike, API. Okay. So what's on here isn't truly important. So with CouchDB, you connect to it through, you're just posting JSON, you're getting it by an ID, and then you're going to get JSON back. We would likely use it in a Rails project using a wrapper around the JSON. And one that's out there, it's pretty good. It's CouchPotato. And so you can see that you include a set of methods and then you declare your properties on it. So this one declares a property. And then later on, if you want to create it, you actually connect to the database and you say CouchPotato.database, save document, and then the user. So it's a little bit of a different syntax that you're using as compared to your general .save and .create. In MongoDB, it's much the same. So there are two main competing libraries right now, Mongoid and MongoMapper. And they share very similar syntax, where you still include methods, whether it's the MongoDB document or the embedded document, and then you either say that it's a field or a key. And at the very end, you say that you want to create on the class name itself and pass in your variables. So which do you use? MongoMapper or Mongoid? MongoMapper is definitely easier to get into. So Mongoid uses ActiveModel, which is Rails 3 and the new hotness. Mongo... wait, I don't know what I said, but Mongoid uses ActiveModel. MongoMapper is more like Rails 2, so it's just a class that uses Validatable. 
But on the plus side, MongoMapper has better associations. So you can say school.students and just walk the tree very easily. Mongoid, you can't, so you have to wrap that. You have to do that yourself. Mongoid uses Arel for your queries. MongoMapper just released a domain-specific language to build all of your where clauses. It's pretty nice. The last two, though, I think are the most important reasons. MongoMapper is more familiar coming from ActiveRecord, for better and worse, but it's also easier to get into. So I think you could drop a Rails developer down into a project that is using MongoMapper and they can get through it. They can know what's going on and be productive in it. Mongoid, I think it takes a little bit more research. But Mongoid has a declarative master-slave setup. Like you can say that this query is okay to go out and ask the slave databases for their information, or no, I have to be on the master, which I think is fairly interesting. On to RavenDB. So it's your general PUT and GET, and it's just JSON. And then also, down at the bottom, is some LINQ code that probably at least half of you can read and not want to write. But it's there, and so it's set up to easily get into LINQ. And in fact, when you do your querying and your map reducing, all of your indexes are defined as LINQ queries. This is Amazon SimpleDB, and I claim it is not RESTful. So this is a put request, and you can see up there that it's &Action=PutAttributes, and you set all your attributes. So it's attributes and values, and then those are each stored, and you query them later on. And I'm not going to spend too much time on SimpleDB throughout the whole talk, so I think you can guess how it rates on our scoreboard in the end. I don't like this, but it does get the job done, and so you can build things that easily store with Amazon and get them back later. Okay, so our scoreboard for APIs. CouchDB gets a star, RavenDB gets a star, and MongoDB gets a star. 
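To make the API contrast above concrete, here is a rough Ruby sketch that only builds the two kinds of request and never sends them. The database name, item name, and fields are made up for illustration, and the SimpleDB authentication and signing parameters are left out, so treat it as a picture of the two request shapes, not a working client.

```ruby
require "net/http"
require "json"
require "uri"

# Hypothetical document; the names and fields are invented for this sketch.
user = { "name" => "Jesse", "city" => "Houston" }

# CouchDB style: the URL names the resource, the HTTP verb names the action,
# and the body is plain JSON.
couch_put = Net::HTTP::Put.new("/users/jesse")
couch_put["Content-Type"] = "application/json"
couch_put.body = JSON.generate(user)

# SimpleDB style: the action travels in the query string, and every attribute
# becomes a numbered name/value pair. Signing parameters are omitted here.
params = {
  "Action" => "PutAttributes",
  "DomainName" => "users",
  "ItemName" => "jesse"
}
user.each_with_index do |(name, value), i|
  params["Attribute.#{i + 1}.Name"] = name
  params["Attribute.#{i + 1}.Value"] = value
end
simpledb_query = URI.encode_www_form(params)

puts "#{couch_put.method} #{couch_put.path} #{couch_put.body}"
puts "/?#{simpledb_query}"
```

In the first request the verb and URL carry the meaning; in the second, the verb says little and the real action is a parameter, which is the reason for calling it REST-unlike.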
There's no star for SDB. Versioning. Probably too small a text, I apologize for that, but we'll get through it. So CouchDB has versioning baked in, but really it's MVCC, multi-version concurrency control. So it's not really versioning. While it stores versions of documents on top of each other, only the most recent is sent if you're doing replication. So you're not going to have all of the versions. You can't count on it to track your versions. You can count on it to be able to bring people up to date. So this is really good for offline databases, syncing and bringing them back up to date when they come back online. But it's not good for being able to track your blog comments or versions of your blog posts. So this is good in multi-master as well. So Mongoid has a nice versioning where you can include a versioning module, and RavenDB has versioning built in as well. So for CouchDB and MongoMapper, you're generally rolling your own, which in a document is not as hard as you might think, right? So you can just say that a document has many versions of that document, and each time you update it, take the copy and save it inside itself. So I don't think that's as big of a deal as it might seem, but having it just baked in is nice. So RavenDB has it baked in. This is the configuration where you tell it, obviously in a .NET system, that you want to store, I think here, 50 max revisions. On the Mongoid side, you do the same thing, where you say I want to include versioning, and then you say that the max versions over there is five. So for versioning, Raven gets one, and I don't think I gave Mongo one here, but I probably should have. Okay, so queries. CouchDB, every query is a JavaScript map reduce implementation, which is pretty cool, I think. So the cool part about it is that all of your documents have all of these different collections of attributes. 
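The roll-your-own versioning described above, where each update saves a copy of the document inside itself, can be sketched in plain Ruby. The class name, the `max_versions` option, and the sample data are hypothetical; this is not how Mongoid or RavenDB implement it, just the idea with a cap on how many old copies are kept.

```ruby
# Hypothetical class for illustration; not a Mongoid or RavenDB API.
class VersionedDocument
  attr_reader :attributes, :versions

  def initialize(attributes, max_versions: 5)
    @attributes = attributes.dup
    @versions = []
    @max_versions = max_versions
  end

  # Copy the current state into the embedded versions list, drop the oldest
  # copies beyond the cap, then apply the change.
  def update(changes)
    @versions << @attributes.dup
    @versions.shift while @versions.size > @max_versions
    @attributes = @attributes.merge(changes)
    self
  end
end

post = VersionedDocument.new({ "title" => "Draft" }, max_versions: 2)
post.update({ "title" => "Second draft" })
post.update({ "title" => "Third draft" })
post.update({ "title" => "Final" })

puts post.attributes["title"]
puts post.versions.map { |version| version["title"] }.inspect
```

With a cap of two, the original "Draft" copy falls off once the third update arrives, which mirrors the max-versions setting mentioned for both Mongoid and RavenDB.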
And so when you say, hey, give me this information, it does a map reduce to say, okay, what functions do I need to run to decide if I should include this document in the collection, and then optionally reduce it down. Do a sum or a reduce or whatever you want to do on it. So in CouchDB, you've got your documents, and then you also have these special documents called design documents. And that's where you actually declare which views you want. So you declare that if you want to be able to get all of your documents, you need to have an all view, or by last name, or total purchases, which here does a map and then a reduce. So the cool part is that once that's done, once you've been able to do a map reduce, and you're getting your data by something that you've indexed on, it's fast. But what's not hot about it is you have to pre-declare all of your views. So this flies right in the face of something in Rails that we take for granted a lot, which is find_all_by and then that name, or passing your conditions, or doing all of these queries. You don't get that, and you can't do it, other than with an ad hoc query, which is pretty slow. But, and I'll say this about the map reduce in Couch, if you set it up with multiple servers that are talking to each other, so they're in a replicated fashion, map reduce there is pretty cool. MongoDB. MongoDB has dynamic queries, so you don't have to pre-declare your views like you do in CouchDB. It does have map reduce, but as I learned today from Scott, it's not concurrent, which means that it's a single thread that goes through all of your documents in a linear fashion. So it would be cooler if it did take your 100,000 documents, split them up into arrays of 10,000, and process them all separately, but it doesn't do that, at least not yet. 
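For anyone who has not written a view before, the map and reduce steps described above can be sketched in plain Ruby. Real CouchDB views are JavaScript functions stored in a design document; the documents, field names, and "total purchases by last name" view below are invented for illustration.

```ruby
# Hypothetical documents with different sets of attributes.
docs = [
  { "last_name" => "Smith", "amount" => 20 },
  { "last_name" => "Jones", "amount" => 5 },
  { "last_name" => "Smith", "amount" => 15 },
  { "type" => "note", "text" => "no amount here" }
]

# Map step: decide whether a document belongs in the view and, if so,
# emit a key/value pair for it.
map = lambda do |doc|
  doc.key?("amount") ? [[doc["last_name"], doc["amount"]]] : []
end

# Reduce step: collapse all the values emitted for one key.
reduce = ->(values) { values.sum }

# Building the view: run map over every document, group by key, then reduce.
emitted = docs.flat_map { |doc| map.call(doc) }
view = emitted
  .group_by { |key, _value| key }
  .transform_values { |pairs| reduce.call(pairs.map { |_key, value| value }) }

puts view.inspect
```

The grouped result is what makes later lookups by key fast; the cost is that the map has to visit every document at least once to build it.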
I included an example here that, again, is too small, that's very bad Keynoting, but it's just basically saying Event.where the school ID equals a passed-in school ID, and then it makes use of some scopes, so .publish.future, and then it sorts it and limits it to 20 items and then says .all. So being able to chain together conditions and scopes, all of that just works the way that it does in Rails 3. SimpleDB querying. There's no sorting, which I think is a little bit lame. You can do things like equal, not equal, less than, greater than, equal to, and then I included here how the right_aws Amazon Web Services gem does it. So you do querying based on intersections of attributes, so if you imagine that you've got documents and they all have attributes on them, you're looking for documents that intersect each of them. So the downside is that it returns XML and then you have to parse the XML. You have a maximum of five seconds before it will likely time out. And then RavenDB. This is, I think, one of the cooler things. So RavenDB indexes are MapReduce, just like CouchDB, but they process them in the background. So even as you are inserting records into Raven, you don't pay the index cost whenever you're inserting. So it takes it and says thank you and then processes the index in the background, and then your results are stored on disk and those results are then passed back. The downside of this is that your results are eventually consistent, which we hear a lot about in cloud services, but especially here it means that if I insert a record and request that record right back, it may not have the changes that I just sent in. So you have to be okay with that, or specifically tell Raven, don't just give me the quick query, do the calculations for me, get the absolute most current version. Okay, so back to our scoreboard. CouchDB star, MongoDB star, it's a sad day for SDB, and RavenDB star. 
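The background indexing trade-off described above can be shown with a toy in-memory store. This is only a sketch of the idea, not RavenDB's API: the class, the `process_pending` step standing in for the background worker, and the `wait_for_non_stale` option are all hypothetical names.

```ruby
# Hypothetical store for illustration; inserts return immediately and the
# index catches up later.
class BackgroundIndexedStore
  def initialize
    @pending = []                                    # inserted, not yet indexed
    @index = Hash.new { |hash, key| hash[key] = [] } # team => documents
  end

  # Accept the document and say thank you; no index cost is paid here.
  def insert(doc)
    @pending << doc
    :ok
  end

  def stale?
    !@pending.empty?
  end

  # Stand-in for the background indexing worker.
  def process_pending
    @pending.each { |doc| @index[doc["team"]] << doc }
    @pending.clear
  end

  # By default, answer from whatever the index holds right now. Passing
  # wait_for_non_stale forces the index to catch up before answering.
  def query(team, wait_for_non_stale: false)
    process_pending if wait_for_non_stale
    @index.fetch(team, []).dup
  end
end

store = BackgroundIndexedStore.new
store.insert({ "team" => "Houston", "score" => 3 })

quick = store.query("Houston")
current = store.query("Houston", wait_for_non_stale: true)

puts quick.size
puts current.size
```

The first query comes back empty even though the insert succeeded, which is the eventually consistent behavior you have to be okay with; the second pays the indexing cost at query time to get the current answer.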
Okay, so on to inserting records. Okay, so CouchDB, I think that the API is pretty cool, but having to recalculate the MapReduce is not cool and slows down over time, and so we'll see that inserting records into CouchDB is not necessarily a problem if you're okay with it, but it's slower. Mongo inserts, they update in place, which is fast. What I mean by updates in place is that it's in memory: the updates come in, it makes the change to the object in memory and returns to you and says, okay, I've taken your change, and then it writes the change to disk a little bit after that. So the problem comes in when the power turns off before those changes are written to disk: that change will go away. I don't think this is a good thing, but if you're okay with it, I think it's pretty good. In the vast majority of cases of software that we write, this is okay. Yes, nice. Or can, right? Yeah, yeah. So I think that's awesome. 1.6 actually also added in some auto-sharding stuff and is pretty cool. So I didn't know about that, thank you. That's good, use that. And the next time I give this talk, this will be different. Okay, so some math. So I took this from a blog that I found, and the numbers look good, and what I found was similar, but this is pretty dramatic. So on the left is the number of seconds it actually takes, and along the bottom is the number of inserts you're doing. So MongoDB, which is the nicer of the curves, vastly outperforms the CouchDB install. Do you think that it has changed since then? I respect that. I saw, in the loading of some data that we'll look at in a second, a five times difference, which obviously is not this graph, but CouchDB was five times slower in loading than MongoDB. But I like CouchDB, but from what I've seen, and I think it's a connection issue, yes. I can absolutely see that, right? 
Because even in mine, where I was inserting 24,000 baseball game records, it was one insert per, one connection per, and so I think that sheer connection time issues start taking effect. So cool. So I'm going to give Mongo and Raven the stars here. Okay. So we come to extras. I think that MongoDB has the best Ruby integration, and that SimpleDB has easy scalability that you don't have to worry about. RavenDB runs on Windows, and I think that's a downer. They use... Yeah, yeah, for sure, absolutely. And I think that he'd said that the biggest problem there was that they use the Extensible Storage Engine, right, that Exchange uses to store large data, and so they're going to have to basically rewrite that, or use a different version. Sorry. Right. Also RavenDB, as of when I checked, requires a commercial license for non-open-source projects, which I think is a downer. But I don't think that we should discount it just because of that. I think that they're doing some really novel things, and so that's why I include them here in the talk. CouchDB has multi-master, which I think is really cool, and offline replication, which I think is really cool too. Both of them you can get started with. So there are free levels on Heroku that you can just get started with and start using and playing around with. And it's actually really simple to get running on your Mac as well. So for the extras... Oh wait, I think... No, this... Oh yeah, no, that's right. So for SDB, its simple scalability and its free tier gave it a star. I know, right? RavenDB, not quite with the extras. So as it stands, I think it's Mongo or Raven, but I'd probably go Mongo. But CouchDB is more established. It's there, it's respectable, I like it a lot. Okay, so what I was talking about earlier is I was trying to come up with something to show. And so I found the data for every game from 2000 to 2009 and just loaded it in. And so there's about 24,000 games in that decade. 
And in Mongo it took 77 seconds and then CouchDB took 355. So entirely subjective, but I did experience it. Okay, we have a couple minutes. So what's sort of interesting is you're able to have any Rails project talk to both. So this Rails project, I've got the idea of a Mongo game and the idea of a Couch game, and we can do a little bit of querying on it. But I wanted to show, here's the Couch explorer, in which we have a bunch of documents. We've created the documents, but we have not accessed the views yet. So if we're looking at our design documents, they aren't there, but as we look at any one of these, we can see that this was Tampa Bay and, or to be announced, nice, versus Kansas City. And if we want to do something simple, if we're going to ask Mongo, so we've got MongoGame and CouchGame. And so we want to see how many records. So MongoGame.count, and then we can do things like .where the home team is Houston, .count, and we can get that. So if we want to do much the same with CouchGame, over on Couch, we have to pre-declare our views. So we've got views named all, by visitor, and by home. And so if we want to do a little, it's hard to call it a benchmark, but I just want to run this query twice, and then we're about out of time, but run this query twice and show you the difference between when you have to load and create the indexes versus when you don't. So right now the by visitor index does not exist, and so when we tell it to access it, it's going to take about 30 seconds to go through all 24,000 documents and build its index. So while it's doing that, good demo, right? Yeah, I think I'm coming off as a Couch hater and I'm not, but I just want to show the true difference. So that took 26 seconds for it to go through and build it. You asked for it again. Sorry? You're a SimpleDB hater. Yeah, that's absolutely true. I will wear that hat. 
Okay, so the first time that we asked it, it took, does this work? Yeah, so 26 seconds. And then the second time, 2.8 seconds. So what was going on there? Well, we saw here that design documents, it didn't have anything. And so now it does have one called by visitor, and we can look at it, and I've told it to reduce, but if we don't, then we're able to come in and see all of our games. So once we've built the index, it's actually very fast to query data on it if you're using the index, but it takes a while to generate it. Now, this doesn't mean that if I insert a new record, it would have to rebuild the entire thing. That's not the way it works, but there is startup time involved with Couch. I think I'm going to wrap it up. It does. Well, maintain. You mean like it's actually persisted, right? It's there, yeah. That's right. In this implementation, the Couch game where you include CouchPotato persistence, I've declared four views here: all, by visitor, by home, and then if we wanted to see all the times that Houston and Cincinnati played, we could do that. So it is not creating the views until you actually use them. In earlier Rails Couch versions, you'd have to do this and then you'd have to run a command that actually created all of your design documents. So I like this one better, but you do run into, you know, there exists a condition where if you just went live, why did this page take 26 seconds to load? Well, it could have been building the index. I haven't seen it. They may. Just do it. And then it's almost where, if you deployed, you could then just call it, right? Call each of your indexes, as far as automating it. I haven't seen it. It may exist, but maybe. I think that'd be a good practice so that your users aren't the first to kind of get screwed over by a new index. What else? Any other questions? Does anyone agree with my sort of loving on MongoDB? No one? Yes, thank you. Okay, so I think I will wrap it up. Thank you very much.
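The slow first query from the demo, and the idea of calling each index right after a deploy, can be sketched in plain Ruby. The `LazyView` class and the sample games are hypothetical and only model the behavior over a fixed set of documents: the index is built the first time a view is queried and reused after that. Real CouchDB also updates an existing index incrementally as documents are inserted, which this sketch leaves out.

```ruby
# Hypothetical view that builds its index on first use.
class LazyView
  attr_reader :builds

  def initialize(docs, key)
    @docs = docs
    @key = key
    @index = nil
    @builds = 0
  end

  # The first call pays for a pass over every document; later calls are
  # simple lookups against the stored index.
  def query(value)
    build_index if @index.nil?
    @index.fetch(value, [])
  end

  private

  def build_index
    @builds += 1
    @index = @docs.group_by { |doc| doc[@key] }
  end
end

# Made-up sample games.
games = [
  { "home" => "Houston", "visitor" => "Cincinnati" },
  { "home" => "Kansas City", "visitor" => "Tampa Bay" },
  { "home" => "Houston", "visitor" => "Tampa Bay" }
]

views = {
  by_home: LazyView.new(games, "home"),
  by_visitor: LazyView.new(games, "visitor")
}

# Warm-up step to run after a deploy: touch every view once so that a real
# user is never the one who triggers the build.
views.each_value { |view| view.query(nil) }

houston_games = views[:by_home].query("Houston")
puts houston_games.size
puts views[:by_home].builds
```

Running the warm-up as part of a deploy script moves the one-time build cost away from the first page load.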