Hi, I'm Will Leinweber. My talk is Relaxing with CouchDB. Before I start, I need to say a little bit about the title. It's actually a requirement: somewhere in the license it says you have to put the word "relax" within, I think, 20 words of "CouchDB". So they have it there. When you start it up, it's there. Even when you compile it, it's there. They're also writing a book, and I think it's somewhere in this book that they're writing. I haven't found it.

What I want to go through in this talk is why you should be excited about CouchDB and why you should want to try it out. It is significantly different from the databases we're all used to, relational databases, so I also want to give a little bit on how to start approaching CouchDB and how to get into that mindset.

So first, I want to talk about what CouchDB is. I saw there are a couple of people using it. Has everyone at least heard of it, though? All right. Well, it's a document database, and there's no schema: each document can have a different structure from the next one. That's a little weird, but usually it's not much of an issue, because all your user documents are going to look the same. If you're doing a blog, all the posts are going to look the same. If you have books or something in there, they'll all look the same. They're all mixed together and jumbled in one database, and nothing enforces the structure, but that's not an issue. What is nice is an example someone was telling me about: they want to do resumes, and everyone's resume is different. Some people have these fields, and some people won't. Coming up with a database schema that covers every possible thing is pretty annoying.

It's written in Erlang, and from that it's very concurrent and scalable. The choice of Erlang itself shouldn't be too much of a reason to use it or not.
It's just that some of the things they got from it allowed them to develop these features much quicker.

The other exciting thing about CouchDB is that it's built for the web. It's built for web applications, whereas relational databases have been around for a long time and don't quite fit what we want to do with databases. That's especially true with Rails, since we treat the database just as a place to store and retrieve data, and a lot of the time we don't use any of the extra features. For me, there's no difference between using MySQL and Postgres, because I rarely take advantage of the features that Postgres would offer over MySQL.

The interface for CouchDB is RESTful, and this is pretty cool. You GET a document, you PUT it, you DELETE it. It has all the same conventions that we're used to as Rails developers for a RESTful thing.

The last thing is that what you're basically doing is storing and retrieving JSON documents. By itself this doesn't seem that awesome, but it's actually the reason I like it the most: you can solve some problems that you just can't solve with a relational database. You all know what JSON looks like. It's a subset of YAML, of sorts. In this example, if you're writing a book, it has a lot of chapters. If you were going to do this in a relational database, you'd have to have a separate table for the chapters. With a document database, you can have it all right in the same place, where it makes sense. The chapters don't really make sense existing by themselves, but they make sense in the context of the book. Similarly, you could have a blog where all the comments live inside the post, or anything like that.
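A minimal sketch of the kind of document described here, written as a JavaScript object and serialized to JSON. The field names and values are illustrative assumptions, not the contents of the slide.

```javascript
// Hypothetical book document: the chapters live inside the book itself
// instead of in a separate table.
const book = {
  _id: "example-book",
  type: "book",
  title: "An Example Book",
  chapters: [
    { number: 1, title: "Getting Started" },
    { number: 2, title: "Views" },
  ],
};

// CouchDB stores and returns documents as JSON text like this.
const json = JSON.stringify(book, null, 2);
console.log(json);

// Reading it back gives the book with its chapters still nested inside.
const roundTripped = JSON.parse(json);
console.log(roundTripped.chapters.length); // 2
```

Because the chapters travel with the book, saving the book saves everything that belongs to it in one step.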
The reason I started using CouchDB several months ago is that I had this problem where I had my main model, the thing the user was actually interacting with, but it had many other things hanging off it, like a book has many chapters, and I needed to do versioning on it. If any of the chapters were touched, or any of the information about the book, I needed to make another revision. With a relational database that's really painful, because you have to keep track of all these different things. But when they're all in the same place, you can make another revision really easily.

Another feature is that it doesn't really have locking in the traditional sense of a relational database. This is the awesome thing I drew in OmniGraffle. It's really high quality. With a traditional locking database, if a write comes in and other people want to read that data, they have to wait for the write to finish. The way CouchDB does it, when you write you're not actually changing the data that's there, you're adding another thing, so the old one can still be read. These first two reads would see the old data, and the last one that comes in will get the new data. So you don't have to slow down: you can constantly be throwing out read requests, even while writes are going on against the same data.

Like I mentioned, it's add-only, and not just when adding documents; updates are add-only too. The way they have it structured, even deletes don't actually remove the existing data. A consequence of this is that your CouchDB database is never in a bad state. When you hit Control-C and stop the server, it's down. It doesn't have to wait and write anything. And when you start it back up again, it's up. It doesn't have to check for anything.
You could pull the power cord from your CouchDB server and plug it back in, and you're not really going to lose anything except whatever was trying to reach it while it was down.

The replication they do is pretty cool. It's built in: two CouchDB servers can replicate with each other. And then there's this idea of eventual consistency. If you want more depth on this, the book I mentioned earlier goes into it in much more detail; there's a little book link on the CouchDB website. So say you have two databases, and I make a document here and the other guy makes a document there. When they go to replicate, there could be collisions. They have a programmatic way of deciding so that the same one always wins, and the older versions are never thrown away. So it's up to you to decide what you want to do in case of conflicts, but it's all there.

There's another cool thing that I'm not going to get into too much. Since CouchDB is a web server, because it does everything over HTTP, there are a couple of people doing things like CouchDB-only blogs and a Twitter client, where all the code is in CouchDB, all the JavaScript and everything. And since replication works, they all kind of replicate amongst each other. It's this amorphous kind of application development. One of the developers likened it to TI calculators in high school, where everyone hacked on their own version and spread it around.

On the multi-tenant topic from yesterday: you can do this pretty easily. There are still some of the same issues, but you can do multiple databases very easily because the database is just part of the URL when you're making the REST requests. So if you're doing something like Basecamp, where everyone's data is separate, you can put each customer in a separate database without spending much time creating databases.
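Since the database is just a segment of the URL, a per-customer layout can be sketched as plain string building. This assumes CouchDB on its default port 5984 on localhost; the `account_` prefix and both helper names are hypothetical.

```javascript
// Assumption: CouchDB is listening on its default port, 5984, on localhost.
const SERVER = "http://localhost:5984";

// Hypothetical helper: one database per account, named after the account.
function databaseUrl(account) {
  // CouchDB database names must be lowercase, so normalize before building the URL.
  const name = "account_" + account.toLowerCase().replace(/[^a-z0-9_]/g, "_");
  return SERVER + "/" + encodeURIComponent(name);
}

// A document inside that account's database is one more path segment.
function documentUrl(account, docId) {
  return databaseUrl(account) + "/" + encodeURIComponent(docId);
}

console.log(databaseUrl("Acme")); // http://localhost:5984/account_acme
console.log(documentUrl("Acme", "project-1")); // http://localhost:5984/account_acme/project-1
```

Switching tenants then means switching the URL prefix; nothing else about the requests changes.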
It's really easy to integrate with other services, such as Lucene or other full-text search things. They have a nice little API for passing the data back and forth. There's also a way to store files: any document can have any number of file attachments. And they do it in a smart way. When you get a document that has file attachments, it doesn't start streaming the files over. It just gives you some of the metadata, and then when you actually want a file, you make a separate request to retrieve it.

The big difference is that it uses this concept of views instead of queries. And the views are, sorry, just a second. The views are the biggest conceptual difference. With all the other stuff, it's kind of like a database with a couple of extra features, because it's the same idea of storing and getting data. But these views are the hardest stumbling block. They're stored as documents just like anything else, so when you replicate or whatever, they're all there. The views use this thing called a view server. By default, when you install it, you get a JavaScript one built on SpiderMonkey. There's a way to swap that out, so you could use Ruby if you really don't like JavaScript, and I've seen some Python ones. I haven't personally done this, because JavaScript is powerful enough, and from all the AJAX stuff everyone's familiar with JavaScript, and it works really well.

The concept it uses is MapReduce, which is sort of a borderline buzzword, but it works really well. You have the map step, which goes through every single document, and then the reduce step, which is optional. The first time you run a view, it has to go through every document, which is horrendously slow. That's true even if you have books in there and you also have articles or something else.
It can't just go through the books, because it doesn't know which documents are books and which aren't until it has gone through each one. But once that's done, it builds an index, so it doesn't have to do it again and it's much faster.

In the map step, you go through each document and emit a key and a value. It sorts the results by key, and that's how it persists them and how you query it once you have a view. I already talked about the persistent index. Oh, and the other thing: it has the index, and then for any new data that comes along, deletions or updates or new documents, the next time it only has to go through those new things. So as long as you keep your views fresh, everything stays nice and speedy. There are some built-in mechanisms for that: you can maybe trigger it on every 100th write, or you could have a cron job run every couple of minutes to hit your views in case no one's actually accessing those particular ones. And when you're writing a web application, you know ahead of time which queries are going to be run, so that initial slow build really isn't a problem.

Since it's built on HTTP and REST, you get some benefits from that. The biggest one is that it's cacheable using existing tools like reverse proxies and load balancers. You can have a cluster of CouchDB servers and you don't need special tooling to do it. The standard stuff you're already using will just work. And since it's using REST, which we're all familiar with, it's easy to use.

So, getting started. I want to get you all on the right path to go out and try this.
What you have to do is install it from head, because the latest release version doesn't have a lot of what the newer gems and tools out there need. They require at least 0.8, and I think the release version is only, I don't even know, but the latest SVN head is something like 0.9, and they're working their way to 1.0. You need to already have Erlang installed, because that's what it's built on. You need SpiderMonkey, the JavaScript engine. And if you're on the Mac, you need this thing called ICU, which is an internationalization library, and the error message you get if you don't have it is not too clear. So that's what you need.

I think the best tool out there right now is CouchRest from jchris. This is the GitHub page. It really goes a long way to provide a nice API for you to use in Ruby. The default API of CouchRest is fairly simple; it provides just a thin layer of tools that you can build your own stuff on. A lot of people have built more complex stuff on top of CouchRest, but this is a great place to start.

So this is a code example of what CouchRest looks like. This database! method, with the bang, sets up the database you're going to use. If it doesn't exist, it'll create it; if it already exists, it'll just use it. This is one way you could do a multi-database thing: you could just put the username in the string there, or however you want to shard. To save, you just give it a hash, and it'll turn that into JSON and save it. You get a response back with the revision number, the ID, and the status. So getting data in and out is really easy. And just to prove that it does actually work over HTTP, you can go to the command line and just GET the URL there, and you get the data back.
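The slide's CouchRest code is not reproduced in this transcript. As a rough illustration of what a client library sends underneath, here is a sketch that only describes the HTTP requests without sending them. The server address, the `books` database, the document, and the `describeRequest` helper are all assumptions made for the example.

```javascript
// Assumptions: default port 5984 and a hypothetical database named "books".
const DB = "http://localhost:5984/books";

// Describe the HTTP request behind each basic operation without sending it.
function describeRequest(action, doc) {
  const url = DB + "/" + encodeURIComponent(doc._id);
  switch (action) {
    case "get":
      return { method: "GET", url };
    case "save":
      // The hash (here, an object) is serialized to JSON for the request body.
      return { method: "PUT", url, body: JSON.stringify(doc) };
    case "destroy":
      // Deleting requires naming the current revision of the document.
      return { method: "DELETE", url: url + "?rev=" + encodeURIComponent(doc._rev) };
    default:
      throw new Error("unknown action: " + action);
  }
}

// Hypothetical document and revision string, for illustration only.
const doc = { _id: "example-book", _rev: "1-abc", title: "An Example Book" };
console.log(describeRequest("save", doc));
console.log(describeRequest("get", doc));
console.log(describeRequest("destroy", doc));
```

A successful save is answered with a small JSON body carrying `ok`, `id`, and `rev`, which is the revision number, ID, and status mentioned above.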
I guess that's just proving that I'm not lying about it being over HTTP. If you change something and save it, the revision number changes, the change is there, and everything else stays the same. And if you destroy it and then try to find it again, it'll raise an error.

Also built in is CouchDB's default user interface. Once you install it, you go to the port it's running on, slash _utils, and you get this nice interface for viewing your data, and you can do some editing right in there. You'll see this is the document from the previous example. It stores the old version, but it only stores it until you compact the database. It's tempting to use this as built-in versioning, but you can't, because periodically you'll want to compact your database to save space, and you'll lose all your old versions when you do that.

The view thing, I find, is the hardest thing to wrap your head around, so I want to spend some time going over views. This is JavaScript here, and what it does is take in a doc, a document. The view documents are namespaced. So this one is books, which is just an arbitrary word, and then all is the name of this particular view. What it does is go through every single document, and if it's a book, where type is just an arbitrary field on the document, we emit it. We can use null for the key, since each row still comes back with the ID of the document, and then we emit the entire document as the value. And this is the view in the web interface. You have all the IDs, and then you have all the data, because that's what we emitted: the entire document. So this is a fairly trivial view, but it's just a primer.
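The books/all view described above can be sketched as follows. The map function follows the spoken description; the `emit` stand-in, the loop, and the sample documents are assumptions added so the sketch runs on its own, since in CouchDB the view server supplies `emit` and feeds in the documents.

```javascript
// Stand-in for the view server: collect emitted rows along with the
// ID of the document that produced them.
const rows = [];
let currentDoc = null;
function emit(key, value) {
  rows.push({ id: currentDoc._id, key, value });
}

// The map function as described in the talk: only books are emitted,
// with a null key and the whole document as the value.
function mapAllBooks(doc) {
  if (doc.type === "book") {
    emit(null, doc);
  }
}

// Hypothetical documents of mixed types in one database.
const docs = [
  { _id: "b1", type: "book", title: "First Book" },
  { _id: "a1", type: "article", title: "An Article" },
  { _id: "b2", type: "book", title: "Second Book" },
];

for (const doc of docs) {
  currentDoc = doc;
  mapAllBooks(doc);
}

console.log(rows.map((row) => row.id)); // [ 'b1', 'b2' ]
```

Every document passes through the function, including the article, which is why the first build of a view has to touch the whole database.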
What you can do that's more powerful is, instead of emitting only the books, only one type of document, you put the type in as the key and the document as the value. What you have now is a queryable view. For this example I made, I think, 30 documents, 10 of each type: article, book, and user. The old view from the previous slide, the books-only one, returns 10 of those. Call this new one "by type", because that's what we're doing here, a query by type. If you just call it by itself, it returns everything. If you pass in the key book, we get 10, which is the same 10 we got before. What's happening is that when it builds this view to disk, and I said it persists the view, it stores it as a tree ordered by key, which in this case is the type, because that's what we're emitting. So it'll go down the tree and just pull out the ones that match. That's how you can do customizable queries. And this is very powerful, because once the view is built and you keep it updated, this is instant. It doesn't have to do any extra work as long as the view is reasonably fresh.

As an example of one with a reduce, this one is just doing the count. We're still doing the same map step as before, so it's by type. Those results then get passed into the reduce function, and this one just returns the length of the values array. In the first example here, without passing in any parameters, we get three, which doesn't make sense, because there are 30 documents. What happens is that you need to tell it to group the results.
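Here is a small in-memory simulation of the by-type view, its key lookup, and a grouped count. It is a sketch under assumptions, not CouchDB itself: `buildView` and `queryView` are hypothetical helpers, `emit` is passed in as an argument rather than being the global that CouchDB provides, and the reduce adds summing on rereduce, which the length-only version in the talk does not have. It does not try to reproduce the "three" seen on the slide.

```javascript
// Thirty hypothetical documents: ten each of article, book, and user.
const docs = [];
for (const type of ["article", "book", "user"]) {
  for (let i = 0; i < 10; i++) docs.push({ _id: `${type}-${i}`, type });
}

// Map step: key on the type, emit the whole document as the value.
function mapByType(doc, emit) {
  emit(doc.type, doc);
}

// Reduce step: count rows. Summing on rereduce lets partial counts combine.
function countReduce(keys, values, rereduce) {
  return rereduce ? values.reduce((a, b) => a + b, 0) : values.length;
}

// Build the rows sorted by key, standing in for the persisted index.
function buildView(allDocs, mapFn) {
  const rows = [];
  for (const doc of allDocs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

// Query the rows: optionally filter by key, optionally reduce, optionally group.
function queryView(rows, reduceFn, { key, group = false } = {}) {
  const matched = key === undefined ? rows : rows.filter((r) => r.key === key);
  if (!reduceFn) return matched;
  const run = (rs) =>
    reduceFn(rs.map((r) => [r.key, r.id]), rs.map((r) => r.value), false);
  if (!group) return [{ key: null, value: run(matched) }];
  const groups = new Map();
  for (const row of matched) {
    if (!groups.has(row.key)) groups.set(row.key, []);
    groups.get(row.key).push(row);
  }
  return [...groups].map(([k, rs]) => ({ key: k, value: run(rs) }));
}

const byType = buildView(docs, mapByType);
console.log(queryView(byType, null, { key: "book" }).length); // 10
console.log(queryView(byType, countReduce, { group: true }));
```

With grouping on, the reduce runs once per distinct key, which gives one count each for article, book, and user.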
One thing to watch out for: when you're testing views in the web interface that comes with it, it automatically passes this grouping option in. This is a common stumbling point, because you test your view out in that interface and you get the right results, but then when you use it from your code it's all wrong. You have to remember to tell it to group the results. Once you do that, we have article, book, and user, and each one is ten. And if you only want part of that data, it's the same thing as before: we can pass in book and then we get the ten.

So, the versioning thing, which is the big problem that brought me to CouchDB. One way to solve it is to have a version number that just increments, which you have to increment yourself in your code, and then some sort of master ID that links all the versions together. What I do is, when someone creates a new thing, that first one is version zero and gets the master ID, and from then on we can add new versions to that group. Any other data just sits there in the document.

And how does this look when you want to get it back out? This view returns the newest version, and it does it fairly simply. For the key that we're emitting, the one we're going to be searching on, we use that master ID field that we added. Then we emit the whole document as the value. What makes this work is the reduce step: we just go through all of the versions, pick out the one with the max version number, and return that. In the reduce step, they don't come sorted by version number, so we can't just pick the last one off.

One of the more complicated things, and there's a great article on it if you search for CouchDB "joins", with joins in quotes because they're not really joins, is view collation. This is one of the more non-obvious ways to use the view capability of CouchDB.
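To make the newest-version view from a moment ago concrete before getting into collation, here is a sketch of the map and reduce pair with a small in-memory grouping step. The field names `master_id` and `version`, the sample documents, and the `newestVersions` helper are assumptions; in CouchDB `emit` is a global and the grouping is done by the server.

```javascript
// Hypothetical versioned documents: three versions of one thing, one of another.
const docs = [
  { _id: "a", master_id: "m1", version: 0, title: "Draft" },
  { _id: "b", master_id: "m1", version: 2, title: "Final" },
  { _id: "c", master_id: "m1", version: 1, title: "Revised" },
  { _id: "d", master_id: "m2", version: 0, title: "Other" },
];

// Map step: key on the master ID, emit the whole document as the value.
function mapByMaster(doc, emit) {
  if (doc.master_id !== undefined) emit(doc.master_id, doc);
}

// Reduce step: values arrive in no particular version order, so scan for the
// highest version. The same scan also works when combining partial results.
function newestReduce(keys, values, rereduce) {
  return values.reduce((best, doc) => (doc.version > best.version ? doc : best));
}

// Stand-in for a grouped query: collect values per key, then reduce each group.
function newestVersions(allDocs) {
  const groups = new Map();
  for (const doc of allDocs) {
    mapByMaster(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const result = {};
  for (const [key, values] of groups) {
    result[key] = newestReduce(null, values, false);
  }
  return result;
}

const newest = newestVersions(docs);
console.log(newest.m1.title); // Final
console.log(newest.m2.title); // Other
```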
Since you can have these deep structures, for the common post-and-comments idea it's possible to have the comments inline in the post. That's attractive because it's all together, all one thing. The problem is that if a lot of people are trying to comment at once, you're going to get more conflicts, and that can be hard to deal with. So you can store them separately, just as you would in a relational database. But then the problem is, how are you going to get them all together? You could do two queries, sure: get my blog post, figure out what it is, and then get all the comments. But it's nicer if there's a way to do it all in one step.

For this example, we have the ID and revision number, which come with CouchDB, for our post over here and then for our comments. The comments also carry a field that references the post they belong to, sort of like a foreign key in a relational database. So when you're writing your view, you can emit for either case, whether it's a post or a comment. And here's the key thing. Before, I was just using one single value for the key, and in most cases that's what you want to do. But you can have arbitrary JSON structures as your keys, and CouchDB has a documented way of ordering them: how it will crawl through and how it knows how to compare two different JSON things.

So think of the tree that it builds. This is hard to wrap your head around. When you ask for just a single post ID and leave the second spot open, it'll give you all of it. The zero and one make it so that the blog post comes first and the rest are its comments. And this is really powerful. By having the full power of a real programming language like JavaScript as your querying tool, you gain a lot of power relative to just SQL statements that get long and nasty with complex joins.
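The collation trick can be sketched like this. The map function follows the description above; the field name `post_id`, the sample documents, and the `compareRows` and `postWithComments` helpers are assumptions. The comparison here only handles the string-and-number keys used in the example and is much simpler than CouchDB's real collation rules.

```javascript
// Hypothetical posts and comments stored as separate documents, in no useful order.
const docs = [
  { _id: "c3", type: "comment", post_id: "post-1", text: "Third!" },
  { _id: "post-2", type: "post", title: "Another Post" },
  { _id: "c1", type: "comment", post_id: "post-1", text: "First!" },
  { _id: "post-1", type: "post", title: "Hello" },
  { _id: "c2", type: "comment", post_id: "post-2", text: "Nice" },
];

// Map step: array keys. The 0 sorts a post ahead of its comments, which use 1.
function mapPostsAndComments(doc, emit) {
  if (doc.type === "post") emit([doc._id, 0], doc);
  if (doc.type === "comment") emit([doc.post_id, 1], doc);
}

// Simplified ordering: compare keys element by element, then break ties by document ID.
function compareRows(a, b) {
  const len = Math.min(a.key.length, b.key.length);
  for (let i = 0; i < len; i++) {
    if (a.key[i] < b.key[i]) return -1;
    if (a.key[i] > b.key[i]) return 1;
  }
  if (a.key.length !== b.key.length) return a.key.length - b.key.length;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

const rows = [];
for (const doc of docs) {
  mapPostsAndComments(doc, (key, value) => rows.push({ id: doc._id, key, value }));
}
rows.sort(compareRows);

// Stand-in for a range query on the first key element. Against CouchDB this
// would be a range such as startkey=["post-1"] and endkey=["post-1", {}].
function postWithComments(postId) {
  return rows.filter((row) => row.key[0] === postId);
}

console.log(postWithComments("post-1").map((row) => row.id)); // [ 'post-1', 'c1', 'c3' ]
```

One pass over the sorted rows returns the post followed by its comments, so the two-query approach collapses into a single request.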
So this is my favorite part. We have all that theory, and now we want to pull it into Rails. There are a couple of libraries, like I said, built on top of CouchRest. One of them, which currently comes with CouchRest but isn't going to stay in there forever, is CouchRest model. I really like this. I've been using it, and I know of at least one guy who's pulled it out of CouchRest in order to maintain it after it gets removed. You can inherit from it and you get a whole bunch of things, sort of like ActiveRecord but with a CouchDB spin on it. To get this working, you put this in your environment.rb. You don't have to do this, but it gives you the kind of test, development, production setup that we're used to. It's not strictly necessary. You can call the databases whatever you want, or if you're doing multiple databases, you would want to do this elsewhere.

Couch Potato, from this guy, was actually mentioned before. I really want to like it. It has a whole bunch of features that are almost there, and I should just finish them. It does the versioning thing, it does an ordered list, and a whole bunch of other things. It kind of took some common Rails plugins for ActiveRecord and threw them in there, and it's pretty nice. There's ActiveCouch, which is supposed to be closer to ActiveRecord. I know there's a DataMapper adapter too, but I haven't used that personally. Honestly, just sticking close to CouchRest gets you a long way; you don't really need a lot of these things.

But it's not all roses in CouchDB land. It's currently a big moving target. There are a lot of changes, and they haven't hit 1.0 yet. I'm not sure what their timescale is, but you do have to keep up with it, which isn't a huge problem, but it's something to be aware of.
Not a whole lot of people have taken it to production, so there aren't a lot of stories you can draw on. Arbitrary queries are slow, like I touched on previously. For a web application that's not really a problem, but if you need to do reporting or something where you're not running the same queries over and over again, this isn't the choice for you.

There's the lack of supporting tools. Since it's new, you don't have the kind of infrastructure built on top of it that you do for relational databases. You lose out on a lot of the stuff you take for granted from ActiveRecord, which does a whole bunch for you that just isn't there when you're using CouchDB, because it hasn't been built yet. There are some spikes in that direction, but they're all still relatively new. That's a pain point, because we're used to really quick development with ActiveRecord, and you just have to do a lot more for yourself.

The biggest thing I found was lack of intuition. I get relational databases. I understand what they're doing. This is new. I don't have that gut feeling, built up over time, of what's going to work and what isn't. So it's a lot of experimentation. You have to play with it a little more and figure out what works and what doesn't. There are some great articles out there, but not nearly as much as has been written for relational databases.

Here's what I want you to do next. Next weekend, not here on the conference internet, download Erlang, download CouchDB, get it running, and just play with it. It's not scary. Don't be afraid of it. It's fun. You don't have to build a whole application on it. Just get CouchRest. Put some data in, pull some data out.
I do this a lot when I'm trying out new features: build up 300 or 1,000 different documents, loop through, put them up, and start playing with the view feature. Because once you get the hang of it, it's really powerful and you're really going to like it. And that's all I have to say about CouchDB itself. I'm sure there are questions, and I'm happy to answer them. Yeah?

Is the MacPorts install of CouchDB working on OS X now? For a while it wasn't working.

I don't know if it works, but the last time I tried it, it was way behind. The jchris CouchRest gem requires at least, I think, 0.8, and the port hasn't had a release in a while. I'm not sure exactly what doesn't work, but I know it doesn't work.

Do you have a recommended link for installation instructions?

If you go to CouchDB.org, their wiki has installation instructions for every OS, and there's a section on building from source. You just grab the SVN head, run a bootstrap command that checks that you have everything, then configure and install. It's pretty slick.

How is it for memory?

I haven't profiled that myself, but I know some people who have talked about it. I don't have hard numbers, but I think it's at least comparable to, if not slightly better than, maybe MySQL for comparably sized data. I think someone said that storage-wise it wasn't as much; I'm not sure about running memory.

You'll have more problems with your Ruby app and object creation, your object space, and you'll have to watch that. Keep that in mind, because if you're using CouchRest and, let's say, you query a view that returns 10,000 items, CouchRest model will actually expand those into Ruby objects.
So you'll actually run into those memory issues. We were doing something where we had 20,000 documents, with maybe 1K of text in each, and pulling 20,000 of these over and over can actually add quite a bit to your resident memory. So that's what you want to worry about.

So you're saying that CouchDB by itself isn't bad, but you're going to run into it in Ruby. But as long as you're careful, because in CouchDB you can pass in limits, you're not creating hundreds of Ruby objects at once.

What about a view which has 10,000 or 2,000 or 100,000 items in it?

Sure. Yeah.

So what is the top number of documents that you've placed in it and had it still be useful at this point? And is there a theoretical limit, or is it a hard limit at this point?

I'm not sure if there's a hard limit. Personally, just playing around with it, the max I've done is, I think, three million documents. It was a lot of storage. But I just wanted to see how long the view stuff took. The first time it took a whole lot of time, maybe 10 to 20 minutes. But then I did the same thing afterwards and it was instant, which I thought was pretty neat. So I don't know if there's a hard limit for that. And keep in mind, these weren't complex documents either. They just had a random number, and the view was all of them greater than a certain number, and you could change the number and just skim off the top.

The point you made about it not being suitable for dynamic queries, or something like that.

Yeah.

How would you characterize that? The queries, I see, are written in JavaScript, right? Are they cached, is that it? Why do more dynamic queries perform poorly?
Sure. When you make a new view, it has to go through every document and build the index, the tree I showed, and write that tree out to disk, and there are no other indexes for it to go off of. So it has to go through each one. You should really think of those queries as indexes that you're generating. That's why they call them views and not queries. I've been using the word query just because that's what it's comparable to in a relational database.

Okay.

But once you build them, it's fast. It's practically instant. And you know those revision numbers? That's how it keeps track of how old the view is, and the next time it just goes through the new documents or any of the changes. That's why you have to keep your views fresh. Let's say you have an app and one part of it isn't hit often: the first user that goes there is going to have to wait while it crawls through all the changes made elsewhere. So you need to be aware of that and keep all your views fresh. Yeah?

Do they have anything where, in one HTTP request, you can send it a bunch of keys and have it send them all back, instead of going back and forth?

I don't know the answer for views. What I do know is that you can bulk insert and update data, but I haven't had a need to do that with views, so I haven't looked into it. I'm sorry, I don't know.

It's kind of like the whole N+1 problem, right? You have a bunch of articles and a bunch of comments.
The example you gave works if I want one article and all its comments: you can get it all back. But without that, if you're behind, say, 50 articles with 50 comments each, you end up with all those HTTP requests without a batch.

If you wanted to get 50 posts along with all their comments, that's 51 queries, right?

You could do a trick where, if you had some sort of way to know ahead of time which... I'm not sure.

Yeah, sure. I think one of the things is, if you have a post plus all its comments in a single document, then it's one request to get all of it. I just question whether there's a valid case for many posts plus all of their comments in one interface. I would say it's more likely that you show them the post first, and then point them to just the specific set of comments. Keep a view to return that set of comments, and that should be fast. That's my take on it.

And there's one thing I forgot to mention. Like I said, my main reason was the versioning. But what it also did is make me realize that I've been hampered somewhat by the relational database way of thinking. When I'm thinking about my models, I straight away start thinking, oh, I'm going to have to store this here and store that there. Really that's not what I want to do. I shouldn't be thinking about how I need to store something when I'm trying to think at the next level up, at the domain. This lets you be much more fluid about how you store things, and it's been great. You know how you're supposed to learn a new programming language every year? This is like a new database paradigm every couple of years.

Is there anything else? Nope? Okay, I want to thank you very much.