My name is Arvind. My talk is going to be on CouchDB; CouchDB and CouchApps, actually. CouchDB is a document database. Before I get started, the first thing people ask me when I'm talking about CouchDB is: who runs CouchDB right now? The answer is that CouchDB is really new and people are still building functionality into it, so not that many people. But I can give you a link to a list of everyone who is running CouchDB. A couple of examples I'd like to give you. One is Ubuntu One. Ubuntu One is a service that is baked into Ubuntu. It gives you something like Dropbox, so that you can save your files to the cloud, and you can also synchronize your bookmarks and your contacts to the cloud. And they're using CouchDB for this. In fact, they're using the CouchDB replication feature. CouchDB has built-in functionality for replicating a database across multiple instances of CouchDB. What Ubuntu One does is run a local instance of CouchDB that stores your contacts and your bookmarks, and in order to sync it to the cloud, they just use the CouchDB replicate feature. And the BBC is using it to store user preferences and so on for their main website. That again relies heavily on the CouchDB replication feature, because they're running about 32 nodes with master-master replication.
For the rest of this talk, I'm not going to be talking about how CouchDB runs. I'm not going to compare CouchDB with other databases or cover issues that come up during deployment because, to be entirely honest, I have not built CouchDB apps that are running on large production servers right now. So this is like a tutorial: I'm going to focus completely on how, as a developer, you can get started building things using CouchDB and building CouchApps.
So this is the very basic stuff, a quick introduction to what CouchDB is. There are three main things that might be relevant. One is that it uses JSON documents. It's NoSQL, and instead of using tables, it uses JSON documents. You can think of a document as analogous to a row in a table in some instances. Another is that it uses an HTTP REST API, so you don't use database drivers and database connectors with CouchDB. Your application just connects over HTTP, and it uses a very RESTful API for accessing everything you have in CouchDB. And the third is that it uses JavaScript. It uses JavaScript for indexing and querying through CouchDB views, and it uses JavaScript for a variety of other things which I will come to next.
But let's quickly go through a few of these things. So this is the CouchDB API, the HTTP API, or a few parts of it, the things that are most useful. The first thing you will try out when you've just gotten a new CouchDB installation is the _utils HTTP endpoint. That is the Futon administration interface. It comes built into CouchDB, and it is something like phpMyAdmin for CouchDB. So you can just take a look at it. This is Futon. There's something wrong with the monitor, it's not showing red, so all the black areas that you see should be red. Anyway, it's like phpMyAdmin: you can go into the databases, you can see all the documents that are in those databases, and you can do a whole bunch of operations right from this interface. The second one is _session, which is used for logging in and logging out. _users is a database of users. This is how you handle authentication, and these are the CouchDB users. And to access any database, you just access slash, name of the database. Usually you would use an HTTP PUT to create a database there, or an HTTP POST to that endpoint to insert a document into it.
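As a rough map of the endpoints just mentioned, here is a small sketch that only builds method and path pairs and makes no network calls. The database name `pos` and the helper `couchRequest` are assumptions for illustration, not part of CouchDB.

```javascript
// Builds a description of a CouchDB HTTP request without sending it.
// `couchRequest` is a hypothetical helper, not a CouchDB API.
function couchRequest(method, ...segments) {
  const path = "/" + segments.map(encodeURIComponent).join("/");
  return { method, path };
}

const db = "pos"; // assumed database name

const requests = [
  couchRequest("GET", "_utils"),      // Futon administration interface
  couchRequest("POST", "_session"),   // log in
  couchRequest("DELETE", "_session"), // log out
  couchRequest("GET", "_users"),      // the users database
  couchRequest("PUT", db),            // create the database
  couchRequest("POST", db),           // insert a document, CouchDB picks the ID
];

for (const r of requests) {
  console.log(r.method, r.path);
}
```

Any HTTP client can send these; nothing CouchDB-specific is needed on the client side.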
You access a document by slash database slash doc ID. One thing you're going to notice is that this is a RESTful interface. For one document, if you want to read the document, you run an HTTP GET. If you want to update the document or put a document there, you use an HTTP PUT. One thing to note here is the difference between PUT and POST in HTTP. You could use either to update something. The difference is that with PUT, the URL you're putting to is the URL of the resource you are writing; it is the locator of the resource. With POST, you are usually writing to a different URL. CouchDB documents can have attachments, and you can access attachments the same way.
The remaining set is about design documents and views, so I'll come to that now. Design documents are applications that you run right within CouchDB. They help in indexing and querying the database. And you can actually use them to write simple web apps right within CouchDB, which are called CouchApps. I'll come to how those work. Design documents are nothing but regular documents. They are JSON documents, and the only special thing is that the name starts with _design. One database can have multiple design documents, and the convention is roughly that one design document is one application. Design documents contain all these things. There are views, which are the equivalent of an SQL query, a SELECT query. Unlike in an SQL database, you need to define your view so that CouchDB indexes your documents in the right way for you to access them. Then there are update validation functions, which help keep your database valid, I mean, sane. Otherwise, CouchDB is schema-free, so it allows anybody to write any document. Update validation functions are run on writes, and they can prevent certain changes from happening in the database.
You can also write show, list, and update functions. These are the functions that actually put out HTML, or any other format, but usually HTML. These are the functions you will be writing in order to create a website using CouchDB. And then there are rewrite rules, because CouchDB uses very long URLs and you usually want some kind of URL rewriting so that your application has good URLs.
We'll start off with views. A view consists of a map function. Basically, CouchDB uses MapReduce. You guys are familiar with MapReduce? How many of you have heard of this, used it? All right, so it's good. MapReduce works like this. There is a map function, which is run once for every document in the database, and the map function can emit intermediate key-value pairs. The reduce function is optional. So when you're doing a SELECT query, if you go back over here, you can see at the bottom you are running an SQL query: SELECT amount FROM bills ORDER BY date. You guys are familiar with SQL, right? How many of you have not used SQL for anything so far? So, the equivalent way to create a map function for this: CouchDB has no concept of tables, so the convention is to have a field in each document called type, which says what kind of document it is. You write a map function which emits the relevant columns, or rather fields, with the key that it needs to index on. Over here, the key is date. All right, I think it was getting cut off at the left. Amount is what you're interested in, and you need to order it by date, so you're emitting date as the key. Reduce functions are optional. Reduce functions usually take a whole bunch of these intermediate key-value pairs, and then do a reduce on them.
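The map function for the SELECT amount FROM bills ORDER BY date example might look like the sketch below. Inside CouchDB, `emit` is provided by the view server; here a stand-in collects rows so the code can run under Node. The field names `type`, `date`, and `amount` and the sample documents are assumptions.

```javascript
// Stand-in for CouchDB's emit(): it collects rows so the sketch runs locally.
const rows = [];
let currentId = null;
function emit(key, value) {
  rows.push({ id: currentId, key: key, value: value });
}

// The map function as it would appear in a view: only bill documents emit,
// with the date as the key (the sort order) and the amount as the value.
function map(doc) {
  if (doc.type === "bill") {
    emit(doc.date, doc.amount);
  }
}

// Hypothetical documents: two bills and one customer.
const docs = [
  { _id: "b1", type: "bill", date: "2011-03-02", amount: 250 },
  { _id: "c1", type: "customer", name: "Customer A" },
  { _id: "b2", type: "bill", date: "2011-01-15", amount: 100 },
];

// Run map once per document, then sort by key as the view index would.
for (const doc of docs) {
  currentId = doc._id;
  map(doc);
}
rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));

console.log(rows);
```

The customer document emits nothing, which is how the `type` convention plays the role of choosing a table.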
So if you want to do a sum of some field appearing in a lot of documents, or count them, or something like that, you would use a reduce function. Here is one example; this is the example that you saw earlier, and I've added a reduce function, which sums up all those values so you can get the total. There are some built-in reduce functions as well: _count, _sum, and _stats. So instead of writing your own function over there, you can just give _count as a string, and in this case you can give _sum as a string, and it would do the same thing. And it would be faster, because that is natively implemented. So you set up your view in the design document. This is exactly what you write in your design document. You put a hash map called views, and inside it a hash map for each view. The view over here is named Total Purchases, and it has a map and a reduce inside it. So how do you access that? You do a GET to the DB name, slash _design, slash app name, that is, the name of the design document. Yes?
So here, in the view Total Purchases, what happens to the value that reduce returns? What is inside Total Purchases? Yeah. I think I have a very similar view written already. So this is actually the design document that I have written. What you have is: the map emits a bunch of intermediate key-value pairs. Reduce is called with those intermediate key-value pairs, usually with an array of multiple rows, and it returns a value. Oh, where does that value go? You get that value when you query this, when you send a GET request to this one. You get a JSON response, which shows the result. So I can just show you an example.
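A design document with one view, as described above, could be sketched like this. The design document name `app`, the view name `total_purchases`, and the helper `viewPath` are assumptions; the map body follows the talk's description of emitting the customer ID as the key.

```javascript
// A design document is a plain JSON document whose _id starts with "_design/".
// Functions are stored as strings; "_sum" names a built-in reduce.
const designDoc = {
  _id: "_design/app", // "app" is an assumed design document name
  views: {
    total_purchases: {
      map: "function (doc) { if (doc.type === 'bill') emit(doc.customer, doc.amount); }",
      reduce: "_sum",
    },
  },
};

// Hypothetical helper: path for querying a view,
// /<db>/_design/<ddoc>/_view/<view>
function viewPath(db, ddocId, viewName) {
  return "/" + db + "/" + ddocId + "/_view/" + viewName;
}

console.log(JSON.stringify(designDoc, null, 2));
console.log("GET " + viewPath("pos", designDoc._id, "total_purchases"));
```

The response to that GET is JSON, which is where the value returned by reduce shows up.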
So, in the view GET request, you can send a lot of parameters. One of them is reduce=false, which gives you the response as it would be if there were no reduce function there. If I do reduce, it reduces to a single value. Those three lines over there are the parameters that you can pass while querying a view. You can send a startkey and an endkey; you can basically filter and sort. And the last three, reduce, group, and group_level, control the reduce function. Reduce is true or false, depending on whether you want to reduce or not. Group and group_level control whether you want the result grouped according to keys or not. What you saw over here is not grouped according to keys. These were the intermediate values which are emitted; there are five of them. And when I asked it to reduce, it took the sum of all of these, and it gave me just one. Right now the key I'm emitting is the customer ID, so if I want to group it according to customer, I just give it group=true, and it groups according to customer. There are two customers in my database.
For all the examples over here, I'm going to be using this database, which is called POS. It has five documents. I've used the standard document IDs which CouchDB generates. Three of those documents are bills, and they look like that. This is one example bill document. There's a type, bill. There's a customer, which is the document ID of the customer responsible, then the amount, and then the items. So there are three of those. And then there are two customer documents, in which I'm only storing a name. So did I answer your question regarding the sum?
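To make the effect of `reduce` and `group` concrete, here is a local imitation that runs under Node. It is not CouchDB code: `queryView` is a hypothetical function, and the customer IDs and amounts are made up.

```javascript
// Rows as a map function might emit them: key = customer ID, value = amount.
const emitted = [
  { key: "cust-a", value: 100 },
  { key: "cust-a", value: 250 },
  { key: "cust-b", value: 40 },
];

const sum = (values) => values.reduce((a, b) => a + b, 0);

// Local imitation of how the reduce and group parameters change the result.
function queryView(rows, { reduce = true, group = false } = {}) {
  if (!reduce) return rows; // like ?reduce=false: the raw emitted rows
  if (!group) {
    // Default: everything is reduced to a single row.
    return [{ key: null, value: sum(rows.map((r) => r.value)) }];
  }
  // Like ?group=true: one reduced row per distinct key.
  const totals = new Map();
  for (const { key, value } of rows) {
    totals.set(key, (totals.get(key) || 0) + value);
  }
  return [...totals].map(([key, value]) => ({ key, value }));
}

console.log(queryView(emitted, { reduce: false })); // three rows
console.log(queryView(emitted));                    // one row, value 390
console.log(queryView(emitted, { group: true }));   // one row per customer
```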
Now group_level. Instead of emitting a single value as a key, you can emit a set of values in an array as a key, so you can have three keys or five keys. For example, suppose I'm running a multi-store organization, and this is storing the bills for all of the stores, and I want to create a view which can be filtered according to the customer and also according to the store ID. Suppose there are five stores. In that case, I can just change my map function: instead of emitting doc.customer, I can emit an array over there, with doc.customer and maybe doc.store. A couple of other things. You're seeing the view output over there. It's got total_rows and offset, which is the header, and then it's got an array of rows. Within each row, because this is a reduced view, it's only sending a key and a value. If you do not reduce, it will also send you the ID of the document which generated that row. And there is an optional parameter you can pass which will include the whole document, in which case there will also be an entry called doc in that result set.
One thing is that CouchDB does not do joins. I think none of the NoSQL databases do joins. So when you look at map and reduce, you might think of one way to get around that and get some kind of join. This is a very code-heavy slide, so I'll just quickly explain what is happening here rather than going through it. Suppose I want to do a join between the customers table and the bills table, and for each customer I want to get a list of bills or something like that. What I'm doing over here is emitting both bills and customers. And in the reduce function, I'm creating an output which will look something like this: customer, colon, the customer ID, and bills, which will be an array that has all the bills of that customer.
So I'm trying to put the output in that format using this reduce function. If you look at it, it's kind of obvious what it does. It is called with keys and values. I'm iterating over the values. If the value is a customer, I put it in the customer field. If the value is a bill, I push it into the bills array. So this should work, but it doesn't. What happens is that when this returns, the return value is about the same size, in number of bytes, as the values array that went in. And CouchDB says that this is not scalable. Reduce needs to shrink very rapidly. Reduce needs to take a lot of rows of intermediate view results, all the map-emitted rows, and compress them into one value or a very small array. It cannot give you a large one. So if I put this in the design document and did a GET on it, CouchDB would give me this error. It says that the reduce output must shrink more rapidly. Usually you can work around this if your application can read the set of keys that were emitted and perform that function itself. For example, if it's a very small database, CouchDB might not get to the point where it realizes that this is happening. But usually it will run reduce once, and the check is based on the number of bytes of the input parameters and of the output that comes out.
OK. Actually, that's a good question. The point of having map and reduce is to help in distributed processing. Imagine that you have a map function which emits 10,000 results, and say it's working on two different nodes, so 5,000 on this node and 5,000 on the other node. What CouchDB can do is a reduce on the first 5,000 and another reduce on the other 5,000. Then, with the two intermediate values that come out, it runs the reduce function again with just those two values. So basically, reduce can be run again on its own output.
This is not a very good example, because I think right now, with plain CouchDB, you cannot run a system like this. So yeah, usually it doesn't do it just once. I think CouchDB is architecturally designed so that it needs to do it. You do not really control that; it's up to CouchDB. And you can see that by splitting it up, it becomes more scalable if in the future you need that. Yes, even if the output of a reduce function is only about the same size as the input, you will get this error; it needs to shrink very quickly. So you need to do that calculation in your map function. So for each customer, you're emitting customer records, or you're basing it on the bills, you're saying? Correct. In order to do that, see, if over here, instead of bills, I were to just put a total or a total tax, then the result would be much smaller. The problem over here is: suppose a customer had 20 bills. I'm sending all those bills into values, and I'm putting them all into the return value that comes out. If, instead of that array of bills, I just put the sum in there, the return value will be much smaller. In fact, we are doing this; I will show you that example in a short while. So unless you want to return something similar in size to the input that you've got, you should be OK. If you're summarizing it in some way into a small value, like the total tax that the customer has to pay, which would be a single value, you can still do that in your reduce function. Yes. So, see, there are two things here. One is that if you do the emit like this, all of your values are collected together. If you don't do reduce, you just take the map output and iterate over it, you're going to get the customer first and then the docs of that customer. So usually your application can be written to take care of that.
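The two points above, that a reduce should summarize into a small value and that it may be run again on its own partial results, can be sketched as follows. CouchDB calls reduce with `(keys, values, rereduce)`; here the keys argument is passed as null for brevity, and the amounts are made up.

```javascript
// A summarizing reduce: the output is one number however many rows come in.
// On a rereduce pass, `values` holds the results of earlier reduce calls,
// so for a plain sum the same loop works in both cases.
function reduce(keys, values, rereduce) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// Hypothetical bill amounts emitted by a map function.
const amounts = [100, 250, 40, 75, 10, 25];

// One pass over everything.
const allAtOnce = reduce(null, amounts, false);

// Two partial passes, then a rereduce over the two partial results.
const left = reduce(null, amounts.slice(0, 3), false);
const right = reduce(null, amounts.slice(3), false);
const combined = reduce(null, [left, right], true);

console.log(allAtOnce, combined); // both are 500
```

A reduce that returned an array of all the bills would not have this property of shrinking, which is what the error message is about.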
But suppose you do not have control over that, and you actually need to get it in that reduced form, the format that we are talking about here. We can still do it, and we'll use a list function to do that. I'll come to that in a couple more slides. Any more questions about views? I think this is the last slide on views.
The second thing I wanted to talk about is update validation. CouchDB does not impose any schema restrictions, but your applications always have schema restrictions. You can't just deal with anything. So you can write an update validation function, which will vet every insert into your database and every update in your database, and decide whether to allow it or not. In every design document, you can put one function called validate_doc_update. It takes three parameters: the new document, the old document, and the user context. And it has the choice of throwing errors. If it doesn't throw any errors, the update goes ahead. You can see a few examples of what I'm doing. In the first one, I'm checking that oldDoc exists, which means that it's not a new insert in the database. If the author is not the currently logged-in user (user context is a document describing the currently logged-in user), I'm saying you're unauthorized. When I'm throwing an error, I can give either the key unauthorized or the key forbidden. You can set up CouchDB to use HTTP basic authentication, so if it is unauthorized, it will send the header that makes the browser show the dialog box asking for your user credentials. If it's forbidden, it just gives you an error. So these are a few basic validation functions. Now the natural question is: what exactly is user context, that variable userCtx that comes into an update validation function? I've told you about the _users database.
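A validation function along the lines just described might look like this. It is a sketch, not the function from the slide: the `author` and `type` fields are assumed conventions, the admin exemption is an assumption, and `tryUpdate` is a hypothetical harness for running it outside CouchDB.

```javascript
// CouchDB calls this with the new document, the stored document (null on
// insert) and the user context. Throwing rejects the write.
function validate_doc_update(newDoc, oldDoc, userCtx) {
  var isAdmin = userCtx.roles.indexOf("_admin") !== -1;
  // Existing document: only its author (or an admin) may change it.
  if (oldDoc && oldDoc.author !== userCtx.name && !isAdmin) {
    throw { unauthorized: "Only the author may change this document." };
  }
  // Assumed schema rule: every live document carries a type field.
  if (!newDoc._deleted && !newDoc.type) {
    throw { forbidden: "Every document needs a type field." };
  }
}

// Hypothetical harness: report "ok" or the error key that was thrown.
function tryUpdate(newDoc, oldDoc, userCtx) {
  try {
    validate_doc_update(newDoc, oldDoc, userCtx);
    return "ok";
  } catch (e) {
    return Object.keys(e)[0];
  }
}

const alice = { name: "alice", roles: [] };
const bob = { name: "bob", roles: [] };
const stored = { _id: "b1", type: "bill", author: "alice", amount: 1 };

console.log(tryUpdate({ ...stored, amount: 5 }, stored, alice)); // ok
console.log(tryUpdate({ ...stored, amount: 5 }, stored, bob));   // unauthorized
console.log(tryUpdate({ _id: "x" }, null, bob));                 // forbidden
```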
In that database, there will be a bunch of documents. Each document describes a user, and it goes something like this. It will have a name, a list of roles, which is an array of strings, and then a password and whatnot. The name and the roles from this document are what gets passed into the user context. So if you see the second validation rule over there, I'm checking whether user context roles includes the string _admin. That is how I check whether the currently logged-in user is an admin or not. Another thing is that you can do access control at the server level by creating admin users. Admin users are not in the _users database at all; they are in the INI files. One more level is the DB level. At the DB level, you can define administrators and readers, and CouchDB will restrict access on that basis. I can just show you the Futon security screen. So that opens this, for this database. It's a special document called _security, which has a list of admins and readers. You can define the names and roles of people you want to set as admins. The only difference between an admin and a reader is that the reader cannot write to design documents. Even the reader can write to the database, just not to design documents.
And the last one is that you can do access control at the document level using this update validation function. This is a common technique. In the document itself, you define an object called permissions, which stores all the permissions, a default permission or an assigned permission, and defines the users and roles who have them. Then you write an update validation function. There is a lot of code in the next slide. You can see that this is a function which just checks that the current user is in there, that he has the particular permission. And you can see in the last line how it can be used.
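The document-level permissions pattern could be sketched as below. The exact shape of the `permissions` object is not given in the talk, so the shape used here (permission name mapping to lists of users and roles) is an assumption, as are the sample users.

```javascript
// Does this user hold the named permission on this document?
// Assumed document shape:
//   permissions: { <permission name>: { users: [...], roles: [...] } }
function hasPermission(doc, userCtx, permission) {
  var entry = doc && doc.permissions && doc.permissions[permission];
  if (!entry) return false;
  var users = entry.users || [];
  var roles = entry.roles || [];
  if (users.indexOf(userCtx.name) !== -1) return true;
  return userCtx.roles.some(function (role) {
    return roles.indexOf(role) !== -1;
  });
}

const bill = {
  _id: "b1",
  type: "bill",
  permissions: { edit: { users: ["alice"], roles: ["managers"] } },
};

console.log(hasPermission(bill, { name: "alice", roles: [] }, "edit"));         // true
console.log(hasPermission(bill, { name: "bob", roles: ["managers"] }, "edit")); // true
console.log(hasPermission(bill, { name: "carol", roles: [] }, "edit"));         // false
```

Inside an update validation function, a false result would typically lead to throwing a forbidden error.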
If you are in the update validation, you can just call this function hasPermission, pass the document which has the permissions object, the user context, and the name of the permission to check. It will return true or false, depending on whether that permission exists or not. So this is a design pattern you can use to have access-control-list style permissions on a document-by-document basis. So those were update validation functions. Any questions about that? You can interrupt me anytime.
Now, one of the design focuses in CouchDB has been on allowing CouchDB to be the single piece of infrastructure on the server side. Traditionally, you would use a database, you would use an application server, maybe PHP or ASP or something like that, and then you would use a web server. CouchDB wants to be all three. And show, list, and update functions are designed to allow you to do that. So do you have to handle that case also, or would you just have to use CouchDB? No, update validation will run no matter how you're writing. In fact, even if you're replicating from one CouchDB to another CouchDB, the node that gets written to will run its update validation functions for every document.
So the first one is the show function. A show function takes one document and converts it into HTML or something else to show to you. I think I can show you an example which will be a little better than this one. If you see this one, this is an example show function. Just like I defined the views, I have a hash map called shows, and I have a show function named edit bill. It gets a document and the HTTP request as parameters, and it can return some HTML. The example over here is doing the same thing. So a show function is maybe like a PHP script in some ways.
That's because you're getting a request and you are outputting a response. And these are the members of those request and response objects. The request has whatever you would expect: the method, path, headers, body; query, which is the GET parameters; form, which is the POST parameters; cookie, whatever cookies have been set. And a couple of things that are specific to CouchDB: the ID of the document it was called with (the show function can be called with any document, in fact), and the current user context. And info gives you environment parameters about the CouchDB instance that you're running. In the response, you can send a response code, headers, body, any of the three. Instead of body, you can send base64. If you want to send binary output, JavaScript doesn't have built-in support for binary strings, so you can take some base64-encoded data in a string, put that in base64, and return it. CouchDB will convert it into binary and send it back to the client. Or you can send any JavaScript object; it'll serialize it into JSON and send it back to the client.
So show functions are designed to output HTML, but you can actually use them for something else as well. If you saw that update validation slide, you're using an array of permissions at the document level to decide whether this person has the permission to do something or not. But the update validation function is only executed when somebody is writing into the database. Suppose you want to restrict the user when he's reading from the database. To do that, you have to make him read only through a show function. And the show function can use exactly the same pattern: have a permissions array inside the document, and the show function will read that document and filter the JSON accordingly, so that you can do access control at the document level.
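A show function that filters a document on read might be sketched as follows. The permission name `read_secrets`, the role `secret_keepers`, the `secrets` field, and the shape of the `permissions` object are assumptions; `hasPermission` is repeated so the block runs on its own under Node.

```javascript
// Assumed document shape:
//   permissions: { <permission name>: { users: [...], roles: [...] } }
function hasPermission(doc, userCtx, permission) {
  var entry = doc.permissions && doc.permissions[permission];
  if (!entry) return false;
  if ((entry.users || []).indexOf(userCtx.name) !== -1) return true;
  return userCtx.roles.some(function (role) {
    return (entry.roles || []).indexOf(role) !== -1;
  });
}

// Show function sketch: gets (doc, req) and returns a response object.
// It returns the document as JSON, leaving out `secrets` unless the reader
// holds the "read_secrets" permission.
function showFiltered(doc, req) {
  if (!doc) return { code: 404, body: "Not found" };
  var out = {};
  for (var field in doc) {
    var hidden = field === "secrets" &&
      !hasPermission(doc, req.userCtx, "read_secrets");
    if (!hidden) out[field] = doc[field];
  }
  return { json: out };
}

var doc = {
  _id: "d1",
  title: "Plan",
  secrets: "launch date",
  permissions: { read_secrets: { users: [], roles: ["secret_keepers"] } },
};
var keeper = { userCtx: { name: "bob", roles: ["secret_keepers"] } };
var visitor = { userCtx: { name: "carol", roles: [] } };

console.log(showFiltered(doc, keeper).json);  // includes secrets
console.log(showFiltered(doc, visitor).json); // secrets left out
```

This only protects reads that go through the show function, so direct access to the document would have to be closed off separately.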
For example, in what you're seeing over here, I've made a role called secret keepers, who are the only people who can get access to this member called secrets inside the document. I'm doing that using the same hasPermission function that was defined for update validation. The last one is how you query a show function. You put the database name, the design document name, then _show, then the name of the show function and the document ID to pass to it.
List functions are a similar thing. Just as shows are used to convert a single document into HTML, list functions act on an entire view. Yes, show functions are not incremental. Usually it is on a document level. So yes, CouchDB does use caching. Unless the document has changed, it doesn't execute again. Actually, CouchDB uses ETags, which are part of HTTP cache control. An ETag is like a revision number, which you send in your HTTP response. If a browser recognizes ETags, then when it makes a request for this particular document, it will send the latest ETag it has, and then the browser doesn't do an actual fetch of the entire document. So yeah, if your document was unchanged, the show function is not executed again.
Lists work almost the same way, but instead of sending the entire view result into the function as a parameter, because you might run out of memory that way, you're only sending the head of the view, which is the total number of rows and metadata like that. Within that function, you can call a function called getRow, which fetches one row at a time. So you have to iterate over all the rows in your view and convert them into HTML or whatever. That's what I'm doing over here.
There are a few functions which are defined in the scope. You can call start, which starts the output; it's used for sending headers. Oh, I'm sorry, that's a problem with the presentation, I guess. All right, start ends over there. Then you can make calls to send, iterate over your result set, and send as well. It is called the same way as you would call a show: you use _list, then the list name. Again, one problem. And you pass the view name. And you can pass any of the parameters that you would pass to a view, like startkey, endkey, descending, include_docs, reduce or not, all of those things.
There was a question about how to do that reduce. You can use a list function to do it, and it doesn't complain. What I'm doing over here is rolling my own JSON. I'm sending the JSON rows tag as a string, and so on. You can see how this works. I'm relying on the rows being sent to me sorted by key. Until the key changes, I keep adding every value that comes in to my intermediate result, which is this one, obj. And every time the key changes, I send the obj and flush it. Not really. See, there is some caching in the background. If you query a list without specifying startkey and endkey and so on, and it doesn't do a reduce, and assuming that there were no document updates and the list was already run in the past, it will be cached and served from the cache. However, if any of these things change, if you change the startkey or endkey, and any part of the list has to be re-rendered, it will run the whole thing again. So, the last one. There are just three things that help you output HTML from CouchDB.
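The list-function technique described above, collecting values until the key changes and then sending the group, might be sketched like this. Inside CouchDB, `start`, `send`, and `getRow` are provided to list functions; the versions here are stand-ins that read from an in-memory array so the code runs under Node. The function name `groupByKey`, the sample rows, and the assumption that keys are plain strings are all illustrative.

```javascript
// Stand-ins for what CouchDB provides inside a list function.
var pending = []; // rows waiting to be handed out by getRow()
var output = "";  // everything passed to send()
function start(response) { /* CouchDB would send these headers */ }
function send(chunk) { output += chunk; }
function getRow() { return pending.length ? pending.shift() : null; }

// List function sketch: rows arrive sorted by key, so collect values until
// the key changes, then send the finished group. Keys are assumed to be
// strings so that !== is a valid comparison.
function groupByKey(head, req) {
  start({ headers: { "Content-Type": "application/json" } });
  var row, obj = null, first = true;
  function flush() {
    if (obj === null) return;
    send((first ? "" : ",") + JSON.stringify(obj));
    first = false;
  }
  send('{"rows":[');
  while ((row = getRow())) {
    if (obj === null || obj.key !== row.key) {
      flush(); // key changed: send the previous group
      obj = { key: row.key, values: [] };
    }
    obj.values.push(row.value);
  }
  flush(); // send the last group
  send("]}");
}

pending = [
  { key: "cust-a", value: 100 },
  { key: "cust-a", value: 250 },
  { key: "cust-b", value: 40 },
];
groupByKey({ total_rows: 3, offset: 0 }, {});
console.log(output);
```

Because rows are streamed one at a time and each group is sent as soon as it is complete, only one group is held in memory at any point.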
The last one is updates. Update functions are used for writing into a document. There is no support for bulk updates as of now; you have to update a single document. It works just like show: you get the document and the request as parameters. If there was any form data and it was posted, you get all of that in the request; the request object contains all your form data. Using that, you can update the document. For example, over here, I'm taking the ID from request ID. Request UUID is for the case where the document does not exist and you want to create a new ID. It's completely optional to use it, but CouchDB generates a unique ID for you in case you want to use it for your newly created document. And this should be req.form, I'm sorry. This update function returns an array. The first element of that array is the document that you want to write. The second element is the response that you want to send back to the user. There are three things about creation and deletion. The doc parameter can be null if the document is newly created, if it's an insert kind of operation. If the document that you return is null, that means don't change anything; if the document exists, it continues to exist as is. And if you want to delete the document in the database, you set doc._deleted to true, and it will get deleted.
So these are the building blocks for making an app. I'll just quickly go through how this can be put together into an actual app. So, two things. One is virtual hosts. Typically, you want your CouchDB URLs to be much prettier than what CouchDB offers by default, and there are two things you need to do for that. One is to set up a virtual host in the CouchDB configuration file itself.
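The update function described earlier in this passage could be sketched as below. The function name `saveBill`, the `amount` form field, and the `bill` type are assumptions; `req.id`, `req.uuid`, and `req.form` are the request members the talk refers to.

```javascript
// Update function sketch. CouchDB calls it with the existing document (null
// if there is none) and the request; it returns [docToSave, response].
function saveBill(doc, req) {
  if (!doc) {
    // New document: use the ID from the URL if given, else CouchDB's UUID.
    doc = { _id: req.id || req.uuid, type: "bill" };
  }
  if (!req.form || req.form.amount === undefined) {
    // Returning null as the document leaves the database unchanged.
    return [null, { code: 400, body: "amount is required" }];
  }
  doc.amount = Number(req.form.amount);
  return [doc, "Saved " + doc._id];
}

// Local calls with hand-built request objects.
const created = saveBill(null, { uuid: "abc123", form: { amount: "42" } });
const unchanged = saveBill(
  { _id: "b1", type: "bill", amount: 1 },
  { id: "b1", form: {} }
);

console.log(created);   // new document with amount 42, plus a response string
console.log(unchanged); // [null, { code: 400, ... }]
```

To delete instead, the function would set `doc._deleted = true` on the document it returns.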
You say that if the HTTP request comes with a particular host name, you redirect it to this path. So you have your design document, you have an app name, and you go to a _rewrite endpoint. The _rewrite endpoint is what you see below it. You use it to rewrite the URL to something else. At the bottom there, these are examples. You give a from and a to. The from URL is what will be matched; it can be a pattern. And you set a to URL as well. You can filter according to the query, and you can filter according to the method, so you can write a rewrite which works only for POST requests or only for GET requests. Any URL segment that starts with a colon is treated as a named part, so it comes through as a variable which you can later use in your output as well. That's URL rewriting.
In order to create design documents, there are two tools that are usually used. There is the node.couchapp.js tool, and there is the Python tool. Both of these are very simple. You write your design document in a regular JS file. You can have attachments and all that in a separate folder. And these tools help in pushing your document into the database. I use the Node tool. So this is typically how you would write a CouchApp if you are using node.couchapp.js for pushing it into the database. It uses a CommonJS module kind of thing: your entire CouchApp is written like a CommonJS module, and your design document is exported. Inside your design document, you do the regular thing. You can create views, rewrites, show functions, and all those things. So I'll just show a quick example of one of those. Is this visible? I'm nearly done, just a couple of minutes. So this is a document which I've already edited. The INI file created an endpoint called Bench.
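A design document written as a CommonJS module, with rewrite rules of the from/to/method form described above, might look like this sketch. The names `app`, `bills`, `edit_bill`, and `save_bill` are assumptions, and `applyRewrite` is an illustration of how a `:name` segment binds a value; it is not CouchDB's implementation.

```javascript
// Design document in the style used with node.couchapp.js: a plain object
// that the module exports.
var ddoc = {
  _id: "_design/app",
  rewrites: [
    { from: "/", to: "_list/bills/total_purchases", method: "GET" },
    { from: "/bills/:id", to: "_show/edit_bill/:id", method: "GET" },
    { from: "/bills/:id", to: "_update/save_bill/:id", method: "POST" },
  ],
  views: {},
  shows: {},
  lists: {},
  updates: {},
};

// Export the design document (guarded so the sketch also runs where
// `module` is not defined).
if (typeof module !== "undefined") module.exports = ddoc;

// Illustration only: match a request against one rule and substitute the
// bound ":name" segments into the target.
function applyRewrite(rule, method, path) {
  if (rule.method && rule.method !== method) return null;
  var fromParts = rule.from.split("/").filter(Boolean);
  var pathParts = path.split("/").filter(Boolean);
  if (fromParts.length !== pathParts.length) return null;
  var bound = {};
  for (var i = 0; i < fromParts.length; i++) {
    if (fromParts[i].charAt(0) === ":") bound[fromParts[i]] = pathParts[i];
    else if (fromParts[i] !== pathParts[i]) return null;
  }
  return rule.to.split("/").map(function (seg) {
    return Object.prototype.hasOwnProperty.call(bound, seg) ? bound[seg] : seg;
  }).join("/");
}

console.log(applyRewrite(ddoc.rewrites[1], "GET", "/bills/b1")); // _show/edit_bill/b1
console.log(applyRewrite(ddoc.rewrites[2], "GET", "/bills/b1")); // null, wrong method
```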
I mean, I've created a domain called Bench, which is the CouchApp. This is using a list function to run a view and convert it into HTML. This is using a show function to create an HTML form for showing a single document. And when I hit Submit, it is using an update function to update the same document. So that's a really short example. Any other questions?