We will talk about CouchDB. He has also been using MongoDB, so we may compare the two; it's quite interesting to see both sides. And yeah, let's get started. Thank you, guys. Sorry, it's not a really exciting session, but it's enabling. It's pretty boring and less JavaScript-y: it's more about HTTP, less about JavaScript, but we'll see. So yeah, I'm Srikanth, and I work for a small company here called Dacus. I do a lot of CouchDB and JavaScript stuff, so she invited me to come and talk about it. Here I am. I wasn't planning a real presentation sort of thing. Does anybody use Remind? Yeah. So it's a simple Remind-based thing; let's see if it works. CouchDB: have you used it, or how many have heard of it? Used? No. Okay, so do you think it would be better to run through what it looks like and what it is before I go into the talk? Yeah. So the CouchDB mantra is "relax": just relax, go home, maybe drink a couple of beers. You log in, and you see the standard "Welcome to CouchDB" sort of thing. `_utils` is where you go to see Futon. Futon is the web UI; it's nice because it comes built into CouchDB. All right, simple things first: everything's a database. Let's take, say, the package metadata. How many use Node.js? How many are aware of this thing called Node.js? Okay, so basically this is the backing store for npm, the Node.js package registry. All the modules are right here, so you can dig into them. Two things to remember about CouchDB: HTTP and JSON. It's a JSON store, and it uses HTTP for everything else. So this is a representation of the document. You can edit it; this is the raw JSON representation, and this is a slightly nicer one.
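To make "HTTP and JSON" concrete, here is a minimal sketch that builds (but does not send) the GET request for one document and parses a JSON body of the kind CouchDB returns. It assumes a local CouchDB on the default port 5984 and a database named `registry`; the sample body is made up for illustration.

```python
import json
import urllib.request

# Assumption: a local CouchDB on its default port (5984); the database
# name "registry" and the document below are illustrative only.
BASE_URL = "http://localhost:5984"


def document_request(db: str, doc_id: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a single document."""
    return urllib.request.Request(f"{BASE_URL}/{db}/{doc_id}", method="GET")


# A made-up response body. A CouchDB document is plain JSON; fields that
# start with an underscore (_id, _rev) are reserved by CouchDB itself.
SAMPLE_BODY = """
{"_id": "underscore", "_rev": "3-abc123", "name": "underscore",
 "description": "JavaScript utility library"}
"""

req = document_request("registry", "underscore")
doc = json.loads(SAMPLE_BODY)
print(req.get_method(), req.full_url)  # GET http://localhost:5984/registry/underscore
print(doc["_id"], doc["_rev"])
```

Against a running server you would pass `req` to `urllib.request.urlopen` and feed the response body to `json.loads`; nothing here needs a client library beyond HTTP.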
You can modify it right here and save. Okay, let's say I want to add a category to this package: hit this button, hit this button, save. So I just added an array to it. It's a standard document, so its fields can hold arrays or hashes, and you can add any number of them, like anything else. Save document, yeah. You can also search by package name. So yeah, it's basically a very simple document store, but there are a bunch of things you can do with it. Let me log in first. Good, it's asking for a password. It uses simple HTTP auth. It's not like MySQL or anything, super secure or whatever, so if security is your main concern, probably avoid CouchDB. The latest version has HTTPS, but before that there wasn't much. How many of you have used MongoDB? Okay, so it's a decent balance. To be honest, I haven't used MongoDB that much. My colleagues forced me to use it because it's the default choice, and yeah, sometimes I'm stuck with it. You can obviously see what my preference is. Yeah, I think MongoDB has a very effective marketing department. Exactly the opposite on the Couch side: they have screwed it up so badly that people ask, should I even use Couch? There's Couchbase, and there's IrisCouch. IrisCouch is the hosting. I don't know what it runs, though. It runs CouchDB, regular Couch. Can you talk about how you take very... I'll come to that. That gets into slightly complicated stuff, so I wanted to keep it for the end. Okay, so everything is a document. The only concept you have in a Couch database is documents. So when you say "all documents"... there are documents, the real physical documents that you actually store and retrieve, and there are design documents, which are sort of metadata for the database. Design documents are how you use CouchDB to retrieve data. I'll show you that in a moment. Are we just taking it out here? So let's see.
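The Futon edit described above (adding a `categories` array and saving) amounts to sending the whole modified document back with a PUT. This is a minimal sketch that only builds the request; the base URL, database name and `_rev` value are assumptions for illustration. The important detail is that the body carries the `_rev` that was read, because CouchDB rejects an update based on a stale revision with `409 Conflict`.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumption: local CouchDB on the default port


def add_categories(doc: dict, categories: list) -> dict:
    """Return a copy of the document with a new array field added."""
    updated = dict(doc)
    updated["categories"] = list(categories)
    return updated


def update_request(db: str, doc: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT that saves the whole document back.

    The body keeps the _rev we read; if someone saved in between,
    CouchDB answers 409 Conflict instead of overwriting their change.
    """
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{db}/{doc['_id']}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Made-up document as it might have been fetched; "3-abc123" is a placeholder rev.
original = {"_id": "underscore", "_rev": "3-abc123", "name": "underscore"}
updated = add_categories(original, ["utility", "functional"])
req = update_request("registry", updated)
print(req.get_method(), req.full_url)
```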
Are we okay now? So there is a section called design documents, and that's also a special kind of document; it just has a different type. Anything that starts with an underscore is reserved by Couch. `_utils` is reserved by Couch, so you can't have a route called `_utils`, or basically anything like that. `_design` is also a special prefix. Language: JavaScript. And there are these things called views; I'll come to those later. We can take a quick look at how this works. So this is one of the views. You can look at the code here; it's all JavaScript. In the latest version you actually have the option of choosing between JavaScript and CoffeeScript; I think I set it to JavaScript. Basically, you write some JavaScript to output this data; it's as simple as that. Here's another view called `rank`. This is what it outputs, and we'll see how you use it. This is the code for it; it's a slightly simpler view. What it does is this: we pull some information from GitHub, such as how many watchers and how many forks a project has, and the rank of a project is basically the sum of those. Then we emit the rank and the description. I'll come to MapReduce in more detail later, if time allows. But yeah, you run views to get data. Simple. The only other thing I want to talk about is replication. I wish I could show it here... Sure. Replication is a way of getting data from one place to another, and it's one thing CouchDB does better than any other database around, no question about it. For example, there is a central npm registry which we sync into our CouchDB, make modifications, and sync somewhere else. I have basically spread this npm registry around to about 10 places now; it's just trivial to sync. Has anybody set up MySQL replication recently? How easy or hard is it? It's not easy.
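As a rough sketch of the `rank` view described above: the design document stores the map function as a JavaScript string, and the Python function below is a stand-in that mimics it so the view can be simulated locally. The field names `watchers`, `forks` and `description`, the design document name, and the sample packages are assumptions for illustration; the sort by key (ties broken by document id) mirrors how a view's rows come back ordered.

```python
# Roughly what a design document holding this view looks like.
# The map function is stored as a JavaScript string; field names are assumptions.
DESIGN_DOC = {
    "_id": "_design/packages",
    "language": "javascript",
    "views": {
        "rank": {
            "map": "function (doc) { emit(doc.watchers + doc.forks, doc.description); }"
        }
    },
}


def rank_map(doc):
    """Python stand-in for the JavaScript map function: yields (key, value) rows.

    A missing description becomes None, which CouchDB would show as null.
    """
    yield doc["watchers"] + doc["forks"], doc.get("description")


def build_view(docs, map_fn):
    """Run the map function over every document and sort rows by (key, doc id)."""
    rows = []
    for doc in docs:
        for key, value in map_fn(doc):
            rows.append({"id": doc["_id"], "key": key, "value": value})
    return sorted(rows, key=lambda row: (row["key"], row["id"]))


# Made-up sample documents.
SAMPLE_DOCS = [
    {"_id": "express", "watchers": 10, "forks": 5, "description": "Web framework"},
    {"_id": "underscore", "watchers": 8, "forks": 2, "description": "Utility belt"},
    {"_id": "async", "watchers": 20, "forks": 4},  # no description -> null value
]

rows = build_view(SAMPLE_DOCS, rank_map)
for row in rows:
    print(row)
```

Ordering by key is what makes "most popular packages" a cheap query: the index is already sorted, so reading it backwards gives the highest ranks first.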
It's probably not that hard to set up, but keeping it going is really, really hard. It's non-trivial; that's why you pay DBAs so much money. Not true in CouchDB. You hit this button and... okay, let me actually try replicating one database right here. Let's say the registry, which is a very big one, and I can tick "continuous", so any updates to the package metadata will automatically show up here. Yeah, that's it. Okay, done, ready to go. You can look at the status here: 8% done. You can go back and see the numbers increasing; it's really fast. Now it's already synced, so I can do exactly the same stuff on this database as well. Think about running it on a phone: syncing from a central database and running locally, looking at all your contacts straight off the phone. Anyway, that's a whirlwind tour of CouchDB. Any questions? One more thing: I tend to lose context very quickly, so if you have questions, ask right away; I'll probably forget by the end. Any questions? Also, quickly: replication events. Does finishing a replication fire an event? Is there a way for me to know when it's done? No, you don't have a way to know when it's done. The only way is to check the status, or you can look at the changes feed. Oh, the changes feed. It shows up in the changes feed as well? Awesome. So you said you can have CouchDB on your mobile phone? Yeah. And then you can work offline? Yes. And then you can synchronize it with the... Exactly. Couch is available on both iOS and Android. I don't know about the RIM side of things, but yeah. We've written apps that people use...
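The "continuous" replication ticked in Futon can also be started over HTTP by POSTing to `/_replicate`, and progress can be followed through a database's `_changes` feed. This sketch only builds the requests; the server URL and the database names `registry` and `registry_copy` are assumptions. If the target database does not exist yet, CouchDB also accepts a `create_target` option in the same body.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:5984"  # assumption: local CouchDB on the default port


def replicate_request(source: str, target: str, continuous: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST /_replicate request."""
    body = {"source": source, "target": target, "continuous": continuous}
    return urllib.request.Request(
        f"{BASE_URL}/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def changes_url(db: str, since: int = 0, feed: str = "continuous") -> str:
    """URL of a database's changes feed, starting after sequence `since`."""
    return f"{BASE_URL}/{db}/_changes?" + urlencode({"feed": feed, "since": since})


req = replicate_request("registry", "registry_copy")
print(req.get_method(), req.full_url, req.data.decode())
print(changes_url("registry_copy"))
```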
Basically, on the Aakash tablet you run CouchDB, go to the village, collect all the medical information, and sync it up to the database. It's really cool. That's when you see the power of what you can do with this sort of thing. The Aakash tablet, right? Yeah, the Aakash tablet. Is it still in use? It's in use, and it's going to get much better. I think a lot of people are either prototyping on it, using it as a testbed for the next version, or seeing whether there's a use case for that sort of device, or higher-end versions of it, in the future. This project was for a hospital whose staff go to the villages and collect patient information. Okay. So this is my agenda for the talk. Keep me to time, because I know I'll overshoot. I'm going to talk about the data store, and about writing the whole application stack in it, something Meteor popularized recently, although it's been around for about three years. Well, not in the same way, but sure. So: the data store, the whole application stack, where Couch shines and does really well, and the bad things, because Couch has a couple of big issues of its own. And then, if you're interested, we can have a conversation about Couch versus Mongo. We'll see. How many people are familiar with CAP: consistency, availability, and partition tolerance? Okay, I'll go through it very quickly. Around 2000, a guy called Eric Brewer said you can only choose two of consistency, availability, and partition tolerance. Until then, consistency was the biggest thing; every database had to be consistent. If I ask how much your bank balance is, it can't be 2,000 or 5,000; it has to be 500, exactly what it is at that point in time.
Over time, especially at scale, people started thinking that availability is more important than consistency. It's okay for me to show that this underscore package has 500 followers when it actually has 600 or so; it doesn't make a fundamental difference to anybody. So these are the choices you make, and we can talk afterwards about what they actually mean. Basically, Brewer said you can only choose two of these three: a system can be partition tolerant and available, consistent and available, or consistent and partition tolerant. What CouchDB chooses is: I'm not going to be consistent, I'll be eventually consistent, but I'll be available and partition tolerant. I'm leaving some questions unanswered here. Availability means it's always available, whatever happens. Partition tolerance says: I have a net split and two servers, one in California and one in New York. Will the apps still work? Will the servers have to talk to each other to stay consistent? Am I making any sense? Yes, yes. You can ask questions; otherwise I'll assume you're with me. Availability is: it cannot possibly be down. Sure. And partition tolerance basically says: if you partition your database, say you cut the connection between two masters, can the two remain available, or remain consistent? Only one of those two is possible. He's better at explaining things than I am. So the choice Couch has made is to be available and partition tolerant. That's also the choice Dynamo and Voldemort have made. It's not the choice RDBMSs make: they are consistent and available, but probably not partition tolerant. There are some good articles about Brewer's CAP theorem if you want to go deeper.
So basically, the gist is that Couch will not be consistent all the time. If I've written some information on my mobile phone and want to sync it up somewhere, the central database will not be consistent with what I've entered here, so the two show different information. That's the gist. At some point a replication will happen, they'll sync up, and they'll all be consistent, but not at every moment. So, three things to remember: it's a JSON store, it has an HTTP API, and it's designed for replication. I showed you the JSON store, so I'll go through it quickly. The fundamental thing with Couch is to not think in relations. If there's a contact with addresses, the addresses are embedded inside the contact document, not in a separate document created somewhere else. It's a slight modeling paradigm shift, but I think most people are familiar with it: my phone number belongs to me, so why should it sit in a separate table somewhere? What it basically avoids is joins. I'll skip the modeling part. Everything is stored as plain text, ASCII or Unicode, whatever: a non-proprietary format, unlike Mongo's BSON. The advantage is that it's very easy to do full-text search. The couchdb-lucene project is probably a hundred lines of code: it just hooks into the changes feed and writes everything into Lucene. That simplifies a whole lot of things. Not having relations and joins removes, to an extent, a lot of the locking. So does the fact that it uses MVCC, multi-version concurrency control. Think of it like this: every update is a check-in, a new version. It stores all the JSON versions; if version two differs from version three, it has version two as well as version three. Actually, maybe I can show you here.
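The couchdb-lucene idea mentioned above (follow the changes feed, keep an external search index up to date) can be sketched with a toy in-memory inverted index standing in for Lucene. The change events below are made up, and their shape (`seq`, `id`, `doc`, `deleted`) is loosely modeled on what `_changes?include_docs=true` returns; this is an illustration of the pattern, not couchdb-lucene's actual code.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> set:
    """Lowercase word tokens; a real indexer would do far more (stemming etc.)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class TinyIndex:
    """Toy full-text index fed from change events (a stand-in for Lucene)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of docs containing it
        self.terms_by_doc = {}            # doc id -> terms currently indexed for it
        self.last_seq = 0                 # where to resume the changes feed

    def apply_change(self, change: dict) -> None:
        doc_id = change["id"]
        # Forget whatever was indexed for the previous version of this doc.
        for term in self.terms_by_doc.pop(doc_id, set()):
            self.postings[term].discard(doc_id)
        if not change.get("deleted"):
            terms = tokenize(change["doc"].get("description", ""))
            self.terms_by_doc[doc_id] = terms
            for term in terms:
                self.postings[term].add(doc_id)
        self.last_seq = change["seq"]

    def search(self, word: str) -> list:
        return sorted(self.postings.get(word.lower(), set()))


# Made-up change events, in sequence order.
CHANGES = [
    {"seq": 1, "id": "express", "doc": {"description": "Fast web framework"}},
    {"seq": 2, "id": "connect", "doc": {"description": "Web middleware"}},
    {"seq": 3, "id": "express", "doc": {"description": "Minimal framework"}},
    {"seq": 4, "id": "connect", "deleted": True},
]

index = TinyIndex()
for change in CHANGES:
    index.apply_change(change)
print(index.search("framework"), index.search("web"), index.last_seq)
```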
So you can see this is the current version of the document, and it has this thing called "previous version". There's an exception, though: because this database is replicated, it won't have all the versions; the central one has them all. Let me see if I have a document of my own. So this created a new version because I added this field, and now I can go back and see what the older version was. It's also append-only: it won't modify in place like MongoDB does, it just writes a completely new document. That also reduces the need for locking. If you've done MySQL tuning, you'll see that 50 or 60% of the time goes into figuring out which rows to lock, which ones to keep out, and which queries to allow, that sort of thing. That's where MySQL gets tough. It also improves synchronization: once you have versions, you just take the latest version and sync it to the other machine. That's one reason I can't see the older versions here: replication only takes the latest version. Is there a way to force a complete replication? Complete as in with all the versions? Yes. No, I don't think there is. There are workarounds; you could basically copy the whole database file. The other thing is that most people don't want it, for reasons I'll talk about. Because you're appending the whole document every time, big databases grow fast. The actual npm registry is about 9 GB for 11,000 documents, partly because it stores a lot of stuff. So disk space fills up really, really fast; I guess that's one reason to avoid it. But with MVCC it basically just gives you the latest version, and if somebody updates a document, it does the standard optimistic locking thing.
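A toy model of the append-only MVCC behaviour and optimistic locking just described: every write appends a new revision instead of modifying in place, and a write based on a stale `_rev` is rejected (CouchDB answers such a write with HTTP 409). The `N-<hash>` revision format follows CouchDB's visible `_rev` shape, but hashing the JSON body with MD5 here is an assumption; CouchDB's real revision hash is computed internally and differs.

```python
import hashlib
import json


class ConflictError(Exception):
    """Raised when a write is based on a stale revision (like HTTP 409)."""


class AppendOnlyStore:
    """Toy MVCC store: every write appends a revision; nothing is modified in place."""

    def __init__(self):
        self.history = {}  # doc id -> list of (rev, body), oldest first

    def get(self, doc_id: str) -> dict:
        rev, body = self.history[doc_id][-1]
        return {"_id": doc_id, "_rev": rev, **body}

    def put(self, doc: dict) -> str:
        doc_id = doc["_id"]
        versions = self.history.get(doc_id, [])
        current = versions[-1][0] if versions else None
        # Optimistic locking: the caller must name the revision it read.
        if doc.get("_rev") != current:
            raise ConflictError(f"{doc_id}: expected rev {current}, got {doc.get('_rev')}")
        body = {k: v for k, v in doc.items() if k not in ("_id", "_rev")}
        # Assumption: "N-<md5 of body>"; CouchDB's actual hash input differs.
        digest = hashlib.md5(json.dumps(body, sort_keys=True).encode()).hexdigest()
        rev = f"{len(versions) + 1}-{digest}"
        self.history.setdefault(doc_id, []).append((rev, body))
        return rev


store = AppendOnlyStore()
rev1 = store.put({"_id": "underscore", "name": "underscore"})
doc = store.get("underscore")
doc["categories"] = ["utility"]
rev2 = store.put(doc)

try:
    store.put({"_id": "underscore", "_rev": rev1, "name": "stale edit"})
    stale_write_rejected = False
except ConflictError:
    stale_write_rejected = True

print(rev1, rev2, stale_write_rejected)
```

Keeping every version is also why disk usage grows so quickly; CouchDB reclaims old revision bodies only when the database is compacted.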
It makes sure you're not updating an old version of the document; otherwise it'll tell you it can't save it. But with replication, where users don't have control, it keeps both versions, marks the document as conflicted, and lets you resolve the conflict manually. There is also a deterministic way it picks a winner. One more detail, quickly: every document has an ID, which is what identifies it. Like I said, underscores are reserved by Couch, so `_id` is a Couch field. It also has a revision, `_rev`. Is anyone familiar with Mercurial? Yeah, can you relate it to Mercurial revision numbers? Basically, a revision has a nice number and a hash associated with it. That's what Couch uses to find the version I'm dealing with, and the hash almost guarantees that it's unique. Git, on the other hand, just has the hash and no nice number. Maybe I mentioned it earlier: it's schema-less. You can put anything in there; it doesn't matter what you put in, as long as it's valid JSON. Still making sense? Everybody with me? Okay. The other awesome thing about it is HTTP, the fact that everything is HTTP. I started using Couch about two and a half years ago, and when I saw it, it was so far out there, to me at least. Nobody at that time, and I think even now other than maybe Mongo, used JSON as a store, just simple JSON. Even now I don't know of any other database that has a pure HTTP API. In the database world it's just not done; it's such a big performance hit that people avoid it. To me it was a really exciting thing. Everything is RESTful. Everybody knows what REST is. The very basic thing: GET gets a document, or any resource. POST creates a new one, PUT updates an existing one, and DELETE deletes the resource.
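The HTTP verbs listed above map onto CouchDB operations roughly as follows. This sketch only builds the requests; the database `todos`, the document id `milk` and the revision strings are hypothetical placeholders. Note that in CouchDB a database is created with a PUT on its URL, POST to a database creates a document with a server-generated `_id`, and both updates and deletes must name the current revision.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumption: local CouchDB; "todos" is hypothetical


def couch_request(method: str, path: str, body=None) -> urllib.request.Request:
    """Build (but do not send) a request, JSON-encoding the body if there is one."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers, method=method)


REQUESTS = [
    couch_request("PUT", "/todos"),                                   # create the database
    couch_request("POST", "/todos", {"title": "buy milk"}),           # new doc, server picks _id
    couch_request("GET", "/todos/milk"),                             # read one doc
    couch_request("PUT", "/todos/milk", {"_rev": "1-abc", "title": "buy oat milk"}),  # update; needs current _rev
    couch_request("DELETE", "/todos/milk?rev=2-def"),                 # delete; also needs current rev
]

for req in REQUESTS:
    print(req.get_method(), req.full_url)
```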
Simple. So the API is really obvious. If you want to create a database, you just send a PUT to this URL. That's it. I have a really complicated way of showing that, but I guess you don't care. All of them are URLs. We created a document like that. We created a replication: replication itself is triggered by a POST. You can probably GET the current status of a replication by its ID, but I haven't ever tried it. And there are design documents; I showed you some of them. Maybe it's a good time to step into MapReduce. Anybody familiar with MapReduce? Okay. I'll show you the views, but maybe I should talk about MapReduce first. That's another exciting thing about CouchDB. At that time, a couple of years ago, nobody had MapReduce. Google had published the MapReduce paper in 2004, and within a couple of years it was already built into this database. Other than Mongo, I don't think anybody else has MapReduce built in; others use something like Hadoop externally, not internally. So, quickly. Let me show you an easier database. With MapReduce you have two functions, map and reduce. One of the things all these NoSQL databases tend to say is: don't design your schema first. Design your queries first, how you want to access your data, and then figure out how you store it, how you write your views, how you write your queries around it. Views are one way to approach that problem; actually, in CouchDB, views are the only way. So say I want to rank packages by how popular they are, popularity being the number of watchers plus the number of forks. I emit the rank and the description. In this case it's actually a bug: `doc.description` isn't there.
That's why it's showing null. It's nothing. I can just fix it and save it. So this builds the view. Views are incremental in CouchDB: once something is computed, it won't be recomputed. So this one will be really fast; it's done, a couple of milliseconds maybe. If I change something else (let me see what I can change here), it takes a long time. What it's doing is taking all 10,000 documents I have through this map function and computing the view again. Once it's computed, it's stored and won't be recomputed. So in general it's really, really fast, and good for read-mostly applications, not write-intensive ones. The easy way to explain MapReduce: you take a whole bunch of data, split it into sections, compute on each section, and then merge the results. Think of map as the select: in standard SQL terms, these are the things I want in my query. And reduce as the... can I use a whiteboard somewhere? The canonical example is word count. We have a whiteboard, actually. It's behind there. Okay. Maybe we can do it offline, after we finish. I think it's important to understand MapReduce to know how views and retrieval work. But basically, think of it as taking the data, running some query over it as step one, and then passing that to reduce so that it can be boiled down. Anybody do Ruby? Or JavaScript? Map and reduce are like the standard underscore functions, right? Yeah. So it's exactly that. Map takes a whole bunch of data and transforms it somehow, and then reduce just... Group by? Yeah, exactly, group by. Sorry, I couldn't explain it well. But there are lots of examples.
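The word-count example, which is the usual way to explain map, group-by and reduce, fits in a few lines. This is a generic in-memory illustration of the MapReduce pattern, not how CouchDB executes views internally; the two sample documents are made up.

```python
from collections import defaultdict


def map_words(doc_id, text):
    """Map step: emit (word, 1) for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(key, values):
    """Reduce step: collapse all values emitted for one key into a single number."""
    return sum(values)


def word_count(docs):
    grouped = defaultdict(list)  # the "group by" between map and reduce
    for doc_id, text in docs.items():
        for key, value in map_words(doc_id, text):
            grouped[key].append(value)
    return {key: reduce_counts(key, values) for key, values in sorted(grouped.items())}


counts = word_count({"a": "the cat sat", "b": "the dog sat down"})
print(counts)  # {'cat': 1, 'dog': 1, 'down': 1, 'sat': 2, 'the': 2}
```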
The canonical one is word count: how many times each word appears in a document. There's a good paper on it, so I think most of us should know about it. Cool. So, simply: map does the select, reduce does the grouping. And there are standard functions for it, so you can just run data through them. In this case, this one takes a doc. `doc` is the actual document being mapped, and you can do whatever you like with it. And you can have a reduce function. This one doesn't have one, but in this more complicated one I have a count. It's written as `_count`, which is equivalent: `_count` is just a built-in shortcut for returning the count. And there's a small checkbox here that says Reduce. Reduce is optional; you don't have to reduce all the time, just like in SQL you don't have to group by all the time. And it gives me the numbers. So basically what it says is there are six packages in this category, four for analytics (I should have done this earlier), 24 for API clients, that sort of thing. Is reduce clear? So is this distributed? No, it's not distributed. That's the caveat with CouchDB's MapReduce: it's just local. If you use Cloudant, which is a distributed CouchDB service, you get distributed view building, not distributed MapReduce. Sorry, I have a lot to cover. So I've gone through views. You use views to query, and people tend to create a lot of views. This is what we've created: you see a whole ton of views, like "give me all the good ones". This one will take a really long time because I haven't built it in a long time. So these are the ones I'll get, and you can look at the code here. The thing to notice is that all of them are stored in design documents. This is how they're actually stored in CouchDB: the whole JavaScript is stuck in there.
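A sketch of what the built-in `_count` reduce does. CouchDB reduce functions take `(keys, values, rereduce)`: on the first pass they see view rows, and on a rereduce pass they see earlier partial results, which for a count must be summed rather than counted. The view rows below are hypothetical output of a map like `emit(doc.category, null)`, and the grouping helper imitates querying with `group=true`.

```python
def count_reduce(keys, values, rereduce):
    """Mimics the built-in _count reduce.

    First pass (rereduce=False): count the rows handed in.
    Rereduce pass (rereduce=True): values are partial counts, so sum them.
    """
    if rereduce:
        return sum(values)
    return len(values)


def reduce_group(rows):
    """Like querying the view with group=true: one reduced value per distinct key."""
    groups = {}
    for row in rows:
        groups.setdefault(row["key"], []).append(row)
    return {
        key: count_reduce([[r["key"], r["id"]] for r in group], [r["value"] for r in group], False)
        for key, group in sorted(groups.items())
    }


# Hypothetical view rows from a map like emit(doc.category, null).
ROWS = [
    {"id": "pkg-a", "key": "analytics", "value": None},
    {"id": "pkg-b", "key": "api-client", "value": None},
    {"id": "pkg-c", "key": "api-client", "value": None},
    {"id": "pkg-d", "key": "web", "value": None},
    {"id": "pkg-e", "key": "web", "value": None},
]

per_category = reduce_group(ROWS)  # {'analytics': 1, 'api-client': 2, 'web': 2}

# CouchDB may reduce chunks of rows separately and then combine the
# partial results with a rereduce call; for _count that means summing.
partials = [count_reduce(None, [r["value"] for r in chunk], False) for chunk in (ROWS[:2], ROWS[2:])]
total = count_reduce(None, partials, True)  # 2 + 3 = 5
print(per_category, partials, total)
```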
And you can modify it right here, or you can modify it in the... Quick question: can I use underscore.js for writing my views? Yes. Actually, if you're on 1.2, yes; if you're on 1.0 or below, no. Basically any CommonJS module can be stuck in. Oh, okay. That's cool. There are a couple of rules. You can't modify stuff: it's like a map, you don't want to modify what you're working with, just transform it. So yeah, anything that's CommonJS can be used, but there's a special way to include it. Okay, so I've talked about design documents. CouchDB itself is written in Erlang. There's a small part of it that is JavaScript, but everything else is Erlang. But once you have this awesome JSON store and this awesome HTTP API, it takes you to the next level, which really blows your mind: writing the whole application stack in Couch. Since I'm running out of time, I'll just show you an example. The canonical JavaScript example is a to-do app, and that's what this is. What I've used is a thing called Kanso. There are two alternatives in the CouchDB world: the older one is CouchApp, and Kanso is a newer take on the same thing. The idea of CouchApp is that you embed the whole application, HTML and everything, inside CouchDB and serve it from there. Does that make sense? It blows your mind, because there's no middle tier anymore: you just have the database, and the front end sitting inside the database. That's it. So this is a simple to-do application; there's nothing in it right now. The way you do this... sorry, you shouldn't be seeing that password, but that's okay. We believe you. I'm sorry? We believe you. I don't even write anything else. So we have this bunch of code here. The key file to look at is kanso.json.
People do Node.js, right? You should be familiar with package.json: it lists your dependencies, the main file to load, and things like that. And you write a simple hello-world index.html. Well, yeah, just some random examples. So wait, does it do the queries over JSONP or something? No, it's not cross-origin. It's running on a different port? What's running on a different port? CouchDB is running on a different port, I'm guessing, and this app... This app is actually inside CouchDB. Basically, I pushed it to a database called `todo`, right at the bottom. All the assets, everything including index.html, are in there. So the database hosts your app as well? Yes. That's cool. So that's the `db` module I'm using. JavaScript that I put in separate files is all included inline here. index.html is an attachment; attachments are another CouchDB feature. And there are a bunch of other things. So basically you can do a list, and that's your hello world sort of thing. It's hosted inside CouchDB; you don't need anything else to display all your... So what you're saying is that instead of having your static files on the file system, you have everything in the DB? Yes. Your code is still in Git and everything; you push it to CouchDB, and that's pretty much it. Then, when you combine that with replication, you can take this whole app and put it onto a mobile phone, and your app literally works offline without any internet. And because it's continuously syncing, it updates everything everywhere. And there's a really simple way to... any other questions? Is performance an issue, serving static files from the database rather than from the file system? So one thing I'm going to complain about with CouchDB is that it doesn't do anything in memory. It doesn't do...
If you look at our setup, it has probably a few million documents, and the CouchDB process still takes about 128 MB. Elasticsearch and everything else take about a gig. But it doesn't do anything in memory: everything is literally on disk, so it loads it up and sends it across. That's all it does, performance-wise. One thing this solves, on a mobile phone or even in a real web app, is having to go all the way to the server to get the data and render it. That's the thing that kills performance. Yes. So in your index.html you just write JavaScript that says `db.getView`, and you get all the documents from that view. You can PUT or POST from there as well. Anybody familiar with Meteor? Yeah. It's like a full version of it, but similar. Can you index an attribute? You'll have to write a view that takes that attribute: emit the attribute in the view, and then you can query on it. Sorry, I didn't have enough time to talk about views in detail, but you can do a lot of things with them. So that's how you index in Couch. Is it persistent? It's persisted on disk, and you don't have to worry: as long as you don't blow away the database or re-index the view itself, it's there. So it's duplicating the data? The actual attribute is copied again? Yes, and that's why it's a big disk hog. It just kills the disk. So many questions, and I'm running long. I missed the good part of CouchDB, so maybe we can talk about it offline, along with the conversation around MongoDB. As long as people don't run away, I'm okay to talk here. Is the code available somewhere? Yes, it's actually on GitHub. Ah, great, awesome. Yeah, so maybe I'll just leave it up, and people can think about it. Any questions? Anybody?
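"Indexing an attribute" in CouchDB means writing a view that emits that attribute as the key, so the view stores its own sorted copy of the value (hence the extra disk use mentioned above). This sketch simulates that with a sorted list and binary search. The field name `author`, the sample documents and the view path in the comment are hypothetical; in real CouchDB the `key` query parameter is JSON-encoded, e.g. `?key="bob"`.

```python
from bisect import bisect_left, bisect_right


def build_index(docs, field):
    """Like a view whose map emits doc[field] as the key (with a null value):
    a sorted copy of (key, doc id) pairs, skipping docs without the field."""
    return sorted((doc[field], doc["_id"]) for doc in docs if field in doc)


def query_key(index, key):
    """Like GET /db/_design/app/_view/by_author?key="...": ids whose key equals `key`."""
    keys = [k for k, _ in index]
    lo, hi = bisect_left(keys, key), bisect_right(keys, key)
    return [doc_id for _, doc_id in index[lo:hi]]


# Made-up documents; pkg-d has no author, so it never appears in the index.
DOCS = [
    {"_id": "pkg-a", "author": "bob"},
    {"_id": "pkg-b", "author": "alice"},
    {"_id": "pkg-c", "author": "bob"},
    {"_id": "pkg-d"},
]

index = build_index(DOCS, "author")
print(index)
print(query_key(index, "bob"))
```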
What are your top five problems with CouchDB? Why would I not use it? It sounds like a lot of great stuff, but there are obvious drawbacks. Views are hard. It's a shift. People, especially from a relational background, tend to find views really complicated, and they actually are complicated. And there are things you can't easily do with CouchDB, like "give me packages that were created between this date and this date". To do that, you have to index times as well, and indexing time is really hard, because you end up keying on this day, this day, this day; over a two-year span you're basically creating so many rows inside the view that it's just not worth it. It's very slow as well. Top five problems: the one big problem I have is disk. It just kills disk, especially with versioning and the number of views you create. I went from a one-gig box to a 50 GB box to a 100 GB box over a period of three months. It just eats away disk, and you have to keep compacting continuously. There's versioning, and the views are also stored on disk. So if you don't optimize your views... One of the things beginners tend to do is write a lot of views, because every index, everything, has to go through views. So you end up with a lot of views, and they eat up a lot of space. So what's the actual use of versioning? You're not just saving the delta changes. Can you switch it off? You can't switch it off. It's built in, because that's how it supports concurrency. The standard way databases solve the concurrency problem is optimistic locking: if you update the same document, you get an error, the user sees it, refreshes, and submits again. But with replication built in, there is no user in the loop anymore. You can't ask the user anymore.
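When two replicas edit the same document concurrently, CouchDB keeps both leaf revisions and deterministically picks one as the winner so that every replica shows the same version; the others remain available as conflicts. The sketch below is a simplified version of that rule as commonly described: higher revision number wins, and ties are broken by comparing the hash part as a string, larger wins. Real CouchDB also prefers non-deleted leaves, which this sketch ignores, and the revision strings are made up.

```python
def rev_sort_key(rev: str):
    """Split "N-hash" into (N, hash) so revisions compare by number, then hash."""
    number, digest = rev.split("-", 1)
    return int(number), digest


def pick_winner(leaf_revs):
    """Simplified deterministic winner: highest N, ties broken by the larger hash string.
    (Ignores deleted leaves, which real CouchDB treats as losing.)"""
    return max(leaf_revs, key=rev_sort_key)


def conflicts(leaf_revs):
    """The losing leaves: what an app would inspect to resolve the conflict manually."""
    winner = pick_winner(leaf_revs)
    return sorted(rev for rev in leaf_revs if rev != winner)


# Made-up leaves: the phone and the central server both edited revision 2.
phone_rev = "3-a1f0"
server_rev = "3-9c2e"
leaves = [phone_rev, server_rev]
print(pick_winner(leaves), conflicts(leaves))
```

Being deterministic only guarantees that all replicas agree; it does not make the choice right for the application. That is why the losing revisions are kept: an app can fetch the document with `?conflicts=true`, merge or choose manually, and then delete the losing revisions.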
I've updated my document here on my mobile phone, and concurrently I've updated the same document on the central database server. When the two sync, which version do I use? Those sorts of questions nobody can answer automatically. Literally nobody. So somebody has to go and look at the conflicts. Couch exposes them, and you can look at the conflicting versions and say, this is the version I want. You literally decide which version you want. That's why versioning becomes important. Maybe you can... You've got to share this, of course. Thank you. Anyway, I would like to thank you all for coming. This first event was pretty cool. I would like to thank Swissnex for hosting us in their offices, Kieran and Xenap for filming the talks, and most importantly, you guys. And thanks for having me.