So, good morning, everybody. Welcome to the second day of the conference. I'm really excited to see the rest of the talks and to give my own. Hopefully this one will start the day off right with a little bit of energy. I think NoSQL development is probably one of the most energized fields right now in web and application development. You can't turn around without tripping over another key-value store or another document database. So hopefully I can bring a little energy and excitement to the people who are actively working on this (probably some of you, even as we speak) and keep you going for the rest of the conference. Now, the title of this talk may sound like a call to arms, but that question mark at the end is extremely important. I'm not advocating that we round up all the MySQL developers, the Postgres guys, and the Microsoft SQL Server folks, line them up, and shoot them. I would never suggest that. That would be terrible. But I do want to question the prevalence of the relational database model. I think that's what is at the root of the NoSQL movement. So let's actually dig into the fundamentals of the NoSQL movement. I think there are five main reasons why NoSQL has taken off, though we generally hear about only two of them. The first one we generally hear about is performance. You see benchmarks here and there about how fast Mongo is at inserts, or how fast Redis is at reading things, because it's all in memory. And they're compared, to the detriment of the relational model, with standard Oracle and other benchmarks. There's a huge amount of debate about benchmarks. You can see him up here on the slide in all his speedy glory. Benchmarks are good for some things, but only some things.
You can get a false sense of security from having really good benchmarks. Benchmarks are typically very artificial: they run under artificial constraints in very specific circumstances, and they might not hold up for your data. So it's always best to take them with a grain of salt. That said, it does look like a lot of the NoSQL options are a lot faster than the relational database you might be using. I don't know about you, but my applications aren't generally database-bound, so that's not the driving factor for me. So let's move on from this one and talk about the second reason people often present for NoSQL: scalability. Scalability often gets conflated with performance, which just confuses the issue, and it makes some people extremely upset when that happens. Most of you probably already know this, but scalability means taking something that runs on one box, or one node, and scaling it up to a whole lot of nodes. The easier that is, the more benefit you're perceived to get from it. I think this has become an important argument because of the web. Twenty years ago, before the web was huge, or before it really existed, we had desktop applications, which did not have to scale to huge amounts, because they ran on your local machine and you were the one using them. You can only create so much data in a day. We had enterprise systems, which could generate a lot more data, but even then it was still a fraction of what's possible with web-based software now. Your software can be used by billions of people, literally billions. If even a few million of them upload a hundred photos a day, you've got hundreds of millions of photos a day. It's a whole new world as far as scalability is concerned.
And so the database systems we've been accustomed to over the last twenty or thirty years seem lacking, because that sort of extreme scalability wasn't a priority for a very long time. Replication in Postgres was a nightmare for a long time, until a couple of years ago when they finally invested effort in it, and now it's much easier. And I think that's an important point. For both performance and scalability, these are, to me, really bad reasons to go NoSQL. There is nothing fundamental about relational database systems that guarantees you won't be able to get adequate performance out of them, or that you won't be able to scale them to the level you need. In a lot of cases, those just haven't been a big focus, because there hasn't been that need. But if you wait long enough, Moore's law will make things faster, and as the Postgres example shows, we can make systems scale better. So those are contingent reasons to like NoSQL: if you wait a little while, you'll get adequate performance out of a relational system. There's no reason to jump on the NoSQL bandwagon quite so hard. There are, however, several reasons that are more fundamental. The first is flexibility. In a relational system you constrain your data with a schema, and that schema is hard to change. Run an ALTER TABLE over a table with 100 million rows and there's a real cost associated with it. It's painful. It's as if the schemas in relational systems were built from some inflexible material, like steel or wood, whereas the schemas in NoSQL systems, if there are any, are made of paper. You can fold the paper into a cool origami crane or something, then unfold it and refold it into a paper airplane, or whatever you need at the time. That's fundamentally much easier, because the schema isn't set into the database management system the way it is on the relational side.
The next reason is the locus of work. I don't think people talk about this one much, but I find it pretty interesting, because it brings in a whole family of NoSQL solutions that we don't normally mention when we're talking off the cuff. The locus of work is about the database doing stuff for you, often better than you could do it yourself. Relational databases are built on relational theory, which means you get a lot of built-in functions essentially for free: aggregations, sorting, all of that. That isn't magically there just because you have some sort of persistence engine. If you use a key-value store, you might not get aggregations; as a matter of fact, you probably don't. But the family I mentioned, the one we don't usually talk about, differs from relational databases because it's based on an entirely different branch of mathematics: graph theory. Say you had IMDb and wanted to figure out how to get from Charlie Chaplin to Jet Li. Who doesn't want to get from Charlie Chaplin to Jet Li? If you're building on Postgres or something, finding that shortest path is going to be a pretty expensive query. If you're on a graph database, shortest-path finding is built into the database itself. It's way easier to get done, and it's going to be more efficient. It's a fundamental characteristic of how the database is constructed that this will be easier in a graph database than in a relational database. And then the last reason is my favorite: complex domains. If you've seen me speak in the last eight months or so, you've seen my "Comics Is Hard" talk, where I spend most of the time ranting about the current state of the comics industry and how hard it is to model in a relational database, because I've been trying for the last couple of years.
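To make the shortest-path idea concrete, here is a minimal breadth-first search in plain Ruby over a tiny co-star graph. The actor names and edges are hypothetical placeholders, not real IMDb data; a graph database would run this kind of traversal inside the engine rather than in your application code.

```ruby
# Breadth-first search: in an unweighted graph, the first time we reach
# the target we have found a shortest path.
def shortest_path(graph, from, to)
  previous = { from => nil }  # node => the node we reached it from
  queue = [from]

  until queue.empty?
    node = queue.shift
    break if node == to

    graph.fetch(node, []).each do |neighbor|
      next if previous.key?(neighbor)  # already visited
      previous[neighbor] = node
      queue << neighbor
    end
  end

  return nil unless previous.key?(to)

  # Walk the predecessor links back from the target to the start.
  path = []
  node = to
  while node
    path.unshift(node)
    node = previous[node]
  end
  path
end

# Hypothetical "appeared together" edges; not real filmography data.
costars = {
  "Actor A" => ["Actor B", "Actor C"],
  "Actor B" => ["Actor A", "Actor D"],
  "Actor C" => ["Actor A"],
  "Actor D" => ["Actor B", "Actor E"],
  "Actor E" => ["Actor D"]
}

p shortest_path(costars, "Actor A", "Actor E")
# => ["Actor A", "Actor B", "Actor D", "Actor E"]
```

In a relational schema, each extra hop in that path would be another self-join on a co-star table, which is the cost the talk is pointing at.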
And the point is that certain domains map onto certain structures in the database layer much more easily. Relational databases are great for things like commerce data, and often user accounts. Anytime you have a lot of relations to track, relational databases work pretty well. If you've got something more like a social graph (hey, it's got "graph" in the name), maybe a graph database would be a better model for it. Even if it appears more complicated to set up, it can be easier for developers to understand. And since we're all Rubyists, we know that keeping developer productivity high is valuable in and of itself. I mentioned comic books, and I've given a version of this talk about comics a number of times, so you're going to see some comics in this one too. My example of a complex domain, a domain you're not going to model well relationally, is comic books, as represented by this chart of the romantic entanglements among the characters of the X-Men universe. Green arrows indicate unrequited infatuation, and those are the only unidirectional lines; all the others are mutual. Pink is both parties flirting, red is a casual encounter, maybe a kiss or more, purple means they dated for a while, and blue is a serious relationship or marriage, sometimes with children, sometimes not. Dashed lines, which can be any of the others, involve a parallel-universe or alternate-reality version of the character. You can see that Wolverine gets around quite a bit. Gambit doesn't do too badly for himself. Charles Xavier, not so great, but he's bald, and I guess that's a turn-off. So anyway, think about putting this into your relational database and trying to get useful information out. Think about the number of joins you would need and how inefficient that would be.
Again, that's a fundamental feature. If you put this into a graph database, as we'll do later, you'll see the interesting data just falls out of it. So those are, I think, the reasons most often presented for looking at the NoSQL movement. But what is the NoSQL movement as a whole? I've got a problem with the name. My problem isn't that it attacks relational databases, or that it defines things by what they're not. My problem is that it's really, really broad. NoSQL covers key-value stores, which are essentially just distributed hash tables. Each one has different bells and whistles that make it more or less useful in certain circumstances, but that's essentially the basis of it. You've got examples like Amazon's Dynamo. There's PStore, which ships with Ruby, which is pretty cool: if you want to experiment with a key-value store and you don't want to install anything on your system, try out PStore. It's in the standard library. Redis is the hot topic among key-value stores. It keeps all its data in memory and, as we'll talk about a little later, snapshots it to disk. It's extremely fast because everything's in memory, but it also has some downsides because of that. And then there's GT.M. Does anybody know what GT.M is? Ah, of course you do. GT.M is a key-value store that has been in use in the financial sector for decades; I don't really know exactly how long. It has a long and ancient history, as someone I saw put it. It sort of exploded on the NoSQL Google group a couple of weeks ago, with its people posting furiously: "Why are you guys talking about this? We've known about it forever." So GT.M is apparently a viable member of the key-value stores as well. The second family is column-oriented stores.
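Since PStore really does ship with Ruby, here is a minimal sketch of using it as a key-value store. The file name and keys are made up for illustration. PStore persists the whole store to a single file with Marshal, so it's a handy toy for experimenting, not a distributed hash table.

```ruby
require "pstore"
require "tmpdir"

# PStore keeps a Hash-like store in a single file, serialized with Marshal.
# The file name and keys below are made up for illustration.
path = File.join(Dir.tmpdir, "pstore_example.pstore")
File.delete(path) if File.exist?(path)

store = PStore.new(path)

# Writes happen inside a transaction and are committed when the block ends.
store.transaction do
  store[:green_lantern] = "Kyle Rayner"
  store[:teams] = ["JLA", "Green Lantern Corps"]
end

# A fresh handle on the same file, opened read-only, sees the saved values.
reopened = PStore.new(path)
name  = reopened.transaction(true) { reopened[:green_lantern] }
teams = reopened.transaction(true) { reopened[:teams] }

puts name  # => Kyle Rayner
p teams    # => ["JLA", "Green Lantern Corps"]
```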
Of the four families I'm going to talk about, these are the ones that are hardest for me personally to get my head around, because all the others map easily onto things we already use. Key-value stores are hash tables, and for document databases, which we'll see in a bit, you can roughly think of each document as a row in a table. Column-oriented stores flip that: instead of rows being the fundamental unit, you have columns, as you might expect from the name. Unlike key-value stores, which are unstructured unless you add structure yourself, column-oriented stores are semi-structured, which means there are relationships you can build within them using functionality in the database layer. Google's BigTable was probably the coming-out party for column-oriented stores. When Google published the paper, everybody went, "Wow, can I have one of these?" And no, no, you can't. HBase is an open-source column-oriented store. Cassandra is particularly interesting: it was open-sourced in the last couple of years, and it was developed at Facebook by one of the developers of Amazon's Dynamo, the key-value store I mentioned before, but it's heavily informed by BigTable. So it's as if Dynamo and BigTable had a little baby: that baby would be Cassandra, and it would be awesome, as we'll see. We'll talk about some of these in more depth as we go. Document-oriented stores are probably more familiar to you than the others, because when CouchDB exploded on the scene, it really exploded. All of a sudden CouchDB was everywhere, and then a whole crop of furniture-themed Ruby libraries showed up around it.
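One rough way to picture the row-versus-column flip is to take the same records and regroup them by attribute. This is only an illustration in plain Ruby Hashes with made-up data; it says nothing about how BigTable, HBase, or Cassandra actually lay data out on disk.

```ruby
# Row-oriented: each record is the unit (made-up data).
rows = [
  { id: 1, name: "Cyclops", team: "X-Men" },
  { id: 2, name: "Storm",   team: "X-Men" },
  { id: 3, name: "Batman",  team: "JLA" }
]

# Column-oriented: each attribute is stored together, keyed by row id.
columns = Hash.new { |hash, attr| hash[attr] = {} }
rows.each do |row|
  row.each do |attr, value|
    next if attr == :id
    columns[attr][row[:id]] = value
  end
end

# Reading one attribute across every record touches a single column...
p columns[:team].values  # => ["X-Men", "X-Men", "JLA"]

# ...while rebuilding one whole record means visiting every column.
storm = columns.keys.to_h { |attr| [attr, columns[attr][2]] }
p storm
```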
Document-oriented stores are also semi-structured, and there's one big relation built in: you've got a document, and it has key-value pairs within it. So there's that ownership: the concept of a document that has stuff within it. And then you can have embedded documents within those, and so on down. I mentioned CouchDB and Mongo. There was a great talk on Mongo yesterday; I don't know if Michael's in here, but when Confreaks puts the video up, I'd advise you all to watch it. Is Anthony here? I think Anthony even wrote RDDB after CouchDB because he wanted a Ruby version of it. Does that sound right? If he's here, give him a high-five for contributing to the ecosystem. And then Riak is sort of the new kid on the block. As far as I can tell, Riak really exploded after either the New York NoSQL meetup or the NoSQL East conference. It's another document-oriented database that gives you a lot of good stuff, but I'm not going to talk about it any more than that. Then there's the last family. I mentioned that what I don't like about the NoSQL label is that it's too broad. The three families so far are all less structured than relational databases. Key-value stores, of course, have no structure; column- and document-oriented stores are semi-structured; relational databases are very structured. Graph databases are like the next step beyond relational databases in the amount of structure they let you have. As you might guess, they're based on graph theory, which I'm not an expert in. I know about graph theory, but I can't explain it to you, so if you ask me questions about it, I'll get really confused and probably cry. There are a couple of reasons, I think, that we often ignore graph databases.
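A document with embedded documents is easy to sketch with Ruby's standard `json` library. The post and comments below are hypothetical; the point is only that the comments live inside, and are owned by, the post document, and that the whole nested structure round-trips as one unit.

```ruby
require "json"

# A hypothetical blog-post document: top-level key/value pairs plus
# embedded comment documents that the post owns.
post = {
  "_id"      => "post-1",
  "title"    => "NoSQL?",
  "comments" => [
    { "author" => "alice", "body" => "Great talk" },
    { "author" => "bob",   "body" => "More comics, please" }
  ]
}

# A document store would persist something like this JSON text...
json = JSON.generate(post)

# ...and hand the same nested structure back when you read it.
restored = JSON.parse(json)
puts restored["comments"].map { |c| c["author"] }.join(", ")  # => alice, bob
```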
One is that graph databases have been around for years. People have been running production systems on them and having a great time doing it, but they haven't been open-sourcing them. So all of this graph database machinery out in the world is sort of invisible to those of us in the open-source community, unless we happen to go work for one of those companies. Happily, that's changing. You've got things like AllegroGraph, which is built in Java and Lisp, and I know Relevance is using it on a number of projects. I'm pinging the guy every week to see how it's going; I think he's annoyed with me now. ActiveRDF isn't really a graph database; as far as I can tell, it's a graph layer on top of a number of databases. There's an ActiveRDF MySQL bridge that lets you access your MySQL data as if it were in a graph database. I haven't dug into it very far, but a guy who told me about it said it's cool, so I'm passing it on here so you can go look. And then Neo4j is the one I'm most excited about. It's written in Java, which means you can get to it through JRuby. In fact, Neo4j is the reason I installed JRuby on my laptop. Not that I hate Java or anything; I don't prefer it, but JRuby is fun. I wanted JRuby so I could play with Neo4j, because it's amazing. It's got a lot of momentum, even though it still lacks some features that are critical for large production deployments. It's improving, and it's open source, so you can contribute back if you're smarter than me, which isn't hard. So those are the broad families. The next natural question is: for all the motivations I identified (performance, scalability, flexibility, locus of work, and complex domains), where do these options fit?
Let's start with performance. Everything I say here is broad strokes: within each family, the databases differ, so some column-oriented databases are going to be more performant than others. These aren't hard boundaries; it's not as if all column-oriented databases are faster than all document-oriented ones. Things can shift around a little, but these are the general rankings as I understand them. Key-value stores are going to be fastest, because they have no structure to dig through. Column-oriented and document-oriented stores are going to be very, very fast, because there's relatively little structure, though they can be slowed down by their interface. CouchDB, for example, is going to be slower to get stuff out of than Mongo, which has native access. Then come relational databases and graph databases. Performance is a tricky thing, as I mentioned when I talked about benchmarks: it really depends on what you're doing. If you're trying to get all the children some guy had in 1837 because you're doing genealogy research, a relational database will get those has-many relationships really, really fast. If you're finding the shortest path between Jet Li and Charlie Chaplin, a graph database is going to be way faster. So those two could easily flip-flop depending on your domain. Scalability is the other justification I don't like. Key-value stores scale horizontally easily: you just need a dispatcher that knows, from the key, which node the data goes to, and it works. I should point out that many of the key-value, column-oriented, and document-oriented stores have scalability built in by default, through replication and other ways of spreading data out. Mongo is working on auto-sharding, which sounds really cool once they get it done. Column-oriented stores can scale, as we know because Google is using BigTable, and Google has more data than you do.
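Here is a minimal sketch of that dispatcher idea: hash the key and use the result to pick a node. The node names are hypothetical and the modulo scheme is a simplification; Dynamo-style systems such as Cassandra use consistent hashing instead, so that adding a node reshuffles far fewer keys.

```ruby
require "zlib"

# Hypothetical node names for a three-node cluster.
NODES = ["node-a", "node-b", "node-c"].freeze

# Naive dispatcher: hash the key, then take it modulo the node count, so the
# same key always lands on the same node. Adding a node here would remap
# most keys, which is why consistent hashing is used in practice.
def node_for(key, nodes = NODES)
  nodes[Zlib.crc32(key.to_s) % nodes.size]
end

%w[user:1 user:2 photo:99].each do |key|
  puts "#{key} -> #{node_for(key)}"
end
```

Because every client computes the same function, no central lookup table is needed to find where a key lives.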
Document-oriented stores are built for this, too. Take CouchDB: its reliance on HTTP as an access protocol slows it down, but it also opens up the potential for it to scale better, because HTTP is built around scalability. Relational databases are hard to scale. For graph databases, scaling is one of the features that isn't quite there yet in Neo4j, for instance. It's unclear how they're going to scale, because they're hyper-relational: if you have to keep everything related to a node on the same physical machine, that's going to present unique scalability challenges. On flexibility, the chart starts shifting around. Key-value stores have no schema; they have no structure at all unless you go in and add it. By default there's nothing there, so they're going to be the most flexible. I'd say document-oriented stores and graph databases are about the same; that's what the orange on the chart means. The reason is that, again using Neo4j as the graph database I have experience with, in Neo4j nodes and edges are essentially like documents in a document-oriented database: they're just associated key-value pairs. That means if you don't actually create any relationships, you're just working with a document database, so clearly it's going to be as flexible as a document-oriented store. Column-oriented stores are actually a little less flexible. In Cassandra, you have to define some aspects of the schema at startup: you have to say whether a column family will hold columns or super columns, which we'll talk about, and once those are set, it's painful to change them. Next, the locus of work. Generally, key-value, column-oriented, and document-oriented stores are not going to do a lot of work for you. There are some exceptions. Redis lets you use sets, so you can do set intersections and things like that.
Column-oriented stores can do some work, but relatively little compared to what you get from relational databases, which give you a wide range of operations we're all familiar with. Graph databases also give you a wide range, though it's a different set of things that you get for free and can push down to the database layer to calculate. And then there's how they stack up on domain complexity. If you have a non-relational domain, say you're just running a cache where all you have is a key and some data you want to keep, that's key-value stores: no relations between items. Column-oriented and document-oriented stores both handle semi-structured data. In Rails terms, if all you have are has-many and has-one relationships going down, that's the sweet spot for domain modeling with column- and document-oriented stores. Relational databases are for relational domains: if you have some many-to-many relationships and some join tables, you're good to go. And graph databases, I'd say, are good for hyper-relational domains, where everything is connected to everything else, where you cry when you look at your relational schema because it's table after table and everything's black with all the connections running around. Kind of like that X-Men picture I showed you. So let's dig in and see what accessing and using some of these systems looks like. Redis is the key-value store I mentioned. It lets you store a couple of different types. Strings are the base case, as in every key-value store. It also gives you lists, with list operations like push and pop, and sets, which give you set operations. I believe there's a new release coming out soon that adds ordered sets, which maintain their sort order for you while you work with them; that will be useful in a lot of cases.
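Redis is a server, so rather than guess at a client library's API, here is a sketch in plain Ruby of what its value types give you: a string value, a list you push onto and pop from, a set intersection, and a sorted set read back in score order. The Redis command names in the comments (SET, RPUSH, LPOP, SINTER, ZADD, ZRANGE) are real, but this code does not talk to Redis, and the data is made up.

```ruby
require "set"

# String: the base case of any key-value store.
strings = {}
strings["hero"] = "Kyle Rayner"                  # like SET hero "Kyle Rayner"

# List: push onto one end, pop from the other.
events = []
events.push("lost his ring's power")             # like RPUSH events ...
events.push("met his dead loved ones")
first_event = events.shift                       # like LPOP events

# Set: membership plus set algebra on the server side.
league   = Set["Superman", "Batman", "Aquaman"]
swimmers = Set["Aquaman", "Mera"]
both = league & swimmers                         # like SINTER league swimmers

# Sorted set: members kept in order by score.
scores  = { "Flash" => 3, "Batman" => 1, "Superman" => 2 }          # like ZADD
ordered = scores.sort_by { |_member, score| score }.map(&:first)    # like ZRANGE 0 -1
p ordered  # => ["Batman", "Superman", "Flash"]
```

The practical difference with real Redis is that these operations run inside the server, so an intersection of two large sets never has to be shipped to your app first.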
With Redis, the data is held in memory, and at configurable intervals (you can say, for example, after 100,000 changes or 60 minutes have elapsed) it writes to disk. So if it crashes, you can bring it back up and you've got the last snapshot from disk. You might lose some work, but you won't lose the whole dataset. Master-slave replication is super easy. The problem with Redis is that it's memory-bound. I discovered this as a judge for the Rails Rumble this last year. Hurl, Chris Wanstrath and Leah Culver's awesome project, used Redis for its persistence layer. If you're not familiar with the Rails Rumble, you're given a Linode slice to run your app on, with a fixed amount of RAM, something like 256 MB. And Redis lives in memory. So as more and more people used the application, because it was really cool and really useful, more and more of that memory got used, and eventually it was all gone. At that point, they were in trouble. If you have an expanding dataset and you don't have an expanding amount of RAM, then maybe you shouldn't be using Redis, because it lives in memory. If you take away its living space, you've lost. And so, hey, comic books! All my examples are comic-book related. This is Green Lantern, after all the other Green Lanterns died. So here's Green Lantern in Redis. This is how you set a key; it's pretty easy. For most of these examples I'm going to stay pretty close to the metal, writing Ruby that's as close as I can get to the actual way you work with each system. So this is just redis-rb. You set a key, say a name; you get it back just by asking for it; and then you can delete that key. This is how you use lists. We're pushing events onto this one: "I got to lose my ring's power." Great.
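The snapshot trade-off described above (fast in-memory writes, but anything since the last snapshot is lost in a crash) can be sketched with a toy store that writes its whole Hash to disk with Marshal once a change-count or time threshold is hit. The class name, thresholds, and file name are hypothetical, and the policy is simplified; real Redis expresses its snapshot policy as `save` rules in its config file and performs the save in a background process.

```ruby
require "tmpdir"

# Toy in-memory store that snapshots its whole Hash to disk with Marshal
# once either threshold is hit (checked on each write). Hypothetical class;
# not how Redis is implemented.
class SnapshottingStore
  def initialize(path, max_changes:, max_seconds:)
    @path = path
    @max_changes = max_changes
    @max_seconds = max_seconds
    # On startup, recover from the last snapshot if there is one.
    @data = File.exist?(path) ? Marshal.load(File.binread(path)) : {}
    @changes = 0
    @last_save = Time.now
  end

  def set(key, value)
    @data[key] = value
    @changes += 1
    snapshot! if @changes >= @max_changes || Time.now - @last_save >= @max_seconds
  end

  def get(key)
    @data[key]
  end

  def snapshot!
    File.binwrite(@path, Marshal.dump(@data))
    @changes = 0
    @last_save = Time.now
  end
end

path = File.join(Dir.tmpdir, "snapshot_example.bin")
File.delete(path) if File.exist?(path)

store = SnapshottingStore.new(path, max_changes: 2, max_seconds: 3600)
store.set("a", 1)  # 1 change: nothing on disk yet
store.set("b", 2)  # 2 changes: snapshot written
store.set("c", 3)  # only in memory

# Simulate a crash and restart by opening a new store on the same file.
recovered = SnapshottingStore.new(path, max_changes: 2, max_seconds: 3600)
p recovered.get("b")  # => 2
p recovered.get("c")  # => nil (lost with the "crash")
```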
"I got to see more of my dead loved ones," because I have a lot of dead loved ones. And then, "I got to blow up all the zombie Lanterns," you know. I'm not going to go into Blackest Night and all of that. If you want to get stuff out of the list, you can do range operations, and, like I said, with sets you can do intersections and things like that. Another key-value store I didn't mention before is Tokyo Cabinet. In the base case, Tokyo Cabinet stores strings, but it also lets you store tabular data in something kind of like a relational database table. The most interesting thing about Tokyo Cabinet, though, is the ecosystem around it. There's a whole suite of Tokyo tools. There's Tokyo Cabinet, which is the data store. There's Tokyo Tyrant, which is the network-aware version of the data store, the server you talk to. There's Tokyo Dystopia, which should be a manga but probably isn't; it's for full-text search, and I hear a lot of people have tried it out. And there's Tokyo Promenade, a content management system built on top of Tokyo Cabinet. So there's a whole little suite of tools you can install and use together, which is kind of interesting. This is how you use Tokyo Cabinet. That's Justice League International, which is the second lamest Justice League of all time. Does anyone know the lamest Justice League of all time? Super Friends? No, the lamest Justice League of all time is Justice League Detroit. OK, so we're just going to look at International. They're based in New York. So we create a Tokyo Cabinet, and we want to add the members. We've got the key, JLI, and then we add all the members. But remember, Tokyo Cabinet only stores strings, so we're YAML-izing the list here, serializing it to get it into Tokyo Cabinet.
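Here is a minimal sketch of that YAML round trip in plain Ruby. `StringOnlyStore` is a hypothetical stand-in that, like Tokyo Cabinet's base case as described in the talk, accepts only string values; it is not the API of any Tokyo Cabinet gem.

```ruby
require "yaml"

# Hypothetical stand-in for a store whose values must be strings.
class StringOnlyStore
  def initialize
    @data = {}
  end

  def []=(key, value)
    raise TypeError, "values must be strings" unless value.is_a?(String)
    @data[key] = value
  end

  def [](key)
    @data[key]
  end
end

db = StringOnlyStore.new
members = ["Blue Beetle", "Booster Gold", "Guy Gardner"]

db["JLI"] = YAML.dump(members)   # serialize the Array into a String on the way in
restored = YAML.load(db["JLI"])  # and deserialize it on the way out

p restored == members  # => true
```

Assigning `members` directly, without the `YAML.dump`, would raise `TypeError` here, which is the constraint the serialization step works around.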
And then when we pull it out, we load the YAML to get the list back. OK, so that's the second lamest Justice League of all time. This is the universally acclaimed best Justice League of all time, the big seven: Martian Manhunter, Superman, Batman, Wonder Woman, Aquaman, Green Lantern, and the Flash. This is the example for Tokyo Cabinet's table database, the tabular data. We create a new table, and then we add all these people with their roles. Superman's role is basically "everything," because he can do anything. Batman's the mastermind, because he can beat anybody given enough preparation time. Green Lantern is a space cop, et cetera. I mentioned it was kind of like a relational database, and you can query it. So here we've got a query on the big seven table, and we add a condition on role, because all of a sudden we need to talk to some fish. So Aquaman has to earn his Justice League stripes. I kid, I love Aquaman. He's so misunderstood. OK, moving on: Cassandra. Remember, Cassandra is the love child of Dynamo and BigTable, and it has interesting aspects of both. It's column-oriented, like BigTable. It has columns, super columns, and column families. You can think of a column family kind of like a table. You can think of a column kind of like a column in a table, except that it's really a name-value pair. And super columns are collections of columns. So say we had a blog we wanted to model. Blog posts would be the column family. The title would be a column, the text of the post would be a column, and then you could have a super column that is a collection of comment columns. Does that make sense? We'll see an example in a minute. Like I said, this is the hardest one for me to get my head around, so I might not be explaining it to everyone's satisfaction.
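Cassandra's vocabulary is easier to hold onto with a concrete shape. The sketch below uses nested Ruby Hashes as a mental model only; it isn't the cassandra gem's API or an on-disk format, and the data is made up. One hedge: in Cassandra releases of this era a column family held either plain columns or super columns, not both, so the comments go into their own super column family keyed by the same post ID.

```ruby
# Standard column family "BlogPosts": row key => { column name => value }
blog_posts = {
  "post-1" => { "title" => "NoSQL?", "body" => "Hello, Cassandra" }
}

# Super column family "Comments": row key => { super column => { column => value } }
comments = {
  "post-1" => {
    "comment-1" => { "author" => "alice", "text" => "Nice talk" },
    "comment-2" => { "author" => "bob",   "text" => "More comics, please" }
  }
}

# Rebuild a post page from the two column families via the shared row key.
post   = blog_posts.fetch("post-1")
thread = comments.fetch("post-1", {}).values.map { |c| "#{c['author']}: #{c['text']}" }

puts post["title"]
puts thread
```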
One of the most important things about a conference like this is that there are probably people here who know way more about any given subject than you do. So if you go around asking, "What the heck are super columns?", someone will answer you. Try it. Like Dynamo, Cassandra is distributed. When you add new Cassandra nodes, the data is automatically replicated across them. It's eventually consistent, which means that when you write in one place, there's no guaranteed time frame for when that data will be available elsewhere, but it will eventually propagate to all the nodes. And it's easy to expand: you just add new nodes, and they're automatically rolled into the system. Eventual consistency is the part that gets a lot of people, because they think, "No, no, I need that data available from every place, every time." So Cassandra has two ways of reading data. You can do a weak read, which essentially checks the first node it finds and brings back that answer. Or you can do a quorum read, which checks a certain number of nodes and, when they agree, brings back the answer. So if you just need a quick answer, use a weak read; if you need the right answer, use a quorum read. OK, moving off of traditional comics, we now have, hey, anime! One Piece. Pirates, yeah! This is Evan Weaver's cassandra gem, which is pretty sweet if you're using Cassandra; otherwise you probably don't care. So we instantiate Cassandra, and we're going to create people. This is the column family, these are the IDs, and name is a column. A column is just a key-value pair, and you can have lots of them in here. Down here we've got fights, because One Piece has a lot of fights. This is another column family, and these IDs correspond to Luffy; that's Monkey D. Luffy.
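The difference between a weak read and a quorum read can be sketched with three in-memory replicas, one of which hasn't received the latest write yet. This is a simplification under stated assumptions: the replica names and values are hypothetical, and real Cassandra reconciles replica responses by timestamp (and repairs stale replicas) rather than by counting matching values.

```ruby
# Three hypothetical replicas of one key; n1 hasn't seen the latest write yet.
Replica = Struct.new(:name, :value)

REPLICAS = [
  Replica.new("n1", "old"),
  Replica.new("n2", "new"),
  Replica.new("n3", "new")
].freeze

# Weak read: trust whichever replica answers first.
def weak_read(replicas)
  replicas.first.value
end

# Quorum read (simplified): keep asking replicas until a majority agree.
def quorum_read(replicas)
  quorum = replicas.size / 2 + 1
  votes = Hash.new(0)
  replicas.each do |replica|
    votes[replica.value] += 1
    return replica.value if votes[replica.value] >= quorum
  end
  raise "no quorum reached"
end

puts weak_read(REPLICAS)    # => old
puts quorum_read(REPLICAS)  # => new
```

The weak read is cheaper (one node) but can return stale data; the quorum read costs more round trips and gives the answer most replicas agree on.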
The opponent is not a column; it's a super column, because it contains a collection of columns. Each of those is keyed by a UUID, and its value includes the opponent's ID from the people column family. Down here, when we get out the fights, we want all of Luffy's fights, as you can see from my cleverly named variable, luffys_fights. Then we can cycle over them and pull out the names of the people who were the opponents. So that's a little bit of relationality in Cassandra. Moving on to document-oriented databases. A lot of you have probably seen CouchDB and Mongo, which I'm going to talk about here, but I'll go pretty quickly. I think CouchDB is exciting because it's really made for the web. It stores documents as JSON, which, A, you can read easily without sifting through angle brackets, and B, we're already using a lot of the time for our Ajax communication. It's built so that you access it over HTTP via a REST interface: you GET documents, you PUT updates to documents, and you DELETE documents. That means it's really easy to slap it on a server and hit it from somewhere else, in ways we're already doing a lot of the time. Views are the thing that trips a lot of people up when they're thinking about adopting CouchDB, because views have to be defined up front. You can't query dynamically; you have to create your views ahead of time. A view creates an index in CouchDB, and that index is then updated incrementally as you modify the data. You create a view with JavaScript, and you insert it just like you insert anything else: you're writing custom map/reduce functions that take a document, look it over, and say yes, this is part of the output, or no, it isn't. The other thing that trips some people up about CouchDB is that it doesn't do partial updates.
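A CouchDB view is a map function run over every document, emitting key/value rows that CouchDB keeps sorted by key. Real views are JavaScript stored in a design document; the Ruby below only imitates the idea, and `build_view`, the documents, and their fields are hypothetical.

```ruby
# Hypothetical documents; a real CouchDB database would also store _rev.
DOCS = [
  { "_id" => "naruto",    "rank" => "genin" },
  { "_id" => "shikamaru", "rank" => "chunin" },
  { "_id" => "kakashi",   "rank" => "jonin" },
  { "_id" => "note-1",    "text" => "not a ninja" }
].freeze

# Run a map function over every document, collect what it emits, and sort
# the rows by key, roughly as CouchDB orders a view's index.
def build_view(docs, map_fn)
  rows = []
  emit = ->(key, value) { rows << { "key" => key, "value" => value } }
  docs.each { |doc| map_fn.call(doc, emit) }
  rows.sort_by { |row| row["key"].to_s }
end

# Only documents that have a rank make it into the view.
by_rank = ->(doc, emit) { emit.call(doc["rank"], doc["_id"]) if doc["rank"] }

view = build_view(DOCS, by_rank)
view.each { |row| puts "#{row['key']}: #{row['value']}" }
```

Because the map function decides membership per document, CouchDB can keep such an index up to date by re-running it only on documents that changed, which is why views are defined ahead of time rather than as ad hoc queries.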
So in true RESTful fashion, if you want to update a resource (a document) in Couch, you have to send the entire new representation, and it replaces the existing one. So if you're modeling a blog post as your document, with embedded comment documents within it, and you want to add a new comment, you have to send the entire post document, with all the existing comments, just to add that new one. If you have a heavily commented blog where people leave lots of comments per post, that's going to become a burden over time.
This example is not as close to the metal as some of the others. It uses CouchRest, which is one level up and pretty much the CouchDB library of choice, but it's still pretty basic. So we create a database, Kanoa, and a couple of documents, and then this part, the third save_doc, is actually creating a view. Views go under _design and get a name; this one's Trinit, because we want to pull out all the documents that are Trinit: kind of bad-ass, but not totally bad-ass. The emit is how you say, "if this document matches this, it's in the result set." The null is what's used for grouping, so you can get back an associative array of results at this point if you want to. So if we had ages in here, or the shopper type we have right there, for everybody, we could group by shopper type instead. And then this is how you use it: you just call view with the view's name, and you get back a couple of rows. It brings back JSON, and you can slice through that just like any other JSON.
MongoDB: again, there was a great talk on this yesterday. It'll be available on Confreaks, so you should go watch it. In light of that, I'm just going to talk a little bit about some of the differences between it and CouchDB, because they're often seen as major competitors.
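To see why the lack of partial updates matters for a comment-heavy blog, here is a hypothetical in-memory store contrasting the two update styles: `put` resends the whole document, as a Couch update does, while `push` appends one item in place, in the spirit of MongoDB's `$push` operator. `DocStore` and its `bytes_sent` counter (a rough proxy based on `repr` length) are assumptions for illustration, not either database's API; real CouchDB also requires the document's current `_rev` on each update, which is omitted here.

```python
import copy


class DocStore:
    """Hypothetical in-memory document store keyed by _id, used only to
    contrast two update styles; not the CouchDB or MongoDB API."""

    def __init__(self):
        self.docs = {}
        self.bytes_sent = 0  # rough proxy for the payload each update ships

    def put(self, doc):
        """Whole-document replacement: the caller sends the full new version."""
        self.bytes_sent += len(repr(doc))
        self.docs[doc["_id"]] = copy.deepcopy(doc)

    def push(self, doc_id, field, item):
        """Partial update: the caller sends only the id, field name, and new item."""
        self.bytes_sent += len(repr((doc_id, field, item)))
        self.docs[doc_id].setdefault(field, []).append(copy.deepcopy(item))


store = DocStore()
store.put({"_id": "post1", "title": "Hello",
           "comments": [{"body": f"comment {i}"} for i in range(100)]})

# Whole-document style: fetch, append locally, send everything back.
before = store.bytes_sent
updated = copy.deepcopy(store.docs["post1"])
updated["comments"].append({"body": "one more"})
store.put(updated)
full_cost = store.bytes_sent - before

# Partial style: send just the new comment.
before = store.bytes_sent
store.push("post1", "comments", {"body": "and another"})
partial_cost = store.bytes_sent - before

print(len(store.docs["post1"]["comments"]))  # 102
print(full_cost > partial_cost)              # True
```

The whole-document cost grows with every comment already on the post, while the partial update stays roughly constant.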
Storage is actually in binary JSON (BSON), which is not JSON; it's another thing. Part of the reason for that, as I understand it, is that Mongo doesn't actually work over HTTP, right? It has native drivers for access: they use binary sockets to talk to the database, and there are drivers written for pretty much any language you'd want to use, probably even Whitespace, I don't even know. So access is actually a lot faster than CouchDB, because you're not going over HTTP; you're going over a binary socket. Another benefit over Couch is that queries are dynamic: you can create indexes, and you can query on them. There's even a shell for Mongo that you can just go into and type your queries; it's just like the MySQL shell, minus the SQL. The other benefit is that it does allow partial updates. There was an example yesterday of pushing a comment onto a blog post's list of comments in a document, and you can do just that with the ID of the document, the new comment, and the appropriate operator ($push). So that's a better solution than CouchDB for some domains, okay?
So I felt like there hasn't been enough Marvel love, so now we're going to talk about the Avengers. This is using just the standard driver. It's not using MongoMapper; MongoMapper is an ORM layer on top of the Mongo library, and this is the plain Mongo library. So we create a DB for the Avengers and a collection on it for members. Then we add a bunch of members, right? These are the initial members from the first issue that you see over there. Now, I have to say: right over here you see Thor, Ant-Man, the Hulk, and Iron Man, but there's a fifth Avenger. Why does the Wasp get left out of the big banner up top? The Wasp is just as useful as Ant-Man, okay? And see, look, there she is. He's riding ants; she can fly. That's cool. Anyway, I'll speak up on behalf of the Wasp. Okay, so here we create an index on name.
We can pull one of them out just with a find, right? We pull out Ant-Man. In between issue one and issue two, Ant-Man turned into Giant-Man, because I guess with two shrinking people, maybe one should be a growing person. So we update his name and save it. I think in one of the early issues, the Hulk left because he was very violent; there were some psychological tensions with the team. So he went away, and then later they found Captain America. So here we remove the Hulk, and here we insert Captain America as a new member.
Finally, Neo4j, the last of the examples. This is the graph database, which is the one I've worked with least; I'd never tried to figure out graph databases before, right? So this is going to be pretty stripped down. Structure: it's based on nodes and edges, and both are sets of key-value pairs. They're like documents in a document database that just have potentially more powerful connections between them. Queries: it actually uses Lucene for querying, which means you can query in a lot of ways we're already familiar with, because a lot of people are using Lucene or Solr, things like that.
Now, the Neo4j example is going to get a little more complicated, because there's a lot more going on. It's actually split over three slides. In the first one, I'm creating this Person class with the Neo4j node mixin, so each person is going to be stored in the database as a person node. It's got a couple of properties that are indexed. Name is obvious. Mutant is just going to be a boolean, true or false. And then we've got some relationships here. We've got interests, for that unrequited-love stuff; dates, for, you know, one-time hookups or whatever; and then marriages, for stable, serious relationships. And what we're going to be modeling is over here, right? It's Magneto's love quadrangle. So Esme was a young telepath who had a crush on him.
He had a crush on the lost at one point; he made out with someone else at one point, I guess; and then he married Magda and had the scrumptious Quicksilver, who we're all familiar with, I'm sure. Okay, so now that we have this class (basically I just wrote that class so that I could condense the code a little and not repeat myself), we're just going to create a bunch of people, right? Magneto is a Person; these are all creating nodes. Esme is a Person too. Those three are mutants, and then Magda and the lost actually aren't mutants. The lost got her powers from her husband, who beat her up, which is terrible.
Okay, so, now on to the relationships. Magneto had a crush on the lost. He married Magda, and he had a date with the other one. Esme had a crush on Magneto, and Magda married him back, because Neo4j doesn't do bidirectional relationships by default. You can set it up to do that: when you create a relationship you can say, essentially (this is not real code), "when you create this relationship, create the reciprocal relationship as well." But I didn't do that, so we have to define the reciprocal ones by hand down there.
And then the query: we get it out, right? I want Magneto, and then I want to see who likes Magneto, and it comes back with Esme, because she's the only one who has a crush on him. This part just gives you the relationships, and adding nodes gets you the actual endpoints of those relationships. And Magneto is all about mutants over humans, so let's find out if he really walks the talk and see who he has dated who's not a mutant. That's this query, right? It'll just give us an array of the nodes he's dated that aren't mutants. In fact, I don't think he dated anybody who wasn't a mutant, but he did marry a non-mutant, so... I don't know, Magneto; I guess you'll have to explain that one to me. Okay, so, actually I'm pretty close on time, so I'm going to speed through this.
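The Magneto queries in the Neo4j example come down to following typed, directed relationships and filtering on node properties. Here is a minimal in-memory sketch of that idea, not Neo4j's or the Neo4j Ruby library's API: the `Graph` class, the relationship names, and the `mutant_date` placeholder node (standing in for the person he dated) are hypothetical. As in the talk, relationships are one-way by default, and reciprocal ones are stored explicitly.

```python
from collections import defaultdict


class Graph:
    """A tiny hypothetical property graph: nodes carry properties and
    relationships are typed and directed, one-way unless stored both ways."""

    def __init__(self):
        self.props = {}                     # node name -> property dict
        self.out_edges = defaultdict(list)  # (node, rel_type) -> targets
        self.in_edges = defaultdict(list)   # (node, rel_type) -> sources

    def add_node(self, name, **props):
        self.props[name] = props

    def relate(self, src, rel_type, dst, both_ways=False):
        self.out_edges[(src, rel_type)].append(dst)
        self.in_edges[(dst, rel_type)].append(src)
        if both_ways:
            # Store the reciprocal relationship explicitly, as done by hand in the talk.
            self.relate(dst, rel_type, src)

    def outgoing(self, node, rel_type):
        return list(self.out_edges.get((node, rel_type), []))

    def incoming(self, node, rel_type):
        return list(self.in_edges.get((node, rel_type), []))


g = Graph()
g.add_node("Magneto", mutant=True)
g.add_node("Esme", mutant=True)
g.add_node("Magda", mutant=False)
g.add_node("mutant_date", mutant=True)  # hypothetical placeholder node

g.relate("Esme", "interest", "Magneto")  # one-way crush
g.relate("Magneto", "marriage", "Magda", both_ways=True)
g.relate("Magneto", "date", "mutant_date", both_ways=True)

# Who has a crush on Magneto? Follow incoming "interest" relationships.
print(g.incoming("Magneto", "interest"))  # ['Esme']


def non_mutants(rel_type):
    """Endpoints of Magneto's outgoing rel_type relationships who aren't mutants."""
    return [n for n in g.outgoing("Magneto", rel_type) if not g.props[n]["mutant"]]


print(non_mutants("date"))      # []
print(non_mutants("marriage"))  # ['Magda']
```

Keeping dates and marriages as separate relationship types is what lets the "non-mutant dates" query come back empty while the marriage query still finds Magda.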
You can simulate things. It's just like with alternative languages: you can do in Python the same things you can do in Ruby, it's just harder. You can model relational data non-relationally, but then you're simulating structure, right? So maybe you've got Superman's pal, and you've got a document that's a person document, and it has a company ID that maps to the _id field that Mongo builds in, so you're essentially rolling your own foreign-key support. Maybe you should go with something that actually has foreign keys if you want to do this.
You can also model unstructured data relationally. FriendFeed famously did this, storing all their data serialized in MySQL, and I'd say this is also kind of a bad idea. Here's one example, right, where I just took a hash, serialized it to YAML, and stored it in a text field. At this point you're losing a lot of the benefits of the relational database, because you can't enforce referential integrity into that serialized data structure; it's a big pain. This is the entity-attribute-value pattern, where maybe the people table was really full of sparse data, so we extracted that out into attributes. It's essentially a little virtual schema up here, but we don't get the benefits of a relational database at this point, because we can't link out from the attribute values.
And finally, we come back to relational databases, and I think this is probably my favorite expansion of the NoSQL term, "Not Only SQL," because it hopefully alleviates the fears of relational database proponents that we're trying to get rid of them, but it also says there's still something else you need to look at, because SQL doesn't always work. And I think memcache is a great example of this sort of hybrid approach, because we're already doing it. Who here is using memcache?
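The entity-attribute-value pattern described earlier in this passage, and its main weakness, can be sketched with Python's built-in `sqlite3` module. The table and column names and the sample rows are hypothetical. The `value` column has to be untyped text, so an attribute like `company_id` cannot carry a foreign-key constraint, and nothing stops it from pointing at a company that doesn't exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One row per (person, attribute) pair instead of one column per attribute.
CREATE TABLE person_attributes (
    person_id INTEGER NOT NULL REFERENCES people(id),
    attribute TEXT NOT NULL,
    value     TEXT,  -- untyped text; cannot reference another table
    PRIMARY KEY (person_id, attribute)
);
""")
conn.executemany("INSERT INTO people (id, name) VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob")])
conn.executemany(
    "INSERT INTO person_attributes (person_id, attribute, value) VALUES (?, ?, ?)",
    [(1, "favorite_color", "green"),
     (2, "shoe_size", "44"),
     (2, "company_id", "999")],  # no companies table exists, and nothing complains
)


def person_as_dict(conn, person_id):
    """Pivot one person's attribute rows back into a dict: the 'virtual schema'."""
    name = conn.execute("SELECT name FROM people WHERE id = ?",
                        (person_id,)).fetchone()[0]
    attrs = conn.execute(
        "SELECT attribute, value FROM person_attributes "
        "WHERE person_id = ? ORDER BY attribute",
        (person_id,),
    ).fetchall()
    return {"name": name, **dict(attrs)}


print(person_as_dict(conn, 2))  # {'name': 'Bob', 'company_id': '999', 'shoe_size': '44'}
```

Sparse attributes no longer need their own columns, but every value comes back as a string and every cross-reference is unenforced, which is exactly the lost relational benefit the talk points to.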
Right, so you've got a hybrid system. Maybe you just have memcache, which is great, but most of you probably have data in a relational database behind it, so you've got a hybrid system, and you could easily swap in or add something memcache-inspired and run it in memory. Logging is another example: this GitHub plugin replaces the default Rails logging facility with one backed by a NoSQL store, and it logs in there, which lets you do cool things with querying the log.
And then most interesting for these sorts of hybrid approaches are the domains. Sometimes you're actually melding together two different domains. Take lulu.com, a self-publishing site (I'm not saying they actually do what I'm about to describe, but they could), or Geoffrey Grosenbach's PeepCode, where they're selling PDFs and you've got user accounts. E-commerce is a traditionally relational domain; it fits a relational schema very well. But if you're selling documents, maybe it makes sense to put them in a document database. Call me crazy, but naming stuff is powerful. Similarly, dating sites: you take money from people, so the e-commerce functionality lives in a relational database, and then you have a social graph in a graph database that says, oh, Johnny dated Sally, and Sally dated Billy, and Billy hates Johnny because Johnny didn't tell him. You could manage all those relationships much more easily in a graph database. Sorry, I thought I saw a Blackstone doing that; I should have asked you to do the comments. Anyway, okay, so: different scales of data.
If you think about a photo-sharing site where you've got maybe a hundred thousand users and they're uploading hundreds of millions of photos, there's a big difference in the scale requirements between those two sets of data, so maybe it makes sense to partition them, so that user accounts are in a relational database and the photo data is in a more scalable store. This is about the only place I bow to the scalability argument: sometimes it might apply to your current situation.
Okay, last section: next steps. What do I want you to do? I want you to go and explore, and there are a number of resources out there. There's internet webnet.com, a wiki that has this gigantic list of databases, grouped by family; it's very useful. The NoSQL Google group is fairly active; it's not ridiculously active, but it's a great place to keep your finger on the pulse. The NoSQL Ecosystem paper from Rackspace is a great overview of a lot of these same issues. And I actually set up a Google Wave for this talk, so if you want to be in it and continue the discussion afterward, find me or email me, and we'll get one real wave going and see what happens. I've yet to find a good use for Wave, so I'm hoping this is it.
Okay, next step: ignore the database. When you start an application, start with logical modeling, and don't be constrained by the physical database you're putting it in, right? Think about the data first, and then see what comes of it. Maybe you'll find that you don't actually need a relational database; maybe you need a graph database. Just be mindful of the domain you're working in. And then finally (and this is something I've posted about on my blog), changing the default is a ridiculously powerful technique. There was a study in Sweden, I guess, about organ donation.
Originally it was opt-in, where you'd have to proactively say, "Yes, I want to be an organ donor," and then they changed it to opt-out, and participation rates went up by 80% or something ridiculous. It was amazing, because people stick with what they're given. So you can make this work to your advantage by changing your own default. The next time you fire up a little throwaway application, try changing your default and using Mongo, or Redis, or Couch, or something, and just see how it works, okay? In Rails you can use an application template; I have one on GitHub as a gist that takes out all the ActiveRecord stuff and puts in Mongo, and it's how I got interested in this in the first place.
And that's it; here's all my information. That's speakerrate.com: I would appreciate it if you could go there, rate me, and give me comments, so that I can become a better speaker and we can all have better conferences. And then viget.com/extend is our company blog, where Tony and other people and I blog, and scripting.com is mine. And I don't think I have any time for questions; no, it's 10:15. So if you have questions, attack me in the halls. Thank you.