Hello and welcome. My name is Shannon Kemp and I'm the executive editor of DATAVERSITY. We would like to thank you for joining today's DATAVERSITY webinar, Consistency in Distributed Systems, Part 2, sponsored by Cloudant, an IBM company. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we'll be collecting them via the Q&A section in the bottom right-hand corner of your screen. Or, if you like to tweet, we encourage you to share highlights or questions via Twitter using hashtag DATAVERSITY. As always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and any additional information requested throughout the webinar. Joining us today is Mike Miller. Mike is the founder and chief scientist of Cloudant, a company that was recently acquired by IBM. As chief scientist, Mike is responsible for developing and evangelizing Cloudant's technical vision, managing long-term product R&D, and directing special projects. While at MIT as a postdoctoral fellow, he co-founded Cloudant after cutting his teeth on petabyte-per-second problems at the Large Hadron Collider. Mike holds a B.S. in physics and a B.A. in philosophy from Michigan State, a Ph.D. in physics from Yale, and is an affiliate professor of particle physics at the University of Washington. Quite an impressive resume there, Mike. And with that, I will give the floor to Mike to get the presentation started. Hello and welcome. Thanks, Shannon. I think I need to work on a much shorter bio to give you for next time; we're cutting into your time. So thank you again. It's a pleasure to be here. And I will just say that I'll be looking in the lower corner for notes from participants if for some reason the audio changes. I've worked very hard to try to make sure that we have the best setup possible.
But please let me know if for some reason you can't hear or if it breaks up. So this is a second part. The scope of the last webinar that I gave kept growing as I was putting it together, and I realized that really, to do it justice, I needed to come back and give at least another hour. And in putting this together, I realized that I think I can kind of close the loop in the hour that we have here, including Q&A, on the big-picture items that I wanted to get across, and then beyond that we can chat with Shannon about getting subject matter experts in for various databases to go into more detail if you're interested in data modeling and more detail there. But let me flip forward to the second slide here. And you're not meant to read all this, but in putting this together, I guess the one thing that I want everybody on this call to take away is that there really is no perfect database. If one was to be built, I think a lot of the pioneering research is happening actually at the AMPLab, which is algorithms, machines, and people, at UC Berkeley. A lot of great things have come out of there lately, including Apache Mesos, Apache Spark (if you've heard about the Databricks startup), and Trifacta. Just a tremendous number of ideas coming out of there. In particular, I've drawn pretty heavily on the work of a few researchers there on consistency. And the papers that they write are very easily digestible by anybody in our industry, whether you're a DBA or an architect. And I've tried to pull out the links as we go along for the ones that I think are most relevant. So what I want to accomplish today is to give you a very brief five-minute review of the high-level important points from the first webinar. And then we can pick up kind of where I started to leave off with a little more detail about some actual real-world data, particularly from networking, on why consistency is a hard problem.
And then I'll try to switch back to something not quite so in the weeds and focus you, as either developers or architects or database administrators, on things that I think you should really focus on in making your decisions: deciding what solutions you're going to piece together in an architecture, or where you're really going to invest your effort in learning and testing. And then I'd like to bring your attention to some notable results from an industry survey, actually from the academic literature, about what real-world solutions really deliver in terms of ACID compliance and consistency compared to what they promise. And then we'll go into NoSQL in some more detail with failure modes, strategies, and gotchas. And I'm going to focus in particular on the NoSQL side with Riak, Cassandra, Cloudant, and MongoDB. And yes, the slides will be available afterwards, I think, nearly immediately. So let's flip back into just a quick review. And I want to start with the motivation, just to remind everybody why this is a worthy thing to think about as you're building applications. My overall takeaway is that over the last five-plus years, two market forces in particular have really stressed our models for consistency and transactional reasoning: big data and mobile applications. Mobile devices mean that you have state spread across devices, different machines, and different data centers in the cloud. That makes a tremendous opportunity for folks like myself who are entrepreneurs building new systems and trying to redefine the application stack all the way down to the database. But it also makes challenges for folks that want to consume those new services, because the whole ecosystem that exists around the established relational database world has to be kind of recreated around the NewSQL and NoSQL market as well.
But, you know, consistency in a distributed system is really your problem when your data doesn't fit on a single box; when you want to replicate that state, say, for availability or for decreased read latency between different servers; or when you have to spread that data between different data centers, say for availability or disaster recovery. And actually, the minute you spread the state of your application across more than one device, and this is particularly a thing with mobile now, this becomes much more challenging. But even if you're on a single machine, consistency is still a hard problem. And so you have concurrency, say if you have multiple people interacting with the database at one time, even if your application and database live on the same server, or the minute your application actually spreads data across more than one process. So the upshot of the motivation from last time, in my humble opinion, is that this is now everyone's problem. The good news is that the market has responded. There's been a ton of innovation in database offerings, the technologies themselves, NewSQL and NoSQL, as well as how they're packaged and can be consumed as a developer or architect or somebody leading a product team, whether it's open source on premise, or whether it's offered as a service in the cloud. And I'm going to just kind of try to tell you what my biases are. Cloudant is a distributed database as a service. We speak now not just one, but two APIs. We speak the Apache CouchDB API. So I'm very familiar with document databases, and that's kind of a bias that I have. And we also actually just rolled out a Mongo query language to make contact with MongoDB developers as well. But the particularly differentiating thing is we are part of a trend that has built new database technology from the ground up, using established algorithms from the literature to solve new-world problems, particularly focusing on distributed data and mobile strategies.
And just to tell you kind of what that means and why that's very different than established databases, one of the things that we offer and really focus on at Cloudant is we allow you to write locally wherever you are, whether it's an application server that lives in the cloud and is handling client requests from a browser and talking to a large database cluster there, or actually saving state in, say, the local cache of a browser or on a phone that happens to be offline in airplane mode. The ability to store data at vastly different scales and synchronize it later is a very real-world problem. And those are the types of things that force you, at the bottom, to architect the database from the ground up. And this webinar is built to help you understand, okay, if you gain advantages like that, what are the things that you need to understand or worry about from the traditional sense? Okay, so that's a quick review of the motivation. And let's roll forward with one more part of review, just a little bit of NoSQL taxonomy. This is a little hard to view if you're on a small screen; hopefully you can blow this up a little bit. But here's an infographic from our Cloudant team that shows our subjective view of the NoSQL family tree. At the center are four seminal white papers: Google File System, Google MapReduce, Google Bigtable, and Amazon Dynamo. Actually, sorry, there's one Amazon paper that snuck in among the Google ones. But those four papers, and a few other things that sprang up around them, really spawned, I believe, the NoSQL ecosystem as we know it. And it splits kind of left to right, with things that are more analytics-focused on the left, maybe, and things that are more operational data or application state on the right. And top left, you have the entire Hadoop ecosystem that really came out of Google File System and MapReduce.
Bottom left, you have Apache Cassandra, which did a very nice job of marrying the ideas behind Google Bigtable, which is how you make a big column-oriented database, with ideas from Dynamo on how to make it very resilient. And then top right, I'm going to talk quite a bit about Apache CouchDB and what we've done with Cloudant, which marries kind of the MapReduce algorithms along with Dynamo-style clustering for high availability and scale-out. And then there's one that doesn't quite sit on here as descending from any of these; it's probably more related to the MySQL ecosystem in general, but MongoDB is obviously a giant in the field and one of the default things that people reach for when they talk about NoSQL. So I'll try to weave that in as best I can. But boiling this down into a single sentence, when it comes to things like transactional reasoning and semantics, I'll just read off a great quote from a researcher in the paper cited here, which is that, fundamentally, because of things like the CAP theorem, and I'm going to go into this in a bit more detail, NoSQL databases, the ones that I just talked about, often offer developers a choice between algorithms that are fast and available but can deliver inconsistent results, and those that deliver consistent results but have slow or unavailable modes. And, you know, up to this point, this is really the realm of choice that most of us are in if we're choosing a system. My opinion is that there's a lot of great work happening right now that's going to be folded into existing NoSQL and NewSQL databases, as well as brand-new products that will probably be invented or spin out of different research groups.
But I think that we are on a path to identifying the consistency needs and availability needs that really define, you know, the new application stack as we know it, which is really built to store state for web and mobile applications at huge scale, whether they're enterprise or consumer-facing. So I think there's hope here, but, you know, the upshot is that any system you choose makes opinionated choices. And so, as a user, you do need to understand your data store. And that would be the essential wrap of what I wanted to share from part one. So hopefully that didn't go by too fast, but, you know, the upshot is that consistency is hard. And I want to take some time now just to make sure that everybody understands, even at the simplest level, why that is. Because there are quite a few claims floating around now in the database marketplace, and I think it's important to be able to make informed decisions based on what those are. So let's take a look at why consistency is really hard. There's a great quote. Peter Deutsch has, I think, eight fallacies of distributed computing, but you can really stop at the first two: the first fallacy of distributed computing is that the network is reliable, and I think anybody who builds systems on the Internet knows better; the other is that latency is zero. We're going to take a look at real-world data around those in a second. But really, I want to make sure that we just step back real quick and look at the types of distributed systems. You know, I'm going to try to focus it down on two specific types, and then what the failure modes are, even in the simplest of cases. So you can very roughly bucket, I believe, the NewSQL and NoSQL systems into two styles. On the left, you have something that's probably very familiar to anybody that's run, say, MySQL at scale, where you'll have a primary write master, and then you'll have one or more secondary slaves, if you will.
They may be promotable to master in failure modes, but the idea is that you have a point where information comes in, and then you have other places where you can read that information back for, say, disaster recovery, high availability, or just to offload a portion of your workload to other machines at scale. And that, you know, I think is very much the MySQL and MongoDB model. It's the CouchDB 1.x model, although they're moving to the other side for the 2.x model. And on the other side, you have something that I'm very familiar with from my life at Cloudant, which is Dynamo-inspired systems, where you have a shared-nothing architecture where there's no special node in the system. You store data on at least three nodes, generally. So right off the bat, you have a system that has to be three nodes or more and can scale linearly from there. And then you use kind of quorum or voting procedures to make sure that you reach consistency in that system. I'm not going to go into those in detail, just to note that that's the style that Cassandra, Cloudant, and Riak use in what I'll be discussing today. I'm glad to answer more questions afterwards or point you to resources where you can learn quite a bit more. But let's go ahead and zoom in on the simplest case, where you have one write master and one secondary read slave. So in this little diagram, in a perfect network, I have time flowing from the top to the bottom, and you have a single client on the left in red, and then you have a primary write master at the top and a secondary read slave at the bottom. And so if you kind of look at this from the top to the bottom, you know, the client can write a value of one to X and get an immediate success back from the primary node. The primary node then will probably asynchronously replicate that to the secondary slave.
And as long as the client makes that read request at a, you know, significantly later time, such that the replication can happen, then it will read back that value of one. And I'm going to just gloss over, you know, that latency for now; we'll come back to that in a little more detail. But after running distributed systems in the wild for 10-plus years, I can tell you that the number one thing that's going to happen is what we see on the next slide, which is, more often than you would ever imagine, there's going to be a problem in the communication between that primary and secondary node. And oftentimes you'll have a network partition where, say, the client can talk to the primary and the client can talk to the secondary, but the primary and secondary themselves can't talk. So the way that would play out here, reading top to bottom: the client is going to try to write a value of one to the primary. The primary says, great, I got it. That asynchronous replication fails, and at a later point in time, the client tries to read that value from the secondary. It never landed there, and so it gets an inconsistent result. So you write one thing and you go to read it back and you can't find it, right? There are all kinds of, you know, consistency guarantees and models that databases use to try to minimize this type of pain. But at the core, this is really the number one thing that you see in network partitions, and it affects different systems very differently. If I just flip forward real quick, you can say, okay, why don't I hold off on acknowledging that write until I know it's gone to the secondary? And that's something that, say, MongoDB allows you to do, right? You can choose different durability levels in Mongo. You can say: just write to the local client; or it has to get off your client onto the primary node; or it has to go to one or more secondary servers in your replica set. If you do that, then you have to wait longer, right?
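To make the stale-read scenario concrete, here is a toy sketch of it in Python. This is my own illustration, not any real database's API: the primary acknowledges writes immediately and replicates asynchronously, so when the replication link is partitioned, a later read from the secondary silently returns old data.

```python
class Secondary:
    """Read replica: holds whatever the primary has replicated so far."""
    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Primary:
    """Write master: acks the client immediately, replicates asynchronously."""
    def __init__(self, secondary):
        self.data = {}
        self.secondary = secondary
        self.partitioned = False  # simulates a broken replication link

    def write(self, key, value):
        self.data[key] = value
        # Async replication: the update is silently lost during a partition.
        if not self.partitioned:
            self.secondary.data[key] = value
        return "ok"  # acked before replication is confirmed


secondary = Secondary()
primary = Primary(secondary)

primary.write("x", 1)
print(secondary.read("x"))  # replication succeeded, so the read sees 1

primary.partitioned = True  # the partition strikes
primary.write("x", 2)       # still acked "ok" by the primary
print(primary.data["x"])    # the primary sees 2
print(secondary.read("x"))  # stale read: the secondary still says 1
```

The last two prints are exactly the inconsistency described above: you wrote one thing and read back another, even though every individual node behaved correctly.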
But you can get a stronger guarantee that it is indeed consistent, so that you can go back at a later time and read what you wrote. But just note that in the case of a network partition, even that strategy, as I show here on slide 21, can fail, right? If you lose that link, then you can't even write at all. So you go into a state where you're not available, or, very technically speaking, you falsify the liveness criteria. And so, you know, this is a very canned example. It's very simple, but it really gets at the core. You can start to work out, you know, the games you play in designing a system, or in choosing how you're going to consume a database system, to dial in what is most important to you as a user. And those choices are consistency, availability, and partition tolerance. You know, there's a wonderful proof that in certain types of distributed systems you can only choose two out of three at any given time. And there's lots of great work about that in the literature. But the upshot is each of the systems you're going to look at, even in the NewSQL world, is making choices somewhere in that space, and it's important to understand those. So I'm going to try to kind of call that out as we go along now. So you may ask how frequent partition failures are. When we started building Cloudant, actually, we thought that the number one thing we were going to have to struggle with was hard drive failure. Around 2008, I think, maybe two years before, Google had published a seminal paper about disk drive failures; most people on this call are familiar with that. And, you know, you have a certain failure probability. Once the number of drives you have spinning goes up beyond a certain number, you're kind of guaranteed to have one or more failures a day. And that can be, you know, applied to, say, node failure or RAM or lots of other things.
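As an aside, the quorum or voting procedures mentioned earlier for the Dynamo-style systems reduce to a simple overlap condition: with N replicas, a write acknowledged by W of them, and a read consulting R of them, every read is guaranteed to touch at least one replica holding the latest write whenever R + W > N. A minimal sketch (my own illustration, not any particular database's API):

```python
import itertools

def quorum_overlaps(n, r, w):
    """R + W > N guarantees every read quorum intersects every write quorum."""
    return r + w > n

def always_intersects(n, r, w):
    """Brute-force check: do all R-subsets intersect all W-subsets of N nodes?"""
    nodes = range(n)
    return all(set(rs) & set(ws)
               for rs in itertools.combinations(nodes, r)
               for ws in itertools.combinations(nodes, w))

# N=3, R=2, W=2: a common Dynamo-style default. Reads see the latest
# acknowledged write, at the cost of contacting two nodes per operation.
print(quorum_overlaps(3, 2, 2), always_intersects(3, 2, 2))  # True True

# N=3, R=1, W=1: fast and highly available, but a read may land on the
# one replica that has not yet seen the write -- a stale read.
print(quorum_overlaps(3, 1, 1), always_intersects(3, 1, 1))  # False False
```

The brute-force check just confirms the inequality: any two majorities of three nodes must share a node, while single-node read and write sets need not.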
It turns out that the dominant failure mode across all cloud providers that we've been involved with, and managed hosting providers, is network partitions. And I gave you a link here on the right to a paper that did a really nice survey, just collecting different references of real-world data from big players. You know, there's quite a bit of information from different AWS outages. I think 2011 was one of the most famous ones, where a network misconfiguration led to a 12-hour network outage that affected all kinds of different services. But actually Microsoft and HP both have great publications and talks where they kind of peeled back the covers a little bit on the reality of, you know, the network stability of the enterprise-class systems that they themselves run. In the Microsoft one, I'll just call out some highlights. You know, I think in one year they noted over 13,000 customer-impacting network failures. Quite a few packets are lost in the substantial ones. But the interesting things to me were they had, you know, a mean of 40 link failures per day, okay? And at the 95th percentile, some days had, you know, 100, 120, something like that. And the median time to repair those was, you know, between minutes and, in some cases, up to a week. So they're substantial. But the number one strategy, right, the reason you have a network is you have a graph instead of a single line: redundancy. And redundant networks really only reduce that failure impact by 40%. That's pretty amazing to me, right? I would expect that number to maybe be 90%, 99% with sufficient redundancy. And so the reality coming out of that Microsoft study is that, yes, these are a fact of life and you have to live with them. HP did a nice study as well.
And there's a reference within the reference I give here, where they looked at, okay, what's the impact on the customer from their own managed enterprise networks for HP customers. And really, in the end, 40% of all support tickets, that's almost half, were due to network problems. And those incident durations were in the hundreds of minutes. Okay, so these are not necessarily just fleeting things. They're substantial partitions. And that's, you know, some data that I think really does back up the claim, or kind of my subjective opinion, that the thing you should really worry about and design for is network partitions, and not really single-node failures, because these are quite challenging to deal with in running distributed systems. And that's, I think, some of the things that make the sophistication of the NoSQL services out there really stand out, even though they may not be marketed in that sense. Let's flip forward one more. The other piece of this is, you know, the network doesn't necessarily have to be down to hurt you; it can just be slow, where sending information back and forth takes too long. You know, oftentimes you'll have timeouts in the system, or, you know, at the application level, if something's taking too long, you just want to fail it, right? And certainly if you have a consumer-facing site, you're very familiar with the fact that if some user interaction takes too long, you'll probably lose that user. It's essentially the same as an availability hit. And so slow networks can be just as bad or even worse than broken networks. And my one piece of advice here is that, you know, just looking at mean behaviors really doesn't tell you the story. The tails of the distribution really matter. That's what we see in production. Here's some great data from a researcher at Berkeley, where this is looking at the median time.
So this is actually looking at kind of all possible ways you can make connections between two machines within Amazon's EC2 network. In the top left, you can have, say, a server within one availability zone ping another server within the same availability zone, so that's us-east-1b or something like that. In the top right, you can ping servers across different availability zones but in the same region; these are probably within the same building, maybe within buildings that are immediately adjacent to each other. And in the bottom, you can ping servers in different regions from each other. And he left this running for several months, I think. And I'm going to show you on this slide the median times, and then on the next slide, the 99.99th percentile times. The point to notice here is that, you know, okay, the median times look great. I can ping a machine within the same availability zone in well under a millisecond. A different availability zone in the same region is like a millisecond. That's okay; I can think about, you know, handling distributed operations that way. Between regions, it looks mostly like the speed of light, you know, probably something like 70% efficient, something like that. So it's really defined by how far apart things are. If I flip to the next slide, the 99.99th percentile latencies, and I'll talk about why I chose that number in a second, these really go up, okay? So even in the same availability zone, you know, a decent fraction of the time you're talking about 50-plus-millisecond connections between two systems, and it's slightly above that for different availability zones. But then between regions, you know, you really start to see long times: half a second, a quarter of a second.
And that has a big impact, because, just kind of wrapping up the latency summary here on slide 26, what matters is that if you're doing distributed coordinated operations, and that's what this entire talk and field is about, the number of things you can do goes as one over the time to do one of those things. You know, if it takes 100 milliseconds, that's a tenth of a second, to talk between two machines, I can really only do about 10 things a second, right? Obviously there's concurrency and threading on top of that, and then you can start to get more fine-grained, but that's really the limiting scaling behavior. And so real-world latencies here are pretty substantial, and they have long tails. At scale, it's important to think about: if you're doing, say, 500 million operations a day, these 0.01% events are happening constantly, all day long. And the picture is actually much worse here, because a lot of the time when you see a failure, it's not just a single uncorrelated event. They tend to come in clusters, and that has to do with the communication patterns of the networks. So, you know, these things can be large, and the network doesn't have to break in order for the system to act like it's effectively broken. I want everybody to remember that. Okay, I'm eager to come back to more on that in the Q&A period. So if we have unreliable networks, or they're not always as fast as we want them to be, and that impacts these NoSQL choices, well, thank God for ACID, right, and traditional RDBMSs, and the fact that a lot of the NewSQL solutions, whether it's FoundationDB or NuoDB or something like that, really do promise to bring the best of the SQL world, right, which is ACID compliance, multi-part transactions, transactional semantics, with the same scalability as NoSQL.
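The back-of-the-envelope math above is worth writing out. These numbers are the illustrative ones from the talk, not measurements of any particular system:

```python
# Sequential coordinated operations: throughput is bounded by 1 / latency.
def max_ops_per_second(round_trip_seconds):
    return 1.0 / round_trip_seconds

print(max_ops_per_second(0.100))  # 100 ms cross-region round trip -> ~10 ops/sec
print(max_ops_per_second(0.001))  # 1 ms same-region round trip -> ~1000 ops/sec

# Tail events at scale: a 0.01% (1-in-10,000) latency event
# at 500 million operations a day.
ops_per_day = 500_000_000
tail_events_per_day = ops_per_day / 10_000  # 0.01% of operations
print(tail_events_per_day)  # 50,000 tail events, every single day
```

Concurrency raises the throughput ceiling, as noted, but the 1/latency bound still governs any chain of operations that must coordinate sequentially, and at this volume the "rare" tail is tens of thousands of events a day.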
And that's been a really interesting thing. There's a great research study shown here, which really dug deep under the covers of what the realities of ACID in the wild are. I'll just pop up one of these quotes, which, you know, my takeaway message is: beware of the marketing. This research group found, for example, that one data store with a maximum of read committed isolation, which is a weaker model than strong consistency, claims strong consistency, and this goes on into, you know, different flavors of transactional integrity, et cetera. But the upshot is that if you flip forward, there's a nice study of, I believe, 18 traditional and NewSQL RDBMS solutions, and eight of the 18 don't offer serializability under any tuning, which is kind of the gold standard of ACID compliance that applies globally to multi-part transactions. I'm simplifying a little bit there; there's a more formal definition. So eight of the 18 don't even offer that, and 15 of the 18 use a weaker model by default. And the thing that really caught my eye: you know, Oracle 11g, which is often thrown around as the gold standard of transactional databases, actually doesn't offer, if you look deep under the covers, the types of things that a lot of people think it offers. And so I just leave this list behind, and the link is at the bottom here and on the previous slide. It's something to look into. Yes, FoundationDB is absent in this. I believe it's treated in the link itself; I can come back to that later. But it's a very nice study. I didn't perform it myself. And if you're curious about digging into one particular database over another, it's very well treated in that publication, down to the footnotes and the details of why each conclusion was made.
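To make the isolation discussion concrete, here is a toy illustration, not tied to any of the surveyed databases, of the classic lost-update anomaly that serializability rules out and weaker default levels can permit: two "transactions" each read a value, add to it, and write it back, and a non-serializable interleaving silently loses one update.

```python
# A one-row "database": an account balance starting at 100.
balance = {"value": 100}

def read(db, key):
    return db[key]

def write(db, key, value):
    db[key] = value

# Two concurrent transactions, each intending to deposit 50.
# Non-serializable interleaving: both read before either writes.
t1_seen = read(balance, "value")       # T1 reads 100
t2_seen = read(balance, "value")       # T2 also reads 100
write(balance, "value", t1_seen + 50)  # T1 writes 150
write(balance, "value", t2_seen + 50)  # T2 writes 150 -- T1's deposit is lost

print(balance["value"])  # 150, not the 200 any serial execution would give
```

Under serializability, the outcome must match some serial order of T1 and T2, which always yields 200; that is exactly the guarantee that 8 of the 18 systems in the study never provide and 15 of the 18 don't provide by default.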
So, tossing in another picture of my confused daughter's face: as a developer and architect, what should you use, or what should you really worry about? And here's my take. When it comes to distinguishing characteristics, now I'm going to focus in on NoSQL in particular, because that's what I know the best. You should think about the concurrency model and whether or not there are locks; if so, how fine-grained are they, and how do they apply across your entire dataset? The big one for me as a data modeler and architect is always this: the thing I really care about with consistency isn't, say, a disk going bad while writing something, or a network partition not letting data flow in the time that I need. It's: how do I deal with relationships? Because every dataset is relational at some level. And so NoSQL doesn't mean that you're punting on relationships, right? It's: how do you handle the concept of foreign keys? How do you best model? And what are the strategies that you take? And in particular, from Cloudant, I understand now that, you know, the consistency between, say, a primary index and a materialized view, which is maybe like a prematerialized version of a join table, is a very important thing for developers to think about. So I'm going to try to kind of collapse this onto my very subjective table, and we can come back to this in the Q&A. If I look, in alphabetical order, at Cassandra, Cloudant, MongoDB, and Riak, as you walk down the list, you know, the distinguishing characteristic here is that, my hunch, and I'm not an expert on MongoDB, is that the majority of MongoDB deployments are single server or master-slave, something like that. It's really designed to solve that type of problem and do it very well, in a way that developers love. It's not necessarily designed from the ground up to scale out to thousands of nodes.
In contrast, Cloudant, Cassandra, and Riak are very much inspired by the Amazon Dynamo model, which is: you immediately start with, you know, three nodes and go up from there. And because of that, it has implications on locking and consistency. So, you know, MongoDB definitely has write locks, read locks, delete locks, and that's something that you want to look into in detail. But if you can live with the locking model, it makes consistency easier for you as a developer, as long as you can fit in all the requests per second that you need. In contrast, Cassandra, Cloudant, and Riak have taken a non-blocking approach by default and then started to turn on minimal locking as necessary. So Cassandra and Riak now offer stronger consistency models that are optional parts of the API. In general, the documentation for both of those says something along the lines of: in the rare occurrences where immediate strong consistency is required, you can use the stronger portion of the API, with availability and latency limitations, right? So it's kind of like pushing the four-wheel-drive button on the car. You know, you don't want to do it all the time because there are, you know, some implications to it. In Cloudant, you know, we don't yet have that, but we have some other things that we do with quorums, which I won't go into in huge detail, that allow you to get some stronger consistency, if you will. But the big thing, as I was putting this together, that I realized: what you're really going to care about is how you're going to deal with relationships, or the types of poor man's joins that you're going to do. And that's what I want to focus on in the back portion of this. And in general, you know, I'll talk about this in more detail, but you're going to end up with denormalization someplace, right?
In the case of, say, Cassandra or Mongo or Riak, you're kind of denormalizing immediately in the way you're storing your data in the database. In the case of Cloudant, your documents may be very normalized and point to each other, but then there's denormalization because information is stored in, say, the primary index and then a secondary index that represents your materialized view, right? So every one of these systems is going to do something like that. And then you have these different strategies, and it's interesting, in putting this together I realized that, at least, my experience with MongoDB has generally led me to build fairly fat documents. I'm going to talk about that next. Whereas in Cassandra, Cloudant, and Riak, I'm drawn to something that's a more write-once, very thin document, and then thinking about the links between those. So I'll just note that that slide is very much my opinion. And I'm glad to discuss that later on. So denormalization is something that does happen in all NoSQL systems. And the number one question you should ask is whether it's going to be the responsibility of the application to denormalize, and then to keep that consistent and measure any inconsistencies between the multiple places your information may be stored, or whether the database is going to do that. And if it does, what types of guarantees can it give you on, say, the consistency between the different representations of your data, say, a primary index and a copy of some data in a secondary index? That's a big question. So let's look at this in a little more detail. Here's a nice example from a blog post linked at the bottom, which is generally talking about MongoDB and MySQL. But it's looking at a specific case of something that's relatively static, if you will, which is like TV shows, seasons, episodes, reviews, and cast members. That's a case that actually maps very well onto a single JSON document.
And one of the reasons there is that, with the exception of cast members, you're not going to see an entity somewhere in that graph showing up more than once. And in the case of cast members, you're probably not changing casts more than, say, once a season, right? So it's a more static data model. And it's fairly straightforward, I think, to imagine how a TV show can have an array of seasons, right? A season can have an array of episodes. Episodes can have an array of reviews and cast members. And so you can map that type of structure very easily onto a JSON, or in this case a BSON, document. So that relationship as a single document is a very natural fit for some applications. I believe that's one of the reasons that MongoDB has been so popular; there is a large class of applications like that. Interestingly, if you flip forward to slide 36, it gets trickier when you think about something more like modeling Facebook or Twitter, right? You end up with some of these boxes turning green, where in this case green means that the same entity may show up in multiple boxes, right? And as you take this example of a user that has friends, friends have posts, posts have comments, and so on, you can map that onto a document structure in JSON in the same way, but you can end up with these pathological duplications where a user document for Joe can actually then contain his activity stream, and somewhere in that activity stream Joe actually comments. And so you have these self-referential relationships stored in a single document. And that's one of the typical gotchas you get into with trying to solve everything with that fat document model.
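To make that concrete, here's a small sketch of the two cases described above. The document shapes and field names are illustrative, not taken from the talk: first the static TV-show hierarchy that nests cleanly into one fat document, then the social-graph case where the same entity (Joe) ends up duplicated inside his own document.

```python
# Fat-document model: the whole TV-show hierarchy nests into one JSON document.
show = {
    "_id": "show:example",
    "title": "Example Show",
    "cast": ["Alice", "Bob"],  # fairly static: changes at most once a season
    "seasons": [
        {
            "number": 1,
            "episodes": [
                {"number": 1, "title": "Pilot",
                 "reviews": [{"user": "joe", "stars": 4}]},
            ],
        },
    ],
}

# The social-graph case is where this breaks down: the same entity recurs.
user = {
    "_id": "user:joe",
    "name": "Joe",
    "friends": ["user:sue"],
    "posts": [
        # Joe is the subject of this document AND appears inside his own
        # activity stream as a commenter, so "Joe" now lives in two places.
        {"text": "hello", "comments": [{"user": "user:joe", "text": "typo!"}]},
    ],
}

# Count the distinct places that reference Joe inside his own document.
refs = [user["_id"]] + [c["user"] for p in user["posts"] for c in p["comments"]]
print(refs.count("user:joe"))  # 2 -- the self-referential duplication
```

The static hierarchy has exactly one copy of each entity; the social one does not, and that duplication is what makes later updates painful.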
The, I think, leading solution away from that, if you can't live with this. And I'll just note that even if you can live with this, or with the previous slide, the big challenge here is that you're spreading information across more than one document, and most NoSQL databases don't give you a way to go back and then update that consistently across all of them. So in the top half of this talk, I talked a lot about real consistency of single objects, say, under system failure modes. But in reality, I think the thing that you're probably going to worry much more about is: my data is going to end up a little denormalized someplace. How hard is it going to be for me, as somebody maintaining the application, or maintaining my application's data, to go back and say, okay, Joe got married and changed his name, I've got to update Joe's last name, and how many different places is that data scattered, right? That's the type of thing that actually can really throw you for a loop in production. And so that's why, for more connected data, you may want to look at going away from the fat document model to something that allows you to break up your data into something more relational, if you will, or a little more graph-like, and that then gives you some more semantics as a developer to query it in a way that may look like a soft join, if you will, or certain types of joins, and then update information in, say, one place instead of all over. And so, flipping forward to slide 37, I apologize, it's gotten very small in the WebEx, but that's where I think materialized views really come into play. And they're very heavily used if you're going to run, say, Cloudant or Cassandra at scale.
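The "Joe changed his name" problem above can be sketched in a few lines. This is a hedged illustration, not any database's actual API: the document shapes and the `rename_user` helper are hypothetical, but they show how one logical change fans out into several physical writes once a name is denormalized.

```python
# Denormalized store: Joe's name is copied into every document that mentions him.
docs = {
    "user:joe":  {"type": "user", "name": "Joe Smith"},
    "post:1":    {"type": "post", "author_name": "Joe Smith", "text": "hi"},
    "comment:9": {"type": "comment", "author_name": "Joe Smith", "text": "me too"},
    "post:2":    {"type": "post", "author_name": "Sue Lee", "text": "yo"},
}

def rename_user(docs, old, new):
    """Rewrite every copy of the name; return the ids of documents touched."""
    touched = []
    for doc_id, doc in docs.items():
        for field in ("name", "author_name"):
            if doc.get(field) == old:
                doc[field] = new
                touched.append(doc_id)
    return touched

touched = rename_user(docs, "Joe Smith", "Joe Jones")
print(len(touched))  # 3 -- three separate writes for one logical change
```

If the application forgets even one of those writes, the data is silently inconsistent, which is exactly the production gotcha described above; a normalized model would have stored the name once.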
On the left, it kind of shows how, if you're building a blog, and this is straight from the Cassandra documentation, in Cassandra you say keyspace instead of relational tables. You have rows, columns, column families, and a few complicated things. But in reality, you're going to split your users and your posts and the relationships between them about who follows whom into kind of different localized portions of the database. And then a query, say, to rebuild a timeline, which is going to pull on those different portions of the keyspace, the same way that you would use a join in SQL to pull on different portions of your tables, that's going to be done with a materialized view. And I'll talk a little bit more about how you do that in Cassandra, not much. There's a very similar way to do that in Cloudant, where on the right here I've just clipped some example JSON. If I'm going to, say, build a transactional accounting system, I would have documents that look like the one on the top, where I would have an account with a name and an ID. But I wouldn't store the value of how much money, or how many Bitcoins, are in that account. I would use separate documents that have a type and represent a transfer. So it would be like a transfer table or a line in your checkbook, right? That's how you actually do it, where you would then tag things like the source account, the target account, the date, et cetera. And then, in the case of Cloudant, the database will build these materialized views for you automatically and then have them ready for you to query whenever you want. Okay? And I'm using the word join here because really, what join means to me is the ability to follow some of these relations. It's a very powerful thing that allows you to really operate beyond what you can do with single fat documents.
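Here's a hedged sketch of that normalized ledger pattern. The document shapes are illustrative stand-ins for the JSON on the slide: the account document stores no balance; separate transfer documents are the source of truth, and the balance is exactly the kind of derived result a materialized view would precompute.

```python
# Normalized documents: accounts hold no balance, transfers are the ledger.
accounts = [{"type": "account", "_id": "acct:checking", "name": "Checking"}]
transfers = [
    {"type": "transfer", "source": "external",      "target": "acct:checking", "amount": 100},
    {"type": "transfer", "source": "acct:checking", "target": "acct:savings",  "amount": 30},
]

def balance_view(transfers):
    """Fold the transfer documents into per-account balances. This is the
    derived data a materialized view would keep precomputed, so a balance
    query doesn't have to scan every transfer ever written."""
    balances = {}
    for t in transfers:
        balances[t["target"]] = balances.get(t["target"], 0) + t["amount"]
        balances[t["source"]] = balances.get(t["source"], 0) - t["amount"]
    return balances

print(balance_view(transfers)["acct:checking"])  # 70
```

The nice property, as the talk says, is that each fact (a transfer) is written exactly once; the "join-like" work of summing them lives in the view, not scattered across updated documents.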
Okay, and I'm glossing over a lot here. But I want to make sure that we leave time for some Q&A, so I'm just going to close with kind of four slides here, a high-level summary review of Cassandra, Cloudant, MongoDB, and Riak along these lines. And then you can ask questions or bug Shannon for more subject experts in each one of these to come in and give you more detail. But really, Cassandra, my take is it's a wonderfully highly available system. It's come a long way. I believe the 3.0 release of the Cassandra query language, CQL, really looks very comfortable to folks coming from the SQL world. And it really eases the pain of denormalization that Cassandra encourages. The Cassandra community has done a good job of messaging around the fact that denormalization doesn't necessarily mean you as the application developer have to do all of the, say, joins or denormalization yourself. Really, what Cassandra encourages you to do is handle one-to-many and many-to-many relationships by inserting into multiple column families when you're updating or writing a document. And the query language gives you semantics to do that, which I think look very familiar coming from the SQL world. So the reality is that, yes, that's where that materialized view is being built. You're kind of writing it to multiple places at once and scattering that information yourself. And then those updates propagate eventually, as you need them to. If you are updating something that is just a core piece of your information that absolutely has to be immediately consistent across all nodes on disk, then you will want to appeal to the strong consistency API that Cassandra now has, with the understanding of the latency and availability hits that you'll take.
So to be clear, if you're hitting that strong consistency API, you can get yourself into a corner where it may reject you, right, because not everything required could be satisfied, oftentimes in the case of network partitions. There's a whole discussion to be had there around local quorum and global quorum and things like that. I'm not the expert to answer those, but I can push you to the right resources or communities there. Cloudant looks similar. It's very highly available, along the lines of Cassandra and Riak, because it's Dynamo-inspired. And we actually push you to normalize your document structure. So every document will, at scale, hopefully maintain just a single piece of information, the same way you would have in a single table, and then you have to manually maintain foreign keys between documents, and foreign keys here just mean an array of IDs of other documents. The database gives you a limited API to follow those links, but really the power is in one-to-many and many-to-many relationships. The Cloudant database doesn't require you to insert into multiple places at once to build a materialized view. It gives you a JavaScript runtime environment so that you can say, okay, here are the things that I care about from documents, and I can give you a little bit of code that documents will be run through, and the output of that code is a materialized view that follows the relationships or does the soft joins that I need. Not all joins are expressible this way, but actually a decent subset of them are. And so the difference here with respect to Cassandra is that, okay, the database is going to build and update those materialized views for you whenever you insert a document, but that materialized view is going to be eventually consistent with the primary index. And that's one of those gotchas that people don't necessarily think about.
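In Cloudant and Apache CouchDB that "little bit of code" is a JavaScript map function; the sketch below is a Python stand-in for the same idea, with illustrative document shapes: every document is run through a small function, and the (key, value) rows it emits become the materialized view.

```python
# Documents as they land in the primary index.
docs = [
    {"_id": "t:1", "type": "transfer", "target": "acct:a", "amount": 100},
    {"_id": "a:1", "type": "account",  "name": "A"},
    {"_id": "t:2", "type": "transfer", "target": "acct:a", "amount": 50},
]

def map_fn(doc):
    """Python stand-in for a CouchDB-style JS map function: emit
    (key, value) rows for the documents this view cares about."""
    if doc.get("type") == "transfer":
        yield (doc["target"], doc["amount"])

# The database runs every document through the map function and keeps the
# emitted rows sorted by key: that sorted structure IS the materialized view.
view = sorted(row for doc in docs for row in map_fn(doc))
print(view)  # [('acct:a', 50), ('acct:a', 100)]
```

The gotcha the talk points out lives right here: the view rows are rebuilt asynchronously after a write, so immediately after inserting `t:2` a query might still see a view without its row, even though reading `t:2` back directly succeeds.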
Just because I wrote a single document, and I can consistently read that single document back, doesn't necessarily mean that there isn't propagation time inside the database someplace. And in Cloudant, there are portions of the API that allow you to push for stronger consistency between primary and secondary indexes, but at the core, when push comes to shove, it's still eventually consistent. And it makes that choice so that it can always accept your writes. Trying to wrap up here, so we have time for Q&A: MongoDB. I think it's very important as a Mongo user to understand what locks and when. Mongo has undergone some very nice upgrades. At one point, I think a delete actually locked the whole server. Now locks are database-wide. I'm sure there's work to make them collection-wide at some point, and you'll want to understand the difference between acquiring locks for reads and writes and what that means in terms of the concurrency model. But you can go very far in Mongo with kind of fat, denormalized documents. And that's, I think, what I see as the most successful model, to a large extent. In Mongo, you can also manually handle links to IDs of different documents. But to my knowledge, and please correct me if I'm wrong, MongoDB doesn't yet give you a way to, say, follow those links within the database. It's up to you as the application developer to fetch those documents individually and then join them up at the application tier. I would not be surprised if there's work in progress to change that. But that's my current understanding as of right now. And in particular, you're going to want to be aware of the consistency subtleties of replica sets and how you denormalize your data. MongoDB goes a long way with kind of master-slave replica sets, which will auto-promote themselves in the case of a master failure and give you different consistency or durability guarantees that you can choose from.
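That application-tier join can be sketched in a few lines. Everything here is hypothetical: `find_one` stands in for a real driver lookup, and the document shapes are invented, but the structure shows the extra work the application takes on when the database won't follow links for you.

```python
# Toy document store: a parent document holds an array of child ids.
db = {
    "post:1": {"_id": "post:1", "title": "Hello", "comment_ids": ["c:1", "c:2"]},
    "c:1": {"_id": "c:1", "text": "first"},
    "c:2": {"_id": "c:2", "text": "second"},
}

def find_one(doc_id):
    # Stand-in for a driver call; in a real deployment each call is a
    # round trip to the database.
    return db.get(doc_id)

def join_comments(post_id):
    """Application-tier join: fetch the parent, then fetch each child
    individually. N child ids means N extra lookups done by YOUR code."""
    post = find_one(post_id)
    return [find_one(cid)["text"] for cid in post["comment_ids"]]

print(join_comments("post:1"))  # ['first', 'second']
```

Contrast this with the materialized-view approach earlier, where the database precomputes the joined result; here the latency, batching, and error handling of those N lookups are all the application's problem.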
But you need to understand which one of those will make it, say, immediately consistent but unavailable in the case of a network failure. And then, wrapping up this quick review: Riak, very much like Cassandra, is highly available. You can include foreign keys to other documents, and then you have to manage that foreign key integrity yourself. But those types of light, graphy relationships can be followed within the database via a pretty novel link-walking API. And Riak has some great documentation about just how far you can take that. And new in Riak 2.0, which actually landed a while ago, maybe even a year ago, they now allow you to turn on strong consistency for a bucket. It's an optional portion of the API. And with the same impact on latency and availability, you can look at the consensus algorithm that they've implemented and know that single documents are consistent actually on disk in the system. So that's kind of a lightning review of those four. And I just want to close with kind of my final thoughts here. This is very subjective, not even representative of Cloudant, or IBM for that matter. But I really think that, of all the things you can focus on, I get into a lot of sales calls or architecture discussions where people are really worried about transactions and consistency in corner cases, and they want to go to the whiteboard and talk about all the different failure modes, and that's all fairly distracting, in my opinion. I think what you should really be worrying about is how fast you can build a new application and get it to market. Granted, you don't want to throw something terrible out there that's going to fall over in production immediately. And that's something that we as a company work very hard on, to make sure that when something gets launched, it gets launched in a way that will run.
But really, of all the things you can worry about in the database space, I think our number one job is to make sure that developers can do their job quickly, efficiently, and in a way that will perform in production. I will definitely say that it's an exceptionally rare case where an application now depends on only a NoSQL or only an SQL-based system. You're probably going to run both. We do, in building Cloudant.com, across 35 data centers. It's going to happen. And we focus a lot on the database, but there's a whole other side of it now, which is: what's the mobile strategy? It's hard for me to conceive of a single application now, even within ultra-classified government agencies, that doesn't require a mobile strategy. And when you actually look at how data is stored, a lot of the magic that went into detecting and minimizing conflicts and making eventual consistency a natural part of the system all lends itself to a good mobile strategy, right? If the database in the cloud is eventually consistent, then you can deal with somebody writing to their phone when it's in airplane mode and synchronizing that at a later point in time. No matter what you choose, you're never going to engineer a perfect network, no matter how much control you have over the system. And so my very personal take is you should focus on systems that are available for writes and reads and tolerant of network partitions; that, I believe, minimizes your pain in running a production system. All of these systems I talked about are new, especially the NoSQL ones, and you'll need to become a pretty advanced, proficient user in order to make the most use of them. And that really means data modeling. It's all going to be about how you represent the relationships in your system, because NoSQL doesn't mean non-relational, right? That's a big misnomer.
It's just about finding the most efficient way to do that with your choice of database. And then I'll just close by saying there were a couple of great talks that we had at Cloudant Con in San Francisco on the 17th that are just getting released on video, and so hopefully I can send a link to those as we post the slides from the webinar. But if you're interested in state-of-the-art consistency and what's coming next in databases, that's a good place to jump in and start. So with that, I'll close, and I'm really eager to answer questions. Thank you. Mike, thank you so much for another great presentation. I just love how you get into all the databases, and this is so educational. One of the most popular questions, of course, has been whether people are going to get a copy of the slides, and I will be sending a follow-up email to all the registrants by end of day Monday, with links to the slides, the recording of this session, and any additional information requested throughout, including Mike's information, so we can get you all connected. So Mike, we have some great questions coming in, in addition to the ones asking about the slides. First question: is pushing some storage-related decisions back to applications, versus into the database, a good trend or development? We just came from there not too long ago. That's a great question. And I would say we're definitely in an oscillation, kind of a, I'm a physicist, so I'm going to call it a damped oscillator. But yes, early on, especially if you look at the original Amazon Dynamo paper, it's like a key-value store, and if you wanted to do anything more than that, you had to deal with it in the application tier. And actually that original paper talks about quorum and the way you can use these voting mechanisms to redundantly store your data, and then kind of dial a consistency knob with every API call that you make at the application layer.
So the great example there was like, okay, if I know my system is performing well, I can say I require two out of three writes to flush before I call this transaction succeeded. Great, I'm going to have two-thirds of every copy of my document be consistent on disk. Awesome. And then if I know that a database node goes down, I can relax that as I need to. And so it gave the application developer a whole bunch of flexibility and a way to very powerfully program around failure modes. In reality, that proved too confusing for, I believe, the majority of developers, right? And so that's one place where pushing that back onto the application has bounced back the other way. It's like, okay, let's try to hide those things a little bit more in the system and just say it's strongly consistent or eventually consistent: you choose. You see that in Amazon's own product line, as well as in everything I talked about today. I think another big example is foreign key references. Right now, referential integrity in the NoSQL systems is completely up to the application developer. That clearly has to change, because relationships are reality. And that's, I think, just a statement about the young age of these systems. And then I had one more I wanted to say. Oh, I think the other big kind of bounce-back is offering transactional semantics. There's a great quote in the Google Spanner paper where they talk about why some of the newer Google systems, Spanner and before that Megastore, I believe, which are the types of SQL-based databases you can consume in the Google platform services, went back to SQL-based systems with transactional semantics: because even though they can be much slower and more failure-prone, for a large class of developers, that's just how they need to be able to think, whether it's the way they were trained as developers or not, I don't know.
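The quorum knob described above has a simple arithmetic core. This is a hedged sketch of the classic Dynamo-style rule, not any vendor's API: with N replicas, requiring W write acknowledgements and R read acknowledgements guarantees that a read sees the latest write whenever W + R > N, because any read quorum must then overlap any write quorum in at least one replica.

```python
def quorum_overlaps(n, w, r):
    """True when every read quorum of size r must intersect every write
    quorum of size w among n replicas, i.e. reads see the latest write."""
    return w + r > n

# The "two out of three writes" example: N=3, W=2. Reading from R=2
# replicas is then the cheapest setting that still overlaps the write set.
print(quorum_overlaps(3, 2, 2))  # True  -- some read replica has the new value
print(quorum_overlaps(3, 2, 1))  # False -- a single-replica read can miss it

# Relaxing W when a node is down trades that guarantee for availability:
print(quorum_overlaps(3, 1, 1))  # False -- fully eventual, always writable
```

Dialing W and R per call is exactly the flexibility the Dynamo paper gave developers, and exactly the mental overhead that later systems chose to hide behind a simpler strong-vs-eventual switch.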
But it's going to be important, I think, for folks on this call, as you're choosing a system, to know your developers. For instance, we at Cloudant have found that folks who write in JavaScript, whether it's for the browser, for the phone, or for the server in the Node.js world, which is something we laughed about five years ago, running JavaScript on the server, but now it's a very real thing, right? They tend to do very well in the eventually consistent world, flinging documents around. Folks who are coming over from the Oracle world really miss transactional semantics. And so I think you're going to see this oscillate back and forth as we go, and the story is not closed by any means. But the answer, I guess, is to know your own developers and do what they need. Nice. Next question coming in is: can one provide services in developing countries with the Cloudant solution, considering slow network connections and bandwidth in those regions? Certainly. So that's a great question; I can answer Cloudant questions easily. Yes, Cloudant supports the Apache CouchDB API and fundamentally uses the same storage model underneath. The absolutely distinguishing thing about that, which separates it from everything else, is that you can write things locally and synchronize them with other servers or clouds at a later point in time, without ever having to know in advance that they were going to be synchronized. And that's a big deal, right? Typically when you set up replication, like master-slave replication or something, you're doing it as you build the system, right? Or you have control of both environments. But Cloudant is very much Apache CouchDB inspired. We just extended that to great scale, running it as kind of like Akamai for your database, all over the globe now.
And there are lots of clients that run the mobile clients that are offline the majority of the time and may be doing, say, sustainable development work or vaccinations or something like that. And when they get back to a hotel, they upload all that data, right? And they download at the same time. So it very much allows that master-master scenario. We use it heavily for mobile now, but really, in my physics life, I have an Apache CouchDB server that's been running for three years, I think, without being touched, a mile underground in Canada, taking dark matter data. So that satellite idea is one that really does play out in reality. Fabulous. I love all the physics references. In what context are you using the term data modeling in the last bullet of your final, your two-cents slide? That's a very good question. I'm not sure I'm using it in an officially correct way; I'm using it pragmatically. I'm using it in the sense that there are a lot of choices, right? Even in that MongoDB example we showed, with TV shows or user data, you have the choice, say, to store everything in a single document, or to break it out into a whole bunch of different documents and then store the foreign keys yourself and do an application-side join. That's the sense in which I'm talking about data modeling. In the case of, say, Cassandra, you're probably really going to have to understand, and CQL makes this easier, but at the core you're still going to have to understand, what's the right thing to put in a column family, columns, sub-columns. There's a whole bunch of choices there. And, given your needs in concurrency and availability and scale and update frequency, you're going to have to understand how to map those concerns of how the system works onto what your real-world data is that needs to be represented and stored and updated in the system.
And generally there you're going to have to work with a community or work with a vendor to come up to speed very quickly and get trained up on that. But you need to be fairly proficient. You can't just hide behind an SQL-like abstract interface yet to get the best out of these systems. Perfect. And I love all the questions coming in. You mentioned materialized views leveraging consistency for updates, but one would think of partitioning data by time or type, with reference to Twitter or Facebook transactions. Did the attendee misunderstand? No, certainly there's partitioning by time. I'm not sure I quite understand the question. Maybe I can follow up a little more afterwards if I don't get it right now. But certainly ordering things by time is one main index, sorry, yes, one main projection. But oftentimes there can be other ones, right? When you actually think about building a Facebook or Twitter-like timeline, there are going to be things you get into where you have to, say, use a compound index, if you will, or a multi-column kind of materialized view, where you order by time and then you group by something else, and group by something else again. Underneath, oftentimes, some of these data stores will use a storage engine that kind of rolls things based on time as well. So there are different ways that that can come in, but you can order by time within a Cloudant materialized view just as well as you can order by age, height, rank, salary, any of those things. Perfect. And what are some UI experience considerations we can take for transactional applications that use eventually consistent data stores? That's great. You know, the number one place that I see eventual consistency is like, you go to Facebook and you update your profile picture and then you bounce back, you know, I don't know what they're doing.
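The compound-index idea mentioned in that answer can be sketched briefly. This is an illustration with invented data, not a specific database's view format: keying rows on a compound (user, timestamp) key groups each user's events together and time-orders them within the group, so a timeline query becomes a cheap range scan over one key prefix.

```python
# Raw events, arriving in no particular order.
events = [
    {"user": "sue", "ts": 3, "text": "c"},
    {"user": "joe", "ts": 1, "text": "a"},
    {"user": "joe", "ts": 2, "text": "b"},
]

# View rows keyed on the compound key (user, ts). Sorting by that key
# groups by user first, then orders by time within each user's group,
# which is what a multi-column materialized view maintains for you.
rows = sorted(((e["user"], e["ts"]), e["text"]) for e in events)
print(rows)
# [(('joe', 1), 'a'), (('joe', 2), 'b'), (('sue', 3), 'c')]

# A timeline query is then a scan over one user's key prefix:
joe_timeline = [text for (user, ts), text in rows if user == "joe"]
print(joe_timeline)  # ['a', 'b']
```

Swapping the key order to (ts, user) would instead give one global time-ordered feed, which is the "one main projection" mentioned above; the modeling choice is which component of the compound key comes first.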
If you bounce back to your timeline, usually you can catch it changing the picture, right? Whether it's a cache expiring, and that's something I didn't mention, most web systems will have cached data anyway. So the end-user experience is usually not immediately consistent. But you'll see things like that. It's just like if you go to LinkedIn, there's a little notification counter at the top: you've got 27 messages, 32 requests, things like that. And you'll maybe go and accept a few LinkedIn requests and not see that number change the way you would expect, but then it will a couple of seconds later, or half a second later. And so I talked about that a lot actually in the first webinar, which is: okay, eventual consistency, what's important for you to understand is how eventual can it be for my application or for my users? And in most cases, these systems tend to be, at least at the server level, consistent within tens of milliseconds, right? It's only in the case of partitions and then healing events that you really see deep inconsistencies, as the systems self-heal and anti-entropy kicks in. But from the UI perspective, it's usually like, huh, that's weird. I just changed my profile picture, but my phone still shows the old one. Oh, wait, there's the new one. So anything that is less than 100 milliseconds, humans aren't going to notice, right? That's established in the literature. Anything on the order of a second? Yeah, we're going to notice it if we're looking. Mike, thank you so much. I'm afraid that's all we have time for today. Just a reminder that I will be getting an email out with links to the slides, links to the recording of this session, and additional information on Cloudant. And you guys just recently released a new query language, which is very exciting. We wrote an article on it for you. So I'll get that out to everybody as well.
And just so you know, you can hear more about Cloudant at our NoSQL Now 2014 Conference and Expo coming up in August in San Jose, California. So make sure you check that out as well. Mike, again, thank you so much. And thanks everyone for the fabulous questions and your participation in the webinar. Hope everyone has a great day. Thank you very much. Thank you.