So, the doors are closing, so I'm going to get started. My name is Ethan Gunderson, and I'm here to talk to you about understanding data storage. Before I do that, though, I'd like to bitch a little bit about something, and that would be NoSQL. I assume everyone here has heard the term NoSQL. Is there anyone that hasn't? That's what I thought. I actually think this is a really damaging term. Normal people don't talk like this; we don't group things together in a negative way. And besides being completely negative, it's an almost completely worthless term. Saying that you're using a NoSQL database tells me absolutely nothing about what you're doing. Is it a key-value store? Is it a graph database? I have no idea, because you've just said NoSQL. Even worse, now that we've classified this group of databases as one group, it gives the impression that they're interchangeable, which they're not. A graph database is not interchangeable with a key-value store.

To help illustrate this point a little bit, what if we started saying "NoGreenland"? It's Greenland, but it's covered in ice, so it's confusing. I don't understand it. It doesn't scale. NoGreenland. Instead of saying I'm from Chicago, in the United States, I would say I'm from NoGreenland. That could be Italy, Mexico, the States, Canada. It's all now one area of the world, even though they're completely different. It doesn't make any sense, does it? I would consider it a personal favor if everyone in this room never said NoSQL again. The next time you go to say it, think: hmm, maybe I could be more descriptive in what I'm trying to tell this person. So let's practice a little. Redis is not a NoSQL database; it's a key-value store. MongoDB is not a NoSQL database; it's a document store. Cassandra is not a NoSQL database; it's just difficult to use. So that's the end of my rant. Like I said, I would consider it a personal favor if you never said NoSQL again.

So, moving on. Like I said, I'm Ethan. I work at Obtiva; we're a small consultancy based out of Chicago. More importantly, I like databases. I like reading about them and talking about them; obviously I'm here talking to you about them. The problem is that databases are really complicated, and I'm just not that smart. There are a lot of big words and a lot of theory. People go to school to learn database theory, and I just read about it at night. So it's really difficult. But somehow I fixed that problem, because, like I said, I'm here talking to you now. My solution was to start a user group in Chicago called ChicagoDB. The purpose of it is that every month we read an academic paper and then we talk about it. I've learned a lot. We invited a lot of smart people, people smarter than me, so I could just leech knowledge from them. That seems to be a pretty effective plan if you're looking to do the same: surround yourself with smarter people, be the dumbest person in the room, and you'll learn really quickly.

So, if you noticed, I saw a problem that I had and I found a solution for it. This is something that I think the Ruby community has a problem with as well. And just so we're clear, I'm not talking about testing frameworks; we all know we should be using RSpec, so that's not an issue. The main problem I'm talking about is that we always seem to fixate on solutions before we recognize the problems they solve. We see this a lot with new gems. Someone high up in our community will release a new gem, and instantly a thousand codebases are using it
without really understanding why the gem was released, what problems it solves, or whether the code is even good. We jump immediately to solutions. And the latest instance of this is databases.

So, this is a fictional website for a database called BombDB. Its features: it scales, it's maintenance free, and it uses something that is not SQL. It's also way cooler than SQL, and it allows you to fire all your DBAs. Oddly enough, these are all things that I found on various real database websites. None of this is actually made up, which is kind of scary: people actually post this, and people read it and believe it. So almost immediately after this goes up on the internet, thousands of people across the world say, I have to use this; this database will solve all of my problems. And mostly because they think it's cool. The website tells them it will solve their problems, but I don't think they actually know what their problems are yet.

So really, it's not about picking a technology because it's cool. The data in your application is the heart of it; without it, you have nothing. It's about doing your homework. I think most of us would call ourselves responsible developers, and I believe part of that is having at least a basic understanding of the tools that you use and recommend, and that should include databases. That knowledge should extend beyond the bullet points on a website or the information in the README. Some questions you need to ask yourself: What were the design goals? What problems were they trying to solve? Do I have those problems, or problems similar to them? Do they match up at all? And there are other things, like: What's the data model? Is it a key-value store? A document store? A graph database? Can I model my data in a way that makes this database useful? What kind of querying does it have? Is it only primary-key lookups? Does it offer secondary indexes? What about ad hoc querying for business intelligence reports? The database you pick affects all of this. So the more questions you can ask and answer up front, the better off you'll be in the long run.

I gave a version of this talk earlier this week in Chicago, and on the train ride back there was this poster for a portfolio investment firm. Its big line was "investigate before you invest," and it really resonated with me, because it's true. Either way, you're going to invest time; you might as well investigate beforehand, so that you're investing in knowledge that will last you beyond this project. So: investigate before you invest. It's true for stock portfolios and for databases. Let's do a little investigating.

From here, if you have any questions or comments, go ahead and shout them out, or whatever. I think this presentation would be a lot more useful to everyone if there was more discussion going on. So if someone has a question, go ahead; if you can answer the question, go ahead. I definitely don't have all the answers, so if you do, that's cool.

The main thing we're going to talk about is the Amazon Dynamo paper. This was published in 2007 by, obviously, Amazon, and it was, I think, probably one of the first papers to group together the set of technologies that describes a distributed database. Things like consistent hashing, vector clocks, gossip protocols, and hinted handoffs were all described in this paper. None of them were new.
These things have been around for quite a while, but the Dynamo paper described them in a very easy-to-read way. And if you're interested in any of this stuff, Riak is an open-source database, developed by Basho, that implements the Dynamo system; Dynamo itself is a proprietary Amazon database.

So this is normally how you'd start out with a database, right? You have one node with all your reads and writes going to it. As your application gets more popular, you normally do something like this: you scale up, more RAM, more CPU. That still isn't really ideal; if that node goes down, you're hosed, and you have no redundancy at all. So you'd move to something like this: a couple of replicas that you can distribute your reads across, with the one master still taking writes. And that's still not ideal, because eventually it will fail. Something will happen, your master will go down, and then your website is effectively down. No one can write data. You may still be able to read it, but that's not all that useful. At this point, I think it's pretty common for people to go look at master-master replication or maybe some sort of sharding. But there are better solutions out there, and that's what the Amazon Dynamo paper talks about.

So the first technology is consistent hashing. This is really the heart of the Dynamo paper. In our simplified version, we have a hash function: you pass it the key of whatever key-value pair you want to look up, and ours just returns an integer from 1 to 100. From here, we can imagine a ring space that contains all of our integer values. Then, for all the nodes we want to have available in this cluster, we map them onto the ring. This can be pseudo-random; you could use the same hashing function to place them as well. So now that we have our basic cluster set up, let's hash "scotruby". That returns 15 in our simple example, so mapping it onto the ring places it there. Now our job is to find out what node this data lives on, which is actually really simple: we just walk the ring until we find the next available node. In this case, the node it lives on owns the key range 1 to 25; the next node grabs the key range 26 to 50; and so on. It's actually pretty simple, and I think a really elegant solution.

This also makes replication pretty easy. We simply keep walking the ring until we find the number of nodes needed to fulfill our replication requirement. In this case, we find the canonical home and then walk to two more nodes for replicas. So now we have durability in the system. What this allows for is, say someone kicks the power cable on one node and it goes down: we can simply walk to the next node and find the key range 1 through 50. At this point, we have a pretty highly available system. As long as one node is up, we can read and write data.

There is one flaw in this, which maybe some of you picked up on. If we go back and look, it's not too hard to imagine that one of these key spaces could be hotter than the others. If a really popular user on Twitter, like Oprah, was mapped into this ring, her key range would obviously be more popular than mine, and the node that stored her data would run hot. You can solve this through what are called virtual nodes.
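Before we get to virtual nodes, here's a minimal sketch of the basic ring in Ruby, just to make the walk concrete. It's a toy under the talk's assumptions: a hash space of 1 to 100, nodes placed at fixed positions, and a replication factor of three. Real implementations like Riak use a much larger space plus virtual nodes, and the class and method names here are mine, for illustration only.

```ruby
require 'digest/sha1'

# A toy consistent-hash ring over the key space 1..100, as on the slides.
class ToyRing
  def initialize(replicas: 3)
    @replicas = replicas
    @nodes = {} # ring position => node name
  end

  # Place a node on the ring at a fixed position, e.g. 25, 50, 75, 100.
  def add_node(position, name)
    @nodes[position] = name
  end

  # Hash any key into the 1..100 ring space.
  def hash_key(key)
    Digest::SHA1.hexdigest(key).to_i(16) % 100 + 1
  end

  # Walk the ring clockwise from the key's position until we have enough
  # distinct nodes to satisfy the replication factor: the first node is
  # the canonical home, the rest are replicas.
  def preference_list(key)
    point = hash_key(key)
    at_or_after, before = @nodes.keys.sort.partition { |pos| pos >= point }
    (at_or_after + before).first(@replicas).map { |pos| @nodes[pos] }
  end
end

ring = ToyRing.new(replicas: 3)
ring.add_node(25,  "node-a") # owns keys 1..25
ring.add_node(50,  "node-b") # owns keys 26..50
ring.add_node(75,  "node-c") # owns keys 51..75
ring.add_node(100, "node-d") # owns keys 76..100

p ring.hash_key("scotruby")        # some position in 1..100
p ring.preference_list("scotruby") # canonical home plus two replicas
```

If "node-b" dies, the walk in `preference_list` simply carries on to "node-c", which is exactly the "next node down the line" behavior described above.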
And I think almost every database that implements the Dynamo system does virtual nodes. What we do in that case is split the key space up into equal chunks and then map those chunks onto the various nodes. Does that make sense to everyone so far? Any questions? [Inaudible audience exchange about the Oprah example.] Okay.

So let's talk about some trade-offs now. We know that nothing comes for free, except for all the drinks last night, apparently. So what exactly have we given up? We've gone from a pretty brittle system to one with extremely high availability, and the truth of the matter is, we've basically dropped ACID. I mean, we've all basically been raised on ACID, right? Most relational databases implement ACID, at least to some extent; there are arguments about whether they actually do. For those of us who may not know what ACID is: Atomicity means modifications are an all-or-nothing type of deal; they either complete, or they don't and leave nothing in a half-modified state. Consistency means nothing in your transaction will violate the integrity of the database, and, more importantly, your system rolls from one consistent state to the next; it's never in an inconsistent state. Isolation means transactions can't access data inside other transactions. And durability: once you commit it, it's there; even if you turn the server off or kill -9 the process, the data should still be there. I think we can all agree these are desirable properties; if we could have all of them, we'd want them. Unfortunately, we can't.

The CAP theorem was conjectured in 2000 by Eric Brewer and formally published in 2002, I believe. What it describes is three properties, of which you can only have two at any given point in time. Consistency: the client perceives that a set of operations occurred all at once. Availability: all nodes are available for reading and writing. And partition tolerance: operations will complete even if individual components are unavailable. So we're clear on partition tolerance: any time a node can't communicate with the rest of the cluster, we have a partition. That includes extreme latency in the network as well; if a node appears to be down, it's essentially down. We've already decided that we have partition tolerance: nodes can go down and the system still functions to a relative degree. So we know we have at least the P of the two letters we get to pick, and really, you would always want P. In any kind of distributed system, you'd never want weak partition tolerance; what would be the point of that?

So basically what we have here is high availability, and because of that, we can infer that we have weak consistency. At any point in time, our system may be inconsistent from node to node, because since any node is capable of accepting a write, there can be times when they haven't properly synced up yet. A node can have a different version of the data than the next node. The other option would be strong consistency and lower availability. We could ensure that by requiring all of our replicas to accept and acknowledge each write. So a write comes in, the canonical home accepts it, and we get stronger consistency by blocking further writes until the two replicas have accepted the write as well. That lowers availability, since those nodes aren't available to accept other writes in the meantime. It would also be kind of slow.
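Dynamo-style systems typically expose this trade-off as tunable quorum values: N replicas per key, W acknowledgements before a write succeeds, and R responses before a read returns, with R + W > N guaranteeing that a read quorum overlaps a write quorum. Here's a rough sketch of a quorum write under those assumptions; the `Replica` class is a hypothetical in-memory stand-in for a real node client, not any particular database's API.

```ruby
# A stand-in replica that keeps data in a Hash; a real client would
# talk to another node over the network.
class Replica
  def initialize(up: true)
    @up = up
    @data = {}
  end

  def put(key, value)
    raise IOError, "node unreachable" unless @up
    @data[key] = value
  end
end

N = 3 # replicas per key
W = 2 # acknowledgements required before a write "succeeds"

# Send the write to all N replicas and count acknowledgements. With
# R + W > N, any later quorum read must overlap at least one replica
# that saw this write.
def quorum_write(replicas, key, value)
  acks = replicas.first(N).count do |replica|
    begin
      replica.put(key, value)
      true
    rescue IOError
      false # a down replica just costs us an ack
    end
  end
  raise "write failed: only #{acks} of #{W} acks" if acks < W
  acks
end

replicas = [Replica.new, Replica.new, Replica.new(up: false)]
p quorum_write(replicas, "scotruby", "winning") # => 2, just enough
```

Setting W = N gives you the blocking, strongly consistent behavior just described; W = 1 is fast and highly available but weakly consistent.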
So instead, since we no longer have an ACID system, we have BASE, which is the opposite of ACID. Get it? Basically Available, Soft state, Eventual consistency. The system appears to be working at all times, but it's not consistent, and it will eventually become consistent. Where ACID is very pessimistic and enforces consistency from transaction to transaction, BASE is more loose: it'll get there eventually, so it's fine.

The next thing we'll talk about is read repair. Since we have an eventually consistent system, it's possible for nodes to get out of sync with one another. In this case, we have the canonical home and the two replicas, and we're going to do a quorum read: we want the three nodes to reach a quorum. So we're reading "scotruby" again and getting three responses: two say "winning" and one says "killing it". That's not right; we have one that's out of date. But since we've reached a quorum, we can simply update it; we're relatively safe to do that. This doesn't always work, though. In this next case, every node has its own version of the data: "awesome", "winning", and "killing it". So read repair won't work.

One kind of elegant solution to this is something called vector clocks. A vector clock is a partial ordering of events. Any time a node alters a key, it increments its own count in the vector clock. In this case, node A, which is the canonical home, has updated the same key three times, and that's been replicated over to the replica, so everything's fine here. Where it gets interesting is when there are partitions between the nodes. Here, the first two writes were written and replicated successfully. On the third one, however, something happened and the canonical home didn't take the write; instead, one of the nodes downstream from it did, so that node incremented its own count, and its clock is at A2, B1. This case is super simple: the canonical home can see that the replica's clock is a superset of its own and just pull the value down. Nothing special about that. The other option is to merge. Same deal: the first two writes were done successfully, but then there was some sort of network glitch or hiccup, and the canonical home incremented its count by one, but so did the replica. In this case, we can do one of two things. One of them is merging. Dynamo was designed for Amazon's shopping cart, so in their case they would just merge: if you ended up with more items in your shopping cart, they'd rather you have more than none, so it makes sense for them. In other cases, where the values can't be merged, maybe because of your domain, you need to bubble this up to your application layer, and it has to resolve the read conflict.
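To make the clock comparison concrete, here's a minimal vector clock sketch in Ruby, assuming a clock is just a hash of node name to update count. The names are mine, for illustration, not any particular database's API.

```ruby
# A vector clock as a plain Hash of node name => update count.
class VectorClock
  attr_reader :counts

  def initialize(counts = {})
    @counts = counts.freeze
  end

  # A node bumps its own counter every time it writes the key.
  def increment(node)
    VectorClock.new(counts.merge(node => counts.fetch(node, 0) + 1))
  end

  # True when this clock has seen everything `other` has seen, i.e.
  # the simple superset case from the slides.
  def descends_from?(other)
    other.counts.all? { |node, n| counts.fetch(node, 0) >= n }
  end

  # Neither clock descends from the other: concurrent writes, so we
  # have to merge or bubble the conflict up to the application.
  def conflicts_with?(other)
    !descends_from?(other) && !other.descends_from?(self)
  end
end

a  = VectorClock.new.increment("A").increment("A") # [A:2], replicated fine
b  = a.increment("B")  # [A:2, B:1] - downstream node took the third write
a2 = a.increment("A")  # [A:3]      - canonical home took it instead

p b.descends_from?(a)  # => true: the canonical home can just pull it down
p a2.conflicts_with?(b) # => true: merge, or punt to the application layer
```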
Any questions or anything so far? All right. So, hinted handoffs. Hinted handoffs are a pretty easy way to recover from node failure. In this case, one node has gone down, so all the inserts for that key range go to the next node down the line. When the failed node comes back up, that secondary node, using various internal protocols, simply informs it of what it missed. Simple, right? This is mostly done through gossip protocols, which are a way for nodes to do inter-node communication. Pseudo-randomly and fairly often, nodes communicate the state of the cluster to each other and also ask for the current state. So here, our third node is saying node one looks down, and node two is informing the rest that it's going to accept that key range.

Any questions or comments? [Audience question.] I'm not sure how that would actually be implemented. It might be through a time span: for at least a period of time, the node would accept the extra key space, and then, using the gossip protocol, if the original node was down for a sufficient length of time, the cluster would rebalance the key spaces. I could be wrong about that, though. Any other questions? [Audience question.] Yeah, you would still ensure that there were three replicas, or whatever you tuned it to.

So that's Dynamo in a nutshell, pretty quick. By doing a little bit of investigation, we've learned about consistent hashing and friends, and gained an understanding of the problems they solve. And more importantly, we've learned what we're giving up in return for this kind of power. In other words, we're winning.

So, just some takeaways. Investigate before you invest; I really think there's power in that at every point in your technology stack, not just databases. Choose the right tool for the application. And more importantly, do your homework. These papers are pretty academic; I wouldn't say they're fun to read.

Speaking of homework, I have some for you: five papers and one blog. The first paper is Codd's relational model, the basis of relational theory as we know it. I think it's important to understand what we're giving up before we give it up; we've already decided that SQL sucks without fully understanding what SQL is. Then the CAP and BASE papers; the CAP paper is extremely short but extremely dense, and I think it's taken me months to actually comprehend what it's trying to tell me. Amazon's Dynamo paper, which is a great paper to read. Bigtable, which I didn't go over at all, but it's another paper that spawned quite a few databases. And the blog is Werner Vogels' blog; he's the CTO of Amazon. I just discovered it, and I'm kind of disappointed I haven't been reading it this entire time. He does an excellent job of describing distributed systems in general.

Any more questions? [Audience question.] I have not heard of anything like that. I guess you could write something into a gossip protocol for it: this one node has 900% more activity than any of the others. I don't know for sure, though. Any more questions? All right. Thank you.