Thank you very much for that wonderful introduction. And thanks, everybody, for taking time out of your day to join this webinar. I hope it's valuable to you. We try to make this stuff as educational as possible. These are things that we live in day to day here at Cockroach Labs, and while they're important for CockroachDB and what we're doing, hopefully they're applicable to what you're doing in your day-to-day lives and the way you think about data in a distributed environment. The work that's happening within the Linux Foundation to promote and extend the use of distributed systems via cloud native and the CNCF, I've been part of that for a really long time. Super exciting. And I think these concepts are applicable well beyond what we're doing. This is less a commercial about CockroachDB and more a commercial about distributed systems. At least I hope it's that way. By all means, feel free to ping me in the chat and Q&A along the way. Like I stated, I'm Jim Walker. I'm Principal Product Evangelist here at Cockroach Labs. I still claim to be an engineer. It's been a while since I coded anything professionally, but I still mess around, and I have an engineering undergrad. On Twitter, I'm James, which is the Chicago pronunciation of my name. So this session is intermediate. It isn't beginner material; it's a little bit more complex, but it's not advanced. We aren't going to have fingers on keyboards; we're going to go through some concepts. Now, that said, I am definitely no distributed systems expert. I live in this world, I'm curious about it, I thrive in it, and I think it's incredibly interesting. But I do believe that the content today, this distributed mindset, these distributed concepts and principles, is what makes careers.
I think this is the coming of age of IT and tech: distributed systems and cloud. So what I'd like to do today is give high-level context for these important concepts. We're going to cover the CAP theorem. We're going to talk about Raft, which is a distributed consensus protocol similar to Paxos, if you're familiar with that. And I'm also going to go through MVCC. We'll talk a little bit about how this all comes together to provide an active-active database. Now, CockroachDB in particular is a CP database, and we'll talk about that in a little bit. We are on the side of consistency. But there's a lot you can do in the context of the CAP theorem where, if you're one type of database, you still have a range of abilities on the other side, and we're going to talk about that. And as noted, please do ask questions in the Q&A panel. I've got a sidecar going here on my iPad so I can see questions, and I'm happy to stop along the way to drill into things or answer any of your questions. So with that, let's get into this. I'm a student of history, especially tech history; I find it extremely interesting. There are four big moments in time that I think are really important to think through when asking how we got to where we're at. I think about Fairchild Semiconductor and the "traitorous eight." These are the people who went on to start Intel, a group that really innovated and, to me, started this entire tech community and the whole tech startup world. I also like to note Xerox: back in the day, Xerox was a big player with its Palo Alto Research Center, Xerox PARC, which is actually still going. What came out of that was windowed interfaces, the mouse, everything we think about in computing today.
A lot of that stuff started there. There was a language called Smalltalk that's my favorite language of all time. And then Digital (DEC) had a huge group of people that really reinvented the way we think about the infrastructure of modern chips, RISC processing, and these sorts of things. Now, there was a big group of people at Digital when it got sold to Compaq in the '90s who stuck around for a little bit, but the research organization really collapsed, and a huge group of them went to Google. So there's this connective tissue from way back in the '60s and '70s all the way through to Google. And I bring up Google because, to me, the advent of "Google infrastructure for everyone else" is where we're at today. The technologies I'm going to talk about today, the theorem, these principles: honestly, the Google team has done an amazing, phenomenal job innovating and making all this stuff happen. And I like to start there because of the two gentlemen on the right-hand side of the slide, Jeff Dean and Sanjay Ghemawat. If you don't know who they are, I believe that anybody in tech and in modern distributed systems needs to know these two. Look at some of the papers they've written and the research they've been involved in: MapReduce; Bigtable, the start of NoSQL; GFS, which became Colossus, the entire backend file system of Google; Google Cloud Spanner, which is actually something we're going to talk about a little today; TensorFlow. These two gentlemen have their hands on just about everything, and I think the modern internet, the way we think about our systems, and our lives really have been affected by these two. I always like to start here because you've got to give kudos to the people who started it.
If you want a good read on the early days of this, the 2000s, when Google started really building scale-out internet services: if you go back in time, Google came out and it was just a search engine with a little bar, and you'd ask, what's the big difference here? Well, the big difference was the way the backend worked. There's a really good piece of research by Jeff Dean that I think is pretty awesome from a historical point of view, so go check that out. Now, just as a side note, our founders, Peter, Spencer, and Ben, were at Google as well. Their employee numbers are in the mid-300s. Peter and Spencer actually worked heavily on Google Colossus, which was the next generation of the Google File System. They had a front-row view of Bigtable, Megastore, and Spanner, because GFS and Colossus sat underneath those things, so there was lots of interaction. And Ben worked heavily on Google Reader. God bless Google Reader, RIP. Really great stuff. I bring this up because Peter and Spencer left Google, worked at a couple of companies, and were frustrated because they didn't have the scale-out tools. That was the genesis of CockroachDB: how do you take these principles and actually allow them to be used outside of Google, for lots of other people? And I think that's exactly what we're seeing with a lot of technology today. The Kubernetes space is a really great example of this as well, and everything we're seeing in distributed systems, this next generation. So what is the CAP theorem? Let's get into the topic, why we're all here. The CAP theorem is actually Brewer's theorem.
We call it the CAP theorem because that's really what it's all about, but there's a gentleman by the name of Eric Brewer. Again, if you go back in time, Eric was a founder of Inktomi, and what Inktomi did around search in the late '90s was truly phenomenal. In fact, a lot of the people who were at Inktomi ended up going to places like Hortonworks and Cloudera and this movement toward big data. So this group of people really started this concept of thinking about data and search, and how we deal with data, in a wholly different way. And Eric, as a founder, had a huge impact on this. Fast forward to 2011, and he actually ended up joining Google. Eric has been phenomenal, and his work around the CAP theorem is just awesome. The concept first appeared in 1998, so it goes way back, when we were debating the concept of ACID (atomicity, consistency, isolation, and durability within a database) versus BASE, which stands for basically available, soft state, eventually consistent. If you think about it, BASE sounds very much like NoSQL. We've been talking about NoSQL versus relational databases and their trade-offs for a long time; that's really the genesis of this, and it was the genesis of the CAP theorem. Very soon after that, the BASE principles were used to go forward and build Bigtable and some other things. Eric published this as an IEEE CS article, "The CAP Principle," in 1999, and presented it in 2000 at the Principles of Distributed Computing conference, PODC. That was really the first time the world saw the CAP theorem. And then it was proven by a couple of researchers at MIT in 2002, so it was proven as a theorem.
So the genesis of this really goes back to Brewer's theorem, but it is the CAP theorem. Now, fast forward to 2011: Eric joined Google. There's another Wired article about this that's actually a great read. Eric sat down very near Jeff Dean and Sanjay Ghemawat, so you can start to imagine some really amazing engineering minds coming together to do some really interesting things with this concept of the CAP theorem. The change in infrastructure at Google around this time was tremendous; they really moved toward this automation of everything, and this is a big piece of it. I bring this up because Eric is amazing: if you read that article, he says, I didn't rewire Google. There was a whole mess of people involved in this; he was a big part of that team, with a lot of leadership there. So if you want to learn more about distributed systems, check out what Eric Brewer, Jeff Dean, and Sanjay Ghemawat have published on Google Scholar. You'll read some papers that are truly amazing: Borg, Omega, Kubernetes, the CAP theorem, you can go through the list. I think Google does a great job of publishing this stuff so that we can all read it and learn from it. This is where I get a lot of what I've learned about distributed systems, so I would highly recommend it to everybody. Okay, great. It's been about 10 minutes; I just wanted to give a little bit of a baseline. So let's get into the CAP theorem. The CAP theorem basically states that it's impossible for a distributed data system to simultaneously provide all three guarantees of consistency, availability, and partition tolerance. Consistency means every read is going to return the most recent write or an error. Availability means every request is going to get a response from the system.
And partition tolerance means a distributed system can tolerate the disconnection of pieces of that system. Say it's nodes: if a network fault happens between two of them, or a node gets separated, the system is going to be able to tolerate that. So those are the three concepts. Somebody's asking, can you share the scientific paper links referred to in the webinar? Absolutely. There are some QR codes, and we'll share the material later, Mohammed; we'll definitely get that out to you. Let's talk about consistency. Consistency means that if two requests are made by two different users against different nodes, they're going to get the same data returned. Here we have two users; they're both asking SELECT * FROM customer, and you're seeing two records come back. It is going to be the same no matter which node or which component of that distributed system is asked. That's consistency. It's a really difficult thing to do, because how are you synchronizing data? How are you dealing with differences? Can all the nodes actually serve reads and writes? There are lots of complexities here, and I think consistency is a really difficult problem to solve in distributed systems. Now, availability is a little bit easier. If requests are made against two different nodes in the distributed system, you're going to get a response. That response isn't guaranteed to be correct: you may get different answers from the two nodes, but you're going to get a response. So basically, the data is available everywhere within that distributed system. So we have consistency, we have availability, and then there's partition tolerance, which, as I said, means that if there's a disconnection between parts of the system, you're still able to survive it.
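The trade-off between these guarantees can be sketched with a toy example (everything here is invented for illustration; it's nothing like any real database's API): two replicas of one record, with a flag standing in for the network link between them. During a partition, an AP-style read answers with possibly stale data, while a CP-style read returns an error rather than risk being wrong.

```python
class Store:
    """One replica: just a dict of key -> value."""
    def __init__(self):
        self.data = {}

link_up = True  # whether the two replicas can currently talk to each other

def write(primary, replica, key, value):
    primary.data[key] = value
    if link_up:                      # replication only succeeds while connected
        replica.data[key] = value

def read_ap(node, key):
    # AP choice: always answer, even if the answer may be stale.
    return node.data.get(key)

def read_cp(node, key):
    # CP choice: refuse to answer rather than risk returning stale data.
    if not link_up:
        raise RuntimeError("partition detected: refusing possibly stale read")
    return node.data.get(key)

a, b = Store(), Store()
write(a, b, "customer", "Mueller")   # both replicas see this write
link_up = False                      # a network partition happens
write(a, b, "customer", "Wagner")    # replica b misses this write

print(read_ap(b, "customer"))        # stale "Mueller", but a response
try:
    read_cp(b, "customer")
except RuntimeError as err:
    print(err)                       # consistency preserved via an error
```

The point is that during the partition you have to pick one: replica b can either answer incorrectly (AP) or decline to answer (CP); it cannot do both.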
And this guarantee is special, because ultimately the CAP theorem states that it's impossible to have all three; you can only provide two. So there are three permutations. There's a CA database, which is consistent and available. Now, if you aren't partition tolerant, well, then you're not really distributed; it detracts from the whole concept. A CA system is Postgres. It's a single instance of MySQL. It's a single instance of Oracle. That's CA: it's consistent and it's available on a single node, a single instance. There are no partitions of that thing. You can start talking about sharding and those sorts of things, but there are lots of complexities there, and you run into availability and consistency issues within those setups. So a CA database is kind of a special thing; I don't think of it as a distributed system, because the partition tolerance piece comes into play. Where this gets interesting is when you start talking about an AP database or a CP database, and I think those are the two things you want to be thinking about for your workload and whatever it is you want to accomplish with your data system. An AP database is going to be available and partition tolerant. You're going to have access to the data set with no guarantee of getting the same response. This is Mongo, this is HBase, this is Cassandra, this is Couchbase. With these NoSQL databases, that's what you're going to get: eventual consistency. You're not going to have guarantees on transactions and correctness of data all the time. Yet these things are fantastic, because you can scale out, and they can be really great for certain workloads. If you just need to get data to places, it's a really fantastic option.
You're going to survive partitions, and you're going to have the data available all the time. Consistency is a little bit different: you're going to guarantee that every request receives the exact same response no matter who asks. This is Google Cloud Spanner; it's really the original system to go down this path with a CP database, and CockroachDB is actually a mirror of that, just done for everywhere else. So CP is what we're going to talk about. If you're going to implement consistency in a distributed environment, there are really two key concepts, and a whole lot of engineering work at the foundation of what we're doing at CockroachDB, but I'm going to talk about these principles just in general. We're going to drill into the CP side, and we'll talk a little bit about AP at the end. Okay, so Raft is a pretty important concept within the world of distributed systems. It's a distributed consensus algorithm which allows you to provide atomic writes. So what is an atomic write? Just as with the A in the ACID principles, an atomic write means that if I'm going to write something, the transaction completes in its entirety or not at all; I'm not going to get halfway through it. Doing that in a distributed environment can prove tricky, because if you think about a fairly complex insert into a table, it seems like a simple SQL statement, right? But on the backend, the database is actually breaking that down into four or five or ten or fifteen different subsequent operations that have to happen. Do those operations actually commit as a whole? That's this concept of atomicity. Now, Raft is implemented over an odd-numbered group of replicas of the data.
So basically, when I write a record, I'm going to have three copies of it, stored in three different spots. Now, this is configurable within the Raft protocol, however you want to do it, but it's got to be odd: three, five, seven, nine, eleven, whatever you want. And there are different reasons why you would want different sizes of replica sets. Raft is a very chatty protocol: there's constant communication happening between the members of these Raft groups, the three replicas, and it's always keeping time via coalesced heartbeats. As a system, when you're thinking about distributed consensus, and especially consistency when writing data to the database, time becomes very important, because ultimately that's what tells you if transactions are overlapping. We're going to talk about that when we talk about MVCC as well. Okay, so within Raft, whether the group is three, five, or seven, there's always going to be one Raft leader. In CockroachDB we call this a leaseholder, if you're familiar with Cockroach, but the Raft leader is special. It's elected by the group: the three replicas come together and elect it. It's going to coordinate all writes and commands to the followers. So if you're going to write something to this group, if you want to write a record three times, you're going to go through the Raft leader. The Raft leader is going to work with the followers to commit, and as soon as two of three (the leader and one of the followers) commit, great, I have quorum: write that thing. That's really the special nature of Raft and one of the key concepts behind this distributed consensus algorithm. When you want consistency in a distributed database, using this is important.
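The quorum rule just described can be sketched in a few lines (hypothetical function names; this is not CockroachDB's or etcd's implementation). The leader proposes a write to its followers, and the entry counts as committed once a majority of the replica group, leader included, has acknowledged it.

```python
def quorum_size(replicas):
    """Majority for an odd-sized Raft group: 2 of 3, 3 of 5, 4 of 7..."""
    return replicas // 2 + 1

def is_committed(replicas, follower_acks):
    """follower_acks: set of follower indices (1..n-1) that acknowledged.

    Index 0 is the leader, which always has the entry in its own log,
    so it contributes one vote for free.
    """
    votes = 1 + len(follower_acks)
    return votes >= quorum_size(replicas)

print(is_committed(3, {1}))        # leader + one follower: committed
print(is_committed(3, set()))      # leader alone: not committed
print(is_committed(5, {1, 2}))     # 3 of 5: committed
```

This is also why the group size must be odd: with an even group, half the replicas could be cut off on each side of a partition and neither side would have a majority.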
This isn't synchronous or asynchronous replication of data. I don't have two systems where I'm writing into one and synchronizing over to another. No: this is all participants coming together to ensure that you're going to have the right data, and that's all governed by Raft. Now, the Raft leader is also the only member of the group that can serve an authoritative, up-to-date read; it is always up to date. If the Raft leader dies, the two followers will come together, elect a new Raft leader, and create a new copy of that data as well. There's a lot more on this. If you go to The Secret Lives of Data, I don't know who did it, but it's phenomenal: a great graphical description of how Raft works. If you want a little bit more on how leaders are elected and how Raft works, thesecretlivesofdata.com does a great job, and again, here's a QR code for y'all if you want it. The Raft leader also helps us with atomic replication. When I want to insert, or actually do anything, I'm talking to the Raft leader, and the Raft leader is going to make sure that the write happens in each of these places in its entirety. That's a really important concept, because if we're going to ask for data from this distributed system, getting the right data returned, from a CP point of view, is critical. So that's one of the key concepts within Raft that's really important to understand in terms of how the CAP theorem gets implemented. There are similar concepts in Paxos. I'm not going to get into Paxos today; it's another distributed consensus algorithm, and if you want to go check it out, there's lots of information out there. We've implemented Raft in Go, and our implementation is available.
If anybody wants to go check it out, I always contend that the code base of CockroachDB is a bit of a PhD in distributed systems. I know we contribute a lot upstream to etcd's Raft. If you're familiar with the etcd project, which is governed by the CNCF, it too uses Raft, and our work is definitely contributed upstream there. So there's lots more information about Raft; you can go check out the algorithm, and it's all online and implemented in lots of different places. The beauty of open source. Okay. So again, The Secret Lives of Data, it's awesome. Now let's talk about consistency of data when it comes to time. One of the biggest challenges in a distributed system, when you're writing data into it, is this: if you have writes coming through on two services, and they're writing to basically the same object, or you're trying to correlate that data, the concept of time becomes really important. First in, first out; last in, first out; which are you implementing? MVCC is multi-version concurrency control. We're using this to implement serializable isolation in our database. We're going to guarantee, based on time, that each transaction as it comes into the system is going to be correct, and we're going to make sure that's right across every single node in the system. MVCC is described pretty well in the Wikipedia article, actually; it's well done, so if you want to check that out, it's a good read. But I'm going to try to go through this in a quick way. Again, I'm an ex-engineer, I'm just a marketer now, so I'll leave it at that. But I think this does a decent job. So in this description, we have three things we're going to talk through.
There's a transaction, there's a timestamp for that transaction, and then there's a row of data, an object, that we need to write to the system. Those are the three concepts: green, orange (I don't know, I'm a little colorblind), and blue. So let's talk through a very simple transaction. At time zero, up there in the top left, I have a transaction that comes through. Maybe it's a write, maybe it's an update. And that transaction has a timestamp of zero in system time. Now I'm going to say, hey, object, let's write this transaction to you. Maybe I'm updating a customer named Cisneros. That write hits the object at time one, and the object says, okay, great, I update my write timestamp to one second of system time, because that write came through at one second. So the object has two timestamps: a write timestamp and a read timestamp. What it does in the backend is create a kind of temporary object, a marker if you will, in the database, saying: great, I'm not done, but I'm in this half-completed state, and I'm going to write this data to that thing. And I was successful. So at time three, it says, okay, great: my read timestamp gets updated to three seconds, because I've committed this in the backend, and the transaction returns as executed and successful. So at the end of this, we're looking at three timestamps: the transaction's initial timestamp was zero, the write timestamp when it hit the object was one, and it took two more seconds for the thing to actually commit and become readable.
So I have these two timestamps on the object, and those two timestamps are really what's interesting here. Let's try it again, but with a conflict, and you'll start to see how this works. Okay, great. At one second, I come in and my write timestamp goes to one; that's on the right-hand side there. I create this temp object at time two, and then, wait a second: at time three, before I've actually finished committing this in the backend, another transaction arrives against the same object with a timestamp of two seconds. So what happens? The timestamps are compared. This new transaction has a timestamp of two, but the object's timestamps show it's operating on an out-of-date view: it isn't writing against the most recent state, because another write is already ahead of it. Wait a second, we have a problem. So typically what happens here is you get a transaction retry. You need try/catch blocks around these things, because an error is going to be returned. But this is really the magic of it: this concept of temporary objects and these two timestamps, so that you can stage writes and understand when and where they're going to happen. Long story short, each transaction happens like you're standing in line at a store: I can't complete one transaction against an object until the last one against that object is done. Now, it gets pretty crazy when you start to coalesce across multiple different rows and multiple different objects, but this is all governed and maintained by MVCC. When I first understood MVCC, I thought it was pretty cool, actually, and this is the simplest description I could come up with.
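The two-timestamp bookkeeping can be sketched roughly like this (a simplification with invented names; real MVCC implementations keep many versions per key, write intents, and much more). The object tracks a read timestamp and a write timestamp, and a transaction whose own timestamp falls behind the object's read timestamp gets a retry error instead of silently rewriting history:

```python
class RetryTransaction(Exception):
    """Raised when a transaction must be retried at a newer timestamp."""

class MVCCObject:
    def __init__(self):
        self.value = None
        self.read_ts = 0     # timestamp at which the value became readable
        self.write_ts = 0    # timestamp of the latest write

    def write(self, txn_ts, value, now):
        # A writer that started before the object's last committed read
        # would invalidate history that readers may have seen: retry it.
        if txn_ts < self.read_ts:
            raise RetryTransaction(
                f"txn@{txn_ts} is behind read_ts={self.read_ts}")
        self.write_ts = now          # provisional, half-completed write
        self.value = value

    def commit_read(self, now):
        self.read_ts = now           # the write is now visible to readers

obj = MVCCObject()
obj.write(txn_ts=0, value="Cisneros", now=1)   # write lands at t=1
obj.commit_read(now=3)                         # committed, readable at t=3

try:
    obj.write(txn_ts=2, value="Wagner", now=4) # txn@2 < read_ts=3: conflict
except RetryTransaction as err:
    print("retry:", err)
```

The happy path matches the first walkthrough (write at one, readable at three); the second write reproduces the conflict case, where the application's try/catch sees the retry error.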
I hope it describes it. Actually, there's a question in the chat: how does MVCC compare to Google's TrueTime? Is it complementary to it? I don't know the exact details of how TrueTime works. I know that within Google Cloud Spanner, they're using atomic clocks to establish time across all the various components. In CockroachDB, we're using NTP with some logical handling of clock drift behind that. There's a blog post we wrote called "Living Without Atomic Clocks," I think that's the name of it. If you search for Cockroach Labs and "Living Without Atomic Clocks," there's a great blog post that gets into that. I'm sorry, Ashish, I can't give you an exact comparison to Google TrueTime; I don't know it well enough, but maybe that's a good topic. I'll reach out to my friends at Google; maybe we'll have a talk-off on it. It might be a really good session. Sorry about that. Okay, so that's basically MVCC, with my caveat: I'm also just the marketing guy, but I think that does it decent justice. Now, there's a lot of complexity in code to actually get this done, and transactions in a distributed system are pretty complex. If transactions and this sort of stuff were easy, there'd be a lot of systems doing it. There are a lot of algorithms you can use in your own work to simplify these things. So how does this apply to your services? How do you take these concepts and work them into the stuff you're already working on? That's why the CAP theorem is important to understand, but Raft and MVCC, just as algorithms, are pretty awesome too. So, okay, that's consistency. Raft and MVCC are critical components, and that's how we're actually guaranteeing the same data across every node.
But let's talk a little bit about availability; storage and Raft are critical concepts underneath this too. Availability in the old world, with the legacy database, meant this active-passive backup system, right? You dealt with synchronization from the primary system to the secondary: a write would come in, get written to the primary, and then there'd be some synchronization to a backend secondary. That's not distributed consensus; that's just synchronization. Now, this is costly. You have two big machines, two big databases. The synchronization sometimes is out of sync. These things go down. What happens when one goes down and the other comes back? How long does the failover take? What happens when things come back online? How do you remediate the differences between a primary and a secondary? There are some interesting things going on in this world of active-passive systems, like the concept of having shared storage underneath a primary-secondary database. That's kind of interesting, but it still has issues. The concepts of Raft and distributed consensus get really interesting here, because we're not doing synchronization; we're writing with consensus, and using Raft from that point of view is a key piece of this availability story. But we also need to survive regional failures. People typically synchronize from one data center to a whole other one so they can survive that as well. So an active-active database builds on these core concepts; it builds on Raft. An active-active database is a cluster of physical nodes, and every node can accept reads and writes, which is actually a pretty important point here.
If you're going to be active-active, and the database is going to be implemented as a series of different nodes, can every single node accept both reads and writes? That's a big question, because when we talk about scale, we're talking about the availability of something. If you have a larger-scale database with a single write node, and that node goes down, well, your availability goes out the window, because you can't write. So a single write node is actually a big problem, something that has to be addressed. If you're going to be truly active-active, it's reads and writes across every node, and every endpoint can access the entirety of the data underneath. This approach actually eliminates the whole concept of synchronization, but rather than spanning two data centers, you actually want to do this across three, because this distributed consensus replica of three overlays across multiple regions. There are lots of challenges in this, but active-active relational databases are special; there are a couple of them out there, and that's really the concept behind distributed SQL and the emergence of a couple of new databases in this space. So if I think about this availability, we're using Raft. Basically, we're going to write data. Let's take this table: there's a bunch of records with German surnames here. We have four records, and we're going to write them across a distributed system of six different partitions, but we're going to write each one across three of them. So the Mueller record gets written across three different places, and we're using Raft, this replica set, to make sure they're in three different physical locations.
Because now if one of the nodes goes down, I still have access to the data somewhere else within the system, and that's availability. So Raft is used for this as well, and it's a critical piece of what we're doing. The other thing that's really beautiful, as I talked about, is that every node is a gateway. If I'm going to select data down there, user one is actually asking for the Mueller record. It's not on the node they asked, but that node will get it from the Raft leader, which is located at the top middle there. Same thing on the right-hand side for, say, the Wagner record. It's one of these concepts, right? You don't need to know where the data resides, but it's always available. That's another critical concept. But you can also survive regional failures. We have a piece of data written in three different regions. When a region goes down, your load balancer just switches over to another region and you start accessing the data there. You're always going to survive the failure of a zone or a region or whatever your failure domain is, right? That's one of the big things around availability. So, a question came in: Raft has a single leader and every write goes through it, but in an active-active database, every node has to accept reads and writes. How does Raft fit in here? Great question. So we were talking about Raft, how we access the data, and where the Raft leader is. In this next slide, we asked one node for the data. The system itself knows where to find the Raft leader; you don't have to go to the node where the Raft leader lives. Every node is an endpoint, and the system coordinates so that every node knows where to go to get that data. You don't have to go to the Raft leader directly; you just coordinate with it. So in this case, I'm selecting Mueller.
User one goes to the node in the bottom left. That node doesn't hold the data, but it knows where to find the Raft leader, which is in the top middle, and goes to get it. It communicates with all the different partitions and nodes within the database to find that information. So as a user, you don't care. And the beauty of this is you just set up a connection pool with a load balancer in front of it all, and users connect to any node within the entirety of the database and just find the data. You don't have to direct things anywhere; the system itself sorts that out for you. So I hope that helps, but it really comes down to identifying where the Raft leader physically lives, and you're always going through the Raft leader to actually get the data. Okay. And when a region comes back online, you just redistribute the data and nothing was lost. So in the context of what we're doing at Cockroach, we want to make this as simple as possible. This is a little bit about us here. Ultimately, when you deploy an instance of Cockroach database, you're defining a node, or a partition if you will, within a region or within an AZ; you're giving it a location. And we have built-in DML to allow you to survive the failure of a region or an AZ or whatever it is that you want to survive. We've really abstracted this down into some simple DML commands. Check out our documentation; I'm not going to get too deep into this, but we can survive failures by row, by table, or for the entirety of the database. I think our documentation does a great job here. So if you want to learn more about Cockroach database, by all means go check out our docs, use CockroachDB serverless, whatever, right?
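The "every node is a gateway" routing described above can be sketched simply: any node accepts a query, looks up which node leads the Raft group for the key, and fetches the row from there. All names here are hypothetical, chosen for illustration; real systems route at the level of key ranges with cached routing tables, not a single dictionary.

```python
class Node:
    def __init__(self, name, data=None):
        self.name = name
        self.data = data or {}       # rows this node is Raft leader for

class Cluster:
    def __init__(self, nodes):
        self.nodes = nodes
        # Routing table: key -> node that leads the Raft group for that key.
        self.leader_for = {k: n for n in nodes for k in n.data}

    def select(self, gateway, key):
        # The client only ever talks to its gateway node; the gateway
        # forwards the read to the leader, wherever it happens to live.
        leader = self.leader_for[key]
        return leader.data[key]

n1 = Node("node-1", {"mueller": "Berlin"})
n2 = Node("node-2", {"wagner": "Munich"})
n3 = Node("node-3")                  # leads no data at all
cluster = Cluster([n1, n2, n3])

# Ask node-3 for Mueller; it finds the leader (node-1) and returns the row.
print(cluster.select(n3, "mueller"))  # Berlin
```

The point of the sketch: the client connected to `node-3`, which holds nothing, and still got the answer, which is exactly why a plain load balancer in front of all nodes is enough.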
All right, so Raft, again, is a distributed consensus algorithm; go check out The Secret Lives of Data. Can't give them enough credit. And availability is built on the underlying concepts of storage and Raft, okay? So that's a quick tour of the CAP theorem, Raft, and MVCC. Again, the CAP theorem just says it's impossible to have all three of these; you can only provide two. Now, I think it's pretty important to understand that even within an AP database or a CP database, there's a range. Look, I'm not saying Mongo is bad. I'm not saying HBase or any of these NoSQL databases are bad. I'm saying that for certain workloads, certain things are better. If I'm evaluating things: do I want to code against the document model or the relational model first? That's a big question. Does it matter if the data is always correct or not? Does it matter if I have eventual consistency versus serializable isolation? These are trade-offs you have to work through for your workload, and choosing the right database is pretty important in that context. To me, Cockroach fits a lot of different things, and a NoSQL choice probably makes sense for some workloads as well. But it's important, especially in distributed systems, to understand the trade-offs between these two types of systems, because that's where you're going to run into trouble in your applications. The value of the CAP theorem is understanding those trade-offs, and that's why I think it's actually pretty important. Now, we have conversations about this all the time. I'm happy to talk to anybody about it; engage with us. We have a community Slack channel with lots of people who have actually coded this stuff, so you have direct access to the engineers. And if you want to try these things, you can use CockroachDB.
CockroachDB dedicated is a managed service, a dedicated instance that you just start up at cockroachlabs.com. We also have CockroachDB serverless, which delivers this right now, single region, a completely serverless database, free up to certain limits: five gigabytes of storage and a decent-sized application, something like 250 million request units, or RUs. You can actually build on Cockroach as a database for free as a managed service. So if you want to go check it out, please do. For us, this is about making all of this very easy. We want to give developers the relational model. We want to make sure they have serializable isolation. We want to scale out writes, not just within a single data center but across many, and make that really simple inside a single data center as well. We want to automate scale, so you never have to shard a database again. We want to automate operations, so you don't have to worry about ongoing management of these things or setting up active-passive. We want to eliminate downtime, and we've come a long way toward that. And ultimately, we deliver all of these things as a cloud database solution. So if you want to check these things out, please do go to cockroachlabs.com. And that is all of the content. I answered the questions that came in along the way, but if there are any other questions, I'm happy to take them. I hope this was valuable to you all. I enjoy learning about these things, and I certainly enjoy presenting on them and engaging with the community. So again, if you want to engage with us any deeper, we're happy to go through these things. One more question came through: can we integrate other open source code solutions available on GitHub with CockroachDB? I'm not sure what kind of code solutions you might want to integrate.
But there are lots of different ways to integrate with Cockroach database, through the API and otherwise. So I'm going to presume there are lots of open source solutions that will integrate with Cockroach. I mean, we're wire compatible with Postgres, so if I think about the tools we use within the Postgres ecosystem, a lot of them are just going to work on CockroachDB, which is actually pretty beautiful. And we're pretty open ourselves as well. Again, the code base is available on GitHub if you want to go check it out, and like I said, reading it is kind of a PhD in distributed systems. There was another question: can you please elaborate on time in MVCC? Yeah, I'll try. I'm not sure where to go with this, Fahad, but I'll try. Time is really important to get right across the different components of the system when you're trying to coordinate things. There are things like TrueTime and atomic clocks, which are what Spanner uses, that allow you to synchronize time across different machines. So, Fahad, if you look at the clock in your house right now, and I look at this one over here, they're going to be off by maybe four or five seconds on average, right? Maybe our phones are pretty close, but even those are off a little bit sometimes. Getting the concept of time very, very right when you're doing distributed transactions is incredibly important, because that's really what allows you to get the serializable part of isolation, the ordering, correct. You can use hardware, which is atomic clocks, or you can use TrueTime; I think a couple of the cloud providers have that sort of thing. So if you're looking at that across your services, you may want to check out TrueTime.
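One common answer to clock skew without atomic-clock hardware is a hybrid logical clock (HLC), which pairs a physical timestamp with a logical counter so causally related events always order correctly even when machines disagree about the wall time. Here's a minimal sketch of the general HLC technique; it is a simplified illustration, not any production implementation.

```python
class HLC:
    """Hybrid logical clock: (wall, logical) timestamps."""
    def __init__(self, now_fn):
        self.now = now_fn          # physical clock source (may be skewed)
        self.wall = 0              # highest wall time seen so far
        self.logical = 0           # tie-breaker within one wall tick

    def tick(self):
        """Timestamp for a local event or an outgoing message."""
        pt = self.now()
        if pt > self.wall:
            self.wall, self.logical = pt, 0
        else:
            self.logical += 1
        return (self.wall, self.logical)

    def update(self, remote):
        """Merge the timestamp carried on a received message."""
        pt = self.now()
        rw, rl = remote
        if pt > self.wall and pt > rw:
            self.wall, self.logical = pt, 0
        elif rw > self.wall:
            self.wall, self.logical = rw, rl + 1
        elif rw == self.wall:
            self.logical = max(self.logical, rl) + 1
        else:
            self.logical += 1
        return (self.wall, self.logical)

# Two nodes whose physical clocks disagree by 4 "seconds":
a = HLC(lambda: 100)     # node A's clock reads 100
b = HLC(lambda: 96)      # node B's clock runs behind
t1 = a.tick()            # (100, 0)
t2 = b.update(t1)        # (100, 1): B moves past A's timestamp despite skew
print(t1 < t2)           # True: the causal order survives the skew
```

The key property shown at the bottom: even though B's physical clock is behind A's, B's timestamp for the receive event still sorts after A's timestamp for the send, which is the ordering guarantee MVCC timestamps need.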
We've implemented it ourselves within CockroachDB; we actually outlined how we did that in a piece called "Living Without Atomic Clocks." So time is a really, really important piece of MVCC for sure, because the timestamp is a critical piece of it. Let's see, there was another note: "I was reading this about the CAP theorem." Yes, that article, "CAP Twelve Years Later: How the 'Rules' Have Changed." An anonymous attendee dropped it into the QA; I'm going to copy and paste it over to the chat, and maybe the Linux Foundation folks can help send it out to everybody. It's on InfoQ; search for "twelve years later, how the rules have changed for CAP theorem" and you'll find it. It's written by Eric Brewer, and it's a phenomenal read. I would go check it out. He investigates the trade-offs between C and A, CP versus AP, and how the property that's left out is really a range. I think that's the best way of thinking about these things. So whoever dropped that in the QA, thank you, because that's a great article. Another anonymous attendee: would you please mention the target workload types for Cockroach database? That's a great question. It's self-serving for me, but I'm happy to talk about it. Workloads on top of Cockroach: anywhere you would use Postgres. Really, literally. Even in a single region there's lots of value: you don't have to worry about scale; you just throw a node at the database and it scales. You don't have to worry about sharding anymore; it gets rid of that concept. And this concept of resiliency being built into the system is a distributed systems concept, and it's really, really important. You don't have to worry about active-passive.
Just those two things alone, removing the operational complexity around scale and resiliency in a single data center, are hugely valuable, right? Thinking about Cockroach in a single data center is great, but it also allows you to expand to multiple regions and across the globe, so the concept of global scale becomes really important too. I always think of simple applications in a data center, but if you want to grow globally as well, that's also a great workload. And finally, your comfort with the document model versus the relational model sometimes has input into the workload and what you're actually working on. Doing online schema changes, having referential integrity, doing joins, secondary indexes: from a data point of view, I find those things to be pretty important. But then again, I was indoctrinated into SQL long ago, when I was an engineer. So what's right for your workload is really going to drive the type of database you're going to implement. For a pure relational cloud database, I'm a homer, but I'm just going to say Cockroach database. So it's really between relational and NoSQL, and it comes down to your level of comfort with consistency and with the document model versus the relational model, and the trade-offs between those two things. It's really up to your workload, what you want to accomplish, your goals, and what your team is comfortable with. So, cool, there are no more questions. We're at 45 minutes past the hour, so I think I'm about right on time. With that, I sincerely want to thank everybody for joining. I know it takes a lot to take time out of your day for this. I thoroughly enjoyed talking about this stuff, and I hope you're all interested in it.
Go research and check out some of these things; I find them to be compelling. And most of all, I really hope this was valuable. If there's any feedback, please do find us on Twitter, find me on Twitter, join our Slack channel, any of those things. So thank you, everybody, again for joining, and have a great day. Yeah, thank you so much to Jim for his time today, and thank you to all the participants who joined us. As a reminder, this recording will be on the Linux Foundation YouTube page later today. We hope you're able to join us for future webinars. Have a wonderful day.