Great. Thank you very much. We are recording this webinar. Good morning, good evening, good afternoon, and good day to everybody. My name is Jim Walker. I'm the VP of product marketing here at Cockroach Labs, and I'm coming to you today from Denver, Colorado, where it's a beautiful day. We have a database that is cloud native, a database that was really built and architected on some core distributed principles that I think have an impact on the way all of us think about our applications and services as we move to the cloud and deploy applications. I'm hoping that this talk, while an overview of CockroachDB, also opens your eyes to some special ways of thinking about data, and about systems in a distributed world, and the way we approach these things. People often ask us what level this talk is at: it's a little bit more intermediate. I'm not going to be hands-on in code, but it's definitely not going to be at the 10,000-foot level either; we're going to dive into the weeds. We'll talk about Raft, which is a distributed consensus protocol; we'll talk about distributed transactions and how to think about them in a distributed system; and we'll talk about multi-version concurrency control, another really important algorithm to understand.
As we build out these distributed systems, we also have to deal with the effect of the speed of light on what we do. As you move toward being distributed, you no longer think about a single location; you have to think about multiple locations, and the speed of light is no joke. We can't beat it, at least from what I understand, and while I can't wait until we do, I don't know if I'll see that in my lifetime. So we have to deal with the speed of light, and I'll talk about how. Then we'll get into transaction pipelining and some more advanced topics in Cockroach, and hopefully along the way this is valuable to you as more than just an introduction: a look under the covers at what CockroachDB does, and I hope it opens eyes from a development point of view. Now, by way of introduction: this database was purely architected for the cloud. This is not lift and shift; it's not taking Oracle and running it in a container in the cloud, or deploying Mongo in a way that is sort of distributed but not truly cloud native, not aligned with the core principles. How do we actually deal with these things in broadly distributed environments? It's not "move and improve"; it's not taking something and changing one piece of it to make it distributed. There is an emerging group of databases that are truly distributed, and when we think about distributed systems, I think looking at the code bases of these systems is truly valuable. I like to think of the CockroachDB code base as almost a PhD in distributed systems, and it's all open source, so check out our Git repo, especially if you code in Go.
It's an extra bonus because it's all Go code, so it's a great implementation to learn from. Things that we do in Cockroach get contributed upstream; for example, our Raft work has had many patches posted against etcd and etcd/raft, which is a core component of Kubernetes as well. Some of the things we run into in terms of being truly multi-master in these broadly distributed environments, where the speed of light actually has an impact on how quickly things commit, feed back into etcd and how it uses Raft for atomic replication. That stuff goes upstream, and I look at those projects as core components that are great for learning a lot about distributed systems and how to code these things. I always learn through code, so I find it the best way to go. So: really cloud native, and it's distributed SQL. Matt Aslett, an analyst at 451 Research, started calling this space "NewSQL" about eight or nine years ago, and just a week or two ago he published new research where he agreed this is really "distributed SQL"; it's more descriptive of what these databases are doing. For me there are really five key requirements for distributed SQL. Number one, it has to be SQL; we're not talking about a new language. Let it be familiar and let it work with all the things we already use. You have to build scale into the system; that's a core principle of distributed systems. And you have to build resilience into the system, not around it.
Don't surround it with technology to make it resilient. And make it available everywhere; that's another core principle, two of them really: resilience, and availability everywhere. If we're going to be a relational database in this new world, having transactions and well-understood guarantees around consistency is also really important. And finally, tying data to location is the only way we're going to get past the speed of light, period; that's lock, stock, and barrel a core concept when you think about a database in a distributed system. We have lots of conversations with people about this in our community and among our customers. Typically when you deploy a database, you prop up Postgres or whatever, you think about tables and columns and integrity, and you architect your logical data model. What's interesting when you move toward distributed systems and distributed SQL is that you have to start thinking about the physical model too, and about how to simplify the physical model so that data lives closest to the user. You have to start dispersing data throughout a cluster so that you can survive the failure of a node, a rack, an entire availability zone, a region, or, for that matter, maybe a Kubernetes cluster. You have to start thinking about latency, resilience, and what you want to survive, because that's what distributed systems are all about, and I think this is where we're headed with everything from a cloud native point of view. These are the core concepts that really define distributed SQL. Now, as a quick overview, let me run through it really quickly.
This is how we talk about Cockroach at a very high level, and then I'll get really deep into the weeds really quickly. CockroachDB is a cloud native, relational database. We built an interface on it that is wire compatible with Postgres, so this is just the familiar SQL database we're all used to for these transactional, heavy-read workloads. When Kubernetes first started, people said, "Oh, it's for stateless workloads," and asked how to do state; I just think of how every application typically has a database, and this is a database that was built and architected for this new world. In fact, CockroachDB is the spawn of Spanner: we actually built off of the Google Spanner white paper. If you look at Jeff Dean, Sanjay Ghemawat, Eric Brewer, and all they've done at Google, what they've contributed to the world is phenomenal. We took that paper and built the database so that it's available everywhere. In fact, Kelsey Hightower, a luminary in the Kubernetes space, tweeted a couple of years ago: CockroachDB is to Spanner as Kubernetes is to Borg. We are very well aligned with Kubernetes, and I think that's a core piece. But it's also just, basically, a relational database. A cluster is a series of nodes; all we do to scale the database is spin up a node, point it at the cluster, and the database takes care of distributing the data. Every node in Cockroach can accept both reads and writes; every node is an endpoint, which is another critical point we'll talk about in a bit. And if we can scale within a single region, we can actually scale across multiple regions; this could even be multiple clusters.
And how do you implement a single logical database across multiple different clusters? Don't federate the clusters; manage each one of them and federate at the data layer instead. I know for those of you interested in federation of Kubernetes clusters this is very interesting, but we can also do this across clouds. We can be multi-region, and multi-cloud as well: I can have a single logical database where any endpoint can access data across the entirety of a database deployed across three different cloud providers, which I believe is truly phenomenal; it's one of those things that really excites me about Cockroach. It's really resilient: we can survive the failure of a node, a rack, an AZ, or an entire region. It just depends on how we distribute data within the cluster, because we're actually writing data in triplicate. In this case, maybe I have one copy of the data in each cluster, so that if one cluster goes down I still have the remaining two copies. We can change the replication factor, and this distribution of data really comes down to how you want to place data, both for access latency and for resilience; we're doing it at the row level in Cockroach, and we'll show you that as well. Again, every node is an endpoint: I can ask for data on, say, the US West cluster, and it will be able to find that data on, say, the US East cluster. But I can also do things like make sure data for East Coast users stays closest to those users, so I can deal with the speed of light by getting data very, very close to users. People love this as well when they start thinking about the data sovereignty and data privacy laws that are in place.
Can I have German data live on German servers? That's just part of the setup in Cockroach, so we can start to help with some of these complex data compliance regulations that are out there as well. Okay, so let's get into the details. That was, I guess, about nine minutes of intro and about five on top-level Cockroach. Going out and trying it is the best way: CockroachDB Core, the open source version, you can download and start working with yourself, or go to CockroachCloud and spin up a cluster and start playing with it today if you want. Okay, so how do we do all this? Ultimately I think of every database in three layers. There's a communication, or language, layer: how do I interact with it? To me that's just SQL. We made a choice a long time ago to be wire compatible with Postgres, and we've done a lot of work to build out that SQL syntax so that it is familiar to people. I learned SQL in college, and I've been speaking SQL my whole life. At the very bottom is the storage layer: ultimately a database has to actually write to disk at some point; that's the whole point. And in between the language and the storage is execution. Every query, say SELECT * FROM customer, gets broken down into, say, three different operations: a begin, the statement itself, and then a commit, and more complex queries get broken down into lots of different operations underneath. The database does that for us; we mere mortals don't think about the transactions inside, unless you're optimizing queries or you're a DBA. But when you're building a database, this is where things can go wrong.
If it were easy to build a database, everybody would be doing it. It's all the corner cases across all the different types of transactions, and the issues that can happen, where things get really, really interesting, and doing this in a distributed system creates some significant challenges from an architecture point of view. Now, ultimately, at the lowest layer, the storage layer, Cockroach is implemented as a KV store; CockroachDB is actually a database on top of a database. Cockroach uses something called Pebble. Pebble is something we built and launched about a year ago; we originally built on something called RocksDB, and we essentially refactored it for some interesting things we need to do around multi-tenancy and rewrote it in Go. So that KV store is the lowest layer of Cockroach, and ultimately, for Cockroach, every table and every row is ordered in one monolithic, logical key space. You could think of this dogs table, ordered alphabetically for all the dogs; those are the keys, and everything is ordered, in every single table. Now, in a traditional database, say you have an inventory table. Every time I write something, glove, ball, shirt, shoes, bat, and if I want to write "ball" again, I append another record at the bottom of the storage for the inventory, and I use an index to access these things quickly. That's the traditional model, and that's structured data. Structured data, to me, is elegant; it's what allows us to model a database. When you start to deal with JSON objects and that sort of stuff in a document-model database,
once you get above 10 or 12 objects, making changes to these structures becomes really difficult, because you're relying on a translation of them somewhere else and not in the database itself. Let the database maintain the structure; for us, having these relational structures is a core, foundational component of what we're doing. But how do we take this and approach it in a different way so we can actually be distributed? Like I said, we're a KV store underneath. This tabular data is all stored in order, using a KV pair where the key is the table name, the index key, and the column name, and the value is the column value. Let me show you how this works. The dogs table has id, name, and weight; simple DDL, and there are some entries over on the right-hand side. Let's look at this table with four entries. Each row gets broken down into records in the KV store: first the table name, then the key, which is the id, here 34, then the column, name, and the value, Carl. Then: table dogs, id 34, column weight, value 10.1. Every single row gets broken out like this. Now, we encode the key down into hex so that we get really wonderful, very fast sorting, but ultimately, if you look at the keys, they are ordered. If you want to insert something, we know exactly where to insert it into the table based on the key, and everything is always going to be ordered in Cockroach. Meanwhile, the database manages all of this in the background, unbeknownst to the developer, who is just writing a SQL query.
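To make the key layout concrete, here is a minimal sketch of that row-to-KV flattening. The path-style string keys are purely illustrative; the real CockroachDB encoding is a compact, order-preserving binary format:

```go
package main

import (
	"fmt"
	"sort"
)

// encodeRow flattens one row into KV pairs shaped like
// /<table>/<primary-key>/<column> -> <value>.
// Simplified, human-readable sketch of the idea described in the talk.
func encodeRow(table string, id int, cols map[string]string) map[string]string {
	kvs := make(map[string]string)
	for col, val := range cols {
		key := fmt.Sprintf("/%s/%d/%s", table, id, col)
		kvs[key] = val
	}
	return kvs
}

func main() {
	kvs := encodeRow("dogs", 34, map[string]string{"name": "Carl", "weight": "10.1"})
	// Sorting the keys mimics the ordered, monolithic key space of the KV store.
	keys := make([]string, 0, len(kvs))
	for k := range kvs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s -> %s\n", k, kvs[k])
	}
}
```

Because every key embeds the table, primary key, and column, plain lexicographic ordering is enough to know exactly where a new row belongs.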
And we're doing this because it allows for massive efficiencies in the way we sort data, and it allows us to distribute data in a very intelligent way. We take this entire table and break it down into contiguous 512-megabyte ranges (we recently increased this from 256 to 512). These ranges are small enough that we can move them quickly, while making sure the overhead of the index used to find them stays appropriate from a performance point of view. So let's take this table and break it into three ranges: Carl through Jack, Lady through PD, Pinto through Z. These are all my friends' dogs' names, honestly. To find these different ranges, we create an index structure, implemented very much like a B-tree, if you're familiar with that algorithm and how it works; that's how we actually find these ranges. When I want to insert a record, I talk to the index. It knows where to find it: it's in that red range, great, I go to the red range. The red range says, yes, I have space, insert that record; wonderful, Sunny is now inserted. What happens when I want to insert another record into that range, say Rudy, and it doesn't have space? It says, okay, let me split that range, create a new one, and make all this extra space; now I have space. It sounds really simple, but it's really complex to do, and ultimately the value here is that you don't have to worry about horizontal sharding anymore; we've just done it.
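The split behavior can be sketched with a toy range that splits when it overflows. This is an assumption-laden illustration: real ranges split on byte size (hundreds of megabytes), not key count, and the split is coordinated through Raft:

```go
package main

import "fmt"

// Range holds a sorted run of keys with a capacity limit.
// Toy model: real CockroachDB ranges split on byte size, not key count.
type Range struct {
	Keys []string
	Max  int
}

// Insert adds a key in sorted position; when the range overflows it
// splits at the midpoint and returns the new right-hand range.
func (r *Range) Insert(key string) *Range {
	i := 0
	for i < len(r.Keys) && r.Keys[i] < key {
		i++
	}
	r.Keys = append(r.Keys[:i], append([]string{key}, r.Keys[i:]...)...)
	if len(r.Keys) <= r.Max {
		return nil // still fits, no split needed
	}
	mid := len(r.Keys) / 2
	right := &Range{Keys: append([]string{}, r.Keys[mid:]...), Max: r.Max}
	r.Keys = r.Keys[:mid]
	return right
}

func main() {
	r := &Range{Keys: []string{"carl", "jack", "lady"}, Max: 3}
	right := r.Insert("sunny") // overflows: splits into two ranges
	fmt.Println(r.Keys, right.Keys)
}
```

The index layer above the ranges just needs to learn about the new right-hand range; the application never sees the split.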
These ranges can be thought of as shards or tablets, and the database just automatically does all of this. There's no sharding layer around MySQL or Postgres; this is literally at the storage layer of the database, so there's a lot here for performance and, more importantly, for distribution. Being truly cloud native and architected from the ground up to do this is actually paramount. Let's not just reuse MySQL and amortize this range splitting above a set of MySQL instances; let's actually go in and rework the database at the storage layer to do it, and that's exactly what we're doing here. We use Raft extensively in Cockroach. If you're not familiar with Raft, it's a pretty important protocol, or algorithm I should say, if you're doing distributed systems; I really think it's an important thing to understand. It's a distributed consensus algorithm that allows us to provide atomic writes and consistent reads. Somebody's actually asking: how is data integrity taken care of, for example, when updating the same row from two different nodes and locations? I'm going to show you how we do that in a second; it's a combination of Raft and MVCC, those are the algorithms in here, plus a lot of really magic stuff our software engineers built. That was a question in the Q&A; sorry, I didn't actually say what was going on, my bad. Raft is a distributed consensus protocol that gives us atomic writes and consistent reads across the distributed system. Raft is implemented as a series of replicas: within the protocol there's the concept of a replica set, or a group of replicas. Here you have that blue range from the last slide; I think it had Muddy in it.
Right, so I have three replicas of that range. We can do five, seven, nine; it has to be an odd number of replicas, because we're going to get something called quorum writes, and this relates to the question in the Q&A. When we write to one of these replicas, we make sure the write holds because two of the three replicas have to confirm that they have the right data; that's the quorum write. There's the concept of a Raft leader: the leader is elected among the three participants, and the other two are followers. The Raft leader is a special replica; it handles all the authoritative updates and is, I guess, the system of record for that replica set, so it's a really important piece. This allows us to do atomic replication, because we can say, hey, Raft leader, insert this data into your range, and it's going to make sure that happens, and that it's correct across the entire group. It ensures consistency: once two of three commit, the third one has to come along, and the Raft leader is always going to be right. Now, if you want to learn more about Raft, there's a really wonderful website, I don't know who built it but I just want to thank them personally, called The Secret Lives of Data. Go check it out; it's pretty straightforward but it gets much deeper into the protocol than the quick, high-level overview I'm giving here in the context of what we're doing. The QR code there should work for you if you're interested, and again, these slides will be available afterwards.
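The majority arithmetic behind quorum writes is simple to state in code. A sketch of just that piece (the real protocol also involves terms, leader election, and log replication):

```go
package main

import "fmt"

// quorum returns the number of replicas that must acknowledge a write
// before it is considered committed: a strict majority.
func quorum(replicas int) int {
	return replicas/2 + 1
}

func main() {
	// Odd replica counts maximize the failures survivable per replica.
	for _, n := range []int{3, 5, 7} {
		fmt.Printf("%d replicas -> quorum of %d, survives %d failure(s)\n",
			n, quorum(n), n-quorum(n))
	}
}
```

This is why the talk insists on odd replication factors: going from 3 to 4 replicas adds cost without letting you survive any additional failures.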
Okay, so now I've gone through Raft, and through the storage layer and how we break things out into ranges so we can store them in a KV store. Now, how do we use Raft to actually distribute data? When we replicate data in a cluster, we want flexible options: different signals that drive how we move data around and where data lives within this distributed cluster. One of the things we think about is diversity, or balance of utilization of, say, storage across the cluster. I think the simplest way of thinking about it: take this range, this Raft group, and the four physical nodes over on the right-hand side. We'll write the first Raft group across them, then the second, then the third. Now I've evenly distributed this data across four different nodes, and it could be nine, or 15, or 100 if you will; Cockroach is going to be smart enough to evenly distribute the data based on what we want to survive, or how fast we want users to get access to their information. Another heuristic we look at is load. Let's say this middle range, Lady, Lula, Muddy, and PD, is really, really popular, because Muddy, my dog, is the best dog ever, and everybody wants access. I'm joking.
So we can write those ranges onto nodes that are less busy: hey, this range is a little hot, let's isolate it on its own node so it doesn't hurt the performance of the database when lots of things are in conflict on a particular node. We'll do that as well, which I think is kind of cool, and we're using heuristics like this all the time in the database. One second, let me just make sure this has gone away. All right, so that's placement by load. And then finally, we can do something really special, something we call geo-partitioning, which is a low-level concept in the database that allows us to write data to a particular location. When you spin a node up in Cockroach, you name that node: you define it as living in a region, a set of nodes that are logically composed together. Imagine I have three regions here, US West, US East, and maybe EMEA over in Portugal, with three nodes in each. Now, at the table level, at the row level, I can overload the key: I can add a column to the key. Say there was a country code in each one of these dog records, with EU as a country code. If I overload the KV key with that, using the encoding I went through at the very beginning, then when everything is ordered, all the records with EU are at the top, everything US is in the middle, and so on. We're now ordering this lexicographically ordered KV set using data that's actually in each row. And now I can say at the table level: hey, I want all records with this value in the key to live on servers in a particular location.
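The key-overloading idea can be illustrated by prefixing each row key with the partition column and sorting. The key shapes here are hypothetical, for illustration only:

```go
package main

import (
	"fmt"
	"sort"
)

// regionKey prefixes the row key with a partition column (a country
// code here) so that rows from the same region sort together.
func regionKey(country, table string, id int) string {
	return fmt.Sprintf("/%s/%s/%d", country, table, id)
}

func main() {
	keys := []string{
		regionKey("us", "dogs", 34),
		regionKey("eu", "dogs", 17),
		regionKey("us", "dogs", 2),
		regionKey("eu", "dogs", 90),
	}
	sort.Strings(keys)
	// After sorting, all /eu/ keys are contiguous, so the ranges
	// holding them can be pinned to nodes in European regions.
	for _, k := range keys {
		fmt.Println(k)
	}
}
```

Because ranges are contiguous slices of the sorted key space, making the partition column the key prefix means each range holds rows from exactly one region, and placement rules can then pin those ranges to matching nodes.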
And this is some really magic stuff. It ties together what we're doing at the storage layer with Raft: how we distribute this data, where it lives, and the rules we use in the database to do this. It's a pretty cool concept, enabled by lots of things going on across the entire architecture of the database. So we can do things like scale. How do I scale this out? When I spin up a node and point it at a cluster, Cockroach is smart enough to understand: oh wait, I have this new node, I have new capacity. What will happen is it rebalances the data; it starts moving ranges around. You don't have to do anything, it just does this. So again, horizontal scale without any work, which I think is pretty phenomenal. I don't have to worry about application logic dealing with new shards; you just ask a node for data, and it's going to find it within that Cockroach cluster. We can also survive the failure of a node itself. Oh my gosh, this Raft group here is missing one of its partners: get this thing back. It's smart enough to understand that it lost one of the replicas in its replica set, the three copies, and it creates a new one somewhere else within the cluster. That's pretty cool as well. But we can also survive smaller failures, a small little hiccup in time, where we just use the logs to catch things back up. The Raft leader makes sure all the replicas are right all the time, and it will basically replay and get things back in order across all the replicas. So this is really cool in terms of scale and resilience.
I think another feature of distributed systems, when I think about Kubernetes and pods and how we deploy compute, is automating rolling upgrades within a system, and Cockroach was built for that. It's able to spin a node down and spin it back up, and with the scale and resilience I just described, it basically survives those small failures. Well, then I can just bring a node down and bring it back up with a different version of the software. I think we're backwards compatible up to two major versions, which is a year, sometimes a year and a half, so Cockroach can actually do that too. These rolling upgrades are another key, cool thing, and I think if you design your application correctly, you too should be able to get rolling upgrades. If you build scale into the system, if you build resilience into the system, and it's automated so the system understands as a single entity that when it loses something it can come back from that, then you get to this concept of rolling upgrades, which is really, really powerful, especially as we get to production and we want always-on, always-available services no matter where they are on the planet. So, another core concept. Okay, I'm going to take a quick drink of water. So let's talk about transactions. There was a question in the Q&A about transactions: how do we ensure consistency of transactions? This is not something that's very simple to do. If you have a database with a transaction happening in Sydney and a transaction happening in New York, and both are against the same account or the same customer record, who wins?
I'm going to go through that. I'll go through MVCC and how we actually deal with these kinds of conflicts, and what it means for you developers: sometimes you have to wrap things in try/catch blocks. It's probably a best practice anyway; I was a hack programmer, I never did that sort of thing, but that's what we're doing, because in Cockroach we guarantee serializable isolation, which is the top isolation level in a database. If you're not familiar with isolation levels, by the way, I wouldn't be surprised; a lot of developers don't even think about this stuff. It actually gets pretty important when you start thinking about the things that can go wrong in a database: a non-repeatable read, a dirty read, a phantom read. There are lots of issues that can affect your data, and as we've seen deep hacks coming in from all over the planet, thinking about these things in your database is becoming more and more important as we get into this more global environment. So I think isolation levels are really, really critical. So, how do we implement these distributed transactions? We're going to optimize for the serializable isolation level, and we're going to optimize for the speed of light. So what are we doing? At the core, every database is ACID: atomicity, consistency, isolation, and durability. What we're talking about here is really the I, the isolation, in our database, and this is a quick walkthrough of how we do it.
And that just works on the happy path; I'm not going to get into the corner cases where things can go wrong, but I'll show you a little about how we handle them. Here's a simple transaction: I want to insert these two records into the dogs table. My ranges have been dispersed across the four physical nodes we've been working with. I'm going to ask any one of these nodes to insert the records; I've purposely drawn this gateway in green because it matches the green node over there. Remember, any node in Cockroach can be the endpoint; I can ask any node to do this. Now it finds the Raft leader for the range where the first part of the transaction, Sunny, belongs. That leader creates a special kind of system record that basically says: we have a pending transaction on this range. Then it communicates with its followers: hey, followers, write this, but write it with a temporary, provisional record status. As soon as one of those followers comes back (I need two of three, because I need the quorum to be guaranteed), it's: great, go on to the next step. Look at this: the acknowledgement is still in flight and I've already started the second step, the second part of the transaction. Now I write Aussie; it goes out and does the same thing, and as soon as something comes back, it returns an acknowledgement. Now the Raft leader sets this transaction to committed; it communicates back to the gateway, the transaction is finalized, and I can send the acknowledgement back to the originating user who asked me to insert these two records into the database.
So that's roughly how ranges are used in the context of a transaction. You can imagine complex transactions touching many ranges, and CockroachDB handles the movement and placement of all that data while these transactions occur. Now, there's another core concept in distributed systems: multi-version concurrency control, or MVCC. If you're not familiar with it, I think it's a pretty cool algorithm. Funny enough, I find the Wikipedia article on it pretty valuable if you want to read how it works. Think of it as how we get the isolation level in our ACID transactions in a distributed system, so let me walk through the algorithm quickly. Again, me talking about it makes it sound simple; kudos to our engineering team, because implementing this in a single system is difficult, and doing it in a distributed system is truly tremendous. I'm going to walk through a very high-level version, and if you're really interested in how we did it, go check out our code, which is all available. In MVCC there are three base components: a transaction, a timestamp, and the object we want to affect. When a transaction occurs, at time zero we create a timestamp for it (this is arbitrary time, not seconds, just steps) and say, write this data to this object. The object itself carries two timestamps, a read timestamp and a write timestamp. The object says: hey, a write came in.
It records exactly when that write came in; the last write arrived at time 1. Then it creates a temporary object, because the original object should always be in a good state: if I'm doing work, I do it in the temporary object. I write the data there, and the work of doing everything and folding it back into the original object takes about two steps, say. Now the read timestamp is up to date at time 3, and I can send the acknowledgement back to the originating transaction saying everything's good. What's cool is that each row now has a read timestamp and a write timestamp: the write timestamp is the last time a write came in, and the read timestamp is when it fully committed. So let's do this again. We start again, write at timestamp 1, and create the temporary object. And now, while I'm doing work in that temporary object (because I want the original object to stay correct), another transaction comes in and says, I want to write this data, and my timestamp for transaction 2 is time 2. The object says: wait a second, your timestamp is greater than my read timestamp, and I haven't finished the other work yet, so this can't proceed.
So in the time it took me to do that work, another write came in, and you have to deal with it. Here we're simply going to reject that transaction, because the conflicting work hasn't finished. And that's on a single object; across fifteen different objects I might have to roll back and unwind all sorts of things, and we've built all of that handling into the database. This is for a row of data in CockroachDB, but it's a very generic algorithm: as we think about data and code in our systems, how do we deal with these conflicts when anybody can ask an object for data at any time? It's a really critical core algorithm in distributed systems, and I think it's pretty awesome. Long story short, it's like standing in line: every transaction happens in order, which ensures the till never comes up wrong. But then again, I'm just the marketing guy; this is a top-level understanding of how this stuff works. So, let's go now to SQL execution. Actually, there was a question in the chat: are any locks taken when someone is accessing a row or a range of rows for DML operations? That was partly covered in how we handle transactions: there's that special record saying a transaction is pending, and then we use MVCC to ensure correctness throughout. Another really great question: what's the minimum number of nodes needed to have a database up and running? I think it's one or three.
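The walkthrough above can be condensed into a toy MVCC object with a read timestamp and a write timestamp: a writer stages its change in a temporary copy, and a second write that arrives before the first one finishes is rejected so the client can retry. This is a deliberate simplification of the idea, not CockroachDB's implementation.

```python
class ConflictError(Exception):
    pass

class MVCCObject:
    def __init__(self):
        self.value = None
        self.read_ts = 0     # when the last write fully committed
        self.write_ts = 0    # when the last write arrived
        self.pending = None  # temporary copy while a write is in flight

    def begin_write(self, txn_ts, value):
        if self.pending is not None:
            # Another write is still in flight: reject so the caller retries.
            raise ConflictError("conflicting write in progress")
        self.write_ts = txn_ts
        self.pending = value          # stage work in a temporary object

    def finish_write(self, commit_ts):
        self.value = self.pending     # fold the temp object back in
        self.pending = None
        self.read_ts = commit_ts      # now fully committed

obj = MVCCObject()
obj.begin_write(txn_ts=1, value="Sunny")

# A second transaction arrives at ts=2 before the first one commits.
try:
    obj.begin_write(txn_ts=2, value="Aussie")
    conflicted = False
except ConflictError:
    conflicted = True

obj.finish_write(commit_ts=3)
```

After this runs, the object holds "Sunny" with write timestamp 1 and read timestamp 3, and the second write was rejected, mirroring the example in the talk.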
In my opinion you can do two, but that doesn't make a whole lot of sense: if you're going to have three replicas, you want them across three places, or maybe it's all on one node and you don't care about resilience. Typically three or four nodes is the sweet spot, because with replicas spread across separate nodes you can survive the failure of one node and stay resilient. We'll often see people do three or four per region, and if you want to survive the loss of an entire region, something like four in the west, four in the east, four in the central, whatever that is. It really depends on what you want to accomplish with the database, and I think the easiest way is just to use CockroachDB Cloud and let us manage how that all works. Okay, so now distributed SQL execution. We've talked about the KV layer; now let's talk about how we push compute down to where the data lives and compose queries so that the execution itself is distributed. I like to think about this in terms of my background: I was originally at a company called Hortonworks, and the whole concept of MapReduce originated at Google too, in another Jeff Dean and Sanjay Ghemawat white paper. The core concepts of MapReduce were applied in Spanner, and they're applied here as well. This isn't "let's coordinate MySQL instances" or "let's coordinate Postgres instances": this is a complete rework of the execution layer of the database to be distributed, and there are significant benefits in doing so, certainly from a performance point of view.
Importantly, distributed execution matters when we get into cost-based optimization and the cost of transactions, and how we optimize the database over time. Reworking and re-architecting not just the storage layer and the language but the execution layer itself, rather than wrapping existing execution engines, is really critical, and it's something we've done. So let's take a simple query: count the number of customers per country across my customers table. I push that query down to one of my nodes, and we perform scans locally in each of the regions. We do the group-by in each of those regions (that's the map), send the partial data back and merge it (that's the reduce), and return the result to the original gateway. Again, we chose to redo everything this way ultimately because it lets us do cost-based optimization, and if you're familiar with traditional databases, the CBO is such a critical piece of how you keep improving the performance of transactions. People come to us saying their P99 latencies have to be under five milliseconds and asking how we're going to do that, and we work really hard to help with exactly those things. Okay, now let's talk about latency. I've got about 20 minutes left and probably 12 more minutes of material, so let's take a step back. Back in the day, when you built applications, you had a server and a database; for me it was so long ago that it all lived in one single box.
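The count-by-country example can be sketched as scatter-gather: each region computes a partial group-by over its local rows (the map step), and the gateway merges the partial counts (the reduce step). The region names and row data here are purely illustrative.

```python
from collections import Counter

# Rows of the customers table, partitioned by the region where they live.
regions = {
    "us-east": ["US", "US", "FR"],
    "us-west": ["US", "JP"],
    "eu-west": ["FR", "FR", "DE"],
}

def local_group_by(rows):
    # Runs where the data lives; only small partial counts cross the wire.
    return Counter(rows)

def gateway_merge(partials):
    # The gateway reduces the per-region partial aggregates into one result.
    total = Counter()
    for p in partials:
        total.update(p)
    return dict(total)

partials = [local_group_by(rows) for rows in regions.values()]
result = gateway_merge(partials)
# result: {'US': 3, 'FR': 3, 'JP': 1, 'DE': 1}
```

The win is that full rows never leave their region; only the small per-country counts travel to the gateway, which is the same intuition behind pushing aggregation down in a distributed SQL engine.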
Then you had a database server and an application server, but you literally had Ethernet between the two, which was really fast. Now we're distributed and users are all over the place, and the speed of light is actually a factor: round-trip times of 12 or 70 milliseconds, which across multiple transactions can be an issue. We also do things in our implementations today to cope with outages. How do we do backups? How do we have a primary and a secondary database so we have failover? That's the active-passive pattern we've had in place for years, for decades. To me, active-passive is a thing of the past, and that comes back to core distributed-systems principles: everything is active. How do you make every one of your nodes, every piece of compute you're building out, active, instead of having failover systems that are just wasted capacity sitting unused? And honestly, asynchronous replication will only ever be so good, because again you're dealing with the speed of light and things aren't always in sync. What happens when the primary goes down, you bring up the secondary, and then the primary comes back? How do you reconcile the differences between the two? There are lots of problems with this active-passive system. In an active-active system, what you're doing instead is taking smaller instances and deploying them in many different places.
Say we have five different regions, so we can guarantee sub-40-millisecond round-trip times no matter what happens. When a user accesses data in New York, they can go to US East and get about a 24-millisecond read; over on the West Coast, maybe someone in Arizona gets a response in about 31 or 32 milliseconds. But what happens when an entire region goes out? Well, if we've distributed the data correctly across these regions, that user in Arizona can just be redirected to US West, and look, the round-trip times are about the same, so we really haven't lost anything. There are ways of dealing with these latencies and outages that get really important once you have an active-active system. Backup and restore is interesting here too. If you have a distributed system, why would you back everything up to one big server sitting in Oklahoma? You want distributed backup and restore: the same way you have a distributed system servicing reads and writes through every node, you want backup to be distributed as well. That's another core principle to think about as we go distributed: all the other things that have to be distributed along with the data. That's one of the big complexities as we move toward this distributed mindset.
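The redirect-on-failure idea can be sketched as latency-aware routing: send each user to the region with the lowest round-trip time, and if that region goes down, fall back to the next best. The latency numbers below are illustrative, in the spirit of the Arizona example above.

```python
# Round-trip times from a hypothetical Arizona user, in milliseconds.
rtt_ms = {
    "us-west": 31,
    "us-central": 45,
    "us-east": 62,
}

def route(user_rtts, healthy):
    """Pick the lowest-latency region that is currently healthy."""
    candidates = [(ms, region) for region, ms in user_rtts.items()
                  if region in healthy]
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates)[1]

primary = route(rtt_ms, healthy={"us-west", "us-central", "us-east"})
# If us-west goes out entirely, the same user is redirected transparently.
failover = route(rtt_ms, healthy={"us-central", "us-east"})
```

If the replicas are placed so that the second-best region is also nearby, the failover read costs only a few extra milliseconds, which is the point the talk is making.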
In our last release we made this pretty simple to implement. We've boiled the whole question of where data lives and how you deploy a database across multiple regions down to four simple DDL statements. You define cluster regions (you start a node and place it in a region), then you tell a database: this is your primary region, say East, and here are West and Central, so that's where the data lives. You can set the survival goal to survive a regional failure, or maybe an availability-zone failure. And you can control this down to the row level. I'm not going to go too deep into it, but there's lots of great material in our documentation on exactly how this works. We've really simplified the configuration of the database down to four DDL statements, which is really impressive work by the team; I'm still in awe of it. And again, our docs do a great job of describing this and the other principles I've gone through here. Okay, last topic: distributed performance optimization. This one is a bit more academic, but it's one of those things about thinking in distributed systems: how do you deal with the speed of transactions? We're doing lazy transactions, and we're also doing write pipelining. Let's look at lazy and pipelining together. Here's our transaction: insert two records into a table and commit.
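For concreteness, the four statement shapes follow the multi-region syntax in CockroachDB's documentation; the database, table, and region names below are placeholders, and you should verify the exact syntax against the docs for your version. They're shown here as strings you would send through any Postgres-compatible driver.

```python
# The four multi-region DDL statements, per CockroachDB's multi-region
# docs (names are placeholders; check your version's documentation).
multi_region_ddl = [
    'ALTER DATABASE mydb SET PRIMARY REGION "us-east1"',
    'ALTER DATABASE mydb ADD REGION "us-west1"',
    'ALTER DATABASE mydb SURVIVE REGION FAILURE',
    'ALTER TABLE mydb.users SET LOCALITY REGIONAL BY ROW',
]

# With a real connection you would simply execute them in order, e.g.:
#   for stmt in multi_region_ddl:
#       cursor.execute(stmt)
```

One statement each for the primary region, additional regions, the survival goal, and row-level placement: that's the whole multi-region configuration surface the talk is describing.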
In a fully serial world: we begin the transaction and that comes back; we write Carl and that comes back; we write Nigel and that comes back; we commit and that comes back. Four round trips in total. With lazy transactions and pipelining, we write Carl and Nigel concurrently, both come back, and then we simply commit. We've eliminated the need for that separate first step that creates the pending record, because remember, in our transaction model the Raft group is already handling that pending record. We use Raft's special nature there, so we don't need a separate round trip to kick off the pending state for this transaction. We just start writing the two records against their two ranges, the pending record gets handled along the way, and when both writes come back we commit and return. So we've saved that amount of time, and that's what we're doing with lazy transactions and pipelining; it's all part of how we store and execute transactions. Now, parallel commits is something we did over the past two years, and I find it truly phenomenal. The only way I can describe it in plain English is: we couldn't change the speed of light, so we changed the photons. What we send over the wire is different. We look at a transaction and all the information around it, take a picture of that, commit on one node, and send that picture over to the next.
The next node says: as long as the picture looks the same over here, I'm good too, and it can immediately reply that we're going to be fine. So with lazy, pipelined, parallel commits, we write Carl and Nigel and forward the commit at the same time, because we already know it's good on one node. When all of those acknowledgements come back, we're done. We've replaced the centralized commit marker: instead of writing Carl and Nigel, waiting, and then committing in a separate round trip, we send it all at once, and as long as everything checks out on the other side, we can say everything's great and send it all back. It's really amazing software engineering, and the first time I saw it I was kind of dumbfounded at what it meant for performance, especially in broadly distributed systems. When you have things happening in Sydney and in New York, this really matters: you're talking about 300- or 400-millisecond round trips, and a couple of those back and forth adds up. You want to be sub-100-milliseconds; that's really what it means to be real time for humans, and past that we notice the lag. So, I've covered a lot. I gave you a quick overview of CockroachDB. We went through our storage layer, talked a bit about Raft and how we distribute data within the database, covered distributed transactions, and went through MVCC. We talked about how distributed execution is different from just distributing data or coordinating multiple instances of a traditional execution engine.
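The savings from the three strategies can be tallied as simple round-trip accounting, assuming one network round trip per dependent step. The numbers model the talk's two-write example, not measured CockroachDB behavior.

```python
RTT_MS = 70  # an illustrative cross-region round-trip time

def serial(writes):
    # BEGIN, then each write, then COMMIT: every step waits for the last.
    return (1 + len(writes) + 1) * RTT_MS

def pipelined(writes):
    # All writes go out concurrently (one round trip covers them all),
    # then one more round trip to commit.
    return 2 * RTT_MS

def parallel_commit(writes):
    # Writes and the commit record are sent together: one round trip.
    return 1 * RTT_MS

writes = ["Carl", "Nigel"]
latencies = (serial(writes), pipelined(writes), parallel_commit(writes))
# With a 70 ms round trip: serial 280 ms, pipelined 140 ms, parallel 70 ms
```

Going from four dependent round trips to one is exactly the "we changed the photons" effect: the same physics, but far fewer sequential waits on it.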
We talked about latency and the speed of light, and how we do performance optimizations around them. The last thing I want to touch on before I end this and open it up for questions: when I first saw CockroachDB, it was on stage. I was working at CoreOS at the time, and if you're not familiar, CoreOS was one of those companies that had been around the Linux Foundation and the CNCF for years and years, real innovators in the Kubernetes space, ultimately acquired by Red Hat. The CEO of CoreOS, Alex Polvi, was on stage showing off Kubernetes: killing a pod and watching the data and the system just come back. It was awesome. And the only application they used to show it off was CockroachDB; this was about four years ago. For me, that was the moment where I thought: wow, that's really cool, a database that took zero to very limited impact to its performance while I was killing off pieces of it. Fast forward a couple of years, and I remember the Kubernetes Federation SIG trying to figure out how to federate Kubernetes clusters. There's the work that people like the Upbound team are doing with Crossplane, and Skupper for networking; there's a lot of stuff going on. But for me, why not just federate at the data layer? Why not have a single logical database spanning multiple Kubernetes clusters? I find multi-region and global scale to be the future of what we're going to do, and for us, you can just deploy nodes across multiple Kubernetes clusters and have them all participate, with any one of those endpoints able to see data in any other cluster.
Why worry about the operational nightmare of stitching clusters together and making them work across regions? Let the database deal with it. We have a demo of this on our YouTube channel: my friend Keith McClellan and I go through a really great presentation of it, and we used Kube DOOM to kill pods, which was kind of fun, so that one's coming back. To me, this is probably one of the coolest things I've seen in this space, and I think the future is truly hybrid and multi-cloud. People question it, and networking and security are a challenge in distributed worlds, especially across multiple networks, but it'll be interesting to see how it comes about. We also have a Kubernetes Operator. While we're pretty much perfectly aligned with Kubernetes, we don't need an operator for the basics: install, scale, resilience, finding data. You'll find other databases need really complex operators; for us it's all about day-two operations and what we've learned by deploying CockroachDB Cloud on Kubernetes. We have thousands of nodes running in CockroachDB Cloud, all orchestrated by Kubernetes, so the operator helps with management, rolling upgrades, and security best practices. We're basically taking our best practices for running CockroachDB at scale on Kubernetes and packaging them into an operator that's available to everyone. Lots of different people use CockroachDB; this is a sample of some of our customers. We often get asked: isn't Cockroach just for global workloads? I don't have data all over the planet.
Actually, I'd say a fair majority of our customers use us in a single region, because the pure elimination of sharding, plus the elastic horizontal scale we can give people, is a huge value. The ability to survive any failure is phenomenal; just let the database do it. So we're used for general-purpose workloads and applications, and for lots of system-of-record workloads where you need that highest level of isolation. If you're thinking about being distributed, running something on Kubernetes, or really just running containerized in any environment, CockroachDB should definitely be on your list. It's one of those things where once you see it, you can't unsee it; that's a funny thing I've heard from people. You can learn more about CockroachDB and take some coursework, all for free. We have material on distributed databases, cloud-native apps, lots of general-purpose SQL, and lots about how to build with Python and Java. That's all available on our website. But as I mentioned before, CockroachDB Cloud is really one of the easiest ways to get started; if you want to start a cluster today, that's the way to do it. Okay, that's 54 minutes, 50 with the intro, so I went through a lot. There were a couple of questions; I think somebody was asking about security. Tim may have answered these live, I'm not sure, but there's a lot we're doing for security within CockroachDB itself.
We've gone to great lengths to make sure that all the core capabilities you would expect in a normal database are in CockroachDB, including all the management capabilities and integrations with other tools. We improved our logging so it works much better with things like Splunk and Datadog, and in the last release we set up a Prometheus endpoint. There are lots of things we're doing around the core of the database to make it work the way you work, and security is one of them: how we integrate with LDAP, how we handle secrets management, and yes, we encrypt data at rest and data in motion between endpoints. If you want a deep dive into all the security components we've implemented, I always push people toward our docs; they're a phenomenal resource and the team does an amazing job. So let's see, any other questions? There were a lot, and thank you, Tim, for jumping in and reading and answering them. I'll talk through a couple since I have a minute or two. Somebody asked whether we support spatial data. Yes, we implemented those libraries in CockroachDB itself, so we now support spatial data in a distributed database, which introduced some interesting concepts of its own. There was a security question, which we've now covered almost entirely. Somebody asked about bulk loading and how we do that. Yes, we do provide bulk loading capabilities.
But we think about it in terms of our backup and restore capabilities and the batching capabilities we have; I know our I/O team thinks a lot about these things, and again, our docs have lots of information on this as well. I think I've answered everything else. That was a lot of information in a little bit of time, and I do hope it was valuable to everybody. I hope this wasn't just Cockroach talking about Cockroach; I tried to make it more about some of the principles in our underlying architecture, to open your eyes to some of the things that are out there and worth looking at. Again, I'm Jim Walker, @jaymce on Twitter, and I'm always happy to answer questions. Our Cockroach community Slack channel is an extremely vibrant community with lots of people asking and answering questions. But really, go out and try it if you're interested in this sort of thing. And if you're looking for a PhD in distributed systems, I still contend that our code base is a good place to get one. So on behalf of our entire company, where a lot of people worked really hard on a lot of really cool stuff, thank you for joining us, and thank you to the Linux Foundation for having us today. Have a great day. Thank you so much, Jim, for your time today, and thank you to all the participants who joined us. As a reminder, this recording will be up on our Linux Foundation YouTube page later today. Okay, thanks so much everyone, have a wonderful day.