Welcome to Scaling PostgreSQL: Never Shard Again and Gain Effortless Scale. I'm joined by Chris Casano, Solutions Engineer at Cockroach Labs, and Jim Walker, VP of Product Marketing at Cockroach Labs, who will discuss how CockroachDB delivers scale: choosing primary keys, ranges, the architecture underneath, automating data placement for latency and resilience, and resharding in production without downtime. My name is Lindsay Hooper. I'm one of the PostgreSQL conference organizers, and I'll be your moderator for this webinar. I want to tell you a little bit about your speakers. So Chris Casano is a Solutions Engineer at Cockroach Labs. His passion resides in assisting customers on use cases, technical adoption, and success with CockroachDB. He has implemented and assisted customers with data solutions in multiple industries for over 20 years. And prior to Cockroach Labs, he ran the Northeast Solution Engineering team at Hortonworks Cloudera. So welcome to Chris. And Jim, Jim is a developer who's spent a large part of his career helping people understand complex emerging technologies. He's passionate about distributed systems, data, and open source technologies. Jim is currently the VP of Product Marketing at Cockroach Labs and has helped build companies like Hortonworks and CoreOS. So that's it for me. I'm going to hand it off to Chris and Jim now. So take it away. All righty. Thank you, Lindsay. Thanks for joining me, Chris. And thank you, everybody, no matter where you're at in the world, for joining us and taking your time to spend it with us. You know, we give a lot of presentations about Cockroach and CockroachDB, and lately the amount of interest, the amount of inbound, just people asking to learn a little bit more, is actually increasing, which I think is great. But I also think it's partly a consequence of the current climate, of what people now expect around scale and delivering these rock-star experiences for organizations and our customer bases. We're seeing people that have used traditional auto-scale kinds of databases, where you maybe have a sharding layer that sits around something like PostgreSQL, and we're seeing those things fall over as scale becomes a daily thing for organizations. So this is a little bit timely, but I also find it an extremely interesting topic because, having once been a developer and somebody who dealt with databases, sharding is something I find painful at best and not something I really love to do. So I do encourage everybody to ask questions in the chat. There's already been one, and we'll definitely get to it over the course of the session. I can answer a lot of things; Chris is much more technical than I am. We both worked together at a company called Hortonworks, so if you're familiar with it, we get distributed systems. Chris still calls it Hortonworks Cloudera; I can't do it, I just call it Hortonworks. So, a little bit of difference between us. But what we're going to do today is I'm going to give a background, we're going to go into a demo, and I'm going to use a couple of slides to show you how we actually do sharding underneath, just to give you guys a better sense of how that works.
How you actually choose primary keys in CockroachDB is a really critical function; the database will take care of all the sharding underneath, and that's what I'm going to show you in the slides. Then I'm going to come back and do one last demo. I want to get as much time on keyboard, and as little in the theoretical part of the slides, as we can. So, these are some graphics from DigitalOcean; I did a quick search on Google. But, you know, sharding is traditional. It's a way to horizontally scale a database, and it's been with us for a while. I've seen and heard of instances of MySQL and Postgres scaled out to maybe 100 or 150 different shards. Can you imagine managing that? Good God. You either build tooling to manage those sorts of things or you hire an army of people to manage all those shards. But it's a pretty simple process, right? You choose a shard key, you apply a hash function, you create shards. Ultimately, the database is now split between these different shards; the table is broken down into different areas, and you can place them on different servers. So now you can scale out the database and increase performance, right? So let's just start with sharding. A good example of this, again from DigitalOcean: we're going to shard here on price. You can take all the records that are between $0 and $50 and put them into one shard, everything from $50 to $100 into another, and everything $100-plus into a third. Now I have taken a single table and broken it down into three shards, and these three shards can now be deployed and managed in separate instances of Postgres. Now, there are a couple of challenges with doing this. For developers, I find it difficult to deal with. Everybody has different ways of doing these things, but the more complicated the SQL is, and the more additions I have to make around the sharding logic, the more prone I am to bugs. And what happens when I actually change a shard or add a shard? What happens to my application logic? What happens when we want to change the schema of a table? What does that mean for the shards, and for downtime for the entire system? From an operations point of view, when I do a resharding just to scale out, I have to think really carefully about when we're going to do it and actually manage that process. It ends up being its own implementation in itself, because there are application changes, there are changes to the backend, there are changes to the load balancing. There's a whole bunch of considerations that come into this. And you're probably also doing it in the middle of the night, so you can avoid downtime or outages for your customers. A little later, we're going to show you how to do rolling upgrades, so you can have a database change out versions of software and nothing is ever down; we'll show you how you can accomplish that with CockroachDB as well. A single shard failure can cause a ripple effect on resiliency for the rest of the database. Backups of sharded databases are difficult. And then finally, if you want to have a failover architecture, an active-passive system with a sharded database, you're basically doubling costs.
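To make that pain concrete, here is a minimal sketch of the manual approach the DigitalOcean example describes. Everything here is illustrative: the table and shard names are invented, and in practice the routing logic lives in your application or a proxy layer, not in SQL.

```sql
-- Hypothetical manual range-sharding on price: one table per shard,
-- each deployed to its own Postgres instance.
CREATE TABLE products_shard_0 (   -- instance A: $0 up to $50
    id    INT PRIMARY KEY,
    name  TEXT,
    price NUMERIC CHECK (price >= 0  AND price < 50)
);
CREATE TABLE products_shard_1 (   -- instance B: $50 up to $100
    id    INT PRIMARY KEY,
    name  TEXT,
    price NUMERIC CHECK (price >= 50 AND price < 100)
);
CREATE TABLE products_shard_2 (   -- instance C: $100 and up
    id    INT PRIMARY KEY,
    name  TEXT,
    price NUMERIC CHECK (price >= 100)
);
-- Every query now needs application-side routing, roughly:
--   if price < 50:    connect to instance A
--   elif price < 100: connect to instance B
--   else:             connect to instance C
-- Add a fourth shard, and every one of those code paths changes.
```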
The failover often is quite complex. When you start dealing with RPO and RTO, mitigating the differences between the passive database and what was active for an hour, or however long the systems were out, doing that manual remediation to get the data correct in your backend systems, when it's sharded, just adds another layer of complexity. Because failures happen; it's just a fact of life. So how do you actually deal with RPO and RTO in a sharded system? It's difficult. I don't need to spend too much time talking about the challenges of sharding; I think we've all dealt with it. There's incredible power in it, though, right? I can take a single instance of Postgres and scale it out, and I have three instances all working as a single database. It's awesome, and having done this myself, there's a lot of value in there. Real quickly, somebody asked, will the webinar be recorded and posted? Yes, it will. So if you need to drop off, by all means, please do. If we bore you, then I'm sorry. But if you need to go because of a meeting, then take off. We promise to try to keep it exciting, and like I said, I'd like to get to Chris's stuff as quickly as possible. So there's been this advent of a new type of database that deals with this problem. It really comes from a lot of the lessons learned at Google in, say, the 2005 timeframe, when Google started to hit scale. They had Bigtable, and they were trying to run transactions on it. They had lots of instances of MySQL going on. And they basically asked: how can we take the best benefits of a NoSQL-type database and add transactions to it, so we have something that can be a system of record, a solid database that scales like we want but still works like a normal SQL database? They created something called Google Spanner, and they published a white paper on it. In fact, a lot of the core principles in CockroachDB are direct descendants of Spanner and some of the initial work the Google team did; the three founders of this company actually come from Google. What we're doing here is taking all of the beauty of a relational OLTP database, combining that with the scale and the resilience of NoSQL, and then adding in a bunch of features that allow you to do some really interesting things in distributed environments, whether that's distributed nodes within a single region, a single AZ or data center, or globally dispersed across the entire planet. And so that's really where distributed SQL has come in. We feel this is the right database architected for the cloud, as we all become much more distributed, right? Real quickly: Cockroach, we are cloud-native distributed SQL. This is standard, developer-friendly SQL. We are wire-compatible with Postgres, so all the drivers and everything you already use just drop in. From a SQL syntax point of view, we chose Postgres because we felt it was the database most broadly used, from an open source point of view, for these enterprise-type workload applications. There's a lot of MySQL usage out there for other types of use cases, whereas the heavy system-of-record OLTP use cases we were going after seemed to be dominated by Postgres. That was our choice early on.
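As a quick illustration of what wire compatibility means in practice, here is a small sketch: ordinary Postgres-flavored SQL like this should run unmodified against either a Postgres or a CockroachDB endpoint, using whatever Postgres driver you already have. The table and values are just examples.

```sql
-- Plain Postgres-style SQL; nothing CockroachDB-specific here.
CREATE TABLE accounts (
    id      INT PRIMARY KEY,
    balance DECIMAL NOT NULL
);

INSERT INTO accounts (id, balance) VALUES (1, 1000.50), (2, 250.00);

SELECT id, balance
FROM accounts
WHERE balance > 500;
```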
Now, from a syntax point of view, we cover a lot of the syntax; we have broad coverage of it. I think there are two areas in particular that we currently do not deal with. We think through it, we talk through it: does it make sense or not to do these things in a distributed database? One of them is stored procedures. Not many people are designing new systems with stored procedures in them anyway; it typically comes up on migrations from Postgres to CockroachDB. And so we do not implement those. We feel that running these things as a service, as a Lambda function or as a microservice, does make a whole lot of sense. You can take advantage of Git repos, maybe there's feature flagging, and you can manage these things and unit test them more correctly, right? Stored procedures have their own challenges in a distributed system, too: do you really want that code and data moving around in a broadly distributed system? So there are things we're doing around that, but that's one of the areas. Regardless, this is standard SQL. This isn't a new language. This is standard SQL syntax, wire-compatible with Postgres. I'm going to talk a lot about effortless and elastic scale: how do we scale up and down without actually sharding? The database just deals with that. But it also deals with failure. If a node goes away, you're going to see very minimal impact on transaction volumes and the performance of the database itself. And then, if you're going to be distributed and you're going to have nodes and data living all over the planet, you need to provide local latencies even at global scale. How do people actually get the local experience for all transactions when I have nodes in New York and nodes in Singapore? This is not a simple problem. Ultimately for us, there are other people building out these kinds of distributed SQL databases, and performance against them is important; I think we outperform a lot of them on pretty much every open source and publicly available benchmark, and I'm happy to talk through that. In fact, we publish all these things. But for us, the ultimate competitor is the speed of light. We focus on the speed of light here at Cockroach because I think that is the ultimate limit in distributed systems. And so that's what we do here. This is a unique distributed architecture. Cockroach is implemented as a distributed system where each node is a single atomic unit. It is the same binary in every single instance; there are no different types of nodes in Cockroach. It's pretty simple. You spin up a node. I could spin up an instance of Cockroach right here on this laptop, and as long as I have a TLS connection, I can connect it to a database, and instantly this laptop would be a participant in that database. It would look like part of one logical database, like an endpoint. Any node in Cockroach is an endpoint and can service a transaction as well. There's no single transaction node, whereas with things like Aurora, all writes have to go through a single node. We don't have that. Every single node does both reads and writes and will service transactions, right?
And it's actually a really key part of our architecture, because this is shared-nothing, right? This is truly a distributed system that was built for scale and built for resilience. And so I want to get into it: let me just show you, and I'll come back afterward and show you how we actually accomplish some of this stuff technically, with a little bit more insight. But Chris, do you want to take over and show people the scale and resilience demo? Absolutely. And if you have questions throughout the way, please do ask; I'm monitoring them, and I'll relay them to Chris wherever I can. So, thank you, buddy. Yeah, no problem. So, for folks at home, if you want to try this out yourself, like Jim mentioned, our documentation is pretty world-class. You can find a breadth of information here around building out demonstrations, testing things out, and all kinds of intricate SQL statements you can run. There's a ton of stuff in our documentation, and the one I'm going to show you today is the tutorial on fault tolerance and recovery. I'm going to take you through the entire resilience process within Cockroach. I'm going to spin up a six-node cluster and show you what happens when a node is removed. You'll see that the database is resilient and can continue to transact and do its workload. And if you want to make things even more resilient, one thing Cockroach allows you to do is over-replicate. It starts with a default replication factor of three, but you can increase that to five, to seven, to nine, depending on the types of things you want to survive. And, as Jim mentioned, we are a very resilient database, which is the reason we picked Cockroach as the name of the company: the cockroach is a very resilient insect, and we are definitely a very resilient technology. So, let me walk you through how this works. Like all great database demonstrations, this one's going to be through the command line, though I hope folks don't mind that I'm going to show a lot of UI here too. I'm going to kick off a demo, and all I've done is basically script the tutorial that you see out in the documentation. This is going to set up a six-node cluster locally here on my machine, and we'll give it just a second to spin up. But here's the admin UI. This is what it looks like when you first enter the Cockroach database. We have six live nodes that you can see here, and right now I'm loading data into the system, so it's starting to replicate out some of the information. If I go over to the databases, I have a database that I'm loading in the background. We're going to run the YCSB benchmark; if folks are familiar with it, it's a pretty common industry standard. We're going to run this workload, and then, as the workload is running, we're going to remove some of the nodes in the database so you can see the resilience. If you want to see that workload doing its job right now, you can go over to the metrics page, and you can see we have a couple of SQL queries running. Nothing crazy; we have about 50 queries per second. So right now I have a couple of selects, updates, and inserts happening, and what we're going to show is how the resilience works. I'll give you one other thing to look at. Here's how the data is distributed, right?
We have a user table, and since this whole discussion is around shards, what we're actually doing is creating what we call shards, or replicas, of that table. In this case, we have basically three replicas of the user table. Now, if it was a bigger table, we would see more shards, or what we typically call ranges at Cockroach. Right now, it's replicated three times. We can see those replicas are on nodes four, five, and six, and we'll see this increase over time. Okay. All right. So, let's go back to the main screen. We have six nodes, nothing's under-replicated, everything's healthy. All right, let's go and actually remove a node, and I'm going to show you what happens. Cockroach is a very patient database. When failures happen, it steps back, it looks at what's going on, and it takes the appropriate action over time. So right now it has identified that one of the nodes is suspect; we know that it's down. Well, let's go look at the workload. The workload might have taken a little downturn in the beginning, but you're going to see it pop right back up, and the database will continue to transact and do what it's meant to do. Anytime there's a failure, like I said, the database is patient. It's going to look at that node and ask: is this node just suspect? Is it just a network partition? Is it going to come back? We'll wait about five minutes before we determine that node is completely gone. But if it does come back online, it just adds itself back to the cluster, which you'll see here. And the database is really good at saying, all right, now that my node is back up, let's look at the replicas and make sure they're up to date. There's some magic it does underneath to make that happen really quickly: it'll either send a snapshot of data over, or it'll just look at the deltas and replicate from the Raft logs. So, real quickly, we're back in good health. And because we only replicated the data three times here, I can actually only incur one simultaneous node failure, right? If you think about how Cockroach works, we're very quorum-based. We always want a quorum for reads and writes, so with a replication factor of three, we need at least two replicas to say that we had a quorum write or a quorum read. If we incur more simultaneous nodes going down, we can get into an unavailable state, which is not good; we want to make sure we're always available. So, what you can do if you want more survivability within Cockroach is increase the replication factor. And this is an online change that you can make. I'll just show you here a statement that increases the replication factor, and you can pick specific tables to do this for; it doesn't have to be across the entire database. You can pick which objects, tables, and indexes go into what we call these replication zones, right? And so, in the background, and I might have already done it actually, you're going to see that the database starts up-replicating itself, showing us that there's nothing under-replicated. And if we go back to the screen I showed you before, we'll actually see that there are more replicas for that user table, right?
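Chris doesn't show the exact statement on screen, but a zone-configuration change along these lines is what's being described. A minimal sketch, assuming the YCSB workload's usertable; the table name is illustrative:

```sql
-- Raise the replication factor for one table from 3 to 5:
ALTER TABLE ycsb.usertable CONFIGURE ZONE USING num_replicas = 5;

-- Or raise the default for everything in the cluster:
ALTER RANGE default CONFIGURE ZONE USING num_replicas = 5;
```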
So, you can see here we have more replicas now across the database. And Chris, just to be clear, the replicas are basically replicas of each of the quote-unquote shards; we call them ranges in CockroachDB, because of the way we break down tables, ordering everything by primary key. I'll show a little bit of that in the slides, theoretically, on how that works. But think about this: now every shard is actually replicated, so it is always going to be available in case something goes down, right? Exactly. Exactly. So, in this case now, I'm going to show you two simultaneous node failures. I'll kick that off; it's actually going to stop two nodes, and you're going to see the same kind of thing we saw before. If I go back to the overview page, you'll see that two nodes become suspect. Again, the database is going to wait and be patient: all right, I have something wrong here, should I start up-replicating? It will wait to see whether those nodes come back into the cluster. But by the same token, I can see that transactions continue on the database and that we're resilient to any of the failures that might have happened from a hardware perspective. And again, this could be node failures, disk failures, even an entire region failure that we can survive through. If you want to try this yourself, definitely go to our docs and do the fault tolerance and recovery tutorial. It's a really simple and easy demo. There you go, Chris. You were talking too fast for the under-replicated ranges to show up earlier, but everybody can see them now. So, yeah. And so we showed a little bit of our own scale and resilience and how that works. This is all basically done under the covers, right? And so, Chris, you added more nodes, right? Yeah. And data was actually spread out to those new nodes; shards were automatically written out to all those new places, right? And in the same vein of resilience, if a node went away, the database is smart enough to say, okay, I just need to redistribute that data, right? That's correct. Yeah. Cool. Alrighty. So I will go back and show a little bit more. All right, here we are. Let me show you now how that all worked underneath the covers. Chris, can you see my screen? Yeah, you look good. Okay, great. I mean, my slides look good, not me. All right. So ultimately, like I was saying, every table in CockroachDB is really broken down, ultimately, into KV. We are actually a database on top of a database. We've rewritten the database from the ground up, which is actually a pretty important concept. We aren't relying on existing sets of code to do things; we're relying on existing APIs, the SQL syntax, and being wire-compatible with Postgres. Under the covers, we speak that language, but the entire SQL execution layer has been rebuilt, redesigned, re-envisioned. And the way data is actually stored to disk had to change, too, because we're storing replicas, right? We're storing multiple copies of that data underneath. So this gets you out of these crazy active-passive systems, and this is how we get to an active-active environment, right? Ultimately, every table in CockroachDB is stored as KV.
And it's one big monolithic, logical key space. The primary key for the records in each table is ultimately what makes up the K in the KV store. Okay. And so that's at a very, very high level. In a traditional database, we store records, and when we create an index, ultimately that's a new structure: the index is ordered, ball, bat, glove, and it points to where the data is, right? That's the traditional approach. We aren't doing that. When we insert records into the storage layer, everything is inserted in order, which allows us to do some pretty cool things. So we're going to convert everything: ultimately the key is basically the primary key plus the table name, there are a couple of different things going on underneath here, and then the column values. Let me just show you how this works. Here's a table, the dogs table. There are three columns here: ID, name, and weight. And here's the entry we want to put in: ID 34, Carl, his weight's 10.1. So this looks like a relational table, because it is just a table, right? Well, when we convert it to the storage layer, what we're doing is taking each one of the column values and breaking it out. We would say: dogs table, primary key 34, for the column name write the value Carl; for weight, write the value 10.1. What this does is break everything down into an ordered set of values, not just by the row, but by everything that's in that row, right? And now I have this big, long, huge table, and if I want to insert something, I know exactly where to insert it so I can keep everything in order, right? There is a lot more complexity here, y'all; this key isn't literally written as "dog slash". We're encoding a lot of things, and there's a lot of optimization at this level. But at the highest level, we're breaking down what you would think of as a traditional relational table into KV. And this is what we're doing underneath. I use this example because it's what lets me show you that sharding is ultimately a physical problem, right? We have to physically locate data in different instances of Postgres, and this is how we deal with that physical bit. This layout gives us massive efficiencies for sorting, it allows us to deal with NULL values, and it allows us to maintain the integrity of the data as well, right? And so, coming back to this: if I have this key space and all these records, we're going to break down this table into something we call a range, a shard. A range is 64 megabytes currently. We're looking at changing that value, but it was the optimal size for us to be able to move these things around very efficiently within a cluster. They're designed this way so that we can split them, move them, and do some interesting things. The first range is basically the first set of records, the second range is the next set, and so on; these are basically our shards, and each shard, each range if you will, is 64 megabytes in size.
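Roughly, the dogs example in SQL, with the conceptual key mapping as comments. The encoding shown is heavily simplified for illustration, as Jim says; real keys are binary-encoded and optimized.

```sql
CREATE TABLE dogs (
    id     INT PRIMARY KEY,  -- becomes the K in the underlying KV store
    name   STRING,
    weight DECIMAL
);

INSERT INTO dogs (id, name, weight) VALUES (34, 'Carl', 10.1);

-- Conceptually, the row is stored as ordered key/value pairs:
--   /dogs/34/name   -> 'Carl'
--   /dogs/34/weight -> 10.1
-- Because keys are ordered by primary key, inserts land in a known
-- position, and scans over contiguous keys stay cheap.

-- You can see how the table is broken into ranges, and where the
-- replicas of each range live:
SHOW RANGES FROM TABLE dogs;
```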
We had to create an index structure to actually locate data, so that when you want to find a record, you're interacting with an index. It acts very much like a B-tree, so it's extremely efficient; the algorithms around B-trees are phenomenal, and we get a lot of speed there as well. But it allows us to do some cool things. If you're just doing hash partitioning, which some databases do, range scans are difficult, especially as you start to go across multiple different places. We can do some really cool things because we know exactly where this data lives: we can go get all dogs in a given range of names pretty efficiently. When we want to insert a record, here's your little command line: insert Sunny. We go through the index and say, okay, great, that's going to be in the third range, the third shard. It goes to insert that record and asks, hey, space available? Great, yes. Okay, good, let's insert that record. Wonderful. What happens when I want to insert the next write into that range and the shard is now too big, over 64 megabytes in size? Well, it's going to say: I don't have enough space in this range, so I'm going to split the range. And so the database itself automatically deals with the sharding and the creation of these ranges. We've just split the range. This doesn't happen in real time while the transaction is happening; there's a little bit of leeway around the 64-megabyte size so that we aren't limiting the performance of a particular transaction. We'll let it flow through its size restriction, mark the range as "hey, you need to split," and do the split in the background while there's no traffic going through it. So basically, we have now split this table into four ranges, because that's what the data size required. We have basically auto-sharded, because ultimately, on the back end, Cockroach is dealing with the physical location of this data and where it's actually placed. Chris showed a little bit of that, right? We just add nodes, and the ranges are spread out amongst all of them. To the developer, they're just writing queries. They can access any node; any node is an endpoint, and it can find that data in that logical database no matter where it's stored. So this whole concept of sharding just completely melts away into the background and is handled by the database itself, same with resilience. Chris talked a little bit about how data gets placed. There are a couple of different ways we think about this. We do have heuristics dealing with it. We'll typically distribute data, the replicas that is, amongst the physical instances of Cockroach based on, say, wanting even storage across all of them; remember, there are three copies of every piece of data. Maybe we want to optimize for compute instead: if there's a hot range, a hot shard if you will, we'll locate it by itself on its own server somewhere. We can even embed a location field that we want into the primary key and tie data to a particular location as well. We call it geo-partitioning. It's phenomenal. We had to do it for latency: when you have data distributed all over the planet, you're dealing with speed-of-light issues.
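Geo-partitioning, which Jim introduces here and comes back to later in the talk, looks roughly like this in SQL. This is a hedged sketch, since no exact statements are shown on screen: the table, column, partition, and region names are invented, and note that in CockroachDB the partitioning column must be a prefix of the primary key.

```sql
-- Tie rows to regions by partitioning on a column that leads
-- the primary key:
ALTER TABLE users PARTITION BY LIST (country) (
    PARTITION europe  VALUES IN ('DE', 'FR', 'IT'),
    PARTITION america VALUES IN ('US', 'CA')
);

-- Pin each partition's replicas to nodes in a matching locality:
ALTER PARTITION europe OF TABLE users
    CONFIGURE ZONE USING constraints = '[+region=eu-west-1]';
ALTER PARTITION america OF TABLE users
    CONFIGURE ZONE USING constraints = '[+region=us-east-1]';
```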
You kind of have to keep data where it needs to be so you have fast access to it. But it also helps with things like regulatory compliance, where typically you're handling that in the application itself. Instead, let the database just deal with "all German records need to live on German servers." We can actually do that, and it really comes down to how we place the replicas in the database. Again, the database is just dealing with all of this underneath. So let's go through a real quick example. Here's our table again. There are three different ranges, three different shards. What the database is going to do is take that data and instantly write it to different nodes, right? We're using something called Raft. If you're not familiar with Raft, go check it out. It's awesome; it's a great intro to distributed systems and how all these things work, and honestly, once you get it, it's really cool. So here we are, writing data to four physical nodes. We take the first range and write three replicas across three different nodes. We take the three blue replicas and write those across three nodes, and the same with the red ones. What we've done is evenly distribute the data across multiple nodes. And so now, from a storage point of view, we have even usage, yet the data is still available. So this whole active-passive thing goes away, and I'll show you how we deal with the resilience as well. Sometimes we'll place data based on workload. Like I was saying before, sometimes you have a hot range. Everybody's really interested in Muddy. Muddy's my dog; that's why I put Muddy in there. Everybody wants to access that record, and there's lots of compute going through that range, that shard. We can take that replica and pin it by itself on its own node, and the database is just doing this underneath the covers. We're using heuristics across access patterns and statistics flowing throughout the system, including network speeds and everything else, to place these on nodes where you're going to get an even amount of performance. Now, if you think about this too, not only are we optimizing for storage, we're also optimizing for compute. Think about active-passive systems: to have a fully passive copy of a sharded database, you have another three EC2 instances or three RDS instances, whatever it is, and the cost of that is insane, basically double your costs. So if you can distribute compute more evenly as well, you get much more efficient on the compute side too. The database is just dealing with this. It's configurable, lots of buttons and dials you can turn, but this is another thing it's doing. When we add a node, and this is what Chris was doing, you spin it up, and as long as you have a TLS connection, you point it at the cluster itself. What happens is the database is smart enough to know, via something called the gossip protocol, that a new node has joined, and it's going to automatically move data around and rebalance it.
So it's going to take one of the replicas of range one and move it over to node five, and I've now rebalanced. That's how I just scaled the database up. Compare that to going through the process of resharding, where, in the middle of the night, you've got to bring the whole database down and do a whole lot of work to get the data right. And by the time you're done getting all that data right, and you bring everything down and bring the new node and the new system up, you have an hour of downtime, and who knows what happened between the time you got the resharding done and the commit. It's a nightmare. Here, we simply spun up a node, pointed it at the cluster, and you're ensured the data is consistent and correct across all the nodes, across every table and every shard, every range, in all of those tables. It just takes that nightmare away and deals with the expansion for you. The same thing happens when we lose a node, when we have a permanent failure. Sometimes nodes go out of sync for maybe 30 seconds or two minutes, and we don't write them off in that scenario, but sometimes things just die. Like all things in life, hardware and computing in general, all organic things, they go away; everything is ephemeral in some nature, right? So sometimes nodes go away. What happens when node three goes out and I had two of my replicas for different ranges on that node? Well, the leader of each replica set is smart enough to know: oh, I'm missing one of my replicas, I'd better create another copy on another node. So the blue one creates a replica on node five, and the red one knows it's missing one of its three replicas, so it creates one on node one. I've just completely survived a failure. And again, in this whole system, everything is active-active, and every one of these nodes, every one of these physical instances, could have serviced a transaction. I've dealt with the failure and I've scaled the cluster up, and that's the demo and the stuff that's been happening in the background. Ultimately, it comes down to the way we store data, and to the way you choose the primary keys for those tables, because the way we store and place data all comes down to how those keys are chosen. When you implement CockroachDB, typically the things we're thinking about when we stand up a database, you know, create database, create table, are the logical structure of the data and the data model. With CockroachDB, two other things come into play. It's not just the logical data model; it's also the physical: what do I want to survive, and how fast do I want data to be available to end users? Because the same way we're able to survive things, well, we said we want a replication factor of three; I could instead have said I want a copy of a range in three different regions, so I can survive the failure of a region. You're actually defining that at the table level. And the beauty is you can change this in production.
It's an ALTER TABLE command: you alter the table and say, I want replicas to all be in a single region now, great, and it will just start moving the data around. The database just does what it's told to do, and that's really the true nature of what Cockroach does. The last thing I want to talk about in terms of replica placement is that we can tie data to a location. So if you have records that are EU-based, or US East, or US West, we can tie those records to those particular locations. Wow, look at that, my colors are off, Chris. Is this orange? I don't know, I'm color blind. This red should be going over here. I've got to fix that. But I can actually say: take all data that has a country code of US, or all these states in the US that are east of the Mississippi, and tie them to a location, and that's done at the table level. The ALTER TABLE command is very, very powerful. We can repartition data and move it around where it needs to be. The database is smart enough to understand these things and, in many cases, make sure that data follows a user, so there's fast access to it. This is a capability I've not seen elsewhere, and it takes sharding to a wholly new level: not just sharding automatically, but tying data to a particular shard, really taking our ranges and tying them to physical instances. So I want to get back to one last point on resilience, and I'm going to do one more demo here because I think it's awesome. Resilience is not just about losing a node, and it's not just about having active-passive systems. I also spoke a little to this: when are you going to put your new shards in place? When are you going to take the database down and put the change into production? This is one of those things that happens at two in the morning; it's a pain in the ass and nobody really wants to deal with this sort of stuff. And it's not just when you're sharding; sometimes it's when you want to upgrade the database. So how do you upgrade a sharded database? Well, I've got to bring the whole thing down, bring all three shards onto the new instance, and make it all happen. Chris is going to show you how we do this in production with a rolling upgrade. So Chris, do you want to take over and show that as well? Sure, absolutely. Let me know if you can see my screen. I can. All right, great. So again, I'm going to show one of these self-driving demos, where we're going to take you through a major rolling upgrade. Let me just bring it up and I'll speak to it. This is going to spin up a five-node cluster, all on the 19.2 release. Let me just get that cached. There we go. Okay, get rid of this. All right, so you can see here we have a five-node cluster, and you're probably saying, well, why does this admin UI look a lot different from the last demo? That's because we're actually using a prior release: this is the 19.2.5 release that you see here. Again, we have five nodes, and we have some data coming into the system, so I'll show you that here. We have another workload running; again, we're using the YCSB one. We have just a small table, and it's pretty similar to the workload we had before, about 50 transactions per second. Hey, Chris, which node is this UI running off of? Right now it's running off node one, but you can run it off any of the nodes.
The nice thing about Cockroach is that you can grab this admin UI from any of the nodes within the system. It's not like you have to set up a separate admin node; there is no separate side node or admin node. Every node is the same binary and contains all this stuff. Yeah, I can show you that. So if we look at all the nodes, you can see that the HTTP port on node one is how I'd get the admin UI; node two's would be here. So yeah, it's pretty neat that the admin tools in Cockroach are inherently resilient: if you can't get on one node because it's down, you get on another. Okay, so we have our workload running. We're going to upgrade this to the 20.1 line, and I'll kick this off and explain what's happening. What we're going to do is basically stop one node at a time. You'll see it happen in the UI: we stop a node, switch out the binary, and then you'll see the versions change over one by one. And when you're doing this in the real world, in a real production environment, you certainly want to make sure the cluster is in a healthy state, the nodes are up, nothing is under-replicated; there's some preparation you want to do, including taking a backup of the system. But from there, once you're all prepared and ready to go, you can script this out if you need to and slowly change the binary out on each of the nodes. You can see that's still happening here. And then, once you get to the end, you'll have a new cluster on the next release. And this is a major version release. It's not like we're taking the entire cluster down for a maintenance window to do an upgrade. And then, let me refresh. There you go. There's your new UI. All the nodes are live. We can go to the metrics and see that the workload continues to do what it's doing. And this is all happening in real time, without having to take a maintenance window. So Chris, you basically had a cluster with multiple versions of the database running at the same time. How many versions back can we handle in this type of scenario? I mean, you could just leave the database running mixed like that without finishing the upgrade, right? Yeah, so you can have mixed versions in the cluster. We don't recommend you stay that way over time; you definitely want to get consistent. What happens is that, as you're upgrading, we'll still support the 19.2 APIs so that things can still run; you basically have two APIs running. But then, when you've committed to the upgrade, there's basically a flag you set to say, yep, I'm good, I'm happy with this upgrade, and then you're all set. You can always downgrade before that point if you want to, and it does give you a window as far as how long you want to wait. But once you set that flag saying you're good, you're on the new line and you won't be able to roll back. We do recommend you set it once you're done with the upgrade. Right. And so when do people typically do this? Is this just something that's done live? Is that what we're seeing? Yeah, you can do it live. I think you definitely want to do it at a time when you don't have a lot of activity going on in the cluster. You certainly want to be careful, just like with anything in production.
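The "flag" Chris mentions is a cluster setting. A hedged sketch of the 19.2-to-20.1 flow he describes, per the upgrade docs of that era:

```sql
-- Before the rolling restart, optionally hold off auto-finalization
-- so you keep the ability to downgrade back to 19.2:
SET CLUSTER SETTING cluster.preserve_downgrade_option = '19.2';

-- ...roll each node onto the 20.1 binary, one at a time...

-- Once every node is on 20.1 and you're happy, finalize the upgrade
-- (after this there is no rolling back):
RESET CLUSTER SETTING cluster.preserve_downgrade_option;
```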
You always want to test things out in lower environments, make sure you have your steps down, and make sure you take a backup, of course. Make sure things are not under-replicated, and make sure there are resources available: all the diligence you do in any production system. But from there, you can script this out and do a rolling upgrade. Talk about an autonomous database, right? This was a major upgrade I just showed you, from the 19.2 line to the 20.1 line. Yeah, it was a big one. I'm excited for 20.2, by the way. By the way, everybody, our naming structure for versions is the calendar year, 2020, and then dot one, the first major release of the year. We'll have a second major release in the latter half of this year. So actually, Chris, there's a question here; while you're still sharing, we'll stay on the UI, since it's better to look at than us or a blank screen, I guess. So here's the question: I assume Cockroach's way of mitigating hotspots due to a poor PK choice would be the continual splitting of ranges, right? So in the case where somebody has chosen a bad PK and now has a hot range, how do customers typically deal with that? Yeah, great question. So what we do allow is online primary key changes. That came in the latest release, so if you do have a primary key issue and you need to change it, you can go ahead and do that, which is rather nice. Live in production. Yeah, live in production if you need to, though, like I said, always test in the lower environments first. One thing you always want to think about with primary keys is that you really want to shoot for randomness if you can. The more randomness you have within your keys, the more distribution you're going to get within the database. Now, if you do end up getting a hot range, there are a couple of things that can happen. One thing you can do is always split the range manually. And I think we do have a setting within the database where, after a certain number of transactions per second hit a particular range, it will split itself, so you get more distribution by splitting the hot range into separate ranges. But yeah, that's one aspect of working with Cockroach: thinking about how you can create random keys, right? Yeah, and I think that's one of those things that's unique to us, too. A lot of databases do online schema changes; doing it in a distributed environment, making those changes and having the database start to move data around for you, is one of the unique things about Cockroach, right? One last thing I'll say to that as well: we do have a lot of folks that try to migrate from legacy databases, whether it's Postgres or something else, or Oracle, and they have a lot of sequential keys. You can try that out, but if you are getting hotspots, you have to think about how you might re-architect that, or how you can handle those ranges. It's good to know that we prefer keys to be more random than sequential. Right. Cool. Any other questions in the chat there? Yeah, there's one other here; I'm just reading it through. Let me ask you a question while I read that, and then I'll paraphrase. It's a great question.
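A small sketch of the two techniques just discussed: random keys via UUIDs, and the online primary key change added in 20.1. The table and column names here are invented for illustration:

```sql
-- Random UUID keys spread writes evenly across ranges, avoiding the
-- hotspot you get at the tail of a sequential ID:
CREATE TABLE events (
    id      UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    payload JSONB
);

-- Fixing a poorly chosen primary key online, without downtime:
ALTER TABLE orders ALTER PRIMARY KEY USING COLUMNS (customer_id, order_id);
```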
So, everything you showed today, Chris, is available in the binary itself, right? Can you talk a little bit about cockroach demo? Because I'm about to show people that, yeah, you can go download Core, you can start Cloud, but we actually do package all of this in cockroach demo, correct? Correct, right. So we have something called cockroach demo. From the binary, you can just spin up this thing called cockroach demo. It spins up a quick in-memory database for you, gives you a web UI, and you can go ahead and test out some of the enterprise features. So here you'll see, you know, SHOW DATABASES gives us a sample database. I can say USE movr and, oops, sorry, wrong command. I can show the tables here, start mucking around a bit, and I can import my own data if I need to. One thing that's really cool with cockroach demo, and I'll actually exit out of this, is that you have the ability to spin up a workload. Let me just make sure I get the right statement here. Okay, let's do this one. What I'm going to do is spin up a cluster using a topology we call geo-partitioned replicas; it's basically spinning up a nine-node cluster, and you'll see this in the web UI. So, yep, right off the bat, I'm just creating a demo, and it's actually generating load for me as well. I can see load happening here on the database, just a couple of selects and inserts. And then, if I go to the node map, you can see it's spinning up a geo-distributed cluster. It's not doing anything on the latency side; it's all local to your machine. But it gives you the practice of saying: all right, let me just spin something up quickly so I can go test a few things. Let me test my SQL statements, let me test my application connectivity to the database. It's really useful for local development, which is hard with some of the cloud providers that have their own tools for these services, like Spanner and Aurora. Sometimes it's tough to do local development there; in our world, it's really easy to do it from a local binary. Yeah. I think Spanner actually came out with an emulator recently to help people develop, which is good. You're still not going to get the distributed nature, it is an emulator, but still, it's a move in the right direction, right? But local development is a big deal; it's actually a pretty important concept. So I'm going to go back to the slides. The question that came through, I had to read it and really read it. I mean, there are a lot of different strategies, Chris, for getting the right PK and avoiding hotspots, right? Like the way we do hash partitioning, the way we actually hash things, takes care of a lot of that stuff, right? Yep. Yeah. So I think that's an advanced topic. If anybody's playing with that and really wants to have a good conversation, I'd encourage you to join our community Slack. Our engineers, our solutions engineers, our support team, everybody's on that thing.
Actually, we all spend a lot of time in Slack, and our community Slack is pretty well populated with experts on this whole thing. So, one of the questions here was also about typical use cases. This is absolutely OLTP. We have some basic capabilities within CockroachDB that allow us to do some basic OLAP, or HTAP-type queries if you will. We added a vectorized execution engine to the database as well, so you can do things like an end-of-day wrap-up of how much money I made across every single one of my locations and stores, when that data is distributed across the entire planet, right? We can help with that sort of query. But really deep analytical cubes and knowledge, you're typically going to do that in a data warehouse. The typical reference architecture we implement uses CDC: we implement change data capture within Cockroach, so you use CDC to stream into a Kafka sink or into an S3 bucket or whatever you want, and then run your analytics on that data with the appropriate tool built for OLAP. We have tons of customers doing that. But we are on the side of OLTP. And we're not just doing the run-of-the-mill, eventually consistent transactions that you'll find in NoSQL databases. No, this is the highest level of isolation for transactions you can get: serializable isolation. We are going to guarantee transactions within CockroachDB even when they are happening all over the planet. Now, there's lots of interesting software engineering we've done to reduce those latencies. If you really like to geek out on these sorts of things, we actually just published a paper at SIGMOD, last week or two weeks ago I guess it was, that gets into the details of how we're able to implement serializable isolation at distributed scale. It talks about the CAP theorem for a while and what that means. There's some really cool stuff; go check it out if you're interested. But this is OLTP, system-of-record stuff; any application that you need a database for is where we are typically a good fit. And then finally, if you are interested in learning more, please do check out Cockroach University. There's hands-on coursework on a lot of the stuff we talked about; Will and Lauren will walk through it with you, and they're both a joy to listen to. I would also encourage you, if you want to try it today, to download CockroachDB Core and run it on your laptop; you can spin up a three-node cluster right on the machine you're looking at. Or go over to Cockroach Cloud and get a cluster up and running immediately as a hosted, managed service. We're giving away a free 30-day cluster right now as well. So that is our presentation. There were a couple of questions here, Chris, that I want to get through. Do you know if any of your customers use Hasura with CockroachDB? We're actually working with the Hasura team right now on the connection between us and GraphQL. We have relationships with the Prisma team as well for GraphQL.
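The CDC pattern Jim describes looks roughly like this; a hedged sketch, since no syntax is shown on screen. The table name and Kafka address are placeholders:

```sql
-- Stream every change on a table into Kafka for downstream analytics:
CREATE CHANGEFEED FOR TABLE orders
    INTO 'kafka://kafka-broker:9092'
    WITH updated, resolved;

-- Cloud-storage sinks (e.g. an S3 bucket) work the same way, with a
-- cloud-storage URI in place of the Kafka one.
```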
And so these are some of the developer tools that we are pushing forward on. So stay tuned; there's lots of work going on with that team right now to make that easy as well. Is there a leader region that accepts writes, or is this multi-master? Chris, how do you answer that? So, am I on mute? No, I'm not. Okay. Yes, you can do writes anywhere in the database. You don't have to go through a single master; we do reads and writes across the entire cluster. That's a huge differentiator against some of the other databases out there that have to go through a single master. Yeah, Aurora is single-master, and when they do multi-master, it's only two nodes; it isn't more than two, right? So multi-master is really difficult. The person who asked this also asked how we deal with conflict resolution. There's an architecture-of-CockroachDB presentation that gets into how we deal with transactions in the database itself, and again, our docs are a great place to go check that out. There's a document, I think it's called the life of a distributed transaction, Chris, and that thing is just awesome. If you really want to see how this stuff works underneath the covers, how we actually deal with conflicts, it's phenomenal work. Our docs team deserves every single kudos I could send them, because there's some really great stuff there for learning how this works. So, this webinar was really about sharding and scale and resilience and that sort of thing, so we really focused on that side. I do hope this was helpful for you all. I hope we stayed true to the title, and I hope we gave you enough hands-on keyboard with a little bit of theory alongside. I tried to stay away from slides, but sometimes you just have to use slides to explain some deeper concepts. But with that, Chris, thank you, buddy, as always. Did you enjoy yourself? Always a pleasure. Great talking to everyone today. Yeah, awesome. So thank you, Chris. Thank you, Lindsay, and the Postgres Conference team for having us. And thank you, mostly, to everybody for joining us today and taking an hour out of your day to listen to Chris and me talk about this stuff. I hope it's useful for you, and I hope you go out and try it. And again, if you have any questions or anything you want to discuss with us, our Slack channel is growing wildly; we're excited about that and excited to interact with everybody. So again, thank you very much. Scale and survive. Lindsay? Great, great talk. Thank you both for spending an hour with us. Thank you to all of our attendees; awesome questions, you seemed to be a really great group today. So with that, I will let everyone go. I hope to see you at the next Postgres Conference webinar, and have a great afternoon.