Thank you, Candice, appreciate that intro. Hey, Jim, how's it going? Hey, Chris, good. Doing good here? I am. You ready to ham and egg? Yeah, I think that people can probably tell from your title and my title that you're actually my boss, so I have to just behave myself. I work for you, Jim. Whatever you need, I'm there for you, bud. So glad we can all be here together with the folks on the webinar. So we have a great agenda for you today. We'll just do some quick introductions. Jim, you want to go first? Yes, I'm Jim Hatcher. I'm a Solutions Engineer at Cockroach Labs. I've been at Cockroach for about three years, but I've been working in software and databases for about 25 years. And as you can see from my title, I'm not a manager, but I'm thrilled to be on a call with a manager. Jim, you don't even do yourself justice there, bud. OK. You've been working in software, but before you got into software, what did you do? You come from industry. So what did you do in some of your prior work? Well, one of my first jobs out of school, I took a job at Chase Manhattan Mortgage Company. This was before Chase was JPMorgan Chase. And I worked on a system where we had all these documents related to mortgages and the mortgage servicing world. And so we had some databases with millions or billions of records, which at the time, I thought was a lot. And then I started a company and started building websites using classic ASP, as it's now called, and SQL Server. And that was a neat job because I got to build dozens of websites for different companies. And for each one of those, we built a custom backend and designed a database. So that was a neat job for me because I got to really practice my skills at data modeling and querying. And then I eventually got into financial services. So I worked at First Data Corp for a while.
And then I worked at a company called CyberSource, which is a big payment processor. It later got bought by Visa. And then I worked at Western Union on a couple of systems related to risk management and fraud prevention. And those are the jobs, once I got into financial services, where I really started seeing scale, like enterprise scale on mission-critical systems. And eventually, because of some of the scale challenges, I got into NoSQL databases. Specifically I did some work with HBase and then a lot of work with Cassandra. And so I kind of got into the NoSQL world. I worked with Cassandra for, I don't know, maybe six, seven years. And then one day I discovered CockroachDB and the world of distributed SQL. And I thought that was a really, really interesting technology. And that's what brings me here. I've been working at Cockroach for about three years. And I've kind of seen the benefit of what distributed SQL can do. So, you know, I consider myself a database guy. I mean, I coded a lot in my career, but I always loved the database end of things. So I love talking about databases. So happy to do some today. Well, you're the perfect person for this conversation today. It's gonna be a little bit of history. We're gonna talk a little bit about databases. Yeah. Challenges we've had over time with scale and so forth. It's gonna be fun. And, you know, just to introduce myself. So I'm Chris Casano. Even though I have manager in my title, I work for Jim. Jim needs something, I'm there to help him. And kind of like Jim, I've been in the database industry for a long time. I think maybe not 25 years, probably more like 24. Just maybe, you know, a little bit junior to Jim, but I've been in all sorts of things with databases, with data warehousing, search technology, SQL technology, transaction processing workloads.
So I spent probably the first 10 years of my career in consulting or professional services, doing large-scale implementations where scale was important. And then the last 10 years, doing more things around solution engineering for various different software companies. So like I said, I think you're gonna see a little bit of a history lesson here today of, you know, why distributed SQL? What have we tried to do in the past and what can we do today, which has become a really exciting market, I guess, in the database market, a new way of thinking. So let's get to it. What do you think, Jim? Can we move forward? I'm so excited. Let's do it. Yeah, I think this was like 10 minutes on intro. So I think people wanna see some real things here. So let's talk about some real things. You know, Jim, I'm sure you would agree with this. Over time, I've seen this in my career where there are all these ideas of trying to do more volume, right? I need to do more query volume, more throughput. And the database that I currently have is not good enough. I need more database, right? And however you try to get there, folks have tried to do weird things over the course of time to distribute a SQL database. And it hasn't been easy. And usually these are the main things that I've seen in the past. I think you've probably seen the same thing. Usually it's scale. Scale's kind of the big one, right? I need more query throughput or I need lower latency. I need more CPU power to make queries faster, or I have massive data volume that I can't fit on one server. I need multiple servers. And then to a lesser extent, I've seen locality come up. Scale is probably number one. Locality has probably been number two, where they wanna lower latency by bringing data closer to users, or data needs to be more localized in a specific geography. So this is what I've seen over the course of time.
I mean, do you agree, Jim, or is there anything that you've seen that's not listed here? Well, yeah, especially in that OLTP world where you're not doing big batch scans of data, like you might be doing in data warehousing or machine learning or something like that. But in that OLTP world where you're powering a website or you're powering an API and you're doing lots of little reads, lots of little writes, yeah, the scale in terms of just pure throughput, you know, how many transactions per second can I do? I used to do 1,000, I need to do 2,000 now. How do I do that? Or I used to do 100,000, I need to do 200,000 now. How do I do that? Or, you know, we've all seen the statistics about how much data is growing and how much data exists in the world. It used to be that a terabyte of data in a database was a huge database, and in today's world, it's not hard to get into tens of terabytes or push into much larger data sizes. So, yeah, I definitely agree. And when you talk about scale, there's lots of things you can be scaling out. I mean, usually we're talking about all these distinct queries that we're doing, but you can also scale to multiple data centers. There's lots of pressure on different dimensions of your topology and your setup. Yeah, yeah, no doubt. And, you know, it's even the scale of just managing tons of databases, right? And we're gonna be pretty particular today. We're gonna focus mostly on transaction processing and OLTP types of workloads. I've seen the same thing in other spaces, whether it's OLAP or search. I remember back in my search days, we would shard all these search indexes and that was such a hard thing to maintain too, but yeah.
So, we're gonna mostly focus on transaction processing and why distributed SQL can reduce a lot of the pain that we've seen with the solutions that we've had over the course of time. So, if we bring it back to what folks have tried to do with distributed SQL at a really simple or fundamental level, it's this, right? You wanna take SQL, you wanna take the schema, the writes that you have, the reads that you have by using SQL, and then be able to fan it out across all these different nodes or all these different servers so that you can do more throughput, so that you can lower query latency, so that you can bring data closer to end users. So, this is kind of a simplified form of what folks have wanted to do over the course of time. Hopefully that makes sense, everybody. I know it's kind of an odd diagram that you see there, but it's basically this: you have schema, you have reads, you have writes, and you wanna distribute this across a whole bunch of nodes. And the way that we've done this historically is kind of fun to talk about, because there's been a lot of pain, right? It does bring you the scale that you need, but it brings you pain in other areas. Like the slide says, a history of trade-offs. So, I mean, I think we should go into each one of these and kind of talk about maybe the trials and tribulations we've had with some of these, but I don't know, anything you wanna add here, Jim, as far as maybe the trade-offs you've seen over time? Yeah, I mean, there's a lot implied in that last slide you had. We're creating a table, inserting into the table and reading it. You know, there's all kinds of little decisions there.
You know, when you go from one purple box to six purple boxes, I mean, historically a relational database sits on one big box, and all the consistency guarantees around foreign keys and unique indexes and the ability to join are all built on the assumption that it's all gonna be in one big box. It's all in one memory space where we can check to see if there's referential integrity between these things because I'm working at memory speed. And as soon as you say, well, hey, it sounds easy, I'm just gonna take that kind of work and I'm gonna move it across six boxes, there's all kinds of gotchas and trade-offs, like you said. Yeah, I mean, sharding. I was a SQL Server guy for most of my career. I can remember starting to see articles on how you can shard SQL Server, and you'd read those and think, okay, I kind of get how that helps, but that's a lot of work, because as soon as you go from one box to two boxes, you have to have some type of algorithm that says, how am I gonna shard this data? I mean, am I gonna split up my customers table based on name, and A through M goes on the first server and N through Z goes on the next server? And how does my application logic know that, and how am I gonna read the data? And what happens if I add a third server? Do I redistribute the data again? And then what happens around ACID transactions? Can I guarantee atomicity? What if I'm inserting a record into a shard on box one and a shard on box two? Now I'm not dealing with data in one memory space, and so how can I give guarantees of atomicity and consistency and isolation? And so it just creates all these headaches, and generally what happens is those headaches kind of migrate up into the app layer, into some layer above the database. And I think what we all love about relational databases is that the database handles so much for you.
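To make the sharding headache concrete, here is a minimal sketch in Python of the kind of routing logic that ends up living in the application. Everything in it is an illustrative assumption rather than anything shown in the webinar: the helper names `build_ranges` and `shard_for`, the sample customer names, and the choice to split the alphabet evenly. With two shards it produces the A through M and N through Z split described above, and adding a third shard shows which customers would have to be moved.

```python
import string

def build_ranges(num_shards):
    """Split A..Z into num_shards contiguous letter ranges (hypothetical helper)."""
    letters = string.ascii_uppercase
    size, extra = divmod(len(letters), num_shards)
    ranges, start = [], 0
    for i in range(num_shards):
        end = start + size + (1 if i < extra else 0)
        ranges.append(letters[start:end])
        start = end
    return ranges

def shard_for(name, ranges):
    """Return the index of the shard that owns a customer name."""
    first = name[0].upper()
    for idx, letters in enumerate(ranges):
        if first in letters:
            return idx
    raise ValueError(f"no shard owns names starting with {first!r}")

# Illustrative customer names only.
customers = ["Alice", "Jim", "Chris", "Nora", "Zed"]

two = build_ranges(2)    # ["ABCDEFGHIJKLM", "NOPQRSTUVWXYZ"]
three = build_ranges(3)  # the ranges after adding a third server

# Customers whose owning shard changes once the third server is added.
moved = [c for c in customers if shard_for(c, two) != shard_for(c, three)]
print(moved)
```

The application has to carry this mapping, keep it in sync on every client, and physically migrate every row in `moved` whenever the shard count changes. A write that touches customers on two different shards also has no shared transaction to rely on.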
You model your data in a specific way and you put foreign keys on your data and you can join the data and it all just works. The database handles the data stuff, and up in my app I deal with the app logic. So, I think that's the pain point with sharding. In the NoSQL world, my journey with NoSQL comes from Cassandra, and Cassandra does that sharding of data for you. That's one of the great things about NoSQL. It's a terrible name by the way, NoSQL. It should really be like "no relational" or something. There's lots of flavors of NoSQL, but generally they all share the attribute that they're horizontally scalable, that I can add more and more nodes to the cluster to allow for scale. But they introduce all these same trade-offs. The sharding is kind of handled for you, but how do I deal with concurrency handling in the data? Can I do joins? Like in Cassandra, you can't do joins. And so different flavors of NoSQL let you do different things, but generally speaking, you're trading off. You're saying, to get the scale that I want and to get the resiliency and the high availability I want, I'm gonna give up on consistency. I'm gonna give up on concurrency handling. I'm gonna give up on joins. I'm gonna model my data differently. And so, depending on the use case you're looking at, you have to look at that trade-off and say, man, how much do I care about that scale versus how much do I not want the headache of having to deal with all this stuff? Yeah, a lot of times it's a forced trade-off too, right? The business needs to scale and then all of a sudden you have to do all these things where the app team needs to change the way the app behaves, and then the database operations team needs to coordinate how they're gonna do backups across all these different shards and so forth. So it's one of those things where we've lived with the pain of having to do it because the business needs to run this way.
So, you know, part of the reason why Cockroach was founded was because of all of these pains, right? And I think this is what gave rise to the distributed SQL market: we've seen this pain for a long time, so there's gotta be a better way to reimagine some of these things. But yeah, as you can see here, there's probably more than the three that you see here, but these are the top three avenues we've taken over the course of time as far as how we handle scale, right? There's that sharding use case, and I almost look at these as like trying to manage your kids, right? So if you look at the sharding use case, you're taking your kids and you're isolating each kid from the others, right? They have no way of communicating with each other and they're all being told what to do in their own little world, right? So you don't have a team, essentially, right? You have all these siloed databases that need to help support your particular application, right? Do you actually do that with your kids, Chris? I do, I keep them all in their separate rooms. They're not allowed to come out and talk to each other at all, right? Can you imagine trying to do that with your family? Imagine trying to shard your family, it would be terrible. But then the next one, I'm gonna use a kid analogy again with NoSQL, right? So imagine having inconsistency, right? Inconsistency with your discipline, right? You discipline kids in different ways, or you give them rewards in different ways. That's gonna drive all kinds of different behaviors. And that's what you kind of see with NoSQL, right? In some cases where you don't have transaction management like you did with SQL databases, this can be a big headache for mission-critical systems, especially ones that deal with things like money, right? Having inconsistencies around currency or money is a big issue.
And then the last one, leader-worker, right? Now you're just telling one kid, hey, you're the leader, everybody else doesn't matter, right, you're the favorite child, right? Can you imagine trying to run families this way? And this is what we've done. We've managed databases this way to try to get the best out of them. And there's all these trade-offs that come along with it, right? Trade-offs on consistency, trade-offs on database operations, just to match scale. So I think we've all felt this for decades now and there's gotta be a better way, right? What do you think, Jim? You think there's gotta be a better way? I'm hoping there's another slide after this one that... Yeah, there's gotta be a better way. So let me click through here a little bit more. So this kind of speaks to what we've seen historically, right? And there is definitely a better way. And I think the industry right now with distributed SQL has reimagined this a bit. But Jim, I know you're actually really passionate about this slide, right? Isn't this one of your favorite ones to talk through? It is. I mean, it's laid out as a quadrant with agility and reliability on one side and scale and availability on the other side. And so you can see this upward trend towards cloud-native distributed SQL. That's kind of the mecca. But for me, it reads like a timeline of my career, because early in my career, I learned about relational databases, what we're calling legacy databases here. I remember in college taking my first database course, and when we got to the section on data modeling I had this light-bulb-going-off-above-my-head moment where I was like, I love this. We started talking about third normal form and just the elegance of data modeling, where every piece of data belongs in one place in the database. And then you started getting into SQL, and what is a join? And what's an inner join versus a left join?
And I was like, this is a really neat part of building systems. And then early in my career, when I was working with smaller systems, I modeled lots of databases and I got really good at writing queries. Man, I kind of prided myself on being the query master. It's great when your database is handling the stuff you want it to handle and you're just working on fixing business problems. And relational databases are great for that until you get to enterprise-scale problems, and then the database starts to be a problem for you. The business comes to you and says, I need to quintuple the size of the database, and you're like, oh my God, I can't buy a server that's big enough to do that. Or I need to spend all this money on a big SAN with the expensive cache. You just run into these ceilings and you're banging your head against the ceiling. So for me, I got to a point where I was starting to build systems that were bigger and bigger. And so I got into NoSQL databases, like, I've got to have scale. And when I got into Cassandra, I was like, man, this is awesome the way this thing scales out, but I had to give up third normal form modeling. I had to give up foreign keys. I had to give up unique indexes and I had to give up joins. And that all just felt wrong to me, but when you need the scale, you don't have a choice, you know? At least, that's the way it was 10 years ago when I first started getting into that. And then the cloud-augmented legacy databases. The cloud has some awesome features, including the ability to quickly go from one server to 10 servers. It has that elasticity and the ability to respond to things quickly. I don't have to go fill out an order form and wait six months for some servers to show up on the loading dock to get new hardware. And so you started seeing things like Amazon Aurora, where the storage layer was able to take advantage of some of the elasticity of the cloud.
So those cloud features were really nice, but you still have the physics problems of how do I scale the data and how do I model the data. So when I learned about Cockroach and I heard that Cockroach is a distributed SQL database, meaning it's kind of the best of both worlds between legacy relational and NoSQL and some of the cloud features, I was like, that sounds really cool. And I was also very skeptical that it actually does what it says it does. And so I started playing with Cockroach specifically and I found that, wow, it really does. It gives me the consistency guarantees that I want from a relational database. It gives me the ACID transactions I want, lets me model the data the way I want to and use the SQL-based tools, but it has the horizontal scale of a NoSQL database and it takes care of that effort. I can go from a three-node cluster to a 10-node cluster, and that auto-balancing of the data is just built in, and replicating data is just built in. So anyway, I started drinking the Cockroach Kool-Aid, and so I like this slide. I sort of feel like this has been the journey of my career, to feel the pain of each one of these along the way and now to live in a world where, hey, I can have my cake and eat it too. Yeah. Exciting. Now I'm hungry, Jim. Yeah. Yeah, you know, I think you stole a lot of what I was gonna say. I mean, this is spot on. This is exactly what I've seen too in the past. And it was funny, just databases in general, I love databases. I remember when I was starting my career and getting started with SQL, there were things I'd learned in math. I was a math major in college, and there were things I'd learned in math class where I was like, I don't understand. Why do I have to learn any of this stuff? Where does it apply?
And all of a sudden you start using databases and it's like, oh my gosh, look at all the fascinating things that are tangible or real that you can do that are all math-based. I remember learning about unions and intersects. I'm like, oh yeah, I kind of remember learning that in logic. I didn't know why it was useful, but now I do. I'm trying to union or intersect data, and you can just easily do it with SQL. So all that stuff was a lot of fun. But yeah, I think once we got into those scale situations, that's when it got really nerve-wracking, and I'm so glad there's a better way. And a lot of our founders at Cockroach, and at other distributed SQL companies as well, were all inspired by this long history of dealing with this. Being in the industry and dealing with this and trying to imagine a better way. And then we've actually come up with a better way. And we think that there's a whole new market, a whole distributed SQL market, that's really comprised of these five tenets. So distributed SQL, first of all, needs to be SQL. That was something we don't wanna give up again. Folks like Jim and I that have been doing this for 20-plus years love SQL, right? Why would we wanna give that away or change that? It's so universal and friendly for anybody that needs to talk to data. The next one is scale, right? We've talked about that countless times today. There's gotta be an easier way to scale, both vertically and horizontally. Consistency is another one. Why give that up? Why put us in a position where we have to make a choice around data consistency or not? Why not just include it as the default? Data is always consistent, right? That's wonderful. And then the last two, because of the architecture, these are kind of bonus features that you get with it. It's kind of like a batteries-included option for having distributed SQL. So the one is resiliency, right?
So that's being able to survive failures. And having been in the distributed systems space for a number of years, and Jim, you were too, working with Cassandra, there's so many different failures that can happen in a distributed system. And it's gotten so much better over the course of time. But just really living those years, I mean, trying to manage state and trying to manage configurations between all the different nodes, it was just hard with distributed systems. And it's gotten so much better, to the point where you can actually rely on them. They are fault tolerant in a lot of different ways. You can lose a node, you can lose a rack, you can lose an entire data center and still have resilience. So that's kind of another key feature, a batteries-included option that comes with these distributed SQL systems. And the last one, which I think is really nifty and new, is this whole idea around geo-replication, right? Because you have a distributed SQL system now that can span not just data centers but different regions of the world, you can do some nifty things. You know, I had one customer describe what you can do with Cockroach as being almost like a, how did he describe it? He described it as like a, jeez, I'm forgetting the term. Of course, this is gonna happen. Right in the middle of the webinar, I lose my thought. It's like a mesh, like a network mesh of servers. Is that the term you're looking for? Yeah, like a CDN, that's what I was trying to remember, a content delivery network, right? So like Cloudflare, you can have data pushed all the way out to where it needs to be localized, right? So think of a CDN, but also with transactions, right? So imagine being able to do that. That's kind of what a distributed SQL database gives you, right?
You're able to push data all the way to the edges of where it needs to be consumed or accessed, but you also have a way of handling ACID transactions, like you've traditionally seen in single relational databases. Yeah, also the ability to deal with compliance things like GDPR, where you've got some data that's required to stay in Europe and you've got other data that's okay to span a cluster that's across Europe and North America, for instance. Chris, you have some children that you actually keep in Europe and that you don't let your kids in America know about, right? I was just trying to extend that metaphor. Sorry. Yeah, no, they always travel with us. I wouldn't be that bad of a father. Okay. Okay. But you know, one day it will happen. One day they'll go out on their own, they'll go to college, they'll meet a significant other and they'll travel around the world, and then we'll just be a distributed family. My wife's family is more like that. They're distributed all over the world, and my family is all from Long Island in New York and no one ever leaves Long Island. So, you know, that speaks well to data residency. If you're from Long Island, you pretty much stay there for your entire life. It's like localizing data there. That's what we do. We just stay on Long Island. But yeah, so when you think of the distributed SQL market, these are the five things to keep your eyes on, right? It's another way of re-imagining what we've done in the SQL world through the course of time. It's a better way of thinking about not just how to scale, but how to scale without having to stress out your application team, without having to stress out your database operations team, and being able to do these things in a different way. The last thing is kind of another bonus too. A lot of these databases can also be cloud agnostic, right?
They're not tied into one particular cloud provider. You can deploy them where you see fit, whether it's different clouds, whether it's on-prem infrastructure, even Kubernetes. I know we're doing this today for the Linux Foundation, and I know Kubernetes is very big in the Linux Foundation. Cockroach and some of these other databases all run in Kubernetes, which is a weird thing to think about, right? If you think about Kubernetes, you think about stateless apps, so why would you put a stateful database on Kubernetes? Well, because of the distributed nature of it, it allows you to actually handle failures. If a pod goes down, the database can maintain its state and still be resilient and serve the communities of applications or services that are accessing it. So that's another one. And then of course security, right? Distributed SQL systems are not going to really exist out in the marketplace unless they're fully secure. Enterprises are not going to adopt them unless they have full-blown security. So these are the key things. If you're looking at distributed SQL and you have these five to seven things in mind, this is one good way to start looking at some of these distributed SQL systems. Hey, Jim, anything you'd add? We should check the Q&A too. See if we've got any Q&A going on here. Oh, we do. We have a question. All right, so we have a question from Ollie. Doesn't the geographic aspect require having a backend close to? It does, right? So if you think about a database that's going to be geographically distributed, you have to have infrastructure in those geographies as well, right? So yeah, the backend, I mean, especially in cloud, this is rather useful, right?
If you think about all the regions that a lot of the cloud providers have, you can spin up infrastructure in those regions and then you can have these distributed SQL databases spread across or located in each of these different regions, but all communicating together, so that you have one logical database that just spreads across all these geographies. So, pretty cool thing. Before, trying to do that was replication gymnastics, right? You'd have one primary database and that would have to replicate over here, and then maybe you'd have to replicate to another region. Then if you're trying to coordinate this with multiple writes, right, you have writes from all these different databases and you have to sync the data between all these different regions and reconcile it. It's a lot of gymnastics. But being able to do this in one database that can spread across all these different regions and have consistency, have reads and writes that can be done anywhere, really opens the doors for simplifying your architecture. Yeah, I guess we touched on cloud agnosticity. Agnosticity. That's a new word today. Yeah. Just invented it, Jim. Yeah, yeah. I mean, that's an interesting concept. We talked a little bit about cloud-augmented databases, and they're kind of the quote-unquote cloud databases out there. You think about Google Spanner, you think about Amazon Aurora or Amazon RDS or Amazon DynamoDB, and Microsoft has Cosmos. They have a lot of interesting features and they would fall under the umbrella of distributed SQL. But one thing to think about if you're considering moving to one of those databases is, if you go to Amazon Aurora, it runs in Amazon, and if you decide one day I don't want to run in Amazon anymore, well, you can't run Amazon Aurora, you know. And there are certainly vendors out there. I mean, Amazon keeps moving into different spaces.
I don't mean to pick on Amazon, but you hear about Amazon bought a pharmacy company, Amazon bought a banking company, Amazon bought a transportation company. All of a sudden you find that Amazon's a competitor to you, and so you think, oh, I need to move off of DynamoDB, and that means you have to migrate your whole database to something else, because I can't take that Amazon DynamoDB and run it on prem or move it to GCP or something. So there are several flavors of distributed SQL, including Cockroach, that are cloud agnostic. They offer the same type of features that are available in the cloud and they offer the cloud-native aspects of some of those things. They can take advantage of elasticity and they can take advantage of the scale of the cloud, but they can run in any cloud. So Cockroach, for instance, can run on IaaS resources, and you can even have a multi-cloud deployment of Cockroach where there's a single cluster, but you've got some nodes running in AWS, some in Azure and some in GCP. And so for certain use cases that cloud agnosticity, as we've defined the term, can be an important factor, whether you have a survival goal of, I want to be able to operate even if Amazon us-east-1 goes down, or you just want to have some leverage when you're negotiating your cloud contract, just to say, hey, look, I can move off of this cloud if I need to. So there's some benefits there that I think are worth considering. Yeah. Yeah, portability is a big one, right? You shouldn't be constrained to being locked into only one place. You should always be portable.
Yeah, so this kind of sums up what you can see with distributed SQL databases now. So, you know, folks here might not have seen a database just like this, but here's what you'll kind of see in this space now: you'll have a database that's kind of the middle layer that you see there with all those purple boxes, where all these nodes or all these servers, in Cockroach's case, every node can take a read or write. Some of these distributed databases can't always do that, but that's the idea, right? You want to be able to distribute the workload across all these different nodes. Any node can take a read or write, and a new, interesting way of accessing that is through a load balancer. And, you know, you always think of load balancers in front of other things, but in front of a database is kind of an interesting thing. What you're doing is basically round-robining the traffic around each of the different nodes of the database because you don't want all the traffic to go to one node. With those cloud augmented databases that we mentioned before, well, you have to do that, right? All writes need to go through one primary writer and that gets all the traffic, and then there's probably a reader or read replicas that don't get that write traffic. But in this new world, you could just send your reads and writes everywhere. Now, what this gives you is that all of a sudden you have this horizontal scale, and you have vertical scale too, but you get this, you know, theoretically endless number of nodes that you can add to a database. So it's useful if you're scaling, you could just scale kind of on demand, and some of these databases allow you to scale without any downtime. That's a big piece here.
Because of the way the systems are architected, unplanned downtime is, you know, not as critical as it used to be, because these distributed systems can tolerate failures a lot better. But where this changes the game is planned downtime, right? Instead of having to say, all right, everyone come in on Saturday because we're gonna do the upgrade or because we're gonna upgrade our Linux versions or so forth, now it just becomes taking a node out of service, doing the maintenance that you need and adding it back into the cluster, right? So planned downtime can actually be done intraday, you know, it doesn't have to be done on the weekend. So a lot of goodies come with this type of architecture and the main ones are really at the bottom of the slide there, right? It's built-in resilience, and built-in resilience gives you all that goodness for unplanned downtime, but also for planned downtime. Horizontal and vertical scale we mentioned. Automatic data sharding, this is a lovely one, right? You see that we have the three apps and services up top. If we had a sharded database, your apps team would have to say, all right, app service one, you go to database one, app service two, you go to database two. Now all they have to do is connect to the load balancer. So to your app developers, it looks like they're connecting to a single-instance database, but under the covers, there could be hundreds of nodes behind that load balancer. Right, so you abstract all that away from the app developer. So they could just develop their app and just point to one endpoint to access the database. All that sharding is done in the back end for you, which is just lovely, it really is. It's such a lovely feature.
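To make the load balancer idea concrete, here is a minimal sketch in Python of round-robin routing with a node drained for maintenance and then restored. It is a toy model, not how any particular load balancer or CockroachDB itself is implemented; the class name `RoundRobinBalancer`, its methods, and the node names are all hypothetical.

```python
class RoundRobinBalancer:
    """Toy round-robin balancer: apps see one endpoint, traffic fans out to nodes."""

    def __init__(self, nodes):
        self._nodes = list(nodes)          # fixed rotation order
        self._in_service = set(self._nodes)
        self._next = 0                     # index of the next node to try

    def drain(self, node):
        """Take a node out of rotation, e.g. for an OS patch or an upgrade."""
        self._in_service.discard(node)

    def restore(self, node):
        """Put a known node back into rotation after maintenance."""
        if node in self._nodes:
            self._in_service.add(node)

    def pick(self):
        """Return the next in-service node, skipping drained ones."""
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next]
            self._next = (self._next + 1) % len(self._nodes)
            if node in self._in_service:
                return node
        raise RuntimeError("no nodes in service")
```

An application would call `pick()` for each new connection. Draining a node only changes where new traffic goes, which is the property that lets maintenance happen during the day in this simplified picture.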
And then lastly, we talked about geographic distribution, right, being able to service these nodes all throughout different regions and data centers and so forth. One thing I'll point out here is this is kind of inferred, but there's replication involved. So, you know, when you write a piece of data into a distributed SQL database, you expect that that data is actually gonna be written in several places. So you might imagine, I'm gonna write this customer record and it's gonna end up actually on, like, nodes one, two and three here. So having the data replicated in multiple places is part of the reason we're able to do things like, you know, we're gonna take node one down and patch the OS, because the data is actually in several places. And I think some of the magic of distributed databases is that that all feels just transparent. Like your application doesn't really know that that's happening. It's all handled at the data layer. So, you know, we're gonna replicate the data, we're gonna distribute multiple copies of the data across multiple places. And then we're also gonna have a highly consistent database. You know, those are two things that are very much at odds with each other. How do you keep multiple copies of the data and keep them totally in sync and totally consistent? And I think that's where the magic is, in having a SQL layer that sits on top of a distributed storage system. Because, you know, when we talk about SQL in the simplest sense, we mean the language, but SQL implies there's data consistency, there's joins, you know, the things we expect from a relational database. I'm not gonna write into a table, immediately read that data back and get some other value and think, well, that's weird. You know, why isn't my data consistent?
You know, that layer that maintains the consistency across multiple replicas is really important. Yeah, perfect. Cool, well, you know, I wanna be cognizant of our time. We actually have a couple of questions in the Q&A. I actually don't wanna go to the thank-you slide yet. Let's just go to some questions. And if folks have questions from our presentation, feel free to put them in the Q&A. We'll just get to a couple. So, you know, let's pick off a couple. So, Vikran asked a great question. Is CockroachDB open source? It is. And, you know, that's in the spirit of the Linux Foundation as well. If you wanna go see CockroachDB, you can go to GitHub, github.com/cockroachdb, and you can see the source code there if you'd like to. You know, we do take community contributions as well. So we love being open source. It's a great tool for us to engage with partners, to engage with customers. So yeah, CockroachDB is open source. Thanks for the great question, Vikran. All right, so we have another one from BoKar, wondering if agnostic means no dependency on a cloud provider. So since Jim coined the phrase agnosticity, did I say that right? You did, yeah. You wanna answer that one? Yeah, I mean, you know, the term agnostic in general means I'm not relying on one. So, yeah, if we say cloud agnostic, we just mean we don't care which cloud. You know, we're not married to one cloud. So there's probably a family metaphor in there, but I'll just get past it. Okay. All right, cool. So we have another question from Samson. Is there a leader-worker in the cluster? Great question, Samson. So in this case, there isn't like a traditional one, so if we go back a couple of slides, right? Let me see if I can just click back.
Right, so if we have this leader-worker setup, like you have here on the bottom of the slide, where this node or server all the way on the left, the W, would be the writer and then you'd have all these readers, that would be more of a leader-worker setup. Cockroach is architected a little bit differently. The way Cockroach works, and we don't have a good slide to show you this here today, is that all the data is sharded across all of the different nodes and these shards are called ranges, and each one of those ranges has a leader, and that leader will help coordinate the reads and writes. So what happens is that you have reads and writes that can be done all over the cluster, because now it's distributed. So instead of thinking of a leader at a node level, a leader is really at a shard or a range level, and that's how the distributed reads and writes work within Cockroach. Okay, let's go to the next question. All right, so Sam is saying, I am new to this database. Is it possible to run a distributed SQL database on a single in-house multi-core server for intranet access only? Or does it always require a cloud connection? What are the names of distributed SQL databases that you recommend? Okay. Do you mind if I take that one? Yes, sure, go for it. So yeah, for Cockroach, CockroachDB, we replicate the data three times by default. So for a Cockroach cluster, the minimum size you want is three nodes, and that way you're able to have one replica per node. And in that configuration, you can take one of the nodes down and do some maintenance, bring it back up, do the next one. So to get that kind of high availability, the minimum size you want is three, even for like a dev network. Now, we do have the ability to spin up Cockroach on your laptop in a single-node configuration.
That's really just for, it's not something you'd want to, certainly not for production, probably not even for dev, but if you just want to kind of play with queries. And when you're running in that mode, you just create one replica. So there's no taking down the node and still getting to the data elsewhere, because there's just one copy of it. Yeah, and Sam, just to answer the second part of your question too. So yeah, Cockroach specifically, you can totally run in-house, on-prem, if you wanted to. Usually you'd run it across multiple servers instead of one big multi-core server, just because it's a distributed system. Running a distributed system on a single server doesn't really give you the benefits that we've called out here today. You also asked, what are the names of distributed SQL databases that are used out there? So, I mean, obviously Cockroach. You know, Spanner was one of the first ones that kind of pioneered this space. And then there are some other ones as well. There's TiDB and Yugabyte and some others, and then Jim mentioned a couple, like Aurora, but they follow more of a leader-worker type of arrangement. All right, so thank you for the question, Sam. And let's see, let's go to another one. All right, the questions are really starting to come in now. So Ollie's asking, is there a separate node to ensure all database updates and migrations? Okay, so I guess he's asking if there's a coordinator that does all the database updates and migrations. I could take it, or do you wanna take it, Jim? Sure, I'm happy to take it. So, yeah, with Cockroach, the queries coming in, the requests coming in, select statements, update statements, et cetera, you know, those get round-robined between the nodes. And like Chris said, we break data down into ranges. And those ranges are done by size.
So if you have like a customer table, you know, with millions of records, we'll break that customer table down into ranges of about 512 megs in size. And the records within that range will be sorted by the primary key. And then each one of those ranges gets replicated three times. So on this diagram, you might see, you know, the range where customer one through, I don't know, 200 live, they're all replicated on nodes one, two and three. And one of those replicas will be the leaseholder, and all the writes have to go through the leaseholder and all the reads have to go through the leaseholder. And then when you write, you have to write to the leaseholder plus at least one other copy. So that's where we need quorum. So anyway, that's kind of the basic mechanism. And then we use the Raft protocol in Cockroach, which is a distributed consensus protocol, to make sure that we can guarantee the atomicity of writes. But those leaseholders can move around. So say node one was the leaseholder, and you're writing through there and you're doing your reads through there. If that node went down, there are mechanisms where, within those ranges, they're kind of gossiping and talking to each other and saying, are you up? Are you up? You're up. And there's a process called leader election where, if node one goes down, the nodes will say, hey, we lost our leader. And, you know, basically one of the other replicas will become the leader. And then there are some auto-healing things built in, where now that we have two of the three replicas available, we might need to create a third replica to get us back into a fully replicated state. So at any given point in time, there is one node that you need to talk to, but that node is not written in stone and your app doesn't need to know where it is. Your app just needs to say, as long as I can connect to Cockroach, Cockroach is gonna route me to the right place.
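The mechanism just described (ranges keyed by primary key, three replicas, a leaseholder, quorum writes, and a new leader when the old one goes down) can be sketched as a toy model in Python. This is a deliberately simplified illustration, not CockroachDB's implementation: the names `Range` and `find_range` and the node labels are hypothetical, and real Raft only elects a replica with an up-to-date log and catches up lagging replicas, both of which this sketch omits.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Range:
    start_key: int        # lowest primary key stored in this range
    replicas: list        # names of the nodes holding a copy (three by default)
    leaseholder: str      # replica that currently serves reads and coordinates writes
    data: dict = field(default_factory=dict)  # node name -> {key: value}

    def quorum(self):
        # Majority of replicas: 2 out of 3.
        return len(self.replicas) // 2 + 1

    def _ensure_leaseholder(self, live_nodes):
        # Toy "leader election": if the leaseholder is down, promote a live replica.
        if self.leaseholder in live_nodes:
            return
        candidates = [n for n in self.replicas if n in live_nodes]
        if len(candidates) < self.quorum():
            raise RuntimeError("range unavailable: quorum lost")
        self.leaseholder = candidates[0]

    def write(self, key, value, live_nodes):
        # A write succeeds only if a majority of replicas can accept it.
        self._ensure_leaseholder(live_nodes)
        acks = [n for n in self.replicas if n in live_nodes]
        if len(acks) < self.quorum():
            raise RuntimeError("range unavailable: quorum lost")
        for node in acks:
            self.data.setdefault(node, {})[key] = value
        return len(acks)

    def read(self, key, live_nodes):
        # Reads are served by the leaseholder.
        self._ensure_leaseholder(live_nodes)
        return self.data.get(self.leaseholder, {}).get(key)


def find_range(ranges, key):
    """Locate the range for a key; `ranges` must be sorted by start_key."""
    starts = [r.start_key for r in ranges]
    idx = bisect.bisect_right(starts, key) - 1
    if idx < 0:
        raise KeyError(key)
    return ranges[idx]
```

With three replicas, losing one node leaves two, which is still a majority, so reads and writes continue after the leaseholder moves. Losing two nodes leaves the range unavailable in this model rather than letting it serve possibly inconsistent data.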
So, you know, you can read and write to any node, and any node you talk to might have to redirect you to some other nodes to involve those leaseholder activities, but that's all kind of transparent to the app. You know, as far as you're concerned, you're just talking to this one single load balancer and you're getting your reads and your writes done. Cool. I know we're just about out of time, but we have a few more questions. Let's try to hit them if we can. So, I love David's question here. He asked, can you give customer use case examples where geo-replication has been used for localized read and write requirements? Any examples would be good. Could be local compliance, controlling data access from other regions. Sorry for the newbie question, but very interested in what you had to say today. So, yeah, we've actually run into this a lot. A lot of the use cases that tend to come to distributed SQL are the same use cases you know of today, right? It's really about modernization of the same use cases that you know. Where we've seen this, especially with geo-replication or being able to just fan the data out to multiple different regions, is with a lot of our SaaS provider clients or customers, right? Where, you know, they have customers accessing their service from different parts of the world and they wanna have data locally. So, this use case, sometimes we refer to it as metadata, right? It's metadata for powering a particular service or for a SaaS provider. We see the multi-region use case used predominantly for those types of use cases. Identity access management's another one, right? So, if you think about identity management, when users need to authenticate or be authorized to see data, again, if you can push that authentication and authorization data out closer to the users, they'll authenticate or authorize faster.
So, yeah, it's a bunch of interesting use cases. Like I said, these are the same use cases you've known all throughout these years. They're just different ways of reimagining them in the distributed SQL environment. Yeah, typically either you're trying to get the data closer for latency purposes or you're trying to get the data in certain places for compliance reasons, I would say. We're gonna hit some quick, easy ones. So, Shakar asked, will it work in containers? Quick answer is yes. You wanna run Cockroach in containers? You absolutely can. We work very well in Kubernetes. What else do we have here? There was a question, an anonymous attendee asked, is CockroachDB suitable for OLAP workloads? Typically, Cockroach is purpose-built for OLTP workloads. There are some OLAP things that we can do, but it's not our primary wheelhouse. OLTP is really the main wheelhouse. Let's see. There's a question here about whether we prefer certain Linux distributions, and the answer there is we don't. We're pretty Linux agnostic as well. Usually in production, we run on Linux or Unix. We can run on Windows, but we don't really recommend it for production. Okay, cool. So we have, again, another anonymous question. Do you have KPIs for distributed SQL versus the traditional SQLs? With replication across three nodes, as Jim mentioned, what is the trade-off from SQLs to NewSQLs? Great question. This comes up all the time. Folks ask us all the time, what are the benchmarks? How do you compare benchmarks? Your best benchmarks are your own, right? You can take any of the standard benchmarks out there, TPC-C, YCSB, there's a ton of them, but your best benchmarks are your own. And because Cockroach is a distributed SQL system, there's a small penalty you pay for doing a write, right? Because you're going to another node to make sure that we have data consistency. In a single-instance database, you don't have to pay that penalty, right? So you can be a little bit faster there.
So the trade-off might be a bit of query latency, and in return you get all these other things as far as resilience, horizontal scale, online schema changes. So like I said, the best thing to really do is look at your own benchmarks when you're looking at performance. Like I said, the performance in Cockroach is awesome, but it will probably never be as high-performance for writes as you would have on a single-instance database. So you just have to keep that in mind. All right, we could probably take about maybe one more question. I don't know, Jim, do you have anything teed up? Otherwise, I'll look through this laundry list of great questions we have here. Mark's asking about whether we support transactions. So a transaction being, I gotta insert into the customers table and also the orders table and maybe the order item table. And I wanna do that in a transaction where it's ACID compliant, meaning it's atomic, it's consistent, it's isolated and durable. So we do, and the mechanism by which we do that is the Raft protocol. But basically, if you're gonna write a multi-statement transaction, you get round-robined to the node that the load balancer sends you to, and that node we call the coordinator node or gateway node. And then it's responsible for figuring out what the right range is for each one of those inserts you're doing. And it coordinates with the leaseholders, which might be on other nodes. And they create a write intent, which is like, hey, I've written the data, but it's not committed yet. And anyway, that coordinator node makes sure that all those writes happen to those various tables. And then when each one commits, then the transaction itself commits. So in Cockroach, we provide full ACID compliance. And yeah, there's some pretty cool tech behind how all that happens. But yeah, so there's no eventual consistency in Cockroach. It's very strict consistency. We're a CP system in the CAP theorem, if that helps. Cool, well, this was fun.
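The write intent idea from the transaction answer can be illustrated with a small Python toy: writes are staged as provisional intents, readers only see committed values, and a commit makes every intent of the transaction visible together. This is a sketch under simplifying assumptions (one in-memory store, no ranges, no Raft, a single status flip as the commit point), not CockroachDB's actual protocol; `ToyStore`, its methods, and the table names are hypothetical.

```python
class ToyStore:
    """Toy key-value store with provisional write intents per transaction."""

    def __init__(self):
        self.committed = {}    # (table, key) -> committed value
        self.intents = {}      # (table, key) -> (txn_id, provisional value)
        self.txn_status = {}   # txn_id -> "pending" | "committed" | "aborted"

    def begin(self, txn_id):
        self.txn_status[txn_id] = "pending"

    def write_intent(self, txn_id, table, key, value):
        # "I've written the data, but it's not committed yet."
        slot = (table, key)
        holder = self.intents.get(slot)
        if holder is not None and holder[0] != txn_id:
            raise RuntimeError("conflict: another transaction holds an intent")
        self.intents[slot] = (txn_id, value)

    def read(self, table, key):
        # Readers only ever see committed data, never pending intents.
        return self.committed.get((table, key))

    def commit(self, txn_id):
        # The status flip is the single commit point in this toy model.
        self.txn_status[txn_id] = "committed"
        self._resolve(txn_id, keep=True)

    def abort(self, txn_id):
        self.txn_status[txn_id] = "aborted"
        self._resolve(txn_id, keep=False)

    def _resolve(self, txn_id, keep):
        # Turn this transaction's intents into committed values, or drop them.
        for slot, (owner, value) in list(self.intents.items()):
            if owner == txn_id:
                if keep:
                    self.committed[slot] = value
                del self.intents[slot]
```

A coordinator in this model would call `begin`, stage one intent per table (customers, orders, order items), then call `commit` once. If anything fails along the way, `abort` discards every staged write, so either all three rows become visible or none do.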
I know we do have some more questions that are out there. I don't know if maybe we can follow up through email with some of these, but Candice, I might hand it back over to you to kind of wrap us up here. And I just wanted to say thanks to everyone that joined the webinar today. Always fun doing these things. If you wanna know more about Cockroach in particular, we do have a Cockroach University where you can learn more. But we hope today's session was great and we'll kick it over to you, Candice. Thank you so much, Chris and Jim, for your time today. And thank you everyone for joining us. As a reminder, this recording will be on the Linux Foundation's YouTube page later today. We hope you join us for future webinars. Have a wonderful day.