Thank you so much, Candace, for the wonderful introduction, and welcome, everybody, and thanks for joining today. As noted, please do ask questions along the way. I'm happy to take them as I'm talking through things with you, and I'm super excited to get into this. The concept of this as a presentation really arose over the last, I'd say, three or four months. Funny enough, I was at KubeCon in Valencia, gosh, I guess it was about a month ago, and I was having a really good conversation with some people who are looking at databases and thinking about infrastructure and Kubernetes. They were pretty well advanced in that. And what I realized is that we're doing a lot of work in becoming cloud native in the lower layers of our infrastructure, right? I think what the CNCF does around governing the projects that lend themselves to this whole movement towards becoming cloud native is a great job, and I think there's a lot of value in it. And it was great in Valencia to see so many people doing real things. But in light of everybody going in this direction and doing real things, one of the things I realized was that as we become more and more cloud native, the more we can push these principles up the stack, not just in infrastructure but into our data and our databases, and then even further into the way that we think about our applications and the services that we build, that's where the real value is had. It's having this end-to-end tie of really thinking through the principles of being cloud native, about automation and being able to scale and being resilient, these core concepts. And I found it very intriguing, almost fascinating, to see how these things work.
Now, today's conversation is really about how you push that up into the data layer. And I do believe that the database is infrastructure. I think it's below the line in terms of infrastructure versus app, because a lot of things that are done in the database are really critical in fueling the way that we think about our applications. And so this is more or less a beginner session, if you will. I do like to give people a little bit of a caveat before I start about which level we're going in at. This is basically what to think about when you're choosing a database, and in the context of cloud native and Kubernetes and distributed systems, I definitely have an opinion. I am a principal product evangelist at Cockroach Labs. Cockroach Labs makes a cloud native database. It's a database that was built with automation of scale and inherent resilience in it, really built to be a distributed system. And honestly, it was the very first application I ever saw run on Kubernetes, about five or six years ago, and when I saw it, I couldn't unsee it. So my name is Jim, you can follow me on Twitter at James. I own basically all things James James dot com. You can mail me anywhere. I'm Jim at Cockroach Labs, but my job is to go out and talk about these things. I'm fairly passionate about them. And so hopefully this is valuable today. But as Candace noted, if you do have questions along the way, please throw them in the chat. I am happy to take them. I'll jump in and check them out along the way as well. And by the way, stick around till the end. We'll definitely make it worth your time. If you stick it out all the way to the end, there's a little O'Reilly book with a cockroach on it. Anyway, databases. The choice of a database typically comes down to one thing. What did I use in my last project? What am I familiar with?
What do I already know? What does my friend use? What does the tutorial use? And honestly, I find this approach to choosing a database is not very responsible. You're just hoping that thing is going to work in the context of what we want our applications to actually go out and do. And I think if you talk to most developers, this is the way that decisions are made around a database. And this is such a critical moment in time, because the database powers a lot of things that happen in the application. So take a step back and think through what it is we want to accomplish with this application. What are the various different workloads in that application? What are the various different tools that I need to actually deliver on what we need to do? I think that's really, really critical. And unfortunately, database choices just get made when somebody starts building. Fortunately and unfortunately, it's very easy to do these things in these modern environments. So I think we all need to take a step back and think through what the value of the database is in the overall context of not just my application logic and what I'm trying to do, but all the things that go around that once I'm successful. And that's really what this talk is about, right? Because these seem like very easy decisions. But if the wrong choice is made, you can wind up with something that's an impedance mismatch with what you're actually trying to accomplish. And so we're going to talk through some of those things around how people make these decisions. What are our infrastructure choices? How does that actually weigh into the choice of what I'm doing? What are the requirements of that workload? What are the data requirements within the context of that?
And then costs and the available resources that you have to manage these things. So it's a quick little framework of how we're going to talk through things here. I always start these conversations the same way when people are asking me about databases. And look, I work for a database company. My job is not to make Cockroach work everywhere. My job is to make sure that the right database is used for the right workload, and the right workload and the right application really come down to how you are going to deploy this thing. What is your application architecture? Depending on what that is, certain databases do well in certain situations. In the old world of databases, the way it worked is I code here on my machine and, underneath my desk, I had an instance of a database running. I'd serve up Tomcat and the like. That's way back in the day. Today we're building these distributed systems where there are microservices running all over the place, all accessing a single consistent RDBMS. Does that thing need to live within my four walls? Is it a managed service that I just grabbed from some cloud provider or some database provider? How is the application and deployment architecture of my application going to affect my choices? Is it going to be global? Is it going to be across the whole United States? Am I running services in multiple different data centers? What does that mean for my data? And I think that's such a critical component. I really like to start there in all conversations about the database, because the way that we deliver applications has fundamentally changed.
And I don't care if it's an app that you're marketing to thousands or hundreds of thousands of developers or it's an internal app that's being used by a department within your organization. The sea change has happened. In general we no longer build software the old way: create a makefile, build it, burn it on disk and send it to somebody. God, that was a long time ago. Nor do people just download and install as much anymore. Really, the shift has happened where we now publish software. We publish it as a service. People subscribe to that software. We provision an instance for them, right? And so this new way of thinking about the delivery of your application is also pretty important when you start thinking about the data that lives underneath that. Am I gonna have an instance of a database for each instance that I provision for each of my customers or each of my tenants of my particular service, or am I gonna have one big database that's gonna take care of all that? Okay, so how do I segment that, right? These are all really big decisions when we're thinking about where the data is gonna live and how it's gonna be served up into our application. Now, making the wrong choice in this context can result in a whole lot of technical debt, because this is one of those things: I'm really good at building my service, but this backend infrastructure for managing customers and accounts and usage and all the metadata that goes into these services that I'm deploying for people is a wholly other world. And so how do we create less of a challenge as we're building out something that isn't fundamental to the service or the application that we're building? Again, a key consideration as we're choosing a database, because ultimately the database is where this stuff is all gonna live. That's why we do these things, right?
And so I got a little ahead of myself in slides, but cloud services: where does my application data live? What is the system of record across all these different things? Does it need to be consistent? Is it available? Is it available everywhere? Should I be using a document store or a relational database or KV or graph? What are your choices, and what's right for what you wanna accomplish? And this is one of those areas where it matters how you are gonna deliver that software. So first, how it's gonna be deployed. Second, how are you gonna deliver that thing? I think the third point is the database isn't the only thing. Sorry, your app, center of the universe and as great as it is, is not everything, because typically within even the smallest of organizations there's gonna be a lot of things that live around your application. Is there a data warehouse where you're doing ongoing analytics? Is there a streaming analytics engine like a Kafka or a Confluent or something like that where you're taking things off in an event-driven architecture? I don't know, it depends on what your application is, but can the database integrate with these things? And these integration points go well beyond the data itself. Is the database gonna integrate with things like your security frameworks and your authorization frameworks? Is it gonna work directly with Kerberos and LDAP and Active Directory and all the key things that go around it? Am I gonna be monitoring the database using things like Datadog? And so these integration points are actually pretty important too, right? So there's the deployment, how I'm gonna deliver things, and then how it's gonna fit. How is it gonna work the way that you work?
And so thinking through those things is also a key consideration in choosing the right database, because if it doesn't work with these things you're gonna be stuck with, again, technical debt: managing these things yourself or trying to build them out yourself because it didn't integrate with something that should have been easy, right? So again, how you will deploy is actually pretty important. In the context of this webinar, and I think of the Linux Foundation and CNCF and Kubernetes, man, is that ever important to think about when you're thinking about the database. Typically, on the left-hand side, it's without any sort of Kubernetes. What are you wrapping with Kubernetes? What's running on and in Kubernetes in your deployment? Is it just the business logic? Is it just the services? Or are you running the database within Kubernetes? Which is the far right. Maybe you're running all your services and you're accessing some managed service within a cloud provider. That's great, it definitely works. Does it make sense for you to gain the scale of Kubernetes, or this always-on, resilient nature of Kubernetes, and run, say, I don't know, a distributed database within that environment? It might make sense for you. But again, it really depends on your team, how you wanna deploy, and what's right. I do believe running databases inside a Kubernetes cluster can be really quite easy. I think people struggle with that. They just don't know there are certain things out there that actually are cloud native, right? And so think this through and understand whether the database is gonna actually fit within your deployment style from a cloud native point of view as well. Another key area to think about. So the second part of this conversation is about workload. And when I think about workload, I think about what the application is doing and what the data is that we're consuming here.
And I think if we ask ourselves, what is the data and what is the nature of the data? What are the requirements around the data? We'll get to the foundation and the lower-level thinking about which database is correct for us, or what solution is gonna actually eliminate most of the headache that I have with my data. And so I like to start thinking about data in terms of data modeling. Look, I'm a relational database person. It's where I work, where I'm at. And I've always thought in relational database terms. I never really understood the document model. I think I gave up on document around X to no. But I think about data integrity. I think about referential integrity. I think about joins. I think about aggregate views. I think about secondary indexes. I think about all the various concepts within a relational database that provide a whole lot of power. If you're doing it in, say, a document store or some other store, do you have to code those things yourself? Is that technical debt that you're adding? However, sometimes a document database or a document store makes a whole lot of sense. I like to think of it like this: if it's the system of record, where data needs to be correct and data integrity is actually a requirement, it's important, it needs to be consistent, it needs to be reliable, then it's more of a relational workload. Am I going to be doing some more complex things with the data around joins and aggregate views? Am I going to want to change the way I look at the data or query it in different ways? Do I want to have an easier integration with my data warehouse? Because gosh, man, a lot of the analytical stuff just uses SQL, right? Is that important? Or is the workload itself more of a system-of-access thing, where I know what the pattern is, I know what the data is going to be, and it's not going to change a whole lot? I want really, really fast, reliable access to that data.
Maybe the document model is right for you. But think it through so you don't have this impedance mismatch. You have an impedance match, right? What are you trying to accomplish with that workload? And is the database and the solution right for you? All databases aren't everything for everybody. As much as database vendors would love to tell you, oh my God, we can handle it all, it's just not right. And it really comes down to the architecture. There's a reason we have lots of different databases: we're trying to do different things with different workloads. And so getting this match between the workload, what the data is, and what you want to accomplish is absolutely critical. Absolutely critical. And I think one other thing is actually pretty important. Let's just say there's some interesting marketing from database vendors. Everybody says, oh, you have to have an ACID transaction. ACID this, ACID transactions. ACID is this term that everybody throws around, but I'm not sure people really understand what ACID means. And I think it's actually a pretty important point to ask about. Ask your vendor if they know what ACID is. That's a great question to ask. And more importantly, what does the I in ACID mean? What does that isolation level mean? Because I'll tell you right now, nine out of 10 developers have no idea what an isolation level is. They think it's a performance tuning parameter, but they don't understand the concepts that go into isolation of a transaction within a database. So get familiar with these concepts and understand what the various isolation levels mean within the context of you and your data, because really it's gonna come down to the throughput, the number of transactions you're doing, the chance of overlap between transactions, and really the appetite you have for correct data or not.
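To make the isolation point a little more concrete, here is a minimal sketch of one classic anomaly, the lost update, where two overlapping transactions both read a value and then write back a result based on that stale read. It is a toy model in plain Python, not a real database and not how any particular product implements isolation; the `run_interleaved` function, the `version` counter, and the fixed schedule are all hypothetical names and assumptions made for illustration.

```python
# Toy model only: a dict stands in for the database, and the "schedule"
# (both transactions read before either writes) is fixed by hand.

def run_interleaved(check_conflicts: bool) -> int:
    store = {"balance": 100, "version": 0}

    # Both transactions take their read before either one writes.
    t1_read = dict(store)
    t2_read = dict(store)

    def commit(snapshot: dict, amount: int) -> bool:
        # With conflict checking on, refuse a write based on a stale read.
        if check_conflicts and snapshot["version"] != store["version"]:
            return False
        store["balance"] = snapshot["balance"] - amount
        store["version"] += 1
        return True

    commit(t1_read, 10)
    if not commit(t2_read, 10):
        # The rejected transaction retries against fresh data instead of
        # silently overwriting the first transaction's result.
        commit(dict(store), 10)
    return store["balance"]


weak = run_interleaved(check_conflicts=False)    # second write clobbers the first
strict = run_interleaved(check_conflicts=True)   # second write is forced to retry
print(weak, strict)
```

Without the conflict check, two withdrawals of 10 leave the balance at 90 instead of 80, because one update is lost. With the check, the second transaction retries and both withdrawals are applied, at the cost of extra work. That is roughly the correctness-versus-throughput trade-off to ask a vendor about.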
Is data consistent? Is it eventually consistent? So understanding the isolation level of the database is pretty important. This is a great little diagram of the various levels of consistency, going from the bottom all the way to the top, where you're gonna be guaranteed consistent at the very, very top. If you don't know Jepsen.io: Kyle does analysis of distributed databases on this website, and it's pretty awesome actually. Go check it out. They have a pretty good description of these consistency levels, these isolation levels, and really some of the things that can go wrong with your data. Now, ultimately I think every database has a default isolation level. For Cockroach, we're serializable. We actually just wanna take this toy out of everybody's hand. Let's be serializable and then tune the craziness out of the performance of the database underneath it. But let's let reads happen, relax, so to say. Again, this is not a commercial for Cockroach. But we are that. Other databases have default isolation levels as well. I think it's important for you to understand what that default isolation level is, because understanding the various things that can go wrong with your data is actually pretty important. Now, remember there are trade-offs between isolation level and performance in your database. And it's pretty important to think through that too. So again, what are your requirements around the data, and what is the consumer experience around that application? Again, all of these things are pretty important. So again, the I in ACID: a big deal. Don't let people just say, oh, we have ACID transactions. Talk to them about it. It's actually pretty important. I think scale is also one of these things that we may not think about when we start building an application. It's like, I just got to get this thing to work.
I'm not thinking about onboarding my thousandth, let alone my millionth, user, right? And what that's gonna mean for the backend infrastructure that's supporting this thing. But I think we should think about these things up front. Why not architect for scale when you can just get that as an easy solution? And when I think about scale I really think of three vectors. Number one, the amount of data: how much storage is it gonna require, the size of the database if you will. Number two is how many transactions you are gonna have. We have customers doing like a million and a half transactions a second, and that's a crazy workload. And do they need to do those transactions all over the planet? Right? How are you gonna stand up to that? And then third is exactly that: are they gonna do that all over the planet? Where will our users be? I think of geographic scale. So size of the database in terms of storage, transactional volume, and then geographic scale are the three vectors around scale that I think people should be thinking about. Because ultimately, when you go to scale an application, scaling the database is really gonna come down to: get a bigger box, more compute and more storage, and I can have a bigger database. But that's gonna run into a limit. That's vertical scale, right? And once you run past this limit of vertical scale, you don't wanna go down that path. Most people, when they think about scaling up the database, will implement sharding. So we'll think about horizontal scale, right? Basically the concept is: let's split the database in two, and then we'll mediate where queries are going across these two different shards. It gets pretty complex though. Once you get to about three to five shards, oh gosh, resharding databases gets really, really complex.
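As a rough illustration of why resharding gets painful, here is a small sketch that assumes the simplest possible placement rule: hash the key and take it modulo the number of shards. That rule, the `shard_for` helper, and the synthetic `user-N` keys are all assumptions for the example; real systems often use key ranges or consistent hashing precisely to reduce the data movement shown here.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard by hashing the key (hypothetical hash-mod-N placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Synthetic keys standing in for rows in a manually sharded table.
keys = [f"user-{i}" for i in range(10_000)]

# Count how many keys land on a different shard when a third shard is added.
moved = sum(1 for k in keys if shard_for(k, 2) != shard_for(k, 3))
moved_fraction = moved / len(keys)

print(f"{moved_fraction:.0%} of keys change shards when going from 2 to 3")
```

Under this naive rule, a key stays put only when its hash gives the same remainder for 2 and for 3, which is true for about a third of keys, so roughly two thirds of the data has to be copied somewhere else just to add one shard. Every one of those moves has to happen while the application keeps routing queries correctly, which is where the operational cost comes from.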
And how many resources does it take to actually manage that sharded database environment? These are all things in question, but it's up to you to ask: how am I gonna scale? And what is gonna be the long-term impact on me of this database in terms of how I'm gonna address these things in the future? I'm not saying go with this database or that one, but it's important to think about this. Because look, if the application is something simple like, I don't know, I'm managing birth dates for all my friends and their kids, the scale of that thing is gonna be what, a couple hundred records, maybe a thousand. I may not need to think about these other components of scale, right? This is not as important. There's gonna be a transaction a day, and I just use Postgres and it's something simple, something easy, right? Again, it's really getting into the data and what your expectations are around scale and resilience and some of the other things that go into this. Because ultimately maybe you wanna span out across multiple different geographies, because I'm getting reads and writes all over the country. Are you gonna just have two instances of your application and then two big databases, and you're synchronizing those things? Do you have the resources to actually manage that synchronization? Do you know how to do that? What happens when something goes wrong? Are you gonna be okay with having downtime, and are there buffers and pooling between these two things? Thinking through how the database handles this regional scale as well is pretty important, because I'll tell you what, regions fail all the time. It happens in cloud providers. So can you survive the failure of a region, and how are you doing this with the database?
You know, without any sort of interruption, because ultimately what's causing the interruption is the database itself. And honestly, can I scale both reads and writes? Can all instances of the database take reads and writes? How do I deal with conflicts, these sorts of things? So geographic scale is one of these things that, if it is a requirement, gets pretty complex, and there are some key requirements on the database to handle this. And then let's go a little bit further. Let's get to broad geographies: Europe, US, South America. Can you deal with the latency between these various systems? If all my users in Europe have to access a write node that's in California, is that going to be acceptable latency from a user expectation? What about privacy and compliance? How do you set this up so that data doesn't leave a jurisdiction? Another key concept to think about when we're choosing a database, right? Regulatory compliance is another key area to think through. And then, I think of this all the time, uptime requirements really come back to not just the consumer experience of your application but also the business continuity of what you're trying to accomplish. And I think a great question to ask yourself when you're thinking about these things is: what kind of failures can I survive, and what do I need to survive? What is the failure domain that is acceptable for my application? Is it a rack? Is it a server? Is it an entire AZ? A region? A cloud provider? Do I need to insure against these things? Because ultimately, look y'all, everything fails. And so, understanding the failure domain for your application, your workload: what is acceptable for your application?
It's actually pretty critical, because ultimately the way that old relational databases work is we have this active-passive system where you're synchronizing from one to the next, and if the active or the primary fails you fail over to the secondary. That works, and maybe it's good enough for you. Or is it time to start thinking about, say, an active-active database where everything's on all the time and you have full HA across the board, because I don't wanna have to come back from a failure? Another great question in this context is: what about planned downtime? How often am I gonna update this thing? Does it do rolling upgrades? That'd be really nice. I don't have to take it out of production. More importantly, is the schema gonna change along the way? How much am I gonna be changing this application? Can I make those sorts of changes without having downtime? Right, because it's a critical requirement of what we're trying to do. Another great question to ask about the database, and it's really not about the database, it's really about your application: what are our uptime requirements? Now, let's find the right database to meet those, right? What are our scale requirements? Let's find the database to meet those. And those are two of the most critical. So we talked a little bit about deployment and architecture and data and scale and resilience, and some of the questions around them. I do think that the choice of the data store that sits underneath is about people and money as well. And when I say it's about people, it's not just the expertise around a particular solution that is right for you. It's about how many people, or the efficiency of resources, it takes to manage that, or the cost to get all these capabilities into a single solution to help you out, or the cost of integration with something that's not there. That sort of thing, right?
So resources are not to be dismissed in the context of this conversation, because ultimately, great, I can manage 104 shards of MySQL, but if it takes me 11 people to do that and I'm paying a fully weighted salary in the Bay Area, let's be nice, say it's 700,000, that gets really expensive really, really fast. So maybe I've used open source, but I have this huge, massive dependency bill because I didn't use the right database, right? I also think the cost of open source is interesting. With commercial open source, yeah, it's kind of free to start. You can upgrade features via open core. They give you support and maintenance. You've got to think about the costs that are involved in there. But I think more in terms of the support. You can adopt, say, an open source database on a cloud provider, but are you getting the support and maintenance that you need around that? Or do you go with a smaller vendor to actually get direct line of sight into the roadmap and the community, and direct support in a Slack channel? I think all these things are resource requirements. Thinking through what it is you're gonna have problems with, and how you're gonna solve those things, takes up resources. When you start to think about these cloud services for a database as well, often, and I'm not saying with all of them, there are hidden costs. Can I scale up storage and compute separately in terms of what I'm paying for? What does it cost to actually run this thing across multiple different regions? Do I need an add-on utility to perform change data capture? What about backups? Do I pay for backups? Is it part of the overall cost? What about IOPS? Am I getting charged egress costs, right?
And so there are calculators for a lot of the cloud services out there that will go through a lot of these different things. It's pretty important to think this through, because it isn't just the license cost of the database, or maybe it's free and open source, but all these other things have to happen. How is it gonna integrate with everything else I have in my organization? How am I gonna make all those integration points happen? So it's not just, hey, it's easy to get started and I've got this low time to value. It's the ongoing stuff that can actually start to show more cost. And going through these calculators, line by line, through the various things you have to think about is also pretty critical when choosing a database. And then there are the not-so-usual cost calculations. How many people does it take to manage this thing? Do I need SREs? How do we deal with developers who have issues? What about the time it takes to get this thing up and running? What about the time it takes to expand into another region and get another instance of this thing up, and the resources and the people that are involved in it? What about downtime? Is it guarded against downtime? What if I wanna upgrade the database? Do I do that in the middle of the night? What's the cost to the people that are working with me? How do we get back from this downtime? What happens when something fails, and I went from a primary database to a secondary, and all of a sudden the primary is back and I have to remediate and make sure they're the same across these two data sources and I haven't lost any data? There are so many of these not-so-usual cost calculations that go into the choice of a database. And I think it's important to think through these things. How do you architect them out of your overall cost structure long-term?
How do you avoid the technical debt that this might cause you long-term by making the right choice up front? And I think that's really the key of this whole conversation. So here are nine questions I think about when I start to think about databases. How will the database support my application architecture? We talked a little bit about it. How is it gonna integrate with everything else? Is it gonna fit my deployment style? What is the data? And what are my data integrity requirements for my workload? How am I gonna model these things? Is the document model right? Is relational right? What is scale? And remember, scale is not just the size of the database, it's transactional volume as well. Am I gonna have requirements for my workload that are global? Am I gonna have uptime requirements? Where will my users be? And then ultimately, what is my budget? And budget is not just money. It's not just the cost. It's the time and the resources and the people that are needed to actually manage these things. And I think these are nine key questions. I'm sorry, there's a couple of little typos here, excuse me. But those are nine key questions that might be good. I like to start with one thing. There's just one question behind all of these: what is my data and what are its requirements? If we think about a database, ultimately what that database is doing is managing data. And look, y'all, if databases were easy, man, we would have a whole lot of them. We already do have a whole lot of them, but we would build a new database for every application we have, because, well, gosh, I'd be able to hit the requirements exactly. And it's the requirements of the data that are really, really important in this choice.
So I implore everybody, don't just make the choice because you used something in the past. Ask deeper questions, get informed, understand some of the concepts so that you can actually make the right decision, one that's not just gonna get you going quickly, but is gonna allow you to actually avoid long-term technical debt. Now, at Cockroach, we really believe a distributed database is the way to go. The way that we implemented the database, it's one single logical database, and it can handle, gosh, incredible amounts of writes across many different regions. Every node is an endpoint, so they can all service writes. So we really look at this as a true distributed system. Across multiple regions, we can have a single logical database across multiple Kubernetes clusters. It's gonna integrate with all the backend applications, the data warehousing, the event streaming infrastructures. And so this is a general reference architecture that we use to help talk through where Cockroach can actually help organizations and talk through some of the issues. So if you're interested in that, we're happy to talk to you about that. You can get started with a serverless instance of CockroachDB right now. I won't go too much into this. It's free up to a certain point: five gigabytes of storage, 250 million request units. You can go try this. Basically, you get a Postgres instance for free. We're wire compatible with Postgres, so it looks a lot like that, but you never have to worry about scale or resilience and these sorts of things. And as I noted at the very beginning, if you're interested in learning more about CockroachDB, go check out our O'Reilly book. I think we're still giving it away for free on our website. So go check that out. I'm really happy that we got a cockroach on the front page of our O'Reilly book. It'd have been weird if they gave us a llama or something like that. So go check it out.
There's a QR code. You can actually scan it and get to that. And so with that, I wanted to thank everybody for joining me today. I don't think there were any questions, but if there are, I'm happy to take them. But I do hope this was valuable to you. I really tried to make this not about us and what Cockroach does in our database, and to at least give you some good thinking points around what you need to think about if you've never really thought about the database before. And so I hope there's at least a couple of things that were valuable to people along the way. And I do thank you for taking the time out of your busy days today. And I hope everybody has a great rest of their day. So thank you so much.
Thank you so much, Jim, for your time today. And thank you everyone for joining us. As a reminder, the recording will be on the Linux Foundation YouTube page later today. We hope you join us for future webinars. Have a wonderful day. Oh, and Jim, we got one question. Someone asked me if you could put the QR code up again.
Oh gosh, you guys, I was on mute. I'm sorry, can you see my screen again?
Yes.
OK, great. Sorry about that, everybody.
Don't worry.
There it is. QR code, definitive guide. Go check it out, read all about us, and then try it. I've got to tell you, we're happy for you to read the book, but try it. I think the database is pretty cool. So that's my big advertisement. So thank you, Candace.
Thank you so much, Jim.