Okay, let's go ahead and get started. My name is Ben Darnell, and I'm the CTO and co-founder of Cockroach Labs. I'm here today to talk to you about running CockroachDB on DCOS. I'm going to talk about this concept of a cloud native database, what that means and why you should be interested in using one; why you should be looking at CockroachDB specifically as your cloud native database; and why this kind of database is a natural fit for advanced orchestration platforms like DCOS. Then I'll give you a demonstration of the new DCOS package that we're announcing today.

So what is a cloud native database? We think of cloud native as a collection of features, including horizontal scalability, where you can just add new machines easily. Individual machines in the cluster aren't special: you don't have primaries and secondaries, you don't have different kinds of replicas, and the entire system is built to handle failures transparently and provide continuous availability. Why is this important? Because it helps your business adapt to change. Your database can grow and shrink based on demand, in both storage and query traffic. It gives you easy, rapid development, so your developers can make progress quickly and get changes into production easily. And the cluster can be self-organizing to balance load and reduce latency, which is especially important in global businesses, where the load typically follows the sun.

Traditional databases have trouble with these characteristics. They're more comfortable scaling vertically than horizontally. They make you distinguish between primaries, secondaries, read-only replicas, and so on, with the details varying from database to database. And you often have a manual, error-prone failover process, frequently involving asynchronous replication that may lose data when there's a failover. In CockroachDB, on the other hand, you can add nodes to the cluster at any time and the data automatically rebalances across them. Any node can serve any role; there are no distinguished primaries, secondaries, or anything like that. Failover is automatic, and everything is handled with consistent, consensus-based replication.

A cloud native database is a good fit for an advanced orchestration platform like DCOS because the platform provides a lot of the functionality you need to deliver this level of service. First of all, DCOS, like a lot of container platforms, provides elastic allocation and scheduling: you can say "I need ten copies of this process," and it will find space on your hardware to run them. It provides service discovery, virtual IP addresses, load balancing, and that sort of thing for managing communication both within the database and between the database and your application. And one of the most interesting things DCOS provides now is what we think of as weakly persistent local storage, which solves a dilemma you tend to face in container-based deployments of databases. In the first wave of container-based deployments, you had a couple of options for managing your actual data. One was to use a RAID array or some sort of network-attached storage that stores your data persistently, reliably, and redundantly. But that's expensive; you can spend tons of money on an enterprise-grade network-attached storage box.
But you don't really need that when CockroachDB provides its own redundancy internally. At the opposite extreme, container platforms will often give you just ephemeral disks that get wiped whenever your job gets rescheduled. That's no good for a database either, because if you lose too many nodes at once, you may lose all the replicas of your data. With the new DCOS SDK, we can take more control of the scheduling process and give you a stronger association between tasks and the data on your nodes' local disks, so you get a good degree of reliability without paying for multiple levels of redundancy.

And so today, here at this conference, we're announcing the release of our DCOS package, which you can install with a single command: dcos package install cockroachdb. This will start up a three-node cluster for you, and you can change the cluster size later by going back into the DCOS interface and changing the node count variable. You can do all of this through either the command line or the web-based interface.

All right, before I go into demonstrating this package, let me tell you a little more about CockroachDB and what it can do for you. CockroachDB is an open source SQL database for global cloud services. The key things it provides are distributed SQL, so you can use the SQL language and your existing tools across a large pool of resources, and data integrity at global scale, which means high availability, or multi-active availability as we call it. The entire system is built on consistent transactions and consistent replication.

Distributed SQL makes it possible for CockroachDB to grow along with your application. This is full-fledged SQL with ACID semantics, indexing, joins, the whole deal, and it runs across multiple database servers. That lets you take advantage of both the distributed storage and the distributed computational resources of your pool of nodes. In CockroachDB, scaling is always transparent. Your tables can grow to any size; there's no manual partitioning or configuration needed as your data grows. You can just add more machines at any time, and the data rebalances automatically across the pool, within constraints that you can configure if you need to.

We also support distributed execution of your SQL queries. When you run an aggregate query like SELECT SUM(duration) FROM sessions, the query is farmed out to all of the nodes in the cluster that contain data for the sessions table. Each of those nodes computes its partial sum and sends the intermediate result back to the gateway node for the final sum. This gives you a very efficient way to operate on large amounts of data spanning large numbers of machines.

DCOS gives us flexible scheduling for these database nodes, so you can schedule your CockroachDB nodes near your application servers for lower latency. You can even schedule them on your application servers: if you end up with unused disk capacity, you can just start up a Cockroach node to make that disk space available to the rest of your cluster. You also have a lot of flexibility in where you place these resources. You can span your cluster across the globe, and if you have at least three data centers, you can survive the outage of any one data center.
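To make the distributed aggregation concrete, here's a minimal sketch, assuming a hypothetical sessions schema (the table and columns are illustrative, not a schema from the talk):

```sql
-- A hypothetical sessions table for the aggregate example above.
CREATE TABLE sessions (
    id SERIAL PRIMARY KEY,
    user_id INT,
    duration INT  -- session length, e.g. in seconds
);

-- Planned as a distributed query: every node holding ranges of the
-- sessions table computes a partial SUM over its local data, and the
-- gateway node that received the query combines the partial results.
SELECT SUM(duration) FROM sessions;
```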
Along with surviving a data center outage, you can also survive the loss of machines within those data centers. This gives you the flexibility to put your data close to your customers and provide them with low-latency access.

I mentioned multi-active availability already. This is how we survive disasters, and since it's a fairly new term, let me explain what I mean by it. You can think of disaster recovery as originally being about backup and restore: you had a database with a bunch of services talking to it, you made a backup and stored it remotely, and in the event of a data center outage you had something you could use to get back online and recover. Of course, this was pretty painful, because it was a very manual process and took a long time, so an outage meant a lot of downtime.

This evolved into a more efficient failover model, where you kept a hot spare database: a primary and a secondary with asynchronous replication between them. That means much less downtime when there's a failover, but it's fairly expensive, because you've got this huge pool of resources in your secondary data center just sitting idle most of the time. The next step in this evolution is active-active, where both sides of the link serve traffic at the same time. This doesn't actually buy you much more efficiency, because even though neither side is sitting completely idle, you don't want to work the cluster too hard: if you're using both sides at more than 50% and one goes down, you've exceeded the capacity of your remaining data center. So you still have to keep utilization on these clusters low. There's also the problem that the replication between the two sides can be either synchronous or asynchronous. Consistent replication is going to be synchronous, which hurts your latency and also your availability: when the second data center goes down, the first one gets blocked, because it can't commit its two-phase transactions across both replicas. If you use asynchronous replication instead, you've given up on consistency, and there's always an opportunity for conflicts between the two data centers.

The CockroachDB model, which we call multi-active availability, is an extension of active-active to more than two data centers. That sounds like a small thing, but it's really a game changer, because it makes it possible to use consensus-based replication instead of being stuck with either the asynchronous log-shipping model or the synchronous two-phase commit model. In this mode, you have at least three replicas of everything, and whenever you commit a change, it gets broadcast to all of the replicas; two out of three (or three out of five) need to acknowledge the change for it to be considered committed. So you don't need every data center or every replica to be up, just two out of three, which also means that latency-wise you wait only for the faster of the two remote replicas. You still have high availability and the ability to tolerate data center failures. And the same pattern plays out on a smaller scale even within a single data center, because everything is replicated three ways and can survive the loss of individual machines.
Whenever a machine goes down, that kicks off an automatic repair process that re-replicates that node's data onto the remaining nodes. On top of all this highly available replication, we also support distributed transactions. These are full-fledged ACID transactions that give you all of the guarantees you expect from a relational database, and as I said, they're built on this consensus replication model using the Raft consensus algorithm. All changes go to a majority of the replicas, and you can never lose committed data as long as you don't lose half of your replicas. These distributed transactions can span rows, tables, even databases; there are no restrictions on what can go into a transaction in CockroachDB. There is a limit on the size of a write transaction, but no limitation on which tables or other objects can be included. And if you're familiar with SQL isolation levels, the configurable settings databases give you for transaction isolation: CockroachDB defaults to serializability, the highest of the four standard SQL isolation levels. We think it's important that your database provide the maximum amount of consistency, because trying to reason about all the different ways things can go wrong at weaker isolation levels is really difficult, and we don't think that's a good trade-off to force on developers.

And so now I'm going to give you a demo, as long as the Wi-Fi is cooperating. This is a brand new DCOS cluster that I set up this morning on AWS using the default installation instructions. I can go into the package catalog, find CockroachDB, and click Deploy, so it's a one-click process. This is going to start up five tasks in DCOS. The first task, which is starting up right now, is the scheduler. This package uses the DCOS SDK instead of Marathon, so it takes on some of the scheduling work itself, and the scheduler is the first task that gets started. There's very little custom code in this package, and in CockroachDB 1.1, which is coming up pretty soon, even that custom code is going away.

Okay, and now we can see that the other four tasks have started up. What's running here is the scheduler; a metrics server, which translates Cockroach's exported metrics from the Prometheus format, our first monitoring integration, to the StatsD service that DCOS provides for us; and three Cockroach nodes. We can go over to another tab and see the built-in admin UI on one of these nodes. Even though the address bar says localhost, this is actually running in that DCOS cluster; I just set up an SSH port forward as a shortcut.

Now, a database that's not doing anything is not very interesting, so I'm going to launch another application just to put some load on it. This is a simple JSON file specifying a Docker container we've built just for generating load on a Cockroach cluster. It's going to start up with a parameter pointing at the first node in the cluster. And with one command it's started, and in just a minute we should see the SQL query graph spike up. There it goes. So I'm going to call your attention to the third graph on this page, the replicas per node.
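To make the earlier transaction discussion concrete before we dig into the graphs, here's a minimal sketch of a cross-table transaction; the accounts and audit_log tables are hypothetical names for illustration:

```sql
-- Two hypothetical tables; a single transaction can touch both.
CREATE TABLE accounts (id INT PRIMARY KEY, balance DECIMAL);
CREATE TABLE audit_log (id SERIAL PRIMARY KEY, note STRING);

-- CockroachDB runs this at serializable isolation by default, even
-- when the ranges involved live on different nodes.
BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
INSERT INTO audit_log (note) VALUES ('moved 100 from account 1 to 2');
COMMIT;
```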
All the data in CockroachDB is broken up into ranges, where each range is a contiguous chunk of data in the database, 64 MB by default. We can see here that the number of replicas on each node is the same, and as the load generator runs, this number increases over time as new ranges get split off from the growing data. We can look at some other graphs here, like the rate at which the data is growing, but let's go back to the replicas per node. Now I'm going to go to the DCOS web interface, find my CockroachDB service, and edit it. Here in the environment tab we have a bunch of variables, including the node count. I'm going to change this from three to five, okay? And so now we'll see two more tasks showing up on this list. We're just waiting a minute for those tasks to start up, and here it goes. Here's the first of the two new nodes, and we should see a new node appear in this graph in just a minute. It didn't refresh automatically, so I had to refresh it manually to get it to show up. And now you can see that the replicas per node is going down, because the new node is available and data is rebalancing onto it. The SQL queries graph is also going down; that's not good. The joys of live demos. I think this is actually a UI bug: it's misrendering the last data point as a zero. But anyway, you can see that over the course of a couple of data points, there were twelve replicas per node when there were three nodes, and once the additional nodes started up, replicas got rebalanced onto them and the number of replicas per node went down. So that's an example of how easy it is to run Cockroach on DCOS.

And so now, back to the presentation. Let me tell you about the current status of CockroachDB. The current version, which was just released yesterday, is 1.0.6. Our first production-ready release was in May of this year, so this is our sixth patch release since then. Our 1.0 version provided all the core benefits of CockroachDB: distributed SQL and multi-active availability. It also comes in both an open source and an enterprise edition, and in the enterprise edition, the first feature we have there is distributed, incremental backup and restore. We do have a backup option in the open source edition using a SQL dump format, which produces a file full of INSERT statements that can be used to recreate your data. So you do have a backup option in the free edition, but the enterprise edition has an implementation of backup and restore that is much faster on both the backup and restore sides.

And then very soon, probably within a month, we'll have version 1.1. The key theme of this next release is what we're calling ruggedization, which is just trying to make the database more robust in production: giving database operators the tools they need to inspect the cluster, see what's running and which queries are taking a long time, adjust queries to improve their performance, cancel long-running queries, things like that. Of course, every CockroachDB release includes ongoing work on performance, SQL feature coverage, and bug fixing, all the usual things. So most of the new features in 1.1 are related to operational concerns; they're more administrative tools than high-profile features.
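As an illustration of that operator tooling, 1.1-era CockroachDB exposes query inspection and cancellation statements along these lines (a sketch assuming the 1.1 syntax; the query ID shown is made up):

```sql
-- List the queries currently executing across the whole cluster.
SHOW CLUSTER QUERIES;

-- Cancel a long-running query by its ID; real IDs come from the
-- query_id column of the statement above.
CANCEL QUERY '14dacc1f9a781e3d0000000000000001';
```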
But we do have one big one, which is a fast CSV importer. It uses the same framework as the enterprise backup and restore functionality, but we're making it available in the free version of the product, because we know it's important for everyone to be able to get their data into CockroachDB to even start trying it out.

In the longer term, looking mainly at features slated for our next release, which is planned for next spring: there are the ever-present performance and SQL feature categories, and our top-requested SQL feature is coming in that release, which is JSON column types inspired by the Postgres JSONB format. We're also building change data capture, so you can get a table- or database-level log of all changes emitted out to Kafka and stream that data into other data storage systems for later analysis. And we're working on improving our support for global data architectures, especially giving you better tools for managing a cluster spread out over a large area with high-latency network links. Part of this will come in the form of a new enterprise feature for row-level data partitioning, which gives an administrator control over data placement at a sub-table level, as opposed to the table-granularity controls you get in the open source edition.

So that's pretty much it. Here are a bunch of links for where you can find out more about us. Our main website is cockroachlabs.com. All of our source code is on GitHub at cockroachdb/cockroach. If you want instructions for running Cockroach on DCOS, the best document for that is currently at github.com/dcos/examples; there's a link on that page for cockroachdb. And we're also active on the Gitter chat system, so if you want to come chat with us in real time, that's the best place to do it. And with that, I'd be happy to answer any questions you have.

Yes, right. So the question is about how sensitive CockroachDB is to clock synchronization problems. This is something that Kyle Kingsbury referred to in his Jepsen report analyzing the consistency of CockroachDB, so it's something we're of course paying very close attention to. We do most of our testing on the public cloud platforms, and we find that in general NTP works very well. As long as you run NTP on your nodes, this hasn't really been a problem in practice; we see sub-10-millisecond clock offsets. The default configuration for CockroachDB is for a node to crash and die if it detects a clock offset of 250 milliseconds, and to actually have a consistency problem you would need a clock offset of half a second. We run in this configuration on all the major virtualized cloud platforms, and we don't really have any trouble keeping clock offsets well below that limit, by a factor of 50 or so. So it is something you have to watch out for: you've got to be sure you're running NTP, and you can't just count on the cloud platform doing it for you. But as long as you're running NTP, this has not proven to be a problem in practice.

So, how does the performance compare to other SQL databases? Well, it's a tricky question to answer, because it's tricky to get an apples-to-apples comparison. One way to look at it is to just compare single-node performance, a single node of Cockroach against a single node of Postgres, for example; there's no fundamental reason for those to be very different.
So you can get a baseline number there, and for most operations we're within a factor of two of Postgres's single-node performance. As for the impact of synchronous replication, that's going to depend on both the layout of your nodes in terms of geographic distribution and the distribution of your query traffic. If your queries are well distributed across the key space, you can get a lot of parallelization even when the latency is high, and that helps mitigate it. But if you have a lot of contention in your queries and they're all hitting the same key, your performance is going to suffer in proportion to the latency between your nodes. So for the most part, we recommend that unless you really need a globe-spanning architecture, you keep the cluster to three availability zones in one region, or the equivalent terminology on other hosting providers, because you are going to pay that latency hit on all of your writes, which have to go across to the other replicas. Reads, of course, don't have to go through the consensus layer. You do have to talk to the leader of a range, but that doesn't necessarily mean going across the network if the leader is local. And when query patterns allow, we optimize by assigning the leader to the place the queries are coming from, so you get as good a performance as possible.

Yeah, so the question is: when you run complicated SQL queries, with joins and WHERE clauses and things like that, do you get good performance in the distributed environment? That's a complicated question to answer, but we do generally get good results in terms of being able to split up a query so that it runs as efficiently as possible across the cluster. The biggest limitation right now is that the query planner is kind of stupid: it doesn't have table statistics, so it doesn't know how to take advantage of the fact that, say, one table is a lot smaller than the other. So sometimes it will produce a very inefficient query plan for joins, and you have to kind of hold its hand and tell it exactly how to join things together. But yeah, we have a pretty good ability to take queries that are well indexed and turn them into efficient distributed query plans.

Yes. No, we don't currently support the geometry or spatial indexing features that Postgres does. No user-defined types either. The best way to think of our SQL support at this point is that we implement a very large fraction of the common subset of SQL across all major databases; we don't support a lot of things that are unique to any particular database.

Yeah, so we have made a lot of improvements in our join support over the last six to nine months. We did a blog post about a year ago, I think, which is probably what you're thinking of, where we talked about our initial version of joins being kind of limited. We've made improvements in a number of areas since then. For one thing, we implement merge joins now, not just hash joins. The query planner has gotten smarter. And we support temporary on-disk spooling of intermediate results, so everything doesn't have to fit in memory. So yeah, we've made a lot of improvements in our ability to handle joins.
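As an example of the hand-holding mentioned above, CockroachDB lets you pin a query to a specific index with a table@index hint. A minimal sketch, with made-up table and index names:

```sql
-- A small customers table joined against a large orders table that
-- has a secondary index on customer_id.
CREATE TABLE customers (id INT PRIMARY KEY, name STRING);
CREATE TABLE orders (
    id INT PRIMARY KEY,
    customer_id INT,
    amount DECIMAL,
    INDEX orders_by_customer (customer_id)
);

-- Without table statistics the planner may pick a poor join plan;
-- forcing the secondary index steers it toward an indexed join.
SELECT c.name, COUNT(*)
FROM customers AS c
JOIN orders@orders_by_customer AS o ON o.customer_id = c.id
GROUP BY c.name;
```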
I think we're still not great at dealing with joins that handle very large amounts of data, but if the indexes on your tables are such that the join can be satisfied without doing a big cross join that multiplies out a lot of data, then I think it works pretty well.

Yeah, so: can we take advantage of operator-supplied information to help optimize data placement? Like putting some parts of the data on fast storage and some on slower storage? Yes, though that's currently fairly coarse-grained. You can configure these things at the table level, so you can have one table stored on SSD and another stored on spinning disk. In 1.2, with the row-level partitioning enterprise feature that I talked about, we want to let you specify that at a sub-table granularity. That could include things like designating a timestamp column as your partition key and having a cron job shift the boundary, so that data ages out into cheaper storage over time.
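A hypothetical sketch of what that sub-table placement could look like, using an illustrative range-partitioning syntax; the feature described was still unreleased at the time of the talk, so the exact syntax and names here are assumptions:

```sql
-- Partition a table by a timestamp column so that older rows can be
-- pinned to cheaper storage while recent rows stay on fast disks.
-- (Illustrative syntax; the partition boundary date is made up.)
CREATE TABLE session_history (
    id INT,
    created_at TIMESTAMP NOT NULL,
    duration INT,
    PRIMARY KEY (created_at, id)
) PARTITION BY RANGE (created_at) (
    PARTITION archive VALUES FROM (MINVALUE) TO ('2017-01-01'),
    PARTITION recent VALUES FROM ('2017-01-01') TO (MAXVALUE)
);
-- A periodic job could then move the boundary forward, aging data
-- out of the fast-storage partition over time.
```

Thank you very much.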