Okay, hi, I'm Scott Gray. I'm an engineer with Apple, thus the dark mode presentation. I'm here to talk to you about CloudKit, which is built on FoundationDB and the Record Layer. I should probably start with what CloudKit is. CloudKit is resilient, structured cloud storage; you might otherwise know that as a database. The idea behind CloudKit is that we provide infrastructure for an application developer to come in, design a schema for their application, deploy that schema to the cloud, and then get all your standard database functionality: insert, update, delete, query. And where the bread and butter really comes in is notification and sync, the synchronization of changes between your devices. So we're not just a place to rest your data in case you drop your phone in the toilet, which I've done, don't ask me how; we also make sure that any change you make on one device is synchronized to your other devices. From a tenancy standpoint, CloudKit takes an inverted view of tenancy compared to a lot of other database systems: every application that you're running on your device, whether it be your phone or macOS, every application that uses CloudKit is allocated a distinct logical database. What this means is that across the thousands of applications, whether they be the applications natively available on your device or third-party applications written for CloudKit, each instance of an application that you're running gets its own database. This is made possible thanks to FoundationDB, of course, or I wouldn't be here talking about it, and also by the Record Layer. The Record Layer is a project that actually came out of CloudKit. It was open-sourced in January of this year. It provides facilities that give you basic relational database semantics. Relational-ish.
There are a couple of places where it might feel a little foreign, but by and large it gives you relational semantics, which of course includes the ability to have a schema, to define record types, and to create and maintain indexes. It has a stateless execution architecture, which I'm going to talk about in a bit, particularly how we leverage it, and I'll also point out there is a more detailed talk specifically on the Record Layer happening at 3:40 p.m. So let's start off with one of the key abstractions of the Record Layer, which is the concept of a record store. A record store is a logical database. It is defined by an external schema, because if you think about it, say I'm developing the Photos application: I have one schema for Photos, but I want to instantiate a billion databases. A record store is located at a given prefix in the FDB key space, and the entire database is contained within a contiguous key range. That means all of the records, all of the indexes, and all of the metadata about the state of the record store are contained in one contiguous key range, and that becomes very important in a minute. So for example, let's say we decided that the key at which we locate a record store starts with the user ID as the first part of the prefix, and the application as the second part. What this allows us to do is have a series of databases that all belong to one user. They are all located contiguously next to each other, and each one is, of course, completely self-contained. Thus, within one cluster, we can be housing millions of users and many tens of millions of databases. And when we want to scale out, it's simply a matter of adding a directory service that keeps track of which cluster owns a particular user's data, and every request gets routed to that cluster.
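That key layout can be sketched in a few lines. This is a hypothetical illustration with invented names, not the Record Layer API; a real record store uses the FDB tuple layer for encoding, which I fake here with a simple separator byte.

```python
# Toy sketch of the (user_id, application) key-prefix scheme. The real
# system uses the FDB tuple layer, which also preserves sort order across
# mixed types; this fake encoding is just for illustration.

def pack(*parts):
    # Join parts with a NUL separator to build a composite key.
    return b"\x00".join(str(p).encode() for p in parts)

def record_store_prefix(user_id, app):
    return pack(user_id, app)

prefix = record_store_prefix("user-42", "photos")

# Everything in the record store hangs off that one prefix, so records,
# indexes, and metadata form a single contiguous key range:
record_key = prefix + b"\x00" + pack("r", "photo-1")
index_key  = prefix + b"\x00" + pack("i", "by_date", "2019-06-01")

# A second app for the same user sorts right next door, so all of one
# user's databases occupy one contiguous region of the key space.
other = record_store_prefix("user-42", "notes")
assert record_key.startswith(prefix) and index_key.startswith(prefix)
```

Because a user's databases are adjacent and self-contained, relocating a user is just a contiguous range copy plus a routing update, which is exactly what makes the rebalancing described next cheap.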
We know, because of how we've organized our keys, exactly where the database for that application lives. Taking it a step further, scale-out becomes easy: every time we feel we're reaching capacity, we can throw another server in, and new accounts will be allocated to it. Furthermore, we have some machinery that sits underneath everything, keeping an eye on how the workload is distributed across the clusters. Here's where the contiguous key range comes in: if that machinery sees an imbalance, all we need to do is pick up these ranges of keys and start moving them around to rebalance the load across the clusters. In this example our unit of work is an individual user with their contiguous set of databases, but of course that's somewhat arbitrary; you could just as easily have decided that every application is going to live in a different set of clusters. The other major attribute I mentioned about the Record Layer is stateless compute. What does that really mean? It means that every server can handle any given request. A great deal of care went into the Record Layer to make sure that the act of physically opening a record store and beginning work on it is extremely fast; it can be done in milliseconds. So these servers are just sitting out there handling arbitrary incoming requests: they open the database, do their work, close it, and they're done. Furthermore, all of the requests that we currently support are stateless and streaming, which means none of them do any in-memory work: no in-memory sorting, no in-memory joins, nothing that spills to disk.
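The directory service and rebalancing described above can be sketched as a small routing map. This is a hypothetical illustration, with class and method names invented for this sketch; the real directory service is far more involved.

```python
# Toy sketch of a directory service: a map from user ID to owning cluster,
# consulted on every request. Rebalancing a user is possible precisely
# because their databases form one contiguous key range.

class Directory:
    def __init__(self):
        self.owner = {}            # user_id -> cluster name
        self.clusters = []

    def add_cluster(self, name):
        # Scale-out: throw another cluster in; new accounts land on it.
        self.clusters.append(name)

    def route(self, user_id):
        # Existing accounts stay where their key range lives; new
        # accounts are allocated to the most recently added cluster.
        if user_id not in self.owner:
            self.owner[user_id] = self.clusters[-1]
        return self.owner[user_id]

    def move(self, user_id, target):
        # Rebalancing: copy the user's contiguous key range to the
        # target cluster (elided here), then update the directory.
        self.owner[user_id] = target

d = Directory()
d.add_cluster("cluster-1")
assert d.route("alice") == "cluster-1"
d.add_cluster("cluster-2")
assert d.route("alice") == "cluster-1"   # existing accounts are sticky
assert d.route("bob") == "cluster-2"     # new accounts fill new capacity
```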
Instead, we rely very heavily on the Record Layer's extensive library of indexes to support these operations. The Record Layer has text indexes, indexes that support aggregates, and indexes that support joins. As a result of this streaming and stateless architecture, every request is very well bounded in terms of memory. This goes all the way up to larger, high-scale query operations. For example, any query being executed on the server can be stopped at any point, which means it may have made some progress but not finished. At the point it gets stopped, we return the result set along with an opaque continuation, which is really just an encapsulation of the state of the query. At any point, the client can come back with this continuation and pass it back in. It gets routed to an arbitrary server in the cluster, which continues the query, handing results back and forth until the operation is completed. The continuation is valid across servers. It's even valid were you to, say, bounce the entire system between continuations; we don't, but it wouldn't matter. As long as there's a server to process it, it will continue the work. Where this comes into play, when you pull all of this together, is when you think about the cost of a request. A standard insert, update, or delete operation is a fairly finite, fixed, well-defined unit of work. What we've just done is break a query into well-defined units of work as well. So what we can do is place limits; this is another facility of the Record Layer.
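The continuation mechanism can be sketched as follows. This is a hypothetical illustration with invented names, not the Record Layer API; it just shows the shape of a stateless, resumable query, where partial results come back with an opaque token that any server can resume from.

```python
# Toy sketch of continuation-based query execution: scan until a work
# limit is hit, return partial results plus an opaque continuation, and
# let any server resume from it later.
import json

DATA = {f"key-{i:03d}": i for i in range(10)}       # stand-in record store

def run_query(continuation=None, record_limit=4):
    start_after = json.loads(continuation)["last"] if continuation else ""
    results = []
    for k in sorted(DATA):
        if k <= start_after:
            continue
        results.append((k, DATA[k]))
        if len(results) == record_limit:
            # Stopped early: encode just enough state to resume anywhere.
            return results, json.dumps({"last": k})
    return results, None                             # query is complete

rows, cont = run_query()
while cont is not None:          # each resume could hit a different server
    more, cont = run_query(cont)
    rows.extend(more)
assert [k for k, _ in rows] == sorted(DATA)
```

Because the continuation carries all the query state, the servers keep nothing in memory between requests, which is what makes each request a bounded, interchangeable unit of work.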
We can place limits on how much work a given portion of a query is allowed to consume, whether that be the number of records that have been read, the number of bytes that have been read, or the amount of time that's passed since the query began. Now we have a well-defined, discrete unit of work, and that allows the entire rest of the system to think about resource management simply in terms of the rate of requests. The amount of work done by every request is fairly fixed, so we can apply rules on how fast an individual user may issue requests, how fast an individual application may issue requests, and so on. We can maintain order and stability across the system simply by rejecting requests: if we see that a request exceeds its allowed rate, we can tell the client, in this case a device like a phone, to go away and come back in a little bit, to give us some time. Next, I'm going to dive into how CloudKit uses FoundationDB and the Record Layer for indexing. Originally, CloudKit relied mostly on non-transactional, external indexing services, and I'll drill into that in a bit, but as most of you probably know, with an external non-transactional system you get lots of strange idiosyncrasies when handling requests against those indexes. FoundationDB, of course, gives us transactions, and transactions give us fully consistent indexes. I'm going to dive into a couple of the key indexes we utilize today. The first is what we call our sync index. As I said, syncing is the bread and butter; it's one of the things that makes CloudKit special and particularly useful.
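The rate-based admission control described above can be sketched like this. This is a hypothetical illustration, with names invented for the sketch; it shows only the core idea that once every request is a bounded unit of work, throttling reduces to counting requests per client per window.

```python
# Toy sketch of per-client rate limiting over a sliding window. A client
# that exceeds its budget is rejected and expected to back off and retry.
import time

class RateLimiter:
    def __init__(self, max_per_window, window_seconds=1.0):
        self.max = max_per_window
        self.window = window_seconds
        self.seen = {}                   # client_id -> recent timestamps

    def admit(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        recent = [t for t in self.seen.get(client_id, [])
                  if now - t < self.window]
        if len(recent) >= self.max:
            self.seen[client_id] = recent
            return False                 # "go away, come back in a bit"
        recent.append(now)
        self.seen[client_id] = recent
        return True

rl = RateLimiter(max_per_window=2)
assert rl.admit("phone", now=0.0)
assert rl.admit("phone", now=0.1)
assert not rl.admit("phone", now=0.2)    # over the rate, rejected
assert rl.admit("phone", now=1.5)        # the window has passed
```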
This sync is implemented via an index, and the index itself is implemented via versionstamps. I know several people have discussed versionstamps before this, but effectively, FoundationDB provides the ability to insert a value with a placeholder, and the placeholder says: when you commit, replace this placeholder with your commit version. The commit version is really useful because it's monotonically increasing and it's unique. The other feature we leverage when building this index is something I mentioned earlier: the Record Layer is relational-ish, and one of the interesting things it's capable of is defining indexes that span record types. For example, you could theoretically create an index on the field first name regardless of record type: if a record type has a field called first name, it gets indexed, and if I say give me all the records with a first name of Scott, I get all the records of every record type for which there is a first name of Scott. By leveraging this universal index feature, we can build an index that spans the commit version, which really is a modification time, of all the records. So let's talk about this in action. But before I do, I want to go into how we used to do it, the naive way, before versionstamps were introduced. Our old index was built on the concept of a change token: every record you insert is given a unique, increasing number called the change token. So along comes a client, and they need to insert their lemon, because who doesn't need to insert their lemon?
The first thing they do is read the maximum token we have. They see that it's four, they increment it, and voila, we have an ordered commit. Maybe we had another device out there that, the last time it looked at its fruit, was at two, so it knew all about the grapes, and it asks: what fruit has been added since I last looked? And voila: a banana, a strawberry, and a lemon. I know my fruit really well. So this works fine; the problem is that it serializes all modifications. For example, let's say two clients come in at the same time, one with a lemon and one with a pear. They both read the max version, they both get four, they both write at five, and the first one to commit wins; the second one gets a conflict. No problem: you simply retry the transaction and it works. This actually served us really well, because if you think about our usage model, you have a database per application, so the actual concurrency in that database is fairly low. If you have two devices, it's just the odds of those two devices trying to do a modification concurrently. But as CloudKit grows and evolves, we're handling larger and larger use cases, with more concurrency across devices, users, and clients, and this just doesn't scale. Everything is better with versionstamps. So what happens when we bring in the idea of the versionstamp?
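The change-token scheme and its conflict behavior can be sketched like this. This is a hypothetical illustration with invented names; it simulates the optimistic read-increment-write pattern and shows why two concurrent writers must serialize.

```python
# Toy sketch of the old change-token index: every writer reads the current
# maximum token, increments it, and writes at the new position. Concurrent
# writers read the same max, so one of them conflicts and must retry.

class ConflictError(Exception):
    pass

class ChangeTokenIndex:
    def __init__(self):
        self.entries = {}                  # token -> record

    def max_token(self):
        return max(self.entries, default=0)

    def commit(self, read_token, record):
        # Optimistic concurrency: fail if anyone committed after our read.
        if self.max_token() != read_token:
            raise ConflictError()
        self.entries[read_token + 1] = record

idx = ChangeTokenIndex()
idx.commit(idx.max_token(), "grapes")      # token 1

# Two clients read max=1 concurrently, then both try to commit:
seen = idx.max_token()
idx.commit(seen, "lemon")                  # first writer wins (token 2)
try:
    idx.commit(seen, "pear")               # second writer conflicts
except ConflictError:
    idx.commit(idx.max_token(), "pear")    # retry succeeds (token 3)

assert sorted(idx.entries.values()) == ["grapes", "lemon", "pear"]
```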
Now, when our two clients come in and decide they want to insert their respective fruits, the insert becomes: insert at a key of the versionstamp, which, like I said, is this opaque placeholder, with the fruit as the value. At the time each transaction commits, it is assigned the commit version that FDB assigns, which is guaranteed to be unique and monotonically increasing, and thus we have a consistently ordered index of modifications that's also free from conflicts. There are some really interesting aspects to this; I'll highlight some of the interesting challenges, but I can't really go into them in a 20-minute talk. First, how do you handle deletes and updates in this world? You don't want to remove the record; you would like to maintain a forward view of changes. Second, how do you move data between clusters? The commit version is unique per cluster, so with that machinery I showed moving users from one cluster to another, if you just move the data as is, you get this interesting problem where the commit version on the target cluster might be logically before the data you just moved. If the user begins doing new inserts, they will actually be placing data earlier in the index than the tail of the data you just moved, so how do you deal with that? And further, we had the old implementation and needed to switch to the new implementation without shutting everything down, so how do you make this change without disrupting client behavior? All of this is in the Record Layer paper; I'll have a link to that at the end of the presentation. Next I want to move on to text indexes. As I mentioned, our indexing used to rely on external indexing services like Solr.
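Before moving on, the versionstamp-based sync index from a moment ago can be sketched like this. This is a hypothetical illustration with invented names; the counter simply stands in for FDB's commit version, which is resolved only at commit time, so concurrent writers never touch a shared counter and never conflict.

```python
# Toy sketch of a versionstamp-style sync index: writers insert with a
# placeholder, and the database fills in a unique, monotonically
# increasing commit version at commit time.
import itertools

class VersionstampDB:
    _commit_version = itertools.count(1)   # stand-in for FDB's version

    def __init__(self):
        self.sync_index = {}               # (version, i) -> record

    def commit(self, records):
        # The placeholder is resolved only here, at commit time.
        version = next(self._commit_version)
        for i, r in enumerate(records):
            self.sync_index[(version, i)] = r
        return version

    def changes_since(self, version):
        # "What changed since I last looked?" is a range read past the
        # client's last-seen version.
        return [r for (v, _), r in sorted(self.sync_index.items())
                if v > version]

db = VersionstampDB()
v1 = db.commit(["grapes"])
db.commit(["lemon"])           # two clients commit at the same time:
db.commit(["pear"])            # neither conflicts; FDB orders the commits
assert db.changes_since(v1) == ["lemon", "pear"]
```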
Relying on an external service has the problem that there is some lag: when you write a record, there might be a slight delay before the record is available in the index, so for a brief period you may not get results. Or, even worse, you might end up out of sync: if something goes wrong between the time you wrote the record and the time you indexed it, you may not get an index entry at all. With FoundationDB and transactions, everything's consistent, but to get that we needed a text index. So one of the features added to the Record Layer over the last year was a text index, which is, again, fully transactional and low overhead. It's also interesting to think of it as a sort of personalized index. Typically with a text indexing scheme, you'll have a big indexing service sitting out there, managing the world. In this environment, a record store is completely self-contained, which means your text index is totally self-contained alongside the data. It's fairly inexpensive, it moves with the data, it's managed with all the rest of the data, and it's super convenient. The last thing I'll touch on is what our experience has been with FoundationDB. Obviously, FoundationDB was open-sourced out of Apple. We didn't open-source it because we're through with it; we open-sourced it because it's great and you should use it. The main development focus is testability and simulation, and because of this, it tends not to break. Bugs tend to get caught during development, not deployment. There's extensive testing of cross-version upgrades: what happens when you're upgrading the thing live, making sure the old version and the new version are fully compatible. And CloudKit is built on a bunch of different technologies and a bunch of different services.
It is a tremendous testament to the quality of testing in FoundationDB that we can take a release of FoundationDB and have it rolled into production in a matter of weeks, rather than years of testing and fixing problems, and have a great deal of confidence that it will just work. The Record Layer and FoundationDB are awesome, and I'm not just saying that because they came out of my team. I highly encourage everybody to use them if you aren't already. We have a number of papers; I only list one of them here, but this is the one specifically about the Record Layer. As I said, at 3:40 p.m. there's a talk from another member of my team, Nicholas; I highly encourage you to go check that out. And please get involved: get involved with FoundationDB, get involved with the Record Layer. We would love to see support from the community on this. And that's it. Thank you.