I'm Alec from the Layers team within the FoundationDB team at Apple, and I'm going to be talking about a topic in effective client design. You may think that if you're designing client applications on top of FoundationDB, concurrency is an easy problem because you have transactions. In some sense, it is. For one thing, if you use transactions correctly, you can be pretty sure that the correctness of your system is maintained in the presence of concurrent actors. The issue, though, is that correctness isn't the only thing. You also have to worry about performance, and minimizing conflicts in your workload can be one of the harder things that a layer developer or client developer has to consider when designing an application or data model. So I want to go into a few techniques for minimizing conflicts and designing an effective data model.

First, what is a conflict? Evan talked a little bit about this in his talk, but a conflict is when you have two transactions trying to modify the same data at once. In particular, in FoundationDB, every time you do a read, the client, without you having to do anything, records which ranges of keys you read and adds them to a set of read conflict ranges that it keeps in memory. Likewise, whenever you do a write, it updates a separate set of write conflict ranges. When it goes to commit your transaction, it submits the read conflict set and the write conflict set along with any mutations, and it's these conflict ranges that the resolvers use to determine which transactions need to be failed.

So what happens if you get a conflict? Typically, you retry, and that means you run into two performance problems. One is that every unsuccessful commit attempt wastes resources on the cluster, so if you have a lot of conflicting transactions, you can end up decreasing the total throughput of your system, because a lot of it is being spent on work that never gets committed. The other problem is an increase in observed client latency: if you get a request from a user, every retry is another round of requests to the database that you usually have to wait through, so your observed client latency goes up. Decreasing your throughput while increasing your latency is pretty bad, so what can we do about it?

I'm going to outline three techniques a client might use: atomic operations, snapshot reads, and versionstamp operations. I'll go into what each of these is, and then I'll go through a motivating use case where you can see them in action.

First, atomic operations. Atomic operations are an API that FoundationDB exposes that lets you push work down to the storage servers. Typically, any atomic operation could be rewritten as the client requesting a key from the storage server, getting it back, modifying it, and writing it back. For example, if you wanted to add one to a key, you would read the key, add one to its value, and submit a commit that overwrites that value. Of course, the problem is that if two people try to do this at once, one of them has to fail. The alternative is to use an atomic operation: you send a commit to the storage server that just says, add one to this key, whatever its value is, and you don't actually do the read; you let the storage server handle that for you.
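To make that concrete, here is a minimal sketch of the two versions of a counter increment, using the FoundationDB Python bindings; the counter key and its little-endian encoding are assumptions made for illustration.

```python
import struct
import fdb

fdb.api_version(630)
db = fdb.open()

COUNTER = b'page_views'  # hypothetical counter key, just for illustration

# Read-modify-write: the read puts the counter key in the read conflict set,
# so two transactions incrementing at the same time will conflict and retry.
@fdb.transactional
def increment_rmw(tr):
    current = tr[COUNTER]
    n = struct.unpack('<q', current.value)[0] if current.present() else 0
    tr[COUNTER] = struct.pack('<q', n + 1)

# Atomic ADD: no read and no read conflict range; the storage server applies
# the addition, so concurrent increments never conflict.
@fdb.transactional
def increment_atomic(tr):
    tr.add(COUNTER, struct.pack('<q', 1))
```

Both are called the same way, for example `increment_atomic(db)`; the difference is only in what ends up in the conflict sets.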
One warning with this: imagine a particularly bad data model where every single transaction reads a single key and updates it. Without atomic ops, you end up serializing all of your operations, and that's bad. With atomic ops, what you end up doing instead is slamming the storage servers responsible for that key. So you can turn one set of performance pathologies, serialized operations, into another, hotspots. They're not a panacea, and you have to be a little bit careful.

The second technique is snapshot reads. Like I said before, whenever you do a read in FoundationDB, the client automatically adds a read conflict range. Snapshot reads don't add that range. They say: I know what I'm doing, don't add a read conflict range for the key I'm reading, just do the read and give me the value. The idea is that you might do some reads speculatively. For example, say you had five jobs and you wanted to pick one of them. You could read all five keys at snapshot isolation level, and then for whichever one you actually end up processing in that transaction, you call a method such as add_read_conflict_range or add_read_conflict_key to add a conflict range just for the key you actually used to determine your operation, and then carry on. If two clients come in at once and pick different keys out of the five that end up getting modified, they can both commit, and that's great. The other thing you can do with snapshot reads relates to keys you're modifying with atomic ops, because such a key will be pretty hot: if you want to do something based on the value of that key within your transaction, you can read it at snapshot isolation level and not be killed by somebody else modifying it. The warning here is that conflict ranges are how FoundationDB guarantees serializability. If you are a little too clever about getting rid of conflict ranges, you can end up committing things that you shouldn't, and then you can have interesting sessions trying to debug what's going on in prod.

The final technique is versionstamps. Ryan already mentioned versionstamps in his presentation, but versionstamps essentially let you get a 10-byte, monotonically increasing value from the database at commit time and overwrite part of your key or part of your value with it. FoundationDB guarantees that within a cluster, that value is unique and always increases over time. This allows you to do things like handle queues in a very high-contention way, because there are no read conflicts at all and everything is handled by the cluster putting things in commit order. A couple of warnings here. One, versionstamps are inherently non-idempotent: if you end up retrying a transaction, you are guaranteed to get a different key the second time, which is the exact opposite of idempotency. Also, the values aren't valid across clusters. You can't correlate versionstamps you get from one cluster with versionstamps from another, and that includes things like restoring into a new cluster: you're not guaranteed that the versionstamps you get back will make any sense. But they can be very powerful in certain situations.
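Before the case study, here is a minimal sketch of the speculative-read pattern from the five-jobs example above, again in Python; the `jobs` subspace, the `b'unclaimed'` convention, and the worker-id value are all assumptions for illustration.

```python
import random
import fdb

fdb.api_version(630)
db = fdb.open()

jobs = fdb.Subspace(('jobs',))  # hypothetical subspace for job records

@fdb.transactional
def claim_one_job(tr, job_ids, worker_id):
    # Speculative reads at snapshot isolation: no read conflict ranges added.
    unclaimed = []
    for job_id in job_ids:
        key = jobs.pack((job_id,))
        value = tr.snapshot[key]
        if value.present() and value.value == b'unclaimed':
            unclaimed.append((job_id, key))
    if not unclaimed:
        return None
    # Conflict only on the one key this transaction's decision depends on, so
    # two workers picking different jobs can both commit.
    job_id, key = random.choice(unclaimed)
    tr.add_read_conflict_key(key)
    tr[key] = worker_id.encode()  # mark the job as claimed by this worker
    return job_id
```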
To talk about these techniques, I want to go through a case study, and the case study is the sync problem. What is sync? Sync, or synchronization, is a process where you have multiple clients who want to synchronize on some set of values. In this case, we're going to use a mathematical set, but you could imagine synchronizing a map or something more interesting. It's going to have a pretty simple API: you can insert things into your sync machine, and you can get things from it, passing a token from the last time you read so that you only get new updates; you're tailing updates continuously.

We're going to start with a pretty simple approach, and we're not going to worry about concurrency at first. We're just going to keep a key that holds the maximum token we've seen so far, plus an index of the items in our sync machine keyed by token value, and then we sync by doing a scan. For example, here's a simple sync machine. Inside the cluster we have five items; the max token is four, and the items are indexed zero through four. To insert something, our client reads the max token, gets back four, and then commits a new transaction. You can see that its read set contains the max token, and its write set contains key five, the key it's writing to, as well as the max token. When it inserts feijoa into the database, because it's the only thing going on, it gets added to the end and the max token gets updated atomically. Likewise, a second client is trying to do a sync. It last synced at three, so it starts scanning from four, the token after its last one, and it gets two results back, elderberry and feijoa.
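Here is a minimal sketch of this naive design in Python; the key layout, the token encoding, and the function names are assumptions for illustration.

```python
import struct
import fdb

fdb.api_version(630)
db = fdb.open()

sync = fdb.Subspace(('sync',))          # hypothetical key layout
MAX_TOKEN = sync.pack(('max_token',))
index = sync.subspace(('index',))       # items keyed by integer token

@fdb.transactional
def insert(tr, item):
    # Reading max_token adds it to the read conflict set, so a concurrent
    # insert that also bumps it will be failed by the resolvers.
    current = tr[MAX_TOKEN]
    token = struct.unpack('<q', current.value)[0] + 1 if current.present() else 0
    tr[index.pack((token,))] = item
    tr[MAX_TOKEN] = struct.pack('<q', token)
    return token

@fdb.transactional
def sync_from(tr, last_token):
    # Scan every item with a token greater than the caller's last token.
    begin = index.pack((last_token + 1,))
    end = index.range().stop
    return [(index.unpack(k)[0], v) for k, v in tr.get_range(begin, end)]
```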
Okay, here's the problem case. You have two clients trying to insert at once: one is trying to insert feijoa, one is trying to insert fig. Client one reads the max token and gets back four; client two reads the max token and also gets back four. Now client one writes to key five, and it succeeds because it was first. But now client two tries to do its write, and it's trying to write the exact same key, five. Importantly, it based the key it chose on the value of max token, and max token has changed since it began its transaction, so the resolvers will fail it. This design actually kind of works with low-concurrency workloads, and if you have few enough clients interacting with a sync machine at once, you might not have a problem. But if you have too many clients coming in at once, you'll eventually get tired of answering client questions about why they get random conflicts all the time, and you'll decide that you want to make this more scalable and handle more clients at once.

The problem was that all of our clients based their next key on the value of max token, and they all chose the same value: they all added one to it. So we're going to try an approach where we relax the reliance on max token. We're no longer going to do that read at full isolation; we're going to do it at snapshot isolation level, and we're also going to have clients try to pick different values when they insert into the sync index, through randomness. Like I said, you just add a random value to the value you read for max token, write the element using that new value, and then update max token accordingly with the value you chose, using the max atomic operation so that many clients can do this at once. (There's a sketch of this version below.)

Let's see an example, again with the problematic case of two clients trying to insert at once. This time you'll notice there are gaps between the elements of the sync index, and that's because we're adding a random value each time, so instead of incrementing nicely by one, we get gaps. The first client reads the max token and gets back 20, and let's say the second client reads the max token and also gets back 20. Then let's say client one generates a random number, a very random number, seven, and writes to key 27. If you look at the read set of this transaction, notice that max token is not in it. That's because we read max token at snapshot isolation level. But we are writing to key 27, and we're also adding key 27 to our read conflict set. The reason for that is that if our second client happens to choose seven as well and tries to write that same key, we want our conflicts to save us. If we had made the mistake of not including that, we would have had those bad production problems I mentioned earlier. So client one succeeds and writes feijoa at position 27. Client two, by amazing coincidence, didn't pick seven, and instead tries to write fig to position 22. Max token has changed, but that doesn't matter because it's not in the read conflict set, and client two isn't writing to the same key as the other transaction, so it succeeds, and now fig is inserted at position 22 in our sync index.
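Here is a minimal sketch of that second version, reusing the hypothetical layout from the previous sketch; the random offset range is arbitrary. The MAX atomic operation compares values as unsigned little-endian integers, which is why max_token is stored in that encoding.

```python
import random
import struct
import fdb

fdb.api_version(630)
db = fdb.open()

sync = fdb.Subspace(('sync',))          # same hypothetical layout as before
MAX_TOKEN = sync.pack(('max_token',))
index = sync.subspace(('index',))

@fdb.transactional
def insert(tr, item):
    # Snapshot read: max_token is NOT added to the read conflict set.
    current = tr.snapshot[MAX_TOKEN]
    max_token = struct.unpack('<q', current.value)[0] if current.present() else 0
    token = max_token + random.randint(1, 100)   # arbitrary random offset

    key = index.pack((token,))
    # Explicitly conflict on the key we chose, so two clients that happen to
    # pick the same token cannot both commit.
    tr.add_read_conflict_key(key)
    tr[key] = item

    # MAX atomic op: many clients can advance max_token without conflicting.
    tr.max(MAX_TOKEN, struct.pack('<q', token))
    return token
```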
So is this a good solution? In some sense we haven't exactly solved the problem, in that we still have a nonzero probability of conflicts: if two clients pick the same value, they get a conflict. That's not great, but maybe we're okay with that. There's a deeper problem, though, and it has to do with reads. This one is a little more involved; it has three clients. Again, we'll have two clients trying to insert at once and one client trying to do a sync. As before, client one reads the max token and client two reads the max token. Then let's say client one commits: it generates the random number seven again and inserts feijoa at position 27. So far so good. Now let's say at this point client three begins its sync. It last synced at 19, so it starts reading from 20, and it gets two results back, elderberry and feijoa. Still so far so good. Now client two rolls its random number generator and gets back two. This time when it inserts into the database, it's writing to position 22, and it has no read conflicts; no keys it cares about have been modified since it started. So it will successfully insert fig into the database, but it inserts it at position 22. The problem is that because client three has already read through 27, it will never go back and sync position 22, so essentially you've had a lost write in your system.

So how do you fix the problem that some values aren't synced? As an addendum, this basic idea of adding randomness to the keys as you enter them into the queue isn't necessarily an untenable design. For certain job queue structures, for example, it would be perfectly fine, and for certain roughly auto-incrementing primary-key use cases it might be fine too. The only problem is that, because of our read pattern, we lose some updates, and that's the exact problem we're trying to solve. If we look at this, the problem was that we read through a key and then somebody else committed something before it. The way we're going to get around this is by making sure that every time we do a write, it lands after everything that anyone has ever committed. To solve that, we're going to use versionstamps. We're going to remove the max token key altogether, since we don't need it anymore, and we're just going to begin all of our index keys with the commit version, using versionstamp operations. We're going to depend on the two properties of versionstamps I mentioned before. One, they're monotonic in commit order; this is how we make sure that our syncs line up correctly. Two, they're unique; this is how we make sure that two clients coming in at once don't write to the same place.

So here we have our two clients trying to insert into the database. This time it's relatively simple. Client one, when it tries to insert feijoa, can do this all in a single blind write. It has no reads in its read conflict ranges, and it has one mutation, namely setting a key beginning with a versionstamp to feijoa. Its write conflict set is a little bit complicated; ask me afterwards if you want to know the details. Let's say that the FoundationDB cluster gives it version 500. When it goes to write into the database, the versionstamp placeholder gets overwritten with 500, and boom, feijoa gets added to the end of our sync machine. Then client two submits a very similar-looking transaction, but this time the FoundationDB cluster gives it a different version, and we know it will be a different, higher version; let's say it gives it 520, and fig gets added to the end of the database there. So you can have multiple clients all adding to the same machine at once, all getting placed at the end, happily coexisting.
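Here is a minimal sketch of the versionstamp-based version in Python, using the tuple layer's incomplete Versionstamp together with set_versionstamped_key; the key prefix is an assumption, and the packed key itself serves as the caller's sync token.

```python
import fdb

fdb.api_version(630)
db = fdb.open()

SYNC_INDEX = ('sync', 'index')   # hypothetical key prefix

@fdb.transactional
def insert(tr, item):
    # A blind write: no reads, so no read conflict ranges. The cluster fills
    # in the 10-byte commit versionstamp at commit time, so every insert
    # lands at a unique key that sorts after everything already committed.
    key = fdb.tuple.pack_with_versionstamp(SYNC_INDEX + (fdb.tuple.Versionstamp(),))
    tr.set_versionstamped_key(key, item)

@fdb.transactional
def sync_from(tr, last_key):
    # Scan everything committed after the last key the caller has seen; pass
    # fdb.tuple.range(SYNC_INDEX).start the first time.
    begin = fdb.KeySelector.first_greater_than(last_key)
    end = fdb.tuple.range(SYNC_INDEX).stop
    return [(k, v) for k, v in tr.get_range(begin, end)]
```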
So is this a perfect solution? Well, kind of. In some sense it meets our spec exactly: we have unlimited parallelism, everyone inserting into the same place at once, and we have zero conflicts. But nothing is perfect, so we have a couple of problems. One is that FDB's tuple layer encodes integers very compactly; in particular, it uses a variable-length encoding scheme, so if you have fewer than 65,536 items in your sync index, each token costs only a couple of bytes. Versionstamps, on the other hand, are 12 bytes long as deployed in many of the bindings, so you're increasing your space usage, sometimes by a lot. Also, versionstamps can make your code significantly more complex. A lot of your client code will be written with the assumption that it knows what keys it's about to write, whereas with versionstamps, by their very nature, you don't know what you've written until the very end, when the transaction gets committed. That can make things a little more complicated, especially if you have to do things like handle multiple updates to the same key or remove something you were going to sync; that's a bit of a hard problem. As mentioned before, this is not an idempotent operation, so if you want to make it idempotent, you have to do a little bit of finagling: for example, you might keep a map from item to versionstamp and check that map to see whether the item already exists. Likewise, deletes and updates can be somewhat complicated: when you delete or update something that's keyed by versionstamp, you need a map like that in order to figure out the key and do it correctly.

In conclusion, there are a few different strategies you might employ when you're analyzing your data model. One is to look at the read-modify-write patterns in your database, keys that you read and then update within the same transaction, and see whether any of those can be replaced with atomic operations. Second, be mindful of your read conflicts: think about which read conflicts you don't really need and which ones you can remove. And third, you have to be careful, when you're being clever with conflict ranges and so on, that you're not accidentally removing something that you actually do depend on and thereby destroying the correctness of your system. So it's a little bit of a balancing act, but it's kind of necessary to get good throughput and good latency with FDB. Thank you very much, and happy data modeling.