Hi, I'm Alec. I'm on the FoundationDB Record Layer team; the Record Layer is one of the open-source layers we develop on top of the FoundationDB key-value store. I'm here to talk about a feature we developed when we ran into a scalability problem while taking some of our record stores (a record store is the unit of data within the Record Layer) and sending them more and more requests. This is in the same space as what Nilu and Shen just talked about, in that it's about what happens when you have too many requests going to the same small number of keys. Adam also mentioned it in his talk about CouchDB, so I'll go into a little more detail about how it works. Unlike Nilu and Shen's work, which is a little more general, this is for a very specific use case, but one that we think a lot of layers will run into: you have a small amount of metadata that you need to keep transactionally consistent with another piece of data.

For example, the Record Layer uses this for the store header, which it associates with every store, and which contains information that can be used to determine what indexes are defined. It's important that this be transactionally consistent, because if a new index is added and some instances don't know about it, those lagging instances might get more records to save and forget to index them, and that can lead to consistency problems at the application level. But because you're accessing this metadata with every operation, every transaction, the read rate to whatever storage servers are serving that key can be really high, and that becomes a scalability bottleneck. You can't scale a record store beyond what that small number of storage servers can do.

The solution we came up with is transactional cache invalidation. This involved a new FoundationDB key-value store feature, the metadata version key, which was added in FDB 6.1. This key, \xff/metadataVersion, has some special properties. The first is that the semantics for setting the key are very restricted: the user is only allowed to set it using the versionstamped-value atomic mutation, which Scott mentioned in his talk earlier today and which I mentioned in my talk last year. That means every time you update the key, you're guaranteed to get a new value, so if you transactionally update this key along with some other part of your cluster, you can use changes in this key to detect that there have been changes in that other part of the cluster, although it doesn't tell you what those changes were. The second thing that's special about this key is where it's kept: most keys are only stored on storage servers, but this one is also stored on a different set of processes, the proxies. And the third property of this key is that it's limited in size, to 12 bytes, which is going to be important later.

So we used this in the Record Layer to keep a cache of this metadata and transactionally invalidate it whenever the underlying metadata was updated. I'll go into how that works.
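Before the walkthrough, here's a rough sketch of what using this key looks like with the FDB Java bindings. The `my-layer/meta-data` key is a made-up example rather than anything the Record Layer actually uses, and the exact placeholder bytes the versionstamped-value mutation expects (a 10-byte versionstamp slot plus a 4-byte little-endian offset) are my reading of the API at version 6.1, so double-check the FDB documentation before relying on it.

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;
import com.apple.foundationdb.MutationType;

import java.nio.charset.StandardCharsets;

public class MetadataVersionExample {
    // \xff/metadataVersion, the special key described above
    private static final byte[] METADATA_VERSION_KEY;
    static {
        byte[] suffix = "/metadataVersion".getBytes(StandardCharsets.US_ASCII);
        METADATA_VERSION_KEY = new byte[suffix.length + 1];
        METADATA_VERSION_KEY[0] = (byte) 0xFF;
        System.arraycopy(suffix, 0, METADATA_VERSION_KEY, 1, suffix.length);
    }

    // Placeholder param for SET_VERSIONSTAMPED_VALUE: 10 bytes that the proxies
    // overwrite with the commit versionstamp, followed by a 4-byte little-endian
    // offset of 0 (the format used by API versions >= 520). All zeros works here.
    private static final byte[] VERSIONSTAMP_PLACEHOLDER = new byte[14];

    public static void main(String[] args) {
        FDB fdb = FDB.selectAPIVersion(610);
        try (Database db = fdb.open()) {
            // Read the key. In FDB 6.1+ this is answered from the value returned
            // alongside the read version, not from a storage server.
            byte[] current = db.run(tr -> tr.get(METADATA_VERSION_KEY).join());
            System.out.println("metadata version = " + (current == null ? "unset" : hex(current)));

            // Bump the key in the same transaction as a (hypothetical) metadata change,
            // so cached copies elsewhere are invalidated atomically with the change.
            db.run(tr -> {
                tr.set("my-layer/meta-data".getBytes(StandardCharsets.UTF_8),
                       "index-definitions-v2".getBytes(StandardCharsets.UTF_8));
                tr.mutate(MutationType.SET_VERSIONSTAMPED_VALUE,
                          METADATA_VERSION_KEY, VERSIONSTAMP_PLACEHOLDER);
                return null;
            });
        }
    }

    private static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```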
So here you have a diagram of an FDB cluster and an FDB client. Within the FDB cluster you have some proxies and some storage servers, and a few other things, but those are the important pieces for this talk. Within the storage servers we have some metadata, in this case indicating that there are three indexes. On the left you have a client, and the client has a cache; with each cache entry it remembers the metadata version at which the metadata was last updated.

When the client begins a transaction, it always starts with a get-read-version request to the proxies. That isn't new; it's been in FDB since the beginning. The new thing is that when the proxy returns the read version to the client, it also returns the value of this metadata version key. Because it's returned with every read version, it's important that it's small, so the overhead of returning it is low. Now the client can read the key: when it reads it through the standard get API, instead of going to the storage servers like most reads, it just uses the value it got in the read-version response. In this case it can see that the metadata version and the version in its cache are the same, so it knows it has the right metadata and can continue on without having to ask the underlying storage servers for the metadata. You've saved a read, and you've also relieved the load on those storage servers that was limiting scalability.

The process of updating the metadata gets somewhat more complicated, but not too much. Whenever you want to change the metadata, you just need to include, in the same transaction, an update to the metadata version key. That goes to the proxies just like all commits, the proxies forward it along to the storage servers, and the metadata version and the metadata get updated atomically.

Now, if you have a client that's lagging behind and hasn't gotten the update yet, when it begins a transaction it does a get-read-version just as before and gets back the read version and the metadata version. This time the metadata version and the version in its cache don't match. It uses that information to know it needs to go to the underlying storage servers, get the new value of the metadata, update its cache and its view of the world, and continue on. So now this client will know to update all four of the indexes for any records it saves.

There are a few additional considerations. One is the complexity of dealing with multiple tenants; Adam mentioned this in his talk, and we hit the same problem because the Record Layer also allows many, many record stores to live on the same cluster. Some amount of toe-stepping is unavoidable, because there's only one of these keys in the whole cluster, but we can play some games with conflict ranges to make sure that the only thing that happens is a few extra cache invalidations. There are also some edge cases around when exactly you need to update this key during the life cycle of the cached metadata. All of that went into our implementation in the Record Layer.

This is available now, so I encourage you to check out the metadata version key added in FDB 6.1 (you can see the pull request that added it there), and also the very pithily named MetaDataVersionStampStoreStateCache, which is the thing that adds this capability to the Record Layer, in 2880. Thanks.
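To make the client-side pattern concrete, here is a minimal sketch of that caching logic in Java. It's a toy, not the Record Layer's MetaDataVersionStampStoreStateCache: the `my-layer/meta-data` key is a made-up placeholder for wherever a layer keeps its metadata, and it glosses over the multi-tenant and life-cycle edge cases mentioned above (for example, what to do before the metadata version key has ever been set, or invalidating the cache right after you commit a metadata change yourself).

```java
import com.apple.foundationdb.Database;
import com.apple.foundationdb.FDB;
import com.apple.foundationdb.Transaction;

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

// A toy version of the caching pattern from the talk: remember the layer's
// metadata together with the metadata version it was read at, and reuse it
// only while the cluster-wide metadata version is unchanged.
public class CachedMetaData {
    // \xff/metadataVersion, built the same way as in the earlier sketch.
    private static final byte[] METADATA_VERSION_KEY = buildKey();

    private static byte[] buildKey() {
        byte[] suffix = "/metadataVersion".getBytes(StandardCharsets.US_ASCII);
        byte[] key = new byte[suffix.length + 1];
        key[0] = (byte) 0xFF;
        System.arraycopy(suffix, 0, key, 1, suffix.length);
        return key;
    }

    // A cache entry: the metadata bytes plus the metadata version they were read at.
    private record Entry(byte[] version, byte[] metaData) { }

    private final AtomicReference<Entry> cache = new AtomicReference<>();
    private final byte[] metaDataKey; // where this hypothetical layer keeps its metadata

    public CachedMetaData(byte[] metaDataKey) {
        this.metaDataKey = metaDataKey;
    }

    // Returns the metadata visible to this transaction, refreshing the cache if stale.
    public byte[] getMetaData(Transaction tr) {
        // With FDB 6.1+, this read is answered from the value that came back with
        // the read version, so it adds no storage-server traffic.
        byte[] currentVersion = tr.get(METADATA_VERSION_KEY).join();
        Entry cached = cache.get();
        if (cached != null && Arrays.equals(cached.version(), currentVersion)) {
            return cached.metaData(); // cache hit: skip reading the metadata itself
        }
        // Cache miss or stale cache: fall back to reading the metadata from storage.
        byte[] fresh = tr.get(metaDataKey).join();
        cache.set(new Entry(currentVersion, fresh));
        return fresh;
    }

    public static void main(String[] args) {
        FDB fdb = FDB.selectAPIVersion(610);
        try (Database db = fdb.open()) {
            CachedMetaData cached =
                    new CachedMetaData("my-layer/meta-data".getBytes(StandardCharsets.UTF_8));
            byte[] metaData = db.run(tr -> cached.getMetaData(tr));
            System.out.println("metadata bytes: " + (metaData == null ? 0 : metaData.length));
        }
    }
}
```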