Hello? Oh, there we go. So I'm Evan Tschannen. I lead the development of FoundationDB at Apple, and I'm super excited to see all the new faces here today. I'm going to spend the next little bit of time taking you through a lot of the improvements that have happened over the past year since the previous summit, and give you a taste of some of the exciting projects being worked on right now that are coming at you pretty soon.

So what happened in 2019? Well, we had the 6.1 and 6.2 releases, and in between there were nine different patch releases. All of these releases combined added up to over 200 release notes, which means over 200 meaningful ways that FoundationDB was improved, and in total over 3,000 commits. Condensing those release notes into this time slot was a challenge, but there were a few themes.

I gave a presentation at the last summit about multi-region replication, and this year, as we've pushed that feature into production use, a lot of the improvements have been about hardening it and making it perform as well as possible.

We've also done a whole lot of work on FoundationDB's data distribution algorithm. The data distribution algorithm is what divides your data across all of the storage servers in the system: it breaks the key space up into movable chunks of key ranges and assigns those ranges across the servers, so this server holds this set of keys, and so forth. We've improved it in two ways. One of the major goals of the algorithm is keeping load even across all of the servers, and looking at the 6.0 release, there were a number of ways you could end up with a lot of hot write bandwidth concentrated on a small number of servers. So we've done a lot of work to better balance the write load across the servers. We've also done a lot of work on increasing reliability. When a server fails, this algorithm redistributes its keys elsewhere, and we've done a whole lot of work on spreading your data in a way that's safer, so that if you're triple-replicated and you lose three machines, we minimize the chance that you actually lose data. And we have.

So those two topics maybe cover 10 of the release notes, so I've got 190 left for you. The rest go into this final category: basically, we've made the database better in every way a database can get better. I'll just hit on one that I'm particularly proud of. If we look at the scalability limits in 6.0, we could go to clusters of around 200 disks. It's a little fuzzy; it depends on your exact deployment. Over the 6.1 and 6.2 releases, we've more than doubled that limit, and the improvements come in two forms. One reason there was a scalability limit is related to recovery times, which, as you're all familiar with as one of the pain points of FoundationDB, means that when a machine dies you're going to have to do a recovery. That recovery scaled, or used to scale, with the amount of data in the database, because the amount of data governed how much metadata there was, like the size of that shard map I was just describing for data distribution. So we've done a lot of optimizations around how we recover that metadata, to make recovery as fast as possible and let us scale further.
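To make the shard map idea concrete, here is a minimal, purely illustrative sketch in Python. This is not FoundationDB's actual implementation, and the boundary keys and server names are invented; the point is just that the key space is cut at sorted boundary keys, each resulting range (shard) is owned by a small team of storage servers, and finding the owners of a key is a binary search over those boundaries.

    import bisect

    # Illustrative only -- not FoundationDB's real data structures.
    # The key space is cut at sorted boundary keys; the range between two
    # consecutive boundaries is a "shard" owned by a team of storage servers.
    boundaries = [b"", b"g", b"p"]            # shard split points (sorted)
    teams = [
        ["ss1", "ss2", "ss3"],                # owns [b"", b"g")
        ["ss2", "ss4", "ss5"],                # owns [b"g", b"p")
        ["ss1", "ss4", "ss6"],                # owns [b"p", end of key space)
    ]

    def servers_for_key(key):
        # The last boundary <= key identifies the shard holding the key.
        idx = bisect.bisect_right(boundaries, key) - 1
        return teams[idx]

    print(servers_for_key(b"hello"))          # -> ['ss2', 'ss4', 'ss5']

Data distribution's job is to keep the load implied by this mapping even and to move shards when servers fail, and the bigger the database, the bigger this map gets, which is exactly the kind of metadata that recovery has to deal with.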
The other scaling limit is related to the cluster controller, which basically owns the membership of all the processes in the cluster. When you start up a new fdbserver instance, it joins through the cluster controller, and the cluster controller tells it what job to do: you're a storage server, you're a transaction log, and so on. That cluster controller could get overloaded on CPU, so we've made a number of different improvements to reduce the amount of CPU load on it, which has allowed our scalability limit to go a lot higher. So maybe we're at 191 now, 192. Or however many I have left, I guess.

I obviously don't have time to keep going through all of the different improvements, so I thought it would be fun, instead of hitting all of the big highlights, to take a deep dive into maybe just one or two of the release notes. If we go to everyone's favorite section of the release notes, the 6.2 "other changes" section, which I'm sure you've all combed over in great detail, buried in there you'll see this line: "causal_read_risky has been enhanced to further reduce the chance of causally inconsistent reads." And I'm sure you're all wondering, well, what the heck is causal_read_risky?

To explain that, we have to go back to my presentation from last year. You couldn't escape the boxes; neither could I. When I explained how read versions work in FoundationDB last year, I didn't really tell the full story. The way I explained it was that a client sends a read version request to one of the proxies, that proxy asks the other proxies for the biggest version they've seen, all of that is aggregated together, and the biggest number is sent back to the client. Seems simple enough. However, when there's a failure, like a transaction log dying, there's a recovery, and you recruit an entirely new set of all of these different roles, including the proxies. Because proxies are stateless, there's nothing preventing a client from talking to the old set of proxies and getting a read version from them. So you could be happily accepting commits on a new generation of logs but still giving out read versions from the previous generation.

To handle this edge case, every time a client asks one of the proxies for a read version, the proxy doesn't just talk to the other proxies; it also talks to the transaction logs to ask: are you still alive, are you still accepting commits, are you part of the latest generation? With multi-region replication, or multi-AZ replication like three-data-hall mode, those transaction logs might not be in the same location as the proxies. If everything is collocated, talking to them is cheap, but if they're farther away, you're going to start to see bigger and bigger GRV (get read version) latencies. So causal_read_risky basically says: don't bother talking to those transaction logs; I don't care about seeing a potentially stale read version in the very rare and weird scenario where the proxies are partitioned from the rest of the system. That's useful by itself, but now if we go back to this line, it might make more sense: causal_read_risky has been enhanced to further reduce the chance of causally inconsistent reads.
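Because causal_read_risky is exposed as a regular transaction option, a client opts into it explicitly per transaction. Here is a minimal sketch using the Python bindings; it assumes the standard fdb package, API version 620, a reachable cluster, and a made-up key name.

    import fdb

    fdb.api_version(620)
    db = fdb.open()

    @fdb.transactional
    def risky_read(tr, key):
        # Skip the extra round trip to the transaction logs when obtaining a
        # read version, accepting a tiny chance of a stale read version in the
        # rare case where the proxies are partitioned from the cluster.
        tr.options.set_causal_read_risky()
        return tr[key]

    value = risky_read(db, b"some/key")       # hypothetical key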
So basically, what we did to improve causal_read_risky is we put a minimum time on master recoveries: every recovery is now guaranteed to take at least 80 milliseconds. That gives the proxies the freedom to say, as long as I've talked to the transaction logs within the last 80 milliseconds, I know a recovery can't have completed in that time, so I can trust that result and don't have to do this extra hop to the transaction logs. Of course, like every good distributed database engineer, we don't trust clocks, because there's still the chance that the master measures time differently than the proxies. So it's still an option, it's not enabled by default, and you have to turn it on, but it's safe enough that we're using it. So that was the deep dive into just one line of the release notes.

By the way, if we look a few lines up from the causal_read_risky note, there's another interesting thing: you can now set the amount of memory for the page cache of the storage engine. The Snowflake team has successfully halved the amount of IOPS they're doing to disk by increasing the page cache size. So if you're having trouble with disk costs and the amount of disk work you're doing, you might want to increase the memory you give to your storage servers.

Okay, bonus features. I threw this in here because going through causal_read_risky got my brain churning: what other things do we do at Apple that might be interesting to the community? I actually found a few things that might be useful to you. It's a little off theme, but I figured I'd throw it in. One of them addresses a problem probably everybody here has when developing layers: something goes wrong, like you have a hot key, and you want to know, well, what key is hot? That's very useful when you're trying to debug problems. So we have a tool you can turn on from the CLI for client sampling: every transaction a client does has a chance of being stored back into the database itself. You can enable it with the fdbcli profile client command and set a rate to sample with, and I've linked to an analyzer that we've committed to the repo that can look through this sample of recent transactions and find hot keys and heavily conflicting keys. Hopefully this helps you solve problems more quickly.
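For reference, here is a hedged sketch of turning that sampling on from a script. The profile client subcommand is the fdbcli interface described above, but the exact rate and size arguments may differ between versions, so check the output of "help profile" in your own fdbcli build.

    import subprocess

    # Hedged example: sample roughly 1% of client transactions back into the
    # database, capping the stored profile data at 100MB. The exact argument
    # format for "profile client set" may vary between FoundationDB versions.
    subprocess.run(
        ["fdbcli", "--exec", "profile client set 0.01 100MB"],
        check=True,
    )

The sampled transactions land back in the database itself, which is what the analyzer script mentioned above reads through to find hot and heavily conflicting keys.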
I have one other one of these, hidden in the depths of the --dev-help output of fdbserver: the consistency check option. What this does is run a process that literally scans the key space and compares all the replicas of the data, saying, for this shard, let me read all three replicas and check whether they're all the same. Keeping the data consistent is FoundationDB's whole job, but FoundationDB is built on top of disks, and disks can have problems. A lot of those problems are caught by checksums, but some aren't: if a disk violates an fsync and doesn't actually sync the data, you can lose some writes and end up with a valid checksum over an invalid set of data. So something we do for all of our clusters is leave one of these consistency check processes running, slowly scanning through our data on roughly a monthly cadence. It also has a side benefit: it reads cold data. Really cold data in your FoundationDB database can be subject to certain types of disk corruption; when you haven't read something in a long time, it can silently go bad. By reading it every month, you're also protecting yourself from that.

Okay, back to the presentation. That was my best attempt at covering the last year, so let's look at some exciting projects that are happening right now. I already mentioned our scaling improvements from 6.0 to 6.2; we're still working on this, and we're aiming to go about three times larger. We really want big clusters, because the size of an individual FoundationDB cluster determines how big a key space you can use ACID transactions over. As soon as you hit the scaling limits and you're forced to shard your database, it's a problem, and it's a whole lot of engineering effort for you. So we're doing our very best to keep pushing this limit higher and higher.

A key improvement we've started work on is a stable wire protocol. One really annoying aspect of using FoundationDB that I think you can all relate to is that the client has to be upgraded before the server. This pairwise process of putting the new client binaries in place and then upgrading the servers is very hard to get right; it's a delicate dance. So we're starting work on a gRPC-based protocol for talking between clients and servers, and we'll re-implement our bindings to take advantage of it.

Another key area for improvement is backup and restore. Anyone who's using backup and restore, which I hope is most of you, probably doesn't realize how much it's costing you. The way backup is implemented right now, every time you write to the database we save a separate copy of what you wrote into the storage servers themselves, at the very bottom of the database. So it's effectively doubling the amount of writes going to your disks, just because you have backup enabled. Well, we have these transaction logs, and they already have a change feed of all of the data in a nice append-only log format. So we're currently doing work to ship the mutations straight from those transaction logs out to the backup, which removes that straightforward doubling of your write bandwidth. It's going to be really, really helpful. I also talked a little bit about scaling earlier, and one of the things that doesn't scale right now is restore. As you add more and more data to the cluster, your restore times get longer and longer; there's a limit of around 100 megabytes per second right now on how fast a cluster can restore, no matter how big it is. So we're breaking through that bottleneck and allowing restore to scale with the size of the cluster. The Snowflake team has also done a lot of work on snapshot backups for disks, or places like EBS, that support them. If you're in an environment that supports snapshotting, you might want to check out that work. I think it's in 6.2, but the documentation isn't there yet; that's coming from the Snowflake folks.

Okay, the next cool feature (there are too many cool features) is query push down. This is basically allowing more sophisticated logic to happen on the storage servers. A very simple example that's easy to understand: right now, if you want to count all of the keys in a given key range in FoundationDB, your only option is to scan all of the data, sending every single result back to the client only to throw it away and just count them. Having the ability to say, with your get range, I don't care about the actual keys, just give me the count, is going to be very powerful. We're just starting work on what other operations we want to support with this, so if you have opinions, jump into the discussion on the forums.
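To see why push down matters, here is what counting keys under a prefix looks like today with the Python bindings: every key-value pair is streamed back to the client just to be discarded. The prefix and API version here are placeholders.

    import fdb

    fdb.api_version(620)
    db = fdb.open()

    @fdb.transactional
    def count_keys(tr, begin, end):
        # Today's only option: pull every key-value pair in the range over
        # the network and throw it away, keeping only the count.
        return sum(1 for _ in tr.get_range(begin, end))

    # b"app0" is the key just past everything with the b"app/" prefix.
    total = count_keys(db, b"app/", b"app0")
    print(total)

With push down, the same question could be answered by the storage servers themselves, returning just a number instead of shipping the whole range over the network.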
So the last two things I'm going to talk about relate to features that are being covered in talks later today, so I won't go into as much detail and will leave it to those presenters, but I wanted to highlight them because they're so critical for our future.

The first is new storage engines. The breakout session right after this is FDB internals, and both of these new storage engines will be talked about there. FoundationDB effectively works by taking an individual single-node key value store and having a lot of them participate and act together as one big key value store, so ultimately the performance of the whole system depends on how fast that one key value store is. That's why this work on the storage engines is so critical for getting great performance out of our system. Redwood is in pre-release right now and we've already seen some really good performance numbers; you'll hear more from Steve later. Wavefront has been developing a radix tree based memory storage engine, so anyone using the memory storage engine, tune in for Mung Run's talk later today as well.

And finally, another problem that is very hard to work around in FoundationDB is read hotspots. It's very easy to trick yourself into thinking you've distributed your data well when in fact there's some key you're reading a whole lot. We want to make the system able to scale up reads on small key ranges much more fluidly, and we're going to do this with a new role that provides native consistent caching: we detect hot key ranges and hand them to some stateless processes that can serve reads, taking load away from the storage servers responsible for that hot range. You can hear more about that during the lightning talks.

So that is all I have for you. I hope this was informative. We're now at a break, and following the break we'll split up for the case studies and FDB internals. Thank you.