We're from Snowflake Computing, and today I'm going to be talking about some of our performance testing infrastructure at Snowflake, with an emphasis on our use of the TPC-C workload. So what is TPC-C? TPC-C is an industry-standard OLTP benchmark used to simulate a realistic workload of a retail business. It consists of five diverse transaction types: some are read-only, others are read-write, and they vary in complexity and duration. There's also an approximately two-to-one read-write ratio, which is similar to what we experience at Snowflake, and that's part of the reason we've relied so heavily on this workload for our performance testing. As you can see from the diagram, a TPC-C database consists of a configurable number of warehouses, and both the size of the database and the number of concurrent clients scale in proportion to the number of warehouses chosen.

So how do we use TPC-C? We use it to analyze performance in a number of different ways. We have periodic Jenkins runs to search for unexpected performance regressions in our master branch. We also test the performance effects of enabling new features, such as our new serialization protocol, which Andrew will be discussing next, and of changing different parameters, for example using different asynchronous I/O libraries or comparing different page cache sizes. We also use it as a baseline workload to ensure that FoundationDB remains available and performant while we perform snapshot backups and rolling upgrades. And we test a variety of cluster configurations with different hardware and different topologies.

So what does a typical test setup look like for us? Most of our tests run on a five-node cluster of AWS i3.16xlarge machines. We always ensure at least one virtual CPU per process, and we typically run with triple replication and the ssd storage engine. To more closely resemble what Snowflake uses, we run most tests with three resolvers, three proxies, and five transaction logs, plus a large tester pool on separate machines to ensure we're not constrained by tester resources.

Now I want to briefly talk about one example of an interesting finding we had while running these tests. We used to operate under the assumption that scaling the number of storage servers would always result in increased performance, but as you can see, that hasn't always been the case. TPC-C throughput is measured in transactions per minute, and the number of clients scales linearly with the number of warehouses used, so ideally one would expect linear scaling of throughput. That's what we saw here going from 32 storage servers to 64, but in this particular configuration, when we went to 128, we started seeing decreased throughput, and that was really surprising, even after we replicated it many times. What we found, for reasons we still don't entirely understand and are still investigating, is that read latency increased as we scaled out the number of warehouses and the number of storage servers. That leads to longer transactions, and with the way FoundationDB handles optimistic concurrency control, longer transactions result in a higher conflict probability, and this higher conflict rate is what caused the decrease in throughput.
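To make that conflict mechanism concrete, here is a minimal sketch of a TPC-C-style New-Order transaction using the FoundationDB Python bindings. This is not our flow implementation; the key layout and the tuple-encoded values are illustrative assumptions, and the data is assumed to have been loaded beforehand.

```python
import fdb

fdb.api_version(620)  # assumption: must match the installed client version
db = fdb.open()

@fdb.transactional
def new_order(tr, w_id, d_id, items):
    # Read-modify-write on the district's next order id.
    d_key = fdb.tuple.pack(('district', w_id, d_id))
    next_o_id = fdb.tuple.unpack(tr[d_key])[0]
    tr[d_key] = fdb.tuple.pack((next_o_id + 1,))

    # One stock read and write per order line. More order lines means a
    # longer transaction, and under optimistic concurrency control a longer
    # transaction has a larger window in which a conflicting write can
    # force a retry.
    for item_id, qty in items:
        s_key = fdb.tuple.pack(('stock', w_id, item_id))
        stock_qty = fdb.tuple.unpack(tr[s_key])[0]
        tr[s_key] = fdb.tuple.pack((stock_qty - qty,))

    # Record the new order (heavily simplified relative to real TPC-C).
    tr[fdb.tuple.pack(('order', w_id, d_id, next_o_id))] = fdb.tuple.pack((len(items),))
    return next_o_id
```

The @fdb.transactional decorator retries the function automatically when the commit conflicts, so at a fixed warehouse count, rising read latency translates into more retries and fewer committed transactions per minute, which matches the drop we observed at 128 storage servers.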
This is by no means an official benchmark, and we can scale beyond this; it's just an example of a problem we identified with one particular cluster configuration, and we were lucky not to be surprised by a production workload and to be able to debug it in our test environment.

In addition to our work with TPC-C, we've also done a lot of work scaling out how many processes we can handle in a single cluster, and we often found that the bottleneck there was the cluster controller CPU. As a quick refresher, the cluster controller is a process whose responsibilities include handling client connections, running a failure monitor, and handling status requests, and this work is largely workload independent. So cluster controller CPU increases with the total number of connections and the frequency of status requests, and we see here that even in the absence of any workload, as you scale out the number of clients and the number of servers, you see a steady increase in CPU. This can be a problem because the cluster controller is single-threaded.

These tests and observations led to a series of optimizations to make the cluster controller more efficient. With these optimizations, we were able to handle 150% more status requests per second. The optimizations included pre-serialization of status objects and some low-level optimizations to the flow task queue. We also cherry-picked a change from the open source version to allow processes to run with a cluster controller class. When you run a process with the cluster controller class, you ensure that under ideal circumstances that process will be elected cluster controller but won't have any other responsibilities. If you run without this class, you run the risk that, for example, a hot storage server could also be elected cluster controller, become overwhelmed, start missing heartbeats, and eventually cause re-elections and downtime. We also improved the way we do throttling so we don't have to make as many status requests to the cluster controller. Some of the metrics that Ashish was talking about can be obtained with a status request, but if your cluster controller is under stress, you might not want to do that. So we pushed a lot of those metrics to the proxies, where we can get them cheaply for efficient client-side throttling; a small sketch of what a status request looks like from a client follows at the end of this section.

So where do we plan to go from here? We'd like to continue scaling out our TPC-C tests to more closely resemble a production workload. We also hope that other companies can run similar tests in their own environments now that our implementation, written in flow, has been open sourced (I actually had the link on a previous slide), because there are a lot of different parameters you can change. We also didn't do any official benchmarking, but we hope that this robust open source implementation can lead the way for doing that in the future. Another thing to consider is that TPC-C isn't the only option: if you have, for example, a higher read-write ratio or a more random workload, another industry-standard benchmark could be used for similar purposes. Thank you.
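As mentioned above, here is a minimal sketch of the kind of status request a client can make, using the FoundationDB Python bindings to read the \xff\xff/status/json special key (the same document fdbcli's "status json" shows). Serving these requests is work for the single-threaded cluster controller, which is why polling them frequently from many clients for throttling decisions is expensive; the JSON field names at the end are assumptions about the status layout, shown only to illustrate the kind of signal a throttler might read.

```python
import json
import fdb

fdb.api_version(620)  # assumption: must match the installed client version
db = fdb.open()

@fdb.transactional
def get_status(tr):
    # Read the cluster status document; each such read is a status request
    # that has to be answered on behalf of the cluster controller.
    return json.loads(tr[b'\xff\xff/status/json'][:])

status = get_status(db)
# Hypothetical example of a throttling signal pulled from the document.
qos = status.get('cluster', {}).get('qos', {})
print(qos.get('performance_limited_by', {}).get('name'))
```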