So right now, I'm going to tell you about using some ideas from Cortex to improve high availability in Promscale. First, a recap: Prometheus high availability works by just deploying two identical Prometheus servers scraping the same endpoints and storing almost the same data. Ben mentioned that the timestamps might not quite align, but the data is close enough.

But when you think about remote storage solutions, people don't actually want to pay the storage cost of keeping both copies of the data. So what most remote storage systems support is some ability to deduplicate this data: keep only one copy of the data for each time period. For example, replica 1 might be sending the data, and the storage keeps it. If replica 1 goes down, then the long-term storage wants to switch over to replica 2 and store that until it goes down, and so forth.

Promscale is based on SQL, so we originally implemented a very naive solution to this using database locks. All of the Promscale instances in one cluster tried to take the same database lock. Whichever Promscale instance got the lock was the writer, and data from the others was just dropped. If the writer died, it would give up its lock, one of the other replicas would take it, et cetera, et cetera.

There were two problems with this solution. One problem is that it created a tight coupling between the Prometheus instances and the Promscale instances. Prometheus itself couldn't take the lock; it had to delegate that to Promscale, so you ended up with a one-to-one coupling between the Prometheus servers and the Promscale servers. But what you really want in these types of systems is one Prometheus tier, a load balancer, and then one Promscale tier. That was impossible in this kind of system. The other problem is that, as with most database locking systems, you know who holds the lock right now, but you don't know who held the lock an hour ago. So if you are getting delayed data, you have no way to decide whether to keep that data or not.

We solved this second problem by switching to an immutable lease approach, where for each cluster and time period, only one replica takes a lease. Once that lease is taken, it is immutable: you can't change it. At the end of the lease, if that replica is still alive, it can extend the lease. But if the replica goes down, then another replica, say replica 2, can take the lease for a future time period. So you can't modify the current lease, but when that lease is up, you can switch over. That solved the second problem, because now you have a record of who held the lease an hour ago (there's a sketch of both locking schemes below).

To solve the first problem, we had to decouple knowing which replica the data came from from the network topology. You want Promscale to know which replica the data came from without having to change the network topology. For this, we used a clever idea from Cortex, which is just to put the replica information into the data itself. In Cortex this is commonly done with external labels: you define external labels on your Prometheus server that say, "hey, I'm sending data from cluster A, and I'm replica 1" (or "I'm replica 2"). Now the data can be sent through a load balancer, and once the data is received, you still know which replica it came from. And so this allowed us to create a new high-availability architecture, which is actually what you would expect.
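To make the two locking schemes concrete, here is a minimal sketch in Go against Postgres. The table name, column names, helper names, and lease window are all hypothetical, not Promscale's actual schema or code; the point is the shape of the logic. A real implementation would extend `lease_until` in place and allow takeover only after a grace period; this sketch simplifies that to fixed windows.

```go
package ha

import (
	"context"
	"database/sql"
	"time"
)

// --- The naive approach: a single database lock ---
//
// Every Promscale instance races for the same Postgres advisory lock; the
// winner is the writer and data from everyone else is dropped. Advisory locks
// are session-scoped, so this needs a pinned *sql.Conn: if the writer's
// connection dies, Postgres releases the lock and another replica's next
// attempt wins. Note there is no history: you only know the current holder.
func isWriter(ctx context.Context, conn *sql.Conn, lockKey int64) (bool, error) {
	var got bool
	err := conn.QueryRowContext(ctx,
		`SELECT pg_try_advisory_lock($1)`, lockKey).Scan(&got)
	return got, err
}

// --- The immutable lease approach ---
//
// One row per cluster and time window, written once and never updated.
// Hypothetical schema:
//
//   CREATE TABLE ha_lease (
//       cluster     text,
//       lease_start timestamptz,
//       lease_until timestamptz,
//       replica     text,
//       PRIMARY KEY (cluster, lease_start)
//   );

const leaseWindow = 30 * time.Second // illustrative lease length

// tryAcquireLease returns which replica holds the lease covering ts. The
// INSERT wins only if no lease exists yet for that window; otherwise it is a
// no-op, so the current holder can never be overwritten. A live holder
// extends by inserting the next window; if it dies, another replica's insert
// for that window succeeds instead. Because rows are never mutated, delayed
// data can always be checked against the holder at the data's own timestamp.
func tryAcquireLease(ctx context.Context, db *sql.DB, cluster, replica string, ts time.Time) (holder string, err error) {
	start := ts.Truncate(leaseWindow)
	_, err = db.ExecContext(ctx, `
		INSERT INTO ha_lease (cluster, lease_start, lease_until, replica)
		VALUES ($1, $2, $3, $4)
		ON CONFLICT (cluster, lease_start) DO NOTHING`,
		cluster, start, start.Add(leaseWindow), replica)
	if err != nil {
		return "", err
	}
	err = db.QueryRowContext(ctx, `
		SELECT replica FROM ha_lease
		WHERE cluster = $1 AND lease_start = $2`,
		cluster, start).Scan(&holder)
	return holder, err
}
```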
You have your Prometheus instances sending data labeled with the appropriate external labels (sketched below) to a load balancer, which then sends the data to a Promscale tier, which uses the leasing mechanism described before to save it into our database, which is TimescaleDB, a database built on top of Postgres. And that's it.
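To illustrate both sides of that architecture, here is a hedged sketch. On the Prometheus side, the Cortex convention is to identify each sender with `cluster` and `__replica__` external labels (shown in the comment below); on the receiving side, the Promscale tier can read those labels out of each remote-write request and keep only data whose sender holds the lease for the relevant time window. `replicaOf` and `filterWrite` are hypothetical helper names, `tryAcquireLease` is the sketch from earlier in the same package, and per-series filtering here simplifies what a real ingester would do per request; this is not Promscale's actual code.

```go
package ha

import (
	"context"
	"database/sql"
	"time"

	"github.com/prometheus/prometheus/prompb"
)

// Each Prometheus replica identifies itself via external labels in its own
// config, per the Cortex convention, e.g.:
//
//   global:
//     external_labels:
//       cluster: prod
//       __replica__: prometheus-0
//
// Remote write attaches these labels to every series it sends, so the
// receiver knows the sender even when requests arrive through a load balancer.

// replicaOf extracts the cluster and replica identity from a series' labels.
func replicaOf(ts prompb.TimeSeries) (cluster, replica string) {
	for _, l := range ts.Labels {
		switch l.Name {
		case "cluster":
			cluster = l.Value
		case "__replica__":
			replica = l.Value
		}
	}
	return cluster, replica
}

// filterWrite keeps only the series whose sending replica holds the lease
// covering the samples' timestamps; everything else is dropped before storage.
func filterWrite(ctx context.Context, db *sql.DB, req *prompb.WriteRequest) ([]prompb.TimeSeries, error) {
	var keep []prompb.TimeSeries
	for _, ts := range req.Timeseries {
		if len(ts.Samples) == 0 {
			continue
		}
		cluster, replica := replicaOf(ts)
		// Remote-write timestamps are milliseconds since the Unix epoch.
		at := time.UnixMilli(ts.Samples[0].Timestamp)
		holder, err := tryAcquireLease(ctx, db, cluster, replica, at)
		if err != nil {
			return nil, err
		}
		if holder == replica {
			keep = append(keep, ts)
		}
	}
	return keep, nil
}
```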