I'm Kyle Bader. I work with the storage BU at Red Hat, and I've been working on Ceph for a while. One of the things that's been coming up recently is how we take Ceph, and object storage, to the next level of scale. I have the old Ceph logo from 2007 on here; we used to say it was petabyte-scale storage. Well, that's not cutting the mustard anymore. So how do we adapt?

Some of the largest tests we've done were with maybe 10,000 OSDs. Every now and then CERN, the same folks who collide the particles, will say, "Hey, we're getting a new shipment of hardware. Do you want to run some tests on it for three weeks?" We've done this a number of times, and each time we've pushed to build a bigger cluster. The most recent one was with 10,000 drives: we built a 10,000-drive Ceph cluster and ran it for a few weeks.

If you look at the highest-capacity drives you can get these days, they're around 16 terabytes. If you had 10,000 of them, you'd get on the order of 160 petabytes, which is a really big data store. But we have customers starting to say, "Hey, I want 200 petabytes of storage. What does this look like?" And they also have really demanding throughput requirements, particularly if they're going to use it for machine learning applications, where they say, "I'm going to have all these cars pumping data into this thing on the order of petabytes per day, and then every so many months I need to go through all of the data and retrain with the fresh data." So you're looking at a very serious amount of throughput demand. We were thinking: how can we cater to these really, really demanding use cases, where people need hundreds of gigabytes per second of throughput and these 200-petabyte clusters? What can we do?

One of the things we did a number of years ago was work with the Yahoo folks to co-develop an architecture where multiple Ceph clusters backed the storage for Flickr. I think that same architectural approach is still relevant today. You can create an architecture that's like a shoal, a group of squids: a group of Ceph subclusters acting as a bigger cohesive whole, with a single namespace across all of them, so you can have billions and billions of objects and hundreds of petabytes worth of data.

So what can you get out of this? In terms of subclusters, we looked at what sort of throughput you can get in and out of a relatively modest-sized cluster, so that you can extrapolate. Even with a relatively modest cluster of about 700 spindles, we were able to do a little over a petabyte in 24 hours. That validates the use cases where people say "I need to ingest several petabytes a day," and it gives you a handle on how many disks you'd need to absorb that much data; I'll sketch the arithmetic below.

We also wanted to see how many objects we could stuff into a single bucket. A bucket is a flat namespace, and people often want to put a lot of objects into one. We tested up to, I think, 250 million objects in a single bucket, which is a lot. And you can see that after an initial step, where we internally reshard the bucket metadata, latency stayed very consistent, even with 250 million objects in that one bucket, which we considered to be a lot.
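To give a rough sense of what that internal sharding looks like: RGW splits a bucket's index into shards, and recent releases reshard dynamically as the object count grows, which is the step you see in the latency curve. A minimal sketch of the arithmetic, assuming the 100,000-objects-per-shard figure (the default for rgw_max_objs_per_shard in recent releases; treat it as an assumption about your version):

```python
# Rough estimate of bucket index shards for a large bucket. The
# objects-per-shard target mirrors RGW's rgw_max_objs_per_shard
# default (100k in recent releases); an assumption, not gospel.
import math

def index_shards(num_objects: int, objs_per_shard: int = 100_000) -> int:
    """Shards needed so no shard exceeds the per-shard object target."""
    return math.ceil(num_objects / objs_per_shard)

print(index_shards(250_000_000))  # -> 2500 shards for our test bucket
```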
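And to put the petabyte-in-24-hours number to work: if you assume ingest scales roughly linearly with spindle count, you can extrapolate how many disks a given daily ingest target needs. A back-of-the-envelope sketch using the figures from our test (linear scaling is an assumption; real clusters rarely scale perfectly):

```python
# Extrapolate spindle counts from the observed ~700 spindles
# absorbing ~1 PB per 24 hours. Linear scaling is an assumption.
import math

PB = 10**15  # bytes
OBSERVED_PB_PER_DAY = 1.0
OBSERVED_SPINDLES = 700

def sizing(target_pb_per_day: float) -> tuple[float, int]:
    """Return (aggregate GB/s required, estimated spindle count)."""
    gb_per_sec = target_pb_per_day * PB / 86_400 / 1e9
    spindles = math.ceil(target_pb_per_day * OBSERVED_SPINDLES / OBSERVED_PB_PER_DAY)
    return gb_per_sec, spindles

print(sizing(1.0))  # ~11.6 GB/s on ~700 spindles (matches the test)
print(sizing(5.0))  # ~57.9 GB/s, roughly 3,500 spindles
```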
We also wanted to test at the cluster-global level; this is a lightning talk, so I'll move quickly. We put over a billion objects into the cluster and observed how performance changed over time. On the chart, the red line is latency and the blue line is the object population. Our latency stayed relatively flat until the point where we were using up all of the SSD for our metadata and slowly starting to spill some of it over to disk. You can think of it as a cache miss: as less of the metadata sat on SSD, more of it had to come from hard disk, and that's why latency crept up over time. Reads stayed relatively stable, because a read is still just a seek, and there's not a lot you can do to accelerate a seek when you can't cache the entire population of objects.

So with those individual subclusters out of the way, what would a shoal look like? In a Ceph multi-site topology, you have these ideas of zones and zone groups. Zones and zone groups were originally put in place to do replication between them, but that doesn't have to be how you use them. You create a realm, which is like an S3 global namespace for buckets. A bucket lives in exactly one zone group. You can configure multiple zones in a zone group and replicate between them, but if you have only one zone in each zone group, you're basically just partitioning the namespace of buckets. And when you interact with the object store, you don't need to know any of this: you just have your s3a:// path, or you go to bucket.s3.example.com or whatnot, and the request gets routed appropriately.

The way this routing works is DNS. If you go to bucket.s3.example.com, it maps to some IP address; if you use path-style access, you go through s3.example.com. In the most simplistic sense, you could just round-robin DNS across the cluster. But you could also do something more sophisticated in HAProxy, with some sort of Lua-based routing that actually looks into the cluster to find out where a bucket lives. If anyone is familiar with SDN controllers, it's like a first-packet approach: the first packet gets resolved by the control plane, which embeds something in a lookup table so that subsequent packets don't have to go through that logic. You could do something similar here.

Then you'd have a separate endpoint per subcluster, similar to how AWS has different endpoints for different regions; you're effectively creating little regions that each have their own Ceph cluster. And with some sort of DNS plugin, bucket names would map to clusters. Say "kyle" is the name of my bucket: I go to kyle.s3.example.com, and the DNS plugin (someone has already written one for PowerDNS that talks to Ceph) figures out which cluster the bucket lives in and responds with a record that routes me there.
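To make the zone-group partitioning concrete, here is a minimal sketch of wiring up a realm with two single-zone zone groups, a two-subcluster shoal. The realm, zone-group names, and endpoints are hypothetical, and in reality each zone group's commands run against its own subcluster; this just shows the shape of the radosgw-admin calls:

```python
# Sketch: a realm ("shoal") with two zone groups, one zone each, so
# the bucket namespace is partitioned rather than replicated. Names
# and endpoints are hypothetical; each zone group would really be
# configured against its own subcluster.
import subprocess

def rgw_admin(*args: str) -> None:
    subprocess.run(["radosgw-admin", *args], check=True)

rgw_admin("realm", "create", "--rgw-realm=shoal", "--default")
for i, (zg, endpoint) in enumerate([
    ("zg-1", "http://zg1.s3.example.com"),
    ("zg-2", "http://zg2.s3.example.com"),
]):
    master = ["--master"] if i == 0 else []  # one zone group is the master
    rgw_admin("zonegroup", "create", f"--rgw-zonegroup={zg}",
              f"--endpoints={endpoint}", "--rgw-realm=shoal", *master)
    rgw_admin("zone", "create", f"--rgw-zonegroup={zg}",
              f"--rgw-zone={zg}-z1", f"--endpoints={endpoint}", *master)
rgw_admin("period", "update", "--commit")
```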
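And whether the routing lives in HAProxy's Lua layer or in a DNS plugin, the core of it looks the same: resolve where a bucket lives once, remember it, and answer subsequent requests from the table. A sketch, where lookup_zonegroup() is a hypothetical stand-in for whatever control-plane query you'd actually use, such as asking RGW where the bucket's metadata lives:

```python
# "First packet" routing sketch: the first request for a bucket asks
# the control plane which zone group it lives in; later requests are
# answered from the lookup table. All names here are hypothetical.
ENDPOINTS = {
    "zg-1": "http://zg1.s3.example.com",
    "zg-2": "http://zg2.s3.example.com",
}
DEMO_LOCATIONS = {"kyle": "zg-1"}  # stand-in for the control plane's view
_table: dict[str, str] = {}       # bucket name -> subcluster endpoint

def lookup_zonegroup(bucket: str) -> str:
    """Stand-in for a real control-plane query (e.g. an RGW lookup)."""
    return DEMO_LOCATIONS[bucket]

def route(bucket: str) -> str:
    if bucket not in _table:                 # the "first packet" case
        _table[bucket] = ENDPOINTS[lookup_zonegroup(bucket)]
    return _table[bucket]                    # cheap table hit after that

print(route("kyle"))  # first call resolves; later calls hit the table
```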
You could do the same kind of thing in CoreDNS, and I think that's where you could really take this: if you had an operator deploying multiple clusters inside of OpenShift, it could automatically configure all of this wiring and route the traffic appropriately. So that was my quick little talk. Thanks. Thank you very much.