All right. So welcome to Ceph Day Vancouver. It's wonderful to see all of you here. We've been kicking off a whole series of Ceph Days this year, and it's great to be back at the OpenInfra Summit. For me, it's been probably six or seven years since I've been at one of these. It's wonderful to see that the crowds are still out here and that folks are still very excited about using Ceph and OpenInfra and all things open source.

Let me say a little bit about myself. For those who don't know me, I've been working on the Ceph project for basically my whole career. I started off with the startup Inktank, before it even became Inktank, working across multiple areas of Ceph, then through Red Hat, and now I'm at IBM, managing a few teams of software developers working on Ceph to this day.

As for the project itself, Ceph is governed by the Ceph Foundation, which was formed in 2018 to promote and foster the project. There are 32 members across multiple industries and geographies, and it's a general home for bringing all the stakeholders in the Ceph community together and pooling resources to improve the project and the community overall. We run lots of Ceph Days and Cephalocons, which are multi-day conferences; the most recent one was in Amsterdam a couple of months ago, and it was fantastic. The Ceph Foundation also does things like funding upstream documentation and marketing. If you're interested in helping out with the Ceph project, this is a great way to get involved, even if you aren't a developer, so feel free to contact us if you want to join.

We've had many different Ceph Days this year across the US and Asia. This week we had one in Seoul, South Korea, and today we're in Vancouver. And this is a photo from Cephalocon Amsterdam a couple of months ago. It was an absolutely packed conference, with incredible content across three different tracks. If you want to learn more about Ceph in detail, I definitely recommend checking those talks out on the Ceph YouTube channel. You can find information about other events here as well. And if you want to organize a Ceph Day (we're already starting to think about planning for next year), please send an email to foundation@ceph.io or come find us afterwards.

I'd also like to say a big thanks to our sponsor for today's Ceph Day, OSNexus. Steven Umbehocker will come up afterwards and give a talk about OSNexus, but I'd just like to say thank you very much for your support.

Before I go into more detail about Ceph itself, I want to take a quick poll of the room. How many folks here are already using Ceph? Okay. And how many folks are not using Ceph? Okay, so maybe I'll keep this intro part a bit short.

So, a bit about what Ceph is and where we are. You've probably heard it described before as the Linux of storage, as software-defined storage, as a unified storage system. But what does that really mean? Well, Ceph is all open source. It runs on commodity hardware, on all kinds of different networks and disks, on whatever kinds of servers you can think of. And it provides access to storage across all the different protocols: object, block, and file. Ceph has always been very focused on freedom of choice and open source. It's free as in beer: anybody can take it off of GitHub and use it. It's free to modify and share again. And Ceph has always provided freedom from vendor lock-in.
Anybody can take the code, run it themselves, improve it, and share their improvements with the community.

The big focus for Ceph has always been reliability and quality, making sure that you don't lose your data; that's a pretty important aspect of a storage system. So a lot of the work goes into making sure your data has no single point of failure and that durability is maintained even while outages are happening: nodes going down, disks failing. There should be minimal interruption in service from any of these events, and upgrades, online expansion, and shrinking of a cluster should be as easy as possible. In general, Ceph always favors correctness and consistency, keeping your data safe, over availability or performance.

Ceph is also designed for high scalability. You can start off with a very small system and grow it to multiple petabytes, and you can add or remove storage over time. Ceph doesn't care what size the disks are; you can mix and match all kinds of different hardware together in one system, and Ceph will deal with it straight out of the box.

In the last seven or eight years, there's been a lot more interest in multi-data-center and disaster recovery capabilities: replicating data between different data centers, either for disaster recovery or for access from multiple areas of the globe at once. One group comes online during its day and works on things in one region, another group comes online later in a different time zone, and you can shift your workload around the globe that way.

Ceph is built on an underlying layer called RADOS, the Reliable Autonomic Distributed Object Store, and RADOS forms the basis of all the other Ceph protocols. It handles all the details of replication internally, so everything built on top of it doesn't have to worry about that; they just store things as objects in RADOS, which has a very rich object API. On top of that, there are three main interfaces for accessing Ceph: the RADOS Gateway (RGW) for S3- or Swift-style object access; the RADOS Block Device (RBD), which you'd often use with virtual machines in OpenStack or containers in Kubernetes; and the Ceph file system (CephFS), which you can use anywhere you want a shared file system accessed from multiple clients at once.

So let me go a little more into depth on how RADOS works. As I said, it's all focused on strong consistency, and it provides a very low-level object API where you can interact with a single object atomically, in a very reliable and scalable way. You can do really complex transactions on a single object, and they'll be applied atomically and consistently across however many replicas or erasure-coded shards you've configured.
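To make that object API a little more concrete, here's a minimal sketch using the Python librados bindings. It assumes a running cluster, the default /etc/ceph/ceph.conf with an admin keyring, and an existing pool; the pool and object names here are just placeholders.

    import rados

    # Connect to the cluster using the standard config file and keyring.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Open an I/O context on a pool (assumed to already exist).
    ioctx = cluster.open_ioctx('mypool')
    try:
        # Objects are addressable at byte granularity, not just whole-object.
        ioctx.write_full('greeting', b'hello rados')   # replace the whole object
        ioctx.write('greeting', b'HELLO', 0)           # overwrite 5 bytes at offset 0
        print(ioctx.read('greeting'))                  # b'HELLO rados'

        # Each object also carries extended attributes for small structured data.
        ioctx.set_xattr('greeting', 'owner', b'demo')
        print(ioctx.get_xattr('greeting', 'owner'))    # b'demo'
    finally:
        ioctx.close()
        cluster.shutdown()

librados also offers asynchronous variants of these calls, and this is the layer that RGW, RBD, and CephFS are all built on top of.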
RADOS consists of a few different daemons, three main ones. First, there are the Ceph monitors, which maintain the overall state of the cluster: which daemons exist, which disks exist, whether they're up or down, that sort of thing. Generally you need an odd number of these so they can maintain a majority, and they use the Paxos algorithm to keep consistency and consensus about what the cluster looks like. Usually you only need three of them; for larger clusters you might use five, maybe up to seven, but it's not really something that you scale out.

Then there's the manager, which aggregates real-time statistics from all the different daemons and has a number of pluggable modules for orchestrating the cluster and doing other things; the manager also hosts the web UI, the Ceph dashboard, among other things. Finally, the most numerous daemon is the object storage daemon, or OSD. This is usually deployed in a one-per-disk configuration; if you have multiple kinds of disks, you might give each daemon a partition from an SSD alongside one hard disk. In general, the OSD is responsible for storing the actual data on those disks, managing all the replication, cooperating with the other OSDs to rebalance data as needed, and noticing anything that's failing and reporting it back to the monitors.

When you look at a data storage application, in the past you'd generally have the legacy architecture: a single application talking to a single server, maybe with some kind of pairing for HA or failover. But you couldn't really scale that out very well; you'd have a big bottleneck at the main gateway. Ceph instead uses a client-and-cluster architecture, where any given Ceph client can talk to all the OSDs and all the monitors in the cluster as needed. Anything storing data in RADOS stripes that data across many objects, which are potentially spread across all the OSDs in the cluster, or at least within whatever pools you have set up. So you get the full parallelism of the cluster, and you can get very good performance, especially for large parallel workloads.

With data spread around the cluster like this, how do you find it? How does an application know, when it writes an object, where that object goes? There are a few strategies here. One classical approach is to have some kind of metadata server that you go and ask, 'where is this data?' That's fine for a small system, but for a system like Ceph, where you're talking about millions, billions, trillions of objects, it doesn't really scale. So Ceph takes a different approach: it calculates the placement of objects. There's a map of the cluster, called the OSD map, that's maintained by the monitors and tells Ceph which daemons are up and down. From that map, Ceph calculates, for a given object, which OSDs the object should live on. So if a client has the name of the object and the OSD map, that's all it needs to figure out which OSDs to talk to. It doesn't need to do any extra lookups; it can go directly to those OSDs and get the data it needs.

And whenever something changes in the cluster, say a disk dies, the OSD map gets updated and that OSD is removed. The new OSD map is distributed in a gossip fashion to all the other OSDs and clients: whenever a daemon or client finds that it's talking to a peer with a newer version of the map, the new version is immediately shared back across the cluster. So everything spreads in a nice, even way, and updates propagate quite quickly.
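As an aside, you can watch this calculation happen on a live cluster with 'ceph osd map <pool> <object>', which prints the placement group and the OSDs that any object name maps to. Conceptually, the client-side logic looks something like the toy Python sketch below. To be clear, this is not the real algorithm (Ceph hashes the name to a placement group and then runs CRUSH against the real OSD map), but it shows why no lookup service is needed: every client holding the same map computes the same answer.

    import hashlib

    def place_object(object_name, pg_num, osd_map, replicas=3):
        """Toy model of Ceph-style calculated placement (not the real CRUSH)."""
        # 1. A stable hash of the object name picks a placement group (PG).
        h = int.from_bytes(hashlib.md5(object_name.encode()).digest()[:4], 'little')
        pg = h % pg_num

        # 2. A deterministic function of (PG, cluster map) picks the OSDs.
        #    Real Ceph runs CRUSH over the OSD map here; this stand-in just
        #    walks the list of up OSDs, but it is repeatable, which is the point.
        up_osds = sorted(osd for osd, alive in osd_map.items() if alive)
        start = pg % len(up_osds)
        return pg, [up_osds[(start + i) % len(up_osds)] for i in range(replicas)]

    # Every client with the same map computes the same placement: no lookups.
    osd_map = {0: True, 1: True, 2: True, 3: False, 4: True}  # OSD 3 is down
    print(place_object('my-object', pg_num=128, osd_map=osd_map))

When the map changes (say OSD 3 comes back up), every client recomputes placements from the new map, which is exactly why propagating map updates quickly matters.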
So RADOS provides a very rich API for object storage. At this underlying layer, it's much more flexible and powerful than what you might picture when you think of object storage in the S3 sense. RADOS objects are more like files, in that you can access and address them at byte granularity, and you can update them with all kinds of complex transactions, even custom ones that run on the OSDs themselves. Objects also have an extended attribute and key-value interface, so you can store small structured data there in a way that's easy to look up and access. Or you can just use them as buckets of bytes, like a regular file.

Objects are organized in Ceph into pools, and you might have pools for different use cases. Maybe one pool is stored on SSDs and another on HDDs. Maybe certain pools back a file system used for temporary storage, with a different set of settings than another pool meant for longer-lived data.

Storing objects in pools is how a user, or a client, sees RADOS. To find the OSDs it needs to talk to, as I mentioned earlier, the client works from the OSD map and the name of the object. To make the management of these objects simpler and lower overhead, we further subdivide pools: we shard them into what are called placement groups, or PGs. Each placement group is just an arbitrary shard of a pool, which means all the objects in a given pool are distributed approximately evenly among its placement groups. To find out which placement group an object belongs to, the client simply takes a hash of the object name. The placement group is then what's actually fed into the placement algorithm: given the PG and the OSD map, which tells you which OSDs are up and down, you know precisely where your data is stored.

That's a slightly simplified picture of the client-to-RADOS protocol. With replication it's very simple: a write goes to the first OSD in the placement group, and that OSD replicates it to the others. Replication can be any number of replicas you like, from two on up; the most I've seen people regularly use is four, but three is usually a good default.

You can also use erasure coding with Ceph. It's much more space efficient, especially for very large objects like you'd see in an S3 sort of environment. It's possible to use it with other things, like block devices and the Ceph file system, but it has a high performance impact when you're doing lots of little random writes, so it's usually not as good an idea for those use cases. With erasure coding, the idea is similar to replication: writes go to the primary OSD, which breaks the object up into chunks for your erasure code. You can think of erasure coding as a more generalized version of RAID, where some stripes are regular data and some are parity stripes at the end, and those chunks are divided among several different OSDs to maintain the durability guarantee.

In general, RADOS is all about virtualizing your infrastructure and your storage. You can have all kinds of different hard drives and SSDs, and you decide exactly how they're managed and presented to the user, organizing them into pools and choosing replication factors and erasure coding parameters. With replication, you can even change the replication factor dynamically, though that means a lot of extra copies and a lot of data movement while it happens. With erasure coding, the coding parameters can only be set when you create the pool, so you want to think carefully up front about what kind of erasure code you want to use.
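To make the 'generalized RAID' intuition concrete, here's a tiny sketch of the simplest possible erasure code: k data chunks plus a single XOR parity chunk, i.e. k=4, m=1. Real Ceph erasure code profiles typically use Reed-Solomon style codes with m of 2 or more, but the recovery idea is the same: a lost shard is rebuilt from the survivors.

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def encode(data, k=4):
        """Split data into k equal chunks plus one XOR parity chunk (m=1)."""
        chunk_len = -(-len(data) // k)                # ceiling division
        data = data.ljust(k * chunk_len, b'\x00')     # pad to a chunk boundary
        chunks = [data[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
        parity = chunks[0]
        for c in chunks[1:]:
            parity = xor_bytes(parity, c)
        return chunks + [parity]                      # k+1 shards, one per OSD

    def recover(shards, lost):
        """Rebuild one lost shard by XOR-ing all of the surviving shards."""
        survivors = [s for i, s in enumerate(shards) if i != lost]
        rebuilt = survivors[0]
        for s in survivors[1:]:
            rebuilt = xor_bytes(rebuilt, s)
        return rebuilt

    shards = encode(b'the quick brown fox jumps over the lazy dog')
    assert recover(shards, lost=2) == shards[2]       # any single loss is repairable

This is also where the space savings come from: storing k+m shards costs (k+m)/k times the object size, so a k=4, m=2 profile uses 1.5x the raw capacity, versus 3x for three-way replication.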
And that's a good overview of RADOS. Since some folks here are already pretty familiar with Ceph and have used it quite a bit, we'll leave some extra time for Steven to go into more detail on his things. So thank you very much. The next one... okay, the next one is Steven. Yeah, Steven, yep, and I'm not sure... the program, I think, comes to you. Thank you.