All right, I was just waiting for my allotted time. Hello, everyone. Welcome again to Ceph Days in beautiful Vancouver. I don't know about you guys, but I love the city. My name's Olaf, and I have the privilege of walking you fine folk through some of the basics of Ceph. Or, as the session title was pitched, a beginner's guide. This will very much be a bird's-eye view of Ceph, hitting on some of the whos, wheres, and whys. I'll be leaving all the hows to my esteemed colleagues speaking throughout the rest of the day as they go into deeper detail about Ceph.

Personally, I sometimes have a hard time putting what I know into words, so I like to have explicit definitions of keywords, topics, and subjects that I can memorize and regurgitate at will to help get conversations and ideas flowing. I had intended for a lot of that to be included in this talk. But as it turns out, unfortunately, not all concepts are so easily boiled down to a dictionary definition. Some of the things I'll be talking about are easier to describe by what they do, as opposed to what they are. That'll be a bit more apparent as we go along.

So let's get started, and start at the beginning of our beginner's guide. Where did Ceph come from? Ceph was originally developed by a gentleman by the name of Sage Weil back in 2003 as part of his PhD project at the University of California, Santa Cruz. This first iteration only implemented the Ceph file system. Later, in 2006, Ceph was open sourced under the GNU Lesser General Public License. In the following years, various institutions, companies, and especially other developers helped support the development of Ceph. In 2012, the same Mr. Weil was able to found Inktank, a company that sold support for Ceph. And in 2014, some crazy company called Red Hat acquired Inktank. And here we are.

What's in the name? What does Ceph mean? It's a little-known story that Mr. Weil was an avid fan of 1980s Steven Spielberg movies.
But he was so busy writing Ceph that he never had enough time to watch the movies to the end. He only got about halfway through and had to get back to work. So all the time he's working away on Ceph, it's bugging him in the back of his head so much that he wants to know: can E.T. phone home? So he takes that sentence, takes the letters, and makes Ceph out of it.

That is a complete lie and fabrication. Ceph is not an acronym at all. It doesn't stand for anything. Ceph is actually short for cephalopod. So yes, this is our first dictionary definition of the day. A cephalopod is any of the marine animals belonging to the Cephalopoda class of mollusks. That includes squids, as well as octopi, octopuses, however you say that. And as well as banana slugs. Banana slugs shouldn't be important, you wouldn't think. But as it turns out, the banana slug is, according to the internet, the mascot of the University of California, Santa Cruz. So I guess this was a naming convention to keep everything in the family.

In my search for a definition of Ceph, I found many descriptions. Some were more helpful than others. I'm not going to pick on anybody, but eventually I found my perfect dictionary definition, from an author named Nick Fisk: Ceph is an open source, distributed, scale-out, software-defined storage system that can provide block, object, and file storage. I fell in love with this. It's perfect. Not only does it say what Ceph is, it says what Ceph does. I think that's important, because those last three things are broken out: block, object, and file storage. Most of the other descriptions were calling Ceph unified, which it surely is. But I think that "unified" can mean different things to different people. Someone might consider SAN and NAS access from the same node as unified. Anyway, let's break down this definition just a bit. Open source.
I'm sure everyone here knows what open source is and is a big fan of it. That's what we do. Distributed and scale-out just mean it runs over lots of nodes. It could be as big as an entire data center or as small as, I guess, a single node. Software-defined storage is another thing that we will define shortly. But like I say, I wanted to highlight not just what Ceph is, but what Ceph does. Block, object, and file storage are the three services that Ceph exposes to users. We'll see more about that in a bit.

All right, so software-defined storage. Software is in the definition, so what's software? I think it's a bit too silly to try to define. But software these days has a release cycle. This is Ceph's release schedule here; Quincy and Pacific are the two active releases at the moment. I didn't bother listing out every single release. I just wanted to show how, in Infernalis, the package version numbering changed. Starting then, the first number of the release version is also the position of the release's letter in the alphabet: 10 is J, the 10th letter of the alphabet, and 11 is K. Maybe that's obvious to you guys. It took me a while to figure that out.

So here's some more of our definitions. Software-defined storage: a system of abstracting data storage so that the provisioning and management of storage are separated from the underlying hardware. We'll see in just a minute, in a diagram, that it's pretty obvious too: the services that Ceph exposes don't talk to hardware at all. I usually wouldn't quote Wikipedia, but the second one's from them. I just thought it was pretty jaded for Wikipedia. Software-defined storage is apparently just a marketing term for computer data storage software, yeah, yeah, yeah. As I mentioned, the whole dictionary definition thing doesn't work too well sometimes.
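The version numbering rule mentioned above can be written down in a few lines. This is only a sketch of the rule as described in the talk (from Infernalis onward, major version equals the alphabet position of the release name's first letter); the function name `ceph_major_version` is made up for illustration and is not part of any Ceph tooling.

```python
import string


def ceph_major_version(release_name):
    """Return the major version implied by a release name's first letter.

    Assumption: the rule only holds from Infernalis (9) onward, so earlier
    letters are rejected rather than mapped to a number.
    """
    first = release_name.strip()[0].lower()
    if first not in string.ascii_lowercase:
        raise ValueError(f"release name must start with a letter: {release_name!r}")
    major = string.ascii_lowercase.index(first) + 1  # a -> 1, b -> 2, ...
    if major < 9:
        raise ValueError("the letter rule only applies from Infernalis (9) onward")
    return major


for name in ("Infernalis", "Jewel", "Kraken", "Pacific", "Quincy"):
    print(name, ceph_major_version(name))
```

Rejecting letters before I is a deliberate choice here: older releases used a different numbering scheme, so guessing a number for them would be misleading.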
And once I got into trying to define file, block, and object storage, some of the definitions were a little less than useful. For example: a method of organizing and retrieving files from a storage medium. Okay, that's not particularly helpful. But file storage should be one of the ways of storing data that users are most familiar with. I couldn't find agreement on what the first file system might have been, but file systems have been around at least since the 60s. Every computer in here is using one. My presentation is a file. File storage is essentially just hierarchical. You're making directories and putting files in directories, and the user gets to set those up. None of those files are stored contiguously. They're all chopped up, much like in our next definition, block storage. Popular file systems are FAT32 and NTFS. In Ceph, the file system service is called CephFS, or the Ceph File System.

Block storage is when the data is put into fixed-size blocks and then stored separately with unique identifiers. Lovely bland definitions that I'm sure you guys can read for yourselves. I won't bore you with them. Block storage has a pretty apt, I don't know if apt's the right word, a pretty descriptive name, and that's kind of what you get with block storage: a big chunk of data that once again isn't contiguous. Examples are SANs, iSCSI disks, even a local disk. It's a thing that stores data, but it's pretty useless on its own. It's got to be attached to an OS or a VM or something somewhere that describes where all the data is stored on that bad boy.

Object storage, on the other hand. Here's another one of these terrible definitions: a system that divides data into separate, self-contained units that are stored in a flat environment, with all objects on the same level.
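To make the block and object definitions above a little more concrete, here is a toy sketch. It is not how Ceph stores anything; `split_into_blocks` and `FlatObjectStore` are hypothetical names, and the 4-byte block size is chosen only so the result is easy to read.

```python
import uuid

BLOCK_SIZE = 4  # bytes; deliberately tiny, purely for illustration


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Chop data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class FlatObjectStore:
    """Toy object store: one flat namespace, each object reached by its ID."""

    def __init__(self):
        self._objects = {}

    def put(self, data):
        object_id = uuid.uuid4().hex  # unique identifier handed back to the caller
        self._objects[object_id] = data
        return object_id

    def get(self, object_id):
        return self._objects[object_id]


blocks = split_into_blocks(b"hello ceph")
store = FlatObjectStore()
ids = [store.put(block) for block in blocks]

# The store has no directories and no ordering of its own: something outside
# it (here, the `ids` list) has to remember which IDs make up the data.
restored = b"".join(store.get(object_id) for object_id in ids)
print(blocks, restored)
```

The point of the sketch is the last step: the pieces are only useful because something else keeps track of which identifiers belong together, which is the role the OS or VM plays for a block device.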
That's not too terribly unhelpful, but it's the opposite of a file system: everything's just stored flat. It's all in one big place. Each object is given an identifier so that it can be retrieved later. Good uses of object storage are things like collections of music or unorganized data: music files, video files. They're available over RESTful services, over HTTP. While not as old as file systems, object storage has certainly been around for a while, thanks to S3 and of course Swift. In Ceph, that service is provided by the RADOS Gateway, and I think I skipped what the block storage service is called. We'll get to that in a second.

Here's our architecture diagram. The RADOS Gateway is our object storage service. RBD, the RADOS Block Device, is our block storage, and CephFS is our file system. These are all sitting on top of RADOS. So, to get that out of the way: those are the three services exposed to the user. That's what Ceph does.

Now we want to talk about the back end of Ceph for a second, starting with RADOS. RADOS stands for Reliable Autonomic Distributed Object Store, which is quite a mouthful. I'm glad it got abbreviated to RADOS. It's the underlying, or core, storage layer for Ceph, and it is itself object storage, which can be very useful. Very, very useful. RADOS is made up of one, two, or as many OSDs as are needed. We'll get into that definition shortly. But RADOS works together with CRUSH. CRUSH stands for Controlled Replication Under Scalable Hashing. While RADOS is essentially all the storage that Ceph relies on, CRUSH is an algorithm. It's a pseudo-random placement algorithm, and the quote I have here is from, again, our Mr. Weil: CRUSH is the magic that figures out where all the data in the system should go, and everyone can repeat this calculation and know where to read or write data.
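The idea that everyone can repeat the calculation can be illustrated with a small sketch. To be clear, this is not the CRUSH algorithm: it swaps in rendezvous (highest-random-weight) hashing as a simple stand-in, and it ignores things real CRUSH accounts for, such as the cluster hierarchy and device weights. The names `place` and `_score` and the `osd.N` list are assumptions made for the example.

```python
import hashlib


def _score(object_name, osd):
    """Deterministic pseudo-random score for an (object, OSD) pair."""
    digest = hashlib.sha256(f"{object_name}:{osd}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name, osds, replicas=3):
    """Pick `replicas` distinct OSDs using only the object name and OSD list."""
    if replicas > len(osds):
        raise ValueError("not enough OSDs for the requested replica count")
    ranked = sorted(osds, key=lambda osd: _score(object_name, osd), reverse=True)
    return ranked[:replicas]


cluster = [f"osd.{i}" for i in range(8)]

# Two "clients" holding the same cluster description, in a different order,
# compute the placement independently; no lookup table is consulted.
first = place("my-object", cluster)
second = place("my-object", list(reversed(cluster)))
print(first, second)
```

Because the result depends only on the inputs, any node with the same description of the cluster arrives at the same answer. With this particular stand-in, removing an OSD that was not chosen leaves an object's placement unchanged, which hints at why a computed placement helps when hardware comes and goes.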
What makes RADOS and CRUSH a bit special is that usually, when data gets chopped up and placed all over disk drives or servers or whatever, there's a lookup table so that the data can be reconstituted. The difference with CRUSH and RADOS is that there is no lookup table. There's nobody in charge of keeping track of where everything is. There's no master node like in some other file systems that we'll see shortly. There is CRUSH. CRUSH is an algorithm. It takes the name of the object, along with a description of all the disks that are in the system, and it calculates where that data should go. So none of the nodes in a RADOS system need to talk to a service or a master list of where all the data is. They can calculate where that data is, which is extremely helpful when replication is needed, when things go down, when you want to add servers, and that kind of thing. I believe that's a topic later on today.

I promised the definition of OSDs. These are the object storage daemons. These are the workhorses of Ceph: the things running on all the nodes that actually do the storage. That's a gross oversimplification, but I did promise you a bird's-eye view. I'm skipping over other back-end parts of Ceph, namely the monitors and librados. I don't know if you remember the architecture diagram; that was the skinny little blue part. That's the library that lets applications talk to RADOS.

I'm skipping those in favor of moving on to our next topic: whether your mom should download Ceph or not. Several years ago, a good friend of mine, who was, I don't know if prolific is the right word, but he was a regular contributor to more than one OpenStack project, was relating the story of trying to explain to his parents what he did for a living. To which his mom responded: What is this OpenStack stuff? Should I download it?
Which I thought was hilarious, because one, I know his mom, and two, I could just see her on her phone trying to download OpenStack, which would be Greek to her. So it's probably a safe bet that none of our moms need to download Ceph. But who should? Pretty much anybody that's looking for storage. The most obvious use case for Ceph is OpenStack. OpenStack's Cinder block storage service can use Ceph's RADOS Block Device, and Manila can use the Ceph File System. I think the last statistic I read was that over half of all OpenStack deployments use Ceph, so that's a great partnership. But any of those three services that we talked about would make a great use case for Ceph standing on its own. It certainly excels at object storage, because the underlying data storage layer, RADOS, is object storage. But if you don't want to set up an NFS file share for all your web servers to use, you could certainly use the Ceph File System as a distributed file system. And yes, block is certainly available.

Clearly there are plenty of people needing storage, because there are loads of other storage solutions out there. I wanted to compare and contrast just a few real quick. The first one I've got listed is Storage Scale, which used to be known as Spectrum Scale, which used to be known as GPFS, which stands for General Parallel File System. This is a high-performance clustered file system. It has been around since the late 1990s, released by IBM. I think most people's dislike for it would be that it is not open source. The rest of these are open source. And for all of these, I think anybody who runs Ceph would say that the lack of all three types of storage is to their detriment, or what makes them less appealing. Gluster is another network-attached file system. Oh, back to Storage Scale.
I have heard rumors that you can make Storage Scale perform better than Ceph, but I haven't run those tests myself, and I hope to report on that at a later meetup if that's actually the case. Gluster was bought by Red Hat in 2011. Out of the box it's just a file system, but there are supposedly plugins that will let you get some kind of block access to Gluster. Gluster and HDFS both have bottleneck problems. Remember how I was telling you that RADOS and CRUSH get to calculate where data should be? These guys either try to keep that master list of where all the data is on a single server, which is a single point of failure and never a great idea, or on more than one name node, when it's obviously a lot more robust for every single node to know where the data goes in a system.

Our last question for our bird's-eye view is: how do you install Ceph? There is an official deployment tool called ceph-deploy, which the Ceph project maintains. Orchestration tools are very popular, and I wouldn't purport to try to define them here, but Ansible of course is also owned by Red Hat, so it's the one they like to use, their favorite child, I guess. ceph-ansible is pretty popular. But any of the other orchestration engines, Puppet, Chef, Salt, all have Ceph modules, and you can install with those. And of course there's Rook. Rook uses Kubernetes, and this is actually a pretty good place for me to stop, because I believe the next session is on Rook. So yeah, thanks, guys. You've been a wonderful audience. Tip your waitresses.