Welcome back to the SDS Dev Room. We are approaching the end here, but in the meantime our friend here will tell us a little bit about what we can do with Ceph and Salt. Okay, yes, I just unmuted it; it's just a video. Yeah, welcome, everybody. My talk, as Patrick said already, is about deploying Ceph clusters with Salt. I will start off by covering a few basics; we'll see if we actually need that. Then I will try to run you through a deployment process, which I've done on a virtual machine cluster. And in the end, I'll go into some more detail about some features of the project, how you can customize it and so on. But yeah, we'll see how much time we have.

All right, so to start us off: who doesn't know Salt here? Okay, a few people. And who doesn't know Ceph yet? Just one, two, okay. We'll go through both then. Salt is a configuration management software and remote execution engine. It's similar to other projects like Ansible, Puppet, and Chef, though it does a few things differently. It's based on Python, the Jinja templating engine, and ZeroMQ. The underlying principle is a master applying state to minions, so it's push-based. These states you define in a bunch of files in a directory hierarchy, and in these files you can define a directed acyclic graph across several minions. That's fairly unique; I think Puppet does it too, but it's one of the distinguishing features of Salt. The mission statement of Salt you can read for yourself, or just look it up on the GitHub page.

Ceph, we heard quite a bit about Ceph today already. It's a scalable, fault-tolerant and self-healing storage system that provides you with block storage, object storage and file storage, and it's mainly designed to run on commodity hardware. That should suffice for now.

Okay, so we want to deploy Ceph using Salt, and the project for that is called DeepSea. It's basically a collection of Salt files, mainly Salt state files, that should aid in the creation of Ceph clusters and in their management too. From its inception it had several goals in mind. It won't deal with any bare-metal deployment, so you should do that beforehand: it won't install an OS or anything, and it won't bootstrap itself, so you need a running Salt cluster. That's basically installing Salt, starting the processes and accepting a few keys; you can look that up on the Salt website, it's fairly easy to do. DeepSea starts after you have a running Salt cluster. Another aim was to automate hardware detection: when you want to deploy a sizable Ceph cluster, you have to deal with quite a lot of hardware on different machines, and we try to take that off your hands by automating a lot of it. DeepSea also tries to spot problems before they get deployed, so we have a bunch of validation steps that warn you when you shouldn't do something. And it's not just deployment; ideally it manages the whole lifecycle. There is some work to do still, but we'll get there. It's obviously open source, licensed under the GPL. The current status: we are at version 0.7.1 or 0.7.2, I'm not quite sure at the moment. It's usable, so the whole deployment workflow works, and basic management capabilities are there too, so you can decommission nodes, add new nodes, that kind of thing. And yeah, you can read more about it, and you can report bugs.
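[For readers unfamiliar with Salt: as a minimal sketch (an editor's illustration, not something DeepSea ships), a Salt state is just a YAML file in the state tree; the master pushes it to minions. Here it installs a package and keeps its service running:

    # /srv/salt/ntp/init.sls -- hypothetical minimal state
    ntp:
      pkg.installed: []       # make sure the ntp package is present
      service.running:
        - name: ntpd          # keep the daemon running
        - require:
          - pkg: ntp          # only after the package is installed

You would apply it from the master with something like: salt '*' state.apply ntp.]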
You can contribute on GitHub if you have some free time. There is a wiki too, and obviously the bug tracker and all that, so just go there for now. As I just discussed with Patrick, it might at some point migrate to the Ceph project itself, but we'll obviously inform you of that.

Okay, so the basic workflow for DeepSea. As I said, you install your OS on all the nodes you want to use, you install Salt, you get your Salt cluster up, and you install the DeepSea package on your master. And then you start using DeepSea. It's organized in a bunch of stages. To get a running Ceph cluster, you need to run at least three stages: one, two, and three. Stage 0 is kind of optional; you don't really have to do it. It would take care of getting all your minions into the same state, so run some updates, install a certain kernel version or whatever. We do have some files for that, but they probably won't work for everyone, which is why it's optional. Discovery, stage 1, will check out all your hardware. Then there's a manual step: you have to create one configuration file that pulls in all those proposed configuration fragments; what that means, we'll get to in a second. This is the one manual step you have to do. After that, DeepSea will create your configuration out of it and push it out to all the minions, and then stage 3 will actually deploy Ceph. In stage 3, you deploy monitors and OSDs, so you get a fully functional Ceph cluster. Stage 4 will then deploy extra services: CephFS, RGW, iSCSI, and that sort of thing. There's also stage 5, which deals with removal, but that's more related to lifecycle management of a Ceph cluster. In the next part, where we go through a deployment, we'll only look at stages 1, 2 and 3, mostly because of time constraints. Stage 4 especially can get quite complex: if you look at iSCSI deployment or RGW, you can have multiple gateways that interact with each other, and the whole configuration can become quite involved, so there's not enough time here to cover all that.

Some more notes about DeepSea itself. The stages are orchestration files, for those of you who know Salt. That means they're executed with the orchestrate runner, and these orchestration files take care of all the minion targeting that needs to be done. It's based on roles: we put roles on nodes, and Salt will act on those nodes depending on their roles. You can also execute these states manually, but then you have to do the targeting yourself. A common pattern that you will see within DeepSea is this redirection pattern. When you want to apply state to a minion, you can point Salt at a directory, and by default it will look into that directory and execute the init.sls; this is just a convention of Salt. All of our init.sls files have an include in them: they include a file from the local directory according to some configuration data that is stored in the pillar. For those of you who don't know what a pillar is, that's just a place where you can store static configuration data that is made available to minions; think of it as a key-value store. So this particular include will look in the pillar to see if there's anything defined for mon init. If there is, it will use that value; if not, it will just use "default", which in this case opens the default.sls in the same directory and executes what's in there.
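[A sketch of what such a redirecting init.sls can look like; the exact pillar key name (mon_init here) is illustrative and may differ in DeepSea:

    # ceph/mon/init.sls -- redirection pattern (sketch)
    # include the sub-state named in the pillar, falling back to 'default'
    include:
      - .{{ salt['pillar.get']('mon_init', 'default') }}

With nothing set in the pillar, this pulls in ceph/mon/default.sls; setting mon_init: mycustom in the pillar makes the same entry point run ceph/mon/mycustom.sls instead.]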
And why we do this, I will get back to at a later point. Also, DeepSea requires a minion on your master node, because we do want to apply some state to the master. We heard some people saying that this is kind of a deal-breaker; we haven't really understood why yet, but I just wanted to mention it.

All right, so let's try it. I'm not going to do a live demo; I'm handling it like Lenz did earlier, I don't trust the demo gods. I have a bunch of screenshots of the deployment process. I hope it's not too confusing with all the different files that are involved; that part is not too important. What I mostly want you to take away is that you can customize certain things, and certain things are done for you, not taken out of your hands. So don't get disheartened if you get lost in which file is open at any point.

Okay, so I have this demo cluster: 10 virtual machines, fairly small machines. They have two network interfaces each, because that's what Ceph likes: a cluster network and a public network. There are six OSD nodes all in all. All of them have five 5-gigabyte drives, if such a thing even existed, and two of the OSD nodes have an extra 1-gigabyte drive. So we have 32 drives overall, which means we can deploy at most 32 OSDs. They're fairly conveniently named, as you can see: mon1 through mon3, and data1 through data6. I chose those names because it makes the presentation easier. The names will be used in the policy.cfg, but I'll explain a few things in case your naming scheme isn't quite that convenient. As you can see, there are nine nodes that I'm going to use for this cluster; the admin node is basically just my Salt master.

All right, stage 0. I've talked about this before: it's optional, and at the moment very SUSE-specific. We're working on getting rid of that, but there are some issues in Salt that need solving first. It doesn't do any black magic: it syncs your Salt states, installs a package or two, runs some updates. You can do the equivalent with Salt yourself, fitting for your distribution, and then just skip stage 0. One warning: at the moment, stage 0 of DeepSea might reboot your minions, and that includes the minion your master runs on. So if you do use stage 0 and you manage other nodes, maybe with reactors, maybe step away from running stage 0; it might surprise you.

Okay, so stage 1 is the first interesting one for us. It does discovery: it goes out to all your minions and queries them for their hardware, which for Ceph, of course, means network hardware and storage hardware. It will then write a whole bunch of configuration fragments into the pillar subdirectory; there's a typo on the slide, it should be /srv/pillar/ceph/proposals. Those configuration fragments, which we'll look at in a second, are really just tiny YAML files that will then make up your configuration. What's important to note here is that it produces roughly one file per fragment and minion. So it will produce all the necessary fragments for every one of your minions to become a monitor. You probably don't want that; I don't want nine monitor nodes here, obviously, but it gives you the option. It will also produce these configuration fragments for Salt minions that might not become Ceph nodes in the end.
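[For reference, the stages are kicked off from the master with Salt's orchestrate runner; in DeepSea the orchestrations live under the ceph.stage namespace:

    # run on the Salt master; stage 0 (prep) is optional
    salt-run state.orch ceph.stage.0
    salt-run state.orch ceph.stage.1   # discovery: writes proposal fragments
    # ... the manual step: create policy.cfg under /srv/pillar/ceph/proposals ...
    salt-run state.orch ceph.stage.2   # configure: compile the pillar data
    salt-run state.orch ceph.stage.3   # deploy: monitors and OSDs
]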
You can see that a lot of files get created, but luckily you never really have to look at them; I just wanted to mention it. The proposals directory will look like this after you run stage 1. It's a bunch of subdirectories: a whole set of role definitions, and as you can see, role-mon will be an important one that we're going to use. Next to the roles there are the profiles; those will be the storage nodes. For every hard drive DeepSea encounters, it will try to propose a storage layout for that particular node and create a directory that's roughly named after the hardware it found. As you can see here, this resembles my OSD nodes. And besides that, there's just some general configuration data.

So what's actually in those files? By the way, I hope everybody can see this all right. The top one is the configuration fragment for one node that is going to become a monitor, and as you can see, these really are tiny fragments: there's a roles key, and this particular fragment adds mon to that list. Since Ceph monitors have to be on the public network, they get an IP address too, and that's it already. The storage profiles are a bit bigger: we again have this roles array, to which you add storage in this case, but you also now have the actual storage profile. Here, this is one of the minions that had the five drives in it, and this particular proposal that DeepSea came up with will deploy five standalone OSDs, each on its own disk. There are generally two ways you can deploy an OSD in Ceph: standalone on its own disk, or with the journal on an external disk, so the OSD uses two separate disks. You usually want the latter when you have an SSD and spinners, for example; you put the journal on the SSD and the actual data on the spinners to speed up the write path. That's the other line up there, data+journals; that would be the other way of deploying an OSD, and we'll get back to that later. A sketch of both kinds of fragment follows below.

Okay, so you have all your proposals now; now it's time to come up with the policy.cfg. As I said, this is the central configuration file. Basically, you include a whole bunch of these configuration fragments, and DeepSea will then, in a later stage, collect and compile all of them and create a config out of them. We only have nine nodes here, so it's not too unwieldy, but imagine you have 100 nodes and your storage nodes have 24 drives each: you get a lot of fragments, and you obviously don't want to list every single file. So in the policy.cfg you can use the globs that Salt uses too (I'll show you in a second what I mean), and you can also do more complex things with list slicing and regexes. The order in which you list things is important, because later options overwrite earlier incarnations of themselves, but that's not too important for us now either.

So the policy.cfg that I used for this cluster looks like this. It's maybe not too obvious what happens here, so let's step through it. The first line includes a configuration fragment that just assigns minions to a cluster. Eventually DeepSea should be able to manage multiple Ceph clusters; for now, this is only used if you want to deploy a Ceph cluster but also have other nodes that you manage with Salt.
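[As referenced above, roughly what the two kinds of fragments described here look like; the key names, IP and device paths are illustrative reconstructions, not copied from the slides:

    # role fragment for one future monitor (tiny, as described)
    roles:
      - mon
    public_address: 192.168.100.11

    # storage profile fragment for one OSD node: five standalone OSDs
    roles:
      - storage
    storage:
      data+journals: []      # the external-journal variant would go here
      osds:
        - /dev/vdb
        - /dev/vdc
        - /dev/vdd
        - /dev/vde
        - /dev/vdf
]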
For nodes like that, you would include a cluster-unassigned file instead, and then they wouldn't be managed by DeepSea anymore; but since we don't have any of those, we just include the cluster assignments for everything. The next section is the hardware profiles. Here I simply include all suggested profiles, because I only have a limited number of disks: there's nothing I don't want to use, and DeepSea only came up with one suggestion. You can see profile-<some disk>-<some other disk>, and then there's a -1 at the end of every directory. Usually DeepSea would come up with at least one more proposal, which would then be named -2, but that only happens if it finds an SSD, that is, if it is able to create this external-journal OSD, which doesn't happen on virtual disks, obviously. Then you include some common configuration; this mostly deals with Ceph configuration values that you want to deploy. And then you assign some more roles: mainly the master here, which is my Salt master. All the nodes that are called mon-something become admin nodes, so they get an admin keyring, and all the nodes that are called mon-something also become monitors, obviously. You can see how convenient my naming scheme is here. As I said, your existing nodes might not be named as conveniently. There are some other ways to use these globs: you can use the other Salt globs, so you can list certain allowed values. But much more interesting is that you can pass another argument. This admin* glob will create a list of all the nodes that are called admin-something, and then you can pass along either a slice, which takes a chunk out of this list, or a regex, which matches every item of the list against that regex, or you can even use both if you want to. So this should take care of even the most unlucky host naming schemes for a Ceph deployment and avoid you having to list every file. There's a sketch of the whole file below.

Okay, so we have our policy.cfg, and we run stage 2 now. This takes all those fragments and produces a configuration for DeepSea to use. It's based on stack.py; DeepSea ships stack.py, but if you use a fairly new Salt version, that includes stack.py too, so there's nothing to do really. It writes this configuration out to a directory tree under /srv/pillar/ceph/stack/default. This is the configuration you basically came up with in your policy.cfg. It also creates the same subdirectory tree in the parent directory, mirroring all the files in the default tree. These files in the parent directory are for you to use, so you can override certain options that DeepSea came up with and that you don't like. That's just a customization mechanism; you just have to look at it, it's fairly obvious. You can check your config by looking into the pillar: target a minion, run pillar.items, and it lists all the configuration key-value pairs you have, so you can verify that everything is as you wanted.

And then you're mostly done. You run stage 3, which takes care of the whole deployment, and you have a running Ceph cluster. One thing to mention about stage 3 is that it validates the configuration you want to use. It runs quite a few checks: that you have three mons, for example, that you have enough storage nodes, or whether you have a firewall running on some of the nodes; it will at least notify you of that. So if stage 3 is through and some things don't work, it might be a firewall issue or the like.
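[Pulling the pieces together, a policy.cfg along the lines described might look like this; the paths mirror the proposals tree and the slice/regex syntax is hedged, from memory:

    # cluster assignment for all minions
    cluster-ceph/cluster/*.sls
    # storage profiles: the one proposal DeepSea generated
    profile-*-1/cluster/*.sls
    profile-*-1/stack/default/ceph/minions/*.yml
    # common configuration
    config/stack/default/global.yml
    config/stack/default/ceph/cluster.yml
    # role assignments
    role-master/cluster/admin*.sls
    role-admin/cluster/mon*.sls
    role-mon/cluster/mon*.sls
    # globs can be narrowed further, e.g.:
    #   role-mon/cluster/mon*.sls slice=[:3]
    #   role-mon/cluster/mon*.sls re=mon[1-3]

After stage 2, the compiled result can be inspected with something like: salt 'data1*' pillar.items.]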
Then it goes out, installs Ceph, creates the cluster, and creates a pool. This is what it looks like; you don't have to try to decipher it. What you should take away is that it's nice and green, so we're all good, and then we have a running Ceph cluster. The HEALTH_WARN here, don't worry about that; it's just that the pool that gets created automatically is too small. What I want you to pay attention to is that we have 32 OSDs, so there is one OSD on each virtual drive the machines have. And that's basically it. So, can I finish now? No. Let's look at how you can customize this whole process. Are there any questions so far, maybe? Is anybody very disheartened, or very confused? Probably too confused to have questions. Okay, let's confuse you a little more.

I've spoken about the hardware profiles. As I said, DeepSea will usually come up with another profile for external journals, but only if you have an SSD, which is obviously not the case in my virtual machine cluster. So I just wrote one myself. What DeepSea would propose looks very similar to this: it's just a key-value mapping of a data drive pointing to the journal drive you want it to use. On a real cluster, this would look more like the second example: there you want to use the actual ID of the drive, because /dev/sd-something might change. You get an idea of why it's good that DeepSea creates all these files automatically, because, I mean, this is even a small one, right? This has like seven or eight drives. Imagine that with 24 drives; it gets a little error-prone, let's say. So, I want to use this profile now: I want to create OSDs that have their journal partition on a different drive than their data partition. Imagine we're back right after stage 1, so we haven't created our policy.cfg yet.

The other thing we have to look at now is how to customize the behavior of DeepSea, because in this particular case ceph-disk, which we'll be using to deploy the OSDs, will simply refuse to deploy an OSD on a one-gigabyte drive. Reasonably so, but we want to force it to do it anyway. So I wrote a custom SLS file, which I called force.sls, and put it in the osd subdirectory of the Salt tree. I won't bore you with what's actually in it; it just partitions the drives to the sizes you want and then forces ceph-disk to actually use those partitions. Okay, so how do we get DeepSea to use it? I spoke before about this redirection pattern, so here it is again. You might notice that on this slide there is an init.sls and a default.sls; those are the files that DeepSea comes with, and I just added my file next to them. So now we have to put a key-value pair in the pillar which sets the osd init key to force, so that DeepSea will use my force.sls. And this you can do in the stack directory where the configuration lives after stage 2; I've spoken before about those two mirroring subdirectory trees. If I look into the global.yml here, you'll see that this file was initially empty, apart from a little comment notifying you that it overrides values in default/global.yml. I basically just put those two key-value pairs in; the osd init one is the interesting one here. For the partitioning I did the same thing, because I also created a custom partitioning scheme to replace the default. And then all of this is in the pillar.
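[A sketch of that override file; the pillar key names here are a guess at the talk's setup, loudly hedged, only the mechanism (overriding default/global.yml via the mirror tree) is as described:

    # /srv/pillar/ceph/stack/global.yml -- overrides stack/default/global.yml
    osd_init: force        # guessed key: makes osd/init.sls include osd/force.sls
    partition_init: custom # guessed key: likewise for the custom partitioning
]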
So when you then run stage 3, it uses your customizations, goes out, and deploys a Ceph cluster, and we now have one with 30 OSDs, because the two one-gigabyte disks went into the external journals of other OSDs. That's the general workflow.

Some more things we can look at, beyond deployment. I talked about stage 4 already; it deals with all the additional services that Ceph offers: MDSs and CephFS deployment obviously, iSCSI, RADOS Gateway. I think we already merged NFS Ganesha, or maybe not, but we're working on it, so it's going to be there too. And there are some facilities for managing certain clients as well. This is a whole other presentation in itself, but it works the same way: it's governed by the policy.cfg and by an orchestration file, which you can customize if you want to. Stage 5 deals with removal of components. Say you want to decommission one of your OSD nodes: the workflow is to change your policy.cfg so it no longer includes this particular minion, run stage 2 to push out the configuration, and then run stage 5, which removes the OSDs. You can always run stages 3 and 4 too, because everything you do in Salt is idempotent, so it won't change anything if there's nothing to change. The idea is that when you decommission an OSD node you might want to deploy a new one with new hardware, and if you do that in one go, you want to run stages 3 and 4 first, so you deploy the new node before you take out the old one. And yeah, that's it; we're working on some other things. Right, okay.

So the question was: what about removing a single OSD on an OSD node? That is not possible right now; we are working on it, though. I mean, if the hard drive is broken, the OSD is effectively out already; you can just pull it, put in a new one and restart the service. But yes, we're working on decommissioning single OSDs too.

And that concludes my talk. So, any more questions? Salt SSH? I haven't tried that, but I don't see why it wouldn't work. So the question was whether this can be used through salt-ssh, and I don't think there's anything standing in the way. I mean, it's a Salt configuration method, so I would cautiously say it should work. Another question? No, so the question was whether the OSD proposal process will include the OS drive, and no, of course it won't; it will in fact ignore any formatted drive in your OSD nodes. Any more questions? Right. Well, no, for two reasons: when changing a RADOS pool you can only increase the PG count, and your Salt master also has the admin keyring, so the rados command is already kind of a cluster management tool in itself; so we forwent that. Okay, looks like there are no more questions. Then thank you, and have a good day.