Okay, the next talk is up. Please give me a hand in welcoming Joshua and his talk. Thank you, and thanks for staying late. It's getting pretty late and I think mine is... is it the last talk in this room? No, there will be one after it, it seems. Okay, so thanks again. My name is Joshua and I will tell you something about a project called DeepSea, and especially which new features we have. I will start by explaining what DeepSea actually is and how it works, and then I will come to the features.

DeepSea is a project developed at SUSE, where I work. I'm one of the main developers on this project. It was created to deploy and manage Ceph, and it's based on Salt, so we use Salt to actually make it work. I will explain in a second what Salt actually is. This is a follow-up to a talk given by Jan, who is also the organizer of the SDS room, so thanks Jan. And yes, as promised: Salt is a configuration management and remote execution engine which is highly extensible with Python, so you can write your own modules, and it uses Jinja as a templating system.

As I said, DeepSea is here to deploy and manage Ceph. Since you're in the software-defined storage room you should basically know what Ceph is, but I will give you a very brief introduction. It's basically copy-pasted from Wikipedia, so you can read that up as well: Ceph is a free-software storage platform that implements object storage on a single distributed cluster and provides interfaces for all the major things: object, block, and file.

So how does DeepSea work? To understand that, we just need a few Salt concepts, so, just to be on the same page, I will make it very quick.
Basically, we have to know about states. That's what Salt uses to do certain things on a machine, for example installing a package or configuring some service. And there are orchestrations, which allow you to group these states. So let's say you want to install different sets of services and apply them together: you can just group them. It's a very abstracted way to look at things, and that comes in pretty handy for creating such a system.

We use that level of orchestration to build the concept of stages. This concept of stages is something that DeepSea invents; it's just a way to look at certain operations and make them more approachable. So we use this concept of stages to group these states, or rather to group operations, by topic. We have five stages, or six to be fair; the last one is a bit optional because it actually moves data around. These stages do certain things: stage 0 does the setup of a cluster, it installs all the necessary packages and all that stuff; stage 1 does discovery and configuration. But that's not really important here, it's just for the record.

We should jump right into a demo, just for you to see how things work. This is obviously prerecorded because I don't want to face the demo gods; that always goes bad. It is sped up three times. In this demo I will run just stage 0, for demoing purposes; the stages kind of look the same, they just do different things. Stage 0, as I said, is the preparation phase: it takes care of installing everything. You can see we are parsing, and I will go into what's actually happening here in a second. Here you can see that we also do validations and such, so this is how it actually looks when you deploy a Ceph cluster with DeepSea: we have warnings, or something is green, and then it just goes over and does its stuff.
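The stage idea, grouping individual Salt states into ordered orchestrations, can be sketched in a few lines. This is purely illustrative: real DeepSea ships these groupings as SLS files rendered through Jinja, not Python dicts, and the state names below are made up.

```python
# Illustrative model of DeepSea-style stages: each stage groups a set of
# Salt states (install a package, write a config, start a service, ...)
# by topic, and deployment means running the stages in order.
STAGES = {
    0: ["packages.update", "common.install", "sync.modules"],  # preparation
    1: ["discovery.hardware", "discovery.network"],            # discovery
    2: ["configure.cluster"],                                  # configuration
    3: ["mon.deploy", "mgr.deploy", "osd.deploy"],             # core services
    4: ["rgw.deploy", "mds.deploy", "ganesha.deploy"],         # gateways
}

def run_stage(number, apply_state=print):
    """Apply every state grouped under one stage, in order."""
    for state in STAGES[number]:
        apply_state(state)

# Spinning up a cluster is then just running the stages in sequence:
for n in sorted(STAGES):
    run_stage(n, apply_state=lambda s: None)
```

The point of the abstraction is that an operator only thinks in stages, while each stage can fan out to many states on many machines underneath.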
As I said, it's sped up three times, so I think you get the idea, right? Here it's installing packages, it's syncing, it's pushing files back to the master, it's monitoring processes and things like that.

And that brings me already to the features: in this demo you just saw the first one. Usually Salt isn't very verbose; it's actually very terse in what it shows you. Who of you knows Salt? Who of you has worked with Salt? Not that many people. So when you trigger a command with Salt, it doesn't return anything until it's done. When it's done, it gives you a nice output, with colors in it and all, and it gives you a pretty good report. But until that time it shows you nothing, and if the operation is bigger, which it is in a big cluster, you might wait for ten minutes and think it's stuck, and then you abort it and stuff goes wrong.

Now, there is the Salt event bus, which you can attach to and watch which events get emitted: which states have finished, which modules have been executed, which have been successful, and whatnot. The event bus is nice to look at when you know what's going on, but when you're not really sure, it's super cluttered and you just can't really read it. So what we did, we wrote a wrapper; actually Ricardo wrote the wrapper, he's sitting here. We evaluate the files: we pre-render all the states, and we need to pre-render because they are YAML files templated with Jinja. That way we get an idea of what the states will execute when we actually run them. Then we attach to the event bus and match the states we expect to execute against the states being sent to the event bus, and that's how we get this live representation.
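The matching idea, pre-compute the expected states, then tick them off as events arrive, can be sketched as follows. The event shape here is heavily simplified; real Salt events carry tags, job IDs, and much more data, and the state names are invented for the example.

```python
# Sketch of the progress-reporting wrapper: compare a pre-rendered list
# of expected states against incoming event-bus messages, reporting each
# state's result the moment its event arrives instead of at the very end.

EXPECTED = [
    "pkg.installed:ceph",
    "service.running:ceph-mon",
    "file.managed:/etc/ceph/ceph.conf",
]

def track(expected, events):
    """Yield (state, ok) as soon as a matching event arrives."""
    pending = set(expected)
    for event in events:
        state = event["state"]
        if state in pending:
            pending.discard(state)
            yield state, event["result"]

# Simulated event stream, in the order the minions report back:
stream = [
    {"state": "pkg.installed:ceph", "result": True},
    {"state": "file.managed:/etc/ceph/ceph.conf", "result": True},
    {"state": "service.running:ceph-mon", "result": False},
]
progress = list(track(EXPECTED, stream))
```

With this, a failed state surfaces immediately rather than after the whole run completes.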
So with this, we can actually give you better feedback: you see when things completed, or when they failed, right at the moment they were executed and not ten minutes after.

Ceph has different services, and those services are usually installed on different nodes, so it's kind of natural to think in a role-based approach. Those roles are Ceph-specific: there are monitors, there are OSDs, since two releases there are also managers, and there's the RADOS Gateway, an MDS for CephFS exposure, iSCSI, NFS Ganesha, and there was openATTIC, the management framework which uses Grafana and Prometheus, but I will get to that in a second. That's a concept that DeepSea implements: it allows you to specify these roles, which then later get installed. As I just said, there is this openATTIC role, which also covers the monitoring side of things, but we actually want to split that out, because there is the Ceph Manager dashboard, which is now built into Ceph itself; that's where the data is represented, and we just scrape it with Prometheus.

We also do operations, obviously, because DeepSea is not only there to deploy, it's also there to manage. So, that's not a new feature: we have always been able to install various packages, do various basic configurations, add new OSDs, which are the object storage daemons where the data is being stored, and decommission them just the same, on an OSD level and on a node level. This is basic stuff that you just need when you want to manage a cluster. And we always did updates: when your distribution sends in new updates, on an iteration over these stages you get the new updates applied, and we also did automated restarts if there is a new kernel, if it's actually required. So we still do updates; well, yeah, it's obvious, but we do it a bit more sophisticated now.
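The role-to-host assignment can be pictured as a small resolution step: map each Ceph role to host-name patterns, then match them against the discovered minions. DeepSea expresses this in its own configuration file; the patterns, host names, and the `resolve` helper below are invented for this sketch.

```python
import fnmatch

# Sketch of role-based assignment: which discovered minions get which
# Ceph role, expressed as glob patterns over host names.
ASSIGNMENTS = {
    "mon": ["mon*.example.com"],
    "mgr": ["mon*.example.com"],     # co-locate managers with monitors
    "storage": ["data*.example.com"],
    "rgw": ["gw1.example.com"],
}

def resolve(assignments, minions):
    """Return {role: sorted list of hosts matching any of its patterns}."""
    return {
        role: sorted(m for pat in patterns for m in fnmatch.filter(minions, pat))
        for role, patterns in assignments.items()
    }

minions = ["mon1.example.com", "mon2.example.com",
           "data1.example.com", "gw1.example.com"]
roles = resolve(ASSIGNMENTS, minions)
```

Once roles are resolved to concrete hosts, the stages know which states to apply where.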
So when there are pending updates, we don't just apply them: we also look at them, then we apply them, and then we flag certain roles. For example, if there is an update for one specific service, we flag that service, and on the next iteration of one of the stages only this service will be restarted, if it's required. We look, for example, at the list of open files, and when one is marked deleted, we restart the daemon, to not run into very bad situations where you restart a daemon long after you applied three major software updates.

What we also do: Ceph has one big configuration, and we internally split that up, so every daemon gets its own configuration. That allows us to track configuration changes: we compute checksums, and when somebody changes something in the configuration file, we compute the checksums again, compare them, and restart only the corresponding services on the next stage invocation. Just to be a bit more fine-grained and not restart everything all the time even though it's not necessary.

What we also added are health checks. The thing is that Salt is very good at running things in parallel. That's nice when you want to spin up a cluster: when you're going from zero to a running cluster, you can be really fast, you can just execute everything at the same time. But when you're operating on a live cluster, you don't really want to execute everything at the same time, because bad, bad things can happen. When you, for example, have a bad update in your channel and then apply it on all your nodes at the same time, well, I think you know the bad stories about that one. Or you push a configuration, mess up the syntax, and then everything blows up and you have an outage, all your clients get disconnected. It's the same with kernel crashes, and there is a longer list of things that can go wrong if you do everything at the same time.
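The checksum-based restart flagging can be sketched with nothing more than the standard library. The file contents and daemon names are invented; the point is only the mechanism: a daemon is flagged for restart exactly when the checksum of its rendered config has moved.

```python
import hashlib

# Sketch of per-daemon config change tracking: keep a checksum per
# rendered configuration, and after a change only flag the daemons
# whose checksum no longer matches.

def checksum(text):
    return hashlib.sha256(text.encode()).hexdigest()

def changed_daemons(old_configs, new_configs):
    """Return the daemons whose rendered config no longer matches."""
    return sorted(
        daemon for daemon, text in new_configs.items()
        if checksum(text) != checksum(old_configs.get(daemon, ""))
    )

before = {
    "mon.a": "[mon]\nmon_allow_pool_delete = false\n",
    "osd.0": "[osd]\nosd_max_backfills = 1\n",
}
after = dict(before, **{"osd.0": "[osd]\nosd_max_backfills = 4\n"})

to_restart = changed_daemons(before, after)  # only osd.0 is flagged
```

Splitting the one big Ceph configuration into per-daemon pieces is what makes this granularity possible: an edit that only touches OSD options never triggers a monitor restart.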
That's why we actually moved over to a sequential approach with intermediate health checks. So we check for things like: is the node up and running after, for example, a reboot; when you restarted a service, did it come up again; are the mounts present; is the systemd unit in a proper state; and of course, is Ceph health OK, because when you restart multiple nodes it can complain. If some of these conditions are not met, we abort the operation, the stage, and then you as an administrator can intervene.

I just talked about monitoring; this is just for the record: we use Prometheus to scrape the data into its time-series database, and Grafana for neat little dashboards you can just look at.

There is one more feature which was tough, and it's still tough: migration. For those who know Ceph a bit better, there is, and was, Filestore, which was the de facto standard OSD format for a long, long time, but the recommendation changes over time. Right now it's BlueStore, and eventually in the future it will be something different. In order to get the newest, latest and coolest OSD format, you kind of have to redeploy your cluster, because there is no direct migration path offered by Ceph. So we also tried to find a way to make this redeployment less painful. This is why we use a controlled way of killing an OSD and bringing it back up with the new format, applying checks in between and all that, so that we try to be very user-friendly. We also have different modes, like a more aggressive one which operates per host, or a bit more careful one which goes on a per-OSD basis.

We also do upgrades, from one release of your operating system, currently only openSUSE and SUSE, sorry, and also of the Ceph version, to the next. Here too we leverage our approach of sequentially going over each node and being super, super careful not to break things, by applying basically the same health checks after each operation.
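The sequential approach, one node at a time with checks in between, is the common skeleton behind the rolling restarts, migrations, and upgrades described above. Here is a minimal sketch of that loop; the check functions are stand-ins for the real ones (mounts present, systemd unit healthy, cluster health OK, and so on).

```python
# Sketch of a sequential rollout with intermediate health checks:
# operate on one node at a time, run every check after each step,
# and abort the whole run on the first failure so an administrator
# can step in.

def roll_out(nodes, operate, checks):
    """Apply `operate` node by node; stop at the first failed check."""
    done = []
    for node in nodes:
        operate(node)
        for check in checks:
            if not check(node):
                return done, node      # abort: nodes finished + culprit
        done.append(node)
    return done, None                  # everything passed

healthy = {"node1": True, "node2": True, "node3": False}
result = roll_out(
    ["node1", "node2", "node3"],
    operate=lambda n: None,            # e.g. restart a daemon, reboot
    checks=[lambda n: healthy[n]],     # e.g. "did it come back up?"
)
# the run stops at node3: two nodes finished, node3 flagged
```

Trading parallel speed for this kind of sequencing is exactly the point: a bad update or broken config takes down one node, not the whole cluster.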
There is also a neat little feature which basically sounds horrible in the first place: a staged shutdown. But if you want, for example, to move a data center from one place to another, there is actually a recommended order in which you should shut down your services, and that's why this feature is there. We also apply AppArmor profiles by default, and tuned profiles, not enabled by default, but we at least ship them.

Oh yeah, and then there's a feature that's a bit crazy, and it was never fun: the engulf feature. We took the name "engulf" because this project is called DeepSea and it deploys Ceph, so we tried to stay with this maritime kind of theme. This feature was implemented to take a non-DeepSea cluster, one deployed by whatever, ceph-deploy, or even manually, or ceph-ansible, and make it a DeepSea-controlled cluster. The tricky part was that everybody puts their configuration files wherever they want, and everything was a total mess. It kind of works now, and it's always worth a try: if you want to switch over to DeepSea, we have a method of doing so, but take it with a grain of salt, pun intended. This feature is special, so you should handle it with care.
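The "recommended order" behind a staged shutdown can be modeled as a simple priority list. Note the caveat: the exact ordering below reflects the commonly recommended pattern for Ceph (gateways and clients first, OSDs next, monitors last so the cluster map stays available), not a quote from DeepSea's implementation, and the helper is invented for this sketch.

```python
# Sketch of a staged shutdown: stop client-facing gateways first, then
# metadata servers, then OSDs, and the monitors/managers last.
SHUTDOWN_ORDER = ["rgw", "ganesha", "iscsi", "mds", "osd", "mgr", "mon"]

def shutdown_plan(roles_in_cluster):
    """Return this cluster's roles sorted into a safe shutdown order."""
    return [r for r in SHUTDOWN_ORDER if r in roles_in_cluster]

plan = shutdown_plan({"mon", "osd", "mgr", "rgw"})
```

Bringing the data center back up then simply walks the same list in reverse.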
You can also do benchmarks. When you spin up your cluster and you just bought new hardware, you also want to benchmark it: you can do a baseline benchmark leveraging Ceph bench, you can do RBD benchmarks with fio, and CephFS also with fio.

One thing that we added too late were unit tests. This project evolved out of a prototype and we were kind of late in writing unit tests, but we caught up pretty fast now. The more important part is that we now have not only unit tests but also integration and smoke testing, and that is actually implemented in Ceph's own testing framework, called Teuthology. So we have an easy way to define certain parameters, let's say: this distribution, this number of nodes, this set of services we want to have, and we expect this and this to happen, and then it just installs. It's super cool; you should definitely have a look at Teuthology if you want to learn more about Ceph.

We can do purging. That's nice if you have a proof-of-concept cluster: you go to a user, a customer, whatever, you just want to show something and then tear it down; you mess something up, you buy new hardware, then you want to test it again; you spin it up, purge it, and start over. Why DeepSea is also great for proofs of concept is that it actually requires almost no human intervention, because we always try to ship very sane defaults. There's only one step where you have to interfere, and that's before stage 2, where you actually assign host names to certain roles. That you kind of have to do, because we can't really find out what you want to have; you always have to do that. But otherwise you can just run through the stages, and depending on the size and the speed of your nodes, like ten minutes later you have an up-and-running Ceph cluster. As I said, we rely on sane defaults, but at the same time we use Salt, which is highly customizable and highly configurable, so you can basically do everything on your own.
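The whole proof-of-concept flow, run the stages in order with one manual role-assignment step before stage 2, can be sketched as a small driver. The `salt-run state.orch ceph.stage.N` invocation follows DeepSea's documented convention; the driver itself and its `dry_run` switch are invented here so the sketch doesn't need a live Salt master.

```python
import subprocess

# Sketch of driving a DeepSea deployment end to end via Salt's
# orchestration runner. With dry_run=True nothing is executed; the
# commands are only collected for inspection.

def run_stages(stages, dry_run=True):
    executed = []
    for stage in stages:
        cmd = ["salt-run", "state.orch", f"ceph.stage.{stage}"]
        executed.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # needs a real Salt master
    return executed

# Stages 0-1 prepare and discover; then comes the one manual step,
# assigning host names to roles; stages 2-4 configure and deploy.
commands = run_stages([0, 1]) + run_stages([2, 3, 4])
```

Purging for the next test round is the same pattern with the teardown orchestration instead of the deploy stages.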
You can even swap in your own states and just mess with it. It's not really recommended, but you can do it; the framework is there for it.

So that was basically the end of the features. First of all I want to thank my employer for sending me here and also for supporting the DeepSea project, which is upstream. I also want to thank the team: Eric Jackson, who was the original creator, Jan, Tim, Ricardo, and Nathan, who is the guy that mainly helped us with Teuthology, and with Org mode. For those who don't know, Org mode is a pretty neat little way of organizing tasks, and you can write a presentation in Org mode and export it as a reveal.js presentation, which this presentation is. And you can find it here: all the links, the link to DeepSea, it's under the SUSE namespace, my slides, my GitHub and email address. And with that, thanks for your attention, I'm done.

So the question is whether we have plans to add more sophisticated features like CRUSH map manipulation. That's not in the scope of this project, I would say. There is, or will be, a CRUSH map editor in the Ceph Manager dashboard, if I'm not mistaken. Maybe eventually there will be one, but that will probably not be DeepSea's land. I think right now there's a separation: you use DeepSea to deploy the actual cluster and then use the dashboard for it. Right, good, thank you.