Amazing. Hello everyone. Today I'm happy to talk about Ceph management with openATTIC. Let's start with what I want to talk about today. First of all, the different components. So Ceph in general — I hope everyone is aware of what Ceph is. Can you maybe raise your hand if you've never heard about it and have no clue what it is? Okay, nobody. My microphone is turned on — yeah, it should be, of course. Second, Salt and DeepSea, openATTIC in general, and then Prometheus and Grafana. Yep, come in, please. Let's start with Ceph in general. What is Ceph? Just a quick introduction, maybe there are some folks who've never heard about it. Ceph is a distributed object store. The cool thing is it not only supports objects, as the name suggests; it also supports block and file storage. So you could say it's unified storage, because it combines all the different storage layers. Ceph is designed for scalability and reliability in general — and, of course, performance, which is a matter of definition. Ceph's motivating principles: why was Ceph started, and what are the ideas behind its development? Ceph is a totally scale-out system, so it scales horizontally up to thousands of storage nodes that you can combine into one big cluster. It was designed from the beginning to have no single point of failure. This is really important — it's a key differentiator from other storage solutions. It runs, of course, on commodity hardware, because you just need a standard Linux, and on top of that you just install the Ceph packages. So you can use the hardware you usually use. It self-manages and self-heals whenever possible, so there are lots of mechanisms included in Ceph which make sure the cluster is always up and running in a consistent state. And of course, it's completely open source. Those are the components of Ceph — the different ways to get access to the data. First of all, underneath, there is the RADOS layer.
This is the distributed object store underneath, and then you have four different ways to access the data. First, there's librados directly: you can use librados from your preferred programming language — just to mention a few, C, C++, Java, Python, whatever, feel free. Then there is the RADOS Gateway, a REST gateway compatible with S3 and Swift, often used with OpenStack — I guess you already heard about that in other talks yesterday and today. Then we have RBD, the RADOS Block Device; these are mainly used to attach block storage directly to virtual machines. And last but not least, CephFS, the file system from Ceph on top. Those are the different ways to get data out of your distributed Ceph storage. Salt — that's another key component. I think almost everyone knows what Salt is if you've ever heard about Puppet, Chef, Ansible, all the deployment tools. Salt is yet another deployment tool, of course. But what were the reasons we chose Salt and not one of the other existing tools? The answer is quite easy. Salt is really scalable — I see your face, Owen — up to a few thousand nodes. What's really nifty is the parallel execution included in Salt: if you run a command on your clients, the so-called minions, it is executed in parallel, which makes it really fast. And it has its own protocol, so no SSH connection is needed to get things done — that also makes it fast. As usual, it's easy to get started. I guess everyone tells you that their thing is easy to get started with, but I think Salt really is, because you just have to write some YAML files. That's doable, I guess, for almost everyone. And it has a really active community, so feel free to contribute. What we made with Salt is something called DeepSea. That might sound a little bit weird — what is DeepSea?
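To illustrate why parallel fan-out beats a serial SSH loop, here is a toy sketch in plain Python — this is just an illustration of the concept, not Salt's actual transport or API:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def run_on_minion(minion):
    """Stand-in for executing one command on one minion (here: a short sleep)."""
    time.sleep(0.1)
    return (minion, "ok")

minions = [f"node{i}" for i in range(10)]

# Serial execution: roughly 10 * 0.1 s in total.
start = time.monotonic()
serial = [run_on_minion(m) for m in minions]
serial_time = time.monotonic() - start

# Parallel fan-out: roughly one 0.1 s round trip for all minions at once.
start = time.monotonic()
with ThreadPoolExecutor(max_workers=len(minions)) as pool:
    parallel = list(pool.map(run_on_minion, minions))
parallel_time = time.monotonic() - start
```

With ten minions, the serial loop takes about a second while the fan-out finishes in roughly one round-trip time — the same effect Salt gets by pushing commands to all minions over its message bus at once.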
DeepSea is just a tool that uses Salt to manage and orchestrate almost the whole Ceph cluster — from the initial deployment to upgrades, adding nodes, deleting OSDs; all that kind of stuff is included in DeepSea, and it uses Salt underneath. Right now, DeepSea supports these features. First of all, as I already mentioned, initial deployment and configuration. Then something we call engulf: this is initial support to import existing Ceph clusters — for example, a cluster you deployed with ceph-deploy. So if you deployed a cluster with ceph-deploy in your environment in the past, you can migrate it on the fly to DeepSea. Then DeepSea is able to deploy the RADOS Gateway for single-site deployments, and it's also possible to deploy the CephFS MDS. We can also create NFS Ganesha shares backed by CephFS and by S3. This is possible with DeepSea. And there are some more features: we're able to deploy iSCSI targets as well as the iSCSI gateways — right now with LRBD, in the future with TCMU runner, which is currently under heavy development. Then we also deploy the Grafana and Prometheus nodes we're using to monitor our cluster environment, to gather the data and to visualize it at the end. And the interesting thing, and one part of the topic of this presentation today: it can also deploy and do the initial configuration of openATTIC. Let's talk about openATTIC. openATTIC is an open source management and monitoring UI, I would say, for Ceph. In the past it was a little bit more, but I will talk about that in a minute. The idea behind openATTIC when we initially started was that we wanted to build a web UI that an admin would actually like to use — because there are several UIs out there, and they're really cool, but at the end of the day you're still using the CLI, because it's more powerful, it's faster, and in general there are lots of folks who just don't like UIs.
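DeepSea drives all of this through numbered Salt orchestration stages that you invoke with `salt-run`. A small sketch of that convention — the stage summaries below are my rough paraphrase, not official documentation:

```python
# DeepSea groups deployment into numbered Salt orchestration stages.
# These one-line summaries are a rough paraphrase, not official docs.
STAGES = {
    0: "prep: update and prepare all minions",
    1: "discovery: inventory hardware and generate proposals",
    2: "configure: create the cluster configuration",
    3: "deploy: set up monitors and OSDs",
    4: "services: deploy gateways (RGW, iSCSI, NFS Ganesha, openATTIC)",
}

def stage_command(n: int) -> str:
    """Return the salt-run invocation for a given DeepSea stage."""
    if n not in STAGES:
        raise ValueError(f"unknown stage {n}")
    return f"salt-run state.orch ceph.stage.{n}"
```

So a fresh deployment is essentially running `stage_command(0)` through `stage_command(4)` in order on the Salt master.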
So we had in mind that we want to build something that is usable for administrators as well. A really important thing was to build openATTIC to be, I would say, kind of stateless, so that administrators who want to manage their cluster with the CLI, with different tools, with different REST APIs — doesn't matter — can still do that, without the need for caching and all that kind of stuff, which is sometimes really annoying. Just a look into the past, for those who saw openATTIC last year or did not follow the development this year. What we already had with 2.x were Ceph cluster and status dashboards; those were still based on Nagios — gathering the data was done with Nagios and Icinga. We had initial pool management and also pool monitoring, also based on Nagios. We were already able to create, manage, and delete RBDs, and we could monitor them. We had a view of cluster nodes and roles — there was just a list with the node names, so you knew how many nodes were part of your cluster, but there was no interaction with it; no further commands, it was view-only. And support for multiple Ceph clusters: if you had, for example, a development cluster and a production cluster, you could put them both underneath the same UI and manage them within one centralized UI. Notable changes in 3.x — that was a big step forward. We refactored the whole code base. As I mentioned before, with 2.x it was still possible to also manage local storage. Initially, openATTIC was built to be a local storage UI: to set up LVM, NFS shares on top, iSCSI, all those kinds of things. We removed all of this code with the 3.x release and refactored, so it now supports Ceph only — there's no traditional-storage support anymore. As I mentioned earlier, it's now completely stateless. You can just spin it up, make changes somewhere else — doesn't matter.
openATTIC will instantly notice, because we just gather the data live from the cluster. From a developer perspective, installation is a little bit simplified: we now have just a single package, not separate UI and backend packages anymore. We removed a lot of dependencies, because we removed the traditional storage stuff — we don't need LVM and all the other packages anymore. A really important thing, and already noticeable in the web UI, is that we've replaced the monitoring. Nagios and Icinga with PNP4Nagios felt more like, I would say, the year 2000 — and now it's 2018. So we replaced it with Prometheus and Grafana. And we also added a bunch more things like notifications and more robust error handling; I will show you later in the live demo, I guess that makes more sense. New features we added in 3.x: we not only refactored the whole code base, we added new dashboards based on the new monitoring system. We built dashboards in Grafana directly, gathering the data with Prometheus, and then we embedded those Grafana dashboards into the UI. We're now able to manage the Ceph Object Gateway within the UI. We can manage iSCSI targets in the UI via LRBD — create new iSCSI targets, add initiators, add authentication rules, whatever you want. We can create NFS shares with NFS Ganesha, also within the UI, so there's no need to go to the CLI anymore. And really important: we support the newest stable release, Luminous — including some features that were introduced with Luminous, such as pool compression; that's also possible to set within the UI. Last but not least, we improved pool management and RBD management a little bit and added more capabilities. We can now set cluster-wide OSD flags — I will show you later. And with Prometheus, we are now able to do node monitoring as well.
So we not only gather the data from our Ceph clusters and the pools and the OSDs, we also gather data from the nodes themselves — CPU, memory, network bandwidth, all that kind of stuff — and you're able to view this in the UI as well. Since I already talked about Prometheus and Grafana, let me tell you just a little bit about what they are. I guess you already heard — I know they have separate booths and separate presentations — but just to wrap up: Prometheus is a time-series database. It collects all the data via its node exporter, for example, or the Ceph exporter, and then we have it in one time-series database, which is quite nice. But the user expects something more visual, so we are using Grafana: we add Prometheus as a data source to Grafana to visualize the data in graphs — to make it, I would say, fit for human consumption, to make it pretty. Those dashboards, as I mentioned, are exposed via the openATTIC dashboard. But — and this is really important — the standalone dashboards are still accessible. So if you decide to create your own dashboards, or you want to use this Grafana instance for your whole environment and you just want one instead of several dedicated Grafana instances, you can do that. There's no encapsulation or anything like that; it's just a default Grafana instance, and we are embedding those graphs. If you want to change any of the defaults, feel free — Grafana is still reachable on port 3000, just connect to it and modify it. Same thing for Prometheus. Just a little bit about the architecture, how it looks. From a web UI perspective, when a user opens the web UI, it talks directly to the openATTIC backend, which is completely written in Python. And then we have different ways to get the data out of the Ceph cluster.
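Conceptually, what a time-series database like Prometheus stores is just (timestamp, value) samples per metric, and a graph like "network bandwidth" is a rate computed over a counter. A toy illustration of that idea — this is a simplification, not Prometheus's actual rate() implementation:

```python
def counter_rate(samples):
    """Per-second rate over (timestamp, value) samples of a
    monotonically increasing counter (toy version of a rate query)."""
    (t0, v0) = samples[0]
    (t1, v1) = samples[-1]
    return (v1 - v0) / (t1 - t0)

# e.g. a bytes-written counter scraped every 15 seconds:
samples = [(0, 1000), (15, 4000), (30, 7000)]
rate = counter_rate(samples)  # 6000 bytes over 30 s -> 200.0 bytes/s
```

Grafana then only has to plot such computed series over the chosen time range, which is why zooming in the embedded dashboards just re-queries Prometheus.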
First of all — you've seen this in the Ceph stack at the beginning — we are using librados to gather some data from Ceph directly. Then we are also querying a Ceph Object Gateway. Then the important part for deployment, and right now for the iSCSI and NFS management: we're querying the Salt REST API, and through it we're using DeepSea. So we query the Salt API, and the Salt API triggers DeepSea to gather all the data and give it back to us. And then, to show nice, shiny, fancy graphs, we're using Grafana embedded into our UI. Yeah, come on in. Just a short outlook. I'll try to keep the slides short — I don't want to rush through, but I think everyone is more interested in a live demo instead of slides, because that's what I promised in the presentation description. So I will keep this as short as possible. Just the outlook — and I'm really happy to do this, because it's awesome. This is the openATTIC login screen. It has a cool logo, and there's a login form; it's nice, and it's a little bit black, which is cool. But that wasn't the outlook. The outlook is more like this. Now the question is: what really changed? What is openATTIC? Yeah, you're right — we replaced the logo. Now we're using the upstream Ceph logo, which is kind of cool, right? That's obviously openATTIC 4.0 — we released it with this new cool logo. No, just kidding; there's a lot more behind it. The reason we're using this logo, and why we replaced the word openATTIC completely with Ceph, is — and I'm really happy to announce this — that the idea is to get openATTIC directly upstream. The idea behind this is to replace the existing Manager dashboard with openATTIC. At the end it's a Ceph dashboard; there is no mention of the openATTIC name anymore. Of course it's still in the code, but it will go away. We will try to push all of that upstream — we are right now refactoring it into the Ceph Manager as the default dashboard.
We made this decision, I think, two or three weeks ago, so this is brand new — feel free to spread it around the world. If you're interested in what we are currently working on: right now we're developing the initial pull request in our own GitHub repo, and the idea is that as soon as we have something that is compatible, we will push it upstream. That's the idea behind it. So in the future there's no standalone openATTIC anymore; it's just a default part of Ceph. As I promised at the beginning, I have a live demo, and I hope this Wi-Fi is better than it was the whole week. Let's give it a try. What's inconvenient is that I have to look over here and scroll over here — no mirrored screen. Let's see if this is working. Nice, the Wi-Fi is awesome. So I skipped the login screen — I know it's awesome, but I skipped it; I'm already logged in. Everyone can take a look at this: there's also a live demo at demo.openattic.org. If you want to play with it, feel free. It is reset every day at midnight, so if it's broken or not reachable for a few hours, then someone did something stupid, and it will be there again the next day. This is the landing page — the initial Grafana dashboard I was talking about. What we did was embed the default Grafana dashboard into our UI, as you can see, and use it from there. There you get a quick overview of the status of your current cluster: the health status, how many pools you have, the used capacity. Within that, you have all the cool Grafana features, like this overlay — you can zoom in directly, and all the graphs update automatically to the time range you've chosen. This is really cool. And we added not just a single dashboard; those are the default dashboards from our end, but as I mentioned, if you need more, just add them within the Grafana UI. Just one thing to mention: name them differently.
Because if you name them the same way we did and then do an upgrade, it will just overwrite your files — so name them differently. Maybe worth mentioning, because I put it on the slide: these are the node statistics. We now gather CPU, memory, all that data, and visualize it for every node that's part of the cluster. If this were physical hardware, we would get SMART data as well, but these are hosted virtual machines, so there is no SMART data. You can switch between different nodes — this cluster consists of seven nodes. So if I'm interested in this node, I just click on it and directly get the information for this node. Same thing elsewhere — I'm not sure I want to show all of the dashboards, just a few. For OSDs, for example: if you're interested in a specific OSD's utilization, or how many PGs are stored there, we gather the data and visualize it here. Let's go to the OSD tab. This cluster is really big — it has a whole six OSDs, which is amazing. If you click on an OSD, it will automatically show you the graph you've seen on the initial dashboard as well. So we embedded those pool, OSD, and node graphs on those tabs too; there's no need to go to the initial dashboard, but you could if you want to. One thing that's really new is configuring cluster-wide OSD flags. Within the UI, you can now set flags: you can pause the cluster, of course, which makes it inaccessible; you can set noscrub and all the flags you usually use — that's now possible within openATTIC. The RBDs tab is the same: you get some initial information about the RBDs, and the statistics are, again, a Grafana dashboard — and there are no statistics here; that's cool. We can add a new one, just to show you the dialog and what we are able to do. You can give it a name, say which pool it should belong to — that one doesn't make sense — give it a size, and use the default features or adapt the features to your needs.
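The RBD feature checkboxes are interdependent: for example, object-map requires exclusive-lock, and fast-diff requires object-map. A small sketch of how such a dependency expansion can work — the mapping reflects my understanding of the RBD feature dependencies, not an exhaustive or official list:

```python
# RBD image features depend on each other; the mapping below reflects
# my understanding of those dependencies, not an official list.
REQUIRES = {
    "exclusive-lock": set(),
    "object-map": {"exclusive-lock"},
    "fast-diff": {"object-map"},
    "journaling": {"exclusive-lock"},
}

def with_dependencies(selected):
    """Expand a feature selection to include everything it depends on."""
    result = set(selected)
    changed = True
    while changed:
        changed = False
        for feature in list(result):
            missing = REQUIRES.get(feature, set()) - result
            if missing:
                result |= missing
                changed = True
    return result
```

This is essentially what the dialog does visually: ticking fast-diff forces object-map and exclusive-lock on, and the impossible combinations are greyed out.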
We also tried to add a few hints, and tried to combine those features, because there are several features that belong together, and it's not possible to set one feature without setting another feature beforehand. We tried to visualize this within the UI: as soon as you tick a feature, all the other features that are not possible right now are greyed out, and vice versa. And this is one of the hints we've added — for example, you have changed something, and now it reminds you: do you really want to leave the page, because you did not click create. The pools page looks quite similar, with details of the pools. You also get, again, the statistics — the dashboard for those pools. There they are; there should be more. Yep, here it is, there's more — the resolution of this projector is awesome. Just to show you what's possible here: you can choose a replicated or an erasure-coded pool. If you click on replicated, you get the various variables you can configure; same for erasure-coded — it will instantly update the fields. And then you can say, okay, what do I want? I want compression. You also have to select an application on top — what you want to use the pool for, for example CephFS or RBD. That's all possible within the UI. And I guess this is a link to upstream: if you have a problem calculating placement groups, which can sometimes be a little bit difficult, there's a direct link to upstream with the formula for how to calculate what's best for your specific pool. We improved the nodes tab a little bit. In the last version we just showed the host name, and that was it. Now we also show the cluster and the roles assigned to those nodes. For example, this one is our master, admin, and openATTIC node; this one is a monitor, manager, and storage node; and this is an IGW — an iSCSI gateway — and also a storage node, for example.
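The placement group formula the upstream calculator is built around is commonly quoted as roughly (number of OSDs × 100) / replica count, rounded up to the nearest power of two. A sketch of that rule of thumb — this is the simplified formula, not the full upstream tool, which also weighs how much data each pool holds:

```python
# Rule of thumb behind the upstream PG calculator (simplified):
# target PGs per pool ~= (number of OSDs * 100) / replica count,
# rounded up to the nearest power of two.
def suggested_pg_num(num_osds: int, replica_size: int, target_per_osd: int = 100) -> int:
    raw = (num_osds * target_per_osd) / replica_size
    pg = 1
    while pg < raw:
        pg *= 2
    return pg

suggested_pg_num(6, 3)  # demo cluster: 6 OSDs, size 3 -> raw 200 -> 256
```

For the six-OSD demo cluster with 3× replication, the raw value is 200, which rounds up to 256 PGs.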
Same here: if you click on statistics, you get the statistics for that specific node. And you can also trigger tasks — we included scrub and deep scrub, so within the UI you can click and say, okay, I need to scrub this node. iSCSI — that's one of the new features we've added, powered in the backend by DeepSea. Within iSCSI and NFS we also added a manage-services dialog, so you can stop and start the iSCSI and NFS Ganesha services, or restart them if you have to. And just to show you what we've added: first of all, of course, an iSCSI target needs a name. You have to specify the portal — in this cluster I only deployed one iSCSI gateway, so I can only select this one single node. I can add an image, for example this demo image, and then we added authentication as well. The default is just user and password plus the initiator list, but we also added mutual authentication and discovery authentication, and you can combine them if you want. I know that's sometimes kind of hard to set up; I usually just stick to user and password, but maybe some of you use, I don't know, mutual authentication — you can still do that. More or less the same for NFS. In a few seconds the manage-services dialog should appear right here too, so we can also start and stop the NFS service if we have to. And — this is really nifty — we added those little hints, for example how to mount a share on a specific client. If you have no idea how to do that, just copy and paste this command to your client, and that's it. So we try to improve the usability over time; there are lots of things we still have to improve, but I think we made a good step forward. Let me just add an existing share to show you what's possible. Just five more minutes, Wi-Fi, please — oh, I have ten, amazing, then I can just reset. Yeah, there should be an add dialog for NFS. Yeah, you have a question, sure.
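The mount hint the UI shows is essentially a pre-filled mount command for the share. A toy version of how such a hint can be assembled — the host name and export path here are made-up examples:

```python
# Toy version of the copy-paste mount hint the UI shows for an NFS
# share. Server name, export path, and mount point are made-up examples.
def nfs_mount_hint(server: str, export: str, mountpoint: str = "/mnt") -> str:
    return f"mount -t nfs {server}:{export} {mountpoint}"

nfs_mount_hint("ganesha1.example.com", "/cephfs/share1")
# -> "mount -t nfs ganesha1.example.com:/cephfs/share1 /mnt"
```

The point of the hint is exactly this: the client side needs nothing Ceph-specific, just a standard NFS mount.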
Okay, the question was whether we can react interactively to failures — for example, if something crashes, whether something can be triggered automatically when something happens to the cluster. No, that's not part of the web UI so far, and not part of the core. You just get notified that something is wrong, and then you have to act yourself. Couldn't load NFS exports — oh no. That's amazing; let's click here. So no, right now this is not possible automatically, but I guess that's something for the future. I don't trust this Wi-Fi here. — You shouldn't. — And I don't. I don't know how many clients are connected to it, and I can't set up a hotspot. So I'll skip the NFS part; I don't know if it's a general Wi-Fi problem or a problem of this instance — or maybe someone of you connected to the live demo while I was talking and did something. Evil you — I shouldn't have mentioned it. That's the reason why I'm in full screen. Oh no. Object Gateway: we are now able to manage the users of the gateway as well — create new users and adapt quotas, for example. Initially you get the details listed here, and then we also have statistics again — I hope this works better; yeah, of course. You can create users and add, for example, subusers for Swift. You can add the keys you need, and you can adapt the capabilities of a user — for example, a user should only be able to read. You can set user quotas, and — interesting — you can limit by size or by number of objects, and the same per bucket. I'm not quite sure if you can see this over the heads in front. A really cool and amazing thing is still the CRUSH map, which we visualize within the UI. It's just a visualization of the CRUSH map.
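On the CLI, the same user quota would be set with radosgw-admin. A sketch of the kind of invocation the UI wraps — the uid and limits are made-up examples, and I'm paraphrasing the flags from memory, so treat them as an assumption:

```python
# Sketch of the radosgw-admin call behind the quota settings shown in
# the UI. The uid and limits are made-up; -1 means "unlimited".
def quota_set_command(uid: str, max_size_bytes: int = -1, max_objects: int = -1,
                      scope: str = "user") -> str:
    return (f"radosgw-admin quota set --quota-scope={scope} --uid={uid} "
            f"--max-size={max_size_bytes} --max-objects={max_objects}")

quota_set_command("demo-user", max_objects=10000)
```

Passing `scope="bucket"` would correspond to the per-bucket limit mentioned above.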
You can't do anything with it; it's view-only. In 2.x we had an editable CRUSH map thing included: you were able to just drag and drop, for example, your rack or your nodes within the tree and click save. You think: this is a great idea, I want to reorder my whole CRUSH map. But we figured out that, to protect the user, it's not the best idea to let them play with this. We need something more guided, like a wizard, to create a CRUSH map or to change something. So we decided to remove the edit feature for now and add it back later, as soon as we have something more mature than just a drag-and-drop virtual tree. The view is still there, but I'm fairly sure it will more or less go away with Ceph upstream. The users page is unspectacular. The cool thing is the settings page we added in the new version. It's now possible to specify, for example, the DeepSea host, the Grafana host, or the Object Gateway host within the UI. You don't have to go to the CLI to configure the IP addresses, passwords, or shared secrets — this can all be done within the UI. This cluster, for example, was spun up completely with DeepSea, so these details were added automatically — that's what I mentioned about DeepSea being capable of deploying openATTIC and doing the initial configuration. We have the same for the Object Gateway, and here for Grafana — so if you have, I don't know, a dedicated Grafana instance somewhere else in your environment, sure, why not use it. And last but not least, the path to the keyring and the user behind it. What is always really helpful for us is this little thing — I just want to mention it because it's not used that often — report a bug. So if you find something, don't just close the software and run away.
It would be really helpful to let us know that you found something, or had problems, or the documentation is totally horrible, or the whole thing is total crap — then we want to hear why. That would be really important to us. On the one hand it's really cool if you say this is really helpful, but what is even more helpful is to know what is not perfect, what is missing, and which features are currently not working. One thing before we go to questions and answers is the API recorder — I just want to show it to you. You click on the API recorder — I hope this works; this is not a big cluster, and I hope it will survive. This is something for those folks who don't like to read documentation in general. How much space do I have? I guess one gig is sufficient, right? Blah, blah, blah — it doesn't matter. Yep, there it is. Now you can click stop, and the cool thing is that we get, more or less, I would say, a working Python script out of it, with the calls that we did. That's exactly what happened: we're calling our own REST API from within the UI, and this is what got recorded. You can adapt those settings and use this script to, for example, create new RBD devices — so there's no need to read the whole documentation, because most of the time I don't like documentation either. As we have just five minutes left, I would switch really quickly to questions and answers. Yes, sir. Yeah — and on this point you're definitely right: you have a problem with a single management system. You can make it more or less highly available by, for example, syncing the data and putting a Pacemaker cluster behind the management node. But in the end, this doesn't influence the accessibility of your data: if the management node is down or corrupted, it doesn't matter — your clients can still access the data. It's completely independent of the deployed cluster.
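To give a feel for what such a recorder-generated script looks like, here is a sketch of one recorded call — the endpoint path, host, and payload fields are hypothetical examples, not the actual openATTIC API, and the request is only constructed, never sent:

```python
import json
import urllib.request

# Sketch of a recorder-style generated call. Host, endpoint path, and
# payload fields are hypothetical examples; nothing is actually sent.
base_url = "https://openattic.example.com/openattic/api"
payload = {"name": "demo-rbd", "pool": "rbd", "size": 1073741824}  # 1 GiB

request = urllib.request.Request(
    url=f"{base_url}/rbds",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
```

The generated script is just this pattern repeated for every click you made while recording, which is why you can replay or adapt it to automate the same actions.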
But yes, in this case you're totally right, and this is exactly what we are currently working on. The idea, or the outlook for the future, is to have this all based in the core, so that the deployment is part of the cluster itself. But for now we have to stick with what we have, and from our end that's DeepSea; for upstream users, yeah, most of the time I guess that's Ansible. Yeah, sure — doesn't matter. As I mentioned, we're using DeepSea to deploy the cluster; it's just for deployment, and you can also use it for orchestration. But if something breaks within this master management system, it doesn't matter: you can manually fix your cluster, and you can still access the web UI. You can even just spin up a new web UI, because it's stateless — it will directly gather the data. So yes, in this case it's a single point, but it doesn't affect your clients or your data. Yes. — So, a question about how opinionated openATTIC is: we're running, for instance, Ceph using the ceph-docker project, so we're running Ceph with Docker, and we have our stats in OpenTSDB and display them in Grafana as well. How well would openATTIC play with that scenario? — Generally speaking, out of the box that wouldn't be a problem with Grafana; we can still show all the data within the UI. The only thing that would be missing is the iSCSI and NFS management part, because for that we need DeepSea right now — that part would not be usable within the UI, but the rest is fine. And the idea, or the plan, behind moving upstream is to remove this dependency as well — to be completely independent at the end and not rely on a specific deployment tool, to remove exactly this limitation. Another question? Do I have one minute left? One minute. Yep. — On the OSD display, how does it scale if you've got hundreds of elements there, filtering and positioning?
On this page it's not a problem, because we do pagination: we gather the data, and then we have pagination here with a default of 10 that can scale up to 100, for example. But you're right — and we have already replaced this, just to mention it. For example...
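The pagination described above can be sketched in a few lines — a toy illustration of the idea (default page size 10, capped at 100), not openATTIC's actual implementation:

```python
# Toy sketch of the pagination idea: default page size 10, capped at 100.
def paginate(items, page=1, page_size=10):
    page_size = max(1, min(page_size, 100))
    start = (page - 1) * page_size
    return items[start:start + page_size]

osds = [f"osd.{i}" for i in range(250)]
page2 = paginate(osds, page=2)          # osd.10 .. osd.19
capped = paginate(osds, page_size=500)  # request for 500 is capped at 100
```

The point is that the UI only ever renders one bounded slice, so the table stays responsive even with hundreds of OSDs.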