Okay. Welcome, everybody, to our first Rook talk, and welcome Alexander, the Rook maintainer. Thank you. Yeah. So my name is Alexander Trost. I'm a, well, Rook maintainer. Just as a side note, I'm a DevOps engineer at Cloudability. And today I'm going over using Rook in Kubernetes to run Ceph in Kubernetes, and what the advantages of doing that are. On the agenda side: what is Rook, a bit of the architecture of Rook, a bit about the integration with Kubernetes, because in the end, most of the time you want to use the Ceph storage you're running with Rook inside of your Kubernetes cluster for your applications, let's say, I don't know, my SQL database or something. And then, well, the advantages, how Rook can help you with that. One part will probably fall flat due to technical issues with my laptop: I'll see if I can get a video or something up so that we get a demo of the creation of a cluster, how easy it is to consume the storage in Kubernetes afterwards, and adding and removing a node from the Ceph cluster. Well, I'll have to see. And at the end, just a "why Rook", a few more points there.

So Rook is a cloud-native storage orchestrator. Cloud-native, everybody talks about it, it's this cool cloud thing; here it specifically means it's made to run in Kubernetes, which implies it runs in containers. So it's basically a cloud-native container storage orchestrator. Through this orchestration, Rook can run, for example, Ceph for you in your Kubernetes cluster, but also other storage software, which I'll come back to later. The special point, which the extensibility of the Kubernetes API allows, is that you have the possibility of creating your own custom types and, in the end, controllers which react to these custom types in Kubernetes. More on that later. In general, the goal of the orchestration part is to automate things as well as possible: deployment, bootstrapping of the software, in this case a Ceph cluster, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring and resource management. Those are all points we try to cover. Not all of them are covered yet, but, well, it's open source; we're happy to see people help us reach the goal of full automation of, for example, your Ceph cluster in Kubernetes.

As I already said, Rook is not only Ceph. Ceph is one of many storage providers that Rook can run for you. There is, for example, MinIO, which is an object store, maybe you've heard of it. Then there's also EdgeFS from the Nexenta guys, which goes roughly in the same direction as Ceph in that it provides block storage, file storage and also object storage, and I think one of its big points is geo-replication by default; definitely worth checking out. And some other tools, which I have laid out on a slide, are currently in Rook as well. It's open source and, what is especially good for us, it's hosted by the Cloud Native Computing Foundation; it's currently at the incubating level of the CNCF project stages. Yeah, the Rook framework: Rook is not only there for running Ceph, but also for MinIO, EdgeFS and so on.
Rook tries to generalize certain types which are common across those multiple storage backends. For example, between Ceph and EdgeFS, the selection of the devices on nodes is the same; there are not two different types. It's one type where we basically have a list, with some more details which we'll get into later, of which devices to use. Already with those common specs shared between multiple storage providers, it's most of the time pretty easy, not that I would recommend it too much, to, instead of distro hopping, storage-software hop all the time; in the end something like that would be possible, because you could just copy and paste the device list from, well, Ceph to EdgeFS, or back again. Yeah. With the framework part of Rook, we not only try to have this common ground of specifications, we also try to have combined testing efforts, which makes it easier for all those platforms which have their own operator in Rook to be tested together, to have policies and so on, and simply to share code, so that an operator which runs something like Ceph, EdgeFS or other possible storage backends doesn't need to write all of that from scratch every time. That's basically one of the goals of the Rook framework.

Yeah. The architecture of Rook. This goes a bit into Kubernetes now, as a general look. We have the Kubernetes API, we have our client utility, kubectl, which we use to talk with the Kubernetes API, and etcd, which Kubernetes uses as the data store for the API objects. If we start from here, we have our client utility with which we tell the Kubernetes API about these new objects, so that in the end we end up with objects like CephCluster, CephBlockPool, CephObjectStore or CephFilesystem, and, if you install the other operators too, obviously their objects as well. When we have created those objects, we can access them again; for those who know kubectl, just run kubectl get cephcluster and we get the object as it currently is in the API. And in more current releases of Kubernetes, I think 1.12 or 1.13 or so, you get some more information about the object when you simply run kubectl get to retrieve it from the Kubernetes API, so that you have a quicker overview of whether it has been created successfully, how many monitors are running, and just some general information.

Moving on to the operators. As said, with the multiple storage backends we have, for example, the Ceph operator, the MinIO operator and so on, one operator per storage backend. The operator basically takes care of creating the deployments, the DaemonSets and everything else that is needed to run the storage software. For Ceph, that means creating deployments for the monitors, deployments for the OSDs and all the other components like the manager, MDS and RGW, and then also, for example, configuring certain aspects in the manager, like enabling the dashboard and enabling or disabling SSL on it, stuff like that. And it's made simple because we have one common object, the CephCluster object here.
With this one object, which the Rook operator knows how to handle and which has all the options we support in it, you can, not for all options right now, but for a good amount of them, go ahead and, for example, toggle the dashboard on the fly. If you had it set to false when you created the cluster initially, you just edit it, set it to true, apply it against the Kubernetes API, and the Rook operator will pick up the change, do its thing and enable the dashboard.

Now, moving on to this part here, this is basically the node level. The daemons are, for, let's say, Ceph, a monitor, an OSD or, well, one of the other Ceph daemons, placed on a node. If a mon is placed on a node and you also want to use the disks on that node for storage, you would obviously also have OSD daemons running on it. But everything is containerized, so there's no conflict with other stuff running on the system. It's in a container, so, well, there are certain differences in isolation level depending on which container runtime you use, but it's isolated at least. The container part is also where upgrading gets easier with Rook, because we just go to the deployment which, for example, runs an OSD, go to the Kubernetes API and say, hey, change the image from Ceph version 12-something to version 13-something. And for the other daemons, basically the same update mechanism applies, because Kubernetes does this management for us.

Then there's also the Rook agent. The Rook agent is, in our current case, the so-called Rook flex volume plugin provider. Flex volume is kind of the predecessor of CSI in Kubernetes, where CSI is the Container Storage Interface, which, as the name implies, is a common interface for storage, not only for containers but for storage in general. Flex volume is kind of like that too, but on a more limited scale as far as I know, specific to Kubernetes. We still have it running with flex volume, more or less because Kubernetes only got CSI support stable, no, beta or stable, with version 1.13, and we currently support roughly the last three versions of Kubernetes, so we still have to offer flex volume. To the flex volume part here: if I have a pod which wants to use a volume on this node, the kubelet, the node component of Kubernetes, would call out to a socket on the node, a flex volume socket, the Rook agent would receive this request and then, well, go ahead, get the information about the volume, do the RBD mapping and then the mounting. One quick note: the kubelet, I think, is in most cases doing the formatting of the volume, so yeah, I think the formatting is done by the kubelet.

Yeah? Is this accomplished with flex volume, or how will it be accomplished? I'm trying to answer the question, sorry. So, you mean the block devices, if I map the volume? The OSDs? Yes? Yes. Okay, so if I understand correctly, the question is how the OSDs are able to get access to the block devices. Well, right now it is simply done by mounting the host's /dev into the container.
There are ways, I think, especially through, what was it, local persistent volumes in Kubernetes, which have been introduced in, I think, 1.9. 1.9? Yeah. Which could potentially be used for exactly this purpose. We are still looking into how we can use it like this, but right now /dev is basically working for us. We are also already at least looking into restricting the /dev access to just the device and/or the partitions on the disk that the OSD uses, but there's still a good amount of work to do to get there, yeah.

So, if I create the Rook Ceph operator and then go ahead and create a CephCluster object too, well, I get, as already shown in the previous example, the agent pod, which runs on all nodes. And the operator, as said, takes care of creating the Ceph components like the OSDs, the monitors, the manager, the MDS; RGW is also there. And, well, in the end you get a Ceph cluster out of that. So, to summarize this part: we have this custom object with kind CephCluster, which the Rook Ceph operator has defined. You create it, and what you currently get with that is, well, a Ceph cluster running Ceph version 13.2.4-something. And the dataDirHostPath is, well, the host path where Rook right now stores most of the configuration and the monitor data. We're working on making that really just configuration, maybe still the mon data, but not OSDs, because right now the behavior is that if you don't specify disks or anything else, you get an OSD using a directory, basically a filestore directory OSD, which, performance-wise, is much worse; the disk one is way better.

Yeah, but coming back here, we have the dashboard part, where we say, yeah, we want the dashboard, so we set enabled to true. We can control a bit about the monitors, like how many we want, and whether we allow multiple monitors on one node; if you only have three nodes, you may or may not want this to happen, and with more than three nodes you normally want to disable it, because then you always have room to move a monitor from a node which failed to another node and, well, create a new monitor there, basically. This part with the storage here, for example useAllNodes and useAllDevices, there's more to control, but those two options basically allow you to select where storage will be used and which storage will be used. useAllNodes means: use all nodes that the Rook operator can place OSDs on, which are valid and ready from the Kubernetes side. useAllDevices implies that all empty devices will be used for OSDs. If we skip a few slides ahead, we come back to this storage configuration part where we have, again, those useAllNodes and useAllDevices options. Instead of, for example, setting useAllNodes to true so that all applicable nodes are used, we can also specify a list of nodes, which we'll see on the next slide. We can also have a simple device filter where we say, yeah, use all disks matching "sd" plus a wildcard, basically, and then Rook will take all disks matching this regex, except disks that are not empty.
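As a rough sketch of how these options fit together, here is what such a CephCluster object could look like, assuming the Rook v0.9-era ceph.rook.io/v1 API; the image tag, namespace, node and device names are illustrative, not taken from the slides, and the nodes section previews the per-node settings described next:

```yaml
# Hypothetical CephCluster manifest (Rook v0.9-era API assumed).
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    # Bumping this image (e.g. v12.2.x -> v13.2.x) is how the upgrade
    # described earlier is triggered; the operator rolls the deployments.
    image: ceph/ceph:v13.2.4
  dataDirHostPath: /var/lib/rook
  dashboard:
    # Can be toggled on a running cluster; the operator picks up the change.
    enabled: true
  mon:
    count: 3
    # Usually false once you have more than three nodes, so a failed mon
    # can be re-created on a free node.
    allowMultiplePerNode: false
  storage:
    # Cluster-wide selection: which nodes and which devices may be used.
    useAllNodes: false
    useAllDevices: false
    deviceFilter: "^sd."          # take empty disks matching this pattern
    nodes:
      - name: node-a
        directories:
          - path: /rook/storage   # directory-based OSD (slower than raw disks)
        resources:
          limits:
            memory: 4Gi
      - name: node-b
        devices:
          - name: sdb
          - name: nvme0n1
            config:
              osdsPerDevice: "3"  # several OSDs on one fast NVMe device
```

Applying an edited version of this file with kubectl apply is enough for changes like the dashboard toggle; the operator reconciles the rest.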
There's also the possibility to specify certain configuration parameters not only at the cluster-wide level, but also at the node and even OSD level. In this case, for example, we have an option which is normally only used for NVMe or, well, faster-than-SSD devices: how many OSDs should be created per device. If we go one slide further, we are at the nodes list here, and there you can see that, for example, you have this node and say, well, use a directory, and put those resource limits in place from the Kubernetes side, so that the OSD container for this node, as it's at the node level here, will have those resource limits. Then, for example, here with a device, you have the node name and in this case specifically say, hey, use sdb; but you could also, if it were, for example, an NVMe device, like the second device here, specify a config with osdsPerDevice set to three, or whatever the right math for NVMe devices is. So we have osdsPerDevice available per device, at the node level, and at the cluster-wide level. That's not all of it, but that's where I'll leave it, yeah.

To shortly go into the further Kubernetes-native integration, besides the so-called custom resource definitions, those custom types of objects we can create: Kubernetes already has storage classes, which allow us, the admin most of the time, to specify which provisioner to call and certain parameters for that provisioner. The provisioner will be called when you create a volume, well, a persistent volume claim: if it has this storage class in it, Kubernetes takes care of talking to the provisioner and saying, hey, somebody requested 20 gigs of storage, for example. If we go one further: here is just an example with 20 gigabytes of storage, and here we specify the storage class name, specifically the one we just saw. If we create this, it will result in 20 gigabytes of storage. That's where Kubernetes takes over in terms of storage management, as Kubernetes has all the persistent volume claims, which, I hope for everyone using them, always reflect the claims an application has made: hey, I need 500 gigs of storage, so it's always visible that this application is using 500 gigs of storage. In the background, not on the per-application level, you have the persistent volumes, which again also carry the information about which storage class was used to create the volume, how much storage it actually is, and which access mode is used. Kubernetes has three access modes: the first is ReadWriteOnce, where only one consumer can read and write to it; the second is ReadWriteMany, where many can read and write at the same time; and there is ReadOnlyMany, where many can read it at the same time, but only read. Yeah.
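To make the storage class and the claim from a moment ago concrete, here is a small sketch of the two, assuming the v0.9-era flex volume provisioner and an illustrative pool name replicapool; the provisioner string and parameters may differ between Rook versions:

```yaml
# Hypothetical admin-side StorageClass pointing at the Rook/Ceph provisioner.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: ceph.rook.io/block   # flex-volume based provisioner (pre-CSI)
parameters:
  blockPool: replicapool          # the CephBlockPool to carve RBD images from
  clusterNamespace: rook-ceph
  fstype: ext4
---
# The claim an application side creates: "I need 20 Gi of block storage".
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
spec:
  storageClassName: rook-ceph-block
  accessModes:
    - ReadWriteOnce               # one of the three access modes just mentioned
  resources:
    requests:
      storage: 20Gi
```

Creating the claim triggers the provisioner, which creates the RBD image in Ceph and a matching PersistentVolume a few seconds later.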
So, what can we do to make it better for you to run a Ceph cluster in Kubernetes, especially because, well, you need to run a Ceph cluster anyway? Well, as I said, if you already have one, fine; if you don't have one and you're like, ah, let's go, it might be a bit problematic, because you need to know a bit about Kubernetes. It's simply the fact that if you put one more platform layer underneath and run things on top of it, you should have knowledge about that platform. Part of what Rook does is, as we heard, health checking for monitors. So if you have, let's say, five nodes, and, just for a simple example, your monitors sit one on the first node, the second on the second, and the third on the third, and the third node fails, Rook would fail that monitor out of the cluster and move it over to, let's say, node four or five, whichever is available for that.

Then there is the simple management through the Kubernetes API, in the sense that you have those YAML manifests, let's put it like this, infrastructure as code. It's more and more not only infrastructure as code, but kind of application deployment as code, depending on how far you take it. You have those YAML files which define how your Ceph cluster should look and which pools should be created. You just have a YAML file with a pool object with such and such a name, how much replication, or whether it should use erasure coding, and which failure domain. As it's a YAML object, you just kubectl create it through the Kubernetes API and you have it: your pool is created in a few seconds, and it makes it easier on that side to manage those things too. File systems are the same, for example. It's not: oh, I want to use a file system, let me buy five servers, install Ceph and set up the MDS. With Rook, well, you do need to put Kubernetes on it, but then you just create a file system object, which has the size numbers and everything, like for the pools again, for this file system. What else is in there? How many active MDS daemons do you want? Should there be a standby MDS too? It just gives you, well, a playground, not to say a good amount of room, to set certain options and have that happen automatically. Same for RGW: you have an object, you control it through that, and you have certain options to control the behavior of the pools and how they should look. Yeah.

And the third point I would like to bring up here is that, as we saw with the storage selection, you have a CephCluster object and either a list of nodes or a more generalized device filter or something where you say: use those disks in the whole cluster, or even directories right now. Just as a side note on the storage selection: Travis Nielsen, who is also working on the Rook project, is looking into enabling LVM, kind of through the ceph-volume part, so that OSDs get created through ceph-volume instead of Rook's own preparation, partitioning, formatting and so on process; since the 0.9 release it's already not done the old way in all cases anymore. Well, we already went through that. The "why Rook" part is largely covered by the benefits you get from having a Kubernetes cluster underneath for something like this; in general it simplifies certain tasks for the storage backend. There is, as said, still some work: for example, replacing a node right now is still a bit of a manual process, but we're getting there. It's more or less a bit of a clash: should we listen on the nodes that are in the cluster and react if one goes missing or is simply not ready, and there are cases where, for whatever reason, maybe an etcd issue or something, there's an empty node list, so we could wrongly conclude: ah, nothing to run, so just scrap it all, kind of. But, well, we're happy to see people chime in on the discussions on those topics.
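Coming back to the pool and file system objects mentioned a moment ago, here is a minimal sketch of the two, again assuming the v0.9-era ceph.rook.io/v1 API; names and sizes are illustrative:

```yaml
# Hypothetical pool definition: replication factor and failure domain in YAML.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host   # spread replicas across hosts
  replicated:
    size: 3             # or an erasureCoded section instead of replication
---
# Hypothetical file system definition: its pools plus the MDS settings.
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3
  dataPools:
    - replicated:
        size: 3
  metadataServer:
    activeCount: 1      # how many active MDS daemons
    activeStandby: true # keep a standby MDS ready
```

A kubectl create -f on either file and the operator creates the pool or the MDS deployments within seconds.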
Yeah. The self-managing part is that, well, we don't have liveness and readiness probes for all Ceph components yet, but I think we have them at least for the manager right now, and for the monitors we do external health checking from the operator side to see: is this monitor still healthy, or do we need to fail it over to a new node? Yeah.

And the part with the dynamic provisioning, that I create a claim and a few seconds later my block volume in Ceph has been created and linked with Kubernetes through the persistent volume object, well, it's a breeze. If you want to run applications, you don't need to file a ticket saying, hey, I need 20 gigs of storage, can you please provide it to me, and then someone runs through the data center and tries to find a disk for you. It's simply dynamic, which is pretty good, especially for developers. Together with the guys from Crossplane (crossplane.io), Jared Watson and Bassam Tabbara, if I got his name correctly, we are also looking into getting more abstraction in, so that, for example, a developer would say, yeah, I need a bucket, and just creates a bucket object. And, for example, we also have support for CockroachDB, so, as it is a database, we would obviously also want to look into having a database object which the developer would just create, and then in some way, which is still up for discussion, should it be a service broker, should we do some magic in Kubernetes, like mounting a directory into the pod which contains the credentials for this database, or so. So again, if you want to chime in there, we're happy to have a discussion on that.

The last point here in this list is, well, it depends on how you see it. One problem which I think still exists on AWS with, what was it, EBS, the block storage from Amazon: the failover time when a node failed was, I think, something like five minutes before the volume moved to another node. And, well, I hope you're only running cloud-native applications, so it shouldn't be a problem as you have more than one replica, but those are still not great numbers; even though it's cloud-native and should be okay with a failure of a pod, or even multiple pods and multiple applications in your cluster, it's still kind of a bummer to wait, like, five minutes in case a pod fails. So that's where I'm kind of saying there's less vendor lock-in, because, well, you would run a Ceph cluster, which, yeah, on the one hand can also mean more maintenance than, for example, just consuming EBS volumes on Amazon, but the failover is, in this case, faster, for example, or you simply don't have to use their storage. Yeah, as said in the beginning, it's not only Ceph: there's not only a Rook Ceph operator, there's also a Rook MinIO operator, which, well, is object storage, so we have a MinIO object store object. We also have CockroachDB, where right now we only have a CockroachDB cluster object, but, again, we are open to people communicating and discussing with us what the best path is, whether it's more of a service broker approach or not.
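To tie the dynamic provisioning point back to an application, here is a minimal sketch of a pod consuming the claim from the earlier example; the image and names are illustrative:

```yaml
# Hypothetical database pod mounting the dynamically provisioned volume.
apiVersion: v1
kind: Pod
metadata:
  name: mysql
spec:
  containers:
    - name: mysql
      image: mysql:5.7
      env:
        - name: MYSQL_ROOT_PASSWORD
          value: example          # illustration only; use a Secret in practice
      volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: mysql-data     # the 20 Gi claim created before
```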
EdgeFS has been implemented by the Nexenta guys; again, a big shout-out to them, as it's a huge piece of work. It's not like MinIO, where you have, like, two or three replicas and, honestly, that mostly covers it; there's, again, kind of like with Ceph, more complexity behind it, and it's amazing to see that they have implemented it in Rook and that they're using certain parts of the Rook framework. And an NFS server has, like MinIO and CockroachDB and, well, EdgeFS too, been implemented thanks to the community. So, yeah, well, I don't have time for a demo right now. Let me get to this page first. If you want to get involved, we are on GitHub. I think some people are still on GitHub, right? Nothing against GitLab. We have a web page which we are currently reworking, so if you're right now wondering where to find this and that, we're working on it, we know about the problem. And, well, we have a Slack channel, Slack, no, a whole Slack workspace of our own. If you want to join, there's a general channel and, well, channels for questions and so on. Twitter, we have that; we still have a mailing list, I think; and there are community meetings every second week. Yeah. I would just jump to the questions right now.

So, the question was whether there's an option to run multiple monitors on the same node, right? Yes. Yes? I mean, that's... There are certain people who probably run Rook with just one node and a mon count of three, so... Yes, yes, officially for development purposes only, yes.

The second question is about the monitor failover. You're saying that if a monitor dies, it will be failed over to somewhere else. I'm fully aware that the monitors themselves are able to sync from an existing one, at least as long as we still have one available monitor. But now I'm curious, what's the storage backend for those monitors? Just your plain disk. So... Is that disk mounted into the container? Is Kubernetes itself or Rook managing to copy that data to some extent, or is it relying on the monitors' synchronization mechanism? Let me repeat the question. So, the question is: for the mon failover, how does the data get to the other node, or does the data even get copied from the node which failed to the other node, which then runs the new monitor? No, it doesn't get copied. It's a brand new monitor with, well, just a new directory, which, well, what was it, gets the addresses of the existing monitors, talks to them, and, as you said, then the synchronization basically kicks in for the monitors, and the new monitor should be, well, up to speed in a few seconds, I hope, at the longest. Yeah.

Yes. Okay. So, the question is: if a monitor fails, what happens to the pods that use storage? So, from Ceph's side, the monitor list is always kind of, well, meant to be fluid. So if one monitor fails, well, it's still in the list in the configs most of the time, and when you bring up a new monitor and it's back in, talking with the other monitors, this new monmap with the new monitor in it will basically be distributed to the OSDs and, well, the clients again, at least as far as I'm aware. Okay. So, you're literally creating a new monitor, and not just reusing the same ID and IP? Yes, we're creating a whole new... Basically, let's defer that discussion for later, maybe. Oh, yeah.

So, you showed us how the storage gets defined; the network layer for the whole cluster is also kind of important. How does that get defined? Good question.
So, the question is how the network layer, or, well, how the network would be defined if you, for example, put Ceph on it and you have five thousand other applications running in the cluster, right? Kind of. Yeah. Okay. So, to extend on what I just said about the question: it goes in the direction of separating the Ceph traffic, the replication traffic, and the client traffic. Right now, you would go and use host network mode, because the current Kubernetes state is that you only have one interface per pod. Well, if you use host networking, you get the node's network, which has its own advantages and/or disadvantages, to put it like that. But to go a bit further into the CNI part of Kubernetes: there are, I think, two or three projects right now which allow you to define multiple CNI plugins in your config somewhere and then have multiple interfaces in your pods. I heard about it from Intel at, like, Container Days, where they showed it off for, I think, NFV stuff and so on, and for them it seemed to work pretty well. But right now, host networking would be the way to go. But, yeah.
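For completeness, a rough sketch of what the host networking switch looks like in the CephCluster object, again assuming the v0.9-era API; the field layout may differ in other releases:

```yaml
# Hypothetical excerpt of the CephCluster spec: run the Ceph daemons in the
# node's network namespace instead of the pod network.
spec:
  network:
    hostNetwork: true
```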