Hello everyone, welcome to the Rook intro and deep dive with Ceph talk. I'm here with Blaine. Yeah, I'm Blaine Gardner. I'm a storage engineer at IBM and a Rook maintainer. Hello, I'm Orit Wasserman. I'm a Ceph and OpenShift Data Foundation architect at IBM — OpenShift Data Foundation is Rook and Ceph and lots of cool stuff in OpenShift. And I'm JC Lopez. I do pre-sales and consulting and whatever you want around Rook, Ceph, and anything related to containers and storage. And real quick about myself: I'm also a maintainer of the Rook project, working for Koor Technologies, Inc. as a founding engineer.

Okay, so what do we want to talk about today in the Rook intro and deep dive? We want to give you an introduction to Rook and Ceph for anyone who's new. We want to talk about the state of the Rook project — what's in the making, any cool features we have cooking up — and some real-life examples. Additionally, and I think this is interesting for a lot of people because it's newer to Rook, there's application disaster recovery, and especially day-two operations: day one it works, day two it's broken — we don't want that, so let's talk about it.

First of all, a few questions. Who's here to learn about Rook for the first time? Has any one of you already experimented with Rook? Okay, and who has Rook deployed in production? Though I guess we can see production as a bit of a stretchy term — even test environments or something count. Okay. And a question which is not on the slide: where do you run Rook? Do you run it in the cloud, in a virtualized environment, on-prem? Okay, on bare metal on-prem. Okay. Yeah, let's talk about Rook.

Yeah, to get us started I want to give just kind of an introduction to Rook, starting from the very beginning: what are the questions that led to Rook? In a cloud world where cloud providers are usually providing storage, what do you do in your own data center? And also, why is storage normally not part of the Kubernetes cluster? Why can't it be in the Kubernetes cluster, for the Kubernetes cluster, and then managed in the same way as Kubernetes applications?

These questions, and some iteration, led us to some goals for Rook: make storage available to Kubernetes applications, and have it be Kubernetes-native, just like the storage you would get from cloud vendors. We also wanted to make that easy, automating the deployment, configuration, and upgrades. At the time we were seeing this new operator pattern and how good it was, so we wanted to implement and leverage that. We also wanted to make sure that it was open source — free for whoever wants it.
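As an editor's sketch of what that operator pattern ends up looking like in practice (values are illustrative placeholders — see the Rook quick start for real examples): the whole Ceph cluster is declared as one custom resource that the Rook operator reconciles.

```yaml
# Minimal sketch of a CephCluster custom resource. The Rook operator
# watches this CR and deploys/maintains the Ceph cluster it describes.
# Image tag and settings are placeholders, not recommendations.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18   # pick a supported stable Ceph release
  dataDirHostPath: /var/lib/rook   # where mons persist state on the host
  mon:
    count: 3                       # three monitors for quorum
  storage:
    useAllNodes: true              # let Rook find storage on all nodes
    useAllDevices: true            # consume all empty raw devices
```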
Also, understandably, the project did not want to roll its own data platform, but instead looked at existing data platforms and what they offered, and settled on Ceph. Some of the reasons why Ceph was chosen: it is a distributed, software-defined storage solution, and it provides all three major types of storage. It provides block, it provides shared file system, and it has best-in-class support for S3 APIs — it's often even bug-compatible with AWS. Beyond that, it was already widely trusted in the enterprise, used by thousands of organizations. Something that I personally find very cool is that a very long-time contributor and user is CERN, who use it for their Large Hadron Collider and their particle physics data, and they have one of the largest Ceph clusters in the world.

Additionally, Ceph is very durable. It's designed to be consistent — it's not eventually consistent, it is consistent. Data safety is Ceph's priority. It does this by offering sharding, and this can be across availability zones or racks or nodes or disks — whatever structures you have in your data centers. The replication is configurable and durable, and even in disasters we've seen that it's almost always possible to manually recover data, even in the worst possible disaster.

As far as the architectural layers that we're dealing with: Rook itself is really just the management layer. We also have a CSI (Container Storage Interface) driver, and this dynamically provisions Ceph storage and then mounts it to user applications, so that when user applications are running they get access to Ceph, which is the data layer, as directly as possible. Neither Rook nor CSI sits between Ceph and the application — the application talks directly to Ceph.

Installation can be via Helm charts or via the tons of example manifests that we have, and there's also a quick start guide at rook.io — I saw someone taking a picture, so I'll hold here for a second. You can just click "get started" and that can get you going.

Rook can be installed, in short, anywhere Kubernetes runs. This can be in the cloud or on premises; you can have virtual hardware or bare metal hardware. Even the underlying storage is flexible: it can be disks attached directly to your nodes, it can be cloud volumes like EBS, and if you're just looking to do some testing, we even support loopback devices. If you already have a Ceph cluster, Rook can attach to it and then help offer that native solution using pre-existing storage. All of this kind of adds up to Rook being something that helps with cross-cloud support.

Something that's dear to me is object storage provisioning. Rook was an early investor in self-service provisioning of object storage buckets. This was done with a project that provided Object Bucket Claims, back in 2019, and that evolved into a Kubernetes enhancement project which we're now helping to usher into beta: the COSI project, the Container Object Storage Interface, which allows a little bit more flexibility there.
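For illustration, a minimal sketch of an Object Bucket Claim as Rook supports it today (an editor's example — names and the storage class are placeholders):

```yaml
# Sketch of an ObjectBucketClaim: a user asks for an S3 bucket the same
# way they would claim a PVC. Names and storage class are placeholders.
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: my-bucket
spec:
  generateBucketName: my-bucket       # prefix for the generated bucket name
  storageClassName: rook-ceph-bucket  # bucket storage class backed by RGW
```

Rook then creates the bucket and hands back a ConfigMap and Secret with the S3 endpoint and credentials for the application to consume.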
I'll pass it back to Alexander.

So, to talk about the Rook project and the community: as Blaine pointed out, Rook wants to be open source. As far as I'm aware it has always been open source, under the Apache 2.0 license. For communications we have Slack, GitHub discussions, and so on. And what is especially vital to any open source project? There are also quite some big companies behind it that maintain and contribute to it. It shows in the number of contributors to the GitHub project and the amount of container downloads — I think we are already at 400,000 now, but the number is always a bit hard to calculate, based on which architecture and the old Docker Hub download count per architecture as well. A lot of people use it, a lot of people like it — in the end, it's the magic of an operator for Ceph storage in Kubernetes.

We have graduated as a project from the CNCF, and well, it's been quite some years — October 2020 — so it's been some time. But as with most graduated projects, it's now more about evolving into what people need, making it even easier for them to run Ceph storage in Kubernetes.

Well, I partially talked about this already: as said, we graduated almost four years ago now, and Rook has been declared stable for quite some years. Even as a maintainer — I think saying it yourself is always a bit finicky — I've been running Rook-Ceph for six or seven years or so. And as we saw with the audience questions about where people run Rook — for example on-prem, on bare metal — that's exactly the use case I was able to cover with it in the beginning, and well, still am. And with the companies that contribute, and with people being able to use it as-is from the Rook project or downstream as a product, it shows how stable it is, especially nowadays, with more and more features making it more and more targeted towards the Kubernetes environment.

Regarding the release cycle: we try to do a release every four months or so — that's the schedule we have. 1.13 is coming this December, and 1.14 in the next year then, and in between we have the regular patch releases. Even if there's nothing critical, there are sometimes smaller features which are able to be put into the patch releases as well.

So yeah, one of the things about all the real-life deployments we've seen is that Rook-Ceph is just getting everywhere — there's really great adoption. And obviously there's the collaboration we have with different universities and other places. You can actually watch a talk that we did, right there on that link, that will tell you about the partnership that we have with all the different institutions around the globe. It's very interesting to see where it's actually popping up and who's using it, across the different geos and the different types of universities. So it's really interesting to just go through that.

I also wanted to tell you about application and business continuity, and this is something that we didn't really have originally. With Rook-Ceph, thanks to Ceph, we were really focused on high availability, and the fact that the cluster would remain operational even if a node was to go down or a single drive was to go down. As we saw more and more stateful applications getting into Kubernetes environments, we realized that we needed more than just high availability. So we started other projects — and we're going to name them on the next slide. The question was: how do we do DR on top of HA in a Kubernetes environment for stateful applications? Stateless is slightly different, so we really focus on stateful.
For those who are not familiar, let's just come back on two terms: RPO and RTO. RPO is recovery point objective — where is the data from which you're going to be able to do the recovery? So basically, what amount of data have you lost until you can restart the application? And RTO is recovery time objective, which is: how long is it going to take you to get the application back up and running?

For HA, remember what we said: the Kubernetes cluster has built-in HA, and Ceph has the same functionality, which basically maps to it. It uses different protocols, but Ceph has built-in HA where the monitors maintain the cluster map. So even if one mon dies — because the monitor pod itself crashes, or the node where one of the monitors is running goes down — we're still highly available. So we behave in a way that is aligned with what Kubernetes does.

We can also mimic the actual physical topology of the cluster. When we deploy the Rook-Ceph cluster, we can specify where we want the components to go. So if you have a Kubernetes cluster that is multi-rack, you're going to be able to apply that very same topology and tell Rook-Ceph: deploy my components aligned to the racks, so that if I lose an entire rack, my cluster is still highly available. We can even go further — we can push it, to an extent rarely seen, to multiple data centers. You could actually do that thanks to the Ceph topology, the features we have built into Rook, and the selectors and everything we use to choose where the pieces go.

Now, something that was missing: in the beginning a lot of people said, well, it's a Kubernetes cluster, you know, we just start pods, we start applications. But with stateful came another need, which is: how do we back up and restore an application? The state of the data is inside a PVC, right? So we maintain the state of the application through that data. What we had ignored in the beginning, because of statelessness, became more of an evident need — we need to do what we used to do in all the old ways of doing things, when we had real servers or mainframes or whatever: provide the concept of backup and restore. You're going to grab the state of the application — all the CRs, secrets, config maps, whatever — and also the content of the PVC, so that when you want to do a restore, not only do you restore the actual data, but you also restore the context of all the CRs of the application at the moment you took the backup. So when you restart the app after the restore, you're in the exact same state.

This is very good for logical protection. It usually offers a good granularity, depending on the tool you used to do the backup: some backup tools can actually do file-level recovery, others do not. So every implementation and every tool you choose will be different and offer various levels of granularity, specifically for the data that is actually on the PVC.
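As a small, tool-agnostic editor's sketch of the PVC data part (names and the snapshot class are assumptions): most Kubernetes backup tools build on CSI volume snapshots, which look roughly like this with Rook-Ceph:

```yaml
# Sketch of a CSI VolumeSnapshot of a Ceph-backed PVC — the building
# block most Kubernetes backup tools use for the data side of a backup.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: my-app-data-snap
spec:
  volumeSnapshotClassName: csi-rbdplugin-snapclass  # Ceph RBD snapshot class
  source:
    persistentVolumeClaimName: my-app-data          # the PVC to snapshot
```

The backup tool then pairs this data snapshot with the exported CRs, Secrets, and ConfigMaps so a restore brings back both the data and the application context.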
Now, the next step was: how do we do DR — so, business continuity? Two concepts came in. Metro DR, where you want to be able to lose the least possible amount of data and restart the application as soon as possible — that's going to be the lowest RPO and the lowest RTO. And we also have Regional DR, where your two data centers are too far apart for you to do synchronous replication. So you need to do asynchronous replication, and you assume — you accept — that at some point you're going to be missing some data because of the asynchronous replication process.

For HA, remember, that's the built-in Ceph thing. For backup, we have external solutions: it can be Velero, and for the people that run OpenShift it can be OADP-based backup and restore. Everyone uses what they like, and that's the whole beauty of Kubernetes and this environment — you really choose the tool that fits the bill, for metro and regional DR as well.

Metro DR actually leverages two separate Kubernetes clusters attached to the same external Ceph cluster. So when you want to do a failover, what you do is fail over the app from one Kubernetes cluster to the other one, and you reattach the resources — the persistent volume claims and the underlying PVs — to the application that is restarted in the other cluster. So basically you lose no data, and your RTO is basically the time it's going to take to trigger the failover of the containers from the cluster that died to the other cluster. Always be careful: a lot of people like to say "we want that as automatic as possible", but there should always be someone who checks whether you really want to fail over, because once you start the failover — before you are able to fail back, you need to wait for a full failover so that you avoid any problem. You can automate that, but just be sure about the checks you do.

We also leverage another open source project called RamenDR, which helps you automate the collection of the CRs of an app, so that we can scale down the app in the source cluster if that cluster is still operational. You can also force the failover, to say: I cannot contact that cluster, so I cannot scale down the app. But the project is designed so that if the two clusters are up and you want to test the failover, RamenDR will actually scale down all the pods, all the deployments and everything in the first cluster, and then fail over the app in the other cluster.

Regional DR is the same concept, except that we're going to do asynchronous replication between two clusters — two Rook-Ceph clusters. We can do asynchronous mirroring for RBD-based PVCs — RBD is the virtual block device feature in Ceph — and for CephFS, which is the shared file system that Alexander was mentioning. So we support asynchronous mirroring for the two types of PVCs: depending on the app you have, some PVCs will be RBD-based and other PVCs will be CephFS-based. In both cases it's RamenDR that will be in charge of the failover of the app between the two clusters.
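To make the Ceph side of that concrete, here is a minimal editor's sketch of enabling mirroring on a Rook block pool (the RamenDR wiring and the peer-cluster bootstrap are not shown; pool name and settings are placeholders):

```yaml
# Sketch: enabling asynchronous RBD mirroring on a Rook CephBlockPool,
# the Ceph-side building block that Regional DR relies on.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  replicated:
    size: 3          # three copies within the local cluster
  mirroring:
    enabled: true
    mode: image      # mirror individual RBD images asynchronously
```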
And now Orit is going to give more details about day-two operations.

So, I want to talk about a very important day-two operation: upgrades. We all need, at some point, to upgrade the software, either to get a security fix, new features, or bug fixes. Upgrades can be cool and important, but they sometimes come with a bit of risk if something goes wrong. And with Rook-Ceph, where we actually store your data, if something goes wrong in the upgrade it means you cannot access the data — also for other pods. So we need to be a bit more careful, and we have several dependencies; sometimes people forget about them.

The first one is obvious: the Ceph version. You can upgrade the Ceph version separately, but you need to make sure that you don't need to upgrade Rook as well. We support backward compatibility for a few versions, but sometimes, if there's a big change in Ceph or a new feature, you have to upgrade Rook also. And, as Blaine mentioned, we use Ceph-CSI to provision persistent volumes, and it's basically bundled with Rook — so if you need a new Ceph-CSI, you may need to upgrade Rook, especially for new features. Of course, in both cases I always recommend: use stable versions for production workloads. Don't put your workloads on upstream latest — actually, never. I think upstream latest is only for us developers, or if you want to try something.

Then we have the Kubernetes version. And again, because Rook is an operator, you can actually upgrade Kubernetes without upgrading the operator — we actually support, I think, around six versions of difference. But at some point you may need to upgrade Rook, because maybe Kubernetes changed APIs, or even deprecated them, or changed something in the CSI interface that you want to consume. So don't forget, when you upgrade Kubernetes, to consider the Rook operator and look at the support matrix.

And last, especially on-premises: Kubernetes runs on an operating system, and you may need to upgrade that. It's actually the hardest part sometimes, and we have dependencies on the operating system. For example, sometimes new features in the operating system are required for new features in CSI — we are working on quality of service for Ceph-CSI that is going to use cgroup v2, which means a newer kernel for us. We also use the Ceph kernel clients, so if you need a new client feature that comes with a new kernel, you may need to upgrade your operating system. And another thing happens when you upgrade a kernel: you need to restart the nodes, and then you need to drain the node. At that point, first of all, it can take time — restarting a physical server can take a few minutes, sometimes long minutes, and there can be a large number of nodes. You may also need to drain the other pods running on your node: it could be a database that needs to sync its state into the volume, it can be a VM that you need to live-migrate. And because you are actually stopping the Ceph pods, you don't have access to the data. And we care about the data — in Ceph especially, but also in Rook, we want to make sure you're safe.

So I'm going to talk a bit about the HA that JC mentioned and how we do it. The default failure domain is node. What that means: we don't only place your data on different OSDs, which map to different disks — we actually store it on different nodes. So if your node goes down, you still have two copies of your data; you have a quorum, you have a majority. If there were to be an additional failure, we won't allow writes, because then we have only one copy: we could lose it, and when the OSDs holding the other two copies come back up, they would not know they are not up to date. So you want to avoid two failures at the same time, and in Rook we want to make sure that if you are doing an upgrade, or any operation stopping the OSDs or other Ceph pods, you won't get into that situation of having only one copy. For that we use a Kubernetes mechanism called a PDB — a PodDisruptionBudget — that allows you to define rules about disruption for specific types of pods. In our case, if a pod of type OSD is stopped, you cannot stop another OSD pod with a different node label, because the failure domain is node.
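For the curious, here is a rough editor's sketch of what such an OSD disruption budget looks like — Rook creates and adjusts these automatically during drains, so you don't write them yourself:

```yaml
# Sketch of the kind of PodDisruptionBudget Rook manages for OSDs
# (managed by the operator; shown only to illustrate the mechanism).
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: rook-ceph-osd
  namespace: rook-ceph
spec:
  maxUnavailable: 1        # only one OSD may be disrupted at a time
  selector:
    matchLabels:
      app: rook-ceph-osd   # applies to all OSD pods
```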
We do also support better, or more granular, failure domains: rack and availability zone. An availability zone is basically like a co-located data center — usually a bit far away, not very far, but far enough that it will be fault-isolated, with different power, so that if there is a fault in one zone, the other zone is not affected. You can see AZs as the cloud concept. And we can actually deploy what we call stretch clusters, because usually the network latency between the zones is higher. There we make sure, like I mentioned, with the mons — the Ceph monitors, which are like the control plane for Ceph — that each monitor is placed in a different zone, and we make sure that each replica of the data is in a different zone. So if you have a zone failure, you have two copies of the data, you can still write the data, and you can also move your pods to another zone and have the data available. This is high availability, and again, here we use PDBs, but instead of the node label we either use the rack label or the zone label.

So now let's go back to the OS upgrades that require node restarts. A node restart — I think, if it's not virtualized — will take more than 10 minutes, usually 20 minutes. So we want to make sure that the upgrades are not too long. If we do one node at a time on a three-node cluster, fine: half an hour. But we usually run larger clusters. So: 30 nodes, five hours. And those who actually run really large clusters — 300 nodes — can get to 50 hours. That's more than two days of the cluster upgrading. We want to make it better. If you use a more granular failure domain, you can actually — if you think about it — upgrade all the nodes in the same failure domain simultaneously. So here, with three nodes it's the same, but with 30 nodes, if we split them, let's say, into three racks, ten nodes per rack, you can basically upgrade ten nodes simultaneously, and you get the same time to upgrade as you have for a three-node cluster. For a larger cluster you would say: yes, I can do 100 nodes per failure domain. But you don't want to upgrade 100 nodes at once — this is not healthy, things can go wrong. So I would say you should do around ten nodes at a time, and then you still have only around five hours for the upgrade. It's much better and faster.

And you can ask: what if I don't have those failure domains physically? What if the nodes are not split across racks, so I cannot use the rack failure domain? But you actually can, because we don't really look at the physical hardware — we use the Kubernetes labels. So you can actually manually add a rack label: let's say on your 30-node cluster, you can divide it, ten nodes per virtual rack, and this way you can expedite your upgrade.
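As an editor's sketch of that "virtual rack" idea (label, node, and pool names are placeholders): nodes carry a rack topology label — which should be in place before the OSDs are created — and a pool can then spread its replicas across that failure domain:

```yaml
# A node carrying a "virtual rack" label (normally applied with
# `kubectl label node node-1 topology.rook.io/rack=rack1`,
# ideally before the OSDs are created).
apiVersion: v1
kind: Node
metadata:
  name: node-1
  labels:
    topology.rook.io/rack: rack1
---
# A pool that spreads its three replicas across racks instead of nodes.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: rack
  replicated:
    size: 3        # one replica per rack
```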
One more recommendation: a failure domain should have at least five nodes. Why five? Because let's say you have two: you lose a node, and at that moment we want to put the data on a different node within that failure domain. So in that failure domain we have one option — one node — to write the data to. And it gets even harder if you need to recover: if it's a long-term failure, you have that one node, and Ceph will start its self-healing and want to move the data in that failure domain to a different node — but again, you have only one node to choose from. So not only do all the writes go to that node, but also all the recovery traffic, and you also need the capacity there. That's why I recommend at least five nodes: you lose a node, you have four nodes to split the load on.

So, we presented Rook to you. We tried to start with the basics — the introduction for anyone new — and added talks about business continuity, backup and restore, and upgrades for those who are more experienced. If you have questions, want to hear more about Rook, or want to learn more about Ceph: we have a booth in the project pavilion, we are there every day, those are the hours. Come ask us, talk with us, join the Rook community. Thank you. We have five minutes left as well, so we can probably take some Q&A. I think I saw a question over there — I don't know if there is a microphone that we have available.

So, especially in Kubernetes, the PVs are small — yeah, let me repeat the question: what's the benefit of running Rook-Ceph on AWS EBS? So, with EBS you pay for the size of the volume, but in Kubernetes we usually don't have those big volumes; we have small ones. And also, people sometimes say "I need 10 gig" but actually use one gig. So you don't want to create an EBS volume of 10 gig and pay for all of the 10 gig — you want to be thin. And the issue with EBS, for example gp3, is that the IOPS depend on the volume size, so small volumes have lower IOPS: you need at least a one-terabyte EBS volume to get really good IOPS. So you can install Ceph with one-terabyte EBS volumes and then provision lots of smaller PVs on top (a rough sketch of such a setup follows at the end of this Q&A). And you also get HA across AZs: EBS is per-AZ, it's not cross-AZ, but you can stretch Ceph — it's actually our default in AWS to use the AZ failure domain, and then we put a replica into each AZ. So if you have a zone failure, you have the data in the other zones and everything is highly available. Thank you.

Yeah, the question was whether we have any official documents on sizing. Correct me if I'm wrong — I don't think we have a really strong, dedicated sizing guide; I think we defer to Ceph's guide for that. Do you have anything to add? We basically defer to the Ceph documentation, because in the end you can, to some degree, map it one-to-one to Rook-Ceph. If you for example go to the documentation for the CRD definitions, in the section for resources there's at least a link — you might not have that link yet — for sizing in regards to disk sizes and so on, and what amount of memory; there's some info in the Ceph docs as well. I think we also have some notes on that. Besides that, I think it's best to also ask on the Rook Slack, with what your goal is in regards to usable storage and so on — that would be a point to keep in mind there. In regards to storage sizing — not necessarily CPU and memory — using the Ceph docs as a guideline is a good starting point.
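Here is the rough sketch promised above for the EBS pattern (an editor's illustration — the storage class name, count, and sizes are assumptions): Rook can request large gp3 volumes itself through a storageClassDeviceSet, and many small PVs are then carved out of the resulting Ceph cluster.

```yaml
# Sketch: CephCluster excerpt where OSDs run on PVCs backed by large
# EBS gp3 volumes, giving good IOPS while apps consume small, thin PVs.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  # ... cephVersion, mon, etc. omitted ...
  storage:
    storageClassDeviceSets:
      - name: set1
        count: 3                  # three OSDs, spread across the AZs
        portable: true            # OSD can move between nodes in its AZ
        volumeClaimTemplates:
          - metadata:
              name: data
            spec:
              storageClassName: gp3   # EBS-backed storage class
              volumeMode: Block
              accessModes:
                - ReadWriteOnce
              resources:
                requests:
                  storage: 1Ti        # large volume for good gp3 IOPS
```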
I think we might be able to squeeze in one short last question.

Hi guys, so my question is regarding encryption: does Rook provide encryption at rest, for example, or what kind of encryption does it provide? Thanks.

Yeah, so Rook provides encryption at rest: you can encrypt the OSDs, the daemons that run on top of each disk. It also provides encryption in flight, and that's something that I think was added in the 1.13 release, or maybe even more recently in 1.14. And then there are also some different encryption options for each PV or PVC.

I can add something about the PV encryption. We support it for RBD-based PVs: we use LUKS (dm-crypt) to encrypt the data, so it's basically client-side encryption — the data is written encrypted, goes across the network encrypted, and is kept encrypted everywhere. And for CephFS RWX PVs, we depend on a kernel feature called fscrypt. It's in the mainline kernel, but we are waiting for it to get into the downstream Linux versions, and then we'll have encryption for RWX PVs as well.

Yeah, I think that puts us at time, but thank you, everyone.
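For reference, an editor's sketch of where those cluster-wide encryption knobs live in the CephCluster CR (field values are illustrative; per-PV encryption is configured separately through the Ceph-CSI storage class):

```yaml
# Sketch: CephCluster excerpts for the encryption options mentioned above.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  network:
    connections:
      encryption:
        enabled: true          # encryption in flight (Ceph msgr2 secure mode)
  storage:
    useAllNodes: true
    useAllDevices: true
    config:
      encryptedDevice: "true"  # encryption at rest: OSDs created on dm-crypt
```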