Hello everyone and welcome to KubeCon, the first KubeCon to be in person as well as virtual. I'm Sébastien Han, and I'm joined by Rohan. We both work for Red Hat. I'm a Rook maintainer and I'm part of the storage team. And I'm also part of the storage team; I've been working on Rook Ceph. Today we will be discussing storage and networking with Rook Ceph running on Multus.

So before we dive in, let's take a step back and discuss some of the storage challenges that come with Kubernetes. As we all know, Kubernetes is a platform that manages distributed apps and also distributed controllers. Most of the time these apps are stateless, but when they do need storage, it's not so easy. Typically we have to rely on external storage, storage that lives outside of Kubernetes. And with that come some caveats, like the storage is not really portable if the environment changes. For instance, if you deploy on the cloud versus deploying on bare metal, you would have different storage and the overall experience would be a bit broken. Obviously there is the deployment burden, but this could just be the responsibility of the storage team. And how about data operations? If you want to increase the capacity of the cluster, who is responsible? Who is actually managing that storage for you? Do you have to go and ask a different team to increase the capacity, and how is the storage going to be allocated? These are all questions you have. And just to add to that, if you're not relying on external storage and you run in the cloud, you might be relying on provider-managed services like EBS, but this also means you are locked in. And cloud providers come with their own set of issues: you might be limited in the number of PVs you can attach to each node, and things like this. So there are also limitations when you use cloud-provider managed services.

And Rook is essentially the response to that, the response to how can I get this homogeneous experience of having storage that is inside Kubernetes and is being provided by it. Rook is open source. Rook is also commonly known as Rook Ceph and is a storage operator for Kubernetes. If you're familiar with the operator pattern, then you know that operators are logical entities responsible for bootstrapping software, managing it, and automating its deployment, configuration, and upgrades, so essentially the entire life cycle. This is what Rook does. Rook has been a CNCF graduated project since last October. And, like I just said, storage is being provided from within the Kubernetes cluster; it is almost just like another app that runs as part of Kubernetes. What we gain from that is this homogeneous experience regardless of the platform you might be running on. Your Kubernetes cluster may be on bare metal or may be running in the cloud, for example. What Rook does at this point is claim storage from those platforms, transform it into other entities that we will see in a minute, and then provide persistent storage to your applications.
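To make that concrete, asking Rook to bootstrap a Ceph cluster essentially means creating a CephCluster custom resource. Here is a minimal sketch; the Ceph image tag, mon count, and device settings are illustrative, not values from the talk:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v16.2.6   # illustrative image tag
  dataDirHostPath: /var/lib/rook       # where Rook keeps its configuration on each host
  mon:
    count: 3                           # three monitors for quorum
  storage:
    useAllNodes: true                  # let Rook consume raw devices on every node
    useAllDevices: true
```

The operator watches this resource and deploys, configures, and upgrades the Ceph daemons to match it.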
So what is Ceph? Ceph is also open source. Ceph is a distributed, software-defined storage solution and is really unique. Unique in the sense that not only does it provide many storage interfaces, it is extremely scalable with no single point of failure. It was really built for scale.

And one of the most unique features, as I said, is that it provides access and really allows you to consume storage through several interfaces. First of all, block: block devices, either from a kernel module, or a hypervisor plugin, or also using NBD, the network block device layer. It also exposes a shared file system, either using a native driver or using FUSE. And it also provides object storage, native object storage: it has gateways that are S3 compliant. If your app is designed to interact with S3, then Ceph, through its gateway, can just understand all of that and your app all of a sudden is compatible. It's like having an S3 in-house. All of these layers support snapshots, cloning, geo-replication, and the list goes on, honestly. We don't really have time to discuss all the goodness of Ceph, but this is just a highlight of what Ceph is really capable of. And last but not least, Ceph is robust and has really been battle-tested for more than 10 years. We've seen thousands of production deployments ranging from multiple terabytes to essentially petabytes of storage.

A little bit more now, now that we know about Rook and we know about Ceph: how do they fit together? What is the actual architecture you get when you run Rook Ceph? Essentially, it's divided into three, optionally four, pieces that we'll see. Rook, as mentioned, is the operator, the brain that owns the management of Ceph. Management means deploying, configuring, and upgrading. Then we have the Ceph CSI driver, which is a CSI driver that dynamically provisions storage. Effectively, what it does is provide persistent storage to your applications. Coming soon, we will also have a Ceph COSI driver. Actually, we already have it, but it's really alpha at the moment. What it does is essentially similar to CSI, but for object storage. So instead of claiming storage through either a block or a file system type of interface, you claim a bucket and put objects in it, just like you would if you interact with S3, for instance. And then we have Ceph as the data layer. This is where all the data transits. And just to point it out, Rook is not in the data path. It doesn't change anything whether you have one Rook instance running or many; Ceph is the data path, and Rook is not interacting with it.

In terms of dynamic provisioning, this is what we currently get from CSI as well as the bucket provisioner. On the very left, we have a traditional claim with RWO, read write once. Essentially, the app is claiming one block device. This block device is being attached to the node and comes through a storage class, which here is RBD, the RADOS block device. In the middle, we have file. But this could also be true for the block layer, because all the storage interfaces from Ceph support RWX; all of these access modes are supported by all the interfaces of the driver. So the app, for example, can consume file storage in an RWX fashion, which is read write many, meaning I'm mounting this storage multiple times and I'm reading and writing to it at the same time from various nodes. And on the very far right, we have the object functionality, which comes with a bucket claim today. So you would be creating an object bucket claim and you would get a bucket out of it with an endpoint and credentials, so access keys and secret keys if you're familiar with S3. And then you can start interacting and just connecting your app with that claim.
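As a rough sketch, those three kinds of claims look something like the following. The storage class names (rook-ceph-block, rook-cephfs, rook-ceph-bucket) are the ones used in the Rook example manifests and are assumptions here; your cluster may use different names:

```yaml
# RWO claim: one block device backed by RBD
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-block
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
  storageClassName: rook-ceph-block
---
# RWX claim: shared file system backed by CephFS, mountable from several nodes at once
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-shared
spec:
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 50Gi
  storageClassName: rook-cephfs
---
# Object bucket claim: Rook provisions a bucket and hands back an endpoint,
# access key, and secret key via a ConfigMap and a Secret named after the claim
apiVersion: objectbucket.io/v1alpha1
kind: ObjectBucketClaim
metadata:
  name: app-bucket
spec:
  generateBucketName: app-bucket
  storageClassName: rook-ceph-bucket
```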
So what about the networking in Ceph? How should we set up Ceph to be the most performant? Ceph supports two types of networks. Commonly we call them public and cluster, where the public one is client facing: a client connecting to Ceph will go over that network. And we have the Ceph cluster network, which essentially carries the internal replication, because Ceph is replicated storage. Once the client writes, that data will be replicated two more times, which gives you replica three, for instance, and this network is mostly responsible for replicating data, or backfilling it if you lose a node, and those types of internal operations. Of course Ceph works well with a single public interface only, but obviously if you add a dedicated interface for your cluster network, you will see a performance improvement, and that's what we'll also be showing later in the presentation.

This is an example of the topology that Rohan will walk you through during the demo; these are all the networks that will be present in the demo. And this is more or less a traditional Ceph deployment, regardless of whether it's part of Kubernetes or not. At the very top, we have the Rook operator, which is connected to the SDN network. That's the default network that you get from, for example, Calico; it is the default interface that all your pods are connected to. Then we have the second and third networks, which are the most interesting part, and which are actually provided by Multus: the Ceph public network as well as the Ceph cluster network. We can see that the OSDs at the very bottom are all connected through the cluster network, and they are obviously all also connected to the public-facing network. If we look at the very top right, we see a user writing data in its application, using RBD, and we can see that the traffic goes through the primary default SDN as well as the Ceph public interface, as mentioned earlier. Now I'm just going to hand it over to Rohan to walk you through the networking with Multus. Over to you, Rohan.

Thanks, Seb. So currently Rook has support for three networking methods. It has the traditional pod network, where the pods use the default Kubernetes networking which has been created by the administrator. Then we have host networking: in this case, the pods utilize the host network and the network interfaces on the host are visible inside the pod. And then Rook supports additional public and cluster networks for Ceph using Multus. Next slide, please. So we are using the macvlan type, and we are using the Whereabouts IPAM so that each and every pod gets a unique IP address throughout the cluster. Next slide, please.

This is a basic network attachment definition for the public network: it uses the macvlan type and is linked to ens5 on the host. For every network attachment definition that uses the macvlan type, we have to pick an interface on the host and refer to it in the master field so that it utilizes that interface. We have set the IPAM type to whereabouts and we are specifying the range 192.168.1.0. And similarly, we will create a cluster network attachment definition. So how do we tell Rook about this network? In the Rook spec, we have a network section. We specify that we are using multus as the provider. Then we have selectors for the public and cluster networks. The public one is pointing to rook-ceph/public-nw; here, the initial rook-ceph is the namespace of the network attachment definition, where it is created. And if we create the network attachment definition in the default namespace, we can use it throughout the cluster from any other namespace.
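For reference, those two pieces together look roughly like this. The interface name ens5, the rook-ceph namespace, and the IPAM range are from the slides; the exact NAD names, the /24 mask, and the matching cluster-nw definition are assumptions:

```yaml
# NetworkAttachmentDefinition for the Ceph public network: macvlan on ens5 with Whereabouts IPAM
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: public-nw
  namespace: rook-ceph
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "ens5",
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.1.0/24"
      }
    }
---
# Network section of the CephCluster spec (fragment, not a full manifest);
# a similar cluster-nw NetworkAttachmentDefinition is assumed to exist as well
spec:
  network:
    provider: multus
    selectors:
      public: rook-ceph/public-nw
      cluster: rook-ceph/cluster-nw
```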
Now back to Seb, who will talk about the performance details of Multus.

Okay, thank you, Rohan. So now we are going to dive a little bit into some of the performance analysis that we ran. What we are really trying to see here is not so much how many nodes we had, how many CPUs we had, what the amount of RAM was, or what type of disks we had, because at the end of the day this doesn't really matter. What we're really trying to determine is the performance boost we can get by adding more interfaces. That is really what matters today: understanding how much performance I can get if I add more interfaces. From the beginning it sounds really obvious: yeah, I'm adding more network interfaces, I'm getting more throughput, so I'm going to get more performance. But in what dimension, and at what kind of scale, are you going to get more performance? In the setup that we had, we had three interfaces, just like I mentioned earlier, and you will also see that throughout the demo: one interface for the default SDN network, a second one in each pod for the public network, and a third one for the private network. The private network, remember, only concerns the OSDs, because they are the ones writing, reading, replicating, and backfilling storage, so it's only needed by the OSDs.

So what's up with the random writes, in terms of IOPS and bandwidth? What type of boost did we get? We kind of got a similar boost. If we focus for a moment on the left diagram, the random write IOPS per node, with and without Multus: the blue line is with Multus, and the red line is without Multus. When you benchmark IOPS, typically you use just small blocks, right, like between 1 and 4, maybe 8k; beyond that, it's considered more of a throughput benchmark. So in terms of the random IOs we can get for writes, we go up to around 22,000 with Multus, and this means we gained about 35% at the very high end of the spectrum, which is really significant. Now, if we look at the right side, we see random writes again, but here we were really focusing on throughput. Obviously the IO size is much larger than when we test for IOPS; what we care about is really how much throughput, what bandwidth we can consume when we run those tests. Again, the blue line is with Multus and the red line is without Multus, and we can see that adding those extra networks to the setup really helps when running client IOs. This additional network dedicated to replication is really helping, and we see roughly a 22% gain at the end of the spectrum as well, so it's a really good performance boost. In terms of random reads, again for IOPS and bandwidth, if we look at the right side, we see that we also got something like a 22-25% gain with Multus. And then similarly for the random reads on the other side, we got a little bit less of a gain with and without Multus.
We don't see that much of a performance impact on reads. Most likely that's because reads are typically already cached in memory. We were using around 25 gigs of data, so it's always difficult to consume all of that, and when reads are just served from memory, you don't see that much of an improvement. So now I'll hand it over again to Rohan, who is going to walk you through the demo and the cluster that we have set up for you.

So I'll start sharing my screen and take over for a second. Here we have a running cluster, which is in a completely healthy state, and we have the public network set to 192.168.231.0 and the cluster network set to 192.168.232.0. If we look at all the component pods, we will see that every pod is running and none of the pods is in CrashLoopBackOff. Let's check the network attachment definitions that we have. We have created the network attachment definitions in the default namespace, and we have two: cluster and public. Now if we check the Rook Ceph cluster, in the network section we have specified the provider as multus, the cluster network is set to default/cluster, and the public network is set to default/public. So when I specify default here, it is the namespace where the network attachment definition exists. The cluster network attachment definition is in the default namespace, so we reference it here like this. If the network attachment definition were in rook-ceph, we would have written rook-ceph here instead of default.

Let's see the description of the network attachment definitions. We see that we have the cluster network attachment definition, which uses the interface eno2 on the host and has the IP range we just saw in the CephCluster settings. And then we have the public network, which uses the interface eno1 on the host and has the IP range we saw in the network config. Now let's check the pods for the networks. If we describe one of the mon pods, we can see that net1 has been added from default/public. And to apply the network to the pods, we apply this particular annotation, which specifies to use default/public in the pod. Now let's check one of the OSDs. As we know, the OSDs use the cluster network, so we can see that we have net1 from the cluster network and net2 from the public network. Only these three OSD pods are the ones which will have the cluster network attached; the rest will have only the public network. So this was all for the demo of the Multus cluster.
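For reference, the annotation in question is the standard Multus one; on the pods it looks roughly like this (the pod names are illustrative excerpts, not the exact ones from the demo):

```yaml
# Excerpt of a mon pod created by Rook: only the public network is attached
apiVersion: v1
kind: Pod
metadata:
  name: rook-ceph-mon-a-example    # illustrative name
  namespace: rook-ceph
  annotations:
    k8s.v1.cni.cncf.io/networks: default/public
---
# Excerpt of an OSD pod: both networks are attached, showing up inside the pod as net1 and net2
apiVersion: v1
kind: Pod
metadata:
  name: rook-ceph-osd-0-example    # illustrative name
  namespace: rook-ceph
  annotations:
    k8s.v1.cni.cncf.io/networks: default/cluster, default/public
```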
I'm going to resume sharing then, and we can wrap it up. So before we leave, here are some of the key takeaways from this presentation. What you need to remember is that separating Ceph networks when you run Ceph on Kubernetes with Rook Ceph is possible by using Multus. Multus really allows you to add more network interfaces to your pods. Using the Whereabouts IP address management is preferred: it really allows you to have a smooth deployment experience and get a unique IP address across the entire cluster, without using a DHCP server at all. There is a performance benefit from separating interfaces rather than just using a single one. And all of that is available since the Rook 1.7 release. Obviously, we keep adding more improvements with each release, so stay connected. So, well, thanks everybody for your kind attention. If you want to reach out to us, these are our email addresses. And here are some links for resources you might be interested in; all the projects that we referenced during this presentation are available through those links. So thanks again and enjoy the conference.