So, we're going to be doing a deep dive on Rook, which is currently a project within the CNCF, and particularly on EdgeFS as a new operator, which is similar to Ceph but designed for somewhat different use cases. A little bit about myself: I'm the founder and CTO of Nexenta. Recently we sold Nexenta to DDN. Before that, I was a Linux hacker for a very long time. I didn't write Linux itself, but if you're using iSCSI within the Linux kernel, for instance, that's work I did back in 2003.

So, what is Rook? Rook is essentially an orchestrator for cloud native storage, particularly in Kubernetes. It extends Kubernetes with custom types and controllers tailored for storage. It handles deployment, bootstrapping, configuration, provisioning, scaling, upgrading, migration, disaster recovery, monitoring, and resource management. These are all very complex tasks, and if you did this by any other means, you would essentially end up reimplementing something like Kubernetes. That's why we're leveraging Kubernetes for these purposes. Rook is also a framework for many storage providers. It used to be just Ceph a few years ago; these days it's more than Ceph. There's obviously EdgeFS, which I'm presenting, and there are other providers; we'll talk a little more about this later. It is hosted by the Cloud Native Computing Foundation and it is totally open source.

So, storage on Kubernetes: what is this, really? This is the example of a classic Kubernetes cluster with three worker nodes and one master, or maybe more than one master for high availability. And what do we see? We see processors, we see networking, we see disks. So why don't we just consume those workers in Kubernetes and build up a storage system, because orchestration is all we need. We want to run our software as pods or StatefulSets or something like that, consume the storage devices, and provide virtualized software-defined storage on top. So we essentially deploy storage into the cluster itself; it runs in and is managed by Kubernetes. We harness the power of Kubernetes for that: automated management by smart software, a portable abstraction for all our storage needs.

Rook is a framework in the sense that Rook by itself is not a storage data plane. It's really an orchestration framework for different providers. For instance, it does storage resource normalization: it discovers the disks and presents them to the provider so the provider can utilize them. And it doesn't have to be just disks; it can also be persistent volumes, which the operators can consume. There's the operator-pattern plumbing, which is shared between the different storage operators, along with common policies, specs, logic, and testing. These are the current providers we have: EdgeFS, Ceph, CockroachDB, MinIO, NFS, Cassandra. And more are coming this year, so watch out.

Now, what's in 1.0? We just recently released 1.0; we announced it at the previous KubeCon. 1.0 has two new operators: Apache Cassandra and Nexenta EdgeFS. We're going to talk a little more about EdgeFS today, and we'll talk about multi-homed networking as an example of what we've done for EdgeFS.

Okay, what is EdgeFS, really? First of all, it has a Git-like architecture.
And I just wanted to give you this snippet from the Git documentation so you can read about it and better understand how EdgeFS implements storage internally. There are a few important points highlighted in red. One: if two objects are identical, they will have the same SHA. The SHA is a fingerprint signature of the data block in Git. How many of you know how Git operates and are using Git today? Raise your hands. Yeah, I'm surprised there's anyone who didn't raise their hand, because Git is ultimately the most widespread source control management system. Why is it so widespread? Because it is a fault-tolerant and highly scalable system. And we saw that if you build a storage system with a similar architecture in mind, you can achieve enormous fault tolerance and availability. If two objects are different, they have different SHAs. That's how we detect all sorts of errors and corruption, and then we do automatic self-healing by fetching the data from different sites and regions.

EdgeFS objects are always cryptographically self-validated and therefore globally unique. What that means is that it doesn't matter where your data or metadata is located, it will always have exactly the same signature. Therefore I can match this signature and do global verification of it. Think about the benefits of a global cache, for instance: it's totally immutable storage, so you can cache it and you never need to invalidate the cache. It becomes really scalable at that point and very fault tolerant. Same as in Git, any modification is fully versioned. Essentially EdgeFS is a system which manages versions. Whatever you do with the system, whether it's a block device, object, file, or NoSQL (yes, we support NoSQL as well), any modification to that instance will be versioned, and those versions are what we manage and what we distribute across different locations.

It is a scale-out and protocol-unified object storage solution. Ultimately it is object storage; it is protocol-unified in the sense that it supports different protocols such as NFS and iSCSI, and it supports S3, with variations of S3 particularly for IoT use cases. Most importantly, it is designed for geo-transparency and IoT use cases, and we're going to talk about this more.

All right. So Rook EdgeFS is essentially deployed as a Kubernetes operator. When you look at your StatefulSet, you will see that this StatefulSet represents the targets of the storage system. It's full service lifecycle management: you can install, update, do rolling upgrades, uninstall, reinstall. The whole thing is managed by a so-called custom resource definition. If you want to upgrade, you just open your editor, edit the version, and move to the next version. It knows how to upgrade correctly: it will take care of the interdependencies, which are complex in a storage system, and it will do the right thing. It is also tightly integrated with Prometheus and Grafana, so you can monitor the behavior of a particular data segment. For ease of use there is a built-in GUI. Sometimes it's a tedious job to go and edit the CRD to change something, so we provide a GUI wizard which lets you modify the CRD without needing the actual editor.
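To give a feel for that upgrade flow, here is a minimal sketch of the fragment of the cluster CRD that changes; the field name follows the Rook EdgeFS examples as I recall them and should be treated as illustrative rather than the authoritative schema:

```yaml
# Sketch only: upgrading by editing the EdgeFS cluster CRD and re-applying it.
# The operator notices the new tag and performs the rolling upgrade, handling
# the interdependencies. (Field name is illustrative; verify against the Rook docs.)
spec:
  edgefsImageName: edgefs/edgefs:1.2.0   # was, for example, edgefs/edgefs:1.1.0
```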
But the editor is good too, because that's your DevOps style, right? You can edit the CRD, commit it to a repository, and then have control over what's been changed, the history of changes to your infrastructure. It runs in embedded environments as well as in very large-scale environments. When it runs embedded, the operator can run in just one gigabyte of memory and two CPU cores. What it does is consume locally connected raw disks or directories (and we're actually working on adding a mechanism for consuming persistent volumes as well, like in a cloud), and then present virtualized software-defined storage on top. It exposes different data protocols, and the most important one for today is S3, the object access protocol. Additionally, we provide S3X, which is a NoSQL abstraction and POSIX-like extension on top of S3. You can do very nice things with S3X; for instance, it outperforms plain S3 for scenarios like time-series and big-table databases by almost 100 times.

Scale-out, high-performance NFS: by scale-out and high-performance, this is really what it is. It goes to potentially hundreds of thousands of IOPS; you can reach a million IOPS if necessary. Scale-out iSCSI block devices: you can have many block devices in the same segment, scattered across various persistent volumes or applications, and when you write to one device, it's really leveraging all the disks, servers, and zones connected to it.

EdgeFS connected data: the important aspect of EdgeFS is that it can connect multiple segments together. It's not just one; it can be many segments, hundreds, really. Again, if you remember, I talked about Git and how Git is architected. Very similarly, we do reconciliation of the versions, and eventually we provide a fully consistent view of whatever you want to synchronize. Cloud connectivity means we can transparently present different cloud object formats internally as just metadata within the EdgeFS global namespace. Block-level geo-deduplication is critically important because it allows us to save quite a bit of bandwidth when we transfer data between links. Metadata-only transfers: because we know the fully consistent representation of all the changes we're making, we can determine that a particular data block doesn't need to be transferred at all, and all we need to move is metadata. This is a tremendous saving when you're synchronizing multiple clouds together. Local caching: obviously, when you're transferring just metadata, there is no data yet, so we have a local caching mechanism which allows us to locally cache, then pin, unpin, and clone datasets. Intelligent prefetching is also part of it: we don't wait for the I/O to complete, we just read ahead, and that is all built in and transparent to the application.

With all that, we get to geo-transparency, and it's real geo-transparency, where you do not need to think about data management complexities. You do not need to think about snapshot synchronization and things like that; it is transparent to the application. And it builds a global namespace, really. Protocol transparency means, for instance, that you can see NFS files in S3, and you can see S3 objects in NFS, transparently.
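To make the protocol-exposure part a bit more concrete, here is a minimal sketch of how an S3 gateway might be requested through the Rook EdgeFS operator; the kind and field names follow the Rook EdgeFS examples as I remember them and are illustrative, not authoritative:

```yaml
# Sketch: ask the Rook EdgeFS operator for an S3 gateway in the cluster namespace.
# Field names (e.g. "instances") are illustrative; check the Rook EdgeFS docs for the exact schema.
apiVersion: edgefs.rook.io/v1
kind: S3
metadata:
  name: s3-gateway
  namespace: rook-edgefs
spec:
  instances: 1   # number of gateway pods serving the S3 protocol
```

The same pattern is used for the other protocol heads (NFS, iSCSI, S3X): each is its own small custom resource that the operator turns into a deployment wired to the back-end targets.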
And multi-protocol means you can serve different protocols from one segment, whatever your application needs.

A little bit more about the EdgeFS cloud connectors and what we do. Most importantly, when you're talking about cloud, you're talking about object storage, right? So we're talking about S3 everywhere, transparently syncing regions. Metadata-only synchronization with local caching is critically important, because otherwise you're going to be paying double or maybe triple, depending on how many connected sites you have. We currently support AWS, Google Cloud, Microsoft Azure, and Alibaba has also recently been added. Most importantly, it operates on unmodified native objects. What that means is that an S3 object in AWS can be seen as an OSS object in Alibaba and vice versa, and you can set up synchronization with just metadata, so you don't actually need to move the data.

So EdgeFS is really a multi-cloud layer with geo-scalability, which spans geographically distributed sites connected as one global namespace. It's a Git-like architecture with fault tolerance and an immutable, versioned metadata design. It scales equally well for object, file, built-in NoSQL, or block devices. Geo-transparency means always-on, bidirectional access to the same S3 bucket or NFS export in different regions, automatically. There's a last-writer-wins update strategy for S3 objects: if two users or applications update the same S3 object, the last writer takes precedence if versioning is not enabled for that particular bucket. Snapshot view groups are the concept we have for NFS and iSCSI synchronization, and geo-consistency is what we get from those groups, which literally float within the connected geo namespace. What does floating mean? It means that when I create a snapshot in, say, one segment, that snapshot is transparently available in the other segments. You don't need to send it, you don't need to transform it manually; it just appears in the other segment. Any-granularity protection: we protect files, directories, buckets, logical units, and NoSQL databases. It's all built into the paradigm of geo-consistency.

Geo-locality and active LRU caching: with EdgeFS you don't need to worry about the latency of access to your data, because your latency is always local. If you have a local SSD, your local writes have the latency that SSD provides, and your local reads have the latency that SSD provides. This is critically important for IoT-connected devices, because latency essentially kills the whole idea. The idea is great: yes, you need to move your computing to the edge, but how would you do that without something like EdgeFS? Segments are synchronized asynchronously, which means consistency across geographies is eventual. Obviously, we cannot break the laws of physics; there is latency between different locations. Yes, there is 5G on the horizon, but it's going to be another five years before 5G is really available for everybody. And even when it is available, you're still going to have problems like: okay, my link is disconnected, what am I going to do? How am I going to reconnect, how am I going to synchronize, and so on. That's why we designed, developed, and brought in EdgeFS to solve that issue.
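The piece that links segments (and cloud endpoints) together is the inter-segment gateway. As a rough sketch of how such a link might be declared through the Rook EdgeFS operator: the kind name ISGW matches the Rook EdgeFS examples as I recall them, but the fields shown are hypothetical placeholders and the real schema should be taken from the Rook documentation.

```yaml
# Sketch: a link between two EdgeFS segments (or a segment and a cloud endpoint).
# "direction" and "remoteURL" are illustrative field names; the endpoint address is a placeholder.
apiVersion: edgefs.rook.io/v1
kind: ISGW
metadata:
  name: isgw-to-remote-site
  namespace: rook-edgefs
spec:
  direction: send+receive                       # bidirectional, metadata-first synchronization
  remoteURL: ccow://remote.example.com:14000    # hypothetical remote segment endpoint
```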
Now, some use cases to mention. There are different use cases; I think the most important ones are here. The multi-cloud CDN workflow: it's obviously a classic content delivery network. You avoid full replication, you can pin and unpin cached content, and you configure a primary source from AWS, Azure, Alibaba, and others. Cloud high availability: a cloud can fail, your site can fail, so you need automatic failover over the cloud links; redundancy is critically important. Operate in offline mode: yes, a link can be disconnected, but locally we can cache the data for up to seven days and then eventually synchronize, and after synchronization is complete your data will be exactly as consistent as it was before. Edge and IoT to and from cloud: capturing edge data in a local cache and providing it to clouds for AI/ML processing is exactly what the S3X protocol was designed for. Most importantly, we do not need to send data if it is deduplicated, and if we do send, we can essentially guarantee that what we send actually needed to be sent. Accessing a global namespace transparently while avoiding the need for full replication is very important for this use case. And because we are rooted in Rook and started with Kubernetes, we natively support persistent volumes across clouds, and it's actually bidirectional: by leveraging view groups and snapshots, you get geo-transparent synchronization. Yes, it can use metadata only; and yes, it supports CSI for file and block PVs, and obviously consistency groups. Data segmentation and region awareness are also supported, meaning that if you have, let's say, a large Kubernetes cluster and you want the most efficient access to the data, our logic will select the closest segment and run your application against that segment.

All right, so one of the topics we want to talk about today is networking. Within Rook, we're currently leveraging just the pod network, which, as you know, in Kubernetes is one big flat network. Which is great, right? Very easy. You have 1G, 10G, 25G, 40G, 100G, whatever. This is great, but it's sharing the same pod network with converged applications, and this is a problem. When we're running applications on the same Kubernetes cluster alongside the storage, you probably want some sort of isolation. So, not anymore: what we came up with is a multi-homed network. The concept is that we create different networks for different specific purposes, while the default network stays responsible for Kubernetes services, the API, kubelets, and so on; liveness and readiness probes live there. With that, you can attach your application that needs specific traffic to certain network interfaces, all orchestrated within the same pod. If you want to look at a definition and an example of how it looks: here you see the Flannel-based definition, and you see there is a rook-edgefs namespace for which this network was allocated. We selected Multus CNI for very practical reasons; it's simply the first CNI multiplexer we selected, and we're going to add CNI-Genie a bit later. But ultimately, why Multus? Because it's backed by Intel, and we know that as far as networking goes, Intel is probably leading the space, so it makes sense to select Multus. It gives you a flexible selection of SDNs.
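As a rough sketch of what such a definition looks like: the NetworkAttachmentDefinition resource comes from Multus, while the name and the inner CNI config below are illustrative and depend on the SDN you pick.

```yaml
# Sketch: a Multus NetworkAttachmentDefinition dedicating a back-end network
# to the rook-edgefs namespace. The inner CNI config JSON is illustrative.
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: backend-net
  namespace: rook-edgefs
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "flannel",
      "delegate": { "isDefaultGateway": false }
    }'
```

Pods then opt into this network via the `k8s.v1.cni.cncf.io/networks` annotation, so only the storage pods in that namespace ever see the back-end traffic.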
You can do not just Flannel; you can do Calico, you can do others, you can do DPDK, SR-IOV, all sorts of things with Multus. It enables namespace isolation, which is an important feature, because whatever you define in that namespace will be a network just for that namespace. Other pods will never be able to get an IP address from this network, so it's totally isolated, down to separate networking devices.

Okay, what does it mean for EdgeFS? From the EdgeFS point of view, it means improved performance characteristics, and we've actually done some work on measuring this. It also means improved data security: if your application ingress and egress run on a different pod network, there is no way they can see what's happening on the back end. The back end is often shared between various tenants and users, so it's critically important to isolate the application network from the back-end network. This is how it used to be: just one pod network. And now we're transitioning, in this particular example, to front-end isolation and back-end isolation. You can have more than two networks; it's configurable, flexible, and so on. The most important gain here is improved QoS and SLA: when there are no packets floating in from the application side, we can better guarantee QoS, which means better response times and therefore a better SLA when it's isolated like this.

Okay, so the demo setup. This is a very typical demo setup, though in production we run larger setups than this. Here you see a four-node cluster, so one, two, three, four hosts. After the deployment, you will see target pods; in my example each is 4TB. We have an SSD for metadata offload and HDDs of 1TB each, so we get one pod of 4TB, another pod of 4TB, another pod of 4TB, and it's really a StatefulSet we're talking about. Additionally, we have an S3X deployment, a Kubernetes Deployment, so we get an S3X pod which serves that protocol and is connected to the back end and the front end. We also have CosBench, which is essentially a workload generator that runs on a separate host, just to simulate the behavior and highlight the fact that with a multi-homed network you get significantly better numbers.

Now, if you take a look at the Rook EdgeFS cluster CRD, a typical CRD looks like this. What do you see here? You see the name of the cluster and the namespace; from the previous slide, you remember that we created the dedicated networks for that namespace. You see the image version (we're selecting 1.2.0 here), the service account for our back end and other primitives, and the local host directory where EdgeFS is going to store the configuration files of each target. Then you see some parameters which come from the Rook framework, like: we want to use all the nodes and all the devices; we dynamically detect all this configuration and build it up totally automatically. And some configuration such as enabling read-ahead, some optimizations for performance, metadata offload and other settings, and some resource limitations. When we apply this CRD, the Rook operator will notice that this is EdgeFS and will run the particular operator implementation which builds all of that up for you.
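For reference, here is a minimal sketch of that cluster CRD, reconstructed from the description above; the field names follow the Rook EdgeFS examples as I recall them, so treat them as illustrative and check the Rook documentation for the authoritative schema.

```yaml
# Sketch of a Rook EdgeFS cluster CRD along the lines described in the talk.
# Field names are from memory / illustrative; consult the Rook EdgeFS docs before use.
apiVersion: edgefs.rook.io/v1
kind: Cluster
metadata:
  name: rook-edgefs
  namespace: rook-edgefs           # the namespace the Multus networks were defined for
spec:
  edgefsImageName: edgefs/edgefs:1.2.0   # image version the operator deploys and upgrades
  serviceAccount: rook-edgefs-cluster    # service account for the back-end primitives
  dataDirHostPath: /var/lib/edgefs       # host directory for each target's config files
  storage:
    useAllNodes: true              # let the operator discover all nodes...
    useAllDevices: true            # ...and all attached raw devices (SSDs/HDDs) automatically
  resources:                       # resource limits for the target pods
    limits:
      cpu: "2"
      memory: "4Gi"
```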
So you don't have to edit the configuration by hand: it will automatically detect SSDs, automatically detect HDDs, and build the most optimal pairs for you. This is essentially all you need to write to get the cluster up and running. You can play with this afterwards if you want.

So, here we go: we've done some performance analysis. The results were pretty much as expected; I was actually expecting even better numbers than this, but they are still quite significant. What do we see here? Better response time: response time in the multi-homed case is better by almost 30% compared to leveraging just the one pod network. You may wonder why there are so many milliseconds. Well, because we have 128 write threads and 256 read threads, and obviously we're bounded by having only 16 disks in that system. I've done this on purpose, because I wanted to highlight how the system behaves with a multi-threaded application. The example is 2MB S3 object transfers, three syncing replicas, and a 6TB random-I/O dataset. So it's a real test case, a real use case of how a system can benefit from multi-homing. On I/O bandwidth, you see even better numbers: 40% better bandwidth with the multi-homed network versus just the pod network. So we can conclude that if we enable the multi-homed network, we get significantly better response time and better bandwidth, in addition to the other benefits I highlighted, like better isolation and data security.

To summarize this brief introduction to Rook EdgeFS: first of all, the Rook community is growing. We're adding more providers; there are a few providers from Red Hat coming out and a few other independent providers. We're improving the existing providers and building new practices for how to contribute. The top contributors I would like to highlight are Upbound, Red Hat, Nexenta, and SUSE; these are the companies putting the effort into making this better. Rook EdgeFS is emerging to address multi-cloud and edge/IoT needs. We added Rook EdgeFS because we believe that without EdgeFS, or something like it, the problems of edge computing will not be solved sufficiently and efficiently. That's why we wanted the Kubernetes community to pay attention to Rook EdgeFS and how it can be used for those sorts of use cases. The multi-homed network is going to be available in the upcoming 1.1 release. You've also learned that a multi-homed network can give you significant isolation for data security purposes while also improving performance, and that it's possible to do this in Kubernetes itself without introducing any additional layers or custom modifications. It's all built cleanly into Rook and will soon be available not just for EdgeFS but also for Ceph. And thank you very much. I can take a few questions; I have three minutes left. If you have questions about the architecture, go ahead.

Thank you for your presentation. I have two questions. Basically, EdgeFS provides S3 and S3X-like APIs, and the data is stored on local disks or blob storage in a public cloud. What's the difference between EdgeFS and MinIO? Can you compare them?

MinIO is just object storage, fairly simplistic object storage. It runs on top of directories, repurposing existing infrastructure.
EdgeFS is more of an enterprise object store, so to speak. It runs at high performance; it can run on top of raw disks, for instance, and it consumes raw disks without a file system. EdgeFS also provides good compatibility with S3 and has additional options such as S3X, which is designed more for big-table-style cloud scalability and geo-transparency. Additionally, EdgeFS is protocol-unified, so you have block and NFS, which MinIO doesn't.

Yeah. The other question is about the metadata storage of EdgeFS. I noticed that you have an LMDB configuration in your configuration, but I think LMDB is not scalable, so...

So, we've done a lot of work on LMDB, but as far as metadata goes, metadata is globally scaled out, so there is no centralized metadata server. With this architecture, metadata exists on all the pods, so it's granularly distributed, and because we're using the Git-like architecture, we can handle this very easily, because it's always immutable; essentially it's sharded across multiple locations and multiple devices.

But just to make sure: you mean the metadata is automatically sharded across every node, but how do you guarantee availability if, for example, one copy of the metadata is destroyed? How do you do HA?

So EdgeFS provides replica counts and erasure coding, which essentially give you high availability, and that is in addition to the segmented storage paradigm where you can synchronize between clouds. Thank you. Great questions. Any other questions?

I have a question. Is EdgeFS open sourced? I couldn't find it on GitHub or anywhere else.

Yeah, so we have great news: we're open sourcing EdgeFS within the next month or two, so it's coming out. This is very good news for the Linux Foundation and the community. We will also be working with the LF Edge community to make it part of LF Edge at some point, so it's essentially becoming fully open source. Okay, any other questions? Thank you. With that, I guess we can conclude this session. Thank you very much for coming.