Good morning, good afternoon, or good evening, depending on where you're watching this virtual presentation. My name is Alex Chircop and I'm one of the co-chairs for the CNCF Storage TAG, and these are my co-presenters, Xing and Raffaele. Hi, Alex. My name is Xing Yang. I work at VMware in the cloud storage team. I'm a co-chair of Kubernetes SIG Storage and also a co-chair of the CNCF TAG Storage. Now it's Raffaele's turn to introduce himself. Good morning, good afternoon, good evening. My name is Raffaele Spazzoli. I work as an OpenShift architect, and I also help in the Storage TAG as a tech lead.

Brilliant, thanks everyone. So thank you for joining this Storage TAG presentation. We're going to cover a wide variety of topics today: what the TAG does and how you can join in and help. We work a lot with a number of the storage projects in the CNCF, so we'll cover the projects that we work with and the ones that are being reviewed. As part of the TAG, we also prepare a number of different documents, so we'll cover the three documents that we've been working on this year, including the storage landscape document, the performance and benchmarking document, and our cloud-native disaster recovery document, which I hope you'll all find very interesting.

As a quick intro to the TAG: TAG stands for Technical Advisory Group, and the TAGs were renamed from the CNCF SIGs earlier this year, primarily to avoid confusion with the Kubernetes SIGs. The TAG is fully open and everything we do is public. Our meeting agendas are minuted, and our conference calls are open every two weeks, on the second and fourth Wednesday of every month. We'd love to have more people from the community join us, whether you're from a vendor, from a project, or even just an individual contributor. Everybody is welcome, and all the recordings of our previous meetings are available too.

So who are we? The TAG is formed from a diverse set of users and developers with a cloud-native and storage focus. We're obviously trying to lead and promote cloud-native storage technologies within the community. We have a number of co-chairs who help with the coordination of the TAG, and a number of tech leads who act as subject-matter experts in their field, provide advice, and work on the different projects. Together with our co-chairs and leads and the community in the TAG, we work with the TOC via our liaisons, who are Saad Ali and Erin Boyd, both of whom were previously members of the TAG too.

So what do we do? What's the key purpose of the TAG? The TAG is there to help scale the TOC, right? So the main function for us is to provide education to our end users in the form of white papers and documents, to work on and review projects, and to provide guidance on those projects to the TOC; primarily we act as an extension of the TOC for the technical subject matter that we're focusing on in the storage space. Now, when we say storage, it's actually pretty broad: it's not just things like file systems or volumes, but includes things like databases, object stores, and key-value stores too. Some of the things that we've built over the last couple of years are materials that help inform developers, ops teams, and everybody who consumes cloud-native products and cloud-native ecosystems on how to best make use of the technology, because what we're realizing is that more and more we're seeing developers take advantage of these technologies.
And it's very important for the end users that are consuming the different types of cloud-native storage to be able to understand the different attributes of those systems and how they interact. Xing will be talking a little bit about that when she covers the storage white paper in a couple of minutes.

The second thing that we do with the TOC is help with the process of moving projects into the CNCF ecosystem. In the CNCF we have three levels of projects: sandbox, incubation, and graduation. Sandbox projects have a low barrier to entry, and the idea is that they use the CNCF as a way of building out their community and ecosystem. Incubation projects are projects that have reached a good level of maturity, have proven their use in production with a number of end users, and have a healthy number of committers and a project roadmap. And finally, graduation is for the most mainstream, production-ready projects, and includes things like Kubernetes and Prometheus, for example. Graduated projects also benefit from additional levels of maturity, like security audits, and have some future-proofing in terms of making sure that multiple organizations are supporting them. As part of the help that we provide the TOC, we often see presentations from different projects, and we help with the due diligence reviews to move them to either incubation or graduation. Some of the time we're just doing discovery and outreach to projects, and other times we're specifically helping the TOC, working with a sponsor from the TOC to actually review the projects. One of the important points is that we operate in a consultative mode, and ultimately the TOC has the final say in every decision; once the TOC has voted on a project, it goes out for a final public review.

Of course, the other thing is that we would be nothing without the end users and the community. So we try our best, through surveys but also by reaching out at these sorts of virtual events, to gather your input and feedback and to understand what are the best things we can do to help the community. We're actually just going through a planning phase right now, so if you have input, or there are things that you'd like to work on with us or that you'd like us to work on, please provide that input either on the mailing list or by joining one of our calls. I talked a little bit about the sort of expert advisor status that we hold with the TOC, where we help with the due diligence for reviewing projects; we also periodically check in with some of those projects and review their health, et cetera. But again, in all cases the TOC makes the final decisions on those projects. So now I'd like to hand over to Xing, who will talk a little bit about some of the projects that we focus on in the storage space for the CNCF. Xing.

Thanks, Alex. I'm going to talk about graduated and incubating CNCF projects. Rook is a graduated project. Rook is an open source cloud-native storage orchestrator for Kubernetes. Rook turns distributed storage systems into self-managing, self-scaling, and self-healing storage services. Rook supports multiple storage solutions, each with a specialized Kubernetes operator to automate management. It has stable support for Ceph and alpha support for Cassandra and NFS. Vitess is a graduated project. Vitess is a database clustering system for horizontal scaling of MySQL. Currently, Vitess supports MySQL, Percona Server, and MariaDB databases.
It combines and extends many important MySQL features with the scalability of a NoSQL database. etcd is a graduated project. It's a distributed key-value store that provides a reliable way to store data across a cluster of machines. All Kubernetes clusters use etcd as their primary data store. It handles storing and replicating data for the Kubernetes cluster state and uses the Raft consensus algorithm to recover from hardware failures and network partitions. TiKV is a graduated project. TiKV is an open source distributed transactional key-value database built in Rust. It provides transactional key-value APIs with ACID guarantees. The project provides a unified distributed storage layer for cloud-native applications. It can also be deployed on top of Kubernetes with an operator. Dragonfly is an incubating project; it is also a project under TAG Runtime. Dragonfly is an open source P2P-based cloud-native image and file distribution system. It was originally created to improve the user experience of application, cache, log, and image distribution at very large scale. With Dragonfly, no matter how many clients start downloading a file, the average download time stays almost stable, without performance penalties. We also have Longhorn, which just became an incubating project. Longhorn is a distributed block storage system for Kubernetes. It is built using Kubernetes and container primitives. Longhorn creates a dedicated storage controller for each block device volume and synchronously replicates the volume across multiple replicas stored on multiple nodes. That's all for the CNCF storage projects. Next slide, please. Here are a few projects that are in review for incubation. Next, please. And here is a list of the other storage projects in the CNCF; there are a few more sandbox projects shown here. Next, please.

Now I will talk about the CNCF storage landscape white paper. In this white paper, we describe storage system attributes, the different layers in a storage solution and how they affect those attributes, and the definitions of the data access interfaces and the management interfaces. Next, please.

Storage systems have several attributes: availability, scalability, performance, consistency, and durability. Availability is the ability to access the data during failure conditions. Scalability can be measured by the ability to scale the number of clients, the throughput or number of operations, the capacity, and the number of components. Performance can be measured in terms of latency, the number of operations per second, and the throughput. Consistency refers to the ability to access newly created data or updates once they have been committed; systems can be either eventually consistent or strongly consistent. Durability is affected by the data protection layers, the level of redundancy, the endurance of the storage media, and the ability to detect corruption and recover the data. Next, please.

There are several storage layers that can impact these attributes. For example, rather than letting workloads access resources directly, a hypervisor can mediate access to resources, which can add overhead. Storage topology describes the arrangement of storage and compute resources and the data links between them; these include centralized, distributed, sharded, and hyper-converged topologies. Storage systems usually have a data protection layer which adds redundancy; this refers to RAID, erasure coding, and replicas.
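To make that last point a bit more concrete, here is a small worked example (my own illustration, not material from the white paper) comparing the raw-capacity overhead of simple replication with that of erasure coding; the specific figures of three replicas and an 8+2 coding scheme are illustrative assumptions only.

```go
package main

import "fmt"

// replicationOverhead returns the raw bytes stored per logical byte
// when keeping n full copies of the data.
func replicationOverhead(n int) float64 { return float64(n) }

// erasureOverhead returns the raw bytes stored per logical byte for a
// k+m scheme: data is split into k chunks and m parity chunks are added.
func erasureOverhead(k, m int) float64 { return float64(k+m) / float64(k) }

func main() {
	// Illustrative assumption: 3-way replication vs an 8+2 erasure code.
	fmt.Printf("3-way replication: %.2fx raw capacity\n", replicationOverhead(3)) // 3.00x
	fmt.Printf("8+2 erasure code:  %.2fx raw capacity\n", erasureOverhead(8, 2))  // 1.25x
}
```

The trade-off is not only capacity, of course: rebuild cost, latency, and CPU overhead differ as well, which is exactly why these protection layers come up again in the performance discussion later on.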
Storage systems usually provide data services in addition to the core storage functions, including replication, snapshots, clones, and so on. Storage systems ultimately place data on a physical storage layer, which is usually non-volatile; it has an impact on overall performance and on long-term durability. Next, please.

In this diagram, we can see that workloads consume storage through different data access interfaces. There are two categories of data access interfaces here; we call them volumes and APIs. Container orchestration systems have interfaces for volumes, which support both block and file systems. In Kubernetes, there are two volume modes: Filesystem and Block. Filesystem mode allows workloads to consume a file system directly; underneath, it can be backed by either a file or a block interface. Block mode allows a raw block device to be consumed directly by the workload. On the API side, we have the object store API, which stores or retrieves objects. Note that there is a Kubernetes SIG Storage subproject called COSI, the Container Object Storage Interface, which introduces Kubernetes APIs to support orchestration of object store operations for Kubernetes workloads. It also defines a set of gRPC interfaces so that an object storage vendor can write a driver for provisioning and accessing object stores. It is targeting alpha in the Kubernetes 1.23 release. On the API side, we also have key-value stores, which use an API to store and retrieve values based on a key; they are typically used to store state or configuration for distributed systems. And on the API side we also have databases, which are typically accessed through an API. Note that not all databases are cloud native. This can typically be addressed with additional tooling, like the use of proxies and orchestration systems, that allows them to be better suited to running in a cloud-native environment. Next, please.

Now let's look at orchestration and management interfaces. This diagram shows workloads consuming storage through data access interfaces, and there are two ways to integrate storage systems. The dark green box here, control plane interfaces, refers to storage interfaces directly supported by container orchestrators, including the Container Storage Interface (CSI), the Docker volume driver interface, and so on. CSI has three gRPC services: the controller, node, and identity services. The identity service provides plugin information; the controller service supports functions such as create and delete volume, attach and detach volume, and so on; and the node service supports functions such as mount and unmount volume. The orange box here is called frameworks and tools; this is an extension of the control plane interfaces. For application APIs, including key-value stores and databases, container orchestrators don't have direct interfaces yet, but some frameworks and tools do support them. For example, Rook supports Cassandra, and Vitess has an operator to manage MySQL clusters.
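As a brief illustration of the CSI service split Xing described, here is a minimal, self-contained Go sketch. It deliberately uses simplified stand-in interfaces and message types rather than the real generated CSI Go bindings, so the exact signatures are illustrative assumptions; only the RPC names mirror the spec.

```go
// Simplified stand-ins for the CSI services; a real driver would implement
// the gRPC services generated from the CSI spec instead of these interfaces.
package csisketch

import "context"

type PluginInfo struct {
	Name          string
	VendorVersion string
}

type Volume struct {
	VolumeID      string
	CapacityBytes int64
}

// Identity: reports who the plugin is and whether it is healthy.
type Identity interface {
	GetPluginInfo(ctx context.Context) (PluginInfo, error)
	Probe(ctx context.Context) (ready bool, err error)
}

// Controller: cluster-wide operations such as provisioning and attaching.
type Controller interface {
	CreateVolume(ctx context.Context, name string, capacityBytes int64) (Volume, error)
	DeleteVolume(ctx context.Context, volumeID string) error
	ControllerPublishVolume(ctx context.Context, volumeID, nodeID string) error   // attach
	ControllerUnpublishVolume(ctx context.Context, volumeID, nodeID string) error // detach
}

// Node: per-node operations such as mounting the volume for a workload.
type Node interface {
	NodePublishVolume(ctx context.Context, volumeID, targetPath string) error   // mount
	NodeUnpublishVolume(ctx context.Context, volumeID, targetPath string) error // unmount
}
```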
So that's all for the storage landscape white paper. Now I will hand it over to Alex to talk about the performance white paper.

Thanks, Xing. So we'll talk a little bit about the performance white paper. We were looking at the landscape in the storage space, where we categorized those different attributes, and one of the key attributes that we wanted to cover in further detail was performance, and then later on things like data protection and disaster recovery, which Raffaele will cover next. In the performance white paper, what we wanted to do is provide some information to help people understand the concepts behind how you measure performance, and we focused on two main areas: one for volumes and the second for databases. We'll probably add additional areas in future. For volumes we look at things like the latency, the number of operations, and the throughput, and for databases things like the transaction overheads, but of course also latency and other benchmarks related to transactional reads and writes within an environment.

One of the key things that we found when we were going through this with a number of people from the TAG was that there are a number of misconceptions when it comes to benchmarking, especially when it comes to making apples-to-apples comparisons. So we actually spent quite a bit of time defining some of the common pitfalls and other considerations that are important to understand because of their impact on the performance of the system. At a very basic level, for example, that means differentiating between the number of operations and the throughput, depending on other factors within the test: you might test the number of operations with lots of little objects or smaller block sizes, versus testing throughput with larger block sizes, and you need to account for factors like concurrency too. The other aspect which is really important is the data services, which Xing touched on: those different layers and how they affect performance. Understanding the topology, whether it's hyper-converged or centralized, accessed over a network or local, is obviously key, because the latency from the topology affects almost every aspect of performance. The same goes for the methods used for data protection, whether that's erasure coding or replicas, for example, the compression algorithms that might be used for data reduction, things like thin provisioning, and also the overheads used to protect the data, like encryption. On top of all these factors there are also some more complicated things that come into play: for example, understanding the different queue depths, and making sure that when you're trying to measure the storage system the bottlenecks aren't actually happening in the clients or in the back ends. So try to understand those factors.

One of the key things is the caching element, and certainly the easiest downfall that we've seen multiple times comes from not making sure that the size of your workload matches the caching capabilities of your system. We've seen many, many times benchmarks revealing numbers where the results far exceed the capabilities of the storage system, whether that's a volume or a database, and that's typically because the workload is smaller than the size of the cache, and therefore in reality the benchmark is really only testing the speed of the cache rather than the speed of the storage system. Of course, with all of these things, managing the environment and understanding the headroom in your environment is also key.
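To illustrate two of these pitfalls, here is a small Go sketch (my own illustration, not material from the white paper). The first part shows why an operations-per-second figure only makes sense together with its block size, since throughput is simply operations per second times bytes per operation; the second part shows the cache-size check described above, using hypothetical working-set and cache sizes.

```go
package main

import "fmt"

const (
	KiB = 1024
	MiB = 1024 * KiB
	GiB = 1024 * MiB
)

// throughputMiBps converts an IOPS figure at a given block size into MiB/s:
// throughput = operations per second * bytes per operation.
func throughputMiBps(iops, blockSizeBytes int) float64 {
	return float64(iops) * float64(blockSizeBytes) / float64(MiB)
}

func main() {
	// Illustrative numbers only: a small-block test stresses operations per
	// second, while a large-block test stresses throughput.
	fmt.Printf("100000 IOPS @ 4 KiB = %.0f MiB/s\n", throughputMiBps(100000, 4*KiB)) // ~391 MiB/s
	fmt.Printf("  2000 IOPS @ 1 MiB = %.0f MiB/s\n", throughputMiBps(2000, 1*MiB))   // 2000 MiB/s

	// Cache pitfall: if the working set fits in the cache, the benchmark is
	// measuring the cache, not the storage backend. Hypothetical sizes.
	workingSet := int64(8 * GiB)
	cacheSize := int64(32 * GiB)
	if workingSet <= cacheSize {
		fmt.Println("warning: working set fits in cache; results will overstate backend performance")
	}
}
```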
So, in summary, the important takeaway is: don't rely on published results, because they're very rarely useful for making comparisons. It's extremely hard to compare published results in an apples-to-apples way without a deep understanding of the test conditions, and we would always recommend that you run your own tests, on your own applications, within your own environment, because that's really the only way you're going to get a real-life representation of what you'll be able to achieve in your own environment. The performance document is still open for review and we would love to hear your input; if you have any suggestions, we'd love to hear from you. And now I'll hand over to Raffaele, who's going to cover cloud-native disaster recovery, which is the latest white paper we've been working on.

Thank you, Alex, and thanks everybody for coming today. In this white paper we submit to your attention the concept of cloud-native disaster recovery. It's an approach to organizing your disaster recovery strategy for your applications, and we think it's an alternative to more traditional disaster recovery approaches, something that cloud-native best practices enable today. I think it's something you should consider. It's not always going to be the best approach, but it should be something you know about, in your toolbox, so that when it's the right thing to do you know how to do it.

But what is cloud-native disaster recovery? Let's define it using this table, comparing it with more traditional disaster recovery, and by traditional disaster recovery here we mean what you find in large, traditional enterprises, not the web scalers, not the unicorn startups, but really normal enterprises. So let's go line by line; we'll cover this table quickly. The first difference is in the type of deployment for your workloads. For stateful workloads, which are obviously the focus here in TAG Storage, what we usually find is an active-passive type of deployment; rarely do we find companies that have the ability to do active-active across data centers or across regions. With cloud-native disaster recovery, you do active-active.

Another difference is how we detect that we are in a disaster situation. With traditional disaster recovery, the decision is usually made by a human: there is maybe an emergency room or a situation room where many people are meeting, and somebody says, okay, we are really in a disaster, we need to trigger the disaster recovery procedure. In cloud-native disaster recovery, the software needs to realize that there is a problem and trigger the DR procedure automatically, autonomously. Then there's the procedure itself: in traditional disaster recovery it's usually a mix of manual and automated tasks. Like I said, the trigger is definitely manual; maybe the rest is automated, depending on your maturity, and it's something that many companies test maybe once a year or a couple of times a year. In cloud-native disaster recovery, everything must be absolutely automated.

And then there are the two main metrics by which you measure your SLA for disaster recovery: RTO, the recovery time objective, and RPO, the recovery point objective. They respectively measure the downtime of your service and, in terms of how far back in time you go, the amount of data loss measured in time, the transactions that were lost because of the disaster. So for the downtime, in traditional disaster recovery you can have anything from close to zero to hours, depending on how well you organize your workloads. With cloud-native disaster recovery it's going to be close to zero; it can't really be zero, because some health checks need to trigger and traffic needs to be redirected to the healthy locations, but we measure it in a matter of seconds. And for RPO, the data loss, in traditional disaster recovery you can again have anything from close to zero to hours, depending on which approach you adopt. In cloud-native disaster recovery it's going to be zero if you use strongly consistent workloads, and something that is theoretically bounded but in normal situations very close to zero for an eventually consistent deployment.

Moving a little bit out of the technical aspects and talking about organizations and ownership: disaster recovery is formally owned by the application team, but in reality, in many traditional enterprises, what the application teams do is turn around and talk to the storage team and ask what kind of SLA the storage can provide, and whatever the storage team answers is what the application team adopts. So essentially the storage team drives disaster recovery for everybody, for all the applications. In cloud-native disaster recovery, the application team is clearly the owner of the disaster recovery process, and they have to pick the right middleware and the right storage components to be able to perform the disaster recovery. Another thing that we found in creating this cloud-native disaster recovery infrastructure is that, traditionally, the capabilities that enable us to build these disaster recovery architectures and procedures come from storage, typically in the form of backup and restore or volume replication, whether synchronous or asynchronous. In cloud-native disaster recovery, the capabilities that we need come mostly from networking, and they are in the form of east-west communication, by which we mean the ability of our data centers or our regions to communicate horizontally, east to west, and in the form of a global load balancer, something that sits in front of our data centers or regions and spreads the traffic under normal circumstances, but when there is a disaster is able to realize that one of the failure domains is not available and redirects the traffic to the available, healthy failure domains. Next slide, please.

So in this white paper, I want to cover a little bit of what you can find inside it. There is this high-level definition, then we have a section with some theoretical content that supports the definition and gives you evidence that these architectures can actually be built, so you can find some background knowledge on why it's possible to build these architectures and a lot of links if you want to go deeper. You'll find things like definitions of failure domain, high availability, and disaster recovery, and then the CAP theorem and everything that comes with it; then a description, or an anatomy, of how distributed workloads are organized with respect to being fault tolerant and being able to manage replicas and shards; and then we talk about consensus protocols, which are what allow all of the instances of a distributed workload to coordinate with each other.
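As a rough illustration of why those consensus protocols behave safely under failures (my own sketch, not an excerpt from the white paper), here is a tiny Go example of majority quorum, the rule used by protocols such as Raft and Paxos; the replica counts are illustrative assumptions.

```go
package main

import "fmt"

// quorum returns the minimum number of replicas that must agree in a
// majority-based consensus protocol such as Raft or Paxos.
func quorum(replicas int) int { return replicas/2 + 1 }

// canServe reports whether a partition that can reach `reachable` of the
// `total` replicas is able to keep accepting writes.
func canServe(reachable, total int) bool { return reachable >= quorum(total) }

func main() {
	// Illustrative scenario: 5 replicas split 3/2 by a network partition.
	// Only the majority side keeps serving; the minority side considers
	// itself unavailable, which is what prevents split-brain.
	total := 5
	fmt.Println("quorum needed:       ", quorum(total))      // 3
	fmt.Println("majority side serves:", canServe(3, total)) // true
	fmt.Println("minority side serves:", canServe(2, total)) // false
}
```

This is the same mechanism behind the network-partition behavior described in the reference architecture that follows.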
In the final part of the white paper we have some reference architectures for this planetary disaster recovery approach; in particular, we have two architectures, one for strongly consistent deployments and one for eventually consistent deployments. Next slide, please.

So this is an example of the kind of research that you can find inside this white paper. Here we have a list of products that can be utilized and deployed in a planetary disaster recovery fashion. Maybe I should say that not all stateful workloads can be used this way; you need to pick the right middleware, and we give you some criteria on what to look for. And then there is this list; it's not meant to crown kings, it's just a set of products that we have analyzed. As you can see in this table, we show the consensus protocol used for replicas, where Paxos and Raft are the most popular, and then the consensus protocol used for synchronizing inter-shard transactions. Next slide.

And this is an example of a reference architecture, in this case a reference architecture for a Kubernetes deployment and for a strongly consistent workload. So, if you can see my pointer there, we have the stateful workload; this could be a database, a key-value store, or a cache. As you can see, we have several data centers, at least three. The workload needs to be able to sync its state, and that's the east-west communication I was talking about. You will probably have a stateless front end in front of it, probably some kind of ingress to get traffic into the clusters, and there is a global load balancer in front of the workload. When a region goes down, for example let's say data center one here goes down, the workload will realize it and self-organize, maybe by creating more replicas in the remaining regions, and at the same time the global load balancer, using health checks, will see that one of the data centers is not available anymore and redirect the traffic to the healthy locations. Overall you don't lose any data, and the clients may experience some hiccups, but in general, if they have a good retry mechanism, they will just continue working.

Another aspect that we look at when we do these reference architectures is what happens if the ingress traffic is working correctly but the east-west traffic is somehow interrupted; that is really what is called a network partition in the CAP theorem. So we want to make sure that we analyze and describe what happens if a network partition occurs. For strongly consistent workloads, because of the quorum requirements and the leader election protocol, the partition that doesn't have the majority of instances will essentially consider itself offline and start behaving as not available, so you will have the same behavior that happens when there is a disaster: traffic will go to the healthy location, and again the clients will not experience an interruption of service or inconsistent behavior, because the state never drifts into inconsistent, split-brain scenarios. And I think that's it, so back to you, Alex.

Brilliant, thank you, Raffaele, and I think with that our presentation ends. I'd like to remind you that if you want to get involved, feel free to join our meetings, which are on the second and fourth Wednesday of every month. We have our GitHub repo, which contains all the links to our meeting minutes and our recordings. And again, to remind you: if you're interested at all in how you manage storage, things like block stores, file systems, object stores, key-value stores, databases, and all the topics around that related to things like performance, recovery, and scalability of cloud environments, we'd love to hear from you, and everybody is welcome. So thanks again, and we're happy to answer your questions now.