Good morning, good afternoon, and good evening, depending on where you're watching this. We'd like to introduce you to the CNCF TAG Storage session, where we're going to cover an intro to the TAG, some of the projects we're working on, and the landscape and technology in our space. My name is Alex Chircop. I'm the CEO of StorageOS, and I'm also a co-chair of the CNCF TAG Storage. My co-presenters are Xing and Raffaele, and I'll let them introduce themselves.

Thanks, Alex. My name is Xing Yang. I work at VMware in the cloud storage team. I'm a co-chair of Kubernetes SIG Storage and also a co-chair of the CNCF TAG Storage. And now it's Raffaele's turn.

Thank you. My name is Raffaele Spazzoli. I work at Red Hat as an architect for OpenShift. I'm also a tech lead in TAG Storage.

Brilliant, so let's dive right in. Very quickly, today we're going to cover an overview of the TAG and some of the storage projects that are being worked on in the CNCF and reviewed by the TAG. We'll also cover some of the documents and content we've been working on, including the storage landscape document, performance and benchmarking, and disaster recovery. As always, we're aiming to have a few minutes at the end for Q&A, so feel free to post questions as we go along.

Very quickly: the CNCF TAGs are Technical Advisory Groups. They used to be called SIGs, but they were renamed because of some confusion between the CNCF SIGs and the Kubernetes SIGs. All of our calls are open. We have meetings on the second and fourth Wednesday of every month, so you're welcome to join, have a look at the agenda, and join our mailing list. All the recordings of previous TAG sessions are also online on YouTube.

Fundamentally, we're a mixed bunch of people, ranging from technologists to business users and vendors, but all specializing in storage. We have a number of co-chairs who manage and plan the TAG's work, and a number of tech leads who are technical authorities in their space. We work with the TOC, specifically Saad Ali and Erin Boyd, who were both in the TAG before they joined the TOC.

Fundamentally, what we're here to do is help the TOC scale. We're here to provide technical expertise and to bring the storage community into the CNCF. A big portion of what we do is create documents and information that help end users understand the cloud native storage space, which is quite complex; Xing will talk a little bit about some of the things we're doing there. That ties into the second thing we do, which is projects.

Today the CNCF has three types of projects: sandbox, incubation, and graduation. Sandbox is the starting point for many projects' journey into the CNCF. It has a low barrier to entry and allows projects to build a community and start working towards the CNCF's intellectual property policies. Incubation projects have passed a due diligence step: they are used successfully in production and have a healthy number of committers, as well as a number of users who can vouch for the project. Graduation is for projects that have gone mainstream.
Kubernetes and Prometheus, for example, are some of the big graduated projects. Graduated projects also benefit from security audits and have committers from multiple organizations to ensure a longer life for the project.

The TAG helps the TOC with the review process: we perform the due diligence for projects that are going through reviews, and we typically work with a sponsor from the TOC, who will usually have a specific interest in the project. Finally, a couple of other things that we do: we provide input from end users, for example by running surveys and understanding end-user use cases, to help prioritize what to work on next, and to provide that feedback to projects as needed. We've also talked about the meetings and the community we're trying to build, which often has very interesting discussions about a variety of topics that then end up becoming documents that we publish. And finally, we're trusted advisors to the TOC, but no, we don't actually make decisions on projects. We provide advice and recommendations, and ultimately the TOC, who are voted in, make the final decisions for those projects. And with that, I will hand over to Xing, who's going to talk a little bit about some of the projects that we have in the CNCF.

Thanks, Alex. I'm going to talk about the graduated and incubating projects. Rook is a graduated project. Rook is an open source cloud native storage orchestrator for Kubernetes. Rook turns distributed storage systems into self-managing, self-scaling, and self-healing storage services. Rook supports multiple storage solutions, each with a specialized Kubernetes operator to automate management. It has stable support for Ceph and alpha support for Cassandra and NFS.

The next one is Vitess, also a graduated project. It's a cloud native database system for deploying, scaling, and managing large clusters of database instances. Currently Vitess supports MySQL, Percona Server, and MariaDB databases. It's architected to run as effectively in a public or private cloud as it does on dedicated hardware, and it combines and extends many important SQL features with the scalability of a NoSQL database.

etcd is a graduated project. It's a distributed key-value store that provides a reliable way to store data across a cluster of machines. All Kubernetes clusters use etcd as their primary data store. It handles storing and replicating data for Kubernetes cluster state and uses the Raft consensus algorithm to recover from hardware failures and network partitions.
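As a quick aside to make the key-value model concrete, here is a minimal sketch, assuming a reachable etcd member on localhost:2379, of how a Go application might write and read a key with etcd's official client (the key and value are made up for illustration):

```go
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Connect to a single etcd endpoint (assumed to be running locally).
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// A write is acknowledged only after it has been replicated to a
	// quorum of members via Raft, which is where the durability and
	// partition tolerance described above come from.
	if _, err := cli.Put(ctx, "/config/replicas", "3"); err != nil {
		panic(err)
	}

	resp, err := cli.Get(ctx, "/config/replicas")
	if err != nil {
		panic(err)
	}
	for _, kv := range resp.Kvs {
		fmt.Printf("%s = %s\n", kv.Key, kv.Value)
	}
}
```

This is essentially the same pattern the Kubernetes API server uses internally, just with its own key layout and encodings.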
TiKV is a graduated project. TiKV is an open source distributed transactional key-value database written in Rust. It provides transactional key-value APIs with ACID guarantees. The project provides a unifying distributed storage layer for cloud native applications, and it can also be deployed on top of Kubernetes with an operator.

Dragonfly is an incubating project; it's also a project under TAG Runtime. Dragonfly is an open source P2P-based cloud native image and file distribution system. It was originally created to improve the user experience of application image distribution at very large scales. With Dragonfly, no matter how many clients start downloading a file, the average download time stays almost stable, without performance penalties. So that's all for the CNCF storage projects. Next slide, please.

Here are a few projects that are in review for incubation. Next. And here is a list of all the storage projects in the CNCF; there are a few more sandbox projects shown here. Next, please.

I'm going to talk about the CNCF storage landscape white paper. In this white paper, we describe storage system attributes, the different layers in a storage solution, and how they affect the storage attributes. We also talk about the definition of data access interfaces and management interfaces. Next, please. I apologize, I hit the wrong button.

Storage systems have several storage attributes: availability, scalability, performance, consistency, and durability. Availability defines the ability to access the data during failure conditions. Scalability can be measured by the ability to scale the number of clients, the throughput or number of operations per second, the capacity, and the number of components. Performance can be measured in terms of latency, the number of operations per second, and the throughput. Consistency refers to the ability to access newly created data or updates after they have been committed; a system can be either eventually consistent or strongly consistent. Durability is affected by the data protection layers, the levels of redundancy, the endurance of the storage media, and the ability to detect corruption and recover the data. Next slide, please.

There are several storage layers that can impact the storage attributes. For example, rather than giving direct access to resources, a hypervisor can mediate access to them, which can add overhead. Storage topology describes the arrangement of storage and compute resources and the data links between them; this includes centralized, distributed, sharded, and hyperconverged topologies. Storage systems usually have a data protection layer that adds redundancy; this refers to RAID, erasure coding, and replicas. Storage systems usually provide data services in addition to the core storage functions, including replication, snapshots, clones, and so on. And a storage system ultimately persists data on a physical storage layer, which is usually non-volatile; it has an impact on the overall performance and on long-term durability. Next slide, please.

In this diagram, we can see that workloads consume storage through different data access interfaces. There are two categories of data access interfaces here; we call them volumes and APIs. Container orchestration systems have interfaces for volumes, which support both block and file systems. In Kubernetes, there are two volume modes: filesystem and block. Filesystem mode allows workloads to consume a file system directly; underneath, it can be backed by either a file or a block interface. Block mode allows a raw block device to be consumed directly by the workload.
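To make the two volume modes concrete, here is a minimal sketch, using client-go with a hypothetical claim name in the default namespace, of creating a PersistentVolumeClaim that requests raw block mode (note that the type of the Resources field was renamed in very recent client-go releases, so treat this as a sketch rather than copy-paste):

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes a valid kubeconfig at the default location.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	block := corev1.PersistentVolumeBlock // "Filesystem" is the default mode
	pvc := &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: "raw-device-claim"},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			VolumeMode:  &block, // ask for a raw block device, no filesystem
			Resources: corev1.ResourceRequirements{
				Requests: corev1.ResourceList{
					corev1.ResourceStorage: resource.MustParse("10Gi"),
				},
			},
		},
	}

	if _, err := clientset.CoreV1().PersistentVolumeClaims("default").
		Create(context.Background(), pvc, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```

A pod then consumes a block-mode claim through volumeDevices (a devicePath) rather than the usual volumeMounts, which is exactly the distinction between the two modes.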
On the API side, we have the object store API, which stores and retrieves objects. Note that there is a Kubernetes SIG Storage subproject called COSI, the Container Object Storage Interface, which introduces Kubernetes APIs to support orchestration of object store operations for Kubernetes workloads. It also introduces COSI as a set of gRPC interfaces, so that an object storage vendor can write a driver for provisioning and accessing object stores. It is targeting alpha in the Kubernetes 1.23 release.

Under APIs, we also have key-value stores, which use an API to store and retrieve values based on a key. They are typically used to store state and configuration for distributed systems. And on the API side, we also have databases, which are typically accessed through an API. Not all databases are cloud native; this can typically be addressed with additional tooling, like the use of proxies and orchestration systems, that allows them to be better suited to run in a cloud native environment. Next, please.

Now let's look at orchestration and management interfaces. This diagram shows workloads consuming storage through data access interfaces. There are two ways for storage systems to interact with container orchestration systems. The darker green box here, control plane interfaces, refers to storage interfaces directly supported by container orchestrators, including the Container Storage Interface (CSI), the Docker volume driver interface, and so on. CSI has three gRPC services: the controller, node, and identity services. The identity service provides info and capabilities of the plugin. The controller service supports functions such as create and delete volume, and attach and detach volume. And the node service supports functions such as mount and unmount volume.
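As a rough illustration of the shape of those gRPC services, here is a minimal sketch, using the Go bindings from github.com/container-storage-interface/spec, of a driver exposing just the identity service (the driver name and socket path are hypothetical):

```go
package main

import (
	"context"
	"net"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc"
)

// identityServer implements the CSI identity service, which the container
// orchestrator uses to discover the plugin's name and capabilities.
type identityServer struct {
	csi.UnimplementedIdentityServer
}

func (s *identityServer) GetPluginInfo(ctx context.Context, req *csi.GetPluginInfoRequest) (*csi.GetPluginInfoResponse, error) {
	return &csi.GetPluginInfoResponse{
		Name:          "example.csi.driver.io", // hypothetical driver name
		VendorVersion: "0.1.0",
	}, nil
}

func (s *identityServer) Probe(ctx context.Context, req *csi.ProbeRequest) (*csi.ProbeResponse, error) {
	// An empty response is interpreted as "healthy and ready".
	return &csi.ProbeResponse{}, nil
}

func main() {
	// CSI plugins conventionally listen on a Unix domain socket that the
	// kubelet or CSI sidecars connect to.
	lis, err := net.Listen("unix", "/tmp/csi.sock")
	if err != nil {
		panic(err)
	}
	srv := grpc.NewServer()
	csi.RegisterIdentityServer(srv, &identityServer{})
	// A real driver would register its controller and node services here
	// too; those carry CreateVolume, NodePublishVolume, and so on.
	if err := srv.Serve(lis); err != nil {
		panic(err)
	}
}
```

The controller and node services are where the real work happens; this stub just shows how thin the gRPC wiring itself is.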
The orange box is called frameworks and tools. This is an extension of the control plane interfaces. For application APIs, including key-value stores and databases, container orchestrators don't have direct interfaces yet, but some frameworks and tools support them. For example, Rook supports Cassandra, and Vitess has an operator to manage MySQL clusters. So that's all for the storage landscape white paper. Now I will hand it over to Alex to talk about the performance white paper.

Thank you very much. I will spend a few minutes covering the storage performance white paper, which is effectively a follow-on from the landscape white paper we had put together, where we decided to focus on some of the different attributes. The first one we focused on was performance, and the second was availability and disaster recovery, which Raffaele is going to cover next.

In the performance white paper, we aim to define all of the different concepts that we use for measuring performance and benchmarking, and we focused primarily on volumes and databases, in the full knowledge that there are of course lots of different systems for persisting information; we focused on two of the main ones. We ended up defining a number of things to educate users on how to compare benchmarking and performance in their environment, primarily because it is such an incredibly complex topic.

We look at some of the basics of operations and throughput: things like I/Os per second, or perhaps transactions per second in a database, or megabytes or even gigabytes per second when it comes to throughput. A lot of those figures are influenced by a number of different factors. When we're looking at operations, latency is probably the single biggest factor, and low latency is going to be key for anything that involves thousands of operations or transactions per second. But latency becomes less of a factor when you're managing sequential or throughput-sensitive workloads, because it's typically easier to achieve high throughput at higher latencies, simply because the units of work are larger. Database operations and small file system operations, for example, might be measured in 4K or 16K chunks, but sequential workloads like analytics might operate in 128K or one-megabyte chunks.

Just like Xing was explaining for the landscape white paper, a number of different factors come into play: topology, data protection, data reduction, and encryption, because all of those services add overhead. Topology: whether you're accessing data locally or over a storage network, for example. Data protection: comparing the different ways the data is protected, whether it's mirroring or replicas, some form of RAID, or even erasure coding. Data reduction, like compression and dedupe, can also have a huge effect on your performance and your benchmarking results, especially if the dataset you're working with is highly compressible or highly dedupable. And encryption for security is a very common factor nowadays; it obviously adds some overhead, but in many systems encryption is highly accelerated because it's already catered for in many frameworks.

Two other things come into play. The first is concurrency: things like the number of queues and the number of clients used to generate the workload. You have to make sure, when you're looking at performance, that you don't have artificial bottlenecks that might be CPU-bound or network-bound, and using multiple clients and multiple back ends can help with that. The second, and probably the single biggest thing that catches so many people out, is understanding the caching at multiple layers, and making sure that if you want to test the storage performance of your system, you use a dataset that's orders of magnitude bigger than the cache size available. I lose track of the number of times I've seen benchmark documents quoting numbers where all that's being quoted is effectively the speed of the cache, because the dataset isn't actually exercising the storage system at all.

So, in summary, the important takeaway is that published results are rarely useful for making comparisons, and our recommendation is to run your own tests, in your own environment, with your own applications. In the document, we're now looking to provide guidance on the use of standardized tools, which would allow you to perform your own tests in your own environment and perhaps compare different systems, whether you're running on-prem, in the cloud, or in some sort of virtual or development environment. That covers the performance white paper.
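As a back-of-the-envelope illustration of why latency dominates small random operations while block size dominates throughput, here is a tiny sketch; the IOPS figures are invented purely for illustration:

```go
package main

import "fmt"

func main() {
	const kib = 1024.0
	const mib = 1024.0 * kib

	// Hypothetical latency-bound workload: many small operations,
	// like the 4K database I/Os mentioned above.
	randomIOPS := 10000.0
	randomBlock := 4 * kib

	// Hypothetical throughput-bound workload: fewer, larger operations,
	// like 1M analytics reads.
	seqIOPS := 1000.0
	seqBlock := 1 * mib

	// Throughput is just operations per second times the unit of work.
	fmt.Printf("random:     %.0f MiB/s\n", randomIOPS*randomBlock/mib) // ~39 MiB/s
	fmt.Printf("sequential: %.0f MiB/s\n", seqIOPS*seqBlock/mib)       // 1000 MiB/s
}
```

The sequential workload issues a tenth of the operations yet moves over 25 times the data, which is why a system can post impressive MB/s numbers and still be slow for transactional workloads, and why either figure only means something when the dataset is far larger than any cache in the path.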
With that, I'll hand over to Raffaele, who's going to cover the disaster recovery documents we've been working on so hard for the last few months.

Thank you, Alex. Here we present the disaster recovery white paper. In this white paper, we focus on an approach to disaster recovery that we call cloud native disaster recovery. This is not a new approach, but it's one that we think is going to be easier to implement, and more affordable, with cloud native technologies and solutions. That's not to say that you have to use it; it's something we think is worth considering when you design your disaster recovery architecture.

To define it, we compare and contrast it with what we call traditional disaster recovery approaches. We debated a lot about using the term "traditional", but just to be clear, by that we mean what you normally find in the average enterprise or company: not what the web scalers are doing, and not what startups without technical debt are doing, but what you find on average.

So let's take a look at some of the dimensions we use to define cloud native disaster recovery. First, what is the trigger of the disaster recovery procedure? Traditionally it's a human decision; with cloud native, we want the decision to be autonomous, so the system knows when it has experienced a disaster and can react. Then, what happens when the disaster recovery procedure is triggered? In traditional enterprises we see a mix of automation and human action; the best ones have more automation, others rely more on human action. With cloud native, we want everything to be fully automated: the system sees that there is a problem and readjusts itself to the new situation.

Then come the main metrics by which you define your disaster recovery SLAs: RTO and RPO, the recovery time objective and the recovery point objective. In traditional disaster recovery, we see RTOs (the time that the system is down) of between minutes and hours, depending on how long it takes to make the decision and how long the disaster recovery procedure takes. In cloud native disaster recovery, we want RTO to be near zero; it will be on the order of seconds, because the automatic health checks need to realize what's going on and redirect the traffic to the healthy locations, but it stays on the order of seconds. RPO, the amount of data lost because of the disaster, can be anywhere from zero to hours in traditional disaster recovery, depending on how you implement your volume replication or your backup system. In cloud native disaster recovery, we want it to be exactly zero: perfect consistency across all of the disaster failure domains.

Then, coming more to the process and the process owners (more of a people concern than a technology one): what we normally see in traditional enterprises is that the storage team owns the disaster recovery process. Technically it's really owned by the application teams, which have to publish their own business continuity documents, but what happens in reality is that the application teams turn to the storage team, ask what its RTO and RPO are, and adopt whatever is available from the storage team. In cloud native disaster recovery, we think the application team should completely own that process.

And finally, in building this architecture, we noticed that in traditional disaster recovery, the technical capabilities that enable the architecture usually come from storage: things like backups, restores, and volume syncs. In cloud native disaster recovery, what we need instead are capabilities from networking: east-west communication between regions or between failure domains, and a global load balancer in front of everything to direct the traffic to the healthy locations. Next slide, please.
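To make the autonomous trigger concrete, here is a minimal sketch, with entirely hypothetical endpoint URLs, of the kind of health-checking loop that sits behind a global load balancer and steers traffic away from a failed region without any human decision:

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// Hypothetical health endpoints for the same service in three regions.
var regions = []string{
	"https://eu-west.example.com/healthz",
	"https://us-east.example.com/healthz",
	"https://ap-south.example.com/healthz",
}

// healthy probes one region; a timeout or non-200 status counts as down.
func healthy(url string) bool {
	client := http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(url)
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	for range time.Tick(5 * time.Second) {
		// The "disaster trigger" is nothing more than this probe failing.
		var up []string
		for _, r := range regions {
			if healthy(r) {
				up = append(up, r)
			}
		}
		fmt.Printf("routing traffic to: %v\n", up)
		// A real global load balancer would update DNS records or its
		// routing table here; RTO is roughly the probe interval plus
		// the time for that update to propagate.
	}
}
```

This is why RTO drops to seconds in this model: it's bounded by probe frequency and propagation time, not by a human paging another human.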
Let me run through what you'll find inside this document very quickly. There is the definition we just covered, and then there are definitions of concepts needed in the rest of the document, like failure domain, HA, and DR. We don't go into detail on these; it's just enough to understand the rest of the document. There are whole books about availability and disaster recovery where these definitions are given in much more detail.

Then we cover the CAP theorem and everything that comes with it: consistency, availability, and network partition tolerance. The CAP theorem is the basis of why it is possible to realize architectures like the cloud native disaster recovery ones we describe here, so it's something we need to explain in this document to understand why these things are possible.

Then we look again at stateful workloads and their anatomy, similar to what we did in the storage landscape white paper, but here we focus on the characteristics of the anatomy that provide availability and scalability, which are the concerns for disaster recovery. And then we look at consensus protocols, which are the protocols and algorithms needed to keep all of the instances of a distributed workload in sync: Paxos and Raft for consensus between instances that have to perform the same action, and two-phase commit and three-phase commit for synchronization between instances that don't have to perform the same action, because they're working on different partitions or different data structures. Finally, we have reference architectures for strong consistency and eventual consistency: cloud native disaster recovery reference architectures for the cases where you want strong consistency or eventual consistency.
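As a toy illustration of the two-phase commit idea (a sketch of the protocol's shape, not the implementation of any specific product), here is a minimal coordinator with hypothetical in-memory participants:

```go
package main

import "fmt"

// participant is a hypothetical shard that first votes on a change
// (prepare) and then either applies it (commit) or discards it (abort).
type participant struct {
	name     string
	canApply bool
}

func (p *participant) prepare() bool { return p.canApply }
func (p *participant) commit()      { fmt.Println(p.name, "committed") }
func (p *participant) abort()       { fmt.Println(p.name, "aborted") }

// twoPhaseCommit applies the change only if every participant votes yes
// in phase one; a single "no" aborts the transaction everywhere.
func twoPhaseCommit(parts []*participant) bool {
	for _, p := range parts {
		if !p.prepare() {
			for _, q := range parts {
				q.abort()
			}
			return false
		}
	}
	for _, p := range parts {
		p.commit()
	}
	return true
}

func main() {
	shards := []*participant{
		{name: "shard-a", canApply: true},
		{name: "shard-b", canApply: true},
	}
	fmt.Println("transaction committed:", twoPhaseCommit(shards))
}
```

The well-known weakness, and the reason three-phase commit and consensus-backed variants exist, is that a real coordinator can crash between the two phases and leave participants blocked holding locks.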
The next slide is an example of the kind of research we did in this document. This is a table with a list of products, modern storage products and stateful workloads, showing the choices they make in terms of consensus protocol for keeping the replicas in sync (the replicas all have to do the same thing, so they can use Raft or Paxos or some derivative of them), and then the sync protocol between the shards or partitions, which are doing different things, so in that case you use two-phase commit, three-phase commit, or some other protocol of that kind. Next slide, please.

Here are just the diagrams of our reference architecture for strong consistency; in the document you can find more explanation. Looking at the picture on the left, we have three Kubernetes clusters in different disaster recovery failure domains, which could be data centers or cloud regions. Depending on your company's definition of risk and disaster, there will be some distance these regions or data centers have to be spaced apart. You set up this kind of deployment, and then you have a global load balancer in front of it to direct the traffic, probably a stateless front end, and then our stateful workload. The workload needs east-west communication capability between these clusters, so that all the instances can find each other and replicate and stay in sync. At the bottom, you see that they use persistent volumes provided by the platform.

When one region goes down, the stateful workload readjusts itself with the remaining replicas, and the global load balancer detects that the region is no longer available and sends the traffic only to the available regions. You don't lose any data, so RPO is exactly zero, and RTO is just the time the global load balancer takes to realize that one region is not available anymore, plus the little bit of time the stateful workload needs to readjust itself. I'm not going to go into further detail on this; let's look at the next slide for time's sake.

The next one is a reference architecture for cloud native disaster recovery, but in this case for eventual consistency, so we don't need three regions, we only need two. Other than that, the architecture is similar, but if we lose a region here, the remaining region, or the main data center, can keep working, although there is the possibility that the two states diverge. Then, when the disaster is recovered, they will have to converge to an agreed state. That's the characteristic of eventual consistency. We go into more detail on this in the document, and we talk about the fact that eventual consistency does not necessarily mean eventual correctness: the resulting reconciliation may not be correct from the point of view of the application. The only thing an eventually consistent stateful workload guarantees is that eventually the instances will share a common view of the state. Next slide, please. That concludes this part. Thanks.

Thanks so much, Raffaele and Xing. Finally, a little note to say: please join us, we'd love to have you as part of our community. Feel free to join our meetings and our mailing list. And with that, we will say thank you and open the floor to Q&A. Thanks everyone.