Hello. Thank you all for joining us for our session on cloud native storage, where we're going to talk about what the CNCF does in our Storage TAG, give a little bit of an intro to what cloud native storage is and some of the projects in the space, and cover some of the efforts we've been working on: white papers on disaster recovery and performance. My name is Alex Chircop and I'm here with my co-presenters, Raffaele Spazzoli and Xing Yang, both of whom you've probably seen around in other talks. So what are we covering today? We want to give you a little bit of an overview of the TAG: what it does, why we're here, and what we do in the CNCF. We're going to talk about why cloud native storage is important and some of the white papers and documents we've been working on, and finally we want to hear from you. We want to talk a little bit about the community, so please feel free to reach out during or after the talk; we'd love to hear your questions and find out how best we can help the community. So, a little bit about the TAG. You might have known the TAGs in the past as the CNCF SIGs; they were renamed because we kept getting confused with the Kubernetes SIGs, which are a distinctly different organization. What we're here to do is work with the community in the CNCF in the open: all our calls are open, all of the work we do is freely available in our repos, and we'd love you to have a look and help us out. So who are we? A complete mix of people: vendors, different experts, and various independent contributors working with cloud native technologies, mostly with a storage focus. The TAG has a number of co-chairs and tech leads, and we're always looking for additional members to join, both on our calls and as potential leads, and I'll explain a little bit more about what we do in that space. So please feel free to reach out directly to us.
Connect to our mailing list or the CNCF Slack, where we're always online. So what do we do? The TAGs were put in place to help scale the CNCF, and to scale the TOC and the work it does across all of the different projects. The CNCF has gone from a few projects to now dozens of projects, and in the sandbox category we have over 80 projects as of the last count. What we're here to do is provide storage experience to the TOC, help the TOC scale, and ultimately help make the best decisions for the community. But in all cases the TOC remains the ultimate decision-making body, and those are the elected members of the CNCF. So what does that mean? We work on providing educational resources and white papers to help the community understand what cloud native storage is. We provide the expertise to review projects as they go from sandbox to incubating to graduated, and we provide expertise for some of those annual reviews as well. And of course we work with the community at events like today's and provide subject matter expertise whenever the TOC requires that help. These are some of the projects in the CNCF portfolio that relate to storage. Some of these are extremely well known, like etcd, which I'm sure each one of you is using on a day-to-day basis with every Kubernetes deployment. But there are also projects like Rook, which provides operators for Ceph, for example; Vitess and TiKV, which are distributed databases and distributed key-value stores; Harbor, which is a container image registry; as well as other projects like Dragonfly, CubeFS, and Longhorn that are in the incubation stage. Each of those CNCF projects is listed on the CNCF website, and there's a long list of sandbox projects which are also listed there at the URLs on that slide.
And each of those projects tends to go through a cycle. We have sandbox projects, which are very focused on providing a low bar of entry into the CNCF, where projects can actually experiment with different technologies, collaborate with other projects, and find overlaps, but also work on building out their community and meeting the criteria that will ultimately help them get into the incubation stage. Incubation is actually one of the highest bars for projects to achieve: this is when projects are being used in multiple production environments, they have a healthy roadmap and a healthy number of committers, and the project metrics show wide adoption. And then graduated projects are where we add additional criteria, like security audits, to make sure those projects are usable by some of the largest mainstream environments. So, a little bit about cloud native storage. Some of you might be very familiar with this and some of you might be new to it, but why should you think about it and why is it important? Well, I'll put something up in amber which might be a little controversial: I think that there is no such thing as a stateless application. And why is that? Because ultimately all applications are storing state somewhere. Sometimes you're storing state in files, sometimes in key-value stores, sometimes in objects, and sometimes in databases, but all of those applications need some way to store their state. So when we're looking at cloud native storage, we see this huge wave of adoption: typically a lot of those stateful workloads used to run outside of Kubernetes, and now, as we continue to get further adoption, those stateful workloads are moving into Kubernetes to take advantage, of course, of all of the automation, the scale, the performance, and the automated failovers. And we'll
talk a little bit about some of those attributes when Xing talks about our white paper and when Raffaele talks about things like disaster recovery in cloud native workloads. And the thing is, cloud native storage is here to stay. We have a very broad ecosystem; there are literally hundreds of vendors and hundreds of drivers that interact with Kubernetes, with different capabilities, and we'll talk about some of those capabilities in the white paper discussion. And more and more we're seeing operators that help deploy and manage the automation of these stateful workloads: databases, message queues, key-value stores, and so on, and many more different use cases, which you'll find in some of the talks today and tomorrow, as well as lots of examples if you visit the sponsor area. So one of the things we wanted to do as part of our education on cloud native storage was to give some clear indications of the differences between the different cloud native storage attributes. I'll pass it over to Xing, who's going to talk to us about the white paper that we developed. Thanks, Alex. So I'm going to talk about our white paper on cloud native storage. In this white paper we described the attributes of storage systems, the different layers in a storage solution and how they impact the different attributes of a storage system, the data access interfaces, and the orchestration and management interfaces. Storage systems have attributes such as availability, scalability, performance, consistency, and durability. Availability refers to the ability to access the data when there is a failure, and this can be measured by uptime, a percentage of availability, for example 99.99% uptime. Scalability is measured by the ability to scale across the number of clients, the number of operations, throughput, capacity, and the number of components. Performance is measured by
latency, the number of operations per second, and throughput. Consistency refers to the ability to access newly created data or updates after they are committed; this can be either eventually consistent or strongly consistent. And durability is affected by the layers of data protection, the level of redundancy, the endurance of the storage media, and the ability to detect corruption and recover the data from that failure. Storage systems typically have layers, or stacks, that affect the storage attributes. Rather than accessing resources directly, a hypervisor could access the resources, and in this case it could add access overhead. Storage topology refers to the arrangement of the storage and compute resources and the relationships and data links between them, and that includes centralized, distributed, sharded, or hyperconverged topologies. Storage systems typically have a data protection layer that adds redundancy; here we refer to RAID, erasure coding, or replicas. Storage systems typically have data services in addition to the core services, such as replication, snapshots, clones, and so on. And ultimately the system will persist data at a physical storage layer, which is normally non-volatile, and this will affect the overall performance and long-term durability. Now let's look at data access interfaces. In this diagram we have workloads in containers consuming storage systems through data access interfaces. There are two types of data access interfaces, as shown here. One is volumes, that is, the interfaces supported by the container orchestration systems, including block and file systems. The other type is APIs; here we refer to object stores, key-value stores, and databases. For the object store, we do have a project called COSI, the Container Object Storage Interface, that is aimed at adding object storage support into Kubernetes, and that project reached the alpha stage in the 1.25 release of Kubernetes. And now let's look at the orchestration and management
interfaces. The control plane interface here refers to the storage interfaces supported directly by the container orchestration systems, such as CSI, the Container Storage Interface, the Docker volume driver interface, or other native interfaces. And note that FlexVolume was deprecated in the Kubernetes 1.23 release, so if you are using FlexVolume, please move to a CSI driver as soon as possible. And we have this orange box here, frameworks and tools; this is an extension of the control plane interfaces. For example, we can have an operator for a key-value store or a database to help the workload run in Kubernetes. So that's all we have for the storage landscape white paper. Now I'm going to talk about a new initiative that we just got started in TAG Storage. We're working on a use case template based on the CNCF storage landscape white paper. It's still work in progress, but once it's ready we want to invite cloud native open source projects to fill out the template and give a presentation at a TAG Storage meeting, and then eventually we will publish those use cases in our GitHub repo. So here's an example use case; this is still work in progress, so I just used etcd as the example here. Here's a description of what etcd is and what its storage attributes are. It's a distributed key-value store, it has strong consistency, and it's a CP database, meaning that it supports consistency and partition tolerance at the expense of availability, so it can't serve requests if quorum is not satisfied. It also provides stability, reliability, scalability, and performance. Benjamin Wang from the etcd team did a presentation on June 8th of this year, so I have added a link here. Looking at the storage topology, etcd is distributed and it's a key-value store, and from the data protection point of view it has replicas, replication, and snapshots. So this is still work in progress; if you are interested, please join our TAG and join the discussions. So, just to follow
on from those templates: one of the things we are looking for, and we would love to hear your input on, is which types of use cases and which types of templates we should work on and document next. We're looking for members from different projects, open source projects that work on stateful workloads, databases for example, to build out these templates and effectively help end users figure out their stateful workloads. Now, one of the things we talked about in the storage white paper, and I heartily recommend that you have a read of it, is that we wanted to get an understanding of the different attributes and how they interact with each other, because each of the different layers in a storage system, and each of the topologies in the architecture of that system, will affect one of those attributes. And typically you have compromises to make, between consistency versus performance, or availability versus scale, and all those sorts of decisions. We then came to the conclusion that one of the most common requests we were getting from end users was how to focus deeper on some of those attributes, so we focused on performance and came up with this performance white paper, and also on availability, with our disaster recovery white paper. And one of the interesting things with the performance white paper was that we were looking at the different ways you can reliably compare and benchmark applications within your cloud environment, within your Kubernetes environment. So we focused on two main areas: first, how do you performance-benchmark the volumes that might be available from your different storage systems or cloud providers, and secondly, how do you benchmark and manage databases, because that was also a very common solution. And as we were going through this, well, hands up, I'm a very big performance nerd, and maybe some
of you are too, but one of the things we realized was that it's actually one of the most complex things to do apples-to-apples benchmarks across these different environments, and it became almost as important to explain how to do it as it was to explain some of the pitfalls that you might come across. So, for example: figuring out what's more important to measure for your particular workload, the operations or the throughput; or figuring out the overhead that some of the different layers and topologies might introduce, like the different types of data protection, or encryption, or security, and how all of those things affect the latency and the overall throughput of the system; but also how you figure out the different concurrencies, because sometimes you might find that the clients are the bottleneck rather than the storage system. And in all of these sorts of things we came across so many examples where end users might be doing their own thing, like, for example, benchmarking systems that had really big caches with really small volumes. I can't tell you how many benchmarks there are published on the internet that say, oh, we're delivering five gigabytes per second on a disk that only supports 200 megabytes per second, and you kind of say, okay, well, you're clearly just benchmarking the cache at that stage, not actually benchmarking the volume. So these are the sorts of things that we called out. I heartily recommend that you read the paper, because there are lots of learnings in there, but one of the key takeaways is: don't always, in fact almost never, focus on published benchmarks, which are often marketing material. It's really important to actually measure the performance in your environment, because there are so many things that can affect the performance, all the way from
the nodes, the network, the environments, the orchestrators, the storage systems, the types of file systems and databases, and everything else in the stack. You really need to test these things in your specific environments with your specific tools, and the paper gives you those tools and those methods to help you do that in your environment. And now I'm going to hand over to Raffaele, who's going to talk about our failover and disaster recovery work. Thank you, Alex. So, cloud native disaster recovery, while I get ready here... this one... okay, perfect. By the way, thank you all for coming; it's really a pleasure for us to see so many of you and to see the interest. So, cloud native disaster recovery. There is a question right there: you're probably wondering, does disaster recovery change in any way with the cloud native approach to things? In this white paper we submit to you that it might change, it probably should, but for sure there is a new way to do things that we should all be aware of, so that we are able to make our choices. So what are the differences between traditional DR and cloud native DR? We created this white paper, obviously going into that, but we created this table here to try to summarize them. So, in terms of deployment, with traditional DR you usually have an active-passive type of deployment: you have two data centers, one data center serving the traffic and another data center that is passive and is there just to take over if there is a disaster. With cloud native DR we want to do active-active deployments, so all the instances of our stateful workload can receive writes and reads; there are no passive instances. And then, the detection of the disaster: in traditional DR it's usually a human decision. There is an incident, there is some problem, and somebody decides, okay, let's trigger the disaster recovery procedure, and then there are a bunch of things that need to happen, and that's
why, because there are humans involved, companies typically do biannual disaster recovery exercises, to make sure that the procedure actually works. For cloud native DR, we want the detection of the disaster to be automatic. And then the disaster recovery itself: in traditional DR it's usually a mix of automation and human actions; if you're more mature, maybe you have automated everything, but in general it's a mix. Instead, in cloud native DR we want it to be fully automated. And then RTO and RPO, recovery time objective and recovery point objective: these are the two main metrics, or KPIs, that you use to measure the efficiency of your disaster recovery. Recovery time objective is how long the system stays down, not available. For traditional DR you have from close to zero, if you're very good, to hours, because it may take hours just to decide that there is a problem and then to start recovering. But for cloud native DR we want it to be close to zero: essentially, a few health checks have to fail, and then we decide that it's a disaster and we start recovering. And for recovery point objective, which is how much time's worth of transactions did I lose because of the disaster: for traditional DR it depends a lot on how you do data replication, backup, and restore, but it could be from zero to hours again. For cloud native DR we want to be able to have exactly zero RPO, especially if you're doing strongly consistent deployments. And then, from a more process-oriented perspective, the owner of DR has always been the application or the business unit, but in traditional enterprises what this team usually does is turn to the storage team and ask, what can you give me in terms of SLAs and SLOs for recovery of the storage, and then that becomes their strategy. So really the
storage team owns the strategy for disaster recovery. But in cloud native DR, what we are noticing is that, because there is a new generation of middleware that can be used to do this kind of cloud native disaster recovery, and because these middleware projects or products are chosen by the application teams, the application teams really become the owners of the strategy and the process. And then, in terms of capabilities, traditional DR uses very storage-oriented capabilities, like backup and restore or volume replication, synchronous or asynchronous. But in cloud native DR we need more network-oriented capabilities: we need the ability to communicate east-west, and by that I mean that if you have multiple data centers, there needs to be a way to communicate between data centers, and that's the path that the smart middleware is going to use to sync the transactions. And then we need global load balancers that are smart enough to detect whether an application is available or not and trigger the recovery procedure automatically. So what can you find in this white paper? We have this definition, and then we have some other definitions around the concepts of failure domain, high availability, and disaster recovery. We have some reasoning around the CAP theorem; it's very important to understand the CAP theorem, because it lays down the rules for how stateful distributed workloads can work in a situation of failure. If you're like me, you may have a gap here: when I went to school they weren't teaching the CAP theorem, so I studied it myself, and I suggest that everyone working in the storage area go and study up on it; it's really important. And then we look at the anatomy of this new generation of fully distributed stateful workloads and the concepts of shards and replicas, and then we look at some consensus protocols and some reference architectures.
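The replica coordination that consensus protocols like Raft provide comes down to majority quorums, and the arithmetic behind them is worth having at your fingertips. As a rough sketch of my own (not taken from the white paper), in Python:

```python
def quorum(n: int) -> int:
    """Votes needed for a Raft-style majority among n replicas."""
    return n // 2 + 1

def failures_tolerated(n: int) -> int:
    """How many replicas can fail while a majority can still be formed."""
    return n - quorum(n)  # equivalently (n - 1) // 2

# Three replicas spread across three failure domains survive the loss of
# one domain; five replicas survive two.
for n in (3, 5, 7):
    print(f"{n} replicas: quorum {quorum(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

This is also why a strongly consistent reference architecture wants three failure domains: with one replica per domain, losing a single domain still leaves a two-of-three majority, so the workload can keep serving consistently.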
So I have some examples here, but you can find more about this in the white paper. For example, with regard to the anatomy of a stateful application, what we find is that distributed stateful workloads all have the concepts of replicas and shards; maybe they call shards partitions, and maybe they call replicas something else, but they all have these concepts. Replicas are used to provide high availability, because I replicate the data to multiple failure domains, and shards are used to provide high scalability, because they're used to split the requests into multiple parallel processes that can handle more requests, so that these kinds of workloads can scale beyond what a monolithic database or a monolithic cache could ever do. And it's interesting: replicas and shards need to coordinate. Replicas, because these are processes that have to sync and always have the same view of the state; and shards, because sometimes you may have a transaction that involves multiple shards. So we did some research across the most common stateful workloads, and there are more here; if your favorite is missing, apologies, we can add it to the white paper. We did this research to understand which protocols they were using, because understanding the protocols gives you a good idea of what the workload can do. So if you're doing a software selection, for example, it's a good idea to dissect the product and really understand what happens. For replica consensus protocols, the two most used are Raft and Paxos, as you can see, and going forward it's going to be mostly Raft, because Paxos is a little bit more difficult to implement. And for shard consensus there are various options; one of the most used is the two-phase commit protocol. And here is an example of a reference architecture; this one is for deploying a strongly consistent workload on Kubernetes. Because of the CAP theorem, we needed
three availability zones in order to have strong consistency, and we need to select a workload that has chosen to be consistent, because with the CAP theorem you have two choices: you can choose to be consistent, or you can choose to be available, during a network partition. So here we assume a strongly consistent workload and three failure domains, and what happens when one of the regions goes down is that the global load balancer needs to figure that out and reroute the traffic to the healthy availability zones or regions, and the workload will automatically readjust to the situation. There's more information in the white paper, and we also take into account eventually consistent workloads. That's it. Brilliant, thank you so much. So finally, before we go into questions, I just wanted to finish off with a very blatant bit of recruitment: we'd love you to join our community. We'd love to hear from you, about your stateful workloads and your projects and the challenges that you're seeing in this space. Please feel free to join our meeting; it's open to all, on the second and fourth Wednesday of every month. And if you only join just to listen, that's fine too; you can find information about the tons of different projects that have presented over the period, and participate in the discussions. We'd love to hear from you. So with that, I'll hand over to questions. Please wait for the microphone, because the sessions are recorded; it's very helpful. I have a question: this deck seems to be kind of focused on a single application, where you can get what we call technical consistency of that single application; but in a disaster recovery, what is important for us is business consistency, and that also deals with the order in which applications are brought back. Is that described anywhere in the white paper, or in the definitions? Yeah, consistency across applications, I think, comes into play when
you do backups and restores, because you need to back up all of these applications at the same time. But, I see you disagree, but for online workloads, for these kinds of workloads, if you need consistency across stateful workloads, you need to do a dual write, essentially a write that is consistent across multiple stateful workloads, and at that point the write either goes in or it doesn't, but there is no risk of inconsistency. I'm not talking about the restore: if a data center goes down, it is not an atomic event, it is a gradual event, unless there's a nuclear bomb. If there's a fire or flooding, not all systems will go down at the same time, so you have a period of minutes, maybe even hours, where the systems are out of sync. You're perfectly right. I think in many typical disaster recovery scenarios you don't have that big bang event that takes out an entire data center; you perhaps have a fire or a flooding or something, and racks go out over a period of time, and you are hampered by the fact that you can't access the data center during that period, so you have no control over it. But I think this is where some of the patterns that we see in cloud native, using Kubernetes and orchestration, come into play, because the different workloads can be declaratively designed with their dependencies and different service models, etc.,
such that they actually do start in the right order, and things might fail health checks because one of their dependencies failed, for example, and can be shut down and then restarted in the right place. So I think it's a long-winded way of saying it depends, but the technologies are available to do it, and I would argue that it becomes an order of magnitude easier when you're using cloud native to do that. Yeah, but you used availability zone and region quite freely, as if interchangeable, and I think a disaster recovery across regions will never be zero, cannot be zero. Yeah, this new generation of workloads can be stretched across regions, and we did experiments and load-tested them. We simulated disasters where we took down an entire AWS region, or actually isolated it on the network, because we can't take it down, and the workload reacted seamlessly and moved the traffic to the other available region. So for the workload it really doesn't matter; it's all about, sorry, the failure domains: it doesn't matter if a failure domain is a machine, an AZ, or a region, it will always react. What changes is really the latency: if you want to stretch across multiple regions, processing a single transaction will take longer, and can your use case afford to take longer, that's really the question. But they don't have any problem being geographically distributed; I did the test within North America, but I could have stretched across Europe and it would have worked. Thank you. Any other questions? Anyone? Yell, wave, anyone? Okay, final thoughts? Well, fantastic. Thank you again, everyone, for joining us, and we'd love to see you on our calls.