Hello and welcome to this introduction to the CNCF Storage SIG. Today we're going to be talking about the cloud native storage landscape, what we've been seeing, what we've been working on, and what we see in the future for cloud native storage. My name is Alex Chircop. I'm the founder and CEO of StorageOS and one of the co-chairs of the Storage SIG. And I'm here with my co-presenter, Erin Boyd.

Hi. Thank you, Alex, for the introduction. My name is Erin Boyd. I currently work for Apple in cloud engineering, and I'm also a co-lead of the CNCF Storage SIG, as Alex mentioned. Quinton is unable to join us today, but he also serves as a co-lead. Next slide.

So today we want to cover quite a few areas around the CNCF Storage SIG. We want to talk about what the SIG does, how you can join and help out, and give an overview of some of the storage projects that exist now within the CNCF at the incubation and graduation levels. We'll also take a look at a couple of the projects that are currently in review. And then Alex is going to go into areas where we would like to see more projects participate, possible gaps in the landscape, an overview of the CNCF storage landscape document that was published earlier this year, and an overview of the performance and benchmarking document that has recently been published.

So, what do we do in the SIG? First of all, all of our meetings are open. We meet twice a month, on the second and fourth Wednesday of every month at 8 a.m. Pacific, and you can see here several links that lay out what our charter is and who generally participates. There's also a link to the conference call that we use every time we meet, so you can just bookmark that, and then our agenda, which is open for anyone to add items they would like to discuss. Within the agenda, we try to keep very thorough meeting minutes. And then there are the recordings.
So if you have time and you want to go back, you're welcome to watch any of the previous recordings of the meetings. And then lastly, we would love for you to join the mailing list. On the mailing list we send out information, questions that may come up, and links to the recording and the agenda when we're done with each of the calls. So our calls and our membership are always open, and we're completely transparent. If you have any questions or concerns, you can reach out to us through any of those means.

So who is on the Storage SIG? What is it comprised of? I would say the current participants in the CNCF Storage SIG are a very diverse group of both users and developers of cloud-native technologies. We all have a strong storage focus, and we're all leaders and early adopters. Many of us have been involved in Kubernetes since the beginning and have seen the progression of how storage has changed: from a technology where we pre-provisioned volumes and connected to them, to dynamic provisioning, to adopting object storage, for instance. We're organized into three different leads, all from different vendors or companies: Alex, myself, and Quinton. And then we have tech leads, who are volunteers that help review projects and do due diligence: Jing, Luis, Sugu, and Saad. Saad is also a TOC member, so he performs some of the TOC duties for the Storage SIG as a liaison between the two.

So what do we do? What exactly is involved in our participation in the CNCF Storage SIG? The idea of the SIG was to have subject matter experts in storage who could review projects, and talk to different projects and early adopters, so that the TOC could scale those contributions, trust that the technical due diligence was done, and also reach out to the end user community for storage, to retain that integrity and quality and thereby support the CNCF mission.
There are a multitude of projects applying weekly and monthly that want to be part of the CNCF, and it became a bottleneck to review them all fully. So the idea of the SIGs was created, separated out by the different components within the cloud-native landscape, and here we are with the Storage SIG. The Storage SIG was one of the first SIGs to be created, and most of us have served since then. So what we do as part of that mission is: we educate, we review the storage project proposals, we engage with the user community, and we work with the TOC and other SIGs.

So what does that mean, other SIGs? Many times projects cross over these different lines; they don't fit into these nice little containers, to use a pun. Many times a project will come into the CNCF with a component of storage within it. They might have a key-value store; they might have a database associated with it. So then we work with the other SIGs to make sure that it has a cloud-native architecture, that it fits within the mission, and to perform the due diligence necessary to make sure it's an appropriate project to be accepted.

Part of that end-user education is publishing white papers and various other items that help educate our end users on how to select the best storage for the use case they want to adopt. White papers, presentations, videos: all of these are a form of training to better teach people who are new to Kubernetes or cloud native, or who may be coming from legacy applications and need to change them. We try to develop generalized, vendor-free best practices around this. We try to create a common nomenclature with which we talk about different storage technologies and patterns, and publish it so that we can see the trends within the community and the wider landscape and help users make the best choices for what they're trying to accomplish.
So the first thing we published globally was the CNCF storage white paper. It's on version two, and the white paper is long, but it is extremely comprehensive in trying to net out all the different components involved in traditional storage and within the cloud-native landscape, and what those things mean when we start running them in a cloud-native fashion. The second piece is the performance and benchmarking white paper. For many people it's critical to have a very performant file system, or way of utilizing storage, and so this is our first pass, a version one, to articulate how we can do performance benchmarking within this cloud-native landscape, because it's very different from how it would be done with traditional storage. As much as possible we provide that information and publish it out to our users, and we strive to be very vendor-agnostic and open and to publish everything that we find.

Then there's the project review. Within the CNCF TOC, if you go into GitHub, you can see many of the different processes laid out. You'll also see this nice little graph we have at the bottom published there. This shows how the SIG is involved in the review of many projects. Part of our job is first to identify to the CNCF gaps that we see in the project portfolio, but then also to go through the current projects and make sure that they're still tracking where they should be. Ideally, when projects are accepted into sandbox, the idea is that we see the viability of that project and we want to see it progress through the different levels: incubation, and then graduation. Part of that we call health checks, but it's really projects moving from level to level and seeing if they're meeting the criteria from one to the next. It's also to perform discovery and outreach to different projects. We may hear of a project within the community, or someone may reach out to us.
We have them come present at our twice-monthly meeting, discuss the technology, and then see if they're a good candidate for acceptance into the CNCF at the different levels. We also help those candidate projects. Part of our job is not just to be the yes or no, let's say, on these projects; part of it is to identify where there are gaps and work with the project to help them remedy any issues we might see that would prevent them from incubating or graduating. Once the project is assigned to the SIG, we work through this process, as you can see at the bottom, to review it and provide a recommendation to the TOC. If the TOC agrees, they assign a sponsor from the TOC to drive the due diligence, put it up for public comment, and then vote on it. That's generally how a project goes into the CNCF, either through sandbox or directly into incubation.

We engage. We really try to reach out to end users to gather their input and feedback on where they see pain points, either in using these technologies or in integrating them into their roadmap, and what their primary use cases for the technology are. We try to document and gather that, and put it into a consumable report, whether that be a presentation or a new version of our white paper. We try to include in there the different components of the technology that make it unique. Maybe it's the design of the architecture, maybe it's the UX, but we try to capture all that information and publish it.

We also try to enable the community. Probably the biggest thing is the twice-monthly meeting: taking those agendas and notes, sending information out on the mailing list, and keeping our communications open. And the documents, of course, are openly published and maintained, and they're continually open to feedback.
Those are living documents, and we change them as needed. And lastly, we work as a trusted advisor to the TOC. Many times, the TOC is formed of a limited number of people, and depending on the year, there isn't always a representative who can speak at a subject-matter-expert level for storage. So the idea of having a Storage SIG is also to be a trusted advisor to them: to provide input on how a project can scale, whether it adheres to the CNCF values, and also the health of the project. So our role is to augment, where necessary, the experience of the TOC members and the community.

We kind of already went over this a couple of slides before, but we really do want everyone who wants to be here to participate. You will see here the link where you can submit and help review projects. Maybe you don't have anything that you want to bring forward, but you want to be involved in the process and have input; we completely welcome that. The reviews are openly published and can be commented on. And though we have so far looked mainly at management frameworks and block stores, we're really focused on trying to get more object stores and databases; I don't feel like we're at the limit of what we can accept within the Storage SIG. There are many projects that have presented at the twice-monthly meeting, just some of them listed here. You're welcome to go back through the meeting minutes in the agenda, search for any one of these, and find the recording so that you can learn more about the projects listed below.

Yeah, just a small interjection to echo the passion we have for the community. You know, we're a really diverse community covering vendors, leads, and maintainers from different projects, as well as independent contributors.
And the most exciting thing for us is to discover new projects and new concepts that help the cloud native storage environment. So, as Erin said, please do propose your projects; we'd love to hear from you.

Yeah, absolutely. And so thus far, as far as projects that are incubating or have graduated, here are just some of the many: Rook just recently graduated in the last few weeks, along with etcd and TiKV, and Dragonfly is incubating. So there's plenty of room for other projects to be part of the CNCF Storage SIG. What wasn't mentioned on that slide, as far as the levels go, is the CNCF Sandbox. The sandbox process has changed dramatically within the past few years. The sandbox is meant to be an early stage to identify potential projects that would be a good fit but don't yet meet the criteria for incubation or graduation. So you can go to the link below, look at the different sandbox projects that are there, and also look at how you can contribute your project as a sandbox project. The idea of the sandbox is truly to grow and cultivate the project into a viable incubation project, and both the TOC and the SIGs are committed to helping projects get the visibility, contributions, and advice that they need. The current project we're reviewing for incubation is Pravega; you can learn more about them at the link below. And then OpenEBS has been in sandbox for quite some time. They have applied for incubation, and we're currently reviewing the criteria for that.

Thank you, Erin. Okay, so I'd also like to cover off some of the work that the SIG has been doing and some of the papers that we have published recently. We recently revised the CNCF storage white paper with some additions. Version two of the white paper is there to explain the variety in the cloud native storage landscape.
When we talk about storage in a cloud native way, we're not talking about just storage systems in terms of volumes, but about any way you can persist data in a cloud native way. So that includes things like object stores and databases, for example. When we first set out to put this white paper together, we discovered that it's really important for end users to understand what their application needs and then to be able to translate that into the attributes of a storage system. Because in a cloud native world, developers get a much more active say in which storage system to use. So being able to understand the attributes that make up a storage system is really important, as is understanding all the different layers in a storage solution. Nowadays, there are a variety of layers that affect the virtualization or the integration with an orchestrator, and understanding those layers also matters for the attributes of the storage system. And then finally, we review the different data access interfaces, in terms of both volumes and APIs, as well as the definition of the management interfaces.

So I've put a little summary in the next couple of slides to cover things like the data access interface, which is the way that you persist or access data in the system. In the white paper we grouped this into two main buckets: the first being volumes, which can include traditional block devices, file systems, or shared file systems, over a variety of topologies, whether local, remote, or distributed; and the second being access to storage via APIs, for example object stores, key-value stores, or different types of databases.
And although we're used to determining attributes based on the different data access interfaces, like I said, it is important to understand the different topologies of those systems, as often the way that you access storage is not a good indicator of the attributes of the storage system. Secondly, of course, in a cloud native world we're looking at making workloads declarative and composable. In much the same way that with an orchestrator like Kubernetes you define things like the compute and memory requirements for your applications, with the different storage interfaces, CSI being the standard for Kubernetes interfacing with storage systems, you now have the ability to be declarative across your storage workloads. Developers can define, in a declarative way, what they need out of their storage environment in terms of the size of volumes or the different attributes the storage system can support, for example replication or data protection. And it's key to understand that there are native interfaces, and also some frameworks and tools, that give these projects the ability to provide connectivity between the storage systems and the different container orchestrators.

We talked about the storage attributes. We defined five key storage attributes that are important within the cloud native storage landscape when evaluating different storage systems: availability, scalability, performance, consistency, and durability. The key here is that each storage system may have a number of different attributes, and often one attribute can force compromises in others, for example between scalability, performance, and consistency; projects might optimize different attributes to hit different use cases.
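To make that declarative model concrete, here is a minimal sketch of how a developer might request storage in Kubernetes. The `StorageClass` and `PersistentVolumeClaim` kinds and their fields are standard Kubernetes API objects, but the class name, provisioner, and driver parameter below are made-up placeholders; real values depend on the CSI driver in your cluster.

```yaml
# A StorageClass describes a "class" of storage the cluster offers.
# The provisioner name and the "replicas" parameter are hypothetical;
# each CSI driver documents its own parameters.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-replicated            # placeholder name for this sketch
provisioner: example.csi.vendor.com
parameters:
  replicas: "2"                    # hypothetical driver-specific attribute
---
# The developer declares what the application needs; the orchestrator
# and the CSI driver satisfy it via dynamic provisioning.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast-replicated
  resources:
    requests:
      storage: 10Gi
```

The point is the division of labour: the claim states the size and access mode, the class encodes attributes like replication, and the storage system fulfils both without the developer ever pre-provisioning a volume.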
For example, systems optimized for throughput might not be optimized for latency, and systems optimized for redundancy or durability might not be optimized for throughput. So it's important to understand what your application needs and what the storage system provides under the covers, because there are a number of different ways of measuring these attributes, as listed on this page.

We also talked about the different storage layers in a storage system. In an orchestrated environment you will see a number of different layers, starting from the container, the container namespace, and the orchestrator, down to the different topologies that might be implemented in the storage system, whether centralized, distributed, or sharded with databases, for example, or, very commonly nowadays, different hyper-converged topologies. There are also the data protection capabilities, which of course have important impacts on data integrity as well as consistency and latency, and the data services that give applications the flexibility of snapshots for backups, data protection, and replication. And we mustn't forget the physical underlying layers on which the different storage services ultimately persist data.

And then finally, we recently released our performance and benchmarking white paper. This was our first offshoot from the CNCF storage white paper, where we said we would provide additional details on some of the attributes, the first being performance and the way we measure it, as that seemed to be a common question amongst end users. So in the white paper, we put some work together to define the common concepts for measuring the performance and benchmarking of volumes.
What's important here is that we made a decision not to assess performance ourselves; this is more about providing end users with the information to assess the performance of their own systems and understand their attributes, whether it's storage on volumes or storage in databases, which were the two areas we decided to focus on. As we were putting the paper together, we discovered that a lot of the challenges were around the common pitfalls and considerations that come up when people are benchmarking systems. It's definitely safe to say that apples-to-apples comparisons are extremely complicated in the storage world. There are lots of factors that can affect performance: compression, caching, the physical infrastructure, virtualization parameters, and even things like encryption and other types of data protection can dramatically affect the performance of different systems. Ultimately, we decided to make sure that all of this is enumerated in an easy-to-understand way, and we provided some sample tools so that end users can run their own benchmarks and understand the attributes of their own system.

Here's the most important takeaway: it is completely useless to compare vendor-published results. Whether you're looking at IOPS, transactions per second, or megabytes per second, it is hard, if not impossible, to compare published results without understanding the specific test conditions. What we always recommend is that end users run their own tests, on their own storage, in their own environment, so that they can understand how their particular environment behaves. That ultimately gives them a better understanding and a better capability of figuring out what their application needs and demands.
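As an illustration of running your own volume benchmark, here is a minimal sketch of a job file for fio, a widely used open-source I/O benchmarking tool. The mount point and sizes are placeholders; to get meaningful numbers, the block size, queue depth, and read/write mix should mirror your actual application, and you should note every one of these parameters alongside any result you record.

```ini
; Hypothetical fio job file: measure random-read latency, then
; sequential-write throughput, on a mounted volume under test.
[global]
directory=/mnt/testvol     ; placeholder mount point of the volume under test
size=1g                    ; per-job dataset size (placeholder)
runtime=60
time_based=1
direct=1                   ; bypass the page cache so caching doesn't mask the storage
ioengine=libaio

[rand-read-latency]
rw=randread
bs=4k
iodepth=1                  ; low queue depth exposes per-I/O latency

[seq-write-throughput]
stonewall                  ; wait for the previous job; don't run concurrently
rw=write
bs=1m
iodepth=16                 ; deeper queue to measure sustained throughput
```

Note how the two jobs deliberately measure different attributes: the same volume can post excellent throughput numbers and poor latency numbers, which is exactly why a single vendor-published figure tells you so little.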
With that, we finish our presentation and we'd love to hear questions and we hope that we'll be online to take questions and hear any comments.