Hello and welcome, KubeCon! You've made your way to the MayaData booth, and today we're going to talk about operators, CSI, and the cool stuff that waits beyond those technologies. A little about MayaData first: we're a top-five contributor to the CNCF ecosystem by PRs, according to the dashboard the CNCF maintains for that sort of thing. We're also the creators of OpenEBS and of Litmus, which is a chaos engineering project. A little about OpenEBS: it's a storage engine platform where you get to pick and choose different storage systems to hook in. We call this CAS, for Container Attached Storage. It's like direct-attached storage, and it's like network-attached storage: we like to think you get the speed of direct-attached storage and the nice features of network-attached storage, but without the blast radius and the difficulty of managing things that come with the latter. There's some great reading you can do online, so check that out. For our purposes, we also have licenses you can buy for these. You may hear the terms Kubera Propel and Kubera Chaos; those are effectively just the licenses if you want professional services and other fancy stuff to go along with the open-source solutions, and those are CNCF projects.

So let's dive into our discussion today. What is an operator? What's CSI? These are conversations we've been having at recent KubeCons and other events; you might even be bored and tired of hearing about them. But part of the reason they get discussed so much is that they're important. An operator is a really convenient way to manage and run a stateful or complex application on Kubernetes. CSI decouples the underlying storage mechanism, the storage plugin, from upstream Kubernetes. So if there's a security issue, or we need to release a new version of our storage driver, we're not tied to a Kubernetes release, waiting for a new minor version: you can update your storage plugin separately from Kubernetes. That's really nice when it comes to innovation and the speed at which we can build out this cool technology.

So those two things, operators and CSI, have dominated the conversation at past KubeCons, and they're certainly present at this one. But there are other acronyms you're going to see a lot more of. Things like DPDK and SPDK: those are two projects we'll talk about in a bit, and they're related to speed. PMEM is also related to speed; PMEM has a CSI driver that Intel has worked on. It's persistent memory, meant to help speed up workloads. COSI is interesting: it's something MinIO has worked on, and they do object storage. The same way CSI decouples things and makes these systems really easy to work with in a loosely coupled, easy-to-innovate-on fashion, COSI brings that same advantage set to object storage, whereas CSI covers file storage and block storage. And finally, TiFlash is a cool project oriented around TiKV, the top-level CNCF project, which is a key-value store written in Rust. PingCAP, the folks working on that, noticed they were able to get a 10x improvement in speed by applying this new technology and technique to their system.
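To make that CSI decoupling a bit more concrete: from the cluster's point of view, a CSI driver is just a named provisioner that a StorageClass points at, which is why the driver can ship on its own schedule. A minimal sketch (the provisioner string and the `repl` parameter here are assumptions based on OpenEBS's Mayastor driver; check your driver's docs for the exact names):

```shell
# Create a StorageClass that points at a CSI driver by its provisioner name.
# The provisioner string below is an assumption; substitute your driver's.
cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mayastor-example
provisioner: io.openebs.csi-mayastor
parameters:
  repl: "1"     # driver-specific parameter (hypothetical)
EOF

# Confirm the class exists
kubectl get storageclass mayastor-example
```

Pods then request volumes through a PersistentVolumeClaim that names this class, and upgrading the driver behind that provisioner never requires touching Kubernetes itself.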
So all of this innovation, all this stuff, is really enabled by those two earlier technologies. There are talks at this KubeCon about PMEM, COSI, and TiFlash themselves, so those are worth checking out. What I'm going to talk about in a bit more depth here is DPDK and SPDK, because these are part of the storage system that we plug in to OpenEBS. These are newer technologies, so they're things that promise speed and great performance improvements, but I want to confront this head-on and say I understand there's a bit of exhaustion people are going through. I've seen conversations like this online recently: people complaining, or just venting, about how much stuff is out there in the ecosystem and how hard it is to keep up. But part of that is by design. When you look at other systems that tried to pick the winners ahead of time, they tended to struggle with that a bit, and it's really tough to make a perfect prediction about what's going to work and what's not. So what we expect is that, over time, it's going to be chaotic and messy to begin with, but you're going to see a forest, some order, grow out of that chaos eventually. There's always going to be a little bit of chaos, but it will get tidier over time as we all get familiar with the terminology, as projects merge, and as different approaches become the clear standard, the clear way we want to approach certain types of problems. For instance, CSI and operators have become the clear way we approach some problems in storage, and we think really cool stuff is coming out of DPDK and SPDK as well. So what is DPDK? What is SPDK?
Well, DPDK is the Data Plane Development Kit, and one thing to take notice of is that this is a Linux Foundation supported project, the same way the CNCF is Linux Foundation supported. The idea, down at the bottom of the web page you can see here, is that it's a library that helps you accelerate packet-processing workloads. So it can accelerate your network card; it can accelerate stuff running over Unix sockets, or things like storage on your system. And that's where SPDK comes in: SPDK, the Storage Performance Development Kit, is the implementation of the DPDK approach for storage. If you're looking for the speed improvement that DPDK is promising, and you want to implement that in your system or your solution, SPDK is the version of that for storage. We contribute to it, Intel contributes to it, and a bunch of other folks are contributing to it, so there's a lot of cool work happening here. As you can see on the page, there's a lot of stuff related to NVMe and newer technology, because NVMe is not just a new type of hardware; it's also a new protocol, which is exciting. We've all been using SCSI for decades, and suddenly this new protocol emerges, with this whole system designed for the new type of hardware that we have. For the first time, we have these CPUs with all these extra cores just sitting around. So how does that change how we can work in this new space? To answer that question, the three bullets on the previous page we just saw are worth highlighting. The drivers sit in user space, alongside your application, not in the kernel, and you want to avoid system calls, or working through the kernel, as much as possible. That connects your application in user space directly to the hardware, and that kernel bypass is part of the speed improvement, part of the design of SPDK and DPDK. Similarly, there are no interrupts.
Instead of interrupts, you use polling to check for completions, and that results in better latency. And there's no locking in the I/O path; you use message passing instead. You can see some really big performance improvements, and all of this is enabled by that shift in technology: the NVMe devices are faster, and CPUs have these extra cores all of a sudden. We have 64-core CPUs that we can buy off the shelf and plug into our home servers and such. With that hardware available, if we dedicate some of it to this new tech, we get some really amazing results in terms of performance.

Specifically, DPDK and SPDK are libraries that you build with, so you have to build something with them. We decided to build something called Mayastor. It's the storage engine in OpenEBS that you can use to get the advantages of this new technology. It's an open-source project on GitHub, and there's a nice guidebook up already at mayastor.gitbook.io that we're going to look at really quickly. It steps you through the whole process, and at the end you get to run your own little benchmark using a tool that's popular for that sort of purpose; we have a guide up that steps you through that whole process. What we've seen from folks who've already gone through this process with Mayastor, or Kubera Propel, the licensed version, if you're interested in that sort of thing, is really good performance in some types of workloads. This is a mixed read/write benchmark. Some folks at Volterra, a partner of ours who run a lot of stuff in the cloud, did these benchmarks when they were evaluating us, and their finding was, wow, Mayastor does really well in comparison on certain benchmarks. And similarly, we have other partners.
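As an aside, if you ever want to poke at the SPDK layer underneath directly, the first practical consequence of that user-space design is that NVMe devices have to be unbound from the kernel driver before SPDK can drive them. The SPDK repository ships a setup script for exactly this; a sketch, assuming a source checkout of SPDK and root privileges:

```shell
# From an SPDK source checkout: reserve huge pages and unbind NVMe
# devices from the kernel so SPDK's user-space drivers can claim them.
sudo HUGEMEM=2048 ./scripts/setup.sh

# Show which devices are now bound to user-space drivers
./scripts/setup.sh status

# Hand the devices back to the kernel drivers when you're done
sudo ./scripts/setup.sh reset
```

Note that once a device is claimed this way, the kernel no longer sees it, which is the kernel bypass in action.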
We like to partner with the folks who are really building the best ways to run the storage workloads that are out there right now. Percona happens to be one of those partners, and there's a lot of great stuff on their blog as well if you're interested in reading about how they're thinking about Mayastor, SPDK, and DPDK. They've taken a look, and what this image shows is roughly equivalent performance between raw disk, basically, and Mayastor itself. So there's very little overhead in going through a system like this, which is exactly what you want for a storage workload.

So, back to the GitBook. This is meant to be a guide, and all you're really doing is working through it. You'll get some commands like this that you'll plug into your terminal: kubectl commands. You'll do a kubectl apply to apply a set of YAML and install things, and you'll check that everything installed correctly, like this. You might have to enable certain features in your Linux kernel to make sure you're getting the full advantage of this; there are sections of the guide that cover enabling huge pages and checking that you have the right sort of driver for NVMe support, and one or two other things, but that's all covered in the step-by-step.

Once that's done, we get to the fun part, which is benchmarking, which is what this video is here for. I have a slightly higher-res version of this video, and I'm just going to click play. This video shows a benchmark that we ran: the fio test. fio is a common way to benchmark the storage resources available in a cluster or a system. This is a toy cluster that we've deployed on AWS.
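As a sketch of the kind of kernel prep the guide walks through (the huge-page count here is a placeholder, not the guide's exact value):

```shell
# Reserve 2 MiB huge pages; the count is a placeholder, the guide has
# the real one for Mayastor. Requires root.
echo 1024 | sudo tee /proc/sys/vm/nr_hugepages

# Confirm huge pages are actually available
grep -i hugepages /proc/meminfo

# Confirm the kernel sees NVMe devices
ls /sys/class/nvme 2>/dev/null || echo "no NVMe devices visible"
```

The guide also has you verify a couple of kernel modules, so treat these checks as illustrative rather than exhaustive.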
For instance, you can see it's using kops, and it's a very simple AWS cluster. If I skip ahead to where we're actually pushing some work through the system with fio, you can see the kind of data we're getting out: IOPS, megabits per second. It runs through a whole suite for a set amount of time, and at the end of that whole comparison you get this nice graph that covers latency, percentiles, and a bunch of wild stats. So this is a really fun way to put your system, whatever it may be, through its paces and really see how you stack up against the best technology that's out there.

As noted, the demo we just looked at is a toy cluster on AWS, so you're not going to see the most impressive numbers out of the box. But we did partner with Intel and DataStax: Intel provided Optane hardware, some of the best tech out there in the NVMe space, and DataStax, of course, is known for their work with Cassandra, one of the most popular distributed databases out there. So we combined all of this together: Kubera Propel, or Mayastor, which again is our implementation of the SPDK/DPDK technology, plus Intel and DataStax, all working together. We saw some amazing results, to cut to the chase. When we worked through this solution guide with them, we saw some really impressive numbers. This is a view of the Cassandra benchmark, I think with three million writes shown here, and yeah, just like I said, some really impressive numbers.
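For reference, a fio run in the spirit of that demo might look like the following; every parameter here (the target path, block size, read/write mix) is a placeholder to adapt rather than the exact job from the video:

```shell
# Mixed random read/write benchmark against a test volume.
# WARNING: point --filename at a scratch device or file, never at live data.
fio --name=randrw-test \
    --filename=/dev/nvme0n1 \
    --ioengine=libaio --direct=1 \
    --rw=randrw --rwmixread=70 \
    --bs=4k --iodepth=32 --numjobs=4 \
    --runtime=60 --time_based --group_reporting
```

The summary fio prints at the end (IOPS, bandwidth, latency percentiles) is the same kind of data the demo's graphs are built from.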
So we're really excited about the early progress and the promise that this kind of technology shows. And we think that's what this KubeCon is about: looking at these next-gen technologies. We've moved beyond just catching up to running storage workloads in a convenient way; now we're getting the cool new technologies to Kubernetes first. Kubernetes is the first platform to adopt them and have them be accessible, which is even cooler, and it's thanks to that great basis that CSI and operators have provided.

If you want to try this out yourself, OpenEBS is already on marketplaces like the AWS Marketplace, and it's probably on the other marketplaces where you get software for your Kubernetes cluster. Similarly, you should be able to search and see Kubera Propel by the time this video goes live for KubeCon, and Kubera Chaos will appear in the same sorts of venues if you're interested in doing chaos engineering tests on your Kubernetes cluster. And if the max-level benchmark, the coolest of the cool tech that you could really test some of this new technology out on, is interesting to you, you can visit our website. We have a complete guide to running Cassandra on Kubernetes, and on that page there's a link to the guide that I discussed in part of this presentation.

So, thanks for swinging by our KubeCon booth. We're really excited to see all the talks this go-around, and we love seeing the new storage technology come to life and do cool stuff.