Thanks for joining us today for our webinar: a new way to explore your Kubernetes storage options, introducing Kubestr, which is an open source project. Before we get into it, I'm Michael Cade, a senior technologist at Kasten by Veeam. I focus on community, delivering all of the good news about what we do from a product point of view, but also content around what we're doing in that space. Sirish?

Yeah, hi, I'm Sirish. I've been with Kasten for a little over a year now, and I'm excited to be here and tell you a little bit more about Kubestr.

Cool. So to get started, this isn't our first rodeo from an open source perspective. We've had quite an ongoing open source strategy at Kasten. Obviously there's a big why out there around open source at the moment, about being able to leverage and help the community, and there's a huge history behind open source that goes way back. But just to point to some of the areas that we've focused on, contributed to, leveraged, and innovated on: the first one we want to mention is Kanister. This is an open source framework that allows for application-level data management, or application-level consistency, when you're backing up your Kubernetes environments. Then you've got Kopia, which is the underlying framework that allows you to really lift and shift data from A to B in terms of data protection and data management. And then what we're going to talk about today is Kubestr. We'll get into some of the challenges around why Kubestr is a thing, but also into what Kubestr is as well. So, some of the challenges around persistent storage: as we know, persistent storage is growing quite dramatically, quite fast, in the ecosystem.
So when it comes to having multiple choices and flexibility in where you store that application or that stateful data that you require, we need to make sure that we're choosing the right storage. We need to make sure that the storage is fast enough for the application and the workload that we're using that storage for. It's very easy to go and buy the fastest compute and the fastest storage. And yeah, it'll probably be great for your application, but is it needed? Obviously, with that higher spec comes a cost. It's going to cost you more money. Another thing to consider is whether the storage is ready for data protection. Is it snapshot capable? Are you able to do that lift and shift of data so that you've got a point-in-time copy of that data? So if you were to need it, you can recover back into that production storage or potentially into another area. And we always talk generally about how you can't always have the best, the cheapest, and the most economical option all at once; you can't complete all three points of the triangle. So there's always a financial constraint, as we just mentioned, around over-provisioning and having more storage than you can use while still paying for it, or there are technical constraints from a performance point of view. So I think it really comes down to that. There's a lot of choice, and it's really hard to prove which choice is best at the moment.

Yeah, and we're going to see more of that as we go through, right?

And we're going to touch on why and how those choices are blowing up even more. So one of the challenges today is around understanding and benchmarking that storage, rather than just going, I'm going to start provisioning my nodes and my storage.
How can we make sure that our storage is up to speed, up to scratch, up to the performance that we need for that user, right? And this isn't a new concept at all. We've been doing this since the 1950s, when disk drives first came to market, understanding the IOPS and the requirements around that. It's just that in a Kubernetes world, I'm not going to say it's difficult, but it's not as easy to be able to...

Well, the thing is that Kubernetes is pretty recent and young, and there are a lot of people who are now trying to enter the Kubernetes space. Just hitting the ground running can sometimes be a little difficult, especially if you have an application that's running on some legacy infrastructure and you're trying to move that application to Kubernetes. You may not know Kubernetes well, but you know your application's needs up front, right? Because you've already been running this application. So you know your application's needs. How can you prove that Kubernetes can satisfy those needs without the in-depth experience that you might have after using Kubernetes for a few months or a year or whatever, right? So yeah, I think it's a bit of a challenge because it's a new space. But yeah, go ahead.

No, I was going to back you up on exactly that, Sirish. Not everyone's an expert in everything today from a Kubernetes point of view. Things are changing very fast. People are actively having to get applications up and running, and sometimes they're just hitting the easy button on whatever storage is available to them and not necessarily thinking about the consequences of that from a financial point of view, but also from a performance point of view.

Yeah, so there is a need to have something that gives you that information, right?

Yeah, absolutely. And this just goes to cement that, right?
From an entry point of view, there's an ever-growing landscape of CSI drivers out there. This was a recent snapshot that we took of the CNCF landscape. You can see on there that there are X amount of vendors, but if you had taken a snapshot of this six months ago, it was probably half this. And that goes to show some of the enhancements and innovations that are happening around storage in a Kubernetes environment, or as an option for Kubernetes clusters.

Yeah, and like you said, in the past it was just the in-tree provisioners, which are baked into Kubernetes. And then I think there was FlexVolume that came in between, and then the introduction of CSI. Now almost anybody can write their own storage driver for Kubernetes. So like I said, with CSI coming in, I think the number of options for storage in Kubernetes has blown up a lot.

Which is an awesome thing, right? It's great. You've got so many options and the flexibility of being able to choose the correct storage that you need for your applications. But it's not always one-size-fits-all. So you need to have the ability to choose the right storage and have that visibility and understanding of that storage. So then we get on to, well, like Sirish said, we know our users, we know our load, we know what that looks like, we know what our application looks like, but then we've got all of these different choices, not to mention all of the other choices that you have in Kubernetes as well. But let's just focus on the storage and make sure that we're using the right data store. And this is where Kubestr can really help, before we get into it. But the challenge is that you don't know what you don't know, right?
By being able to run samples against those storage types, potentially different storage types and different storage protocols that all have different drivers and different capabilities, you can now understand a lot more about whether your application fits on that storage from a performance point of view, but also whether it ticks the other boxes. Does it enable us to do data protection against it? Does it allow us to do this, that, and the other? They're all common things that maybe aren't being looked at, because people are just diving straight in and using what's available to them.

I mean, like you said, applications can be very different, and so can workloads, because applications need to be able to scale with the number of users you have, right? And the more users you have, that's another point of change, another thing that can change your I/O, right? And that's another thing you need to track, or be able to test: whether your storage is capable of handling growth and stuff like that.

Yeah, and just to reiterate, all of those different applications, the databases, NoSQL, SQL, et cetera, have their own requirements around storage. Some are potentially similar, depending on what the load looks like, but there's also a lot of difference across applications in their storage needs and requirements.

Yeah, like you said, it comes down to different application types, and then different types of users, different user profiles, or even different numbers of users, right?

Yeah. So this leads us on to, well, how do we help? How can we help you understand that storage or explore the options that you have from a storage point of view? And there are three key parts that we'll touch on.
We'll go a little bit deeper into how, and then we'll get to show you what this looks like as well. So the three key areas are: first, let's identify what options we have from a storage perspective within our existing Kubernetes clusters. Second, let's validate that they tick the boxes that we require around snapshots, but also other areas. And third, let's evaluate that. Let's put some benchmarking around it and understand what the capabilities of that storage are in terms of performance and I/O.

So getting into that, let's say you've got various storage options. Let's say that you are one of those users, companies, or businesses that have gone all in on Kubernetes: you've deployed everything and your applications are running. Let's say the storage was chosen and it's working great for you, but maybe you don't understand what's happening under the hood, or whether you're using all of the capabilities that you have available to you on the storage, i.e. performance. And that's important here, because this isn't just a day-zero tool. It's not just applicable to people who are deploying Kubernetes clusters today. You can go back and retroactively scan or identify the storage options that you have within your cluster, understand the performance that's available, but also discover wasted resources in there. So it really gives you good visibility of what's going on in your Kubernetes storage landscape.

So then we get into actually validating that the storage options that we have are in fact configured correctly, but also whether the storage is capable of snapshots, for data protection for example. And in fact, I'll let Sirish talk about the flow here, about how this is an automated flow, right? I know, Sirish, you're probably going to show this in the demo later, but just as a visual, walk us through that bottom bit if you could.
Yeah, so the first thing it comes down to is deploying an application, right? Generally an application is a pod which has some sort of storage or some sort of data, and that's represented by a PVC and a PV, right? So in this case, when we deploy something like what you see down here, it's just an application with a volume attached to it, right? Like I said, if you have experience with Kubernetes, it's very straightforward, you already understand it. And then you also want to be able to validate: hey, can my storage provisioner protect this, right? So Kubestr has a method here where it'll deploy an application, take a snapshot of that application, restore it, and ensure that the data that was written to it originally exists there. It just validates that, hey, yeah, my provisioner is capable of doing end-to-end data protection, right? And the reason this came about is that we definitely saw a number of customers come to us and say, hey, Kasten can't protect my thing because my snapshots don't work. So a tool like this is very useful in figuring out what the issues are in their provisioner setup, or in their storage setup. So I think it's powerful in debugging the kinds of issues that you might face when setting up a certain type of storage.

So I think another thing to add here: for those who are already familiar with deployments, pods, PVCs, and PVs, this will all make sense anyway. But for those new to this space, you run a command, and again, Sirish will probably show you this in the demo. It's as simple as this: you literally throw a command in, and it will go away and run that automated test against that PV, make that snapshot, perform that restore, and then report back to say whether it was successful or not. It automates that process.
Whereas prior to this, the way that benchmarking, or more so the validation in this step, would have to happen is that you would have to go and create that application, that pod, you would have to create that PVC, you would have to create that PV, and all of those steps.

That snapshot, yeah.

And especially if you're new to the space, you're already being filled with other parts of the Kubernetes landscape and world. You probably haven't even got time to delve into the storage, the networking, and everything around that to be able to understand how to do this. Even more so if you've come from an operations point of view. Like me: before, I was very much a storage person. I've always lived in that world, but coming into here, it's a completely different way of thinking about how you would run those validation tests against your cluster. So this is really just simplifying that validation of your storage options.

And honestly, even if you're experienced, sometimes it's good to just have a shortcut that can help validate your storage, right? It takes time to do all those things: to create an application, then to create a snapshot, and then to do a restore. All of those are individual steps, but when you have some sort of command that can do it quickly, it's very helpful. I mean, it's part of my toolbox for day-to-day testing.

Yeah, absolutely. So then we get to the final bubble, or the third headline if you like, and that's around evaluation. How do we understand the performance that you can get out of your storage? I'm going to let Sirish get into the details of this, but we're basically using FIO, the Flexible I/O Tester, to perform this test. So think about that pod deployment that we just spoke about, the application. That's going to be FIO on a lightweight OS. It's going to simulate a given workload, out of the box.
There is a preset one that is available, but another piece here is that you can plug in your own FIO configuration files. In a true Kubernetes world, it's multi-platform. It doesn't really matter: as long as you've got access to kubectl, then you're able to run Kubestr alongside that. So it will pick up the storage that you have. And as Sirish also ended the last slide on, it's a handy little tool in general, not just for evaluation. It's a handy little tool to have in your back pocket to be able to identify, validate, and evaluate that the storage in your environment is up to speed for whatever requirements you have. And in particular here, it's going to enable that benchmarking, but also make it super easy and automated. And I guess, Sirish, what would you say about the bottom bit? Is that different?

Again, it's a lot like the previous slide. You deploy an application, and this is something that's core to Kubernetes, where you deploy an application with some storage. In this case, the application itself is something that tests I/O. And like I said, all of this can be done manually, but we just automated the process. We've given you this application that has FIO, that connects to a PV, that runs this test and then reports results back. And this tool helps you do that in one step, right? Also, it's a very flexible tool. I made it so that you can provide your own FIO configuration. If you understand the type of I/O that your application generates, then it's as simple as providing that FIO config to this function, and then it prints out the results that will help you decide: hey, is this storage good for me or bad for me?

Yeah, super. And again, it's that handy little tool to have in your back pocket, just to have that ability to understand what that storage is doing. But also think about this as a troubleshooting tool, because storage also changes.
Wherever that may be, there could be other impacts on that storage environment. So having this there can pinpoint where that potential issue could be. So with that, Sirish, why don't we jump into a bit of a demo and show everyone how this looks and how it works?

Sure, yeah, I can share my screen. So I'll just run through it. I have a cluster here. It's a GKE cluster. Let's just do a get storage classes here. This is what you'd generally do if you wanted to see the storage options that are available in your cluster, right? And it looks like here we have three different storage classes, and it also gives you some other information about them. Now, this is useful for most cases, but sometimes you want to know a little bit more, right? So that's why, if you run something like Kubestr by itself, as a simple program, it gives you some additional information. For one, some important information, like what version of Kubernetes am I running? But also, for the same two provisioners that we saw, it breaks things down by provisioner and gives you additional details about each one: the type of provisioner, where you can find more information about it, and some features that it supports, right? Apart from that, it also lists the storage classes. If you remember, those are the same storage classes that we saw up here. But additionally, it also tells you the volume snapshot classes that they have, in the case of a CSI driver, which this one is, right? And then this one is an in-tree driver, and an in-tree driver doesn't have a volume snapshot class, but it does still give you a little bit of information about it. So that's the base test. The next test that we said we want to show is the ability to take a snapshot and then restore it, right?
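As a rough illustration of the identify step just shown, the sketch below groups storage classes by provisioner and attaches any volume snapshot class whose driver matches. It is not how Kubestr itself is implemented; the sample data, the class names, and the `group_by_provisioner` helper are assumptions made for illustration, shaped like the JSON that `kubectl get storageclass -o json` and `kubectl get volumesnapshotclass -o json` return.

```python
from collections import defaultdict

# Hypothetical sample data, shaped like `kubectl get storageclass -o json`
# and `kubectl get volumesnapshotclass -o json` output (trimmed to the
# fields used here). The class names are illustrative only.
STORAGE_CLASSES = {"items": [
    {"metadata": {"name": "standard"}, "provisioner": "kubernetes.io/gce-pd"},
    {"metadata": {"name": "premium-rwo"}, "provisioner": "pd.csi.storage.gke.io"},
    {"metadata": {"name": "standard-rwo"}, "provisioner": "pd.csi.storage.gke.io"},
]}
SNAPSHOT_CLASSES = {"items": [
    {"metadata": {"name": "example-snapshot-class"}, "driver": "pd.csi.storage.gke.io"},
]}


def group_by_provisioner(storage_classes, snapshot_classes):
    """Return {provisioner: {"storage_classes": [...], "snapshot_classes": [...]}}."""
    report = defaultdict(lambda: {"storage_classes": [], "snapshot_classes": []})
    for sc in storage_classes["items"]:
        report[sc["provisioner"]]["storage_classes"].append(sc["metadata"]["name"])
    for vsc in snapshot_classes["items"]:
        # A VolumeSnapshotClass names its CSI driver in the `driver` field;
        # it applies to storage classes whose provisioner is that driver.
        if vsc["driver"] in report:
            report[vsc["driver"]]["snapshot_classes"].append(vsc["metadata"]["name"])
    return dict(report)


if __name__ == "__main__":
    for provisioner, info in group_by_provisioner(STORAGE_CLASSES, SNAPSHOT_CLASSES).items():
        print(provisioner, info)
```

In this sample the in-tree provisioner ends up with an empty snapshot class list, which mirrors the point made in the demo that an in-tree driver has no volume snapshot class.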
So I'll run that test right now. Let's pick the premium storage class here, and there's only one volume snapshot class for us to deal with, so let's just pick that one. Like the previous slide said, what this is doing first is creating an application. And what an application has is a pod, a PVC, and a PV. In this case, the application itself is writing some data to this PV, to this volume, right? It's probably writing a date string or something like that, something to identify some bit of data. And once that application is up and ready, like I said, with a pod and a PVC created, then we take a snapshot. This will take a few seconds. But the point of this is essentially to validate that the snapshot functionality of your provisioner works, right? So it looks like we've taken a snapshot and now we're restoring the application. Once the application is restored, we validate that the data inside it matches what we wrote initially, and then that's success for us. It'll tell you that, hey, your storage is set up properly, in another couple of seconds, maybe.

Sirish, what would that look like if you wanted to check that manually?

So first you would have some sort of a deployment, or an application, right? You'd set up a pod, so you'd have a YAML representation for a pod and a YAML representation for a PVC, and those two together create your application, right? After that, you'd have to create a YAML representation for some sort of a snapshot. And what that snapshot YAML says is: hey, what is the PVC that I'm trying to snapshot? What is the snapshot class that I want to use? I think that's the base information that you need. You use that information to create that YAML, and then what that'll do is take a snapshot, right?
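To make the manual steps concrete, here is a minimal sketch that builds the snapshot manifest just described (which PVC to snapshot, which snapshot class to use), plus the claim that would later restore from it. The field names follow the Kubernetes `VolumeSnapshot` and `PersistentVolumeClaim` APIs; the helper functions, object names, namespace, and size are hypothetical, and the sketch assumes a cluster that serves the `snapshot.storage.k8s.io/v1` API (older clusters may only serve a beta version).

```python
import json


def snapshot_manifest(name, namespace, pvc_name, snapshot_class):
    """VolumeSnapshot: names the PVC to snapshot and the snapshot class to use."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",  # assumption: v1 API is served
        "kind": "VolumeSnapshot",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }


def restore_pvc_manifest(name, namespace, snapshot_name, storage_class, size):
    """PVC whose dataSource points at an existing VolumeSnapshot (the restore)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
            "dataSource": {
                "apiGroup": "snapshot.storage.k8s.io",
                "kind": "VolumeSnapshot",
                "name": snapshot_name,
            },
        },
    }


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    snap = snapshot_manifest("demo-snap", "default", "demo-pvc", "example-snapshot-class")
    restore = restore_pvc_manifest("demo-pvc-restored", "default", "demo-snap", "premium-rwo", "10Gi")
    print(json.dumps(snap, indent=2))
    print(json.dumps(restore, indent=2))
```

Since `kubectl apply -f` accepts JSON as well as YAML, the printed manifests could be saved to files and applied one after the other; a pod mounting the restored claim would still be needed to check the data.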
And what you will see is a volume snapshot object being generated, right? Once you have that volume snapshot object, you can create another YAML, call it a restore YAML. That restore YAML will need this volume snapshot object. The restore YAML will look a lot like another pod and a PVC, except that in the PVC section it says: hey, I want you to create this application using this existing snapshot. Then it'll take that existing snapshot, create this PVC with the old data, and bring up your application, right? So that's what a restore looks like. It's definitely three or four different steps, but this gives you a quick and easy way to validate that the entire workflow is successful, right?

With just one command, right? That's the key part to this. It's just taking away another potential pain point of having to run this. And I know the next demo gets into the performance, the benchmarking of it, but it's a very similar flow, right?

It's very similar. So here we could run an FIO test. Let's use standard for this one. So this is the standard FIO test. I could just do a help here to give you some idea of the options. Like I said, you could pass in an FIO config file. You could also change the size of your storage, and you'll notice that for several of the storage providers, the size matters. Sometimes the bigger the volume you use, the faster your FIO test performs, right? So there are a couple of different things that you can modify here, but let's run this with the defaults for now, just to see what we get. Like I said, the first thing we do is create a PVC, and then we also create a pod, which completes our whole application, right? So that's an application which now has FIO on it, attached to a volume. It looks like it's taking some time because it's a 100 gig PVC, but yeah, it creates a pod.
And now it's running an FIO test. This is the default FIO test that we have. Our default test is actually a set of four jobs: random writes and reads at 4K and then at 128K. In my experience, it takes about a minute and 20 seconds or so. What it's doing is, like I said, running a pod that has FIO as the application, and it's running FIO against this PVC that's mounted somewhere. Now, in terms of what it would take to do this on your own, you would get a pod which has FIO installed on it as the application. You could create this application using a pod and a PVC. Once you have that application there, you could exec into this pod and run an FIO test within the pod itself. But again, that's a couple of extra steps, right? And if you're new to Kubernetes, you may not want to take those steps. Maybe you just want to say, hey, can I run a simple program and figure out whether this storage is right for me or not? Like I said, it's a handy tool, something that may be useful in your toolbox as well.

Yeah, so one of the things you touched on just before you kicked it off, Sirish, was the options around the size, right? And this allows us to test against that. But if you noticed, right at the beginning of the session we also touched on what the compute nodes, or the worker nodes, look like, and being able to choose them, because especially in the public cloud, that potentially dictates what the storage can do and where there's a ceiling on the disk underneath, right? So being able to run this gives you that visibility and understanding. It might mean that to get more throughput, more IOPS, you need to move up to another compute node or compute type to get better performance.

Better results, 100%.
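For readers who have not seen an FIO job file, the sketch below generates one with the same shape as the default test described in the demo: four jobs covering random reads and random writes at 4K and 128K. It is an approximation for illustration, not Kubestr's actual default configuration; the `fio_job_file` helper, the job names, and the runtime, size, and queue-depth values are all assumptions. The option names themselves (`ioengine`, `direct`, `time_based`, `runtime`, `size`, `iodepth`, `rw`, `bs`, `stonewall`) are standard fio options.

```python
def fio_job_file(runtime_seconds=15, size="2G", iodepth=64):
    """Build an fio job file with four jobs: randread/randwrite at 4k and 128k.

    The default values are illustrative assumptions, not Kubestr's settings.
    """
    lines = [
        "[global]",
        "ioengine=libaio",   # Linux asynchronous I/O engine
        "direct=1",          # bypass the page cache
        "time_based",        # run for `runtime` even if `size` is reached first
        f"runtime={runtime_seconds}s",
        f"size={size}",
        f"iodepth={iodepth}",
        "",
    ]
    for pattern in ("randread", "randwrite"):
        for block_size in ("4k", "128k"):
            lines += [
                f"[{pattern}_{block_size}]",
                f"rw={pattern}",
                f"bs={block_size}",
                "stonewall",  # wait for the previous job so jobs run one at a time
                "",
            ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(fio_job_file())
```

The small block size jobs are typically read for IOPS and the large block size jobs for bandwidth. A file like this could be handed to the tool through the FIO config file option mentioned in the demo; the help output lists the exact option name.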
And I've actually written a blog post about this, where we talk about the various ways that you can configure your entire Kubernetes environment to get the most out of your storage, right? There aren't just those two; there are multiple other ways too. Even the types of nodes matter, not just their size. Whether the nodes themselves are shared or dedicated has an impact on your performance too, right? I think we'll share that down the line, but that blog covers a little bit more on all those different options. But yeah, you see here the results of, like I said, the four different jobs that ran. And it gives you some information. Like I said, this information can be output as JSON; that way you can parse it more easily and create your own reporting tools around it. But this is just what I output for now for this one test with four different jobs. And with that, I'll give it back to you, Michael.

Yeah, for sure. A nice, quick, easy, simple way of being able to check your storage, right? So I'll bring back the slides. And this goes back to where we were before, in choosing your cloud storage and understanding your storage. When you get into the options, especially in the public cloud, you've got different compute platforms: you've got optimized compute in some areas, you've got memory-optimized compute, GPU options, et cetera, which all play a part in what the storage does. And then you've potentially got premium or standard, managed this or managed that, depending on which cloud provider you're using. You've potentially just walked into a huge minefield of having to understand what this all needs to look like. So you need to pick the right type of storage, whether it's SSD or hard disk, and in some cases whether the volume size makes a difference, like we just mentioned.
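As an example of the kind of reporting the JSON output makes possible, the sketch below summarizes per-job IOPS and bandwidth. It assumes the layout that fio itself produces with `--output-format=json`, where each entry under `jobs` has `read` and `write` sections with `iops` and `bw` (in KiB/s). Kubestr's own JSON output may wrap or rename these fields, so treat the structure, the `summarize` helper, and the sample numbers as hypothetical.

```python
import json

# Hypothetical sample in the shape of fio's JSON output, trimmed to the
# fields used below. The numbers are made up for illustration.
SAMPLE = """
{"jobs": [
  {"jobname": "randread_4k",
   "read":  {"iops": 3072.0, "bw": 12288},
   "write": {"iops": 0.0, "bw": 0}},
  {"jobname": "randwrite_128k",
   "read":  {"iops": 0.0, "bw": 0},
   "write": {"iops": 400.0, "bw": 51200}}
]}
"""


def summarize(fio_json_text):
    """Return (job name, direction, IOPS, MiB/s) for each direction that did I/O."""
    data = json.loads(fio_json_text)
    rows = []
    for job in data["jobs"]:
        for direction in ("read", "write"):
            stats = job[direction]
            if stats["iops"] > 0:
                # fio reports `bw` in KiB/s; convert to MiB/s for readability.
                rows.append((job["jobname"], direction,
                             round(stats["iops"]), round(stats["bw"] / 1024, 1)))
    return rows


if __name__ == "__main__":
    for name, direction, iops, mib_per_s in summarize(SAMPLE):
        print(f"{name:16} {direction:5} {iops:>8} IOPS {mib_per_s:>8} MiB/s")
```

Rows like these could be collected across storage classes, volume sizes, or node types to compare the configurations discussed in the session.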
Another thing to keep in mind: the nodes that drive the I/O also affect what that storage ceiling looks like, whether they're shared or dedicated, like I mentioned, or whether they're premium or standard. The reference we're using here is Azure, but literally every cloud has its own options, which is great because we've got the flexibility and choice, but it's also a headache, especially when you're trying to get the right storage and the right compute, and then also trying to make sure that your application has all the correct configuration as well. There are a lot of things happening, and you just want to hit the easy button, or at least try to hit the easy button, when it comes to storage.

Yeah. I mean, at least when it comes to figuring out storage, we're hoping that Kubestr will help you reach that goal, boiling down all these options into one that works for you.

Yeah, exactly that. That's a really good point. So you say, well, how do you get started? Where can we find Kubestr? You've got two QR codes, and we'll make sure that there are links in the description that enable you to get there. But ultimately kubestr.io is first and foremost, probably the best place, and then the code repository is over on GitHub. And you can see some of the options that Sirish went through over on the left-hand side; this can all be found on the site at kubestr.io as well. Is there anything you want to add there, Sirish?

No, yeah, I mean, feel free to check it out. I hope it's useful.

Yeah, I was even able to get things up and running in my home lab, so it's super easy to get going. If you're familiar with Kubernetes, this is going to be a walk in the park. If you're just getting started, it's really not going to be difficult to get going at all.
Yeah, I think the biggest thing is that if you have Kubernetes set up and you've used kubectl at all, then you have enough tools in your box. I mean, you have enough to get started, just put it that way. That's all you really need to know.

So then that brings us to the end of the session. So, the goals of the project. This probably comes better from you, Sirish, but I'll walk through them and you add anything that you want to add. First of all, you've probably realized by now how this project helps make benchmarking and validating your storage easy. Hopefully from the demo and from walking through the slides, you can see that there's a passion there, and the easy button to make that happen. We've mentioned a few times that it's a handy set of tools to debug and validate your storage. And also, obviously this is open. So in the future, there are plans to allow users to post their results and compare them, to ultimately make that easy button even easier. Because if someone else has already run something against something, and you've already got the specs of the workload that's required, it'd be pretty awesome if we could just go and reference something to understand what that needs to look like, rather than all having to go and fish at the same time to see what it looks like on a paid-for environment. Let's just look at the stats that have already been gathered by the community, so that we can help each other make the best decision when it comes to storage options as well.
Yeah, that's one of the plans for the future: to get the community involved and to get more results. Like I said, you don't want two people fishing for the same thing. If somebody's already done it, and you have an idea of what your application's needs are, then it'd be nice to have a place where you can go and say, yeah, these are my different options, and this is how my application would fare in different environments, right?

And what would be really cool is if storage vendors, who are already very much embracing CSI drivers for themselves, were able to run those same tests, and cloud providers did the same with their storage options, so that we start to build up a good list of storage options out there. I think that would be an awesome way to develop this and increase that community footprint.

And another thing you mentioned is the storage providers themselves. Like I said, maybe the test that I ran isn't optimized for Google, right? Maybe Google has their own set of tests, with given parameters, that they deem their storage runs best on. So it would be nice to get that information as well: their FIO tests, what do they recommend? What kinds of applications do they recommend? Those are the kinds of things that I can definitely imagine FIO capturing, right? So if you're using Google, your application might work better if it looked like XYZ, or something like that.

Yeah, that's an awesome point. FIO is just what's been used here to gauge that performance. But that application, in theory, could be a bring-your-own application that enables you to do something else, right? That's the beauty of this. So I think to wrap things up, we've already seen these three things.
So, identify: let's understand the various storage options that we have present in the cluster, whether it's day zero or whether you've got a Kubernetes cluster that's been going for weeks, months, or years. You can still use this tool there to help you identify potential problem areas with your storage. Then validate: let's make sure that the storage is actually configured correctly, whether snapshots are enabled, whether we can do something with that storage, and basically what options we have available to our cluster. And then finally there's evaluation: understanding what the storage performance can look like by running it against tools like FIO, out of the box. So, anything you want to add there, Sirish?

No, it looks good. I think it'll help you do all of these things, right? And down the road, as we get more community involvement, I'd like to see what else we can support. How else can we grow Kubestr to satisfy all your Kubernetes storage needs?

Yeah, absolutely. I'm excited to see how this can grow and how it can help achieve more around the Kubernetes storage space.

Yep, and how it can really help newcomers get into the space, get running, and get their feet on the ground.

Cool. Okay, so with that, thank you again for watching the webinar. Hopefully we've left you with all of the relevant places that you need to visit to find out more or see more. kubestr.io is probably the first port of call. But yeah, hopefully that was useful. Thanks a lot.

Yep, thank you.