Okay, thank you everyone for joining us today. Welcome to today's CNCF live webinar, Kanister: application-level data operations on Kubernetes. I'm Libby Schultz. I'll be moderating today's webinar, and I'm going to read our code of conduct and then hand over to Michael Cade, Senior Technologist, and Pavan, Member of Technical Staff, both with Kasten by Veeam. A few housekeeping items before we get started. During the webinar you're not able to talk as an attendee, but there is a Q&A chat on the right-hand side of your screen. Feel free to drop your questions there and we'll get to as many as we can. This is an official webinar of the CNCF, and as such is subject to the CNCF code of conduct. Please do not add anything to the chat or questions that would be in violation of that code of conduct, and please be respectful of all of your fellow participants and presenters. Please also note the recording and slides will be posted later today to the CNCF online programs page at community.cncf.io under Online Programs. They will also be available via the registration link you used today, and the recording will be available on our online programs YouTube playlist. With that, I will hand it over to Michael and Pavan to get started and kick off today's presentation. Yeah, so this session is going to be focused on an open source project called Kanister. Just before we go into that, let's go through our introductions again. So again, I'm Michael Cade. I'm a Senior Technologist; I basically sit within our product strategy group within Veeam. There was obviously the recent acquisition of Kasten, focused on Kubernetes data management. That's not what we're talking about today, but Kubernetes and cloud native has been purely my focus for the last six to eight months, and before that I'd been tinkering around, learning a little bit more about this space, for the last two or three years.
So that's me in a nutshell. Pavan? Hey, y'all, I'm Pavan. I'm a Member of Technical Staff here at Kasten by Veeam. For the last three and a half to almost four years now, I've been solving data management problems for mostly stateful apps on Kubernetes, and contributing to open source projects related to data management, one of them being Kanister, which we are discussing later today. Awesome. And Pavan's going to be the one focused on showing you the deep dive into the open source project, but also showing you how it works and what it does. I'm going to go into a little bit of the challenges and the issues that we have from a broader data management space, and then how Kanister can help with some of those issues. So if we go to the next slide, Pavan. And really what I want to focus on here is that data management challenge around cloud native first. I think from a day-one point of view, Kubernetes and cloud native deployment is relatively mature: it's quite easy to go and deploy your Kubernetes clusters and your applications out there. Where we're seeing more of the challenges is around day two, whether that be data management, security, or observability, and data management is definitely one of the most topical challenges that we have within our space at the moment: protecting that data, having visibility into it, managing it, and securing it. From a data management point of view that really focuses on things like backup, but not just backup; obviously the recovery of that data, and disaster recovery, because things still happen on this platform, as well as being able to move those applications quite freely from one cluster to another, or from one location to another as well.
And then as we look into the different applications that we have, especially around stateful workloads, we're going to have that level of responsibility to protect the data held in different data services. Now, I'm not saying that we don't have these data services on other platforms, but there's a different way of looking at data services when it comes to Kubernetes: how those data services are provisioned, and how they're then protected from a data management point of view. On top of that, there are different storage technologies as well: we have our in-tree provisioners, we have our CSI drivers, different ways of storing our workloads and our data within, or even outside of, the Kubernetes cluster. So some of the things we have to consider are these layers of operations, which are broader than what we maybe once had in a virtualization world, for example. We're going to quickly go over some of those areas, and then I'll get into a little bit more around some of the options that we have when it comes to data management. So first and foremost, and this really applies to any of our platforms, whether it's virtualization (as I know a lot of us are on this learning journey around cloud native and Kubernetes), a cloud-based environment, or Kubernetes itself: you're going to start with some physical storage, a layer of storage where you're going to store your data. That will be the spinning disk, the flash, the NVMe that is propping up the infrastructure underneath; it's going to be running somewhere. And then, similarly, regardless of platform, you're going to have a block, file, or object storage offering.
And this will be the presentation layer: whatever platform you're running, you're going to have that presentation layer that takes the physical storage and presents it through to wherever that platform may be. These are the two very generic areas that we're going to see across all of our platforms, regardless of where they are. Now, where it becomes relatively different is when we start talking about data services, and then even more so when we get to stateful applications in the next bit. For data services, for the most part, when we think about databases we think about NoSQL and SQL, but we also have to think about the messaging queues and those batch processing workloads that hold state we have to consider and protect. At that point we start looking at how we protect that data, and how we do it specifically for that particular data service, whether it be MongoDB, MySQL, or any of the others that you see pictured there. The other interesting fact here is that the data service doesn't necessarily have to run within the Kubernetes cluster or the cloud native cluster itself. It can be an external data service such as Amazon RDS, but we still have to protect that workload, and we then have to have a better understanding of the application that is using that data, so that we can protect the data and the application together. And that leads us on to the fact that, especially in the Kubernetes world, a stateful application is made up of many different pieces: config maps, secrets, services, application pods, deployments, stateful sets, and so on. They all play a huge part in making sure that we have that scalability, and in how we leverage that stateful application.
So there are a few differences when it comes to not only the data services but the stateful application itself. If we were to look back at virtualization, for example, we would generally see one virtual machine with one data service, and we would then look to protect that whole virtual machine in one go. Now we're having to look at various different moving parts to protect everything, as well as the data service itself, because they all potentially reside in different locations, but they serve a purpose in terms of the scalability and availability of that workload. If we go down to the next one, Pavan. And that leads us on to the next challenge. So one challenge is about choosing where you want to store your data, the choice of data service that best suits the application that you're running. But then this leads us on to: okay, now we've got different flavors of data management, and we need to understand what level of protection is enough. That's what we're going to talk about over the next couple of slides. First and foremost, we need to be able to take a backup, because we need to be able to restore if and when there is a failure scenario. Now, what that failure scenario looks like, and what we can withstand within the environment, very much determines what level of protection you potentially need, along with the importance of that data. And it's not going to be one size fits all. When it comes to Kubernetes data management, you're going to potentially have different applications that require different SLAs and different service requirements when it comes to how we get that application back up and running as fast as possible if failure scenario A, B, or C were to happen.
And then we've got added options around this, because given the nature of cloud native workloads, we've got the ability to quite easily lift and shift and migrate those workloads and applications from one cluster to another. That might be based on a failure again, but it also might be based on migration, performance, scalability, or different options within certain public clouds. Then, in the same vein as application mobility, we have the disaster recovery use case. Fire, flood, and blood all still happen from a platform perspective, whether it's Kubernetes or any platform for that matter. So we have to consider that as well from a data management point of view: what happens if the worst was to happen in our production site and our production cluster was no longer available? What does our disaster recovery plan need to look like, and how do we get that data from A to B so that we can have business continuity and keep things up and running? Then, as if that wasn't enough, we have the added complexity around compliance requirements and regulations about keeping data: what data are we keeping, whilst all the time trying to keep this freedom of choice, making sure that we're choosing the right tool for the right job and the right platform for what we need from an application point of view, but ultimately keeping an agnostic approach to protecting our data as and when we need to. Now let's go through some of these options for data management, and I'll touch on some of the benefits of them, but also, the best way to put it, some of the pitfalls of some of those approaches. If we go to the next slide, please.
So first and foremost, I would say that a large majority of that underpinning physical storage will carry some capability of leveraging storage-centric snapshots, regardless of what platform we're running on; but let's say we're looking at cloud native and Kubernetes again now. This is basically leveraging the underpinning storage system with no hook into the application itself. It's very similar to pulling the power out of your desktop PC: when it comes back up, it's hopefully going to look and feel exactly like it did as it went down, but it's a very dirty way of taking a point-in-time, crash-consistent copy of that data. Now, depending on your data service, that might be sufficient; that might be enough to give you a really fast recovery point, if your application can withstand that process of taking a point-in-time crash-consistent copy. But obviously it's very dependent on the application and the file system on which your data is running. Now, this is going to be the fastest option, because it's literally going to be a copy of the changed blocks since the last snapshot, and it's going to give you a very fast way of recovering those blocks. But the big "but" with this is that it lives on the same storage system as your production. So yes, it's a point-in-time copy, and it's crash consistent, which is great if your application can withstand that, but it's not going to give you any transactional level of granularity when recovering that data. Plus, if your failure scenario is that your storage system is no longer serving data, then your storage snapshots are also no longer available. So the word of warning here is that this could maybe be used in conjunction with some other methods.
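In Kubernetes, a storage-centric snapshot like the one described here is typically expressed through the CSI snapshot API. The following is a minimal sketch; the snapshot class, PVC name, and namespace are placeholders for illustration, not names from the demo:

```yaml
# A crash-consistent, storage-level snapshot of a single PVC.
# No application hook is involved: the data service is not frozen first,
# so this is the "pulling the power out" style of copy described above.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: mongodb-data-snapshot              # hypothetical name
  namespace: mongodb
spec:
  volumeSnapshotClassName: csi-snapclass   # depends on your CSI driver
  source:
    persistentVolumeClaimName: datadir-mongodb-0   # hypothetical PVC
```

Note that the resulting snapshot object still lives alongside the production storage system, which is exactly the limitation called out above.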
But if this is enough, and maybe all you need is a day's worth of protection, because ultimately you don't need an offsite copy of this data as the data is only really used in one situation, this might be a valid way of protecting that workload. So let's go down to the next one. This is where it starts to get interesting, and when Pavan is talking about Kanister later, this is really where the first hook comes in from a Kanister point of view. This is the same as what we just said about storage snapshots; however, now we're actually going to speak to the application and the data service, and we're going to put a hook in there so that we make this at least application consistent: we're going to freeze and flush the data services layer, initiate the storage-layer snapshot that we just spoke about, unfreeze the data services layer, and then record the completion and the status of that snapshot. Again, that's going to give us a really fast recovery point, but now we've got the added benefit of it being a bit more consistent. However, we've still got the same problem: it's on the same storage as production. So at this point, we probably want to start thinking about how we move that data away from that production storage system and onto a different media type, so that we've got a copy away from that production storage system. So then, if we start looking at how we do that, this is where we start thinking about the data-service-centric point of view: how we take a copy of that data and then potentially store it in a repository such as object storage, NFS, or a file-based location, just somewhere different to where our production workload resides. So this is then starting to focus on the database; it's starting to focus on the data service that we have within there.
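For a concrete feel of the freeze-flush-snapshot-unfreeze sequence described above, here is a rough sketch using MongoDB's own fsyncLock/fsyncUnlock commands. This is illustrative only, not the exact hook Kanister implements; the manifest file name is a placeholder, and in practice error handling would be needed so the database is never left locked:

```shell
# 1. Flush writes to disk and block new writes (application-consistent freeze).
mongosh --quiet --eval 'db.fsyncLock()'

# 2. Take the storage-layer snapshot while the database is quiesced.
#    (Placeholder: e.g. a CSI VolumeSnapshot of the volume backing the data dir.)
kubectl apply -f mongodb-volumesnapshot.yaml

# 3. Unfreeze so the database resumes accepting writes.
mongosh --quiet --eval 'db.fsyncUnlock()'
```

The snapshot itself is still storage-centric; only the bracketing hooks make it application consistent.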
The key part to this, and this comes down to recovery, is that it has no dependency on the underpinning storage. What it can give you, though, is a level of complexity when it comes to recovery, because as much as this gives you the database-specific tool set that allows you to protect that data, so mysqldump, or Postgres's pg_dump, or you name it, there's probably a built-in tool for your data service that lets you take a dump of that data, and with that we've got a copy of the data, when it comes to recovering it, we've only got a copy of the database. We don't have the surrounding aspects of the application that maybe pull on or use that data. And that's where the next layer comes in, where we want to be application centric. This is about being able to capture everything under the application banner, as it were: the front end, the back end, but also the data service that we want to be able to leverage and restore from. What this allows us to do is have that freedom of choice, but freedom of choice when it comes to recovery. I want to be able to use those fast, application-consistent snapshots if it makes sense and the failure scenario doesn't involve an outage of my storage system. But if not, I want to be able to work through and have an understanding of what that whole application looks like, especially when we look at Kubernetes, where you could have hundreds of different pods and hundreds of different persistent volumes and claims around that which hold that important data. This will give us that level of consistency, the flexibility of picking and choosing what we actually need to recover, and the granularity around that. And I think I then summarized some of these bits on the next slide. So there are four options; there's actually one more that we could have gone into, around a dirty read and that aspect.
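A data-service-centric backup of the kind described here is essentially the database's own dump tool streamed to storage outside the cluster. A minimal sketch with mongodump; the host, bucket, and path are made-up examples, and the AWS CLI is just one way to land the stream in object storage:

```shell
# Dump the replica set to stdout and stream it straight to object storage,
# so the backup never lands on the production storage system.
mongodump \
  --host my-release-mongodb-0.my-release-mongodb-headless \
  --archive --gzip \
| aws s3 cp - "s3://example-backups/mongodb/backup-$(date +%F).archive.gz"
```

The equivalent for PostgreSQL would be pg_dump, and for MySQL mysqldump; the trade-off, as noted above, is that you capture only the database, not the surrounding Kubernetes objects.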
But ultimately, it will depend on what that data service looks like as to what your data management strategy looks like for your workloads. Maybe a storage-centric snapshot approach is enough: it's going to give you a copy of that data and a very fast recovery point, but you don't have the requirement to have that offsite, or at least on a different media type away from production, just in case the production storage were to fail. Then, if it's an application that requires that freeze-and-unfreeze, pre- and post-snapshot type of operation, we've got the ability to look into that; is that the capability our strategy needs? Then, if we do need to take it out of band and onto a different storage layer, we can do that by taking a copy of that data and storing it in object storage or a file system external to the production storage. And then a full-blown overview of your whole application is really about how I protect the whole application at the same time, to give us all of the options around flexible recovery. Now, that hopefully sums up the data management challenge that we have, both from a wider platform point of view and from a Kubernetes and cloud native point of view. But I think what we should do now is take another look into the need for application consistency. And with that, I'm going to hand over to Pavan, who's been heavily involved in the Kanister project, and he can explain in a lot more detail than I can what Kanister is, what it does, and how you can get involved. So we just discussed different layers of data protection, and also the different shortcomings of crash-consistent or storage-centric snapshots. If you go back and look at some of our earlier slides, I think Michael had a big "but" with storage-centric snapshots.
They don't really talk to the application, and they are not aware of what's happening in the application itself. So when you think about those things, the need for application-consistent data management arises. And there can also be other requirements: we have some data services running and we want to use the internal tools that the data service provides, like mysqldump or pg_dump, etc. Or we have external data services like Amazon RDS, and we want to protect those. Or, if our data service is in the form of an operator, most likely the data management comes in the form of CRs, Kubernetes custom resources, and we would want to be able to protect those as well. At the same time, I think we discussed some of the hooks that we can have to freeze and unfreeze the data services. And finally, we could also have more advanced scenarios; an example here is MongoDB secondaries: if we have a replica set with multiple nodes, we would want to take the backup from the secondaries, and things like that. So these are some of the needs that we have learned about while developing Kanister. Apart from that, I think the protection workflows are also complex. Different folks are working on different sides of things: you have Kubernetes cluster admins, and application developers, and database administrators. They all have the same requirement of protecting the application, but they don't have the same expertise; a cluster admin may not always know the internal workings of a database. How do you generally put together all these concerns and have a single way of protecting all kinds of applications? So these are some of the complex workflows. Then, once you have figured out how to protect an application, you have different moving parts in terms of infrastructure.
You could be using an object store, a vendor target, or file storage for your backups. Then again, we spoke about types of backups: someone may want to use logical dumps or logical backups of the data service, while someone else would want to use volume snapshots. While doing that, we would also want to handle the lifecycle of the application: what if the workload is up, or it's down, during the backup? When do we need it to be running, and when do we need it to be frozen or scaled down, in terms of Kubernetes workloads? So bringing this all together, thinking about all these requirements and workflows, we came up with Kanister to put all of these together in one framework, to allow different users and different goals to be accomplished using a single mechanism. As it says here, we want to capture different requirements from the different experts, across the infra team, the developers, and the database admins; we want to provide a common way to perform backup and recovery tasks across these teams; and we want them to be able to share their workflows with each other and extend them if they need to. So bringing this all together, Kanister is a tool that allows these things to work seamlessly in a Kubernetes-native way, with a standardized API that can be used to do them. So let's actually move into Kanister and discuss it in more detail. What is Kanister? It's a framework for application-level data management. It's mostly made up of four main components: the Kanister controller, Blueprints, ActionSets, and Profiles. The Kanister controller is based on the Kubernetes operator pattern; it's mostly responsible for the state management of the custom resources that we have here: the Blueprints, ActionSets, and Profiles. A Blueprint, like we discussed, defines the workflows for your backup, restore, or delete operations. It could be other operations as well, which we'll see later.
But mostly, if we want to define backup workflows for a particular data service or a particular workload on Kubernetes, Blueprints are used for that. Now, once we have operations and workflows defined in Blueprints, we use ActionSets to run those actions, or mostly to inform the Kanister controller which action to run, from which Blueprint, on which workload, and so on. Finally, we also have Profiles. These are mostly used to define the target destination for our backups, or they can also be used to define the source for our restore operations. Apart from the components we just discussed, we also provide a couple of command-line tools along with Kanister. kanctl is a small tool that we can use to create the CRs that we discussed; mostly, ActionSet and Profile CRs can be created using kanctl. We also have a tool called kando. This is mostly used to move data to and from an object store location, so it's generally used inside Blueprints and requires a specific container, the kanister-tools image, to run. So we have seen all these different components; now we can go through some examples and dive deep into some of them. What we see here is an example Blueprint. It's a simple Blueprint; I haven't added very complex workflows here. As I described earlier, a Blueprint is used to tell the Kanister controller how to back up or restore an application. This is done through actions, and these actions contain one or more phases. As we see here, each phase can have a Kanister function. That is a primitive that we use to execute, let's say, bash or shell scripts, or it can also be used to take volume snapshots and things like that. I'll cover Kanister functions in a moment, but let's go through this example. What we see is a MongoDB Blueprint, and the main action shown here is a backup action.
And here, the output artifacts are mostly used to store state from a backup action. Once we execute any backup action, we want to store some state; for most of the data service backups that we have here, that will most likely be a path in our object store. So the path we see here is the path where we can find the backup inside our object store bucket. I'll cover how we provide that, but that is what we are storing here. In the phases we see here, there is a single phase, and the Kanister function called KubeTask is used. The function spawns a pod in the namespace that is provided, with a container of the image that is also provided, and finally executes the bash command that we have provided. In the command here, you can see we are using mongodump to capture the data from the MongoDB replica set. So this is a simple Blueprint. Now, once we have the Blueprint defined, how do you tell the controller that we want to execute a particular action from a Blueprint? That's when we deploy an ActionSet. Again, if you look at the spec of the ActionSet, it mostly contains details about what action to run, from which Blueprint, and the subject for the action. The subjects are mostly Kubernetes resources, and then finally we also have a Profile that can act as a source or destination for that operation. So here in this example, we are selecting the backup action from the Blueprint which we just saw, the MongoDB Blueprint. Then we are selecting a resource to run the action on; that is the StatefulSet of the MongoDB replica set, and it's assumed that it's deployed in the namespace mongodb. The example Profile is being used. Once this ActionSet is submitted to the Kanister controller and the action itself is executed, the controller then sets the status section of the ActionSet. It updates whatever information we have provided in the output artifacts in the Blueprint, if you remember that.
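The Blueprint being described looks roughly like the following. This is a simplified reconstruction of the slide example rather than the exact manifest; the image tag, host argument, and object-store path are placeholders:

```yaml
apiVersion: cr.kanister.io/v1alpha1
kind: Blueprint
metadata:
  name: mongodb-blueprint
actions:
  backup:
    # State handed back via the ActionSet status after the action completes:
    # here, the object-store path where the dump was written.
    outputArtifacts:
      cloudObject:
        keyValue:
          path: "/mongodb-backups/{{ .StatefulSet.Name }}/dump.gz"
    phases:
    - func: KubeTask          # spawns a pod and runs the command below
      name: takeConsistentBackup
      args:
        namespace: "{{ .StatefulSet.Namespace }}"   # Go template, rendered per object
        image: ghcr.io/kanisterio/mongodb:0.68.0    # placeholder image reference
        command:
        - bash
        - -c
        - |
          mongodump --host "{{ .StatefulSet.Name }}" --archive --gzip \
          | kando location push --profile '{{ toJson .Profile }}' \
              --path '/mongodb-backups/{{ .StatefulSet.Name }}/dump.gz' -
```

The Go templates (`{{ .StatefulSet.Namespace }}` and friends) are what let one Blueprint serve many deployments, as explained later in the walkthrough.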
It stores that artifact value there, and at the same time it also shows us the progress of each of the phases from the Blueprint. We'll see that when we get to the live demo, but we do get a constant view of the status in the system: whether a particular phase is in progress or completed, for every change in state we see an update in the ActionSet. Moving on to a Profile. We just saw how a Profile is used to define the target location for our operation, but what does it contain? If we look at this example, the Profile itself contains two main components. First, an object store location; in this case, it's an S3-compliant store, an Amazon S3 bucket called kanister-backup. Then, once we have this bucket, we also need the credentials to communicate with it, and that's where the credentials section comes in. There are a few different ways of providing credentials, but the one I have used here is called a key pair. It selects the key ID field and the secret field from the secret reference that we can see here. So in this case, it's taking the credentials from the example key ID and the example secret access key that you would find in the example secret. If we go and dig into the secret, we would see those fields and the values set. So it's a secure way of providing credentials so that they are not exposed anywhere. Now that we have seen all the different components, let's see how they interact with each other during the execution of a particular action. Assume a database workload, a Blueprint, and the Kanister controller are already deployed on a particular cluster. How do we back up this database workload? The first thing we would do is create an ActionSet. Like we saw in the example, the ActionSet should define the action from this Blueprint; it should select this Blueprint and, let's say, the backup action from it.
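A Profile of the shape described here might look like the following sketch; the bucket, region, field names, and secret name are placeholders standing in for the values shown on the slide:

```yaml
apiVersion: cr.kanister.io/v1alpha1
kind: Profile
metadata:
  name: example-profile
  namespace: kanister
location:
  type: s3Compliant
  bucket: kanister-backup          # destination bucket from the example
  region: us-east-1                # placeholder region
credential:
  type: keyPair
  keyPair:
    idField: example_key_id               # field names looked up inside the secret
    secretField: example_secret_access_key
    secret:
      apiVersion: v1
      kind: Secret
      name: example-secret                # holds the actual access credentials
      namespace: kanister
```

Because the Profile only references the Secret by name and field, the credentials themselves never appear in the Profile object.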
And it also needs to provide the database workload as its subject, and select a target destination if required. Now, once the ActionSet is created, the controller, which is constantly watching for the creation of ActionSets, looks at it and finds the actions and the Blueprint that we have provided there, and it goes and fetches the action from that Blueprint. Just as we saw in the Blueprint example, and maybe I can go back to it for a bit: one more thing we saw was that the namespace is provided as a Go template. This is actually a way to generalize a Blueprint. We can have a single Blueprint and use it across different objects in the cluster; if we have multiple deployments of a MongoDB replica set in the cluster, the same Blueprint can be used for all of them. So, like I said, the controller fetches whatever action the ActionSet points at, and it renders all these Go templates with the values from the object that is being used. The controller does all of this, fetches the actual action, and then uses the Kanister function that we have provided in the Blueprint to interact with the database workload. The function determines the steps or the commands necessary to perform a backup of this database workload. Once we have executed those commands, or taken a volume snapshot, the function can also decide whether it wants to store this data in an external S3 or object store bucket; that is determined by the Profile provided in the ActionSet. Once all these things are executed, the data is moved out of the cluster into the object store. The controller then comes back and sets the status on the ActionSet that we saw in the example.
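Tying this flow together, the ActionSet that kicks it off might look like the following sketch; the StatefulSet, Blueprint, and Profile names match the hedged examples above and are placeholders, not the exact demo values:

```yaml
apiVersion: cr.kanister.io/v1alpha1
kind: ActionSet
metadata:
  generateName: backup-mongodb-
  namespace: kanister
spec:
  actions:
  - name: backup                     # which action to run from the Blueprint
    blueprint: mongodb-blueprint
    object:                          # the subject of the action
      kind: StatefulSet
      name: my-release-mongodb       # placeholder StatefulSet name
      namespace: mongodb
    profile:                         # destination for the backup data
      name: example-profile
      namespace: kanister
```

The controller fills in the `status` section of this object as the phases run, which is where the output artifacts and per-phase progress end up.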
So the status will then have the location information about the snapshot in the bucket, and it also constantly updates each phase's status. That's how the whole workflow goes: once we have a successful ActionSet, we know that the backup has been taken successfully. So we have seen how Kanister works in theory. Now we can look at a live example and see how we can use Kanister to protect a MongoDB replica set on a live cluster. Let me share my screen. So let's start by looking at the cluster that I have. I have a GKE cluster on Kubernetes version 1.21, and I've also deployed MongoDB here in the namespace mongo. Let's check if everything is running fine there. Things are running fine. I have not added any data yet, so we can go ahead and add some. What I'll do is kubectl exec into the pod that we see here; it should have a Mongo client, which I'll use to add some data. So I'm creating a database with some restaurant entries. I have added four entries into the database, and we can confirm whether all of them got added. We see four entries here, so the database is set up with some data and it's running on this cluster. Now we can see how simple it can be to deploy Kanister and protect this database. I'll create a new namespace where I can deploy Kanister. The namespace got created. Now, if we look at the Kanister documentation, we provide commands to deploy Kanister using Helm. I'm copying the command from there; let's just install that. Can we just zoom in a touch, mate? Just so we can see it a little bit clearer. This is better. Maybe go one notch first. Yeah, that might be better. So we just installed Kanister. You can check if all the pods are up; the controller is running in this kanister namespace.
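The Helm install he runs here follows the pattern from the Kanister docs; the chart repo URL and release name below reflect the documented commands around the time of this webinar, so check the current documentation before copying them:

```shell
# Create a namespace for the controller and install it with Helm.
kubectl create namespace kanister
helm repo add kanister https://charts.kanister.io/
helm repo update
helm install kanister kanister/kanister-operator --namespace kanister

# Confirm the controller pod is up.
kubectl get pods --namespace kanister
```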
The next thing I'll do is install the tools, kanctl and kando, that I talked about earlier. There's a simple command to install them; I'll run that now. I think we have them installed; let's confirm. And yes, I specified 0.68.0, our most recent release, so we have kanctl from that version. Now, one thing we talked about was a profile, the destination for these backups. I'll create a profile with my S3 credentials; I have already set up a bucket, so it should work. The profile got created, and the secret we see here is simply where our credentials are stored; the profile just references that secret. We can confirm: it's referring to the secret that got created and the bucket I created for this demo. So now we have the profile set up and the controller set up. The next thing is the MongoDB Blueprint. This Blueprint is one of the many examples in our repository, and you can find them in the Kanister docs as well. I'm going to use the Blueprint directly from our GitHub, so let's create it in the kanister namespace. It's called mongodb-blueprint; let's check what phases it has. The one I showed before was a simpler version of this same Blueprint. We see here that it has a backup action and, similar to what we saw in the example, an output artifact. Under the phases we can see a KubeTask phase called take-consistent-backup. It uses mongodump here, and once mongodump has produced a snapshot of the database, we use the kando command to push it to an object store. kando has a subcommand called location push, which takes the data from mongodump and streams it to the S3 location we've provided. There's also a delete action in the Blueprint.
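The backup action just described can be sketched as a simplified Blueprint fragment. This is loosely modeled on the MongoDB example blueprint in the Kanister repository, not copied from it; the image name, host, and artifact path are placeholders, and template field names may differ between versions:

```yaml
actions:
  backup:
    outputArtifacts:
      cloudObject:                     # location of the dump, consumed by restore/delete
        keyValue:
          path: '{{ .Phases.takeConsistentBackup.Output.path }}'
    phases:
    - func: KubeTask                   # spins up a pod to run the commands below
      name: takeConsistentBackup
      args:
        namespace: "{{ .StatefulSet.Namespace }}"
        image: <mongodb-tools-image>   # placeholder: an image with mongodump + kando
        command:
        - bash
        - -c
        - |
          # Dump the database and stream it straight to the object
          # store location defined by the profile, via kando.
          mongodump --archive --host <mongodb-host> \
            | kando location push --profile '{{ toJson .Profile }}' \
                --path backup.archive -
```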
The delete action can be used to delete the snapshots you took in the backup phase. One thing that both restore and delete have in common is the input artifact. The artifact we created earlier stores the location of the backup, and we provide it to these two actions as part of their input artifacts; that's how the cloudObject value is passed into these phases. Restore uses mongorestore and essentially does the opposite of the backup phase: it uses kando location pull to stream the data back, then mongorestore to load that data back into the replica set. So that's mostly the Blueprint. Now I think we have everything set up. How do we protect the data we have in MongoDB? As we discussed earlier, we need to create an ActionSet pointing to the backup action from the MongoDB Blueprint. Before that, let me just confirm the profile we have. I'll use the kanctl tool to create the ActionSet. It has a create command, and we provide actionset as the resource there. Now I can select the action, select which namespace the ActionSet should be created in, select the Blueprint, and this StatefulSet acts as the subject for the Blueprint action. I selected my MongoDB replica set, and finally we also provide a profile if required; in our case, we'll use the profile we created earlier. Now the ActionSet got created. We can use describe to see what's happening in it. So, okay, cool, it looks like it's already done with the backup. Just to explore the ActionSet: what we have here is the Blueprint, the action, the object we selected, and the profile we provided. Now, if you look at the status section, the state is completed.
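The kanctl invocation described above looks roughly like this. This is a sketch following the kanctl documentation rather than the demo's exact terminal output; the Blueprint, profile, and StatefulSet names are placeholders:

```shell
# Create a backup ActionSet against the MongoDB StatefulSet.
kanctl create actionset \
    --action backup \
    --namespace kanister \
    --blueprint mongodb-blueprint \
    --statefulset mongo/my-release-mongodb \
    --profile s3-profile

# Follow progress through the ActionSet's status and events
# (the generated name suffix will differ per run).
kubectl describe actionset backup-<suffix> --namespace kanister
```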
It ran the phases from the Blueprint, which are now complete, and we also emit events at regular intervals so you can see how the ActionSet is progressing. If we had larger data and the operations were taking longer, we would see each of these states updating one after the other; in our case it was quick enough that we couldn't watch that progress, but if you look at the events, the ActionSet has been updated and completed. So now the data is backed up. Let's verify: I'll use the aws s3 command to check the file exists at my location. This is the command. Yep, we can see this file was created when we ran the ActionSet. So now everything is set up and the backup is done. Next I'll simulate a failure or a disaster. Let me go back into the pod we had for Mongo, and I'll drop all the tables I created earlier. Okay, we just deleted the data; let's verify everything is gone. It looks like the table is gone. Now, how do we recover? There's an easy way to create an ActionSet that runs the restore operation from the backup we just took. Let's again use kanctl to create it. Notice the from flag you can provide; let's pass the backup ActionSet's name here. What it does is take the output artifacts from the previous action and supply them as input artifacts to the restore action. So let's create that, and again check whether things ran fine. Cool. We see it completed the phase from the restore action, called pull-from-blob-store. It used the S3 profile we provided for the backup, the same StatefulSet subject, and the location it got from the backup's output artifacts.
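The restore step can be sketched in the same style. Again a hedged outline per the kanctl docs, with a placeholder ActionSet name:

```shell
# Restore from the earlier backup. The --from flag copies the
# backup ActionSet's output artifacts into the restore action's
# input artifacts, so the phase knows where to pull the dump from.
kanctl create actionset \
    --action restore \
    --namespace kanister \
    --from backup-<suffix>

# Verify the restore ActionSet completed.
kubectl describe actionset restore-<suffix> --namespace kanister
```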
So now that we've recovered everything, let's just verify the data. Going back into our Mongo pod to check whether my tables are back: yep, the entries we had just deleted came back after the restore action. So, I mean, that was pretty much it; that's how we can restore MongoDB in case of a disaster. It was as simple as that: once you have the Blueprint, it's all about creating ActionSets and executing these actions. There's also one more action we saw in the Blueprint, the delete action, and I can show that as well. We use kanctl to create an ActionSet very much like the restore one, but in this case with the action set to delete instead. We provide the profile, and we provide the same from reference we used before, which is just the backup action that created the output artifacts. One more thing to notice here: when we run these delete operations, we don't need a subject, so we don't have to provide the MongoDB StatefulSet. Instead, since it's a KubeTask that spins up a pod, you select the namespace where that pod should come up, which is provided by this namespacetargets flag here. So let's create that, and again verify the status. It has completed the phase delete-from-blob-store. Now, rerunning the aws s3 command from earlier, we see the file is gone. The delete operation is useful when you want to maintain a certain number of snapshots: if you want to delete or retire some of the older ones, you can use these delete operations. So that was pretty much it; we saw how we used Kanister to recover from our disaster on MongoDB. One thing to notice is that we used the KubeTask function, which is one of the many functions we have in Kanister.
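The delete step, sketched the same way. As above, this follows the kanctl documentation rather than the demo's exact output, and the bucket and ActionSet names are placeholders:

```shell
# Retire the old backup. No subject is needed; --namespacetargets
# chooses the namespace where the KubeTask pod comes up.
kanctl create actionset \
    --action delete \
    --namespace kanister \
    --from backup-<suffix> \
    --namespacetargets kanister

# Confirm the object is gone from the bucket.
aws s3 ls s3://<bucket>/backup.archive
```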
As we see here, there are many more functions and options available for use in Blueprints. For executing shell scripts or bash commands and so on, we can use the three functions listed here under custom logic. KubeExec is one of the more important ones: it works as if you were running kubectl exec on a particular pod and container, but through a Blueprint, so you can automate that process and provide a command to run on a particular container or pod. For the resource lifecycle we talked about earlier, if we want to scale a particular workload up or down, we have functions for that. Then there are functions that handle PVCs; here we see BackupData and RestoreData. These functions mostly perform volume or filesystem-level operations: we can mount the PVC on a particular pod, go into the volume, and run operations on whatever filesystem is underneath. There are also functions to automate the process of creating volume snapshots, which can be used in a Blueprint alongside the corresponding restore and delete snapshot functions. Now, we talked a bit about RDS. There are functions available in Kanister right now to create RDS snapshots, and to restore or delete them. There's also one more function that helps if you want to move your data out of RDS into, say, a Postgres deployment with some other provider: you can take the data out and move it into a non-RDS Postgres as well. So those were some of the functions we already have. This may not cover everything, but it's most of them; they can all be used in Blueprints today, and you can find examples of how each is used. As for the object store providers we support, Kanister allows creation of profiles against S3, Azure Blob Storage, and Google Cloud Storage.
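To make the KubeExec comparison concrete, a phase using it might look like the fragment below. This is an illustrative sketch, not an official example: the lock command is hypothetical, and the exact argument names may vary by Kanister version:

```yaml
phases:
- func: KubeExec                 # runs a command inside an existing pod,
  name: quiesceDatabase          # like kubectl exec, but driven by the Blueprint
  args:
    namespace: "{{ .StatefulSet.Namespace }}"
    pod: "{{ index .StatefulSet.Pods 0 }}"   # first pod of the replica set
    command:
    - sh
    - -c
    - mongo --eval "db.fsyncLock()"          # placeholder quiesce command
```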
The "S3 compliant" point here just means we can use something like MinIO, or anything else compliant with the S3 APIs. Then, if we do want to take snapshots of provider volumes, the storage snapshots we spoke about, there are helpers in Kanister for that too. Kanister can actually be used as an SDK: the helpers and functions we have for creating snapshots, or creating volumes from snapshots, can be used in whatever software you're building. As long as it's Golang-based, you can import Kanister and use these functions. So that was most of what I wanted to cover. Michael, do you want to talk about some of the new features that are coming in the near future? Yeah, awesome demo and a really good deep dive there, Pavan. So, some of the things we're working on, if they're not already in the project: different storage destinations for different backups; and obviously, if we're moving data from A to B, we want to focus firstly on security around encryption, and secondly on deduplication and compression, so we can move data from A to B much more efficiently. We're also seeing an increase in data service operators out there, K8ssandra being one, but there are others as well that Kanister has the ability to start protecting. Those are either on the roadmap or recently added to Kanister. And as a takeaway, maybe let's go to the next slide, Pavan. From our point of view, all the slides will be available, and my biggest ask is: take a look at the project and see how it can help you.
Feedback and contributions are welcome: raise issues, give us ideas about where it could be used and where you're using it, and spread the word. But also, understand what data management tasks are out there, and how and when to choose Kanister or to potentially look at other data management tools in that area. And I think with that, we can close out. Right. Thank you all so much. I have posted our public Slack channel for online programs in the chat, so if anyone wants to continue the conversation after this, feel free to hop in and ask any other questions you have. Thank you so much, Michael and Pavan, for a great presentation. And unless there's anything else, we will see you all next time. All right. Thanks, everybody. Thank you both so much.