Hello everyone. My name is Prasad Ghangal, and I'll be talking about application-level backups with Kanister and Kopia. About me: I work as a tech lead at InfraCloud Technologies in Pune. My main interests are Kubernetes, Go, and open source. I'm also a maintainer of BotKube and Kanister. Besides work, I like trekking and playing cricket. In today's session, I'll talk about data management in general, the challenges we face, how the Kanister framework helps overcome those challenges, and how Kanister helps protect your application data on Kubernetes. So when we talk about data operations, data management and disaster recovery are undoubtedly among the most challenging problems we need to solve. When it comes to Kubernetes, or applications on Kubernetes, the problem becomes even more challenging because there are lots of moving parts: your data management depends on the infrastructure you're using and the kind of application you have deployed. If we look at the general approaches to data management, the first is storage-centric snapshots, where the underlying storage system provides a way to snapshot the volume. These are crash-consistent, but they don't interact with the data service at all. That's why there is a second approach people follow, which is storage-centric snapshots combined with data-service hooks. For example, some applications need to freeze and unfreeze data before and after you perform a snapshot, so people follow the snapshot approach with hooks that let them freeze and unfreeze the data service. The third is the data-centric approach, where we use database utilities like mysqldump or pg_dump to capture the data. And then there is the application-centric approach, where we use multiple strategies in combination to manage the application's data.
So obviously there is no single solution to this problem, because data management and backups depend on a lot of factors: the infrastructure you're using, the different provisioners, and the different types of applications, each with its own way of managing data. Even if we talk about just backups, there are different ways of taking them: you can take volume snapshots, logical backups, provider-based API calls like RDS snapshots, or you can call operator APIs to perform a snapshot. Then the application might have its own specific concerns: it might need to scale down and scale up before and after the backup and restore, or freeze and unfreeze the data service. And your backup might have different target requirements, like different types of object stores, which could be vendor-specific as well. So when we talk about protecting stateful application data on Kubernetes, there are a lot of things to consider, and there won't be a single workflow we can follow for all apps. The ideal solution would be a framework that allows us to combine different approaches and build a workflow that can be executed to perform application-level backups. That's where Kanister comes into the picture. It's an open-source framework to manage data at the application level. This is achieved using blueprints: Kanister has something called blueprints, which you define to build a workflow, and then you execute that workflow. We'll talk about it in more detail. Talking about the Kanister framework components, there are four main ones: the Kanister controller, Blueprint, ActionSet, and Profile. The Kanister controller is the custom controller responsible for performing operations based on custom resource (CR) creation, and the CRs involved in the Kanister framework are Blueprint, ActionSet, and Profile.
A Blueprint is basically where you define the workflow for backup, restore, or delete operations. An ActionSet is the trigger, so to speak, for the actions defined in the Blueprint. A Profile is the CR where you define the destination for your backups or, in the case of restore, the source. And to manage all these CRs, you obviously need a custom controller, which is the Kanister controller, and it performs operations based on CR creation. Then there are two CLI tools Kanister provides: one is kanctl and the other is kando. kanctl helps you create the CRs, like ActionSets and Profiles. kando is used within containers to push and pull data to and from the object store of your choice. All right, so this is how a Blueprint looks. A Blueprint consists of a list of actions; this particular Blueprint is for a MongoDB application. In the actions you can see there is a backup action, and each action can have multiple phases. In this case, we have one phase. A phase consists of a function and arguments. The Kanister function defines how the operation is going to take place. In the case of the KubeTask function, what Kanister does is run a container with the given image and execute the given commands inside that container. If you have a requirement like exec-ing into an existing container and then running a command, you can use the KubeExec function. There is a list of Kanister functions you can use depending on your use case; we'll talk about them more in the next few slides. But yes, in each phase you define how you want to perform the operation, and that's how you build the workflow. Next is an example of an ActionSet CR. As I said, an ActionSet is a kind of trigger for the actions defined in the Blueprint.
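The Blueprint structure described above (actions, phases, a Kanister function with arguments) can be sketched roughly like this. This is a minimal, illustrative skeleton, not the exact manifest from the talk; the image name and command are placeholders:

```yaml
apiVersion: cr.kanister.io/v1alpha1
kind: Blueprint
metadata:
  name: mongodb-blueprint
actions:
  backup:                       # action name, referenced later by the ActionSet
    phases:
      - func: KubeTask          # Kanister function: run a new pod with the given image
        name: takeBackup
        args:
          namespace: "{{ .StatefulSet.Namespace }}"
          image: <tool-image>   # placeholder: an image containing your backup tool and kando
          command:
            - bash
            - -c
            - |
              # the actual backup commands for the application go here
              echo "backup commands"
```

Template values like `{{ .StatefulSet.Namespace }}` are rendered by the Kanister controller from the object the ActionSet references.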
So in the ActionSet, you reference the Blueprint and the action you want to run from it. You also pass the object reference on which the Blueprint action will be performed, and then the Profile. The Profile holds information about the object store you want to push the backup data to or pull it from; you pass the Profile reference. Once the ActionSet is created, Kanister runs the operations and, based on how they go, updates the status field of the ActionSet. In this case, you can see it has set some output artifacts, which is the path to which the backup artifacts were pushed. Next is an example of a Profile. A Profile holds the credentials and the object store information. In this case, we are using an object store of S3-compliant location type with the bucket kanister-backup, and these are the credentials defined to interact with that bucket. Cool. All right, so that is how Kanister works in theory; now it's time for a demo. We'll showcase how a PostgreSQL application can be protected using Kanister. So I have this Kubernetes cluster running, in which I have created a postgresql namespace, and in that namespace I have deployed a PostgreSQL application; these are the running pods. First, we'll add some data into this database. Using kubectl exec, we get inside the pod. Now we'll create a test database, create a table named company, and add a few entries, then one more entry. Cool. Let's list all the entries: we now have two entries in the database, right? Now let's perform a backup of PostgreSQL. I have already installed the Kanister operator in the kanister namespace, and the operator is up and running. The next thing we'll create is the Blueprint.
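The ActionSet and Profile described above can be sketched like this. Names (`backup-actionset`, `s3-profile`, `s3-creds`, the bucket and region) are illustrative; the field layout follows the Kanister docs, so double-check it against your Kanister version:

```yaml
apiVersion: cr.kanister.io/v1alpha1
kind: ActionSet
metadata:
  name: backup-actionset
  namespace: kanister
spec:
  actions:
    - name: backup                  # action name defined in the Blueprint
      blueprint: postgres-bp        # the Blueprint to run the action from
      object:                       # the object the action operates on
        kind: StatefulSet
        name: postgresql
        namespace: postgresql
      profile:                      # where to push/pull backup data
        name: s3-profile
        namespace: kanister
---
apiVersion: cr.kanister.io/v1alpha1
kind: Profile
metadata:
  name: s3-profile
  namespace: kanister
location:
  type: s3Compliant
  bucket: kanister-backup
  region: us-east-1
credential:
  type: keyPair
  keyPair:
    idField: aws_access_key_id      # keys inside the referenced Secret
    secretField: aws_secret_access_key
    secret:
      apiVersion: v1
      kind: Secret
      name: s3-creds
      namespace: kanister
```

Once the ActionSet is created, the controller picks it up and records the outcome (and any output artifacts) in its `status` field.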
So before creating the Blueprint, let's walk through it once. This is a Blueprint for protecting a PostgreSQL application. If you go through the actions, we have defined a backup action, and an action can have multiple phases; in this case, there is a single phase for the backup action. In this phase we are using the KubeTask function, which means it will run a new pod with this image and execute the commands defined here. In the commands, you can see we are building the host name from the object reference and executing the pg_dumpall command, and then we are using kando location push to push the dump to the object store. Then we set the output artifact, which is basically the path to which we have pushed the data. In the restore action, we fetch the data from the location we defined during backup and then run the psql command to restore the data. And in the delete action, we simply delete the dump pushed to the object store. All right, so let's create the Blueprint in the controller namespace; by controller namespace, I mean the namespace in which we have installed the Kanister operator. The next step is to create a Profile to specify the object store information. kanctl will verify that the passed information is correct and that the bucket exists in that region, and it will create the Profile. The next step is to perform the backup. For that, we will again be creating an ActionSet. If you look at the command, we are specifying the backup action from the postgres-bp Blueprint we created, and we are passing the PostgreSQL StatefulSet as the reference object on which the action will be performed, and then the Profile name, which is s3-profile plus some random suffix. That is how we create the ActionSet which performs the backup action. We can check the status using the kubectl describe actionset command.
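The backup phase just described (build the host name from the referenced object, run pg_dumpall, push the dump with kando, record the path as an output artifact) can be sketched like this. The image, paths, and user are illustrative, loosely based on the PostgreSQL example in the Kanister repo rather than copied from the demo:

```yaml
actions:
  backup:
    outputArtifacts:
      pgBackup:
        keyValue:
          path: "postgres-backups/{{ .StatefulSet.Name }}/pg_backup.sql.gz"
    phases:
      - func: KubeTask
        name: pgDump
        args:
          namespace: "{{ .StatefulSet.Namespace }}"
          image: postgres:14        # illustrative image containing psql tooling and kando
          command:
            - bash
            - -c
            - |
              # Build the host name from the referenced StatefulSet
              host="{{ .StatefulSet.Name }}-0.{{ .StatefulSet.Name }}.{{ .StatefulSet.Namespace }}.svc.cluster.local"
              backup_path="postgres-backups/{{ .StatefulSet.Name }}/pg_backup.sql.gz"
              # Dump all databases and stream the archive to the object store
              pg_dumpall --host "$host" -U postgres | gzip | \
                kando location push --profile '{{ toJson .Profile }}' --path "$backup_path" -
```

The restore action would do the reverse: `kando location pull` the archive from the same path, then pipe it into `psql` against the rebuilt host name.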
All right, so in the events, you can see the status is complete, and if we look at the artifacts, it says the backup has been pushed to this location. Let's quickly verify that. The bucket we had used was kanister-demo, and inside it, at this path, you can see the file to which Kanister has pushed the data. Cool. All right, so now we are done with the backup. Now let's simulate a disaster and delete the database we had just created. We'll again run the kubectl exec command and get the psql CLI. We had created the test database; let's just drop it. Now we no longer have this database. Cool. Let's do the restore. For the restore, we have to find the restore point, which is the ActionSet we had created for the backup. This is the restore point to which we want to restore our application. We'll again be creating an ActionSet, but this time we'll pass the restore action. Instead of passing all the information again, you can refer to the rest of the information, like the Blueprint and artifacts, from the backup ActionSet: we use the --from argument to pass a reference to the backup ActionSet, and that is how the restore ActionSet is created. We can get the status using kubectl describe actionset. Cool, the status is complete. Let's go into the database pod again and verify the data was restored correctly. Ideally, we should see two entries in the company table. First of all, we can see the table is there, so the database has been restored. Let's connect to the database and list the entries in the company table. Good, you can see the data has been restored correctly, with two entries as expected.
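When you run `kanctl create actionset --action restore --from <backup-actionset>`, kanctl copies the Blueprint, object, and Profile references from the backup ActionSet and carries its output artifacts over as input artifacts. The generated restore ActionSet looks roughly like this sketch (names and the artifact path are illustrative):

```yaml
apiVersion: cr.kanister.io/v1alpha1
kind: ActionSet
metadata:
  name: restore-actionset
  namespace: kanister
spec:
  actions:
    - name: restore                   # run the restore action from the same Blueprint
      blueprint: postgres-bp
      object:
        kind: StatefulSet
        name: postgresql
        namespace: postgresql
      profile:
        name: s3-profile
        namespace: kanister
      artifacts:                      # copied from the backup ActionSet's status.artifacts
        pgBackup:
          keyValue:
            path: "postgres-backups/postgresql/pg_backup.sql.gz"
```

Inside the restore phase, the Blueprint can then reference this path via the `.ArtifactsIn` template values to know where to pull the dump from.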
So yeah, this is how you define the backup and restore workflow using a Blueprint and then use an ActionSet to run the actions from that Blueprint. All right, moving back to the slides, let's see how this whole thing happened. If there is a database workload whose data you want to protect, the first thing you need to do is define a Blueprint: the workflow for how you want to perform the backup and restore operations. Once you have the Kanister controller up and running and the Blueprint created, you create an ActionSet in which you define the action you want to run from that Blueprint. The Kanister controller will then fetch the Blueprint and the action from it, and run that workflow. We use Kanister functions to define how those operations are performed, and then, using kando, we push the artifacts to object storage. Once everything is done, the ActionSet status is updated with the required information. Cool. So we talked about Kanister functions: there are different types of Kanister functions you can use while building the workflow, depending on your requirements. If you want to execute some commands or add some custom logic, you can use the KubeExec or KubeTask function. If you want to scale workloads up or down, you can use the ScaleWorkload function. There are a few functions for PVC operations, like backing up everything from a PVC and restoring data to and from a PVC. There are functions you can use for taking CSI volume snapshots and AWS RDS snapshots, kando supports different types of object stores, and provider-based snapshots are also supported by Kanister through specific Kanister functions. The complete list of Kanister functions can be found in the Kanister docs, but I won't go through the whole list here because there are lots of functions.
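As one example of combining these functions, a backup action can scale the workload down before capturing data and scale it back up afterward, as mentioned earlier. A rough sketch, with the data-capture phase's arguments elided (see the Kanister function docs for the exact arguments each function takes):

```yaml
actions:
  backup:
    phases:
      - func: ScaleWorkload         # quiesce the app before taking the backup
        name: shutdownApp
        args:
          namespace: "{{ .StatefulSet.Namespace }}"
          name: "{{ .StatefulSet.Name }}"
          kind: StatefulSet
          replicas: 0
      - func: KubeTask              # capture the data while the app is down
        name: backupData
        args: {}                    # image/command elided for brevity
      - func: ScaleWorkload         # bring the app back up afterward
        name: bringupApp
        args:
          namespace: "{{ .StatefulSet.Namespace }}"
          name: "{{ .StatefulSet.Name }}"
          kind: StatefulSet
          replicas: 1
```

Phases run in order, so the workload is only scaled back up after the data-capture phase completes.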
Yeah, so moving on to how we push and pull data to and from the object store: we use Kopia. We used to use restic, but recently we have switched to Kopia for all the object store related operations. The reason is that it's more secure and reliable: it provides different types of encryption algorithms, its deduplication is very efficient, it's way faster than restic, it supports multiple compression algorithms, it has a smaller memory footprint, and it supports lots of object stores, including S3, GCS, Azure blob storage, and more. So it's faster, more reliable, and more secure than restic, and we have switched to Kopia for almost all operations. For now, the way you enable Kopia for object store communication is to specify a Kopia snapshot in the output artifact of the action, and you have to create a Kopia server with the repository backend of your choice; it could be S3, GCS, anything. Then, when you create the Profile for your ActionSet, you specify the credentials of the Kopia server instead of the credentials of the object store. The Kopia server acts as an intermediate server between your object store and the Kanister operations, and through the Profile you communicate with the Kopia server, which also gives you fine-grained security configuration. Basically, instead of using the credentials of the object store, you create a Kopia server and use the Kopia server's credentials in the Profile to trigger the operations. Let me give you an example of how a Kopia Profile would look. In the Profile, you define the location type as Kopia and specify the endpoint of the Kopia server, then the TLS information and the username and password for authentication with the Kopia server. Kanister will then push the artifacts to the Kopia server.
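Based on what the talk describes (a Kopia location type, an endpoint, TLS information, and username/password), a Kopia-backed Profile might look roughly like this. This is a sketch only; the exact field names for Kopia support have changed across Kanister releases, so check the docs for your version:

```yaml
apiVersion: cr.kanister.io/v1alpha1
kind: Profile
metadata:
  name: kopia-profile
  namespace: kanister
location:
  type: kopia                       # talk: location type is Kopia, not s3Compliant
  endpoint: https://kopia-server.kanister.svc.cluster.local:51515   # illustrative endpoint
credential:
  type: keyPair
  keyPair:
    idField: username               # Kopia server user, not object store credentials
    secretField: password
    secret:
      apiVersion: v1
      kind: Secret
      name: kopia-server-creds      # illustrative Secret holding user/password
      namespace: kanister
```

The TLS certificate for the Kopia server would similarly be supplied via a Secret, so the controller can verify the server it pushes artifacts to.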
So yeah, I think we've already talked about these: if you specify a Kopia snapshot in the artifact, you have to define the Kopia credentials in the Profile, and then the Kanister controller will communicate with the Kopia server to push and fetch the artifacts for backup and restore. All right. As of now, the Kopia server creation is manual; we are in the process of automating that, and it's something you can expect in future releases. Other upcoming features: we are trying to improve the user experience for blueprint authors building blueprints, we'll be adding more Kanister functions to support operator-specific snapshot operations, like K8ssandra and other operator-based databases, and you can expect more examples in the community blueprints, along with the automated Kopia server creation that is manual as of now. A few resources: you can find all the Kanister docs, including the different types of Kanister functions you can use for building blueprints, at docs.kanister.io. We have documented a few sample blueprints that you can use as is or modify as per your requirements; they can be found in the examples directory of the Kanister GitHub repo. The Kopia docs are at kopia.io/docs. And if you have any doubts, or want to discuss or suggest anything, please feel free to raise issues on the Kanister GitHub repository. You can also join our Slack workspace at kanister.slack.com; we'll be happy to help you with any doubts or issues. All right, so that's all from my side. If you have anything, you can reach out to me on Twitter or LinkedIn. Thanks for having me, and thanks to all the organizers of KCD Chennai. Thanks a lot.