Hi everyone, my name is Ivan Sim. I'm an open-source software engineer at Kasten. Today I'd like to share with you an open-source project by Kasten called Kanister. Kanister is software that you install on your Kubernetes clusters and use to protect data persisted by your stateful applications. We're going to start off with an introduction to data protection. Then we're going to take a look at Kanister and what it can do for you. Then we're going to dive straight into a demo. After the demo, I will share some project opportunities that some of you might find interesting and relevant.

Some of you may be familiar with the Data Protection Working Group. That working group is part of Kubernetes SIG Storage. Last year the group published a white paper on data protection workflows. In the paper, the group defines a data protection workflow as a process to protect valuable data and configuration of stateful applications running on Kubernetes. The result of the workflow is typically something called a backup. A backup is an artifact that you can use to restore your stateful applications back to the state, or the checkpoint, defined by the backup. The paper goes into more detail, with examples, on why data protection is important and necessary for Kubernetes. So if data protection is something you think about a lot, I encourage you to take a look at the white paper.

So why talk about data protection today? Why bother? In 2021, the CNCF survey reported that 64.69% of respondents said they are currently running stateful applications in containers in production Kubernetes. Considering the number of organizations running Kubernetes in production today, this is not a small number. Now, if you are part of that 64.69%, you may have noticed that the data protection architecture, specifically the tooling and solutions around backing up and restoring data, hasn't been keeping pace with the rest of the cloud native ecosystem, and we want to change that. The third reason we want to talk about data protection today is that, as we all know, Kubernetes provides a set of common APIs for development, operations, and security teams to manage resources and enforce policy efficiently. We believe that data protection tooling and solutions shouldn't be managed differently. They should all be part of the cloud native ecosystem.

We have had the chance to talk to many of our users to try to understand their pain points around snapshotting and backing up data. The one theme that always comes back revolves around the different requirements, the different flavors, and the different varieties of strategies and mechanisms for capturing snapshots, backups, and data dumps. For example, if you are someone who works closely with a cloud provider API, you may already have scripts that capture snapshots of your virtual volumes. That's great, but snapshots captured at the volume level offer only crash-consistent backups. In other words, only data that has already been written to disk is included in the snapshot. In-memory data and pending transactions are not included as part of the snapshot. If you are someone who works closely with data services and databases, you may already have custom pre and post hooks that run around the snapshot operation. Going further down the stack, you might depend on specific database utilities such as mysqldump, pg_dump, or mongodump to capture data dumps of your databases. All of these tools require direct access into the container where the database processes are running.
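To make that last point concrete, here is a minimal sketch of the kind of ad-hoc dump script many teams maintain today. This is not from the talk; the namespace, pod, database, and bucket names are placeholders, and it assumes the standard Postgres image's local trust authentication:

```sh
# Hypothetical ad-hoc backup script; all names here are placeholders.
NAMESPACE=postgres
POD=postgres-0
DB=mydb

# Run pg_dump inside the database container and stream the dump out of the pod.
kubectl exec -n "$NAMESPACE" "$POD" -- pg_dump -U postgres "$DB" > "$DB.sql"

# Ship a copy to a remote bucket so it survives the loss of the cluster.
aws s3 cp "$DB.sql" "s3://my-backup-bucket/$DB-$(date +%F).sql"
```

Every team ends up with slight variations of a script like this, for every database they run, which is exactly the sprawl described next.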
What about the application as a whole? The configuration and secrets that your microservices may depend on are not part of your databases. Overall, there are way too many options and way too many moving parts. If you imagine yourself as a barista, it's like having to manage and remember all the different recipes and ingredients you are required to put together for the different coffee types your customers may ask of you. This is where we hope Kanister can fit in. We are hoping that Kanister can bring some order and coherence to your data protection strategy.

Kanister is an open-source project by Kasten, as I mentioned earlier. It is implemented as a Kubernetes controller. It exposes a set of cohesive APIs in the form of custom resource definitions, and users use these APIs to define and curate data protection workflows. Kanister is also extensible. We have community members who have been publishing and sharing Blueprints for their data sources, as well as extending Kanister with custom Kanister functions. Terminology like Blueprints and Kanister functions will become clearer during the demo.

Kanister exposes three main APIs. At the highest level, we have the Blueprint. A Blueprint is a template, a collection of instructions. It tells the Kanister controller how to perform specific actions on a specific application. For example, I can have a Blueprint that defines backup and restore actions for my Postgres database. I can have another Blueprint for my MongoDB, and another Blueprint for my Cassandra services. The second API is the ActionSet. We can think of an ActionSet as the runtime trigger, the execution entry point, that tells Kanister to go find a specific Blueprint, execute a specific action within that Blueprint, and supply the underlying functions with a set of input parameters. The last API is the Profile. A Profile abstracts away the details of interacting with a remote repository, specifically one used to store backup artifacts. Some of you may be familiar with the 3-2-1 backup rule. One of the rules says that at least one copy of your backup should be exported to a remote location. The Profile abstracts away the details of working with those remote repositories.

Let's go straight into the demo. We will be using an EKS cluster. I have pre-installed a Postgres database on the cluster. The pod has a PVC and a PV attached to it and is backed by an actual EBS volume. On the right side of this diagram, you can see that during the demo I'll be using Kanister to manage the CSI VolumeSnapshot and VolumeSnapshotContent resources, which create and manage the actual EBS snapshots. The flow of the demo is pretty simple. We're going to start off by using a tool called Kubestr to discover and verify the volume provisioners and the different CSI storage solutions on our cluster. Then we're going to deploy our Postgres Blueprint, and from there we'll dive into the individual actions: first creating a volume snapshot of my Postgres database, then invoking the EBS API to list the snapshots, and finally restoring the volume snapshot to a new PVC. So let's switch over to my terminal. In this demo, I'll be using a script to automate the typing; otherwise, we are targeting a real, live EKS cluster. The first command I want to run uses kubectl to confirm that I am indeed working with the right EKS cluster. Great. Now we're going to use Kubestr to discover and verify the volume provisioners and the CSI solutions on our cluster.
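As a rough idea of what that looks like, and hedged because the exact flags can change between Kubestr releases, the discovery step and an optional CSI snapshot check can be run along these lines; the storage class and volume snapshot class names below are assumptions about this particular cluster:

```sh
# List the volume provisioners and CSI drivers Kubestr can find on the cluster.
kubestr

# Optionally validate that the CSI driver can snapshot and restore a volume.
# 'ebs-sc' and 'csi-aws-vsc' are assumed names; substitute your own classes.
kubestr csicheck -s ebs-sc -v csi-aws-vsc
```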
Kubestr is another open-source project by Kasten. As some of you know, Kubernetes comes with a collection of in-tree volume provisioners. In parallel, there is also a collection of out-of-tree CSI drivers. Kubestr was able to discover that my cluster has an EBS CSI driver installed. It also provides some helpful commands that allow me to run subsequent checks to confirm that my CSI driver is indeed healthy and working correctly. Kubestr also discovers that there is an in-tree EBS volume provisioner that comes with my Kubernetes cluster. For the rest of this demo, we will be using the EBS CSI driver to create the volume snapshot and to restore the data.

As I mentioned earlier, I have a Postgres database installed, so let's make sure it is up and running. This Postgres database is also pre-seeded with some test data, and this is the data that we will be snapshotting and restoring in the rest of this demo. In my database, I have a table called sessions, and within sessions, I have three rows of data. Great.

Now let's go ahead and install Kanister on my EKS cluster. Installing Kanister is very straightforward: we use Helm. helm install is the command we use, and commonly helm upgrade and helm uninstall are all you need to manage Kanister. Let's take a look at our controller pod. Great, it's healthy and up and running. As you can see, it doesn't take long for the Kanister operator to be ready.

Next, let's deploy our Postgres Blueprint. Up until now, we haven't had a chance to look at what the Blueprint is, and we're going to do that right now. Fair warning: there will be quite a bit of YAML. Our Postgres Blueprint is called csi-snapshots, and inside this Blueprint we have three main actions. The first action we're going to look at is called createSnapshot. A Blueprint has a number of actions, and each action has a number of phases. What we are seeing here is that the createSnapshot action is made up of two phases. The first phase basically says: before creating the snapshot, set the database to read-only mode. Once the database is in read-only mode, call this built-in CreateCSISnapshot function. This is what we call a Kanister function. It is backed by Go code, and behind the scenes it relies on the client-go SDK to make HTTP requests to the Kubernetes CSI VolumeSnapshot endpoints. In addition to phases, we also recently added something called a defer phase. You can think of a defer phase as an "always run" final phase: regardless of what happened in the previous phases, this phase will always be run at the end. If you're familiar with Go, this is very similar to the Go defer statement. So this defer phase says: whatever happened previously, always reset the database back to read-write mode. Put things back to how we found them.

The next action I want to show in the same Blueprint is the restoreSnapshot action. You can see that the restoreSnapshot action is actually a lot simpler. It has only one phase, and it relies on the RestoreCSISnapshot Kanister function. It expects a number of input parameters. Later, I will show how we can use the output from the createSnapshot action and feed that into the restoreSnapshot action as input parameters. The final action we want to look at is called describeSnapshot. In this action, I'm actually using the AWS CLI to communicate with the AWS EC2 endpoint and ask it to list the snapshots that have just been created.
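To make the structure concrete, here is a trimmed, hedged sketch of what such a Blueprint can look like. The field names and template parameters are approximations of the real CRD schema, the quiesce commands are purely illustrative, and the PVC, snapshot class, and namespace names are assumptions about this demo; the Kanister documentation has the authoritative schema:

```sh
# Hedged sketch of a CSI-snapshot Blueprint; names and fields are approximate.
# Assumes Kanister was installed into the 'kanister' namespace.
kubectl apply -f - <<'EOF'
apiVersion: cr.kanister.io/v1alpha1
kind: Blueprint
metadata:
  name: csi-snapshots
  namespace: kanister
actions:
  createSnapshot:
    phases:
    # Phase 1: quiesce - put Postgres into read-only mode before the snapshot.
    - func: KubeExec
      name: quiescePostgres
      args:
        namespace: "{{ .StatefulSet.Namespace }}"
        pod: "{{ index .StatefulSet.Pods 0 }}"
        command:
        - psql
        - -U
        - postgres
        - -c
        - ALTER SYSTEM SET default_transaction_read_only TO on
        - -c
        - SELECT pg_reload_conf()
    # Phase 2: take the CSI VolumeSnapshot via the built-in Kanister function.
    - func: CreateCSISnapshot
      name: createSnapshot
      args:
        pvc: data-postgres-0            # assumed PVC name
        namespace: "{{ .StatefulSet.Namespace }}"
        snapshotClass: csi-aws-vsc      # assumed VolumeSnapshotClass name
    # Defer phase: always runs last, restoring read-write mode even on failure.
    deferPhase:
      func: KubeExec
      name: unquiescePostgres
      args:
        namespace: "{{ .StatefulSet.Namespace }}"
        pod: "{{ index .StatefulSet.Pods 0 }}"
        command:
        - psql
        - -U
        - postgres
        - -c
        - ALTER SYSTEM SET default_transaction_read_only TO off
        - -c
        - SELECT pg_reload_conf()
EOF
```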
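And here, again as a hedged sketch rather than the exact manifest from the demo, is the kind of ActionSet that triggers the createSnapshot action above; the target object's kind, name, and namespace are assumptions about this cluster:

```sh
# Hedged sketch of the triggering ActionSet; it uses generateName, so we use
# 'kubectl create' rather than 'kubectl apply'.
kubectl create -f - <<'EOF'
apiVersion: cr.kanister.io/v1alpha1
kind: ActionSet
metadata:
  generateName: create-snapshot-
  namespace: kanister
spec:
  actions:
  - name: createSnapshot
    blueprint: csi-snapshots
    object:
      kind: StatefulSet
      name: postgres       # assumed StatefulSet name
      namespace: postgres  # assumed application namespace
EOF
```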
The point of the describeSnapshot action is to show that Kanister works well not only with the Kubernetes API, but also with remote cloud provider APIs, as long as the proper secrets and credentials are provided.

So let's go ahead and create an ActionSet. As I mentioned earlier, an ActionSet is the runtime trigger, the execution entry point. This is what the YAML of the createSnapshot ActionSet looks like: it basically provides information on which action to run and where the Blueprint is, along with a set of input parameters that will be fed into the Kanister functions. I'm going to use kubectl to create this ActionSet. Now we're going to watch the CSI VolumeSnapshot and VolumeSnapshotContent resources being created. By simply creating an ActionSet for the createSnapshot action, we can see how Kanister is able to talk to the VolumeSnapshot APIs and create the resources accordingly. It looks like our VolumeSnapshot and VolumeSnapshotContent resources are ready, so Kubernetes thinks our snapshot is ready. Is it really? What we're going to do next is use the describeSnapshot action to call the AWS API and confirm that we indeed have the EBS snapshot created in the cloud. What we did here was parse the status subresource of the describeSnapshot ActionSet for the information returned by the AWS EBS API. As you can see, we get information such as the actual snapshot ID and the volume ID. Great! Our EBS snapshot really was created.

The last stage of this demo is to take that newly created CSI snapshot and restore it to a new Postgres database. We'll be using a tool called kanctl. Instead of writing the restore ActionSet as some static YAML, we use kanctl to read the output from the previous createSnapshot ActionSet and pipe that into kubectl, which creates our restore ActionSet resource. As you can see, with that ActionSet applied, we now have a new PVC restored into the same namespace where my original Postgres database is running. The status of this PVC will remain Pending because, by default, the storage class we are using has its volume binding mode set to WaitForFirstConsumer. In other words, the PV will not be bound to the PVC until a pod, an application, actually starts consuming the PVC. The rest of this demo is really just to install a new Postgres database managed by another StatefulSet workload and tell that StatefulSet to pick up the restored PVC instead of creating a new one. It will take a couple of seconds for the new Postgres pod to be ready. The first pod here is the original pod we captured the snapshot from, and the second pod here, as you can tell, is the restored Postgres pod. Okay, it looks like it's ready. The final command we're going to run simply uses kubectl exec to exec into the restored Postgres database and make sure our sessions table has been restored with the three rows of test data we inserted earlier. And there it is. And with that, we have reached the end of our demo.
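For reference, the restore trigger and the final check from this last part of the demo look roughly like the following. This is a hedged sketch: kanctl's exact flags can differ between releases (check kanctl create actionset --help), and the ActionSet, pod, and namespace names are assumptions rather than the exact ones used on screen:

```sh
# Create a restore ActionSet from the earlier createSnapshot ActionSet, so its
# output artifacts become the restore action's input parameters.
kanctl create actionset --action restoreSnapshot \
  --from create-snapshot-xxxxx --namespace kanister

# Once the restored StatefulSet pod is Ready, confirm the data came back.
kubectl exec -n postgres postgres-restored-0 -- \
  psql -U postgres -c 'SELECT * FROM sessions;'
```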
As I mentioned earlier, Kanister is an open-source project. If backup and restore is something you have to deal with a lot, I encourage you to check out Kanister. The source code is on GitHub. Feel free to join us on Slack; I'm always there answering questions and talking to you about your use cases. Join our community meeting. We meet every other Thursday at 6:00 UTC. We have an exciting roadmap coming up, and feedback and contributions are most welcome. Thank you so much for your time. Cheers.