Hello and welcome. I'm Andy Goldstein. And I'm Steve Kriss. And we are here to talk to you about disaster recovery for your Kubernetes clusters.

A little bit about ourselves. I'm a staff systems engineer at Heptio. I have been programming for a long time; I started with Commodore 64 BASIC and I'm now up to using Go, and I've been a Kubernetes contributor since 2014. And I am the Heptio Ark team lead. And I'm Steve Kriss. I'm a senior systems engineer at Heptio. I work on Heptio Ark with Andy. I've in the past been a contributor to upstream Kubernetes and a member of the release team, and in a past life I was an enterprise IT engineer, so I certainly have some experience with the challenges of designing, implementing, and testing DR strategies.

All right, can I get a show of hands? How many of you manage clusters in production? All right. And how many of you have a disaster recovery strategy? Excellent. And how many of you have actually used it to recover from something? Okay, a few of you.

So let's talk about what it is that you want in your IT infrastructure. What you want is a collection of servers and services and applications that are all running perfectly well. You've got your monitoring in place and all of your checkmarks are green because everything is running great. You want to sleep soundly at night because everything is running correctly. But what actually happens? Well, at some point, no matter how good your infrastructure is, no matter how excellent your network is, something is going to go wrong. You'll probably get a page or a phone call or a Slack notification, and you're going to have to deal with it.
And in the short term you're probably not going to be very happy. But that's okay, because we're going to give you some ideas around how you can do disaster recovery for your Kubernetes clusters. The first thing you'll need to do is probably some rebuilding, and some of the tools we're going to present today will hopefully help you with that.

Before we get into talking about DR for Kubernetes, I want to do a quick review of what DR might look like in a more traditional IT setting. In the old world we had a pretty strong correspondence between an app and a server: typically we would deploy a single app onto a single server. Now, that application might be made up of multiple components, and the server might be virtual or physical, but regardless, there's a very strong correspondence between the application and the server. So if we ever had a disaster and needed to recover service for the application, we needed to be able to bring back the server with all of the same software, configuration, and data it had before. Typically the way we would do this is to take full backups of the server on a regular basis, usually nightly. If we ever had a disaster and the server went down, we'd do a full restore from our backup, bring up a new server that was essentially identical to the old one, and that would enable us to restore service for our application.

In the new world with Kubernetes, things look a little bit different. When you're running a Kubernetes cluster you don't just have one server. You have one or more masters, which run the Kubernetes control plane. You have many nodes, which run some Kubernetes components as well as all of your containerized workloads. And then you also have an etcd cluster, which may be running inside or outside of your Kubernetes cluster, and which is storing all of your Kubernetes state information. So let's take a look at what's inside each of
those a little bit more.

Within the master we have, first of all, the Kubernetes API server, which is the entry point for creating or fetching information about Kubernetes state. We have a scheduler, which is responsible for deciding which nodes pods should run on. We have a controller manager, which runs the core control loops that constantly push the state of the Kubernetes cluster towards the desired state. And then we have etcd, which is our persistent store of state information for Kubernetes. We also potentially have some CNI pods and kube-proxy, which help with all the networking and communication concerns. On the nodes we also have kube-proxy and the CNI pods for networking, and beyond that we have all of your containerized workloads in the form of pods.

As we start to think about how to design a DR strategy for this new Kubernetes environment, we really need to think about where the state is within this environment: which components are stateful and which are stateless. If we think about where state lives, it's really in two places within the system. The first is obviously etcd. Etcd is the persistent store of all of the Kubernetes state information; it contains all the specs for your deployments, your services, your config maps, your secrets, etc. The second is in your persistent volumes for your applications. If you have workloads that use volumes to store persistent data, you obviously have a lot of state there. So these are really the key components that we need to focus on and make sure we have robust backup strategies for, so that we can restore this data in the case of a failure. But if we think about the masters and the nodes themselves that run the core Kubernetes components, they're basically stateless. This means that as long as we can quickly bring up new versions of those in the case of a failure, we don't really need to restore from an exact copy of the previous version of
them. We can spin up a new cluster, and as long as we can restore our etcd data and our persistent volume data, we'll be able to restore service to our applications.

So let's talk about master and node disaster recovery. Like Steve said, they're basically stateless. You may have some of these that are unhealthy. Maybe they're running okay for the most part, but you're getting some alerts: you've got a disk that is flaky, a network card that's not performing correctly. There are some tools you can use to take these out of service. kubectl has a couple of features, cordon and drain, that you'll probably want to add to your toolbox. Cordon allows you to mark a node as unschedulable, so any new pods that are created will not be assigned to whatever node you've cordoned. Drain goes one step further and will actually evict any pods that are running on a node you're trying to take out of service. And once you've done that, as Steve said, you want to be able to very quickly provision a replacement master or node. So how do you do that?
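(Backing up a step: the cordon-and-drain workflow just described can be sketched with kubectl. The node name here is hypothetical, and flag names vary a bit across kubectl versions.)

```shell
# Mark the node unschedulable so no new pods land on it
kubectl cordon node-1.example.com

# Evict the pods already running there; daemonset-managed pods can't be
# evicted, so they are skipped with this flag
kubectl drain node-1.example.com --ignore-daemonsets

# After replacing or repairing the node, allow scheduling again
kubectl uncordon node-1.example.com
```

Note that drain cordons the node as its first step, so running cordon separately is optional.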
Automate it. Now, unfortunately, we are not going to be able to tell you the one and only way to automate recovering and reprovisioning a master or a node or a cluster, because you all have opinions. You have IT departments who say we're using Ansible, or we're using Chef, or we're using Puppet. So we can't tell you what to use, but we are strongly encouraging you to automate the creation of masters, nodes, and clusters.

One thing to keep in mind is that there is a teeny tiny amount of state that is necessary to preserve, and that is the certificates used by the components in the cluster to talk to each other. When your kubelets talk to the API servers, and when the controller manager talks to the API servers, there typically are SSL certificates in use. These you want to retain and incorporate into your automation, so that when you're using Ansible or Chef or Puppet to automate provisioning all of these instances, you bring your certificates with you and don't lose them. If you do lose them, you have to regenerate them or get new ones, and you could potentially have an outage in that situation. And I do want to highlight that recovering the masters and the nodes is not really the crux of the disaster recovery problem. Stateful data is what we're really here about.

So let's talk about etcd first. There are a few different ways you can approach disaster recovery for etcd, and the first two are similar. At the block level, you could take a backup of the partition or the disk where the etcd data directory resides; this is where all of your etcd state is. Alternatively, at the file system level, you just take a backup of the data directory itself. With either of these, if you lose one of your members (and you should have a highly available etcd cluster, definitely), you restore from the block device or file system backup, and when your member comes back online it will get a delta of the data that
happened in the cluster since it was offline. The surviving members will send a snapshot of what it needs to catch up, and so your cluster can become whole again.

Another option is to use etcdctl. It has a great feature as part of the etcd v3 API to take a snapshot of your etcd database and, at some point in the future, restore it. You've got to be a little bit careful with this one, though, because when you restore a snapshot, it ends up creating a brand new etcd cluster. This effectively means you will have an outage if you go this route. But it's a good tool to have in the event of a total outage: you can certainly recover your etcd state this way, assuming you have backups.

And then the fourth option here, which is our favorite, is using Kubernetes itself to get the information out of the API server about what's running in the cluster. The API Machinery special interest group spent a lot of time building a discovery mechanism, so as a client you can go to your Kubernetes API server and ask: what are all of the API groups that exist, and within each API group, what are all the resources that exist? You can look at the core API and see that there are pods and secrets, etc. You can go to the apps API and see all the deployments. And this is something that's very easy to write. It's just a loop: you iterate through everything and say, tell me all the data that you have.

So what about persistent volumes?
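Before moving on: the etcdctl snapshot-and-restore flow just described looks roughly like this. The endpoints, certificate paths, member names, and data directories here are all hypothetical; adjust them to your cluster.

```shell
# Take a point-in-time snapshot of the etcd database (etcd v3 API)
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/etcd/ca.crt \
  --cert=/etc/etcd/client.crt \
  --key=/etc/etcd/client.key \
  snapshot save /backups/etcd-snapshot.db

# Later, restore the snapshot into a fresh data directory. As noted above,
# this bootstraps a brand-new cluster rather than rejoining the old one.
ETCDCTL_API=3 etcdctl snapshot restore /backups/etcd-snapshot.db \
  --name member-1 \
  --initial-cluster member-1=https://127.0.0.1:2380 \
  --initial-advertise-peer-urls https://127.0.0.1:2380 \
  --data-dir /var/lib/etcd-restored
```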
Because presumably, if you've got stateful workloads in a Kubernetes cluster, you probably are using persistent volumes for them. Unfortunately, I don't have a great answer here, at least not a generic one, because some of your data might be in cloud-provider-specific persistent volumes: EBS volumes, Azure managed disks, GCE persistent disks, etc. There's nothing in Kubernetes right now that allows you to say, take a snapshot of my PV. There's a proposed set of APIs to do that, but they're not available yet. So if you've got Kubernetes in production and you've got persistent volumes, maybe you've got some tooling you've written to do it, but unfortunately you can't rely on Kubernetes for that. And there are other volume types as well, beyond just the cloud provider ones: NFS volumes, anything that can come in from a flex volume. So how do you back those up? Again, it tends to be roll your own. But we have a better solution, and we'll get to that in a minute here with Steve.

Yeah, so I'd like to talk to you now about an open source tool we built called Heptio Ark. Its purpose is to help with backing up some of that stateful data we've talked about within your Kubernetes cluster. So what exactly does Heptio Ark do?
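As an aside, the discovery-based export loop Andy described a moment ago can be sketched with a recent kubectl; what it returns depends entirely on your cluster.

```shell
# Enumerate every API group/version the server advertises
kubectl api-versions

# Enumerate every listable resource type, including those served by
# extension API servers
kubectl api-resources --verbs=list -o name

# A crude backup loop: dump every namespaced object of every type to YAML
for resource in $(kubectl api-resources --verbs=list --namespaced -o name); do
  kubectl get "$resource" --all-namespaces -o yaml > "backup-${resource}.yaml"
done
```

Ark drives the same discovery information through client libraries, adding filtering, object storage upload, and restore logic on top.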
So it has two core features. The first is that it enables you to back up and restore your Kubernetes API objects. Andy just talked about some of the different options for backing up what's in etcd, and in Ark we use the Kubernetes discovery API for accessing all of that information, creating backups of it, and restoring it in the case of a disaster. We do this for a few reasons, and Andy started to talk about some of the pros and cons there. One reason we think the discovery API makes a lot of sense is that if you're running with a managed Kubernetes provider, you may not have access to the underlying etcd cluster, so using etcdctl to take snapshots may not even be a feasible option for you. Additionally, with Ark and the discovery API, you have fine-grained control over what types of resources you back up. With etcd backups it's really all or nothing, and if you want to restore, you basically have to restore the state of the entire cluster. If you're using the API, though, you have all of the controls it provides in terms of filtering by namespace, filtering by resource type, and filtering by label selector, and we enable all of this through Ark. Additionally, if you are backing up etcd directly, you don't get the benefit of capturing the information stored for extension API mechanisms. If you have an extension API server as part of your cluster, odds are that the data supporting it is actually stored in a separate etcd cluster; this is the recommendation for how to design these extension mechanisms. So if you're backing up the main etcd, you're not going to capture that information. But the discovery API does surface the resources served by extension API servers, so you can back that information up directly into your Ark backup, or whatever other backup mechanism you have. So we believe the discovery API makes a lot of sense here for
accessing that information.

Ark uses the discovery API to pull all of that out of your cluster, creates a tarball that stores all this information, and places the backup in the object storage system of your choice.

The second big feature Ark has is that it will actually back up and restore your persistent volumes for you, assuming you're on one of the supported cloud provider platforms. As Andy mentioned a minute ago, we use the snapshot APIs that the cloud providers offer for taking backups of volumes. Ark out of the box supports the three major public clouds, but as we'll see in a minute, we also have an easy way to extend the functionality of Ark, so as long as there's an API for you to take backups, Ark can easily integrate with it.

Now, beyond those two big features, we have a number of other features that make Ark really easy to use. We support scheduled backups, so rather than having to manually create a backup, you can simply configure the information you'd like to back up through Ark, set a schedule, and have those run on an automated basis over time. We support complex filtering, both when you take a backup and when you restore that information back into a cluster. You can filter based on the namespaces you want to back up, based on resource types, and based on label selectors. Often we see that users will take a backup of their entire cluster, so that they have all of the information, and when they go to do a restore, they may do it for one namespace, or they may only restore components that match a certain label selector. So this gives you a lot of control over how you recover the information into your target cluster. Additionally, we give you the ability to restore into different namespaces than you backed up from. This is really useful for use cases where maybe you have an existing namespace and you want to create a clone of it, maybe for testing purposes, so that you can fiddle
with some configuration. If you have other use cases that require you to change the namespace, Ark makes that really easy to do.

Now, we also designed Ark to be very extensible. We recognize that we can't meet everyone's needs out of the box, so we want to give users the ability to extend Ark to meet their needs. The first of these mechanisms is what's called hooks. Hooks are basically a way for you, as the user of Ark, to define commands to be run within your pods immediately before or immediately after backing up those pods. A great example of this: if you have a running pod that's using a persistent volume, and prior to executing a backup of that volume you need to freeze the file system to ensure you get a consistent backup, Ark makes it really easy to plug in an fsfreeze command before the backup, and similarly an unfreeze command right after the backup.

The second major way we allow you to extend Ark is through what's called plugins, and there are two major categories of plugins that we support currently. The first has to do with cloud providers. There are two core cloud provider APIs that Ark relies on. The first is object storage, which is where we actually store the tarball that contains all of your backed-up cluster data. The second is block storage, which is what allows you to take snapshots of your volumes and restore them later on. We have a plugin model which allows you to define your own implementations for both of these and very easily plug them into the Ark server at runtime, so you can extend Ark to run on your platform of choice. This doesn't require you to submit PRs to the core Ark codebase, and it doesn't require you to recompile or maintain your own container images.

The second major category of plugins is what are called item actions, and we support these on both backup and restore. These are little bits of functionality that run as each item is being backed up or restored. They're different
from hooks in that they're not commands executed within your pods; they're run by the Ark server. They allow you to potentially call out to external systems to take certain actions, or to actually mutate the item you're backing up or restoring. So if you need to add some annotations to items as you're backing them up, or maybe you want to modify the spec as you're restoring your backup into a new cluster, we make it really easy for you to plug in your own logic to do this.

All right, so we have a demo. Hopefully the demo gods are with us today. Okay. So, run our script here. This is all live. The first thing we're going to do is show you what namespaces we have. We have the typical ones you'd see; we also have heptio-ark, which is where Ark is running, and we are using Rook for dynamic provisioning of persistent volumes today.

We're going to start by deploying a simple nginx application, and you'll see that this creates a namespace, a PVC, a deployment, and a service. If we take a look at the PVC, it uses the Rook block storage class and it is bound, and we are going to be storing the nginx logs in this persistent volume. So here is the PVC. Similarly, you'll see that it's bound to nginx-logs. And if we take a look at the deployment, we want one replica and we have one running, and here is the pod. So everything deployed great for us here.

We're going to go ahead and take a look at this service. We see it's got a cluster IP, so let's go ahead and talk to nginx. Looks pretty straightforward. The next thing we're going to do is hit it 10 more times, just to get some extra traffic in the logs. And we'll go ahead and exec into the nginx container, so we can see we've got a couple of files in here: the access log, about a kilobyte, and nothing in the error log yet. Now let's actually look at that access log. We're going to exec into the container and take a look at the file. Pretty vanilla
access log: we've got the initial request that we made and then the 10 after.

So let's create a backup. It's this simple: you just say ark backup create, give it a name, and then whatever filters you want. In this case we're only going to select the nginx namespace. And it's done. So we have an Ark backup, and the data is available in object storage. For this demo we're using Minio deployed into the cluster, but in a real-world scenario you probably would want to have your data backed up outside of your cluster.

So it's time for a disaster. We're going to go ahead and delete the nginx namespace. This will delete all of the components we just deployed, including the persistent volume that was dynamically provisioned. One of the great things about Ark is that it can walk from the pod to the persistent volume claim to the persistent volume, figure out that there is a relationship between the three, and make sure that we back up everything we need to be backing up.

All righty. So our namespace has been deleted here. I will prove that to you: you can see we do not have an nginx namespace anymore. And just to show you that there's no longer a PV: that is gone, we can't find it.

So let's go ahead and use Ark to restore the backup we just took. While this is happening, I will say that when the backup was running, what Steve described about doing an fsfreeze before the snapshot and an unfreeze after is exactly what we had Ark do today. So our restore is done. Let's go ahead and take a look. We have a PVC, it's bound, using the Rook block storage class again, and we have a PV similarly. This shows everything we had before; the individual names for anything that has a generated name, like the pods for example, will be different than before. And everything is running. Fantastic. Let's go to the service; this is a different IP than we had before. And we'll go ahead and take a look at that file system, and again, this is the log file system from the persistent volume
that Ark restored. It still has about a kilobyte. That is wonderful. Let's go ahead and take a look at that file. It's all of our data, so we have not lost anything. And just to show that we can augment it, we'll go ahead and curl it another time and look at the file size, and the file one more time. You'll see that 1045 has gone up to 1140, and if we take a look at the file one last time, you'll see that we had a series of requests from 20 past the hour and then the last one from 23 minutes past. So our backup was successful, our restore was successful, and we were able to continue using the data in the volume that we recovered. And that's the end of the demo. Let me get this back. There we go.

Great, thanks Andy for the demo. I'd like to say: please come join us in the Ark community. It's completely open source, and we have a number of external contributors who have been working on Ark since the initial release. We'd love to have you come join us, whether it's to provide feedback on Ark, to share real-world use cases you're using it for, or to add features yourself. Please come talk to us. We're easily accessible through GitHub or through Slack (we have a Slack channel in the Kubernetes org), and we have a Google group if you'd like to subscribe for release notifications. So please come join us; we really are looking for your input. We have so many ideas about backup and recovery, but I'm sure you have more, and specific needs, so please do come and find us, whether it's today or next week or next month. We would appreciate the input.

And at this point, if anyone has any questions, we would be happy to answer them. Why don't you come up to the mics; I think that way everyone will be able to hear. Sure, go ahead.

So in this case, if you restore back a copy that doesn't include data about a pod that happened to survive whatever outage you had, what happens to that pod?
So is the question: if the backup didn't include the pod, and then there was a restore? Yeah, exactly. It's best effort: if you are expecting that pod to be running and it's not in your backup, then it certainly won't be in your recovery. So you just need to be careful with how you spec out your backups and make sure your label selector is appropriate to match your pods and whatever else you need. Or you don't use label selectors, and you just say, I want to back up everything. Okay, cool, thanks. Does that answer your question? Sort of. A more specific question: if there are containers running on machines that Kubernetes manages, but which aren't part of pods that Kubernetes knows about, will Kubernetes kill those, or let them continue? So Kubernetes will not touch any containers that it's not responsible for. Similarly, if you have containers you are running manually, an Ark backup and an Ark restore don't know anything about those containers, because Kubernetes doesn't. Okay, so it doesn't wipe and recreate them? No, the kubelet will leave them untouched. Cool, thanks.

We'll take one over here. What's the appropriate way to monitor whether a backup was successful or failed? Good question. So we have logs that are stored per backup. If you're doing a backup, you'll see that it's in progress, and when the backup has completed or failed, you'll be able to retrieve those logs and see what the problems were. Is there any way you can add a status hook or something, to just call a service, webhook, script, whatever, to say: this didn't work and I just want you to know? That's a really good idea, and it's actually in line with something we are planning on doing, which is, in addition to the pod hooks that Steve mentioned, we do want to have a simple hook that sends out a webhook when a backup starts, and when it finishes, whether it was a success or a failure, sends out a notification as well. I don't care if it's successful, just that it's done. The other thing to
note is that backups, restores, and many of the other Ark concepts are CRDs within Kubernetes, so you always have the option to write a watch on the CRDs themselves and look for failures in the status. Cool, thank you.

So, great presentation. For multi-cluster, we're looking at disaster recovery use cases and trying to figure out how to do that. I was just curious whether you'd seen anybody do this particular scenario: you have a primary cluster and a secondary cluster, and something that monitors whether the primary cluster goes down and then uses Ark to basically launch what was running on the primary on the secondary. That's a very good question. I don't think we've heard any specific requests around that, but one of the things on our roadmap is being able to take a backup that was in, say, one region, if you're on a cloud, and migrate it over to a different region and restore over there. So that could potentially play into what you're looking to do, and definitely fit into that picture in terms of monitoring and automating moving the data from cluster to cluster as needed. And I would also add that because Ark uses CRDs, you always have the option to write a layer yourself on top of Ark that monitors the health of your primary cluster, and if there is a disaster, you can write code to basically create a restore CRD in your secondary cluster and automatically restore objects into it. So that's something you can very easily do around Ark. Thanks, it's a very helpful building block.

I'm wondering what preconditions might cause an Ark restore to fail. The question asked before gave me an idea. What if, let's say, you tore down a namespace, or you lost a namespace that you had backed up, and you wanted to restore it? If the namespace was up, and maybe some of the resources were created already, what would happen if I did a restore?
Sure. So what Ark does right now is it will try to create every single object you've specified as part of the restore, and if it encounters any conflicts, it logs that as a warning, or puts it in the status as a warning, so it's very visible. When you see that the restore has completed, it'll tell you if there were any errors, which would be catastrophic, versus warnings, such as a conflict. At present, we just record that fact: if there's something already pre-existing in the namespace, we won't touch it. In the future, what we'd like to do is make this pluggable, so that you can say: on a conflict, here is custom logic to run to make the decision. Do I patch what's already in there with what I have? Do I accept what's already there? Or do I replace what's there with what came from the backup?

Are there any types of resources that don't behave so well when they're restored from a backup, compared to just being created? Yes, the one I can think of off the top of my head is load balancer services. Those depend on the UID of the service, and that's not a field you can mutate or set; it's set by the API server itself. So if you have a load balancer service tied to, say, an Amazon ELB, and you take a backup and do a restore, you're going to get a different one, unfortunately. Hopefully we can work with the community to see if we can solve that.

How are you typing so fast?
So I've got to thank Joe Beda for finding a script on GitHub, I think it's called demo-magic. All of that was a real demo; I was just hitting enter to get it to type for me. Okay, thanks. Thank you.

I'd like to know the performance of the backup. And do we back up just the changed data, or a full backup every time? So, we'll go as fast as the API server will let us. Right now we aren't setting the QPS on the client, so I believe the default is about five requests a second, which certainly could be slow if you have a lot of data. We do plan to make that configurable. Where do the PV snapshots fit in? That's as fast as your cloud provider, or whatever you're using, can do the snapshots.

Over here. So, just to tie into the last two comments: it would be great to see it run faster. It actually takes a couple of hours for us, and to do a restore in the event of a major outage would be pretty difficult. Our etcd size is about 850 megs, two to three thousand, and a whole ton of config maps that Helm leaves behind. And the other thing is: what are we doing about the load balancers? Because that's a major impediment to restore. I have been involved in a little bit of discussion about that with the community. I honestly don't know where it currently stands, but we will be following up on that. Thanks.

So my question is related to the previous ones. How well does it interface with other things that are managing resources? Like, if I do a helm deploy, Helm is kind of keeping track of what resources are part of the chart. Something horrible happens, everything goes away, and I restore using Ark. If I do another helm deploy, will it pick up correctly, or will it try to start a whole new thing? I'm not intimately familiar with Helm, but the way our backups and restores generally work is that we back up the majority of the object (we may strip off status, for example), and then most of our objects we restore as is. There are a couple of exceptions here and there. So if there are certain pieces of data that you need that we're accidentally
stripping off, or stripping on purpose, then please file an issue. If you find problems, we'll correct them. Thanks.

I think we have time for one more over here. So, to answer the earlier question about load balancers: there's actually an open PR to set the load balancer name, which cloud providers use to look up load balancers. Then they would be able to restore and not rely on the UID for a name. That's open, so if you want to go comment on that PR, we're trying to figure out a good way. But also a question for Ark: what about resources managed outside of Kubernetes, like DNS? If you restore a load balancer, like right now, could there be an outside hook so DNS is able to update as well? I think that's a great idea. I would be happy to talk more about exactly where the hook would fit in, so feel free to file a GitHub issue and then we can talk about it. Thanks.

Thank you everyone. I think we're about out of time. Thanks everyone.