I would like to thank everyone who is joining us today. Welcome to today's CNCF webinar, "Five Key Traits of Effective Disaster Recovery on Kubernetes." I'm Suresh Narvade, platform engineer at Uswitch and Cloud Native Ambassador, and I'll be moderating today's webinar. We'd like to welcome our presenter, Michael Ferranti, VP of Product Marketing at Portworx. A few housekeeping items before we get started. During the webinar, you're not able to talk as an attendee; there is a Q&A box at the bottom of your screen, so please feel free to drop your questions in there and we'll get to as many as we can at the end. This is an official webinar of CNCF and as such is subject to the CNCF code of conduct. Please do not add anything to the chat or questions that would be in violation of the code of conduct; basically, please be respectful of all of your fellow participants and presenters. With that, I'll hand it over to Michael to kick off today's presentation.

Great. Thank you, Suresh, for that introduction, and thank you for having me. I'm really excited to present today. This is one of my favorite topics, and you probably think I say that about all my topics, but I really do think that disaster recovery for Kubernetes is interesting in its own right, and the fact that we're talking about it, and that so many people are interested in this topic, is a huge statement about the maturity of the overall Kubernetes ecosystem. Disaster recovery is not the first problem a company needs to solve in their cloud native journey, but it is an essential problem to solve before cloud native technologies and practices within an organization really become ubiquitous. So the fact that we're talking about this here means that a lot of people are either at that part of their journey, or they know that they're going to need to get there eventually, and they're looking to understand the problem and start thinking about solutions.
This webinar today is not about any particular solution to disaster recovery; there are many on the market, and I'll share at the very end a little bit about what Portworx does in that space. Today's webinar is really about understanding the contours, the requirements, of effective disaster recovery for Kubernetes applications, and that's very distinct from DR for traditional applications or applications that run in VMs, even if those happen to be in modern environments like the cloud. So I'm really excited to talk about all that today.

A little bit of background on me; I'll go through this really quickly. I'm realizing I'm wearing the same sweater as in my headshot; people have told me before that I look exactly like my headshot, and I guess in this case that is true. I've got to make sure to change my sweater. This is my second company working on data management and persistent storage for containers, and my first company started before Kubernetes was released from Google, back when its predecessor was still called Borg. So I've been thinking about these problems for a long time, and I've been working with a lot of companies over a number of years as they've implemented these solutions. A lot of what I'll talk about today is based on concrete learnings from working with real customers, and I'm happy to share some of those lessons learned with you.

I'll start with a little bit of meta commentary on how we got here, because Kubernetes is a key part of the cloud native ecosystem and will continue to be for years to come, but I think some of the trends that we're seeing are really bigger than Kubernetes. Kubernetes was released by Google, and prior to that its early forebear was created by Google, because there was a shift happening. As the old saying goes, the future is already here, it's just not widely distributed yet.
What companies like Google and others started to see is that the traditional way of managing an enterprise data center was not up to the task of running truly large-scale, highly dense, multi-tenant applications like Google Search or YouTube, or other large SaaS applications doing similar things in the early days. That traditional way of running an enterprise data center was really organized around the virtual machine. We call this a machine-defined control plane, meaning the thing that is most important in the architecture of our data center is the virtual machine. That could be with VMware (a great name for a company if you think the VM is the most important part), but even in the cloud, on Amazon, Google, or Azure, that prime object is also the virtual machine; the cloud is just the data center virtualized and provided as a service. In making the VM the center of our architecture, what we did is figure out ways to manage those individual VMs. We would back the VM up, we would migrate the VM, we would provide data security by encrypting the VM, and eventually we would run applications on top of those VMs. If we wanted to back up the application, well, we would just back up the VM, because there was this assumption that most applications ran on a single server. Fast forward to today, and it is now the case that most applications do not run in a single VM. We have distributed applications, we have distributed databases like Cassandra, Kafka, and Elasticsearch, that in most instances run across several machines, and so the idea that we can control our application through some subset of machine-based operations no longer holds.
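To make the mismatch concrete, here is a minimal sketch in Python. The node and instance names are invented for illustration (not any particular cluster or product's API); the point is that a machine-level snapshot both captures more than one application and never captures a whole distributed one:

```python
# Hypothetical placement map: which app instances land on which nodes.
# Names are illustrative only.
placement = {
    "node1": ["cassandra-0", "mysql-a"],
    "node2": ["cassandra-1", "mysql-b"],
    "node3": ["cassandra-2", "mysql-c"],
}

def snapshot_node(node):
    """A machine-level snapshot captures everything on the node."""
    return set(placement[node])

def instances_of(app):
    """All instances of one application, wherever they run."""
    return {i for apps in placement.values() for i in apps if i.startswith(app)}

# Backing up node1 to protect mysql-a drags along Cassandra state...
assert snapshot_node("node1") == {"cassandra-0", "mysql-a"}

# ...and no single node snapshot covers the whole Cassandra ring.
assert not instances_of("cassandra") <= snapshot_node("node1")
```

The machine-based operation simply has no handle that corresponds to "one application."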
The second change, and the reason that we need what we call an application-defined control plane, is that not only do typical applications run across multiple hosts, so that individual machine-based operations are no longer sufficient, but there's also this problem of how you make sure that the infrastructure is available when the application needs it. This is maybe a longer-term discussion, and I would love to come back and talk about this topic in more detail in another webinar, but we're starting to see Kubernetes not only provide mechanisms for managing applications directly, as opposed to managing them as a collection of machine objects, but also provision infrastructure when it's needed based on the application's requirements. Kubernetes, through its plug-in model, lets companies like Portworx do this from a storage perspective, basically allowing a containerized application to provision its own infrastructure on demand. I think that's a really important part of this new model, and it's much different from the old world where we would provision some infrastructure, run some applications on it, and then, when we needed more, provision more infrastructure and run more apps on top of it. Kubernetes is merging the worlds of application deployment and infrastructure deployment into a single set of processes controlled at the application level. Okay, so what does this have to do with DR? Well, a lot, because the core argument that I'm going to make is that DR itself needs to be controlled as an application construct, not as an infrastructure construct as it has been in the past. And if we start to zoom in now and say, okay, we're in this app-controlled world, what does that look like?
Well, we have this new stack. It's based around Kubernetes, but it's a pluggable model where I can plug in different solutions for monitoring based on standard interfaces, and the same for security, for networking, and for storage. In doing so, as an IT organization, I get a lot of benefits. We hear this from almost every customer that we talk to; maybe there's a little bit of selection bias (companies that adopt Kubernetes want to do more of it), but what we hear them say is: we're more efficient when we run applications on Kubernetes, our developers are happy, we leverage automation more, and we have fewer problems and fewer security breaches. And that's a great thing; that's the reason a conference like KubeCon is getting 10,000 attendees. This stuff really matters in the real world of enterprise IT.

Now, bringing it back to DR, what we're also noticing is that there's a point at which organizations bump up against limits on how many more applications they can put onto the Kubernetes platform without having Kubernetes-native ways of solving some key business requirements. So I'll go to the next slide. I would say that an organization's ability to benefit from cloud native technologies and Kubernetes is going to be limited by how many apps they can get to run effectively on that platform, and I would argue that the primary barrier to running apps on Kubernetes is not what I'll call a technical barrier, the inability to run a particular process in a container or to describe it in a Kubernetes deployment file. That's not the problem. The problem, rather, is addressing key business requirements that would prevent an application from running in this environment. By that I mean things like data protection: can I be confident that this app running in Kubernetes has the same level of data protection as I'm able to provide in, say, a VMware environment on-prem, where I have a bunch of different tooling? Or can it provide the same level of disaster recovery, maybe in terms of zero RPO, the ability to guarantee zero data loss in the event of an entire data center outage? It's these types of business requirements that can prevent those incremental apps from moving over to Kubernetes, and thus, if we want to get more out of our Kubernetes investment, one of the most effective things we can do is figure out how to address those underlying business requirements. Then we can go to any user within our organization (I'm talking from an IT perspective here, but the same applies if I have hundreds of teams building different components of a SaaS application) and give them confidence that they can run their application more efficiently, more securely, and more performantly on Kubernetes than in their traditional environment. When I can do that, Kubernetes really can be the overall control plane for our entire data center, and all of the benefits we talked about on the previous slide start to accrue at a macro level within the organization, not only for a small subset of our applications. That's where we want to get to. Because the problem of bringing that incremental app to Kubernetes tends to be a business requirement problem, not a quote-unquote technical problem, what we've seen work really well is the implementation of a Kubernetes storage platform. If you look at those types of barriers: data security,
disaster recovery, backup and recovery, compliance, governance, and migration, at the core all of those business requirements share the concept that there is some data I need to be able to manage in a particular way. If I can do that in a Kubernetes-native way, then those business requirements are solved and I can bring all of those apps over. If I were to boil it down and ask why you need a Kubernetes storage platform (again, I'm not proposing any particular platform, but rather arguing that the concept applies to the problem at hand of bringing apps over to Kubernetes), I would say that all enterprise apps have data. You have to be able to manage data in order to run enterprise applications. We can do microservices all day long, and there can be stateless microservices, but there are no stateless applications, so data is ubiquitous within the enterprise. Just as Kubernetes is new software built to solve new problems of scale, of running distributed applications, of running multi-tenant applications, we would argue that your existing storage technologies, built and optimized over a decade or more to solve the problems posed by VMs, are not a good fit for the problems posed by containers. The scale and dynamism of a Kubernetes environment are simply an order of magnitude greater than in a typical VM environment, and the best technology is technology that's purpose-built for its requirements; those requirements often are not met by existing technology that wasn't built for Kubernetes. And finally, there's always a desire amongst engineers to subdivide a problem, to not boil the ocean, to pick low-hanging fruit and get some early wins, and I agree with all of that. But sometimes within Kubernetes we say, well, let's take our stateless applications and run them on the platform, keep our stateful applications running outside the platform, and that's going to simplify things. In fact, what we've found working with a number of customers is that having those dual systems actually complicates things. It increases the complexity of the system, it increases cost because you're running two different systems, it oftentimes leads to cloud lock-in if you're using managed services, and at the end of the day that increased complexity reduces agility, and we don't want that.

Now again, we're talking about DR today, and the next slide will talk all about DR and the traits of effective DR, but I really want to make sure this last point is clear before I move on to the specifics. Remember, our goal is to allow Kubernetes to be the data center control plane. As practitioners of Kubernetes who see that there is a better way to build and run applications, we want to be able to run as many applications on Kubernetes as possible, because it's simply a better way: it's more efficient, it's more cost effective, developers like their jobs better, and so on. Solving DR is critical to that, and we're going to talk a lot about it, but you also have to think about how you do migrations across clouds, how you do backup, how you do data security, how you do governance. Because all of these problems have data at their core, they're distinct but also similar. So I would encourage you, as you're thinking about your DR strategy for Kubernetes, to ask how that DR strategy can also help you with migrations, whether on-prem to cloud or cloud to cloud. How can this ability to manipulate data and move objects between environments also solve my migration problems, or also solve
my backup problems? What you'll find is you have fewer tools to manage, fewer tools to learn, and you can get greater efficiency from the processes you're putting in place.

But let's now focus for the rest of the webinar just on DR, because we could easily spend several hours on this topic. Just to establish why we're here: DR is clearly important to all enterprise applications. And when I say all, I don't mean that the DR requirements for our core transaction processing system on Cyber Monday are the same as for our CI/CD or Jenkins environment. But the analyst firm 451 Research looked at applications that would be considered non-critical, business-critical, and mission-critical, and at the associated recovery time objective (RTO) and recovery point objective (RPO) for those applications. Those metrics are, basically: how much downtime can we suffer before an application is back up and running, before our users can reconnect to it (that's RTO), and how much data are we willing to lose (that's RPO). We could have a long RTO, say the app needs to be up and running within 24 hours (we're okay having downtime for up to 24 hours), but we can't lose any data; that would be a one-day RTO and a zero RPO. These are the common DR levers we can pull. Well, it turns out that for mission-critical apps, RTO and RPO objectives of less than an hour are pretty ubiquitous, or to rephrase that, most organizations have very aggressive RTOs and RPOs for mission-critical applications. That's not surprising, but if you look at this chart, even our non-critical applications have non-zero RTOs, and in some cases low RPOs; in other words, limits on how much data we can afford to lose. I would frame this in terms of something like a Jenkins server. Unless you're CloudBees, your business most likely doesn't make money through your CI/CD pipeline, in the sense that that's not your product, that's not what you're selling; you sell through your e-commerce website, or you sell AI software, or sports forecasting, whatever the case may be. But your Jenkins environment is a critical dev tool, and if your dev pipeline shuts down, then you can't make updates to the thing that you do sell, which in turn means you could lose money. So even these non-production or non-mission-critical applications also require disaster recovery. And that's really the point: within Kubernetes, we need to be able to provide DR at the application-granular level and choose different RTOs and RPOs based on the types of applications. We should be able to apply DR to any app running on Kubernetes, but we should be able to pick and choose our RTO and RPO based on business requirements, network topology, and other requirements, such that we fit into this model where different applications have different RPOs and RTOs.

Okay, so the webinar is called "Five Traits of Effective Disaster Recovery on Kubernetes," and I want to walk through five things. The reason we have to have this webinar is simply that DR for containers is different from DR for VMs. I'm not suggesting that you forget everything you know about disaster recovery and learn it all fresh with Kubernetes; in fact, I've just talked about the core concepts of DR, RPO and RTO, and those still apply in a Kubernetes environment. But how we ensure those RTOs and RPOs is different, based on the ways in which Kubernetes and containers are
different from VMs, so I'm going to walk through each of these in turn so you can really understand them. And again, if Portworx didn't exist and I didn't work at Portworx, I would still give the same talk, meaning these are concepts that apply regardless of which solution you pick in order to meet the requirements.

The first concept is container granularity, and here I win the prize for most obvious statement: DR for containers needs to be container-granular. But the reason this is really important goes back to that difference between the machine-based world and the application-based world. We can illustrate it with a really simple example. Say we have a three-node Kubernetes cluster and we're running a bunch of apps on that cluster: one three-node Cassandra database, or Cassandra ring, and three one-node MySQL databases. The simple question we can ask to understand how container granularity applies to DR for Kubernetes is: how do you back up just one of these applications? You'll see that it starts to get complex if the only tool in our arsenal is a machine-based backup. Let's say I want to back up one of my MySQL databases. These are individual MySQL databases; each runs wholly on a single server, distinct from our Cassandra, which is a database distributed across three servers. With MySQL it's easy, it runs on a single server, so let me just take a backup of node one, and now I've got a backup of my MySQL database. Well, that's not the case, because node one also includes Cassandra configuration and Cassandra data, so when I restore that backup, I have to get rid of all of that in order to restore just my MySQL. What I want to be able to do is capture only the state associated with that single MySQL database running on node one and leave behind all of my Cassandra. That's what we call container granularity. By the same token, if I want to back up my distributed Cassandra database, clearly I can't just take a snapshot or backup of a single node; I need to do it for all three. But then I run into the inverse problem, which is that now I have a bunch of MySQL data: three individual MySQL volumes that I don't need for the purpose of backing up my Cassandra. Let's say those MySQL volumes are a terabyte each. Do I really want to back up all of that data when the question at hand is backing up Cassandra? Clearly I don't. What I want is to zoom in on the container-granular volumes and back up only those volumes.

The next point is Kubernetes namespace awareness, which is simply the idea that DR custom-built for Kubernetes speaks the language of Kubernetes. Increasingly, teams are running dozens or hundreds of pods in a single Kubernetes namespace, and I want to back up not just my MySQL database or my Cassandra database from the previous example in a container-granular fashion; I also want to back up an entire namespace. I might have a business unit within my organization to which I've given a Kubernetes namespace, and I want to make sure the entire thing is backed up, not just an individual application but all the applications running in the namespace, and I want to do that with a single Kubernetes command, not as a combination of the machine-granular commands that we've already seen are problematic. This is what that would look like, and it's actually even more complex than this diagram makes it appear. It looks like namespace one is running on the servers on the left and namespace two on the servers on the right, but in fact Kubernetes is multiplexing all of
these containers, which are themselves distributed systems composed of multiple pods, across any combination of nodes. That makes it impossible to conceive of a single machine-based command that would capture this concept of a Kubernetes namespace. The problem with what I'll call legacy or existing DR solutions (sorry, I got overzealous with the mousepad; hopefully everybody can see my slide again) is that they don't have these Kubernetes API concepts in place. If I want to back up namespace one, there is no namespace object within my DR API that I can call in order to capture this group of objects, which makes it very difficult to back up, say, namespace one and put it in one location, and namespace two and put it in another location.

Okay, so the third concept is application consistency. We talked about container granularity, and we want to apply that to an individual application, say a distributed Cassandra database, but we also want to level up and do backup or DR for all the distributed applications running in a single Kubernetes namespace. In order to do that effectively, we need to add in this third concept, application consistency. Going back to an earlier point (sorry, the screen changed on me; one second, hopefully everybody can see my screen now), in order for this DR to be effective, we need to make sure that when we back up that Cassandra database, or that group of Kafka brokers, or that Elasticsearch database, the distributed backup we're taking is application-consistent, meaning we're not going to get data corruption because we're taking serial snapshots of a distributed system. To go back to our simple example: if I'm going to back up this Cassandra database, which again is three nodes but one database (a distributed system, a three-node Cassandra ring), I can't just take a snapshot of node one, then a snapshot of node two, then a snapshot of node three, and be guaranteed that when I recover that database in my DR site I'm not going to get data corruption. Cassandra has a particular way in which it needs to be snapshotted in order to be application-consistent. In essence, I need to quiesce the database, make sure that none of those nodes are going to accept any new writes, snapshot everything, then unlock the tables, continue writing, flush the writes still in memory down to disk, and carry on. There's a particular sequence of events that needs to happen in order to take that application-consistent snapshot, and it's different from how I'm going to snapshot MySQL, which has its own way of snapshotting that we need to perform, different from Cassandra's because MySQL is typically running on a single server. Basically, you need to make sure that your DR solution understands this notion of application consistency, so that if I have a DR site on the East Coast for an application running in production on the West Coast, each of those incremental backups is application-consistent, not simply crash-consistent, for my distributed systems.

This last point is really important (as they all are, but I would say this is probably one of the biggest differences between container-granular or Kubernetes DR and traditional DR): our DR system needs to be capable of backing up data and application configuration. Just data is not enough; just application configuration is not enough.
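The quiesce-then-snapshot-then-resume ordering described above for Cassandra can be sketched in a few lines of Python. This is an illustration of the sequencing only; the function names and no-op hooks are hypothetical stand-ins, and a real solution would run engine-specific commands (flush, quiesce, snapshot) at each step:

```python
# Sketch of application-consistent snapshotting for a distributed database.
# The hooks passed in are placeholders; a real tool would run
# engine-specific pre/post commands on each node.

def app_consistent_snapshot(nodes, quiesce, snapshot, resume):
    """Quiesce EVERY node before ANY snapshot, so no member accepts
    writes while its peers are being captured; then snapshot all;
    then resume. Returns the ordered event log."""
    events = []
    for n in nodes:               # 1. stop accepting writes everywhere
        quiesce(n); events.append(("quiesce", n))
    for n in nodes:               # 2. capture every member of the ring
        snapshot(n); events.append(("snapshot", n))
    for n in nodes:               # 3. release and carry on
        resume(n); events.append(("resume", n))
    return events

ring = ["cass-0", "cass-1", "cass-2"]
log = app_consistent_snapshot(
    ring,
    quiesce=lambda n: None, snapshot=lambda n: None, resume=lambda n: None,
)

# A crash-consistent-only tool would interleave snapshots with live writes.
# Here, every snapshot happens strictly after every quiesce:
assert all(log.index(("quiesce", q)) < log.index(("snapshot", s))
           for q in ring for s in ring)
```

The invariant checked at the end (all quiesce events precede all snapshot events) is exactly what distinguishes an application-consistent backup of a distributed system from serial, crash-consistent node snapshots.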
So if I look at this and ask what composes a Kubernetes application, let's take a simple example of an app that does run on a single server (we'll apply this later to distributed systems, but for now let's keep it simple). If it's a data service, it's going to have a volume, it's going to have some application configuration, and it's going to have a bunch of Kubernetes objects associated with it. Kubernetes objects could be secrets, service accounts, PVCs, controllers; there are dozens of these objects, and if we add in something like OpenShift, there are dozens more very specific objects that define how an application runs on that particular Kubernetes platform. If I want to recover my application, I need to have all of that application state as well as what I'll call my data state. If our DR solution is only going to take a crash-consistent snapshot and put that data somewhere else (which is itself essential, but not sufficient), then I'm going to have to recreate all of my Kubernetes configuration and all of my application configuration in the new environment in order to recover. That's very time-consuming, and if you've been around DR for a while, you've been part of projects where everything's in place in the new environment, but we just can't get the app up and running; our RTO target gets missed by 24 hours, 48 hours, a week or more. This has happened with production applications: we have DR in place, but we just can't recover the application, and oftentimes it's because of application configuration problems. So for disaster recovery (and the slide here actually frames this in terms of migration, because if you think about it, DR is taking an app that's running in one location and making it available in another location, which we can think of as a migration), what we need is for the backup to contain both the application configuration and the data as a single set of objects that I can recover together, move together, and back up together, so that even with a low RPO, meaning I have not suffered any data loss, I can still get that very fast recovery.

The last point is that our DR solution needs to be optimized for a multi-cloud world. Basically, I need options for local metro-area data centers as well as wide-area-network-based DR, and often this maps to the chart we showed earlier: what are your business requirements, what are your RPO and RTO objectives? If I look at an application with a very low RPO requirement, I might have two data centers on what we call a campus network, with very low latency between them, and I need to make sure that all data written to one data center is synchronously replicated to the other, and that all of my application configuration moves over as well. Portworx is one particular implementation of this, but there are many ways to solve the problem; for low-RPO, low-RTO applications, you need to figure out a DR setup that gives you that capability, so that if I lose an entire data center, I can fail my application over to the second data center. Now, that's really different from, say, having a DR site on the East Coast for a production application on the West Coast, or going from Europe to Asia or vice versa. There, the latencies are going to be such that we're really not going to be able to use synchronous replication; we're going to have to do some type
of incremental backup um but again we want to make sure that all of the data and all the application configuration are in place in order to do that um so we got um see what that looks like there okay so um so everything that i've shared up to this point is really you know it whether or not portworks exists it doesn't change the need for container granularity the need to be able to speak you know within the kubernetes primitives like namespaces the ability to take application consistent snapshots of the distributed systems etc all of that main it is maintained um i mentioned at the top of the show that portworks is one particular implementation of this and i don't i don't want to take up a lot of time talking about it but i i do want to just you know hold up an example of a company who has successfully done this in production i think that's really important just to know that you know there are your peers are doing this today um there are being successful with it and you know sometimes seeing those success stories gives us confidence that we can try it too within our organization so i'll just mention really quickly uh that this particular customer um wanted to move a new app over to kubernetes um had a kind of a traditional enterprise dr requirement around zero rpo failover between two completely distinct data centers um and you know if that app could solve that business requirement i mentioned that at the beginning that it would come over to kubernetes um and if it if it couldn't then you know unfortunately the app wouldn't be at a move over it was very binary within this financial service institution um and you know i'm really proud that we're all but to help them address this uh this dr concern which again really speaks to the maturity of the kubernetes ecosystem there was took a lot of you know into projects and people and processes working together in order to make this happen but you know if i were to say you know point to one thing that gives me confidence 
If I were to point to one thing that gives me confidence that Kubernetes is going to be around for the next 10 years, it's the ability to run quote-unquote traditional enterprise applications, with these hard requirements around DR, successfully on Kubernetes. That gives me confidence that there really is no limit to the types of applications that can run successfully on the Kubernetes platform.

The last slide before we move over to Q&A: if you're interested in exploring how Portworx can help your organization with DR or other solutions, we're a happy and proud member of the CNCF, and we go to all the Kubernetes events, so you can always find us there. Our platform runs on any Kubernetes platform, so whether you're on OpenShift, or running one of the cloud providers, or Rancher, or IBM, we work with all of those. You can continue to use whatever hardware you have, whether that's the cloud or particular storage hardware; maybe you bought Pure Storage and you really love it, but you want to apply that container granularity on top. We can leverage your existing solutions and then provide solutions not just for DR but also for migration, backup, security, and so on. So with that, I'd like to turn it over for questions, and I think Suresh is going to walk us through that part of it.

Yeah, thank you. Thanks, Michael, for a great presentation. We now have some time for questions. If you have a question you'd like to ask, drop it in the Q&A tab at the bottom of your screen and we'll get to as many as we have time for. We have two questions as of now. At what level would a service mesh like Istio fit into the DR failover setup for data centers, if applicable?

Yeah, that's a great question. The networking component is often one of the key elements of an effective DR system, because you want to do a couple of things. One is to make sure that the DR site is staged properly with your data and your application configuration; that's the starting point. We want to stage it based on our RPO and RTO objectives, so if we have a zero-RPO objective, every write to data center one needs to be replicated to data center two, which is what we talked about. Then we want to detect a data center failure, and detect that it's not just a momentary network partition but something that is truly a failure. There we're going to use our monitoring stacks: Prometheus, for instance; we're going to look at things in Grafana; we're going to use our metrics and monitoring to determine what is truly a disaster. Then we need to make sure that our application traffic is routed to the appropriate site, so in the zero-RPO example, we want to make sure that we push everybody to data center two, and that's really where Istio comes in. Exactly how you determine what is a failure and communicate that to your service mesh, so that your traffic starts to be routed to the second environment, differs based on whether you're working in VPCs and what your load-balancing strategy looks like, so it's hard to talk about in the abstract. But I would say the three things you need to do in order to implement DR within your organization are: make sure that your apps are staged in the DR site; determine your strategy for disambiguating between a momentary blip and something you consider a true disaster, that is, your failover trigger; and then look at how you want to redirect your applications once that trigger has been met, in order to route your traffic to the DR environment.
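The second of those three steps, disambiguating a momentary blip from a true disaster, is essentially a debounce over health-probe results. Here is a minimal sketch; in a real setup the probe results would come from a monitoring stack such as Prometheus, and the class name and threshold are hypothetical.

```python
class FailoverTrigger:
    """Debounced disaster detection.

    A single failed probe is treated as a momentary blip (e.g. a brief
    network partition); only N consecutive failures declare a disaster
    and fire the failover. Any successful probe resets the count.
    """

    def __init__(self, consecutive_failures_required: int = 5):
        self.required = consecutive_failures_required
        self.failures = 0

    def observe(self, probe_ok: bool) -> bool:
        """Feed one health-probe result; return True when failover should fire."""
        self.failures = 0 if probe_ok else self.failures + 1
        return self.failures >= self.required

trigger = FailoverTrigger(consecutive_failures_required=3)
probes = [False, False, True, False, False, False]
results = [trigger.observe(ok) for ok in probes]
# Two failures followed by a success do not fire; three in a row do.
print(results)  # [False, False, False, False, False, True]
```

In practice the threshold trades recovery speed against false failovers: too low and a routine blip triggers an expensive site switch, too high and your effective RTO grows by the detection delay.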
If you have particular questions about your network, feel free to reach out to me at michael@portworx.com, or come over to the website and we can help you look at the particulars.

Thank you. The next question is: does your DR solution run on Azure?

I don't feel comfortable talking about their particular implementation, so I won't be able to answer that question, unfortunately. I will say that as a company, Portworx has customers that are running in all of the major cloud Kubernetes services and doing DR in those services: Azure Kubernetes Service, Amazon EKS, GKE. So if the question is more "does it work in a particular cloud environment," I can say yes, that is absolutely the case, and the same goes for Rancher, OpenShift, and so on.

OK, thank you. The next question is: I'm new to Kubernetes. If we run AWS managed Kubernetes with DR, and AWS has its own DR solution for its regions, will the two DR solutions conflict or not?

I would say no, but without the particulars it's hard to know exactly how to answer, so let me give a higher-level answer. What I just described, the five traits of effective disaster recovery for Kubernetes, provides DR for applications running on Kubernetes, and one of the main points I wanted to make is that there's a difference between effective DR for a VM and effective DR for an application running on that VM. The traits of effective DR that I described are at an application level, and you may want to continue to have DR in place at an infrastructure level. The two are not necessarily in conflict; they're providing redundancy at different levels of the stack. If I'm on a DevOps team, I build an application and I'm responsible for running it; that's a very common two-pizza-team type of organizing principle, even within enterprise IT and certainly within what I'd call SaaS startups. So I build the app and I'm responsible for running the app. Now, if I'm going to get paged at three o'clock in the morning because I have a data center outage and I need to bring my application back up, it's much more efficient for me to be able to recover just my application, and not have to think about what infrastructure my application was also running on at the time and recover that too. That would also mean that if I have active sessions in one infrastructure and I fail over to another, those sessions are going to get cut off. It adds a lot of complexity when I'm only trying to recover my particular application.

This is where we see teams disambiguating. A really good example, by way of analogy: if I'm running Kubernetes on Amazon, it doesn't replace other Amazon technologies that manage, for instance, my VMs. I still have my AMIs, and I still might use CloudFormation for actually bootstrapping individual VMs, but now I have software on top of those VMs that provides additional capabilities, and I'm going to use those capabilities where appropriate for certain tasks while continuing to use my infrastructure capabilities for other tasks. So there is some overlap: from a DR perspective I have a DR solution at an infrastructure level and a DR solution at a Kubernetes level, but at the end of the day they each serve their own purpose, and I think it's increasingly common to have a solution at both levels of the stack, because there are different users at those levels of the stack.

Yeah, thank you. The next question is: can you brief us about Rook storage?

I don't fully understand the question. I'm familiar with Rook storage, but I don't know if this is specifically around DR or just in general.

OK, so maybe whoever posted the question can elaborate. Meanwhile, we can take the next question, which is: if a GitOps model is in place, we may not need to back up the Kubernetes config. How does a tool like Portworx differ?

That's a good question, and I don't know that in practice I fully agree with that, though I don't want it to turn into a philosophical debate. All of our customers practice version control and GitOps; it's very rare that a team moving to Kubernetes isn't doing some version of version control and deploying via the modern deployment practices that are increasingly called GitOps. But with application configuration, sometimes there are runtime changes that happen; or you have version control, but you have 15 different versions of a particular application configuration or container, and always understanding what version of the application was running with what particular data volume is a complex problem in and of itself. So for very low-RPO and very low-RTO applications, basically where you can suffer zero data loss and your recovery time is, say, less than one minute, it's more effective to copy over the application configuration than to rebuild it or re-pull it from scratch out of your version control system. I'm not suggesting that's always the case. As I talked about earlier, if I have a 24-hour RTO, where I've got some time before my application needs to be back up and running, maybe a GitOps, version-control-based application configuration would be appropriate. But if I'm talking about a sub-two-minute RTO, where my application needs to be up and running like that, chances are you're not going to have enough time to rebuild from source. So I think it's a matter of understanding your business requirements, understanding how the different technologies work together, and picking the right solution for you. Again, I'm not here to pitch any particular solution so much as to lay out some of the mechanics.

Good. So one more question: can you elaborate on hyperconverged storage for DR using Rook plus CephFS?

I'm not an expert on that, so I want to be careful not to talk in categorical statements. I would reframe the question: if I wanted to answer it, if I wanted to become an expert on this topic, I would look at whether Rook and Ceph can meet the requirements of the Kubernetes DR model that I outlined earlier. For instance, can I target a namespace with that product? If I have 100 pods and 100 volumes running on a Kubernetes cluster, can I manipulate them at a namespace level? I honestly don't know the answer. Can I run in different network topology modes, both a stretch cluster across two data centers and distinct clusters? Can I take application-granular snapshots versus machine-granular snapshots? I think that's the way to approach it, and unfortunately I'm not in a position to answer it for you, but I think it gives you a model for asking the right questions in order to get your answer.

Yeah. So we got one more question, which is: how is Portworx different from Velero? [A brief mix-up over who is muted.] Yeah, that's a
good question. I would say, at a high level, that Portworx is what I'll call a quote-unquote integrated solution, meaning it's a tool that handles the application configuration, the Kubernetes objects, and the volumes as a single class of objects. We have a technology that we call KubeMotion, which is basically this ability to capture application state and data as a single set of objects, and one way you can use that is in DR, another is in backup, and another is in migration. Velero, I would suggest, is not a DR solution but rather a backup solution. The difference, and forgive me if I get any of the details wrong, is that, as I understand it, if you had a zero-RPO failover requirement, meaning zero data loss between two distinct data centers, Velero itself would not handle the synchronous replication of data between environments. I don't know whether, given the way its plug-in model works, you could arrange for that to happen at a storage layer, with Velero integrated into what's going on at that base storage layer, distinct from its backup function; I'm just not sure. On the backup side, the difference would be that Portworx is, again, an integrated solution: you log into Portworx, point and click to the objects and applications and namespaces and clusters that you want to back up, pick your backup locations, and push all of that data there. Portworx guarantees that it's there, and if the data transfer fails it retries, all of that nuts-and-bolts stuff. And if you want to recover one of those applications, again, point and click, pick where you want to recover it to. So I think Velero is an important component of building a backup solution, but it's not an end-to-end backup solution in itself, whereas Portworx is a solution that people drop in and it solves things from nuts to bolts without requiring a lot of integration points. I hope that helps and that I didn't get any of the details wrong.

Yeah, thanks. So we have the last question for the day: instead of having a standby DR site, what is the strategy to back up the production components and infrastructure for an effective recovery?

Yeah, that's a good question. A DR site, for that zero-RPO use case, assumes that you're running compute already in the other environment, and as soon as you turn on compute you start paying for it, whether you're buying the servers or running in the cloud. So I would look at a backup model if you're really trying to save costs but still want to be able to recover applications, and you're OK with a higher RPO, because backups are not going to give you that synchronous replication; they're going to be based on snapshots, and there are going to be some limits on snapshot granularity with different solutions. I would just suggest, again, that you make sure you're backing up the config, meaning the app config and the Kubernetes objects, as well as the data, and that could be using something like Velero, something like Portworx, Kasten, or other solutions, because the recovery part is, in almost all instances, the really hard part of backup and recovery: making sure that the app can run consistently, without data corruption, in the new environment, quickly. The best way I've found to ensure that is, first, make sure it's containerized; Kubernetes takes care of that for us, because containers smooth out some of those environmental differences.
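As a back-of-the-envelope check on that trade-off, the worst-case data loss of a snapshot-shipping backup is bounded by the snapshot interval plus the time it takes to ship a snapshot to the backup location. A small illustrative calculation, with made-up numbers:

```python
def worst_case_rpo(snapshot_interval_s: float, transfer_time_s: float) -> float:
    """Worst-case data loss (in seconds) for a snapshot-based backup.

    If the production site fails just before the next snapshot is taken,
    you lose up to one full interval of writes, plus anything in the most
    recent snapshot that had not finished transferring off-site.
    """
    return snapshot_interval_s + transfer_time_s

# Hypothetical numbers: snapshots every 15 minutes, 2 minutes to ship each.
print(worst_case_rpo(900, 120) / 60)  # worst-case RPO in minutes: 17.0
```

This is exactly why a backup model cannot hit zero RPO: however small you make the interval, the loss window never reaches zero, whereas synchronous replication (with its standby-compute cost) does.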
Then make sure that you're backing up the data, the app config, and the Kubernetes objects as a single group, and pull them all down together. You don't have to pre-stage your compute in order to do that, but you still get moderate RTOs in that case.

OK, so there's one more question, and I think we have one minute to answer it: will Portworx take a backup of etcd?

So Portworx is a persistent storage and data management solution for Kubernetes, and etcd is "just" a database, and I'm saying "just" in air quotes, because it's a very important one for Kubernetes. Portworx can take a backup of any stateful service: Cassandra, Kafka, etc. So the answer is that Portworx can be used as the persistence layer for etcd, but there's a kind of integration problem: if that etcd underlies your Kubernetes service, and Portworx is providing persistent storage for your Kubernetes service based on that same etcd, it's kind of like the snake eating itself. So in that case, what we do is back up the objects in etcd, which is different from backing up etcd itself. Basically, when we see the Kubernetes configurations within etcd, we rewrite them and put them in the backup location, so that even if your etcd goes away, all of the data is still preserved. It's a subtle distinction, backing up etcd versus backing up the objects that are in etcd, but for this kind of world in which you have multiple layers of abstraction built upon each other, that's the most effective way we've seen it work for our customers.

OK, great. Thanks, Michael, for a great presentation. And that is all the questions we have time for today. Thanks for joining us; the webinar recording and the slides will be online later today, and we look forward to seeing you at a future CNCF webinar. Have a great day. Thank you so much, everyone, have a nice day.