Good afternoon everyone, and thanks for coming to this session. We'll be talking about an efficient strategy for OpenStack backup. This is the outline of what we are going to cover. First, we'll introduce ourselves. I'm Ghanshyam Mann, a software developer at NEC. I have been an OpenStack upstream developer since 2014. I work on QA and Nova, and I do some POCs on the backup and storage side.

Good afternoon everyone. I'm Abhinav, Abhinav Adhraval. I work as a solutions architect with NEC Technologies. At heart, I have been a systems software developer since 2005. I have primarily spent my time on virtualization, cloud, big data, and IoT storage technologies over the last 12 years or so. You can catch me on Twitter at techieabhinav. That's all for my introduction. Thanks.

So, OpenStack backup. This is actually an open question. If you are building a cloud, one of the key considerations is backup and recovery, because we have a lot of resources in OpenStack: VMs, volumes, databases, and configuration. There are a lot of backup solutions available for clouds, but those are mostly specific to particular vendors, like Microsoft or VMware only. A lot of questions come to mind when we think about OpenStack backup. Can we use a traditional backup solution for OpenStack? What should we back up for OpenStack? We have a lot of things. And what should our RPO, RTO, and TCO be? We will discuss all of these questions in this session. The very first question the audience may also want answered is why we need to back up the cloud at all. Abhinav, can you summarize that first, so everyone gets the background?

Thanks, Ghanshyam. I'm sure the audience here has similar questions in attending this session. The very first question is: why do we need a backup of OpenStack?
I would like to summarize that with this slide: why OpenStack needs to be backed up. The primary reason for having a backup is recovery from a data loss situation. Data loss can happen for multiple reasons, including human error. A single rm -rf by an engineer can wipe out all your data, or something can be overwritten by mistake. You need some mechanism to recover from those kinds of scenarios. Data corruption can also happen in other ways, not just through human error, and in those scenarios too you need some mechanism to recover. And no hardware in the world so far is 100% reliable. Even if it is only 0.0001% short of guaranteed, your critical data is residing in the cloud, so you still need a mechanism to recover it. Beyond data loss, there are scenarios where your full system, basically your full cloud, can go down. For example, you are performing an upgrade and something nasty happens: a power failure, or some script not working as you expected. And then you're gone. So best practice says that you take a backup before an upgrade. In case something bad happens, you at least have something to roll your system back to a stable state. There are also very rare scenarios of natural disaster, like a flood or an earthquake. DR solutions are required for those, and for DR you also need a backup. Apart from these recovery situations, you add business value if your cloud is well planned for backup. You get a competitive advantage compared to other providers or other solutions. Basically, you're ready for any kind of worst-case scenario.
So you can survive any of the scenarios explained earlier. For some industries, backup is also a compliance requirement. In healthcare, for example, organizations have to be HIPAA compliant, so all the servers and all the data need to be backed up. The finance industry and many more need backup as a compliance requirement too. If you have a cloud that you can say is backup ready, it is ready for those kinds of industries. So that was the very first question, and I guess I've explained why backup is needed.

Now, the need for backup has been known for a long time, and there are multiple solutions. I call them traditional backup solutions; they have been around for ages. But since we are talking about cloud, why do we need something different? Why are traditional backup solutions not a direct fit for cloud backup requirements? The first and very important thing is scalability. As you know, OpenStack is designed to be scalable. Theoretically there is no limit on the number of nodes or the amount of storage. But most traditional backup solutions existing today have limitations, either on the number of generations, I mean the storage they can handle, or on the number of servers they can handle. That is a contradiction, because for OpenStack you need a backup solution that is limitless, especially where scalability is concerned. Then multi-tenancy. Traditional backup software mostly manages users through Active Directory integration and the like, whereas OpenStack, or any other cloud for that matter, is multi-tenant. Instead of individual users, you have groups of users residing on your cloud as tenants, and you need different backup behavior for each tenant. One possible way could be to have a different backup solution for each tenant.
But then you lose on management cost and effort, and you are asking your clients to take on that burden. If, as a cloud provider, you want to have backup under a single umbrella, you need to consider that in your backup plan too. The next thing is context awareness. In a cloud you are not just backing up the user workload, like the VMs or the volumes. You also need to back up information like your VM network settings, the configuration files of your cloud services, and audit trails for audit requirements. The context in which your application is running on the cloud is important, and that also needs to be saved. Then distributed architecture. Most current applications are distributed, clustered applications, so taking a backup is not as simple as it is for a single-node application. You have to keep the services running on different nodes in sync. Those kinds of applications will typically be running in the cloud, and you need to handle their backup. And last but not least, cloud-specific resources, as I said: configurations, logs, the cloud DB, and persistent messaging data all need to be backed up too. Given all this, I think no traditional backup is a 100% fit for implementing cloud backup, and we need a solution that goes well beyond it.

So, given these requirements, what kind of challenges do admins face in OpenStack when they have to plan their backup without an automated solution? As I said, there is volume backup, there is VM backup, and there is configuration backup. All these things are scattered. Your admin has to plan a regular backup of each of them in a different manner.
There is no single one-click solution ready for the admin to deploy. So the admin ends up writing scripts to back up logs, configurations, and the DB, which is error-prone. He has to execute backups manually. He cannot schedule them properly because of the lack of... so he is not able to utilize the features of traditional backup. There is no integrated user interface, no one-click backup. And if a failure happens during backup, he has to handle it manually. Looking at all these challenges, you need a solution that helps the admin. And while you are designing your cloud, you also need to consider integrating such a solution into the cloud itself. Ghanshyam will explain later what solutions are available.

Before I hand over to Ghanshyam, I would add some key considerations for the admin while planning backup policies. The standard backup best practices apply when you create your OpenStack backup policies. For example, your tolerance to failure, or how quickly you want to recover, drives your backup frequency. How many backups you keep depends on the kind of storage you have and its capacity. Whether you keep your backups off-site or on-site depends on the kinds of failures you are anticipating. One of the most important things: taking a backup is not enough. You also have to ensure that, when needed, you are able to restore and get your system up and running. So you have to ensure that every backup you have taken was taken correctly. But it's not possible to verify every backup every time.
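As an editorial illustration of the verification point, here is a minimal Python sketch, not taken from the talk, of one way to check that a backup was taken correctly: record a SHA-256 checksum for each file at backup time and compare it again later. The function names `build_manifest` and `verify_backup` are hypothetical, and the sketch assumes the backup is a plain directory of files.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(backup_dir):
    """Map each file's path (relative to backup_dir) to its checksum."""
    root = Path(backup_dir)
    return {
        str(path.relative_to(root)): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def verify_backup(backup_dir, manifest):
    """Return (relative path, problem) pairs for missing or altered files."""
    root = Path(backup_dir)
    problems = []
    for relative_path, expected in manifest.items():
        target = root / relative_path
        if not target.is_file():
            problems.append((relative_path, "missing"))
        elif sha256_of(target) != expected:
            problems.append((relative_path, "checksum mismatch"))
    return problems
```

An empty list from `verify_backup` only says the stored files are unchanged since the manifest was built. It does not prove that a restore works, so a periodic test restore is still needed.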
So you also need to define how often you want to verify and test that your backups were taken correctly. The targets where you store your backups also need to be reliable, so you have to choose your backup target carefully. For example, you may want to use Ceph or Swift, or other kinds of targets that are available. You need to define metrics like RPO and RTO that you are targeting for operating the cloud. I think you should keep all these things in mind before you consider any solution, and then choose the solution carefully. Now I will hand over to Ghanshyam for further explanation of the solutions.

Yeah, thanks for explaining the very important question of why we need cloud backup. Now the question is: what should we back up for an OpenStack cloud? We have VMs, volumes, configuration files, data files, and log files. What should be backed up, and what is important? On the cloud resource side we have configuration and log files. Every service in OpenStack runs against a set of configuration options, which vary its behavior from one cloud to another. Whatever configuration you have set on your cloud, you want to back up. Similarly, you have log files. And then obviously the database: each OpenStack service has its own database with its resources, information, metadata, et cetera. On the user workload side we have volumes and VMs. They can be backed up as a complete workload, or you can back up a VM and all the volumes attached to it. Now, on the configuration, log file, and database side. For configuration backup, the configuration files are by default located under /etc, in directories such as /etc/nova, /etc/neutron, /etc/cinder, and /etc/keystone, or in whatever configuration location you have in your cloud.
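As an editorial illustration, a file-level backup of those configuration directories could be sketched in Python as below. This is a minimal sketch under assumptions, not the speakers' tooling: the directory list mirrors the defaults named in the talk, and the function name `archive_directories`, the archive prefix, and the timestamp format are all hypothetical.

```python
import tarfile
import time
from pathlib import Path

# Default locations named in the talk; adjust to your own deployment.
DEFAULT_CONFIG_DIRS = ["/etc/nova", "/etc/neutron", "/etc/cinder", "/etc/keystone"]


def archive_directories(directories, dest_dir, prefix="openstack-config"):
    """Pack the given directories into one timestamped .tar.gz archive.

    Returns the archive path and the list of directories that were skipped
    because they do not exist on this node.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    archive_path = dest / f"{prefix}-{stamp}.tar.gz"
    skipped = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for directory in directories:
            source = Path(directory)
            if not source.is_dir():
                skipped.append(str(source))
                continue
            # Store each directory under its own name, e.g. "nova/nova.conf".
            tar.add(str(source), arcname=source.name)
    return archive_path, skipped
```

Calling `archive_directories(DEFAULT_CONFIG_DIRS, "/backup/config")` on a controller node would typically need root privileges to read the service configuration files, and `/backup/config` is only an example destination. The same function could be pointed at the log directories.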
From those locations you can take the backup; a file-level backup works for these. Similarly for log files: they are in the configured location, by default under /var/log, such as /var/log/nova, /var/log/neutron, and /var/log/cinder. Wherever you have configured the log file location in your cloud, you can keep taking backups from there. Depending on how big your cloud is and how frequently requests come in, log files may be very big, so it's up to you how many log files you keep and for how long. On DB backup: we have a database for each OpenStack service. Those databases can be on the controller node, or there can be a separate MySQL server that holds the databases for all services. If they are on a single node, you can back up all the databases together, for example with the mysqldump tool. Or if you want to back up a particular service, for example Nova, you can do that. mysqldump can be questioned, though: for speed and reliability it may not be good. So I'm not recommending it as the only way to back up MySQL. You can choose other solutions or your own backup utility for the database. These are the resources we should at least care about in OpenStack backup.

Now let's discuss the essential features we need in a cloud backup solution, because backup is not just copying data from one place to another. There are a lot of features and factors to consider when choosing a backup solution. The very first is the backup type: full backup, differential backup, incremental backup, and file-level backup. A differential backup still carries redundant differences, because it is taken with respect to the previous full backup. Incremental backup is much needed because you don't want to take a full backup every time. You don't have that much storage. Nobody has, I think.
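To make the difference between the backup types concrete, here is a minimal editorial sketch in Python, not something shown in the talk, of how a file-level tool could decide what goes into the next backup. It compares a snapshot of per-file checksums with an earlier one. The function names `snapshot` and `plan_incremental` are hypothetical.

```python
import hashlib
import os


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def snapshot(root):
    """Map each file's path (relative to root) to its current checksum."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            state[os.path.relpath(full_path, root)] = file_digest(full_path)
    return state


def plan_incremental(previous, current):
    """Return (files to copy, files deleted) relative to an earlier snapshot.

    A file is copied when it is new or its checksum differs from before.
    """
    changed = sorted(p for p, d in current.items() if previous.get(p) != d)
    deleted = sorted(p for p in previous if p not in current)
    return changed, deleted
```

Passing the snapshot from the most recent backup of any type gives an incremental plan. Passing the snapshot from the last full backup gives a differential plan, which grows over time because it repeats every change made since that full backup.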
So incremental backup is much needed, and file-level backup too: if you want to back up and restore a particular file, the feature should be there. Next is policy-driven backup. There are a lot of things you want to define in a policy, like your retention policy, your permissions on particular storage, your data format, and your encryption type for a particular tenant or user. The backup solution should honor those policies and automatically back up according to them because, as Abhinav explained, nobody wants to do manual backups. That's why we have backup solutions, and that's why a solution should support policy-driven backup, so that you can define a policy at each level. And obviously automatic backup and scheduling: if you want to schedule a backup overnight, on the weekend, or at a particular time, you should be able to do that.

Then the restore side. One-click restore: if we have backed up 100 or 1,000 VMs, we don't want to restore them one by one. Or, for example, if I have backed up the complete data of a particular tenant, I should be able to restore it completely. So one-click restore should be there, restoring everything you have backed up. Selective restore is also very important. If I have backed up 1,000 VMs under one tenant or as a single workload and one VM gets corrupted, I want to restore that VM only. So I should be able to select a particular VM or resource from a complete workload or tenant backup and restore it. Then, yes, file-level recovery.
As I said, if we want to restore or recover a particular file, we should be able to. If one file gets corrupted, I don't want to restore the complete image, bring up the VM, and then recover the file from there. The solution should be able to map the backed-up image so that I can choose a single file to restore.

Next we have deduplication. That is very, very important because, as I said, nobody has too much storage nowadays. Whatever kind of data we store, we want deduplication, and the same goes for backup and restore. With deduplication we can save a lot of storage and bandwidth. On a similar note we have data compression: if we can save some space with compression, that is also good. Next, data security. That is up to the cloud provider, or depends on the users and the VMs they want to back up: whether they want data security in the form of authorization and authentication, or at least encryption of the data. But there is a trade-off. If you support encryption, it obviously makes backup and restore slower, because the data has to be encrypted and decrypted. So if we have a use case that needs strong data security, the backup solution should provide encryption support. Next we have multi-tenancy. As explained earlier, one of the reasons traditional backup cannot be used is multi-tenancy in OpenStack, which is very nice and much needed. A solution should back up each tenant's resources or workloads while providing isolation between tenants. It should not happen that I back up one tenant, restore it, and end up changing, modifying, or disturbing another tenant's resources. So tenant isolation should be taken care of during backup and restore. Next, non-disruptive.
If I'm using a VM or volume and my cloud provider wants to take a backup, they can, but they should not ask me to stop my services or accept any downtime. So backup should be non-disruptive: it should work without VM or volume downtime and without stopping reads and writes. Next, geolocation. We can back up locally on NFS storage or anywhere, but for a lot of reasons we still want to have at least one backup copy at a remote site or in another geographic location. If the backup solution supports this, that is really nice. Next, scalability. OpenStack, as we know, can be scaled out to any level: 100 nodes, 1,000 nodes, or a million nodes. Initially we don't know how much we need, and the cloud may keep growing and scaling out. So whatever backup solution we choose should be scalable. It should not work for only 1,000 VMs and then start having issues after that. As OpenStack is scalable, the backup solution should be scalable. Then unlimited data transfer. There should not be any limit on data transfer if I have a very heavy VM or a heavy volume of 1 TB or more. A data transfer limit would make my backup and restore very slow. And this next one is very important. Most traditional backup solutions have an agent installed on the particular resource, VM, or directory that you want to back up. In the case of OpenStack, we don't want any agent installed on the VM, because that makes it slow. When we take a VM backup, the image to back up lies under the translation layer on the host. So if I trigger a backup, the request goes to the agent in the VM, which triggers it again down at the translation layer of the host to get the image data, and then it goes all the way back. That's obviously something we don't need and don't want.
So no guest agent should be required by the backup solution.

I would just like to add one more point about the agentless requirement. Performance is definitely one of the reasons, but scalability is also affected if you need to install agents on every VM. If you don't need agents installed on the user workload or user VMs, you can scale your cloud independently of your backup requirements. So it is one of the key requirements to design your backup architecture such that you don't need agents running on each VM.

Yeah, that's really true. Next, the resource requirement of the backup solution. It should not need 50% of your cloud resources to back up your cloud. The less it requires, the better. And last but not least, the user interface. The admin or whoever is doing the backup doesn't want a complex interface, and doesn't want to need a lot of internal technical knowledge of how the solution works. So the simpler the interface of the backup solution, the better. Those are the essential features we look for in a backup solution for OpenStack.

Next, we'll go through what is currently available in the market. First, TrilioVault. TrilioVault is a backup and recovery solution for OpenStack from Trilio Data. It provides a multi-tenant, self-service, policy-based solution. It needs no agent on the guest, it is non-disruptive, and it provides point-in-time backup of data and configuration. On the restore side, you can do one-click restore, selective restore, and file-level restore. It also provides the flexibility of selectively restoring a VM to a different network, a different availability zone, a different region, or a different cloud. There might be a scenario where my cloud has multiple compute nodes in multiple availability zones.
Say I have backed up one VM from a compute node under availability zone XYZ. While restoring, I don't want to restore it on the same compute node, because that compute node has gone bad. By selecting the availability zone, I should be able to restore the backed-up VM on a different compute node in a different availability zone. That is also present here, along with file-level backup and restore support. For backup storage, it supports NFS, Ceph, Swift, and AWS.

I got this architecture from the Trilio team. It's very simple and easy to explain, so I will not go deep into it. I included it here to understand, from a deployment point of view, what we have to install and where, and how the pieces work at a very high level, so that we can judge how much hardware they need, how complex they are, et cetera. In TrilioVault, on the Nova and Horizon side, we have the TrilioVault API and a Horizon plugin. So on the same OpenStack dashboard, or whatever dashboard you have, you get an integrated UI. Then you have the Trilio data mover on the compute node, and you have TrilioVault itself, which you can install on a VM or a physical server. When you trigger a backup or restore from Horizon, the request goes to TrilioVault, which triggers the data mover on the compute node to take the backup of those VMs and store it on the selected storage: NFS, Swift, Ceph, or whatever it is. That's how it works. There is no agent on the VM and no other complexity. We have done a functional POC on Trilio, and it went very well. The performance POC and the DR part are in progress. If you want to know more about Trilio Data, there are a couple of sessions tomorrow or the day after tomorrow.

Next we have Freezer. You might know this one. It is an OpenStack project.
Freezer is a distributed backup, restore, and disaster recovery as a service platform. It is an OpenStack project, and its key components are the Freezer scheduler, the Freezer agent, the Web UI, and the Freezer API. We will explain which components are installed where and what they do. It does file-level backup using point-in-time snapshots, has multiple compression algorithms, and can synchronize backup and restore across multiple nodes. This is the architecture of Freezer; you can get it from GitHub, under OpenStack. As we explained, it has four key components. The Freezer scheduler and the Freezer agent get installed on the node, either a VM or a physical server, from which you want to execute the backup. The Freezer scheduler is a daemon. It retrieves data from the Freezer API in the form of jobs, for which you can define a job template. Inside a job you define the actions: what needs to be done, which VM you want to back up, which policies apply, and so on. The scheduler retrieves that data from the Freezer API and triggers the Freezer agent to execute the actions in the job. The Freezer agent can be executed standalone or by the scheduler, and it provides a flexible way to execute backup, restore, and other actions on a running system. The Freezer API is used to store the metadata, provide it to the Horizon GUI, and talk with the scheduler.

Next we have Converture. This is a backup solution for OpenStack with KVM only. If you have that environment, you can use it. It does automated, policy-based backup, and it has full and incremental backup. Its storage targets include S3 and local storage, and remote storage over SSH. It has a compression option, a retention policy, a scheduling policy, and an encryption option. But as I said, it is only for KVM, so it doesn't support other hypervisors.
Next is RQ, a backup solution from Avnix. It's a backup and restore utility for OpenStack with STN, and it does STN backup and restore and policy-based backup with automation. You can customize the policies, and you can do offsite replication for DR. It does full VM backup and also tenant-level backup, so you can back up the complete tenant data together: networks, VMs, and everything else you have. It does full backup, and it needs no agent. So these are the four solutions we currently have that can do OpenStack backup, one of which is only for KVM. We have this matrix from our POC and investigation. I have already covered these features, so I'm not going to go through each one for every solution; you can refer to it later. This is the continuation of the same matrix.

Now we'll summarize. As we know, backup and recovery is an integral part of an OpenStack cloud. We cannot say, this is the cloud and we don't need backup. No, we need backup in various scenarios. Backing up in the traditional way is not an option; as we explained, traditional backups are way behind what we need for cloud and OpenStack. As for OpenStack backup solutions, we currently have four. I think two or three years ago we didn't have many, so they have emerged over the last few years. Some features are still missing in the existing solutions, or are in progress. One is network failure handling. If I am doing a very heavy backup and a network failure happens at, say, the 90% stage, I have to take the backup again from scratch. That's not good. There should be some network failure handling, or some mechanism to resume the backup once the network is fixed. Trilio has a default retry mechanism, but it is not configurable as of now; I think it will be on the roadmap somewhere. Then deduplication. Many solutions say
deduplication is not needed because they do only incremental backup. As I explained, deduplication is still needed, because incremental backups can also contain a lot of duplicate data, depending on how big a cloud we back up. So those are the missing features, and with that we conclude. There is a mic over there, so if you have any questions, please ask.

So with OpenStack, do they do backup on OpenStack?

Yeah, I don't know about that one, actually. Workloads on OpenStack, that's fine, but what we are talking about is backup of the cloud itself. Other backup solutions do backup onto OpenStack, like onto Swift. Here we are talking about backing up the OpenStack cloud.

As I explained earlier in my slides, backup of the workload is different from backup of the cloud. There is traditional software available, not only Commvault but various others, offering backup for the workload only. In this session we were trying to focus more on the needs for backup of the cloud itself.

And how is it different from traditional backup?
A lot of traditional backup vendors are moving toward doing backup on the cloud, but that is a separate thing. For how we can back up the cloud itself, we have only those four.

Great, thank you.

So, a quick question then. It sounds like you're differentiating the infrastructure from the workloads, the applications that we're running in our environments. What are your thoughts on bringing those two together? Because ultimately, to recover my business, I have to recover the cloud and have a consistent recovery of the workload.

We are not saying that we isolate the workload from the infrastructure. We are talking about a solution that takes care of both together.

Yeah. For example, a VM has a full context: what network settings it has, which security groups were attached, and all its volumes. That is what we mean by a complete workload. Some of the solutions, like Trilio, with which we did the POC, do a complete backup of the complete OpenStack. Say I have VM 1 with security groups 1, 2, and 3, one volume attached, and one bootable volume. I back it up and then restore it, and every setting of every security group and everything else will be the same as before.

And it's capturing all the workload payload data, if you want to think of it that way, both in the ephemeral storage and in the underlying volumes that may be supporting it?

Exactly.

Thank you.

Yeah, because if they do it separately, it's more like traditional backup.

Okay, maybe we can discuss with you later, because it would be nice to get to know more solutions. Thank you.

Yeah, so regarding OpenStack Freezer, I believe there are easy steps to take a backup of a file, a DB, or a VM. In your experience, what steps can we take to use Freezer for backing up the whole cloud, and how do we restore it?

These are the standard options, and there are very essential things, as we discussed: there should be a complete workload backup with the full context of
the VM and volume, so that is the one we should care about. In a job we can define the action with the particular VM we want to back up.

I'm talking about doing a backup of the whole cloud, the infrastructure backup, not the VM workload backup.

So you mean the infrastructure backup. Freezer is a backup-as-a-service solution that gives you backup of any kind of resource, including your databases, your VMs, or your services. So I think it's pretty easy. You just have to combine those APIs once you have listed the resources. I think it's lacking in that part as of now: it doesn't have a single-click whole-cloud backup integrated into Freezer. But I don't think implementing that is a daunting or challenging task.

So is it possible to do that with a policy file?

Yes, I think it should be.

Yeah, please. We are out of time, but go ahead.

So we're using Galera on our control plane, and my expertise is not super deep when it comes to that. It's a MySQL cluster, right? Is the backup process going to be different, and do the products you talked about support a database that's handled with Galera, or are they for a single control plane?

Some of them support a single control plane. With Freezer you can also back up the database separately. But what techniques they use internally and how efficient they are, we don't know much about. At the single control plane level you can do a complete database backup as well.

Thank you.

Sorry, a very quick question: which backup tool would you most recommend, and why?

Yeah, so we discussed all the essential features and the matrix. So far we have done only the functional POC on Trilio and Freezer. Trilio looked good, and Freezer also, but the performance POC and the DR part are still in progress. The features are available in each solution, and it comes down to your SLA and your RPO and RTO. I think there is no single answer.
It's a tricky question, and to answer it in a tricky way, I would say there is no single backup solution I would recommend for all scenarios. You have to evaluate them based on your needs. For example, as we were discussing, with Freezer you have to write something on top of it, while Trilio already has that, so you could probably go for that. But it's not a recommendation from our side as of now. You have to evaluate their features against your needs, match them, and you get the answer.

Thank you.

I have a question. Thank you for your session, it was excellent. You mentioned four different solutions for backup, but does NEC also offer any kind of backup solution?

Yeah, currently no.

Actually, sorry Ghanshyam, let me interrupt. NEC has solutions on the storage side, the target side of it. NEC has a product called HYDRAstor, which can be used as a backup target. Unfortunately it doesn't have control plane software for the backup part as of now. Most likely they are planning to have that. It could come from the partner ecosystem or it could be their own, but it's not yet final. So for the target part, yes, they have one.

Thank you.

For your backups, is there any quiescing that you do, like a flush to the file systems before you do the backups?

Flushing the file system? No.

Yeah, I mean, our system uses a database, and the database uses the NTFS file system for its structure. In VMware we'll call a quiesce on the database before we back it up. When we're doing a VM snapshot, we'll do a quiesce on it. I don't know if any of those four that you mentioned have a quiesce option.

No, not as of now, but that would be a good feature to require.

It would be, yeah. Thank you.

Thank you. OK, thanks everyone for joining the session. Thank you.