So, this is the agenda that I will go through in this session. First, we will look at what a disaster actually is in the cloud. Then we will take a look at how a disaster can affect DNS, and how disaster recovery comes to the rescue. Then we will look at the crucial disaster recovery plans we should make, go through a brief overview of Designate, and see how we can deploy multi-region Designate and use it for disaster recovery. Then we will have a small demo, and finally we will discuss the challenges. So, let's start. What is a disaster? Anything that has a negative impact on the company's business continuity or finances can be termed a disaster. It's an event that can bring down your data centers: tornadoes, earthquakes, fires, sometimes hardware failures, and there are human-induced errors as well. When disaster strikes your data center, sometimes you lose data, and data is the lifeline of an enterprise. You don't want to lose your data, because that impacts your business continuity and also the company's revenue. And in terms of DNS, how can a disaster affect DNS? Most applications are resolved through DNS, so when the DNS goes down, your site becomes inaccessible, because no DNS server is there to resolve the domain names. A failed component of a DNS server may cause mail delivery to be delayed anywhere from a few hours to a few days. It can literally cripple global IT operations: you may lose customer goodwill, your reputation suffers, and nobody will be able to reach your website while your DNS server is down, even if your application itself is up and running.
So, I would like to tell you about a real incident of a disaster affecting DNS. Back in 2001, Microsoft's websites and its associated companies' websites, like Amazon.com, Expedia.com, and Hotmail.com, had their four DNS servers on the same network segment. What happened at that point of time? A technician changed the routing table in a way that no traffic from outside was able to reach those Microsoft DNS servers. So the Microsoft family of websites was down for 22 hours. Later, it was blamed on poor network design, the fact that all four DNS servers were on the same network, and it was fixed. But because of those 22 hours of downtime, with all those websites down, Microsoft incurred a heavy loss, and the advertising and retail websites that depended on them also lost a huge amount of revenue. So, how can we prepare ourselves to recover from this type of disaster? We have a real crisis ahead if our DNS goes down. This is where disaster recovery comes to the rescue. What is disaster recovery? It is the process, policies, and procedures for the recovery of technologies and infrastructure after a natural or human-induced disaster: a process of ensuring continuity of a set of workloads, following, or in advance of, a large-scale disaster that disrupts the current environment or infrastructure. Disaster recovery is basically what makes your IT environment run again after a disaster that brings down your data centers. By large-scale disaster, as we already discussed, we mean something that can lead to a complete loss of data centers, such as floods, tornadoes, hurricanes, or fires. And when we talk about disaster recovery, what we actually mean is that the recovery site should be at a geographically distinct location.
So that at the time of disaster we can quickly recover by switching to our disaster recovery site. This pie chart depicts the different types of failures caused by disasters. Now, when we think about disaster recovery, the first thing everybody thinks of is data backup. Is data backup the only solution? No, because a backup is empty if you have no recovery solution: you can back up the data and metadata to a recovery site, but if you don't have any recovery plan, then the backup is of no use. Disaster recovery is more than simply backing up the data. When we plan our disaster recovery, the plan begins with a business impact analysis, and at that point we think of two key metrics. One is the recovery time objective (RTO) and the other is the recovery point objective (RPO). What is the recovery time objective? It is the maximum acceptable amount of time that your data center can be offline. Say your recovery time objective is four hours and a disaster strikes at 7 a.m.; then by 11 a.m. the workloads that were running should be recovered. The second one is the recovery point objective: the maximum acceptable window of time in which your data may be lost because of a disaster. The recovery point objective is measured in terms of quantity of time; note that it says nothing about the quality of the data that will be lost. You can see from this graph that if you want your recovery time objective and recovery point objective to be very low, you will have to incur cost. In banking applications this is really required, because you have thousands of transactions going through in seconds, so you really want your disaster recovery plan to have a very low RTO and RPO.
Practically, for most applications, wanting your RTO or RPO to be in milliseconds is normally not feasible. These targets are all defined in SLAs. Another thing that is very crucial while planning disaster recovery is ensuring the consistency of the replicated data and metadata. First is data consistency, which is point-in-time consistency: if we are backing up the data from our primary site to the recovery site, that data should be consistent, and the changes made after the backup should also be reflected at the disaster recovery site. Next is metadata consistency. Metadata is all the configuration that lives in the different components, and metadata consistency means that configuration updates are applied in the same order relative to one another and to the data updates. Okay, next, since we are going to plan disaster recovery in Designate, let's go through a Designate overview. Designate is a multi-tenant DNS service for OpenStack. It provides DNS as a service in your OpenStack cloud, exposing a REST API through which you can easily manage your zones and record sets. The REST API is integrated with Keystone, which provides authentication, and Designate can be integrated with Nova and Neutron for auto-generation of records. Let's have a brief overview of the Designate architecture. It has an API component, just like other OpenStack components such as Nova and Neutron, which accepts HTTP requests, validates authentication tokens, and then passes the request on to Designate Central for further processing. Central is the component where all the business logic resides: it talks to the database, and it also talks to the workers and the producer.
Here, the producer generates periodic tasks and sends them to the worker for further processing. The worker is the component that actually performs most of the work: it handles the creation, deletion, and update of zones by writing zone files across the backends and updating them accordingly. Backends are pluggable DNS drivers; in Designate we have support for various DNS servers like BIND, PowerDNS, DynECT, and others. MiniDNS is a minimal DNS server written in Python; basically, it propagates zone information to the customer-facing DNS servers. Now, the multi-region Designate deployment. Here we will have two sites: the primary site, which is active, and the DR site, which is in standby mode. In disaster recovery, when we talk about workloads, we back up two things: data and metadata. For storage and database replication there are multiple tools through which we can back up the Designate database; here I will focus only on Designate disaster recovery, so I am only discussing what we need to back up in Designate so that we can perform recovery at a later stage. We can take a database backup, and we can have DR middleware that copies the metadata from the primary site to the DR site and keeps it in sync, so that later we can recover it. So, this is the multi-region Designate deployment: in region one we have the active Designate deployment, and in region two we have Designate in standby mode. What we are actually replicating here is the database to our disaster recovery site. As for what I used for this: because I did not have very heavy workloads, I tested it only for light workloads.
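As a quick illustration of managing zones and record sets through that REST API, here is a rough sketch using the OpenStack CLI with the Designate client plugin. The zone name, email, and IP addresses are placeholders, and this assumes a cloud where the standard `openstack zone` and `openstack recordset` commands are available:

```shell
# Create a zone in the active (primary) region; example.com. is a placeholder.
openstack zone create --email admin@example.com example.com.

# Add an A record set to that zone (the IP address is illustrative).
openstack recordset create example.com. www \
    --type A --record 203.0.113.10

# Verify the zone and its record sets.
openstack zone list
openstack recordset list example.com.
```

These are the same objects (zones, record sets) that the backup in the following sections needs to preserve, since they all live in the Designate database.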
Accordingly, I used mysqldump, which comes with the MySQL server. Through mysqldump you can take a backup of your database; it generates a SQL script through which you can recreate the database at your disaster recovery site, just by copying it there. For the replication of the other configuration, like the pool configuration, I used rsync. Rsync is a very efficient tool that copies files from one location to a distant location; it is efficient because it uses a delta-transfer algorithm and performs only incremental transfers, thus reducing the bandwidth used over the network. So, we will have a short demo. What I have done: I set it up with two VMs, one in India and another in Japan. I integrated my Designate component with Nova, and you will see in the demo that when I create an instance, the records are automatically generated. At that point we can take a backup of my Designate database, and also a backup of the other Designate configuration, at the recovery site in Japan. After that, whenever disaster strikes my primary location in India, I can recover from my OpenStack deployment in Japan. So, let's do the short demo. Okay. Basically, region one is my primary site, the active Designate deployment, and the second site is my recovery site, in Japan. I created one zone with the name example.com. It's created, and I will perform the necessary configuration for automatic record creation in case any instance is created. So, I restarted the designate-api and designate-sink services.
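The backup side described above can be sketched roughly as follows. The database name, user, hosts, and paths are assumptions for illustration; only mysqldump's standard options and plain rsync over SSH are used:

```shell
# Dump the designate database to a SQL script that can recreate it later.
# --single-transaction takes a consistent snapshot for InnoDB tables.
mysqldump --single-transaction -u designate -p designate \
    > /backup/designate.sql

# Replicate the dump and the Designate configuration to the DR site.
# rsync transfers only the deltas, keeping bandwidth usage low.
rsync -avz /backup/designate.sql dr-site.example.com:/backup/
rsync -avz /etc/designate/ dr-site.example.com:/backup/designate-conf/
```

In the demo these steps run manually, but the same rsync commands can later be scheduled from cron to keep the two sites in sync.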
After this, when I create an instance, the designate-sink component automatically listens for the event notification from Nova, and it triggers the creation of the records according to the configuration I put into Designate. So, I created an instance with the name test-vm, and we can see that the records are automatically created in the configured zone. I configured these three records: two IPv4 records and one IPv6 record. Now what I will do is replicate all my Designate configuration to the remote site. I can also set up rsync as a cron job to schedule this activity, so that my primary site and my disaster recovery site stay in sync. When we perform disaster recovery we also have to replicate all our zone files, and you can see that I replicated all the DNS server configuration and all the zone files there. I can also take the database dump, which generates a script that I back up at my disaster recovery site; there I can recover the database and its data. At the time of backup, mysqldump also records the logs, due to which we can mitigate the inconsistency that may happen if the data changes during the backup. Now we switch to the disaster recovery site; actually, I'm just changing the IP address here. We have the same pool configuration that was on the primary site. I used bind9 as my DNS server backend. These are the scripts that I backed up, and now I will execute them so that the database is recreated at this recovery site and the backed-up data is populated into the database. Then I restarted all the Designate services at my recovery site. Now, all the services have started.
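The recovery side at the DR site can be sketched as below. The paths and credentials match the assumptions of the earlier backup sketch, and I am assuming the Designate services run under systemd with their usual names:

```shell
# At the DR site: recreate the designate database from the replicated dump.
mysql -u designate -p designate < /backup/designate.sql

# Put the replicated pool and backend configuration in place.
cp -r /backup/designate-conf/* /etc/designate/

# Restart the Designate services so they pick up the recovered state.
for svc in designate-api designate-central designate-producer \
           designate-worker designate-mdns designate-sink; do
    systemctl restart "$svc"
done

# Verify that the zones from the primary site are now listed here.
openstack zone list
```

This mirrors the manual demo steps: restore the database, restore the configuration, restart the services, and then confirm the zones and record sets are visible at the recovery site.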
Now we see that I am able to list, at my recovery site, the zones that were on the primary site. There was an authentication problem, so Keystone was restarted. Now we have recovered the data that we backed up from our primary site, and we can also see all the record sets of those zones. In this way, we can perform disaster recovery. Because I did it manually, we can automate this with the help of some Python scripts, or we can automate the workloads with some Heat templates. Now let's look at the challenges that we face during disaster recovery. Basically, if we are managing multiple DNS servers with Designate, there is no DR plan in place for the DNS servers themselves: you have to back up all the zone files and configuration of every DNS server, and you have to have the same deployment at your disaster recovery site. Also, there is currently no favorable procedure in Designate by which, if disaster strikes my primary site, I can quickly and automatically switch over to the recovery site. The next challenge is that the current plan may be incorrect or unreliable. I was using mysqldump to back up my database, which is fine for small workloads, say around 5 GB or so; for heavy workloads, mysqldump is of no use, as it may consume high bandwidth and introduce high latency. Another challenge I faced is that a plan can include unnecessary technologies: there are other options, like RBD mirroring and other backup tools, that we can use instead of mysqldump or rsync for other workloads. And sometimes the plan has simply not been effectively tested.
For example, when we plan disaster recovery we sometimes consider only light workloads, because we don't have heavy ones; so I was not actually able to test my disaster recovery plan against heavy workloads. These are the challenges that I faced with my disaster recovery plan. Any questions? Yeah, I'm also replicating Keystone, but I didn't focus on it in my presentation, because disaster recovery is a really big topic, right? Yes, you need to recover Keystone at the recovery site as well. You mean to say that I should have distributed my workloads, right? That is, if I'm creating zones, they should be pushed to both regions and kept in sync, and if I create any new records, those should also be available in the other region? Yeah, we can do that with the help of the pool configuration, but sometimes it's hard. You say it's propagating with the help of the pools: we can push that zone information to our targets, but it makes the configuration really hard, just because of the long distance; it may take time.