OK, good morning, everyone. I'm going to be talking about surviving the worst: a vision for OpenStack disaster recovery. This presentation is relevant for people who have applications that existed before the cloud, that weren't developed specifically to run on an OpenStack cloud. If all of your applications were written to run on OpenStack, then you can probably handle DR at the level of the application, and you would write your application with awareness of the cloud. But I'm assuming that almost everyone who's here has at least one application that existed before the invention of OpenStack. Now, this vision I'm going to describe is based on work that was done with colleagues from within IBM, as well as many very fruitful and useful discussions with colleagues from Red Hat, and I'd like to thank and acknowledge their contributions.

So what I'm going to do in this presentation is start by laying out some basic concepts of disaster recovery, making sure we're all on the same page and have the same basic understanding. If any of you were in the talk that I gave on OpenStack and disaster recovery in Portland, this may be a bit redundant; I apologize, but I'm assuming most of the people here were not at the talk I gave at the summit in Portland. I'm going to describe and make sure we all agree on what disaster recovery is, because disaster recovery is a term that everyone uses in a slightly different way, so I want to lay out exactly what we mean by it. I'll then define some concepts such as recovery point objective and recovery time objective, which are key to understanding disaster recovery. I'll talk a bit about how we replicate data, how we replicate state. One of the key things in order to survive a disaster is that you need to have your data available after that disaster, which means having a copy of that data someplace else, in other words, replicating that data. And then I'll talk a little about consistency. After going through this basic background, I'll go through a workload example, show how that example maps to OpenStack, and talk a little about what would be implied in terms of having a DR solution for that workload running on OpenStack. And then finally, I'll start laying out our vision and talk a little about what we're doing. The objective of this talk is to motivate the requirements for disaster recovery of workloads running on OpenStack, as well as to encourage involvement in ongoing efforts to enable supporting disaster recovery solutions with OpenStack.

So as I said, let's start with a definition of what disaster recovery is. Instead of using my own definition, I'm going to use the Wikipedia definition; we could pick other definitions. According to Wikipedia, disaster recovery is the process, policies, and procedures for recovery of technology infrastructure after a natural or human-induced disaster. Let's look at this definition in more detail and at its component parts. Natural or human-induced disaster: flooding, hurricanes, earthquakes are all examples of natural disasters. One key characteristic of these natural disasters is that they impact a large area. Poisoning, fires, terrorist attacks are also disasters that can impact a large area. The key point here is that these are disasters which can take out a data center. They can take out a city block. They can even take a large metropolitan area essentially off the grid for periods of time.
And we can all think back over the past decade or so of multiple natural disasters which essentially took large chunks of industrialized nations off the grid for days, a week, maybe even more than a week. So surviving a disaster requires geographic dispersion. You cannot survive these types of disasters by having a copy of your data on the other side of the data center. We're assuming, in this context, you lose a data center, you may lose a metropolitan area, and you are going to want dispersion of your data at a distance of 100 kilometers, maybe at a distance of 1,000 or 2,000 kilometers.

Technology infrastructure. What is technology infrastructure? Servers, storage, networking, software, configuration: all of the stuff that's required to run that workload, and all of these elements map down to OpenStack. Servers are managed by Nova, storage by Cinder, networking by Neutron, the software images sit in Glance, and Heat can be used to do the configuration. So there's a good mapping of this technology infrastructure to what OpenStack manages.

And then finally, process, policies, and procedures for recovery; without those, there's no disaster recovery. OK, so these process, policies, and procedures for recovery have three aspects. There's what you do upfront, when there is no disaster and you're on the good path. And hopefully, this is the only thing you're ever going to have to do. Disaster recovery is like an insurance policy. Preparing for it is something you should do. You want to never have to use it. But it's good that it's there if you do need it. So the upfront is where you're going to spend almost all your time, and upfront includes multiple pieces. One, planning: figuring out what workloads need to survive a disaster, how you're going to survive the disaster, what quality-of-service attributes, what SLAs you want in terms of surviving the disaster. The second piece is copying. As I said, if you want to survive the disaster, any resources you need in order to continue working after that disaster need to be available at some geographically dispersed site; there needs to be a copy of them at some other place so that you have access to them. And finally, testing. I'm not going to go into testing in any more detail, but it's like that old adage: backups never fail. Every single backup in the world succeeds. It's only the restores that fail. If you've never tested your DR solution, it's almost guaranteed that when you actually need it, it will not work. So testing is critical. I'm not going into it in this talk, not because it's not critical, but because there's just a huge amount of work to do there, we don't have it all figured out, and it's not something we can go into in a 40-minute talk.

So as I said, copying is a key piece of this, and there are multiple ways we can copy. You can copy continuously: as state is being modified, as new resources are being created, copy them all the time. This can be done synchronously, or it can be done asynchronously, and I'll go into those a little more in a slide or two. Or it can be done periodically. A periodic copy, for instance, and this is a legitimate DR solution, is a daily off-site backup. I back up all my data, all my resources, once a day. I put it on a tape. I put that tape on a truck. And I send that truck over to a salt mine somewhere and save that data in the salt mine. That's a legitimate DR solution. It has different characteristics than doing something continuously.
A periodic offline backup is probably not what you want your bank doing for DR, because if they do that, they're going to lose all the transactions for a day. And if you deposited your salary in the interim, between when they did that off-site backup and the disaster, well, your salary's gone. But there are many workloads where that may be good enough. For instance, in an engineering shop, in many cases, if you lose a day's worth of work, that's acceptable, not desirable. But as you'll see, there's a trade-off here between cost and quality of the solution, which is why you may move to a periodic solution.

Detection. You need to be able to detect that a disaster occurred. In practice, what I've seen in many real-world disaster recovery solutions is that there's a person in the loop in the detection step. It's possible to automate this, but usually, because the cost of handling a disaster is so great, they normally want a person in the loop, even if it's mechanically detected, automatically detected, to say, yes, there's really a disaster, fail over, move my workload over to the recovery site. And then there's recovery. The recovery happens at our secondary data center; we have to first recover the infrastructure, make sure we have servers running, we have our storage available, we have the right network configurations, and then we need to recover the applications on that infrastructure.

OK, recovery point objective, RPO, and recovery time objective, RTO: two key concepts in disaster recovery. Recovery point objective says, how much data am I willing to lose in the event of a disaster? In other words, how far back in time will that disaster take me? Now, as this figure indicates, the less data I want to lose, for instance an RPO of zero, which means I lose no data, typically the more expensive the solution. As I described, an off-site backup, which may have an RPO of 24 hours if I do a daily off-site backup, may be a lot less expensive than something that's making sure that every modification I have, every change of state, is always available off-site at a secondary data center instantaneously. So the lower the RPO, the more expensive the solution, in general.

The flip side of RPO is RTO, recovery time objective: how long does it take me to get back up and running after the disaster? The ideal, the best that can be achieved, obviously, is an RTO of zero. An RTO of zero means that, from a client perspective, you don't actually feel there was a disaster. A disaster occurred, you just keep working. Everything flips over automatically. An RTO of zero is very hard to achieve. It's rarely done. Sometimes it's done with very specialized solutions; things like mainframes with very high-end storage can sometimes automatically flip over. But in general, an RTO of zero is not what people go for. An RTO of zero, we need to understand, means a hot site. Everything is always running at both data centers, and so it ends up being very expensive. You could have much less expensive solutions with higher RTOs, where, for instance, all you're doing is making sure your resources, your state, are available off-site. You have absolutely no servers, no network, nothing available, other than the storage which holds that persistent state. And then after the disaster, you go out and you acquire that infrastructure. You can acquire that infrastructure from a cloud. You can acquire that infrastructure by going and buying hardware and deploying something like OpenStack on it for a private cloud. But you wait until the disaster.
And so there's a range, a spectrum here, of available RTOs and RPOs.

Now, replicating the data: there are two main approaches if we're replicating continuously, synchronous and asynchronous. Synchronous is what you need to do if you want a recovery point objective of zero. The key point with synchronous replication is that the host writes a datum, in this case A, to the primary. That datum is copied to the secondary site as part of the host write. The secondary site acknowledges that it got the data. The data is hardened at the secondary site and at the primary site, and only after the infrastructure knows it has two copies, one local copy and one remote copy, is that write acknowledged back to the host. That enables having an RPO of zero. By contrast, with asynchronous replication, the host writes the data, and that write is acknowledged back to the host. Notice there is no copy of the data at the secondary site. If a disaster occurs now and my primary site is gone, that write of A is lost. It's lost forever. That's an attribute of asynchronous remote replication: asynchronous replication means that there will be data loss in the event of a disaster. The reason people do asynchronous replication is typically either cost, it's a less expensive solution, or that it enables going further distances. You really don't want to do synchronous replication if one data center, for instance, is in Hong Kong, and the other data center is in New York; synchronous replication would just add a huge amount of latency to every single host write. With asynchronous replication, at some point in time in the future, the primary site will send that data to the secondary and get an acknowledgment, and only after that point in time will that write of A be available at the secondary.

Consistency. Here we're talking about power-fail consistency: ensuring that the data at the secondary is what the application could have seen at the primary. It's not necessarily a state that actually did exist at the primary. It's what could have been seen at the primary by the application, in particular in the event that we had simply turned off the power at some instant in time at the primary and turned the power back on. It is something that reflects a state that could have been seen on that set of persistent storage at the primary. In general, when you're looking at storage systems that support replication, such as asynchronous replication, it may require some fix-up that is done by the infrastructure to ensure that the application sees a consistent state. It may not be instantaneously consistent, but from an application point of view, it will only see consistent data. Just a concrete example: if an application is writing A, B, C, D, E, and then at the secondary we see A, B, C, that's consistent. We lost D and E. That's fine. We lost some data, but we're consistent. If, on the other hand, we see E, B, and D, not only have we lost data, we have an inconsistent state. E may have been a withdrawal where A was the deposit that enabled that withdrawal. Essentially, when you have inconsistent data, you have garbage. Now, inconsistent data results when data is not forwarded in the order of host writes. This is particularly an issue when you're dealing with multiple volumes, multiple resources which are getting that data. It's primarily an issue with asynchronous replication; in some corner cases it can actually even occur with synchronous replication, and if people are interested, I can explain that after the talk, during the break.
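To make that ordering point concrete, here is a minimal Python sketch, purely illustrative and not tied to any real storage controller or OpenStack interface, of the difference between a secondary that receives a prefix of the host writes and one that receives them out of order; the write log and the two replay functions are invented for this example.

```python
# Purely illustrative sketch; the names and structures here are invented and do
# not correspond to any real storage-controller or OpenStack API.

# Writes issued by the application at the primary, in host-write order.
primary_write_log = ["A", "B", "C", "D", "E"]

def consistent_secondary(write_log, acknowledged_count):
    """Order-preserving async replication: the secondary always holds a prefix
    of the host-write order (e.g. A, B, C). Data may be lost (D, E), but the
    state is one the application could have seen at the primary."""
    return write_log[:acknowledged_count]

def inconsistent_secondary(write_log, arrived):
    """Replication that does not preserve host-write order: the secondary may
    hold E, B, D without A and C -- a state that never existed and never could
    have existed at the primary."""
    return [w for w in write_log if w in arrived]

print(consistent_secondary(primary_write_log, 3))                  # ['A', 'B', 'C']
print(inconsistent_secondary(primary_write_log, {"E", "B", "D"}))  # ['B', 'D', 'E'] -- A and C are missing
```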
And as I said, inconsistent data is essentially garbage.

OK, so let's move on to an example, and as an example, I want to consider a three-tier application. This is a standard three-tier application. We have a client, and for the sake of this description, the client is outside of the cloud. We have a set of app servers running on hosts, and let's assume these app servers are managed by Nova; a set of data servers, also managed by Nova, running on hosts; connected by a network which may have been configured by Neutron. These app servers, and in particular the data servers, are speaking with persistent storage managed by Cinder. We have the images for these app servers and these data servers coming from an image repository, Glance. And we're checking the security of everything with an identity service, in this case Keystone.

OK, so what would it take to ensure that this workload could survive a disaster? Let's look at the component pieces. If we look at the images, these are the images for the application and data servers obtained from Glance; we need to ensure that the image content, the actual software, is available at the recovery site, and we need to ensure compatible image metadata. The image metadata does not need to be identical. It needs to be compatible. And intentionally, I'm using the vague term compatibility here, as opposed to the precise term consistency, because it really depends on the specific aspects of the metadata. If we look at the app servers and the data servers, well, this involves the virtual machines and the attributes of those virtual machines. It involves the security aspects, and it involves the network which enables everyone here to communicate. And here, again, we need compatible metadata for the security, the network, and the VMs. This metadata needs to be compatible; for instance, I can use different flavors of VMs at the primary site and the recovery site. At the primary site, I may invest a lot more and want to be getting top-level performance. I may say, OK, if a disaster occurred, I'm willing to give up a little bit of performance during my recovery and have a different flavor of VM at my recovery site, which maybe doesn't give me quite as much performance. But what is important is that the metadata needs to be consistent with the application's persistent data. So for instance, if at the primary site I've attached three volumes to my VM, and my application is writing to three different persistent volumes, its persistent state exists on those three volumes. If at the secondary site I only know about two of them, and I try to bring up the application on the replicated copy of my data, I'm going to have garbage, because only two-thirds of my data is visible to the application.

Then the persistent storage, the data managed by Cinder. The persistent storage is the persistent state modified by the application. It needs to be replicated in a consistent way to the secondary, in a way that's also consistent with the desired RPO and RTO. And the configuration information needs to be replicated. For instance, we want to ensure the volumes are the same size at the two different sites, so we don't end up with different views of what the size of the volume is at the primary and the secondary. But the information doesn't need to be completely identical; for instance, we could use different volume types to implement the volumes at the primary and the secondary.
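To illustrate the difference between metadata that must be consistent and metadata that only needs to be compatible, here is a small, hedged Python sketch; the dictionaries stand in for information that would really come from Nova and Cinder at the two sites, and all of the names in it are invented for this example.

```python
# Hedged illustration only: plain dictionaries stand in for the metadata that
# would really come from Nova and Cinder at the primary and recovery sites.

primary_vm = {
    "name": "data-server-1",
    "flavor": "top-performance",            # flavor only needs a compatible counterpart
    "volumes": {"vol-a": {"size_gb": 100, "type": "gold"},
                "vol-b": {"size_gb": 200, "type": "gold"},
                "vol-c": {"size_gb": 50,  "type": "gold"}},
}

recovery_vm = {
    "name": "data-server-1",
    "flavor": "recovery-medium",            # different flavor: compatible, acceptable
    "volumes": {"vol-a": {"size_gb": 100, "type": "silver"},
                "vol-b": {"size_gb": 200, "type": "silver"}},   # vol-c is missing!
}

def check_recovery_metadata(primary, recovery):
    problems = []
    for name, vol in primary["volumes"].items():
        replica = recovery["volumes"].get(name)
        if replica is None:
            # A missing volume means the application would come up with only
            # part of its persistent state -- garbage, not a lower QoS.
            problems.append("missing replica for %s" % name)
        elif replica["size_gb"] != vol["size_gb"]:
            problems.append("size mismatch for %s" % name)
        # Differing volume types (gold vs silver) are deliberately not flagged:
        # that is "compatible" metadata, not something that must be identical.
    return problems

print(check_recovery_metadata(primary_vm, recovery_vm))
# ['missing replica for vol-c'] -> this workload could not be recovered correctly
```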
Again, there's the question of what quality of service I want for my application if I'm recovering it from a disaster; I may be willing to accept a lower quality of service. And while not shown on the picture of the application, Heat also plays a role here. Heat, first off, could have been the means of actually deploying the application at the primary site. It is a means of extracting configuration information and getting this information so that we can deploy the application at the secondary site. One thing to note is that the templates for the primary and the recovery site may be different. For instance, the number of instances of an app server that I'm running at the recovery site may be different than the number of instances at the primary site.

OK. So I've gone through some basic DR concepts, I've gone through a workload example, and I now want to go through our vision for OpenStack disaster recovery. I'm going to start by giving a high-level view of some of the basic tenets and an overview of what we're looking at. I'm then going to talk about state, and there are three specific aspects of state I want to talk about: images, the software, the blobs, the things that get executed; data, the persistent state that's modified by that executing software, by those applications; and metadata, all of the information that's used to describe the VMs, describe the network configuration, describe the storage, and so on. And then finally, I'll talk about how you automate a process to do the DR. Now, this is called a vision. That means it's not all there, nor is it all going to be in Icehouse. But we are starting some work, and at the very end, I'll give some pointers to some other sessions that are going to be occurring, talking about some of the steps towards getting this into Icehouse. Also, as I said at the beginning of this talk, part of the objective here is to make sure we have a common understanding of what's needed. But part of the objective of this talk is also to motivate people to get involved in helping ensure that OpenStack provides a good DR solution.

So what are some basic tenets of our vision? First off, the disaster recovery here is between two independent clouds, between a primary cloud and a target cloud. At some points in time, we've even talked about maybe one of these clouds not even being an OpenStack cloud. Our focus right now is between two OpenStack clouds, but in principle, it could be between any two clouds. These clouds are independent of one another, and we really mean independent. You can't have a single Keystone for those two clouds, because if you have a single Keystone and you lose that single instance of Keystone, you've lost everything. The one caveat here is perhaps a global Swift cluster. If you were at the keynote on Tuesday, where the usage of this was described, a global Swift cluster is built in such a way that your data is distributed across multiple data centers, so even if you lose an entire region, an entire data center, you still have access to all your data. Nothing gets lost. So the global Swift cluster is designed and built in such a way that it is disaster tolerant. We have the primary and target clouds interacting through what we're calling a mediator. This is, I would admit, at this point in time still not a completely well-defined concept, but we want to have some piece of middleware, some piece of software that sits between the two clouds, to ensure a good decoupling.
It's very important for us to enable hybrid deployments. That means, for instance, I could have an on-premises private cloud on which I'm running OpenStack, and I want to use, for disaster recovery purposes, a public cloud provided by some service provider, because maybe I don't want to own two data centers, or maybe I don't want to have all the physical resources available for recovery from the disaster. But everything we're talking about should also work between two private clouds or between two public clouds. We want to protect a set of VMs and their related resources, not an entire cloud. Not all workloads are equally important. There are a lot of workloads where I may not want to pay the price for ensuring disaster recovery, or where I may want to do a fairly inexpensive solution for disaster recovery, for instance the off-site backups. And we need to allow flexibility in terms of both RPO and RTO.

OK. Pictorially, this is our vision. We have a primary cloud, with storage, VMs, networking, security, and images, and a secondary cloud, which is our target for recovery. At the bottom, we see storage replication. One of the things that's being worked on towards Icehouse is enabling continuous volume replication, essentially setting up and being able to leverage facilities that exist in most storage implementations to replicate data from one cloud to another cloud, or from one storage instance to another storage instance. What we see is that in most storage controller products, be they vertically integrated controllers or software-based solutions that are deployed on servers, there is support for volume replication; there is support for replicating data from one data center to another. At the top, we see the workload description. That workload description includes, conceptually, the metadata and the images. It's extracted from the primary cloud. As I said, we're going through some kind of mediator; in this case, it's called the DR middleware. It is sent over to the secondary cloud and applied at the secondary cloud. And this is work we're going to start on in Icehouse, but I think we're fairly convinced it's not something we're going to finish in Icehouse; this is going to be an ongoing process.

If we look at state, as I said, there are three aspects to state: images, data, and metadata. The images in OpenStack are maintained in Glance. There are two aspects to this. There's the registry, which is metadata, and I'm going to talk about that later. And then there's the actual backend store: the blobs, the software, the actual executables of the images. This backend store could be, for instance, Swift, and I'm going to focus specifically on the Swift global cluster as an example. It could be storage managed by Cinder, in which case we could let Cinder replicate the storage used by Glance in the same way we're working to get it to replicate the storage used by the application. Or it could even be a manual replication: I go and I create an image in my primary cloud, and I, by process, have to create that same image in my secondary cloud. Now, manual things are not great, I wouldn't recommend it, but it is a possibility. So if we consider using the Swift global store, the green box represents the image. Here we have a Swift global store with one of the regions in the Americas and the other region in Africa. And this works very nicely with Swift. I go ahead, I create that image in Swift.
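As a rough sketch of that step, and assuming placeholder credentials, endpoint, and container and object names, uploading the image blob with python-swiftclient might look something like the following; in a real deployment Glance's Swift backend would typically do this for you, the point is simply that the object lands in the global cluster.

```python
# Rough sketch only: endpoint, credentials, and container/object names are
# placeholders. In practice Glance's Swift store would usually perform this
# upload on our behalf.
import swiftclient.client

conn = swiftclient.client.Connection(
    authurl="https://keystone.example.com:5000/v2.0",   # placeholder auth endpoint
    user="demo", key="secret", tenant_name="demo",
    auth_version="2.0")

conn.put_container("images")
with open("my-app-server.qcow2", "rb") as image_blob:
    # One logical object; the global Swift cluster places replicas of it in
    # multiple regions (e.g. Americas and Africa) for us.
    conn.put_object("images", "a3b5-app-server", contents=image_blob)
```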
And Swift takes care of ensuring that I have a replica of that image stored in multiple regions. In this case, I have one replica that's stored in the Americas and one replica stored somewhere in Kenya. From a Swift user's perspective, this is a single object, but Swift is protecting the data such that if I lose one of my data centers, for instance the Americas data center, I still have access to the object, to a replica of that object, in the African data center.

Data. We can look at high-RPO solutions. For instance, one approach for a high-RPO solution would be to do a backup to Swift, either a remote Swift cluster or a global Swift cluster, a daily backup. If I'm doing backup as my DR solution for ensuring the state, maybe I can do it a couple of times a day, maybe once a day. It's not something I can do very frequently, so it's going to be a fairly high RPO. I'm focusing here more on the low-RPO solutions, which involve some sort of storage replication. This could be host-based, in which case it's going to be managed by Nova. We've been focusing more on storage-based replication, which is managed by Cinder. And I'm going to go through, at a high level, what's going to be discussed in much more detail in a design summit session later today: how we might go and do storage-level replication. So a user comes and requests a volume from Cinder, and she says, I want this volume replicated to Europe. I'm getting a volume, I'm coming into my local data center, and I want to make sure I have a copy of my volume in Europe. She will also specify various aspects of what type of RPO she desires, which helps pick whether it's a synchronous copy or an asynchronous copy; given that it's to Europe, this is almost certainly an asynchronous copy. The scheduler in Cinder will then locate a driver that supports the appropriate pairing. In other words, it's going to find a driver in the local data center that knows how to replicate that data to Europe with the appropriate SLA, with the appropriate RPO. The driver will create a local copy and perform any initialization it needs to do to set up that pairing. Afterwards, there'll be a request to come in and actually create the other side of that pair. Cinder will send it through this replication gateway; again, we're having some piece of intermediary, a mediator between the two clouds, which will actually ask the remote cloud to create the volume. We will then set up the pairing and ensure that there's a pairing between the primary copy and the remote copy. And then after this, anything that gets written to that primary copy will be replicated to that remote copy according to the policies that were requested when the pairing was set up. As I said, this is work that's going on for Icehouse; I'll show a small sketch of what such a request might look like below.

Metadata. On the side of the chart, you see lots of examples, and these are only examples, of OpenStack metadata. Some of this metadata, as I said, needs to be consistently copied to the other side. Some of it needs to just be compatible. So for instance, as I mentioned, you can have different flavors of VMs, and you can have different volume types at the two sites. This metadata needs to be replicated in a compatible way so that I can bring up my infrastructure, so that I can run my application at the recovery site. Metadata could be replicated periodically, or it could be replicated continuously. In general, whether you do periodic or continuous could be somewhat independent of the choice you made for the data.
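Before going further into metadata, here is the sketch I mentioned: a hedged, hypothetical view of how the volume replication request described above might be expressed with python-cinderclient. Cinder does not expose a replication API today, so the volume type name and the idea that its extra specs carry the replication target and RPO are assumptions for illustration only; the client calls themselves are ordinary volume creation.

```python
# Hypothetical sketch only. We assume the cloud operator has defined a volume
# type whose extra specs tell the scheduler to pick a backend that can
# replicate asynchronously to Europe; that modeling is an assumption, not
# existing Cinder behavior. Credentials and endpoint are placeholders.
import cinderclient.client

cinder = cinderclient.client.Client(
    "1", "demo", "secret", "demo-project",
    "https://keystone.example.com:5000/v2.0")

# The operator-defined type name is invented for this example; its extra specs
# might, say, encode: replicate=async, target-region=eu-west, rpo-minutes=15.
volume = cinder.volumes.create(
    size=100,
    display_name="db-data-1",
    volume_type="replicated-async-to-europe")

# Conceptually, the Cinder scheduler would then locate a driver supporting that
# pairing, create the local copy, and (through the replication gateway) ask the
# remote cloud to create the other side of the pair.
print(volume.id, volume.status)
```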
Now, back to metadata: how you replicate the metadata is not necessarily the same way you replicate the data. You do need to ensure consistency. For example, as in the example I gave, if I have a VM with three volumes attached at the primary site, then when I bring up that same workload, that same application, I had better have all three volumes attached at the secondary site. We need to transfer the metadata and apply it at the remote site. There may be fix-up required. And in general, some of the commands are going to be applied at the remote site at once, and others are going to be applied at the remote site only on recovery. So for instance, if I'm transferring metadata about an image, I probably want to apply that at once. If I'm transferring information about a user, information that's going into Keystone, I probably want to apply that at once. Initializing a VM, however, is something I only want to do at recovery time. So we need to be selective here about when this metadata gets applied. The other thing that I want to point out is that copying the raw data from the controller databases, basically going underneath OpenStack and just copying the bits out of the databases, is probably not going to work, for a couple of reasons. One, it's not selective. As I said, we don't want to have disaster recovery for a cloud; we want to have disaster recovery for a workload, and a cloud runs lots of workloads. The other is that, basically, it would entail having the same configuration, the same hardware, at both data centers, at both sites, and again, that's not a reasonable expectation.

Finally, automation. When we look at automation, we need to identify what to protect and set it up, in particular looking at aspects of the state. And we need to test; as I said, I'm not going to go into testing. Basically, what you have above that line here is all the good-path stuff. At the beginning, I said we have, roughly, good path, detect, recover. What's above the line is the good-path stuff. The line, you can think of as detection. And below the line, we have the recovery, the failover. If we look at the automation in a little more detail, let me give a couple of examples. How do we automate the example relative to Glance, ensuring that we have the image state available such that we can recover? Well, let's start by creating the image in the Swift global cluster, which I already showed. Here we have an image, A3B5. It gets created in the Swift global cluster. We then come to Glance and define the image in the primary Glance to point at that image, which is sitting in Swift. We extract the metadata from the primary Glance, transfer it to the secondary, replicate it at the secondary, and then have the secondary Glance pointing to the image in Swift. Now, since this is a single Swift cluster, they're actually pointing to the same object; there's only one object in Swift. But if we're using a geographically aware DNS, they are likely, on the good path, to be getting different replicas when they actually request the object. And in the event of a disaster, Swift will just work things out and ensure that the secondary Glance will get the image that's stored on physical hardware, physical resources, that are sitting in the secondary data center.

OK, second example, Nova. Here, let's consider provisioning a VM at the primary, possibly with Heat. At the start, before we've provisioned any VMs, the primary Nova knows about the flavors foo, bar, and baz. The secondary Nova only knows about the flavor baz.
Let's assume we come in and we want to provision two VMs: VM1, which is flavor foo, and VM2, which is flavor baz. We've created those VMs at the primary site. We need to extract the metadata and the dependencies, for instance the flavors, from the primary and replicate them to the secondary. So here we have the VMs; we know we have a VM1, which is flavor foo, and a VM2, which is flavor baz, with dependencies on these flavors. We create the dependencies and a Heat template at the secondary. So we've added the flavor foo to the secondary. We also now have a template sitting at the secondary, which we will use in the event of a disaster to actually deploy those VMs; that template knows how to deploy them. We do not apply that template at this point in time. We don't want the VMs running at the secondary. We only want that in the event of the disaster. If we have a disaster, at that point in time, we'll use the template to deploy those VMs on the secondary.

So I've gone over some basic DR concepts. I've given an example of a workload and tried to motivate what would be needed from OpenStack to ensure disaster recovery for this three-tier workload. And I've gone over our vision. I hope I've motivated the requirements for disaster recovery. Even more than that, since we're only starting to scratch the surface, I hope I've encouraged at least some of you to get involved in this effort. We have a wiki that we've been working on with Red Hat. There is a Cinder design summit session on continuous volume replication today at 2:40. And there's going to be an unconference session to go into more technical details tomorrow at 9:50. I think I have time for a few questions. So before finishing, are there any questions? Let me take one question at a time.

So the first question was, what do you do about the CAPEX, the resources that you need to have at a secondary data center to enable the recovery? And this is a very valid concern, right? Now, there are certain industries, for instance banks, where the cost of being down is so high that they're willing to pay for essentially a hot standby, a data center where all the equipment is sitting there running. One of the big benefits of doing DR with OpenStack and with the cloud is that I could, for instance, do my recovery from a private cloud to a public cloud and acquire the resources on demand when I have the disaster. Obviously, I need actual physical resources allocated for holding my persistent state, but the CPUs, the networking, the servers, all of that can be acquired on demand. What was your second question? So the question here was about the network issues. And I think there are issues there; we haven't addressed all of them. I think using Neutron to help manage the networking should help address this, but I would not claim that we have everything addressed yet. I could talk about how it's addressed in traditional approaches, but traditional approaches tend to have a lot of manual aspects involved. So I did mention this indirectly in a couple of ways. I talked about the flavors of the VMs and making sure that they actually are available. I also mentioned, without going into it in any detail, that there's this big planning step. Part of the planning step is ensuring that you're actually going to be able to acquire the resources, or have the resources available, to do the recovery. So the question is, what are the top three infrastructure requirements from OpenStack for supporting disaster recovery? I think storage replication is going to be one.
But beyond that, I don't think I can answer that question, because it will be workload dependent. It depends on what the workload is that you're running. There is no stock answer. Even storage replication: if you're running a stateless workload, or a mostly stateless workload, then storage replication may not be a big issue. If all of your state can fit into a Swift global cluster and it's natural to get the data out of Swift, then storage replication becomes a lot easier. If you're running a database, then it's a lot harder. So the question was, how do you take care of the VM, of the path between the VM and the data, if you're replicating the data? Well, the assumption here is that I've lost my entire data center. Now, if I've lost my entire data center, that means I've lost the VM. I have to bring up a new VM at the secondary data center, which is speaking to the secondary copy of the data. That all needs to be configured; that's part of configuring the metadata at the secondary site. And that's part of why you can't just blindly transfer the metadata from underneath, from the controller databases, but why there's some fix-up required. I think we're out of time. I'd be happy to continue discussing this. Some of my colleagues here, both from Red Hat and from IBM, would also be happy to continue discussing this. Thank you, everyone, for your time and attention.