Hello and welcome to the OpenStack Summit and to this session about data protection in OpenStack, where I will show you that data protection inside OpenStack requires more than just being able to protect the data of a virtual machine. Just getting the data out is easy, but having a full OpenStack data protection solution that can fulfill all needs is not that easy. My name is Robert Rofa and I'm the product manager at Trilio Data, a company providing a data protection solution for OpenStack.
Alright, let's start with the first and most important question: why would you actually need data protection inside OpenStack? It is always said that in the cloud we have sheep and cattle, not pets like in the legacy era. But let's take the metaphor of sheep and look at it a bit further. Yes, there are many sheep in the flock, and when one sheep gets lost, it can be replaced without affecting the rest of the flock. But each sheep still has some value, and there are those fat sheep that generate a lot of wool. And that is what we want, because the reason we are running all those applications is the wool that we can then sell to make money. So every sheep is adding value to the whole environment.
But even when we follow the original metaphor of sheep and pets, meaning that everything inside a cloud is stateless, is that truly the case? Let's take a look at the classical 3-tier architecture. We have the front-end server, clearly stateless: it just accepts requests and sends responses. We have the application server, which takes the requests from the front-end server, works on them, and sends back the information that is then presented by the front-end server. Those jobs can also be called stateless. But then we have the third tier, the database server or file server. And that is where stateful data comes in. So the question is: where is this stateful data located in your environment?
Do you have the stateful data outside of the OpenStack environment? Then you have to deal with the networking and the security, and you need an additional environment just for this database server. So you will run into many additional questions. And it will most likely happen that one of your tenants puts a database server inside your cloud. With the database server inside the cloud, you have stateful data inside the cloud. And the moment you have stateful data inside the cloud, you require data protection.
The next thing many people say is: well, everything is a microservice, so I only ever have a single microservice that I need to protect. Which is true in a way. But microservices, when you go by the full definition, are developed independently of each other. So you have a certain set of microservice versions which define your actual application, a set that is proven, qualified, and fully supported. You want to be sure that when a disaster happens, you are able to restore such a qualified set. For that, you can either keep versioned copies of every microservice stored somewhere, or you can have a data protection solution which allows you to restore your application, including all microservices, as a defined set.
So we have now established that data protection is needed inside an OpenStack environment, and why it is needed. The next question is, of course, what to use as a data protection solution. The very first thought will be: we take the legacy backup solution that we have used for the past 10 or 20 years and bring it into the cloud. It will be able to do the job of protecting the actual virtual machine data. That is true, but as I said in the beginning, protecting the virtual machine data itself is not everything in an OpenStack environment. But let's go there step by step.
What we know about legacy solutions is that nearly all of them use agents, which are installed on the virtual machines they are protecting. With the agents always comes a media server, which controls those agents and is the element all your backups go through before reaching the backup target. And they require a backup administrator who centrally controls the backup jobs and performs the restore jobs. What we get out of that is a backup and recovery solution that is limited in the number of agents a media server can control, and that only backs up the data itself.
And that is where it gets critical, because managing the backups alone will be tough. It will become a nightmare, because when a tenant comes to the backup administrator and says, "Did you set up my backup?", there are many questions that need to be answered before the backup administrator can even take the backup. The agent needs to be installed. It needs to be installed in the right version. It needs to be reachable. And then, does the backup administrator know which virtual machines he needs to protect? Does he even have access to those virtual machines? Can he protect them? Managing that and coordinating it with the tenants, in an environment where the owner of the virtual machine and the owner of the infrastructure are completely separate, becomes an absolute nightmare over time.
So what should the backup solution look like to avoid that problem? Actually, the OpenStack principles already define that. The backup solution needs to provide its services in a self-service manner, just like all the other OpenStack services such as Nova, Neutron, Glance, Cinder and so on. The backup administrator then only takes care of the service itself. And what happens when a tenant then says, "Please back up my virtual machines"?
The backup administrator answers: "Dear user, here we have implemented a data protection solution for you, which you can use yourself to set up your backup jobs and recovery jobs as needed. Here you find the documentation. I will make sure that the service is running at all times so that you can safely protect your virtual machines." So with that, we have identified the first requirement of a data protection solution in OpenStack, and that is self-service. We will find more requirements when we take a look at the challenges that a classic data protection solution gives you inside OpenStack.
Another challenge is the cost of the media server itself. The media server is big. It is so big that when you host it inside your cloud, it will most likely take up a complete compute node just for the media server. And this media server can then control only a limited set of agents, so you will require more media servers when you have a bigger cloud. You then also have the problem that you need to split this up somehow, because most backup and recovery solutions are not designed to have scale-out media servers. They have scale-up media servers. So when you have multiple media servers in the same environment, you have to define how to split everything up. The only way you could solve that would be to put the media server inside a tenant, so that each tenant gets its own media server. But then each tenant would have to pay for the resources required to run this media server, so you have costs there.
And when you accept the limitations and put the media servers outside of your OpenStack environment, you give up the isolation of your internal networks, because the media servers need to be able to talk to the virtual machines that have the agents installed. They need to be able to reach them, start the job, get the data, and do the restore when it comes to that.
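
To make the scaling argument concrete, here is a minimal arithmetic sketch. It assumes one agent per protected virtual machine and a fixed upper bound on the agents one media server can control. The limit of 200 agents is a made-up illustration value, not a figure from the talk or from any product.

```python
import math


def media_servers_needed(vm_count: int, agents_per_media_server: int) -> int:
    """Number of media servers needed when every protected VM runs one agent
    and each media server can control only a bounded number of agents."""
    if vm_count < 0:
        raise ValueError("vm_count must not be negative")
    if agents_per_media_server <= 0:
        raise ValueError("agents_per_media_server must be positive")
    return math.ceil(vm_count / agents_per_media_server)


# Hypothetical per-server limit, chosen only for illustration.
AGENTS_PER_MEDIA_SERVER = 200

for vm_count in (100, 1000, 5000):
    servers = media_servers_needed(vm_count, AGENTS_PER_MEDIA_SERVER)
    print(f"{vm_count} VMs -> {servers} media server(s)")
```

If each media server really does occupy a complete compute node, as described above, the result is also the number of compute nodes lost to backup infrastructure under these assumptions.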
So no matter where you put the media server, you will pay a price, be it in hardware, which you will always pay in one way or another, or be it in networking and security. This means that agents and media servers are something you would want to avoid in an OpenStack environment. You would actually want a solution that adds a backup and recovery service to your compute nodes so that, just like Nova Compute, just like Neutron and all the other projects, you are adding a service component to each compute node, which allows you to scale together with your OpenStack. In the ideal case, this service would also be able to take all the backups of the virtual machines without the need for an additional media server. Because when we look at the already identified self-service requirement, the requests are coming from an OpenStack controller. So when this backup service is integrated into OpenStack and sends its requests out to the compute nodes, the backup service there can pick them up, and no media server is required to control it. So we can add to the identified requirements that the solution you are looking for is ideally not only self-service, but also agent-less, and allows you to scale linearly together with your cloud.
The next big topic to think about again follows from the OpenStack principles themselves: multi-tenancy. Multi-tenancy means, as we all know, that many companies, many tenants, many departments are running on the same infrastructure, but they are logically divided and have no chance to interfere with each other or even to see what the others are doing. You require the same from the backup and recovery solution. Because when you have a backup and recovery solution that is used by everyone without multi-tenancy, which would mean that everyone can see all backups, you invite a high security risk for each of your tenants.
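
The isolation requirement can be sketched in a few lines. This is only an illustration under assumed names: `Backup`, `visible_backups`, and `get_backup_for_restore` are hypothetical and do not belong to any OpenStack API or product. The point is simply that both listing and restoring are scoped to the requesting tenant's project.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Backup:
    backup_id: str
    project_id: str  # the tenant (project) that owns this backup
    vm_name: str


def visible_backups(backups: List[Backup], requester_project_id: str) -> List[Backup]:
    """Return only the backups owned by the requesting tenant."""
    return [b for b in backups if b.project_id == requester_project_id]


def get_backup_for_restore(
    backups: List[Backup], backup_id: str, requester_project_id: str
) -> Backup:
    """Look up a backup for restore; refuse backups owned by other tenants."""
    for backup in backups:
        if backup.backup_id == backup_id and backup.project_id == requester_project_id:
            return backup
    # A foreign backup is reported the same way as a missing one, so a tenant
    # cannot even learn that another tenant's backup exists.
    raise PermissionError(f"backup {backup_id!r} not found for this project")


catalog = [
    Backup("b-1", "project-a", "web-01"),
    Backup("b-2", "project-b", "db-01"),
    Backup("b-3", "project-a", "db-02"),
]

print([b.backup_id for b in visible_backups(catalog, "project-a")])
```

Treating a foreign backup like a missing one is a design choice in this sketch, not something the talk prescribes.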
Without multi-tenancy, tenants would be able to see the backups of other tenants and restore those backups. That is something you want to avoid at all costs, of course, so that your tenants feel safe using your cloud. So multi-tenancy is an obvious further requirement for a data protection solution. When you go with the centralized backup administrator, you don't have that requirement. But as said, that still means you have all the disadvantages already mentioned to take care of.
So far we have spoken about backups. But what about recovery? Recovery brings more challenges in an OpenStack environment. Because when you take a look at what the backup administrator needs to know, on a high level, just to be able to restore a virtual machine, we are talking about a lot of metadata. Every project inside OpenStack adds some data to the virtual machine that defines it inside the OpenStack environment. And when you take a look at the five core projects alone, you see how much metadata we are talking about. There are obvious elements, like the flavor in Nova, meaning the actual size of the virtual machine, but also less obvious elements like the placement group. That is something that is often forgotten. But the placement group was chosen for a reason when the virtual machine was created, and this information is necessary to restore the virtual machine to where it belongs. It is also necessary to have all the networking information so that the virtual machine, once restored, is directly capable of working again and the tenant, the user, is not required to do additional manual steps. You see where I'm going. A cloud is also mostly highly automated, so you need a data protection solution that you can integrate into such automation.
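
As a rough illustration of the metadata named here (flavor, placement group, networking information), the following sketch stores such a record next to a backup and checks it before a restore. All class and field names are assumptions made for this example. They are not the schema of Nova, Neutron, or any backup product, and a real record would hold far more.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class PortRecord:
    network: str
    fixed_ip: str
    security_groups: List[str] = field(default_factory=list)


@dataclass
class VmBackupMetadata:
    vm_name: str
    flavor: str                     # the size of the virtual machine
    placement_group: Optional[str]  # may be None if the VM had none
    ports: List[PortRecord] = field(default_factory=list)


def missing_for_restore(meta: VmBackupMetadata) -> List[str]:
    """List the fields without which a restore would need manual steps."""
    missing = []
    if not meta.flavor:
        missing.append("flavor")
    if not meta.ports:
        missing.append("ports")
    return missing


record = VmBackupMetadata(
    vm_name="db-01",
    flavor="m1.large",
    placement_group="db-anti-affinity",
    ports=[PortRecord("tenant-net", "10.0.0.12", ["db-access"])],
)

# Store the metadata together with the backup so nothing has to be
# kept somewhere else or re-entered by hand at restore time.
serialized = json.dumps(asdict(record), indent=2)
print(serialized)
print(missing_for_restore(record))
```

Because the record is plain JSON, an automated restore workflow could read it back without asking the tenant for anything.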
And you need a data protection solution that knows about the metadata an OpenStack environment provides to a virtual machine. So what you are looking for is a fully integrated data protection solution, one that knows about your OpenStack: something that helps you make sure you have all the metadata defining a virtual machine inside OpenStack and its connection to the outside world, which you can then use to easily and smoothly restore the virtual machine inside the OpenStack environment.
Okay, so let's say we have found a backup solution that fulfills all those needs, or you have created it yourself. The next step will, of course, be installing it. And this is where you will find the next challenge, because a classic backup window or a classic maintenance window is something you won't find inside an OpenStack environment. An OpenStack environment is used by so many people and so many departments or companies, generally tenants, that it is highly likely that at any given point in time there will be a critical job running, which means that you can't take down the whole environment just to do some maintenance. You can, of course, try to manage that by shutting down only part of it: you move everything to the parts that are working, do the maintenance and installation, and then slowly roll through your OpenStack environment. But you will have to do that every single time you update or maintain the data protection solution. What you are actually looking for is a solution that is non-disruptive to the environment, so that you can work with it and install it without the tenants noticing. At most, they notice that the backup and recovery service is not available, but all their critical jobs can still run. So you get a maintenance window without shutting down the cloud and without affecting anyone else.
And with that, we have found the last requirement that a data protection solution should fulfill inside an OpenStack environment. Any solution that does not follow these points will still be able to protect your OpenStack environment, but it will always come at a cost. You will always have a downside at one point or another and will most likely face issues that you need to solve in addition.
Okay, so let's take the time now to summarize what has been said and shown to you today in this session. Data protection inside OpenStack is needed. For one reason or another, there will be stateful data inside your environment, and this stateful data needs to be protected. As said at the very beginning, even a sheep has value that might be worth backing up and recovering. When you have this requirement, you need a solution that fulfills a few simple points. The first one is that, following the OpenStack principles, the solution should provide self-service to the tenants so that they can take care of everything themselves and the backup administrators do not run into the management nightmare, because only the tenants know what they want. The solution should be agent-less for the same reason, but also to allow you to scale the solution together with your OpenStack environment without the limitations that a legacy backup solution would bring. It should also, again following the OpenStack principles, provide multi-tenancy so that your tenants have a good feeling about the data protection solution and can trust it. It also requires full integration into OpenStack itself, so that it can gather all the information that defines a virtual machine inside OpenStack, and you can use that information for a restore without any additional manual steps and without the need to keep this data somewhere else.
And lastly, the solution must not interfere with your other OpenStack services when you need to install it, maintain it, or do any other maintenance work on it. Otherwise, you will never find a window where you can actually do that.
And with that, I am at the end. Thank you for listening. I'm open to any questions that you might have. And should there be questions in the future, please feel free to reach out to me directly or to anyone else at Trilio so that we can help you find answers to your data protection questions.