Hello, welcome to the Marketplace Theatre again for the next presentation. Our presentation is on upgrading a system, and we are talking from a telco user perspective. So this is a little bit different from all the other upgrade sessions we have at this summit, and unfortunately there's also an upgrade session in the design summit in parallel at the moment. So the schedule is a little bit difficult there. So I'm happy you are here. I'm coming from Huawei. We are one of the big telco vendors, but we are also big OpenStack contributors, so we combine the two worlds, and from that side we are a good partner if you want to do both things in parallel.

What I want to bring to you today is a little bit of explanation and background on what an upgrade is for a telco user. I will explain the different scenarios that we have to consider for upgrade if we are in an NFV and telco environment, and then I will also briefly introduce you to the work we are doing in the OPNFV project. You will find an OPNFV booth right over there at the other end of the hall, where you can get more information about OPNFV.

So let's go into the introduction and requirements first. Upgrade and high availability are hard to combine. In telco networks everybody talks about the five nines. What is five nines for us? It means about five minutes of downtime per year; everybody can calculate that. But the difficult part is what is considered a downtime. In the telco business, downtime is everything which makes your service unavailable. So this also includes the downtime you are planning when you do an upgrade. That might be in the middle of the night, but some people are still picking up their phones or want to watch a video or whatever during the night. So our network needs to be up and our services need to be available then as well. So it's hard to find the right time of the day where you want to do an upgrade.
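As a rough illustration of the five-nines arithmetic, here is a minimal Python sketch. The function names and the sample downtime figures passed to `fits_budget` are assumptions made up for this example, not something taken from the talk's slides.

```python
# Minimal sketch: yearly downtime budget for a given availability target.
# All names here are hypothetical and exist only for this illustration.

MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, including leap years


def downtime_budget_minutes(availability: float) -> float:
    """Allowed downtime per year, in minutes, for an availability like 0.99999."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def fits_budget(availability: float, planned_downtime_minutes: float) -> bool:
    """True if the planned downtime alone still fits into the yearly budget."""
    return planned_downtime_minutes <= downtime_budget_minutes(availability)


budget = downtime_budget_minutes(0.99999)
print(f"five nines allows about {budget:.2f} minutes of downtime per year")
print("2 minute outage fits:", fits_budget(0.99999, 2.0))
print("10 minute outage fits:", fits_budget(0.99999, 10.0))
```

The point of the sketch is that planned outages draw from the same budget as unplanned ones, so a single long upgrade outage can consume the whole year's allowance.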
Everything needs to be counted into these five minutes, and therefore we need to take special measures to meet these requirements. The whole work you do for an upgrade can of course take much longer, when you do the preparation in the system, when you bring the new software onto your servers and so on, but the downtime of the service must be very, very short. What we can do today in telco networks is usually around one or two minutes for an upgrade where a network element will not be available. I think today OpenStack operators are very good if they can do it with ten minutes of downtime. So it's really hard work to do.

One thing is good about it: we are not talking about a single application being down, we are talking about the service being down. So there might be possibilities to work around the downtime of a single network element. We have to look at the whole service.

Another thing we should consider is that we have a different networking structure, or application structure, when we talk about telco networks. It is not like a general cloud, where many different applications are running and we wouldn't know what applications they really are. Here we know our network elements, we know their requirements, and we know the consequences when we do certain things in the cloud environment or in the cloud platform. On the other hand, we are running most of these elements in many, many instances. So it's worthwhile to prepare well and build an automated process which you can run on all the applications you are running there. So this creates a different type of thinking about upgrade.

Something else I want to explain here is that upgrading in the cloud will be different from upgrading a single vertically integrated box. In the cloud environment you have new technologies you can use for the upgrade, compared to what we traditionally use in telco networks. We can decouple the systems, because we now have different building blocks in a way we usually don't have today in the telco network.
On the other hand, consider a failure during an upgrade: when I run multiple applications on a cloud platform and my cloud platform goes down, that will affect many, many users. So the problem might be much bigger if something goes wrong, and I have to be well prepared for that. So this is the background, the requirements we have to meet with our upgrade process.

So what are the scenarios for such an upgrade? I start with the well-known ETSI NFV reference architecture. I'm sure most of you have seen this picture, but I will explain a little bit what it is. We have the hardware building blocks in the dark green area here. We have a virtualized infrastructure manager, as ETSI calls it. This is OpenStack and all the other tools you need to manage your cloud infrastructure. So SDN controllers will also be one of the things which are there besides OpenStack. Then you have management systems. This is the VNF manager, which is doing life cycle management for the applications: bringing them up or down, configuring them, establishing the network for them, establishing service chains and all that stuff. Then we have the applications above the big green box, called VNFs, virtualized network functions. On top of that we have an operations support system, where the carrier network operator can do everything he needs to do with his VNFs, and we have a network function orchestrator which decides how many VNFs to install in which places, when to do automatic scaling and all these things.

So when I'm talking now about upgrade, the great thing with NFV is that I can decouple the upgrade of the different building blocks. Each of these circles with arrows stands for its own upgrade process. So I can, for instance, do an upgrade of the VNF manager separately from all the other components in this network, and that will help a lot when I build up an upgrade process for the whole NFV-based network.
So what I want to do in the presentation now is walk through all these different cases and look into what each means for the upgrade process. The different building blocks play different roles in the whole NFV architecture, and therefore the requirements for the upgrade process will look different for the different cases. You will always find in the upper right corner the building block I'm currently talking about, and then I try to explain what is special about upgrading that concrete building block. I start from the top, going down.

So the first thing is the OSS/BSS system. Normally you don't know what this is because as a subscriber, as a phone user, you don't know that there's somebody behind the scenes managing the whole network. So the network keeps running if this thing goes down. That's the good part, because it makes it much easier for us in upgrade. It's just a normal application. We can accept the 10 or 15 minutes of downtime for the OSS system, as for any IT application. So we are pretty quick on that part.

The next one is the NFVO, the orchestrator of the cloud. The orchestrator is usually necessary when I want to change things there or when things need to change automatically. It is not directly involved when people pick up their phone, and it's not directly involved in the traffic going around. So here, too, the upgrade will be easy. But we should think about what is possible during the upgrade of the NFVO. There will be some things in the NFV network that will not happen when the NFVO is not there. Maybe no fault handling of major things. Maybe no automatic scaling: if there's a traffic burst, I cannot bring up additional new VMs to handle the traffic. So these things will not happen during that upgrade. But for the rest we are fine. So we can live with a reasonable downtime here as well.

Going down from there, the VNF manager. Again a thing which is not that bad for upgrade.
The VNF manager is doing the life cycle management for the VNFs. So it's bringing up new applications. This is not that necessary for keeping the network active. So people can still use their phones if I bring down the VNF manager to do an upgrade on it. Maybe we should think about one thing there. There might be different VNF managers depending on the VNFs the vendors provide. There are generic VNF managers which have very simple functionality, and for those it will be like that. But there will also be VNF managers that have very specialized interfaces to their VNFs. Traditional telco VNFs might be very complex applications with many VMs that have different roles. They have internal redundancy and things like that. It depends on the vendor what he is doing with the VNF manager. Maybe the VNF manager is involved when I do failovers or things like that, when I do fault handling. So in these cases I have to be careful with the VNF manager.

So now the most interesting part, what everybody is talking about: upgrade of the VNFs. How is this done? Normally we can use the same mechanisms that we use when we are upgrading a physical network function. A physical network function means it is running on physical hardware, with the same functionality but implemented in a traditional way without cloud technology. So when a telco vendor now creates a virtualized version from that, he can transfer the same mechanisms he uses there for upgrade to the new virtualized network functions. The network function still has a similar architecture, so you can use the same mechanisms. And that is interesting, since some of these VNFs will be stateful and very, very complex in the upgrade process. So when we can keep that process, that will make life much easier.

But there will also be new small VNFs, like a load balancer or a firewall. These ones will be single virtual machines.
So when I do an upgrade there, that thing goes down, and maybe even for much longer than the five or ten minutes until I bring it up again. So there I have to make sure that I can direct the traffic to another VNF doing the job in that time. This will help me to run through an upgrade process for the single VM which takes longer. But I need to have a network view to decide when to upgrade which of the VMs doing the same job in my network. So this is a new task now, where I have to have an overview of all the VMs which need to be upgraded and which are sharing a certain functionality.

And my third point there I cannot say much about, because there are hundreds of different combinations. I can have stateful small VNFs where I still have to get the state back when I come back from the upgrade. Or I can have slightly bigger ones, medium-sized VNFs; anything in between is possible. So we have to have very specialized methods for that.

The fifth one: we go for the NFV infrastructure. So we need to upgrade our servers, in the easiest case, or our storage system or our network components, switches, routers, everything we need to maintain in a telco network, without the whole network going down. For big routers that will not be easy, but the big ones will come with their own upgrade process. So the mechanisms there will vary. But most of these components are available in the infrastructure in many instances. So when I bring down one of them to do the upgrade, the others can take over its part. And since we have a cloud environment, I can migrate all the tasks away from that one before doing the upgrade. I prepared some slides to show how that will go and to walk you through a process to show you the consequences of that. It's important to know what the dependencies are between the different components, and what the dependencies are between the VNFs running on those components.
And to walk through this whole upgrade process according to the dependencies, it is necessary to have some control process. That's something which doesn't exist at the moment and which we will probably need to introduce. Whether that will be part of OpenStack or not, we have to decide. So when I run through the example, I just put a small cloud there for the upgrade control thing, because I don't know where it is. We have to decide that.

So what I want to do in this use case example is upgrade these three servers. Servers are a pretty easy part. It would be much more complex if I went for network equipment, and I wouldn't be able to do it in the time I have for this presentation. What we assume now is that we have a VNF with two components. These two components are virtual machines running on the servers below them. And I assume here that I have a VNF that has internal redundancy. So these two components belong to the same VNF. One is running in active mode, one is in standby mode, and in case of a failure the standby would take over from the active one. You know these mechanisms. And we have a third server here which is currently empty. It's idling around, so I can probably do the upgrade very fast.

So what I do in the first step: I calculate my upgrade process. What are the different steps to go through? In this case, I will first upgrade server number three because it's idling. Then I will upgrade server number two, where we have just a standby, and then I will run the upgrade on server number one. And I will need to do something before I can do that.

So first thing, upgrade server three. It's idling. I can just do it. So the brown one is the upgraded server. Is it brown here? Yes.

The next step is a little bit tricky. It's something we don't do today, and we don't know how to do it at the moment, because nobody knows how to talk to this guy to do the migration step I want to do.
And there are also different mechanisms I can use for this migration. When I want to shift this VNFC number two from server two to server three, I have several possibilities. One is live migration. But that does not work for all telco applications; some of them will not survive it. So in that case, I have to use something else. What I can easily do with a standby is a shutdown. Why not? It's not handling active traffic. And maybe I can also just switch it off, because it's the standby node and we have all the mechanisms in place in our application to survive that without any outage. So one of these methods will be used to move the component onto server three.

Next step: now this is already running on the new hardware. Why not use it? So in the next step, I do the switchover from the active role to the standby role. We have these mechanisms in our applications, and we will not lose a single connection during that time. So now the active service is already provided by the new hardware. Everything else is not affecting anything in the active service.

So I repeat the same thing. Server number two is idling. I can upgrade that one. Now I can move the other standby node. Component number one is now standby. Keep in mind that I again have the three possibilities. So I'm moving that to server number two. And in the last step, I can upgrade server number one and I'm done.

So for servers, it looks pretty straightforward. It was eight steps. We should automate that, so I can run it easily, because I might have many, many cases to run through this process. You won't sit there as an operator before you do the server upgrade and tell all these guys to move their applications. You need an automated process for that. And that will be something we have to implement.

So this was compute resources. When I look at networking resources, it will be much more complex to decide when I need to move a component from one piece of equipment to another.
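The server walkthrough above can be sketched as a small simulation. This is only a sketch under stated assumptions: the `Cluster` class, `plan_and_run`, and the names `s1`, `s2`, `s3`, `vnfc1`, `vnfc2` are hypothetical and invented for this example. It models one active/standby pair and a spare server, and it does not talk to any real cloud API or choose between live migration, shutdown, and power-off.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model: one VNF with an active and a standby component (hypothetical)."""
    placement: dict                      # component name -> server name
    active: str                          # name of the currently active component
    upgraded: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def upgrade(self, server):
        # Only an empty server may be upgraded.
        if server in self.placement.values():
            raise RuntimeError(f"{server} still hosts a component")
        self.upgraded.add(server)
        self.log.append(f"upgrade {server}")

    def move_standby(self, component, target):
        # Only the standby may be moved, and only to an empty server.
        if component == self.active:
            raise RuntimeError("refusing to move the active component")
        if target in self.placement.values():
            raise RuntimeError(f"{target} is not empty")
        self.log.append(f"move {component} {self.placement[component]}->{target}")
        self.placement[component] = target

    def switchover(self, new_active):
        self.active = new_active
        self.log.append(f"switchover to {new_active}")


def plan_and_run(cluster, servers):
    """Upgrade every server while the active component is never touched."""
    while len(cluster.upgraded) < len(servers):
        hosting = set(cluster.placement.values())
        idle_old = [s for s in servers
                    if s not in hosting and s not in cluster.upgraded]
        if idle_old:
            cluster.upgrade(idle_old[0])
            continue
        # No idle old server left: free one by moving a standby to new hardware.
        spare_new = [s for s in servers
                     if s in cluster.upgraded and s not in hosting]
        standby_old = [c for c, s in cluster.placement.items()
                       if c != cluster.active and s not in cluster.upgraded]
        if not spare_new or not standby_old:
            raise RuntimeError("no safe step available")
        cluster.move_standby(standby_old[0], spare_new[0])
        # If the active role is still on old hardware, hand it to the moved standby.
        if cluster.placement[cluster.active] not in cluster.upgraded:
            cluster.switchover(standby_old[0])
    return cluster.log


cluster = Cluster(placement={"vnfc1": "s1", "vnfc2": "s2"}, active="vnfc1")
log = plan_and_run(cluster, ["s1", "s2", "s3"])
for step in log:
    print(step)
```

The guard clauses in `upgrade` and `move_standby` encode the two rules the walkthrough relies on: a server is upgraded only when it is empty, and only the standby component is ever moved.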
And to have an automatic calculation of these dependencies, and to find a way to do that without creating an outage, will be a challenge for network equipment. But typically in telco networks we also have redundant paths in the network. So there will be possibilities for that.

OK, now the remaining part, the virtualized infrastructure manager. I'll talk first about OpenStack. We have a lot of things going on there in upgrade. We have many sessions on upgrading OpenStack during the summit. One is running in parallel at the design summit at the moment. Keep in mind that the VIM is more than OpenStack, and there can even be dependencies. But when I look at all these upgrade sessions, everybody here does this upgrade differently. Nobody knows what the dependencies really are. Everybody finds a way around this or that problem depending on their own data center architecture. We need to find a method that does this best for everybody in the telco environment. So there will be some work to do for that. And we need a really stable process, so we can execute it in many cases without a failure of the upgrade process.

So let's talk about the downtime of the VIM. Typically the applications are running on the compute servers. So when the VIM goes down, it shouldn't be a big problem. But the upgrade processes sometimes affect the traffic flows. Some of the OpenStack modules affect the routers which are configured. So there are a lot of dependencies to be considered, and a lot of places where we might have new requirements on the OpenStack modules to preserve certain things which they don't preserve at the moment, since we now have HA applications running. Just to explain one of the most important things there: when I do the OpenStack upgrade, all the applications must stay running. So the new version, even for a major version upgrade, needs to keep all the configuration of the network and all the virtual machines, so that they can keep running during that upgrade.
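One way to make the "all applications must stay running" requirement checkable is to compare a snapshot of the workloads taken before the VIM upgrade with one taken afterwards. The sketch below assumes the snapshots are plain dictionaries; the field names (`status`, `networks`), the VM names, and the function `diff_workloads` are hypothetical. How such a snapshot would be collected from a real VIM is deliberately left out.

```python
def diff_workloads(before, after):
    """Compare two workload snapshots and list everything that changed.

    Both arguments map a VM name to a dict with the hypothetical keys
    'status' (a string) and 'networks' (a list of network names).
    An empty result means every workload survived the upgrade unchanged.
    """
    problems = []
    for vm, old in before.items():
        new = after.get(vm)
        if new is None:
            problems.append(f"{vm}: missing after upgrade")
            continue
        if new["status"] != old["status"]:
            problems.append(f"{vm}: status {old['status']} -> {new['status']}")
        lost = set(old["networks"]) - set(new["networks"])
        if lost:
            problems.append(f"{vm}: lost networks {sorted(lost)}")
    return problems


# Illustrative data only, not taken from a real deployment.
before = {
    "vnfc1": {"status": "ACTIVE", "networks": ["mgmt", "signalling"]},
    "vnfc2": {"status": "ACTIVE", "networks": ["mgmt", "signalling"]},
}
after_ok = {name: dict(state) for name, state in before.items()}
after_bad = {
    "vnfc1": {"status": "ACTIVE", "networks": ["mgmt"]},
}

print(diff_workloads(before, after_ok))
print(diff_workloads(before, after_bad))
```

A check like this only covers what the snapshot records; traffic interruptions during the upgrade would need separate measurement.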
So besides the OpenStack part in the VIM, the SDN controller also needs to be considered as part of the VIM. Part of the SDN controller we can consider as network equipment. Part of it we can maybe also consider as a VNF, as an application running on the infrastructure. But part of it is also cloud management. So we need to do the same thing there as we do for OpenStack. We need to place similar requirements on the SDN controllers as we have there: keep the traffic up, and keep all the configuration of the network up, even when you do a major upgrade of your own software. So we will have to do some work in those upstream projects as well.

One last consideration on that, because normally we don't talk much about it: as OpenStack guys we might need to change the hardware the OpenStack controller nodes are running on. So when they go to a new hardware generation, it needs to work. Say we have three OpenStack controller nodes and one is running much faster than the others. I'm not sure how many people have tried that. So there will still be a few things to test and find out, and I'm sure we will need to make a lot of improvements to OpenStack for these things.

Okay, how do we do these changes? In OPNFV we have created a project to analyze all these things around upgrade. It's called Escalator, and at first we are thinking about the parts OPNFV is building: upgrade of the infrastructure and the infrastructure management. And we decided to start with the VIM upgrade. So think first of the tasks I have described as OpenStack upgrade. We will work on these topics first. We will define the requirements in a much more detailed way than I could explain to you in this short session. That also covers things like duration times and granularity of the upgrade. How do I do it in an installation with many sites? Things like that. What do I need to do in the upgrade preparation phase? And what are the mechanisms we have to support for the upgrade?
Do we need new interfaces for orchestration so we can run the upgrade? And are new information flows necessary to be able to run this automated process?

OPNFV is an open source project, like OpenStack is. We have contributors and committers in this project from China Mobile, from Docomo, from Ericsson, from us and from ZTE. So it's already a joint activity of many companies. The first release Escalator will be in is OPNFV's Brahmaputra release, which will come out next February. In this case we don't need to consider a real upgrade, because nobody will upgrade to the Brahmaputra release. The first upgrade we need to control is from Brahmaputra to our C release in OPNFV, which will probably be based on Mitaka. So we still have a chance to bring in blueprints for Mitaka, for the new version which needs to be installed. So that's the outline of what we need to do in that project.

Just to give you a short overview of what this is: the functional requirements we are analyzing there describe the preparation tasks and the validation of an upgrade plan. I talked about the sequence we are running through; the plan needs to be validated to see whether it will work in my particular data center. Then there are the necessary backups and snapshots I have to take to prepare the upgrade, and then of course the execution phase itself. So we go through all these requirements. We also have to think about different use cases. Will it change when I have a minimal configuration like in our test labs, when I have HA configurations with redundancy in every place, or even a multi-site configuration with many, many locations, like we will have in the live telco networks?

And as the last thing of my presentation, I would like to invite you to help us in the Escalator project. Please join us. Here is the link where you can find all the information about the OPNFV Escalator project, and that way you can best contribute to the upgrade process for the telco networks. Thank you.