Okay. Welcome, everybody. I want to speak about NFVI upgrades and migrations with live, critical telco workloads. Let me start with the key takeaway message. One great achievement is to develop and provision a full telco cloud solution into service. That requires quite a lot of effort from development, integration, testing, and delivery services. Once this cloud is in service, another major achievement is to maintain the cloud solution in service, meaning to handle software lifecycle management. That is what we normally call upgrades. And upgrades have to happen with minor impact on the service; otherwise, our customers will not be very happy. In Ericsson, for many years, we have had this concept of ISP. Even before the concept of internet service provider came along, we used "in-service performance" for the ability of telecom systems to stay up permanently. This concept, which dates back to the early 70s, is very important, and now with the cloud it is super important too. Anyway, my name is Gerardo Martinez. I am the NFVI CI/CD solution architect. I work for a program called the NFV program, where our main goal is to verify all the Ericsson applications together with the Ericsson NFVI solution. We do not deliver an NFV solution as such; we just handle this process of gluing the applications and the cloud together to make sure that the cloud delivers to the applications what is promised: performance, throughput, storage, networking. This is a very dynamic team, I have to say. I am here on behalf of a super spectacular team spread all over the world across all our R&D sites. This team is composed of people who are experts in applications, infrastructure layers, R&D, and services. The people who deliver this to our customers also join us, and they get the experience of learning how to install these applications. NFV, we could put it in an equation: NFVI plus VNFs, CNFs, and lifecycle management. Some people also like to refer to this as NFV MANO, management and orchestration.
So these are some key concepts that we will handle throughout the presentation. What do we do? Well, basically we take the latest Ericsson NFVI software just before it is released to our customers. We do LCM activities with simulated traffic, the so-called NFVI upgrades. And once the upgrade is completed, we perform resilience test cases and verify that everything remains in place, including the basic features. Once the NFVI software solution passes the quality tests, we do the official release and the NFVI software goes into general availability to our customers. We are an independent organization, so we actually participate in the process, but we play the external customer role, and we provide feedback, together with our applications, to the unit that actually builds NFVI. What are the forces behind upgrades? What can trigger an upgrade? In fact, the word upgrade is sometimes pretty confusing. Well, telco customers will normally upgrade because they want to get a new feature, because they want a security fix in place, or because they are waiting for a bug to be fixed. That is probably not the nicest reason to do an upgrade, but a necessary one. And last but not least, because sometimes products reach a point where maintaining the software is no longer good business, and therefore the companies declare an end of life and end of support, and the customer is required to upgrade if they want to keep the same service level agreements with the vendors. What are the telco customer expectations when you do an upgrade? Well, they expect a very careful, verified upgrade procedure that has been tested. They expect the downtime of the procedure to be according to our promises, and they expect controlled risk, so no surprises happen during the upgrade. If an upgrade is not possible, or is too disruptive, then we have to consider migrating the workloads or offloading the application, so we can run traffic in another cloud while we perform the upgrade.
There is sometimes confusion about the meaning of upgrade, because you might call something an upgrade that in the end is just an update, where you are perhaps not delivering a feature. Or sometimes you might find somebody who says, oh, I upgraded to the latest version, Kubernetes 1.23, and then you ask, did you do a rolling upgrade of the cluster? No, no, I just deleted it and reinstalled the higher version. That is not what we call an upgrade. We call that either an uplift or a reinstall, but we have to make a clear distinction here, because customers do not expect that we delete and reinstall a higher software version. This is more or less how our NFVI solution looks. I just want to mention the standard protocols that we use in our solution. We have, of course, the hardware and switching fabric, where we try to comply with IEEE, RFC, IPMI, Rack Scale Design, and Redfish. Then we have the OpenStack layer, which today in most cases is the standard, but we are also trying to deliver a solution without OpenStack. Finally, we have the containerization layer. I guess some of you have heard the term LOKI during the summit: Linux, OpenStack, Kubernetes Infrastructure, which is more or less what we call NFVI. It is a standard way NFVI is delivered. What are the telco industry facts and practices? This is where we start to talk about differences in the way things are done. The telco industry is heavily regulated. The service of a telco operator is based on a government concession. You can hardly just create a startup telco operator; you normally need to approach the government, bid for a frequency band, and make a major investment, and then you can have a telco operator. You also make commitments with the government, like emergency services: these emergency services, like 9-1-1 or 1-1-2, carry binding legal commitments.
You might have heard that in some cases, in some countries, when the emergency services are down, they might deploy police officers to make sure that the people are served. So this is a very important topic. Standardization: normally in telco, the standards are made first, and then the products get developed. That is a little bit different from the way open source works. So there is, historically, a major background of telco generic requirements that even comes from the time when the Bell System was split in the US into the Baby Bells, and then you had Bellcore and Bell Labs. Bellcore eventually became Telcordia, and Telcordia eventually became part of Ericsson. They used to have these books of standards called GRs, generic requirements. One of these requirements, for example, is the concept of first office application, or FOA. Why? Because in telco we like first-time events: the first call, the first upgrade, the first new product in service. Every time there is a first in telco, it is big news. So once the FOAs are successful, then GA is granted. It is very common that when a telco product goes into service for FOA, during that soaking period where it is running traffic, all the operators, the tier ones and tier twos, start to ask around how the product is performing. And there is an internal communication process we don't see as vendors, where they provide each other feedback, and that is very important for the reputation of the product. Finally, we have the fact that telco operators normally cover a geographical area, and therefore there is this concept of the maintenance window: the time when people sleep and do not talk. Those are the times when we perform these upgrades. We cannot do them all day; we have to have a special time, and that time depends on the location. Normally these upgrades happen at night or on weekends, but not all the time yet. So CI/CD is still a concept to digest for live telco operators.
This is more or less how the upgrade flow goes. You spend some days on preparation, then you agree on the night of execution. You perform the upgrade and you do some post-checks. If things are good, you leave it; if there are issues, then you have to consider a rollback. And then you spend some more days in the post-upgrade soaking period, et cetera. If something doesn't go well, customers expect you to go back exactly to how the system was before. And this is where I would like to talk about the challenges of that concept. First of all, the size of the data center. It is literally close to impossible to upgrade a data center in one night. So you have to schedule with the customer how many maintenance windows you are allowed, and you might be surprised that some customers expect you to use only weekends to perform an upgrade. Therefore it might eventually take even months to upgrade a full data center. So, very important: consider the size of the data center before doing an upgrade, because maybe too big means too long. Multitenancy challenges: customers were promised that every single CPU in a host can be used. Therefore, they try to play what we call the Tetris game: they try to pack all the VNFs and CNFs as close together as possible, which creates a big challenge if you start to bring hosts down one by one, because sometimes you have combinations of VNFs that do not like losing specific redundancies at the same time. Backups: a backup doesn't serve any purpose if you cannot restore it. And one of the reasons we have challenges in the backup area is that taking an image backup, or exporting an image backup or a database, can sometimes take even longer than redeploying the system. So there we have a major discussion with our customers: if we do backup and restore, maybe redeploying is faster, but redeploying is not something that telco customers will take lightly.
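To make that trade-off concrete, these are the kinds of export operations whose duration gets compared against a full redeployment. This is a minimal sketch using the standard OpenStack CLI; the image and volume names are hypothetical:

```shell
# Export a Glance image to a local file. On a large image this
# download alone can take longer than redeploying the workload.
openstack image save --file epc-vnf-backup.qcow2 epc-vnf-image

# Snapshot a Cinder volume, then create a backup from that snapshot,
# so the database volume can be restored later if the upgrade fails.
openstack volume snapshot create --volume epc-db-vol epc-db-snap
openstack volume backup create --snapshot epc-db-snap epc-db-vol
```

Timing a rehearsal of these commands on representative data sizes, well before the maintenance window, is what lets you argue honestly whether restore or redeploy is faster.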
We need to explain very well why a redeployment might be better than a restore. And finally, traffic resilience. We have to make sure that traffic protection is in place, that we can reroute the traffic in case of issues: geo-redundancy, application pools. Customers also do not like reboots. So when we do upgrades of the Kubernetes layer or OpenStack, we have to find opportunities to align reboots, to make sure that every time a layer takes an upgrade, we can maybe save a reboot, or at least minimize the number of reboots. Another interesting topic: VM migrations and worker drains. I always say that in OpenStack, VMs are like a soccer ball. Everybody knows it, everybody knows where it goes when you do a migration, and it is a simple game. But when you do Kubernetes drains, it becomes a bit more difficult, because you have too many containers moving around. So I normally make the analogy that dealing with containers is like playing pool: you have many balls, you have many holes, and it gets pretty difficult to follow all of them. Therefore, it is very good to have these simulations. kubectl, for example, has this very nice dry-run concept, which still needs to be improved. And it is very important to know the concept of pod disruption budgets. This is extremely important when you design Kubernetes applications, to make sure that the drain process does not hang. I will leave the advanced level for a later presentation; I will repeat this presentation later today, in 30 minutes, if you are interested. I want to close with just one last comment about pets versus cattle. How many people here know the famous pets-versus-cattle dilemma? Well, when you upgrade a data center, you end up having a lot of attachment to it, especially in my case, since I have been doing this for two years. You really get this attachment, and you start to think that your data center is like a pet. But in practice, we are an industry, and we deliver cattle, right?
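To make the pool analogy concrete, here is a minimal sketch of a pod disruption budget and a drain rehearsal. The application and node names are hypothetical; the PDB is what stops `kubectl drain` from evicting too many replicas at once:

```shell
# A PodDisruptionBudget that keeps at least 2 replicas of a
# hypothetical packet-gateway app alive during voluntary evictions.
kubectl apply -f - <<EOF
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pgw-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: packet-gateway
EOF

# Rehearse first: --dry-run=client reports what the drain would do
# without actually cordoning the node or evicting anything.
kubectl drain worker-07 --ignore-daemonsets --delete-emptydir-data --dry-run=client

# The real drain. If evicting a pod would violate the PDB, the
# eviction is refused and the drain retries until it can proceed --
# which is also how a badly sized PDB can leave a drain hanging.
kubectl drain worker-07 --ignore-daemonsets --delete-emptydir-data
```

Note the last comment: if `minAvailable` equals the replica count, no eviction is ever allowed and the drain blocks forever, which is exactly the hang the talk warns about.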
But how do we deliver pets and cattle at the same time? I have a proposal to leave you with, which is horses. Because we have to treat these data centers with ownership, while understanding that we are really running an industry here. The key is to master the system so we can have control and predictability all the time. So that's it, time is up. Anything you need, you can meet me for coffee. We don't have a stand today, but we can talk. Thank you very much.