Okay, are we good to go? All right, welcome. Thanks for coming over. This is Chris, I'm Dirk. We're going to talk about the next step of OpenStack evolution for NFV deployments. What we want you to take away from this talk is not so much a detailed list of NFV telco requirements on OpenStack, but more an understanding of how the development process is changing in the NFV telco environment, and how we are currently transitioning our systems to a fully cloud-based approach, leveraging all the different open source components.

I'm a chief researcher at NEC Labs in Germany, working mainly on networking in general and on SDN in particular. What interests me at the moment is how we can transform current systems, leveraging NFV and SDN, to design better mobile networks. I'm doing some work in the IRTF and am also involved in OPNFV.

I'm Chris Wright. I'm the chief technologist at Red Hat and I've been working in open source for quite some time. My background is in Linux, as a Linux kernel developer, and in hypervisors and virtualization; the networking side of the IO path for hypervisors is what brought me to where we are now, which is talking about how we transform the industry: how do we move the telco networks from the traditional model we see today to a cloud-based model that we're building together right now?

Okay. So just a quick overview of what NEC is doing in that space. NEC is an IT and communications company in Japan, obviously. We do cloud infrastructure and telecom networks and services. We have been involved in UMTS, where we had the first UMTS deployment, and the first LTE deployment. We had an early vEPC system, and in general we have been working with Linux-based and open source-based platforms for quite some time. On that diagram you basically see our vEPC system; I'll tell you a bit more about that in a second. We were founding members of ONF, OpenDaylight, and OPNFV, and are involved in a range of open source projects and relevant standards activities in this space.

Our platform today: we've chosen Red Hat Enterprise Linux for our NFVI systems, which run on COTS servers with different network interfaces, using Open vSwitch and DPDK. We've also chosen the Red Hat Enterprise Linux OpenStack Platform for our VIM. NEC has a product line for orchestration and management, and of course our different NFV applications, and we integrate with different third-party vendors to build complete systems for customers.

I'm not sure how much you're aware of how the telco industry has evolved in recent years. For NEC, as for many other vendors in that space, our previous product range was based on the so-called Advanced Telecommunications Computing Architecture (ATCA) systems. These were the five-nines, highly reliable, redundant systems: big boxes with all the monitoring and management features. We chose Linux quite early on when building those systems. And from those we derived the first generation of virtualized systems, with, at that time, our in-house developed resource management for orchestration.
And now, as I told you earlier, we are moving this to an open source platform: generalizing the system, trying to figure out which of the specific proprietary developments we had done actually make sense in upstream projects, and then how we can successfully contribute them. Sorry, I was one slide too far. So we are currently working in OPNFV, which is basically the whole industry coming together to figure out requirements that cover the whole range of open source projects and systems, and how we as an industry at large can make useful contributions to those. Did I want to say more on this? Probably not.

So I have to ask you a question. ATCA: the first A is 'advanced', and I imagine that's because it was Linux-based. Well, for the NEC case, yes. And the interesting thing to note there is that open source already has a history in this environment, so we're just expanding the footprint of the work we've been building on for many, many years. It's probably, at this point, over a decade ago that the Linux industry was focused on bringing Linux into the telco environment, so we're just expanding on that.

And if you look at this, Dirk mentioned the OPNFV project. You've probably had a chance to hear about it. Is anybody here not familiar with OPNFV? All right, well, I guess we're ready for beers then. The purpose of the OPNFV project spans a number of dimensions, but one of the things that's important to notice here, we're at the OpenStack Summit, we spend a lot of time talking about OpenStack and telco, but the OpenStack portion, the orchestration component of this overall software stack, is just one piece. If you look at all of the projects up here that build up the NFVI, the virtualization infrastructure, the VIM layer, the management layer is one piece, but there are critical components underneath. So what happens when you want to make a change to the platform that spans all of these projects? OPNFV is there to help bring together all these different building blocks, organize the development efforts across them, bring them together, do integration testing, show that we're actually building something that works for the telco environment, and iterate and continue on that process. What you see here is a bunch of projects. Most of them you already know from an OpenStack context, but again, there's a significant amount of work that happens below the OpenStack layer.

Let's see if I can get this to go the right direction. All right, this is just an obligatory slide about Red Hat, what we're doing, why we care about open source and upstream, and the leadership that we bring there. We have a long history of being an open source company. What's important for us is working together with both the development community and the user community to bring features into the software projects that we take from upstream and then deliver as productized versions of those upstream projects.
What that means is we're developing relationships on both sides, and it's really important to understand that if you are only in the developer community, or only on the user side, you're not getting the full picture. Our goal is to help the whole industry move forward and build these platforms in a way that makes them reusable across the industry. One of the dangers we see is that as you build features on top of an upstream project and don't contribute them back, you've created something that's both a fork and a long-term maintenance burden for you as a vendor, and it's a piece of technology that ties your customer to you in a way that customer isn't necessarily looking for. Part of that is working together with our partners, and that's why we're here today: NEC and Red Hat saying these are all the cool things we're doing together to make NFV real. You'll see some of the specific work activities that we've done upstream to enable NFV as a platform. The last piece there is the robustness, stability, and quality of the software that comes out of these open source projects and is delivered as products. You're talking about platforms that are already running significant infrastructure: trading platforms, air traffic control platforms, well, I was going to say train systems, but it's kind of awkward timing, but train systems as well. So there's a lot of interesting effort going into building these as enterprise-quality platforms that are critical to the telco environment as well.

This is just a simple view of how we take those individual projects and turn them into product components. You see Neutron and Nova on the left side from the OPNFV picture; translate that over to what we call RHEL OSP, that's Red Hat Enterprise Linux OpenStack Platform. In that upper box you see compute, network, storage. Those are the obvious key components of a cloud, and the management piece really is what allows you to deploy and then continue to monitor, maintain, and upgrade your infrastructure. Underneath that are these other key components. We're all familiar with KVM and its user space component QEMU; the library on top of that that gives it a programming interface is libvirt, and we've had to make a lot of modifications in the KVM and libvirt layers as we're trying to expose features up through OpenStack. At the bottom you see an SDN controller and Ceph, or potentially Gluster, as a storage technology; they're at the bottom in the sense that they cross all of the compute nodes in a data center. For us the SDN controller is either something available through partners, through a third party, or, we're working heavily on the OpenDaylight project and hope to bring that into our product portfolio in the near future. On the right hand side you see something called CloudForms. There's an upstream project called ManageIQ, and this is really an orchestration framework. It's a tool that allows you to manage your infrastructure. It has a history of taking events from your infrastructure, responding to those events, changing how you're doing resource allocation, and giving you a catalog of resources. That type of activity is similar to what you see in the management and orchestration layer in the ETSI NFV reference architecture. So if you look at what we're doing in OPNFV, we talked about some of those specific pieces.
We talked about OPNFV being a mechanism for tying those pieces together. On the left hand side, that's just the platform, and in OPNFV one of the things we're trying to build is testing infrastructure. So we want to take all these pieces, modify them upstream, and bring them back into OPNFV. What do we do with them? We have a project called Pharos, which is geared towards building actual physical lab infrastructure so that we can run this environment: use a traffic generator and run telco-type workloads on top of this platform to validate that it's actually doing what we've set out to enable it to do. Some of the components that build up the OPNFV project help us get to that point. The green stuff on the left hand side, the far left box, is continuous build and integration. We need a CI system; we need to be able to bring these pieces together, assemble them, and then launch them into a testing phase. So you see code names, internal to the OPNFV project, of projects focused on deployment, the integration step, and the testing step. And finally, what's really going to be important going forward. This is just foundational building blocks: how do we make this whole system work, validate, test, iterate; we want to make sure we have a stable platform to start from. What we're really trying to do is bring it forward, and on the right hand side you see the requirements projects. We'll talk a little bit about some of those requirements projects later. In those projects, we're looking at what the functional gaps are in the OpenStack, Linux, KVM combination that we're using to build the network functions virtualization infrastructure layer, and then how we take those functional gaps, turn them into development efforts, and actually solve the problems we're trying to solve for the telco use case.

Okay. So, looking a bit at what the motivation actually is for the industry to invest so much money into NFV. For me, it's really three things, automation being the most important one: being able to deploy systems at large scale in a fully automated fashion, having the full lifecycle management automated, elasticity of systems. Then, as a result, we get all the benefits of flexibility. If you want to roll out new services, and that's all going to happen in 5G networks, right, you reduce deployment times from days, as today, to seconds: adding new services and removing services without having to change anything in the physical infrastructure. That's a key benefit. And then, maybe actually less important, is the whole cost efficiency through consolidation. It's also important to have good performance and so on, but the first two are really the key drivers for NFV at this point.

But specifically, when we talk to our customers, we get a list of quite concrete requirements. Availability in the virtualized networks, and fault management for that: remember the five-nines ATCA systems; in the cloud, these things are all a bit different, and our mission is to provide the same level of availability with a more flexible approach. Faults can happen; you have to be able to detect them and react accordingly. Performance: you want to virtualize, but you also really want to take advantage of the hardware capabilities.
Especially in certain telco functions, it's really important to squeeze the last bit of performance out of the function. And building larger systems requires new forms of orchestration, across data centers, across domains. That's also an important requirement we are getting. For the rest of this talk, we're going through two selected work items that the industry at large, and NEC in particular, have been working on. This is our complete list. Detecting and notifying about hardware failures, which is what I just mentioned: you need to really know what's going on in the infrastructure to be able to react correctly. Collecting information and configuring VM allocation, that's the whole CPU pinning and NUMA topic. Orchestration I mentioned already. OpenStack availability, so the availability of the VIM itself. Physical server scale-out and live system upgrades that minimize the impact on overall system availability. More advanced VM connectivity, which has also been discussed in some talks today, and then better ways to control virtual machines. Today we are focusing on the first two: what's the motivation behind them, what are our approaches, and what's happening in the open source community?

As I mentioned before, we know failures can always happen. Everything has a limited lifetime and can develop problems. The question is more how to avoid impact on critical service availability. Of course that's a general requirement for all kinds of networks, enterprise and telco, but in the telco environment, not being able to achieve this can have really dramatic consequences. So that's a really important thing. ATCA solved it by investing heavily in standby systems, intensive monitoring of all the functions and all the blades in the rack, specific monitoring blades per box, and also a tight integration of the ATCA system into the network operator's management system. There was always a very detailed and deep view from the management system into what was going on in those boxes. Obviously, in the cloud, this has to be done differently. We have to think about what the right way and the right level of telemetry and reaction is, without losing the benefits of virtualization, without losing the ability to move systems around, to scale out, and so on.

In the ATCA world there were whole taxonomies of what can go wrong and of how this information can be used to predict more dramatic failures at some point: the physical machine can fail, the chassis can have problems, storage can have all kinds of problems, the network can have different problems. There was quite some effort in analyzing this very, very deeply. And telecom operators today still have the desire to be very well informed about what's going on in the system, maybe not down to that very detailed level, but it's still an important requirement, at least for certain functions.

So in the NFV world, we're seeing three different approaches at the moment. The first approach is basically reporting hardware failures through the hypervisor to the VM, and from there to the element management system and then to some higher-layer orchestration system.
This would basically mean the VM, and potentially the VNF application itself, would be aware of what's going on. You can see this a little bit as the mechanism that systems ported directly from previous platforms would use, where you still want to maintain the integration with the management system. That's still something you have to be able to do today. Option two would be the more cloud-ready approach: escalate problems to the VIM, have mechanisms for automated reaction, and then make an escalation decision about at what point you need to inform higher layers so they can react accordingly. And of course there are also existing monitoring systems that can cover certain functions in these cases.

It's interesting to compare the first two. For option one, one question is how to actually report problems from the hypervisor to the VM. You have to think about how to relay error notifications, how to emulate this essentially, and what functionality you would have to add to the guest OS to deal with this information. For option two, if you work with OpenStack, you probably know there's Ceilometer, for example, for monitoring and storing events. And the cloud approach would be to use a general orchestration system to react to failures; a minimal sketch of what such a Ceilometer alarm could look like follows below.

Looking at what has been done in this space, especially for option two, which I think is the desirable mid- to long-term approach, quite some work has actually been done by the community to address Ceilometer performance topics, how to deal better with high volumes of measurement data: time-to-live extensions to Ceilometer, and new work on Gnocchi, the time-series database, that's currently going on. That's good, and that's the whole industry working on it: operators providing requirements, and developers at Red Hat and other companies addressing those issues. That's a very good step. It's also good to see that OPNFV and its Doctor project are identifying further requirements and working on implementation plans; more on that later. If you're interested in the topic in general, Russell Bryant had a nice blog post on availability topics for OpenStack, so I recommend you check that out. This is a nice example of how the industry is trying to address these really strict requirements from operator customers and arrive at a reliable but still flexible cloud-based solution.

The next topic we call collecting information and configuring VM allocation. Take this example from our vEPC system running in a virtualized environment. We have different functions in different VNFs: some of them deal with signalling communication, others are really forwarding the user-plane traffic. For some of them it makes sense to collocate them onto one box and really optimize the communication between them. And you can see that if you're dealing with a user-plane gateway, you really don't want to lose any performance by having the application move from one processor to the other. You really want to be sure that you have an optimal memory allocation in these cases.
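Here is the promised sketch of the option-two pattern: a telemetry alarm whose reaction is delegated to an orchestration hook. This is a hedged illustration only, assuming a Kilo-era python-ceilometerclient and its v2 alarm API; the credentials, the webhook URL, and the meter choice are placeholders invented for this example, not something shown in the talk.

```python
# Rough sketch: a Ceilometer threshold alarm that calls out to a webhook
# when a compute node looks overloaded. Assumes Kilo-era
# python-ceilometerclient; the meter assumes the nova compute-node CPU
# monitor is enabled. All names and URLs below are placeholders.
from ceilometerclient import client

cclient = client.get_client(
    '2',
    os_username='admin', os_password='secret',
    os_tenant_name='admin', os_auth_url='http://keystone:5000/v2.0')

alarm = cclient.alarms.create(
    name='host-cpu-overload',
    type='threshold',
    threshold_rule={
        'meter_name': 'compute.node.cpu.percent',
        'comparison_operator': 'gt',
        'threshold': 95.0,
        'statistic': 'avg',
        'period': 60,
        'evaluation_periods': 3,
    },
    # Option two from the talk: escalate to an orchestration hook that
    # decides whether to migrate, evacuate, or inform higher layers.
    alarm_actions=['http://orchestrator.example.com/escalate'],
    repeat_actions=False)
print(alarm.alarm_id)
```

The interesting design point is that the reaction logic lives behind the webhook, so the escalation policy can evolve without touching the telemetry layer.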
So all these NUMA and CPU pinning topics have a real justification behind them. Of course that's not only in telco networks; we know that other kinds of workloads have similar requirements. But with these really hard performance requirements, vendors, including us, started to develop their own approaches to implementing this before all the open source solutions existed. And the way to do it is that you have to actually be aware of what your infrastructure provides, what the architecture of your compute node looks like and so on, to be able to make good decisions about allocation. You have to be able to express certain requirements for VNFs, for VMs, and then do the allocation and do things like CPU pinning and RAM allocation.

What our previous range of products did was to have a concept for expressing requirements for VMs, basically some kind of flavor approach, as we would call it today: being able to set the resource control level for different types of VMs. For example, for some VMs it's really important to do CPU pinning, to disable crossing of NUMA nodes, and to disable sharing of physical cores. In the example here, assume you have this critical VM 2, which is initially configured at level one of three; that basically means we don't disable NUMA-node crossing. If you turn that on, you make sure the whole VM stays on one NUMA node, and similarly here you would disable sharing of physical cores, thereby making sure this critical function really does not share cores with any other application. That's state of the art in virtualization today. So what we are doing is basically trying to transfer the way our orchestrator used to work over to OpenStack, generalizing the approach so it makes sense not only for the telco environment but also for other types of deployments.

And quite some work has been going on there, partially already completed: the virt driver guest vCPU topology configuration was part of Juno; NUMA node placement is part of Kilo, as are the virt driver pinning of guest vCPUs to host pCPUs, large page allocation for guest memory, and PCI-based NUMA scheduling. This is all work where companies like Red Hat have invested significant development resources, and where the whole industry provided requirements and contributed development resources. Quite a lot has been going on, and that's quite an achievement, I have to say.

Just want to add one thing to that. You mentioned something important, which is that the initial attempt to do this, which NEC and a number of other folks in the industry had worked on, was often unique to their view of the world. So when these requirements were surfacing from the telco world to the OpenStack world, they looked really foreign to OpenStack people. One of the concerns was: you're proposing an interface that says let me take this virtual machine and stick it on that machine right there, on those CPUs, with those banks of memory, and the cloud side of the house is saying you're out of your mind. That's absolutely crazy. It breaks the abstraction that we're trying to build with the cloud.
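For concreteness, a minimal sketch of where that back and forth ended up: the requirements are expressed declaratively as flavor extra specs, so the tenant never points at a specific machine and the scheduler keeps its freedom. This assumes the Kilo-era extra specs named above and python-novaclient; the credentials, flavor name, and sizes are placeholders for illustration.

```python
# Sketch: encode Dirk's "resource control levels" as Nova flavor extra
# specs (Kilo-era). Placeholder credentials and sizes throughout.
from novaclient import client as nova_client

nova = nova_client.Client('2', 'admin', 'secret', 'admin',
                          'http://keystone:5000/v2.0')

# A flavor for a latency-critical user-plane VNF.
flavor = nova.flavors.create(name='vnf.userplane', ram=8192, vcpus=4,
                             disk=40)

flavor.set_keys({
    'hw:cpu_policy': 'dedicated',   # pin vCPUs, no sharing of host cores
    'hw:numa_nodes': '1',           # disallow NUMA-node crossing
    'hw:mem_page_size': 'large',    # hugepage-backed guest memory
})
```

Note that the interface stays abstract: where the VM lands is still the scheduler's decision, which is what kept the cloud model intact.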
And through this sort of back and forth, asking what the real problem statement is that you're trying to solve, not what solution you're coming in with, we converged on this set of blueprints and merged code that allows us to get the same level of packet processing efficiency without breaking the cloud model. And in fact, to a certain degree, as Dirk mentioned, this is useful for other use cases: you see in Amazon, for example, you can get an HPC instance, and you could probably guess what's under the hood of that HPC instance. Right.

And so, looking a bit into the future, we mentioned the OPNFV Doctor project before. This is one of the projects in OPNFV that is initially developing requirements: what do we still need for fault management, what is not yet there? And then thinking about how we can implement or design this in the right way and contribute it to upstream projects. I'm not sure how much you've heard about OPNFV Doctor so far, but we have just released our first public document describing the whole idea, the concrete requirements, and also some implementation ideas. Check it out on the OPNFV website. There are two use cases we have described. One is the fault management use case: assuming we have a hardware failure on a physical machine in our infrastructure, what would be the best way to analyze this in the VIM, and how do we decide how to escalate it to higher-layer orchestration systems? For example, figuring out which VMs were actually running on that box, who owned them, and what the right way of informing higher layers is, for example to enable an active-standby failover; there's a rough sketch of that mapping step below. The other one is a maintenance use case. Even in the cloud we have to change disks or upgrade systems at some point. The idea is that the cloud data center operator at some point says, okay, I want to do maintenance on this box, and then, in the same way, we map this physical machine to the affected virtual machines and notify higher-layer orchestration. The document I mentioned describes this in more detail and also derives a concrete list of requirements.

There have been two blueprints coming out of that work so far, one on Ceilometer and one on Nova: a notification-driven alarm evaluator for Ceilometer, and extensions to the Nova API. They have been discussed this week, and so far we have gotten quite positive feedback. For OPNFV, I think, that's quite a good result in this startup phase of the project; it shows how the industry is currently working on making fault management even better, working with the upstream projects. There's the link to the website for you there. Do you want to take it from here?

So that's what we wanted to talk to you about today. First and foremost, the telecom world, the telecommunications service providers, are reinventing their networks, moving all of this infrastructure from function-specific hardware, proprietary hardware, functions that are trapped in hardware in expensive and difficult-to-roll-out ways, to an open source infrastructure. And we believe that we can use these components to build the right level of performance and availability needed to address this market segment.
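Here is that sketch: the mapping step of the Doctor fault management use case, finding which VMs sat on a failed compute node and who owns them so the right consumers can be notified. It's a hedged illustration using python-novaclient's admin-only host filter; the host name and credentials are hypothetical, and the actual notification is left as a comment.

```python
# Sketch: map a failed compute node to its affected VMs and owners.
# Assumes admin credentials; the failed host name is a placeholder.
from novaclient import client as nova_client

nova = nova_client.Client('2', 'admin', 'secret', 'admin',
                          'http://keystone:5000/v2.0')

failed_host = 'compute-03.example.com'   # hypothetical failed node

# Admin-only filters: every VM on that host, across all tenants.
servers = nova.servers.list(search_opts={'host': failed_host,
                                         'all_tenants': 1})
for s in servers:
    print(s.id, s.tenant_id, getattr(s, 'OS-EXT-STS:vm_state', 'unknown'))
    # A Doctor-style consumer would now mark the VM as faulty and tell
    # the owning VNF manager to trigger its active-standby failover.
```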
Also important, just by being at the OpenStack Summit: this NFV platform is, again, more than just OpenStack. It's a broad set of projects, and we're really looking to impact the entire stack, so that we build this combination into a service-provider-capable infrastructure, so that the service providers can bring the same web scale and web agility to their data centers that we're seeing in the modern web-scale applications on the internet. And then, as I mentioned earlier, it's really important, both from a Red Hat and NEC perspective and we believe for the industry in general, to focus on upstream. All the development that we do is pushed directly upstream, all the conversations that we have. We try to engage in the community rather than doing it behind the scenes and creating solutions that are either locked into a particular vendor's platform or bolted on to the side in a way that, when it gets to the community, is really not an acceptable solution. So upstream first is a critical part of the development process here.

In OpenStack, we have a telco working group. For those of you who are here, who are interested in NFV and telco features and are not in the telco working group: who is that, by the way? You're all part of it? I don't believe you, because I don't recognize all the faces. Please come to that working group as a way to organize our efforts; what we're doing in OPNFV is directly related to this. In OPNFV, what I like to say is: if the OPNFV community and the OpenStack telco working group community aren't the same people, we're fundamentally doing something wrong. Our goal is not to have this external group, OPNFV, off doing stuff by itself, but to be really a part of each of the respective communities that are important. So we're kind of blurring the lines and merging together with this working group, and ultimately, as we gain awareness and credibility in the community, we don't necessarily need a focused working group; we just have people who are part of the broader community. And OPNFV is there to help take that industry-defined reference architecture and turn it into an open-source-built reference implementation, and OpenStack and the relevant projects we talked about today are really critical to that. So thank you for your time; we're really excited about this. Questions? Yes. I think you have to use the mic because we are being recorded. Sorry.

So you talked about this hardware fault detection and notification to the guests. Shouldn't the guests normally not make any assumptions about whether the underlying server has that capability, since some other server may not have it, and just do some application-level keepalives so that it runs regardless of whether the server has such a capability or not? I mean, shouldn't a good VNF do its own fault detection? Any comments?

Well, first of all, from my point of view, totally agree. And what Dirk was saying is that there are applications right now trying to run on these platforms that don't have that capability built in. So there is a mechanism and a method for injecting faults into those guests: when they were running on bare metal they used to be able to detect faults locally, and this sort of preserves that kind of behavior. But long term, the right solution is: faults can happen, you build your applications in a cloud-aware way, so you're prepared to deal with faults at the application level.
Absolutely, and that's one of the great things about the NFV transformation of the network: it creates opportunities for new people to come in and build new services for the telcos that are designed from the beginning in that fault-aware, cloud-application style. So absolutely. What we are seeing is that there are of course existing investments, that's one thing, but there are also mindsets that need to change, about how to develop VNFs in a cloud-aware way. This is happening, but until it has fully happened there's still demand for the other way. Yeah.

Yeah, on your slides you had a couple of projects in Nova that had to do with tuning compute performance, like CPU pinning and such. I didn't see anything that had to do with improving the performance of the actual network, like where the compute nodes are positioned in the network, or how to increase the performance of the network itself. So my question is: is there any work being done there? Is that of any importance to OPNFV? Do you know of any projects within OPNFV that are looking at that?

Right, so OPNFV has a data plane performance project, which is basically looking into compute infrastructure performance improvements. Where I see this becoming important is if you look into service chaining, where today we are trying to figure out how to implement service chaining at all: how to do the encapsulation right, how to attach metadata. But I think in the future it will also be important to do this efficiently, to avoid having too many hops in inconvenient locations and so on, so co-locating functions and so on. OPNFV has two projects on service chaining, and it's possible that these aspects would be addressed there.

We also have two projects on data plane acceleration: one is really broadly about how you do performance measurement of a forwarding device, and one is more specific to essentially the integration of DPDK and OVS. That's host-level packet acceleration. And then there's work that NEC and Red Hat have done together to enable this: first you want to do quick packet processing in the host, and then you need to deliver it efficiently to the guest. The traditional IO stack that delivers packets through KVM is virtio. There's been a specific project using shared memory between the host and the guest, presenting that shared memory to the guest as a network interface card, with a specific DPDK poll mode driver to pull packets out of that shared memory segment. So from the host NIC, you have something that essentially allows you to DMA almost directly into the guest; not quite, but almost. So there's absolutely a lot of focus on that. But the CPU pinning, the IO awareness, the NUMA awareness, that's actually part of packet processing acceleration, because some of this is CPU-bound work, and you can't do that efficiently if you're not confined to a NUMA node. So they're all kind of complementary. Oh, I think we're being thrown out. Thanks for the questions. If you have more questions, come on up.
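As a footnote to that last answer, a small sketch of what the NUMA awareness in the packet path means in practice: Linux reports a PCI NIC's NUMA node in sysfs, so you can check whether a NIC and a pinned guest actually share a node. The interface name and the guest's node below are hypothetical placeholders.

```python
# Sketch: verify that a NIC and a pinned guest share a NUMA node.
# Linux exposes a PCI device's node at
# /sys/class/net/<ifname>/device/numa_node; -1 means no affinity reported.

def nic_numa_node(ifname):
    with open('/sys/class/net/%s/device/numa_node' % ifname) as f:
        return int(f.read().strip())

guest_node = 0                       # node the guest's vCPUs/RAM sit on
nic_node = nic_numa_node('ens1f0')   # hypothetical interface name

if nic_node not in (-1, guest_node):
    # Every packet between NIC and guest memory would cross the
    # inter-node interconnect, the overhead NUMA-aware placement avoids.
    print('NIC on node %d, guest on node %d: expect cross-node traffic'
          % (nic_node, guest_node))
```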