Okay, I think we can get started. My name is Adrian Hoban.

Hi, I'm Isaku Yamahata.

And we'd like to welcome you here today to talk about networking in an OpenStack context. One of the things Intel has been a keen supporter of and contributor to over the years is open standards, and open source software stacks in particular, and we're really excited to be part of this OpenStack community. I think there's a huge amount of interesting work we can do together, as the number of talks already this week about SDN and NFV shows.

I'm going to gloss over this slide pretty quickly, but when we think about networking, it's really the combination of these two paradigms. When you think about SDN, it's the separation of the control plane and the data plane, which gives you a logically centralized view of your network and enables various innovations you can introduce. The NFV side is more about leveraging standard IT infrastructure: how do you move network functions into virtual environments, and how do you deploy them on industry standard high-volume servers? The two paradigms are quite distinct, but they are mutually beneficial, and the combination offers great opportunities for innovation, for OPEX and CAPEX benefits, and for bringing new services to your subscribers faster.

Now, in that context of SDN and NFV, we often think about that top layer of appliances from more of an enterprise angle: load balancers, firewalls, switches, routers, those types of devices. But when you consider the whole gamut of things we're looking at from an SDN and NFV perspective, there's another huge section of the market we're looking at virtualizing. It goes everywhere from the wireless core, with evolved packet core type appliances, all the way out to the edge with border gateway devices, and right up into the access space. So there's quite a significant market to address, and one of the things we have to think about, and will talk more about in this session, is the implication of deploying these types of workloads in an OpenStack-managed environment.

A lot of the thought process around that is about moving away from this type of environment, where you've got a more traditional way of developing, particularly for those appliances I mentioned: a very monolithic, vertically integrated solution, generally deployed on proprietary hardware and systems. What we want to move to is the situation on the right-hand side: take all of these applications, move them into virtual machines, and deploy them on industry standard high-volume servers. A lot of the specifications around this are being developed in the ETSI NFV community (the European Telecommunications Standards Institute). That's a grouping of the world's leading service providers and telecom equipment manufacturers, looking at what it takes to move these types of applications into this domain, what the specification requirements are, and how that then impacts the various standards initiatives.
Now, all of those types of appliances bring very different workload characteristics to an OpenStack-deployed environment, and we need to think about that. In many cases, particularly if you look at wireless or wireline cores, imagine the case of your cell phone call: you don't want jitter on that, you want a nice clean voice connection, and you want to make sure you have support for 9-1-1 or 9-9-9 type emergency calls. What we're really doing here is transforming a country's infrastructure, and as a result this is subject to a lot of regulatory constraints. There are a huge number of standards these virtual appliances have to comply with. You're rarely going to get a greenfield deployment, so you have to make sure all of these devices interoperate with existing discrete appliances. So we've got to bring some considerations from that space into OpenStack.

Once you move into this carrier-grade environment, we often talk quite a bit about high availability in an OpenStack context, but in many cases that's more of an enterprise or public/private cloud style of high availability. The carrier-grade benchmark for high availability is quite a bit higher. If you think about the service contracts carriers have with their customers, you don't want to drop calls, and on the regulatory side of things there can actually be financial implications for certain outages. So you have to get this aspect right.

We also have to look at how we schedule in an OpenStack cloud. It's not just about compute, or storage, or networking when you want to deploy a particular appliance; you need all three of those working in unison, and I think the community has some work to do in order to move forward on that.

Responsiveness is an area I think we need to think more about: things like alarm generation and metrics generation. You have to be able to respond quickly to events in the system; if there is an outage, you need to address it quickly and within certain guaranteed bounds.

Manageability: there isn't really good lifecycle manageability today in OpenStack, so Isaku is going to talk about some of the things we're doing there around a service VM framework. And when you deploy in an NFV environment, the service providers have massive investments in the higher-layer orchestration software, things like operations support systems and business support systems. So we need to work on how something like an OpenStack environment interacts with those upper layers.
Often there will be a few components between them, but nonetheless the requirements of integrating OSS and BSS downwards are going to impact what you need to do and the type of interfaces we need to provide in OpenStack.

Predictability is another characteristic we need to look at. It's not just the predictability of OpenStack itself as the management plane, but the predictability of being able to deploy workloads. For instance, it's not really acceptable to say I'm going to deploy a VM without understanding whether the platform you're deploying onto is suitable and ready for it, whether the various data plane networks are up, alive, and ready to go.

The next two I'll talk about together, because it's not just about performance, and we can't think of scaling out as the only method to get the performance and the type of capabilities you need in a carrier environment. You have to combine high performance I/O with predictability, with low latency, with low amounts of jitter. Looking at these two things together is really starting to influence a lot of the thought process around some changes we have to make.

Collectively, all of this is going to influence OpenStack. We're working with many folks in the OpenStack community, for instance Ericsson and other members of the Network Builders program that we have running; if you're interested in that, please come up afterwards and we can chat some more about it. But ultimately what we have to do now is influence the thinking around how we get some of these requirements into OpenStack, because it's very different from the enterprise private cloud mindset. Isaku will now talk more about the manageability side.

Thanks. I'd like to introduce one of our contributions to OpenStack. One of the key components of NFV is lifecycle management, and we call this the Advanced Network Service Framework. In the OpenStack community there are several efforts to support NFV with Neutron networking, but the scope of NFV is rather large, so there are several missing building blocks for network function virtualization. One of those key components is lifecycle management, and that is what we are now contributing. Today, each Neutron service plugin or service device driver implements its own lifecycle management, which increases development cost, and because each one does lifecycle management in its own way, the interfaces are inconsistent. Also, users don't care, and often don't even know, whether a service is actually realized as a virtual or a physical appliance, so it is quite desirable for users to be able to manage those services in a uniform manner, regardless of whether they are realized virtually or physically. It is therefore desirable to create a common code base that provides a unified interface for lifecycle management. For vendors, it lowers the bar to providing a virtual appliance, and at the same time, for users, it provides a consistent, uniform interface for lifecycle management. We call this framework the Advanced Network Service Framework. This slide shows the block diagram of the framework.
It consists of three main components. The first component takes care of lifecycle management; this is the heart of the framework. It keeps track of which service VMs are created and used, and it also tracks which network services are in use and how they are configured. The next component is a REST API, so that cloud administrators can manage service VMs and services through a unified interface. The last one is a communication interface. Here, communication means communication between the Neutron service and the service VMs and the services running in them. The OpenStack services are owned by cloud administrators; on the other hand, the service VMs are owned by cloud users, so there is a security boundary between them, and it is not so easy for the Neutron servers and the VMs to communicate across that boundary.

So let's have a look at how this framework works. Suppose a cloud user wants to publish their web application. They have web application servers and a backend database server. Before publishing the service, they want to deploy a firewall service between the application servers and the backend database server, for security reasons. When the service insertion is requested, the framework notices there is no service VM available for the firewall, so it talks to Nova to spin up a service VM. Nova spins up the service VM, then the service agent within the service VM starts up. At this point no configuration has been done; it is in a kind of blank state, so the configuration the user defined needs to be pushed into this service agent. The vendor driver sends its configuration data to the configuration agent, and the configuration agent injects the service configuration into the service VM. The service agent then knows how to configure the service, and the service is enabled and working. We also want to insert a firewall between the public internet and the application servers, so this process is repeated: a service VM is spun up, the service agent starts, and the configuration is injected into the service VM. Now the firewall is deployed, and it is ready to publish the web service to the end users, who will access the deployment like this.

As the development went on and the discussion in the community continued, it was pointed out that this framework is not actually specific to networking; it is generic and useful for other OpenStack projects as well. So a discussion is going on to create a new project dedicated to this lifecycle management. It will be discussed at this OpenStack Summit, and I think it will be decided to officially create a new project, so this framework will be moved out of the Neutron project, and after this summit the incubation process will start. This matrix shows the current status, meaning what we have right now: the implementation of the proof of concept is almost done.
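To make that insertion flow a little more concrete, here is a minimal, hypothetical sketch of the sequence just described. Every name in it (ServiceVMManager, boot_vm, push_config and so on) is invented for illustration; this is not the actual Advanced Network Service Framework code, which at the time of this talk was still a proof of concept.

```python
# Hypothetical sketch of the service-insertion lifecycle described above.
# None of these class or method names come from the real framework code.

class ServiceVMManager:
    """Tracks which service VMs exist and which network services they host."""

    def __init__(self, nova_client, config_agent):
        self.nova = nova_client           # used to spin up service VMs
        self.config_agent = config_agent  # pushes config across the admin/tenant boundary
        self.service_vms = {}             # service name -> VM handle

    def insert_service(self, service_name, user_config):
        # 1. Re-use an existing service VM if one is already tracked.
        vm = self.service_vms.get(service_name)
        if vm is None:
            # 2. No service VM available: ask Nova to boot one.  The service
            #    agent inside the image starts in a blank, unconfigured state.
            vm = self.nova.boot_vm(image="firewall-service-vm", flavor="m1.small")
            self.service_vms[service_name] = vm

        # 3. The vendor driver's configuration is injected into the service
        #    agent running inside the VM; only then is the service enabled.
        self.config_agent.push_config(vm, user_config)
        return vm


# Example usage: insert a firewall between the application tier and the
# database tier, then repeat the flow for a firewall facing the internet.
# mgr = ServiceVMManager(nova_client, config_agent)
# mgr.insert_service("firewall-db", {"allow": ["app-net -> db-net:5432"]})
# mgr.insert_service("firewall-public", {"allow": ["internet -> app-net:443"]})
```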
I've tried to push these patches into Neutron upstream, but the discussion on the new project has started, so the merging process has been suspended for now. The new project is about to start, and there is a lot to do, so there is plenty of room for you to contribute; please join this project.

Thanks, Isaku. So now that we've talked a little bit about manageability and adding in lifecycle management, I want to go back to what we discussed in terms of performance and predictability and the traffic profile. The first thing to think about is: why would you even bother doing optimization and configuration on the system? I took a small sample of some performance data points we've collected over time. The top left one shows the Advanced Encryption Standard New Instructions (AES-NI) for crypto processing, in this case in an IPsec workload, where the gap between the bottom lines and the top ones is the difference between optimizing and tuning your system versus just blindly deploying. Top right is DPDK, the Data Plane Development Kit, which we'll talk more about, and how it drives the I/O-related processing capability. Bottom left, if you're looking at crypto-related or compression offloads, there's huge capability there for PCIe accelerators to get involved. And on the bottom right, another instruction-set-based optimization: leveraging the Advanced Vector Extensions for some crypto workloads. The key point is that there's a huge amount of potential in your infrastructure if you tune and configure it properly, and in an NFV context that's particularly important.

What we really have to do now is unlock some of that potential and make it easy to use. I'd readily accept that not everybody is going to want to use this; it's not for every use case. But for some of those NFV cases I talked about, they really have to get access to this type of capability, and if they don't, you're not going to get the type of performance you need to be successful. So one of the things we've got to do now is expose these CPU and platform level features into Nova, into the scheduler, so that we can provision based on workload requests.

A brief look at the scheduler, just to put it in context. We've got a filter scheduler running in Nova, and it's really a two-part process. There are twenty-plus filters in there right now, ranging from host aggregates to CPU load to disk allocation, and there's also this compute capabilities filter. You apply all of these filters; they're binary in nature, so it's a pass or fail. You can chain them together, and based on that you come up with the right subset of platforms for your input configuration. Once you've got that subset of platforms, you go into a weighing discipline, and right now that's really RAM-based utilization.
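As a rough illustration of that two-part filter-then-weigh process, here is a small sketch. The host attributes and the RAM-based weigher below are simplified inventions that only mirror the shape of Nova's filter scheduler; they are not its real code.

```python
# Simplified sketch of Nova's filter-then-weigh scheduling pattern.
# Host attributes and the RAM-based weigher are illustrative only.

hosts = [
    {"name": "node-1", "free_ram_mb": 96000, "cpu_flags": {"aes", "avx"}},
    {"name": "node-2", "free_ram_mb": 48000, "cpu_flags": {"sse4_2"}},
    {"name": "node-3", "free_ram_mb": 160000, "cpu_flags": {"aes", "avx2"}},
]

def passes_filters(host, request):
    # Filters are binary: a host either passes every filter or is dropped.
    has_ram = host["free_ram_mb"] >= request["ram_mb"]
    has_features = request["required_cpu_flags"] <= host["cpu_flags"]
    return has_ram and has_features

def weigh(host):
    # The default weigher at the time favoured hosts with the most free RAM.
    return host["free_ram_mb"]

request = {"ram_mb": 32000, "required_cpu_flags": {"aes"}}
candidates = [h for h in hosts if passes_filters(h, request)]
ordered = sorted(candidates, key=weigh, reverse=True)
print([h["name"] for h in ordered])  # e.g. ['node-3', 'node-1']
```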
The weighing is really deciding in what order you try to deploy VMs onto those platforms.

One of the things we've released into Icehouse and libvirt are some changes that allow the full CPU feature set to be exposed up through the libvirt layers into the Nova compute libvirt driver. That allows the Nova database to have access to the full range of CPU flags you've got, so everything exposed with CPUID; you don't have to care about the particular instruction, but it gets exposed up through the Linux and libvirt subsystems. The impact is that if you have a workload and you've gone to the trouble of optimizing it for a particular instruction set, or you've just used some compiler optimizations and want to target a particular CPU type or generation, you can now specify that through the flavor. In the example I'm showing, say you wanted AES-NI for some crypto operation, and there may be a PCIe accelerator for crypto offloads, and you've got a heterogeneous setup in your infrastructure: some platforms with these capabilities, some without, some a mix. The combination of the PCIe filter, to find the right PCIe device, and the extensions the compute capabilities filter can leverage, can now get you to the right subset of platforms. Then you go into the normal weighing discipline, which stays as-is.

Now, extending things like this does proliferate more metadata through the system, and there's a very interesting project in development called Graffiti. There was a talk on it yesterday, which I hope some of you got to see; Graffiti could be a very interesting way to take some of this metadata and up-level it so that it's more manageable in the environment.

Once you move on from this feature mapping, which is "I have this VM, I've developed it for these features, give me the right platform", you then have to configure the thing properly, and in an NFV and SDN context there's a huge amount of configuration you can do that gives really incredible benefits to the workload. Think of NUMA, non-uniform memory architecture: we've got platforms now with multiple sockets and different speeds to get to memory depending on which socket it's attached to. So what we've got to do are things like CPU pinning and isolation, to make sure the workload stays where we placed it; we have to make sure the host OS doesn't interfere with those VMs; and we need to make sure we're not landing other VMs on the platform that are going to move around and disturb your carefully tuned configuration. There's also some interesting work around the I/O devices: make sure the I/O is closely coupled with the CPU you've decided to pin your workload onto, make sure you're getting memory close to that socket, and minimize the amount of QPI inter-socket transfers that happen. This is all work that's starting now.
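Pulling the feature-request and platform-tuning threads together, here is a hedged sketch of the kind of flavor extra specs an operator might set so the scheduler lands a crypto-heavy VNF on a suitable host. The key names below (capabilities:cpu_info:features, pci_passthrough:alias, hw:mem_page_size, hw:cpu_policy) vary between OpenStack releases and pending patches, and some arrived after this talk, so treat them as illustrative assumptions rather than confirmed syntax for any particular release.

```python
# Hedged sketch: flavor extra specs a cloud operator might set so the Nova
# scheduler places a crypto-heavy workload on a suitable host.  Key names are
# assumptions for illustration; check your release's filter documentation.

flavor_extra_specs = {
    # ComputeCapabilitiesFilter-style match on an exposed CPU flag (AES-NI).
    "capabilities:cpu_info:features": "<in> aes",
    # PciPassthroughFilter-style request for one device matching a
    # pre-defined "crypto_accel" alias configured on the compute nodes.
    "pci_passthrough:alias": "crypto_accel:1",
    # Hints for the NUMA/pinning/huge-page work described above; these knobs
    # landed in later releases and are shown here only as assumptions.
    "hw:mem_page_size": "1GB",
    "hw:cpu_policy": "dedicated",
}

# With python-novaclient this would typically be applied with something like
# flavor.set_keys(flavor_extra_specs); left as a comment to avoid asserting a
# particular client version's API.
```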
We're hoping to target that work for Juno, so watch out for the blueprints, and please contribute to them if you have opinions on how we should develop this. This type of configuration really does lead to being able to deliver on some of those performance and predictability characteristics we have to work on.

Moving out of the more compute-centric domain and looking at the networking side, what do we have to do there? A lot of what we talk about is the explosion of east-west traffic. Combine that with getting more and more cores per platform, and then, in time, with smarter and more sophisticated scheduling algorithms. Remember that east-west traffic doesn't typically mean traffic on the same platform within a data center, but if we get smarter about scheduling we can co-locate workloads when a particular node has the capability, and save a lot of fabric-related bandwidth. With that, your need for virtual switching and virtual routing capability and performance really goes up. So there are lots of great opportunities and challenges here that we need to work on.

One of the things we have been doing is working quite a bit on the I/O capability of these generic platforms, these industry standard high-volume servers. What this chart shows is the progression of 64-byte small packet forwarding performance, a really common data point that a lot of the network equipment manufacturers look at when determining how fit for purpose a platform is. The type of increase in throughput we've seen over multiple generations here really lines up with what we've demonstrated on the compute side with Moore's law. What this does is open up the opportunity for network vendors to deploy on a single architecture with a single tool set, to consolidate what they need to do, and to stay on an industry-leading beat rate of micro-architecture and process improvements.

Part of the enabling for all of that is the software suite you need on top, and one thing we've been working on is the Data Plane Development Kit. DPDK is a collection of utilities; at its core is the poll mode driver framework. It allows you to pull packets out of your NIC and get them into user space incredibly quickly, bypassing a lot of the potential overheads you might find in standard networking stacks. There are a number of utilities around that which are particularly important in packet processing workloads: memory management, queueing functions, and flow classification. We released all of this with a really open BSD license back in April 2010, and a completely independent open source community has now formed around it at dpdk.org; Intel will contribute to that too.

What we've done now is take DPDK as a foundation and apply it to the switching domain. We've forked the openvswitch.org code and created the Intel DPDK vSwitch. It's created as a reference architecture: we want to demonstrate how you get the performance capability of the DPDK software suite and leverage it in a virtual switching environment.
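Since both that reference vSwitch and the mainline work build on DPDK's poll mode drivers, here is a purely conceptual sketch of the poll-mode pattern. DPDK itself is a C library; the nic object and handle_packet function below are hypothetical stand-ins used only to illustrate busy polling versus interrupt-driven receive, and this is not DPDK code.

```python
# Conceptual illustration of the poll-mode pattern: instead of sleeping and
# waking on interrupts, a dedicated core spins, pulling bursts of packets
# straight from a NIC queue into user space.  The 'nic' object and
# 'handle_packet' callback are hypothetical, not a real DPDK binding.

BURST_SIZE = 32

def poll_loop(nic, handle_packet):
    while True:
        # Ask the NIC for up to BURST_SIZE packets; the call returns
        # immediately, possibly with an empty list, so the core never blocks.
        packets = nic.rx_burst(max_packets=BURST_SIZE)
        for pkt in packets:
            handle_packet(pkt)
        # No sleep here: the busy loop trades a whole core for low,
        # predictable latency and very high small-packet rates.
```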
We're also contributing into the mainline of openvswitch.org, and there are patches that have been released there now, so if you look at the head of that tree you'll see some of the DPDK-related enabling. You can combine that with work we're doing with a number of vendors to take some of these technologies and pull them into commercial virtual switching and virtual routing solutions.

We were demoing this this week; I think the demo might be closed now, but the performance we were showing was approximately 10x going between a standard openvswitch.org solution, with a virtio-style interface into the VM, and the Intel DPDK vSwitch reference architecture solution. With a DPDK-type guest it's a 10x performance improvement, so there's really incredible performance potential on offer. What we're going to do now is release the patches we've enabled for that on one of the Intel sites, 01.org; that should happen in the next week or so. We'll then take those patches and start to upstream them through the regular blueprint process during the Juno cycle, so please watch out for that.

The patches we're talking about are really about the initial setup and configuration of your vSwitch, because this reference architecture vSwitch is really a replacement of the forwarding plane. The standard Open vSwitch control plane interfaces still exist: you've got OVSDB, you've got your OpenFlow interface. But there are some subtle differences in how you need to configure it on your platform. So what we've created are some patches into Nova, into the libvirt driver of Nova. The first cut targets the VIF bindings: you're in user space now, so you don't have the veth pairs you would use in the kernel for binding between, say, the north side of the integration bridge you set up and the tap device you want to create coming from the VM. Instead we bind that with patch ports, which is just another method of doing it.

The other thing we've put in here is huge page table support. One of the performance methods we use with DPDK to get that incredible increase is to leverage huge pages, typically one gigabyte in size. What we want to extend here is to expose all of that huge-page-backed memory in your platform up to Nova, and start managing the huge pages and the amount of free huge pages you've got. Related work then needed to go into the regular Open vSwitch agent; this is more on the Neutron side. The patches there look at the binding between the physical bridge or tunnel bridge you've got at the bottom of your networking on the compute node and the integration bridge. Again, this is all running up in user space, so you can't use veth pairs; we've moved to a patch port method there too, as sketched below.
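As an illustration of that patch-port binding, the following sketch wires an integration bridge to a tunnel bridge with an Open vSwitch patch-port pair. The ovs-vsctl patch-port syntax is standard Open vSwitch usage, but the bridge and port names are just examples, and this is not the actual Nova/Neutron patch being described.

```python
# Illustrative only: connect br-int and br-tun with a patch-port pair instead
# of a kernel veth pair.  Bridge and port names are examples; run as root on a
# host with Open vSwitch installed.
import subprocess

def ovs(*args):
    subprocess.check_call(["ovs-vsctl", *args])

# Create a patch port on the integration bridge pointing at its peer...
ovs("add-port", "br-int", "patch-tun",
    "--", "set", "interface", "patch-tun",
    "type=patch", "options:peer=patch-int")

# ...and the matching patch port on the tunnel bridge pointing back.
ovs("add-port", "br-tun", "patch-int",
    "--", "set", "interface", "patch-int",
    "type=patch", "options:peer=patch-tun")
```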
There are a couple of other smaller changes in there as well, like how you identify that this is a DPDK-enabled vSwitch versus a standard one. They're all going to be released as separate blueprints and patches into the Nova and Neutron communities. Originally we were going to target a new mechanism driver for this, but we got feedback during the Hong Kong Summit to please work to consolidate on the one Open vSwitch path, so we've taken that on board and all the patches have been updated to work in that environment.

So to conclude: SDN and NFV are driving this network transformation, and a whole set of new requirements is going to come into the system. They're going to need a fundamental mindset change in how we look at the type of patches that are acceptable in the OpenStack community. We are working with others to drive these changes, and I think we need to do a better job of consolidating all the work on that. There are a lot of great ideas coming through on NFV, but I think we have to do more as a community to be cohesive and get behind some of the changes proposed. There are some very interesting challenges here, so it's a great area for technical folks to get involved and invest in. These are tough things to solve, but I think we can if we work together. So the call to action is to work together and make this a platform suitable for SDN and NFV.

Okay, so, any questions?

I think all the results that you published assume trusted VMs, right? And the drivers on the VMs have to be modified; you cannot use the standard virtio type of stuff, right? So the trusted VM issue is a big concern here. How do you see this being addressed?

So the 10x does leverage a DPDK-based guest, so you need to have that in there. But there is a model that doesn't get quite that high performance, around 3x, using a standard virtio interface in the guest. There are other models; I think you might be referring to one called the IVSHMEM, inter-VM shared memory, model, which does require more of a trust base in your guest. But these are options, so you can pick and choose: a DPDK-based guest if you want really high performance, or the standard virtio path if you don't want any trust concerns there.

So if the guest is not trusted, you have to go back and use QEMU essentially in order to communicate with the guest, right? So the performance gain goes down to maybe 2x or something like that?

With a regular virtio interface in the guest you get about two and a half times. With the DPDK-enabled guest you still have security, but the performance is about 10x, and you can go faster again with an IVSHMEM model.

Sorry, I can't hear that. Hi, do you have any update on that PDK program, Intel's PDK for signal processing?

I don't work in that program, but we can chat offline and I'll connect you to the right people.

Okay folks, thank you very much for coming. If you have any questions, please come to us afterwards.