Can we start now? Okay, good afternoon everyone, welcome to our session. In this session we're going to talk about the future of NFV: how do you get past the virtualization phase into more cloud-native NFV? This is a joint presentation by Mellanox Technologies and Metaswitch Networks. My name is Chloe Jian Ma, I'm the senior director of cloud market development at Mellanox, and here with me is Colin Tregenza Dancer, who is from the UK with a beautiful British accent and is the director of architecture at Metaswitch Networks. If you're from the telco space you've probably heard of Metaswitch: they were among the first batch of vendors selected for AT&T's Domain 2.0 program, and they are a VNF software vendor.

We are in the middle of a transformation for the telco industry, and any transformation goes through stages, so let's look at an analogy from the digital transformation. This is a five-year-old writing a blog. He calls himself a digital native: he was born into technology, so iPads and all the other tools come naturally to him. He calls his parents digital immigrants, because they can learn to use new technologies but were not born into this era. And when it comes to his grandparents, forgive his language, he calls them digital retards, because they can barely use a toaster. When it comes to the NFV transformation we're also seeing three phases, from hardware appliances to network function cloudification. The first phase is the cloud Neanderthals: if you look at the telco industry, the vendors are going through a consolidation phase, your Cisco and Juniper are putting a lot more services onto their edge routers, and with that they can do a certain amount of flexible configuration, but it's definitely not very scalable. The first NFV phase I call the cloud immigrants: most of the VNF vendors I talk to are simply moving whatever software they were running on hardware appliances onto virtual machines, without really changing the software architecture at all. With that you do get some advantages, like being orchestrated by a cloud management platform such as OpenStack, but you will hit a lot of scalability and resiliency issues that we'll talk about later. Looking into the future, we think NFV is going to go cloud native: in that phase the services will be re-architected so that they run natively in the cloud, so they can dynamically scale and recover from failure in a scale-out manner. Your infrastructure hardware may not give you five-nines availability, but through the software, the MANO, and the hardware all working together you can achieve very high availability and reliability in a different way.

Let's quickly look at the example of a firewall. Traditionally, pre-virtualization, if you want to scale up a firewall service you need to put in more boxes, and these boxes normally use one-to-one redundancy to achieve high availability. In front of these firewalls you need a session-aware load balancer, in the sense that it really needs to know which packets to send to which firewall box, because these firewall boxes are heavily stateful and they only keep the session information on the box where the session was created.
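To illustrate the session-aware dispatch this pre-virtualization design requires, here is a minimal sketch, not from any vendor, of pinning a flow to one firewall box by hashing its identifying fields; the box names, flow fields, and hashing scheme are all invented for illustration.

```python
# Hedged sketch of session-aware dispatch: every packet of a flow must reach
# the same firewall box, because the session state lives only on the box that
# created it. Flow fields and the box list are invented for illustration.

import hashlib

FIREWALL_BOXES = ["fw-1", "fw-2", "fw-3"]   # each paired 1:1 with a standby


def pick_box(src: str, dst: str, dport: int) -> str:
    """Pin the flow to one box by hashing its identifying fields."""
    digest = hashlib.sha1(f"{src}|{dst}|{dport}".encode()).digest()
    return FIREWALL_BOXES[digest[0] % len(FIREWALL_BOXES)]


# The load balancer must keep applying the same mapping for the life of the
# session; resizing the pool reshuffles flows and breaks established sessions.
print(pick_box("10.0.0.5", "192.0.2.7", 443))
```

A real session-aware load balancer typically keeps an explicit connection table rather than a hash, but the consequence is the same: the dispatcher is coupled to where the state lives.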
When we look at the post-virtualization phase, most of the firewall vendors I talk to run software that is mostly the same as what ran pre-virtualization, except now it runs on virtual machines. It's still heavily stateful, in the sense that the session database is still tightly coupled with the firewall processing array itself, so although your load balancer may be virtualized, it still needs to be session aware, because the firewall virtual machines themselves are stateful. In the next phase, how do we make the firewall service scale much better at the application level? We split the firewall application itself into a stateless firewall processing array and stateful storage. When you separate your state into logically centralized cloud storage, you can scale your firewall processing array up and down almost infinitely, and when you want to recover from a failure, for example one of your virtual machines goes down, you can just adjust the load balancer so that the load gets distributed to the other virtual machines. It's much easier to do service upgrades and fault recovery.

So what does it take to achieve cloud-native NFV? It's really teamwork. From the application perspective, the VNFs need to become stateless and break into microservices. At the orchestration level, and I think the previous speaker also addressed this, you need an intelligent MANO layer that can collect statistics from the infrastructure, see how it operates, and feed that back to the orchestrator, so that you can optimize and adjust your resources in real time. And from the infrastructure side, you must have an efficient infrastructure that can support your cloud-native VNFs and your intelligent MANO.

First let's talk about applications. What do we mean by cloud-native VNFs? You must be able to auto-provision them, without a whole lot of manual intervention, and auto-scale them; this capability is already available in the OpenStack infrastructure, but your VNFs must be able to scale very easily, and auto-heal from errors. How do you do this transformation? Traditionally, VNF applications are very monolithic: although internally they might be broken into components, they're still built into one big image and deployed as one whole entity. In that sense it's very intimidating for the software developers, because even if you're changing just one line of code you still need to build the whole thing, so it's very high risk for fast changes, and, as we mentioned, the VNFs are still heavily stateful. The first step of the transformation is to break that monolithic application into multiple microservices, so that each module is much easier to handle, you have a much smaller failure domain, and each of these microservices can scale independently. It's also a transformation from stateful to stateless, so that the application itself can scale much more easily. The benefits are easy to see: a very small failure domain, better scalability and resiliency, and, because you have smaller components, better business agility for continuous integration and continuous delivery. But it does have an impact on the NFV infrastructure: now that you're breaking the application into multiple microservices, you have much denser VNF instances, which can run in virtual machines or containers, so you potentially have many more VNF instances on a physical server.
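To make the earlier firewall split concrete, here is a minimal sketch, purely illustrative and not from either vendor, of a stateless firewall worker that keeps every session in a shared store so any worker can handle any packet; the packet fields, the toy policy, and the in-process dict standing in for a distributed cache are all assumptions.

```python
# Minimal sketch of a stateless firewall worker whose session state lives in a
# shared store rather than on the worker itself. The store here is a plain
# dict standing in for a distributed cache such as memcached or Redis.

from dataclasses import dataclass

# Shared session table: in a real deployment this would be a networked
# key-value store reachable from every worker in the processing array.
SESSION_STORE = {}


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dport: int
    syn: bool  # True if this packet opens a new connection


def session_key(pkt: Packet) -> str:
    return f"{pkt.src}->{pkt.dst}:{pkt.dport}"


def allowed_by_policy(pkt: Packet) -> bool:
    # Toy policy: only allow new connections to ports 80 and 443.
    return pkt.dport in (80, 443)


def process(pkt: Packet) -> str:
    """Any worker can handle any packet, because the session table is shared."""
    key = session_key(pkt)
    if key in SESSION_STORE:
        return "forward"                      # established session, forward
    if pkt.syn and allowed_by_policy(pkt):
        SESSION_STORE[key] = "established"    # record state centrally, not locally
        return "forward"
    return "drop"


# Because no state is kept locally, a simple load balancer can send any packet
# to any worker, and workers can be added or removed freely.
print(process(Packet("10.0.0.5", "192.0.2.7", 443, syn=True)))   # forward
print(process(Packet("10.0.0.5", "192.0.2.7", 443, syn=False)))  # forward
print(process(Packet("10.0.0.5", "192.0.2.7", 23, syn=True)))    # drop
```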
You also have much higher requirements on storage performance, because your state is no longer co-located with your transaction processing, and you will have a much higher volume of east-west traffic between VNFs. With that, I'm going to invite Colin to talk about how they went through this journey of making their VNFs cloud native.

Thank you, Chloe. For those of you who don't know Metaswitch Networks, we've been around for just over 30 years now, writing complex, highly available software, mainly in the telecoms and networking world, and if there's one thing the last 30 years have taught us, it's that technology changes, and it changes fast, and you either evolve with it or, frankly, you die. There are lots of examples of companies out there that had dominant positions in the world, yet even a few years later they're not around anymore. So what we do is regularly review our product range, look at our customers and what they need, and sometimes make quite radical changes to our product line. We did this with some of our telecoms products in 2011, and it was really inspired by the success of the over-the-top players in the telecoms market, and by trying to figure out, for our customers, how the telecom carriers could compete in this world. We asked ourselves a fairly simple question: what would a free, telco-provided voice, video, and messaging service look like? It was pretty obvious to us that it was going to be in a cloud environment, but we decided to start with a clean sheet of paper, so this wasn't about adapting and carrying forward our existing software; this was how we would do it from scratch. We decided to think like a startup: we wanted to copy cloud design paradigms and leverage as much open source software as we could; there's no point reinventing the wheel.

As Chloe has said, most of the heart of designing for the cloud is thinking about state: where it's stored and how you manipulate it. If we're producing a SIP product, which is what we wanted to do for this market, it's SIP state we're talking about. This is things like subscriber profile state, the configured information about individual subscribers; registration state, the devices that have connected in, knowing where they are and their contact details; and dialog state, you're in the middle of processing a call and you need to know how far you've got, are you alerting, is the call being dropped, that kind of stuff. There's a lot of this state, and every time we process a message we're basically manipulating it: we might be updating registration information, looking at some configured information, or just storing where we are in the call. In a traditional box-based system all of this state is stored within individual boxes, and when you think about things like load balancing, all the rest of it has to be stateful: you have to direct the messages towards the box that has the particular bit of state you need. We decided that wasn't the right way to go about it; we didn't just want to port our existing products forward, we wanted to copy the cloud paradigm.

So here's the architecture we selected. I'm not going to go into all the sordid details, because I suspect not many of you are interested in the finer points of IMS, but I will talk about some of the structural elements. Right on the left-hand side we have Bono. This is the edge proxy, the software element individual subscribers connect into when they activate their device, and it manages things like negotiating firewalls for traffic. Once the user's device attaches to it, it basically stays attached for the duration of a registration. There's a horizontally scaled pool of those, variable in size, and we spread incoming traffic over it just using DNS round-robin load balancing, so we don't need a particular load-balancer box in there; we just use the flexibility in the protocol to do that.
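Here is a minimal sketch of that kind of DNS round-robin spreading: resolve every address published for the edge-proxy name and rotate across them, with no dedicated load-balancer box in front of the pool. The hostname is hypothetical and this is not Clearwater code.

```python
# Hedged sketch of client-side DNS round-robin: the pool is whatever set of
# A records the edge-proxy name resolves to; scaling out is just publishing
# more records for the same name. The hostname below is hypothetical.

import itertools
import socket


def resolve_pool(hostname: str, port: int = 5060):
    """Return every address the DNS name resolves to (the edge-proxy pool)."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


# Each new registration simply takes the next address in the rotation.
pool = resolve_pool("bono.example.com")   # hypothetical edge-proxy name
rotation = itertools.cycle(pool)
for _ in range(3):
    print(next(rotation))
```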
In the middle we have Sprout. This is the bit that's actually doing the SIP routing, and it has to hold lots of dynamic, ephemeral state: where am I in the call, what registration information do I have. Because we want to scale it, that state needs to be spread between all of the instances, and for that, as I'll come on to in a minute, we chose to use memcached, to allow us to spread that dynamic SIP information amongst all the instances. That's complemented by two other horizontal pools of servers: Homer, which stores XML documents describing the configuration required for the subscribers' devices, and Homestead, which stores a cache of subscriber profile information; that would naturally live in an HSS, but for performance reasons you cache it locally. For those two things we want more persistence, so we selected Cassandra as the back-end data store.

What advantages does this architecture bring us? The first thing to say is that all of the elements are active all the time. That might not sound like a major advantage, but the classic problem with active-standby is that when your active instance fails, is the standby actually in a good state? If you've chosen an active-active model, you know whether an instance is working because it's got traffic going through it all the time, so it brings you a degree of confidence. By selecting N+M rather than one-plus-one redundancy you get much greater efficiency: we don't have to have a spare machine for every one that's running, only enough spare instances to cover the chunk of capacity that could fail. Scale-out is trivial: as I said, we've got the load-balancing element, so we just fire up more instances and adjust the load balancing; there are no inherent architectural limitations on scale, everything runs in parallel, and we don't have bottlenecks. And because we haven't got an explicit load-balancer box in there, we don't need to worry about bottlenecks in the load balancer itself, which is a problem you often come across. Frankly, it just looks right: if you show that picture to someone from the cloud world, they'd say, yeah, that's a cloud-native application, whereas if you take a lot of telecoms applications and show their architectural pictures, the reaction will be, that's from ten years ago.

But what about the state storage function? It's wonderful to offload the state to another device, but that doesn't make the problem go away; you still have certain requirements for maintaining that state. Normally it's got to be scalable, it's got to be distributed, and it's got to be fault tolerant. There are well-proven open source ways of doing this: Apache Cassandra, MongoDB, HBase, in-memory stores like memcached, the list goes on. When you're designing an application for this kind of thing you've got to tailor the state store you're using to the particular requirements you have, which means things like read and write performance, latency, and throughput, and it also means thinking about the nature and the size of the data you're storing: are they large files, is it key-value information?
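As a hedged illustration of keeping Sprout-style registration state in a shared memcached pool rather than in process memory, here is a minimal sketch assuming the pymemcache library and a memcached instance on localhost; the key layout and binding fields are invented and this is not the Clearwater implementation.

```python
# Minimal sketch of a stateless SIP router keeping registration bindings in a
# shared memcached pool instead of in local memory. Assumes pymemcache and a
# memcached instance on localhost; key layout and fields are illustrative.

import json
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))


def store_binding(aor: str, contact: str, expires: int) -> None:
    """Write the registration binding to the shared store with its SIP expiry."""
    cache.set(f"reg:{aor}", json.dumps({"contact": contact}).encode(), expire=expires)


def lookup_binding(aor: str):
    """Any routing instance can look up the binding, so a call can be routed
    by whichever node happens to receive it."""
    raw = cache.get(f"reg:{aor}")
    return json.loads(raw) if raw else None


store_binding("sip:alice@example.com", "sip:alice@203.0.113.10:5060", expires=3600)
print(lookup_binding("sip:alice@example.com"))
```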
And you need to think about persistency: is this stuff that you can afford to throw away when the system stops, or is it stuff you actually need a persistent copy of? For Clearwater, which is the name of this project, Cassandra fitted the bill for the persistent information, that's the device and subscriber profile information, and memcached for the dynamic state. Now, deployment details remain important. I'm not going to tell you that simply selecting the right technology gives a viable deployment; it doesn't. You have to worry about the subsystems, the network, the overall topology; there are lots of ways you can get this wrong. But if you've selected the right components you can achieve some pretty impressive results.

I've actually got some performance data, not from our own systems but from Netflix, which is public information. The biggest takeaway from this graph is the fact that it's a straight line. At the heart of their business is the fact that they can serve up traffic to their subscribers just by increasing the number of servers; there's no bottleneck in there, if you want to serve more people you increase the number of servers. They're using Cassandra to serve a large chunk of their data, and they get a pretty impressive performance figure, but the most important thing, as I said, is that it's virtually a straight line: if you want to increase capacity you just add more of them. And the same is effectively true of Clearwater.

So what are the results? We developed and tested quickly; I must confess it was on Amazon AWS, but it was 2011 and that was what was available and stable, and prototyping quickly proved the viability of this approach. We scalability-tested up to 15 million subscribers and 8,000 calls a second, but that's just a test point, not a limit; if we'd thrown more hardware at it we could have gone higher. Fault tolerance has been tested in a geo-redundant fashion: we've spread those instances not just across individual cloud instances but across geographic locations, so you automatically get protection against earthquake, flood, fire, and failure of individual cloud instances, which is a good thing, because as you've heard in some of the other talks it's still hard to provide a fully HA cloud instance. We updated it pretty easily to support the 3GPP IMS interfaces, we released it as open source in May 2013, because that's part of the nature of what we're trying to do here, and the first production deployment was in March 2014. Chloe?
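As a hedged aside, this is roughly what a geo-redundant persistent subscriber store on Cassandra could look like, using the DataStax cassandra-driver; the keyspace, table, seed node, and replication settings are invented for illustration and are not Clearwater's actual schema.

```python
# Hedged sketch of a persistent, geo-redundant subscriber store on Cassandra.
# Names and replication settings are invented; this is not Clearwater's schema.

from cassandra.cluster import Cluster

cluster = Cluster(["cassandra-seed.example.net"])   # hypothetical seed node
session = cluster.connect()

# NetworkTopologyStrategy with replicas in two sites gives the geo-redundancy
# discussed above: losing one data centre does not lose the subscriber data.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS subscriber_store
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS subscriber_store.profiles (
        public_id text PRIMARY KEY,
        ims_subscription text
    )
""")
session.execute(
    "INSERT INTO subscriber_store.profiles (public_id, ims_subscription) VALUES (%s, %s)",
    ("sip:alice@example.com", "<IMSSubscription>...</IMSSubscription>"),
)
```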
Okay, thanks Colin. So that's what the applications need to go through to become cloud native. The next thing in the whole NFV architecture is the MANO. What does it take for the MANO to facilitate cloud-native NFV? The MANO shouldn't simply take policies from your OSS/BSS and just do resource allocation; it shouldn't be a one-way street. It should really be a closed loop, in the sense that after it allocates the compute, storage, and network resources and does the policy-driven service orchestration, it must be able to efficiently collect statistics from the infrastructure itself and run them through, potentially, a big data analytics engine, so that it can provide insights about the health of your system. In OpenStack you have Ceilometer and you have Heat, and that's basically the first step towards a cloud-native feedback loop. If you want to fully achieve that, you really need real-time big data analytics, so that you can get insight and do real-time feedback to adjust your infrastructure very quickly.
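A minimal sketch of that closed loop, monitor, analyse, feed back, is shown below. The helper functions are hypothetical placeholders for a telemetry query (for example Ceilometer statistics) and an orchestrator scaling action (for example a Heat scaling policy), and the thresholds are invented.

```python
# Hedged sketch of the closed feedback loop: collect metrics from the
# infrastructure, analyse them, and feed a scaling decision back to the
# orchestrator. The two helpers are placeholders, not real API calls.

import time

SCALE_OUT_CPU = 0.75   # invented thresholds for illustration
SCALE_IN_CPU = 0.25


def average_cpu_utilisation(vnf_group: str) -> float:
    """Placeholder for a telemetry query (e.g. Ceilometer statistics)."""
    raise NotImplementedError


def adjust_instance_count(vnf_group: str, delta: int) -> None:
    """Placeholder for signalling the orchestrator (e.g. a Heat scaling policy)."""
    raise NotImplementedError


def control_loop(vnf_group: str, interval_s: int = 60) -> None:
    while True:
        cpu = average_cpu_utilisation(vnf_group)
        if cpu > SCALE_OUT_CPU:
            adjust_instance_count(vnf_group, +1)   # add a stateless VNF instance
        elif cpu < SCALE_IN_CPU:
            adjust_instance_count(vnf_group, -1)   # shrink the pool again
        time.sleep(interval_s)
```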
And last but not least is the infrastructure itself. We've talked about how you now have VNFs with the capability to scale up and down really easily, and a MANO that can intelligently figure out when those VNFs and services should be scaled up or down, but you still need a VNF infrastructure that supports easy deployment and portability of these VNFs, and you need to do that without a performance penalty. When we look at the spectrum of VNF deployment options, virtual machines on one end of the spectrum have very good manageability and scalability, but they're a little heavyweight, and if not implemented right you may pay a performance penalty. On the other end of the spectrum you have hardware segmentation, which gives you maximum, bare-metal performance, but it's hard to scale. Increasingly we're seeing the VNF vendors and the telco service providers looking into Linux containers such as Docker, so that you can have a lighter-weight virtualization scheme while maintaining portability. But whether you choose a virtual machine type of deployment or a container type of deployment, both can benefit from SDN network virtualization to be able to port across multiple environments: for example across different clusters in the same data center, across different data centers, or across different clouds. When you have, say, an IoT type of deployment, you want really low latency, so it's very likely your VNFs are going to be deployed in the central office, which has limited capacity, and you really want the ability to burst your excess workload into a nearby central office, or even another service provider's central office, or maybe a cloud service provider's data center that's close by; both virtual machines and Docker give you very good portability. So now we've established two key things. One, you're going to have more entities on the same server and across your environment communicating with each other, and they must do this very efficiently and securely. Two, once you separate the state from the transaction processing, you're going to need very efficient storage access, to make sure that not only your control-plane VNFs but also the VNFs that involve per-packet state, like packet gateways, can become cloud native. Storage is actually a key element in your telco cloud.

At Mellanox we focus on the NFV infrastructure: we want to build the most efficient virtual network to support cloud-native NFV. That includes not only a network that allows you to do near-line-rate packet processing, but also storage: because of the acceleration and offload that we do, you can achieve much higher IOPS at much lower latency. We do all this by offloading a lot of the processing to the NIC hardware itself, so that you don't have any significant CPU overhead, and your precious CPU resources can be dedicated to the service processing rather than the packet processing. That's also what makes your compute more efficient: you have more CPUs free to run more workload on the same physical infrastructure. We achieve all this through three key things: one is virtualization, the second is acceleration, and the third is convergence, and we'll look at them one by one.

First, virtualization and offload. There is a technology called single-root I/O virtualization, SR-IOV, and this is how we facilitate high-performance compute virtualization. When you move from bare metal to virtual machines there is normally a virtualization penalty to pay, but with SR-IOV we virtualize the physical NIC function into multiple virtual functions at the hardware level, so that each virtual function can be associated with a virtual machine and the virtual machines themselves can communicate directly with the networking device, the NIC card in your server. The end result is near-bare-metal performance when you're running virtualized. Let's look at the results of SR-IOV plus eSwitch; eSwitch is an embedded switch that handles packet processing in the NIC itself. With our latest generation of adapter card we achieve close to 100 Gb/s on a single interface, virtual machine to virtual machine, cutting through your hypervisor and kernel layers, and we get very good memory isolation at the hardware level with very low CPU overhead. As you can see in this picture, on a 100 GbE interface we achieved somewhere between 92 and 95 Gb/s of throughput from a virtual machine on one physical server to a virtual machine on another physical server, at very low CPU overhead. But SR-IOV doesn't solve the problem of overlay SDN: it addresses the penalty you pay for compute virtualization, but you still need to handle network virtualization, and at this point in time a lot of network virtualization is done through an overlay style of SDN. With overlay SDN, the SDN controller sets up tunnels on top of your physical switch fabric, and with tunnels your packet format changes. Before overlay network virtualization you have the green part of the packet plus the CRC checksum; with an additional overlay layer you get an outer packet as well, and here I'm using VXLAN as an example. The tunneling protocol itself introduces another layer of packet processing: you have to first look at the outer packet, decapsulate it, and then look at the inner packet to be able to direct it to the right virtual machine.
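To see why the NIC now has two sets of headers to parse, here is a minimal sketch of the standard VXLAN encapsulation overhead; the byte counts are the usual header sizes for VXLAN over IPv4, and the offset calculation is illustrative only.

```python
# Hedged sketch of the extra parsing a VXLAN overlay adds. The byte counts are
# the standard header sizes for VXLAN over IPv4 without IP options.

OUTER_ETH = 14   # outer Ethernet header
OUTER_IP = 20    # outer IPv4 header
OUTER_UDP = 8    # UDP header carrying the tunnel
VXLAN_HDR = 8    # VXLAN header with the 24-bit VNI

ENCAP_OVERHEAD = OUTER_ETH + OUTER_IP + OUTER_UDP + VXLAN_HDR  # 50 bytes


def inner_frame_offset() -> int:
    """Where the original (inner) frame starts inside the encapsulated packet.
    A NIC without VXLAN awareness stops at the outer headers, so checksum and
    steering offloads for the inner packet fall back to the CPU."""
    return ENCAP_OVERHEAD


print(f"VXLAN encapsulation adds {ENCAP_OVERHEAD} bytes per packet")
print(f"Inner Ethernet header begins at byte offset {inner_frame_offset()}")
```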
This looks trivial, but it can actually break some of the NIC hardware offloads, because the inner packets are no longer accessible to the NIC, and some NICs won't know where the offset of the checksum really is. With a NIC that doesn't support overlay offloads like VXLAN offload, the inner packet is punted to the CPU and the CPU has to do the packet processing itself, which really slows things down. At Mellanox our NICs support stateless offloads for both encapsulated packets, like VXLAN tunneled packets, and native Ethernet packets, and with that you see a big performance impact. We've worked with multiple SDN controller vendors, like PLUMgrid, Midokura, Nuage, and VMware NSX, and this is what we've seen. The light green bar shows VMs connected by VLAN; that's our base case. The red bar shows virtual machine to virtual machine communication with VXLAN but no offload at the NIC: on a 40 GbE interface you get somewhere between 10 and 20 Gb/s of throughput, around 17 Gb/s. The moment you turn on VXLAN offload, that's the dark green bar, we get performance very, very close to native VLAN performance, that is, the performance without the tunneling outer packet. You see a small gap between the VLAN performance and VXLAN with hardware offload because this was done with our previous generation of NIC, ConnectX-3, which only does stateless offload, not stateful offload of the inner packet like encap and decap; with ConnectX-4, which is coming out now, we support both stateless and stateful offload, so you will see these two bars getting very close to each other, near line rate. And this is a look at the CPU on the receiving host and on the sending host. The effect is more pronounced on the receiving host: you can see that CPU utilization with hardware offload is about half of what it is without hardware offload. We tested on a 20-core system and we saved seven cores; they're freed up from packet processing and can be dedicated to service processing, so the efficiency gains are very large.

Then let's take a look at the second technology, our acceleration technology with RDMA. RDMA stands for remote direct memory access. As Colin mentioned, when you separate the state from the transaction processing, ideally you want the virtual machines to be able to access memory no matter whether it's on the local machine or on a remote host in the same cluster or in a different cluster, and RDMA does just that. RDMA is a transport-layer protocol, similar to TCP/IP in the OSI sense, but the difference is it was designed much later than TCP/IP, so it was designed to be offloaded by the hardware itself, and it can achieve very low latency and very high throughput without a CPU penalty; we can actually run RDMA at 100 Gb/s. RDMA was originally an InfiniBand protocol mostly used in high-performance computing clusters, but we've extended RDMA from InfiniBand to Ethernet, so you can run RDMA over Converged Ethernet, which we call RoCE, and it's also routable across different routing domains, so even if your data center runs Layer 3 you can still use RDMA to do fast memory transfers.
Colin mentioned that for the Clearwater application they use memcached for the fast state they really need to access, so let's take a look at some of the enhancements RDMA brings to memcached. On the left-hand side, in terms of latency, the green bar shows memcached with RDMA and the red bar shows memcached with TCP: latency is reduced by about 66%, so you have only one third of the latency when you use memcached over RDMA. In terms of throughput, again the red line is memcached over TCP and the green line is memcached over RDMA, and you get about a 200% throughput improvement, so this is a significant gain. Clearwater itself is more of a control-plane VNF, so it doesn't really handle per-packet state, but there are VNFs that need to fetch state, for example from memcached, on a per-packet basis, and those have much higher performance requirements; with this enhancement we can enable more VNFs to think about moving to cloud native.

Colin also mentioned that for information that needs to be persistent they use database applications such as Cassandra. We work with a lot of the database application providers, but we also want to enable acceleration at the lowest common denominator, which is block or file storage. Here I'll use block storage as an example: we have accelerated iSCSI with RDMA, which we call iSER, and iSER is already integrated with OpenStack. If you look at the performance without iSER, the two bottom lines in terms of throughput, that's plain iSCSI; when we enhance it with iSER you see about six times the throughput and five times lower I/O latency, and we do that at much lower CPU utilization. Okay, that's my timer, we're almost done. Last but not least is convergence, which is more of an operations and management point: with such a large pipe that works for both VM-to-VM communication and VM-to-storage communication, you can converge your storage access and your network access onto the same pipe, which is much easier to manage. This is a summary slide: we work very closely with OpenStack, and our involvement is mostly with Neutron and Cinder. For Cinder we provide an iSER plug-in so that any block storage access can leverage RDMA acceleration, and for Neutron our SR-IOV and VXLAN implementations are all in our Neutron plug-in. We work upstream, so these are commonly available in Juno and Kilo, and we also integrate with multiple OpenStack distros like Mirantis, Red Hat, and Ubuntu. With that, thanks for your attention, and if you have any questions for Colin and me, now is the time.

Question from the audience: Is there a scale point at which it starts to break down? When you talk about breaking everything up into stateless, and obviously you work with AT&T, if you're talking about the mobility side, or Verizon, or T-Mobile, or any of the big carriers, there's so much state in the network; at what point does the replication of that state, and the retrieval of that state from a scaled-out state cache, start to lose efficiency compared to the bare-metal separation with in-memory state?
I think it depends on the nature of the protocol. In our particular example, with Clearwater and SIP, I think we're pretty confident that we're in the kind of Netflix situation: we've got the architectural performance if you've got the customers. I think there are definitely applications, especially with more complex protocols that are less easily optimized for this kind of approach, where you will hit problems and you'll find that the need for frequent access to shared state is a limiting factor. Clearwater isn't one of them, but yes, there will be cases. Thank you. Okay, I think our time is up, and thank you very much.