Hi, good afternoon everybody. My name is Erez Cohen, I'm with Mellanox Technologies. Sorry for keeping you waiting for your beers and drinks; we'll try to be as swift as possible.

Hi, everyone. My name is Eran, I'm the CEO of Cloudigo. Erez and I want to present today a different perspective on how to implement network services, virtual network services, in a high-performance I/O environment. So we'll start with virtual networking today.

When we look at virtual networking today, it has mostly been implemented using a software switch in the hypervisor. This software switch provides switching, but it provides much more: most virtual switches offer some programmability inside them, through OpenFlow, P4 and others, and today we are able to provide virtual services, even advanced virtual services, in the hypervisor using the virtual switch. This works very well and it is mature, but does it fit high-performance virtual networking? When we talk about high performance, we're talking about 25, 40 and even 100 gig networking. And the answer, given not only by us but by others as well, is that it doesn't fit.

To solve that there are a few existing solutions. One of them is OVS-DPDK. OVS-DPDK is the same as OVS, except that instead of running the data path in the kernel it runs it in user space; it uses DPDK for I/O in poll mode, and you can dedicate one CPU or more to poll-mode processing of the data. It provides higher performance, but it is still mostly software, and as a cloud provider you would have to dedicate a relatively high CPU footprint to the networking services.

Another solution is VPP. VPP works on packet batches and processes the packets on a graph, a pipeline. It provides more customization and higher performance, but it is mostly about optimizing and squeezing more out of your CPU, mostly x86, and again you will spend a relatively high share of your CPU to provide high-performance networking when we're talking about 40 and 100 gig.

Another solution is SR-IOV. So what is SR-IOV? SR-IOV is a way for a hardware vendor, a NIC vendor, to represent their NIC as multiple PCI devices. We call those PCI devices virtual functions (VFs), and a VM can connect directly to a virtual function, bypassing the hypervisor altogether, and connect to the embedded switch that is on the SR-IOV NIC. The advantages are clear: we get line-rate performance, up to the PCI bus or the rate of the SR-IOV NIC, and we can scale the number of VMs, again up to the capacity of the NIC. The disadvantages are that these VFs are statically allocated: each time you want to allocate more, you need to restart all the other VFs. They don't provide full NIC configuration, so you need to rely on the PF, the physical function, for some of the configuration; this is another limit, and when you change it you change it for all the others. Most of those NICs provide very limited switching between the VFs on the same NIC. And the biggest problem with today's SR-IOV solutions is that they don't provide local virtual services, because they bypass the hypervisor, and the hypervisor was the one that provided the virtual services and the switching, so we are not able to do that today.
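To make the static-allocation point concrete, here is a minimal sketch (not from the talk) of how SR-IOV VFs are typically created on a Linux host through sysfs; the interface name is hypothetical, root privileges are assumed, and the exact behavior depends on the NIC driver.

```python
# Minimal sketch: allocating SR-IOV virtual functions on a Linux host.
# Assumes an SR-IOV-capable physical function named "enp3s0f0" (hypothetical)
# whose driver exposes the standard sriov_totalvfs / sriov_numvfs files.
from pathlib import Path

PF = "enp3s0f0"                                   # physical function (hypothetical name)
dev = Path(f"/sys/class/net/{PF}/device")

total = int((dev / "sriov_totalvfs").read_text())
print(f"{PF} supports up to {total} VFs")

# Changing the VF count requires writing 0 first and then the new value,
# which tears down the existing VFs -- the static-allocation limitation
# mentioned above.
(dev / "sriov_numvfs").write_text("0")
(dev / "sriov_numvfs").write_text("4")            # create 4 VFs
```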
So if we look at virtual network services, what are the challenges if we want to provide them with high-performance I/O? We talked about software switching: it works well, but as we'll show in the numbers later, it takes a relatively high CPU footprint. On the other hand, if we provide the services in a centralized manner, then we have to move all the data, all the packets, to that centralized location, and as illustrated in this slide, if we give the VM an SR-IOV connection with performance of 25 gig or more, then the bottleneck becomes the network node, the virtual service itself. So even if the VM gets the performance, the centralized location will be the problem.

When we look at this problem, what we really want is to be able to apply the network services on the path of the data, in a way that doesn't consume a large amount of our CPU footprint. And when we look at this problem, we can draw an analogy to another area that we think is a bit similar, and see how it was solved there. The area I'm talking about is web services, web applications. If we look at how web applications evolved over time, it started with a coupled application, one application; then it moved to the multi-tier model, web app and DB; and now there's a big effort to move into smaller elements that can be managed separately, orchestrated separately, and scaled in and out very easily. The latest trend is serverless. What does serverless mean? Serverless basically means that you don't need a wrapper around your task just waiting to be executed; you let your cloud provider decide when to run the task. The task runs upon an event, and the cloud provider decides where the best place is to run that task. The most common example is generating thumbnails when you copy images to your object store. If you use the microservices or VM paradigm, you have to copy the data from the object store to the VM or container, generate the thumbnails, and then move them back to the object store; you would have some process waiting idle just for you to copy stuff, and you would copy it back and forth in a non-optimized way. If the cloud provider is able to do it, then it can decide where the best place is to do it, avoid copies, and generate the task only when it's needed.

So is there a similarity between web services and network services? We think there is. If we look at network services, they started as an appliance; this was a monolithic appliance that included all the network services, depending on the appliance of course, but it included everything in a coupled way. Then we moved this coupled solution into a VM. The work being done by us and by a lot of other companies is to take this coupled solution and split it into smaller elements that can be orchestrated in a microservices fashion. And we think the latest trend, starting today, is that we see more and more network cards becoming smarter: we have a simple element in the network card that does tunneling, header manipulation or rewrite, and as Erez will show in a minute, we are getting more and more compute power on the NICs themselves.
With this trend, we feel we can change a lot of the paradigms that exist today in how you look at network services, and if we model them a little differently, we can do things that are impossible today. If we look at the NIC becoming the central piece of the whole system, we can create uniform fabrics, and we can even start thinking about changing the way we connect things and the topologies. We'll show it in a minute, and I will let Erez talk.

Thank you, Eran. So Eran just mentioned moving workload inside the NIC, which is smarter and has more capabilities than just sending and receiving packets, and today we're hearing more and more about smart NICs. Modern data centers really have higher requirements than previous systems. Modern data centers are looking for very fast network adapters: people are talking about 25 gig today as the standard instead of 10 gig, and if we look into the not-too-distant future, we're seeing systems that are going to run at 100 gig, and even this year we'll start to see NICs running at 200 gig. So the amount of traffic that we can pass inside the data center is growing, and it's mostly because of microservices and east-west communication. Low latency is also critical. It's critical because we are looking to segment our workloads in a different way: microservices really push us into distributed workloads, and in distributed workloads latency plays a very significant part. Transport offload is really important as well; transport offload enables us not only to have a very thick pipe to pass the data, but also to pass this data efficiently, in a proper manner. Kernel bypass, or direct user-space access, which you probably know from DPDK, is another way to achieve very low latency and flexibility in data transfer. Advanced virtualization support: Eran mentioned SR-IOV, single-root I/O virtualization, which is one of the options, but there are additional virtualization capabilities that modern data centers require. A flow-based switch: with SR-IOV most NICs provide some kind of switch inside the NIC, but we are in a modern world and we're looking for flow-based switching with much better granularity. And the last point is software programmability: we do want a very flexible and programmable infrastructure. So you're starting to see more and more smart NICs coming up in the industry today, and a smart NIC is not only one option.
There are multiple options for building a smart NIC. The first is still an ASIC silicon design, like what used to be the standard NIC, but much more enhanced and flexible. We believe this is the main path people will take, because at the end of the day an ASIC provides the most cost-efficient, highest-performance option. But there are some cases where the standard smart-NIC ASIC is not enough, and then you will see solutions like an ASIC with an FPGA, which is a very strong trend in the industry today: adding an FPGA as a bump on the wire to provide offload capabilities at hardware speeds. And lastly, the system-on-a-chip, which is basically a NIC together with one or more general-purpose CPUs, ARM or something else, that provides even more flexibility, even though in power, size and cost they're a little more complex and more expensive. All three options are available today, and we believe the ASIC approach will probably be the most dominant one in the industry going forward.

From the Mellanox perspective, our NIC family is called ConnectX, and I want to introduce you to the ConnectX-5, which is the latest device in our portfolio today. ConnectX-5 is an ASIC-based smart NIC. This is a device that ships into many, many verticals; it is not only for telco or for one specific market. It will go into general-purpose cloud, it will go into high-performance computing, it will go into data centers running Oracle or Hadoop or what have you, and at the same time, because the architecture is so flexible, it can be used as a very advanced NIC for telco or NFV. It features PCI Express Gen3 and Gen4 x16, so it provides a very thick pipe of up to 200 gigabits per second on a single NIC, which basically spreads across two network ports that can each run at up to 100 gig. But it supports all the other speeds as well: 10, 25, 40, 50 and 100 gigabits per second, so all the needed speeds are covered. It supports stateless offloads, meaning checksum, RSS, all the advanced capabilities you would expect to see, both on standard packet types such as VLAN and on encapsulated packets; you know that in SDN encapsulation is very important, and solutions like VXLAN, MPLS and NSH are very common, and we want to enable all those offloads in that space as well. It has very high DPDK performance, about 133 million packets per second in the newest measurements, which is almost line rate at 100 gigabits per second.
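For context on that "almost line rate" claim, here is a quick back-of-the-envelope calculation (ours, not from the slides) of the theoretical packet rate for minimum-size Ethernet frames at 100 Gbit/s:

```python
# Back-of-the-envelope: theoretical line rate for minimum-size (64-byte)
# Ethernet frames at 100 Gbit/s, for context on the ~133 Mpps DPDK figure.
frame_bytes = 64
overhead_bytes = 20          # 8B preamble/SFD + 12B inter-frame gap
link_bps = 100e9

pps = link_bps / ((frame_bytes + overhead_bytes) * 8)
print(f"{pps / 1e6:.1f} Mpps")   # ~148.8 Mpps, so ~133 Mpps is indeed close to line rate
```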
It also has very advanced SR-IOV support with hardware-based quality of service. It has accelerated switching and packet processing, which I will touch on in a minute; this is a very advanced virtual embedded switch that runs on the NIC itself and gives us the ability to offload a lot of the services that Eran talked about earlier. And it supports remote direct memory access, or RDMA, which is a very advanced transport service, very common in storage for example.

So, I mentioned accelerated switching and packet processing, or ASAP² Direct. This is a really innovative technology that gives the NIC the ability to process packets as a flow-based switch and provides very advanced capabilities. The NIC itself contains an embedded switch. It's a real switch, a switch implemented in hardware, and it is a flow-based switch. A flow-based switch means that it can offload match-action operations: there is a classification that is done per packet, and instead of doing that in the virtual switch or anywhere else, we do it in the real hardware, so it's very fast and very efficient. And for every classification there is an action associated with it. The action can be, for example, to drop the packet, encapsulate, decapsulate, do routing, do NAT, a lot of the services that you would expect to see from a virtual switch like OVS or other virtual switches.

The hardware capabilities are there, but without an API to consume them we can't really use them. We have had the capabilities for over two years, but only recently did we get to the point where, together with the community, the Linux community, the Open vSwitch community, the DPDK community and the OpenStack community, we enabled a set of APIs and integrations in the open-source communities to take advantage of those low-level capabilities in the NIC. The main APIs to consume those services are Linux TC, traffic control, which is a set of tools in Linux, and DPDK itself. If you look at the drawing, the chart on the left-hand side, you can see one of the use cases that we already implemented, and I'll show you a little bit of the numbers that we measured; this is just one example, and later on Eran will explain how we take this kind of high-level use case and break it into much more advanced, serverless-style services. This use case basically takes OVS (in the drawing it says virtual switch, but it is OVS) and offloads the flow rules into the NIC embedded switch, where the virtual machines themselves are connected with SR-IOV. So what we get here is a complete SDN-enforced traffic data plane, while the data plane itself is implemented in hardware, so you can imagine how much greater the performance is.

To give you some sense of the numbers, we ran a comparison against yet another high-performance mode of operation, which is OVS over DPDK. OVS over DPDK, as you know, changes OVS so that instead of running in both the kernel and user space, it moves all the elements into user space and then uses DPDK to access the NIC hardware, and by that provides much higher performance. Just to give you ballpark numbers: standard OVS, the kernel-based implementation, can do half a million, maybe a million 64-byte packets per second on a very good day, probably much less than that. OVS over DPDK provides much more; in our measurements it was close to 8 million for a few flows, and if you add more flows it drops to about 2 million packets per second, which is much more than the half a million but still not enough. Plus DPDK consumes quite a lot of CPU cores, which wastes a lot of the resources you need to run your VNFs. When we use OVS offload, we basically run the entire data path in the NIC and provide the VM connectivity with SR-IOV, and with that we gain two things. First, performance is much, much higher: as you can see in the numbers, for a few flows we get about 30 million packets per second, and it will grow even more with ConnectX-5, and for a large number of flows we level off at around 15 million packets per second, with encapsulation and decapsulation. The other benefit is that we don't use any CPU cores, so you can actually spin up many more VMs or applications and improve the overall efficiency of the system.
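To make the offload mechanics a bit more concrete, here is a minimal sketch of the Linux TC flower interface mentioned above, one of the APIs used to push match-action rules into the NIC embedded switch. The device names and addresses are hypothetical, and in the OVS-offload use case described in the talk OVS installs equivalent rules automatically; this only illustrates the kind of interface involved.

```python
# Minimal sketch: programming a match-action rule into the NIC embedded switch
# via the Linux TC flower classifier. Device names are hypothetical; in the
# OVS-offload setup described above, OVS issues equivalent rules itself.
import subprocess

UPLINK = "eth0"       # uplink / physical port representor (hypothetical)
VF_REP = "eth0_0"     # representor of the VM's virtual function (hypothetical)

def tc(*args: str) -> None:
    subprocess.run(["tc", *args], check=True)

# Attach an ingress qdisc so classifier rules can be added.
tc("qdisc", "add", "dev", UPLINK, "ingress")

# Match TCP traffic to 10.0.0.5 and redirect it to the VF representor.
# "skip_sw" asks for the rule to be programmed in hardware only.
tc("filter", "add", "dev", UPLINK, "parent", "ffff:", "protocol", "ip",
   "flower", "skip_sw", "ip_proto", "tcp", "dst_ip", "10.0.0.5",
   "action", "mirred", "egress", "redirect", "dev", VF_REP)
```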
So, Cloudigo and Mellanox partnered together to provide, as the base offering, the virtual services available today in Neutron in OpenStack for SR-IOV devices. What you can see on the right is a tenant virtual topology that has two networks connected with a virtual router, a load balancer on two of the VMs, security groups (that is, a stateful firewall), and virtual networking. We want to be able to implement it in the physical deployment you see on the left, which is actually distributed: we don't have a network node, we don't have a central place that does the services. What you can see on the compute node is the Cloudigo engine; the Cloudigo engine is the engine that makes this possible and orchestrates all of it, using all the newest capabilities of the ConnectX-5 that Erez described.

Before we dive into the Cloudigo engine, a few things that we did a bit differently. We took the network service and split it into the smallest possible elements. When we talk with hardware vendors, they say offload everything to the hardware; when you talk with software vendors, they say do everything in software. But in reality there are things that are better in software, and that can only be done in software: in some cases stateful operations, complex operations and manipulations, and areas where you need fast innovation cycles. And there are areas where you actually want most of the packets to be processed by the hardware, where you do repetitive work, header manipulation and so on.

So what we are trying to provide is a hybrid solution, a hybrid solution that knows, for the pipeline that you split into the smallest elements, which element will run in hardware and which element will run in software. When you think about this problem, most use cases are presented as: let's take tunneling and see how I can offload it; let's take routing and see how I can offload it. But if you look at OpenStack, and at what we saw, it's much more complex: you have a pipeline that includes virtual routing, security groups, tunnel encapsulation and, in some cases, load balancing. You need to create this pipeline and think about where to put each element of it in order to do it right. So it becomes an orchestration problem, a big orchestration problem, of where to place these elements. The unique feature of our engine is that the placement is not fixed. Most systems just use a caching mechanism: decide that this flow was recently used and place it in the hardware. Our solution takes into account the whole pipeline and the traffic patterns that currently exist, and it can change: modules that were in hardware can switch to software and vice versa. The engine, for the NICs it supports (and currently we support a list of NICs including the ConnectX-5), knows very well what the advantages of the NIC are, what the benefits are, which features it has and what its limitations are. Every piece of hardware has its limitations, whether it's the number of flows or something else, so the engine needs to know where to place each element, to learn from the traffic patterns, and to be adaptive.
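As an illustration of the placement idea only (this is not Cloudigo's actual algorithm; every name and number below is made up for the example), a hardware/software split over such a pipeline could be decided roughly like this:

```python
# Hypothetical sketch of the hardware/software placement idea described above.
# Stateful elements stay in software; stateless ones are offloaded to the NIC
# as long as they fit within an assumed flow-table limit.
from dataclasses import dataclass

@dataclass
class Element:
    name: str
    stateful: bool        # stateful elements are assumed to need software
    flows_needed: int     # rough flow-table footprint if offloaded

NIC_FLOW_CAPACITY = 10_000  # assumed hardware limit; varies per NIC

def place(pipeline: list[Element]) -> dict[str, str]:
    used = 0
    placement = {}
    for elem in pipeline:
        fits = used + elem.flows_needed <= NIC_FLOW_CAPACITY
        if not elem.stateful and fits:
            placement[elem.name] = "hardware"
            used += elem.flows_needed
        else:
            placement[elem.name] = "software"
    return placement

pipeline = [
    Element("vxlan_encap", stateful=False, flows_needed=2_000),
    Element("virtual_router", stateful=False, flows_needed=3_000),
    Element("security_group", stateful=True, flows_needed=5_000),
    Element("load_balancer", stateful=True, flows_needed=4_000),
]
print(place(pipeline))
```

A real engine would of course re-evaluate this placement continuously as traffic patterns change, which is the adaptive behavior described in the talk.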
If we dive deeper into the engine, you can see that you can connect VMs directly to an SR-IOV virtual function, or you can connect a VM through the Cloudigo software engine, or connect containers via a zero-copy kernel module that we provide, in a very efficient way. The Cloudigo programmable engine provides very small, minimal latency, and you can use our modules or include your own modules in a very simple way: you program a module in a regular programming language and add it to the system. And we know how to take the services and split them, as we said earlier; take services like NAT, firewall, load balancer, routing, split them into small elements and then put part of them in the NIC and part of them above, in our engine.

So if we go back to the scenario we discussed earlier, we want to implement this virtual topology. How do we do it? The Cloudigo engine orchestrates the flows so that only the flows that need software processing go to the Cloudigo engine, in a way that most of the traffic actually flows in the NIC. But we are able to provide advanced, stateful services, and everything that needs to be done in software can be done in software, and our engine is very optimized software: you can get the same or even higher performance than the existing projects we showed earlier. But the main idea is to use existing hardware that is not used today. In some deployments that we saw, they have advanced NICs and they are not using a lot of the features of those NICs, because it's very complex, it's very complicated to do.

To give you some numbers, we ran a test scenario that provides encapsulation, virtual routing and security groups. If we look at OVS-DPDK, the software-switching paradigm, and take very optimistic numbers, say 8 gigabits per second for one core to provide virtual routing, stateful security groups and encapsulation for a large number of flows, then if we want to provide 100 gig we would have to use at least 12 cores just for the virtual networking in this use case. We have a two-socket machine with a total of 28 cores, so you can see that the footprint just for providing the virtual services is huge, and you have fewer CPUs left for the user workload. In our solution, in Cloudigo, you can use one core for the virtual services, and you are left with 27 cores for the user workload. This is more than 90% CPU saving, and it increases your ROI as a cloud provider, because you can use those cores for VMs; the assumption here is that you run VMs on each of those cores.
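Here is a quick back-of-the-envelope check of those core counts, using only the figures quoted in the talk (8 Gbit/s per core, a 100 Gbit/s target, a 28-core machine, one core for the offloaded engine):

```python
# Back-of-the-envelope check of the core-count numbers quoted above.
import math

per_core_gbps = 8
target_gbps = 100
total_cores = 28

dpdk_cores = math.ceil(target_gbps / per_core_gbps)   # 13 cores; the talk rounds to "at least 12"
engine_cores = 1                                       # figure quoted for the offloaded engine

print(f"OVS-DPDK cores for 100G: {dpdk_cores}, leaving {total_cores - dpdk_cores} for VMs")
print(f"Offloaded engine cores:  {engine_cores}, leaving {total_cores - engine_cores} for VMs")
print(f"CPU saving: {100 * (1 - engine_cores / dpdk_cores):.0f}%")   # roughly 92%, i.e. "more than 90%"
```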
Another interesting thing is latency: OVS minimum latency is 33 microseconds, and 33 microseconds is just for plain forwarding, no advanced processing. In the Cloudigo system, because we know how to pass most of the traffic inside the hardware and use the capabilities of the smart NIC embedded switch, it's actually close to zero; we neglected the in-NIC latency here, just for the calculation, because you have the same latency in both cases. So we improve density by almost 200 percent, we save 90 percent of the CPU, and we almost eliminate the latency of the virtual services.

So this was the base offering. But actually the biggest challenge is when you come to deploy this high-performance solution, especially in telco NFV: connecting it to the access network is a very complex thing and usually needs a lot of customization. Our engine knows how to use embedded features of the ConnectX that are not currently exposed via the API, like QinQ and PPPoE, and to provide those features, the niche protocols, let's call them, that are not exposed. We know how to make the integration easy and provide them. If you look at the vCPE use case, a telco provider that wants to connect the VNFs that provide the vCPE would have to do QinQ, load balancing, or tunneling and encapsulation to the end device. With the engine we are able to offload most of it to the NIC and make it transparent, almost transparent, for the integrator, and if you need special customization it's possible using this solution.

So that's it. Unless you have any questions, we will be happy to answer. Yes?

Question: Excuse me, sorry. Do you have some integration with P4, for a switch?

We actually do not; this is a different solution from P4. P4 mostly gives you the ability to have dynamic header match and action; here we are providing it as a programmable pipeline, so it's a different solution from P4. Other questions? Yes, on the hardware. You mean the flows on the hardware? Yes. So in the engine we provide both statistics counters with information about packets and statistics about what passes through each module. What we have in the hardware depends on the smart NIC's ability to provide us the counters and the information, but all the counters that are provided by the hardware are exposed.

Question: Can you monitor the flows and manipulate them?

So Cloudigo is like an abstraction layer that gives you the service and knows how to do part of it in software and part in hardware. I don't completely understand your question; if you're talking about manipulating the flows directly, then you can cause problems. But you can of course change the service and what you defined via Cloudigo, and the Cloudigo engine knows how to manipulate flows dynamically and move flows that were in the hardware to the software, and vice versa. Are there other questions?

Question: It seems to me that this would be awfully appealing to almost any data center today that looks seriously at this. Have you had much interest in it?
Yes, we have. I cannot disclose much about this, but we have a lot of interest; we are actually at the POC stage for this use case.

Question: Just one other thing about the POC. It's interesting, because you're adding a kind of turbo boost to the switching, and you could find yourself in a situation where a rolling upgrade comes through, somebody has not accounted for some component here bypassing it, and consequently all of a sudden the performance plummets and the system effectively has to be rolled back in order to accommodate the former service loads.

Okay, so when you think about it, and it depends on whether you have the same NIC across your cluster, what we provide is a way to do rolling upgrades: you can test your upgrade on only 10% of the flows and then go on. We do that not only to solve the problem you described; we see a need for fast innovation cycles in today's fabrics, and you won't get that by deploying things that can break the system, so we provide the ability to test a change on a very small percentage of your flows and then take it down or increase it.

Question: How close are you getting to ASIC speeds? Are you within an order of magnitude of something you would get on a Juniper core switch? I guess I'm just wondering how much longer they have a sort of monopoly on the really high-speed stuff.

Currently, for processing on the CPU, for passing traffic to the CPU directly, the limit is actually PCI today, but I think Mellanox is working on the next-generation PCI that will solve that. And we are seeing the numbers: if you look at the numbers of what is possible in software alone, without hardware acceleration, they are amazing today, and from our experience we see that all the existing software layers are becoming more and more mature and able to provide advanced networking at high speed. Okay, thank you.

Are there any other questions? Okay, thank you very much.