All right, thanks for showing up, folks. My name's Ron, I'm from Netronome. I'm going to talk a little bit about application performance, because folks tend to talk a lot about how you run things; we talk a lot about how fast the things go.

A little bit about Netronome, just to set the stage. We sell accelerated smart NICs. The basic idea is your standard low-profile PCIe Gen3 x8 NICs at 10, 25, and 40 gig; on another product line we also support 100. We also support software loads on top of those, so it's a little bit different from what most people expect from a NIC. With a traditional NIC, you've got a driver and you've got the firmware on the NIC itself. If you need to add some functionality, say Geneve comes out and you need to do that tunneling protocol, you're stuck, because the NIC has a fixed-function ASIC: it can do VXLAN, but it can't do anything else. Our NICs are fully programmable, so as software loads get more complex, you can actually change the personality of the NIC itself. I'll talk about how we do that a little bit later. But the basic concept is boosting network performance. By offloading the data path and putting it directly onto the network card itself, you save CPU cycles, and you bring a lot more innovation to the NIC itself instead of depending on an 18-to-24-month ASIC cycle, which a lot of people depend on.

Our space, where we tend to work, is NFVI. We're not a VNF company, as some folks tend to think; we work on the infrastructure side. We also offer security features, and our primary space is the cloud networking sector, where we work with everybody: OpenStack, OPNFV, et cetera.

So what's happening most of the time today is we have these stacks, and we talk about what software is running on the server itself. I'm a network guy, a hardware guy at heart. So when we talk about controllers, be it a Contrail controller or an SDN controller, that's obviously running in software. You have OpenStack orchestration, you have other cloud orchestration tools. All of these are stacked on top of the server itself, and they're running in the CPU. Then in addition to that, you've got a virtual router or a virtual switch, and you've got Linux firewall capabilities. All of these are running in software, and they're getting continuously upgraded. Some of them are on a four-month cycle, some are on an eight-month cycle, but you're seeing continuous feature upgrades; in the latest OVS, 2.6, you saw conntrack released, and NAT as well. And whatever comes out next needs to be supported too, without having to do a forklift upgrade on everything that you own.

And most traffic these days is now intra-data center. It's essentially cloud-based workloads: distributed storage, NFV, distributed security, and packet processing. All of those are networking workloads. Add to that workloads from machine learning, search, transaction processing, et cetera, and you need to start offloading the CPU. In other words, you need to start adding functionality so that these features can be done in a quicker and more effective way. And there are a lot of acceleration devices out there. GPUs are fantastic for machine learning and scientific computing, from back in the days when CUDA was released and you just did protein folding, to today's far more massive capabilities. FPGA acceleration is amazing as well, for distributed security or packet processing, and you can also accelerate your algorithms for machine learning and deep learning.
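To put rough numbers on why network processing eats CPU cores, here's a back-of-the-envelope calculation (my own illustration, not a Netronome figure): at 40GbE line rate with minimum-size frames, a 3 GHz core has a budget of only about 50 cycles per packet.

```python
# Back-of-the-envelope cycle budget for line-rate packet processing.
# Illustrative assumptions: 40GbE, minimum-size 64-byte Ethernet frames,
# plus 8 bytes preamble and 12 bytes inter-frame gap on the wire, 3 GHz core.

LINK_BPS = 40e9             # link speed, bits per second
FRAME_BYTES = 64 + 8 + 12   # frame + preamble + inter-frame gap
CORE_HZ = 3e9               # assumed CPU core clock

pps = LINK_BPS / (FRAME_BYTES * 8)   # packets per second at line rate
cycles_per_pkt = CORE_HZ / pps       # per-core cycle budget per packet

print(f"line rate: {pps / 1e6:.1f} Mpps")              # ~59.5 Mpps
print(f"budget: {cycles_per_pkt:.0f} cycles/packet")   # ~50 cycles
```

A single DRAM miss can blow that entire budget, which is one way to see why a pure software data path at these speeds ends up spreading across many cores.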
Network accelerators are where Netronome fits in. Network accelerators fit distributed storage environments, NFV, distributed security, and packet processing. The point of a network accelerator is that it's focused on networking traffic. It's not focused on algorithms, it's not focused on parallelization; it's truly focused on a run-to-completion model, where you bring a packet in and you drive it to the VM via SR-IOV, DPDK, or virtio.

The main thing to talk about here is how we do this. Look at your traditional environment, where you've got your virtual machines on your server; this box is the Agilio NIC itself. It ties in to your Linux kernel, where OVS is running, and we mirror that data path. So we've got our match tables, we've got our actions, we've got all the tunneling protocols. We can also add additional features with the data plane scripting language P4. And then we can deliver things directly up to the virtual machine. Notice, though, that it matches exactly what your Linux kernel would see: your match tables, your actions, and your tunnels. We take the information from the kernel; the kernel gets updated via OpenStack, via OpenFlow, and it drives the rules down into the match tables. (I'm going a heck of a lot faster than I expected to.) So if a packet comes in and hits our cache, it's matched; if we know what the match is, we take the appropriate action, do the appropriate tunnel encap or decap, and send it directly to the virtual machine. But notice that the data path does not go up into the server itself. It does not go up into the CPU. With a traditional NIC, you have to take the packet, drive it up to the CPU, do the matching there with a memory hit, come back down to the NIC itself, and then deliver via SR-IOV or virtio back up to the VMs. So you can see that by having a dedicated data path on the NIC itself, you save time, you save CPU cycles, and you actually get better application performance, because the CPU is not bound up trying to do network processing as well as application processing.

I'll take it back a little bit, and then take it forward. We also work with a few folks: Mirantis with their OpenStack, and Juniper on their OpenContrail. The reason I go back and forth on this is that the data path, if you look at it, is different from your traditional Open vSwitch to your vRouter. But it's the same box, and it's the same card. Because this is a programmable adapter, you can put a different software load on the adapter, and it becomes a virtual router instead of a virtual switch. So you're doing a longest prefix match, you're doing an MPLS label lookup, instead of your traditional match and action. (Sorry, if you're going to take a picture, I'll flip right back to it.) But notice the data path has changed, because our ASIC is actually a programmable ASIC. It's doing the same exact activities that you would do with a virtual router, and it's doing them on the same exact hardware.
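To make that match-action mirroring concrete, here's a minimal sketch of a flow cache. This is a toy model of the concept described above, not Netronome firmware or an OVS API; the names (FlowKey, Action, FlowCache) are mine. Rules get installed from the control plane into a match table, a hit applies the action (decap, encap, deliver to a VM port), and a miss punts to the kernel slow path.

```python
# Toy model of the match-action flow cache described above. Names and
# fields are hypothetical; a real table matches many more header fields.
from dataclasses import dataclass

@dataclass(frozen=True)
class FlowKey:
    dst_mac: str
    vni: int          # VXLAN network identifier (0 = untunneled)

@dataclass
class Action:
    out_port: str     # e.g. an SR-IOV VF attached to a VM
    encap_vni: int = 0  # re-encapsulate on egress (0 = no encap)

class FlowCache:
    def __init__(self):
        self.table: dict[FlowKey, Action] = {}

    def install(self, key: FlowKey, action: Action):
        """Rule pushed down from the kernel (OpenStack/OpenFlow)."""
        self.table[key] = action

    def process(self, key: FlowKey) -> str:
        action = self.table.get(key)
        if action is None:
            return "miss -> punt to kernel slow path"
        steps = []
        if key.vni:
            steps.append(f"decap VNI {key.vni}")
        if action.encap_vni:
            steps.append(f"encap VNI {action.encap_vni}")
        steps.append(f"deliver to {action.out_port}")
        return "hit -> " + ", ".join(steps)

cache = FlowCache()
cache.install(FlowKey("52:54:00:aa:bb:01", vni=5001), Action(out_port="vf3"))
print(cache.process(FlowKey("52:54:00:aa:bb:01", vni=5001)))
# hit -> decap VNI 5001, deliver to vf3
print(cache.process(FlowKey("52:54:00:de:ad:99", vni=0)))
# miss -> punt to kernel slow path
```

The point the diagram makes is that this whole lookup happens on the card, so the hit path never touches the host CPU.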
So what do you get from this? With your virtualized server, you're doing a full OVS offload. What we've shown, and we have a bunch of documentation around this and some interesting demos going on as well, is that you can burn up to 12 CPU cores just running 40 gigs of network traffic. If you pin OVS down to a single core with our adapter, you get about 25 million packets per second at L2. Well, L2 by itself is kind of useless, right? So with VXLAN tunneling, you still get about 21 million packets per second on a single core. If you pin kernel-based OVS to a single core, you get about 3 million packets per second, so your ability to process at line rate is severely curtailed. That's what we're trying to show here: when people think about application performance, they think about throwing more CPU at it. The other thing you can do is take away the nonessential processing functions and put those onto a purpose-built device.

Here's something we've shown with Ericsson. They had 2 million packets per second per VNF, and running at 40 gig they needed 2.8 racks to support a 220-VNF environment. The reason, if you look at it, was that 16 of their cores were being burned running OVS and eight were running virtual machines. So you have a two-thirds/one-third split, when in any environment you would want two-thirds of your CPUs running applications. By doing the OVS offload, you get to 22 million packets per second of VXLAN processing on a single core. So you can now get 11 VNFs on the same device at the same 2 million packets per second each (22 divided by 2), and you get down to one rack versus, let's say, three racks, 2.8 rounding to three. So you can actually have three times lower TCO, massive improvements in your data center density, and better application performance.

The other thing that is changing is this: how do you change a data path that's been built onto a purpose-built ASIC? Well, you really can't, unless you're running something like P4. Anybody familiar with the P4 scripting language here? It's a data plane scripting language, and you can add things with it. We've got a demo running now where you take VXLAN-GPE, the Generic Protocol Extension, and add telemetry data into the packet as it transits the NIC. Come on over to our booth, we're only two over. Say you're service chaining among multiple VNFs on the same server. Normally, once the packet goes in and then comes out, that's the only place you can measure any kind of latency; you don't know what's happening inside. But with the NIC, when you're going VM to VM, we can add a header with a timestamp every time the packet hits the NIC: an ingress timestamp and an egress timestamp. So when you leave virtual machine one, we can stamp an egress record, and as you go into virtual machine two, we can stamp an ingress record. That way we can measure the latency of every single device and see which virtual machine is taking the longest to process through a service chain. With P4, you can script that so that any networking device that supports P4 can run the code: it can be eBPF code running on the host, or it can be BPF code running on the NFP, the networking device itself.

So, a little bit more on the demo I was talking about. As the packet comes in, goes through a switch, and hits our NIC, we can add telemetry data to it, and as it wraps through, you can see two virtual machines. Once upon a time, every single packet would leave the server, go up to a switch, and come back down to another server. Today, we don't see that: a packet can stay inside of a server, in that east-west mindset, and go from virtual machine to virtual machine.
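Here's a rough sketch of how the latency side of that telemetry could be consumed. The record layout is invented for illustration (the real demo defines its own P4 header format): each NIC pass appends a (hop, direction, timestamp) record, and differencing egress from ingress stamps gives per-VM latency through the service chain.

```python
# Illustrative consumer of per-hop telemetry records like the ones the
# P4 demo inserts. This record format is hypothetical, not the demo's
# actual wire format.

records = [
    # (hop, direction, timestamp in nanoseconds), stamped at each NIC pass
    ("vm1", "ingress",  1_000),
    ("vm1", "egress",   4_500),
    ("vm2", "ingress",  4_700),
    ("vm2", "egress",  12_200),
]

# Group the stamps by hop, then diff egress - ingress per hop.
stamps: dict[str, dict[str, int]] = {}
for hop, direction, ts in records:
    stamps.setdefault(hop, {})[direction] = ts

for hop, d in stamps.items():
    print(f"{hop}: {d['egress'] - d['ingress']} ns inside the VNF")
# vm1: 3500 ns inside the VNF
# vm2: 7500 ns inside the VNF
```

With data like this, the slowest VM in the chain falls straight out of the numbers, which is exactly what you can't see when you only measure the packet entering and leaving the server.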
So with Agilio, with our solution, you can actually add real-time data collection, and you can do it at wire speed, because the telemetry is inserted on the fly. Then you can triangulate between your server, your VMs, and your switch, especially if you have a programmable switch, and gather data at each hop, so you actually get a lot more telemetry information than you could get previously.

Just to take it back a little bit: these are standard NICs. Not full height, not massive power; just your generic PCIe Gen3 x8, low-profile, sub-25-watt NICs in your standard 10, 25, and 40 GbE. On the software side, when we talk about our software load being OVS, this is not a custom OVS coming from Netronome; this is standard OVS. When we launched, we launched with OVS 2.3. Right now we're at OVS 2.6, and when 2.7 comes out, we'll add that as well. You get these with a subscription service. vRouter is the same: 4.0, 4.1, 4.3. And CoreNIC, which is what comes with the NIC, is just a standard NIC operating system.

So the basic concept here is: use the right tool for the right problem. In this case, offload your network functionality onto a NIC, and you wind up saving CPU cores, you wind up getting better application performance, and you can actually monitor that application a lot more tightly.

Any questions? What was that? Ah, good question. The question is what kind of ASIC is on the card. Everybody's first guess is, well, it's got to be an ARM processor. No, it's actually our processor: a multi-core, 72-core, Netronome-designed part. We fab it with Intel; they build it in their custom fab. So it is our ASIC, and it is a run-to-completion model. All of those cores can be programmed. So technically, if you said, "I want to do my own custom data path that's not OVS, maybe my own OVS or something like that," you could actually program it onto those cores. We've got an SDK and so forth, but it is our adapter.

How does what? Tap as a Service? Boy, I've got to tell you, I don't know the answer to that one. Tap as a Service.

All right, thank you very much. We're just two booths down; come on and check out the demos.