All right, any better? OK, my name's Ian Wells, and I'm here from Cisco to talk to you about Cloud VPNs in OpenStack. Basically, this is an NFV application that we wrote within Cisco. The application itself is, I'm afraid, closed source rather than open, but what I'd like to share with you is what we did, and what we learned on the journey of trying to write NFV-based applications on an OpenStack cloud.

So just to give you some background on what we actually did in the first place, the first question is: what is Cloud VPN? What were we trying to accomplish? In this particular case, what we were trying to do is offer multi-site private VPN networks to customers of internet service providers. The idea is that you're, for instance, McDonald's, to take a random name. You've got many sites, and you would like to have your own VPN that covers all of those sites. So you buy or rent a service from your internet service provider, they put a box in your site, and that box connects you to everybody else in your corporate network.

A view of that is like this. We have what's called a CPE within the customer site — customer premises equipment. And on the ISP side you usually have a router which actually terminates those connections. Then we make VPN tunnels using IPsec that join each of the CPEs into an overall network. If we're using NFV to implement that, then we replace the actual physical box with a virtualized system. So what we do instead is connect to a system within the cloud, which is usually more than one virtual machine — a single VM is a single point of failure, and we try to avoid that. Our NFV service in the ISP acts as the hub for this virtual private network. The other advantage it gives us is that now that we've got a virtual service, we can start adding features to that virtual service. In the case of what we've been doing, that includes some firewalling and some HTTP proxying with detection of malicious content.

In the application we've done, we have an orchestrator at the top. The orchestrator does two things. One is that it starts those virtual machines within the cloud and then configures them to do the task they're set to do. The other is that it actually configures the CPEs on the other side to say: the endpoint in the cloud you're going to be talking to is over there — that's the one you want. So what can happen is that I, as a customer, can order a CPE and say, I'd like to have a VPN attached to my cloud. It gets shipped off from the telco to my premises. All the telco needs to do is enter the serial number of that CPE and give it a little bit of stock configuration. Then, when the CPE is first switched on, it will call out to the orchestrator and say, tell me what I have to do. So the whole process from start to end is straightforward; it doesn't involve anybody's hands on keyboards in an ideal world.

In fact, you can go one step further than that. On the orchestrator that's managing the service, you can actually give the customer themselves a self-service web application, and that web application makes everything happen. They just say, I want this service, it costs me 100 euros a month or whatever, and all of those bits of process are basically forwarded into the orchestrator, and the orchestrator takes care of the order.
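To make that call-home step concrete, here is a purely illustrative sketch of what the CPE's first contact with the orchestrator might look like. The URL, payload format and field names are all invented for the example; the real interface is whatever the orchestrator exposes.

```python
# Hypothetical zero-touch call-home: the CPE reports its serial number and
# gets back the cloud endpoint and tunnel parameters it should use.
import json
import urllib.request

ORCHESTRATOR_URL = 'https://orchestrator.example-isp.net/provision'  # invented

def call_home(serial_number):
    body = json.dumps({'serial': serial_number}).encode()
    req = urllib.request.Request(ORCHESTRATOR_URL, data=body,
                                 headers={'Content-Type': 'application/json'})
    with urllib.request.urlopen(req) as resp:
        # e.g. {'vpn_endpoint': '...', 'ike_psk': '...', 'subnets': [...]}
        return json.load(resp)

config = call_home('CPE-0012345')
print(config['vpn_endpoint'])
```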
Here's a slightly more technical diagram of how it works — this gives you some idea of how we think of it, I suppose. We have a number of layers here, but the important top layer of the application is the point at which we start talking between the ISP-facing services and the services in the cloud themselves. The orchestrator is actually a cloud application just the same as everything else. It manages talking to the CPE and making things happen, and it manages running the individual virtual machines that form the VNF that's being run. And it talks to a thing we call the ESC — the Elastic Services Controller, although in this instance the service is not elastic. Its job is to manage the life cycle of virtual machines. If a virtual machine stops working — and that doesn't just mean that OpenStack can see it's stopped, it means that for whatever reason it's not doing what it's supposed to do — then the ESC, which is monitoring it closely, will kill it and replace it. So the VNF remains running consistently until you tell it to stop.

The obvious question for anyone who's used to OpenStack is: why is this not solved by VPN-as-a-Service in OpenStack? It's a slightly different use case. VPN-as-a-Service is part of the cloud infrastructure: you're offering a service to cloud tenants who log into that cloud to get to their services. Now, that's useful for many things; the obvious use case is wanting to log into the administrative network of your applications so that you, and only you, can actually get to the back-end interfaces of those applications. If you're trying to do the cloud VPN use case we're talking about here, there's firstly one problem, which is that you're actually coordinating both something in the cloud and something outside it. But the other problem is that you're no longer talking to cloud tenants, you're talking to customers of a service provider. The service provider runs this cloud for their own purposes. This is an application as far as they're concerned, and they're delivering a service using that application. Think of it more like, for instance, Netflix, where they're using a cloud to run an application, but they're not reselling the cloud or a service that helps you use the cloud — they're reselling a tool that you use directly. Netflix delivers videos to your home; they're just using a cloud to make that possible. The cloud is nothing to do with you, and you don't even need to know it's there.

So what did we learn as we tried to do NFV applications within an OpenStack cloud? The first thing, which is quite important in my experience, is that high availability is something everybody has a slightly different opinion of. High availability generally — the blunt statement of what it means is that it's better than normal availability. Generally you're saying that I'm running redundant services, and again, redundancy here is not something that you demand from an application or a cloud for its own sake; you're just trying to improve the odds that this thing runs and keeps running in the face of failures. But it's not really about running many copies of the same service, it's about the consistency of the service: the service is going to be there even when things start to go wrong. You expect things to start to go wrong, and you need a way of dealing with that before it happens and before your customers get disrupted.

So what's cloud HA? Well, in clouds, VMs die. We all expect that. They're not supposed to live forever, and clouds make no guarantees about how long virtual machines will live. In particular, if a server dies and it happens to be running your virtual machine, then your virtual machine will go down; there's nothing that prevents that from happening. So you have to expect it, and even if a cloud is nominally HA, virtual machines are going to die sooner or later. If you spend a lot of money on hardware then presumably it will be later, but they will still die. That's just an inevitable consequence.
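In the spirit of what the ESC does — watch a VM and replace it when it stops doing its job — here is a minimal sketch using openstacksdk. The cloud name, image, flavor, network and health-check URL are all assumptions for illustration; a real lifecycle manager would be considerably more careful.

```python
import time
import urllib.request

import openstack

conn = openstack.connect(cloud='isp-cloud')          # entry in clouds.yaml (assumed)

def healthy(url, timeout=5):
    # The real check is service-specific; a simple HTTP probe stands in here.
    try:
        return urllib.request.urlopen(url, timeout=timeout).status == 200
    except Exception:
        return False

def respawn(name):
    # Kill the misbehaving instance and boot a fresh copy of the VNF image.
    old = conn.compute.find_server(name)
    if old:
        conn.compute.delete_server(old)
        conn.compute.wait_for_delete(old)
    server = conn.compute.create_server(
        name=name,
        image_id=conn.compute.find_image('vnf-router').id,
        flavor_id=conn.compute.find_flavor('vnf.large').id,
        networks=[{'uuid': conn.network.find_network('vpn-hub-net').id}])
    return conn.compute.wait_for_server(server)

while True:                                           # simple monitor loop
    if not healthy('http://203.0.113.10/health'):     # hypothetical monitor address
        respawn('cloud-vpn-hub-1')
    time.sleep(10)
```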
Hardware in clouds dies, and that hardware may not just be your compute nodes; it can also be your control nodes. You need to deal with the consequences of hardware dying, and usually that involves running many copies of the processes that comprise OpenStack's control layer. So you're looking at a redundant database, redundant RabbitMQ, many copies of the Nova API and so on, to make sure that if one piece of hardware dies, something else will take over its job and won't forget what's been happening. Certainly with OpenStack, you don't want to be forgetting that someone's asked you to run a virtual machine, for instance.

Upgrades will cause disruption, inevitably, I'm afraid. What happens when you're doing an upgrade is that you will have an outage of some variety. It might be three seconds, depending on how you set things up, or it might be five minutes. If you're really unlucky and everything goes horribly wrong it will be a lot longer than five minutes, but that's hopefully not going to happen. The point is that upgrades will cause some disruption, and if your service — or the thing using your cloud — can't cope with that level of disruption, then you're going to run into some problems. So in summary, be careful what you ask for when you're talking about cloud HA, specifically the OpenStack part of things. You can get HA, you can get more HA; you can make this thing very, very robust, nearly bulletproof. You can make it run in a five-nines data center and so on, but the more you ask for, the more it will cost you, and in the way of these things the incremental cost gets bigger and bigger while you're getting less and less for it as you go up the tree.

The next thing is application HA. To give you an example, take a standard web application within a cloud, because it's the boring example that everybody uses. Typically you have web applications serving the customer, making the pages. They get the most load and they're stateless: I can make another one to absorb more load, or reduce the number when the load drops. If I were to lose one of those, that's probably not the end of the world; the load balancer makes sure the traffic is going equally to all of them. At the back end I have some sort of redundant database. We can debate cloud application models all we like, but I'm just using this as one example. The idea is that if I'm doing HA and a web app dies, then the load balancer takes care of that. It notices that the web app is no longer responding to requests, or maybe isn't even accepting connections, and takes it out of the pool. So it will stop trying to forward requests to it, and nobody will see the disruption — maybe one person, if you're particularly unlucky.
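As a minimal sketch of that pattern — a load balancer probing stateless back ends and quietly dropping any that stop answering — something like the following; the backend addresses and probe path are made up.

```python
import urllib.request

BACKENDS = ['http://10.0.0.11', 'http://10.0.0.12', 'http://10.0.0.13']

def probe(base_url, timeout=2):
    # A backend is 'up' if its health URL answers 200 within the timeout.
    try:
        return urllib.request.urlopen(base_url + '/health',
                                      timeout=timeout).status == 200
    except Exception:
        return False

def healthy_pool():
    # Requests are only ever forwarded to members that passed the last probe.
    return [b for b in BACKENDS if probe(b)]

print(healthy_pool())
```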
If a database server dies, then database HA takes care of it. Now, there are different models for database HA, but nevertheless the idea is that I lose one server out of the database and I've not lost anything in terms of the application. And replacing components is not necessarily a fast operation. You have to take them out of circulation quickly, but you've got time to get them back in, because nobody's missing them. Your load increases a bit, you compensate for that, but yes, you've got some time to replace them.

The reason I bring cloud HA up at all is that the people I work with think in terms of network HA, and network HA is very different. In network HA we have a bunch of routers with a bunch of networks in between them. People tend to think in a number of patterns — this being one example — of how to make sure that no single node and no single link is a single point of failure. If I lose a link or a node, everything will keep running just fine. Again, if I lose a node then I need to stop sending traffic to that node, or if I lose a link I need to stop sending traffic down that link, and there are routing protocols to make that kind of failure response very, very fast. I've always got some idea of where else I could send the traffic; if a link's gone down, something's got to detect that as well, and then I re-route my traffic to go a different way to its destination. Replacing the failed component, though — at the very least that usually involves rebooting something, but more often than not it involves actually replacing a physical piece of hardware, which could be as much as a truck roll depending on how things are going. It certainly involves getting a piece of hardware out of the cupboard and fiddling with it; it's not instantaneous.

So in summary, there are some common elements here. Firstly, fixing the failure fast is absolutely key, with whatever's left to you after the failure has happened. Repairing the failure before another one happens is also key, because while we talk about single points of failure, multiple points of failure are incredibly rare if your single points of failure are uncommon. You know your probabilities: you multiply two small numbers together and you get a number that almost never happens, is the idea. So yes, you get the repair done, because that window is an opportunity for a second failure, but it's usually not a very big window even if it lasts into hours.

So how reliable does your cloud have to be? The question here is, how do you specify how reliable you want your cloud to be in order that you can deliver your service? How often do virtual machines die? That's a question you can ask. You can then say, well, that's too frequent, I can do something about that — but it may not be the only thing you want to fix. How often does the cloud API stop responding, and how long is it out? An obvious question, because when virtual machines die, the first thing I'll do is try to run another one, and if I can't run another one, I've got problems. What if the whole cloud disappears off the network? Not something that OpenStack can solve for you in general, but this is a thing that can happen. It's happened to Amazon enough times in the past that a whole region just drops off for a couple of hours. It's a thing you need to consider as a failure case. You can say, well, it's not going to happen very often, but it might happen. Depending on how frequent it is and how big the disruption, you might care to have a mitigation strategy.
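To put illustrative numbers on the "multiply two small numbers together" point: if a single VM is unavailable 0.1% of the time, two independently failing redundant VMs are both down for only about half a minute a year. The figures are invented for the example.

```python
single_vm_unavailability = 0.001            # i.e. each VM is 99.9% available
both_down = single_vm_unavailability ** 2   # assumes the failures are independent
seconds_per_year = 365 * 24 * 3600

print(f"Probability both are down at once: {both_down:.1e}")        # 1.0e-06
print(f"Expected overlap per year: {both_down * seconds_per_year:.0f} seconds")  # ~32
```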
So, what really matters? It doesn't matter if you lose a component if your service is still deliverable — the customer doesn't see it, and that's fine. What matters is if you can't deliver the service. If something happens that will disrupt your end customer, and you can't make it less frequent, then find a way of making it less severe. Cutting somebody's internet service off for five minutes is bad, because they'll be on the phone asking why their internet service has gone. Resetting a TCP connection, resetting a web request, is not bad, because they'll hit the refresh button and everything will be fine. So do your maths on the frequency of failure and the cost of that failure in terms of impact to the end users, and then decide — and there is literal maths here: you can work with probability, with how much it will matter to them and how much it will cost you if it happens — and work out which are the most important things you have to solve, and which are things you could solve but that are just not worth the effort, not worth your time to deal with.

Another point that came up with designing NFV applications is, obviously, performance. We talk about this over and over again in NFV. We're seriously considering putting really quite heavyweight applications into a cloud and then letting it run them for us. Now, not all VNFs are the same. Network functions vary: some of them are compute intensive, a few of them are even storage intensive, but commonly the problem you have is, I can't move my packets around this network or through this virtual machine fast enough. We're network engineers; that does tend to happen.

Of the bottlenecks that affect that, the first one is switching, and it's almost invariably the software switch on your server that matters to you. In fact, the open source infrastructure that you can use — OVS and Linux bridge equally — will move 10 gigabits of traffic quite happily. So if you happen to have a server, and you're lucky enough to have a 10 gigabit connection on that server, you'll find you can fill it even from a single virtual machine if you put your mind to it. But when you're filling it, you tend to find you're filling it with very big packets. I run iperf or whatever; I'm running TCP, and TCP is clever: it will expand the size of the packet until it's as big as it can possibly be over that link. So you're running the biggest packet that link will take — 1500 bytes is the common maximum — and you get 10 gigabits. Fiddle with things so that the packet size comes down even a relatively small amount, say to about a thousand bytes, and I often find that OVS won't deliver the performance I'm looking for. If I come down to a fairly standard mix of the traffic you find on the internet, which averages more around the 200-byte mark, then I'm really struggling to deliver anything like 10 gigabits through that kind of switch. So if I'm forwarding internet-style traffic, I run into that problem.
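The arithmetic behind that is simple: at a fixed 10 Gbit/s line rate, the packet rate the switch has to sustain explodes as packets shrink. A quick back-of-the-envelope calculation (each Ethernet frame also costs roughly 20 bytes of preamble and inter-frame gap on the wire):

```python
LINE_RATE_BPS = 10e9          # 10 gigabit Ethernet
PER_FRAME_OVERHEAD = 20       # preamble + inter-frame gap, in bytes

for frame_bytes in (1518, 200, 64):
    pps = LINE_RATE_BPS / ((frame_bytes + PER_FRAME_OVERHEAD) * 8)
    print(f"{frame_bytes:5d}-byte frames: {pps / 1e6:5.2f} million packets/second")

#  1518-byte frames:  0.81 million packets/second
#   200-byte frames:  5.68 million packets/second
#    64-byte frames: 14.88 million packets/second
```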
Now, there are plenty of options for how we switch packets in the hypervisor, and if you look around and talk to people here, you'll certainly find plenty of people offering you solutions. It just happens that our preference is user-space switching with DPDK, which means that you take the responsibility for moving packets away from the kernel, so it's no longer the kernel — neither the kernel OVS data path nor the Linux bridge — that's doing the work. You use a very optimized layer of device drivers, which is what DPDK basically comes down to, plus it optimizes the way in which you're using your compute resources, and you move packets around on that. That gives you, generally speaking, a significant enhancement in switching performance. We've certainly seen 64-byte packets at 10 gigabits in the right circumstances. It varies a little depending on your setup, but that's one example.

Another bottleneck you run into is getting those packets into and out of the virtual machine, and here's a quick run through history. Originally, we used to emulate real physical network interfaces, and that was bad. It really didn't work, for obvious reasons: you're not trying to be efficient, you're just trying to replicate what used to exist so that everything works without changes. Then over the years we came up with virtio, which pretends to be a physical interface but isn't really; it's a compromise interface. The kernel you're running inside the virtual machine — be it Linux, be it Windows — understands how to talk to a virtio device, and QEMU, or whatever you're using as your hypervisor, understands how to provide a virtio device. They talk a much simpler protocol, if you like, between each other, and they can move packets considerably quicker. Then we took that a step further. The problem you run into there is getting packets into the QEMU process, if you're using QEMU, so that it can then hand them off quickly to the VM. So we moved to something called vhost, and that gets you a direct pipeline all the way from the kernel to the virtual machine — a lot fewer packet copies and also fewer context switches as your process runs, which improves matters. The latest thing to come out, which is in QEMU 2.2 — practically speaking it works in 2.1, but not as well as it should, apparently — is vhost-user. vhost-user is an interface from user-space processes to virtual machines in a similar vein. It appears as a standard virtio interface to the VM, but you're moving packets very quickly from another process into the VM. So going back to the DPDK data plane I mentioned a moment ago, you can move your packets from that process into the virtual machine much quicker with that particular technology.

And finally, the third bottleneck we see is actually in the virtual machine. You can optimize the movement of packets all you like, but if the thing that you're running in the virtual machine to implement the service is not very fast, you get nowhere. There are various things that work there. DPDK data planes, even within the VM, improve performance: it's not talking to physical hardware anymore, it's talking to virtual hardware, but you're improving things by doing fewer copies of packets, for instance, to improve the performance of the virtual machine itself. The other thing you can do is reduce the number of interrupts into that virtual machine. In general, when a packet arrives, you get an interrupt: the packet's there, and you need to go and do something about the packet that's just arrived. If you happen to be running in a virtual machine, then interrupting the virtual machine generally means you have to escape the VM, do a bit of work in QEMU, and then go back into the virtual machine again. Every time you escape the virtual machine there is what's called a context switch, and it's not fast.
You want to avoid it; it basically makes things run slower than you might reasonably expect when you wrote the software in the first place. So it turns out that reducing interrupts is beneficial on hardware, and it's actually also very beneficial when you're running that software in VMs.

On future directions, one of the things that we've been thinking about — it varies a little, it's not a straightforward answer — is simpler virtual machines. At the moment, I think we and plenty of other network vendors offer technologies where we're using what we've already got. So we're using full-on, full-service routers that deliver every feature you could possibly imagine. One thing you can do is reduce that down to absolutely only the thing that you need to do, and theoretically you can get a speed boost from that in some cases. But it's a balance between development time and being able to deliver a service. And also, if you find that you're running many, many virtual machines for that purpose, one after another after another, then the overhead of running packets through all those VMs can cost you more than the benefit of having simple VMs. So there are trade-offs to be had there.

Another point that we found, with our engineers at least, is education. It sounds a little weird, but the problems that you run into writing NFV applications are not always software problems. Sometimes it's just getting people to understand how they have to deal with virtualized systems and with clouds. Clouds are new enough, it would seem, that a lot of the people I work with are mentally adjusted to the idea that they're running things in virtual machines now, but not terribly well adjusted to the idea that clouds insulate them completely from the hardware, so they don't get to do the things they're used to doing. The next slide gives some ideas of what I mean. NFV applications generally have to meet a service level agreement — you have to do at least this well; that's a minimum requirement for delivering to a customer. Application designers realize that, and they say, oh, I could be just that little bit more efficient if only you would let me run on that particular machine over there, so just let me do that: I'll tell you which machine I want to run on, that would be great. The OpenStack interface is obviously not that. The OpenStack interface is: you've got a virtual machine you want to run, but you can't tell me where to run it unless you happen to be an administrator of the cloud. You tell me what you want, not where to run it, and I will deliver what you ask for. So it's all constraint-based in OpenStack. Getting application designers to shift their mental model to constraint-based thinking has certainly been an entertainment, let's put it that way. But I think the other part of this is that the constraints we offer are not necessarily the ones they want. Today I can say I want a certain sort of CPU, I want a certain sort of memory. I can even do relatively more complicated things now, like saying I need my memory to be on the same NUMA node as the CPU that I'm using.
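For what that looks like today, those constraints go into flavor extra specs and the scheduler decides where the VM lands. A sketch with openstacksdk — the cloud entry, flavor name and sizes are arbitrary, and creating flavors normally needs an administrative account:

```python
import openstack

conn = openstack.connect(cloud='isp-cloud')   # clouds.yaml entry (assumed)

# Ask for properties, not a particular machine: the scheduler picks a host
# that can satisfy these constraints.
flavor = conn.create_flavor('vnf.pinned', ram=16384, vcpus=8, disk=40)
conn.set_flavor_specs(flavor.id, {
    'hw:cpu_policy': 'dedicated',   # pin vCPUs to dedicated host cores
    'hw:numa_nodes': '1',           # keep CPUs and memory on one NUMA node
    'hw:mem_page_size': 'large',    # hugepage-backed guest memory
})
```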
But it goes beyond that. In the future they're going to want things like: I would like the latency between these two ports on the same network to be less than so much — which implies a certain degree of co-location, but also a certain understanding of the hardware, to say this is close enough and this is a little bit too far away. So I think as we go on into the future, we'll find that we need to express more constraints of the form: for this application to deliver what it needs to deliver, it needs to be running in these circumstances.

Then there are a few techniques that we've been talking about internally. One is in-cloud redundancy. Going back to that first comment about redundancy, there are several models that work, and it turns out, in fact, that the network engineering models of redundancy work very well in clouds. I can run two routers, I can make them do the same job, I can make it so that if one dies the other one takes over, or I can sometimes even make them balance load between them, and that works just great. So you can use the network technologies you understand and get them to work in the cloud. Alternatively, you can think about how a cloud developer — a web developer or whatever — would think about writing their application, and you can borrow the techniques that they've learned as well. The important thing here is to understand the options available to you.

Geo-redundancy also comes into this. In fact, with the cloud VPN application, going back to what we were originally talking about, what we found is that we can do in-cloud redundancy for the network services we want to offer to deliver cloud VPN, but we actually get on better if we do geo-redundancy. Instead of running two virtual machines in the same cloud, we use two clouds and run separate VNFs, one in each, and the customer connects to both. The customer is then proof against pretty much every sort of network failure you could reasonably expect in a service provider network, and also proof against that whole cloud just going away one day because the power's gone out, or somebody's backed a truck into the cabinet that happens to contain it, or whatever. So geo-redundancy is very useful, and again it tends towards network engineering technology for solving problems.

Finally, repairing problems locally is key. You want orchestration that runs within the cloud to repair the VMs that run in that cloud — solve locally the problems that happen locally. It gives you a very tight little control loop if you're talking to a local cloud controller running local VMs, which you monitor across a local link. You can pick up problems very quickly and get them fixed very quickly if you do it right. But obviously you can't fix every single problem, so you have a path of escalation up the chain. If I can't fix a problem in a data center, then I have an orchestrator that works across data centers. I tell it something's gone wrong; it says, I'll just throw that data center away, I'll run something over there instead, and then we don't have problems with the cloud that seems to be causing the issues. Eventually that escalation chain gets up to an actual human being, who will, for instance, go out and ship a new server to a cloud which has had one too many failures, or whatever. But the pattern you should always follow is a control loop where you sit there and you monitor and you repair, and if you can't fix it, you escalate to someone higher up the chain.
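A skeleton of that monitor/repair/escalate pattern, just to show the shape of it. The local repair step would be something like the respawn routine sketched earlier, and the escalation endpoint is entirely hypothetical:

```python
import json
import time
import urllib.request

MAX_LOCAL_ATTEMPTS = 3

def local_repair(vm_name):
    # Fix what can be fixed inside this data centre, e.g. kill and respawn the VM.
    pass

def escalate(vm_name, reason):
    # Hand the problem to the cross-data-centre orchestrator (invented API).
    body = json.dumps({'vm': vm_name, 'reason': reason}).encode()
    urllib.request.urlopen('https://global-orchestrator.example/escalations', data=body)

def control_loop(vm_name, healthy):
    failures = 0
    while True:
        if healthy(vm_name):
            failures = 0
        else:
            failures += 1
            if failures <= MAX_LOCAL_ATTEMPTS:
                local_repair(vm_name)
            else:
                escalate(vm_name, f'local repair failed {failures} times')
                failures = 0
        time.sleep(10)
```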
And that's a quick briefing on what we've found as we've been working with cloud VPNs. I'd love to hear if anyone's got any questions on what I've just explained, or indeed any comments from your own experiences. So I open the floor to anybody who'd like to speak.

Yep. So you mentioned the idea of reducing interrupts. Have you looked at doing something on the hypervisor side to reduce the number of interrupts — just emulating the interrupt, or something like that, in the virtual device?

Yeah, I mean, honestly, no. And the simple reason is that if you get the driver within the VM to operate in polling mode — so it's basically sitting there seeing if another packet's turned up — then fine, it eats CPU; it usually eats a whole CPU dedicated to it. But on the other hand, worst case, you're kind of expecting to take a CPU, or however many CPUs' worth of resource, at your highest load anyway. So what we find is that just kicking VMs into polling mode, letting them use more CPU than they maybe need to, works better. I imagine there's a trade-off there; you could certainly see that you're using more power that way, because you're running up the CPU load. But yeah, that works for us. I'm sure there are other solutions.

Yes, actually it is. To be fair, that diagram minimizes the complexity a little. There are elements there to make sure that the orchestrator is highly available — it's got redundancy, for the simple reason that you don't want the orchestrator to fail either. And there are some elements to spread the load as well, because as you scale up to enough customers, you start to care whether the orchestrator can keep up. In our instance we're using NCS, which comes from Tail-f. It's got device drivers for really quite a lot of physical devices — not only Cisco devices, actually, it's across the board — and we also find it's very good for controlling virtual devices. We tend to favor NETCONF and YANG as our interface, and it's got a good model for that. And you can do both things at the same time, so we find that we're programming both the provider edge and the VNF running in the data center at the same time with the same tool. It doesn't interface with OpenStack directly; the ESC in there is the thing that calls down to OpenStack. The orchestrator tells the ESC it would like to run a service, and the ESC makes sure the service is running and keeps it running until it's told that the service should stop. So we've taken a two-level approach for that.

Hey Ian, we're talking about virtualizing network functions. In a traditional physical network deployment, you would have availability models for network services that are traditionally based on active/standby kinds of models, where you would have instant failover. Can you talk a little bit about your experience with that type of availability and any challenges you had in the decomposition of the applications?

So actually, in our experience you have two availability models as you're running network services in a cloud. One is active/standby, and it works fine — but there are some big buts here, right? You normally use a protocol like VRRP between your interfaces. When you're doing that, basically both interfaces have an address, and then you have a third address, which is the service address, which migrates between them depending on which one is active at any given moment. And the problem with that is that OpenStack's firewalling rules absolutely do not like it. You've got some options. One is the allowed address pairs extension, with which you should be able to say: by the way, these two ports here have an extra address on them, everything's good, just let the packets through. The other is that we added port security in Kilo, which allows you to completely disable all the address-related automatic security rules on those ports. But unless you do something about it, OpenStack pretty much prevents you from doing the useful stuff. Interestingly, it also turns out that OpenStack prevents you from passing packets on to a third VM, because you can only receive packets that are addressed to you, which means that pretty much any kind of routing or packet transit doesn't work — so you normally need to do that kind of thing anyway; it's just another knock-on effect. And for what it's worth, there's actually another model, which is that you can do load balancing between multiple VMs. Normally you do that for HTTP, and OpenStack has support for that, but there are other ways you can do load balancing, including ECMP, which lets you do load balancing effectively at the packet-by-packet level across multiple service instances.
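For what those two workarounds look like in practice, here's a hedged sketch using openstacksdk; the port IDs and the shared VRRP address are made up, and a port's security groups have to be cleared before port security can be switched off.

```python
import openstack

conn = openstack.connect(cloud='isp-cloud')        # clouds.yaml entry (assumed)

VIP = '10.0.0.100'    # the floating VRRP service address shared by the two routers

# Option 1: allowed address pairs -- tell Neutron these ports may also
# send and receive traffic for the shared service address.
for port_id in ('PORT-A-UUID', 'PORT-B-UUID'):
    conn.network.update_port(port_id,
                             allowed_address_pairs=[{'ip_address': VIP}])

# Option 2 (Kilo onwards): disable port security on the port entirely.
conn.network.update_port('PORT-A-UUID',
                         security_group_ids=[],
                         port_security_enabled=False)
```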
I have a question on two elements I see you did not touch upon. One is offloading into the hardware, like SR-IOV — and Cavium has some stuff available. Can you explain something about your experience with that?

Okay, well, SR-IOV is interesting, actually. It's quite effective — surprisingly, it's maybe most effective because you get the benefit of using DMA. When you're putting the packet into the virtual machine, the packet's copied there directly from the hardware and it never goes anywhere else. Normally you would have to land the packet in one place, look at it and say, I know where that's going, and then copy it into the virtual machine. If you're using SR-IOV, you can land it there directly, which is certainly an advantage, and you can certainly improve your packet rates doing that. You end up with interesting problems, though, not least of which is that if you're passing packets from one VM to the next on the same host, you actually become limited by the bandwidth of the DMA, which is typically 10 gigs, which is annoying. And also — although this varies a little — at least the common NICs you get around the place like to separate traffic by VLANs, which is a little disappointing; there are better, more scalable ways of moving traffic around networks than VLANs.

I forget what your other question was, sorry — oh, Cavium, wasn't it? Yes, sorry. So what that amounts to is that it's not the only thing you can offload. You can offload some network work into hardware; you can also offload things like encryption into hardware, and on rare occasions, even with network packets, you can make use of the GPU as well. Now, some of those are quite straightforward PCI pass-through problems: you take a PCI card and you hand all of it, or a chunk of it, to your VM, and let it do with it what it will. Then there are the ones where we're talking about pushing network functions into, if you like, the fabric between VMs. I've heard a few proposals about how that could be enabled, but I don't think we've really settled on either what the functions are or the standards between them yet. I'll be interested to see how that develops.
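For reference, this is roughly how an SR-IOV interface is requested in OpenStack: a Neutron port created with VNIC type 'direct' and a VM booted with that port. A sketch with openstacksdk, using invented names and assuming a cloud whose operator has configured SR-IOV-capable NICs:

```python
import openstack

conn = openstack.connect(cloud='isp-cloud')

# A port bound as 'direct' asks Neutron/Nova for an SR-IOV virtual function
# on whatever host the scheduler picks.
net = conn.network.find_network('provider-vlan-100')
port = conn.network.create_port(network_id=net.id,
                                binding_vnic_type='direct')

server = conn.compute.create_server(
    name='vnf-with-sriov',
    image_id=conn.compute.find_image('vnf-router').id,
    flavor_id=conn.compute.find_flavor('vnf.large').id,
    networks=[{'port': port.id}])
conn.compute.wait_for_server(server)
```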
Yeah, I mean, one of the concerns I almost always hear is about the mobility of such things, because it's so tied to the hardware. So what are the benefits of actually using that for NFV?

Yeah, so, well, mobility is interesting. Generally — and again, this varies by VNF — if you're maxing out a VNF, if you've put a function in your cloud thinking I'm using all the resources allocated to me, then you're not looking at live migration anyway. You don't want that machine to move, because you'll get a performance hit when it happens and it can't keep its SLA as it goes. But you've got high availability, so what you can do is kill it and let it respawn somewhere else, and that gets you migration. Now, when you do that — the kill and respawn, or if you just spin up another instance — then your constraints say: I will be needing this function from the network, only put me somewhere where this function exists. So in some respects that solves the problem. There will come a case where there just isn't any more of that function available, and then, absolutely, you're kind of stuck, because you've maxed out the capacity of one element of your cloud. But in general you can make that work.

In a cloud that has a diverse set of servers, each with different sets of hardware acceleration capabilities, I imagine it gets very complex — the combination of, well, you know, that server has SR-IOV but not DPDK, and that other server has something else but not the other thing, and this VNF works better with this one, and the people who are submitting VNFs have characterized them only in certain configurations. So is it likely that most providers will end up with the lowest common denominator, with hardware acceleration almost disabled? Can you speak to that?

It's an interesting question. I mean, even the minimal case arises: I like SR-IOV, but I want two interfaces, so I have to find a card that's got two virtual interfaces I can have, and even that can be a problem in some circumstances. I don't think it necessarily forces you to the lowest common denominator. I don't actually know how this is going to work out, to be perfectly honest; it'll be interesting to see. But what is absolutely clear is that the scheduler we have today, which is part of Nova, is scheduling on things that are no longer solely Nova's responsibility. We're starting to think about, you know, available PCI cards for pass-through, which is loosely, kind of, sort of Nova's concern. But if we start moving on to available network elements within the fabric, or anything else — latency, any of these sorts of problems — then that scheduler is going to need to know more. I make no judgment of its intelligence, but it's absolutely going to need more base information to work with than it currently gets. So I think that will be the sign that we're drifting in the direction of multiple interesting functions to work with.

It also sort of works counter to a cloud, where everything is kind of fungible, if you like, or roughly equivalent. If you're saying I want a specialized network service, and there's a limited capacity for that network service within the cloud that you're using, then eventually you'll just exhaust that capacity, which is, you know, annoying — though no worse really, in some respects, than when you run out of CPUs but you've still got memory left. You've got exactly the same problem.
The more complex you make the thing it has to solve, the more likely you are to have some spare amount of one resource while you've used up all the others, so you can't run anything. Okay?

Just a quick question on the hardware piece. How do you envision, or what's your experience with, mixing hardware in a cloud environment, and how do you roll it out and operate it?

Well, it's interesting. I have to admit that I'm not on the production cloud side of Cisco, but in general I don't think we've seen much of a problem. We've done some work on the Neutron coding side of things, and — again coming back to SR-IOV, because it's a convenient point to start from — one of the things we find when writing software, writing extra bits of OpenStack, is that each element needs to know and report for itself what it's got, because you add compute nodes simply by spinning a compute node up and saying: that's the controller you need to be talking to. It reports in and says, these are the resources I have. So in much the same way as it says, I've got 16 cores and 256 gigs of RAM, it says, and I've got an SR-IOV card, and I can give you 64 free virtual NICs if you want me to. Theoretically, as long as your servers are doing that — and again, they report their CPU type and everything else, and they report it up so that the scheduler can make use of it — then you can mix and match as you see fit. As I say, the problem comes from an operating perspective: you run out of one resource when you've got plenty of some other resource left over. But there's nothing fundamental that stops you doing it.

Okay, I had another question after all. Did you see any need for multi-queue virtual interfaces in the work that you were doing? I guess if you're running DPDK, maybe if one core is not enough for handling the traffic, then you might have a use case for that.

The answer is, I don't know. I know roughly what that DPDK application is doing, but not the detail of it, so I don't know whether they actually put multi-queue to work.

It's not supported by OpenStack at the moment. It's supported by the hypervisor.

Well, in that case, they haven't come and asked me, so I would guess no, we haven't used multi-queue yet, but that's a fair point. Yeah, now you mention it, I remember the patches, because there was a lot of debate about whether we should be putting that patch in or not, about two cycles ago. But no, it hasn't come up — I will have to go back and ask them about that, actually. Thank you. Anybody else? Okay, well, in that case, thank you very much for your time.