Okay, so hello everyone, welcome to the last networking seminar for the year. Today's speaker is Aditya Akella, who is an associate professor at the University of Wisconsin-Madison. He got his PhD from CMU back in 2005, and before going to Madison he spent a year here, where he was part of the Ethane team along with Martin and Nick and other people. His research topics range from software-defined networking to network functions; lately he's also been working on video quality of experience and network architecture in general. Today's talk is on the crossroads between SDN and network functions, and on a joint control platform that helps get the benefits of both. Welcome.

Thank you. So I'll describe the system we've been building over the past year and a half or so, called OpenNF. It's a system that allows you to exercise joint control over network function (middlebox) state and network forwarding state, and I'll talk about why that's important and how we achieve it. This is work done by a bunch of my students at Wisconsin; Aaron Gember was the lead student on the design of this system.

So, network functions are also called middleboxes. Let me quickly walk through what these things are. Traditional networks have routers and switches, and they do fairly simplistic things, often just forwarding packets, sometimes quality of service and rate limiting and so on. But often network operators need a lot more functionality out of their networks, for security purposes or compliance or performance, and middleboxes, or network functions, are the devices that help them bridge this gap. They are the way in which operators can introduce custom packet processing functions into their otherwise very simplistic networks. There are many of these things: load balancers, firewalls, caching proxies, gateways of various kinds, WAN optimizers, intrusion detection systems, traffic scrubbers, what have you. And the key thing is that, compared to their routing and switching counterparts, these devices often do fairly intricate stateful processing of the packets they see; they do fairly intricate bookkeeping in terms of per-flow state, what state a connection was in, what state a session was in.

So why are these things interesting? Well, these are not arcane devices of which you'll find just one in a network. Here's a picture from a paper published in SIGCOMM a couple of years ago that did an analysis of the different kinds of devices in about 55 different enterprises. The key takeaway from this picture is that in the enterprises that were studied, there are at least as many middleboxes as there are routers and switches. And it's not just in enterprises that you see these devices: increasingly there are studies showing that these kinds of special packet processing devices are also common in cellular networks and ISP networks, and as new applications arise, new devices come into play, or new threats emerge, new middleboxes get thrown into the mix. This is a rapidly growing, very diverse market.

So, the state of the art in these network functions and middleboxes: there are a couple of different trends driving this. The first goes by the term network functions virtualization, or NFV. Broadly, what's happening here is that these devices, which were traditionally deployed as hardware appliances, are increasingly being replaced by software, by virtual machines that package the functions that otherwise used to be implemented in hardware.
The advantages here are that, compared to their hardware counterparts, they cost much less, they are somewhat easier to manage, and they can be operated much more flexibly. Some recent work that appeared in NSDI last year shows that it's very easy to provision these things: you can spin up new virtual machines that have middlebox functionality in as little as 30 milliseconds. So it's very easy to bring these things up, operate them, move them around, and so on.

The second trend is the use of software-defined networking to string these functions together in a network. Traditionally, these hardware appliances were deployed at choke points within the network, and operators had to cajole distributed routing protocols into forcing specific traffic subsets to go through those choke points to be processed. With SDN, you can actually decouple these devices from their physical location within the network, deploy them wherever is convenient, and then steer the appropriate traffic subsets through them. This has a lot of interesting consequences. You no longer have single points of failure or single points of congestion because of this decoupling, so you get better performance. But you can also do interesting things like chaining: you can deploy different kinds of middleboxes at different locations within the network and then use SDN to steer traffic through them in the appropriate sequences. So this is sort of the state of the art in network functions today.

There's another kind of NFV approach, which I'd call sharding, which has been advocated by Nicira and others. It basically takes something that would normally be a middlebox and splits the processing up, usually into processing that happens at the edge of the network, possibly in every VM or hypervisor. So it's sort of sharding it by server, splitting it up and pushing it to the edge. In that case, the middlebox itself may not really exist anymore; maybe it's an algorithm in the controller, or maybe it's just a lot of processing that happens in hypervisors, in the VMs themselves, or in the virtual switch. Is that a combination of these two, or is it a third approach, or does it fall into either of those categories?

Yeah, that's a good one. It's not exactly the second, because you slice the function up; it's leveraged by SDN, but it's not traffic steering, it's sharding. I think it's a third approach, or it may be something in between. This slide is not meant to be exhaustive in any way; it's just to set up the rest of the talk. That's absolutely one interesting way in which network functions can be instantiated, built up from the underlying infrastructure. What I'm going to describe may have applications for that, but I'd have to think really hard about what the trade-offs might be. For the purposes of this talk, let's assume that this is the setup.

All right, so this talk is about what I call distributed processing. This is something that's enabled by SDN, in particular in combination with NFV. Here we're basically talking about services and abstractions where you may want to dynamically reallocate the processing of traffic across different NF instances. The simplest example that comes to mind is something like load balancing, where you have two different instances of a middlebox.
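Just to make the steering part of this concrete, here is a toy, purely illustrative sketch of the kind of forwarding updates involved when you peel a subset of traffic off one instance and send it to another; install_rule is a hypothetical stand-in for whatever rule-push API a controller exposes, not a real library call.

```python
# Toy sketch only, not OpenNF code. It shows the idea of using SDN rules to
# steer a traffic subset from one NF instance to another (it assumes the more
# specific match takes priority over the general one).

def install_rule(switch, match, out_port):
    # Placeholder for a real rule push to the switch.
    print(f"{switch}: match {match} -> port {out_port}")

# Initially, all port-80 flows go to IDS instance 1 (reachable via switch port 1).
install_rule("s1", {"tcp_dst": 80}, out_port=1)

# Instance 1 runs hot: peel off one source prefix and steer it to instance 2.
install_rule("s1", {"tcp_dst": 80, "ip_src": "10.0.1.0/24"}, out_port=2)
```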
When one of them runs hot, you basically use SDN to take some subset of the traffic being processed by it and move that to the other instance. This can give you maximal performance out of those two middlebox instances at a fixed cost. But the more interesting thing is that when you combine this with NFV, you can use this sort of distributed processing and dynamic reallocation as the basis for building interesting new abstractions.

One abstraction, for example, is an infinite-capacity, or elastically scalable, middlebox: when one middlebox runs hot, you can use NFV to quickly spin up additional instances to add capacity and steer appropriate traffic subsets to them. You can build the abstraction of an always up-to-date, always available middlebox: when one of the middleboxes is running a stale version of the software, you bring up an up-to-date version, keep it live, move traffic from the dated version to the updated one, and decommission the old one. And you can also build the abstraction of a middlebox whose functionality can be dynamically enhanced. Suppose one is deployed inside an enterprise data center, it's processing traffic and it sees something anomalous, but it doesn't have the functionality or the resources to do further analysis; in that case you can invoke a brawnier instance of this middlebox in the cloud, take the processing for the subset of traffic for which you observed anomalies, and hand it off to that brawnier version in the cloud, which has the ability to do richer analysis of the traffic. So you can build these kinds of interesting abstractions if you're able to dynamically reallocate traffic across different instances.

What is missing, though, is that in the context of these distributed processing applications, where you have this dynamic reallocation, today there is no system that allows you to simultaneously meet SLAs, for example ensure that the deployment as a whole offers a certain minimum throughput; ensure the accuracy or efficacy of the middleboxes, so that the collection of IDS instances you're operating, as a whole, raises alerts for all HTTP flows that are known to contain malware; and keep costs low, or efficiency high, in the sense that when resources are not being used you tear them down and save on cost. You can use NFV in combination with SDN to achieve one or maybe two of these, but you cannot simply use them to achieve all of them together; you need something more than the control that NFV plus SDN gives you.

So let me give you an example. Here's a simple case where the goal is, for this chain of middleboxes, to ensure that throughput stays high; you may want to scale out middleboxes as they become bottlenecked to keep throughput high, but you also want to keep costs low. Two sets of flows go through this chain of middleboxes, and because these are stateful packet processing machines, they all establish state for these two sets of flows. Then something happens with these two sets of flows: traffic patterns change, volume increases, and one of these stateful processing entities becomes bottlenecked. In order to sustain throughput, you may decide, okay, I'm going to deploy an additional instance of this middlebox. So now you have a couple of choices.
You may say, all right, I'm just going to wait for new flows to arrive and for some of these existing flows to die out; I'll send all the new flows here, and that will even out my load and get me back to the performance I was hoping for. The problem is that these existing flows may never die out or go down in volume, and there may be no new flows that you can redirect to the new instance. So your bottleneck persists and your SLA suffers. Or you can say, all right, forget about that; we'll just take these blue flows and move them to the new instance. The problem here is that the associated state you ended up creating for these blue flows is not available at their new location. So this may cause false alarms or false negatives; you may miss some attacks because you don't have the necessary context, and that impacts the NF's efficacy. And that's the scale-out case.

In the scale-down case, similarly, when you no longer need the second instance, you may say, okay, I'm going to decommission the top instance, and the way I'm going to do it is to wait for its flows to die down; when there's no more activity happening at that instance, I'll just decommission it. But as I will show later, there are some traffic flows that can last several hundreds of seconds, and so waiting for those flows to die down can actually span several tens of minutes, if not hours, in various situations.

Can you give a sense of what types of statistics some of these things are actually keeping track of? In particular, does an intrusion prevention system make its decisions based only on per-flow statistics, or does it use more than that?

I'm actually going to go into that in a fair amount of detail, because we need to understand it to be able to control the state. I'll give you a concrete example, specifically in the context of the Bro intrusion detection system. So hold on to that. Any other questions?

Sure. By accuracy, do you mean that the intrusion prevention would simply not work correctly, or is it something more subtle?

So two things can happen. One, it may miss attacks that it would have otherwise detected if it had the combined capacity of these two middleboxes. Or it may raise false alarms because it thought something anomalous was going on. Both of those are a pain for operators to deal with, and that will hopefully become clearer with the example.

Okay. So in this case, what we really need is, in addition to moving the blue flows to the top instance, to move the blue flows' state to the flows' new location. Essentially, in order to meet all three of these goals simultaneously, what would be nice to have is a mechanism that allows you to transfer live state while updating forwarding state. OpenNF is a system built for this purpose. It enables quick and safe dynamic reallocation of processing across NF instances, and both of these objectives are important. Quick means that any reallocation decision invoked by an application, in this case elastic scale-out, can be invoked at any time and will finish predictably soon. By predictably, I mean that the amount of time it takes to finish doesn't depend in any way, shape, or form on the flow arrival patterns or flow completion patterns.
Safe here essentially means that, in the context of these distributed processing applications where we are transferring live state, we can guarantee certain key semantics for the live state transfer: while we're moving state, no state updates are missed, and state updates happen in a certain order, because these things impact the outcomes of the middleboxes' actions. This is the only system out there that can ensure these two properties. And because it can, it enables the creation of rich distributed processing applications that simultaneously allow you to meet SLAs, ensure efficiency, and ensure the efficacy of the middleboxes. So that's what OpenNF is. I'm going to go into details about what it looks like, how we manage and control state, what key ideas we use to ensure safety and liveness properties of different kinds, and I'll talk about initial results from the evaluation of our preliminary prototype.

This is roughly what OpenNF looks like schematically. There are a bunch of APIs that middleboxes implement to speak to the controller. The controller has a bunch of APIs that various applications use to invoke state transfers. Applications here are things like elastic scaling or dynamic enhancement of middlebox functionality, or whatever you may have. The applications invoke high-level reallocation operations depending on their internal logic, for example: move this set of flows from this instance to this other instance. In response, the controller exports and imports state from the different middlebox instances controlled by the application, all the while doing that in coordination with the network. So this is roughly what the system looks like.

There are a bunch of key challenges we face in realizing this system and ensuring that things happen quickly and safely. The first is that we don't want to design the system for one particular class of NFs or one particular class of devices; we want to be inclusive and be able to bring in as many different kinds of network functions as possible. One way to do this would be to create a programming model where we say all NFs have to be written according to a particular way of creating and managing state. That would make it easier for us to control the state, but it may limit the kinds of NFs we can bring into the fold, and we don't want to do that. So instead we designed a simple API where we relegate a lot of the state-gathering functionality to the NFs themselves; this will become clearer in a second.

The second, technically deeper, challenge is how we guarantee the safety properties I described. These safety properties arise because of race conditions: when state is being moved from one instance to another, packets may arrive at the state's old location, causing state updates, and those updates may be missed, may not appear at the state's new location, or may appear out of order, and the state can become inconsistent across the different locations where it lives. To handle this we developed a set of state transfer primitives that provably guarantee certain safety properties while the state is being moved.

And the third thing is that when we are moving state and trying to guarantee all these properties, we end up imposing a bunch of overhead on the middleboxes: we take some amount of CPU away from them, some amount of memory away from them.
It also creates extra overhead on the network, and we want to be able to bound that in some way so that it doesn't interfere with the objectives of the applications we are trying to realize. To do this we designed a fairly careful northbound API that allows applications to turn certain guarantees on or off and thereby control the overhead.

So let me go into what state OpenNF is actually trying to control; this addresses your earlier question about what kind of state we're talking about. The way we approached this is that we looked at a bunch of open-source middleboxes out there and examined what kind of state they create. Essentially, what we realized is that the state created or updated by a middlebox applies either to a single flow, or to a group of flows, or to all flows seen by the middlebox.

Here's an example for the Bro IDS. Bro is an intrusion detection system developed at ICSI, very widely used at a bunch of campuses. One set of state that Bro maintains is per-flow state: for every flow it processes, it maintains a connection object, and each connection object has a TCP analyzer object and an HTTP analyzer object, which keep track of TCP-level state and HTTP-session-level state. In addition, the Bro IDS also maintains multi-flow state, which is state shared across multiple flows (flow here in the traditional connection-tuple sense). One particular example is port counts, which are kept on a per-destination-host basis across all the flows connecting to that host; that's multi-flow state of the kind maintained by Bro. And finally, there's all-flow state, which is state shared across all flows seen by the instance, for example rough aggregate statistics on traffic.

So we classify state on the basis of this scope: we talk about per-flow state, multi-flow state, or all-flow state. Using the flow as a handle for referring to state is a natural way for us to reason about which state to move and which state to copy or share across middleboxes, because this is also how many middleboxes today are designed with respect to how they manage state. So these are the kinds of state we'll be dealing with: per-flow, multi-flow, or all-flow state.

So how do you distinguish between single-flow state and multi-flow state? Sorry, the question is? On the previous slide, how do you distinguish between multi-flow state and all-flow state? How do you know which flows a piece of state refers to?

Yeah, so basically an application doesn't exactly need to know what kind of state it is. In general, per-flow state is something you would want to move, right? Multi-flow state is something you would want to share. Per-flow state is defined by specifying more fields of the connection tuple than multi-flow state, which may be specified on the basis of just one of the fields, or some aggregation over them. So the application doesn't actually need to know the details. As you'll see on the next slide, it can provide a filter, and the middlebox, if it has multi-flow state corresponding to that filter, offers that state, or if it has per-flow state corresponding to that filter, offers that.

So for Bro, you just looked at the source code to figure out which state is actually per-flow and which is multi-flow? Yeah, in the middlebox, yes. We have to go and say, okay, this is multi-flow state in Bro, and this is per-flow state.
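Purely as an illustration, and not Bro's actual data structures, the three scopes look conceptually something like the following toy sketch; the field names and values are hypothetical.

```python
# Toy illustration of the three state scopes (hypothetical values, not Bro's code).

per_flow_state = {
    # keyed by the full connection tuple: connection/analyzer objects for one flow
    ("10.0.0.5", "1.2.3.4", 5432, 80, "tcp"): {
        "tcp_state": "ESTABLISHED",
        "http_session": "parsing-response",
    },
}

multi_flow_state = {
    # keyed by a single field (destination host): shared across all flows to that host
    "1.2.3.4": {"port_count": 17},
}

all_flow_state = {
    # shared across every flow this instance sees
    "total_bytes": 123456,
    "total_connections": 42,
}
```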
And we will need separate handlers for dealing with multi-flow and per-flow state in the case of Bro. But applications themselves don't have to be aware of the nuances of how the state is managed inside the middlebox.

So the middlebox is aware of it? The middlebox is aware of it. When you rewrite a middlebox to match this API, you need to know what state you are dealing with; the controller and the applications are agnostic to it.

So this taxonomy may work for Bro, but do you imagine other middleboxes that might have finer-grained state than per-flow? Absolutely. Let me go on to the next slide; we have actually built this up for a bunch of different middleboxes, and it'll become clearer how we deal with that.

Essentially, the API for exporting and importing state is get, put, and delete of F, where F is some filter that describes the state, and for each of these we have a scope: get multi-flow, get per-flow, or get all-flows. The filter is defined on packet header fields. The get multi-flow or get per-flow call, with its filter, gets handed down to the NF instance, and it is then up to the NF instance to identify and provide all the state that matches that particular filter. So if its state is much more fine-grained, it will collect everything that corresponds to that filter F and hand it off to you.

But the filter might not be defined based on flows; maybe it's based on the packet payload. Yeah, absolutely. We don't deal with filters defined on packet payloads right now; the filters we deal with are things like URLs or the connection tuple. Packet payloads are the hardest case for us. We just haven't implemented that yet; there's nothing fundamental that prevents us from being able to do it.

Can you query the state of a middlebox? Let's say there is a middlebox that keeps a log of the longest-lived flows; can you query it and ask for that? That's a great question. Or export it. So the OpenNF controller, because we want to accommodate a bunch of different middleboxes that could be implemented by different vendors, actually doesn't know what the state is. It's agnostic to the nature of the state, the structure of the state; all it does is reallocate state across different instances. An application that's designed to work with a particular set of middleboxes may know a little more about the specific state in those middleboxes and may be able to do those kinds of queries, but the middlebox needs to expose that through the API in some fashion. The controller itself is agnostic to the nature of the state.

So the controller just implements the transfer? It only does reallocation, that is, take things from here and move them over there, or copy them over there; it is up to the middlebox to expose the state.

What are the semantics of the state? As you said, a middlebox can actually be distributed across multiple instances. So are there some restrictions on what the state's physical scope should be for this interface to work? In general, the application deals with one logical middlebox and queries for state from that logical middlebox; in our current implementation, that is tied to an actual physical instance.

So what is the application here? The applications are things like the ones on this slide; they sit on top of the controller.
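As a rough illustration of the southbound interface just described, here is a hypothetical sketch of what an NF might implement; the names and signatures are illustrative, not the actual OpenNF API.

```python
# Hypothetical sketch of the NF-facing (southbound) interface. Illustrative only.

class NFStateAPI:
    def get_perflow(self, flow_filter):
        """Return (key, opaque_chunk) pairs for all per-flow state matching the
        filter. The filter is defined over packet-header fields, e.g.
        {"tcp_dst": 80}; the NF, not the controller, decides what matches."""
        raise NotImplementedError

    def get_multiflow(self, flow_filter):
        """Return state shared across the group of flows matching the filter."""
        raise NotImplementedError

    def put_perflow(self, chunks):
        """Install per-flow state chunks previously exported by some instance."""
        raise NotImplementedError

    def delete_perflow(self, flow_filter):
        """Remove per-flow state matching the filter."""
        raise NotImplementedError
```

The controller never looks inside the opaque chunks; it just shuttles them between instances.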
Coming back to the applications: at least at a logical level, they know how many instances they have deployed; elastic scaling, for instance, would track how many instances are in use. They deal with those specific instances and with moving things across them. You could build another layer of abstraction that hides all of those different instances and presents one logical NF abstraction, in which case the management functions you would be able to do on top of it might not be as fine-grained as you imagine.

No, let's say there are two Bro instances here, and they both track state for HTTP flows, and they do it in a distributed fashion based on how the network actually splits the traffic. And Bro has its own internal semantics for, hey, this destination IP is consuming this much HTTP traffic. The relevant state could actually be distributed across these two middleboxes. So is the OpenNF controller actually going to take care of getting the state from each of the middleboxes and aggregating it?

Let me get to that; the API will become a little clearer. Right now I'm just describing move. There is some state that multiple middleboxes would need to share, like some HTTP processing happening at one instance while similar processing for an overlapping set of prefixes is happening at another instance. There you would need to share the state maintained for those, and there the NF API doesn't use move; you end up doing copy, which periodically copies the state and keeps it consistent across the instances. I'll get to that later.

So basically, the point is that NFs don't need to change anything about their internal state organization. They have to be modified to identify the different scopes of state and then respond to the controller by furnishing the appropriate pieces of state, but they don't have to adhere to a specific way of organizing state or of managing and modifying it.

Let me go into the operations, which is what the applications running on top of the controller can do. A high-level application like a load balancer may decide to reallocate port 80 flows to a different instance, from this instance on the left to this instance on the right. The first thing we may want to do for this is to move the flow-specific state corresponding to port 80 to this instance. There may also be some state that is shared across different kinds of flows, like connection counters that span different ports; that state may have to be copied or shared across the different instances. Essentially, what OpenNF provides is various guarantees for these operations. It can ensure that the per-flow state for port 80 is moved in a loss-free way or in an order-preserving way, and when you're creating copies of state, it can ensure various notions of consistency over the copies: keep them strictly consistent, strongly consistent, or eventually consistent. So those are the different sets of operations the controller implements on top of the instances.

This is roughly what a move looks like. There are two instances here. The control application issues a move, like we saw in the previous example. The controller issues a get for port 80 to the first instance; it doesn't know anything about how the instance actually manages its state.
In response, the instance, depending on how it internally manages state, may send a bunch of chunks, each with some opaque identifier, to the controller. The controller, once it has received all of these, issues a delete for that state and then puts the corresponding chunks into the destination instance. Once the put returns, it updates the forwarding state so that all the flows corresponding to port 80 now go to instance 2. That is roughly what a move operation looks like at a high level.

What's a little confusing to me is what types of consistency mechanisms you have. It seems, at least, that you need to make the get and the delete a transaction, or have a lock. I'm going to come to that. The loss-freeness and order-preserving semantics that I described are essentially the safety properties you want to hold for this whole operation. For copy, there are the more traditional consistency models, like eventual or stronger consistency.

Did I misunderstand, or are you going to delete the per-flow state before you actually reroute? Let me come to that; it will become clear when I get to the race condition. And we'll hold you to that. Okay, so yes, that ordering was intentional.

So let's assume this is roughly how it works, and see how things can go wrong. I talked about these kinds of semantics; you might ask what kinds of semantics you would want to ensure for these state operations. Well, when you're moving live state, what can happen is that some packets end up at the wrong instance and their updates are lost, or they arrive in the wrong order at some instance. Why does that matter? You might say that some middleboxes would probably be robust to such things happening anyway, so why would it matter? Here's an example with Bro. Recall that Bro often operates in an off-path manner, so it only gets a copy of each packet; it's not on the path. Here, Bro runs two scripts: a vulnerable browser detection script and a weird activity detection script. These are popular scripts that exist in Bro already. What the vulnerable browser detection script does is reconstruct MD5 checksums for HTTP responses to look for malware signatures, and this kind of script is not robust to packet losses. When packets are lost because of our move, the script may see a hole in the HTTP response, and it may decide either to ignore the response or to compute a checksum over it anyway; either way you can end up with false positives or false negatives. That's why, in this particular case, we want to be robust to losses. The weird activity detection script looks at the two directions of a transfer and flags SYN and data packets being seen in an unexpected order: data is being transferred from server to client, and all of a sudden you see a SYN going the other way, and that is something Bro would flag. That kind of out-of-order sequence is exactly something our move could impose, because of the way packets get shuffled around during the transfer.

So the Bro IDS should be robust to losses anyway, because it sits off-path. Yes, absolutely. So given some amount of loss within the network, you would expect Bro to end up with some number of false positives and false negatives; we want to ensure that our move doesn't change that baseline. Given whatever output you would get in that baseline, we want to be equivalent with respect to that output.
Because once you say, well, it's robust anyway, let's just impose losses, you don't know how far away from that baseline you end up.

But is that a well-defined baseline? It's middlebox-specific. Each middlebox is designed to deal with some amount of loss or reordering of packets, and some logic around that determines how the middlebox ends up treating it. But as I will show, if you are careless, you can end up losing thousands of packets, and then it is just not clear what the middlebox's output is going to be.

So let's talk about this race condition and what exactly can happen. Here's the move operation. Packets may arrive during a move. A move is issued for the blue flows; you start moving the blue flows' state, and after it has been moved and you've deleted it, another packet arrives at the old instance and updates state there, and only then does the forwarding update happen. The problem is that the second instance is missing this update, and the first instance is also seeing state updated unexpectedly, so it may raise some sort of false alarm. Depending on how the logic is implemented, this may result in either a false positive or a false negative.

There are a couple of ways you might think of implementing this to make sure you're robust to such loss. The first thing you can say is: okay, I'm not going to keep forwarding traffic; I'm just going to freeze all traffic, make sure the state move happens, and then let the blue traffic go to the second instance. Would that work? Would that ensure safety? Well, it depends on what you need. But, I mean, you could lose packets. You could still lose packets, right, because of the packets that were in transit. And the timing's also messed up. The timing too, but let's not go there. The point is that you cannot prevent losses completely just by doing that, because the packets that were in transit at the time you stopped forwarding and started buffering may end up being dropped by the first instance. So you can't completely overcome losses this way.

But for the packets in transit, if you wait some amount of time in that situation, you can prevent this problem. Absolutely, except you don't know how long to wait. And in some of the scenarios I mentioned, like the one where you're moving processing to the cloud and you go over the WAN, it's unclear how long that instance would have to wait. There's one more out there; because Aditya has a hard stop at 1:15, let's keep the questions brief. Right, we need to keep going. OK.

So the semantics we want to offer in this particular case is this loss-free property, which means that all state updates due to packet processing should be reflected in the transferred state, and all packets the switch receives should be processed. Both of these conditions should hold simultaneously. Those of you who are familiar with the consistent updates paper from Princeton may think: why don't you just do consistent updates? If you mark packets with version numbers, that should take care of this. It turns out that that can only guarantee that a packet is processed by some switch, by some instance; it won't guarantee that the outcome of that processing is reflected where it's supposed to be. So there's a subtle reason why consistent updates alone won't work. OK.
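To make that race window concrete, here is a schematic sketch of the naive move sequence just described, reusing the hypothetical interface from the earlier sketch; the comment marks where updates get lost. This is illustrative, not the actual OpenNF code.

```python
# Schematic sketch of the naive move (hypothetical helper names, illustrative only).

def naive_move(controller, src_nf, dst_nf, switch, flow_filter):
    chunks = src_nf.get_perflow(flow_filter)   # export state for the blue flows
    src_nf.delete_perflow(flow_filter)         # the state is now gone from src_nf
    # ---- race window ---------------------------------------------------------
    # Packets for these flows are still being forwarded to src_nf here; any
    # updates they would have caused are missed, and src_nf sees "unexpected"
    # traffic for flows it no longer knows about.
    # --------------------------------------------------------------------------
    dst_nf.put_perflow(chunks)                 # import the state at the new instance
    controller.update_forwarding(switch, flow_filter, dst_nf)  # only now re-steer
```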
So the key idea is that we have these things called events, which NF instances raise and which help the controller observe, prevent, or stage updates to state as they happen. Here's how the loss-free move works. It's a slight twist on the buffering-and-stopping-traffic idea I described on the previous slide. Instead of actually buffering packets at the switch, we let the packets go down to the NF instance, and we enable events on that instance. This identifies the specific set of flows to which the events apply, and it basically says: when you see a packet for these flows, raise an event to the controller, but drop the packet locally; don't process it. So in this case, when the get and delete are issued, the state corresponding to the blue flows moves to the controller. A packet may arrive; the event handler catches it, raises an event to the controller, and the packet is not processed locally. At some point the state is put onto the destination instance and the put returns. When the put returns, you flush all the buffered events to the destination instance, for those packets to be processed there. At some later point you update forwarding, and any packet that still arrives at the original instance is caught by the event handler, sent to the controller, and eventually punted to the destination NF instance to be processed. So in this context, events allow us to let packets go all the way to the NF instance, track exactly which packets ended up at the old instance, and punt their processing to the eventual destination. Does that make sense? Future packets coming in after the forwarding update are processed by the destination instance. We can prove that this ensures a loss-free move across NF instances.

But things can be reordered. Things can be reordered, and you might still have a problem. Yes, that's the next step. Any questions about the loss-freeness, though?

I mean, it works if you assume that the controller can keep up. Yeah, yeah; this is a simplistic design, and there are ways of improving it.

So the buffering can lead to a lot of issues. First, you're sending events at a high rate to the controller for the entire filter F, and that itself imposes some overhead. Why go through the controller? Why not just do it at the NFs themselves? And what happens if you lose the controller? Sorry? What if you lose the controller? Well, sure, but that is not specific to what we have; it's an issue for SDN in general. But the first point is a good one: the middleboxes are already scaled out, right? They have plenty of capacity.

Right, so absolutely. The question is, can you have events go directly across middleboxes? The problem is that we cannot prove the safety guarantees there; we don't know whether we have caught all possible events. That's a subtle piece of detail; let me get back to it. At least as it stands now, we don't have a proof that loss-freeness can be ensured if events are sent directly between middleboxes. With the controller on the path, we can observe what's happening and ensure that the last update ends up where it's supposed to; we lose that when we leave it entirely to the NFs. It may be possible to do, but we haven't gotten there yet. Was that your question?

So all these guarantees are assuming no failures? Yeah.
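Here is a condensed, purely illustrative sketch of the loss-free move just described, written as if everything happened synchronously; the helper names are hypothetical, not the actual OpenNF code.

```python
# Illustrative sketch of a loss-free move using events (hypothetical interfaces).

def loss_free_move(controller, src_nf, dst_nf, switch, flow_filter):
    # 1. At the old instance: for matching packets, raise an event to the
    #    controller and drop the packet locally instead of processing it.
    src_nf.enable_events(flow_filter, action="drop_and_report")

    # 2. Export and delete the per-flow state for these flows.
    chunks = src_nf.get_perflow(flow_filter)
    src_nf.delete_perflow(flow_filter)

    # 3. The controller buffers the events (i.e., the packets) raised meanwhile.
    buffered_packets = controller.collect_buffered_events(flow_filter)

    # 4. Install the state at the destination, then punt the buffered packets
    #    there so their updates land on the freshly installed state.
    dst_nf.put_perflow(chunks)
    for pkt in buffered_packets:
        dst_nf.process(pkt)

    # 5. Only now update forwarding; any straggler that still reaches the old
    #    instance keeps being raised as an event and forwarded on to dst_nf.
    controller.update_forwarding(switch, flow_filter, dst_nf)
```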
Actually, it is also assuming that you can deliver events from the controller to the destination middlebox faster than events are generated at the original instance. Why? Because otherwise you will never be able to finish delivering the events you're forwarding, right? If you generate events at the original instance and you're delivering them to the new destination at a slower rate, your buffer of events keeps growing.

Yeah. So in the baseline implementation, we actually buffer all the events; when the put returns, when we know the new state is there, then we release all the events. So we assume there's sufficient buffering at the controller to do that, but that can impose arbitrary delays and it takes a lot of buffering. We have certain optimizations, which I'll get to when I talk about the evaluation, that allow us to release events as individual state chunks get put at the destination, in which case we can move events out of the controller sooner. There, we are certainly assuming that if you want to keep the size of the buffer at the controller under control, you must be able to send out events at least as fast as they are generated at the source instance.

You would really have to be faster; otherwise you'll eventually overflow. Yeah, absolutely. But at some point the forwarding update happens and the events stop, so what we're talking about is the interval between the state being moved and the forwarding update taking effect; it's a bounded amount of time. But if you imagine the pipe is full at line rate, right, once you start buffering things you'll never drain the buffer. Yeah, absolutely, you'd never drain the buffer. So for the line-rate case, let me come back to how we deal with that; with this design alone it's not obvious how we would, but hopefully it becomes clearer as we move on.

So, Bob, you raised this point about out-of-order arrivals. What could happen is this: on the slide, the controller flushed the buffered packets, that's step five, and at some point, in step six, forwarding gets updated. Some packet B2 got flushed by the controller, was sent to the switch, and the switch forwarded it to the eventual destination. Then some packet B3 came in; the forwarding state had not been updated yet, so it went to the original instance, an event got raised, and that got buffered at the controller. At some point the forwarding update kicked in, and a packet B4 went directly to the destination instance. The packet B3 buffered at the controller gets released at some later point, and arrives out of order at the destination instance. So this is an example of how out-of-order arrivals of packets can happen. And as I said on an earlier slide, Bro's weird activity script, for example, is one case that is not robust to such out-of-order arrivals. So here, what we want to ensure is that all packets are processed at the NFs in the order in which the switch, or the network, forwarded them out to either instance. That is the property we want.

I don't think I'll have time to go into the details of how we do the order-preserving move, but the essential high-level idea is that we use events again, this time to track the last packet received at the old instance, and until we know what that last packet is, we buffer all packets at the destination instance.
Once we know what the last packet received at the old instance was, we release it, have it be processed, and then let the buffered packets be processed. That is one way we can ensure order is preserved. Just a quick, very high-level overview of this: we flush all the buffered packets to the destination instance, and we create a set of events on the destination instance, which basically say, each time you get a packet, instead of dropping it, buffer it locally for later processing. After this event is enabled on the destination instance, packets are forwarded both to the original instance and to the controller, and this way we can actually track the last packet seen by the original NF instance. Once the event for that packet reaches the controller, the packet gets released, and when it has been processed you get an event back saying that the processing for that packet is done. Until you get that ack, any future packet coming in gets buffered locally at the destination; once you get the ack that packet B3 has been processed, you release all the buffered packets to be processed at the destination instance. So that's a very quick overview of how we preserve order.

If one or both of these special packets, B3, is lost, what happens? I was going to ask the same question. Sorry, which packets, the ones we use to indicate the transition? Yes, if one of those, or both of them, get lost in the process. So you're talking about these packets. Yes, B3. Yeah, so in that case, one thing is that the controller also examines the counters at the switch to understand how many packets have been forwarded, and that's one way we can determine when the packets have drained for a particular old forwarding entry for the flow. In the specific case where a loss happens, that can be your safety net: by examining the counters you can say, look, I haven't seen any further packets come on this channel, and my counters tell me that the packets seem to have cleared, so at that point you can release the buffered packets.

The final thing is being able to bound the overhead. Essentially, we allow applications to control the granularity at which they do the various operations, like moving state or copying state, and they can also decide what kinds of guarantees they want for each operation: do I really want both loss-free and order-preserving, or just loss-free, what kind of copy consistency do I want, and so on.

So here's a quick overview of the application I showed you earlier; the things I described on the previous slides will become clearer in this case. These are, again, the two scripts, vulnerable browser detection and weird activity detection, that I talked about before, plus a third script that looks for port scans. Here, what we want to do is move these red flows to the second instance, so the red prefix goes from the old instance to the new instance. The way this is implemented is that, in doing so, you first copy the counters, the multi-flow state, across the two instances, and then you move the per-flow state corresponding to the red flows from the old instance to the new instance, and you request that that move be loss-free and order-preserving.
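In rough application-level pseudocode, with hypothetical northbound calls rather than the real OpenNF API, the recipe described so far looks something like this:

```python
# Illustrative application pseudocode for the load-balanced Bro example.

def rebalance(ctrl, bro_old, bro_new, red_prefix):
    # The scan-detection script needs the shared connection counters at both
    # instances, so copy the multi-flow state over first.
    ctrl.copy(bro_old, bro_new, scope="multiflow",
              flow_filter={"ip_src": red_prefix})

    # Per-flow state for the red flows moves with them; weird-activity detection
    # needs order preservation, vulnerable-browser detection needs loss-freeness.
    ctrl.move(bro_old, bro_new, scope="perflow",
              flow_filter={"ip_src": red_prefix},
              guarantees=("loss-free", "order-preserving"))
    # (Plus periodic re-copies of the shared counters in both directions to keep
    #  them loosely consistent, which is described next.)
```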
And because this is state that is shared across the two instances, at regular intervals you copy from the source to the destination instance and from the destination to the source instance, to keep these statistics loosely consistent. We need to copy the multi-flow state because the scan detection script relies on this shared state to identify scans across the instances. The order-preserving property is required by the weird activity detection script, which, as I described, looks at transfers going in both directions and flags out-of-order arrivals across the two directions. And the vulnerable browser detection script needs the loss-free property. So this is roughly how the load-balanced monitoring application looks in OpenNF.

Going back to some of the questions I got earlier: Bro is probably the most complex of the different middleboxes we modified. We also looked at a bunch of other middleboxes; there's roughly a 4 to 10 percent increase in code. The amount of stuff we had to change varied from middlebox to middlebox. Squid was particularly challenging because we had to deal with TCP socket state, and serializing and deserializing that was non-trivial. But in general, the amount of modification we had to do was roughly bounded by about 10 percent of the size of the entire code base for these middleboxes. We've written the controller as a plugin to Floodlight, and the results I'm going to describe are based on a testbed where an HP ProCurve switch is connected to a bunch of servers; some of these servers act as middleboxes, one as the OpenNF controller, and some as traffic generators.

Okay, so this is just to measure, at the middleboxes, what the get and put performance looks like for different kinds of middleboxes and for different amounts of per-flow state being exported from and imported into them. This is the total amount of time for doing that export and import. When we broke down the costs of what's happening at the middleboxes, it was dominated by having to serialize the state for get and then deserialize it for put; that was the dominant cost for these two operations. The other key takeaway from this graph is that as the complexity of a middlebox grows, because of the structure of the state it maintains, the time taken for a get or put to complete also increases.

So in general, is the growth linear? Well, at least... It looks worse than linear. Well, it's roughly linear: we double the amount of state and it roughly doubles in these cases. I don't know what it would look like if the state were enormous, but at least in the set of things we examined here... For 500 it looks like 400 milliseconds, and then for 1,000 it's more than 800 milliseconds, so it would be super-linear. Yeah, a little bit more; it looks like maybe the overhead is increasing somehow, or maybe that's just experimental noise. There ought to be error bars on this, but we experimented with 250 to about 2,000 flows, and in that range it seems roughly linear. Like I said, if there's a very large number of flows being moved, I don't know what the scaling properties would look like. Bro was designed to handle a huge number of flows. That's right, I mean, Bro is designed to handle... Yeah, that's right. A much larger number of flows.
How would you handle that? Yeah, absolutely. So, at least in the scale-out or reallocation situations, we're probably not going to move the entire set of millions of flows to a new instance; we would move piecemeal. And if it is the case that the scaling is not linear, we would have to stage the moves: instead of saying, okay, move everything now, we would divide the process up and move it piecemeal. That's how we would have to handle it. That may inflate the overall move time, but at least in this implementation, that's how we would do it.

Okay, so for the high-level operations, like move with the loss-free and order-preserving guarantees, what does performance look like? Here we have some number of flows, at 5,000 packets per second, for this middlebox called PRADS, and we want to move all the per-flow state from one middlebox instance to another; this is the total amount of time the move takes. With no guarantees, moving 500 flows at this rate takes about 200 milliseconds. The parallelizing optimization is one where, instead of first getting everything and then putting it, you parallelize the gets and puts for different chunks; that helps reduce the time. But the problem is that a bunch of packets get dropped, because you're not asking for any guarantees: in this specific case, for 500 flows, nearly 500 packets end up getting dropped. With loss-freeness, no packets are dropped, but the overall time for the move to complete, as measured at the controller, inflates by about 2x. And although packets are not lost, all the packets captured by events see some amount of delay, because they go to the controller, are buffered there, and are then released: the average delay is something like 100 milliseconds, and the maximum for a given packet can be over 200 milliseconds.

There are further optimizations we can do on top of this; that was just parallelizing the gets and puts. The other optimization is ER, or early release: instead of waiting for all the puts to finish and then releasing the events, we release the events for each chunk as it is put at the destination instance. That won't change the overall completion time for the move, because the completion time is determined by the last chunk being moved, but it reduces the per-packet delay. Once we add the order-preserving guarantee, the optimized move takes the largest amount of time, over 400 milliseconds for the 500 flows, and the maximum delay seen by packets also increases.

So what is contributing to the delay here? There are a bunch of events being generated by packets at the NF instances. There is an event handler at the controller running on a different thread than the gets and puts, but the scheduling across the two threads is not perfect. And once the events are released, they are sent to the switch as packet-out messages and the switch forwards them on to the destination instance; it turns out the HP switch is not very good at processing packet-outs, as many of us may have noticed, so a big fraction of this delay is the delay imposed by the HP ProCurve switch.
But at the same time, the overall time it takes for the move to finish could also be reduced if the application were able to move smaller pieces of state. Again, because of limitations with the HP ProCurve, specifically the flow table size we were dealing with, we could not issue really small moves. We could not say, okay, instead of this entire filter, split it up into smaller filters and issue moves for those, because the switch didn't have that kind of table capacity. But if we were able to do that, the latency we're seeing for these packets would go way down, because it would be limited by the particular slice being moved.

The way to deal with things like the HP switch, which has a very poor control-to-data-plane interface, is you would hope you could use the switch's control plane just for control and then mirror packets out one data port and send them back in. That's exactly the optimization we have done here. That avoids the latency issues with packet-out. Exactly. As long as you don't care about the ingress port information being lost, that works great. Yeah, and we can actually get around that as well; I'd be happy to talk more about that. Well, if the ingress port matters in your matching, and you're masking and re-encoding things, there's actually no way to do it, unfortunately.

Okay, so the delays in this case are for packets that are either buffered at the controller or buffered at the destination instance waiting to be sequenced with respect to other packets. The key thing, though, is that this delay does not depend in any way on flow arrival or departure patterns; it's purely a function of the load on the middlebox, the processing speed of the middlebox, and the amount of state you're transferring. Copy can finish in about 111 milliseconds for the 500 flows we have here. But if you want really strong consistency guarantees on shared state, that is very expensive: in our current implementation, we impose roughly a 13 millisecond penalty on every packet. This is not something we advocate doing frequently, although we haven't seen many instances where middleboxes would need strong consistency for shared state. So these guarantees come at a cost, and applications may want to be careful about picking one guarantee versus another, depending on their overall objectives.

This is a result where we compared OpenNF against the alternatives. For example, for scale-out, instead of OpenNF you can do VM replication; it turns out that because the replicated VM carries a lot of unneeded state, that can result in spurious log entries for Bro. If you do forwarding control only, where you just wait for flows to drain out and then, say, scale down: in the traffic trace we collected, there were some flows that lasted as long as 1,500 seconds, and so the scale-down could be delayed for essentially as long as those flows last.

So that's actually the end of my talk. There are a lot more details you can find in the paper, or feel free to ask me questions about them. At a very high level, what we wanted to do is build a system that allows quick reallocation of state, but at the same time lets us, at least under certain conditions, reason about the semantics those reallocations would offer.
And we wanted to make it low overhead and ensure that the amount of modification required of the middleboxes is also minimized. You can go to this website to see more details. I have a little bit of time for some more questions, if there is something I left unaddressed.

In your work, how do you identify flows? I mean, what is a flow here? What is the handle you use? How do you know something is a flow? Where do you get that information from? How do you classify traffic into flows?

So, the API is essentially based on the connection 5-tuple; that is how we define a flow: source address, destination address, source port, destination port, and protocol. That's what a flow is for us. There are some middleboxes, like a packet cache or a redundancy eliminator, where you're keeping chunks of content; there is no flow, and all that exists is a bunch of content shared across all flows. Such a middlebox would only have all-flow state, and the set of filters you could provide for it would also be more restricted. There's no automatic way for us to tell what the scope of the state handled by a particular middlebox is as of now; we need to do some more instrumentation and analysis of middleboxes to identify the scope of their state.

Okay, because usually when you deal with flows, in OpenFlow for example, you first identify that something is a flow, and then you can also identify that, okay, this is the last packet, you need to wrap up the state and say that this flow is done. I didn't see how you do that. Or maybe you take that for granted in your work and assume someone else does that classification somehow.

So wait, is your question about when you identify a flow, how you know something is a flow? Because you're dealing with flows here; you say move all these flows. So how do you know something is a flow? Right, we just use the connection tuple; that's the handle for identifying it. Yeah, but who is doing that classification, that's my last question: is it the controller, or who does that? The controller is actually not doing any classification; the applications issue these operations. A flow here is not necessarily a single TCP flow; it's a handle that describes an ensemble of traffic that a particular middlebox may be handling, and you want to move that ensemble, or copy the state corresponding to it, someplace else. That ensemble may actually contain a bunch of TCP flows, and there may be different kinds of state for those TCP flows that the middlebox itself maintains. The API doesn't distinguish what those little pieces of state are; it's up to the middlebox to gather all that state and provide it to the controller. The controller itself is not doing any active classification.

If I understand it correctly, since you're dealing with OpenFlow switches, the unit of traffic classification is an OpenFlow flow entry, and so other information you might care about, like who the user is, whether they are authenticated, what the application is, is not encoded in it, because the OpenFlow switch isn't doing that. Okay, thank you very much. Thanks.