So, hello everyone. Today's speaker for the NetSeminar is Nate Foster. Nate is a professor at Cornell University. He received his Ph.D. at the University of Pennsylvania and then was a postdoc at Princeton. He has done great work on language abstractions for networks, including systems like Frenetic. He was just at ONS for the last couple of days, and today he's going to give a talk on the recent work in his group. Thanks. So, it's really exciting to be here, where you guys did a lot of the work that enabled the work I'm going to talk about today. Thanks especially to Yanis for hosting me and for arranging all the different conversations to learn from as well. So, this is work that we've been doing, trying to develop what I call a formal foundation for network programming. The ultimate goal is to be able to build the kind of precise and automated tools for building network software that we have for ordinary software. I should start by saying that I didn't do this work alone. We used a proof assistant to do it. And actually, there are a bunch of human research assistants who did, of course, all the real work. Arjun Guha, who's here, is a postdoc in my group on his way to a faculty position; Arjun is the lead of this project. Mark Reitblatt, who's a Ph.D. student, also contributed. And an undergraduate has been helping us build some automated tools on top of all the other work. So, the motivation for this whole project is really to address some serious issues that networks have with respect to reliability. Just as motivation, I thought I'd give a kind of rogues' gallery of examples of network failures that have happened in the last few years; you can go further and find many more such examples. So, I assume most of you use GitHub to store your code.
And if you happened to be using GitHub just after Thanksgiving last year, which is when we happened to be writing this paper, GitHub went down for a bunch of accounts for the better part of the day. They had this kind of complete outage. After the fact, they did a post-mortem and discovered that what had caused the outage was actually a misconfiguration between a pair of switches. So, GitHub, you know, they're smart people. They build basic infrastructure for programmers, and they couldn't get this right, which is sort of a fundamental problem, I would say. There was another example at Amazon a little earlier. They were trying to carry out a routine maintenance task, and they made a mistake in shifting a bunch of network traffic onto a backup network, and this triggered a sequence of cascading failures that eventually took out their whole East Coast data center for a couple of days. GoDaddy had another issue: a similar kind of internal network event led to corrupted routing tables and caused an outage, taking out their DNS service and their customers' sites. My favorite example is United, which, about two years ago now, had some connectivity issues with its network that caused its flight system to become unavailable and actually led to a full ground stop at SFO. So instead of issuing tickets like this, which is my ticket for today, they were handing out tickets that look like this, written by gate agents in pen. So we're in this weird situation where networks are a really critical infrastructure. We use them for programming, for business (think Amazon), for DNS, even for airlines, and yet we build them using very rudimentary, sort of 1960s-era software techniques. For the most part, many networks are actually run by operators manually interacting with a command-line interface and programming the network that way. And so the result is that networks are not very reliable, and this is pretty crazy.
So my training is in programming languages, and if you're a PL person like me, when I first started thinking about this, I had a simplistic picture: a network is just a bunch of devices that get bits from here to there. The reason this problem actually turns out to be fairly complex is that, of course, networks are not so simple. There are many, many different kinds of devices involved, and many different languages and programs that cooperate to get bits from here to there. Just to show a cartoon example: if you take an interaction between, say, my laptop and a server at YouTube, there's not just a series of tubes or a bunch of switches in between. There are many, many different kinds of devices. You have my laptop and a bunch of hosts, maybe connected to a local network by Ethernet switches. On the other side, there are a bunch of YouTube servers serving up the videos, maybe connected by some routers, and then a load balancer. There's some kind of gateway router that sits between the data center and the rest of the internet. Of course, there are other ISPs, so you've got things like BGP. You need firewalls to deal with security. You might have wireless hosts, so you need things like wireless gateways and middleboxes and so on. A fairly simple thing like my computer connecting over vanilla IP to YouTube actually has to traverse many, many different kinds of devices, and each of these devices runs its own program in its own language. So even checking a very simple property like basic connectivity is fairly challenging: you have to reason about all of these programs and all of their interactions together. Now, you guys here have been developing a new kind of architecture for networks where, in the most extreme version,
the vision is to replace all of these special-purpose boxes with very simple boxes running a standard, simple, and elegant programming language. If you want to think about things like building reliable networks and verification, this presents a great opportunity, because instead of having to analyze and pick apart all of the programs for the many different kinds of devices from the previous slide, you can imagine building tools that capture the meaning of the programs written for this standard, simple, canonical device and then start to automatically reason about their properties. I probably don't need this slide for this audience, but I'll include it just the same. In my view, the two key ingredients of SDN are the idea of generalizing and standardizing data planes and separating forwarding from control. So instead of a legacy network where you have many different devices, each running its own program, maybe different programs, you instead have all the programs running on one or more controller machines interacting with a bunch of stock devices. There are many benefits of SDN; the one I'm going to focus on in this talk is that, by having a clear and simple specification of what devices are and what programs they run, we can start to build tools that capture precisely their semantics, and we can start to reason formally about the behavior of network programs. Just to motivate all this, I thought I'd start with a very, very simple example. If you were at Arjun's talk a couple of days ago, you saw the same example. So here's my network. It's tiny, but it has features that come up in more complicated settings. There's just one switch. It has four ports, and it's connected to three hosts, numbered one, two, and four for some reason, and a logger that's going to be a middlebox monitoring web traffic.
And so the policy that I want to implement in this network consists of three high-level components. There's a security component: I want to block all SSH traffic. There's a monitoring component: I want web traffic to be diverted to this middlebox, which is going to log it for some post-hoc analysis afterward. And then there's a forwarding and routing policy, which says I want to provide connectivity between the rest of these hosts. Very, very simple. But I'm going to show you that actually implementing this policy, even using something like SDN or OpenFlow, turns out to be a little bit complicated. Let me just skip this slide; I think everyone in the room knows what OpenFlow switches and controllers are. So if you want to write a program that implements that policy correctly, and by correctly here I really mean: at all times enforce my security policy, at all times enforce my monitoring policy, and provide connectivity, then you have to deal with a bunch of low-level issues in writing your program. First, what I'd like to do is write a program that somehow configures that one switch so it has this forwarding table. You can think through it as I talk. It has a sequence of rules in priority order, and they do things like: the top rule filters and drops SSH traffic, the middle rules detect web traffic and send it both to the destination and to the middlebox, and the rest of the rules forward the remaining traffic to the hosts. I'd like to get the switch into this state. So the first thing you might think, if you're using something like POX or NOX, is to write a controller program that looks like this. Here I define a handler for the switch-join event, so when the switch connects to the controller, I get its identifier and send it a sequence of messages that instruct it to install rules corresponding to the entries in the previous slide.
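To make the table's behavior concrete, here is a minimal sketch of how a switch evaluates a priority-ordered table. The field names (`tp_dst`, `nw_dst`), port numbers, and rules are illustrative assumptions, not the actual contents of the talk's slides.

```python
# Toy model of a priority-ordered OpenFlow-style flow table.
# Field names and ports are illustrative assumptions.
SSH, WEB = 22, 80

# (priority, match predicate, action list); the highest-priority match wins.
flow_table = [
    (300, lambda p: p.get("tp_dst") == SSH, []),       # drop SSH (empty action list)
    (200, lambda p: p.get("tp_dst") == WEB, [2, 3]),   # web -> host 2 plus logger on port 3
    (100, lambda p: p.get("nw_dst") == "h2", [2]),     # ordinary forwarding to host 2
]

def apply_table(table, pkt):
    """Return the action list of the highest-priority rule matching pkt."""
    for _prio, match, actions in sorted(table, key=lambda r: r[0], reverse=True):
        if match(pkt):
            return actions
    return None  # no match: in OpenFlow 1.0 the packet is sent to the controller
```

Note that in OpenFlow 1.0 a rule with an empty action list drops the packet, which is how the SSH rule blocks traffic.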
So this sounds good, but of course what can happen is that while these messages are going down to the switch, we could have traffic traversing the network. In an OpenFlow network, packets that come in before the rules have been processed and installed by the switch will get diverted to the controller. So I actually have to write a second program, which looks pretty similar but not identical, that handles the packets diverted to the controller. Here I write a different event handler, the so-called packet-in event, which receives a packet from the switch, applies the corresponding actions from the policy to that packet, and does so by explicitly sending a packet-out message. Of course, this creates a problem, because we now have two realizations of our policy in our controller program, and it's possible that these disagree. You might notice that I have a bug here: the packet-in fragment doesn't correctly handle SSH traffic, so SSH packets that arrive while the rules are being installed will be let through. You might say, well, this is silly, we should never write programs that generate packet-ins. But in general the network is not going to be static. Your policy might be dynamic and change over time, so you're going to have to deal with periods of transition where there are rules coming down and traffic traversing a switch that may or may not match all of those rules, and this leads to a situation where you have replicated functionality. A second problem that can come up when writing these control programs is that switches are actually free to reorder messages, and many do.
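To make the replicated-functionality problem concrete, here is a sketch, with made-up helper names and ports rather than the talk's actual POX code, of a buggy packet-in handler next to a fixed one:

```python
# Sketch of the duplicated-logic bug: the packet-in handler is a second copy
# of the policy, and the two copies can drift apart. Names and ports are
# invented for illustration.
PORTS = {"h1": 1, "h2": 2, "h4": 4}

def forward_by_destination(pkt):
    dst = pkt.get("nw_dst")
    return [PORTS[dst]] if dst in PORTS else []

def packet_in_buggy(pkt):
    # BUG: no SSH check here, so SSH packets that arrive before the drop rule
    # is installed on the switch leak through via the controller.
    return forward_by_destination(pkt)

def packet_in_fixed(pkt):
    if pkt.get("tp_dst") == 22:   # re-enforce the security policy on this path too
        return []
    return forward_by_destination(pkt)
```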
So going back to our first event handler, the switch-join function: we enumerated the rules in order from highest priority to lowest priority, and what we expect is that the switch, when it receives and processes these messages, will go from an empty flow table to one that's fully configured. I've elided the priorities here; you can read them as going from top to bottom. A naive programmer would think: the first message is sent to the switch and it installs a rule, then the second message is sent, then the third, and so on, and eventually we get all the way down to the full table. But if you read the specification carefully, you discover that switches are actually free to reorder messages. So the switch could receive a bunch of the messages, buffer them, and choose to install these two rules first, which just say: everything going to host one is forwarded out port one, and everything going to host two is forwarded out port two. And this is a problem, because we've now violated our security policy: at least until the rest of the rules are installed, SSH traffic will flow through. We talked to some friends at Google, and they told us that issues with reorderings of messages have actually caused problems for them in the past. Switches do buffer and reorder control messages, and they prefer to install cheaper rules before installing more expensive rules. So if you want to reason precisely about forwarding behavior, you need to deal with this kind of situation. The solution is not that complicated. You have to realize that the switch has this freedom built in, and insert into your program some explicit synchronization using barriers, which basically say: don't proceed past this message until you've processed all previous messages, and let me know when you've done that. In this case, we'd be sure that our policy was enforced at all times, because the rule that seals off SSH traffic would be installed before any forwarding rules, and likewise for the monitoring rules.

So again, this is not rocket science, and this is not a very big example, but it's another kind of low-level detail that is easy to get wrong and that leads to completely opposite behavior. A third issue, which is maybe again just a matter of misreading the spec, is that if you're not careful you can end up sending messages with patterns that don't mean what you think they do. Here I wrote down patterns for each of my rules that matched on things like destination IP and destination port. But this pattern here, for example, destination port 22: if you write this literally in POX, it matches all packets. The reason is that if you look at the OpenFlow specification, the way a switch interprets this kind of message corresponds to this flowchart, which says that you basically parse the packet starting from Ethernet and going up the layers, and you only proceed to higher layers if all of the dependencies at lower layers have been specified. So a pattern like destination port 22 is exactly the same as that pattern not being there at all, because you haven't said that the Ethernet frame type is IP and that the network protocol type is TCP. Again, not rocket science, and a good programmer is not going to miss this detail, but it's something that's easy to get wrong and that leads to a completely different implementation of your policy. What you have to do, of course, is write these more complicated patterns that have the property that all of their dependencies are satisfied. These are all simple, low-level things, and for a program involving one switch you can imagine that a programmer might get them all right. But if you're talking about a program that's controlling hundreds or thousands of switches, or a controller that provides higher-level abstractions, like a programming language with a compiler and an optimizer that's actually managing these rules for you, these little issues can cause bugs.

And in fact, all of these issues have caused bugs in some of the research controllers that have been proposed by our group and others. Since they're in the room: Arjun had a system called PANE that he developed at Brown that had the first three of these issues; we've been building the NetCore system, which had the last four; and there's another system that also had bugs arising from some of these same issues. So that should give you a flavor: even for a simple, in some sense overly simple, specification of switch hardware like OpenFlow 1.0, there are a bunch of these low-level details that ordinary programmers are not very good at managing. To help programmers get these things right, there's a bit of a cottage industry in the networking community. Peyman and Nick and George and some other people have been developing tools for checking configurations automatically. There's FlowChecker, which was done in 2010; Anteater, from Brighton Godfrey and Matt Caesar at UIUC; NICE, which was done by people at EPFL; Header Space Analysis, from here; and VeriFlow, which was done by the same people as Anteater and others. The way these tools work varies a little. Most of them work by wrapping a controller with a runtime monitor that inspects the sequence of messages going up and down and checks for violations of safety properties; that's a bit of a simplified explanation. Some of them have a more static component that actually builds up a model of the overall network configuration and checks properties, but that's the basic idea.
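Coming back for a moment to the match-dependency pitfall mentioned above, the OpenFlow 1.0 flowchart behavior can be sketched as a function that silently ignores match fields whose lower-layer prerequisites are missing. This is a simplified model, not the full specification:

```python
# Simplified sketch of OpenFlow 1.0 match semantics: a higher-layer field is
# only consulted if the lower-layer fields it depends on are also specified.
IP_ETHERTYPE = 0x0800
TCP, UDP = 6, 17

def effective_match(pattern):
    """Return the fields the switch will actually use from `pattern`."""
    m = dict(pattern)
    if m.get("dl_type") != IP_ETHERTYPE:
        m.pop("nw_proto", None)   # IP-layer fields ignored without dl_type=IP
        m.pop("tp_dst", None)
    elif m.get("nw_proto") not in (TCP, UDP):
        m.pop("tp_dst", None)     # transport fields ignored without TCP/UDP
    return m
```

So a pattern consisting of only `tp_dst: 22` is silently equivalent to the all-wildcard pattern, exactly the failure mode described in the talk.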
So I'm really excited about this. I think it's wonderful that your community is getting excited about the kinds of tools that usually don't leave the PL ivory tower, and I think these kinds of tools have the potential to have a huge impact. There are a couple of things that I think are a little less good about these tools. One is that the designers of these systems had to work really hard and come up with really clever optimizations to make them scale and run fast. For example, Header Space Analysis, to analyze the Stanford campus network, had to use some quite clever optimizations; if you just did the naive encoding of these configurations, there would be blow-up all over the place, and you wouldn't be able to run on anything of realistic size. And they went quite a bit further, actually building a tool that can be run online. But fundamentally these are dynamic tools, and there are fundamental complexity limits that you're going to run up against in any decision procedure for checking these kinds of properties. Another issue is that each of them picks its own custom mathematical foundation. Their notion of correctness rests on a model of a switch and a model of a network that is inspired by things like OpenFlow but maybe not as precise as the PL community would like. And so there are issues that are not represented in some of these informal models; for example, some of these tools don't look at barriers at all, so they just assume that switches will install messages in order, and as we saw, this can lead to the complete opposite of the behavior you expected. So what we did in this work was to come up with a slightly different approach: get away from verifying low-level configurations at run time, and instead have a system that lets us generate configurations that are guaranteed to be correct.
So, you know, if we've done our job, we won't actually have to check these things, because we'll have a mathematical proof that all of our configurations, all of our messages, and all of our control software are correct according to some specification. We'll be able to reason about the behavior of the network using programs expressed at a much higher level of abstraction, and we'll get a very rigorous guarantee by actually proving that the translation from our high-level language to these low-level switch rules and runtime systems is correct, carrying out that proof in a mechanized proof assistant. Another piece of this work is to not prove correctness against some informal, ad hoc model, but to actually develop a very detailed mathematical model of what a network is. All of our theorems will rest upon this very careful transcription of the OpenFlow standard into mathematical structures. OK. Some of the questions that Arjun got during his ONS talk made me realize that some of what's possible in formal methods research may not be well known in this community, so I thought I'd give one slide that shows the kinds of things people have been doing. There's been an amazing amount of progress in the last few years, and it's now possible to build large, realistic software systems using verified or certified techniques. A couple of the really high-profile success stories: you've probably heard of the seL4 operating system. This was a major effort out of Australia, actually verifying the behavior of a microkernel against a reference specification in a mechanized proof assistant. The kernel is written in C, and it's proved correct against a reference model implemented in Haskell.
So they basically wrote down a very simple model of what the behavior of the kernel should be and then showed a correspondence between the real C code and the Haskell model. Another system, from the PL community, that I've been quite happy about is the CompCert C compiler. This is a project by Xavier Leroy. CompCert is a compiler that goes from a large but safe subset of C all the way down to x86, and there's a proof of correctness that all of the executables produced by the compiler correctly realize the semantics of the C program they started from. And there's another system out of MSR called F*. So formal methods, which in the 60s, 70s, and 80s was really only applied to small, abstract kinds of systems, has gotten to the point where you can actually build things like an operating system or a compiler, or, to take other examples, database systems or a network controller, and prove their correctness formally and mechanically. The reason this kind of progress is possible is that a lot of the tools that have been around for decades have gotten very mature. Things like ACL2, Isabelle/HOL, and the Coq proof assistant now have really rich user interfaces, large libraries of theories and theorems, and there's a lot of folklore wisdom available. So if you want to build a system and prove it correct, you don't have to start from the very foundations; you can stand on the shoulders of the people who built libraries and such for these other systems. In fact, there are even textbooks now. I did my PhD at Penn; we now teach our PL course in a proof assistant, using a book called Software Foundations, and Adam Chlipala at MIT has a book coming out any time now from MIT Press called Certified Programming with Dependent Types.
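To give a tiny flavor of the certified style described here, the following toy example is written in Lean rather than Coq, purely as an illustration: we define a function, and the system machine-checks a property of it before the file will even compile.

```lean
-- A toy example of certified programming (Lean syntax, standing in for Coq):
-- define a function, then state and machine-check a property of it.
def double (n : Nat) : Nat := n + n

-- If this proof were wrong, the proof assistant would reject the file.
theorem double_add (m n : Nat) : double (m + n) = double m + double n := by
  unfold double
  omega
```

The point is the workflow, not this particular lemma: the same discipline scales, at great effort, to compilers and kernels.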
So the kinds of techniques you need to build these systems are becoming pedestrian enough that you can teach them to advanced undergraduates and graduate students. What does this actually look like when you build a system? The basic idea is that you write your code; I'm going to show you snippets of Coq code, and this is basically a simple functional language, so you just write your system in that language. You then do a ton of work (this box should really be twice as long as that box) to prove that your code has the property you want. And then you can actually extract the code to code in an ordinary language that you can compile and run. So when you build a system in this style, you get a very strong guarantee that the system you've implemented and proved correct is the one you're actually running. Some people call this certified programming. OK, I thought I'd mention this, since it was one of the questions I got, asking about the limits of this kind of technique, like can you handle FPGAs, or can you have arbitrary functions? I just want to be clear: it's not like we're using some kind of automated theorem prover here. We're not using a SAT solver or a model checker. This is actually a very rich programming language and a very rich logical language, and you can do any kind of proof you want, at great effort, but there are no fundamental limits in terms of what you can express. So how does that extraction work? In the sense that, do you define what you want the output to be, etc.?
Yeah so there's a couple of different ways you do extraction some systems work by actually taking the confidence of your proofs and getting a functional program from that for LISP or for OCaml or for Haskell that's not what we do the other thing you do is basically take which was written in the proof system in its logic and extract from basically do a very simple translation to say OCaml so to say it another way the language that you use for proofs like Coq is a it's called type theory and you can both represent computations and also proofs and you can basically translate the computations into code for other functional languages there's very simple correspondence so if you're the kind of person who's kind of paranoid about where could bugs creep in you should worry about a couple things so this translation that goes from the program for the proof system into OCaml is not verified you have to trust that and you also have to trust of course the compiler that you used to compile this code into a binary the kind of argument people often make is that well this translation is A very simple and B used in lots of different systems and so it's very unlikely that a bug in your system would sort of manifest in the exact same way in the extraction process so you mentioned quite a few research successes so do you have to do that on some commercial successes of verification and actually proving that your software is correct yeah so the general trend is that in industry there'll be some sort of high profile and expensive mistake like the Intel Pentium bug and then people start to invest in full methods because they realize that sort of using ordinary engineering processes to build these kinds of systems is not as reliable as they want to so actually Pentium bug Intel invested in a big time to inform that it's mostly things like model checkers and sent software things like that I know that Zalbi Bawal who is called CompSert works closely with things like the European people at 
Airbus and the European space agency for building the kinds of software like that there are others so what is our system going to look like I kind of gave a very 50,000 foot bit of a high level language like it's translated into open flow when I take you through the compiler phases in real market tail so the idea is we're going to start a high level language called Netcore that we've been developing in the frenetic group for a couple of years and we're going to build a compiler that goes from the Netcore language down to what we call flow tables these are sort of idealization of global switch data and then we're going to optimize those flow tables and then we're going to take the flow tables and convert them into open flow messages to get sent to the switches using a runtime system or a controller and we're going to prove that at the end of the day the behavior of this system that you actually run and we're using a mathematical model of open flow we call federally open flow correctly implements the semantics of our language so I'll show you two pieces in detail but think of this just as a standard kind of compiler going from your source language down to your machine language so each level of abstraction we're going to represent in a proofed system and then we're going to define the syntax and semantics of each of these kinds of programs each of these arrows that goes between them these will be realized as faces of the compiler and we'll build machine check proofs that each of these translations preserve the semantics of the upstairs level and then to actually get a controller we can run we'll extract this to a camel so let me take you through each of these levels let's start with Netcore I assume some of you have seen this but not all of you so Netcore is a high level language for programming open flow switches it's not super high level it's not like Java or Askel or something think of it more like sort of the C of open flow so in Netcore you basically describe the 
behavior of the network declaratively using standard kinds of programming features like standard OR and conditionals and such and then a compiler turns these into flow mod methods so let me just step you through some of the pieces of the language step by step the thing to notice is that you're not programming with rules you're describing the behavior of the network so the first component is we have a syntactic class we call predicates and what a predicate does is it lets you describe a set of packets located somewhere in the network pick off a packet that's parked at a particular switch or at a particular port on that switch then you can match on header fields like source ID and destination you can also apply transformations to packets so these we call policies or programs so you do things like forward the packet out of a different port modify a header field or actually query the packet and this will cause the packet to be a divergent controller and then you can combine things like predicates using OR and validation and you can compose policies in parallel so I think this is sort of do all of policy one and all of policy two and get the behaviors of both of them and you can do things like condition a policy on a predicate so if a packet is parked at a switch on a port then do P and otherwise do nothing so there's a bit more to the language but think of it just as sort of this high level declarative language where you kind of specify the polling behavior you want and then let the compiler generate the rules so here's going way back to our simple example here's how you write it in netcore basically just say if the packet is not SSH then forward it between all of the hosts using the destination IP to determine where to go and also forward all of the traffic to the middle lines so it couldn't be much simpler than that just sort of declarative say what behaviors you want union them together and then restrict by your security policy okay so the first step in kind of 
building this verified controller is we have to say what is the meaning of a netcore program and so what we can do is define in a precise way the semantics of what one of these policies means so the way we do this in our system is we use what's called an operational semantics so we're basically modeling the behavior of the system as a series of transitions between states so to do this we need to identify what are the states of the system and how can it step between states and so what we do for netcore is so if we have a netcore program P the state of the system is going to be a bag of packets that are in flight in the network so we'll have this big collection of packets that are in the process of being forwarded and what we'll do is define a little step relation that goes from one bag to another bag and I'm using sort of standard bag notation here this stands for a union of this bag and this bag and this stands for a singleton containing just this one located packet and the way you should read this is program P will step the network from a configuration with packet LP and other packets M to a configuration with the union of bag M and bag M prime if when I take my netcore program and view it as a function and I apply it to that packet I get M prime so in general a netcore program can basically take a packet and produce zero more packets and all this says is to sort of run the network one step you just pick out some packet non-premistically from all the packets that are in flight you can supply it to a program see what packets you get so it's a very simple operational description of what the network does there's lots of things we're not modeling but what you care about is things like reachability properties this is sort of sufficient to any questions about this this is kind of standard stuff for the audience but how familiar are we with operational things so the point is that this is basically a very precise description of the A relation which is what the network 
does. it models hop-by-hop forwarding behavior, it abstracts away completely from the fact that there are switches and controllers and messages and so on, and we can view the behavior just as this kind of step, so it enables very simple reasoning about network-wide properties. the next step going down is we have to come up with a translation from this NetCore language to things that are more like the tables on switches, so we'll build a compiler that does that. what is a flow table? well, it's basically an intermediate representation of network state, kind of like what RTL is in an ordinary compiler. it's one step closer to what switches actually have, but a little bit idealized, in that we have the tables for the entire network in one structure, and we ignore the fact that tables might be finite: we just allow very large tables. so the goal is to go from our NetCore programs into these tables. once we have a table like this, we can view it, just like a NetCore program, as a function on packets, and define its semantics in the exact same way, as an operational step where at each step we nondeterministically pick one packet, run it through the flow table, and get some more packets. how does this compiler actually work? it has to go from programs to flow tables, and it does this by flattening the program into tables, using an operator we call intersection. so let me just give you a flavor of how it works. suppose that we're compiling this program here; it consists of a single policy fragment: if the destination IP is H1, go to port 1. this will generate this little flow table here, which is one rule that matches on the Ethernet type and then does the corresponding action. we can have another program over here that matches on port 80 and goes to port 4; that generates this flow table. and now, if we want to take the union of these two things, what we need to do is intersect these rules, generating this rule here, and we
take the union of their actions, because we're doing both actions, and then we concatenate onto that the flow tables themselves. so by defining a kind of intersection operator on flow tables, we can actually implement things like union. so this is kind of the key question: I just want to understand the capabilities of the device. does the device have multiple tables? right now we're just targeting a single table, so no, nothing I will talk about involves multiple tables or pipelines of tables or things like that. we'd like to do that in future work, even with an idealized representation like this. so this is kind of similar to the level of abstraction of NOX. I think it's a great first step to go from your structured language to these tables. I'm not sure if this is the right point to ask this question, but what is the expressiveness of the NetCore language? it seems fairly restricted in terms of the behavior I can express. can you give me an example? let's say I want to implement a load balancer, so for the same match I want some fraction of packets to go out of one port and some fraction to go out of the other port. can I express something like that? or let's say I want some behavior based on the time of day: if it's past 10 p.m.
I want different behavior. any snapshot of the load balancer's state we can represent in NetCore, and it's not hard to show that OpenFlow flow tables can all be represented in NetCore. what we don't do is things that are stateful and dynamic. what we do in our controller is put all of the dynamism in a program that sits above NetCore and generates NetCore policies. the interface to this higher level is that query thing: what query does is read network state and construct channels that feed that state up to the higher-level program. the high-level program can do whatever it wants: it can look at clocks, it can use random numbers, it can talk to a Kerberos server, and it can then generate a stream of NetCore policies, each of which captures an instantaneous snapshot of a network configuration. so NetCore policies are not what you write as a programmer; as a programmer you write a typical program in, say, OCaml, a general-purpose program that reads on these channels and emits NetCore policies. this is a design choice: we could add other features to NetCore that are stateful and dynamic, and we've actually played with that in the past, but because we were interested in this very precise kind of reasoning, it seems nicer, in my opinion, to split the dynamism out into a higher level. okay, so let's speed up a little bit, since I have about five minutes, I think. the first theorem that we have to prove in building this verified stack is that our compiler is correct, and there's one issue that you have to deal with right away, which is that the naive compilation of the kind I just sketched for you actually has a horrible blowup: it's exponential. we ran it on a policy with, I think, nine clauses, and we had to kill it because it ran for so long. so what you need to do is build in an optimizer that takes the flow tables and simplifies them at every step. so we have a verified optimizer that
actually removes empty and shadowed rules, and if you throw this in, it turns out the complexity comes down and compilation actually becomes practical. so at the end of the day, the correctness theorem for our compiler, which we've formalized in Coq and proved, is right here, and it basically says: for all optimizers that preserve semantics, if you evaluate a policy against a particular packet, at a particular switch and port, with a particular buffer ID, that's equal to evaluating it against the flow table obtained by compiling that policy. so it says the compiler produces an equivalent flow table, even after optimization. proving this took a fair amount of work, doing things like developing a library of algebraic properties of flow tables and adding some new automation to Coq for proving equalities between bags, so that the actual top-level proof falls out in only a few lines. in addition, there's a key invariant, related to that second bug I showed you, that all of the patterns generated by the compiler, all of the matches, need to satisfy. just to give you a flavor of what this looks like: this is an invariant that the compiler has to maintain at every step, and it basically says that if you're matching on things like TCP fields, then you'd better have specified that the protocol is TCP and the Ethernet frame type is IP. so this is the type in the Coq development that captures that condition precisely, and part of proving the compiler correct is showing that this property holds at every step of the compilation. okay, so next we need to go all the way down to actual OpenFlow messages, and to do that we need some model of what OpenFlow is. if you start by looking at the spec, well, even for OpenFlow 1.0, which is very simple, it's a very long spec, and it's not really a mathematical object you can prove something about. what it is is a bunch of very carefully written English sentences, some diagrams and flow charts that capture the system, and
C structure definitions for the messages. it's a very nice specification, and you can read it as nice bedtime reading, but it's not something you can really prove theorems about. so what we did was take that specification and extract from it a model, a precise mathematical object that you can prove something about, which we call Featherweight OpenFlow. the idea here was not to try to capture all of the gory bits of the specification, but just to pull out a relatively small core that captures everything related to packet forwarding and elides the rest of the details. so this is what the model looks like. here's the syntax; it's too small to read, and that's intentional, but the model has elements for things like controllers and links and switches and packets, and those are specified as types here. then it has a bunch of operational rules, like the judgment I showed you for how NetCore programs step. in this case the steps are much more complicated, because they're things like controllers sending messages down to switches, switches pulling messages off of wires and putting them into their buffers, switches reordering their buffers, and switches pulling messages out and installing them into flow tables. so this captures all of those behaviors that are specified informally in the spec. the key thing we did when designing this model was to make sure that the theorems we proved accurately, or adequately, reflected the forwarding behavior of real networks. so we made sure we didn't elide details that were related to packet forwarding, and in particular we reflected all of the essential asynchrony in our model; we didn't impose order where there might not be any in the real world. we also didn't want to bake in just the NetCore controller: we wanted to come up with a framework you can instantiate with other controllers, and so in our model there's a very abstract notion of a controller
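To make the shape of this abstract-controller idea concrete, here is a minimal Python sketch, not the actual Coq development: messages in flight live in a bag (multiset, so no ordering is imposed), switches hold flow tables, and a controller is just some opaque state plus a function for consuming events and producing flow mods. All names here (`Switch`, `AbstractController`, `proactive`) are hypothetical illustrations, not identifiers from the real system.

```python
# Sketch of a Featherweight-OpenFlow-style model: unordered message bags,
# switches with flow tables, and an abstract controller interface.
from collections import Counter

class Switch:
    def __init__(self, sw_id):
        self.sw_id = sw_id
        self.table = []              # list of (match_fn, action_fn) rules

    def install(self, rule):
        self.table.append(rule)      # a FlowMod-style table update

    def process(self, pkt):
        for match, action in self.table:   # first matching rule wins
            if match(pkt):
                return action(pkt)
        return set()                       # no match: drop

class AbstractController:
    """Abstract controller: opaque state + a step that consumes an event
    and produces messages, mirroring the model's controller interface."""
    def __init__(self, state, handle):
        self.state = state
        self.handle = handle         # (state, event) -> (state, [flow mods])

    def step(self, event):
        self.state, mods = self.handle(self.state, event)
        return mods

# A trivial proactive controller: on switch-up, install "in_port 1 -> port 2".
def proactive(state, event):
    kind, sw = event
    if kind == "switch_up":
        rule = (lambda p: p["in_port"] == 1,
                lambda p: {(p["dst"], 2)})
        return state, [(sw, rule)]
    return state, []

ctrl = AbstractController(None, proactive)
s1 = Switch("s1")
wire = Counter()                          # bag of in-flight FlowMods (unordered)
for sw_id, rule in ctrl.step(("switch_up", "s1")):
    wire[(sw_id, rule)] += 1
for (sw_id, rule), n in wire.items():     # messages eventually delivered
    s1.install(rule)                      # (single switch in this sketch)

print(s1.process({"in_port": 1, "dst": "h2"}))   # forwarded out port 2
```

The point of the interface is that any controller (naive, reactive, or proactive) can be plugged in as a `(state, handle)` pair, which is roughly how the model stays independent of NetCore.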
so let me just show you some highlights of this model. we go from C structs in the spec to Coq data types for everything related to forwarding, and we go from flow charts like the one I showed you into code that captures exactly how pattern matching works. Arjun and Mark basically did a ton of work to take these elements in the spec and turn them into types and formulas in Coq's logic that exactly capture all the features related to packet matching, forwarding, and flow table updates. we also dealt with all the asynchrony, so things like these informal sentences in the spec turn into structures in Coq built with bags, and bags represent things that don't have an order. we also abstracted out the specific controller in our system, so that at the end of the day we're not just proving theorems about NetCore; we can actually prove theorems about arbitrary controllers. the way we did this was basically to have a mathematical definition where the controller has abstract state and a couple of step relations for how it consumes and produces messages. a quick question about the bags: do you assume that a bag is of finite size, or can a bag grow arbitrarily large? that's a good question for Arjun. so the bags are finite, but, for example, are they bounded?
no, the bags here are unbounded. so have you actually proved that it holds for all possible orderings of the messages? if I have, say, a million messages, a million packets, in all possible orders, is your program right given all the orderings? let me answer something slightly different first: what we do want to account for is packet loss due to buffers overflowing. okay, but I'm just curious: is the theorem actually proving that, given all the possible orderings of, let's say, a million messages? yes, absolutely, and there's a good tool for doing that; it's called induction. you know that the bag is finite but unbounded, and so you consider an arbitrary element. okay, so the controller model is very abstract, and to actually prove the theorem that the system correctly implements flow tables, we again wanted to do this in a general way. we wanted to be able to handle things like a naive controller that maybe doesn't install any rules; a so-called reactive controller that works like you'd expect, just installing rules in reaction to traffic it sees; and also proactive controllers that compile and install flow tables before they see traffic. so we came up with a general framework for doing this. this is slightly technical, but the equivalence we actually showed is what's known as a bisimulation, specifically a weak bisimulation, and what this says is that for every step the controller takes in the high-level abstract model, like NetCore, there is a corresponding sequence of steps at the OpenFlow level that produces the same observable behavior, and vice versa. in things like process calculus and concurrency theory, this is the canonical notion of equivalence for nondeterministic concurrent systems. so here's a snippet of the main theorem in Coq. it's highly cleaned up, but it basically says we have a weak bisimulation between the concrete and abstract systems. the key ingredients for doing this are
that we did this in a very general way, showing that as long as your controller has a couple of natural properties, a safety property and a liveness property saying that at all times it correctly approximates your high-level program, then we can prove this theorem for you automatically. so the takeaway point: we didn't prove a specific theorem for NetCore; we came up with a general technique for proving controller correctness, and to use it you only have to prove a couple of natural properties of your controller and you get this theorem. very quickly, since I'm running short on time: we actually compile and run this thing, so it's a real system. it's about 12,000, maybe 15,000, lines of Coq, plus about 1,600 lines of glue code in OCaml, and it consists of all of the abstractions I talked about; we extract it, compile it, and run it. we've been using it both on production traffic in our lab, and Arjun's been using it in his house. here are some pictures of the things we've been running it on: we have a prototype switch we're using in the lab, and Arjun has a cheap Wi-Fi box in his house he's been using it with, and we've been running a bunch of simple canonical applications, like discovery and shortest-path routing and broadcast and such. we wanted to see that building our controller using these verified tools wasn't going to cause performance to completely suck, so we did a couple of very simple micro-benchmarks. this isn't any kind of comprehensive performance evaluation, and we haven't optimized for performance at all, so don't take too much away from this, but it's usable in lab settings. the first experiment was just to run Cbench and see how fast our controller could respond to messages, and as you can see, it's basically faster than other prototypes that are not designed for performance, but much slower than things like NOX, which uses C++, and even our previous unverified NetCore controller in Haskell. and the reason is basically that we
don't have multi-core support, and we have a little bit of latency from the glue code going between the Coq-extracted code and OCaml. we expect this to get better as we start to optimize some of this code, but it's not so slow that it isn't useful. we also did a very simple experiment just to look at what the control traffic looks like over time. this is again just a micro-benchmark, but we took a simple Waxman graph with six nodes and two hosts per switch, and we built an application that both broadcasts along a spanning tree and does point-to-point forwarding between each of the hosts, and we generated a bunch of pings to a broadcast address. so the traffic pattern is that one host pings the broadcast address and receives replies from everyone else, and here are some time series of the control traffic for a bunch of different controllers. this is our verified controller: as you can see, it starts with a big spike of traffic as it installs all the rules for all the flow tables, and after that there's basically no traffic; these are just the echo requests that are part of the keep-alive portion of the OpenFlow protocol. and this is more or less exactly the same as our unverified controller, which was written in Haskell. again, these are time series with experiment duration on the x-axis and the amount of control traffic, on a log scale, on the y-axis. just for fun, we compared with a completely naive controller, one that does all processing on the controller, and here you can see that when you get the broadcast there's a spike of traffic, and then all the rest of the traffic keeps going through the controller. and here's an Ethane-style micro-flow controller that only installs exact-match rules, so you can see spikes when there are responses. the takeaway is basically that we haven't somehow fundamentally changed the behavior of our controller by verifying it; performance is competitive with proactive compilation. I'll just mention this in ten seconds: we've also built a
tool for automatically verifying network programs. it's analogous to Peyman's Header Space tool; it uses the same kind of reachability encoding, and we use Z3 as a backend solver. so, to sum up: networks are important infrastructure that we still build using 1960s- and 1970s-era techniques. I think SDN is exciting because it provides a very clean foundation that you can imagine building a mathematical foundation on top of, and using that to build verified, certified, and so on, tools. and, as I guess Peyman did the first step, as a second step in this direction we've been building a machine-verified controller and exploring verified compilation of languages like NetCore. I'm about out of time, so let me just say we've been doing a lot of other work in Frenetic. the one piece of work I'll point out is that we had a SIGCOMM paper last year on how you can handle updates. you might worry, as Nikhil asked, that our NetCore is very static: if you have a dynamic program, what kind of guarantees do you get? the main theorem in that paper showed that if you implement the transition between NetCore programs using so-called consistent updates, then the properties that are common to both programs are preserved across the update. so if you stitch the theorems in that paper and this paper together, you can actually get guarantees about dynamic networks as well. lastly, this is part of the Frenetic project, which is joint work with Princeton, and just one advertisement: if you are interested in this formal verification approach to networks, come to Cornell, maybe not for the weather; we have a great group of people lecturing on topics related to formal verification and networks. in particular, Peyman and Nikhil will be talking about Header Space, but lots of other people as well. okay, thank you. so, when I think of what controllers do, one of the most fundamental things is topology discovery; is that handled by your runtime system?
yeah, so the compiler is basically compiling flow tables for particular switches, so it knows which switches there are before it compiles. but how is the network map that SDN tends to provide to programmers exposed, I guess, is more my question: how do I know where my links are, and then build on that? I feel like the focus is, given a policy, how do I compile it, but I'm kind of missing the bottom. so the compiler needs to know which switches there are, to compile their flow tables and to prove theorems with respect to a particular set of switches, and the runtime gathers up that set of switches by consuming the switch-up events and building up a data structure. okay, so can I think of it as maybe a micro-controller whose sole purpose is to understand link connectivity, a little application you can build using NetCore? so the NetCore policy is defined on switches, but I don't know what the switches are at first, so there has to be some definition of... I'm really trying to understand that. right, more precisely, what I was just saying is that you need to know which switches there are to compile their rules, and that part is baked into the runtime. if you want link discovery, then you should implement, like every controller does, an LLDP-like protocol for discovering which switches are connected to each other, and that program you can write in NetCore: the real NetCore, which is a slight extension of what I showed you today. real NetCore includes the ability to, for example, push packets into the network, so you can use that to implement NOX's or POX's, the standard controllers', notion of discovery. are the differences between real NetCore and verified NetCore largely things that you don't have the tools to prove yet, or the time to prove? I'm trying to understand the delta and the reasons for it. so there are certain things we'd have to add; in particular, to have a verified topology discovery module, we would need to extend the verified part of the
controller to include generally synthesized packet-out messages, which is what you need to build an LLDP-like protocol. that's something we haven't proven yet, but it's something you can do with NetCore, except that the theorem doesn't hold for that particular application. so our actual system is structured as a part that's been verified, some unverified parts, and we've also layered on top of that some additional unverified parts to handle things like injecting packets, which have not been proven correct at all. we could prove them correct too, but that would be a ton of work, and the academic contribution of the paper is something we wanted to keep separate from the whole system we're building, because verification is like a 10x or more slowdown: you have to develop and prove every single function, so there's a real productivity cost. I have one other question, if anyone else wants to... okay. so you mentioned Coq code is running at runtime; did I hear that correctly? extracted Coq code, and that's part of the reason it's a little slower, that the code is coming from a higher level? so there are two reasons it's a little slower. one is that OCaml actually doesn't have great support for multicore, whereas our Haskell controller, which has great support for multicore, was doing the standard thing of having a thread for every switch, and for benchmarks like Cbench those threads could process messages much faster than a single thread. we don't have that in OCaml; that's one reason. the second reason is that we have our Coq code, which extracts to OCaml, and then there's the glue code, which is unverified, that goes between binary wire formats and the data types that were extracted from Coq. so there are some extra translations that happen, really compositions of two functions: you have to take bits off the wire, apply this glue code, put it into the Coq data types, and come back out of Coq onto the wire. so there's this extra glue code that gets involved in every single
message. okay, so that definitely answers my question: there's no code that you run just once; every time a NetCore policy gets defined, you're generating the correct outputs, and your method for verifying the correctness of translating the NetCore inputs to flow tables and everything below is using Coq, so it's being used at runtime. the compiler, the optimizer, the runtime system: those are written in Coq; they then get extracted to OCaml and linked against the glue code, and at the linking boundary we just haven't been smart, or haven't done it in the most optimized way, to make it as fast as possible. so there's extra shim code that gets executed on every single little event that you wouldn't have in other systems; NOX, for example, has hand-optimized parsing that does smarter things. we haven't quantified that directly, but you can see from this benchmark: this was our previous controller, and we're a factor of 3 slower. I expect we could really make this much better; we've literally done no profiling, we just wanted to show that you can run the thing. yeah, we haven't quantified the limits. so the other question is, I guess, do you have plans to integrate this with Frenetic going forward?
yeah, so we've actually kind of jettisoned our Haskell implementation for the moment, and we're using the OCaml code extracted from Coq. it's a little unclear to me how much we'll continue with this verified base as central. I think over time, as people start extending things and working on different runtime systems or different language extensions, I don't expect everyone to always accept that 10x or more slowdown to be really correct, so over time we may just move to another system, unless someone comes up with a way to keep it partially verified. one nice thing about this system is that you can actually use the components separately: if you want to call the compiler and write your own unverified runtime, you can totally do that. I'm curious to know whether the other verification projects that you mentioned actually go to the extent of verifying not just the high-level program but also the x86 code that is generated, even developing a model of the hardware and verifying that the hardware is actually doing the right thing. what is the boundary of verification that you push? how does that compare on the x86 side?
so all of these platforms make decisions about this, right, and it was actually a little imprecise of me to describe what CompCert does that way. they use a technique that isn't actually verifying the compiler correct once and for all: instead, they generate certificates for particular programs showing that this output came directly from this input, so the compiler produces the certificate and they have a proof checker. this makes it slightly easier to build various phases of the compiler than doing the proof once and for all. so that's one difference: instead of a static once-and-for-all guarantee, you get a guarantee, for particular programs, that the compilation was correct, and that's probably okay for a compiler. in addition, CompCert stops at its model of x86; there are other people working on verifying the actual chips, but that's where CompCert stops: they say, we give you a correct x86 program. so the boundary where you stop here is that you're assuming the OpenFlow agent on the switch is correct? yes. so, bogus switches: this is actually something that some of Peyman's work and the other tools address, in that you could imagine detecting bogus switch implementations using automatic test packet generation, because there could be behaviors that are logically impossible, but of course the hardware does something bad. so debugging that is a separate issue, and we're not handling it. okay, thank you very much.