Good afternoon, everybody. I'm Kang Shin, and I am delighted to introduce today's distinguished speaker, Professor Jennifer Rexford. Jennifer is the Gordon Y. S. Wu Professor of Engineering and also chair of computer science at Princeton University. After finishing her PhD at the University of Michigan in 1996, she worked at AT&T Labs Research for eight years, then joined the computer science department of Princeton University in 2005, actually 15 years ago. She has been outstanding in every respect, receiving a large number of prestigious awards, including the ACM Grace Murray Hopper Award for outstanding young computer professional, the ACM Athena Lecturer Award, the ACM SIGCOMM Award for lifetime contributions, and the IEEE Internet Award. She is an IEEE and ACM Fellow, and a member of the American Academy of Arts and Sciences and the National Academy of Engineering. Without further ado, Jennifer, the floor is yours. By the way, if you have questions, please send them as messages, and at the end of her talk she will go through some, if not all, of those questions and answer them.

Great. Thanks so much, Kang. It's a pleasure to be here, even if only virtually, and I'd like to make this as interactive as we can given the limitations of Zoom. So by all means put questions in the chat, and I'll stop a few times during the talk to see if there are any.

The Internet is really a remarkable story: an experiment that escaped from the lab within my lifetime to become a global communications medium. Part of the beauty of the decisions the early designers made, the decisions that helped make that possible, was to have a best-effort packet-delivery infrastructure that was quite simple, leaving a lot of the functionality to the applications and other services that run on the end-host computers. And to me, the real takeaway lesson of their early design decisions was that by putting a lot of the key functionality of the Internet in programmable devices, the computers that we all run, they really changed who could effect change, who could innovate: lowering the barrier to innovation, democratizing innovation if you will. The result has been remarkable innovation in the devices that connect to the Internet, the applications that run above it, and even the media that support it from below. The one thing we haven't seen, ironically, is remarkable innovation on the inside of the Internet. I would attribute that to the lack of programmability: where you can program, you unleash innovation by allowing a larger set of people to innovate. So what I have focused on throughout my research, even in my graduate work at Michigan, is trying to make the infrastructure more malleable and more programmable, to let it adapt to new demands and, again, unleash innovation. You might say: why do we need innovation on the inside? Wasn't the whole point that the inside be a simple, best-effort packet-delivery mechanism? In fact, what we've seen is that the inside of the Internet has grown complicated over the years: it's over-specified through slow standardization processes, with standards embodied in fixed-function hardware developed by a small set of vendors who sell closed equipment, where software and hardware are bundled together and the interfaces for configuring the equipment are quite complicated. And as a result, few people can innovate.
Even when I was at AT&T, a company that buys perhaps more network equipment than any other, there was very little ability for us to have much say over what the boxes did. So what you end up with is that the equipment vendors are the primary ones innovating, adding new features is a slow and laborious process, and the people who own the network are stuck outsourcing innovation to the vendors that sell to them. So you might say, again, why do we care? The inside of the Internet isn't the point. But I would argue the inside is what determines whether the Internet is performant, secure, cost-effective, reliable, protective of user privacy, and more, and so we should care that the inside of the Internet is open to innovation. And in fact, as a research community (a little inside baseball now), we have spent tons and tons of effort creating open-source hardware and software at different layers: the packet-forwarding devices, the interfaces to packet forwarding, the control software that tells the packet forwarding what to do. We've created lots of research programs on programmable networks in and above the Internet. We've created programmable testbeds for trying out our ideas. I would argue we as researchers were actually desperate to have the Internet be more programmable, in part because it facilitates our research, but also because it facilitates the kind of change we'd like to see in the Internet and the ability to make that change.

So what I want to talk about in this presentation is this: we've tried for so long to make the Internet more programmable, so what works and what doesn't? What have we learned along the way, or more specifically, what have I learned along the way? I want to talk about a number of projects I've worked on from grad school through now. Many of them were failures; I'm going to opine on the ones that failed as well as the ones that had some success, and also talk about projects others did that I didn't contribute to, where I'll opine a little about why those projects succeeded or didn't. The broader goal is to draw some lessons about what it takes to effect change in the Internet and what it takes to make the Internet more amenable to innovation.

Now, the higher-order punchline at the top is a strategy I would call keeping it real: to really effect fundamental change in the Internet we have to be ambitious, but if we're not pragmatic, that ambition gets lost and we're not able to actually effect change. So there are going to be two lessons. One is a pragmatism that involves staying in touch with reality: effecting change by collecting real data about the network, building real software, attracting real users, and understanding the technology push and the market pull that make real change happen. This is sort of "change what you can." The disadvantage of focusing only on this, though, is that we can end up in a local minimum, where we're only changing the incremental things we're allowed to change. So I would argue this needs to be counterbalanced by a larger effort to shape what future reality might be: creating new artifacts others can build on and building communities, to enable new kinds of change that are maybe out of reach today but to lay the groundwork for them to be made tomorrow. That's the balance between ambition and pragmatism I'm advocating for in this talk.
So I'm going to do this through (I'll show my age now) a 25-year retrospective, starting with my time as a PhD student of Kang's at Michigan. If you'll indulge me a moment, I'll talk a tiny bit about grad school, especially since this is where I did my graduate work. Then I want to tell you a little about my time at AT&T, both the work going on in parallel in the academic community at the time and some things I was working on. And then I'll talk about what's happened in the last 15 years with programmable networks. This is all work that's been done with a ton of different people; I'll mention a number of them at the end, but by no means is any of this work I did alone, quite the contrary.

When I went to grad school in 1991, parallel computing was in its heyday. A lot of people, myself included, were really tantalized by the idea that you could harness multiple small processors together to take on tasks bigger than any one could do alone (predict tomorrow's weather before tomorrow, for example), and these were often interconnected in regular topologies. Part of the reason I came to Michigan was that Kang's group was building a machine of this type, and I could get in on both the hardware and the software of that machine. Our goal was to handle a lot of different workloads: scientific applications, real-time applications, and so on. In particular, we recognized that the network needed to behave differently for these applications (a real-time application cared about deadlines, a best-effort application about throughput), and the right kind of routing and flow control might depend on the patterns of the workload. So we worked a lot on how to make the little routers sitting inside a single parallel machine programmable. But in the end, parallel computing didn't take off, or at least it didn't then; in some ways, Moore's law wasn't over yet, though now, 25 years later, it looks like it might be almost over. And application development ended up being a lot harder than building the parallel machines. The reason I have this picture of a coffee mug is that at the time I used to visit parallel computing companies, and they would always give you a mug, so I have a collection of mugs from now-bankrupt parallel computing companies, a little shrine to this lesson if you will. But I also want to say that part of the difficulty was that we didn't have real data to evaluate our research. It was very hard to get real workloads at the time, and the custom chips we built lagged behind the silicon features available in commercial chips, sort of an inherent challenge that a lot of academic research had. I don't mean to be so negative, though: a lot of the things worked on at that time have new currency in data-center networking today. The modern factories that are data centers are in many ways large-scale multicomputers of a sort. So one might argue I should have been more patient and stuck with it, if you will: Moore's law is coming to an end now, and data centers actually use ideas like this. So my skepticism about my PhD work isn't about its long-term value; it's more that, at the moment, it wasn't quite the right time for this kind of work.
And yet the idea of programmability as a way to customize the network did, I think, have longer-term potential, and it continued to be something I was preoccupied with. Toward the end of my time in Kang's group, we were looking at what kind of operating system to run on the parallel machine we were building, and I got really interested in network stacks and started learning about networking and the x-kernel. Then the World Wide Web exploded, people started doing really interesting research studying the Internet through measurement, and asynchronous transfer mode (ATM) switches started to emerge as a new kind of network device, kind of similar to the multicomputer routers we were looking at in Kang's group. So I realized I could pivot a little to think about the network that was becoming this global network: not the one inside a box, but the one writ large across the Internet. And maybe what I knew was useful, because some of the things that go on in these devices are not that different from the kinds of devices I was familiar with.

When I went to AT&T, I continued to work on asynchronous transfer mode networks: how to route in them, how to schedule, how to shape traffic, and eventually how to carry Internet traffic over them. Then, as that technology fell by the wayside, I started looking at another kind of technology I call "IP, not over ATM," which is sort of what I work on now. So I like to joke that I've worked on several technologies that didn't really take off; in fact, I skipped one, because I also worked on routing in telephone networks. I've worked on routing in multicomputer networks, routing in ATM networks, and now routing and other topics in the Internet. But each of those has a lesson, I think, to tell us about how networking works today. One, which we'll see later in the talk, is the challenge of evaluating ideas: even inside a company, inside AT&T, it was difficult to get real data. And once I started getting real data, it became clear that collecting that data was a research topic in its own right, and that understanding how to manage a network using that data would be an interesting question to explore. For me at the time it was like, I'll just go ask for some data, run my experiments, and be done; and then it took on a life of its own. The second lesson is about carrying traffic over ATM networks, which are circuit-switched-style networks, more like the phone network. Some of the ideas there, like taking a group of related packets and handling them like a circuit, are going to come up again later in the talk.

Now, in parallel to all this, a hot topic in the academic community at the same time was an area called active networks. The researchers who pursued this (I wasn't involved in it at all) were really concerned that the Internet was becoming difficult to innovate in, the same points I was making earlier in the talk. They wanted to enable experimentation with new networking ideas, and also to enable future networks to accommodate innovation.
They explored a couple of different models for doing that: one extreme approach of carrying code inside packets, so that individual packets of data might carry the code that would run on their behalf, giving end users much more say over how their traffic is handled; or, less radically, making the devices themselves programmable, so that you could run programs at the behest of the network administrator. At the time this was quite exciting: the idea of programmability to enable innovation, of demultiplexing traffic in the network into software programs, a unified way of thinking about adding software functionality to the network. Lots of really interesting ideas here, and in fact they animate, in many ways, things I'm interested in that I'll talk more about. But at the time there were two questions that were sort of unanswered. First, who should really be the programmer of the Internet: should it be the end user, or should it be the network administrator? From my point of view, I was preoccupied at AT&T with the network administrator, whereas a lot of active-network research was focused on the end user. And second, when to worry about performance while making the network programmable: worrying about it too early can constrain your thinking, but worrying about it too late can lead to adoption problems. So those were two issues on my mind as that community was grappling with active networking. For me, I became really interested in the network administrator as the programmer, and in performance as a first-order concern.

So I want to tell you a tiny bit about some of the work at AT&T, driven, I think, by my really strong wish to work with real users (in this case the network-management people, the network operators) and real data, the things I had been hungry for throughout grad school and even my first few years at AT&T. As I talked to the network operators, I became aware that they were doing heroic things to hold the Internet together with very little scaffolding and very few methods; the Internet wasn't really designed with its management in mind, and it shows. At a high level, they were over and over again trying to build a control loop: measuring the network by whatever means necessary, controlling the network by whatever knobs the vendors made available, and in between doing analysis or optimization, whether to optimize the flow of traffic, to do maintenance in a way that doesn't affect real users, to detect and block attacks, and so on. I became fascinated with the idea that even if we're not allowed to program the network, because the vendors don't let us, maybe we can program above the network and implement this control loop in ways network administrators would benefit from. So I was interested in the management system as the place where programmability might be possible. And as a result (this is where the pragmatism part of the story comes in) we took the existing routers as a given, not just their hardware but their software too, because the companies that sold the equipment didn't allow even AT&T to change what code ran on the boxes AT&T itself bought.

Before I go into a little more detail here: are there any questions or comments? Feel free to unmute or ask in the chat. Okay, I'll go on a little further then. One question we thought about was how to route the traffic in a way that would minimize congestion.
The way networks worked at the time was that an individual network, like AT&T's part of the Internet, would run a protocol where every router is a node in this picture and every edge is a link between two routers. There would be weights on the links, and the routes the traffic takes would be the shortest paths based on those weights. So you would travel, for example, on this light blue path, because two plus one plus five is the smallest total that gets between the entry point and the exit point for this traffic. Now, the network administrators were given a protocol that was standardized and implemented in the boxes, where the only thing they could do was set the values of these weights. The people who designed these protocols figured the weights might be set based on the physical distance the traffic travels over an edge, or maybe something inversely proportional to the capacity of the link. In practice, neither of those heuristics ends up delivering traffic efficiently. So the problem the network administrators are implicitly solving is kind of backwards: they have to pick the weights so that, when the routers compute shortest paths based on them, the traffic flows in a sensible way. This is not the way you would define the problem if you were a network administrator; this is what you do when you have a nail and you don't have a hammer, so you have to put it in with a screwdriver. But that's what they had to do: essentially express all of their desires through these numbers the routers use to compute paths. The goal, to be clear, is not to compute shortest paths based on a weighted graph; it is to pick the weights so that the shortest paths are the ones you want. That's a recurring theme throughout the talk: working backwards to answer the question you actually want, because the equipment doesn't let you ask it directly. So what do you have to do? You need to know the offered load between the entry and exit points in the network, you need to know the graph and the capacities of the edges, and then you've got an optimization problem to figure out the weights that will, say, minimize the utilization of the most heavily loaded link in the network. You've got to work on collecting the traffic matrix, collecting the topology, and solving the optimization problem, and while I won't go through the technical details, each of these is quite an interesting and thorny problem in its own right. Just to give an example: if one of the links on this path were congested, this particular link say, and you were to increase its weight to three, the bottom path would become shorter than the now-longer top path, and you would divert a fraction of the traffic away from the congested link. So basically what we did was create, for the network administrators, a better network-wide way to reason about how to go about doing that. But in the end, after doing that work on measuring the traffic matrix, optimizing the routing, and so on, we ran into a lot of interesting challenges that were perhaps more interesting than the original problem we thought we were going to solve. Measuring the traffic matrix, or inferring it from more limited data, was itself an interesting tomography problem, just getting the network to divulge what offered load it was carrying. The optimization problems were hard.
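To make that backwards weight-tuning game concrete, here is a toy sketch in Python. It's my own illustration, with a made-up four-node topology rather than anything from the slides: the operator's only knob is a link weight, and traffic moves only because the shortest-path computation comes out differently.

```python
# Toy illustration (hypothetical topology, not AT&T's tooling): routers pick
# shortest paths from link weights, so the operator steers traffic indirectly
# by nudging a weight and letting the shortest-path computation do the rest.
import heapq

def shortest_path(graph, src, dst):
    """Dijkstra over {node: {neighbor: weight}}; returns (cost, path)."""
    heap = [(0, src, [src])]
    done = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in done:
            continue
        done.add(node)
        for nbr, w in graph[node].items():
            if nbr not in done:
                heapq.heappush(heap, (cost + w, nbr, path + [nbr]))
    return float("inf"), []

graph = {
    "entry": {"B": 2, "C": 2},
    "B": {"exit": 1},    # top path costs 2 + 1 = 3
    "C": {"exit": 2},    # bottom path costs 2 + 2 = 4
    "exit": {},
}
print(shortest_path(graph, "entry", "exit"))  # (3, ['entry', 'B', 'exit'])

# Suppose the entry->B link is congested. We cannot say "move the traffic";
# all we can do is raise its weight so the bottom path becomes the short one.
graph["entry"]["B"] = 4
print(shortest_path(graph, "entry", "exit"))  # (4, ['entry', 'C', 'exit'])
```

The real game, of course, is the inverse: searching over all the weights jointly, for a whole network and a whole traffic matrix at once.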
The optimization problem ends up being NP-hard, and the protocols have all sorts of peculiar quirks; if you don't model them quite right, your model doesn't match what the network actually does, so we spent a ton of time validating that our models reflected how the network routers would actually behave. And finally, the big problem we ran into is that every time we changed a link weight to make the network less congested, we would cause transient disruption for the traffic already being carried in the network. That was a problem as well, so we ended up having to design effective ways to make changes to the network without the change itself being worse than the problem it was solving. Finally, after a lot of effort, we were able to get all of these techniques deployed and operational in AT&T's network. But really, this picture is where all the action was: the algorithmic work on the previous slide, while interesting, was nowhere near as meaty as some of these operational questions.

Another thing I wanted to mention: when I was at AT&T, I used to work the night shift in the operations center, to understand better what the network administrators were doing. I wish I had done it earlier in my career, and I wish I did it more often now. I worked the night shift with the people actually holding the Internet together, and I heard lots of war stories and interesting anecdotes. It made me realize how much I was abstracting away the things that actually mattered to running a real network, and it informed my work in really significant ways. So if you ever have an opportunity to talk with the people who do the actual work of making a system run, I highly, highly recommend it.

As we did this work, we studied how to make the flow of traffic more efficient, we studied mitigation of DDoS (distributed denial-of-service) attacks, and we marched through a whole bunch of other ways to make network management better. And then we started to become concerned that continuing in this way was going to be really ineffective, and I'll explain why. If you look at the way networks actually work, they forward packets. There's a data plane that forwards, buffers, drops, and marks packets (very, very simple streaming algorithms) at the behest of a control plane that, in legacy equipment, is standardized: the kinds of protocols I was just talking about. And then there's a management plane that collects data and configures these protocols. All of our work was making this management plane bigger and bigger and bigger, inverting whatever the control plane did, trying to coax it into doing what we wanted, always adding complexity in order to indirectly make the network do our bidding. And as we did more and more of this, it became clear we were making the management plane more and more complicated and harder to reason about. Sure, we were automating things that used to be done manually, and hopefully doing them better, but at what cost? At the cost of really complex software that was hard to reason about. I started thinking about what to do, and I chatted with some people at AT&T who worked on the telephony network. They told me about work from the early 1980s on what was called the Network Control Point, which in the phone network separated the actual delivery of telephone calls from the control that set those calls up.
Now, at AT&T it was often the case that old-timers who had worked on the phone network would tell you stories, and you would have to listen patiently. Initially you would think, okay, it's just those old bellheads telling stories again; they just don't understand the Internet. At first I thought this was one of those conversations, but I humored the person and read these papers from the early 1980s. What they talked about was separating the control plane from the data plane: the network control and the network action didn't have to run on the same box, and that logical centralization of control would enable flexibility, scalability, innovation, and more. And I thought, wow, that's actually pretty mind-blowing; that's pretty cool. But what would that mean inside an ISP? We're not setting up phone calls; we're routing data packets. Well, most of the problems I had looked at were about network-wide route control. Could you separate routing from routers? Is that a weird idea or a good idea? And in particular, how can we do that when we're not even allowed to define the interface we have to the equipment, because again we're stuck with this pragmatic constraint? And then it came to a group of us that we should trick the routers into doing what we want by telling them what they should do. I'll tell you a little about that part of the story next.

So the idea was to separate routing from routers. Routers normally talk amongst themselves to compute routes. What if we had a computer tell each router the answer? What if we gave it only one choice, so it had no choice but to pick the answer we gave it? Essentially, we took routing information from the neighboring networks that connected to AT&T, fed it to a computer, and spat out the answers for how the individual routers in AT&T's network should forward their packets, essentially short-circuiting the control plane by brainwashing the routers into doing what we want. And this was very much incrementally deployable: AT&T's neighbors would speak to us in the same way they normally do, talking to some general-purpose software rather than to a router, but they don't know the difference. And no changes to the routers, because we're force-feeding them the answer to the computation. Now, you would think, okay, this is a horrible idea, because how could one computer do all the work of hundreds of routers? But it turns out that routers often run multi-year-old CPUs and have relatively limited memory, and the computation all those routers do is tremendously redundant; many of them are doing almost the same thing. So there's a lot of opportunity to take a high-end server with a lot of memory and amortize the work of multiple routers on it. At the time (this is 2004, 2005) it was possible for a single high-end server to do all the work, even for a network as large as AT&T's. Of course, you can't have just one computer, because if it crashes that's a bad thing, but you can replicate that single logical entity into multiple copies. A gift from Moore's law, ironically: commodity computers had gotten a lot better in the ten years since my PhD.

So this idea of having a single computer be the brain for the entire network seemed like a crazy idea at the time, and we had to get people at AT&T willing to deploy it.
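Before the deployment story, here's a flavor of the "give the router only one choice" trick. This is my own toy reconstruction, with made-up numbers, not the actual Routing Control Platform code: a central box runs the route-selection logic every router would have run, from one shared view, and then advertises to each router only its winning answer.

```python
# Toy sketch of the RCP idea (hypothetical numbers, not the real system):
# a central box learns the candidate routes from neighboring networks, runs
# BGP-style route selection on behalf of every router, and then advertises
# each router ONLY its winning route, so the router has no choice.

# Candidate routes to one destination prefix: (egress router, AS-path length).
candidates = [("nyc", 2), ("chicago", 2), ("dallas", 4)]

# Each internal router's IGP distance to each egress (the shared topology view).
igp_distance = {
    "r1": {"nyc": 10, "chicago": 40, "dallas": 70},
    "r2": {"nyc": 50, "chicago": 15, "dallas": 60},
}

def select_route(router):
    """Mimic the routers' own decision process: shortest AS path first, then
    the nearest egress point ("hot-potato" routing) as the tie-break."""
    best_len = min(length for _, length in candidates)
    ties = [egress for egress, length in candidates if length == best_len]
    return min(ties, key=lambda egress: igp_distance[router][egress])

# One shared computation, amortized across routers; each gets a single answer.
for router in igp_distance:
    print(router, "->", select_route(router))   # r1 -> nyc, r2 -> chicago
```

The point of the sketch is the amortization: the inputs and the decision logic are shared, so one well-provisioned server can do what hundreds of routers were each redundantly doing.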
We went to different groups within AT&T and tried to find users covering different business units inside the company. The security group really wanted to block denial-of-service attacks and was having trouble doing it. The people running virtual private networks for enterprise customers had customers annoyed that they didn't have direct control over how their traffic traversed AT&T's network. And people running interactive applications (at the time, Sony's EverQuest and other online games) were frustrated that every time AT&T did maintenance on its network, video game users would have their games interrupted, harkening back to some of the problems we grappled with a few minutes ago in the talk. So we went through each of these three use cases and built applications that would run on the new platform we built. No one of these applications alone would likely have gotten the work deployed, but because all three groups were clamoring for this extra flexibility, we were able to get AT&T to deploy the platform and run these applications in practice.

I'll talk really briefly about the online gaming case. What was happening is that users would be playing a game, a first-person shooter say. AT&T would need to take a router down for maintenance, to fix a broken card or upgrade the operating system, and for a period of time the routers would be talking amongst themselves to compute the new routes. In the meantime, the actual traffic for the gaming application was being dropped, delayed, and delivered out of order. In fact, in one instance 20,000 video game players died at the same time in a game of Sony EverQuest because AT&T did maintenance. So, ironically, AT&T was doing maintenance to make the network better, and the users were complaining because the application's behavior was disrupted. What can you do about this? It's a really, really simple idea, and fortunately not hard to do when you have central control. You know ahead of time that you want to take that box out of the network. You know ahead of time that you want the other routers to route a different way. The protocols don't give you a way to say that, but the network administrator, or a centralized system acting on the network administrator's behalf, can take the steps in the right order so that nothing goes wrong. In particular, all we do here is have the routing control platform tell the router on the other end to start using the other exit point in the network. The first exit point is still up, so packets in flight are successfully delivered. Then, once the traffic is flowing only through the second exit point, we take the other egress point down. A really trivial idea: just do things in the right order. That's hard to tell a distributed protocol to do, but pretty easy to do centrally.

In the chat, from Trevor: how does the central computer managing things connect to the routers? In the deployment we did at AT&T, it connects over the network itself. Okay, so it has like a mini virtual network or something? Yeah, there are a lot of ways you could do this in practice; in the simple case, in AT&T's network, we used the intra-domain routing protocol to have this box look like another router inside the network.
More generally, you could imagine the generalization of what you just said: a separate virtual network that runs distributed protocols just for communication between the controller and the network devices, and not for the more scalable kind of communication you normally need to support. There's no free lunch here; you do have to do something to bootstrap. Later deployments, like the similar solutions Google now runs in its backbone, keep the distributed control plane running amongst the routers just in case the controller dies. In fact, AT&T did that in our deployment as well: if our box died, the worst case is that you'd devolve back to normal routing; it wouldn't cause an outage. In other words, if the brain dies, the body continues to function, even without all the bells and whistles.

There's another question: can the centralized function be distributed? Yes, indeed, and in practice I think you would want that; more recent deployments of these ideas do exactly that. You might replicate for reliability, which is all we did, but you could imagine having a controller in each geographic region, which is what Google does, and I think other deployments do as well, particularly in settings where you care about latency. I think that's more important today than it was when this work was done. But the ability to do the entire computation in one box helps you there, because it's an awful lot easier to distribute for latency and reliability than to distribute for scaling. If you had to distribute for scaling, you'd have to shard the content, and it would be quite difficult to manage; fortunately, Moore's law solves that problem for us.

Which of the three applications benefited the most from the RCP? That's a great question; I have to think about that. I think the maintenance one came first, because it wasn't even related to new service offerings; it was related to customers who were complaining. "Benefited" is kind of a loaded word, but I think the one that really got it deployed was the maintenance one, because AT&T had to do maintenance, and customers vigorously complained every time they did. The other two applications were more about emerging services that were becoming important at the time. I think DoS, the denial-of-service attack mitigation, was next, because that was also quite severe when it happened. And the virtual private network one was a little more niche, but there was a business unit that cared about it quite a bit, so that certainly helped. Yeah, great question. Any other questions? This is probably a good time to pause, because it's the end of another section.

I think you kind of answered the second question, about decentralizing, because the whole idea of the Internet was to decentralize things, right? It was a fault-tolerance story originally, and what you're doing with this RCP, even if you knock it out, you're saying you can still use the network. Right, so there's a bit of a cheat going on there, in that there's still the distributed protocol running underneath as a backup. And second, the central controller really isn't centralized. There's kind of a joke here: we use this term "logically centralized," which is kind of a lame term, and really it means distributed.
I say that in a cheeky way, but it's sort of true. The idea is that the abstraction presented to the network administrator should be a centralized, network-wide configuration of high-level intent. If in practice it has to be implemented in a distributed way to make it scalable, performant, and reliable, so be it; at least we can design it with that in mind, rather than having the level of distribution be at the level of individual network devices, which was the prevailing wisdom before that. So yes, it really has to be distributed in practice, to address scale, performance, and reliability; it just doesn't have to be distributed in the way it normally had been.

Can the central control manage unplanned outages? Right, good question. Unplanned outages are handled by the routing control platform doing the same computations the routers would have done had they been running the algorithm, which can mean picking the best exit point to leave the network, the shortest path through the network to reach that exit point, and so on. In fact, the first thing we built was a routing control platform that computed exactly the same answers the routers would have computed: no bells and whistles, no new features for denial-of-service attack mitigation or anything else, just showing that one box could do what the whole network was doing by itself. Then we started adding the ability to override the normal default behavior, if you will, to handle these special use cases over time.

The Open Compute Project? Yeah, definitely highly relevant, and more broadly the work on software-defined networking and software-defined infrastructure is quite relevant here; in many ways this is an early precursor of that area, with a very specific focus on Internet service provider routing rather than the larger picture. Yeah, great question. Any other questions?

So next, an interesting thing happened. Hui Zhang, who is a professor at Carnegie Mellon, came to visit AT&T on sabbatical, and he got a group of us involved in a project that he and Nick McKeown at Stanford were part of, called the 100x100 project. We told him about the work we were doing, and we were very apologetic: well, we're doing it centrally because we don't have a choice, and Cisco won't let us talk to the routers in any other way. And he said: maybe, even if you were allowed to do it differently, this would still be the right way to do it. He gave us a kind of pep talk, if you will, that maybe logical centralization of network control is actually a good idea, not just a hack to get around the fact that we weren't allowed to do what we wanted. And he was right; we didn't see it at the time because we were so preoccupied with the day-to-day of what we were doing. He also said maybe this is a good idea not just in ISP networks but in enterprise settings as well. So we started thinking: if you could design what the equipment did, what would you do? And we came up with an idea we called the 4D architecture. You can tell by the fonts here that this was an idea from 2004, 2005; I kept the fonts just because they're sort of quaint. The big high-level idea was that you want three things: network-level objectives driving the decisions the network makes;
network-wide views, not local views, to make sure those objectives can actually be achieved; and direct control over how the packets get forwarded, rather than indirect control over the algorithms used to compute that state. High-level objectives would go through a central decision plane that would disseminate decisions down to the data plane to be executed, and a discovery plane would make sure you collected the measurement data you needed to know the network topology, the traffic, and more. So those are the three ideas, network-wide visibility and direct control, integrated through network-wide objectives. It's the control loop I talked about earlier, right? Measure, analyze and optimize, and control, just now put into a layered stack with cheesy fonts.

So that's all well and good. At that time I left AT&T to go to Princeton; I stayed an extra bit of time to get the RCP deployed at AT&T, and then I left. When I came to Princeton, I thought, hmm, I'd really like to think now about how to make networks inherently easier to manage: not bolting network management on top of legacy equipment, but actually changing the network so that management is inherently easier. I wanted to revisit all the old questions, but with the ability to change the network infrastructure. And yet, leaving industry, I was puzzling over how I was going to keep my feet on the ground, staying in touch with how the technology actually worked and how to effect change. Larry Peterson, who was already at Princeton at the time, was actively building an experimental platform called PlanetLab. It was a virtualized, programmable distributed-systems testbed that let people play with their research ideas, and it was his reaction to the failure of research in active networking, something he himself was a part of at the time; a more pragmatic take: give people Linux virtualized on a server, let them run experiments on a slice of a Linux machine on many machines all over the world, and see what they come up with. Talking to Larry was interesting. It made me feel the active-networking ideas had some currency and importance, and that there was a way to make them real. Yet at the same time, Larry and his colleagues were thinking about distributed-systems research, not networking, so I was still puzzling over how to take these ideas and make them real at the networking layer. And I spent a lot of time working with Larry on how to make PlanetLab operate one level lower, on network equipment.

In parallel to all of this, other people were having the same kinds of conversations. The 100x100 project that Nick McKeown and others were involved in was also puzzling over how you would do logically centralized control in an enterprise or a campus, and the Ethane project at Stanford, which Nick McKeown led, started thinking about logically centralized access control, to be able to block unwanted traffic. They did a really exciting project, and it later led them to realize that the interface they needed to the data plane had to be something more flexible. So they started to define (and I got involved in this later) a standard called OpenFlow that would abstract what the data plane is able to do to forward packets. Most devices that forward packets at high speed are, at their heart, simple.
They parse a packet; they match on some of the bits that have been parsed, an IP address or a MAC address; and they take a simple action on the packets that match a particular rule: drop it, forward it out a particular port, or do something else. Match-action processing. All the marketing terms we use in networking (router, switch, firewall, network address translator, load balancer) are just different versions of exactly which bits you match on and exactly what simple action you take on matching packets. The OpenFlow standard was pragmatic: it didn't design new hardware; it didn't do anything, really, except say that this is a general design pattern, and we can have an open API to it that the vendors don't get to control. OpenFlow came out as a nice way to abstract the interface to the underlying hardware, and it was eased by the emergence of merchant-silicon vendors (Broadcom, Marvell, Intel, and others) that were starting to make chipsets router vendors could buy. You no longer had to be one of a small number of companies with your own silicon foundry to make a router or a switch; you could buy these chips. It suddenly became possible to start thinking about creating open interfaces, because the technology was starting to be a bit more open, a lot more vendors were interested in building products, and many of them were more open to having a standard interface to their equipment than the established vendors were. And, guess what, there were a bunch of data-center operators, like Google and Microsoft and others, who were starting to build data centers and wanted that flexibility. It was kind of a perfect storm, in the 2008-to-2010 timeframe, for doing this. At the beginning, though, the focus of the academics, myself included, was programmable control. Now we could go below the management plane to the control plane, because we could talk directly to the packet-forwarding hardware, using at least a better interface than the clumsy one I used in my AT&T work. So we started to look at what it would mean to deploy ideas like this on a campus; we did deployments at Princeton and Stanford and Georgia Tech and a number of other schools, ran a bunch of experiments, and learned a lot.

Around that time, Jonathan Smith at Penn, who is now at DARPA, reached out to me and some others and said: hey, you know that active-networking stuff we worked on a while ago? I think there's an opportunity to go back and think about it again; now the time is right. Let's get a programming-languages researcher and a networking researcher at each of several schools together and figure it out. That's how I got introduced to my own colleague David Walker, indirectly, through Jonathan suggesting we start talking. Similarly, at Penn and at Harvard, faculty on both sides of the aisle got together and started thinking about this: if the networks are going to be programmable, what's the language we should use to program them? It was an eye-opener. Oh, networking is not only about scarcity, not only about optimizing routing, blocking unwanted traffic, and making sure the resources are used well; it's also about getting the right abstractions for expressiveness. That's the kind of question I had been struggling with but hadn't been able to put my finger on, and for the programming-languages people, that's their bread and butter.
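To see both halves of that point (every box is match-action underneath, and raw match-action is a painfully low-level programming interface), here's a toy sketch of a single rule table. It's my own illustration, not OpenFlow's actual API: changing nothing but the rules turns the same machinery into a firewall, a NAT-style rewriter, or a plain router.

```python
# Toy match-action table (illustrative only): the same machinery becomes a
# "firewall," "load balancer," or "router" depending purely on the rules.

rules = [
    # (match fields, action): first matching rule wins, like a TCAM lookup.
    ({"dst_port": 23},        ("drop", None)),              # firewall-ish rule
    ({"dst_ip": "10.0.0.99"}, ("rewrite_dst", "10.0.0.7")), # load-balancer-ish
    ({"dst_ip": "10.0.0.7"},  ("forward", "port2")),        # router-ish
    ({},                      ("forward", "port1")),        # default route
]

def process(packet):
    """Run one packet through the table; rewrites re-enter the pipeline."""
    for match, (action, arg) in rules:
        if all(packet.get(f) == v for f, v in match.items()):
            if action == "drop":
                return None
            if action == "rewrite_dst":
                packet["dst_ip"] = arg
                return process(packet)   # match again with the new header
            return (arg, packet)         # forward out the port named by arg
    return None

print(process({"dst_ip": "10.0.0.99", "dst_port": 80}))  # ('port2', rewritten)
print(process({"dst_ip": "10.0.0.50", "dst_port": 23}))  # None (dropped)
```

Everything is bit patterns, rule priorities, and shared tables; there's no modularity and no linguistic structure, which is exactly the pain the next part of the story is about.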
So the first thing we did was start trying to program on top of OpenFlow directly, and we experienced a lot of pain. The interface was low-level: open, for sure, but still pretty low-level; an API, not a linguistic formalism. Every time we experienced pain, we tried to generalize and abstract, and eventually we developed higher-level abstractions for programming OpenFlow networks. What was great about OpenFlow was that it provided network-wide visibility and direct control, two of the things we wanted in the 4D project, and it had a simple data-plane abstraction that could be explained to others, including programming-languages researchers who might frankly find the alphabet soup of networking frustrating and intimidating otherwise. But it was still pretty bad: it was a lot about bit patterns and ternary content-addressable memories and rules and parsers, very low-level details and very low-level management of resources. And it gets worse: if you want to write modular applications, they have to share the same rule tables and process the same packets. And really, you haven't gotten rid of distributed-systems problems: you've still got a controller and a data plane, in fact multiple data planes, that have to talk to one another with latencies between them. So you're still not sidestepping the problems distributed systems bring to the table. Those were the kinds of things we ran into as we tried to build applications on OpenFlow: sure, the applications were fun, hopefully useful, but in the end quite painful to develop.

So we're still at the control plane, but now at least with a better interface directly to the data plane, using OpenFlow. We went back to that control loop I talked about. We developed query abstractions for collecting measurement data. We developed ways of composing modular applications, so that we could operate on the same packets with more than one piece of functionality: load-balance a web server and then route traffic to it, while collecting data about it. And we figured out ways to control the network that wouldn't cause transient disruptions of the type that hampered us so often at AT&T. For time, I'm not going to go into most of that, but I want to briefly tell you about the last one as an example, because it's a recurring theme: how do you update the network without causing transient disruption, making the network better without making it worse while you're fixing it?

The high-level idea is that we'd like a really simple abstraction. I have a particular policy, a way I want packets to be handled, and I want to change it to another policy. I've got a distributed set of network devices that each need to be updated, and I'd like to update the configuration as if it happened in one fell swoop, but in practice I can't. And I don't want the network to go through weird intermediate states where some of the devices are updated and some aren't. It would be nice if I could assume that packets were handled entirely by one policy throughout their journey, or entirely by the other, because then, as long as those two policies each satisfy a property I care about (they don't have loops, they don't allow unwanted traffic, they don't drop traffic I care about), and I know the before and the after both satisfy it,
then I don't have to reason about what happens in the middle. That's an awful lot easier, because I just have to test that the policy before and the policy after achieve my constraints and my goals. But I do have the problem that, in the end, we have to update the devices one at a time, because even if we tried to update them all at once, there would be differences in exactly when the changes take place, and there would be packets in flight that might experience one policy at one switch and another policy at a different switch. But now we've got a better interface than what we had when I did the work at AT&T: we've got OpenFlow. And the high-level idea, really a simple one, is essentially to do a two-phase commit. We update the middle of the network but don't activate those changes. Once all of that is done and stable, we go to the perimeter and start letting packets be tagged in such a way that they access the new policy. So: update the middle, update the edge, and then, in the background, delete the old stuff that's not in use anymore. That's the high-level idea. In a bit more detail, for the networking-oriented: when a packet enters the network, we stamp it, using the programmable switch, with a version number that determines which policy gets used. We update the interior to act on new tags that are not yet in use, so those rules won't match anything. Once we know that's done, we go to the perimeter and start stamping packets with the new tags, so they access the new policy. And once we're comfortable that no packets with old tags are in flight anymore, we delete the old rules, because there's no need to keep them. That's it. In the worst case this doubles the number of rules; in practice, most changes to the network have narrower scope and narrower impact, and you can make the overhead much smaller. But at a high level, having centralized control meant that really simple ideas from distributed systems, like two-phase commit, could be implemented directly, rather than through the kinds of weird mechanisms we were forced to use with legacy devices.

Maybe I'll pause here to see if there are any questions about this part of the talk, related to OpenFlow.

So, there were a lot of use cases for OpenFlow. We explored a bunch of them, at least a few of them, as ways to test out our ideas, and those use cases helped us develop better programming abstractions. But the really killer use cases, which we had no role in, were pursued by Google and by a company called Nicira that spun out of Nick McKeown's group at Stanford: helping cloud providers run their private backbones and their data centers. Traffic engineering in a private backbone is one of the kinds of problems I worked on at AT&T, but in the unique setting these large cloud providers have; and running multi-tenant data centers means the data-center provider must support multiple separate customers, each with their own virtual network. These became the two killer use cases for what became software-defined networking using OpenFlow: really, the cloud providers wanting the same kind of control over their networks that they've always had over their servers and their storage.

Stepping away from this part of the talk: we started to grapple with the fact that OpenFlow itself was quite limited. Again, it was always a pragmatic protocol, just standardizing what the existing hardware was already capable of doing.
And what the existing hardware was capable of doing was pretty limited. At that point there was one table for matching and acting on packets, you could match on 12 packet-header fields, and you could take only very simple actions; any packets you couldn't handle that way had to be sent up to higher-level software, a recipe for performance and security problems. So OpenFlow 1.1 got designed, and OpenFlow 1.2, and time went on, and every generation of the standard got longer, operated on more headers, and was more and more complicated, and there were still always people unsatisfied. The group of us involved in OpenFlow became very concerned: where does this end? This is like second-system syndrome happening five times in a row. So we thought we'd better put the brakes on, because this was clearly not the right end game. In fact, maybe it was time to be a little more ambitious: OpenFlow had been the pragmatic choice, and maybe it should be jettisoned in favor of something more flexible. In particular, we came back to active networking again, but with a different focus: hardware designed with programmability in mind, without compromising speed and without compromising power. Figure out the most programmability you can put into packet processing without giving up those performance and power constraints; a performance-first variant of active networking, if you will. I wasn't involved in this work at all; it was done by Nick McKeown's group at Stanford and folks at Texas Instruments.

Just stepping back: with the end of Moore's law and Dennard scaling, here we are, years later, and it's finally really true that you've got domain-specific processors: graphics processors, GPUs; machine-learning processors, TPUs. This is the networking equivalent: an attempt to do packet processing at line rate with a restricted programming model suitable for packet processing. Is it exactly the right answer for how to do packet processing? Probably not, but it was an awfully good first stab. The idea is still that we parse packets and do match-action processing, but now the parser is programmable (we decide what the packet format is and which fields to extract), and the actions are programmable, using an arithmetic logic unit; not a deep idea, just borrowing from standard computer architecture. You decide which fields you match on and how you act on them, you can change the fields as packets go through the pipeline, and you can leave some state behind that affects the handling of the next packet, with simple registers to store that state. We're now starting to see commercial devices on the market that support this simple programmable data plane; in fact, we have them running in the Princeton campus network, which I'll allude to in a moment.

This was hugely exciting to me; finally the pieces had come together. I mentioned earlier that programming languages was one missing piece; the second missing piece was hardware, not really my own forte but obviously hugely important: finding a sweet spot between what networking applications need, the language abstractions for expressing them, and the hardware that is a good match for that model. So in designing P4 (the language is called P4, for Programming Protocol-Independent Packet Processors), a group of us got together to design the language with that hardware on the previous slide in mind.
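To make that pipeline concrete, here's a toy software stand-in, written in Python rather than P4 and entirely my own sketch with made-up field layouts, for the three programmable pieces just described: a parser you define, an ALU-style action, and a register that persists across packets.

```python
# Toy model (mine, not real P4) of a programmable packet-processing pipeline:
# a reprogrammable parser, an ALU-style action, and register state that lets
# one packet's handling depend on the packets that came before it.

FIELDS = [("dst", 0, 4), ("proto", 4, 1)]   # (name, byte offset, length):
                                            # this list IS the parser "program"

def parse(raw: bytes) -> dict:
    """Programmable parser: which bytes become which named header fields."""
    return {name: raw[off:off + size] for name, off, size in FIELDS}

byte_count = {}   # a "register": state left behind for subsequent packets

def match_action(headers: dict, raw: bytes) -> str:
    """One match-action stage: update per-destination state, pick an output."""
    byte_count[headers["dst"]] = byte_count.get(headers["dst"], 0) + len(raw)
    return "port1" if headers["proto"] == b"\x06" else "port2"  # TCP vs. rest

pkt = b"\x0a\x00\x00\x07" + b"\x06" + b"payload"
print(match_action(parse(pkt), pkt))   # port1 (the proto byte is 0x06)
print(byte_count)                      # {b'\n\x00\x00\x07': 12}
```

Reprogramming FIELDS and the action is what reconfigurability in the field means here: the same device becomes a different kind of box without new hardware.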
With all the lessons we'd learned from having programming-languages researchers work on OpenFlow, how would we want future networks to be programmed? We defined a protocol-independent language for specifying parsers and typed match-action tables, one that would let you write programs that are hopefully target-independent, so they can work with different kinds of switches and network interface cards, relying on a compiler to generate the lower-level Verilog or microcode that needs to run on the device. To be honest, I think we're still pretty far from achieving that goal; it was our goal at the beginning, but that part has proven much more difficult than we ever imagined, and we're still struggling with it now, six years later. The other goal was reconfigurability in the field: if you buy one of these devices, you should be able to change what your network looks like, turning the thing from a router into a switch into a firewall into a NAT into something else completely, just by reprogramming the device.

In the interest of time I'm not going to go into a ton of detail here, but the particular piece I've been interested in is going back to that control loop. I'm really a one-trick pony: I think about network control loops and how to make them more programmable. I've been thinking a lot about abstractions for programming the measurement functionality, to collect the data we want rather than settling for the data we have; these are just a handful of example applications, focusing on security and performance. We've developed data structures that can run efficiently in the peculiar programming model I've been describing. More generally, we've developed telemetry platforms that let us write high-level queries, of the type big-data applications support, over packets as if they were all available at one central location, and yet compile them to run directly, as much as possible, in the data planes of the individual switches. Languages meets hardware, with a compiler in the middle to translate high-level queries a network administrator cares about into actual processing of packets, using compact data structures in the hardware. That's what I still work on now. And I'm really interested in going one step further: not just passive data collection, but affecting the control of the packets as well, integrating not just the measurement and the analysis but the control too. I won't go through this example in detail, but essentially we went back to load-sensitive routing: how do I get traffic to flow the way I want it to? Rather than telling the routers what link weights to use, or having a routing control platform say what forwarding tables to use, can we implement the distributed routing protocol we actually want directly in the switches? Centralized intent, but distributed implementation. In the interest of time I won't go through it, but we did the work to implement this directly in the switches, and now we can do the control loop for this one specific application, load-sensitive routing, completely in the switches, directly in the data plane.

But stepping back, the longer-term goal here is to make this control loop integrated. When I was at AT&T, and even when I was at Michigan, I really thought of these three steps as separate: you measure, you analyze and optimize, and you control.
And we thought about it that way because we weren't allowed to think about it differently: the measurement data was whatever the vendor would let you have, the control was whatever knobs the vendor would let you tweak, and you would analyze or optimize given whatever data you had, to do whatever you were allowed to do. But what you'd really like is to integrate all of this: to state, in a high-level, declarative way, your goals, your objectives, and your constraints (I want to minimize the most congested link in my network while still routing traffic through a firewall, or while blocking a certain set of unwanted traffic) and then synthesize the device-local programs that collectively implement the control achieving that goal. We're not there yet; the slide I rushed over is one tiny example of doing this for a single application. But more generally, I'm really excited about leveraging techniques from programming languages to express these high-level goals, and techniques from the compiler community to synthesize device-local programs that are faithful to the restricted capabilities and resources of high-speed networking hardware. That's a big focus of my work right now.

I'll end by saying that what we've done over this whole body of work, by me and my group and also by others in the community, is keep pushing a boulder up a hill. We started by making the management plane programmable, because that was the only place we could innovate. Eventually the community made the control plane programmable, through open interfaces to the data plane. And now the data plane itself is up for grabs, with newer programmable packet-processing hardware, so the whole stack is programmable. Now we have the hard problem of deciding where functionality actually belongs: putting it not where we can, but where it should go to solve the problems we actually face. We're just now starting to think about that.

So I'll step back and say that, for me, the lessons from this whole journey, from the projects that failed to the ones that were more successful, were these. First, identify use cases: find someone who's struggling, who's in pain. A network administrator is always my person; I always empathize with the network administrator, and frankly nobody likes to talk to them as much as I do, so they're always happy to tell me their stories. Find someone who's struggling, find out why, and figure out what technical problem they're solving in some unnatural way because the technology doesn't work for them. These problems, particularly in networking, are thorny and multifaceted, and they benefit a lot from connections with other areas of computer science and electrical engineering, and in some cases even broader areas. I found that to be a really fun aspect of being in an industrial research lab, and even in academia there's a lot of opportunity for it. And finally, know some history: a lot of ideas come around again. The 1980s work on the phone network and the active-networks work of the late 1990s, which I was dismissive of when I first heard about them, came back around and influenced my own work profoundly, despite my skepticism. And sometimes old ideas, even my PhD work, become new again, because the environment changes and the technologies change.
So it's actually useful to study some history. We're kind of an ahistorical field sometimes; because the field moves so quickly, it's easy to be dismissive of history, but I'm getting old and crotchety enough now to think it's worth studying. The last two lessons are sort of contradictory, but I think they're both true. One is not to fight what you can't change. In the routing control platform work, we assumed we couldn't change the software the routers ran, so we worked within that. In the OpenFlow work, we assumed we couldn't change the hardware. And in the more recent work on P4, we assume we can change the hardware, but we can't give up processing packets at line rate under power constraints. Each of those generations of change was possible because of the successes of the previous one. The other lesson is to fight to enable new kinds of change: building, deploying, and sharing prototype systems, articulating a vision with a larger community of people, and fostering that community so that more people can do this kind of work and push the boulder further up the hill.

I just want to say thank you, particularly here at Michigan, where it's a pleasure to be around some of my former professors, and particularly my advisor. At AT&T, I really enjoyed working in Albert Greenberg's group, which taught me a lot about running operational networks and doing research that connects with people who solve real problems. There are many more details about the specific people I worked with on each of these projects in the bibliography, but I'll stop here; I think there's probably just a minute or two left for questions, and I'm happy to take any.