How many of you guys took some candy from me? Look, you know, somebody pointed this out earlier, thanks for pointing it out, but all of you took candy from a stranger. You all should know better, and now you're taking candy from Ken. So I'm just saying.

So the talk today is about networking in the cloud, an SDN primer. There's a big question here; there's a lot of interest in what SDN is. I find that I spend a lot of my time, most of my time actually, just educating people and talking with them about the details of SDN, the differences between SDN approaches, why SDN, why it's important for the cloud. And there are different answers to this based on which vendor's point of view you subscribe to. I'm going to try my best to stay impartial and just give an explanation as I see it. I do personally believe there's a particular approach that fits infrastructure as a service better than most, and I'll be upfront about that. But as a whole, think of this as me just explaining how I see the market. I'll try my best to leave some time at the end for questions, and I'm sure there will probably be a lot, because I can't cover everything. Outside of that, there are a lot of vendors over here. My company, Midokura, is a vendor. There's Nuage, there's VMware, which bought Nicira, that's a vendor over here. There's Big Switch, there's PLUMgrid, and I'm probably forgetting a few others. There are a lot of other vendors in the space, and everyone has a different approach to the same thing. Even though some of them might be conceptually the same in terms of the end problem they're solving, they all solve it in different ways. So if this appeals to you, I would suggest talking in depth with the people that have these solutions out there and just learning more about them, and you'll find out what the differences between them are.

So let's start with my basic premise, and I think it's the premise that drives most people to the cloud, or to SDN, anyway. The current state of networking in the cloud is too manual. Nobody really wants to deal with having to plug things in and out, having to deal with new devices coming online, manually configuring those devices, having people in the way. So the first question around SDN really is an automation question, and that's something most people are very interested in: how do I automate this network layer in the places where it makes the most sense for us?

Now, this is actually a problem people have had before. A long time ago, switching, particularly telephone switching, was all manual as well. It took a while before that technology went from being manual to being fully accepted by everyone and being automatic, with standardized interfaces in between. Once all the switching gear was standardized and the standards were there, different vendors could play in the space together and interoperate, and we're in a very similar world today.

This gentleman over here might be a bit crazy. His name is Almon Strowger. I'm not exactly sure if he was crazy or not, but I've got to say his behavior was a little bit odd. He was an undertaker, and this was in the late 1800s. This gentleman over here created the first electromechanical telephone switch. You wouldn't expect that from an undertaker, but he had a reason to do so.
He believed that the telephone operators in Topeka, Kansas, where he was working out of, were sending his customers, when they asked for the undertaker, to his competitors rather than to him. And he was really bothered by this. He was like, okay, the wife or daughter of a competitor is working at these switchboards and they're taking my business away. So he started thinking about how to solve this problem. In the process, he ended up moving to Kansas City. When he moved to Kansas City, he had a problem with the switching operators once again. He believed that the people who ran the switches in Kansas City were giving busy signals to customers who were trying to call him. So there are two points over here. One point is, it sounds like this guy was kind of paranoid. But he did very much want to make switching automatic. And his work turned into what became the electromechanical switches that led to the rotary phone. And this is a little bit of what his technology looked like. He hired a couple of electromechanical engineers to work on this and solve the problem. He got patents, sold his company, and spent the rest of his life being an undertaker again. But this is the basis of how electromechanical switching in telecom started, which is kind of interesting.

And we're at the same place with networking today. Everything that we do with networking, or a lot of the things that we do with networking, don't have standardized interfaces between them. Because everything is a little bit more manual in terms of how things operate, there are problems. And to get into those problems a little bit: some of the problems that Almon over here wanted to solve were privacy issues. I don't know if you've ever watched Little House on the Prairie, but there were people like Harriet Oleson who could listen in on the call and figure out who was talking to whom; they could get in the middle of the call, mess it up, or route it to other people if they wanted to. So he wanted to solve a privacy issue, and he wanted to solve an issue having to do with intentional human errors. What he inadvertently solved along the way was unintended human error as well, by making switching automatic. Calls got connected faster. You had lower operational cost, particularly once this was done at scale and people really started adopting it. And the industry actually ended up becoming better because of this. I can't really imagine a world today where we would pick up the phone, dial, and have the call go where we want with physical humans in the way routing these things. It's just not scalable.

The cloud is very much like this. We're coming from a point where in networking we don't normally have to deal with this; well, our environments have been a lot smaller. Most people deal with much smaller environments, and we can deal with the humans that are in there manipulating these devices. But from a large-scale point of view, it's infeasible as we build these very large data centers that are filled with gear, gear that's going to be running OpenStack clouds, of course. It's just infeasible for humans to be involved in every step along the way. But the thing is, the requirements from the end user's point of view are exactly the same. Nothing has changed in terms of what people want.

So what does a traditional network look like? Think of these blue boxes over here as routers.
On a traditional network, there are packets flowing between these routers. Some of these are control packets. Well, what's a control packet? A control packet is responsible for making decisions about where the information gets sent. Examples of control packets are things like ICMP, ARP, and DHCP; there are a lot of other control protocols out there as well. The other type of information on this wire, and this is what's in the data plane, is the information that you're actually forwarding from one box to the other, the traffic you ultimately want to get to its end destination. So networking is comprised of both a control plane and a data plane. And typically, in network devices, these are on the same box.

Well, why do we really worry about what these boxes do? This is a very good question. Computer science is built on several fundamentals, and one really big fundamental of computer science is abstraction. We like to abstract everything, everything in computer science. And the reason we like abstracting is that by building abstraction layers and having standard ways that people can talk to that abstraction layer, you enable people to build on top of it. You build platforms. We started this a long time ago with machine language; you had to talk directly to the box. Then there was assembly that talked on top of that. Then there were other languages where you didn't really have to deal with the lower-level languages. And it kept building on top of that, where today we're talking in Ruby, we're talking in Perl, we're talking in Scala, and all these other languages that abstract everything away. Well, we've done this with every aspect of computing. We've done this with operating systems. If you noticed, maybe 10 or 15 years ago, everything was big monolithic systems. But over time, they started being broken up in terms of capability and abstracted away. And as they got abstracted away, more participants could play. This happened with operating systems. This happened with programming languages. This happened with the move from functional languages to object-oriented programming. So it's a very core concept. And when you end up doing anything at scale, you end up having to abstract all the information away and provide common interfaces on which people can build scalable systems.

So abstraction is needed for the network. And what that really means is that we need to treat the control protocols separately from the data protocols and build systems on top of that. Let the data flow to the boxes where it needs to be, but abstract away the control protocols, and give programmatic access, through things like Quantum and the plugins that come in there, to be able to manipulate the physical network. So this is a very simple example of how things change a little bit in an SDN world. You still have your routers talking to each other using these data formats, but you have a controller which handles the control protocols, manipulates them, determines what's going to happen, and steers all of the network traffic appropriately. So that being said, that's basically the first big idea: you need to separate the control plane from the data plane. You need to abstract everything away.
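To make that split concrete before getting into the categories, here is a minimal sketch of the idea in Python. Nothing here is a real product or protocol; the classes and method names are made up purely to show the shape of the separation: the switches only forward based on tables they did not compute, and the controller owns the decisions and pushes them down.

```python
# Hypothetical sketch of separating the control plane from the data plane.
# Not a real controller or switch API; just the shape of the idea.

class Switch:
    """Data plane: forwards packets by looking up a table it did not compute."""
    def __init__(self, name):
        self.name = name
        self.flow_table = {}  # dst_mac -> out_port, installed by the controller

    def forward(self, packet):
        out_port = self.flow_table.get(packet["dst_mac"])
        if out_port is None:
            return ("punt_to_controller", packet)  # unknown flow goes upstairs
        return ("tx", out_port)


class Controller:
    """Control plane: sees the whole topology and programs every switch."""
    def __init__(self, switches):
        self.switches = switches

    def handle_punt(self, switch, packet, learned_port):
        # Decide once, centrally, then push the decision into the data plane.
        switch.flow_table[packet["dst_mac"]] = learned_port


# First packet to an unknown MAC is punted; the controller installs a flow;
# every later packet is forwarded without touching the controller again.
s1 = Switch("s1")
controller = Controller([s1])
pkt = {"dst_mac": "aa:bb:cc:dd:ee:ff"}
print(s1.forward(pkt))                        # ('punt_to_controller', {...})
controller.handle_punt(s1, pkt, learned_port=3)
print(s1.forward(pkt))                        # ('tx', 3)
```

The point is just that once the decisions live in one programmable place, automation becomes an API call instead of a manual change on every box.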
There are a few categories of SDN, and that's the other big idea: as you accept that separation, you realize that there are different use cases. So what we have over here are three really big use cases. We have the infrastructure-as-a-service cloud, which I think most of the people in this room are familiar with, and that's kind of what they're interested in building. We have the data center fabric. And we also have the carrier WAN use case.

Now, let me start from the bottom and work my way up, because we probably don't need to spend too much time on the other ones. When you're looking at the carrier WAN use case, these are people building solutions that are more traffic-engineering-type solutions. Their goals are better bandwidth utilization, better network utilization, connections between data centers, connections to some of their providers, network resiliency, things of that sort. So they're very much more interested in the carrier WAN approach. Now, Google published something on this. I think last year they gave a talk at the Open Networking Summit, which is going on right now, by the way, about the work they're doing with OpenFlow switches, using these switches to manage their bandwidth. So that's an example of how that happens; there are a lot of different ways to make it happen, but that's one example.

Another example is the fabric approach. This is the idea of controlling physical and virtual switches and making them basically look like one big switch. People like Big Switch, appropriate name, do this. Juniper QFabric does this. NEC ProgrammableFlow does this. That's the category they fit in; they fit more into the data center fabric sort of approach.

And then you really have the overlays, which we think, or at least I believe, fit very well within the infrastructure-as-a-service cloud world. With this, you have guys like Midokura, which is the company I work for. You have guys like Nicira. You have guys like Nuage that are building overlay solutions that solve this particular problem. Our goal is completely different because the requirements around cloud networking are different.

The requirements for building a cloud: of course, you all understand that we need multi-tenancy. Another one of the requirements that we hear often from the people we talk to is that you need L2 isolation. And L2, for those not familiar with the terminology, is switching; so we need switch-level isolation. That's layer two. Layer three is really routing, so we need routing isolation. Within things like an infrastructure-as-a-service cloud, we have concepts like VPC, which rely on L3 isolation. You also have concepts like VRF, virtual routing and forwarding, which is the way you can run multiple independent routing instances on the same physical router. And the concept is that you can programmatically look at this and make it all look like layer three is completely separated. So you need concepts like VRF, or the ability to do something like VRF; that's really the right way to put it. You need a scalable control plane, which means you need to be able to handle things like ARP traffic, ICMP traffic, and DHCP traffic, and do it in a scalable way, where you don't have broadcast storms or ARP storms throughout your entire cloud; that's something that can cripple a physical network, if you will. You need to be able to handle NAT.
The parlance within OpenStack calls that a floating IP. So you need to be able to handle floating IPs, map them to private addresses, that sort of thing. You need to be able to handle ACLs, security groups. You need to be able to handle things like VPN, and while there aren't very many constructs within OpenStack to do something like that right now, I think that's something that will change in the future as the product grows. One of the things that we also think is quite important, from a service provider point of view, is that you end up having to have a BGP gateway, which we think is a very scalable interface to your cloud network. You need a RESTful API, of course; in the world we live in, everything needs to be programmatically accessible, not just for provisioning, but also, through that API, for being monitored and viewed and deleted and changed: moves, adds, changes, things of that sort. And it needs to integrate very completely with the cloud management platform, like OpenStack and CloudStack and these other stacks that are out there.

So really, these are the big requirements that you have within cloud networking. And when you put all of these on a sheet together, you realize very quickly that the solutions that do traffic engineering really don't fit the infrastructure-as-a-service cloud use case. The solutions that do just data center fabric might be able to do the L2 isolation very well, but have no answer for the L3 isolation, no answer for the scalable control plane, no answer for the ACLs, the stateful firewall, some of the other pieces that you need along the way. So as you start looking at these solutions, you've got to figure out, like, how do they make this happen? And it is a difficult thing to make all of that happen.

Also, one of the requirements we think is necessary is carrying concepts that exist in the data center today over into the cloud. One of these concepts is this: if you go into a data center today, what you end up doing is getting a cage. A wire comes into that cage, and you put a router at the top of the cage. Then you have switches in your cage that connect to your hosts. And if you look at these boxes over here, you have that tenant with a tenant virtual router, which is connected to virtual switches, which are connected to hosts. Your virtual router is connected to the provider's virtual router. These concepts are very akin, very similar, to what happens when you get colo space today. And we think this is necessary for you to be able to build a cloud and handle networking concerns. Now, this is particularly important, I believe, if you end up having to deal with moving workloads over: if you have existing workloads on physical gear and you want to move them over into the virtualized cloud environment, and you have particular networking requirements associated with that, this sort of topology, I think, is a very important thing to have.
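To make the programmability requirement a little more tangible, here is roughly what carving out that kind of per-tenant topology looks like against the Quantum v2 API. This is only a sketch, assuming a Grizzly-era python-quantumclient with the L3 extension enabled; the credentials, names, CIDR, and the EXTERNAL_NET_ID placeholder are all made up.

```python
# A sketch only: building a per-tenant topology through the Quantum v2 API,
# assuming python-quantumclient with the L3 extension available.
from quantumclient.v2_0 import client

quantum = client.Client(username="demo", password="secret",
                        tenant_name="demo",
                        auth_url="http://keystone.example.com:5000/v2.0")

# Per-tenant L2 isolation: a private network and subnet for this tenant only.
net = quantum.create_network({"network": {"name": "tenant-net"}})["network"]
subnet = quantum.create_subnet({"subnet": {"network_id": net["id"],
                                           "ip_version": 4,
                                           "cidr": "10.0.0.0/24"}})["subnet"]

# Per-tenant L3 isolation: a tenant router with an interface on that subnet.
router = quantum.create_router({"router": {"name": "tenant-router"}})["router"]
quantum.add_interface_router(router["id"], {"subnet_id": subnet["id"]})

# NAT, the "floating IP": a public address allocated from the external network.
fip = quantum.create_floatingip(
    {"floatingip": {"floating_network_id": "EXTERNAL_NET_ID"}})["floatingip"]
print(net["id"], router["id"], fip["floating_ip_address"])
```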
So our candidate models, the choices we have for building this, are: building it using a traditional network, and I'm going to go through some of the thinking behind that and why it might not work; building it using a centrally controlled, OpenFlow-based, hop-by-hop switching fabric; and building it using an IP-based edge-to-edge overlay.

If you use a traditional network to build cloud networking, well, you'll run into several problems. I'm just going to point out two, though I could probably come up with ten problems you'd have with that. If you use VLANs for isolation, for L2 isolation, you run into an inherent problem: given the bit size of the VLAN identifier, you run into a limit of 4096, so you can have at most 4096 individual tenants. Well, for a lot of people that doesn't matter; they may end up having fewer customers than that. But if you're doing a moderately sized public cloud, it's not an option. It's very easy to hit that number if you're a service provider. Also, VLANs have large spanning trees which terminate on many hosts. Now, think of this: if, let's say, every host on your network is trunked with every one of the VLANs that are there, and you're allowing multicast traffic on your network, every host is going to have to deal with receiving those packets and determining whether a VM on that host should receive each packet or not. This is very problematic in the cloud, and it just burns up CPU cycles on that machine, which is completely unnecessary. Also, in order to do L2 multipath, you need something like MLAG, and a lot of those solutions are vendor-specific; there aren't very generic solutions to this. For L3 isolation, you'll also need to use something like VRFs. Well, VRFs are not scalable to cloud scale, when you want to do something as large as what's within a cloud. They require very expensive hardware, a lot of it proprietary, and from what I've seen, it's not very fault tolerant either. So those are two points, about L2 isolation and L3 isolation. I could go through a lot more, but I think those are two valid points that make it very difficult.

If you want to use an OpenFlow fabric, your issue has more to do with the limitations of the physical switch itself. The virtual network state is very important to keep, and the ARP tables and MAC tables within these devices are very small in general; it's very easy to run out of space on these boxes. And then you just run into problems with switching in general as you try to build large infrastructures using an OpenFlow sort of fabric. These devices are not designed to be provisioned at the high churn rate that the cloud environment requires. There is no atomicity to the updates, and these boxes are not fast enough to update as you would want them to within the cloud environment. So this is a physical problem; it has less to do with the controller itself and more to do with the switches. But it's not good enough, from a speed point of view and even a design point of view, for cloud virtual networking. And also, if you want to do the higher-level things, the things above layer two, the isolation, the firewalling, all of the details you want full control over, it's very hard to do that with an OpenFlow fabric.

So that brings me to the method I believe in, the one I think fits best within the infrastructure-as-a-service cloud. We had those two other models; for this particular model, there are a lot of vendors over here pitching overlays. Big Switch has an overlay. Nuage has an overlay. Nicira has an overlay. Midokura, our company, has an overlay. And the idea is that you're not doing isolation using VLANs. You're encapsulating the packets.
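To put numbers and bytes behind the encapsulation idea, here is a stripped-down sketch following the VXLAN header layout with its 24-bit segment ID. The helper functions are invented for illustration; a real VTEP also builds the outer Ethernet, IP, and UDP headers and deals with MTU and checksums.

```python
# Illustrative only: VXLAN-style framing with a 24-bit segment ID (VNI).
# A real VTEP also adds outer Ethernet/IP/UDP headers and handles MTU, etc.
import struct

print(2 ** 12, 2 ** 24)   # 4096 VLAN IDs versus 16,777,216 overlay segments

def vxlan_encap(vni, inner_frame):
    """Prepend the 8-byte VXLAN header carrying a 24-bit VNI (RFC 7348 layout)."""
    flags = 0x08 << 24                   # "VNI present" flag in the first word
    vni_field = (vni & 0xFFFFFF) << 8    # 24-bit VNI in the second word
    return struct.pack("!II", flags, vni_field) + inner_frame

def vxlan_decap(packet):
    """Recover (vni, inner_frame) at the destination host's edge."""
    flags, vni_field = struct.unpack("!II", packet[:8])
    assert flags >> 24 == 0x08, "VNI-present flag missing"
    return vni_field >> 8, packet[8:]

# Tenant 1234's original frame rides host to host inside the tunnel; the
# underlay only ever routes on the outer host-to-host headers (omitted here).
inner_frame = bytes.fromhex("aabbccddeeff112233445566") + b"...payload..."
wire = vxlan_encap(1234, inner_frame)
assert vxlan_decap(wire) == (1234, inner_frame)
```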
The tenant identifier in that encapsulation header has a very large bit size, which gets you into the millions of tenants. So you can build a very, very large cloud if you want to. And this is completely decoupled from the physical network. All we require, and generally all these overlay solutions require, of the physical network is IP connectivity. Sometimes they might require particular switches that can encap and decap VXLAN and things of that sort; there are a lot of models, but our model is such that all it requires is IP connectivity. And this is a good thing.

So what you're doing is taking the network intelligence, and rather than having it in the core of the physical network, you're pushing it to the edge of the network, onto the hosts. And when you provision a VM, it doesn't change the underlay state. You don't have to physically change anything. You don't have to worry as much about where you're going to put that tenant router and the capacity issues that go with that: if this tenant grows to a particular size, do I need to make a bigger tenant router? How do I do that without interrupting communication for that existing tenant? There are a lot of questions like that left unanswered if you take another approach. The underlay just delivers to the destination host IP. There's a concept within MPLS called a forwarding equivalence class; this is very similar to that. We use the underlay to get traffic to the destination host. We believe that you should use scalable interior routing protocols, like IBGP and OSPF, to build multipath underlays. And this sort of thinking is inspired by research. A lot of people have believed in this concept, and there's a research paper from Microsoft Research called VL2. So if you're interested in this line of thinking and you want to see some scientific documentation on it, and see what some of the leading researchers are doing, or have done, read that paper. It's a very good paper to go through.

Some of the trends that support this sort of solution: we have things like Intel DPDK, which facilitates faster packet processing; you can do faster packet processing on these x86 boxes at the edge. And the number of cores in these servers keeps going up, which is quite good; that supports the case. The knowledge around building Clos networks is getting better and better, and there are more people who understand that that's a very good way to build an underlay for a physical network. So it makes a lot of sense to have a leaf-and-spine architecture using IP. Within the cloud, there are high east-west bandwidth requirements, and using a Clos fabric makes a lot of sense for dealing with that.

Here's another thing, and I think this is a really big one: using merchant silicon also makes a lot of sense. This is where you don't have to use switches from, let's say, Cisco and Juniper. You can end up using switches based on Broadcom or Fulcrum chips, like Arista, and products from Cumulus and Delta and Accton and some of these ODM manufacturers that are out there, and you can build your network underlay with them. I've seen costs at one tenth of what it would cost using equivalent Cisco and Juniper gear. So you can reduce your cost from a CapEx perspective quite a bit by doing something like this.
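To give a feel for that kind of leaf-and-spine underlay, here is a back-of-the-envelope sizing using made-up but typical merchant-silicon port counts; none of these numbers come from the talk.

```python
# Back-of-the-envelope leaf-spine (Clos) sizing with hypothetical port counts:
# leaves with 48 x 10G host ports and 6 x 40G uplinks, spines with 32 x 40G.
leaf_host_ports, host_port_gbps = 48, 10
leaf_uplinks, uplink_gbps = 6, 40
spine_ports = 32

spines = leaf_uplinks          # one uplink from every leaf to every spine
max_leaves = spine_ports       # each spine port terminates one leaf uplink

host_ports = max_leaves * leaf_host_ports
oversubscription = (leaf_host_ports * host_port_gbps) / (leaf_uplinks * uplink_gbps)

print(spines, max_leaves, host_ports)   # 6 spines, 32 leaves, 1536 host ports
print(oversubscription)                 # 2.0 : 1 edge-to-fabric oversubscription
```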
And all of these guys, Broadcom and the ODMs like Quanta and Accton, are beginning to sell directly as well, so that's great. And from a management point of view, and I think Ken can probably talk about this a lot, for everyone who has to deal with the operations of these systems, these switches basically turn into Linux boxes, and you can put other software on there to help manage them. On Arista boxes, you can put Chef clients on there if you want to and do some pretty amazing things. And there's also a move toward optical intra-DC networks, which really supports this concept of overlays.

So here's an example of what an overlay looks like from our point of view. This is a little bit more about our solution in particular, but I think it was the most pertinent diagram I could find; I'm not trying to sell you what we have. Let me make this a little bit more practical. Consider these boxes on the left over here and think of them as the edge gateways, and these boxes on the right over here, think of them as your compute hosts. These are running some hypervisor, hypervisors like Xen and KVM, so some sort of Linux-based hypervisor you're very familiar with. On the left over here is an L3 gateway; it's handling traffic coming in and out of the cloud. On the bottom over here, you have a network state database, and this keeps information about the topology on top, in that blue pane. It keeps track of all the virtual devices, how the virtual devices are all connected to each other, the pre- and post-routing rules, the firewalling rules, the NAT rules, the load balancing rules, all of the details that need to be applied. Everything on top is a virtual construct. It's completely made up. There are no VMs it's running through; there's really no device it's pushed through. It's just a concept, a concept of how these things are all connected together.

When traffic comes in from the left over here and goes into one of these boxes, let's say it goes into that bottom left box over there, and let's say it's destined for the top right box. Well, according to this diagram, it has to go through a provider virtual router, a tenant virtual router, and a virtual switch before it gets all the way to that VM. Well, as you noticed, none of that is physically there. When you're using an underlay, we don't really care what the internal network looks like; those are just switches and routers, whatever you want. The only requirement is that all of these blue boxes on the bottom over here, around the edges, can see each other using IP. So the topology on top is totally made up.

So how do you end up making it look like the traffic has gone through this topology? Well, when a packet comes in, software like ours, as well as other people's software, manipulates the packets, using control protocols like ICMP and ARP and things of that sort, and makes it look like they've actually gone through that topology. We make it look like a packet has walked through a provider virtual router, a tenant virtual router, and a virtual switch, and gets delivered. And then we send it in a tunnel from that bottom left point to the top right point. It gets decapsulated at that edge, and once it gets decapsulated, it gets sent to the particular VM it's addressed to. Now, from the point of view of the end user, the tenant, it looks like it has just gone through that top topology. But from the point of view of the data center operator, it hasn't gone through that physical topology; we've just tricked the system into thinking it has. This is basically how an overlay works, right?
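Here is a toy version of that simulate-the-virtual-topology-at-the-edge idea, just to show the shape of it. The device graph, addresses, and lookups are invented, and a real network state database also carries NAT, ACL, and load-balancing state, with the simulation running in the datapath rather than in Python.

```python
# Toy model of simulating the virtual topology at the ingress edge.
import ipaddress

VIRTUAL_TOPOLOGY = {
    "provider-router": {"10.1.0.0/16": "tenant-a-router"},
    "tenant-a-router": {"10.1.2.0/24": "tenant-a-switch"},
}
MAC_TO_PORT = {"fa:16:3e:00:00:07": "vm-7-port"}               # virtual switch FDB
PORT_LOCATIONS = {"vm-7-port": ("host-42.example.com", 5001)}  # host, tunnel key

def simulate_ingress(dst_ip, dst_mac):
    """Walk the virtual devices a packet 'should' traverse, then return the
    single physical hop that actually carries it: (destination host, tunnel key)."""
    device, addr = "provider-router", ipaddress.ip_address(dst_ip)
    while device.endswith("-router"):
        matches = [(prefix, nh) for prefix, nh in VIRTUAL_TOPOLOGY[device].items()
                   if addr in ipaddress.ip_network(prefix)]
        if not matches:
            raise ValueError("no route on " + device)
        # Longest-prefix match, like a real router's FIB lookup.
        device = max(matches, key=lambda m: ipaddress.ip_network(m[0]).prefixlen)[1]
    return PORT_LOCATIONS[MAC_TO_PORT[dst_mac]]

print(simulate_ingress("10.1.2.7", "fa:16:3e:00:00:07"))
# ('host-42.example.com', 5001): tunnel straight to the destination host
```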
If we're taking traffic, let's say, from the bottom VM all the way to the top VM, you'll notice you have to go through virtual switch B1, tenant B's virtual router, the provider virtual router, tenant A's virtual router, and virtual switch A1 to reach that vport. But in an overlay world, we just take that traffic, manipulate the packets, and send it directly from the bottom right box to the top right box through a tunnel. And once it gets delivered to that top right box, it gets handed to the right place based on the key that's associated with those packets. The end user thinks it went through all of those devices, and it hasn't. Now, if you noticed, we don't have to shuttle traffic through those interim devices anymore, the routers and firewalls and other virtual switches that are there, because that logic is all within our system. And this is basically how we believe networking can scale.

With a product like this, what we've done is taken the intelligence that's within a router and spread it out across the cloud, creating basically a grid router. You have this left side, which handles incoming connections. You have this right side, which basically handles all of the connections for the hypervisors. And you have this part in the middle, which is basically your forwarding state, your FIBs and your RIBs, basically your network information state. So we've blown up what a router is, spread it across the entire cloud, turned it into one big grid router, and provided this overlay sort of solution where traffic doesn't have to pass through boxes. So if you want more capacity coming in and out, you can add boxes horizontally. If you have more VMs and you add a new node, an infrastructure-as-a-service compute node, you just add more boxes. If you need more network state database capacity, you just add more boxes. Everything turns into linear scaling out rather than scaling up, and you don't have service interruptions when this happens. So this is basically a concept, a practical picture, of how overlays work.

And what we believe is that overlays are the right approach. To recap, there are a lot of solutions out there in SDN. There are solutions that do traffic engineering, WAN-scale, carrier-type stuff. There are solutions out there that just do fabrics; that's more for the data center, I believe. And then you have solutions like overlays, and I think the best application for an overlay-based solution is infrastructure as a service; it's probably the best use case I've seen in recent history. And just saying overlays are great is one thing, but you still need a scalable control plane, and that's where every company has something different. So what I would suggest is, if you're interested in this sort of technology, go out there, talk to all the vendors, learn how their software works, learn the differences between them, figure out how their control planes scale, and see if it makes sense for you. So that's all I got. Anybody got any questions?

I'm trying to understand if this is the same as ATM, using VPI and VCI and switching at the FIB layer. And I assume that you still have next-hop routing and the router just needs to know the next hop.

I'm not familiar with that, honestly.
So that's a good question, but it's actually probably something that one of our guys at our booth can handle better than I can. What I can say is that from an underlay perspective, the only thing we physically require is IP connectivity; we don't require anything more than that. Everything else happens logically, in the virtual space. And we have something we call a virtual router network, where these virtual devices are connected to each other, and that's how it works. Our software ends up doing all the calculations and manipulates the packets to make it look like they've gone through those devices, but the software itself isn't a router and a firewall; those are just constructs within our software, and we know the logic of what needs to happen there. Does that make sense?

What kind of challenges do you foresee in terms of integrating with OpenStack Quantum? Particularly since Quantum does not support any IGP protocol.

Yeah, so I think every one of the vendors that's here has some sort of Quantum plugin. I know that the company I work for, Midokura, absolutely does have a Quantum plugin. So we are very much tied in with Quantum, and we offer not only the services that Quantum offers, but also services outside of Quantum within our own API. So if something needs to happen outside of Quantum, because it's not as fully featured as we'd like it to be, you can talk directly to us. We're spending a lot of time pushing our API suggestions within the community, particularly operator-side suggestions, to make Quantum more full-featured and get a richer API set, and we want Quantum to be the layer that interacts with us. But in the meantime, you can talk to us as well as to Quantum to make that happen.

So, one of the things you mentioned to overcome the deficiencies of the physical network is to make the layer two networks a lot smaller. So we're talking about the underlay network being layer three, and the cost of achieving layer three at line rate versus achieving layer two at line rate in the data center. That makes the underlay network a lot more expensive, right?

Maybe I misspoke a little bit, or maybe I didn't make it clear enough. But basically, we think the underlay should be a Clos fabric built with merchant silicon switches that can do IP switching, and if they need to be connected to routers, they can do that as well. A leaf-spine approach, so you end up handling east-west traffic economically. So there still is a layer two area there, right? It's very relevant for the cloud, actually. Particularly, yeah, I mean, Clos is very old; I think it goes back to 1950-something. So it's not new, but we're repurposing these things to make them work within the cloud.

Can you start small? What if you have a large enterprise network? Can you get one department going on this, or does it have to be a rip-and-replace for the entire enterprise?

I would say that, yes, you can absolutely start small. There's no real requirement that this has to span thousands of nodes; you can make it as small as you want. If you have a small enterprise doing it, or a small department that wants to do this, generally what we see is people using greenfield gear, where they're buying new gear to do so rather than reusing old gear. So I think as long as the hardware is sized properly, you can absolutely do that. So, anybody else? Hello.
So the control plane signaling, is it encrypted or clear text?

I think that is a vendor-specific question, so it really depends on how each vendor handles it. I know in our solution right now, the tunneling is done through GRE, so there is no encryption on top of it, but we're adding encryption options as well, so I think that will change over time. Other vendors might have other answers to that, but I can't speak for them. So, anybody else? All right, thanks for your time, guys. Really appreciate it.