First of all, thank you all for being here. My name is Kyle Forster. I'm one of the founders of Big Switch. I'm here with a couple of my colleagues in the audience. We have Sunit with me, Doug and Kanzia in the back. There are about 10 of us from Big Switch here in and around the conference. Being part of these types of sessions is, for me, one of the most fun parts of my job.

At Big Switch, we talk a lot about what modern networking is going to look like. We have this belief that the networking community is in the middle of a renaissance, and that the modern networking landscape that comes out the other side is dramatically better than what we have today. I love OpenStack because, to me, this is the nexus of a whole series of different components that make up what we think of as the modern networking vision. And while the software components are really important, and we're even going to talk about trends on the hardware side, the thing that's most important to me is the community around it. It's the expectations, the tools, the skill sets of the people in and around the OpenStack and networking communities that are really going to drive this forward, that are going to take the landscape of what we know as networking today and bring it to the next level.

So this is going to be a hands-on session, and we'll show you a sign-up sheet for VMs in a sec. But I would invite you to think about the role that you want to play. This era of modern networking is very, very, very early. There's a unique opportunity in this charmed period to put your fingerprints on it. So what's the role that you want to play in this emerging landscape? It's going to be played out by the questions that you ask your colleagues. I certainly know the questions that I ask of my team. What questions do you ask your team? What questions do you ask your peers at other organizations? What questions do you ask of your vendors? Because everybody in this room can play a very active role in shaping the emerging networking landscape. So it's an exciting time, and I urge you to think through the role that you want to play. Obviously, if you want to talk about it, come find me. I'm around any time.

So today, a bit of the agenda. First, we're going to show some slides on the context, on application modeling. Because one thing I've really noticed through a series of discussions at this conference, Hong Kong, Portland, was that a lot of the people who are actively going down the Neutron path are doing it without asking, hey, why am I choosing Nova versus Neutron? So let's just do a little bit of grounding, and I'll give you some perspective from somebody who spends a lot of time with both Nova and Neutron. Our company does both. It'll give you a bit of a sense for, at least in my mind, why people choose one versus the other. So we'll set some context there. While we're doing that, I'll give links for a sign-up sheet. Our provisioning server, I have to apologize, kind of pooped out on us, so we have about 20 seats' worth of VMs, and I'd ask that only about 20 people sign up. It's actually fairly easy to look on with your neighbor if you don't wind up with one of the sign-ups. My apologies, again, for this. But after the show, I'll give a link where you can spin up your own VMs anytime. We'll talk a little bit about the Big Switch P+V fabric.
First of all, I'll talk about what a P+V fabric is, and then we'll talk about the moving parts of P+V and the Big Switch solution. We'll do a hands-on session first that's really for people coming more from a networking background into OpenStack: a hands-on session with something that looks and feels like a traditional networking CLI. But I think you'll start to see some of the mechanisms underneath that bridge us from a traditional networking CLI over into the more modern era. The second hands-on we'll do is actually using the OpenStack UI, and in parallel we'll look at what's actually happening on the controller, so that you go from a grounding of, hey, here's the hands-on CLI, to, OK, if I do everything in the OpenStack UI, here's what's going on under the covers. We'll touch on a few advanced topics that, as we were rehearsing this, we realized we just don't have time to cover in this session, but which we could set up in one-on-ones later; here are some of the other things you could do, either on your own or in a follow-up phone call, with the demo VMs. And then last, I'll talk about where you can get more information.

Yeah, a bunch of people here already get the joke. For the last 20 years, what's changed in network provisioning? Absolutely nothing. In 1993, it was Telnet. And 20 years later, it's the same old shit. The reason for that, certainly across the vendor landscape, and I actually believe across the user landscape in networking, is that the focus for 20 years has been speeds and feeds. The best product in networking was the fastest product, and the fact that these things became increasingly antiquated in the way they were used wasn't really an issue. But we think that we're on the cusp of a major change. Look across the hot areas of networking right now: SDN controllers, multi-path fabrics, Linux shells, Neutron. These are all technologies pushing the competitive frontier that really have more to do with what I call agility: provisioning speed, takedown speed, how quickly the network can react to other things that are going on in the IT environment. So it's a really exciting time. A lot of us at Big Switch really believe there's been more innovation in networking in the last two years than there's been in the last two decades. So it's an exciting time, I think, for everybody in and around this trend.

The design inspiration, and I believe the company that's the design inspiration for many of us on the leading frontier in networking, is Google. I'm guessing that most of the people in this room have some level of Google envy. We would love to run our data centers with a lot of the same technologies that they do. So let's talk about it. They're famous for having six different SDN projects very, very deep in production. The G-Scale WAN and the B4 WAN have gotten several write-ups. And one of the more recent ones to be made public, Andromeda, is a design that's very much a P+V design: a single controller that's talking to the V-switch as well as a physical fabric, as well as the storage network, and all the way out to the routers that serve as the gateways. It's a unified P+V model. This is where we get a lot of our design inspiration.

Just a little bit of grounding on SDN. We started Big Switch back in early 2010.
The entire SDN community could easily have fit in this room at that time, and there'd probably be plenty of room to spare. The way I look at it, we've gone through a couple of chapters. The first chapter was SDN for programming platforms. That was a very interesting set of technologies, not particularly commercially applicable, but it was a really interesting place to start. The second wave was SDN for overlays, primarily because for those of us out there building controllers, the only thing we could control was a V-switch. Physical switches were locked away, for as much commercial as technical reasons, in parts of code that at the time we couldn't touch. So SDN became largely synonymous with this idea of doing a V-switch overlay. Now we're here. This year, you're going to see the first two really, really major P+V products hit the market over the late summer. And what you're seeing is the emergence of what I think is the third chapter of SDN: just like you saw in the Google example, controllers that control both V-switches as well as physical leaf-spines. And it's this unification of the V-switch and the physical network that's going to unlock a whole series of interesting areas. You're going to see that in the hands-on demos here in about 20 minutes.

So, the promised hands-on. I'd ask everybody: there is a sign-up sheet on Google Docs up at this URL. If you wouldn't mind, if you're interested in doing the hands-on component, head to the sign-up sheet. You'll see the first column is empty. I ask that you put your name or your email address, some kind of unique ID, to reserve a row. In that row, you'll find a couple of different links, and you'll find usernames and passwords. We'll use these for the hands-on session. If you don't have a laptop with you, or if you don't get one of the roughly 20 seats we were able to set up, I just ask that you look on with your neighbor. We'll also go through it on the slides, so you'll see all the CLI and all of the GUI steps come up.

That is definitely not what I would expect. Were you able to get into the Google spreadsheet? That's the most important part. Yeah, the first link is for the sign-up sheet, and then the Google Hangout is for extra help; we have a couple of people on the Hangout. No, no, no, the hourly link. You see me? Would you mind just checking the Google Hangout link? And if there's a new one, we can put it up in a minute. OK, perfect. I'll give everybody a minute or so to get through the sign-up sheet. So it looks like number 11 is still empty, number 13. That's right. OK, it was there, and then somebody... is it whoever wrote it first or whoever wrote it last that wins? Whoever's first. Please do not overwrite somebody else's entry. They advertise 50 concurrent users, so I think we're putting them to the test. So I'd ask, everybody who actually signed up, would you mind just raising your hand? We'll see if there's a reasonable distribution around the room. Everybody who doesn't have a sign-up, would you mind just looking on with somebody who raised their hand? It's an opportunity to meet your neighbor, which would be kind of nice. So has everybody had the chance to meet their neighbor? Looks like everybody has. Let me go through a handful of slides on context before we actually dive in. Of course, every time we've done this, there are some people who just want to start on the hands-on right away. Feel free. You won't break anything.
I've tried to break stuff. I'll also have a link so that afterwards, anybody who didn't get to do one themselves can follow the link, sign up, and do it separately. So let's look at some context. I promised a little bit of discussion around Nova versus Neutron and why we see folks tend to choose one versus the other.

Just a little bit of grounding: my oversimplified view of how you model an application. We see folks with both OpenStack and traditional data centers, and there are kind of three models. The first model I just call the mosh pit model. We're not going to have any isolation between any of the tiers in any of our applications. We're going to throw it all into one big subnet, one big VLAN, no segmentation at all, and we'll see what happens next. So let's call that model one. Model two is the very tidy model that says, for any application, regardless of the number of tiers, if it's an administrative domain, we'll give it some isolated segment. In my mind, the neat and tidy model looks great if you're building infrastructure. If you're actually on the application side and using it, it tends to be a little difficult. The third model is kind of messy. It says, hey, I have some segmentation model. I try to segment my apps as best I can. I segment tiers within those apps. But realistically, I have a lot of exception cases and a lot of complex apps that I have to deal with. For lack of a better term, I just call this one enterprise, because it shows up every time.

As for the reasons for segmentation, I think everybody immediately flies the security flag, and certainly there's a threat mitigation aspect to segmentation. But there's also simple fault isolation. Somebody accidentally turns on Windows Server load balancing and it sprays broadcast packets all over the place. Well, you have two options: either that's isolated to a small number of devices, or your lights go dim. Some machine decides to core dump its way out of eth0; I've actually seen that happen a couple of times. What happens next? So there's a really good fault isolation reason to do this. Very often it's troubleshootability: hey, if something goes wrong, let's at least limit the domain of places we have to look. Sometimes it's compliance. Very frequently it's an implicit, sometimes explicit, sort of social contract between infrastructure and the applications running on top of it. Let's just leave it at this: there are a lot of really good reasons to segment an application.

So think of the tools that you've got. Stateful firewalls: very, very big and heavyweight, but kind of guaranteed security, though often at least one organizational hop away. Subnets, VLANs, routes, and ACLs: slightly less heavyweight in terms of time to put them in, very effective in terms of blocking everything, but sometimes a little bit kludgy to use. Security groups: in my mind one of the innovations of EC2 and OpenStack (the idea has a long history before then, but they've popularized it); phenomenally easy to use, but they actually have quite a few significant security limitations, and they're not perfect isolation. And last, host iptables: again, phenomenally flexible and wonderful and easy to use for app teams, but guaranteeing any kind of consistency across the organization is kind of a fool's errand with this approach.
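Just to make that last option concrete, here's a minimal sketch of the kind of per-host rule an app team might drop onto a database VM; the addresses and port are made up for illustration. The point is that each host carries its own rules, and nothing forces every host in the tier to stay consistent.

    # Allow the app tier's subnet to reach MySQL on this host, drop everyone else
    iptables -A INPUT -p tcp --dport 3306 -s 10.0.2.0/24 -j ACCEPT
    iptables -A INPUT -p tcp --dport 3306 -j DROP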
And the constraints: I mean, you can read the slide faster than I can talk, but I'm sure a number of people here live in these operating environments. There are a lot of constraints on your selection. You can't choose every tool every time. So you kind of have to work with the medium that you've got.

If you look at what I think of as the common-case Nova design, easily upgradeable to Neutron (and very often, even in Neutron deployments, this is the design that 90% of the tenants follow), you wind up with one VLAN and subnet per project, and then, to isolate each of the tiers, you wind up using security groups. It's clean. It's simple. It maps very, very nicely to Nova. You have basically one non-routable VLAN and subnet that represents your tenant. Any tenant VM that actually needs external access requests a floating IP. Those floating IPs all sit in a public subnet and VLAN, which is protected very specially. And then your security groups sit inside.

Some of the downsides: within that tenant, because of security groups, you get very nice L3 fault isolation. I think a lot of people use security groups and say, hey, I can't ping, I can't curl between two machines, therefore they're totally isolated. Well, if you come from more of a networking background, you recognize the importance of L2 isolation, broadcast isolation, and that's something security groups don't typically do, other than in some custom fringe cases. So this is still very susceptible to an ARP storm, to ARP spoofing. If you actually do have a threat or a fault in here, the chance that it can spread across your security groups to the entire VLAN by flooding out ARPs, which are absolutely required (you're never going to get away from broadcast 100%), is actually quite high. So you get a lot in ease of use here, you get a lot in ease of design, but you don't get perfect isolation. It's a trade-off worth thinking about. At Big Switch we support this design in our products on both Nova and Neutron quite a bit, because it's kind of the ease-of-use version.

If you look at what's required from the underlying infrastructure, most typically this is on Nova; every production Nova environment we've seen so far uses VLAN manager. Your leaf switches, your routers, and your spines are statically configured. You configure all VLANs everywhere, you configure them once, and they're done. It's the V-switch that winds up doing most of the work in terms of VLAN selection per tenant (that's a fairly easy INI config), and it winds up doing the NAT function in order to get your floating IP. So the host V-switch and the host iptables wind up doing all of the hard and dynamic work. Now, the downside that some people have with this particular design: traditional switch vendors never really designed for this idea of putting all VLANs on every edge port. So the vast majority of traditional switch implementations cap out here around 600 to 800 tenants. There's a soft limit on switch CPUs: as soon as you put 600, 800 VLANs on every edge port, you start to hit cases where you can spike the CPU very high. There are a very small number of implementations, and I think we're lucky to be one of them, where you can actually put all 4096 VLANs on every single edge port with no performance impact. We built our implementation for clouds, so we're able to do that. But the number of places where you can actually get up to 4,000 tenants per pod instead of 600 to 800 is small. It's worth thinking about and worth asking about: ask peers, ask vendors, et cetera.
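Just to make that common-case Nova design concrete, here's a rough sketch of the per-tenant workflow using the nova client of that era. The group names, image, and addresses are illustrative, not taken from the demo environment.

    # Per-tier security groups inside the tenant's single VLAN/subnet
    nova secgroup-create web "web tier"
    nova secgroup-create app "app tier"
    nova secgroup-add-rule web tcp 80 80 0.0.0.0/0        # web reachable from anywhere
    nova secgroup-add-group-rule app web tcp 8080 8080    # only the web group may reach app

    # Boot a web VM and give it external access via a floating IP (NAT done on the host V-switch)
    nova boot --image cirros --flavor m1.small --security-groups web web01
    nova floating-ip-create
    nova add-floating-ip web01 203.0.113.10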
So that was Nova. Now, let's talk about Neutron. The core reasoning, in my view, behind Neutron is to say, hey, we want to do tier isolation, and we want to do it at the VLAN and subnet level. So in case of a threat or in case of a fault, if somebody starts spewing out all kinds of broadcast traffic, whether it's ARPs, which you're going to need no matter what, or some fancier form of broadcast traffic, the number of places it can go is fundamentally limited. Also, the connectivity graph that you need gets increasingly complicated. With security groups, it's hard to keep very complicated connectivity graphs straight, and enterprise-wide consistency is difficult. So, at least to me, when we see people make the Nova versus Neutron decision, the informed Neutron decision typically is: hey, I'm going to take my connectivity graph and implement it as a logical router with a set of routes, and I'm going to take every tier I've got and make it a logically isolated domain. This has some really nice advantages, because you're basically taking the connectivity information and saying, all my connectivity information is here, and that's a little bit separate from my tiering. It also maps really nicely onto existing enterprise practices, where you're used to using routes and VLANs and subnets for isolation anyway, and you might have a set of practices around security or compliance or reviews that map very well to this model. So it's taking the existing enterprise model, routes, VLANs, and subnets for isolation, making it logical instead of physical, and mapping that into OpenStack. In this model, every tier gets its own VLAN and subnet, so a single project could have many VLANs and subnets. Each project gets a router, and the project admin gets to decide, hey, here are the routes I'm going to put in, or here are the ACLs I'm going to put on my routing interfaces. It's much higher effort to get going, but over time this maps very cleanly to enterprise practices. One thing we see fairly often in Neutron builds is that 90% of the tenants will be in the simpler model that we showed a couple of slides ago, and then 10% of the tenants, when things get hairy, flip over to this model. After we've seen a cloud up and running for a year, that's a very, very common case.
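Just to make that per-tier model concrete, here's a minimal sketch of it using the standard neutron CLI of that era; the network names and prefixes are just examples.

    neutron net-create web
    neutron subnet-create --name web-subnet web 10.0.1.0/24
    neutron net-create app
    neutron subnet-create --name app-subnet app 10.0.2.0/24

    # One logical router per project, with an interface on each isolated tier
    neutron router-create red-router
    neutron router-interface-add red-router web-subnet
    neutron router-interface-add red-router app-subnet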
This is especially for the folks who come from networking backgrounds: as soon as you say router, somebody says, okay, which box is the router? Let me be very clear, by router I mean a logical router. This is a software function that's running somewhere, and exactly where this router sits depends very much on your Neutron plugin; that's one of the parts that makes Neutron so complicated, and I think why there's such a hard learning curve here. There are kind of three big categories. First, the newer ML2 plugins: you specify a very specific compute node, or a set of compute nodes, that actually run the L3 agent. So there you can say, yeah, that machine, that's the one running the router. Of course, if that machine goes down, then there went your router, right? But I think you get it.

The second category, overlay/underlay systems, is kind of an improvement on that single-point-of-failure problem. The router is actually a distributed software function, so wherever routing needs to happen, it happens in the V-switch underneath the overlay, or it happens in the overlay gateway if you're coming in from the router. So overlay/underlays distribute the routing function. And then unified P+V fabrics, which are kind of the new generation coming now, take the same approach: a distributed routing function. There's no one box where you can say, oh, that's the router. The routing happens as soon as the packet enters the V-switch or the physical fabric. So in the first category you can point and say, hey, there's the box that has the router. The other two, for both resiliency and what I think is engineering elegance, just take a little bit more time to stare at; the routing is actually a distributed function.

In the ML2 case, let's look quickly at the moving parts. You have a Neutron server with the ML2 plugin and the vendor-specific mechanism driver that covers the physical side. You have spine switch routers and leaf switch routers where you're dynamically provisioning and pruning VLANs, or in some cases VXLAN segments. You have an L3 agent, a specific box running the L3 agent. Sometimes you have a separate box with HAProxy and Pacemaker in order to make the parts of the L3 agent that need to be HA actually somewhat HA. And then you have the V-switch. So those are the moving parts worth thinking about in an ML2 deployment. In an overlay/underlay deployment, you wind up with a Neutron server, spine switch routers, and leaf switch routers. You no longer have the L3 agent; all of that work is done in the V-switch as it starts to tunnel. One thing I couldn't quite fit into the picture: you also have an overlay/underlay gateway sitting next to the router. It's either a dedicated server, a set of dedicated servers, or sometimes a very specific hardware appliance. So if you look at the overlay/underlay case, those are the moving parts worth knowing about, including the overlay gateway. If you look at the P+V case, you have a couple fewer moving parts. You have a P+V controller that's talking to your leafs and spines, so your leafs and spines are going to be dynamic. You no longer have the L3 agent, but you still have the V-switch. So in practice, let me just flash back real quick: you go from the ML2 style, to a few extra moving parts for overlay/underlay, and then fewer moving parts as you go to P+V. And you'll see that when we get into the hands-on in a sec.

So is everybody kind of... yeah, of course, go ahead. With the overlay on top of the underlay switches, like the top of rack and the spine, if you have an overlay above it, you're hiding all the VLANs that happen at the tenant level, right? Typically we see VLANs used for transport purposes, so it'll be L3 down to the top of rack. It's hidden from the tenant side, but on the provider side you wind up actually deploying them. So it's not like you have one big VLAN that's then chopped up by overlays. You wind up having a series of VLANs, generally per rack, and that's very much a physical layout function. And then you have the tenant function chopped up in the overlays. Okay, so all the tenants are in the overlays? Exactly.
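Coming back to the ML2 case for a second: if you want to see which node is actually "the router" for a given project, the Neutron CLI can tell you which host is running the L3 agent hosting it. The router name here is just an example.

    neutron agent-list                                  # look for the L3 agent rows and their hosts
    neutron l3-agent-list-hosting-router red-router     # the node hosting this logical router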
So I'm just going to do a quick intro on the moving parts on the Big Switch side, and then we'll go straight into the hands-on. Our approach as a startup, in order to get to P+V, was to pick a very specific set of hardware to work with. We call them the Google switches. Maybe you've heard that Google builds their own switches. Well, what they're really doing is buying switches from Chinese and Taiwanese ODMs and building their own software stack on top. And with our clients, we help them do the exact same thing. The other case is the partnership we just announced with Dell, where they're going to be shipping what we call open switches. These are Dell switches, but they ship as hardware only, and you get to choose which software stack to run on top. So in our particular approach to P+V, as contrasted with other folks looking at P+V, we use these kind of Google-style switches: you buy the hardware without the software, and we help people do that.

On the software side, we provide three moving parts. We provide the SDN controller, which implements the Neutron plugin. We provide the switch OS that runs on these bare-metal and open switches. And then we provide the V-switch. One thing we're very, very proud of, we call it Switch Light: it's actually the exact same user-space code running either as a switch OS or as a V-switch. So it's the exact same operational experience. We're taking a V-switch and making it really, truly feel and act like a networking component. So in a typical deployment, we'd have the Big Switch controller, either a pair of VMs or a pair of physical appliances. We'd have the Neutron server. We'd have Switch Light OS running on the spine switches, typically 32-port 40-gig switches. We'd have Switch Light OS running on the leaf switches. And we'd have either the Switch Light V-switch, or we also use Open vSwitch. You'll see this when we do show switch here in a moment on the CLI.

Just a very quick shameless plug; I have to do it. I think that we're good for Neutron. By default, we don't use spanning tree; it's always multi-path, so you get nice full Clos scaling properties. You can also provision every VLAN on every edge port with absolutely no performance penalty. So I think we're good for Neutron: we have the Clos fabric scaling, we have the physical leaf-spine plus V-switch unification, we have no gateways on that side, and we have a couple of bells and whistles that we're not quite ready to show to a broad audience, but would be happy to show in one-on-ones.

Everybody ready for hands-on? So first, let me see a show of hands: has anybody who's got IP addresses, usernames, and passwords been unable to bring up any of the links or get in, any username or password trouble? Two, good. So you'll see there are two sets: there's a link and then two sets of usernames and passwords for part one. And that's where we're at. Why doesn't everybody bring that up? You should see a button in the upper right that says start demo. This will kick off a bunch of scripts. Have you clicked start demo? Perfect, it's running. You should see a topology view pop up. Now, just to make sure we're all on the same page, does everybody see something that begins to look like a topology? If you don't, you're probably on the first tab, which is a set of instructions.
There's a tab to the right that shows the topology view. So even though you two are actually wearing Cisco name tags, I'm looking at you to say, hey, are you there? It happens. You look down. Do you mind doing a quick walk around and giving me kind of the thumbs up, thumbs down, or should we move on to the next step? What do you think? You've got the CLI? Yes.

Of course, you can have it on any switch that supports our Switch Light OS. We have a hardware compatibility list up on our website. I think you'll find, with everybody building independent switch OSes, that we're all gravitating to a very similar HCL. Frankly, the reason for that is the HCL is almost exactly what winds up getting ordered by the hyperscale vendors. There's a specific set of models that the hyperscale vendors order, and those wind up driving down the costs, so most independent switch OS folks wind up there. Yeah, here it's a software simulator, but if you're really interested, one-on-one, I don't think we'll have time now, we could VPN into the lab and show you a rig running on hardware switches; you'll see from the CLI perspective it's the exact same.

Has everybody found the CLI? Let me ask it differently: could you raise your hand if you haven't found the CLI? You found it? Perfect. So if you're familiar with traditional networking, the CLI should feel very familiar. Tab complete, question mark: these are some of the basic traits to get yourself oriented. A tenant is an OpenStack project. Ah, that's okay. The term is really overloaded, I know. The tricky part, too, is there are corner cases where you can have many OpenStack projects all mapped to one tenant, but treat that as a corner case. The simple case is that OpenStack project to tenant is one-to-one. Yeah, then you need to keep it very clear: hey, here's the networking tenant and here's the database tenant. I know the term's overloaded. Oh, that's okay.

I'd ask you to try a few quick commands: show switch, show host, show link. These let you see, from the CLI, what the topology looks like. It'll show all the interfaces on the hosts. Say again? It's actually a mix. For show link, it's a combination of LLDP and OpenFlow messages. For show host, it's also a combination. For show switch, it's entirely the OpenFlow handshake going up and down. For show link, there are actually three parts. There's LLDP going across the wire. We also send another packet type across the wire to validate link up, link down. And then we use OpenFlow signaling, so the switch OS can itself signal link up, link down. It's really a belt-and-suspenders approach, because each mechanism has different pros and cons, so we interpolate information from all three. A lot of trial and error went into making sure that was actually the most reliable.

So I'm going to speed up here through a couple of slides, and I'm going to assume... hey, there are about seven or eight different steps here. They don't really build on each other. This is basic exploration, so don't worry if you feel like you're falling behind; there'll be points for everybody to catch up. If you're familiar with configuring networking gear, show run is part of your toolkit. So we have a couple of different kinds of show run.
There's show run, which will show you the entire config. There's show run for the controller node itself, which will tell you the health of the underlying infrastructure. And then there's show run for a specific tenant at a time, if you want to see that specific tenant's L2 and L3 config.

I'd ask that everybody try debug rest, and then try a couple of commands. You can do debug rest, you can do debug rest detail. Our CLI is actually a very small Python script running on top of our REST API, so what this does is echo out all of the REST calls going back and forth underneath. If you turn on debug rest and then try show host again, try show link, try show switch, you start to see what it would look like if you were doing this without the CLI and instead implementing against the REST API directly. We've actually had quite a bit of success here, because what you wind up finding is somebody from a DevOps background sitting right next to somebody from a deep networking background, and you wind up with one person saying, hey, here's how I troubleshoot a typical network problem, sitting right next to somebody looking at the REST calls going back and forth saying, well, here's how I could script it. It produces a lot of output, so to turn it off, just do no debug rest. If you want to get into the underlying Linux shell, this is one of those things that a small number of people do but everybody really wants: debug bash gets you into an isolated bash shell, so you could load your own Linux software onto the controller if you want. A little bit of syntactic sugar: the controller CLI also respects some of the basic Linux commands, grep, cat, output redirection, and that just turns out to be one of those really useful things.

That's generally not an issue. It means that the VM itself has timed out, so the controller is saying, hey, I'm no longer sure exactly what IP address this is. As soon as the first packet comes out, it'll relearn the IP address to MAC address mapping. Yeah, right now none of these VMs are passing packets, so as soon as one ARPs and you get an ARP reply, things will start populating. It could be any packet that crosses the wire; it'll relearn. For the purpose of this demo, you can go back into the topology view in our simulator and actually log into the VMs underneath, so you can log into a VM and start pinging around from it. And to your question, if you start pinging from a VM, you'll see it populate the MAC address to IP address bindings. If you do a ping and then go back to the controller and do show switch all flow, you'll see all of the flow table entries that just got populated by that ping. You'll see them show up, and then you'll see them age out. So start pings from different VMs and you'll see different combinations of flows going down to different switches in the network. If you want all the gory details, it's incredibly ugly to look at on the CLI, but it can be done.
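By the way, everything debug rest echoes can be replayed straight from a script. Here's the general shape of it; the URL path is deliberately a placeholder, since the exact endpoints are whatever debug rest shows on your controller.

    # Re-run one of the queries the CLI just made, outside the CLI
    curl -sk https://<controller-ip>/<path-echoed-by-debug-rest> | python -m json.tool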
So now we're going to go through a couple of steps that build on top of each other. I'd ask that you either follow along or sit back and just watch it go through. We're going to start with this topology: the idea of a red tenant with both a web tier and an app tier.

So here's the next step, to implement this idea of a red web tier and a red app tier. First of all, we create a tenant. Then we create what we call a BVS; that's our sort of virtual network segment. All right, we're going to define a BVS, and we're going to give it an interface rule. Think of this as almost a search term. We're going to say, hey, anything from 10.0.1.1, put it into this particular isolated BVS. So this will grab one of the VMs. We create a second interface rule that grabs a second VM. We'll exit out of that. In the app tier, we'll grab a third VM. There's a copy of this on the left tab, I forget what slide number it is, if you really want to copy and paste it. Say again? Oh, enable, then conf.

You're configuring two separate segments that are L2-isolated, and then we'll configure a router with router rules to route between them. What you'll see is that on the OpenStack side, if you're using the OpenStack UI, you create an OpenStack network, you create routing interfaces, you create routes, and that automatically populates all the stuff underneath. So we're effectively showing the guts of what happens when you drive it from the OpenStack UI; there, this will all happen for you. No. So the setup by default, I should be specific, the setup we use by default for the tutorial is that all VMs can ping all VMs and all VMs have L2 access to all VMs. So we're going to take any-to-any and then split it up and build in separation. Exactly. If you want to really dig into the details, one of the cool things here is that we're actually building separation after the fact. So instead of setting up the segmentation a priori, you can set it up post facto. Which is useful... until you set up routing rules, this isolates everything; we haven't set up any routes between them. Exactly. Yeah, if there are no packets passing, you won't see anything in show flows. So you can ping, and you should be able to see that the pings cannot connect. You can ping within the same isolated segment, exactly. Which the... ah, shoot. That's a pure shell-in-a-box terminal mapping problem. For the next demo, yes. For this current demo, we just did shell-in-a-box access rather than giving these VMs external-facing IPs.

So here is the set of commands. This is actually creating a logical router within the fabric. You'll see there are two ways: we're doing it now through the Big Switch CLI, and then we'll do it through the OpenStack UI, where it's, hey, click, click, and all of this happens underneath. But I figured we'd start with the guts and work our way up. If you ever need to troubleshoot it, it's sometimes nice to know. Get into enable, conf, tenant red. So in the interest of time, I'm going to zip through the next few. Within that router, if you're accustomed to creating and managing networks, you create router interfaces: one router interface on each of your isolated segments, and then you create routes between them. So if you're coming more from the networking side of the house and a little bit less from the OpenStack side, this is very much how you would think about creating a high level of isolation across a complex app. And this is what we call a logical router, just a logical router. The logical routing function happens wherever a packet enters the fabric. So if the packet entered from a VM, the route will happen on the first hop.
If the packet entered from a firewall or from a load balancer or from a router, it will also route on its first hop. So you don't need to worry about where the router is. The logical router is actually a distributed routing function. So unlike ML2 plugins, where you say, hey, there's the L3 agent, it's running on that particular box, here the route, the TTL decrement and MAC address swap, happens wherever the packet takes its first hop into the fabric. The nice thing is you don't need to worry about where it is, and there's no single point of failure.

So here's the last thing I'd ask. If you've gotten the routes set up, try the command: test packet-in source-host vm1 ether-type ip dest-host vm3. The output this gives you actually traces the logical path. It'll say, here's the logical segment where a packet going between the source and dest would start. Here are the logical routers it's going to hit in the path; in this case it's just one, but you can have many. Here are the routes within those routers that it hit. It'll give you the lines of the ACLs it would have hit. And here's the logical segment where the packet would be delivered. It then gives all the physical information: here's the V-switch where we think this packet would enter, here's the specific ToR it would hit, here's the spine, here's the leaf, and then here's the next V-switch. So it gives you both the logical path and the physical path. This entire thing is simulated, so it's doing that without actually sending packets out over the wire. Building on top of this REST API call is actually one of the super sexy things that folks are building with today.

And is that an OpenStack construct? This is an SDN controller construct. So we gave it IP 10.0.1.254. Yeah, each interface it has in an isolated segment has to have its own IP address. Exactly, that's your VM's default gateway. Which switch and port does that get programmed on? We're using OpenFlow to populate the forwarding tables themselves. We use OpenFlow 1.3, so it winds up being a melange of the L2 table, the L3 table, the VLAN ingress table, and the VRF table. The way to think about the logical router is really as a human interface function; the data structure underneath compiles the routes and translates them out into the appropriate switch forwarding tables.

Our assumption is that you have one or more routers, which you can group into interesting ECMP groups, connected to the fabric. We don't go through it here, but we have this concept of the external tenant, which wraps your routers to the outside world. So if red wants to talk to the outside world, you set routes to say, okay, I want red to be able to route to the external tenant. The external tenant owns your gateway routers; it typically owns your physical firewalls; it'll own your load balancers. So we wind up seeing this construct quite a bit. Exactly. Rather than creating an entire special case there, it's more elegant: we just see those router interfaces as the equivalent of MAC addresses in a specific tenant. I know there are still a couple of people working through this, so let me beg forgiveness, but I'm going to keep moving on through a few more slides.
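One aside on that test packet-in command from a minute ago: the scripting people do on top of it is usually some flavor of a pre-change or post-change connectivity check. The endpoint and field names below are hypothetical; debug rest will show you the real call your controller makes when you run the command.

    # Hypothetical sketch: ask the controller whether vm1 still has a path to vm3
    curl -sk -X POST https://<controller-ip>/<path-trace-endpoint> \
         -d '{"source-host": "vm1", "dest-host": "vm3", "ether-type": "ip"}' \
         | python -m json.tool    # inspect the logical and physical hops it reports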
So I think you've seen the basics. This is what's happening under the covers, and we haven't gone through some of the more advanced things you can do on the CLI, but think of them as being exposed through the OpenStack UI. So in the second demo, we're going to achieve the same end result, but we're going to do it entirely through OpenStack. I think it's important to start with the CLI underneath, because if you bring in a networking team to troubleshoot this, you want to be as close to today's tools, we believe, as you possibly can be. So troubleshooting actually feels like network troubleshooting, but for initial provisioning, you can provision in the old networking style, or you can provision through, I think, more modern techniques. So if you go back to your sign-up sheet, you should have a link under Hands-On Part Two.

What you're going to see here is OpenStack set up with Neutron, so OpenStack set up in a way where each project can create its own router. Again, I go back to the application modeling piece: this probably isn't for 90% of the tenants of an OpenStack design, but it's for the 10% where the enterprise complexity starts to come in. So this is intended to cover that last 10%. Very often we find those are the most demanding tenants, and those are also some of the most important. You should have a username and password there in the sign-up sheet. Create tenant A, and so you create a project. From here, if you've used OpenStack with Neutron before, this should be very, very familiar. But if you haven't, and I find a lot of people haven't, think of this as a quick tour through OpenStack with Neutron. Yeah, the tenant name and password don't matter as long as you can remember them. If you don't mind, do a quick log out and log back in as the tenant you created, so we get the tenant experience rather than the provider experience. You can do the same thing on the provider side, but this is a little more real. Member; I mean, it can be either, it doesn't matter.

Now, through OpenStack, if you're logged in as tenant A and managing project A, just go down to the manage network area and click networks. For the purposes of this, create as many as you like: create one, two. In this case we're showing creating all three, so we're creating web, app, and DB tiers. Now, compare this with creating the exact same construct with security groups, and you can see some of the ease-of-use versus enhanced-isolation trade-off. With security groups, you would create one network and then create web, app, and DB security groups. You would get L3 isolation, but you would still be very susceptible to any of the L2-based attacks, any of the ARP stuff. Say again? Any external network? No, not in this particular one. We can show it, it just takes a little bit longer, and I don't know that it adds as much. So we assume you create web, app, and DB, exact analogs to the security group model. Afterwards, your networks tab should start looking like this: you have three networks provisioned. I'm going to start speeding up here, because I'm noticing that some people are still actively on the hands-on, but a number of people are now sitting back and saying, hey, let's just look at the slides. Is that okay? Is it useful? So on the router side, once you've provisioned a router, again, it's the same analog: you provision router interfaces.
I think you start to see some of the analogs between this and the CLI work you did previously with sort of default OpenStack Neutron. Now, if you wind up actually provisioning stuff in the CLI first, say the admin of a particular project has a strong networking background and prefers a networking-style CLI for this, you can provision CLI-first and it will show up in the OpenStack UI. So it really doesn't matter which one you do first. Again, I come back to this: very often, for smaller projects, you're going to want to use security groups for this stuff. This is really for the enterprise projects, specific compliance, specific security needs, some of the more complex ones. You add an interface on top of each of your isolated segments, and at the end you verify that this is a router with three legs: one interface in the web network, one interface in the app network, and one interface in the DB network. So if this is what it looks like, we'll come back to this one in a sec.

After interfaces, you go to the router rules grid. The router rules grid has actually since been upstreamed; when we built this originally, there was no way to provision router rules within Neutron. It has been upstreamed now, so this should be coming out in future distros. What you'll see is, if you go from the interfaces tab over to the router rules grid tab, it's a simple click to say, okay, here are the different routes that I want. You can see the sources down the left and the destinations across the top, and you provision the routes. What release is this in? Kanzi, do you know when we upstreamed this and when it's coming down? Icehouse. So as of Icehouse, the router rules grid should be part of the distribution. You cannot log in? Can you take that one quickly? Thanks a ton. Yeah.

External: you might see a tenant called external. That's a default tenant we put in; think about external as wrapping your hardware routers. So if tenant A wants to speak to the outside world, separate from the floating IP mechanism, then tenant A would have a route to talk to tenant external. We find that tenant external generally houses your hardware routers, generally houses critical infrastructure, stuff like NTP servers, generally houses firewall interfaces, et cetera. That's not good. It could be. We're running this thing on a very, very slim VM, because to fit everything on, I think we have a total of about 60 VMs running on this machine right now, we slimmed it down. I wouldn't be super surprised if there were some timeouts. We default to saying there's an external tenant, and then when you add in the hardware routers that route out of OpenStack to the outside world, you configure those as interfaces owned by the external tenant. But we create the external tenant by default because so often we find that people begin to put stuff in before they have the hardware routers in place, so the external tenant is already there as a placeholder. The semantics you're trying to express are, say, for the web tier of the red tenant: I want this red web tier to have routed access to an interface on the hardware router. That's the semantic you want to express. We found, fiddling around with it just for pure usability, that the easiest way to express that is to say that the hardware router interface is actually owned by an external tenant.
The special-case tenant external means that configuring red's access to the hardware routers is the same as configuring red's access to the blue tenant, or red's access to the green tenant. At scale, what we found is that in very complex environments where this starts to show up, people tend to have multiple external tenants, because you have one external tenant that represents your true outside world, but you might have another that represents a specified security zone for parts of apps or databases that live outside the OpenStack deployment. So the red tenant, which doesn't need particular access to some legacy database that's been there for 15 years, might connect to the default external tenant, but a blue tenant that really needs some Oracle system you've had forever might go to external-security-zone-A. As you can tell, the default external tenant is basically your default route. And no, you can't screw around with it here; we didn't let you change it or change the routes to it through OpenStack. For that you go to the underlying CLI. We found that exposing that particular workflow, like a limited route, through OpenStack was extremely error prone.

So we have about 15 minutes left in the session, and we're going to end the hands-on piece here. If you're interested in doing more, you probably saw as you were logging in that there's a series of these PowerPoint-to-HTML longer sessions. If you're doing this solo, it takes about an hour to an hour and a half to follow them, and they explore some of the more advanced permutations of what we've shown. But here we wanted to give everybody a flavor of the basics: hey, if you come more from a network engineering background, here's the CLI that you're comfortable with, but if you really want your tenants to be driving much more from the OpenStack UI, here's the OpenStack UI equivalent.

So we've shown some of the basics; here are some of the things we haven't shown. We have designs for both Nova and/or ML2, as well as the unified P+V piece. At Big Switch, our goal is to span a lot of the different topologies that come out of OpenStack. And one of the feelings I hope you're getting is that a lot of the folks we work with have incredibly complicated enterprise apps. It may only be one out of 100, but if you can't handle that one out of 100, then you're fundamentally limiting, in our view, a user's breadth of their OpenStack deployment. So very, very early in the product lifecycle, we said, hey, let's tackle some of the most complicated topologies first and work our way down from there. We didn't go through configuring HA controller pairs or HA Neutron service pairs; there's a dual active-standby piece there that's a little bit tricky to get through, it's just the construction of Neutron, but we're very happy to walk through that in one-on-ones or after the session, and we also have white papers on that kind of thing. Configuring ACLs in and out of critical infrastructure: security groups are fundamentally stateful elements. They work very well on V-switches. They don't translate very well to hardware switch ports.
So if you have bare-metal servers, say an NTP server that doesn't work as a VM, or you have firewalls or load balancers, for that case you absolutely need something other than security groups to protect traffic in and out. So for that case, we support ACLs that translate cleanly into hardware. Inserting an L3 service is actually super easy: if it's an L3-based service, you configure a next hop, so you could insert a stateful firewall, say, between somebody's web tier and app tier. We talked a tiny bit about configuring the external tenant; there's a lot we could go into there about different tenants and the different security zones to which they egress. We didn't talk about dual-homed hosts, which is an interesting topic by itself. There are a couple of demos we could do separately demonstrating multi-path. I'm guessing that folks in the room who come from a networking background understand the limitations of spanning tree very intuitively, and a number of us have fought very hairy situations with spanning tree. One thing we didn't show here, but can show in a separate demo, is what a post-spanning-tree world looks like, when the multi-path protocols are taken care of under the covers without you really worrying about them too, too much: here's what it looks like when every link in the fabric can be active. We didn't talk too much about resiliency, but if you truly want an HA environment, then you need to worry about resiliency at every single step of the path. You can't have a single point of failure anywhere in the environment. That's something we can talk about separately: how you can take one controller down, take two controllers down, take the whole management network down, take one link, take two links, and what happens in each of these cases.

Quick question: you're talking about configuring dual-homed hosts, what about dual-homed instances? An instance with two network adapters, or two VNICs, connected on the same layer-two network? It's a big design question. We actually went back and forth a lot on how to model that for usability. We came to the conclusion that the easiest way was to model it as two hosts, so that somebody intuitively thinks of a VM with two NICs as being two separate VMs, because you can very rapidly put those into different legs of different security zones. Since we can't control the host software on that side, it was just the easiest way to make very obvious what we can control and what we can't. Sure. Legacy applications are being virtualized, and a lot of them have multiple NICs connected on the same network. We see it quite a bit. You see it on the NFV side, and the very simple case for multiple NICs is floating IP: any web server that you want to have external access to immediately shows up on our side as two hosts.
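Going back to the dual-homed instance question for a second: from the OpenStack side, that's just a VM booted with two NICs on two Neutron networks, and from the fabric's point of view each NIC shows up as its own host. The image, flavor, and net IDs here are placeholders.

    nova boot --image cirros --flavor m1.small \
         --nic net-id=<front-net-uuid> --nic net-id=<back-net-uuid> dual-homed-vm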
P+V is very flexible in terms of management; it can handle all of these architectures. But what we've seen is that while overlays also give a lot of flexibility and manageability, they run into scaling issues as you grow. How are you addressing that? And having a single controller, or even a paired controller, as a single point of failure for a large enterprise sounds scary. Very. We didn't show it here, but we can demo it: here's what happens when a controller goes down. All of your flows actually stay in place, so anything the network was doing before the controller went down, it continues to do. Even new source-destination pairs will actually be respected. The thing we can't do when a controller goes down is add new VMs, that is, add new MAC addresses. Let me answer that question very, very, very carefully... actually, let me give the simpler answer: we centralize where we can and we distribute where we must. It's a great question, and one where I think it took us a few tries to get the design just right.

In terms of scale, I think you had asked, and it's the last thing let me touch on: we typically see, on the enterprise side, a standard pod being somewhere in the 16,000 VM range. There are some that are a little larger; it's primarily driven by blast radius. The reason we can scale much, much better, I think, than overlays: first of all, we have a cleaner state distribution model, because we can offload an awful lot to the physical switches. But also, we have no gateways; the gateway winds up being a scaling bottleneck in an awful lot of overlay deployments, and taking the overlay gateway out as a moving piece removes a major scaling bottleneck in these fabrics. As for hard scale limits: there are soft scaling limits imposed by OpenStack, which we believe the OpenStack community is helping us through as we go. If you want to do over 5,000 VMs in OpenStack, you need to be very, very careful about your OpenStack components and about hardening them. But underneath, on our side, our scaling limits are fundamentally the L2 tables of the switches, which are huge.

In terms of next steps, if you didn't get hands on a demo account, or you want to explore some of the more advanced stuff we didn't have time to cover here, go to bsnlabs.bigswitch.com and you can sign up for a demo account. I only ask, please, no personal email addresses, no Gmail addresses; please use your work addresses, because we typically veto Gmail and personal stuff. I don't know, you'll have to ask the OpenStack folks. If not, you can always email info at Big Switch or just email me. So certainly there's bsnlabs if you want to keep doing this as self-training, and please don't hesitate to write us at info, or write me personally if you want to have a one-on-one conversation.

Hey, I'm intrigued by your earlier statement that a lot of vendor switches max out at several hundred VLANs. It doesn't seem to correlate with what I've seen. I wonder what the reason is that you see these switches maxing out; you talk about CPU peaking, and I wonder if it's a spanning tree thing, because you may be running per-VLAN spanning tree. But I've not seen a new data center design that's had spanning tree in it for some years now. So if you had a data center design without spanning tree, does this limitation you see then go away? You get a much, much broader range, because the limitation is primarily driven by... primarily the CPU. Legacy OSes use some of the switch-chip registers to implement spanning tree, so we see things like that 9000 virtual core limit; that's actually an ASIC limit underneath, due to old spanning tree code. Okay, so that limitation you see with the several hundred VLANs, is it fundamentally a spanning tree limitation? Okay, thank you.
There are some... once you're in the non-spanning-tree world, there's just a much broader range of vendor implementations, some of which still have the same limitation and others that don't. It's just an area I call out because people need to be very careful. Anybody have any other questions? If not, then I think we'll wrap up.

So in parting, I just want to say thanks, everybody, for your time, and thanks for spending time in the hands-on. These are the really, really important sessions in my mind. As I mentioned at the beginning, we're at the nascent era of a very new chapter for networking, and you're starting to see hints of some of the tools that are showing up. But I'd urge you to think about the changes that you want to see in the networking community, because fundamentally I do believe that networking is going through a transition now, and I do think that by asking the right questions, by being very insistent about the questions you ask of vendors, peers, and your own teams, you can actually have a lot of influence right now. We're in a charmed period, and this is the opportunity where a lot of people can have a lot of influence on the way networking's going. So I want to say thanks. Thanks a ton. I'll be around for the next one.