Make a little room. We've got to get Jim up here — Jim, come on up, there's more room up here. Come on up; if you're on the side wall over here, move on up; if you're in the middle, move on in. There's a crowd forming outside, so y'all can scoot forward. Please scoot forward — I plan on completely breaking the fire code here. Everyone just pile in; it's a really tiny room. Very small — how does the hottest feature in OpenStack get the smallest room? There you go. So, containers on bare metal — if you want a public IP on every container. Shall we get started? Okay, let's get going.

So we're going to present to you tenant network isolation for bare metal deployments utilizing Neutron. The idea of this whole project is to provide, for bare metal environments, the same kind of multi-tenancy framework that is available for virtualized environments.

So, I am Sukhdev Kapur, from Arista Networks. I'm Jim Rollenhagen, from Rackspace. And I'm Devananda van der Veen, from IBM.

I want to call out some of the active contributors who have really been working on this, even though we are up here just representing the team — there are a lot of people who have put in a lot of effort to make it work. Tariq Allahi and his team from SAP were the first people who approached me: hey, look, we need this feature, so please work with us; we want to utilize your switches and this piece is missing. He put me in touch with Devananda, and a special thanks to Devananda for jumping on it when we first approached him: hey, look, this piece is missing in Ironic, and I think we can help build it. Since then the team has really grown. Om Kumar from HP — he's sitting down here — has been very, very active and has participated significantly. Every evening he and I would be on a private IRC channel: I would work all day long and report to him what my issues were, what we were finding, what the problems were, and he would look for answers. And the other way around: he would work all day long, and at night he would report back to me — hey, look, I'm finding these issues — and so forth. It was pretty hectic, but we got it all working. Jim, of course, has been a partner in crime, shoulder to shoulder; he's been very actively involved. And Mitchell, from our research team, has been my partner in crime in getting this thing done. The Ironic core team has been very helpful in reviewing and providing support on last-minute issues — significant help there. And the Neutron cores — Bob Kukura, Kevin, Armando, Kyle — everybody has been very supportive, very helpful, and presented different ideas on how we can bring the information from Ironic into Neutron.

Okay, so having said that, the agenda is actually fairly short, but depending on the questions we get, we can go into as deep a dive as
we need to. The gentleman here, who has the deepest background knowledge about Ironic, is going to cover the history and background of where we were; Jim will get into the problem statement and proposed solution; and then I'll step in and discuss the architectural details. I've also prepared a little demo, so we'll actually show you the whole solution working end to end, and then we'll take questions and answers.

So, way back when, some folks started using Nova to drive bare metal machines, and then we pulled that out into Ironic. The original goal was HPC: using Nova compute, using OpenStack, for HPC workloads. Scientific workloads tend to work better with more processing power. That's all single-tenant, private-cloud kind of stuff. That expanded with TripleO, still single-tenant workloads, using Ironic to deploy OpenStack or other complex applications, and folks began experimenting with private Hadoop clusters on Ironic. But again, this is all still using Ironic and OpenStack in a single-tenant environment, and in that situation a flat network is fine: a single L2 domain for all the servers. Even scaling up to hundreds of machines is just fine in that space. We didn't need multi-tenancy, we didn't need network isolation, we didn't need to integrate with Neutron for this kind of stuff — we did integrate with Neutron for DHCP and IP assignment and so on. But the idea of providing a bare metal public cloud has been there pretty much since the beginning of this project. When we saw what it could do — that's really awesome, people will want that — but how do we get there? I've been steering the project (actually, Jim is now steering it, but I was until now) to deal with things incrementally, and it's mature enough that now we're tackling the really hard problems. This one in particular required deep work with Neutron: we're changing our API, and you guys have done some changes too.

It's been two cycles now from when the idea came up and we began talking. We started working with you in Liberty — yeah, and we began the conversation before Vancouver. Correct; that was our first real design session together. So in just six months we've gone from our first real head-to-head meeting about this, to working out a design, to you guys having a working proof of concept. Correct — and the code's up there, and we're this close. Yes.

So the real problem here: in a single-tenant environment there's no traffic isolation between machines. Every machine is on the same L2, they can see each other's traffic, ARP spoofing is possible. There's no separation between the provisioning network and the tenant network — even in a single-tenant deployment, your instance would have direct access to the control plane of the cloud. If I'm doing scientific workloads or Hadoop, I don't really care: it's all mine, I trust myself or I trust my users. But we really needed to solve that. So Rackspace, a year and a half ago, was working on building a bare metal public cloud, and obviously single-tenant networking won't fly there. We were working on Ironic, and we started building this out downstream, because we had months, and that doesn't work upstream. So we basically hacked up Ironic a little bit, made a little ML2 plug-in, and basically that provided secure multi-tenant networking — it totally cut tenants off from the control plane. It was only our use case.
It was bonded interfaces; we trunked two VLANs down, one for each of our shared tenant networks. It was super, super hacky and relied on a lot of stuff it shouldn't have, but it totally worked. We did do it in the open; we started talking about it right away. We put up patches — even though we knew they wouldn't get accepted as-is, we put them up to show people how we were doing it. Our ML2 thing was open source; it used Cisco equipment, but it was pluggable for other drivers, with plug-ins inside our ML2. Eventually other people wanted to do this — Arista and HP primarily — and we started working on it with them, like they said, about six months ago, and we wanted to do it right. And so we did.

So the proposed solution — I guess the solution we came to — is basically what we had downstream. How that works is: there are ML2 plug-ins for given switches that know how to configure a switch the right way. They use the port bindings extension; the profile goes up there, and that binding profile contains switch-port information like the switch host name, the actual port the node lives on, and anything else you might need. Most people deploy ML2, so they can just pick it up and use it. Then we went back to Ironic and Nova and sorted out the actual isolation from the control plane. In the past, when Nova created the ports at boot time, it would create ports on the control plane network and just pass those over to Ironic; Ironic would PXE boot on them and it would just work. What we did here is: Nova still needed to create the tenant ports, because it had the information it needed, but then we had to make Nova not actually bind those ports, because we didn't want to connect the node up before it was deployed. So we modified the binding profile to allow for that, and those ports get passed to Ironic for later use. When Ironic does the deploy, it creates its own ports on the provisioning network, does all the PXE and deployment on that, shuts the machine down, flips over to the tenant networks, turns it back up, and that's when the tenant has access to it.

Yeah, so like Jim mentioned, there are really three networks involved. The provisioning network is where the server PXE boots, brings the image down, and starts to boot; the entire deploy phase takes place on the provisioning network. Once the deploy phase is over and the IPA (the Ironic Python Agent) is running on the server and communicating with the Ironic conductor, a network flip takes place: the server gets disconnected from the provisioning network and attached to the tenant network. The tenant network is the ultimate end goal — that's where the server needs to be connected. And the cleaning network was pretty much in place already; this is what's used for deletion of the instance. Before the instance is taken down, it's put on the cleaning network and remotely wiped. For context, that's where firmware updates, erasing hard drives, and making sure the bare metal is recycled properly happen, so some deployers may wish to isolate that from their provisioning network, given there could be untrusted code running there.

So, when you launch an instance, Nova is essentially dealing with the tenant network just as it does for virtualized deployments, but nothing is bound: the host ID is not presented, so the ML2 drivers will not bind the port to any network; it's just created and sitting there unbound.
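To make that binding profile concrete, here is a minimal sketch of the kind of switch-port detail it carries once it's plumbed in. The key names follow the local link connection format described later in the talk, and all values are placeholders — treat this as an illustration rather than the exact API of any release.

```python
# Minimal sketch of a Neutron port carrying switch-port detail in its binding
# profile. Key names mirror the local link connection format covered later in
# the talk; every value here is a placeholder, not a literal API example.
neutron_port = {
    "network_id": "PROVISIONING_NET_UUID",    # placeholder network UUID
    "binding:vnic_type": "baremetal",          # the new bare metal VNIC type described later
    "binding:profile": {
        "local_link_information": [
            {
                "switch_id": "00:1c:73:aa:bb:cc",  # MAC-style identifier for the switch
                "port_id": "Ethernet2",            # physical switch port the NIC is cabled to
                "switch_info": "tor-1",            # free-form hint, e.g. the switch host name
            },
        ],
    },
}
```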
The Ironic driver is what manages the entire back end of this and causes the network flip: once the instance is ready to connect to the tenant network, it does a port update to Neutron to connect it to the tenant network.

So here are the basic components of the solution. We have Ironic and Nova: Nova kick-starts the process, the Ironic driver takes over and goes through the provisioning network and the network flip, and Neutron, on the other hand, facilitates the create, bind, unbind, and delete of the ports on the respective networks. On the Neutron back end we're utilizing the ML2 core plug-in. The way the ML2 core plug-in works is that there are multiple vendor drivers which can all work together simultaneously; the context gets passed from the ML2 core plug-in to the mechanism drivers, and by utilizing that context the mechanism drivers can then provision the hardware. That part was already in place in the ML2 core plug-in, which is what we've been able to leverage significantly — it's one of the reasons we were able to come up with the solution rather quickly. One point I want to make here is that this solution really depends on touching three different projects: Ironic, Nova, as well as Neutron. So there was coordination, multiple blueprints, and all that. This was really fast for getting a feature this big through three major OpenStack projects; it's been work from a lot of people, and it's impressive. Yeah, there are a lot of moving parts to make this thing work, and these gentlemen made it possible, dealing with all the logistics and so forth.

So here I'm going to quickly walk through how the entire flow plays out from end to end. When you launch an instance, Nova will create a port on the tenant network, because when you issue a nova boot command you specify the ID of the tenant network, right? At this stage, if you go look at the Neutron ports, you will see a Neutron port created on the tenant network, but there will be nothing in it — no binding information — and therefore the ML2 drivers cannot do anything with it. It's just sitting there for Ironic to come in later and plumb it. Then, at some point, the create-port on the provisioning network takes place; this is where the Ironic driver kicks in. When the create-port on the provisioning network happens, that's when the binding profile which Jim mentioned earlier gets passed along (there are some details on what gets passed that I'll cover on a following slide). That information gets passed to the mechanism driver, and the mechanism driver can then identify which switch this particular server is connected to, and therefore which interfaces or ports need to be configured — and those get configured. That's how the connectivity is achieved.
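As a rough sketch of that end-to-end flow (including the network flip described next), here is the sequence in pseudo-Python. The client objects and method names are hypothetical stand-ins for the Nova, Ironic, and Neutron pieces, not their real APIs.

```python
def provision_bare_metal(nova_port_id, node, neutron, ironic, switch_profile,
                         provisioning_net_id):
    """Hypothetical sketch of the deploy flow. `neutron` and `ironic` stand in
    for the respective services, and `switch_profile` is the binding profile
    holding the node's physical switch/port details."""
    # 1. Nova has already created `nova_port_id` on the tenant network, but
    #    left it unbound (no host ID), so no ML2 mechanism driver acts on it.

    # 2. The Ironic driver creates its own port on the provisioning network and
    #    passes the binding profile, so the mechanism driver can put the node's
    #    switch interface onto the provisioning network.
    provision_port = neutron.create_port(
        network_id=provisioning_net_id,
        binding_profile=switch_profile,
    )

    # 3. PXE boot and image deploy happen over the provisioning network, then
    #    the node is powered down for the flip.
    ironic.deploy(node, provision_port)
    ironic.power_off(node)

    # 4. Network flip: delete the provisioning port (the switch interface is
    #    unconfigured), then update the Nova-created tenant port with the same
    #    profile so the mechanism driver moves the interface to the tenant network.
    neutron.delete_port(provision_port["id"])
    neutron.update_port(nova_port_id, binding_profile=switch_profile)

    # 5. Power the node back on; it now boots straight onto the tenant network.
    ironic.power_on(node)
```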
So the server will get the DHCP address, it will get the TFTP server's address and whatnot, and it will start to pull the image. Once the image is downloaded, it starts to run, and once the IPA is running it communicates with Ironic — that's how the coordination for the network flip is facilitated. At some point, when it has reached a certain stage, Ironic will initiate the network flip. A delete-port on the provisioning network is issued; that goes to Neutron, and again back to the mechanism driver, which unconfigures that particular port on the switch — the server is disconnected from the provisioning network at this point. Simultaneously, Ironic issues an update-port on the tenant network. Remember, that port was already created by Nova, so Ironic is coming in and issuing an update on it, and in this update all the profile information is presented, so the mechanism driver knows what to do with it: it goes and configures the appropriate interface with the appropriate VLAN information. That's how the port gets bound on the tenant network, and you have a complete server up and running on the tenant network. So that's the basic flow which takes place. I didn't put the cleaning network in; that's part of deleting the instance.

Now, in order to do that provisioning of the switches, the ML2 mechanism drivers need to know which port on which switch this bare metal host is connected to, right? Unlike a virtualized environment — where the hypervisors are already known, the host ID is presented, and so the drivers know where the hypervisors are and can figure it out — in this case the server is down; there's nothing, you have absolutely no information. Therefore the ML2 drivers need to be told the physical connectivity: which port on which switch, and what the connectivity looks like. For that, we utilized the binding profile — that framework already exists in ML2 — and we came up with this local link connection information. What does the connection really represent? The physical connectivity. The switch_id is an ID which represents the switch, and we chose to use a MAC-style ID in this case; the reason was future extensibility, so that it can be automated with LLDP or any other protocol which can discover these connections. Even though in the first phase we decided we will plumb it manually — the operator will add this information through the CLI — eventually we can automate it fully. That was the reason to use the MAC form — I mean, the switch_id in MAC address form. And the switch_info: this is anything which helps vendors to
specify which switch it is, right? Because we're utilizing the ML2 plug-in, it's fairly flexible: a vendor may choose to have one ML2 driver for only virtualized deployments and a separate ML2 driver for bare metal deployments, or they may choose to have one ML2 driver which works for both. Or a vendor may go even a step further: if they ship multiple types of switches, they may want a separate mechanism driver for each switch type or model. This gives them that flexibility — whatever a vendor wants to choose, they can specify it there. And the port_id is the actual port where the node is physically connected.

If you notice, this is a list — it's not just one link. When we present it to the ML2 driver, we give it a list. The reason it is a list is that your node may have multiple NICs, multiple ports, so you may want to use port groups, bundling a bunch of ports and representing them as a single interface. All of those get packed in, and the ML2 driver on the back end will parse through it and know exactly which interfaces are involved, and therefore can configure them.

Another thing we've added is a new VNIC type, 'baremetal'. This helps the ML2 drivers filter: if you have an ML2 driver which handles virtualized as well as bare metal deployments and you want to filter out just the bare metal ports, this gives you the ability to do that. All of this is managed by the Ironic driver as part of the flip I mentioned earlier.

In order to facilitate and provide that information, we have created one new CLI command in Ironic and updated one. The new command creates port groups: as in the example down here, you create a port group, you specify the node ID and the MAC address you're using, and then you can reference that port group in the port update. To call out one thing here: you have to tell Ironic about the switch, and that's the important part of this slide. Correct. This is the part where the operator or deployer is involved: currently, when you do a port create, you specify the MAC address manually; now, in addition, you specify this connectivity information. The rest is all done on the back end.

Okay, so that gives us the ability to support these configurations: a server connected to a switch on a single port, a port group utilizing LAG connected to one switch, or a port group going to multiple switches utilizing MLAG. Again, it's specified in exactly the same way. If, for instance, you wanted to represent an MLAG pair utilizing LAG, you simply supply the list of links I mentioned earlier; each link carries that structure, so the back-end ML2 driver knows — that's why the switch_id and switch_info are there — exactly which port of which switch the node is connected to, even if they are different models or whatnot, and hence it can configure them appropriately.
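To illustrate how a vendor's mechanism driver might consume all of this — filtering on the 'baremetal' VNIC type and walking the list of links — here is a simplified, hypothetical sketch. A real driver would subclass Neutron's ML2 MechanismDriver and use its port context; the switch-client calls here are made up.

```python
BAREMETAL_VNIC = "baremetal"

class ExampleBareMetalMechDriver:
    """Simplified stand-in for an ML2 mechanism driver; not the real Neutron API."""

    def __init__(self, switch_client):
        # Hypothetical handle to the vendor's switch management API.
        self.switch_client = switch_client

    def bind_port(self, port, vlan_id):
        # Only act on bare metal ports; virtual ports are left to other drivers.
        if port.get("binding:vnic_type") != BAREMETAL_VNIC:
            return False

        links = port.get("binding:profile", {}).get("local_link_information", [])
        if not links:
            # Port was created unbound (e.g. by Nova, before Ironic plumbs it).
            return False

        # A single NIC yields one entry; a port group (LAG/MLAG) simply carries
        # several entries, possibly pointing at different switches.
        for link in links:
            self.switch_client.set_access_vlan(
                switch=link["switch_id"],      # MAC-style switch identifier
                interface=link["port_id"],     # e.g. "Ethernet2"
                vlan=vlan_id,                  # VLAN backing the Neutron network
                hint=link.get("switch_info"),  # free-form vendor hint (model, name)
            )
        return True
```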
The question was: do you have active/passive bonding support? That has nothing to do with active/passive from this point of view — that would be configured at the host, right? No, I don't think we have covered that use case. Is there a need for it? Let's come back to that.

So here I'm going to play a little demo, but before I get into that, let me describe how the demo is set up. On the left-hand side we essentially have an OpenStack controller — Ironic, Nova, and Neutron, with the Arista ML2 driver running. The Arista ML2 driver talks eAPI, which is a standard Arista API, and it connects to CloudVision. CloudVision essentially configures the switches; in this particular demo I've used just one switch. The controller is physically connected on Ethernet 1, so I'm essentially utilizing two ports: port one connects the controller, and port two is where the bare metal node is connected. That's my setup.

One thing before I run the demo: as an admin, you will create the provisioning network — it's a neutron net-create command — and similarly the tenant network; those are the two prerequisites. Then the operator will go and create or update the port. These pieces I don't show in the demo, so I just want to give you the background so you understand it when you watch: the operator issues an ironic port-update to specify the physical connectivity, and once that information is present, we issue the nova boot, and the rest you will watch go through.

Having said that — I was actually trying to be very brave: I have this demo running live; it's sitting in San Francisco, California, and I'm logged in. But it was beaten into my head that Murphy's law kicks in and nothing works in live demos, so I recorded it. Plus it takes a long time for the server to boot, so I've cut it short and kept it simple. I'm starting a minute into the demo, because I've already given you the introduction. Another thing I've done is upload it to YouTube, so you can watch it at your leisure if you need to.

The details: I have created two networks. One is the provisioning network, which is on the .100 subnet. The provisioning network is used for the deploy phase of the bare metal server: during this phase the server uses the provisioning network to fetch the image from the TFTP server, and once the image is fetched it will reboot and connect to the tenant network, which is on the .200 subnet. The same networks have been learned by the switch: the provisioning network is on VLAN 98 and the tenant network is on VLAN 35, so essentially VLAN 35 and VLAN 98 are the ones of importance to us for this bare metal deployment.

We need the physical connectivity information, so here is the port. You will notice there are a couple of fields which have been added to the Ironic port structure in the Liberty release. One is local_link_connection, which essentially states which switch the bare metal server is physically connected to — in this case an Arista 7050 switch — along with the port ID and the switch ID I mentioned. The two important networks are VLAN 35 and VLAN 98; since I've created those two networks, they both have DHCP instances running on them, so what that means is the switch has connected...
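For reference, the local_link_connection on the port shown in the demo would look roughly like this; the MAC and labels are placeholders matching the topology just described, not the values on screen.

```python
# Hypothetical values matching the demo topology: one bare metal node cabled to
# the second port of the Arista 7050, flipped between VLAN 98 and VLAN 35.
demo_local_link_connection = {
    "switch_id": "00:1c:73:00:00:01",  # placeholder MAC-style ID of the 7050
    "port_id": "Ethernet2",            # switch interface the node is cabled to
    "switch_info": "arista-7050",      # free-form label for the ML2 driver
}
# Roughly what the deployer's `ironic port-update` sets before `nova boot`, so the
# ML2 driver knows which interface to move between the provisioning VLAN (98)
# and the tenant VLAN (35).
```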
They both have DHCP instance running on them for so what it means is the switch has connected So we have a little dilemma here in the interest of time. We're gonna cut off the demo Watching VLANs change on a switch isn't that interesting anyway Yeah, correct. Yeah, so so so what I was gonna say is that this is on the YouTube You can watch it here is the link Right here so you can play this So yeah a couple quick things this is not in the Liberty release it will be in mataka It was close but not quite there And there's a lot of future work to do yet and with that I'm sure there's a million questions out there, so Feel free to come up to the mic or just you want to cover the future Let's let's just we've got a few minutes left. I would love to hear questions And I'm just just yell it from the back. I'll repeat it for the recording No, it's in Garrett. Yes Yes, the neutron part is already merged that's part of Liberty The question is a question for your vendor. No, yeah, the question the question was is there any open source ML to implementation That supports this yet and I know No HP HP ML to driver is there or this time I'll do driver is there which will work that okay Yeah, we've been noodling on that one. I don't have an answer yet Have we considered using questions? Have we considered using ODL in the middle instead of something something? And I haven't personally can ODL configure switches directly for this kind of thing So, okay, so it is the deal. It is the deal ODL works as another ML to driver Right, and it's no different than what I described. So so the answer was The answer from the audience over here was that this would work if someone added that support to the ODL ML to driver Correct sounds like something someone could do question in the middle what's the Ironic is on the provision network. So the a machine that has been booted on the provisioning network must be able to Currently make API calls out to the control plane like download images from glance or something It could be You need those two networks to be able to access each other Yeah, right ironic needs to be able to talk to the node and vice versa though in some configurations depending on your hardware vendor There are ways to separate those so that ironic does not need that access or I want to use the out-of-band channel instead of that network for some things depends on the hardware driver The question is is there any intention or thought or support for provisioning in finnaband? I Don't know the answer. I know that within hardware that our own supports. We would have Melanox drivers in some of the ways we build machine images that we could do that too first is the deployment No, we have tested just one port for now Oh, have you tested with port grips Yeah So we got it working last week by the way, so this is this is fresh of the oven The question was have you only tested single port or also lag and I'm like If we try Okay I The question as I understood it was if you configure this with link aggregation at what point in the node provisioning process Does link aggregation take effect? So it's a essentially a logical port at this point. So when A port create happens on a provisioning network. So like I mentioned, it's a list of links It's not a just single link, you know, you could have as many things as you want When does it take effect does it take effect during the pixie boot and deploy process or only during for the instance? 
During PXE — and MLAG isn't... thank you. So, we just started this; yeah, more on that later.

So, future work: we've got VLAN and VXLAN capabilities, capabilities for things like a subnet per host for security, things like that. Did you have something more in mind? Oh, that stuff — yeah, so those features, right. Yes, those get interesting; they're very vendor-specific as I understand it. At Rackspace we've looked into doing things like that, and we'd basically be putting ACLs on the switch. We haven't investigated that upstream yet, but I expect we will soon. Yeah, see, that's an excellent question. The security groups and security APIs all exist in Neutron, but they are all very virtualized-world centric, right? Now that we have brought this feature in, and we are utilizing all of the framework which exists in Neutron for virtualized deployments, this will hopefully make it seamless to utilize them for this as well. As a group, for this initial implementation, we have not looked into it, but this does open up the path for us — and if there is an expert... Bob has a point. Yes, yes, and I think it would fit great. It's also a good path to getting that metadata into Nova, to, you know, configure the config drive and whatnot. There's already been some work done in simple-init — I think, not cloud-init but simple-init — to be able to read that metadata if it's passed in on the config drive, and in cloud-init too.

So, have you encountered any issues during provisioning? Can you share with us — for example, the last step is to unbind and delete the port from the provisioning network, right? After that, you will update the port onto the tenant network. So what if, for some reason, something is broken and it's not updating the port, so it's hung on the provisioning network? What are you going to do? Let me tell you about switch operating systems and switch APIs that aren't meant to be pounded with requests. Downstream we have run into some of that; it's solved with a lot of retrying and occasional build failures that get rescheduled. But we haven't seen any major issues beyond that — normal debugging-related issues, normal when you're building something, you know. Thank you, everyone. Thank you.