Okay, welcome everybody. This is our presentation on the get-me-a-network feature that was added in the Newton release. Given that we've already released Ocata and we're working on Pike, it seems odd to be talking about a thing we released a year ago, but most people are just now getting upgraded to Newton, so the timing is actually good. I'm Matt Riedemann; I've worked on Nova for a while, and I've been the Nova PTL for the last three releases.

And I am Armando Migliaccio. I work at SUSE now, and I've been the Neutron PTL for the Mitaka, Newton, and Ocata releases.

So today we're going to go over the rationale for the feature, basically the use cases and what the problem was; the proposed solution; the community effort involved, because this spanned a couple of releases with a lot of different people across Neutron and Nova; the test methodology, how we went about the integration testing; and possible future enhancements. Then Armando is going to attempt a live demo, and then questions, if anybody has any, hopefully.

So let's dive in with the rationale. You see here on the left of the screen a clueless user with a laptop, trying to boot his or her first VM in the cloud. Prior to get-me-a-network, he or she would either start a VM with potentially no networking at all, or would have to get a good understanding of how Neutron worked and provision the Neutron networking building blocks in order to give the VM a way to reach the internet and the other cloud resources.

In reality, what needed to happen was that the cloud user, in order to get networking to their VMs, needed intimate networking knowledge and knowledge of how Neutron worked. He or she needed to create a logical network for tenant isolation; on that network you needed to provision an IP space by means of a subnet; and if that VM needed to go out to the internet and back, you needed to create a router, provision it, attach it to a public external network, and so on and so forth. That obviously led to frustrated users, because it was a multi-step process. There ought to be a simple way to do this.

So the very high-level requirement was: how do we take that mess and automate it to make it easy? That's what get-me-a-network does. The operator performs a one-time setup with some defaults, and then the user gets a network provisioned when they boot their first VM within their project, and then he's happy. I guess that person looks like you a little bit?

Yeah, I don't wear a sport coat, though.

So, the solution: on server create, what we basically do is, if you don't already have a network available, Nova works with Neutron to see whether some minimum requirements are met, whether we can actually auto-allocate a network topology. If the user doesn't have one, we tell Neutron to create one. Any other boot request after this basically gets the same network topology, so this is one-time-only per project, unless you delete everything: it's created the first time, and after that, when Nova asks Neutron for available networks, it's already there and gets reused. We should note that this doesn't automatically give you a floating IP, though; there are good reasons for that, which we'll show in a bit.
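To make the before-and-after concrete, here is a rough sketch using the Newton-era command-line clients; the network, image, and flavor names are placeholders, and exact flag spellings vary by client version:

    # Before get-me-a-network: script the whole topology by hand.
    neutron net-create mynet
    neutron subnet-create --name mysubnet mynet 192.168.0.0/24
    neutron router-create myrouter
    neutron router-gateway-set myrouter public
    neutron router-interface-add myrouter mysubnet
    nova boot --image cirros --flavor m1.tiny --nic net-name=mynet vm1

    # After: one call, and the topology is auto-allocated on first boot.
    nova boot --image cirros --flavor m1.tiny vm1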
So it sounds simple enough, but it took us about two years to get this done, and in this slide we're outlining the timeline. We had the initial discussion back at the Vancouver summit in 2015, so it's about to be two years now. We achieved rough consensus in the room, and when we went back home we worked on the Neutron spec. In that spec we outlined a couple of implementation strategies and ways to deal with the auto-provisioning problem as stated at the Vancouver summit.

Then implementation kind of stalled, for resource-constraint reasons that I'm not going to digress into right now. That said, in the Mitaka time frame we managed to resume the work and achieve further consensus with the Nova team at the Nova mid-cycle that was held in January 2016 in Bristol, and we were able to complete the feature on the Neutron side in March 2016, in time for the official Mitaka release. At that point the ball was basically back in Nova's court, and we said: now it's your problem, you get it to the finish line.

And that's what happened. The Nova team got together and put together a spec that outlined the strategy and described what the user experience needed to be like when dealing with the API as well as the command line. People put in many hours during weekends and nights, and the feature got implemented in time for the Newton release. Throughout the integration testing we ran into a few issues that had been overlooked on the Neutron side, but we managed to get the feature tested.

I was going to add that during Liberty there was actually a lot of mailing-list discussion about this. Monty Taylor and Jay Pipes specifically had been talking about a lot of the end-user requirements. I think it's important to note that this wasn't just Nova developers and Neutron developers saying, hey, we could write a bunch of code to do this thing, without really knowing whether any actual end users care about it or would be using it. It was really driven by users. Whenever we ask ourselves how an end user would care about something, we go to Monty, because Monty's got Shade, with accounts on something like 50 different public clouds, and we asked him: for the UX and the end-user design, what do you want to see out of this?
So there was a lot of end-user input on how this was designed, going into the specs.

Yeah, and from a time standpoint, some would argue it took quite a long time, roughly two years, to get this implemented, and it wasn't overly controversial. As a matter of fact, on the Neutron side of the house there was some initial pushback, and actually the pushback was mine. My main concern was: okay, if you're asking me to do a multi-step provisioning all at once, atomically and in an idempotent fashion, you're putting an awful lot of constraints on the implementation, and we may affect Neutron's ability to cope with load. So I was initially reluctant to the idea of adding that much complexity to the code base. But the rationale was there, and we just needed to work hard to pull it off.

I put the spec in there for reference. Even though there was an awful lot of complexity to deal with on the Neutron side, we were able to rely on building blocks that were already available in the Neutron system, such as subnet pools and the default external network. So what do subnet pools do, for those of you who may not know about them? They're a way for the operator to specify a much larger pool of addresses that IP ranges can be fetched from when allocating tenant networks. So rather than forcing the tenant to come up with its own IP scheme, by means of a default pool a tenant can say "I want an address range out of this pool", and an IP scheme gets generated from it.

The workflow, the way we went about implementing this on the Neutron side, is that the operator has to come up with a large enough default subnet pool, and there may be one for IPv4 as well as IPv6, that he or she is confident is not going to get exhausted throughout the life of the cloud. Things get complicated if you get to a point where the subnet pool is exhausted; then you'll have to massage things a little bit. So you end up marking a subnet pool as the default, as the subnet pool to fetch IP addresses from, and you end up choosing a public network, the one you use to allocate your floating IPs, the one you're using to get VMs out to the outside world or the internet, and you mark that as default. Those are basically the two decision points that a human being makes on behalf of the Neutron deployment, so that the deployment, when faced with a request to provision a topology on the user's behalf, knows what resources to use during the boot phase. Then Neutron simply goes into the default subnet pool, pulls out an IP space, creates a network, a subnet, and an uplink, as I showed you in the earlier slide, and returns the network ID to Nova. At that point Nova has everything it needs in order to proceed with the boot process and give the VM a virtual NIC.
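As a sketch, that one-time operator setup would look roughly like this with the Newton-era neutron CLI; the pool prefix and prefix length here are made-up values, and flag spellings may differ slightly between client versions:

    # Create a shared subnet pool and mark it as the default pool that
    # auto-allocation draws tenant IP ranges from.
    neutron subnetpool-create shared-default-v4 \
        --shared --is-default true \
        --pool-prefix 10.10.0.0/16 --default-prefixlen 26

    # Mark the existing public network as the default external network
    # that the auto-allocated router will uplink to.
    neutron net-update public --is-default true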
So, on the Nova side: as Armando said, Neutron delivered their changes in the Mitaka release, and nobody had really picked this up; I guess nobody had planned to work on it. I think it was just one day in IRC we were talking: well, they've already got this API available over in the networking service, what's it going to take to use this thing? We talked about it for, I don't know, a few minutes, and I thought, this doesn't seem that complicated, I'll start writing a spec. So here's the link to the spec. It's actually pretty detailed, but I generally go overly detailed in my specs.

As I was going through this, a lot of the interesting parts came up when I started looking at the code in Nova and how it actually works now. The funny thing is, we found out there's this weird case you can get into where, if you don't actually have a network available, Nova doesn't fail; you just get a VM with no address, no networking, nothing. And we thought: is this a bug? Should we start failing? If we failed, that would be an API change, so let's not do that. But are there actually use cases for people specifically not wanting networking when they create the server? You can attach it later; there's an API in the compute service to attach a network later. But it just seemed weird to us that you would create a VM with no networking to start.

Something like that drove a lot of discussion on the mailing list. How do we handle this? Do we specifically say in the API that if you don't want networking, you say that you don't want networking? That drove the thinking about whether we should make defaults in the API, whether we should just assume behavior, and we eventually came to the conclusion that assuming how the user is going to use it is a bad idea; we should just be very explicit and specific in the API. This is another case where a lot of the problems we have come from designing things when we don't really know how users are using them, or which APIs they're using, so we tried to be very explicit in this case.
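To illustrate that "none" case, here's a sketch of booting with no networking at all and attaching an interface afterwards, where $NET_ID stands in for whatever network you pick later:

    # Explicitly ask for no networking at create time (microversion 2.37+).
    nova boot --image cirros --flavor m1.tiny --nic none vm1

    # Later, attach an interface on some network to the running server.
    nova interface-attach --net-id $NET_ID vm1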
Another thing that came up: Nova supports rolling upgrades. Rolling upgrades in Nova means that the control plane can be talking to the prior version of the compute, and since the compute is what actually allocates the network, when we were doing all this in the control plane, we could be talking to a Mitaka compute that has no idea the user is actually requesting this specific thing. So we had to build that into the code. Nova actually has some metadata built into it so that we know what the versions of the services are, and we can make decisions in the control plane based on what level the compute is at. So really, the way this works is that you don't auto-allocate until all of the computes are upgraded to a level of code that understands this type of request.

As I sort of touched on with the "none" case: the boot request today takes a dictionary, and it's got either a port ID, a network ID, or a fixed IP address, or some sort of crazy combination of all three, which, every time I have to look at how this works and what's actually required in the different cases, is just confusing. So the initial draft of the spec said: we'll just provide an enumeration for the network ID. Eventually Ken'ichi Ohmichi was reviewing the spec, and he said: what if you either provide the dictionary as before, when you want a specific network or a specific port, or, for the very simple case, we just have an enumeration, and you either say "I do want networking automatically created for me" or "I don't"? And that's what ended up being in the API.

Before this microversion, if you don't specify networking, it's automatic. We still wanted sort of that same behavior, but the API is explicit: with this microversion you have to specify whether you want auto or none; you cannot specify no networking at all without providing this networks key. There was some pushback on that in the development mailing list: this used to just sort of work automatically before, and are we making it more difficult now by making people be explicit in the API request? We thought: if you're an API user, you are probably being very specific about the stuff that you're doing anyway.
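For reference, a raw create-server request at microversion 2.37 looks roughly like the sketch below; the endpoint URL, token, and image and flavor IDs are deployment-specific placeholders:

    # 'networks' must now be present: a list of dicts as before, or the
    # string "auto" or "none".
    curl -s -X POST http://controller:8774/v2.1/servers \
        -H "X-Auth-Token: $TOKEN" \
        -H "Content-Type: application/json" \
        -H "X-OpenStack-Nova-API-Version: 2.37" \
        -d '{"server": {"name": "vm1",
                        "imageRef": "'"$IMAGE_ID"'",
                        "flavorRef": "'"$FLAVOR_ID"'",
                        "networks": "auto"}}'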
The way we massaged this a little bit was to say: if you're a CLI user, if you're using the command line, you want it to be simple, you want sane defaults. So in the CLI, if you don't specify anything, we still default to auto-allocated networking.

Another key thing is that Nova doesn't retry this. When Nova asks Neutron to actually auto-allocate the network topology, if that fails for some reason, Nova doesn't retry. And that's really based on trying to make sure that everything is atomic and idempotent over on the Neutron side, which, when we get later into the test discussion, drove a lot of the test methodology.

I've already talked a little bit about the workflow. It's a microversion, so anything at 2.37 or above can do this. The networks key, that's where I was saying it used to have to be a dictionary, and now it can also be an enumeration: if you want auto-allocated networking, you specify networks: auto. Nova will check to see whether all the compute services are updated to the latest level to support this. If not, it still falls back to the old behavior, which won't auto-allocate a network, but if there's one available you'll get it. So for anybody that's been used to how this has always worked, you don't lose anything; you just don't get the new stuff.

The Nova API will validate that if there's a network already available to the project, we'll just use it as before. If not, we check to see whether the operator has set up the defaults they're supposed to have set up; if not, then we have to fail the request, because we can't perform it. Down on the compute side, same thing basically: if there's already a network, say you just created a DevStack and DevStack gives you a network, you just use it, no problem, and Nova won't tell Neutron to create anything. But if there's no networking available to the tenant already, then we ask Neutron to create it. If that works, we get the network ID back and we create a port on it. If it fails, we don't retry, because this is basically a global failure: if Neutron isn't set up to handle this, retrying on a different compute isn't going to work. So we fail, and we put the instance into ERROR state.

So, the testing. Going back to the atomic and idempotent nature that we wanted this to have on the Neutron side: when we wrote the Tempest tests, the "none" test is easy. You basically just say "I don't want networking" and make sure that there's no networking; that's an easy test. The difficult test was making sure that when you start out with no network for the project, one gets created, but just one gets created, and it doesn't fail if multiple requests are coming in. The key point to this is that when I originally wrote the test, I was creating two servers at once, but there was this weird little edge case where three servers getting created at the same time was really what made this sort of fall over in the original implementation on the Neutron side.

Yeah, and writing all the code was pretty simple once we figured out the design. It was really the testing that was like: oh look, two servers is okay, three makes it all fall over and start burning.
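A back-of-the-envelope way to poke at that race, sketched here with the Nova CLI rather than the actual Tempest code, against a project that has no networks yet (the grep assumes Neutron's default name for the auto-allocated network):

    # Fire three concurrent boots; all of them may race to trigger
    # auto-allocation of the project's network topology.
    for i in 1 2 3; do
        nova boot --image cirros --flavor m1.tiny --nic auto vm$i &
    done
    wait

    # Exactly one auto-allocated network should exist afterwards.
    neutron net-list | grep auto_allocated_network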
The reason is that the first two servers come in concurrently, and Neutron will say: there's no networking for this project, I'll start creating it. Eventually it figures out, wait a minute, there's only supposed to be one of these, and it rolls back the first one. The third server comes in after the first two, and it sees that something is getting created, but there are two of them. What is it going to do? There shouldn't be two of them. And that's the whole "network ambiguous" error. If you've ever gotten the network ambiguous error booting a server, that's where Nova just throws up its hands and says: I don't know which one to pick, you didn't tell me, so whatever, I'm going to fail.

So based on these failures, Armando worked on a bunch of hardening on the Neutron side of this. I don't know if you want to go into details?

We ended up basically leveraging the admin state on networks. When Neutron does the provisioning, since the provisioning actually involves a number of steps that cannot, for a number of reasons, happen in an atomic fashion, because there's backend coordination involved if you're using different SDN controllers and whatnot, you need to basically layer atomicity as a semantic on top of the existing plugin API that Neutron implements. We accomplished that by creating networks in a disabled state and turning the admin state back to enabled once we were happy that only one network was being provisioned. That was it; that allowed us to work around this particular corner case.

And I will say, everything in CI is gating on this test. This is not a scenario test, this is not a slow test; this is something that gets run on every patch, so thousands of times a day. Once we got this worked out, this was actually gating everything from ever merging, so it's pretty solid.

So once we got to the point where the feature was done on the Nova side and on the Neutron side, what's next? Actually, one of the reasons why we're here, and there's also a forum session about this tomorrow, is to get people interested in this feature and trying it out, because it doesn't do everything, right? It doesn't address every possible provisioning case that people may come up with. In fact, as Matt mentioned at the very beginning of this session, we don't auto-provision floating IPs, and the rationale there was: if we have dual-stack deployments, or IPv6-only deployments, we actually don't have floating IPs; and even when you only have IPv4 addressing, you don't necessarily want to allocate floating IPs if a user is never interested in using one. So we left that aside; it's a simple enough step that it can be left to the end user. On the other end, we don't allow the Neutron system to figure out which kind of external gateway mode to associate with the router: a router can do SNAT or not, and we use the default, which is SNAT enabled. At the same time, we provision a very explicit network topology, which involves a logical tenant network, typically VXLAN, backed by a router that's connected to an external network. What if a system relies on provider networks, on provider VLANs alone? Those are not quite addressed yet at this moment.

So, I think words can only do so much. We've got two minutes left; only two minutes, let's see if we can pull that off. Do I have to hold this? Right, so what we have here is the dashboard.
Let's see if I'm still logged in. Oh, yay. So what I want to do here is show how this works in practice. I'll try to type very fast. Right here we have a router and an external network associated with the demo tenant, and we're going to provision two servers here without specifying any networking. Truth be told, we started five minutes late, so I think we can perhaps overrun a little bit, but let's see what happens.

So what I did here, boom: we haven't touched anything except launching two servers without specifying any networking. What happened behind the screen, at the terminal, is that two auto-allocated networks have been provisioned, an IPv4 and an IPv6 network, because that's how I configured my DevStack, and they're connected to the public external network. Now my VMs should go and get attached to those networks. It's still taking some time, but eventually it should do that. It's a live demo, so we're left to see whether it's going to work as expected.

We also found something out a few hours ago this morning. When we went through a dry run of this a couple of weeks ago on a Hangout, Armando was using the Nova CLI with nova boot, and everything worked fine. This morning he was using the OpenStack client, and it was not auto-provisioning the network, and we were trying to figure out: is your DevStack actually running trunk code, what could be going wrong with this? And then I remembered that the OpenStack client doesn't default to the latest version; it doesn't do API version negotiation for the CLI, which the Nova CLI does. So if you're using the OpenStack client, by default you're not getting the microversion that's going to do this thing automatically.

All right, so everything is good, life is good, the demo is behaving. Let's delete those servers, and now we want to try a different deployment scenario. Let's wait for these servers to go away. So again, as we've seen right now, the user hasn't done anything except booting the servers, and the networking provisioning was done on his or her behalf. The VMs are gone.

What we can do now is use another clever command that was introduced in the Mitaka time frame, which is purging the tenant deployment. By doing neutron purge with the tenant or project ID, what Neutron is going to do is wipe out all the networking resources for that tenant. So the task that, prior to this command, would require the user to script, deleting the networks, the routers, the security groups, and so on and so forth, is done at the push of a single button. So you should see that network go away.

Yeah, we had to do that in Tempest, and it is not fun to code that up. It's client-side orchestration; it's not something the server does for you. Actually, there is a way to delete just the auto-allocated topology in Neutron: there's an API endpoint that takes the DELETE verb. It's just not exposed through the API bindings, so if you were to talk directly to the API endpoint, you could actually do that.
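Roughly, with the controller host and ports as deployment-specific placeholders:

    # Wipe all networking resources owned by a project in one shot.
    neutron purge $PROJECT_ID

    # Or delete just the auto-allocated topology by talking to the API
    # endpoint directly, since there is no client binding for it.
    curl -X DELETE \
        -H "X-Auth-Token: $TOKEN" \
        http://controller:9696/v2.0/auto-allocated-topology/$PROJECT_ID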
So the public network there is a router:external network created by the admin. What else I wanted to show you now is creating a shared network, which is obviously visible to all tenants, and then booting servers on it. You'll see that no tenant networking is provisioned when we boot the server or servers; instead, the shared network available to the project is used to plug the VMs. So I'm actually going to do that as admin: I'll create a network with an IP range, and then I'm going to flip back to the demo project, or tenant, and boot the servers. Let's do that and see what happens.

So you see now that no network gets provisioned under the hood by the tenant, and when I boot the servers, they plug into the shared network. Give it a sec; it's not a very beefy VM, and it's running on my laptop. There you go, they're plugged. Nothing else is required.

So with the ten seconds we have left, does anybody have one question? You could shout it out and I'll repeat it. If you could use the mic, or we'll just relay the question. I'll just relay it.

Is neutron purge only for auto-allocated networks? No, it's for every resource provisioned by the tenant, everything that's owned by the tenant. Correct. And the admin can also go and wipe out any tenant or project environment that he cares about, or is pissed off by.

All right. So I guess that deserves a round of applause; I mean, come on, that's an amazing demo that didn't fail. Thank you. So give it a try and give us feedback. Again, there's a forum discussion tomorrow; I should have actually added the note here, but I was lazy enough not to do that. Thank you very much for coming here and listening to us. Thank you.