Okay, thanks everyone for coming, and good afternoon. My name is Anik Mazumdar; I work for Cisco. I'm joined here by my colleagues Shixiong Shang from Nephos6 and Sharmin, also from Cisco. So what are we going to talk about today? Icehouse, right? Not very exciting, not anymore. But we're not going to talk about roadmaps here, or what's coming in Liberty or beyond; those are other sessions, and people are covering them already. We're just going to talk about this: we have IPv6 and we have public cloud. What is the importance of IPv6 in the context of public cloud? And then, how we took a small piece of work out of that and tried to productionize it. How was the journey to production? What worked out of the box, what didn't, what did we take from Juno, what did we have to develop ourselves, and how did we test it for a production-scale deployment? That's basically what this session is about: sharing our experience. That's why it's called a case study; it's not about roadmaps or anything like that.

So, IPv6 in public cloud: what's the relevance? This is my first summit, and when I hear people talking, there's a lot of talk about private cloud, a lot of talk about enterprise. But public cloud is a different ballgame, right? You're not in control of the workload, and you do not know who the tenant is or will be, so you should be ready to handle anything, and at very large scale. And large scale is exactly the problem with IPv4: the pool of public IPs is limited, and everybody is feeling the crunch by now. Even the RFC 1918 private ranges are a limited pool, so the provider literally cannot afford to number every VM uniquely. That's where the concept of overlapping IPs comes from, where every tenant has the same addresses, and then you need NAT and all the problems that come with it. IPv6 in public cloud takes all of that away: you don't need NAT, and you don't have the problem of overlapping IPs.

But the problem is adoption, right? It's a chicken-and-egg problem. You ask the cloud provider, do you have IPv6? And they say no. Okay, then why do I, as a tenant, need to worry about it? We'll keep working with IPv4 the way we always have. So that's what one of the bullet points says: one purpose of IPv6 in public cloud is to enable adoption. If the public cloud provider does not do it proactively, tenants won't come and won't use it, and the problem keeps getting worse. And why is it actually important to have IPv6 in public cloud? Things like the Internet of Everything and the Internet of Things, IoE and IoT, mean massive scale, millions of endpoints, which you simply cannot address with v4 addresses. You must have IPv6. Lots of telcos are already numbering their entire underlay, their physical network, with IPv6. Whenever you talk to a telco about hosting a public cloud in their environment, and they partner with somebody like Cisco, they ask: do you support IPv6? Because our infrastructure is purely IPv6 by now, and we want OpenStack to support IPv6 as well. So these are some of the reasons, and I don't want to go bullet by bullet, why IPv6 is very, very important in public cloud. This is a statistic.
These bars do not show absolute IPv6 adoption levels; the thing to look at here is that these are some of the major public cloud providers today, and their IPv6 support is at various levels. Some support it only on a shared public network; of course, I may be missing something. And some support it only at, say, the load-balancer VIP level: I support a v6 VIP, but the VMs themselves cannot be addressed with IPv6. So there are various kinds of adoption, but in any case, none of it is very deep; people are just starting to get there. As we go through Kilo and Liberty, the features are getting more mature, and the features that were already there in Icehouse are getting more stable. So as OpenStack matures, we are also in a position to adopt it more and more.

So what did we do? What was our case study? It was about enabling IPv6 in an Icehouse environment, using Red Hat OpenStack, for a pure provider network. That was the simplest use case. Why? Because with a provider network you're not dealing with the Neutron router. If I go for tenant networks, I have to understand how tenant networking works with IPv6, and that is a feature that was only available from Juno onwards; you have to stabilize and mature it first. So we wanted to see, at the simplest level, with a provider network where the router is external and I don't have to deal with a lot of OpenStack machinery: is it easy enough? Is it a cakewalk? If it is, we'll go to the next step. But in that venture we found out several things, which my colleagues will also explain.

So where did we make the major changes? Not changes exactly; I'll say touch points. You see the red stars there. Many of the changes were at the network node, where we had to do several things. We did not change anything in the APIs, but we did put in some validation code: when you enter, say, an IPv6 default gateway, is it a valid default gateway? Is it something that can actually be used? We introduced some of that validation so it can actually work, and also, on the compute node side, security groups for IPv6, just to make sure SLAAC works and you can actually receive an RA announcement from the provider router, things like that.

So what did the logical scope look like? First of all, we used dual stack. Why? Because some of the OpenStack services still need IPv4; metadata, for example, cannot work without IPv4. Dual stack is also how most people actually use IPv6 today; there are still so many things depending on IPv4. We also used provider routers, because it is a provider network, so the router is external: it's actually a pair of Cisco Nexus 9000 switches. The RAs, the router advertisements, were sent from those routers. One thing you might note here are the RA flags, the A/M/O bits: the M bit is zero and the O bit is one. That means we enabled SLAAC for address assignment and allocation, but we also used stateless DHCPv6 for optional information like the DNS server. That is how our setup was. We used dnsmasq for the stateless DHCPv6 part; it's not a separate process, just the same dnsmasq, also handing out optional information like the DNS server.
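To make that setup concrete, here is a minimal sketch of how the dual-stack provider network and its subnets could be modelled, assuming Juno-style `ipv6_ra_mode`/`ipv6_address_mode` subnet attributes backported into Icehouse, as described in this talk. The network type, prefixes, credentials, and gateway address below are illustrative placeholders, not the production values.

```python
# Illustrative sketch only -- not the production tooling or real addresses.
from neutronclient.v2_0 import client

neutron = client.Client(username='admin', password='secret',
                        tenant_name='admin',
                        auth_url='http://controller:5000/v2.0')

# Shared provider network; segmentation details are assumptions.
net = neutron.create_network({'network': {
    'name': 'provider-dualstack',
    'provider:network_type': 'vlan',
    'provider:physical_network': 'physnet1',
    'provider:segmentation_id': 100,
    'shared': True}})['network']

# IPv4 subnet: still needed, e.g. the metadata service is IPv4-only.
neutron.create_subnet({'subnet': {
    'network_id': net['id'], 'ip_version': 4,
    'cidr': '10.1.0.0/24', 'gateway_ip': '10.1.0.1'}})

# IPv6 subnet: SLAAC for addressing (RAs come from the physical Nexus
# routers), stateless DHCPv6 from dnsmasq for options like DNS servers.
# The gateway is the routers' link-local address, per the subnet
# gateway parameter mentioned later in the talk.
neutron.create_subnet({'subnet': {
    'network_id': net['id'], 'ip_version': 6,
    'cidr': '2001:db8:1::/64',
    'gateway_ip': 'fe80::1',
    'ipv6_address_mode': 'dhcpv6-stateless'}})
```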
Now we come to the next section, the challenges that we faced. Thanks, Anik. In the following sections we're going to cover a lot of the security-related issues we encountered, the operational and functional gaps, and what we did for scale testing over this period of implementation.

Starting with one of the most interesting security vulnerabilities we identified: a reconnaissance attack. If you understand how Neutron creates a port, it also creates a Linux bridge and the accompanying TAP interface, and it plugs the OVS interfaces into this bridge. By virtue of SLAAC, IPv6 addresses get automatically configured on these devices, the qbr bridges and the qvo interfaces. What happens then is that once these interfaces have auto-configured their IPv6 addresses, an attacker from within a VM can scan all of these interfaces on the hypervisor it resides on by doing a mere ping6 to ff02::1, which is the all-nodes multicast address. In response, all of these interfaces expose their configured IPv6 addresses back to the attacker. So that's one sweep of reconnaissance the attacker can do. Now, there are several services running on the compute host, like SSH, that also listen on IPv6, and with the right set of credentials the attacker could gain SSH access into the hypervisor through one of these auto-configured link-local addresses.

The manner in which this vulnerability was addressed, and this is one way of addressing it, there may be other options, but the one recommended for plugging this hole, was to globally disable IPv6 on the host for all current devices and anything created in the future, such as interfaces for new virtual machines on that hypervisor. So: disable IPv6 globally at the host level, and no interface auto-configures an address when a device comes up. We implemented this fix and verified that none of the qvb, qvo, or Linux bridge interfaces received an IPv6 address, so nothing was pingable and no scan could be performed by a potential attacker. In doing so we also discovered a very fundamental bug in Neutron, which we'll cover in one of the subsequent sections.

All right, thanks, Sharmin. On the next slide I'm going to lean a little more heavily on the IPv6 side. Sharmin shared one of the major security concerns we discovered during production testing, but in the IPv6 world there is also the notion of first-hop security, which gives you a list of tools to build your first line of defense on the router or switch sitting at the edge: for example IPv6 RA guard, DHCPv6 guard, source guard, destination guard, prefix guard, and implicitly device tracking. Following the same best-practice recommendations, we introduced RA guard for the first time into the Icehouse release; what we actually did was backport the code from Juno and make it available and working in the Icehouse environment. Conceptually it's very simple: we only want to allow router advertisements from the legitimate address, in this case the link-local address of the tenant network's default gateway sitting on the upstream router. By doing so, an attacker cannot send bogus RAs to any VM instance in the network. You may be wondering how this link-local address is actually fed to Neutron: as a matter of fact, the link-local address of the tenant gateway is provided as the gateway parameter during the subnet creation process.
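Coming back to the reconnaissance fix for a moment: operationally, globally disabling IPv6 on the hypervisor boils down to two sysctls, one for existing interfaces and one for anything created later. This is a minimal sketch of the idea, not the exact deployment tooling, and the persistence file path is an assumption.

```python
# Sketch of the host-level mitigation: stop qbr/qvb/qvo/tap devices from
# auto-configuring IPv6 addresses that a tenant could sweep via "ping6 ff02::1".
import subprocess

SYSCTLS = {
    'net.ipv6.conf.all.disable_ipv6': '1',      # every existing interface
    'net.ipv6.conf.default.disable_ipv6': '1',  # interfaces created in the future
}

def disable_host_ipv6():
    for key, value in SYSCTLS.items():
        subprocess.check_call(['sysctl', '-w', '%s=%s' % (key, value)])
    # Persist across reboots; the file name here is illustrative.
    with open('/etc/sysctl.d/99-disable-ipv6.conf', 'w') as f:
        for key, value in SYSCTLS.items():
            f.write('%s = %s\n' % (key, value))

if __name__ == '__main__':
    disable_host_ipv6()
```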
By the same token, we also introduced another feature called DHCPv6 guard. This is completely new to Icehouse and also new to the community. DHCPv6 guard blocks DHCPv6 information-reply messages coming from unauthorized DHCPv6 servers. To achieve this, when the Neutron server creates the dynamic security group rules for a port and realizes that the port is tied to an IPv6 subnet running in DHCPv6 stateless mode, it automatically adds one more rule to the port, which says: I will only allow information-replies coming from the link-local address of the interface dnsmasq is bound to inside the qdhcp namespace. In other words, if an attacker sitting in this OpenStack cluster tries to mimic the server and send out wrong information to poison your DNS name server configuration, that reply cannot get through. I also mentioned a couple of other security tools, such as destination guard, but those rely heavily on the edge router or switch, so we're not going to cover them in this presentation.

Just in case anyone in the audience plans to use a Nexus 9K, or any Nexus product, with an IPv6-enabled OpenStack cluster, I want to bring a couple of things to your attention. The first one: on any Layer 2 segment, please remember to turn off OMF, Optimized Multicast Flooding, because otherwise all attempts at IPv6 neighbor discovery are going to fail. So please remember to turn it off. And in case you have a vPC in place, we also recommend the "ipv6 nd synchronize" command to accelerate address table convergence between your vPC peers. On the Layer 3 side, as you can see on the right-hand side, we also recommend turning on an HSRP instance to provide redundancy for your tenant gateway IP address. Next I'm going to hand it over back to Sharmin.

These are some of the operational challenges we hit when we tried to productionize this. There were several others, but these were among the major issues we had to resolve, fix, identify, or just record as open items or caveats. One of the first things we did was image support testing for SLAAC and stateless DHCPv6, because that was the scope of our requirement. For SLAAC, all the images passed fine; there was no issue with SLAAC as a function. As far as Windows was concerned, SLAAC and stateless DHCPv6 worked as-is for the 2008 and 2012 images. For the CentOS and RHEL images, there is a bug in the DHCP client script where, in the absence of NetworkManager, it overwrites the IPv4 and IPv6 DNS name servers in resolv.conf, so we had to enable NetworkManager to address that issue. In the case of Ubuntu, in addition to having to enable NetworkManager, there is still an outstanding issue in the glibc resolver, which imposes a limit on the number of DNS servers that can be listed in /etc/resolv.conf; if you exceed that number it truncates the DNS server list, so that was one more thing that needed to be noted.
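To make the two first-hop-security features described above a bit more concrete, the backported RA guard and the new DHCPv6 guard amount to per-port filtering rules of roughly the following shape. This is only an illustration: the chain name and link-local addresses are made up, and in production the rules are generated by the patched security-group code, not by a standalone script like this.

```python
# Illustrative flavour of the per-port ip6tables rules behind RA guard and
# DHCPv6 guard. Chain name and addresses are hypothetical.
import subprocess

PORT_CHAIN = 'neutron-openvswi-i1234abcd'   # hypothetical per-port ingress chain
GATEWAY_LLA = 'fe80::1'                      # LLA of the provider router (subnet gateway_ip)
DNSMASQ_LLA = 'fe80::f816:3eff:fe00:1'       # LLA of dnsmasq's interface in the qdhcp namespace

rules = [
    # RA guard: accept Router Advertisements only from the legitimate gateway...
    ['-p', 'icmpv6', '--icmpv6-type', 'router-advertisement',
     '-s', GATEWAY_LLA, '-j', 'RETURN'],
    # ...and drop RAs from any other source (e.g. a rogue radvd in a VM).
    ['-p', 'icmpv6', '--icmpv6-type', 'router-advertisement', '-j', 'DROP'],
    # DHCPv6 guard: accept server-to-client replies (port 547 -> 546) only
    # from dnsmasq's link-local address...
    ['-p', 'udp', '-s', DNSMASQ_LLA, '--sport', '547', '--dport', '546',
     '-j', 'RETURN'],
    # ...and drop DHCPv6 server traffic from anyone else.
    ['-p', 'udp', '--sport', '547', '--dport', '546', '-j', 'DROP'],
]

for rule in rules:
    subprocess.check_call(['ip6tables', '-A', PORT_CHAIN] + rule)
```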
In terms of IPv6 system-of-record inconsistencies: this is a corner case, but nonetheless a real one. The scenario occurs when there is a virtual machine on an IPv4-only network and, at some subsequent point in time, you add an IPv6 subnet to that network. By virtue of SLAAC, when the VM receives an RA it auto-configures its interface with an IPv6 address, which is fine, that's expected behavior. But what happens in OpenStack is that there is then no system of record for that IPv6 address in the Neutron database; it's an out-of-band configuration, so no ip6tables rules get generated on the compute node for that particular VM. So that's something that is still an open item. It's a corner case, and it's not a small patch by any measure, but it needs to be discussed in the community.

Now, as far as the IPv6 gating criterion is concerned: if you recall from the previous slides, we addressed the reconnaissance attack issue by globally disabling IPv6 on the hypervisor, and this actually manifested in another issue, a fundamental bug in the Neutron code base. In theory, disabling IPv6 globally on the hypervisor should not affect IPv6 traffic being forwarded to the guest VMs. But what we saw is that when we disabled IPv6 on the host, all of the ip6tables rules got wiped out for the guests on the TAP interfaces. When we dug down deep into the code, we realized that the iptables manager actually uses this very same flag as a gating factor to decide whether or not to generate ip6tables rules on the compute host. In reality it has no bearing, no connection, but that was an issue, and a bug has already been submitted to the community.

In terms of subnet validation between v4 and v6: we discovered this very recently, and we happened to stumble upon it when we enabled a certain flag, force_gateway_on_subnet, which really does IPv4-style validation. It triggered a new bug: IPv6 subnet validation went through the same routine, and that does not bode well, because IPv4 addressing schemes are different from IPv6, so it ended up invalidating a valid subnet creation scenario. That fix is still open and needs to be put into the community.

From a compute host perspective, this one is not related to IPv6, but it's something we realized when we did our scale testing: beyond a certain load we started seeing inconsistencies with the TAP interfaces. TAP interfaces on the compute host would intermittently go down for no apparent reason, blocking traffic. What we realized was that NetworkManager on the host was interfering, trying to manage TAP devices that it didn't even create, and putting them down for no apparent reason. So we had to disable NetworkManager on the compute hosts. So that's a high-level flavor of the issues we encountered and mitigated along the way.
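To illustrate why the IPv4-style gateway validation breaks the IPv6 case just described, here is a small sketch: with a force_gateway_on_subnet-style check, a perfectly valid IPv6 setup (global prefix on the subnet, link-local gateway learned from the RA) fails the "gateway is inside the subnet CIDR" test. The prefixes and addresses are illustrative.

```python
# Why "gateway must be on the subnet" is the wrong check for IPv6.
import ipaddress

def v4_style_check(cidr, gateway_ip):
    """Roughly what force_gateway_on_subnet-style validation enforces."""
    return ipaddress.ip_address(gateway_ip) in ipaddress.ip_network(cidr)

# IPv4: the gateway normally lives inside the subnet, so the check is sensible.
print(v4_style_check('10.1.0.0/24', '10.1.0.1'))        # True

# IPv6 provider network: VMs get 2001:db8:1::/64 addresses via SLAAC, but the
# default gateway learned from the RA is the router's *link-local* address,
# which is, correctly, not inside that prefix.
print(v4_style_check('2001:db8:1::/64', 'fe80::1'))     # False -> valid subnet rejected
print(ipaddress.ip_address('fe80::1').is_link_local)    # True  -> still a valid v6 gateway
```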
Coming to scale testing and performance testing: our requirements were very focused, we had very specific use cases to deal with. Our objectives were primarily a 4,000-interface testbed for IPv4 and for IPv6 each; generate ICMP traffic against these interfaces over both IPv4 and IPv6; test the resiliency of the DHCP agents and metadata agents, because those were the components in focus for enabling this feature; and ensure that the stability of the testbed was maintained over a period of time, so we left the testbed running for a few days and then ran the ping tests again to make sure nothing had gone down, things like that.

From a process and tooling perspective, we built an in-house Python package that generated concurrent requests for all of these actions: boot VMs, reboot them, ping them, gather stats, do some preliminary analytics (nothing fancy, but enough to get us going with our outputs and numbers), capture console logs to make sure metadata behaved the way it was supposed to, customize VMs, things like that.

From a scenario perspective, we executed three to four scenarios: 2,000 dual-stack, dual-vNIC VMs; dual stack and dual vNIC just to get to the 4,000-interface scenario. At any given point in time we generated a load of 50 concurrent boots or reboots, and this load was generated from another blade, a 40-vCPU blade, so we ensured we had full concurrency when generating the load. In total that's 8,000 interfaces across three networks, with all the usual operations happening: DHCP offers, acks, requests, replies, and so on.

On the scale testing topology: our control plane had about 35 virtual machines, basically the OpenStack control plane, Keystone, Glance, the APIs, the MySQL Galera cluster, the RabbitMQ clusters. On the data plane we had 18 compute nodes, each with 2x10 physical cores, with an oversubscription ratio of 4x on CPU and 1.5x on RAM, which gave us about 160 VMs per node. We had DHCP agents running on four network nodes, but at any given point in time two DHCP agents were serving each network, and we had three networks in total; all the VMs were distributed across these three networks for IPv6 and IPv4. For storage we used a shared Ceph cluster, and the target test VM itself was a very minimal flavor, a single vCPU and one gig of RAM; the image we used was CirrOS, and it was a dual-NIC VM, again just to simulate the 4,000 interfaces.

You can see from the scale testing results that once the testbed was up we had an even distribution of VMs across all the compute nodes. Some raw statistics: for the most part the VM boot times were in a range of about 20 to 60 seconds, sometimes going all the way up to three minutes. At the tail end of the scale testing we had some control-plane-level errors, but we did meet our basic test criteria and requirements from a scale perspective.
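The actual harness was the in-house Python package mentioned above, but the core load-generation pattern was roughly a pool of 50 workers driving boot requests through the Nova API. The sketch below is only that pattern, under assumed credentials, image, flavor, and network IDs; it is not the real tool.

```python
# Stripped-down sketch of the concurrent boot load: 50 parallel requests,
# dual-vNIC VMs. All IDs and credentials are placeholders.
from concurrent.futures import ThreadPoolExecutor
from novaclient import client as nova_client

nova = nova_client.Client('2', 'admin', 'secret', 'admin',
                          'http://controller:5000/v2.0')

NETWORKS = ['NET1_UUID', 'NET2_UUID']        # dual vNIC: one port per network

def boot_vm(index):
    return nova.servers.create(
        name='scale-vm-%04d' % index,
        image='CIRROS_IMAGE_UUID',
        flavor='1',                           # minimal 1 vCPU / 1 GB flavor
        nics=[{'net-id': net} for net in NETWORKS])

# 2,000 VMs, never more than 50 boot requests in flight at once.
with ThreadPoolExecutor(max_workers=50) as pool:
    servers = list(pool.map(boot_vm, range(2000)))
```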
I'm going to hand it back to Shixiong for some performance analysis.

Sharmin shared the scaled environment with you, and now I want to switch gears a little bit and talk about IPv6 performance in this scaled environment. As you may recall from the VM distribution chart, there are roughly about 100 VM instances on each compute node. As a first step, let's take a look at how ping performs on those compute nodes with 100 VMs. To collect this result we designed a test: we issue 100 pings to a VM at both its IPv4 and its IPv6 address simultaneously, and then we repeat the same procedure across all of the 100 VMs on the same compute node. That way we have a larger volume of samples we can use to calculate the minimum, maximum, and average ping values. On this chart you may also notice that I intentionally built the correlation as a side-by-side comparison between IPv4 and IPv6, so you can take a look, and I will work through all of the charts with you in the next couple of slides.

Let's first focus on the charts at the top. You can see that for IPv4 the minimum response time ranges from, let's say, 0.2 milliseconds to 0.26 milliseconds, but if you move your focus to the right-hand side, the maximum response time is somewhere between 2.5 milliseconds and 5 milliseconds, and sure, in this case you can see some outliers, with the average dancing around 0.45 milliseconds. So this is how IPv4 ping performs in this scaled environment. Now if you look at the charts at the bottom, which capture the IPv6 ping, again minimum, maximum, and average, you can see the extreme similarity.

As a next test, we moved up the stack a little bit, because ping only reaches a certain level in the OSI model. So we did more tests to understand IPv6 performance for TCP traffic and also UDP traffic. Remember those VMs we talked about, all running on the same compute node: here we add two more, one acting as an iperf server and another acting as an iperf client, and for each traffic run I leave iperf running for 200 seconds so it can pump traffic from the client toward the server. Since we're talking about iperf, I want to highlight two things, because this is fairly involved testing. First, the data we collected represents only a single stream; in other words, for TCP you only see the throughput of one TCP stream, and the same goes for UDP. Second, to avoid any further confusion, we used all the default settings: the default MSS for TCP and the default payload size for UDP; we did not change anything.

Coming back to the charts, you can see TCP actually performed pretty well in this case: for IPv4, TCP hit up to 13.1 gigabits per second, and if you look at the IPv6 data it's in the same ballpark at 12.7 gigabits per second, a little bit slower primarily because the IPv6 header is larger, 40 bytes versus 20 bytes for the IPv4 header, but in sum it's still pretty close. If you look at UDP, remember we're still talking about a single UDP stream with the default MTU of 1500: UDP for IPv4 averaged 648 megabits per second, while for IPv6 the average UDP throughput stayed at 603 megabits per second.
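Going back to the ping measurements for a moment, the collection itself is straightforward: per VM, run 100 pings against the v4 and the v6 address and parse the rtt summary line, then aggregate min, max, and average across all the VMs on the node. A minimal sketch, with placeholder addresses:

```python
# Minimal sketch of the latency collection: 100 pings per VM per address
# family, parsed from the "rtt min/avg/max/mdev" summary line.
import re
import subprocess

RTT_RE = re.compile(r'rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/')

def ping_stats(address, count=100):
    cmd = ['ping6' if ':' in address else 'ping', '-c', str(count), address]
    out = subprocess.check_output(cmd, universal_newlines=True)
    mn, avg, mx = RTT_RE.search(out).groups()
    return float(mn), float(avg), float(mx)

# Placeholder VM addresses; in practice this list came from the Nova/Neutron APIs.
vms = [('10.1.0.11', '2001:db8:1::11'), ('10.1.0.12', '2001:db8:1::12')]
for v4, v6 in vms:
    print(v4, ping_stats(v4))   # (min, avg, max) in milliseconds
    print(v6, ping_stats(v6))
```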
That does not necessarily mean UDP performs worse than TCP: if you use jumbo frames, or run parallel streams at the same time, you can see UDP very easily exceed a gigabit per second.

Now, in the second scenario we did our testing slightly differently. We still have two VMs, but in this case the two VMs are spread across two compute nodes. The implication is that traffic not only traverses the local Open vSwitch, not only br-int but also br-ex; it also goes through the physical interface, which has 10-gig capacity. Because of this physical limit on the pipe we can offer, you quite naturally see that TCP throughput stays at roughly 8.57 gigabits per second, which is still pretty good, right? That means you're using about 85% of your pipe. Looking at IPv6, it's very, very close: 8.29 gigabits per second between these two VM instances across two different compute nodes. For UDP it's quite interesting: the data actually smoothed out quite a bit here, you can see it's almost a straight line already. The IPv4 UDP throughput averages around 690 megabits per second, and for IPv6 the UDP throughput averages 682 megabits per second.

So in summary, I don't want to put too much emphasis on comparing TCP against UDP; instead I encourage you to look at these charts from the perspective of IPv4 versus IPv6, because the point I want to make is this: in case any of our friends here plan to launch IPv6 in your cloud or in your traditional network, I know from our previous discussions with customers that they always have a doubt in their mind about how well IPv6 performs versus IPv4. So I hope the test results I shared with you across all of these charts will help you make your decision.

Oh, I see, yes, that's a very good point, sorry about the confusion. In this case, like I mentioned, for every traffic run I leave iperf running for 200 seconds, and every 2 seconds I print out the statistics, which show the throughput in real time. So what that means is that by the end of the testing cycle I have 100 samples for that particular testing period, and that is the data I'm sharing with you in this diagram.

Now, Sharmin and Anik have both shared with you a lot of the challenges and issues we encountered during this testing, and you may be wondering what kind of changes we made to the Icehouse release. On this slide I tried to summarize a couple of things that we believe added value, not only to this production system but also, going forward, to the community. Coming to the features: officially, Icehouse does not support IPv6 in the community, right? However, that does not necessarily mean that you, the user or the cloud provider, don't get to enjoy the benefits of IPv6. As a matter of fact, we patched the Icehouse source code. There are a couple of main changes, and I'll share two of them with you. On the controller side, we added a lot of checkpoints to make sure the incoming API calls have the right format; for example, as Anik mentioned earlier, the IPv6 gateway address must be a link-local address. In addition, we also optimized the security group policy code in order to support the new features like DHCPv6 guard. On the network node we actually fixed a lot of bugs, and I'll share three of them here. In the first case, when we evaluated the dhcp.py file, one thing we noticed is that it treated DHCPv6 stateful mode in exactly the same way as DHCPv6 stateless mode. However, that's not right; there is a very subtle difference between the two modes.
If you think about it from the Neutron subnet-creation perspective, it is quite likely that if you want to use DHCPv6 stateful mode you are going to have an allocation pool, maybe more than one. The DHCPv6 stateless mode, on the other hand, doesn't really care, because all it offers is optional information such as name servers. For these reasons we actually restructured that particular method, the spawn process, so that we can differentiate the two scenarios, DHCPv6 stateful versus DHCPv6 stateless.

The second example I'll share here is also very interesting, because by design, when you use an IPv6 subnet running in SLAAC mode, you should not launch a dnsmasq process at all. However, for a Neutron network configured with only one IPv6 subnet, the code actually has a bug, and this bug leads to it launching a dnsmasq process without any dhcp-range; in other words, it is completely useless. Over time you start noticing quite a few stale processes running on the network node, and then you realize this is a bug; it is not at all obvious if you use dual-stack mode. The last one, at least in the Icehouse code and also in the Juno code, is that it tries to insert the default route for an IPv6 subnet in the format 0.0.0.0, but IPv6 does not accept that format.

While we were fixing all of these bugs, we started wondering how these issues had never been caught by the unit tests. So we looked at the unit tests more carefully, and we realized that even up to the Juno release, the unit test for launching the dnsmasq process for an IPv6 subnet is skipped, there is no unit test code for it, and the tests for security groups and ip6tables are also skipped. That probably explains how some of the bugs I shared with you made it into the upstream community. This gave us the feeling that there is quite an urgent need to work on the testing side, and as a result we are working very actively with the Cisco QE team. We not only bridged the gap highlighted here in the unit test section, we also developed 22 more Tempest test cases covering functional tests, API tests, and negative tests, and at the same time we developed quite a few test cases for scalability and performance testing. In case you are interested in what we did, don't worry: we are going to contribute all of our code back to the community.
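To make those dhcp.py changes a little more concrete, the substance of the fixes boils down to a decision like the following for each IPv6 subnet. This is a simplified sketch of the rules described above, not the actual Neutron code.

```python
# Simplified decision logic behind the dhcp.py fixes: what each IPv6
# address mode implies for dnsmasq on the network node.
SLAAC = 'slaac'
DHCPV6_STATELESS = 'dhcpv6-stateless'
DHCPV6_STATEFUL = 'dhcpv6-stateful'

def dnsmasq_plan(subnet):
    """Return (launch_dnsmasq, needs_allocation_pool) for an IPv6 subnet."""
    mode = subnet.get('ipv6_address_mode')
    if mode in (SLAAC, None):
        # Addresses come purely from RAs; spawning dnsmasq here is the bug
        # that left useless, stale processes on the network node.
        return (False, False)
    if mode == DHCPV6_STATELESS:
        # dnsmasq only hands out options (DNS servers, etc.); no dhcp-range
        # or allocation pool is required.
        return (True, False)
    if mode == DHCPV6_STATEFUL:
        # dnsmasq assigns addresses, so at least one allocation pool
        # (and a matching dhcp-range) is mandatory.
        return (True, True)
    raise ValueError('unknown ipv6_address_mode: %s' % mode)

# Example: a SLAAC-only subnet should result in no dnsmasq at all.
print(dnsmasq_plan({'ipv6_address_mode': SLAAC}))   # (False, False)
```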
Now let me hand the floor back to Anik.

I think one of the key takeaways is that in OpenStack, when a feature comes out, there is a big difference between something being functional and something being production-ready. We went through all this pain because we had real customer demand behind it: we wanted a scalable network, a scalable infrastructure, with IPv6, that does not go down. None of these things come up until a customer is actually running IPv6. One general attitude is, "we have a problem with IPv6, let's try this and that and solve the problem"; but it's not one problem, it's a whole series of problems to solve. So it's very important that operators who are actually trying to do these kinds of deployments come back and give this feedback to the community; otherwise it always remains functional code, and everyone has to troubleshoot on their own and reinvent the wheel multiple times. Everybody who tries to do something like this will have to go through the same pain, but just by sharing it, we save them the trouble.

So what are the next steps? What do next steps mean? We are done with the provider network scenario; what other things do we need to do in order to have a functional public cloud with IPv6? One of the major ones is tenant networking. The tenant networking code is there in Juno and Kilo, but one of the unsolved problems is how you provide external connectivity. We were in an IPv6 session just yesterday; prefix delegation and other ideas are being floated, and it's the right thought process, but the code is still not there. Without something like that, well, today in the v4 world everyone brings their own addresses. That doesn't work with v6, because if you bring your own address I need to NAT it in order to route it to the internet, and I cannot NAT it because it is IPv6; nobody wants to support NAT in IPv6, and my ISP won't accept those prefixes. So it's a chicken-and-egg problem, and it is a very critical problem to solve in a public cloud scenario. Why is it not so critical in private cloud? Because in private cloud you can always put something else behind the OpenStack cloud and handle it there; in public cloud we have to be absolutely compliant with the standards.

We also want to give tenants the choice to go IPv6-only, v4-only, or dual stack; today we only support dual stack. One of the bugs that Shixiong, or I think Sharmin, mentioned is that if you add a v4 subnet after you have spun up a v6 subnet on a network, sorry, the other way around: if you add a v6 subnet to a network where a VM already exists, that VM gets a v6 address without having any entry in the Neutron database. Now, it's a behavior question too, a philosophical debate: should the tenant have a choice about whether an existing VM gets an IPv6 address or not? Such things need to be looked at. Multiple prefixes are also very important, because one of the things IPv6 preaches is that you really do not need the IP-plus-port concept to represent an application; you can just as well have multiple v6 addresses, each representing an application, because there are so many of them. Why do you need two constructs to represent a single thing? So we also want to support multiple IPv6 prefixes, which I believe is mostly there already. One other thing we want to support is something like Amazon's Direct Connect, which allows a tenant to connect to their enterprise network over a private connection, not only to the internet.

So that's about it. We shared our experience, hope it was useful. Thank you.