OK, so I've been given the signal to start. How do you like our new logo? Some people say it looks like Google, but really, we started before Google; it's not fair. Anyway, my name is Jesse Martin, and I'm a cloud architect at eBay. You may have seen a recent press release about what we are doing with Nicira. I don't know if it got attention because Nicira was just acquired by VMware for a big price, or because everybody was excited about what we are doing; I'd like to think it's the second one. So I'm going to talk about what we are doing at eBay, going from a fairly high level to pretty detailed, because I want to make sure you understand what we are really doing behind the SDN hype, and how we are using SDN to run our developer cloud. You may wonder what's wrong with my title; it's not showing, so this is going to be interesting.

eBay is not really a public cloud provider; it's not a cloud provider at all. But we have many different businesses: you know eBay, we have PayPal, and we have a bunch of others. So even though we are an internal provider for the eBay Inc. organization, we still have a lot of different businesses that are potential users of our infrastructure. Those users may also have different environments. For example, at eBay we have a production environment; a production secure environment, where we put all the information that identifies users; a QA environment; and a developer environment. It's the same for PayPal: they have production, a PCI-compliant environment, QA, and developer. And it's the same for all the others. So even though we don't have a typical public cloud multi-tenant infrastructure, we still have a lot of different environments to support. Traditionally, those environments were implemented as physically isolated environments, and that creates a lot of problems, which we'll talk about.

So we set out to convert this infrastructure, and we looked at the principles we wanted to apply in building it. The first one is that we should be able to deploy any application anywhere: if you have resources available in one part of the data center, you should be able to use them for the front end, even if only QA was using that part before. The second is soft cabling: if you want to create a new network, you should not have to go into the data center and wire machines to different networks. The third is shared, standardized infrastructure. That's very important, because the fragmentation I was talking about is also caused by a lot of different infrastructure requirements: we used to have more than 100 different server types at eBay, and we are reducing that to a few standard servers that users can select from, which greatly improves our efficiency and our ability to automate. The next is to virtualize everything. Virtualization doesn't only mean running a hypervisor on top of a compute server; it means that everything we do should be abstracted from the application, which covers network virtualization, storage virtualization, and compute virtualization.
The main reason is that we don't want the application to depend on some specific characteristic of the hardware: if we change a hardware model or characteristic, the application should not be impacted. And the last principle, the most important one, is to automate everything.

Based on those principles, plus the problems we had before, we had to convert all those physical environments into something that can be automated and deployed more easily. First we converted each physical environment into what we call a class of service. A class of service, if you want, is an extension of an SLA or quality of service: it captures everything you would implicitly implement in a physical environment. If you have a QA environment, you are going to have firewall rules, access permission definitions, specific support contracts, and so on. We capture all of that in a logical entity that we call a class of service. Then, when someone creates a project, a logical environment, we ask them: is it a production environment? A developer environment? A QA environment? An external environment that is going to be completely externally facing? Based on the answer, we control how the infrastructure is configured. In a typical cloud, the provider has one set of rules on what tenants can do, and it applies to all tenants. Here, depending on the type of project the tenant is deploying, we reconfigure the provider-specific constraints and capabilities to fit that class of service. This allows us, for example, to run the QA, production, developer, and external classes of service together on the same infrastructure: we reconfigure the provider infrastructure around each logical environment so that we can enforce all of its restrictions and obligations.

The other aspect of what we are building is the shared infrastructure, so let me go into the details of our network. The main design goal for our network was scale, and we wanted it to be as simple as possible, so that the network engineering team would not have to build environment-specific networks. The design is based on spine and leaf, which is pretty much the industry standard for scalable networks. It's a layer three network, so there is routing happening at every level; the drawing here uses the icon of a switch, but it's really a router-switch. The size of the infrastructure depends on the number of ports you have in your spine. You could use the same switch in your spine as in your leaves, which means 48 ports here, and 48 times 48 would be the maximum number of servers you can have in that topology: 2,304. At eBay we call that topology a bubble; some people might call it a pod, I don't know what the official term is. The key property is that we want line rate, or a consistent oversubscription, between any two nodes in that infrastructure: consistent bisection bandwidth and latency across all the nodes. To achieve that, since there are multiple paths between any two nodes, we use OSPF and ECMP to find the best paths.
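To put numbers on that topology, here is a quick sketch of the bubble capacity math, plus a toy version of how ECMP keeps each flow on one of the equal-cost paths. The hashing detail is my illustration, not something from the talk; real routers hash the packet 5-tuple in hardware.

```python
import hashlib

# Back-of-the-envelope math for the "bubble" described above.
spine_ports = 48                      # same 48-port switch in spine and leaf
leaves = spine_ports                  # one leaf per spine port
servers_per_leaf = 48
print("max servers per bubble:", leaves * servers_per_leaf)   # 2304

def ecmp_pick(flow: tuple, paths: list) -> str:
    # Toy ECMP: hash the flow identifier so all packets of one flow stick
    # to one of the equal-cost paths.
    digest = hashlib.md5(repr(flow).encode()).digest()
    return paths[digest[0] % len(paths)]

print(ecmp_pick(("10.0.1.5", "10.3.2.9", 6, 44120, 80),
                [f"spine-{i}" for i in range(4)]))
```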
Now, if you look at this infrastructure, you ask: how do you do isolation? You have a large infrastructure, somewhere between 5,000 and 10,000 nodes, and you cannot easily interpose firewalls or control points. If you look at the options, at layer two, one option is to build smaller physical infrastructures. It's not really efficient: first, you have to physically build that infrastructure for any new project or new class of service you want to start. It creates fragmentation, because the size of each network is not really flexible. And the isolation is quite coarse-grained, because every node within the same network has the same reachability to every other node; you cannot isolate this node from that node, though you can isolate this network from that network. The best thing about it is that there's real physical isolation: you can guarantee that a tenant here cannot see a tenant there, and it's foolproof, because there's no complex technology involved. Maybe the most complex part is the firewall between the two networks, and that's a pretty standard design.

The other option is VLAN-based. You keep the same infrastructure, but instead of being flat layer three, you have VLANs; you assign machines or VMs to a specific VLAN, and you can put a firewall between two VLANs. It's pretty complex, because you have to do VLAN assignment for every project or class of service, and you are limited in how many networks or projects you can create on top of that infrastructure: there's the limit of 4096 VLANs. There are techniques to increase that number, but that's not the main issue. The main issue is that it creates a large layer two fault domain. If you have a problem in one of those layer two VLANs, it may impact your whole network because of the way STP, the Spanning Tree Protocol, operates. It takes time to fix issues in STP, and you might end up with an outage while the protocol recalculates your networks if any kind of miscabling happens. The good thing is that it provides layer two isolation: you can have layer two adjacency between nodes, they can use the same IP address space and the same network, and it's somewhat soft cabling, because you don't go into the data center and pull wires; but you do have to reconfigure network switches to add a new tag or to allow trunking. So that's layer two.
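For reference, the 4096 figure comes straight from the width of the 802.1Q VLAN ID field; the VXLAN comparison is my addition, as one example of the larger key spaces tunneling protocols use.

```python
# 802.1Q reserves 12 bits for the VLAN ID, so a network can carry at most
# 2^12 segment IDs (and a couple of those, 0 and 4095, are reserved).
vlan_ids = 2 ** 12            # 4096
# Overlay tunnel keys are much wider, e.g. a 24-bit VXLAN VNI:
vxlan_vnis = 2 ** 24          # 16,777,216
print(vlan_ids, vxlan_vnis)
```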
The third option is layer three. You go back to my first scaled-out network, where everybody can talk to everybody, and you put in layer three filters: for example, security groups in the hypervisor, or, if you are on VMware's hypervisor, something like vShield, which redirects all the VM traffic to a local firewall that provides the isolation. The good side is that it keeps the scalable spine-and-leaf layer three design I mentioned before, and it's good for user policies, because it's pretty lightweight to add security groups and the rules that go with them. But the problem is that when you try to combine those security groups with provider security, it becomes very complex. Look at Amazon security groups: by default, you can send traffic to everybody. But what if you have a provider rule that says no, you cannot reach this part of my network, or there is some service you cannot reach? The combination of all those rules starts to get very complex. So I think security groups are good for tenant-specific policies, which are a specialization, or if you want a restriction, on top of the provider policies, but I don't think they are very good when you try to combine the two. The other problem is the management of rules, because those rules have to live on every server, and if you modify the membership of a destination in your rules, you have to update all the servers. For example, if you say all my VMs can talk to this group of machines, and you add a machine to the target group, you have to update the rules on every server. Even if the firewall technology you use does that for you, at the end of the day it has to modify the rules on every server where that rule is specified. And if you are using L3, those machines may be in completely different subnets, and it's very difficult to do route summarization or rule aggregation when you define those policies, so you might end up with inefficient policies.

The solution we found was to use virtual layer two networks, provided by VMware and Nicira. It's a bit like VLANs: you can specify that a VM or a server is part of a virtual network, and you can put a firewall between those networks, so you can prevent them from talking to each other, or prevent routing between two networks, if you want. The IP space of those virtual networks can be completely separate from the IP space of the underlay, which is very useful, because when you have 48 servers here and put VMs on top of them, the number of IPs you have to provide per rack is huge. For example, you have a /24 for this switch, but with 10 VMs per machine times 48 machines, that's already around 500 IPs, plus some for management and so on; at the end of the day you waste a lot of IPs. The nice things: you still have layer two isolation, so this node cannot see that one unless there's a route between the two or a firewall that allows it. It's compatible with the large-scale network design I mentioned before. It can be fully automated; everything is configured through APIs. You can still interpose network elements to apply provider-specific policies. It can complement layer three isolation: I can still run security rules on my own VM, or in the hypervisor, or somewhere else in the network; it's compatible with that type of isolation. And you can have a large number of networks, not limited by something like the VLAN tag space. The negative points are that you have some tunnel overhead to get that layer two isolation, since the traffic is tunneled over the provider network, and that the size of those networks is limited: for every layer two network you construct, you need tunnels between any two members of that network, so a full mesh of N members means N(N-1)/2 tunnels, which grows very fast. But that's not so different from the kind of limit you have on a physical switch, where you have 48 ports and that's it; there's still a limit, but it's one we can manage.
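Going back to the rule-management cost for a second, here is a toy sketch, hypothetical names throughout, of why group-based L3 rules fan out: one membership change touches every server whose rules reference the group.

```python
# Toy model: each hypervisor holds a local copy of every rule that mentions
# one of its VMs. Rules are (action, src_group, dst_group, port) tuples.
rules_by_server = {
    "hv-01": [("allow", "web-tier", "db-tier", 3306)],
    "hv-02": [("allow", "web-tier", "db-tier", 3306)],
    # ... one copy per server hosting a web-tier VM
}

def add_member(group: str, new_ip: str) -> int:
    # Adding one VM to "db-tier" forces a rule update on every server whose
    # rules mention that group: O(servers), not O(1).
    touched = [s for s, rules in rules_by_server.items()
               if any(group in (src, dst) for _, src, dst, _ in rules)]
    for server in touched:
        pass  # here you would push rules rewritten to include new_ip to `server`
    return len(touched)

print(add_member("db-tier", "10.1.2.3"), "servers updated for one membership change")
```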
So what is SDN? To simplify to the maximum, and I hope no one blames me for this: in the architecture of a switch or router, you have some logic that defines the behavior of the switching engine. That logic can learn about the network from protocols; it could be MAC learning logic that adds entries to a MAC table in the switching engine. That's the typical design of a physical switch. With SDN, you still have this routing or switching engine, and it could be a virtual one running in a hypervisor, but the switch is kind of dumb: it's a slave to logic that lives either in an agent on that machine, like the Midokura solution, or in a centralized controller, like what Nicira does. That centralized controller can also learn about the network through protocols, but most importantly it can learn through APIs. So instead of discovering that a new machine was created on a hypervisor by listening for MAC addresses, you already know you created that machine on that hypervisor, because you are the one that instantiated the VM. You can pre-configure the switch to know about that VM: you know the MAC address, you know which hypervisor it was deployed on, so you might as well put the entry in the MAC table ahead of time. That way you skip the discovery phase, which would be based on ARP, for example. And if you want to move that VM from one hypervisor to another, since you know you are moving the VM, you can reconfigure the network at the same time you move it. That avoids all the problems of IP migration, of rediscovering where an IP went and how to reach it; most networks would break the connectivity anyway, because you are changing subnets and things like that. So that's SDN in two minutes, at least as far as I understand it. I'm not going to go into the details of what happens inside, because it's too complex.
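Here is that pre-programming idea in toy form. This is a minimal sketch, not the Nicira implementation: because the control plane created the VM, it already knows the MAC address and the placement, so it can install the forwarding entry up front instead of waiting for MAC learning or ARP.

```python
# Hypothetical controller-side state: mac -> (hypervisor, virtual_port).
forwarding_table = {}

def on_vm_created(mac: str, hypervisor: str, port: str) -> None:
    # We created the VM ourselves, so we already know MAC and placement;
    # pre-install the entry rather than discovering it via ARP.
    forwarding_table[mac] = (hypervisor, port)

def on_vm_migrated(mac: str, new_hypervisor: str, new_port: str) -> None:
    # Reconfigure the network at the same time the VM moves: no re-learning,
    # no broken connectivity from crossing subnets.
    forwarding_table[mac] = (new_hypervisor, new_port)

on_vm_created("fa:16:3e:00:00:01", "hv-07", "vnet3")
on_vm_migrated("fa:16:3e:00:00:01", "hv-12", "vnet0")
print(forwarding_table)
```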
The other aspect of SDN, and this is still my opinion, SDN experts might disagree, is that there are levels to it. At the bottom is what we are using at eBay right now, which is just playing with overlay networks: we are not reconfiguring complex routing or complex switch behavior, we are using virtual switches in hypervisors, and the kinds of protocols we are messing with are ARP and other L2 protocols. In the future we might do the same at L3, but it will still be pretty basic. Then you can do more advanced things if you have physical servers that also implement SDN; then physical servers can be part of a virtual switch through port membership and other criteria. And at the top of the totem pole is maybe what universities are doing right now with SDN and OpenFlow, which is implementing completely new traffic-engineering behaviors by reconfiguring network switches. Instead of using, say, OSPF and ECMP, and having only a basic understanding of the network through protocols, you configure the behavior of your network based on how you understand it. For example, if you have multiple paths to a machine, instead of saying all the paths are equal, which is a static configuration, you may want to say: this link is saturated, so I'm going to shift more of the traffic onto that other link; or: there's very important traffic on this link, so even if it's more costly to go over the other link, I'm going to do it. You get to shape the behavior of the switches without relying on complex protocols, which might not know at all what your network is trying to do, because they have a myopic view of the environment; they only see their peers and what their peers tell them. So that's the simplified view of SDN, and as I said, for now we are playing at the bottom level; not yet ninjas or wizards of SDN.
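A tiny illustration of that top-level idea, with made-up numbers: a controller with a global view can weight the path choice by live utilization instead of treating all equal-cost paths the same, as static ECMP does.

```python
# Hypothetical link state as a controller might see it; numbers are invented.
links = {"leaf1-spine1": {"cost": 10, "utilization": 0.92},
         "leaf1-spine2": {"cost": 10, "utilization": 0.60},
         "leaf1-spine3": {"cost": 12, "utilization": 0.10}}  # costlier but idle

def pick_link() -> str:
    # Prefer headroom over nominal cost: a saturated equal-cost link can lose
    # to a slightly more expensive but idle one.
    return max(links, key=lambda l: (1.0 - links[l]["utilization"]) / links[l]["cost"])

print(pick_link())  # -> leaf1-spine3, despite its higher cost
```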
So, the first implementation. We did it partly because we wanted to learn; it's pretty new technology, and we started a year ago, just when Nicira went public with their offering. We wanted to try it on a use case that was not production-sensitive. It happens that the people using it, developers, are pretty nasty: when it doesn't work, they tell us. So even though it's not production, you don't want it to be down. The developer cloud is a logical environment, similar to what I described before, with a class of service that defines what developers can do, and we give them a self-service API they can use to get VMs. We are not using Horizon yet; we have our own layer, because we wanted to strip it down to the maximum so that they don't break anything. The idea is an environment where every developer can put their application and collaborate with their colleagues. Before, they would run an application on their laptop, but at the end of the day they close the laptop and the application isn't available anymore; or they put it on their desktop, and whenever the desktop reboots it's gone, so they have big stickers saying 'please do not reboot.' This cloud gives them a place where their applications can keep running. We also have an experimentation event called Skunkworks, where any developer at eBay can showcase a project they developed in their free time; we used this environment to give them infrastructure to show their projects to the VPs, and the winner had the privilege of getting their application live on the site. It's implemented as a set of layer two networks, and we'll go into the details soon; we group those layer two networks into one big layer three network, with a perimeter firewall that controls access to that layer three network and what it can reach outside. We are not exposing private networks yet, so developers share the same networks; in the future we will let developers have their own private networks and play with more complex topologies if they want. And it's isolated from our production infrastructure.

That's the important thing, and we have some representatives of our operations team in the room for whom this is pretty scary: we are giving developers access to our production site. Except they are not really on the production site, because they are isolated at layer two. We use the same shared infrastructure we leverage for production, which is high scale, and on top of it we developed and deployed this developer cloud; but none of the traffic the developers generate goes over the production network, because it's tunneled out to gateways that carry it outside of the production network. Some traffic can reach the production network, but it goes through a firewall first. It's a very interesting proposition for us: if we want to reclaim the 40, or 100, or 200 servers those guys are using, we can kick them out and use the servers for production; we don't have to change anything, we just re-image the servers and it's done. If we want to use some spare servers to give them more capacity, because they have a big project and want to test some algorithm on 150 servers, we can do that too: we temporarily take some production servers, attach them to that network, and they can experiment with that infrastructure. That's very important for enabling scale-out tests; we already had a project that requested 16 VMs for messaging tests, which they could not do on their desktops, and we cannot give everyone their own environment for that kind of test.

Now the details. There's a lot here, and you should be able to do almost the same thing we did from this topology. I mentioned that the virtual networks are pre-created: for example, we created 10 virtual networks. Each has a Nova network gateway with a virtual switch, the OVS kernel module, and every time you create a network using nova-manage, a gateway is created on that network. Upstream, we have one interface that goes to the firewall, for access to our corporate network or QA network, and one interface that goes to the blue network, which is the provider network, the production network, an infrastructure network. When you create a VM, it logically has an interface on the green network; the traffic is tunneled through the virtual switch back to that gateway box, and a set of policy-based rules directs the traffic arriving on those gateways to the orange link, so it goes straight through the firewall. We allow the virtual networks to talk to each other, but we don't allow traffic to be routed outside of the dotted line here. On that box we run Nova network and the metadata API, so that cloud-init works. On every hypervisor we have standard KVM and nova-compute, plus the OVS switch; the router here is basically just their gateway interface. In terms of infrastructure we have the scheduler, the APIs, and Quantum; Quantum talks to the Nicira controllers, which are on the same network, and to the service nodes. So it's pretty simple, if you want.
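As a sketch of that pre-creation step, here is a loop over nova-manage roughly as it looked in the Essex era. The flag spellings are from memory and varied between releases, so treat them as an assumption and check `nova-manage network create --help` on your build; the network labels and address range are invented.

```python
import subprocess

# Illustrative only: pre-create ten /24 networks like the ones described
# above. As the talk notes, each create also brings up a gateway interface
# on the Nova network node, which OVS turns into a Nicira port.
for i in range(10):
    subprocess.check_call([
        "nova-manage", "network", "create",
        "--label", f"dev-net-{i}",            # hypothetical label
        "--fixed_range_v4", f"10.32.{i}.0/24",  # hypothetical range
        "--num_networks", "1",
        "--network_size", "256",               # the 256-IP cap mentioned later
    ])
```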
The only trick is that because we are using Nova network, and we are not using the HA way of deploying Nova network, we had to figure out how to make the gateway HA ourselves, meaning that when this box dies, the standby gateway can take over the traffic routing for the networks. So we have a pair of switches and a pair of firewalls, each connected to the two gateways, active and standby. When Nova network restarts, it recreates all the gateways and routes required for the traffic to go through the standby gateway, and we fail over this IP here, which is the IP the firewall uses to send traffic back to the dev cloud. We may lose some traffic, since the failover is not instantaneous, so it's not the best solution; but so far we have not had any outage, and we have not had to exercise our HA logic, which is good.

In terms of tricks, I can talk a bit more about the flow, and you will see how this all happens. We have our cloud portal, used by the developers, where you request an instance; we automatically tag the request with the class of service, the type of OS you want, and the size. We have an orchestrator that front-ends the request; the main reason for it is that we need to configure DNS, and today in Nova there is no easy way to configure DNS automatically. We are evolving this architecture to be completely based on OpenStack, removing the green component here, but that requires some hacks and hooks we have to put into Nova. When the VM is created on the compute node, there is a phase where you get the IP and create the port. That goes through Quantum: Quantum talks to the Nicira controller, creates the port, and the port is attached to the network. The logical switch itself is created by the administrator when the network is created, so you have to map the switch to the network, which is done through some tags in the Nicira controller. At the same time, when you create a network, you have to create the gateway, and the way this works is that OVS implements a Linux bridge emulation; by reusing the same Nova network processes that would create a bridge, a gateway interface is created, also talking to Nicira, and attached to the same logical switch. So that's the basic flow, and it can be simplified. There are two elements I want to point out. One: because we pre-created the networks, and each network has a maximum capacity of 256 IPs, we have to assign a network to each request ourselves; we cannot let the tenant choose their network. So we look in the Nova database at how many ports or IPs are already assigned in each network, select the next network that has capacity, and send that network information as an extension to the boot-instance call. Someone in the room, Subu, implemented an extension to the Nova scheduler to do that, so we can remove that step from our internal orchestrator. Two: we also have the requirement to create forward and reverse DNS entries for the created VMs. Today we do that through some glue that listens to AMQP, to RabbitMQ; there was a session earlier about the future of DNS in OpenStack, and eventually this DNS management will be a component provided by OpenStack.

Now, some people say, you're crazy to use OpenStack, it's not stable. Well, here is our availability since we opened. I think I am to blame for the two dips, because I did some operations on our infrastructure without using the right interface and caused some issues; but otherwise it has been really stable for us. The other interesting graph here is the latency to create a VM: it's pretty consistently under 75 seconds, and that includes all the steps I talked about before, the network connectivity creation and all of that.
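Backing up to the network-assignment step in that flow, here is a sketch with a hypothetical schema of the selection logic; the real version reads the port counts from the Nova database and, with Subu's scheduler extension, lives inside Nova.

```python
# Since each pre-created network holds at most 256 IPs, pick the next network
# with free addresses rather than letting the tenant choose.
def pick_network(networks, allocated_ips) -> str:
    for net in networks:                      # e.g. ["dev-net-0", "dev-net-1", ...]
        used = allocated_ips.get(net, 0)      # counted from the Nova database
        if used < 254:                        # /24 minus gateway and broadcast
            return net
    raise RuntimeError("all pre-created networks are full; create more")

net = pick_network([f"dev-net-{i}" for i in range(10)],
                   {"dev-net-0": 254, "dev-net-1": 118})
print(net)  # -> dev-net-1; passed as an extension to the boot-instance call
```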
And our users are pretty happy. I checked this morning; I have to look at this slide every day because the number keeps moving: we are reaching 800 VMs now. And that's without advertising the infrastructure much; since we didn't know how it would hold up, we never really said 'open for business,' but people are finding out that there's an infrastructure where they can get machines for free, so it grows very quickly. What we have to figure out now is how to reclaim the VMs that are not used: since it costs nothing, people may have two or three VMs running that they are not using, so we have to find a way to do some garbage collection and reclaim the VMs, otherwise we will pay too much in licenses to our vendors.

So, what works and what doesn't? The good part: if you are in an enterprise, the first thing you realize is that there are processes for changing firewall rules, with an SLA that can be one or two weeks depending on the complexity of the change. One of the benefits of this architecture is that we configure the firewall once per class of service. The firewall has a specific group of policies that is identical in every colocation for a given class of service, and we manage which VM goes into which network, and therefore which class of service. The network security guys are very happy, because they get to define the policies and they get to control every packet that goes through that firewall, but they don't get a ticket every time we add a VM. That's a model we are going to extend, because it makes a lot of sense to focus on the policies rather than on the hard work of creating new rules for every change in the infrastructure. The networks are pre-created, which is fine for provider networks; for tenant networks we will do it dynamically. We can use the same pattern to create different classes of service; we now have the logic to do that for the two or three other classes of service we want to implement. As you can see, our stack, OpenStack on Ubuntu with KVM, worked very well for us. Something I did not mention is that 50% of our infrastructure already runs on our own cloud, so we can compare the performance and scalability of that cloud with what we developed for the developer class of service, and it compares very favorably; for us, that's proof we can move to the next phase, which is deploying production use cases on this infrastructure. There's also an interesting feature coming in Folsom and Quantum v2 that we are anxious to use. I think this area of OpenStack is where the most innovation is going to happen. I don't want to diminish Nova's capabilities, but there's not much happening in the hypervisor space: you may add support for a new hypervisor, OpenVZ, LXC, or Hyper-V, but there are not a lot of new features to develop there. On the network side, there's still a lot that can be done, and it's a very interesting area to contribute to.
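On the reclamation problem mentioned a moment ago, here is a purely illustrative sketch, not anything eBay described, of a simple idle-VM garbage-collection pass.

```python
from datetime import datetime, timedelta

# Flag VMs idle past a threshold; a real version would warn the owner first.
IDLE_LIMIT = timedelta(days=30)

def reclaim_candidates(vms, now=None):
    # `vms` is a hypothetical list of (name, owner, last_activity) records,
    # e.g. built from hypervisor CPU/network counters or login history.
    now = now or datetime.utcnow()
    return [(name, owner) for name, owner, last_seen in vms
            if now - last_seen > IDLE_LIMIT]

vms = [("demo-1", "alice", datetime(2012, 6, 1)),
       ("msg-test-7", "bob", datetime(2012, 10, 10))]
print(reclaim_candidates(vms, now=datetime(2012, 10, 15)))  # -> [("demo-1", "alice")]
```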
Now the bad parts. As I mentioned, in OpenStack you cannot assign networks based on policy: either the user defines which NICs they want for their VMs, or you use the default policy, which attaches every public network to the VM, plus the project-specific networks. In Essex there was only one network flavor, and you could only have one gateway if you use Nova network the way we are using it. This has limitations, because for two different classes of service you might want two different routers connected to different firewalls, and I think that in Folsom we will be able to do that. On scale-out, there is still one bottleneck, which is the router, the gateway, where all the virtual networks terminate. Today in our design that's a one-gig link, and since it's mostly interactive traffic, SSH connections, it's not a problem; but if we had use cases that need to move a lot of data out of their virtual network, and that data resides on the physical network, it has to go through that gateway, like all the north-south traffic, and that gateway becomes a bottleneck. The Nicira team is trying to remove as much as possible of this part of the topology. The one thing that is genuinely complex is linked to the organization and policies you have in a big enterprise: you usually want separation of concerns between the networking, security, and server management teams, either from a compliance point of view or just as plain best practice. What's happening right now is that on that gateway server we have networking components and security components: a virtual switch, IP rules, firewall rules, and it's all running on a compute server. So who is going to manage that infrastructure: the server team, the networking team, or the net-sec team? There are interesting challenges with network virtualization, where you don't really know anymore who is managing what. I think as we mature we'll find the right middle ground, but most of the time the networking team will say, 'if it's not wrapped in sheet metal, I don't want to manage it'; the security team will say, 'that's not a firewall I support, so you're on your own'; and the server guys will say, 'wow, there are too many things on that box that I don't understand; it's networking, it's security, I don't want to manage it.' And it's understandable, because now you have to have people who understand all three technologies. So that's the kind of challenge we have.

What's next? We are going to implement different classes of service, an additional, more production-like one. One thing we looked at is that for production traffic we will likely go to another model, a bridged network, to avoid the tunnel overhead; but since there's less and less overhead with tunneling now, it might not even be relevant. We have a bit more than 80 machines today, I don't know exactly how many, supporting the 800 VMs, and we are going to grow this infrastructure as much as we can; I cannot give a number, because it depends on how many developers we have. More gateways: we are going to move the gateways to 10-gig links so that we have more north-south bandwidth available outside of the virtual networks. We are going to integrate with Folsom, and also look at integrating with the work happening in the load-balancing space: plugging load balancers into that network so that traffic coming from the internet, through a one-armed load-balancer setup, can reach the virtual network through a floating IP, to serve a typical front-end use case. And we are working on a cleaner OpenStack integration.
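A back-of-the-envelope view of why that gateway link matters; the 1 Gb/s and 800-VM figures are from the talk, the division is mine.

```python
# One 1 Gb/s north-south gateway link shared by ~800 VMs leaves very little
# per-VM bandwidth if they all push traffic off the virtual networks at once.
gateway_gbps = 1.0
vms = 800
print(gateway_gbps * 1000 / vms, "Mb/s per VM if all are active")  # ~1.25 Mb/s
```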
That's what I mentioned before, and it's what Subu and Tim are working on: moving all the parts we had in our own custom cloud implementation into OpenStack proper, while contributing back to the OpenStack community as much as we can of what we are doing in this space. And last but not least, we are hiring too. This morning there was a question, by Thierry I think, about how many companies are hiring OpenStack developers; I think everybody is, but at least we are a bit different from the other ones. Any questions?

[On the choice of controller] Yes, I did look at them, to be honest. The controller choice was based on maturity, and I'm sure that in a few months or a few years there will be more options. I'm not paid by Nicira, I guarantee you, but they are the most mature solution, and it has been proven so far. I'm not saying you cannot make Ryu or Midokura or other solutions work, but with our scale requirements and our availability and reliability requirements, we felt more comfortable at this point with Nicira.

[On external tenants] Today it's only internal-facing, but the next phase, to be delivered in December, will be external-facing too, on the same infrastructure. The difference is that each tenant network will be its own virtual network, because we have a stronger isolation requirement: if one tenant gets compromised, we don't want the other tenants to be compromised. There we will use private networks, and we will control the traffic that goes in and out of each tenant network. And no VLAN segmentation, all virtual networks; our network team said no VLANs, there are too many issues with VLANs.

[On security groups] Yes, they could; it's still compatible with security groups. The only thing is that tenants would not be able to grant themselves access to something the perimeter firewall has disabled. That's also why I said earlier that security groups cannot really be combined with provider rules: if you give people security groups and they can change any rules they want, you have to make sure they cannot disable some provider-specific rules, and that's where having this separation is very interesting for us. Exactly: there's provider security, which is defined for each class of service, and then there's per-tenant security. If you don't want to expose a service, for example your database, outside of your virtual network, you have two options: you don't put a floating IP on it, so no one can reach in; or you put a floating IP on it and you select which hosts can access your database.
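Those two options, in a toy sketch with hypothetical helper names, not a real API:

```python
# Option 1: no floating IP, the VM is unreachable from outside its virtual
# network. Option 2: a floating IP plus a tenant whitelist, which can only
# narrow what the per-class-of-service perimeter firewall already allows.
def expose(vm, floating_ip=None, allowed_sources=()):
    if floating_ip is None:
        return f"{vm}: no floating IP, unreachable from outside its virtual network"
    rules = [f"allow {src} -> {floating_ip}:3306" for src in allowed_sources]
    # Tenant rules cannot override the provider policy on the perimeter firewall.
    return rules or [f"deny all -> {floating_ip}"]

print(expose("db-1"))
print(expose("db-2", floating_ip="203.0.113.7", allowed_sources=["10.32.1.12"]))
```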
Oh, let me go back, because I think this is important. Take this example: each VM lives in the top-of-rack, leaf-and-spine environment, but the gateways are not connected into that leaf-and-spine environment; they are connected directly to our core, or distribution, fabric, because it's north-south traffic. There's no point in putting the gateway here and going all the way back to a firewall, so our gateway infrastructure is connected differently from our compute nodes. The other reason is that the connection between the firewall and the gateway uses VLANs, and that's the only place where we have VLANs. I did not mention it before, but the orange link here is a VLAN; those two are two VLANs, and the VLAN span is really between those four or six elements. It doesn't go out, and that's the maximum we want to do with VLANs; they are just local VLANs, if you want. So if we spin up another colocation, or another instance of this, we can even reuse the same VLANs, because those elements are not connected to any trunks; we limit the layer two span.

[On gateway creation] The question is how we create the gateway here. Nova network, through nova-manage, creates that gateway for you automatically when you create a network. And it happens that, because of the bridge compatibility, it uses the same Linux bridge command to do it, brctl, and under the covers this results in the creation of a Nicira port attached to your network.

[On fleet size and upgrades] The question is how many machines we have and what the upgrade challenges are. Today we have 80 machines, so it's about 10 VMs per machine; I think we can go to 12 with our current configuration, and I don't know if we can go much higher. The upgrade challenge we see is, first, that Open vSwitch is a kernel module, and when you upgrade a kernel module you have to be careful not to bring down your VMs or force a reboot, so the upgrade of that component is a critical part you have to get right. The upgrade path for the rest of the components is similar to any OpenStack upgrade, I guess: you take a backup of your database and you hope the upgrade scripts do the right thing. But I don't think there are any other challenges. We upgraded our infrastructure from Diablo to Essex, and through multiple releases of Essex up to the current one, and it went pretty smoothly; I don't think there was any specific issue, because the Nicira infrastructure is pretty agnostic to what happens on the OpenStack side. The only moving parts are the OpenStack components, and so far they have been pretty easy to upgrade. I cross my fingers and hope next time will be as good. Any other questions? OK, thanks.