Hi, everybody. I'm Igor Bolotin, and here with me is Vinay Banay, our SDN and network virtualization expert. I've been a cloud architect with eBay for a few years now: I joined as X.commerce cloud architect, later became PayPal cloud architect, and now I'm cloud architect for eBay Inc. It's been an interesting journey so far, building and running clouds at scale. I remember when I first joined X.commerce, we were just starting to plan our first production deployment with Diablo; we went live with Diablo D5, one of the early Diablo milestones. A couple of years later, before we knew it, we were already running Havana. Of course, the first deployment didn't include Neutron, because there was no Neutron back then. But now we're running Neutron at scale, too. This is our standard marketing slide, by the way. What's important here is that eBay Inc. is a large corporation with several businesses. The biggest ones are on the slide: eBay Marketplaces, PayPal, and eBay Enterprise, formerly known as GSI. But there are a lot more names that aren't shown here; we've mentioned just a few, such as Magento, StubHub, Where, and Milo. All of them need infrastructure, and all of them need and use our cloud. Here's what we're going to talk about today: the business case, that is, what we do with cloud at eBay; deployment patterns; problem areas; and how we solve them. So what does our business ask for? Well, you've heard it multiple times already: agility, agility, and once again, agility. Everyone wants to go fast, and we are no exception; our business also wants to go fast, because it enables innovation. But it's not only about going fast. It's also about efficiency, being able to fully utilize our shared, multi-tenant cloud infrastructure, with the availability that the enterprise requires. And because we have so many different businesses, they have different needs for security and compliance, and we need to be able to satisfy all of those needs. That drives quite a few of our decisions, including what we're doing with networking. Here's what the overall deployment looks like. It's a global cloud with multiple geographic regions. In each region, we run several availability zones, and we've actually started using cells for scaling within individual availability zones. Today we deploy identity and image management at the region level; we're going to make them global. The OpenStack controllers run at the availability zone level, so Nova, Neutron, and Cinder are deployed per availability zone, with the cells below that. I forgot to mention Trove, database as a service: we're running that too, and I believe it's also deployed at the availability zone level. We use Zabbix for monitoring. Everything is behind load balancer VIPs, and it's actually an interesting mix: some availability zones use hardware load balancers, some use virtual ones. The cloud is shared: there are multiple tenants, multiple business units, all sharing the same infrastructure. And the tenants themselves are diverse. We have production tenants: Marketplaces production, PayPal production, eBay Enterprise. We also have our developers and our QE organizations using our clouds, plus sandbox environments and internal clouds like IT. And all of these different tenants share the same cloud.
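For orientation, here is a toy sketch of how the pieces just described nest; all the names are hypothetical, not our real topology. Identity and image management sit per region, the other OpenStack services sit per availability zone, and cells sit inside each AZ.

```python
# Toy sketch of the deployment hierarchy described above; every name is
# made up. Regions hold identity/image services; each availability zone
# is its own OpenStack deployment with Nova/Neutron/Cinder/Trove,
# scaled internally with Nova cells.
deployment = {
    "region-west": {
        "region_services": ["keystone", "glance"],
        "availability_zones": {
            "az-1": {
                "az_services": ["nova", "neutron", "cinder", "trove"],
                "cells": ["cell-1", "cell-2", "cell-3"],
            },
            "az-2": {
                "az_services": ["nova", "neutron", "cinder", "trove"],
                "cells": ["cell-1", "cell-2"],
            },
        },
    },
}

for region, contents in deployment.items():
    print(region, "->", ", ".join(contents["availability_zones"]))
```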
I mentioned that we're using Neutron, right? For production traffic today, we're running everything in bridged mode. We're not yet using overlays for production, and we're not using DHCP for production. However, for some of the tenants, including Dev/QA, it's all virtual: overlays, DHCP, everything. Here's a nice picture, let's put it this way. The top-of-rack switches are connected to the distribution layer; it's pretty much a classic spine-and-leaf architecture. We only show two switches here, but there might be a lot more than that. It might be four, it might be eight; it really depends on the size of the availability zone. And there are gateway nodes for the overlays that are connected to the distribution layer and to the core. Of course, when you run at this scale, you run into some issues, and I'll let Vinay talk about the specific issues. Thank you. Thank you. Hello, folks. Before I touch on the areas with scale issues: I was just at a talk 15 or 20 minutes ago about how Neutron doesn't scale and how they're going to use their own solution. Our experience has been a little different. We've used Neutron not only for internal clients, but also for running production traffic. But having said that, we've had our own share of problems. So what I'd like to show from now on is some of the issues that we faced and how we managed to work around them. I'm sure there are going to be some questions and answers, so I'd request that you hold the questions that need more detail until the end; but if you have a simple clarifying question, go ahead and raise your hand or interrupt me. The first problem we had is hypervisor scale-out: how many hypervisors can you have in your Neutron deployment? It turns out that if you use overlays, you run into issues, meaning you can't scale out to thousands and thousands of servers. On the other hand, if you use bridged networks, and by bridged I mean flat or VLAN provider networks, you tend to scale out much better. So this plays into overlay versus bridged; we use both, by the way. We also had some issues running the Neutron API servers, especially with regard to DHCP in active-standby and active-active modes, metadata, and the Neutron API servers themselves. And we don't use a reference-implementation SDN controller. Neutron has a reference implementation; we don't use that. We use a commercial SDN controller, Nicira NSX. Then there are the network gateway nodes. If I go back to the previous slide Igor was talking about, you saw two gateway nodes off to the side; these are the nodes that take traffic in and out of the part of your cloud that runs overlays. We also have issues related to upgrades, but I decided not to talk about upgrades, because that needs a separate presentation by itself; we wouldn't be able to do justice to the issues we faced during upgrades. So I'm going to address one topic at a time. Hypervisor scale-out: how did we address that? This picture shows the OpenStack controllers. We also have Cinder and Glance and all those controllers, but we're just showing Nova and Neutron, because that's where the scale comes in.
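To make the bridged option concrete before getting into the fixes: here is a minimal sketch of creating a VLAN provider network of the kind our bridged tenants use, with the Havana-era python-neutronclient. The credentials, endpoint, physical network name, and VLAN ID are all hypothetical.

```python
# Hypothetical sketch: creating a flat/VLAN "provider network" (a bridged
# network, not an overlay) via the Havana-era python-neutronclient.
# Every credential, URL, and ID here is made up for illustration.
from neutronclient.v2_0 import client

neutron = client.Client(username="admin",
                        password="secret",
                        tenant_name="admin",
                        auth_url="http://keystone.example.com:5000/v2.0")

network = neutron.create_network({"network": {
    "name": "prod-rack-a",
    "provider:network_type": "vlan",          # bridged: flat/VLAN, no tunnels
    "provider:physical_network": "physnet1",  # maps to the physical fabric
    "provider:segmentation_id": 101,          # the VLAN ID on the switches
    "shared": True,
}})
print(network["network"]["id"])
```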
Nova has a solution called Nova cells that allows you to scale: it reduces the load on your queues and makes the Nova scheduler run much more efficiently if you break things down into cells. Neutron API doesn't have cells. So yes, I can have 1,000 or 2,000 hypervisors, some large number, and divide and conquer using Nova, but I don't have that luxury with Neutron. So typically we run several hundred hypervisors in a cell. We try not to run thousands; it's in the high hundreds. And then we put multiple cells in an availability zone. Let me back up a minute. AZ here, availability zone, is not the same thing that's used in Nova. This is the AZ in the Amazon sense, meaning one OpenStack deployment. I just wanted to make sure that's clear. Our intent is to run three to five cells in an availability zone and scale out to thousands of hypervisors in a single AZ, that is, a single OpenStack deployment. So, as I mentioned earlier, we use Nova cells to mitigate the hypervisor scale-out issue on the Nova side. Neutron scaling, on the other hand, we try to solve in multiple ways. One of them is to not run everything on overlays, because that's where we run into scaling problems. We run a hybrid mode, with tenants that need bridged networks alongside tenants that run overlays. As Igor mentioned earlier, all of our production traffic runs on bridged networks; we don't use overlays there. So in a given OpenStack deployment, since it's multi-tenant, with production, Dev/QA, and so on all running together, the Dev/QA and other non-production tenants run on overlays, and production runs on bridged networks. Just to give a little picture of how our overlays and bridged networks intermingle in the data center: as you see, we have compute racks at the bottom, and at the top we have one tenant living on a bridged network and another tenant living on an overlay network. Both of them share the same network resources. The bridged tenant directly uses provider networks to configure their networks, with their VMs plumbed directly into the physical networking gear, while the overlay tenant goes through our network virtualization SDN controllers to provision their workloads. I'm going to briefly touch on some concepts in overlay networks. Most of you might be familiar with them, but I want to cover them so you can understand some of the scale problems that we've been having. The overlay technologies we use are mostly based on STT and VXLAN. One of the problems we have in overlays is handling BUM traffic: broadcast, unknown unicast, and multicast. We found that running multicast in overlays is not a good idea; it doesn't work very well. Then there's also the issue of ARP, which tends to be broadcast. There are several ways you can mitigate that. In our deployment there's a concept of a service node, from our SDN controller vendor, that handles all this BUM traffic. And if you've been following Open vSwitch, OVS now has the ability to proxy ARP locally on the hypervisor, so ARP packets can be intercepted directly on the hypervisors. In overlay networks there are also the concepts of logical switches and logical routers.
The basic logical switch is a software construct, a mesh of tunnels: VMs sitting on the same virtual network can talk directly over those tunnels. Then we have the concept of a logical router. A logical router in our case, in this mode, which is a centralized logical router, sits on a gateway node. If you have multiple subnets connected to the logical router and traffic needs to go from one subnet to another, it goes and hits the logical router, which runs on one of the gateway nodes. Now, that also becomes a bit of a problem. So there's the concept of a distributed layer 3 router. What a distributed layer 3 router does is this: when a VM on one subnet talks to a VM on a different subnet, and both are connected to the same distributed logical router, they can tunnel directly without having to go through the centralized logical router. This also reduces some of the bandwidth and processing on the gateway node. Our gateway nodes are all deployed in a scale-out fashion. Some of the other issues we ran into, and I'm going to cover each topic at length, are on the Neutron services side: issues with Keystone tokens, and the problem of running the Quantum/Neutron server with a single thread. Prior to Grizzly and Havana, Neutron used to be called Quantum, and Quantum ran entirely within a single thread: RPC calls and RESTful API calls were all being handled by one thread, and that was causing a problem when you start having lots and lots of hypervisors. We also had issues with DHCP servers, which I'll briefly touch on. And then one other issue, not such a big deal, but something where we took measures to reduce the load on our Neutron API servers, is the heal instance info cache interval in nova-compute. So let's look at the Keystone token issue we had. Everybody knows the process by which Keystone, the servers, and the clients interact. On the left-hand side you have all these clients, and when they need to talk to any of the servers, they first authenticate themselves with Keystone. Keystone in turn returns a token, the client presents the token to the server, the server goes back to Keystone to validate the token, and then it executes the request. It turns out there's also a lot of chatter between the servers themselves; for example, the Nova server talks to the Neutron servers constantly, and also to Cinder. All of this leads to scaling issues. But before I get into that, I briefly want to talk about the two kinds of tokens. In the past, Keystone used UUID tokens. With a UUID token there are two parts: one is creating the token, and the second is validating it. So there are two calls you have to make to Keystone: creating the token, and then authenticating with the token. That's what happens with UUID. With PKI-based tokens, you just create the token; it gets signed by Keystone with its certificate and goes to the server. When the server has to validate it, it doesn't make a call to Keystone, because it has the public key and can verify locally whether the token is correct. And some services also do token caching; but in our case, we weren't doing any token caching.
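To make the caching idea concrete, here is a minimal sketch with the Havana-era Python clients: create one Keystone token and hand it to the Neutron client for reuse, instead of letting every call mint a fresh token. The endpoints and credentials are made up.

```python
# Hedged sketch (Havana-era python-keystoneclient/python-neutronclient):
# mint one token, then reuse it, rather than creating a new token per call
# as described above. All credentials and URLs are placeholders.
from keystoneclient.v2_0 import client as keystone_client
from neutronclient.v2_0 import client as neutron_client

ks = keystone_client.Client(username="nova",
                            password="secret",
                            tenant_name="service",
                            auth_url="http://keystone.example.com:5000/v2.0")
cached_token = ks.auth_token  # cache and reuse this (e.g. for ~an hour)

# Subsequent Neutron calls present the cached token directly instead of
# re-authenticating with Keystone each time.
neutron = neutron_client.Client(
    token=cached_token,
    endpoint_url="http://neutron.example.com:9696")
ports = neutron.list_ports()
```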
So here's what was happening in our case. We have multiple availability zones, as Igor was saying, each running an OpenStack control plane. In one of our deployments, we saw that 98% of the tokens being generated in our database were coming from inter-service calls: Nova calling Neutron, Nova calling Cinder, Glance. 98% of the tokens were being generated by the services, and of those, 92% were Quantum/Neutron tokens. When we did the math, we were generating around 25 to 30 new tokens per second. Every time Nova had to go ask the Neutron server for a port, it would call the Neutron client, which would go create a new token. Even though it had just serviced a request a couple of milliseconds earlier, it would still create a new token. This was begging for token caching. It was causing a lot of RPC overhead, and the token table in the database was getting bloated; when the table gets big, adding, deleting, and looking up rows takes a lot of time. So this was slowing us down, and we took multiple steps to address it. One of the things we did was token caching, which is not always a good idea, but we cache tokens for about an hour so that we don't have to make an API call to create a token. But you still have to make an API call to validate the token, so that only addresses one half of the problem. The second half gets solved when you use PKI-based tokens: if you cache the token and use PKI, you can validate the token locally. So you're reducing the amount of network chatter and the calls that you're making. Doing both of these helped us a lot. If you're running Neutron at scale, make sure you have this. It turns out there were some OpenStack bugs in this area; most recently a fix landed in, I think, Icehouse, and it was also backported to Havana, so token caching is now available for admin tenants. The other issue we ran into was a version or two before that, I think around Grizzly, when we were running Folsom. Prior to Havana, there was a single API server thread for Neutron handling both your REST APIs and your RPC calls. And I don't know how many of you know this, but at that time, the DHCP renewals were also being handled through RPC calls, and that was putting a lot of load on the API server. So what we did was break the server up into two threads, one for handling the RPC calls and one for handling the API calls. In Havana, those things are fixed. The chatter coming from the VMs for IP address renewals has been eliminated; it's handled via dhcp_release now, so that traffic has gone down. And there's support for multiple workers, so you can spawn multiple threads to service your Neutron servers. This also helps when you start having large-scale Neutron deployments. Was that a quick question? I can't hear you. You know what, why don't we do this: we're going to have about 10 minutes toward the end, and we'll address it then. Just hold that thought. I'm having trouble hearing you; once you get a microphone, I can address those questions.
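Coming back to the multi-worker point, here is a minimal sketch of the knob involved, assuming the api_workers option that Neutron exposes in neutron.conf from this era onward; the file path and value are purely illustrative.

```python
# Illustrative only: render a neutron.conf fragment enabling multiple API
# workers, assuming the api_workers option referred to in the talk.
import configparser

cfg = configparser.ConfigParser()
cfg["DEFAULT"] = {"api_workers": "4"}  # e.g. roughly one worker per core

with open("/tmp/neutron-workers.conf.example", "w") as f:
    cfg.write(f)  # writes: [DEFAULT] / api_workers = 4
```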
The next topic I wouldn't necessarily call a problem: the heal instance info cache interval (the heal_instance_info_cache_interval option in nova.conf). All the nova-compute nodes run a periodic task, with a default of, I think, 10 seconds, that makes a list of all the instances running on that compute node and checks with the Neutron server for their port information. It's not a big deal, but we decided to increase the periodic interval from 10 seconds to 10 minutes, and that also took a bit of load off the Neutron server. Now, DHCP scaling. By a show of hands, how many of you have Neutron deployed in your environments? OK, that's quite a few. So you know DHCP is a little flaky sometimes. A VM boots up and, as always, you find out it's not working, and 9 times out of 10 it's something related to the IP address not being allocated, or dnsmasq died, something like that. We've tried to address this issue on multiple fronts. As Igor mentioned earlier, we don't use DHCP for production environments; there's no DHCP there. What we use instead is config drive. Config drive is a mechanism through which Nova can inject metadata into the instance: the hostname, your network configuration, any other key-value pairs, and your public keys. It does require a cloud-aware image, but in production these are all controlled images, our images, so we have cloudified all of them so they don't need to go through DHCP. And they all run bridged anyhow. For overlays, we employed a couple of techniques: one was running DHCP active-standby, and we're planning to move toward running it active-active. I don't know if you're aware, but Neutron has this concept of DHCP agents. You can create multiple agents, and when a dnsmasq instance is spun up for a network, you can allocate it to a different agent; that way you can spread the load, too. Let me do a quick time check. Moving on to the SDN controllers. This is how we deploy everything from our OpenStack controllers down to the compute nodes, with all the various elements in between. All the OpenStack controllers talk through load balancers to the Neutron API servers. We deploy the Neutron APIs in active-active mode, sitting behind a load balancer. The Neutron API servers talk to our SDN controllers, which in our case is NSX, through a load balancer, and those in turn use OpenFlow to communicate with the compute nodes. Then there are the network gateway nodes, which, as I was showing in the earlier picture, are where overlay traffic that needs to leave the cloud and come back in has to go. The gateway nodes are only needed with overlay networks; for a bridged network you don't need them, because you can take advantage of the physical routers and switches that are part of your provider network. We deploy gateway nodes in a scale-out fashion, meaning you don't deploy two or four; we deploy eight or ten gateway nodes. One of the problems we ran into with gateway nodes was high CPU utilization. I'll get into that in more detail on the next slide, but let me give you a quick summary. I don't know if you're familiar with OVS. In OVS, when a packet comes in, the flow is matched in the kernel datapath. If there's a miss, the packet gets punted up to user space, where the ovs-vswitchd daemon looks at the full list of flows, computes an exact-match flow, and pushes it down into the kernel. Once the flow is in the kernel, all subsequent packets get expedited; they get switched very, very fast.
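That slow-path/fast-path split is easy to picture with a toy model; this is pure illustration, not OVS code.

```python
# Toy illustration of the OVS datapath behavior described above: the kernel
# keeps an exact-match flow cache; a miss "upcalls" to user space (the
# ovs-vswitchd role below), which computes an action and installs it, so
# subsequent packets in the same flow take the fast path.
kernel_cache = {}

def userspace_lookup(flow_key):
    # Stand-in for ovs-vswitchd consulting the full flow tables.
    return "output:eth1"

def handle_packet(flow_key):
    if flow_key not in kernel_cache:           # miss -> upcall (slow path)
        kernel_cache[flow_key] = userspace_lookup(flow_key)
    return kernel_cache[flow_key]              # hit -> fast path

handle_packet(("10.0.0.1", "10.0.0.2", 6, 443))  # first packet: upcall
handle_packet(("10.0.0.1", "10.0.0.2", 6, 443))  # later packets: cached
```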
We also had an issue with load balancers. Remember, as I was saying, we use load balancers everywhere; some of them are physical load balancers, some are virtual appliances. In our cloud, all the services the tenants run are API-based, meaning they don't call servers directly; they call VIPs, and behind each VIP is a pool of server machines. So when app A wants to talk to app B, it always communicates through a VIP. When we were using physical load balancers, we found that the east-west traffic was going out through the gateway and coming back in; it was hairpinning. That was creating a lot of flows on the gateway nodes and driving the CPU utilization up. There are a couple of ways we could address that. One: east-west traffic doesn't have to go out and come back in. You can create a virtual appliance, put it on the hypervisor, and then the east-west traffic stays inside the cloud, so it doesn't have to go in and out through the gateway node. The other is to use SNAT. If you don't want to use an appliance, but you still want to use a physical load balancer, then you use SNAT on the load balancers. I'll get into why: if you use SNAT, you can take advantage of a feature in OVS called megaflows that will help you reduce the number of flows. There were a couple of other enhancements in OVS that affected us very significantly, in a positive way. One of them is megaflows; I'll briefly touch on what megaflows are. The other is the multi-core version of ovs-vswitchd. Prior to, I think, the OVS 2.0 release, ovs-vswitchd, which runs in user space, was not able to take advantage of multiple cores. So let me briefly touch on those two topics. Megaflows: what are megaflows? Prior to OVS 1.11, for whatever reason, flows were matched on a lot of fields. Even if you only cared about matching the source and destination, an exact-match flow on all the fields, including the ports, was pushed down into the datapath. Starting with OVS 1.11, they created megaflows, which allow wildcarded entries in the kernel datapath. This reduced the number of flows, and since the flow eviction threshold is pretty low in OVS, it also helps the traffic between kernel and user space. But to get this, you have to run OVS 1.11 or greater. One of the negatives of megaflows is that if you have security groups on your VMs, the effect of megaflows is lost. The multi-core improvement for ovs-vswitchd is a big one, so let me explain what it means. In the past, prior to the OVS 2.0 version, this is how it worked: you had the kernel module, the ovs-vswitchd daemon, and all the CPU cores. A packet came in, and if there was a miss in the kernel, it was punted up to user space; but ovs-vswitchd was capable of running on only one core. So here I am running 32 hyper-threaded cores with 256 GB of RAM, but I can only use one core. My CPU was getting hot, and we were seeing packet losses and misses. Starting with OVS 2.0, they fixed that issue by taking advantage of all the cores, and my CPU utilization for the same amount of traffic, the same traffic flows, went down, way down. This was very useful.
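Since both improvements are version-gated, a quick check along these lines tells you what you have; this sketch assumes ovs-vswitchd is on PATH and prints its version in the usual "ovs-vswitchd (Open vSwitch) X.Y.Z" form.

```python
# Hedged sketch: parse `ovs-vswitchd --version` to decide whether the
# megaflow (>= 1.11) and multicore-vswitchd (>= 2.0) improvements apply.
import re
import subprocess

out = subprocess.check_output(["ovs-vswitchd", "--version"]).decode()
match = re.search(r"(\d+)\.(\d+)", out)
assert match, "unexpected version output: %s" % out
major, minor = int(match.group(1)), int(match.group(2))

if (major, minor) >= (2, 0):
    print("multicore ovs-vswitchd and megaflows available")
elif (major, minor) >= (1, 11):
    print("megaflows available; consider 2.0+ for multicore vswitchd")
else:
    print("consider upgrading OVS")
```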
So I would recommend that. Right now OVS is at 2.1, and we are running OVS 2.1 in our environments; if you're using an OVS-based Neutron back end, I'd recommend using 2.0 or 2.1. Let me do a quick time check. Yeah, we've got 10 more minutes. There are a couple of things I wanted to talk about. One of them is the future work we're doing. These items aren't directly related to scale, but they play into the same area. One of them is a VPC model. We have different properties within eBay: Marketplaces, PayPal, StubHub. And within PayPal, there are different tenants. We'd like to take a logical grouping of tenants and create a VPC so that they can share resources, whether Glance images, Cinder volumes, or networks; all the projects within a VPC can share them. In that regard, we started by submitting a blueprint to Neutron. We call it the network tagging blueprint. It addresses several use cases; if anyone is interested, take a look at it, and if you want to work with us on it, that's fine too. I'll talk about one use case, which is very common. Today, when you want to spin up a VM (again, this is only for bridged VMs, not overlays), there are two ways Nova can do it. Either you pass the network in the nova boot command with the --nic flag, specifying which network ID you're supposed to be on, or you pre-create a port and pass the port ID (there's a short sketch of these two paths after the summary below). Either way, Nova is in charge: Nova will pick the compute host where it thinks the VM needs to sit, and it may or may not pick a host on the right network. Let me give you an example. These are racks, in our case compute racks. We call them fault zones, because we do not run a flat L2 fabric across them. Each rack is a layer 3 boundary for us, so going between racks is a layer 3 hop, and the racks are all on different subnets. Along comes a VM. Somebody wants to spin up a VM, and you want it spun up on this rack, because that's the network ID you passed. But guess what: Nova has a mind of its own. It goes and picks a different rack, and the VM lands over here. Now you're out of luck, because your DHCP is not going to work: you've landed a VM with an IP address on a different physical network. These are some of the things we're trying to address with this blueprint. There's a lot of interest; I was talking to somebody from Yahoo, and they're also interested in working on this blueprint. This is a very common issue when you're running bridged networks, so hopefully in the next release we'll have this feature available through something called network tagging. In summary, I think we've got about seven minutes, so I'm going to wrap up quickly to leave some Q&A time. Before you deploy, make sure you have a plan and know your requirements. Understand your size and scale. Pick an SDN controller based on your needs. Make sure you have enough layer 2 domains. Pick a good mixture of overlay and bridged; the hybrid approach will help you scale. And make sure you monitor your cloud on a regular basis. With that, I thank you, and if you have any questions, Igor and I will be happy to answer them.
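Here is the promised sketch of the two boot paths, using the Havana-era python-novaclient; all names and IDs are placeholders.

```python
# Hypothetical sketch of the two ways described above to attach a bridged
# VM to a network, with the Havana-era python-novaclient. Every credential
# and ID is a placeholder.
from novaclient.v1_1 import client

nova = client.Client("admin", "secret", "admin",
                     "http://keystone.example.com:5000/v2.0")

# Path 1: pass a network ID (--nic net-id=... on the CLI). Nova picks the
# host, which may sit in a different rack/fault zone than the network.
vm_a = nova.servers.create("vm-a", "image-uuid", "flavor-id",
                           nics=[{"net-id": "network-uuid"}])

# Path 2: pre-create a Neutron port (e.g. with `neutron port-create`) and
# pass its ID (--nic port-id=...); the same placement caveat applies.
vm_b = nova.servers.create("vm-b", "image-uuid", "flavor-id",
                           nics=[{"port-id": "port-uuid"}])
```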
Thank you for your presentation. I had two separate questions. The first question is about your overlay gateway nodes. It seems like you ended up using an x86 software-based approach rather than whatever the hardware supports. Have you looked into any of the available VXLAN hardware solutions as well, and why did you end up going the way you did? That's the first question. The second question is about the multi-core OVS work. That's really desirable, as we all expected it to be, but it seems to conflict with the high CPU utilization problem you observed: if more cores are available, aren't you going to have more of a high CPU utilization problem? How do you address that? So, two questions. Sure, let me answer the first question; you can jump in too. We are talking to white-box vendors, and we're also talking to some of the hardware vendors, about addressing the gateway needs. In fact, our SDN controller partner is also working with those vendors. We would like to have a model where we use a physical gateway node with multiple ports, where the switching and routing is done in silicon. Do you want to add something to that, Igor? White boxes: once that solution is mature enough to run in our environment, we will be using it. On the VXLAN front, also, there is not a whole lot of support in the NICs for doing offload. The second question, about multi-core: for us, what we saw was a huge improvement in CPU utilization, and we did not see any dropped packets. We can talk more about it if you want. Let me clarify that. The problem with CPU utilization isn't the utilization per se; it's uneven CPU utilization. When you get to use all of your cores, overall it goes down. Igor? Next question: how do you facilitate instance access to the Nova metadata service in a Neutron provider network model? There are two ways we can do that, and we're using the first method. As part of DHCP, we use a dnsmasq DHCP option to advertise a static route to 169.254.169.254 with the next hop being the DHCP server itself, and we run the DHCP server in a namespace. From there, the request goes over a Unix domain socket to the metadata service, and from there it talks to Nova. That's one way. The second approach, which we're discussing with our SDN controller vendor, is to put the metadata support into the logical router itself. That way, if I have active-active DHCP, I don't have an issue; because with active-active DHCP, one server has .2 and the other has .3, so which route do I advertise? Those issues are better addressed if you put it in the routers, because the routers already have HA. Next: my question is on the gateway nodes. If your gateway nodes are just commodity servers, can you not replace them with simple routers? No, because remember, these gateway nodes are not just routers. They are doing encapsulation and decapsulation of the overlay headers, whether it's VXLAN, STT, or GRE. Regular routers can't do that. Some of the vendors, like Arista and Juniper, are coming out with products that can, but that's not integrated into Neutron and OpenStack yet. Once that happens, we will do that. So you mean you need native support for VXLAN or STT to be able to use them? Overlay support, yes. Exactly.
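To make the metadata answer above concrete: the static-route trick is DHCP option 121 (classless static route), which dnsmasq exposes by name. Here is a hedged sketch of the kind of option line involved; the tag, subnet name, and namespace address are made up.

```python
# Illustrative only: an option line of the sort the Neutron DHCP agent can
# feed dnsmasq, routing the metadata address 169.254.169.254 via the DHCP
# port's own address (10.20.30.2 is a made-up address inside the DHCP
# namespace; "subnet-a" is a hypothetical tag).
opt_line = ("tag:subnet-a,option:classless-static-route,"
            "169.254.169.254/32,10.20.30.2")

with open("/tmp/dnsmasq-opts.example", "w") as f:
    f.write(opt_line + "\n")
print(opt_line)
```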
One of your previous slides illustrated the issue with the way you're doing provider networks, where the VLAN boundary is your rack, and obviously you mentioned the blueprint. But if your environment right now has multiple provider networks with the VLAN boundary being your rack, how do you deal with that from a provisioning perspective? Do you create the port in advance, or is there some other magic you're doing? Today, the logic sits in Nova: we have custom code that we've written ourselves that sits in Nova. Network code doesn't really belong in Nova, and that's the reason we proposed the blueprint, so we can do it the right way. What we do is this: when Nova picks the compute node, the hypervisor, we can identify from our configuration which rack it landed on. Nova then goes and queries the Neutron server for all the networks, and we use tags to identify the network that matches the zone it landed on. Then we pick that network and assign an IP address. Thank you. Next: on the OVS megaflows point, you mentioned that if you're doing security groups, you sort of lose that benefit. Can you explain why that is? Sure. The whole concept of megaflows is that you match on layer 2 or layer 3 and ignore the rest of the layer 4 data, like the protocol information and the ports. That can't coexist with security group rules: the moment you have one security group rule for your VM, megaflows have to be disabled, because the two can't coexist. But with 2.1, they've partitioned the flows on a per-VM basis, so if one VM has security group rules, it only affects the flows related to that VM, not the other VMs. Next question: on the network tagging blueprint, you mentioned that when we spawn a VM, it could potentially land on one of the other racks, where you don't want it to sit. But the nova boot command today has an availability zone option where you can specify the exact hypervisor your VM should be spawned on. Why would you not use that option, and why build something new? Because what is the connection between an availability zone and the network? When we talk about VMs, we talk about the hypervisors where the VMs will sit, and when we give an availability zone, we can give an exact compute node hostname along with it. But isn't that an anti-cloud pattern? You've created a cloud where resources can go anywhere, and now you're pinning where they land. Instead, all I want is for my VM to have an IP address; I don't care which IP address it is. So today, instead of using availability zones, we land on a given compute node, derive information from that node to figure out which physical network it's on, and assign an IP address from there. Otherwise, it takes three or four tries before the VM boots successfully. Just on the previous question: you said that if you implement security group rules, you cannot use megaflows. But security group rules are implemented in the Linux bridge, right, in Neutron? The Linux bridges connect to the Open vSwitch, and they did that because that's where you can implement security groups. This is not Linux bridge here, right? This is OVS. Yeah, but still, if you look at the typical bridges configured by the L2 agent and the L3 agent... There's no L2 agent here; it's done through OpenFlow. Okay. We can talk about it in more detail afterward; I can explain it. You're describing the default architecture, where the VM connects to the Linux bridge and then to the Open vSwitch. Right, so you're right about that case.
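As a toy illustration of the tag-based provisioning answer above (not eBay's actual code): tags of this sort are what the proposed network tagging blueprint would standardize.

```python
# Toy sketch: given the rack ("fault zone") Nova landed on, pick the
# Neutron network whose tag matches it. Networks and tags are invented.
networks = [
    {"id": "net-rack-a", "tags": ["rack-a"]},
    {"id": "net-rack-b", "tags": ["rack-b"]},
]

def network_for_rack(rack):
    """Return the ID of the first network tagged with this rack."""
    for net in networks:
        if rack in net["tags"]:
            return net["id"]
    raise LookupError("no network tagged for rack %s" % rack)

print(network_for_rack("rack-b"))  # -> net-rack-b
```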
Okay, I have a question. Sure. You're using a purely software solution. What happens if the OVS traffic gets very high and the CPU usage on the hypervisor goes very high, or it even crashes? Do you then ask the controller to do something under those circumstances? So, if I understand your question, you're asking how we monitor the CPU utilization of the compute nodes with respect to OVS? Yes, and whether you're using a purely software solution or not. Pure software? Yes, a pure software solution. OVS, yes; it's pure software. And what about the CPU usage? It turns out that on the compute nodes, the amount of CPU the OVS daemon consumes is not much. It's usually the gateway nodes that have the issue. But if there's a lot of east-west traffic, with a lot of VXLAN encapsulation happening in OVS, the CPU usage might go very high. For east-west traffic, we use STT, and STT takes advantage of the NIC offload capabilities very well. We've seen our VM performance actually far exceed bare metal; not in latency, but in throughput. At the last OpenStack Summit, in Hong Kong, we presented some results on STT versus bare metal. STT throughput tends to be higher for VMs because of the way it combines small packets and takes advantage of the NIC offload capabilities. OK, we'll take the rest of the questions offline, because we've run out of time already. Thank you very much. Thank you.