Okay, let's get started. Hi, everyone. Thanks for staying here. Today I'm going to talk about the SDN approach we've been using in our data center. This is personally my second talk for OpenStack; we gave our very first talk, with Mike Wilson, one of my peers, at Portland last time, about our data center. So, how many people actually attended our first talk at Portland, other than my friends? Okay, so everyone here is new. Okay.

So, from this title, how many interesting keywords are there? Truly open, commoditized, software-defined networking, and OpenStack, of course. From those, I think there are two important keywords. One is, of course, OpenStack. The other is SDN, software-defined networking. Finally OpenStack meets SDN, and SDN finally meets OpenStack. I think that's what the Neutron project is all about. That's my view. I mean, think about how many SDN plugins exist in Neutron right now. Okay, and it's not working that well. So the question is: why do they need each other, basically? Are they nice to each other? Do they really need each other? That's the question we would like to answer in this talk.

From the OpenStack perspective, one of the critical components we have to have is the technology that allows us to virtualize networking. Without it, there are a lot of problems. And as of today, SDN is one of the most promising technologies that seems to give us the power to really virtualize networks. From the SDN perspective, SDN is still in its infancy, and it needs to find killer applications to prove that, yeah, SDN is really awesome. OpenStack is one of the biggest candidates for that right now. So these two big brothers are finally starting to talk to each other.

Okay, our data center has a very interesting set of requirements. Our company is basically a traditional hosting company; our products are things like dedicated and shared hosting. Those are not really cloud products yet, but our entire infrastructure is based on OpenStack and SDN. So it's very interesting: we aggressively utilize all this cutting-edge technology underneath, but we are still selling traditional products. Ironically, that situation gave us favorable room, so that we could focus on only a small set of target goals and optimize fully for them. And now we are trying to move to the next stage.

So let me talk about our L2 fabric data center infrastructure. When a VM is migrated to another rack, the requirement is this: we don't want to ask the customer to change their public IP address at all; we need to keep it. And we still need to achieve QoS, isolation, ACLs, firewalling, everything. So basically this talk consists of two main parts: what we've done, and what we are planning to do in terms of SDN. The first part is already done, so I'm going to share the detailed mechanisms, the algorithms, how we achieve all these goals. And the next stage: when there are VMs belonging to the same tenant, we would like to also provide a tenant isolation network. As you can see, this is a wonderful application for SDN — that's our hope. We hope to utilize SDN technology to achieve our goals here. Now, there are several key points I would like to make before going to the next slide, starting from the targeted L2 fabric.
What does "L2 fabric" mean here? Other data centers are typically an L3 fabric rather than L2, right? They chop things up into areas, or pods, or zones, and then link them together through L3. But our data center is purely an L2 fabric, and there is a reason. What we are trying to build is actually a simple data forwarding plane. And very interestingly, there is no unknown traffic in a virtualized network in OpenStack. Meaning, when a new VM is created, we already know everything: which hypervisor it is assigned to, which traffic needs to be allowed, which traffic needs to be blocked. We already know it; there is nothing unknown. That's one important observation.

We don't want to use any L3 agent. There's a lot of performance overhead, so many issues we ran into. We don't want to use any NAT; that's one of our strong requirements. Have you seen the movie The Matrix? Agent Smith. Remember how Neo finally dealt with that huge number of clones of Agent Smith? Run away. Literally, just run away. Give up, avoid them, don't face them. Third one: we would like to keep high entropy in the packet headers. There are several reasons; one important one is that we want to utilize that variety of information for multipathing in the underlying L2 layer. We don't want to lose that variety. And this kind of simple L2 fabric very naturally helps us achieve seamless, very straightforward VM migration.

So let me give one example before starting the real talk. 15 minutes — what is that all about? If you ever run neutron port-list against 20,000 ports, it takes 15 minutes. Okay? I didn't measure it precisely — did you measure the time? Let me try it one more time: one, two, three. Okay, now three seconds. We found this problem, walked through all the code, and when we had fully optimized it, it was finally reduced to three seconds, even for 20,000 ports. The point is not that specific example. The point I would like to make is the scale we have to deal with in our data center: a huge number of objects. Basically, we are running about 20,000 physical servers under one OpenStack deployment. I believe we are one of the very few largest deployments. We are not using any cells — or if you know cells, we are effectively using one cell. By the way, in the Havana release somebody actually found that port-list bug, or problem, so it was improved a lot. Okay, cool. About halfway: I know there is still some remaining overhead, so it still takes maybe five or six minutes, but at least more than half of it is gone. Good.

Okay. So let's look at how many SDN controllers are available right now: Big Switch, NEC, Onix, Floodlight, and so on — a huge number of SDN controllers out there. Some of them already have a plugin in OpenStack; some of them will, for example OpenDaylight. As a user, we don't care about most of that. What we care about is: is it open or not? If something is a closed box, how many things are inside — is it four or three? I don't want to waste time trying to figure out what is hidden. Just open it; I need to see it. So OpenDaylight is one of the candidates we are really interested in, and we have hope.
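To make the port-list anecdote concrete, here is a minimal, hypothetical sketch of how you might time that listing yourself with python-neutronclient; the credentials and endpoint below are placeholders, not our environment.

```python
# Rough sketch (not the actual Bluehost tooling) of timing a port listing at
# scale with python-neutronclient; credentials and endpoint are placeholders.
import time
from neutronclient.v2_0 import client

neutron = client.Client(username='admin',
                        password='secret',
                        tenant_name='admin',
                        auth_url='http://controller:5000/v2.0')

start = time.time()
ports = neutron.list_ports()['ports']   # the call behind `neutron port-list`
print('listed %d ports in %.1f seconds' % (len(ports), time.time() - start))
```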
OpenDaylight will probably be a truly open component in Neutron. It has not actually been merged yet, but I believe during Icehouse they will probably finally add it. So let's see. By the way, it is the only plugin name that really starts with "Open".

Okay. So let's look at the general SDN architecture. There are typically three components. At the top, an external entity. In the middle, the SDN controller, which typically has a set of SDN application logic; some controllers also have a network topology management plane to keep a global view of all the physical switches and how they connect to each other. Some controllers are only a single instance by design; some are distributed by design. And at the bottom, as you can see, there are OpenFlow protocol switches. There could be other types of protocol as well, but as of today OpenFlow is pretty much the one promising standard people really utilize, and we also fully utilize the OpenFlow protocol right now. What typically happens is that, through what is called the northbound API, an external entity calls the API on the SDN controller, and then, through the southbound API — basically the OpenFlow protocol — the controller deploys the necessary flows.

Let's see what kind of properties and features are there. Basically, through the southbound API, what they are trying to do is make flows: specifying rules such as, this packet needs to be dropped, or allowed, or where to forward it. But the interesting thing is that once you start to use the OpenFlow protocol on a physical switch, you effectively lose some of the legacy, traditional features — for example source MAC learning, that's gone — because you are going to take full control over the entire physical switch.

There is an interesting property: timing. Some flows are reactive, some are proactive. Meaning, we can proactively deploy the necessary OpenFlow rules on the target switches, or ask the switches to consult the SDN controller when there is no rule deployed yet. But as we discussed, in our specific context there is no unknown traffic, which means we can be proactive all the time. We don't have to worry about the delay issues you get in reactive mode.

Transition: in our data center, everything is still traditional L2. We have not really utilized any SDN technology on the physical switches yet. So when we turn an existing switch over to OpenFlow, what will happen? We need to figure that out. For example, storm control is a very conventional feature on existing traditional switches; you need to make sure storm control will still be available on your OpenFlow port. Some vendor switches provide only a pure OpenFlow port, while others provide a hybrid port mode, meaning OpenFlow works alongside the legacy traditional features. And you are definitely going to run into this issue: what is the maximum allowable number of OpenFlow rules you can add to a physical switch? This is a really big deal right now. Some vendors use TCAM, and TCAM is typically 4K entries — it is very expensive hardware — so you end up with only 4K rules. I would like to say to those vendors providing 4K: don't say you support the OpenFlow protocol. We have seen this number: 120K.
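As a rough illustration of the reactive-versus-proactive distinction, here is a hedged sketch using ovs-ofctl against a hypothetical bridge; the bridge name, MAC, and port number are made up for the example, not taken from our deployment.

```python
# Hypothetical illustration of reactive vs. proactive flow setup on an OVS
# bridge "br0"; the MAC address and port numbers are placeholders.
import subprocess

def add_flow(flow):
    # ovs-ofctl is the standard CLI for programming OpenFlow rules into OVS.
    subprocess.check_call(['ovs-ofctl', 'add-flow', 'br0', flow])

# Reactive mode: install only a table-miss rule; every unknown packet is
# punted to the SDN controller, which decides and pushes a rule back later.
add_flow('priority=0,actions=controller')

# Proactive mode: because every VM's MAC and port are known at creation time,
# the exact rules can be pushed up front and everything else dropped.
add_flow('priority=100,dl_dst=fa:16:3e:00:00:01,actions=output:5')
add_flow('priority=0,actions=drop')
```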
Actually, we are using IBM switches right now, and thankfully the IBM switch is very intelligent. I don't have any special relationship with the vendor or anything; what we focus on is that as long as a vendor provides the standard protocol, we just go with it. Anyway, thankfully the IBM switch has a very intelligent OpenFlow implementation that uses the FDB rather than TCAM, so they could successfully increase the maximum allowable number of OpenFlow rules. And eventually you need to think about how you can bundle up the necessary flows so that you effectively reduce the required number of flows on the switch side. That was all generic SDN architecture.

Let's finally look at the OpenStack SDN framework and how it works. There is the Neutron server and the Neutron DB. Suppose we have chosen some SDN controller. The agent basically prepares the necessary basic OVS setup — for example, the IP address of the SDN controller, so that whenever OVS needs to do something it consults that guy. Then a real request to create a virtual interface comes in. What happens is that the server creates the relevant database item and then typically sends a REST API call, or whatever the protocol is — there is communication between them. So it makes the REST call to the SDN controller, and the controller then deploys the OVS flows through the connection it has already established with OVS. Very interestingly, in this architecture the agent is by design intended to have minimal functionality, because the SDN controller should have its own control logic: all the complex business logic is supposed to be there, not in the agent. That's why, intentionally, the agent has so far been so dumb.

So it is good to finally be here: ML2 finally has this path. For some reason or other, the framework didn't have this RPC call from server to agent, and because of that lacking feature I was so happy to see that you guys finally added the RPC functionality. So let me see here: who is creating the OVS tap interface? Nova compute. Why does Nova compute need to create the OVS tap? I mean, we thought that Neutron's main motivation was to pull all the network functionality out into Neutron; some component in Neutron should create the OVS tap. But of course, there was no RPC call — who could do it? Okay, just use Nova compute, which already has an RPC call, and so Nova compute is used. These are the problems we ran into in our data center. But problems or not, we had to apply this framework to our environment anyway.

So let's talk about how we can apply this framework to our data center, where we are running 18,000 physical servers and each of them is running OVS. 18,000 virtual switches, all connected to the SDN controller. Not only that, you already have hundreds of top-of-rack switches, physical switches; they all need to contact the SDN controller too. And the answer here: okay, it doesn't scale. We can't use this solution in our environment. It doesn't work. So we had a dilemma. We started to talk to SDN controller vendors: is there a truly scalable SDN solution now? Not yet. It will be, soon. When? Okay, who knows. So then we turned to the Neutron team, and we got this kind of answer: can you use a different approach? No. Oh really, why not? The vendors are working on it. Okay. We had a circular dilemma. This story was back then, more than six months ago; we started deploying real production VMs, I think, in January.
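For reference, the Nova-compute tap creation I just complained about boils down, roughly, to an ovs-vsctl call like the sketch below; the device name, UUIDs, and MAC are placeholders and this is a simplification of what Nova's VIF plugging does, not the exact code.

```python
# Simplified sketch of the ovs-vsctl call Nova compute makes when it plugs a
# VIF into br-int; device name, port UUID, instance UUID and MAC are made up.
import subprocess

def plug_vif(dev='tapdeadbeef-01',
             port_id='11111111-2222-3333-4444-555555555555',
             mac='fa:16:3e:00:00:01',
             instance_id='66666666-7777-8888-9999-000000000000'):
    subprocess.check_call([
        'ovs-vsctl', '--', '--if-exists', 'del-port', dev,
        '--', 'add-port', 'br-int', dev,
        '--', 'set', 'Interface', dev,
        'external-ids:iface-id=%s' % port_id,
        'external-ids:iface-status=active',
        'external-ids:attached-mac=%s' % mac,
        'external-ids:vm-uuid=%s' % instance_id,
    ])
```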
So we thought that somehow we needed more understanding, deeper insight, into what is really going on in terms of using SDN. We looked at one component here, the compute node. It is running Open vSwitch, it is running a Neutron agent, and it has its own VMs. And the observation we made was basically: oh yeah, the Neutron agent is already fully distributed everywhere. Structurally it is really good; there is no single point of failure. This is a good framework. And when you look at the deployed OpenFlow rules on every single compute node, those rules are specific to its own VMs only. Of course, right? We don't have to deploy another host's VM flows on my host; I don't have to worry about that. So structurally, I am saying, it is already fully distributed.

So we thought: this is a very good structure, we can somehow run without using any SDN controller. That was our starting point. Okay, let's add the actual SDN functionality into the agent itself. We are not going to use any external SDN controller, and we deploy the necessary OVS flows only through Neutron. So this is our approach, finally. When the Neutron server gets the request from Nova to create the virtual interface, it directly calls the agent through RPC, finally, and asks it: okay, please deploy the necessary flows. And not only that: if you ever need to also control the physical-side switches, not only the compute nodes, you can send a REST call separately to an SDN controller. I think what was explained earlier about pre-commit and post-commit is related to this story. Then finally the agent itself deploys the necessary OVS flows directly, without any SDN controller. The left side of this story is what we have done so far; the right side is what we are planning to do. You will eventually see why we eventually need an SDN controller: for dealing with the physical side, we have to have some kind of SDN controller piece anyway.

Now, this diagram I pulled from a famous paper — actually just a five-page paper published last year at the HotSDN conference — and I found a very interesting suggestion in it. They talk about the separation of the controller. Let me read this part. In the structure, as you can see, there is a source and a destination; the very first hop switch they call the edge, for the destination there is another edge switch, and between them, whatever fabric elements are there. Then they say that the fabric is mainly responsible for packet transport, and the edge is responsible for providing much richer services such as network security, isolation, and mobility. And it turns out the author of this paper is the SDN pioneer, Martin Casado. We were very thrilled about that: oh, we already did it, and the SDN pioneer is saying the same thing — this is good. Thankfully I actually had a chat with Martin on the very first day, just as he was about to leave, and I got a kind of confirmation from him, and he promised he would watch my video, so hopefully Martin sees this. So I borrowed your diagram here — sorry, Martin. Anyway, the point I would like to make here is this: there is a very distinct role for the set of edge switches versus the fabric switches. That is a very interesting recognition and understanding in terms of utilizing SDN. Okay? So, here are the key services we have already implemented.
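To show the shape of that controller-less approach, here is a conceptual sketch — not our production agent — of an agent-side handler: the Neutron server notifies the agent over RPC and the agent programs OVS locally itself. The RPC wiring is omitted and all names, bridges, and ports are hypothetical.

```python
# Conceptual sketch of an agent that plays the role usually given to an SDN
# controller, per host; names and port numbers are invented for illustration.
import subprocess

def ovs_add_flow(bridge, flow):
    subprocess.check_call(['ovs-ofctl', 'add-flow', bridge, flow])

class LocalFlowAgent(object):
    """Per-host agent: programs OVS directly, no external controller."""

    def port_created(self, context, port, ofport):
        # Called via RPC once the server has committed the new port to the DB.
        mac = port['mac_address']
        # Deliver only frames addressed to this VM's MAC to its OVS port ...
        ovs_add_flow('br-int',
                     'priority=100,dl_dst=%s,actions=output:%d' % (mac, ofport))
        # ... and let everything else fall through to a low-priority drop.
        ovs_add_flow('br-int', 'priority=0,actions=drop')
```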
We are using these in live production, using only Neutron; we don't use any SDN controller yet. As I said, we are going to use one for the other functionality, the tenant isolation network. But when you use a flat network, a physical VLAN network, everything is directly attached. There were a lot of questions, I remember — some people arguing about why we always have to use the L3 agent, some people wanting directly attached networks but still wanting to keep isolation. Yeah, this is exactly that case. We have done it with this framework. We already deploy the firewall rules.

Very interestingly, when we implemented this firewall rule API — which doesn't exist at all yet, which is why we newly created it within our framework — we actually ended up using the Nova API, unfortunately, because, as we discussed, the Neutron framework didn't support it; there were a lot of missing pieces. What a shame, using a Nova component to deploy OVS rules. But anyway, through ML2, we really hope that we can finally provide all of these APIs in the right place. We already have anti-IP-spoofing rules: if one IP address is assigned and the VM tries to use another IP address, we just shut it down. We already apply QoS bandwidth limits. We already support multiple IP addresses per port. These are all implemented with OpenFlow rules on every single host, without using any external SDN controller.

I borrowed this slide from my first talk at Portland; I believe this slide has finally found the right spot, now that all the background has been shared. Given all this context, we use the following method. For QoS we set the outgoing bandwidth, like 10 Mb or 50 Mb, and then we deploy destination-MAC-matching OpenFlow flows for incoming packets, so they are effectively cut off: if a packet with a bogus MAC address tries to reach the VM, we just cut it off up front. For outgoing packets, we look at the source IP address in the packet, and if it is a wrong, spoofed IP address, it doesn't go out at all.

Now, what if a VM tries to connect to another VM residing on the same host through its public IP address? We need to allow that, right? We ended up deploying an n-squared number of source/destination MAC-matching flows, because if I have n VMs on my host, I need the full combination of flows. That was not really desirable, so we worked on optimizing this path. We newly added another path, a pair of veth interfaces between two bridges. As you can see here, the total number of required OVS flows is now 2n rather than n squared: on each side we deploy only the destination-MAC match, and we don't have to look at the source at all because of this separate route.

The firewall rule, I would say, is effectively the same functionality as a security group. The API lets you specify the target protocol and target ports, which ones need to be open and which closed, for incoming and for outgoing — we can do everything here, like this. Now, very interestingly, I learned while attending some of the Neutron design sessions that one of the fears people have about migrating from nova-network to Neutron is that this kind of security group functionality is all of a sudden gone — unless you find a specific SDN controller that really provides matching functionality.
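To make the per-VM host rules concrete, here is a hedged sketch of roughly what they amount to in OVS terms; the bridge name, tap device, port number, MAC, IP, and rate are placeholders, and this is an illustration of the technique rather than our exact production flows. The 2n same-host optimization would then hang the destination-MAC rules off the separate veth path between the two bridges, so no source matching is needed there.

```python
# Illustrative per-VM host rules: QoS cap, incoming dst-MAC filter, outgoing
# anti-IP-spoofing. All identifiers and values below are placeholders.
import subprocess

def sh(*cmd):
    subprocess.check_call(cmd)

def protect_vm(tap='tapvm1', ofport=5, mac='fa:16:3e:00:00:01',
               ip='203.0.113.10', rate_kbps=10000):
    # QoS: cap the VM's outgoing bandwidth with OVS ingress policing
    # (ingress on the VM's tap is egress from the VM's point of view).
    sh('ovs-vsctl', 'set', 'interface', tap,
       'ingress_policing_rate=%d' % rate_kbps,
       'ingress_policing_burst=%d' % (rate_kbps // 10))

    # Incoming: only packets destined to this VM's MAC may reach its port;
    # anything with a bogus destination MAC is never delivered.
    sh('ovs-ofctl', 'add-flow', 'br-int',
       'priority=100,dl_dst=%s,actions=output:%d' % (mac, ofport))

    # Outgoing: only packets carrying the VM's assigned source IP may leave;
    # spoofed source IPs fall through to the lower-priority drop.
    sh('ovs-ofctl', 'add-flow', 'br-int',
       'priority=100,in_port=%d,ip,nw_src=%s,actions=normal' % (ofport, ip))
    sh('ovs-ofctl', 'add-flow', 'br-int',
       'priority=90,in_port=%d,ip,actions=drop' % ofport)
```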
Basically it is very hard to keep the traditional features that nova-network provided when you migrate, and nobody wants to lose functionality. So I think this can hopefully be good motivation to have it in Neutron itself, so that people don't have to worry: you can keep the same functionality in Neutron without using any special component. Okay, that is what we have done so far. It has been really successful for us to use all this technology to provide public IP addresses to customers without asking them to change anything when their VMs are migrated. But we have finally come to the next phase: we would like to provide tenant-isolated networks using this SDN technology, and we found that now we finally need to work on the real SDN controller side.

This is a very typical structure. From now on I am just sharing possible options with you. The Neutron agent basically rewrites the actual MAC address to a positional one. As I said, for the physical switch side you have to think about how to effectively reduce the number of OpenFlow rules on the physical switches. This is a very well-known trick: by changing the MAC address to a location-based one, you can easily bundle rules up on the higher-level switches using a mask. That's the basic idea. What happens then is that the physical switches see only the positional MAC; they don't even know what the actual MAC was, and they can easily bundle rules up at the core switch. Finally, at the destination, we just translate back to the original MAC address. So the VM side doesn't even know what happened internally, and the physical side doesn't even know what the actual MAC address was. That's one way to deal with it, and as Robert explained, even with this framework it depends on your environment whether you need a separate server or not; basically what is needed here is just a path determination algorithm, that's all. What we are planning to do is: okay, we will use an external controller only against the physical switch side, not against the virtual switch on every host, because the agent already handles all the necessary functionality there.

Another option to consider — and I think this is one of the most popular mechanisms the SDN controllers are pushing — is the overlay network. Your fabric side can be L2 or L3, it doesn't matter, as long as an overlay tunnel is created. There are a lot of protocols — VXLAN, STT, GRE, some of them standard — and a lot of debate about which is better in terms of performance, but as you know GRE performance is really bad, so people try to use a better solution, VXLAN or another type. The good thing about using an overlay network is that it simplifies a lot of the requirements, because the overlay works everywhere, as long as there is a routable path. What the switch side sees is just a normal TCP or UDP packet: VXLAN uses UDP, and STT actually uses TCP, to encapsulate the L2 packet, so the core side just sees a normal UDP or TCP packet, and when it arrives at the destination it goes through the tunnel endpoint again. These are all for unicast traffic. Again, this is our plan; we are going to use an external controller anyway. Now, multicast and broadcast packets are the really tricky part.
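Here is an illustrative sketch of the location-based MAC rewrite just described; the "positional" MAC encoding, bridge names, and port numbers are invented for the example, and the masked core rule stands in for what a physical-side controller would push.

```python
# Sketch of location-based MAC rewriting at the edges with masked matching in
# the core; the positional MAC scheme and all names/ports here are made up.
import subprocess

def add_flow(bridge, flow):
    subprocess.check_call(['ovs-ofctl', 'add-flow', bridge, flow])

REAL_MAC = 'fa:16:3e:00:00:01'    # the VM's real MAC
LOC_MAC = '02:00:0a:14:05:01'     # e.g. encodes rack 10, host 20, vm 5 (made up)

# Source edge (vSwitch on the sending host): rewrite the destination MAC to
# the positional one before the packet enters the fabric.
add_flow('br-int', 'priority=100,dl_dst=%s,'
         'actions=mod_dl_dst:%s,output:1' % (REAL_MAC, LOC_MAC))

# Core/fabric switch: one masked rule covers every VM behind that rack, so
# the rule count stays small even with a 4K TCAM.
add_flow('br-core', 'priority=50,'
         'dl_dst=02:00:0a:14:00:00/ff:ff:ff:ff:00:00,actions=output:3')

# Destination edge: translate back so the VM only ever sees its real MAC.
add_flow('br-int', 'priority=100,dl_dst=%s,'
         'actions=mod_dl_dst:%s,output:5' % (LOC_MAC, REAL_MAC))
```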
This is one way — there is a drawback with this approach, but basically you could call it "who are you": when the switch receives a multicast or broadcast packet, it looks at the source MAC address — who really wants to send this broadcast — and from that it can figure out all the relevant VMs that belong to the same tenant ID. That way we broadcast along only the necessary paths, only to the relevant VMs. By asking "who are you", the switch knows: oh yeah, I need to send this out of these two ports, because there are VMs of that tenant down those paths. Each switch just asks the same question and then selects only the necessary output ports. But the problem with this approach is that you end up producing a lot of OpenFlow rules, because you need to match on the source MAC address, and once you start matching on the source MAC you typically end up in TCAM, which is 4K — so again that dilemma. So it's there, but I'm envisioning that eventually, when this SDN technology is mature enough and the vendors really provide SDN-dedicated chipsets and all kinds of automation, we can have some good network architecture soon. I don't know — who knows.

Okay, so another way to deal with it is that whenever a multicast packet is generated, we generate multiple unicast packets for that one multicast packet, so we just send out a number of unicast packets, and the switch side only sees unicast packets, just as normal, so we don't have to use any specialized multicast handling. This is also one of the approaches SDN controllers take right now.

So these are all the plans we are trying to carry out. If things go well, in the next release I really hope to have this kind of talk again about how we achieved it — I don't know, but anyway, we are on the way now. So let me bring back the first slide of this talk. We started from this slide, right: OpenStack meets SDN, SDN meets OpenStack. The message we would like to deliver is this: we need to have a truly open, commoditized SDN solution in OpenStack as a default, so that everyone can use it.
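As a hedged sketch of that multicast-to-unicast idea — with made-up tenant membership data and port numbers, shown at the source vSwitch for simplicity — a tenant broadcast can be replicated into per-VM unicasts so the fabric only ever sees ordinary unicast frames:

```python
# Hypothetical sketch: replicate a tenant broadcast into per-peer unicasts at
# the source vSwitch. Peer MACs, ports, and bridge names are placeholders.
import subprocess

def add_flow(bridge, flow):
    subprocess.check_call(['ovs-ofctl', 'add-flow', bridge, flow])

# MACs of the other VMs in the same tenant (known in advance, since Neutron
# already knows every VM of the tenant and where it lives).
TENANT_PEERS = ['fa:16:3e:00:00:02', 'fa:16:3e:00:00:03']
UPLINK = 1
VM_OFPORT = 5

# OpenFlow actions run left to right, so each mod_dl_dst/output pair emits one
# copy of the frame rewritten to that peer's unicast MAC.
actions = ','.join('mod_dl_dst:%s,output:%d' % (mac, UPLINK)
                   for mac in TENANT_PEERS)
add_flow('br-int',
         'priority=100,in_port=%d,dl_dst=ff:ff:ff:ff:ff:ff,actions=%s'
         % (VM_OFPORT, actions))
```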
And if some customer or company wants to use a very advanced feature, please go ahead, no problem. But for the community, we need to have a truly open plugin — that's the message we are trying to deliver. Thankfully we have already had several discussions with the Neutron team, the Nova team, and some of the other PTLs, and at least we got very positive feedback; it seems like one way to go. I mean, we are going to leave the existing plugins alone; we are not going to change anything. We simply want to add a new type of plugin like this so that people can enjoy it. I submitted a design session proposal, but unfortunately it was rejected because there were a lot of topics more important than this one. If you are interested in a written document about our work, you can find more detail over there. So thanks a lot — I think I can start to take some questions. Could you please use the microphone?

Question: I have a question about your top-of-rack switches. Are you using any white-box solutions, or are you working with any vendors? For the kind of features you are asking for, I don't think any of the top-of-rack switches provide them. — Yeah, that's the tricky part. Eventually you end up using some specific vendor product anyway, right? As I said, we are using IBM switches, and IBM is one of the core contributors to OpenDaylight too, so thankfully we get really good support from IBM right now. We will probably just continue working with them.

Question: Someone has probably asked this already, but are you going to contribute this upstream, and why isn't it upstream already? A lot of this is really nice stuff. If you are thinking of this as a competitive advantage for Bluehost, I'm not sure, but it would be really good to see patches for this. — We are fully committed to opening everything we develop, that's for sure. Why isn't it there yet? A lot of other practical issues, but we are really fully committed, so if you help us to really add it — you're going to do that? — Yeah, absolutely, and I would love to see all of this pushed upstream, and I think you guys should definitely collaborate with us on this. — Cool.

Question: You said you don't use NAT, so how does the public IP solution work in your deployment — public IP, private IP? — Yeah, that's right: we directly attach the public VLAN, every VM is on the VLAN directly, that's all. That's why I showed how, even in that case, we can still provide an isolated network. In the earlier diagrams, we basically achieve isolation through destination matching: by looking at the destination MAC address we know which VM has that MAC, and the packet only passes through to that one; all the other VMs don't even see the packet at all. Any other questions?

Question: Do you see value in having a kind of combined information base for your virtual switches and your physical switches? Given that you are now using an SDN controller just for the physical side, your ToRs, and everything else is controlled by Neutron — what does that do to your overall visibility? — That's exactly the question we are thinking about now. It's a tricky question, and I keep asking myself the same thing: will we eventually end up needing a global view that includes the vSwitches or not? But interestingly, so far, for dealing with this full set of functionality, it turns out we don't — not yet. I don't know, who knows. That's a very good question. — Sounds good. Thanks a lot for being here.