All right. Hey, everyone. I am Adam Johnson. I'm from Midokura, and we provide network virtualization. Our project is called MidoNet. Today we're going to talk about one use case in particular, which is managed private clouds for enterprises using OpenStack and network virtualization. Before we start, I just want to see how many of you have heard of MidoNet or Midokura before? Can you raise your hand? Actually, quite a few. OK, great. So we're not going to cover an overview; there are plenty of other places this week where you can get more of a deep dive on MidoNet itself. But the high-level summary is that MidoNet provides layer 2 through layer 4 networking services for Neutron. We have a plug-in, it's fully open source, and it essentially simplifies Neutron, making it more scalable and performant and covering more advanced use cases. Some of those use cases are going to be talked about today by Monsenosaan, who is from KVH. So I'll let him start off and talk about his use cases at KVH, and later we can talk about some of the advanced functionality we're building into MidoNet in response to some of the issues they came across. Then we can open it up for Q&A. So I'll hand it off to Monsenosaan.

OK, thank you, Adam. Hello, everyone. Thank you for taking the time to join our session. My name is Monsenosaan. I'm an architect for cloud and managed services at KVH. Today I'll talk about our experience with OpenStack and MidoNet. This is the agenda for my talk. First, I will briefly introduce KVH. Then I'll talk about what an enterprise is and who our target customers are, what the challenges were when we went into production, what we did and what we made sure we tested, how we approached the customers, and the requirements customers raised in response to our proposal.

This is about KVH. We were founded by the Fidelity investment group in 1999 in Tokyo. We started from the telecom business, Metro Ethernet. Then we built high-quality data centers in Tokyo, and on top of that we started to provide managed services. Then in December last year we were acquired by Colt, a group company that was also established by Fidelity, in 1992 in London. Now we are together, and we have reach into Asia and Europe. This is our service portfolio: network services, data centers, managed services, and cloud.

This is our target. What is an enterprise? The definition is a bit different from tech-savvy companies. Enterprises don't have a lot of engineering resources to babysit their IT infrastructure, but they have a lot of pressure from the business side: CAPEX and OPEX reduction, and time to market. As Jonathan mentioned in this morning's keynote, for example in the Uber case, a taxi company is really suffering because of software. Also, virtualization and the public cloud have become the default choice for enterprises. The situation is getting more challenging every day. So we service providers, like KVH, have to help those enterprises with private cloud. This is a checklist of what we can provide to enterprises as our managed private cloud solution. And we chose OpenStack for the platform. I don't have to explain why we chose OpenStack in this room. But as a service provider, we have a lot of choices with OpenStack. That's the biggest advantage.
However, there were challenges for us before going into production: capacity planning and performance validation after deployment, 24x7 support to the customer, and root cause analysis and troubleshooting. One of the biggest challenges was stability of the networking. When we decided to go with OpenStack, Neutron was not so stable, and network node redundancy and high availability in particular were quite sensitive. Another challenge was high availability of the controller. Unless those requirements are satisfied, we cannot say, yes, we are 100% confident in our managed private cloud.

So this was our choice: we chose MidoNet for networking, and we chose Mirantis for the controller. The reason we chose MidoNet is that its architecture is really scalable, there is no single point of failure, and they are aligned with OpenStack and 100% committed to Neutron. Mirantis already has more than 100 deployments of experience, and they have a proven architecture for the controller. This is a comparison chart between MidoNet and the other solutions. As you can see at the top, OVS, the kind of default choice for Neutron, doesn't satisfy those requirements. It's really difficult to get tier 3 support with immediate action; we cannot wait for feedback from the community, because the customer's business has stopped and we have to take care of it right away. MidoNet, however, is scalable, has no single point of failure, and is really reliable. The other vendor solutions for Neutron also offer that kind of reliability, scalability, and support. But since our product is a private cloud, it's not large scale; it's not for the public cloud. So we should minimize overhead nodes and overhead compute resources, and MidoNet is really compact but fully functional. That's why we chose MidoNet. Why Mirantis? Mirantis covered almost all of those requirements, so that's why we chose Mirantis for the distribution.

This is what we validated after we decided to go with MidoNet and Mirantis. There are 15 bare-metal servers, nothing virtual, all physical. We installed Mirantis OpenStack and MidoNet, and we installed Ceph as well. There are two Arista switches for the public side and Cisco switches for the layer 3 function. This is what we tested for the layer 3 gateway on MidoNet. We can reach the public network through the MidoNet layer 3 gateway. The MidoNet layer 3 gateway supports BGP, and with BGP we get redundancy and ECMP traffic load balancing. When one peer goes down, the traffic fails over with almost zero impact. We tested different failure scenarios, and in every case the result was the same.

For layer 2 networking, there is a limitation: we can have only one active path. It is not active-active; it is active-standby. So one VLAN, one specific network, can use only one path. That's the layer 2 limitation. But failover works pretty well, and it's predictable, so we understand this is layer 2 and there is a limitation. The problem is that when a failover happens and then a failback happens, there are two downtimes. If the switch is rebooting again and again, with interfaces flapping, the network is going to flap as well. So this area has to be improved. But this is typical layer 2 behavior, so we can control this point.
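To make the failover validation above concrete, here is a minimal, illustrative sketch of the kind of measurement involved: continuously probe a floating IP while a gateway BGP peer is deliberately taken down, and report how long connectivity was lost. The target address and probe interval are placeholders, not values from the talk.

```python
#!/usr/bin/env python
"""Illustrative failover probe: measure downtime of a floating IP while a
layer 3 gateway / BGP peer is taken down. TARGET and INTERVAL are
hypothetical values for the sketch."""
import subprocess
import time

TARGET = "203.0.113.10"   # hypothetical floating IP behind the L3 gateway
INTERVAL = 0.2            # seconds between probes

outage_start = None
try:
    while True:
        up = subprocess.call(
            ["ping", "-c", "1", "-W", "1", TARGET],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL) == 0
        now = time.time()
        if not up and outage_start is None:
            outage_start = now                    # connectivity just dropped
        elif up and outage_start is not None:
            print("downtime: %.1f s" % (now - outage_start))
            outage_start = None                   # connectivity restored
        time.sleep(INTERVAL)
except KeyboardInterrupt:
    pass
```

Run against a test tenant while failing one BGP peer at a time; with ECMP working as described, the reported downtime should be close to zero for the layer 3 case and noticeably longer for the active-standby layer 2 case.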
And this is the high-level architecture for controller HA. As I mentioned, we use Mirantis OpenStack, so you can go to the Mirantis website for more detail. We tried every controller node failure, but it's not easy to break; it's well configured and optimized with Pacemaker, Corosync, and HAProxy. The MidoNet database, with Cassandra and ZooKeeper, co-exists on the controller nodes.

This is the customer voice. When we pitched the product to the market, we got some feedback from customers. Customers want to start from a small environment and then scale out, because they don't want to spend money from day one. Another concern is upgrading their OpenStack: since the release cycle of OpenStack is very quick, they are worried about how to upgrade. Interoperability with legacy or current systems is also required; once we have an OpenStack environment, it should be connected to their firewall or load balancer, so how do we do that? And monitoring: how to monitor tenants, how to handle alerts, and how to get billing data out of OpenStack. Those requirements came in from customers.

This is how to scale out compute and storage. There are two options. Option one is to just add compute and storage nodes within the same OpenStack environment. Option two is to add another region if the current OpenStack environment is getting busy. Our recommendation is option two, because it's already well designed, cookie-cutter, and predictable. With option one, as resources and nodes grow, the load average on the controller and the OpenStack processes gets higher, and we have to carefully monitor those resources. Option two is really predictable, so as of today I recommend option two.

How to scale the network? Since MidoNet uses BGP, it's really easy to scale out. We can add more layer 3 gateway nodes, and we can have more BGP peer interfaces or BGP routers if we want. Traffic is then multipathed, and failover just works. Layer 2 can also scale out, but as I mentioned, because of the layer 2 limitation we can have only one active path per network. So we can add more layer 2 gateway nodes, but we have to control the VLAN IDs to spread the traffic across them.

How to upgrade my OpenStack? There are two options as well, similar to scaling the compute and storage nodes. Option one is a rolling upgrade: we put in a new controller cluster with the newer version and do a rolling upgrade. Option two is to stand up another region with the newer version. Once the new region is up and running and getting stable, we can migrate compute resources from the current region to the new region. This is more predictable, and as I mentioned, it's a cookie-cutter design. Once the compute resources have been migrated from the original nodes to the new region, we can upgrade them (a rough sketch of this snapshot-and-relaunch migration follows below).

This slide shows several patterns for how to connect a customer's existing environment to the OpenStack environment. In scenario one, the customer can enjoy NFV 100%. In scenario two, the customer cannot give away their box-type firewall. In scenario three, the customer cannot give up either the firewall or the load balancer box.
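The option-two upgrade path described above boils down to snapshot-and-relaunch. Here is a rough, illustrative sketch only, assuming both regions share the same Keystone and a common Glance image store; the credentials, region names, instance names, and flavor are hypothetical.

```python
#!/usr/bin/env python
"""Rough sketch of the 'new region' upgrade path: snapshot an instance in the
old region and boot it in the new one. Assumes both regions share Keystone and
a common Glance image store; all names and IDs below are hypothetical."""
import time
from novaclient import client as nova_client

AUTH = dict(username="admin", api_key="secret",
            project_id="demo", auth_url="http://keystone:5000/v2.0")

nova_old = nova_client.Client("2", region_name="region-old", **AUTH)
nova_new = nova_client.Client("2", region_name="region-new", **AUTH)

server = nova_old.servers.find(name="app-01")

# Snapshot the instance in the old region; the image lands in the shared Glance.
image_id = nova_old.servers.create_image(server, "app-01-migration")

# Wait until the snapshot is usable (simplified; real tooling should also
# handle error states and quiesce the guest first).
while nova_old.images.get(image_id).status.lower() != "active":
    time.sleep(5)

# Boot a replacement instance from the snapshot in the new region.
nova_new.servers.create(name="app-01", image=image_id,
                        flavor=nova_new.flavors.find(name="m1.medium").id)
```

In practice the networking (ports, floating IPs, security groups) has to be recreated in the new region as well, which is why KVH treats this as scripted professional-service work today.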
I will show you the detail of each case. Option one is very simple. We don't have to worry about interoperability with a physical box or load balancer. We can enjoy the NFV load balancer with LBaaS, provided by the MidoNet layer 4-based load balancer (a minimal sketch of that LBaaS workflow appears below). Very simple. And for security, we can use security groups. If the customer wants layer 7 load balancing, the customer can put a virtual load balancer into their tenant.

Option two: the customer cannot give away their physical firewall box. In this scenario, the traffic between the compute nodes and the internet, the external network, has to go through the physical firewall. The floating IP address allocation to the VMs has to be coordinated with the firewall: a certain IP address range has to be properly assigned to a certain firewall and certain tenants, and then source-based routing or policy-based routing is required on the layer 3 core switch. So in this scenario we should talk to the customer carefully, and we should test the environment carefully.

Option three: the customer cannot give away either the firewall or the load balancer box. In this case, it's really difficult to get the benefit of the MidoNet layer 3 gateway. As an alternative, we can use the layer 2 gateway, so the cloud environment and the physical boxes are connected through VLANs. But in this scenario, the traffic between a tenant and the public internet has to be controlled on the physical boxes, in most cases manually. If those physical boxes had an LBaaS plug-in, maybe we could control them through Horizon. But this area needs to be improved as a priority.

This is the wrap-up for my part. We chose a reliable and proven network and controller for our private cloud. And there is a wish list for a future release, hopefully sometime soon: layer 2 failover reliability improvements, an easy way to upgrade my OpenStack, more dynamic interoperability with appliance boxes, and billing, monitoring, and audit tools. Those features need to be improved. For now we treat this as professional services: we write scripts and give them to the customer. But if the OpenStack project improves those areas, it would be great for the customers. So we are all ears. If someone in this room has a better answer, please talk to me. Thank you.

And I'm going to share a few slides here just to show some of the tech-preview stuff that we have in MidoNet. Some of this is being built to address some of these issues; not all of the issues are being addressed yet, but this is what we're working towards. Just a quick high-level blurb again: we're basically doing the layer 2 through 4 networking services in a distributed system. We essentially replace the OVS agent with a MidoNet agent, and the MidoNet agent provides much higher-layer services than a standard OVS vSwitch could provide. That's why we replace it, and we talk directly to the Linux kernel. So these MidoNet agents running on every compute host are essentially providing this distributed networking service. Wherever you're running one, that compute host has the ability to run all of the services: distributed routing, distributed NAT, security groups, layer 4 load balancing, and so on. It does get tricky when you start putting in layer 7 or physical appliances. So we do expose some more advanced functionality that the Neutron API doesn't have today.
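The LBaaS workflow referenced in the option-one scenario above would be driven through the standard Neutron LBaaS (v1) API, which MidoNet backs with its distributed layer 4 load balancer. A hedged sketch, where the subnet ID and member addresses are placeholders:

```python
#!/usr/bin/env python
"""Illustrative LBaaS v1 setup of the kind 'option one' relies on: a layer 4
pool, members, and a VIP. Credentials, subnet ID, and addresses are
hypothetical."""
from neutronclient.v2_0 import client as neutron_client

neutron = neutron_client.Client(username="admin", password="secret",
                                tenant_name="demo",
                                auth_url="http://keystone:5000/v2.0")

SUBNET_ID = "11111111-2222-3333-4444-555555555555"  # hypothetical tenant subnet

# Layer 4 pool balancing HTTP across two backend VMs.
pool = neutron.create_pool({"pool": {
    "name": "web-pool", "protocol": "HTTP",
    "lb_method": "ROUND_ROBIN", "subnet_id": SUBNET_ID}})["pool"]

for addr in ("10.0.0.11", "10.0.0.12"):
    neutron.create_member({"member": {
        "pool_id": pool["id"], "address": addr, "protocol_port": 80}})

# The VIP is the address clients (or a floating IP mapped to it) will hit.
neutron.create_vip({"vip": {
    "name": "web-vip", "protocol": "HTTP", "protocol_port": 80,
    "subnet_id": SUBNET_ID, "pool_id": pool["id"]}})
```

Anything above layer 4, such as the layer 7 case mentioned in the talk, would instead go through a virtual load balancer appliance inside the tenant.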
For example, we have access to the full routing tables, so we can do source-based routing or policy-based routing, with filters on these as well. So you have a lot more fine-grained control for the kind of custom scenarios that you probably run into if you're an enterprise user.

We have several customers and users of MidoNet. They range from software-as-a-service or web 2.0-type companies like Getty Images, to public clouds like Auro, which is a Canada-based public cloud, to some enterprises as well; Dell IT, for example, is building an internal IT cloud using MidoNet. On the tech partner side, we work with basically all of the OpenStack distros. KVH has chosen Mirantis and we have several customers using them; we have several customers using Canonical and Red Hat as well. So we see all kinds of different distros in use. And on the hardware networking side, while we are an overlay and we don't typically care about what's running on the physical network — you could run Cisco boxes or Dell networking switches, it doesn't matter — we are building advanced functionality into some capable switches that are out there, which I'll talk about.

Six months ago at the Paris Summit, we actually open sourced all of our software. We made a pretty huge move to do this. We had spent four years building the technology, and we open sourced it all on GitHub under the Apache license, the same license as OpenStack. The reason we did this was that we wanted to provide an open solution for a Neutron plugin that's production grade and very easy to use out of the box, because we continually see people struggling with deploying OpenStack when it comes to Neutron: making it HA, making it scalable. There are a lot of moving components in Neutron, and MidoNet aims to simplify that dramatically while future-proofing you for your scaling as well. So that's our intent. We're hoping to see a lot more adoption. The user survey just came out and we're starting to rise up those charts a little bit, and we're hoping in six months' time we'll see another rise in the networking space as well.

And Midokura, who I work for, basically provides an enterprise version of MidoNet. You can go to GitHub or midonet.org and use the open source version today. You can download it; there are many ways to get started with it. Midokura then provides a downstream enterprise version that we've hardened and tested, and we build some advanced management tools on top. For example, we have a graphical user interface providing some advanced functionality, which I'll give a preview of.

These are the kinds of enhancements we're working on right now. Some of these are being released in our next version, which we're announcing this week, and some are coming in the version after that. These are all short-term things in the pipeline, some of which I think will address what KVH is running into. A lot of the other ones come up on the operational side: when you're operating a cloud and you're using an overlay for the networking, whether it's us or anyone else or the OVS plugin, it's very hard to troubleshoot the network for many reasons. So we're looking for ways to make that as easy as possible, and these are some of the first steps.

So, one of the complaints is about the layer 2 gateway: it's a software-based gateway and you can run it active-standby. This is available currently, and it will let you connect into a VLAN.
The problem is it has a failover time, maybe one to five seconds. That may or may not be an issue for you. And you only have a certain number of ports available on an x86 server today. So we did build in what we call VTEP, or VXLAN hardware gateway, support. There are switches out there, basically using the Broadcom Trident II chipset, that we can program remotely and turn into a hardware layer 2 gateway. The current version of MidoNet has an active-standby model there, so it's still going to have some failover time when one of the switches goes out. But we are building an active-active model right now, and this is something we're actually building with Cumulus Linux.

I don't know how many people have heard of Cumulus before — some of you. Basically, Cumulus Linux is a switch operating system that installs on switches. You can buy a switch from Dell or HP or Edge-Core or many other vendors today, typically white-box or brite-box cheap top-of-rack switches, and you can now choose the operating system you want to run on it. You may want to use, say, the Dell networking operating system, which was Force10, or you may want to use Cumulus. Dell, for example, is starting to offer up to seven switch operating systems. So switches are turning into servers, where the hardware and the software don't have to be tied to each other: I don't have to buy a switch from a particular vendor and be locked into its operating system. Cumulus has been one of the early movers in that space. They provide a Debian-based Linux on top of your switch — you SSH into it and it's essentially Debian — and they provide the magic of connecting into the ASICs. So when you run any networking on it, it's all actually happening in the Broadcom chipset.

We're working with Cumulus, on their next version and our upcoming version, to provide active-active hardware layer 2 gateways. This will allow us to program top-of-rack switches and run them in active-active mode, using MLAG. That part is vendor-specific, so we're starting with Cumulus and we'll work with other vendors later, because everyone implements MLAG differently, unfortunately. For active-standby, this could work across many different vendors — it's not vendor-specific — but for active-active it will be. But this is something we're building that could hopefully address some of the issues that KVH has.

For the troubleshooting tools, this is something that comes up a lot as people go into production more and more. They run into issues and they want to have more visibility into the network. So we're starting to put together more tools to make it as easy as possible to troubleshoot, or just to get a view of what's going on. What this is is a screenshot of our MidoNet Manager, which is part of the enterprise version. Basically, we're providing aggregated metrics in MidoNet — open source MidoNet provides aggregated metrics — we put them into a time-series database, and then we graph them in our GUI. So when you click on a router, a logical router, or a logical bridge, which would be like a Neutron network or a subnet, you're going to see an aggregated view of the traffic coming in and out of that device. You're also going to see sparklines for all of the ports, and you can do this on the physical hosts as well, so you can see the traffic in aggregate going through the hosts.
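The same aggregated metrics can be pulled out of the time-series store directly rather than through the GUI (the store is Prometheus, as comes up in the Q&A later). A minimal sketch using Prometheus's standard HTTP query API; the metric and label names here are assumptions for illustration, not the actual schema.

```python
#!/usr/bin/env python
"""Illustrative pull of per-device traffic metrics from the time-series store.
The Prometheus endpoint, metric name, and label names are hypothetical."""
import requests

PROM = "http://prometheus.example.local:9090"

# Hypothetical metric: bytes per second received on ports, summed across
# one tenant's logical devices over the last 5 minutes.
query = 'sum(rate(port_rx_bytes_total{tenant="demo"}[5m]))'

resp = requests.get(PROM + "/api/v1/query", params={"query": query})
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    print(series["metric"], series["value"])
```

This is the kind of query that would back the per-router and per-host graphs described above, and it can feed external monitoring or billing tooling just as easily.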
This gives you a good idea of whether traffic is flowing through a gateway right now. Or if you have a certain tenant that's much more active than the others, you can look at all of the tenant routers and see which one has the most traffic, so you can see which tenant is being attacked. There are many ways to do this, but we are looking for ways to provide this kind of information with as few clicks and commands as possible.

The other one is something we have offered previously in MidoNet: a virtual traceroute tool. On the command line you could run mm-trace and set a rule, basically a filter — say, I want to see a trace for traffic destined for 8.8.8.8, or something like that — and we'll show you what that simulation looks like in MidoNet, going through every step of the logical topology, how the packets are transformed, and whether they were dropped or forwarded. Previously, you had to log in to each compute host individually and run this command to get the results. That's a little bit cumbersome, because in an overlay there are two sides: you have the compute host and you have the gateway, or, if traffic is going between two compute hosts, those two hosts. So you'd have to log in and run two different traces to see both ends. It's slow and cumbersome and not great for network ops teams at all. So we decided to make this a centralized tool. We built it into the GUI, and now you can easily create and define different filters, different trace requests. You can see an example here: you have access to basically all of the elements of a layer 4 tuple and can create traces this way. You create it, you hit start, it collects some specified number of traces across the entire system, and then it displays the results.

This is an example of one trace, where it shows you every single hop of the virtual network — and all of these hops are essentially happening in one physical box. MidoNet is providing this network simulation, which goes through all of these steps — routing between tenants, applying floating IPs and security groups — all in a single hop. So it's not bouncing off of physical boxes or VMs to do this. And this gives you the results in a comparatively simple read. If you're looking at debug logs, it's going to be kind of hard to parse; this is much easier to look at very quickly and tell: successfully created flow, great. So I know that logically this should be working. If it's not reaching the other end, if I don't see the matching flow in here, then I know that maybe there's a problem on the physical network between those two hosts, and I can start looking at that.

Another thing we're starting to do is provide flow history. We have access to all of this data, and we're building — this is coming in a future version, pretty soon — the ability to send all of the flow data to a centralized data store. The demo we've done so far uses Elasticsearch, Logstash, and Kibana, or ELK: basically sending the data to a centralized Elasticsearch/Logstash cluster, which stores all of it in aggregate, and then Kibana is the graphical user interface.
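For teams that prefer an API over the dashboard, the same flow records can be pulled straight out of Elasticsearch. A hedged sketch of the "a user lost connectivity 30 minutes ago" case described next; the index name and document fields are assumptions, not the actual schema shipped with the demo.

```python
#!/usr/bin/env python
"""Illustrative query against the centralized flow-history store. The index
pattern, tenant ID, and field names (tenant_id, action, @timestamp) are
hypothetical placeholders."""
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://elk.example.local:9200"])

TENANT_ID = "demo-tenant-id"  # hypothetical tenant UUID

# Pull that tenant's dropped flows from a window around the reported outage.
query = {
    "query": {"bool": {"must": [
        {"term": {"tenant_id": TENANT_ID}},
        {"term": {"action": "dropped"}},
        {"range": {"@timestamp": {"gte": "now-40m", "lte": "now-20m"}}},
    ]}},
    "sort": [{"@timestamp": "desc"}],
    "size": 50,
}

for hit in es.search(index="midonet-flows-*", body=query)["hits"]["hits"]:
    print(hit["_source"])
```

Each returned document would carry the full flow record, which is what lets you tell, for example, whether a newly added security group rule was the thing that started dropping the traffic.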
This is an open source project, and this is the Kibana interface you're seeing here. What we're looking at is essentially a graph of all of the flows across the system. You can apply filters, so I can say: show me all the flows that belong to this particular tenant, or belong to this particular host, or are destined for this particular IP address. You can mix and match however you like. It's very powerful, and you can look at time frames — you can zoom in on a particular time frame. You can actually use this to find the exact flow that you're looking for historically. So if you have a user who calls you up and says, hey, I lost my connection 30 minutes ago, what was going on? — you can use this tool to look back 30 minutes, find that exact flow, and see what happened, and the result will be the entire flow record. You'll get all of this data and you'll be able to figure out what happened. If it was dropped intentionally, if someone added a security group rule, for example, you can use this to deduce those types of things, which are common operational situations you might run into.

And we're going to take this data — so this demo is in Logstash, but in the enterprise version we're going to store it in Cassandra and use Spark to do analytics on it. We're going to put that into our GUI and make it much more contextual. You see all of these UUIDs here; that's because Kibana doesn't know what those UUIDs equate to, but we do. So we can start connecting them to other data in the system, and if you hover over a UUID, you'll see the resource it belongs to and get all of the data without jumping back and forth. We're going to be demoing this kind of stuff at our booth from tomorrow. So if you're interested in seeing it live, our booth is T7 — definitely stop by and check it out. We'll be there tonight at the booth crawl as well.

So now we're opening it up for questions for either of us. Any questions? Yep. [Audience question, inaudible.] It's Prometheus, which is from the SoundCloud guys. We have a blog post — if you go to blog.midonet.org, there's a post on how to actually set it up. It's very cool. It has events and alerts and all kinds of stuff. Any other questions? It's hard to see. Yeah. Yeah. So, are you going to separate that out from the existing Cassandra control plane — is there a worry about that? I mean, right now we're basically not putting a lot of control plane data into Cassandra; most of it goes into ZooKeeper. But yeah, it definitely could go into a separate cluster, especially if your data retention policy is to store a ton of data — this could fill up very quickly. So it could definitely be sent to a separate cluster for that purpose; it would make sense to do so. Yeah. Any other questions? It's hard to see. Okay. So basically, all of the APIs and the interfaces are exposed through the open source project. For example, for the metrics aggregation, we put it into Prometheus, which is another open source project. So those could definitely be brought into other tools like ManageIQ, no problem. We're designing them to be very generic and pluggable into whatever tools you use.

Hello, I've got a question. You mentioned the physical switch support, and you mentioned that you are working with Cumulus Linux to support it. Does that mean you are going to implement some specific feature in Cumulus Linux, or are you basing the support on some specific protocol — would it be OVSDB or whatever?
How do you implement this support? Good question. So, for the current VTEP functionality we're using OVSDB, so it should work across any of the switches supporting OVSDB. But for active-active, the only way we can achieve that currently is by using MLAG, and MLAG is designed differently by every single vendor, because it's not a standard. So for that functionality it's going to be Cumulus-specific. And we are looking at building more features into switch fabrics. These tools are great — they give us visibility into the overlay, but no visibility into the underlay. So we're working with Cumulus Linux initially to build a switch agent that will run inside Cumulus Linux and can provide metrics, congestion detection and reporting, and things like that from the physical network. Initially we're working with Cumulus because we have a lot of joint users and we like working with those guys, but it shouldn't be something that's tied to just one distro. So we're looking at all of the other ones that are out there, like Pica8, and there are some others starting to come out. We're looking at those and, based on customer traction, seeing how we should prioritize working with them. But that's kind of what we're thinking. Any plans to support Pluribus Networks or Cisco for this? We're open to talking to all of those guys, for sure. Yeah, definitely. Thanks. Okay, that's it. Great. So thanks, everyone. Thank you very much.