I would like to talk about that. We were interested only in the network virtualization side of OpenContrail, and it also involves OpenStack, because our use case was data center orchestration and network virtualization in those data centers. So that is what I'd like to talk about today.

What's the plan? I'd like to introduce OpenStack a bit: what it is, why it was invented, what problems it solves. Then I'll focus more on OpenContrail, which is where most of our work actually went, and talk about why it was invented, what kind of problems it solves, and how it does it, so I'll try to describe its software architecture a bit. At the end of the presentation I'd like to present the status and the next steps that have to be done.

Why am I talking about OpenContrail in this presentation? Because one of the tasks I was involved in was porting OpenContrail to FreeBSD. You may know these two systems, probably OpenStack better than OpenContrail, but they had been available only on the Linux platform. We wanted to change that, and we actually did.

When it comes to a data center, we have a picture like this one. We have many hypervisors, which are physical servers, physical machines, and they host many virtual machines that make up user applications. When you are a cloud provider you will have plenty of users, and each user may have plenty of virtual machines, because his application may involve a separate database on one machine, some load balancing on another, and that kind of thing. So you end up with many virtual machines that are transparent to the user; he just wants to use them like regular servers. And when you are managing this kind of data center, you have to decide on which hardware you want to run which VM. That is important, because a given server may not have enough resources to host yet another VM. Managing this, and setting up networking between those VMs, can be challenging, so something that manages it has to be introduced.

We can see that it is quite similar to what we have on a single machine. There we also have resources, CPU cores, memory, storage, networking, and the operating system manages them on our behalf. This is exactly the same situation, but with many machines. And what was created for it is OpenStack. OpenStack calls itself the cloud operating system, and I think that is pretty accurate, because it is actually what it does. They divided it into three major functionalities: compute, networking, and storage. I'll focus on the first two, compute and networking. Compute, because we have to be able to spawn a VM on a given machine, so it is necessary; and networking, because that is what we were most interested in, as OpenContrail is an external solution for handling complex scenarios and very large systems where you need flexibility in configuring and managing networking.

So let's start with OpenStack. As I said, it aims to manage those three things. Compute and storage have already been very well supported by OpenStack, but when it comes to networking, the stock implementation in OpenStack is not good if you have a large number of systems. It simply doesn't scale.
So in terms of networking you need some external solution that will allow you to maintain your data center and its expansion. For the simplest scenarios, though, the built-in networking works very well. So we ported the networking implemented natively in OpenStack to FreeBSD, and we also ported OpenContrail, which allows for more sophisticated networks.

OpenStack is composed of many, many components. The most important one for compute is Nova. It implements everything necessary to schedule a VM on given hardware, talk to the hypervisor, and spawn the VM. It also provides the networking management facilities, but only for the simple networking scenarios. Neutron is the component which provides the networking service; it may be implemented by external systems such as OpenContrail via a plugin mechanism. And there are others, like Glance, which holds the images, because if you want to spawn a VM you have to fetch its image from somewhere. So there are plenty of other components, but our interest and our focus was the compute node components, because they are the ones that depend on the underlying platform. Practically every component in OpenStack is written in Python and uses standard tools and standard libraries, so if you have those libraries you can run it on any platform you want. But there are exceptions where you have to talk directly to the operating system, for instance to the hypervisor. And since we have bhyve in FreeBSD now, we can use FreeBSD as the underlying platform for OpenStack.

So I'll now focus on this compute node and say a few words about it. How does the compute node work? We have a hypervisor, bhyve, and we have to be able to control it. The simplest way is to use an abstraction like libvirt, which is fortunately available on FreeBSD. We did some development around it, but the initial port of libvirt to FreeBSD and the support for bhyve had already been done, which saved us a lot of work. libvirt is controlled by the Nova Compute process, a Python daemon which uses the libvirt library bindings to spawn VMs. It also talks to another process, Nova Network, which is responsible for setting up networking: in this case it creates bridges, creates VLAN tags, assigns addresses to interfaces, creates tap devices, et cetera. Those three components are necessary, and the ones marked in violet here were, well, not missing, but not working on the FreeBSD platform, so all three of them had to be reworked. In terms of OpenStack components it is Nova Compute and Nova Network where we put most of our effort. Of course there was also libvirt, but libvirt didn't require as much work as those two. So Nova Compute is responsible, as I said previously, for spawning and destroying VMs, and Nova Network is responsible for setting up the networking.

How does it actually work? Let's analyze an example. When the Nova scheduler decides that a given VM should be spawned on a given host, Nova Compute on that server fetches the image from the Glance service and builds an XML description of the domain that is to be spawned. This XML is a libvirt-defined format that you have to create if you want libvirt to spawn the VM.
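To make this a bit more concrete, here is a minimal sketch, not the actual Nova Compute code, of how a Python daemon can use the libvirt bindings to define and boot a bhyve guest from such an XML description. The domain name, image path, and bridge name below are made-up placeholders.

```python
# Minimal sketch: define and start a bhyve guest through the libvirt bindings.
# Not the real Nova code; names and paths are placeholders.
import libvirt

DOMAIN_XML = """
<domain type='bhyve'>
  <name>demo-vm</name>
  <memory unit='MiB'>512</memory>
  <vcpu>1</vcpu>
  <os><type>hvm</type></os>
  <devices>
    <disk type='file' device='disk'>
      <driver name='file' type='raw'/>
      <source file='/vms/demo-vm.img'/>   <!-- hypothetical image path -->
      <target dev='hda' bus='sata'/>
    </disk>
    <interface type='bridge'>
      <source bridge='bridge0'/>          <!-- bridge prepared by Nova Network -->
      <model type='virtio'/>
    </interface>
  </devices>
</domain>
"""

conn = libvirt.open('bhyve:///system')   # connect to the local bhyve driver
dom = conn.defineXML(DOMAIN_XML)         # register the domain with libvirt
dom.create()                             # actually boot the VM
print(dom.name(), dom.isActive())
conn.close()
```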
In the meantime, Nova Network configures the bridges and everything else necessary for the networking, and once libvirt spawns the VM, it puts the tap devices that correspond to the interfaces inside the guest on the proper bridges, because each bridge corresponds to one virtual network in this case. So this is a very simple scenario that gives you some flexibility in spawning your VMs on different hosts and having some simple networking between them.

Just to summarize what we did in terms of development: first, like I said before, libvirt. That is the work of Roman Bogorodskiy (I hope I pronounced it correctly). He did the initial bhyve support in libvirt, and he also made some hacks around qemu-img, which Nova Compute actually needs for things like converting between image formats. We did some adjustments in Nova Compute to allow it to use the bhyve hypervisor, because the code responsible for generating the XML has to take into account that we are now using a different hypervisor, and there were some Linux-specific things, like mounting things inside sysfs, which we do not have in FreeBSD. Those were the adjustments to Nova Compute. When it comes to Nova Network, it actually works by executing command-line tools on the given platform. In the Linux case that was brctl and the ip tool. We do not have those in FreeBSD, so we had to write our own driver which uses ifconfig for this, as shown in the sketch a bit further down. There is also a big difference in how dnsmasq is executed, which serves DHCP and DNS for the VMs in this case, so we also had to make some modifications to that code.

Last but not least is DevStack. DevStack is a script, a huge script, that is supposed to install, configure, and run an entire OpenStack cluster. If you run the whole OpenStack cluster on one host, it consists of more than 20 processes, and each of them has to be properly installed, configured, and executed. That is what DevStack does, and of course there is no support for FreeBSD. Unfortunately it's a bit of a pain to work with DevStack, because there are a lot of differences even between flavors of Linux distributions: on Fedora, CentOS, or Ubuntu the packages may be named differently, and you have to cope with that. In FreeBSD things are totally different. So unfortunately our version is still diverging; we haven't yet merged our changes upstream into DevStack, and upstream changes keep breaking ours. It's a bit of a pain, but we have to have something that can easily set up a cluster, for example for development purposes.

So that was what we did in terms of OpenStack: we created the FreeBSD parts of Nova Compute and Nova Network. But our goal was to have more sophisticated networking, so now we come to OpenContrail, which actually serves the networking part of OpenStack.

Let's have a closer look at what we have in a typical rack in the data center. We have a switch which connects the physical machines, and inside those physical machines we have the VMs. So we see that even if we have only one physical network endpoint, we may have several virtual, logical endpoints associated with it, because several virtual machines run on that host. And this is only one rack; in reality you will have many more racks and many more servers.
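Coming back for a moment to the Nova Network driver mentioned above, here is the sketch I promised: a hypothetical helper, not the actual driver code, showing the kind of ifconfig invocations it has to issue instead of brctl and ip on Linux. Interface names, the VLAN tag, and the gateway address are made up.

```python
# Hypothetical sketch of FreeBSD bridge/tap/VLAN plumbing done with ifconfig,
# roughly what a Nova Network driver has to do instead of brctl/ip on Linux.
import subprocess

def run(*cmd):
    print('#', ' '.join(cmd))
    return subprocess.check_output(cmd, text=True).strip()

def plug_vm_into_network(phys_if='em0', vlan_tag=100, gateway_cidr='10.0.0.1/24'):
    bridge = run('ifconfig', 'bridge', 'create')            # e.g. "bridge0"
    tap = run('ifconfig', 'tap', 'create')                   # tap handed to bhyve
    vlan = run('ifconfig', 'vlan', 'create',
               'vlandev', phys_if, 'vlan', str(vlan_tag))    # 802.1Q uplink
    run('ifconfig', bridge, 'addm', tap, 'addm', vlan, 'up') # join them on the bridge
    run('ifconfig', bridge, 'inet', gateway_cidr)            # gateway used by dnsmasq
    return bridge, tap

if __name__ == '__main__':
    print(plug_vm_into_network())
```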
And you have to provide network connectivity between all of those racks and servers. When it comes to the physical network connectivity, this is an example of a top-of-rack architecture: you use a typical Clos network to connect all of the physical hosts to each other. So it looks like the problem of physical connectivity is solved. But the virtual endpoints may migrate from one host to another, maybe first in one rack and later in another one, and the packets for those virtual endpoints have to cross this physical network. That may not be as easy as it seems, because in the typical scenario it may involve reconfiguring the switches, and so on.

The very important observation is that in contemporary data center installations the majority of the endpoints are virtual, so we need to take special care of them. We also have to have isolation between them, because you may have several users, and a user may want isolation between the front-end and back-end systems of his application, whatever. We have to provide all of this, and we don't want to change the physical network while doing it, because we want to build the data center and then let users decide what they want to do with it. We don't want to be forced to make hardware changes for that.

There are many solutions; I would like to present two of them. One is bridges and VLANs, which is what OpenStack does by default and what I was talking about previously. The other is overlay networking, which is what OpenContrail does. I'll try to compare them, and in a minute we'll see why the latter is a much better solution.

Let's start with VLANs. With VLANs we just put the VMs on the hosts and divide the virtual networks by VLAN tags and bridges. We have a limit here: the VLAN tag is only 12 bits, so if we want more than 4,096 virtual networks we hit a problem. We can of course overcome this using shortest path bridging, but it is not very flexible and it is difficult to manage. And what is important, the physical switches, of which there may be a lot and which may be very expensive, have to keep the state of the virtual networks in the system.

Why is that? Let's take a simple example. We have three servers here, servers 1, 2, and 3, with VMs on each, and I distinguished the different virtual networks by color, so we have a red and a blue virtual network. Let's assume you want to send a packet from VM1, in the top left corner, to VM9, on the blue network, at the bottom. What we'll see on the wire is an Ethernet header with the destination MAC address, a VLAN tag, and the IP address of VM9, and this packet will hit the switch. The switch has to know on which port this VM is currently reachable. The switch learns where the physical hosts are connected, but if we put the addressing of the virtual network on the wire, it also has to somehow know that it must forward this packet to port 3. The problem is that when VM9 migrates from server 3 to server 2, the switch must no longer forward packets for VM9 to server 3 on port 3; it has to forward them to port 2. So somehow it has to learn that this migration has happened. Of course we know the migration happened, because we have the OpenStack orchestration system, and it decides where the VM is now located.
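Just to illustrate "what we see on the wire" in this VLAN-based scheme, here is a tiny sketch with scapy; the MAC address, VLAN tag, and IP address below are made-up placeholders.

```python
# Illustration only: the frame on the wire in the bridges-plus-VLANs scheme.
# MAC, tag, and IP are made-up placeholders, not values from the slides.
from scapy.all import Ether, Dot1Q, IP

frame = (Ether(dst='00:bd:9e:00:00:09')   # MAC of VM9
         / Dot1Q(vlan=100)                # tag of the "blue" virtual network
         / IP(dst='192.168.0.9'))         # IP of VM9 inside the tenant network
frame.show()
```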
So the information is there, but it has to somehow be propagated to those physical switches, to all the switches that take part in the path of the packet. There are solutions for this. For instance, the OpenFlow standard tries to standardize the way controllers can talk to the switches, so that when a VM migrates the switch can be reconfigured automatically. That is one solution, but there is a better one, which OpenContrail provides, and it is based on overlay networking. It is by no means a new technology; it has been known in the industry for years, but its use in the data center is relatively new.

What it does is separate the physical network and the logical, or virtual, networks from each other. The underlying physical network is called, in OpenContrail nomenclature, the IP fabric, and it contains no tenant state. You have no information about virtual networks, about anything, in the physical network. All the state information, except for gateways, but maybe more about that a bit later, is contained in the virtual overlay networks. OpenContrail uses MPLS over GRE, VXLAN, and MPLS over UDP as tunneling types, because these overlay networks are created as tunnels between the hosts running the VMs.

So let's go back to the same example, but now let's see what happens when we have an overlay network. If VM1 tries to reach VM9, the vRouter, the software on server 1's hypervisor, encapsulates the packet for VM9 with a header addressed to physical server 3. As I previously mentioned, this information is available in the controller; we know it because we have components of the whole system running in software on those hypervisors, so we know this VM is now running on server 3. So we encapsulate the packet for VM9 with server 3's header and put it on the network. The fabric is just an Ethernet and IP network, so once the physical servers are connected, the switch learns, using ARP, where each server is, and it knows on which port to put packets. When the VM migrates from server 3 to server 2, we still have the same VM, but we no longer put the same packet on the wire as in the previous VLAN example. Because now we know that this VM is on server 2, we encapsulate the packet with server 2's Ethernet and IP headers and send it to the physical switch. The switch knows where the physical host is, so it forwards the packet to port 2, and the packet reaches the server that is currently hosting VM9.

There are clear advantages to this solution, because the knowledge about the virtual networks is only in software: the state of the virtual networks lives in the controller and in the compute node components. And what is really nice is that any switch will work for the IP fabric. You don't need any special means of configuring the switch; any switch will work, and only speed matters, actually. The faster, the better. And since it doesn't need any sophisticated configuration capabilities, it may be cheaper than before. In the case of OpenContrail this whole scheme is based on standard protocols, which makes it very easy to interoperate with existing equipment in the data center.

So let's see how OpenContrail is built. Here is an architectural overview of the entire OpenContrail system. From a high level it is composed of two things. One is the forwarding plane, which is the vRouter.
And it is available on every compute node, because the compute nodes are the machines actually used to run the VMs, and they have to host some OpenContrail components; so the vRouter forwarding plane is put on every hypervisor, on every server that hosts any VMs. The second part is the controller, which of course is built from different components; I'll talk about them in a second, but we can distinguish those two things. The controller is logically centralized but physically distributed. This allows for scalability, because every component of the controller works in an active-active manner. If you are running out of resources or something like that, you just spawn another VM, because the controller components may run in VMs or on physical servers, so we can add more hardware and everything scales out very easily.

So let's walk through those components a bit. At the very top we have the configuration node. The main task of the configuration node is to provide the API for the user or for the orchestrator. In this case, when you use OpenStack, you use Neutron and a plugin which talks to the configuration node. And you talk to it using a very high-level description: you just say that you want VM1 and VM2 in virtual network A, you want to allow connectivity from virtual network A to virtual network B, you allow or deny virtual machine 3 access to the outside world, things like that. You use very high-level primitives to describe the state of the system. This state of the system is held in a database, Cassandra in the case of OpenContrail. It was chosen because it scales easily, so if you run out of space or performance you can add load balancing or shard the database without any problems. The REST API server is, of course, serving the API, so it receives the requests from the orchestrator. And a very important piece is the schema transformer. It is essentially a kind of compiler: it compiles this high-level description of the network and of the whole system state into lower-level primitives like routing instances, next hops, et cetera. The schema transformer transforms the description of what we want to achieve into a description of how to achieve it in the given cluster we built. OpenContrail uses IF-MAP to broadcast this information to the control nodes, so if something changes, if the user changes the configuration, the IF-MAP server broadcasts it to the control nodes.

The control nodes are responsible for building the actual routing information and communicating it to the compute nodes. They use XMPP for this purpose, and they also get some information back from the compute nodes, for instance if a compute node decides to proxy protocols like ARP, DNS, or DHCP. Besides communicating with the compute nodes, each control node also communicates with the other control nodes using the BGP protocol, and it can communicate with regular hardware equipment: any switch that understands NETCONF or BGP can also talk to the control nodes, so they can exchange routing information and cooperate very smoothly. Of course there are multiple control nodes, at least two; I mean, one compute node connects to at least two of them for redundancy reasons, and all of those control nodes are active.
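Stepping back to the configuration node API for a moment: to give a feel for what those high-level primitives look like from the orchestrator's side, here is a rough sketch using the Neutron client. The credentials, URL, and names are placeholders; with the OpenContrail plugin behind Neutron, these objects end up as state in the configuration node.

```python
# Rough sketch of the high-level view the orchestrator works with: it only says
# "create virtual network A with this subnet" and never touches any switch.
# Credentials, URL, and names below are placeholders.
from neutronclient.v2_0 import client

neutron = client.Client(username='admin', password='secret',
                        tenant_name='demo',
                        auth_url='http://controller:5000/v2.0')

net = neutron.create_network({'network': {'name': 'net-A',
                                          'admin_state_up': True}})
net_id = net['network']['id']

neutron.create_subnet({'subnet': {'network_id': net_id,
                                  'ip_version': 4,
                                  'cidr': '10.1.1.0/24'}})
# Ports created in net-A are then wired to VMs by Nova; the plugin translates
# these API objects into configuration-node state behind the scenes.
```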
So if one control node goes down, another automatically takes over the job the first one was doing, so we are very fault tolerant here.

Now we come to the most important node, the compute node, which actually required some work to make it usable on the FreeBSD platform. This is a picture similar to the one I presented when I was talking about the OpenStack compute node; it's almost the same. We have Nova Compute, we have libvirt, and we have bhyve and the VMs, so that hasn't changed. But now we have some new components. We have a Nova VIF driver, which is supposed to communicate the state of the VMs, which Nova Compute knows from the OpenStack controller, to the agent, because the agent needs to know which port, which tap device, is associated with which VM. It has to know this in order to connect the right VM to the right virtual network. Then there is the vRouter agent, a user-space process which is in effect the distributed part of the control node. It handles the proxying of ARP and DHCP, because these kinds of protocols are not sent over the IP fabric network but are handled inside the compute node, right there. And the main task of the agent is to communicate with the vRouter, which is a kernel module. The vRouter doesn't have any intelligence in itself; it just has forwarding tables, flow tables, and that kind of thing. It just moves packets from one port to another, from one VM to another, does the encapsulation, and so on. It is controlled by the agent, and they communicate with each other using Netlink.

Netlink is a Linux thing, not available in FreeBSD. Fortunately, the communication between the agent and the vRouter only used Netlink without any of its sophisticated features, just the headers and the transmission control, so we use the same headers and use the sequence numbers to acknowledge that a given part of the communication has been received by the vRouter or by the agent. Then there is the flow device, which is just a shared memory region between the agent and the vRouter, used for the flow tables. The flow tables are hash tables, and both the agent and the vRouter want quick access to them, to quickly find which flow should go where, so those hash tables are shared between the agent and the vRouter via this device. And there is the pkt0 device, which is just a tap device used when a new flow is discovered. If the vRouter gets a packet for which no proper flow is set up yet, that first packet of the flow is sent up to the agent; the agent sets up the correct flow entries, tells the controller about the new flow, and so on.

So these are the elements, again marked here in violet, that all had to be written, ported, or modified to make them work in the FreeBSD case. What we actually support on FreeBSD is only one mode of operation, and that is tunneling via MPLS over GRE. I would like to show an example very similar to the one from when I talked about overlay networks, but much more concrete, because now we have all the nodes I have already spoken about: the configuration and control nodes and the compute nodes. So how does it all work?
When a VM is spawned on server 1, here on the left, the agent informs the controller about the VM with IP address 10.1.1.1: it sets up the next hop for this VM, pointing at the physical address of the server. That way the controller knows that if somebody wants to reach this VM at 10.1.1.1, the packets should be encapsulated toward server 1's physical address, and MPLS label number 39 should be put in the MPLS header. The same happens for server 2, of course with different IP addresses and different labels. Then, if the VM on server 1 wants to reach the VM on server 2, the vRouter looks into its forwarding table and sees that the next hop for 10.1.1.2 is 15.1.10.1, so it knows it has to encapsulate the packet in GRE and add MPLS label number 17. And that is what we actually see on the network. When this packet arrives at server 2, it is decapsulated by the vRouter, and the MPLS label identifies the virtual network the VM is connected to. This is quite a nice property, because those labels are local to a compute node and to the virtual networks on that compute node. So even though there is a limit on MPLS label numbers, we will probably never reach it, because the labels matter only locally. It's not like the VLAN case, where the 4,096 tags were used globally; here they are significant only locally, so we may never run out of them, and this also helps with the scalability of the system.

There is also yet another node that is part of the OpenContrail solution, the analytics node. It doesn't take part in the actual transmission of the packets or in deciding how they are forwarded, but it is very useful for analytics and for debugging, because every event that occurs in the OpenContrail system is reported to the analytics node. There is a query engine there, essentially a simple map-reduce pattern, that can pull out any information, and there is a special query language for it. You can extract information about how many packets were transmitted at a given time in a given virtual network, how many reached a given virtual machine, how many of them were encapsulated in what way. Everything you want to know is available via the analytics node, so it is a very, very nice thing. But the analytics node and the controller are, as in the OpenStack case, just user-space applications, not really dependent on the underlying platform. So from our point of view the compute node was the most important and most interesting one.

For the FreeBSD development we created a new vRouter kernel module. There are some common parts in it, like dp-core, which takes care of encapsulation, decapsulation, and that kind of stuff, and everything else went into the FreeBSD subdirectory. When it comes to the agent development, there were some differences between the ioctl-based tap device manipulation on FreeBSD and Linux, so we had to handle that. We had to change the shared memory: instead of the flow device they use on Linux, we just use a regular file mapped by both the vRouter and the agent. And we had to create a listener, a module that learns about changes of the network state on the host. They use Netlink for this, but we don't have Netlink, so we use routing sockets (PF_ROUTE) for this, and we had to implement it from scratch.
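Going back to the wire format for a moment: to visualize the MPLS-over-GRE encapsulation from that example, here is a small sketch with scapy. The outer fabric addresses are placeholders; the inner addresses and label 17 are the ones from the example above.

```python
# Illustration of the MPLS-over-GRE encapsulation from the example.
# Outer (fabric) addresses are placeholders; inner addresses and the label
# come from the slide example (10.1.1.1 -> 10.1.1.2, label 17).
from scapy.all import Ether, IP, GRE
from scapy.contrib.mpls import MPLS

pkt = (Ether()
       / IP(src='192.168.100.1', dst='192.168.100.2')  # server 1 -> server 2 (IP fabric)
       / GRE(proto=0x8847)                             # 0x8847 = MPLS unicast in GRE
       / MPLS(label=17, s=1, ttl=64)                   # label advertised by server 2
       / IP(src='10.1.1.1', dst='10.1.1.2')            # the tenant packet, VM to VM
       / b'payload')
pkt.show()
```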
On top of that porting work, we had to do a lot of refactoring to abstract away the differences between, for example, the names of the fields in the network header structures. There are differences there between Linux and FreeBSD, and we had to take care of that and abstract most of it out.

And what is left to be done? We need some improvements, because the current bhyve support in libvirt is not very complete; we can only spawn FreeBSD guests at the moment, actually. We need some OpenStack improvements as well, but there we depend on what libvirt can actually do, so in the first place we have to put some effort into libvirt. We do not support the firewall in Nova, because the current implementation is only for iptables; we have pf, ipfw, and ipfilter, but we haven't done that yet. And if you want to use this, you still have to use our fork of Nova. We need to complete all of this, because if the support is not complete, OpenStack doesn't want to integrate it into their official repositories. We also don't support MPLS over UDP or VXLAN, which are supported on the Linux platform, so this also has to be done. And a lot of work has to be put into automatic provisioning, into DevStack and the Contrail setup scripts, because on FreeBSD they still suffer from many issues.

And that's actually all I have prepared. If anybody has any questions, I'll be happy to answer. Yep?

Hi, thank you for your talk. Do you have any ETA in mind for when you think you're going to get this production ready and integrated with OpenStack? What do you mean by ETA? ETA, estimated time of arrival. Oh, OK. So the OpenContrail part is already merged into the official repositories, so it is available there. You have to use the forks of the OpenStack components, because those have not yet been integrated, but this is all available on GitHub, so you can just take it and try it by yourself. Unfortunately it may not work without some fiddling, especially when it comes to the provisioning scripts and DevStack, but the rest is there and you can use it. So I don't know how many free cycles we will have, because we are now focused on different development in OpenContrail, not the part related to FreeBSD; that was just one task. But we'll try to complete it, get this stuff merged upstream, and have it supported everywhere. Yeah?

So, do you know that there's already work being done to get VXLAN on FreeBSD, that there's someone actually currently working on that? No. OK, so I'll connect you with them later. And the other question I had is about the earlier network example, where you said you create all these MPLS tunnels or GRE tunnels or whatever you're doing, which is incredibly wasteful of bandwidth. So why don't you simply use a gratuitous ARP to update the switch? Sorry, could you repeat that? If you move a virtual machine, you know you've moved it, you've taken that action yourself, so you could poison the old ARP entry in the switch by sending a gratuitous ARP; you could just tell the switch, instead of creating a million tunnels. I believe the whole stack does it this way because it normally runs on Linux and just follows the Linux way, and it is very hard to change. If we ported it and started changing that, it would probably be hard for us to keep up, and they are going to use those GRE tunnels all the time anyway. OK, so thank you very much.