Hello, everyone. Welcome to my presentation. My name is Naveen Joy, and I am an upstream contributor with OpenStack. My presentation is about using a reliable etcd-based messaging architecture for OpenStack. Here is what I'll be talking about. First, I will give you an overview of VPP and networking-vpp, the projects I'm currently working on. Then I will walk you through what etcd is and how we are using it in networking-vpp. And finally, I will show you how etcd could be used in OpenStack more broadly and what its benefits are.

So what exactly is the problem here? If you have worked with Neutron, you may be aware that OVS is the default virtual switch in OpenStack deployments today. OVS works very well for standard cloud use cases: if you want to push 10 gigabits of bulk traffic with large packets, there's no problem, everything goes very well. But if you try to push small packets, an iMix traffic profile, you will find that it doesn't scale well. To be clear, I'm not talking about OVS in general, but about the default virtual switch configuration in OpenStack. And internet users actually generate iMix traffic, not large packets. These packets are just one fifth the size of typical cloud packets, and the average data rate is also much higher, around five times higher. If you generate these kinds of packets, you will see performance deteriorate rapidly.

So how do we solve this problem? One approach is to use VPP. What exactly is VPP? Vector Packet Processing is another v-switch, and it supports both layer 2 and layer 3. As shown here, it uses DPDK. For those of you who are not familiar with it, DPDK stands for Data Plane Development Kit. It's an open source project under the Linux Foundation, much like OpenStack, and it provides various data plane libraries and NIC drivers that you can use. By using DPDK, you are able to push packets at a much higher throughput.

If you look at this diagram, notice the Linux kernel sitting off to the side. Normally on a Linux machine, you will find the NICs bound to kernel drivers. Here, the NICs are bound to DPDK instead. So that is exactly the case: you bypass the Linux kernel and push packets directly to the NIC through the DPDK software. The problem with the kernel path is that it handles packets one at a time and makes poor use of the CPU caches, so a lot of CPU time is burned there, which adds latency. With DPDK and vector packet processing, you process packets in batches, manage and optimize that cache layer, move a bunch of packets at once, and minimize latency. In this way you get much higher network performance. VPP runs completely in Linux user space, as you can see here. Because it does fast network I/O directly from user space, a v-switch upgrade doesn't require patching the kernel or rebooting the machine; you upgrade it like any other user application. So that is an overview of Vector Packet Processing.

Now, what is networking-vpp? Networking-vpp enables you to control VPP within OpenStack.
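To make the kernel-bypass point concrete, here is a minimal sketch of a VPP startup configuration that hands two NICs to DPDK and pins VPP's threads to dedicated cores. The PCI addresses and core numbers are hypothetical placeholders, not the configuration of the deployment described in this talk.

```
# /etc/vpp/startup.conf -- illustrative sketch; PCI addresses and cores are placeholders
unix {
  nodaemon
  log /var/log/vpp/vpp.log
  cli-listen /run/vpp/cli.sock
}
dpdk {
  # These NICs are bound to DPDK, bypassing the kernel network stack
  dev 0000:02:00.0
  dev 0000:02:00.1
}
cpu {
  main-core 1
  corelist-workers 2-3
}
```

Because all of this runs in user space, changing this file and restarting the vpp service upgrades or reconfigures the v-switch without touching the kernel.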
If you try to install VPP and use it without networking-vpp, you have to work with it directly, and that is pretty complex. Networking-vpp makes VPP easier to consume within OpenStack. To attain this high performance, we use vhost-user sockets to talk to the virtual machines; more on this later. We support the following network types today: flat, VLAN, and VXLAN-GPE. Essentially, networking-vpp is a control plane for the VPP platform.

This is what the architecture looks like. Networking-vpp is an ML2 mechanism driver for Neutron. As you can see at the top, you have the Neutron server with the mechanism driver. On each of the compute nodes you have the v-switch, which is the installed VPP, and an agent, called the VPP agent, that is responsible for programming VPP on that node based on what you tell it to do through the Neutron APIs. Etcd is the messaging framework we use in our deployment. You can run either a single etcd node, or a redundant deployment using a three-node quorum, which can tolerate a single node failure. You can also see that we have set up journaling: when you tell the mechanism driver to do a task, what happens if the software restarts at that point? You would lose that data. So we save the state in the database first, and then write it from the database to etcd. That journaling protects us across restarts.

For those of you who are new to etcd, what exactly is it? It's a distributed key/value store, and it is open source, available on GitHub. Put simply, an application can use the etcd APIs to read and write data. It also supports TLS, certificate-based authentication, and role-based access control for security. Above and beyond that, etcd provides a reliable way to store and distribute state data across a cluster of machines; that is the fundamental role of etcd in our project. It also supports versioning: data is stored as versioned key/value pairs, so you can easily roll back to a previous state. Compare that with RabbitMQ, which is used as the messaging framework today and works on a command-based architecture. There is no real way to roll back to a previous state; you would have to go back and execute the same set of commands you ran earlier. It's a command-based rollback. Etcd, on the other hand, provides a state-based rollback: you can restore a previous set of key/value pairs very easily and converge your system back to the old state. If you're familiar with Kubernetes, this is exactly how it works, using a state-based model of communication rather than a command-based model. You can also watch the data for changes, which enables applications to react to changes in the key/value pairs.

So how are we using etcd? We are using it to store and distribute network state. As I mentioned earlier, Neutron's MariaDB database is kept in sync with etcd, and the VPP agent running on each of the compute nodes observes the desired state. To give you an example: when you create a virtual machine, Nova makes a call into Neutron to provision a port for that virtual machine. We receive that call in the mechanism driver, and then we write that information into etcd, going through the journaling process, of course.
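As a concrete illustration of the read/write/watch pattern just described, here is a minimal sketch using the python-etcd3 client. The key path and JSON value imitate the style of keys shown later in the talk, but they are placeholders, not the project's exact schema.

```python
import json

import etcd3

client = etcd3.client(host='127.0.0.1', port=2379)

# Write a desired-state record: the value is a JSON document
# describing a port to be bound on node1 (hypothetical schema).
key = '/networking-vpp/nodes/node1/ports/PORT_UUID'  # PORT_UUID is a placeholder
client.put(key, json.dumps({'binding_type': 'vhostuser',
                            'network_id': 'NET_UUID'}))

# Read it back: get() returns the value plus metadata, including
# the revision numbers that make state-based comparison possible.
value, meta = client.get(key)
print(json.loads(value), meta.mod_revision)

# Watch a prefix: an agent reacts to every change under its node's keys.
events, cancel = client.watch_prefix('/networking-vpp/nodes/node1/')
for event in events:
    print('state changed:', event.key, event.value)
```

The point of the state-based model is visible here: the desired state lives in etcd as data that can be re-read at any time, rather than as a command that is consumed once and gone.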
At that point, the agent on the node where the virtual machine is being spun up receives an etcd watch event: OK, there is a VM coming up, so go ahead and create a port. The agent now does the needful. It compares the desired state in etcd with what exists, figures out that the port is missing and needs to be created, and goes ahead and does that. It pushes the necessary changes into VPP to match the desired state you asked for.

There is also a resync mechanism here. When you restart the VPP agent, it fetches the desired network state from etcd and compares it with the actual state. Because the key/value pairs are stored in etcd, it can read all of them, compare them against the current state, and program just the deltas into the v-switch. Contrast that with RabbitMQ's command-based model: once you send a command, it's done, so after a restart there is no way to go back and figure out which commands were executed. A state-based model is much more efficient and reliable.

Here is an example of an etcd key/value pair. Say there is a node named node1 and you want to create a port. The key is stored with the port ID, and the value, as you can see, is a JSON data structure. This key/value pair is stored in the etcd database and communicated to the agent running on that node, which then has all the information it needs to create that port.

So what are some considerations when using etcd? The data has to be small; it cannot handle bulk data. Fast disks are essential, because etcd is very sensitive to disk latency. And you can get out-of-order events. For instance, if you're running a configuration with virtual machines and security groups and you restart the agent, you may get the message to create a security group before the message that creates the virtual machine. You have to write code to handle those out-of-order events as well.

Now finally, how can etcd be used in OpenStack? Currently, OpenStack uses RabbitMQ and AMQP. AMQP functions as the broker, and the software components communicate by making remote RPC calls, which is essentially a command-based model. When you provision a port, an RPC call is made to create the port and a response is received: basically, a command is sent. So what is the problem? It's hard to scale, which is why you need a complex solution like Cells; you have to split things out. And it's hard to roll back. Think about how complex Nova's design is today with Cells. If you replace RabbitMQ with etcd, you get much higher scalability and robustness through the state-based model of communication. This is something to think about; we have been successful using it in our project, and I wanted to share it with you all.

That concludes my presentation. Thank you everyone for coming, and have a great evening.
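To illustrate the resync logic described above, here is a minimal sketch of a reconcile pass, again using the python-etcd3 client. The vswitch object and its list_ports, create_port, and delete_port helpers are hypothetical stand-ins for the agent's VPP programming layer, not the project's actual API.

```python
import json

import etcd3

def resync(vswitch, client, prefix='/networking-vpp/nodes/node1/ports/'):
    """Reconcile the v-switch's actual state with the desired state in etcd.

    Runs on agent restart: read every desired port under this node's
    prefix, compare with what VPP actually has, and program only the deltas.
    """
    # Desired state: everything recorded under this node's port prefix.
    desired = {}
    for value, meta in client.get_prefix(prefix):
        port_id = meta.key.decode().rsplit('/', 1)[-1]
        desired[port_id] = json.loads(value)

    # Actual state: ports currently programmed in VPP (hypothetical helper).
    actual = set(vswitch.list_ports())

    for port_id in desired.keys() - actual:
        vswitch.create_port(port_id, desired[port_id])  # missing: create it
    for port_id in actual - desired.keys():
        vswitch.delete_port(port_id)                    # stale: remove it

client = etcd3.client(host='127.0.0.1', port=2379)
```

This is exactly what a command-based bus cannot offer: because the desired state persists in etcd, a restarted agent can recompute the delta instead of needing a replay of every command it ever received.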