Already? Okay, hi. I would like to welcome you to this session, a deep dive into the Neutron upgrade story. My name is Arto, I'm a software engineer at Intel. Hi, my name is Rossella. I work for SUSE and I'm a core reviewer for Neutron. And I'm Ihar Hrachyshka. I'm also a core reviewer for Neutron. Before we start, full disclosure: we are three developers, we are not operators. So when we talk about how you are supposed to upgrade Neutron, we may make mistakes, just because we don't operate the thing. So if we do, make sure to point it out. We are looking forward to feedback. Yeah.

Okay, so on today's agenda, we will introduce our upgrades sub-team in the Neutron community. We will give an overview of the upgrade process and go into more detail about the control plane upgrade as well as the data plane one. And at the end, we will mention our future plans, what's next and what we are planning to do in Neutron.

At the Tokyo summit, we formed a sub-team. We have a wiki page and we hold weekly meetings. So if you are interested in enhancing the upgrade story of Neutron, please join us at the IRC meetings. We cover the grenade CI, alembic migrations, oslo versioned objects and RPC versioning.

So for starters, I would like to present the upgrade process overview. The upgrade process consists of a few steps. The first one is, of course, getting the new code. Then you upgrade the database, then the Neutron server, then the network node, and at the end the compute nodes, one by one.

This is a schematic of legacy networking, meaning the basic OpenStack networking layout. The setup is that you have the L2 agent on a compute node, one network node which covers routing and NAT, and the controller node with the Neutron server and the database.

First of all, we need to install the new code; going, for example, from Liberty to Mitaka, the Mitaka code. It can be on some other box; you just need database access from that node. For the database upgrade, Neutron has a two-phase scheme: an expand phase, in which only additive changes are applied to the database, and a contract phase, in which tables are removed or altered. The expand phase can be run while the Neutron server is up, so we have reduced the API downtime. After the expand phase is completed, we go into the offline phase, when we upgrade the Neutron processes themselves and their dependencies on the node, and after that we can run the contract migrations. Those are not allowed to run while the Neutron server is running, so this is an offline phase, and how much time it takes depends on how much data you need to migrate, because contract migrations in Neutron also perform data migration. During the Mitaka cycle, we introduced a CLI command for offline migrations, which is useful when you are installing from master, because it checks whether there are contract migrations pending to be applied. If not, you can skip running the contract phase. (There is a sketch of this workflow right after this section.)

After completing the data migration, we switch the Neutron server back on. As on the screen, you have the Neutron server talking to the agents; the code has to be compatible between the Neutron server and the L2, L3 and DHCP agents. After completing the controller node upgrade, you can go to the network node. First of all, you have to upgrade the L2 agent, because it wires up all the other agents on the network node. Then go to the L3, metadata and DHCP agents.
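To make the two-phase flow concrete, here is a minimal orchestration sketch in Python. It assumes the Mitaka-era `neutron-db-manage` subcommands; the service unit name and, in particular, the exit-code convention of `has_offline_migrations` are assumptions on my part, so verify them for your release.

```python
# Sketch of the expand / offline / contract workflow, under the
# assumptions stated above. Not the project's official tooling.
import subprocess

def call(*cmd):
    return subprocess.call(cmd)

# Expand phase: additive-only migrations, safe while neutron-server runs.
subprocess.check_call(('neutron-db-manage', 'upgrade', '--expand'))

# (At this point you would install the new Neutron code and dependencies.)

# Assumption: a non-zero exit code means contract migrations are pending.
if call('neutron-db-manage', 'has_offline_migrations') != 0:
    subprocess.check_call(('systemctl', 'stop', 'neutron-server'))
    # Contract phase: destructive schema changes plus data migration;
    # must run while the server is down.
    subprocess.check_call(('neutron-db-manage', 'upgrade', '--contract'))
subprocess.check_call(('systemctl', 'start', 'neutron-server'))
```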
It's important to mention here that the L3 agent is responsible for routing, floating IPs and NAT. So when you are upgrading the network node, you can impact the data plane, the access from the internet to the VMs; but we keep in mind that we should not flush the OVS flows, the routing tables or the iptables rules. So it should be safe to upgrade the network node. Then you can go to the compute nodes, one by one, and upgrade the L2 agent there as well. It wires up the VMs, so there is also an impact on the flows on the compute node. The upgrade process is meant to be done in place, meaning that you don't have to migrate the workload to another node while you are upgrading the current compute node. And this is the end of the legacy scenario upgrade.

The other case is DVR, where you have the L3 agent on the compute node. What's different is that you have to upgrade the L2 agent first, then the L3 agent, and the metadata agent as well. In this scenario, the L3 agent is responsible for east-west routing and the floating IPs assigned to the VMs.

And now, on to the controller upgrade. Thank you. So I will now focus on a specific component, which is the controller, and try to provide some details on it. When an operator plans an upgrade of the controller, they have some expectations of the process. The main one is that the downtime of the service should be reduced as much as possible, because this service is the front end for the API; as long as the service is shut down, no API requests are handled, and you want to avoid that. And if you have a highly available setup for your controller, then the expectation is that there may be no downtime at all, because you should be able to upgrade each node of the controller cluster separately while the others keep serving API requests. And obviously, while you're upgrading your controllers, your new controllers should still be able to talk to the old agents running on compute and network nodes, because the agents are required to be upgraded after the controllers.

Sadly, that was not the case at the time of Liberty, and even in Mitaka. We had very significant downtime when upgrading controllers because of the database contract phase. Also, you cannot run a staged upgrade of your highly available controller, meaning that you cannot take a single node, upgrade it, and then go to the next one. All controllers have to be upgraded to the next major version at the same time, meaning that you need to shut down all of them. And another issue we had is that even though the code was supposed to support the rolling scenario, at least between controllers and agents, it was never actually tested in the gate; we didn't have any jobs for that. So we started to look at all those things.

So let's cover API downtime. Why do we even have it? Obviously, as I already said, because we have this phase when we need to run the contract migrations. Migrations of the database schema and data are handled by the alembic library, and most of these scripts are not safe to run while the controller is still running (a hypothetical example of both kinds follows this section). So before Liberty, the process looked like this: you stop all your controller services on all nodes, so at this point the API is down; then you apply all the data migrations; and after that, you can start to spin your controllers back up.
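To make the safe/unsafe split concrete, here is a minimal, hypothetical pair of alembic migration scripts; the table and column names are invented for illustration and are not real Neutron migrations:

```python
from alembic import op
import sqlalchemy as sa

# --- expand-branch script: additive only, safe to run online --------
def upgrade():
    # A running server simply ignores a new nullable column.
    op.add_column(
        'routers',
        sa.Column('new_attr', sa.String(length=36), nullable=True))

# --- contract-branch script (a separate file in reality) ------------
# def upgrade():
#     # Dropping a column breaks any running server that still reads
#     # it, which is why this belongs to the offline contract branch.
#     op.drop_column('routers', 'old_attr')
```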
So during Liberty, we looked at the migration scripts that we have, and we realized that some of them are actually quite safe to execute while controllers are running. So we split those scripts into a separate branch of alembic migrations; we called the branch expand, and we left all the other, unsafe, scripts in the contract phase. So we shaved off the downtime for the part where we can expand the database. Starting from Liberty, the process is: first, we expand the database; only after that do we need to stop the Neutron service; we still have significant downtime while applying the contraction; and then we start the service again.

So we were thinking about how we can improve this. The improvement, obviously, is to get rid of this contraction in the middle, so we were thinking about how to move it somewhere else. The solution is to do the contraction a lot later, once you have already started your new controllers. But how do you do it? That's the vision, but it's not easy to implement from a technical point of view, and Rossella will cover that a bit. I think I'll use my laptop. Otherwise, I cannot move my hands. I'm Italian. I need to move my hands while I speak.

So, next one is... As you might imagine, we have issues to solve for the upgrades. As Ihar already explained, we have some downtime in the API. Another problem is that in a cluster, all nodes need to share the same schema, because we cannot handle nodes that use different schemas for the data. We were trying to find a solution to all these issues, and that's when we met oslo versioned objects.

So let me explain why we decided to adopt this library. First of all, oslo versioned objects enables you to have a strict representation of your data. For example, if you have a network, you can define all the fields of the network object, and when you modify it, if you add a field or remove a field, you bump a version. And this is very important. Let me explain: for example, when you upgrade the Neutron server, the client, that is, the agent, which is the client side of the RPC, is still using the older version. And right now in Neutron, we have no way to version the data that we exchange over RPC. So as you see in the graph, if you have the server running N plus 1, we need some code in the Neutron server that can detect that the agent is using an older version of the RPC message, and basically convert the data into a format that the agent can understand (there is a small sketch of this right after this section). And the other problem is what I was saying before regarding the database. Oslo versioned objects is a facade that hides all the implementation details from Neutron. So we don't need to take care of this migration anymore, because oslo versioned objects can lazily migrate the data for us. Thank you.

So I really fell in love with these objects. But there's a big but. As Shakespeare says, the course of true love never did run smooth. So of course, we found difficulties on our path, and I'm here to explain them. The main problem is a conceptual one. As I was saying before, oslo versioned objects is very strict, while Neutron is totally flexible. Historically, it's a project that has plugins and extensions, so it's very hard to fit the Neutron resources, which are so mutable, into this strict model. So we needed to find a solution to that, and we introduced a new concept: synthetic fields. In oslo versioned objects, you usually have a one-to-one correspondence between the fields in your object and the columns in the database.
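Here is a minimal sketch of what that version-aware conversion looks like with oslo.versionedobjects. The Network object, its fields and version history are invented for the example; obj_make_compatible is the library's standard hook for downgrading a payload for an older peer:

```python
from oslo_versionedobjects import base as ovo_base
from oslo_versionedobjects import fields as ovo_fields


@ovo_base.VersionedObjectRegistry.register
class Network(ovo_base.VersionedObject):
    # Version history (hypothetical):
    # 1.0: initial version
    # 1.1: added 'mtu'
    VERSION = '1.1'

    fields = {
        'id': ovo_fields.UUIDField(),
        'name': ovo_fields.StringField(nullable=True),
        'mtu': ovo_fields.IntegerField(nullable=True),
    }

    def obj_make_compatible(self, primitive, target_version):
        super(Network, self).obj_make_compatible(primitive, target_version)
        if target_version == '1.0':
            # An agent that only speaks 1.0 does not know about 'mtu',
            # so strip it before putting the object on the wire.
            primitive.pop('mtu', None)


# An N+1 server serializing for an older agent would do something like:
# payload = net.obj_to_primitive(target_version='1.0')
```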
So, for example, you have, I don't know, a network that has an ID, and in the database you have a table network with a column id, and so on for all the other fields. But synthetic fields, as the name says, are kind of artificial, so they don't respect this convention. It means that the data for a particular field might not be stored in the database at all; it might be calculated at runtime, or it might be stored in another table, not in the table that belongs to the object.

So let me show an example to make it clearer. We have here the port object. This is probably the most complex object in Neutron. There are lots of extensions, and lots of fields that don't depend on the port table directly, but on other database tables. And as you see here in bold, we used a very specific field type, for example ListOfObjectsField. This means that the field is made of a list of objects, so it's an object inside an object, and the class for this list of objects is IPAllocation. We now have code in place in Neutron so that when we see a synthetic field, instead of loading the data for that field from the table defined for the main object, it uses the table defined in the class of the field, in this case IPAllocation rather than the port table. And that's how we managed to express extensions in oslo versioned objects (a small sketch of this follows after this section). There's still lots of work to do. As you can imagine, it's a huge refactor; it involves basically almost the whole code base. And we are optimistic that we will be able to get it done in Newton, and we are working on it very hard. If you want to follow our progress, I put here the link to the blueprint. So just have a look at it and at the patches that are in flight. And now I don't need this anymore.

So yes, objects are cool. But what they actually give us is that they hide all the dirty details of how Neutron should handle multiple versions of the schema behind their interface. So instead of updating every single place in the code base where we may touch the database with all those dirty details, we just switch all the database access code to the objects and then implement it in a single place. That's how they help us with schema migrations.

So let's look at how you upgrade your HA controller. Before the Newton release, the process looks like this. We start with a cluster of controllers that is running an older version; let's say it was Liberty. Your users access your API endpoint through some kind of load balancer, I guess. To upgrade the cluster to Mitaka, you need to shut down the whole thing, and during that time, all API requests are just not getting anywhere. No one is serving them, which is not nice. Then you apply your database migrations and everything, and you start your services, either in parallel or whatever. So now everything is fine, but we had this very bad window where no API requests were handled.

So, assuming that we actually deliver what we promised and the feature is in place, starting from Newton, for all upgrades to later releases like Ocata, the process will look like this. Again, we start with the same thing, all controllers running Liberty. Then you can actually upgrade them one by one. You take one of the controllers, you upgrade it to the new code, you start it. So now you have mixed versions in your cluster, some of them running Mitaka, some of them still Liberty.
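A minimal sketch of the synthetic-field idea, modeled loosely on Neutron's port object. The classes and fields are simplified for illustration, and note that the synthetic_fields marker is Neutron's own convention on top of the library; plain oslo.versionedobjects does not interpret it by itself, Neutron's object layer does:

```python
from oslo_versionedobjects import base as ovo_base
from oslo_versionedobjects import fields as ovo_fields


@ovo_base.VersionedObjectRegistry.register
class IPAllocation(ovo_base.VersionedObject):
    VERSION = '1.0'
    fields = {
        'ip_address': ovo_fields.IPAddressField(),
        'subnet_id': ovo_fields.UUIDField(),
    }


@ovo_base.VersionedObjectRegistry.register
class Port(ovo_base.VersionedObject):
    VERSION = '1.0'
    fields = {
        # Regular fields: one-to-one with columns of the port table.
        'id': ovo_fields.UUIDField(),
        'name': ovo_fields.StringField(nullable=True),
        # Synthetic field: a list of IPAllocation objects whose data
        # lives in the IP allocation table, not in the port table.
        'fixed_ips': ovo_fields.ListOfObjectsField('IPAllocation'),
    }

    # Neutron-style marker: load these fields through the field's own
    # class (and hence its own table) instead of the port table.
    synthetic_fields = ['fixed_ips']
```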
And you proceed with the process, one by one, until you get the whole cluster upgraded. The technical difficulty here is that in the meantime you have mixed versions, and both versions should consistently talk both on the RPC side, which is the messaging bus, and on the API side, which may not always be the case, because we introduce changes in how the API behaves. So there are open questions about how to do it. We probably need to introduce some way of pinning an API version before the upgrade, so that even the new controller versions behave as if they were running the old one, and only after you are done you may switch to the next API version. But for that, we need some kind of versioning at the API level, probably microversioning, which is still not there for Neutron. So that is also something we will need to look at.

So I will try to describe in detail how this mechanism of handling multiple schema versions works. It will be a bit tricky, but I hope you will be able to follow. Here you see a very simple HA controller, all nodes running some old release, and they all access the same database schema. Now we go through the upgrade process. The very first step is applying the expansion. The expansion adds some columns or tables; they are not accessed by anyone yet, so it's completely safe.

Now we start to upgrade our controllers. We obviously shut one of them down and upgrade it to the new code. And here you see that the new controller accesses both the old schema and the new elements of it, the new columns or new tables. What does that mean? Every time a resource is updated, it writes the state of the resource in both places. The highlighted part indicates the source of truth for this controller, meaning from which schema element it will get the state of the resource when it fetches it. So it writes to both places, but it still prefers the state that is saved in the old schema. It's basically updating the new schema without anyone reading from there. But that's fine. Then you proceed with upgrading the second controller, and it does the same.

So now comes the next release. Let's say the first one was Liberty, then Mitaka, and now you have Newton. For Newton, again, you shut down one of the controllers, upgrade it, start it, and it still writes to both places; but what changed is that now, when it gets the state of the resource, it prefers the state from the new schema, while still updating the old one. And it does that because we still have an older version in the cluster, right? So we need to update both. We proceed with the process, we upgrade the second one, and it, again, does the same: it updates both places but uses the new one as the source of truth. Why do we still need to update the old schema? Well, because the code should be written in a way that supports mixed-version clusters, clusters that still run N plus one and N, right? So we update both places.

And finally, when the next version comes, which is P-something, you take a controller one more time and you upgrade it. What changed is that now it does not update the older schema at all, because there is no one left to read from there anyway. And you proceed with the last controller, and now no one even writes to the old schema. So it finally became unneeded, two cycles later. And that's the time when you can actually contract your schema.
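Here is a condensed sketch of that three-stage dance; all the names are invented, and plain dicts stand in for the old and new schema elements:

```python
# Hypothetical illustration of the rolling-schema rules described above:
#   release N:   write both, read the OLD schema (source of truth)
#   release N+1: write both, read the NEW schema
#   release N+2: write the new schema only; the old one can be contracted
_old_schema, _new_schema = {}, {}

class ResourceStore(object):
    def __init__(self, release):
        self.release = release  # 'N', 'N+1' or 'N+2'

    def write(self, resource):
        if self.release in ('N', 'N+1'):
            # Keep peers one release behind working.
            _old_schema[resource['id']] = dict(resource)
        _new_schema[resource['id']] = dict(resource)

    def read(self, resource_id):
        if self.release == 'N':
            return _old_schema.get(resource_id)
        return _new_schema.get(resource_id)
```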
So we need to wait two cycles to drop anything, any column, any table. And finally, you get to the state where you probably wanted to be. So that's how the database rolling upgrade works.

So now let's see how the data plane is affected during the upgrade. Let's briefly introduce the L2 agent: it's the agent in charge of the L2 traffic. As you might know, it runs on the compute and network nodes, and it's the agent that configures the local switches. If you're using Open vSwitch, it basically installs flows so that the traffic can flow correctly. I won't go too much into details, but you can imagine that to get a packet from point one to point seven, or vice versa, lots of flows are required.

So let's go back and see exactly what a flow is. It was introduced by the OpenFlow protocol, and it's made of two parts, a match and an action. A match is basically a filter: it identifies some fields in the packet and the values they need to have. And if a packet matches, then you apply the action. An action could be a drop, could be output to port one, or whatever. So an example of a flow is, for example, drop all TCP traffic. And you see here a screenshot; I just dumped the flows of br-int while a VM is running. There are quite a few. Thank you.

So what's the problem exactly with upgrades? You need to restart the L2 agent. Before Liberty, that was causing a traffic disruption, because the L2 agent was deleting all the flows at startup. So you had an interval of time, while the agent was processing the ports and installing the new flows, where you had no traffic. And we found a solution to that in Liberty with a simple strategy. The agent at startup creates a UUID. This UUID is used to identify the flows that this agent installs: it basically puts this UUID in the cookie field of the flows. Then it won't delete any flows; it just goes on processing the ports and installing the new flows. And when it's done, it deletes the stale flows, which it is able to recognize because their cookie is different. In this way, we achieved no disruption (a small sketch of this strategy follows after this section).

But some scenarios were still affected, specifically if you were using provider networks with flat or VLAN. And this is because another bridge is involved, as you can see from the graph: the physical bridge, the bridge that provides connectivity to the physical network. The solution for this was basically the same as the one applied to br-int: create a cookie, don't delete all the flows at startup, just install the new ones, and clean up the stale ones once the new ones are in place. So now, with this improvement in Mitaka, we have no disruption. And we also improved it further, because the patch ports that connect the bridges are not deleted anymore on startup. So now we have no disruption at all.

Yes, so the question is, how do we validate that? If the rolling upgrade is working, then our patches should not break it. So we are using the grenade CI. It's a tool that installs the latest stable release with DevStack, performs the tempest smoke tests, and then upgrades the code to the current master and runs the smoke tests again. In the Mitaka cycle, a new partial multi-node job was introduced, meaning that we have a primary node and a secondary node. The primary node is an all-in-one, meaning that the controller, the compute service and the L2 agent are installed there, and on the secondary node there is the compute agent and the L2 agent as well.
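Here is a minimal sketch of the cookie strategy, assuming the standard ovs-ofctl CLI is available; the bridge name and the flows are just examples, not the agent's actual code:

```python
import subprocess
import uuid

BRIDGE = 'br-int'  # example bridge

# 1. On startup, mint a cookie unique to this agent run
#    (OpenFlow cookies are 64-bit values).
cookie = uuid.uuid4().int & ((1 << 64) - 1)

# 2. Install the new flows tagged with our cookie, WITHOUT deleting
#    anything first, so existing traffic keeps flowing.
subprocess.check_call((
    'ovs-ofctl', 'add-flow', BRIDGE,
    'cookie=%d,priority=1,actions=normal' % cookie))

# 3. Once all ports are processed, delete only the stale flows:
#    dump the flows and remove every cookie that is not ours
#    (ovs-ofctl selects flows by a cookie/mask pair).
out = subprocess.check_output(('ovs-ofctl', 'dump-flows', BRIDGE))
ours = 'cookie=0x%x' % cookie
for line in out.decode().splitlines():
    if 'cookie=' in line and ours not in line:
        stale = line.split('cookie=')[1].split(',')[0]
        subprocess.check_call(
            ('ovs-ofctl', 'del-flows', BRIDGE, 'cookie=%s/-1' % stale))
```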
So when grenade proceeds, it upgrades the primary node to the current master, and then the second run of the tempest smoke tests runs against the mixed-version environment, where we can check whether the RPC is compatible between the latest stable and the current master. Also, just after the Mitaka release, we introduced proper in-gate testing of the alembic migration scripts, including on PostgreSQL. But there is still some improvement needed; for example, we need to move more services to the old side of the cluster: the L3 agent, the DHCP agent and the metadata agent should stay on the older version of the code. And we also need more improvement in DVR testing. We have the DVR multi-node flavor of the job, but it's not voting. We need to make it vote, and also add more tests to the smoke run to cover the DVR functionality.

And more on the future plans. First of all, we need to complete the oslo versioned objects implementation. We hope to do this in the Newton cycle, because it's a requirement for the next steps, like allowing controllers to be upgraded in rolling mode, as Ihar presented, and also improving compatibility when running mixed versions of controller nodes. I guess that's all. Thanks for your attention. Any questions? And also, like I said, if you are an operator and have any valuable feedback, please reach out to us. Come to our meetings and tell us what we are doing wrong or what's not working in Neutron upgrades. So, any questions? Okay, there's a mic.

Yes, you've added a level of indirection with versioned objects. Did you do any performance testing? Because before that, Neutron was the most heavily used component in real-life installations. The Neutron server can really be heavily loaded with many requests, and now there is constant translation. Did you check how much you lose from the versioning?

No, we haven't run any performance tests, partly because we still have lots of major pieces in flight; they are not merged, so we don't have the integrated stuff to target with performance tests. Obviously, we will need to do that. But generally, if we hit some performance issues, there are ways to optimize. For now, we sometimes take shortcuts in some places that may translate into SQL queries that are not optimal. But if we realize that there is an issue, we can make changes that are maybe not that beautiful from the objects perspective, from the code perspective.

There is also a kind of change of approach needed. We are overloading the Neutron server by having the Neutron agents ask the server for all the updates of ports and so on. Maybe it's separate, but it also touches our approach to upgrades: not to have the agents ask the server, but to send notifications to all the agents at once. That's a performance-related fix that is needed, not an upgrade one. Yes, we are also going to work on this topic. Yes, people have already started looking at that, improving the communication between agents and server. That should offload some of the wasted work that we currently do. It will obviously not solve all the issues that may be triggered at the API level, but we have places to improve, not just on the controller side, but also on the agents. The library in itself should be kind of safe; we are not the first project using it, we are maybe the last one. But anyway, it's a valid point. I think we could already test now: we have small resources that we have already ported, so we could just see whether the performance improved or got worse. Thank you. One more thing, related to the L3 agent.
On all versions of OpenStack, it was a serious problem to restart L3 agents or a network node, because on real-life installations it can take like half an hour to bring all the namespaces into the proper state. And a restart sometimes does the same thing: you tear down the namespaces and you bring them back up. Is there any kind of versioning here, same as with the UUID cookies?

No, we didn't work on that specifically, but there were general improvements. So I don't know. Now, if you're using DVR or VRRP, you have a copy of the routers, so it should work better. Yes, but for a classical, simple L3 agent without redundancy, does a restart of the L3 agent still cause an interruption in the data flow? I think no. There should be no data plane interruption; if there is one, please report a bug. But obviously, since all our agents are essentially stateless, every time they are restarted they reach out to the server and hit it very hard. And also, this initial resync operation takes a while. But do you tear down the existing rules in iptables, and the IP addresses, the floating IPs on the interfaces, and so on? Yeah, that was also improved, not by our group directly, but ipset was introduced, and there were performance improvements also in the way we handle iptables. Yes, I understand there are improvements. The question is, do you still cause downtime here? Do you tear down and then build up? No, it should not. If it does, then please report it. Okay, so theoretically there should be no interruption. Yes; the only data plane interruptions that we were aware of were coming from the L2 agents, specifically Open vSwitch, and that was solved in Liberty and then Mitaka with some improvements. The L3 agent, I've heard that there were times when it was dropping connectivity, but... Okay, thank you very much. Thank you.

Hi, the particular examples that you went over dealt with OVS; do they also apply to Linux bridge? No, because with Linux bridge, flows are not involved, so that specific problem was with Open vSwitch. I think Linux bridge was historically a bit better in terms of data connectivity disruption. But note that we don't actually validate it in any way; the only gating that we have is for Open vSwitch. And also note that even those jobs that we have don't actually validate that the connectivity is not disrupted, right? But we have some functional tests that do it. Yeah.

So, maybe a bit off topic for your presentation, but related to upgrades: how does it handle DVR, when the network service is distributed on the compute nodes? During the upgrade, is there a small switchover where the packets are either cached or lost for a couple of seconds, or milliseconds, on the compute node? Can you speak louder, because I can't really hear. Sorry, so I meant to ask: in the DVR scenario, where the agents are distributed on the compute nodes, is there a small switchover during the upgrade in the data flow between the Neutron side and the Nova compute side and the virtual machines? There shouldn't be. I mean, it's distributed, so you can go node by node, so I don't think it's a problem. I don't think it's a problem. And the data flow for OVS is, like Rossella said: during the Liberty cycle, we added the cookie, meaning that when you are restarting the L2 agent, you do not flush all the flows, you only add new ones, and at the end of the restart, the unused ones are deleted. So in our tests, it didn't show any outage for the VM networking. And how about the SNAT traffic going through the network node or the controller node?
Maybe that node is upgraded and the compute nodes still have the older agents; does it still communicate? I don't think this is really related to upgrades. I mean, even if the compute nodes are using an older version of the agent, the flows are still in place, so it should work. Okay, thank you. Thank you. Any more questions? Okay. Thank you very much.