OK, so hello everybody. I'm glad to welcome you to our session today. My name is Artur Korzeniewski, I'm working as a software engineer at Intel. This is Grzegorz, also a software engineer at Intel. This presentation was also started with Hemant from Rackspace, but he wasn't able to join us, so only the two of us will be presenting. The topic of today's presentation is rolling upgrades: performing a schema rolling upgrade in just one release cycle.

First of all, what is a rolling upgrade? An upgrade is installing a new version of software that is already running. In a distributed system, you should be able to replace one version of the software with a newer one without disrupting the work it is doing, for example serving the API to end users. When you install version 2, it should keep working alongside version 1 at the same time, so while you are serving the API, users never see any API downtime; you can upgrade from one version to another without losing any customer requests. The problem is that you will then be accessing the database with two different versions, so we need a special way of managing two versions of the software accessing the database: a new schema, new fields, or different field types in the upgraded database.

So why do rolling upgrades matter? As I said, when you have rolling upgrades in place, you will not drop any API requests from the customer, the user of the cloud run by a CSP. You will have no operational downtime at the API level, the server level, or the data plane level, such as access to a VM via SSH or VNC. Your user won't be upset when they knock at your API and don't get their request served properly. The current approach to rolling upgrades is to do this over multiple releases: at least two, three, or even four.
When you spread a migration over four releases, you get the extra cost of development and extra effort from maintainers and reviewers of the code, who have to remember that something must stay there for backward compatibility with the previous version. Old code, for example an old column name, has to be carried across multiple releases, and you keep dealing with backward compatibility issues for that whole extended period. What we need to consider here is the complexity of the changes: data compatibility, the data format and schema changes at the database layer, and also accessing the data with two different versions of the software in the layer above.

OK, so before we discuss the OpenStack architecture, I wanted to introduce you to other pre-existing solutions for inter-service communication that support different service versions at the same time. One of the first was ONC RPC, developed by Sun in the 1980s as part of the NFS project. In SunRPC we define XDR files, which contain the schema; version numbers are incremented when functionality in the remote program changes, and existing procedures can be changed or new ones added. More than one version of a remote program can be defined at the same time, and each version can have more than one procedure defined.

Another one is Protocol Buffers. They don't have explicit versioning, but they are designed so that old and new services with different schema definitions can still communicate. To achieve that, depending on the implementation, unexpected data is either stored as extensions, making it round-trip safe, or silently dropped. New fields should be added as optional, meaning that old data can still be loaded successfully. At the same time, you should keep the field numbering stable; renumbering could break existing in-flight data.
You also shouldn't change the way any given field is stored, though that may be decided case by case. Another one is Apache Thrift, from Facebook. It's pretty similar to Protocol Buffers in that it doesn't have explicit versioning but still defines strict schemas. To maintain compatibility as the protocol evolves, you shouldn't change the numeric tags of any existing fields, and any new fields should be optional. Sensible default values should also be set, so that new code can properly interact with messages generated by old code. When messages created by new code are parsed by old code, the newly introduced fields are ignored; however, if you modify and serialize the message again, the unknown fields are serialized along with it, so if the message is passed on to new code, the new fields are still available. The next thing to remember is that non-required fields can be removed, but the tag number could then be reused in an updated message type, so it's common to rename the field instead and tag it as obsolete with a prefix, so that future contributors don't accidentally reuse the number. Changing the default value is OK, because default values are never sent over the wire: if a program receives a message in which a particular field isn't set, it will see the default value as defined in its own version of the schema.

Next up is Erlang, which is a distributed programming language and runtime system in which program state is stored in processes. It's a dynamic language, just like Python. Processes usually send and receive data represented as tuples. It has a simple approach to live upgrades using records; records are an extension of the tuple syntax with which you can define schemas, similar to named tuples in Python. In Erlang, the preferred way to extend an existing protocol is simply to append new values to the end of the tuple, and it works.
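The append-only idea translates directly to Python tuples. Here is a minimal sketch (the `Port` record and field names are hypothetical, just for illustration): old code unpacks only the leading positions it knows about, so appending fields at the end never breaks it.

```python
from collections import namedtuple

# v1 of the record: old code only knows these two fields.
PortV1 = namedtuple("PortV1", ["id", "name"])

# v2 appends a new field at the end, so the positions of the
# old fields are unchanged.
PortV2 = namedtuple("PortV2", ["id", "name", "project_id"])

def old_reader(record):
    # Old code reads only the leading fields it knows about;
    # trailing additions are silently ignored.
    return record[0], record[1]

new_record = PortV2(id=1, name="eth0", project_id="abc")
assert old_reader(new_record) == (1, "eth0")
```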
This brings us to database schema definitions, because Erlang databases also store tuples defined with records, and when you want to change the database schema, you change the record definition.

In OpenStack, we usually have two types of inter-service communication in which we have to keep data compatibility in mind: messages are exchanged and stored on the RabbitMQ message queue, and rows of data are stored in the SQL database, which is another story. You probably know SQL databases, which use a data definition language to describe schemas. In the simplest case we could approach the problem the same way, just adding new fields, and this actually covers most cases, but it's too simplistic when we also want the ability to remove data, for example to improve database performance; we want the possibility of changing the schema in incompatible ways. There are other things to consider too, like database locks when adding foreign keys; we gave a presentation discussing that particular problem at the previous OpenStack Summit in Austin.

So, as I said, OpenStack currently uses a multi-release approach, generally using versioned objects for database access and RPC messaging, and expand/contract schema changes. What is a versioned object? It's an Oslo library, created initially in Nova and then ported to Oslo to give every OpenStack project the ability to use these objects. They store the data version together with the data and describe, at a point in time, how the data is defined: what the fields are and what their types are, like integer or string. They are also useful in RPC messaging: you can version the RPC message that is sent as a JSON dictionary and have strict versioning on the server and agent side.
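The core idea can be sketched in a few lines of plain Python. This is not the real oslo.versionedobjects API, just a simplified illustration of "the schema version travels with the data" (the `Network` object and its fields are made up):

```python
class VersionedObject:
    """Minimal sketch of a versioned object: the schema version
    is carried together with the payload (simplified, hypothetical)."""
    VERSION = "1.0"
    fields = {}

    def __init__(self, **kwargs):
        self.data = {name: kwargs.get(name) for name in self.fields}

    def to_primitive(self):
        # What would go on the wire as a JSON dictionary:
        # the payload plus the version describing its schema.
        return {"version": self.VERSION, "data": self.data}


class Network(VersionedObject):
    VERSION = "1.0"
    fields = {"id": int, "name": str}


msg = Network(id=5, name="private").to_primitive()
assert msg["version"] == "1.0"
assert msg["data"] == {"id": 5, "name": "private"}
```

The receiver can inspect `msg["version"]` before deciding how to interpret the fields, which is what enables the strict server/agent versioning mentioned above.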
What is also nice about versioned objects is that they have a method to translate to another version. When you upgrade your server to a newer version while the agent stays on the older one, and you have changed something in the protocol used to exchange information between agent and server, you can translate the newer format on the server side and send the agent data in the format it knows. Of course, inside a versioned object we can also migrate data from one format or place to another: another column, another place to store the information.

Keeping compatibility at the SQL database level is more complicated. That's the expand/contract approach: you have some database schema defined in one release, and first you run the expand phase, which contains only additive changes to the schema. You are only adding new columns, new fields, and new tables, which does not disturb the older version. You can run this expand phase online, because it does not disrupt the older version of the server. The second part is the contract phase, which contains destructive changes: removing data and columns. It is not safe to run online, unless you take the multi-release approach, where the contract changes remove only data left over from previous releases that is no longer accessed by any version of software that can still be running. In OpenStack projects we generally keep compatibility between X and X+1, a one-release backward compatible approach: after you have rolled everything over to the new version, the contract phase can remove the column, or any other piece of the database schema, that is no longer accessed.
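A toy demonstration of why the expand phase is safe to run online, using the stdlib sqlite3 module (the `networks` table and columns are hypothetical, not from an actual migration): the additive change is applied while "old code" keeps inserting with the columns it knows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE networks (id INTEGER, tenant_id TEXT)")

# Old code writes using only the columns it knows about.
conn.execute("INSERT INTO networks (id, tenant_id) VALUES (1, 't1')")

# Expand phase: a purely additive change, applied while old code runs.
conn.execute("ALTER TABLE networks ADD COLUMN project_id TEXT")

# Old code keeps working unchanged after the expand migration.
conn.execute("INSERT INTO networks (id, tenant_id) VALUES (2, 't2')")

rows = conn.execute(
    "SELECT id, tenant_id, project_id FROM networks ORDER BY id"
).fetchall()
assert rows == [(1, "t1", None), (2, "t2", None)]
```

A contract-phase change (dropping `tenant_id`) would be the opposite: the old inserts above would start failing, which is exactly why it must wait until no old code can be running.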
The current multi-release approach works like this. You have an application reading and writing a column, and you'd like to rename it in the X+1 version. In X+1 you read from the old column and write to both the old and new columns, and you need to provide a data migration script so that, in the background, you migrate the data in small chunks from the old column to the new one, the new format, and so on. In X+2 you still need to write to the old column, because X+1 is reading from it, but you can now also read from the new column, because X+1 is writing to the new one as well, so both columns hold compatible data. The first moment you can drop the old column is the X+3 release, because X+2 and X+3 read only from the new column. So, as you have seen, you can have up to four releases carrying the stale data of the old schema version inside the database in order to stay backward compatible with the older version. This is because you have multiple servers running at the same time, serving the API and accessing the database, and you have to stay compatible with the X-1 release; if you had only a single point of access to the database, you could do this in two releases. So it depends on the project's use case.

When we talk about a one-release approach to rolling schema upgrades, we can address it with versioned objects too. Take the X release; tenant_id is an example from Neutron. In Neutron we have renamed tenant_id to project_id, and this scenario shows how we could have done it if we had OVO in place, which we don't fully have right now in Neutron: not all data access is using OVO yet. In the X release you have only the tenant_id column, and a versioned object with version 1.0.
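The four release phases of that column rename can be summarized as read/write patterns. This is a hypothetical helper, just to make the schedule above concrete; the key invariant is that whatever release X+n reads, release X+n+1 must still write.

```python
def access_pattern(release):
    """(read_columns, write_columns) per release for the hypothetical
    tenant_id -> project_id rename described above."""
    patterns = {
        "X":   (["tenant_id"],  ["tenant_id"]),
        "X+1": (["tenant_id"],  ["tenant_id", "project_id"]),
        "X+2": (["project_id"], ["tenant_id", "project_id"]),
        "X+3": (["project_id"], ["project_id"]),  # old column finally dropped
    }
    return patterns[release]

# Adjacent releases stay compatible: whatever X+1 reads, X+2 still writes.
reads_x1 = access_pattern("X+1")[0]
writes_x2 = access_pattern("X+2")[1]
assert all(col in writes_x2 for col in reads_x1)
```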
When you go to the X+1 release, you increase the version of the object to 1.1, and you still write to and read from the old column. Using the expand phase you introduce the new column and also write to project_id, which is the new column name for tenant_id. So from the X release you can still access tenant_id, and in the X+1 release you serve both tenant_id and project_id, keeping tenant_id for backward compatibility in the server-agent communication. Of course there has to be a migration script that you run in the background, and once you have migrated all the data you can drop the old column. Here it gets tricky, because at that point you lose your X-1 compatibility, so you cannot remain compatible with the older version the whole time using only the versioned-object approach. You will also need to switch, inside the data access layer that the objects use, the reads and writes from the old tenant_id column to the new project_id column when you apply the contract scripts that remove the old column; by then, of course, all the old data is in the new column.

So let's look at the other options, and at what happens if you miss a step when applying this; there can be a lot of misunderstandings when people are first introduced to the idea. Here is an example: you start a new service which writes new data to both the old and new columns, so it can still be read by the old version, but because it reads data only from the new place, with a possible fallback to the old column when the data is not found there, it can miss an update which the older service makes to a row added, for example, by the new version.
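The version-translation step mentioned above can be sketched like this. It is plain Python loosely modeled on the idea of OVO's compatibility methods, not the real API; the payload shape and field names are made up:

```python
def make_compatible(primitive, target_version):
    """Downgrade a hypothetical 1.1 payload (project_id) to the 1.0
    format (tenant_id) understood by an older agent."""
    data = dict(primitive["data"])
    if target_version == "1.0" and "project_id" in data:
        # Older agents only know the tenant_id field name.
        data["tenant_id"] = data.pop("project_id")
    return {"version": target_version, "data": data}

new_msg = {"version": "1.1", "data": {"id": 7, "project_id": "abc"}}
old_msg = make_compatible(new_msg, "1.0")
assert old_msg == {"version": "1.0", "data": {"id": 7, "tenant_id": "abc"}}
```

The server performs this translation just before sending, so an un-upgraded agent keeps receiving messages in the only format it understands.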
Another problem can arise when the new version stops updating the old column while the old version still tries to read data from it; again, because there is no version information, the old release may miss an update or even new rows added by the new version. This is also why the OpenStack CI needs to test this properly, with multinode Grenade tests running adjacent service versions together, taking the aforementioned edge cases into consideration.

A new idea to shorten the number of release cycles in which developers need to keep track of changes in the data model is triggers, and this is currently being considered in Keystone and Glance. These are services which don't have internal RPC communication over the message queue, so the main layer at which we want to keep data compatibility is the SQL database. By introducing triggers you can actually skip two steps, and the data migration can be started before the upgrade. The layer implemented with triggers manages data compatibility, so the application itself doesn't have to switch between different patterns of accessing data. Aligned with the previous examples it looks like this: the triggers are introduced and all data is migrated before the upgrade, and the triggers then make sure that updates and reads from the old code and the newly introduced code have a consistent view of the data.

On the mailing list there were some skeptical voices about the trigger approach.
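A toy demonstration of the trigger idea, again with sqlite3 (hypothetical schema; real deployments would use MySQL or PostgreSQL trigger syntax, and would need a mirror trigger for the other direction): old code inserts using only the old column, and the trigger keeps the new column consistent without the application knowing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE networks (id INTEGER, tenant_id TEXT, project_id TEXT)"
)

# The trigger fills in the new column whenever old code inserts a
# row using only the old column name.
conn.execute("""
    CREATE TRIGGER sync_project_id AFTER INSERT ON networks
    WHEN NEW.project_id IS NULL
    BEGIN
        UPDATE networks SET project_id = NEW.tenant_id
        WHERE id = NEW.id;
    END
""")

# Old code, unaware that project_id exists:
conn.execute("INSERT INTO networks (id, tenant_id) VALUES (1, 'demo')")

# New code reads only the new column and still sees the data.
row = conn.execute(
    "SELECT project_id FROM networks WHERE id = 1"
).fetchone()
assert row == ("demo",)
```

This is what makes the compatibility layer transparent: neither the old nor the new application code switches its data access pattern.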
The main concern is that there is not enough expertise in the current Python community to maintain triggers. An alternative approach, proposed by Michael Bayer, is to use SQLAlchemy events, which can act like triggers; this should make it possible to do upgrades in one release cycle entirely in Python, since even without events you can place fields of a SQLAlchemy class under if clauses, changing the model on the fly while the application is running. This doesn't take into consideration, though, that we can't modify the data access pattern in the old version of the app, only in the new one, so we would have to cycle through four data access patterns, and only the last three would be implemented in the new release.

This could actually be split out into a sub-project, a sub-project of Keystone or Glance. The compatibility-enforcing data access patterns can be made totally transparent to the main project by using triggers with table views to represent the data as two separate databases: the old version would connect to the first database, and a second database can be created for the new version. As you see here, the original developers would write schema and data migration scripts as normal in the main project, and those migration scripts could then be used as documentation for creating triggers in the rolling-upgrades sub-project. If you want live upgrades, you could benefit from this separate project, which would offer an alternative upgrade path. And as I mentioned, this could even be used with an older release of Keystone or Glance which doesn't support rolling upgrades.

So, summing up, what the operator should get, whichever rolling upgrade implementation each OpenStack project chooses, is something as simple as: download the new code, run expand (which can install the triggers), roll out the new version, run contract (removing the triggers, or schema and data). We want to encourage all projects to adopt that model from the operator's point of view.
Thank you. Questions? If there are any questions, there is a microphone, so please speak into the microphone.

My name is Arkady Kanevsky, from Dell EMC. A question with respect to ordering: if there is an upgrade across all the projects, is there a workflow or trigger ordering to make sure they are all done in the appropriate order, to reduce the disruption to the user?

Yes. Most project authors take into consideration that introducing new functionality can cause incompatibilities, so usually a new cross-project feature implemented in one release isn't available until you upgrade to the new version. You can usually upgrade the services in either order, but you won't be able to use the new features from this release until everything is upgraded.

You also need cross-project compatibility from another point of view: when using some enhanced feature from Neutron, you should be able to detect that Neutron is not yet upgraded, not use the feature, and stay compatible with the older version of Neutron. The main concern here is API compatibility: services communicate with each other over the HTTP API, they usually don't use RabbitMQ, but for notifications, for example, versioned objects are used. Right now the Nova-Neutron and Nova-Cinder communication uses the os-vif and os-brick libraries, so there is versioned communication between the two services as well. This was implemented in Newton: with os-vif, for example, Nova can ask Neutron for the port and the version of its service, and Neutron responds with the version it is currently running, so there is backward compatibility at this level of cross-project integration.

Any more questions? Thank you.