Hello, today I want to tell you something about configuration files together with atomic updates: what the problems are and what possible solutions could be. My name is Thorsten Kukuk, I am a Distinguished Engineer at SUSE, I am the Senior Architect for SUSE Linux Enterprise Server and MicroOS, and I am also leading the Future Technology Team.

What is the problem with configuration files today on a Linux distribution? The typical situation is: there is a configuration file, the admin modifies it, the distribution later updates the configuration file in the package, the admin applies the updated package, and now you have a conflict. The package manager needs to merge the changes the admin made with the changes the distributor made. In this simple form that is normally not possible, so there are two choices. Use the new configuration file and move the old one away: the result is that the system can be non-functional or insecure, or it can continue to work. Or continue to use the modified configuration file: again, the system can be non-functional or insecure, or it continues to work. Those are the two options you normally have.

If you look at RPM and how RPM handles configuration files: RPM has support for configuration files, and the Linux distributor or the packager decides how each one is handled. The worst case is that the configuration file isn't marked as %config at all, so RPM will replace it with every update. If the configuration file is marked as %config, modified files are moved away as .rpmsave and the new configuration file from the distributor is installed. Files marked as %config(noreplace) do exactly the opposite: the modified file stays, and the new file is written as .rpmnew. Depending on which kind of change in the package required a change in the configuration file, it can happen that your service continues to work or stops working, and it's not always clear in advance which would be the better choice.

To use a real-world example: I think nearly every one of you has already had contact with /etc/login.defs. It's a default configuration file of the shadow suite, and one option there is which hash should be used for new passwords. There are many options you can choose from. DES and MD5 are obsolete and should no longer be used, but the default is the least secure hash, and that is DES. As a good administrator, if the distributor hasn't changed the default, you would modify the configuration file and add a line ENCRYPT_METHOD SHA512 to make sure you really use strong hashes for your new passwords. Now you have a modified configuration file, and the distributor enhances this package: adds another hash, modifies the configuration file, changes a comment in the configuration file, or similar things. Every time you update a package, you have to look for .rpmsave and .rpmnew files. If you don't look for them, it could be that from then on new passwords are again DES encrypted and no longer hashed with the more secure method. In the other case, if you have a .rpmnew file and you don't merge the changes, it could be that the service stops working because important new information is missing.
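To make this concrete, here is a small sketch; login.defs is just the example from above, and which suffix you get depends on how the packager marked the file:

    # %config: your modified file is moved to /etc/login.defs.rpmsave
    #          and the distributor's new version is installed in its place
    # %config(noreplace): your file stays, the distributor's version is
    #          written next to it as /etc/login.defs.rpmnew

    # the admin's hardening change in /etc/login.defs:
    ENCRYPT_METHOD SHA512

    # after every update, search for leftover merge work:
    find /etc -name '*.rpmnew' -o -name '*.rpmsave'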
There are several alternatives. Some package managers let the admin merge the files during the update. That is really nice if you have a single desktop machine or a single server, but then the update is only possible in an interactive way. Think about a big Kubernetes cluster or some other cluster of 200 nodes or so. Do you really want to update every single machine by hand and merge the configuration files? It is very time-consuming and very error-prone.

There is another solution, called the three-way diff, which diffs between the original file, the one modified by the admin, and the new one. But if all three files have modifications in the same hunk, it's still up to the user to resolve that manually, so you are back at the package manager letting the admin do it during the upgrade.

Now, atomic updates add another layer of complexity on top of this. What are atomic updates? Atomic means in this case: the update is either fully applied or not at all. So if something goes wrong during the update, it should look to the system as if the update was never even tried, and the update must not influence your running system. Why should your running system not be influenced? Just think of an important, customer-facing application: the customer is entering his new big order, and while he is entering the order, you restart the service because you updated the package. So the customer has to start from scratch entering his order. Another nice thing about atomic updates is that they can be rolled back to an old state. So if the new update fails or is incompatible, the situation before the update can quickly be restored.

There are several different implementations; nearly every Linux distribution has its own. One of the common ones is A/B partitions: you have two partitions, A and B, and you always switch between the one you are currently using and the one you update for the next boot. For openSUSE and SUSE Linux, we use btrfs with snapshots, called transactional updates: we create a new snapshot, apply the update to the new snapshot, and boot the new snapshot the next time. But they all have something in common: the running services do not notice that there is an update running, they also don't notice if it goes wrong and gets deleted, removed or tried again, and you need to reboot to activate the update. Well, it's always a question what is more important for you, uptime or the safe update of the system, because it's a remote system, an edge system, something you don't want to send a technician to every time something goes wrong.

So what is the problem with atomic updates? There is a next level of configuration file conflicts. With atomic updates, you have the configuration file in the running system, which may be modified by the admin. The admin applies the update, which may also modify the configuration file. And while you haven't rebooted yet, so the changes are not visible, the admin decides to modify the configuration file again, because he got new requirements, he forgot something before, whatever the reason is. And now the admin reboots to activate the update. During the reboot, we don't have a package manager that could merge the configuration files for us, but we would need to merge them again. And this is pretty complicated, next to impossible. And if you additionally think about rollbacks with configuration files, in the current form with current package managers it's nearly impossible to do. So what should happen with a modified configuration file is the big question here.
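To make the timing problem concrete, here is a sketch of such a sequence on a snapshot-based system; the commands are from transactional-update as used on openSUSE, and the file is just the example from before:

    # 1. the admin modifies the configuration in the running system
    echo 'ENCRYPT_METHOD SHA512' >> /etc/login.defs
    # 2. the update runs inside a new snapshot; the running system is untouched
    transactional-update up
    # 3. before rebooting, the admin changes the file again
    vi /etc/login.defs
    # 4. the reboot activates the new snapshot - now several versions of the
    #    file exist, and no package manager is running that could merge them
    reboot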
In this presentation, I want to explain our goal. We started with interviewing customers, speaking at conferences with users, the user community, and other Linux distributions. And we wanted to define a new way to store and manage configuration files. During all these talks, the feedback was very positive. Less than 3% of the people said: we have been doing it this way for 20 years, it must be good enough, so we can live with it for another 20 years. But the absolute majority of the feedback was: please do it, change it, fix it, make it better.

There were also requirements from the people we spoke with. The vendor defaults should be easy to find for the administrator, so that he can look up what is configured, how something is configured, and find out what he needs to change. The best for this would be if everything were in one well-known directory and not spread over the file system like /usr/lib/something, /usr/share/something or somewhere else. Also, admins have to see that something got updated, so they know they have to check whether their changes are still valid or not. And they want to see what they changed: which change was made by them and which is coming from the Linux distribution. And of course, the most important one: changes should be merged automatically as far as possible.

So we came up with some concepts which would work for nearly everybody. The core of it is that we have two directories: /etc and a vendor directory somewhere below /usr. The vendor directory would be searchable, e.g. via grep, so that you can find options and so on. And the whole thing would be seen as guidance for new software developers and packagers; rewriting all existing software is a very long term goal, not something for the short or mid term.

The idea we came up with is that /etc should only contain configuration files or changes which are either host specific, meaning they are only for this host, like which kernel module needs to be loaded with which options, which kernel module should not be loaded, what the network configuration for this host is; or made by the admin, for example which configuration the Apache server is started with, or from which NTP server the NTP client gets the time, and so on.

There are already some existing solutions. One common one is the three-way diff, as I already said; there can still be conflicts which need to be solved manually. Another common solution is that /etc only contains symlinks to the files in the vendor directory. The advantage is that the files are always current with every update and always there in the right version, as long as the admin does not modify them. But if the admin wants to modify them, he has to delete the symlink, copy the original file to /etc and then modify this copy. And after every update, he has to check whether the vendor made another change to the configuration file or not. Since this is now no longer under the control of a package manager, he no longer has the advantage that the package manager will notify him; he has to do everything on his own. And there is the systemd-like way: you have a vendor directory where the default configuration files are, and /etc only contains the admin changes. And this is the way we want to follow.
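As an illustration of that systemd-like way, a minimal sketch with a hypothetical service; systemctl edit creates the drop-in directory for you:

    # vendor default, shipped by the package, never edited by the admin:
    /usr/lib/systemd/system/myservice.service

    # admin override, e.g. created with "systemctl edit myservice":
    /etc/systemd/system/myservice.service.d/override.conf

    # override.conf contains only the keys the admin wants to change:
    [Service]
    Environment=LOGLEVEL=debug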
So, coming to the proposals. There are different kinds of configuration files that you always need to keep in mind if you want to find a solution. Since there are different kinds of configuration files, there is not the one solution; you need to look at which kind of configuration file it is, how the application uses it, and what the result should be. First, there are the configuration files for applications, the system, the network, the hardware. Common are key-value style INI-like configuration files; that case is very easy, as I will show later. It can also be XML or any other format: YAML, JSON, whatever else. Then we have some kinds of databases like /etc/rpc, /etc/services, /etc/protocols. And in the end, we have system and user accounts: /etc/passwd, /etc/group, /etc/shadow. As you can see, for all of them you need different solutions.

Key-value configuration files are the easiest ones; that's why we started with them, to get some movement in the beginning. We decided to do it like, or similar to, systemd. If there is a configuration file /etc/application.conf, the application should use this configuration file. If it does not exist, load the vendor application.conf file and merge the vendor-specific .d snippets, so that you have one full configuration file with the defaults, and then look for overrides in /etc/application.conf.d and merge them. For more details and examples, it's best to read the example sections of the systemd unit documentation. That is really great documentation, good stuff to read to understand in detail, with examples, how this should work.

You might now think it's a lot of coding work. But in the end, it isn't. Many applications already have support for this today, even if most of the time it is not used by the distributions. Linux-PAM, for example, has a configurable vendor directory where it looks for PAM configuration files: it looks in /usr/lib/pam.d and in /etc/pam.d, in this order. So you could already do it today without any code change. You only need to do some packaging changes: put the PAM configuration files of every package in the vendor directory /usr/lib/pam.d and not in /etc/pam.d. Same for sysctl: we have /usr/lib/sysctl.d, which you can see as the vendor directory, and we have /etc/sysctl.d, so we don't need /etc/sysctl.conf; in the end, it's superfluous and obsolete. And there are several more tools that have .d directories: aliases, bash-completion, chrony, cron, depmod, dnsmasq, dracut, grub, issue, logrotate, modprobe, netconfig, sudoers and a lot more. So all that is needed here is to go and adjust the packaging, think about how to do it correctly, and maybe a little bit of coding, adjusting, enhancing, but the main groundwork is already done and there.
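For an application that wants to implement exactly this lookup order itself, here is a minimal C sketch using libeconf, the SUSE library I will come back to later; the name application.conf and the key LOGLEVEL are hypothetical:

    #include <stdio.h>
    #include <stdlib.h>
    #include <libeconf.h>

    int main(void) {
        econf_file *file = NULL;
        char *value = NULL;

        /* Read /usr/etc/application.conf plus /usr/etc/application.conf.d/*,
           then merge the overrides from /etc/application.conf and
           /etc/application.conf.d/*. */
        if (econf_readDirs(&file, "/usr/etc", "/etc",
                           "application", "conf", "=", "#") != ECONF_SUCCESS)
            return 1;

        /* Look up one key from the merged result (NULL = no [group]). */
        if (econf_getStringValue(file, NULL, "LOGLEVEL", &value) == ECONF_SUCCESS)
            printf("LOGLEVEL=%s\n", value);

        free(value);
        econf_free(file);
        return 0;
    }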
Now there are many other applications which don't use a key-value configuration file: XML, JSON, YAML or some own format. If you try this with XML, it's pretty hard. But if you know kustomize from Kubernetes and its YAML files, you know that this is nothing new. There are already existing solutions, for example for YAML files. It's not really easy to write an override for a YAML file, it's a little bit more complex, but it's already done today. Also nothing new; you just need to do it, that's all.

System databases. These are files which are not really configuration files, but for historic, legacy, whatever reasons have been in /etc for a very, very long time: /etc/rpc, /etc/services, /etc/protocols. Most of the time, people today don't see that these are used on their system, but if you remove them, you will find out pretty quickly who all is using them. There are also already solutions for this, already done by several Linux distributions: move those files to a vendor directory; the versions in /etc, if they exist, only contain additional local entries. The NSS plugin nss_usrfiles will look first in /etc and afterwards below the vendor directory. nss_usrfiles is SUSE-specific, but other distributions have similar NSS plugins for glibc which look in the vendor directory when the file in /etc is not found. This cannot only be used for rpc, services and protocols, but for nearly everything that nss_files supports.

System accounts. That's the most problematic part; we don't have a solution for system accounts yet. You could do what some others already do and put passwd, group and shadow entries for system accounts in the vendor directory. Normal accounts and changes go to /etc, and the glibc NSS plugin reads them from both locations and merges them. But there are also many drawbacks, and it does not solve our problems. The admin wants to create system accounts, but with an atomic system, the vendor directory is most likely on the read-only part of the file system. nss_compat and similar plugins don't work anymore, because they don't understand or have support for vendor directories. And it is very confusing for customers, from our experience, if the password hash for an account is in /etc/shadow, but they don't find the entry for this account in /etc/passwd. Then they think something is broken, or there is malware or something similar on their system.

We had quite a few different ideas. One was .d directories for passwd, group and shadow, too. Openwall's tcb implementation is doing exactly this for shadow, but if you look at it, you will find that all the tools modifying the data - usermod, groupmod, passwd, the PAM modules to write back a new password - are not able to handle it. And even for Openwall's tcb, the shadow suite has only limited support, and a lot of functionality does not work with it. So it's not really making things better, rather worse from a visibility point of view.

Then there is systemd-sysusers. At first glance, it looked perfect: the new RPM has a sysusers.d configuration file, and at the next reboot, systemd will create this user in /etc, in the passwd file, like everybody else is doing, discoverable with the usual tools. That would be really great, but most of the time, if you want to install an RPM which needs its own user, it needs the user already during installation, to set the file ownership correctly. So it's a nice solution, a nice approach, but for our problem it comes too late; in this step, we would need the user already earlier.
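For reference, this is roughly what such a declarative snippet looks like; the service name and paths are hypothetical, the format is described in sysusers.d(5):

    # /usr/lib/sysusers.d/myservice.conf
    # Type Name      ID  GECOS                 Home
    u      myservice -   "My service account"  /var/lib/myservice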
Now, I have mentioned a vendor directory several times. Several Linux systems already have one: /usr/share/defaults for Clear Linux and several of its packages; /usr/share/baselayout, e.g. for pam.d, for CoreOS Container Linux and several packages; /writable and /etc/writable used by Ubuntu Core; /usr/etc by openSUSE. Some are using /usr/share/misc, but that does not really fit with the Filesystem Hierarchy Standard. So we tried to come to an agreement with other distributors, but in the end, unfortunately, the FHS people weren't really interested in defining it; they only want to define common best practices. And there is no best practice, as you can see, and everybody else was saying: we have something, it's working, we want to stick with it.

Now, in the beginning I brought up the example of login.defs. Let's look at how this proposal would work if we use it for login.defs. Today, /etc/login.defs is used by the shadow suite, by util-linux and by Linux-PAM: three packages reading the same configuration file. But only one package can own the configuration file, so normally it's the shadow suite. So if you update util-linux and it requires a change to login.defs, you also need to update the shadow package. That's already, for a distributor, not so nice, but doable. From the admin's view, /etc/login.defs contains the encrypt method; you don't want to use DES for it, but something else. So you have to recheck after every update that SHA512 is still used for new passwords, which is a lot of work.

So for these reasons we implemented libeconf. libeconf is a library, very flexible and configurable, to parse and manage key-value style configuration files. And shadow, util-linux and Linux-PAM all three support it already; you only need to compile and link them against libeconf and configure the vendor directory at compile time.

So how would the solution look with libeconf? You have a vendor login.defs provided by the distribution. You have an additional directory login.defs.d in the vendor directory, which contains a snippet from util-linux. So there is no need to always update the shadow suite only because util-linux has a change for login.defs; the merging is now done by the applications when reading the configuration files. As an admin, you could create /etc/login.defs, which would override everything below the vendor directory. But that's normally not what you want, because then you have all the old problems again. Instead, you create an /etc/login.defs.d directory with a file name of your choice; it is useful to give it a name which means something to you, like encrypt_method.defs. And this file then contains your ENCRYPT_METHOD key with the new value. Every time you start passwd or any other tool reading login.defs and linked against libeconf, you will get the vendor defaults plus your encrypt method. And with every update of the vendor configuration files, you don't need to care, because you only override one or two variables, and that's all.

Now, the question in the beginning was: how can I see that the distributor has changed the configuration file? This heavily depends on your update stack. For openSUSE and SUSE, we use btrfs with snapshots, so it's quite simple after every update to diff the old against the new configuration file; you can do that in an automatic way. For others, it depends, as I said, on their method. And we are currently developing a tool that would merge all the configuration snippets, for login.defs for example, give you an editor to comfortably modify them, and then write the diffs to /etc/login.defs.d for you, so that there is no need for you to search the system for all the snippets, merge them, and try to find out what the final result is. That is currently work in progress, and I hope that we can release that soon, too.

There is more documentation about all of this. There is my GitHub repository with a README about the background and the proposals, more detailed than on these slides. There is an openSUSE wiki page where we have all the packages listed which we already moved, also what the problems are, where the documentation is, and how to use it. And of course there is the documentation of libeconf. So that's it from me. Thank you, and you can go over to the question section.