Hello, my name is Christophe Besson. I'm a software maintenance engineer here at Red Hat, working for support delivery. I will explain how Leapp works from a support perspective. In this presentation, I will first make a brief introduction to give you the global picture. Then I will focus on the two main parts of the upgrade process.
So, what's Leapp? In a nutshell, it's a tool for handling upgrades to the latest generally available RHEL 8 release, its advantage being the ability to roll back in case it fails during this process. The upgrade is done in two steps. The first step is executed on the running RHEL 7 OS. It runs many checks, and then it downloads all the required packages from the repositories in order to fill the DNF cache. After that, a reboot is initiated on the new 4.18 kernel, and then the real upgrade is done.
Now, why not use yum or DNF directly for this task? DNF is one of the underlying tools, among others, used during the upgrade process. But there are too many changes between RHEL 7 and RHEL 8. It corresponds to a gap of 8 versions in Fedora, so the DNF transaction would fail. The same kind of principles were used previously to upgrade from RHEL 6 to RHEL 7. However, Leapp provides a new way to do this by using more recent techniques. We will focus on them later in this presentation.
What are the main changes between RHEL 7 and RHEL 8 that complicate this task? As with every new major release, it comes with some new software, and some components, including drivers, have been deprecated or removed. For example, the kernel module e1000, a network driver widely used in the past on some virtual platforms, is now deprecated. That means this module is still present in the kernel modules tree. It is loadable, but it is not supported anymore. A warning is printed while loading this driver. Another example is the removal of many PAM modules.
For example, pam_tally2 was deprecated in favor of pam_faillock at the time of RHEL 7. And now it is completely removed in RHEL 8. Another example is the pam_krb5 module, which is now superseded by SSSD. So Leapp detects this kind of configuration and warns the user during the upgrade step. Some package names have changed, many of them due to the move from Python 2 to Python 3. There is also the move from YUM to DNF. As you know, RHEL 8 comes with reorganized repositories: BaseOS, which provides the core of the distribution, as its name indicates, and AppStream, which provides modules as a replacement for other former RHEL repositories. GRUB 2 moves to the freedesktop.org Boot Loader Specification. In other words, the GRUB configuration files are organized differently. They are split into different parts, with one file per boot entry. Obviously, this kind of change has to be taken into account very carefully.
Now, I'm going to explain how it works. Leapp supports several backends. By default, Leapp relies on the RHSM backend. Thus, it can work with CDN repositories or a Satellite server. Custom repositories can be used, too. It can also be a combination of both, in particular to provide third-party updates. At the bottom of this slide, you can see the leapp preupgrade command.
Let's begin an upgrade in parallel with this presentation. For demonstration purposes, I'm going to show a combination of Red Hat hosted repositories, providing the RHEL 8 packages, with a custom repository, in this case EPEL. As per the documentation, there are some prerequisites beyond the considerations about hardware support. Since we use CDN repositories, the system has to be properly registered. Here, that is already the case. We need to have the Extras repository configured, as it provides Leapp and DNF for RHEL 7. Let's clean the cache first. Here, we have our mandatory repositories already configured, namely rhel-7-server-rpms and rhel-7-server-extras-rpms. The system has to be up to date.
Here, we have a RHEL 7.9 release, the latest version of the RHEL 7 series, with the latest kernel installed. Let's check that everything is up to date. Let's install Leapp and its dependencies. As you can see, it comes with the DNF underlying libraries. We have to download a tarball from the customer portal. It provides some files containing the repository names, the list of packages, including the name changes, and the architectures for which a given package is available. We don't have to deal with these files. They are intended for the internal needs of Leapp itself. We download this tarball, and then we have to extract it in the following directory.
As an example, I installed a third-party package from the EPEL repository. You might know htop, it's a useful tool. Thus, we need to define the EPEL 8 repository in order to provide an update for this package. It has to be defined in the following file. The repo ID, the name, and the base URL are mandatory. Now, we are ready for the in-place upgrade. I launch the leapp command, and I enable the repository using the EPEL 8 repo ID. It takes some time. We'll take a look at the result later.
Let me describe all the phases involved in the upgrade process. The first one is the facts collection phase. Leapp executes many commands to get an overview of the system configuration. Some of the gathered data, among others, are the hardware devices, the kernel-related configuration, the network configuration, and also the SELinux settings, the subscriptions, the desktop environment, and so on. In this phase, it also determines what to do for a given package. For example, glibc is obviously kept. As another example, python3-libsemanage is installed to replace libsemanage-python. This is almost the same package. One is for Python 3, the former one was for Python 2. As you can see, some packages have been renamed. For example, dhcp-client instead of dhclient. The kernel package itself is kept, but in fact it is the RHEL 7 kernel that is kept.
The new kernel, coming with RHEL 8, is packaged mainly in kernel-core and kernel-modules.
Next, the checks phase. Plenty of checks are done during the upgrade process. Here is a short excerpt. For example, it checks whether some old PAM modules are still in use in the current configuration, as explained previously. It also warns the user if some deprecated SSH algorithms are present in the OpenSSH configuration. Leapp doesn't support encrypted partitions, so the upgrade is inhibited if LUKS is used. Leapp will also inhibit the upgrade if some deprecated drivers are loaded.
Next phase, the target transaction facts collection. In this phase, Leapp creates some OverlayFS mounts in order to do a trial run inside a container. The purpose is to leave the system unchanged. It gives the ability to roll back in case of any failure. The overlays' lower layers are those of the host system. At least the directories below are mounted: /, the root file system itself, /boot, /var/cache/dnf, and possibly other partitions like /var, /usr, and so on. Once the directories are mounted, Leapp uses systemd-nspawn in order to launch a container. Inside this container, it runs DNF, the one from RHEL 7, that is to say DNF 4.0, to install DNF 4.2 from RHEL 8. The installation target is a subdirectory named el8target. By installing DNF, 192 RHEL 8 packages are installed in this minimal root file system.
Here is the global picture of this phase. As I said, the host file systems are mounted using overlays to leave the system unchanged. On the contrary, the DNF cache is bind-mounted to preserve the downloaded packages on the host system. The installation root is this directory, itself bind-mounted from another one on the host in order to preserve the generated content. That way, we will have a minimal root file system in this directory. It will be used for the next step. Next phase is the target transaction check.
In this phase, Leapp reuses the minimal RHEL 8 root file system in order to check the full upgrade transaction. Inside the container, it runs a dedicated DNF plugin named rhel-upgrade, which uses the RHEL 8 underlying DNF libraries. It's executed in a target subdirectory named installroot, which is an OverlayFS using the host file systems as lower layers. I think it's easier to understand this phase thanks to this overview. Leapp spawns a new container by using the minimal RHEL 8 generated in the previous phase to execute the DNF plugin in question. At this step, it just checks that the DNF transaction is successful. In the next phase, the downloaded packages will be cached here. They will be preserved on the host system.
Here comes the download phase. Once the transaction check is successful, Leapp invokes the rhel-upgrade DNF plugin again to download all the required packages. This DNF cache will be used later for the real upgrade.
The last phase before the reboot is named interim preparation. Leapp uses systemd-nspawn to install the kernel and dracut packages inside the container. It generates a dedicated initramfs using the dracut command, working with the 4.18 kernel from RHEL 8. This initramfs includes a Leapp dracut module which will proceed to the real upgrade. Finally, Leapp copies the new kernel alongside the upgrade initramfs into /boot on the host system. And it creates a new boot entry thanks to grubby. The final action above is done only in the leapp upgrade mode.
Here comes the step where the user is involved. It's time to check the report and the logs. They are located in /var/log/leapp. If there is no inhibitor, the upgrade step can be launched. And after that, a reboot has to be initiated to proceed to the real upgrade.
Let's come back to the demonstration. As you can see, the upgrade has been inhibited. As a precaution, the tool requires root login over SSH to be permitted. So let's change it just for the duration of the upgrade. Now, I run Leapp in upgrade mode.
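The report check mentioned before the upgrade run can be sketched in a few lines of Python. This is only an illustration under assumptions: it supposes a JSON report with an `entries` list, where each entry has a `title` and is marked as an inhibitor through a `flags` or `groups` list. The function name `find_inhibitors` and the sample data are hypothetical and were not shown in the talk.

```python
import json


def find_inhibitors(report):
    """Return the titles of report entries marked as inhibitors.

    Assumption: inhibitors carry the marker "inhibitor" in a "flags"
    list or in a "groups" list, depending on the report version.
    """
    titles = []
    for entry in report.get("entries", []):
        markers = set(entry.get("flags", [])) | set(entry.get("groups", []))
        if "inhibitor" in markers:
            titles.append(entry.get("title", "<untitled>"))
    return titles


# Hypothetical sample, shaped like the assumed report layout.
SAMPLE = json.loads("""
{
  "entries": [
    {"title": "Possible problems with remote login using root account",
     "severity": "high", "flags": ["inhibitor"]},
    {"title": "Some packages will be replaced",
     "severity": "info", "flags": []}
  ]
}
""")

if __name__ == "__main__":
    for title in find_inhibitors(SAMPLE):
        print("inhibitor:", title)
```

On a real system, the same function could be fed with the JSON report found in the log directory; an empty result would mean that the upgrade step can be launched.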
If the upgrade run succeeds, it will add an upgrade entry in the boot loader, as explained previously. I skip the wait, as it takes at least 5 minutes. This time there was no inhibitor. Everything went well in the first step, so we can reboot the system using the RHEL upgrade initramfs entry in the GRUB menu. Before rebooting, let's check whether my package from EPEL has been downloaded too. Yes, it was.
Now, let's see the real upgrade step. The reboot is done using the upgrade boot entry in the GRUB menu. The dedicated initramfs containing the upgrade shell script is loaded. Everything is done offline, networking being disabled to make it easier. Leapp resumes the upgrade execution and spawns a container in /sysroot. The GRUB upgrade entry is removed. All file systems are mounted, including /boot and possibly other file systems, like /var or /usr. The RHEL 7 product certificate is removed, and then the YUM/DNF configuration is updated. Now, the core of the upgrade is coming. Leapp runs the rhel-upgrade DNF plugin again. This time the packages are upgraded or installed, and some of them are removed. A new initramfs corresponding to the new kernel is generated. GRUB 2 is installed again on the boot disk, and then logs are written to disk. Since the upgrade has been done with SELinux in permissive mode, a second reboot is needed to relabel the entire root file system.
Here we are. The system is fully operational after the third reboot. The systemd leapp_resume service is started during this boot process. The release is locked to the current upgraded version thanks to subscription-manager. The first boot phase actor is also executed. It can be a custom actor, the main goal being to add some custom post-upgrade actions, especially for third-party software, like enabling a given service. And some cleanups are done, including the removal of this resume service itself. Now let's reboot the system.
Here I'm also skipping some parts, since it takes a lot of time. Packages are currently being installed. Now all the packages have been upgraded, and a new initramfs corresponding to the new kernel is generated. It's installed. It's almost finished. The system is now upgraded. It reboots to relabel the file system. Here we can see the very last phase with the leapp_resume service. The upgrade is successful.
The upgrade operation is almost done. However, a few actions remain for the user. The system has rebooted in SELinux permissive mode, so it's time to go back to enforcing by setting it persistently. Of course, this step can be bypassed if SELinux was disabled from the beginning. The second thing to keep in mind: the RHEL version has been locked, and you might want to unset it in order to get the updates from the latest RHEL version.
Here is a focus on how to debug the upgrade process. Most problems are related to the user configuration. The logs and the report are usually enough to fix the encountered issue. To get a better understanding of what is failing, adding the debug switch can help, as it prints every single command launched by Leapp. If a true bug is suspected, strace is our best friend, as usual. The resulting file is huge, but combined with the debug mode, it's not too hard to determine what happens just before the failure. If an issue occurs in the reboot step, it can be more complex, like a can't boot case. We need to add some parameters to the kernel command line to get the serial console and the full boot log, and possibly to enable the initramfs debug mode. It is also feasible to insert some breakpoints: upgrade, which stops the execution just before upgrading the packages, and leapp-upgrade, which breaks just after that.
Let me give you an example. In a recent case, a customer got the following error message. He had set the container mode for subscription-manager.
In practice, it's something done by creating a symlink in the /etc directory inside the container. During the upgrade process, it's the very first write operation in that container. In that case, unfortunately, debug logs are not enough. Thus, we need to strace the leapp process to get some evidence. As per the debug log, we know the symlink is created via systemd-nspawn. So, we look for the exact execve call that runs that command to find out the PID, and then we look around the end of this process, just before it exited with an error. And what can we notice here? The /etc/hosts file does not exist on this system. Although removing that file is not strictly forbidden, it's not very common. But as you know, users might be very creative. By comparing with an strace on the test system, we can see the /etc/hosts file is bind-mounted into the container just after the check. Since the file doesn't exist on the customer system, that explains why systemd-nspawn fails and why Leapp prints its message. Creating that file again, as it comes by default, fixes the issue.
Before ending this presentation, here are some useful links from the official documentation to some articles explaining how to customize the upgrade process. Any questions?
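As a complement to the debugging example above, the strace triage (find the execve call, take its PID, then look at the last calls of that process) can be sketched in Python. This is a minimal sketch, assuming a trace file where every line starts with the PID, as written by strace when following child processes into a single output file. The helper names and the sample lines are hypothetical; they are not the customer's actual trace.

```python
import re

# Assumed line layout: "<pid> <syscall or event text>".
LINE_RE = re.compile(r"^(\d+)\s+(.*)$")


def find_exec_pid(lines, needle):
    """Return the PID of the first execve call whose text contains needle."""
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2).startswith("execve(") and needle in match.group(2):
            return int(match.group(1))
    return None


def tail_of_pid(lines, pid, count=3):
    """Return the last `count` trace records that belong to `pid`."""
    own = []
    for line in lines:
        match = LINE_RE.match(line)
        if match and int(match.group(1)) == pid:
            own.append(match.group(2))
    return own[-count:]


# Hypothetical trace excerpt, made up for illustration only.
SAMPLE = [
    '1200 execve("/usr/bin/leapp", ["leapp", "upgrade"], 0x7ffc /* 20 vars */) = 0',
    '1342 execve("/usr/bin/systemd-nspawn", ["systemd-nspawn", "..."], 0x55 /* 20 vars */) = 0',
    '1200 read(3, "", 4096) = 0',
    '1342 stat("/etc/hosts", 0x7ffd) = -1 ENOENT (No such file or directory)',
    '1342 exit_group(1) = ?',
    '1342 +++ exited with 1 +++',
]

if __name__ == "__main__":
    pid = find_exec_pid(SAMPLE, "systemd-nspawn")
    for record in tail_of_pid(SAMPLE, pid):
        print(pid, record)
```

Filtering by PID first keeps the huge trace file manageable: only the last few records of the failing process need to be read, and they can then be compared with the same records from a working test system.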