So, hello everyone, welcome to "Enhancing the Fedora Update Process". Don't be alarmed, we're not pushing anything into Fedora; this is just a proposal. My name is Vinzenz Feenstra, I'm a senior software engineer and I'm working for Red Hat.

We have a tool called Leapp. Leapp is not only a tool, it's also a framework for application and OS modernization. As I said, it's a tool which will be used for the upgrade from RHEL 7.6 to RHEL 8. That's the initial version; anything later, you will see.

What does modernization mean? Updating, upgrading the system. And then there's the question: isn't that what DNF and RPM are doing? Yes, and also no, because Leapp allows you to do customizations and extensions of the upgrade process which are way beyond the capabilities of DNF or RPM. Not only because of technical limitations, but also because of policies; there are reasons for those, to guarantee a stable upgrade transaction.

So what can we actually do for the users with these tools? We can make optional things available to users during the upgrade: things that haven't been there before, but are now optional in the newer version, can be made available by giving them choices, by asking them questions.

We can convert configuration file formats from one version to another. Sometimes parts of the configuration files change: the meaning changes, or the format changes, or values get deprecated, and things like this. You could do this with RPM, but it's usually a no-no.

We can allow third-party applications to be upgraded together with the operating system. That means not only that you include the repositories which are needed for the transaction; quite often, when you upgrade software, it needs database migrations to happen, and things like this. You never know. So these are basically ways in which you can use it.

As I already mentioned, you can ask the user questions during the process. Right now there is no such thing in the RPM upgrade.

You can actually influence the transaction. Sometimes you see that it does something, but you don't want it to be done. For example, it installs something that you don't actually need, because it sees it like: oh, this is a soft dependency or something, let's install it with it. But you don't need it.

You can warn the user about removed support for something that is in use, or at least has been used in the past. That means you can scan the current system before you do the upgrade and see: oh, you're using Btrfs; well, Btrfs will be removed, it's obsolete, there's always something better.

Now I have here a few scenarios, which are hypothetical. They are not implemented, neither in RHEL nor anywhere else.
These are just ideas, so that you can imagine how we could improve the process.

First: asking the user to switch to the new Boot Loader Specification. Until now, every bootloader had its own configuration files (GRUB, syslinux, or who knows what, depending on the bootloader and the platform), and there was a move to unify that in one place, to have the bootloader entries in one specification, and, as far as I understand it correctly, even across multiple distributions. So then there's one way of doing it for everyone. A question to the user could look like this: "Convert to the Boot Loader Specification", with a justification of what it is, to give the user the choice to do it or not, basically to say yes or no, and based on that the action will be taken or not later on, when it's time to do it.

Another scenario would be detecting Python 2 applications and scripts and subscribing to the Python 2 channels; that's what I call channels, but the actual name should be module streams, as far as I know. So you could ask the user, yeah, here I wrote it: Python 2.7 module stream. "We detected Python 2 scripts on your system. The new version of Fedora gives you the opportunity to switch to the Python 2.7 module stream." And then you have a choice: either switch to the stream, do nothing, or abort the upgrade and do whatever you want with the system.

Another thing is detecting these scripts or applications and warning the user about the discontinued support, which is a thing next year, right? I mean, Python 2.7 is no longer supported officially, at least not upstream. It could look like this, and we could even show, for example, where we found these files. That said, these kinds of scans would be very, very expensive on systems which have lots of files, so these are just potential things we could do.

Another thing, which happened, if I'm not wrong, in Fedora 29: they switched to PostgreSQL 10, but you can still have a PostgreSQL 9.6 module stream, as far as I know. Sometimes you don't want to upgrade your database, because you are not sure that the application that uses it will actually be happy about the fact that there's a newer version. Or vice versa: maybe it's better to have the newer version, because the older versions cannot be supported forever. So give the user the choice to switch to it, or the choice to say: upgrade to the latest.

Another thing would be changed service defaults. To be quite honest, when I thought about it, it seemed like a good idea, but I think it's maybe not even that useful. Sometimes someone changes the service defaults, and we can detect that there are differences between what will be and what should be, and we can ask the user: convert them (merge them and keep my changes), or keep the previous defaults, or discard the changes and apply the new defaults.

Well, again, back to the third-party applications: part of such an extension could be just to update the repository configuration as well. Sometimes they change locations, they have different ways of referring to the path in the URLs and so on, which are not covered by the typical variables which are used in Fedora or RPM. So it can be used to update the repositories for the new system, for the new RPMs found during the upgrade.

Yeah, and I have here an overview of the upgrade workflow, by the example of RHEL. That means, how are we
doing it right now? Right now means how we are going to do it for the RHEL 8 upgrade.

The first thing the users would do: they install Leapp as a tool and run `leapp upgrade`. There are three stages. There's the original state, where, still in RHEL 7, we just scan the system and download some things, but we do not really modify anything in the configuration or on the system. The next phase would be the initramfs; that is like a stage in between, where we already boot into a new kernel and a new systemd, and with the new RPM tooling we are able to apply the whole RPM transaction. And the last stage is basically starting the RHEL 8 system and, for some actions that need to be executed on the first boot, giving them the opportunity to run.

In the first phase we scan and collect facts about the system. That means we scan what the networking configuration is, what the storage layout is, and all these things we collect together. Any tools that would later want to upgrade, for example, configurations or whatever, would scan them, collect the information, and store it as messages in our system.

The next thing is that we perform checks on this collected information, to be able to block upgrades in case there is a problem. A problem could be that you use a kernel module that is not supported anymore. One thing I noticed (it was reversed later, but I had the problem) was with the e1000 driver, which was removed during the development phase. Basically, this would be something we would detect, and we would prevent the user from upgrading, so that they don't end up with an unbootable system, or a system that is inaccessible in the end because the networking driver can't be found.

The next section would be that you ask the user questions about what you found, if they want to do something, like in the scenarios from before.

The next thing is that we solve the RPM transaction. That means we try to solve the transaction from the RHEL 7 system to the RHEL 8 system, to be sure that it works, and of course we already download all the RPMs, so that they are locally available.

Then we go to the initial RAM disk. After we have prepared the initramfs, which of course needs to be configured in a bootloader-specific configuration, we boot into it. We do not boot the system on the disk; we just start in the initramfs and switch into a different state where we actually continue with our process. There we can first apply workarounds that fix problems in the transaction. Especially when you consider the big jumps that RHEL does, like from RHEL 7 to RHEL 8, there are many years of development in all this tooling and these applications, and sometimes it happens that the RPMs are not able to upgrade without some intervention.
We can apply the workarounds before that, and then perform the RPM upgrade. Also, of course, some late checks can be done before that, because sometimes you might only be able to detect some of the problems once you are already in the new system, under the new kernel, or using the new systemd.

After the transaction passed, we apply configuration fixes. That means the new configuration files are updated based on the data that was originally found on the RHEL 7 system. (Yeah, the RHEL 8 checks should have been a little bit earlier on the slide.)

The last part is that we schedule the SELinux relabeling. We changed a lot of things, and SELinux needs to be disabled during this transaction because it would go crazy about what we're doing, so we need to ensure that once the system reboots, it relabels all the files correctly. So once we're booting into RHEL 8, the first thing that happens, and only the first time, is that it relabels everything, because we told it to. That is, of course, assuming you didn't have SELinux disabled on your original system.

Then we get to the first boot tasks. That means we can, for example, clean up temporary files afterwards. Or, let's imagine you have a workstation: you could get a "What's new in RHEL 8" screen or something. This is scheduled for the first boot only, so it doesn't come up every time. And then you're done, basically.

So, I have a demo for this, which takes about four minutes. It goes through quickly, and there are no questions involved, but it will give you a little bit of an idea of how this currently looks.

First thing, I run the `leapp upgrade` tool, here with the debug output. Usually this will be reduced, because it's too much information for someone to process, and especially for normal users it's not really interesting. Right now it has already done all the checks and the scanning part, and it's preparing the subscription manager to move to the RHEL 8 product, but in a temporary environment: we basically use a systemd-nspawn container with an OverlayFS to modify the system without actually modifying the real system. So we are able to get the subscription manager to switch over to the new product, and after that is done, we fetch the metadata from the repository.

At this stage we have already applied, through a plug-in, all the workarounds to make the transaction succeed and work exactly as it should. By default, yum, or DNF (we're already using DNF here), wouldn't be able to resolve this, because a lot of the packaging information is lost, and there are packages replaced with other ones, and they cannot be specified in the packaging.
So this is another reason why we need some tooling like this. Now we reboot into the initrd system. You see it already went through the first stage, and now it's applying the upgrade transaction; if you look closely, you may notice this resembles fedup, and you can see it here as well. After about a hundred and nine seconds of the boot... you will notice it will be even faster, because it took like ten minutes to generate the RAM disks, for whatever reason. There are still things which we need to solve; this is still in development. So, that part finished, the relabeling is happening now, and right after that it will boot into the new system. Also, this was sped up, I think, three or four times; the whole process takes about 17 minutes, and I narrowed it down to four, so that it does not take too much time of the talk.

And now it's booting into RHEL 8, and I just run a few commands showing some things: you already see "Red Hat Enterprise Linux 8.0 Beta" (that's the redhat-release file), the network is up, and that's it for the demo. Let's see how far we are with the time. We're good.

Now I will give you an overview of Leapp, about the framework: what can it do, what does it consist of, what are we doing here?

The main thing is that we have things called actors, which are the modules that actually do the work. These actors communicate over messages, which we represent in the form of models. The models are basically the payload of the messages; everything else is metadata around it, so that we know where a message came from, when it was produced, and things like this. It boils down to data.

A model definition looks like this. The experienced Python guys might have noticed that this is a very common way of doing things, for databases for example. Since we're talking here about data structures, it was a natural choice to do something similar. We have a way of expressing that things are optional, by saying they're nullable, or making them required by having them not nullable, while still giving them the ability to specify a default value. And there's also the ability of inheritance, so you can use the same model under different names for different meanings; sometimes you have the same data, but the meaning of the data is something else.
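Since the slide itself isn't preserved in the transcript, here is a sketch of roughly what such a model definition looks like, following the conventions in the public Leapp documentation; the topic and model names are illustrative (the Hostname and ResolvedHostname models happen to reappear in the demo later):

```python
from leapp.models import Model, fields
from leapp.topics import Topic


class DemoTopic(Topic):
    # Every model is published under a topic; the name here is illustrative.
    name = 'demo_topic'


class Hostname(Model):
    """A message carrying the fully qualified hostname of the system."""
    topic = DemoTopic
    # A plain field is required; a default still lets producers omit it.
    name = fields.String(default='localhost')
    # Nullable expresses the "optional" values mentioned in the talk.
    comment = fields.Nullable(fields.String())


class ResolvedHostname(Hostname):
    """Same data plus an IP address: inheritance gives the data a new
    name and a new meaning without redefining the shared fields."""
    ip = fields.String()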
An example of this is the RpmTransactionTasks model, which is there for the workarounds I was talking about, where we influence the RPM transaction: we say this needs to be installed, these need to be kept, these need to be removed, and there are some local RPMs which we bundle to solve some of the problems with the packaging. However, some of these actions actually still need to be filtered out, and that's why we needed to separate it, because otherwise some actor says "I want the RpmTransactionTasks message", but it actually needs the filtered one. So instead of putting a field in there saying "hey, this is the filtered one" or "this is not the filtered one", and having everyone iterate through all possible messages, we basically just duplicated the structure and gave it a new name.

We're using types for basically everything. If you look at the different field types, there are no dictionaries. We have strings, numbers (specifically integer and float), Booleans, date-times, lists, and Model, where Model means you can have another model living as a nested item. But you cannot have arbitrary key-value pairs. If you want something like this, use a normal dictionary, dump it as a JSON string, and store it as a string. Allowing arbitrary dictionaries would basically prevent you from doing sanity checks on what the payload contains, because in the end everyone would just use dictionaries, and this is not a good idea.

We also have dialogs. They look like this when you use them; here you just see, right after the execution, what was returned. I can show you that, I actually have it prepared. Oh, is it readable? I guess it is. You see it's waiting for input; it has a default value of "devconf.cz", which is the local network name here, for the Wi-Fi, and you can specify a password, which I just did, and you see there was no echo, it's not echoing it. So these are really interactive dialogs. However, these dialogs are also meant to be replaceable with, for example, Cockpit-based interfaces, so we could change the dialogs to actually be web forms if you started the upgrade through some Cockpit instance. That is still a hypothetical thing, but basically, the way we implemented it, it should be possible, because we can just switch the renderer and it will handle all these things for us.

Here I have a second question, like: which sci-fi universe do you prefer? I'd rather not answer this question, because people might be very passionate about this; probably even worse with Marvel and DC.

So this is how this looks. This is really new, not even my colleagues in the room will know this: just this morning I actually changed the way we define dialogs. It will look something like this. This is again a class, and it's very similar to the models, because it allows you to write it in a more natural way. Before, you had to create a dialog instance and put there a tuple of components; it was just too complicated. I got that feedback and was thinking about why not to do it this way; a colleague was asking why I don't use the models, and I thought, hmm. So I basically resembled the models in some way here as well, and I even use the docstrings for what is displayed to the user about the credentials, and just give the ability to set the title for the dialog, and you're done.

In the dialogs you can ask the user certain questions. The input types are text, numbers, passwords (which is just hidden input), of course yes/no answers, choice, and multiple choice. Everyone knows checkboxes, right?
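Since that class-based style had only been written the morning of the talk, the following is only a guess at its shape, modeled on the description above; the class, attribute, and component names are illustrative, not a stable Leapp API:

```python
# Hypothetical sketch of the class-based dialog style described in the talk.
from leapp.dialogs import Dialog
from leapp.dialogs.components import PasswordComponent, TextComponent


class WifiCredentialsDialog(Dialog):
    """Please provide the credentials for the conference Wi-Fi."""
    # Per the talk, the docstring and the title are what the user sees.
    title = 'Wi-Fi credentials'
    # Attribute names stand in for the old explicit component tuple.
    ssid = TextComponent(label='SSID', default='devconf.cz')
    password = PasswordComponent(label='Password')  # input is not echoed
```

The appeal of this shape, as described, is that it mirrors the model definitions, so actor authors only learn one declaration style.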
Workflows are basically how the framework knows what to do: if you execute something with the framework, you specify a workflow. The beginning of our in-place upgrade workflow looks like this; I stripped it down from the documentation to make it fit in here a bit better, but the gist is here. You specify just the workflow class, and you define subclasses in it, which are the individual phases. So a workflow consists of phases, and in these phases (okay, I'm missing a slide) there will be actors, and the actors are selected by tags. You see there are, for example, a checks-phase tag, a download-phase tag, an InitRamStart-phase tag. Based on these tags we can find, from all the actors that exist, where to put them, into which phase, and when to execute them. You can see here: these red ones are the actors tagged for facts collection, and they will go into the facts collection phase; the yellow ones, tagged for the checks phase, go into the checks phase, and so on.

I think I will skip this today, this is not too interesting: we have a repository where all of these things live. These repositories can also be linked together, which means multiple repositories can exist. One use case for this would be, for example, that a third-party vendor adds their own actors and models and things like this into the mix, but they also want to use things from the existing repositories; so they can link to them and include their own actors into the workflow.

We also have bundling support for tools like Bash scripts, even binaries; files which you need, like data files, templates, or input data. You can have Python libraries included, your own things to make things better, and these things can be either private to the actor or shared across multiple actors. This is just an overview of what's all there.

Then we have a tool called snactor. snactor helps you develop the actors more rapidly, because you need to follow some conventions, and snactor can help you get rid of the boilerplate code and pre-generate things for you. It can run the actors, for testing; it can run whole workflows, also for testing; it can generate boilerplate for actors, tags, topics, models, workflows, and even create the repository. And you can manage the repositories with it, in the sense that you can register them locally; there's a registry in your home folder where it stores the locations of all your repositories, so that wherever you are, it can list them and find what is registered. In a repository it can show you what is in there, and even what is linked to it. And of course it can do the linking of the repositories.

And now I come to the section which is usually the most demanding. For the live actor demo I prepared a workflow and a repository. As I showed you before, there was the in-place upgrade workflow; this one has only two phases, a first phase and a second phase. Nothing special.
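For reference, a two-phase workflow like the demo one could be defined roughly as follows, going by the conventions in the Leapp documentation; the workflow and tag names here are reconstructed for illustration, not copied from the demo repository:

```python
from leapp.workflows import Workflow
from leapp.workflows.flags import Flags
from leapp.workflows.phases import Phase
from leapp.workflows.policies import Policies
from leapp.workflows.tagfilters import TagFilter
from leapp.tags import DevconfDemoWorkflowTag, FirstPhaseTag, SecondPhaseTag


class DevconfDemoWorkflow(Workflow):
    name = 'DevconfDemo'
    tag = DevconfDemoWorkflowTag
    short_name = 'devconf_demo'
    description = 'Two-phase demo workflow: collect facts, then act on them.'

    class FirstPhase(Phase):
        # Actors tagged with FirstPhaseTag are gathered into this phase.
        name = 'first phase'
        filter = TagFilter(FirstPhaseTag)
        policies = Policies(Policies.Errors.FailPhase, Policies.Retry.Phase)
        flags = Flags()

    class SecondPhase(Phase):
        name = 'second phase'
        filter = TagFilter(SecondPhaseTag)
        policies = Policies(Policies.Errors.FailPhase, Policies.Retry.Phase)
        flags = Flags()
```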
I just want to show you how you can actually get your own things executed in there. As a first thing, you see there's `snactor discover`, which gives you the overview and shows you what's in there. I already predefined two actors here: a location provider, which is supposed to be executed in the first phase, and a sync-devconf-data actor, which is supposed to collect all messages that are under the devconf data topic. Additionally, I have a location model, a Hostname model, and a ResolvedHostname model specified, and for those we will now write some actors that actually use them, just so that you get a feel for how one would write such a module.

The first one will figure out what the current hostname is. It's obviously not a big deal, but for it you can use this snactor tool to provide you with the boilerplate. What we need: we want it executed in the first phase, so we copy that tag and put it here, and we want to have it included in the devconf workflow, so we put there also the tag for the workflow. Since we want to scan for the hostname and produce a message that actually contains it, we say it produces Hostname, which is the model, and we call this whole thing hostname_scanner. And there we go, we just created it, we have it here.

So, you see, this is the boilerplate it generated for you: it already imported the models which you need, it imported the tags which you need, and it pre-filled everything here. The only thing you need to do is fill in the process method. The first thing we do is show a log message. To do the hard work we need the socket library from Python, and then we produce a Hostname with the name field set to `socket.getfqdn()`. That's the whole first actor; that's all you need to do to produce a message and send it to others for them to use.

To see this in action, we can now use `snactor run` on the hostname scanner. Oh, look... okay, something's wrong with it... okay, now I see, the file wasn't saved. Okay. So, you see, it just shows the log message here. But if you want to see what it produced, you can say `--print-output`, and you can see that it produced a message with the payload, which has the name field, and it has the data "localhost.localdomain", which is the hostname of this virtual machine.
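Put together, the finished scanner actor looks roughly like this; it's a reconstruction from the demo narration following Leapp's tutorial conventions, so the exact tag names are assumptions:

```python
from leapp.actors import Actor
from leapp.models import Hostname
from leapp.tags import DevconfDemoWorkflowTag, FirstPhaseTag

import socket


class HostnameScanner(Actor):
    name = 'hostname_scanner'
    description = 'Scans the system hostname and publishes it as a message.'
    consumes = ()
    produces = (Hostname,)
    # The tags place this actor into the first phase of the demo workflow.
    tags = (FirstPhaseTag, DevconfDemoWorkflowTag)

    def process(self):
        self.log.info('Scanning for the hostname')
        # Produce a Hostname message that other actors can consume later.
        self.produce(Hostname(name=socket.getfqdn()))
```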
Now, if you want to process that message, say you want to resolve the IP for it, we have the ResolvedHostname model, which inherits from the Hostname model, so it also gets the name field and extends it with an ip field. So let's create an actor that does this. We can use the same phase; we produce ResolvedHostname, we consume the Hostname from the other actor, and we call this actor hostname_resolver. snactor creates the new actor, and you see the hostname resolver.

So: for every hostname we get, `for hostname in self.consume(Hostname)`. This will limit the input to just the type Hostname. If you didn't do this and you consumed multiple types of messages, you would get everything; so the specification of Hostname here is basically a filter on the type. Now we want to resolve it, using the socket library: we use `self.produce(ResolvedHostname(name=hostname.name, ip=socket.gethostbyname(hostname.name)))`. So what we do here is resolve the hostname, pass the name through to the new type, produce a new message, and send it further. There can be multiple messages of the Hostname type; there could be multiple actors that produce it, or multiple hostnames that need to be resolved, and this takes care of resolving all of the hostnames that are sent. Just for the sake of visibility, I log "resolving hostnames".

Now I'm running this hostname resolver, but it doesn't do anything, because with `snactor run` we actually need to tell it explicitly to save something to be used by other actors. The reason for this is: if you're debugging something and it saved all the messages, you would end up with a lot of bogus data in a place that would later be used by another actor. So to avoid this, you call the first one again, but with `--save-output`, and now this one is doing something.

As you saw before with the workflow: you can execute it with snactor as well, and it will execute all the things. It was scanning for the hostname, it was resolving the hostnames (it resolved it for localhost.localdomain), and in the end I have a sync actor that actually consumes all the messages that have been produced. You saw the devconf data: the first actor was providing the location, the second message was the scan for the hostname, and the third one was the resolved IP.

So basically, that was just a quick introduction to this. Writing actors can be a bit more challenging, because there are a lot of things involved, but I hope that gave you a little bit of an idea.
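For reference, a reconstruction of the resolver actor as narrated above, with the same caveat as before: names follow the Leapp tutorial conventions and may differ from the demo repository:

```python
import socket

from leapp.actors import Actor
from leapp.models import Hostname, ResolvedHostname
from leapp.tags import DevconfDemoWorkflowTag, FirstPhaseTag


class HostnameResolver(Actor):
    name = 'hostname_resolver'
    description = 'Resolves every incoming Hostname message to an IP address.'
    consumes = (Hostname,)
    produces = (ResolvedHostname,)
    tags = (FirstPhaseTag, DevconfDemoWorkflowTag)

    def process(self):
        self.log.info('Resolving hostnames')
        # consume(Hostname) filters the inbox down to Hostname messages only.
        for hostname in self.consume(Hostname):
            self.produce(ResolvedHostname(
                name=hostname.name,
                ip=socket.gethostbyname(hostname.name)))
```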
Any questions? One second, just quick links to the documentation and our GitHub organization are here. So, please.

Oh yeah, the question was whether it would be replacing the system-upgrade plug-in. I'd rather see it as either extending or integrating with it, because that's basically what we do: we don't want to replace it. For Fedora it makes more sense to use the approach that is already used there, so we would rather reuse what exists and try to integrate with it. Yes, please?

Yes, I did. Well, the question was if I looked at the history of fedup. Yes, I did look at the history of fedup, which is actually the Red Hat upgrade tool, and we have been looking into it. There are known problems which we know about, like ABI problems and other things, but we are not making the same mistakes which were made there. For example, one of the things we are trying to avoid is switching the systemd context: we don't load, from our new systemd, the state of a systemd twenty versions older and try to hand it the state from a new version. We are not doing this. And no, we're not using Plymouth either. So we're trying to avoid these things. Yes, please?

The question is whether, if something goes wrong, there is a way to roll back. At this moment, no. But this definitely is something that we have to address, and it's basically already a regression from the previous upgrade, RHEL 6 to 7, because there we actually supported snapshots in some way. So we need to improve on that, but the initial version won't have it, because of the complexity with the rest of it involved. It would be something we add on top later, but no idea when.

Then the question is if we have a way of verifying that everything is okay. Let's say it like this: you don't really have a way of verifying that everything is okay. You can do some checks, but even then, I mean, if you can't roll back, what are you doing? Yes?

Well, let's say it like this: the Leapp tool; we might bring the framework into Fedora, but the upgrade tool does not make sense in it, because we don't have a workflow specifically fitting Fedora just yet. We don't have the time. This is more like a teaser, to get the Fedora community, or people who are interested in this, interested in this topic, to improve it. We don't want to force on them something that they don't want. So it's basically: hey, look what we have, we offer you to use it as well, and we are fine with doing things for it and working with you, but we're not going to actively start with it right now, because we don't have enough time to get it done; we have a lot of other things to do. I saw a question, yes?

So basically, the question was whether maintaining the Obsoletes tags in the RPMs would have solved some of the problems we have, if the Fedora community didn't purge these Obsoletes after two or three versions. The thing is, I thought so as well at first, but actually, the longer I think about it: it's quite a high burden to keep them in there, and also, some of these things cannot be resolved with Obsoletes or Provides. For example, the Python 2 to Python 3 switch: you cannot say a Python 2 version is obsoleted by a Python 3 version, even if it's a library, because right now they can even co-exist, you know? So it is not so simple. There are a lot of corner cases, and not all of the problems are even related to these things. We have a way around this: it's called the Package Evolution Service, which provides us with data for this, but that is for some other time.

Any other questions? Yeah, so within a phase: how does the ordering of actors work? The phases are obviously not a problem, but the actors can produce messages and consume messages, and basically we need to be able to resolve the order; that's what you're trying to say, right? What we are using there is a topological sort: basically, any producer is executed first for anything that is consumed. So if, within the phase, anyone produces something that the same phase consumes, that is done before, and no one can produce and consume the same message. (A small illustrative sketch of this ordering follows below.)

Any other questions? Thank you very much.
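To illustrate the ordering described in that last answer, here is a small self-contained sketch, not Leapp's actual implementation, of ordering actors within a phase so that producers run before consumers, using Kahn's topological sort; the actor objects here are a simplification with just names and produces/consumes sets:

```python
from collections import defaultdict, deque


def order_actors(actors):
    """Order actors so producers of a model run before its consumers.

    `actors` is a list of objects with a .name plus .produces and .consumes
    sets of model names -- a simplification of the real framework.
    """
    # Map each model name to the actors that produce it.
    producers = defaultdict(list)
    for actor in actors:
        for model in actor.produces:
            producers[model].append(actor)

    # Build edges producer -> consumer and count incoming edges per actor.
    edges = defaultdict(set)
    indegree = {actor.name: 0 for actor in actors}
    for actor in actors:
        for model in actor.consumes:
            for producer in producers[model]:
                if actor.name not in edges[producer.name]:
                    edges[producer.name].add(actor.name)
                    indegree[actor.name] += 1

    # Kahn's algorithm: repeatedly schedule actors with no pending inputs.
    by_name = {actor.name: actor for actor in actors}
    queue = deque(a for a in actors if indegree[a.name] == 0)
    ordered = []
    while queue:
        actor = queue.popleft()
        ordered.append(actor)
        for consumer in edges[actor.name]:
            indegree[consumer] -= 1
            if indegree[consumer] == 0:
                queue.append(by_name[consumer])

    if len(ordered) != len(actors):
        # A cycle would mean actors produce and consume the same chain of
        # messages, which the talk says the framework forbids.
        raise ValueError('Cyclic produces/consumes dependency between actors')
    return ordered
```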