Yeah, so then, hello everyone, and good afternoon. Today we want to share our learnings from upgrading an outdated Yocto-based system.

First of all, today's agenda. First, we want to introduce ourselves. Then we will introduce the initial situation we had in our project. Then we will show the general approach we took for the upgrade process, and what was right and what was wrong before the upgrade. After that, we will go into more detail on several issues we had during this upgrade process, and in the end we will summarize some learnings from this upgrade.

So, as a first point, let me introduce myself. I am Simone Weiss, and I'm working as a software developer for embedded systems, embedded Linux to be specific, in the automotive industry. I studied computer science, only four years back, but since then I have worked mostly on Yocto-based embedded Linux distributions and also took care of cybersecurity topics there with respect to the automotive industry. Privately, I like my cats and nature. It's the first time I'm giving a talk at a software conference, so please forgive any nervousness, but I'm very excited to be here. So thank you also for being here today.

Yeah, so my name is Michael Ester. I'm a senior software engineer at Electrobit. My background is electrical engineering, and my day-to-day work is about embedded Linux and Python. In my spare time I practice MMA, and I love hiking and cooking. It's also my first time speaking, so I'm also a bit nervous.

So then, a few words about Electrobit itself. Electrobit is an automotive company with 35 years of experience in this field, and nowadays we have 10 years of open source experience. We are building embedded Linux systems with Yocto, and nowadays we also build EB corbos Linux.
EB corbos Linux is the Electrobit Linux distribution, and nowadays there is also a version built on Ubuntu.

So the first question one could ask, ironically, and maybe management will ask it, is: why do you even need to upgrade Yocto? I want to make it short: so that your system does not look like this. You see here a slightly worn Yocto mascot that I got at the last Embedded Open Source Summit in Prague, and we don't want our system to look like this. We still like our older Yocto, but we like the newer Yocto versions more. So you should upgrade, so that your system doesn't look run-down anymore. That's the fun part of the motivation, and now on to our initial situation.

Yeah, so as you can see, we are talking today about the HPC platform you can see on the screen in the red circle. But what goes into this platform, so that you also know what we are talking about? First of all, we have some external components like Poky, OpenEmbedded and the Arm Trusted Firmware. Then we have the Electrobit products that go into the platform, like firmware, the hypervisor, the EB corbos Linux image and, quite common in the automotive industry, Adaptive AUTOSAR. That all comes together in this HPC platform.
So again: firmware, hypervisor, Linux, AUTOSAR, and then there is also stuff like OP-TEE, some EB applications running on the platform, and also Phoebe. The goal of this platform is to deliver it to the customer, who can put their applications on it and have a running HPC platform.

We are building the Linux with the Yocto Project, and the problem we were facing was that kernel maintenance was about to end, and other components of the system were no longer maintained either. In the automotive industry, it's quite common that systems need to be maintained for 15 years or more, and we are only a small team of developers, so it's not feasible for us to maintain all of this by ourselves. We would not be able to provide fixes for things like CVEs or to integrate new functionality. Also, the year 2038 patches are not yet applied to all components. So our conclusion was to upgrade Yocto to a newer version that is still maintained.

So now that Michael told you a little bit about the initial situation, I want to give a brief overview of the general approach we took in upgrading the Yocto system. The first question we asked ourselves: which version do we want to upgrade to? One could say that's easy, just use the newest one. But you could also say: we want an LTS, so let's use Kirkstone, because that's the current LTS. Or you could say: I have another project running with a Yocto version that's maybe a year old, and I might have reasons to stay aligned with it. Those are all things you should consider. Also check whether the kernel in this Yocto version is an LTS kernel, because for us that was a very important point. Then you might have other upstream layers besides Poky and meta-openembedded that are relevant for your software, and you need to check: are those layers available for this version? Can I upgrade them myself to this version?
Is it all fitting together? Those were all the considerations that went into our decision. I cannot give you a general recommendation there; what I want to highlight is that it depends on your specific use case, your specific requirements and your specific situation. But those are, in brief, the points we considered, and in the end we were quite happy with our decision.

Once the decision was made, our next step was to upgrade our build environment. We build our system in a container, and basically we checked the Yocto Project documentation for any new host packages required to build this Yocto version, put them all into our container, and this worked pretty much from the beginning. No major issues there; that was easy. Next, we read through all the migration notes available in the Yocto Project documentation for the versions we were upgrading across: not only the one we were upgrading to, but also all those we were skipping in this upgrade. We tried to get the gist of them to understand what has changed and where we need to adapt. Those were the main preparation steps we took before the upgrade.

Now let me come to the upgrade process as such. We had some constraints: we have two machines, QEMU on the one side and a hardware SoC on the other, and for those two machines we have multiple distributions. I will tell you more about those distributions later. For each machine and distribution combination, we processed the layers as per their priority and upgraded one layer after the other. First, we fixed the syntax of the layer, since the syntax changed in the new version. Then we applied the knowledge we gained from reading the migration notes. Then we checked:
Have we overwritten any configuration from Poky or meta-openembedded in our layers? Is it still valid? Do we still want it? Is it still implementing the same thing? We really needed to review every configuration we had overridden from the Poky system. Only in a few cases did we need to adapt, for example because a variable is not called that way anymore and has to be renamed.

Then we needed to check the patches. For example, we have CVE patches, and when upgrading, the relevant CVEs differ: some patches might still apply, some might not, so we sorted this out per open source package. You might also have project-specific functionality implemented, where you patched an open source component to fit your requirements better. Those patches you might need to rebase, or check whether they are still feasible for this open source package. I will give a concrete example of this later.

Once you've worked through those four steps, you're basically ready for a first build. Then you will have some issues, for sure. You fix the build issues, build again, and after some iterations you will be able to complete a build. Then you start testing, and there we always had certain goals in mind that we expected from a layer. For example: with this layer, I expect QEMU to at least start; or with this layer, I expect QEMU to additionally bring up two containers that implement this and that. We did that in a smoke-testing fashion, and once we were satisfied, we added the next layer and started the cycle again.
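The "one layer after the other" order can be made explicit with a short script. This is a minimal sketch, assuming each layer's priority (BBFILE_PRIORITY) and dependencies (LAYERDEPENDS) have already been collected from its conf/layer.conf, for example with the help of bitbake-layers show-layers; the layer names, priorities and the `upgrade_order` helper are hypothetical. It places every layer after the layers it depends on, breaks ties by ascending priority (an assumption, not a rule from the talk), and fails on circular dependencies, because those make such an order impossible.

```python
from graphlib import CycleError, TopologicalSorter


def upgrade_order(layers):
    """Return layer names in an order suitable for upgrading one layer at a time.

    `layers` maps a layer name to (priority, iterable of layer names it depends on).
    A layer only appears after all of its dependencies; among layers that become
    ready at the same time, lower priorities come first (assumed tie-break).
    Dependencies that are not keys of `layers` are treated as external and skipped.
    Raises graphlib.CycleError if the layers depend on each other circularly.
    """
    sorter = TopologicalSorter({name: set(deps) for name, (_, deps) in layers.items()})
    sorter.prepare()  # raises CycleError on circular layer dependencies

    order = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready(), key=lambda n: (layers.get(n, (0, ()))[0], n))
        for name in ready:
            if name in layers:
                order.append(name)
            sorter.done(name)
    return order


if __name__ == "__main__":
    # Hypothetical layer set: name -> (BBFILE_PRIORITY, LAYERDEPENDS)
    example = {
        "core": (5, []),
        "openembedded-layer": (6, ["core"]),
        "project-bsp": (10, ["core"]),
        "project-apps": (10, ["openembedded-layer", "project-bsp"]),
    }
    print(upgrade_order(example))
    try:
        upgrade_order({"a": (10, ["b"]), "b": (10, ["a"])})
    except CycleError as err:
        print("circular layer dependency:", err.args[1])
```

Each layer in the resulting order would then go through the per-layer steps described above (syntax, migration notes, overrides, patches, build, smoke test) before the next one is added.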
So, looking back at the state before the upgrade: on the technical side, the understanding of why we needed to upgrade was fully there, but summarizing all this upgrade effort for management, so that there is a complete understanding there as well, was very hard.

On the right side, we had no modifications in Poky or any other upstream layer, as recommended. On the other hand, we had overrides and backported classes, and also our own recipes for components. What is good is that we created more than one layer, but most of the layers have the same priority. We have our own recipes created with quite good quality and with respect to upstream, but there are also recipes that are quite outdated and not created with respect to upstream. We use version control, of course, but the commit history is really unclean, which makes it really hard to track things down. The last good point is that we use the inheritance mechanism.

On the wrong side, we have quite a few more points: the style recommendations are not followed everywhere, we have circular dependencies between different layers, and we do not use a recent Poky.

Now that we have given an overview of our initial situation and our general approach, we want to point out five concrete issues to you, and the first one is called machines and distributions. You see in this table our machines, the hardware and QEMU, and also our distributions. As you see, we have way more distributions than machines. To give you a rough overview: we have our normal distribution.
We have two flavors of installer for this distribution. We have a debug distribution, again with two flavors of installer. Then we have many distributions for testing: one that includes syzkaller, with which we fuzz our kernel, plus an installer for this distribution; we have unit and hardware test distributions and robot test distributions, and again an installer for the robot test distribution. Then we have a production distribution, which is the one that gets deployed to, in our case, a car. And then you see the dots, and I want to point out that there were many more distributions, and this was already our issue. We had too many distributions, and upgrading all of them is just a huge effort in coding, building and testing, and you do not want to spend time on something that is not necessary.

So our first recommendation is to review your machines and distributions carefully before you upgrade. First of all, avoid unnecessary machine and distribution combinations. For example, in our case we had a hardware installer for the syzkaller distribution, but we only fuzz on QEMU. So this is a distribution that is just not needed; we can scratch it and delete it. Nobody wants code that's not usable. It might be different in your case, but delete any code you don't need before you upgrade it. Second, if you have a combination whose reason you do not know, there most likely is none. We had a combination of the debug distribution with our second flavor of installer that was not feasible, because the second installer can only accept images up to a certain size, and our debug distribution was much bigger anyway. So there was no real use case, and it could be deleted as well. Third, and this might be applicable to more people than the other two points, which were very specific to us: check whether distributions can be merged with each other.
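Checking whether two distributions can be merged can be partly automated. A minimal sketch, assuming each distribution has already been reduced to a set of configuration items (for example selected variable assignments or installed packages, extracted from its distro configuration or from bitbake -e output); the item names and the `merge_candidates` helper are hypothetical. It reports pairs where one distribution equals another one plus at most a few extra items, which are the cases worth a closer manual look.

```python
from itertools import combinations


def merge_candidates(distros, max_extra=1):
    """Find distributions that equal another one plus at most `max_extra` items.

    `distros` maps a distribution name to a set of configuration items.
    Returns (base, extended, extra_items) tuples, pairs visited in name order.
    """
    found = []
    for a, b in combinations(sorted(distros), 2):
        for base, extended in ((a, b), (b, a)):
            extra = distros[extended] - distros[base]
            # `extended` must contain everything in `base` and add only a little.
            if distros[base] <= distros[extended] and 0 < len(extra) <= max_extra:
                found.append((base, extended, sorted(extra)))
    return found


if __name__ == "__main__":
    # Hypothetical, heavily simplified distribution contents.
    distros = {
        "normal": {"systemd", "container-runtime", "app-stack"},
        "robot-test": {"systemd", "container-runtime", "app-stack", "robot-tools-container"},
        "debug": {"systemd", "container-runtime", "app-stack", "gdb", "strace"},
    }
    for base, extended, extra in merge_candidates(distros):
        print(f"{extended} = {base} + {extra}")
```

The extra items are what would have to be delivered some other way after a merge, for instance as a separately deployed test container.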
I said before that we had a distribution for the robot tests, but it only differed from the normal distribution by an extra container with tools and programs that we want to use for robot testing. So we could just say: we don't need this specific distribution at all; let's use the normal distribution and deploy the container, for example via SSH. So remove the distribution, save the build time, and ease your maintenance by removing code that is either dead or duplicates functionality you have already implemented.

Yeah, so then the second issue we had: too many layers. As I told you before, we created more than one layer, which is good, but if you create too many layers, that's not good. Here you can see all the layers in our project and how they depend on each other. The number of layer dependencies grows incredibly when you have so many layers. On the right side of the picture, you see one layer that contains several applications we have in our image: four layers depend on this application layer, and the application layer itself depends on four other layers. With this, the general upgrade approach Simone told you about is very hard to follow, because you do not know which layer to go to next, and you have to look into multiple layers to work out how to move forward in the upgrade process.

The next point with the layers was that the priorities are not consistent. As I told you before, we have a lot of layers that share the same priority, and in this case you might not know which layer a recipe in your image finally comes from. So choose your priorities carefully.

The third part is an issue in our project organization, I would say: we have layers maintained by different teams. The two of us are in the Linux team.
We maintain a lot of layers, but other departments also maintain layers, and they often don't know the style guidelines or what a layer should look like, which also makes it very hard.

And this is directly related to our third issue, here called non-conforming layers, though you could also describe it as unclean layers. Let me mention where we upgraded from, which may be part of the reason: our starting point is quite old. We started with Yocto version 2.4 and upgraded to the latest version available when we started. 2.4 is quite old, but the good news is that we still made it in the end, so there is hope even for very old systems. That's the good news for everyone.

We needed to adapt the Yocto syntax with the help of the scripts and review the result very carefully. You might be tempted to skip this, but in our experience, and I don't know how it is for others, if you start from 2.4 it's really worthwhile to review it carefully. Other people might start from more recent versions, for which the scripts might be more applicable. A piece of good news is that the adaptation from Python 2 to Python 3 went without any big hassle: we basically replaced python with python3 in the DEPENDS and in the inherits, and it worked out of the box. That was a point that went very smoothly.

Then, last, we had many build failures that were basically due to the fact that the OECMAKE_GENERATOR variable now defaults to Ninja, but we still mostly use Unix Makefiles.
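After running the conversion scripts, a small review helper can point at spots that still deserve a human look. This is an illustrative sketch with simplified, assumed patterns, not oelint-adv and not the conversion script itself: it flags lines that still use the old underscore separator for _append, _prepend and _remove (the colon syntax is mandatory since Honister), Python 2 dependencies such as python-native, and recipes that inherit cmake without setting OECMAKE_GENERATOR, whose default in cmake.bbclass is Ninja. It does not catch other old-style overrides such as RDEPENDS_${PN}; the rule IDs and `review_recipe` are hypothetical names.

```python
import re

# Line-level review rules: (rule id, pattern, hint). Patterns are simplified assumptions.
LINE_RULES = [
    (
        "old-override-syntax",
        # e.g. SRC_URI_append or do_install_prepend(); the new syntax uses ':' instead of '_'
        re.compile(r"^\s*[\w${}-]+_(append|prepend|remove)(?![A-Za-z0-9])"),
        "old '_' override separator, use ':' (e.g. SRC_URI:append)",
    ),
    (
        "python2-dependency",
        re.compile(r"\b(python-native|pythonnative)\b"),
        "Python 2 dependency, use python3-native / python3native",
    ),
]

INHERITS_CMAKE = re.compile(r"^\s*inherit\b.*\bcmake\b", re.MULTILINE)
SETS_GENERATOR = re.compile(r"^\s*OECMAKE_GENERATOR\s*[?:]*=", re.MULTILINE)


def review_recipe(text):
    """Return (line number, rule id, hint) findings; line 0 marks a file-level finding."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule_id, pattern, hint in LINE_RULES:
            if pattern.search(line):
                findings.append((lineno, rule_id, hint))
    # File-level rule: cmake recipes that silently rely on the Ninja default.
    if INHERITS_CMAKE.search(text) and not SETS_GENERATOR.search(text):
        findings.append(
            (0, "cmake-default-generator",
             "inherits cmake but relies on the default OECMAKE_GENERATOR (Ninja)")
        )
    return findings


if __name__ == "__main__":
    recipe = (
        'DEPENDS = "python-native zlib"\n'
        'SRC_URI_append = " file://fix.patch"\n'
        "inherit cmake\n"
    )
    for lineno, rule_id, hint in review_recipe(recipe):
        print(f"{lineno}: {rule_id}: {hint}")
```

For the CMake finding, the recipe-level fix is an explicit OECMAKE_GENERATOR = "Unix Makefiles" wherever a project still relies on Makefiles.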
So setting this variable explicitly also fixed many build issues on our side.

Overall, many packages were modified in multiple layers, and all those layers have the same priority. While we are aware that you can use, for example, bitbake -e to get the environment, it was still hard to track down which change happens where and why. Sometimes there was a change that was not related to the functionality of the layer at all, so you did not expect it there. Those were all points we needed to clean up, but I think this is something really worthwhile in the end, and you should spend the effort.

Last point: we already mentioned that we did not have a coherent style. For example, I want to give you SRC_URI_append += to consider. Those are just things I found during the upgrade, and they would not have been needed if we had used a linter. I personally like oelint-adv, and we are now using it. I think it's also worthwhile to integrate a minimum set of feasible rules, for example in a Git hook.

Yeah, so then we come to backporting. In the project we have a layer called meta-project-bbclass-overrides; it's a layer that overrides things from, for example, Poky or also from our product layers. Before the upgrade, we had a distro configuration override, and when we started the upgrade, this override did not apply anymore. So we needed to find out why. A lot of it was outdated, and in the end we were able to remove this distro config override completely, because everything was already included in our product.
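Deciding whether an override is still needed starts with seeing what it actually changes relative to the new upstream version. A minimal sketch using Python's difflib, assuming you have the text of the upstream class from the new Poky checkout and the text of the project's override of it; the `override_delta` name and the demo content are hypothetical. An empty result means the override is identical to upstream and can simply be deleted; otherwise the diff lists exactly what still has to be justified or ported.

```python
import difflib


def override_delta(upstream_text, override_text, name):
    """Unified diff from the upstream file to the project's override of it.

    An empty list means the override is identical to upstream and is redundant.
    """
    return list(
        difflib.unified_diff(
            upstream_text.splitlines(),
            override_text.splitlines(),
            fromfile=f"upstream/{name}",
            tofile=f"override/{name}",
            lineterm="",  # lines come from splitlines(), so no extra newlines
        )
    )


if __name__ == "__main__":
    # Hypothetical class content, only to demonstrate the output format.
    upstream = 'python do_demo() {\n    bb.note("upstream")\n}\n'
    override = 'python do_demo() {\n    bb.note("project")\n}\n'
    delta = override_delta(upstream, override, "demo.bbclass")
    print("\n".join(delta) if delta else "override is redundant")
```

Running this against every file in an override layer gives a quick list of overrides that are pure copies and can go, and of those whose remaining delta needs a closer look.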
We need reproducible builds for our platform, and there we also had a lot of classes overridden in this project layer. Here you can see, for example, image_types.bbclass from Poky, and several other classes were overridden as well. But after the upgrade, we were able to remove all of them except image_types.bbclass, so we could reduce the amount of code drastically.

Also, during this upgrade effort, we found a lot of workarounds in recipes that could be removed after the upgrade. For example, in the BusyBox recipe we had a three-liner that just renamed the defconfig, and after the upgrade the defconfig already had the right name. So we deleted it, because it makes things less complicated. We were also able to remove files and configurations, for example for the readline package, which had several added files we do not need with the upgraded version, so we deleted them completely; the same went for a crash journal package that we do not need.

Yeah, then for the backporting, Git is very important.
As I already said at the beginning, we have an unclean Git history, and the commit messages were not in very good shape: they did not describe why a change was made. Most of the time they only described what was changed in the code, and sometimes they did not say anything at all. This makes it very hard to track down why something was changed when you have to port it to a new version. So describe why a change was made; in the future it will save you a lot of time, because you know why something happens.

Yes, so let me now come to the last issue I want to present today, issue number five: handling changes in other open source components. Of course, when you upgrade your Yocto version, you also get many open source components in newer versions. We patched, well, not most, but many open source components; in our project there were around 280 patches. The patches included CVE fixes, but also project-specific adaptations for individual requirements we had. While it would be fine to just have the patches, some components, and let me name one, Botan, were additionally forked. We had a fork of Botan with over 1,000 commits, and then patches on top again. None of this was developed by us in the Linux team, but we still needed to do the upgrade, and the problem there was that we had basically no idea what was going on. Botan was not even building; it was very unclean. So our recommendation directly out of this is: choose one approach, either fork or patch, but don't mix both, or it will make your life hard. That sounds easy in retrospect, but you need to prevent such things at the root.

In the end we cleaned it up: we removed the fork completely, and we now only patch Botan, because after careful review it turned out that those 1,000 commits were just a lot of back and forth and not well-structured commits. So it was not such a large code change after all, but figuring this out really took time. Decide on one approach and then stick with it consistently; don't do both, that's not a good idea.

Of course, for the upgraded open source components, we also needed to check whether there are new CVEs or other security fixes that we might need to integrate. But if you have CVE tracking set up from the Yocto Project, you should get a nice report, and this was work, but easy work, absolutely doable. So do your CVE tracking, and check everything there during the upgrade.

I now want to give four concrete examples of changes in the updated open source components where we had issues. The first example is libcap. In the new version of libcap, we suddenly saw a system call to prctl with PR_CAPBSET_READ, which checks the bounding capability set, so as to figure out the maximum capabilities a process can gain. We use libcap in our OCI containers, and this was a syscall that was not acceptable for us. But since our system is quite static, we know exactly which value this call returns.
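To illustrate what such a probe does, the sketch below uses Python's ctypes instead of libcap's C code, and libcap's actual implementation differs. It asks the kernel via prctl(PR_CAPBSET_READ, cap) about increasing capability numbers until the kernel rejects one with EINVAL, which yields the highest capability number the kernel knows; every probe is one prctl syscall, which is exactly what a strict seccomp profile may forbid. The hard-coded variant mirrors the idea of returning a known value on a static system; the value 40 (CAP_CHECKPOINT_RESTORE, the last capability on kernels since 5.9) is an assumption that must match your target kernel, and all function names are hypothetical.

```python
import ctypes
import errno
import os

PR_CAPBSET_READ = 23  # value from <linux/prctl.h>


def kernel_probe(cap):
    """True/False if `cap` is in the bounding set, None if the kernel does not know `cap`."""
    libc = ctypes.CDLL(None, use_errno=True)
    # prctl is variadic; pass the extra arguments explicitly as unsigned long.
    ret = libc.prctl(
        ctypes.c_int(PR_CAPBSET_READ),
        ctypes.c_ulong(cap),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
    )
    if ret < 0:
        err = ctypes.get_errno()
        if err == errno.EINVAL:
            return None  # capability number beyond the kernel's last capability
        raise OSError(err, os.strerror(err))
    return ret == 1


def last_cap_by_probing(probe=kernel_probe, limit=64):
    """Highest capability number accepted by `probe` (one syscall per number on a real kernel)."""
    last = -1
    for cap in range(limit):
        if probe(cap) is None:
            break
        last = cap
    return last


# Assumption: the target kernel is fixed and known, so the answer can be a constant.
STATIC_LAST_CAP = 40  # CAP_CHECKPOINT_RESTORE, the last capability since Linux 5.9


def last_cap_static():
    """Hard-coded variant: no syscall, so seccomp does not need to allow prctl."""
    return STATIC_LAST_CAP


if __name__ == "__main__":
    try:
        print("probed:", last_cap_by_probing())
    except (OSError, AttributeError) as err:  # e.g. not Linux, or prctl blocked by seccomp
        print("probe failed:", err)
    print("static:", last_cap_static())
```

The trade-off is the usual one for static systems: the constant avoids the syscall but silently goes stale if the kernel ever gains new capabilities, so it has to be revisited with every kernel upgrade.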
We simply returned that value in a hard-coded fashion to avoid the syscall, and patched libcap that way, so that we don't need to allow it in the seccomp configuration of our OCI containers.

Then we had problems with U-Boot, especially in the interplay with the upgrade of binutils. We had problems with ELF relocation there, but this was fixable with an upstream commit that we found. Most of all, though, we had made certain assumptions about the memory layout for U-Boot, and this caused problems with the search for the device tree in the U-Boot environment. It took quite some debugging to find out that in the end we just have to place the device tree differently now, and that we needed to adjust CONFIG_ENV_SIZE, which in our setup determines where U-Boot searches for the device tree.

Another issue we had was GCC. It might not really be an issue, but we were using GCC version 7.3 before, and, as you might know, in version 10 there was a change with respect to headers: includes of other headers that the C++ standard does not require were removed from the library headers in GCC 10. This caused many build failures, especially for custom applications on our side. You could say that's easy: let our customers just upgrade their applications to include the correct headers. But you need to consider that we are building a platform here, and one customer delivers it to the next one, and then to the next one, and maybe they don't all talk to each other. So in the end we decided on our side to patch those includes back into the headers, because we don't consider that a real risk. That way we avoid every custom application having to be upgraded for this change, because there's always a certain risk in that, and we wanted to minimize it.

The last concrete example I want to give today is about the Linux kernel. As I mentioned earlier, we're also using OP-TEE. But before I come to OP-TEE, I want to come to another point, excuse me. Some major and minor device numbers have changed in the upgraded kernel, and we had problems there because we pass those along in the OCI manifest of our containers, so we needed to make this consistent again. It was an issue that was not directly obvious and needed quite a bit of debugging, but the fix in the end was: either the numbers stay consistent with the old version, as we had them before, or we update the container configuration, whichever is more feasible.

And then the last point here, which I already started before, was about the interplay with the TEE driver. We patched the TEE driver on our Linux kernel side with an authentication mechanism for applications that we needed for our project environment. As for our old kernel: I mentioned before that we came from Yocto 2.4, and we had kernel 4.14. The TEE interface has changed quite a bit since then, and our specific adaptations did not apply anymore.
This was the only real case where we needed a conceptual rework: we needed to rework our authentication mechanism there. But this worked out in the end, and the good news is that by upgrading the kernel and going over this modification again, we realized that we should take a deeper look at how we are doing the authentication mechanism. Now I'm more aware of this part of the code, and I'm considering whether we can upstream it in a feasible fashion. I want to bring it into shape, maybe in the next few weeks, so that I can start some discussions there on whether it would be a feasible upstream contribution. So that's the good news: if you upgrade your system, you also learn new parts of it, and there might be something you should upstream or that you can reuse somewhere else. So consider that as well in the process. This sums up my issue five, on a hopeful note. Michael, over to you for the summary.

Yeah, so this is already our last slide. Here we have our learnings: some goals we wanted to achieve, and also how we can measure them. One important goal was company-wide awareness of this upgrade process, because it's a lot of effort: you need a lot of people working on the upgrade. The positive effects are noticeable in the company, because everyone is happy about the upgrade now. So, goal achieved. Then a more technical goal: think about your machines and distributions.
They need to be a sensible number. As Simone already said, delete machines or distributions you don't need, or merge them if possible. Then, I think a goal for every project: standardized syntax. We introduced a syntax guideline across the whole project and use a linter to check it, and that works out quite well now. We also wanted to optimize the layer split, because we had too many layers: we reduced the number of layers and assigned sensible priorities to them so that it's more logical, and we now have the same layer structure across all departments involved in the project.

Then, about upgrading OSS components: we could measure that we have now reduced the number of patches, because with the new version a lot of what we need is already upstream. So here's also the recommendation: if you have something that you can upstream, try to upstream it, because you will benefit from it in the future, and others as well. Good thing, I think. Then, handling third-party components: consider the assumptions made when integrating the components. This points to what Simone told you about U-Boot and the binutils issue. And our last goal was up-to-date software; I think everyone can agree on that. As for how we want to measure it: if we upgrade more often, we have up-to-date software.

With this I want to end our presentation. Thanks for your attention. If you have any additional questions, please feel free to ask; otherwise, you can see our contact information, and if you have any questions later, please also feel free to reach out. Thank you. Yeah, thanks also from my side.
I wish you all a pleasant evening. Of course, we are still here to answer questions, and see you later at the booth crawl.

Thank you for a great presentation. In this project, is there any automated testing system used, such as CI/CD? So the question is what testing system or what CI/CD we use. Ah, yes. Yeah, so we have Jenkins running, where we have integration tests, and we also have a software test department that tests all the components in the system. The tests run on QEMU and also on hardware: we have an integration department, some folks who have added Raspberry Pis to our targets, so now you can automatically flash the target and then do testing. Is this answering your question? Okay. And as pointed out before, we also have unit tests and robot tests, so we are doing different levels of testing. We work according to the V-model, which is quite commonly used in the automotive industry, as you know. Thank you.

Okay, thank you for the presentation, it was great. I have a question about the installer. You mentioned that you have two different installers for each version. Can you explain the difference between them? Yeah, it's quickly explained. One installer is responsible for the initial software load when the ECU is produced and then put into the car; that's one flavor of our installer. The other one is the one that can be used at workshops. Okay, thank you.

Thank you very much for your talk. You were talking about Yocto 2.4, and you updated all the way to the latest one. From here on, you said you will upgrade more often. Do you have an idea of how you are going to deal with the upgrades on your end? So you mean how we deal with the upgrades, like doing them, I don't know, every year or so? Yeah, I understand your question. I could tell you what I would prefer personally, and I could tell you what will most likely happen. I think it is now clear to everyone, customers, management and so on, that we need to upgrade, and I would like to upgrade maybe every half a year or so. But I am not in a position to make this decision, so I would say we will hopefully find a middle ground, like from LTS to LTS. Okay, thank you, and I think that would be feasible.

This would now conclude our session, I would say, if there are no further questions. I would like to thank you all again for coming, and have a nice rest of the conference. Thank you.