And we will start our workshop with just a few people in the audience, but it doesn't matter. It will be about fixing issues in your packages that fail to build on alternative architectures. So, let's start. The agenda is quite short. We will talk about the common issues that we saw in the past in the packages we were building. We will let you know about the resources that are available for developers, and some general directions for architecture support, because that can change as the projects develop. We did some work on CI for upstream projects, and we have some recommendations. We can talk about improving our documentation, and look at some rather complex examples of how to work with issues where something doesn't behave as expected. For the common issues, the most common one is about endianness, which is the order of bytes as written in memory. It depends on the architecture: on one of them the most significant byte is at the lowest address, on the other it is at the highest address. So this is the most common issue, where projects written originally for Intel CPUs assume the little-endian format, and without any change they can easily fail when running on big-endian platforms like big-endian POWER or the IBM Z mainframe. The reasons are usually wrong typecasts when trying to get a smaller value from a bigger one, like typecasting from a 4-byte value to 1 byte, or from 64 bits to 32 bits. On little-endian architectures it just works, because it takes the lower bytes, but it reads wrong, unexpected data when running on big-endian architectures. And generally, data formats for files, or for storing information in general, either contain the endianness definition directly in the format, or they expect one, and then some byte-swapping or value-swapping functionality is needed when reading and writing these files.
Another common issue is that the char type in the C language is by definition undecided — the implementation of the language can decide what the default signedness will be. For Intel the default is signed, while for most other architectures it's unsigned, which can lead to strange issues: a function returns minus 1, which in the byte is all ones, and when the char value is unsigned that's 255 — so instead of, for example, failing or returning from a function, it does 255 iterations of some work. Another thing is about the size_t type, which fortunately went away when we dropped the 32-bit mainframes from building, because it was defined there differently than on all other architectures I know about. It was a problem with C++ code, where some templates couldn't deal with size_t being unsigned long while it is usually unsigned int, so the troubles there needed some #ifdefs or similar tricks in the code. Another common issue is working with atomic operations. There are various ways how to do it properly; I think the best one is using the atomics from the C11 and C++11 standards. They are implemented in recent and current versions of GCC, so there are no problems and no need to define any functions for that yourself — those were usually in assembler, and usually just for one architecture — so use the standards. A similar issue is clock counting or tick counting, where again some CPU-specific or architecture-specific instructions are used to get these numbers, and again it requires some code in assembler, which is often missing for other architectures. If this assembler code is mandatory and there is no workaround like using the gettimeofday function or similar, it just fails the build. Other things are architecture definitions — checking with ifdefs in the source code which architecture the code is being built for.
These ifdefs do not always list all the architectures we care about. It's quite simple to add the missing ones, but this should be taken as a warning — maybe there is some better way to decide which architecture it is without requiring an explicit list of architectures in the source code. One additional warning: there is no standard preprocessor define for little-endian POWER. You need to first check for being the PowerPC architecture, and then decide whether the byte order is little-endian or big-endian. That is the proper check for distinguishing between the ppc64 big-endian architecture and ppc64le, which is the little-endian version. Again, a general warning about any pieces written in assembler: it goes against the portability of the code, so be careful and do it only when it's really necessary. Hardware support, I think, is no problem overall among ARM, Intel and POWER, but there can be some differences on the mainframe due to its heritage — it's now over, I think, 50 years old — and it's missing some more desktop-related or PC-related classes of hardware: there is no USB, no Wi-Fi, and some other pieces are missing. The question here is what to do about it: whether to build, let's say, the unnecessary libraries and make the packaging cleaner, so there are fewer %ifarch conditions in the spec files and fewer exceptions, or whether you want to be more precise and avoid building those packages at all. My opinion is rather to be cleaner with the packaging, reduce the number of exceptions, and rather build a few more libraries. This is still quite cheap even in the mainframe case, so a few more RPMs are easier to live with than a pile of %ifarch conditions and potentially broken dependencies across the architectures.
Yeah, so I think that's most of the common issues. Parallel debugging is a technique for debugging issues somewhat specific to an architecture: you run the debugging process in parallel in a working situation and in a broken situation, and when the results differ, you have found the wrong piece of code. Should I talk to the mic? Before recording the first part — it will be really short, and it's meant more as a working session, really doing some work, so the talking part should be short, and all the information is listed here. The next part is the resources. There is a general wiki page dealing with architectures in Fedora, which has pointers to all the architecture-specific wiki pages with some additional information. Every packager in Fedora can access the machines provided by the Fedora infrastructure team. They are internal machines — there are ARM machines and both big- and little-endian POWER machines. Logging in, I think, uses your SSH key from FAS, and if you are in the packager group it will just let you in. Just try it — it works. So that's the easiest option to get access. When dealing with the endianness problems, when something is failing on big-endian architectures, in most cases it fails for both POWER and s390x. So the easiest way is to debug on the POWER machine that's so easily accessible there. If there is a need for an s390x machine specifically, just let me know — we have one guest on the public internet where any maintainer, and also upstreams, can log in and do their work. There are IRC channels where you can find us and ask questions. If there is a need for something longer-term for some project, or you would like to integrate POWER machines into some CI or similar on the upstream level, there is also an option to get virtual machines from the OpenPOWER hosting sites that are in various places across the world.
I think you can either find them on the internet with the contacts, or we can help — we are the alternative architectures team located in Brno, Czech Republic. We have a cooperation with the Brno University of Technology, which is hosting one of the POWER machines, so we can also offer this service. There are other ones: there is Unicamp, and then there is OSU, Oregon State University, and maybe something else as well. And for people wanting a real workstation based on an alternative architecture, there is a good chance to order now the Talos workstation, which will contain the POWER9 CPUs and should be generally available later this year. It's not only for working on the alternative architectures; it's also aimed at privacy- and security-minded people, with some additional features. It's probably the only alternative with reasonable performance to make it your primary workstation, and it should be available later this year. Do we have any of those? Sorry? Does our infra have any of those? The Talos II? I don't think we will have any. We don't have any? I will have one. Oh, not you. It's still not released. Yeah. I thought they released it very recently. It's announced and open for pre-orders, so it's not physically available — you would have to pre-order it. Okay, I thought it was already released. Not yet. And for Fedora purposes we have a good cooperation with IBM, so they are providing us POWER machines directly for the various projects we have. I think we as Fedora don't need the Talos, because it should be just a regular POWER-based machine following all the standards, so it should not make a difference. It's rather for the individuals who will be interested in having it. So that's it for the resources. And now a slightly more interesting piece, which is the CI. We are running it internally for now, because it's still an exercise.
There's definitely a plan to make it public, but what we saw during the Fedora 26 release cycle is that there were very big issues getting Firefox built. With Firefox 53 they made lots of changes — the Firefox 52 version was fine, there were some fixes from the Fedora Firefox maintainers so it was fine — but the Firefox 53 version was really broken for a very long time on most of the architectures, including 32-bit ARM, 64-bit ARM, POWER, s390x, everything. So we started to think about what we could do to improve the situation, and our idea was to start doing some kind of CI or testing on the upstream level. The problem here is that — I don't know if most, but definitely some — upstream projects are not interested in supporting the alternative architectures we care about in Fedora or Red Hat in general. They have these architectures in the lowest tier of their supported architecture plan, so we probably need our own infrastructure for checking these upstreams. So yeah, we started working on that, and I can show some of the results here. It's now tracking a couple of projects where we know we had some issues in the past. We started with dtc, which is the device tree compiler, just to learn about Jenkins — we picked Jenkins as the central piece — as a test project, but the first real one was Firefox. The difficult part with Firefox is that the Fedora packages still carry some patches that are not upstreamed. So you cannot take the upstream code from their Mercurial repository and just try to build it — it will fail immediately or almost immediately — it needs some additional patches. That's why we split it into two steps that depend on each other. The first is to prepare the Fedora sources, patched with the Fedora-specific patches. As you can see with the blue ball, it's succeeding, so we're able to apply the particular Fedora patches.
If it failed, it would mean that the Fedora patches would need to be rebased or maybe dropped. So when this task fails, it means we need to do something to get the Fedora sources fixed first. Once we have the source code for Firefox, we then do the builds. One build is with some default configuration, which is failing in general but succeeding at least for Intel, which is a good sign — at least one architecture is succeeding — and there are some common issues. Some of them are already reported and being worked on by the Firefox upstream, so sometimes things get fixed directly in the Firefox code. The firefox-fedora task is much closer to the Firefox configuration in Fedora: it's using some libraries from the system, so it follows more the style of how Firefox is built in Fedora, and it's failing for all architectures. It's really taking the latest upstream code, I think even newer than the next Firefox version, which will be 56 — I see there is even a branch for 57 — so it's really the master branch, the latest stuff in Firefox upstream. So yeah, it's failing. Jenkins is able to attach notes to a failed build, for example the reasons why the build is failing, so it's easier to track these things when you can just check the differences between the known failure and the last failure. So if it is pulling the master branch, what spec file are you going to use? It's not using spec files; it's using the upstream instructions — configure, make, make install, that kind of work. It's not working with spec files. So how about the dependencies then? It needs all dependencies installed locally on the builder, on the slave node, so it's quite a manual action to get it ready. It's not smart enough to figure out everything and do the build on its own; it really needs a prepared machine with the dependencies pre-installed, and then it just does the build.
So, any plans for automating it, so that on the Fedora side it could probably use the Rawhide spec file? Yes, that could be the next step, and I think we would have to talk about the tools that are available to do that. Because this is really doing the builds natively, without any mock or any kind of chroot, virtual machines and things like that. It's a simple setup, so there are definitely possibilities to make it smarter and maybe use the spec files, or really the stuff directly from Fedora. I think a good example would be to use the rebase-helper tool that's available for the Fedora packages, which automatically rebases the patches from the current state of dist-git, or the packages in dist-git, to the latest release version. There can also be a problem in that this CI works with the master branch of upstream, which is not released yet, so it's really talking directly to the git repositories, or to the Mercurial repository. So it's an open question how to improve that and how to make it closer to the Fedora packages — but the question also is whether we want to make it closer to Fedora or closer to the upstream; whether to really follow the upstream development or the Fedora packaging that is based on it. Definitely when you pull it into, let's say, not Fedora but any other distro, they will add some patches, right? Yeah, but ideally all the stuff is in upstream; in Fedora the patches are quite rare, as we can see for example with OpenBLAS, the math/scientific library that's being built in Fedora — and that one is built directly from their git on GitHub.
So all the stuff is already in upstream. If the upstreams are really cooperative and accept patches from downstreams, or from the alternative-architecture people across the distributions, then these checks can really work at the highest upstream level, where the upstream is really up to date with regard to the alternative architectures. So OpenBLAS is one library built directly from upstream, libdap4 is another one, and I think with the libdap library this CI found one issue that was not visible in the Fedora packages, because Fedora builds cannot use a network connection. There is a small number of tests in upstream for this library that need a network connection, and one of these tests was failing. It failed for little-endian POWER because there was a wrong check for the block size of some caches: it used the block size from the XFS filesystem, which is 64 kilobytes on POWER machines for whatever reason, and not the expected 4 kilobytes. So I spoke with the upstream guy and we fixed that, and now it's all fine. Any question, Mike? Well no, just a comment: the reason they made it that size is that it matches the page size. Sure, yeah. But yeah, we have seen that be a problem. So that definitely was a reason to make it this way, but the upstream code was never run in this configuration. So it failed, we fixed that with upstream, and now it's green as well. ImageMagick is one of the recent things that got broken, so we are trying to improve that here too, or at least monitor the state. Okay. So yeah, as I mentioned, the plan is to move all these things to the public, so now the discussions will be about how to share resources with the general Fedora CI infrastructure, how to share this with upstreams, and how to share the issues with anyone — but definitely once it is public it will be much easier for anyone to check the results.
The good thing about the integration of Jenkins with the source code management tools is that it allows you to see even the direct changes in the upstream repositories, so it's easy to see what changed. Yeah, well, that's the reason why the Jenkins tool exists. So that's it about the CI and the plans. So, other stuff. Yeah, here: what should be done with all the packages that fail to build on alternative architectures? Our opinion is that excluding the packages is really the last-resort solution. We would prefer to be contacted first for our opinion, unless it's really clear that the package doesn't make sense to be built on the particular architecture. Sometimes we can just exclude the failing test, like we did for the ImageMagick issues we had. So really, we would prefer not to exclude the packages if possible. But it depends on the maintainer; it's their decision. And if you are looking for help, where are you looking? Shall we improve our documentation? That's a question for us. The last one here is about the stuff that changes over time. It reflects the state of the upstreams, of the various projects — like the current development of the Go language compiler, where the big-endian ppc64 part is not on par with the little-endian one, and because of the missing functionality, the Go language package maintainers decided to drop big-endian POWER from the supported architectures. I don't know if they made it for Fedora 26, but definitely in Fedora 27 there are no Go packages built on ppc64 — either they carry an ExcludeArch, or they follow the Go language packaging guidelines and use the go_arches macro. So that changed recently; it can happen in the future as well. It really depends on the state of the upstreams and how they support the other architectures. And that's all. What else is here?
Sinny has a nice example of a complex issue that she is debugging right now, so she can give you an overview of what she is doing. And I think that's all from me; after that we can do some open discussion, and we can also discuss some ARM-related pieces that people might be interested in. So yeah, I'll hand it over here. This is basically an example showing how deep a problem can go. The fix could be a single line, but while debugging you may go deep down into multiple projects. So recently I was trying to build a Fedora Atomic cloud image on ARMv7, which is 32-bit ARM. While doing that, there is a step where an Anaconda installation happens to create the image, and there was an issue in the storage configuration. The error, as you can see, says that the volume group was given a physical extent size of 4 MiB, but it must be one of 1024. By looking at the error message, it was not quite obvious what the problem was. Can you just raise the contrast, please? Sure. So by looking at the error it's not obvious what the problem is, and it works fine on other arches. So I thought it might be an arch-specific issue, or maybe something else — you don't know in the beginning. I will tell you what approach I took. Since it was a problem in the Anaconda installation, I opened the Anaconda source code and looked for the error message — a text string match, something like "volume group given physical extent". And I found it in an Anaconda file called kickstart.py. So this is the first step. You can see that there is a line where the volume group device gets the supported PE sizes. It basically calls out and gets the supported physical extent sizes here, and got back just 1024. And Anaconda wants the physical extent size of 4 MiB, which is used as the default.
So what I did: I opened the Python interpreter and tried to reproduce it, on ARM and on x86_64 both, to see what the result is. What I saw is that on ARMv7 (armhfp) the result obtained was just 1024 bytes, while on the other arches — x86_64, aarch64, or the POWER arches — it returns a list of sizes going from 1024 bytes up to 16 GiB. So it seemed that either it's not getting the expected results, because only 1024 is returned on ARM, or maybe that really is the only supported physical extent size on ARM — I wasn't sure at that time. So the next step was to take a detour and find out where exactly this information comes from. Basically, the second step: it's called from the blivet package. So I went there, and blivet is also just making a call to blockdev.lvm.get_supported_pe_sizes. So obviously the definition of what exactly gets returned is not there either; it's coming from the blockdev module, so I needed to look there — maybe the definition is there. I took a guess at where this module might be, because it's not a Python module: there is a libblockdev package, and that package is written in C. So obviously it's not Python anymore — now you can see that the code flow is going from Python to C. libblockdev provides some bindings, some functions which can be used from other languages, and in Python they were using those functions. The way these functions are used from libblockdev is via a file which gets generated, called BlockDev-2.0.typelib. A typelib is another kind of file format, a binary format, which can be read by GObject Introspection, which is another project. So it's a binary format read by it, and you can get back what exactly is defined in this typelib file.
And there is a binary provided by GObject Introspection called g-ir-generate, which will generate an XML file from the typelib. When I opened the XML file, you can see the function name, lvm_get_supported_pe_sizes, which you saw on the Python side — the blivet package is calling blockdev.lvm.get_supported_pe_sizes. So that's the function which Python should be calling, and it's mapped to the C function bd_lvm_get_supported_pe_sizes. And it says array zero-terminated equal to one — I will come back to that later. These are the things I understood while debugging the code: that's how the data is sent back from C to Python. This is not just the standard CPython extension mechanism; it uses gi and gets its understanding of the functions from reading the typelib, etc. And the return type of the function elements is uint64, which is 8 bytes in size. So later what I did was instrument this function in C, bd_lvm_get_supported_pe_sizes: I put debug print statements there to see what data it's actually returning on ARM, and on other arches as well — I took x86_64 as the example. On both arches this function was returning the right data, exactly the same list, from 1024 bytes up to 16 GiB — basically powers of two, so 1024 bytes, then the next is 2048 bytes, etc., up to 16 GiB. So now I knew that it's not that ARMv7 doesn't support physical extent sizes other than 1024 bytes. It supports them, but there is some problem happening while returning the data from C back to Python, and it's happening only on ARMv7, or maybe on some other 32-bit arches as well — I'm only debugging on ARMv7, and there is i686, but I'm not using that for now — so I can only say it might be a 32-bit issue. Okay, so now I knew that libblockdev returns the right data; now I had to find out how exactly the two sides communicate.
The Python and C sides communicate with each other through another project, which is called PyGObject3. It has a function which, interestingly, at runtime gets the function name and calls the function with the matching name — it probably looks into the typelib information and, if it matches, makes the call to the function defined in the library. So in most places it's fine, but I still hadn't gotten to where exactly the data is returned. There is another part of PyGObject3 which converts the data so it can be returned to Python. The function looks like this: it returns a PyObject pointer, and in our case it goes through the array-marshalling path — there are various other functions as well, but this is what gets called in our case. This is the standard way of returning data to Python: a PyObject pointer which maps back to the Python data types, whatever data types there are. So yeah, I reached this point. You can see that I had to go into various projects to understand where the problem could be. I'm still struggling — I know the flow and roughly where the problem is, but what exactly the problem is, is still in progress. I hope I will get answers soon. From my experience so far, it seems the problem is the types being used. They use various data types which depend on the arch: unsigned long, when I do sizeof on aarch64 or x86_64 or any other 64-bit arch, gives me 8 bytes, but on ARMv7 it gives me 4 bytes. And there are a lot of data types being used which are actually defined in the GLib library, which has definitions that typedef other primitive standard data types.
So there are basically a lot of typedefs, and some are architecture-specific, which can change things. I am looking into various possible options. The fix could be a single line, but debugging may take you from one project to a lot of other projects it depends on. This was my current experience with debugging this issue; I just wanted to share how you can debug various issues which might be arch-specific. There are endianness issues, and there might be 32-bit or 64-bit arch-specific issues too. There might be an issue which is only happening on just one architecture. So essentially we had ppc64 available due to some changes added for ppc64 in GCC. So I think it is already... Is this bug fixed already? No, as I said, I am working on it. It's a recent problem I encountered and it is still taking me time, because I have to understand how exactly all these flows happen — I never debugged this kind of issue before. I think we already fixed all the simple ones. Another interesting thing is that you cannot use GDB or something that can give you a trace from Python down to the C function call. You can only debug Python, and then if you are going to C you need something else — you don't really know how exactly the flow works, so GDB doesn't give me the results. So it is really about going in and adding print statements or something like that, using whatever understanding you have. So thanks a lot, really. I think it is a really nice example of a complex issue that needs to be debugged. It might be worth trying it on 32-bit Intel, because it is more likely to behave the same — the long type should behave the same there, so there might be some wrong definition in between, in the interface between the C library and the Python library, where the types are not understood correctly and it would really return the one value on 32-bit Intel too. I think it really showed the flow of all the projects and all the stuff that needs to be debugged.
So, for most of the workshop we were expecting to get some issues from people — the idea really was that some people are having some issues, so do you have something to show, or something you want help with? Unfortunately not many people came here, so we cannot help them. There is one thing that I think is pretty good, because I want to publicly thank Jerry James, who has been a Fedora maintainer for a long time and who did some fixes for big-endian issues in various packages, even when our initial discussion was: okay, upstream doesn't care about, for example, the mainframe, they just ignore that. He was leaning towards adding an ExcludeArch, and then, I don't know, a week later I saw in Fedora that he had gone and fixed it. So he is a really good guy at fixing, let's say, obscure issues related to the big-endian architectures. We will find some way to thank him more publicly. It really shows a nice part of working collaboratively on the distribution. It's a really hard task to fix these things — maybe he likes challenges. Or he doesn't like when packages are failing. But yeah, I think our goal in general is really to make all packages available on all architectures if it's technically possible. Anyone can pick their favorite architecture, and there are reasons why people are still using the mainframe. We have some time. I have to ask: is Fedora actually running on any big-endian architecture? Yeah, it's running on mainframes. It's still big-endian and I don't think that will change. Any specific examples? Which architecture — is it the mainframe? Which mainframe? There is only one: the IBM mainframe. Oh, okay. The System z, or z Systems? It's s390x, that's the shortcut, and ppc64, the big-endian, the original POWER — it's still big-endian Fedora. I think it's going well now with s390x; we are working on this a little bit, going forward. It depends on what you define as support: the little-endian ones will be the priority.
And we will try to keep the big-endian ones running as long as it's possible. Oh, okay — I thought we were trying to remove it completely. No, we are not actively trying to remove it, but there may come a time when we will be.