Welcome everyone, this is the Jenkins Platform SIG meeting for today, May 9th. Today we have with us Kenneth Salerno and Damien Duportal, thanks a lot for being here. On the agenda today we have a few open action items. They are always the same month after month, but one day we'll sort them out. We've got some work on the Docker agents; Damien will tell us about that. And with Kenneth we'll talk a little bit about the ppc64le progress for the Docker images, and also about the latest improvements on the Docker agent images. Hello, Kevin, let me add you there. Thanks for coming.

So for the open action items, we still have some Docker images for which we have to announce the end of support — the deprecation, to be precise. Of course that's the Blue Ocean container, which is not the Blue Ocean plugin; it's a different approach, a different effort. I saw a release a day or two ago for the Jenkins Blue Ocean plugin, so the plugin is still maintained, even if no new features will be added to it. But the Blue Ocean container hasn't been maintained, hasn't evolved, for a few years now. So yes, it's definitely deprecated, but we have to announce that to end users, and for that we'll have to find a way to communicate the deprecation. That's an administrative process, and we're still not sure how we'll address it, so I'm afraid you'll hear about it again in the coming months.

Damien, would you like us to switch to your current work on the code factorization of the agents?

Yes. The goal on the long run is still to have a single repository for all agent images. The rationale behind that proposal is that all the agents share a common ground: the base image operations, especially when we need to update, let's say, the Git package when there is a CVE, or when we need to regularly update base images, et cetera. But the three agents also have different use cases. We have the Docker agent, which is the most basic image. Right now the SSH agent does not inherit from that image, but both of them need the default jenkins user with a UID of 1000 — that kind of common ground applies to whatever kind of agent you run. And then there is the inbound agent, which needs the Docker agent to be released before it can pick up the change as its base image, and which only adds a shell script and changes the entrypoint. That adds a lot to the different processes there.

So the work has started; I didn't know where to begin, and the first step, before even trying to merge the repositories, is to find the code duplication inside these three repositories. All three follow the same pattern: a top-level folder named after the major JDK version, so you have 11 and 17, and then code repetition inside these directories — exactly the same Dockerfile except for the JDK version. So right now the goal is to decrease the amount of code in each repository, so that when we merge, we will have exactly the same pattern with half the code. That's what I'm currently working on. It's in draft status, but it lets me get my hands dirty with every detail on both Windows and Linux. The next step I have in mind should not need any complicated discussion at the high level — there will be more involved discussion about the implementation proposed in the draft, but that's review and testing more than high-level debate.
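Before that next step, a rough illustration of the duplication just described — a single parameterized Dockerfile in place of the near-identical per-JDK copies. This is a minimal sketch under assumed names; the base image and layout are not the repositories' actual contents:

```Dockerfile
# Sketch only: one Dockerfile instead of duplicated copies under 11/ and 17/.
# The JDK major version becomes a build argument.
ARG JAVA_VERSION=17
FROM eclipse-temurin:${JAVA_VERSION}-jdk AS agent
# ...everything below stays identical whatever the JDK version is...
```

Built with `docker build --build-arg JAVA_VERSION=11 .` (or 17), the same file would cover both top-level folders.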
That next step will be to merge the inbound agent and the agent, because it will solve a problem we have today. We had to wait almost one month to be able to deliver Kenneth's work on ppc64le, and the reason was not Kenneth: we had an issue on the inbound agents with the automatic updates that pull in the latest base agent, and an issue on one of the images blocked the whole process, so we weren't able to deliver. And not only do we waste time when we have issues; even when everything goes properly, which is most of the time, we have to wait for a release of the agents, then for the automatic update process to kick off, open a pull request, and build everything again for one small change before a new release. So we waste time, and we block or slow down potential contributors, as we saw. And finally, when there are changes or drifts, the versions differ between the agent and the inbound agent because they each have their own lifecycle.

So my proposal targets merging the code of the agent and the inbound agent, since they already inherit from each other today. The goal is to have one release lifecycle: when there is either a change in the Dockerfile or a change in the version of the Remoting component of Jenkins used by the inbound agent, we create a new tag and both images are released at the same time with the same version. No more confusion for the users. And either it works on a pull request or it doesn't — we don't have to split the work between the two. Those are the next steps.

For the SSH agent, though, I propose that we wait and see whether my proposal is accepted, valid, and working. The reason is that I don't want to plan too far ahead, especially since we'll have an uncomfortable discussion about the lifecycle: the SSH agent uses semantic versioning, while the inbound and classical agents use the default Jenkins plugin versioning scheme, which is build number plus commit ID, to which we added a package-revision suffix for when we change something other than the main component. So we will have a decision to make as a community, and that one requires discussion before any pull request: in the eventuality that we are technically able to merge, do we need one tag each? Do we need to synchronize the versions? Do we need to change one scheme or the other? There is no obvious answer; it will depend on the community feedback, and I'm not sure I have an opinion right now. Technically it should be easy: if the inbound agent merge works, then adding the SSH agent might have side effects, but globally the code is almost the same. The side effect will be us surfacing issues that already existed — sometimes a discrepancy between images — and that will be to the benefit of the users. If we communicate it well and do it properly, we should be able to avoid these issues or fix them within a day or so.

So this is the plan: right now, preparatory work by merging code duplication and fixing the tooling; then the next major step, merging the inbound agent with the agent — only that part will be technically complicated; and if everything goes accordingly, I will come back here and propose the next step with SSH. Is it clear? Does it make sense? And do you agree? Those are my three questions for you, folks.
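To make the versioning discussion concrete, here is an illustrative side-by-side; the tags shown are invented for the example, not actual published versions:

```text
ssh-agent:        5.1.0                    (semantic versioning)
agent / inbound:  3071.v7e9b_0d5a_8c42-2   (<build number>.v<commit id>-<package revision>)
```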
To me it's clear, and of course I agree. I don't know about you, Kenneth or Kevin.

I think it eliminates a lot of redundancy, and I noticed immediately that the hierarchy contains essentially the same files. Really, we could have used variables to decide what we're going to pull in, which version of what, and for which environment. And we actually already use some control statements, with if/then, because there was a decision to make for one of the architectures — I think it was ARM, where we couldn't use jlink to shrink the JDK. So we already have some kind of control statements in there anyway.

Absolutely. For the sake of history: there are two reasons why it is the way it is today, and it's not because the code is bad or bad architectural choices were made. First, all these images started with Docker's automated builds for official images, which a few years ago required the Dockerfiles to follow certain conventions. That is no longer the case and no longer required, but it's the reason why, for instance, we have default versions everywhere: each Dockerfile needed to be buildable without any tooling around it, because it was Docker Hub's role to build it automatically. That hasn't been the case for about three years now, so we can drop that first constraint. It also explains why we had so many Dockerfiles everywhere: the matrix of Dockerfiles mapped one-to-one to the list of images on Docker Hub. Docker buildx has been available to us for at least a year and a half now, and the control statement you mention, Kenneth, shows that the tooling can handle the multidimensional matrix — tags, architectures, base images, and now custom high-level images. So we know the tooling allows us to use buildx and push images directly to Docker Hub; we no longer need the constraints of Docker's official-images process.

And my question to you is: are you going to eliminate the three projects, then, and just merge them into one project?

On the long run, yes, that would be a good idea. That might need to be done properly, blue-green-deployment style: creating a new repository, docker-agents, at least for the source code — so yes, to your question.

So how would you decide if you only wanted to build the SSH agent? Would you make that a parameter?

Good question. I would delegate that decision to buildx. The reason is that these image builds are not idempotent: we install packages from the base distribution image, so between two builds you want the latest version of the packages we install — you don't pin the versions. It's not a single-binary image built with a tool like Bazel. Between two builds I don't expect the exact same content from these Docker images, because one of the operating-system dependencies we install — for SSH you need OpenSSH, for instance — I expect to be up to date each time I rebuild. It might change even if I don't change the Dockerfile. That's why I don't mind rebuilding them all at once: I rely on a highly parallelized process with buildx, at least on the Linux side, and partial parallelization using Docker Compose on the Windows side. However, if we see that slowing down builds for contributors, my proposal is to add a conditional build based on the changed files: if only the SSH Dockerfile changed in the pull request, you only build the SSH image, to avoid spending too much time building unchanged images. But for the main branch, what do you think about always building everything?
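For reference, the kind of in-Dockerfile control statement Kenneth mentioned earlier might look like this; a sketch only, where the architecture test and the jlink flags are assumptions rather than the repository's actual code:

```Dockerfile
# Sketch: build a trimmed Java runtime with jlink, except on the
# architecture where jlink was problematic (assumed to be 32-bit ARM here).
FROM eclipse-temurin:17-jdk AS jre-build
ARG TARGETARCH
RUN if [ "${TARGETARCH}" = "arm" ]; then \
      # fall back to copying the full JDK
      cp -r "${JAVA_HOME}" /javaruntime; \
    else \
      jlink --add-modules ALL-MODULE-PATH \
            --strip-debug --no-man-pages --no-header-files \
            --compress=2 --output /javaruntime; \
    fi
```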
Yeah, I mean, certainly with the inbound agent, combining those two with the base agent makes a lot of sense. But there's a lot of redundancy in what gets pulled down to build the SSH agent.

Yep, absolutely. All the SSH installation — at least the server part — should not be in the inbound agent. That's why the combination of buildx and multi-stage Dockerfiles would help here. Take Debian, for instance, for a given version of Remoting and a given JDK: you would have a base stage that installs the JDK and creates the default jenkins user with UID 1000 or whatever; then a stage that is only the inbound agent, which adds the script and sets the default entrypoint; and another stage in the same Dockerfile that is the SSH part. The second and the third are built on top of the common stage, which is today's agent. The goal then is to specify different build targets for buildx, so you can say docker buildx bake ssh: the Bake file describes that if you want SSH for Debian, it's that Dockerfile with that target, so you build the parent image and SSH, but not the inbound agent. Right. Does that make sense?

Yes, because the inbound image becomes its own separate entity after it's built. So just because you refactor or rebuild the base: if you only wanted to rebuild the SSH agent while your inbound agent is based on an older version of the base, you shouldn't be forced to rebuild the inbound agent too.

Yeah, that makes sense. So OK, I like it. And another compensating measure against the behavior you describe would be to start with a regular release schedule. I don't think it would kill anyone or cause problems to release once a week. Every week, say Thursday, an automatic process — like we do for the weekly Jenkins core release — takes the latest master branch of the whole set of agents and rebuilds everything, and if all the tests pass, it tags it and releases that build. That gives end users a regular schedule: they know they will get the latest packages once a week, and all changes are released on a predefined schedule, unless it's an emergency such as a security issue. If you as a contributor get something merged, say, during the weekend, you know you have to wait until Thursday for the automatic release to ship your change. That means some changes might land more slowly, but it's deterministic and regular. In terms of packaging, security scanning, and the habits of the operators who consume the images, they know it's a regular schedule.

Yeah. And what's the worst case — a week, if you land just before the rebuild?

I would say the worst case is a security issue that can be exploited in last week's image. But in that case, nothing forbids manually creating a tag and triggering a new release outside the regular schedule.

All right, so if it's a security issue, you can trigger a rebuild.

Exactly. The weekly cron would be really, really simple. In my mind, in my proposal: if the master branch passes on Thursday when the cron triggers, then I create a tag, and nothing more. And we keep the current behavior that says: if there is a new tag, build an image with that tag and push it.
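A minimal sketch of the multi-stage layout and Bake targets Damien describes; the stage names, file names, and script name are assumptions:

```Dockerfile
# Sketch of one Dockerfile with three build targets.
FROM eclipse-temurin:17-jdk AS agent
# common ground: the default jenkins user with UID/GID 1000
RUN groupadd -g 1000 jenkins && useradd -u 1000 -g 1000 -m jenkins

FROM agent AS inbound-agent
# only adds the entrypoint script on top of the shared base
COPY jenkins-agent /usr/local/bin/jenkins-agent
ENTRYPOINT ["/usr/local/bin/jenkins-agent"]

FROM agent AS ssh-agent
# only this flavor pulls in the OpenSSH server
RUN apt-get update \
    && apt-get install -y --no-install-recommends openssh-server \
    && rm -rf /var/lib/apt/lists/*
```

And a matching Bake definition, so that `docker buildx bake ssh` builds the shared parent stage plus the SSH stage without touching the inbound image:

```hcl
# Sketch of a docker-bake.hcl with one entry per image flavor.
target "agent" {
  dockerfile = "Dockerfile"
  target     = "agent"
}

target "inbound" {
  inherits = ["agent"]
  target   = "inbound-agent"
}

target "ssh" {
  inherits = ["agent"]
  target   = "ssh-agent"
}
```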
What's the current release schedule for the controller image? Because I've noticed the war version and the checksum look to be 2.3-something, and I know we have 2.4 being built, so I always manually specify it. When is that going to change?

For the controller, we follow the Jenkins core releases, which means you have two lines. There is the weekly line, which is rebuilt each time we have a new weekly release, every Tuesday: once the release is out in terms of packages and war, the images are rebuilt. So you can expect a change every Tuesday, and if you use the weekly line with the Docker image, I recommend updating every Wednesday.

But I have to manually specify the version and the checksum myself.

Really? You're not supposed to.

No, no — if I pull it down right now, if I do a git clone on the head of the docker repository...

Ah, the source code. That's different, sorry. On the source code you have the master branch.

How does it work?

There is a script that takes care of building them all, making sure that both the last weekly and the last LTS are covered. We don't have a mapping of the image to the Jenkins versions: the master branch is the latest, and the script takes care of rebuilding every version by manually specifying the version in the build args, like you are probably doing.

Yes — it's an environment variable, and you can specify the path or just the version, actually. And then you have to specify the checksum.

Yes. I would not expect any kind of support by default if you rebuild it yourself, because the support provided by the maintainers is on the build artifacts pushed to Docker Hub. The source code is a different matter; otherwise it would be really, really complicated to support all the use cases. If you had to support every way of building a given Dockerfile — every way to use it — you would have to ensure the Dockerfile works with Podman, with other container builders like Kaniko, and that alone is already a challenge, because then you need to validate it against all the builder versions, with and without the experimental features. Then you would need to test all the combinations: all the JDK versions in the build args combined with all the Jenkins versions and their checksums. That's a combinatorial explosion, and supporting it is really complicated.
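For anyone rebuilding the controller image from source as Kenneth describes, the invocation looks roughly like this; the build-arg names follow the jenkinsci/docker Dockerfile as far as I can tell, and the version and checksum are placeholders:

```sh
# Sketch: build the controller image from source, pinning the war version.
git clone https://github.com/jenkinsci/docker.git && cd docker
docker build \
  --build-arg JENKINS_VERSION=2.401 \
  --build-arg JENKINS_SHA=<sha256-of-the-matching-jenkins.war> \
  -t my-jenkins:2.401 .
```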
Anyway, the main constraint here — and why you don't want to go too deep — is that for the controller right now we are tied to using the same tag for the Docker image and the Jenkins weekly. If we need, for instance — and that's a problem we had — to update the Git version in the controller image, that's a real challenge, because it has to wait for the next weekly or the next LTS to trigger a rebuild. Right now they are bound: you need to tag a new Jenkins release in order to release a new Docker image. For the last security issues, when they were exploitable, we added a suffix to the Docker image tag, but that was a temporary measure and it's not sustainable. That's a discussion to have about the lifecycle of the controller image version: shouldn't we use a package suffix or something, like we do for the agents? It seems to work for the agents, but some people were against it. So the question is: do we keep a one-to-one mapping between Jenkins and the image? Decoupling them might help with the topic you raised, because for a given version of the image — say 1.0.0, or X.Y.Z, arbitrarily — you would know that version pins a fixed Jenkins version and checksum, so you would have a deterministic build even from the source code. But that would change the way we build and manage the repository.

Thank you, Damien. Sorry, I have to interrupt you because I'm the timekeeper; it's super interesting, but we won't fit everything into the hour. Anyhow, we can always discuss that on community.jenkins.io, on GitHub, or in the next meeting. Thanks a lot for your questions, Kenneth. If you don't mind, let's switch to what has been done — what you've been doing, Kenneth, with ppc64le — because your Docker agent, SSH agent, and inbound agent PRs have been merged. Congratulations on that. Your controller PR has not been merged yet, though, and I don't know why, as it has been reviewed. Do you have any info on that, Kenneth?

No, I'm not sure. I think we were waiting for a review, right, Damien?

Sounds like it, yes. To be quite transparent, I focused on the agents, and then we had some delays, so I'm not sure what we are waiting for here. You're correct: you fixed the issues Tim pointed out. So yes, what you say is true — we need a maintainer to review it and decide whether to merge it. At first sight I don't have any objection, but I'm not the only maintainer who should check this one.

OK, yeah, I think we're just waiting — it was Tim who was looking at it, right? Yep. So we just have to see when he's freed up. As for the agent, SSH agent, and inbound agent: I tested all three of them, and the ones pushed to Docker Hub are all working.

Cool. Good news.

Yeah, and I did uncover something interesting about JDK 15, where they changed from vfork to posix_spawn for the JNI execution of a shell — doing an exec to copy the parent process and fork the child. Under QEMU emulation, posix_spawn seems to be broken, but I could fix that with a JDK parameter to specify vfork. That was my only catch, but, you know, nobody is going to be emulating these images but me.

That makes me think I might bother you, Kenneth, about a plugin — the durable-task plugin — because there is a side project named libdurabletask, which seems to be a Golang binary. I understand that binary is used for persisting pipelines when a restart is triggered; it's part of the whole Pipeline workflow suite. Recently there was a discussion and a PR merged in which someone added ppc64le support, among other architectures, to that Golang library, but there are no tests. The main reason is that it's a plugin and we don't have a ppc64le machine on ci.jenkins.io. But in that area, with the agent on ppc64le, we could start with QEMU as a first line of testing, and I might need your expertise there, if you don't mind. I wanted to ask you first: is it OK if I ping you directly on the pull request to see if we can help the plugin maintainer?

Sure, it's OK. I have a test environment where I actually run these containers through the emulation, and I also know somebody who has real hardware.

Cool, sure.
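The JDK parameter Kenneth mentions is presumably the Linux process launch mechanism property; a sketch of forcing it back to vfork for an agent JVM under emulation:

```sh
# Sketch: posix_spawn misbehaves under QEMU emulation, so force vfork.
# jdk.lang.Process.launchMechanism is a Linux-specific JDK property.
java -Djdk.lang.Process.launchMechanism=vfork -jar agent.jar
```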
And at the same time, on our side, on the Jenkins infrastructure, I think we should ask the OSU OSL for one or two ppc64le machines, because it looks like they have some. That could also give us real hardware for a permanent agent on ci.jenkins.io. We used to have one, so there is no reason we can't get another one there, and that might help as well. I will let you know once we have feedback from them.

All right, go ahead, Inia.

Those machines are available for us to use.

Thanks for your help.

Of course. Thanks a lot.

Yeah, we won't have time to cover all the subjects, but there is one that is very important, at least to me: the drop of JDK 8 support for the SSH agent. Lastly, Damien, a few words about that, please.

So yes — that one we forgot to take care of back in September, so right now, if you were using it, you have a problem. The main idea is that your controller should have the same major JDK version as your agent, at least for the JDK used to execute the agent process. If you need to build a project — a development project — on an agent with JDK 8, and that's the case on the official Jenkins infrastructure, there is nothing preventing you from having a JDK 8 installation that you use with Maven, Gradle, or whatever your project needs; but it's still recommended to use another JDK for running the agent itself. You can define which JDK is used for the agent process in your controller setup, for either a permanent or an ephemeral agent — we already do that on ci.jenkins.io. So if you have that problem: use the latest JDK 11 or 17, matching your controller's major version, then extend the Docker image and add your own JDK 8. The changelog shows a draft example of a Dockerfile for this use case. Other than that, there is no reason to keep using JDK 8, since Jenkins has required JDK 11 by default since September 2022 — except if you want to get into trouble, I guess.

Oh, time is up — I'm afraid we'll have to wrap it up. The notes will be compiled in a thread we'll create on community.jenkins.io, and the video should be available on YouTube within 24 to 48 hours. Thanks a lot for being here, thanks a lot for your time, and we'll see each other, maybe, 14 days from now. Have a good rest of the day. Bye-bye.

Thanks. Thank you, bye-bye.
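For reference, a minimal sketch of the kind of Dockerfile extension Damien describes for teams that still need JDK 8 in their builds; the image tags and paths are assumptions, not the changelog's actual example:

```Dockerfile
# Sketch: the agent process keeps running on the bundled modern JDK;
# JDK 8 is added only for builds that still require it.
FROM eclipse-temurin:8-jdk AS jdk8
FROM jenkins/ssh-agent:jdk17
COPY --from=jdk8 /opt/java/openjdk /opt/jdk8
# point the build tool (Maven, Gradle, ...) at /opt/jdk8 for the project build
```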