Hi everybody. Welcome to this new Jenkins infrastructure meeting. On today's agenda: a report on the JFrog situation, the new DigitalOcean accounts, the Rackspace accounts, some ongoing work on ci.jenkins.io, and Docker multi-arch.

So let's start with the JFrog topic. Last Friday we had a meeting with JFrog. Just to go back: I wrote some notes during the meeting, so let me summarize what happened. During the last security release, we had issues promoting artifacts from the staging environment to the production environment, and that problem hasn't been solved. What we discovered was that we were using an undersized database, so we were hitting issues with the number of database connections. The first thing JFrog did was increase the size of the database so we could have more database connections. But the second thing we noticed was that it took us one hour to copy the artifacts instead of a few seconds. So we engaged with JFrog support, and we had a session last week where we debugged the issue. What we did was increase the logging level on the service, and the JFrog engineer who helped us last Friday investigated on their side. Right now we still don't understand what's happening and why it took one hour to copy the artifacts, but we hope to have a better understanding in the coming weeks. So that's the state of it; I'll put a link to the meeting notes here, in my document. Any questions? Sounds good. So yeah, the next step is, once I have more information, to plan another meeting with them, and I hope we find a good solution.

The next topic I want to talk about, which I'm pretty excited about: DigitalOcean offered to sponsor some machines for us. So I created the DigitalOcean accounts last week and invited a few people who were interested to participate; I mainly invited Damien, I think. The plan is to provision an additional mirror and a small Kubernetes cluster to use in our CI infrastructure. We're looking for some help here to write the Terraform code to provision those resources. I know that DigitalOcean has pretty good Terraform support, so we would like to do it the correct way from the start. So yeah, if someone is interested in participating, that's definitely a place where you can help; feel free to reach out. Damien, I think you may have some time to work on it.

Still related to the mirrors: I started working on Rackspace to remove the machine there. We have a mirror machine running on Rackspace, and I have provisioned everything on the new Oracle environment; the new machine is running and updating itself, which is great. We are almost ready to decommission the one running on Rackspace. The good thing is, I'm changing a little bit the way we push artifacts to the mirrors. Previously we were doing a push approach: a machine named pkg.jenkins.io would upload every artifact to the archives.jenkins.io machine. Now the approach is a little bit different: each mirror just pulls the artifacts available from an rsync server, and right now that server is the US one. That's the main difference, and the idea is that we are now able to add more mirrors to the pool very easily.
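As an illustration only: a minimal sketch of what such a pull-based sync might look like on a mirror, assuming an rsync daemon on the source server; the endpoint, module name, and paths are hypothetical:

```bash
#!/usr/bin/env bash
# Hypothetical pull-based mirror sync: each mirror fetches from the
# primary rsync server instead of pkg.jenkins.io pushing to every mirror.
set -euo pipefail

PRIMARY="rsync://mirror-primary.example.org/jenkins"  # assumed endpoint
DEST="/srv/releases/jenkins"

# --archive keeps permissions/timestamps; --delete removes files gone upstream
rsync --archive --compress --delete --partial "${PRIMARY}/" "${DEST}/"
```

With this model, adding a mirror amounts to provisioning a machine and scheduling this pull, with no change needed on the push side.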
And so we still have to decide where we deploy those new mirrors, but the idea is to have small mirrors and to have more of them in different regions. The good thing with the one on Oracle is that it's pretty cheap; I think it was something like $30 per month. I still have to check the cost of the network bandwidth, but that's the current state there. If some people have experience using Terraform with Oracle, that's also a place where you could help: for the first machine we just did manual provisioning, we went to the user interface and configured everything, but for more machines we would like a better setup, so people can contribute to that environment using Terraform again. Is that alright for all of you? Sounds great.

The next step is a brief update on ci.jenkins.io. I think Damien is ready to talk about it if he wants. So, as a matter of fact, I would like to operate tomorrow ideally, or Thursday, to apply the change. The thing is that we are in a situation where the Puppet code has a lot of issues going on. Some have an impact on production; others have an impact on the ability to test the code locally. And I would add to these two kinds of issues the fact that our time to production is still slowed down because we are still using the staging branch. That means that if we try to deploy a big change like this one on ci.jenkins.io, our ability to iterate locally and efficiently to fix issues could be put at risk. So my proposal, on the hypothesis that the classical review process goes well and that you folks find no hidden problem when reviewing my pull request, is to get rid of the staging branch tomorrow morning as a first mandatory step before going forward, just to be sure that once we merge a pull request on the jenkins-infra repo, it's merged and deployed to production immediately.

Just on that one: I totally support that move, because the problem we have here is that when we want to make a change, we first have to merge the change to the staging branch, and then we have to create a PR from the staging branch to the production branch, and only then is it deployed to production. Looking back, in the past we were testing every staging branch on Amazon: we were provisioning Amazon infrastructure, testing the deployment there, and then promoting it. This is something we stopped doing a long time ago, so right now we just don't use staging; it's just a way to slow us down. So I totally support that. I'm just not sure if you want to switch to a different production branch or keep the current one.

Yeah, one step after the other: right now I want to remove staging from the equation, because changing the name of the production branch could have an impact on the Puppet master. That should be a separate change, maybe for when we upgrade. So once we have that one, I will feel totally okay and safe to go forward: if everyone is okay on the review and we can remove that, I should be able to deploy the ci.jenkins.io agent configuration. So that's the cool part.
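For clarity, a sketch of the branch flow being removed, using the staging and production branch names mentioned above; the feature-branch name is illustrative:

```bash
# Current two-step flow on the jenkins-infra repo:
git checkout staging
git merge --no-ff my-change       # PR 1: change lands on staging
# ...then a second PR from staging to production is needed,
# and only that merge triggers the deployment.

# Proposed flow once staging is removed: one merged PR deploys immediately.
git checkout production
git merge --no-ff my-change       # merged and deployed in one step
```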
And obviously, there is something I want to mention here: we have merged today something on staging that will be deployed to production as well. There are issues in the Puppet code, that I may well have cooked myself, that might be there for a reason. But also, I think they are there by accident, and I want to be sure that everyone is aware this can put the stability of ci.jenkins.io at risk when I deploy it. That's why I want to group the operations into a single one.

What happens is that ci.jenkins.io runs in a container which is running with the UID 999. I don't know why we don't use the default jenkins user, and since the recent events I'm still wondering. My assumption is that there is already a user with UID 1000 on the host machine, and we don't expect that user to be able to read the Jenkins home. To be honest, I think the reason was that initially it was not using a Docker image; I'm wondering if we were not using the Ubuntu package, which was using the UID 999. That was a long time ago; that would be my guess here.

Which relates to the second issue: the default user home. Most of the Puppet code uses a Jenkins home path which is the path of the Jenkins home on the host virtual machine's file system. Sometimes that's correct, but sometimes it's a parameter passed inside a container where the mount point is different: it's not /var/lib/jenkins but /var/jenkins_home. That's why on ci.jenkins.io we have errors on every Puppet run. For instance, when trying to install plugins, the idempotent CLI was suffering a bunch of issues, including that one: it was "unable to locate the jenkins-cli.jar file" because it was looking at a non-existent path. Also, the default user running the Jenkins controller on ci.jenkins.io has a home directory that doesn't exist; currently it is set to /var/lib/jenkins because of that bug in the Puppet code. And I understand where it comes from: it was running with the package, and when it was shifted to a Docker container, the code did not follow the Docker container's conventions.

So I tried to fix these elements. Tim and Olivier have reviewed that, but I might have forgotten or broken something. In particular, I tried to be careful with the user IDs: I tested and reproduced with a Vagrant box, where I made sure the UIDs were different between the default 1000 user on the machine and the user named jenkins inside the container; I was using a third one to have the same topology. It was working well, but still, we might see different behaviors. Also, this can have an impact on trusted: trusted has the same pattern, except it's not 999, it's something else, which is not 1000 either. So both might have issues due to that change, but we have to fix this error. I mean, it's not acceptable that we have Puppet errors on our own production that are purely ignored; we have to fix that. I think that PR has been merged, and I'm currently looking at the next one.

Yes. And the upcoming PR with the ci.jenkins.io configuration-as-code also embeds the thing that was underlined by Olivier, which is: we use a command line on the host, a shell script named "idempotent CLI", whose role is to either install plugins on Jenkins, run a Groovy script, or safe-restart Jenkins at the end. These are the three actions that I identified that that command runs. How does it work? It does a docker exec inside the container, locates the jenkins-cli.jar, and connects through the CLI to the Jenkins on localhost to trigger the actions. The safe restart and the Groovy execution must keep working; that's why I kept the existing behavior there and only fixed the path issue.
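A minimal sketch of that docker exec pattern, assuming the controller container is named jenkins; the container name, plugin name, and script path are illustrative, and authentication flags are omitted for brevity:

```bash
#!/usr/bin/env bash
# Sketch of the "idempotent CLI" pattern: drive a containerized Jenkins
# from the host through jenkins-cli.jar and docker exec.
set -euo pipefail

CONTAINER="jenkins"                       # assumed container name
URL="http://localhost:8080/"              # Jenkins inside the container

# Locate jenkins-cli.jar inside the container. With the path fix, the
# lookup uses the container mount point /var/jenkins_home rather than
# the host path /var/lib/jenkins, which does not exist in the container.
CLI="$(docker exec "${CONTAINER}" sh -c \
  'find /var/jenkins_home -name jenkins-cli.jar 2>/dev/null | head -n1')"

# The three actions the script performs:
docker exec "${CONTAINER}" java -jar "${CLI}" -s "${URL}" install-plugin git
docker exec -i "${CONTAINER}" java -jar "${CLI}" -s "${URL}" groovy = < script.groovy
docker exec "${CONTAINER}" java -jar "${CLI}" -s "${URL}" safe-restart
```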
Following an idea from Olivier on the initial pull request, I've added a new change, which is not merged yet, that will use the new jenkins-plugin-cli, which is part of the Docker image, for the plugin installation. But I specify the Jenkins home as the destination instead of /usr/share/jenkins/ref (there's a sketch of this at the end of this update). The reason being: what if the Jenkins process is down? For instance, JCasC reads the configuration file during the startup sequence and fails because there is a plugin which is not present; that's the case I hit. In theory, Jenkins with JCasC should fail the startup but the API should still be available on localhost. However, in some edge cases it doesn't work. That means sometimes you can have Jenkins failing: the container is up, Jenkins prints a big error stack from Configuration as Code because of the missing plugin, and what you want is for the idempotent CLI to install the missing plugin that we forgot to mention. But then you are stuck, unless you manually put the plugin .hpi file in place and force-restart Jenkins. So, to avoid that edge case that Olivier pointed out, I'm using jenkins-plugin-cli, because you only need the container to be running and it's completely independent from Jenkins. And it's a good use of that new tool. So that's also something that changed. There is a big, big bunch of changes, and I would like to try them tomorrow morning if the review process is okay.

Yep. Tomorrow morning is a little bit dangerous, because it's the 2.289.3 LTS build day. Oh, it's tomorrow? I thought it was next week. Okay. Well, if you or Tim would be willing to start that build early, early your time... if you were to start your day with it, just to be sure that this does not affect the release environment. If it affected trusted, that would block the Docker containers. It just makes me a little nervous; I'd like us to just get through the process of building 2.289.3 first. It's really only about four hours, so Damien, if you were to launch the 2.289.3 build as you arrive at work, then four hours later it's probably done.

If we have to delay, I would even prefer not spending my time tomorrow on that, and delay to Thursday. One major task per day: I can work on the Terraform, or on fixing a part of the Jenkins Puppet code, so I don't mind. So let's delay to Thursday. It will also add more time to review the changes, if that's okay for everyone. That's great for me. Olivier, are you okay with that? Definitely. Okay, so let me just make a note that we will do that Thursday instead of Wednesday. Thursday it is. So I think that covers it. Sorry I took a bit of time, but I wanted to underline these changes so everyone is aware that if something fails in the coming days, I'm there; I don't plan to go on holidays. It's important to inform everyone about that: that's the purpose of these kinds of discussions, just to highlight potential issues.
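Here is the sketch mentioned above of the jenkins-plugin-cli approach, assuming the official Docker image; the container name and plugin are illustrative:

```bash
#!/usr/bin/env bash
# Sketch: install a plugin that JCasC needs even when Jenkins failed to
# start. jenkins-plugin-cli ships in the official Docker image and talks
# to the update center directly, so only the container has to be running.
set -euo pipefail

CONTAINER="jenkins"   # assumed container name

# Download into the live plugin directory under JENKINS_HOME instead of
# the default /usr/share/jenkins/ref, which is only copied at startup.
docker exec "${CONTAINER}" jenkins-plugin-cli \
  --plugins configuration-as-code \
  --plugin-download-directory /var/jenkins_home/plugins

# Restart the container so Jenkins picks up the newly installed plugin.
docker restart "${CONTAINER}"
```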
The next topic is Docker multi-arch. Mark, do you want to bring up the Docker multi-arch progress? Did we make any progress last week? So, I think the story is quite good. We've had Docker multi-arch images on our roadmap for over a year, and now we've got nice progress thanks to Tim Jacomb and to Damien Duportal on actually implementing multi-arch support. Thanks to Tim's work implementing docker buildx bake and Damien's work implementing parallel testing, we've got a significantly faster build process for our Linux images, both of them. There's more to do there and work that will continue; Damien, maybe you want to give more color and more input, but it's good. And we've still got the proposal in play to switch those Docker images, for the 2.303.1 LTS release at the end of August, from using JDK 8 to using JDK 11. I'll get a JEP submitted late this week proposing that, with a pretty detailed plan describing what we need to do.

The only hiccup here, which we discovered during today's weekly release process, is that the publication of these new images, which we wanted to try during that weekly, failed for little minor reasons that we have to reproduce and fix. So right now we only build and test; we don't publish these images for now. So Damien, just to be clear then: today's 2.304 will still publish a Docker image, but it won't be multi-arch? Exactly. Okay, great. Thank you.

If I understand correctly, the limitation is having access to the agents to run the multi-arch builds. We have to check what is happening, because the behavior we saw on trusted is different from what we saw on CI. So we have an item to double-check that we use exactly the same agents with the same QEMU, because we mainly use QEMU: we don't get machines with the specific architectures, because only static agents are connected to CI right now. We wanted to start the first version with QEMU emulation to avoid requiring too many, let's say, exotic platforms for now (there's a sketch of that flow below). Great, thank you. Yeah, Damien, thanks so much to you and to Tim; it's wonderful progress that's been made there, and we can still make more.

I have one number: the master branch build went from 22-25 minutes before all those changes to between 12 and 15 minutes now, so we gained 10 minutes. That's a nice improvement. And if anyone is willing to help on the Windows part: I'm going to work on it in the coming weeks, but it's mainly PowerShell scripting, and everything there is sequential, so that part follows a different pattern. There is a lot to gain; I'm sure we can gain five or six minutes more by parallelizing all the tasks during the build and test phases on Windows. So anyone interested in helping there: to me it's a huge improvement waiting to be done.

So that improvement from 25 to 12 includes the Windows improvements? I thought we were stuck at 25, but you've found a way, even without the Windows improvements, to cut the build time in half? Yes: the Windows machines are taking 12 to 15 minutes, and that's the slowest part of the pipeline; you can see it clearly on the branches. Meanwhile, the slowest Linux images take almost two minutes to build and around two and a half minutes to test, which means the slowest Linux part is about six minutes. You can see the gain we can still get. Right, so if we can make Windows builds as efficient as Linux builds, we could be down to a six-minute build process. That's excellent.

Okay, thank you; that's all I had on Docker multi-arch. I'll announce the JEP for review; it certainly needs lots of review. There were a lot of things I learned by trying to prepare the JEP, and people will now tell me "oh, you misunderstood this or that".
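A minimal sketch of the QEMU-based multi-arch flow discussed above, assuming docker buildx and a docker-bake.hcl file in the repository; the builder name, bake target, and platform list are illustrative:

```bash
#!/usr/bin/env bash
# Sketch: build and test multi-arch Linux images on a single x86_64 agent
# using QEMU emulation instead of native machines per architecture.
set -euo pipefail

# Register QEMU binfmt handlers so the host can run foreign-arch binaries.
docker run --privileged --rm tonistiigi/binfmt --install all

# Create and select a buildx builder capable of multi-platform builds.
docker buildx create --name multiarch --use

# Build the targets declared in docker-bake.hcl for several platforms.
# No --push: this matches the current "build and test only" state,
# publication being disabled until the weekly hiccups are fixed.
docker buildx bake --set '*.platform=linux/amd64,linux/arm64' linux
```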
That sounds good. I think we covered all the topics for today's meeting. Anything else you want to bring up before we close? No? Thanks for your time then, and see you on IRC. Thanks. Thank you.