Hello everyone, welcome to the Jenkins Infrastructure weekly team meeting. Today is the 24th of October 2023. Around the table we have myself, Damien Duportal, plus Stéphane Merle and Kevin Martins. Hervé, I'm adding him... oh, he will add himself; he should join us in a few minutes, I assume the kid doesn't want to go to sleep. Mark Waite is not here today. Stéphane is here, Bruno is not, and Kevin is with us. Hello everyone. Let's get started with announcements. The weekly release 2.429 is out, at least the WAR file and the packages. Just a word on the Docker image: we delayed the creation of the tag by a few hours, because there has been a regression since the last LTS on the container image tags provided to end users. There was a major change on the latest LTS and on last week's weekly: the default image, the one you get if you don't specify a JDK in the tag, switched from JDK 11 to JDK 17. That was written in the changelog as far as I can tell. But in the process, as a consequence of switching the default image to JDK 17, we accidentally stopped publishing the tags with the "-jdk17" suffix. Hervé has just issued a pull request that should fix that, and we will test it on today's weekly. We delayed a few hours because we want that pull request to be reviewed, approved, merged and verified before triggering the release. We will also work on backporting the missing tags onto last week's Docker images. The method I propose on the issue is to use skopeo, a tool that allows copying an existing Docker image under a certain tag to another tag. The goal is to avoid, as much as possible, rebuilding the LTS Docker images: instead we copy the existing ones. These images already exist under the short tags, so it should be easy to copy them bit by bit. Is that clear? Does it need clarification? Kevin, I haven't followed, I'm sorry: did you handle the changelog and documentation update for that weekly?
Merged and published, so it's live on the site. You're really efficient, thanks. Changelog merged, that's really cool. Let me take notes: release delayed a few hours to add back the missing container tags. Do you have other announcements, folks? Just really quickly, Damien: the tags note was also added to the 2.414.3 LTS changelog, thanks to Hervé earlier today, so that's also been published. Cool, let me write this down: missing JDK 17 container tags; LTS changelog 2.414.3 updated to warn users; fix for today's weekly, see above; plan to backport the missing tags using a binary copy with skopeo, to avoid rebuilds and accidental overrides. Thanks, folks, for the work on this one. Quick question: can't it be done with Docker only? You could, but that would be way, way slower, because you would have to pull the original image from the short tag, tag it locally and push the new reference. But yes, it could be done; skopeo just has a specific feature for exactly that. Another announcement: remoting versions 3176 and 3181 had a bug when using direct connection, that is no WebSocket and no TCP discovery. The remoting protocol used for inbound agents, when connecting over HTTP or TCP without the discovery step, was suffering from a bug that was a side effect of a huge piece of work Basil did on remoting. It was a minor consequence, because it's an edge case that we hit on the Jenkins infrastructure. As soon as we detected the problem, a fix was found and published, and there was no user-facing impact: we were impacted as administrators, but no user pipelines were failing. In practice, ci.jenkins.io was just unable to spin up agents for a few hours; as soon as we rolled back the change, after two hours, all the pipelines were back. So no user-facing impact. A new version of remoting fixed the issue, and we are rolling it out to ci.jenkins.io.
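The skopeo-based backport discussed above can be sketched as a dry run. This is an illustrative sketch only: the repository name and the tag list below are hypothetical placeholders, not the actual tags being backported. The key point is that `skopeo copy --all` duplicates an image registry-to-registry, keeping every architecture of the manifest list, with no rebuild and no local pull/tag/push round-trip.

```shell
#!/usr/bin/env bash
# Sketch: re-create missing "-jdk17" suffixed tags by copying the existing
# short tags with skopeo. Commands are echoed (dry run); remove the `echo`
# to execute them for real against a registry you can push to.
set -euo pipefail

repo="docker://docker.io/jenkins/jenkins"   # assumption: target repository
tags=("lts" "2.414.3" "2.414.3-lts")        # hypothetical tags missing a -jdk17 alias

for tag in "${tags[@]}"; do
  # --all preserves the whole multi-arch manifest list, not just one platform
  echo skopeo copy --all "${repo}:${tag}" "${repo}:${tag}-jdk17"
done
```

Compared to the Docker-only alternative mentioned in the meeting (`docker pull`, `docker tag`, `docker push`), this never downloads the image layers to the local machine, which is why it is so much faster.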
We validated on a simple manual test that it was working; we weren't able to reproduce the issue we had with the two previous versions. Now we are rolling out that new remoting version on every agent template we have, and we'll see whether, at large scale, it keeps working as expected. So thanks to everyone involved on this one. I believe the remoting changelog has been updated for this, but I haven't checked. Hervé, do you remember if it was, or are you muted? If what was? If the bug was reported in the release notes. Not on remoting itself, but I've put a note in the docker-agent repository. I wasn't sure whether the Docker agents were directly concerned, so I just put a notice there mentioning the affected versions, 3176 and 3181. Cool. I believe Docker agents could have the same issue, though; it's just way more visible on the inbound agent because it's used directly by Kubernetes and other plugins. But it could be the case when using the Docker plugin with the Docker agent base image. No emergency, because there is a new version with the fix, so we can do it afterwards; there's no need to rush. Thanks for that, Hervé. Do you have other announcements, questions or clarifications needed on these topics? Nope. So next week we will have a new weekly, as usual. I've already forgotten when the next LTS is... I think that means November 15th? Thanks. So we have plenty of time ahead of us before that new LTS. There is also a release candidate date, if you want to write that down: the first of November. So: release candidate on the 1st of November, release on the 15th of November. Tomorrow we will have a security advisory, plugins only as far as I can tell. Plugins only.
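The "direct connection, no WebSocket, no TCP discovery" mode from the remoting announcement can be illustrated with a sketch. All values here are placeholders, and the exact agent flags are an assumption about the remoting CLI rather than a confirmed reproduction of the bug: in the usual inbound mode the agent discovers the TCP port over HTTP, while in the affected direct mode it is given the TCP endpoint and the controller's instance identity up front. The commands are echoed as a dry run.

```shell
#!/usr/bin/env bash
# Sketch of inbound agent connection modes (placeholder values throughout).
set -euo pipefail

controller="controller.example.com"   # hypothetical controller host

# Usual inbound mode: HTTP discovery against the controller root URL
echo java -jar agent.jar -url "https://${controller}/" \
  -name agent-1 -secret '<secret>'

# Affected edge case: direct TCP connection, no WebSocket, no discovery
echo java -jar agent.jar -direct "${controller}:50000" \
  -instanceIdentity '<base64-identity>' -name agent-1 -secret '<secret>'
```

The edge case is rare for end users, which matches the meeting's observation that only the infrastructure's own controllers were affected.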
So tomorrow we will need to make sure we don't break trusted.ci.jenkins.io, because it will be required for generating the update center with the new versions once the security advisory is published, and for publishing and deploying the jenkins.io website. Which means, Hervé, I'm sorry, but the cleanup operation on jenkins.io will have to wait until Thursday. You can take backups tomorrow and prepare the operation, though. Next major event: we have DevOps World soon. We had one item last week... I don't remember, so let me open last week's notes. I'm not sure what the source of truth for this would be, so let's just write them down and continue, if that's okay for everyone. We also have to talk about the election: voter registration ends on the 5th of November. Yes, good point. The end of nominations is, I think, the 27th of October. So: nominations close on the 27th of October, voter registration on the 5th of November. Have the three of you done your nominations and voter registrations? I think so. I'm registered. Same here. I haven't done any nomination. So please nominate candidates for the different seats; look at the blog post for more information on how to do that. And don't forget: if you have contributed to Jenkins recently — I don't remember whether the threshold is the past month or the past year — please register to vote. Quick question, Kevin: is there an article about the election nominations on jenkins.io? We have something on the carousel, yeah. And just as a note: there's one person nominated for each seat at this point in time, so if no other nominations come in this week there most likely won't be a vote, because there's no point in voting for a single candidate. It could follow the same pattern as last year, where we didn't have a vote because there weren't enough nominations. Okay, thanks for the explanation. Okay.
Do you have other announcements or upcoming calendar events, folks? Yes, there will be a day off in France on Wednesday. Yep. Team availability: on the 1st and the 11th of November there will be no French Jenkins SREs. One of them is also a day off for me, but only one — in Belgium it's a bit different. So expect a bit less availability from the team. Good point: when we pick items for the upcoming milestone, please select the tasks you want to work on carefully, taking your own days off into account. Cool, thanks. Other announcements or calendar items? Stéphane, I know next week you will be off, but in two weeks you will take the meeting lead. Ah, you're thinking about everything I'm forgetting, that's perfect. Okay, maybe I'll take a day off in two weeks. Now, the tasks we were able to finish during the past milestone. I think Robert had lost his access to the credentials plugin. Yes, it was fixed through the repository-permissions-updater settings. Perfect, thanks Baptiste for taking care of this one. GitHub permissions for developers: as usual, those were handled by the bots. A new component has been created on the Jira project for the coverage plugin, so thanks to the person who did that — let me give credit where it's due. Okay, the bot was able to create the component, so thanks Alex for looking into this. There was another Jira component, but that one was archived because it was no longer needed: the developers moved their issue tracking from Jira to the plugin's GitHub issue tracker, so we archived the associated component that was used to categorize the issues. Wrong plugin redirects: following a user report on the Jenkins website, we fixed a few wrong redirections in the wiki system that were minor configuration errors. We located seven like this, including the one that was reported; it has been fixed and deployed. It was a minor configuration error in NGINX — a slash that should have been removed — so thanks to the user for reporting it. We had a user trying to create a new Jenkins community account; I don't remember what the problem was... okay, their account already existed, so I triggered a password reset. Since they should have access to the email address, which was the proper one, they should be able to follow up. I never heard back from them, so I assume it's okay; I tried to contact them privately but never got an answer, so I closed the issue. We no longer have any AWS S3 buckets storing Terraform states. Terraform states are a critical part of our Terraform setup: you can see them as a shared database of which resources are managed, or used to be managed, by Terraform on each of our cloud systems. We have a couple of these states per project, and we separate them physically so that a failure of one does not affect the others. Three of our projects were using S3 for this. As we want to decrease the AWS bills and move away from the CloudBees account, we moved these states to Azure Blob Storage, which has the same properties as S3 and which we were already using for all the other projects, so it was a quick migration. I have to admit the Terraform documentation was really helpful, and it was just a single command, so it worked very well. Any questions?
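The single-command migration just described can be sketched as follows. All names here (resource group, storage account, state key) are hypothetical placeholders: the point is that moving a Terraform state between backends is a backend-block change plus `terraform init -migrate-state`, which copies the existing state from the old backend to the new one. The init command is echoed as a dry run.

```shell
#!/usr/bin/env bash
# Sketch: repoint a Terraform project from an S3 backend to Azure Blob Storage.
set -euo pipefail

# 1. Replace the backend block in the project (names are hypothetical):
cat > backend.tf <<'EOF'
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstates"            # hypothetical
    storage_account_name = "jenkinsinfrastates"  # hypothetical
    container_name       = "tfstate"
    key                  = "myproject.tfstate"
  }
}
EOF

# 2. Terraform detects the backend change and offers to copy the old state over:
echo terraform init -migrate-state
```

Because the state file is just an opaque blob to the backend, no resources are touched by this operation; only the storage location of the state changes.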
Nope. Next task: "plugin site builds commonly fail on infra.ci". That was a long-standing issue. The team was able to fix the OOM kill of the plugin site backend, which is a Java application — the fix landed one or two weeks ago, I don't remember exactly when — and after a week we haven't seen any more 502 errors, so that was the root cause of the problem and we were able to close the issue. Thanks everyone for the work here. Now let's go over the work in progress. I'm trying to introduce a bit of change: I will treat the issues in order of priority. So, an update on the top priority for us this week. Stéphane, Hervé, can you give us a quick heads-up? Last week we said you had to transfer knowledge from one to the other, and Stéphane is taking the lead on the work now. Yes, of course. What I did: I timed the rsync-based synchronization, and then tried some parallelization — first naively, then with the GNU parallel command, which seems very powerful. The synchronization now takes less than a minute, but we need to check that timing against the real process, which runs every three minutes. It looks really, really good. Next is dealing with the triggering of the mirrorbits sync check: I need to read more docs and find out how it works. Hervé, what do you think, was that good enough? Yeah. There is also a new PR that Daniel just opened: it will stop generating "latest" links in every directory, so a bit less data to copy. And I think we are almost ready to retry our live experiment on the crawler job. We need to pair with Daniel, yes, but first comes mirrorbits, if I'm not mistaken — and thanks to Daniel's change removing "latest", there is less data to copy. Thank you, that looks really good. Sorry, I was taking notes and trying to listen to too many things at once — did you mention another element? We are talking about mirrorbits, Stéphane. I did mention mirrorbits: we need to trigger its check of the sync at the end — not the timing, but triggering mirrorbits' verification of the synced files. If I understood correctly, what we need to do requires a bit of Kubernetes work, plus reproducing the issue locally. Did I capture what you said in writing? What you wrote is better than what I said: you mentioned the symlinks and the dereferencing that I forgot to bring up. As a reminder, the updates were sometimes slow to be copied, even though that is the crawler job's role; that's why we removed it, with Hervé. On my side, I was able to reproduce locally the error we have on the new mirrorbits for updates. I think I may have found the issue, but I need to spend a bit more focused time on it, which I won't be able to do until tomorrow — I need one uninterrupted hour, and that time was eaten by the remoting bug. I'm sure we will have something to test before the next milestone... I said that three milestones ago. Anything else to add on that topic? Cool, great job folks. Hervé, are you able to talk? I don't see anything, I've been kicked out, my client is completely buggy, I only see my own face. We can hear you. Are you able to give us... I'll reconnect. Okay, so I'm going to delay Hervé's ARM64 report. Next: the Belnet mirror. It's still disabled; I wasn't able to reproduce the reporter's network problem. I was able to obtain the reporter's IP range, which belongs to a French ISP, and I've asked people working at that company to test on the same IP range, and they are not blocked. So either the reporter's specific public IP is blocked — which would mean they might be doing something weird that led Belnet to block them — or they have an issue on their own firewall blocking outbound requests, and right now we can't be sure which it is. The proposal is that I contact Belnet directly: the person was not answering on the mailing list, but there is an official email address on the mirror system, so I need to contact them. The people I know working for Belnet don't know that area — they work in other departments and don't know who the right contact would be — so they told me to use that address. Let's contact the mirror admins through the public support email. Hard to tell how it will go, but it would be a shame if we had to keep that mirror disabled or remove it. So, ARM64... mirror status link from get.jenkins.io... oh sorry, are you back? I'm back. Okay, time for your report on ARM64 then. Last week I finished the migration of the incrementals publisher and the public NGINX ingress to ARM64. The most sensitive one was the public ingress — sorry, I said private — because it serves every public-facing service we have on this cluster, so quite a big list. It went very well: no service interruption, and the same for the incrementals publisher. I then worked on publishing the images as ARM64 for the other repositories and prepared the remaining Kubernetes config migrations. Among the remaining ones, the difficult case is Uplink, which I think needs a complete revamp and modernization before we can migrate it. I propose to postpone that migration until it's modernized: it's using Node 9 or Node 10, a lot of things are not working as expected, and we need to rework the build process and upgrade everything before doing anything else. Question: you mention Node 9 and Node 10 — is it because there is no ARM image for them? No, I haven't looked at that; I wanted to get it running at all before trying to build ARM64, and I didn't manage to. Okay, because I believe Uplink is updated from time to time; I don't remember when the last update was, to be honest, but it's not that far back. The last commit on this repository is from 2019. The last update of the Docker image is a different matter, because even if you don't change the code, the application itself can change when the image is rebuilt, especially when the Dockerfile installs packages at their latest versions. Do you mind double checking? Because if it builds, even with old code and old Node, yes, if we
can have an ARM64 image, those will be two separate problems. Okay. Next is the plugin site API repository, which builds its Docker image in its own way — it's not using our common pipeline — so it needs more work: I need to understand how to build an ARM image with its current build process. Is that the plugin site? It has been updated at least seven times during the past two weeks, especially to fix the OOM kill, so it should be quite easy to migrate to ARM64, because it follows the same pattern as the other Java applications we already migrated. We don't build it the same way; it's not built the way we usually do for other images, but it works today and will continue to work, which is why we should be able to do it. The reason I'm mentioning it is that I had to build it locally two weeks ago on my macOS machine, which is ARM, so I know for a fact that it works locally. I don't say I agree with everything you said, but okay, next candidate. And that means you need to spend some time on this one — am I understanding correctly? Yes. Cool. I also have other services using images we don't build ourselves, which already have the ARM architecture available, so for those it's just a Kubernetes configuration change: cert-manager, Datadog, ACME. For ACME there is nothing to do, because it's not a deployment per se — it uses CRDs. For Datadog we are checking; the redirector is only an ingress, there is no pod. Just a second on mirrorbits: I've also updated mirrorbits. Oh yes, the public one. I haven't touched the update-center one, because we are working on it; it's not in code yet, but I can update it already. I propose you go for it — the ingress is broken on that one anyway. Is that okay for you, Stéphane? Yes. Lastly, I was wondering about also migrating weekly.ci.jenkins.io. Quickly: that's the intention, but it's not a priority. I never said otherwise, just that it wasn't in the initial plan for this wave, but we can absolutely start discussing it and planning ahead. I'm writing it down as "next", so I don't imply any kind of priority or choice — is that okay for everyone? It means you can start working on it in the upcoming milestone, once you have started all the others and are waiting for reviews or builds. You mentioned weekly.ci.jenkins.io: for it we build a custom Docker image, which is the official Jenkins core image with our plugins installed. We know Jenkins has worked on ARM64 for a long time, so that one is a good candidate for starting to build our own image and seeing how it behaves in production, especially since we don't have remoting on this instance — no agents, no executors — so it should be a really good first step. I think your proposal makes sense, Stéphane. I remember we said we wanted three waves of migration: the first wave was the easy one, where we just had to change the NGINX images; the second wave was moving our custom Docker images, the ones Hervé took over this week. On my side I vote for getting started on building the image and deploying weekly.ci.jenkins.io. Don't you think we should also insert the creation of an ARM64 node pool for infra, so that we can use the same image on jenkins.io and on weekly? It doesn't matter, because I've opened a proposal to build an ARM image as well — and this cluster already has an ARM node pool. I think it's a good opportunity to try our hand with weekly.ci.jenkins.io, to see if there are any issues as a first step, and then create a new update to the same proposal, but for the privatek8s cluster and every other cluster, moving most of our supported services to ARM64. Yep, go ahead. That's for the future, for next year; I just wanted to point out that, as you said, Stéphane, this issue, which initially was only about ARM64, is in fact a big update of all our services — so a really great issue. Agreed. And we still have to find a way to connect to a specific database from the ARM pools, so we don't head down a wrong path: as long as we cannot connect to MySQL, if I remember correctly. We could also switch to PostgreSQL. No, because we need to keep the data we already have; we cannot just switch. We could do an SQL dump and convert it to PostgreSQL. That's difficult, I think, given what's inside the tables. It's not complicated: there are two tables with few fields. I don't really like keeping something... which application are you talking about that uses MySQL? I don't remember which one... it's the statistics one — it's Matomo, sorry. Matomo does not have a PostgreSQL connector, otherwise we would have used it; that's the problem we have. So, ARM64: you mentioned the privatek8s cluster. Hervé, is that okay for you if we start planning it? It's not mandatory to do it this milestone; do it when you're ready. The next step would be adding an ARM64 node pool to be used by infra.ci for agents — for agents, with a taint, to make sure Intel pods are not scheduled on it. Implementation detail — thank you — implementation detail. I propose agents first, so we can start moving Kubernetes management and most of our images. It would be the same process as, for instance, the docker-helmfile image: either we move to the all-in-one image by adding the missing tools, or we just add the ARM64 multi-arch build as you already did for the service images; it's the same build process, so it should be quite easy — for instance docker-helmfile and docker-hashicorp-tools for the terraform apply, and so on. I mentioned it last week, and I was being a bit ambitious, but there is also the VPN VM in ARM64 — I'm thinking long term. I won't have time to spend on it in the coming week, but I'm mentioning it here just to have it written down somewhere. Yeah, that's a nice takeaway; I think you should start sharing it with Bruno, our other colleagues and the community as well. That's the takeaway for ARM64. Is there anything else to add on that
second major topic? Any need for clarification, or is it okay for you folks? Okay, cool, thanks folks. You opened an issue: the mirror status link from get.jenkins.io returns a 404 error, and we need to work on it. Most probably the HTML file is not copied as part of the rsync done by the mirrors; I'm not sure all mirrors have it, but we will need to get our hands dirty on this one and fix it. It's not top priority, but it's not nice to have a 404 error, so if we can spend some time on it... I'm adding it to the new milestone. I think the issue is well written, so no questions on this one. To do: publishing the Jenkins security-scan rules as packages. Daniel Beck had a question about choosing GitHub Packages for publishing the CodeQL elements. I believe I haven't answered yet, but I think you already checked it: his proposal makes sense. jenkins-infra could be the location, since the jenkins-security-scan is already in jenkins-infra. Does that make sense for everyone? Daniel pointed out — yeah, good point — that it's a tool for the whole ecosystem, which would mean migrating jenkins-security-scan to jenkinsci. I don't remember where, but he has already written up how we decide which organization a repository belongs to, and it made sense; I'll try to find it again. The problem is trustability and follow-up: you cannot associate a package in one organization with source code in another. So either we move the source code repository to jenkinsci if we want to publish to jenkinsci packages, or we follow his proposal. The under-the-hood reason is the way GitHub signs packages for traceability: it doesn't cross the boundaries of a GitHub organization, because different private keys are used. Next, to answer and close: work on the JDK 21 version from Adoptium. I took it over, with knowledge sharing from Stéphane to Damien. For the Packer images we were able to deploy JDK 21; that led us to discover bugs, which delayed things a bit, but for the last 24 hours it has been running on ci.jenkins.io. So we are now providing the JDK 21 final release to the developers on ci.jenkins.io, on the Linux virtual machines, the Linux containers and the Windows virtual machines. The next step is the Windows containers, plus all the tools on Puppet; I'm taking that over. So: next steps are Windows containers and tool installation, and then we are done. Thanks for the work — now we just have to do it; it should be done in this milestone. One note here: for the tool installation I will cherry-pick Hervé's work from the jenkinsci docker-images repositories. Written down as a reminder, Hervé, so I won't forget to reuse your work — thanks for this. Stéphane, status on goss? I don't remember touching it this week; can we check the issue, please? I don't remember when I did the last piece of work, but I started the splitting... let's go back to it: one file per tool. Last week we migrated the first ones — Node.js and npm. Yes, that one is done. And you had to make a change, because we were checking the Node version, or the npm one, I forget: on Linux, for Node.js and npm, we check that npm is "latest" — we cannot pin the check to a version, because it depends on the Node.js installation. I remember I have one PR open for the other tools. Let me write this down: Linux, other tools, PR open, pairing needed. So please spend time on this one if you can, but the priority was the update center, and it will probably compete with your work on Windows. Yes: Windows, PR ready for review. The goal is to use goss for Windows too; right now my initial implementation uses a separate file, so we have code duplication. The idea is to bring the Windows side up to the same feature set as what you are doing on the master branch for Linux — the JDK, Trivy, and I don't remember the other tool, I believe it's Node.js. Then the goal for you will be either to move everything on Linux first, then on Windows, then factorize — or to factorize now, whichever path you choose. We'll check, we'll test: factorize and move everything to goss. Speeding up the tag creation for the Docker library image: that one is on hold, am I correct? We did some TDD together, I think, but I haven't seen the tests yet. I did some, I remember, but I spent most of my time elsewhere, sorry. No problem. Is it okay if we spend some time pairing on this one during this milestone, so I can take over during the next one, since you will be off? I will be off during this milestone, Monday and Tuesday. Ah, okay, I had heard the 1st and 2nd of November. So is it okay if we pair this week, so I can take over for the end of the milestone? Noted: TDD started, hand-over to do this week before Stéphane is off. Any questions? Nope. Infra: the Docker rollback... oh yeah, the Confluence data tag discovery, about a week ago. Good. Upgrade to Kubernetes 1.26: changelog being read, I'm halfway through — is that okay for everyone? AWS cost decrease: I've updated the issue, because we removed the AWS S3 buckets. I also mentioned there that, since we are almost at the end of autumn, we still have two virtual machines to migrate away to smaller machines. I'm moving this one back to the backlog, but I might spend time on one of the two services. So: back to the backlog, two VMs to be migrated. I want to check what these services are actually doing, because maybe we could migrate them directly to Kubernetes instead of migrating the virtual machines. Matomo: I admit I gave up during this milestone, no work done on my side. Gavin gave us details; I need to get back to it. The main issue is that there is still that problem on AKS where we cannot reach the MySQL database from the ARM64 nodes, which is just a bit annoying, so we'll see what we can do. Hervé, do you think you can spend some time on it, or do you want us to, or is it a bit too much right now? We can — let's keep it in the milestone. Hervé, work on the pages to remove from jenkins.io: no progress since last week. I wanted to be sure of the backup and the backup recovery before doing the switch. Hervé, I don't remember if you had something planned — can I let you continue while someone takes notes? Sorry, there is some background noise at my house, so I need to check. Continue, Hervé, and someone else is writing notes, so we can plan it for Thursday. Nothing more — so, planning the jenkins.io cleanup for this Thursday, okay, cool. Tomorrow is the security advisory: do you need help on this one — reviews, anything we can do to help, Hervé? Should be good; I'll probably ask you for a check before doing the switch, but first I will verify my backup process. And the Jenkins documentation... okay, so I'll let you announce it as soon as you can, and then we can check all the mentions tomorrow. Cool. Finally, planning for the supported JDK versions, to do: removing JDK 19. No one on the team was able to continue this work; I believe I will be able to do some of it, since I have to work on JDK 21 anyway — unless someone wants to help, in which case don't hesitate to send a pull request. The JEP is still under review, but I don't think Mark needs any more input from us, so it should be okay to continue without writing anything further, because the JEP will record what we wanted to do. Any questions? Do we have new issues? Nope. Do you have something else to add, folks? One thing: retiring the Chinese jenkins.io website — I will add the link later; we need to sync to get started on it. Okay, I think that's all. Folks, I need to switch meetings, so see you next week for the people watching this, and see you later or tomorrow. Thanks, bye.