Hello everyone and welcome to the Jenkins infrastructure weekly meeting. Today is March 7, 2023. Around the table we have Damien Duportal, Hervé Le Meur, Mark Waite, Stéphane Merle, Bruno Verachten and Kevin Martens. We are six today; that's a good start.

Announcements: no weekly release planned for today. The plan changed yesterday and the weekly has been moved to tomorrow, because there has been a pre-announcement of a security advisory and the weekly release is part of that advisory. So instead of building the weekly today, it will be done tomorrow, as requested by the Jenkins security team. Did I miss something? Nope. I thought it would have been skipped. No: the release pattern is that we deliver a security release for the weekly at the same time as the LTS, so that weekly users are not forced onto LTS. The weekly is considered as valid for production use as the LTS; it's just a matter of which release pattern you choose. As far as I understand, where the LTS is concerned it's a backport from the weekly. Correct; that's the general pattern: the LTS is always intended to be a backport from the weekly, and the weekly is the master branch in the case of Jenkins. Most of the time, including in the hidden repository the security team uses, the pattern is the same as for fixing any nasty bug: first you fix it on the master branch and make sure the weekly (at least master) has a green CI and no longer suffers from the issue, whether it's a security problem or just a bug. Then you consider whether it has to be backported to the LTS lines, which were forked from previous weeklies, and you cherry-pick the commit if possible; sometimes you need to adapt things as well, and that can be a complicated process (a minimal sketch of that backport flow follows below). So tomorrow will be release day: all the releases where those backports land, including the weekly branch, will happen. Tomorrow is also the day of a new LTS.

So, about announcements: no weekly today, and a reminder that tomorrow will be a big day, with a security advisory, a new LTS, an update of the previous LTS line and an updated weekly. I wonder whether some plugins are also concerned by the security advisory. I don't know, let's check what the pre-announcement stated: weekly, former LTS, new LTS. It sounds like no plugin is part of the announced advisory, judging from the public mailing list.

Do you have other announcements? Actually, I have a question. Over the weekend I upgraded plugins on ci.jenkins.io; remind me how the ci.jenkins.io plugins are managed. Was it okay to do it from the user interface, or should I have done it through a configuration-as-code file? No, there isn't any configuration-as-code for that, so what you did was fine. Thank you.

So the next weekly should happen next week, the 14th of March. I don't remember the expected number; I assume 2.39-something. Next week, 2.395. Correct, thanks. The next LTS happens tomorrow: 2.387.1. Next security release: just in case you hadn't heard, there will be a security release tomorrow.
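As an aside on the backport pattern described above, here is a minimal sketch of how a fix merged on the weekly (master) branch can be cherry-picked onto an LTS stable branch. The branch name and commit id are placeholders; this illustrates the flow, not the security team's actual tooling or repository layout.

    # Ensure the fix is merged and green on master (the weekly line) first
    git fetch origin
    git checkout master
    git log --oneline -1 abc1234      # abc1234: placeholder for the fixing commit

    # Then backport it onto the LTS branch that was forked from an earlier weekly
    git checkout stable-2.387         # placeholder LTS branch name
    git cherry-pick -x abc1234        # -x records the original commit id in the message
    # Resolve conflicts if the LTS line has diverged, then push so CI validates it
    git push origin stable-2.387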
And the next major event: SCaLE was last week, right? No, SCaLE is this Friday, Saturday and Sunday. Oh, nice. So I'll be out Friday, Saturday and Sunday, and then trying to recover a little bit on Monday. And we have Devoxx France, in April. Do you have other major events in the pipeline where you can meet teams or contributors? Nope. Okay, then let's proceed with the work.

First of all, which tasks were we able to finish during this milestone? I'm taking them in the order on my screen, not in order of priority. The Azure service principal credential for the current publick8s AKS cluster would have expired tomorrow, so we were able to rotate the credential and update the cluster. That required an announcement because the LDAP and release CI services had to be restarted, and that can take from one to five minutes; the main reason is the time required to unmount the data volume in the cluster and mount it on the new virtual machine. Everything went flawlessly; it took about 30 minutes for the system to perform the rolling updates. Were any outages noted, meaning did we get any complaints from users saying "hey, I've been broken by this or that"? I haven't seen any. We sent email messages prior to the operation; it could have been one day earlier. We didn't see any complaint, but for sure there was some outage: Stéphane and I saw that ci.jenkins.io was already under heavy load, so the amount of requests queued while reconnecting to LDAP took a toll on the performance of the system. It was still working as expected, just a bit slower during those five minutes. I haven't seen any other complaint, but it might have had an impact; as Daniel pointed out, repo.jenkins-ci.org is the one that could be impacted.

Next issue: attempts to skip the artifact caching proxy failed. That issue was opened by Basil and has been closed. There were two main sub-issues, which you can see below. The root cause of the first one was the 502 Bad Gateway errors due to the persistent data volumes of the ACP instances: some of them were already full at 50 gigabytes, so we had to increase the size of these persistent volumes. That was an opportunity for us: since the data inside these persistent volumes is only caching data (we use persistence just to avoid re-downloading everything all the time), it was a chance for the team to exercise live data migration and resizing on our clusters. We learned a lot of things, which were written down on the associated issue; there are many tricks to make sure we don't have any downtime. The operation happened without any downtime, except for the Azure ACP, which was already full but still able to serve requests read-only (see the resize sketch after this section). The second part was an issue in the way the "skip artifact caching proxy" label was processed by the pipeline library; that was fixed very quickly to unblock builds. These two separate issues are closed, and we were able to confirm yesterday that every build reported as broken by one or the other issue was working as expected again, including the failed plugin builds due to a missing Windows configuration file, which was the same issue. So nice work everyone; that was again a team effort, because we were three and there were many, many tiny things to fix. A lot of work for a single day.
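To illustrate the persistent volume resize mentioned above, here is a minimal sketch of expanding an ACP cache volume online with kubectl, assuming placeholder namespace, claim and storage class names and a storage class that allows volume expansion. The real procedure and its caveats are the ones written down on the associated issue.

    # Check current claims and whether the storage class supports online expansion
    kubectl get pvc -n artifact-caching-proxy
    kubectl describe storageclass managed-premium | grep -i allowvolumeexpansion

    # Request a bigger size; the cloud provider resizes the underlying disk
    kubectl patch pvc acp-cache -n artifact-caching-proxy \
      -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'

    # Watch until the new capacity shows up in the claim status
    kubectl get pvc acp-cache -n artifact-caching-proxy -w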
Note that we opened an issue to track the work on monitoring the usage of these persistent volumes. I'm not sure whether we already have this in Datadog and it's only a matter of adding an alert, whether we have to add a probe, or whether we have to do something else, but we need to do it. Not immediately, because we increased the volumes, but that's something to keep in mind. I'm a little surprised at a 50 gigabyte volume going full; I checked my local Maven repository and it's only 25 gigabytes, so obviously I'm not as representative of the usage across the whole infrastructure as I thought I was. Yeah, and now that the backend extension indexer is using the artifact caching proxy, I think most of the plugins are cached in this mirror. That may hint that the artifact caching proxy is probably approaching the size of our data volumes on repo.jenkins-ci.org; it's almost becoming a full and complete cache, thanks to things like the backend extension indexer that builds every single plugin, tragic as that may be. I don't know if the HPI files are all there. For sure it's still smaller, because we don't have the whole history of all the former dependencies.

Last week I was off most of the week. The issue of the frequent PagerDuty alerts was back again and was taken care of; good catch, folks. It looks like on Azure, the Windows Server templates, when given a big disk, create both C and D drive partitions inside the disk. We thought the issue was fixed by increasing the size of the disk, and it was, except that the C drive used for the Jenkins builds was still 30 gigabytes or so, and there was a big empty D drive partition with a lot of gigabytes unused. So the image we were using as a base was a template with a small disk. Yes, good point: it's not the current template we use. It used to be the one with the small disk, which forced the creation of a small system drive and then automatically created the D drive extended to the size of the whole disk. I have absolutely no memory of why I chose that small-disk template, and I'm the one responsible for it. So thanks, folks, for going through the pain of analyzing this, because it wasn't an easy one.

Just a reopening and reclosing of the Maven 3.9.0 issue: as pointed out by the team, infra.ci, used for the backend extension indexer and possibly other jobs, was still using a Maven 3.8 version. We have an automation for updating to Maven 3.9.0; we are currently debating and nitpicking about whether it should be synchronized with Puppet or be autonomous. Almost there; it's a good sign when you're down to nitpicking.

The next one was a user asking for a change of the email associated with their account. They were able to give enough proof, and the risk was quite low because that person doesn't maintain any plugin. We also checked the email exchanges that person had with us, so the amount of proof was good enough to trust that change. Thank you for taking care of that person.

Remove the Docker build for the Jenkins PCT repository: thanks, team. The team helped Basil clean up a former usage of Docker in that repository; nothing related to the infrastructure as far as I can tell.

Jenkins core release, disable weekly release: that was a request from the Jenkins security team. We used the issue to have an audit trail, and we took care of not updating the configuration of release.ci until today; it should stay that way until tomorrow.

And finally, "unable to log in".
Yeah, that's the classic case of someone opening an issue with an account problem: we don't know their username, we don't know the email, they just didn't fill in the form. So I closed the issue because it wasn't filled in as expected.

That is the completed work; now the work in progress. I tried to categorize the issues, because we have an aggregate of issues under the same themes. We still have account-related issues. Someone cannot access the account for a plugin they want to maintain, a plugin that hasn't been updated in months. That one is a bit more sensitive. We asked for proof from the Atlassian organization side. The person says they are part of the GitHub organization in question. Even if they can prove they are behind that GitHub account, I don't mind, but the email should still be valid inside Atlassian, so they have to contact their internal IT, because this is about taking over the maintenance of a plugin that hasn't been updated in months, if not years. So unless someone objects, we will keep it that way: we have a first level of proof for GitHub, but we need another level from Atlassian's internal teams. Any objection on this one? No objection. Okay, so that one will magically move to the next milestone and we'll have to keep an eye on it. If no one objects, I will write the answer, but I will need your help to follow up on any feedback. Looks good.

We have one issue which is not a direct action expected from the Jenkins infra team, so thanks, Basil and team, for taking care of discussing with Ian about migrating one of his plugins into the Jenkins organization. The issue is more of an audit log for the whole community and not really anything for us, because it requires jenkinsci organization administration rights, which we don't have. Any questions or anything unclear there? Is there a way we need to flag that to one of the jenkinsci GitHub org admins, or is it already done? Oh, Tim says he's got it and will take care of this issue, since he was one of the original people asking about the location of this plugin in 2019.

Then we have the expired credential issues, still open. During the past 10 or 12 days, almost all the technical credentials we use from Jenkins to whatever service expired. Mainly "EC2 agents are not available", opened by Alex. EC2 machines are working today on ci.jenkins.io, as far as I could see. Stéphane or Hervé, did you rotate any credential on AWS? I don't think so. Okay. So if no one objects, that one will move to the upcoming milestone, and we'll have to check on AWS, which is still a CloudBees account: one of the CloudBees members of the team will have to check the AWS IAM credentials associated with ci.jenkins.io and see if they need to be extended or rotated (a sketch of such a check follows this section). So that one should move to the next milestone.
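For the AWS check mentioned above, here is a hedged sketch of reviewing and rotating the IAM access keys that ci.jenkins.io uses with the AWS CLI. The user name is a placeholder, not the real account layout.

    # List the access keys and their creation dates for the (hypothetical) IAM user
    aws iam list-access-keys --user-name ci-jenkins-io \
      --query 'AccessKeyMetadata[].{Key:AccessKeyId,Created:CreateDate,Status:Status}' \
      --output table

    # If a key is close to expiry: create a new one, update the Jenkins credential,
    # verify agents can spawn again, then delete the old key
    aws iam create-access-key --user-name ci-jenkins-io
    aws iam delete-access-key --user-name ci-jenkins-io --access-key-id AKIAOLDKEYID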
"trusted.ci doesn't spawn new nodes": that one was fixed. It was the ability to connect to Azure from trusted.ci, which runs on AWS. Stéphane and I were able to rotate the credential and apply it, but we wanted to start managing these credentials as code in our Azure Terraform project, so that we at least have an audit log of when we rotate them. The issue we hit is the permission model we use: we don't want the technical user managing Terraform Azure resources to be an administrator of the organization, so that if that account is compromised it cannot access billing and a lot of other things we want to limit. The consequence is that the permission model is a bit more complicated than we thought. Right now we are able to manage what Azure calls the application, which is the abstract concept of "I have a technical user", but the part requiring a credential associated with that application, the service principal (the user inside the application), is a bit trickier. The last status, before all these incidents and days off, was that at least part of this object is already managed and only the credential has to be managed manually in the UI. We want to close that issue only when we are able to manage everything. Right now we are using a temporary credential; that's why the issue must remain open. That credential is valid for one month, so we will have to come back to it.

We have the same thing for cert.ci. It's exactly the same, except that in the case of cert.ci we want to switch to no credential at all, using a capability named workload identity, which is possible because cert.ci runs inside Azure itself. We tell the system, via Terraform or directly in the Azure UI, that any request sent from that virtual machine is associated with that account; no need to store a credential inside Jenkins. It was confirmed that this should work with the Azure VM Agents plugin inside Jenkins, so we need to validate that assumption. It's a two-step process: first, start managing the cert.ci virtual machine with Terraform; then create the workload identity and test it. Same thing: the issue remains open because the current cert.ci credential, generated manually to unblock the Jenkins security team, is only a short-term credential. It expires next week, so we will have to work on this soon.

We had the same for ci.jenkins.io, and I forgot to track it, so let me add a comment here; if no one objects, I will open an issue. Its Azure credential also expired during the weekend. That's the issue Mark mentioned a bit earlier that he tried to fix: it wasn't clear from the error message whether it was related to the way ci.jenkins.io works or to something else. It was something else in that case. Thanks, Mark, for taking care of it. Let me write this down. I did exactly the same: rotated the credential, inserted the new one, restarted the whole machine and checked that everything works again. So: ci.jenkins.io Azure credential. The idea is that if workload identity works for cert.ci, we should do the same for ci.jenkins.io, which is also an Azure virtual machine and therefore also a candidate for workload identity, meaning no credential and nothing to rotate (a rotation sketch follows this section). Any questions about these credential rotations? The good point is that our calendar reminders worked as expected, but we still had too many things to deal with this week to cover all of them: some were handled before they expired, and for some we were surprised it happened so quickly. So sorry for the inconvenience; there is a path for improvement there as a team.
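As a reference for the rotations discussed above, here is a minimal sketch, with a placeholder application name, of rotating an Azure service principal secret with the az CLI. It is an illustration, not our exact runbook, and the long-term goal remains to not need the secret at all thanks to workload identity.

    # Look up the application id by its display name (placeholder name)
    APP_ID=$(az ad app list --display-name "terraform-managed-app" \
      --query '[0].appId' -o tsv)

    # Generate a fresh client secret, valid for one year
    az ad sp credential reset --id "$APP_ID" --years 1
    # Copy the returned secret into the Jenkins credential, restart the service
    # that consumes it, then confirm agents can be provisioned again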
Mark, next: the Jenkins code signing certificate issues. We have two open issues right now. What is the status, did you have any news about the DigiCert renewal? Apologies, I have the action item to send them a message. I've received no response to Stéphane's and my attempt, and I haven't chased them yet, so I will do that today. Sorry about that; I've got to raise it with them. We now have what, 21 days or less before it expires? Okay. Do you mind adding me in CC on all the related issues, now that I'm back? Stéphane, is it okay for you to keep working on that with Mark, with me as a fallback for Mark? Of course. And I remind you that I will not be here next week. And I'm out Friday, but I'll keep multiple people copied; that way more than just Stéphane and I know where things stand. Then I propose to add the jenkins-infra-team email, which is a private mailing list with the whole team, so everybody will be aware; I assume everybody was already on it. Yes. Okay, more news next week. If it's okay, this issue will keep moving from milestone to milestone, so I will add it to the next one.

Now, the ACP-related issues. The top-level one is reintroducing the artifact caching proxy on ci.jenkins.io, repository by repository. Hervé, do you have a status on this one? It's being handled per repository. Next in line is the Javadoc repository; I have to finish it. Okay. Which one did you finish or work on since last week? I have to check on the issue. I believe the backend extension indexer one was done; the pull request was merged, right? Yes, but I don't remember exactly when. Backend extension indexer, okay. And the pipeline steps generator? Next week; the previous one was last week. Okay.

I have a question here: were we able to check the impact on JFrog? The underlying question is that we don't need to cover every repository and force it onto the ACP if those repositories don't have any impact. And the data says that the artifact caching proxy has had a very positive effect: we were previously in the top five consumers, and we're no longer anywhere near the top five. I didn't check how far down the list we are, but we're not top five anymore. So that means we should continue working on this, but, and this is really important because of the time spent, some of it might not be worth the effort. In this case it was. I think it's less important now that we are already further down the list; as Stéphane said, if we needed to use some of Hervé's time elsewhere, that would be okay. Right now we've received significant benefit from what's been done, and we're probably going to have to do other investigations to understand how to get more reductions beyond the proxy. One major benefit of Hervé's work on these, let's say exotic, repositories is that he found that the way we retrieve remote objects uses URLConnection and other old Java forms of HTTP clients, hitting repo.jenkins-ci.org directly. So finishing this work will also prepare the future for increasing the Java version from JDK 8/11 to 17 or even more, which would otherwise have been a blocker.
So at least it's a kind of cleanup project; that's the value I see there. I absolutely defer to you: if you see that it's useful to continue on these elements, please go ahead, your time is well spent on that. I think it's still worth the effort; the ACP bandwidth metric was the top priority, and now that it has decreased it's still important to keep at it. If no one objects, let's continue with one repository per milestone, so you can spend your effort on other major tasks but still keep this one as a regular item. Is that okay for you? It can be more if you feel like it, as long as it doesn't get in the way of other priorities. Looks good. Cool.

Still on the topic of ACP and JFrog: repo.jenkins-ci.org. Today's hiccup on publick8s with the LDAP restarts shows that we need to find a way to have a highly available LDAP. Still to do. I started yesterday to reproduce this: a local LDAP with the set of test data that is already in the OpenLDAP image, and I'm trying to run it with two pods, with a read-only replica. I want to run it on a local Kubernetes cluster with a Jenkins instance to see how it behaves first, before trying to install it on a new cluster. I mainly want to see how to fine-tune the detection of when one of the replicas goes away, because of maintenance or a crash: how does it behave, and how much time does it take before the load balancer is able to switch to the other one in the context of Kubernetes. And the other one will handle writes. Yes; there is no need for more, since the only writes come from accounts.jenkins.io. The idea is that every instance such as ci.jenkins.io, issues.jenkins.io or even the JFrog repository would use a domain name that always points to a valid read-only replica, or at least an instance that can serve reads, while accounts.jenkins.io would use another domain name that always points to the write instance (a sketch of this read/write split follows this section). In effect, the main stress on the LDAP is read-only stress; the write load is really low and can be handled on the side. Exactly. And we can accept that accounts.jenkins.io is down for five minutes, the time for the write instance to be restarted. Right now this is not a problem, but it will become one if we need to enable authentication, which is the next topic.
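To illustrate the read/write split described above, here is a minimal Kubernetes sketch with assumed namespace, labels and service names: one service load-balances read-only traffic across every LDAP replica, while the other always targets the single writable pod. Replication, failure detection and load balancer timing are exactly what is being prototyped locally.

    kubectl apply -n ldap -f - <<'EOF'
    apiVersion: v1
    kind: Service
    metadata:
      name: ldap-read            # ci.jenkins.io, issues, JFrog would point here
    spec:
      selector:
        app: openldap            # matches every replica, read-only ones included
      ports:
        - port: 389
          targetPort: 389
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: ldap-write           # accounts.jenkins.io would point here
    spec:
      selector:
        app: openldap
        role: primary            # only the writable replica carries this label
      ports:
        - port: 389
          targetPort: 389
    EOF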
It sounds like, Mark, given the top consumers you saw, we might still have to go down the path of eventually enabling authentication on the mirror repository. We may have to, although I had a conversation with Basil Crow and he challenged that: he would like to see more data extracted from the reports to confirm that conclusion. What I saw was lots of requests, from a few IP addresses high in the list, to the Maven repo1 cache. Basil's point was that we need to understand whether those requests to the repo1 cache are in fact generated by Jenkins-related activity, or whether they are just someone grabbing a copy of repo1. I think he's got a good point, and there are ways to answer that question, but it will need some further digging in the data, because we don't want to enable authentication if it's not actually going to help with the problem. Right, and if they're doing Jenkins development work, then authentication is really not going to change it. That makes sense, especially if the Jenkins activity is something we could fix in a parent POM or in documentation. Or whatever, right; if that's the case, then we need to look elsewhere to find ways to reduce the bandwidth demand. So we need to challenge JFrog on that part, and I believe we may have to talk to Stephen Chin and Laurie again first, to give them a status showing the effort we made; what do you think, Mark? I think it's not so much that we need to challenge them as that we need to do the data analysis and bring it to them: here was our usage before, here is our usage now; here were our key consumers before, here is the reduction of those key consumers now. And that brings up the dismaying point that the largest single consumer is still the largest single consumer, and we're still working with them on what to do about that. It appears to be a simple abuse case, or misuse; abuse is such a strong term. So, focusing on the main consumers: yes, but for that main consumer we cannot do much without JFrog. Right, and that's why we need that conversation with JFrog: we've got this large consumer, our attempts to identify them have failed, our attempts to appeal to the abuse-reporting organization of their ISP have failed, and we're sort of out of options, because we don't control the networking endpoints of that service. Is there anything else directly on the JFrog topic? I may need further help, and I'll ping the infra team separately, to make sure the IP addresses in the report are not ours, because there was one I saw in the report, from Digital Ocean in Germany, that may in fact be one of ours. I'll look at the most recent data, which just arrived today for the last 12 days, so I have a great excuse to do some more data analysis on the public IPs in that list.

An issue opened yesterday, quite recently: a user mentioned a problem with a cached repo1 artifact in their builds. I saw you self-assigned it; what's the status? I don't know how to resolve it, because the repository only has versions of this plugin up to the "-atlassian-2" suffix, not up to "-atlassian-6", and I don't know how it could be resolved. Do you want to look together at the JFrog repository administration, so we can check whether that repository is correctly mirrored, whether there is any error, whether it's excluded? You say we have it on repo.jenkins-ci.org, is that correct? Not the latest version, only up to "-atlassian-2", and they are requesting "-atlassian-6", so I don't know how to resolve that. Okay, so we need to check the difference between our repository that mirrors Maven Central and Maven Central itself (a sketch of that comparison follows this section). That one definitely goes to the next milestone. No other questions or topics about ACP or JFrog?
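As one way to approach the mirrored-artifact question above, here is a hypothetical check, with made-up artifact coordinates, comparing what the repo.jenkins-ci.org mirror serves against Maven Central itself.

    # Placeholder coordinates; substitute the real groupId/artifactId/version
    ARTIFACT="com/example/some-library/1.2.3-atlassian-6/some-library-1.2.3-atlassian-6.jar"

    curl -s -o /dev/null -w "mirror:  %{http_code}\n" \
      "https://repo.jenkins-ci.org/public/${ARTIFACT}"
    curl -s -o /dev/null -w "central: %{http_code}\n" \
      "https://repo1.maven.org/maven2/${ARTIFACT}"
    # A 404 on the mirror with a 200 on Central would point at an exclusion or
    # synchronization problem on the JFrog side rather than at the caching proxy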
Other issues: we had the hanging agents, especially the Maven 17 ones but not only those, over the past two or three weeks, as reported. We had issues, but we also had to closely monitor the capacity of ci.jenkins.io to process builds, particularly the BOM builds, which are again more and more frequent. I assume the LTS and security advisory might have an impact: with two or three core versions involved, that increases the number of possible builds and plugin combinations. The good thing is that thanks to the work of Stéphane and Hervé, we were able to increase the capacity of ci.jenkins.io, which used to be 150 agents, as you can see on the plateau from three weeks ago: 116 Linux container agents, with the rest being other kinds of agents, usually virtual machines or Windows containers. Now we have reached 300. We have the same workload capacity on both Digital Ocean and Amazon; Digital Ocean allowed us to increase our limits. We haven't checked the impact on spending yet, so we'll have to monitor that closely on Digital Ocean. We can also start studying, a bit more closely, whether we could vertically scale each Kubernetes node: right now each node can host three agent pods at a time, given the memory and CPU limits we use. We could study increasing the size of those machines; given the more and more frequent builds, that could let us keep the same number of nodes while handling more capacity.

About that issue: initially I said we should close it once we were able to define two workload pools, one for the BOM builds and one for the other plugin builds. Unless someone objects, I will close it, because the initial problem is fixed, and open a dedicated issue about having separated node pools, since we don't know when we'll be able to do that. We partially solved the problem by increasing the capacity and adding a way to get metrics. In particular, Hervé was able this morning to add a public link on ci.jenkins.io so that any developer can directly see the current workload. If you go to ci.jenkins.io you can check it even while logged out; let me try in real time: I'm not logged in, so it's public information, and you can see a diagram. The colors are explained in the legend: the grey line is the number of builds in the build queue waiting for an executor, and the green one is the number of online executors. You have three time spans. The short one is the past hours, so you can see that the build queue decreased and then increased again; we currently have 300 builds in the queue. Then there is the long one, but that only covers the period since the last restart. So that's a first step to give some visibility to developers: if their build is slow, they can check this and see "oh, there are currently 300 builds in the queue, that's why I'm waiting." The next issue will be to separate the workload capacity between the BOM and the plugins: as you can see today, a lot of plugin builds end up waiting for the BOM builds to finish, so we want to separate these two kinds of usage so we don't block plugin developers. Is that clear? Any question, objection, anything unclear on that topic? Clear, thank you. So better to close this one and open a separate issue to split the workloads.

One major item that was solved as an emergency: the update center job was failing, due mainly to yours truly. We have a dependency on a command named blobxfer, which is a kind of rsync for Azure blob storage. When we worked on the Let's Encrypt updates to support Azure DNS, it broke the Python setup. The machine pkg.origin.jenkins.io is currently used for synchronizing plugin updates: on trusted.ci, an agent connects to pkg to run the synchronization and push to the mirrors. That blobxfer command was installed manually and not managed, so we missed the point where it got broken.
So we were able to fix it by playing around with Python, and once we fixed that on the machine, it was able to run again. Before closing the issue, we still have to track the installation of blobxfer, which we moved out of the shell scripts into a requirements file, so the developer of the script can now control the version that will be used. We now need to tell Puppet to check that requirement and install it if needed. Hervé has opened a separate issue to ensure that blobxfer gets replaced everywhere by the newer az command line, which doesn't depend on that fragile Python setup, is easier to install and has way more features; the latest version of blobxfer dates from September 2021 (a sketch of such a replacement follows this section). In the meantime, we have different blobxfer versions in use, from 1.6 to 1.11; we should have 1.11 everywhere as soon as possible. That's the definition of done before we can close that issue.

Which leads me to another one that you opened, Mark; it's a consequence of that issue. During the two or three days while the update center was failing, some plugin updates were missed by the synchronization: the plugin is tagged and released, its HPI file is uploaded to repo.jenkins-ci.org, but it's not available on the download server. We might have missed others beyond the reported one; it's a direct consequence of the update center outage. We need to run a manual task to correct these missing plugins. Stéphane and I tried a new method from Daniel that tells the update center to look back not over the past six hours but over a much longer window, two or three days; I don't remember exactly what we used, 49 hours, and it sounds like that window wasn't big enough. So we will have to run it again with a bigger window. There is also a manual procedure, the former method, described in the runbooks (private documentation): run the update center project locally on our own machines, so we don't interfere with trusted.ci and the current update center. That operation should generate a JSON file with the list of missing plugins; we can then, as part of the procedure, upload that file to pkg.origin and run the synchronization script once, concurrently with the current one. That should fix at least this case. So this issue definitely goes to the next milestone as a priority. Any questions? Thanks very much. Would it help to have a list of exactly which plugins were missed? Actually, the process you were describing sounds like it will generate that list and then synchronize them. I could read the old email messages I get from repo that tell me what's been released, but that would be me doing a good job of reading, and it sounds like the tooling will do a much better job. If it's easy and doesn't cost you time, and only then, it could be an additional way to check that the list of plugins we are fixing matches what you see. It's not easy; I'd have to dig through my trash. Then let's forget about that one; the tooling should be enough.
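For the blobxfer replacement mentioned above, here is a hedged sketch of what an equivalent upload with the az CLI could look like. The storage account, container and paths are placeholders, and the real scripts would need the same incremental-sync semantics that blobxfer provides.

    # Upload a local mirror directory to an Azure blob container (placeholder names)
    az storage blob upload-batch \
      --account-name examplestorageaccount \
      --destination mirror \
      --source /srv/releases/jenkins \
      --auth-mode login
    # --auth-mode login lets a managed or workload identity authenticate the call,
    # so no long-lived storage credential has to live on the machine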
Hervé, we had this one on hold because of all the activity and incidents, but as far as I can tell we are back on track to continue the migration of the privatek8s and publick8s clusters. Can you give us a status on what you should be able to work on around that migration? There are two or three bot services we can move first: the IRC bot, the Twitter bot and the GitHub comment-ops bot. We can announce it and then proceed the next day. I think there is no harm in shutting them down while we are migrating them; the services will not be gone, they're just moving from one cluster to another, so there will be a brief, temporary downtime. Exactly. Although the Twitter API isn't available anymore without paying for it; the bot is still running, so the Twitter one might disappear anyway. And then release.ci.jenkins.io will be the next step. I will let you write down on the issue what you plan for that, once you're finished with the bots. Is that okay? Cool.

A few more issues now. Create an updatecli manifest to update the Kubernetes quota depending on the CI configuration: that's the follow-up to increasing the capacity of ci.jenkins.io. We have two locations where the maximum number of agent pods per cluster is defined, one on the Jenkins configuration side and one on the Kubernetes quota side, and we need both in sync so it doesn't behave unexpectedly (see the sketch after this section). We need automation: once we change the value on the Jenkins side, a pull request should be opened proposing the matching quota change. So that one is to be done.

Grant committer access on release.ci to some security team folks: that one is hiding the fact that the RBAC model of release.ci is too simple and we should add a layer of protection. The tricky thing is that today this means admin access to the instance, whereas people who should be able to trigger releases, or who are part of the release team, should only have the ability to trigger builds plus read-only access, even though that instance is behind a VPN. The concern is that we don't want release leads to be able to alter the configuration of the system, and we don't want them to be able to access the credentials either, because those credentials carry risk: the more people who have access, even inside the private network, the more people could have a compromised machine that then authenticates to release.ci and tries to extract secrets. It's not about trusting people; it's about the credentials that can be stolen from them. Good point; thank you, I hadn't considered that aspect. So thanks, Daniel, for putting this on our radar.
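As a picture of the two values that must stay in sync, here is a sketch, with assumed names and numbers, of a quota on the agents namespace that mirrors the pod cap configured on the Jenkins Kubernetes cloud. The planned updatecli manifest would derive both from a single source.

    kubectl apply -n jenkins-agents -f - <<'EOF'
    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: agents-quota
    spec:
      hard:
        pods: "150"     # must match the container cap set in the Jenkins cloud
                        # configuration, otherwise raising one limit alone leads
                        # to unexpected scheduling behaviour
    EOF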
Out of space on ci.jenkins.io agents: none of us was able to start diagnosing this one. Mark? It hasn't occurred again, so I was surprised; I don't understand why, and I'm not overly worried about it. Okay, if you don't mind, I will make sure that build is kept forever so it won't be cleaned up and we can analyze it afterwards; it might already be the case. I'm clicking "Keep this build forever". Damien, could you also update the description of that build, so that people like me, who have a tendency to delete old things, will be reminded? I never used that, but I can do it. Actually, let's not have you do it, because I worry about showing something authenticated on a recorded session; let me do it, I can happily do it, no problem. Thanks. We will have to analyze this one to see what happened, what the message is, and so on, because it depends on the kind of agent: if it's the container agents that are filling the disk, then it might not be easy to solve. And the reason I'm a little worried here is that we've just recently added the AWS SDK plugins as dependencies managed by the Bill of Materials, and the AWS SDK plugins are huge, so we could in fact have increased our disk usage. That was my only concern there. Makes sense. All right, so on that one "Keep this build forever" is set and I will add a note.

We did upgrade everywhere to 0.46, I think, and that version puts the indentation back to the correct two spaces. We still need to check whether it repairs the ones that were broken with the wrong indentation; I've not seen any bad ones since then. Okay, do you mind, after this meeting, just giving a status as a comment on the issue, so we won't forget in one week? Yes; that should solve the issue, and if the problem is fixed you can close it. Of course. In fact, we will see on Tuesday or Wednesday evening with the new LTS, because it will trigger a new run on Kubernetes. No problem; as soon as you have checked, explain it on the issue.

Replace updatecli: that's you, what's the status? About five minutes before this meeting, I had a green run of my version of updatecli going through a Jenkinsfile, so that should be it. I'm doing it in three steps. The first step is the new Jenkinsfile, dedicated to the updatecli runs. Then the setup for the controller to deal with that file. And if it's working fine, I will remove the GitHub Action that was doing that for us until now. Next step: remove the GHA. Yeah, that's why it's not in the issue, by the way; I forgot to explain everything. No problem.

Finally, the issue I mentioned earlier, opened by Hervé: I will move it to the backlog. The goal is to ensure that we replace every blobxfer call with an Azure CLI call, which will require finding the exhaustive list of the places using it (machines, templates), ensuring we have the Azure CLI available there, and then changing them one after the other, keeping in mind that some might be hard to test or verify until they really run, particularly the plugin synchronization to the mirrors or the Jenkins core release. That one could be tricky. Back to the backlog.

Now we have a few new issues to consider adding to the milestone; feel free to add some if I forgot any. We have "proving I'm not a spammer", an account issue, so that one will be added and we'll see what the request is. Migrate updates.jenkins.io to another cloud: the machine pkg.origin.jenkins.io should be migrated out of AWS as soon as possible, especially given the recent changes. We will have to come back to this one given the bandwidth cost, three to four K per month. The next one will be worked on because some of the blobxfer fixes for the update center are part of it; that's why I'm mentioning it here. Validation certificate for cert.ci.jenkins.io: once we have workload identity; we still have a month and a half left before having to renew that certificate by hand. For these two, I assume we won't have any time, so I propose to keep them on the backlog; I'm just mentioning them.

There has been a message about a remote repository for repo carap slabs.com.
The goal was to add a new mirror on JFrog. Hervé, was there anything else on that one? Do you remember it? Because I added it to my notes, but I don't remember why, and I don't see why it's on this list. It's important; it's related to a report, an exception. Oh yes, that's right. Okay, so that one stays on the backlog.

I saw "move the remaining ACI workloads to Kubernetes to stop using ACI at all". That requires adding a Windows node pool. If it's okay for everyone, let's wait until we finish migrating release.ci to the correct cluster, and then we'll see what we can do.

And one last one, which might drive our month of March: the Ubuntu migration. That one is important and I want to get it moving; it's currently on the backlog, so let me add it to the meeting notes: the Ubuntu 20.04 campaign. The plot twist is that Ubuntu bionic (18.04), which we use the most, basically everywhere on our virtual machines, reaches end of life in April of this year. We might get one or two extended months of security releases, but by this summer we're basically dead. So we have to upgrade at least to Ubuntu 20.04, or ideally to 22.04. The reason I'm mentioning it is that some machines, such as pkg.origin.jenkins.io, use packages that are not available on anything other than bionic, so the impact can be significant, and we'll have to solve this one as a priority. The good news is that most of our Puppet infrastructure, as demonstrated by Stéphane and Hervé, works very well on Ubuntu 20.04 and 22.04, but we will have to migrate everything to a recent version. My proposal is to focus on Ubuntu 22.04, because it's the latest LTS and because Ubuntu 20.04 was a mess with Python packages: createrepo, which we use for generating the Jenkins package repositories, doesn't exist in a usable form on Ubuntu 20.04; it's not installable or compilable and breaks because of the way Python packages are handled on that distribution. So the proposal is to use a decent, most recent LTS; we gain two years.

The two top items will be pkg.origin.jenkins.io, and then the Packer images. After that first heavy lifting, we should be able to migrate our agent images, both containers and virtual machines, to that version as well. Those are the two steps we can start working on. A note about pkg.origin.jenkins.io: the first part of the testing can be done locally, since we use a Docker image for local Puppet testing, so we can at least check whether packages such as createrepo and the rest exist and look for a new way if they don't. Ultimately, though, the only way to test is to migrate the virtual machine itself, which implies taking a snapshot of the current file system, upgrading the machine, restarting it, maybe it works, maybe not, and iterating (a sketch of that snapshot-then-upgrade idea follows this section). So it might require creating a brand new machine from scratch so we can do a blue-green deployment.
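To make the snapshot-then-upgrade idea concrete, here is a hedged sketch for a machine like pkg.origin.jenkins.io. The volume id is a placeholder, and the real plan may instead build a brand new machine for a blue-green switch.

    # Snapshot the data volume first so we can roll back or clone the machine
    aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 \
      --description "pkg.origin pre-upgrade snapshot"

    # On the machine (or on a clone booted from the snapshot)
    sudo apt-get update && sudo apt-get -y dist-upgrade
    sudo do-release-upgrade            # 18.04 to 20.04, then repeat for 22.04

    # Afterwards, verify the packaging tools the machine depends on still exist
    command -v createrepo || command -v createrepo_c
    python3 --version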
My other proposal, and I don't have the issue number right now, is to move most of the static virtual machines still running on AWS to Azure. That one is important. On Azure the bandwidth cost should be clearly lower, in the one-to-two-K range, even with the same amount of traffic; AWS is really expensive. But it could put us at risk of crossing the 10K-per-month limit on Azure once we've done it. That's why, for this one, I propose that we keep pkg.origin.jenkins.io on AWS and start by migrating trusted.ci: that service is composed of three virtual machines, it's worth migrating to Azure, and we can observe the cost impact. So, if it's okay for everyone, I've mentioned it; I'm not sure we'll be able to work on it in the upcoming milestone, but for sure we will have to work on it during March. I propose we start the Ubuntu 22.04 campaign as part of this milestone, since some work has already been done, and then, for the following milestone, we work on migrating to Azure. Any question, objection, anything unclear on these two? Okay, that's finally all for me. I knew you missed me and my more-than-one-hour meetings. That was very clear and nice; good job.

Is there any other item you want to raise or discuss that I could have forgotten? Oh, yes, actually, I take it back, there is one. This is informational only: in the platform SIG and the documentation SIG we've realized that we need to make an extra effort to deprecate and end-of-life some things. Ubuntu 18.04 is a good example, but there's an even better one, which is CentOS 7, which Mark Waite doesn't like and wants to end-of-life; and it will reach end of life in June 2024 whether we do anything or not. What we're going to do is submit a Jenkins Enhancement Proposal to improve the way we notify users about things reaching end of life, because we have containers we need to end-of-life, like the Blue Ocean container, and operating-system container images like the CentOS 7 controller image that we eventually want to retire because its upstream is end of life. The info here is just that this Jenkins Enhancement Proposal is coming: it will propose to extend Jenkins core with a way to disclose to users that something is approaching end of life and, based on a date stamp, has reached end of life. It will also need a way to represent things that are not immediately obvious, so there's some discussion needed there, but just be aware that end of life is getting attention from the platform SIG and the docs SIG.

And I guess one more announcement, Bruno and Kevin, while I'm here doing announcements: the docs SIG has decided that around April or May we will transition the installation documentation from describing how to install with Java 11 to describing how to install with Java 17. Java 11 will continue to be supported, but we will make that transition because we know Debian 12 will not deliver Java 11 at all. Now, that doesn't really affect the Jenkins project, because we deliver Temurin and Temurin will work just fine on Debian 12, but we don't want two sets of instructions saying "if you're using Debian 12 you have to install Java 17". So we're just going to take the opportunity and say everybody should start using Java 17 when they install Jenkins from our instructions. I'm not ready to say that last statement Damien just put in there, and I want everyone to know I didn't say it, but I appreciate Damien leaning that way.

And we need to remember to switch from JDK 19 to 20 soon, I think. Right; that's actually another item, let's put it in the announcements, because I just did the research on this. Let me look. Oh, we've got the calendar; I remember we put the official JDK end-of-life dates in the calendar. I think it's the 21st or 23rd of March. Okay, so I have the 18th: the one I'm expecting is 18 April 2023, when there will be a new release of Java, not a new major version but a new update release of Java 11, Java 17 and so on.
I don't know the date for Java 19, but you're right, I think I have a date: Java 19 to 20 is coming. And that's another one; I believe the next one will be JDK 21, the upcoming LTS, is that correct? Well, I think we've got to go through JDK 20 first; 19 is the current JDK version, and I don't think 20 has even been released yet. Will it be in April 2023 as well, or in March? Let's see their stated schedule: they state a 21 March release for JDK 20. So Bruno is exactly right, JDK 20 will release on 21 March. Oh, I was spot on. And then JDK 21: JDK 21 is slated to be an LTS, but that's not stated on their site yet; I don't see it. JDK 21 LTS. Cool, thanks for the announcements. Excuse me for going backwards in time; I should have remembered those earlier. No problem. That's all for me. Is there anything else you want to address or ask? Cool. Thanks everyone for the huge amount of work. I'm going to stop sharing my screen, say goodbye to everyone, and stop the recording. See ya. Bye.