Okay, hello everyone. Welcome to the Jenkins infrastructure weekly team meeting. Today's agenda will be overloaded because we cancelled the meeting last week: as far as I can tell I was the only one available to run it, and we had production operations planned that day, so the combination of all of that made it harder. Today we have myself, Damien Duportal, plus Mark; Stefan is off, Bruno is off, and Kevin is here with us.

Announcements from the two past weeks. Jenkins versions 2.413 and 2.414 are available, last week and this week. I haven't had time to finish the comment, but I started an issue about the release process. The goal is to track this somewhere, because I thought we already had a helpdesk issue, which we didn't. We do already have a helpdesk infrastructure issue about moving the whole release process to the release CI, which is something being discussed, but today we want to automate the manual step where we create an annotated tag and publish the draft release. That is something that changed compared to six months ago, for instance, and it's a manual step documented in the release process. So I've opened an issue, because these steps could be automated at least for the weekly release. I saw the discussion you had with Tim Jacomb about whether we can or cannot automate it for the LTS core release, which should be doable; for security releases, that will need an exchange with the Jenkins security team.

Another fun thing Mark and Hervé discovered: I was a bit enthusiastic creating the tag. I created the tag before the packaging build was finished, and I realized the WAR was not yet available on get.jenkins.io. Exactly. I need to add a comment that this release step must only be done once the packaging part of the release process has pushed the artifacts to the mirrors, get.jenkins.io being at least one example. I didn't realize this one, so we need to track it, and I need to add a comment on the issue I created earlier. Well, my apologies that that race condition is there. I thought it was valuable enough to do the pull from get.jenkins.io even with the race condition; I was aware of it when I proposed it and accepted the race, but obviously if we lost the race once, we need to implement a fix. Yep.

So last week's release went well. This week we had two issues: the one we just mentioned, and another one. Since we upgraded Kubernetes last week, I ran a replay of the packaging build, because I wanted to check that the agents were properly created on the corresponding node pool, to avoid a bad surprise today. And packaging failed due to that replayed build, because when you replay a build, it clears the parameters. We already have an old issue on jenkins-infra that says: if there isn't a previous build, we are stuck. Where is that one? Oh, it's on packaging; I think that's the one. The initial build on a given branch, for the packaging process at least (and I'm sure the release has the same issue), fails and needs to be retried a second time; each time we create a new LTS line, the .1 of that LTS has the problem. And the replay I did on the master branch cleared the previous parameter values. So today's weekly packaging build failed for the same reason: it didn't have the default value, the parameter was empty, and the shell script fails when it's empty.
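Going back to the get.jenkins.io race: the automation issue could include a pre-tag guard along these lines. A minimal sketch, assuming the WAR URL layout on get.jenkins.io (treat the URL pattern as an assumption to verify against the real tree):

```python
import sys
import time
import urllib.error
import urllib.request


def wait_for_war(version: str, timeout_minutes: int = 60) -> None:
    """Poll the mirror until the WAR for `version` is downloadable."""
    # Assumed URL layout; verify against the real tree on get.jenkins.io.
    url = f"https://get.jenkins.io/war/{version}/jenkins.war"
    deadline = time.time() + timeout_minutes * 60
    while time.time() < deadline:
        try:
            request = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(request) as response:
                if response.status == 200:
                    return  # safe to create and push the annotated tag
        except urllib.error.HTTPError:
            pass  # not mirrored yet (e.g. 404), keep polling
        time.sleep(30)
    sys.exit(f"{url} still absent after {timeout_minutes} minutes")


if __name__ == "__main__":
    wait_for_war(sys.argv[1])  # e.g. "2.414"
```

Running this between the packaging build and the tag creation would have lost us nothing in the normal case and blocked the premature tag in the racy one.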
So we have good safeguards, but we need a fix at the pipeline level to provide the default value, which would at least allow us to replay. I don't know about the LTS initialization case, but for the weekly release it shouldn't happen again: replay the build and that's okay. I need to open an issue, or at least update the existing one. Let me note this: open an issue or comment on the existing one, track the race condition, and wait until the WAR is available on get.jenkins.io. That's all for me for the weekly releases. Is there something else to add? I've confirmed that the container image downloads and builds successfully, so that part was successful even after the race condition got exercised; the rebuild must have worked. That's great, thanks.

The changelog has been merged; thanks to Kevin for proposing changes to it. There was an oddity in the changelog that sometimes happens, where the automated changelog generates unexpected things. I didn't do the investigation to figure out why, I just fixed it. Just a note about the changelogs: I'm impressed, because for one year now we have had the weekly release and the changelog done the same day, every week. That's really impressive. It sounds normal today, but trust me, a year and a half ago it wasn't easy to hold that kind of timeline, so congratulations to everyone involved; the process is really efficient. Well, and thanks to Tim Jacomb for his automation of the Jenkins changelogs; it really is a much better experience thanks to his automation. Yeah. That's all for me on the weekly releases from both weeks. Anything to add, questions, things to plan? Cool.

Regarding the announcements: next week I will be off, so I need you folks either to run the meeting or to cancel it. I'm available to run the Zoom part of it if Hervé is willing to run the actual meeting agenda part. Is Stefan back next week? No, that's why I am asking: Stefan and Bruno will be off, so that will be only the three of you. So maybe we just intentionally cancel; what do you think? Will you be off in two weeks? No, I won't be off in July. Okay, so that means we move to the 25th of July, is that correct? Right. Let's cancel next week's meeting and run the one on the 25th. Sounds good. Can I ask you to update the milestone I created just before the meeting, moving it from the 18th to the 25th? Okay. That's all for me on announcements. Do you have other announcements, folks? Okay.

Then let's continue with the upcoming calendar. The next weekly won't be delayed like the meeting: next week we will have version 2.415. And the next LTS, 2.401.3, releases in two weeks, on the 26th of July, with the release candidate due today. Today or tomorrow the new LTS baseline will be selected, and it is likely 2.414, because 2.413 looked very good and I don't expect any surprises from 2.414. So I guess we will check the chosen baseline in two weeks during the meeting. Correct. Yeah, there's no crisis for us: whatever baseline is selected doesn't have a significant impact on infrastructure. Perfect.

There is a security advisory, pre-announced publicly earlier today, for tomorrow. I haven't checked the content. Plugins only. Plugins only? I don't know what has been announced publicly. It was announced about three hours ago. Yeah, I saw the title of the mail, but I haven't looked at the body.
Let's look at it together. It's public? Yes, it is public, so let's look at it. So: Jenkins plugins only, not Jenkins core. Absolutely. Sorry, I cut you off. Okay, so plugins only. That means the Jenkins security team might need access; I was late on pinging them about that, so thanks a lot for catching it, because otherwise they would have been slowed down in that process. They will use the new ci.jenkins.io. If any of the currently installed plugin versions are concerned, they would have to update them; otherwise they only need trusted.ci to be up so they can publish the Jenkins website when needed. Yeah. Shouldn't we proactively assume that a restart of ci.jenkins.io will be required, just in case? Yeah, we don't know; they haven't told us which plugins are involved, and they don't publicly disclose which plugins, but it feels safe to just say we're going to restart ci.jenkins.io tomorrow. And then if we don't need to, we smile and say we didn't. Yep, absolutely.

Is there a volunteer to open the status entry? I can do that, if that's helpful; I have done it before and I'm comfortable with it. Okay, thanks. Hervé, I saw you were a bit slower to answer than Mark (or maybe the electrons from Paris are slower), so you are volunteered to review and approve Mark's pull request on the status repository. And I assume we set it for a time tomorrow; we'll deal with the exact timeline privately until we know when they start. They usually start at... it's 1 p.m. for me, so that will be 11 a.m. for you. I'll just make up a time, put it on the status page, and then we'll correct it, because they can restart anytime they want, right? Whatever pace works for them; we just alert users that we expect a restart. Okay. I will take care of upgrading all the plugins on trusted.ci and ci.jenkins.io later today, so the diff when they update plugins will be as small as possible. Thank you, that's very considerate. I see one plugin update pending right now. I saw seven on trusted as well, so I'll get that checked. Great.

Okay, next major event? I don't have any. Okay, then we can start with the huge list of things that have been done. I will try to be as fast as possible, I promise.

Thanks, Hervé, for the work on integrating ci.jenkins.io observability into Datadog using the Datadog plugin. We now have the Datadog plugin running on the instance, sending metrics, logs and traces to the Datadog agent on the virtual machine, which buffers all of that data and forwards it to Datadog. On the Datadog side there is a nice CI observability section that provides dashboards on pipeline executions, where we can see a lot of traces and telemetry. So we now have full observability integrated into Datadog, and we can start doing things such as monitoring builds. For instance, monitoring the builds of the infra acceptance tests could alert us when an agent cannot be spawned, or when the package cannot be installed, a lot of things. These jobs will be really helpful to warn us in case of issues.
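As an illustration of what such a monitoring job could emit: a minimal sketch using the Datadog Python client against the agent's local DogStatsD socket. The metric name and tags are made up for the example, not the ones we actually use.

```python
# pip install datadog
from datadog import initialize, statsd

# The Datadog agent on the VM buffers locally and forwards to Datadog,
# so the job only ever talks to the default local DogStatsD port.
initialize(statsd_host="127.0.0.1", statsd_port=8125)

# Hypothetical metric and tags: an acceptance-test wrapper could emit this
# so a Datadog monitor can alert when agents fail to spawn or packages
# fail to install.
statsd.increment(
    "jenkins.infra.acceptance_tests.failures",
    tags=["check:agent-spawn", "instance:ci.jenkins.io"],
)
```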
The telemetry will also help us study questions like: why are the BOM builds taking so much time for simple steps? With the traces, we can actually prove where the time goes. And there are a lot of other use cases. So thanks for that work; it's really useful and it will open a lot of improvement areas.

While we are on ci.jenkins.io: since last week it runs on a new virtual machine, in a new network, in a new resource group, on new hardware which is both more powerful and cheaper. The new network does not have any address overlap and has the potential for IPv6: ci.jenkins.io is not reachable over IPv6 today, but it could be in the future. The former resources have been cleaned up; we did a lot of cleanup, thanks for the help there. And if you used to access ci.jenkins.io with SSH, you need to look at the new runbook, like the Jenkins security team has to do today, because there is a new configuration (a hostname change), and you need to use the new private VPN, because we changed networks. Any question on that part? Okay, so let's continue.

There was an issue when releasing the maven-jellydoc-plugin. Thanks, Basil, for fixing the issue inside the plugin. It was due to the JDK version and the set of transitive problems that came with it. Now it's fully working with JDK 11, maybe even 17, I'm not sure, but I know it's not JDK 8 anymore. Thanks, Basil, for that.

Hervé, can you give us a word about the Windows Server 2022 agents on trusted.ci? Why did we do that, and what problem did we solve? About two weeks ago I started working on a Windows Server 2022 version of the agent images, to be able to provide 2022 agents. ci.jenkins.io already had those 2022 agents configured, but trusted.ci didn't, so we added them to trusted.ci, so that the inbound agent Windows Server 2022 and Windows Server Core 2022 images could be built and published. Nice work, thanks. I haven't checked whether the images were actually published, but I think you did, Hervé? Yeah, they are published, and I've used them to test the 2022 version of the inbound Windows agent. Oh, nice. It's already building everything; I just have to finish a refactoring of the build process for the inbound Windows agent images, and the pull request is ready. After that, creating the 2022 agent will be as simple as adding an agent type in the configuration. Cool.

Part of this should now move to the relevant SIG's regular meeting, because we did what we needed to prove on the infrastructure side. But we still need Windows Server 2022 for the infrastructure itself: the release packaging builds the official Windows MSI during core releases, and it's running on 2019-line images, not the 2022 LTS. So we will need to build, or use, the 2022 inbound agent so we can upgrade the Kubernetes node pool used under the hood. We also have the ACI Windows agents on ci.jenkins.io that run Maven builds on different Java versions in a Windows environment; those should also be switched to the brand-new 2022 agents. We are building custom images today with Docker on Windows that inherit from the published images, so we will have to update our own builds too. In the future, that would allow us to switch every system to Windows Server 2022.
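On the earlier point about verifying that the images were actually published: that check can be scripted against the Docker Hub registry API. A minimal sketch, where the repository and tag names are assumptions to adapt to the real published images:

```python
import json
import sys
import urllib.error
import urllib.request

# Hypothetical image coordinates; adjust to the real published repository.
NAMESPACE_REPO = "jenkins/inbound-agent"
TAG = "windowsservercore-2022"

# Docker Hub v2 API: 200 with tag metadata if the tag exists, 404 otherwise.
url = f"https://hub.docker.com/v2/repositories/{NAMESPACE_REPO}/tags/{TAG}"
try:
    with urllib.request.urlopen(url) as response:
        info = json.load(response)
    print(f"{NAMESPACE_REPO}:{TAG} published, "
          f"last pushed {info.get('tag_last_pushed')}")
except urllib.error.HTTPError as error:
    sys.exit(f"{NAMESPACE_REPO}:{TAG} not found on Docker Hub ({error.code})")
```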
We will still need the 2019 builds, but those can be built from a 2022 server with Docker now, using Hyper-V isolation: it's a bit slower because of the different isolation level, but it's not a full virtual machine. So that means we can move everything to 2022, as recommended by Azure. Is there any question about that target and usage? No. Okay.

Mark, can you give us a heads-up on the Jenkins board repository? So, we've adopted it: we're using the repository that was created, Alex Brandes and I agreed to continue using it, and it's got content. I'm sure we'll do more with it as time goes on; right now it's just a content archive, but it could certainly be posted someplace like governance.jenkins.io or archives.jenkins.io, et cetera. Its initial purpose is met, and I am pleased to announce that one action item that was on my list for probably two years is gone. Yeah, thanks Mark. Any question? Yeah, Alex mentioned in one of the issues or pull requests: does this also mean you are already using it, or will use it? No, because Alex and I haven't settled that yet, and I'm the note taker for the governance meeting right now. My strong preference is still Google Docs. I appreciate that everybody likes HackMD, but editing Markdown live in a meeting is more difficult for me, so I'm still prone to keep using Google Docs. You've got Markdown support on that now. What's that? You have Markdown support in Google Docs. I don't know; in fact, what I do is run a Markdown converter that exports the Google Doc to Markdown. It's very simple and works great, but for me the editing experience in Google Docs is faster; it's just easier if I'm live in a meeting. I was asking because I saw the current export of the archive from 2020 to 2023 is one single Markdown file, and I was about to propose splitting it per day or per month and year, like the other archives; with that, you might have to adapt your archiving process. No problem, it's trivial. If we want to split it into a file per meeting, that's easy to do, and I fully support it if that's easier for people; I have no problem writing an individual file. That's what I did with the most recent notes: I extracted the current one, ran the converter, and it was easy. So, happy to do that. I'm fine if we split it into individual per-meeting files. Okay, let's continue.

We had an issue with the build-monitor plugin release failing with 401 Unauthorized. That one was due to an RPU (repository permission updater) build failure on trusted.ci. I don't recall whether the timing was due to the outage I caused last Friday or to something else, but that was the cause, and as soon as the build ran properly again the problem was fixed: basically, updating the GitHub users authorized as uploaders. I guess it's Alex who added the permissions, so thanks for this one.

Next: the test history page on ci.jenkins.io for core was inaccessible. I don't remember what that issue was. It was after the VM migration. Yeah, okay, so that issue was due to... yeah, the S3 buckets. There were too many builds retained, so the page Alex mentioned was erroring: the proxy timeout on Apache fired, because the page took two to four minutes to load at the beginning.
Due to another issue we'll see later, we were able to rotate the builds again, back to an effective limit of 50 builds in the history of the master branch of Jenkins core. Now the load time is 30 to 40 seconds, below the one-minute timeout, so no more timeouts. So yeah, thanks for reporting it, and thanks for the report that pointed to an issue in the JUnit plugin that could have been related. Now it's working; no more problem.

There were requests to add users so they can get access... next, releasing to incrementals was failing: that was a 503, two weeks ago. The incrementals publisher was failing because someone bumped the Node.js version while trying to clean up the image, and that person, named Damien Duportal, which is me, didn't test it properly. So it started to fail. It has been fixed. But we also had a secondary issue: the publisher authenticates against the ci.jenkins.io API to retrieve information about the build artifacts. That authentication is not mandatory, but it can help with API rate limiting; I still have mixed feelings about it. And I don't know how or why, but with the ci.jenkins.io migration to the new virtual machine, the token of the technical user we use disappeared. So the token the publisher presented, since no token was defined anymore for that user on ci.jenkins.io, was passed as an LDAP password to our LDAP, which of course said no, that's not the right password, and effectively it failed. We created a new token on ci.jenkins.io, I updated the instructions, the token was pushed to our secret store, and once redeployed it was okay.

Sorry, Mark, go ahead. So I think you missed one part of that story, which is that the incrementals publisher was not running dependabot at all. Right: it was way behind, and you found and fixed that problem too. Of course that meant there was a flurry of updates, and I blindly approved many of them. So there was this terrifying period of blind approval, because CI passed, but CI was doing almost no verification. So thank you, thank you to all involved in fixing that.

I've got a question: is there a way we could safety-check our dependabot configurations, to be sure that where dependabot is enabled it's not sitting on an error? Because for plugins I maintain, on occasion I'll make a mistake, dependabot breaks, and of course it silently stops submitting pull requests. But that's a separate topic for another time; it's hard, I will say. I don't know if we can detect that dependabot itself is broken, but checking for the last image update, which was two years ago (or a year and a half) for the incrementals publisher, and checking for the presence of recent updates could work. I'm not sure there is something in the GitHub API for updates. You could look at the last builds, but yeah, I don't know. We might be able to poll the check status, something like that, maybe. That's also sophisticated. I'd rather read the dependency-graph page, but I don't know if you can query that.
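One cheap approximation of that health check: ask the GitHub search API for the most recent dependabot pull request per repository and flag the stale ones. A sketch, where the repository list and the staleness threshold are assumptions:

```python
import datetime
import json
import urllib.parse
import urllib.request

STALE_DAYS = 90  # assumption: tune to the expected dependabot cadence
REPOS = ["incrementals-publisher"]  # the exhaustive-list problem noted below


def last_dependabot_pr(owner: str, repo: str):
    """Creation date of the most recent dependabot PR in the repo, or None."""
    query = urllib.parse.quote(f"repo:{owner}/{repo} author:app/dependabot type:pr")
    url = (
        "https://api.github.com/search/issues"
        f"?q={query}&sort=created&order=desc&per_page=1"
    )
    with urllib.request.urlopen(url) as response:
        items = json.load(response)["items"]
    if not items:
        return None
    return datetime.datetime.fromisoformat(items[0]["created_at"].rstrip("Z"))


for repo in REPOS:
    latest = last_dependabot_pr("jenkins-infra", repo)
    if latest is None or (datetime.datetime.utcnow() - latest).days > STALE_DAYS:
        print(f"{repo}: dependabot may be broken (last PR: {latest})")
```

It would not prove dependabot is healthy, only that it has been silent for suspiciously long, which is exactly the symptom we missed on the incrementals publisher.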
I know that Olivier is working on a dashboard of update states for that specific usage, because the problem is that today, whatever solution we choose, you need an exhaustive list of the projects, of the repositories, to check, and the more repositories we create, the more we need to update that list. But yeah, it's a complicated topic, and I don't know how dependabot itself would answer it. I would say: a cron walking every repository and checking whether there is a dependabot config could do it, but then how do you exclude repositories that don't need dependabot, for a lot of good reasons? If there's no config, you create a pull request adding one, with an exception list to be reviewed by humans. Yeah, but that would mean a lot of process; still, that's a good idea. Best if I just submit a ticket; ideas like this go into tickets very nicely. Let's take it there. And the main cause of the second problem was that the tests inside the application weren't really testing anything. Well, we have similar cases in other places: we've got a backend-extension-indexer bug report right now highlighting that its tests are not testing anything useful, and we had a failure in the pipeline-steps-doc-generator whose tests were not testing anything useful either. So that's a common pattern; there are a number of places where we would benefit from more tests. The tests are necessary, but they're not sufficient. Right. Okay. But now it's working well, so thanks to everyone involved, because at least six people were involved in fixing that issue.

Branch strategy on ci.jenkins.io diverged from the prior configuration, a consequence of the VM migration as well: since we had to recreate a Jenkins home copied from the former one, we weren't able to reuse it as-is, and we applied a lot of cleanup. We also had an issue with the S3 buckets that archive artifacts; we'll see that later. That means we need to think about using Job DSL to define the jobs on ci.jenkins.io as code, so these kinds of settings are easier to update and plan, because each time you rescan the organization, you hit the GitHub API rate limit, which effectively breaks all the builds on ci.jenkins.io. Or rather, it makes all the builds wait, with no information telling users they are waiting unless you are an admin, which is awful for end users. So: switch to Job DSL for defining jobs.

Okay. Thanks, everybody, for the IPv6 support: jenkins.io is properly set up everywhere, and all of the public services running on the new cluster are available over IPv6, except the LDAP service. Good point: LDAP runs on its own load balancer service, so we could add an IPv6 address for it, but we don't see the need right now. Yep, because we are the only consumer of LDAP, and we don't need IPv6 for the TCP connection to it.

s390x: I've closed that old issue, because since we upgraded the machine to Ubuntu 22.04 a few weeks ago, we don't need to replace it. By the way, the agent was offline; I don't remember exactly, but there was a quick fix, I think we had to restart the agent on ci.jenkins.io, and it was upgraded at that moment.
So I don't have anything else to add on this one, and we have an open issue for managing it as code in the future.

Plugin continuous delivery failed with an Artifactory "permission denied". I don't remember this one... oh no, yes: that was one of the outages. In order to migrate that agent, we had to switch the agent configuration from the SSH launcher to inbound, and there was an issue in the init script it runs. That's a funny one: with an inbound agent, our setup runs the init script with sh, while with the SSH launcher it runs with bash. And the script used the `|&` form of the pipe, which is bash-only. So we had to change it yesterday to the usual form: redirect stderr, then pipe. That was a nasty one.

Next one: javac vanished from the image. There was an issue in the template that defines the agent configurations and their content; it has been fixed. So thanks, Alex, for that.

IPv6 again: it has been validated with the new cluster, as we mentioned. During the changes we had a misconfiguration: for one day, people using an RFC-compliant DNS resolver in their organization weren't able to reach the service over IPv6. We removed the AAAA record, fixed the IPv6 setup, added the record back, and the user confirmed it was fixed for them. So thanks for the job here; that wasn't an easy one, and we had to learn that part.

I can announce that the public cluster has been fully migrated to the new network on the new cluster, and we removed every leftover except one network that is soon to be removed. Which means we don't have any more overlap issues and everything is running on a clean network. It's been a multi-week effort by Hervé, supported by the rest of the team, so great job; now we have a proper cluster to work with. The first consequence is that we now pay $500 less each month than before, just due to the fixups and the new cluster.

Of course an LTS was released and rolled out to all of our controllers that follow the LTS line. I don't think there is anything to add on this one; that's usual.

We had a few issues closed as not planned. Since we migrated the whole cluster, we don't need to back up the old one. During the Friday outage we had an issue from someone saying "I want to install, it doesn't download", and that's all the information we had, so I closed it for lack of information; it could have been an issue with the mirror, and it made sense to close it as not planned. Another one closed as not planned: we only have one such agent, and there is no need for us to keep it; an agent that we don't control is risky for trusted CI purposes. The initial need was to provide Docker images for that CPU architecture, and we have been using QEMU to build those images for at least a year, so there is no need for it. The native machine running on ci.jenkins.io is perfectly fine and allows native tests. If one day we have plugins or an artifact that need to be built natively, then that would be the way to go. Finally, I closed an old issue about importing and managing AWS resources, because we already manage the Kubernetes cluster as code and we are moving away from AWS: we have three virtual machines left, and after that we won't need it. That's why I closed the issue; no need to spend our time on this one.
One of those machines, the one hosting the update center, cannot be managed properly through the EC2 API, because it's running on, let's say, a legacy AWS hypervisor that doesn't implement half of the EC2 features. Trying to manage it with Terraform is risky, and I don't want to take that risk, because it could end up stopping the machine and making it unrecoverable, and we don't want that. Better to migrate it properly and then we'll be okay. Any question so far? Things that we closed and I forgot? Okay.

Work in progress: we have a new issue about the Jira upgrades. I saw there has been a default setting on Jira that was changed; I'm not really sure. I'm a Jira admin, so in theory I could do it, but I have no idea what the feature is. Daniel Beck is a Jira admin too; if there's anybody who knows how to do it, it's Daniel, so it may be as simple as us saying "plus one on setting the behavior; Daniel, can you do it?". Because like you, Damien, I am a Jira administrator, but I'm scared as can be to make administrative-level changes, since I don't have the expertise that Daniel does. Okay. In that case, if it's okay for everyone, I will add a comment after the meeting asking Daniel, remove the triage label, and also remove it from any milestone, because it's not something we should plan, unless Daniel says he can't and needs us to do it; then you and I would have to take that issue and learn enough about Jira administration to do it safely, by doing it together. So: ask Daniel, and remove it from milestones; not an infra task, but it needs to be tracked in the helpdesk. Is that okay for all of you? Yes.

Decrease AWS costs. Short-term, I didn't have time, but I need to do it later today: report on the June billing and give the status of our spending. I guess it has stayed almost the same over the past three weeks; we have alerting in place (we saw the alerts fire two months ago), so a spike would have been flagged to us. So I need to report and give a status here. Then, for the other tasks on that issue: I propose we close it and create a new one, just to focus on the next item. The next step will be removing the three virtual machines that are currently running and consuming half of the monthly credits: one is updates.jenkins.io with the packaging machine, and then we have two other tiny machines with very little usage. Is that okay with you, if I report, close this one, and create a new one so we can narrow the scope and avoid it becoming too verbose? Yes. Okay: close in favor of a new one with a tighter scope, and report the June billing soon. Almost closeable, but not closed yet.

Kubernetes 1.25. We did the heavy lifting, including a big general outage of the cluster, which we had to recreate from scratch. First: good work, Hervé, because your work allowed us to recreate a brand-new production cluster in less than half a day. Thanks a lot for the support and the work here. I still owe a post-mortem, Mark; I was thinking of writing it as a jenkins.io blog post, because the impact was clearly outside our team, so I would prefer doing it there afterwards. Is that okay for you? That would be great; if you don't mind describing what we learned, I think that would be a great place to put it. Absolutely. So: we need to finish the post-mortem.
The outage was caused by an accumulation of configuration changes directly related to IPv6 and the way it works with Kubernetes. But we also learned things and proposed some, let's say, improvements. For instance, the public IPs that got deleted, effectively changing the external DNS resolution for everyone trying to download: that should not happen again. Thanks to our research we found locks: we can lock Azure resources so they cannot be deleted. What will happen next time, if we delete the Kubernetes cluster and the public IPs are part of one of its automatically managed resource groups? The resource group deletion will fail at the end, because the locked public IPs refuse to be deleted. We have added that as a safety, and it's managed as code. That's a minimal improvement. And as Tim said (I don't know if you saw it): we may not even need the lock protection for the IPs, because we can create them outside of the cluster and node pool resource groups, and avoid that transitive, implicit deletion in the future. So I propose we keep the lock, and add a comment on the load balancer configuration, even if it's the default value, pointing there and saying you can recreate the IPs in the other resource group if you have to rebuild next time. Yes, exactly.

What about the next Kubernetes upgrade? At least we now know that we are able to do it: most of it is stateless, we no longer have a pet cluster that has been running for three years, and we are confident in its recreation. Absolutely. And the LDAP backup system that Olivier built, and that you checked and refreshed during the migration, shows that we don't lose data. I have to admit that for at least 20 minutes I was fairly sure we had lost the whole LDAP data; it wasn't the best moment of my life. Another improvement: we have a technical administrative user, a service account, used to administer the cluster from our deployment system, and the only way to create it today is a script on my machine, which is absolutely not sustainable. So I've got to add the required element in Terraform, since we now manage the whole cluster as code, to create that user for us and generate the kubeconfig as a sensitive output, so we only have to copy and paste the output if we have admin access to Terraform. That should be quite the improvement. We could also add the script's output to the notes displayed at the end of a Terraform run; we can discuss that later. And finally, I need to open the issue for the next Kubernetes upgrade; I propose we try to do it before September, ideally, just to be safe. The issue will start with the deprecation list for the next version, which will help us schedule a timeline, with the same elements as last time. Once these tasks are done we can close the issue. Noting the actions: write the post-mortem on jenkins.io; the other subtasks are done.
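A side note on the resource locks mentioned above: in practice we manage them as code in Terraform, but for illustration, here is a sketch of creating such a CanNotDelete lock with the Azure SDK for Python. The resource group and lock names are hypothetical:

```python
# pip install azure-identity azure-mgmt-resource
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ManagementLockClient

client = ManagementLockClient(DefaultAzureCredential(), "<subscription-id>")

# Hypothetical resource group: the one holding the public IPs.
client.management_locks.create_or_update_at_resource_group_level(
    resource_group_name="public-ips",
    lock_name="do-not-delete-public-ips",
    parameters={
        # Deleting the group (or the cluster that owns it) then fails
        # instead of silently taking the public IPs down with it.
        "level": "CanNotDelete",
        "notes": "Public IPs are referenced by external DNS records.",
    },
)
```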
Next: the Ubuntu 22.04 upgrade campaign. One, no, two candidates left. updates.jenkins.io is still to be done; that one is part of the AWS cost work, which will either let us do the upgrade or avoid doing it entirely, which would be even better. A word about puppet.jenkins.io, still on Bionic; we don't have to rush removing it, we have two years, but it doesn't have the same operating system as the others. That one will need to switch to Puppet 7, not Enterprise, because as of today the problem is the following: the Enterprise version of Puppet doesn't support Jammy yet, and there is a huge ticket with a lot of people asking for it. We are able to use Enterprise for free for the first 10 or 12 machines as an open source project, but it doesn't give us any feature we have actually used during the past two years: we could have worked with the web UI, which is Enterprise-only, but we haven't. By default that means we should migrate from Puppet 6 Enterprise to open source Puppet 7, and then we will have Puppet server support for Jammy; the agent already works on Jammy. We mentioned a few meetings ago using Ansible instead, which would have allowed us to change the paradigm and not need that virtual machine anymore. However, with the recent, let's say, heated discussions in the Red Hat area and the operating system ecosystem around Red Hat, and Ansible possibly becoming less open, I'm now having second thoughts about moving away from Puppet that way. There are other solutions that could fit, but that's why I propose we focus during the summer on upgrading to Puppet 7, keeping the rest in mind. Is that okay for everyone? Yes; especially given the noise in communities around Red Hat-sponsored software, I think that's very wise. I propose we keep that issue on the upcoming milestone, because as we'll see a bit later in two bullet points, we will have to work on updates.jenkins.io.

ci.jenkins.io fails to delete stashed artifacts with "access denied": that one was preventing the build discarder from working effectively on ci.jenkins.io, and that caused a chain of problems. It has been fixed. Hervé, do you want to explain a bit what we did and what is left to be done? The default configuration does not delete anything from the S3 storage from Jenkins, as deletion is a security risk, as underlined by JC. That's also why these options are specified via JVM options, which is cumbersome, but deliberately so: it discourages people from using them. I think we can close this issue, since the retention problem is fixed, and open a new one about putting in place a service to discard old artifacts and stashes on S3 as well; we can do that via AWS lifecycle policies. Can I let you do that part, Hervé? Yes: open a new issue about the artifact cleanup. Just a note: over the past two months the cost went up to around $0.60 per day. It's an increase, but we are still below $1 per day, so the cost is not a short-term risk. That's why we can close this one. Thanks, Hervé.
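For that follow-up issue, the discarding could be done server-side with an S3 lifecycle policy. A minimal boto3 sketch, where the bucket name, prefix, and retention are assumptions to adapt:

```python
# pip install boto3
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and retention; note this call replaces the bucket's
# whole lifecycle configuration, so merge with any existing rules first.
s3.put_bucket_lifecycle_configuration(
    Bucket="ci-jenkins-io-artifacts",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-old-stashes-and-artifacts",
                "Filter": {"Prefix": ""},  # whole bucket
                "Status": "Enabled",
                "Expiration": {"Days": 30},  # objects deleted after 30 days
            }
        ]
    },
)
```

The nice property is that it keeps the dangerous delete permission out of Jenkins itself, which matches the security reasoning above.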
The next one is split between Hervé and me: migrating the virtual machine away from AWS, the backend that serves and distributes the package files through the CDN network. It's running on an old machine on AWS that is locked to Ubuntu 18.04 right now, which creates a lot of weird issues and maintenance challenges. There are multiple possible solutions; right now Hervé is studying whether we can use Cloudflare for the update center. The challenge: can we replace an Apache web server that we have run for ages, which is not highly available and is subject to issues, with something else, backed by a CDN network for distribution? We have multiple issues and elements to discuss; I propose we don't go into details here, but Hervé is working on that area.

Two concerns to mention. The first: we still serve HTTP content without enforcing an HTTP-to-HTTPS redirection. We enforce the redirection on the mirror downloads, but not on the update center, because of an old statement, which Hervé dug up, that we had old instances that weren't able to do TLS. As we discussed, that sounds like an old constraint that shouldn't exist anymore, because those instances couldn't fetch plugins anyway: plugin downloads have HTTPS enforced. So first of all, independently of the rest: enforce the HTTP-to-HTTPS redirection for the update center, with a blog post as required, like we did last year for the mirrors. I think that's a good start, useful whatever solution we end up with. That one is still important. Does it look okay to you, Hervé, Mark? Yes, absolutely.

Another catch, which Hervé also saw: the current publication process generates .htaccess files for the update center, creating a bunch of redirections depending on the specific update center tiers, or something like that. So the update center is tightly coupled to Apache; we cannot just get away and replace it with an nginx server somewhere, building our own distribution system or using public services, without dealing with that. Is my understanding correct? It seems so, but there are a lot of .htaccess redirections generated by the update center; I need to see how it works. All of that sits on the update center machine.

One of the considerations we talked about: the goal for us is to decrease the outbound bandwidth represented by the update center JSON files served to end users. That part costs a lot; it's not the number of requests, it's the bandwidth we are paying for. Additionally, high availability is needed. One of the elements that could be studied, and Mark might have the answer: could we use an HTTP redirection when clients go to updates.jenkins.io, and would it be honored by the Jenkins core instances? That would require research across multiple Jenkins versions, at least a year's worth of Jenkins releases. That's very much a Daniel Beck question; he will have forgotten more about that than I will ever know. Okay: update center redirection research. Because if we have that, it means we can keep our own entry point, highly available because built and integrated into one of our clusters, and instead of serving the file, it would redirect to Cloudflare R2, for instance. So we would control the entry point; we would not have to point the domain name directly at the CDN. The thing is, what we call a mirror is something we control: the reason for not putting the update center on mirrorbits, as far as I can remember (but maybe that can change), is that when we want to invalidate an update center cache, we can do so with a specific installation process, which we cannot do with sponsor-based mirrors that pull the data from us. However, we could think about using mirrorbits with a dedicated installation just for this, where the only mirror would be ours, on Cloudflare. That could allow us to serve China, have a fallback, or consider that kind of scenario. We need knowledge sharing from Daniel. Did I capture everything we exchanged? Did I forget elements, or are things unclear? No, seems good.
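On that redirect question: whether old Jenkins cores honor it needs the research Daniel can help with, but probing what the server currently returns is easy. A minimal sketch, with the URL as an assumption:

```python
import urllib.error
import urllib.request


# Probe without following redirects, so we can see the Location header.
class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # surface the 3xx as an HTTPError instead of following


opener = urllib.request.build_opener(NoRedirect)
try:
    response = opener.open("http://updates.jenkins.io/update-center.json")
    print("no redirect, served directly with status", response.status)
except urllib.error.HTTPError as error:
    # A 301/308 here is the redirection we would be serving to clients.
    print(error.code, "->", error.headers.get("Location"))
```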
And on my side, I've started working on pkg.origin.jenkins.io, the second service running on that machine, to move it to Azure on our public and private infrastructure. Fastly is in front of it, so there is no traffic or bandwidth to pay, and that would add high availability. Also, the core release process would no longer need to go through an SSH command: instead, since we have all the data during the packaging process, it would run directly on the pod that is releasing and packaging everything, with access to the real data, generating the Debian and CentOS packages into the bucket, and that bucket would be immediately available through the public cluster. We could finally control the environment with a Docker image, and we would not be constrained by the Apache version anymore, because the serving and the data would be separated. So that one will help a lot in getting away from that machine as well; that's why we split the work per service. Is that clear, does it make sense? Okay. That will be our big next task for the upcoming weeks, aside from the Artifactory topic. Any question, or can I switch to the next one? Almost there, folks. No question, you can switch.

So I propose we keep this one on the next milestone; we need to work on the topic. Thanks, Kevin, but we were busy with all the operations during the past two weeks, so I hope we can start working on this one again soon. The next milestone looks okay to me, so I propose we keep that one on the list. I've looked at what Kevin said, that his repository was much more up to date than the one in jenkins-infra, but I didn't notice any difference. Yeah, that's an implementation detail, in the sense that we just have to spend time on it; thanks for the heads-up. Kevin already did some work, and we need to help him in that area.

The proposal for applications to migrate to ARM64 got delayed until Stefan is back. We had a discussion and now everything is clear; I took notes, so we should be able to resume in the early days after he returns, when I will be back as well. The goal is to have ARM64 for all the static services on publick8s, now that everything has been upgraded and migrated. That will be Stefan's main priority, because it will help decrease costs in different areas.

Artifactory bandwidth reduction: Mark, I think both of us are a bit late on that topic. Right; it will get some focus from me today and tomorrow. I've got to propose changes to the root POM for both Jenkins core and plugins, and then we've got to start evaluating what that means and deal with the bumps and bruises of it. Thanks; so I'm keeping it.

And finally, the artifact caching proxy: I was late, but the goal is to open a pull request on the ATH based on Basil's former work, to start using the ACP in every part of the acceptance-test builds, not only the initial generation. So: start a draft pull request and see whether it breaks the ACP on Azure, or whether it works now with the new network. I think that's all for the current elements.

We have new issues. "Improve Datadog ingestion": thanks; I think that one is there to keep track of what we could do with Datadog in the future, and we triaged it. I've opened "Backup infrastructure data": I was sure you had written an issue about this, but I wasn't able to find it, so maybe it's a duplicate. I tried to put as much information as possible in it, pointing at the different elements we could cover; it was triggered by the deletion of the former ci.jenkins.io resource group.
Right now we have snapshots of the two former disks in the new resource group, and the goal will be to study Azure Backup, which creates a vault of the data that can be migrated to another cloud if needed. That would be our encrypted vault for storing data and recovering in case of a big problem.

There was an issue on the backend-extension-indexer crawler. Mark, I believe you took care of at least answering it? Well, I wish I could say I've taken care of it. I've got to do the research to find out what broke. I am reasonably confident I'm the one who broke it, and therefore it's perfectly justified that I should be the one who fixes it. But I suspect I merged a dependabot change that passed all the tests, which hints that more tests are needed. So there will be some research, and the outcome will eventually be tests that check that particular attribute and only allow pull requests when that test passes. Okay. Should we follow your recommendation and move the issue directly to the backend-extension-indexer repository? Yes, I don't think it belongs in our helpdesk. I don't think I have permission to make that decision, but if someone else does, I would love to have it in backend-extension-indexer, because there's another issue somewhat like it from Daniel Beck that has already been moved there. Okay, I will take care of this one, and I will finish the notes with what we said.

Do we have other new issues? I'm missing some metadata; ah, two weeks ago, I missed this one. So: we have users in China getting low performance when trying to reach the update center and/or the download mirrors. We do have mirrors in China, more than we knew, actually; that could be interesting. As we said to that person, they cannot use archives or pkg; there are a lot of elements here. One of the proposals we had: Cloudflare has the ability to serve copies inside China, so that could be a way to help our Chinese users. As we told that person, we would need a sponsor inside China, because there are a lot of unknowns, a lot of things we don't know about how it is to operate there; we used to have some contacts there, but not anymore. The person gave us good details. So I propose we delay this until after the work Hervé is doing, once we have migrated the update center. Does that make sense for you? Yes. And finally, we already handled this last one, so I don't see other new issues.

Do you have other topics you want to mention? So, I raised an issue that may be interesting to infrastructure, and I'd like an opinion; I'm sorry I didn't solicit your opinion already. pkg.jenkins.io and mirrors.jenkins.io both have installation instructions in their Debian and Red Hat subdirectories. If you pick Debian or Red Hat, either debian-stable or debian, you see a nice page giving you some installation instructions, and they're generally helpful. However, they are not the full installation instructions, because the authoritative ones are in the Jenkins user handbook. The problem with those pages is that they are nice and simple, but they only tell you about Java 11 when we want you to use Java 17, yet Java 17 isn't available on some Debian releases. And on and on; there's an awful lot.
So the temptation I've had is that pkg.jenkins.io should redirect from those pages to the handbook page, and stop trying to present simplified instructions, because simplified instructions are inevitably the wrong instructions for some set of users. I'm open to your opinions, and we could do it separately; it doesn't have to be today. I've just realized this page worked really well when there was only one Debian release to support; we now have 10, 11 and 12, with very different Java versions on them. And when there was only one Red Hat version, the Red Hat page worked, but we now have 8 and 9, plus other variants like Rocky, Alma and Oracle. It's no longer a nice and simple world. Honestly, I think that's a brilliant and really useful idea, because in the worst case it's only adding redirections on Apache, and an index.html can also serve a body that redirects, if you have a web browser; so, a combination of both. That makes sense to me. The other benefit is that it will stop the enumeration of directory contents that happens on some of those pages: I just don't think there's enough value for people in listing directory contents. I don't remember whether it's mirrors or pkg that actually lists the directory contents, but I'm not persuaded that it's valuable. It's a conversation topic; I'll bring it up separately in an issue, we don't have to resolve it here. Makes sense; I think it's worth opening an issue to start the discussion. Don't forget about it, because it will help us: remember the confusing moment Hervé and I had finding out which index.html is generated from where, during the GPG key rotation, with the Jenkins theme (I forgot the name), the header and footer bits. So yeah, that will definitely help. Yeah, I don't think there's compelling value in having those exact web pages on the pkg site if instead we could redirect to documentation that really describes the situation for the user. Thanks for raising that topic, Mark. Do you have other major points? None for me. Okay. So, tomorrow for the security restart; I'm closing the issue review here. I just need a quick sync with the two of you, and see you in two weeks. To the others watching this recording: bye-bye.