Hello everyone, welcome to the Jenkins Infrastructure team meeting. We are the 2nd of May 2023. Today, around the table, we have myself, Damien Duportal, and Hervé Le Meur; Mark Waite does not seem to be here. Let me share the shared notes with you, so you can help me take notes while I speak too much. I've added the link to the shared notes in the Zoom meeting chat, and you can also find them in the usual channels. We also have Stéphane Merle, Bruno Verachten and Kevin Martens.

Let's get started with announcements. The weekly release has been released: is that 2.402? Three? No, let me check the one we currently have in production. Yes, you are correct: 2.403 (hello Mark!) is out, and the packaging worked on the first try. The next steps, I assume, are the remaining release checklist items, to be done later. But that means it's agreed to go for infra.ci. The container image has been built and the container is out. Is there anything else about the weekly release that you folks want to mention? It needs a merge of the changelog, but those are the various release checklist items. Okay, you don't need to list them here, we know they're on the checklist. Thanks Mark.

Second announcement: the update center certificate. Since the 1st of May, the crawler job and the update center job have stopped working, because we are one month ahead of the certificate expiration. I forgot it was one full month; I thought it was one week. There is an internal safety system that says "oh, one month left, let's fail with an error". So we will need, at least for today and tomorrow, for a few days until next week, to find a way to disable that check until we can renew the certificate; I'm almost sure there is a way in the code. But as was mentioned, we have plugins that are expected to be released but not available on the update center since yesterday. We will also have to update the job configuration, which is currently a six-hour floating window, to 48 hours, just to be sure we catch up on the deployments. And a message on status.jenkins.io wouldn't hurt, as well as an update, because we haven't updated the existing entries. I'll take care of that if it's okay for you: the plugin releases, and disabling the check for at least one week, so those tasks become the more important ones.

Unfortunately, I tried this morning: I had a virtual coffee with Olivier Vernin, and the key he sent me privately, after the validation from the board, isn't the CA key but last year's certificate key. So it's a miss. We need Olivier to start the whole procedure again on his vault, to encrypt and send me the new CA key; otherwise I won't be able to sign the certificate request from Stéphane. But once that's done, it should be easy to solve: one hour of work, once Olivier sends us the proper key. That's all for that specific topic. But that means the update center and crawler jobs, the jobs on trusted.ci in charge of generating the update center index and the tools index (you know, these tools, JDK, Maven, that Jenkins can auto-install; the metadata for both is signed by these jobs), haven't been updated for the last 24 hours.
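For reference, the kind of safety check described above boils down to comparing the certificate's expiration date against a 30-day margin. Here is a minimal sketch of such a check, assuming a local PEM certificate (the path is hypothetical) and the Python cryptography package:

```python
# Minimal sketch, assuming a PEM certificate at a hypothetical path:
# fail loudly when less than one month of validity remains, like the
# internal safety system described above.
import datetime
import sys

from cryptography import x509

CERT_PATH = "update-center.pem"  # hypothetical path to the signing certificate
SAFETY_MARGIN = datetime.timedelta(days=30)  # the "one month left" rule

with open(CERT_PATH, "rb") as f:
    cert = x509.load_pem_x509_certificate(f.read())

remaining = cert.not_valid_after - datetime.datetime.utcnow()
if remaining < SAFETY_MARGIN:
    sys.exit(f"FATAL: certificate expires in {remaining.days} days (< 30)")
print(f"OK: {remaining.days} days of validity left")
```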
A word about the Azure maintenance windows as well: Microsoft is planning a few maintenances during the month of May. Just to let you know, we will have to open helpdesk issues if some of our services are affected by these maintenances. I saw an email just before the meeting, and I had two or three in my backlog; that should be end of May or June. We'll need a helpdesk issue open if there is something to do; no helpdesk issue means nothing to do. That was just a note.

Finally, a word about the upcoming software update campaigns, as I will call them. The most important one: Ubuntu 18.04 reaches end of support at the end of May. So we need to update all of our remaining Ubuntu 18.04 instances. The tests we have done, and the things we have in production, show that we shouldn't have a lot of issues. The only exception is the packaging system, on pkg.jenkins.io. I was waiting for the LTS that will happen tomorrow before getting my hands dirty. The goal will be to containerize: to use a Docker container for the packaging steps, to provide the proper tools before trying to play around. That will allow us to upgrade the host machine to 22.04, and then we should be able to work on the container version to have a reproducible process for the packaging steps. The main targets are the Red Hat packaging tools: on Ubuntu 18.04 we have a specific version that we know works; on Ubuntu 20.04 the tool doesn't exist at all (no one was able to run createrepo on Ubuntu 20.04); and there is a new version on 22.04, the current LTS we are targeting, but it's a rewrite from Python to pure C, named createrepo_c. So we don't know if it works or not, and we have to spend time trying to package Jenkins with that new tool. That one is the highest priority. We should be able to finish before the end of May, except for the packaging.

So, Damien: yes, there is a facility we could consider, purchasing extended support from Canonical, if we have to stay on Ubuntu 18.04 for a little while longer. I don't know if they offer it for free to open source projects, I doubt it, but there is a product that extends the security support for 18.04, if we really need it. Canonical calls it extended support. Yes, but on the other hand, if we are not able to bump a six-year-old version of our packaging system, we might have other short-term trouble, right? Right, and I agree. Certainly the reality is that this is unhealthy, right? We need to switch to new packaging. Absolutely we do. But yeah, worth considering if we are stuck or if we don't have the resources. Thanks for sharing, that's a good point. And I think there was an experiment that did switch to using createrepo_c, so I'm not overly concerned by it. Same; it's more a question of timing.
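To make the createrepo_c experiment concrete: the containerized packaging step we're describing would, roughly, mount the repository and regenerate its metadata with the new tool. A minimal sketch, assuming a stock Fedora image that ships createrepo_c (the image and repository path are illustrative, not our actual setup):

```python
# Minimal sketch of the containerized RPM metadata step: run createrepo_c
# inside a throwaway container against the mounted repository directory.
import subprocess

REPO_DIR = "/srv/releases/redhat"  # hypothetical path of the RPM repository

subprocess.run(
    [
        "docker", "run", "--rm",
        "-v", f"{REPO_DIR}:/repo",
        "fedora:38",  # illustrative image; Fedora ships createrepo_c
        "sh", "-c",
        "dnf install -y createrepo_c && createrepo_c --update /repo",
    ],
    check=True,  # fail the packaging step loudly if metadata generation fails
)
```

createrepo_c advertises CLI compatibility with the old createrepo, which is exactly what we need to verify for our repositories.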
So, Kubernetes 1.25. We should be able to do it in June, because the end of life on Azure and DigitalOcean for the 1.24 we have in production today is end of July. My proposal is that we target the upgrade of our clusters to Kubernetes 1.25 in June. Is there any objection to that high-level proposal? Let's target June. End of life is July 2023 according to the table I was looking at; I didn't check for Amazon, but Amazon is always one version behind.

There is one more upgrade I wanted to push later, during the summer, but we might need to do it earlier. The good thing is that it looks like it's going to be okay, thanks to the year of cleanup: that's the Puppet 7 version. The reason is that we now have, in less than one month, four(?) Puppet modules whose latest version is breaking because it drops Puppet 6 support, which is the major version we are using in production, even though Puppet 6 is still an LTS and gets security support from Puppet Labs. They have recently introduced Puppet 8 on the enterprise side, and they should deploy and distribute Puppet 8 for open source soon. But that means we should switch to Puppet 7. The preliminary tests, at least the unit tests and Vagrant, show no problem on that one, whether we are on Ubuntu 18 or 22. So that one should be quite easy, but it will require shutting down the whole Puppet thing. We should not break services, but we will have to do it; it's a slow and careful operation because we have to do it machine by machine. So that's one full day of work and one full day of preparation. I'm not sure when we'll be able to do it; there is no emergency, so I propose to do it once the Ubuntu 22.04 campaign is finished. Does that make sense, any objection? Or do you think Kubernetes 1.25 should be done before? I don't have a strong opinion, that's why I ask. No objection for me: after Ubuntu 22.04 and before Kubernetes 1.25.

The reason I want to mention these high-level campaigns is that we haven't updated the infrastructure part of the Jenkins roadmap in at least one year, and I want to update it. These kinds of elements will be really nicely placed in the infrastructure section of the big graphical roadmap. I will share the pull request and the link with you once it's done. Do you have other announcements? Cool.

Next, I want to talk about the calendar. Next weekly: 9 May 2023, where I expect version 2.404. Next LTS is tomorrow, 3 May 2023; we expect 2.387.3. Kris Stern is release lead and announced on the Jenkins release channel that he plans to start at 4 AM UTC, which means 6 AM Central European Time. That maps perfectly to the time we are awake, because it takes two hours to build a release, a bit more, two hours and thirty minutes. So that means that starting at 8, 8:30 in the morning on our time, the time for Stéphane, Bruno, Hervé and I, we should see the end of the release and the beginning of the packaging. So if the packaging fails, that will be morning, beginning of day, for us. I will be here in any case, since I will wake up, start early and finish early, so we are covered. Hervé and Stéphane, if you are around during our morning, just in case, that would help me feel safer. No problem. And once it's released, we can start working on infrastructure again. Any question? Next security release: none that I'm aware of; nothing on the mailing list. And the next major event is cdCon, on the 8th and 9th of May. Mark? That's correct, I'll be there, in Vancouver, Canada. And NotMyFault will also be there. Yes, so Alex and I will see each other, and we hope to see Gavin Mogan. Do you have other calendar elements? Okay, that was perfect.

So let's proceed to the work that was done; I will try to be as brief as possible. On ci.jenkins.io, the BOM builds are now split. So theoretically, and what I saw during the weekend tends to prove that assumption, even if you have multiple BOM builds running and waiting for or using Kubernetes agents, the other normal builds, for plugin developers or other projects, still get agents allocated without waiting. So if there is a spike of BOM builds, you won't wait one or two hours before your build is picked up as a plugin developer. That should be a nice improvement. The resource sizing is still the same, we use the same machine size, and the BOM builds seem to work, from an operational point of view anyway. This weekend, though, I saw a lot of different infrastructure failures during the BOM builds on master.
Most of these errors are not reproducible between two builds. I saw artifact caching proxy failures due to temporary DNS resolution failures; I don't see anything we could do about the AWS DNS failing for a few seconds, except hosting the DNS ourselves, which is not worth the effort. Then some failures on what appears, and I'm not sure myself, what appears to be test failures inside the BOM code. I'm not sure if these were real test issues; it could be because the BOM and plugins were updated while we were working on the infrastructure. Is it related directly to the infra? I'm not sure I can correlate. The build time is a bit longer than it used to be, but again, I'm not sure if it's because the combinations increased, the amount of tests increased. That's still weird. But at least we have value delivered to ci.jenkins.io: plugin developers should not have to wait for the BOM builds.

As far as I can tell... well, I attempted to release a new version of the BOM over the weekend and the build failed. I didn't do the analysis to understand why; I just launched one now, and if we're lucky it will pass, and if we're not lucky it won't, and we'll diagnose why. The one you launched this weekend was the failure on DNS resolution. Ah, thank you. So clearly a transient failure; let's hope for the best. Great, thank you.

We were able to contain the excessive consumption on two different Azure usages. The first one was ci.jenkins.io, which had a lot of increased activity during the past two weeks, especially on the acceptance-test and core VM machines. The data we have in Datadog showed that these machines were clearly underused, so we moved down to machines at half the cost: the same amount of CPU (even if I'm sure we could decrease that too), but half the memory, 32 gigabytes instead of 64. That's effectively half the cost of the previous instances, so that helped a lot; we'll see if it stays that way. So that issue has been closed. The second one, the Packer builds, was on us. One month ago we changed the pattern we use for resource groups; think of a resource group as a namespace, or a box containing cloud resources. We changed the way Packer uses them, for security purposes, but we didn't adapt the garbage collector. So effectively, with all the experiments, work and fixes done on the Packer image builds, we had hundreds of dangling virtual machines and associated resources inside those groups. We have removed these resources, and we have added a garbage collector that has proven to be effective. That was a lot of money wasted; we will be more careful next time.
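The garbage collector mentioned here is essentially "list everything in the Packer resource group and delete whatever is older than the longest plausible build". A minimal sketch, assuming the azure-identity and azure-mgmt-resource packages (subscription, group name and age threshold are illustrative):

```python
# Minimal sketch of the Packer garbage-collector idea: flag resources in
# the build resource group that are older than any legitimate build.
import datetime

from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient

SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"  # hypothetical
RESOURCE_GROUP = "packer-builds"  # hypothetical group used by Packer builds
MAX_AGE = datetime.timedelta(hours=24)  # older than any legitimate build

client = ResourceManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)
now = datetime.datetime.now(datetime.timezone.utc)

for res in client.resources.list_by_resource_group(
        RESOURCE_GROUP, expand="createdTime"):
    if res.created_time and now - res.created_time > MAX_AGE:
        print(f"GC candidate: {res.name} ({res.type})")
        # A real run would call client.resources.begin_delete_by_id(
        #     res.id, api_version=...) with the proper API version per type.
```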
Thanks Hervé and Stéphane for pointing out that puppet.jenkins.io, thanks to your work on the alerts and the communication to the team, was using a lot of disk space. I was able to clean it up on Saturday; that was easy. There were dangling archives of the Puppet installation staying on the machine; they should have been deleted, and at one gigabyte per archive, with three archives duplicated in the root directory and in Olivier Vernin's home directory, that was a lot of gigabytes used for nothing. So removing them and vacuuming the journalctl logs was enough. So that's okay. We also had a GitHub commit permission request; that was a repository permission matter for the docs, not directly infrastructure. Thanks Hervé. And thanks to everyone involved in fixing the crashes on the plugins.jenkins.io website; that has been fixed as far as I can tell, as per what you reported, Hervé. Anything you want to add on this one specifically? Cool.

A validation issue on pull requests: a new maintainer of a new plugin was discovering all the tasks to do. Nothing expected from the infrastructure; to summarize, better reading of the docs. Login issue on Artifactory: that sounds like the same kind of issue, a new maintainer. There is a chicken-and-egg problem when you have an account that has never connected to Artifactory, but that has been solved.

Certificate issue on assets.ci.jenkins.io: we have a VM for that; as far as I can tell that domain serves and protects some static assets. We had the same kind of issue, I think it was... was it pkg, Mark, or was it another machine? Last week, the certbot system wasn't working. The crontab was running, but it runs in quiet mode (we can change that, it's part of the Puppet module), so we have no log of what it failed on or what it refused. But it had been weeks since the certificate should have been renewed. That might or might not be a consequence of the refactoring done in March with the Python version; there were a lot of tangled issues. The certificates were renewed; we'll have to be careful in July for the next renewal of this one. I hope it will be fixed with Ubuntu 22.04, though, which will feature a recent version of certbot; with that and recent patches I hope we'll have this issue fixed. The alternative solution would be to disable the automatic crontab in the Puppet module and write ourselves a script that reports a log somewhere. Only a few lines of code, but it has to be written. Any question? None for me. When we detect those kinds of problems, even if we need to do an interactive fix, I'm okay with that; it's typically only once every three months at most, as long as our detection is there. No problem.

I want to open an issue to track adding SSL certificate age checks. Because right now you are watching, or rather your system, Mark, is watching this for us. That's really useful, but I think it would be a nice team improvement to move that burden away from your infrastructure, or at least to rely on our Datadog system as the first level of watching. There is already a Datadog probe monitoring ci.jenkins.io with alerts; it fires every time there is a security advisory and ci.jenkins.io is put down. I don't know if you've seen it: you get two alerts, one saying the ci.jenkins.io service is down, and a second saying the SSL certificate is less than 30 days from expiring. It's a false positive, but it means we already have something in Datadog checking the age of certificates. I think we should set this up and challenge it, Mark: next time you see the alert on your own system, we compare with Datadog to see if it detects it properly. I like that. I think the detection of assets.ci.jenkins.io was more of a happy accident of my detection system than intentional, because I don't think I had ever enabled checking of assets.ci.jenkins.io; I think the message happened to appear inside another check. And we don't like happy accidents, right? They're really nice to have, but we don't like them long term. Thanks for that, Mark.
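As a concrete illustration of the check we want Datadog (or a small script of ours) to own, here is a minimal sketch that performs the same "less than 30 days" test against live endpoints; the host list is illustrative, and it uses only the standard library:

```python
# Minimal sketch of a certificate-age probe: connect to each host, read the
# served certificate, and alert when fewer than 30 days of validity remain.
import datetime
import socket
import ssl

def days_left(host: str, port: int = 443) -> int:
    """Return the number of days before the host's TLS certificate expires."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    expires = datetime.datetime.utcfromtimestamp(
        ssl.cert_time_to_seconds(cert["notAfter"]))
    return (expires - datetime.datetime.utcnow()).days

for host in ("assets.ci.jenkins.io", "ci.jenkins.io"):  # illustrative hosts
    remaining = days_left(host)
    status = "ALERT" if remaining < 30 else "ok"
    print(f"{status}: {host} certificate expires in {remaining} days")
```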
Hervé, thanks for managing the switch to Mailgun for sending the emails of accounts.jenkins.io. The goal was for us, as the infrastructure team, to have a way to observe and get a status on the sent emails, for when a user you are helping says "I never received the email". There are numerous such cases, and we were blind before that change, so thanks Hervé for that. Hervé is currently driving the Mailgun migration. We still have the old SendGrid account created by Kohsuke, but as per our chat with Kohsuke, there can be only one administrator at a time there. With Mailgun we can have several, and we could envision setting up our own SendGrid on Azure, which would be easier for access control in the future; but in the short term we use the free Mailgun account. So yeah, it sounds like it helped you, can you confirm? Yes, we have the last five days. What remains to be seen is how many emails we are sending each month, to be sure we don't go over 100. We want to check the amount of email sent per month to stay on the free plan. Do you want to set a reminder at the end of the month, or maybe weekly, for that? Yes, one at the end of the month should be enough. I'll put a reminder in two weeks. Can you add it on the jenkins-infra team, just in case, so that if you are ill someone else can take over? By default that should be you, since you manage the whole thing; jenkins-infra team it is.
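For that reminder, the check itself could be automated. A minimal sketch, assuming the Mailgun stats API and its accepted-events response shape (the domain, the limit and the exact response fields are assumptions to double-check against their documentation):

```python
# Minimal sketch: count this month's accepted emails on Mailgun so we can
# verify we stay under the free-plan threshold discussed above.
import os

import requests

DOMAIN = "accounts.jenkins.io"   # hypothetical Mailgun sending domain
MONTHLY_LIMIT = 100              # the free-plan threshold discussed above

resp = requests.get(
    f"https://api.mailgun.net/v3/{DOMAIN}/stats/total",
    auth=("api", os.environ["MAILGUN_API_KEY"]),
    params={"event": "accepted", "duration": "1m"},
)
resp.raise_for_status()
sent = sum(point["accepted"]["total"] for point in resp.json()["stats"])
print(f"{sent} emails accepted this month (limit: {MONTHLY_LIMIT})")
```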
Release of the cd-events plugin not published to the update center: that was fixed; it was a temporary issue due to the RPU build failing. "Switch mail provider to Mailgun": that's what the previous item was; let me retitle it to that one. Thanks for the work on that part, Hervé. And we had an outage of the cik8s cluster, for a lot of reasons; we had to recreate the cluster from scratch. That went well, but it was wasted time for us. At least we now have a cluster properly named, and it was the opportunity to improve the code quality of our Terraform project for AWS. So, back in line. There were also two issues closed with no action. One was closed by the requester, Basil: there was an issue with the CD process on one of his projects inside the Jenkins organization, but it's not the case anymore; thanks Basil for cleaning it up. And DigitalOcean confirmed that the brute-force SSH attack was not us. We spent time for nothing, but at least we discovered a bit more of Datadog and the DigitalOcean tools; no action expected from us in that area. Any question?

Okay, so let's cover the work-in-progress elements, the ones currently on the milestone, and let's see if we should keep working on them before looking at the new items. "Cannot create account": what's the latest status of this one? I haven't checked. They checked their email but don't receive anything. We'll probably propose that this person use a different mail provider. Yeah, good point. Either they check with their administrator again, and remind them, because most of the time people don't want to do that, for whatever reason, and we don't have to care whether they don't like their admin or it's a complicated topic; they have to check with their admin, or change the email to be used. There is nothing else we can do about it, because maybe their admin has blacklisted our domain or IPs, or whatever. Unless they start that contact, we won't have anything to show if we go to their admin ourselves. That's why we need at least them to start the discussion; and if they don't want to contact their admin, they will use another address. Thanks for managing that, because it's not an easy one. Any question? I propose that we keep that one on the current, er, on the new milestone, and wait for the feedback. Is that okay for you?

Artifact caching proxy is unreliable: that one we keep open, because it's a reality. The BOM builds, since Friday or Saturday, are not using DigitalOcean anymore, neither for the artifact caching proxy nor for agents. So the issues we saw on DigitalOcean should be gone, because we weren't able to reproduce or see any case outside the BOM builds. There was a certain threshold of requests: we know there is a limit to the capacity of the ACP system, at least on DigitalOcean, past which it starts to not answer properly. There might be fixes, but if we don't need them, let's keep things like that. There were issues on AWS this week, though, so we'll have to monitor the BOM builds carefully to see if we don't get the same issue on AWS. Finally, on Azure it's still stuck: we need to work on the ci.jenkins.io migration. Do you have any questions or things on that topic? Okay, we'll keep it open so we can continue watching it and see the results once everything is migrated to Azure for the virtual machines.

Migrate trusted.ci.jenkins.io from AWS EC2 to Azure. Stéphane, what's the status of this one? I am almost done with the second VM. If I manage to get the private IP working, I will be able to start on the third one, which is the permanent agent, and all the setup of the network and the network security between the three of them. Next step is the permanent agent. Cool. We can check that part together after the meeting. I just realized you might want to add a private DNS A record for the IP: using a static IP can be complicated, so better to use a dynamic one with a private DNS record. Okay. That should unblock you and be a viable solution. Okay, thanks Stéphane. So we keep this one on the milestone, is that okay for you? Yes, please.

Next one: Hervé, add Launchable to the agents. I believe you need it... Still work in progress. Yep, I have difficulties installing Launchable on Windows. We've got a somewhat working Python installation, but when I try to install Launchable, it doesn't succeed at finding setuptools(?). Okay. So I believe you might need a bit of help in that area, as we discussed this morning: we need a session to unblock it, but you're almost there. Cool, thanks for the report. Anything to add? As soon as this install is good, I will be able to propose Basil's simplification of the pipeline, which removes the need to run the install step each time. It will also be available for Windows builds, and that could be the part where we gain the most build time with Launchable. So the sooner it's available, the better. Cool, thanks for the work. And thanks, Basil, for that part as well.

ci.jenkins.io: use a new virtual machine instance type. I've started a rough draft locally for setting up the new environment. It's a mix of what Stéphane is doing on trusted.ci and what used to be done for the current ci.jenkins.io. We have a tag on the Azure repository named legacy-tf, which was the set of Terraform code used for most of the Azure resources. We removed that code but kept the tag, because it helps a lot when trying to understand which resources do what, or when listing resources. Those plans might not be up to date, from what I see: most of the attributes have changed and some resources don't exist anymore. But anyway, mixing both codebases allows me to have, let's say, proper code.
I've also selected the new instance size, a bit smaller than the current one, because we don't use all of the memory on ci.jenkins.io. It's still 40 bucks per month; that's not a lot, but I mean, we don't need a high-performance machine if it's not needed. Right now the status for me is naming, and I've spent some time carefully reviewing Stéphane's work to be sure I don't miss anything with the new Terraform provider for Azure. My main challenge is that we have different cases between trusted and ci. In the case of ci.jenkins.io, the machine will need two network interfaces: one for public access, through HTTP or inbound agents, exposed through a public IP; and one private interface in the private network, to be reached through SSH. So that's subtly different from trusted.ci, which sits entirely inside the private area, behind the SSH bastion machine. Yeah, but that's the way I was starting at the beginning, so you may find it. And that's one of the points that Tim Jacomb made on your pull request, Stéphane; I want to discuss it a bit more with him. He wants us to use a Terraform module to avoid repetition, for "don't repeat yourself", but I don't see the point if it ends up as one "if" per case: one if ci, or if trusted, or if cert.ci, or for a machine we don't have. And maybe Tim thinks that I'm a beginner in Terraform, which is not the case, and that we'd have to learn how to do it, and so lose time just on the templating. It wouldn't be lost time, but in that case, more time. Even then, the argument could be that I'm missing something; that's why I want Tim to explain a bit more, but I don't see what there is to reuse. We have three cases, and each case is subtly different: one is fully private but reached through VPN, so only one interface; cert.ci needs two interfaces on two different networks; and trusted is fully private, but behind the SSH bastion. That's three different cases. So we cannot put the network interface in the module; we cannot put the public IP in it. Eventually data disks, the data disks and the base of a virtual machine, but the virtual machines don't have the same setup. So if the goal is to write ten lines of map structure in a Terraform local variable instead of ten lines of Terraform, I honestly don't see the point. But there might be other reasons I'm missing, and we might change in the future; that's not a problem, worst case we factorize.

What do we have next? Migrate applications from the system pool to the Linux pool on privatek8s. Hervé, Stéphane, we saw this one: we have pods running on the system pool, so we need to add tolerations and, eventually, node selectors. Do you think, Hervé, you'll be able to work on this before the end of the week, or do you have too much to handle? Yeah, I can take this one; I have almost nothing this week. Perfect. Let's add the node selectors first, then the tolerations later. The reason I'm writing this, Hervé, is that the node selector will create the affinity to the Linux pool. Because if we add things on the system pool, that might have side effects on the technical systems such as CoreDNS, or the CNI and CSI controllers. So better to start with node selectors to fulfill this one, then see if we do the same for the ingress controllers and the others, or just update everything at the same time, once the services have the tolerations.
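For the node-selector-first step, the change is a small patch on each workload. A minimal sketch with the official Kubernetes Python client (the deployment name, namespace and pool label are hypothetical for privatek8s):

```python
# Minimal sketch: patch a deployment so its pods get affinity to the Linux
# applications pool first; tolerations can be layered on in a second step.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() in-cluster
apps = client.AppsV1Api()

patch = {
    "spec": {
        "template": {
            "spec": {
                # Step 1: pin the workload to the Linux applications pool.
                "nodeSelector": {"agentpool": "linuxpool"},  # hypothetical label
                # Step 2 (later, as discussed): add the tolerations, e.g.
                # "tolerations": [{"key": "CriticalAddonsOnly",
                #                  "operator": "Exists",
                #                  "effect": "NoSchedule"}],
            }
        }
    }
}
apps.patch_namespaced_deployment(
    name="some-webapp", namespace="default", body=patch)  # hypothetical names
```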
Then, once that's done: increase the disk space for the system pool on privatek8s. If you don't mind, I want to take this one, for a specific reason; I don't mind pairing with the two of you, or just one of you, rather than doing it alone. It's just that in the Ubuntu 22.04 campaign we have to update the system pools. So we'll want to create a new system pool, migrate the traffic, then bump, and finally do the update at the end. Since we have to update system pools anyway, this will be a way to try a new procedure for updating them, a blue-green deployment, which will avoid errors in the future. At any moment we would have two different pools, and the operation, instead of changing the disk, will be: create a new pool, migrate the pods, drain the existing ones, then remove the old pool, and recreate it from scratch if needed. So: got to try, with Ubuntu 22.04 and a blue-green deployment. I don't know if it's related, but I remember we had a problem with CoreDNS having two instances in the same pool. Yes, that's in that area as well, but not right now; once Hervé is finished with the previous item, we can do one per milestone on that one. The idea of the blue-green deployment is that during the operation we will have two system pools, two Linux node pools. We might not need that for the other pools, since they host agents, but for this one, with web services running on it, it's important to have two at the same time.
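Sketching that blue-green rotation as commands: the AKS cluster, resource group and pool names below are hypothetical, and the exact flags should be verified before any real run.

```python
# Minimal sketch of the blue-green system pool rotation: create the new
# pool, drain the old nodes so pods reschedule, then delete the old pool.
import json
import subprocess

def run(*cmd: str) -> str:
    print("+", " ".join(cmd))
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

# 1) Create the new ("green") system pool next to the old one.
run("az", "aks", "nodepool", "add",
    "--resource-group", "prod-jenkins", "--cluster-name", "privatek8s",
    "--name", "syspool2", "--mode", "System", "--node-count", "2")

# 2) Drain the old ("blue") nodes so the pods move to the green pool.
nodes = json.loads(run("kubectl", "get", "nodes",
                       "-l", "agentpool=syspool1", "-o", "json"))
for node in nodes["items"]:
    run("kubectl", "drain", node["metadata"]["name"],
        "--ignore-daemonsets", "--delete-emptydir-data")

# 3) Remove the old pool once everything runs on the new one.
run("az", "aks", "nodepool", "delete",
    "--resource-group", "prod-jenkins", "--cluster-name", "privatek8s",
    "--name", "syspool1")
```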
Migrate Google Analytics: I need help from Olivier, who forgot about granting me the correct administration rights. The alternatives are KK or Tyler, who aren't responsive. Past-releases pages are taking a long time to load: if you don't mind, I will postpone this one to the milestone after, because we won't have much time to spend on it and I want to work on the Ubuntu campaign first. The machines involved are using Ubuntu 18.04 nodes, and the driver in the Ubuntu kernel is very different for Samba/CIFS mounts with the new Ubuntu 22.04. So I would rather upgrade the Ubuntu operating system with the new kernels, see how it behaves for the releases and the get.jenkins.io web service, and then decide whether we have to switch to NFS-based PVCs or another solution. Any question? I need to update the issue, "Ubuntu kernel upgrade with better SMB support"; there are the milestone and description to amend, and either Mark or I will take care of that. I will add a message for Alex, because he might be waiting for this one. And finally, Hervé, are you still okay to start planning the public cluster migration again? I don't remember which services; can I let you evaluate it at your own pace and prepare the plan, service by service? Is that okay for you? Yes.

Now, new items, or items on the backlog that we want to have a look at. Decrease AWS costs: I've updated it. We were able to consume 5k less than last month, which is good, but it's still too much. So the idea here is to continue our efforts. The bill is split half and half between two regions. The work Stéphane is doing, the datacenter migration somewhere else, will help on one area. The other area is directly the builds from ci.jenkins.io: the efforts made by Basil, Jesse, Tim, Mark and Hervé on the BOM builds. I hope this effort pays off in the month of May and decreases the billing here. I hope so; we'll see. Right now we're trying to contain this cost, but we need to move the web services off to cut the bill by half. So Stéphane keeps going in that area; there is no immediate action for us. We confirmed that what we did over the past weeks was okay, the garbage collecting and decreasing instance sizes, and I don't see an immediate action for us there. So if it's okay for you, we keep that item on the backlog, and I plan to close it if we're able to pass below the threshold.

Puppet upgrade campaign to latest 7.x: I started some experiments, and this one goes automatically into the new milestone. The goal is to prepare updating Puppet to Puppet 7. I propose to keep it on the backlog and wait for the Ubuntu 22.04 campaign, as we stated earlier. Is that okay for everyone? Okay. Ubuntu 22.04 ARM64 virtual machines: Stéphane, we were successful in publishing the virtual machine images, so the next step is to try one of these images on one element of the workflow. Can I take that for this milestone? Do you think you will have time, with trusted? Trusted, and something else for when I'm bumping my head against the wall. No problem. So the first step is to select which job on infra.ci could be a good candidate for using ARM64, and then start using it. Okay. My recommendation is to look at jobs using tools built in Go: the Terraform or Kubernetes jobs could be a good fit, and Packer itself could be really good to try. That could be a nice first step. I will go for Packer. That's the best one. "Job on infra.ci to test with ARM64": okay, adding it to the next milestone then. Thank you.

Renew the update center certificate: definitely going to the next milestone, because of what we said earlier. pkg.origin.jenkins.io, where Puppet keeps updating the GPG key: that one stays on the backlog. It's really annoying to be pinged by the Puppet agent, but I won't have time to spend on this one. The Ubuntu 22.04 campaign is not on the backlog anymore. Azure billing on ci.jenkins.io: I will keep it on the milestone, just for me, to do two things. First, summarize the whole month of April in terms of outbound bandwidth cost, day by day, to see if it increased, decreased, or stayed the same since we enabled the S3 artifact manager. And second, what I did just before the meeting: I'm adding a reminder, once a week for the whole month, to check the cloud costs, at least on Azure, and I believe also on AWS. We can clearly improve the alerting system and everything, but we need to check these elements because they are costing us a lot.
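For the AWS side of that weekly check, the day-by-day view can be pulled from the Cost Explorer API. A minimal sketch with boto3, over the April 2023 window discussed above (credentials assumed to be in the environment):

```python
# Minimal sketch: print the daily unblended AWS cost for April 2023, the
# same day-by-day summary we want for the billing review.
import boto3

ce = boto3.client("ce")  # Cost Explorer; credentials from the environment

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2023-04-01", "End": "2023-05-01"},  # April 2023
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
)
for day in resp["ResultsByTime"]:
    amount = float(day["Total"]["UnblendedCost"]["Amount"])
    print(f"{day['TimePeriod']['Start']}: ${amount:,.2f}")
```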
Migrate updates.jenkins.io to another cloud: that one will allow us to work on the AWS costs. Right now I'm delaying it by one milestone; I want to focus on ci.jenkins.io in Azure first, so this one stays on the backlog. Use WebSocket for agents: if one of you is bored, you could start working on that. The goal is to enable WebSockets on the Apache server of ci.jenkins.io and experiment with whatever experimental pod template you want, to use WebSocket instead of TCP. You can take it if you're interested; I'm not adding it to the milestone, but you can take it if you're bored. One last element: as per Mark's log analysis for JFrog, we are still consuming way more than expected. It's still way less than the month before, at least we were able to gain 10 terabytes, but still, there isn't any abuse, so no easy win like blocking an IP or doing something similar. So first we expect to check carefully the logs sent by JFrog, and see if there is any issue in the logs or in the way we calculate the outbound bandwidth, and to check with JFrog whether the current state is okay for them or not.

In the event they want us to enable authentication on the non-release repositories: I successfully started a high-availability LDAP instance on bare metal, on my own, this weekend. There is an out-of-the-box protocol inside OpenLDAP, for a few years now, named syncrepl, that allows different replication patterns, master with multiple read replicas, or multi-master, which is quite useful; and you can have full replicas or delta replicas. Right now I'm targeting a multi-master setup with full replicas, because we don't have a lot of data and the frequency of updates is really low in our case. It's also way easier to handle, especially in a Kubernetes world, because we could just scale the LDAP horizontally instead of having separate write and read systems. The rate at which we change the data is not high enough for us to risk losing data when there is a network partition; we are clearly far away from the limits of this system. So that would help us a lot, especially with the PVC that is slowing down the stop and start of the container: in that case we should be able to keep one PVC per instance, and we should be able to operate the LDAP cluster without any downtime. That would add a safety net if we need to enable authentication for JFrog.
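One practical question with multi-master syncrepl is knowing when the replicas have converged. OpenLDAP exposes the replication state as the contextCSN attribute on the database suffix, so a small probe can compare it across masters. A minimal sketch with the ldap3 package (hosts and base DN are hypothetical):

```python
# Minimal sketch: compare the contextCSN replication cookies of two
# syncrepl multi-master OpenLDAP replicas to check they have converged.
import ldap3

BASE_DN = "dc=jenkins-ci,dc=org"  # hypothetical base DN
HOSTS = ("ldap-0.example.internal", "ldap-1.example.internal")  # hypothetical

def context_csn(host: str) -> set:
    """Read the replication cookie (contextCSN) from one replica."""
    conn = ldap3.Connection(ldap3.Server(host), auto_bind=True)
    conn.search(BASE_DN, "(objectClass=*)", search_scope=ldap3.BASE,
                attributes=["contextCSN"])
    return set(conn.entries[0].contextCSN.values)

csns = [context_csn(h) for h in HOSTS]
print("replicas converged" if all(c == csns[0] for c in csns)
      else f"still syncing: {csns}")
```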
Here we are. Before we check the latest untriaged issues: do you have topics you would want to work on in this milestone that I forgot to mention? Okay, no topic. So, moving on to the issues. Let's look at the recent ones: "find a way to support running Linux containers on the Windows VM agents"(?). Thanks James for opening that issue. We had a private discussion about it: there are at least two projects, I think both are plugins. Yes, two Jenkins plugins whose build process needs to run the Maven and Java commands on Windows hosts, and which require executing Docker commands from that Windows host against a Linux-container-enabled Docker engine. We don't have this in the infrastructure today. That's a legit requirement; I don't say we have to work on it now. The thing is, there is a discrepancy: today most developers use Windows 10, or their own system, with Docker Desktop, and Docker Desktop is able to switch from Windows containers to Linux containers. The downside is that in our case we only use Windows Server, because we have to work on headless systems. So I've tried to summarize the different choices we have; none of them is obviously better than the others, and each one has its own cons. I wanted to hear feedback. Right now, with the help of Jesse(?), we were able to improve failure detection for these plugins: they now build properly on Windows machines, and if the build detects it's on Windows, it won't try to run a Docker Linux container, for now. But yeah, they have legit cases. Among the solutions, one would require adding a Windows 11 template in Packer, either next to the existing one or as a new one. The problem is for ACI or Kubernetes containers: there is no base OS container image for Windows 11, so we would need to maintain both Windows Server for containers and Windows 11 for virtual machines. That's quite the effort. I personally like the WSL idea: maybe we could have WSL installed by default on the Windows machines, and if these plugins want to use Docker on Linux, they could start a WSL engine that provides Docker inside Ubuntu, and they should be able to bind to it. But that's still a lot of effort.

So yeah, that might need specific work. Right now, I ask the question: does it concern a lot of plugins? And if not, shouldn't these plugins get a custom setup on their side, instead of us adding all of this complexity to the plugin build library? Is it worth it, is it a concrete need? Yes, there is one. Not a lot; that's not huge, though I don't know what your definition of huge is. It's a legit case that happens. One of the most important: people running agents on Windows who use the Docker Pipeline workflow. They have their standard Windows machine with a JVM running the agent, because that's the way it works on a Windows system, it hosts the inbound agent process; and they have a WSL Linux machine. You need to test the case where the agent runs on the Windows machine and needs to start a container on the Linux side. We need to test these elements, especially the file path conversion between the container agent and the host machine. These are real use cases. There aren't many, and it's not a priority, but we need tests for these elements: unit tests cannot cover that, you need integration or acceptance tests. These are exotic cases, but legit. Not a priority, because we weren't able to run these tests until today, but increasing the coverage in that area wouldn't hurt.

I like the solution proposed by James, that's really nice of him: he shared with us the container-mode switch, but that's a specific feature of Docker Desktop, hence my proposal to use Docker Desktop on Windows 11, because of course you cannot install it on Windows Server, that would have been too easy. But we can install Windows 11 virtual machines. Yes... you provide a way to have Windows 11 virtual machines? Because I'm running a Windows Server machine with the engine installed on it. Can you repeat? I'm not sure I understand. For my experiment, I created a Windows Server virtual machine and I installed Docker Desktop on it. Did it work? It's working, yes. Okay. My information about Docker Desktop comes from the people packaging Docker Desktop, who told me it's not supported. I mean, I don't mind, we could try: instead of Docker CE, we could add Docker Desktop. We have Chocolatey, and there is a package for Docker Desktop on Chocolatey. It wouldn't hurt to try whether it works; worst case, we still know how to install Docker CE, so we can always go back to Docker CE at any moment. That could be good news. I didn't try, because when the people in charge of packaging tell you "we don't support that", I won't even try; I trust them, right? But maybe it's just their QA position. So that's a good point, thanks for sharing, and I think we should absolutely try. I could also try now. Come on. Good point. Just one point: keep in mind interactive versus non-interactive. What you can do interactively, we are not skilled enough in the Windows area to easily make non-interactive, like we do on Linux. I spent some time on that last month, and you spent time on the Launchable install; Docker Desktop is another order of magnitude in terms of complexity. But yeah, if it works, please add a comment on the issue, because we could then try something with it. Thanks for the feedback, Hervé. I'm removing the triage label and adding that one to the infra-team sync for next week.
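For the record, the non-interactive experiment to try would look roughly like this; the Chocolatey package id is the one found on the community feed and should be verified, and this is a sketch, not our validated install path:

```python
# Minimal sketch of the unattended install experiment discussed above:
# try the Chocolatey package for Docker Desktop instead of Docker CE.
import subprocess

# "-y" answers prompts; "--no-progress" keeps the CI log readable.
subprocess.run(
    ["choco", "install", "docker-desktop", "-y", "--no-progress"],
    check=True,
)
# Worst case, we fall back to the known-working Docker CE installation.
```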
Next one: find a way to avoid non-expirable certificates for the update center. Oh, just a minute, sorry. So, there had been a misunderstanding: we talked as if we had to share the key of a certificate valid for 10 years, but it's the certificate authority. However, most of the points remain, and we want to start the discussion about what threat model leads to a 10-year-valid CA, used on the Jenkins core side to validate the metadata from the update center, combined with a one-year or six-month rotated certificate for the update center itself. There might be other solutions, valid or invalid, but the goal is to reassess the threat model, to validate or change how it's implemented inside Jenkins. So no action expected from us here, except keeping the discussion going. It might turn into a JEP, though. That's why I'm removing the triage label, but there is no action expected from us, so no milestone here. Is that good for you?

And the third untriaged issue: I'm not sure I understand everything here, so I will assign this one to Mark. What I understand is that we have separate distribution lines, the versioned update center files created when the update center generates its index. When we have plugins released under a new LTS version, that one gets created: it's not created when the core version is released, but after a given weekly or LTS release of core, when a new plugin is released; that new plugin is the first to say "hey, I've been released under that version of Jenkins". That explains the discrepancy Alex noticed in the update center files. So, as per Mark, there is no action expected from us here. I will remove the triage label and assign it to Mark; I just want Mark to be sure that it's okay and that there isn't something hidden behind it for us. If it's okay for you, we'll add this to the current milestone; it might be closed with nothing to do, but this way we don't lose track of it. Is that okay for you?

I don't have other issues, no new issues, no other elements. Do you have something else, or are we able to close? Good. Cool. So, before closing, I just saw a new message. Hello Vignan! I'm reading aloud the message you just posted: "I'm new to Jenkins and want to contribute to Jenkins infra. Can you share some resources to get started?" I would suggest, hello Vignan, I would suggest you post your questions on community.jenkins.io, and maybe Adrien will be able to help you clarify some parts of the process from there. We, as the infra team, won't really be able to help you with the plugin health scoring code. Yep, we don't really work on that code, but more on the infrastructure side. Vignan, it also depends on which kind of skills you want to grow or learn, and what you already know: do you want virtual machine management, fleet management, Kubernetes, Terraform projects, other things? You want to improve your programming skills? Yes. Then Hervé's tip is the best one, because we don't code a lot of Java; we work on the infrastructure layer, that's what we do. So if it's okay for you, I would follow what Hervé said: open a topic on community.jenkins.io; you will find plenty of mentors there who will help you in the Java area for specific programming skills. Does that answer your questions? One, two, three... I'm not sure you heard us. Thanks for asking, Vignan; I don't know if you can hear us, but yeah, as Hervé said... Vignan, do you mind switching your audio on, so it's easier for us to follow?
So yeah, as Hervé said, we recommend you start with community.jenkins.io for working on programming skills, because the infrastructure weekly meeting you have here is only about the Jenkins public infrastructure: we are not specifically programmers. I think we can hear you. Don't hesitate to mention our names on the topic you open on community, that will make it easier; and if you don't get an answer back, or no valid answer, don't hesitate to come back to the meeting next week. Okay for you? No objection. Okay, I'm going to stop sharing my screen. Cool. You did right to ask the question, Vignan; that's a good way to take the first step. So welcome to the community, I hope you will find what you expect, and see you in the upcoming days or months. For everyone: I'm going to stop recording. For everyone watching us, see you next week. Bye bye!