Hi everybody, welcome to this new Jenkins infrastructure meeting. So the major event that you may have seen since the last meeting is that we migrated the JIRA instance to the Linux Foundation. We did it yesterday. It was a little bit harder than expected, because we did almost the same thing as during the testing period, but for some reason we had database issues. So we had to restore from different backups, and what finally worked was to restore the backup we made for the test one month ago, and then upgrade that backup with the latest data from yesterday. So yes, it took us several hours to finally manage that. But the good thing is that we are now officially running on the Linux Foundation, which should reduce the effort to maintain that service. So if you have any questions, feel free to ask. The main difference right now is that if there is something wrong with the service, we have to open an infrastructure support request on the Linux Foundation side, but otherwise we keep our administrator access to the service. So it's almost the same situation as in the past, except that we don't have to upgrade it anymore. Any questions? So, yeah, I think everything is set there. I propose to move directly to the next topic, which is the Docker image for Windows. Garrett, do you have any news here that you want to share?

I mean, that's pretty much done, I'd say. I spotted an issue with the host that we're building on AWS this morning, where the disk size had dropped right down to about 20 gig. I'm not sure why, but I've managed to tweak that so that it's actually running with a 100 gig disk, and I put a test in there as part of the Packer build so that if we upgrade the base image again and somehow get a smaller disk, it should fail the Packer build (see the first sketch at the end of this section). So it's pretty certain that Docker works and everything now, which is quite nice. I'm just monitoring that on ci.jenkins.io before rolling out that AMI version across the rest of them. And I think that's about it, apart from maybe a bit of pressure on AdoptOpenJDK to see if we can get that PR merged.

Great. Any questions for Garrett? Let's move on. The next topic is about the Azure account. First, we have quite good news here: Azure offered us some credits that we can use. It represents one month of usage, but it's still more than welcome, and we will use it to keep reducing the cost for the foundation. It was not expected, and that's really great. The other discussion happening at the moment: I have a meeting with Linux Foundation folks later today to discuss how we can migrate our Azure account to the Linux Foundation. The current state is that I'm the owner of the Azure account (Tyler was the former owner, then me), and I would like to transfer that responsibility to the Linux Foundation. We are also looking at ways to pay the invoices on time, so we don't have to send a lot of emails asking people to pay them. I can't really tell at the moment what the outcome of that meeting will be, but that's the current discussion happening. Any questions? No? Then I propose to move to the next topic.

We had issues today with the VPN. Basically, the certificate revocation list expired on the machine, so we had to generate a new one.
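Here is the disk-size guard Garrett described, as a minimal sketch. The 100 gig threshold comes from the discussion; everything else (the mount point, running it as a shell step) is an assumption, and since the real image is a Windows AMI the actual check is presumably a PowerShell provisioner rather than bash.

```bash
#!/usr/bin/env bash
# Illustrative version of the "fail the build if the disk shrank" guard described
# above: if a future base-image upgrade somehow yields a smaller disk, the image
# build stops here instead of shipping a broken AMI.
set -euo pipefail

required_gb=100   # expected disk size from the discussion
actual_gb=$(df -BG --output=size / | tail -n 1 | tr -dc '0-9')

if (( actual_gb < required_gb )); then
  echo "Disk is ${actual_gb}G, expected at least ${required_gb}G: failing the build." >&2
  exit 1
fi
```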
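On the VPN issue just mentioned (the discussion continues below): the expiry of a certificate revocation list can at least be checked ahead of time. A minimal sketch, assuming the CRL is available as a local PEM file at a hypothetical path:

```bash
#!/usr/bin/env bash
# Print the date after which the VPN's CRL is considered expired.
# The file path is hypothetical; output looks like "nextUpdate=<date> GMT".
openssl crl -in /etc/openvpn/crl.pem -noout -nextupdate
```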
And so we had to build a new Docker image and publish it. I don't have an easy way to monitor that for the future, and considering that it does not affect a lot of people (it mainly affects administrators), we still had time to fix it and take the needed actions. I updated the documentation, so feel free to look at it, and ask if you have any questions. I still have to update the list of people who can access the VPN CA key. I was thinking of adding Tim, Jacob, and Mark to this. I don't want too many people on it, but we need someone other than me who can request and sign certificates for the VPN configuration. Any questions?

Hi, Daniel. The next topic that I would like to bring up, because I'm almost done with it: last week I've been working on a PR to guess parameters for the releases. The idea is that instead of specifying parameters such as where to fetch the code from, we use the branch name to identify which kind of release it is. For example, if we are on the master branch, we assume that it's a weekly release; if we have "stable" in the branch name, we assume that it's a stable release, and so on (see the first sketch at the end of this section). The code is there, and I also added unit tests for the Python part and for the bash part using bats. The only thing I'm not sure about at the moment is which branch we could use for the weekly security releases, because right now we're using code names there, so I cannot guess any information from them. That's the only information I'm missing in order to merge that PR. And yeah, I'm looking for reviewers here.

And since Daniel is here, I propose we quickly get his feedback regarding the repo.jenkins-ci.org status with JFrog. Daniel, do you have any new information?

No new information. I asked Baruch about progress there related to storage. He confirmed that traffic is reduced after we fixed the tool installers of the Allure plugin. We haven't made progress regarding storage yet; I had a few follow-up questions and he hasn't gotten back to me. We also don't have the new stats that no longer include the Allure plugin. So it's unclear to me who exactly is waiting on whom, but it probably makes sense for us to take a look at what we can do to reduce storage space. The problem I have right now is mostly this: if I delete stuff that appears unused, especially from the repo1 caches, how do we fix the problem if it turns out it was actually used for something unusual? There are a few strategies, but I think JFrog wouldn't be too happy if I started downloading everything I'm about to delete just before deleting it, because that would cause additional traffic. So that's a weird situation right now, because I have no idea what we can be confident can be deleted.

Would it make sense to deploy a machine, I mean not publicly, just to keep a copy of the repo? Not for the purpose of serving the content, but just as a backup, as a fallback.

Right, that's also something I'd consider. The question is whether it would even be possible to do this inside Artifactory directly, but I have no idea how feasible it is to copy a repository with millions of artifacts inside Artifactory, or whether I'd take the entire thing down if I click that button.
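As referenced above, a minimal bash sketch of the branch-to-release-type guessing described for that PR. The master/weekly and "stable" conventions come from the discussion; the function name and the fallback for the code-named security branches (the open question) are illustrative, not the PR's actual code.

```bash
#!/usr/bin/env bash
# Guess the release profile from the branch name (illustrative sketch).
guess_release_type() {
  local branch="$1"
  case "${branch}" in
    master)   echo "weekly" ;;  # master branch: assume a weekly release
    *stable*) echo "stable" ;;  # "stable" in the name: assume a stable release
    *)        echo "unknown" ;; # security branches use code names, so nothing
                                # can be guessed there yet (the open question)
  esac
}

guess_release_type "${1:-master}"
```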
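And on the "what can safely be deleted" problem, the kind of query such a cleanup could start from: Artifactory's AQL API can list artifacts by last-download date. A sketch against the public instance; the repository name and the credential variables are placeholders, and an account with search permissions is assumed.

```bash
#!/usr/bin/env bash
# List artifacts in one repo that have not been downloaded in the last year,
# using Artifactory's AQL search endpoint. The repo name is a placeholder.
curl -sS -u "$ARTIFACTORY_USER:$ARTIFACTORY_TOKEN" \
  -X POST 'https://repo.jenkins-ci.org/api/search/aql' \
  -H 'Content-Type: text/plain' \
  -d 'items.find({"repo":"snapshots","stat.downloaded":{"$before":"1y"}})'
```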
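On the question of copying a repository inside Artifactory: the JFrog CLI can do server-side copies, and a dry run would at least show the scale of the operation without performing it. A sketch, with hypothetical repository names:

```bash
#!/usr/bin/env bash
# Server-side copy of one repo into a backup repo (repository names are made up).
# --dry-run only lists what would be copied, without moving any bits.
jfrog rt cp "snapshots/*" backup-snapshots/ --dry-run
```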
And since I hadn't gotten a response yet from Baruch about something related to scrapers, because that's also something we're dealing with, I haven't yet had an opportunity to ask him about the feasibility of something like that.

Specifically about the cleanup, the idea is to just delete everything that hasn't been downloaded in, say, a year or something. The problem is that that's surprisingly little, because some scrapers actually, you know, wget -r the entire thing, which means far less is unused, or not downloaded for a long period of time, than we would expect. So that's the additional complication there.

Is it possible to block those recursive downloads?

Well, how do you distinguish abusive behavior from just, you know, a Maven build? A clean Maven build and abusive behavior probably look exactly alike.

I'm sorry, the irony of what you just said, Daniel, is a thing of beauty. "How do you distinguish the difference between a Maven build and abuse?" Thank you very much, I appreciate it.

Okay. So, in the end, we still have some news regarding JFrog Artifactory. Thanks, Daniel, for that. Daniel, there have been discussions about incrementals possibly being expired even if they were referenced. Is that also unsafe, or is it not appreciable in terms of volume?

I don't know how much it is. If you remember what Baruch told us about his expectations, he wants us to limit Artifactory to legitimate uses, and incrementals are a legitimate use. So I could wipe the snapshots and just tell everyone to rebuild whatever PR it is that they need a snapshot or an incremental for, and move on; with snapshots and incrementals we could reclaim some space we don't strictly need. Similarly, there are private repositories in Artifactory, CERT incrementals and CERT snapshots, which are essentially the same thing as their public counterparts but for the Jenkins security team; that's another 100 or 200 gigabytes or so. But this does not actually accomplish the goal as Baruch described it, because we would be reducing the storage used by the legitimate uses. That's why I hesitate to do it: if he says "well, that's great, but that's not what we were asking for", then we haven't actually accomplished anything. So we can probably look into deleting old snapshots and old incrementals as regular cleanup, if they haven't been downloaded in several months or more than a year or something, but that's not actually solving the problem that we have and that JFrog expects us to solve.

Thanks, Daniel. Thanks for the clarity. There is another topic that I would like to briefly talk about, which I did not list here: the acceptance test harness. Those tests are using a lot of resources on ci.jenkins.io, and I'm just wondering how we could either reduce them, or maybe you have suggestions on this topic. The main reason I'm raising this is that ci.jenkins.io is not only used by those tests, and those tests take between seven and nine hours to complete. So when we trigger several PRs with those tests, we use a lot of capacity, which delays other jobs like the plugin site and other things. So I'm just wondering if we really have to run those tests. I've explored that question with James Nord, and he feels quite strongly that yes, they are a significant help to him. I am still continuing as the reaper of acceptance test harness jobs as necessary to reclaim ci.jenkins.io.
My thought was that, since Red Hat had been executing the ATH inside their infrastructure and won't be once Oliver Gondža steps down as release officer, we have to plan to take the ATH execution into the Jenkins project, but possibly consider a separate Kubernetes cluster dedicated to the ATH, or a separate Jenkins instance. I'm happy to start that conversation on the mailing list if that's okay with others.

I'm not sure what the benefits would be of a separate Jenkins instance for that, but I think we should definitely use dedicated agents for it, so we do not reuse agents meant for other purposes.

Really quick, to clarify: there is a very long-standing complaint by Jesse and others that the weekly ATH runs take up all of the executors for probably an entire day, because of the number of pull requests on Jenkins. Is that what this is about, or are you talking about dollars?

Well, for me, I'm talking about the jobs under infra/acceptance-test-harness.

Right, but the question is: is the problem that this Jenkins instance doesn't build anything else while ATH builds are running, or is the problem that this periodically consumes so much money in our Azure account?

Both, basically both. I think it's just more annoying now because it affects the other jobs, but it doesn't make sense to keep building those. I know there is a long-standing discussion about this one; there is a ticket where basically Jesse and Tyler did not agree on the best way to solve it, and I don't have a strong opinion on that ticket either. I mean, I think I understand both Jesse and Tyler.

Couldn't James fix the eight-hour runtime? Wasn't he going to get the jobs better balanced, or has he not managed to fix that?

Unfortunately, his analysis showed that yes, there was some unnecessary duplication, but it was not a significant contributor. The most recent builds are still showing about the same duration. One solution is to just allocate more highmem AWS instances, but that's an expensive solution. It's something I've done recently; it increases our costs temporarily while they run, and then we fall back to not having that high cost.

We need to have this suite green. I mean, it regularly fails around a hundred tests. They're quite valuable tests because they catch real issues, well, some of them probably are valuable, but they're not valuable right now because there are so many failing.

The problem is we are just increasing the cost in this case, and we are increasing our Azure costs overall. We were down to around 8,000 per month, and now we are going back to 10,000 per month. So I don't think we can just increase the number of highmem machines.

Well, Olivier, in this case the highmem machines that are being used are actually EC2, so I don't think they are affecting the Azure bill. They are affecting the AWS bill, but as far as I know, they're not touching the Azure bill.

Yeah, then there is something that I have to clarify on my side. But Tim, to your point, I think the ATH suite's value is diminished significantly by its duration and by its unreliability, right? Having a hundred tests fail means people stop watching it unless they're explicitly experts in that area, to the detriment of its value. Unfortunately, my attempts to ask for some tests to be deleted failed, so I'm still not sure how to approach the ATH.

I'm happy to click Merge on PRs deleting tests.
Okay, I may re-propose those then, because there are a number of tests of a certain plugin that I am already confident are well enough tested in the plugin's own development; there isn't a lot of gain in checking them through Selenium.

The main gain you get from the ATH is that you can test against new builds of the core. So if you've got a new weekly or an LTS coming in, that suite gets run.

For me, the rate of finding real problems in the ATH is much, much lower than the rate of false failures that I have to go and decode: why did this fail when it shouldn't have?

Yeah, well, we do find real issues in there, but we find that they normally don't get fixed, or get fixed a long time later than they should have. The value is low because they're failing, to me anyway. It needs to be a green suite.

So it sounds like we can't reach an agreement on this topic right now. I propose to do some investigation before next week's meeting and see if we can come up with a good option. Possibly one of the options is removing some of the tests, but that requires someone to go through the failing tests and the other ones and remove some that aren't so valuable, especially any time-consuming ones.

Any other topic that you want to bring up here? I just have one last reminder before we finish the meeting: I just received an email about the election, so it's time to vote. Don't forget about that. Any other last-minute topic? We are five minutes before the end, so we still have some time. If nothing else needs to be discussed, I propose we go back to IRC. I'll count to three: one, two, three. Thanks for your time, and see you on IRC.