Hello everyone, welcome to the Jenkins Infrastructure Weekly Team Meeting. Today we are four at the table: Stéphane Merle, Hervé Le Meur, Mark Waite, and myself, Damien Duportal. Okay, let's get started with the weekly release. So, Stéphane, what's the status, since you just checked it? The first packaging build failed, but that's nothing definitive: I just launched a replay and that should go smoothly. And the changelog has not been merged yet; there are some minor updates needed that I'll take care of today. So I'll make those changes and merge them after this meeting. Okay. So I'm going to take a look at the Docker image to be built, the incoming changelog, and the last checklist items later today. Right. And in order to keep my focus, I'm going to stay focused on this meeting and actually do the writing after the meeting. Perfect. Do you have other announcements?

Okay, so let's continue with the next weekly: that should be 2.402 next week, so April 25. The next LTS is 2.387.3? Yes. Scheduled for May 3, with the release candidate on April 19, tomorrow. Backporting has started. Kris Stern is the release lead, thanks. And the docs team will write the changelog and upgrade guide, so no need to note that, Damien. Okay. I haven't checked the security advisories, but I don't recall having one. We had one... no published plan declared yet. So: none. Perfect. The next major event is cdCon, is that correct? May 8 and 9, Vancouver, is that correct? That's correct. Okay. Other announcements, major issues? Okay.

So let's have a quick look at the tasks we were able to finish during the past milestone. We had a plugin update on ci.jenkins.io to help the core developers: an update of the code coverage plugin. Done.

Thanks for managing the Jenkins release Twitter account suspension. So, if I understand correctly, you had to downgrade the plan, is that correct? Yeah. We were on the previous plan, which was still active, but it would have been cut off in 40 days if we didn't change the plan type. I searched a little bit before finding how to do that, because it wasn't obvious at all; another great execution from Twitter. Changed, to avoid suspension. Okay. And we are within the usage allowed by this plan in terms of publication numbers: around 1,500 tweets per month are allowed with this plan, so we are very good. Cool. Unless we release a lot more, but that would be a nice problem to have. Cool, thanks. So no need to remove the RSS-to-Twitter publication, and we can continue. Nice job.

Enable Renovate for jenkins-infra/stories. Okay, that was done a long time ago, is that correct, it just had to be closed? No, I misread the given issue at first; I thought it was for the plugin site, but it wasn't. I modified my message, but the stories repository wasn't in the repositories selected for Renovate, so I added it. Cool, thanks for helping. Thanks.

Mark, anything under way for the move into the jenkins.io repository? I assume we did nothing. It's "a-nek", I don't know how to pronounce his username, but he did the work. Okay, nothing else to do on this one.

Also, the update center access logs are now streaming into Datadog, if you want to extract some information about the data served by the update center. Okay, nothing else to do on this one.
We were able to ensure that the Apache logs are collected by Datadog on our machines; that was required to measure the amount of data served by Apache, which is what the update center serves as well. So the next step is learning, on Datadog, how to build a dashboard that measures the amount of data served by Apache. But it's working. I won't go into details: there were a lot of tiny hidden changes behind this one, and of course I broke things, which I fixed during the weekend.

"The jobs using the Azure artifact upload are failing": that issue was opened by Alex. Thanks for mentioning it; it's related to Tim and me trying the Azure artifact manager. There is an issue a bit later about that topic, so I closed this one saying we would follow up there; let's treat it as a duplicate. But the takeaway is that random jobs were failing on the archive artifact steps. Let's say half of the jobs worked as expected, and the other half failed with a TCP "unchecked" error in the build logs. We disabled it immediately, and Tim has started helping us in that specific area. So it only happened during a 24-hour window. It might happen again, but that time we will have to communicate. I sent an email about it and realized that my email was blacklisted by Google Groups, even though I'm an admin of the mailing list. Who knows. So next time, I promise, I will check that the email was actually received. Self-improvement.

Anyway, next issue: the update center doesn't build because its agent is offline. Let's say we had an SSH key to rotate, and that was something we should have done long ago. The credential name on trusted.ci, just for the joke: the credential name was literally "cucumber keys, please rotate". So I cleaned up the Puppet code and removed any mention of non-existing virtual machines. Cucumber was the virtual machine serving the jenkins.io website years ago; that machine has been long gone for three or four years. So I decided to remove the manual cucumber keys, which were used by trusted.ci to upload, via rsync, the generated jenkins.io website. That hasn't been the case in four or five years. It turns out the same SSH key was also used by trusted.ci to connect to its permanent agent, agent one; that's why the update center wasn't being updated. A new key has been inserted and the credential cleanup has been done. That could have been a fun one. Next time, I promise, I will call you.

Some Jira components to be archived: done, nothing else to do, any administrator can do that.

Alex Brandes, a.k.a. NotMyFault, has been added to the copy editors team, as validated on the mailing list. And also, as a side note, I've added him to the organization team named "docker-images", so now he's a maintainer of all the docker-* images that we build. The reason is that he's helping us a lot taking care of the dependencies and fixing them on most of these images. That's why I decided it. He still cannot push or deploy himself: he only has the ability to open pull requests, and he's considered trusted by infra.ci when it scans the repositories, so his pull requests will build safely. Welcome to the Docker team; congrats, Alex, for that trust level, and thanks for the help.

We had issues on the virtual machines with Docker on Windows, both 2019 and 2022, that were blocking the whole test framework of the official Jenkins images. Those issues are gone and fixed. That was a tricky one; I don't want to go into details.
Everything is written up in the issue. And now we have an up-to-date Docker (not the Mirantis one, but Community Edition) and we have everything required for the images. By the way, there is still a weird issue with PowerShell inside Windows containers on our inbound image: some PowerShell commands fail only on Windows Server, I cannot reproduce it on my machine, and they do not fail when run in interactive mode. It's not an infrastructure thing, but that's a funny one.

Monitoring: improved Datadog tagging for virtual machines. That was a request from Stéphane, and it makes a lot of sense: when identifying an infrastructure virtual machine in the data, it's not really clear that it is, say, trusted.ci, so better to use human-readable names. It's done, you can search. Combined with the Apache logs, that makes things easier for us when we have an operation or a failure on the infrastructure.

There was a Jira locked-account issue; that's fixed.

As part of the campaign of cleaning up the AWS account to spend fewer credits, we finished the garbage collection; that was an old issue. Right now the garbage collection is a bit aggressive, because it sometimes deletes production images. That's unexpected, it should not. But we were able to see (I need to comment on the related issue) that the three of us managed to save 60 bucks per... month on the account? No, sorry, daily. That's better. I would say it should be around 1.5K per month of savings; we already see that on the forecast. And the garbage collection ensures that the snapshot and AMI storage costs won't come back again. I don't remember which issue it was; I assume it was closed.

The documentation of the code-signing renewal process, for our successors in three years, has been done, validated, and reviewed. So the whole GPG and code-signing setup should be okay. Good.

Thanks, Mark, for catching the issue on the update center certificate. So the certificate has been renewed. It's weird, because everything was in place and in fact we don't know why it failed, but we were able to renew everything. It looks like you are muted. So this was expected to be an automated process, and the automated process did not happen as hoped? Exactly. From the system logs we saw that the root crontab on all the virtual machines with Let's Encrypt runs the "certbot renew" command once a day, at 6 in the morning UTC. That certbot renew should have detected the glitch. When we ran the command manually, it succeeded in renewing the certificate. Certbot renew, not crontab renew. Yeah, sorry, certbot renew, my bad. So this was the SSL certificate? I thought this was some other certificate used by Jenkins for communicating with or validating content. No, it's not, thank you; that's another issue. So we dug a bit, and it seems like the Puppet module, the way it works, is not correct. However, we saw there were some leftovers of a former certbot package that were installing their own crontab, running at the same time. Other users (I've linked the issue) had the same problem and complained about that crontab, which should not be managed if we don't use the native package. My guess is that the two crontabs were competing: I tried certbot with two user accounts and they were competing on a lock. The thing is, we cannot remove the "-q" flag: it's hardcoded inside the Puppet module's crontab entry, and that's quite annoying.
So my proposal is that we wait for the upcoming three months. We'll see, and if it happens again, as I wrote, then we will eventually have to disable the Puppet feature and create the crontab ourselves. So thanks, Mark. I think it would be worth creating the same kind of monitoring on Datadog in the future; we are still missing time for that, but yeah, thanks for monitoring this. And the Jira manipulations on the plugin components. That's all for the tasks we did. Did I miss something else? Nope. Okay, so: the work in progress.

First item, this one is quick: we need to renew the update center certificate, the one you just spoke about, Mark. The deadline is the end of May, so this year we are starting quite early; thanks, Stéphane, for taking care of that. Stéphane handed it over to me and then I handed it to Olivier, because we need either KK, Oleg, or Olivier Vernin to generate the certificate. Maybe they only have to sign it, but they hold a critical private key, which is the credential. Last year Olivier and I decided not to grant me access to this one, because the CRL behind it and the certificate authority are valid until 2028. That's still five years from now, which means the more people have access to this, the more people could play around with the update center. So the goal is to have as few people as possible with access to that credential. Renewing that root certificate means updating all Jenkins instances; that was done in 2018 and it wasn't an easy moment, so better to have this happen once a decade. Olivier was still bothered by the fact that only they hold the key, so he is asking if I can have the key as well, encrypted to my name, stored on a restricted machine. Mark, is it okay if we submit that proposal to the Jenkins board? I want this to be validated. I'm not sure if I should ask in a public email, or if we should start with the board and then extend; I'm not sure what would be the best way. I'd say ask the board. I don't know that the board usually processes those kinds of requests, but it seems good to ask the board. Okay. So I'm going to ask the board, and I asked Olivier to send me a new certificate so we can renew it. We still have some time to generate the certificates while asking the board whether Damien may have the CA key along with KK, Oleg, and Olivier. So as soon as we have the certificate, Stéphane, you can continue working on this. Thanks for checking the documentation; it seems it's good, we were able to write it properly last year. That's also your work, so thanks. Let's see in one or two weeks; I'm moving that one to the next milestone. Is that okay for you? Perfect. Any questions? Next subject.

I aggregated the two issues about email sending. We currently have two issues blocked by the fact that users complain they never receive the email with their password when creating an account on accounts.jenkins.io. On that topic, we don't have access to the SendGrid cloud account currently used by accounts.jenkins.io, so we cannot check whether these emails were greylisted or whether there were delivery issues with the respective providers. So we asked KK, and we got different answers. First, KK was able to grant us access to the Mailgun account where Andrew and Tyler already had access; so now the four of us have access. A reminder, please: can everyone here use a personal account and enable multi-factor authentication on Mailgun? Mailgun doesn't seem to be used.
Earlier today KK also answered that the SendGrid account is on a plan that only allows one administrator, the purchaser. He's asking if he should grant access to someone else; that could be technically doable. And Tim Jacomb and Olivier also answered about this email sending concern. Tim told us that maybe we could set up our own SendGrid instance, because it can be managed within Azure. The advantage is that the billing would be centralized, and any Azure administrator, or let's say account, could be granted access to that SendGrid instance inside Azure. That would be interesting in terms of access control for admins. And Olivier warned us about the potential costs. As far as I can tell, SendGrid is around 50 bucks per month. I don't know who is paying this; I assume it's KK, but maybe not. I never had an answer from KK on that particular topic; maybe it's not paid. I got a billing spreadsheet from Olivier that mentioned something like 14 or 15 bucks per month for SendGrid, but that's the only information I have.

So the proposal made by, I think, Hervé and Stéphane was: can we change the email sending provider on accounts.jenkins.io? The real question behind this is the amount of email and the cost we would have. Are we able to switch to Mailgun, since we have access? Would we need a bigger SendGrid instance, which could cost us money? That's the question. Yeah, looking right now at SendGrid on the portal, it's around 20 bucks per month for 50,000 emails, so it should be more than enough. On Azure? On Azure. Okay, so for accounts.jenkins.io that should clearly be okay. Okay. On the Mailgun side? Sorry, go ahead. On the SendGrid account, the problem is there is 2FA on it, and it's a problem since one phone number has to be dedicated to it. I was thinking maybe we could get a dedicated account for that, and we would get the 2FA number through it. I think the fact that you checked SendGrid on Azure, if it's 20 bucks per month for that amount of email, makes it absolutely an option; it's not costly. My question is, I don't know the plans for Mailgun. Are you okay to check and compare? Because we have a Mailgun account with multiple administrators already today, so why not use it if we are in the market? If not, let's compare prices and use Azure instead. What do you think? Is Mailgun a paid account or a free account? I don't know. Can I ask you to... I can check. When I suggested we could use the free tier of any mail provider, since we don't send lots of email, we noted that free accounts are on shared IPs, with a big risk of blacklisting. That's one of the main concerns for Mailgun. Even on shared IPs, they are drastic in kicking out senders with bad practices, so I'm pretty confident in the quality of those shared IPs. For Mailgun? For Mailgun, yes. Yeah, they have very strict usage policies and they check before sending, so they are really good. So the question is: what if a large provider, the kind that effectively sets the rules for checking DKIM and email sending, suddenly says that IP must be blocked? Then you end up like Damien Duportal trying to send an email to a Google Group from his Gmail account, as an admin of the group, and it still lands in spam. Of course, there is absolutely no foolproof way, but even with a dedicated IP... I know how they work in that area, it's really hard, and they do a great job. That's all I can say. Of course, that's not foolproof. Okay. So the shared IP question and the pricing: it looks like these are the challenges to solve here.
Is that okay for you? So if you see a solution, you can go either way. As infrastructure officer, I absolutely trust you on that part. If you have a doubt or want someone else to help with the decision, don't hesitate to ask. Looks good. Okay. Using the Azure SendGrid will mean eventually creating it manually and then importing it into Terraform; configuration management is possible if you go the Azure way. The cost of 20 bucks per month is acceptable inside the Azure billing. If you go with the Mailgun account, then go ahead and update it. The goal is to unblock the users and to have a runbook explaining: if you have an issue with email sending, go here or go there; then we can help users. Is that okay for you? Yes. So unless someone has a question, I'll switch to Azure costs. Okay.

So, the idea is that we could decrease the Azure bill; we had two options. The first option on Azure costs was using an artifact manager to decrease the outbound bandwidth of ci.jenkins.io. So what has been done? On the BOM builds, Jesse Glick was able to merge a proposal where most of the stash and unstash steps during the build aren't done anymore. We saw an impact (I need to comment on the issue): the outbound bandwidth decreased due to this change. The forecast shows we are around 600 instead of the 1,300 of last month. It could be worth checking the month before, though, because we had quite unusual activity last week; but clearly the BOM is one of the culprits. The artifact manager is an attempt to decrease that outbound bandwidth, because on Azure the outbound bandwidth billed from Azure blob storage is clearly cheaper than from a virtual machine. One of the main reasons is that, by default, public blob storage, like the one that would serve the archived artifacts of ci.jenkins.io, uses the Azure CDN. You can disable it, but it's enabled by default. That's how Microsoft is able to decrease the cost: it's served through their CDN network.

How does it work? The artifact manager, once installed and configured, writes the archive artifact or stash/unstash operations directly from the ci.jenkins.io virtual machine to an Azure blob storage container. Then, when you click on an artifact on ci.jenkins.io because you want to download it, for instance a plugin's archived generated .hpi file, which is a few megabytes, ci.jenkins.io redirects you to a temporary URL valid for one hour. When you click, you get an HTTP redirect from ci.jenkins.io to the Azure content delivery network: ci.jenkins.io does not serve the data, only the redirects. And in the background, each time someone issues a request for an artifact stored on blob storage, a new token is generated for it; these are a bunch of temporary tokens. That's a really smart behavior.
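To make that concrete, here is a minimal pipeline sketch (the label and paths are illustrative, not taken from ci.jenkins.io): the point is that nothing artifact-manager-specific appears in the Jenkinsfile. Once an artifact manager is configured globally on the controller, these very same steps transparently read and write blob storage, and the artifact download links become redirects to the temporary signed URLs described above.

```groovy
pipeline {
    agent { label 'linux' }          // illustrative label
    stages {
        stage('Build') {
            steps {
                sh 'mvn -B clean package'                   // produces target/*.hpi
                stash name: 'binaries', includes: 'target/**'
            }
        }
        stage('Test') {
            agent { label 'linux' }  // possibly a different node
            steps {
                unstash 'binaries'   // with an artifact manager: fetched from blob storage
                sh 'mvn -B verify'
            }
        }
    }
    post {
        success {
            // Download links become HTTP redirects to short-lived, token-signed blob URLs
            archiveArtifacts artifacts: 'target/*.hpi', fingerprint: true
        }
    }
}
```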
So even requests from Jenkins agents would be satisfied by the content delivery network? I don't know for this one; I assume it depends on where you are. If you are inside Azure, for instance on the Azure virtual machine agents, then the answer is no; but for AWS or DigitalOcean, yes, I believe so. In any case, CDN or not, the goal is to have these files served by another service than ci.jenkins.io itself: less pressure on the Apache server, less storage, and fewer threads serving requests. So that's a good idea.

The problem is that after 22 hours we started to have weird errors, with nothing in the controller logs; in the build logs you had the TCP "unchecked" error when the build was trying to stash, unstash, or archive an artifact, and we saw data in the bucket. I initially thought my configuration was wrong, but Tim checked that it was working. We weren't able to reproduce: with the few builds we tried from our local controller tests, it worked every time, which means there is an issue when scaling up with ci.jenkins.io, and/or an issue in the ci.jenkins.io setup. So that's why we disabled this one. Tim was able to pin this error message to some behavior of the underlying Azure SDK that is used to connect to the Azure API in Java through the Jenkins plugins. An update of that plugin, fixing some of these issues but not all, has been released and deployed to ci.jenkins.io. So the proposal is, assuming proper communication to developers, to try the Azure artifact manager again, and this time to enable debug logging, at least for the administrators, to see if the same behavior happens again and help the plugin developer pin the issue, because it's expected to work and it should be transparent.

On that particular topic, I was a bit dismayed about this not working, so I tried the S3 artifact manager: the same thing, but for Amazon S3 buckets. But since we are trying to move static services away from the Amazon account, I wanted to use another provider: I tried DigitalOcean Spaces, which is S3-compatible. I'm not sure if I misunderstood the configuration of the plugin, but it looks like I wasn't able to make it work. So, same thing: I opened an issue and got help from Jesse Glick, so I will need to retry that part. The question is how it behaves if we have both systems installed and set up at the same time on a given controller, because technically you can do that from the UI. I'm interested in knowing how it works: how does Jenkins select one or the other? I don't know, it's written nowhere, and no one was able to give me a proper answer. So I propose we try locally. The question I want to raise is: in order to keep an equilibrium, if we cannot make the Azure plugin work, we can still use an S3 bucket. That means artifacts would be sent from Azure to Amazon each time, but most of these artifacts are in fact generated on AWS or DigitalOcean agents, so they would be copied from DigitalOcean or Amazon directly to S3, and ci.jenkins.io would only issue a redirect to AWS when someone requests the artifact. That should still allow us to decrease the bandwidth; it could be a solution. Most of our builds happen in AWS anyway. So that's where we are; the proposal is to start with Azure first. Is that clear?
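On the debug-logging part of that proposal, a hedged sketch of what an administrator could run in the Jenkins script console: the logger package names below are assumptions (they depend on the artifact manager plugin and Azure SDK versions actually installed), and in practice a Jenkins log recorder in the UI targeting the same namespaces would be the more idiomatic route.

```groovy
import java.util.logging.Level
import java.util.logging.Logger

// Hypothetical logger namespaces: adjust to the packages the installed
// artifact manager plugin and Azure SDK actually use.
def names = ['com.microsoft.jenkins.artifactmanager', 'com.azure.storage']

// Keep references so the loggers (and their levels) are not garbage-collected.
def loggers = names.collect { name ->
    def logger = Logger.getLogger(name)
    logger.level = Level.FINE    // verbose output to help pin the TCP errors
    logger
}
loggers*.name
```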
Does it make sense, and does it trigger questions or ideas? The magnitude here, of data and money we can avoid spending, is from 500 to 1,000 bucks per month on the Azure costs.

Stéphane, can you report what you did around ARM64? I was pretty happy because I managed to get the Packer build for Azure ARM64 working, but for now it's bumping into an error due to not being able to overwrite VM images within the Azure gallery: if you have the same version, it's forbidden to overwrite the old one, and that's a problem. Not for production usage: when we issue a tag it should be okay. But for PRs and builds on main we are using the dev gallery and the staging gallery, and for those we are stuck on that problem of already-existing images. We were getting away with locking the version, and then we realized that maybe it would be easier to add a timestamp tag or an alias tag at the end of the same version. But we need to make sure the garbage collector runs smoothly on that, because if not we will have tons of images in those galleries. So it's working, but kind of useless for now. The next step for us will be to start building our own Docker images on ARM64, or to switch to the all-in-one image and move most of our workloads to ARM64.

On the other costs: using spot instances. We checked the prices; alas, spot is only two times cheaper, which means that if a spot instance is reclaimed, a single retry removes all the benefit (the full retry doubles the compute, cancelling the two-times discount), and the downside is developers seeing their builds retried, especially on the ATH. The thing is, in the case of plugins that build on virtual machines requiring Docker: most of these plugins need Docker and have 30-to-60-minute builds. It's not a quick "mvn clean install" of an HPI: most of the integration tests use Docker, need a virtual machine, and could be reclaimed by spot. Same thing for the ATH: some ATH branches take just a few minutes and could be good customers of a spot instance, but some take six hours, and being cancelled there by a spot reclaim means using spot for the ATH is not a good idea. The proposal is that we might need to propose new templates with spot, and then we could either opt in or opt out; but in the case of the ATH that would need a bit of revamping inside, and it's not easy. It's not an easy way to save money, so spot instances for virtual machines in Azure might not be interesting for ci.jenkins.io agents. However, we enabled it for the Packer VM builds that we do, because most spot reclaims happen after one hour and most of our builds take 15 to 40 minutes anyway. Let's see how it behaves, but that's just a few bucks to gain here.

Then, unless you have questions about Azure costs: right now we should be just under the cap this month with the current workload, meaning with all the ci.jenkins.io virtual machines running on Azure. That's the current status, so we should be okay, and then we would need to decrease the ci.jenkins.io spending on Azure; that will be the next step. AWS costs: BOM builds, BOM builds, BOM builds, and trusted.ci. Stéphane, what's the status of the trusted.ci migration to Azure?
For now I'm defining, within Terraform, the VM we need to spawn on Azure for trusted.ci, and I'm on the network side right now. I have two PRs on two repositories, because we have "azure" and we have "azure-net": right now I'm trying to get both of them working together, with the creation of the network in azure-net and the usage of that network in azure. But yes, it's my main task, if ARM64 leaves me alone for a while, because it took way too much of my brain. That happens. Do you need a review or unblocking on that task in the upcoming days? We probably need to check together whether I did the network part correctly; tomorrow, if you're available? Should be doable. Thanks.

So, we were able to successfully start and partially run a BOM build on a new set of node pools. The assumption is that we can show we're able to decrease the cost of the BOM builds. There are different layers. The first target is not blocking the plugin builds that need containers when there is a BOM build, or a storm of BOM builds: at least plugin developers will have a shorter feedback loop. Second, see how we could decrease the cost of a given BOM build by packing more agents onto bigger machines, so the overhead costs less; bigger machines also help us use, let's say, low-cost spot instances. So we updated the existing node pool that the BOM and plugins use on AWS to decrease the cost of a single pod. That doesn't mean the cost will decrease globally, because it depends on other parameters, but still, we were able to decrease the cost by 20%. We also improved the spot eviction rate: most of the instance sizes we were using had a 10, sometimes 15% eviction rate, which was visible on BOM builds with a lot of agents, while now we are under 5% for every kind we use.

We ran the new BOM builds, and there are a lot of learnings in that area; we are in a feedback loop. The work that Jesse did to help us on the outbound bandwidth consists of, instead of building the mega-war once, stashing it, and unstashing it 280 times (which is costly), now building it 280 times so it's available locally. That has an impact on compute and surfaces some issues we already had before, especially CPU contention: it looks like 4 CPUs per branch on the BOM is not enough. We have different solutions here, but it seems we need a self-made solution for stashing and unstashing: we should ensure that everything runs on that new node pool and use either an EBS volume or an S3 bucket, so we build the war once, copy it there, and then reuse it. That's an optimization. The good and positive thing is that we now have a way to measure specifically the behavior of BOM builds, so let's iterate on that part.

An important takeaway for us as administrators, underlined by Jesse: the BOM builds currently use labels. The label allocates an agent, and we as admins implement the interface contract that the label represents, by saying: oh, if the label is, say, "maven-17", it's that pod template, or this one, or this one. In the test we did, we specified the pod template directly in the pipeline itself, so we don't use that admin-maintained label contract. The advantage of the second method is that it surfaces pod evictions to the developer in the build logs, which wasn't the case before: they only appeared in the controller logs, so they were hidden from developers.
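For context, a sketch of the two styles with the Kubernetes plugin; the label, image, and resource values are illustrative, not the actual ci.jenkins.io templates:

```groovy
// Style 1 - label contract: the admin decides, on the controller, which pod
// template a label such as 'maven-17' resolves to; evictions stay in controller logs.
//   node('maven-17') { sh 'mvn -B verify' }

// Style 2 - inline pod template: the pipeline declares its own pod spec,
// so scheduling problems (evictions, OOM kills) surface in the build log.
podTemplate(yaml: '''
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: maven
    image: maven:3.9-eclipse-temurin-17
    command: ['sleep']
    args: ['infinity']
    resources:
      requests: { cpu: '4', memory: '8Gi' }
      limits:   { cpu: '4', memory: '8Gi' }
''') {
    node(POD_LABEL) {
        container('maven') {
            sh 'mvn -B verify'
        }
    }
}
```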
And the thing is, with the spot instances, we saw the eviction rate, some OOM kills on some pods, and also CPU evictions: we have a limit of four CPUs, and sometimes we saw peaks of five or six requested, so the system killed the pod and then you get a re-attempt. We are still studying the metrics side, but what was surfaced by the last builds, yesterday and last night, is that there is an issue on ci.jenkins.io. We don't know if it's Jenkins, the Kubernetes plugin, our whole setup, or the Azure-to-AWS topology, but "sh" steps on the BOM builds, for instance a single curl request checking the ACP availability, which should take a few seconds, take two, three, four, sometimes five minutes for the controller to establish the connection to the agent, run the command, and report back. So: 10 minutes building the mega-war, three to 15 minutes running all the intermediate steps, and then 15 to 20 minutes running the PCT. That means each of the 280 branches currently takes 30 to 45 minutes, all at the same time. So of course no improvement in the build time, and in cost it's even worse. That's the status right now, and that's why we haven't merged or changed anything yet.

Now, on to the minor issues. Oh no, we still have the artifact caching proxy and another one. Yes? Mark? Damien, if you're willing: Hervé and I were just having a discussion prior to this meeting, and we're going to talk about BOM cost reductions. I'll be sending an email message proposing to significantly reduce BOM execution costs by changing what we execute and when we execute it. The idea I'm going to propose is that we only run a very lightweight step on each pull request, and that in order to get the full run on a pull request, you would have to apply either a label or a comment to it. What I'm proposing is that we use an octopus merge to combine many pull requests into a single build, so that we cut the costs. We keep the smoke test that happens on every pull request: it takes from five to 20 minutes to run, and it tells us important things. And then the proposal is: let's only run the bigger set of tests, the ones we run on every pull request now, when a developer, a maintainer of that repository, specifically says "I want to run it", and they'll only do it when they believe it's valuable enough to justify the spend. That's absolutely a good approach, because it can be done now and it doesn't preclude or create any kind of problem for the other optimizations. We still need to fix that other issue: I mean, 300 parallel elements in a build should not take minutes on a Jenkins instance of that size; there is something abnormal here. But still, that's a really good idea if we are able to drive it; thanks for that idea. I did realize, just as I was describing it, that there's an exception case: when a developer wants to evaluate a prototype, they need to get a full execution like a regular pull request today. So I think what we would do is say: if the pull request is labeled as dependencies-only, run only the smoke test, with an opt-in through a specific label for the full run. We'll discuss that... I'll use the developer list, I think, for that discussion. Is it okay to also open an issue on this, to track the idea? Yeah, if that's okay with you, I'll do that; that's a better way to do it, it gives us a very solid place to track and discuss. Good, I'll do that. The discussion on the mailing list is clearly better, but keeping a track record, for audit and to serve as support for us, is important. Great, good idea; that's absolutely worth it, folks. So that's all for the BOM.
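Before moving on, a hedged sketch of what that opt-in gating could look like in a Jenkinsfile: the "full-test" label name, the stage contents, and the pullRequest global variable (provided by a GitHub pull-request pipeline integration, not necessarily available on ci.jenkins.io) are all assumptions for illustration.

```groovy
pipeline {
    agent { label 'linux' }          // illustrative label
    stages {
        stage('Smoke test') {        // cheap check, runs on every pull request
            steps {
                sh 'mvn -B -Dtest=SmokeTest verify'   // stand-in for the 5-20 minute smoke test
            }
        }
        stage('Full test matrix') {  // expensive run, opt-in only
            when {
                expression {
                    // 'pullRequest.labels' assumes a GitHub PR integration exposing
                    // labels to the pipeline; 'full-test' is a hypothetical label name.
                    env.CHANGE_ID != null && pullRequest.labels.contains('full-test')
                }
            }
            steps {
                sh 'mvn -B verify'   // stand-in for the bigger set of tests
            }
        }
    }
}
```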
We weren't able to reproduce the issues with the artifact caching proxy on DigitalOcean; only DigitalOcean, and for the BOM. I'm not speaking about the ATH on Azure, that's another topic. You had a concern that the recent changes from Jesse weren't using ACP everywhere. My test on the BOM separation shows that ACP is used for the mega-war generation. However, Mark, we are not sure, and we might need your help understanding the PCT: the pct.sh step. Once you have the mega-war and you run the PCT, a jar file is called with a few parameters in the shell script, and we don't know what that process is doing. Is it calling Maven and building things, or is it doing something else? Do we need ACP to be used by the PCT jar calls? I'll have to look to be sure; I confess I don't know.

As for the ATH issue that Basil reported, that's a different topic: not the same job, not the same network, not the same cloud, and not the same error message. On DigitalOcean, it sounds like when we start having too many DigitalOcean agents, we might hit a limit somewhere in the system. In the case of the ATH it's a TCP connection refused, so that's definitely the network: maybe it's the Kubernetes cluster, with the Azure load balancer blocking connections when they reach a certain amount. But I believe we are still using the old network for ci.jenkins.io and its virtual machine agents, which comes with a bunch of issues. So, two short-term proposals. First: we don't need to focus on using ACP for the ATH, at least not for the parts mentioned by Basil, because it's only partially used, unless we demonstrate the opposite using the metrics from JFrog. Second: in the ci.jenkins.io performance tuning that we need to do as soon as possible, migrating the controller and the agents to a subnet in the new public network built by Hervé would be a solution; see the ci.jenkins.io network changes. Did I forget something about ACP, Hervé? No? Let's go on.

Okay, tiny tasks. "Make the environment and description fields mandatory for bug-type issues": I propose to remove that issue from the milestone. Alex opened it; that's worth a discussion with the Jenkins core team, and I don't feel it's the Jenkins infrastructure team's role. I don't say we should close it, because we are still the Jira administrators, and if a decision to change the behavior of this field in Jira is made, we need an audit track and someone to act. We could link the mailing list thread in the issue. Right. So I propose to remove it from the milestone. No objection? No objection.

Tiny issue: the Puppet agent keeps updating the GPG key. Each time we have a weekly release, something on the system writes the value of the new key file, the one with "-2023" in its name, to the old file, while Puppet keeps updating it the other way around, so we are spammed by this one. We have to find which part of the weekly release process does it. I was able to pinpoint the mirror scripts two weeks ago, but they're not the only ones doing that, so I misunderstood or missed something. That's minor; it's just noise for us.

"ARM64 VM agent unavailable": same thing, the garbage collection of the Packer AMIs is a bit too drastic. The proposal here should be quick: before deleting an AMI, let's check the public configuration files from ci.jenkins.io and infra.ci.jenkins.io, and if the AMI is referenced within, don't delete it. That one should be easy.

"Define a default build discarder on ci.jenkins.io": I've installed the plugin, but I haven't looked at it yet.
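As a reminder of what such a retention policy looks like at the job level, here is a minimal sketch; the numbers below are placeholders, not a decided policy, and a default build discarder plugin can impose an equivalent policy on jobs that don't declare their own.

```groovy
pipeline {
    agent any
    options {
        buildDiscarder(logRotator(
            numToKeepStr: '5',            // keep at most 5 builds
            daysToKeepStr: '30',          // drop anything older than 30 days
            artifactNumToKeepStr: '2'     // keep artifacts only for the last 2 builds
        ))
    }
    stages {
        stage('Build') {
            steps { echo 'build something' }
        }
    }
}
```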
I will want to do a session with you folks, because we need to decide and communicate. But also, I don't know if you remember, the two of you, when we checked the details of the orphaned item strategy we can see on the organization scanning on ci.jenkins.io, the one which defines, when a repository or a branch is deleted, how many items or for how long an item should be kept. I discovered there is something named a "build strategy" that provides the build rotation we were searching for, which is a bit different from the orphaned item strategy. So maybe we could define the default at the top level, but we could also define one policy for each top-level element on ci.jenkins.io; we missed that one last time. So if it's okay with you, I would like to take 30 minutes, because I remember last week, before Devoxx, you mentioned a setting like this one that I warned you about not applying immediately; I propose we do this as a group. I also wanted to find a way for ci.jenkins.io to not build archived repositories; I don't know if there is something for that. There should be, once the archival is taken into account. Yes, so that's to be checked: theoretically they shouldn't be built with the configuration we have, so something is behaving weirdly; I wanted to share that knowledge in the session too. So the next step is setting a default build discarder policy and communicating about it to the developers. Hey, now it's only five builds! Just a minute, I need to shut something down.

"Add Launchable to agents": Hervé, you had the issue before about that one; time to check it. Add Launchable to the agents: my pull request is ready, and the sanity checks are okay, so when it's merged I will be able to update the pipeline library to use Launchable if installed, or install it if not. Good one.

The last issue is the Ubuntu 22.04 upgrade campaign. I was only able to start with a pull request on the docker-openvpn image. I haven't checked the result yet; it's open, to be checked and deployed. I think that's already a lot.

There is an issue incoming for ci.jenkins.io with the following written elements, so, as a summary before we close this meeting: we need to change the system disk to an SSD on ci.jenkins.io. As you told us, Hervé, importing into Terraform could be done in one shot, so we should be able to audit and check all the resource settings on the hardware, especially the SSD issue. That should then allow us to enable disk snapshots on the Jenkins home, so we could get rid of the job config history. There is the network migration we mentioned earlier that could help too. Also, related to the Azure part, I've asked Tim Jacomb if we could shrink the number of Azure virtual machine templates. And another one that's interesting to share with you, because I think you were the first person to ask me about it: I also want to enable WebSockets, so the agents communicate with the controller through HTTP and WebSocket, which is way better than the native inbound TCP for resource usage. Do you have anything else? No? Okay, I'm stopping the recording. Stopping screen sharing first, stopping the recording, and see you next week.