Hello everyone, welcome to the Jenkins weekly infrastructure team meeting. We are the... I don't even know the date... the 17th of January 2023. Today we have myself, Damien Duportal, Hervé Le Meur, Stéphane Merle, Mark Waite, Kevin Martens, and Bruno Verachten. Hello everyone, I hope you are doing well.

So, announcements: the weekly release process is going well, tags are okay, packaging and Docker images are almost finished, if not already, and the checklist is in progress. Am I correct, Mark? That is correct. Cool. Do you have other announcements, folks? None for me. None for me.

So, we can proceed with the upcoming calendar. The next weekly release is the 23rd, is that correct? We are the 17th... no, the 24th. The next LTS, I have no idea, I haven't looked at the calendar and I'm asking for help. I will look it up. I believe it is February... hang on, it's on the calendar. February 8th, yes, February 8th, 2.375.3. The release lead is Alex Brandes. Okay, thanks, Mark. Jenkins security advisory: do we have a publicly advertised advisory? We don't. So, N/A. And the next major event will be FOSDEM, the first week of February. As far as I can tell, some of us will be there, and some of us alas won't be. Sorry, folks. Any comment about the calendar? No upcoming event that we forgot? Okay, so let's get started with the work we were able to completely finish.

Add the jenkins-io-components build to ci.jenkins.io and infra.ci.jenkins.io. That experimental work from Gavin is becoming less experimental and it needs a professional continuous integration, which is why it has been moved from GitHub Actions to Jenkins. Is that a polite way to say it? Thanks for taking care of helping Gavin. One of the main takeaways is that Gavin is using a nice tool named Playwright, an NPM dependency which allows you to run tests in a web browser. That kind of tool has existed for a decade, but this one seems to work quite easily; it just requires some system dependencies that we had to install. So Gavin is now working with a Docker image that he was able to test and check for that specific project, and we are going to install that tool, or at least ensure that the required system dependencies are there, so any developer could install it if needed without requiring an apt-get install or similar command, which is not allowed in production, of course. (A minimal sketch of that install step follows a bit further down.) Thanks, Gavin, for the huge work in that area.

GitHub App for the plugin health scoring. Thanks for helping Adrien on this one. As far as I can tell, the GitHub App is installed and ready; now it's a matter of providing the correct key to the application, or in the correct format. Yes, we went through it; the key had to be converted, it seems. So I've redeployed the application and we are waiting for the next probe run to check whether it's working or not. Okay, so I'm quite optimistic that both of you will finish that. Nice job, folks.

That component has been archived. The name in English is fun, but in French it's way funnier, trust me. "Foufou" means you're really, really crazy. No, I was looking for foo, bar; that's a pure Anglicism. If a French person had done this one, it would have been "toto", T-O-T-O.

Now, an improvement on ci.jenkins.io: thanks, Stéphane, for adding AWS virtual machines based on Windows Server 2022. That's an addition to the existing Azure Windows Server 2022 virtual machines from last month, which means that now, if Azure goes down, we still have machines in AWS, and ci.jenkins.io spreads the load, and the cost, between the two clouds.
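Coming back to the Playwright point above: the system dependencies have to be baked into the agent image itself, since installing packages at run time is not allowed. A minimal, hedged sketch of what that preinstallation could look like during the Docker or Packer image build, assuming a Debian or Ubuntu base and that the project pins Playwright through its package.json; this illustrates the approach, not the team's actual configuration.

```bash
# Hypothetical image-build steps: bake Playwright's browser dependencies into
# the agent image so developers never need apt-get at run time.

# Install the project's Node.js dependencies (Playwright comes from package.json).
npm ci

# Install the OS packages the Playwright browsers need (Debian/Ubuntu only).
# This step requires root, which is why it must happen at image build time.
npx playwright install-deps chromium

# Pre-download the browser binaries themselves into the image cache.
npx playwright install chromium
```

At run time on the agents, a regular user should then be able to run npx playwright test without any extra privileges.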
So, thanks, Stéphane, for that addition. That was also the opportunity to clean up some labels and configurations, so, nice, thanks.

Netlify site for the jenkins.io components. So, Hervé, do you confirm that everything is set up for Gavin? Yes, this was an issue he created when it was still built by GitHub Actions. Now it's built from Jenkins, so we don't need this issue anymore. Okay. So, yeah, superseded by Jenkins instead of GitHub Actions. But we still have the challenge of providing a Netlify token in the Jenkins pipeline instead of the GitHub Action, am I correct? Yes, sorry, we still have the challenge of providing a token to allow Netlify. While moving the build to infra.ci, I created a GitHub App dedicated to this repository, so you can use a GitHub token in the pipeline. To deploy to Netlify? In Netlify it's set up for the jenkins.io components, if that's the correct term; the core of the issue was that he needed a GitHub token. Okay. He took care of the Netlify token himself, and for the migration to Jenkins I had to create an NPM token and a GitHub App, so we could use a GitHub token. A token? Oh, isn't it... There are three tokens for this pipeline: a Netlify token, an NPM token and a GitHub token. Okay. And so, which Netlify... you already generated a Netlify token in Jenkins, then? He made it himself, I didn't touch Netlify; Kevin took care of it. Okay. My question then is: is it a personal Netlify account, or is it the Netlify account of the Jenkins infrastructure? I don't know. You can open the issue and check the comments in there; I didn't have to do anything about Netlify. Okay, I think that's somewhere else. I'm not blaming you, you sound on the defensive; I'm not blaming you, I'm asking the question because I don't know, and since you migrated the pipeline, you are the person with the freshest memory. I don't know what to answer about Netlify; I don't think I touched Netlify, because Kevin took care of everything around Netlify, and his previous issue was requesting a GitHub token. Okay. So we have to check, as a team, what the status of the Netlify deployment website is. Is it only temporary? Is it only a preview website? Because if it is a production website, we should use the Jenkins infra Netlify account that is used for the other websites. Okay, but the issue is closed and that's okay.

Next is an account-related issue that has been solved, from my point of view. Can you please read the title? I remember it was translated with Google and it was something like "I've been caught by the anti-spam system", but in Russian. Yeah. I've since replaced the title on GitHub, because I would never remember that translation: "I don't receive an email with a password. I'm trying to register with another email." I pasted in what Google Translate says. So I've checked, and that one was weird because the person opened a twin issue. I triggered a password reset and I checked from the logs that the SMTP relay was okay; let's say the Gmail SMTP received the email. So if that person doesn't see an email in their mailbox, I can't do anything for them. I told them to look in their spam folder and we'll see. That one was weird because they were using two unrelated emails with two unrelated account names.

Well, anyway: HSTS blocks use of trusted.ci and cert.ci.
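For context on that issue (an editorial aside, not part of the meeting): the behaviour comes from the HTTP Strict Transport Security header advertised on the public jenkins.io sites, and a quick way to look at it is shown below; the directives and max-age in the comment are an assumption about its shape, not a capture of the real header.

```bash
# Illustrative check only: inspect the HSTS header the public site advertises.
curl -sI https://www.jenkins.io/ | grep -i strict-transport-security
# Expected shape (values are assumed):
#   strict-transport-security: max-age=31536000; includeSubDomains
# An "includeSubDomains" directive is what makes Chrome refuse plain HTTP or
# untrusted certificates on *.jenkins.io hosts such as trusted.ci and cert.ci.
```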
The Jenkins security team was blocked when using Chrome to reach both controllers, because for both of them we use private channels: one is behind an SSH tunnel and the other behind a private VPN, both private, sorry. And with Chrome, since the jenkins.io domain publicly declares that all its subdomains require HSTS, the browser is told to always switch to HTTPS with a valid certificate, and not to accept self-signed certificates, untrusted certificates or plain HTTP. The thing is, these two machines are private. For virtual machines we usually use Let's Encrypt with the HTTP challenge, but in this case, since they are private machines, they are not exposed, so they cannot serve the Let's Encrypt challenge for a new certificate. Which means we need to use the DNS method: instead of exposing a challenge on an HTTP page, the Let's Encrypt client creates a DNS record containing the challenge, so you don't have to expose your web server.

We never had time to spend on this, so I manually generated a first set of certificates using the DNS technique to unblock them, instead of rolling back HSTS, because HSTS takes time: it has a long TTL, it would have taken about three months. Generating valid certificates unblocked them immediately. That issue links to two follow-up issues: we now need to put an automatic renewal system in place, one for trusted.ci and one for cert.ci. That's why this issue is now closed: both of them have a valid certificate, and worst case we will have to run the certbot renew command manually in two months on each of these machines if we don't succeed with the automatic renewal.

Maven 3.8.7 is now generally available for developers on ci.jenkins.io, so thanks to everyone involved on that, especially Stéphane, but not only: Hervé also did work on this. We haven't heard about any issue with that Maven version across all the container and VM setups, and thanks, Mark, for adding it on your side as well.

We had an LTS release last week and all of our instances were updated, at least the ones using the LTS line, so thanks everyone on that part as well. I wasn't available that day, so thanks Stéphane and Hervé for backing me up in that area. It proves that the team is not dependent on me in these areas, which is a good thing, which is healthy. Any question about the issues being closed, or can we switch to the work in progress? I don't want to cover the ones closed as "not planned", because those are three account-related issues. Just a personal thanks for the HSTS thing: it's a lot easier to deal with trusted.ci and cert.ci now that they've got SSL certificates, thank you. I know there's more work still to be done, but thank you very much, my dealings with them are simpler now thanks to that. Happy that we were able to simplify the life of other teams.

Then, the work in progress. We spent a lot of time as a team, in mob programming, on the automatic renewal of that certificate, only for trusted.ci.jenkins.io for now. That machine doesn't run on Azure, it runs on AWS, while the DNS records are stored in Azure, which means we had to put a permission structure in place: a technical account allowed to create and remove DNS records only for trusted.ci.jenkins.io, for the DNS challenge, and nowhere else. We don't want that technical user to be able to change records for, let's say, pkg.jenkins.io. If you do everything manually, that works very well; we validated that as a first step.
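To make the manual flow above concrete, here is a rough sketch with heavily assumed names and identifiers (the subscription, resource group, service principal and record names are placeholders, not the team's real configuration): the idea is to scope the technical account to the single _acme-challenge TXT record set, then satisfy the DNS-01 challenge through it.

```bash
# 1. Scope the technical account (a service principal) to ONE record set only,
#    so it cannot touch anything else in the jenkins.io zone.
az role assignment create \
  --assignee "<service-principal-app-id>" \
  --role "DNS Zone Contributor" \
  --scope "/subscriptions/<sub-id>/resourceGroups/<dns-rg>/providers/Microsoft.Network/dnszones/jenkins.io/TXT/_acme-challenge.trusted.ci"

# 2. Manual DNS-01 issuance: certbot prints the TXT value to publish, the record
#    is created with the scoped account, then validation proceeds.
certbot certonly --manual --preferred-challenges dns -d trusted.ci.jenkins.io

# 3. Publishing the challenge record by hand while certbot waits:
az network dns record-set txt add-record \
  --resource-group "<dns-rg>" --zone-name "jenkins.io" \
  --record-set-name "_acme-challenge.trusted.ci" \
  --value "<token-printed-by-certbot>"
```

The automation discussed next replaces these by-hand steps but keeps exactly that permission boundary.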
But then automating that permission model with Terraform and the Azure API was quite an issue. Thanks a lot, team, for pointing us to the elements that we missed or misunderstood, and thanks, Stéphane and Hervé, for reviewing with me and asking a lot of good questions, which allowed us to succeed in that area. We spent a day and a half on that topic and we learned a lot in the process. Now we have to change the Let's Encrypt setup in Puppet to use DNS, so we need to play around with Puppet. The good news is that since we manifested ourselves on the Puppet Let's Encrypt module, saying we were interested in the support for the Azure DNS plugin, they merged the pull request that had been sitting there for months and released the whole thing last week. So we should be able to not deal with the certbot binary at all and do it all in pure Puppet, so the change should be minimal for us. Almost there, and thanks to everyone involved on that part, because it wasn't easy. Thanks for the moral support, folks, because I was on the verge of throwing my laptop through the window yesterday.

A new issue taken by Stéphane, about the Playwright tool we mentioned earlier. Stéphane is working on it: we know it works in the context of the Docker image, now the goal is to move it to the Packer images. The reason is that in the near future we want every build to run only on the Packer-built images and container images, so we need to install it there and transplant the changes we made onto the VM builders.

We had a new issue from Gavin about sourcing the index.html for get.jenkins.io. We didn't have time to work on it; we should be able to do it next week. As a reminder, get.jenkins.io is a mirror system hosted on Kubernetes. It has two web services: the first one is the mirror hosting, the other one is the mirror redirector. The mirror hosting service is a web server serving files from a centralized Azure storage bucket, and the redirector redirects: when a request comes in, it goes through the redirector, which looks up the file; from the hash in its database it decides either to redirect to one of the closest mirrors or to serve the file itself as a fallback. In the case of the index.html file, that's the root file on the bucket, and it was uploaded to the bucket manually. The request from Gavin is: can we source it somewhere so it can be changed? The goal in this case is to add the web components, so anyone going to the root of get.jenkins.io, or of one of the mirrors serving that index.html, gets a proper web page.

So I propose... sorry, I am a bit tired, so I forgot to reschedule the task. That task needs to be finished during the upcoming iteration; is there any voice against that choice? Plus one for me. Okay. Playwright: it's not top priority, but it sounds like it's almost done, so is it okay to finish it during the next iteration, Stéphane? Oh yes. About the get.jenkins.io one, I'm not able to evaluate the complexity of that task. At first glance it should be easy, getting the index.html and putting it somewhere, but we might need a process to deploy it from its source, so that might be a task that spans different milestones. I propose that we keep it for the next one, and anyone volunteering can assign themselves if they want to start working on it; I might help, or do it by default. Is that okay for everyone? Yes. Yeah, although I think it would also be okay to say no, we're going to wait one more milestone; I don't think it will harm Gavin if it has to wait a little bit.
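Purely as an illustration of the redirector behaviour described above (again an editorial aside): asking get.jenkins.io for a released artifact and looking at the response headers, rather than following them, shows the decision it makes; the path below is just an example of a released file, and the mirror host in the answer will vary.

```bash
# Illustrative check of the get.jenkins.io redirector: request a file and look
# at the status and Location headers rather than downloading it.
curl -sI "https://get.jenkins.io/war-stable/2.375.2/jenkins.war" | grep -iE '^(HTTP|location)'
# Typical outcome: an HTTP 3xx pointing at one of the closest mirrors, or the
# file served directly by get.jenkins.io as the fallback described above.
```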
Okay, I have a personal matter on this one: Gavin asked us about it on IRC three weeks ago, and almost every day since I have asked him, either publicly or privately, "can you open an issue for that?". So I would feel bad if we don't take it now, even if it's not a priority; but that's only personal, so if it bothers anyone else, I don't mind not taking it just because of that. Well, that back story is a good reason to do it, that makes sense. Okay.

We had another new issue, and that one I don't mind waiting on. I will add a message, because last time I had an exchange with that user the exchange was quite complicated, so I will add a message and revisit it later. It's related to the way their CentOS mirrors are managed, and basically I don't want to provide any more support to that person, because it's about how they handle mirror management on their own infrastructure. Either they use a standard mirror, or it's not my problem. Besides, they are using a really old version of Jenkins, so I would say: maybe you should upgrade, at least to one of the recent LTS baselines. So, no objection if I move it to infra-team-next and add a message? Yeah, no objection at all; we don't owe anyone support for their private infrastructure. I still plan to throw an eye at it and decide whether there might be an issue on our side, so I just want a sanity check from one of us, but when the time comes.

Next issue: bump the Terraform module for AWS EKS. That one was moved back onto this milestone because we weren't able to... sorry, because Stéphane discovered that there was an older cluster, a consequence of that issue, that had to be cleaned up to avoid spending too many credits on AWS. And we now have a solution for the Let's Encrypt setup on that one: now that we're able to do it for trusted.ci, we can use the same method at the permission level on Azure, to let that cluster create records in the correct DNS zone. So we should be able to solve that one quickly. Besides, based on what Mark extracted from the JFrog data, the artifact caching proxy might be a bit more important than what we said during the past two months, so we need that cluster fixed. That's why I propose that we add it to the next milestone, but first we finish trusted.ci, and then we update that one.

Exclude non-numeric plugin versions from the update center. Thanks, Mark, for forwarding the discussion to the correct people. I saw there has been a pull request directly on the update center, so I propose that we remove any milestone from this one, because we don't have any further expected action on our side; the course of action will be that pull request being accepted and merged in the update center, so it's outside our scope. Is that okay for everyone? Agreed. And on that matter: I specifically asked Daniel Beck a while ago... I have merge permission on the update center repository, but I asked him specifically whether he prefers that I exercise that permission or not, and his preference is that I not exercise it. He would rather be the one who merges all changes to the update center, and I like that; that is perfect, he maintains it very well. So even though I have permission, and others should take the same advice, rather than any of the rest of us doing any merge to the update center, it's much better that we let Daniel decide when he's ready to merge it. Perfect. And given the title of the pull request, once it is merged it will close the issue, since it's the same topic, so we don't have to worry about cleaning up the issue. Thanks, Mark.

Renew the signing certificate for Jenkins: for that one, Mark, we need to work on it together.
If neither of us has time, Olivier is okay to help us and share information, so I propose we keep that one open for when it will be required.

Releases repo.jenkins-ci.org permissions. We have a record here: Mark, your homelab is the seventh biggest consumer, as far as we can tell, for December on that repository. That is another proof that Mark is our top testing person for Jenkins, and I'm sure Jenkins would collapse if Mark's homelab weren't able to run. So there are lots of things that we get from that data analysis, and we'll take action on those. Right, we've got several places that we need to improve, including more caching on my homelab and more caching on ci.jenkins.io, and we've got some consumers that we're reasonably confident are not contributing enough to the Jenkins project to justify the bandwidth they're using. So we will take active measures to ask them to stop, and we may even take active measures to prevent their access, rather than just being nice and asking them to stop. One of the takeaways we will have to confirm; I will add a message on the issue after the JFrog meeting. Tomorrow we have two meetings: one with Stéphane, Hervé, Mark and me, where Mark will show us the data and how to treat it, and then Mark and I will have a meeting with JFrog to ask them to send us, if possible, weekly reports, and to discuss what Mark discovered. One of the takeaways, which we need to confirm with the real IPs, is that we have, let's say, a gut feeling that our own infrastructure is still consuming a lot, so the work that everybody did on the caching proxy might have way more impact than what we expected. Which means the work on privatek8s and publick8s, the two new clusters, with their networking sorted out, is the top priority in order to put the artifact caching proxy (ACP) back. The rest will be exchanges with JFrog, to ask if we can ban some IPs that Mark discovered were high consumers (I'm not giving names right now), and eventually, if JFrog doesn't have any objection... The whole "authentication on repo.jenkins-ci.org" and "caches, mirrors and updates in high availability mode" epics, which are closely related, might be deprioritized, with the other steps first, because we cannot treat all the tasks at the same time; we have to, let's say, sequence them, and the ACP might make more sense for now. Right? Yeah, the bandwidth data certainly supports the artifact caching proxy being a crucial part of this effort, because a big chunk of the actual data transfer that I saw is really from our own releases repository, so it's delivering our own bits to ourselves. Thanks a lot, Mark, for that bunch of information that helps us prioritize and choose our tasks carefully. So that issue stays on the milestone.

Hervé, your turn: privatek8s, can you give us a status? Not a lot of progress since last week. I've got the certificates ready, or at least some certificates ready, and I'm currently trying to run the Kubernetes management job on this cluster, so yes, experimentation around this job. If and when this job is running correctly, we will be able to back up the current infra.ci.jenkins.io data and move it to this new cluster, and then we will be able to move the DNS record to point to this new cluster. Nice, that's quite clear, that's the same battle plan, thanks: migrate the DNS, and benefit and profit after cleaning up. Thanks a lot, that's cool. Don't forget the pull request to add the outbound IP to the Kubernetes management job. Yeah, I'd like to know how I can identify it, also, but yeah. Can you start by persisting it, and then think about automating it? Otherwise what happens is that you spend three days automating it. Ah, no, never! Mmh, we all do; you're not alone in that war.
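As a hedged pointer for the "how can I identify it" question above (an assumption about the approach, not a validated procedure): for an AKS cluster that egresses through the default load balancer, the effective outbound IPs can usually be read from the cluster's network profile.

```bash
# Assumed resource group and cluster names; adjust to the real ones.
CLUSTER_RG="<resource-group>"
CLUSTER_NAME="<aks-cluster-name>"

# List the public IP resources used for outbound traffic by the cluster's load balancer.
az aks show --resource-group "$CLUSTER_RG" --name "$CLUSTER_NAME" \
  --query "networkProfile.loadBalancerProfile.effectiveOutboundIPs[].id" -o tsv |
while read -r ip_id; do
  # Resolve each public IP resource to its actual address.
  az network public-ip show --ids "$ip_id" --query ipAddress -o tsv
done
```

If the cluster egresses through a NAT gateway or a user-defined route instead, the answer lives elsewhere, hence the hedge.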
Mirror status reports wrong results: did you make any progress on it? No, I didn't have time. I propose to move it back to infra-team-next, as suggested last week. I was a trigger on this one, because I said I was going to help you, and neither you nor I had time, so I propose to remove the assignees. We might spend some time on it, but let's see later. I would even, yeah, move it... yeah, definitely not for next week.

"Blocked from creating a new account by the anti-spam system" might be an issue to close. Oh no, they tried? Yeah. Okay, they did it again, came back to it, and we didn't... Okay, and it's on me: I forgot it was on my side, and they didn't check, so that's definitely to be done next milestone.

"All inbound agent images published as latest": that one is on trusted.ci.jenkins.io. For numerous months now, a set of old tags on the agent images, inbound-agent and ssh-agent, have been rebuilding, sometimes daily, sometimes weekly, sometimes monthly. We weren't able to find why, and we thought it was a bug or a weird behavior, but today we were able to find one of the causes; there might be other causes, but this one is plausible. Let me open the commit that we pushed on ssh-agent: can everyone see what this commit does? It's important, so I prefer sharing the information. That pipeline instruction means that, for that pipeline, so for those specific tags (all the tags that had a Jenkinsfile with that instruction in the past), once the Jenkinsfile has been parsed by one build, the job says "oh, I need an additional trigger", at the pipeline level, not at the multibranch level, and that's the tricky part. And the pipeline says "I will poll the SCM, and if I see any change, I will rebuild myself". That mechanism has been around for a long time; it's from an era before webhooks and before multibranch pipelines, and in that era we were using polling because we were building branches. But first of all, polling does not make any sense for a tag; and second, we don't need polling: we have a multibranch pipeline which takes care of it, scanning the whole repository at least once a day and deciding whether a build should be triggered or not. All the analysis that Mark, Daniel, Tim and other people did was to check the multibranch setup, with a build strategy that says a tag older than three days should not be built when it is discovered. That works well; we demonstrated that each time you manually scan, it behaves as expected and as configured, so that setup is okay. But now the challenge is these old tags: we don't want to change all the old tags from the past to remove that instruction from them, and those tags already have polling enabled. So I've asked the question in a few places and I'm waiting for answers: if there is a way to disable SCM polling at the multibranch level, that will be easy. Otherwise, I suspect I should be able to run a magical grep command to remove the SCM polling directive from the triggers in the jobs' config.xml; that will be really ugly, but I expect it to work.
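A minimal sketch of what that "magical grep" cleanup could look like, purely as an assumption about the approach (the JENKINS_HOME path, the trigger element name, and the reload step are illustrative and unverified against the real controllers):

```bash
# Assumption: the pollSCM trigger is persisted as a hudson.triggers.SCMTrigger
# element inside each affected job's config.xml. Back up before editing.
JENKINS_HOME="/var/jenkins_home"

# 1. Find the tag jobs whose stored configuration still carries the polling trigger.
grep -rl "hudson.triggers.SCMTrigger" "$JENKINS_HOME/jobs" --include=config.xml

# 2. Strip that element in place with an XML-aware tool rather than raw sed.
grep -rl "hudson.triggers.SCMTrigger" "$JENKINS_HOME/jobs" --include=config.xml |
while read -r cfg; do
  cp "$cfg" "$cfg.bak"                                   # keep a backup copy
  xmlstarlet ed -L -d "//hudson.triggers.SCMTrigger" "$cfg"
done
# A "Reload Configuration from Disk" (or a restart) would then be needed for the
# controller to pick up the modified job configurations.
```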
Another solution would be to recreate the multibranch job from scratch: it would check all the tags and say "these tags are old, I won't try to build them", meaning it won't parse the Jenkinsfile and won't enable SCM polling. The fix can be tested immediately on docker-agent and docker-inbound-agent; for ssh-agent we will have to wait a few days because, yeah, the change was done five hours ago, so of course the previous tag of that image, released yesterday, will still have the polling, and then we will run into that problem again. So the issue has been commented, and we definitely need to work on it; the three of us worked on that one together for knowledge sharing. Yeah. We will see later whether we need to keep the old job; I know it has been backed up already, and I don't mind removing it, because most of these tags have a build rotation policy, so Jenkins already deleted the last successful builds, and that's also why the polling keeps rebuilding: it lost the previous build, so it compares against nothing; it goes from no known state to "oh, new code, I have to build", and then it fails, of course. That's one reason explaining that behavior; we might have other surprises like it. So creating a new multibranch job from scratch will do, and it will show that we have a clean state to start with. Any question? So we keep working on that one.

Finally, accounts.jenkins.io admin access for Stéphane. I did not have time to check on that one. Stéphane, are you mad if I move it to infra-team-next, to be safe? Because clearly we had too many issues on our plate for this milestone, so I'm not sure we'll have time for it. That will cost you a coffee when you see me, but that's okay. Okay, so let me click on that one; it's an acceptable cost, thanks for your understanding.

Okay, on the backlog... I don't think we had... So I'm checking the new issues first: do we have new issues that are not either on the backlog or on the milestone we just went through? No new issues that are not on the current milestone or on the backlog. Right. So now I'm looking at the backlog, just one pass; if you see an issue that is catching your eye, don't hesitate. I see a new message on that one. Yeah, okay. So I propose... okay, I don't see anything else that we absolutely need to take on. The cluster backup, maybe? But okay, that one can be ignored. What, the cluster backup? Yep, yep, of course, that's part of what you're doing. It's not... yeah, it's not really about migrating this cluster, it's more about having a backup of our cluster. Okay, do you want to start working on that one, or at least thinking about it, writing against the issue? No, maybe not. Okay, no problem. I don't think we should plan for more; let's see if we can finish our other issues, and if we are able to, then we will take from the backlog.

Just a few things for next week. We will have to start thinking about the Ubuntu 22.04 campaign, because we have a lot of machines on 18.04 and 20.04. 20.04 is okay, because LTS means five years of support, but we have a lot of machines on 18.04. We can upgrade these machines in place, we can migrate them to new machines with updates, or we can change architectures; I don't mind, but we have to think about it. And as we found in December, we are now able to get Ubuntu 22.04 in Azure, which means we should be able to update the agent virtual machine and container templates to Ubuntu 22.04. It's not a priority for this milestone, but we have the go, everything is green for going in that direction, and we have to do it before April anyway, obviously, because all machines on 18.04 won't be supported anymore starting in May. A bit more pressing, though, is the Kubernetes update: we will have to schedule the upgrade to 1.24, if I'm not mistaken, because we risk having one of our providers stop supporting the version we currently run.
If it's okay for everyone, I mention it now to plant the seed, and next week we can start scheduling based on where we are with the rest of the priorities. Is that okay for everyone? Yes. Cool. Those were the two main topics, and I don't have any more topics to discuss. Right, publick8s? Yep: prepare to create a new public cluster, pairing with Stéphane. Totally. I propose that we sequence the work so that your mind is not overloaded, is that okay for you? If you are waiting for builds that take one hour to finish and your mind is free, I propose that you start thinking about the battle plan, and that you prepare it on an issue you can exchange in written form with Stéphane, before the two of you start discussing what needs to be done. Is that okay for you? So, first, a written battle plan on the issue. That is okay. And as soon as you start synchronizing with Stéphane, then you can move the related issue to the current milestone. Is that okay for you? That gives you enough space to start thinking and preparing the battle plan without forcing you to do it at all costs, so you can focus on privatek8s. Does that make sense for you, with the way you like to work? Yes. Cool.

Okay, I don't have any other topics, do you have some, or can we close? So again, thanks everyone for the huge work you are all doing, and see you next week. I'm now stopping the recording.