Okay, everyone. Welcome to the Jenkins infrastructure team weekly meeting. It is the 22nd of April. Today we have Hervé, Stéphane Merle, and Mark Waite available. Welcome everyone. Let's get started with announcements.

The weekly release 2.342 has been released. Is that correct, Mark? Yes, the checklist is completed. Cool, so we should be able to proceed with the Docker image. The Docker image is built and confirmed; that's part of the checklist. Cool, thanks a lot. So no issue this week. We had an issue last week: I accidentally deployed a change to release.ci during the release, so I had to trigger it manually myself. Sorry for that. This week it didn't happen. We still have some tasks pending that would prevent such issues in the future. Until then, please be careful about what Damien can do on the day of the release, and throw bricks at Damien.

Other announcements: we have the LTS release tomorrow, as we discussed last week. So tomorrow, please do not merge anything at all on the Kubernetes cluster until the LTS release is done. Same for trusted.ci. Please refrain from changing the infrastructure unless you know what you are doing. That message is particularly targeted at Damien Duportal. Are there other announcements on your side, folks? None for me. Okay, let's proceed.

As a reminder, the past three weekly meeting notes and videos have been published to YouTube and GitHub. I still have to publish them on community.jenkins.io, which I will do right after this meeting.

First, the tasks that we did this week. This is the milestone dated today; I'm going through the closed issues listed in the notes. First, we had an incident yesterday: Artifactory was slow for a few hours. It's the same incident that recurs regularly.
We started seeing alerts on Datadog and PagerDuty, and right at the same time our human monitoring, i.e. our users, started complaining about builds being slow on ci.jenkins.io. We tried contacting JFrog. First I asked Hervé to do that; he's still not on the accounts allowed to open issues, and I assume that's the same for Mark and Stéphane. So I will send JFrog a gentle reminder that we need more people able to open issues, not only me. We also opened an incident and I followed up, but apart from the usual automatic message we never heard from them. Each time we hear nothing except "hello" and "okay, is it closed?". Not sure what to do about that. We don't have anything actionable right now, but it's getting more and more frequent. Let's see what response they give us, at least about adding more people able to open support tickets.

At the same time, and it's not the first time, Jesse Glick reminded us that there used to be a caching proxy. That won't solve all of the issue, but it will help mitigate such issues if we correctly deploy that tool. The goal is to not have a single point of failure that makes the experience of building plugins and Jenkins components on ci.jenkins.io slower or less reliable than it is today. That's the balance we have to manage as an infra team. There is an issue for that, which has been added to the notes: helpdesk issue 1136. No, sorry, that's "monitor Artifactory"; this is another issue, from Daniel Beck. I saw it just before, so I think it would be good to have it. Okay.

So I've added, at the bottom of the notes, a new section listing the issues that we will set to the next milestone, and Artifactory proxy caching will be one of them. Artifactory proxy caching is about spawning an NGINX server that will act as an explicit proxy.
So we'll have to configure the builds to go through this NGINX instance, or one of these NGINX instances depending on how we deploy them. The idea is that for any Maven release build, the first time, the cache is empty, so NGINX forwards the request to Artifactory and then caches the answer. Any subsequent build will have its files served from the local NGINX. That's something that used to be present on the platform, so we should be able to build on the existing tooling. It was caching only the release artifacts and only the Maven URL, which should be a good target to get started on. There has been discussion about also caching snapshots, etc. Let's start with releases, see how it behaves, and then iterate.

The main challenge compared to three years ago is that builds on ci.jenkins.io can now happen on AWS, Azure, or DigitalOcean, and on virtual machines or Kubernetes agents. So we will have to find a way to correctly balance the caching. Should we start with a single instance, which might add network latency if you are not on the same cloud as that instance? Should we have different caching proxies per cloud? In that case we face different behaviors: an artifact can be cached on one cloud and not on the other, so if you run the same build twice you might get different build times. However, Maven is quite good with its artifact checksum system, so as long as we only cache release artifacts the builds should stay deterministic; only the build time should change.

Is there any other question, or can we look at the Artifactory monitoring? No other questions for me. Okay, so "monitor response rate on Artifactory". We can close this one, Hervé, because we already have a Datadog monitor that feeds PagerDuty alerts. The two things we can do though are, first, ensuring that the Datadog monitor is published, so it should be available to everyone on status.jenkins.io by default.
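A caching proxy of this shape could look roughly like the following NGINX sketch. This is a hedged illustration, not the actual deployment: the listen port, cache sizes, and TTLs are assumptions, and it assumes we only cache immutable release artifacts served by the Jenkins Artifactory instance.

```nginx
# Hypothetical sketch: cache immutable Maven release artifacts from Artifactory.
proxy_cache_path /var/cache/nginx/artifactory levels=1:2 keys_zone=artifactory:50m
                 max_size=50g inactive=30d use_temp_path=off;

server {
    listen 8080;

    location /public/ {
        proxy_pass https://repo.jenkins-ci.org/public/;
        proxy_cache artifactory;
        # Release artifacts are immutable, so a long TTL is safe.
        proxy_cache_valid 200 30d;
        # Keep serving cached files if Artifactory is slow or down.
        proxy_cache_use_stale error timeout updating http_502 http_503 http_504;
        add_header X-Cache-Status $upstream_cache_status;
    }
}
```

The `proxy_cache_use_stale` line is the part that directly addresses the incident pattern above: builds keep getting cached artifacts even while Artifactory is unresponsive.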
And secondly, ensuring that everyone on the team is on PagerDuty. That will be a topic for the middle of May, not for now, unless you want to get started on it. I propose we delay that second point to May, if that's okay for everyone. Okay with me. I'm on PagerDuty right now, but in many cases I'm just acknowledging and ignoring rather than taking active action. Good to revisit in the future. Good for you? Yes. Okay, so let me note: thanks Hervé, this one can be closed. A Datadog monitor? No, it's not Datadog. Okay, are there other points around Artifactory? Okay, next topic.

We fixed the DigitalOcean Terraform build that had been failing for a few days. Thanks Hervé for that. Hervé also discovered that Kubernetes 1.20 is not available anymore on DigitalOcean; the oldest available version is 1.21. That raised the topic of planning the upgrade of our clusters to 1.21. Usually we try to upgrade all the clusters; that's what we did in the past, but now we have four clusters, so the question becomes: should we do one at a time? Are there any constraints? That's a topic that needs to be discussed. Thanks Hervé for creating the issue, which is on the to-do list. No worries. There is no emergency, because our existing DigitalOcean cluster is still up and running, except that it only benefits from security patches; it doesn't benefit from the full management and all features. It just stays as it is. We did a trick to fix the Terraform build, so we are able to deliver. Now the question is when, and who should work on that topic. I propose we delay the upgrade discussion to the to-do list later in the meeting, and continue to the next issue, unless there is a question. Good for me. Okay.

The disk space on the virtual machine hosting the census.jenkins.io service, which retrieves metrics and sends them to the Jenkins reporting system, was triggering alerts: it was almost full.
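For the DigitalOcean upgrade, the Terraform change is essentially a version bump on the cluster resource. The sketch below is hypothetical (resource name, region, node sizes, and the exact patch version are assumptions), but it shows the single field the "1.20 is gone, 1.21 is the oldest" constraint forces us to move:

```hcl
# Hypothetical sketch of the DigitalOcean cluster definition.
resource "digitalocean_kubernetes_cluster" "doks" {
  name    = "infra-doks"       # hypothetical name
  region  = "ams3"             # hypothetical region
  version = "1.21.11-do.1"     # 1.20 is no longer offered; 1.21 is the oldest slug

  node_pool {
    name       = "default"
    size       = "s-4vcpu-8gb" # hypothetical droplet size
    node_count = 3
  }
}
```

Changing `version` on this resource triggers a managed rolling upgrade on DigitalOcean's side, which is why doing one cluster at a time, as discussed above, limits the blast radius.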
So the team increased the disk space on both the system volume and the data volume. Am I understanding that correctly, Stéphane, Hervé? Yes: the data volume was set up on a physical disk of 64 gigabytes but was only using 30 gigabytes, so we took advantage of the system volume increase to grow the data volume to its full extent, because we were already paying for it. Thanks a lot for doing that. I don't think there are other points; is there another question on that topic? Okay.

A new Helm chart, jenkins-jobs, has been introduced to allow us to define credentials and jobs using configuration as code. We write a bunch of YAML, and we use environment variables that are passed in like the usual Kubernetes and Jenkins Configuration as Code setup, and that generates the job structures. That's a first step. The chart is, let's say, sized for our usages only. It might become a contribution to the community later, or at least a conversation starter, because right now it's not possible to define this kind of element correctly; Job DSL is absolutely not what I would call "correctly". So we have a YAML definition, which is cleaner and easier, and it allows us to host sensitive jobs on infra.ci. It should also allow us to define the jobs of release.ci as code, so we should be able to switch configuration settings quite easily in the future.

We had a bunch of minor issues that are day-to-day operations. Unless you have one that I forgot, I propose we jump to the work in progress. Thanks very much for taking on all of this. We currently have 10 issues being worked on; I'm taking them in order. First, migrating rating.jenkins.io to Azure. Status, stop me if I'm incorrect, Stéphane: the database has been created and is managed as code by Terraform.
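As an illustration of the jobs-as-code idea, the chart's values could look something like this. The keys and names below are entirely hypothetical, since the actual chart schema isn't shown here; the one real mechanism assumed is that secrets arrive as environment variables and are resolved JCasC-style with `${VAR}` references:

```yaml
# Hypothetical values.yaml sketch for a jobs-as-code Helm chart.
credentials:
  - id: github-app-infra            # hypothetical credential id
    type: secretText
    secret: ${GITHUB_APP_SECRET}    # resolved from an environment variable, JCasC-style
jobs:
  - name: terraform-apply           # hypothetical job
    type: pipeline
    repository: https://github.com/jenkins-infra/example-repo.git  # hypothetical repo
    branch: main
    scriptPath: Jenkinsfile
```

The point of the shape is that both credentials and jobs live in reviewable YAML instead of Job DSL scripts, which is what makes hosting sensitive jobs on infra.ci tractable.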
There is a Helm chart which is installed, with a deployment and a service, and the pod is able to connect to that database. The next two tasks are defining an ingress, which might not be enabled yet because the current DNS points at the old virtual machine, but defining the ingress is required; and importing, for the first time, the data from the actual database on AWS, so we can see the application working with real data. Once these two tasks are finished, we can plan the migration. Yes. Okay, so is it okay to continue working on that issue for the next iteration? Yes? Okay, thanks.

Next: apply to the Docker open source program. It's still on me. I need to work on that for the next milestone as well. I will be able to do it; it's not a matter of time, I just completely forgot. That's why I'm mentioning it.

Docker Hub credentials for VM agents. There is work in progress to spread the API rate limits between different accounts and kinds of usage. There has been a change; I still need to write it up. Olivier and I worked on it yesterday; he shared with me some of the missing credentials, and we did a cleanup of some of the Docker accounts used by the infrastructure, in that case jenkins4eval. I started a runbook that should be published tomorrow, I expect, explaining that we have Docker organizations where we publish images, and these organizations have associated users: owners and members. Because we are not part of the open source program, we are limited to three users per organization, which means that most of the time we are not able to let everyone access and manage them correctly as of today. The pattern I propose is that each of these organizations should have three users. The first one is a member: a technical user that we use for the Docker login of the infrastructure, to allow pulling or pushing images.
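The "define the ingress now, but keep it dark until DNS moves" step could be sketched like this. Host, service name, and port are hypothetical placeholders standing in for whatever the Helm chart actually creates:

```yaml
# Hypothetical Ingress sketch: defined ahead of the DNS switch, so nothing
# resolves to it until the record is moved off the old virtual machine.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: rating
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: rating.jenkins.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: rating        # hypothetical service name from the Helm chart
                port:
                  number: 8080      # hypothetical service port
```

Because the ingress only matches on the `host` header, it can sit deployed and idle; flipping the DNS record is then the only step that actually cuts traffic over.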
This account will have an API token that we can rotate. Since it's only a member, it can only read and write the images. It cannot delete images, and it cannot change the ownership or the settings of the organization, which gives us damage control if one of our instances is compromised. Everyone on the team will have access to that technical account through the SOPS-encrypted passwords. The other two users are owners. They should be humans, with two-factor authentication on their Docker account, either their personal account or an account they specify. Since there are only two seats, the proposal is that one is the current Jenkins infra officer, so for now it's me; when the next officer takes over, the seat changes with them. And we define a fallback: if the Jenkins officer is on holiday, gone, or whatever, we need someone else able to get in. For now, on jenkins4eval, I think it's Mark. I will need to do a full audit of all the other organizations that we manage, all the jenkins-something organizations, except jenkinsci and jenkins, which are out of our scope.

To do in that area: I need to write down the runbook. So I propose that we delay that issue; I don't think we will have time to treat it next week. Once I have correctly written the runbook, Stéphane, you will have all the required information to proceed on that issue, meaning updating the pipeline library. Right now, that issue is considered blocked. So if it's okay for you, I propose that we don't move it to the next iteration; we will continue in two weeks. And will you maintain that spreading of accesses even if you get the open source program from Docker? That will depend on what we can do. I'm not sure what the open source program allows us to have, so right now I don't know. Good for you? Yes. Any question on that topic? Okay, so I propose that we remove the milestone from this one and I will keep track of it through the notes.
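The three-seat pattern described above lends itself to a simple audit check. The following is a hedged sketch, not the actual runbook tooling: the org snapshots and role names are hypothetical, and a real audit would fetch membership from the Docker Hub API rather than use a hard-coded dict.

```python
# Sketch: validate the proposed 3-user Docker Hub org pattern:
# exactly one technical "member" (pull/push only) and two human "owner"
# seats (current infra officer + a fallback), within the 3-user limit.
def audit_org(members: dict) -> list:
    """Return a list of problems found for one organization.

    `members` maps username -> role ("member" or "owner").
    """
    problems = []
    if len(members) > 3:
        problems.append("more than 3 users (free plan limit)")
    owners = [u for u, r in members.items() if r == "owner"]
    technical = [u for u, r in members.items() if r == "member"]
    if len(technical) != 1:
        problems.append("expected exactly 1 technical member account")
    if len(owners) != 2:
        problems.append("expected exactly 2 human owners (officer + fallback)")
    return problems

# Hypothetical org snapshots for illustration:
ok_org = {"org-bot": "member", "officer": "owner", "fallback": "owner"}
bad_org = {"org-bot": "member", "only-owner": "owner"}
```

Running `audit_org` over every managed organization would turn the planned full audit into a one-line loop.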
Maybe assign it to the next milestone after the meeting. Yeah, good idea. Can I let you, Hervé, change the milestone for me, please? Yeah. Next milestone.

Add an email alias for press: same situation. I need to open an issue with the Linux Foundation so they can set up an email server for us. Then we have to proceed with migrating all the accounts, because we didn't hear back from KK or from anyone. So the reality is that we have to proceed. We could try to contact Mailgun at the same time, but we still need to ask the Linux Foundation in parallel to start a mail server for us. Yeah, so we can do both at the same time. Exactly. I need to amend the issue: we told the Linux Foundation we need at least a catch-all at jenkins.io and the new press alias that was requested. As soon as they have created that mail server system, we can migrate the MX DNS record to them, and at the same time contact Mailgun to see if we can recover the account, or at least ask them for the list of aliases, because we need to recreate them. So it's on me as well. Sounds good to you? Perfect. Yes. Sorry that I didn't get the action item I took from last week done. Thanks Damien, continue. No problem, I'm also late.

Garbage collecting old AWS images: it's work in progress, Stéphane? Yes. Is it okay for you if I continue working on it in the next milestone? Okay. It's kind of fun to learn jq as a CLI tool along the way.

Git for Windows long paths. So I understand that in the short term, the issue from our user is closed. The issue should be closed, because the next steps are long term: building our own Windows Docker image and setting up the parameter. Is that correct, Hervé? Did I miss something? Yes, that's correct. Okay, so can I let you close it with a link to the "build our own image" issue? Yeah, not right now; take the time required. Please link it to building our own Windows Docker images on infra.ci.
Should we add building Windows Docker images on infra.ci as a topic for you for the next milestone, or would it be too much? Not sure I want to work on this next week. Okay, two weeks maybe? Yeah, maybe. Okay, so I will let you add the linked issue to the milestone. Sounds good.

What are the other topics? Infracost. You were successful in making it work? Yeah, I'm waiting for a review. Oh, there's still a review pending from us? On the result, you mean? Yeah. No, the pull request with my fix on it hasn't been reviewed. It got mixed into the 200 emails I received, I'm sorry. Do you mind linking it to the issue, because I don't see the link here? I think in the issue you should have it. I was monitoring the linked issues and they are all merged; that's why I missed it if there is one. Sorry for that. Okay, so it's work in progress and you need the technical review before we can proceed further. Is that correct? Yes. Okay: Infracost, waiting for review, and to do next milestone. Correct? Yes. Okay, let's move it to the next milestone.

Monitor builds on our private instances. So I think there were some action items. Mark, you proposed to use RSS like you are doing on your own, and Daniel answered, but I missed that answer. Right, and Daniel's answer guides what we should do. He's right. I thought of replicating RSS; he said, hey, we really don't need anything that complicated. A single job running on trusted.ci that then reports outward to some other location, whether that's status.jenkins.io or wherever, is enough. And I think he's right, it makes sense. The more I thought about my idea of replicating RSS feeds, the more complicated it became. Okay. So that means we should stick with a job that publishes to, let's say, an Azure bucket or whatever system outside. Exactly, and we can publish anything we'd like to that location.
It just gives us a public location to see the health of things from the results of the Jenkins job, and creating Jenkins jobs that look at other Jenkins jobs is actually not that hard. Okay. So is there anyone interested in working on that next iteration? Yeah, why not. Okay. So that means writing, in whatever language, a script that will run as a Jenkins job, generate a report, and publish it to an Azure bucket. Then the next step is adding a Datadog monitor, eventually linked to PagerDuty, that will check the freshness of that report. You can pair on that, can't you? Yes, of course. I'd like to understand a little more, because buckets, pipelines, and Datadog, all three of them are interesting to me. There were one or two previous issues from Daniel, also about private reports on these private instances, and about reporting failing jobs in #jenkins-infra on IRC. Cool. Okay. So can I let you change the milestone on that one? Yes.

We have weekly.ci.jenkins.io, which is ready and waiting for review. Yeah, it has been merged. So almost there. Oh, is there a new issue? No, it has been merged, so we just have to check. No problem. I propose that we keep it for the next milestone as well, because we need to validate that it's okay, and we will let Tim, since he opened the issue, close it and consider it done when we have reached what he expects. Does that sound good to you? Yes. If you're okay, Hervé, both you and I will monitor that issue and answer Tim, so we share the workload on that one. It should be almost there. Thanks for taking care of that. And when I tried to open weekly.ci.jenkins.io, it directed me to an HTTP site, not an HTTPS site, and it showed the Jenkins devil icon, the crash image. So there is still more work to do there. To do next week: finalize, then let Tim close the issue once done.
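The "one job that reports outward" idea can be sketched as a small script: summarize the last build result of each watched job into a JSON document that the Jenkins job would then upload to the public bucket. The job names are hypothetical, and feeding the snapshot from the Jenkins JSON API is an assumption for illustration; the `generated_at` timestamp is what a Datadog freshness monitor would key on.

```python
import json
import time

# Sketch: turn a snapshot of build results into a publishable health report.
# In the real job this snapshot would come from the private controller's
# Jenkins JSON API; here it is a hard-coded, hypothetical example.
def build_report(results):
    """`results` maps job name -> last build result (SUCCESS, FAILURE, ...)."""
    report = {
        "generated_at": int(time.time()),  # lets a freshness monitor detect staleness
        "jobs": results,
        "healthy": all(r == "SUCCESS" for r in results.values()),
    }
    return json.dumps(report, indent=2)

# Hypothetical snapshot of jobs on the private instance:
snapshot = {"weekly-release": "SUCCESS", "docker-image-deploy": "FAILURE"}
print(build_report(snapshot))
```

Uploading the resulting JSON to the bucket and alerting when `generated_at` stops advancing covers both halves of the plan: visibility and freshness.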
Okay, thanks for checking, Mark. One last issue: create a new SendGrid account on Azure. I didn't have time for that. Since it's not an emergency, I propose that we remove any milestone from this one for now. Even in two weeks I won't be able to work on it, and since the current setup is working, no problem. No problem, I agree. Okay.

I'm now looking at the newly opened issues. There is an idea from Hervé that I feel very good and very at ease with: an "infra team sync next" milestone that will be an always-open milestone, where we add the new important topics to pick up for the next weekly meeting. I insist on "important". We can also put there the important topics that have been blocked and delayed by one or two weeks; most of the time, issues like the Docker credentials for VMs, which are work in progress but blocked by something else. The goal of that milestone is for us, before closing the weekly meeting, after checking what we did and what we are currently working on, to check the new important subjects so we can shape the milestone for the next week and decide what we have on our plate. Yeah, that's nice. And I've just created an "infra sync later" milestone for the issues we won't work on next week but want to keep in mind, like the SendGrid one. I would use this one too, even for the ones two or three weeks out, but not too many. For me, the rest goes back to open issues without a milestone that we have to triage and keep in mind, especially since Hervé's big slaughter of the rotten issues is underway, which means we should have fewer and fewer open issues. That's why I'd hate to send them back into the untriaged pile; it's too much. Less than 200 now. I propose we start with the "next" one and keep the "later" one in mind. When an issue reaches the "later" state, it means we won't have time or it's not as important as the current ones, while "next" is for the upcoming week or the ones we need to delay.
I propose we start, but we keep your idea in mind, because it might be needed. Let's try it with "next" this week and see if it works for you. Yes. Okay.

Upgrade to Kubernetes 1.21: who is motivated to start working on that? Yes, I'm ready to work on it, if you want to do it with me. With pleasure. I don't feel comfortable doing that by myself, but pairing is nice. Okay, so we can sync on the next steps. I think the first step can be done entirely by Stéphane, and Hervé can just ping you; that one is quite easy. Then for the rest, I will let both of you work on it. So let's put it on the next milestone.

A minor one: missing or incorrect headers, caught by Hervé. That's one that should have been fixed, but since the latest major ingress upgrade on Kubernetes there has been a rollback. I will take care of fixing this one, so if it's okay, we put it on the next milestone for me; I will ask you to review it once I have opened the pull request. And then we have the artifact caching proxy for ci.jenkins.io. If it's okay, unless someone else wants to start working on it, I will take this one. Okay, good for you. And let's use the Docker Hub credentials for VM agents.

I think that's all for today. Oh, one question: are there topics that we didn't speak about? Okay. Thanks a lot for your work, folks, and see you next week. Thank you. We are progressing: we are at 37 minutes. See you later. Bye. Bye.