According to the cloud, it's recording. And now sharing my screen. OK, share. Everyone see my screen? Yes. Is it recording? Yes, it is. Everyone, welcome to the Jenkins weekly infrastructure meeting. It is the 8th of March, 2022. Today we have Mark Waite, Stéphane Merle, Hervé Le Meur, and Damien Duportal.

Let's start with the announcements. We have a new weekly release, as far as I know, like every week. It's been the fourth release, at least, without any issue in the release process. So congratulations, folks; our process is behaving normally again, which is a positive thing. I don't have other announcements. Does anyone?

We've got the upcoming LTS tomorrow. Tomorrow? Yes. The 2.332.1 LTS releases tomorrow, with significant improvements to the UI. That's good news for the users. It is. Hopefully we didn't plan anything else for tomorrow. Yes, that would not be a good thing, because we spent months preparing for tomorrow. I'm not sure how I missed that. That's OK, no problem; it shows the process is working smoothly. That's great. Maybe the four of us can make the effort for the next LTS in, what, six, eight weeks? I don't remember exactly. Four weeks. Every month? It happens every month, yes. OK, I'm still confused by the LTS naming, but that's another discussion. The LTS baseline is established every three months, and then there are the .1, .2, .3 releases. Four weeks from now it will be .2, right? Eight weeks from now .3, and twelve weeks from now a new .1. OK, that makes sense. Just to check: is there a calendar for these releases? There is; the Jenkins calendar has them on it. OK, so Hervé, Stéphane, and I have to add this calendar to our own calendars so we don't forget. Thanks, Mark, for the pointers.

Are there other announcements, then? That LTS announcement means that from now until the LTS is released tomorrow, we don't deploy to production, at least nothing on the publick8s cluster, as far as I know. And we don't deploy anything through Puppet either, because that could impact trusted.ci, which then wouldn't be able to release the Docker images; right now, that is the instance in charge of them. The release lead tomorrow, is it Katy? It is, Katy Chan. Hervé or Stéphane, if you are interested in joining the release process, we can ask Katy if it's OK to have you as a shadow. Usually there is a big Zoom meeting at the end of the afternoon, for us in Europe, where we sometimes have someone from the security team, sometimes not. Having someone from infrastructure there is always peace of mind for the person in charge of the release, because we are around if something goes wrong. So you don't have to be available non-stop, but if you want to shadow, if you are interested in discovering it, it's open; just let me know. It's not mandatory, OK?

Let's start with the notes, unless there is something else. One, two, three. OK. Hervé, you had a proposal, or at least a draft proposal, to help us have better actionables for this meeting. Could you explain it, at least for Mark, who wasn't there yesterday when we shared it? Yeah. So I've created a milestone in the helpdesk repository. Can you click on it? Yep, I'm opening it right now. Milestones: Infra Team Sync. I'm sharing, I'm splitting the screen. So I think it can be a good way to gather the different issues or subjects we are currently progressing on, or want to speak about, in these meeting notes.
So I started this week with this milestone. Now, when we are working on an issue, I add it to the milestone, and I have also added it to the meeting notes, so we have an easy way to look at them when we are preparing the notes. That sounds great. So these are topics for discussion in the infra team sync for that day, and their status is tracked as well. Now, I assume we close milestones even if all the issues on them are not closed? Or how will that work? It remains to be defined; I don't know yet. OK. There is a paragraph about it below, Damien. Oh, sorry. About the milestone? Right, where is it? Below. This one? Not this one. Which one, then? Just above the one at line 89. Hervé, you're sharing your screen and you drive this; I need your help, you have to explain it to us. I know you don't like it, I'm sorry for that.

Not much more to add. It's limited in that we can only link issues from one repository at a time; we can't link issues or pull requests from other repositories. So it implies creating an issue in the helpdesk if we want to track it that way. It's not mandatory, but... I would argue that it would be a good thing to create one helpdesk issue per top-level item, because it allows us to share the state of our work: not only the new issues we triage, but also the top-level items we are working on, and from there we can link to other issues. That could take the form of migrating an existing issue from one of our repositories, moving it to the helpdesk and tracking it there, if it's big enough to be a top-level item, right? Is it technically possible to move issues across repositories? It's possible, but we would lose all the links; it breaks them. OK, so it's better to create a helpdesk issue for each one. Like it has been done for separating the four scripts: there was an issue in the infra-reports repository, you created another one in the helpdesk, and I added a link on the helpdesk one to the infra-reports one. So we create one issue on the helpdesk per top-level item. OK.

Just to be sure I understood correctly: the milestone that exists today, the third one, was for the iteration that ends with this meeting, is that correct? Yes. And there will be one for the upcoming iteration? Yes. So after this meeting, I'll create another one for the week to come. OK. If it's OK, I'm creating one now, just to be sure that I understand: the goal of creating a new one is that we add all the topics we need, either carried over from last week or new topics, and we add them right now. That will be complementary to the meeting notes. Sounds good to you? Yeah; I was also thinking of putting the unclosed issues from this week into this one. OK, so I'm taking back the screen, thanks for helping. So let's start by creating a new milestone. We might use other milestones for other kinds of issues, or the governance meeting, I don't know; that could help. We will see; it's open to any use. Cool.

So if it's OK for you, I want to try the following; let me know if it's what you had in mind, because I'm not sure I understood everything about the way you want to use it, and maybe not everything is defined yet. First of all, I would like to check, on the previous milestone, the work that has been done by the team. I might have forgotten to add some issues, but these one, two, three, four, five top-level items should be what we did or finished during the previous iteration. So that's the first step: what did we do, so we can see our progress all together. Then I would edit each of them to move it from the previous milestone to the next. I'm sorry, I didn't follow. Well, you spoke about passing issues from one milestone to another; right now I'm saying, let's check the closed issues I'm showing on my screen to see what we did. Yeah, but I've put them in the notes, so I'm not sure this is the way to list and review the issues we've done. For this week, I've put all of these issues in the meeting notes that you have on the right, so there is no need.

OK, let's park that subject, because I think it's not clear yet; I need you to prepare it for next week and to explain to us how we should use it, because I don't know. We are trying it now. I never intended for us to walk through this list during the meeting; it was only to prepare the meeting notes. OK, that's why I need you to explain it. That's what I've done just now, sorry. OK, OK. So you need to prepare what you want to say to us, what you want us to try, because I don't understand, I'm sorry. I tried here, and then when you started listing the issues on the left I said, no, please don't do that; I've prepared the notes with all these issues, and I prefer not to look at this list right now because I don't know yet how to use it for this week's meeting. Maybe later, but right now I didn't plan to, and I tried to explain what I thought. I didn't expect that you would want to look at this list right now or put it into the meeting; I prepared what I thought I had to prepare, and it's not complete. OK, so I understand: we use the meeting notes this week, is that correct? Yes, sorry. OK, no problem. What's the actionable on the milestone subject? Is there something for us, not just you? No, no, it was at the bottom of the list because it wasn't really the most important thing to speak about; it's about how we work as a team. So don't worry, there is no problem with having taken it now.
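As an aside, for preparing the notes, the issues attached to such a milestone can be pulled straight from the GitHub API. A minimal sketch in Groovy, assuming the jenkins-infra/helpdesk repository and using an illustrative milestone number (3):

    import java.net.http.HttpClient
    import java.net.http.HttpRequest
    import java.net.http.HttpResponse
    import groovy.json.JsonSlurper

    // Milestone 3 is only an example; the "issues" endpoint filters by
    // milestone number and returns open and closed issues with state=all.
    def milestone = 3
    def uri = URI.create(
        "https://api.github.com/repos/jenkins-infra/helpdesk/issues?milestone=${milestone}&state=all")
    def response = HttpClient.newHttpClient().send(
        HttpRequest.newBuilder(uri).header('Accept', 'application/vnd.github+json').build(),
        HttpResponse.BodyHandlers.ofString())
    new JsonSlurper().parseText(response.body()).each { issue ->
        println "#${issue.number} [${issue.state}] ${issue.title}"
    }

The output is a one-line-per-issue summary that can be pasted into the meeting notes draft.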
OK, so, to do. Let's start the topics. Next topic: infra.ci cron triggers, fixed. At some point about two weeks ago, one of the plugins in the plugin suite changed the behavior of a pipeline when the pipeline defines a collection of triggers and also calls a library that redefines the triggers. Before, both were merged; now the second overrides the first. And one of our libraries, the one we use for updatecli, had flawed logic: if you provide the attribute with a cron expression, define the new cron trigger, otherwise an empty list. So what happened is that, from one moment to the next, the empty list was overriding the existing triggers instead of being merged with them, meaning we no longer kept the existing ones. That's what was disabling the triggers. Jesse Glick and also Daniel gave us a deep dive into why it's not a good idea, technically speaking, so now we understand what happened. And since a few days ago we have a short-term fix that allows the Kubernetes management pipelines to run as they did during the past year. So that part is fixed and we can operate again at full speed. Medium term, there is a pull request, merged one hour ago on the pipeline library, that takes care of the trigger handling, just in case we still have other jobs without the workaround. Long term, the proposal from Hervé is to migrate all the updatecli pipelines to their own multibranch projects, eventually using a marker file, and to define one pipeline with the same behavior for everyone, run once per day, or eventually once weekly, but that's all. That will allow us to simplify our pipelines; we would remove all the updatecli logic from them. So that was the issue; thanks to everyone involved in that part. Now we are able to continue working. Is there any question on that topic? No, thanks very much for your investigation. Sorry for the pain. Yeah, let's say the user experience with pipelines is another topic, outside our scope.
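To make the failure mode above concrete, here is a minimal sketch in Jenkins pipeline Groovy. The variable name is hypothetical, not the actual library code, but the shape of the bug matches what was described:

    // Hypothetical names, modeled on the behavior described above.
    def cronExpression = env.UPDATECLI_CRON  // may be null when no trigger is wanted

    // Buggy logic: properties() is always called, so when no cron expression
    // is given the job receives an EMPTY trigger list. Since the plugin
    // change, that empty list replaces the triggers declared elsewhere
    // instead of being merged with them, silently disabling them.
    properties([
        pipelineTriggers(cronExpression ? [cron(cronExpression)] : [])
    ])

    // Short-term fix: only touch the job properties when there is actually a
    // trigger to declare, leaving existing triggers alone otherwise.
    if (cronExpression) {
        properties([pipelineTriggers([cron(cronExpression)])])
    }

    // Long-term direction (sketch): dedicated multibranch projects whose
    // Jenkinsfile declares a single trigger, e.g. triggers { cron('@daily') }
    // in declarative syntax, with no updatecli trigger logic in libraries.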
Azure portal management. With the changes we did last week enforcing MFA for everyone on Azure, the team pointed out that there is a one-click button, a setting somewhere, that we could try next week, now that we know everyone is able to log in. But the thing is, we have a specific policy that enforces certain security settings, and we have to disable that policy before using the one-click button. That would be a good change, a no-brainer, and we wouldn't have to think about new incoming users in the future. So I propose to take it before the end of the week, unless someone else wants it. Is there anyone else interested? OK.

NGINX Ingress and cert-manager upgrades: done this morning. There were a lot of breaking changes in cert-manager, and the goal was to get them into production. Everything went fine in that area of Kubernetes. I updated status.jenkins.io this morning, one hour before starting the operation, and I've updated it again now. I don't have anything else; it's a regular operation and it worked well. Any crucial things in the cert-manager update that you need to share with us, any surprises while you made the upgrade? Based on the changelog, there were some changes in the way cert-manager handles HTTP-01 challenges, but we are not using HTTP-01 challenges with cert-manager on Kubernetes. I thought that one could be sensitive, but it looks like it's OK. We might have to monitor the certificate renewals in the upcoming weeks, but all the services on Kubernetes are monitored, so we should get alerts; there should be no bad surprise.
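For a quick manual check of those renewals, a small Groovy sketch that prints the expiry date of the certificate actually served; the host names are examples, to be replaced with the services behind the ingress:

    import javax.net.ssl.HttpsURLConnection
    import java.security.cert.X509Certificate

    // Example hosts; point this at the ingress-served services to verify
    // that cert-manager renewed their certificates after the upgrade.
    ['www.jenkins.io', 'plugins.jenkins.io'].each { host ->
        def conn = (HttpsURLConnection) new URL("https://${host}/").openConnection()
        conn.connect()
        def leaf = conn.serverCertificates.find { it instanceof X509Certificate } as X509Certificate
        println "${host}: certificate valid until ${leaf.notAfter}"
        conn.disconnect()
    }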
Not everything was smooth, though, and this one was on me: a Grafana pod was stuck for three days, and by checking, we found a credential that was not expired but had been deleted from Azure. The worker nodes on publick8s weren't able to mount volumes, because they no longer had the service principal to authenticate against the Azure API. I was a bit too aggressive when cleaning up dead service principals last week, and that one got caught in the sweep. So we have to be careful: it showed at least two years without any activity while it was actively used, I don't know why, so we will have to check next time. We have recreated one, there is a full procedure written down, and we added the expiration of that credential to the calendar, because the current one was due to expire in two weeks according to the informal note I had, and there was nothing on the calendar. So at least we should not be bitten by that next year. We put the details in writing, and Hervé was present during the operation, so I'm not a single point of failure. That should be good enough; I'm not the only one having that knowledge for the future. Thanks for the support on that one.

What about infra-reports, Hervé? So, the infra-reports repository contains four scripts reporting different things, and one of these reports is about the jenkinsci repository collaborators and their write permissions on the repositories. But since December 2021, the GitHub bot user running this script has had its permissions downgraded, so it couldn't access every repository and the report was incomplete. It was noted by Raoul in a helpdesk issue. To resolve it, I had to update the script to generate an access token from a GitHub App instead of a user, using the GitHub App private key and app identifier. So we added a GitHub App and asked for its installation on the jenkinsci organization, with metadata as the only permission needed. The report is now complete. I had to implement a mechanism to regenerate the token every once in a while, since a GitHub App access token expires after one hour and the script takes more than one hour, around two and a half hours. I was in the process of creating a shared pipeline step to obtain this kind of access token, but Tim mentioned there's a way to do it that is already integrated in the GitHub Branch Source plugin. So, good news: we will be able to replace the current mechanism in several places, like the report in this same repository, with that way of getting an access token from the GitHub App, and we will be able to decommission the GitHub bot user. That is awesome. We didn't know it already existed; it has been there since 2020. But that's really cool: the plugin takes care of the logic behind it. That's also an important topic, not only for us, to avoid bot users as credentials, but also when thinking about CD for the plugins: one of the reasons GitHub Actions is used for plugin CD is that this credential was problematic for the automated part in charge of creating the releases inside GitHub, because you need to use the GitHub API for that. But at least for us it's a very nice one. Thanks for the work on that part; a token renewed in real time, at least every hour, is still useful for such scripts.
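For reference, a minimal sketch of the mechanism Tim pointed at: the GitHub Branch Source plugin's GitHub App credentials can be bound like username/password credentials, and each binding hands out a fresh short-lived installation token. The credentials ID below is hypothetical:

    pipeline {
        agent any
        stages {
            stage('report') {
                steps {
                    // 'github-app-infra-reports' is a hypothetical credentials
                    // ID pointing at GitHub App credentials (app ID + private key).
                    withCredentials([usernamePassword(
                            credentialsId: 'github-app-infra-reports',
                            usernameVariable: 'GITHUB_APP',
                            passwordVariable: 'GITHUB_ACCESS_TOKEN')]) {
                        // The bound token is a short-lived installation token;
                        // a long-running script can re-enter the block later
                        // to obtain a fresh one before the old one expires.
                        sh 'curl -s -H "Authorization: token $GITHUB_ACCESS_TOKEN" https://api.github.com/rate_limit'
                    }
                }
            }
        }
    }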
About that, there are two remaining tasks on infra-reports. First, separating the pipeline that runs the four scripts at once into four different pipelines, along with their credentials, so each of these scripts and steps won't be able to see the others' credentials. That will also allow us to tune the cron period and the resources per script; we even talked about putting these tasks on their own Jenkins instance, but let's not go that far for now. Let's start fighting with Job DSL first on infra.ci, and then we'll see. Second, we also have to move this service from trusted.ci to infra.ci, which will let us tune things even more and free the trusted.ci resources, since these scripts take some time and resources. Thanks a lot for that one. Is there any question or point to add in that area? Nope.

What about the JFrog incident? So, we started to hit the infamous "no space left on device" on the JFrog-managed instance, on their SaaS system. Daniel has been fighting with it and was able to purge some cache, a bunch of gigabytes at least, which allowed the service to come back. The exchange with JFrog support was delayed, but we got useful information on where we can check the size of each of our repositories. For information, Daniel has started to apply some new rules, because one of the repositories inside that instance is used as a cache. As you know, some developers might complain that their local Maven build, clean or otherwise, doesn't work as expected; in that case, they have to fix their Maven configuration. The idea is that only certain artifact IDs and group IDs are cached inside the repository, only the ones from jenkins-ci, because there are binaries unrelated to the project, transitive dependencies of those developers, or worse, of other projects they own on the same machine, that consume a lot of space. Another note for us on the infrastructure side: when artifacts are deleted on their service, there is a garbage-collection system that runs every two or four hours, I don't remember, and it has to run at least six times before the full garbage collection. The light garbage collection every two hours is a procedure inside the web service, and the full garbage collection is a snapshot cleanup at the file-system level on their side. So we have some cleanup to do. Thanks, Daniel, for the work you did. It was also an opportunity to add Kevin to the list of persons able to merge on status.jenkins.io; thanks, Kevin, for also sending information to the users in that area.

We are now waiting on JFrog for two topics. First, the four of us, and also Kevin, Daniel, and Wadeck, should now be allowed to open issues by email with JFrog support, not only Mark and me. They should be recognized: Kevin sent them an email and wasn't recognized, so support couldn't identify which instance or what the DNS is, and couldn't follow their whole standard process. Second, we are waiting for them to check what the file-system limit is and whether they can increase it, at least adding one more terabyte; we are at 5.6, so adding one should be OK, given the cleanup procedure we are also running on our side. Since Monday, deployments and releases are OK again, but as Jesse Glick noted yesterday, the service was slow, and we didn't hear anything from JFrog in that area. It could be the file system: when a file system is more than 80% full, most of the time the underlying block device gets really slow. But that's also another coin in the machine of "we might want a caching proxy on our own infrastructure", reusing what was done two or three years ago, because that kind of issue would then be completely hidden: we would use our cache instead. Thanks, Jesse, for noticing. Some comments and information have been added on the associated helpdesk issue. Are there any questions, anything unclear, or something to add on that topic? So, nothing is expected from us, except trying to find the time to restart the local proxy cache project and watching for JFrog's answers.

What about the Fastly purge requests? If I understand correctly, and stop me if I'm saying something wrong: there has been a request from the security team around the fact that anyone was able to purge the Fastly cache publicly, with the right curl command, without any authentication. That's a feature of Fastly, enabled by default. The original security worry was that anyone could purge the cache for jenkins.io or the plugin site, but it's not the entire website that gets purged, only the individual URL. So it's not as worrying as it might have been; we didn't get a definitive answer from the security team, and I'm not sure we should restrict the individual URL purge. Are you OK with asking one more time on the private security issue, to both Wadeck and Daniel? Daniel is the requester and Wadeck is the security officer. Just give them that context again and tell them: we want to rework this because it's causing, let's say, operability issues for the team, for Kevin, you, me, Stéphane. Ask their advice, and then, depending on their answer, we either keep the current restricted process, sadly, or we roll back and everyone is happy. Sounds good to you? Thanks. Are there any questions or things to add on that topic? Nope.
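For context, the Fastly behavior under discussion is a plain HTTP PURGE on the URL itself, which by default needs no authentication and only invalidates that single cached object. A minimal Groovy sketch, with an example URL:

    import java.net.http.HttpClient
    import java.net.http.HttpRequest
    import java.net.http.HttpResponse

    // Example URL only. A PURGE invalidates this one cached object, not the
    // whole site, and by default Fastly accepts it without authentication,
    // which is the behavior the security team asked about.
    def request = HttpRequest.newBuilder(URI.create('https://www.jenkins.io/index.html'))
            .method('PURGE', HttpRequest.BodyPublishers.noBody())
            .build()
    def response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    println "${response.statusCode()} ${response.body()}"  // 200 with a purge id on success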
Thanks, Hervé, for the implementation of the issue-similarity action. If I understand correctly, that's a GitHub Action that runs when someone opens a helpdesk issue and points the user to similar issues, based on a text-search algorithm. The idea is to try it; it's not definitive, as always, but we hope it can be useful for a user to find topics that are the same, with the same answer.

Stéphane, your turn: could you share the status of your main tasks? I remember you opened an issue on updatecli. I'm sorry, it's written in the document, but much further down; I mixed it up with the presentation order. Yes: I tried to change updatecli a little so it's more useful for us, to handle a matching pattern with regexps, and to use that in our updating process, for checking whether some security groups have changed, for example. I submitted the pull request, and I also submitted a pull request for the documentation. I may still have to change some details, like the default values of the optional parameters, but I think it has been merged. OK, so we have to wait for... I think Olivier did merge it, so we wait for a release. So you are the designated volunteer to track updatecli: once it's released, update updatecli in our Docker images so you can use it. Can you remind us what the high-level task is, what are we doing this for? The main task is to fetch the security groups from AWS and match them in Puppet, I think. OK. That's the first one; I have two issues that need that particular behavior from updatecli. The second one depends on something Olivier did, if I remember correctly, probably around DigitalOcean or something; I'm sorry, I didn't refresh my memory on the main goal of that one. OK. And right now I'm working on a garbage collector for Azure, but it's brand new, it only started today.

OK, Mark has to leave, I assume. Oh, I hadn't seen that. Yeah, we are a few minutes past the end of the meeting, so I'm going to stop recording, and the delayed issue will be delayed once more. Sorry. Let me stop the recording. Et voilà. Did you cut the paragraph on the issue? Still recording. OK, guys, I'll need help with the notes, because... sorry, sorry, still recording.