Hello everyone, welcome to the Jenkins Infrastructure team meeting. Today is the 20th of September 2022. Around this virtual table we have myself, Damien Duportal, Hervé Le Meur, who just arrived, Mark Waite, Stéphane Merle, Bruno Verachten and Kevin Martens. Six people, I'm so happy. First, many thanks Hervé for taking care of generating these notes, handling that logistics and improving it from one meeting to the next. That's really great and really useful. I still need to publish the previous recording. Let's get started with the announcements. First of all, the new weekly has been published. Checklist almost done, thanks Mark. It should be available in a few minutes. I've manually started the trusted.ci job. Yep, thanks Mark. Sorry, I made a bit of noise on the IRC channel because I saw some test error messages in the build logs. I thought they were stopping the release, but they were not. It has been released correctly, but that means we have failing tests. I'm not sure if it's a flaky test or a real issue, but it's better to mention it. We'll see if infra.ci is down over the next days. Do you have other elements related to today's weekly? No, thanks very much for checking. I'm glad you checked, because that probably should have failed the build. I think it's a Surefire bug. I think it's a Maven bug that it didn't fail the build, and I think that Maven bug is an interesting corner case that needs more exploration. Thanks Mark for the explanation. So we'll see. Second announcement: DevOps World is next week, which means next week's team meeting is cancelled. It has been removed from the calendar, so don't expect a meeting. Mark, do we have a weekly release next week? Yes, I think we should. The machinery will run and we'll watch it. I would not want to spend the effort and the energy to cancel it. It's just too much work. Makes sense. Thanks a lot. Next LTS, yes, the 5th of October 2022. Mark, your calendar.
Some of us will be jet-lagged. Don't forget that it will be on the Wednesday. So we have two weeklies before the next LTS, if that helps you keep it in mind. Stéphane, are you still okay to keep the challenge of taking care of at least the Docker images? When we have a new weekly, you have to merge them and ensure they're deployed. That will also be the case for the Kubernetes part of the next LTS in two weeks, and the rest of the team will take care of the pet part. Does that sound good? So we split the effort and the burden. Is that okay for you? Perfect for me. Cool. Just asking because it might be a boring task, so I don't mind taking it or having someone else do it. Cool. Thanks, Stéphane. Security release. The last news I had is that tomorrow's security release was cancelled. It was removed. I'm not sure, is it still planned? Daniel sent an email to the public security advisories list saying that there will be a plugin security update tomorrow, and that the plugins affected by the security release are installed in less than 1% of Jenkins installations in the world. So it's relatively low impact. I did want to spend a minute on this, because we made a mistake two weeks ago when we mentioned that tomorrow there would be a security advisory. We knew that two weeks ago, but we're not supposed to disclose it to the world until the Jenkins security team says so. So just a reminder to all of us: as we assemble the agendas, sometimes we have information that we have to filter before we put it into the agenda. Sorry. It's no big deal. It's just a reminder to all of us that security communications come from the security team, including the dates of things.
Now, the date of something is actually a relatively low-risk piece of information, but in a conversation with them Daniel noted that it can sometimes be disruptive if people expect a security advisory and then it doesn't happen. In terms of actionable, I'd like to be sure that we don't rely on ourselves, because I don't trust myself in that kind of area. I'm not reading the advisory mailing list, so could you share the link to it? If it's public, we have a mailing list that we can all check. That will be the source of truth for our next meetings, and we don't have to worry about filtering ourselves or not. Yes, that's a good point. I'll put the link to the advisories page and the mailing list into the notes here. Good suggestion, I'll do that. Upcoming security releases: let's use the public email as source of truth, so it's good for everyone. And even if you're not subscribed to the list, it's a Google group and therefore it's visible by just opening a web page. So I'll put that into the notes. Cool. Okay. I think that's all for the upcoming calendar. Just a note: since we are cancelling next week's meeting, the upcoming milestone will span two weeks and not only one. However, I challenge you, in terms of project management, to treat that milestone as if it were spanning only one week, because most of the team will be travelling, which reduces the amount of time we can spend on running tasks. I don't want to overload poor Stéphane, who is not going to DevOps World next week, because he will have to handle the infrastructure alone. So please, folks, let's plan this milestone as if it were only one week. Thank you. Okay. So if it's okay for you, I propose that we get started on what has been done. Welcome Kevin. I'm taking the issues as they are in the notes there.
The sorting might be different from what you see on the right of my screen, so I'm taking the list on the left. So welcome Kevin to the copy editors GitHub organization group. Thanks for merging your first pull request on the jenkins.io website, thanks Mark for mentoring him, and thanks for validating these permissions. Thank you to everyone for helping me get to this point. I appreciate all the help I've had so far, and it's nice to be part of the team. Cool. Happy to have you. Next point was a request for access to the Jenkins project VPN. So we now have David Nosman. I'm sorry if I'm not pronouncing it correctly, please correct me so I can make an effort. Nosbaum or Nosman? I need to ask him. The first one, Nosbaum. The B is sounded. Cool. Okay. Welcome. He's working on the security team and now has full access to the cert.ci Jenkins controller used by the security team. That was a good opportunity to clean up the updatecli process and update dependencies for our Docker VPN system. And the documentation that Stéphane and Hervé contributed to improving over the past months is working very well, because he's the seventh person to get access without any issue. Next ones: ci.jenkins.io container agents are missing ssh-agent with the new all-in-one image, and the second one, new agents erroneously running as root. Let me mark these two bugs as a victory, thanks first to the work of Stéphane, then Hervé, and then Stéphane again. We now have, for Linux, literally the same environment on every agent on ci.jenkins.io, whether they are virtual machines or containers. That means we build the same thing: we have the same tools, with the same versions, in the same locations, from the same definition. Obviously that sounds like magic, and we had some hiccups. Initially there were some unknown issues, because we are dealing with containers that don't have exactly the same package sets as virtual machines.
So we focused on our usual tools, JDK and Maven, and that was very well tested. So you did great work. But we were bitten by the fact that, for instance, the default Ubuntu image in the context of Kubernetes might not have the same default charset as the previous images or the virtual machines, resulting in some weird Unicode character encoding, or in the default user of the container sometimes being root. Those were just minor hiccups, except that it was frustrating for the developers. So sorry for the inconvenience, but there was no security issue and no outage. And now we have that all-in-one image. The next step will be Windows. Thanks a lot Stéphane for helping so much on fixing these issues, because there were a lot of unknowns there. Thanks, Basil, of course, for your celerity in reporting the issues and confirming they were fixed on your side as well. Nice teamwork, folks. I'm really happy with that outcome. You also proposed to add tests to the ATH to test the locale and... Absolutely, I think there is a new issue, but good point. The proposal that Stéphane is currently working on is this: we have a job on ci.jenkins.io, initially created by Mark, I think, or by Tyler, that runs regularly as a health check, an acceptance test. It tries to spawn different agents based on the labels that we provide to the developers. The goal is to improve what is tested: not only the ability to spawn these agents, but also some context of these agents. Is the default user still jenkins, for instance, could be one check. That jenkins cannot run a sudo command could be another. There is a range of elements that we want tested and that could easily be tested in that job. So that will be the outcome. For the three elements that have bitten us on the deployment of the all-in-one container, we need to make sure that the health check now covers them. So next time they will be tested by the health check, and our future selves won't be bitten by these issues.
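As a rough illustration of the extended health check described above, here is a minimal Python sketch. It is not the actual ci.jenkins.io job: the `AgentFacts` shape, the `check_agent` helper, the `maven-11` label and the expectation that the charset is UTF-8 are all assumptions made for the example. Only the checked properties (default user is jenkins, no sudo, ssh-agent present, sane character encoding) come from the discussion.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AgentFacts:
    """Facts a health-check job could collect on a freshly spawned agent (hypothetical shape)."""
    label: str            # agent label requested by the job
    user: str             # e.g. the output of `whoami` on the agent
    can_sudo: bool        # True if the default user is able to run sudo
    has_ssh_agent: bool   # True if an ssh-agent binary is available
    charset: str          # default character encoding reported by the agent


def check_agent(facts: AgentFacts) -> List[str]:
    """Return human-readable problems; an empty list means the agent passes."""
    problems = []
    if facts.user != "jenkins":
        problems.append(f"{facts.label}: default user is {facts.user!r}, expected 'jenkins'")
    if facts.can_sudo:
        problems.append(f"{facts.label}: default user can run sudo")
    if not facts.has_ssh_agent:
        problems.append(f"{facts.label}: ssh-agent is missing")
    # Assumption: UTF-8 is the expected default encoding on every agent.
    if facts.charset.upper().replace("-", "") != "UTF8":
        problems.append(f"{facts.label}: charset is {facts.charset!r}, expected UTF-8")
    return problems


if __name__ == "__main__":
    healthy = AgentFacts("maven-11", "jenkins", False, True, "UTF-8")
    broken = AgentFacts("maven-11", "root", True, False, "ANSI_X3.4-1968")
    print(check_agent(healthy))
    for problem in check_agent(broken):
        print(problem)
```

Keeping the checks in a pure function like this means the same rules could be applied to virtual machine and container agents alike, whatever mechanism collects the facts.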
But Hervé, that's a good improvement, and a way to improve ourselves instead of dwelling on the issue. Do we have a question on the all-in-one? Next subject: cannot check out the Jenkins core Git repository on a new Linux container agent. My bad, that's the third issue. So fixed as well, same topic. Add a user to the Keeper Secrets Manager plugin developer team. I think Tim Jacomb took care of that. That was, let's say, the usual repository permission for plugin contributors. Not really. Thanks Tim for that. Did I forget some tasks that we definitely closed? Okay, let's go on to the work in progress. For each of these issues, we give a status and we challenge whether we keep it in the upcoming milestone or put it in the backlog. The repo.jenkins-ci.org one. We will discuss that one at the contributor summit. The status is that I've started the draft of a JEP, a Jenkins Enhancement Proposal, defining the strategy we want to adopt in order to bring the bandwidth usage of repo.jenkins-ci.org under 10 terabytes, so that JFrog continues to sponsor us and the project is kept sustainable. The next step is that today I have to send an email to the developer mailing list to point out this issue and start the discussion, which should happen during the contributor summit in Orlando next week. The goal is to compare the pros and cons of each solution and gather the opinions of every major contributor involved, especially the ones with legacy knowledge. And then we will have to take a decision. Again, the core of this is writing down what we use that repository for. That's the main outcome of that issue. Of course it's our top priority, and it has to be kept for the next milestone. Do you have any questions on that specific issue? Next one: update the LTS issue filter search query. That one has been raised by Alex. I don't see any actionable here for the Jenkins infra team. There is no action on our side as far as I understand. Am I correct, Mark?
Okay, so the topic was... I've got to read a little more carefully. This one was... Oh yes, nothing for the infra team. This is really a release lead improvement. Perfect. We now have a milestone for that. Oh, we do? Okay. Yep. Hervé, can I let you explain, because that's your work? Instead of leaving issues that we can't really act on without any milestone, so without tracking of any sort, I've created a new milestone to indicate it's an issue we can't act on directly. So anyone can see that the issue has been put in a good spot. Exactly. The goal is to differentiate them from issues that are rotting on the issue list, still in triage or in a dead state. As soon as they are there, we can scan them regularly and ping, close the issues, or ask people to act on them. So that's a way for us to limit the number of rotting issues. We can see the description. I'm not sure it's complete. That sounds great to me. Is it also your intent that when something comes in to the help desk that is spam, we assign it to this milestone and mark it as not planned? Or do we just mark it as not planned without assigning it? I think we don't have to put it in any milestone, just close it as not planned. Okay, thank you. Absolutely. Thanks a lot Hervé for that. So, if it's okay with you, I'm moving this one from the current milestone to the non-actionable one because, as you confirmed, Mark, it's really a release discussion. It's important to have it on the help desk, since it's a centralized place, but there is no action for the team. Looks good. Okay. Next topic: mirror stats report wrong results. That one is still in triage. I've added it to the current milestone. I'm not sure this is still a legit issue, in the sense that the mirror stats page shows the mirror stats. I'm asking the question out loud because the file mentioned by Alex is served by multiple repositories.
So there is no issue with the mirroring service. What Alex is mentioning is that when you look at the mirror stats, we see some mirrors that haven't been updated. The thing is that these mirrors were used at some point, so they are in the history. So I'm not sure of the actionable. Should we spend some time scraping the current configuration, see whether these mirrors have been updated, and remove or disable them if not? I'm not really sure; my knowledge of the mirroring system is not good enough there. But at first sight I don't see that as an issue, right? For me there's an improvement to be had here. Serverion.com is online again. It had problems, but we have not brought it back online yet as an official mirror. So they are mirroring our content, and Aliyun.com is another mirror that is online, but we're not using them to offload bandwidth demands. Both of those would benefit us, and I believe we have one in Singapore as well. But for me those are all part of a "when we get to it" item: improve the mirroring by adding that Singapore mirror, adding back the Serverion mirror and adding the Aliyun mirror. They're all part of the same story. This page is, to your point, Damien, correctly showing that Serverion is not acting as a Jenkins mirror right now. get.jenkins.io will not offer files from Serverion no matter how close you are to them geographically. So if they appear here, it means they have been added to the database of our own service. I don't remember exactly what the number here is. Is it the mirror reporting to get.jenkins.io, is it our system that should update it, or is it because they are disabled? I think the reason they show a zero value for "since" is that they're not in the list of current mirrors, but they have existed there at some time, so there is data for them.
Okay, so then we have an actionable on that issue: we have to check the status of those mirrors, whether they are enabled or not, check whether each one is up to date, and enable it or not accordingly. Does that look correct? Yes, that seems correct to me. Jenkins infra team needs to check the status of the following mirrors. Okay. Do you think it's a topic that we should keep in the upcoming milestone? For me, no, I don't think it's that high a priority. I think we've got other things that are much higher priority. Mirror maintenance is far down on my list anyway. We have multiple mirrors already working well, so I propose that we remove the milestone from that issue. Does that sound good to you? So I'm removing the triage label, because we have triaged it now, and I'm clearing the milestone. The next one is: introduce an artifact caching proxy for ci.jenkins.io. That's also a top-priority subject. It's the work that Hervé is doing on having a caching system to decrease the amount of data we download from JFrog. So, quick status. Yes, I have an artifact caching proxy running on Azure, on the public cluster. I've created a new DigitalOcean cluster to host the proxy too, so it's separated from the cluster dedicated to ci.jenkins.io agents, as a security precaution, since we can't be sure that agents won't do malicious things. And I'm currently looking at creating a public EKS cluster on AWS to host the proxy, same use case, with a dedicated cluster. When this new AWS cluster is up, I will create pull requests on a dozen plugins that Mark gave me, where we will be able to test the proxy, as the functionality is behind an opt-in parameter of the buildPlugin shared pipeline library function. When the tests are executed, and if they are successful, we will enable this parameter by default so every plugin uses the proxy if available. Nice. Are there any questions there? Cool.
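The opt-in behaviour described above (use the proxy hosted with the agent's cloud provider when the build opted in and the proxy is available, otherwise fall back to the upstream repository) can be sketched roughly as follows. This is only an illustration in Python, not the shared pipeline library code: the proxy hostnames, the `use_proxy` flag name and the `resolve_repository` helper are hypothetical.

```python
from typing import Dict, Optional, Set

# Upstream repository that the proxies are meant to relieve.
UPSTREAM = "https://repo.jenkins-ci.org/public/"

# Hypothetical proxy endpoints, one per cloud provider hosting agents.
PROXIES: Dict[str, str] = {
    "azure": "https://proxy.azure.example.org/public/",
    "digitalocean": "https://proxy.do.example.org/public/",
    "aws": "https://proxy.aws.example.org/public/",
}


def resolve_repository(provider: Optional[str], use_proxy: bool,
                       healthy: Set[str]) -> str:
    """Pick the repository URL a build should use.

    provider:  cloud provider the agent runs on, or None if unknown
    use_proxy: the opt-in flag (off by default during the test phase)
    healthy:   providers whose proxy is currently reachable
    """
    if use_proxy and provider in PROXIES and provider in healthy:
        return PROXIES[provider]
    # Not opted in, unknown provider, or proxy unavailable: use upstream.
    return UPSTREAM


if __name__ == "__main__":
    up = {"azure", "digitalocean"}
    print(resolve_repository("azure", True, up))
    print(resolve_repository("aws", True, up))
    print(resolve_repository("azure", False, up))
```

Flipping the default of `use_proxy` later is then a one-line change, which matches the plan of testing on a handful of plugins first and enabling it for everyone afterwards.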
So, some elements that might be outcomes of the discussion at the contributor summit on that specific topic. The first one is that we might decide to mirror everything and not only repo.jenkins-ci.org. That already seems a sane thing to do; what do you think? repo.jenkins-ci.org today mirrors most of the usual external Maven repositories, so that would be a centralized point for us, because those downloads still go through repo.jenkins-ci.org. Yeah, that makes sense to me. I don't see how we would confidently distinguish between them. I think it's a lot simpler to just say yes, let's mirror anything that we would have requested from repo.jenkins-ci.org. And given how much it's mirroring, I think that's large chunks of the internet. Is it okay? Does it make sense for you? Yes. I'll probably have to increase the volume size. Right now I'm using a volume of 50. Okay, well, yep, let's check this. Increase the size. Yep. I don't know how much we would need. We will have to check it; it should grow linearly up to some kind of limit. Authentication. Is it authenticated? I don't remember. Does this proxy require authentication, client-side authentication? Okay, so in case we need to authenticate, or to enable authentication on repo.jenkins-ci.org, so the upstream, we might have to create technical accounts on the LDAP and add these credentials directly inside the nginx configuration, so nginx would authenticate with LDAP credentials. Is there any other question on that topic? Okay, we can move to the next one. So I've moved... Sorry, Damien, I was a little slow. On my Maven cache, which hasn't been cleaned in a long time, I'm only at 33 gigabytes. If that helps, on another system I'm only at 22 gigabytes, and those are two systems that are used quite actively. So in terms of total disk space used by that cache, I'll be surprised if we hit a terabyte. Yeah, I was thinking the same.
It's a number you've quoted recently, Damien, correct? Yep. The worst case is the whole of what JFrog stores, 3.5 terabytes. So we know the upper limit, we cannot go above 3.4. Not much. No, 10 terabytes per month shouldn't be... Certainly. Yeah, the distinction there is data storage: the three terabytes that Damien is referencing is the data storage that we use on JFrog. Which is not much. Right. Three terabytes. Yeah. Yeah, but we have three cloud providers, so three proxies and three volumes, one per provider, if we go this route. Yeah. Still cheaper than bandwidth. Okay. Can I move to the next topic, or do you have other questions? Publish the acceptance test harness Docker image on release. That one is almost done, so I'm going to move it to the next milestone. I'm only waiting for confirmation from the developers that once they create a new release everything works as intended. We have added two images already. And we had to contribute to the CD plugin process, because we had to annotate tags. Until that change, when you were releasing a plugin, it was publishing the GitHub release, which was creating a tag, but the tag was associated with the date of the commit it pointed to. That means it was not the day when you published the release and publicly created the new release; it was the day when the commit was merged. The consequence is that if Jenkins or any tool looked at that date and time, it was older than expected. By switching to an annotated tag, which with the git command only means adding the -a flag (it's a bit different in our case with GitHub Actions, but we were able to make it work), the goal is to add some content to the tag that proves the owner of the tag signed the release, and that adds a new timestamp for when the release is done. It looks to be working, but I will only close that issue once it's confirmed. Any questions on that?
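To make the tag date problem concrete, here is a small Python model of the difference between a lightweight tag and an annotated tag. It does not call git and is not the release tooling itself: the `Commit` and `Tag` classes, the tag name and the dates are invented for the illustration. The modelled behaviour is that a lightweight tag carries no date of its own, so tools fall back to the date of the commit it points to, whereas an annotated tag records when the tag was created.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Commit:
    sha: str
    committed_at: datetime


@dataclass(frozen=True)
class Tag:
    name: str
    target: Commit
    # None models a lightweight tag; a value models an annotated tag,
    # which stores its own creation timestamp.
    tagged_at: Optional[datetime] = None


def creation_date(tag: Tag) -> datetime:
    """Date a tool would see when asking when this tag was created."""
    if tag.tagged_at is not None:
        return tag.tagged_at
    # A lightweight tag has no date of its own: fall back to the commit.
    return tag.target.committed_at


if __name__ == "__main__":
    merged = datetime(2022, 9, 1, 10, 0, tzinfo=timezone.utc)     # hypothetical merge date
    released = datetime(2022, 9, 15, 14, 30, tzinfo=timezone.utc)  # hypothetical release date
    commit = Commit("abc1234", merged)

    lightweight = Tag("1.0.0", commit)
    annotated = Tag("1.0.0", commit, tagged_at=released)

    print(creation_date(lightweight))  # the merge date, older than the release
    print(creation_date(annotated))    # the release date
```

With the lightweight tag, anything sorting or filtering releases by date sees the merge date; with the annotated tag it sees the moment the release was cut.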
I haven't checked the last message, but we're waiting for the new release. Next topic: finish the cleanup of mirrorbrain. I'm working hard on this one. That's quite a sensitive topic. Where is it? I still have some work to do on this one. For context, on the pkg machine that hosts the update center, but also plugins, updates and some former artifact mirrors, the mirrorbrain user is also used by every element that we use for releasing: releasing plugins, releasing core, releasing subprojects and synchronizing everything. So right now I'm working on creating a dedicated technical user, whose role will be to be the main user owning all the docroots, with all the plugins and everything. Then Apache will be changed to be read-only on those, because that's not the case for some parts and there are some permission issues right now. And finally, that user will own all the scripts that run regularly from the crontab, which used to belong to the mirrorbrain user, so they have to change owner and permissions. That will be a step-by-step process, because the risk is breaking almost every release process that we have. So let's not go too fast on this one. Right now I'm testing it locally, and I will try a blue-green deployment: first the new user, covering as much as possible, then we correct issues when they occur, and then we can start removing the mirrorbrain user and related things. A bit more context: that machine hasn't been automatically managed or documented during the past four or five years, so it's only the collective memory that helps us remember what is on the machine. That's why it's a sensitive topic, but it's for the best. Next topic: we have missing Datadog metrics for a Kubernetes cluster. We don't understand that one. It's not major, so I will clear the milestone unless someone wants to work on it, because we have a good set of metrics on both Grafana and Datadog.
So no blockers there, it's really a minor inconvenience. Hosting the plugin health scoring application on infra. Hervé, can I let you give a status on what you did and what has to be done? I've integrated the job in infra.ci, so it's running on both. Anyone can see the build results, and the image can be produced. You and Adrien have finished the Helm chart, I think, or at least a first version. So now we have to create a PostgreSQL database in Azure from the repository, and deploy the chart, configured to use this database, on the public cluster. Okay. Okay. Otherwise, he's currently on holiday. I don't remember who was taking care of that, but the idea is to see if we can work on this before taking the plane this weekend, because it would be nice to show a first initial version of the application at DevOps World. They have an issue in the under confider. Yeah. Okay. We can discuss this later, but yeah, we can get a session running. And if there's an issue, it's perfectly okay if it isn't running. Don't be shy; this is not that hot a priority as far as I can tell. It would be nice to have, but it's only nice to have. May I ask you, Stéphane, to take the lead on the database part of this one? Do you feel like taking the lead? The goal will be to add a new database and a new user, like we did for updates and usage. It's managed with Terraform. Then Hervé and I could take over the Helm chart installation. Okay. So the three of us could interact there. Any other candidates? Oh no. Thanks Hervé for the summary. Nobody seems motivated. What do we have next? Publish the pipeline steps docs generated by the backend extension indexer. I feel like I will want to start working on this one tomorrow, but not too much, so I propose we keep it in the milestone. The thing is that this one is required to build the Chinese Jenkins website, which hasn't been updated since the beginning of the month. That's why it's slightly important.
If it's okay for you, I'm adding it to the two-week milestone. I will try my best. It might be purely asynchronous work, but it's a minor one, so that should be okay. If I see it jump from minor to major, then I will remove it from the milestone. Does that sound good to you? That won't be my top priority. Next one: containerized Java 17 Windows agents. That one will be on pause. Now we have the all-in-one, the great job that we mentioned earlier. So the next step for this one is being able to build Windows containers with Packer. There is currently a work-in-progress pull request, for anyone who wants to try. It's low-level PowerShell things, just calling the correct PowerShell version in the container, but we're almost there. So I'm going to remove the milestone for this one, because it will be important, but after DevOps World. Next: updatecli, use separate pipelines and organization scanning for all updatecli processes. So thanks for that work, we have an initial implementation that looks really good for Kubernetes management. Yes, but for now we have to update our system to take into account the organization folder kind of job, so we can define them as code. So, I think we have an issue. Yeah, yes, organization scanning. We have a chart that transforms YAML definitions into Groovy Job DSL with our default job settings on infra.ci. So we need to implement the job kind for GitHub organizations. Then we only have to put a Jenkinsfile_updatecli pipeline in each repository, which will be picked up only for updatecli. We also have a slightly minor bug: since we have two multibranch jobs pointing to the same repo, some of the GitHub checks have the same ID by default. So sometimes a check redirects to one job or the other, generally to the last job that finished the check. We are not sure how to change that, so we need help on that. New issue: the two jobs have the same check status, how to change that.
Did I forget something on that updatecli topic? Okay. So I propose to remove the milestone and put it back in infra-team-next, meaning we will work on it later, after DevOps World. One, two, three, let's go then, if no one objects. Okay, I think we have covered everything there. I need to move this one: collect Datadog metrics for ephemeral VMs. I don't know why it wasn't there. I just jumped over it, I forgot it. Sorry. Stéphane, can you give us a status on this one? Almost there, only the Windows part is missing. I think it's... Yeah. It's okay. Except that, okay. And the public dashboard, right? Can you explain that? The goal is to provide end users with a dashboard with metrics about the agents. So if they have a build of their plugin, core or another contribution on ci.jenkins.io, they have free access to the machine metrics to correlate with what happened: did the build use that amount of memory, did it crash because of an OOM, et cetera. So we have to give them a dashboard by default, and open a Datadog dashboard to the public or core. Yes. Absolutely. So the first step is to have a dashboard that will be private by default, and add the correct filters. Once we have validated the dashboard as a team, we can open it to developers and announce it on the mailing list, so developers can use that additional tool to observe the state of the agents in their pipelines. Right. And I think that was in the initial definition. Yes: make the Datadog data available for developers. There is no implementation path there, so the proposal is a dashboard, because we used a dashboard for the ci.jenkins.io statistics with Olivier last year and that was working pretty well. And so, the Windows VMs. Great job, because there were a lot of tiny minor things that took a lot of time. So great job, Stéphane. Thanks for that. Thank you. I think we have covered every work in progress. Do you have new issues that we have to take into account?
So first, Stéphane, can you confirm that I can move that Datadog issue to the next milestone? Yes, yes, please. Okay. Now the milestone is infra-team-next. Did you have any new issues that we should work on in the upcoming two weeks? Nope. Okay. Do you have other new topics? We have one, Mark: the status of the elections. I confirm that I have access on community.jenkins.io, on Discourse, for this topic. David just removed himself from that area now that I've confirmed, so both Olivier and I should be the only owners of that area. That has been discussed as a topic this week at the contributor board. It's a topic that we should discuss during the contributor summit, which will be a good way to pave the way. So that means I will have to ask Olivier sometime this week if he can give me some notes or tips on the process in order to run this. Does that look good? I should mention it at the contributor summit, certainly. The timeline is a bit rushed, because we would like to invite people to submit their nominations during September, and we may need to extend that into mid-October. So certainly have the conversation with Olivier. Maybe it's good for you and me to sit down together and look at what the parts and pieces are: there's the announcement, et cetera, and we may be able to get those out even before the contributor summit, so that at the contributor summit we're just reinforcing that, hey, this process has started. So, start the process as soon as possible. Okay. A major topic during the contributor summit as well. Well, I'm not even sure it's a major topic. We'll briefly mention that elections are in process and that we're going to use the same methods we used last year. Okay. Any questions or blockers? Okay, so that's all for me. I don't know if you have one last word. First of all, let me stop the recording.