Hi everybody. Welcome to this new Jenkins infrastructure meeting. We are one week before the Jenkins contributor summit. On the agenda today we have a few interesting things.
The first thing that I want to mention is that Damien is working on upgrading the AKS cluster from version 1.18 to 1.19. We hesitated to upgrade to version 1.20, but because of the contributor summit happening next week at cdCon, we decided to take a safer approach. So we will just do one major upgrade at a time, so we don't introduce too much risk.
As far as I understand, there are two major changes. The first is around the Docker daemon, which is deprecated: dockershim has been removed and, specifically on AKS, containerd is used as the container engine instead of Docker. Then there are some changes related to Azure storage that make the upgrade almost mandatory for us, since we make heavy use of it; that is the second change that needs to be checked before the upgrade. But everything sounds good. That's awesome, because we don't have major releases coming, so this seems to be the right moment to do it.
That being said, any question regarding the AKS upgrade? I saw that you prepared the announcement on the status page, so once we merge it, we are fully ready to work on it. I'm still not sure how much time I can commit to that work yet, so I would prefer to wait until the very last moment before approving the maintenance. Is that okay for you?
I'm not sure I understand. Can you explain what you mean?
Yes. Once we merge this PR, the maintenance window will be announced on the status page. My point was: if we are already sure to do the upgrade on Thursday, that's fine, we can merge. Otherwise we can wait until, let's say, tomorrow to see if we want to do it on Friday.
What's your expectation? For me, opening the status page announcement means that we are ready. If there is a blocker, it will be specific to one service. So for me, I'm available and everything should go as planned, except maybe an outage on one specific pod; that's the risk there.
Okay, you know what, I'm going to validate that now, after your review.
And just one piece of information: I forgot to make an announcement on Discourse, since it's quite new, so I'll take care of doing that after the meeting, to be sure everything is good.
Yes, that sounds good. Now, Damien, that's scheduled for Thursday, 8am UTC. If you're going solo, are there things in that session which are sensitive? Or is that something where, for instance, Aditya could join, watching over your shoulder and talking with you while you do it?
If the time window is okay, I will be happy to do that. Whether it's required or not, that's always a good thing for that kind of work, either as a learning session, for sharing ideas, or at least to be sure that someone else has a second pair of eyes on the action. So yes, with pleasure.
Yeah, I don't know if Aditya is actually available, but 8am UTC is not that late in India Standard Time. If I remember correctly, it's not an unreasonable hour of the day.
Aditya, feel free to make comments. If you want to start later, that's fine as well. Usually it's pretty straightforward, because we just have to go to the Azure interface and select the version that we want to use, and that's it. The reason why it takes some time is that the upgrade happens one node at a time. It flags a node as in maintenance mode, so it removes one node at a time: it deploys a new node, moves the pods to the new node, and so on. That process can take some time, because we want to be sure that we don't break anything along the way.
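The node-by-node process described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the Node class, the node names and the pod names are hypothetical, and the code does not call Azure or Kubernetes. It only mirrors the order of the steps mentioned in the meeting.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical, simplified view of a cluster node.
    name: str
    version: str
    pods: list = field(default_factory=list)
    schedulable: bool = True


def rolling_upgrade(nodes, target_version):
    """Replace nodes one at a time so that pods always have a node to run on."""
    replaced = []
    for index, old in enumerate(list(nodes)):
        # 1. Flag the old node as under maintenance: nothing new lands on it.
        old.schedulable = False
        # 2. Deploy a replacement node running the target version.
        new = Node(name=f"{old.name}-new", version=target_version)
        # 3. Move the pods from the old node to the new one.
        new.pods, old.pods = old.pods, []
        # 4. Remove the old node from the pool by swapping in the new one.
        nodes[index] = new
        replaced.append(old.name)
    return replaced


# Hypothetical pool: two nodes on 1.18 with made-up pod names.
pool = [
    Node("node-0", "1.18", ["service-a"]),
    Node("node-1", "1.18", ["service-b", "service-c"]),
]
order = rolling_upgrade(pool, "1.19")
print(order, [(n.name, n.version, n.pods) for n in pool])
```

Because only one node is out of the pool at any moment, the total duration grows with the number of nodes, which matches the remark that the process takes some time.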
So, at least it's an option.
Great, thank you. So you would not consider that harmful, and there's nothing so sensitive in the session that we couldn't do this.
Exactly. Don't hesitate to contact me, Aditya. I will post the link of the call in IRC and in the notes associated with the AKS upgrade. So if you're interested in joining and are available, don't hesitate. And if the time does not fit your schedule and you want to do it one or two hours later, there's no problem; don't hesitate to mention it on the IRC channel.
Any other question regarding the AKS upgrade? Then it sounds like we can move forward.
Next, regarding the LFX Security topic. Several weeks ago, David came to this meeting to present what we could have. The next step was to install the GitHub app in the jenkins-infra organization, so I decided to install that GitHub app, but only allow access to five key repositories. The purpose here is to identify how we can use that tool to detect security issues. I enabled it for ircbot, plugin-site-api, plugin-site, jenkins-version and a fifth repository, so the goal is to analyze Java and React code, and also the container images, because that's what we wanted to cover.
I still don't have access, so I had to request access to the service. You have to go to security.lfx.linuxfoundation.org, and then you have to open a support ticket to get access to a specific dashboard. In my case, I don't have access yet: I have access to jenkinsci, but for jenkins-infra I have to double check. If people are interested in participating, let's say Damien or Mark, I think you can also request access and I would be really happy to support that request. Nothing more on this topic: I don't think I will have the time to work on this before the Jenkins contributor summit next week. Any question?
Thanks for enabling it.
Next is status.jenkins.io. That was mostly a small project and it's not really urgent, but I made several changes to the status page. The first one was to remove all the iframes, which means that the website now loads a lot faster than before. For instance, here you see the announcement that we generated ten seconds ago to announce the AKS upgrade, and you have access to the different services.
More importantly, I just finished some work that I started several months ago. The first part is the service link. You don't see it on my screen, but you have three buttons. Take get.jenkins.io: there you have a short description of what get.jenkins.io is, and you have the monitoring in an iframe, so the response time. The idea is to have such a page for every service, so if some people are interested in helping with this project, that's really easy to do.
You go to GitHub, to the jenkins-infra status repository; that's the status page. From there, there are two main directories: layouts, with the HTML templates, and content. If you go to content/services, you see three files, and you can add more, let's say for another jenkins.io service. The important thing is that you just have to reuse those parameters. For instance, you can provide a service URL, you can specify a service description, you can specify monitoring iframes with a title and an iframe, and you can provide some links. What I would like to do is to do that for every service that we manage.
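The parameters listed above (service URL, description, monitoring iframes with a title and an iframe, links) can be modelled as a small data structure to see which sections a service page still lacks. This is a minimal sketch, not the status page's real format: every class name, field name and URL below is hypothetical, and the real parameter names have to be taken from the existing files in the repository.

```python
from dataclasses import dataclass, field


@dataclass
class MonitoringIframe:
    # Hypothetical names for the "title" and "iframe" parameters.
    title: str
    iframe: str  # URL of the embedded graph


@dataclass
class ServicePage:
    # Hypothetical field names; check the real template for the exact keys.
    name: str
    url: str
    description: str
    monitoring: list = field(default_factory=list)
    links: dict = field(default_factory=dict)


def missing_sections(page):
    """Return the optional sections that a service page has not filled in yet."""
    missing = []
    if not page.monitoring:
        missing.append("monitoring")
    if not page.links:
        missing.append("links")
    return missing


# Placeholder values only; the graph URL is deliberately invalid.
page = ServicePage(
    name="get.jenkins.io",
    url="https://get.jenkins.io",
    description="Placeholder description of the service.",
    monitoring=[MonitoringIframe("Response time", "https://example.invalid/graph")],
)
print(missing_sections(page))
```

Running the same check over every managed service would give a simple to-do list for contributors who want to help complete the pages.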
What I'm envisioning is that inside the links I would provide links to, let's say, the code or the way we deploy the service, all the key repositories that could be useful to debug a specific issue.
The last element that changed on the status page is the monitoring section, and this is also something that I would like to improve. Sorry, I've been a bit quick. So you have a monitoring section here. You have a simple page, and you also have some links. This time, if you click on a link, let's say the service HTTP response time, you get a Datadog dashboard for every service, so you can have a clear idea of how the different services behave. If you think that we should add more Datadog dashboards, feel free to open a pull request explaining why you think we should provide that information, and I would be really glad to add it.
There are two additional dashboards which are useful. One tells you if the latest packages are available; right now it just tells us that the latest weekly release is available. Basically, when we do a new release and, let's say, we don't publish a Windows package, you would usually see it here. The other is for on-call notifications: it gives us the worst SLO for specific services over the last seven days, and you also see the notifications when there is an outage. Those were the changes that I brought to the status page. Any question?
Looks great. Thank you.
The next topic on the agenda is the AKS upgrade, but I think we already covered it. So, regarding the ACI configuration on ci.jenkins.io: Damien, do you want to give us a quick update on this one?
The ACI issues have been fixed. A few bugs were corrected by Tim, so thanks Tim for the help, because it was absolutely not my comfort zone.
That helped me to be sure there were no errors with the latest Azure plugins. We were able to upgrade all the plugins, which fixed all the issues that were caused by the initial bug and then the rollback. We took the opportunity to also upgrade to the latest EC2 plugin for the agents.
Then, since ci.jenkins.io was down, or was suffering from issues during that period, we tried to reboot the virtual machine. Someone had forgotten to put the defaults in the fstab. That was quite a funny issue to detect: the virtual machine was not able to reboot. That has been fixed, and it's now back under Puppet's control, so that kind of issue should not happen anymore, thanks to the Puppet agent being back and working again. We upgraded the certificate. And while we were at it, we upgraded everything on the machine: Jenkins core to the latest LTS, which had been released just before, all plugins, all packages, all the content of the operating system. So now that should be good.
Finally, we took the opportunity to upgrade all the agent virtual machines, EC2 and Azure, to the latest version built with Packer during the past months, so the operating systems are up to date. That allowed us to quickly deliver Maven 3.8.1, as requested by other contributors. So that was quite a nice cleanup of ci.jenkins.io. I hope it will improve the life of developers and their ability to iterate and update faster going forward. That's all for CI.
I can quickly say that trusted.ci works exactly the same: we did almost the same there once we were sure, after two days, that ci.jenkins.io worked. Everything has been applied in the same way: Puppet, certificate, plugin upgrades, etc., and the Azure virtual machine agents configuration. So everything should be okay, and all the temporary resources have been deleted to save some money on Azure. So we can get back to business on these services. Thank you.
Thanks. Mark, any question here?
Oh, sure. Great result, so thanks for doing the fixes. Awesome.
We also did two additional fixes on trusted.ci; I think that was yesterday. The first one: we discovered that the update center certificate had expired. It was set to expire three months after we created it. The reason we took a very short-lived certificate at that moment was that we had identified potential issues with the new certificates, and we wanted to be sure that it wouldn't have any effect on the process. It did not. So this time we generated a one-year certificate, so we have more time in front of us. We also had an issue with a certificate in a crawler job on trusted.ci, because we changed the name of the root certificate. That was an easy fix, but a really stupid issue.
The last topic that I want to bring here: I got a notification from Fastly that my credit card is expiring and that I have to change it. I have never been charged on my credit card, because everything is covered by the sponsorship. But I have to find a way to not have to put a credit card on the service. I'm not sure yet whether Fastly offers that option to open source projects, or whether I have to look at the Linux Foundation. If you have any suggestions, that's something I would like to solve pretty soon.
Usually that's pretty difficult when you run infrastructure the way we do. We have a lot of sponsors and a lot of different accounts, and most of the time, by default, they assume that you will put in a credit card. But because there are individual contributors behind those accounts, we don't always have a credit card that we want to put on the service. So yeah, that's a pretty common issue.
Do you have any questions, or any last topic that you want to bring to the meeting?
Yes, just a question. We discussed this in private, and I want to bring it to the team.
We should create a Google calendar, a bit like the SIG ones, for the infra team. Public or not public, I'm still not sure, but the goal of that calendar would be to record certificate renewals, or the deprecation of a component, or whatever tasks are time-bound, like for instance the update center certificate renewal yesterday. A lot of these tasks could be automated but are not, because we need help or maybe it's not possible. So the goal is to have a calendar that acts with alerts for everyone, to share that knowledge. It's not on someone's private calendar, and it's not in a document that doesn't trigger any reaction. We want something with events. So I suggest that we start with a shared Google calendar, like we do for the community, that we can subscribe to as team members, and then we can have this kind of alert one week before something is due.
Yes, as Damien mentioned, this is something that we discussed, I think earlier today or maybe yesterday, but it has been in my plans for a very long time. I think the certificate expiry issues really highlight that we need a calendar, because not everything can be monitored easily, and a lot of things could be caught by using a calendar. I don't think that calendar should be public, because we want to put important information there, such as which critical certificate expires when. But we should have at least enough people on the calendar so that several people can catch specific dates. I'm open to suggestions. It sounds like a Google calendar is the easiest way to proceed. Maybe Discourse provides a calendar that we could use; that is something I have to investigate.
Any other question or topic you want to bring here? I do have one last topic.
Because of cdCon and the contributor summit next week, I would like to cancel next week's infrastructure meeting. Any objection?
No objection from me, I think it's a good plan.
No objection either.
Awesome. Then thank you for your time.
I might have two points, since we are not out of time. Sorry, I added them at the end of the notes, in the action points, after the team calendar. The first one: I would like to bring up a job named GitHub reports, which I understand to be a regular task that retrieves statistics and puts them on the infrastructure. I propose, and I would do this together with you if that's okay, to migrate that job to infra.ci, because that job is creating a lot of locks on trusted.ci and, combined with the infrastructure issues, sometimes the build queue is quite high when something goes wrong. That GitHub reports job is really creating a lot of locks and issues. So, given the sensitivity of trusted.ci and the fact that infra.ci is way more stable, because we can interact with it in Kubernetes in easy ways, my proposal is to move it. I'm not sure whether I missed something when analyzing this, because I don't know the project, so any external ideas, feedback or comments are welcome. I plan to write an email to the jenkins-infra mailing list to ask that question, but I wanted to bring the subject to your attention during the meeting.
My last interaction on that job, I think, was with Daniel Beck. I just remember that.
Well, I think it's worth a conversation with him, because, if I remember, isn't that the one that runs for a very long time and gets multiple copies of itself running? We hope it completes some fraction of the time. It has all sorts of interesting behaviors, that job.
Yeah, this is something that we have to investigate with Daniel, also regarding the permissions.
Okay, I'm going to ask Daniel directly. Maybe the answer is no, or it can move to release.ci, or not move at all. The root problem I want to solve, in fact, is to avoid having so many builds in the build queue. I saw there was already some usage of the lock keyword. So in that case, maybe the answer is forcing it to only run one build at a time.
Yeah, and I think the conversation with Daniel is a really good idea, because, I apologize, but I don't recall why it's structured the way it is. It's just a surprising structure. It seems like what it's doing is wasteful, and I remember seeing it and thinking it was wasteful, but not having come to a conclusion about what to do about it. So thanks for detecting it and thinking about it more.
Well noted. I think for that one we should create a Jira ticket, because I'm not expecting you to work on it until next week. It would be better to create a Jira ticket so we move the discussion there and keep a trace.
Yes, the goal is to raise the discussion here; no action point until July on my side.
My only fear is that it can take several weeks before you implement that change, with the discussion happening and because of the other priorities.
The next one is starting to use Kubernetes agents on AKS for ci.jenkins.io.
Yeah, sooner is better. We still need ways to reduce the Azure costs. In the last months we were at 10k, slightly over 10k, so we definitely have to go below 10k. And ACI represents a major part of the cost of the Azure account.
We still need one or two weeks to see the effect of the ACI configuration fix on ci.jenkins.io. However, the proposal here is not to remove ACI but to start adding a limited capacity of Kubernetes pods that run on a dedicated AKS cluster. It's mono-tenant, it's not expected to be multi-tenant: only ci.jenkins.io is expected to run on that cluster.
That cluster has a static capacity in terms of resources. So the goal is to add that static capacity and see the impact on ACI costs. As always, the best solution will be a mix of both. ACI is really good, really performant, it's a really nice service, but it's expensive. So I think a mix of two container providers, like we do for the virtual machines with Azure VM and EC2, will be the solution. But the goal is to start checking that without breaking the usages.
While we talk about Amazon: the current process for the Amazon sponsorship is that we have to provide some cost estimation to continue the process. That's on my to-do list.
Okay, I need to extract metrics for you on that topic then.
So yeah, that's one thing. Another thing I would like to warn you about is that I'll have limited availability to work on the infrastructure over the next week. So please pay attention to what you change.
Thank you. We are now running out of time. So thanks for your time and see you on Libera.