Hi everybody, welcome to this new Jenkins infrastructure meeting. Before we start, just two announcements. The first one is that Damien and I will work on updating the ingress controller that we use on our Kubernetes cluster. This is a task that we delayed for too long. It involves deploying a new controller and redirecting the network traffic to the new controller while we take down the old one. Normally we should not have any downtime, but zero risk does not exist. So we will work on that on Thursday. The second announcement, which is next week... Sorry.

Sorry, Olivier, just to clarify: this is the ingress controller to the Kubernetes cluster, and therefore if it were to go down, anything that's hosted in Kubernetes could be affected. Is that right? So jenkins.io, for instance.

Yes, so every website. That's why we deploy a second one. The idea is: we deploy a second controller and redirect our traffic to it, we upgrade the first one, and then we redirect the traffic back to the first controller. Because those are just stateless applications, it shouldn't be an issue. We did that several times in the past, so it's not the first time that we do this operation. It's just that because you have to deploy a new controller, redirect the traffic, and then redirect it back, it can take some time, but otherwise it should be fine.

The second announcement is that next week we will have an LTS release. This is something we have to pay attention to: if we plan any work in the release environment, we should be careful. So that's it, basically: don't change too many things before the next LTS release. That's it for the announcements. Regarding the notes: as always, those notes are prepared several days in advance, so feel free to add any topic that you want to discuss to the notes.
That way we can come prepared for the meeting. The first topic I want to discuss briefly: I've been thinking in the past about how to split the role of the infrastructure officer and delegate parts of it, so the community could take more responsibility here. If you have any suggestions, I would be really happy if you could think about that. The way I think about it, we would have shadow infrastructure officers, each responsible for a specific area. The same person does not necessarily have to work on everything. So we could have, let's say, someone more interested in Kubernetes, someone more interested in Puppet, monitoring, CI, shared libraries, whatever the topic is. If you find a Git repository or an area that interests you, feel free to start reviewing PRs, and maybe we put you in the code owners. I would be really glad to find ways to delegate a little bit of responsibility here. Any questions?

I think the first step is this: if there is an area that interests you and you don't really know how to start contributing, I briefly mention that in one of the next topics, but we started documenting as much as we can. So if something is not clear, feel free to ask, and I'll try to find the right person who can put the documentation in the Git repositories. I'm coming to that point in a moment. So I'm going to switch to the documentation topic, so we stay here.

With Damien last week, we started discussing how we can improve the documentation. The challenge is that we have different kinds of information that we store in the jenkins-infra organization, and we have different ways to work with it.
First, like in this meeting, we have synchronous communication, where we want different people working on the same document at the same time. That's why we introduced HackMD. Everybody should be able to participate: take notes, add content, reorder, whatever. Otherwise, for specific documentation, it's better to just open a pull request on the repository jenkins-infra/documentation. Let me share it here. So this is the Git repository that I'm sharing. As you can see, we have a directory containing Kubernetes maintenance. Those documents are really there to collect information when we do maintenance on the Kubernetes cluster, and we want to add more documentation each time we have to do some maintenance. So we will probably add a document for the ingress controller upgrade that is coming on Thursday. We are taking notes for the meetings, and we want to add more service documentation. We are still building that Git repository, but feel free to request specific information.

Another element that was asked last week is what we do with information that we don't want to be public, like for instance runbooks. The idea here is to document a specific service, and inside that specific page we just put links to the runbooks. There are different reasons why we don't want our runbooks to be public: they involve either personal information, like "call that person if something goes wrong", or specific operations that we want to keep private. That's why we still have different kinds of documentation in the jenkins-infra organization. Again, feel free to participate in that project. Any other questions before I continue? No? Awesome.

The next topic is instabilities on infra.ci. Over the past few weeks we had quite a lot of issues with WebSockets.
Damien, do you want to continue on that specific topic?

So I've monitored the WebSocket issue since last week. I haven't seen any in the logs. I might have missed some, but I didn't see it happen again. I also checked the throttle status of the AKS API, the Kubernetes API that we use, and it has not been throttled since the incident last week. I'm not sure if it's correlated or not. So right now the reality is that we don't know, but it looks like it has been a bit more stable, even though we had quite a lot of builds on infra.ci.

I just want to mention, on the infra.ci topic, that there were a lot of issues. In particular, it was taking more than six or seven minutes to restart. There were issues due to the performance of the data volume on Azure. There were items that were taking a lot of time to load. There were a bunch of tiny elements that took a lot of time on restart, it was restarting jobs in a loop, sometimes failing, and there were also some jobs that had been running for five, six, even seven days. So we are iterating on the infra.ci configuration, step by step, to be sure it gets better. But right now we did not face the WebSocket issue anymore, which seems to go in the direction of our gut feeling when we said that upgrading Kubernetes and accelerating the jobs should be okay. So let's wait and see. But for now, no issue.

Thanks Damien. There is another point that we changed last week. We faced issues with the traffic on get.jenkins.io, which is a service running on the Kubernetes cluster. So I'm just wondering whether the fact that we now stopped redirecting part of the traffic there reduced the pressure on that cluster.

The pressure was not on the cluster itself, but on the requests made to the Kubernetes API, so the number of requests made by the client.

Okay, thanks for this. Next topic, which is also something that Damien has been working on.
It's about cleaning up the Datadog configuration. Just a bit of context here: the Jenkins project is sponsored by Datadog for monitoring a big part of the infrastructure. A while ago, we started using Terraform to configure the monitoring checks. The Terraform code was applied from ci.jenkins.io. What Damien did recently was first to update the Terraform dependency, so the idea was to move to version 0.13, and also to clean up a bit of legacy around that project. If you want to look at that specific repository, it's located in the jenkins-infra organization. And now that Terraform code is executed from infra.ci.

We used to have a staging environment, but we stopped using it. The reason is that for some of the synthetic checks that we have on the production account, we had to request a limit increase. So we sent a request to Datadog support to increase the limit on the number of synthetic checks that we can have. Because of that, we cannot use the Terraform code to deploy the staging environment anymore, because there we are limited to three synthetic checks or something like that. So the work done by Damien here was really to simplify. If you are interested in a specific dashboard, that's the right place to go. If you want to improve the monitoring of a specific service, that's also the right place to go.

Still on the Datadog topic, we have multiple components. We build our own Datadog Docker image, so we have some custom checks. That's how, for instance, we monitor whether we can still download the latest Jenkins packages from the service get.jenkins.io. We also use that monitoring to know whether the release process is stuck. Typically what happened in the past was that the release process used the Maven release plugin to publish the Jenkins WAR, and then for some reason we were not able to package the new version.
So we had to manually trigger the packaging job. Now, when the same situation happens again, we detect that specific issue. And I can see that you're wondering... do you have any questions on that topic? No? Okay, sounds perfect. So: custom checks for Datadog, and otherwise everything is maintained through Terraform.

So the goal is to ensure that we can build, correct, and maintain this as code. The first step was to be sure that the CI was fixed, so that we are able to update and do whatever actions. But the near goal, for end users and for us, is to improve the monitoring so that if we are paged on PagerDuty, it is because something is really broken. Right now it's quite verbose, so the risk is that we ignore a real, true-positive issue. And the goal is also to try to automate most of these things, to provide an automated status for end users at any moment, even if none of us is available. That's the near goal.

There is one specific thing that happened multiple times in the past: we updated the Terraform code, it was working, and then after several months it stopped working and we had to look at it again. I saw the same behavior for some of the Docker images. So we should probably put some monitoring in place: let's say, if some build is not passing on the master branch, someone should fix it.

Let's see for some of them. I was just surprised this afternoon to look at a bunch of failing jobs, even though we work on those.

That's a good point. That's a question we won't answer right now: where will this alert go? IRC channel, email, Matrix message, whatever notification. I think the question would be who will have to answer that question in order to implement this. But it's something to keep in mind.

Thanks, everybody. The next topic, which is... yeah, I don't think it's there.
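As a rough illustration of the release-process check discussed under the Datadog topic above (a release is published but never packaged), the core comparison could look like the sketch below. This is not the project's actual custom check: the function names are hypothetical, the version numbers in the usage note are purely illustrative, and how the two versions are fetched (Maven repository, package mirror) is deliberately left out.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '2.277.1' into (2, 277, 1).

    Comparing tuples of integers avoids the string-comparison trap
    where '2.99' would sort after '2.100'.
    """
    return tuple(int(part) for part in version.strip().split("."))


def packaging_is_stuck(released: str, packaged: str) -> bool:
    """Hypothetical check: True when the newest released version is
    newer than the newest version available as a package."""
    return parse_version(released) > parse_version(packaged)
```

For example, `packaging_is_stuck("2.277.1", "2.263.4")` reports a stuck release, while two identical versions do not. A monitoring check would then raise an alert only in the first case, which fits the stated goal of paging only when something is really broken.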
Thanks Damien for putting that as well. So, it sounds like you finished working on that. Do you want to share the state here?

We can start using Kubernetes pod templates for the agents on ci.jenkins.io. The underlying cluster has a static capacity right now: three nodes, which are medium-sized nodes. So for tiny steps, it's okay. The next step will be to start moving some workload, hitting the limits, and then improving the automatic scaling of the underlying cluster. That's the next step, but it has been successfully applied. There are some minor configuration changes that we will have to make in order to improve observability and so on. But overall it's working very well and it's really efficient, so I'm quite happy with this. All the jobs I worked on took two or three times less time, because they had been building on virtual machines. So the performance for these use cases was quite a bit better after moving. So the next step is real life.

I would be really interested to work on that with you as soon as possible, because we are investigating ways to reduce the cost of ci.jenkins.io. So if we can be more efficient with that service, that would be really nice. Okay, right. The last topic...

Sorry, yes, before that. I'm assuming for now that the experiment is explicitly only for Linux-based agents, no Windows-based agents, right? So we're not attempting the wild card of Windows-based agents on Kubernetes.

I don't think that's on the roadmap, and it hasn't been worked on yet. We have a working example on infra.ci, but it needs some work before we can be sure to deploy this on the project.

Great. Thanks.

The last topic that I want to highlight: we transferred a few Git repositories from Gareth's account to the Jenkins project, so feel free to look at them. The first one is uc. It's a small tool that you can use to bump plugin versions: you just provide a plugins.txt, and uc automatically updates it for a specific Jenkins version.
The second tool that you should definitely look at is jenkins-version. It's also a small binary, and you can use it to know what the latest version is, either the latest stable or the latest weekly. This is a tool that we are using. I'm not sure if we are using it specifically in the monitoring in Datadog, but we are definitely using it, including in the release process. So it's really there to tell you what the current stable version is, and it gives you that number. And we use jenkins-version and uc together. Let me share those repositories.

If you want to look at how it's used... I'm going to go here. You can see my screen? Yeah. It should be the docker-... let's say Jenkins LTS. So we have a specific Git repository for that, for stable and for the weekly. We provide the plugins that we want to install. So we removed the... you're right. So the SSH agent has been fixed now? Yes. We can also specify not to update a specific plugin. This is something that we still have to work on for the release environment. And then when you look at the Jenkinsfile... no, it's not here. So I guess it's a GitHub Actions workflow. Where is that? Yes, it's using a GitHub Action. So it won't be easy to show. But basically, if you go to uc or jenkins-version, let's take jenkins-version, you go to the latest release and then you can directly download the binary that you want to use, whatever distribution and architecture you are running on. That's pretty useful.

And finally, the third project is Captain Hook. Captain Hook is a webhook proxy, but the specificity here is that it collects every webhook that should be sent to Jenkins.
And if for some reason Jenkins is down, Captain Hook caches those requests, and when Jenkins is back on track, Captain Hook sends those requests to Jenkins. We now have some configuration to apply in order to use it on a jenkins-infra project. But because we restart the Jenkins instance multiple times per day, we want to be sure that we end up with every webhook request. So that's also another project that we are currently testing.

We covered all the topics. Is there any topic you want to briefly talk about here before we stop the meeting? Last call. So, I already put the link to the notes for next week's meeting. Feel free to add any topic you want to discuss there. Thanks for your time and goodbye. See you on IRC.
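The buffering behavior described for Captain Hook during the meeting (hold webhooks while Jenkins is down, replay them once it is back) can be illustrated with a minimal sketch. This is not Captain Hook's implementation or API: every class and method name below is hypothetical, the queue lives only in memory, and a fake target stands in for Jenkins so that the example runs without a network.

```python
from collections import deque
from typing import Callable, Deque, Dict, List

Webhook = Dict[str, str]


class BufferingWebhookProxy:
    """Hypothetical sketch: queue webhooks and forward them in arrival order."""

    def __init__(self, forward: Callable[[Webhook], bool]) -> None:
        # `forward` must return True when the target accepted the webhook.
        self._forward = forward
        self._pending: Deque[Webhook] = deque()

    def receive(self, hook: Webhook) -> None:
        """Accept a webhook, queue it, then try to deliver everything queued."""
        self._pending.append(hook)
        self.flush()

    def flush(self) -> int:
        """Deliver queued webhooks in order; stop at the first failure."""
        delivered = 0
        while self._pending:
            if not self._forward(self._pending[0]):
                break  # target still down: keep this hook and the rest queued
            self._pending.popleft()
            delivered += 1
        return delivered

    @property
    def pending(self) -> int:
        """Number of webhooks still waiting to be delivered."""
        return len(self._pending)


class FakeJenkins:
    """Stand-in for the real target (assumption, for illustration only)."""

    def __init__(self) -> None:
        self.up = True
        self.received: List[Webhook] = []

    def accept(self, hook: Webhook) -> bool:
        if not self.up:
            return False
        self.received.append(hook)
        return True
```

In use, one would build `BufferingWebhookProxy(jenkins.accept)`, call `receive` for each incoming webhook, and call `flush` periodically or when the target is known to be back. A real proxy would also need to persist the queue, so that a restart of the proxy itself does not lose events.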