Welcome everyone to the Jenkins weekly infrastructure meeting. Today is the 24th of May 2022. On the call we have Hervé Le Meur, Stéphane Merle, Damien Duportal, and Mark Waite.

Let's get started with the announcements. First of all, the weekly: Jenkins 2.349 has been released and the release checklist is complete. The image is okay on our side, and we are waiting for the shared library bug fixes before the final delivery. So no problem whatsoever. Do you have other announcements, Mark? No, none from me. I don't either, and no incident, so we can proceed to the notes.

About the tasks we closed since the past iteration: what we worked on is finished, delivered, and doesn't need any rework.

First of all, we are now using the Docker community engine on the Azure virtual machines for Windows containers. It wasn't an emergency, but thanks to Tim Jacomb we learned that, by default, Azure is going to stop updating the Docker engine on its Windows images, because its partnership with Mirantis is ending, and we were relying on the result of that partnership by default, which is Docker Enterprise Edition. We don't need those features, so our process now builds the images on Windows Server Core and installs the community edition. No version pinning: the same way we do on the Ubuntu images, we use the latest version (see the sketch below). We could pin a version, but we don't see a reason for doing that right now, and it works very well. That also had a side effect, not part of the team priority: we now have a first version of our Windows Server 2022 images. The next step will be to test those images next to the current Windows Server 2019 ones.
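For the notes, here is a minimal sketch of that "always latest" lookup, in Python rather than our actual provisioning code. It assumes the current layout of the download.docker.com static index; the real templates may do this differently.

    # Minimal sketch: find the newest Docker CE static build for Windows.
    # Assumes the index layout of download.docker.com; this is an
    # illustration, not the provisioning code we actually run.
    import re
    import urllib.request

    INDEX = "https://download.docker.com/win/static/stable/x86_64/"

    def latest_docker_ce_version() -> str:
        html = urllib.request.urlopen(INDEX).read().decode("utf-8")
        # Entries in the index look like docker-20.10.16.zip
        versions = re.findall(r"docker-(\d+\.\d+\.\d+)\.zip", html)
        # Compare numerically on (major, minor, patch), not lexically.
        return max(versions, key=lambda v: tuple(map(int, v.split("."))))

    version = latest_docker_ce_version()
    print(f"Installing docker-{version}.zip from {INDEX}")

The point of "latest" is exactly this: there is no pinned version to bump, at the cost of occasionally discovering a regression on the first rebuild after an upstream release.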
Second closed task: Windows virtual machine agents were slow or unavailable on ci.jenkins.io. There was an incident during the weekend that has been fixed. It was a pure Azure-side issue; I'm looking at the notes to remember. I tried to document the diagnostics and the troubleshooting process. There were no useful logs on the Jenkins side, except the failures to deploy the virtual machines, but on the Azure side there were useful errors: the disk size. That's a side effect of the previous task: the new Windows Server base image used by our templates needs 127 gigabytes, which is the default, and we were requesting 90 gigabytes. It only affected ci.jenkins.io; the fix had already been in place for months on trusted.ci, it seems to me. So just fixing that configuration item solved the problem. The issue wasn't blocking users: virtual machine agents were simply all scheduled on Amazon and none on Azure, instead of a mix of the two, so we still had working workloads for our end users. Thanks, Mark.

Next, we gave Vincent Latombe access to the VPN. Initially the goal was to allow him to trigger Remoting releases, but the Remoting releases turned out to be another subject, handled by you, Mark, as I understand it, so Vincent closed the issue prematurely. I then took it on myself to give him the VPN access anyway, because Vincent is someone who helps us a lot and we will need his help, for instance to work on the Kubernetes plugin with Basil. So the VPN access was more than needed in that case.

That's all for the completed tasks, because we have a lot of long-running tasks this time. Is there any question about these closed tasks, or can we go on to the work in progress? One, two, three... Okay.

First one: the permanent redirect for jenkinsistheway.io, the former domain for the Jenkins user stories. Stéphane, can you give us a heads-up? Yes: we added a Helm chart for the HTTP redirections, able to handle all the redirections we want, and we updated the Kubernetes configuration of the public cluster to handle that specific redirection for jenkinsistheway.io. The DNS now points, as a CNAME, to the domain of the public cluster so it can serve the redirect. So the redirection works as long as the request carries the right Host header (a quick verification sketch follows below). And we are waiting on Alyssa and Gavin: I'm not sure, because Alyssa told us to check with you, Mark, and I remember Gavin created the domain initially, so we are not sure who the owner of the domain is. I'll double-check with Alyssa; I believe she's actually the owner, but that's putting ownership of a domain into the hands of someone who isn't typically doing domain ownership, so let me talk with her. We'll work it out and be sure that we coordinate how to make that transition. If by any means we can transfer the ownership of that domain, or at least its DNS zone, to Azure, that would be easier, because then we can manage it ourselves. That's good; we would also prefer to move the ownership to Azure so that the infra team can maintain it; that is the right thing for us to do. I don't recall who it is that she purchased it from originally, but it should be feasible to transition it; I just don't know what all the steps are. Don't hesitate to ask if you need help in that area once we have access. So: almost closed.
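For the record, a minimal sketch of how such a permanent redirect can be verified from the outside, without following it. The domain is ours; the expected status check is illustrative, and the actual destination should be read from the Helm chart values rather than assumed here.

    # Minimal sketch: check that the old domain answers with a permanent
    # redirect. http.client does not follow redirects, so we see the raw
    # answer from the redirector.
    import http.client
    from urllib.parse import urlparse

    def permanent_redirect_target(url: str) -> tuple[int, str]:
        parsed = urlparse(url)
        conn = http.client.HTTPSConnection(parsed.netloc, timeout=10)
        conn.request("GET", parsed.path or "/")
        resp = conn.getresponse()
        location = resp.getheader("Location", "")
        conn.close()
        return resp.status, location

    status, location = permanent_redirect_target("https://jenkinsistheway.io/")
    assert status in (301, 308), f"not a permanent redirect: {status}"
    print(f"{status} -> {location}")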
Next task, almost closed as well: giving access to Basil. Since this is access to an internal system, there isn't a detailed status on the issue itself. I sent him the information and I'm waiting for his feedback. I haven't had any yet, but I only sent it yesterday or two days ago, so it will take some time; no expectation there. The plan is that we don't keep that issue in the iteration and we wait for Basil's feedback: if we get it, I will put the issue back in the next iteration; otherwise we let it rest until Basil is available.

Next, auto-notifying people based on the services in the routing rules. I assume we didn't have time? Right, I still have to work on it. It's still the same problem: I can't notify a team. The action writes the team name as a plain comment, so it's not interpreted by GitHub as a team mention. I may have to combine two GitHub Actions to get what I want. Okay; do you think you can work on it next week, or should we put it back in the backlog? I should have time. Okay, so we keep it for next week.

Mirrors: we had an incident last Friday, when we changed the CNAME for the mirrors to the new machine. The mirrors themselves successfully switched to get.jenkins.io; that worked, and we didn't hear of any issue in that area. However, there was a side effect: the updates domain was also pointed at the mirror, and the TTL of the DNS record was one hour. So for one hour we had broken updates, until the DNS went back to the original machine once fixed. Sorry for the impact on the people affected; we will have to take care next time. Otherwise it works very well: mirrorbits is now the only mirror system, it seems to work very well, and we didn't see any big peaks or changes.

So the next step for the upcoming iteration will be cleaning up the code and making sure no remaining request reaches the old virtual machine. Then we can remove PostgreSQL, MirrorBrain and the virtual host, and put the machine back under Puppet management once MirrorBrain is removed. That means the machine will go from three features to two, which is a good thing for us in terms of management. So still work in progress, and I'm going to work on it. No questions?

DigitalOcean: they got back to us, and I am going to set up a meeting with them. They are interested in continuing the partnership; they want to discuss feedback and evaluate the needs for the upcoming year. So we have homework: preparing ourselves by measuring the rate at which we have been burning credits since the beginning of the sponsorship if we keep the same size, and then different scenarios with more resources at different levels, so we have an idea of the impact (a rough back-of-the-envelope sketch follows below). Once that's ready, we set up the meeting and we can get started again. They are ready to help us in that area, but I personally assume they will ask us for another blog post or a case study or something else, which would be fair given that they give us money. Ideally, one scenario I would want as a baseline is: what if we stopped using EKS, stopped using the Kubernetes cluster on Amazon, and used only DigitalOcean? Then if the cost is too high, we can negotiate. That would be a safety net for us, because we wouldn't have to depend on AWS for that. Does that sound good to you? And if there are any other ideas of things we could use DigitalOcean for, don't hesitate to raise them. So: work in progress; we have to set up the meeting, plus minor administrative tasks and a bit of number crunching. Questions?
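Something like this is the level of preparation meant above: a tiny Python sketch projecting yearly burn under a few scenarios. Every figure in it is a made-up placeholder, not our real numbers.

    # Back-of-the-envelope sketch for the DigitalOcean discussion. All the
    # amounts below are hypothetical placeholders for illustration only.
    YEARLY_CREDITS = 12_000.0  # hypothetical sponsorship envelope, $/year

    scenarios = {
        "same size as today": 600.0,           # hypothetical $/month
        "same size + 25% growth": 750.0,
        "all Kubernetes moved off EKS": 1_400.0,
    }

    for name, monthly in scenarios.items():
        yearly = 12 * monthly
        share = yearly / YEARLY_CREDITS
        print(f"{name:30s} ${yearly:8,.0f}/year  ({share:6.1%} of credits)")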
Next, the integration between PagerDuty and Datadog: it seems to be working, as far as I know PagerDuty has received alerts from Datadog. So we will simply remove the @oncall mention, now that we've received the deprecation notice, and see if we still receive pages. If not, we'll add it back as needed, but it seems to be correct as it is. That integration was already done before, and we had both enabled: for each deprecated @oncall annotation there is a twin PagerDuty annotation which is not deprecated, and since the PagerDuty integration is already set up in Datadog, it sounds like it's working through that one. The worst case is that by removing @oncall we disable the feature and cannot enable it again because it's deprecated; or maybe it's deprecated but still running for us. That's the part we are not sure about, so we need to be sure. So that's one pull request away, and if everything goes according to plan, we can close that one in the next iteration.

Mirror in Singapore: we now have all the information required; now the runbook has to be written. We took the opportunity to meet with Olivier today and he shared some information. We have some information in different areas, and some is in code that we deleted a few months ago, so we just have to dig. The idea is simply to find the correct mirror URL to give them so they can rsync from it. I don't know, Mark, if you remember: it's one of the two OSUOSL ones, but I don't remember if it's the Chicago one or the New York one. I'm not sure which one either; maybe we can send an email to OSUOSL, though I'm not sure who the contact is, to ask them, and to check that we would not be overloading one of these mirrors, given the daily rsync already done by all the others. But shouldn't they be able to rsync from any mirror? I mean, these are truly intended to be mirrors of each other, so all we're choosing is latency: either of them is fine, or the one at archives.jenkins.io, or the one in Singapore, or not Singapore, the one in China. Any of them should be viable as an rsync source, shouldn't they? That's a good point. So we tell them to rsync from one of the mirrors closest to their location, from the list we have in mirrorbits; is my understanding correct? Well, distance does introduce latency, so we could just say: rsync from the first one selected for you (the sketch below shows that kind of "closest mirror" pick). And for me, either of the two at OSUOSL is quite good; I'm not worried about which of those two is faster. I would assume they're already getting lots of rsync traffic from plenty of people like me; one more rsync consumer is probably not going to overload them. Do you use multiple public IPs, to avoid being denied? Oddly enough, no, I don't, and they have never denied me as far as I can tell. Okay, so: runbook, and then we can contact them. Actually, maybe my mirrors are syncing from the St. Louis mirror, so I'm getting one from XMission; but nonetheless they've never rejected me either. Okay, this one moves to the next iteration.
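To make the "closest mirror" idea concrete, a small sketch that ranks candidate mirrors by measured latency. The hostnames are placeholders, not the actual mirrorbits list.

    # Minimal sketch: rank mirrors by round-trip time of a HEAD request.
    # The URLs are placeholders; the real list lives in mirrorbits.
    import time
    import urllib.request

    MIRRORS = [
        "https://mirror-a.example.org/jenkins/",  # placeholder
        "https://mirror-b.example.org/jenkins/",  # placeholder
    ]

    def latency(url: str) -> float:
        request = urllib.request.Request(url, method="HEAD")
        start = time.monotonic()
        try:
            urllib.request.urlopen(request, timeout=5).close()
        except OSError:
            return float("inf")  # unreachable mirrors sort last
        return time.monotonic() - start

    for url in sorted(MIRRORS, key=latency):
        print(url)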
Okay, next one: building our own Windows images. Almost there: I've got the image built and tested; I still have to make the Jenkins jobs use this new image. Nice work. So we will have a real test case once it's deployed: on release.ci and infra.ci that new image will be used for the upcoming weekly release, next Tuesday; it's important to keep that in mind. It's Windows Server Core, literally the same base image, so that shouldn't be an issue, but it's the first step, so we only have to be careful. My proposal: are you okay if we do it on Monday, since there is an upcoming long weekend? I would prefer this afternoon; I will test it. Because it's built from exactly the same image, as soon as the images are published and we test one build, nothing will change later in the week: either it breaks this afternoon and we wait until Monday to fix it, or it passes and we are good. Right, let's do it right away then. We can ask Adrien, since he's working on some pull requests, so he'll be able to test directly and give us a yes. Is that good for you, Stéphane? Mark? Yes. So: almost done.

The two next tasks are going to jump to the next iteration; nothing to say, we didn't have time to work on the Oracle part. Maybe we'll start working on that later today, but we did not yet.

Maybe the two last ones. ci.jenkins.io: I've published the notes from the postmortem and shared the recording link between the four or five of us, including Basil. I need a review from everyone before publishing it and closing these issues. Most of the action points have been done, or we are waiting for Basil. We didn't hit any new issue, and ci.jenkins.io is no longer subject to the rate limiting, so right now there is no emergency, even if the risk still exists until we are fully clear of the rate limits. So I propose that we just check on it and maybe gently ping Basil; it looks like Basil is doing a lot of stuff, so I don't want to put pressure on him. If he doesn't have time, he just has to let us know, and we should be okay. That's all. Yeah, he has a lot of things going on.

While we're at it, something I thought of at the beginning: remember the problem we had with the DNS not coming up on Kubernetes? I think that was coming from Azure, not from Amazon. You don't remember? No, I found a link showing they were having issues while we were looking at last week's Azure incident, but that was before last week's meeting; I don't remember exactly. Okay, we'll check later and open a task if we forgot something, so the knowledge gets shared.

One last thing, which I haven't tracked yet: I had an answer from someone at Docker. As you might remember, the jenkinsci, jenkins-infra and jenkins4eva organizations on Docker Hub are treated as open source and are not rate limited; I'm writing the details on the Docker open source program issue and will put it in the next iteration. Our end users are not API rate limited, and that's why ci.jenkins.io is safe: ci.jenkins.io only pulls images from the jenkinsci organization, so no more rate limits for these images since about ten days. However, our account still faces API rate limits from time to time when our machines build all the Docker images for the Jenkins controller or agents, because those build FROM the official Docker images, which are API rate limited; they serve most of the bandwidth, so it makes sense for Docker to rate limit them. We had misunderstood that part. I asked them what we could do about it, and they might have some team seats for us. So I've asked them two questions and I'm waiting for an answer. First: is it okay to have two seats with the extended API rate limit per organization, one for the trusted side and one for the untrusted side? And second: if we get these seats, we would go to four seats per organization between humans and technical accounts, while we are supposed to be limited to three; can they raise the limit to at least five, so we have some margin for humans or technical accounts? Exactly. So I'm waiting for the answers. We are safe for ci.jenkins.io, which was the root cause of the incident one month ago, but we still have API rate limit issues that might slow down the Docker image builds for Jenkins itself. I think that's all on that topic (the sketch just below shows how to watch those limits).
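For reference, a small sketch of how anyone can read the current Docker Hub pull rate limit, based on Docker's documented anonymous "ratelimitpreview" check; authenticated accounts would pass their credentials when fetching the token.

    # Minimal sketch: read the Docker Hub pull rate limit headers via the
    # documented anonymous ratelimitpreview check.
    import json
    import urllib.request

    TOKEN_URL = ("https://auth.docker.io/token?service=registry.docker.io"
                 "&scope=repository:ratelimitpreview/test:pull")
    MANIFEST_URL = ("https://registry-1.docker.io/v2/"
                    "ratelimitpreview/test/manifests/latest")

    token = json.load(urllib.request.urlopen(TOKEN_URL))["token"]
    request = urllib.request.Request(
        MANIFEST_URL, method="HEAD",
        headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as resp:
        # Values look like "100;w=21600": 100 pulls per 6-hour window.
        print("ratelimit-limit:    ", resp.getheader("ratelimit-limit"))
        print("ratelimit-remaining:", resp.getheader("ratelimit-remaining"))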
Do you have other topics that we might have forgotten, Mark, that we should have worked on? None for me, but I did make a note in the announcements about the Remoting 4.13.1 release. I had forgotten to mention that we've switched Remoting so that all future releases from the primary branch, the master branch, will use JEP-229 continuous delivery; we already have releases done that way. 4.13.1 is an exception because it had to be released with the code signing and therefore had to be done on trusted.ci. The change was surprisingly easy and it worked well, but if someone were to need to release some other old patch version, we would have to make a similar change there, and then it needs more discussion. I love that we switched to JEP-229; it brings big benefits: that job couldn't be removed from trusted.ci yet, but it can be ignored now, and we don't have to use it except for patch releases. Nice.

Now about the next milestone. I've put two topics in.

First one: the crawler, the tools metadata generator. My understanding, and my knowledge is quite limited, is that this job, the crawler, generates a bunch of metadata files and pushes them somewhere, and when you are in Jenkins and you add a new tool installation with an automatic installer, the plugin uses those files to get the list of installers. Is my understanding correct, Mark? Your understanding is correct: the crawler finds versions of the JDK, or versions of Maven, or versions of Gradle, or your arbitrary tool of choice, and generates lists that are consumed by those things as JSON files. So, since a few days, the builds on trusted.ci for that particular job are failing, and thanks to Daniel for pointing out the underlying reason: in four weeks a certificate will expire. It sounds like it is the update center signing certificate, which is renewed once a year. So we have to update it before June; I don't remember the exact day, around the 14th of June, so it's not an emergency but it's important. And there is a kind of failsafe mechanism that makes that job fail for us, which is good as a notification, but we need to rotate the certificate. So I propose to add that to the coming milestone: we have to ask for, or find, information about how to renew it, between the team, Daniel, and maybe you, Mark, if you're at ease with that part. The other thing is that we also need to add a calendar alert for it, because we don't have one, and the calendar alert should point at the runbook; from there we can improve that specific area. So I propose to add that one to the coming milestone, if it's okay for everyone. Yes, absolutely. Is there someone interested, or can I pick it? I can take it; I don't know what the process is, but we can try to look around beforehand, and I'll say so if it's way too much. Okay, so assigning it to Stéphane for the coming milestone (a sketch of that kind of expiry failsafe follows below).
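A minimal sketch of that kind of failsafe, assuming a local PEM copy of the certificate; the path and the four-week threshold are illustrative, and the real job may implement its check differently.

    # Minimal sketch: fail while there is still time to rotate a signing
    # certificate. Path and threshold are placeholders for illustration.
    import datetime
    import sys
    from cryptography import x509  # third-party: pip install cryptography

    CERT_PATH = "update-center.pem"  # hypothetical local copy of the cert
    THRESHOLD = datetime.timedelta(weeks=4)

    with open(CERT_PATH, "rb") as pem:
        cert = x509.load_pem_x509_certificate(pem.read())

    remaining = cert.not_valid_after - datetime.datetime.utcnow()
    print(f"certificate expires in {remaining.days} days")
    if remaining < THRESHOLD:
        sys.exit("renew the update center signing certificate now")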
There is also the issue opened by Basil, where there has been a long discussion, mainly about English semantics, sorry: the Java 11 requirement and its impact on the infrastructure. I mentioned it, but I don't think there is an action for us right now; it sounds like they are on the right track, whatever the direction. The thing for us, infrastructure, is to follow up if they need something. I propose we keep tracking it, because it sounds like there are a lot of different ways of doing it, and I'm not sure that Basil and the team have a consensus on how to do it yet. So, if it's okay for you, we watch this one, because it's important due to Java 8 being dropped in September; there will be an upcoming weekly for that. Let's watch it, see if they ask for help or mention us, and then we'll take a decision, all of us. Sounds good to you? Mark, is that process okay for you? Yes. Okay.

And I think those were the last two items, so we already have a pretty packed milestone. I'm doing one last sanity check on the upcoming iteration to see if I'm not forgetting anything. The release key flow and the AWS items? No, those stay on the back burner; they are only open issues. There is the one I've created to track deprecating the AMIs in favor of Docker; that's still under discussion, but it's an easy one, and I propose we add it to the coming milestone: the goal is to build using Docker everywhere, so we will be able to drop the AMIs in the future, since we also use containers. Exactly. Can I add it, and is it okay for you to take it? It doesn't involve cleaning up the AMIs themselves; it only involves switching the pipeline library to the Docker agent definitions. Okay.

So that's all for me. Any other topics you want to raise? Good for everyone? Yes. So, a reminder: it's a long weekend for people in Europe; I don't know if it's the same for you in the US, Mark. It is, yes. So expect a bit less activity for the coming milestone. Everyone have a good week, take care, get some rest... and let's stop the recording.