Hello everyone, welcome to the weekly infrastructure team meeting. Today is the 21st of December, and I have Stefan and Hervé with me. The Christmas holidays are starting. Mark won't be able to join us today, and half of his team is off as well. So we can get started.

First announcement: I propose to cancel next week's team meeting, between Christmas and New Year's Eve, because almost no one will be there except me, as far as I know. So let's see each other in two weeks, in 2022, unless someone disagrees and wants to join during those early days of the year. No? Thank you, that's good. See each other in 2022.

Second announcement: the Jenkins weekly release 2.326 has been released successfully. Thanks to Mark, even though he was expected to be on a day off. So Mark, when you hear this recording, that one's on you, but thanks anyway. I triggered the release, and the team was able to fix the last part that failed in validation, in the script. We'll have to diagnose that further in a few days. Also, the container images failed to build on the Windows pod because of insufficient disk space. That's something that happens sometimes; we might want to increase the disk space for these machines, I don't know. Do you have other announcements or items? Nope. Okay, so let's go ahead.

We have a pretty full agenda. First point, just a note: thanks for taking on that job, we renamed two of our main repositories used for Kubernetes management. The names of the repositories and their documentation have been updated to be more understandable. charts has been renamed to kubernetes-management, which is the repository in charge of managing our Kubernetes applications, and public-charts is now helm-charts, so it's clear that it hosts a set of Helm charts that we use and that other people could use if they want.
Thanks for this; it will help a lot with contributions and make things clearer for newcomers, and for us as well.

One point: for two days now we have had TCP connection errors on the build agents running on the Azure Kubernetes cluster. We don't know if it's related to an external network outage or to a network error that we discovered in the Diagnose panel of the Azure UI. At least we have had that error for a few months now; it says we have an overlap of some network ranges. The overlap is not with the pod subnet, it's with the service subnet: these are the virtual IPs that are used for each Kubernetes service in the cluster. The Azure documentation clearly states that this range must not overlap with the virtual network used for the node pools or the pod IPs. The thing is that we do overlap, because it's a subnet of the virtual network. So we'll have to change that. However, the documentation that I've linked there clearly says... oh no, I need to add the correct link. It has been confirmed on one of the GitHub issues of AKS that we cannot change the service CIDR on a running cluster; we have to create a brand new cluster. So I need to update the links here.

The proposal for the upcoming weeks, the target being before the end of January, is at least to create a new cluster named privatek8s that will sit on the private network, as per the IEP that Olivier and Tyler wrote a few years ago. The IEP, the Infrastructure Enhancement Proposal, is our way to write RFC-like documents for the infrastructure, and there was one related to the network. I think it's the first one; no, I'm not sure which one it is. Let me search. IEP-2, yes, it was about network segregation. Right now, our public AKS cluster sits on the public production network, but that cluster is also hosting services that should sit on the private production network, behind the VPN gateway.
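To make the overlap problem concrete, here is a minimal Python sketch of the check Azure performs, using the standard `ipaddress` module. The ranges below are made up for illustration and are not our actual subnets:

```python
import ipaddress

def find_overlaps(service_cidr, other_ranges):
    """Return the names of the ranges that overlap the service CIDR."""
    svc = ipaddress.ip_network(service_cidr)
    return [name for name, cidr in other_ranges.items()
            if svc.overlaps(ipaddress.ip_network(cidr))]

# Hypothetical ranges for illustration only, not our real configuration.
conflicts = find_overlaps(
    "10.0.0.0/16",  # service CIDR: virtual IPs handed out to Services
    {"vnet": "10.0.0.0/8", "pod_subnet": "10.244.0.0/16"},
)
print(conflicts)  # ['vnet'] because the VNet contains the whole service CIDR
```

This is exactly the situation described above: the service range is a subnet of the virtual network, so the two overlap even though the pod range is fine.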
So the idea here is to create a privatek8s cluster inside that private production network, with a correct set of network settings and maybe other changes that we will discover along the way, to have a proper cluster. Then the proposal is to start migrating private services, such as infra.ci, release.ci, the Grafana, or anything behind the VPN, out of the public cluster to the private one. Then we will see if it changes the TCP issues. And if we don't have other issues after one or two weeks on that one, we will then be able to create a new instance of the public cluster and migrate the public elements onto it. That's the idea. We might have a dependency on Terraform, but it's a nice-to-have: we can start right now using the GUI and managing the new cluster manually. However, it would be easier to change things, especially the DNS part, if we managed all of this in Terraform. My proposal, given it's almost Christmas, is that we delay these changes to the next weeks, if that sounds good to you. Yes. Brand new year.

A word about the public k8s migration: most of the public applications that we are hosting are stateless, or depend on an external managed database which is not hosted inside the cluster. Which means it should be easy, except maybe for the LDAP, that's the only exception, for all the other services to have a copy running on the second cluster, and then we can do an A/B switch with the DNS. So that should be good enough. Let's see. Worst case, there will be an outage when we try the migration. We may have to do that over a full day with services down, and we'll see. I prefer having a bigger outage, but planned ahead with people being aware, so we don't rush the migration. I prefer taking a full day, if that's what's required to feel at ease. Is there any question about that? For people not available right now because of Christmas, or otherwise unreachable: don't hesitate to ask questions on IRC or on the mailing list.
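Since managing the new cluster in Terraform is the nice-to-have mentioned above, here is a rough sketch of what the relevant settings could look like with the azurerm provider. Every name, size, and CIDR below is a placeholder for illustration, not our actual configuration:

```hcl
# Sketch only: hypothetical names and ranges.
resource "azurerm_kubernetes_cluster" "privatek8s" {
  name                    = "privatek8s"
  location                = azurerm_resource_group.private.location
  resource_group_name     = azurerm_resource_group.private.name
  dns_prefix              = "privatek8s"
  private_cluster_enabled = true

  default_node_pool {
    name           = "default"
    node_count     = 3
    vm_size        = "Standard_D4s_v3"
    vnet_subnet_id = azurerm_subnet.private_k8s.id
  }

  identity {
    type = "SystemAssigned"
  }

  network_profile {
    network_plugin = "kubenet"
    # Keep the service CIDR outside every VNet and pod range:
    # it cannot be changed later without recreating the cluster.
    service_cidr   = "172.16.0.0/16"
    dns_service_ip = "172.16.0.10"
    pod_cidr       = "10.244.0.0/16"
  }
}
```

The point of the `network_profile` block is precisely the lesson from the current cluster: the service CIDR is immutable, so it has to be chosen correctly at creation time.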
I will send a specific email on that topic so we can discuss it asynchronously. Sounds good to you? Yes, perfect, no question.

The next topic is Netlify. So thanks, Gavin. Gavin raised the need for spawning previews of pull requests on some of our projects. Most of the time these are static websites, such as plugins.jenkins.io for instance. The goal is, on each pull request from a contributor or an external maintainer, to have a preview of the changes in real time, so we don't restrict pull request validation to code review only, but can also check how the change behaves before merging. Netlify is a SaaS service that provides that feature, only on static sites, but that could be really interesting because it takes care of the cleanup. Alternatively, or complementarily, we could also bootstrap a Jenkins cluster that provides the same feature: we would have more things to manage, but we would have more features, and we would pay for the cloud resources while managing them ourselves. Gavin started a support request to Netlify asking for a sponsored open source account, and there is a discussion where Netlify asks us, which makes sense, to add the Netlify logo and possibly some other things on our website publicly, to say we are sponsored by them. That's the condition for getting the account started. I'm not sure about all the details, but this has to go through the Jenkins board to validate the sponsorship conditions. Gavin also opened the door to maybe thinking about hosting jenkins.io on Netlify. There is a technical question asked of us that we will have to answer, even though I'm not completely sure about its exact scope: it's about the bandwidth that we have on jenkins.io today. I'm not really sure if we have to measure the bandwidth with Fastly in front of the cluster or without Fastly. I have no idea, so I'm going to ask Gavin to clarify before giving an answer.
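For a rough idea of what onboarding one of our static sites onto Netlify would involve, a minimal netlify.toml could look like the following. The build command and publish directory here are assumptions for illustration, not taken from any of our repositories:

```toml
# Sketch of a netlify.toml for a static site with pull request previews.
[build]
  command = "make site"   # hypothetical build command
  publish = "_site"       # hypothetical output directory

# Netlify builds a Deploy Preview automatically for each pull request;
# the preview context can override settings if needed:
[context.deploy-preview]
  command = "make site-preview"
```

The appeal mentioned above is that the preview deployments, and their cleanup once a pull request is closed, are handled entirely by the service.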
Because the source of truth for this bandwidth could be totally different. Is there any question about that topic? No. So let's go ahead.

Next topic, a word about rating.jenkins.io. This is a web service that gathers user ratings about the releases of Jenkins, on the jenkins.io website. When you go to the changelog of the latest release, after one or two seconds you see some weather icons; these are user feedback about those releases. So we have a virtual machine hosting the service rating.jenkins.io, which is responsible for the weather part of the changelog. When the service is down, or not behaving as expected, the weather icons disappear from that page; the rest of the page is static and hosted on jenkins.io. That web service was down. First of all, we weren't able to catch any alert about it, because we had a bunch of alerts due to the TCP issue mentioned earlier, so the alert was hidden in the pile. That's a shame, but thanks to the user who reported it on the community forum, that helped a lot, and we were able to react promptly after that. Thanks Hervé for jumping on it. We had to restart the virtual machine: it was in a bad state, and the AWS console wasn't able to reach the machine and was triggering alerts. We took the opportunity to restart it and change its size, so it was migrated to a more modern hypervisor on EC2, and we upgraded the packages on the machine. Hervé also worked on the Docker image that runs the service inside that VM, so thanks for that. We tried to put that Docker image under the same recipe as all of our Docker images. There is still a bit of work to do though: we are missing the automatic release part. That one is an easy one. Stefan, if you're interested, it could be an easy one to do; it's one YAML file to put in the right place, we can point you to it. Okay, let's try. And the Apache version is shown. There are two Apache servers.
There is one on the virtual machine, which is not showing its version, because Hervé already fixed that a few weeks ago on all the platforms, on the Puppet side. But there is also an Apache server running the PHP requests inside the container, and that one should be configured to not show its version either. So that's a task in progress. Is there any question about this topic, or anything unclear? No. Cool, let's move on.

There is ongoing, huge work on upgrading all of our updatecli manifests to the new version, tracking all dependencies of all Docker images and all charts. There is heavy lifting here: every day we discover a new repository with a new set of dependencies to add. It's like a never-ending story, but we're almost there. We might want to follow Olivier's advice, though, on updating the Docker image tags in the default values of our own charts, in the public helm-charts repository. We might have to roll back what we started to do, where we were overriding the default tags in the Helm values files in kubernetes-management. That means a bit more time, because we will sequentialize all the updates, but that also means it's easier to understand for newcomers and for maintenance. That's always a balance to find.

We have a few incoming requests to take care of. One from Daniel, about conditionally removing some Apache directives for spam protection on the cert.ci Jenkins instance, which is a private instance. A request from Basil, which is minor but might need help from a Jenkins core maintainer: the Jenkins core pull request builds are not able to restart after a full restart of ci.jenkins.io, whereas they should be able to start again. The thing here is that Jenkins sees that it has to restart the pipeline, so it behaves as designed. However, the agents, which hold the full file context in their working directory, are Kubernetes pods, and those agents are gone, so it's hard to restart. So either we'll have to work with the Jenkins developers to improve the pipeline so it can restart without state,
or we can work on finding a way to persist the Kubernetes containers used for these pipelines. I'm not really sure about the correct way, so we need to ask for help on that part; I'm not good enough on that element.

A minor request about the Jira setup: someone wants to change a field name, which I don't disagree with. I just don't see the value, but it costs nothing to do and it's very well documented, so why not. I've been made owner of the Jenkins infra team private mailing list, which is used to contact the Jenkins infra team privately and serves as an external communication email. So now I have to add Hervé and Stefan as members of that mailing list. You should then have all the information we mentioned earlier: the Netlify stuff, the discussion with JFrog, etc. I still have to contact Tyler and Olivier to ask them what the purpose of that jenkins.io service is, because we don't know. Thanks Tim and Mark for enabling the Azure centralized login on Oracle Cloud, which means you only have to authenticate using your Azure credentials with centralized authentication, and you should be able to access the Oracle account. Hervé, I think that was good for you? It was good for me. We'll have to check with you, Stefan, but you should be able to do it.

And on hold, we have the Puppet cleanup and the .io domain renewal. These elements will have to wait for next year. For the .io domain renewal, I think we'll miss that window: it's automatically renewed and we'll have to pay, but the difference in cost is not much, something like 30 bucks for one year. Okay, that's all we had. I don't know if you have other topics to bring. Good? Good for you, folks. Wishing a Merry Christmas to everyone. Thank you, wishing a Merry Christmas to you. Happy New Year, and see you in two weeks. Bye bye.