Hello everyone, welcome to the Jenkins infrastructure weekly meeting. Today is the 20th of June 2023. Around the table we have myself, Damien Duportal; we don't have Hervé, he is on holiday; Mark Waite, Stéphane Merle, Bruno Verachten, and Kevin Martins.

Okay, let's get started with announcements. The weekly release 2.411 has been released, that is the WAR package and the containers. I assume the changelog is going to be there, is that correct? Correct, it's been flagged for merge and will appear shortly. Cool. Something I will also describe later today at the Platform SIG meeting: we should also have an updated Windows Docker image for last week's weekly, 2.410. More on that at the Platform SIG, I don't want to spoil any more. I haven't tested it yet. The goal is that by next week we should be able to run a manual, let's say non-automated, process to publish today's weekly as a Windows image; that might require adding some tags and a few other things. Thanks to the contributor who did that. I'm not aware of any Windows controller usage on the Jenkins infrastructure, but since it was a change, better to mention it here. I don't have other announcements. Is that the case for you folks? No? Okay.

Let's continue with the upcoming calendar. I expect the next weekly release, 2.412, next week, the 27th of June 2023. Is that correct? The next LTS, 2.401.2, is indeed planned for next week, but on the Wednesday, the day after the weekly, so the 28th of June. Any question or clarification? Okay.

A word about Jenkins advisories. We had one last week that hadn't been announced yet during our previous meeting; it was announced a bit later. That advisory was for plugins, not for the infrastructure, so for us that's okay. Another issue was fixed in the 2.400 weekly release and the 2.401.1 LTS; we are running that LTS or later weekly releases, so we are not subject to that advisory. No more advisories announced.
Any clarification? Cool. I'm not aware of any upcoming major event where Jenkins infrastructure members will be present. Is there anyone? Nope. Okay.

Let's proceed to this week's milestone. I'm switching to the GitHub view. We have two closed issues, plus three issues closed as "no work planned". For the two closed issues: a user had issues with their account. I triggered a full reset of their password, and the SMTP server on Mailgun reported that everything went fine. I assume the user might have our mail blacklisted, or a rule that deleted it. We haven't heard back, so the issue has been closed.

There was also a question from Alex about the migration of the trusted.ci controller, the private controller, from AWS to Azure. Minor elements changed; they were documented, but clearly not well enough. Anyone interested in reviewing the runbook and updating it, to spare someone else the same hiccups, is welcome, but it looks like the information there is correct. Just a reminder for everyone: you now have to SSH to trusted.ci.jenkins.io, which is the hostname of the virtual machine we use for SSH port forwarding, while you still reach trusted.ci.jenkins.io as the web service with /etc/hosts configured to the loopback on your machine. The idea is that when your web browser goes to trusted.ci.jenkins.io, your /etc/hosts file is read, because there is no DNS record for this service, and the request then goes through the local port opened by the SSH port forward. Any question? Okay.

For the three others: "Reset password email" was an account issue where someone either made a mistake or confused our infrastructure with Jenkins itself; same for 3625. The one we closed but did work on is the remote repository for Keras Labs. Some Jenkins projects, including Stapler, require a dependency that is stored on a release repository from Keras Labs.
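As an aside, the trusted.ci.jenkins.io access flow described a moment ago can be sketched in shell. This is a minimal sketch, not the official runbook: the local port 8443 is a hypothetical choice, and the SSH hop assumes your ssh_config (or the runbook) resolves the jump VM's real address, since the /etc/hosts entry points the name at loopback for the browser only.

```shell
SERVICE="trusted.ci.jenkins.io"
LOCAL_PORT=8443   # hypothetical; any free local port works

# 1) /etc/hosts entry so the browser resolves the web service to loopback
#    (there is no public DNS record for the web UI):
HOSTS_LINE="127.0.0.1 ${SERVICE}"

# 2) SSH port forward through the jump VM, tunnelling the local port
#    to the controller's web port:
SSH_COMMAND="ssh -N -L ${LOCAL_PORT}:localhost:${LOCAL_PORT} ${SERVICE}"

printf '%s\n%s\n' "${HOSTS_LINE}" "${SSH_COMMAND}"
```

With both pieces in place, browsing to the service name hits loopback and rides the tunnel to the controller.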
About that Keras Labs repository: we wanted to add it as a mirror inside our Jenkins Artifactory, because the remote repository was having issues; sometimes it answered errors, or it wasn't properly set up. We tried to mirror it, but we hit some hiccups. Thanks to the work of Basil, Vincent Latombe, and Jesse Glick: they were able to guide the people at Keras Labs so they can publish directly to the Maven Central repository. So now there is no need for us to have the mirror, since everything is on our own mirror of Maven Central and is cached directly by our system. That's why it has been closed as "no work done": there is no additional work to be done here. Any question? Okay.

The line seems to have frozen for me. Did others perceive the same thing? Yes. Hello, can you hear me? Yes. Okay, sorry, I don't know what is happening. We have storms in the area in Belgium, so I assume my ISP is having issues. Sorry, I will try my best. Okay, let me share my screen again. Can you see my screen? Is it readable? Yes. Okay, it's still recording to the cloud.

So I was saying: Stéphane, Hervé, and I are currently changing our machines and rotating our passwords, so we don't have the availability we used to have, and there isn't a lot of work done this week. That's expected, and it might take at least some more time with our environments. The sound is not really good; do you want me to share my screen and you keep just your voice and video? Yes, good idea. Okay, can you still hear me? Sorry for that. Why am I having these issues? Okay, I've just been texted by my ISP: they are having issues. Guess what? It's their DNS infrastructure. It's always DNS. Can you see my screen? Good for me. Cool.

So, a word about the work in progress. First, you can stay on Stéphane's notes; no need to open the issues unless someone needs to. First item: installation and configuration of the Datadog plugin on ci.jenkins.io.
I've taken over this task from Hervé. I'm working on it to ensure that the Datadog agent, which runs on the virtual machine host, is able to communicate with the controller. There are two problems to solve, or being solved, here. The first one is how to configure the Datadog agent properly: we don't want it to listen on everything, otherwise someone could send data to the UDP listening port from the outside. The second problem is how to make the container where the ci.jenkins.io controller runs communicate with the Datadog agent on the host.

The second one is easy: we use the docker0 gateway IP. Everything listening on the host machine can listen on that interface, and since it's a gateway, the container just has to use the gateway IP as the destination. Since there is only one Datadog agent, and it's the only thing listening on that port on the machine, there is no conflict. So the goal will be to reinstall the Datadog plugin and set the agent to listen on that IP, on port 8125, instead of localhost.

For the first one, we will have to add a network security group rule to forbid any incoming request at the network level, as hinted by Tim Jacomb. We will also need to update Puppet to ensure that any request coming from an interface other than the loopback or docker0 is dropped, or even rejected. We cannot drop all UDP connections, though, otherwise we kill DNS resolution, so we have to be careful with this one. Any question on this one? Okay.

Next, an account issue; I'm skipping this one. We are waiting on the user, and if they don't answer by the end of the week, we close it as usual.

Ubuntu 22.04 upgrade campaign: the machines hosting the census and usage services have been upgraded. These two machines are still running on AWS and should be migrated to Azure soon.
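Going back to the Datadog wiring for a second, here is a minimal sketch of one way to set it up. The gateway address 172.17.0.1 is only the Docker default (verify with `ip -4 addr show docker0`), and binding to that single address is one option; the actual setup may listen more broadly and rely on the NSG and iptables rules instead.

```shell
DOCKER0_GW="172.17.0.1"   # host-side IP of the docker0 bridge (Docker default)
DOGSTATSD_PORT=8125       # DogStatsD's default UDP port

# Agent side: datadog.yaml fragment binding DogStatsD to the docker0
# gateway rather than 0.0.0.0, so nothing outside the host can reach it.
AGENT_CONF="bind_host: ${DOCKER0_GW}
dogstatsd_port: ${DOGSTATSD_PORT}"

# Controller side: the Datadog plugin targets the gateway IP as agent host.
TARGET="${DOCKER0_GW}:${DOGSTATSD_PORT}"

printf '%s\nagent target: %s\n' "${AGENT_CONF}" "${TARGET}"
```

The design point is that the container reaches the host through the bridge gateway, so nothing needs to be exposed on the machine's public interface.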
Still on the Ubuntu 22.04 campaign: we also migrated archives.jenkins.io, an ARM machine running on Oracle Cloud, and it is now running Ubuntu 22.04 as well. So here are the last non-Ubuntu-22.04 machines we have to migrate, leaving aside puppet.jenkins.io, which has to stay on Ubuntu 20.04. There is the ci.jenkins.io virtual machine: we don't plan to upgrade it in place; instead, a new machine is going to be created on a new network, and we will use that one as the base rather than migrating the current one. There are the AKS cluster node pools: to switch them to Ubuntu 22.04, we need to upgrade to Kubernetes 1.25. And as far as I can tell, those were the last machines; the rest have been upgraded. Oh no, my bad: there is the updates/pkg machine. That one will be tricky. Any question?

I don't plan to work on pkg this week, because I will focus on Kubernetes 1.25 and other tasks, so I propose that this issue move out of the milestone. If I get extra time, which I don't think I will, the proposal would be to work with Stéphane, or anyone else interested, on running the release process of the Jenkins core inside a container. That way we could migrate the updates machine to whatever OS we want, because all the dependencies would live in an Ubuntu 18.04 (Bionic) container, and we could then upgrade the container and its dependencies separately, tested offline, without risking breaking a release.

Next issue: renew the SSL certificate for updates.jenkins-ci.org. Thanks, Stéphane, for driving this one. We discovered that a former experiment by someone named Damien Duportal, aka myself, had left leftovers in /usr/bin pointing to the snap. That broke the cron job that ran every day to check whether a renewal was required. We cleaned up that part, but the certificate is still not renewed, because we need to use the full path, or find a way to update the PATH seen by the cron processes; otherwise the certbot installation is not visible. So there are multiple solutions here.
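One way to implement a solution here, a crontab line fully owned by Puppet with its own logging, could look roughly like this sketch. The schedule, the log path, and the certbot path are assumptions for illustration, not the real manifest.

```puppet
# Own the renewal cron entry explicitly instead of relying on the
# tool's automated crontab, and keep a log we can actually inspect.
cron { 'certbot-renew-updates':
  ensure  => present,
  user    => 'root',
  # Absolute path, so the restricted PATH of the cron environment
  # cannot hide the certbot installation again.
  command => '/usr/bin/certbot renew --quiet >> /var/log/certbot-renew.log 2>&1',
  hour    => 3,
  minute  => 30,
}
```

With the entry under Puppet's control, a failing renewal would at least leave a trace in the log file instead of failing silently.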
For the renewal, the solution we discussed, first separately and then by consensus, is to stop relying on the automated crontab and write the crontab line ourselves using Puppet. That way we control every piece, including the logs. Doing so will stop us from wasting time like this: we have had this issue repeatedly for six or seven months. With the new crontab we would have logs that surface the error; we would have solved the problem seven months ago if we had had logs. Since the current setup cannot give us that, let's fix it ourselves. Wouldn't that also be useful for any other kind of Puppet-driven Let's Encrypt renewal? Exactly. The goal is to finish this issue within this milestone. Please do not run the certbot renew command manually on the machines, because we want to be sure the process works as expected. Keep saying that, Damien, because there's a risk that Mark Waite is going to go and do it, so I appreciate your saying it, thank you very much. To be quite transparent, the first target is myself: I almost did it this very morning, and had to tell myself "no, Damien, no", as if I were two persons in the same brain, right? Okay, so I'm not the only one sorely tempted to say "oh, I know how to fix that", after which the problem hides again for months. Great. Thank you.

Okay, next topic is the migration of the publick8s cluster. The migration is finished, except for one service: the mirror download system. It has been migrated officially, but we kept the old one because we still see some requests coming in. My goal is to write a quick blog post, and I will need your help, Kevin, Mark, Bruno, and anyone here. The point is simply to communicate that the public IP for the mirror director is going to change. So when we kill the former cluster in a few days, I will let you know, so you can watch for users complaining that it's not working.
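As an aside, going back to the containerized-release idea mentioned earlier: a rough sketch of what pinning the release toolchain to Bionic could look like. Everything here is an assumption for illustration, including the package list and the hypothetical release.sh entry point.

```dockerfile
# Pin the core release toolchain to Ubuntu 18.04 (Bionic) so the host OS
# of the updates/pkg machine can be upgraded independently.
FROM ubuntu:18.04

# Assumed dependencies; the real list would be taken from the current machine.
RUN apt-get update && apt-get install -y --no-install-recommends \
      git gnupg maven openjdk-8-jdk openssh-client rsync \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /release
# Hypothetical wrapper around the existing core release steps.
COPY release.sh /usr/local/bin/release.sh
ENTRYPOINT ["/usr/local/bin/release.sh"]
```

This would let the container and its dependencies be rebuilt and tested offline, without touching a live release.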
Anyone complaining will find a blog post and public communication on community.jenkins.io, so we can say: hey, please respect the RFC rules for DNS resolution caching. So, almost there. Everything else has been removed; a lot of cleanup has been done by Hervé in that area. Yeah, that one was a huge one: it opened the road to IPv6 support for the mirrors and for some services hosted on the new cluster. It also means we only have ci.jenkins.io and cert.ci left on the old network. So good news, people: we are almost done killing that whole unreliable network.

Next step: Stéphane's proposal for applications in publick8s to migrate to ARM64. Can you give us a heads-up on this one? I completed the first list with all the services that have been moved to the new publick8s, but I haven't yet worked on which ones could be eligible for ARM64. We have a few, but the list is not finished yet. Cool, thanks. Should we continue working on this one for the coming milestone? Yeah, why not. Okay.

Next one: use a new VM instance type for ci.jenkins.io, migrating ci.jenkins.io to Ubuntu 22.04 in a new network. I haven't worked on it since last meeting, except for starting a proof of concept for inbound agents, where I'm hitting minor issues. I expect to work on this tomorrow, since my machine has been migrated.

Remove the IP restriction on the bounce host, or migrate to a VPN. Contrary to what we discussed last week, it appeared that adding custom IPs during the security release was an annoyance; that said, the security team confirmed it wasn't a problem for them. Tim and Alex expressed their concerns about that restriction and about other solutions. There isn't a consensus in that area, except that instead of restricting the IPs that can SSH to the bounce host, requiring VPN access could be a good intermediate: you would no longer need to keep track of your public IP.
Yes, I would say: if you have to change your public IP, either you have an ISP issue or you shouldn't be accessing trusted.ci. But the VPN is a good intermediate, so let's go that route. It will need a bit of work, because we need to peer the virtual networks with each other. I've added it to the list, but I propose to delay that task, because we don't have Hervé yet and we have other tasks to finish. Of course, if it becomes an emergency or a blocker, please raise your hand and we will move it from the backlog back into the milestone and change its priority. But as far as I can tell, for now we keep it there.

Matomo GitHub Docker repo: I haven't done anything yet; I plan to work on this one first. I'm keeping it on the next milestone, so unless there is any question, I can jump directly to "assess Artifactory bandwidth reduction options". Mark, can you give a status on your part of that task? Apologies, I still haven't done it; it's been on my "must do" list. Today I've got to send the summary to our colleagues at JFrog, so that they're aware that our initial experiment did not have the desired results, and then we've got to figure out the alternatives. But that won't happen until late today. And I apologize, I have to drop off the meeting in two or three minutes for another meeting. Just before you drop: I have a meeting invitation named "JFrog bandwidth status report" for Thursday, 6 pm my time, so noon for you. Is it a meeting with JFrog? Looks like it. Yes, a meeting with Stephen Chin, or Laurel also, interesting. Okay, good. Yeah, I thought Stephen was out of the office, but we'll look forward to it. I'll get that sent out and we'll meet with them and talk to them then. Okay, can you double-check with them? It might be my Google agenda messing up. It's similar on mine as well, but I think it's worth asking, right? It's a good excuse.
I'll send them a summary today and say: hey, we see this on our calendar, but we're not sure you're in the office. Cool. Thanks, Mark, we can release you.

Next topic: upgrade Kubernetes to 1.25. This goal has been delayed from past milestones, but I'm keeping it: upgrade one, or ideally both, of the DigitalOcean clusters to the new Kubernetes version.

"ci.jenkins.io failing for a Jenkins plugin after changes in the Jenkinsfile": we haven't heard back from the maintainer, so I will post a last message and we will close that issue during this milestone, because we did everything we could. I need to check whether things have changed; otherwise we close it and keep going.

And finally, "artifact caching proxy is unreliable": that is directly related to the ci.jenkins.io agents that need to be moved to the new network. That's my next task for ci.jenkins.io, which is why I'm keeping this one.

Now, let me just check: can you open github.com/jenkins-infra/helpdesk, Stéphane? Let's see if we have new incoming issues. Nope, we don't have any new issues there. So nothing new; everything has been taken into account. Nothing else from me. Do you folks have other elements you want to talk about? Okay, then I'm going to stop the recording. For the people watching this recording: see you next week, and sorry for the bad sound and the internet cuts. Let's do better next week.