Welcome to the Jenkins Infra team meeting. It's the 9th of November; a reminder that we adhere to the Jenkins Code of Conduct while we're here. In terms of announcements, we've got the Jenkins 2.320 release in progress and it's looking good. The war file is done, the Docker images are done; we still need to run through the checklist. Helm chart refactoring is in progress. Hervé, did you want to give a brief summary of the progress there? Yes, we are simplifying the structure of the helmfile. We will finally have one file for the public k8s cluster and another for the private k8s cluster, and it will be easier to read and understand them. We've started by splitting the chart repository in two: one with the helmfile management files and the other just for the Helm charts, so they can be updated separately. Great, thank you, thanks very much. Jenkins election voter registration has now closed as announced, with 81 registered voters. Ballots will probably be sent during the course of this week.

All right, Damien had assembled a bunch of really excellent notes on things that have happened recently; I think we should go through them just to be sure we all have the same understanding. We released a major security release last Thursday and a weekly to match it. We had to deal with some issues on the VPN; Hervé, you had to do quite a deep-dive investigation there to fix that, thanks very, very much. The release was delivered on time and had the desired content. Anything you wanted to share, Hervé, to highlight what we learned from that? It was a problem with the VPN machine, which has three different networks configured on it: one for the public infrastructure, another one I don't remember (Damien will say more in a minute), and a third one which wasn't used and was provisioned for the private cluster. When the machine starts, there were conflicts between cloud-init and the network configuration, inverting the two network interfaces, so we had to fix them to make sure they come up in the correct order. We also discussed getting rid of the third network, so we could use a far smaller machine: the recommendation for three networks requires quite a big machine. I'll detail the machine side of that a little later, I think. Thank you. Damien, anything you wanted to add in terms of additional notes? No, that was a perfect explanation. I'm sorry, I've been time-zoned: I thought it was in one hour, so I didn't check the clock. My sincere apologies. Welcome; we need to persuade governments to stop meddling with clocks. Vote for people who will stop meddling with clocks. Yes.

Just a note: I received a message from Olivier that he's not able to join us because he started a new job and he's quite busy. Excellent, well, thank you; I'm grateful he let us know. That's great. Thank you.

We had a certificate renewal failure on archives.jenkins.io, and Damien has a good question here: should we consider, in the future, using Traefik as a better way to manage these kinds of services? I think it's a good idea. We had intentionally chosen a separate machine to spread the bandwidth costs in this case, so this one was running on Oracle Cloud. But that wouldn't stop us, if I understand correctly, from using Traefik, right? We could run a small Kubernetes cluster there and run Traefik on it. You don't need Kubernetes at all. Oh. You can use Traefik in Kubernetes, but here it's the perfect use case for a tool like Traefik.
It's for when you have a single VM with a single Docker engine and one to three services that you want to expose on the public IP of that machine. So you only have Docker and it's a single machine. Ah, okay. In that case, Traefik is just another Docker container that gets spun up: it watches the local Docker containers and auto-configures the virtual hosts based on the labels of those containers. It also manages Let's Encrypt automatically, so we would still have certificate renewals like we do today, but it removes certbot and the crontab; it's all managed by Traefik itself. Nice. It simplifies the setup and also lets us reproduce it locally, because you can spin up just the Docker Compose stack; you don't need the Ubuntu-plus-Docker part of the VM if you want to test the services setup locally. It also enables WebSocket by default, which matters especially for Jenkins instances when they are the backend. The problems it would cause, beyond the migration itself: first, we might need to run Traefik in front of Apache, in front of eventually something else, because on a lot of our Apache instances we have specific routes that might be hard to transfer to Traefik, since Traefik is only a reverse proxy while Apache is both a proxy and a web server. The second is that Let's Encrypt certificate generation would no longer be managed by certbot: you delegate that completely to Traefik, and I don't know if that works out of the box with our monitoring or our tooling right now. That's it; that's the balance. Thank you.
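To make that concrete, here is a minimal sketch of the single-VM Traefik setup described above, as a Docker Compose file. The domain, email, and backend service image are hypothetical placeholders, not the actual configuration:

```yaml
# docker-compose.yml: minimal sketch of the single-VM Traefik setup.
# Domain, email, and the backend image are hypothetical placeholders.
version: "3"
services:
  traefik:
    image: traefik:v2.5
    command:
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.websecure.address=:443
      - --certificatesresolvers.le.acme.email=infra@example.org
      - --certificatesresolvers.le.acme.storage=/letsencrypt/acme.json
      - --certificatesresolvers.le.acme.tlschallenge=true
    ports:
      - "443:443"
    volumes:
      # Traefik watches the local Docker engine to discover containers.
      - /var/run/docker.sock:/var/run/docker.sock:ro
      - letsencrypt:/letsencrypt

  archives:
    image: httpd:2.4  # stand-in for whatever serves the files today
    labels:
      # Labels on each container drive Traefik's virtual-host routing.
      - traefik.enable=true
      - traefik.http.routers.archives.rule=Host(`archives.example.org`)
      - traefik.http.routers.archives.entrypoints=websecure
      - traefik.http.routers.archives.tls.certresolver=le

volumes:
  letsencrypt:
```

With something like this, certificate issuance and renewal happen inside Traefik's ACME resolver, which is what would replace the certbot-plus-crontab pair mentioned above, and the whole stack can be reproduced locally with a plain `docker compose up`.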
All right, the next topic then was the VPN. Anything additional that needs to be added there? No, just some notes: there are some improvements that can be made on top of your fix, but nothing major; I don't have anything to add. Anyone interested in starting to work with Puppet, or starting on infrastructure, can take those improvement issues, which involve refining the code that Hervé wrote last week. It's mainly adding templates and such, so anyone interested in getting started with Puppet can take the Jira issue. I think it's INFRA-3124, and it's already tagged as newbie-friendly. Excellent, thank you. And Damien, I would propose to shift here and let you continue the meeting; I'm the wrong voice. So I'm going to be quiet and take notes. Yeah, thanks. Sorry again for being late, and thanks for taking over so far; that's a good sign, it means we shared the information correctly.

The next topic is related to costs, at least on EC2. We are now using spot instances for both the virtual machines and the Kubernetes workers used by ci.jenkins.io. It's only been three full days, but it looks like the billing is going down. Let's say we need to wait 10 more days just to be sure it truly has an impact. However, as we saw earlier today, Cost Explorer on AWS has stopped showing the machines used for highmem or EKS as increasing compared to last month, so that's a good indicator. We have to keep checking it daily. One of the concerns with spot instances was: what if they get evicted? Have we had any cases of developers or others complaining, "hey, my agent died unexpectedly or was not available"? No, not yet. That has been underlined by GC, who started to work, I thought, on a public issue about being able to automatically restart a stage of a pipeline if the detected cause of failure is the agent being taken offline. It sounds like part of that code is already inside the pipeline plugin, so I really hope we'll be able to benefit from it. But as of today, I still have to communicate on the developer mailing list to ask them; I forgot to send the email, so I have to do it now. And if you hit this issue, don't hesitate to report it: we will switch back to on-demand instances in that case. For the containers, the risk is low because, following the AWS recommendations, we don't rely on a single type of worker instance. The constraint is that all the workers have 16 CPUs and 64 GB of memory; within that, we use every instance type in the M4 series, I think. The goal, per the recommendation, is to get the best available spot pool, which most of the time means the cheapest, and the one least prone to eviction, because the algorithm selects the pool with the most capacity. So let's see. But beyond this, we don't have any other safety check here. (A sketch of this kind of node pool follows below.)

The next topic is Azure virtual machine usage. The team spotted issues related to the configuration; we weren't sure whether it was our configuration (Jenkins CasC, our Puppet) or the Azure VM Agents plugin itself. So we did a small maneuver yesterday to try to fix that setup. The consequence was that EC2 was being used far more than the Azure VMs for the Linux and highmem machines; for Windows it was equal. Not sure what happened; it sounds like it might be an old bug, triggered when you click Apply in the plugin itself, that messed up the configuration a few weeks ago. So our config might have been okay, and the latest plugin version seems to be fine. Let's see; we still have to double-check with Tim that it's okay, but he was quite busy. The symptom, for everyone here, if you start seeing issues on ci.jenkins.io: once logged in, you have a "Cloud Statistics" entry in the left menu. If you don't see the three kinds of Azure VM setups there, that is Ubuntu, Ubuntu highmem, and Windows, then something happened and the cloud statistics are not even being reported, which means there is something wrong with the configuration. Most of the time, a simple save, or a reload of the configuration-as-code, will solve the issue. Help me again: the three types were highmem and what were the other two? Ubuntu, Ubuntu highmem, and Windows. Thanks. Something else: now that we have checked the machines, we might think about changing the limits, meaning decreasing the limits for the EC2 instances so that fewer machines are spawned in EC2 and more machines are allowed in Azure. I need to write a Jira issue, because there is an ongoing request to Azure support to increase the CPU limits in our region. And finally, the last step will be using spot instances on Azure as well, to decrease the Azure cost too, with the same risks and safety concerns as on EC2: we can be evicted. There is an incoming Jira issue to write down the details and point at the documentation; there might be differences between the two clouds.
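Here is the sketch of the mixed-instance spot pool mentioned above for the ci.jenkins.io Kubernetes workers, written in eksctl style purely for illustration. The team drives this with Terraform, so the tool choice, names, region, and exact instance list are assumptions; only the shape (several interchangeable 16-vCPU / 64 GB types in one spot group) reflects what was said:

```yaml
# cluster.yaml: illustrative eksctl config. The real setup is Terraform,
# so the names, region, and exact instance list are assumptions.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ci-jenkins-io
  region: us-east-2
managedNodeGroups:
  - name: spot-agents
    # Several interchangeable 16 vCPU / 64 GB types, so the allocator
    # can pick the deepest (cheapest, least-evicted) spot pool.
    instanceTypes: ["m4.4xlarge", "m5.4xlarge", "m5a.4xlarge"]
    spot: true
    minSize: 0
    maxSize: 20
    desiredCapacity: 0
    labels:
      ci.jenkins.io/agents: "true"
```

Managed node groups with `spot: true` use AWS's capacity-optimized allocation, which corresponds to the "pick the pool with the most capacity" behavior described above.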
Finally, just a word on the new providers: it's still the same, it will be for later. We first need to turn the Terraform tasks we run today on jenkins-infra/aws into a pipeline library; then we could easily create a new repository with the Terraform infrastructure-as-code for each cloud provider. The upcoming ones are DigitalOcean and Scaleway, because we should have a bit of credit for both and they provide managed Kubernetes. We also have the K3s and OSUOSL machines, which is more Puppet and infrastructure work. And Civo cloud was mentioned recently, a cloud that provides managed K3s clusters, meaning Kubernetes clusters. So that could be interesting, along with OVH, to ask about sponsorship; anyone interested could ask, or we will check when we have time. That's all for the costs. Any questions?

Okay, let's go to the next topic: wiki.jenkins.io. Huge thanks for the work that everyone put into it, especially Hervé. The service is back: it's static files only, from the exports, and it's highly available, since we have a replicated service. It's also virus-free now, because there were some viruses in the exported HTML content. Good catch, Hervé, thanks for spotting that; a bit of luck but also a bit of focus, and it would have been a shame if we had shipped a virus to our own users. We weren't able to discover when that content was injected: it could be years ago, it could be during the September hack. It's not possible to check; we don't have any clues on that, but it has been removed. So I see the phrase "it's replicated in AKS", but I'm not sure I understand what that means. Is that a natural outcome of having used a Docker image? Correct. At any moment you always have two instances, and you are load-balanced between them, because the service is stateless. That means that when we upgrade the Docker image, adding content, rules, maintenance, you always have one instance ready to handle the traffic, through migrations, cluster upgrades, and everything else for that service. That wasn't the case with Confluence, because Confluence could not be horizontally scaled. (A sketch of this replicated setup follows at the end of this topic.) That led to a proposal from you, Mark, which Hervé also mentioned: it could be a good thing now to put wiki.jenkins.io behind Fastly, like we already do for jenkins.io and the plugins.jenkins.io front end today. That would mean creating a new wiki.origin.jenkins.io domain pointing directly to AKS, and moving the current domain to Fastly. We have the same order of magnitude of data on both; we don't have as much traffic on this one as on jenkins.io, but it would still save us from serving that much bandwidth from AKS directly. Does that seem like a good idea? Any voice against it? Okay. So, Mark, you'll take that one, right? I shared the notes, so I assume that you will. Thanks a lot. That means that in the medium term we can start using what we learned on this one to restart the work that Olivier did on jenkins.io: generating a Docker image with all the content inside, so we can scale it horizontally easily, with the downside of adding a bit more time between when a pull request is merged and when it's deployed. That effort was limited by updatecli and our usage of it; it will be medium term, expect something in January. I was trying to understand what you meant by "restart the work on jenkins.io". It's the Docker image containing the content. Got it, thank you. Because there's lots of work to migrate things from wiki.jenkins.io to www.jenkins.io, but that's well outside the infra team. Correct. You can put my name on writing the Jira issue; I will try to phrase it better and to underline the goal, why we stopped, and what the next steps are. That's all for the wiki. Any questions? Anything unclear?
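Here is that sketch of what "replicated in AKS" means in practice for a stateless, static-files service; the image name and ports are hypothetical:

```yaml
# wiki.yaml: minimal sketch of the replicated, stateless wiki service.
# The image name is hypothetical; the content ships inside the image.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: wiki
spec:
  replicas: 2                # two instances at all times
  selector:
    matchLabels:
      app: wiki
  strategy:
    rollingUpdate:
      maxUnavailable: 0      # one pod keeps serving during image upgrades
      maxSurge: 1
  template:
    metadata:
      labels:
        app: wiki
    spec:
      containers:
        - name: wiki
          image: example/wiki-static:2021-11  # static HTML export baked in
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: wiki
spec:
  selector:
    app: wiki
  ports:
    - port: 80
      targetPort: 80
```

Because the content is baked into the image, publishing new content is just a normal rolling upgrade, and the Service keeps load-balancing across whichever pods are ready; a wiki.origin.jenkins.io record would then point Fastly at this origin.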
Okay. Next one: AKS, Helm charts, and helmfiles. There is a huge amount of work being done around the Helm charts and helmfiles. I won't go into details because there are still discussions, but the idea is to separate the helmfiles and the charts, so the charts can be packaged and tested on their own. They can be reused somewhere else, including outside our infrastructure for some of them, and it helps us keep smaller repositories that are easier to understand and easier to get started with. I think Aditya will confirm that it will help, because right now there are helmfiles everywhere and charts everywhere as well. That's the new goal. We are working quite fast while trying not to break things these days with Hervé; expect something finished around next week, maybe. That includes adding more automatic updates of the components and dependencies.
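From the helmfile side, the split could look like the following minimal sketch, where the charts live packaged and versioned in their own repository and the helmfile only pins and deploys them; the repository URL, chart name, and version here are hypothetical:

```yaml
# helmfile.yaml: sketch of consuming charts published from a separate
# chart repository. URL, chart name, and version are hypothetical.
repositories:
  - name: jenkins-infra
    url: https://example.github.io/helm-charts

releases:
  - name: wiki
    namespace: wiki
    chart: jenkins-infra/wiki   # packaged and tested in its own repo
    version: 1.2.3              # pinned, so chart updates are explicit
    values:
      - values/wiki.yaml
```

The pinned version is also what makes the "more automatic updates" goal tractable: a bot can bump it in one place, and each chart change becomes a reviewable pull request of its own.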
Next up is rating.jenkins.io, unless there is a question on the charts. So thanks, Gavin, for fixing the rating service this weekend. Give me some background on that: what happened that caused it to break? Did it just stop working? Was there something he had to do? I have no idea if it had been broken for months or only recently, or if it was broken at all. I just know that Gavin worked on it, and I saw a message saying that the tiny cloud and the changelog are back. Thanks, Hervé, for pointing me to it, because I didn't know what the rating service did before. What I do know is that it's now under our pipeline library for the Docker tests, so we can start having automatic updates. However, that will be a topic: there is an INFRA issue, which I will add to the notes, that we mentioned on the IRC channel. The goal will be to migrate the rating service from its virtual machine on AWS to the AKS cluster. The challenge behind that is the database: today it requires managing a PostgreSQL database. There have been some questions, from a discussion between Hervé, Tim, and Gavin, about changing the kind of database, CockroachDB or something else. As of today, in the context of the effort to decrease the AWS spending and have something more resilient, the first step could be using a managed PostgreSQL database in Azure and migrating the Docker container service from the virtual machine to a pod in Kubernetes. Gavin and Tim were saying it should work with CockroachDB, but yeah, I would propose to separate the topics, just to be sure. Honestly, I love PostgreSQL and I don't see the need to change, but maybe for the price, for the sake of sponsorship, or just for the pleasure of trying something else; I don't mind. In terms of pure infrastructure, though, there is an iteration to migrate first from AWS to AKS without changing the app itself. Could you describe again why the change from AWS to Azure? Is that a cost-balancing thing? What's the benefit of making such a change? So, cost first: for the service itself, managed PostgreSQL databases cost almost the same on both clouds. But in AKS it will cost us less to run the service in a pod than to have a whole virtual machine with its own public interface on AWS. So this would move it from a separate virtual machine into our Kubernetes cluster. Correct. And allow us to manage it with all the techniques we use to manage the cluster. Exactly; I missed that, thank you. And then, as a nice-to-have, it will be easier to manage: the developers, especially, will be more autonomous than on a virtual machine. Let me add the rating item to the notes; I've got the reference of the Jenkins INFRA issue. Anyone interested in that issue, don't hesitate to add a message on Jira or on IRC. If no one takes it, we will take it by default, don't worry. Any questions related to rating? Thanks again, Gavin, Hervé, and Tim for that work.

Next topic is the Kubernetes 1.20 upgrades. Hervé, were you able to start working on that, or is it still... That's it, that's it. Okay, no problem; that will be a subject for next time. Thanks for taking care of it.

Next topic is the JDK updates for 11 and 8. I just wanted to do a sanity check. First, the Jenkins Docker image: did we use the latest JDK 8 during the LTS release last week, Mark? We used JDK 8u302 rather than 8u312, because the Docker image for 8u312 was not yet available. We did use the latest JDK 11, 11.0.13, for the LTS last week. Okay. This is because... yeah, the 8u312 image was not available yet, and we'll be checking soon. Okay, cool. As for the Jenkins agent images, I haven't checked yet. I submitted pull requests for those as well, and I believe they've been merged, but I don't think we've released with them yet. Okay, cool. On the infrastructure side, the virtual machines, on all clouds and all operating systems, are running the latest version with the security issues fixed, and the same goes for the tools where we use them. For the containers there is one exception: we are still on 8u302 for the Kubernetes containers running on ci.jenkins.io, but we have the latest 11 and the latest 17, of course. Oh, and 17 was one I hadn't checked; on the controller image, I think it's still awaiting the Docker image. Yeah, it's the same as 8, in that it's the original 17.0, not the latest, because they haven't made the latest available yet in the upstream base image we use. For Jenkins, we don't have a 17 controller image yet. I thought it existed in preview. There has been a preview, yeah; that preview image needs to be revisited. I'm pretty sure it exists; it's not official, it specifically calls itself a preview, but it does exist, I think. Correct. I've added the link to the 17 preview that you mentioned, and yes, you're correct, that's not the latest version. Okay.

It might be interesting to propose pull requests, as Jenkins Infra, to add an updatecli update process on those images. What do you think? I would love that. I think we got one or more of those updates from Dependabot, but not reliably, whereas updatecli, we know, does the job reliably. Yep. The risk is that the updatecli documentation is still not really complete and easy; that could be an issue for contributors, so it's a fine balance. Right; that will be mentioned on the pull request, and it will be put to the decision of the maintainers there. (A sketch of such a manifest follows below.) That's all for the images. For the infrastructure, the must-have will be focusing on improving the agent images. The idea is to apply what we do for the Docker images to Packer, so we build the same all-in-one image and can deliver future JDK or Maven updates at the same time for every kind of agent on ci.jenkins.io.

Finally, cleanups. I don't have much to say on the cleanups: we still have some old Puppet roles and some Azure storage accounts and resources to clean up. We weren't able to work on that, so it's an ongoing process.
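The updatecli manifest mentioned above could look roughly like this; the source repository, version filter, and the Dockerfile `ARG` name are assumptions rather than the actual setup of the image repositories:

```yaml
# updatecli.yaml: rough sketch of tracking a JDK release and bumping a
# Dockerfile ARG. Repo, filter, and ARG name are assumptions.
sources:
  jdk11:
    kind: githubrelease
    spec:
      owner: adoptium
      repository: temurin11-binaries
      token: '{{ requiredEnv "GITHUB_TOKEN" }}'
      versionfilter:
        kind: latest

targets:
  dockerfile:
    name: Bump JAVA_VERSION in the Dockerfile
    kind: dockerfile
    sourceid: jdk11
    spec:
      file: Dockerfile
      instruction:
        keyword: ARG
        matcher: JAVA_VERSION
```

Unlike Dependabot, the manifest states explicitly where the version comes from and where it lands, which is the reliability argument made above.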
I have one last topic, which you underlined last night, Mark, if no one minds. As you saw, you had to take offline the PPC... was it the PPC agent? Oh no, it was the s390x one, the IBM machine. It was handling jobs intended to be built on Intel. Well, what it had was a correct label that said "java". Yep. But "java" was assumed by some of the jobs to mean Java 8, and Java 8 is not available for s390x in a version that we are willing to use. Correct. So that's related to labels. I propose that we continue the work we did a few weeks ago for the highmem allocation: for these machines, PPC and ARM, we don't give them the bare "java" label alone; we create labels like java-arm and java-s390x. So if you use "java", you get an Intel machine allocated by default, and you need to specify a specific label, with a dash, for anything else. Okay. And as a reminder, after the highmem change there is also the convention for Windows: you use docker-windows if you want Docker specifically on Windows. I know that creates long labels, but it avoids accidental changes. Does that sound like a good idea to everyone? If it's okay, I'll take that subject; I just want to play with the labels. And for me: I liked the notion of Java's platform independence, but in this case it was too broad. Right; I think platform independence is really cool in Java, but in this case it was assuming Java 8 was available everywhere, and that assumption was wrong. Yeah. And especially in this area where we change things often (we recently changed the JDK, and we're not using the latest LTS on Jenkins in general), I would prefer to provide a standard Intel machine unless you specify another CPU option. It still feels too young, and depending on what we are doing, that could be troublesome for contributors. Makes sense. Okay, that was the last subject for me. Any other subject from any of you? Cool, thanks a lot, and I'm really sorry for being late. Sounds like we kept it to 30 minutes. Yep. I need to publish the meeting notes instead of Olivier now, since he's not available. Mark, I trust you to update the upload on YouTube. Yes. Okay, so the next meeting should be next week as usual. Thanks a lot; have a good day, everyone. Bye. Bye bye.