Okay, welcome to the weekly infrastructure team meeting. First of all, attendees: Aditya and I are the only attendees today. Regarding announcements, not much, it was quite calm, except of course the new LTS that was released this week. So first of all, Aditya, do you have any subject that you would want to talk about? Oh no, nothing, nothing to say. Since it is only the two of us, I might stay superficial on the subjects today. Do not hesitate to interrupt me to ask questions on any subject; chances are we will cover all these subjects again at the next team meeting.

First, right now we are having a maintenance of the repo.jenkins-ci.org repository. That is a JFrog-managed service: they sponsor us by hosting that Artifactory instance in their cloud. For the past two or three weeks we were facing issues with downloads. It sounds like frontend issues, with some artifacts temporarily failing to download in some areas of the world, so it sounds related to their mirror infrastructure. They are migrating the version of the tooling they use and migrating the database as well, so we should have had downtime during the past hours. I have not had time to check, but for sure we will let you know. Everything has been communicated today on the status page and on the developers mailing list, and we will send an end-of-maintenance email to close that afterwards. The expectation is that, starting next week, we should not see any errors when downloading artifacts from ci.jenkins.io or from developer machines.

We had both a weekly and an LTS release this week, so that was pretty packed. A bunch of issues were met for both, and we will have to do a retrospective to improve the next ones. I do not have much information on that part, except something related to the s390x and PPC architectures, which I will speak about later. But we were able to release; it just took longer and more effort for both, so there is room for improvement. Great job to everyone involved in this release, because we were able to deliver and that worked very well.

We switched from AdoptOpenJDK to Eclipse Temurin. I think that is the correct pronunciation, so please pardon my English. Eclipse Temurin is the new home for the AdoptOpenJDK distribution project, under the Eclipse Foundation. That change had some impact, because we are early adopters in this case. We changed only the Docker images, and the team had to rework some elements, because as of today only two operating systems are officially supported by Temurin: Debian Bullseye and CentOS 7. That meant we had to switch to a multi-stage Docker image using the awesome jlink command. The idea is that we create a first Docker image from a base supported by Temurin, extract the JDK and pack it using jlink, and then copy that packed version of the JDK into our own image, instead of just installing a package; see the sketch below. The positive side effect is that we gain hundreds of megabytes on each image that uses this system, so that is quite nice.
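To make that concrete, here is a minimal sketch of the multi-stage pattern, assuming the official eclipse-temurin:11 image and a Debian Bullseye final stage; the jlink module list is purely illustrative (a real image would derive it from the application with jdeps), and this is not our exact Dockerfile:

```dockerfile
# Build stage: a Temurin-supported image, used only to produce a trimmed runtime.
FROM eclipse-temurin:11 AS jre-build

# Pack a minimal Java runtime with jlink; the module list is illustrative.
RUN jlink \
      --add-modules java.base,java.logging,java.naming \
      --strip-debug \
      --no-man-pages \
      --no-header-files \
      --compress=2 \
      --output /javaruntime

# Final stage: our own base image, with only the packed runtime copied in,
# instead of a full JDK package install.
FROM debian:bullseye-slim
ENV JAVA_HOME=/opt/java/openjdk
ENV PATH="${JAVA_HOME}/bin:${PATH}"
COPY --from=jre-build /javaruntime "$JAVA_HOME"
```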
We are still having issues with Alpine Linux, though, because there we had to use glibc, which is not a good practice at all: one should not use glibc on Alpine, for plenty of good reasons, since Alpine is based on musl. But it is a temporary measure, because Eclipse Temurin is currently working with the Alpine Linux developers to get a natively supported image.

Another consequence, and we are not sure whether it is directly related to Temurin or just an issue that it made visible: we had to suspend the build and distribution of the PPC and s390x images because of a bug in QEMU, the emulator we use today. There are two solutions for that. One is using native machines, but we would need two machines for each architecture, one for CI and one for trusted, where the delivery happens. The other: the team was able to track the issue down in QEMU, where a fix has already been released, so we have to wait until that release is deployed on the infrastructure to be able to bring them back. Let's wait for the next weekly; that should bring these images back. I am sorry if you had started to use them, but these two images were not official, they were just new shadow releases, so not much impact there. Still, we are deeply sorry if you were testing these images and were not able to use the latest release. Any questions so far?

Well, I understand all this, I just wanted to know how Temurin is better than AdoptOpenJDK. If you could provide some pointers on that, it would be...

Yes. Eclipse Temurin is AdoptOpenJDK: the whole infrastructure for building, maintaining, and distributing AdoptOpenJDK is moving under a foundation, the Eclipse Foundation, and they changed the name to materialize that change. It is the same code and the same project, just under the foundation's management now.

Okay, got it. So they are starting to move, and right now they just support a couple of images, like these operating systems, and when they finish the move there will probably be more coming up, like the conversation with Alpine you talked about, right?

Exactly. That could take the form of jlink packaging or something like that, with a specialization per distribution. There are long discussions without a definitive answer right now. One option, without consensus yet, would be moving that per-distribution specialization work to the package maintainers, meaning that if you are a maintainer of AlmaLinux, it is your responsibility to provide OpenJDK by packaging Eclipse Temurin for your distribution, instead of having an upstream build.

Got it. So it is kind of sharing the responsibilities differently.

Every distribution project has that kind of issue, so the move to Temurin raised this question again. Honestly, I do not know the supported-platform timeline for Temurin. I feel like we are early adopters, and that is why we had these issues. But that did not really impact the main lines of our images, the ARM and Intel images, and it is already safe ground to test these things before they reach an LTS or something like that.

The next subject is the switch from ACI to Kubernetes agents on ci.jenkins.io. For cost, resiliency, and technical reasons, we started adding Kubernetes agents next to the ACI ones, and we have now completely removed ACI, because it started to behave erratically: we were not able to finish a single pull request on Jenkins core two weeks ago. Cost was also a factor. After two weeks of production usage, so far so good. We have removed everything related to the Docker API rate limit and its secrets, so we have a minimal system working, with auto-scaling of the resources. The next steps will be improving the costs and adding different Kubernetes providers on different clouds for resiliency.

One of the most important elements here is the deprecation campaign for the useAci parameter in the pipelines of all the plugins we provide for developers on ci.jenkins.io. There is an issue for it, INFRA-2919; I am going to add the reference here. It is a non-breaking deprecation for now, and we are going to work on batching pull requests to all plugins to propose the change to the new parameter, useContainerAgent, instead; a sketch of that change follows.
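As an illustration, a minimal sketch of what such a batched pull request could change in a plugin's Jenkinsfile, assuming the buildPlugin step from the Jenkins pipeline library; the parameter names should be checked against INFRA-2919, and the configurations list is illustrative:

```groovy
// Before (deprecated): agents scheduled on Azure Container Instances.
// buildPlugin(useAci: true)

// After: agents scheduled as containers on the Kubernetes clusters.
buildPlugin(
    useContainerAgent: true,
    configurations: [
        [platform: 'linux',   jdk: 11],
        [platform: 'windows', jdk: 8],
    ]
)
```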
A new upcoming change that we will have to study carefully: NGINX ingress 1.0 has been released, which corresponds to the ingress-nginx Helm chart 4.x. So for now, expect a pull request that pins the chart to version 3.x, to keep the current version we have and only benefit from patches and security updates; a sketch of the pin is below. Before upgrading we have to check for breaking changes, especially the support of the new Ingress API, which is mandatory on Kubernetes 1.22; and we need to be on Kubernetes 1.20 to fully benefit from this new ingress. So we have to check what the behavior of NGINX ingress 1.0 is on our current versions. That is a subject to keep in mind, and anyone interested, or experienced with NGINX, is welcome to help us on that part: we need someone able to help us assess the impact and either migrate to that new ingress, or work with us to upgrade Kubernetes before doing the ingress upgrade.
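A minimal sketch of what that pin could look like, assuming a helmfile-style deployment; the release name, the namespace, and the exact 3.x constraint are illustrative, not our actual configuration:

```yaml
repositories:
  - name: ingress-nginx
    url: https://kubernetes.github.io/ingress-nginx

releases:
  - name: public-nginx-ingress          # illustrative release name
    namespace: public-ingress           # illustrative namespace
    chart: ingress-nginx/ingress-nginx
    # Stay on the 3.x chart line (pre-1.0 controller): pick up patch and
    # security releases, but never jump to the 4.x chart until Kubernetes
    # and the new Ingress API support have been validated.
    version: "~3.40.0"
```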
A new subject that I am raising personally: a proposal for a weekly task, 30 minutes each week of mob work. We start a video call, everyone connects, and the goal during these 30 minutes is to bash issues on the Jira INFRA tracker, because we have a lot of issues that should be closed, that are out of date, or that can be raised or bumped again, or triaged. The goal is to tackle this problem so we can go back to using issues as the support for the actions in each weekly team meeting, and we could start going back to triage. This proposal has to be confirmed by at least Olivier and any people interested, for next Wednesday at 1 PM UTC, that is 3 PM in Paris or 9 AM on the East Coast.

Packer images: some work has been done on the packer-images repository. The goal of that repository is to provide virtual machine templates for ci.jenkins.io, and we are in the process of updating it. We bumped Git and some dependencies, and expect a bump to the latest Maven version, 3.8.2, which is a patch version; we are currently running 3.8.1. We are also now going to build the AMIs and all the virtual machine templates on each pull request, because during the past months, when a contributor raised a pull request, it was not really built: when the change was just downloading a new version of a package, we could not test it until the change was merged on master. So it was 30 minutes waiting for that build, and then a second rebuild when it failed, more than one hour for feedback. Our goal is to shrink that feedback loop. The cost is that an individual pull request will take 20 to 25 minutes to build, the time to build the virtual machines and test everything. That sets the ground for auto-updating these images: there is some work ongoing with updatecli, and we are almost there. Expect that before the end of September all the virtual machine updates will be automatic. The next step will be migrating the Docker inbound agent images that are specific to the infrastructure into that repository, so that Packer builds the same environment whether you are running a container or a virtual machine.

Some new topics now, which I will cover rapidly; we will have to discuss them at the next team meeting with more members. I will be on vacation from the 2nd until the 14th, so we should not plan anything unless Olivier is available. The next big topic for September will be cost management. We started manual cleaning on Azure with Olivier, removing old and dangling resources, and we already saw some quick improvements on the costs: this month we have decreased the Azure costs. The AWS costs, however, are increasing each month. So what we want is to remove unneeded resources, and to optimize the costs of ci.jenkins.io by using instances with the same resources but less IO or less network, because it is not always needed. And finally, now that we are able to define agents as code, we should start using spot instances on EC2. These instances can be destroyed at any moment, with a two-minute notice before the instance goes down, but the cost is half or even less. So that could be a great way to save on the tiny instances, especially when Kubernetes suddenly scales up.

One last word about Puppet. The whole Puppet codebase for the Jenkins infrastructure is organized to run tests and ensure that when we make a change, that change can safely be deployed to production. Last month we were able to drastically improve the time between an idea and the change being deployed to production, which is really nice, because we can update the infrastructure really efficiently. But now, upgrading to a recent version of Puppet, tackling CVEs, and providing local reproduction and integration tests are all quite sensitive topics, because they break everything, and we might have to rethink the way we use and test modules. I was not successful after three days of full-time work on that one, so I need help on that topic: if you know Puppet, you are welcome to help. If I do not get any help, the road for me will be rewriting all the tests for our Puppet modules using the new PDK, which means starting from scratch and then migrating the RSpec contents module per module.

OK, that is all. I tried to stay under time. I do not know if you have another question, if there are things that are unclear, or a topic you want to raise. No, nothing as such, it was pretty clear to me what is going on, thanks. So then, I am going to close the meeting and pause the recording, and I hope that next week more people will be there. Yeah, holidays for the luckiest ones, good for them. I think next week you will be on holiday, right? No, that will be the only day I will be present, so I should be here as a spectator; I will be working only that day, because I have other company meetings that I want to attend. So I will be there. OK, so see you next week, thank you. See you next week then, thank you so much, take care. You too, have a nice week.