Hello everyone, welcome to the Jenkins Weekly Infrastructure Team Meeting. It is the 6th of June 2023. Around the table today we have myself, Damien Duportal, we have Hervé Le Meur, we have Stéphane Merle, Mark Waite and Bruno Verachten. Hello folks. Let's get started with the announcements.

Weekly release. First important information: we will have a dead Jenkins core weekly release, 2.408; despite being tagged on the repository, it doesn't have any artifact associated with it, and version 2.409 is being released instead. Too long, didn't read: we had an issue with the LDAP, obviously with Murphy's law striking at the worst moment of the release publication, which left us with a partial release. So instead of trying to fix that and forgetting half of the things, it has been decided, with the help of the release team (the usual release suspects, as I call them), to trigger a brand new release and declare 2.408 dead. Easier. And we should have a result in one hour or less. Thanks everyone for the support on this one. Yeah, that's all. We are watching it, and then the usual steps.

Another announcement: in the upcoming days we are going to proceed with a migration of the LDAP and the get.jenkins.io mirror, our public services, to a new cluster. The date and time will be announced, but I think that's important enough to mention, because some of the services here could be critical to end users. That's part of us trying to get rid of the old overlay network. Do you have other announcements? Nope. Oh, actually, yes: get.jenkins.io is going to be served by Docker containers. Yes. I don't know if that belongs in announcements, but it's part of the migration. Yeah, we'll talk about it later; it will be announced, the get.jenkins.io Docker image. Yeah, I don't know that it justifies an announcement, Damien. Okay.

Now the upcoming calendar, unless you have other announcements. One, two, three. Okay. The next weekly will be 2.410; that should happen 13 June 2023, next week. Is that correct? Yes. For the next LTS release, I haven't looked at the calendar: 2.401.2. And it will release on, let's see, I believe release candidate in one week and final release in three weeks; I'll give you a date in just a moment. So final release date 28 June, in three weeks. 28 June. Yes. Okay. I haven't looked at the advisories, but I haven't seen any mail in my inbox, so yeah. Yeah, no announced upcoming advisories. Okay. And no next major events known. Okay. Anything else to add?

Let's get started on the tasks. We were able to finish the ARM64 node pool, to start using ARM64 pods. Stéphane, what is the wonderful news in the ARM64 area? We got another node pool with ARM64, in availability zone 1. And it's important, because if it's not in zone 1, it's not working. And I also moved the javadoc.jenkins.io application to that ARM64 node pool. So it's working great: officially javadoc is now served by an ARM64 system. Yeah, so far so good. Congrats. Yeah. And helping ARM64 to become known by more people is a good thing from my point of view, of course. Yeah, congrats Stéphane. Do we have an issue already, or are you okay to create one if we don't, that will list the upcoming candidates for that migration to ARM64? Because I'm sure we have other services there. I will really need your help to find them, but yes, we can do that. We don't have one, no. Okay, so issue to open. I don't think so, I don't remember. So your role is to check if we already have one.
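To help build that list of candidates, here is a minimal sketch that asks Docker Hub whether an official image publishes an arm64 variant; the endpoint and response fields are my assumption about the public Hub API, so treat it as a starting point rather than the team's actual tooling:

```python
import json
import urllib.request

# Hypothetical helper: check whether an official ("library/") Docker Hub image
# publishes a linux/arm64 variant for a given tag.
def has_arm64_variant(image: str, tag: str = "latest") -> bool:
    url = f"https://hub.docker.com/v2/repositories/library/{image}/tags/{tag}"
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    # Each entry in "images" describes one architecture of the multi-arch tag.
    return any(img.get("architecture") == "arm64" for img in data.get("images", []))

# Illustrative candidates only; javadoc.jenkins.io is served by the official nginx image.
for candidate in ["nginx", "httpd", "postgres"]:
    print(candidate, "arm64:", has_arm64_variant(candidate))
```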
And if we don't, then you will work together to select the others. The criteria will be: already migrated to the new cluster, which requires a synchronization with Hervé just to be sure we don't step on his toes; we can wait for the end of the full migration if we don't feel comfortable with that. The second criterion is having an official ARM64 image: in that case javadoc uses the official nginx image, and that's all, we don't build a custom image for this service. So any service using an official image that has an ARM64 version out of the box is a good candidate. Oh, I thought we would put every service in and add sub-issues for those that need work on the container, but okay, we filter to only those that we can already move. Okay. We can get the full list and exclude the others. It's the same thing as the pipeline matrix access. Thank you for reminding me of that; the point is to list potential candidates for using ARM64 in production.

Another note here: evaluating the cost difference will be hard, in the sense that we know the cost of the former cluster, but the cost of the current cluster is still moving given it's being worked on right now. So it will be hard to really have an order of magnitude until we have finished everything. And it will be hard to distinguish: do we pay less because of the work done on selecting the proper node pool topology, the proper network, the proper machine size for the node pools, or is it related to ARM64? The answer will be: because of both. But that will be hard to distinguish. For now we will pay more, because there's only one service on that node pool, which for one server is huge compared to what it has to run. True, absolutely true. But we know that one machine is clearly cheaper, so in any case that will be interesting. Do you have something else on the ARM64 subject? Okay. The main question is: does Stéphane start now, or do you prefer waiting for the full migration for the other services? How do you want to play it? I don't know, I think... Okay. So Stéphane, you really have to sync with the rest of the team when you do operations, because as you saw, we might have surprises when migrating to ARM64 pods, node pools, sorry. Just to be sure that we don't interfere. The priority goes to Hervé's migration if there is a conflict in terms of timing.

Next issue: cannot create an account. Sorry, the Belgian Air Force is training Ukrainian pilots on the F-16, so yeah, they are playing around at low altitude. I closed the issue because we never heard back from the user. That's the user who claimed that they never received the emails. But as per the work that Hervé did here with Stéphane, we see that Mailgun or SendGrid, I don't remember which, but the email system that we gained access to, sends the email to the email provider. So we repeatedly asked the user to either use another email domain or contact their email administrators to see why the emails are not delivered. But as we saw, the email is sent and acknowledged by the email server of that domain. So there is nothing else we can do about that. And most of the time the user answers with a two or three week delay. So yeah. As we said, they can reopen, but I don't see anything else we can do. Do you see another action that I could have missed? Because email is not really my knowledge area. No, you can't see the delivery status; there is nothing else you can see or do. I think we spent more than enough time with this user. They received some mails, and they might have received the last one but didn't check back.
So we gave them two paths they can follow to get there. It's a pity, because maybe that user wants to contribute or open issues on the Jenkins project, and it's slowing them down or blocking them from doing so. But I don't see anything else we can do here. So let's see. At least the outcome of your work, Hervé, in that area, with the help of Stéphane, is that we're now able to observe whether the emails are sent or not. So good job team, that's still really useful.

Remove translation plugin from ci.jenkins.io. I might have closed this one too quickly, I just want to check with you. I think you removed the plugin from ci.jenkins.io, right? Yes, but I didn't have the opportunity to restart the Jenkins instance, which I still owe, because every time there were some core or some BOM builds running. Okay. In any case, it has been restarted a few hours ago and the plugin is not marked as installed as of this morning, so the change was applied. So thanks for taking care of this one.

Puppet master: migrate the virtual machine to Azure. A surprise migration that wasn't planned at all last week. Summary: as part of the planned Ubuntu 22.04 campaign, we updated the OSUOSL machines, with the plan of trying Edamame and Lettuce first. That went well. We started to upgrade the Puppet master VM, which failed and never restarted. We had to ask OSUOSL for help. Deciding really quickly, we were able to finish the update to Ubuntu 22.04, only to discover that the Puppet Enterprise master doesn't support Ubuntu Jammy. The open source version does, though. And by mistake I had checked the Puppet open source agent and controller requirements, assuming that the enterprise version would have the same requirements. That's not true. The virtual machine had been upgraded to the latest LTS version of Ubuntu, and downgrading it back to Bionic, no, not Bionic, Ubuntu 20.04, wasn't possible, or was too risky. So the emergency choice, because we weren't able to deploy new configuration changes across our virtual machines due to that problem, after discussing quickly with the team, was that I took the decision to create a new virtual machine on Azure to host the Puppet master. That machine is using Ubuntu 20.04. It should be the only machine to stay on Focal, because the goal was also to stop using Bionic as much as possible since it's out of date. That machine has been created. The good outcome of this one is that, since we have sensitive credentials on that virtual machine, we now control these credentials on an Azure virtual machine. And that leaves the former virtual machine on OSUOSL free to be used, like Lettuce or Edamame, for something else; they are not used right now.

One of the proposals I have, and I'm raising it right now, is to tell OSUOSL that we ask them for a snapshot of these virtual machines and they can then release the machines, to avoid consuming resources on their cluster. And if we need resources on that side, we send them a request for brand new machines. That would avoid us having legacy machines with legacy systems that have been upgraded over the years. That would release the resources for them so someone else can use them on another open source project. And if we need more, we can ask them. But that's to play the boy scout rule, to clean up in a better way, because it's been two years that we say we have these machines doing nothing and we haven't done anything, because we don't have the time or the resources. Would that be something viable for you? Yeah, plus one for me. That will be fewer machines for us to manage, less code to manage.
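If OSUOSL can hand us a snapshot image before releasing the machines, here is a minimal sketch of pushing it to the encrypted archive container mentioned below, using the azure-storage-blob Python SDK; the container name, environment variable and file names are placeholders, not the real setup:

```python
import os
from azure.storage.blob import BlobServiceClient

# Connection string for the archive storage account (placeholder environment variable).
service = BlobServiceClient.from_connection_string(os.environ["ARCHIVE_STORAGE_CONNECTION_STRING"])

# "archives" is a placeholder container name for the encrypted archive bucket.
blob = service.get_blob_client(container="archives", blob="osuosl/puppet-master-2023-06.img")

# Stream the snapshot image from disk; overwrite if a previous upload was interrupted.
with open("puppet-master-2023-06.img", "rb") as snapshot:
    blob.upload_blob(snapshot, overwrite=True)

print("Snapshot archived:", blob.url)
```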
And since Bruno and I have a task to contact OSUOSL to ask them for, let's say, exotic CPU machines, that will be a sort of trade-off: we let them release resources and we use something else. I propose we do it in two different email threads though, because ideally, if they can share with us the snapshots of the machines before deleting them, we could put these snapshots in the archive bucket on Azure that Hervé created, which is encrypted. So in any case, if we need an old, old thing for whatever reason, we would have the file system of these machines available. Does that look good for you, folks? OK. Yes. So I'll take care of opening an issue: release Lettuce, Edamame and Radish, with their snapshots sent to the archives. Any question? No? OK.

Getting unauthorized with the CD setup, with the CodeBuild Cloud plugin. That one is related to the trusted.ci migration that effectively happened Friday. A user was blocked because it took more than the three or four hours before expiration of the CD credential to JFrog. There is a job named repository-permissions-updater in trusted.ci that runs regularly, whose role is to update and rotate this credential. Since we were migrating the instance and we took more time than two run intervals of that job, it had an impact on users, which was already explained on the mailing list and on status. So thanks to everyone for guiding the user in that direction. At the end of the migration we ran the job successfully and the user confirmed they were OK. So no problem; that was an outage, sorry for the inconvenience. That was short-term notice for that migration: we only announced it in the morning for the afternoon. So room for improvement on the announcement next time. But otherwise there is nothing special there, unless someone has a proposal or something to point out here. OK.

Related to the trusted.ci migration to Azure: ci.jenkins.io plugin BOM agents are not allocated. That issue has been closed. But what we tend to see is that when the ci.jenkins.io controller restarts, it looks like there is a block somewhere inside the controller that forbids Jenkins from trying to create new pod agents. Stéphane did a completely exhaustive job of checking whether it was related to the spot virtual machine allocation in the Kubernetes cluster that supports the BOM builds. It wasn't. We didn't see any problem on the Azure console: there weren't any limits hit in pod quota, CPU or memory, no errors were happening on the autoscaler, and we didn't see any issue on the lower-level infrastructure. But what we saw is that Jenkins stopped trying to create pods after a certain amount of time, and after the restart tried nothing, while the other jobs were still creating new pods. So that could be related to a behavior of the Kubernetes plugin, or of the BOM builds. These are the two suspects, because right now we have three Kubernetes clouds set up on ci.jenkins.io, and each Kubernetes cloud has its own OkHttp thread pool for the Kubernetes client. So either the problem is located in the way the Kubernetes client works inside the plugin, and that could explain why that cloud, somehow facing a mix of activity and a restart, breaks something. Or it could be related to the setup of the BOM build jobs, which might not be set up to survive a controller restart, and in that case it might be a bug in the way the agents and the processes are stopped and released. So there is something weird, and we don't know whether there is a lock or another issue here.
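When that hang shows up, grabbing a thread dump before restarting is what helps later analysis; here is a minimal sketch of fetching the controller's /threadDump page with an admin API token, where the URL, user and token are placeholders:

```python
import os
import base64
import urllib.request

# Placeholders: controller URL and an admin user/API token pair.
JENKINS_URL = "https://ci.jenkins.io"
USER = os.environ["JENKINS_USER"]
TOKEN = os.environ["JENKINS_API_TOKEN"]

# The /threadDump page requires administer permission; basic auth with an API token works.
request = urllib.request.Request(f"{JENKINS_URL}/threadDump")
credentials = base64.b64encode(f"{USER}:{TOKEN}".encode()).decode()
request.add_header("Authorization", f"Basic {credentials}")

with urllib.request.urlopen(request) as response:
    dump = response.read().decode("utf-8", errors="replace")

# Keep the dump on disk so it can be attached to the issue and analyzed after the restart.
with open("ci-jenkins-io-threaddump.html", "w") as output:
    output.write(dump)
print("Thread dump saved:", len(dump), "characters")
```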
It's hard to pin it on the infrastructure at first sight, but maybe there is something non-obvious here. I propose, as we mentioned on the issue, that when we see such a problem we collect a thread dump of the controller; as we saw during a team exercise earlier today, collecting a support bundle from the top-level item on the left is enough, because the thread dumps are present in it. Then we keep these bundles and analyze them afterwards. When you see that problem, a controller restart, or a reload from disk, which is safer, unblocks the problem most of the time. If the reload doesn't work, don't hesitate to restart the controller: add a message, a shutdown message, restart, and then it should start the BOM builds again. That one coincides with the LTS upgrade of the controller; that might be the low-level issue: changing the core version failed to restart the builds after the restart. Do you have anything else to say about this one? It's closed, because the problem after a restart was solved and the BOM builds are running again.

Invite Mustafa to the Jenkins infra plugin health scoring project. Thanks, Hervé, I think you took care of this one. I think that's a usual operation. Is there anything to say on this one? Thanks. We had an LTS release last week; of course we upgraded all of our controllers less than 24 hours after the official release, thanks everyone involved in this. We had an issue related to accounts: someone made a mistake, or did not answer, and was trying to reset the password of an account that doesn't exist. It looks like a mix-up between our system and someone else's system. Thanks, Hervé, for taking care of this one.

And now back to the tasks that we are working on. For each task, as usual, we have to state if we can continue working on it in the next milestone. I'll need to balance the amount of tasks I'm working on because Friday I might be off. I don't know if some of you have half days off for the upcoming milestone. Yes, I'm on PTO from the 15th of June to the 20th. 15 to 20. Okay, so that shouldn't impact the next milestone but the milestone after. Is that correct? Yeah. Just wanted to be sure I understood, because I'm bad with numbers in English. Cool. So good to know for next week.

Okay, so let's get started. ci.jenkins.io failing for a Jenkins plugin after changes in the Jenkinsfile. There is a weird issue here. A user is opening pull requests and was set up as administrator of the plugin repository, but their pull request was still seen as untrusted by ci.jenkins.io. I'm sure we missed something obvious. So we replayed, multiple times, pull requests that changed the Jenkinsfile, because their goal was to test that Jenkinsfile change. So they were blocked, hence their need for asking. Initially they didn't have the correct permission, and they weren't using the correct Git commit setup; that has been fixed, and there is still an issue. If they merge the pull request, their problem should be gone. And Tim has invited the user, so the user should be able, at least after merging the pull request, to try something else again. Now that the pull request marked the user as untrusted, there is no way this can be changed, as per Tim. I'm not sure myself: I remember that we were able to just rescan the repository, and that should change the behavior, rescanning pull requests, but Tim says no. So I might have a confused memory of what we did on infra.ci a few months ago.
So the proposal is: we let the user merge to the main branch, create a new release, and then open a new pull request with their new permissions to see where the problem comes from. We checked it's unrelated to yesterday's GitHub issue; this issue has been happening for two weeks. Most probably we have an edge case where the user is a direct admin of the repository but outside the Jenkins organization; maybe GitHub reports a wrong collaborator status when it's scanned by Jenkins. But that's just a wild assumption, I don't really know what is happening; we don't see any obvious error here. So I hope what Tim did by setting the proper permission will help and will unblock the user. So I propose that we add this issue for watching only; I don't think there are any actions expected. So we add it to the next milestone. Is that okay for you? Okay. Let's watch the result of the permissions once the merge is successful.

Can you give us a status of what happened on the migration to the publick8s cluster? Yes, let me check. We have only four services remaining to migrate: the LDAP service, currently in migration, and then we have mirrorbits and the public sites, jenkins.io and the plugin site. Okay, so four services left. Cool. Remaining to migrate. Almost there then. Yes: jenkins.io and the plugin site. So that means all the others were migrated successfully. Yeah. Go ahead. Yeah, I'm looking at the issue to remind me which ones have been migrated since then. Okay. I think I did some and then I handed over when you came back from your PTO, is that correct? Yes. So uplink was migrated, but I think we mentioned it's not... I couldn't... and reports... Reports and accounts were okay, and uplink, which is the one you handed over to me. Okay: uplink, accounts and reports were migrated with success.

So right now you are working on LDAP, as we mentioned earlier during the meeting. Yes. We recreated a new storage account to store the LDAP backup, and doing so we ran into some issues with the configuration. And by the way, don't forget to close the status incident once you have finished later today. Yeah. I want to deploy LDAP on the new cluster before that. Yep: new LDAP installation required. Oh, what did I do? I changed... So as you say, the file storage, that is important, I think, more for Stéphane to know: we created the storage account and, as Hervé found, we also need to specify a file storage, which is a sub-element of the storage account. Storage accounts allow you to create buckets, which are blob storage, but they can be of different types, and we need the specific type of storage, file storage; file storage maps to an SMB mount in the current setup. The alternative being an NFS file storage, but on premium storage accounts. So: the new installation requires a file storage sub-element of the storage account to be created. Yes. I'll share. Good point. Okay.

So now, with the new release on publick8s, I'll try to restore a recent existing dump. If it works, I'll continue the migration by switching accounts off while working on the next step, so no writes will happen on that database, and I will be able to do a backup of the existing one and restore it to the new one without any write or modification in between. All right. Then I'll switch back the CNAME of ldap.jenkins.io to its correct location. Perfect, good to go. That will be tomorrow. Don't forget I will be on the road tomorrow, so let's over-communicate, but that looks good.
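For the dump-and-restore step, here is a minimal sketch of what it could look like with the standard OpenLDAP tools wrapped in Python; the file name is a placeholder, and it assumes slapcat/slapadd are run where the old and new LDAP instances live (for example via kubectl exec into the containers), which is an assumption on my side:

```python
import subprocess

DUMP_FILE = "ldap-backup.ldif"  # placeholder path for the LDIF dump

# On the old instance: export the whole directory to LDIF while no writes are happening.
subprocess.run(["slapcat", "-l", DUMP_FILE], check=True)

# On the new instance (new publick8s deployment): stop slapd first, then load the LDIF.
subprocess.run(["slapadd", "-l", DUMP_FILE], check=True)

print("LDAP data restored from", DUMP_FILE)
```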
Any other question on this one? Okay. I propose that obviously we keep that one in an upcoming milestone.

Next one: ci.jenkins.io, use a new VM instance type. The goal is to transplant every change, up to the last one, that has been done on the new trusted.ci VM on Azure, to the new ci.jenkins.io virtual machine that is not used yet. Once validated and bootstrapped, there will be a requirement for an initial data copy. I tried to take a snapshot of the current ci.jenkins.io; I was able to create a new data disk from it and mount it on the new VM. But the Terraform definition of the data disk requires you to reference the snapshot: the data disk definition explicitly states that it comes from the snapshot, which is not the easiest way, because afterwards we want to delete the snapshot, since we won't need it anymore in a few weeks once migrated. And that would leave the data disk in a state where removing the snapshot recreates the data disk, because the reference doesn't exist anymore. So it would be recreated empty. That's really weird behavior, but yeah. So what will be done there is that I will use the snapshot for a temporary second data disk, mounted manually on the virtual machine, and I will run an rsync locally. That rsync will be fast. There is no kind of merge between the snapshot and the hard drive: you create a hard drive, a data disk, from a snapshot, but that data disk must be managed as code, and in the as-code definition there is a property which is either Empty or Copy. If you try to import the data disk created from the snapshot, Terraform says that attribute must be Copy, because it has been initialized with that attribute as Copy, and that attribute is immutable. So in Terraform you need to define Copy, and setting Copy requires adding a second attribute, the source ID, which will be the ID of the snapshot. If you remove the snapshot, the definition is checked and it does not exist anymore. So yeah, that's why: let's create a brand new one, do an rsync, and that will be okay. Yeah, more secure. So yes, I keep working on that, that's my top priority. The goal is to have a new virtual machine for ci.jenkins.io as soon as possible. Data disk and rsync to do.

And related to the artifact caching proxy and the reliability issue: we will need to migrate the inbound agents to the new network subsystem to follow the network changes of the instance. That would need to be done before migrating fully. So, the Azure agents: move the inbound agents to the new subnet. These are the two required tasks before planning and announcing the migration, which will most likely happen in two weeks.

Migrate trusted.ci to Azure: that has been done successfully. There is one last task; there were two last tasks as of this week, and one has been done: the network has been restricted and checked everywhere. The last element, which might become a subsequent issue, is to select the proper size for the ephemeral agent virtual machines, to use the same as ci.jenkins.io, which we changed a few weeks ago. Because right now we are using costly instances, and we don't have the same quota in the new region where the new VM is. Before, the AWS trusted.ci was spawning ephemeral agents in US East, but now we need to be on the brand new subnet and everything is in US East 2, and the quotas between regions are different. So that should be a Puppet configuration change, one test, and the issue will be closeable. It's one small update: VM size and quota, or type, for US East 2.
So yes, nice work, Stéphane, and thanks for the handover; we were able to land this one. It's working well since Friday. So now we will have to watch the requests from the security team to access trusted.ci, because they will need to provide their public IP. That's a new change from the former instance. The VMs have been stopped on AWS but not deleted. I propose that we wait until the next release, either LTS or a security core release, so that will probably be end of June, and then we delete them in July. Any objection? Okay. So that one is obviously on the next milestone; there's only one last change.

Ubuntu 22.04 upgrade campaign. Right now for this one we have one candidate, which will be search.ci.jenkins.io. The proposal is to do an in-place upgrade, because it's on Azure. So the safest way is taking a snapshot of the system, since we control the virtual machine, then we do the upgrade and we change the Puppet infra config to use the Ubuntu 22.04 Docker setup. And that should be okay. I propose, if anyone is interested, to pair on this one, because I did most of that work alone and not as a team last week, so better to work on it as a team.

Upgrade to Kubernetes 1.25. I've paused this one due to the Puppet, trusted.ci and LDAP migrations. I propose to put that one back out of the milestone, except for reading the changelog and preparing the next step; no one will have much time for this one. And I propose that we go back to starting the cluster upgrades in two or three weeks, depending on our availability, because if Hervé is on PTO, maybe we won't have time, Stéphane, but maybe we can start with the DOKS cluster eventually. AKS will stop offering Kubernetes 1.24 end of July; I don't remember for DigitalOcean, I guess it will be end of June. So yeah, back to the backlog.

Hervé, did you have time to spend on installing and configuring the Datadog plugin on ci.jenkins.io? No, but I will have to borrow some of your time to check on that and then we'll have it. Okay. As Stéphane and I saw, we might need to prioritize this task once LDAP is done, because there are a lot, and when I say a lot, it's a lot of error logs on ci.jenkins.io due to the plugin trying unsuccessfully to contact the agent. Maybe I can try to change the Datadog plugin configuration on ci.jenkins.io to see if I can... If you have a solution for that, yes. I'm not sure it will be that easy. Are we using the plugin? Because the simplest case would be uninstalling it. But if you find just one configuration setting that stops the plugin from contacting the agent, that might be worth it to avoid triggering a restart. If you need time, we can work on this Thursday. It should be okay for me. Okay, let's sync on this Thursday.

Support Linux containers when running on a Windows VM. In my initial manual tests I wasn't able to have a running Docker Desktop: I was able to install Docker Desktop, but it was always failing to start the Linux WSL backend. I hadn't tried installing WSL beforehand; I was assuming that Docker Desktop would install it, and I guess that's the problem I had in the manual tests. Maybe I should just install WSL, spin up a Debian, and then start the setup. Back to the backlog for me, because I have too many things and I haven't spent much time on this one, unless someone wants to try it by themselves. The goal is to have Docker Desktop up and running on Windows Server 2022, ideally as code with a Packer image. No volunteer? I won't be able to work on this one because I would need a Windows machine. That's a closer feedback loop to try things for me.
Yes, I installed Windows Server on my desktop. But anyone wanting to work on it is welcome.

Artifact caching proxy reliability, related to the ci.jenkins.io agent-on-VM migration to a new network: still progressing. I want to point out early that the ACP is still working well, as per Hervé's work on Datadog, which allows him to parse the logs and check the amount of data that is served by the ACP instead of the repository. We are at between 8 and 12 terabytes per month, approximately, of data that does not hit the upstream. It was between 4 and 5 terabytes last week. Is that correct? Yes. That's a real metric. That's a lot. The ACP is still doing a lot of work and saving precious bandwidth for JFrog.

I didn't have time, and we absolutely need to help the user: add JitPack to the available repositories. In order to unblock the user, we need to add a new exception on the ACP for that repository ID. I don't remember if it's that simple; I think there were some details. Is it okay just to evaluate it? The binary was published to JitPack, so we have an external artifact repository, and it works for local builds as it did until... Oh, it's probably that, but I don't remember if there was some message below. So they tried to use the JAR dependency trick, where you point Maven to a local directory that is used alongside the other repositories: if you follow the naming convention that we added, it will be ignored, won't use the ACP, and will use the local files. But yeah, the issue is that the user wants to upload a JAR to Artifactory and they don't have the permission. Yes, they propose to use it as a proxy. That would mean adding one more mirror, but right now we are trying to restrain the number of mirrors. So the proposal is to add instead a new exception for that repository. That means defining, eventually, an ID, a conventional ID. Yeah, I would have to propose a pull request to the plugin to use a local repository, a repository named local, like we've done previously. What do you think about this, since we can use patterns on the exceptions: first I propose we add JitPack to unblock them, and then we should be able to propose a convention for developers. If the ID of the repository in the POM XML is non-cached-something, which could be "external" or "non-jenkins", we could use a generic pattern so that anything with that prefix will automatically not be cached. What do you think? Yeah, there is a probe: I've suggested a probe to the plugin health scoring for detecting third-party repositories. So we could use that to see and check what these plugins are using as an ID for their external repositories. Yep, to avoid the ACP caching third-party repositories. Let's add JitPack as an exception in the short term, plus a proposal for a plugin health probe for third-party repos. Is that okay for everyone on this one? Is there already a request for this proposal? Okay, can I let you update the notes with the pair? Okay, are you okay to take this one, Hervé? That's okay for you? Cool, thanks. So I'm adding it to the next milestone.

Assess Artifactory bandwidth reduction options. That one was opened by Mark; I forgot about it. So we will need to plan brownouts. A brownout is a planned blackout of a service for a short amount of time, to see if everything explodes, or to identify what could be a problem that we weren't able to identify at first. The first brownout will be tried on JFrog, on the Git proxy repository. It's a mirror of a real-life repository: you use Artifactory in front of this repo.
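To verify that kind of brownout, a quick anonymous check against the public virtual repository is enough; a minimal sketch, where the artifact path is a placeholder to replace with something the Git proxy repository actually serves:

```python
import urllib.request
import urllib.error

# Placeholder artifact path: replace with a path known to be served through the git proxy repo.
ARTIFACT = "https://repo.jenkins-ci.org/public/org/example/some-artifact/1.0/some-artifact-1.0.pom"

# No Authorization header on purpose: we want to know whether anonymous access still works
# through the virtual repository once the underlying proxy repo is made private.
try:
    with urllib.request.urlopen(urllib.request.Request(ARTIFACT, method="HEAD")) as response:
        print("Anonymous access OK, HTTP", response.status)
except urllib.error.HTTPError as error:
    print("Anonymous access broken, HTTP", error.code)
```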
We want to set it private and check if it's effectively still available, without authentication, through the public virtual repository. If that works, we might want to switch all of the mirror repositories, such as the repo1 Maven mirror and all the others, to private, to forbid users from accessing these repositories directly, to forbid users from using us as a free mirror. Don't feel too safe on this one, because once users discover they can still use repo.jenkins-ci.org/public, since it's a virtual repository that includes all of these dependencies, part of the traffic will shift to that repository. But the goal, since we don't need it... I mean that the person who does that won't be doing it accidentally, whereas the users hitting the mirrors directly might be doing it accidentally. So better to shut down these accesses, have a centralized point, and then iterate and see the impact on the bandwidth reduction. So we will need to announce the blackout. Ideally, the Git one should happen this week, planning the repo1 Maven one for next week, and the others after. So we should be at the middle of June with the first feedback to give to JFrog. Good for everyone? So I will update it, I will comment on it after a last exchange with Mark, and then if it's okay we will proceed most probably Thursday, afternoon here, morning for the US. That's the proposal I make here. Good for everyone? Proposal of a first brownout Thursday, the 8th of June, during the morning, US time.

Matomo GitHub Docker repo: I assume it's related to a new issue. So Gavin proposes to help switch from Google Analytics to a self-managed Matomo instance, to collect the visit metrics from plugins.jenkins.io, stories.jenkins.io, and eventually other static websites. So we dug up the work he tried and never finished a few months or years ago. The idea is to have our own instance of Matomo. I'm not sure why we need a custom Docker image, unless Matomo doesn't provide one. We have another issue in triage state, which is about installing the Helm chart. If it's okay for everyone, we will update this one and link everything, because the other one is focused on Google Analytics. Yep. I need to clean up the issues and follow up on the initial assessment from Gavin. We need to assess the amount of storage, the requirements for the database, the requirements for the entry points, whether we need the backend and the frontend exposed, just to be sure we use the correct ingress. Gavin already gave us the required answers, so we need to assess them as a team, just to be sure we use the correct storage and persistence settings. Is there any question about the goal of Matomo? No. Is there any question about the service in itself, self-hosting instead of Google Analytics? No. Is there any question about the subject at all? I will not ask any question about the name. Yeah, I discussed with you quickly about other alternatives, like Plausible, and good points. But I think, since Gavin already knows Matomo and is willing to deploy it, we should stay with Matomo. Maybe others can come later; sure, to evaluate. It won't be a lot anyway. Other alternatives: do you have other names? Because I don't know that area. Right now, Plausible; I don't remember the others. I had two or three in mind before, but I just remember Plausible right now. Are these alternatives self-hosted or hosted? As we have enough trust in Gavin's work, and he has used Matomo for the past 18 months for the plugin site, in addition to our Google Analytics, we can proceed.
Does that capture it? What would you say, Hervé, is that okay for you? So that means installing a new release of a service, and it looks like it will be installed on the new publick8s cluster. Yes, we have to check if it can work with PostgreSQL and not only with MySQL, so we can use the current PostgreSQL server we are using for other services to host its database. Yep, ideally use our existing PostgreSQL; worst case, we still have a flexible MySQL server instance on Azure. Worst case. That would be a fallback if it doesn't work. Bruno, Stéphane, just for you: I'm not sure it will be a good candidate at first sight for ARM, because it's running PHP and I have absolutely no knowledge of PHP support for ARM64. It might work, it might not, I don't know. So if you want to try it on ARM64, we'd have to check with Gavin, but I guess Gavin is using DigitalOcean since he works there, so I'm not sure he has ARM64 virtual machines for his tests. That could be interesting to evaluate, at least for the front and middleware stages. Yeah, of course, I've got some ARM64 machines on the Oracle Cloud Free Tier I could use for all the tests. So why not? Yeah. It's not mandatory, it's not urgent, it's just a bonus, pure bonus. The official Matomo Docker image is ARM compliant. Oh, nice. So that could be interesting. Only MySQL is supported? Yeah. Oh crap. Okay. So we will assess it in detail, but that means we will need to create a MySQL flexible server instance in Terraform on Azure. And we should update the issue saying: maybe a good candidate for ARM64. Other questions about Matomo? Okay.

Recent plugin BOM releases fail after an unexpectedly long time. I think that one was fixed; I thought it was fixed. Oh, it's about the build time. Okay. So that one requires the work on the ci.jenkins.io migration before it can be evaluated again. So I propose that we move it to the backlog, unless someone wants to debug it, of course. Back to the backlog; requires the ci.jenkins.io migration. Okay.

Let's check together the new issues marked as triage, if it's okay for you. Oh, I was particularly talky today. Four new triage issues. We had the ci.jenkins.io repository scan fail with a stack trace. That one should be closeable; I'll take care of it. Yesterday GitHub changed something in their API, and all the Jenkins instances in the world using the GitHub Branch Source API, so native GitHub, were showing that error when the technical user set up for the GitHub scanning, whether organization or multibranch, had the maintain status. I'm not sure if it's a user with the maintain status or the maintain status itself; I've seen two different cases, so I'm not absolutely sure of that one. But we had an issue that was on GitHub's side, and they were really efficient: once the error was reported, so always report errors when you have such a problem, they rolled back the change and they took additional measures so that it won't happen again. So the issue is closeable as far as I can tell. But I removed the triage label; it was part of the milestone this week.

Package availability dashboard is empty. I think it's a consequence of the cleanup that has been done on Datadog: we have a package, sorry, a dashboard, that is relying on a metric that doesn't exist anymore. Is it because there is a specific label? Is it because we changed the metric? Because we stopped collecting that metric? I don't know, this has to be checked. That's a public dashboard, so it's a dashboard on Datadog that has been published and can be seen publicly.
I think it's on status.jenkins.io. So it's not high priority, but if anyone has time to check, I propose we add it as a bonus on the coming milestone. Is that okay for everyone? I will let you assign it to yourself if you have time. Yes, I'm removing triage for this one.

I've opened an issue while trying to migrate uplink. I realized that we have a server which is a single server that cannot be replicated and that has a heavy database: it takes seven to eight hours to dump the data and three to four hours to restore it, on a highly optimized pg_dump, on a machine with 16 CPUs all used at 100% during the dump, so highly parallelized. Mainly because of the structure of the data: there is one big table with tons of records, so there is no way you can parallelize. We could try to create the proper indexes in the future to improve dump and restore times, but the most efficient way will be migrating to a flexible server with the server-side migration tool. For 80 to 100 gigabytes of data in the database, they claim a one-hour-only migration with the service being shut down. Clearly way more efficient than us doing a client-side pg_dump. The proposal is that we work on migrating uplink, with a planned outage, to a flexible server, and eventually either migrate the database from that flexible server to our current instance, or we can directly migrate to our current instance; I haven't checked in detail if both are possible or if we need a two-step process. Why migrate from flexible to flexible? Because these instances migrate from the current server type to flexible. Yes, but the flexible server allows you to create replication between multiple flexible instances. So that's why I assume the two-step process: first we create a new flexible one, and then we migrate it to our current one. But eventually the migration tool allows doing it directly if you already have a flexible instance. I propose that this one goes to the backlog, because we have to finish the publick8s migration first. Unless someone is ready to take this one; how do you feel about it? Oh no. I believe this one will be a better fit during the summer, with less activity. Oh yeah, that's right. I'm removing triage. Is there any question about this one? Are you sure there is less activity during the summer? Because people will have time to do some open source work. Oh, I mean for us, not the users, because in any case that will be a one to two hour outage. So yeah, but good point. Got it.

Another one: expiration of the DigitalOcean PAT. I'm taking this one since I'm the only one with the MFA access, alas. So if it's okay for you, I'm assigning it to myself and I will do it later today. Good for you? Yes. Removing the triage. Do you have other new issues or things to add on the pile? Okay, there's a lot of triage to be done here. Okay, something else to add to the upcoming milestone? Something else to say? Okay, so I'm stopping the recording. So for people watching us, see you; stopping the screen sharing first, now stopping the recording. See you next week.