OK, hello everyone, welcome to the Jenkins infrastructure weekly team meeting. Today is March 5, 2024 — no, wait, it's not the 23rd of March, what am I saying? Around the virtual table we have myself, Damien Duportal; Hervé Le Meur; Mark Waite is off; Stéphane Merle, Bruno Verachten and Kevin Martens are here. Did I forget anyone? No? OK. We start with the announcements. The weekly line is at 2.448. The packaging part is OK, but we had some agent misbehavior: it's the first time we lost contact with the agent during the mirror-synchronization phase. The second build didn't go through either — there was no error, it was just locked, and there was nothing to be done with the remote machine. But everything was already mirrored, so I ran the manual step directly on the pkg machine and everything went green. Still, that's odd. I added the steps on the GitHub side, in case it happens again when I'm not around, so the information is shared: it's a script to resynchronize the mirror machine. The Docker image is OK, it's published, so no problem there. Kevin, do you have an update on the changelog? The changelog is published, it's available on the site. Cool, thank you. OK, anything else on the release? That means we are ready to roll, Stéphane, for the infra.ci update. Next week I won't be available, and I don't remember whether we chose to keep or cancel the weekly meeting, but I'll need someone to take care of it if we keep it. Did I get cut off, or is there a problem? No? OK, I need a volunteer to run it. You need one now? We have a volunteer, Hervé, for next week. I also have a problem with the milestone — we'll need someone to handle it; I'll try to think about it then.
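The manual recovery described above — re-running the mirror-sync step on the pkg machine when the agent loses contact or hangs — can be sketched as a small wrapper. This is only an illustration: the real script and the actual sync command are not in the transcript, so the command, timeout, and retry count below are assumptions injected by the caller.

```python
import subprocess
import sys

def run_sync_step(cmd, timeout_s=3600, retries=1):
    """Run one mirror-sync step, retrying once if it hangs or fails.

    `cmd`, the timeout, and the retry count are illustrative choices,
    not values from the meeting notes.
    """
    for attempt in range(retries + 1):
        try:
            result = subprocess.run(cmd, timeout=timeout_s)
            if result.returncode == 0:
                return True  # the step "went green"
        except subprocess.TimeoutExpired:
            # Treat a hang like the lost-agent case: kill the child and retry.
            pass
    return False
```

A real invocation would pass the rsync (or equivalent) command used on the pkg machine; here `run_sync_step([sys.executable, "-c", "pass"])` simply demonstrates the success path.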
Just in time to get the proper permissions. Do you have another announcement? No? OK, let me bring up the calendar. We had an LTS since last time: March 20. You know the version? 2.440.2. You said the 20th? March 20 — that's this week, you know. I'm reading the comments. Yes, sorry, I misspoke earlier. OK, thanks. And that was the first announcement for this version. Thanks, folks. Cool. We have an announcement for the next security advisory; if I remember correctly, it's on the 6th. Please don't restart the CI — don't do anything on that machine until the release is ready to roll. We leave a full day for everyone to do their work. The next event will be the SCaLE conference in California. I believe at least Mark, Basil and Alyssa will be there. Yes, same as last year. March 17 to 19. If you want to meet Jenkins folks, it's the perfect place. Any other calendar announcement? OK. Sorry, I think I forgot something. OK, let's continue. Let's have a check on the tasks that were done during the past milestone. Hervé, I might need your help from time to time. Delete the Jira bot issues and delete the users — that was fun. OK, cool. Improve robustness of the CI toolchain: in the incrementals tooling, Jacob has added some code to avoid failing the job when we encounter a 503 error from the GitHub API, and a 403 error too. So it's more robust now.
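The kind of robustness Jacob added — treating 503 (and rate-limit 403) responses from the GitHub API as transient and retrying — can be sketched like this. It is a hedged illustration, not the actual incrementals code; the `fetch` callable, the status set, and the backoff values are all assumptions.

```python
import time

# 403 is included because the GitHub API can return it on rate limiting.
RETRYABLE = {403, 429, 503}

def fetch_with_retry(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `fetch()` (returning (status, body)) and retry transient statuses.

    `fetch` and `sleep` are injected so the policy can be exercised
    without the network; names here are illustrative.
    """
    for attempt in range(max_attempts):
        status, body = fetch()
        if status not in RETRYABLE:
            return status, body
        sleep(base_delay * (2 ** attempt))  # exponential backoff between tries
    return status, body  # give up, surface the last response
```

Injecting `fetch` keeps the retry policy testable in isolation, which is also roughly why such changes make a CI job "more robust" without touching the calling code.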
It might also mean some artifacts are not stored in the jenkinsci organization repository when a build fails on a retry attempt, so we might still have some 401 errors reported in the helpdesk, but fewer than now. Thanks. Next: the GitHub CI process failing with a 401, correct. Alex reminded me, and showed me, that running the repository-permissions-updater job on trusted.ci was sufficient to renew the Maven settings stored in the plugins repository secrets. OK. Same for the next one. Cool — that was what made the weekly fail. Nice. And finally, you were able to fix the issue on the mirror export: I used a better page, the mirror status one, and the pro is that we no longer depend on a specific file being present on every mirror. The con is that we lost the continent information for each mirror, but that's not the main information of this report, so it's not that important. OK. I see we already have a new version because the schema was changed — that's nice. OK, good. Any question on these tasks that were fixed? We also have two issues closed as not planned: users opening an issue with no details, and when they are asked questions, they never answer. If they don't answer... Now, work in progress. I'm going to try to — oh, what happened? I had an issue, sorry. I'm going to try to sort by priority. Right now the first priority was azcopy. Can you give us an update on what was done on that topic?
There were three main repositories using this pattern, and all of them have been migrated to new storage accounts: from Standard to Premium, and from version 1 to version 2. Most of them had costs tied to transactions, and putting them in Premium means we no longer pay per transaction, so we have a minimum of around 20 dollars per month per storage account as fixed cost, but no variable cost any more. The most costly one was the jenkins.io one, which was at 70 dollars per month. All of them have been ported to the new storage accounts and are using a service principal to generate short-lived SAS tokens to synchronize the content to the Azure file share. The big benefit here is upload time: for the Javadoc we went from between 40 minutes and 1 hour down to 11 to 14 minutes with azcopy, and the other builds were reduced as well. So: lower cost, more secure by default, and up-to-date tooling. Nice. The next ones are the mirror scripts executed from the pkg virtual machine — it's related to the next task. The blocking point for us is to add the Azure CLI on this machine so we can use it with the service principal to generate SAS tokens. As soon as the tool is present in the Puppet setup, I'll be able to rework the existing scripts to generate SAS tokens instead of using the access keys — unless the production team decides to stay on the access-key system, as an Azure service principal might add overhead in this context. OK. The service principal part is for the update-center details, so I should be able to finish this week, I think. OK — I've got the Puppet part, and the script part shouldn't bother me. OK. Don't forget to announce it when you change the engine, just in case: if azcopy changes metadata on the files, or changes the behavior of the copy, that might have impacts on the update center. So announce it; then if something unforeseen goes wrong, other persons such as Daniel or Alex will know and can trace it back to that change. OK for you?
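The azcopy-plus-short-lived-SAS flow described above can be sketched as follows. This is a minimal illustration assuming an Azure file share destination; the account name, share name, one-hour lifetime, and flag choice are placeholders, not values from the meeting.

```python
from datetime import datetime, timedelta, timezone

def sas_expiry(hours=1):
    """Expiry timestamp for a short-lived SAS token, in the
    YYYY-MM-DDTHH:MMZ shape the Azure CLI accepts for --expiry."""
    when = datetime.now(timezone.utc) + timedelta(hours=hours)
    return when.strftime("%Y-%m-%dT%H:%MZ")

def azcopy_sync_command(src_dir, account, share, sas_token):
    """Build an azcopy invocation syncing a local directory to an Azure
    file share, authenticated by a SAS token appended to the URL."""
    dest = f"https://{account}.file.core.windows.net/{share}?{sas_token}"
    return ["azcopy", "sync", src_dir, dest, "--delete-destination=true"]
```

In the setup discussed here, a service principal would first authenticate (`az login --service-principal`) and mint the SAS with something like `az storage share generate-sas --expiry <timestamp>`; the exact pipeline wiring lives in the jenkins-infra repositories and is not reproduced here.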
And don't forget, we will have to clean up, if it's not already done — there are many clean-ups to do for the previous storage accounts. I also saw some others that haven't been updated for two years and that we might be able to remove. Careful with stuff "not used for two years": last time, Daniel caught one that wasn't two years, it was just a week. OK, yeah — so a big batch of clean-ups will be the last step of this helpdesk issue. The next one, updatecli on jenkins.io, is related to the next step of the previous one: I need the Azure CLI to be able to finish the script. There's a pull request where we are testing the jobs; trusted.ci is already up to date with the code to generate SAS tokens, but we don't have the Azure CLI on that machine. It's about the token used on the trusted agent — that's my missing piece: it needs to be installed on both the trusted.ci and pkg VMs, see the next steps. OK, so I assume once you have everything working on one, that will unblock the other. Anything else on these tasks? On this one, no — it should be really soon: as soon as the Puppet part is ready, we will be able to ask the Jenkins security team to review the pull request. Nice, good job. This one is my fault: I forgot to renew the SAS token used by the crawler jobs, and the Cloudflare R2 token. So this weekend I renewed the SAS token in Azure and replaced the credential on trusted.ci; the R2 bucket part was still failing, but the Azure synchronization was restored. And this morning — I also had an issue updating the token on our private Terraform-states repository, an error that we resolved together this morning — I was able to renew the SAS token and update the credential. So we now have to restore the Cloudflare R2 bucket synchronization part that we commented out this morning. OK, I will let you do that task then. In terms of priority, we moved some tasks to next week, so I'm going to finish with the tasks in progress, but on top priority we have ARM64.
Oh, what did I do? I might have clicked the wrong thing. OK, so ARM64 is on top priority — do we have anything more urgent? I don't see any. On ARM64, can you give us a quick status? Yes — sometimes you freeze and then it goes fast, I'm not sure everyone got that; OK, so that's a connection problem, or a very good example. Yes. So, I'm fighting with RAM usage on ci.jenkins.io: I had an error with the usage of a pod on ARM64, and Hervé pointed out that it might come from the size of the pod. As for another task, he created new pod templates, "large ARM64". I tried to use them, but that was a rabbit hole: I found out that those large ARM64 pods were not able to spawn because we are missing a node pool for those large nodes, and then those nodes could not be created because the Kubernetes version was a little too old. So we both tried to upgrade the patch version, to 1.26.12 instead of 1.26.6, and then we had kind of a nightmare, because the upgrade killed the connection to the agent, so the agent could not finish correctly and kept the lock on the Terraform state. So we called the emergency line, which is Damien Duportal, and this morning we were able to remove the lock and get back to work. But then, discussing that usage of a large agent, we decided to try with a VM instead, because the pipeline used on ci.jenkins.io is running on a VM and we plan to merge both pipelines. So I changed my aim and tried a VM, like an hour ago; I'm not sure it went through, but it's on its way. OK. I might be mistaken, but I remember both of you mentioned a second job — was it Kuntap or something — that requires a bit more memory? Same context for that one: it has been in progress for a few weeks already, and we discussed it together; on this one too the agent was killed. We might want to compare the JDK 17 builds on ci.jenkins.io with the ones on infra.ci: it was 4 CPUs on ci.jenkins.io versus half a CPU and 1 gigabyte of RAM on infra.ci. So I created two "infra.ci large ARM64" templates.
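One detail worth keeping in mind from that 1.26.6-to-1.26.12 upgrade: Kubernetes patch levels must be compared numerically, not as strings, because plain string comparison puts "1.26.12" before "1.26.6". A small sketch (the function names are mine, purely illustrative):

```python
def k8s_version_tuple(v):
    """Parse '1.26.12' into (1, 26, 12) so patch levels compare numerically.
    String comparison gets this wrong: '1.26.12' < '1.26.6' lexicographically."""
    return tuple(int(part) for part in v.split("."))

def needs_patch_alignment(cluster_a, cluster_b):
    """True when two clusters run different patch levels of the same minor,
    like the private cluster on 1.26.12 and the public one on 1.26.6."""
    a, b = k8s_version_tuple(cluster_a), k8s_version_tuple(cluster_b)
    return a[:2] == b[:2] and a[2] != b[2]
```

This is the usual reason version checks in upgrade scripts parse the components instead of comparing raw strings.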
On infra.ci it passed — my build passed. OK. Both of you might want to check that we are not on a worst-case solution; an option could also be using a virtual machine. I would say it can be the same cost, or even cheaper, to use a virtual machine rather than pods — that's a hot take I'm making, but it's a viable solution. Something else on ARM64: is there another task, what will be the next one? I've got all the checkboxes to go through for that image — you can look at the issue if you want, but there is a lot of work. Yes, OK, you have plenty. Nice. Next topic: someone in Romania was blocked from downloading; they were redirected to the Aachen university mirror. There has been a discussion, and I believe that issue might be closable. I don't know if we keep it for the new mirror — that's a secondary topic; that's fine, I will create a new issue. The goal is just: can we check that this user is now able to download? I think so, yeah — as of last week, because they never answered the last message. Cool. My proposal is: we check whether that problem is solved for them, then we can close the issue and open a new one named "new mirror provider". OK, so let me comment on it: let's confirm each user is now unblocked, and let's open a distinct issue about the new mirror provider. Do you want to share the new provider status with us? This admin contacted us, and he has to put in place an rsync or FTP endpoint on their mirror so we can integrate it with mirrorbits — for us to check it in mirrorbits. Almost there, but that's cool: that may be a new provider for Eastern Europe. Thanks for the offer. So the trick to unblock any country is to push people to offer a new mirror for that country. Nice, that's a good way to get people to offer mirroring — I mean, that's a way, that's a technique. Let's go in order now; I will read the third one in priority. Yes, I agree: "updatecli step fails regularly when processing jenkins.io pull requests".
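Checking whether the Romanian user's problem is solved amounts to verifying which mirror the download redirector sends them to. A hedged sketch — the function name and the idea of a blocked-host list are mine, not from the meeting; in practice you would capture the `Location` header returned by get.jenkins.io and compare its host against whatever the helpdesk issue reported as unreachable:

```python
from urllib.parse import urlparse

def redirect_ok(location_url, blocked_hosts):
    """Given the Location header of a download redirect, report whether
    the chosen mirror is outside the set of hosts users reported as
    unreachable. `blocked_hosts` is hypothetical helpdesk data."""
    host = urlparse(location_url).hostname
    return host not in blocked_hosts
```

For example, `redirect_ok("https://mirror.example.org/jenkins.war", {"bad.example.edu"})` is `True`, while a redirect landing on a listed host is flagged.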
The title could be "GitHub Actions are failing", because it's not only updatecli: there's the label check too — every GitHub Action; it's not the only one I saw. OK. Are we sure it's the GitHub Actions on jenkins.io? Yes, I had the issue with the label one — I don't remember what it does, but it's really annoying: the "label conflicting pull requests" workflow. OK, so that means we might need to add the tibdex solution, if that's not already the case. That's weird, it's using its own... yeah, that's why — so it's the same cause, that's why it was failing too. So that could be worth an explanation first on the issue, just for self-documentation: the GitHub token here is the one of the author of the pull request, so the API rate limit applies to the person who opened the pull request. There are multiple solutions; at the very least, this workflow shouldn't use the GitHub token like this. It should use tibdex with the GitHub App and generate a new one-hour token, which resets the rate limit. OK, so that one is to be checked — it's an annoying one. We'll have to find a proper solution here, so yeah, to be addressed. We could create a GitHub application, set it on this repository, and generate the tokens — we'll see. It's not that easy, because this is a public repository: you will have a permission issue. The problem here is that we're not on infra.ci, so anyone can open a pull request. I don't think it's that easy, because we don't want an external user running an updatecli command with the GitHub token of one of our GitHub Apps — that would be a great way to extract information. We can restrict first-time contributors, but that won't stop people who have already contributed. I see what you mean. We could check tibdex, but the problem here is the emitting user. So yeah, that's an annoying one. Or we can move the updatecli process to infra.ci. What would be the danger? I'm thinking out loud, but on the Docker Jenkins weekly we are also using a GitHub App to generate a token
to create pull requests, you know. Let me gather my thoughts — for the plugin updates, only on the main branch, we chose tibdex for that. I'm sorry to kill the discussion, but don't you think we should discuss that offline? Yep — not an easy one. Next: an issue opened by Alex, asking to exclude some repositories from the ACP. Why not; I'm still not sure — it depends on artifacts that are machine-generated for each new version and therefore not available. OK, I'm not sure I understand their problem, but no objection to removing it; that just means it won't be cached, and it's only for one plugin, so that should be OK. Any objection? The PR is already open; looks good. OK. Plugin health scoring: configure a new job on ci.jenkins.io. I discussed with him using a static file instead of querying the API from the plugin site. He created a pull request, merged today, that generates a report with the content of the API — the same call the plugin site is making. So now, in the plugin site, we can add code to query this report instead of the API, and we no longer couple the plugin scoring to the API. Nice — don't forget to add the references on the pull request. Yes, I have to discuss this job with him; he opened this issue before we discussed the report. Is it OK for you to drive the topic, since you started it? Yes, no problem. The idea, which makes sense with Adrien, is to decouple the plugin site generation from plugin health: a static file, better availability. Nice. We still have three minor issues. Monitor builds on our private instances: as I understand it, you worked on that one — can you give us a short summary? I've got a report of every job of the private instances, but it's using tokens for which we don't have a good way to manage the lifecycle or the restrictions, so I don't know if this work will be usable at all. I have to discuss it with Jenkins security, with Daniel: Jenkins user tokens used to pull information from each controller — lifecycle, permissions on each. OK, thanks for this one.
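For reference, the tibdex-style GitHub App flow discussed above works by signing a short-lived JWT with the App's private key, then exchanging it for a one-hour installation token that carries its own rate limit, independent of the PR author's. The claim layout below follows GitHub's documented rules for App JWTs (backdated `iat`, at most 10 minutes of lifetime, App ID as `iss`); the helper name is mine.

```python
import time

def github_app_jwt_claims(app_id, now=None, lifetime_s=540):
    """Claims for the short-lived JWT a GitHub App signs before
    requesting an installation access token. GitHub caps the JWT
    lifetime at 10 minutes; 540 s leaves margin for clock drift."""
    now = int(now if now is not None else time.time())
    return {
        "iat": now - 60,          # backdated 60 s to tolerate clock skew
        "exp": now + lifetime_s,  # must be <= now + 600
        "iss": str(app_id),       # the App's identifier
    }
```

The signed JWT is then exchanged at the installation access-token endpoint, and the returned one-hour token is what the workflow uses instead of the pull-request author's `GITHUB_TOKEN` — which is exactly the rate-limit fix being discussed.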
So I propose this one be put on hold until we get security feedback. Next one — thanks for that. I updated some branch protections on some repositories when I saw them, but I haven't set up more pipelines this week. OK, that's really low priority, and same for the other one. I believe something has been done — I saw you opened an issue. Yeah, I reported two or three repositories I had done before; I opened a request on the jenkinsci organization, and Alex made one. OK, nice. So these last three will be put on hold, or at least removed from the milestone. OK. Now let's have a look at the new issues — any question on the work done? No? OK, so let's first consider what we scheduled for the upcoming milestone; we have a few issues here as well. First: upgrade to Kubernetes 1.27. Stéphane, you started to work on it again — what's the planning? The planning is AWS first. I prepared the comment with all the checkboxes, but the first step is to redo the reading of the changelog with AWS in mind, not DigitalOcean — and yes, planning it for Thursday, if you are OK with that. Looks good. And next will be Azure. We have a little discrepancy right now, as we upgraded the private K8s cluster to 1.26.12 while the public one is still on 1.26.6. This may give us problems, and we may have to upgrade to the same patch version, but it should be OK — unless we hit a problem worth mentioning. That means we have two weeks to wait before having everything on the same page, so no worries on this one. OK, I'm switching these to the milestone. And in the next milestone we have a lot of issues. "Past release sites are taking a long time to load": I propose we put that one on hold — it requires not that much work, but still some work; let's remove the milestone from this one. We have the migration leftover from public K8s to ARM64: I won't have that much time to work on this one, I believe — let's move this one to another
milestone, for later. Same for the intermittent out-of-memory: we don't have the bandwidth for this one. Same for this one — I won't have the time to spend on it, and it's not blocking. Same for the OpenVPN clean-up. Removing the central cache must be moved somewhere else. That one is important, and that one is top priority, the same as the update center. And I don't know for docs.jenkins.io — are you able to spend some time on it? No, I was on bug fixing; so for the next milestone let's keep it, but low, at the bottom. OK. I'm trying to set the proper priorities. "Unexpected delays building a small plugin on Linux agents": for that one, it looks like it was due to Datadog; we should re-enable Datadog on ci.jenkins.io to see what happens. Do you think you will be able to try it? At least we have to re-enable Datadog and run a job. So yeah, OK, I'm leaving it here; that one is low priority, as you say, and that one is to be treated: updatecli. I think we have the right order of priority. Reminder: the new private Kubernetes cluster. The goal is to have the build agents of infra.ci run on a new cluster on the Azure sponsored subscription; for instance, the ARM64 node pool you struggled with should be on that new cluster. So we would have separate clusters for controllers and agents. Is it clear for everyone? Yes — but that one is not as important as the AWS cluster. The credits from CloudBees' AWS will allow us to create one or two new EKS clusters. The question was: should we upgrade to 1.27 for these clusters too? I propose that we don't try to couple the two efforts. That one requires setting up the account, checking that the credits are present, and preparing the work on the Terraform project to see how we can use multiple AWS accounts in the same Terraform project — resources could be spread a bit. That would also be the opportunity for us to upgrade to the new EKS module. I was planning to start spending a bit of time on this one this week. Is the goal clear? Is there any question or
objection? OK, cool. Now we can realistically start looking at the triaged issues. "Artifact proxy does not deliver incremental jars": that needs to be analyzed — is that correct, Hervé? — it's a legit issue. We have a "cannot login" one; that needs to be checked inside account-app to see what the result is, so I'm adding it to the milestone. They didn't receive any email, so it's worth looking at the account-app logs, checking whether their account exists, etc. — most probably they are mixing up account-app and their own Jenkins. A proposal from Basil: stop excluding the Maven Central repository on the ACP. That will increase the storage a bit, and it will increase the workload a bit, but it will help achieve faster builds. So that means removing the exception in the settings.xml and seeing what happens. The risk is that we forgot why it was there — there might be a reason, or worse — and we might have to roll back the change. I don't mind taking care of this, but since I will be off again, someone will have to be aware of the conditions for rolling it back. Is that OK for you, folks? OK. There is a repository to archive: I'm already taking care of that one, so we can remove the triage label. There is a discussion between JC and Basil regarding the support log formatter — we'll let them decide, but that's no longer triage. And finally, "evaluate Cleanowners": it's a new functionality, a global GitHub Action that does a clean-up of code owners — it creates a pull request to remove old members from CODEOWNERS. Interesting. Worth checking with Tim, since he's the admin; we can even try it on the jenkins-infra organization if needed. OK, I think we have covered all the new issues. Yes — yep, I thought I had removed triage; it's just been removed. OK, do you have other topics to check for the milestone? Do you have other topics you want to mention? OK — is that set of priorities OK for you? I will have one last look. OK, so the last good news is that we spent 4.4k last month on Azure, which is way below the
expected threshold — so congratulations to everyone, and we keep decreasing. I haven't checked Hervé's work in detail, but I fully expect we will gain a few dollars on each of the Premium storage accounts that replaced their older counterparts, without the transaction costs. So yes, nice work. I'll be back in two weeks — see you next week, for the people who won't be on holiday. Let me stop sharing my screen and stop recording. Bye bye.