Ok, hello everyone and welcome to the infrastructure team weekly meeting. Today is the 9th of April, 2024. At the virtual table today we have myself, Damien Duportal, Hervé Le Meur, Stéphane Merle and Bruno Verachten. Mark Waite and Kevin Martens are not available today, so let's get started. First item of the announcements: the weekly release is done. The release and the packaging went well, except for an issue with the packaging sync to the OSU OSL mirror. After a while, OSU OSL became really slow at copying data, so we had to wait for the next sync. We had to re-run the follow-up steps on the pkg virtual machine once the packaging build was finished. Every time the packaging build stalls at this step, we have to re-run it, and then finish the sync manually on the pkg VM by re-running the mirror sync script. And, if it was not triggered by the changelog publication on jenkins.io, thanks to Kevin's work, we have to trigger it quickly if it's not already done. If you wonder whether the packaging is finished, you open the packaging pipeline, read it, and re-run the steps yourself. The long-term fix is to get rid of the OSU OSL sync in favor of archives.jenkins.io. And then we have to think about the pattern this imposes on different details, since we assume all the mirrors are fed from there: we will have to contact every mirror manager to change their source of truth from OSU OSL to archives.jenkins.io. There is a bit of work to do, but nothing really complicated. It's something we had already mentioned, and in this context it's an opportunity worth mentioning here, because it fixes the OSU OSL problem.
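As a side note on that manual sync step: the exact script lives on the pkg VM, but the operation it performs is essentially an rsync push of the released packages toward the mirror origin. A rough sketch, where every host name, path and option is hypothetical, not the real script:

```shell
# Hypothetical sketch of the manual mirror sync step: push the released
# packages from the pkg VM toward the mirror origin with rsync.
# All host names and paths here are illustrative, not the real ones.
rsync --archive --compress --delete --partial \
      /srv/releases/jenkins/ \
      mirror-sync@mirror-origin.example.org:/srv/jenkins/
```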
The release is not the only thing that could be evaluated for archives.jenkins.io; there would also be the reference for archives.jenkins.io and the mirrored archives, so we would point the mirrors there to get the information. But at least it would be available by default for users within one hour. Is there any clarifying question on this topic? So, the changelog part is in progress. Ready to deploy on infra.ci. Are there any other announcements? Do we plan any major operation in production before the next meeting? Yes. Ok. Ready to deploy on infra.ci. Can you give us a summary, just as a reminder, Stéphane? Yes. To prepare the migration of infra.ci to ARM64, we have to move the volumes to ZRS, that is, zone-redundant rather than zone-specific volumes. I have already created the new storage class, and we have to move the volumes to that class. Once that is done, the infra.ci operation will be to move infra.ci to ARM64, because the zone for ARM is specific: there is only one zone with ARM instances. For now the volumes are on the Intel side, in zone 3. We have to move the volumes to ZRS so that, if we have a problem with the infra.ci migration, we can move the volumes one way or the other. Whether we do that right away or a few days later, we will see. Ok, for the first operation we need a date and time. What is the date and time of the operation? During the week, at 10:00 Paris time, so that will be 8:00 UTC. And the third one will be at 11:00. But we still need to confirm a few things. The operation is the volume migration, that's what has to be done. And you said the migration of infra.ci to ARM64, is that correct? Yes. Ok, so that's noted. Ok. Hervé, do you have operations planned before the next meeting, for the milestone?
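For context on the ZRS change Stéphane describes: on AKS this usually means a new StorageClass backed by zone-redundant managed disks, so a persistent volume can follow a pod into whichever zone hosts the ARM64 nodes. A minimal sketch, assuming Premium ZRS disks are available in the region (the class name is illustrative):

```yaml
# Hypothetical AKS StorageClass using zone-redundant (ZRS) managed disks,
# so a PersistentVolume can be attached from any zone, including the
# single zone where the ARM64 nodes run.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-premium-zrs   # illustrative name
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_ZRS        # zone-redundant instead of the default LRS
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
```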
Maybe some tests on the updates.jenkins.io updates. Some tests of tests. Perfect. As for when, I'm not sure yet. Ok. So the idea is to let everyone know at least one day ahead, before the operation, is that ok for you? Yes. It's the same operation as the ARM64 one. Is that ok for my notes? Ok. Do you see any production operation before next week? No. Ok, on my side I will have the ACP NGINX tuning. I will plan it during the week on infra.ci. The tuning here is the last changes so that nothing remains in the warnings, so I want to increase the memory usage. Everything is ready, so we will pick a time slot where production can be impacted. I don't see anything else. Of course there may be more, but at least we know about these three items. Do you have any other announcement, folks? Ok, no. Calendar. So, there is an LTS release next week, as I recall. I think it's Wednesday, if I'm not mistaken. I'm not completely sure. No, maybe it's a release candidate? I don't know. No, that was last week. I will check. I haven't finished checking. Just to verify. Yes, it will be... Yes, we will have an LTS next week. I will double-check. So, Wednesday I will be around to watch over the LTS. If someone else wants to participate or take care of the infrastructure, you are welcome, but yes, that means next week we will refrain from performing operations from Tuesday midday through Wednesday. I don't know if it's Christian or someone in Germany taking care of the LTS, or someone in the U.S. later in the day. But yes, on Wednesday don't touch production, because the LTS has priority.
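About the ACP NGINX tuning mentioned a moment ago: removing the buffering warnings by raising memory usage typically comes down to the proxy buffer directives. A hedged sketch, with sizes picked purely for illustration, not the values actually deployed:

```nginx
# Illustrative NGINX proxy buffer tuning for an artifact caching proxy:
# larger in-memory buffers avoid "upstream response is buffered to a
# temporary file" warnings, at the cost of more memory per connection.
proxy_buffer_size       16k;
proxy_buffers           8 32k;
proxy_busy_buffers_size 64k;
```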
We will do whatever is needed for the LTS first. What else? I haven't seen any security advisory. Next, upcoming events, so we don't skip them... because I think that's it. We lost Stéphane. The internet, probably. Just a quick check on the cloud budgets. Last year we spent 4.2 or 4.3K on Azure, on the CDF account. For the current year we have consumed 1.1 or 1.2K, and we forecast 4.1K. That said, the consumption estimate is not very precise; it can be plus or minus 200 credits. So, slightly higher than expected, but nothing exceptional; we are not on red alert, we keep going in this direction. A word on the Azure sponsorship: I am going to send a message to the board. We have to check whether we can extend the end date for the Azure credits, because the sponsorship was applied in October and, at least on the web UI, it is only valid until the end of May. We have to reach out to Microsoft to see what we can do, because we have a lot of credits left, and it would be a shame to lose those credits. So, extend the dates or extend the credits, I don't know. But if that's ok for everyone, I will contact the board so we can discuss it with the CDF. And, if possible, do it on a shared account and not only on mine, because right now I am the only one with access to the consumption billing, which can be really annoying if other team members need to be autonomous. The consumption data is good data; we can consult it more once we start moving the agents, and if you see other ways to consult it, that's another topic. Hello Mark! Hello! So, just in time, because otherwise I would have sent a message to the board: I realized, while looking at the credits on the account, that we have a lot of credits, but the end date is end of May 2024, which seems a bit short.
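Until the billing view is shared, other team members could cross-check the consumption with the Azure CLI, assuming they are granted read access to the subscription's billing data; the dates below are illustrative:

```shell
# Illustrative: list Azure usage records for a period, to cross-check
# credit consumption against the ~4.1K yearly forecast.
az consumption usage list \
  --start-date 2024-01-01 \
  --end-date   2024-04-30 \
  --query "[].{date:usageStart, cost:pretaxCost}" \
  --output table
```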
So I don't know if we should start acting now and contact the CDF and Microsoft, who gave us the credits, just to be sure it's only their system and that we actually have more time, but yes, I am a bit... Thanks, yes, I will write that immediately. So, we have 32K left, which is good. We burned 8K, that's fine. But the May 31st expiration, is that the right date? Right. So, I will start the email immediately. And the second point is that, if they start working on the renewal, even if that's a positive thing, we have to be careful about which account they use for the renewal. I would not want to be the only one with access to the account: right now I am the only one with access to the credit consumption, because the credits were applied to my Azure account. So, if they want to renew or apply new credits, I will ask them to apply them to a shared Azure account, now that we know we will need MFA anyway, so they would apply the new credits, or the renewal, on that new account. That way, everyone here will be able to see the consumption. Ok, so if... I think that's what you are saying, and I am restating it just to be sure: if Microsoft says May 31st, 2024, but "we will give you a new donation on a new account", in that case it's better for Jenkins if we get access to a new Azure account that several people can administer. Absolutely. That's good, thanks. Ok, I will work on it. If that's not possible, what I will do is create a new administrative account for myself, something like portal-admin at jenkins azure io, and I will grant access to my current account, at least for Hervé, you and Stéphane, so my account can be used by everyone. Great. Anything else on Azure, on the CDF or the sponsorship? Ok. Another warning, on the AWS CloudBees consumption.
The consumption is still a bit high compared to last year, which means we have to keep an eye on this account: there are a lot of credits, but we need to reduce this. In fact, we need to keep working on the AWS sponsorship. I checked: it runs until January 31st, 2025, and we already mentioned it. It's important; I wasn't sure about the expiration date, but what I mean is that the 2025 renewal matters because our donation expires just after the end of 2024. Exactly. Thanks. I don't have the details, and maybe we will put it in the notes, but Mark Waite intends to submit the request for the donation, and I will request the same donation amount. Because, even though I think I could ask for more, and we would certainly use more, my problem is that, if I asked for more, they would say "look how much you have already used" while we haven't even consumed everything yet. Exactly. The approach we had on DigitalOcean is exactly what you said. Ok, thanks.
Right, let me note this date rather than keep it in my head. Ok, I will give details on how to get started on this one in one of the tasks later. And DigitalOcean: we also have an expiration date. I said we didn't, I was sure of it, but at least on their web UI it's written: January 2025, and we have 17K credits left. Right now we have consumed 255 for this month, so we are on a tiny month on DigitalOcean, which means most probably we should think about using DigitalOcean a bit more over the upcoming trimester. So first, the AWS sponsorship: DigitalOcean could be a solution for the new update center, as a secondary or third mirror, but also for the get.jenkins.io mirror. I remember when Stéphane started on it, but we never had the priority on this: using DigitalOcean virtual machine agents instead of the current Kubernetes, or we could increase the Kubernetes cluster size as well on this one, so more builds for plugins on DigitalOcean instead of AWS, where CloudBees is the sponsor. And also we have, on CloudBees' side, a second virtual machine for two tiny services that require a virtual machine; their migration could be an opportunity to move them to DigitalOcean, where we already have archives.jenkins.io running. Any question on the cloud budgets, any clarification? Nope. Ok, is that ok for everyone if I add this to every meeting, or should I switch to once per month, given the time it takes on the weekly meeting? For me, I like it every week, but I'm ok if you do it less frequently if really needed. I don't mind giving the time, and it helps me a bunch to be sure that we stay on track. Ok, I will just confirm with a message; people can answer with emojis asynchronously, so we will confirm for next week, and then I won't take too much time here anymore, and yeah, we'll see. Thanks Mark. That means, as feedback, everyone can give their take in the notes. The goal is to see if we keep that formula or not; if we keep it, then we'll add it to the notes template. Ok, so let's get started on what we were able to achieve during the past milestone. A
big one has been the closure of the issue replacing blobxfer with azcopy. We had an unplanned update center issue two weeks ago due to that; the leftovers were a bunch of cleanups and last-minute items, and everything has been covered by Hervé. Hervé, do you have something to add on this topic, something you want to point out? Ok. So now we don't use blobxfer anymore, we use azcopy, and it's updated on each new version; I've seen 3 updates during the past 2 weeks, so great work. The takeaway here: it's way faster. I don't remember all the numbers, but the numbers were incredible on the plugin website and contributor website, and the pkg sync is less than 2 minutes instead of 19 to 20, so it's almost a 10 times decrease. Yeah, so that is clearly really, really useful; huge work, folks. We closed 3 issues on ACP. ACP was including Maven Central, as we saw last week, and also incrementals, and that allowed us to point out that either we have unexpected artifacts that aren't produced by us in the release repository, or we mirror unexpected artifacts that should be in Maven Central. We have fixed the behavior and removed Maven Central, so, as far as I can tell, all issues have been fixed; there are no blockers or performance issues anymore. We will have 2 subjects. One is really minor: I'm trying to fine-tune ACP so we can remove some warnings. This is really, really minor; it's just me loving the sport, the low-level kernel and memory buffer stuff. The second one, which is more important: I've opened a new issue thanks to Basil's pointers. We will discuss that later, but we will need to perform an audit of the non-plugin, non-Jenkins artifacts in the release repository, just to decide whether we remove, archive or keep them. That will be annoying, but really useful, and an important topic. Is there any question on the ACP, or things I forgot? No. Ok. We had a user submitting spam; that has been taken care of, so thanks for this. Thanks Hervé for helping the documentation team to get the permissions on the right repository. As for plugin wiki docs, a
repository has been archived; thanks for this one. We had an unexpected update center crawler root certificate expiration. Well, it would have expired in May: the code of the update center crawler starts to fail one month before the certificate expires, just to let us know early enough that it is about to expire; otherwise we would run into serious trouble. And the discovery we made is that the notifications on the events of our shared calendar are personal. I was sure I had put a notification, and yes I did, but it was on my own calendar, and I was off, so I didn't receive it, and the notifications weren't sent to Stéphane or Hervé or the team. I've now re-added the event with multiple notifications, but that doesn't serve any purpose for next year. So that means we will need to find a solution for this: the shared Google calendar is not enough at the team level, as mentioned. We need a bot, a GitHub Action, whatever solution that will let us know in advance, so we need to start thinking about a solution. But we need something: the calendar does not share notifications; they are per account. We need an alternative. One idea Stéphane and I discussed while brainstorming (I'm sure we already had that kind of discussion a month ago, but I don't remember, so sorry if someone already mentioned it): having a GitHub Action in the helpdesk repository that takes care of opening issues instead of sending notifications. So when we have the meeting and we do the triage, we would have issues such as "in one week" or "in three weeks" "you will need to rotate a certificate that expires", or whatever, because in the end each of these notifications leads to an issue with the action we are running until completion. So if we had a GitHub Action, with a kind of textual calendar, that ensures the issues get created, then we would see them, track the completion of each of these issues while taking care of them, and we shouldn't miss them. That was my initial proposal. If that's ok for everyone, I propose everyone sleep on this proposal and think about
alternative solutions, because there might be easier options, but that's the idea: using something as close to reality as possible, and the reality is that we use issues to track completion. A dedicated issue to discuss that? That's a good point: let's open an issue for this problem, discuss solutions and track the implementation. I don't mind taking care of that, writing and starting the issue and leading the topic, I don't mind. 1, 2, 3, ok: I'll open the issue and add it to the next milestone, if it's ok for everyone, because that's a major topic for our ability to operate the platform. Ok for everyone. Thanks Mark for taking care of the cleanup that has been done on Artifactory, on the update center, and on the licensing and code; everyone agreed, so a nice outcome for the project. We closed the issue "ok if you have exotic MTU": the user documented and gave feedback on at least 2 solutions they used to fix it, which are funny solutions. We tried an ECDSA certificate, at least an ECDSA key for the certificate, on one of our websites. Technically it works very well, but we need to add the proper annotation. I will back out this change, with the following arguments: we would need to add an annotation on each of our ingresses, which is a nightmare today because we don't have a policy system that takes care of all new ingresses. So that would mean one certificate and one ingress different from all the others on Kubernetes, and that's a nightmare to maintain. cert-manager has an open feature request to define a default set of algorithms and certificate setup for every certificate, but that feature is not available yet, so we cannot say "hey, please use ECDSA by default": it's RSA by default unless there is an annotation per ingress. So we know we can, and technically it works, but as a default I propose to keep choosing RSA for now. The benefit of ECDSA is a faster handshake, so less time spent on the handshakes and less consumption at the load balancer / ingress level; that should be beneficial, but right now it's too much effort given the benefits. Is that okay for everyone?
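For the record, the per-ingress annotation that makes ECDSA-by-default impractical today is cert-manager's private key algorithm annotation; this is roughly what every single ingress would need (names and host below are illustrative):

```yaml
# What opting a single ingress into ECDSA looks like today with
# cert-manager: the annotation must be repeated on every ingress,
# since there is no cluster-wide default algorithm setting yet.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-website            # illustrative
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    cert-manager.io/private-key-algorithm: ECDSA
    cert-manager.io/private-key-size: "256"
spec:
  tls:
    - hosts: [example.jenkins.io]  # illustrative host
      secretName: example-website-tls
  rules:
    - host: example.jenkins.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: example-website
                port: { number: 80 }
```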
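And to make the calendar-to-issues proposal from earlier more concrete, here is one possible shape: a scheduled GitHub Actions workflow in the helpdesk repository that reads a committed text calendar and opens a tracking issue ahead of each deadline. Everything here, the file name, format, threshold and titles, is hypothetical:

```yaml
# Hypothetical scheduled workflow: turn a committed "textual calendar"
# into tracking issues ahead of each deadline, instead of relying on
# personal Google Calendar notifications.
name: calendar-reminders
on:
  schedule:
    - cron: '0 7 * * *'   # daily check
jobs:
  open-reminder-issues:
    runs-on: ubuntu-latest
    permissions:
      issues: write
    steps:
      - uses: actions/checkout@v4
      - name: Open issues for deadlines less than 30 days away
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          # reminders.csv (hypothetical format), one line per deadline:
          #   2025-05-01,Rotate the update center crawler root certificate
          today=$(date +%s)
          while IFS=, read -r due title; do
            days_left=$(( ($(date -d "$due" +%s) - today) / 86400 ))
            if [ "$days_left" -lt 30 ]; then
              # Only create the issue if none with that title exists yet.
              count=$(gh issue list --search "in:title \"$title\"" \
                        --json number --jq length)
              if [ "$count" = "0" ]; then
                gh issue create --title "$title (due $due)" \
                  --body "Automated reminder: due on $due."
              fi
            fi
          done < reminders.csv
```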
Is there any question on this one? Okay. And finally, thanks to Hervé and Adrien's idea, we were able to decouple the plugin site generation from the plugin health scoring. So now the generation of the plugin site uses a static JSON file, generated once per hour by our infrastructure and published as a report, which is the data made available; so if PHS goes down for whatever reason, the plugin site will use the last successful report. Which is really cool, because we can operate without implementing HA on PHS. Is there any question on that last one? Nope, okay. We had three issues closed as not planned. Two are duplicates, and most likely from a user... We still don't know how users end up on accounts.jenkins.io when they are trying their local Jenkins. We might need to think about having a dedicated "get started with the wizard" page on jenkins.io for these users, because the information is spread across different tutorials; so maybe it would be worth starting a page to decrease the amount of these issues. I'm not sure it will really be effective, but at least it will help normal end users in the end. One was related to an Artifactory timeout that happened last week; nothing actionable for us, because it's on JFrog's side, and JFrog mentioned they had issues and everything went back to normal. The problem happened during a CD release, so somewhere in the GitHub Actions network; it's not on our Azure platform, but worth mentioning. And the user took care of the analysis and the closure, so thanks to them, and thanks for contributing to Jenkins, of course. Is there any question about the done tasks, something unclear, something that needs discussion on what we were able to finish? Oh, ok. Now for the next part: the work in progress for the next milestone. First of all, I've opened an issue about the Docker Hub HTTP 429 errors, confirmed by Docker. I was in contact with two people from the tooling part and then with the engineering manager of the Docker registry, and they confirmed our analysis and what has been written on the issue. I shared the issue with
them. So, what is happening to the Jenkins project is not an image rate limit: we are still under the open source program, and the jenkins/* images are consumed without any kind of limit by end users. However, on trusted.ci (I haven't seen the problem happening on ci.jenkins.io, I hope because the rate of builds looks lower), when we push the layers, we are triggering the anti-abuse system of the Docker registry. Because, since we moved all of our builds onto the trusted.ci private network with only one egress IP, that IP gets marked as abusing: we are reaching 2.2K requests per minute when we are releasing an agent or controller image. These requests are all the pulls and pushes of all the layers we do from within the trusted.ci network, including all agents, and since we build a bunch of images in parallel, all these requests are counted against the same abuse rate limit. So right now they have increased it temporarily for us, until the end of the month, a kind of exception, but they told me the exception is a workaround, so it might not persist if they forget the configuration somewhere, because this is not something they usually do. It's a "Nicolas De Loof-style workaround", they told me. So maybe that will hold, or maybe not. Wait a second, did they actually say "Nicolas De Loof style"? Yes. Anyway, it doesn't mean it's Nicolas, I understand. Thank you for the influence Nicolas has on the organization, that's very good. It's a style of burning enterprises on stage, that's the style. Anyway, they confirmed that the short-term workaround of using multiple egress IPs on our gateway to spread the requests is okay for them, absolutely. That will unblock us, and that will allow their workaround to be removed. So that's the short-term fix we should do during the next milestone, and that's why I'm proposing to keep this one. In the long term, we have to think about how to decrease the big amount of requests we make against the registry, because that's a lot. Of course, it's a success story for
Docker, because they provide us with really cool tools such as buildx and Docker Compose, and all the work done by Bruno and by Tim on the Docker images makes them suffer from their own success; so we can do marketing and blog posts and communication on that later. We need to think about a long-term solution. Right now, on their side, they are checking our code to see whether there is a bug inside buildx that they could work on, fix or optimize, but that's not easy; they don't see anything obvious. We have two main proposals that I made. One has been written on the issue: as discussed with Hervé and Stéphane yesterday, we think about switching away from the Eclipse Temurin base image in the Dockerfiles of the official Jenkins image. As you can see, we have a multi-stage build: the first stage is never pushed, all it does is pull the Eclipse Temurin image and use jlink to generate a new runtime, not only adapted to the Linux distribution but also with a lot of things removed that aren't needed in the final image. So here, if we change that base image to the same image that we use in the second stage, and add a RUN instruction that downloads the Eclipse Temurin binary distribution tarball, then we will greatly decrease the amount of requests we make to Docker Hub. Because the Eclipse Temurin image is just there to get that binary distribution, but it has 7 to 12 layers depending on the distribution and the version, whereas if we use the same image as the next stage, we will have downloaded that image in any case, so we just reuse it. That means a bit of code for us, but at least we would clearly have a direct impact, and a long-term impact: globally, way fewer requests. Think about it: here we download, let's say, 10 layers for that image, the one for Linux AMD64 Debian; then we have Debian Slim, 10 more; then, eventually, it's the same: let's take Debian, we have these 10, but we also have 10 for Alpine; and of course those 10 are for Debian AMD64, so we have 10 more for Debian ARM64, 10 more for s390x, etc. etc. I
don't even mention Windows. It's only one application, one Java application, exactly. So the impact here would be great. It requires a bit of work on our side; a bit of work means adding a RUN curl, or, I think, PowerShell on the Windows portion. So that's a bit of work for the maintainers of the image, meaning some of us here, but not only us. That's a topic I think is worth presenting to the platform SIG today, because it would have an immediate and direct impact. And one of the positive outcomes of doing this will be faster availability, because the Temurin image sometimes takes weeks, even months, to be available, while the binary distributions are there really soon after an official release. So I propose that we get started on these two elements and see the outcome. Another option we discussed, for the medium or long term, would be using staging for building the Docker images. Staging would be us having a private registry inside the trusted network: we would start by pulling images, building the images and pushing them to the private registry, and then we would have a promotion step, at least for the Jenkins core release, that would give the security team something to check, or they could build to staging a few days before an official release. And here the impact for Docker Hub would be that we would spread the requests over one hour: first the bunch of pulls for building the images; we would not overload Docker Hub immediately, because we would then wait for the promotion, even if it's an automated one, as it usually is for the weekly. But that one involves a lot of work, useful work, related to the Jenkins security team. However, I'm not sure it would have a direct, immediate impact, because of the whole promotion part that says "hey, let's pull from our private registry and push the layers to Docker Hub": how much time do we need? Because that could slow down the releases of the agents, for instance. So that's a less obvious solution compared to the base image and multiple IPs. So, do we agree? So, the transition from
using the container images provided by Temurin to instead using just the binaries provided by Temurin seems like something we know how to do. We know how to do it because we've done it before; it may even be that we can bring back the old implementation and use it as a baseline. It's pretty easy; to me it feels like the first target. If we want to go further than that, can we make that choice after we have data on the impact of the first change? I believe we should have both here, because, even setting that problem aside: we saw the same rate-limit problem happening with Chocolatey when we build our Packer images (we sometimes start to get this error), and sometimes from the RubyGems distribution, because we have the same issue on the infra.ci network. So having the ability, in our Terraform module that manages the outbound gateway, to add more public IPs on demand will always be useful, and that change is not a big one, because it's already technically possible on the NAT gateway on Azure. Ok, so those two items: neither of them is particularly cutting edge or a huge effort. Exactly. We have to work on them, that takes a bit of time, but no unexpected surprise here. But we need to make sure that the load will be spread among all those external IPs, and not stick to one all the time. I mean, the spreading, that's the NAT gateway specification on Azure, and that's a round-robin algorithm, so yes, it will be spread. It won't be spread immediately: you need about one hour once you apply multiple outbound IPs, but new connections start to use the round-robin, because the NAT gateways are technically the same thing as the load balancers they use for inbound. So that's a good point, good question, and it's written in the documentation; if it doesn't behave like that, then it's another issue, and clearly we would see it somewhere else too. Ok, thank you. Is that ok for everyone? Sorry I took time, but that one was particularly painful during the past two weeks, and it could endanger any
security advisory or LTS release. Well, and it's crucial; we'll have a further discussion in the platform SIG to be sure we can close on at least the second half of this. Good. So now, next week we've got an LTS release coming; I assume we'll want to have that. Is it next week? I think it is, yes, it's Wednesday. So we would prefer to have the container image change before that, and even better before the next weekly, next Tuesday, so that we can... Yes, but the priority for me, on the infra team side, is that I would prefer the infra team validating and spending time on the spreading of the IPs. That doesn't prevent the two from happening in parallel, but for me the IPs are the direct actionable to secure this one. Right, and I agree wholeheartedly. Bruno and I, as part of the platform SIG, can look at the change from the container image to using a binary, so that's not something we need the infra team to do; that's something I think Bruno and I can help with. Bruno, are you okay? I'm speaking for you, I probably shouldn't, Bruno, but are you okay? No, that's perfectly okay, Mark; of course I'm willing to help with that too. Yes, thanks. I have things on the other images that I wanted to do anyway. Right, cool, thanks folks, thanks for your help. No, it's only a matter of "let's do it": yes, go go go. The next one: I need discussion here, unless you still have a question on the Docker part. Nope. Okay, the next major issue that we need to spend time on (besides the update center, but we know that's the real priority), is a proposal for the permission model to bootstrap the AWS account. So, that's a permission model, and the main key is: I want to protect us, we want to be protected against something stealing our credentials on our administrator machines. That's a pattern that has been shared by CloudBees people, so thanks to them for this contribution. They have shared it privately, because I cannot share the Terraform code, but it could help us to automate; they also have the AWS commands. So we
have different implementation paths. That way, we focus only on the AWS concepts first: we won't go into using the big AWS IAM Identity Center, and we shouldn't focus on multiple high-availability things, because the scope here, as a reminder, is that we have credits until the end of January, and then we don't know. So the goal is to move to that account only the CI agents, the ci.jenkins.io agents we want to run on AWS, since that workload is not trusted. That would involve two EKS clusters, later migrated to a single one, with pod agents and ACP inside, not available from outside, and that should be all. We can add workloads if we get more credits, but right now that's the target, more or less. Given that there is the root account, which we all share (at least Hervé, Stéphane, Mark and I), encrypted with our GPG keys on our systems, we should refrain from using that account at all costs, except for bootstrapping or top-level privileged operations such as adding or removing an admin. So I've put a message here explaining the implementation proposal. I need your validation, either with a thumbs up or thumbs down on the message; you don't have to do it right now. A rocket emoji is not good enough; "looks good to me" works. But the idea is that you have 24 hours to read this and vote, just to be sure I didn't forget anything; and if it's unclear, please add a message and we can clarify. The goal is to document this so we can get started. My goal is to have the code, whatever implementation we use, but most probably Terraform, run manually once; then I will clean it up and ask someone else to do the bootstrap, so I'm not the bus factor in this case, contrary to the Azure sponsored account. So, 24 hours to give thumbs up or thumbs down, and then I will work on that issue for the next milestone. The goal is to have a draft bootstrap, and then next week we could plan the real live bootstrap, if not already done. Is there any question, anything to be clarified? Nope. Ok, the third top-level item is the update center, so I hand the mic to Hervé here, just
to give us a summary on this part. Just to let you know, Stefan and I have seen the JEP; we need to read it. We are late on that topic and we need to take care of it, but that's outside Herve's part. And Herve, it's your turn on your part of that topic. I'm currently looking at pkg, sorry. Okay, let's move on to the ARM64 part then instead, Stefan, and we'll go back to the update center after.

In fact, I already said most of it, because the new plan (it's not a new plan, but the next step of the plan) is to move infra.ci to the new ARM64 nodes. As I explained, it's a two-step migration: the first step will be for the volume, the next step will be for ARM64 itself, and that means a dedicated node pool, at the end not only for the controller: the controllers and the non-agents. We need to find a better name than "non-agents", if you can find a name. So the second ARM requirement is sizing the new node pool. Okay, so we are taking all the data right now from, sorry, the Datadog configuration; I'm compiling everything right now. Controllers: we have two controllers, release.ci and infra.ci. One of the targets Stefan architectured is that we will have smaller nodes, which means we should not be able to run release.ci and infra.ci on the same virtual machine. They will be on the same network, able to ping each other modulo the network policies, but we will have one per node, so we can spread the load and take care of the controllers.

But we still have three, let's say, non-web-UI services, let's say three bots, running on the private cluster because they don't need public access: they only push data, such as the RSS, Twitter and GitHub comment bots. There is also an IRC bot used by the Jenkins CI administrators. So these services need to be thought through: can we migrate them to ARM64, and if yes, do we want to run them packed on the same node pool, to fill the gaps, or do we want to use different node pools? These are two strategies that will have different
implications. As Stefan and Herve discovered earlier today, we don't have memory limits and resource allocations on the three bots, so we need to study the metrics and decide whether we add limits or not, and which ones. And for now, the GitHub comment bot doesn't have any ARM build, so we need to check that. So ARM64 might not be the answer there: at first sight we don't have ARM64 images for these three services. We know we have them for the other services, such as Datadog and the ingress. So that means maybe we want a node pool for the controllers and another node pool for all the other services we plan to run, which means we would keep the current Linux pool and create a new one designed only for the two controllers. Does it make sense for everyone?

I believe the last mile here, Stefan, if that's okay for you, is just a billing check: what is the expected cost per month with the current amount of nodes we have (because they are really big), and what is the projection in terms of ARM64 for the controllers? Then we can do the projection for the second part later, so we won't be going in with our eyes closed. If it's okay, I propose one ARM64 node pool only for the controllers, and we do an extrapolation: it should have two nodes all the time, you allow a third one when we have a surge or when we have an operation with restarts, and you just do a billing extrapolation given the cost of the tinier node and of ARM64, which is cheaper. We will also have to shrink the Intel node pool to lower the price, because we don't use those nodes as much as before, but I propose this as a second step, a consequence of this but decoupled. If that's okay for everyone, that should let Stefan answer on how to use taints and tolerations, because that decision is a requirement for the sizing, but also for setting up the scheduling of workloads: we don't want the controllers trying to be scheduled somewhere else, and we don't want other things to land there, based on the decision, and anti-affinity between the controllers, or maybe you don't need it if the size will be
enough. So that's the next step. So Thursday, as a reminder: the first operation is mandatory, we will need to migrate the data of the Jenkins home of infra.ci to a ZRS volume. Stefan, do you want to do both release.ci and infra.ci at the same time, or do you want to test only this one and then plan ahead for release.ci and the other later? I'm sorry, I don't want to speed up; the same plan, yeah, no problem, and then the other one. Okay, that's all on ARM64, I believe. Do we have something else on the ARM64 topics? Stefan, if you want to rush it and do both of them, I agree with you; I'm just asking the question. I don't know if you're asking because you're asking, or if you have something in mind. So if you really need to rush, it's okay. But I'm not rushing alone. No problem, and no, I have nothing more to say about ARM64. I don't think so.

Okay, so then back to Herve on the update center: can you give us a heads up on the expected tasks for that milestone for this topic? Yeah, I'm currently looking at the apache2 logs to determine which files are the most requested and the corresponding amounts; I'm still on it. The goal is to adjust a stress test of get.jenkins.io to that, so I'm still on it. I'm a bit surprised, because with the log rotation it's only one day every week that is kept, and I would have expected to have all the weeks saved. So I looked for the reason and found it somewhere, hidden in the issues: there's a message from Tyler saying we cannot afford that much logging unless someone gives us a free platform to aggregate our logs, and the day that sponsorship is gone, we lose the logs, because hosting logs is one of the most expensive things in the cloud world. I'm looking at the logs on the machine, the logs from the 29th. I'm just saying the rotation policy comes from that rule; I don't say that rotation policy maps to what we really see, because between Tyler and us there have been five years of Olivier alone, and sometimes you have to do what is needed to make production run. So sometimes
we have unmanaged things, which could explain why we have some holes here. Does that answer your question, or at least give you a pointer? It does; I know now why it's only one day. May I ask you to report on the issue, once you have time, the amount of data you see? I'm interested in the log size for one day or one week, just to have an order of magnitude. Not now; just report it on the issue, please, so that we will know for later, because saying it aloud will just make me forget it in five minutes. Sorry, second question: do we have these logs collected in Datadog? Because we have the agent and metrics collected for that machine. I don't know if we collect the Apache logs. I don't know. Okay. So that means a performance benchmark, and based on what we said earlier, if you have a meaningful load test, you should be able to plan it during that milestone or later; is my understanding correct? Yes. Okay, we'll let you report on the expected work. Are you able to check the metrics from the mirrorbits system on Datadog as well? Not yet. Okay. Sorry, I don't see the relation. Well, mirrorbits is the service you want to load test, so you need to see the metrics, today and over time, and you need to be sure that your benchmark doesn't kill it. And with these metrics, to check the result of the benchmark, you could look at the 28th and 29th of March to see if they had an impact or not, I don't know; but that could be an exercise in finding the metrics, so you will have everything at hand. Okay. I don't have anything else on the update center; do you, folks? Okay.

So, we had an issue that we asked Stefan to open: we have the plugin site API or the plugin site generation, I'm not completely sure which, but one of these two services gets data from ci.jenkins.io, which we don't want, because if ci.jenkins.io is down, we cannot update plugins.jenkins.io. So, in the same spirit as what has been done on plugin health scoring thanks to Hervé and Adrien's work, we might want to do the same for this one. So we have this one on
the upcoming milestone; we'll see if we have time, it's not that much. I've started looking at it; if anyone else is interested, please take it, otherwise it will be low priority in any case. No priority. Mark, I'm not sure what has been decided for this one; I forgot to track it because it required Jira admin rights, and there was a discussion about the test one; Alex never answered. You are muted, Mark; I see you speaking but we can't hear you. Sorry, I was muted while saying something, shame on me. Right, I talk really great when I'm silent. My action item: I'm just going to propose that I close it and let Alex reply if he objects. I think after a week it's a good thing to say "hey, I'm not going to go any further" because of the Jira permissions thing. I investigated: when I made the change to implement what was suggested by TM, I also prevented the creation of all issues in the JENKINS project, and that's not a good thing, so I reverted that change. So let's close this. Looks good to me.

We have the two mirror topics: do you want to continue working on them, or do you want to pass them on? I've sent an email to Daniel from SCS to know if he had any news about it, and for Hostico I have to check if we can add a user and password in the FTP URL. I tried, I started to run mirrorbits with k3s locally, but it needs Redis, which is not supported there, and it needs volumes, which are not supported either. I also tried to run the Docker ARM64 image on my machine, with my pull request, but I had an error, so I'm still working on it. Okay, thanks for the heads up. Last time I tried, during the get.jenkins.io work, for the record (I might have forgotten to share that with the team), I successfully ran the AMD64 image using QEMU, and it works very well for mirrorbits on my Apple silicon Mac, and I was able to run it locally empty; you don't need the production data. What you will need: Redis is used by mirrorbits when you use the
add, list or remove commands. Most of the time, start with an empty Redis: you can install the official Helm chart locally on your k3d cluster, and then mirrorbits will see the mirrors; you can add a mirror manually and check it. That should work. I was able to add almost the same Redis data, because Redis is used for the mirrors you add, edit or remove, and then the database is filled when mirrorbits is able to scan. So if you can reach the point of "okay, I can connect to the FTP with those credentials, scan the data and start filling the local Redis", that means it's successful and you can plan to apply it in production. Is that okay for you? Thanks.

We have the issue about the Jenkins stats repository, with the tooling to move to the jenkins-infra organization; that's low priority. I didn't have the expected time to work on it; I would want to do it, but I didn't have the expected time. That's just migrating data, and especially because we already have a setup in place, it means we can move it with only credential changes, and that should be done quite soon, if that's okay for everyone.

Let's have a look at the new issues. I see one issue from Mark and one from me in triage status; do you have other new topics besides these two that we are going to triage? I don't have any others. Okay, first one: Mark, thanks for opening that issue about the markdown formatter plugin to install on ci.jenkins.io. I believe it's because Markus Winter added markdown documentation within the pipeline library feature, as proposed; it's not approved, it's not merged, so this is very much looking to the future. This is not something that needs to be in the next milestone; it can wait until a decision is made on whether or not we're going to allow markdown in pipeline step descriptions. I like it, I like markdown a lot for writing documentation, but it's not currently supported. Okay, I've got a proposal: should we start with the plugin health scoring on that plugin, and check today what the score is? And if the score is not really,
really good, what would be the attention points on that plugin? And in parallel, I believe (and if it's okay, we'll take care of that) we should mention Daniel and Wadeck, or at least the Jenkins security team, to ask them what the status is, what they know and their habits, because maybe there is a red flag somewhere. But the last time we had that kind of exchange with them, the red flag was "oh, that plugin hasn't been updated for years", and in the time it took them to answer, someone adopted the plugin and started to clean it up. Yeah, the plugin health score for that plugin is 100%. Oh yes. So it's reasonably healthy, because I keep it healthy; the last release was okay; the person who opened the helpdesk issue is the person maintaining the plugin. Okay, so I'm not worried about that part. I have made the request to the security team asking for their review, because we should not adopt this thing into ci.jenkins.io without careful thought. Right, and JenSec is part of that careful thought, and I agree wholeheartedly: we must be very careful before we add any plugins to ci.jenkins.io; they must not affect its primary mission. Okay, is it okay for you if I add you as assignee and we add it to a milestone? So we wait, and as soon as you have feedback from JenSec, if it's positive, then we plan the item, take it over from you and take care of the installation, because we need infra-as-code and stuff; it's not a manual thing here. So, is that okay for you? That sounds very good. I'm happy to have this assigned to me, and it certainly can wait: there is no reason to install it before the pipeline library documentation can be exported with markdown, and it's not critical for us to be able to put markdown in other places on ci.jenkins.io; for the things we write on ci.jenkins.io, plain text is good enough. Good, so I will take care of this one.

The second one: on the initiative of Basil, we have that audit to run on Artifactory. I believe it's the kind of thing that should not be at the
infra team level only; it should be a shared responsibility, because at the infra level we can't do some of these actions, and if we find anything which is not an hpi or jpi, finding the information and building consensus on whether we should keep it or delete it will be hard. There will be easy ones: if the artifacts are published on a Maven Central repository, we can directly remove them, or move them to a private archive repository, so they are not used and not available except to admins. But I believe we will have, somewhere, discussions that require long-term contributors to give advice. I would like, Mark, if it's possible, to bring that audit to the board, to see if we can find a way to communicate clearly that we need help on this one, to have a second pair of eyes and not be alone on it. If we have nothing by the end of April, if no one is able or willing to help us, I propose we start the topic in May with only the infra team. That might break things, but at least we will have communicated early enough to let people know we need a second pair of eyes on this one. Is that okay for you? That sounds very reasonable to me. I don't think this needs to be done in April; this is more of a good thing to avoid future problems, and our April plate is very full right now. Okay, is it okay to go through the board, or should we just send an email directly to the Jenkins developers mailing list and get started on that communication layer? I wouldn't bother with the board; I don't think the board will consider the governance topic, so the Jenkins developers list is great. Okay, I'm taking care of that then, to ask for help; let's target May or later to really start working on it. Is that okay for everyone? Okay, so I'm going to add this to the milestone, because it requires just communication work, but still some work. I don't have other new items, do you?
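As a side note on the Artifactory audit just discussed, the first mechanical step (separating plugin packages from everything else that needs a keep-or-delete decision) can be sketched in a few lines. This is a minimal illustration, assuming a flat list of repository paths exported from some Artifactory query; all paths and names below are hypothetical, not real repository content.

```python
# Minimal sketch for the Artifactory audit idea: given a flat list of
# repository paths, keep only the artifacts that are NOT Jenkins plugin
# packages (.hpi/.jpi), since those are the ones needing a manual
# keep-or-delete decision. All paths here are hypothetical examples.
from pathlib import PurePosixPath

PLUGIN_SUFFIXES = {".hpi", ".jpi"}

def needs_review(path: str) -> bool:
    """Return True when the artifact is not an hpi/jpi plugin package."""
    return PurePosixPath(path).suffix.lower() not in PLUGIN_SUFFIXES

artifacts = [
    "plugins/git/5.2.1/git-5.2.1.hpi",  # plugin package: fine as-is
    "libs/acme/1.0/acme-1.0.jar",       # on Maven Central? -> archive it
    "misc/tool.tar.gz",                 # unknown origin -> needs discussion
]

to_review = [a for a in artifacts if needs_review(a)]
print(to_review)  # the non-plugin artifacts to triage manually
```

In practice the path list would come from whatever export the audit ends up using; only the filtering rule itself is being illustrated here.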
Okay, I will remove the triage label and move everything to a milestone just after we end the call. I don't have anything else. Okay, so if no one has anything else to bring up, let's see each other next week. Bye, bye.