Okay, it's recording. Everyone, welcome to the weekly Jenkins infrastructure public meeting. We are the 22nd of March. The notes, as usual, are taken collaboratively on a HackMD and will be published on the jenkins-infra/documentation public repository as well as on community.jenkins.io, and the recording will be on YouTube. Today we have Mark Waite, Stéphane Merle, Hervé Le Meur, and myself, Damien Duportal. Okay, let's get started with an announcement. I assume the weekly release is finished, or almost: I see it's visible on jenkins.io, at least for the packages, and I even checked Docker Hub. Damien, I'll run... okay, so I will run the release checklist sometime later today. I suspect there's more work to be done; I just don't know what the work is yet. Docker isn't finished for sure; the WAR file is certainly there, but I haven't seen indications of... well, the checklist is there, we just need to run through it. Okay, but at least the packages are there, which means the WAR and most of the native packages have been built. So that's the sixth release in a row, eight if we count the two past security LTS releases, without any issue in the process. Good job, people. Are there other announcements? No? Okay, so let's proceed. So, more core information: we are trying a new process for having actionables in the form of GitHub milestones. The idea is that we start by checking the milestone for today's date, beginning with the closed issues: all of the help-desk issues that we worked on and finished, or pull requests on that repository, have been associated to that milestone. So let's see what we did since last week by checking these closed issues; that should be quite quick. One, two, three, four, five, six, seven, eight. The major one: we had an AWS key exposure. All the details have been written on the issue, 2830. It has been closed under the assumption that the keys have been rotated.
We have been able to demonstrate that it wasn't used for anything, and the root cause was fixed. Unless someone needs more details or the issue is not clear, I won't spend time detailing it; we put a lot of information there. And just for information, we have two sub-tasks opened that are long-term improvements related to that area. One is work in progress, so we will come back to it, around templatizing the job definitions that generate the DSL folders, and one is in the new or to-do list. Hervé, you were saying something? Okay. So yeah, that was a minor issue with minor impact. There are a lot of things that can be improved. Summary: be careful when you have the GitHub check enabled, because it's enabled by default and it can extract sensitive information from the Jenkins log output. And that's not the right issue on screen; sorry, I clicked on the wrong one. My apologies, it's 2834. Thanks. Is there any question, point, or anything unclear on that one? Okay. Next one: Fastly. Let me click from here. Fastly, thanks to the work of Hervé, is now managed by Terraform, in a fully public repository. Which means we hope that for the next Fastly option update, if someone has to add or change settings, instead of doing it manually in the Fastly web UI, it should be done through a pull request to that repository. Now, there is a Fastly cache invalidation step that happens as part of the weekly and LTS builds. I assume that's not requiring a Terraform change? No, I don't think so; I have to check the release pipeline, but I think it calls the Fastly API with a token allowing it to purge complete services. Right, and I'm confident it is just an API call, so it's okay that that continues to be done as is. Yes, it would be good to identify the token used by the release, so we know this one is used for this.
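For illustration, the cache purge the release pipeline apparently performs maps to Fastly's "purge all" endpoint. This is a hedged sketch: the service ID and token below are placeholders, and the real pipeline's call was not shown in the meeting; the request is only built here, not sent.

```python
# Sketch of a Fastly "purge all" API call, as likely used by the release
# pipeline. SERVICE_ID and the token are placeholders, not real values.
import urllib.request

FASTLY_API = "https://api.fastly.com"

def build_purge_all_request(service_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a purge-all request for one Fastly service."""
    return urllib.request.Request(
        url=f"{FASTLY_API}/service/{service_id}/purge_all",
        method="POST",
        headers={"Fastly-Key": token, "Accept": "application/json"},
    )

req = build_purge_all_request("EXAMPLE_SERVICE_ID", "dummy-token")
print(req.full_url)  # https://api.fastly.com/service/EXAMPLE_SERVICE_ID/purge_all
```

A scoped token that can only purge (not change settings) would be the ideal credential for this, which ties into the token-identification action item above.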
Unfortunately, or fortunately, I don't know, it doesn't seem to be possible to generate scoped tokens. So that's a limit on managing tokens and invalidating the cache. However, invalidating the cache might not need to be in Terraform; at least we gain some auditing, and the ability for everyone to see the settings publicly and to propose improvements, especially in areas where security headers are required. When Gavin publishes the "Jenkins Is The Way" website, he will have the possibility to add the corresponding service in this repository, without going through the Fastly interface. "Jenkins Is The Way", the website, right. Okay, good, thank you. And so that will be cached by Fastly, and it will require a change there to cache it. Thank you, thanks very much. It will be a new service, in the Fastly naming. Great job; this visibility is always good. Next item: the Azure resources are now managed again with Terraform. That's the marketing and branding part. In fact, we don't manage anything yet: we just recreated, reinitialized the whole project. We removed all the old Terraform definitions that hadn't been updated for a year, a year and a half at least. So we will have a task to re-import the existing resources and create the new ones, but now we are ready to operate. So the next step for this one will be adding... I forget; it's an empty state for now. And there is some work in progress, or new issues, around adding managed databases, for rating at least in the short term, and for the new private Kubernetes cluster. That will use this project. Please note that these two Terraform-managed projects are using exactly the same tooling as the existing AWS, Datadog and DigitalOcean ones: the same pipeline, the same Makefile, and the same versions. So great job, people.
Thanks for having a new team for Terraform on GitHub, which allows us to have code reviews automatically pinging the correct people, and eases the management of the repositories. Thanks, Stéphane, for the huge work you did helping us bump Terraform to version 1.1: it is now fully automated across all projects, with synchronized updates. That's a really, really huge one. Thanks also, Stéphane, Hervé and team, for the auto-labelling of issues: now, based on the kind of service people select on the help desk, we have more labels automatically set. That makes it easier for us to get statistics and to use the labels on the bunch of issues we have. Really cool. Ah, I realize I forgot one issue that we closed during the weekend; sorry, I'm opening it right now. We were a bit too enthusiastic about delivering the new Maven version. I mean, it's fully automated, it was just one button, except that Maven 3.8.5 is subject to a regression. So thanks, Basil, for letting us know. The consequence is that as soon as we updated Maven last Friday, it started to break all the builds of Jenkins core, at least, maybe more. So we had to roll back to Maven 3.8.4, and we are looking for 3.8.6. Let me add that to the milestone. And I had not detected that failure in my infrastructure when I rolled out Maven 3.8.5, but I only had a day or two of testing before. So I'm glad that it was easy to roll back. I was quite surprised; I still haven't rolled back my personal infrastructure. Okay, good to know. So I still need to write down an issue; I realized that I forgot that. I think the person who did the huge work of updatecli-driven synchronized Terraform updates could do the same, to ensure that we have synchronized Maven version updates between the container agents, the VM Packer templates, and the Jenkins tools. I will do that after the meeting. And second thing: I'm responsible for deploying Maven in an enthusiastic way last Friday.
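The synchronized-update idea above (one Maven version everywhere) boils down to a drift check. In practice updatecli would do this; here is only a minimal, hypothetical sketch in which the three source names come from the meeting and the version values are invented for illustration.

```python
# Sketch of a "synchronized Maven version" drift check: given the version
# each place declares, flag the sources that disagree with the majority.
from collections import Counter

def find_version_drift(declared: dict) -> dict:
    """Return the sources that disagree with the most common version."""
    reference, _ = Counter(declared.values()).most_common(1)[0]
    return {src: v for src, v in declared.items() if v != reference}

declared = {
    "container-agents": "3.8.4",
    "packer-templates": "3.8.4",
    "jenkins-tools":    "3.8.5",   # the enthusiastic Friday deploy
}
print(find_version_drift(declared))  # {'jenkins-tools': '3.8.5'}
```

Running such a check in CI would have surfaced the mismatch between the agents and the tool definitions before users hit the Maven 3.8.5 regression.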
So, an improvement proposal that I'm putting out there to be challenged, it could be improved, could be better: try to avoid deploying changes to ci.jenkins.io, at least new tooling features, on Fridays, at least Friday afternoons. I always have mixed feelings about that kind of rule, honestly. I mean, it broke some users, but the problem is not that we deployed quickly; it's better to have the latest version quickly. What is missing is a way for our users to be able to test changes, like a canary deployment or something like that. So "don't deploy on Friday" is, for me, just a temporary measure to avoid having our users frustrated by the fact that something does not work. But it's an opportunity for us to think about how we could deploy a new version of the tooling. And one of the reasons is that Maven is sensitive for most of our users; a central tool, like the JDK, these two are really central, eventually Git as well. So we could think about a way to either propose staged agents or a canary deployment, like only 10% for one week; we could think about these elements. But for now, we need to synchronize the deployment of Maven before thinking about improved deployment patterns. So: do not deploy on Friday, for the tools of ci.jenkins.io. Is that okay for you? See, I would push back and say no, I don't think we should even restrict ourselves to that. Though, yeah, I guess it's a workable temporary thing, until we find a better way, as you suggested; is there a way to do a canary, or... right, you know, a canary for 10% looks good. Yeah. So, I will say the point here is: how do we ensure that there isn't any bug? We could ask Basil, since he opened the issue, if he has any insight, because he is the only person who was able to report that to us. So, having his insight and ideas in that area; what do you think about asking him? Because he might have ideas on how we could add a health check.
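Going back to the 10% canary idea mentioned a moment ago: one common way to implement it is to hash a stable identifier (an agent or user name) into a bucket, so the same subset always gets the new tool version. This is purely illustrative; per the meeting, ci.jenkins.io has no such mechanism today.

```python
# Deterministic canary bucketing: ~percent% of stable identifiers land in
# the canary, and the same identifier always gets the same answer.
import hashlib

def in_canary(identifier: str, percent: int = 10) -> bool:
    """Deterministically place about percent% of identifiers in the canary."""
    digest = hashlib.sha256(identifier.encode()).digest()
    bucket = int.from_bytes(digest[:2], "big") % 100
    return bucket < percent

agents = [f"agent-{i}" for i in range(1000)]
canary = [a for a in agents if in_canary(a)]
print(f"{len(canary)} of {len(agents)} agents would get the new Maven first")
```

Because the bucketing is deterministic, a broken rollout hits the same 10% all week, which is exactly what makes reverting and diagnosing easier than a Friday big-bang deploy.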
Mark, I know that you already use a job; I tend to do the same, and I checked that this job was working after the deployment. It checks that the agents can be spawned. That job is called by the BOM builds, which are big, big builds; it runs as a pre-build step. The BOM build starts by running the health-check job, which checks that each label required by that job can be spawned and unspawned, to be sure that AWS is reachable, that the correct template is available, et cetera. But that job is not able to catch a bug in Maven that only happens on specific builds. So maybe we could improve that job to run a Maven clean build, maybe on the Jumi plugin that we sometimes use in the infra for testing. Maybe it could be improved; those checks have to be kept lightweight. But if we can find something very lightweight that still does some interesting checking, why not? Yeah. But whatever health check we're doing, we will still have the possibility to crash everything on a Friday. Sure, all of those checks are still incomplete. The good thing is that we can revert kind of easily. The problem is that we need a way for everybody to be able to ring a bell and say: oh, something is wrong. Whatever health check we do will not be as good as a real-life check. That's why the 10% canary, with that kind of reversion, tempts me; but we need to make sure that if something fails on those 10%, we get the information. Exactly. And is such a check possible, even with OpenTelemetry enabled? I don't know; that's why I'd ask Basil for advice and ideas. Could we do that? The question is: I understand our developers' frustration, but if it happens once or twice a year, for one or two people... that's to be challenged; that rule has to be challenged. But I propose that we apply it for now: we just stay careful about when we deploy. That should be good enough, at least for the main tools. It depends on why you deploy.
Because if it's just a routine upgrade like that, it can wait for Monday; but if it's a security upgrade, you may take the risk and go ahead and deploy it on a Friday. That's the point of a PR: it's a pull request, and there is a brain behind it who says, okay, I approve, or not. Exactly. Okay, if it's okay for you, I will turn that into an actionable on the public CI documentation page where we list the labels and capabilities available for the builds. I will open a pull request and ask the three of you to validate it before merging, so not only one of us, and I will also mention Basil to get his advice on the pull request. Does that sound good in terms of process? It's great. Excellent. Last element we worked on: add Docker credentials to the Datadog deployments, to avoid being rate limited. Thanks for the work on that part, Stéphane. We now have a new Helm chart that deploys that credential on every namespace that requires it, including the Jenkins agents for ci.jenkins.io, where it was managed manually previously. So now it's automatically managed from SOPS, same credential, and all of Datadog benefits from it, decreasing the amount of failed Datadog installations. We cannot hear you well. Oh, continue; I think I was going to say go ahead. Okay. However, we still have issues: even with that addition to Datadog, the rate-limit issues still happen. That will be the next topic, on work in progress. Before we jump to that, did I forget some tasks that you folks closed during the past week? Not sure, but I don't think so. Okay, so that's already a lot of work. Now, work in progress: that should correspond to the open issues on the current milestone. For each one, we will cover it and see if it's something that we continue working on, or if we remove it. The goal is that the milestone should be closed at the end, with no open issues: everything either back to triage or delayed to the next milestone. Yes.
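Regarding the credential chart mentioned above: in Kubernetes, a registry credential distributed to a namespace ends up as a secret of type `kubernetes.io/dockerconfigjson`. This sketch only builds that payload; the account name and password are placeholders, not the real infra credential.

```python
# Build the .dockerconfigjson payload that a registry-credential secret
# carries. Username/password here are dummies for illustration only.
import base64
import json

def docker_config_json(registry: str, username: str, password: str) -> str:
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    return json.dumps({
        "auths": {
            registry: {"username": username, "password": password, "auth": auth}
        }
    })

payload = docker_config_json("https://index.docker.io/v1/", "someaccount", "s3cret")
print(sorted(json.loads(payload)["auths"].keys()))
```

A Helm chart like the one described would template this payload (decrypted from SOPS) into a secret per namespace, so every pod pulling images authenticates instead of sharing the anonymous per-IP quota.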
For the done ones: did we mention the status check, where we removed the logs from the GitHub status check? For me, it was implied in the AWS API key exposure item, but is that okay? We can cover it. Okay, thanks for checking. So, in the area of work-in-progress issues: Docker Hub credentials for VM agents. That one is quite tricky. Tim, and also Mark, saw some API rate limits on ci.jenkins.io builds, for jobs that were building Docker images for Jenkins, either the controller and/or the agents. After checking, we thought that we weren't using authenticated calls to Docker Hub, but in fact we are, and we did. The thing is, we reached a new kind of API limit. If you don't use an authenticated Docker engine, then you are rate limited per public IP: that's the only way for Docker Hub to track your requests. If you are authenticated with an account, that's better: it's not per IP, but per account. And we reached the limit that Hervé found in the official Docker documentation, which is 200 requests per account per six hours, which is not a lot, in fact. And we reached that limit with ci.jenkins.io builds. What are the solutions? Right now there is work in progress to improve the code of the pipeline library that defines the credentials. The idea would be to split across multiple accounts. There are different working angles there: having at least one pull account and one push account, separately, so if you reach the rate limit on the pull account, you don't endanger the ability to push new images; and also the angle of one pull account per Jenkins controller that we use, at least. And we could even push further with different Docker pull accounts depending on the cloud or the kind of agent: is it a virtual machine, or a container running on Kubernetes? Because we saw exactly the same thing with the Datadog deployment.
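The pull/push account split being worked on could look, in spirit, like a small credential selector in the pipeline library. Everything below is a hypothetical sketch: the credential IDs are invented, and the real library's code was not shown in the meeting.

```python
# Hypothetical credential selector: the operation (pull vs push) and the
# controller pick the Docker Hub account, so an exhausted pull quota can
# never block pushes. All credential IDs are made-up examples.
def docker_credential_id(operation: str, controller: str) -> str:
    if operation == "push":
        return "dockerhub-push"                 # one shared write-capable account
    if operation == "pull":
        return f"dockerhub-pull-{controller}"   # one read-only account per controller
    raise ValueError(f"unknown operation: {operation}")

print(docker_credential_id("pull", "ci.jenkins.io"))  # dockerhub-pull-ci.jenkins.io
print(docker_credential_id("push", "ci.jenkins.io"))  # dockerhub-push
```

Extending the selector with the agent kind (VM, container, Kubernetes) would be a one-line change, matching the further splitting discussed above.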
When we have a big BOM build that schedules a bunch of pods and triggers auto-scaling on the AWS Kubernetes cluster, Datadog starts to reach the rate limit when pulling its images. For Datadog there are other ways to fix that, by not using images hosted on Docker Hub: we could have our own mirror, or we could add Docker Hub proxies. But in the short term, at least, spreading the load between different accounts is quite easy to implement. Also, as noted by Hervé, we could check the open-source subscription on Docker Hub for the accounts we are using for pulling images. So Hervé opened an issue that is tracked on the team weekly notes as a to-do. The idea is to check the open-source program, because we know that we already applied to that program, and we are currently gathering information about which Jenkins-related Docker Hub accounts are subject to it. Olivier shared some information with Hervé and me by email: he never got an answer back from Docker about the jenkins4eval and jenkinsciinfra Docker Hub accounts. So it looks like only the jenkins organization on Docker Hub is subject to that program, which means the extended rate limits, and maybe other features, only apply to that one, which we don't have access to; it's not in the Jenkins infra area. So if it's okay for everyone, I will take the contact that Olivier shared with us, put the Jenkins infra private mailing list in the loop, and get the discussion started again with Docker to ask what is possible to do with them. Yes, great. Additional point: I know that on a Docker Hub enterprise account we can generate multiple access tokens to spread the API rate limits, and some of these tokens can be read-only, which adds another layer of security: you can only pull images, not push. That would be great for the pull side, yeah. So that could solve the push-and-pull account security issue, even though it's still better to use, on ci.jenkins.io, another account than an official one. Maybe jenkins4eval could be okay.
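For monitoring how close the accounts are to the limit: Docker Hub reports the pull quota in registry response headers of the form `ratelimit-limit: 200;w=21600` (200 pulls per 21600 seconds, i.e. 6 hours) and `ratelimit-remaining`. A tiny parser like this could feed alerting; the header values below are illustrative.

```python
# Parse Docker Hub rate-limit headers such as "200;w=21600".
def parse_ratelimit(value: str) -> tuple:
    """Return (count, window_seconds) from a Docker Hub ratelimit header."""
    count, _, window = value.partition(";")
    seconds = int(window.split("=", 1)[1]) if window else 0
    return int(count), seconds

print(parse_ratelimit("200;w=21600"))  # (200, 21600)
print(parse_ratelimit("176;w=21600"))  # (176, 21600)
```

Alerting when `remaining` drops below some threshold would give the team warning before builds start failing, instead of discovering the limit through broken BOM builds.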
Is there any question, or anything we forgot? Just a reminder that Tim, the last time we had API rate limits, also pushed another solution: as much as possible, try to avoid pulling images hosted on Docker Hub when we don't need to. We could benefit from the recent AWS automatic mirror of Docker Hub; that would mean paying a bit for the storage on AWS, but it's not expensive. We could also try to build some Docker proxy cache, but that one is really tricky: you either need to add some transparent proxy rules on the network, so you need to fine-tune the network and sometimes it doesn't work as expected, or you need to configure everything pulling from Docker Hub to override the image names on the go. That's why, last time, we reached the consensus that it might be too much effort; well, we could just ask Docker. Next work in progress: add an email alias for press. The question is from Gavin: he wants to add a new email alias, but the MX of jenkins.io is currently delegated to a Mailgun account that neither Olivier, Mark, I, Gavin, nor Tyler, we learned that a few days ago, have access to. The last person that could have access seems to be KK; we haven't heard from Kohsuke by email, so let's wait one more week. The thing is, we could always change the MX to whatever mail system, but we might lose the list of email accounts that are configured on that MX. So if we do that, if I understand correctly, Stéphane, I'll let you stop me and correct me if I'm wrong, we need a kind of catch-all account, and we need to discover again which addresses are in use. Yeah, we can receive all the email and then, by analyzing the mail coming in, understand who gets an address or not. We have to make sure that spam is not making us think that some account exists: you will always receive email addressed to Jim or Tom or the usual names.
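The catch-all analysis just described could be sketched as follows: collect the recipient of every inbound message and keep only addresses seen from several distinct senders, so aliases that only spammers guessed at ("jim@", "tom@"...) are filtered out. The threshold and the addresses are made up for illustration.

```python
# Sketch of catch-all mailbox analysis: an alias is "likely real" when
# mail reaches it from enough distinct senders. Threshold is arbitrary.
from collections import defaultdict

def likely_real_aliases(messages: list, min_senders: int = 3) -> set:
    """messages: (sender, recipient) pairs captured by the catch-all."""
    senders_per_alias = defaultdict(set)
    for sender, recipient in messages:
        senders_per_alias[recipient].add(sender)
    return {alias for alias, senders in senders_per_alias.items()
            if len(senders) >= min_senders}

inbound = [
    ("a@example.com", "press@jenkins.io"),
    ("b@example.com", "press@jenkins.io"),
    ("c@example.com", "press@jenkins.io"),
    ("spam@example.com", "jim@jenkins.io"),
]
print(likely_real_aliases(inbound))  # {'press@jenkins.io'}
```

Any heuristic like this would only narrow the list; getting the actual account inventory from Mailgun, as suggested below, remains the reliable path.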
On the other side, thanks again, Olivier, for pointing us to the correct interlocutor at the Linux Foundation. It looks like, if we want them to take care of the email MX instead of Mailgun, once we know what we want, we only have to open an issue, like we do for Jira: the Linux Foundation has an IT service for that. So unless someone is against it, maybe we could ask the board once, Mark, but it feels like the Linux Foundation is a good option: they could take care of the mailing system, we just have to determine which lists we ask them to create. And we've certainly liked their services on the Jira system that they host for us. The Linux Foundation has my full support; I think they did great with it. Maybe we can also try to get in touch with Mailgun to find a way to discover all the accounts we have now, even if we don't get any login. Yep, maybe they can at least provide the list of accounts. Yep, good point. Without any answer from KK, is it okay if we contact them again? Let's say the deadline is the next infrastructure meeting. Yes, try to contact them. It's already five past; I'm sorry, I talk too much, but these subjects are really important. Are you okay if we continue, at least for the next ten minutes? Yes. We don't hear you; you're muted, Mark. I have to drop off for another meeting, but please proceed without me. Thanks very much, thanks Mark. We are currently blocked on another important topic: the ability to define credentials at the folder or job level, for at least being able to migrate safely the infrastructure report job from trusted.ci to infra.ci, to reduce the AWS exposure risk. With Terraform, we don't want the Terraform for AWS to be able to access the credentials of the Terraform for Fastly. And we also need that for the updatecli migration to a centralized multibranch job. So we have at least three major tasks in the near future that require it, and we are blocked by this one. Hervé, can you give us a status update on the email notification from the GitHub cloud status? I didn't progress on it and I want to put it in the next milestone; it's more a nice-to-have. Okay, so back to triage? Yeah. Okay, is it okay if we put it back to triage, or should we just remove any milestone? I would remove the milestone. Okay, fine for me; fine for you, Stéphane? Yes. Okay, let's clear the milestone. And the same thing for the GitHub status tracking? Same thing. I'm receiving the webhooks from the GitHub status page, but I have to do something with them, probably keep an issue open with all the changes and updates. Okay, cool. So, moving to the upcoming milestone: Docker Hub credentials; define credentials at folder level. I'm clearing the "migrate infra report" milestone because we are blocked by another task; we can add it back if we are quick enough, but given our pace it doesn't make sense. Switch from GitHub Actions to Jenkins for the updatecli tasks: there is no urgency, I'm not sure we have to keep it. Yeah, okay. And as far as I remember, we discussed and agreed that 2778 should be done before: having a specialized multibranch job on infra.ci that covers the updatecli runs of all our repositories. So let me clear the milestone; that one, in turn, has no milestone because it requires the credentials to be defined at job level. It's sequential, so no milestone as well. Let's move it to the next milestone; same for this one. We have "GC AWS old images from Packer"; can you remove the label? Yep. Is it okay for you, Stéphane, to take it for the next milestone, for the coming week? I started it, so I want to keep it for the next one, yeah. Assign it to Stéphane. Yep, that's a copy of my private one, so I had to redo the issue. Okay, so the milestone for this week can be closed? Good for you? Good, yes, thank you. Oh, where is it... yeah, the 22nd, just today; you were on it. Yeah, I had the Zoom toolbar in the way. Okay, so I'm just double-checking the work in progress, just to be sure that we agree on what can be done in the next iteration, and
then we can finish with the new elements, to see if we add them. Migrate rating.jenkins.io to Azure: that means adding the managed database to the Azure Terraform. Is it okay that Stéphane and I take that one? Okay, so I unassign you, Hervé, and that's the one we steal from you. There is one I'm not sure about: update Terraform to write the stderr to a local file. That's a minor to-do after the AWS exposure. But is it really necessary, since we can consult the full log in Jenkins on infra.ci? It's nice to have, but it doesn't really matter. Yeah, I agree. Is it okay if we close it then? Yeah, it won't be implemented, since we keep the status check disabled. And so I can remove it from the milestone as well. Okay. The email alias for press, the Docker Hub credentials for VMs, GC old images, define credentials: that's already a lot of work. Hervé, you raised... thanks, you reminded me, I missed that one: someone had an issue with their account on jenkins.io. Is it okay if we take this one? I can help; someone can take it. Is it okay to have this one here? Yes. Is there anyone volunteering for it, or should I assign it to myself by default? Okay, there will be enough for everyone. This is a new one; I added it to the milestone by error, I think. So, first of all, before I go on to the monitoring of builds: is there any work that you are currently working on that we didn't mention, or forgot to add to the milestone planning? The Oracle resources, maybe; I'm waiting for Stéphane and your work on rating. Yep. And in that case, Scaleway, I don't know. May I ask you to add the milestone to the Scaleway one? Yeah. So, as we discussed a bit in private before, I was the person in contact with Scaleway, so I'm going to send an email to the Scaleway team introducing Hervé as the person that will lead the account for now, with the Jenkins infra team private email in CC, so that Hervé can drive the partnership, create the Scaleway account, see if we have the credits, and start the work on terraforming all of this in autonomy. And we plan the Oracle part for once we have done the rating migration, so that Stéphane can take the lead on Oracle, helped by either Hervé or me, which will be mostly importing resources and initializing the project. Sounds good, yes. So that should span the next few weeks; we don't add the Oracle one here, and we add the Scaleway one. So, on the new elements: monitor builds on the private Jenkins instances. We should start to implement a way to monitor the builds on trusted.ci, at least some of the most important ones. All of the details have been put on the issue. Daniel Beck, thanks for your help: he proposed a solution, more than a quick hack, where an external job will check an exported date-time. Because not only do we have to watch whether a job fails, but we also have to watch whether a job was scheduled: if the job wasn't scheduled, then you don't get a failure notification. So we need to watch for that, and we might have the same issue on infra.ci as well, and eventually release.ci. It's a multi-step process, but we absolutely have to start with trusted.ci, because otherwise it's blocking the update center, it's blocking the security team, and it's blocking our external users. And it's not the first time that Daniel has asked us to monitor that; right now Daniel is our watchdog, which is not really acceptable. Not that he isn't doing a good job at it, but that's not sustainable. So thanks, Daniel. There have been some details on that part; I propose that we put it after the current priorities, but still keep it in the upcoming milestone, because as soon as you have some time, thinking about how to implement it, asking Daniel for more details, and checking the existing elements could be a great help. For information, there is an existing process on Datadog that watches the last-updated time of the update center JSON file, so we could reuse that logic and code for the same purpose there. That's worth mentioning. We removed the shared tools item, so I remove it from the meeting notes; we've added the rating migration, so that's okay for the new ones. I would simply link to the next milestone. Yep. So let me update the notes just after; I don't want to take too much of your time. We have a new issue, not on a milestone, about migrating the trusted.ci Docker builds to release.ci; the issue has been created with details, it's just for information. And on my side, I will add the issue about adding the private AKS cluster. I don't add it to the current milestone: I want to first drive Stéphane on the rating migration, but that one will be exactly the same, so we can also have another opportunity to work together. I feel like I will have a lot of work after that. Yes. I volunteer to lead the private AKS cluster, because it's a topic I just like, and I would like to own that one, but I need one of you folks to shadow me, or at least review it. Yeah, I'd like to be on this one too. Okay, so I'm not adding it to the milestone, but I assign Hervé and me, and we will see if we have time to work on it. Sounds good. The sound is distorted; sounds good, yeah, but we didn't hear you well, that's not really sounding okay. Okay, good, but not so good. I need to update the meeting notes, then publish them. Is there any other point you want to raise? No, thank you. Okay, thanks for your work, folks, and for your feedback, and have a good day. See you next week. Bye bye.
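To close on the build-monitoring topic discussed above: the watchdog Daniel proposed hinges on checking not just failures but freshness, i.e. whether a job that should run on a schedule has succeeded recently enough. This is a minimal sketch; the job name and the 26-hour threshold are invented, not the real trusted.ci configuration.

```python
# Freshness watchdog sketch: a job is "stale" when its last success is
# older than the maximum allowed age, which also catches jobs that were
# never scheduled (and therefore never produced a failure notification).
from datetime import datetime, timedelta, timezone

def is_stale(last_success: datetime, max_age: timedelta,
             now: datetime = None) -> bool:
    """True when the job has not succeeded recently enough to be trusted."""
    now = now or datetime.now(timezone.utc)
    return now - last_success > max_age

now = datetime(2022, 3, 22, 12, 0, tzinfo=timezone.utc)
last = datetime(2022, 3, 20, 12, 0, tzinfo=timezone.utc)
print(is_stale(last, timedelta(hours=26), now))  # True: the job missed a run
```

The existing Datadog check on the update center's last-updated time mentioned above follows the same logic, which is why reusing it was suggested.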