Hello everyone, welcome to the Jenkins Public Infrastructure Weekly Meeting. Today is the 17th of May 2022. Today we have Stéphane Merle, Mark Waite, Hervé Le Meur, Prenever-Arton, and myself, Damien Duportal. Let's get started with announcements.

First of all, we had a security advisory successfully published. Everything went smoothly, and we were able to help the security team: we updated status.jenkins.io when ci.jenkins.io went down, with no issue whatsoever. We still have to gather feedback from the security team on anything we could have done better. On our side, I've noted one item: none of us, myself included, thought about opening a maintenance message for ci.jenkins.io at the beginning of the day, despite having the invitation to do so. I ended up doing it directly on the main branch, but we all forgot to prepare it. So that's an improvement for ourselves: next time there is a security advisory, we should prepare the message in advance, eventually in a local branch. That's not a bad thing, though, because since it's a security advisory, we should refrain from publishing anything ahead of time. I'm not really sure about the stance of the security team in that area, but is it okay if, one day before, we publish a mention that there will be a maintenance window on ci.jenkins.io, with no further details? I think it's okay with them, because by that point they've already announced that the advisory will be published; the fact that it will be published is already public knowledge, so publishing a maintenance window just restates what they've already said publicly. Okay. That might be a discussion to have with them, because it could become a bullet point on their checklist, where they check with the infra team that status.jenkins.io has been updated, with the risk of slowing them down if we're not available. However, they could also be autonomous on that part if we provide a runbook. So there's a balance to find there; what do you think? But yeah, it went well. The ci.jenkins.io agents are already up to date; for the Kubernetes agents, the upgraded image is currently being built.

Second announcement: the weekly release was published successfully, following the release checklist as usual. No issue whatsoever. I refrained from merging anything this morning, so thanks, people, for monitoring me. Are there other announcements on your side, folks? Actually, yes, one more: Google Summer of Code. Project selection will be announced this Friday, the 20th. We hope that the Jenkins projects will be selected. We firmly believe that we should be selected, but it's not known until they state it publicly, and even if we were to know, we're not allowed to announce it in advance. It's theirs to announce what they're funding, and we should honor their right to announce when they wish. Okay, sounds good. Let's cross fingers. Thanks, Mark.

So let's get started on the notes for today. First, the work done during the past iteration; by done, we mean closed and delivered. The migration of rating.jenkins.io to Azure is totally finished; the last pieces were the documentation and deleting the old database on Amazon. So thanks, Stéphane. As per Basil's analysis, it wasn't a huge nor high-priority improvement, but we were still able to improve the random number generation on the virtual machines that we host, especially for ci.jenkins.io. Thanks, Stéphane, for taking care of that part.
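The actual change isn't detailed in the discussion, but a common way to diagnose and improve entropy on a Linux VM is to check the kernel's available entropy pool and, if it is low, run an entropy daemon such as haveged. A minimal sketch, assuming a Debian/Ubuntu machine with systemd; the package choice and threshold are illustrative, not necessarily what was actually deployed:

```bash
#!/bin/bash
# Check the entropy available to the kernel's random pool.
# Values consistently below ~1000 often indicate starved /dev/random consumers.
entropy=$(cat /proc/sys/kernel/random/entropy_avail)
echo "Available entropy: ${entropy}"

if [ "${entropy}" -lt 1000 ]; then
  # haveged feeds the kernel pool from CPU timing jitter, a common
  # remedy on virtual machines that lack hardware entropy sources.
  sudo apt-get update && sudo apt-get install -y haveged
  sudo systemctl enable --now haveged
fi
```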
I've checked on my side and put reports on the issue. All the Kubernetes-hosted controllers have everything they need, because there we use the cloud provider's default underlying machines, and we trust the cloud provider to give us a good default setup, which we didn't have on our own machines. So thanks, Basil and Stéphane, in that area.

We had a bunch of day-to-day administrative tasks, such as maintaining GitHub teams and helping other contributors get repositories for different projects. One notable change: we are now managing incoming Crowdin requests through the helpdesk. So thanks a lot, Hervé, for dealing with that and for putting in place all the automated tooling to help them, because it means not only us but also others benefit from the helpdesk. It's a work in progress, but it has already triggered the interest of other people, so it's a good tool for sharing visibility in a single location. Clearly, it's a huge improvement. I've also added suggestions for Alex and you, Mark, on his request in the repository. Nice.

It sounds like there was an Artifactory issue; I was off that day, so thanks a lot for managing it. The issue was closed. Do you have any feedback or enhancements we could make in that area? Having the status updates published in the helpdesk could help; I'll have to find a way to do it properly. Nice. It seems we still have the problem that only I am able to send them email, and until we solve that with JFrog, which I hope they will start doing, it remains a bit worrying. So I propose this: don't hesitate to call me on my phone in such cases, because it's only an email for me to send. Unless I really have no cellular data at all, I can do it even on holiday; it's a temporary situation, so next time do not hesitate if you feel it's necessary. It seems Tim was able to handle it by pinging them on different email threads, which works well, so let's continue doing that, but don't hesitate in that area.

A question, because it's one of the other tasks, around Datadog. Stéphane, you first worked on fixing the synthetics tests we have on Datadog, one subset of our monitoring; thanks for that. You were then able to add a monitoring probe, a new synthetics test, on the JFrog UI. Were you able to check that it fails correctly and what the monitoring did? We did a test with a fake expected status code of 205. Yes, but what about during the incident four days ago? At that moment the UI was down, as reported by a user, and I remember, Stéphane, you said the alert was triggered. Yes, and I thought you would receive a lot of pages, but I forgot that on those days you would not. I'm assuming I received some, but not too many, and I didn't check that part. It was red in Datadog; it was red, and I saw that you folks handled it, so I didn't check in detail. If you checked and you are sure, no problem; otherwise, don't hesitate to look in Datadog, because it keeps months of history. Nothing is deleted, so we can still change the time span in Datadog and see whether it was red during the incident. I will do both, but I'm pretty sure it was. Cool, great job then.

So those were the other elements, and that's all for the fully closed items. Did I forget a task that you closed and that wasn't mentioned, or that is not part of the work in progress but was still done during the past week? I don't think so. Okay, cool. Let's jump to the work in progress. I've tried to set the priorities while preparing these notes.
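For context on the synthetics test discussed above: the Datadog test itself is configured in their UI or API, but what it asserts amounts to an HTTP status-code check against the Artifactory UI. A rough shell equivalent of that assertion (the URL and expected code here are assumptions for illustration; the 205 mentioned in the meeting was a deliberately wrong expectation used to verify that alerting fires):

```bash
#!/bin/bash
# Probe the JFrog/Artifactory UI the way the synthetics test does:
# fetch the page and compare the HTTP status code against the expectation.
URL="https://repo.jenkins-ci.org/"   # assumed target; adjust as needed
EXPECTED=200                         # set a wrong value (e.g. 205) to test alerting

code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 30 "${URL}")
if [ "${code}" -ne "${EXPECTED}" ]; then
  echo "ALERT: ${URL} returned ${code}, expected ${EXPECTED}" >&2
  exit 1
fi
echo "OK: ${URL} returned ${code}"
```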
Maybe just one more thing in the done section: the fix for retrieving the next draft release. Good catch. It's not normal, but we've had many situations with several "next" draft releases, and the script used in our pipelines and the Jenkins Maven CD GitHub Actions workflow was returning all of them as a multi-line string, making the pipeline fail. So I fixed the jq expression to retrieve only the last one. But we still have to fix, or at least understand, why we get multiple next draft releases. Okay; there is no helpdesk issue for the multiple-drafts problem. Do you mind creating one, but not closing it? Mention the link I've put in the notes, because as you said, we still have to find why we have multiple next releases. I will put the explanation on that one; I shared it on a private channel, at least with Stéphane and Hervé. To summarize, it's because we have different events triggering release-drafter on GitHub Actions, and since we both create releases and update them, multiple triggers can happen concurrently. I've opened an issue on release-drafter to explain the case. In the short term, we most probably have to add a kind of lock so we serialize the release-drafter runs on each repository; in the long term, we should stop triggering release-drafter from GitHub Actions and put everything under our Docker (or any other) release pipeline library. That shouldn't be that much work, because it's a Docker image with one-shot Node.js commands, so wrapping it in a custom shared library for Jenkins, so we can run it from the pipeline, will let us control the full flow; right now we have Jenkins and GitHub Actions working concurrently. It's not the highest-priority topic for now, unless it's blocking us; let's see if the lock unblocks us in the short term, but if you're interested in writing the shared library, it could be a great exercise. Any contribution welcome. Thanks, Hervé, for taking care of that annoying part, because it had been blocking all our Docker builds for two weeks, and that was not nice, so thanks a lot. Other topics that we fixed and I forgot? Okay, so, work in progress.

mirrors.jenkins.io: sunsetting MirrorBrain and consolidating our mirrors infrastructure. The blog post is published; thanks a lot, folks, for helping review and publish it. The date is this Thursday, the 19th, when we will switch the DNS to the new get.jenkins.io; then, in the upcoming days, we will start removing the MirrorBrain pieces from our assets: virtual machines, Puppet, documentation, etc.

DigitalOcean: email sent. The cluster, as we said last week, has been stopped; we can stay five months in that state before really running out of credits. No builds have been handled by DigitalOcean since last week, and the email has been sent, with the team in CC, to the DigitalOcean marketing people we were in contact with; the Jenkins board has been CC'd as well. So now let's wait for their answer: either they say no, and we can close down the DigitalOcean tooling and area, or they say yes, and then, nice, we can start building again. No questions.

The next topic is the deprecation of the Datadog feature that we used for the Datadog-to-PagerDuty integration. It has been replaced or changed, I don't know. Hervé, do you mind updating us on that area? I didn't have a lot of time to look at it, but I've looked in Datadog and there is an active PagerDuty integration, though it seems one-way only. I have to check; maybe I'll contact PagerDuty and ask them whether we can try it and benefit from it as an open source project. I don't know. Okay.
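Coming back to the draft-release fix at the top of this section: the exact expression used in the workflow isn't quoted in the meeting, but the shape of the fix is to collect all draft releases and keep only the most recent one, instead of emitting a multi-line list. A sketch using the GitHub CLI (the repository name is a placeholder; note that draft releases are only visible to authenticated callers):

```bash
#!/bin/bash
# List releases, keep only drafts, sort by creation date, and print the
# tag of the most recent one - a single value instead of a multi-line list.
REPO="jenkins-infra/example-repo"   # placeholder repository

gh api "repos/${REPO}/releases" \
  --jq '[.[] | select(.draft)] | sort_by(.created_at) | last | .tag_name'
```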
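And for the short-term lock idea mentioned there: GitHub Actions has a built-in `concurrency` setting that serializes workflow runs sharing a group key, which is one plausible way to keep concurrent release-drafter triggers from racing. The workflow path and group name below are assumptions, and the snippet is shown as a shell heredoc purely for illustration; it assumes the file doesn't already define a top-level `concurrency` key:

```bash
#!/bin/bash
# Append a concurrency block to the release-drafter workflow so that only
# one run per repository executes at a time; extra pending runs are
# coalesced to the newest one rather than racing each other.
cat >> .github/workflows/release-drafter.yml <<'EOF'
concurrency:
  group: release-drafter-${{ github.repository }}
  cancel-in-progress: false
EOF
```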
And there was another solution proposed in the Datadog deprecation notice, so I can also look at that other one. Okay. Are you willing to keep that item in your area and work on it next week, or do you want to pass it over? It's okay to continue. Yes. No questions on that one.

Mirror in Singapore: the status is that we are still retrieving data from different areas. We lacked the time to do it properly. We need to gather the information and write a runbook, so we have a standard location with, let's say, the information to pass along to a proposal to spin up a mirror. Hervé and I are still allocated to that part; I mostly have to search my history. Still okay to keep this one, or does someone else want to take over, or do you want to pass it over? I can keep it. Okay, so let's keep it in work in progress for the next iteration.

Jenkins: building our own Windows images. Thanks a lot for the huge work you did in that area, Hervé. I broke some things on the Packer images that you had to fix while I was working on other areas. We did a lot of templating, so now the Windows images are using Chocolatey, or are on the verge of using it. That will allow us more feature parity between the Linux and Windows virtual machines. I understand there is still some work in progress; is it okay to continue this week, Hervé? Yes, I don't have any more blockers, except, as in the case of the containers, the Container Structure Test tools from Google that we use to test the images, since they are completely different on Linux and Windows targets. So I think: test if a Windows version exists; if so, run it; and if not, skip it. Okay, let's skip it on Windows. Are there any questions, blockers, or things to mention on that topic? No. Okay.

So let's jump to the next one, around Terraform for Oracle Cloud. We have three steps. The first one, mandatory and required, is to initialize a Terraform project with its backend and credentials, so we can start managing Oracle Cloud with a Terraform provider like we did for Azure, DigitalOcean, Amazon, and Datadog. Once that one is created (it's a kind of implicit task that could be part of either of the two issues on the milestone), then, in parallel: first, we have to prepare a new virtual machine for updates.jenkins.io that won't be used immediately, but that we need to provision with Puppet and test. Second, we have to import the actual Oracle Cloud resources that were created manually by the team one year ago and start managing them with Terraform, using the terraform import command. There is at least the archives.jenkins.io virtual machine with its data volume, and I assume some security groups around it. I remember we had a second virtual machine, but I haven't looked. Mark, do you remember if we have another service on Oracle Cloud? We have several; for instance, I'm borrowing some capacity there to run small agents for my private test Jenkins server. Do you feel like these machines can stay out of the automated management, or do you want us to manage them with Terraform? No need to manage those with Terraform; we can have a mix of both. The important parts are the persistent services, or the services that need management by anyone on the team. So archives is a candidate; I don't know if we have others. So, are you sure archives is unmanaged? As far as I knew, it was managed. It's not Terraform-managed. Oh, not Terraform, I see. Well, then we certainly want it Terraform-managed.
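Back to the Windows image tests for a moment: the idea described above is simply to probe for a usable container-structure-test binary and skip the verification step when the platform doesn't provide one, instead of failing the build. A minimal sketch; the config file name and image variable are placeholders:

```bash
#!/bin/bash
# Run Google's container-structure-test against the built image when the
# tool is available for this platform; otherwise skip instead of failing.
IMAGE="${1:-jenkins-agent-windows:latest}"   # placeholder image name

if command -v container-structure-test >/dev/null 2>&1; then
  container-structure-test test --image "${IMAGE}" --config tests.yaml
else
  echo "container-structure-test not available on this platform, skipping tests"
fi
```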
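As for the import step just discussed, the flow with the OCI provider is: write a resource block in HCL matching each manually created resource, then bind it to the live object with `terraform import` and its OCID. A sketch with hypothetical resource addresses and placeholder OCIDs (the real names and IDs live in the Oracle Cloud console):

```bash
#!/bin/bash
# One-time import of manually created Oracle Cloud resources into Terraform
# state. Each resource block must already exist in the .tf files; the OCIDs
# below are placeholders to be read from the OCI console.
terraform init   # configures the backend and downloads the oci provider

terraform import oci_core_instance.archives \
  "ocid1.instance.oc1...placeholder"
terraform import oci_core_volume.archives_data \
  "ocid1.volume.oc1...placeholder"

# Afterwards, a plan should show no changes if the HCL matches reality.
terraform plan
```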
And if the unmanaged things that I've got out there are in any way a distraction for the team, let me know and I will destroy them; they're just there serving me as small machines that I use as agents. Okay; since they're used most of the time for checking releases, it makes sense to grant community credits in that area, and that should not be an obstacle for us to manage the rest. However, for the person who takes that task: when you see a resource that you think should be managed by Terraform, please ask Mark for validation, and double-check with Mark and the rest of the team, just to be sure whether it's okay to leave it managed from the UI rather than imported into Terraform. So the question is who wants to take one of these tasks, because right now they're allocated to Hervé. Stéphane, are you interested in pairing with me on importing the unmanaged Oracle Cloud resources? Of course I'm interested; I'm forcing myself not to answer the question, because of course I want to do everything. Compared to the tasks you'll have in the upcoming days, taking into account the travel, do you think it will be compatible with your workload, given you closed a lot? It's up to you, because if you want to pair with me it will have to be either in Brussels or when I'm home, so it will block you during my travel. No problem. So let's get started on that one if it's okay; it will involve creating the Terraform project, and Hervé, if you're okay, we delay the from-scratch machine until the rest is done. Is that okay for everyone? The from-scratch machine, sorry, I don't get it. From scratch: you had some Terraform code, you opened a pull request; I want to merge it and create a new machine, a new service. Ah, sorry, you're talking about Terraform to provision the new machine. Okay.

Two other tasks. ci.jenkins.io: sorry folks, with my two days off I totally lost track of merging and publishing the reports that we did with Basil. Since we didn't have any outage, and Basil acknowledged that there is no danger for ci.jenkins.io, only a risk of builds having to wait, it's a nuisance more than a blocker. That's why we didn't press on it, and besides, Basil has a lot of tasks, so we didn't have that much time to elaborate on reproducing the issue. So I propose we delay until next week; I will take care of finishing the report and sharing it, so we can close these issues and open the tracking issue for the tasks we want, so we can start prioritizing. Is my understanding of the situation correct, Mark? Do you agree? Or Stéphane, as you were there? Yes, yes, sure, yes.

And the last work in progress is auto-notifying people based on service routing rules. I assume it's an issue-management task, Hervé? Yes, work in progress too. Okay: the goal is to notify, to ping, maintainers or other people or teams depending on the service selected in the issue. Nice, a nice improvement; that will help a lot.

Okay, just one look at the incoming tasks first, and the new important one you mentioned: Archera, which we discussed with them. Yeah, we don't have a helpdesk issue; I don't know if we should open one. Yes, can I ask you to open one? However, just a reminder, and I'm asking you to challenge what I'm going to say, but for me it's really low, low priority. Even though it's a partnership, their tips so far are along the lines of "use spot virtual machines on AWS" and "pay upfront to reserve some hypervisors to run your VMs". It sounds like their way of helping cloud users does not fit our model.
We still want to try things with them, but we clearly have more important areas, and since we are not able to finish our current tasks as it is, I don't see a priority in that area. No, but I think, at least, they said the activation of their service was quick; I think if we could send them data, they would already have something to work on, and it wouldn't take us much time. Okay, so let's add this one to the current milestone, and I propose the following rule of thumb: we open the helpdesk issue for Archera, but if we are not able to work on it in the three upcoming iterations, the three upcoming weeks, given all the things we have to do, then we tell them we don't have the time at all, so they won't keep waiting for us, because I don't want to annoy them too much. And if in three weeks we haven't found more time in that area, it means we don't have the bandwidth. Does that seem a good showstopper to you? Sounds fair to me. I don't think we're doing them any harm by not engaging with them, but if we can't engage with them in the next three or four weeks, it's fair to them to say: hey, we apologize, we've looked at it, we just don't think it's a great fit right now. And in that case, I will take the role of telling them, since I'm the one proposing the showstopper; I don't want to put any of you in that situation. You are also welcome to pass that to me, Damien; I'm the one who started the investigation with them, and it's also fine for me to go back to them and say: hey, we're sorry, our workload is just not such that we can do this exploration right now. Thank you for taking care of it, and thanks, Mark, for proposing. I've locked my Mac.

On the upcoming important elements: there has been an issue raised by a user on jenkins.io, which I've opened, about adding a permanent HTTP redirection from the old jenkinsistheway.io to stories.jenkins.io, because most of the links that you can find on the internet, or in the jenkins.io website itself, are broken; most of the time it's an old blog post, for instance, that triggered the issue. So the proposal is to add a single Ingress rule on the public cluster. If we're able to point the jenkinsistheway.io name to Kubernetes, then we don't even have to build a Docker image or a web server or anything: Kubernetes will take care of the redirect for us. So Kubernetes would then serve, from stories.jenkins.io, any request for jenkinsistheway.io? That sounds great. Yes: you create an Ingress rule with the correct annotation, which the ingress controller uses to create a virtual host for the jenkinsistheway.io domain, if we can point it there, and that will answer with an HTTP permanent redirect; you can customize the code: 302, 308, 301, whatever you want. And you don't need to create a Kubernetes Service or Deployment behind it to serve anything; you just need the Ingress rule. So it's a kind of easily managed redirector, and it should be quick to do. Is anyone interested in taking that one? Do we know easily who owns jenkinsistheway.io and where the DNS is handled? Yeah, Alyssa Tong has ownership of it and can help with questions about who the DNS provider is. Stéphane, by asking that question, you just designated yourself a volunteer. That's what I hate too.
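A minimal sketch of the Ingress-only redirect described above, assuming the public cluster runs the ingress-nginx controller (which provides the `permanent-redirect` and `permanent-redirect-code` annotations). The Ingress schema still requires a backend reference, but it is never reached, because nginx answers with the redirect before proxying anywhere; the service name and namespace here are placeholders:

```bash
#!/bin/bash
# Create an Ingress that answers every request for jenkinsistheway.io with a
# permanent HTTP redirect to stories.jenkins.io - no Deployment or real
# Service needed behind it.
kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: jenkinsistheway-redirect
  namespace: default                 # placeholder namespace
  annotations:
    nginx.ingress.kubernetes.io/permanent-redirect: https://stories.jenkins.io
    nginx.ingress.kubernetes.io/permanent-redirect-code: "301"  # or 302/308
spec:
  ingressClassName: nginx
  rules:
    - host: jenkinsistheway.io
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: placeholder    # required by the schema, never reached
                port:
                  number: 80
EOF
```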
If it's okay, since it's a Kubernetes area and you're willing to grow in that area, you are assigned to the issue; you can bump on me or Hervé if you need technical help. Does that sound good to you? That's perfect. You have the right to say no, and it's not a problem. I know, I know, but I want to say yes to this thing, so yeah, go ahead, love it.

On the infra side, next topic: are there main things that you think would fit the next iteration, given everyone will be traveling to Brussels except Mark on Monday and Tuesday? Do you see other tasks on the infrastructure milestone, or in the pile of tasks that we have, that you think are important to add to the milestone as a "let's work on it"? Okay, so let's get started with the current work in progress. As a reminder, if you're able to fulfill all the other tasks, then you can take issues from that list; it's a kind of back burner: pick the one you want to work on based on your own interest. I like "back burner". Okay, is there any other topic you want to mention for this week? Nope. Okay, so let's stop the recording for today, and the screen sharing as well.