Okay. Hello everyone. Happy New Year 2022. Welcome to the first Jenkins infra open source weekly team meeting. Today we have Hervé, Stéphane, Mark and Tim. Hi! Let's get started with announcements. This week we have the weekly release, like every week. I saw that the Docker image and the Debian package were available one hour earlier, but the job was waiting for the usual Windows MSI synchronization. I don't know if it failed or succeeded. It failed: it spent 100 minutes waiting for pkg.origin.jenkins.io to respond to an SSH call, so I stopped it and restarted it. I see nothing wrong, other than that for some reason pkg.origin.jenkins.io, the same machine that received several other artifacts through SSH, didn't respond to this Windows request. So it's running again, and I made a note of that in the jenkins-release IRC channel for reference. Sorry. No, no need. It's just... Okay, thank you very much. Second announcement: Hervé has been granted administration rights on the GitHub organization. It was requested three weeks ago on the mailing list. He had to sign the CLA, which has been done. So unless I missed something, everything is okay. I took the opportunity so we can work on managing all the repositories and help onboarding Stéphane and other newcomers. Just a round table: is it okay, or is there anything I missed, or do you want us to cut his access? No, it's fine. The only thing is, generally you don't need it too much, because the infra team is normally admin on pretty much all the repositories anyway. Exactly. It doesn't change a lot, except that he is now able to manage GitHub Apps and organization secrets, so that could be pretty useful. Okay. I don't have any other announcement. Are there other announcements? Okay, let's go ahead. First topic. Tim, it's your subject: GitHub issues for infra. Yep. I've briefly mentioned it a couple of times over IRC in the last week.
So we just switched the hosting process over completely to GitHub issues, and that seems to be going quite nicely. I also created a test repo for the infra project. But yeah, the hosting one is a really good one to have a look at. If you just go to the repository-permissions-updater... No, it's not that one; it's just not called what you think. It's just called repository-permissions-updater. Sorry, can you repeat the repository? Yeah, that one. And if you click Issues and New issue... Oh, I'm not logged in. I am. Okay. New issue, hosting request, the top one. So this is all defined as code and you can easily change it however you want. It's a lot nicer than Jira, because Jira is very limited in who can change it, and even as someone who has access, it's pretty horrible changing it: it's hard to find your way around, you're waiting for it to resync and everything. Even having the access, it's horrible to change. Whereas here, you can change it as easily as you want. Okay. I just sent a link in the chat to the one I made as an example of what could be used if people did want to move. It's in the Zoom chat, you've got it there. Okay, I've added it to the notes. And you can add things like FAQ entries in there for common issues. Cool. So I created two templates there: one for a specific common task, which was GitHub permissions, with some specific fields on it. Cool. Tim, Damien, could you go back to New issue? So "Community forum" just opens community.jenkins.io. It encourages people to ask questions there rather than posting a GitHub issue? Yeah. Excellent, that's brilliant. And then there's a generic issue I created, which is, I think, generally based off what the infra project has set up at the moment, but with a bit of special text. That's cool.
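The community-forum entry Tim mentions can be sketched as a contact link in the repository's issue-template config. A minimal, hypothetical example (the wording and the assumption that blank issues are disabled are illustrative, not necessarily what the infra repo actually uses):

```yaml
# .github/ISSUE_TEMPLATE/config.yml -- hypothetical sketch
blank_issues_enabled: false   # force people through a template or a link
contact_links:
  - name: Community forum
    url: https://community.jenkins.io
    about: Please ask questions on the forum rather than opening an issue here.
```

With this in place, "Community forum" shows up as a choice on the New issue chooser page, next to the issue templates.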
So — I haven't created any issues on this — if you just go to New issue and create a general one. Yeah, Damien, I wanted to see the... okay, so even the general issue is a form. Is that right? Yeah. Oh, that's nice. So it at least motivates people: please give us enough information to reproduce it, and please use a numbered list — we really need to know the steps you took. Yeah. I mean, the general jenkinsci one has got more fields and stuff, which has really made things a lot better for the jenkinsci GitHub issues. The issues I've been getting recently have been a lot better, and even things like the environment section for jenkinsci issues being collapsed — it's all working really nicely. But infra generally needs a bit less information, depending on what it is. So, cool. Yeah. Should we create a label like "triage"? It kind of depends on how you want to work with it: do you want everyone to review an issue and move it somewhere? I was planning, if people wanted it, to do some sort of auto-labelling based on the service selected from the dropdown. So if you click a service... I don't remember if it was you who listed some example of a GitHub Action which could reference similar issues. Do you have a comment? I think Gavin posted that in Gitter. Yeah, it will ingest all your issues into Algolia and then post a comment. It's not quite as good as the plugin Jira has for it, but you can do that. So the thing I was looking at here was adding a service field, and then adding something which would automatically add a label from that, which would then have some configuration for who wants to get notified for which labels. One thing where GitHub is lacking is that there's no fine-grained subscription service; people have to build their own. But yeah, I added a bit of text near the top saying this is for the infrastructure of the Jenkins project,
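The auto-labelling idea Tim describes could be sketched as a small workflow that reads the rendered issue body and applies the dropdown value as a label. This is a hypothetical sketch, not the real infra setup: the field heading "Service", the workflow name, and the assumption that the label matches the dropdown value verbatim are all mine.

```yaml
# .github/workflows/label-service.yml -- hypothetical sketch
name: label-service
on:
  issues:
    types: [opened]
jobs:
  label:
    runs-on: ubuntu-latest
    permissions:
      issues: write
    steps:
      - uses: actions/github-script@v6
        with:
          script: |
            const body = context.payload.issue.body || '';
            // Issue forms render a dropdown as "### Service" followed by the value.
            const m = body.match(/### Service\s+([^\n]+)/);
            if (m) {
              await github.rest.issues.addLabels({
                ...context.repo,
                issue_number: context.payload.issue.number,
                labels: [m[1].trim()],
              });
            }
```

A notification scheme can then be layered on top by having people subscribe to specific labels through whatever home-grown mechanism the team picks, since GitHub has no fine-grained subscription feature of its own.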
moving out the general Jenkins issues. It should help with those issues anyway, because as everyone says, you get a number of issues in there that have nothing to do with Jenkins infra. But it's still worth reinforcing. So, Tim, we could use this kind of technique for the www.jenkins.io issue reports that come in; we could use these kinds of forms there as well. Can they be pre-filled? Can they take data from a parameter in the URL? Yep, they can. So if Damien brings up the YAML file again... yep, any of them. If you click edit — unfortunately it doesn't render the YAML. That's nice, which is great, but not what we wanted. Yeah, so on there, basically, each field is an entry in an array. If you set an ID on a field, then it can be auto-filled from the URL. If Damien just goes back — click Issues and then New issue. Yeah, I had the Zoom pop-up, sorry. And if you just click the top one again, you'll see in the URL bar there are some parameters. Those are built-in parameters, but if you set IDs on any of your fields, then you can pre-fill them that way. Yeah, so "&url=blah". Yeah, see how you can do that. And you can auto-fill the service field with that sort of thing, if you'd set the ID — there's no ID set at the moment. I didn't catch that one. Is it the ID? It's in the YAML config; you need to set an ID for each field. Yeah, so basically, Mark is talking about the "report an issue" link, which includes the URL, and he wants to have it filled automatically. So currently we pre-populate a whole body, which is pretty nasty. But this means you can just put the actual URL without having to pre-populate a lot of stuff. Well, we populate the whole body, and then the submitter deletes the whole body or meddles with it in a way that makes it completely useless.
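The ID-based pre-fill Tim walks through could look roughly like this. A hypothetical issue form (template name, field labels, and dropdown options are all illustrative) where each field carries an `id`, so a link such as `https://github.com/ORG/REPO/issues/new?template=bug.yml&url=...` can pre-populate it:

```yaml
# .github/ISSUE_TEMPLATE/bug.yml -- hypothetical sketch
name: Bug report
description: Report a problem with the website
body:
  - type: input
    id: url            # "?url=..." in the New-issue link pre-fills this field
    attributes:
      label: Page URL
  - type: dropdown
    id: service        # "?service=..." would pre-select an option here
    attributes:
      label: Service
      options:
        - www.jenkins.io
        - plugins.jenkins.io
  - type: textarea
    id: description
    attributes:
      label: What happened?
```

That is what would replace the current pre-populated body: the "report an issue" link only needs to carry the page URL as a query parameter instead of a whole templated body the submitter can mangle.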
So this gives me much better hope that we'll get useful, better submissions. Excellent. Thanks, Tim. The next step now will be that we as a team should start using it for all of our requests, and we need to migrate the existing Jira infrastructure issues onto it. These are the next high-level tasks. I don't know if you have thought about that; I assume there are a bunch of tools for it. Yeah, I've done migrations before; I've done it on quite a few of my plugins. I wonder if it's in my forks — it's definitely checked out somewhere locally. So, Tim, have you found it useful to do the migration from Jira, rather than just leaving things in Jira and working in a hybrid mode? I prefer to just be off Jira if I can. Okay. It's a lot nicer to have one single source of truth, and it's really annoying having people reporting new issues over in a different place. You can kind of mitigate that, because on a whole Jira project you can stop people creating new issues. But I didn't migrate anything that was closed or resolved; I just migrated issues that were still open. I have a few other questions. The first one: you can reference other issues from other repositories. So that one covers the Jira infrastructure project. Then a general question for everyone: we have the concept of the epic, which can be used in different ways in Jira. The epic is like a super-task that lists a bunch of tasks, each one atomic as a work item, under a general theme like "deploy the new service" or "apply automation". The epic should be well-scoped — even if it's a big scope, it's still a scope; you can close it at some moment in time. I don't know, on GitHub issues, even when using projects, is there a way to have this?
Because in that case, I see that the epic would be an issue in that repo, or in a new repository like this one, that would link all the sub-tasks across all the repositories. So there are a few ways of doing it in GitHub. There are milestones, so you can group a set of issues together by assigning them to a milestone. There's also labelling for grouping issues. And then, generally, you have a project or backlog: if you go to Projects, you can have a project for that piece of work, or for your backlog. You can use the top one — the top one is a lot more powerful: you can create lots of different views, different people can have their own views, and you can have it at the repo level or across the organization as well. So then it's kind of about how you want to work. Labelling tends to work really well, and you can also just define a top-level issue, with labels to link them together. Okay, that should make it. I propose that we try it for the tasks we have right now. I assume that will involve migrating a few existing items from Jira, like the configuration-as-code or Terraform-related ones we are working on, moving them here, individually or aggregated. And in parallel, if it's okay, we should migrate the issues that are currently open; I don't feel that we need the closed issues. The goal will be to try this this month. That's a proposal I'm making, to see if it's a valid replacement both for, let's say, project management and for the helpdesk — it should be the same thing. How do you feel about that? Is it a good idea for you? So we want to take that as an action: that means taking the team's current work and the helpdesk and entering them there. Hervé, are you motivated to work with Tim on that? Thanks a lot, Tim, for that.
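A migration like the one discussed here boils down to mapping each still-open Jira issue onto a GitHub "create issue" payload. A minimal sketch of that mapping, assuming Jira's REST export format on the input side; the title prefix, migration footer, and component-to-label scheme are my own illustrative choices, not the tool Tim actually used:

```python
def jira_to_github(jira_issue: dict) -> dict:
    """Map one exported Jira issue onto a GitHub create-issue payload.

    Only open issues get migrated, matching the approach discussed:
    closed/resolved tickets stay behind in Jira.
    """
    fields = jira_issue["fields"]
    body = (
        f"{fields.get('description') or ''}\n\n"
        f"_Migrated from {jira_issue['key']}_"
    )
    return {
        # Keep the Jira key visible so old references stay searchable.
        "title": f"[{jira_issue['key']}] {fields['summary']}",
        "body": body,
        # Carry the Jira component over as a label, so label-based
        # notification rules can key off it on the GitHub side.
        "labels": [c["name"] for c in fields.get("components", [])],
    }
```

The resulting dict is exactly the JSON body the GitHub REST "create an issue" endpoint accepts, so the actual migration loop is just this function plus one authenticated POST per open issue.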
Yeah, I wasn't sure if you were interested in different types for stories and that sort of stuff, or just one generic issue. I don't remember... I also saw a link to Eclipse doing this for epics; I don't remember where I saw it. Does it ring any bells? No, not sure. It seems like it's putting those gibberish IDs in, even though I've set my username properly in here, and it had something that wasn't my usual username. Thank you very much. So let's see how it behaves, but that's really, really cool. Thanks, Tim — you did in ten days what would have taken a few weeks. Are there other questions about the issue area on GitHub? Okay, let's proceed to the next topic: the issues we had during the past two, three, four weeks. First one: we faced network issues that started, let's say, three weeks ago, middle of December. We started to see, only on the Kubernetes instances, and in particular on the Jenkins Kubernetes agents, a lot of TCP issues. Most of the time it was random connections being cut when trying to reach outside the cluster, like cloning a repository from GitHub or pulling a Docker image. It wasn't one particular kind of domain or IP or class of IPs; it was completely random. So Hervé and I checked the Azure console and saw some alerts in the Azure UI — that was the topic we presented two weeks ago. We have an IP overlap; these alerts have been raised for a year and a half on the Azure cluster. We have IP overlap between different features of the current Kubernetes cluster. After digging through the documentation, we have no solution other than creating a new cluster: we cannot change the IP range that is overlapping with our virtual network without creating a new cluster. That means, as we said two weeks ago — and I want to restate it — that will be the next major task for us.
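The kind of collision described here can be illustrated with Python's `ipaddress` module: two CIDR ranges that claim the same addresses. The ranges below are made up for illustration, not the actual Azure virtual network or cluster configuration:

```python
import ipaddress

# Hypothetical ranges, for illustration only.
vnet = ipaddress.ip_network("10.0.0.0/16")         # virtual network
pod_range = ipaddress.ip_network("10.0.128.0/17")  # overlapping feature range

# overlaps() detects exactly the kind of collision Azure was alerting
# about: both ranges contain the same addresses, so traffic can be
# routed to the wrong side.
print(vnet.overlaps(pod_range))  # True: every pod IP is also a vnet IP
```

And because the range is baked into the cluster at creation time, the only fix once an overlap exists is what the team concluded: build a new cluster with non-overlapping CIDRs.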
That's a great opportunity to split the existing cluster, named prodpublick8s, into two new clusters, privatek8s and publick8s. The public one will host the services that are public-facing, and the private one should be on a private network, only available through VPN, in order to have more safety around the Jenkins instances with sensitive credentials, such as infra.ci or release.ci. They will be on a physically separated network. That work wasn't done before, but it will be an implementation of the IEP from Tyler, IEP-2, because in fact it wasn't implemented correctly. And that one describes at least the production network; it's not the development network. That one could be raised in the future for, let's say, the pull-request instances for developers of applications like the plugin site that run on the public side. But right now we are talking about correctly using a VPN with the private production network. So that means we will have to create a brand new private production cluster, migrate everything that should run on it — such as infra.ci, which is already done, but also release.ci, Grafana and such — and then create a new public production cluster and migrate everything onto that one. We have to expect an outage on some services. The main one will be LDAP, because we'll have to migrate a stateful application, LDAP, from the current buggy cluster to the new public one when it is created. I mean, it should only be a couple of seconds; it's just a DNS swap, isn't it? It might not even be an outage. Yes, depending on how you do it. We will have to stop the applications that can write to LDAP, though. But yeah, if we do the swap properly, we should avoid an LDAP outage. It might even be faster than restarting the bot, right? Those are the next steps, and we have to make this high priority, because — thanks to Tim's research and knowledge sharing — the reason it started to appear mid-December is that Microsoft has rolled out a new network infrastructure system.
So it has been confirmed that we are not alone in having this issue. And the alerts that had been raised for months or years started to have a meaning: with the IP overlaps, packets weren't being routed correctly. Could we raise a support ticket and get them to roll it back? I don't think they could, given the size of the change, but yeah, maybe. No — I mean, they were doing an incremental rollout to customers, and they paused the rollout partway through December because they didn't want any more issues around Christmas time while people were on holiday. I wouldn't be surprised if you could ask them to roll it back. I mean, sure, we can fix this, but it'd be nice if they could undo it now. I think that's worth it, opening a support ticket. I'm not sure it's really an issue, in the sense that by moving infra.ci out of the existing cluster, we don't see the TCP issues anymore, mainly because the IP overlap was caused by the additional IPs requested for each external request made from the Kubernetes agents. So we are just back below the thresholds. If we start migrating other services, we should be okay. Wouldn't we hit it on release.ci as well? No — I wanted to experiment and fix all issues on infra.ci first, because we have to fix that one before operating on the next. I mean, won't we hit the same issue with release.ci, or is it just that we have fewer jobs running now? Oh, we had the issue until we stopped the increased activity on infra.ci. So now both are working as expected; it's a kind of equilibrium. Yeah, I guess it'll work. I'm wondering whether that's related, because even when we swapped to the new cluster, we used to have these Azure API issues where the container step doesn't work properly. Yeah — whether it's related to the same thing, because the new cluster will have that as well. Exactly. So that's the second issue.
It appears that when you have an AKS cluster, there are issues when using sidecar containers with the Kubernetes agent plugin. It's not confirmed by the developers yet, but there are people, at least at CloudBees, who have been working on it since this morning. It has been acknowledged by another user besides us on a public issue, so now there is an engineering effort from different contributors on that part. That's good, because it might be something to raise with Microsoft as well, but I guess they can do that if they've got Microsoft contracts somewhere. So, just for information, I've put a bunch of notes about the cluster and the Kubernetes issue on HackMD; I've added the link to today's notes. Because until yesterday morning we weren't sure — except maybe Tim, because you were ahead with the analysis — whether the two issues were the same. They are two separate issues. The Jenkins Kubernetes issue, since it was spamming the AKS control plane with a bunch of kubectl exec requests to the agents, was adding more load to the network, catalyzing the network issue two weeks ago. But they are totally uncorrelated, because the new temporary cluster where infra.ci is living as of today has the second issue but not the first one, even with the latest Kubernetes plugin. So in order to have everything working for infra.ci, we did a bunch of work yesterday and today with Tim and Hervé: we switched from pod agents with multiple containers to a single-container model. That means we had to rebuild and upgrade all of our Docker images to inherit from the inbound agent image, which was quite a huge amount of work — so thanks to everyone involved, because that was, let's say, annoying: not complicated, but annoying. All the images — so thanks a lot for that.
There are still some areas we need to finish: the Terraform management part — Hervé is working on that — and Tim, I don't know if Gavin or you have started to use the new Node.js JNLP image on the plugin site or the Jenkins websites? I assume Gavin will, when he gets back to it, but it'll only fix the plugin sites, with the changes we did. The annoying thing is that jenkins.io uses a really old version of Ruby, and that version of Ruby is not available in Alpine's repositories. Okay, so we might need to define new images for that, but the rule is: all images must inherit from the inbound agent — Azure, Debian, Windows, or Alpine; you can choose, we don't mind Debian or Alpine, it's the same result. It's just that the image must have the default tools, the default user jenkins, and the default entrypoint with the jenkins-agent script, so that the Jenkins controller can start the pod with it. Yeah, I'm really hoping that Gavin does get back to that and finishes it, because it's really annoying having to use old versions of Ruby — it just doesn't work on the new stuff, and Awestruct isn't really maintained anymore, nor the tooling it uses. I wonder if we cannot reuse the Ruby used for Puppet, because that's also an old one — I'm not sure it will be the same; I might need to check. Yeah, I mean, you can probably install a Ruby version manager or something which installs it, but it's certainly not available in the repositories — I had a look, it's all 3.x. Okay, so that's work in progress, but now we have a bunch of workloads, and we haven't seen any WebSocket timeout issues on these jobs since. So thanks very much to everyone involved in that, because that was not the best way to start the year. The third issue is just a special thanks to Olivier, because Olivier and I tried to update Updatecli on some elements, and everything started to break — we had a bunch of failed jobs. That should be fixed by updating to the latest 0.17.
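The image rule described here — inherit from the inbound agent, keep the default jenkins user and agent entrypoint, only add tools on top — can be sketched as a Dockerfile. The base tag and the tool being installed are illustrative assumptions, not the actual infra images:

```dockerfile
# Hypothetical sketch of a tool image following the rule above.
# Base tag is illustrative; pick the Debian, Alpine, or Windows
# inbound-agent variant you need.
FROM jenkins/inbound-agent:alpine

USER root
# Add only the extra tools the jobs need (example: Node.js).
RUN apk add --no-cache nodejs npm

# Hand back to the default user inherited from the base image; the
# base image's jenkins-agent entrypoint is kept untouched, so the
# controller can start the pod with this as the single agent container.
USER jenkins
```

Because the entrypoint and user come from the base image, the controller treats any image built this way exactly like the stock agent, just with more tools available — which is what makes the single-container model work.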
That one was quite annoying as well, because it was stopping our ability to update often. And the combination of the three issues led to the fourth topic: incrementals had not been updated in plugins for a few days, if I understand, or it was not working. I might need Mark or Tim to describe the issue, because I'm not sure I understand all the incrementals stuff, so I need help. Jesse made a change a week ago, and it didn't get deployed, mostly because of builds being broken by the network issues. Okay, now I understand the words you're using. I can tell you the story if the story helps. Yes. If you don't care about the story, I can tell you later, whatever — it's a fascinating story, but maybe not for here. The infra breakage meant we didn't deploy a particular image, and that image was needed because other changes were needed in how Jenkins plugin incremental deployments work. When I deliver a new pull request, I automatically get a build of that pull request that I can use in a plugins.txt file, and it's really elegant, very powerful: it lets me evaluate betas with code, pre-releases with code. But that was broken — Jesse had to break it due to some novel and interesting behaviors in Apache Maven. And let's call that enough description; I can tell you a much longer story some other time. The other bit is... the one with the sha1 with the underscore and everything. Yeah. Where it reads the version text, marks it, and thinks it's a beta when it's not, because it finds a "b" in your commit string. But yeah, the other thing is that I think it was basically a drive-by PR; I don't think he's actually tested it fully. So we just need to see whether it works. It may not actually fix the issue, but it's a drive-by PR that will hopefully work. Okay.
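The sha1 anecdote — a "b" inside a commit hash being read as a beta marker — can be illustrated with a toy version check. Everything here is hypothetical: the patterns, the version format, and the fix are illustrative of the failure mode described, not the actual incrementals tooling:

```python
import re

# Naive detector: treats any "b<digit>" anywhere in the version
# string as a beta marker -- the failure mode described above.
NAIVE_BETA = re.compile(r"b\d")

def naive_is_beta(version: str) -> bool:
    return bool(NAIVE_BETA.search(version))

def safer_is_beta(version: str) -> bool:
    # Strip a trailing hex commit hash (e.g. ".0b3deadbeef") before
    # looking for a beta marker, so hash bytes can't trigger it.
    base = re.sub(r"\.[0-9a-f]{6,}$", "", version)
    return "beta" in base

print(naive_is_beta("1.2-rc45.0b3deadbeef"))  # True  (false positive)
print(safer_is_beta("1.2-rc45.0b3deadbeef"))  # False
```

An incrementals-style version ends in a git hash, so any hash containing a hex "b" followed by a digit trips the naive check — which is exactly why a drive-by fix needs real testing before anyone trusts it.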
That makes more sense, between Tim's explanation on IRC and yours, Mark. Now I'm starting to understand that area better — I was close to zero knowledge on it, so it's good to learn. Cool. Since we are 10 minutes past the limit of the meeting, I propose that we delay the other topics, which are, let's say, less important, to the next team meeting, unless there is one you want to bring up right now. One, two, three. Okay, I'm going to stop the recording.