Hi everybody, welcome to this new Jenkins infrastructure meeting. The agenda today will mainly cover costs on our front.
The first topic that I want to mention is that I did some cleanup on the jenkins-infra organization. We had a lot of contributors within the organization, but a lot of them stepped down for whatever reason. What I wanted to do was create an alumni group where I could put everybody, so they would still be inside the organization and we can still send them notifications and contact them, but at the same time they don't have write permission. That way we reduce the risk of an attacker taking over their accounts and modifying whatever Git repository. So the idea is, if you don't need a specific permission, usually it's easier to just ask to be removed, so we don't take extra risk. And usually, for every Git repository, I try to have at least two or three maintainers from different time zones, so we are sure that we have someone who can merge any change rapidly. Another major change that I made in the organization was to switch the default admin permission to the maintainer permission. In most cases we don't need people with admin permission. So again, really few people have admin permission, and otherwise we delegate the maintenance to different teams. Yeah, that's mainly it about that. Any question on this one?
No. Something that I would like to put in place when I have some time is maybe some Terraform to monitor the default configuration. The idea would be, for instance, to be sure that we always have at least two teams per Git repository. I don't want to use Terraform to control the jenkins-infra organization, because I'm a bit afraid that someone modifies the default configuration, but I would like to be able to identify whether we need to grant permission to some person.
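The monitoring idea above (flag any repository that does not have at least two teams able to merge) could be sketched as follows. This is only an illustration under assumptions: the `Repo` structure, the repository and team names, and the idea of feeding it a permissions snapshot exported from the GitHub API are hypothetical, not something the team has settled on.

```python
from dataclasses import dataclass, field

# GitHub repository roles that allow merging a pull request.
MERGE_CAPABLE = {"write", "maintain", "admin"}


@dataclass
class Repo:
    """Hypothetical snapshot of one repository's team permissions."""
    name: str
    # Maps team name -> role ("read", "triage", "write", "maintain", "admin").
    teams: dict = field(default_factory=dict)


def repos_lacking_maintainers(repos, minimum=2):
    """Return the sorted names of repositories with fewer than
    `minimum` teams that are able to merge changes."""
    flagged = []
    for repo in repos:
        capable = [team for team, role in repo.teams.items()
                   if role in MERGE_CAPABLE]
        if len(capable) < minimum:
            flagged.append(repo.name)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    snapshot = [
        Repo("status", {"core": "read"}),
        Repo("charts", {"infra": "maintain", "ops": "write"}),
    ]
    print(repos_lacking_maintainers(snapshot))
```

A report like this only reads the configuration, which matches the stated preference to monitor rather than control the organization.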
So typically, something that I would like to solve: when we had an issue with the update center one week ago, Daniel opened a PR on the status Git repository, but nobody was around to merge the PR except me. This is typically the kind of situation that I would like to avoid in the future, where someone creates a Git repository, puts some automation around it, and then forgets to grant the right permissions. It's annoying. Another thing that I also fixed a lot was people who transferred Git repositories to the jenkins-infra organization and then lost access to the Git repository, because that person only had read permission. So again, you need one person to fix that. We have more than 100 Git repositories in the GitHub organization, so it's not always easy to verify access on those. We definitely need a better way to handle access.
The next topic is very brief. We started to version the Helm charts maintained by the jenkins-infra organization. updatecli can now handle that workflow. Typically, when a new Docker image is published, updatecli detects that and updates the right Helm chart and the chart version. And I'll probably use a GitHub, sorry, a GitHub Pages site, so we can query the right Helm chart version that we need. Gareth, do you have experience with that, GitHub Pages for Helm charts?
Yes, I've got that on captain-hook, doing it that way. I think you just need to create the branch manually first, and then I think it's the chart-releaser action, if you want to do it that way, that will publish to it. I've got examples of that working if you want to see them.
Okay. Yeah, I think that would be a small, nice improvement to the situation. Because right now we use Helm charts
located in a local directory, which means that when we change a Helm chart, we cannot easily roll back to a previous version. And we also install the same Helm chart on different clusters, so we would gain that flexibility. Now that we have everything in place to automate those Helm charts, I think it wouldn't take us too much time to improve.
The next major topic that I brought to this meeting is about cost. I reviewed the different accounts that we have, and we created a Google Sheet with the different costs. Basically, our Azure invoice increased a lot, so we are slightly above the limit asked by the CDF. We now have to identify ways to reduce the costs. On Azure, one fourth of the cost is for ci.jenkins.io, so we just have to put stronger limits on ci.jenkins.io agents, on the number of agents that we can use. Right now we are using them either for Windows machines or for Azure Container Instances. The other thing is that we also increased our costs on the Amazon account. It's normal for the cost to increase, because we have more usage and we do more work on those accounts. But from time to time you also have to clean up old services and identify ways to limit the growth of those accounts, and this is definitely the perfect moment for that. Another account that I looked at was Rackspace. For some reason that I cannot identify, the network bandwidth of the machine drastically increased. This is the machine used by archives.jenkins.io. We had two times more traffic over February and I don't remember any change. Right now I just can't explain why we have more traffic on that machine. This is something that I discovered this afternoon, and I have to dig a little bit there. Any questions so far?
So the good thing with Rackspace is that the cost increased, but we are still below the limits?
Okay, so we're not at risk of receiving a bill from Rackspace for the increase. Now, I assume the answer to your question on why the traffic doubled is likely that there must have been some change somewhere that caused traffic to go to archives instead of going to the mirrors or to get.jenkins.io like it should.
Yes. Basically, archives is used as a fallback service when get.jenkins.io is not available. And because of the amount of traffic that goes to get.jenkins.io, even if that service is down for, let's say, one day, the redirected traffic is really huge. When you look at the traffic that goes to the mirrors, we have more than a terabyte of traffic. So it can simply mean that get.jenkins.io was down for maybe a few minutes or maybe a few hours. There are many reasons why traffic could have been redirected to archives.jenkins.io, so this is something that we have to understand.
And that may be why I'm still seeing lots of plugin download issues from various places. I started creating a doc to capture them all, but it just got too many entries to maintain.
Okay, so I had not seen any plugin download issues recently, but you're consistently seeing them, Gareth, and from all over?
Yeah, I'm probably getting five, maybe ten a day based on the repos that are triggering. I just had one now on the Tekton client plugin, which needs to create a custom image with the new plugin in it, but it needs to download other ones, and it failed to connect, or could not connect, something like that, when downloading plugins.
So the reason why it's hard to debug those specific issues is that the traffic arrives at get.jenkins.io, get.jenkins.io identifies the closest mirror to you and then redirects you to that mirror. And for some reason that mirror, at that time, became very slow.
Then you get timeout issues with that specific mirror. But if you ask someone else to install the plugin, that person may be in a different part of the world and will download from a different mirror, and everything will be fine. We have two ways to improve the situation. One can be done on mirrorbits: mirrorbits can be more aggressive about disabling or enabling mirrors. The other way would be for Jenkins, or Jenkins clients, to not fail, or at least to retry, when the mirror is slow.
So Gareth, I assume your failures are coming from things that you're actually running in the cloud, or are they from your local environment?
It's all in the cloud.
Okay. And this is also something that you have to keep in mind: if you run your instance in the cloud, let's say Gareth is in a European time zone and the machine is in the US, then your machine will be redirected to a US mirror. So if you try to debug that situation from your own machine, everything will be fine, because you're not in the same region as your cloud machine.
I think the cloud regions that you tend to use are typically US based, right? I don't know if they're US East or US West, but I'm assuming that they are commonly US based, or are you using cloud regions that are Europe based?
I'm not 100% certain. Some of the recent failures have actually been on the Tekton client plugin, which is built with GitHub Actions, but it's trying to build a custom Docker image and run tests inside it, and as part of that it uses the plugin installation manager to download all the plugins.
Right.
And so it fails quite a bit. There are other issues around that as well, well, not so much infrastructure. But we see that if I have a Jenkins version that's fairly static and a plugins.txt file where all plugins are defined, and I build it once and then a week later I build it again,
I'm not going to get the same build. I actually quite often get build failures, because it tells me that the combination of plugins I'm using is incorrect.
Yeah, and that one I think is a documentation gap more than anything else, because there's a setting that says: honor my version numbers, I absolutely mean it. And right now the default with the plugin installation manager is to not honor my version numbers, even if I absolutely mean it. So I can go through that one with you. We've got a long discussion going in one of the plugin installation manager issues or pull requests about that particular setting.
That's great, because I can apply that to these builds, and hopefully that will reduce the number of failures. And then I should only see the actual timeouts and disconnects.
Right, and you are using exact version numbers for every plugin that you're specifying, and you're specifying them all, so you've got an exhaustive list?
Yes.
So I think you want --latest false. I'll look it up and send it to you, Gareth. I know I've seen that one, and I wrestled with it myself: why is it going and getting a different version from what I specified, when I gave exactly the spec?
Just on the increase for Rackspace: archives.jenkins.io, is that also the first place where plugins become available?
So basically, we have a script located on a machine, and that script uploads artifacts to archives.jenkins.io and to the OSUOSL network. I think it's the OSUOSL mirror that is updated first, and I think archives comes at the end.
Daniel was saying something about the first 30 minutes to an hour after a new plugin is released, where it's not available on the mirrors, so it tends to fall back to somewhere.
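The question raised in this exchange (is every plugin listed with an exact version?) can be checked mechanically before a build. The sketch below is an illustration under assumptions: it assumes a plain-text plugin list with one `name:version` entry per line and `#` comments, treats a missing version or the keywords `latest` and `experimental` as unpinned, and uses made-up plugin entries.

```python
def unpinned_plugins(plugins_txt):
    """Return the names of plugin entries that do not pin an exact version.

    Assumes one `name:version` entry per line; blank lines and
    `#` comments are ignored.
    """
    loose = []
    for raw in plugins_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, _, version = line.partition(":")
        if not version or version in ("latest", "experimental"):
            loose.append(name)
    return loose


if __name__ == "__main__":
    # Hypothetical plugin list for illustration only.
    example = """
    git:4.6.0
    workflow-aggregator        # no version given
    blueocean:latest
    """
    print(unpinned_plugins(example))
```

A check like this only covers the plugins that are listed; it does not say how the download tool resolves their dependencies, which is the separate setting discussed above.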
Oh, yes.
Yeah, I'm wondering whether that scenario is happening more now, and that's why we're seeing increased traffic.
That's a good suggestion. We can easily identify that in the jenkins-infra charts repository. I think it's... no, sorry, you can't see that. But I can tell you what the fallback machines are. Let me double check this right now. So the fallbacks are definitely fallback.get.jenkins.io and archives.jenkins.io. Yeah, definitely. So that would explain why the traffic increased.
The way mirrorbits works is that mirrorbits has a local file system, so we upload the file there. Then on mirrorbits there is a job that regularly builds the file hashes, so MD5, SHA-256 and so on. Then it identifies the mirrors that also contain those files with the same hash. Because mirrors don't necessarily synchronize immediately with mirrorbits, you always have a gap before the file is available everywhere else. So the fallback situation is: if get.jenkins.io cannot redirect you to a mirror, then it uses one of the fallback locations. And one of them is a loop on itself, which is to query get.jenkins.io.
To know whether a specific version is already available on a mirror, you can just go to get.jenkins.io, then /plugins, the name of the plugin, the version, and so on, and then you append ?mirrorlist. I can just put a link in the chat, because I won't share my screen, or maybe Mark, you can share your screen specifically for this one. So this one is not a plugin. The link is in the chat here, so you can just open it. That's an example with Windows. So we know that information from mirrorbits.
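The two mechanisms described here, appending a query flag to a file URL and falling back when no mirror has a matching hash, can be sketched as follows. This is a simplified illustration under assumptions, not the actual mirrorbits logic: the flag names are the ones mentioned in the meeting, while the example path, mirror names, and hash values are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Query flags named in the meeting; assumed to be accepted by the service.
VIEWS = {"mirrorlist", "mirrorstats", "stats"}


def mirrorbits_view(file_url, view):
    """Return `file_url` with its query string replaced by the bare flag `view`."""
    if view not in VIEWS:
        raise ValueError(f"unknown view: {view}")
    parts = urlsplit(file_url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, view, ""))


def pick_sources(expected_hash, mirror_hashes, fallbacks):
    """Return the mirrors whose file hash matches `expected_hash`,
    or the fallback locations when no mirror has the file yet."""
    matching = [mirror for mirror, digest in mirror_hashes.items()
                if digest == expected_hash]
    return matching if matching else list(fallbacks)


if __name__ == "__main__":
    # Hypothetical file path, for illustration only.
    url = "https://get.jenkins.io/plugins/example/1.0/example.hpi"
    print(mirrorbits_view(url, "mirrorlist"))
    # A freshly released file: no mirror has synchronized it yet.
    print(pick_sources("abc", {"mirror-a": "old"}, ["fallback-0", "fallback-1"]))
```

The second call shows the window discussed above: until at least one mirror reports the same hash, every request for a new file is served by a fallback.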
So typically, what it says here is: at the top right, you have the known hashes that mirrorbits knows, and then below you have a list of mirrors with the corresponding file. For instance, if you look for a very old Jenkins version, you'll see that no mirror contains those files, because mirrors are usually configured to keep one year of data. So that's one thing. And it's the same if you have just released a plugin. Yeah, and also for Windows, because we changed the file when we switched to the new one. So let's say I put a second URL here. This one is a very old Jenkins version that normally nobody would install. Basically, what it says is that no mirror contains that file, so it falls back to fallback zero or fallback one. Those are fallback.get.jenkins.io and archives.jenkins.io. Archives is located on a different machine. fallback.get.jenkins.io is just the same machine as get.jenkins.io. It's just a hack, because mirrorbits doesn't distribute files, it only redirects, so it's a second service running at the same location that just gives you the file. So yeah, that's all about the mirrors.
So to be clear, that's the scenario that would explain why the traffic increased on archives.jenkins.io, which is fine. So we can go back to the notes.
While you're there, maybe I can show a few more things with mirrorbits as well. Take the example of the Windows link. If you open the Windows link, but this time you replace mirrorlist at the end with mirrorstats. Okay, now you have the list of mirrors, and you can see the last time each mirror was updated. Also, for every mirror you have two lines, a dark one and a light one. If you put the cursor on the dark one, it shows you how many files were downloaded.
And if you put it on the other one, you see how much traffic was downloaded from that mirror. So that's one piece of information.
Is that in the last 24 hours, or since midnight local time, or UTC?
Since midnight UTC.
Okay.
Yeah. Another piece of information that you have is if you just put stats. Then you know how many times that specific file was downloaded today, a month ago, a year ago. So if you put a more recent version instead of an old one, let's say the latest weekly, for instance.
Yeah, 2.286. Okay, no downloads of today's release. How about last week's release?
And you can also remove the sha, because that file is the sha file.
Oh, right. Ah, okay, because I was looking at the sha. Of course people aren't downloading the sha. Already 1,400 today. Wow.
So that's something interesting. Something important to keep in mind is that those data are stored in a Redis database, and I don't guarantee that we won't lose them. Those are just there for information, nothing more. So there is mirrorlist, mirrorstats, stats, and there is another one that I don't remember. But yeah, that's all for now.
Okay, that's a nice project.
If we go back to the agenda, the last topic that I want to mention is that Damien has been working on deploying an EKS cluster on the Amazon account. That EKS cluster will only be used for ci.jenkins.io Jenkins agents. The cluster is there and is configured on ci.jenkins.io. The plan is now to update the labels so that, instead of deploying containers on Azure Container Instances, we deploy containers on that cluster. The purpose is to have a clear understanding of what cluster size we need for ci.jenkins.io, so we can better identify the costs there. Ideally, we would also like to use that
when we have sponsoring discussions with cloud providers: because we'll be using a Kubernetes cluster, we'll have better portability. If tomorrow we decide to switch, let's say, to an AKS cluster running on Azure, then we just have to deploy the cluster and update the Kubernetes configuration, and that's it. So the purpose here is just to simplify the management of ci.jenkins.io. Because that cluster will be used for CI, which is a very public instance, we won't be using it for services other than Jenkins agents. That's something really important.
Hi.
Hi. Yep. So yeah, that's really nice, and we are almost ready to switch to it. Basically, what will change on ci.jenkins.io: right now we are using Maven containers for ACI, and we will soon switch to the Jenkins inbound agent that we build for the jenkins-infra organization, but we recently switched to those on ci.jenkins.io as well. So that's fine. Nothing will change, basically, except that it will be running on Kubernetes instead of Azure Container Instances. Any question on this topic?
All right, then we are almost ready to finish the meeting. There's just one last question. We have a resource group on the Azure account named Rust Linux dev group, and I have no idea who created it. Apparently it started an openSUSE machine. So I'll probably just delete that resource group.
Yeah, maybe it's something from Tyler's experiments, because Tyler is the main Rust hacker, so maybe he used another account.
Well, and he's also a SUSE user, therefore an openSUSE user, so that would doubly hint that it may be a Tyler thing.
Yeah, probably. I mean, normally he has a different account. The fun story is that when he transferred the Jenkins account to me, for some reason he also granted me access to his own account.
So I can see what he's doing. But yeah, that's a good suggestion, I'll have a look at that. Thanks everybody. So I think we are good. Is there a last topic that we want to cover before we finish the meeting?
Maybe one question about issues.jenkins.io stability, because I've noticed a few times that the service had delays and timeouts when accessing it from Europe. I wonder whether our monitoring catches it.
The monitoring did not catch that, but something to keep in mind is that the monitoring is running from the US. So maybe that would explain why we didn't catch it. I can look at whether we get notifications for Europe. You're not the first one complaining about the speed of that service, and especially people in Europe, so maybe we can check with the Linux Foundation.
Yeah, I've complained, but I cannot say for sure that the performance issues happen much more often than previously. So in this case I'm not grumbling. But it would be nice to figure out what the root cause is.
But yeah, for some reason, each time I had people complaining about the speed of that service, I tested it myself and everything was working fine. I'm not a heavy user of that service anyway, so I'm probably not the best person to measure.
Well, I don't know that we have any synthetic checks doing multi-site, but the Datadog synthetics are based in Frankfurt, so that might be a great excuse to configure one, or rather to see if we already have a synthetic check of issues.jenkins.io initiating from Frankfurt, because then it is Europe based.
Yeah, I'm pretty sure we do. I can quickly look at it.
I think it would be generally nice, though even if we discover it, I'm not sure what we could do about that.
I mean, we can open a support ticket with the Linux Foundation.
Yes, they've been quite responsive. I just opened an issue on Sunday, and they already responded to me on Monday and started work on it.
It wasn't related to performance, but they've seemed quite responsive compared to some of our service providers. So it's not finger pointing, I agree.
And so I am looking at the dashboard from Frankfurt, and except for some load from time to time, it remains pretty stable.
So we already have a synthetic check?
Yes, we do.
Okay.
So we have a synthetic check, and we also have containers running on the Kubernetes cluster that monitor that endpoint as well. And we have one check from the Puppet master as well. So in fact, we test from multiple locations. Otherwise, when such issues occur, a good place to look if you don't have the Datadog information is status.jenkins.io. We have a basic page right there. I mean, it takes some time to load, and I couldn't find a way to increase the speed. But if you look for issues, they are there. Obviously it's only for seven days in this case. But that's usually what I look at when someone complains about the response time.
Well, your operations, I assume, are more than just a request to the status page. You were probably authenticated and doing real work.
Yeah, I should have been authenticated at the time. Sometimes it just times out, sometimes it opens slowly.
Okay. And again, those checks are just simple queries on the API. It is a query on an endpoint under /api, so it's a very light test. A passing test does not mean that the service is behaving correctly, but it's a good indicator when something is wrong. So what I would suggest is to open a Jira ticket with the Linux Foundation. Maybe they have some information there.
Okay, last topic. I also got complaints about email issues with Jira: Gavin does not receive email from Jira anymore.
But I have to look at SendGrid, because something that already happened in the past was that an email address was put on the bounce list.
Yeah. I'm also not receiving email. That was the ticket I opened with the Linux Foundation, and they asked me to double check and read the SendGrid log, so you're doing exactly the thing that I need.
So the problem with SendGrid is that only one other person and I have access to the account, and if we want to put more people on the account, we'll have to pay a lot more. Right now we are paying $15 per month to send emails. So the best thing that I would suggest is that when you're ready to debug that, we just schedule some time together, you send an email, or at least we generate some data, and then I can monitor SendGrid, because I think SendGrid only keeps the history for one week or something like that.
Okay, so we need to be sure of that.
But only the two of us can monitor that, which means it's on me. So I'll send you an invite, probably for tomorrow or Thursday.
Okay, perfect.
Then I propose to stop here. Thanks everybody for your time, and see you on IRC.
Thank you.