Yes, recording has started. Welcome back, everyone, to the project. This is the weekly sync for the project, and we will discuss the action items first. The first pending action item was to update the project details page. I believe Rishikesh has raised a PR on jenkins.io to do this, and I have reviewed it. Mark, I was assuming that if you have some time, we could wait for your review; but if you think it looks good, then we would move forward and merge it. You can judge it much better.

Yes, I'll do a review, and since you've already reviewed it, I think my review will be quite rapid, just to be sure it formats correctly, and then I'll merge it. I'll try to do that before I go to sleep tonight.

Apart from that, Rishikesh had some doubts related to it, and there was an action item on me to help him with those, but unfortunately I could not find the time. There was another action item from Rishikesh: he wanted to understand the Describable/Descriptor pattern that is implemented across Jenkins. To explain it, I had to go deep into the Jenkins code myself. I realized that in my own time with Jenkins I remember having to use descriptors, but I never explored the pattern in the depth I would need to explain it. So I created a small gist with whatever I could find in my explorations, and I hope that serves as a starting point for understanding the pattern. Rishikesh, I wanted to ask: do you still have these doubts, or were you able to resolve them?

I haven't worked on the UI again, because the UI depends on the implementation, so I have paused the UI work for now.
And I've created a design document explaining two ways of implementing it: using cron syntax, or using the build discarder. That is what I've done this entire week.

Then it sounds like Rishabh and I should review the design document. Do you want to give us a summary of the differences you found between the two, and do you have a particular one that you recommend, Rishikesh?

I have shared the link in the Gitter channel; I can share it again. So basically, this document explains the difference between both strategies. The aim of the global build discarder is to schedule maintenance tasks without having a cron syntax in the UI; it is done intelligently by Jenkins internally. With the cron syntax approach, administrators have to pass a cron expression for each maintenance task. For the global build discarder, I have written up how exactly it works from going through the documentation. There is a class called BackgroundGlobalBuildDiscarder. This background global build discarder executes every hour: there is a method called getRecurrencePeriod, which is currently set to one hour by default. So every hour this task runs, and it calls the execute method. The execute method calls processJob: it checks all the jobs on the Jenkins controller, and then it checks whether each job is applicable for discarding its previous builds or not. Rishabh, can you go to the design document? Yes, so in the second step it calls the execute function hourly and runs the build discarder on all the jobs present on the Jenkins controller.
It is based on the strategy present in the global build discarder. Basically there is a function which we need to override, the isApplicable function. Whatever logic is written in the isApplicable function, that configuration is used by Jenkins internally to decide whether it should run the build discarder on that job or not. So this is the basic functionality of the global build discarder. I was thinking we could use the same mechanism for maintenance tasks as well, but I had a few questions, because the global build discarder is only used for jobs; it is only used to iterate over jobs. Is there any way we can use it for caches as well?

I don't know of a way to use it directly, but I would think its implementation could serve as a model. We may have to do a new implementation that says: we're going to use the global build discarder as our pattern, and duplicate the relevant portions of its code into the git plugin to iterate over git caches. But I think that's what you were thinking here. Now, because the global build discarder runs every hour, would we then have the user choose in the UI how frequently they want it, whether hourly, every two hours, every 24 hours, or every 48 hours, and skip the runs when it's not selected? What was your vision for people who don't want to run cache maintenance every hour?

So, first my thought process was: since it runs every hour, first of all, I didn't want it to overload the system.
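As a rough illustration of the pattern Rishikesh walks through above, a periodic task with an hourly getRecurrencePeriod whose execute method filters items through an isApplicable hook, here is a minimal, Jenkins-free sketch. The real BackgroundGlobalBuildDiscarder lives in Jenkins core and builds on its periodic-work machinery; every name here other than getRecurrencePeriod, execute, and isApplicable is invented for illustration:

```java
import java.util.List;
import java.util.concurrent.TimeUnit;

/**
 * Sketch of the hourly background-discarder pattern, with Jenkins
 * types stubbed out so only the shape of the pattern remains.
 */
abstract class PeriodicMaintenanceWork {
    /** How often the scheduler wakes this task (hourly, like the build discarder). */
    long getRecurrencePeriod() {
        return TimeUnit.HOURS.toMillis(1);
    }

    /** Hook deciding whether maintenance applies to a given item (job or cache). */
    abstract boolean isApplicable(String item);

    /** Called by the scheduler every recurrence period; returns how many items ran. */
    int execute(List<String> items) {
        int processed = 0;
        for (String item : items) {
            if (isApplicable(item)) {
                process(item);
                processed++;
            }
        }
        return processed;
    }

    void process(String item) {
        // A real implementation would discard builds or run git maintenance here.
    }
}

/** Hypothetical subclass for git caches, overriding only the applicability hook. */
class CacheMaintenanceWork extends PeriodicMaintenanceWork {
    @Override
    boolean isApplicable(String cache) {
        return !cache.isEmpty(); // illustrative rule, not a real policy
    }
}
```

The point of the pattern is that the scheduler stays fixed while subclasses vary only isApplicable, which is why duplicating it into the git plugin to iterate caches instead of jobs seems plausible.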
Okay, so I was thinking: is there any way we can find how much CPU has been used or how much RAM has been consumed, so that based on that data we can schedule the maintenance tasks hourly, or every three hours, and even let the user have the option of scheduling it weekly, something like that. That was what I was thinking; I'm not sure how we would do it. Is there any other intelligent way of scheduling maintenance tasks?

Sorry to interrupt. I was just saying that I think we should divide this into two steps, as Mark suggested. When we defined the initial project, we decided we are going to expose a way for the user to set a schedule for these tasks. When you talk about CPU utilization, and the system having the intelligence to schedule the jobs based on its current status, I believe that is something we can implement once we know how to implement a scheduled job based on the user's input. I think we should take this step by step. We don't have to make it intelligent in the first iteration of this feature. We can start with the first goal, which I believe, please correct me if I'm wrong, was to take the user's input on the frequency with which the system should run these jobs, and then use that input to run the tasks. If we can also think about CPU utilization, that's great, but it's okay if we don't at the first step.

I think that's right. We'll need the CPU utilization or overload prevention no matter which scheduling technique we use, so considering them as separate steps is a good idea. Okay.
My point was that, as a first step, we know the requirements, and in parallel, as a feature, we could figure out how to get the metrics from the system to then build in a fail-safe so that we don't affect the system.

Now, I'm not sure that CPU load is the crucial measure there, and even if it is, it may be difficult to get in a platform-independent way. I think, though, we can get how long the subprocess ran before it completed, and the duration of the run of a subprocess gives us a first-level approximation of how much demand it placed on that computer. We know when it started, and we'll be monitoring its exit code, so we'll know when it finished; the difference between those two is the duration of that process in terms of wall-clock execution, however many CPUs or cores it had available to use. But I think Rishabh is right that it's probably much more important that we do the first step and gain confidence there before we worry about optimizing to avoid overload. So, Rishikesh, did that answer your question?

Yeah, but I have a doubt. What kind of UI are we looking into? Are we looking at taking cron expressions from the administrator to schedule maintenance tasks, or do you want the maintenance to run automatically without input from the administrator?

I was assuming we wanted the administrator to have some control over the frequency. I don't know that we want a cron syntax. Or maybe I should say it differently: a cron syntax may be more precise than we're actually ready to use. For instance, I think it would be disastrous, or at minimum very unwise, if they scheduled it to run every minute.
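The wall-clock measurement Mark describes, record the start time, wait for the exit code, take the difference, could be sketched like this. The class and method names are invented for illustration, and the Runnable overload exists only to make the measurement easy to exercise in-process:

```java
import java.io.IOException;
import java.time.Duration;
import java.time.Instant;

/** Sketch: wall-clock duration as a cheap, platform-independent load signal. */
class SubprocessTimer {
    /** Runs an external command and returns its wall-clock duration in milliseconds. */
    static long timedRun(String... command) throws IOException, InterruptedException {
        Instant start = Instant.now();
        Process p = new ProcessBuilder(command).start();
        int exitCode = p.waitFor(); // we monitor the exit code anyway, so this is "free"
        long millis = Duration.between(start, Instant.now()).toMillis();
        System.out.printf("exit=%d duration=%dms%n", exitCode, millis);
        return millis;
    }

    /** Same measurement for an in-process task. */
    static long timedRun(Runnable task) {
        Instant start = Instant.now();
        task.run();
        return Duration.between(start, Instant.now()).toMillis();
    }
}
```

The duration says nothing about how many cores the child used, only how long the controller had a maintenance process alive, which is exactly the first-level approximation described above.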
So if we went with the global build discarder concept, it checks every hour whether there is work to be done, and then the job configuration could say only do that every 24th hour or every 48th hour. That might be enough, and then we don't need to process cron syntax. Now, Jenkins users may say, "but I had to learn cron syntax everywhere else," and they'd be right. It's just that I have a hard time imagining us running cache maintenance more than once an hour. Maybe I'm misunderstanding. What's your thought on it? Do you think there will be interesting use cases that require it to run more than once an hour?

It depends on the repository. If it isn't updated frequently, I don't think it would make sense to run it every hour.

Yeah, I agree with your observation. At least for me, I have a hard time imagining a repository that is busy enough that refreshing its cache every hour would be important, and it's even more difficult to envision that refreshing its cache every few minutes would be worthwhile.

Or we can give an option where we run it not hourly but on a daily basis, or something configurable by the administrator. Because the aim of the global build discarder is to not have cron syntax; it has to be done by Jenkins internally. The other implementation provides cron syntax for the administrator, where they can plug in cron expressions and run the maintenance tasks. That's the whole difference between the two strategies.

Go ahead. I just wanted to say, on the question of how we will decide on scheduling these jobs, I believe we should test this idea: when we build this feature, we should run it on Mark's machine, which has a lot of projects.
I believe we should take inputs from that machine on whether the frequencies we're assuming are right for the system. It will be a good practical test for us to know how it's actually going to work on a user's machine, and whether our frequency is optimal. So it could be a test: we run it on your machine, see how the system metrics look at the frequency that we decided, and whether we need to raise or reduce it.

I'm certainly willing to be a test case; I'd be honored to be a test case, not just willing. That would be a real privilege. If my little installation is of use, I'm happy to do it, and yes, it does have several, in some cases rather embarrassingly large, repositories that it's caching.

Yes, I believe it would be a good exercise to understand the relationship between the maintenance tasks and how they're affected by the size of the repository, or the other parameters of the repositories, and then, if you have a large number of large repositories, how much resource these background tasks are going to take. I was just thinking about the question we're trying to answer, what the frequency should be, and I thought that without actually trying these tasks on a machine at that scale, how could we answer that question?

That sounds very good to me. I'm not sure the answer to that question resolves which path Rishikesh should take, whether the global build discarder or cron syntax, because I can see arguments for either. Please correct me if I'm wrong: the cron syntax essentially allows the user a way to decide the frequency.
It lets them decide the frequency at a more granular level than exposing a choice like every hour, every five hours, or every day.

For instance, with cron syntax I can say things like "the second and fourth Thursdays of the month," those kinds of things. So yes, it is much more sophisticated than the simple hourly scheme that the global build discarder uses. The cron syntax in Jenkins is very rich: it has keywords like @daily, which says run it sometime during the day, or @hourly, or @weekly. So it has a level of sophistication that is certainly very powerful.

As I understand it, the primary goal we have for this project is to do the heavy lifting behind the system of running these tasks, but for the user we want to provide a way to configure these tasks. Right? Yeah. And if that is the aim, then more customizability matters, especially when the feature affects the performance of the system. The configurability means the admin has more options to work out the right frequency themselves, when whatever default frequency we pick turns out not to be the one that suits their system. What I'm trying to say is that the cron syntax would allow the user more freedom to decide for themselves what is the best way to run these tasks, instead of us dictating standard modes. So then I would say I would prefer the second strategy, because it provides more customization.

Yeah. Rishikesh, back to an earlier question: do you have something that causes you to lean towards one choice or the other?

Nothing much; I've gone through the entire cron syntax implementation and I'm pretty confident of implementing it.
There's no favoritism. What I'm still a bit confused about is the global build discarder: what are the various conditions based on which we would run the maintenance tasks?

With the second strategy, where we use the cron syntax, it is similar: the same way that we've seen in the global build discarder strategy, there are hooks exposed to run the processes. Or we can create our own asynchronous thread by extending AsyncPeriodicWork; if you extend that, you can create your own background process and then run the maintenance task in that thread.

So what you're saying is that with the second implementation, the difference would be somewhere in this contract? Yeah, exactly. In the cron syntax implementation, there would be one thread running every minute which would check whether the cron expression matches; if it does, then the corresponding maintenance task is run on all the repositories.

So in terms of implementation, the core difference between these two options is the usage of cron syntax, and you've seen no other differences? Sorry, I didn't get you, can you repeat? Am I audible? Yeah. I just wanted to ask whether the only difference between these two implementations is the ability to use cron syntax; apart from that, we could extend the same contract to iterate over all the jobs. Yeah, exactly, the rest of the implementation is the same. The only main difference would be: are we taking the cron syntax from the administrator and scheduling with it, or do we intelligently schedule it behind the scenes in Jenkins? And that, we believe, could be a step-by-step approach for us.
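The every-minute checker thread described above might look roughly like the following. This is only a sketch: it handles a simple "every N hours, on the hour" schedule rather than parsing real cron expressions (Jenkins itself parses schedule specs with hudson.scheduler.CronTab), and all class and method names are invented:

```java
import java.time.LocalDateTime;

/** Sketch of a minute-granularity scheduler loop for maintenance tasks. */
class MinuteTicker {
    /** True when an "every N hours, at minute 0" schedule fires at this timestamp. */
    static boolean matches(int everyNHours, LocalDateTime now) {
        return now.getMinute() == 0 && now.getHour() % everyNHours == 0;
    }

    /** One thread wakes every minute and runs the tasks whose schedule matches. */
    static void loop(int everyNHours) throws InterruptedException {
        while (true) {
            if (matches(everyNHours, LocalDateTime.now())) {
                runMaintenanceOnAllCaches(); // hypothetical task runner
            }
            Thread.sleep(60_000); // wake once a minute, like the described design
        }
    }

    private static void runMaintenanceOnAllCaches() {
        // Real code would iterate the controller's git caches here.
    }
}
```

Swapping the matches predicate for a real cron evaluation is the only change needed to get from this sketch to the cron-based strategy, which is why the two designs share almost everything else.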
And that sounds very reasonable to me. On the cron-based syntax, it feels like you've done a very good job of exploring it, Rishikesh.

We can also safeguard the cron syntax. As I stated in one meeting, assume an administrator schedules a GC every minute, or every 30 minutes. We can safeguard by putting some rules behind the scenes so that maintenance tasks can only start from, say, hourly at the shortest, so that the administrator doesn't overload the system.

Alternately, you could put in a limiter that says: if there is currently a maintenance thread running, I will refuse to start a new one. Or, as we discussed, we can add it to a queue. Yeah, even better, you're right: just queue it and then use the same thread. Right, exactly. Good point.

So I believe, as we discussed, we're tilting more towards preferring the second approach. I think so, yeah. And as Rishikesh noted, some safeguards: only have a single thread processing these, so that we are forcibly rate limited and can never have more than one running at a time. That kind of thing.

Yeah. So I have a question related to using a single thread. If we're not using multiple threads, let's say I have 60 repositories and these five tasks to perform. When I start running them, reading through the repositories on a single thread, is there a chance of the execution time for these tasks stretching to a point where it becomes unfeasible? These are background processes, so do we have to worry how much time they take? So the question would be, first of all: do we worry about how much time these tasks are taking?
And do we put an upper limit on those time periods? For example, if GC has started to run on a particular repository on a single thread and it's taking, I don't know, five hours, do we have some kind of upper limit where we say we can't proceed further, considering that it might take days for the whole batch of tasks to run on these 60 repositories?

I'm not immediately visualizing why we would want to put an upper bound. I can imagine someone has decided to compile the Linux kernel for their Raspberry Pi, and as part of that they're doing a garbage collection operation on the two-gigabyte Linux kernel repository on their Raspberry Pi controller. It may take many hours, but they get the benefit that when it's done, it's done. Tell me more about the cases where you worry that they may have many copies of that and therefore might somehow not be able to complete the other work.

My only concern is born out of this assumption: between two scheduled runs of the whole batch of tasks, how do we guarantee that all of the tasks that we've decided are going to run, for all of the repositories in the system, finish before the next scheduled time for them to run, since we're doing this on a single thread?

I thought that the answer there was that, because there's a queue, the next task won't begin until its predecessor has completed. For me that was okay: it means that if they've scheduled them to occur too rapidly, they will queue, and the work will be done when it can be. When the first maintenance task completes, the second will begin, even if it was scheduled to begin earlier, and the same for the third, fourth, and fifth. Now, I don't know that we want an unbounded queue, because in that case it's just queuing up work that will never be caught up on.
Well, yeah, if the degenerate case that you're describing were to happen, where the processor and file system combination is so slow, or the repositories are so large, that the work simply cannot be completed, if the controller were continuously falling further and further behind in processing its queue of maintenance tasks, there's no point in making that queue very deep. It will just work on them when it can. Have I answered your question, Rishabh?

Correct, yes. But as you've said, it's an extreme case, so I don't know if it's something we should prepare for right now. The question I'm asking is only because, let's say git gc is going to run on my system: my limited understanding is that it's going to use whatever resources I have on my system to run that process. Is it going to be a single-threaded process?

My understanding of the git gc command is that it is specifically written to use multiple cores if the computer has multiple cores, so it will run portions of the garbage collection in parallel.

So when we say that we're going to limit our jobs to a single-threaded process, are we forgoing that and increasing the execution time for these individual tasks?

I don't know if it works like that, but I was thinking: when you schedule a maintenance task using the git client plugin, it calls the underlying git command line present on your system, which runs a separate process for the maintenance task, and once that maintenance task has run, you get the result back in the git client plugin. That was what I was thinking.

Me too. Git decides to use multiple threads if there are multiple cores, and it does that independently of Jenkins, unless we were to somehow configure it to do less than that.
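The single worker thread plus bounded queue the group converges on a moment earlier can be sketched with a standard ThreadPoolExecutor: one worker is an implicit rate limiter, and a small bounded queue stops an overloaded controller from piling up maintenance work it will never catch up on. The class name and the capacity of 10 are arbitrary illustrations, not decided values:

```java
import java.util.concurrent.*;

/** Sketch: single-threaded, bounded maintenance queue. */
class MaintenanceQueue {
    private final ThreadPoolExecutor executor = new ThreadPoolExecutor(
            1, 1, 0L, TimeUnit.MILLISECONDS,   // exactly one worker thread
            new ArrayBlockingQueue<>(10));     // bounded backlog; default policy rejects overflow

    /** Queues a task; returns false if the bounded queue rejected it. */
    boolean submit(Runnable task) {
        try {
            executor.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            return false; // controller is falling behind; drop rather than queue forever
        }
    }

    void shutdownAndWait() {
        executor.shutdown();
        try {
            executor.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Because a task only starts when its predecessor finishes, over-aggressive schedules degrade into queuing rather than concurrent load, which is the safeguard described in the discussion.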
That makes sense; that answers my question. I was not thinking of it that way.

Right: we're only forking a single command-line git process, but then that process chooses to use multiple cores as it sees fit. Correct.

That's true. I was thinking that we would somehow be allocating a single thread to the git command-line operations, but that is not happening.

Here, I was worried about this: if I run a git GC command and it consumes a lot of the computer's resources, would that be a problem? It could consume, say, 90% of CPU, making the computer a bit slow. I was not sure how to proceed with that, or is that fine for now?

My opinion is that's fine, because what we're doing is delegating to command-line git. If that is a problem for the user, we would invite them to change the scheduling, and that argues for the cron syntax: change the scheduling so that it only happens during periods when the system is reasonably idle or less busy. Now, on a system like ci.jenkins.io, those are not regularly predictable times, but there is some pattern to them.

So we could read the frequency of when exactly the system is idle, and then give a recommendation to the administrator based on that, so that they can schedule the maintenance tasks. At this point, yes, and there may be some historical data like that available, since Jenkins itself does present load statistics, at least for the last two days.

So I had another doubt regarding the git caches. Basically, when I create a freestyle job in the Jenkins UI, it creates a separate workspace directory which contains the entire repository.
Whereas if I use a multibranch pipeline, it only creates a caches folder. So here we are only worried about the caches, right, not the freestyle repositories present on the Jenkins controller?

Right, because it is strongly advised not to have any jobs that execute on the Jenkins controller. Having us perform maintenance on jobs that the user makes the mistake of running on the controller would be a bad pattern. We only want to deal with caches that are maintained by Jenkins core itself, not with freestyle job workspaces that the user constructed. Did that address your question, Rishikesh? Yes.

So I agree wholeheartedly with you: we should only do caches on the controller, not job workspaces.

Also, yesterday I was just messing around, trying to make an implementation. There I tried to save the data I got, like the cron expressions taken from the user, and stored it as an XML file. Can this XML file be changed by other users, like if that computer doesn't belong only to that administrator? Can other people change that XML file? I'm asking just for security reasons.

Certainly Jenkins configuration files can be modified by anyone who has permission to modify them, and the next time Jenkins starts, those configuration files will be read.

So if any malicious user changed the cron expression in that configuration file, it would affect the Jenkins behavior? Yes, that's correct. So UI-based validation is certainly necessary, but probably not sufficient: at the lower levels of the API we'll want to be sure we're checking the proposed schedule for sanity as well. The other reason for that is configuration as code, which
allows those kinds of configurations as well, and the administrator writing the configuration-as-code definition might have made a mistake that causes it to be scheduled every minute, something like that. So yes, even if there are safeguards in the UI, usually what happens is that safeguards in the UI are also implemented in the API, and the UI just presents a pretty error message for the same safeguard.

Any other topics we need to discuss today?

So we are now more in favor of the cron syntax approach, so I can start exploring more about it. That's what I was thinking; this week's agenda was to fix the architecture so that we can proceed on how we would implement it.

That sounds very good to me, Rishikesh. Thanks for doing that exploration, and thank you for having researched the global build discarder versus the scheduled-task interfaces. Well done. Now, timeline-wise, I believe we're about to start the official coding phase, aren't we? Yeah, it's going to start on June 13 according to the schedule. Okay. Are you feeling like you've got enough that you're ready to start the coding? Yes. Great.

I apologize; it's approaching 11 p.m. my time and I'm not nearly as awake as the two of you tend to be at this hour of the night. I don't have any other questions; I think we can wrap up the session if you want. All right, then I'll go ahead and stop.