Started now, go ahead.

Okay. So for today's meeting, the agenda... just one second. The first thing on the agenda is an update on our Git redundant fetch issue. One thing I just saw: we had three behaviors which we had to add to the clone API, which is the API invoked when checking out for the first time. The first behavior was clean before checkout, and we don't need that, because we found out that for a first-time build there is no Git repository to clean. The second behavior was pruning stale branches before checkout, and the third was pruning stale tags. I had already implemented all of these behaviors in the clone API, and while I was writing the unit tests for these additions, I discovered that if we're building a repository for the first time, we don't need pruning of stale branches or stale tags either: we can't have stale remote references before the build, because we don't even have a repository yet. At that point I realized that we don't need to add those two behaviors to the clone API step at all. So that's a recent discovery, and I think it makes things easier, because the only change we have left is related to the unit tests. I think the fix is good to go, because we had to cover all the use cases we would miss if we removed the second fetch, the redundant one, and I think we have covered all of those cases. Mark, what would you say?

Well, I think there's more to test, but we may be at the point now where the best form of testing is interactive, where we do some exploration of various permutations and combinations of job settings to see, hey, did this deliver what we expected? I was dismayed by the realization that I didn't personally understand the conditions around prune and clean, and we had to be taught by an automated test.
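The clone-API reasoning above can be sketched as follows. This is an illustrative Python sketch, not the plugin's actual Java code, and all behavior names are hypothetical:

```python
# Illustrative sketch (not the actual plugin code): which pre-checkout
# extension behaviors are meaningful for a first-time clone versus an
# incremental fetch into an existing workspace. Names are hypothetical.

def behaviors_to_apply(repo_exists: bool, requested: set) -> set:
    """Return the subset of requested behaviors that can have any effect.

    On a first-time clone there is no local repository yet, so there is
    nothing to clean and no stale local copies of remote branches or tags
    to prune; those behaviors only matter for an existing workspace.
    """
    first_clone_noops = {"clean_before_checkout",
                         "prune_stale_branches",
                         "prune_stale_tags"}
    if not repo_exists:
        return requested - first_clone_noops
    return requested

requested = {"clean_before_checkout", "prune_stale_tags", "sparse_checkout"}
print(sorted(behaviors_to_apply(repo_exists=False, requested=requested)))
# → ['sparse_checkout']
```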
So there's more work to do there in terms of the interactive testing.

Okay. So how would we go about it, Mark? My second question, the second agenda item, is the further action needed to merge the fix to the master branch. How should we plan that?

One thing I was thinking of is that maybe what we need is interactive testing that takes, let's say, the set of available parameters and extension arguments, chooses a subset of them, and then says: okay, I want to run these with the redundant fetch and without, and compare the repositories in the workspaces that result. So we have some form of repository comparison. Did we get all the references we expected? Did we get all the branches we expected? Were the refspecs correct? My worry is that if they aren't correct, some subtle thing will be broken by removing the redundant fetch, and someone will say, "Hey, you broke me, because you removed this subtle thing on which I depended."

Okay. That sounds like a great plan. I can start with this interactive testing.

Yeah. Now, my usual game plan with that kind of interactive testing is that it's intentionally rapid. The goal is not to make it particularly repeatable, just to keep notes as you go. But then, when you find a real problem, that's the excuse to eventually write an automated test which shows the problem. If we focus on automating it too soon, we spend all our time on the automation without doing the exploration.

Sure, I understand that. Okay. So after this, the next thing on the agenda is the discussion on the implementation of the performance improvement in the Git plugin and the Git client plugin. Last time we discussed that we could have a checkbox, a way that we by default enable the performance improvement and people can revert to the old behavior if they want to. So I tried doing it, and I have a lot of questions related to it.
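The repository comparison described here could work along these lines; a minimal Python sketch, assuming ref listings in the tab-separated `<sha> <refname>` format that `git ls-remote` produces:

```python
# Hypothetical sketch of the repository comparison described above: given
# the refs of a workspace built WITH the redundant fetch and one built
# WITHOUT it, report anything that differs. The input format mirrors
# `git ls-remote` output: one "<sha>\t<refname>" per line.

def parse_refs(ls_remote_output: str) -> dict:
    refs = {}
    for line in ls_remote_output.strip().splitlines():
        sha, _, name = line.partition("\t")
        refs[name] = sha
    return refs

def compare_refs(with_fetch: dict, without_fetch: dict) -> dict:
    return {
        "missing": sorted(set(with_fetch) - set(without_fetch)),
        "extra": sorted(set(without_fetch) - set(with_fetch)),
        "changed": sorted(n for n in with_fetch
                          if n in without_fetch
                          and with_fetch[n] != without_fetch[n]),
    }

a = parse_refs("1111\trefs/heads/master\n2222\trefs/tags/v1.0")
b = parse_refs("1111\trefs/heads/master")
print(compare_refs(a, b))  # the tag is reported as missing
```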
So the first thing is, I basically added this checkbox here for the performance-improvement changes, and it is linked to the descriptor class of the SCM. I'd like to show the code here. So this is a Boolean, enablePerformanceImprovement, in the descriptor class. One of the first things in my mind, which I haven't explored yet, is: how will I ship this Boolean to every corner of the Git plugin? I understand that this descriptor is an object for the SCM class, so when I create an SCM object I would have the descriptor as an object and I can access it. I created a method which is basically a getter for this Boolean. But as I understand it, we need to create the Git client within the GitSCM class, and I'm not sure if I'll be able to access this variable within the SCM class, because the descriptor class is a different class, right? I haven't explored that part. If I am plainly wrong, and you can tell me right now where I'm wrong, that's okay; otherwise I'm going to explore this more. I just implemented it to show the idea. So the first part of this whole implementation is to figure out how to bring the Boolean, the choice the user makes, into the code.

Now the second part, which I was thinking about, is that we need to selectively choose between implementations based on the analysis we get from the JMH benchmarks. We know that, for example, for git fetch the choice is heavily dependent upon the repository size. So first I was thinking about how to find out the size of a particular repository, and I was looking at a command, I wrote it down somewhere: git count-objects.
What it does is basically count the unpacked objects and report their size. But the issue with that command, and I implemented it first and then understood the issue, is exactly that it counts the unpacked objects. When I clone a repository for the first time, the server sends the objects packed, in a compressed form. I wrote a small test to check whether my implementation was correct: I cloned the Git plugin repository and checked the object size, and it was zero, because the repository was entirely packed.

So while I was doing this, one of the revelations I had was that I have to determine the size of the repository before creating the client. That raises one of the first questions for this: can I access the repository I am going to use with the client before actually creating the client? If I'm not able to do that, then I would have to clone the repository once just to make this decision. But that means for a large repository, say 300 megabytes, whose performance I want to improve, I would add a considerable amount of time by cloning the repository and then estimating its size. And it would arrive packed, so maybe I would also have to run the Git command for unpacking objects. And then I would have to create a client with a temporary local repository just for the lifecycle of making that decision, because I would not have any repository from which to check the size before creating the client for the Git plugin's real functionality.
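The packed-versus-unpacked distinction above is visible in the two forms of the command: plain `git count-objects` reports only loose objects, while `git count-objects -v` also reports `size-pack`, the size of the pack files. A small Python sketch parsing sample verbose output (both sizes are in KiB; the numbers are made up):

```python
# Plain `git count-objects` only reports loose (unpacked) objects, which
# is why a fresh clone showed a size of zero. The verbose form,
# `git count-objects -v`, also reports `size-pack`. This parses that
# output; `size` and `size-pack` are both reported in KiB.

SAMPLE = """\
count: 0
size: 0
in-pack: 41230
packs: 1
size-pack: 18960
prune-packable: 0
garbage: 0
size-garbage: 0
"""

def repo_size_kib(count_objects_v_output: str) -> int:
    fields = {}
    for line in count_objects_v_output.strip().splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = int(value)
    # loose-object size plus packed size
    return fields.get("size", 0) + fields.get("size-pack", 0)

print(repo_size_kib(SAMPLE))  # → 18960
```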
So, first: is this the right way, using the descriptor class's getter, to take the user's option and then use it to switch between implementations or revert to the old behavior? And second: how do we go about measuring the size of the repository before even creating a client?

On the first question: right above the implementation that you did, there is "show entire commit summary in changes". Look for those words.

I have seen it. Yeah, there we go.

So that is a pattern for exactly the kind of thing I think you need to add, because it uses the same technique. It has something in the descriptor, and then there are things inside the GitSCM class, or one of its related classes, which ask the GitSCM object whether "show entire commit summary in changes" is enabled or not. So I think you can just leverage that. Look for it. I confess that when it comes to dealing with descriptors versus the parent class, I don't remember any of it; I always have to go do the research. But this flag is exactly the kind of flag you need. You can look at its usages, see where it's used and where it's referenced, and model your changes after that. That's a good way to go about it.

Okay.

Yeah, and if I remember right, there's a method you can use called getDescriptor that will give you the descriptor for the current plugin. But I think Mark's suggestion is probably the easiest one.

Okay, but my question was: what if I need to use this field within the SCM class before even creating the object for that class? How?

I think you can't. Or you could, because it's statically available, right? It's a global.
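The descriptor pattern being discussed can be reduced to a toy sketch. The real code is Java inside the Jenkins Git plugin; this Python stand-in only illustrates the shape of it, and all names here are hypothetical:

```python
# Toy sketch of the Jenkins descriptor pattern under discussion (the real
# code is Java; names here are hypothetical). A descriptor is a single
# shared object holding global configuration; each SCM instance reaches it
# via a getDescriptor-style accessor and consults its flags at runtime,
# the same technique the existing commit-summary flag uses.

class GitSCMDescriptor:
    """Stands in for the descriptor: one shared settings object."""
    def __init__(self):
        # Default false for compatibility: jobs saved before the flag
        # existed have no XML entry for it, so they get the default.
        self.enable_performance_improvement = False

DESCRIPTOR = GitSCMDescriptor()  # Jenkins builds this at plugin load

class GitSCM:
    def get_descriptor(self) -> GitSCMDescriptor:
        return DESCRIPTOR

    def fetch(self) -> str:
        # Decide which implementation to use at runtime, with the
        # object already constructed; no decision needed beforehand.
        if self.get_descriptor().enable_performance_improvement:
            return "fetch without redundant fetch"
        return "legacy fetch"

scm = GitSCM()
print(scm.fetch())                           # → legacy fetch
DESCRIPTOR.enable_performance_improvement = True
print(scm.fetch())                           # → fetch without redundant fetch
```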
So it's got to be available somehow. But I'm not sure you will actually need to. That's the piece where I suspect you'll want to make the determination of which path to use at runtime, after already instantiating the GitSCM object. I don't think you're going to need to make the decision beforehand; I suspect you'll want to make it at runtime with the object already constructed and in memory. And technically, Jenkins may have already constructed it for you when it loaded the plugins at startup.

Yeah, you'll see some of this. There is a getDescriptorByType call in the code that shows the Git tool options. Anyway, I think you'll find it. Mark's suggestion is a good one: see where that's used, and then you'll start to see the chain of where it all comes together. It's instructive to run through it yourself sometimes, too.

Yeah, I'll do that.

Now, it was easier for me to express those kinds of flags if the default value was negative. So rather than "enable performance improvement", you might consider naming it "disable" something, or, let's see, "enable redundant fetch", or "use the performance abomination". This may be cargo-cult programming by now, and I apologize if it is, because I don't remember why, but it was easier for me to deal with things that had a default value of false, rather than starting with a default value of true, in these descriptors, when trying to retain compatibility.

Okay, I'll change that.

Okay. Now, your second question was the more challenging one. Ask your second question again; it's already slipped my poor feeble mind.

Just to touch on that "false" thing first: I think it's because the checkbox wouldn't be available in previously saved configurations, so keeping the default as false maintains compatibility with them.
Right. An existing job isn't going to have any XML entry for the field from before. I think that's what you're talking about. All right, go ahead.

Oh, it was sizing. I remember your second question, Rishabh; it was about how to deduce the size. So, some speculation. First, I think we're looking for a fallible rule, a heuristic, that could help us make the decision about which path to take. And we've got some information already available to contribute to this fallible rule, because in order to list the branches on a repository, we do a git ls-remote. So one piece of the fallible rule might be to say: if a repository has more than some threshold of branches; you know, try to establish a correlation between branch count and approximate repository size. It's fallible. We absolutely know it's fallible, but it's cheap, because we're already asking that question. And if we remember the answer, or remember data from the answer, we could then use it as part of our decision. "Oh, I saw a repository that had 500 branches; it's probably not a one-megabyte repository. It's probably more toward the large end than the small end." Now, there are plenty of repositories that have two branches and are still hundreds of megabytes, so it is a fallible rule. But one input would be git ls-remote, because we're already calling it.

And I really liked your idea; I thought it was brilliant to look for Git commands that might tell us the size of the local repository if we've got one, because alongside count-objects there are probably similar commands which will count inside pack files as well. So I think it's worth exploring that further to see what's available.

Okay, but the only concern there is that we would have to clone the repository first to do that, right?
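The branch-count heuristic might look like this; a knowingly fallible Python sketch, with a made-up threshold:

```python
# Sketch of the fallible heuristic described above (the threshold value
# is made up for illustration): count branch refs in `git ls-remote`
# output, which the plugin already runs to list branches, and treat a
# large branch count as a hint that the repository is probably large.

BRANCH_THRESHOLD = 100  # hypothetical cutoff, to be tuned by benchmarks

def count_branches(ls_remote_output: str) -> int:
    # Each ls-remote line is "<sha>\t<refname>"; branches live under
    # refs/heads/.
    return sum(1 for line in ls_remote_output.splitlines()
               if line.partition("\t")[2].startswith("refs/heads/"))

def probably_large(branch_count: int) -> bool:
    # Knowingly fallible: a two-branch repo can still be hundreds of
    # megabytes. This is only one cheap input to the decision.
    return branch_count > BRANCH_THRESHOLD

out = "aaa\trefs/heads/master\nbbb\trefs/heads/dev\nccc\trefs/tags/v1.0"
print(count_branches(out), probably_large(count_branches(out)))  # → 2 False
```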
Right, right. And that, for me, is a non-starter. The heuristic has to give us an answer without requiring an extra clone, because an extra clone would exactly sabotage the entire goal.

Yeah. I'm concerned here about one behavior of Git LFS. Will that affect your decision? Does it matter? LFS just keeps a pointer in your actual Git repository to some larger file that is stored elsewhere, so the size of that file won't be counted.

I think you've got a very good point. The fun part there is that LFS objects are cached into the .git directory as well. So, Rishabh, if you ask for the literal disk usage on the drive of the .git directory, that's a very, very good approximation, if you've got it. So if you've got a local clone, you can certainly ask that question. LFS is an important one to consider, excellent point; if LFS is in use, that's probably a hint that this is a big repository. It's not a guarantee, but it's probably a good hint.

Okay. Another way you could handle it, for things like GitHub and GitLab: it's not generalizable to everything Git, but it would maybe be an optimization for GitHub and GitLab, in that you could check their APIs, or use the APIs, to get the repository size.

Now that is a very bold one, because this would be the first time someone is introducing REST API calls into the Git plugin. All of the REST API calls are done by higher-level plugins like the GitHub, GitLab, and Gitea plugins. But I think Justin's got a good point: another really excellent heuristic is that those providers may have API calls that will tell you the approximate size of the repository with a single API call.
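The provider-API idea, sketched without any network call. For GitHub, `GET /repos/{owner}/{repo}` returns repository metadata whose `size` field is reported in kilobytes; the payload below is a fabricated example of that shape:

```python
# Hedged sketch of the provider-API heuristic: parse the approximate
# repository size out of a GitHub-style repository metadata response.
# No network call is made here; SAMPLE_RESPONSE is a fabricated payload
# of the shape that GET /repos/{owner}/{repo} returns, where "size" is
# in kilobytes.

import json

SAMPLE_RESPONSE = json.dumps({"full_name": "jenkinsci/git-plugin",
                              "size": 28400})  # illustrative numbers

def remote_size_kib(api_response_body: str) -> int:
    return json.loads(api_response_body)["size"]

print(remote_size_kib(SAMPLE_RESPONSE))  # → 28400
```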
Do the GitHub and GitLab plugins, and I'm a little rusty on these bits, pass information like that down into the Git plugin? Because that's another possibility: you could localize that in each of those plugins, since they're already calling those APIs. Create some sort of interface in those upper-level plugins that says, "hey, provide me an estimate of the remote repository size." That'd be worth considering as well.

Yeah, that's an interesting set of options. I hadn't thought of that; I think that's a very good idea.

Okay, so for that I would need to create an implementation in the plugins which use the Git plugin, GitLab or GitHub, right?

Right. And for that one, as soon as you involve another plugin, you're also contingent on their release of that plugin for delivering the feature, so it's much more challenging. It's architecturally very elegant, but it can be more challenging for you. It feels, though, Rishabh, like we may have described something like a class that you'll need inside the Git plugin which doesn't exist yet, something that estimates the repository size, a repository-size heuristic. We admit it's a heuristic, it's fallible, but it collects data from various sources to give you your size guesses. That might be a good task.

Okay.

Yeah, maybe you start with implementing something in the Git plugin itself; that's the most generalizable way of doing it. You add this class, provide it to the other classes, and then maybe those plugins implement it later. That's not necessarily in the scope of your work, or if we have time, maybe you add stuff like that. And there is a possibility to support Justin's idea: Jenkins has the concept of an extension point that allows other plugins to add to yours.
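The estimator class described here might combine these signals roughly as follows; a hypothetical Python sketch in which every name, threshold, and fallback order is illustrative:

```python
# Hypothetical sketch of the class described above: a knowingly-fallible
# repository-size estimator inside the Git plugin that combines whatever
# cheap signals are available (remembered ls-remote branch count, local
# .git disk usage, provider API size) and falls back gracefully when a
# source has no answer. All names and thresholds are illustrative.

from typing import Optional

class RepositorySizeEstimator:
    def __init__(self,
                 branch_count: Optional[int] = None,
                 local_git_dir_kib: Optional[int] = None,
                 provider_api_kib: Optional[int] = None):
        self.branch_count = branch_count
        self.local_git_dir_kib = local_git_dir_kib
        self.provider_api_kib = provider_api_kib

    def estimate_kib(self) -> Optional[int]:
        # Prefer the most direct measurement available.
        if self.provider_api_kib is not None:
            return self.provider_api_kib
        if self.local_git_dir_kib is not None:
            return self.local_git_dir_kib
        if self.branch_count is not None:
            # Crude correlation: many branches -> assume "large".
            return 1_000_000 if self.branch_count > 100 else 10_000
        return None  # no signal: caller keeps the default code path

print(RepositorySizeEstimator(provider_api_kib=28400).estimate_kib())
print(RepositorySizeEstimator(branch_count=500).estimate_kib())
```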
And so you could conceivably create this as an extension point in the Git plugin, which others could contribute to if they wanted, and say, "Hey, I want to provide an even better implementation than the Git plugin's naive implementation." They could do that through this extension-point system.

Okay. Yeah, that's a great point. Provide it as an extension.

Yeah, and I think it's called, is it an extension point? I'll have to look it up; I'll send it to you separately. I think there's a specific page on jenkins.io that lists all the known Jenkins extension points.

The extensions index is what it's called.

Yes. And yeah, they call them extension points. Good, I'll put it into the meeting notes, Rishabh.

Okay. So one more question I had related to the plugin was the SCM API. I have read the SCM consumer and implementation guides, but I have never mapped that to how the Git plugin is using the SCM API. So I wanted to ask: how much of that should I research before thinking of implementing this performance-improvement feature? Is that something I need to do first so that I can do all of these things, or is it something I can do in parallel while I'm implementing the performance improvement?

I think you could do it in parallel, or even after, because the concepts that you're introducing are below the level of the SCM API; they are specific to Git internally. I don't think anything we've described so far needs it. The one possible exception is that if you ultimately decide you want to allow implementations to offer a better way of estimating the size of a repository, that might need an addition to the SCM API. But my guess for right now is that this is entirely inside the Git plugin, so the sophisticated and very capable things that are inside the SCM API can be largely ignored. But Fran is more experienced with this than I am.
Justin, I don't know about your experience with it, but my hunch is that it's probably at a level above the one you're working at.

Okay, yeah, I think you're right. In my experience I've been at a different level, generally a higher level than the SCM API. But I would say the same thing. Fran, do you have any other thoughts from the Git perspective?

I think Mark is totally right.

Cool. Great.

It's definitely interesting to learn about, though. So yeah, I think in parallel, or afterward, both seem like good ways.

Okay. These are my concerns right now. So for the next week, one of the possible tasks I have is, first, to interactively test the Git redundant fetch fix with the possible permutations and combinations. And the second is to decide on, or test, the heuristics we've mentioned, and if I can, create an extension; that would be the tangible outcome of another task, to create an extension which would provide a heuristic to calculate the size of the repository.

Yeah. I'm not worried about it being an extension point, but something that represents the estimate of repository size, I think, is a good thing. To my mind it absolutely does not need to be an extension point; you don't need that complexity yet. It's more that remembering the statistics from git ls-remote is probably enough to do the job that you need. If you remember, "hey, the last time this thing did ls-remote, it had this many branches," that's a simple integer to remember; it's not going to bloat the class too badly. And then you can use that as part of your approximations.

And where would I introduce that in the Git plugin? Suppose I create a class which gives me the result of the repository size; where and how would I use that in the GitSCM class?
Is that something I should explore?

Yeah, look for the callers of git ls-remote, and see if you can find a place that logically makes sense to attach a little bit of data to those callers, the places where you'll need it.

Okay. I have one question here. You mentioned that count-objects call on Git; did you use it with verbose mode, or without?

Without verbose.

I think the verbose mode gives the size of the packed objects as well, so you could consider that.

I didn't check that; I'll check it. I like that argument.

Okay. And another thing I thought about, and I'm not sure how simple this is behind the scenes: I know there are some Jenkins plugins that will cache things. So maybe, if you've seen a Git repository before, where the workspace is gone, that agent's gone, you don't have access to it anymore, but your instance has seen that repo before, perhaps there's a way to store that information on the primary Jenkins instance. I don't know if anyone else has experience with those APIs. I've seen it done, but I don't know how it's done, because I haven't done it myself in a plugin.

Yeah, I don't have experience. Maybe it's complicated, an API for caching.

Well, there is no real API for caching. There is a piece of code which has been copy-pasted between plugins with some level of success, and I believe we had a discussion about that maybe one year ago, because it creates a lot of issues with backup management, et cetera. Right now we do not have a standard for caching anything in Jenkins. So if you want to define something like that as part of your project, I would be happy to see that, but right now there is no central solution.

Okay. So I think this is it; those are the agenda items I wanted to discuss.
And I'll work on the tasks I mentioned for the next week.

Yeah. One thing: do you plan to do any communication of the current evaluation results? Because I believe that you already have some data from performance testing, so maybe it's something to share with the community. Or do you plan to integrate everything into a single blog post?

I think I can do both. For the time being, tomorrow I have a demo in the Platform SIG meeting, and along with that I can post in the community Google Groups forum with the results I have. Then, when I have results for multiple operations, I can create a blog post and aggregate all of those results.

Yeah, that works for me.

Okay, I'll do that for tomorrow.

Great, and I'm looking forward to seeing you tomorrow in the Platform SIG meeting.

Yeah. I'll probably also work on benchmarks related to the redundant fetch, to show what kind of performance difference we would get by removing it.

All right. Rishabh, anything else that you need from us? Anything else that we can help with?

No, I think that's everything I wanted to discuss.

All right, I will archive the recording and it'll be available on the list. Rishabh, thanks very much. We'll talk to you tomorrow in the Platform SIG meeting.

Thank you so much. Okay, bye.