And then we should focus more on the graphs part, in that case. Okay. Well, if the results I'm seeing continue to hold throughout today, I think it is safe for us to merge your change for 90904. We still want a global switch for it, so we have the ability to turn it off as an escape hatch. But I've been running that code since Saturday in all sorts of stress environments in my test setup, and I found only one failure, and it was due to my tests having a mistake: a message changed and I was checking for the message. The test was poorly coded, but the code change you made worked exactly as intended. To explain: I have a thousand-plus jobs that I run inside a Jenkins server to check various conditions, and some of those tests were badly written. They were asserting that the message was "shallow fetch", and the message is now "shallow clone", because we skipped the shallow fetch that was the redundant fetch. So I think we're ready to merge. I'm going to give it maybe another six or eight hours of testing, and then I'm likely to merge. Sorry, the floor is yours, Rishabh.

Okay. So one of the first things I want to discuss is how much I should explain JMH and how we are using it. I was thinking of first talking about how we have integrated the JMH module inside the git client plugin, then showing the JMH benchmark folder inside the repository. And I want to ask: should I walk through one of the benchmarks, given that we only have 15 minutes? Or do we not want to go there?

Fifteen minutes is not nearly enough to show source code.

Okay, so no code. I'll just show that this is the module where we write our benchmarks. The second step is how we run it on the infrastructure, so I can show the step we have added in the Jenkinsfile. For this, I don't think we need to show the Blue Ocean pipeline; that's not needed. After that, since I have integrated the JMH report plugin inside the Jenkins instance, I can show how that is working. Before this, I set up a standalone project for JMH. It has the git client plugin as a dependency in the POM, and it runs differently from the way we usually run things: it doesn't run from a Maven command, it runs as an executable Java JAR. The pipeline itself is simple: it checks out the standalone project, builds it with mvn clean install, and the testing stage runs the benchmarks. Then I added the JMH report plugin; you just have to add a stage that points it at the JSON file generated by the benchmarks. Once you have that, there's another tab, called JMH Report. I just ran one benchmark because I wanted quick results.
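For reference, a minimal sketch of the pipeline flow just described, assuming the jmhReport step name provided by the jmh-report plugin; the stage layout, jar path, and JSON file name are placeholders rather than the actual project layout:

```groovy
// Illustrative Jenkinsfile: build the standalone benchmark project,
// run the JMH benchmarks as an executable jar (not via a Maven goal),
// then publish the JSON results so the "JMH Report" tab can render them.
pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                // standalone project with git-client-plugin as a POM dependency
                sh 'mvn clean install -DskipTests'
            }
        }
        stage('Benchmark') {
            steps {
                // -rf/-rff are standard JMH flags: JSON result format and file
                sh 'java -jar target/benchmarks.jar -rf json -rff jmh-report.json'
            }
        }
        stage('Report') {
            steps {
                // step contributed by the jmh-report plugin
                jmhReport 'jmh-report.json'
            }
        }
    }
}
```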
And this is how it shows up visually. It's basically the same website we already use to visualize the benchmarks. There are some options as well, like the scale, if we really want to magnify small differences. And if you have multiple benchmarks, it shows them comparatively, so when I demo this tomorrow I'm going to include multiple benchmarks so it looks a little better. So these are the three stages, you could say, of how we work with benchmarks in the git client plugin.

Then, the second point: I was thinking of removing some slides, because my presentation is currently about 20 or 21 slides and it might grow as I work on it tonight. I want to cut it down because I only have 15 minutes. I was also thinking of skipping the parameters; or rather, the parameters are something I wanted to discuss more. I was thinking of going straight to the results, the graphs, and the inferences, and if people are interested they can ask what our parameters were. Or should I explain them before going to the results? What do you think?

I prefer results first. I think people will be much more interested in the results than in the road you traveled to get to them, particularly in only 15 minutes, if you highlight the results.

Yes, like Mark said, focus more on the results. Just tell them what the problem was and what results we found for it; those two parts.

Sure. So how I was thinking of showing the results: I could include the parameters, or I could first show the result, that this is what we got benchmarking git fetch, and then explain that this machine was running this version of Git, on this platform, on Java 8 or Java 11. I'm not sure whether to go into those details. Maybe I can just write them on the slide and not speak about them; that's also something I can do.

So, the results. What you can see here is a clear difference in the behavior of JGit's performance. The first graph is the benchmark on macOS, my local machine, and the second is on a CentOS 7 machine. In the first result there's an intersection, at a repository size of a bit less than 5 MB, and what we see is something we've already discussed: JGit performs better than git below it. The y-axis is the average execution time, in milliseconds per operation. So for a smaller repository JGit performs better, and after a point, what we could call the decision variable for our improvement, the size at which we decide to switch implementations, JGit's performance starts to degrade exponentially, as we've seen. That's the visual picture.
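If the parameters question does come up, this is roughly the shape of a parameterized benchmark in a JMH module; the class name, methods, and parameter values here are invented for illustration and are not the project's actual benchmarks:

```java
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

// Illustrative only: a parameterized JMH benchmark in the style discussed.
// JMH runs the benchmark once per @Param value, which is what produces the
// repository-size series plotted on the graphs.
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
public class FetchBenchmark {

    // hypothetical repository sizes in MB
    @Param({"5", "50", "300", "800"})
    int repoSizeMb;

    @Setup(Level.Trial)
    public void prepareRepository() {
        // clone or look up a sample repository of roughly repoSizeMb
    }

    @Benchmark
    public void gitFetch() {
        // invoke the CLI git implementation of fetch here
    }

    @Benchmark
    public void jgitFetch() {
        // invoke the JGit implementation of fetch here
    }
}
```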
Quantitatively, for a 300 MB repository there is a 1.5-minute difference between JGit's and git's performance. With the first graph you can clearly see how the behavior changes as the repository size increases. The second graph shows the same behavior, though the intersection point comes later. So something we need to explore further is finding the optimal intersection point through multiple iterations of the benchmarks. From the first case we might assume that a repository size of around 10 MB is where we would switch from JGit to CLI git, but in the second case the intersection comes much later. We discussed both of these results in one of our previous meetings, and the explanation we found was that my local machine could have disturbances that would not be present on the CentOS 7 machine. That machine is the master node of a freshly installed cluster and was not doing anything at the time, so the differences in the results could be accounted for by that; my local machine was also profiling and doing a lot of other things at the same time. So I was thinking this might be the most effective way to show the results, because we can show the change in the nature of JGit's performance in relation to repository size. Instead of bar graphs, this seems like a better way to visualize the change.

After that, git ls-remote. I can talk about it, right? For git ls-remote we found no differences between the implementations. And I think I already mentioned that I'm going to change the graphs; I've prepared them but haven't replaced them yet. It's basically the same data with a clearer visualization.

After this, one more issue I have is with the redundant fetch fix. When we were estimating the impact on performance, I ran several other benchmarks to confirm that there is less than a second's difference between the redundant fetch and the initial fetch. The redundant fetch adds less than a second for repositories from 5 MB up to 800 MB; I haven't seen more than a second's difference anywhere. So I was thinking, and this is maybe kind of cheating, that I could show my profiling results instead. When we fixed the redundant fetch issue: in the red box you can see the second fetch takes around point six seconds. So in profiling we could see that, after applying the fix, we would remove that much time from the total execution of the checkout step. But with benchmarks, if I want to show the result visually, it's very difficult: I could take it in milliseconds and switch the scale, but visually it won't show any difference. This is also a point where I wanted your advice. What should I do?
I think, for me, saying that the data does not support a consistent, dramatic improvement is a true statement. However, on removing redundant operations: we have reports from the field, from users, that the redundant operation was expensive for them. We're trusting that, and trusting that by removing a redundant operation we're probably not going to harm performance. I wouldn't worry about trying to justify it with data at this point, because I already tried that route, and the users kept coming back to me saying, "Mark, I don't care that it is very cheap for you to do an incremental fetch the second time. It's not cheap for me." And I can't refute what the users say, right? Their experience is real; they ran their numbers and said, look, that second fetch is costing me this much. So in this case I refuse to fight with the users anymore. They're right. We should stop doing redundant work.

And I think the other thing is that they may have characteristics we're not accounting for. You talked about repository size, but maybe it's something else going on in their repositories that triggers this. So I agree with Mark.

Okay. So should I show the benchmark results, or the profiling, or both?

I wouldn't even worry about showing either of them. You can probably just hand-wave over it. You're presenting to a bunch of software people, and the word "redundant" is one of those evils for software people: they say, ooh, redundant is bad, get rid of it. So you don't even have to justify getting rid of it. You just say: it was redundant, we proved it was redundant, and we removed it.

Yeah. And you can just say there were folks in the field reporting that this was causing them a significant increase in time, we validated that it wasn't necessary and was in fact redundant, and it should be no harm to remove it for everyone else.

Okay, sounds great. After that, this is one of the benchmarks where I tried to perform the same operation, but with a remote repository instead of a local Git repository. This is a git fetch operation, and I think it's kind of an obvious fact that if we include the network while fetching, it's going to increase the time of the operation. So should I show this result? It just shows the performance of the individual implementations, without and with network, for the Jenkins repository, which is around 360 MB, and the Ruby repository, which is around 470 MB. The graphs show an increase in execution time when we add the network to the benchmark, for both git and JGit. So I wanted to ask: is this something we should show in the demo, given the 15-minute cap?

I don't know that you're going to get any benefit telling the audience that networks are slower than local access to disk drives. If they don't understand that, they'll understand it soon enough.
I am surprised that the differences between with and without network are not larger than they are. But I'm not interested in exploring that. It probably just says you're somewhere near a university in India with really great connections to local caches, and that's wonderful. I'm glad.

Okay. And the results for the network case might not be entirely accurate, because the error margin is too large. When I use the network in the benchmark, the error just increases. So that's also a factor.

After that, the last thing could be what we want to do in phase two. The first item, of course, is the heuristics we are thinking about using. Second, the performance evaluation I have done in the git plugin so far has been around the git SCM checkout step, so I want to move on to other areas of the plugin. The third thing: most of my results came before I started validating them at even a basic level. What I want in the next phase is to improve the validation of the benchmarks, to have more confidence when we have a result; apart from taking opinions and other forms of validation, like profiling the Jenkins instance, the benchmarks themselves should have a method of validation. So what I want to discuss is: should we go into the heuristics we're going to use, or should we avoid that and just say these are the things we're planning for the next phase?

I think you're saying: we have data that says Git repository size matters, but we cannot always determine the size of a repository, because we don't always have the repository locally. Therefore we believe the next step is to apply some heuristics to decide how big a repository is. That would be enough for me as the description. Why are we doing anything about repository size? Because the graphs in the earlier part of the presentation proved repository size matters.

Okay. So I guess this is what I'm going to show in the demo. Is there anything else you think I've missed that I should show?

Your second line item reminds me that there are known challenges, or potential challenges, hiding in multibranch pipelines and organization folders that may be related to locking. My repository with 150 or 175 branches sometimes spends a lot of time waiting for a lock on the cache on the master. So that may in fact turn into a very interesting angle for performance that isn't JGit or git specific, but is very much still impacting users.

Okay, so you're saying we're going to broaden the scope; that sounds positive to me. So should I add that issue with locking the cache to the presentation?

No, no. Just saying that you're broadening is already enough. "There is more to investigate" is already statement enough for me. There's no way you should describe which specific things; we don't know them, and that's just me making wild speculative guesses.
Agreed. So, anything else, in terms of the visualization or the way we're presenting?

Yes: are we covering the comparison between CLI git and JGit in tomorrow's presentation?

Yes. JGit versus git is what I was showing here; I just did not explain it as clearly as I will in the demo.

Actually, we discussed it a lot before you came to the meeting. That's probably why.

Yeah, I'll discuss it at length tomorrow. It's the main thing we have.

So, not for tomorrow's demo, but I posted the message on Zoom as well: what I can see in the first graph is that JGit and CLI git seem to be converging again for larger repository sizes. I think we can explore that in phase two, if needed.

There's actually one thing I forgot to mention: the scale here is logarithmic, not linear. I switched it because with a linear scale the behavior was not as obvious. I think I should mention that before explaining the graph, because the quantities you see on the y-axis are not linear.

For me, the fact that you chose a log scale makes it much clearer that once that intersection happens, it's not coming back: JGit is going to get slower and slower. And when you said "exponential", I thought, no way is it exponential; but you're using a log scale, therefore it actually is getting exponentially worse.

Yes. I used the word without having checked it, but yes, Mark.

Very good. One more thing: we need to ask for availability of the JMH report plugin on ci.jenkins.io so we can visualize these things there, because I don't think it's installed right now. And I think it would help us, particularly given that there are resources on ci.jenkins.io that are not available to you elsewhere. It's got a System 390 mainframe from IBM, for example.

I actually forgot to ask you about that: how do we do it? Do we add steps in the Jenkinsfile? And do we also have to request the extra infrastructure?

The infrastructure already exists; we just have to modify our Jenkinsfile. And I can show you a Jenkinsfile that already uses those agents, so I have an example you can use as a reference when you reach the point where you're ready to say "I want to run on PowerPC, on ARM64 Graviton, and on System 390."

Okay. And for the plugin to be installed in the infrastructure, would I have to raise a request, or is there a manual process?

Just raise an INFRA ticket in the Jenkins Jira.

Okay.
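Mark's example Jenkinsfile is the real reference here; until then, a hedged sketch of what running the benchmarks across those extra agents could look like with a declarative matrix. The agent label names ('arm64', 'ppc64le', 's390x') are guesses and should be checked against what ci.jenkins.io actually defines:

```groovy
// Illustrative only: fan the benchmark stage out across several
// architectures using a declarative-pipeline matrix.
pipeline {
    agent none
    stages {
        stage('Benchmark per platform') {
            matrix {
                // each matrix cell runs on an agent carrying that label
                agent { label "${PLATFORM}" }
                axes {
                    axis {
                        name 'PLATFORM'
                        values 'arm64', 'ppc64le', 's390x'
                    }
                }
                stages {
                    stage('Run') {
                        steps {
                            sh 'java -jar target/benchmarks.jar -rf json -rff jmh-report.json'
                        }
                    }
                }
            }
        }
    }
}
```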
So, one thing I explored during these days was the possible heuristics for repository size estimation. One of the ways we're trying to determine the size is using the REST APIs provided by the Git hosting providers. What I want to ask first is: in the git plugin I've seen a lot of repository browsers. Are those the providers that are implementing the Git SCM? Is that roughly what the browsers are?

Those are simple transformations from a repository URL to a diff URL. A repository URI, really, because it's not always a URL: a repository URI such as git@github.com:MarkEWaite/git-client-plugin.git needs to be mapped to something, and the mapping is an HTTPS URL with parameter replacement. So those browser things are just ways to view changes. They're not much more than that; in fact, they're nothing more than that.

So when I'm writing the rules to get the sizes of repositories hosted on GitHub or GitLab, how many providers do I consider? How do we even know where a repository is hosted? We would have to figure that out, but even if we do, there are possibly more than five providers using this functionality. Would we need all of them? Or should we just assume we can cover some of the providers, and if the repository isn't on one of them, fall back to the third heuristic?

I think this is a case where we would probably want to use the Jenkins concept of an extension point. What that allows is for a plugin to say, "I am going to declare this capability," and then other plugins may implement it and contribute to it. You would provide a basic implementation in the Git plugin, which would probably only use command-line git and have all sorts of flaws because of that; but then you might go to the GitLab plugin or the GitHub plugin and implement that extension point there, using REST API calls to GitLab or GitHub, and answer the question better than it can be answered from the Git plugin. Now, this is just me thinking; Fran and Justin may have more experience in that area. But my sense is this is a place where you do something small in the Git plugin, and other plugins may add their own capability.

Mark is right. What you do in this case is just expose an extension point, and if any other plugin wants to participate, it can look up the extension and make use of it, or even extend it, so that a more advanced extension is used instead of the previous one.

Okay. So I'm going to explore how to create an extension point and how we could do that. The second question I had is how we rank the heuristics. Is it: first, if we find a local cache, that's the best way to get the size; second, the provider APIs, if we trust them; and then the last one? Is that how we should select?

That seems reasonable to me. I've seen, in other places, things like a popularity weighting, where the heuristic provides its own assessment of its weight. You ask the implementer: please provide your assessment of the reliability of this heuristic as a numeric value. The reliability of the local cache is absolutely one; it's flawless, there's not much better than that heuristic. The reliability of the API call is probably 0.9, because it's pretty good. Everything else is far below that; counting branches is wildly below that. And then we bias towards the heuristics that are stronger.
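A minimal sketch of how that extension point plus reliability weighting could fit together; every name here (RepositorySizeEstimator and its methods) is hypothetical, not existing git plugin API:

```java
import hudson.ExtensionList;
import hudson.ExtensionPoint;

// Hypothetical sketch of the idea discussed above: the git plugin would
// declare this extension point and ship a command-line-git-based
// implementation; hosting plugins (GitHub, GitLab, ...) could contribute
// REST-API-backed implementations by subclassing it and annotating the
// subclass with @Extension.
public abstract class RepositorySizeEstimator implements ExtensionPoint {

    /** Estimated size in KiB, or -1 if this estimator cannot answer for the URL. */
    public abstract long estimateSizeKiB(String remoteUrl);

    /**
     * Self-assessed reliability in [0, 1], as in the discussion: a local
     * cache lookup would return 1.0, a hosting-provider API perhaps 0.9,
     * and weaker heuristics much less.
     */
    public abstract double reliability();

    /** Take the answer from the most reliable estimator that can answer. */
    public static long bestEstimate(String remoteUrl) {
        return ExtensionList.lookup(RepositorySizeEstimator.class).stream()
                .sorted((a, b) -> Double.compare(b.reliability(), a.reliability()))
                .mapToLong(e -> e.estimateSizeKiB(remoteUrl))
                .filter(size -> size >= 0)
                .findFirst()
                .orElse(-1L);
    }
}
```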
And the last question I have on the heuristics is about git ls-remote. How do I reach a point where I can decide the size threshold, where I want CLI git for a large repository, say 300 MB? How do I correlate the number of references to that size? Maybe I should take a lot of repositories and perform git ls-remote on them; not a benchmark, maybe a JUnit test where I just iterate through many repositories, collect a lot of data, and average it to get some experimental results. Otherwise, I'm not sure there is a direct correlation between the number of references and the size of the repository. So if we want that heuristic to be remotely close to what we want to predict, I think a good idea would be to take a lot of repositories, varying from very small to very large, collect that data, average it, and maybe arrive at a decision variable that gives us a reliable heuristic.

I like the idea of sampling. I think that's a very wise thing to say: "I'm going to sample repositories to test any one of those heuristics." You definitely should not describe those things in tomorrow's presentation, but as we continue our discussions, yes, we should evaluate them. It may be that you ultimately decide ls-remote is such a poor heuristic that you just discard it, and that's perfectly okay too: to say, look, there is no correlation we could rely on at all.

Yeah, I get it. Okay, so I guess this is it from my side. For the demo, I'm going to improve the presentation overnight, the visualizations and the overall look, and possibly practice once or twice so that I don't overshoot the 15 minutes.

Right. The amazing power of a stopwatch.

Yeah, the amazing power of a stopwatch. I'm going to do that.

The thing I usually try to do with presentations is think about how many slides I have versus how much time I'm going to spend on each slide. Be careful about adding too many slides, because you'll invariably speak a little differently when you're presenting in front of people.

Okay, I'm going to reduce the number of slides I have. Yes, we already talked about some of them. Okay, this is it. We should end the meeting and stop taking up your time. Bye, guys.

Bye. Thanks, Rishabh. We're looking forward to your presentation tomorrow. You're going to be great.

Thanks a lot.
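As a closing aside, the sampling idea Rishabh described could start as small as this, assuming JGit's LsRemoteCommand API; the repository URLs and the known sizes are placeholders, not measured data:

```java
import java.util.Collection;
import java.util.Map;
import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.lib.Ref;

// Hypothetical sampling harness for the ls-remote heuristic: count the refs
// for repositories of known size and eyeball whether any usable correlation
// exists. Closer to the JUnit-style data collection discussed above than to
// a benchmark.
public class LsRemoteSampling {
    public static void main(String[] args) throws Exception {
        // placeholder sizes in MB; real values would come from provider
        // APIs or local clones
        Map<String, Integer> reposWithKnownSizeMb = Map.of(
                "https://github.com/jenkinsci/git-client-plugin.git", 40,
                "https://github.com/jenkinsci/jenkins.git", 360);
        for (Map.Entry<String, Integer> repo : reposWithKnownSizeMb.entrySet()) {
            Collection<Ref> refs = Git.lsRemoteRepository()
                    .setRemote(repo.getKey())
                    .call();
            System.out.printf("%s: %d refs, %d MB%n",
                    repo.getKey(), refs.size(), repo.getValue());
        }
    }
}
```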