Okay, go ahead, it's all yours. So I'm going to share my screen; you have to enable screen sharing. Yes, there we go, thank you.

So today I tried profiling the git plugin checkout step with a much larger repository. It's the public repository of a framework called CDAP, which is used for data analytics, and the repository I have is almost one GB in size. It has a lot of commits, fifty thousand, and one thousand branches, so a very big repository.

So how was I profiling and analyzing the results? I sorted the threads by start time in increasing order. What I know is that with the fix I would see just one fetch call, and without the fix I would see two fetch calls, the second some time after the first. This thread dump is with the fix, and as we progress further in time, these are the git fetch calls: there is just one call to git fetch, and it takes almost 18 minutes, because it's a very large repository, right?

Yes. Just to be sure I understand the context: is the remote repository on your local hard drive, or is it over the network?

Over the network, Mark.

Okay, thank you. Great, so this is real-world testing; you are transferring a gigabyte from GitHub.

Yes, and whatever results I provided related to profiling, they were over the network, not from my local setup.

Okay. So without the fix, the first git fetch call is taking 16 minutes. I'm not sure why it's taking less; that might be because of the network. But the second git fetch call is only taking 10 seconds more. So when I was saying that there's a difference of two minutes, what I think I did wrong was that, with the fix, I did not wipe out the workspace before building the second time.
It was a consecutive build; the results I showed were from the second build. So maybe I did not wipe out the workspace, and that is why the time taken by the first git fetch was considerably less than in the build I was comparing without the fix. So I need to profile this much more to understand exactly how much the time difference is, but I think it's going to be much less, maybe less than a minute, maybe seconds.

Even so, if we remove the redundant fetch and only save 10 seconds out of 20 minutes, that's still a win. The crucial thing that I've seen at some large installations is that they have their local Bitbucket server, and they're overloading that Bitbucket server with calls, because all of their agents are calling into it. By removing one of the calls, you have cut in half the load that we're applying to that Bitbucket server. So yes, it may be a smaller number in terms of the actual impact on a specific job, but by cutting in half the number of times we make a request to that central server, we may significantly improve performance for some of these people who are very attached to their large repositories. I remember a previous employer where I had a 20-gigabyte repository, and every single clone was terribly expensive. So this is a great excuse to save some time; don't be shy about what you learned. That's great.

I was thinking of profiling with some more large repositories to actually see how much time difference we are gaining. Can we try the repository Mark was mentioning the other day? He has some messy repository. The Linux kernel?

No, the one he's talking about is my jenkins-bugs repository. It's a smaller repository; it's only 50 or 60 megabytes.
It's not the one gigabyte of your large repositories, and it probably has fewer branches than your one-gig repository; I think it only has maybe 150 or 200 branches. But almost all the branches are active, and almost all the branches are independent of one another, and because of that it gives a very different shape to the history of that repository than a typical repository.

Okay, so maybe, Mark, I could profile Jenkins on that repository.

Yeah, and it's up to you; you get to choose. I think you've discovered fascinating things already here: the second request was much, much faster. And that matches results I had seen when I did some benchmarking years ago from bug reports in Jenkins, where users said the second call was prohibitively expensive, and my attempts to duplicate it as prohibitively expensive all failed. I saw it had a cost; it wasn't free. But I didn't see 50 percent of the initial clone spent in the second clone, or 50 percent of the initial fetch spent in the second.

Okay. One more thing I'll probably test is the number of commits: is that making a difference, rather than the size only? Right now I was just looking at the size of the repository. With the results I showed yesterday, that was the Samba repository; its commit count is almost 50 percent more than the repository I have now. I'm not sure, but this is something I'll test.

That's an interesting piece of data to track, and you should track it, because it very well could be exactly the number of commits. That's the most interesting problem.

Yeah, okay.
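As an editorial aside, the first-versus-second fetch comparison described above can be sketched offline with a throwaway local repository standing in for the remote. The real experiments used large repositories such as CDAP over the network; every path and name below is invented purely for illustration:

```shell
# Offline sketch: time a first fetch against an immediately repeated fetch.
# A tiny local bare repository stands in for the real remote.
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"
def=$(git -C "$tmp/origin.git" symbolic-ref --short HEAD)
git init -q "$tmp/seed"
git -C "$tmp/seed" -c user.email=a@b -c user.name=demo commit -q --allow-empty -m seed
git -C "$tmp/seed" push -q "$tmp/origin.git" "HEAD:refs/heads/$def"

git clone -q "$tmp/origin.git" "$tmp/ws"
t0=$(date +%s%N); git -C "$tmp/ws" fetch -q origin; t1=$(date +%s%N)
t2=$(date +%s%N); git -C "$tmp/ws" fetch -q origin; t3=$(date +%s%N)
first_ms=$(( (t1 - t0) / 1000000 ))
second_ms=$(( (t3 - t2) / 1000000 ))
echo "first fetch: ${first_ms} ms; repeated fetch: ${second_ms} ms"
```

Against a one GB remote over the network the gap between the two numbers becomes the 16-minutes-versus-10-seconds difference discussed above; on a toy local repository both are nearly instant, but the shape of the experiment is the same.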
Yes, I'll definitely do that.

I've heard that called sensitivity analysis: deciding which parameter is the most impactful on some change. And it's certainly very helpful for determining which heuristic you should use for repository size. If the number of commits is the important thing, then asking for the size on disk is not nearly as relevant as other queries.

Okay, I'm going to do that. So that's the report; I think that concludes the report on the demonstration. Mark, do you want to say something related to the demonstration, or should I move forward?

You can move forward. I was delighted with how you handled it, pleased with the results you showed, and with the fact that the people in the SIG meeting had good questions to ask you about the results you were seeing. Good job.

One question about the results you were showing before. When you are talking about a large repository, do you mean a repository with a lot of branches, commits, and files, or a repository with large files? I don't know if maybe the size of the files might impact the results of the analysis.

So, Fran, while I have profiled this operation, the redundant git fetch, the only thing in my mind was the size of the files. But this is the realization I just had as I was showing the results: I should also look at the number of commits and the number of branches while I'm doing so. So this is the next thing, the sensitivity analysis Mark mentioned; I'm going to do that. I'm probably going to take multiple repositories with a lot of commits and a lot of branches, and I'm going to profile Jenkins with those repositories.

Now, in terms of your profiling: would it be lower cost to do that outside of JFR, with simple timestamp instrumentation?
Is that reliable enough for you, or are you finding that JFR is so helpful that you'd rather be inside JFR? I don't know your experience there; has Java Flight Recorder been helpful for you in that regard?

Personally, Mark, I tried using System.nanoTime to mark the difference, but it was giving me very unreliable results when I was consecutively testing the builds and taking the results. With JFR, the one thing I've seen is that the results are consistent for the same repository and the same experiment when I'm launching consecutive builds: the time duration of the thread for one git fetch call is nearly the same. So that is why I have an inclination; I think it's giving me reliable results, rather than using System.nanoTime.

Great, excellent, good choice.

I could probably do both: I could insert System.nanoTime and log the difference, so I can see it in the build log, and I can also use profiling. So I could do both if I have to check whether there is a difference without using JFR. Are you saying that JFR is adding overhead to the performance?

Actually, I wasn't worried about JFR's overhead as much as I was worried about whether it is the simplest way for you to do what you're doing. If it's the simplest way, then that gives it value immediately, because you're trying to explore and understand. So whatever works best for you, do that.

Yeah, it is okay; it's simple, it's not that difficult for me to do.

Great. So the next thing: we discussed yesterday, on Wednesday, that we are going to interactively test the fix we've done for the redundant git fetch. For that, I've started doing that.
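A side note on the single-sample noise problem raised above: whatever the instrumentation, one common mitigation is to take several timed samples of the same operation and compare medians rather than individual readings. A rough shell analogue of that timestamp approach, using an invented throwaway repository so the sketch runs offline:

```shell
# Take five timed samples of the same fetch and report the median,
# which is usually steadier than any single wall-clock reading.
set -e
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"
def=$(git -C "$tmp/origin.git" symbolic-ref --short HEAD)
git init -q "$tmp/seed"
git -C "$tmp/seed" -c user.email=a@b -c user.name=demo commit -q --allow-empty -m seed
git -C "$tmp/seed" push -q "$tmp/origin.git" "HEAD:refs/heads/$def"
git clone -q "$tmp/origin.git" "$tmp/ws"

samples=""
for i in 1 2 3 4 5; do
  s=$(date +%s%N)
  git -C "$tmp/ws" fetch -q origin
  e=$(date +%s%N)
  samples="$samples $(( (e - s) / 1000000 ))"
done
median_ms=$(printf '%s\n' $samples | sort -n | sed -n 3p)
echo "median of 5 fetches: ${median_ms} ms"
```

The same idea applies to System.nanoTime inside the plugin: log each sample and compare medians across consecutive builds rather than trusting one run.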
I've done it for some cases. The first scenario I wanted to take covers all the use cases that relate somehow to the structure of the repository, the ones that would produce a difference in the structure of the repository after we choose that behavior; that could be related to the size, the refspecs, the commit history. So I have tested this fix interactively with the advanced clone behaviors: I've chosen shallow clone with a depth. How do I test it? I go to the workspace and I look at the log, the history, and the HEAD, where it is attached, for both of them, and of course the size of the repository. With these parameters I'm seeing the same results for both. This is how I'm basically testing each case.

The second test scenario I took was checkout of a specific branch. I wanted to see if I end up with the same branch: is HEAD attached to the same branch when I check out in the workspace with the fix and without the fix? It was the same. So I'm going to take more cases. I have no preference among the behaviors I'm choosing right now, but these were the cases I really wanted to test shallow clone on. I'll probably check sparse checkout as well, although I don't think it interacts with the redundancy of the git fetch.

Actually, I would drop sparse checkout, because it is entirely a workspace operation; it doesn't change the quantity of history we retrieve. All it changes is the checkout operation, and your focus is on the fetch operation. So you don't need to spend time on sparse checkout; if we break it, we broke it for another reason.

Okay, got it. So this is how I'm doing it. One thing I'll do is make a document of the test bed and share it, so that you know I'm not wasting time on operations I should not. So we're going to do that.
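The interactive checks just described (log, HEAD attachment, history, repository size) could be mechanized along these lines. This is only an illustrative sketch with invented workspace names, not the project's actual test harness:

```shell
# Summarize the properties of a checked-out workspace so that two workspaces
# (e.g. built with and without the fix) can be diffed mechanically.
inspect_workspace() {
  git -C "$1" rev-parse HEAD                 # commit HEAD points at
  git -C "$1" rev-parse --abbrev-ref HEAD    # branch HEAD is attached to
  git -C "$1" rev-list --count HEAD          # history depth (truncated for shallow clones)
}

# Demo: two clones of the same source should be indistinguishable.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/src"
for i in 1 2 3; do
  git -C "$tmp/src" -c user.email=a@b -c user.name=demo commit -q --allow-empty -m "c$i"
done
git clone -q "$tmp/src" "$tmp/with_fix"
git clone -q "$tmp/src" "$tmp/without_fix"
a=$(inspect_workspace "$tmp/with_fix")
b=$(inspect_workspace "$tmp/without_fix")
[ "$a" = "$b" ] && echo "workspaces match"
du -sh "$tmp/with_fix/.git" "$tmp/without_fix/.git"   # compare sizes by eye
```

For a shallow clone with a given depth, the `rev-list --count HEAD` line is the one that exposes truncated history, which is why it belongs in the comparison.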
We've discussed the fetch results; I'm going to change the parameters of my profiling a little bit and see what kind of results we have.

Next, the heuristics we were discussing for estimating repository size. Before that, I was exploring a little how the repository size is affected by the types of git objects we have inside the repository; you can see that we have blobs, trees, commits, tags, and references. Right now, by using git ls-remote, we can list the references and the tags.

To be clear, what you can list is the tips of the branches: you can list the names of the remote branches and the SHA-1 of each remote branch.

Yes, and the tags, and I think the SHA-1 for each tag. I actually compared two repositories with this operation, git ls-remote. On the right side, git ls-remote is run against the git client plugin repository, and on the left side against CDAP, the repository I just showed, which has many more branches and commits. As I scroll down, it's clear; one second, I'm going to show you the difference. The left side is the CDAP one, and you can see the list goes on and on. The point is that we'll have many more references for a larger repository, so it is safe to say that, using git ls-remote, we could approximately assume whether this is a large repository or a smaller one.

But here is my concern. What I did was pull up some of the largest repositories I could find on GitHub; these might not be the largest ones, but they are famous ones. For this CDAP repository, it's a one GB repository with approximately 1000 branches. Now let's go to the next repository.
This is the Microsoft vscode repository. It also has almost one GB of file size, but far fewer branches, 50 percent fewer. Then let's go to Kubernetes: it's a one GB repository, but just 41 branches. I also tried Ansible; it's around 800 or 900 megabytes, but 44 branches. Ruby was way less, I think around 600 or 700 megabytes, but 22 branches. So what I could see here is that git ls-remote, just counting the branches, might not be the best way to estimate the size of a repository. Maybe we need a combination of two or three heuristics that together estimate the size of the repository.

I was also searching the internet for a good way to find the size of a repository without cloning it. It turns out, according to my search, that it's not that simple. If we have a clone, it's pretty easy; git provides the functionality, and there are ways we could know the size for sure. But without a clone, one sure way is what Justin suggested: GitHub and GitLab have exposed APIs. I've tested those, so we can get the size from a simple REST API request. But the concern with that is we would have users with Bitbucket, or individual git servers, or Gitea, a lot more services that provide git SCM. So that might not be the complete solution either. Actually, the more I research this, the more it seems we might not have one single way to completely figure out the size; we might need multiple actions for multiple scenarios, and then create a class around that.

One more thing I was searching for: could we do something like this after the first build?
Once we've built the repository for the first time using the git plugin, do we cache the workspace somewhere? I'm actually not very well aware of the git plugin's workspace management between the master and the agents; I think we have our workspaces on the agents. What I'm trying to say is that we will not be able to improve performance for the first build, but for the consecutive builds, once we have some information, we will be able to use the size or other information from that build, from that workspace, and apply the performance enhancement that way.

Also, one last thing; it's probably a silly suggestion, but can we ask the user for the size of the repository while they are configuring the git plugin? I think that would be the easiest thing to do, but I'm not sure. One thing I've experienced personally is that if I have a GitHub-hosted repository, I think there's no direct way to know the size of the repository; I don't see it on the UI of my repository. So I'm not sure if that's a responsibility we'd like to place on the user. I haven't seen it in the git plugin's current behavior; it's just something I was thinking about.
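The two kinds of signal discussed above can be sketched together: counting advertised refs with git ls-remote works against any git server, while GitHub's REST endpoint for a repository (`/repos/{owner}/{repo}`) reports a `size` field in kilobytes. The JSON sample below is fabricated so the sketch runs offline; a real call would fetch it with curl from api.github.com:

```shell
# Heuristic 1: count the branch heads and tags the remote advertises.
count_heads() { git ls-remote --heads "$1" | wc -l; }
count_tags()  { git ls-remote --tags  "$1" | wc -l; }

# Heuristic 2: extract the "size" field (kilobytes) from a GitHub
# /repos/{owner}/{repo} REST response read from stdin.
repo_size_kb() { sed -n 's/.*"size"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1; }

# Offline demo against a throwaway local repository.
set -e
tmp=$(mktemp -d)
git init -q "$tmp/r"
git -C "$tmp/r" -c user.email=a@b -c user.name=demo commit -q --allow-empty -m seed
git -C "$tmp/r" tag v1
heads=$(count_heads "$tmp/r")
tags=$(count_tags "$tmp/r")
size_kb=$(echo '{"name":"demo","size":987654}' | repo_size_kb)
echo "branch heads: $heads, tags: $tags, reported size: ${size_kb} KB"
```

As the vscode and Kubernetes examples above show, neither ref counts nor the provider's size field is decisive on its own, which is why combining several such signals is attractive.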
So, yeah. I think what you described is exactly the nature of heuristics: fallible rules. In fact, the word fallible is strongly emphasized in that phrase, a fallible rule. We're trying to find something that we know is imperfect, and we know it cannot be perfect, because the information we need is not always available to us locally. So I think you're on the right track; keep working on those topics.

And yes, I'm glad you found and confirmed that ls-remote is probably the weakest of the heuristics we could choose, because you could have a two-gigabyte repository with one branch, and the heuristic completely mispredicts the size of that repository. A single-branch repository is legal, it's perfectly reasonable, and it could be enormous.

Now, I like the idea of asking the provider for the number of commits, or for whatever the provider gives as size hints. The number of commits is probably one of the size hints; branches is another. On GitHub, this notion of releases, which is really the number of tags, is probably another size hint. Each of those things could be part of that. Now, how do you get people to contribute the use of those from the various providers, or do you put them inside the git plugin? That's, I think, part of the exploration there.

Okay. Maybe I could provide an additional behavior, like I was talking about before: if someone chooses to improve the performance of the git plugin, they can fill in all of those details. But then I think the biggest disadvantage is that the adoption of this whole feature would be very slow if you're providing an option like that.

Well, it also doesn't stop you, as part of this project, from contributing pull requests to the GitHub plugin, the Bitbucket plugin, or the GitHub Branch Source plugin, whichever layer it is that makes sense.
You are welcome to go into any plugin necessary to reach the goal. But it does add an extra layer of complexity as soon as you start modifying more and more plugins: if you're now modifying the Gitea, GitHub, Bitbucket, and GitLab plugins, those are four additional plugins that you've got to think about, investigate, and understand.

Okay. But talking about the APIs GitHub and GitLab expose: do the GitLab and GitHub Jenkins plugins already calculate or hold that information, or is that something I would have to explore?

I don't know; I would be surprised if they gathered that. But I know that the GitHub plugin does have an API that it uses, and that API is quite rich and capable. I don't know whether the size information is already in that API, but I know some people we could ask about its details, if that would help. Liam Newman is one; I'm sure that if we just mention him on the git plugin Gitter chat, he's happy to come answer questions.

Okay. Although I have already tried the GitHub API, which provides the size; but yes, we could probably ask how GitHub is doing it. We could ask Liam.

Oh, so you've actually found an API in GitHub? Yes, and GitLab. Excellent. If you've got that, then whether or not the plugin supports it yet, you know how to do it, and the github-api plugin, for instance, would just need to be extended to give you access to that API.

Okay. So I need much more exploration on these topics, and I think that's as far as I've gotten. Let's see the agenda; did I write something else?
Yes. So I think the first need is to understand how much the size, the number of commits, and the number of branches each contribute to the duration of the fetch call; that's the first thing I'm going to do. Secondly, I'm going to explore the heuristics more, how we could possibly do this.

Well, could you do the exploration of the sensitivity piece as part of the testing activity you were already doing? Could you get double duty: while you're doing these tests, interactively checking redundant fetch behaviors, watch the numbers to see what impact it had that you chose this repository rather than that repository?

Actually, yes, that would be better. I'm going to do that: while I'm testing my scenarios, I'm going to test with these different repositories.

Great. Did you have tags on the list also? Because those would also be refs that could potentially contribute to fetch performance.

I had commits and branches, but yes, Justin, I'm going to add tags to the list as well.

Okay. So, Mark, would you want to talk about the release plan? Would you want to discuss that?

Sure. I'm just borrowing this meeting because we've got Fran here; Justin, you're welcome to chime in as well. Fran and I are co-maintainers of the plugin. Fran, my proposal is to release git plugin 4.3.0 and git client plugin 3.3.0 today, with the contents of the current master branches. That won't give us the symbols capability that Carl Schultz has been working on, because I found a compatibility problem there; it surprised me, and I just don't want to risk it. There are other things in those releases that will help users. I'm running a bunch of tests right now to be sure I believe that code is in good shape. Are you okay with that, or do you have concerns?

Yes, yes, ship it.
I read your answer to that; it's totally fine for me.

Great. All right. So, Rishabh, for your project, what this will mean is that the distance from the 4.3.0 release to your changes will be much less than if we had you working on something based on 4.2.2. You've been working on the master branch, so it shouldn't change your experience dramatically, but I wanted to be sure that when we bring your changes into a release, they are the dominant portion of that release, rather than all the other noise that's in this 4.3.0 release.

Okay, Mark, that sounds great.

All right, that's all I had, then.

Okay, so I guess this is it, then. I'm going to work on the testing and the heuristics, and yes, I think that's it.

All right, thanks everybody. A recording of the meeting will be posted separately to the mailing list, and the URL will be put into the Gitter channel. Thank you. Thank you. See you Wednesday, Rishabh. Thanks very much.