All right, we're on. Okay, Rishabh, you have the show.

Yes, so the first thing we should discuss today: I have access to your machine, Mark, and I wanted to ask about the general rules for how I should use the Jenkins instance you've provided me. I wanted to ask that first.

You are welcome to use that Jenkins instance in any, what I would call, reasonable way. I would appreciate it if you would not reverse engineer all of the credentials that are inside of it and take those away, all right? That would be bad behavior. I'm trusting you on this: it uses Jenkins credentials, and therefore there are techniques that would let you get them back out. Please don't steal my credentials. Other than that, you are welcome to use it. It is a destructible instance. I reconstruct them all the time, and I reserve the right to reconstruct them at any time. I destroy them and restart them very frequently, so don't be surprised if sometime you are doing something and all of the history gets destroyed. But you are welcome to use the resources that are there, and you are welcome to define new jobs. If you would like a job to persist over a potential restart, let me know and I will archive it into the definition, so that it is there the next time I restart it. There are also multiple copies of that instance. If you need access to a separate copy, if you'd like a fresh copy, I run fresh copies on other machines and I could give you access to those other machines as well.

I think this is okay for me right now. I don't need a fresh copy. I think mainly I would want to use it for running the benchmarks.

Note that there are quite a lot of machines there.

I would want to use it for that. And the second thing was the JMH report plugin. We were thinking of adding that to that Jenkins instance.
So my question was this: what I would have to do is access the Jenkins home directory of this instance directly and add the HPI or JPI file to the plugins directory of the instance to integrate this plugin into Jenkins. That's what I did for my local instance.

You're doing much, much too much hard work for that. What you do is open the plugin manager, find the Available tab, click the JMH plugin and install it. Yes, you described the hard way to do it. Do it the easy way and we'll take care of making it permanent later. Just install it so you can use it. I think it's actually already there, but if it's not, go ahead and install it. And you are welcome to install any other plugins you need. If you install a plugin and it disrupts the instance, okay, no problem. If you install a plugin and I strongly disagree, I'll let you know.

Okay, okay.

Yes, the Chuck Norris plugin, thanks very much Justin, that's great. No, I do not have the Chuck Norris plugin installed. Do I have the Bruce Schneier plugin installed? No.

Okay. The next thing on the agenda: I raised a PR a while back for two or three benchmarks. What I assumed when I raised that PR was that the benchmarks would run on ci.jenkins.io, because the branch name was GSOC-something JMH. But the benchmark stage is not being run on ci.jenkins.io, and I was not sure why that is happening. I actually have the PR open here. As you can see in the Jenkinsfile, we do have the condition where we accept the branches which are related to GSOC, but still it is not running for this PR. I've had multiple commits in this PR, the two or three benchmarks I added, and it was not running. I think I will not be able to use that. I was actually not able to figure out what the reason for that could be.
Is it because... I don't know, I actually don't have a clue why that would happen.

Let's be sure we confirm it here. Okay, good. So could you go back to the Jenkinsfile definition? I wonder if what we've got there is a file name style expression and what we need is a regular expression, because it may just be that we're not matching the branch name.

Yeah, isn't GSOC dash asterisk a file name style expression? In order to match anything after the dash, wouldn't it need to be dash dot asterisk?

Yeah, yeah, that would be one character. So what that regular expression says, and I should have seen this when we code reviewed it, right? Sorry for not even thinking of it. That says GSOC followed by a dash zero or more times. So if you named the branch GSOC dash dash dash or GSOC dash dash, that would probably have matched it. But I think we need an extra dot in there, just looking at it now. So do you want to submit the pull request to fix that, or if you'd like, I can commit it directly to the master branch to fix that. I'm not sure we need a PR, I think.

Okay, sorry about that.

That's cool.

Let me check. Mark, it's not yours.

Yeah, well, come on. Code reviewers have some responsibility and clearly I didn't fulfill mine here.

Okay, okay.

What's the difference between the double equals tilde (==~) and the single equals tilde (=~) there?

I don't know. I added that, I'm not sure why. I was actually looking at the Groovy documentation while I was doing it.

Yeah, so another failing of the code reviewer, Justin.

Oh, that's good. So this is the most special one that we do. CPS is like its own thing: generally Groovy, but with its caveats.

Yeah, it's clearly a DSL. It is a domain-specific language. It just happens to be based on Groovy, yeah.

Okay, I cannot remember why I did this, but I'll look for it and I'll probably add the information in the PR.

Great, thanks.

Okay, so this is not there.
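The difference between the two patterns discussed above can be checked in a few lines. This is a minimal sketch in Python rather than the Groovy used in the Jenkinsfile; Python's `re` module treats these two patterns the same way Java regular expressions do. The branch names are made up for illustration, and full-match semantics are an assumption, since the exact Jenkinsfile condition is not quoted in the discussion.

```python
import re

# Hypothetical branch names; only the "GSOC-" prefix comes from the discussion.
branches = ["GSOC-jmh-benchmarks", "GSOC-", "GSOC---", "master"]

# Read as a regular expression, this means "GSOC" followed by zero or more dashes.
glob_style = re.compile(r"GSOC-*")

# With the extra dot: "GSOC-" followed by any sequence of characters.
regex_style = re.compile(r"GSOC-.*")


def matching(pattern, names):
    """Return the names that the pattern matches in full."""
    return [name for name in names if pattern.fullmatch(name)]


print(matching(glob_style, branches))
print(matching(regex_style, branches))
```

Under these assumptions, the first pattern accepts only names that consist of GSOC and dashes, which would explain why a branch with a descriptive suffix never triggered the benchmark stage.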
So this is more of a... not something to discuss as such. I did say that I would create a class, a utility in the benchmarks module, which would validate those benchmarks functionally. So I've created a class which right now offers two kinds of validations. The first is to check the number of references: if we're running an operation which is related to references, we basically check whether the number of references is correct for a particular repository. The second one is the size of the repository, that is, the size of the .git directory. What I noticed was that I was using the file utility API to calculate the size of the repository directory, and that operation was included in the benchmark itself. So then I thought of benchmarking the file utility API that estimates the size of a directory, to check what kind of overhead it is adding to the whole benchmark. And I think it's not a surprise that it's just adding a millisecond. The benchmark results said that the operation takes about a millisecond; I checked it for a .git directory of about 300 MB. So results-wise it wouldn't make a difference for something like a git fetch, but for a small operation it might. It's an observation for myself that I should not include anything extra in the benchmark: even if I have to validate a result, I would have to find a way to validate it that does not run inside the benchmark itself.

And that seems fair. With a micro benchmark, the goal is to have it be as small a unit as you can possibly test, right? So I think you're right. The concept seems to assume that the benchmark as written is doing valid things, and we have to check that by inspection or by observation. That seems fair to me.
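The idea of keeping validation out of the measured region can be sketched as follows. This is an illustrative Python harness, not the JMH benchmark from the project: `write_payload` is a hypothetical stand-in for the operation under test, and `directory_size` is a plain-Python substitute for the file utility call mentioned above. In JMH, a similar separation might be achieved by validating in a `@TearDown` method instead of inside the `@Benchmark` method.

```python
import os
import tempfile
import time


def directory_size(path):
    """Sum the sizes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def timed(operation):
    """Run operation once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = operation()
    return result, time.perf_counter() - start


def write_payload(directory, size_bytes):
    """Hypothetical operation under test: writes one file of the given size."""
    with open(os.path.join(directory, "payload.bin"), "wb") as handle:
        handle.write(b"\0" * size_bytes)


def run_once(size_bytes=4096):
    """Time the operation, then validate its effect after the clock has stopped."""
    with tempfile.TemporaryDirectory() as workdir:
        # Only the operation itself is inside the timed region.
        _, elapsed = timed(lambda: write_payload(workdir, size_bytes))
        # The size check runs afterwards, so it adds nothing to the measurement.
        observed = directory_size(workdir)
    return elapsed, observed
```

Calling `run_once()` returns the elapsed time of the operation alone together with the observed directory size, so the size check can fail a run without ever being part of the reported number.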
Yeah, and so the last topic today is AbstractGitSCMSource. I've been exploring the SCM API, actually for our size estimator class, and I think I found something interesting. I was looking at how multibranch projects work, how it is caching the branches. First I want to confirm something, just a second. I actually wanted to confirm the locking, the cache lock. This is a big class and it's very difficult to find a particular thing. Discovered branches... Yeah, so I'll actually talk about the locking of the cache after this, because this is what I can find right now and I'll discuss this first. What I can see is that we use a functionality provided by JGit, RevWalk, which is basically walking the commit tree, because we want the branches and we want other references. A multibranch project, according to my understanding from the website, works like this: if I have multiple branches, it would scan those branches, and if each of them has a Jenkinsfile, it would run the Jenkinsfile. That is the very basic definition of a multibranch project. And we have other references, like PRs, that you want to build. We would have pull requests, and it would scan them and then run the build on those pull requests as well. What I want to ask after seeing this class is this: CLI Git can also provide these functionalities individually. RevWalk actually does a lot of things. One of the things I could see is that it's parsing the commits while walking. And then I saw that we have functionality in CLI Git, a rev walk command, which would do the same thing. So the question I have is: why are we using RevWalk? Is it because JGit is providing all these functionalities, but we don't have a class to do it ourselves?
We would have to add similar functionality in a particular place, and JGit is already providing us that. Is this the reason we're using RevWalk, or is there another reason I'm not aware of?

Yeah, I think JGit's interfaces are designed for, and very well thought out for, the way that Java programmers want to interact with things. And so I think that when this was implemented, it was detected that, hey, the JGit interfaces are just cleaner and easier to use than the Git client plugin interfaces, so let's use JGit. And I think it was a good choice. It was a very wise and sensible choice. It also avoids the overhead of launching a separate sub-process to interact with the Git repository in small ways, right? This thing can crack it open and let JGit do a bunch of caching for it in the Java process. Does that address your question?

It does, yeah. The main thing I was thinking about was this: if we're using something like this, do we want to benchmark the operations we're performing here when we have a considerable number of references? I think we have seen how git ls-remote behaves with the number of references. We've seen the relation between the number of references and the size of the repository. Of course, it's linear in the sense that as the number of references increases, the time increases for those references. I haven't looked at RevWalk in depth, so this is just a high-level thought, but we could probably benchmark the functionality we're using from RevWalk and compare it with how CLI Git works. Given what you've just said, I think it may not be needed, but is it something we should explore?
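To make concrete what "walking" means in this discussion, here is a toy sketch of a commit-graph walk in Python. It is not JGit's `RevWalk` implementation and not how CLI Git does it (there, `git rev-list` is the command that lists commits reachable from given starting points); the commit ids, the graph and the ref names are all hypothetical. It only shows the core idea: start at the ref tips and visit every reachable commit once.

```python
from collections import deque

# Hypothetical commit graph: commit id -> list of parent ids.
PARENTS = {
    "a": [],
    "b": ["a"],
    "c": ["b"],
    "d": ["b"],
    "e": ["c", "d"],  # merge commit with two parents
}

# Hypothetical refs: ref name -> commit id at its tip.
REFS = {"refs/heads/master": "e", "refs/heads/feature": "d"}


def walk(start_points, parents):
    """Yield every commit reachable from start_points exactly once."""
    seen = set()
    queue = deque(start_points)
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue  # already visited through another ref or merge parent
        seen.add(commit)
        yield commit
        queue.extend(parents[commit])


def reachable_count(refs, parents):
    """Count the distinct commits reachable from all ref tips."""
    return sum(1 for _ in walk(refs.values(), parents))
```

A benchmark comparing the two implementations would time the real equivalents of `walk` on the same repository while varying the number of refs; the toy version is only meant to show why the cost can grow with both history length and ref count.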
I ask because this is something JGit is providing, and with whatever experience I've had, I just thought: okay, if this is provided by JGit and the same thing can be provided by Git, is it possible that CLI Git would perform better in areas which we haven't explored yet?

Good question. And I think it's unlikely that in this use case we would find some place where we say, oh, this is significantly enough faster to justify reworking the implementation to use CLI Git. But it's a valid question. I can't say confidently. I mean, we know that for small repositories fetch is faster with JGit, and there comes a threshold where fetch becomes slower with JGit. I would be shocked if the operations in JGit for these RevWalk kinds of things were dramatically slower, in particular because JGit is pretty commonly used to implement things like the Gerrit code review system's back end, and therefore it does an awful lot of RevWalk. So I would have assumed that the JGit authors had done quite a bit to make sure that that thing works well.

Oh, okay. So maybe what I can do, if designing a benchmark for this would not take a lot of time, is benchmark a certain functionality under certain conditions.

Oh yeah, particularly if you can see a way to do it without a lot of effort, I think that would be a real positive. It's another place to say, hey, I'm going to try a benchmark of something that's at a higher level than just fetch. And this particular area is a place where we've got potential for cache contention. So you might even be at a higher level where you say, I'm going to try... yeah. So yes, interesting if you can do a benchmark without a lot of cost.

So when you're talking about the cache contention here, Mark, what you're saying is that for a multibranch project, when we create it, the first thing we do is try to find the cache. If we find it, then we create a client and do all of this work.
And if we don't find it, we fetch the repository and then create a client and do that. So, assuming that we have a cache, does the contention happen while we're locking it? Is that the area where the issue comes up, or is it with multiple branches?

You know, I think at least the cache contention claims that I've heard were in the operating case where the clone already exists and there's been a change on the remote repository. Now each of the branches of that remote repository is being evaluated: are there changes there? And they tend to build a stack of workers who all want to access the cache copy. They each want to get their own lock on the cache copy. They lock it, update it, unlock it; lock it, update it, unlock it. At least that's what I thought had been in the report: hey, if I have 150 branches, and there's a change on one of those branches that gets detected by the GitHub branch source plugin, it will invoke a check of all of them. It may process changes on many branches, and they will contend with each other for access to the cache.

Okay, okay, I understand. I think I need to read more about locking the cache and cache resources, because I haven't read too much about it. Actually, there was a GSoC idea which was related to implementing a system which would distribute the cache as-

Right, and this is not that. This is decidedly-

It is not that-

And in terms of your project, I'm not personally concerned about this layer. If you want to shift your focus back to the lower layers that you initially described, that's fine too. There isn't a requirement that we have to improve this piece. We're looking for things to improve, and if you find that this could use improvement, great. But if not-

Okay, okay. That's okay, I'll just read about it and I'll possibly try to- I'm not sure how to reproduce it. Is it something reproducible? Have you- Or is it a random event?
So it's not like it's going to happen every time? If I have 150 branches, is it going to happen for sure, or is it something that happens randomly? Or are you not sure why it is happening?

I'm pretty sure that I can see it, and I'm pretty sure that I can see it on that computer that you have access to. If I run... let's see. Well, actually, is everybody okay if I switch and share my screen and show you the job where you'll see it, if that will help you?

Sure, Mark.

Yeah, sure, Mark.

So here, let's just put this up, and what I'm going to do is bring up that job. Can you see my screen?

Yeah.

Oh, right, it's the wrong one. It would help if I used the correct one, this one. Okay, so I have a thing called bugs pipeline checks. Let's make this so that it's actually readable at this screen size. Is that easier? Bugs pipeline checks. In bugs pipeline checks, I have a repository named Jenkins-Bugs, and it's represented by different protocols that access the same repository. With any one of those, if we look at the scan repository log, it scans 150 or so branches looking for changes, and I have scripts that cause changes in many of these repositories in roughly linear time. A script causes a change, then sleeps, then causes another change. And what I will see when I look at one of these jobs running is that it shows a long pause in the console output where it says this: branch indexing.

Okay.

I think it's this branch indexing plus this, and I have a suspicion that there's a long pause between this and this. But again, I'm not sure that this is in scope for the project you're working on, Rishabh. I show it to you because your goal, I think, is to find areas to improve plugin performance. I'm not sure that this should be your first priority yet.

Okay. I get that. This is not going to be my first priority. I was just interested in what the issue was.
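The lock, update, unlock pattern described above can be illustrated with a small simulation. This is a hedged Python sketch, not the plugin's actual locking code: a single `threading.Lock` stands in for the lock on the cached clone, `time.sleep` stands in for the update, and the function and parameter names are invented for the example.

```python
import threading
import time


def simulate(branch_count, hold_seconds):
    """Run one worker per branch; each takes the shared lock, 'updates', and releases it."""
    cache_lock = threading.Lock()
    waits = []

    def check_branch():
        requested = time.perf_counter()
        with cache_lock:
            # Time this worker spent queued behind the others.
            waited = time.perf_counter() - requested
            time.sleep(hold_seconds)  # stand-in for updating the cached clone
            waits.append(waited)  # appended while still holding the lock

    workers = [threading.Thread(target=check_branch) for _ in range(branch_count)]
    started = time.perf_counter()
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return time.perf_counter() - started, waits
```

Because every worker needs the same lock, the total time grows roughly with `branch_count * hold_seconds` even though the workers run as separate threads, and the last worker in the queue waits for almost all of the others. With 150 branches, that is one plausible shape for the long pause seen during branch indexing, though the sketch does not establish that this is the actual cause.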
I actually did not quite understand it before, and now I understand it after you showed it. Okay. So what I'm going to do next: I think I've read enough, so I'm going to start creating a prototype for the size estimator task. That's the first task, a coding task. The second is that I'll create benchmarks to compare the commit walking functionality between CLI Git and JGit. These two things. And I don't think we have any other thing to...

Does CLI Git provide that walking functionality, or does only JGit have that? I just checked that JGit has that RevWalk, and I think that is a pretty awesome thing.

I was checking for the Git CLI. There are multiple commands which would provide the same functionality. I haven't checked each one of them, but I am sure that we have commands for it. There's a command which would print the commit graph. There's a command which would walk the commit history. I mean, it's a Git rev walk. So I'll look more into the code to see what operations I would need to benchmark, what would be useful to benchmark, and then add this to the PR. And the first thing I'm going to do is raise the PR to correct the Jenkinsfile regular expression. I think that's it from my side for today. Anything you guys want to discuss? Because we have five minutes left. Otherwise we could just end it.

You're okay running the... You'll run the session next Wednesday and next Friday from your Zoom account?

Yes, yes, Mark. I'm comfortable with that.

Great. So, thank you for letting me take a week off.

That's okay, Mark. It's very well done, I think. So what I'm going to do is mail the link to each of the mentors. I think I'll create an event in Google Calendar. I'll do that, and I'll manage it on the first day.

Right. There's nothing that we could...

So Rishabh, hello.

Yeah, yes, I'm Rishabh.
So do you need my help creating those empty repositories for the benchmarking, or will you be using those?

We actually... Yeah, this is one potential discussion we actually missed. I think we should explain the topic first. For the benchmarking strategy, we were more focused on the size of the repository. Now we want to move on to parameters like the number of branches and the commit history, and what Omkar is proposing is to create some repositories with a constant size but a different number of branches. When we run the benchmarks I have against these repositories, what we could find out is how much performance overhead the number of branches creates, how much it contributes to the performance of git fetch. I think that's an interesting thing we could do. And Omkar, as far as I remember, you have created a repository with 5,000 branches.

Yep.

So I think what we could do is have a repository with maybe 10 branches, then go to 50, then 500 and then 5,000.

Yeah. So I was just waiting for a poll from Mark for that particular thing. Over this weekend, I'll be creating those sample repositories for you. Definitely.

Okay. Thank you so much for doing that, Omkar.

No problem.

So once you have those, I'll edit my benchmarks and we can run the experiment then. Okay with everyone? Okay. So that's it for today. Thank you guys. Bye-bye.

It's going to go great, Mark. I'm sure it's gonna be awesome.

Yeah, well, and if not, it will be... COVID-19 makes things interesting. Even if they're not awesome, they are certainly interesting. So there you go. See you everybody. I'll post the recording link. Bye-bye.
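One way the proposed series of test repositories could be prepared is sketched below in Python. The branch counts come from the discussion; everything else (the `bench/branch-` naming scheme, the helper names, and the choice to point every branch at the same commit so that the object store, and therefore the repository size, stays roughly constant) is an assumption made for illustration. It relies only on the standard `git branch <name> <start-point>` command.

```python
import subprocess

# Branch counts proposed in the meeting.
BRANCH_COUNTS = [10, 50, 500, 5000]


def branch_names(count, prefix="bench/branch-"):
    """Return hypothetical, zero-padded branch names for a repository with `count` branches."""
    return [f"{prefix}{index:04d}" for index in range(1, count + 1)]


def branch_commands(count, start_point="HEAD"):
    """Build one `git branch` command per name, all pointing at the same commit."""
    return [["git", "branch", name, start_point] for name in branch_names(count)]


def create_branches(repo_dir, count):
    """Create the branches in an existing repository that has at least one commit."""
    for command in branch_commands(count):
        subprocess.run(command, cwd=repo_dir, check=True)
```

Calling `create_branches(path, n)` for each `n` in `BRANCH_COUNTS` on separate copies of the same base repository would give repositories that differ mainly in their number of refs, which is the variable the fetch benchmarks are meant to isolate.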