So, the Git Plugin Performance Improvement project. Before introducing the project, I'd like to introduce the real rock stars of the project, my mentors Mark, Fran, Justin and Omkar. They've been really great with their support, the time they have given for the meetings and the patience they've had for my work. The experience for me has been really amazing. So now I'll move on to the project.
The background behind the project is that the Git plugin is one of the most widely used plugins within the Jenkins ecosystem, and we wanted to enhance its performance. For that, we first wanted to identify areas within the plugin where we could enhance performance. So we decided to focus on the Git SCM checkout phase of the pipeline, because enhancing the performance of the plugin there would in effect enhance the performance of a Jenkins pipeline in general. We've seen in various scenarios, with multiple benchmarks, that the checkout process can be a bottleneck with certain implementations of the Git plugin and in cases where the repository is large.
So the expected deliverables of the project were, first, to fix the existing performance issue we had, which was the occurrence of a redundant fetch in the checkout process. Then the major chunk of the project was to compare the two existing implementations of the Git plugin using a benchmarking framework called JMH, and then use the insights derived from those experiments to actually enhance the performance of the plugin. So the idea of finding an area to improve and then implementing that feature started with comparing the implementations. The Git plugin has two implementations: the first is command line Git and the second is a Java implementation of Git, which is called JGit.
Now what I had to do was to choose some operations and then benchmark those operations using these two implementations. If I found a noticeable difference in performance between the two implementations, I would then have to decide which implementation to use and implement a feature which would do that for us within the plugin.
The path we took to reach an actionable insight, which we would use to create that performance enhancement feature, was to study the relationship between the performance of git fetch and the structure of the repository. What we found out using various benchmarks was that the aggregated size of the objects affects the performance the most when we compare it with the other parameters of the repository, which are the number of branches, the number of tags and the number of commits. After size come tags, and then, a little bit, branches. With the number of commits, there's not much of a correlation with the performance of git fetch.
So after these experiments, we knew the parameters which affect the performance of the operations. Now we needed to encode these parameters to make decisions within the plugin for different scenarios. To do so, we created a new feature which is called the Git tool chooser. It is implemented within the checkout process, where you clone a remote repository to your local machine. The Git tool chooser is basically a feature which shifts the responsibility of choosing an implementation from the user to the system.
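
A minimal sketch of the kind of decision such a chooser makes, written in Python for brevity (the plugin itself is Java). The function name, the implementation labels and the 50 MB cut-off are assumptions made for illustration, not the plugin's actual code or threshold. The talk only establishes that repository size is the deciding parameter, that JGit suits small repositories and command line Git suits large ones, and that a recommendation is only useful if that implementation is enabled.

```python
from typing import Optional, Set

# Hypothetical cut-off in MB; the plugin's real threshold is not stated in the talk.
SIZE_THRESHOLD_MB = 50.0


def choose_implementation(
    size_mb: Optional[float],
    user_choice: str,
    available: Set[str],
) -> str:
    """Pick "jgit" or "git" for a repository of the given estimated size.

    Falls back to the user's configured choice when no size estimate exists
    or when the recommended implementation is not enabled on this instance.
    """
    if size_mb is None:
        # Nothing to base a recommendation on: respect the configuration.
        return user_choice
    recommended = "jgit" if size_mb < SIZE_THRESHOLD_MB else "git"
    if recommended not in available:
        # The recommended tool is not installed or enabled.
        return user_choice
    return recommended
```

For example, a user who configured JGit and checks out an 800 MB repository would be switched to command line Git, while a user on an instance with only command line Git enabled keeps it even for a small repository.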
The user who is using the Git plugin or creating a pipeline might not know which implementation is optimal for which use case. For example, the user might not know that the JGit implementation performs badly with large repositories; JGit should not be used within the Git plugin when we're cloning large repositories. So the user doesn't have to worry about that; we have shifted that responsibility into the system. The user just has to do the same thing they used to. This is going to be an invisible feature; we're taking the decisions to the back end. The second thing we've done is to remove the second fetch, which was redundant in most cases.
Now, with these features introduced, what we are seeing is a 50% reduction in the overall job execution time for a pipeline. This comes with a catch. I will explain the performance enhancements in two parts: the first with major performance improvements and the second with minor performance improvements. A user would see a major performance improvement, which is a noticeable reduction in their job execution time, when they've chosen JGit as the implementation for the Git plugin and they're trying to clone a large repository. What they'll see is a reduction of almost 50% in execution time.
If you focus on the first graph, I have compared four repositories: jenkins.io, Microsoft VS Code, Spark and Kubernetes. All of these repositories are greater than 500 MB in size. The blue bar represents the execution time before the performance enhancement features and the green bar represents it after the performance enhancements, and visually you can see that the execution time drops by almost half.
Now the second graph. We wanted to see whether the result we're getting, this 50% drop, is consistent across various platforms. So in this graph, I'm checking out a large repository, TensorFlow, which is around 800 MB in size, and I'm doing that on CentOS 8, Debian 10, FreeBSD 12 and Windows. What you can see here is that between before and after the application of the feature there is a 50% drop. Sometimes it's a little more than that, but it will noticeably be around a 50% difference in execution time.
Now one might ask: if I'm not choosing JGit, I don't even know what JGit is and I just use the default implementation, what kind of improvements would I see? Well, for them, the improvement would be a reduction of, let's say, a second. Why that happens is that for a small repository, let's say less than 50 MB, we've seen JGit perform better than CLI Git. The reason is that JGit is a native Java implementation and the JVM is hot by the time we are performing those operations. That advantage makes it perform better than CLI Git, but only for a small repository. For large repositories, JGit fails miserably; well, not miserably, but there's an exponential degradation of performance.
The graph I'm showing here is for a 20 MB repository. Across multiple platforms, I am checking out the Git plugin repository, and what you're seeing here is a drop of one second. There's a drop of two seconds on the Windows platform, but as we all know, there is considerable variability across consecutive builds.
So I would say, with the benchmarks I've seen, the theoretical decrease in execution time varies from 0.5 seconds to a second. The second improvement comes from the removal of the redundant fetch, which removes unnecessary load on the Git service, because previously the checkout step used to perform two git fetch operations and now it performs only one. So that's another improvement.
Now, the feature we've implemented, which provides the recommendation for the implementation, needs an estimate of the size of the repository to do that. First, if you have a multibranch project within your Jenkins instance, it uses the existing cache of the Git repository to estimate the size of the repository. If it doesn't find that cache, it wants the size of the repository as reported by the Git providers themselves; they expose REST APIs which allow us to get that information. Unfortunately, the Git plugin does not implement REST API calls, so it has no way to communicate with the providers, but various branch source plugins, like the GitHub Branch Source, GitLab, Gitea and so on, have the necessary APIs to communicate with the providers and get the size of the repository. So we've exposed an extension point which allows these plugins to implement an extension that gives us the information we need.
Currently we've implemented an extension for the GitHub Branch Source plugin. We're facing some issues with the GitLab Branch Source plugin, mainly because there's a difference between the way credentials are used in GitLab and the types of credentials the Git plugin supports. The Git plugin does not support GitLab personal access tokens, which is why it is not able to provide that information to the branch source plugin, and hence we're not able to make an authenticated REST API request. So that's an issue we have to think about right now.
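
The lookup order described here, local cache first and provider-backed extensions second, can be sketched roughly as follows. This is an illustrative Python outline, not the plugin's Java code: `SizeEstimator`, `directory_size_kib` and `estimate_size_kib` are hypothetical names, and each estimator stands in for an extension that a branch source plugin would contribute.

```python
import os
from typing import Callable, Iterable, Optional

# Hypothetical stand-in for the extension point: given a remote URL, an
# extension returns the repository size in KiB, or None if it cannot tell.
SizeEstimator = Callable[[str], Optional[int]]


def directory_size_kib(path: str) -> int:
    """Sum the sizes of all regular files under path, in KiB (rounded down)."""
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total_bytes += os.path.getsize(file_path)
    return total_bytes // 1024


def estimate_size_kib(
    remote_url: str,
    cache_dir: Optional[str],
    estimators: Iterable[SizeEstimator],
) -> Optional[int]:
    """Estimate repository size: local cache first, then provider extensions."""
    if cache_dir and os.path.isdir(cache_dir):
        # A multibranch project has already cached the repository locally.
        return directory_size_kib(cache_dir)
    for estimator in estimators:
        size = estimator(remote_url)
        if size is not None:
            return size
    # No cache and no extension could answer: no recommendation is possible.
    return None
```

When neither source is available the function returns `None`, which corresponds to the case in the talk where the user simply keeps the implementation they configured.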
With the other plugins, what we've seen is that the extension point, the model, the way we're sending the credentials information and implementing the extension is going to be the same as what we did in the GitHub Branch Source plugin. I've tested it with the Gitea plugin as well, but none of that is merged and released right now. So currently, if a user is trying the features we've implemented and they have a multibranch project with a cache, they would get the benefits we want to provide to them. I'm hopeful that the support I'm talking about will be available within, let's say, two or three weeks. There are some issues with some of the plugins not currently being maintained by anyone, so that might introduce some delay in the process.
Now, is this available? Yes. The latest Git plugin packages this feature and all users who are present can try it. I would also like to show the documentation. We have included certain safety switches in the global configuration page, where you can disable the performance enhancements and you can ask us not to remove the second fetch. All of this is done to make sure that if these enhancements break existing use cases for your Jenkins jobs, you will not be stuck with that change. You can disable them from the global configuration page. We've documented the decisions we've taken and how you can disable them, and please raise a ticket in Jira so that we know that we have an issue and need to fix it quickly.
So now the future work: what we've done and what can be done ahead. There is a scenario where, if a Jenkins instance contains multiple CLI Git installations, the Git tool chooser is not able to map the implementations correctly and might provide a different version of Git. That can be an issue, which we will fix shortly.
Then we of course need to write more test cases to cover new use cases. We've implemented the Git tool chooser within the checkout process of the Git plugin; we need to do the same within GitSCMSource, which is the branch-scanning portion of the plugin. After that, Mark also reported that he's sometimes seen unexpected delays in building jobs, which can be attributed to the process of creating a lock on the cache directories present within the controller. So we need to detect it and then fix it if that's possible.
Lastly, I'd like to thank the Jenkins community and my mentors. This has been a very positive experience for me. I'm glad that I chose Jenkins as my portal to the open source world. This has been an amazing experience for me. Thank you so much. I'm open for questions.
Thank you for the presentation. If anyone has any questions, please ask in the Q&A or in the chat and we will ask the presenters. Meanwhile, while we wait for questions, maybe the mentors would like to say a few words.
Thanks very much to Rishabh. This is Mark Waite. Thanks very much to Rishabh for his active and very involved work. Google Summer of Code has been a treat. This is my first time being a mentor, and what a fun experience to be a mentor and watch. A little dismaying at times, when he would ask questions and I didn't know the answer. It's really kind of scary when the mentor says, "Well, I don't know, we'll have to go figure that out," but he did a great job decoding things, understanding, working with multiple plugins and dealing with different committers. A great experience. Thank you.
Anyone else?
I'm with Mark. Rishabh did a great job. I just wanted to say Rishabh did a great job.
Like Mark said, it's a little challenging in open source because you have different things to deal with than if you're a student working on your own project or you're in an organization where people are paid to do the work, and Rishabh did a great job of collaborating with folks both within our team and outside.
So there was a question in the chat: are the performance improvements enabled in the Git plugin 4.4.0 release?
Yes, by default they will be enabled within the plugin. You can disable them if you have some issues with the enhancements.
Thank you. One more related question, please: but for this to work, you have to have both JGit and Git available?
Well, Martin, JGit is packaged within the Git plugin, so you just have to choose it if you want to use it. Git is something... Yes, Mark?
It's a good point that by default JGit is not enabled, Martin, so in order to get the benefit you will have to enable JGit, and there are instructions on how to enable JGit in the Git client plugin documentation. It's a simple process from the Jenkins user interface, but it is not enabled by default, so you will have to take some action to get the benefit.
Understood, thank you.