Sounds great. Welcome everyone. This is the Jenkins Platform special interest group. It's June 18, 2020. Thanks for joining us. Let's look at the agenda and be sure that we agree on the topics, and then we'll work through the agenda. So what we've got is: we'll talk about open action items. Then we are delighted to have Rishabh with us to talk about his progress on the Git plugin performance improvement Google Summer of Code project. Interesting results, good results, and fascinating progress. So we'll give him as much time as he needs, ask questions, record notes, etc. Then Alex and I are going to talk briefly about the AdoptOpenJDK for Docker image change, work through what it means, discuss it briefly, and we'll conclude there. Are there any other topics we should add to the agenda? No, I think that sounds good. Okay, great.

All right, so let's do the review of action items. I have the action to switch the meeting URL to use the CDF Zoom account. I've done that. Rishabh, you will notice that you can share the screen without me having to do anything; that is one of the results of switching to the CDF Zoom account. I still have the open action item to open a Jenkins Enhancement Proposal for Docker operating system support. I apologize; yes, I will get that done. We are in progress right now on the Docker build rework PR, or a derivative of it. So that's encouraging. Oh, no, I take it back; the one I'm in progress on is the next one. So we still need to make some further progress on the Docker build rework PR, because that one gives us a better structure. Alex, this is the Docker manifest and parallel build changes that I haven't done yet. Anything you'd like to report there? I've started to take a look at it. I'm running some local builds and things like that. So far it's looking good. The way I apparently have my Docker authentication set up, I'm getting a 401 at some point when it's trying to publish, because my login is timing out. So I need to look into that. Okay. Now, since I don't think I have authorization to do anything to the Docker Hub account, and I hope I don't, I should be able to do the evaluation as well and run it in some sort of local mode, a dry-run mode. Yes, I think the dry run is still there. Okay, great. All right, so this still needs review; there's lots to do in the Docker image work. And the next was the Alpine image update PR, and that one is PR 956 here. I had started review on it last week, did some additional review yesterday, and further review this morning. So, all right. Anything else, Alex, you wanted to report on action items? Any other topics there? Okay.

Rishabh, I think we're ready for you on the Git plugin project. Okay. Thanks, Mark. Let's see, I need to stop sharing, don't I? Here we go. Yes. All right. All yours. Okay: Git plugin performance improvement. As Mark mentioned, I'm the GSoC 2020 student; Rishabh is my name. So, the objective of the project is a simple one: to improve the performance of the Git plugin, and we've identified two ways to do it. The first is to use JMH, the Java Microbenchmark Harness, a framework which provides us a safe environment to run benchmarks. We want to use benchmarks to identify known or unknown performance issues with the existing Git implementations we have, that is, CLI Git and the pure Java implementation, JGit. So, this is the first objective.
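[To make the JMH approach concrete, here is a minimal sketch of what such a Git fetch benchmark could look like. This is illustrative only, not the plugin's actual benchmark code; the pre-cloned repository path and the refspec are assumptions.]

```java
// Minimal JMH sketch: timing a CLI "git fetch" with a narrow refspec.
// Illustrative only; the local repository at /tmp/test-repo is an
// assumption, not the git client plugin's real benchmark setup.
import java.io.File;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.SECONDS)
@Fork(1)
@Warmup(iterations = 2)
@Measurement(iterations = 5)
public class GitFetchBenchmark {

    private ProcessBuilder fetch;

    @Setup(Level.Trial)
    public void setup() {
        // Narrow refspec: fetch only the master branch.
        fetch = new ProcessBuilder("git", "fetch", "origin",
                "+refs/heads/master:refs/remotes/origin/master");
        fetch.directory(new File("/tmp/test-repo")); // assumed pre-cloned repo
    }

    @Benchmark
    public int fetchNarrowRefspec() throws Exception {
        // JMH measures the average wall-clock time of this call.
        return fetch.start().waitFor();
    }
}
```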
We have already created Git fetch benchmarks, we have some great insights from those benchmarks, and we are going to implement those inside the Git plugin. The second is to fix the existing performance issues we have in the Git plugin, and I'm going to discuss them further in the presentation.

So, what have we done and what are we doing? The first thing we have done is to integrate the JMH module inside the Git client plugin. Basically, what I've done is add the benchmarks to the test module of the plugin, and now the benchmarks can run on ci.jenkins.io, which gives us a wider selection of platforms where we can run the benchmarks and get a comprehensive result profile. How does it look? It's basically like this: a sample pipeline. We check out the repository, we build it, we run the benchmarks, and then we have a JSON report which we can feed into a visualizer, where we can see visually how things are going and how the operations are performing. So, this is the first thing we've done.

After this, we've tried to fix the redundant fetch issue. So, what is the issue, and how is it affecting the Git plugin's performance? (Is affecting, actually, not was.) The issue is very simple; the reason behind it is not so simple. This is a freestyle project, and I'm trying to check out a repository. As you can see here, I am fetching the upstream changes from the repository once, which is expected, which is normal. But then I'm doing the same thing again, incrementally. So this second fetch is redundant. We've actually written test cases, and we've worked through some scenarios, to understand whether we're losing or breaking any use cases if we avoid this second fetch. And though we have some interactive testing left right now, we have come to the conclusion that we can remove the second fetch. But we were not sure whether the removal of this second fetch was going to improve the performance of the Git plugin; we did not have concrete numbers to support that hypothesis, the hypothesis being that removing the second fetch would actually improve the Git plugin's performance.

So, for that, I initially created a benchmark, four benchmarks actually, with Git fetch. The benchmarks were simple. The first benchmark is a single Git fetch with a narrow refspec, which basically means it is fetching one single branch, the master branch. The second test was also a baseline single Git fetch test, but with a wider refspec, meaning all the branches. Then the third and fourth tests are double fetches with the same thing: a narrow refspec and a wider refspec. So, before even fixing the issue, what we could see from the benchmarks was, as you can see in this interactive chart: as I progress through test one, test two, test three, and test four, for each repository, the execution time increases as I move towards tests three and four. Yes, Mark? Okay, so what we're seeing here is, as you're iterating through, the horizontal dot that moves across, that's telling me repository size, is that what it is? So test one is smaller? No, no. Tell me again what test one, two, three, and four are. Okay. Test one, test two, test three, and test four are the benchmarks. Test one and test two are benchmarks which measure the execution time of a single Git fetch.
Test three and test four are benchmarks which measure the execution time of double Git fetches. What you're seeing here: the x-axis is the repository size, and the y-axis is the average execution time, which is increasing. I have four repositories. The size of repo one is less than 1 MB; the second is 5 MB; the third is 90 MB; and the fourth is 300 MB. The moving dots are just animating through the benchmarks I've run and their results. So, what you can see here is that once I go from test one to tests three and four, with repositories three and four you can see a remarkable increase in the execution time, while with repos one and two there is not much of an increase. That gives us a hint that with a small repository, the incremental fetch does not add much performance overhead, but with a larger repository there is a good chance that the second fetch adds considerable performance overhead. So, this was a kind of theoretical experiment before fixing the fetch, which gave us a reason to actually remove the redundant fetch.

Now, how did we fix it? It was a simple fix: we added a boolean so that once we've done the first fetch, we will not do the second fetch. And once we did that, are we seeing any change in performance? To see that, I used profiling with the Java Flight Recorder on JDK 11. I ran the Jenkins WAR on JDK 11, with the updated Git plugin containing my fix for the redundant fetch issue, under Java Flight Recorder (JFR), a profiling tool that comes with JDK 11 and has very low performance overhead. I took two repositories: one is the jenkins.io repository, about 40 MB in size, and the other is the Samba Git repository, which is nearly 300 MB. With these two repositories, I performed a simple checkout build. And what you can see here is that for the smaller repository, there is not much of a difference; it's basically a ten-second difference. Without the fix it's two minutes 55 seconds, and with the fix it's two minutes 46 seconds. So there's roughly a ten-second decrease in the execution time of the Git fetch calls for the plugin overall. With the second, much larger repository, 300 MB in size, there is approximately a two-minute difference once we remove the second fetch, which is, I think, a considerable improvement in performance. And if we increase the size of the repository, I'm sure we will see much larger differences. I've actually done this twice to confirm whether my profiling results are correct. Maybe I need to do this with a wider variety of repository sizes to confirm that removing the redundant fetch is actually improving performance, but I think this is, yeah.

Yes, Mark? That's gorgeous. So, was this with command line Git as the implementation, or with JGit? Command line Git. Excellent. Okay. So then your Java Flight Recorder could not have had any impact on the actual operations performed by CLI Git, because it's a separate sub-process. That gives me even more confidence in your benchmark numbers and your measurements, because any pauses introduced by the Java Flight Recorder profiling affect only the Java process; it can't touch the Git command-line process, which is a C program running entirely separately. Very nice. Excellent. Okay. At that large repository size, that's really cool.
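[The fix Rishabh describes is essentially a guard flag around the second fetch. A minimal sketch of the idea, with hypothetical names; the actual change in the Git plugin is more involved.]

```java
// Sketch of the "fetch once" guard described above. Names are
// hypothetical; this is not the Git plugin's actual code.
class FetchGuard {
    private boolean alreadyFetched = false;

    /** Runs the fetch only if it has not already been performed. */
    void fetchOnce(Runnable doFetch) {
        if (alreadyFetched) {
            return; // skip the redundant second fetch
        }
        doFetch.run();
        alreadyFetched = true;
    }
}
```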
So, for this issue, this is what we've done. The next step forward for us is to implement the performance improvements inside the plugin, and to do that we've figured out two steps right now. The first is to provide a compatibility switch to the users. We're assuming that once we add the improvements we derive from the benchmarks into the plugin, there might be cases where users' functionality or performance is affected in ways we did not anticipate. So we are providing a switch, and roughly it's going to look like this. It's in Configure System in Jenkins: inside the Git plugin section, you will have a checkbox which says "Revert performance improvement changes". Once checked, it will revert to the old behavior of the Git plugin. So this is the first step.

The second step is to actually selectively switch between the implementations, that is, CLI Git and JGit. From the benchmarks, we've found that for Git fetch, the size of the repository is the biggest parameter affecting performance. For example, one insight we have is that for a repository of less than 5 MB, JGit is going to perform better than CLI Git, while for a larger repository, CLI Git performs considerably better than JGit. So, if we want to implement that inside the Git plugin, we need to estimate the repository size before creating the client. We've had some discussions on using heuristics, like using git ls-remote to count the number of branches without cloning the repository; we could use a rough estimate, so that maybe 50 branches means a large repository, and a smaller number means a smaller repository. Another option could be to use the existing REST APIs exposed by GitHub or GitLab, which provide the size of the repository. So we could also do that. We're currently thinking about how to do it; it's in process, and any suggestions on how to estimate the repository size without cloning the repository first would be greatly appreciated. So, yeah, this is what we have done and what we plan to do. Any questions?

Okay, so the first cycle, then, is the redundant fetch. You've already got a target where you say, we know there's a significant performance improvement to be gained here. And this redundancy affects both command line Git and JGit; it's redundant in both cases, not just one or the other. They're both doing this redundant fetch. Yes. Okay, so everybody gets benefit: no matter which implementation they chose, or no matter which implementation we were to choose, they get benefit, potentially large, potentially small, from this change. Good. Okay. And then, oh, go ahead. No, no. Then on the benchmark-driven switching, the challenge there is that before having performed the full operation, you have to decide which implementation should actually execute the operation. So, before doing a fetch, you need to make a decision: shall we use CLI Git or shall we use JGit? Yes. Excellent. That's something you would have to figure out. Yes. Thank you very much.
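[The ls-remote heuristic mentioned above could be sketched in a few lines. This is a hypothetical illustration, under the assumption that branch count roughly tracks repository size; the 50-branch threshold is the rough cut-off from the discussion, not a settled value.]

```java
// Hypothetical sketch of the ls-remote heuristic discussed above:
// count remote branches without cloning, then pick an implementation.
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class RepoSizeHeuristic {

    /** Counts refs under refs/heads/ reported by `git ls-remote --heads`. */
    static long countBranches(String remoteUrl) throws Exception {
        Process p = new ProcessBuilder("git", "ls-remote", "--heads", remoteUrl).start();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            long count = r.lines().count(); // one line per branch
            p.waitFor();
            return count;
        }
    }

    public static void main(String[] args) throws Exception {
        // 50 branches is the rough "large repository" cut-off mentioned above.
        long branches = countBranches(args[0]);
        String impl = branches >= 50 ? "CLI git" : "JGit";
        System.out.println(branches + " branches -> prefer " + impl);
    }
}
```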
I think you've covered it well. Alex, did you have any questions you wanted to ask? Or, let's see, has Oleg joined us? So, with the JSON file that's created, is there any way of visualizing that in Jenkins itself, as a way to kind of get a snapshot of these? Yes, Alex, there is an already-existing JMH visualizer plugin. I haven't tested it or integrated it with the Jenkins instance yet, but I intend to test it. What it does is consume the JSON file. Give me just one second and I can show you how it looks. It's called JMH Report. I can't show you here how the plugin's visualization is made, I don't have it here, but there's a website that does the same thing; the plugin basically uses this website to do so. And I have a sample JMH JSON, just bear with me for a second. So, this is how the visualizer renders the JSON. Once we add that plugin to our process of building the benchmarks, we can see the results like this. That'd be awesome. We'll open an INFRA issue to get that plugin installed on ci.jenkins.io.

And another point here is that one of the benefits of ci.jenkins.io for Rishabh is that whereas he's got his macOS machine that he runs base checks on, ci.jenkins.io provides him access to ARM64, to AMD64, to s390x, and to PowerPC. So he can check: are the benchmarks that we're using representative even across the different environments that ci.jenkins.io represents? Oh, and Windows. Yes, I forget; there's one other platform there, and we regularly have surprises with Windows. So yes.

So then the second question I have is on the integration of the JMH pieces. Is that something that can be easily added to maybe the parent POM and optionally enabled for all plugins? Or is there a plan to do anything like that? So it's... oh, go ahead, Rishabh; I should not answer. This is yours. It's okay. So yes: the Jenkins test harness already has the JMH harness included as a dependency. So there's a version of the Jenkins test harness you need to have in your POM for that particular dependency; I don't remember the version, I think it's 2.5 for the Jenkins test harness. Once you have that, you don't need anything else; you can use JMH benchmarks inside your plugin, in any plugin. That was done before, in a previous GSoC project, the Role Strategy plugin. Okay, and does it require special tests? Or, like, do you have to mark tests as part of the JMH data collection? I can actually show you how a benchmark is made. So, yes, it's a little similar to a JUnit test: you annotate it with a benchmark annotation, and you need a benchmark runner as well, which will identify the benchmarks and run them with the options you need. The options can be how you want the results, average time or throughput; how much you want to warm up the benchmarks before you run them; the time unit; the JVM forks you want. All of those options are included in the runner class. So how do you run it? There's a Maven profile for the JMH benchmarks; your benchmarks will run from that command, and you'll have a generated JSON report from that command. And if you want to integrate the benchmarks, if you want to run them on ci.jenkins.io, there's already a build step called runBenchmarks. You just have to add that in your Jenkinsfile, and if you have benchmarks, they will be run on the ci.jenkins.io infrastructure. Cool, thank you.
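[A rough sketch of what such a benchmark runner class could look like, reusing the hypothetical GitFetchBenchmark from the earlier sketch. The option values and output file name are illustrative, not the plugin's actual configuration.]

```java
// Illustrative JMH runner: picks up the benchmarks, configures warmup,
// forks, and time units, and writes a JSON report for the visualizer.
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.results.format.ResultFormatType;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class BenchmarkRunner {
    public static void main(String[] args) throws Exception {
        Options options = new OptionsBuilder()
                .include(GitFetchBenchmark.class.getSimpleName()) // which benchmarks to run
                .mode(Mode.AverageTime)            // report average execution time
                .timeUnit(TimeUnit.SECONDS)
                .warmupIterations(2)               // warm up before measuring
                .measurementIterations(5)
                .forks(2)                          // separate JVM forks for isolation
                .result("jmh-report.json")         // JSON consumed by the visualizer
                .resultFormat(ResultFormatType.JSON)
                .build();
        new Runner(options).run();
    }
}
```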
So Rishabh, would you show the Jenkinsfile? I'm enamored with how elegant the work from the Role Strategy plugin made this, but then we found some interesting issues that made me beg Rishabh to do some extensions. So in the git client plugin, he's going to show us what the Jenkinsfile looks like. Yeah, so this is the step which enables running the benchmarks, and we provide the name of the JSON we want as an output. What was happening before was that the benchmarks ran for every pull request, for every branch Mark was creating for testing. The benchmarks have a considerable duration; they run for maybe an hour, or four hours, depending on the kind of benchmarks we have. It takes time. So it's an unnecessary addition for people who do not want to test the performance. What we've done is simple pattern matching, so that we run the benchmarks only for the master branch, or for any branch which is related to GSoC. It's pretty simple. See, for me, Alex, the treat here was that somebody else had taken the time, last year's Role Strategy project had taken the time, to create this pipeline shared library step called runBenchmarks. And so Rishabh was able, without doing anything more than using runBenchmarks, to get it to execute on the ci.jenkins.io infrastructure. And then, for my needs, he also gets the ability to put conditionals in there, so we don't have to run it on any branches except specific targets. Very cool. So that's it from my side.

Excellent work, Rishabh. Thanks very much. Thank you, thank you, and good luck with your ongoing efforts. I'm excited to be able to announce the release of the Git plugin that includes this. This will be so cool. Thank you, thank you. Thank you so much, Rishabh. Okay, so I should stop sharing my screen. Yeah, and I'm going to switch on screen share, and let's take a look at the next topic. Share... look... share screen... really... there, okay. I think the next topic is around the Alpine image. Oh, I didn't take any notes, Rishabh; I'll insert some notes here on your results so that we've got them in the text as well. The recording of the meeting will be posted shortly after the meeting. Okay, should I add the notes while you're... Oh, that would be great if you're willing to do it; that would be wonderful. I'll do that.

So: AdoptOpenJDK for Docker on Alpine, Debian Buster, and CentOS. Alex, this was for me an excuse to just have a conversation with you, to be sure that I've understood what we're expecting in PR 956 and what we should be reviewing. Sure. So as far as I can tell, it's giving us a nice, integrated new structure where the JDK version is the first part of the directory name, then the operating system, and then the, what do you call it, the JVM implementation, whether HotSpot or OpenJ9. Yes, that's correct. And so then what I was trying to decode was, how does that map to tags in the repository? For instance, I was looking for a Debian squeeze-based version of this and didn't detect one. Have I misunderstood? There's Buster. So there's Debian 10, Buster, in the slim image, but there doesn't appear to be a non-slim Buster. Oh no, there is Buster, but somehow or other I didn't find that label. Okay. Oh, this is JDK 11; I was looking for JDK 8 with Buster. So what you've done here is, if I'm using Java 11, I get the newest version of Debian, not the old version of Debian. Well, this is just taking exactly what we had before and migrating the structure.

So the only new image I added, I believe, is a JDK 11 Alpine image. Everything else already existed. I see. Okay. So the directory structure is a refactoring in that sense, not an increment adding a whole bunch of new distributions. Thanks; I had misunderstood that. Yeah. So the directory structure was changed, and then the base image was changed, such that it uses AdoptOpenJDK. Right. And so that's this here, which in particular with Alpine is a real win for us, because it gives us a newer version of the Alpine base operating system, right? Instead of 3.9, it's now 3.11. Correct; it's whatever their latest is. Yeah, exactly. And since the Alpine project has stopped any work... or, I guess that's the wrong way to say it: OpenJDK as a project is no longer publishing updates to the Alpine image, right? It's stuck on Java 8u212. As well as that, they are not publishing correct images for other architectures. Ah, okay. All right, that's a big problem too. Okay. So the transition here to AdoptOpenJDK gives us a more current operating system base, a more current JDK, something that the AdoptOpenJDK project is actually actively testing, and we get a better base image for ourselves. Excellent. Okay, thank you.

All right, so one of my pleas to the viewers of the video: this is a great excuse to do additional testing to help us as we evaluate this, because the intent here is not to alter what we're delivering in the sense of the base operating system version, but we are altering which base image we're using. Instead of using OpenJDK, we'll use AdoptOpenJDK. Correct. Got it. Okay. Excellent. Okay, thank you for that help; that was what I needed. Are there any other topics we should review today? All right, let's call an end to the meeting. Rishabh, thanks very much; excellent summary, great results, and looking forward to further progress. Alex, thanks for your time as well. A recording of the meeting will be posted within about an hour, hour and a half. Thanks, Alex. Bye.