Okay, so recording is now on. Let's see, how about we work through the agenda together. Oh, and Fran is here, great, thank you Fran, thanks for joining. So here is what I've got for proposed topics for the meeting. It seems like we've got a good presentation that Rishabh has prepared, so let's talk through that. Actually, Rishabh, would it be better if we have you drive the agenda, and we will coach and encourage? Do you have a preference? I'm used to running the agenda when I schedule a meeting, but this may be better: you run the agenda, you tell us what you want to do, and we'll let you guide us.

Sure, that's okay, I'll start. Do you want me to start with the previous action items, what I've done until now, or do you want me to start with my presentation? I wanted to start with the presentation, because that needs a lot of the time for discussion as well.

Let's go with the presentation. I think we don't need to review the action items. It's your meeting; let's do the presentation, I'm much more interested in it anyway.

Yeah, so I'm sharing my screen now. I'm going to discuss the strategy I implemented for the proposal, and a little experiment on benchmarking. I did that for git fetch; I chose that as a good git operation to start with, and I used JMH, as was advised in the project plan. So what is JMH? It's a Java harness used for building and analyzing microbenchmarks. It's written in Java, and there are two ways to run it: standalone, and from the IDE. In their documentation, the JMH developers have recommended running it as a standalone project, because they say that's more reliable than running it from an IDE. But I think in a previous GSoC, one of the students integrated JMH into the Jenkins test harness. So for the project and for my experiment I did both: I created a standalone project, and I also ran it from the IDE. I created a module under the test module in the git-client-plugin to run the experiment, and I'll be sharing the code shortly.

So, do you want questions during, or at the end? When you say standalone project versus from the IDE, I was assuming that was just the runtime: whether I invoke the thing with a Maven command or some other command, or use my IDE's run button. And I could understand why it would be more reliable from a command line, with no interaction with the IDE, but that seemed different from what you were describing. You created it inside the Maven project definition, I assume, and it just runs, so you can run it from a command line or from the IDE. Did I misunderstand something?

I think that is a potential confusion I have, because when I read it, I assumed it means I need to create a separate jar and run it from the command line. So I created a different project and imported the git-client dependencies. Maybe I'm wrong in this approach, but that's how I ran the experiment. Okay, good.

Okay, so then: how to create a reliable benchmark. This is one of the first things I had to learn: trust the numbers you receive, because it's very easy to have preconceived notions, get the numbers, and just think that, yeah, maybe CLI git is better than JGit. I'll shortly show how I was caught out in one of the experiments I ran, because I had assumed the numbers from JMH would simply confirm my expectations.
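(For readers following along, a minimal JMH benchmark class looks roughly like the sketch below. The class and method names here are illustrative, not from the project; only the JMH annotations are standard.)

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;

public class HelloBenchmark {

    // JMH discovers and runs every method annotated with @Benchmark.
    @Benchmark
    @BenchmarkMode(Mode.AverageTime)        // average time per operation
    @OutputTimeUnit(TimeUnit.MILLISECONDS)  // report results in milliseconds
    public String measureConcat() {
        // The value is returned so JMH can sink it, preventing the JVM
        // from eliminating the work as dead code.
        return "hello-" + System.nanoTime();
    }
}
```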
So, some of the pitfalls I want to discuss before explaining my strategy. First, we don't want any external network interference: most git operations require an external connection, and we want to isolate the benchmark from any kind of interference. Second is dead-code elimination: if I'm testing an operation and not returning anything from the benchmark method, the JVM might optimize the code away, and that could give us the false perception that our code is running faster. For that, JMH provides the concept of a Blackhole: it consumes whatever object you return, which tricks the JVM into treating the result as used, so the code can't be eliminated. And constant folding is when the JVM sees a calculation based on constants and replaces the calculation with its precomputed result; I don't want that either, and JMH provides another feature for that, the @State class, which I'll describe with my code.

Then, configuring the benchmarks: there are things like the benchmark mode, warm-up before we run the test, how many forks I want, and I'm going to explain all of that with my code. If you want to integrate JMH inside our project, you just have to add its dependency to the pom, and then what I have to do is create a runner class. This is the runner class I have; basically it lets me set my options, whatever parameters I want on the benchmarks. Then I just have to create a class, annotate it with the JMH @Benchmark annotation, and this runner will identify those classes and run the benchmarks.

Okay, so the mode is the first thing I'd like to discuss. The benchmark mode is the performance measure we want. We have throughput, operations per unit time; we wanted execution time, which is basically the inverse of that, so I took average time per operation in milliseconds. We also have sample time and single shot time, and we can test each of those performance metrics as well.

Then warm-up iterations: how many times we want to warm up the JVM before we run the benchmark. The default is 5, so I used 5. Measurement iterations is how many times I want to run my benchmark; that also has a default, and I chose 5. Time unit is the unit I want the performance metric in, and that was milliseconds. For threads I was not too sure, because I'm actually not very confident whether there is a connection between the git operations and parallel programming; I'm not well versed in that concept, so I used two, but I think that's something I need to learn before doing this study. And forks: I used two, but they recommend using as many forks as you can, maybe five. Forks are basically the JVM child processes you want; each run on a fork is called a trial. So if I have two forks, it's going to run the benchmark in one JVM, then create another JVM and run another trial on that separate JVM. The documentation says that increasing the forks increases the precision of the results. I couldn't test that with five forks because, practically, on my local machine running the benchmark already takes almost 30 minutes, which I also think is a concern. And I wanted the results in JSON format.
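(A sketch of a runner with the configuration just described: 5 warm-up iterations, 5 measurement iterations, two threads, two forks, average time in milliseconds, JSON output. The class name, include pattern, and output path are hypothetical; the OptionsBuilder calls are standard JMH API.)

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.results.format.ResultFormatType;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class BenchmarkRunner {
    public static void main(String[] args) throws Exception {
        Options options = new OptionsBuilder()
                .include("GitFetchBenchmark")         // regex matching benchmark classes
                .mode(Mode.AverageTime)               // average time per operation
                .timeUnit(TimeUnit.MILLISECONDS)
                .warmupIterations(5)                  // JVM warm-up rounds
                .measurementIterations(5)             // measured rounds per fork
                .threads(2)
                .forks(2)                             // child JVMs; each fork is one trial
                .resultFormat(ResultFormatType.JSON)  // machine-readable report
                .result("jmh-report.json")            // hypothetical output path
                .build();
        new Runner(options).run();
    }
}
```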
So now I'll show one of the benchmarks I've written, and then the results as well. The first thing is the @State I was talking about. This is basically JMH handling all the arguments, all the objects you need for your operation to run: it handles the instantiation and sharing of those objects. You don't want to declare and instantiate objects inside your benchmark method, because you want to isolate only the operation itself inside the benchmark. So we declare whatever variables we want in this state, and then we can pass the state object as an argument into the benchmark method.

I parametrized this test, because we have two implementations, CLI git and JGit, and my strategy was also to test with variable repository sizes. I chose four repositories; I think I have a visual of the sizes, yes: 0.034 MB, then an almost 5 MB repository, then 93 MB, and then 324 MB. I wanted to test it like this, so I had to create a utility which would create a local git repository for the lifetime of the benchmark. For that, I shamelessly copied the TemporaryFolder JUnit rule; I removed the JUnit rule part and changed it into a folder-for-benchmark utility. I didn't change a lot there, and I've used it to create a local git repository.

After that, what I have here is basically @Setup and @TearDown, just as we have in JUnit. The level means that, for each iteration of the benchmark, it's going to set these things up and then tear them down; the levels control the granularity at which state is created and shared across invocations. What I wanted was a new local git repository for each iteration. I run five iterations in just one trial, and I wouldn't want to share the repository for the whole trial; that's why I chose the iteration level. Also, the repository URL I've chosen points to the local git repositories, so that we're not connecting to a remote repository via an external connection; this makes sure the benchmark is truly isolated. The refspecs I have basically fetch all the branches. And as soon as I created the client, I initialized the repository before performing git fetch; this was done in the setup, before the benchmark, so that it doesn't get included in what I want to measure.

Now, the benchmark itself is pretty simple. I have the Jenkins state, and I have a Blackhole which consumes the object I've created, so that the JVM doesn't optimize it away, reduce the time, or give me any unintended results. The fetch command is simply what we normally do: I execute it, and then I consume the FetchCommand object it returns.
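(Putting the pieces he describes together, a parametrized fetch benchmark might look like the sketch below. This is an illustration only: the repository names and paths are hypothetical, the state class is simplified, and it assumes the git-client-plugin Git/GitClient/FetchCommand API alongside JMH's @State, @Param, @Setup/@TearDown, and Blackhole facilities.)

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.TimeUnit;

import hudson.EnvVars;
import hudson.model.TaskListener;
import org.eclipse.jgit.transport.RefSpec;
import org.eclipse.jgit.transport.URIish;
import org.jenkinsci.plugins.gitclient.FetchCommand;
import org.jenkinsci.plugins.gitclient.Git;
import org.jenkinsci.plugins.gitclient.GitClient;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

public class GitFetchBenchmark {

    @State(Scope.Thread)
    public static class ClientState {
        // Hypothetical local repositories standing in for the four sizes
        // mentioned in the talk (~0.034 MB up to ~324 MB).
        @Param({"repo-tiny.git", "repo-small.git", "repo-medium.git", "repo-large.git"})
        public String repoPath;

        // Which implementation to measure: command-line git or JGit.
        @Param({"git", "jgit"})
        public String gitImpl;

        GitClient gitClient;
        URIish remote;
        List<RefSpec> refSpecs;
        File workspace;

        // Level.Iteration: a fresh workspace before every iteration, so
        // repository setup is never part of the measurement.
        @Setup(Level.Iteration)
        public void setup() throws Exception {
            workspace = Files.createTempDirectory("fetch-bench").toFile();
            gitClient = Git.with(TaskListener.NULL, new EnvVars())
                    .in(workspace).using(gitImpl).getClient();
            gitClient.init(); // init here so it is excluded from the benchmark
            remote = new URIish(new File(repoPath).getAbsolutePath());
            refSpecs = Collections.singletonList(
                    new RefSpec("+refs/heads/*:refs/remotes/origin/*"));
        }

        @TearDown(Level.Iteration)
        public void tearDown() throws IOException {
            // remove the workspace so each iteration starts clean
            Files.walk(workspace.toPath())
                    .sorted(Comparator.reverseOrder())
                    .map(Path::toFile)
                    .forEach(File::delete);
        }
    }

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MILLISECONDS)
    public void gitFetch(ClientState state, Blackhole blackhole) throws Exception {
        FetchCommand fetch = state.gitClient.fetch_()
                .from(state.remote, state.refSpecs);
        fetch.execute();
        blackhole.consume(fetch); // defeat dead-code elimination
    }
}
```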
I also created another class called FetchVanillaBenchmark, which does the same thing but measures with System.nanoTime, to see how JMH behaves on the benchmarks, how much difference in time we get, and how accurate nanoTime is; if we want to casually benchmark operations, is nanoTime a good option or not? That's also something I wanted to see.

So these are the results I had for the git fetch vanilla benchmark, meaning benchmarking with System.nanoTime. Here you can see that, between CLI git and JGit, JGit's execution time is slower than CLI git for every repository size I tried. And then I did the same thing with JMH.

Okay, that one surprises me, because even on the micro-sized repository, around 300 kilobytes, right? Repo one is tiny, very very small, and yet in that one your measurement still showed that JGit was slower, and substantially slower. Interesting. Okay, continue.

So then I used the JMH performance benchmark, and the anomalous behavior I saw here was that JGit was performing better than CLI git for repository sizes less than 5 MB: the roughly 400 KB one and the almost 5 MB one. This was something I wanted to investigate, because I had no idea why it would happen. The only suspicion I had was that maybe, because the JVM has warmed up, JGit might perform better, but I was not sure whether that assumption was right. So I tried another thing: JMH gives us a different benchmark mode called single shot mode, where we run the benchmark without warming up the JVM. I wanted to confirm whether JGit was performing better than CLI git because the JVM was warm enough, and the assumption was correct: here you can see that without the warm-up, JGit is again slower. So this was one of the observations I had from the experiment, and this is the strategy I chose to benchmark git operations. Now I'd like to take questions about the code, the methodology, or anything else. I also have some questions for discussion in the design document, but I would want to discuss those after your questions and feedback on this presentation; if there are none, I'd like to proceed with the design document.

So, Justin, anything from you, or Fran, from you? I think it makes sense. I had the same reaction you did, Mark, it's kind of surprising, but I think what you said about the cold versus warm JVM may explain some things.

So, Rishabh, one of the places of concern for me was around platform-specific issues, like the potential for a substantially increased fork cost on Windows compared to Linux. Was the platform you ran the benchmark on Linux or Windows? Linux. Linux, okay, good. So we know you're using the platform that is native for git; that is a good choice. I've got plenty of Windows access myself, so I could conceivably run these kinds of tests and see how they do. Do you have access to a Windows machine at all, or is the only machine you've got access to a Linux computer? No, I don't have access to a Windows machine. Okay, so the project would need to provide you access to one, on Amazon or something like that. Good to know, thank you.
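(Returning to the FetchVanillaBenchmark mentioned at the top of this exchange: a minimal sketch of what timing with System.nanoTime instead of JMH might look like. The class and the Runnable parameter are hypothetical stand-ins for the real fetch call; the point of the comparison is that there is no forking, warm-up control, or statistical treatment here.)

```java
import java.util.concurrent.TimeUnit;

public class FetchVanillaBenchmark {

    // Hypothetical stand-in: fetchOperation represents the same git fetch
    // call that the JMH benchmark measures.
    static long timeOnceMillis(Runnable fetchOperation) {
        long start = System.nanoTime();
        fetchOperation.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Naive repetition: every result is reported, including cold-JVM runs.
        for (int i = 0; i < 5; i++) {
            long ms = timeOnceMillis(() -> { /* fetch would run here */ });
            System.out.println("iteration " + i + ": " + ms + " ms");
        }
    }
}
```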
And actually, that's a good point: he's on Mac, so perhaps it would be good to benchmark against Linux and those two as well. Well, no, Rishabh, I think you said you're on Linux, right, not Mac? No, I am on Mac. Oh, you are on Mac, okay. So you're on a Unix variant, BSD-like. Good, okay, interesting. All right, very good, so that's also an important platform. Excellent, thanks.

Okay, so the first thing I think we should discuss for JMH: the first question I had was regarding the creation of the test environment, the machine to work on. Is it going to be provisioned by Jenkins infrastructure? That's the first thing I want to discuss. So I think the strategy right now would be to test my benchmarks on my local machine, right? Correct. Okay.

Sorry, go ahead, Justin. I was just going to say, be wary of anything else running on your machine and things like that. Perhaps for the more final benchmarks we'd want remote Windows or Linux machines; maybe not a remote Mac, I guess, because that's going to be a little harder due to licensing, but it would be nice to run on clean machines, for the Linux and Windows benchmarks at least.

Yes, okay. Both of those, right, Mark? Sorry, Justin, I missed that. Jenkins infrastructure, and perhaps Windows and Linux variants? Yes, absolutely. And even better, that's a very good idea, Justin. I think we should answer question number one, Rishabh, by having you submit pull requests to the git-client-plugin or git-plugin, whichever one is more crucial for your benchmarks, pull requests which actually execute the tests in multiple environments. The executors on ci.jenkins.io right now are a single executor per agent, so we don't have collisions with other agents. They are still virtual machines, you don't get access to a physical machine, so the variability is still probably quite high. But being able to use that environment would avoid you having to get a local Linux computer or a local Windows computer; instead you could use ci.jenkins.io. We admit that it's wildly variable, and we accept that variability as part of the exploration and the learning that you're doing.

Sure, I can do that. Fran, I'm just going to check with you: does that seem reasonable? I'm just thinking about how the infrastructure might help this, so that we don't have to put him on separate machines. Let's use ci.jenkins.io. Okay, great.

So, Rishabh, I think we've got an answer on number one, and it's a good thing to do during this community bonding period. You've already submitted pull requests, but this is a different kind of pull request, because the current pull requests run JUnit tests, and JMH is not quite JUnit; it's a different thing. So you'll have to figure out some infrastructure and how to make it work differently. This will be a very interesting exercise, and community bonding is a perfect time to do it. I welcome that.

Okay. The Jenkins agents for ci.jenkins.io, I think you said this but I just wanted to confirm: are those warm VMs, and is it one VM per agent? They are tragically long-lived agents, and therefore they are not just warmed; they are at times stale and overheated.
So yes, they are. There is a facility he could use, the Azure Container Instances (ACI) infrastructure, which would give him absolutely non-preheated agents, but that's not the default, and it's certainly not what the git project uses. The git projects use the stable agents; they stay up for a long time, I have to clean their disk space periodically, all sorts of challenges. Okay, good times. I figured that, and you know, that happens in the real world. Right, it is the rule. And, Rishabh, you should not publish results from ci.jenkins.io as definitive. We would certainly be better off, ultimately, before we get to results, running in an environment where we have better control than those wildly variable agents, but those agents can give us comparative numbers to help guide your development and your shaping of the tests.

Okay, so the next thing I wanted to discuss was how we would choose the operations to test. Right now I just thought I would test operations which involve network or IO. But do we want to prioritize these operations in some way, or do we just want to list out the operations we use in our plugin and then benchmark them all? How would we go about it?

I think we absolutely should prioritize them, at least for me. There are enough operations, and some of them are corner-case operations where it's probably not relevant how long they take. For instance, there's an operation in the plugin that will apply a tag, and we can predictably say we just don't care how long that takes; it's done so infrequently that it's not going to dramatically affect one thing or the other. Whereas fetch or checkout we do all the time, and those are therefore quite important. Maybe we should propose a process you use to choose the operations and bring a recommendation to us. It might be that what you do is instrument the git client to give you a report of counts: which things got called, how often (see the sketch after this exchange). Then we put that instrumented thing into an environment where it's actually used. I'm happy to run it in my environment, for instance, with a thousand-plus Jenkins jobs, and I could then give you counts that say: here are the methods that were called, and at what ratio.

Okay, that's going to be great, because I tried profiling and I was not sure how much reliability there was on my machine. I did see git init and git fetch as among the most used operations. And those are predictable, absolutely. The question for me would be whether git ls-remote is a high-profile one, because of its use in scanning branches for multibranch repositories and its use in detecting changes. Fetch, init, and ls-remote were the three on my list, and those likely already cover 80 or 90% of the benefit; if you find a way to improve fetch, you've already made dramatic improvements.

Okay, great. So, the parameters we want to use to test the operations: size of repositories is an obvious one, and then the different operating platforms. Yes, for those, Linux and Mac; the fact that you've got a Mac is a good benefit, because I don't have convenient access to a Mac, so I would include Mac in your list. I'm a FreeBSD type myself, but FreeBSD is not nearly big enough to put on this list for the Jenkins community.
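(A minimal sketch of the call-counting instrumentation suggested above, assuming a hypothetical helper that is not part of git-client-plugin: each GitClient method of interest would call record("fetch"), record("lsRemote"), and so on, and a report would be dumped at the end of a run.)

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical instrumentation helper for counting git operation calls.
public final class GitCallCounter {

    private static final Map<String, LongAdder> COUNTS = new ConcurrentHashMap<>();

    private GitCallCounter() {}

    // Called at the top of each instrumented GitClient method.
    public static void record(String operation) {
        COUNTS.computeIfAbsent(operation, k -> new LongAdder()).increment();
    }

    // Prints the call ratios gathered during a run.
    public static void report() {
        COUNTS.forEach((op, count) ->
                System.out.printf("%-12s %d calls%n", op, count.sum()));
    }
}
```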
Okay, so after that, one of the major discussion topics I had in mind was: how are you going to use JMH if we integrate it as a test module? It takes considerable time. First of all, why would you want it? After we benchmark once in different environments and have consolidated results, why would we want to integrate this module inside the git-client-plugin?

Well, for me, I would want to integrate it because I love having the results now, but I want to know whether new versions of command-line git change the characteristics, or whether new platforms change the characteristics. For example, the platform SIG is evaluating PowerPC 64-bit Linux, and evaluating IBM s390x; those two are places where this would help. Or in my environment, I may want to run it on FreeBSD. I don't think we want to run it every time on ci.jenkins.io, but, at least for me, I would prefer it be readily available, so anyone could run it any time they wish. Comments from the other mentors? Yeah, I think that sounds like a good approach.

Okay, so should we add an additional, optional stage in the Jenkinsfile to run this? Because for unit tests it's not optional, right? We run them with the code. Right. So my thought was: either make it optional in the Jenkinsfile, or make it purely a command-line thing, so that I could extend the Jenkinsfile myself, privately, on a private fork to invoke it. But having it available in the Jenkinsfile would be very elegant if you can do it. And in fact there is a history of doing that, Rishabh, if you would like to refer to someone else's work. In the history of the Jenkinsfile in the git plugin, you will find code that was added to run the plugin compatibility tester and the acceptance test harness. I turned them off because they were too heavyweight, and I finally deleted them, but they are there, and they give you an example of how to do it. Just look in the git log for the Jenkinsfile; it will show you when I made that change, and you can use it as a pattern.

Now, I apologize; obviously I made a mistake here, as I only allowed 30 minutes for this, and I suspect we need an hour. The other proposal was that we would like to meet as often as twice a week, to be sure that Rishabh gets feedback. Fran, Justin, is there a time? I'll send you a Doodle poll to see if we can get connected at another time during the week. Okay. So I think you're proposing two half-hour sessions, right? So, Rishabh, do you have enough to continue making progress until the next time we meet? We will for sure meet one week from today, and I'll propose an additional meeting sometime in between, based on the Doodle poll.

I think what I can work on is, first, adding things to the design document, because I've just added the benchmark strategy; I haven't added the things related to the performance fixes we wanted to do, the already existing bugs we have, particularly the redundant fetch. And second, maybe I can test another git operation; I can try checkout. Or maybe we could discuss the implementation, the way I want to include the performance improvements inside the plugin; we could discuss that in the next meeting as well.

Great. Now, would you also be willing to give a five-minute, brief summary, and it really will have to be very brief and high-level, to the platform special interest group a week from tomorrow?
So the platform special interest group will meet a week from... it'll be the 17th... oh, you know the date, I think I'd shared it with you, I'll send it to you. Right, the 21st; would that be okay for you? Yeah, that is okay, but I just wanted to know what exactly I have to summarize: what we want to do, or do I have to describe the benchmarking strategy? What particularly are we looking for? You choose what you think would be interesting to the people in the platform SIG. This is a chance to do a status report to a group of people; let's see, one of the people there is from IBM, another is from Broadcom, and these are people who think about platforms all the time. Just you presenting "hey, we're trying this, we're doing this benchmarking technique" will get them asking "oh, what about this, what about that"; it will be a good dialogue. Sure, sure, okay.

So I'll add whatever we've discussed, and the tasks I have, to the agenda for the next meeting. And there was one more thing: identifying progressive milestones for the project plan. Right now, when I created the proposal, I had the key deliverables; I just wanted us to discuss, this time, which of the things I mentioned as stretch goals we want to shift into the key deliverables we're trying to achieve. After that, I can identify the progressive milestones correctly. Right, and I think progressive milestones are a good thing to settle during this community bonding phase, and we have at least two more weeks of community bonding, so let's keep making progress there. Okay, sure.

Is there anything else you want to discuss? I'll update the meeting notes. Excellent, thank you. So I'll send the Doodle poll, and we'll plan to meet again; between now and next Wednesday I'll try to find another time when we could meet for 30 minutes. You'll keep going on the evaluation, on the design document, and get yourself ready for the presentation to the platform SIG as well. Okay. All right, excellent. I will post a copy of the recording, because it helps people know how we're going. Thanks very much, everybody. Thanks. Thank you very much. Excellent work, Rishabh, excellent work.