Thank you, everyone. As a start, I would like to ask what your feelings are when you see a picture like this. Any positive or negative feelings? Fear? Okay. So this is what a pull request, or in this case just a diff from a pull request on GitHub, looks like, and it probably means you are reviewing it, which is hard for a lot of people. So what would you say if I told you there was something that could turn this confusing "I have to look everywhere" thing into something more like this, and give you a heat map of problem areas where you should look, where there's probably a bug? And what if I told you there was a thing that could predict 50% of the bugs in your project by looking at just 0.5% of its lines of code? Does that sound good or helpful? I see some thumbs up. That's what Linespots hopefully can do.

First: who knows what a commit is? And who knows, or doesn't know, what a hunk in a commit is? Some hands. Okay. Just so I can explain the algorithm later: the blue colored lines are the metadata, which tell you who wrote the commit and when, and give you a hash. A hunk is what's marked with the big "hunk" label; it's one part of the code, changes close enough to each other that Git decided to group them into a unit. That's going to be used in the algorithm.
I just wanted to explain that first. So, this is how Linespots works. The idea is that you take a project (it has to be a Git project at this point, but generally you could use any version control system, I guess) and you iterate over all commits, and for each commit you have to decide whether it is a bug fix or not. That decision is not trivial, and I'm going to talk about it later, but let's say we somehow know whether a commit is a bug fix or not. If it's not a bug fix, we just keep track of all the movement, of what happened in the commit, and that's it. But if it was a bug fix, we not only track the changes but also apply a score. The idea is: if there was a bug fix somewhere, that means there was a bug, and if there was a bug somewhere, that area is probably complicated, or someone wrote it while they were tired, or it's just a place where bugs will obviously occur. Research has shown that if an area had bugs in the past, the probability is increased that there will be more bugs there in the future. That is the whole basis of the algorithm.

For my study I just used a simple keyword finder on the commit messages to decide whether a commit was a bug fix or not. For example, at coala we use "Fixes: <issue link>", and I think at GNOME it's something like "Bug: <issue link>". So that's individual for each project, and it felt like it worked reasonably well.

So this is how the scoring works. The idea is that every line in your project starts with a score of zero, and every time there's a bug fix connected to the hunk a line is in, we increase that line's score. Ideally we would increase the score every time someone wrote a bug, but there's no way of knowing when a bug was written; that only becomes known implicitly when it is fixed later. So in this case there's a red line that is removed, so we also drop its score, and there are two green lines that are added, whose scores are increased. The first two lines are not changed in any way.
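The keyword-based fix detection can be sketched roughly like this; a minimal illustration of the idea, where the function name and the exact patterns are my own, not the thesis implementation:

```python
import re

# Per-project conventions; the talk mentions coala's "Fixes: <issue link>"
# and GNOME's "Bug: <issue link>" styles as examples.
FIX_PATTERNS = [
    re.compile(r"^fixes:\s*\S+", re.IGNORECASE | re.MULTILINE),
    re.compile(r"^bug:\s*\S+", re.IGNORECASE | re.MULTILINE),
]


def is_bug_fix(commit_message: str) -> bool:
    """Classify a commit as a bug fix via simple keyword matching."""
    return any(p.search(commit_message) for p in FIX_PATTERNS)
```

Anything that slips past such patterns is silently treated as a non-fix, which is exactly the imprecision the machine-learning ideas later in the talk aim to reduce.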
So we just give them a flat increase, and the added lines at the bottom use the small calculation shown at the bottom; it's just the average of the hunk. If you're very interested in the ins and outs of how the algorithm works, you can come to me later, but I don't think that's very interesting for the whole group, so I'm not going to go into too much detail.

The whole thing is then weighted by age, so newer commits change the score more strongly than older commits. That's to make sure that if you fixed bugs in the past and there are no new bugs in that area, we decide the area was probably fixed for now and stop highlighting it.

And this is what the result of the algorithm looks like: you get a list of line scores with the corresponding lines and files, and this can then be used to, for example, create a heat map of your project, or whatever else you think helps your project. Those scores are not absolute; they're relative to all the other scores in your project. So, for example, you could use them to look at the highest-scoring 10% of lines of code, or something like that.

The interesting part for me was figuring out how to evaluate whether this works. Ideally I would want a lot of data: have one group use it and a control group that doesn't, and then have hundreds and hundreds of people reviewing stuff and see if it works. That wasn't really feasible for my bachelor thesis. So what I did was create something I call a pseudo future. This is roughly what the history of a Git project looks like: you have an init commit, where you started your project.
At some later point there's your HEAD commit, which is your "now" state. You can use this to go back in time and decide: I'm just going to go back, let's say, a hundred commits, and call that point "now". Everything after those hundred commits is then a future you don't know anything about, because at that middle point I have no idea what's coming afterwards. So I can use the following commits as a future that I can check against, but that cannot influence my algorithm.

What I did then was take all the lines proposed by my algorithm and check whether there would be bug fixes in those lines in the future. The metric I used for this (I wasn't really sure it's the best thing to use, but it sounded sane) is called hit density, and it is simply the ratio between the number of bugs that occurred in the lines I checked and the number of lines of code I looked at to find those bugs. If this number is higher, it means I found more bugs per line of code I looked at, and I think that's what efficient review looks like: you want to find as many bugs as possible while looking at as few lines as possible.

When you plot this, it looks like this. The blue line is the proportion of bugs hit, so it goes from zero to one, that is, up to one hundred percent. In this case I arrive somewhere around, let's say, 19 percent of all the bugs that happened in the future. The red line is the hit density, which is the really interesting part. You can see that the hit density spikes very early, meaning that at around 0.0001 of the lines, or something like that, we have the best ratio of bugs found to lines looked at. And this was very consistent across a lot of projects.
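The pseudo-future split and the hit density metric could be sketched like this; a simplified illustration with my own naming, not the evaluation code from the thesis:

```python
def split_pseudo_future(commits, lookback=100):
    """Pretend the project's state `lookback` commits ago is 'now':
    everything before it is known history the algorithm may use,
    everything after it is an unknown 'pseudo future' to test against."""
    now = len(commits) - lookback
    return commits[:now], commits[now:]


def hit_density(bugs_hit, lines_inspected):
    """Bugs found per line of code inspected. A higher value means a
    more efficient review: many bugs found while reading few lines."""
    return bugs_hit / lines_inspected if lines_inspected else 0.0
```

Under this metric, hitting 50% of the future bugs while reading 0.5% of the lines scores far better than hitting 100% of them by reading the entire project.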
I'm going to show you a table with some numbers later. It means we can use this algorithm while looking at only a very small number of lines of code. We don't have to look at half the project to find everything; we only have to look at a very small number of lines, which is good, because that's what code review usually consists of.

So this is the comparison with the Bugspots algorithm. This whole thing was based on Google's Bugspots algorithm, which does roughly the same thing, but on a file basis: you don't get a score for each line but for each file. If you compare both graphs, the Bugspots algorithm finds more bugs, but that's just because it proposes whole files, so it assumes knowledge about parts of the code where there actually is none. The Linespots results on the right side spike much earlier and reach their plateau much earlier. Also note the scale on the right side: for Bugspots we have a ratio of 16% of bugs found per percent of lines of code, while for Linespots we have 2,000 percent of bugs found per percent of lines of code, which is pretty awesome.

There's another picture which is completely zoomed out, so you can't really see how steeply it rises. This is just to show that overall Bugspots appears to find more bugs, but at the cost of having to look at around 20% of your project's code, which is probably not something you want to do.

So these are some projects I tested with; the green color indicates which ranking algorithm worked best. What this shows is: I looked at 0.5 percent of lines of code, because that was kind of a sweet spot. Under that are the parameters I ran the algorithm with, and then there are some projects you probably all know: Atom.
You probably all know GIMP and GTK; Nautilus is the file manager for GNOME. I ran a few algorithm variants over them and tried to find out which one works best. What the numbers show, for the best case I encountered in my whole thesis, is that by looking at 0.5 percent of the lines of code I found almost 57 percent of the bugs that would occur in the next 150 commits. So by looking at just a very small portion of the project, I could find more than half of the bugs that would occur in the near future, which sounds pretty cool.

So, what can be improved about the project? This is where it becomes really interesting, because although it sounds awesome to find bugs in your code, so far we haven't (or I haven't) really figured out how this can be used in real life. Everyone I've talked to about it said it sounds really awesome and super cool, but no one had an idea how it could actually help someone.

First, some academic things that can be improved. The decision whether a commit is a fix or not is a very vague thing, and there's the idea of using machine learning to increase its accuracy. Ideally you would reach an accuracy of one hundred percent for deciding whether something is a bug fix, because that's what the whole thing is based on: if those decisions are not made precisely, everything after them becomes imprecise.

The next thing, which a lot of people suggest, is tracking logical chunks, things like functions or logical blocks such as if statements and loops, because if there's a bug somewhere in a function, that is probably a sign that the function is complicated.
So you'd better take a look at the whole function. That was just too much for the nine weeks I had for my thesis, but I really want to try it, although it gets very complicated and would have to be implemented separately for every language. So far the algorithm is completely language agnostic and runs on anything that is text-based.

Finally, we can apply some machine learning voodoo, because at the moment you can solve every problem by throwing machine learning and big data at it. One part would be the "is it a fix or not" decision. Generally, I have received suggestions to use machine learning in every area of this algorithm, so if you have cool ideas on how machine learning could make this better, please tell me about them; there is a list somewhere that starts with "machine learning voodoo" and then contains a whole lot of ideas.

And finally, I would really like to test this with people. I would really like to get projects to work with me, to find a way to make this thing useful in real life, because I think it could be helpful with code review, which is something we at coala struggle with, because we just have so much to do. If you would be interested in somehow collaborating with me, or just in me running the algorithm over your project and seeing whether it helps, come talk to me here or at the coala stand. Thank you.

Moderator: Thanks, Maximilian, for a nice talk. We can take some questions.

Q: Thank you for the talk. Is it only you who can run this algorithm, or is it open source?

A: My complete thesis is open sourced, so you can use my first implementation.
I wrote it myself, and I'm currently rewriting it in a much nicer way, but the algorithm is open sourced, and the thesis and all the data I gathered are also open sourced.

Q: Can you talk a little more about why you're looking only at the bug fix sections, rather than, say, feature commits, and why the bug fix sections would be more prone to new bugs?

A: That is just based on the research my thesis builds on, because they found the connection between bug fixes occurring and there being more bugs in the future. I'm pretty sure there is a lot more research on where problems might be, and yes, new features are probably prone to have bugs because they are newly implemented or not yet tested, so that would be another thing to look for. But I only had nine weeks for my thesis, so there were just time constraints.

Q: Thanks very much, it was really interesting. First, on deciding whether a commit is a bug fix or not: if all the commits are linked to some issue tracker, and the issues are labeled as bugs or not, that could solve the problem entirely, right?

A: Yes.

Q: And the other question I had: can you tell what kind of bugs you are looking at? Purely logical bugs, or also things like misunderstood requirements? There are really many different kinds of bugs, right?

A: The algorithm doesn't care what type of bugs you feed it, and the research this is based on didn't go into detail on whether there is a difference between, let's say, typos and other bugs; some people argue that typos are bugs and some people don't. I guess it probably doesn't matter, because what that research told me was that if there's a bug, there's some reason for a bug to be there, and that also raises the probability that there are simply more bugs there.
At coala, for example, we try to label our issues as bugs and non-bugs based on what kinds of bugs we want to find, because we kind of use this. For example, we don't label typos as bugs, because we're not interested in having typo bugs in this algorithm's data set. But if that were something you were trying to look for, then I guess labeling those as bugs would also work. Did that answer the question?

Q: Hi, thank you, Max. I was just wondering: if you can locate high-risk lines of code, can you also suggest fixes?

A: No. That would be something where maybe, I don't know, you could feed the result into some static code analysis tool. But this is also just a probability derived from whether there were bug fixes or not; it is not saying there is a bug, it is just saying that the probability of a bug being in this line is higher than in the next one, for example. So no fixes, sadly.

Q: Thank you for your talk. I was wondering, are you planning on making tools available? For instance, when you push a pull request on GitHub, you would have a nice tool that highlights the relevant lines so reviewers can look at them.

A: The folks from GitLab are running around here somewhere; they are thinking about using this. So far I haven't built anything in that direction, but I think it could be cool. Although all the people I've talked to so far weren't sure they would use it: the idea sounds awesome, and I agree, but all the people with money that I tried to get information out of, about whether this could be useful somehow, weren't really into the "put this into GitHub and make your pull requests colorful" thing.

Moderator: More questions?
Thank you There is something usually when you fix bugs sometimes you write tests As I understand how it's working most probably as the code added in the test will be in a bug fix commit It will just increase the score of probability of having a bug here Probably maybe you write bad test, but probably it's not bugs just to ensure that the bug you fixed is not happening anymore So is there ways you Avoid some kind of parts in the code because this is the most obvious one, but maybe there is other kind of things like this No, the so like the implementation that it exists It's too simple for stuff like that But I guess it would be a rather easy to have like an ignore list that just looks for test files or Maybe documentation stuff or something and just doesn't track them and doesn't offer them in the lists of stuff you should look at so When you say make a bug fix then you your scores around there go goes up Is there any way for the score to go down afterwards or if you fix there, but a few times then That probably the code is actually fixed Would the score go down or would it stay up or? 
A: That's what the weighting function is for. I didn't show it because it's just an exponential function. This isn't run incrementally; it is run once and you get one set of scores, and for the next commit you would have to run it again. So if, for example, you analyze the last 500 commits, then the 500th commit in the past weighs less than the newest ones. So if there was a bug, or rather a bug fix, 500 commits ago and there are no bug fixes after that, the score is reduced by the weighting function.

Q: Thank you. So with this exponential decay over time, how much do you just pick up the most recently active areas of the code, compared to the most error-prone areas? If you analyzed the CPython project, wouldn't you just have all the new stuff light up and all the POSIX libraries converge to zero?

A: If there are a lot of bug fixes in that area, probably, because that means there are a lot of bugs in that area.

Q: It's time dependent, it depends on how recent the changes are, right?
A: Yeah, that's kind of a problem. Generally this weighting by time is not ideal, because time is not a very good measure of progress: there are projects where a lot of commits happen in a short time, and there are projects with weeks or months of no commits. So even though you could have one commit two weeks ago and the next commit today, and in terms of the project they are not very far apart, they would be weighted very differently because of the time used for the weighting. I thought about using the number of commits instead, but then again the number of commits is not a very good measure either, because some projects use small commits and some use large ones. So that's also an area where there is no ideal solution so far.

Moderator: We still have time for some questions.

Q: You're assigning each line some kind of weight or score. I didn't get exactly how some lines were scored higher than others; some were 0.3, some were 0.5.

A: Oh, yes, you mean the example scoring. Okay, so to go through it in a bit more detail (can you hear me without the microphone? okay): here are the old scores, which are 0.3 for the first lines and 0.5 for the following two, and here are the new scores. The way the new scores are calculated is that at the top it says the score increase is 0.25, which is just a number I chose; in reality it would be determined by where we are on the weighting function, that is, how far back in time we are. The first two lines are not changed, so they just get the flat 0.25 increase. The third line is removed, so its score is removed as well. The bottom ones are added as new lines, so I calculate the average score of the complete hunk, which is done at the bottom and is 0.43; that is used as their old score and then increased by 0.25 to get the new score. Does that answer the question?
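The worked example from this answer can be put into a small sketch. Two caveats: the half-life shape of the exponential weighting is my assumption (the talk only says it is "just an exponential function"), and I read the old scores as 0.3, 0.5, 0.5, since that is the reading whose hunk average matches the 0.43 mentioned above:

```python
def age_weight(commits_ago, half_life=250):
    """Assumed exponential decay: a fix `half_life` commits in the past
    changes scores half as much as a fix in the newest commit."""
    return 0.5 ** (commits_ago / half_life)


def apply_fix_hunk(old_scores, removed, added, increase):
    """Update line scores for one hunk of a bug-fix commit:
    unchanged lines get the flat `increase`, removed lines drop out
    together with their score, and each added line starts from the
    hunk average of the old scores plus the same `increase`."""
    hunk_avg = sum(old_scores) / len(old_scores)
    kept = [s + increase for i, s in enumerate(old_scores) if i not in removed]
    return kept + [hunk_avg + increase] * added


# Example from the talk: increase 0.25, third line removed, two lines added.
new_scores = apply_fix_hunk([0.3, 0.5, 0.5], removed={2}, added=2, increase=0.25)
```

With these numbers, the two unchanged lines go from 0.3 and 0.5 to 0.55 and 0.75, and the two new lines start from the hunk average of about 0.43, ending up at roughly 0.68 each.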
Moderator: Okay, so let's all thank Maximilian again for his talk.

Thank you.