So today we're going to talk about clean code and code quality. First, can you please raise your hand if you use some code quality tools in your daily job? OK, that's good, quite a few people.

Let me introduce myself. My name is Andrea. I work as a software developer for a company called Sonar, where we develop products that help developers write clean code, and I'm one of the maintainers of our static analyzer for Python. Here's what we're going to see today. We'll start with what static analysis is and why it's useful for developers. Then we'll look at a few problems, a few challenges, that we might face when using code quality tools, especially on a big project. In the second part of the presentation, we'll see an alternative methodology called Clean as You Code that really helps you keep the focus on the new code. I'll explain what I actually mean by "new code" in this context, and we'll see together a few techniques to put it into practice.

So what is static analysis? Static analysis is the ability to analyze code, to find problems or otherwise understand your code, without actually executing it. It's a technique often used by code quality tools, and even by tools you probably use daily like Black or mypy, and it can be really, really useful in helping developers write clean code. Clean code is code that is, first of all, maintainable: we want code to be readable and easy to change, for our colleagues and of course for ourselves. Then we want code to be correct, so no bugs in it. We also want to make sure we write tests, so that our code is testable and tested. And finally, we want the code to be secure, with no vulnerabilities in it.

So why is static analysis useful? In this code snippet, you might not realize that at line 203 we actually have a small problem. We are using the extend method of a list, but extend mutates the list in place; it does not return a new list. The developer probably forgot about that property of the API. This kind of problem can be really difficult to spot in a manual review, so I think static analysis tools are really helpful for finding it.

For our use case today, I'm going to use an open source project that I like a lot, called FlaskBB. FlaskBB is already a good size, around 20k lines of code. It's written in Python on top of Flask, and you may already have encountered it online, because it's forum software that is often used to build forums. So why FlaskBB? First, as I said, it's a good-sized project, and it already has some static analysis in place. For this presentation, we're going to assume that we'd like to add pylint. Pylint, for those of you who don't know it, is a very popular linter in the Python ecosystem. We're also using pylint because it's known to have a lot of rules and to emit a lot of messages, so it can be quite a challenge to use it on an existing, big project. So let's suppose we want to do that: we want to add pylint to the CI configuration of FlaskBB.
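Before we get to the setup, here is the shape of that extend() problem from a moment ago as a minimal sketch; the names are hypothetical, since the slide's actual snippet isn't reproduced here:

```python
def collect_topics(forum_topics, sticky_topics):
    # Bug: list.extend() mutates forum_topics in place and returns None,
    # so all_topics ends up as None instead of the combined list.
    all_topics = forum_topics.extend(sticky_topics)
    return all_topics
```

A reviewer skimming this sees a plausible-looking assignment, while a static analyzer knows that extend() always returns None and can flag the line mechanically.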
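A minimal sketch of the kind of GitHub Actions workflow described next; the action and Python versions here are assumptions, not taken from the talk:

```yaml
name: Pylint
on: pull_request
jobs:
  pylint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install pylint
      # The only line that really matters: analyze the flaskbb package.
      - run: pylint flaskbb
```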
So for this presentation, we're going to use a GitHub Action along those lines. There's some boilerplate, but it's not really important; what actually matters is that last line, and it's as simple as that: we run pylint on the flaskbb module. Now let's suppose we do that and open the first pull request on FlaskBB. When the first analysis runs, we're going to see a lot of messages. You see that even on a well-maintained project like FlaskBB, static analysis can still report plenty. At this point we might be a bit confused and not know where to start, because we have 1,800 messages; compared to the overall size of the project, around 20k lines of code, that's already quite a high ratio. Here is a small recap of all the messages we saw on the previous slide: a lot of conventions that are not respected, some refactoring suggestions, some warnings, and a lot of errors as well.

The traditional reflex we developers might have would be to go in and fix as many problems as possible. However, there are a few challenges with that. The first is that, if we work in a company, it might be difficult to find the budget to spend days or weeks fixing old problems in old code. Also, if the application we're maintaining is already in production, we have to be pragmatic: the application is working. Maybe it has some shortcomings, but those are probably already handled; if your company has a support team or a quality assurance team, for example, they may already know workarounds for the existing shortcomings of the application. All of which is to say: if you go in and change old code, you run quite a high risk of functional regression, so you really have to be cautious if you want to do that. And finally, suppose you're a new maintainer on this project, so it's not code you wrote yourself; maybe it was written by someone who has already left the company, and you don't understand it very well. We as developers can have a bit of pushback towards old code, and at the end of the day it can be a bit boring to go through and fix all the existing issues.

By the way, what usually happens with pylint is that, to reduce all those messages, people disable all the existing rules and enable just a few of them. That solution can of course work, but I don't think it's optimal, because in that case we're actually limiting the power of pylint itself, of static analysis. I think we should aim for better tooling that keeps the focus on the new code, that shifts the focus of those messages.

We've talked mainly about messages, but let's not forget unit tests and coverage. On FlaskBB it's already very good that they monitor the unit test coverage of the entire project. However, it sits at 44%. So let's suppose that at some point we decide we'd like to enforce a threshold of 80%.
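As a sketch, assuming pytest with the pytest-cov plugin (the talk doesn't name the exact coverage tooling), such a gate could be enforced like this:

```bash
# Fail the test run when total coverage drops below 80%.
pytest --cov=flaskbb --cov-fail-under=80
```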
But that would mean writing tests for days or weeks just to reach 80% on old code, and we probably don't want to do that. So you see: looking at the entire project, we have a lot of messages, a lot of errors, and the coverage is probably not at the right level, and all of this can be a bit confusing and difficult to handle. What I'd like to suggest here today is an alternative approach to this kind of problem: keep calm and focus on new code.

But what is new code, actually? Let me clarify that. The easiest, most intuitive way to think of new code is a new commit or a new pull request targeting the main branch. However, we can extend the concept a bit and think of it in terms of a number of days: for example, if you're working in an agile environment, the new code period might be the duration of the sprint, maybe three or four weeks. And pushing the concept a bit further, if the product uses semantic versioning, and we've already deployed version 1.0 and are working on features for 1.0.1, then the new code we want to focus on is everything written between 1.0 and 1.0.1. For this presentation, though, I'm going to pick the simplest definition: new code is just a pull request.

For example, let's suppose I want to add a new functionality to FlaskBB. I'm going to add a quicksort; it's an algorithm that I really like, and for whatever reason I might need it in FlaskBB. I'll add it to the CLI utils module, and of course I'll also add some unit tests, because I want to make sure my algorithm works. So let's suppose we do that, with the CI configuration in place, and again I'm going to see all those messages coming from pylint. If you look really carefully, you'll find the line for the FlaskBB CLI utils module, and around line 1,400 of the log you can actually spot the three new messages that are relevant to the new code I just wrote. In this case they're about conventions: I didn't write a docstring for the quicksort function, the variable name x doesn't respect the naming convention, and there is one unnecessary else.

You can imagine that this approach is not really scalable: digging through the log to find the new messages for each pull request I open on my project is not sustainable, and I think we should strive for something better. The first approach we might try is to say: OK, I just want to analyze the files that changed in my pull request. And how can I do that? The easiest way, back in the CI configuration, is to change the last line again and run pylint on the output of git diff. git diff is a Git command that tells you what changed between two commits; in this case, we ask what changed between the main branch and the feature branch, and we use the --name-only flag to get only the names of the files that changed.
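By the way, here is a hypothetical reconstruction of that new quicksort function with the three pylint findings marked as comments; the talk doesn't show the full listing, so the details are invented:

```python
def quicksort(items):              # C0116: missing function docstring
    if len(items) <= 1:
        return items
    x = items[0]                   # C0103: variable name "x" doesn't respect
    rest = items[1:]               #        the naming convention
    smaller = [i for i in rest if i <= x]
    larger = [i for i in rest if i > x]
    if smaller:
        return quicksort(smaller) + [x] + quicksort(larger)
    else:                          # R1705: unnecessary "else" after "return"
        return [x] + quicksort(larger)
```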
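And the changed-files step itself, as a sketch, replacing the plain pylint invocation:

```bash
# Lint only the Python files that differ between main and this branch.
# --diff-filter=d excludes deleted files, which pylint could not open.
pylint $(git diff --name-only --diff-filter=d main...HEAD -- '*.py')
```

A real workflow would also have to handle the case where no Python files changed, since pylint exits with an error when it's given no modules at all.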
And if we do that, at the next analysis we get this, which I think is already better. You see that we only see the FlaskBB CLI utils module, and it's now much easier to spot the new messages relevant to my code; they're the same three you saw before. However, I'm still kind of bothered by the fact that I still see old messages on the old code, because here I'm analyzing the whole CLI utils module. In this case the file was pretty small, but if you imagine a longer file, which happens quite often in real projects, we may end up in a situation quite similar to what we had before. What we would really like is to focus on the changed lines.

But first, let me pause for a moment and formalize a bit better what I'm trying to do here. This is the Clean as You Code methodology. At its core is the idea that quality should be measured on recent changes. And why is that? Because recent changes are the riskiest part of your code: you don't know anything yet about the new code you're adding to production, so you really want to make sure its quality is high and that it is, of course, tested. It also matters because the new code is something you actually own, something you probably understand and feel a strong sense of ownership over. And I'll repeat it: we want to focus on new lines, not on new files, and we want the metrics to be computed not on the entire product but on the new code, the changed lines of the product.

Let's look again at the recap of the pylint report. With the Clean as You Code approach, we would like to see only two conventions, the two we saw before on the quicksort, only one refactoring suggestion, and zero warnings and zero errors. And if we revisit the challenges we mentioned before: where to find the budget is not a problem anymore, because in this case I'm working on a new feature, which means I already have budget; fixing this kind of problem is really part of the job, so you don't need a dedicated budget for it. Functional regression: there's always some risk, but if you focus only on the new code it's quite limited, and anyway the code quality tools will help you make sure the quality level stays high. And finally, developer pushback: that's not the case anymore either, because it's code I just wrote, so I feel a sense of ownership over it and I actually understand it.

So let's see how to put this into practice. We saw before that it was possible to analyze the changed files; now we'd like to analyze the changed lines. For that, I'm going to suggest two alternative solutions. The first is based on an open source tool called Darker. And again, have you heard of Darker before? Can you raise your hand if you have? OK, quite a few of you, that's interesting. Darker is a project often used in conjunction with Black, the code formatter, but it can also be used in conjunction with pylint, flake8, or any other static analysis tool. The concept here is that we introduce Darker into our CI configuration and run it on top of the pylint analysis: we still run pylint on FlaskBB, but we tell Darker to take a diff, implicitly, between the current branch, the code Darker is run on, and the main branch of the application.
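As a sketch, assuming a Darker version that still bundles linter support (the exact flags vary between releases, so treat this as something to verify against the Darker documentation):

```bash
pip install darker pylint
# Run pylint through Darker, keeping only the messages that touch
# lines changed relative to the main branch.
darker --revision main --lint pylint flaskbb
```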
And if we do that, you see that at the next analysis we only see the three messages that are actually relevant to our code. I think this is really the fundamental step of the Clean as You Code methodology: now the focus is at the right level, and we're no longer bothered by all the old messages.

The second solution I'd like to outline is based on Sonar, the company I work for. At Sonar we strongly believe in the Clean as You Code methodology, and we have it built into our products. Here we're going to use the cloud version of the Sonar solution, called SonarCloud. And again, we modify the CI configuration slightly: SonarCloud has its own GitHub Action, and we also modify the pylint invocation a bit so that it produces a report, which has to be parsable; SonarCloud is then fed with that report coming from pylint.

If we do that, in the pull request, here in the GitHub UI, we see a message from the SonarCloud bot telling us it found three code smells and one bug that are relevant only to the new pull request. And if we click on the link, we end up in the SonarCloud UI; it's not a CLI, it's a graphical interface. There you might recognize the same three issues coming from pylint, about the conventions that are not respected and the unnecessary else. On SonarCloud you also get a dashboard that is relative to the pull request, so not to the entire product, just to the pull request we opened. You see that we have a failed condition, meaning the quality is not good enough.

I'd also like to draw your attention to the coverage number. You might remember that coverage on the entire project was 44%, but here we see a much better number, 75.9%, because it's computed only on the 17 new lines that are part of the quicksort implementation, not on the entire code base. And we also get the information that if we merge this code, the entire product will be at 44.7%. I think this information is more actionable: it means I just have to write a few more unit tests to get to the 80% threshold.

And we also see that there is one more bug that has been detected by SonarCloud. Why is that? Because if you use SonarCloud, you get the benefit of running its own rules on top of pylint's. In this case, SonarCloud is able to find an actual bug: I forgot to add the raise keyword on an exception, which means that if the array is None, we just continue executing the code, at some point access the first element of the array to get the pivot, and crash with a TypeError. So I think that's very useful information to have.
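Here is the shape of that bug, again as a hypothetical reconstruction, assuming the earlier pylint findings have already been fixed:

```python
def quicksort(items):
    """Return a sorted copy of items using the quicksort algorithm."""
    if items is None:
        # Bug: the `raise` keyword is missing, so this ValueError is
        # created, immediately discarded, and execution just continues.
        ValueError("items must not be None")
    if items == []:
        return []
    pivot = items[0]   # with items=None we reach this line and crash:
    rest = items[1:]   # TypeError: 'NoneType' object is not subscriptable
    return (quicksort([i for i in rest if i <= pivot]) + [pivot]
            + quicksort([i for i in rest if i > pivot]))
```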
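For completeness, before the recap: the SonarCloud wiring described above might look roughly like this. The report format and property name are from memory, so double-check the SonarCloud documentation before copying:

```yaml
# Additional workflow steps: produce a pylint report, then run the scan.
# --exit-zero keeps this step from failing the build, since gating is
# done by SonarCloud's quality gate instead.
- run: pylint flaskbb --exit-zero --output-format=parseable > pylint-report.txt
- uses: SonarSource/sonarcloud-github-action@master
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
```

The report path would then be declared in sonar-project.properties, with something like sonar.python.pylint.reportPaths=pylint-report.txt.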
So let me recap for a moment. You might remember that the title of this presentation was about forgetting about the old code. That was kind of a lie, because when we add a new feature we usually have to touch old code anyway, to modify the existing code, and I think that moment is the right opportunity to go in and clean up your existing issues. If we assume that in one year you'll need to touch, say, 20% of the existing code base, it means that 20% of the code base will be clean, even the old parts. And why is that? Because you've put this CI check in place to make sure the quality is good. And if we assume that in five years you'll modify half of your existing code base, it means you'll have incrementally cleaned half of the code base. So with the Clean as You Code methodology you also get this benefit of incrementally cleaning even the old code.

So let's wrap it up and recap quickly what we saw today. We saw that handling code quality reports, especially on big projects, can be quite hard and challenging. We saw the Clean as You Code methodology: a first approach based on git diff, which only looked at the changed files, and a more advanced approach based on Darker and on SonarCloud, which looks only at the changed lines. And finally, we saw that thanks to the Clean as You Code methodology, you also get a way to incrementally clean up the technical debt of your project. That's all I wanted to say, and now I'm ready to take questions.

OK, thank you very much for a very nice talk. My question is: you said there is old code and you want to make it very good. But what if there is technical debt like, for example, upgrading the Python version? Let me take a worse example: I'm moving from 2.7 to 3. Can this approach work, or can you suggest a better way to do it?

Well, maybe in this case it's a bit complex, but I think we should strive to make the increments as small as possible. Even if we want to change the Python version, we might think about doing it on only a small part of the application at a time, and I would try to use the Clean as You Code methodology on that small part. So even the migration to the new version, I think, should be incremental in nature; it will probably be easier that way.

OK. OK, thank you. Abraham?

I have a question. When we apply changes to the new code, with a new style, wouldn't it be clumsy to have some code following the old conventions and the new code following new conventions? Couldn't that be confusing when you read the code? How do you approach that?

Yeah, it might be confusing. I think it really depends on the needs of your team. However, trying to change the entire project at once is probably quite challenging, so it really depends on the use case. If we're talking about style, maybe it makes sense to do it for the entire file or for the entire project; but if we're thinking more about potential issues, about bugs, I would go for the incremental approach instead.
Have you spotted any disadvantages of using Sonar instead of Darker, for example, or advantages?

OK, one advantage I can think of if you use Sonar is that with Sonar there is still the concept of the entire project. So even if you only change new code, it's somehow able to know, if you imagine that you change a function definition and add one parameter to a function, for example, that even the old code calling this function now has an issue, because in that case you're going to get a TypeError. Sonar will report it even though it's technically not new code, because it's a change related to something you actually modified in the new PR. And I think that with the combination of pylint plus Darker, that's not possible. And then, I think it also depends on whether you prefer a CLI or a graphical interface; that can also be a pro or a con of Darker versus Sonar.

And thank you a lot for your attention. Ah, there is one more? Sorry about that.

My question is: of course, the reason we want to do this incrementally on new code is that changing everything would be too much. But you can see the downside: you incrementally end up with two camps in your code base, one written in the old style of doing things, which is a bit more lacking in conventions and idioms, and then the new code. When someone new comes and looks at this code base and sees different modules written differently, even some functions within the same file following modern pylint conventions while everything else looks a bit weird, do you still see that as a good price to pay in exchange for avoiding the major hurdle of having to fix all those hundreds of pylint issues in one go? I'm thinking about onboarding new people and exposing them to the code base.

Yeah, I know. I think it's not a silver bullet, right? We have to be pragmatic somehow and say that one approachable way to handle all these kinds of messages is really to focus on new code and live with the fact that we might have some inconsistency in our existing code base. But again, it really depends on the needs of the team; in some cases another choice can be better. If we think of a code formatter, for example, maybe it makes sense to apply it to the entire product, so you have the same style everywhere. But yeah, it really depends on the effort that is needed to actually pay down this technical debt.

Thanks. No problem. Thank you. Thank you.