In this session we're going to have Carl telling us about automatic code reviews, which I'm very interested in. Carl is a refugee from Java land and works as a system administrator. So if you could all please make Carl feel very welcome.

Hi. So, welcome to my talk, which is about automatic code reviews. This is the long title, which they wouldn't let me put on the program. The general gist of this is that there are really, really cool tools out there, and if people are willing to put in a little bit of effort, you can get a lot of benefit from static analysis, and warnings before they become problems in production.

So, it's traditional to start with a slide about me. I am a system administrator slash software janitor at aqua.org, which means that, apart from panicking about OpenSSL, I also do things like continuous integration, continuous deployment, that kind of stuff. As a side project I run landscape.io, which I'll come to in a second; and then email, Twitter, GitHub.

So landscape is code reviews and automatic static analysis as a service, kind of thing. The idea is that if it's really super easy, like you can just log in with your GitHub account and get all that stuff straight away, then a lot of people will use it. It officially launched in November; I've been coding it for more than a year, and there are about 1,500 open source repositories on there right now, being checked all the time, and it's free for open source. It looks kind of like this. Maybe you've seen this sort of health badge in the top corner there on some GitHub repos occasionally.

So the majority of this talk is going to be about lessons
I've learned while building this, specifically how it pertains to static analysis, code reviews, that kind of thing.

So first of all I just want to talk about what's the point of code reviews in general, just in case you're not really into it. Code reviews are really cool because you can catch errors before they're problems, especially if you have new developers, people who are new to the language. Maybe they come from a different background; for example, as I said, I come from a Java background, and so my Python at the beginning was all, you know, sort of classes everywhere, interfaces everywhere, until a colleague of mine looked at it and went, "So..." And as well, people new to the company, who maybe don't understand how you do things in this company. So that's kind of cool.

But it's also a really good learning experience. You can see how other people do things, the things they're interested in, the techniques they've learned that you don't know about; or the other way round, you can teach them something that maybe they don't know. And then business logic errors are pretty important. You don't want someone to put a 15% discount where a 5% discount is the right thing; you don't want something to accidentally financially cripple your company, that kind of stuff. So the whole idea in general is just to keep an eye on the code base, I think.

And automatic code reviews can give you almost all of this. They won't give you the business logic side of things; as far as all these command-line tools are concerned, they don't understand the intent. They don't care if it's 15 or 5%. But everything else they can pretty much do, and the great thing is that it happens automatically after every code push. If you put it into your continuous integration, whether you use Travis or Jenkins or whatever, then this happens all the time, and after every push you
get a little bit of feedback to see what you just did and whether it was better or worse, which is kind of cool. And you can get, you know, pep8 compliance, that kind of stuff. For me, I really learned a few things by running pylint on all my code. And yeah, I'm going to come back to this "measure and track" idea at the end; I'll talk about that later.

So: hands up, who knows roughly what's going to happen based on this slide? Okay, that's a fair amount. So this is a fairly common Python gotcha. You have a keyword argument with a default value which is a list or a dictionary, you do something to the argument you get, and then you return it. So let's say you call append_one with no arguments. It uses this default list, it appends 1 to an empty list, you get a list of one, great. You do it again: now you've got two things. Okay, that sucks, and this is really annoying to debug. Like I say, this is a fairly common Python gotcha, but 20% of the repositories that landscape checks have this error in them somewhere. So out of 1,500, 20% of those actually have this error. This is really something you shouldn't do, and I'm not going to go into why, because you can find that on the internet. But pylint will find this for you, and interestingly, even PyCharm, which I think is a fantastic IDE and has the best static analysis of any IDE I've tried, will not warn you about this. So here's an example of how automatic code reviews can help you find bugs before they become bugs.

This is another one.
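A minimal runnable sketch of that default-argument gotcha, with the usual fix: the function name append_one follows the example in the talk, and the None-sentinel version is the standard idiom that pylint's dangerous-default-value warning points you towards.

```python
def append_one(items=[]):  # W0102: dangerous default -- one list is shared by every call
    items.append(1)
    return items

print(append_one())  # [1]
print(append_one())  # [1, 1] -- the "empty" default still holds the previous call's 1

def append_one_fixed(items=None):  # the usual fix: use None as a sentinel
    if items is None:
        items = []  # a fresh list is created on every call
    items.append(1)
    return items

print(append_one_fixed())  # [1]
print(append_one_fixed())  # [1] -- no state leaks between calls
```

The default list is created once, when the `def` statement runs, not on each call, which is why it quietly accumulates state.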
This is something I really learned. If you call logging.debug with a string and you want to see the values of foo and bar, you take the string with formatting parameters and you squash them in there, and then you call logging.debug. The thing is, it turns out this is the wrong way to do it. What will happen here is that you interpolate the string and then call logging.debug, and if your logging level is INFO or WARNING or above, that logging.debug call is basically a no-op, but you've already wasted time doing the string interpolation. It turns out that you can pass foo and bar as arguments to logging.debug, and it will interpolate the string for you. That's pretty cool, and I certainly didn't know that. And again, 10% of the repositories that landscape checks do this.

Then of course there's the style guide. So, you know, maybe you work with a monster who'll write code like this. This is horrible. This person probably uses tabs instead of spaces, okay?
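Going back to the logging point for a second, here's a small sketch of the difference. The Expensive class is just an illustrative stand-in for any value that is costly to turn into a string; the lazy form only pays that cost if the record actually gets emitted.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("demo")
log.setLevel(logging.INFO)  # DEBUG records will be filtered out

class Expensive:
    """Counts how often it gets turned into a string."""
    calls = 0

    def __str__(self):
        Expensive.calls += 1
        return "expensive value"

# Eager: "%" builds the string before logging.debug even runs,
# so we pay for the formatting although the record is discarded.
log.debug("got: %s" % Expensive())
print(Expensive.calls)  # 1

# Lazy: logging only formats the message if the record is emitted,
# so at INFO level __str__ is never called here.
log.debug("got: %s", Expensive())
print(Expensive.calls)  # still 1
```

This is exactly the pattern the logging-format warnings in pylint nudge you towards.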
So, you know, it's nice to have this conformity in style, and again this talk really isn't about why, but if you want that, you can run pep8.py to enforce conformity with the pep8 style guide. Style advisory, I guess you could say.

Next slide. So up until now what I've talked about are things you can find by checking your code for errors, but a project isn't just the code. Especially for web applications, having old versions of requirements can be a vulnerability; ask anyone in the Rails community how that can go horribly wrong. Fortunately Django hasn't had so many problems, but there are consistently security releases for libraries, security bug fixes that come out. So a way you can check this: if you have a virtualenv with all of your dependencies installed and you run pip list -o, on a fairly new pip, I think it's pip 1.4, I can't remember exactly, it'll tell you which requirements you have that are old, which is kind of useful. If you want something slightly more involved, there's requires.io and gemnasium.com, which will take your code; I think they only work with GitHub, certainly requires.io does. It will take your code out of GitHub, look at your requirements.txt or setup.py, and tell you which libraries you're using that are out of date; and it will tell you, okay, this one's a security problem, this one's just a regular minor update, that kind of stuff. So this is pretty important, especially for web apps.

And there are so many tools out there, I really don't have time to go into all of them. pyflakes is kind of the big one that I'm not going to have time to talk about, and chances are, if you use Vim or Emacs, you're probably using a plugin that uses flake8 under the hood, which uses pyflakes. Frosted is a new fork of pyflakes, which is a little more maintained. McCabe is really cool: it takes your code and it kind of looks for too many branches.

Oh, yeah, sorry.
I forgot to mention: you see this little tiny link at the bottom that no one can see because it's too small? This is carlcrowder.com/ep14. All of the things I'm talking about right now, there are links to them on this page, so you don't have to worry about finding these afterwards; you can just go to that page and look at all the links and everything.

So yeah, there are all these other tools. Jedi is a pretty interesting new thing; the guy who does it, David Halter, is doing a talk on Jedi straight after this talk in, I think it's B07, one of the B rooms. pyroma is pretty fun: it checks your setup.py to make sure you're being a good citizen if you want to have an open source library. So there are so many tools out there.

But these tools exist, so why don't people use them, and how can you get the most out of them? Obviously if you have a brand new project, then setting this kind of stuff up at the beginning is really cool; as your project evolves, you can evolve how you're doing all this checking. But most of us don't have the luxury of starting a new project. You have an existing code base, a mature code base, and you run pylint for the first time and you get like a thousand errors, and you look at it and think, oh my god, I can't deal with this.

So the way to get the most out of it: step number one is understanding what these tools give you. It's not a list of errors, it's a list of suggestions. It is like a code review. Sometimes you're going to find problems and you're going to say, okay,
yeah, that's a good point, that's a bug. But sometimes you're going to find things that are more "is this right?" It's like someone looking over your shoulder and going, "are you sure?" That's the attitude to have when looking at the output of these tools. Don't be intimidated by a thousand errors; that doesn't mean you have a thousand errors.

The second thing is: spend time tuning the tools. All of them are configurable, and all of them have variable levels of "how much do you care about this?" Maybe you don't care about a specific error; you can just turn it off. It's worth doing this, it really is, and I'll come back to that in a second.

And then, again, measure and track over time, which is my next slide. For me, this is the most important slide, or most important concept, of this whole thing. You really need to be concentrating on how you've improved or made things worse since last time; you can't just look at the absolute value. If today is better than yesterday, then you've improved and you're doing well, and that's all that matters. So in order to do that, you need to measure and track the output of these things.

As an example, some CI servers like Jenkins have a pylint plugin, and you get a little graph over time of how many errors pylint finds, that kind of stuff. If you have Graphite or InfluxDB set up, which you may already have, maybe your ops team has that kind of setup and you can beg them to let you use it, it's a really great, simple way to push in numbers and then track those numbers over time. And if you have a nice dashboard in front of it, like Tessera or Grafana, you get this beautiful pretty page that everyone enjoys looking at, with a little graph that goes up or a little graph that goes down. If it goes down, you can think, okay, maybe we need to spend time refactoring our code, maybe we need to concentrate on this; and if it goes up, you can feel good about yourself. At the very least,
just do something like pipe the output into wc and see how many errors you had this time compared to last time. If all you have is a number that goes up or down, you're still going to be doing pretty well.

And then tuning. Again, all these tools have a way to configure things. I think for the majority of people trying pylint for the first time on a mature code base (or any of these tools, but especially pylint, because it's the most involved one, kind of the most mature one, and it finds the most bugs, since it does a whole bunch more than, say, pyflakes does), you can sit there and tweak it, and you can spend a long time doing this. And again, it's really worth it. If you spend four hours digging through the config, digging through the manual pages, trying to figure out what's going on, and then apply that to your pylintrc so that the output is actually relevant to your project; well, maybe it's four hours, maybe it's a day, but it will almost certainly save you four hours or a day in the future, because it'll find bugs, or it'll make your project more maintainable, and that means future features will be much easier to actually do. So it's worth spending the time.

And yeah, again, I don't really have time to go into exactly what you can do, but for example, if you google around, you will find pylintrc files that people have already created. The second one here I created a while ago, because it annoyed me that pylint would warn about things in Django that weren't really warnings. So if you have a Django model, you have a .objects model manager.
That happens at runtime; it's part of the metaprogramming of Django. So it doesn't exist when pylint looks at it, and therefore pylint will warn, it will say "MyModel.objects does not exist". And yeah, it does exist. So you can write your own custom plugins, or you can tune this with a pylintrc. The benefit of the plugin approach, which takes a little more time and is a little more involved, is that you can control exactly what pylint will turn off or turn on. You can say, okay, I'm going to override this specific error in this specific instance, rather than having to turn off all examples of this specific error. So if you're interested in writing a plugin, the pylint documentation will help you; use mine as a kind of reference. It's a little hacky, but it's better than nothing. And yeah, again, spending the time up front will make things more useful, I believe.

So by this point I hope you're all thinking, wow, this is really super cool. But if you're still not willing to spend a little time, then I suggest you try out Prospector, which is a tool that came out of landscape. The original point of landscape was, like I say, to try and make things super easy for people to get started. But there was a part of that which was bringing together all these tools, configuring them in a kind of best-guess way, and adapting them slightly if you're running Django and so on. I pulled that bit out and moved it into a separate open source project called Prospector, which is a command-line tool. It's really a layer on top of pylint and pyflakes and pep8 and all this kind of stuff. It does its best to kind of guess what you want.
So for example, it turns off a bunch of really picky errors by default, which you can turn back on, but by default it tries to make it so that the first time you run it on a big code base, you will get something out of it which is actually useful to you. You're not swamped by little picky errors; you really see the core of the problems. And it tries to figure out from your requirements.txt whether you're running Django, and if so it uses the Django plugin. Hopefully I'm going to create some more plugins for things like Twisted and other frameworks, and try to improve this further and further. So, to steal a phrase from Python Requests, this is static analysis for humans. I'm trying to make it super easy to start with, and if you need the power, you can have it by configuring things yourself under the hood. But for going from zero to "now I have static analysis", hopefully this tool will help out.

So I kind of sped through this a little quicker than I expected, so here is my conclusion, which is essentially what I've already said several times: spending time on this will really, really benefit you. It's about developer cost, that kind of thing. If you spend the time doing it up front, it may seem a little like you're not working on features, but it will come back to you in the future; you will benefit from this at some point. And yeah, the most important thing is to keep track of the metrics. Keep track of what's happening with these tools, keep track of what they're telling you, because even if you don't agree with them, if they're telling you less than last time, it's probably because you've got fewer errors.
It's probably because you're doing better.

Yeah, so I've kind of sped through this a whole lot quicker than I expected, but anyway, this is the big list of links; the top one has links to all the stuff that I just talked about. And yeah, that's pretty much it.

Yeah, so I suppose it wasn't a question exactly, but yeah. So the first part was: the default argument might be intentional, and how do you deal with things the tools find as problems that are actually on purpose? So yeah, adding a comment for the next developer is kind of useful, but you're still going to get that error out of the tool, and each of them has ways to turn these things on or off. It's a little more involved, and one thing I'd really like to do with landscape and Prospector is have a sort of interactive mode that will help you do that, because right now you have to dig through the configuration and find out, okay, what is the error code and how do I turn this off? So yeah, of course: not just deactivating the error, but also explaining why it was deactivated.

Not exactly. So the question is whether landscape deals with these configuration files, especially for pylint. It's kind of on the roadmap. One of the problems with pylint is that you can load arbitrary plugins, and there's no real safe way to do that, so I need to spend a little time figuring out which parts of the config I can apply. Prospector will do its best to deal with tox.ini and setup.cfg for, you know, pep8, pyflakes, that kind of stuff. But you're right: landscape is still a work in progress and there's still a lot to be done, so it doesn't do that for now. It's really for people who don't do any analysis right now and want to start, rather than for people who already do it and want a way to graph it. But it's coming.

So I wouldn't say I'm a relative newbie.
I've been doing it for like three years or so; I accumulated this knowledge over the course of doing all this stuff. And I forgot to repeat the question, which I'm supposed to do: it was how discoverable these tools are. I think, again, this is maybe something that needs a little work. I found out about them from word of mouth; there's no centralized thing. I think Matt, who does Full Stack Python, mentioned yesterday that he was thinking about adding a separate section on how do you test, how do you do static analysis, that kind of stuff, which I think would be a real benefit. I will certainly do that. Yeah, cool.

There's one here behind; he's hiding from you.

piprot, okay. Oh, that's pretty cool. So piprot is a tool for, apparently, figuring out how old your requirements are. Yeah, that sounds pretty useful. I listed a bunch of tools that I've heard of, but I'm sure there are many, many more.