Hello, good morning. Okay, now that you're awake: my name is Alex and I'm going to talk about pylint plugins. I'm going to talk a little bit about tokenization and a little bit about abstract syntax trees. But the question is, why do we need to have customized plugins? Why do we need even more linters on top of whatever we're already using, and why do we need tooling which will ultimately tell you "your source code is wrong, you have to fix this"? My answer is that the existing tools are not always enough, and static analysis tools like pylint are very easy to extend. They can help you make your software better, and I will show you some examples.

One use case is that you want to enforce a particular coding style. This can be something which is not valid for the broader Python community. It can be valid only for your own company, or maybe only in the team that you work in, or, more commonly, only in the current project, and that can be different from the rest of the projects in the company. In Kiwi TCMS, that's an open source project I work on, we like to use documentation strings with three double quotes, and we don't like the rest of the styles, which are perfectly valid Python styles for documentation strings. We just like three double quotes, and we have a customized pylint plugin which will discover this for us. We can fix it and we can keep all of our source code in the same way.

So you can do things like this: let's say you have client and server relationships in your application. You may want to name them the same way, so both the client and the server start with the same name, and then you have Client at the end and Server at the end. Easy to find, easy to grep in the source code. If you like, you can do all sorts of stuff.

Another example: you're using a lot of frameworks, a lot of libraries. They have best practices. They have some recommendations that you need to follow.
For example, if you're using Django, it tells you: don't hard-code the authentication user as a foreign key relationship, because this can be changed. You should be using a setting for that. Django also tells you: don't issue queries directly against the User model, because that can be changed. There is a helper function which will give you the actual model at runtime, and you can query that. All of this is designed to facilitate downstream applications that may wish to change the stock user model and provide something else instead of the default. So this is one way to make your application aware of these things and make it follow the practices.

Another example: we are using django-simple-history to keep track of changes to some objects, and django-simple-history works with the model's save method. So if you're using objects.update or bulk_create, this doesn't use the save method, so we skip history. We don't like to use these methods. Again, customized pylint plugins, so we are aware of it and avoid doing this.

Another example, and this is especially true in big projects, in legacy projects and old source code: you can use static analysis tools to help you find possible sources of problems, of bugs. We have had the problem of missing permissions. We have views which process requests from the browser, and they are missing the permission_required decorator. That's really bad, and we've seen this a few times, so we figured, okay, let's create a plugin for pylint and find all the places in the source code that may be missing these permissions. So we have a list of them. We can go check them out and figure out what's going on. If we add a new view later and forget to add the permissions to it, the plugin will tell us. So it's a nice mechanism to very easily, very quickly find some problems.

Before we continue, there are two things that are important.
You just need to know that they happen. It's not necessary to know how they work in detail under the hood. The first thing is parsing, or lexical analysis, or tokenization, and the other thing is building abstract syntax trees from the source code.

So first you have input, which is the files of your program. This is all character input; it doesn't mean anything to the tooling. It goes through this box, which is parsing, tokenization, lexical analysis, and we get another data structure which has a little bit more meaning. In my example, you see we have a keyword, we have an identifier, we have operators, we have numeric constants. This is something that static analysis tooling can work with a lot more easily, and you can use this information to make decisions about your source code.

Tokenization in Python is very easy. We have the tokenize module, which provides the tokenize function. This function receives a single argument, which must behave like the readline method. So if you're working with file objects, then the file object's readline should work. If you're working with strings, you have to wrap them in an io.BytesIO object and use its readline method. The result of tokenize is a generator which returns TokenInfo objects. TokenInfo is a named tuple type. It has five elements. The token type is an integer constant, and these names in the brackets are constants defined in the tokenize module, so you can use them as well. You have the token value as a string. You have the start and end position of this token in the input character stream as tuples, so these are the starting row and column and the ending row and column of this token. And then you have the entire line which is currently being inspected by the tokenizer. So this is all the output that tokenize gives you.
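To make the tokenize description concrete, here is a minimal sketch that can be pasted into a script. The sample source string and the variable names are only illustrative assumptions, not something taken from the slides.

```python
import io
import tokenize

# Illustrative one-line program to tokenize.
source = 'print("Hello, world")\n'

# tokenize.tokenize() expects a readline callable that returns bytes,
# so a string has to be encoded and wrapped in io.BytesIO first.
readline = io.BytesIO(source.encode("utf-8")).readline

tokens = list(tokenize.tokenize(readline))

for tok in tokens:
    # Each tok is a TokenInfo named tuple: (type, string, start, end, line).
    # tok_name maps the integer token type back to its constant name.
    print(tokenize.tok_name[tok.type], repr(tok.string), tok.start, tok.end)
```

Running it prints one row per token, with the token type name, the token text, and the (row, column) start and end positions described above.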
This is what the hello world example looks like, and you can experiment with it. It's very easy, so experiment with it and see what different pieces of source code look like to the tokenizer. This is the first step that all static analysis tools do. It is also done internally by Python.

The next thing is abstract syntax trees. It sounds very complicated. Again, they are used internally by a lot of tooling and used internally by Python. But if you want to work with them, you don't really need to know how they are constructed or all the details behind that. You just need to know that it's a tree-based structure. It's very similar to DOM trees in the browser or to XML trees. You have child nodes, parent nodes, you have siblings. You have different types of nodes, they have different types of attributes, and you can work with them pretty easily. All these different colors are objects of different types in Python, and this is how you can recognize them internally when you're writing plugins.

Creating abstract syntax trees is again very easy. We have the ast module, which is built into Python. It is used by Python internally and also by some other tooling like Cosmic Ray. However, pylint does not use ast. Pylint uses astroid, which is an external dependency. It is very similar to the built-in module, almost everything is named in the same way, but you have to be aware that it's a different module.

So we have the parse function provided by astroid. It receives a string and returns an astroid node. The root node is the module, so everything that astroid parses is represented as a module which contains something else inside of it. In this example the module doesn't have very interesting attributes, because it doesn't come from the file system. It doesn't have a name, but it has a body, which is a list of expressions.
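If you want to poke at such a tree yourself, the following is a small sketch. Note the assumption: it uses the built-in ast module rather than astroid, so that it runs without extra dependencies. The node names are very similar, but this is not the astroid API that pylint uses, and the parsed source string is just an illustration.

```python
import ast

# Parse a one-line program; the result is always rooted in a Module node.
tree = ast.parse('print("Hello, world")')

expr = tree.body[0]   # ast.Expr: an expression used as a statement
call = expr.value     # ast.Call: the call to print(...)

print(type(tree).__name__, type(expr).__name__, type(call).__name__)
print(call.func.id)                       # name of the called function
print([arg.value for arg in call.args])   # positional arguments
print(call.keywords)                      # keyword arguments, none here

# ast.dump() shows the whole tree in one go, which is handy for experiments.
print(ast.dump(tree))
```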
So, the expressions in the module: we have a single expression in this list, which is a call of a function with the name print. This function receives arguments, which is again a list, and we have only one argument, which is a string constant in this example, and we don't have any keyword arguments. So again, you can experiment in the interactive interpreter, or you can create a small script and see what different pieces of code look like to astroid.

This is a relatively well documented library. What you're going to need most of the time is the list of classes. You need to know their names, and you will see why in a second, and you need to know their attributes. There are also some helper methods and helper functions that you may want to use; they are usually defined in the base classes. So again, experiment with that and figure out how it works.

Next is the pylint checker interfaces. This is the internal machinery that pylint provides for you, the developer, to hook into the analysis process and be able to create plugins. This is also the machinery that pylint itself uses internally. So all the checks that you have, all the errors that you see when you work with pylint, are implemented with these four interfaces, and they are implemented as if they were plugins. So it's the same thing.

The names are pretty much self-explanatory. We have open and close, which are executed at the beginning and at the end. Then you have the raw checker interface. This is not used very often, only in a few places. Its process_module method receives the result of astroid's parse, so you can scan the entire module as a whole if you wish. Then you have the token checker interface, which provides the process_tokens method.
This receives the result of tokenize. And the most commonly used one, more than 90% of the time, is the astroid checker interface. This will respond to visit and leave methods, and the exact name of these methods depends on the class name of the node that you want to inspect. So for example, if you want to inspect a function, the astroid node is FunctionDef, so you can define the methods visit_functiondef or leave_functiondef. Or if you want to inspect a class definition, this is visit_classdef and leave_classdef. That's why you need to know the names.

The order of execution is this, from top to bottom. You can implement more than one interface in your plugin, and the order is important. Another important thing is the order of execution of the visit and leave methods. This is depth first, and this is important because you can use it to build some sort of a state machine in your plugin: collect some information in the children, and when you are leaving the parent node you know that all the children have been visited and you have all the information for them. Then you can make a decision.

Next, let's create a pylint plugin skeleton. This is the hello world of pylint plugins. Every valid Python module that provides a register function with one argument will be considered a pylint plugin. Pylint will import the module and try to execute this function. You can put anything you like in this function. Usually what goes inside is something like this: linter.register_checker, and you create an object from something which pylint calls a checker class.

This is what the checker class looks like. This is all boilerplate code. It is the bare minimum that you need to have for pylint to be able to execute this class. This is where all the logic about discovering coding patterns and deciding if something is an error or not is done. So you need this attribute.
This is __implements__, and you give it a list of the interfaces you're going to implement. Usually it's only one, but it can be more. You need the name attribute. Most of the time this is not used, but it's mandatory. And you need the messages dictionary. Notice the name, msgs; that's how it needs to be written. The key in this dictionary is an alphanumeric ID. This must be unique across the entire pylint installation and all the plugins that you want to enable. The good thing is that if it's not unique, pylint will crash and give you a nice traceback, and you will figure out that this is a duplicate. Then the value in this dictionary is a tuple of three elements, and this defines your error message. The first one is the short error message, which you are going to see on the terminal when you use pylint. It's only one line. The second one is the human-readable message ID. This is what you're going to use to enable or disable particular checkers on the command line. So for example, disabling missing-docstring is something that we do almost all the time. And the last one is a longer help message.
This can be several lines long. You can also see it on the terminal with additional options. It is usually also compiled into HTML documentation. That is the place to explain to the developer who sees the message why this is a problem, and maybe how to fix it.

And you need to implement some method from these interfaces. So you scan some source code and decide, okay, that's an error: self.add_message. You give it the human-readable message ID, and the rest of the arguments are used to annotate where this error appears in the source code, so this module, that particular line, that particular column, and pylint will print this information nicely for you.

Invoking the plugins is done with --load-plugins. The only thing that you need to be aware of is that pylint looks in the standard Python path for these plugins. So if they are not there, you either have to move them there or modify the Python path settings. And that does it, nothing else.

Now I'm going to show you a few examples from our open source project. All of them are on GitHub. We do have a lot more available. These are things that we use to make our project better.

So, the documentation string checker. This is what it looks like, the essence of it. It implements two interfaces. In process_tokens we basically scan through all the tokens in the module, find all the string constants and keep a reference to them in a dictionary. The key in the dictionary is the string without the quotes, and the value in the dictionary is the string with the quotes. Then we implement these astroid-based methods. When visiting modules, class definitions and function definitions, we want to inspect the documentation string, and what we basically do is a dictionary lookup. We find this string in the dictionary, and if it starts with three double quotes, that's fine. Otherwise we consider it an error and trigger a message for the developer.

The checker for Django. So again, visit_const: we look for hard-coded strings.
We don't really care if this is inside of a foreign key definition or some place else. If that is a hard-coded string, we raise an error message for the developer. That easy. We also inspect the imports. If we see something like from django.contrib.auth.models import User, or a wildcard import, again it's an error for the developer to inspect and figure out what's going on.

The missing permissions checker. That's probably the biggest one that we have, which fits onto two slides, unfortunately. First, in visit_module, we try to figure out if this is a views module. In our project we have application/views.py, another application/views.py; that's the structure, and we just inspect the module name and keep this in a Boolean flag. Next, in visit_functiondef, we try to figure out whether this function, which is in a Django views file, is a helper function or a function-based view, something that responds to HTTP requests. The way we check for that is: if the first argument is named request, then this must be a function-based view from Django, and we continue with further inspection.

For classes we do a similar thing. We want to make sure that the class we are inspecting is a class-based view in Django and not some helper class which is defined in the same module. The way we do this is we inspect the list of base classes, because when you use class-based views in Django, they always inherit from something else. So we use this to make a simple check.
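The detection steps described so far can be sketched roughly like this. This is not the actual checker from the project: as an assumption, it uses the built-in ast.NodeVisitor (whose methods are named visit_FunctionDef and visit_ClassDef) as a stand-in for pylint's astroid checker interface, and the class name, module names and sample source are all hypothetical.

```python
import ast


class ViewCandidateFinder(ast.NodeVisitor):
    """Collect functions and classes that look like Django views."""

    def __init__(self, module_name):
        # The Boolean flag: only modules named "views" are of interest.
        self.in_views_module = module_name.split(".")[-1] == "views"
        self.candidates = []

    def visit_FunctionDef(self, node):
        args = node.args.args
        # Heuristic: a first argument named "request" means a function-based view.
        if self.in_views_module and args and args[0].arg == "request":
            self.candidates.append(node.name)
        self.generic_visit(node)

    def visit_ClassDef(self, node):
        # Heuristic: class-based views always inherit from something.
        if self.in_views_module and node.bases:
            self.candidates.append(node.name)
        self.generic_visit(node)


# Hypothetical contents of a views.py file.
SOURCE = '''
def helper(value):
    return value

def dashboard(request):
    return None

class Helper:
    pass

class ReportView(TemplateView):
    pass
'''

finder = ViewCandidateFinder("reports.views")
finder.visit(ast.parse(SOURCE))
print(finder.candidates)
```

A real checker would then look at the decorators of each candidate; this sketch stops at finding the candidates.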
It's not very robust sometimes, but it works for us. And the most important thing, the inspection part: we basically scan through the list of decorators for the method or for the class and search for some well-known names. If you don't have any decorators, that's a problem for us. If we do have some, we search for the well-known names and some combinations between them. If we find them, fine. If we don't find them, again it's an error, and the developer must figure it out.

We do have other checkers in the project. For example, we're looking for empty modules. We're looking for nested function definitions or nested class definitions, because that's a legacy code base. It's been written in a not very good way and we don't like to have these things. When we see these things, they usually mean there are more problems inside. That's why we have these checkers.

Searching for raw SQL. Django is ORM based. Unfortunately, we did have a lot of hard-coded SQL statements in the source code which were not compatible with different types of databases. Again, we have a checker.

We have checkers for the libraries that we use. For example, this thing, tags.py. This is something internal; we have internal behavior in the application and we don't want to use objects.get_or_create.
We want to use an internal method which will enforce some permissions and some other logic. That's why we have this. We have also had some checkers which started life inside our project, and then we were later able to contribute them to pylint and to pylint-django, because they were valid for other people as well.

And the last thing is, we do have ideas for other items, other plugins and checkers to create which are important for us. So if you want to experiment, if you want to get your hands dirty and start writing pylint plugins, this is a good place for you to start. We can give you exact examples of pieces of source code which we don't like and why we think they are problematic, and you can try to create a plugin for that and contribute back to our project if you want to.

The last thing I have to tell you is that we also have a project stand here at FOSDEM. So if you want to come visit us and say hi, and talk a little bit more about why or how we're using these plugins, I will be there after this presentation. And now we have five minutes for questions. Thank you.

Okay, first question. Okay. Do you use pylint? Yes, one person, two people, five people. Oh, everybody, okay. Okay, Flake8. Okay. What? I didn't hear. Ah, Black. Okay. Okay, but the thing about Black is, it's nice, however it's more for formatting. Especially in the latest versions of pylint and pylint-django, they have checkers to show you things which are just considered bad practice.

Yeah, okay, so the question is how many things we put in a check, because developers don't always agree with something. So first of all, I am a big fan of satisfying all possible checks that come from pylint.
I think they are well designed, and they are created for a purpose, to make your life easier. But then, if you're going to create your own customized plugins for your team and you have people that don't agree, then maybe it's a good time to sit down and make some policies about coding style within the team, why you consider something to be a problem and why not. When you have this agreement, then you can create plugins and people will be happy.

Yeah, how many false positives do we have in our plugins? The answer is quite a few. I haven't counted them, and this is for a reason. It is relatively easy to create a plugin that will detect the most common cases, and it is relatively hard to create a plugin that will take into account all the edge cases. So we prefer to have very simple plugins and more false positives, and just disable them with a comment and ignore them, instead of spending a lot of time fine-tuning the plugin.

Okay, yes. Can you elaborate a little bit on how you would now fix the code, actually, also with pylint plugins? How can I fix the code when I see a problem?

Okay, so the question is basically, can we change the abstract syntax trees with pylint? And the answer is sort of yes and no. There is tooling which uses AST to do dynamic replacement of nodes. For example, Cosmic Ray is a tool for mutation testing, which is based on automatically changing the source code and running your test suite. And you can do this. You can also save this into a file: when you build an abstract syntax tree, it's relatively easy to export it into Python source code, and it's almost the same as what the input was. Pylint doesn't have the machinery to change abstract syntax tree nodes and then save them to the file system. This can be added, of course. I mean, it would be relatively easy to add, but it doesn't exist at the moment. The tool is not designed to do these things, but it is possible.

Okay, thank you.
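As a rough illustration of the last answer, changing tree nodes and exporting the result back to source can be sketched with the built-in ast module. This is an assumption-laden example: it does not use pylint or astroid, it is not how Cosmic Ray is implemented, and the AddToSub class and the sample statement are hypothetical.

```python
import ast


class AddToSub(ast.NodeTransformer):
    """Replace every binary + with -, the way a mutation tool might."""

    def visit_BinOp(self, node):
        self.generic_visit(node)   # transform nested operations first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


tree = ast.parse("total = price + tax\n")
mutated = AddToSub().visit(tree)
ast.fix_missing_locations(mutated)

# ast.unparse() needs Python 3.9 or newer.
new_source = ast.unparse(mutated)
print(new_source)
```

The exported source keeps the meaning of the tree but not the original formatting or comments, which matches the remark that the output is almost the same as the input.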