Hi, and welcome to my talk. Today we are going to be discussing weird things: the story of a python (a snake) and a vulture. A vulture is a scavenger, by the way, so it eats dead stuff, and that pun is the inspiration for us.

I'll start off with a little introduction. My name is Rahul Jha, and I'm better known on the internet as RJ722. I'm still an engineering student; I'm in my final year right now, studying electronics engineering. I am an open-source Pythonista, which is a fancy way of saying I code in Python and I put it in the open-source domain. Even more than the open-source projects, I like the idea of the communities surrounding them, and I'm quite lucky to be a part of a few of them, the most prominent being DGPLUG, the Durgapur Linux Users Group, and the one I have at my college, AMU-OSS, Aligarh Muslim University's Open Source Software Society. Okay, that's been my boring introduction. I write my blog at rj722.github.io, and RJ722 on Twitter was already occupied, so I have rahul722j. Okay, that's it.

Now, for the next 20 minutes or so, we are going to be discussing dead code, a lot of it, so it's worth spending the time to define this term. What exactly does it mean when we say "dead code"? It's a little opinionated, but what I have made of it is that dead code is any part of your source program which does not affect the usability of your application. It's just redundant: if you remove it, it won't change the behavior of the application or any part of your application flow.

Let me show you some examples, and that will probably make things clearer. So here, this is a really complex program.
I wrote print hello world. But what you see is an import tensorflow as tf on the first line, and it will take around one or two seconds. That is another thing to take care of when you have unused imports: they take up some runtime, and that is not a good thing. It's one of the reasons why you should remove dead code from your projects.

Unused imports are the most prominent kind; it's the dead code I have found in most projects. There are other kinds of dead code too. If you define some variables but never actually end up using them, that's another example of dead code. I have also seen code after return statements; in Python, it will never ever be executed. The third one is a little special, and I'll be talking about it in a few moments: unsatisfiable Boolean conditions. When you have a condition like if True, and then there's an else block also, that else block won't ever get executed.

I know what you might be thinking right now: "This seems legit, but I'm not stupid. Why would I write this code?" It isn't actually stupid. It's much more a human way of expressing things. We leave these kinds of things behind whenever we develop something. It's not that we just come up with the final product; we document our journey. We try different things, they fail, and then finally we get there. There was an xkcd: if you have to go from point A to point B, the linear distance between them seems like the most sensible way of doing it, but it isn't generally the way we go. We go round and round and round until we finally reach place B somehow, and on this journey we might leave around traces of what was there initially. That is one of the reasons.

So, I talked about unsatisfiable Boolean conditions. Last night, I had this crazy idea that I would run Vulture (we'll be talking about it) on CPython's code.
I found a couple of examples of dead code, and there was a special example, this one. This code right here says: if 1, then do this, else do this. And this code has been written by... any guesses? Okay, this code was committed 20 years ago by the creator of Python, and I'm still not sure why. I can't make heads or tails of this code, and I'm not sure that this exactly is dead code. I discussed this with a friend of mine who's crazy smart (crazy smart as in being a maintainer of pip), and he was also confused; he said maybe this is dead code, but maybe not. So I'm still not sure. But this was a unique thing, because this was the first time I actually found an unreachable else block.

You might be wondering what all the other ways are in which dead code might crawl into your project. There are a couple of ways, the most prominent being debugging. When you're debugging software, you need to really read between the lines, and for that you need to define some extra variables, or maybe you have a function where you pretty-print something. Then, whenever the tests start passing, if you're even remotely like me, you go git commit -am "done", and that's it. There, ladies and gentlemen, you have introduced dead code into your program. Another one is refactoring. Again, the same as debugging: you tweak some things, you optimize your code, maybe you do something else, and then you leave around traces of the old code, which again ends up being dead. There are also some spelling mistakes which might lead to dead code.

So now, okay, cool. There's dead code, there's a lot of it, there's an example of it in the CPython repository. Why do I care? Because it's not cool. First of all, there might even be bugs introduced because of dead code. The unused function you have might even be superseding a function with the same name that you actually want to use.
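To make the point about an unused function superseding one of the same name concrete, here is a minimal sketch. The leftover basename helper is hypothetical and only stands in for the kind of code a refactoring might leave behind; it is not taken from the talk's slides.

```python
from os.path import basename  # the function we actually want to use


def basename(path):
    """Hypothetical leftover from an old refactoring, never cleaned up."""
    return path.upper()


# The later definition silently replaces the imported one, so this call
# runs the leftover helper instead of os.path.basename.
result = basename("/tmp/report.txt")
print(result)  # /TMP/REPORT.TXT
```

Nothing fails loudly here; the program just returns something other than the file name, which is what makes this kind of bug hard to spot.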
So that is a weird way of introducing bugs into your system. The second thing is the maintenance burden. Once you write code, you have to maintain it for the next 20 years. The maintenance burden is directly proportional to the code size, so if you can somehow reduce the code size, you are reducing the maintenance burden on you, and that is a pretty important thing. Maybe you have a beginner coming in and reading your code after you, and they might end up wondering, "Okay, what is this thing doing over here? I can't figure out what it does." They might be afraid; they might be confused.

So is there a way out? Yes, my lord, there is. It's called Vulture. The plan is that Vulture scavenges on dead code.

Okay, let's go for a demo. I was taught that whatever you do in a talk, don't ever do a demo, and that's precisely what I chose to do. So everything will go smoothly. Murphy's law. Okay, so the slides are changing. See, everything's going smoothly.

Let's first install Vulture. It's as easy as doing pip install vulture. That's it. I already installed it, because there might have been connection issues here. And then what you do is write vulture followed by the file name. I have a file over here; let's first look at what the file looks like. So there's a person called Guido, and then there's a function: I want to greet someone, so I say hello plus the person's name, which is the argument to the function. When I was writing the actual code, as is often the case, I forgot that I have a function for this, and I ended up writing it out myself here. So this function here is unused. Let's try running Vulture on it and see: vulture 1_basic_demo.py. You get 1_basic_demo.py, which is the file name, and this is the line number. Okay, you have line number three.
There's an unused function called greet. Okay, good, we found it. But there's also this thing, this 60% confidence. What this shows is how sure Vulture is that your code is actually unused and that it's not a false positive. So you might be wondering, "Okay, so Vulture gives false positives too?" Yeah, it does, and we'll talk about it in just a second.

Before this, I want to show you that Python projects mostly end up having some sort of dead code. To prove my point, I went to GitHub's trending section, selected the language Python, and just cloned the first project that came up. It was Detectron2. I have it cloned right here, and I will try running Vulture on it. It will take a second. Yeah, so even this code has so much dead code. See? That was my point.

Now we are back to this thing, 60% confidence. Let's study the cases in which Vulture reports code as dead even though it is actually used. I have this file over here. This is just a basic Flask app, a minimal Flask app: whenever it receives a request on /, it returns a string, Hello World. Simple. Let's try running Vulture on this thing: vulture 2_flask_example.py. Huh, the index function is being reported as unused here. Any guesses why? Okay, it has to do with how Vulture finds which part of your code is unused. We'll talk about this. Let's also try one other example.
There's also 3_dynamic_code.py. What I have here is a mock class which has a method do_something. I created an instance of the mock class here, and rather than doing mock.do_something, I do it dynamically: I pass in the instance, search for this name in the method list of the mock class, find it, and then call it. Let's try running it: python 3_dynamic_code.py. And yeah, "I did something", so it is working. Let's try running Vulture on this. Again, unused.

So let's see why this is being reported as unused. Whenever you ask Vulture to identify dead code in a project, what it does is parse your code into something known as an AST, an abstract syntax tree. So what is an abstract syntax tree? I'll discuss the three words, and maybe you can make more sense of it.

Abstract here means that you are not going to represent the code exactly as you typed it. We are not concerned with the aesthetics: whatever white space you might have, or whether you used single quotes or double quotes in the code. You're only concerned with the pure representation of the code, independent of its form. Syntax here means that it is syntactically correct, that is, it follows all the grammar rules of Python. And the T stands for tree, which is a very popular data structure.

So let's try parsing this expression into an AST. This is a mathematical expression, 10 * 2 + 5, which gives 25. What we have is the most basic nodes, 10 and 2; they get multiplied, which gives me 20, and we have got 5 here, so this is 25. Easy, right?
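The same tree can be inspected with Python's standard ast module. This is a small sketch, assuming Python 3.8 or newer (where number literals are ast.Constant nodes); the variable names are only for illustration.

```python
import ast

# Parse the expression from the slide into a tree.
tree = ast.parse("10 * 2 + 5", mode="eval")
root = tree.body  # the outermost operation

# The root is the addition; its left child is the multiplication.
print(type(root.op).__name__)       # Add
print(type(root.left.op).__name__)  # Mult
print(root.left.left.value, root.left.right.value, root.right.value)  # 10 2 5

# "Abstract": whitespace is not part of the tree, so both spellings
# produce the same dump.
same = ast.dump(ast.parse("10*2+5", mode="eval")) == ast.dump(tree)
print(same)  # True

# Evaluating the tree gives the value worked out above.
result = eval(compile(tree, "<ast>", "eval"))
print(result)  # 25
```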
Okay, let's try something more complex. I'll show you an AST here, and let's see if we can figure out what Python code it represents. We have an If node over here. This is how the Python grammar would represent any if statement in the language. The first thing is a Compare node, which evaluates whether the condition you have passed is true or not. Here it is a greater-than symbol: there's a variable called number, and it is compared with the integer 5. If it is greater, it goes into the true branch, which returns the string "bigger than 5", and if it is false, it returns "not bigger than 5". So the code for it would look something like this: if number is greater than 5, return this, else return this.

All the beautiful drawings here are by Toby Osborne, and here's the link. He explains abstract syntax trees pretty clearly; make sure to check it out.

Okay, let's try one more thing. Let's look at how the AST node for this function would look. We have a function greet, which basically prints hello plus the person's name. There's this node, FunctionDef, which tells Python that the node you are trying to access is a function definition, and the name of it is greet. It has only one argument in this case, which is the name of the person you're trying to greet. The third thing is a link to what the function does when you call it. Here it is just a print statement: hello plus the other person's name.

Okay, let's try one more thing. This is a module, the basic module which we just saw in the example. Let's see what an AST for this looks like. First of all, I'm sorry for my horrible handwriting.
I didn't have the time to make fancy slides, so I just drew it myself. What I want to do here is emulate what Vulture does in the form of a diagram, so let's try doing it live.

Initially, when Vulture sees an AST, it initializes two buckets. I have two buckets over here: one is the defined bucket, and the other one is the used bucket. The names are self-explanatory. Whenever Vulture is given an AST, it starts visiting the nodes one by one. So first it goes here and sees that this is an Assign node, which means that a value is being assigned to a variable. We see that the string Guido is being assigned to a variable called person, and Vulture has been programmed to put this name, person, into the defined bucket. So we'll put person over here. That's good.

Now it comes to the second node, the function definition node. Vulture has been programmed so that whenever it encounters a function definition node, it goes to the name, greet, and puts it into the defined bucket. So we'll have another thing in the list: greet. I'm sorry for my writing.

Then it jumps to the third node, the If node, and it checks both the true and the false branch. It checks the true branch and sees print. Oh yeah, person is being referenced over here, so it notes that this thing is being used and puts it in the used bucket: person.

So now we have two buckets, one defined and one used, and this is how we can extract the unused code. In the defined bucket we've got person and greet, and in the used bucket we have person, so the unused part is just greet. We also store metadata about every node; that is how we can report that this is an unused function and that it is on line number three, the length, and so on.

Okay, so what about our problem?
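Before returning to that problem, the two-bucket walk described above can be sketched with the standard ast module. This is an illustration of the idea only, not Vulture's actual implementation: the BucketVisitor class is hypothetical, and the demo module is a reconstruction in which the condition of the if statement is an assumption, since the talk does not spell it out.

```python
import ast

# Reconstruction of the demo module; the `if` condition is assumed.
SOURCE = '''\
person = "Guido"

def greet(name):
    print("Hello " + name)

if __name__ == "__main__":
    print("Hello " + person)
'''


class BucketVisitor(ast.NodeVisitor):
    """Hypothetical sketch: fill a 'defined' and a 'used' bucket."""

    def __init__(self):
        self.defined = {}  # name -> (kind, line number)
        self.used = set()

    def visit_FunctionDef(self, node):
        # A function definition puts its name into the defined bucket.
        self.defined[node.name] = ("function", node.lineno)
        self.generic_visit(node)  # keep walking into the function body

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Store):
            # Assignment target: goes into the defined bucket.
            self.defined[node.id] = ("variable", node.lineno)
        elif isinstance(node.ctx, ast.Load):
            # The name is being read: goes into the used bucket.
            self.used.add(node.id)


visitor = BucketVisitor()
visitor.visit(ast.parse(SOURCE))

# Unused code is whatever was defined but never used.
unused = {
    name: meta
    for name, meta in visitor.defined.items()
    if name not in visitor.used
}
print(unused)  # {'greet': ('function', 3)}
```

The stored line number plays the role of the metadata mentioned above: it is what lets the report say that greet is an unused function on line three.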
Our problem was the dynamic code, which isn't explicitly mentioned in the AST. What is that? Whenever you are using getattr or setattr, we can't directly tell from the AST which name you are using, and that is why Vulture reports it as unused.

So now you might be thinking, "Okay, so my build will fail every time I have one of these cases, or if I'm using Flask in a project, it will always fail Vulture." No, it shouldn't; we have a solution, and the solution looks like this: if I use my unused code, it's no longer unused. Basically, we try to fool Vulture. We create a new file and just write the names of the things which we know are being used but which Vulture is reporting as unused. That file references the name, so the name is there in the AST, and it can be parsed and put into the used bucket, and thereby it won't be reported as unused.

So in this case you can create a whitelist.py file and just write a single line in it: greet. I'll show you right here. Okay, for the Flask example we have this index thing. Let's create a flask_whitelist.py containing index. That's all we have to do. And now: vulture 2_flask_example.py flask_whitelist.py. And yay, it didn't report it as unused now.
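Why the whitelist trick works can be shown with a short sketch. It assumes the dynamic demo uses getattr, with a class called Mock and a method called do_something (the talk does not show the file), and the used_names helper is hypothetical, not part of Vulture.

```python
import ast

# Assumed shape of the dynamic demo: the method name appears only as a string.
DYNAMIC = '''\
class Mock:
    def do_something(self):
        print("I did something")

mock = Mock()
getattr(mock, "do_something")()
'''

# Hypothetical whitelist file: it simply mentions the name.
WHITELIST = "do_something\n"


def used_names(source):
    """Collect every name or attribute that the source refers to."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            names.add(node.id)
        elif isinstance(node, ast.Attribute):
            names.add(node.attr)
    return names


without_whitelist = used_names(DYNAMIC)
with_whitelist = without_whitelist | used_names(WHITELIST)

print("do_something" in without_whitelist)  # False
print("do_something" in with_whitelist)     # True
```

In the first case the method name only exists as a string constant, so a name-based walk never sees it being used; the extra file puts a real reference into the AST, which is the same effect as passing flask_whitelist.py alongside 2_flask_example.py.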
I was doing this manually a lot, so there is in fact a way to automate it. You can pass a flag called --make-whitelist, and Vulture will print its output in a format which you can directly copy and paste into a file, or you can pipe it directly into a file. So once you've ensured that everything Vulture reports is actually being used, you can just pipe it into a file, let's say whitelist.py, and you're done. Then, whenever you run Vulture on your project, you can just include that file name.

By now, my expectation is that I've made you realize that Vulture is awesome, and you might be wondering, "Okay, this is real cool stuff. I want to help out. What is the way?" So here's the good news: we are totally open source. It's an MIT-licensed project on GitHub. It was made by a very smart person who is also my mentor, Jendrik Seipp. He is a PhD at the University of Basel, Switzerland. We're also hosting dev sprints at PyCon India on 14th and 15th October at the IIT Madras Research Park. Come collaborate with us; we'll hack on some real issues. For example, the upgrade to Python 3.8 brings some new AST nodes, and we need to update Vulture's code to handle those nodes. It will be fun. Come!

At last, I also want to give credit to all the people who helped me with this presentation: Hosein, Pradyun, Prajul, Prateek and Sarpik. Thank you, and that's all, folks. Thank you so much. Also, one more thing: the slides link here will give you a 404 for now. I'll go and upload the slides.

Okay, so we have time for only one question. Here, go on. Sorry?

"Isn't Pylint going to tell you that these imports are unused? I mean, how different is Vulture?"

There's no difference there, other than that Vulture also detects the special things I told you about.
I mean unreachable code, such as code after return statements, and unused functions. Pylint, and I think Flake8, also does this, but it is limited to unused imports and variables. We also do it for functions and methods. You can look at the GitHub repository and see that we have some pretty sophisticated configuration you can pass in. For instance, if you have test cases which are only run by unittest, they would be reported as unused by every other tool, but we have a mechanism to whitelist all that code at once without any configuration whatsoever. So that's the other thing.

Okay, thank you.