So welcome to a talk that I like to call Deletion Driven Development. My name is Chris Arcand. Here's what I look like on Twitter and GitHub. I'm a really social person and I love meeting new friends at conferences and whatnot, so be sure to say hello. My username everywhere is just chrisarcand.
So we are here in the lovely city of Cincinnati, Ohio. I've never been here before, but I'm really enjoying the week, especially the giant plates of cheese with a little bit of chili that they put in them. It's really good. I hail from northwest of here, up in Minnesota on the Canadian border. Minnesota goes by a bunch of different names you've probably heard before. One is the Land of 10,000 Lakes. There's the North Star State. That super cold place. If you're from Canada, you might know it as the Great White South. And if you're at a Ruby conference like this one, you might vaguely remember it as that one place where those JRuby guys live, right?
So like these two guys, I live in the Twin Cities of Minneapolis and St. Paul. We have absolutely gorgeous summers there and have beautiful forests and lakes to enjoy. In the winter, we love playing hockey, and the winters always look as beautiful as they do in that bottom picture. Except if you've been there, you know that I'm lying. It often looks a lot like this. That's a man on a snow bicycle riding through a blizzard. After such a blizzard, sometimes things look a lot like this. This is a line of cars parked on the street, if you can't see. But hey, I'm just gonna repeat that the summers are lovely and you should at least come visit during the summer sometime if you're nearby.
So I am a Ruby developer at what Aaron Patterson has always described as a small startup. You might have heard of us; it's called Red Hat. I work remotely out of Minnesota. There's no engineering office there. And at Red Hat, I work on ManageIQ.
So ManageIQ is an open source cloud management platform that powers Red Hat CloudForms downstream. It basically aggregates all of your enterprise infrastructure into one place and adds a bunch of functionality on top of it. The code base is hosted on GitHub. It's easy to find, and you can learn even more about it by talking with me afterward or at ManageIQ.org. And we're always on the lookout for good developers. If you're interested in joining us, please see me, send me an email, whatever. I also have a ridiculous amount of swag with me at this conference. So if you feel like you don't have enough stickers or shirts or lanyards or screen cloths or even ManageIQ candies, please seek me out here at the conference.
So why am I here? I am here because I love programming. And as such, you can probably imagine that I love writing code. However, there's something else I love even more. And that is, I love deleting code.
So Ruby has been a successful programming language for some time now, and we as Ruby developers might now maintain legacy applications that have been developed on for many years. A consequence of our long-term success is that these applications may contain unused, obsolete and unnecessary code. Now, I'm gonna tackle a specific case today. I'm gonna talk about methods that stand worthless and dead, unused by any callers in the application. That might be fine for frameworks, where a public API is exposed and never called within the framework itself. But in terms of an application, it just adds cruft.
Now, how does code like this end up in our projects? There's a couple of reasons that I can think of. Think of a developer from the beginning of the project, years ago, who adds methods that they think will be useful someday but actually aren't. They never actually get used. The implementation might change underneath them and they don't actually work anymore. That's a kind of over-engineering. There's also poorly written code.
So imagine a brand new, inexperienced developer joins the project, which is great. They write a very specific method that isn't very flexible and is completely unhelpful beyond the single spot where it's used. Maybe it could be refactored and written a little better, more generally. Now, hopefully these two examples are just caught in code review, but sometimes they aren't. It also could be that things have just been refactored over time and methods just aren't needed anymore.
So you might ask, who cares? The short answer is that unnecessary code is confusing. It adds complexity where there shouldn't be any, it creates an unnecessary maintenance burden on future developers, and it makes you scroll more in your text editor, which is annoying. But don't just take my word for it. Other people think so too. There's a great post by Ned Batchelder called Deleting Code, from 2002, and in this post there's a snippet that I'd like to share: "If you have a chunk of code you don't need anymore, there's one big reason to delete it for real rather than leaving it in a disabled state: to reduce noise and uncertainty. Some of the worst enemies a developer has are noise or uncertainty in his code, because they prevent him from working with it effectively in the future."
Now, before we get wound up trying to implement a feature and are already lost in the noise and uncertainty, I ask: what if we could programmatically find unused code to delete ahead of time, before we're trying to implement something else in that area of code? It turns out we can, to an extent. So today I'm gonna describe how a static code analyzer can be built to find potentially uncalled methods. Now, because Ruby is a dynamic, duck-typed language and this analyzer is only static, it's not going to be 100% accurate, but it has the potential to point out some areas in our code where we can clear out some cruft and add a bunch of deletions to our GitHub stats.
So we start with some Ruby code that we want to analyze. The first thing we need to do is transform the code into some data structure that we can reason with, which brings us to part one: parsing the code. Now, some of you know how language parsing works and are very, very aware of how it works with Ruby. Some of you maybe aren't, and I think it's important for everyone to understand how things work from the ground up. So this is really a high-level overview of how general language parsing works from a grammar, how Ruby does it, and how we're going to do it.
So do you understand the following sequences of characters, and how do you know? The boy owns a dog, okay? A boy bites the dog, kind of weird but okay. Loves boy though. Now, how could you programmatically determine which of those are correct and which are not? There's a way to do it, and I'm gonna throw out a couple of definitions that might be very familiar to you. One is a context-free grammar, or CFG. It's a set of rules that describe everything contained within the language. It basically answers the question: what sentences are in the language and what are not? We also have Backus-Naur form, or BNF. This is just one of the two main notation techniques for describing a CFG.
So here is a context-free grammar for all of the sequences of characters that we showed you earlier. It's really simple to look at. Everything you see on the slide here is a symbol, and symbols are split into two groups: the non-terminals, with the little brackets there, and the terminals, which I'll put in all caps. The way it works is that a symbol on the left is replaced with an expression on the right. An expression can be a combination of non-terminals or terminals, and a non-terminal is always replaced via some rule. Now, terminals are the actual tokens found within the language. They terminate at that point. That's why they're called terminals.
So let's look at that first example. The boy owns a dog. Now notice I didn't say sentences.
I said sequences of characters. We don't actually know that this set of characters up here is an actual sentence, but we have a rule for it and we can try to apply it. Now, if this thing is truly a sentence, that means it has to be a subject followed by a predicate, at least within our simple grammar. If we try to split it out and say "the boy" is a subject and "owns a dog" is a predicate, we can keep following the rules by replacement. So if "the boy" is truly a valid subject, it has to be an article followed by a noun. And again, we'll go through and say, all right, let's try: the article is "the" and the noun is "boy". Again, we go through. We see the rule for article. It has to be the terminal THE or A, which it is. Do the other side: a noun is BOY or DOG. Terminals found, it parses correctly. You can also do the predicate side, parsing down until you find those terminals. So in the end, we've identified every part of what we call a sentence in our grammar. And if we do a little bit of rearranging, we can see this. This is a parse tree. It's the discrete representation of our language that we can use to reason about the sentence.
What about "a boy bites the dog"? Again, if you go through it all, it has the same sort of structure, so it parses out correctly. It's totally fine. But do boys often bite dogs? Maybe, could happen. Seems a little weird. We'll come back to that.
"Loves boy though." As you can imagine, this one doesn't work out. There's a reason why. Well, if this is a sentence, it has to begin with a subject. If that's a subject, it has to begin with an article. And an article isn't valid if it's anything but THE or A. It's a syntax error. It doesn't belong in the language.
So in programming terms, the written sentences, or the code we're talking about here, equate to these conclusions. "The boy owns a dog" makes sense. "A boy bites the dog" is technically correct. It's maybe not what we meant, though. In software terms, that could be a software bug, right?
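The grammar and the three parses walked through above can be sketched in a few lines of Ruby. This is only an illustration, not something shown in the talk: the subject and article rules come from the walk-through, while the exact predicate, object and verb rules, and all method names here, are assumptions made to complete the grammar.

```ruby
# Grammar in BNF. The <predicate>, <object> and <verb> rules are assumed:
#
#   <sentence>  ::= <subject> <predicate>
#   <subject>   ::= <article> <noun>
#   <predicate> ::= <verb> <object>
#   <object>    ::= <article> <noun>
#   <article>   ::= THE | A
#   <noun>      ::= BOY | DOG
#   <verb>      ::= OWNS | BITES | LOVES

TERMINALS = {
  article: %w[THE A],
  noun:    %w[BOY DOG],
  verb:    %w[OWNS BITES LOVES]
}.freeze

class SyntaxProblem < StandardError; end

# Consume one token and insist that it is a terminal of the given kind.
def terminal(kind, tokens)
  token = tokens.shift
  unless TERMINALS.fetch(kind).include?(token)
    raise SyntaxProblem, "expected #{kind}, got #{token.inspect}"
  end
  [kind, token]
end

# Both <subject> and <object> expand to <article> <noun>.
def noun_phrase(label, tokens)
  [label, terminal(:article, tokens), terminal(:noun, tokens)]
end

def predicate(tokens)
  [:predicate, terminal(:verb, tokens), noun_phrase(:object, tokens)]
end

# Returns the parse tree as nested arrays, or raises SyntaxProblem.
def parse_sentence(text)
  tokens = text.upcase.split
  tree = [:sentence, noun_phrase(:subject, tokens), predicate(tokens)]
  raise SyntaxProblem, "unexpected #{tokens.first.inspect}" unless tokens.empty?
  tree
end

def valid_sentence?(text)
  parse_sentence(text)
  true
rescue SyntaxProblem
  false
end

p parse_sentence("The boy owns a dog")
puts valid_sentence?("A boy bites the dog")
puts valid_sentence?("Loves boy though")
```

Note that the second sentence is accepted: the grammar only decides what is syntactically in the language, not whether it means what we intended.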
You write Ruby, you don't have a syntax error, but it might be the wrong thing. Well, maybe a boy bites the dog, maybe he doesn't, I don't know. And then "loves boy though" is a syntax error. It doesn't work.
So what does this all have to do with Ruby? Well, you can ask Ruby the same thing. How does Ruby know the meaning of these characters? How does Ruby know that this is a class definition named Person, that it has two methods, initialize and say_hello, that there's an instance variable, name, et cetera, et cetera? Well, Ruby does the same thing that we did. My English examples were easy because we skipped lexing, tokenization, and actual programmatic parsing. We just kind of went on intuition, saying, this looks like a subject. But hopefully it captured the high-level essence of parse trees for you. How Ruby actually accomplishes it is a bit more complex.
Here's what CRuby does. Ruby has the infamous parse.y grammar file, which it gives to Bison. The resulting parser code scans through your Ruby files and tokenizes them, then parses the tokens into an abstract syntax tree, which is then compiled to instructions for the virtual machine. Now, the parser generated by Bison is what's called an LALR(1) parser. I'm not gonna describe how an LALR parser works for you today, because it's a bit out of scope. However, I'm gonna plug this very excellent book. The diagram I just showed you is from a book called Ruby Under a Microscope. It's by a fantastic human being by the name of Pat Shaughnessy. In it, he explains all about Ruby internals. The first chapter is all about the tokenization and parsing that I just showed you, including an in-depth explanation of the parse algorithm. So it's a really fascinating read. You should definitely go check it out.
So how are we gonna do it? We're gonna do something a little different. We're gonna use a gem called RubyParser, which is a Ruby parser written in Ruby using Racc. Racc is an LALR(1) parser generator.
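For a quick look at the three CRuby stages described above (tokenize, parse, compile), the standard library exposes each of them. This sketch uses Ripper and RubyVM::InstructionSequence rather than RubyParser, so it shows CRuby's own tree format, not the one used in the rest of the talk; RubyVM is specific to CRuby, and the one-line source string is just an example.

```ruby
require "ripper"

source = "puts 2 + 2"

# 1. Tokenize. Each entry looks like [[line, column], token_type, text, state].
tokens = Ripper.lex(source)
tokens.each do |_position, type, text|
  puts "#{type.to_s.ljust(10)} #{text.inspect}"
end

# 2. Parse. Ripper.sexp returns the tree as nested arrays rooted at :program.
tree = Ripper.sexp(source)
p tree

# 3. Compile. The tree becomes instructions for the virtual machine.
disasm = RubyVM::InstructionSequence.compile(source).disasm
puts disasm
```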
So let's take a look at an example. We'll have a class named Person with a method greet that takes in a name and just says hello to that name. So you can initialize a new Person and say hello, RubyConf. RubyParser has a class method called for_current_ruby that gives you an instance of RubyParser with a grammar for the currently running version of Ruby. You can then feed your code to its parse method, and it gives you back this. Now, this is an S-expression, or Sexp. It's a notation for nested list data, and it originally comes from Lisp. Nested list data is tree structured, so it's the perfect notation for describing parse trees. So awesome, this data contains the structure of our code. You can see the block node up here is the top-level context, with a class named Person; within that is a defn node named greet that takes an argument, name, et cetera.
So now that we have a parser to put our code in a format we can work with, we need a way to process it, which brings us to part two: processing the S-expression. Before we do something really useful with our Sexp, we need a way to easily manipulate it and start getting some general information that we care about. Information like finding where exactly methods are defined and in what classes. We're gonna begin to process everything by building a very minimal, tiny class called MinimalSexpProcessor. The goal of this class is simply to run dispatch: calling a method given a node type in our S-expression, if it exists.
So in our initialize method, we'll build a sort of dispatch hash. We'll take the public methods in this interface, find all that start with the prefix process_, and key them within a hash according to their suffix. That is, if we had a method named process_defn, we would find the method by its prefix, take the suffix, and place the method name as a symbol within the processors hash, keyed by the suffix. Note that the suffix corresponds to a node type in our Sexp.
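The dispatch hash described above might be built roughly like this. It is a sketch, not the code from the slides: the class name MinimalSexpProcessor follows the talk, but the processors reader and the ExampleProcessor subclass are hypothetical, added only so there is something to inspect.

```ruby
class MinimalSexpProcessor
  attr_reader :processors

  def initialize
    @processors = {}
    # Find every public method named process_<node type> and key it by type.
    public_methods.each do |name|
      match = /\Aprocess_(.+)\z/.match(name.to_s)
      @processors[match[1].to_sym] = name if match
    end
  end
end

# Hypothetical subclass with two empty processor methods.
class ExampleProcessor < MinimalSexpProcessor
  def process_defn(exp); end
  def process_class(exp); end
end

p ExampleProcessor.new.processors
```

Because the table is built from public_methods at construction time, a subclass gains a handler for a new node type simply by defining another process_ method.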
Next, we'll write the main method for our processor. Every node in the tree will be passed into this method. It simply looks into the processors hash to call the correct processor method given the current expression's node type. If there isn't one, we call a default method that we can set as an option. If we don't have a method to call and didn't set a default, let's just return nil. Also, we'll put in a cute little warning to say we didn't recognize the node type and are calling the default method, if that's actually what we're doing.
Now this class, if you noticed, is pretty worthless. There are no processors. It's simply a base class. So to demonstrate an example of a processor subclass, I'm gonna define a subclass called SillyProcessor. Within our SillyProcessor, we'll define two processor methods. If we encounter a method definition in the expression, process_defn will be called. process_not_defn will be the method that we'll set as the default for all other node types. The methods just call puts to tell us information about the nodes. You can see, if we encounter a method definition node, we'll say "processing a method definition node" in all caps. And for everything else, we'll just say, here's the node type.
Both of these methods call some method called process_until_empty. Now, process_until_empty iteratively shifts and calls process on the next node in the expression until the expression is empty. Every processor method calls this in the end to start processing the next node. Lastly, we'll fill in our initialize method to call super in the parent first and set the options that we want. So we'll say, hey, if you don't have a processor for this current node, call process_not_defn, and we're gonna turn off warnings because we expect most of the nodes won't be identified. It looks like that. So we'll put together a little demo again.
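A demo along these lines could look like the following sketch. To keep it runnable without any gems, hand-written nested arrays stand in for the Sexp that RubyParser would return, so the exact shape of the tree is an approximation. The option names default_method and warn_on_default, the log reader, and the say helper are assumptions rather than names confirmed by the talk.

```ruby
class MinimalSexpProcessor
  attr_accessor :default_method, :warn_on_default

  def initialize
    @default_method  = nil
    @warn_on_default = true
    @processors      = {}
    public_methods.each do |name|
      @processors[$1.to_sym] = name if name.to_s =~ /\Aprocess_(.+)\z/
    end
  end

  # Every node passes through here and is dispatched on its node type.
  def process(exp)
    type    = exp.first
    handler = @processors[type]
    if handler.nil? && default_method
      warn "Unrecognized node type #{type.inspect}, using #{default_method}" if warn_on_default
      handler = default_method
    end
    handler ? send(handler, exp) : nil
  end

  private

  # Shift children off the node one at a time, processing nested nodes.
  def process_until_empty(exp)
    until exp.empty?
      child = exp.shift
      process(child) if child.is_a?(Array)
    end
  end
end

class SillyProcessor < MinimalSexpProcessor
  attr_reader :log # kept only so the output can be inspected afterwards

  def initialize
    super
    @log = []
    self.default_method  = :process_not_defn
    self.warn_on_default = false
  end

  def process_defn(exp)
    say "PROCESSING A METHOD DEFINITION NODE: #{exp[1]}"
    process_until_empty(exp)
  end

  def process_not_defn(exp)
    say "Node type: #{exp.first}"
    process_until_empty(exp)
  end

  private

  def say(message)
    @log << message
    puts message
  end
end

# Approximate tree for a Person class with a greet method, then a call to it.
tree =
  [:block,
   [:class, :Person, nil,
    [:defn, :greet, [:args, :name],
     [:call, nil, :puts, [:dstr, "Hello, ", [:evstr, [:lvar, :name]], [:str, "!"]]]]],
   [:call, [:call, [:const, :Person], :new], :greet, [:str, "RubyConf"]]]

silly = SillyProcessor.new
silly.process(tree)
```

Note that processing is destructive: process_until_empty shifts children off each node, so the tree is empty once the walk finishes.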
We'll just get the expression from RubyParser and then call process on that. And we can see this is what it looks like. It's about what you'd expect. It goes through, finds all the nodes, and for the method definition node, it does something different. Now, you might be thinking, wow, that's pointless. That's because it is. But we've now added the next tool to our tool belt. We're at the point where we can run whatever code we want at a given node. And this allows us to build more complex things like, say, a method tracking processor.
So using some information from RubyParser, we can now record where we see method definitions in classes, and their line numbers. We'll set up this class with the same options as our SillyProcessor, but we have a couple of new things. We have two new stacks, a method stack and a class stack. We're gonna use these to keep track of where we are in the code, or the tree. We also have a method_locations hash that we'll populate with method signatures as keys and the file name and line number as values. The file and line number will be taken directly from RubyParser.
We'll define a couple of processor methods. In process_defn, we'll just shift the node type off the expression. The next thing will be the name, and then we'll call in_method, which takes a block, and process the rest within it. So this doesn't do much other than signify that we're in a method by calling in_method. Class does the same thing: in process_class, we'll call in_class. And then process_until_empty is the same thing that you saw before.
Here are the two location methods that actually do the work. Now, this might be a little hard to see, but it's very simple. They both just add the current method or class onto their respective stacks and pop them off after we yield to the block passed in. Within in_method, we also record the current method signature in our method_locations hash with its location.
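A condensed sketch of such a method tracker follows. Several things in it are assumptions: the Node class and s helper are stand-ins for RubyParser's Sexp objects (which carry line numbers themselves), the dispatch is shortened to a respond_to? check instead of the dispatch hash, and the spellings in_class and in_method are guessed from how the names are spoken.

```ruby
# Stand-in for a RubyParser Sexp: an Array that also knows its line number.
class Node < Array
  attr_accessor :line
end

def s(line, *children)
  node = Node.new(children)
  node.line = line
  node
end

class MethodTrackingProcessor
  attr_reader :method_locations

  def initialize(file)
    @file             = file
    @class_stack      = []
    @method_stack     = []
    @method_locations = {}
  end

  # Condensed dispatch: call process_<type> if it exists, else just descend.
  def process(exp)
    handler = "process_#{exp.first}"
    respond_to?(handler) ? send(handler, exp) : process_until_empty(exp)
  end

  def process_class(exp)
    exp.shift                     # node type
    name = exp.shift
    in_class(name) { process_until_empty(exp) }
  end

  def process_defn(exp)
    exp.shift                     # node type
    name = exp.shift
    in_method(name, exp.line) { process_until_empty(exp) }
  end

  def process_until_empty(exp)
    until exp.empty?
      child = exp.shift
      process(child) if child.is_a?(Array)
    end
  end

  private

  # Push the class, run the block, then pop it again.
  def in_class(name)
    @class_stack.unshift(name)
    yield
  ensure
    @class_stack.shift
  end

  # Same for methods, recording "Class#method" => "file:line" on the way in.
  def in_method(name, line)
    @method_stack.unshift(name)
    signature = "#{@class_stack.first}##{@method_stack.first}"
    @method_locations[signature] = "#{@file}:#{line}"
    yield
  ensure
    @method_stack.shift
  end
end

tree = s(1, :block,
  s(1, :class, :Person, nil,
    s(2, :defn, :greet, s(2, :args, :name),
      s(3, :call, nil, :puts, s(3, :str, "Hello"))),
    s(6, :defn, :say_goodbye, s(6, :args, :name),
      s(7, :call, nil, :puts, s(7, :str, "Goodbye")))),
  s(11, :class, :Dog, nil,
    s(12, :defn, :bark, s(12, :args),
      s(13, :call, nil, :puts, s(13, :str, "Woof!")))),
  s(16, :call, s(16, :call, s(16, :const, :Person), :new), :greet,
    s(16, :str, "RubyConf")))

tracker = MethodTrackingProcessor.new("example.rb")
tracker.process(tree)
p tracker.method_locations
```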
So another thing to think about is that if you enter a new class, that's a new method space. Different methods could be in that context. So we save the old method stack when we go into a class, use a new one for that class, and then revert back to the old stack when we're done processing it. Lastly, a couple of little helpers. The current class name is the first thing on the class stack. The current method name is the first thing on the method stack. And then we'll record a signature in the form you're used to seeing: the class name, a little hash, and the method name. Great.
So let's expand on our example a little bit. We have a Person with greet. We're gonna add a little say_goodbye, which does almost the same thing. We have a class called Dog with a bark method that woofs. And then we're only gonna call greet there. The important thing to pay attention to is where the methods are defined, right? So we have greet on line two, and we have methods on lines six and 12. If we do the same thing as before and pretty print the method locations, you can see that we found Person#greet on two, say_goodbye on six, and bark on 12. Perfect.
So awesome. We now know where methods are defined. The generic processors that we've built so far, to process the S-expression tree and record method locations, provide the footing on which we can build our tool. The only thing we need to do now is process the call nodes within the tree, see what's being called, and line those up with the definitions we're now tracking, which brings us to part three: building the dead method finder.
Yes, we've finally reached the point where we can build the dead method finder that we've been working towards. So we'll do that. DeadMethodFinder will subclass MethodTrackingProcessor. In this one, there are two important collections that we'll maintain. known is a hash containing sets. We'll use this to maintain a mapping of method names to the set of classes that define them.
There will also be called, and this will just be a set of methods that were called. Here are two processor methods. For process_defn, we're gonna key into that known hash, adding the class currently being processed to the set of classes that define this method, then call process_until_empty on the remaining Sexp nodes. With process_call, we'll add the method being called to the set of called methods.
Next, we'll define an uncalled method. This is where we'll take the difference between known and called methods. In other words, the ones that weren't called. And for each uncalled method, we then key into the known hash to find where a method by that name is defined, by class and line number. We'll also have a little helper method called plain_method_name. This is just because the method name from RubyParser is a string, but we're gonna use symbols. It looks like that.
So let's expand on our example once again. With greet and say_goodbye, instead of calling puts directly, let's have a little helper called speak that does it for us. We'll also have a pet_dog method in Person that takes in a dog object and sends it pet. And with Dog, we'll add a little attribute accessor called fed, because hey, maybe you wanna keep track of whether or not you fed the dog. And we'll add a pet method as well. To add a little bit of realism, I have a dog named Ruben. So I will call myself Person.new, and my dog there. I'll greet RubyKaigi, er, RubyConf. Ruben will bark, and then I will pet Ruben.
So we process the Sexp and then we puts out the uncalled methods, and we see Person#say_goodbye is supposedly uncalled and Dog#pet is supposedly uncalled. After looking closely, you'll find that this is wrong. There's a problem here. On line 38, I pet Ruben, but we're not finding that that's actually called. Now, why is that? It's because we hit an edge case, right?
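The finder as described up to this point can be sketched as below, and the sketch reproduces the false positive. It is a simplification under stated assumptions: the class is standalone rather than a subclass of the method tracker, it reports class and method name without line numbers, it dispatches with a plain case statement, and the hand-written arrays only approximate the Sexp for the Person and Dog example. The name NaiveDeadMethodFinder is made up for this illustration.

```ruby
require "set"

class NaiveDeadMethodFinder
  def initialize
    @class_stack = []
    @known  = Hash.new { |hash, name| hash[name] = Set.new } # name => classes
    @called = Set.new
  end

  def process(exp)
    case exp.first
    when :class then process_class(exp)
    when :defn  then process_defn(exp)
    when :call  then process_call(exp)
    else process_until_empty(exp)
    end
  end

  # Known method names that never showed up as the name of a call.
  def uncalled
    (@known.keys - @called.to_a).flat_map do |name|
      @known[name].map { |klass| "#{klass}##{name}" }
    end
  end

  private

  def process_class(exp)
    _type, name = exp.shift(2)
    @class_stack.unshift(name)
    process_until_empty(exp)
    @class_stack.shift
  end

  def process_defn(exp)
    _type, name = exp.shift(2)
    @known[name] << @class_stack.first
    process_until_empty(exp)
  end

  # Naive: whatever name appears in the call node counts as "called".
  def process_call(exp)
    _type, receiver, name = exp.shift(3)
    @called << name
    process(receiver) if receiver
    process_until_empty(exp)
  end

  def process_until_empty(exp)
    until exp.empty?
      child = exp.shift
      process(child) if child.is_a?(Array)
    end
  end
end

tree =
  [:block,
   [:class, :Person, nil,
    [:defn, :greet, [:args, :name], [:call, nil, :speak, [:str, "Hello"]]],
    [:defn, :say_goodbye, [:args, :name], [:call, nil, :speak, [:str, "Goodbye"]]],
    [:defn, :speak, [:args, :text], [:call, nil, :puts, [:lvar, :text]]],
    [:defn, :pet_dog, [:args, :dog], [:call, [:lvar, :dog], :send, [:lit, :pet]]]],
   [:class, :Dog, nil,
    [:call, nil, :attr_accessor, [:lit, :fed]],
    [:defn, :bark, [:args], [:call, nil, :puts, [:str, "Woof!"]]],
    [:defn, :pet, [:args], [:call, nil, :puts, [:str, "Wags tail"]]]],
   [:lasgn, :me, [:call, [:const, :Person], :new]],
   [:lasgn, :ruben, [:call, [:const, :Dog], :new]],
   [:call, [:lvar, :me], :greet, [:str, "RubyConf"]],
   [:call, [:lvar, :ruben], :bark],
   [:call, [:lvar, :me], :pet_dog, [:lvar, :ruben]]]

finder = NaiveDeadMethodFinder.new
finder.process(tree)
puts finder.uncalled
```

Dog#pet is reported even though pet_dog sends it, because the only name recorded for that call node is send. The unused fed accessor is missed entirely, because attr_accessor is seen as just another call.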
The problem is that our process_call is using send as the method being called, which it is, really: it is calling the method send. But we wanna take into account that sending implies a method being called directly. In other words, we've hit an edge case. So looking at our S-expression for that, we can find where the actual method being called is, and add some logic to say that that is the method being called, and handle the edge case. We'll say, hey, when the method being called is send, public_send, or __send__, look through that, find the literal being sent, and we'll say that that is the method being called. Let's try it again. Great. That's there, that's being called, so it's no longer found in our output.
Now, this is an improvement, but it's still not correct. What about this guy here? We never used this attribute accessor, fed, right? So it should be in the output. Looking at our S-expression for that, we realize that attr_accessor is itself a method call that defines methods, a getter and a setter, which is why our method tracking processor doesn't find them. Let's add another case in our call processor to handle that. Say, hey, when you encounter attr_accessor, go through and record that as a known method. Now, record_known_method does the same thing that we were doing in our process_defn method, except it also double-checks that the method location was recorded.
While we're here tinkering, let's also pretty up that output into something that's a little more readable. Let's define a method called report that looks through the uncalled methods. For each of them, let's look up the location of that method via method_locations in our parent class, throw all of that into a pretty-formatted report, and skip a class if it has no uncalled methods. If there are some, we join it all together and we print it out. Processing the code now is just getting the S-expression, processing it, and calling report.
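The two edge cases above might be handled along these lines. As before, this is a sketch under assumptions: the class is standalone, it reports class and method name only (no locations or formatted report), the tree is a hand-written approximation of a Sexp, and literal_name is a hypothetical helper. The name record_known_method follows the talk, though its location check is left out here.

```ruby
require "set"

class DeadMethodFinder
  SEND_METHODS = %i[send public_send __send__].freeze

  def initialize
    @class_stack = []
    @known  = Hash.new { |hash, name| hash[name] = Set.new }
    @called = Set.new
  end

  def process(exp)
    case exp.first
    when :class then process_class(exp)
    when :defn  then process_defn(exp)
    when :call  then process_call(exp)
    else process_until_empty(exp)
    end
  end

  def uncalled
    (@known.keys - @called.to_a).flat_map do |name|
      @known[name].map { |klass| "#{klass}##{name}" }
    end
  end

  private

  def process_class(exp)
    _type, name = exp.shift(2)
    @class_stack.unshift(name)
    process_until_empty(exp)
    @class_stack.shift
  end

  def process_defn(exp)
    _type, name = exp.shift(2)
    record_known_method(name)
    process_until_empty(exp)
  end

  def process_call(exp)
    _type, receiver, name = exp.shift(3)
    if SEND_METHODS.include?(name)
      # dog.send(:pet) really calls pet, when the argument is a literal.
      name = literal_name(exp.first) || name
    elsif name == :attr_accessor
      # attr_accessor :fed defines both fed and fed=.
      exp.map { |arg| literal_name(arg) }.compact.each do |attr|
        record_known_method(attr)
        record_known_method(:"#{attr}=")
      end
    end
    @called << name
    process(receiver) if receiver
    process_until_empty(exp)
  end

  def record_known_method(name)
    @known[name] << @class_stack.first
  end

  # Only a literal symbol or string can be resolved statically.
  def literal_name(node)
    return unless node.is_a?(Array)
    return node[1] if node.first == :lit && node[1].is_a?(Symbol)
    node[1].to_sym if node.first == :str
  end

  def process_until_empty(exp)
    until exp.empty?
      child = exp.shift
      process(child) if child.is_a?(Array)
    end
  end
end

tree =
  [:block,
   [:class, :Person, nil,
    [:defn, :pet_dog, [:args, :dog], [:call, [:lvar, :dog], :send, [:lit, :pet]]]],
   [:class, :Dog, nil,
    [:call, nil, :attr_accessor, [:lit, :fed]],
    [:defn, :bark, [:args], [:nil]],
    [:defn, :pet, [:args], [:nil]]],
   [:call, [:call, [:const, :Person], :new], :pet_dog, [:call, [:const, :Dog], :new]],
   [:call, [:call, [:const, :Dog], :new], :bark]]

finder = DeadMethodFinder.new
finder.process(tree)
puts finder.uncalled
```

A send whose argument is computed at runtime still cannot be resolved, which is one reason the results are only ever potentially uncalled.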
So here's what it looks like after we fix the attr_accessor case and pretty up our output. You can see that, awesome, say_goodbye is reported as uncalled, and it really isn't called. And our attribute accessor is reported as unused, and it really isn't used. So, great, perfect, awesome. Now remember that this static analysis is not always 100% accurate and that these are potentially uncalled. So we do some manual checking ourselves to be sure that these actually are deletable. As suspected, it looks like they are deletable in this very easy example, so we delete them. Doesn't it feel great? Isn't deleting code fun? It's perfect.
So hey, we did it, we made a dead method finder. Now we can start finding code to potentially delete left and right, yes? Time to open a dozen pull requests with a bunch of red deletions, right? Are we truly done? No, we aren't. Ruby is complex to parse and Ruby has a lot of edge cases. But the good news is, adding edge cases is easy. So think about, for example, ruben.fed = true, if we actually did use that accessor. If we add that to the code here, you'll notice that it's still marked as uncalled. Now, if my dog were here and saw that him being fed was an issue, he would probably say, just deal with it. And that is an actual picture of my dog wearing sunglasses. But back to the edge case. It's an attrasgn node, which means attribute assignment. The variable ruben has the message fed= sent to it with the value of true. So we can record that, again, as a method call. Perfect, looks great.
There are so many other edge cases, though. What about Rails methods? Every bit of Rails DSL in controllers and models that you're used to using would be an edge case. And there's a lot of them, right? There's after_commit, there's before_create, there's after_create, there's before_update, there's before_destroy, there's before_filter, except it's not called before_filter now. Now it's before_action.
around_save, validate, validates, validates_length_of, after_validation, validates_format_of, validates_cuteness_of, validates_confirmation_of, yeah, you get it. Makes me sad. And what about my own DSL? In ManageIQ, we actually have our own virtual column implementation for Active Record. This digs deep into Active Record internals to allow us to treat any method as a database column, amongst other things. It's mainly used for reporting purposes, reporting attributes of entire tables with extra attributes sprinkled in. So for example, you could have a class Disk that has many partitions, and you might define a virtual column called allocated_space with a type of integer that uses those partitions. So we actually add a DSL to Rails models in the form of these virtual column calls.
The point here, though, is that this DSL and the Rails DSL calls aren't that difficult to handle. It's just another edge case. All of these methods look essentially the same: most of the time, the arguments are symbols naming methods to be called. So we can go in and record them all as called, basically. The point that I'm trying to make with all these little edge cases is that, as with most things, with the right tools the job isn't very difficult and customization is easy.
The other thing that's easy is that you can execute this code on your project right now. There is a Ruby gem called Debride. Now, the author of this gem told me last night that it's pronounced Debride, but apparently Gorbypuff's eye doctor pronounces it as Debride, so we'll go with that. To debride something is to remove dead, contaminated or adherent tissue and/or foreign material, which sounds a lot like deleting useless code. So when I first thought of programmatically finding dead code to delete, I went down the exact same path we just did, starting with Racc and RubyParser. I then discovered the lovely simplicity of processing S-expressions with a gem called SexpProcessor.
It was created to easily do generic processing of S-expressions given by RubyParser. It also provides a MethodBasedSexpProcessor subclass that does the method and class tracking that we did with our MethodTrackingProcessor today. Then, to my great delight, I just stumbled on Debride, which does exactly what we did today with our dead method finder. What's more, everything you see here on the stack, from RubyParser on, is written by the same person. Each one of these projects is written by Ryan Davis of the Seattle Ruby Brigade, and you people are in serious luck, because Mr. Ryan Davis is here at this conference. In fact, Ryan, are you here in this room? There you are. Can I get a round of applause for Ryan for all the fantastic speakers? Thank you.
So I've been hacking on Debride off and on for the past several months, customizing it for ManageIQ and finding crufty code to delete on a project that started on Rails 1.2.3 nearly a decade ago, and thought it would be fun to rebuild the basic concepts for you today. All of the code you've seen is a modified and minimalist example of ZenSpider's very excellent work.
So what does Debride provide that we haven't covered today? Well, it covers more edge cases. Our simple method tracking processor and dead method finder are the core of what Debride does, but there's so much more to consider. What about finding methods on singleton classes? What about numerous other bits of uncommon Ruby syntax, like calling methods with colon-colon, all those little edge cases? Ruby is a crazy flexible language and there are many cases yet to be handled. Debride also adds all sorts of lovely little options like excluding particular files, whitelisting methods based on a pattern, and focusing on a particular path. There's a Rails mode, just like the one you saw earlier. So I mentioned I've been hacking on Debride to find even more dead code.
Now, besides adding your own DSL, whitelisting patterns and all that, you can easily do something like this to find other criteria for what might be deletable code. So remember what I said about methods that are maybe way too context specific, that might have only one caller? Here's a really hacky way to find those methods. Instead of keeping a set of called methods, we can keep a hash with every value being a call count. Then every time we encounter a call, we just increment that count, and then the ones that are called once are the ones that are called once. Perfect. So those are, again, cases where a method might not even really need to exist. Maybe it's something that's way too specific, and the caller itself could actually just handle it there. It just depends on what you're doing.
So I've been busy enough hacking and deleting code that, although I've opened a couple of little things on the project, I've still got plenty more that I wanna refine and then share. So pushing more work upstream is definitely something I'm considering for the future. Yes, please. Ryan says yes, please.
So remember, tools like this are but one tool in the toolbox when finding ways to clean up your code base. Today, we looked at one way: parsing and statically analyzing Ruby itself to find potentially uncalled code. But there are so many more awesome tools to use in combination. For example, there's Old Code Finder by Tom Copeland, which is a Ruby gem that basically checks code content by date and authorship in Git. So maybe you have someone named Fred who used to work at the company and hasn't for many, many years. Well, you might wanna take a look at his code specifically, because there might be more stuff that you can get rid of. There's also unused, by Josh Clayton over at thoughtbot. It's written in Haskell. It utilizes ctags to statically find unused code, so it's not particularly tied to one programming language. And I am a huge, huge fan of using ctags.
I have not looked into that yet, but I really, really want to. Someone mentioned, I think last night, that there's also a library called Scythe. I hadn't heard of it. I see a couple of nods, though. That is also one of the ones you should check out that I forgot to add.
So before I go, here's a parting message for you. If you used Merb before it was merged into Rails, you might recognize this: no code is faster, has fewer bugs, is easier to understand, and is more maintainable than no code at all. So when you go home from this awesome conference, delete some code. It feels fantastic. Thank you.