 Hello. Welcome. This is Get Driven Refactoring. My name is Ashley Alice Pierce. I am an application engineer at GitHub and I'm on the billing and payments team. You can find me on Twitter at ALS Pierce if you'd like to continue this conversation afterward. So as I mentioned, this talk is going to be about Get Driven Refactoring. But first, we should talk about what refactoring is for anyone not familiar. Refactoring is changing the structure of code without modifying its behavior. It's cleaning up your mess. And of course, for this to be the case, this implies that there is indeed a mess to clean up. And we've all been there. I don't know anyone whose code base is always perfect, despite the best of their intentions. We humans are imperfect. And unlike computers, we don't always do what we're told. And so even though we know better, we make mistakes. But the good news is that once we know what mistake we've made, we can find out exactly how to fix it. Knowing where we went wrong leads us to where we need to be. How? Well, luckily, we're not the first ones to ever write software. It's been done before. And smarter people than me have gone ahead and identified common places that we often go wrong. And they've put names to those common problems. And every common design problem has a specific refactoring technique to address that problem. Once you know what problem you have, you can easily do a search and find the recommended refactor. Now, for that reason, this talk is not going to focus specifically on how to correct your code, but more about that very first step of identifying exactly what problem it is that we have. For a great talk about how you can use design principle violations to find out exactly how to fix them, I recommend Sandy Metz's Get A Whiff of This. This talk goes in detail about matching up code smells with their prescribed refactoring to fix them. But again, here I'm focusing on the first step, how to find the design problem to begin with. And so that brings us to the solid principles. The solid principles are five design principles for object oriented development laid out by Robert Martin and intended to make software more flexible and maintainable. So let's just run through what these principles are. The first is the single responsibility principle. A class should have only one reason to change. The next one is the open close principle. A class should be open for extension, but closed for modification. Then comes the list of substitution principle. Every subclass or derived class should be substitutable for their base or parent class. And then the interface aggregation principle. No client should be forced to depend on methods it does not use. And then the last one is the dependency inversion principle. Entities must depend on abstractions, not concretions. If you follow these principles, your code will be easier to understand and easier to change. Instead of being this tangled mass for one-changed cascades and causes a change everywhere else, your code will be flexible. It will mold easily around new requirements. It will be maintainable and allow you to adapt it however you need to meet the changing needs of the business. So I told you what solid is, what each of the principles are, and why you should follow it. And I told you earlier that once you can identify what the problem is, there are already curative refactorings out there that you can just follow and apply. So that's it. Done. Easy, right? Well, maybe not quite. I said before that identifying our mistakes leads us to fixing it. But just because you heard these rules doesn't mean you understand them completely and can always identify when you're in violation of them. I bet you're a guess that there are some of you in this audience that have heard these design rules before and yet still you're here. Probably because you've had trouble recognizing these patterns in real life. So that's what we're here to talk about today because I've had that same problem. Some of the examples in this talk are going to be using Git on the command line and some will be with PRs on GitHub. These are, of course, two separate tools, but I often use them together in conjunction to help me achieve my refactoring goals. But these tools aren't usually thought of as the first thing one would look at to help you refactor. So you're probably wondering how or why I used them. As I mentioned in the beginning, I work at GitHub now and since this talk focused a lot on GitHub, many people will assume that I formed this talk as a result of working there. But I didn't. The beginnings of this talk started to form for me while working at Thoughtbot. Thoughtbot is a software consultancy and they're well known for their ability to help companies write clean code. They have whole courses on refactoring that they teach and this is one of the primary things that companies come to Thoughtbot for. So I started at Thoughtbot as an apprentice and I was in general very green. I had only worked at one company previously for about a year and so although I had done all my homework, I wrote a ton of books on refactoring and clean code except from Sandy Metz and Martin Fowler and all these big names, I didn't have a lot of experience putting that theory into practice. I was writing code that I thought was clean and that didn't seem to me to obviously be violating any one of the many object oriented design rules that I had memorized and yet other developers there were much more able to recognize in the code patterns that I just didn't see at all. I'd submit my own code and think it looked great or I'd review others and say awesome, great. And then later down the road, it's still grown to this mess. And eventually what I came to realize was that by only looking at the actual lines of code in my editor, I was doing myself a disservice because you can't recognize design patterns in isolation. You need context. Otherwise, it's like you've been studying for a multiple choice quiz, but suddenly there's essay questions. We need to know not just what the rules are, but the context that will be expected to know them in. And for me, because Git is where all of my code lives, Git is that context. So instead of looking for possible design variations or areas that might need refactoring by looking at the code directly, I turned to Git and started looking for stupid simple ways that Git might tell me when a class was in need of refactoring. So that's why all the tips that I gave you today are not going to be some new crazy Git or GitHub feature that you've never seen before. It's about saying the things that we work with every day, like the Git log, PRs and comments differently and viewing them through a lens that will help you on your goal towards refactoring and writing cleaner code. So let's get back to the solid principles and see if we can use clues in Git to make these principles make a little more sense. I'll explain each of the principles briefly as we go through, but I want to be clear that the goal of this talk is not necessarily to be an exhaustive exploration of solid. My goal here is to simply use solid to exemplify how we might use Git in general to find design principle violations and refactoring opportunities. Hopefully, this will get you thinking about how you can use Git as a lens to view whatever software design principles are important to you. So as I mentioned before, the first solid principle is a single responsibility principle, which again means that a class should have only one reason to change. When I hear this, I'm like, wait, I'm confused. I don't know about you, but my classes change for lots of reasons. Maybe there's a new feature to add or I want to improve some bit of code. But when I was really trying to figure this out and see which of my classes, if any, might violate this principle, I said, okay, I have to take a step back and really think about it. Why has my class changed recently? Because ideally, you never would have written any code that violated the single responsibility principle. But if you're like me, you probably thought everything was going great when you wrote it, but down the line, something starts to feel off. And you can't really pinpoint what it is, but your code gets harder to change and you start looking for ways to refactor. And this question is a great place to start. You have to really know the history of your classes to know what might have gone wrong with them. And luckily for us, there's a place specifically suited for that. The Git log for that class. This is where I start when I really want to understand if a class I have might be violating some design rule. And it's the perfect way to see if your class has violated the single responsibility principle. This is the commit history for my engine class of my Tesla Motors company. And here's the code. You don't have to squint and try to read it. But with a cursory look, it seems pretty reasonable to me. Lots of small, well-named methods that do one thing. An engine start method that takes a true or false as to whether we're starting the engine remotely through some sort of app or the fob. A delay start so you can set the time that you want to start your engine, like in the morning at 9am during the winters that it's all warmed up by the time you get in. An idle method that decreases the power output of the engine. An ill-assist that increases the power output when we're going up hills. And then we just define some devices that are allowed to start the engine remotely. Seems reasonable. But this class has been seeming a little off to me. I feel like it needs a refactor, but I can't really pinpoint what's wrong with it. But I know just looking at the code, I sometimes miss things. So I'm going to take another look at the commit history for the class. It seems like there have been a lot of reasons this class changed, mainly adding some new functionality. But I'm going to try to take a step back and see if I notice any patterns. And indeed as I look at this, something does begin to jump out at me. I have a lot of changes in this that are related to the remote start functionality. But then of course, some that don't seem to be related to that at all. And really, the more I look at this, the more I realize, you can group the commits into two logical groups. We have commits that are related to the power output of the engine, like increasing the power adjusting its frequency and adding hill assist and idle functionality. And then of course, we have all the remote start functionality. So initially, when I looked at this, I thought there are a bunch of reasons the class had changed and that identifying one was impossible. But really, when you lump these changers into larger categories, you can see that the class has two reasons for change. A change in the power output and a change in the remote start functionality. And suddenly the way forward towards your factoring seems much easier. Just by using the Git log to understand what was really changing about my classes. Now I know that I probably need to make remote start its own class. Having this as a separate class will make the code easier to change and keep these responsibilities separate. So now I don't have to worry about every little change to remote start affecting how the power output is determined by the engine. Now obviously, for this to work, you need to be writing commit messages of a certain type. You need good commit messages that serve documentation of the important things that change in your code and why. If you have a bunch of web commits or commits for trivial things like adding white space, that is not helpful. Because things like add white space add nothing to the overall story of your code. That's documentation about what changes you were going through in your process as you were writing the code. Not documentation about what changes the code was going through. You can write commits like this as you're working, but before you go to merge that in, you need to rebase or squash merge your commits to have good commit messages. Embrace the full power of Git to help you refactor your code by using a way that makes that easy. Help Git help you. I've heard the arguments of some people that don't particularly care about having nice commits, and a lot of them say they use PRs more as their code documentation, but there's no easy way to see all the PRs that touch one class. With your PRs, you can't get quick context locally on your machine or in your editor like you can with the Git log. In my opinion, that's reason enough for me to use commit messages that are clear and concise until the overall story of my work. The next solid principle is the open-close principle. A class should be open for extension, but closed for modification. This is one of those principles that sounds really simple, but was actually really confusing to me, because I didn't understand how you could extend something without modifying it. It's like saying you should be able to change this thing without actually changing it. So I didn't really get it. I read so many books and blog posts and articles on this, and I really felt like I understood the concept in theory, but I feel life of me could not figure out how to put it into practice. I never once looked at a class and said, you know what? This is not closed for modification. I would review my co-workers PRs and try to guess whether this class would ever need to change in the future and if that change would be easy to make, but I just didn't fully grasp the principle enough. So let's take a look at some example code and some example PRs on GitHub to see if we can make some sense out of this concept. You don't have to try and read all this code, but this is a class that checks for engine health, and so mainly it's concerned if there are any electrical shorts in the engine. So this PR gets submitted by one of my teammates, and it's my job to review it. So I'm like, ship it, it's great. And then the next requirement comes in, and it's a new feature request. They want to add something to the check engine class. They also want the engine to check that all the spark plugs are connected. So someone submits a PR and it looks like this, and then another feature request comes in, more change, they want to check something else, they want to check the alternator, so my co-worker adds their next feature and the PR looks like this. And without even having to read the tiny, tiny code, we can notice something about this PR and the last one. What do we see here in the diff? This thing has more red and green than a Christmas tree. We're adding a feature and all the new lines are in green, and yet we also seem to be changing or removing a lot of code as well, as indicated by the red lines here in the diff. The phrase, your class should be open for extension, but closed for modification, really just means that you shouldn't have to change existing code to add new functionality, which to put in the context of GitHub and what you're actually seeing in your PRs means, when adding features, there should be no red in the diffs. If every time you're adding functionality, you need to change existing code in your violation of the open-close principle. Learning this was a mind-blown moment for me. It's so simple. Learning to look for this one indicator in my get diffs instantly made this principle makes so much more sense. I couldn't believe that I had been struggling and struggling to identify this theory in practice, and the answer was color-coded in front of me the whole time. If every time you need to add a feature to a class, you see red in the diff, you know you may be in violation of the open-close principle. So if you have a class that doesn't quite feel right, look back at all the PRs that have touched that class and were only meant to add functionality. If you're seeing a bunch of red, then this is likely your problem. In fixing this problem, making your class truly open for extension but closed for modification will make it so much easier to maintain in the future. All that red in the diff indicated extra work that someone was having to do to shoehorn the new feature in and make it fit. Now that we've recognized that the problem is an open-close violation, you have the opportunity to fix that class and make sure no one is having to do all that extra work again in the future. Now in the last section, we talked about how for this to work you need to have solid commit messages, and this one comes with a bit of caveat as well. Consistent style matters. For this to work, you have to have a solid style in place. I used to think that having a set style guide was stupid and that it didn't really matter, but then when I started to realize how powerful the tool get can be for recognizing opportunities to refactor, then style became much more important to me. If in addition to adding a feature, you're changing around the syntax from symbols to strings or rearranging the method order to be alphabetical, changing the hash syntax or any of the other silly things developers fight about and feel very strongly about, then you're muddying up your diffs and you make it really hard just in general to review your code, but it also then makes easy and simple tricks like this useless. You've taken all the ease out. Reviewing this PR is going to be a pain in the butt. Programmers are lazy, keep things easy, keep your style consistent. The next principle is the Liskov substitution principle. Every subclass or derived class should be substitutable for their base or parent class. Set another way, this is you should be able to call the same method on a subclass as you could on a parent and get a compatible result. If the superclass returns a string, the subclass should also be able to handle that method and return a string. Now, there are a lot of languages where this wouldn't really be a problem and the language itself will use some checking of this for you, but I program in Ruby and Ruby has no such guarantees. So, despite all of our best intentions, I or someone else in my team might end up writing something like this. Here we have a vehicle class with the turn on lights method. In its car subclass, we actually don't want to turn on all the lights, just the headlights when this method is called. And for a bike, we don't have any lights unfortunately, so we raise a not implemented error. And this is of course an LSP violation, because you can't call turn on lights on bike and get a compatible result to what you would get on vehicle. But you know, people forget and try as hard as we might, we're not perfect all the time. So I think this is great and I push this code up to GitHub and open a PR. And my code will of course be reviewed by a teammate, but my reviewer is also not perfect. So I don't want to rely on them to always have to catch things like this. But thanks to the Magic of GitHub webbooks, I don't have to. When I open the PR, I could get this. An automatic comment from our stylebot that this might be an LSP violation. These automatic checks are a fantastic way of ensuring that your code stays clean. You can write your own stylebot like this to check for whatever design principles you want. Provided there's something easily identifiable in the code that Kong and not implemented on a subclass. Plus there are linters I count from Thoughtbot that I'll check for more general styleguide violations and every day more powerful tools are coming out to analyze your code. So if you've never thought about writing custom webbooks, I really encourage it. Internal to GitHub, we even have things like this to check for certain changes in the database files to warn us away from making those kind of changes or letting us know that we may need to get the database team involved. Possibilities are endless and this can be a great tool for refactoring. Issa and respond to are also some things that you could check for with this. Obviously there are places where these things might be called where they're not specifically an LSP violation, but they're usually a code smell, especially in a duck type language like Ruby where the class of a thing shouldn't matter. Calling attention to these things through the use of a bot makes the person who submitted the pull request stop and reconsider their code. Is this not implemented or really necessary? Is there a better way than having to check the class of the object? They probably knew when they wrote the code that it didn't feel quite right, but it may have seemed like the only way at the time. Having a bot publicly call attention to this thing that you did that already felt kind of vicky is usually pretty powerful motivation to take a second look at it. And for reviewers, this also gets them thinking about it as well. Maybe they normally would have skipped over or not noticed it in their review, but now suddenly they're also thinking about different ways you can design the code to avoid this. And they can chime in with suggestions to help you fix it. Don't rely on your human reviewers to always spot things like this off the bat because even if they see it, they might not want to mention it in a comment because they assume the best of you and they assume you probably really needed it. The bot makes no such assumptions and it will force you to rethink. Wherever possible, let your bots do the nagging. They can see things a lot more clearly than we can. After all, when you write a bot like this, you're writing it with the sole focus of helping you improve your code when you're much more focused on that task than you normally are. As much as we'd all like to say that we're always focused on code quality and improving our code, we've all got a million things on our mind day to day in our jobs. Sometimes it's hard to see every aspect of a pull request that you're reviewing clearly. Automate away the things that can be automated. Avoiding LSP violations like this makes your code able to adapt and grow much more easily. The next principle is the interface segregation principle. No client should be forced to depend on methods it does not use. The classic examples of this are in type languages like Java, where you have actual interfaces that define some sort of behavior. So let's switch to Java to see an example. Let's take, for example, the aircraft interface. This says that anything that implements the aircraft interface, anything that wants to be an aircraft, must implement these methods. Take off and retract landing gear. Here we have an air traffic control class and it takes anything that implements the aircraft interface that we saw earlier and it tells the aircraft to take off. Well, what if we have a helicopter as a type of aircraft that implements this interface? It should still be able to interact with air traffic control because it still needs to take off, but it doesn't need to retract its landing gear. So this is an interface segregation violation. Air traffic control depends on everything that it interacts with having a retract landing gear method, even though it does not use it because the aircraft interface defines it. Another way that this principle is often explained is by saying that many small client specific interfaces are better than one large general purpose interface. Instead of having one huge aircraft interface that does lots of things, as we saw, it's better to have a small interface that only does the things that air traffic control needs. So I said the typical examples in Java, but can you see ISP violations in non-type languages like Ruby? Let's take another look at the air traffic control class, this time in Ruby. This time when you initialize the class, you pass in all of the aircrafts that we'll be dealing with and we have a method called today's aircrafts that finds the flights that are supposed to go out today by checking where the flight date is today. In Rails, this where method is an active record method, meaning our list of aircrafts here must be an active record relation, a list of active record objects straight from our database. So even though you don't see it clearly defined here, air traffic control really relies on the active record interface. If we were to pretend that Ruby is typed and do a little like Ruby Java bastardization, it would look like this. With an explicit requirement, the aircrafts implement the active record interface, which is really not that different from the air traffic control class in Java that we saw was in ISP violation. What if you wanted to pass an aircraft from a CSV instead of an active record relation of aircrafts from your database? You wouldn't be able to because there's an assumption that the aircrafts are an active record relation. Of course, you could take advantage of Ruby's duct typing and just define a WARE method on CSV aircrafts. If we define WARE on our CSV aircraft, it's no big deal that in the beginning we assumed aircraft was going to be an active record relation. But the point is that that assumption can get out of hand really quickly. First, you're calling aircrafts.WARE outside of the aircrafts class, then aircrafts.FIND, then aircrafts.Order, aircrafts.Having, and then aircrafts.Group. And before you know it, you basically need to reimplement all of active record on your CSV aircrafts class to be able to use it with air traffic control. Because you wrote the air traffic control class, assuming that aircrafts would always adhere to the active record interface. And this is, of course, not saying you shouldn't use active record within the aircrafts class. Calling active record methods inside of the aircrafts class is completely fine and actually what you want. A model that's backed by active record can, of course, use active record methods. But just don't let that leak all over the rest of your application into completely unrelated models like air traffic control. Air traffic control should not have to know that aircrafts is an active record backed model. So you can violate ISP and Ruby. Or at least you can violate the spirit of ISP and Ruby by relying on huge general-purpose interfaces like active record instead of small client-specific interfaces like those methods that you've defined on your model. Because whether we're in Java or Ruby, the statement that small client-specific interfaces are better than one large general-purpose interface still remains true. All of that being said, whether you're using a type language like Java or a dynamic language like Ruby, using a large general-purpose interface isn't always a huge deal. Or at least lots of times it's likely not the biggest issue with your code. It's about balance. You don't want to rely on lots of huge interfaces or have tons of useless tiny interfaces. You want to be able to take advantage of Ruby's duck typing and implement your own version of one small method but not re-implement all of active record on a non-active record model. So do you get to help you find that balance? I would use the Git log again here to see if or how big of a problem I have. Instead of just using it to identify different things that were changing in a class like we did for the single responsibility principle, we're going to use it here to see how often our class is changing. And when change happens, how often is it related to some sort of ISP violation? If you have a class like this, where you're constantly adding additional methods trying to imitate some other interface, then you probably have an ISP violation. The last solid principle is the dependency inversion principle. Entities must depend on abstractions, not concretions. Here we have a car class with a play radio method. And inside that method we're calling fmradio.play. This makes fmradio a dependency of the car class. This class relies on fmradio's continued existence and you can't easily make cars with different types of radios. To be in line with the dependency inversion principle, we want to decouple this code. So to do that, we can practice the concept called dependency injection, where instead of calling fmradio deep in the internals of our car class, we inject it as a dependency right when we initialize the class. This is another case where you could benefit from an automatic style bot on GitHub to help call attention to this type of code when it gets added. Again, instead of depending on yourself or your human co-workers to tell you you did something wrong, use a bot. It's much more reliable and consistent and it won't allow you to skip buying code review without having to rethink your code. And at least try to reply here, explaining to the bot and your co-workers why you think this isn't a violation of the dependency inversion principle or why it's a worthwhile sacrifice to make here. And if you're just starting on your refactoring journey and you already have a lot of places in your code where you're coupling your classes to dependency, use history and blame to find out how often these classes or methods are having to change. So you can prioritize and make sure you're addressing your worst defenders first. Adapting your code to inject your dependencies into your classes instead of calling them directly in your code. Make your code more flexible and maintainable because you can now have cars that are equipped with all different types of radios. As long as they respond to play, you're good to go. Car is now less specific and a lot more reusable. So let's recap. How can Git help you refactor? You can use the Git log to group commits into topics and recognize when a class is doing more than one thing. You can keep an eye out for red and the diff when you're adding functionality. This can indicate a violation of the open-close principle. You can integrate tools like automatic style liners with your GitHub account to make code cleaner and easier to recognize problems. You can use the Git log and Git blame to find out how often classes or methods have had to change. This tells you where your worst defenders are most in need of refactor. By following all these tips, Git can help you refactor and achieve solid code. Your code will be more flexible and maintainable and adapt to change easier. Hopefully, for all of you here who are already familiar with the solid principles but had trouble applying them, these tips will help you recognize through Git how to find various solid violations and have shown you in the context of the work that you do every day what those violations look like. Thank you.