Our next speaker is Coraline. I had the privilege of meeting her two years ago at Madison Ruby. She's given lots of very awesome talks since then, and we're lucky to have her here with us today.

Mic's on, there we go. Okay, so I'm here to talk to you today about data-driven refactoring. Your cyclomatic complexity is going through the roof. You're in danger of flunking out of Code Climate. RuboCop has issued a warrant for your arrest. There's whispered talk of declaring bankruptcy on your technical debt. You're being asked to add new features and new features and new features, but you know that the underlying code base is very unhealthy. It's a sinking ship. So what do you do? You refactor. That's basically my job.

I'm Coraline Ada Ehmke — Coraline Ada on the Twitters — and I have the best URL ever, where.coraline.codes. You can catch up with all the stuff that I'm doing there. I'm a lead engineer at a company called Instructure, which is here in Salt Lake City; I'm based in Chicago. I lead a team that was originally called the refactoring team, which we renamed the developer happiness team, which is pretty awesome. Lots of happiness today — that's kind of cool. Part of the charter of our team is to make developers happy, so we had to ask ourselves: what makes developers happy? Writing good code efficiently and effectively, and feeling good about the work that we do, makes us all happy. So our mission is actually to make the code base a delight to work in. That's what the CEO told me, and that's the way I operate every single day.

Refactoring is one of the ways we can make our developers happier, so we looked at exactly what refactoring is all about. When I use the term refactoring, I mean it in a slightly broader sense than the Michael Feathers definition: I'm talking about refactoring systems, not just methods — entire applications, ecosystems of applications — while still starting at that low level of the method.

So we should ask ourselves first: why do we want to refactor? Maybe the code base is high friction. When you want to make a change in one place, you find yourself having to make changes all over. Maybe the application is becoming less performant, less clear, harder to test. Maybe we're dealing with a lot of heisenbugs — bugs that change their behavior when you actually look at them. When you have dense nests of conditionals, heisenbugs are sort of the natural progression of the code, and they're very frustrating to deal with. Maybe we have cognitive dissonance: when I look at a method within a class and you look at the same method within the class, we're not seeing the same things. Maybe the method has drifted away from its original intent. Maybe the entire class has drifted away from its original intent, and you're afraid to change it because you don't know what's going to break. Names are something that get set in stone, really hard to change later, even though things do change, and you end up in a semantic sort of shell game.

We can't afford to blow the code up. Much as we'd like to — we'd love to burn it down and greenfield every single application that we touch — that's just not going to happen. But at the same time, we can see a future where our code is not just functional and successful, but also elegant and beautiful. Beauty is a proxy for intuitiveness. Elegance is a proxy for maintainability. I believe that the primary drive of software developers is an aesthetic sense. We want to write code that is not only functional, but also beautiful. But managers — I cannot go to my manager and say, this code is ugly.
I want to make it into something beautiful. They don't really support throwing time and money at beauty. So we have to have some good practical reasons to refactor, some practical advantages that we can share. And really, if we don't set these kinds of goals and make these sorts of practical decisions, refactoring just turns into combing a wookiee. It's sort of the opposite of yak shaving. When you comb a wookiee, Chewie's gonna look better for a few minutes, but it's an exercise in futility. Refactoring without a plan is an exercise in futility.

So what are some legitimate reasons to refactor? Maybe we want to improve performance. Maybe that one controller action that generates a mile-wide object graph isn't the most efficient use of system time and resources. Maybe we can make some things that are currently synchronous asynchronous to improve performance. Maybe we can reduce the number of database calls we need to make. All of these things can have a very positive impact on the user experience, and those are very easy things for managers to understand.

We want to reduce the number of bugs in our code. We have a special relationship with bugs as developers, because bugs represent holes in our reasoning or holes in our logic. We're embarrassed when we find bugs, especially if someone else finds our bugs. But we shouldn't be. Every developer in this room — except me — generates bugs. They're a natural sort of thing, but the more complex the code is, the higher the proportion of bugs you're going to generate. So anything you can do to simplify will reduce that bug count. Again, that's something we can measure and something we can take to management as a good reason to refactor: this code here has historically generated a lot of bugs, and we want to target it with our refactoring efforts.

We want to reduce the cost of adding new features. As the core of our code gets more complex and more entangled, it fights back when we want to add a feature. Time to implement a feature gets longer and longer and longer. That's something you can measure pretty easily as well. Look at the number of features being added in each of your sprints. How many sprints is it actually taking to implement a feature? The more complex your code is, the longer that implementation time is going to be. And promising a shorter implementation time is a sure way to make friends on the product team.

As an industry, we're hiring a lot of junior devs. We're bringing a lot of new people into our field. Someone talked today about delivering on day one, and whether that's real or not, whether it's even a good idea. I think the sooner someone can start delivering really meaningful code changes, the better. If you have a very complex system with very complex methods and a tangle of class dependencies, it's going to take that much longer for a new person to understand what the moving parts are and get over the fear of even making a change. So anything you can do to simplify the code, and the relations between bits of code, is going to have a positive impact on ramp-up time for new developers.

Refactoring also, as opposed to burning it down and greenfielding, preserves the information that is encoded in the system. The systems that we've had for a year, two years, six years have a lot of edge cases embedded in them.
They have a lot of institutional knowledge embedded in them — probably not documented, probably not called out anywhere, maybe tested. We can't afford to lose all of that information about edge cases. We can't afford to lose the intellectual effort that went into accommodating all of the business processes that the code is fulfilling. So refactoring can help us leverage that knowledge by extending the life of our existing code. Refactoring also preserves all of the hours and all of the dollars that have been poured into a system to make it work. Presumably, if your company paid for you to come to this conference, they're making some kind of money, and you can't afford to tear down the system that's making them that money.

So I'd like to talk first about how we use tests to drive a refactoring effort — we've sort of established refactoring good, not refactoring bad. Most of the time, when we write tests, we're writing them for a few different reasons: we want validation that the code is working as we expected it to work, we want to use the tests to document and explore edge cases, and we want our tests to serve as documentation for developers who come after us. Refactoring uses tests for different reasons. Refactoring primarily uses tests to establish guardrails, to keep you from going off track in your refactoring efforts, and also to challenge and validate your assumptions about how the code works.

It's important to note, as I go over some specific testing strategies, that these are not tests you want living in your application long-term. Some of the tests we're going to write are generative, which is a bad word. Some of the tests we're going to write have flickering failures, which of course is a bad thing. But remember that these are throwaway tests. These are tests for a moment in time.

Michael Feathers may or may not have said: if you're refactoring without writing tests, you're just changing shit. I love that quote, and I'm going to attribute it to him anyway; hopefully he won't mind. But hey, if you refactor without tests, it's great, because no one can prove that you broke stuff. No one can prove you made it worse. There are no existing tests. Seriously: if there are no existing tests around a piece of functionality you want to change, you need to write those tests before you do anything else. And run them past someone senior on the team — someone more senior than you, maybe someone who's had more experience with that code — who can identify what they're covering and what they're not covering. In that way, you're documenting your assumptions about how the code is actually working.

One of the first testing methodologies I want to talk about is boundary testing. We're all, hopefully, doing unit and integration tests to some degree. With boundary testing, we're generating extremes of the input domain and using those as data to run through our tests. The most important values in boundary testing are nil, zero, one, and infinity. Those are the four boundaries we want to look at the most.

So let's say we have this really simple class, MyMath, and we have this multiply method which takes two values, converts them to integers, and multiplies them together. And let's say that we're really bad at math, and we look at that and say: that's a lot of calls to to_i, I think I can do better. What we're going to do instead is take those two numbers, multiply them together, and convert the result to an integer. What could possibly go wrong?
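Roughly, the before and after, and the kind of guardrail spec I'm about to walk through, look something like this — a sketch, not necessarily the exact code that was on the slides:

    # my_math.rb -- the original
    class MyMath
      def multiply(a, b)
        a.to_i * b.to_i
      end
    end

    # the "improved" version we want to put guardrails around
    class MyMath
      def multiply(a, b)
        (a * b).to_i
      end
    end

    # my_math_spec.rb -- a throwaway guardrail spec, run with: rspec my_math_spec.rb
    RSpec.describe MyMath do
      # capture the original algorithm in a lambda so we can compare against it
      let(:original) { ->(a, b) { a.to_i * b.to_i } }

      it "produces the same results as the original algorithm" do
        matches = 1000.times.count do
          a, b = rand * 100, rand * 100          # random inputs from the domain
          MyMath.new.multiply(a, b) == original.call(a, b)
        end
        # this fails: for a = 2.5, b = 3.0 the original gives 2 * 3 = 6,
        # while the new version gives (2.5 * 3.0).to_i = 7
        expect(matches).to eq 1000
      end
    end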
So in testing this — and this is sort of a guardrail test — one of the tactics we can use is capturing the original algorithm in a lambda inside of our test file. Then we write our example. We want to test that our refactored code produces the same results as the original code. So we're going to run it a thousand times. We're going to generate two random inputs that are within the domain space, call the new algorithm and capture the value, call the old algorithm and capture the value, and count the number of times they're the same. And we expect it to be correct a thousand times out of a thousand. But math is hard, so of course we get a failure. And that's good. That failure is good. That failure tells us that our assumption about how the code worked was wrong, and that's something we can go back and address.

One of the other things we want to do in our boundary testing is test values that are just outside of the boundaries. In this case we're going to throw some different values at it: nil, an empty array, an empty hash, a negative number, and an alpha character. We're again testing to see whether it's going to fail in the exact same way. One thing we can do is wrap a begin/rescue/end block around the original call; we capture the value if it succeeds, or the fact that an exception was thrown if it fails. We do the same thing with our new method. Then we can write a test that takes that array of boundary values, calls each of the two implementations, and looks at what the output was — whether it was an actual value or an exception. And again, we want them to be handled in the exact same way. We've written our generative test to be verbose, so it will tell us the two cases in which this fails: with nil and with the alpha character. So that's an example of boundary testing, which, again, gives us guardrails. These are not tests you want to keep around. These are tests that are guiding your refactoring efforts, and when you get it all right, you can throw them away.

The second type of testing I want to talk about that's valuable in our toolkit as refactorers is attribute testing. Attribute tests check that, after some series of actions has taken place, the state of the object is the way you expect it to be. So let's say we have this Coin class. There's an attr_reader for face. There's a toss method, which calls some mysterious thing, this Random.is thing that someone was way too clever in naming. We're giving it an array of heads and tails, and my presumption, looking at this, is that Random.is is some sort of improvement on selecting a random value from an array. But when I'm looking at the Coin class, and I want to refactor the Coin class, I don't really care about the internals of that Random.is method. What I want to test is my assumption about how that thing works in the context of this Coin class.

So here's a test that tests that exact assumption. We want to make sure that when we call toss on a coin, it sometimes returns heads. Basically, a thousand times we're going to call toss, we're going to count the number of times it comes up heads, and we expect it to come up heads more than 400 times. Now, you might be thinking: this is a flickering-failure test. Yes. Doesn't matter. It's not going to stick around; we're throwing it away once we're done validating our assumptions.
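Here's roughly what that looks like — a sketch, with a plain Array#sample standing in for the mysterious Random.is helper, since its internals are exactly what we don't care about here:

    class Coin
      attr_reader :face

      def toss
        # in the real class this calls the too-cleverly-named Random.is helper;
        # for this sketch, sample is a stand-in that picks a random element
        @face = %w[heads tails].sample
      end
    end

    RSpec.describe Coin do
      it "sometimes comes up heads" do
        coin = Coin.new
        heads = 1000.times.count { coin.toss == "heads" }
        # deliberately loose -- this is a throwaway, flickering-by-design test
        expect(heads).to be > 400
      end
    end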
We also want to test that it doesn't always return heads, so we expect it to come up heads fewer than 600 times. This test passes, we've validated our assumption about what that mysterious Random.is call does, and we can go ahead and refactor the Coin class safely.

Another interesting tool to use in a refactoring effort is what's called an approval test, also called golden master testing. I use a gem called approvals, which was developed by Katrina Owen, and I find it very, very helpful. You simply require and include it in your spec helper if you're using RSpec. So let's say you have a class called Drink, and it has a single attribute, name, which it takes when you initialize it. Inside your specs, we can write a test like this: when I initialize a new drink with the name absinthe, I want the approvals gem to verify that the drink has the name absinthe. The first time you run it — and you run it from the command line just like this — it's going to tell you it doesn't know anything about what the golden master of a drink is supposed to be. So it asks you whether you want to approve the current state of the output, the serialization of that object, and you say yes. Now let's say we go back to this Drink class and we add a strength attribute to it — absinthe is pretty damn strong, so we want to track that. When we run our test again, the serialization of that object has changed, so approvals will tell us that something about this output has changed: do you want to accept the change, or do you want to treat this as a failure? So if you're testing interfaces between classes, if you're testing the shape of the model, the shape of the data you're generating, approvals is a great way to tell yourself automatically when something has changed somewhere else that may have a ripple effect elsewhere in your system.
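A rough sketch of that drink spec, assuming the approvals RSpec integration is required in your spec helper — the to_h serialization here is my own stand-in for whatever was on the slides:

    # spec/spec_helper.rb
    require "approvals/rspec"   # mixes the verify helper into RSpec examples

    # lib/drink.rb
    class Drink
      attr_reader :name

      def initialize(name)
        @name = name
      end

      # something stable to snapshot; when we later add strength, this changes
      def to_h
        { name: name }
      end
    end

    # spec/drink_spec.rb
    RSpec.describe Drink do
      it "keeps the shape we approved" do
        # first run: approvals doesn't know the golden master yet and asks you
        # to approve the output; later runs fail whenever this output changes
        verify { Drink.new("absinthe").to_h }
      end
    end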
So the testing we've talked about up to now is about testing our assumptions about how things are working. Once we validate our assumptions, question them, maybe come to some different conclusions, we're actually ready to get started on refactoring. But remember what I said: if you're refactoring without a plan, you're combing a wookiee. If we're committing to an objective improvement in the code base, we have to have objective measures that we can use to tell how well we're doing.

One of the first measures you can gather — it's really, really easy — is to time your test suite. And this will make you very unhappy. It will make you cry. It will make you weep tears and gnash your teeth and beat your breast and pull out your hair. What we're going to do is commit to making that test run time better, and that's a really, really easy metric for telling whether you're making progress or not.

Another interesting metric is the feature-to-bug-fix ratio. Look at the sprint planning that you do. How much of your sprint plan is devoted to fixing bugs versus implementing new features? That's a ratio where moving it even a little bit has a dramatic impact on the overall quality of your code base and the responsiveness of your dev team.

You can use code metric tools — static analysis tools — to establish a baseline of code quality and monitor it over time to ensure that things are getting better and not worse. For example, we may commit to reducing code duplication or coupling, but if you don't measure it first, you have no way to prove that you've actually made the code better. Over time is key here, and I'm going to talk about over time a little more in a minute.

You might want to generate a code smell atlas. There's a wonderful gem called Reek, by Kevin Rutherford, that will generate a catalog of code smells in your application, like Prima Donna Method or Intimate Knowledge or these other really cleverly named sorts of things. You can capture what those values are, and you can create a refactoring plan that says: I'm going to go after every smell of this particular type across the entire code base. That gives you an efficiency of scale, in that as you learn to fix a particular problem, you can apply that solution in multiple places before you move on to some other tactic and forget what you were doing.

Most importantly, ask your developers where their pain points are. We actually did this at Instructure. We sent out a survey. We had questions about different aspects of the code, and we asked: is this causing a lot of pain, a little pain, do you want to gouge your eyes out, et cetera? It gave us a lot of insight into problems that were actually standing in the way of developer productivity. We also got some useless data — we found out that someone wanted chair massages and someone else really liked turtles — but most of the data we got was actionable. So ask your developers what their pain points are, and make sure you're concentrating your efforts on the areas that will make them happiest.

And your metrics may vary. What is important to me and my team and the engineering organization at my company is not necessarily going to be the same as at yours. A financial services company is going to care more about accuracy and precision than about performance, necessarily. If you're writing games, you care more about speed than accuracy, and so on. So the metrics you derive, the metrics you want to track and work on, are going to vary from organization to organization.

So we have a lot of metric tools at our disposal as Rubyists. I think some of them are great — a lot of them are great — and some of them are problematic, either in themselves or in the way we use them or use the data. One of the central problems with a lot of our code metrics tools is that they provide isolated snapshots of data. They tell us where our code is at this precise point in time, which is great for identifying this file as a problem spot: this file has a high churn value, this file is generating a lot of bugs, or this file has a lot of to-dos in it. But with a refactoring effort, you have to be able to track over time, to demonstrate to yourself and to the powers that be that you're making things better. So snapshots aren't really going to help us in that regard. They give us good starting points, but they make it really hard to track over time. Basically, they suffer from a lack of perspective. I'm going to skip ahead.

Okay, I also have a problem with tools that give you a letter grade. Thank you. If I have a developer on my team and she looks at Code Climate and a class has an A, she would be within reason to say: I don't need to look at that class. It's as good as it's going to get, because what's better than an A?
At the same time, that developer looks at a class that has been given an F and may throw up her hands in despair, saying: there's no way I can tackle something as large as a class with an F grade. I think that letter grades are not actionable.

Another thing I'm not super happy about with Code Climate is that it uses an assignment-branch-condition algorithm under the hood to measure code quality, but it also analyzes for constructs that are considered complex. This is really, really good, like I said, for figuring out where problem spots are. I find opinionated tools a little less good when you want to track progress over time, because what I prefer to have is the raw data that lets me impose my own opinions on it. Especially if the metric is a little obfuscated, or you don't clearly understand what the tool is telling you, you don't necessarily know what the data you're getting from it means.

One example: Code Climate will never give this class an A. This class does nothing yet. It's a Neo4j model. With Neo4j, like Mongoid, you have a declarative schema, so inside the file we're calling property to define the schema for this model. And the algorithm Code Climate uses for code complexity sees each of those as a class macro, and it's going to punish us for complexity accordingly. So with tools like Code Climate, you want to look at the tools you're using and ask: are they favoring one sort of ORM over another, are they favoring one coding style over another, and does that line up with what you're trying to achieve in your refactoring effort?

Test coverage. I have a lot of opinions about test coverage, and I want to tell you a little story. So, does anybody work for this guy? We're conditioned through Dilbert to recognize the pointy-haired boss, and we have plenty of tactics for dealing with the pointy-haired boss and keeping the pointy-haired boss at bay. This guy is a little more dangerous. The bearded boss knows more than the pointy-haired boss and can call us out on our BS.

So I had a bearded boss. We'll call him Jason, because, I don't know, Jason. And he insisted that every new feature, every commit that we made, had to have 100% test coverage. Every commit. So I was working on — and this isn't the actual project — I was working on a project and I was spiking. I didn't know what the final interface was going to look like. I didn't want to write tests that tested the boundaries between my classes, because my interface was shifting on a day-by-day basis. I was spiking. I was experimenting. And I wanted to commit the changes I was making so that other members of my team could see what I was doing and provide feedback on it, through code reviews and through other processes.

Even in general, even if I were not spiking, I would be suspicious of code coverage above about 85%, because I think that if you have code coverage at 100% at all times, it's an organizational smell. But I had to eat this mandate to produce 100% coverage at all times, and it made me a very sad panda. But I'm also an evil code monkey, so I got a great idea. And I'll tell you how to improve your test coverage completely with only ten lines of code. Are you ready? We start by monkey-patching SimpleCov::SourceFile and finish up with five lines in SimpleCov::FileList. Boom: 100% test coverage. And I am a happy panda.
However, we are dealing with a bearded boss. The bearded boss is going to figure this out. So what's our strategy here? This is Ruby. The answer to many problems is to introduce a layer of abstraction. I give you Covenor, a gem that will guarantee you 100% test coverage at all times. 563 people, as of last week, thought this was a great idea. So you add it to your Gemfile, you bundle. You require Covenor, which sounds nice, right? It sounds like it's enforcing something, like a governor. You require it in your spec file, your specs run at 100% coverage, and your bearded boss is really happy and throws you a little party.

That example, of course, is kind of frivolous, but my point is that not all tools and not all metrics are telling you what you need to know, or telling you what you think they're telling you. Not all of them are applicable at all stages of development, either. Not all of them are applicable to a refactoring effort.

So what are some useful tools that we can use as part of a refactoring effort? I picked on Code Climate earlier, but Code Climate does produce some valuable data. So I'll say: Code Climate, sometimes — and I'll give it a B. This is one of the main graphs in Code Climate, and I want to deconstruct it a little bit. The GPA over time is the only actionable metric that I see on the Code Climate dashboard, because it tells me whether things are getting better or worse. The resolution is such that I can guess — maybe I can break out a ruler on screen or something — I think it's getting a little better. I'm at a 2.79; at least I have a baseline, at least I have something I can measure against. And I sort of have a graph that shows change over time, so maybe that's good.

This thing, I don't know. I don't know what's going on with this. It's a rainbow that's supposed to make me happy, I guess. It's very green — green is good. I don't know how to read this. Of course I know how to read it in the sense that it's telling me something is an A, B, C, D, or F, but there's no way to extract data out of it. There's nothing actionable there. All I know is I have a lot of green and some other colors thrown in there. Only drastic changes are going to move that graph at all, and if I'm making drastic changes, I'm probably breaking stuff.

And this graph is my least favorite. Even though I really hate rainbows, I hate this one more. The story that this graph tells me is — ready? — code that changes a lot may be low quality. Maybe. It doesn't tell me which code is changing a lot. It doesn't tell me what that quality is. But there's definitely a correlation. Thank you, Code Climate. So Code Climate does produce some actionable data, but I think overall there are some problems in the way the metrics are displayed, and probably calculated as well.

Speaking of code that changes, churn is a great tool. How many people use churn? Anybody? Yeah, a few people. Churn is a great tool. You run it from the command line like this — you point it at a directory — and it gives you JSON output; there's a JSON configuration flag. It tells you, for every file, how many times that file has changed. This is really cool, because you can identify areas of your code base that are changing frequently and ask yourself why they're changing frequently. You can cross-reference that data with bug reports and the fixes that go in as part of a commit on a bug report, and you can use that to zero in on problematic classes. So churn: pretty cool tool.
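If you want a feel for what churn is measuring without installing anything, the underlying idea is just counting how often each file shows up in your git history — a little sketch using plain git log, not churn's actual implementation:

    # churn_sketch.rb -- counts how many commits have touched each file,
    # a rough stand-in for what the churn gem reports
    require "json"

    counts = Hash.new(0)
    `git log --name-only --pretty=format:`.each_line do |line|
      file = line.strip
      counts[file] += 1 unless file.empty?
    end

    top = counts.sort_by { |_, n| -n }.first(20).to_h
    puts JSON.pretty_generate(top)   # JSON output, so it can be databased later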
Rake notes. I am surprised at how few people know about rake notes. You can run this in a Rails project or a plain Ruby project. You run it from the command line like this: rake notes. Pretty easy. And it gives you output like this: it finds all of the TODO, FIXME, OPTIMIZE, et cetera comments in your code. What can you do with this data? Well, first of all, it's in a machine-parsable and human-readable format, which is good and useful. What you can do with it is find out whether your developers are being given enough time to write the code they want to write. If there are a lot of to-dos in there, probably not. And again, you can correlate that with error reports.

Fukuzatsu is a relatively new gem. It's a non-opinionated code complexity tool. It calculates cyclomatic complexity, which is essentially a measure of the execution paths through your code. I apologize for the console command that you use to run Fukuzatsu — it was not my intention. Fukuzatsu is Japanese for complexity, and I didn't want people to have to type fukuzatsu all the time. It generates JSON output. It also generates some HTML output, so you can see what the complexity of various files is. You can drill down into a file and see the complexity of individual methods, and — most importantly, for a reason I'm going to get to in just a second — JSON output. This is at the source-file level, and you can get down to the method level as well.

Another gem that's interesting and relatively new is called Society. Society maps relations between classes. It looks at invocations of different classes within one class, and also looks at Active Record relations, to tell you what those relations are. The command-line API is much nicer — society from lib — it sounds very civilized. What you get out graphically is a network graph that looks like this: green and red for afferent and efferent connections. Let's zoom in a little so we can see what we're dealing with. Basically, I can see what the User class is connected to. So it does static code analysis, reduces things down to an abstract syntax tree, and looks for and tracks those couplings. And — recurring theme — JSON output.

Why do we want this JSON output? Why do we want machine-parsable output? Because what we want to do is database our results. We could check in our code metrics; Git is a great database for keeping track of what changes over time, but it's really hard to extract data out of Git. So I recommend a setup like this: something that runs your command-line tools, generates the JSON, and posts it to a database. Once you have your data in a database, you can create really simple HTML views that show you how things are changing over time, and cross-reference different metrics and how they relate for a given class.

Tying it all together: at Instructure we're building a tool that we call Pandometer. Pandometer is pluggable. There's a little gem called PandaPounds, which is a command-line tool runner; it generates JSON and posts it to Pandometer. Pandometer is metric-agnostic — it treats all metrics the same way, assuming that a high value is bad and a low value is good. So there are various things we can plug into it: code duplication tools, data from Society, from Fukuzatsu, from rake notes, a lines-of-code counter, churn, and so on. We can plug whatever we want into it. And we can look at how we're changing quality on a commit-by-commit basis.
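The glue for that doesn't have to be fancy. Here's a minimal sketch of that kind of runner — the tool command, the endpoint, and the field names are all made up for illustration:

    # post_metrics.rb -- run a metrics tool for the current commit and
    # post its JSON output to a metrics database
    require "json"
    require "net/http"
    require "time"
    require "uri"

    # whatever command-line tool you've picked; it just needs to print JSON
    METRICS_COMMAND = "your-metrics-tool --json lib/"   # hypothetical command

    payload = {
      sha:      `git rev-parse HEAD`.strip,
      metric:   "complexity",
      recorded: Time.now.utc.iso8601,
      results:  JSON.parse(`#{METRICS_COMMAND}`)
    }

    # the endpoint is made up; point this at wherever you store your metrics
    uri = URI("http://pandometer.example.com/api/metrics")
    Net::HTTP.post(uri, payload.to_json, "Content-Type" => "application/json")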
So at this point, pretend we've clicked on the SHA for a commit. This is all generated data. We can see that the coupling score has gone terribly wrong, but the complexity score has improved — and again, this is made-up data. For each of the metrics, we have an individual page we can go to. So for complexity, we can see how complexity has changed within this commit, and we can see a trend over time of how it's changed as well — like this, like that. And of course that's a ridiculous change, because it's junk data. We can also look at hotspots. We get a list of the five most egregious classes in terms of this metric up at the top, and then we have a sortable list at the bottom, so we can view the average and total complexity — in this case — and see who the main offenders are. We have those same sorts of views for lines of code and coupling. And you can drill down to the individual class level and see all of the stats for a given class.

This took us six or eight weeks to build. There's talk of open-sourcing it; I'm not sure whether that's going to happen or not. But really, it wasn't very hard to build. If you assemble a suite of command-line tools that you like, and get them — by hook or by crook — to generate data that's machine-readable, you can build a system like Pandometer pretty easily. It doesn't have to be very complex; it can be a single-page app or what have you. The most important thing is gathering that metric data, being able to display it over time, and being able to tell whether the changes you're making through your refactoring efforts are helping or hurting.

So where do we go from here? You want to decide which quality attributes you care about the most. You want to find ways to measure them. You might want to look at the Ruby Toolbox as a way of identifying different tools for complexity or coupling or what have you, if you don't want to take my suggestions here. Then you want to create a refactoring strategy: figure out what you want to do with this data and how you're going to move the numbers. And all of this will enable you to make something beautiful. We can create a world where our code is not just functional, but also beautiful and successful and maintainable and extensible. That's what refactoring is supposed to be doing for us. It's supposed to be moving us toward a place where we can feel good about our code again. And if we're serious about the work that we're doing, we want that code to be beautiful. We want it to be maintainable. We want it to be extensible. And refactoring is the way we get there. Thank you.