The AV is broken in the other room; that's why we're here, but I didn't think we would be here. I thought it was going to be in the room with the broken AV. The solution to that was to skip presenter notes and just mirror the display, and at that point, mostly, the slides would show. I have a lot of slides. I have like 300 slides. It's okay, we're going to get through it. But I use my presenter notes a lot. A lot, a lot. So I spent the night figuring out how to print them at the hotel. Now, this is clearly a hard problem in computer science, because when you export your Keynote presentation to PDF, it gives you great big slides, which I don't care about at this point, and itty-bitty presenter notes that I couldn't read. So I wrote AppleScript to generate some Ruby code so that I could write a Prawn thing that would generate a PDF that I could send by email to the business center at the Crowne Plaza. It was great. I'll show you my notes. They're going to be souvenirs after. I realized afterwards that I could have cut it down by 50% if I had exported to PDF via File > Print with a two-up layout, but at 3 in the morning, I'm not thinking very clearly. All right, I think it's time.

All right. So there's this thing that we do, and I'm pretty sure we all do it. You notice that this bit of code over here and that bit of code over there are basically doing the same thing, except they're implemented differently. So you decide to switch everything over to use just one of them, and about 17 tests break. You remember that there's an edge case, so you quickly patch it with a conditional. Now 24 tests are failing. You realize that there's a missing dependency, so you pull that in, and now you have about 50 failing tests, and it turns out that there's a conflict between two different dependencies. So you swap them around so that they load in the opposite order. Now almost 80 tests are failing.
And, by the way, there's that encoding thing that always happens: why are you assuming that you've got UTF-8 when actually you're getting Latin-1, or is it the reverse? I don't know. And the method you're trying to use needs access to some stuff that's not part of the public API, so fuck it, you use send. A whole bunch of stuff blows up, something to do with unexpected nils, it's been hours since the tests were last passing, and now you're considering monkey patching NilClass. Instead, you do a git reset --hard.

The most difficult thing about refactoring isn't actually making the change. It's knowing where to begin, what to do next, and how to make that change safely. I'm going to use the children's song that goes "I know an old lady who swallowed a fly" as an excuse to talk about decision-making and refactoring. The song is incredibly simple, but it has an algorithmic component, and this adds just enough complexity that we can talk about real-world refactoring principles without bogging you down in real-world code. The entire example is all just text. It's all hard-coded into a heredoc, which is wrapped in a method, which is wrapped in a class.

Now, the very first thing you need to ask yourself when you're about to refactor is whether or not you should do it at all, and in this case, frankly, the answer is no. The song hasn't changed in generations. The code could be appalling, and it doesn't matter. The code works. We're never going to touch it again. Stick it in production and walk away. So in order to avoid a spurious refactoring, I have invented a spurious requirement. We need to be able to keep generating the song the way we always have, in its traditional form. We like that. In addition to this, we want to generate the song with arbitrary creatures, and this establishes a perfectly legitimate reason to change the code. It also gives us information.
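For reference, here is a trimmed sketch of that starting point. The class name Song and the method name lyrics are illustrative assumptions, and only the first two verses are shown; the real heredoc holds the whole song.

```ruby
# Starting point (sketch): the whole song is one hard-coded heredoc,
# wrapped in a method, wrapped in a class. Trimmed to two verses.
class Song
  def lyrics
    <<~LYRICS
      I know an old lady who swallowed a fly.
      I don't know why she swallowed the fly. Perhaps she'll die.

      I know an old lady who swallowed a spider.
      It wriggled and jiggled and tickled inside her.
      She swallowed the spider to catch the fly.
      I don't know why she swallowed the fly. Perhaps she'll die.
    LYRICS
  end
end
```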
That requirement tells us a little bit about the axis along which we need to support change. We don't need infinite flexibility. We just need this one type of flexibility.

Now, at this point, there's another important question that you need to ask yourself, and that is: do you have tests? And here again, the answer is no, because why on earth would you test a hard-coded string? So to protect against regressions, copy the heredoc into a test and add a very simple assertion. Unsurprisingly, this test passes, which means that it is safe to start changing things.

That raises yet another question. Can you make the change that you need to make without any dirty hacks? Almost invariably, the answer is no. So we have this new feature, and we want to implement it. The very first thing we're going to do is not implement it. Instead, we're going to rearrange the code. We're going to find the flexibility that we need, and then we're going to add the feature.

Now, it's almost never obvious up front what that flexibility is going to look like. So to start the process, just take a moment to look at the code. Notice stuff. See what jumps out at you. Look for things in the code that you don't like. And then pick one thing. I tend to pick either the thing that I hate the most or the thing that I understand the best.

What most people remark on here is all of the duplication in the string. There is duplication between verses. There's duplication within verses. But it's not exact duplication. The bits that change are all embedded inside the bits that stay the same. And it's all stuck in this great big heredoc, which makes it really hard to deal with a small piece at a time. So the first change I'm going to make is to introduce a little bit of indirection. I need to take tiny steps. I want to change one small piece at a time. And for this, I'm going to use a case statement with a branch for each verse.
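The characterization test described earlier could be as small as this Minitest sketch. The Song stand-in is trimmed to one verse and its names are assumptions; in practice the expected heredoc is the entire song, pasted verbatim from the current output.

```ruby
require "minitest/autorun"

# Stand-in for the class under test (trimmed to one verse for brevity).
class Song
  def lyrics
    <<~LYRICS
      I know an old lady who swallowed a fly.
      I don't know why she swallowed the fly. Perhaps she'll die.
    LYRICS
  end
end

# Characterization test: paste today's output and assert it doesn't change.
# It pins behavior only, which is what makes restructuring safe.
class SongTest < Minitest::Test
  def test_lyrics
    expected = <<~LYRICS
      I know an old lady who swallowed a fly.
      I don't know why she swallowed the fly. Perhaps she'll die.
    LYRICS
    assert_equal expected, Song.new.lyrics
  end
end
```

The test knows nothing about the design, so it keeps protecting you no matter how the code underneath gets rearranged.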
Using a case statement means that we have to loop over the range of verses and join them to get the full lyrics. You can do it all in the same method, but we only really care about the case statement, so I'm going to go one step further and isolate it into its own method. And then we can start touching individual strings.

What I want to do is tease apart the bits that vary from the bits that stay the same. In other words, some of this is template, and some of it is data. And we immediately run into the problem of naming things. Typically, when you extract something, you name it. If you get the names wrong, it can make the code dramatically harder to understand, which makes the code dramatically harder to change. And the easiest time to get the names wrong is right now, when we understand the least about the problem. Of course, in this case, even if you get the names right, it doesn't really help, because we have everything all mixed together.

So this may seem a little bit odd, but C-style string formatting using the percent method can be really handy. It provides a clear separation between the static part and the variations. I like that it doesn't matter where in the string the placeholder is. All of the data ends up outside of the template, and we haven't had to name anything.

If you apply this to each of the verses, the duplication between verses goes from being similar to being identical, and all of the bits that vary are off to the side. It's the same for the duplication within each verse. Each entire verse now has all of the template-y stuff on the left and all of the data on the right. This is a tiny change, but it is noticeable, even when squinting at it. This is the heredoc that we started out with. We used a case statement to break up the individual verses, with the strings unchanged. And now we've dissected the strings in place to separate the template from the data.
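A sketch of those two steps, trimmed to two verses and assuming the illustrative names lyrics and verse: the case statement lives in its own method, and String#% keeps each template apart from its data without naming anything yet.

```ruby
class Song
  # Loop over the verse numbers and join them with a blank line between.
  def lyrics
    (1..2).map { |i| verse(i) }.join("\n")
  end

  private

  # One branch per verse: the template on the left, the data off to the side.
  def verse(i)
    case i
    when 1
      template = "I know an old lady who swallowed a %s.\n%s\n"
      data = ["fly", "I don't know why she swallowed the fly. Perhaps she'll die."]
    when 2
      template = "I know an old lady who swallowed a %s.\n%s\n" \
                 "She swallowed the %s to catch the %s.\n%s\n"
      data = ["spider", "It wriggled and jiggled and tickled inside her.",
              "spider", "fly",
              "I don't know why she swallowed the fly. Perhaps she'll die."]
    end
    template % data
  end
end
```

For these two verses the output is character-for-character the same as the original heredoc, so the characterization test keeps passing.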
Now, you might argue that I just shoved code around on the slide and pretended to have refactored. This is a little bit like shoving the food around on your plate and pretending that you've eaten. Kind of pointless. Some people can look at duplication and immediately see how to extract methods and objects. Often I just can't tell, not at first. Dissecting it like this gives me a little bit more information without forcing me to commit to anything prematurely. And in this case, separating the template from the data has made the algorithmic part a lot more obvious. Each verse grows systematically, adding repetitions, and finally it stops abruptly.

Most of the verses take this phrase and repeat it some number of times, passing in a number of different creatures. It's not a lot of code, but it does seem like a complete thought, and I'd like to name it. There are three very common strategies for naming things. The first is to name it by using some fragment of the implementation, like swallow or catch or something equally concrete. This is the kind of name that quickly gets out of date: you find yourself with a method named blue that returns the hex code for red. The second strategy is to name it structurally: recurring line, or incremental sentence, or middle phrase. This is the kind of name that is absolutely technically accurate and totally unhelpful. The problem domain here isn't about phrases. It's about a little old lady who inexplicably swallows a fly and then compounds the problem by swallowing larger and larger creatures. This particular part of the song is trying to explain the reasoning behind her behavior. It's talking about why she would ever do such a thing as swallow a bird or a dog or a goat. In other words, her motivation. So we need to call motivation some number of times, each time with different data. Now, ignore the fact that cows and goats are vegan.
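A minimal sketch of the extracted method, assuming that at this stage the predator and prey are still plain strings:

```ruby
# Named for the domain (why she did it), not for the implementation
# ("swallow") or the structure ("middle phrase").
def motivation(predator, prey)
  "She swallowed the #{predator} to catch the #{prey}.\n"
end

# The same phrase, called several times with different data:
print motivation("dog", "cat")
print motivation("cat", "bird")
```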
Notice that the predator in one verse becomes the prey in the next. What we really want is a single list, so that we're not writing everything twice. Ruby, it turns out, has an Enumerable method, each_cons, that conveniently yields each consecutive pair. Given a list of critters, loop over them in pairs, pass each pair to the motivation method, and join the results into a single string. This chunk of code also seems like a complete thought, and I'd like to name it. Here we're talking about a sequence, or chain, of events. It describes a sort of bizarre food chain. So the algorithmic portion of the code is all isolated into these two methods.

There's a trade-off here. We've taken something that was blindingly simple and we've complicated it. We've paid a price. And we've gained a number of benefits. We've isolated a small and very cohesive piece of code. We've named an important concept, the food chain. And what's more, this implementation shows the bones of the algorithm. It expresses the underlying structure of an essential piece of the song. This structure was indiscernible when everything was just straight-up strings.

So two things just happened. We extracted and named the algorithmic piece. Oh, wow. That was two. And one thing didn't happen: we didn't use it anywhere. Technically, this is called a parallel implementation. That sounds very fancy, but it means that I just didn't know what I was doing, and I didn't want to screw things up while I was figuring it out. After making a parallel implementation, you should be able to just swap it in seamlessly. It will just work. And sometimes it doesn't. This one just failed. That means that there's something we haven't understood yet about the food chain. And it turns out one part of the chain is not like the rest. What you'll usually see in a code base when you have a rule with an exception is a conditional. Now, these are kind of problematic in a foot-in-the-door kind of way. You have a conditional. You blink. Now you have 12.
Somebody's going to get hurt. Within the context of a refactoring, though, conditionals can be a useful tool. First create the conditional, then name what it represents, and then use what you learned to make a decision about the next step. That sounds deceptively simple, and there are a couple of pitfalls.

The conditional itself should contain the smallest possible difference. It makes sense if you think about it: if something is the same in both branches of an if statement, then it's not really conditional on anything. So if we keep just the wriggly bit of the spider, then there's nothing to put in the else branch. Now, you could drop the else. That might lead you to think that you're naming just the exception. And you're not. The conditional represents two variations on the same concept. The rule is one variation. The exception is another. You're not naming two different things. You're naming one single idea. In the song, we're talking about the critters' distinctive features, some sort of qualifier; it's just that some critters aren't all that distinctive or special. If you leave only the bit about the spider, then the name will almost inevitably end up being about the spider. If you name a fragment of an idea, it introduces not just indirection, but misdirection. It's a magic trick. It gives you an illusion of understanding, and this sort of deception makes it very, very difficult to refactor. So by forcing yourself to consider the symmetry, you name the whole concept.

This method needs an argument. Prey is not the right name here. Prey is about hunting. This has nothing to do with one animal killing another for food. This is about the critter itself. Take a moment to look at the conditional. It contains the smallest possible difference. It's symmetrical, containing both the exception and the rule. The name doesn't misinform; it's about a general concept in the song. And notice that the result of the method depends solely on the argument that's passed.
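Here is a sketch of the failed first attempt and its fix, assuming the critters are listed from the most recently swallowed creature back to the fly and are still plain strings. naive_chain is a hypothetical name for the first parallel implementation built on Enumerable#each_cons(2); the second version routes the prey through a symmetrical qualifier conditional.

```ruby
# First attempt: each_cons(2) yields every consecutive (predator, prey) pair.
def naive_chain(critters)
  lines = critters.each_cons(2).map do |predator, prey|
    "She swallowed the #{predator} to catch the #{prey}.\n"
  end
  lines.join
end

# The smallest symmetrical difference: the rule ("") and the exception
# (the spider) are two variations of one named concept.
def qualifier(critter)
  if critter == "spider"
    " that wriggled and jiggled and tickled inside her"
  else
    ""
  end
end

def motivation(predator, prey)
  "She swallowed the #{predator} to catch the #{prey}#{qualifier(prey)}.\n"
end

def chain(critters)
  critters.each_cons(2).map { |predator, prey| motivation(predator, prey) }.join
end

print chain(%w[bird spider fly])
```

naive_chain drops the spider's "that wriggled and jiggled" clause, which is exactly the mismatch that makes swapping it in fail. Note that qualifier depends only on its argument.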
The qualifier method could live anywhere. It could be a global function for all we care. It doesn't have anything to do with the song. So we've got this critter string, and then there's more stuff associated with it. We're trying to decorate it with more behavior. Instead of passing the critter to the qualifier method, we should invert it and send the qualifier message to the critter. And to do that, the critter has to be an object. The data can all be extracted from the verse. New up all the critters in the initializer. Now predator and prey are no longer just strings. They're objects to which we can send messages. And notice that once we found the critter object, the conditional just went away. That won't always be the case. Don't be afraid to introduce something that makes you feel a little bit slimy. Refactoring isn't about making the code pretty. Refactoring is about understanding the problem better. And sometimes you have to get a little bit dirty to get clean eventually.

So, the motivation method was so nice just a moment ago, and now not so much. This chunk sticks out. It's not terrible, but it is too much information at the wrong level of abstraction. These are intimate, nitty-gritty details, and the motivation method shouldn't have to care. If we extract it, we have to name it. The only thing I could come up with that kind of works is epithet. I will not tell you how long I spent in the thesaurus before I came up with it. And worse, I've given this talk three times and I mispronounced it every time. I only realized a few days ago how you actually pronounce it. So, epithet. Since this is all about the critter, it belongs in the Critter class.

This is pretty good. A lot has happened. We tried to swap in that parallel implementation, and we failed. Focusing relentlessly on a small, symmetrical difference isolated a meaningful idea. And that idea was the seed of a tiny object, the critter.
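Roughly, the Critter step might look like this. Using Struct, the top-level constants, and the field names are assumptions made for brevity; the point is that the qualifier is now data, so the conditional is gone, and epithet keeps the name-plus-qualifier detail out of motivation.

```ruby
# A tiny object: raw data plus one small behavior.
Critter = Struct.new(:name, :qualifier) do
  # Name plus any distinguishing qualifier, e.g. the spider's wriggling.
  def epithet
    "#{name}#{qualifier}"
  end
end

# motivation now sends messages instead of inspecting strings.
def motivation(predator, prey)
  "She swallowed the #{predator.name} to catch the #{prey.epithet}.\n"
end

def chain(critters)
  critters.each_cons(2).map { |predator, prey| motivation(predator, prey) }.join
end

BIRD = Critter.new("bird", "")
SPIDER = Critter.new("spider", " that wriggled and jiggled and tickled inside her")
FLY = Critter.new("fly", "")

print chain([BIRD, SPIDER, FLY])
```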
This abstraction was not at all obvious at the start, but now, having found it and named it, it feels inevitable. So now our parallel implementation is finally complete. We can look at the case statement containing the original implementation, identify the algorithmic repetition, and swap it out for a call to this new implementation. And this time the change is seamless. The test passes.

And this is better. The case statement has gotten noticeably shorter than it was, but we've ended up with more code and more complexity. You'll notice that there's a "continued" up there on the second squinty slide: that's all one class, except for the little lines of Critter there. Five lines of Critter. So, to be fair, this is pretty mild complexity. There's probably nothing here that you couldn't understand, even if you were just a little bit drunk.

This whole time we've been dealing with duplication in the strings of the song. But it wasn't obvious duplication. We've been handling it indirectly, from the edges. When we started out, there were fragments of plausible duplication throughout the case statement. Now there are whole chunks of unmistakable duplication. Even the data has gone from being mostly a jumble to having a recognizable pattern. It's tempting to focus on that sameness, to extract it and give it a name. Resist that temptation for as long as you can. Identify the smallest difference. See if you can make it go away. And then ignore it and look at the remaining differences.

The difference on the first line is the name of some critter. We've got a bunch of critter objects; we just need to find the right one and get its name. That takes this tiny difference and makes it the same everywhere, and we can ignore it. That leaves us with one last difference between the blocks of the case statement, and it's something that we've not yet named. Again, this is about the critter.
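A sketch of where the case statement can end up once both remaining differences come from the critter. It assumes a Struct-based Critter with an extra aside field and data ordered from the last creature swallowed back to the fly, so CRITTERS.last(i) is verse i's food chain with the predator first. The names are illustrative, and the fly's closing line is still a hard-coded string at this stage.

```ruby
Critter = Struct.new(:name, :qualifier, :aside) do
  def epithet
    "#{name}#{qualifier}"
  end
end

# Ordered from the last creature swallowed back to the first (assumption).
CRITTERS = [
  ["horse", "", "She's dead, of course!"],
  ["cow", "", "I don't know how she swallowed a cow!"],
  ["goat", "", "Just opened her throat and swallowed a goat!"],
  ["dog", "", "What a hog, to swallow a dog!"],
  ["cat", "", "Imagine that, to swallow a cat!"],
  ["bird", "", "How absurd to swallow a bird!"],
  ["spider", " that wriggled and jiggled and tickled inside her",
   "It wriggled and jiggled and tickled inside her."],
  ["fly", "", "I don't know why she swallowed the fly. Perhaps she'll die."]
].map { |data| Critter.new(*data) }

def motivation(predator, prey)
  "She swallowed the #{predator.name} to catch the #{prey.epithet}.\n"
end

def chain(critters)
  critters.each_cons(2).map { |predator, prey| motivation(predator, prey) }.join
end

# Name and aside now come from the critter, so every middle branch is
# identical and collapses into one else.
def verse(i)
  critters = CRITTERS.last(i)
  critter = critters.first
  opening = "I know an old lady who swallowed a #{critter.name}.\n#{critter.aside}\n"
  case i
  when 1, 8
    opening
  else
    opening + chain(critters) + "I don't know why she swallowed the fly. Perhaps she'll die.\n"
  end
end
```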
That last difference is a little bit of commentary, an aside. So we'll need to pass more data when we're creating critters, and then use that abstraction in the case statement. And now this too is identical, and we can ignore it. Something interesting has happened: all of the sameness that we've been ignoring can now collapse into a single block.

Sameness is a trap. It's a distraction. Differences are the interesting parts, because they're fragments of some larger idea. You want to encapsulate the concept that varies, not the concept that stays the same. And once you've encapsulated the differences, the sameness either evaporates or condenses into something obvious.

So all of this started with identifying duplication as the ugliest, best-understood problem. We added indirection in order to start dealing with individual strings, and then encapsulated the algorithmic portion of the song, discovering the critter object, which is tiny and very, very cohesive. And finally, we focused on all of the little differences between the cases, making them go away and allowing us to collapse the entire set of identical cases.

But we're not done yet. We started out with a gigantic string. We've extracted and extracted and extracted, and we've spent a lot of time down in the weeds. It's really useful to take a step back and look at the bigger picture. So this is the song. It has five methods, three of them private. All of that private stuff seems like it could be its own thing. Now, Song does actually still need all the old code, otherwise the test will blow up, but we can ignore that for a moment and just focus on this parallel implementation. Before calling it from Song, we have to do, well, one thing mainly: it has to actually work. Otherwise, the test will blow up. And this doesn't actually work yet. I also want to do a second thing.
I want to make the API of that class the API that I want, and I'm only doing that now because it's a hassle to fix later, it takes longer, and I only have 40 minutes with you. Any other changes can wait until we've integrated it, because then, when we run it, the tests will have our back.

The reason this is not working yet is that the verse and chain methods call last(i) on critters, and the Verse object doesn't know about critters. This is pretty easy to fix: add some initialization logic and then give it a reader, and with this change, Verse has what it needs. Actually, Verse has more than what it needs. It doesn't need all of the critters, just the critters for that verse. This has implications for the API of the class. Notice that verse and chain both take i as an argument; we can get rid of the parameter and instead get i from the size of the critters array. But that doesn't fix the most egregious thing about the API of this class: the method is still called verse. This is not the verse's verse. This is the string representation of the verse object. There's a Ruby idiom for that, to_s, which is very nice, and it makes the public API perfectly reasonable, especially if we make the last two methods private.

So, to swap the parallel implementation into the Song class, replace the call to the old verse method with a call to the new class. You don't actually have to call to_s here, because join will do that for you. Then you can delete all the old methods. This passes the test.

We now have three tiny objects. Each one is focused and cohesive. The critter is minuscule. It's basically a tiny wrapper around the raw data that we initialize it with. The Song class is a little bit bigger. Most of it consists of the array of critter data, and this class knows not very much. Four, maybe five things. It knows what the raw critter data is. It knows how to transform that data into critter objects.
Implicitly, it knows who eats whom in the food chain, because that's the order of the raw data in the array. It knows how to instantiate verses. It knows how to combine verses into a complete song. It no longer knows what those verses are or how they're constructed. That is the domain of the Verse class, which is significantly more complex than anything else in this code. This class knows how the sausage is made. It knows which verses should be long and which should be short, how to combine the data with the templates, and what the algorithm for the food chain actually is. It knows a lot.

I am not saying that you should always split things up into new methods and classes, but it can be useful to ask yourself: if this thing that I have were two different things, what would they be? If you can come up with an answer that's not too far-fetched, it could be worth exploring, and you can always inline it if you hate what you get.

The Verse class is looking pretty decent from the outside. It's pretty gross on the inside. The most glaring thing here is the case statement, which is still sticking around. It has some blatant duplication. As before, ignore the sameness and focus on the smallest difference. Create a small, focused conditional for just this difference, make sure it's symmetrical, and then name it in this context. This is what the narrator is summarizing, or recapitulating: everything that has happened up until this point. There are two verses where there's nothing to summarize, because there's no backstory. In the very first verse, nothing has happened yet, so there's nothing to say. And in the very last verse, the little old lady is dead, so there's not much to say about that either. Having extracted the recap, the duplication in to_s can be collapsed and named, and this is the incident that the verse is about. This is the main deal. to_s tells a pretty good story, with all of the details relegated to private methods. So this is a pretty decent division of responsibilities.
The blatant duplication is gone. There are a number of smaller, annoying things. For example, there are calls to last(i) on critters, and that's totally redundant, because the critters stored in the verse are already the last i critters, so you can delete those. That removes almost all of the places where we reference i. There's one reference left in the case statement, so you can just switch directly on critters.length. Another little thing is the first critter in the list, which is the main character of the verse; it would be really nice to name it.

There is one more thing that I don't like. Go ahead and squint at all of the template-y stuff. Notice that the templates are all strings; they're all red. The stuff we're interpolating, the stuff that varies, is abstractions, and that tends to be all black. And here there's one thing that's not black, and it really sticks out. It's a hard-coded string, and you could argue that if it doesn't vary, should it even be data? Maybe it should be part of the template? There are a couple of arguments for using an abstraction here. The first is that we have this exact string in two places. This is the aside for the fly. So we've already named it; we have an abstraction for it. The other really good reason is that the whole point of making all of these changes is that we might not even have a fly. So we're going to use that abstraction.

The Verse class is looking pretty good at this point. It has boilerplate to set everything up, then to_s, which is called from the outside and is everything we care about in terms of the API. to_s calls incident and recap, recap calls chain, and chain calls motivation. It can feel like all of these itty-bitty, kind of annoying, insignificant changes are not worth worrying about. But all of these tiny irregularities and duplications and inconsistencies add up.
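Pulling the last few steps together, here is one possible shape for Song and Verse at this point. It is a sketch, not the talk's exact code: the data ordering is assumed (last creature swallowed first, fly last, so critters.last is the fly), and protagonist is a hypothetical name for the verse's main character. The recap conditional and its hard-coded 8 are still there, which is what the next refactoring goes after.

```ruby
Critter = Struct.new(:name, :qualifier, :aside) do
  def epithet
    "#{name}#{qualifier}"
  end
end

# Song knows the raw data, turns it into critters, and joins the verses.
class Song
  CRITTER_DATA = [
    ["horse", "", "She's dead, of course!"],
    ["cow", "", "I don't know how she swallowed a cow!"],
    ["goat", "", "Just opened her throat and swallowed a goat!"],
    ["dog", "", "What a hog, to swallow a dog!"],
    ["cat", "", "Imagine that, to swallow a cat!"],
    ["bird", "", "How absurd to swallow a bird!"],
    ["spider", " that wriggled and jiggled and tickled inside her",
     "It wriggled and jiggled and tickled inside her."],
    ["fly", "", "I don't know why she swallowed the fly. Perhaps she'll die."]
  ].freeze

  def initialize
    @critters = CRITTER_DATA.map { |data| Critter.new(*data) }
  end

  def lyrics
    # Array#join calls to_s on each Verse, so no explicit to_s is needed.
    (1..8).map { |i| Verse.new(@critters.last(i)) }.join("\n")
  end
end

class Verse
  def initialize(critters)
    @critters = critters
  end

  # The public API: the verse's string representation tells the story.
  def to_s
    incident + recap
  end

  private

  attr_reader :critters

  # Hypothetical name for the verse's main character.
  def protagonist
    critters.first
  end

  def incident
    "I know an old lady who swallowed a #{protagonist.name}.\n#{protagonist.aside}\n"
  end

  # Nothing to recap in the first verse (no backstory) or the last (she's
  # dead). The fly's line comes from critters.last.aside, not a literal.
  def recap
    case critters.length
    when 1, 8 then ""
    else chain + "#{critters.last.aside}\n"
    end
  end

  def chain
    critters.each_cons(2).map { |predator, prey| motivation(predator, prey) }.join
  end

  def motivation(predator, prey)
    "She swallowed the #{predator.name} to catch the #{prey.epithet}.\n"
  end
end
```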
It's hard to measure the impact of one tiny change, but the difference in terms of clarity and understanding can be staggering. So the code is pretty good. I have one complaint left, and that's the case statement. In the very beginning, I just added it as a way to start getting at the strings. I thought it would be temporary. I hate that it's still there. So the final refactoring is one of the classics from the Refactoring book by Martin Fowler. It's called Replace Conditional with Polymorphism, and it's a step-by-step recipe.

Refactoring is not about achieving maximum design pattern density. Refactoring is about balancing simplicity, readability, and changeability. Don't refactor because you're embarrassed about your code. This is not about aesthetics. Don't refactor because there's some shiny design pattern you've been wanting to try out. This is not academic. Don't refactor because you imagine a beautiful future full of ifs and what-ifs, where it would probably really come in handy. Your guess about the future is as good as mine, which is to say total shit. We are more likely to be wrong than we are to be right. Refactor because you need to make a change. Not a hypothetical change, an actual change.

Now, the whole point of this refactoring wasn't to make the code beautiful or satisfy some academic itch or craving. It was to fulfill a new requirement. In addition to the age-old version of the song, the client wants to introduce a jungle-themed song and an ocean-themed song, and I do recall there was something about a squirrel. The purpose of the refactoring was to make this not only possible but easy, so we need to evaluate the code from this perspective. We need to be able to send in any set of critters, and right now we've got the hard-coded critters for the traditional version of the song. And this is fine.
The client likes the traditional version of the song and wants to be able to keep generating it, and all of their code should still work. They also want flexibility. So this is a pretty straightforward change. If we give the constructor a parameter, we can default it to the existing array of data and then loop over the argument instead of the constant. And this would almost work exactly the way we want. The problem is that we're making assumptions about how many critters are going to be involved. That is also an easy fix: we can loop up to the size of the array.

We still have a problem, and this one is not so easy to fix. There's a hard-coded 8 down in the Verse class. The verse probably shouldn't know anything about anything outside of that one verse. It shouldn't know about other verses. It shouldn't know how many verses there are. We have a few options here. Most of them are really gross, so let's talk about them.

On the one hand, in addition to telling the verse what critters to use, we could tell it how many total critters there are. That's no worse than hard-coding the 8, but also no better. More or less equivalent would be to pass all of the critters and then also pass i. This pushes even more knowledge about the big picture into Verse, which is unfortunate. On the other hand, there's a trade-off: Song knows a little bit less. That could be good. Another option would be for Song to figure out whether the verse should be long or short and pass some sort of token to the Verse class, which duplicates knowledge. Now Verse knows about long and short, Song also knows about long and short, and they both have conditionals. Great. Not great.

Song shouldn't know about long and short templates. Verse shouldn't know about total verses. So what we need is a little shim between the two: something that can know about long and short templates and knows how to make verses with what they need. We could have two really short, really stupid verses.
We'd have a short verse and a long verse. Neither would have to know about the other. Neither would have to know about the total number of verses, and as long as both verses have the same interface, the song doesn't have to know which verse it's dealing with. So we need to turn one verse with one conditional containing two branches into two verses with two recap methods, one for each branch of the conditional. And since the long verse has a chain, it also needs the food chain stuff. There's a bunch of common stuff, and we could duplicate it, but that seems like kind of a bad idea, so we can stick it in the short verse, and then the long verse can inherit from the short verse, override recap, and define the food chain.

Now, up in Song, we need to call this in-between thing, this shim. It will know about all the critters, it will switch and figure out which verse it is, then instantiate it with whatever it needs and return it back to Song. Some people would call this, well, over-engineering, for one thing; or a verse builder, or a factory. I don't really care. I like "verse for." It totally works.

So we used to have code that was incredibly straightforward. We've ended up with code that is still pretty understandable, but definitely more complex. Everything used to be all in the same place; now it's spread out across a bunch of tiny methods, classes, things. We've traded simplicity for flexibility.

Now, if we've done it right, we should be able to generate lyrics with arbitrary critter data. The simplest way to check is to write a small test, and when I say small, I mean as small as possible, but certainly no smaller. The test data needs to cover all of our edge cases, everything that's relevant about the song, everything that's interesting about the song, but it shouldn't be realistic, because that would give us redundant data, and it would obscure things and distract from what's important.
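One way the final design could look, as a sketch rather than the talk's exact code: ShortVerse holds the shared setup, LongVerse inherits from it and overrides recap, and a small shim (called VerseFactory here, a hypothetical name) is the only place that knows both the total number of critters and which verses are short. Song takes the critter data as a constructor argument that defaults to the traditional creatures; the data ordering (last creature swallowed first) is an assumption.

```ruby
Critter = Struct.new(:name, :qualifier, :aside) do
  def epithet
    "#{name}#{qualifier}"
  end
end

# First verse (no backstory) and last verse (she's dead): no recap.
class ShortVerse
  def initialize(critters)
    @critters = critters
  end

  def to_s
    incident + recap
  end

  private

  attr_reader :critters

  def incident
    main = critters.first
    "I know an old lady who swallowed a #{main.name}.\n#{main.aside}\n"
  end

  def recap
    ""
  end
end

# Shares ShortVerse's setup; overrides recap and adds the food chain.
class LongVerse < ShortVerse
  private

  def recap
    chain + "#{critters.last.aside}\n"
  end

  def chain
    critters.each_cons(2).map { |predator, prey| motivation(predator, prey) }.join
  end

  def motivation(predator, prey)
    "She swallowed the #{predator.name} to catch the #{prey.epithet}.\n"
  end
end

# Hypothetical shim: knows the total count and which verses are short,
# so neither Song nor the verse classes have to.
module VerseFactory
  def self.build(all_critters, i)
    critters = all_critters.last(i)
    short = i == 1 || i == all_critters.size
    short ? ShortVerse.new(critters) : LongVerse.new(critters)
  end
end

class Song
  # Ordered from the last creature swallowed back to the first.
  TRADITIONAL = [
    ["horse", "", "She's dead, of course!"],
    ["cow", "", "I don't know how she swallowed a cow!"],
    ["goat", "", "Just opened her throat and swallowed a goat!"],
    ["dog", "", "What a hog, to swallow a dog!"],
    ["cat", "", "Imagine that, to swallow a cat!"],
    ["bird", "", "How absurd to swallow a bird!"],
    ["spider", " that wriggled and jiggled and tickled inside her",
     "It wriggled and jiggled and tickled inside her."],
    ["fly", "", "I don't know why she swallowed the fly. Perhaps she'll die."]
  ].freeze

  def initialize(critter_data = TRADITIONAL)
    @critters = critter_data.map { |data| Critter.new(*data) }
  end

  def lyrics
    (1..@critters.size).map { |i| VerseFactory.build(@critters, i) }.join("\n")
  end
end
```

Song.new.lyrics still produces the traditional song, while something like Song.new([["cat", "", "Meow!"], ["rat", " that squeaked", "Squeak!"], ["gnat", "", "Bzz."]]).lyrics exercises a short first verse, a long middle verse, and a short last verse with deliberately boring data.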
So the test data, instead of lions and monkeys and lizards, should be simple and contrived and really, really uninteresting. With this data, we should end up with these lyrics. The first and last verses are both short, the middle verses are long, and one of the middle verses has that extra qualifier thingy. If you stick this whole thing in a variable, you can assert that the code does the right thing, and it doesn't, which was a surprise when I was preparing this talk. It's pure luck that my contrived, really silly, uninteresting, boring data happened to uncover a bug. This fails because we have the wrong indefinite article: you want "a zebra" but "an alligator." It's not a big deal. If this came up in production, you could fix it in a heartbeat. So now everything passes. Slip that right in.

Kent Beck once said, make the change easy, then make the easy change. First, refactor the code: change the structure without changing the behavior. Keep doing this until you can add your feature without hacking it in. Often you'll find that the new requirement takes just a moment to implement, and it can feel like a bit of a miracle. Actually, Kent Beck didn't say exactly this. He said almost this. He said: make the change easy (warning: this may be hard), then make the easy change.

Refactoring can feel like a little bit of a dark art. And it kind of is. There's a lot of gut feeling involved in recognizing stuff you don't like. But that gut feeling, that intuition, is learned. Read about code smells. Watch Sandi's talk, which is next. Look at code. Do code reviews. Practice refactoring. Try stuff, then get rid of it and try something else. Ignore the sameness for as long as you can. Don't name the fragments and slivers and shards of ideas. Name symmetrical differences. Name the whole concept. It should feel like you're discovering your abstractions, not inventing them. A good abstraction feels obvious in hindsight. It's already there. It's buried in your code.
And it's up to you to unearth it. Thank you.

We have four minutes; supposedly that's for Q&A. The code is up on GitHub, in separate commits, one by one. You can clone it, download it, and look at all the changes. It's actually in a slightly different order, but that's because I kept changing things as I was working on the slides. It doesn't really matter which order you do things in; you usually end up in the same place.

I'm writing a book with Sandi Metz. I made this pretty slide and she was like, no. That's what it actually looks like. I failed to do the thing where I designed my slides afterwards, so I have this odd red thing going on. Okay, the book is awesome. It's also about a children's song that is algorithmic: "99 Bottles of Beer on the Wall." And again, it has that same algorithmic complexity, which gives us enough difficult stuff that we can talk about good design and refactoring techniques without teaching you about investment banking and shipping containers and things like that.

If you want to practice refactoring, a great place to go is exercism.io. I made that. It's awesome. The short story about Exercism is that there are lots of languages and you can do exercises. The real story is that once you've done an exercise, you go look at everyone else's code and start developing that gut feel about the trade-offs, what's easy or hard to understand, and have conversations with people about how that code could be more interesting. Thank you.