So this is not a normal thing that happens at RubyConf, but my name is Sam, I'm the track director for the testing track, and I'm actually really pleased Justin is here today because I owe him a huge debt of gratitude. I got very, very ill immediately before RailsConf this year, and I couldn't come and give my talk. And exactly two days before the conference, I texted Justin, who at the time was many, many time zones away, and I was like, Justin, listen, there isn't a backup speaker who can give a testing talk in time, and Justin was like, all right, I'll do this, that seems fine. So why did you reject my talk in the first place, Sam? So I just want to say a huge thanks to Justin for being here. He's definitely having a little bit of a rough week, as I think many of us are, so could we all just give a huge round of applause to Justin and welcome him to the stage? That was touching. Come on! It was touching, it was not an ironic, pithy statement. Those are coming. All right, let's roll. This talk is called Surgical Refactoring. My real name is Sam Phippen. If you don't get that joke, you can call me Searls. That's what my face looks like on the internet. If you have any feedback on this talk, you can reach me, Justin, at testdouble.com. That's right, I come from a company called Test Double. The way that we work at our company is we actually are consultants that work on existing engineering teams to just get a lot of stuff done, so that we can create some slack in the system, so that we can pay down technical debt, make things better, refactor. Our goal in life is to make the world a better place and make software a little bit less broken for everybody. If your team could use some of that help, you can say hello at Test Double, and we'll set up a call to talk. So this is exciting. Like, there's a national Ruby conference in Ohio, and I live in Ohio. So welcome to Ohio, everybody. If you live in Ohio, welcome to you too. 
A lot of people who know me know that I live in Columbus, and so they expect that I know all about Ohio stuff, that I'm all about Ohio, and I'm not. I'm actually from Michigan. That's right, I forsook the beautiful, awesome landscapes and lakes and everything that fill up the memories of my childhood for the rich cultural heritage of Ohio. This is the Circleville Pumpkin Show, and I'm actually not joking. Ohio has culture. I think where I've found it best is in its comfort food. There's a lot of fun stuff, because you go to the Circleville Pumpkin Show, and yeah, there's these hilarious, silly pumpkins everywhere. But they also have deep-fried buckeyes, which are chocolate peanut butter candies, and it's fantastic. Or if that's not your fancy, they have chocolate-coated, chilled pumpkin cheesecake on a stick. Very creative, very intense culture here. But it's not just desserts; no cuisine is safe. One of my favorite dishes in Columbus is Ohio nachos. They are kettle chips covered in queso and sprinkled with breakfast sausage. It's very hashtag health. But it's not that we're into food because we've got great ingredients. As far as I know, all we do is grow corn and animals that eat the corn. So it's not like it's about the ingredients. In fact, it doesn't really matter about the ingredients, because we're just going to deep fry it anyway. This is a grilled cheese sandwich. It is a grilled cheese that contains fried cream cheese jalapeno poppers on the inside of it. And then once they build the whole thing, they dunk the sandwich in pancake batter and deep fry that and serve it as a Monte Cristo sandwich. So that's one of my favorites. That's an example of American exceptionalism right there. The other thing about Ohio culture that I've learned since moving here is I feel like Ohioans are really competitive, including in the context of food. You know, if there's food and a stopwatch, someone's going to find a way to turn it into a contest. 
And that's something that I've learned to accommodate in my life. So Monday this week, I was just feeling like I really wanted a good sandwich. So I thought I'd go to the neighborhood deli. My neighborhood's got a lot of awesome restaurants. They're all in strip malls and they all have really generic names, but they're really good. So I wanted to go to my neighborhood deli, which of course is called Neighbor's Deli. And I walked in and I just noticed for the first time that they actually have a competitive challenge sandwich that they make. And you get your money back if you buy it and you eat the whole thing in 20 minutes. And I was like, all right, well, that's interesting. But you know what, honestly, I'm better than that. Competitive eating, this is wasteful. That's, you know, I used to have weight issues. That's not something that I'm going to do. I'm just going to get a normal little hot pastrami sandwich. But then I looked at the list and I was like, well, you know, I could sample all the different types of meats. And then you might catch this up here: it's called the Monolith. And I said, I know a few things about picking apart monoliths. Maybe I can do this. And so then I thought about all of you today and I didn't want to let you down by skipping this challenge. And so this is what got served to me, zooming out a little bit. Like most monoliths with which we are familiar, it is falling over on itself and it was not exactly built to spec. There's like a pound and a half of corned beef at the bottom. So all of that for less than a grilled cheese in San Francisco. So what did I have to lose? Of course, I looked at that thing and I was like, nope, I'm out. But maybe I picked up some Ohio culture, because the waitress said, all right, 20 minutes, go. And I was like, yeah, I got this. So that's me and my sandwich. And I dove right in, and, you know, developers, we're familiar with doing stupid things under extreme time pressure all the time. 
20 minutes, how do I, okay. So I made a huge mess. It was disgusting. It didn't even taste that good. And I was feeling sick. And I knew like if I pushed any harder, I'd still fail and just be sicker. So I was like, I'm going to quit. And there was this elderly lady who'd come in and she was just watching me, and she said, you can do it. In a sweet, grandmotherly tone. And I said, I really can't. And I shouldn't. And why are you rooting me on? This is unhealthy. And she said, no, we believe in you. And she was really serious at that point. And I was like, well, I give up. And she said, so don't. I don't know if that is also part of Ohio culture, that there's just this sort of, you know, unjustified faith in others. Sam mentioned it was a rough week for a lot of us, I think. And the reason that I do these conferences, the reason I'm here today, is because I do believe in all of you. And I believe that you can do great things. So I just wanted to say that before we got into stuff. Anyway, this is a talk about failing to conquer monoliths. So I propose nothing. First, let's back up and talk about some context. I love Ruby. Ruby is obviously a super successful, awesome language. If you think about what made Ruby successful in the early goings, everyone was really happy. People were building gems just for fun, just for the accolades and the attention you get for being associated with Ruby. And the thing about languages is that the early days of success are determined by your ability to make it easy to make new things. Because gems need to exist, you need to attract people to the ecosystem, and it needs to be easy to learn and pick up. And Ruby was awesome at that. But later success is fundamentally different. We're 20 years in now, right? And past more than 10 years of Rails. People are more critical. It's an incumbent. It's not the new shiny thing anymore. People are using it at work. It's in much more serious contexts. 
And a lot of money is riding on these systems being long-term maintainable. It's a very different mindset, isn't it? And so later success for languages, if you look at something like Java, I would say it's based on whether you're able to make it easy to maintain old stuff. And I really don't feel like Ruby's ever excelled at that. And so my challenge in writing this talk was to ask myself, is there anything that we can do as a community to make it easier to maintain old Ruby code? And I thought about it. And I thought that the best way to pull that thread would be to refactor some legacy code, because that's the context in which I think most teams struggle. So if you're not familiar with the terms, the word refactor is a verb defined by Martin Fowler as to change the design of code without changing its observable behavior. That's a great definition. I'll add some purpose to it by saying, typically I refactor to change code in advance of the bug fix that I want to make or the feature that I want to implement, so that the job of doing that is easier. It's like pre-factoring. It's getting the system in shape so that I can slot that new feature in later. The other term here, legacy code, isn't well defined. It has a lot of definitions. Some people just use it to mean old code; Michael Feathers's definition is code that doesn't have tests; others mean code that we don't like. But today my definition is a little bit more specific. I say legacy code is code that we don't understand well enough to change with confidence. Whether you have tests or not, that's, I think, the one that best discriminates what it's like to maintain old stuff. Today we're going to talk about refactoring some legacy code. If you're here to talk about refactoring, you probably know refactoring is hard. 
I think refactoring legacy code is really hard, and what makes it really hard is it's easy to accidentally break unrelated functionality, because there are so many variables, so much complexity all tangled up. And as a result, most of us view legacy code refactors as a fundamentally unsafe thing to do and no fun. Additionally, they're hard to sell to people. The way I'd visualize that is a little two-axis graph: business priority on one axis and the cost and risk of implementation on the other axis. In the top right, you could put new feature development. It's very important, but it's also expensive. And in the top left, you'd drop bug fixes. Also important to the business, but relatively less expensive. Bottom left, I'd probably put testing. Certainly less important than those two. Obviously important to us, but also not so expensive that the business doesn't let us do it. And what goes in the bottom right? Well, if I had to put it anywhere, I'd put refactoring in that corner, right? It's very expensive and it's a nebulous business priority. So we don't have to sell our businesses on letting us build new features. That's probably why they're paying us a salary in the first place. Bug fixes probably aren't going to be hard to sell either. And testing has become normalized culturally now in software, so typically we're afforded time to do it. But it's still really hard to sell people on refactoring and habitually paying down the technical debt in our projects. Think about why it's hard. I mean, it's because we can't necessarily predict how long a refactor is going to take us. From the business's perspective, we just said that the definition of refactoring is that you're not changing the observable behavior. So if you spend a month refactoring something, they can't tell whether you were actually refactoring or just playing video games. So it takes a lot of trust. And additionally, the areas that need the most refactoring tend to be all tangled up. 
And so as a result, when we're refactoring a particular area in our code, it's not safe for other people to be working there. We have to stop everything so that we can merge it in, otherwise we'll have all sorts of merge conflicts. And so it's very disruptive to do a lot of refactoring. And what you notice is that complexity is correlated with importance. The more complexity in any bit of code, the more branches and conditions and everything else that got thrown in there, it didn't get there by accident; it got there because it was really important to the business that it cover all those cases. And so the things that need the most refactoring are also the things that we're most afraid to change. So refactoring, sure, it's down there in that corner; it's relatively low priority because it's so hard to sell. What could we do to make refactoring a better sell to the business is the first thing we should think about. Like, how could we raise the priority? Because in their minds refactoring feels like road construction. We're telling them they're going to get less of what they need more slowly, but money is going to continue to fly out the door at the same velocity that it normally does. And we have a few strategies for dealing with it. None of them good. First, we can try to scare them. We can say, hey, well, if we don't refactor, then someday we'll need to rewrite everything. And that's far in the future. That's too nebulous. Or, your maintenance costs will be higher. And that doesn't help because it's hard to quantify, right? The next thing people do is they try to absorb the cost of refactoring as part of every story. Like, for instance, in this little pie chart, we probably spend some amount of time planning, some amount of time doing development, and some amount of time testing. 
Well, the team could just agree, like, for every single card we're going to grow the pie and add habitual refactoring, and we're going to do refactoring as part of every single story. And that would be fantastic, except it requires extreme discipline, which probably means it doesn't scale and it won't work on every team. And additionally, if the team is ever under any kind of time pressure, which is most teams, it's going to be the first practice that goes out the window. So I don't think that's going to be successful. The most common thing I see as a consultant is the strategy to take hostages. So the business is like, hey, I've got features one, two, three, and four, and I want them in this order. And then we say, oh, au contraire, you're not going to get feature two until we pay down this technical debt. And you're not going to get feature three until we pay down that technical debt. And I don't like this because it's adversarial, right? It blames the business for having rushed us in the first place. Additionally, did you know that software developers are, like, highly paid and expensive to businesses? So it erodes their trust in us if we tell them that this thing that we just built them six months ago was actually shoddy junk and we need to go fix it. And if we get in the habit of telling them that, eventually they're probably going to find new developers. So, yeah, refactoring is hard to sell. This is not a talk about figuring out how to solve that problem, because I haven't yet. I think that there's a lot that we could do, but a lot of it's also cultural. So let's just give up on that for now. And let's talk about the other axis, cost and risk. Why is it so costly and risky? Well, from a developer's perspective, it's a lot of pressure to do refactoring, right? You have to keep a lot in your head. It's really scary, the scariest, darkest, dankest basement of the code base. 
You feel like you're under a lot of time pressure because, you know, getting any sort of allowance to spend time on this stuff can be difficult, like we just talked about. And the tooling isn't really that great. Most open source tooling and libraries are written by people who don't want to be thinking about the legacy mess that they have. It's about creating new stuff. That's where most of our attention goes. We don't think of it as being 80% of our job, even though it probably is. So the tools aren't that great. And that makes refactors feel really scary. And if I'm on any kind of mission, if there's any theme to the work that I've been doing in my career, it's that I try to find all the scariest things about dealing with the complexity of software development and somehow make them less scary so that I can be productive. If you're on board with that message, I think that you should buy my book. I've been working on a book for a few years. It's called The Frightened Programmer. That is a joke. I am way too afraid to write a book, so that book does not exist. Let's talk about what we can do to make refactoring a little bit less costly. Well, the first thing that we already have is the Refactoring book's patterns and that sort of approach, where what they do is they define a handful of operations, like extract method, pull up, push down, or split loop refactors. And these are safe operations that we can do in our code, but they're made safer when we have good tools through language introspection, like static analysis in Java. My favorite thing about using the Eclipse IDE with Java is I have this right-click menu and I can do all these things, and I'm basically guaranteed that all the references can be matched, and I don't have to spend all day grepping around for the changes I want to make. I can't do that in Ruby, but even if I could, these sorts of operations aren't expressive enough for me to radically redesign stuff. 
I can move things around and make things a little bit better, but I don't think that refactoring patterns in and of themselves are enough of a solution. Characterization testing for refactors was pioneered by Michael Feathers in his seminal book, Working Effectively with Legacy Code, published in 2004. Basically, all it says is: treat your legacy code like it's a black box, and then put a little test harness around it. Write a test just for that black box of code, and then just pass arguments to it, listen to the result that comes back, and write a test that locks that in with an assertion. And then do it again, and do it again, and pass in different arguments to try to cover every single case that you might anticipate, and lock it all in with a whole bunch of assertions. There are no wrong answers. If you find a bug or a weird return value, you might make a note or file an issue, but the goal here is to just crystallize the current behavior of the code and not try to jump ahead to fixing stuff. And once you have that, that black box becomes transparent enough that you can go in, you can be as aggressive as you want, you can delete it all if you want, and you can refactor into new units that you do understand, that you would be able to change with confidence. And then you backfill those with unit tests that actually understand how that system is supposed to work, whereas the characterization tests have no clue of that. But you know what, that's a lot of testing, that's a lot of work, that's a lot of sunk cost, especially because the next step after you've done all this is to blow away those characterization tests, because they don't understand how the system is supposed to work, and if you were to keep them around, they'd be like an albatross just holding you back and increasing the carrying cost of the code. But if you're a team that has a lot of legacy code, you probably don't have a lot of code coverage. 
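To make the characterization-testing workflow above concrete, here's a minimal sketch in plain Ruby. The `legacy_price` method and its discount rule are invented for illustration; the point is that the assertions record what the code currently does, not what it should do.

```ruby
# A hypothetical legacy black box (made up for this sketch): we don't know
# why it behaves this way, and for characterization purposes we don't care.
def legacy_price(quantity, rate)
  total = quantity * rate
  total > 100 ? total * 0.9 : total # some mystery discount rule
end

# Characterization tests: call it, observe the result, lock it in. There
# are no wrong answers -- surprising values get a note, not a fix.
raise "behavior changed!" unless legacy_price(2, 10) == 20
raise "behavior changed!" unless legacy_price(20, 10) == 180.0
```

Once enough inputs are locked in, you can refactor aggressively and keep rerunning the assertions; when the new units have real unit tests of their own, the characterization assertions get deleted.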
So you look at that, and it's really hard, because you just finally saw your code coverage go up, and now Justin's telling you to go and delete all those tests, and you're going to see it go down again. That's depressing. The other thing I've seen with a lot of teams that try to do characterization testing is they fall into this trap of only half finishing this stuff. So you end up with a whole bunch of characterization tests that they come to rely on, but they never actually follow through with fixing anything, and it just adds to the nine-hour build process of semi-quasi-integrated tests. So that's not super helpful. And the third approach I see people use is akin to A-B testing, or maybe we've learned it from A-B testing. Basically, you've got the old code over here, you write some new implementation of the code, and then you put a router in front of it, and you say, 20% of the time we're going to go to the new code, 80% of the time we're going to go to the old code. And that way you can limit the amount of damage the new code can do, because you're just releasing it to some small group of people. GitHub has written a gem called Scientist that's like an A-B testing tool, but specifically for this kind of experimental activity. And, you know, I think it's great. Some concerns I have are that it doesn't answer any questions about how to do that big rewrite. You know, it's moving in a very big step, and it can be difficult to figure out everything that it would need to do and how it would have to behave, and you need to have very sophisticated monitoring and analysis to be able to understand what is happening to those users who are using the new code path, which not a lot of us have. And finally, you have to be working in a business domain like GitHub's. GitHub just goes down a lot, right? And that's fine. I'm not kidding. If it was a financial transaction, that wouldn't be appropriate. If it was a healthcare thing, that wouldn't be appropriate. So GitHub is safe to experiment with. 
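The 80/20 routing idea can be sketched in a few lines of plain Ruby. This is a toy illustration, not Scientist's actual API; the `Router` class and its parameters are made up here.

```ruby
# Toy percentage router: sends new_percent% of calls to the new
# implementation and the rest to the old one.
class Router
  def initialize(old:, new:, new_percent:, rng: Random.new)
    @old = old
    @new = new
    @new_percent = new_percent
    @rng = rng
  end

  def call(*args)
    if @rng.rand(100) < @new_percent
      @new.call(*args) # the risky rewrite, limited to a slice of traffic
    else
      @old.call(*args) # the battle-tested legacy path
    end
  end
end

old_add = ->(a, b) { a + b }
new_add = ->(a, b) { a + b } # hopefully-equivalent rewrite
router = Router.new(old: old_add, new: new_add, new_percent: 20)
router.call(40, 2) # => 42 via either path
```

In practice, the talk's point stands: a router like this only limits blast radius; you still need monitoring to know whether the 20% slice is actually behaving.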
And some of our domains are that way, and if yours is, then you can use that sort of approach. But it's not going to work for everybody. All right, so if you view that as a spectrum, with characterization testing on the left and A-B experiments on the right, you can see this weird divide where Working Effectively with Legacy Code is great in development, a little bit painful in testing, and has almost no advice about what to do about staging and production. Something like Scientist or the A-B testing approach doesn't tell you how to develop that new thing or test it locally. It might be really useful in a staging environment where you can experiment and see how things are working. It might be a little overwhelming in a production environment, but it answers those questions much better, obviously. And so, thinking about this talk: what if one tool could just give me a good development story, a good testing story, a good staging story, and a good production story, and kind of carry me through the entire life cycle of a refactor? Because I'm not scared of just any one of those stages. I'm scared of all of the stages of a big refactoring, and I want something to just carry me through the whole thing. And I was thinking, like, what would that tool look like? Does it exist? And I did a lot of research, and I did a lot of thinking, and I did a lot of procrastinating, and then nine months passed, and then I was like, oh crap, I have to give a talk on this. And I thought, like, I should give a standard-issue Justin Searls talk of 700 snarky slides and say how I think things should be done. But then I had this cool idea that was like, hey, I could just write a Ruby gem that actually helped people. And so I did that instead of just writing a whole bunch of snarky slides. I practiced this methodology you might be familiar with called TDD. So I used TDD to build this gem that we're going to talk about for the rest of the talk. 
And this is one of the things about talk-driven development, where you submit abstracts that then commit you to massive amounts of work. And what you check out at the other end of this is a gem that we call Suture. So it's up on GitHub under testdouble; you can find it there. That's what GitHub looks like. You can install it with RubyGems. That's how you install a gem. And the metaphor here is that refactors can be treated like surgeries. So the similarities are, you know, surgeries try to solve these intractable problems and make us feel better. They require careful upfront planning. They leverage tools that are very different and used differently in different contexts, just like we want to use this in different environments and different modes of thought for development. They follow very clear processes, not for an arbitrary reason, but because there's so much variation and there are so many distractions that following a clear process can help you understand what's going on. And they have a plan for long-term observation. Obviously, while you're under the knife, you've got a whole bunch of people looking at everything. But then in your follow-up checkup, it's a little bit of a step back, and then you might have years of other follow-up that's just, you know, a lower-resolution measurement that everything is okay and safe and successful. Of course, like surgeries, refactors can get pretty bloody, too. I mean, things can get messy, and this is a way to control things that get messy. So Suture, the way it works, is just kind of like nine features that we're going to talk through, that are each there to help you out through this particular workflow and hold your hand. The first step, we plan out the refactor, and then we cut what is called a seam in the code, that is, a call site to the legacy code. Then we record all the interactions that pass through that seam, the arguments passed in, the results that are returned. 
We validate those recordings against the old code to make sure we can replay them back, that the recordings are valid. Then we can refactor as aggressively as we like into a new implementation, which we can then verify by replaying the same recordings against the new implementation, so that locally we're pretty confident. Once we get up to staging, because we've got all this stuff configured, we can actually just run the same critical path of code in a staging environment, the old stuff and the new stuff side by side, like double-entry accounting, and if they disagree we throw an error and explain what just happened. And we can use that same configuration in production, so if anything blows up in an unexpected way, we can fall back from the new path to the old path, so that users are not interrupted by our mistaken or buggy refactors. Finally, when we're confident everything's done, Suture is meant to be deleted. We pull it out, and then we just point everything to the new code path, and you can call the refactor complete at that point. That's the process, and first we're going to talk about how to plan. Today we're going to have two example bug fixes. First, we're going to have a silly little calculator service. Forgive it, but all of these code examples are very contrived for the purpose of clarity. This calculator service is supposed to be able to add numbers, but it doesn't add negative numbers correctly. It's an example of a pure function. Pure functions are always easier to deal with, because you pass in arguments, you get a return value. Here we instantiate a new calculator. We call add with a left operand and a right operand. We assign it to an ivar. If you look at the implementation of this add method, it's defined here, and then, for the right operand's number of times, we add one to the left. Which, of course, is where our bug is, right? Because we're always adding and never subtracting. You're probably looking at this being like, that's really ugly code. 
Well, you know what, your legacy code is really ugly too, so deal with it. Our seam is obviously right here. We're calling add there, so that's where we're going to be introducing Suture later. The second bug is that we also have this tally service, which is stateful, which means we're having a side effect. It doesn't handle odd numbers correctly. We're going to call this the mutation case. If you look at this one, it spins up a new calculator, loops over some number of parameters in some collection, and then for each of them calls tally on that number, and then finally assigns the result to the total of those things. You can look at this awesome implementation here: it lazily instantiates an ivar called total, counts down from the number, and then if it hits the halfway point exactly, doubles that and adds it to the total. So, yeah, it doesn't work on odds, right? Because these are all integers. So that's pretty cool code. We call tally there, we call total there, and you realize it's not super clean, right? So this seam is more complex. We're going to have to figure out how to cut this, and we'll take a look at that in just a second, because now we're going to cut these seams. Again, the pure function first; a pattern will emerge here: the pure function is always easier to deal with. TLDR, write more pure functions, because they're going to come back to bite you less often. Here, what we do is we take this code that existed, and we replace the actual call with a call to Suture. We say Suture.create, and we name it something; we name it add. We pass in the arguments now as an array of args, and we tell it this is the old code path, and we pass it the method, and this can be anything that responds to call. It doesn't matter if it's a proc or a custom class. And initially, this setup so far is a no-op. I would do this, and then I would run the code and just make sure I didn't break anything. It should continue to behave just like it normally would. 
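Reconstructing the contrived calculator from the slides (the names here are my guess, not verbatim slide code): `add` bumps the left operand once per unit of the right one, and since `Integer#times` yields zero iterations for negative numbers, negative right operands are silently dropped.

```ruby
class Calculator
  def add(left, right)
    # Buggy on purpose: (-8).times runs zero iterations, so negative
    # right operands never subtract anything.
    right.times { left += 1 }
    left
  end
end

calc = Calculator.new
calc.add(2, 8)  # => 10, correct
calc.add(2, -8) # => 2, wrong (should be -6)
```

The seam then replaces the bare `calc.add(left, right)` call site with something shaped like `Suture.create(:add, old: calc.method(:add), args: [left, right])`; since the old code path only needs to respond to `call`, a method object, a proc, or a custom class all work (check the gem's README for the exact option names).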
It's just going to call through by default. In the mutation case, we're going to cut that one now. So we look at this one, and this is, again, more complex. We're going to replace the call to tally with Suture.create tally, passing the args of calc and n. To which you should say, wait, what? Because calc isn't an arg. How do we design these seams when we don't have a pure function, when side effects are involved? Pure functions, they're a black box, right? So if we call add with 2 and 8, we're going to get 10. And if we call it with 2 and 8 again, we're going to get 10. And that makes all of this really easy, because it's repeatable input and output. But mutation and side effects are hard, right? We call tally with 4, we get 4, and if we call tally again with 4, we get 8. So now you have to consider that it's not an argument per se in language terms, but effectively the total ivar on our calculator is part of the state that influences the behavior of the method. So what we could do is we could just logically say, well, calc with ivar total 0 is the first parameter into this thing, and now I can force repeatable inputs and outputs. This is counterintuitive, and there are not very many other contexts as programmers where we have to think this way. But it seems to work here in this case. So we're going to treat that like an argument. We're broadening the seam, and that means we can't just simply delegate to tally. We have to actually write a custom proc here, and it's going to take in the calculator, the state of the calculator, as well as the number. We're going to call tally with the number, and then we're going to return the total, which is something that the actual code doesn't do, because it's void, it just returns. But we're going to return the total here so that we can build these recordings that take in all the state that matters and then return all the result that we care about, so the recordings are actually meaningful and useful. Speaking of recordings, here's how we set up these recordings. 
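Here's a simplified sketch of the repeatability problem (a plainer tally than the slide's buggy halving version, invented for illustration): the hidden `@total` ivar makes identical calls return different results, so the broadened seam treats the calculator itself as an input and returns the total as the output.

```ruby
class TallyCalculator
  def tally(n)
    @total = (@total || 0) + n # hidden state: the real "extra argument"
    nil                        # void return, like the example in the talk
  end

  def total
    @total
  end
end

calc = TallyCalculator.new
calc.tally(4)
calc.total # => 4
calc.tally(4)
calc.total # => 8 -- same argument, different observable result

# Broadened seam: pass the stateful object in, return the state we care
# about, and the input/output pair becomes repeatable and recordable.
seam = ->(calculator, n) { calculator.tally(n); calculator.total }
seam.call(TallyCalculator.new, 4) # => 4, every time
```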
All you have to do is add to the configuration record_calls true, and it'll start recording every single call that's made to that seam. Also, almost every single one of these options can be set with environment variables, so if you're running Suture in a deployed environment, you don't have to make source code changes to the configuration. And to record some calls, you can just invoke it. So you could open up a Rails console, you could create a controller, set some params, call show, and that's going to record. Different params, call show, that'll make a recording. Different params, call show, that'll make a recording. There are a lot of ways to add cases. You could record via the browser if you wanted to, and click on stuff and have that invoke the code. You could throw it up in production, record there, and pull the snapshots down later if you'd like. That seems safe. I think it would work. I don't know. I haven't done it. The mutation case: again, just add record_calls. There's no extra complexity here. You know, pass in a few numbers, they're all even. Pass in some more numbers, great. You pass in one with an odd and an even. You pass one in with all odds. You make sure you cover all your cases. Where does this get saved? Well, Suture actually just on-demand instantiates a SQLite database in this path. You can set it wherever you like. And it just dumps all those recordings in there, and creates the database if you've deleted it or if you don't have one handy. And it uses Marshal.dump. Marshal is part of Ruby's standard library, and it's a way to convert Ruby objects into bytes and back into Ruby objects later. At this point, you might be curious: does this work with Active Record objects? I had been writing Suture for about 10 days at the point that I was like, I should probably see if this works with Rails stuff, since a lot of our legacy code is Rails. I took a look at the Gilded Rose kata. 
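Since the recordings are persisted with `Marshal` from Ruby's standard library, a quick round-trip shows the mechanism: an arbitrary Ruby object goes to a byte string and comes back as an equivalent object. (The hash shape below is just a stand-in for what a recording conceptually contains, not the gem's actual schema.)

```ruby
# A recording is conceptually just a name plus arguments plus a result.
recording = { name: :add, args: [2, 8], result: 10 }

bytes = Marshal.dump(recording) # serialize to a String of bytes
restored = Marshal.load(bytes)  # reconstruct an equivalent object

restored == recording # => true
restored[:args]       # => [2, 8]
```

Note that `Marshal` handles most plain Ruby objects, but some things (procs, IO handles, singletons) can't be dumped, which is worth keeping in mind when deciding what state to pass through a seam.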
If anyone's familiar with this kata, it's a really cool exercise that Jim Weirich ported to Ruby; it's still up in a repo on Jim's GitHub page. You don't have to read this, but this particular code takes in an item, we call update_quality, and I've configured the lambda to return the mutated item, so that's what we're measuring there. And I made a little Rails app around it, a real beautiful Rails scaffold. So I create a few items that are significant for the purpose of that refactor activity, and they're listed in the list. And update_quality, that's where the critical path is. So when I click that, stuff happens, and then I check my SQLite database and make sure the recordings are good. I click it again, make sure a few more things get in. And, yeah, Rails apparently works, so, sweet. This all does work on Rails. I've actually tested it, and if you're interested, in the example directory of the repo you can play with a Rails example based on that. Now, we have to validate that these recordings can be played back in a test environment. So in the case of the pure function, we just write a test: create a calculator, and then the second API we're going to look at, Suture.verify, will replay all of our recordings. We name it add so that it knows which rows from its database to go look up; it has to match the production name. And then the subject is whatever thing we want to test, whatever thing we want to make sure matches up with all those recordings, and here we pass in the calculator's add method. Once we've done that, at runtime it's going to verify every single recorded argument set against the recorded result, whether that was a return value or a raised exception. And you can imagine right in your head that what we basically just got was a whole bunch of disposable characterization tests.
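Here's a toy illustration of the replay idea, independent of the Suture gem; the names and the recording shape are invented, but the mechanism is the one just described:

```ruby
# Replay recorded argument sets against a candidate implementation and
# check that every result still matches. This is the essence of the
# verify step, reduced to a few lines (not Suture's real API).
ADD_RECORDINGS = [
  { args: [2, 8], result: 10 },
  { args: [3, 4], result: 7 }
]

def verify_against(recordings, subject)
  recordings.all? do |rec|
    subject.call(*rec[:args]) == rec[:result]
  end
end

verify_against(ADD_RECORDINGS, ->(a, b) { a + b }) # => true
verify_against(ADD_RECORDINGS, ->(a, b) { a - b }) # => false
```

Each recording acts as a disposable characterization test: record once, replay against any candidate implementation.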
All we had to do was record. But unlike other characterization tests, we don't have this tremendous sunk-cost fallacy wanting us to keep them forever. In fact, they don't feel like tests; they're just rows in some rando database. So we don't feel any unnecessary sense of attachment to our characterization tests, which is one thing I really like. In the mutation case, it's very similar: with tally, we create a lambda that behaves exactly like the production one. It should look exactly like it, because we're expecting it to behave the same way. And one of the fun things: I'm not a big code coverage fan, because I've seen it abused on a lot of teams and used as a whipping metric for people, but this is actually a really useful case for code coverage, because it can guide our recording activity. So if you look at the Gilded Rose kata again, here's how Jim Weirich, in his example repo, did the characterization test using his rspec-given library. If you've never seen rspec-given, I love it; it's really, really terse. But even though it's really terse, he had to write something like 240 lines of custom tests. And I highly doubt he was in the mood, after this was all done, to go throw those tests away and write more isolated tests later, right? So that's not ideal. Doing the same testing, covering all the same behavior with Suture.verify, was one test. I called Suture.verify and passed in a subject that calls update_quality and returns the item. So it's actually dumping the item before the call and then dumping the result after the call, so the delta is captured in every single row. And I have an additional option here, fail_fast: true, for when you expect all the recordings to pass, so that if something fails it doesn't waste your time by running all the other ones. And then, before I called it done, I just ran a simple coverage report, took a look at it, and was like, sweet, everything's covered.
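Ruby ships a Coverage module in its standard library that can play exactly this guiding role. Here's a sketch; the legacy_add method and its bug are invented stand-ins, written to a tempfile so the example is self-contained:

```ruby
require "coverage"
require "tempfile"

# A stand-in "legacy file" whose coverage we want to measure while
# replaying recorded calls against it.
legacy = Tempfile.new(["legacy", ".rb"])
legacy.write(<<~RUBY)
  def legacy_add(a, b)
    if a + b < 0
      a * 0
    else
      a + b
    end
  end
RUBY
legacy.close

Coverage.start
load legacy.path
legacy_add(2, 8) # replay one recorded call; the negative branch never runs

result = Coverage.result
counts = result[legacy.path] || result[File.realpath(legacy.path)]

# counts holds per-line execution counts (nil for non-executable lines);
# lines with a count of 0 show which inputs we still need to record.
UNCOVERED_LINES = counts.each_index.select { |i| counts[i] == 0 }.map { |i| i + 1 }
```

Here UNCOVERED_LINES points at the negative branch, telling us we still owe the database a recording with a negative sum.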
And if something wasn't covered, you can click a button, you know, cover it that way, and then run the coverage again. So this is really fun: you can get legacy code, the hardest-to-test stuff, to 100% code coverage while writing zero custom tests. Which I think is pretty neat. So at this point, finally, we can refactor, right? You came to this refactoring talk, so you're like, I'm going to learn about refactoring. Well, I have a secret, and you have to not tell anyone the secret. I don't know if we can cover the camera lens or something. The secret is: I'm actually really bad at refactoring. I don't know very much about refactoring. All I know about refactoring is that when I'm refactoring scary code, I tend to hold my breath, and sometimes I'll go hours and be like, man, I'm graying out. So now you know my secret to refactoring. Our friends in the community, Sandi and Katrina, wrote this book, 99 Bottles of OOP, and it has refactoring as a very heavy emphasis. So if you do want to learn more about actual refactoring, I'd strongly recommend you check this book out. It's a cool book because, unlike mine, it actually exists, so it has that going for it. In the case of the pure function, this refactor: these are simple, contrived examples. You look at this, and the problem is that it doesn't work for negative values. So we're going to create a whole new method, so we can call both of them, as opposed to changing the existing one. And we're going to do this clever thing where we just say left plus right to add the two things. But I left some extra space there, because I'm going to do something weird: return early if left plus right is less than zero. I'm actually re-implementing the bug, and then marking it with a FIXME. Because remember, I want to retain the current behavior exactly, bugs and all. This is super counterintuitive. But remember, this is a separation-of-concerns thing.
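A sketch of what that step might look like; the specific bug shown (returning nil for negative sums) is a guess for illustration, since the talk doesn't spell out the original code:

```ruby
# Refactor step: add a new method alongside the old one so both stay
# callable, and deliberately reproduce the old bug until verification
# proves the behaviors match.
class BuggyCalculator
  # the old, scary implementation stays in place untouched
  def add(left, right)
    return if left + right < 0
    left + right
  end

  # the new implementation: same behavior, bug and all, for now
  def smart_add(left, right)
    return if left + right < 0 # FIXME: reimplemented bug; fix after the refactor
    left + right
  end
end

calc = BuggyCalculator.new
calc.smart_add(2, 8)  # => 10
calc.smart_add(-5, 2) # => nil (the preserved bug, flagged with FIXME)
```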
Refactoring is not about jumping in and fixing things as soon as it feels comfortable to fix them. Refactoring is about reworking everything so that the fix can come in later. And we're not confident of that until we can be sure the new code behaves exactly the same way the old implementation did. Additionally, it can be really arrogant to just rush in and fix a bug without considering that some higher-order caller might be depending on that buggy functionality in ways we don't anticipate. So I'm all about taking my time before I actually make the fix. Now at this point we've got to backfill with real unit tests. So, a simple test of adding two numbers. This is the most fascinating slide in the whole thing: you add things, and then you assert that they add successfully. And additionally we add a test for negative adding. Here we'll just skip it, because it's not implemented yet, but obviously we don't want to get through this and forget to fix the bug. Then the mutation case, similar story: it skips the odd values. We're going to ignore all this and write a new one where we lazily instantiate the same ivar, we add, and then we return, because we still want to return nil to capture the existing behavior. But we also want to reimplement the bug, so we're going to have a guard clause at the top, just return if something's odd, and a FIXME note. So, Kent Beck has one of his most famous tweets. I don't know if he wrote this prior to Twitter, like in a book or something; I don't read books. But Kent said, make the change easy (warning: this may be hard), then make the easy change. So that's the mindset with which we're coming to this. Again, write some unit tests, so you call the tally a couple of times, make sure two and four add up to six, and then a skipped, pending test for the odd one. So we've refactored the code. Now we've got to verify that the new code paths can be played back against the original recordings.
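The mutation-case refactor just described can be sketched like this; the class and method names are reconstructed from the talk, not taken from real code:

```ruby
# New implementation of tally: lazily instantiate the same ivar, still
# return nil, and reimplement the odd-values bug behind a guard clause.
class OddSkippingCalculator
  def total
    @total ||= 0
  end

  def smart_tally(n)
    return if n.odd? # FIXME: reimplemented bug, odd values are skipped
    @total = total + n
    nil # preserve the original tally's void return
  end
end

calc = OddSkippingCalculator.new
calc.smart_tally(2)
calc.smart_tally(4)
calc.total # => 6
calc.smart_tally(3)
calc.total # => 6 (odd value skipped, matching the old behavior)
```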
So in the pure function case, we're going to test it out: create a calculator, call Suture.verify :add, and pass in the subject, the calculator's new add method. Remember, you've got to call the new one. I made this mistake the first couple of times; I was like, sweet, it works, but I was calling the old one from the new test. And great, that works; that one just passes. Now, the mutation case, where everything is more complicated, isn't going to. We're going to call this with tally, and we're going to create that same lambda again, again being sure to call the new implementation. And this one fails. This one blows up. WTF. So, another thing that the late, great Jim Weirich imparted on me when I was writing one of my first gems: he said, Justin, the most important thing any library can do is provide really excellent, thorough error messages to its users, to help tell them how to respond to exceptional cases. So this is the error message you get. Instead of scrolling through a long test failure, you're scrolling through essentially a customized markdown README of what to do next whenever any of your verifications fail. Let's break this down. First, you get a list of all of the failures that occurred in that run. Then, if you look at any of those individual failures, it'll show you the arguments, as well as the expected and actual return values or error results. And there are a couple of little things to help you out: when you're running these, you can set a flag to re-run only that failure and just focus on that one. And if you decide that a particular recording is erroneous and you can just disregard it, there's an API in Suture to delete just that particular recording so that it doesn't show up again. Additionally, we have advice about how to deal with these failures.
So at the bottom of all the failure lists, it'll talk a little bit about how maybe the problem is in the comparison of the arguments and the results; maybe those aren't matching up. So let's talk about how we compare these results. Suture ships with a default comparator. It's real simple: it compares things with ==. You'd think that would normally work, but not always when you're dumping stuff out. So it has a backup plan of comparing their Marshal dump byte strings, and if those match up, it considers them equivalent. For ActiveRecord, it sniffs out whether it thinks these things are ActiveRecord objects, and if so, it'll compare their attributes hashes, minus updated_at and created_at. And you can add additional columns you'd like to ignore on a per-seam basis. But if you're using the default comparator and your thing doesn't equal the other thing for whatever reason, and you get stuck, you can implement a custom comparator. So let's suppose, just hypothetically, that this calculator we're working on has other fields on it that we don't really care about for the purpose of equivalency, and that the subject actually returns the calculator, so we're comparing two calculators with each other. Well, the comparator would simply be passed the recorded one and the actual one (we've all written comparison logic), and we would just return true if the totals match up, because the total is what we care about. Classes in Ruby are also a thing, so if you don't like writing a lot of anonymous lambdas, you can implement this as a class. You can actually extend Suture's default comparator, which means that when you implement call, you can return true if the default comparator is successful and otherwise fall back on whatever your custom comparator logic is. Then, on the seam, you just pass in an instance of whatever your comparator was.
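Here's a sketch of that layered comparison strategy in plain Ruby; the class names are invented for illustration and are not Suture's actual classes:

```ruby
# Default strategy described in the talk: plain equality first, then a
# fallback to comparing Marshal byte strings.
class SketchComparator
  def call(recorded, actual)
    recorded == actual ||
      Marshal.dump(recorded) == Marshal.dump(actual)
  end
end

# A custom comparator layers domain logic on top: pass if the default
# comparison passes, otherwise compare only the field we care about.
CalcState = Struct.new(:total, :noise)

class TotalComparator < SketchComparator
  def call(recorded, actual)
    super || recorded.total == actual.total
  end
end

# Two states that differ in an irrelevant field still count as equivalent.
TotalComparator.new.call(CalcState.new(5, "a"), CalcState.new(5, "b")) # => true
```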
So, going back to that error message, a couple more things show up in here. By default, Suture runs all of your verifications in random order, because that seemed like a good thing to do. If you have one that fails because of the random order, some order dependency, you can lock to a particular seed by setting the environment variable there, and if you know you have to run them, for some reason, in insertion order, the order they were recorded in, you can just set that seed to nil. Additionally, I wanted to make this as configurable as possible. So you can see here: the comparator, the database path, fail fast, all the defaults. You can limit how many verifications it runs, and how much time you're willing to spend, like, just run for five minutes; you don't want it to go for hours if you have a lot of database stuff. You can limit the number of error messages it prints out, and we talked about the seed. Part of the goal here, too, is that a lot of times we're not just refactoring, we're re-implementing something, and when you're re-implementing, you're starting at zero and working your way to the other side, so at first you're going to see a lot of errors. I'd never had an excuse to use a progress bar before, so I threw a progress bar in here. It gives you a sense that you're making forward progress: you're at 92%, this many are failing, this many are passing. It's a way to trick our brains into thinking refactoring is fun.
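The seed behavior just described can be sketched in a few lines; the method name is invented, but the semantics (a fixed seed reproduces the same "random" order, a nil seed preserves insertion order) mirror the talk:

```ruby
# Deterministic "random" ordering: shuffling with a seeded Random makes
# the order reproducible across runs; skipping the shuffle entirely
# preserves the order the calls were recorded in.
RECORDED_ORDER = [:first, :second, :third, :fourth]

def ordered_for_run(recordings, seed)
  return recordings if seed.nil? # nil seed: insertion order
  recordings.shuffle(random: Random.new(seed))
end

ordered_for_run(RECORDED_ORDER, nil) # => [:first, :second, :third, :fourth]
ordered_for_run(RECORDED_ORDER, 42) ==
  ordered_for_run(RECORDED_ORDER, 42) # => true: same seed, same order
```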
So, yeah: the total is nil and it's being passed one, so I think that's a hint. If we look here, we're like, oh right: one is odd, and it's returning before we've lazily instantiated the total. So we need to move that instantiation up to the top. Run it again, and now it all passes. This is actually a real bug I got caught on while I was making the slides, so that was affirming. Now let's talk about comparing the new and the old path, because remember, our mission is to make things good everywhere, and so far we've only done development and test; we haven't really considered staging and production. In the pure function case, all we're going to do is add another flag, call_both: true, and it's going to call both the new and the old code paths. In the case of the first one, it just works in all cases. But if you run it in staging, I think you'll find a lot of surprising inputs and outputs in most cases, and what'll happen in staging, because it's staging, is it'll just blow up with a big error message and explain exactly what happened. In the mutation case, we're going to set call_both to true, and, like always, it's going to be a pain, because it's not going to exactly work. It fails. We get another huge error message with as much advice as I could fit into it, and what it says there, you can see, is that we were called with two, the new code path returned two, but the old code path returned four. What's happening, of course, is the calc is being mutated by both paths. So I added another option here called dup_args, the idea being to protect against argument mutation: it'll actually dup the calculator before calling either of the paths. Of course, I was really proud of myself, and that still didn't work, because now the calculator we actually started with never changes, and its total is always nil. So you have to do another little trick.
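A sketch of the call-both problem and the dup idea just described; all names here are invented for illustration, and this is not Suture's internals:

```ruby
# A calculator whose tally mutates shared state.
class MutableCalculator
  def total
    @total ||= 0
  end

  def tally(n)
    @total = total + n
  end
end

# Without a dup: both code paths share one calculator, so the second
# path sees the first path's mutation (new returns 2, old returns 4).
shared = MutableCalculator.new
shared.tally(2) # "new" path runs first; total is now 2
shared.tally(2) # "old" path runs on the already-mutated calc; total is now 4

# With a dup: each path gets its own copy and both report 2. But note
# the caveat from the talk: the caller's original calculator now never
# changes at all, which is why the manual reassignment trick is needed.
def call_both_with_dup(calc, n)
  new_calc, old_calc = calc.dup, calc.dup
  new_calc.tally(n)
  old_calc.tally(n)
  [new_calc.total, old_calc.total]
end

original = MutableCalculator.new
call_both_with_dup(original, 2) # => [2, 2]
original.total                  # => 0: the original was never touched
```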
After you've made the mutation to the thing you care about, you do a reassignment. This is just a custom, one-off thing I had to do in this particular case. It got messy. Most legacy code doesn't have really neat seams; it'll have side effects, and it'll get messy. So this one worked, but, you know, I didn't feel real great about myself. And you've got to remember there are trade-offs, right? These features exist to try to make this easier; if a feature is making the refactor harder, then maybe it's not worth it. Just take advantage of the features that are useful to you. But if you did pay that penalty to get it working in staging, you also get this benefit in production for free. Because the thing I care most about is making my changes safe for users. The last thing I ever want to do is break stuff for users, and if the new path errors, I want to rescue with the old one. So in the case of the pure function, I just change call_both to fall_back_on_error: true. It'll rescue the new code path, if it raises, by calling the old one, so everything's invisible to the user. In the mutation case, I already paid the price of making this actually work when you call both paths, which means I can also just change the flag, and we're golden; it'll work fine. Obviously, if you only call the old path when you really need to, that's going to be faster than calling them both, with fewer side effects. In case you're worried about calling both things having effects on databases and stuff, that's probably something you should check out before you throw this into production. All the errors are logged. Suture actually comes with a pretty sophisticated logging system, so you can configure that, and you can merge it with your Rails stuff. I just recommend you keep an eye on it, so that if your new code path is failing 100 percent of the time, you know. Additionally, you can configure particular error types, because sometimes we expect our code to raise exceptions.
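The fall-back-on-error idea reduces to a rescue clause; this is a toy illustration of the concept, not Suture's implementation, and the names are invented:

```ruby
# Try the new code path; if it raises, rescue by invoking the old one
# so the failure stays invisible to users.
def with_fallback(new_path, old_path, *args)
  new_path.call(*args)
rescue StandardError => error
  # log `error` somewhere: if the new path fails 100% of the time,
  # you want to find out from your logs, not from your users
  old_path.call(*args)
end

NEW_ADD = ->(a, b) { raise "new path not ready" }
OLD_ADD = ->(a, b) { a + b }

with_fallback(NEW_ADD, OLD_ADD, 2, 8) # => 10, rescued by the old path
```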
So if it raises an error type that you expect, you can just register that class, and it won't be considered a failure; it'll let things keep passing through. All right. Finally, when we're done, we just delete it. Just like stitches, we remove Suture once the wound heals. In the case of the pure function, now this is the fun part, we just get to delete that test. We get an API here to delete all of the recordings for that seam. We get to rip this stuff out, the old path and the args and yada yada, and we just call back to the original method. It looks just like it used to, except it's calling our new path exclusively. That feels good. In the mutation case: blow away that test, blow away all of its recordings. It's literally like ripping off a band-aid. This is the most fun part. You're like, yeah, go away. All right, be four lines again, please. So you feel a lot better. So, you know what? We did it. That was the whole refactor, every step, and pretty much every feature in this thing. We should all feel really good. Note that we didn't actually fix the bugs. I added this slide because I'd given this talk three times before realizing we didn't actually fix the bugs. It's easy to get carried away in the refactor step, I guess. So, we did it, ish. Suture is ready to use; again, it's up on GitHub. We were very careful, because production is scary and refactors are scary enough, so we decided to release this initially at 1.0. We're going to respect SemVer; we're not going to make any breaking changes and surprise you. Additionally, I've been chatting with Michael Feathers a lot lately about ways to make production refactors safer, and a big part of that is deleting dead code. He just released a new gem this week called Scythe, which lets you register probes in production that basically act as tripwires to let you know: this code is still in use, this code isn't in use, the last time this code was called was this or that date.
The point is to give your team the information it needs to justify deleting old code that isn't used anymore. And all of this is part of an effort to make refactors less scary, so that Ruby can be more maintainable, so that we can keep using Ruby at work for many years to come. And I hope that even if you don't use these tools, you'll think a little bit more about ways that you can contribute in the community, because I've got a feeling everyone in this room has a lot of legacy rescue experience. Additionally, Noel had this great idea: Sam, Betsy, Noel, and I are going to congregate in the same kind of area at lunch. If you follow us to lunch after this, we'd love to chat with you about testing, or anything; I'll chat with you about anything, they've only promised the testing. We'll discuss it over some Ohio food. I don't know, yesterday was really hearty, so I'm sure it'll be good. There's one more thing I'd just like to say, my own word of thanks, my own gratitude. Test Double turned five this week. There is no way our company would exist without Ruby, without this community. I can't see you all, but I know there are a few current and former Test Double clients in the room, and a few Test Double agents in the room. Love all of you for this awesome journey. If you'd told me five years ago we'd be one of the most recognized Ruby agencies in the world, I don't know what I would have done. I was already at the maximum level of panicked, so I would have remained that level of panicked. So, yeah, I'm Searls. You can find me on Twitter; I'd like to be your friend. Let me know what you thought about this talk, and share it with other people; there's already a video of the version I gave in Japan online, so I'll tweet that out right away. Test Double, you know, we're on a mission to fix what's broken about the world's software. You can send us an email at join at testdouble if you'd
like to join up; we're always interviewing potential new agents who want to work with us. And, yeah, if you know any teams that have any legacy Ruby, we love working on this stuff. We love the complex problems, helping teams manage complexity. So if you know anyone, any teams, that could use some help, shoot us a message or grab me. I've got stickers and business cards, and I'd love to meet you. Oh, and that's the end of my slides. So: thank you.