 Every once in a while, I come across a bug that I can't help but admire. Bugs like these consume me. I forget to eat, I have trouble sleeping, it's exciting and it's stimulating, and when I figure out the devious ways in which they're causing the system to fail, I feel victorious. Princess One, Dragon Zero. This particular bug was not like this. This was the mind numbingly boring kind of bug, the kind that causes all of the other items on my to-do list to seem suddenly urgent. The bug was in an application whose sole purpose was to allow villagers to post gossip on the web. So, who got hitched, who had a baby, who has a birthday coming up, and the gossip would then get piped straight into the village rags production system. So, it would be printed on actual paper alongside the story of the largest fish to be caught off the dock since 1962, and the broken feature was the feature that when Alice posts a picture of her new baby allows Bob to tell everyone that they should come gossip about it. In other words, the broken feature was being able to gossip about the gossip. The full bug report read as follows, I tried to share by email but nothing happened. I opened up the application, put in my email address, clicked the button, and lo and behold the villager was right, nothing happened. I activated the JavaScript console and tried again and got a 500 error. The JavaScript was trying to make an asynchronous call to an endpoint inside the application, and the response that came back in the console was a wall of text formatted as HTML. I copied out the call with all the parameters so that I could call the endpoint using curl so that I could redirect the output into a file so that I could open it in the browser. And this showed me a stack trace that was exactly one line long. The execution had not even reached the controller action, a before filter blew up trying to send a message to nil. And so a quick investigation revealed that the before filter was completely irrelevant, and we could simply bypass it all together. So I was now expecting to get a 200 okay, a snippet of JSON, and an email, and instead I got another stack trace formatted as HTML, and this time the code managed to get all the way down to the email template before it blew up in something to do with translations. The stack trace drew out the entire feature. It starts in the controller in a method named share by email or SMS. This calls out to a module called sharing which has a method called send to recipients, which presumably is a collective term for email addresses and phone numbers. This stop in the stack trace is send to email in the same module. This further delegates on down to an action mailer which sends an email using a text template which is internationalized and interpolates strings from a locale file. The error message showed the exact line of code that was blowing up. The internationalization thing was some fancy string interpolation which took four arguments and the ERB template was apparently only sending in three of them. And so through a process of elimination it was straight forward to determine that the missing key was tidbit sender. So I patched the ERB template and then kicked off the email. And at this point I did get back a proper response. And also the email arrived in my inbox. So my work here is done. Management would be impressed. The truth is, I did what the French call le minimum sans et cal. This is a term that mocks the bare minimum. It makes fun of bureaucracy for setting standards that are so low that you literally could not do worse because at that point you wouldn't have done it at all. Now having fixed this feature, I was sorely tempted to tiptoe away. I knew that if moor bugs were discovered a few months down the line I wouldn't be in trouble because my name is not even in the commit history for the feature, aside from the two lines that I just touched. Still I was unable to turn my back on this code. And I'd like to pretend that the reason that I couldn't leave this alone is that I am a good person. And the truth is, I am driven by fear. The code might be decent or it could be a stinking quagmire of rot and decay and without at least glancing at it, I can't know. And I fear stinking quagmires of rot and decay. I dread getting sucked into them and being thigh high in mud and leeches and fighting off stack traces with one hand and wrangling bullion gates with the other. The last thing I want to do is wade into the swamp. Yet I fear that it's going to cause me X hours of pain. Let's say three, three hours of pain. If this thing blows up in my face one night then I'm very likely to spend those three hours and get it wrong causing me three to four days of hell and pain writing scripts to parse access logs so that you can put lost orders back into the database. Guess how I know this? It's this fear that motivated me to take one tiny step to just eyeball the code. Look at it and make an evaluation about how likely it was to blow up later. Now, before we look at the code, I'd like to take a moment to talk about the tests. Right, moving on. In the controller, in the share by email or SMS action, the first thing that caught my eye was a constant that I had never seen before. Tiny URL. It wasn't present in the stack trace nor did the email that I received contain a short in URL. The logical explanation for this is that it is only used in the SMS portion of the feature and it is true that we are building up the text message for the SMS here in the controller on every request, whether or not we've received any phone numbers to actually send this SMS to. However, the text message also does not contain a short in URL. In fact, once we've generated the tiny URL and assigned it to this local variable, that variable is never referenced again. So let's take a look at the tiny URL module. We have a base exception class and we also have an exception class. And incidentally, the only thing that the base exception class is used for is to define the exception class, which in turn is used exactly once. This is a bit cryptic, but the method name does a good job of explaining the comment and this is both cryptic and misspelled. But the method name explains what's going on. This is the core piece of Tiny URL. Now you don't have to read the code. Just notice that we are calling out to an external service over HTTP. Every time we call the share by email or SMS action, we call out over HTTP and then throw away the result. The author of this code has thoughtfully added caching, so we might not actually make the call on every request. And besides, it only happens in staging and production. You'll notice that the case statement always returns. Therefore, the last line of the method will never be reached. Also, you can see that the case statement has exactly two outcomes, true and false. The caching mechanism manages to use both a module variable and a module instance variable in the same breath. Now it turns out that the Tiny URL module is not used by any code anywhere ever, and so it can safely be deleted. The code has been racking up expletive points pretty quickly. Now expletive points or expletive points, if you're American, are also known as XP. We found two bugs where only one was expected. Bugs are known to congregate, so there are very likely more of them. There is dead code. There's a gratuitous case statement. The code uses a module variable. The code uses a module instance variable. The comments either state the obvious or are incoherent. We're building up a text message even when we're only sending emails. The entry point into the feature is a method named share by email or SMS. Apparently, the method does two things that are so unrelated that we can't even use and in the method name. There are no tests. The code is calling out over HTTP and never using the result. So we've accumulated 72 expletive points. But wait, there's more. And unless else is being used to determine the response for this action. And it's switching on a flag called invalid name, giving us a double negative for the happy path. The name flag itself is actually kind of spectacular. A name must not exceed 40 characters in order to be valid. The significance of the number of 40 is unknown. Out of curiosity, I kicked off the curl command with an invalid name and I got a reasonable error message and an email. The process goes like this. Having determined that the name is invalid, we send out the emails and text messages and then return an HTTP 400 bad request. Notice that nil is totally legit. And it seems likely that hitting the controller action without a name would result in an email being sent out. Instead, we get a 400 bad request and the error message tells us that they've already been told, obviously. It turns out we don't get an email because the send to recipients method has a guard clause against nil names. Let's look at what else it has to offer. In the sharing module, the send to recipients method is very complicated looking. Fortunately, there's a comment that explains what the method does. Unfortunately, every single part of that comment is wrong. There is no message key, the hash key is called SMS. There is no email recipients key, it's just email. And there is no SMS recipients key, it's called lumpers. And you'd think that it can't be more wrong. But even if all the keys were named correctly, the comment is still wrong because we are not sending an SMS to emails. And numbers, we just fixed a bug in the email template that the action mailer uses in order to handle the email portion of the feature. The method has code to clean up the text message, but that code never, ever gets called because the hash never has a message key in it. And then it cleans up the name. And this is actually reasonable aside from the fact that the controller validated that the name was less than 40 characters long before it passed it into this method. So having cleaned up the name, it could now be shorter and therefore valid. Then it goes ahead and validates the length of the name. This time with a cut off of 41. Now, once we've gotten past all the validation and clean up, we get to this block. So let's break this down. We combine emails and phone numbers that came in separately into a single array. Then we iterate over the whole lot and use regexes to figure out whether we have a phone number or an email so that we can dispatch, so that we can finally dispatch the message using the correct protocol. Now, according to the comment, the send to email method sends an email. The comment is only partially correct. This is very confusing, so I'm going to illustrate it with an example. Alice posts a picture of her new baby. Bob goes to the website and enters Charlie's email address to tell him about it. So the tidbit was created by Alice. If we don't have Alice's email address, we will skip blithely and pass the bit where we actually dispatch the email to Charlie, whose address we happen to have because Bob just gave it to us. Incidentally, it's quite likely that we don't have Alice's email address because in this application you sign up with your cell phone. Now, we do prefer to report good news, always. And finally, in the other branch of the conditional, we send the send to SMS message, which, according to the comment, sends SMS. Does this code say about the developer who wrote it? It's easy to criticize and a whole lot of fun. It's tempting to believe that whoever wrote this feature is stupid or a bad developer or at the very least a bad person. This is known as the fundamental attribution error. When somebody writes terrible code, we make assumptions about why that is. At one end of the scale, we attribute bad code to individual traits, to that person's fixed characteristics. They're not skilled enough or they don't care enough. And at the other end of the scale, we assume that it's situational, a momentary lapse due to contextual pressures. So at one end of the scale, they're a bad person. At the other end of the scale, they're having a bad day. The forces that push us around from moment to moment are invisible. They're obvious to us in our own lives. We cannot help but notice them because we're in the thick of it. Those pressures are hard to see from the outside. And the farther that you are from the situation, the harder it is to imagine them. And the result is that people we don't know are bad developers and we and our friends are just having a bad day. So how does an experienced, smart developer, like the author of this share by email or SMS, end up with a mess like this? Now, most likely, what happened to them is what happens to all of us, rush jobs and rapidly changing requirements? And then again, who knows? Maybe there was no deadline. Maybe they understood the code perfectly. Maybe they were just bored or drunk. If you strip away all of the fuzzy, muddled humanity of it all, it all comes down to an algorithm that is based purely on rational choices. Imagine a game where player one and player two may either cooperate or defect. If both players choose to cooperate, they both get three points, a reward for mutual cooperation. If both players choose to defect, they get one point each, a punishment for mutual defection. If player one cooperates and player two defects, then player two gets five points, known as the temptation to defect, and player one gets zero points, known as the sucker's payoff. In terms of software development, cooperating means whatever it means to you and your team. It could be taking the time to write tests or making sure that your methods are short and that their names are expressive and that you've deleted all the trailing white space. And defecting is not doing those things. Defecting is committing with no thought to the next person who's going to see it. And here's the thing. If player one knows that player two is going to defect, then it is in their best interest to defect as well. Both players get one point. On the other hand, if player one knows that player two is going to cooperate, then it is in their best interest to defect because they'll get five points and player two will get none. The rational choice is to always defect. Yet in doing so, everyone loses. Notice that this is not a zero sum game. Players do not have strictly opposing interests. The only way to get the highest combined score is to cooperate. If everyone looks out for themselves, nobody wins. And of course, this is overly simplistic. It draws out a world where strangers meet once and then never interact again. And the more interesting question is, when should you cooperate and when should you defect in an ongoing relationship with another person? This arithmetic of cooperation maps disturbingly well to the costs that we pay in order to deliver software. When everyone on the team cooperates, the team delivers good software with a reliable cadence. Everyone looks good. Everyone can be productive. If one person on the team consistently defects, then that person looks efficient to outside observers. They're ticking off features on the backlog. They're fixing bug after bug. Nevermind that they most likely introduced a whole number of those in previous fixes. They seem productive. They deliver heroic saves and they do it quickly and reliably. And the rest of the team slows down. Both from trying to understand the code that this person is producing and from shoring it up, from having to rework it and rewrite it wherever and whenever they need to make changes. They add tests. They improve error handling. They extract responsibilities. The time it takes them to make a change increases. If the whole team defects, there's an initial rush where features are delivered rapidly and we quickly hit a point where each change takes longer and longer to make until we reach a pathetic but stable equilibrium where progress inches forward in an endless, painful struggle. Humans are biased towards cooperation. We will readily cooperate, even if doing so makes no rational sense. Yet in being irrational, we will gleefully defect when we have no incentive to do so. Sometimes it's easier to just not, not write that test, not rename those variables. We want to do the right thing but good intentions are meaningless when we're tired or distracted and decide that just this once will commit and move on with our day. Every commit we make tips the balance ever so slightly in one direction or the other, an inch towards entropy and inch towards increased order. Perfection is unattainable and ultimately irrelevant. If most of the commits you make are shifting the balance towards increased order then no matter how slight that shift is, you will have a code base that constantly improves. So I urge you to ask yourself before you commit, am I cooperating or am I defecting? Thank you.