Unit testing is one of those things everybody talks about, everybody says is a good thing, and everybody really wants the world to have more of, but nobody actually wants to write the unit tests, do they? So, quick show of hands. How many of you actively write unit tests for most of your code today? You're my heroes. How many of you have never written a unit test in your life? Okay, don't feel bad. There's always time to start.

My assumption here is that you've at least started to drink the Kool-Aid; you've got your toe in the unit-testing cult. You're either already writing unit tests or you get it and you're ready to start. No matter what I say today, you're going to do some unit testing real soon now. A corollary of that assumption is that you already accept that unit tests help make your code more correct. I'm not here to convince you of that. If you don't believe unit tests make your code more correct, I don't know how to convince you; that's like trying to convince you that water is wet or the earth is round. It's utterly, transparently obvious.

What I'm here to talk about is that unit tests make your code better on a higher plane: the aesthetic, elegance, beauty sort of plane. Code as art. If you're one of those practical people who dismisses people like me ("elegance, smellegance, I just want to get a product out, I want customers to use this, I want to make some money or analyze some data or whatever"), well, okay, that's great. Shipping products is really important, especially if you have payroll to meet. But beautiful code is better code, because it is easier to understand, easier to maintain, easier to extend, and easier to reuse. It makes life better in the long run. You might save a little time in the short run by cutting corners, but beautiful, elegant code will pay off in the long run, and unit testing will help you get there.
My plan of attack for this evening's talk is a real-life case study. I'm going to show you a simplified, cartoonish version of some real code from my real day job, untested, because that's how it was when I started the job back in July. I'm going to work through the process of adding tests to this untested code and look at the interaction between less-than-perfect design and tests, hairy tests in particular. When you test code and the tests come out unusually hairy, complex, and voluminous, that's a smell telling you the design isn't as good as it could be. We're going to see how improving the design makes the test code smaller and simpler, which is a nice virtuous feedback loop telling you your design is getting better.

First I need to spend a few minutes on motivation. This is real-life code from my real-life job, so I have to give you a little background. What is this code? Where does it come from? Why does it exist? Most importantly, what requirements does it meet? Because when you're writing tests, you have to make sure you're actually asserting the requirements of the code; otherwise, why bother? And if you give in to the indulgence of refactoring, you have to make sure the new shiny, beautiful, elegant code after refactoring still meets the requirements it met in the first place. Otherwise, why bother?

This code comes from my employer, a company called Renesys. "Monitor" is such a loaded word; I prefer to say that we map the internet, we measure the internet. We're not reading your email. We are, however, pinging every single one of your public IP addresses every couple of months. More importantly, we run traceroute to about 1.5 million hosts all over the internet, all day long, every day, from about 100 locations around the planet. That results in about 200 million traceroute runs every day. The average traceroute run is 12 to 14 hops, so we gather about 3 billion hops every single day.
Then we try to make sense of this torrent of data and turn it into something approaching useful intelligence that we can actually sell to people. When you have a torrent of data like this, clearly you need a super-duper, whiz-bang, high-powered, next-generation, ultra-advanced data storage and representation technology, like plain text. Tab-separated text. Most of our bulk data sitting on disk is tab-separated text files, because it works. Of course we use Postgres for a lot of things where low latency and high performance are important, and we're starting to use Redis for other things where performance matters but persistence is not so important, but there are a lot of plain text files.

Over the years, we as a company (I haven't been working there for years) have learned some rules for staying sane with plain text, which are probably pretty obvious to anybody else who does this. Number one: keep it simple, stupid. A good rule for any kind of software work. Restrict the data tightly: for example, when you're using tab-separated text files, make sure there are no tabs in your data. Kind of obvious, or at least it should be. And try to stay consistent across the years. Requirements evolve, the business evolves, data evolves, but tab-separated columns of data as a concept doesn't really evolve, because it doesn't need to. So there's a lot of deep consistency there.

Let me show you some examples of our raw data. Actually, this is not quite raw; it's been processed a little from raw, just to give you an idea of what we're working with. This excerpt is one single line. I've put backslashes in to emphasize that it's physically one line; it's just unreadable if you don't wrap it. It captures a single traceroute run from one host to another IP address.
The first column of the file is the name of the format; that's a nice feature of many of our data formats: the name of the format is right there in the first column. There's a timestamp, a protocol, the source IP and the name of that host (we're tracing from our collector in Nairobi, Kenya, to some machine somewhere, I don't know where), an S meaning it was successful, and then a variable number of fields, because a traceroute is sometimes five hops long and sometimes 15 hops long. So we have n-times-three additional fields tacked on the end, which makes it a little more interesting: it wouldn't fit in a normalized relational database without a little bit of work.

That's one single traceroute run, and we get about 200 million of those a day, which is a lot. For a lot of purposes, a daily summary is more useful, so one of our post-processing steps is to take this and crunch it down into something we call tip1. No, I have no idea where the name comes from; I've only been at the job for seven months and they haven't told me all the dark secrets. This is an example of a daily summary for one particular IP address, 78.124.106.59: a reduction of all the traces from our 100-odd collectors all around the world to that one IP address, plus some information about the address itself. We do a lot of geolocation work, so it's very interesting for us to be able to quickly pull out what's going on in, say, Syria today, or how things looked in Ukraine last week. For example, I could very quickly use awk to find all the targets in Paris, or all the targets in France, and this one would pop out. Because we have about 1.5 million targets, there are 1.5 million of these lines every day, which is much more manageable. And there's another interesting feature of the data.
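That awk-style country filter is easy to mimic in Python. Here's a hedged sketch; the real tip1 column layout isn't reproduced here, so the column positions (target IP in column 1, country code in column 5) are assumptions for illustration only.

```python
def targets_in_country(lines, country, country_col=5, target_col=1):
    """Yield target IPs whose country column matches (awk-style filter).

    Column positions are illustrative assumptions, not the real tip1 layout.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > country_col and fields[country_col] == country:
            yield fields[target_col]

sample = [
    "tip1\t78.124.106.59\t...\t...\t...\tFR",
    "tip1\t203.0.113.7\t...\t...\t...\tUA",
]
print(list(targets_in_country(sample, "FR")))  # ['78.124.106.59']
```

The equivalent awk one-liner would just compare `$6 == "FR"` and print `$2`; the Python version is handy when the filter is one step in a longer pipeline.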
Most of these fields are strings, integers, or floating-point numbers, but this last field is a key-value mapping from collector name to an integer: the number of times we traced to that target. Our collector in Vancouver did it two times; our collector in Fez, Morocco, only made it once. So there are some more complex data structures in here. Again, this is something that would be awkward to squeeze into a relational database efficiently. Of course you can do it; anything can fit in a relational database if you're willing to do enough joins. But this is not normal form.

So there are some deep similarities across the data. Tab-separated plain text, obviously. The first column of every line is the name of the data format, so it's a teeny, tiny bit self-describing, but not fully: every column has a name and a type, which you do not see in the data. That information lives somewhere else, in the source code. Space is at a premium. Most of the data types are really simple (integer, string, float); some are more complex, a comma-separated list of string/int pairs, for example. It's UTF-8 encoded; I'm very happy to see non-English characters in that data, which is a really impressive accomplishment now that we've had Unicode for 20 years. And the data often lives on disk bzipped, but that's not guaranteed. It depends on how old it is, how far along in the processing it is, things like that.

So we don't have a common data format; these are two very different formats storing very different information. We have a common meta-format, and we have dozens of individual data formats that conform to that meta-format, which obviously argues for a common library. For a long time we didn't have one, but sometime last year a couple of people who preceded me in the job (they're still there) wrote a class called GenericLineParser. GenericLineParser is the embodiment of that meta-format. You can't actually use it to parse any data.
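Unpacking a field like that collector-to-count mapping is straightforward. Here's a sketch; the exact separators (comma between pairs, colon within a pair) are assumptions about the format, since the raw field isn't shown here.

```python
def parse_collector_counts(field):
    """Turn a field like 'vancouver:2,fez:1' into a dict of ints.

    Separators are assumptions: comma between pairs, colon within a pair.
    """
    result = {}
    for pair in field.split(","):
        collector, _, count = pair.partition(":")
        result[collector] = int(count)
    return result

print(parse_collector_counts("vancouver:2,fez:1"))
```

This is exactly the kind of per-column conversion that needs to be optional for performance: you don't want to pay for it on 1.5 million lines when you only care about some other column.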
You have to subclass it, or maybe delegate to it, depending on your taste and needs. So there's a subclass called T3Parser for the raw traceroute data, a subclass called Tip1Parser for the daily summary data, and several dozen other subclasses, because we have several dozen similar formats.

Now, the requirements for this thing. This is really important. The bad news about this code is that there were no unit tests. The good news is that it met its requirements. What were those requirements? Structure: named columns with data types. Performance: when you're dealing with 200 million records a day, it's sometimes really expensive to convert all those strings to integers when you don't even care about them. If you only care about one particular column, you don't want to waste time converting 20 other columns to int and float, or parsing out that comma-separated list of string/int pairs, so the type conversion has to be optional. And it's got to be flexible enough that it's really easy to define new formats, because every couple of months somebody realizes: oh crap, you know that tip1 format we designed last year? We really need this column in it. And that's how tip2 is born. That seems to happen a couple of times a year, so defining a new format has to be really easy, with a minimal amount of code.

Here's an overview of what GenericLineParser looked like (and still does look like) when I sat down and said, hmm, this thing needs unit tests. The public interface is nice and small and fairly straightforward. There's a constructor with a few too many optional keyword arguments, but they're all there for a reason; they're all there to meet some requirement. There's a parse method, no surprise: it's a line parser, of course it's got a parse method that takes a line parameter. I'm not sure why that parameter is optional, but unfortunately there's no time to get into that in a half-hour talk. And, curiously enough, there's a read method.
This is a line parser, not a file reader. Why does it have a read method? If I were a real purist, I would object to that, but I'm not, because 99% of the time, when you're parsing lines of text, guess where they come from? A file. What do you do? You read a line and then you parse it. So of course you have a convenience method to read a line and parse it. That's just a no-brainer. However, it should be a one-line method.

Here's an example of how this code meets one of its requirements: it's fairly easy to define a new format. You just have a list of columns and data types, and then a bit of boilerplate that looks the same for all of them. You could do it with less typing; this is just how the code is today, not how I wish it was. This pattern is repeated dozens of times, and it's no big deal. There's no serious code there, just a constructor with a superclass call and some data, which is nice. I like that.

So, time to test this. I have to write some tests for it, because I need to move the code around, and I'm kind of a purist when it comes to writing tests: I refuse to modify code if I can't run the tests, and if I can't run the tests because they don't exist, I write the tests. So I had to write tests for this thing. It's a class; you do everything through its objects, and you can't test an object if you can't construct it, so I always start with the constructor, obviously. Most constructors are trivial, so the test case for a trivial constructor is usually trivial, but I write it anyway, because this is Python: you can make a typo, and there's no compiler to catch you, so you have to write that test case. For code like this, where the constructor is non-trivial, with complex internal logic, and sometimes it does I/O and sometimes it doesn't, you're damn right I'm going to write test cases for the constructor.
So, the outline of this thing. I've already shown you the signatures of all the public methods in this class. Here's an outline of the constructor itself, which I hope gives you the shivers. It gave me the shivers the first time I looked at it, because if you count the number of branch points in it (I did, several months ago), I convinced myself there are nine, which means two to the nine, 512, code paths through this thing. So in theory I have to write 512 test cases. Oh my god, I'm not going to do that. Nobody sane would ever consider it, because it's completely impossible in the real world. But I do still want to cover every line of code in here, every feature.

Just stop for a second and think: this is a class called GenericLineParser. Why does it care so much about file names? Kind of weird. And in fact, sometimes it's a file name, sometimes it's a file object, and sometimes it's just an iterable. It's got a confused mission. It's a line parser and a file opener. It has a read method, so it's a file reader, and (I'm not going to show you this) that read method also does progress logging. There's a little too much going on in here to make me comfortable. But I can't change it yet. I look at this code and say, oh god, I want to change it, but I can't, because I have no tests. I have to write the tests first; that's just the way I am.

So I didn't have to write 512 test cases, but I did have to write six, which is about five more than I like to write for a constructor. Definitely a code smell. Writing those six test cases gave me two very important things. One: a deep insight into and understanding of the code. You cannot collapse 512 code paths into six test cases that touch every line of code without understanding what's going on.
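For contrast with those six hairy cases, here's what the trivial end of the spectrum looks like: a minimal constructor test in unittest style. The class under test is invented for illustration; the point is that even a trivial constructor earns a test in Python.

```python
import unittest

class LineRecord:
    """Invented example class with a deliberately trivial constructor."""

    def __init__(self, fmt, fields):
        self.fmt = fmt
        self.fields = fields

class TestLineRecord(unittest.TestCase):
    def test_constructor(self):
        # Trivial, but this is Python: a typo in __init__ has no
        # compiler to catch it, so this test earns its keep.
        rec = LineRecord("tip1", ["a", "b"])
        self.assertEqual(rec.fmt, "tip1")
        self.assertEqual(rec.fields, ["a", "b"])
```

Run it with `python -m unittest`. One test like this per trivial constructor; a constructor that needs six is telling you something.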
How many times have you read a piece of code, thought you understood it, and then realized: oh yeah, it just went in one eyeball and out the other? When you write the tests for that code, you have to understand it. Writing tests for somebody else's code forces you to really get it. The other thing it gives you is the courage to refactor. I'll go into that in more detail later, but it's an important concept.

So, problem number one: this class is a file opener and a line parser. Let's fix that, because file opening is a fairly important thing, especially in an environment where most of your data files are UTF-8 encoded (you sure hope they are), sometimes bzip-compressed, maybe sometimes gzip-compressed, and in three years maybe they'll be xz-compressed, who knows. Why does that logic belong in the line parser, especially when this class only covers a subset of our data, not all of it? It doesn't belong there. It belongs in a generic utility function with a short, snappy name that anybody can understand, remember, and use, like zopen. Or, if you know you're dealing with possibly compressed, UTF-8-encoded data, a companion to zopen with an equally easy-to-remember name. Those exist now; anybody at my company can use them, because they're in our util library.

And that means this class doesn't need to take a file name anymore. It's really easy for anybody calling GenericLineParser or any of its subclasses to just call zopen: boom, the file is opened by the utility function, and the parser no longer has to worry about opening files. Now all it has to worry about is: I didn't get a file at all, I got a file, or I got some other sequence of lines. Still a little too much, but much, much better. And the really great thing, what I really like about this refactoring: boom, half the test cases disappear.
For some reason, eliminating hundreds of code paths makes me happy, and eliminating half the test code makes me even happier. I'm just a test-oriented sort of guy, I guess. That's cool. Three test cases for one method is okay; I can live with that. So: the constructor got simpler and shorter, we have these nice general-purpose utility functions that have nothing to do with GenericLineParser, and there's less test code to maintain and fewer code paths to worry about.

Another fix I want to make. This is really getting kind of obsessive, but really, it's unnecessary: why do we care about the difference between a plain file, a bzip file, and some other bzip-like file? The only reason is that some of these things have names and some of them don't, and if you look at this code for 12 seconds, you'll realize it's quite easy to collapse it down. If it's an iterable, get the name; if it doesn't have a name, that's okay, we don't mind. If it's not an iterable, blow up: you have to pass a file or a list of strings or some other iterable. And really, that's about as simple as it's going to get. So now I have even fewer code paths to worry about. This time I didn't remove any tests; I still have three. I have to test that it can handle a file, test that it can handle a list, and test what happens if you don't pass in anything valid at all.

So what's the big deal? I refactored some messy code. So what? Anybody can do that, right? But I wouldn't have done it if I hadn't gone through the exercise of writing those tests. Writing those tests made me look deeper, dig deeper, understand deeper, and think deeper. It made me see the good side and the bad side. Like I said, this code met its requirements very well. It's quite elegant and nicely structured; it just had a few too many features in one class. They were all good features, all needed features; we wrote them for a reason. You can't just throw them out.
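As an aside, that collapsed check ("if it's an iterable, get the name; if it's not, blow up") fits in a few lines. The function name and error message here are assumptions; only the logic comes from the talk.

```python
from collections.abc import Iterable

def resolve_line_source(source):
    """Accept a file or any other iterable of lines; return (source, name).

    Hypothetical helper: open files have a .name attribute, lists and
    generators don't, and that's fine.
    """
    if isinstance(source, Iterable):
        return source, getattr(source, "name", None)
    raise TypeError("expected a file or other iterable of lines, got %r"
                    % type(source).__name__)
```

Three test cases cover it: a file (has a name), a list (doesn't), and a non-iterable (blows up).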
You have to put them somewhere.

Possibly even more important than the deep understanding I got from writing those tests was the courage to refactor. That's really hokey; it sounds like something from a self-help book, but I'm sorry, it's true. Having written those tests, I now have no fear at all about tearing this class, GenericLineParser, to pieces and putting it back together again, because I know that if I screw up (when I screw up) the tests have got my back, and they're going to tell me: hey dude, you screwed up.

Now, I would love to give you a happy ending to this story. I would love to say these refactorings have been done, committed, and are live in production today, but unfortunately I can't, because there's a whole lot of code out there that uses GenericLineParser and T3Parser and Tip1Parser and those dozens of other subclasses, and I do not have the courage to refactor that code, because it doesn't have tests. I can't make incompatible API changes and just go update all the callers. The best I can do is prepare a patch, email it to the maintainer, and ask him to test it for me, and that, folks, is not agile. So the job is unfortunately half done because of the lack of tests elsewhere in our code tree. Sad face.

And that brings us to the cost of not writing tests. Obviously, writing tests means more correct code, so if you don't write tests, you're going to have more incorrect code. Bugs will be caught later in the cycle; worst case, they'll be caught by your end users, which is the most expensive time to catch bugs. You become afraid to refactor, because you don't know if you're actually breaking things. That leads to code duplication, and since all code has bugs, that means bug duplication. And it leads to insufficient code reuse: I don't know if this library works, because there are no tests for it, so I'll just write my own. All of these things happen.
But I don't want to leave you on a downer. Yes, there's a downside to not writing tests, which means there's a big upside to writing them. Having a thousand tests is better than having 999, but having one test is vastly better than having no tests. There's a huge, huge win from writing that first unit test. You'll never cover everything with unit tests, but boy, you can cover a lot. You'll be pleasantly surprised, if you put some effort into it, just how much you can cover.

So, of course, writing unit tests makes your code more correct, and the earth is round, water is wet, space is big. That's not what I'm trying to convince you of; it should be obvious. What I'm trying to convince you of is that you get more beautiful code, in a virtuous feedback cycle, from writing tests. Tests show you where your design is flawed and help you fix that design. And beautiful code in the long run is cheaper, because it's nicer to work with, more maintainable, just pleasanter. That's it. Thank you.