Meet Sima. Sima is an engineer at Docs R Us, a company that provides appointment booking software for medical professionals. Docs R Us was founded in 2008 and has grown from a tiny start-up to a company employing nearly 100 people. The app itself is a decade-old majestic monolith. Code dating back to Rails 1.0 can still be found in its ancient hulk. The team's changed considerably over the years, with only the CTO remaining from those who wrote the early lines of code. Sima's been at the job a couple of months now and feels like she's starting to get her head around the code base.

Today she's started a new task. There's a page in the app where doctors can view the patients they've previously had appointments with, and the patients are sorted alphabetically. Sima's task is to update the page so doctors can also sort the patients by their last appointment date. That way they can see who's not been in for a check-up for a while. Sounds straightforward enough, but whilst looking at the code something catches Sima's eye. There's a method called sorted_patients that's returning the patients sorted by their names. But what she finds surprising is that it's doing so using Ruby's sort method in memory rather than as part of the database query. That doesn't seem very efficient, thinks Sima. She makes a quick change to try sorting the patients in the database instead and runs the tests to see if it breaks anything. But they come back green, which suggests the in-memory sort is probably unnecessary, at least assuming the test coverage is good. Sima's eager to remove the in-memory sort as part of her work, but before she does so she wants to understand why somebody might have done it in the first place, so she can be sure there aren't going to be any unintended consequences. She starts by checking the model, looking for clues. The patients association is loaded through the appointments association and everything else looks as she'd expect.
Satisfied she's explored the code enough, she knows exactly what to do next. In her last job, Sima was fortunate enough to work with a couple of wizened old developers who taught her the mystic and ancient arts of Git-fu. She learnt powerful techniques for searching through revision histories to discover how code got to be the way it is. She starts with a pretty basic Git-fu technique, git blame. Git blame will reveal the revisions and authors that last edited each line of code. She runs the command and picks out the line she's interested in. Git is telling her the author was somebody called Josie. I don't think she works here anymore, thinks Sima. If she did, Sima might have asked Josie directly why she'd used an in-memory sort, but given the change was made almost a decade ago, she probably wouldn't remember. Sima takes the revision SHA for the line she's interested in and passes it to git log so she can see the commit message. She also includes the patch option. That way, Git will show her the full diff for the change as well as the commit message, giving her maximum context. Looking forward to finding out the reason for the in-memory sort, Sima runs the command and examines the output. Now, being the fastidious type, Sima's very glad that Josie corrected this typo. As to the mystery of the in-memory sort, she's none the wiser.

Unperturbed, Sima cracks her knuckles and prepares to use a more advanced Git-fu technique. Time to break out the pickaxe. git log -S (that's a capital S), also known as the pickaxe, allows us to search through commit histories and find all the commits that added or removed a particular snippet of code. Sima's going to use the pickaxe to find the very first commit that introduced the sorted_patients method. She calls the command with the method name as the search parameter, again including the patch option so she can see the full diff. She also includes the reverse option.
This way, Git will return the commits in chronological order, oldest first, rather than its default of newest first, and she'll see the first commit that used sorted_patients right at the top. Hopefully, this will finally solve the mystery. She runs the command and inspects the output. OK, so another dead end. It looks like Josie had had a change of heart about the method name. I guess sorted_patients is probably more intention-revealing than load_patients, but she's still no closer to solving the mystery. Not a problem, thinks Sima. She can rerun the search, but this time use the original method name. So she calls the pickaxe again, this time searching for load_patients. Perhaps this will finally solve the mystery. OK, so it feels like we're getting somewhere. Up until this point, the code was performing the sort in the database query. And this is the very first commit that used an in-memory sort. And the commit message mentions something about an ordering bug, so it looks like it was an intentional choice. But there are still no clues as to what caused the bug or why this would have fixed it.

Sima decides it's time to switch tack and goes looking for the original pull request for the change. She finds the commit on GitHub and clicks the magic and unassuming link that will take her to the PR for the commit. Sima's hope had been that the description would give her a bit more context for the change, but all she sees is a link to Pivotal Tracker. She wasn't aware that the company used Pivotal Tracker, so she asks a colleague how she might get access to the project, but it turns out the company archived the project when they switched to Trello, and when the subscription lapsed, so did access to the project. Back on GitHub, Sima scrolls through the rest of the diff looking for more clues. She finds the commit that adds some tests, and sure enough they're verifying that the patients are displayed in alphabetical order, but there's nothing else in the diff giving her any indication why the in-memory sort was used.
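The three searches Sima has run so far can be sketched as a script against a throwaway repository. This is only an illustration: the file name, the method bodies and the commit subjects are assumptions made up to mirror the story, not the real Docs R Us code.

```shell
#!/usr/bin/env bash
set -eu
export GIT_PAGER=cat   # print directly instead of opening a pager

# Throwaway repository with a hypothetical version of Josie's history.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Josie"
git config user.email "josie@example.com"

save() {  # helper: overwrite the controller with stdin, then commit it
  cat > patients_controller.rb
  git add patients_controller.rb
  git commit --quiet -m "$1"
}

save "List patients" <<'RUBY'
def index
  @patients = doctor.patients.order(:name)
end
RUBY

save "Fix ordering bug" <<'RUBY'
def index
  @patients = load_patients
end

def load_patients
  doctor.patients.sort_by(&:name) # sort alphabeticaly
end
RUBY

save "Rename method" <<'RUBY'
def index
  @patients = sorted_patients
end

def sorted_patients
  doctor.patients.sort_by(&:name) # sort alphabeticaly
end
RUBY

save "Fix typo" <<'RUBY'
def index
  @patients = sorted_patients
end

def sorted_patients
  doctor.patients.sort_by(&:name) # sort alphabetically
end
RUBY

# 1. git blame: which commit last touched the line doing the sort?
line="$(grep -n 'sort_by' patients_controller.rb | cut -d: -f1)"
git blame -L "$line,$line" patients_controller.rb
blame_sha="$(git blame --porcelain -L "$line,$line" patients_controller.rb | sed -n 1p | cut -d' ' -f1)"

# 2. git log with the patch option: commit message plus the full diff.
git log -1 --patch "$blame_sha"

# 3. The pickaxe: commits that added or removed the string, oldest first.
git log -S'sorted_patients' --patch --reverse
first_sorted="$(git log -S'sorted_patients' --reverse --format=%s | sed -n 1p)"

# 4. Same search with the original method name.
git log -S'load_patients' --patch --reverse
first_load="$(git log -S'load_patients' --reverse --format=%s | sed -n 1p)"
```

In this made-up history the blame lands on the typo fix, the first pickaxe search lands on the rename, and only the second pickaxe search reaches the commit that introduced the in-memory sort, which is the same trail Sima follows.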
By this point, Sima's fresh out of ideas. Her search to discover the reason for the in-memory sort has proved fruitless. And although the test suite is giving her some confidence that removing it is probably fine, she's still a little uneasy. She decides to proceed with caution while she continues her work and looks for more clues.

OK, let's find out how we got here. Meet Josie. Josie is an engineer at Docs R Us, a startup that's building appointment booking software for medical professionals. Josie was pretty much one of the first hires by the company after they secured funding, and she loves the fast pace of startup life. Today, however, she's having a bad day. She got up late and managed to spill her carefully crafted single-origin pour-over all over herself in her rush out the door. So not only is she late and covered in coffee, she's also severely under-caffeinated.

Late the day before, she was working on a strange bug where patient records were being displayed in the wrong order. They needed a fix pretty sharpish because they had a big demo coming up with a potential new client. It was quite a puzzle, because it looked like the controller code was doing the ordering by name. After a bit of digging, she'd managed to figure out what the problem was. The patients association was being loaded through the appointments association, but the appointments association had a default ordering on it, ordering the appointments by date. What this meant was that every time the patients association was called, it would automatically inherit the ordering from the appointments, and any additional order calls would get appended to the end. So instead of returning the patients ordered by their names, they were in fact being ordered by their appointment date and then their name. So the obvious fix would have been to remove this default ordering.
But when Josie tried that, there were a heap of test failures. Turns out quite a lot of the code was relying on the appointments being returned in date order. So, realising it was going to take a while to unpick all the failures, she decided to put together a quick fix for the bug and come back to remove the default ordering later when she had more time. So she introduced a method in the controller to re-sort the patients after they'd been loaded. She later had a change of heart about the method name and also added some tests.

So here's how the commit history looks this morning. Her plan had been to tidy up the history before creating a pull request, but today she's feeling cranky and she just wants to see the back of this bug so she can move on to something more interesting. So instead she throws caution to the wind, pushes the code to GitHub and creates a pull request. She uses a link to the Pivotal Tracker story for the bug as the description, because all the details are there already. There doesn't seem much point repeating the information. A few moments later she gets a notification from the CI server. Looks like the build's broken on her branch. She discovers she'd failed to update an integration test that also needed updating. So she adds an additional commit to fix the test, and then another when a co-worker points out a typo. Now normally her co-worker would have pulled her up on such a messy commit history, but having seen the unhappy state she was in that morning, perhaps they thought it would be kinder to let it slide this time. With the build green, the PR is approved and the bug fix is shipped just in time for the big demo. Happy that she's squashed another bug, Josie moves on to the next task, but not before adding something to the backlog to remove the default ordering, so it doesn't catch anybody else out in the future.
OK, so the eagle-eyed amongst you may be wondering at this point why Josie didn't just use the reorder method instead, as that would have replaced the existing order clause. Well, that's because I've contrived Josie's timeline so it happened in 2010, which conveniently for me is before reorder was added to Active Record. OK, intermission over. Let's get back to the story.

Meet Josie. Josie is an engineer at Docs R Us, a startup building appointment booking software for medical professionals. Josie was pretty much one of the first hires and she loves the fast pace of startup life. Today's got off to a great start. Josie got up early to beat the rush hour and enjoyed reading a really interesting blog post about revising revision histories as she sipped her delicious, lovingly crafted single-origin pour-over. Yesterday Josie had been working on a bug where patient records were being displayed in the wrong order, and by the end of the day she'd put together a fix. The commit history was a bit of a mess and her plan had been to tidy it up this morning, hence her choice of reading on the way in. She decides the commit where she renamed the method isn't going to be much use to anyone in the future and at best would prove a distraction for someone trying to understand the nature of the change. She also decides that the history would be more focused if the bug fix and the test for the bug fix were part of the same commit.

To tidy up the history, Josie's going to use Git's interactive rebase tool. Interactive rebase makes it possible to revise our commit histories by letting us edit, squash, reorder and reword commits. She tells Git she wants to interactively rebase the last three commits. When Git presents her with those three commits, she marks the rename method commit and the add test commit as fixup, which is basically telling Git to squash them into the first commit and discard their messages. The first commit she marks as reword so she can write a more detailed commit message.
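The rebase Josie has just set up can be sketched as a script. Interactive rebase normally opens the todo list and the commit message in an editor, so to keep this runnable without any typing, two small helper scripts stand in for the editor through GIT_SEQUENCE_EDITOR and GIT_EDITOR. The file names, commit subjects and the wording of the new message are all assumptions for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Throwaway repository with a hypothetical version of Josie's three commits.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Josie"
git config user.email "josie@example.com"

commit_change() {  # helper: append a line to a file and commit it
  echo "$2" >> "$1"
  git add "$1"
  git commit --quiet -m "$3"
}

commit_change README "booking app" "Initial commit"
commit_change patients_controller.rb "def load_patients; end" "Fix ordering bug"
commit_change patients_controller.rb "# renamed to sorted_patients" "Rename method"
commit_change patients_controller_test.rb "# ordering test" "Add test"

# Stand-in for editing the todo list by hand:
# first commit becomes "reword", the remaining commits become "fixup".
cat > .git/edit-todo.sh <<'EOF'
#!/bin/sh
awk '$1 == "pick" { n++; $1 = (n == 1 ? "reword" : "fixup") } { print }' "$1" > "$1.tmp"
mv "$1.tmp" "$1"
EOF

# Stand-in for typing the new commit message when Git prompts for the reword.
cat > .git/edit-message.sh <<'EOF'
#!/bin/sh
cat > "$1" <<'MSG'
Sort patients in memory to fix ordering bug

The appointments association has a default ordering by date, which the
patients association inherits, so ordering by name in the query has no
visible effect. Sorting in memory works around that for now.

Removing the default ordering is planned as follow-up work.
MSG
EOF
chmod +x .git/edit-todo.sh .git/edit-message.sh

# Interactively rebase the last three commits.
GIT_SEQUENCE_EDITOR="$repo/.git/edit-todo.sh" \
GIT_EDITOR="$repo/.git/edit-message.sh" \
  git rebase --quiet -i HEAD~3

git log --format='%h %s'
```

Done by hand, the only command needed is `git rebase -i HEAD~3`; the two helper scripts just replay the edits Josie would make in her editor.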
For the commit message itself, she makes sure to explain the nature of the bug and also why she chose to fix it in this way. At the bottom she also adds a little note about the work that's planned to remove the default ordering. And the commit message also serves as a perfect title and description for the PR, saving her the trouble of having to write a new one.

A short while later she receives a notification from CI telling her the build's broken on her branch. It looks like she'd missed an integration test that also needs updating. So she updates the test, makes sure it passes, and stages the changes ready to be committed. But when she runs git commit she includes the amend option. So instead of creating a brand new commit, Git will amend the existing commit, that way keeping all the changes related to the bug fix in a single commit. Because she's happy with the commit message and doesn't need to make any changes to it, she also uses the no-edit option, and Git amends the commit without prompting her. Because she's made a change to a local commit that's already on GitHub, she's going to have to force push to replace what's already there. To be safe, she does so using force-with-lease. That way Git will refuse the push in the unlikely event that somebody else has made a change to the branch in the meantime. With the build green and the typo corrected, the PR is approved and the bug fix is shipped just in time for the big demo. Happy that she's squashed another bug, Josie moves on to the next task, but not before adding something to the backlog to remove the default ordering, so it doesn't catch anybody else out in the future.

Meanwhile, back in the present day, Sima has started a new task and is puzzling over why some code is sorting patients in memory rather than as part of the database query. She wants to know why and decides to use some Git-fu.
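Josie's amend and force push from earlier in this part of the story can be sketched like this. A local bare repository stands in for GitHub, and the file names and commit subject are assumptions for illustration.

```shell
#!/usr/bin/env bash
set -eu

# A bare repository plays the part of GitHub; "clone" is Josie's working copy.
work="$(mktemp -d)"
git init --quiet --bare "$work/origin.git"
git init --quiet "$work/clone"
cd "$work/clone"
git config user.name "Josie"
git config user.email "josie@example.com"
git remote add origin "$work/origin.git"

# The tidied-up bug fix commit, already pushed for review.
echo "sort patients in memory" > patients_controller.rb
git add patients_controller.rb
git commit --quiet -m "Sort patients in memory to fix ordering bug"
git push --quiet -u origin HEAD

# CI fails: an integration test also needed updating.
echo "updated expectation" > booking_integration_test.rb
git add booking_integration_test.rb

# Fold the staged change into the existing commit, keeping its message as is.
git commit --quiet --amend --no-edit

# The local commit now differs from the pushed one, so a plain push is rejected.
# --force-with-lease overwrites the remote branch only if it still points at
# the commit we last saw there.
git push --quiet --force-with-lease origin HEAD

git log --format='%h %s' --stat
```

The lease check compares the remote branch against the local remote-tracking ref, so it is only as fresh as the last fetch or push from this working copy.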
She runs git blame to identify the revision for the line she's interested in and passes it to git log along with the patch option, so she can see the full diff as well as the commit message. Aha, thinks Sima as she reads the message, so this was a workaround for the default ordering on the appointments association. She also notices the commit message mentions something about removing the default ordering altogether, and sure enough, when she checks the model, it's gone. I guess whoever removed it must have forgotten about the in-memory sort. Oh well, these things do happen. With the mystery solved, Sima feels confident that she can remove the in-memory sort and carry on with her work.

Now, as developers we do many things to try and keep our code maintainable. We carefully think about the names for our objects and methods. We write and maintain automated tests. We try and create good abstractions. We refactor. We make a deliberate effort because we want to keep our code easy to understand and easy to change. But here's the thing: our software is so much more than just the code. At its best, code can clearly articulate what our software is doing, but if we want to understand the deeper why of our software and the code, it's often key that we can understand how it got to be where it is today. We write modern software iteratively. Start-ups pivot, requirements change, bugs are found and hopefully squashed. We're constantly course-correcting the code to keep up with our ever-growing and shifting understanding of what it is our software needs to do. Along the way we make many decisions and trade-offs, the consequences of which are felt long after they're made. And as we do so, we build up a kind of institutional knowledge that defines our software in a way that the code can't express. Our ability to grasp this knowledge is as important to the maintainability of our code as it is to keep the code in good shape.
Peter Naur spoke to this in a paper he wrote in 1985 called Programming as Theory Building. In it he proposes that programming isn't actually about the production of executable code; it's the process by which programmers build up their mental model, their theory, of how the software needs to work. To Naur, the code itself was merely a secondary artifact. In the paper he goes even further and states that for the software to remain viable and maintainable, the programmers that hold the knowledge in their heads need to be around. The program effectively dies when the team is disbanded. Now, it's rare that an entire team is disbanded, but just like our software, our teams are not static either. New team members join and quickly need to get up to speed with this knowledge to become effective, and long-standing members leave, taking their hard-won knowledge with them.

The power of the revision history is that it gives us a way to capture this knowledge right there alongside the code as we change it, and it does so in a way that's searchable and that won't go out of date. With a well put together revision history, every line of code is documented and every change is explained, and as the code base grows and ages, the value of this revision history grows with it, but only if we take the time to shape a useful history in the first place.

Now, I imagine there are many of you in this audience for whom everything I've said today will be self-evident. You're confident reshaping your histories and regularly write novel-length commit messages. Equally, I imagine there will be quite a few of you in this audience for whom, as great as this sounds in theory, the prospect of shaping your histories is a little bit intimidating. Let's face it, it's called Git for a reason. Whilst Git is extremely powerful and versatile as a tool, its primary interface, with its many commands and esoteric option flags, is not particularly user-friendly.
Sites like Git WTF exist for a reason. And unfortunately there aren't really any silver bullets here. Just like writing good automated tests, doing this stuff well takes patience and practice. My aim with this talk was to convince you that the effort is worth it, and I'd like to finish by sharing a few simple tips that helped me on my journey to creating more useful histories. Now, there's only so much I can cover in the time I have today, but I've published a blog post with links to more in-depth resources. So if you want to learn more on this subject, go there.

So first up, make sure you're set up for writing good commit messages. Now, for me this meant getting out of the habit of committing with -m. The command line basically isn't an environment that encourages you to write detailed messages. Instead, I recommend you configure Git so it knows your editor of choice. That way you'll find yourself in a friendly and familiar environment and you're much more likely to put some detail in your commit messages. I'd also recommend turning on Git's verbose mode. With verbose mode on, you'll get to see the full diff for the change right there in your editor as you write the message, and it's a great opportunity to review the change and remind yourself what it does as you write. This is also where I'll often spot something in the diff that I think actually belongs in another commit, and it gives me a chance to back out, restage the changes and go again. Or perhaps, in the process of actually explaining the change, I'm looking at the code and I'll say, actually, there's another approach I could take here.

And when it comes to the commit messages themselves, focus on capturing the why and not just the what. Hopefully most of the what should be clear from looking at the diff. Instead, use your message to capture the kind of context that will otherwise be lost once the change is made.
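The editor and verbose-mode setup mentioned above comes down to two configuration settings. A minimal sketch, assuming you want them applied globally; "vim" is only a placeholder for whichever editor you actually use.

```shell
# Tell Git which editor to open for commit messages ("vim" is a placeholder).
git config --global core.editor "vim"

# Show the full diff in the editor, below the message, for every commit.
git config --global commit.verbose true
```

For a one-off, `git commit -v` shows the diff for a single commit without changing any configuration.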
I've got a simple example here. When I'm writing a commit message, I like to put myself in the shoes of a future developer who's looking at the change and trying to understand why I've done what I've done, and answer the questions they might have right there in the message. It's not even some hypothetical person: if you do code reviews, it's literally the person that's going to be reviewing your PR.

This is a simple example to illustrate the point. I was doing some work on a project recently, refactoring some partials, and I'd come across one that didn't appear to be referenced anywhere in the app except for this one spec that was testing the PDF rendering code. I wanted to know why it was used there so I could be sure it would be okay to remove it. Luckily the developer that had written the test was around and I could ask them, and they confirmed that they'd chosen it because it was basically plain HTML and it made the test easier to set up. That's an example of something you could capture right there in the message. Then, if that person had left, the information would still be available. This is a useful example to illustrate the point, but it's not a great one. The information we're capturing here is fairly low value. It relates to the particular mechanics of the test, and I could reasonably have argued that it would be safe to remove that partial. Far more interesting is capturing information that relates to the business domain for your project, the sort of stuff that can't easily be figured out by looking at the code alone. Capturing that sort of stuff in your commit messages is like burying little nuggets of knowledge as treasure for future developers.

Another little mind hack I like to use: if I find myself in a situation where I really want to write a comment in the code, and there's not an obvious way to refactor things to make it clear without a comment, then maybe that comment belongs in the commit message instead.
Thirdly, think carefully about the shape of each commit. Now, in the story for Josie this meant collapsing everything down into one commit, and I hope the message you took from that isn't that everything should just be one big commit. That was done mainly to keep the story easy to follow. Instead, focus on creating small, focused and atomic commits that just do one thing. Joel Chippindale gave a great talk at a local meetup back in the UK called Telling Stories with Your Git Commit Messages, and in the talk he talks about this idea of a minimum viable commit. If you find yourself using the word "and" in a commit message, perhaps there's actually another commit trying to break out there. Think of each commit as almost like a mini pull request that gradually builds up the work you're delivering. And thinking about the shape of your commits as you go is going to make your life a lot easier than rebasing everything right at the end.

Your best friend here is that good old patch option, but this time when you're using git add to stage your changes. It's like a little mini text adventure for staging changes, offering you each chunk of the diff in turn and asking you whether you want to stage it or not. Everywhere in this talk where I've said the patch option, by the way, you can use -p for short.

Another little mind hack I like to use is this great quote from Kent Beck, but as it relates to the revision history. If I'm working on some code and I'm adding some new functionality, and I decide that there's some refactoring work that's going to make that easier, then I'll try and break that down into two separate stages. I'll do one commit that performs the refactoring without changing any behaviour, and then a second commit that actually changes the behaviour. This is going to make each commit more focused, and it's also going to be much more helpful for the person reviewing your code.
Next up, get used to treating your commits as mutable things that, up until the point they're merged into your main branch, you're free to chop and change and reorganise as you see fit. There are some caveats here, of course. If you're collaborating with others on a branch, you've got to coordinate any rebasing carefully so as not to cause conflicts. A great place to start is the amend option, which allows you to edit the most recent commit: adding changes, removing changes, correcting typos. If you want to take things a level further and start tweaking older commits, then look up using the fixup option to create fixup commits and then automatically rebasing them into place with autosquash. And don't fear the rebase. It can be intimidating at first, but with practice it becomes an invaluable tool for revising your histories. And honestly, once you get good at it, you feel like a superhero using it. And if you do find yourself in a pickle, you can always back out with rebase abort.

Finally, spending time searching through your revision histories is a fantastic way to build the instincts for the sorts of histories that will be useful for other developers. This might sound slightly controversial, but I'm going to recommend trying to use git blame a bit less. It's a pretty limited tool, as it will only ever show you the most recent revision for any one given line. Instead, get used to using the pickaxe. It's far more powerful, as it can show you the full history of a particular snippet of code, not only across multiple commits but also across multiple files. And if you do find yourself really needing to identify the most recent revision on a line, maybe use git annotate instead. It's basically the same thing but with slightly less accusatory language.

As I said before, unfortunately there aren't really any silver bullets, but hopefully those tips will help you on your journey to getting better at constructing revision histories.
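The fixup and autosquash workflow mentioned above can be sketched like this. The repository, file names and commit subjects are invented for the example, and GIT_SEQUENCE_EDITOR is set to the shell's no-op command so the rebase accepts the automatically arranged todo list without opening an editor.

```shell
#!/usr/bin/env bash
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example Dev"
git config user.email "dev@example.com"

commit_file() {  # helper: append a line to a file and commit it
  echo "$2" >> "$1"
  git add "$1"
  git commit --quiet -m "$3"
}

commit_file README "booking app" "Initial commit"
commit_file sorting.rb "sort by nmae" "Add patient sorting"
commit_file filter.rb "filter by doctor" "Add appointment filter"

# We spot a typo that belongs in the older "Add patient sorting" commit.
target="$(git rev-parse HEAD~1)"
echo "sort by name" > sorting.rb
git add sorting.rb

# Creates a commit whose subject is "fixup! Add patient sorting".
git commit --quiet --fixup "$target"

# --autosquash moves the fixup commit next to its target and squashes it in.
GIT_SEQUENCE_EDITOR=: git rebase --quiet -i --autosquash HEAD~3

# If a rebase ever stops on a conflict and you want out: git rebase --abort
git log --format='%h %s'
```

After the rebase the history is back to three commits, and the corrected line lives in the commit that introduced it.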
Everyone in this room will exist somewhere on a spectrum of Git fluency, but we all start in the same place. This history I put together back in 2012 isn't going to be much help to somebody trying to understand the nature of the work I was doing at the time. Take a minute to admire them. There's actually more. It gets worse, and less anonymous, the further down you get. But, you know, "switcheroo": I bet nobody's ever got that into a commit message.

Since then I've been fortunate enough to work with some fantastic developers who've helped me understand the benefits of putting together good revision histories and also helped me learn the skills to do so in the first place, and it's because of their help and patience that I'm able to stand here sharing this with you today.

So my final tip is for those of you at the other end of the Git fluency spectrum. (Do you like that, Aaron?) This stuff can be intimidating. If you work with someone whose commits suggest that maybe they don't fully appreciate the value in constructing a useful history, or maybe they don't have the skills to do so, help them. And I don't mean leave snarky comments on their pull requests. Actually sit down with them, pair with them, show them how they can revise their histories into something more useful, and teach them the benefits of doing so. If everyone in this room that's already mastered this stuff helped two co-workers do the same, we'd all be better off. Thank you.