I'm Carolyn Tamor and I'm here to talk to you about getting unstuck using the scientific method for debugging. I'm a software engineer at Pivotal Cloud Foundry, and I want to talk to you about my favorite tool for debugging. So who here has gotten really stuck while debugging? I see a majority of hands, which is great, because I think it's a pretty universal experience, and if you haven't yet, you will someday. I know I definitely have. So I want to share with you today my favorite tool in my toolbox for dealing with that moment when you're really, really stuck. It's the scientific method: the process of turning your debugging into a science experiment. Today we're going to talk about the what, the how, the why, and the when. What is the scientific method? How do you use it for debugging? Why is it such a valuable tool? And when do you know to pull it out of your toolbox? So, what is the scientific method? The scientific method is just a fancy term for the process of doing science, which is really cool, because I think science is really cool. It starts with a step of gathering knowledge. What is already known about your topic? If you are researching stars or frogs, what research have other scientists already done? What is known about your research area? Then you start asking questions, which are really: what don't we yet know about this topic? What is there still to learn? What did previous research find that is weird and interesting and that you want to dig into more? And then you make a hypothesis. A hypothesis is just an educated guess: a statement about what you think might be the answer to your question. When you're doing scientific research, you phrase your hypothesis in terms of a null hypothesis, which is to say you phrase it as the statement that there is no correlation between two variables, and then you try to disprove it, to provide evidence that there is some statistical correlation between the two.
And then you design an experiment. What information do you need to disprove your hypothesis? What information can you collect that will give you some evidence that it's incorrect? As you're designing your experiment, it's important to be very, very detailed, because when you're doing scientific research, your research doesn't carry any weight until it's been replicated by other scientists. So you need to be explicit enough in every step that someone else can do the exact same thing. Then you run your experiment, following your experimental procedure, and you take really good notes, because if you don't take notes on what you see, then you'll have a really hard time figuring out what you learned. I really think the step of taking good notes is part of where the scientific method really shines as a tool for debugging. After you've run your experiment and taken good notes, you come to a conclusion. Did what you observed disprove your hypothesis? Maybe it didn't, which might lend some support to the idea that the hypothesis is true. Or maybe it showed that your question was irrelevant and uninteresting, or that your hypothesis is just really off-target and you don't really know. All of those are great results, because then you can come back to the gathering-knowledge step knowing a little bit more than you did. Even if what you know is "that's not an interesting research question" or "this hypothesis is definitely not true," those are useful things to know. And then you also share your knowledge out. What good is scientific research if it stops in your lab or your living room or your office? When you're doing scientific research, this looks like publishing your results in a peer-reviewed journal. We'll talk a little bit later about what it looks like when you're debugging your software. So how do you debug with the scientific method?
Well, the first thing is that throughout the whole process it's really important to write it down. That's part of what makes this such a useful tool. I don't think it matters a lot how you write it down. Sometimes I use my notebook; I'm more of a stickers-on-notebook and stickers-on-laptop person, so you can see my cool stickers. The whiteboard is a perfect tool if you're collaborating with others, if you're pair programming or talking with other teammates. I like stickies if I'm working with people who are a little bit resistant to using this method: I can keep my experimental procedure on sticky notes at my desk while my pair stays less enthusiastic about the scientific method, and it works great. My favorite way of writing it down right now is actually just in my text editor. It's right where I'm doing my debugging, it's really integrated with my workflow, and it's easy to copy information out of code or out of logs straight into my notes, and then it's easy to copy relevant pieces of those notes into a bug tracker or into Slack or other ways of sharing with people. So it works really well. I don't actually think that you have to write it down. I know I said "write it down," and I'm going to tell you that again and again, but I think the crucial step is that you take some of the information that's floating in your head and move it out of your head in some concrete way. I love writing; it works really well for me. But I think you could get the same results by talking into a tape recorder, as long as you're being thoughtful about what you're saying: whatever you would write down, you could say out loud. I know that not everyone loves writing as much as I do. So then you start with that first step, gathering existing knowledge. What does this look like when you're debugging your software? It looks like doing a brain dump of everything you know about your bug.
You want to start with the user-facing impact. How did you notice this bug? A customer went to the admin page, got a 500, and sent us an email with not enough information. And you want to write down everything else you've discovered along the way. I often forget about this tool when I'm starting debugging and only pull it out when I'm really stuck, so I've often been working on the bug for an hour or a day or a week. I've learned some things and forgotten some of them, but you want to write down everything that you remember that you know. Include weird log lines you've seen, other strange behavior that seems maybe related; just write it all down, stream of consciousness. As you start writing things down, I think it comes kind of naturally that you start asking questions. Because at first you write down the things that are definitely happening, and then you start writing down the things that you think are happening but you're not really sure about. For instance: maybe my service is spamming the database, and that's why the whole thing's falling over; I'm not sure. That's a great question to write down. You might have two or three questions. You might have 20. You might have an overwhelming number of questions. But once you have a few questions, hopefully one of them will start being interesting, or you can sort of give up and say, I'm just going to pick one. And that's the second half of this step: you pick a single question. A couple of other interesting things to ask as you're thinking about these questions are: how do I know that what I think is true is true? I stated in my facts the things that I know are true. How do I know they're true? Are they true? Potentially the most valuable question you can ask is: is the thing I think is true actually true? So you pick one question to start with; you can come back to the others.
You don't have to be afraid that they're going away forever; they're still on your paper. Once you've picked a single question, you make a hypothesis about it, an educated guess. When I'm using the scientific method for debugging, I sometimes play a little fast and loose with it. I think of it as a general framework, not a rigorous scientific approach. So occasionally I frame it as a null hypothesis, but sometimes I just frame it as a more general statement about what I think is happening in the system, a guess at the answer to that question. I think it's helpful either way. If you have more than one idea about what's happening, that's okay too. You can just pick one; flip a coin. It doesn't matter. You just need some statement against which you can test. Then you design an experiment, and it's important again to be really detailed. Here it's less that you're going to publish your experiment in a peer-reviewed journal for other people to replicate, unless you have a different type of bug than I do, but you'll want to refer back to it, for yourself and for teammates, so it's important to write down the steps you want to take in detail. The reason this is a helpful step is that it's much easier to answer "what information do I need to disprove my hypothesis?" I don't have to fix the bug. I just have to prove that the problem is not that my service is spamming the database. That's all I need to find out. As you're designing your experiment, you also want to write down what you expect to see: what log lines, what behavior if you restart the service, what you would expect to see if your hypothesis is true or if it's false. And then you run the experiment and you take really good notes. What do you take notes on? You want to take notes on what you expected to see: did you see all those things you wrote down in your experimental procedure?
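As a sketch, a written-down experiment for the database-spamming example might look something like this (every detail here is invented for illustration):

```
Question:    Is my service spamming the database?
Hypothesis:  The service opens a new connection on every request and
             never closes it.
Procedure:   1. Restart the service; note the baseline connection count.
             2. Send 50 requests to the admin page.
             3. Count open connections from the service to the database.
Expect if true:  the count grows with each request and never drops.
Expect if false: the count stays near the baseline.
Observed:    ...
```

The "Expect if true / Expect if false" lines are the part that makes the conclusion step easy later.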
And you definitely want to write down all the things that you didn't expect to see, especially the "oh my gosh, I had no idea my software could do that and I really don't know why" things. Those are really important to write down. I often find other, unrelated bugs while I'm following this process. Definitely write those down in your bug tracker. Not now; don't get distracted. But write them down. You have new bugs. Yeah, that's software. Not sure it's great. One of the things that's really helpful in this note-taking process is that I love to grab annotated log snippets. You don't need to read the details of what the logs here are doing, but the interesting thing is that I grabbed a chunk of logs, with all their timestamps, from the system that was taking too long. It was timing out and I didn't know why. And then I took a note: hey, the time between the beginning of this process, the staging, and the end of the creation phase took 52 seconds. And I know I have a three-minute timeout for a process of which this is only a small part, and in a healthy system this takes three or four seconds. I don't know what's going on, but it's interesting. This is helpful because if you just grab the logs, then tomorrow you're not going to know why those logs were interesting. It's really valuable. This bug was like a month ago, and I still know what it was doing, because I wrote two sentences at the top of the log. So then you come to a conclusion. Was your hypothesis disproved? Do you feel like you know what the cause of your bug is? Maybe you still have no idea what the cause of your bug is. That's actually great, because you had 6,000 possibilities and now you have 5,999, and that's one fewer way your software could be breaking. It helps you move forward. If you figured out how to solve your bug, great: go fix it, and don't forget the share-out phase that we'll talk about in a moment.
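An annotated snippet like the one described above might look something like this (the timestamps and log text are invented; the numbers are the ones from the story):

```
NOTE: start of staging to end of the creation phase took 52s.
NOTE: the surrounding process has a 3-minute timeout, and on a
      healthy system this part takes 3-4 seconds. Why so slow here?

10:02:03.118 [stager] staging started
...
10:02:55.402 [stager] creation phase complete
```

Two sentences at the top are enough to make the snippet legible a month later.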
But if you haven't figured out why your software is breaking, because your hypothesis was disproved or the result left you with no idea what was going on, then you circle back to the gathering-knowledge phase, because you now know more about your bug. You know that your hypothesis is untrue and that's not what's actually going on. And because you've taken detailed notes while exploring a corner of the system that's related to your bug in some way, you've probably learned a whole lot of other things. So you may have new knowledge that prompts new questions. You can also refer back to the questions you set aside before, and maybe one of them looks a heck of a lot more interesting, or a heck of a lot more likely as the cause. And then there's the share-what-you-learned phase. So what is important to share when you're debugging? There's a lot you can learn while debugging. Certainly all those new bugs that hopefully you put in your bug tracker: you can share those with your team and your product manager. There's a lot that others may run into later. Sometimes I'll be working on a bug and discover that the issue is not actually a bug in the code; it's that the system was misconfigured, and another customer may come along and configure their system in the same way that we now know won't work. So it's really helpful to tell my teammates: this is what I saw, and here's the answer, so that next time you see the same thing, you have an idea in your head of what the problem is. You may have things to share with another team; it may not be your software that had the bug in it, it may be another team's software, and that's a great thing to share. And, as my friend Ray Krantz helped me see, I think one of the most helpful things you can share from this process is your experimental procedure, because you just wrote down a playbook for how to solve really hard bugs in your software.
That can be a great onboarding tool for a new person who doesn't have any idea how to solve bugs in what is now their software; they just don't know it yet, because it's only their third day on the job. So it can be a really great teaching tool as well. So, story time. I want to share with you a little bit of how I've used this on some interesting bugs. I was working on a project where we had a process that took some customer input and customer data and shelled out to Node to run some code on it in a JavaScript sandbox. I know this sounds really wild and probably like a terrible idea, but we were doing it for pretty good reasons. And we had a customer reporting that they were starting this process and it was just hanging for like five hours. We expected that sometimes it could take a little while, it could take a minute, it could take two minutes; our web app was designed to handle that. We had a little spinner to show that we were doing asynchronous stuff in the background: don't worry, it'll return. Five hours later, no returning. I knew vaguely what section of the code base it was in, and I mean vaguely, like "it's in this third." So the first question that I wrote down was: what part of the life cycle is hanging? And I thought maybe what's happening is there's something funky in the JavaScript that's running in the sandbox and it's not returning. I don't know. I know Rails better than JavaScript, and what we're doing here is kind of wild; maybe that's the problem. We didn't have very good logging, so I couldn't tell from our logs what part of the system it was in. So I wrote down a procedure: I'm going to add log lines. We were able to get data from the customer that was similar enough to their production data that we could replicate the problem on a test system, which is great.
So I added a bunch of log lines: we're at this part; we're about to shell out to Node; we're shelling out to Node; we got back from JavaScript. Just "where in this life cycle are we?" log lines. I restarted the Rails server, and it didn't hang in the JavaScript code. It never got there. So this hypothesis was not true, and I had learned something new. Then the question was: maybe the data is too big? We knew that our test data was a lot smaller than our customer's data. We had seen this before; other teams were testing with larger data sets than we were, and we had this vague sense that maybe our customer's data was even bigger. So we thought: maybe when the data is too big, shelling out to Node hangs. And we tried it again in the exact same situation, except with a much smaller data set. And that worked. So we were starting to get a sense that this was really what the problem was. Then we circled back to the gathering-knowledge stage and did more research, and we found out that Node has a buffer limit that was way smaller than our customer's data set, which was way bigger than we had ever imagined. We realized that the problem was that we were not piping our data properly and never flushing the stream, so it was just sitting in the buffer. We were able to fix the bug by properly using the IO library, Ruby's Open3, and not forgetting to flush our data. I hope this shows how you can use this cycle: your first question might not be the relevant one, and that's what's useful about it. So why is the scientific method helpful for debugging? Well, the first reason is collaboration. It's also great for getting unstuck, and once you're unstuck, it helps you keep moving forward. Why is the scientific method a great tool for collaboration? Well, it can help you and your teammates get on the same page.
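To make that kind of fix concrete, here is a minimal Ruby sketch of shelling out without the hang. This is an illustration, not our actual code: the helper name is invented, and the command is a parameter rather than Node specifically. The key points are draining the child's output concurrently and closing stdin so the data is flushed:

```ruby
require "open3"

# Hypothetical helper: stream a large payload through a child process
# without deadlocking on a full pipe buffer.
def run_with_streaming(cmd, payload)
  Open3.popen3(*cmd) do |stdin, stdout, stderr, wait_thr|
    # Drain stdout and stderr on separate threads so the child can keep
    # writing even while we are still feeding it input. If we wrote all
    # the input first and only then read, the child could block as soon
    # as its output filled the pipe.
    out_reader = Thread.new { stdout.read }
    err_reader = Thread.new { stderr.read }

    stdin.write(payload)
    stdin.close # flush and send EOF; forgetting this is exactly the hang above

    [out_reader.value, err_reader.value, wait_thr.value.exitstatus]
  end
end
```

Piping a payload much larger than the OS pipe buffer (typically 64 KB) through something like `cat` is a quick way to check that a helper like this doesn't deadlock.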
When you have been working on a bug for a while and you're frustrated, and you go to a teammate and ask for their help, your notes are a really valuable tool for helping them come onto the bug efficiently. You can tell them just what you're working on right now; you don't need to tell them the entire history, but you have your notes, so you can refer back to them when they become relevant. You also can avoid telling them about all the rabbit holes. If I'm telling a coworker, stream of consciousness, about a bug I'm stuck on, I often go through the whole process I've gone through to get here over the last two days, and that has a lot of dead ends that aren't useful information for my coworker. With your notes, you can realize they're dead ends before you tell your coworker, and just skip them. Good notes make it much easier for your teammates to help you, but getting your teammates on the same page is also really helpful if you're disagreeing about what the cause of the bug is.
I do a lot of pair programming; I spend most of my day pairing. Sometimes when my pair and I are working on a bug, we have really different understandings of what the bug is. We might have really different questions; we might have really different hypotheses. By picking this procedure, where we have to agree on one question and one hypothesis, it forces us to communicate clearly about what we think is happening, and it gives us a really generous way to give way to each other without feeling embarrassed or feeling like we're not being respected. We can say: okay, we came up with 20 questions and we have to pick just one, so let's investigate your question. If it turns out to be correct, that's great, we'll fix the bug; and if it's not, then we can come back to my question. It gives you a really generous way to communicate when you're really disagreeing. We talked a little bit about why good notes make it so much easier for your teammates to help you. The scientific method also helps you by getting you unstuck, and the reason it does this is that it narrows your focus. Writing down what you know helps you organize your thoughts. When you're feeling really stuck, your thoughts are often just going in circles, they're getting a little overwhelming, and you kind of don't know what's going on. By writing down your thoughts and then being able to look at them afterwards, that externalization process is really helpful for organizing them, because you look at all the things you stream-of-consciousness dumped, and that gives you a little bit of emotional space to step back from feeling terrified that you can't solve this bug. It lets you notice which things you still think are important after you put them down on paper. You might find some of them are not important, and you might say: oh, that one that I wasn't even really
thinking of is where the question really is; I think that's the problem. One other way it really helps you get unstuck is that the question "how do I disprove this hypothesis?" is actionable. "How do I fix this really overwhelming bug?" is not actionable; there's no clear action you can take. But very often, when you have narrowed your focus down (to take our example from before, "I think that my service is spamming the database"), then you can say: okay, I can look at how many connections there are from my service to the database. It's much clearer what information you need to collect once you've narrowed your focus that much. It's just much easier to figure out what information you need to disprove this tiny question than to fix the whole bug, and so it helps you move forward. Indeed, moving forward: once you've gotten unstuck, it also helps you keep going forward. You can leave that frozen place and keep going. One of the most valuable ways the scientific method helps is that it prevents you from repeating yourself, in two ways. The first is when your teammate comes in after you've asked them for help, and they're like, "hey, did you try restarting it?" And you're like, "yeah, I spent all afternoon restarting it and that didn't work; do you think I'm an idiot?" You know that your teammate is trying to help, and it's still really pretty frustrating. I hear some laughs, so I think other people have had that experience. With the scientific method, you have your notes and you have your experiment, so you have your great notes from when you spent all afternoon restarting it yesterday, and your co-worker gets to see the results, what happened, because that's what they really want to know. They don't think you're an idiot for not trying restarting it; they just want to know what happened when you restarted it. You have your notes, they can read your notes, and then you can
skip that whole process and jump to where you are now, which is, you know, eight hours later from when you were trying restarting it, with new knowledge. It's also great because it prevents you from repeating yourself. Who here has been working on a bug for a while, and then all of a sudden realized you're doing the same thing you did three days ago to try and debug it? I see a number of hands. I've definitely done this. Having that record of what questions you've asked and what experiments you've run as you're debugging helps you not go back and do the same things pointlessly, because you're lost in this fog of "oh my gosh, I've been working on this bug forever and I don't really know," and it all blurs together. Is it three days or is it ten years? I don't know. So it helps prevent that repeating in the fog. It's also great because each iteration moves you forward in a way that's really observable and concrete. Every time you've answered your question, followed your experiment, and come back to gathering knowledge, it's much clearer than if you weren't taking notes and following a framework that you are making progress. Even if, instead of six thousand potential reasons for your bug, you now have 5,999, and that's still way too many, it's really clear that it's one less. So it helps you not get frustrated, and it helps you move forward. So when should you use the scientific method for debugging? Another story. This method really started to crystallize in my mind when I was working on a project where we were shipping a Ruby on Rails app as a VM image for four or five different infrastructures as a service, and each one had to be packaged differently. We were using as our VM image a standard Ubuntu image from Canonical, and we were using Packer and Chef and a bunch of other tooling to put all our stuff on the VM image. And we
decided that instead of using this standard Ubuntu image from Canonical, we wanted to use a VM image that was based on that same image but that another team was building. They had done all kinds of security hardening, so we could get all of that for free, right? The best general procedure we could think of for this was: go into Packer, which is a tool for building VM images, swap out the URL from the Ubuntu image to the one the other team was building, try and build it, try and boot it, and hope it works. And let me tell you, it doesn't work that way. We spent about a month on this, sitting there going: the VM didn't boot, we can't even SSH in, what's going on? So it was really a month of the whole team being incredibly stuck and incredibly frustrated. We learned a lot, and we got it done eventually, but that's when I started pulling together these tools, like "take notes when you're frustrated" and "focus on one variable," all these tools that I had individually, and pulling them together into thinking of it as the scientific method, because we just kept getting stuck and I had to develop ways for us to move forward; we needed to move to this new image for our VM. So I think there are two types of getting stuck, and I think the scientific method is really helpful for both of them. This idea of two types of stuckness, and that when you're debugging, or building software, or really just living life, you're often alternating between them, comes from my friend Jesse Alford. The first type of stuck is when you have too much information. You can recognize that you have too much information and that's why you're stuck, because you feel overwhelmed or bewildered, or you're saying to yourself "it could be any one of a thousand things," or you're noticing impossible things that definitely cannot happen in your software. You're really sure it can't do that, but it seems to be doing
it anyway. Are there leprechauns in your software? I don't know, maybe it's poltergeists. The other type of stuck is when you have too little information: "I can't think of anything; I have no more ideas." You feel frustrated; you feel stalled. The scientific method has aspects that help with both. When you have too little information, writing down everything you know is a way of realizing you actually have some information, which is very helpful. And when you have too much information, picking a single question and a hypothesis narrows your focus, so even though it could be one of a thousand things, now you have just one little thing to work on. So it's actually really helpful in both kinds of being stuck, and you're usually alternating between the two as you go through a debugging process; the scientific method goes with you the whole time and can be your friend for both. So we've covered the what, the process of doing science; the how, with lots of writing; the why, that the scientific method is so helpful for collaborating, getting unstuck, and moving forward; and the when, when you're feeling frustrated or overwhelmed on really complex bugs. I hope the next time you're stuck, you try gathering knowledge, asking questions, making a hypothesis, designing an experiment and running it, taking really good notes, coming to a conclusion, circling back to gather knowledge again, and sharing out with your team. And definitely make sure to write it all down. I have a reference guide for this on my website that you can refer back to while you're debugging. I'm not huge on slides as a way of sharing; I think they're very low fidelity, so the guide can be really helpful for referring back to. I would love to hear on Twitter if you find the scientific method helpful, or have any improvements on it for your debugging. Thank you! I think we have a few minutes for questions if folks would like. Yeah? So the question is: how do you
know when to stop. I talked about spending a month on this bug; how do you know when it's something external? I think that's a really hard question, and I think there's a lot of judgment call involved. I would definitely not recommend spending a month on a bug without going and talking to other teams and other people; we were doing a lot of going and talking to other teams and people who knew more about this during that process. Probably, if you're spending more than a few hours stuck before at least having a conversation with a co-worker, or, you know, a rubber duck or something, that's too long. I don't have any clear guidelines. It's definitely a judgment call based on your software, your team, your product team, and how important the bug is. Maybe we thought it was an important bug when we thought we could fix it in two hours, but it's not that serious if it's going to take us a week. It's really a judgment call. Yeah? So the question was: do I estimate in advance how long it will take me to solve the bug, because often as engineers we underestimate; we think we can fix it in an hour, and that's totally unrealistic. I don't usually try to make concrete estimates for bugs. I generally try to develop a working framework with my team, with my product manager, with my team lead if I'm not the team lead, or with engineering leadership if I am. It's a general framework of: at what point should we start checking in regularly? On my current team that's usually two days. If a bug is taking us more than two days to fix, then I try to have a daily conversation about how it's going: are there other people or teams that need help or could help; is it still worth keeping going on this bug? But I think that's really something that develops with each team, and so it's a
conversation to have with your co-workers about your team's cultural understanding of when you should start having regular conversations about how long a bug is taking. Yeah? Other questions? Okay, well, I know nobody will be sad about going to lunch a couple minutes early. Thank you so much!