Hey everybody, thanks for coming. My name is Emily Shaffer, and I work at Google contributing upstream to Git. Today we're going to talk about project onboarding, specifically about writing documentation as a technique for bootstrapping yourself on a new project. Due to technical limitations with the platform, this presentation is recorded, but I'm here and I'll be doing live Q&A at the end. Please type in your questions and I'll answer them when the presentation is over.

Let's start with a quick story. About a year ago, I switched teams within Google. Previously, I'd spent a couple of years working on open source on the OpenBMC project. My new team spends most of its time contributing upstream to Git. I was trying to figure out how to filter git log output from within the codebase, but I really wasn't getting it to work. This kind of filtering happens during a fairly complicated process, the revision walk, and it uses a grep library written from scratch. So I went looking for documentation on the grep library, and I found this. I really didn't know where else I could find out about the grep library, short of examining the code myself. By the way, files like this have been removed from the codebase pretty recently, so...

This kind of thing happens all the time. Documentation for newbies is written by project veterans who don't know what they know, and they end up leaving a lot out. That's bad because it intimidates the heck out of new contributors. They have to crawl through source, which is often uncommented, to try to figure out what's going on. Lots of projects don't have a good way to onboard new developers, and poor onboarding documentation gets in the way of a lot of different kinds of people. For starters, we've got somebody who mainly writes open source as a hobbyist.
They might be new to writing code, or they might be new to this part of the stack: for example, they usually write frontend code, but now they want to contribute to OpenBMC. Second, we've got a professional developer who just started on a new team and wants to succeed. There's a good chance that team doesn't have this documentation, by the way. Documentation is really hard, and nobody wants to write it once they've started writing code, so lots of teams have bad documentation. And finally, we've got a manager or team lead who wants to know how to onboard new developers quickly. They don't want to spend a lot of veteran developer time relating, you know, the sort of eldritch lore that nobody writes down. These people can all find their answers by writing tutorial documentation, or by asking their new hire to write it.

Wait, tutorials? Let me back up. There are four main kinds of documentation. You might write a how-to to document your release process. You might write an explanation to document the root cause of a bug. You might write a reference to document your API. And finally, you might write a tutorial to let somebody learn about a project, module, feature, and so on, starting from zero. How-tos and tutorials seem kind of similar, but the scope of a how-to is narrow: it tells you how, but it doesn't tell you why. The why in a tutorial is what's vital to the deep understanding you need to get rolling and be productive on your team. By the way, these four quadrants are pulled from a really great blog post on the roles of each type of documentation, and I linked it at the bottom here. You should totally read it.

So, why tutorials? Let's talk a little more about why they're so great. For starters, I've maintained a project before, and if I opened up my Gerrit queue and saw that someone had written documentation for my project, I would be so excited. I'd want to work through the review with this person.
I want to get their fantastic documentation checked in. They're saving future me time: now I can explain with a link instead of a paragraph. Usually I would ask for a show of hands here, but just think: do you like it when a project has great documentation? And how many projects you've worked on have actually had great documentation? It's pretty common for projects to need it. Finally, when you're new, this kind of work starts building your reputation in the project, which pays off when it comes time to actually submit code.

So, when you're adding a new feature, if you're like me, you kind of chameleon code. You look for something that does something similar in the project or on Stack Overflow, and you tweak it until it does what you want. But do you understand why that works? Do you understand why you needed to make the changes you did, and how they'll impact the rest of the project? Writing a tutorial makes you find out. This is especially helpful with legacy codebases that have a lot of history to untangle. And once you understand the architecture this well, you'll identify other things you can change or add, and you'll have a much easier time doing so.

And finally, this lowers the barrier to entry for others. Now it's really clear how to get started. By the way, explaining something to a non-expert is a fantastic way to learn for yourself. Maybe at some point you've explained a homework problem to a peer; your own understanding grows when you do. More contributors are really great, especially qualified contributors, so this is really important work that you can do.

Okay, so we're sold. We're going to write a tutorial. But when is it appropriate to write one? If there are already tutorials, of course you don't want to write more that cover the same ground. Instead, walk through those tutorials yourself and make edits wherever you don't understand something.
You can do this when you don't really know how to do something yet, and this way you can make a huge impact right away. Everybody loves documentation, and you're easing the path for the next newbie. It's also really good to do this when you're new, because you don't have any prior knowledge, which means you don't have any prior assumptions, and you'll definitely explain all of the details that seasoned team members take for granted, like one-time workspace setup.

But hang on: why am I writing the definitive guide for something I don't know anything about? What if I make a mistake? You kind of feel like this, right? It's a little scary. "The best way to get the right answer on the internet is not to ask a question, but to post the wrong answer." That's Cunningham's Law, an axiom attributed to Ward Cunningham, inventor of the wiki, and you've probably experienced it if you've ever opened Stack Overflow. When your tutorial is reviewed by fellow contributors, coworkers, your boss, whoever, they're going to find mistakes, inefficiencies, and misunderstandings, and they're going to fix them for you. Maybe you didn't have the contextual knowledge to phrase the question; maybe you didn't even know there was a question. So when you iterate on your tutorial, you can include this new information and rejoice that you learned something new.

So how do I write one? The first thing to do is figure out what to write about. A decent exercise is to watch for questions you ask your coworkers where the answer is anything other than just a link to documentation. These can be specific to the codebase, like "How do I add a new command to Git?", which lets you cover some of the project architecture and best practices. Or they can be general to the whole team, like "How do I make a change to production?", which might include some explanation of the version control process, the code life cycle, and so on.
Or they can be somewhere in between, like "How do I run the test suite on my machine?", where you get a little of both: you set up the test environment, and you learn where the tests live in the source code. For the rest of the talk, we'll follow along with an example: a tutorial I wrote for brand-new Git contributors on how to add a new Git command. I've got the link up here in the sidebar.

So now that you've picked your question, go about answering it, and write stuff down as you do. You can treat your tutorial like a journal: first I added a new file in the source directory, then I added the entry point to lib/api.h, and finally I built it with make. If you need to go digging for something, teach your reader how to dig. If you simply answer the question for them, you're robbing them of the opportunity to learn how to learn in the scope of this project. For example, you could say: "If you grep the project, you'll find it in module.c, and grepping again shows that it's invoked by api/handler.c."

If you need to ask something, the next person will too, so if you have a question, write it down. Even if there's already documentation, consider merging it into your tutorial or linking directly to it. And if you're not sure how to ask your question (do I use the mailing list? IRC? a Usenet forum?), write that down too. If you're frustrated figuring out how to do something, remember that with your tutorial you can be the last person who ever feels frustrated that way. If a concept didn't make sense to you until you reached a eureka moment, try to summarize the key pieces of your eureka in the tutorial. That happened to me when I was figuring out what the different Git objects do. Don't worry too much about making a mistake.
Usually, if you're wrong about something, that means the documentation is wrong, which is what you're fixing. Your reviewers are going to help you out when review time comes around. In our example, I actually wrote the headers here, like clone the Git repository, identify problem, and so on, before I knew how to do any of those things. And I was wrong at first about where the best place to clone the code from was, and why. That was the subject of a little quibbling during the review. So it happens to everybody.

Speaking of reviews, that's what comes next. You've finished the tutorial; now somebody can read it and go on a fantastic voyage from zero to working code using your document. So next, you send it in for review. Well, first: if you're not sure how to send it for review, or if the contributing doc isn't very clear, add that to your tutorial too. This is especially true if you're working on an absolute beginner's tutorial, like the My First Contribution example we're referencing.

Try to keep perspective when you're told you've done something wrong or inefficiently. That's not your fault. You're new. There wasn't any way for you to know, and that's why you're writing it down now. The code review process can be pretty intimidating, and it's easy to start feeling like reviewers are attacking you. Take a break, shift your perspective, and realize that review comments are opportunities for you to improve. Unfortunately, they can sometimes sound kind of adversarial, especially ones like "Why didn't you use X?" or "Wouldn't it be safer to do it like Y?" It's okay that you didn't magically know that API existed. That's kind of the point of the exercise. So here are some examples of review comments I received when I submitted my psuh tutorial to Git upstream. They sound a little scary, so let's see if we can rephrase them.
This first one we can transform into: "Hey, I didn't know there was a Bash style document. I thought it was the Wild West, so now I know." On the second one: at this point I'd been working on OpenBMC, which is written in C++, for quite some time. In C++ you can sort of say "function" or "method", whatever you feel like; they're pretty much interchangeable. So I figured the same would be true in C. Not so much. I learned something new. And for the last one, I didn't know about this check-docs tool until the reviewer brought it up. It was a neat new tool that I could add to my own toolbox, start using, and put in my tutorial. So try not to think of the parts of your code that generate comments as careless mistakes. They're not careless mistakes; they're just things you didn't know yet. This takes a lot of practice, but it's worth it to get the most out of your reviews.

Let's talk a little more about code review in open source, because it's really important. It's not that uncommon for companies to have slim or zero code review on the code their own employees write. I've been at a few companies like that. And it's really not uncommon at all for people who are new to the software industry, like graduating students or engineers who are changing roles, to be unused to the code review process. But open source projects do a ton of code review. First, when you're writing code at work, your coworkers kind of know you. They know you passed the hiring bar for your company, they know your background, they know all kinds of stuff about you. But in open source, you don't bring that reputation with you at first, so it's really important to make sure your code is something the project really wants. Second, has everybody heard of lottery insurance?
You're insulating against an unexpected, sudden event that results in you, the developer, no longer being associated with the project, with no way to contact you. Say you won the lottery and bought an island to live on, but there's no fiber, so you can't get internet on your island. In open source, this is even more important than within your own company. Contributors come and go, especially those who contribute in their free time, and the project really can't be sure you're going to be around next year to answer a question about your code. So it had better be easy for somebody else to take care of it after you're gone.

And finally, within your own company, you can be at least reasonably sure that the developer who sits next to you isn't contributing malicious code full of backdoors for her to exploit later. Out in open source, you get a bit less of that kind of assurance. One of the best ways to ensure the security of a piece of code is to put it in front of as many eyeballs as possible, and that goes for bugs too.

All of this comes together to make code review in open source really important and really detailed. There are a couple of resulting corollaries. Your review is going to be in flight quite a lot longer, because your reviewers may be off winning the lottery themselves. And criticism is going to sound harsher in text. We mammals rely heavily on nonverbal cues like body language and tone of voice to interpret intent, and text really lacks that. So take a deep breath and get another cup of coffee before you decide all of your reviewers hate you. Remember that reviews are an opportunity for your code to improve collaboratively. It's a team effort, so learn a lot and work with the community to make something awesome.

So we're back from our tangent on reviews, and finally it's time to dogfood. "Dogfood" is a common term at Google, but it's not really that common elsewhere.
It's a reference to the phrase "eating your own dog food", and it means something like using the product you work on. It sounds a lot more exciting than it is. Hopefully, while you were writing your tutorial, you were also writing the tutorial's code at the same time. But if you didn't, or if your tutorial changed significantly during the review process, it's time to walk through it again. Plenty of folks like to learn by looking at a complete code sample, so feel free to add lots of informative comments, push the topic branch to a fork of the repo and link to it from your tutorial, or otherwise share the sample code in whatever way your project prefers.

Next, you want to get a bunch of other people to try it out. This is how you make sure your language is readable and really rock solid. Most of your reviewers were probably people who are already experienced with the project, which means they aren't really the target audience. So you want somebody who's really new to run through your steps and make sure they work. You can shop for these test subjects on the mailing list, on IRC, or in whatever forum the project uses. Then listen hard to their feedback: review their changes to the tutorial text, or make changes based on their feedback yourself.

Finally, you can start over and do it all again. But really, you can keep an eye on your finished tutorial for pull requests, recommend it to newbies, listen to their feedback, keep freshening and improving it, and write more. You're inevitably going to find more topics to delve into as you write your tutorial, and it's okay to return to those topics with a follow-up tutorial.

Okay, I hear you. It's recorded, but I hear you. Emily, I wrote a tutorial. I wrote three tutorials. My LinkedIn contacts have all endorsed me for tutorials. But I want to write code. I don't want to write tutorials.
Didn't you say that I was going to onboard as a coder? Let's take a look at some of the side effects your foray into technical writing produced.

"I'm sorry, I noticed that this function does this, but couldn't it do that instead?" By examining the code closely to figure out what it's doing, you noticed some inefficiencies or technical debt. Congratulations, you have something you can write a patch for.

"When I asked why we built it this way, my tech lead said we should fix that." By asking your team about something you didn't understand, your team noticed something that really should be changed. Congratulations, you have low-hanging fruit, and you're going to benefit your team in a big way when it's fixed.

"My tutorial shows how to use the API in a way that seems reasonable, but it crashed." By demonstrating bare-bones use of an API, you discovered a bug in an edge case. Congratulations, you broke an assumption the project made about its inputs, and now you can write a patch to make the project more robust.

"My tutorial shows how to use an API in a way that's totally broken, but it worked?" By demonstrating an extremely bare-bones use of an API that shouldn't have worked, you found that it actually does work: a security flaw. Congratulations, you found a backdoor, and you can write a patch to disappoint hoodie-wearing laptop users in dark rooms across the globe.

All of this deep exploration and understanding of the code isn't an accident. A common study strategy is to find someone who doesn't understand a subject and explain it to them, with their consent. After we're done with our studies, this usually takes the form of rubber duck debugging. As we explain a system to someone else, we gain a deeper understanding of it than we had when we were just reading about it ourselves. By asking ourselves "How do I do this, and why do I need to do it this way?", we start to understand the history and the architecture of the project.
This tends to lead to discoveries: bugs, inefficiencies, poor assumptions, and areas for improvement. And those discoveries are enough to get the ball rolling. One refactor leads to another, and before you know it, you're an active contributor on the project. I think we have time for some Q&A, so if people write their questions in the chat, I'll come on live and answer them. Thanks, everybody.

Hey, everybody. Thanks for tuning in. I haven't seen any questions come in yet, but I'll hang out in here, and if anybody types up a question, I can answer it. Thank you.

So Stan asks, is there a point at which you can have too much documentation? That's a good question. I think if your documentation is confusing, or if you have multiple documents describing the same process, you're going to find that those documents fall out of sync. It's really better to have one document per process and then use links instead. So yeah, I would say if you have two documents about the same thing, that's too much documentation.

Samyak asks, are there any documents you'd suggest for getting started? It depends. I think you should familiarize yourself with Git. That sounds a little selfish, since that's what I work on, but Git is really what everybody uses to contribute. It's really worth going through a couple of tutorials and getting used to it, because there are a lot of pitfalls, unfortunately. So that's probably a good place to start.

Samyak asks, how do I know what I need to document in an early-stage project? I think the really early stages of a project are actually the most important time to document, because you really want to write down the goal of your greenfield project. You might start by documenting the API and then build the API based on the documentation.
I've actually done that a couple of times on projects, and it's really, really effective. That's probably a good spot to start, yeah.

C.R.T. asks, where would you publish these tutorials? I've actually had success with this: the tutorial that's linked, the psuh tutorial, is part of the Git source tree now, and it's rendered and hosted on git-scm.com. So if you're writing a tutorial for contributors to the project, I think it should go wherever the rest of the project's documentation goes. C.R.T... oh, I read this one already.

Monica asks, is there a project that you think did a good job of integrating documentation into the early levels of contribution? That is a hard question. I haven't really seen many projects that knocked it out of the park. I tend to respect Python's documentation because it's built into the source, but that's reference documentation, not really tutorial documentation. I think the ArchWiki is actually really good for tutorials. Even as a Debian user, I usually have really good success with it.

Gerald says, we use SharePoint, wikis, Confluence, and Slack, and have lots of information duplicated. What have you come across for wrangling that kind of distribution? Yeah, this is kind of related to the earlier question about having too much documentation, right? Once you have duplicate documentation, that's really tricky. I think no matter what you use, you want to pick one place for the tutorials to go. Even if some of your reference docs come out of Doxygen, some of your docs live in your wiki, and some of the user-facing stuff is in SharePoint, that's fine as long as all your tutorials go in the same place and everybody knows to look there.
You just don't want some user how-tos in one place and other user how-tos in another, because you want to be able to see all of it from a bird's-eye view at the same time.

Rodrigo asks, how are you handling charts and diagrams? Are you using a textual representation for those too? Yeah, that depends. The Git project uses AsciiDoc, which doesn't really do a great job with charts and diagrams. I think you could host the asset and then link to it, and Git actually does a lot better with binary blob support. So that's not the end of the world. Yeah, it depends.

Okay. It looks like we can do another five minutes of Q&A, because there are a lot of questions here. So, woo-hoo.

Matthew asks if I can copy and paste the Bitly link into this chat. I will post the Bitly link in the Slack chat.

Wanda asks, how do you balance technical info with explaining things in an easy-to-understand way? Wanda, I usually lean toward easy to understand over technically complicated. If there's a minor detail of the thing you're explaining and it's really technically deep, that probably means the tutorial isn't the right place to delve into it. What I would do instead is write a second document that takes a lot more time to explain that complicated technical thing, and then link out to it. Because, yeah, that's hard. I think being able to write something in a way that's easy to understand is really the best way to know you understand it yourself, so I would always push harder toward easy to understand.

Florian asks, what do I think about Javadoc-like documentation? I think Javadoc is great. It's fantastic for generating reference documentation, like API reference. For tutorials, it's not the right place. This goes back to that slide I had with the four different kinds of documentation.
Not all docs are the same, and not all docs are for the same thing. Javadoc is great for reference, not so great for anything else.

Sriram asks, what do you suggest for documentation that often gets outdated? I suggest that you source-control it, and that you build a culture of "when I update the code, I also update the docs." Git does that really strongly: you can't really check in anything that changes something user-facing without also changing the documentation. That's the culture of the project.

Matthew asks, which channel will I post in? That's in the Channel 2 track, Open Source Project Updates. I think that's all the questions. Thanks, everybody, for coming. I'm so glad everybody had so many questions. Have a great conference.