So, welcome to the Docathon kickoff tutorial session. I assume that a lot of you have already heard about this thing called the Docathon, but for anyone who hasn't heard about it yet, basically this is going to be one week where we have collectively agreed to spend a little less time building new features and fixing bugs, and a little more time making our various projects in the open-source world more human-focused, more user-friendly, and more readable. We're encapsulating all of this in the idea of building better documentation.

I just want to give some quick logistics for those of you who aren't familiar with this. We're going to start with about an hour of tutorials. We'll have a coffee break, we'll have another hour of tutorials, and then we're going to have lunch. After that we'll have a working session that roughly spans the afternoon. During it, if anybody wants a project to work on and is unsure of what to do, or if you have questions that come up from any of the tutorials today, there will be somebody around to help in that process. That's going to happen each day, starting tomorrow, so we'll be here for a few hours each afternoon [inaudible]. But we encourage you to commit and make additions to your documentation pretty much wherever you are, and we'll put up a few drafts about that.
Bathrooms: there are two bathrooms here in the library. You can go out that back door, go straight down the hallway to the end, and both bathrooms are there. The only challenge is that you can't come back in through that door, so in order to come back in you have to go back around and use the main entrance to this room. I think that's it. Okay, so now we're going to start off with a general overview of documentation best practices, and we'll take it from there. Thanks.

Hello everyone, and welcome. As you may have heard, we're streaming this live for our friends in Seattle, and possibly the people in New York will also follow this. Everything that we'll be presenting today you can find on YouTube. This first tutorial is going to be about how and why you want to write documentation. My title is "how to write good documentation", but I think I'm going to settle on writing documentation at all as a first step, and maybe I'll talk a bit about the background on why we decided to organize this event.

A lot of people who work in this space are involved in open source software. Unfortunately, even if much of that software is great and some of it is really well documented, other projects are lacking documentation. From my point of view it's hard for a developer to find an incentive to write documentation, and yet it's super, super important. So after discussing with Chris, we decided to organize this one week where we're hoping to get people excited about writing documentation, and we got good feedback. Some people are really excited about writing documentation, so that's good news. So let's get started on how to write good documentation.
So first I'd like to thank our sponsors; without them this event wouldn't be possible. That would be the three hosts, so BIDS here, eScience in Seattle, and the Graduate Center Digital Initiatives in New York; the Moore and Sloan foundations, who are funding both Seattle and us; and Continuum Analytics, who provided funding for pizzas in New York.

So, first question: why write documentation? Documentation is important because it's one of the first connections between the human world and the machine world, and so it helps us efficiently translate our concepts and ideas into the language that we use to control machines. This is a quote from Chris that I stole, but it's basically why documentation is so important.

You want to write documentation, first, especially here in the world of science, for you, because you'll be using your code in six months. I think we've all been in the situation of having to re-run an analysis from six months ago, and, well, it's not that simple. So writing documentation is the basis of doing reproducible research. That's the first reason.

The second reason, when your project starts getting bigger, is that you probably want your users to use your code. Documentation shows, first, that your project exists. It should show how to install it, because of course if people can't install it they won't be able to use it. And it also shows how to use it.

Last, when your project becomes large enough, you probably want contributors. If you want people to help out, you first need users, because if you don't have users you won't have any contributors. And second, documentation is important because it provides a platform for first contributions, and I'm going to come back to this part later on. So before you can have contributors, you actually need to write a bit of documentation.
There are different types of documentation we're going to talk about. The first one is literal documentation. It can be installation instructions, tutorials, user documentation, contributor documentation, so everything that can be read like a textbook. Then there is something we call API documentation, which documents specifically the functions that are in the project: what does a function do, what are the arguments, the types of the arguments, what are the default values, so everything you need to know to use a particular part of the toolkit.

Then we're also going to talk about examples and galleries. This may be a bit specific to Python, since R has a different way of writing documentation, but in Python it's quite common to have a list of quite short examples and to display them in a gallery. I've also added, once again, tutorials and guides, as this is important, and this can be either literal documentation like a textbook, or a video on YouTube, or a tutorial that is given at a conference.

And then last, the type of documentation that the developers usually don't write, which is Stack Overflow or a frequently asked questions list. Stack Overflow has become popular, I guess, in the last 10 years or so, and it's also a great way for some projects to find documentation. I don't know how many of you are Python developers, but one of the projects that has really great Stack Overflow documentation is matplotlib, for example. It's really hard to navigate matplotlib's documentation, but often if you Google a specific problem you'll find a Stack Overflow link with a detailed explanation of what you want to do.

There are also different formats of documentation, and this is also quite language specific. There is HTML documentation, so a website that usually has literal and API documentation with a gallery and examples; Stack Overflow is also on the web. There is PDF documentation, which is usually the equivalent of a website, so in Python it's not that common to find PDF
documentation, but in R, for example in the biostatistics community, it's very common to find only PDF documentation and no HTML documentation. And then last there is interactive documentation, which is mostly for the API, and this allows you to find the documentation of a specific function.

So I'm going to move to my terminal, where I guess you can't see anything. This is my IPython terminal. If I go into IPython, which is the interactive Python interpreter, and I want to use a function from the library scikit-learn, for example I want to find the documentation on the KMeans class, then instead of opening my browser, if I know I want this particular documentation, I can call the help function on this class and it will display what we call a docstring. It's going to tell me that KMeans is a class that does k-means clustering and that it takes a number of parameters, with the description of the parameters. I can scroll, and there is an example that I can read to see how it works.

Now of course this also exists in R, so now I'm going to do the R demo. I'm going to load a library called limma. In R, to get the documentation, you type a question mark and the name of the function. I know there is a function called voom in this library, and here it displays the documentation for me: it transforms RNA-seq count data into log2 counts per million. And then it's the same thing: I have the list of arguments, the default values, and what they do. So this is what we call documentation.

A couple of months ago, I guess, we ran a survey to find out what people used in terms of documentation and what people thought was neglected in open source documentation, so I'm going to present some of the results. First, the type of documentation people use is pretty much balanced between user documentation, examples, and galleries. The way I interpret this is basically that in order for a project to be well documented, you can't ignore one type of documentation. So this
is not exactly true in all cases, and I'm going to show concrete examples of projects that have really good examples and galleries but no user documentation, or really good API documentation. This was not a single-choice question, it was multiple choice, which is why it adds up this way: a lot of people basically ticked everything.

So, what format of documentation do people use? I find this really striking, and this is also where you can see that most people who replied to this survey were probably Python developers, because most of them use HTML documentation. I use both R and Python and I much prefer HTML documentation, but I just don't have a choice with a lot of the R packages I use, and I hope that's going to change more towards HTML documentation. Rebecca is going to talk about how to do HTML documentation in R. Interactive documentation also pops up a lot, and for me this is actually one of the most important types of documentation, because often I don't have an internet connection when I work, and I find it really useful to be able to just type something in my terminal and find it. But this is my personal point of view.

And which type of documentation is most often neglected? Again this is, I think, quite balanced. A lot of people would like more user documentation. It is also the hardest to write, so I understand that it's lacking in a lot of projects. I was surprised by the number of people who were asking for documentation on how to contribute to a project. But again, it's quite balanced between user documentation, API, examples, and contribution guides.

So now I'm going to present a number of examples of well-documented projects, and these are projects that we got through this survey. In Python, the number one answer for well-documented project was scikit-learn. And then, unsurprisingly, two other projects pop up, MNE and Nilearn, which were actually started by scikit-learn developers who probably continued the culture of how to write documentation. So there is
also Django mentioned in this list, which is not a scientific Python project, but it has had a really good reputation for documentation since it started 10 to 15 years ago. And the one that I find quite interesting is NumPy, because NumPy has mostly only API documentation, so it's one of those projects where API documentation is much more important.

So I'm just going to go over the scikit-learn website. This is the front page, and immediately from the menu you can see that there is the home page and installation instructions, which is super important. Documentation is divided into tutorials, API, FAQ, and contributing, so you find the major types of documentation here. And "Examples" is all the examples, which is something I find particularly interesting. On this documentation I mostly use the example gallery, because when I want to do something I just search, for example, "variable selection". I know it doesn't work, but it must be somewhere. Oh yeah, "feature selection"; I said "variable selection". And then here, just clicking on an example, it has a short introduction on what the example is about, usually some result plots, and then the code to reproduce the example. This is quite common in the scientific Python world.

Now if I go to NumPy's documentation, it's a bit different. There is still "Getting NumPy", "Installing NumPy", and NumPy tutorials. I haven't looked at those in a long time, but I remember that they're not great. On the other hand, this is one of those projects where if you Google "NumPy tutorial" you'll find dozens of really good tutorials. Where the strength of NumPy really lies is in its API documentation. Every single function that I've ever used, and I've used a lot, is entirely documented with what it does, the exceptions it raises, the parameters needed, sometimes when it was deprecated, when it was added. It has really extensive API documentation.

Now moving to the R world. I do code in R, but not that much, so I hope I'm not going to say anything stupid, and Rebecca is going to shout
if I do. So this is one that I find interesting, because it has an HTML layout which is very clear and it reads like a book. It's basically the equivalent of scikit-learn in R, so it does all the basic steps of data analysis, from processing to splitting, validation, etc., and each chapter corresponds to one specific element, so it reads like a book. Interestingly, scikit-learn's documentation developed like this too: the goal was for the documentation to be read as a book. So this seems to be a common theme in projects that have good user documentation.

Another one which I found interesting was the ggplot documentation. I don't think the documentation is great, but if you know what you want to do, it's very easy to find out which function to use. It's just a list of all the functions, each with an example; I think there is an example somewhere. So it's basically API documentation with a description and then a small example of how to do it. For all of these I've added links on my slides, and I put my slides on SlideShare, so if you want to have a look you can just download the slides and click on the different examples.

So, a couple of key elements to think about when writing documentation. Rule number one is that documentation is like code: it needs to be maintained. People often don't think about this when writing documentation. It also means that if you have too much documentation, it's going to be unmaintainable, which means some of the documentation is going to quickly become out of date. People are going to try it and the examples are not going to run, so it's going to be a bit of a mess. So, two tips. First, quality over quantity. And second, there are ways, both in R and Python, to have the examples and code in your documentation run automatically, so use this feature. It's really easy to set up if you do it from the start, and it means that every time you write a small bit of documentation you can just run all the examples, all the code, and see
if it still runs. And same thing when you add a feature: write a bit of documentation, and it's also a good way to test that feature. [Audience comment] Yeah, exactly.

Rule number two: if it's not documented, it does not exist. This is something I read on a mailing list a very long time ago. There are projects that have really cool new features that are not documented, and in practice these end up being features that are not used, which is a bit of a shame. It also means that developers forget about these features, and they become features that are not maintained. So write documentation as you write code, also because it helps you think about how you, as a user, will use it. There is something called documentation-driven development, which I think is a bit extreme, but there are people who say you should write the documentation before writing your function, to make sure that it does what you want. You can find more about documentation-driven development online. I would just say: write documentation as you write code. Every time you add a feature, document it in the user documentation, document the API, and just add a small example.

Rule number three, and this is something that most developers forget: documentation is about communication. When you're writing documentation you need to think in the same way as when you're writing a paper or presenting a project, which is: who is your audience, who are you targeting? If you take the case of a biostatistics package, for example, you're probably going to assume that people know what the data is. But if you're writing a plotting library, then your audience is super large, so you can't really assume any basic knowledge. So find out who your audience is. And then second, how can you organize your ideas? This is where Django's, scikit-learn's, and caret's way of just writing a book allows you to think about how to organize the documentation of your project in a way that people can easily skim through.
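The earlier tip about having documentation examples run automatically can be sketched in Python with the standard-library `doctest` module. This is only a minimal sketch under stated assumptions: the function `counts_per_million` and the helper `run_documented_examples` are hypothetical names made up for illustration and do not come from any project mentioned in the talk.

```python
import doctest


def counts_per_million(counts):
    """Scale raw counts so that they sum to one million.

    Hypothetical helper, used only to illustrate a docstring whose
    example can be executed automatically.

    >>> counts_per_million([1, 1, 2])
    [250000.0, 250000.0, 500000.0]
    """
    total = sum(counts)
    return [1_000_000 * c / total for c in counts]


def run_documented_examples():
    """Run the docstring examples of counts_per_million.

    Returns the doctest summary, which exposes how many examples were
    attempted and how many failed.
    """
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # The examples need the function itself in their namespace.
    namespace = {"counts_per_million": counts_per_million}
    for test in finder.find(counts_per_million, "counts_per_million", globs=namespace):
        runner.run(test)
    return runner.summarize(verbose=False)


if __name__ == "__main__":
    summary = run_documented_examples()
    print(f"attempted={summary.attempted} failed={summary.failed}")
```

If the function's behavior changes without the docstring being updated, the example fails, which is the "documentation needs to be maintained" point made above. In a real project the same check is usually run over whole modules or documentation files rather than one function at a time.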
Visual communication is very important, and this is why I think that examples and galleries are very, very important, because people can just look at plots. If it's in their domain, for example scikit-learn, which is a machine learning toolkit, has a bunch of very basic plots like a ROC curve, people know what these are from the field, so they can just go through the documentation to very quickly find what they want.

Now, writing well is hard, and it's something that is often undervalued. There are a bunch of books that you can read on how to write non-fiction well, and there are guides, but it's a long process. I suffered a lot when I had to write, for example, my PhD manuscript, and I think a lot of people in science struggle with this. It is a skill that is undervalued; at least in France it's not something that's taught very well. So it's normal that people struggle to write documentation, but like code, it's a learning curve.

Now, documentation in practice. I'm going to give a couple of tools and guidelines on how to improve documentation. This one is tools for documentation, so you don't have to start from scratch. Both in Python and in R there are a bunch of tools that are there to help you. For literal documentation in Python, most people use Sphinx and doctest. Sphinx is a tool that allows you to write in a markup format such as Markdown or reStructuredText and compile this into either HTML or PDF, and Chris is going to talk about Sphinx just after me. doctest is something that allows you to automatically run the pieces of code embedded in your documentation to check that they work. In R, something which is super popular is knitr, which you can use either with LaTeX to produce PDF, or with R Markdown, or I believe bookdown, to produce either HTML or PDF. It's the same thing: you'll have code and text in Markdown, and it's going to produce a nice output.

For the API documentation, in Python this is done by writing docstrings, so docstrings are just text
under each function. In the scientific Python community a lot of people use numpydoc, which Stefan is going to introduce later. And same thing, if you have pieces of code such as examples, you can use doctest to run them. In R this is done through help files. It's pretty much the same thing as in Python, except I think it's above the function, and then you can use something like roxygen to compile this into an output, and Rebecca is going to talk about this.

For examples and galleries, Python has a tool called Sphinx Gallery, which is becoming more and more popular. It allows you to write example files, and it's just going to build everything and put it in a gallery. In R the equivalent doesn't exist, but people use something called vignettes to do this. Vignettes can be short or very long depending on who writes them, and I think Rebecca is also going to present a couple of examples.

Now, as a project manager, where should you start? The first thing to do is to write a README file, which is super short, just with a brief description of the project and installation instructions. Second, a license, because if you don't have a license, people will not be able to use your code. It's not so much about documentation, well, it documents what people can do, so I'm not going to talk about this, but there is a link that you can go to if you want to know more about licensing. Third, I would recommend an example and a tutorial, just because it allows people to get started. It can be 10 lines of code that do something super simple. Fourth, API documentation, which you should probably just do when you write a function, because otherwise you need to go back, and this can be a bit troublesome. Fifth, add a link to the code, wherever the code lives, the issue tracker, and a mailing list if applicable. This will allow your users to find out where to ask for help. Then comes literal documentation. For a scientific project I usually do it just before submitting the paper, so it's a bit late, but I find that before this it's hard to know, well, I
don't know where I'm going, at least, so I usually do it just before submitting the paper. And then contributor documentation is probably not applicable for very, very scientific projects, but as soon as you start working on projects with broader audiences, like scikit-learn, matplotlib, or MNE, it's good to have a small list, for a new contributor, of the 10 first steps you need to do to get started.

So, if you're a new contributor and you're starting out by writing documentation, the first thing I would recommend is to familiarize yourself with the project. This is actually how I started on scikit-learn. I wanted to learn more about machine learning, and my brother, who's also involved in scikit-learn, told me: don't go into one of these fancy machine learning books, you won't understand them. Take scikit-learn's documentation, and if you don't understand something, find out what the function does, what the topic is, and then create a pull request. I find this is a really good way to start on a project, and it can be as simple as: there's a typo, there's an English mistake, a sentence is not written properly.

The second thing would be to improve docstrings. I usually do this when I'm working on a project and the documentation of a function is unclear or could benefit from an example. Then it's very easy to do, so I just clone the project, type up a small docstring, and port it back.

Examples take longer, but they're also super important. This also happened to me recently on scikit-learn: I was looking at how to do feature selection and I found that the example was very, very unclear. I haven't actually done the work yet, but I've discussed this with another core developer and I've written a ticket for someone to do the job.

[In response to a question] I wouldn't necessarily add this to the documentation itself. We have this on tickets, at least on scikit-learn and matplotlib: people ask questions, and then it often leads to another person answering, and then at some point the
person who opened the ticket can port this back into the documentation. Yeah, but it is also the role of a core contributor to guide the user towards writing documentation.

And actually this is something I'm going to show right now. If I go to matplotlib's GitHub page, in the issue tracker there are a bunch of tickets that are labeled "documentation", and you can filter for this. There are some that are marked as difficulty easy, new-contributor friendly. I haven't looked at this one, but, yeah, this is typically a user who read our tutorial, tried something that is mentioned in the tutorial, and it didn't work. So that user just opened a ticket asking a question, and then one of the maintainers is saying that the tutorial has been wrong for a very long time. This is typically what you want to avoid. But I'm pretty sure at some point, no, not yet, at some point someone will say, "would you feel like fixing this and making our tutorial better?" This is also something that happens quite often.

Now, as a project maintainer, you probably want to get users to contribute documentation, so here are a couple of steps. First, you need to think about your contribution workflow. Make it easy for users to contribute, because if it takes too long, people are just going to give up. A couple of things to think about: can users install your package easily? Is the package publicly available? Can users contribute easily? That sometimes means using a tool like GitHub that some people won't feel comfortable with. I see Dimitri frowning because he's not a GitHub user, but in practice it makes porting back patches extremely easy, and a lot of people are familiar with it.

Make sure your users can get started, so write a very brief tutorial. It doesn't have to be good, but if you have a tutorial it's at least something that people can improve. Write a couple of examples, same thing. If you can identify pain points in your documentation, open tickets about them and label them
clearly as documentation. Write a short guide on how to contribute. And last, don't be too picky about documentation patches. This is important: if someone proposes a patch that improves your documentation, it doesn't have to be perfect to be ported into the project. Documentation is something that is frustrating to do, and you want to keep the users who are writing documentation. So accept the patch, say thank you, write down what can be improved, open another ticket, and hopefully someone will pick it up. I think this is one of the important points, because documentation is also one of the things that is very easy to criticize, so you can spend a lot of time on a pull request that should be merged very quickly.

Now, let's get started. The Docathon tracks documentation-related tickets. If you have your own project, then you probably registered it already; if not, you can still register it on the website. I'm going there now. To register, you can fill in this registration form. There is a list of projects that have registered, so if you don't have a project to work on you can pick one, and we can help you get started on one of these projects.

And then, if your project is hosted on GitHub (there are also a bunch of them which are hosted on Bitbucket, and unfortunately we cannot access that API yet), and there is anything related to documentation in the commit message, we will identify this and display it on our website. Right now pandas is leading in documentation commits, with matplotlib shortly behind, so we'll see how it evolves. And same thing here: if you want to understand exactly what you have to do in your commit message for it to be taken into account, we can give you the details. Is this documented somewhere on our website? I think [inaudible]. Okay, we'll try to patch something up this week to take this into account.

So, do you have any questions? A lot of the things I
presented here are going to be explained in more detail by other people. I can take a couple of questions. This is the end.

Okay, so as Nelle said, the first talk was a high-level overview of the world of documentation: what works, what doesn't work, and what options are out there. The rest of the talks today are going to cover some specific tools that exist in the ecosystem that we have for building documentation. I think Nelle mentioned things like Sphinx Gallery and R Markdown. These are specific tools that you can use to actually get things done with respect to documentation, and we're just going to cover short examples of how these things work.

The first thing that I want to mention really quickly, one thing that Nelle didn't show on the projects page, is that each project that has signed up for the Docathon has its own page, and on that page we're showing a list of open issues that don't have pull requests associated with them and that are somewhat related to documentation. So if you're looking for some of those easy-to-contribute sorts of things, check the Docathon page for one of the projects and you might be able to get some inspiration and find some easy things to take care of. We're going to update this probably a couple of times each day to see how the activity and the contributions evolve over time.
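As an illustration of the API documentation described in the first talk, here is a sketch of a docstring laid out with the section headings used by the NumPy docstring convention, and read back through `inspect.getdoc`, which returns the same text that `help` displays. Everything named here is an assumption for illustration only: the function `log2_counts_per_million`, its `prior` parameter, and the helper `docstring_sections` are hypothetical, and the formula is a simplification that is not claimed to match any real package.

```python
import inspect
import math


def log2_counts_per_million(counts, prior=0.5):
    """Convert raw counts to log2 counts per million.

    Hypothetical example function, not the API of any real package.

    Parameters
    ----------
    counts : list of int
        Raw counts for one sample.
    prior : float, default 0.5
        Small positive value added to each count so that zero counts
        do not lead to taking the logarithm of zero.

    Returns
    -------
    list of float
        One log2 counts-per-million value per input count.

    Raises
    ------
    ValueError
        If ``counts`` is empty.
    """
    if not counts:
        raise ValueError("counts must not be empty")
    shifted = [c + prior for c in counts]
    total = sum(shifted)
    return [math.log2(1_000_000 * s / total) for s in shifted]


def docstring_sections(func):
    """Return the section titles of a NumPy-style docstring, in order."""
    lines = inspect.getdoc(func).splitlines()
    # A section title is a line immediately followed by a line of dashes.
    return [
        title
        for title, underline in zip(lines, lines[1:])
        if underline and set(underline) == {"-"}
    ]


if __name__ == "__main__":
    # Same text that help(log2_counts_per_million) would show.
    print(inspect.getdoc(log2_counts_per_million))
    print(docstring_sections(log2_counts_per_million))
```

The point of the layout is that a reader calling `help` in a terminal gets what the function does, its arguments with types and defaults, what it returns, and which exceptions it raises, without opening a browser.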
So one of the most important things about the modern incarnation of documentation is this auto-generation of relatively well rendered, beautiful HTML. That's really nice, because I think it's easy to underestimate how important it is to have a nice-looking, user-facing side of your project. I know a lot of people, for better or for worse, judge the quality of a project based on how tightly put together the documentation seems, how complete the examples are, and things like that. One of the best tools that we have for doing this is a platform called Sphinx.

So what we've done, for this session of the Docathon as well as for a special session of The Hacker Within that we're going to hold tomorrow, is put together a GitHub repository that contains a template Sphinx implementation. We can go right here. This is designed so that anybody can clone this repository to their own computer and should be able to just immediately build a semi-functional Sphinx website and set of documentation, and there are some instructions on how to do that here as well. But the idea is really for this to be a template, so that you can get this on your own computer, see what the basic bare components are, and then start playing around with it for whatever purpose you might have.

[In response to a question] Yeah, I'm going to be taking a Python-centric approach to it. That's just because it's my world, but it doesn't have to be Python. There are some tools more specific to Python that we're going to cover, but then there are also some tools specific to R that we're going to cover as well, and you'll see some of the files that we can use to build Sphinx sites here. So I just wanted to let you know about this in case you want to follow along here. I'm going to be going kind of quick, but tomorrow we're going to be having an hour-and-a-half-long session of the
Hacker Within. The goal of that is going to be to start with nothing, so start with a standard package, like a few Python scripts and stuff like that, and then by the end of it to have everybody deploy their own auto-generating Sphinx implementation, built on their own computers. We'll take a little more time in that session to help people who are having trouble getting it up and running.

We've also created another sister repository that contains a collection of both Jupyter notebooks and Markdown files, and that is meant to be more of a conceptual guide through some of this material. "Sphinx template" is the name of the actual Sphinx repository; the user guide is called "zero to docs", right here. I'll make sure that those two link to one another so it's easy to find them. You can go through the guides here just using the README file if you want to, and step through the process of what it looks like to create your own Sphinx website. That's actually what I'm going to do right here, really quickly.

As I mentioned before, Sphinx is a platform for using structured text to auto-generate documentation. It basically says: as long as you can include a collection of text files that say things like which things to add to the table of contents and what semantic, conceptual information to put on pages, and as long as you structure that text in a particular way, using a language that in our case is called reStructuredText, it will know how to take those files, stitch them into HTML, and then render it in whatever visual way you would like. You have a lot of the flexibility of any kind of CSS and HTML setup.

The Sphinx website itself is a pretty good place to start, and they have a lot of good information for really quickly getting started, a lot of which I'm going to cover here. Another link that I've included in this tutorial is a link to more of
sort of a cheat sheet. There are any number of Python and other language tutorials on how to get this working, but this is an example of some combination of both Sphinx and reStructuredText. It tells you how you can structure the text inside your files and how that will then render when it's actually put onto a web page. I just threw those links in there in case they're useful for you. As you can see, you can do all kinds of complex stuff. The question that a lot of people always have is: why don't you use Markdown for this kind of thing? It is possible to use Markdown for a lot of documentation, but in my experience, anyway, you hit a ceiling, where you can't do as many things in Markdown as you could using reStructuredText. Markdown has a lot of benefits, I think, but for a full-fledged documentation approach it's common and useful to use a more powerful, flexible language like reStructuredText, and this is just a collection of lots of examples of how you can do things with reStructuredText. So what I'm going to do is literally just clone the GitHub repository that I just showed you, right here, and go through some of the first steps toward setting that repository up on your own computer. I'm basically just going to say: take that and dump it into a folder that's right next to the folder I'm in right now. As I mentioned before, that's a fully functional working repository, but since we're going to do this from scratch, I'm just going to obliterate everything that's inside the docs folder, because that's something that Sphinx is actually going to generate for us. This is just to make sure that you actually have Sphinx installed. Now what we can do is look inside this folder that we just cloned. If we look here, we have the basic components of a lot of Python packages, and it would look very similar if you had a package in another language. We have a folder
that's got a couple of examples in it. We have this my package folder, which is what actually contains the code that represents our package. We have some kind of a README file, and then we have this docs folder right there. The package folder basically just has a couple of Python files inside; they just define some functions that we're going to be using later on. We also have a tests folder here. We're not going to cover it here, but if you wanted to integrate some sort of testing of your documentation, and your package more generally, you can do that by specifying these kinds of folders. Inside examples we basically just have two quick little examples. One of them plots a scatter plot, and the other is basically a demonstration of the trapezoid rule, which, for those of you who haven't taken high school calculus in a while, is just a way to calculate the area underneath a curve by using a bunch of trapezoids. So this is the bare guts of what a repository might look like without any kind of documentation associated with it. You could argue that those examples themselves are a form of documentation, and maybe the README file is a form of documentation, but what we're really going for is some kind of fully rendered, web-friendly, Google-search-friendly version of this kind of stuff. As I mentioned before, it's quite easy to get Sphinx up and running for these kinds of websites. The easiest way to do this is to use a command that's packaged along with Sphinx called sphinx-quickstart. Basically, all you need to do is move into the directory where you want your documentation to build, run sphinx-quickstart, answer a couple of quick questions, and after that you'll get the template that you need in order to create this documentation. So that's what I'm going to do right now. I'm going to go into my Sphinx template repository; this is what I just cloned. I'm going to go into docs; there's nothing inside of here. I'm going to run Sphinx
quickstart, and now it basically just asks me a couple of questions to try to figure out how it should set up this repository. For most of these you just need to give the default answer, so you can just press Enter and go through. I'm going to say yes when it asks me whether I want to create separate folders for building and sourcing my website. Basically, the source folder is where you're going to put all of the configuration files and the raw text that is going to be generated into your website, and the build folder is where Sphinx is going to dump all of the HTML files that it actually creates. So by putting yes here, we're just saying to create two separate folders for those two things. All of the other stuff is things like naming your package, what's your name, project versions, stuff like that. I'm not going to go into these here. I'm just going to talk about some of the extensions; as for the rest, we can chat about them afterwards in one of the working groups or something like that. And then it's going to ask about a couple of default extensions to be enabled. One of the really nice things about Sphinx is that, because it's so widely used, people have written a lot of extensions that extend its ability to do different kinds of things. There are a few packaged along with Sphinx, like one that will automatically insert docstrings from modules. We're not going to worry about any of this other stuff. We are going to include the MathJax extension, which allows us to render mathematical LaTeX-style formulas in our documentation, and then we can also tell Sphinx that we're going to be hosting this on GitHub Pages, which is a way of freely hosting websites using GitHub. It's kind of meant for projects along these lines, and Sphinx can generate the configuration files needed to make that easy. We'll create a Makefile to make it easy for us to build our website from scratch, and then this Windows command file is not
something that we're going to use, but it's useful if you have a Windows machine. So we did all of that, and Sphinx just told us: hey, I just created a website for you. If we look inside the docs folder, we can see that we now have these extra files and folders that just got created. We have the Makefile, which I just mentioned, we have this build folder, and we have a source folder, and that's listed out right here. Basically, that Makefile contains instructions for how Sphinx should go about actually generating your website. It allows you to use a single line to recreate your documentation from scratch, and it's just a handy way to get things done. The build folder, as I mentioned before, is where all of your HTML is going to be dumped after building the website, and the source folder is where all of our raw text files are going to go. If we look inside the source folder, we'll see two things that are of real interest. Static and templates just contain some configuration files for the HTML in your website; it's how you can customize it to look the way you would like, but we're not going to go into too much detail on that here. The two things of importance, though, are these. First, this conf.py file: this is a configuration file that Sphinx is going to use to generate your website. This is where you can, for example, define new extensions that you want to include in Sphinx, or point it to certain folders in your repository so that Sphinx knows about, say, Python scripts that might exist in those folders. And then index.rst is basically the landing page for this documentation. You may note that rst is short for reStructuredText, so this is a file that is meant to be structured in a certain way so that Sphinx knows what it can do with it. We'll take a look at that in a second. The first thing that we're going to need to do is modify our configuration file, and that's basically to allow Sphinx to discover the package itself.
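A minimal sketch of the configuration change just described, assuming the layout used in this walkthrough: conf.py sits in docs/source and the package folder sits at the top of the repository, two levels up. The relative path is an assumption about that layout, not something read from the repository itself, so adjust it to wherever your package lives.

```python
# conf.py (fragment): make the not-yet-installed package importable by Sphinx.
import os
import sys

# Assumed layout: <repo>/docs/source/conf.py, with the package folder in <repo>/.
# Sphinx runs conf.py from the directory that contains it, so relative
# paths start there.
package_root = os.path.abspath(os.path.join("..", ".."))

# Put the repository root at the front of the import path.
sys.path.insert(0, package_root)
```

With the path in place, anything that imports the package while the docs are being built can find it without the package being installed.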
So I'm just going to go inside my docs folder. I'm sorry that's so tiny over there; I don't know how to make that bigger. And I'm going to bring up my configuration file. As you can see, this just has a bunch of default things. It looks pretty similar to something like a bashrc file, or the rc file you might have if you use Matplotlib or just your own Linux rc files. It tells Sphinx things like what kinds of structured text I'm going to use, stuff like that. What we're going to do is uncomment these lines and then include the path to our actual package, and that's just because this package isn't installed on the computer yet. This is what allows Sphinx to know that it exists so that it can generate the documentation. Oh, damn it, the whole thing looks like, oh, OK, yeah. So this is just telling Sphinx where our package lives. The next thing we're going to do is add numpydoc to the list of Sphinx extensions. Right here, one of the first things is a list of the extensions that are activated along with Sphinx, so all I need to do is copy and paste that there, because it's already installed on my computer. Yeah, yeah, yeah. So actually Stefan and I were just talking about this yesterday. We're going to talk about numpydoc here, but yeah, Napoleon is also a really great option, and it seems like there's a chance that things are just going to move in that direction anyway, so that there's not duplication of functionality and stuff like that. But yeah, Napoleon is another totally legitimate, pretty awesome option for building documentation too. A lot of these have been tools that were built up slowly over time on an ad hoc basis, like eight years ago, just as the developers of packages wanted a way to display their content a little more beautifully. Either way, we're going to activate the numpydoc extension by just copying and pasting this in here. This will just install it for you in case it's not already installed on
your computer. So now we've got our configuration file up and ready to go. Let's look inside our index.rst file. I should note that these are just a few quick extra guides on how to write markup, or how to write HTML using reStructuredText. If you look inside our index.rst file, we can see that there's this kind of strange structure there, and this is reStructuredText structure. For example, this is how you write a header. If you use Markdown, then this is similar to writing something like that. So that's in Markdown, and, oh yeah, this is not an rst file anyway, and this is how you would do it in rst. Inside this there are a few things that we should also note. These little refs down here point to particular parts of our Sphinx build. It's kind of like linking to an entity that Sphinx already knows about, so we can just give it by name right here. This structure right here is a very common structure that you see within Sphinx. It's called a directive, and it's basically a way of structuring a kind of arbitrary variable that Sphinx can then do extra stuff with. Here we're defining a table of contents tree, and this is something that's going to go through the pages that we've given it and automatically generate a table of contents with links to them and stuff like that. We can do something similar for hyperlinks or for images, and we'll take a look at that in a little bit. But as I said, we don't really need to change anything else in order to actually generate these docs, so what we'll do is just run this Makefile. I'm just pointing it here to the folder that the Makefile is inside of. We'll run that and see what happens. It thinks for a little bit, it tells us about the different files that it's building, and then it reports that the build has finished. If we now look at what's inside the build folder, we see that there's a
folder called html, and inside of that are a ton of HTML files that have been generated by Sphinx. This is the actual website that presents our documentation. If we just open that index.html file, then we get something like this. Again, this is just automatically generated for us by running sphinx-quickstart and then make html. And if you notice, the structure of this page corresponds to the structure that's inside index.rst right here. Here we've defined indices and tables and references; over here, if we look at the web page, that's where they are, and you can click these and it'll take you to things where you can search through the documentation in your website. Yeah, so if you tell it not to put build and source in separate folders, then the build folder will be called underscore build, and the source files will sit directly in the docs folder next to it. Yeah, OK. And then you can also see how the titles are rendered into headings by using the equals signs. So now that we have our index file, let's add an extra file to this so that we can see what happens once we've started to grow our documentation outwards. Doing this is basically as easy as adding new rst files, so what we'll do is just generate a couple of rst files and see how they render. One we might create is api.rst. This could be, for example, an explanation of the actual API of our functions: what parameters do they take, what are the docstrings, what are the outputs, that kind of thing. We're going to use this automodule directive right here in order to generate that. So I'm just going to copy and paste this into a file, which I'm putting inside the source folder, called api.rst. I'll generate another file that is more of a conceptual, explanatory document, so this is just going to be called overview.rst. This is where you give the high-level idea of what your package is doing, and then I'm also showing here how you can
do things like include links to external pages. You can basically just use this backtick-style syntax here and that'll generate a link for us. So I'm going to do, yeah, OK, so the API one we're now going to delete, but what we're going to, yeah, so here we're going to create overview.rst right there. We're just going to copy and paste that into overview.rst. And now we can have a little bit of fun and create a page with some other kinds of media on it that shows off a little more of what Sphinx can do. We're going to include an image and a link to YouTube, and we'll call it awesomepage.rst. So now we've added two extra pages to our Sphinx source folder: we've added resources.rst and awesomepage.rst. In order to include those in the table of contents, we now just need to add them to our index file. So we go to index, we paste them here, and you need to make sure that they're lined up at the same level as what's above them. Now if we rebuild our website, let's see what happens. Everything worked. It gives us a warning, and that's because one of the images links to an external source. Now if we look inside html, our awesome page has been generated for us, so now we can see what that actually looks like. If we open our index file, let's see, we've now got two things in our contents here. There's overview, and if we click that it will take us to the overview page that we wrote, so we can see the links that we added in there, right here. And then there's the awesome page that we wrote, which has a picture of a cute little cat on it, in case you're not paying attention. So it's really as easy as that: just generating a couple of lines of code, and you can very quickly create these extra components for your documentation that make it look really professional. You can do a lot of different stuff with this to add extra information or to link to other bits of your own documentation. You can make it as
complex as you would like. So this is kind of a bare-bones implementation of Sphinx. A lot of fully functioning, heavily used packages just use Sphinx on its own as their documentation tool; you don't necessarily need to add a bunch of extra features and extensions and stuff. Just one example of this is the requests package, which is incredibly useful and very widely used. You can already see the similarities in font style and stuff like this, because they're basing their deployment on one of the base Sphinx themes. If you look at their GitHub page, you'll see that they have a folder called docs, and inside docs is an index.rst file and a conf.py file and stuff like that. So this is all the same stuff that we just did here. This has been just a general approach to getting started with Sphinx. As I mentioned before, we're going to go into more detail at the Hacker Within tomorrow, and for that you can type along with us and try to actually get this stuff up on your computer. But next, what we're going to talk about is how to incorporate specific syntax for numerical computing inside Sphinx, and that's using an extension called numpydoc, which, as has already been mentioned, is just one of a lot of different options that you might have for docstring generation and stuff like that. That is going to be covered by Stefan, who is walking towards me right now, so I'm going to let him take the mic and do that.
Hi, good morning everyone. If you could note down the two URLs that are on the whiteboard right now, that'd be good. Chris mentioned these already, but the top one, the zero to docs one, contains the instructions for going from essentially nothing to having a fully fledged Sphinx build with NumPy documentation support included. The second one is the repository that shows the result once you follow those steps, what you end up with. So if you clone the second repository, you should just be able to build that and get what
Chris just illustrated. I'll just give you a moment to write those down. So when I learned programming, I learned Turbo Pascal, and the way you did it back then is you had a textbook, a physical reference guide. You would flip through the reference guide and find API examples, or the user guide would have examples of how to program, and you would type over the commands and execute them and see how that works. That worked fine. It was a little bit tedious, but a good way to learn. And then when I learned Python, I thought, well, what a fantastic concept, that you actually package the code and the docstring together. All of a sudden you have this mechanism that can easily display a piece of text that tells you how to use the code, and for the coder it's also beneficial, because you can update the code and the documentation at the same time. For some reason, though, the Python core team doesn't really use this functionality very much, so the Python core documentation is all hosted in external Sphinx-generated HTML. But when, in 2006, we had to write the documentation for the NumPy project, since we had about 580-odd functions to document, we wanted to do it right from the start. So we thought carefully about it and we wrote down a standard that is now known as the NumPy documentation format. You'll see that most scientific projects, NumPy, SciPy, scikit-learn, scikit-image, pandas, all of these packages, use the same format for the docstrings. So if you fire up IPython and you import NumPy and you ask for the docstring for something like np.log, you do that by just typing a question mark afterwards in IPython, and if you press Enter you get a docstring. For all of the NumPy functions we use exactly the same format. It's basically a single sentence that gives you a short summary, then a paragraph that tells you a bit more about that function, and then a list of input parameters, a list of returns along with their data types, and often references to other
functions, some implementation notes, some references, perhaps to literature, and quite often some examples. This you will find for every single NumPy function out there. So it's a wonderful way, when you're exploring code, when you're using NumPy, to immediately have all the information you need right at your fingertips. This is great, but it's also nice to have a reference guide that you can either load up in your browser or print out, so most packages also generate HTML versions. For example, if you go to docs.scipy.org and you click on the NumPy reference guide, let's say we want to know more about the linear algebra functionality, and specifically the Cholesky factorization. Well, there you see again a docstring of exactly the same format: a single sentence, then a paragraph, then the input parameters, the returns, some implementation notes, some examples. So how do you get this kind of documentation to be generated for you in Sphinx? I'll show you the quick steps. Oh, maybe I should also show you, in Python, what the log function looks like. So I import the math module, then math.log, question mark. So, two lines of documentation. I'm in the Sphinx template repository that I told you about earlier on. We've made all the customizations that Chris already showed you. There are two important files to remember, or two sets of files. If you go into the docs directory, you'll see that there's a source directory that contains all the content of your documentation, and then inside the source directory there's also a conf.py. That's the file we edited to enable the numpydoc extension. So let's open that conf.py, and you'll see that there's a list of extensions somewhere in there, and numpydoc is one of those extensions that you have to enable. Then I added a new file called api.rst. That's just going to be the API documentation that we're going to generate, and in there I literally have only those two lines. It says: Sphinx, please automatically
inspect my module, find all the functions, run them through the numpydoc package, and include them in my documentation. Simple as that. So let's go up two directories and find my package. My package is a standard Python package that we wrote just for demonstration purposes. I'm going to open the trapezoid.py file. Inside I have a function called trapzf, and you'll see that its documentation string is just in the standard NumPy docstring format. I could add some more descriptive text here. The notes, so we have parameters, returns, and the notes should probably go in here at the bottom. There we go. OK, so there's the docstring, and we'd like to see that appear in our Sphinx documentation. So we go into the docs directory and we type make html, just like Chris did before. numpydoc is not installed. Install numpydoc, try again. Everything builds, and let's see. It should be in the build html directory. Yep, we've got a bunch of HTML files. Let's open the index file. We have our API reference, and inside you see the function with the signature. You'll see it's, oh, this one. Let me remove the cache; it seems to have stored my previous build. Let's make that again. There we go. So you'll see the standard documentation format from NumPy, just nicely marked up into HTML and automatically included. Yep. So the syntax for NumPy docstrings is just reStructuredText, so you can use it in any package that supports free-format text as a docstring. A lot of languages don't support docstrings that are attached to the code, but you can write these externally and just compile them. Someone asked earlier about Napoleon. Napoleon is an extension included in Sphinx nowadays that can understand the NumPy docstring format. I think the rendering of the numpydoc extension is a little bit better, but you can use whichever you prefer. So, is the question basically: can you use this type of technique with another language? I've never tried to document C++ using Sphinx, but basically you can have external text files
that list your functions with their docstrings separately from your code, and you can compile those into this. Doxygen and C++, yeah, that's a common combination. So yeah, that's it. You can see you can include mathematics, and if I had an example in there, that would have been included here with Python syntax highlighting. But that's the gist of it. So, any questions?
Great, thanks. We're going to get started on round two of the Docathon tutorials. Do this right here. Thanks for sticking around post-coffee. Really. OK, so we're going to start the second half of tutorials. As I mentioned before, these are going to cover three basic topics. The first is going to be a really quick run-through of Sphinx Gallery. The second is going to be a bit of an overview of a lot of the stuff we've been talking about, but with tools that exist in R for building a lot of the same kinds of things. And the third is going to be a more general conversation about how to deploy this documentation using Travis for continuous integration and for auto-generating your documentation. Sphinx Gallery is actually pretty straightforward and simple, so hopefully this won't take too long. The basic idea behind Sphinx Gallery is to assume that, yeah, oh, bigger, to assume that your code is visual in nature, or that your package is visual in nature. This is particularly true for scientific packages in R or Python: it's very common that, whatever our package does, some kind of end product is a visualization of data, or an analysis, or a statistic, or something like this. For this case it's very useful to have these big displays of what those visualizations might look like, and a really great way to do this is with something called Sphinx Gallery. The easiest way to explain what Sphinx Gallery does is to just look at an example, so here is what the examples gallery for MNE-Python looks like. If you just click on the gallery
button right there, you can basically just scroll through this and get a little snippet of an image of what each of these different examples will show, and basically this is what Sphinx Gallery does for you. If I click on one of the images, like this one right here, I can see that the image was created right there, and I'll show you guys how that is done right now. This is also a Jupyter notebook inside that repository, so you can follow along here. If you want to deploy this on your own website, then again, that's probably something we can do with the Hacker Within later. The repo is right there. You got it? And we're going to send out links to all of this stuff afterward as well. OK, so inside that repo this tutorial exists as well, so if you want to follow along that way, that's an option too. As before, Sphinx Gallery is pip-installable, so it's pretty easy to get it up and running immediately. Since we already have a Sphinx deployment that we built in the previous tutorial, it's going to be pretty quick to get this done. The one thing we need to do in order to enable Sphinx Gallery is, again, to add it to our extensions, inside our configuration file. So if we go here, I'll just add that line right there, and now we have access to what this is designed to do. One thing that you guys may have noticed is that our original Python package had this folder called examples. Inside this folder are two Python scripts. Each one of them demonstrates some functionality inside the package, and each one of them generates a single plot. What we'd like to be able to do is generate a gallery that shows off what these plots actually look like, and the way we can do this is by just pointing Sphinx Gallery to that folder. That's really the only thing we need to do. So we're going to add a single dictionary to our configuration file, and what that dictionary does is tell Sphinx Gallery where to
look for examples. So we're just going to point it to that folder with our two Python scripts in it, and this is where it puts the actual gallery that it generates; a common convention is to put it in something called auto examples. Once we enable Sphinx Gallery and add this configuration, the next time we build our website it's going to automatically go through that folder and try to generate this gallery on its own. One thing we need to make sure of is that whenever you point Sphinx Gallery to a folder, that folder needs to have a README file in it. That README file is in reStructuredText format, and its purpose is to give context for what that folder is for, and subsequently what the gallery that's generated is meant to represent. So we can just copy and paste this into our examples folder, which is right here. We'll add a new file and call it README.txt. So now our examples folder has three things in it: the two Python scripts and this README file. Now that I've got that, all I need to do is run my documentation build command just like I did before, and it goes through, and an exception occurred, because that always happens. Let's see: examples does not have a README.txt. But I just created, what's that?
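For reference, the gallery setup described here might look roughly like the following conf.py fragment. The two dictionary keys and the extension name are the ones Sphinx-Gallery documents; the relative path to the examples folder and the other entries in the extensions list are assumptions about this walkthrough's layout.

```python
# conf.py (fragment): enable Sphinx-Gallery and point it at the examples folder.

# Assumed extension list; only the last entry is needed for the gallery itself.
extensions = [
    "sphinx.ext.mathjax",
    "numpydoc",
    "sphinx_gallery.gen_gallery",
]

sphinx_gallery_conf = {
    # Folder holding the example scripts, relative to conf.py (assumed layout:
    # conf.py in docs/source, examples at the top of the repository).
    # This folder must contain a README.txt written in reStructuredText.
    "examples_dirs": "../../examples",
    # Folder, relative to conf.py, where the generated gallery pages are written.
    "gallery_dirs": "auto_examples",
}
```

If the build complains that the examples folder has no README.txt, the first thing to check is that the path above really resolves to the folder you think it does.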
But it's right there. Yeah, maybe. Let me see, where's the, that's supposed to be README.txt. I see, so it seems to be looking inside the wrong folder. Let's see what happens. OK, great. There was just a rogue folder in existence that should not have been. So actually that was a good demonstration, and was completely intentional on my part: if you don't have a README.txt file inside a folder that Sphinx Gallery looks for, it's going to break. That's just something to keep in mind. One thing you'll notice is that we've got this extra output now that we've run the build command for our docs, and what it's showing us is its progress in running all of the Python files that it found inside that folder. So now we can look inside this auto examples output. This is just in build slash html. It created a new folder called auto examples that has the HTML for all of the Python scripts that we included there, and if we open the index that's inside that folder, we get this nicely rendered gallery-style output. Each of these is clickable, and it has some nice interactive features. If I click on this, it'll take me to the actual page for that file. Something that's really cool about this is that it also automatically generates Jupyter notebooks that represent this marked-up file as well, so I can just click that and it will download it to my computer. You may notice that in the scikit-learn documentation you can download any of those examples very quickly as a Jupyter notebook, and that's not because it's written as a Jupyter notebook, but because Sphinx Gallery converts the Python file into a Jupyter notebook as well as into this nicely formatted HTML document. It kind of breaks it down based on the way that you break down your example, and I'll actually show an example of that in a second. So let's see. Yeah, and we just saw what a single example looks like. So this is
basically what Sphinx Gallery is doing when it finds a folder that has Python scripts inside of it: it runs the Python file, it converts it into a Jupyter notebook, and it stores that Jupyter notebook along with the rendered HTML output for that file. What's really cool about this is that, as was mentioned earlier, it's really useful to have this kind of narrative flow to your documentation and your examples, and this is really easy to do when you use something like Sphinx Gallery, because it allows you to embed reStructuredText inline along with your code itself, kind of like how Jupyter notebooks are naturally structured, but here we're doing it with just a text file. So we'll take that trapezoid example and fancy it up a little bit. We're going to copy and paste a few extra lines, one right here. This is basically just defining a section, and inside this section we've got a little bit of contextual information. We have that header that we've seen before in reStructuredText files. We'll do the same thing for the plotting, the first plot that we generated, right here. Not right now, no. It's definitely doing a similar kind of thing. There are particular benefits to having it in a reStructuredText format; for example, it's much easier to diff if things change, so I think this was an intentional design decision by the Sphinx Gallery people. But as you can see, this is basically the same general flow that you would have in a notebook. And then we'll add this final extra plot bit right here. OK, so now that we've added these extra bits, we'll also add one more line. If we look at this example, it generates two different plots, and say we like the second plot a little bit more because it's a little bit prettier. We can pretty easily tell Sphinx Gallery to pick that second plot for its gallery with this commented line right here. So we just say: use the second plot that's generated, Sphinx. And for
whatever reason it's not zero-indexed, I don't know why. But yeah. Yeah, it does. Right now it does, and we'll actually see it in a second. So now I've added just a couple of extra bits of commented code, and if we regenerate the docs, probably it'll break. Nah, it didn't break. Okay, it worked. So now we've regenerated the docs, and let's take a look at what our plot trapezoid page looks like now. Because we've defined these sections inside the text file, it broke up the output that it used to generate this file. So we've got our inputs here, and here's that line that you mentioned, but now we have these natural header sections. We added in some math that got rendered pretty nicely, and now whenever a new plot is generated, it's grouped along with the section that it belongs to. So we have one plot here, then the next section, and the next plot here, which is really nice. Again, it makes it easier to follow a more natural kind of flow in your example. If we downloaded this notebook and then looked at what was rendered, we'd see that the notebook version also has everything split up in the same way that it was in the rendered file. So let's see. Oh yeah, we can also break up our galleries into multiple folders if we want to. One thing that we can do is take this single examples folder and create two folders inside it. So this is what's inside our examples folder right now. We'll just create two new folders called scatter and trapezoid. We'll copy the readme file into each one, because we need to have a readme file in every single folder, and now we'll move our two plot scripts, one into each. Okay, so now if we look inside that folder, it's kind of hard to see, but we've got our examples folder; inside that we have our first readme; inside scatter we have another readme and the Python file; and the same thing for the final folder. And we can just edit these
readmes, maybe. So we'll save that one, and we'll save that one. Now we've got this folder structure that's kind of broken down. If we then remake our documentation and look inside the index file that gets generated (again, this is the base gallery that it generates), we'll see that these two sections automatically get split up here. A lot of packages do things like this: they'll have different conceptual points that their package touches on, and then a collection of examples that are related to each point. It's a good way to structure your information. If you look at, for example, the MNE gallery, you can also do things like generate tables of contents, so you can jump through all of these examples very quickly. Then the last thing that we need to do in order to finish this website is add that auto_examples file to our index, so that the base website knows about its existence. So we'll just go back into our configuration, sorry, into our index file, and add the location of the index file for our Sphinx Gallery here. And now, finally, we can rebuild the docs, and when we look inside that index file, we have our gallery right there. We click it, and it takes us through our Sphinx Gallery. So this is, again, a very simple, bare-bones way of doing this, but it gives you a feeling for how quickly you can go from raw text inside your folder to a nicely rendered HTML version of your documentation or of your website. You can do a lot of other cool stuff with Sphinx Gallery, and there are different models that people have played around with for how to incorporate it into their website. Some packages, like MNE, actually generate full tutorials and use this for teaching students and so on. As an example, this is a file where they basically just dumped the whole script into a Python file, and it will just have it all right there, and it
will make the plot at the end, which isn't particularly useful. But if you go to their tutorials page, you can see a much more fully fleshed out collection of information. Here's a tutorial that they have on filtering your data, and you can see it's broken down into multiple sections, and it iteratively goes through the process of filtering a signal and manipulating it in different ways, which is really useful to do using something like Sphinx Gallery and this kind of narrative flow. Okay, so that's all I've got for Sphinx Gallery. Again, you can look at this repository as a way to get some inspiration for how you can incorporate this kind of thing into your own documentation. If you have other questions about setting this kind of stuff up, you can either stick around for one of the working group sessions in the afternoon this week, or come to The Hacker Within, where Matthias and I are going to be going through this a little bit more slowly. So without any further ado, up next is going to be Rebecca (that's you), who's going to talk a little bit about using some of these tools in the context of R.
Thanks. Okay, cool. So I'm going to talk to you guys about documentation in R. Everything so far has been very Python-centric. But first I'll introduce myself. Hi, I'm Rebecca. I'm one of the fellows here at BIDS, and I'm also a grad student in the statistics department, which means that R is my life. Can I first get a show of hands quickly: how many of you use R? Oh yay, people use R. I was worried everyone was going to be like, Python. Okay, great. So how many of you have made a package in R? That's great, some of you have. So basically, what I am going to do is go through and make a really simple package in R (it's actually pretty easy to do), and then I'm going to document it and write a vignette for it. A lot of the tools that we use in R are very similar to what we've seen for Python, like docstrings. There's kind
of a version of that for R. There isn't really a version of Sphinx, but it looks like you can use Sphinx for R, which is kind of cool. Yeah, and then I'll write a vignette, and I'll show you guys an example of bookdown, which is a really cool way of writing, essentially, books for documentation. It's not really designed for documentation, it's actually designed for writing books, but it turns out it's pretty useful for documentation too. So if you have used R before, and you know about a library but you are not entirely sure how to use its functions, typically what you'll do is load in the library. Let's say your friend says, oh, you're using dates, what you should be doing is using the lubridate package. Well, you would call library on lubridate to load in the library, and then suppose that you know that there is a function called ymd. Obviously this is going to stand for year, month, day. Maybe not obviously. Yeah, and this kind of help file is going to come up in one of your panels. So this is RStudio; if you've ever used R you're probably familiar with it, and if not, don't worry too much. But yeah, it's kind of like the docstrings that we saw for Python, right? It gives you a description of what the function does. In this case, ymd is a function from the lubridate package which lets you parse dates in the order of year, month, and day. So that seems pretty straightforward. It also shows you how to use it. This always lists all of the possible arguments you can have; if you have a function that has a huge number of arguments, this isn't going to be super useful. Then it gives you a description of what all of these arguments do, followed by some more details and some examples. This is pretty standard: if you've ever done question mark of any function, you'll get a help file that looks pretty much like this. Another way of figuring out how to use a function is to look at a vignette. The idea is, suppose that you don't even know what functions
are in your package. You can go to... so this is CRAN. All R packages live in CRAN, this kind of gigantic database of packages, and they all have these files. If you just go to the CRAN website and type in the name of your package, you end up with one of these files, which has a huge amount of information, most of which is not very useful. For example, this reference manual lists all of the functions in alphabetical order, which is useful if you know what function you want, but if you don't know what function you want, it's kind of useless. So instead, people often write a vignette. People don't always write a vignette, but it's really nice when they do. It's basically just a longer version of a tutorial. It shows you how to load in the library, it gives you a lot of examples, and it uses words instead of just code, essentially. So I am going to show you guys how you can write your own package and document it. It's pretty simple. The main package you're going to use is a package called devtools, written by Hadley Wickham and a bunch of people over at RStudio, like everything else that is useful in R. So I'm going to load in devtools, and I'm just going to write a new package. Basically, I'm just going to do create, and I'm going to call it math package. Okay, it did a bunch of stuff. Cool. What did it do? I'm not going to read any of this. All I am going to do is go to my desktop, which is where my working directory is, and see that, oh, I have a folder up here called math package. So basically, what that function did was create a new package for me. You don't have to do anything yourself; it does everything for you. It has a bunch of stuff in it, most of which is empty. I'm only going to really focus on this R folder. It's kind of hard to see, but basically every function that you write is going to go into this R folder. So I'm going to write a few functions. I'm going to start a new R script, and I'm going to write a function called add. Can
anyone guess what this function is going to do? Obviously subtract. I don't know. So I'm just going to return x plus y. Super simple. I'm going to save it inside this R folder, and I can call it whatever I want. I'm going to call it add.R. I could call it apple.R; it doesn't really matter. And basically, I now have a package that has one function in it, which is very useful. What I can do is load_all (oops, I'm in the wrong... load_all), and that's just going to load my package. So now I can do add of 5 and 7 and I get 12. It now knows my add function, just by using this load_all function from the devtools package. But if I do question mark add, I do get something, because apparently there is already something that involves add. So I'm going to call it add function. Okay, now I can load_all, and it will know what add function is. All right, it gives me 12, that's good. And if I do question mark add function, it doesn't know anything; it says there's no documentation for it. So what I can do super quickly is add documentation using roxygen. I'm pretty much just writing comments, but it's a special version of comments: in R, a hash sign starts a comment, but roxygen comments are a hash followed by an apostrophe. So I'm going to say, this is a function for adding two numbers together. Okay, and I'm going to say the parameters are x, which is a numeric value, and y, which is a numeric value, and it returns... value? I can't remember what the tag is. These are all of the possible things I could have. Maybe it's just return, how about that: the sum of x and y. Okay, so all I've done is add these comments above my function, and what I can now use is the document function. All that's going to do is look through all of my files in that R folder and say, oh, do I see any roxygen comments? And if it sees them, it's going to make a help file. So now if I look up add function, it has my help file, which is really useful. Okay, what it does is it adds two numbers together, this is
the usage, it has two variables, and it has a value, and the value it returns is the sum of x and y. So that's pretty much like a docstring, right? It's pretty much what we saw earlier. You literally just add the comments in a specific format above your function and there you go, you have your documentation. But again, this kind of documentation is really great if you know exactly what you're looking for. If you don't know what you're looking for, what you want to do is write a vignette. That's entirely separate from this docstring-style documentation. So what I'm going to do is write a vignette that is going to be called math, because why not. Okay, so what that did is create this R Markdown file, which is essentially a Markdown file in which you can write R code. So let's say I wanted to compile this particular vignette. It is obviously not specific to my package, it's just a template, but it looks really pretty, and it's got headings, R code, other stuff, pictures, you know. So it's pretty cool. All I'm going to do is change it for my own package. So: math package vignette, author is me. Math package (why do I always do that?) is a super cool package for adding numbers together. Here is an example: add. So first I have to do library. If this was a package that was already installed from CRAN, for example, I could just do library of the package. But this package doesn't actually exist anywhere other than my specific workspace, so it won't be able to find it when I compile. So I'm just going to use the devtools load_all function to load it into my workspace, and then I will be able to do add function of 5 and 6, and show that add function of 6 and 7 is the same as add function of 7 and 6. Devtools... devtools is a pretty cool package. Okay, cool. So yeah, this is my vignette. By the way, it lives in this file in this vignettes folder that was created when I used that use_vignette function. And yeah, it
ran all of my code, it printed the output, and it shows that adding 6 and 7 is the same as adding 7 and 6. Who would have thunk it? And yeah, it's pretty simple stuff, but it's really, really useful if you have a package that has a lot of functions or anything like that. You want to be able to tell people how to use it; rather than just saying, this is how you use this specific function, you want to give people more information about the package itself. So that's really cool. Vignettes are great if you want to show how to use a specific piece of functionality. But what if you have a package that has a huge number of things that you can do, and if you tried to fit everything in a single HTML file your screen would explode? Then what you can do is actually write a book about it. It's kind of like the Sphinx Gallery and all of that kind of stuff. This is not necessarily something that people do as common practice yet, but I think it's getting there. bookdown is a really new package for creating GitBooks, essentially. So this is one that I wrote for my package that makes super cool heatmaps. Basically, it's just really easy to write your own book. The first page is always how you install and load the package, and then I have a bunch of chapters: you can cluster and add dendrograms, and it's got a huge number of different things that you can do, which is kind of awesome. It's just really useful, instead of having to find all these separate vignette files, to be able to go to this kind of vignette book, essentially. I think Nelle showed an example earlier of the caret package. That's where I got the idea to do this for mine, and I just thought it was super cool. So how do you make one of these?
It's pretty simple. It's not quite finished yet, I think. At the moment, what you literally do is clone a GitHub repository that has an example and then just edit it yourself. So that GitHub repository is here. I'll show you guys first, in case you are curious. It's basically all of these RStudio people who do some amazing things for R, and they just have this repo that has an example, and you clone it. So that is bookdown-demo, from the rstudio user. So, I've lost my thing. Okay, so I am just going to go into my thing. Okay, where am I? Desktop. Sorry, this is super tiny. I've already forgotten what I called my package. Clone this into my package folder, and now I have a folder that's called bookdown-demo. So that's right here, and inside is a bunch of stuff. So what is that stuff? That's a great question. Basically, each of the chapters lives in an R Markdown file. The book itself hasn't been compiled yet, but that is very easy to do. So I'm going to go into the directory, and I'm basically going to compile it. First I need to load the... let me get rid of all this stuff. So: library, bookdown. And it's called, I think it's render_book. Render, and you give it index.Rmd, because that's where everything in the book kind of lives, and I want to render a gitbook. And it does all this stuff, which is super cool, and what I now have is this _book folder, and I have index.html, and that's where the book lives. So it's that easy to make your own book, and then you can literally just edit everything as you want it. Yeah, so bookdown is a specific R package, so it is really well integrated with RStudio. As for how well you could use it if you're using R but not RStudio: you could definitely use it. All it really did was make that bookdown-demo folder, and you can edit that and it compiles it. Yeah, exactly. So if you want to be able to compile the R Markdown scripts without rendering the entire book, you do kind of need RStudio. Can you knit files outside of RStudio? There's a function for it,
right? I don't know. But basically, I haven't done anything in here that is actually specific to RStudio, because to render the book I didn't click any button. I used this render_book function, and that would just render everything. You don't actually need to be in RStudio to do any of that. Yeah, so that is my quick tutorial on how to write documentation in R. If anyone has any questions, maybe ask me either later or now, I don't know. We're probably running a bit late, so we'll move on to Matthias now. You have a question? Yes, great question. So literally, as I was sitting there listening to other people's presentations, I made a quick Git repo that contains the package and the tutorial that I was following, with all of the description. So that is under my name, rlbarter, and documentation. Cool. Daniela, was that your question too? Yeah, they're all in here. They're all in here, yeah. Cool. All right, thanks for listening, guys. Nobody heard you on the video, so it's fine. We have to resay everything.
I want to thank everyone for being here. It's getting late, and I'm sure everybody is hungry, so I'm going to try to go a bit quicker than I planned so that we can get lunch. I'm going to talk about continuous integration and documentation using Travis CI. I'm taking the example of Travis CI to keep it simple. You can do that with other continuous integration tools, but in the Python community at least we use Travis a lot. I'm working here mostly on Python projects. And I have a really terrible French accent, so I'm sorry if you don't understand what I'm saying. I can repeat: just raise your hand and say, okay, we don't understand what you're saying, you might be speaking French, and I will try to repeat in English. So, Travis. I tend to really anthropomorphize Travis, and if you see conversations on GitHub, sometimes we refer to Travis: Travis is happy, Travis is sad, Travis is angry. Its mascot is actually a person, so it's really easy to think about it as
someone who helps you. It's not actually someone behind a computer who is running thousands of jobs every day, but it really feels like that. So what is Travis CI? CI stands for continuous integration. It took me a couple of years to realize that, so that's why it's the first point I put here. Usually, projects think about Travis as something, someone, who can run unit tests in isolation and mostly complain because your tests fail. You know that when you're on a PR, because you get a red check mark, which means: nope, you can't merge that, it's going to break everything, everybody will complain, it will pursue you to the ends of the earth with a chainsaw. Which you don't want. But Travis can run almost arbitrary code, within some limitations: you probably don't want to mine bitcoins or send spam, and you have a limit on the time your jobs can run. And so let's use it for documentation. So why for documentation? We already had a lot of presentations about documentation. Documentation is never complete, because most of the people who write software projects don't like to write documentation; they prefer to write code. The state of the project always evolves, you can always find things that could be better, and you always have to find a balance between the compactness of your documentation and the extensiveness of everything you want to write. Your project is likely already set up on Travis for tests, to make sure that all the tests are passing, and you're probably already running Sphinx or R Markdown or other things on it, to be sure your documentation doesn't have broken links and still has all your functions. And sometimes you might want to use another service to build your documentation, but when your project grows, it's one more set of credentials to keep, and you have to remind people to go to this website, update the branches, et cetera. And so your documentation gets out of sync with your project. So it's painful. And so: we already have a service that builds the
documentation to check something; we already have a service that has all the dependencies, which might be really difficult to keep in sync with your project, and that makes sure everything builds. So let's just build the documentation there and upload it somewhere. So how does Travis work? For those of you who have never seen Travis: you basically drop a .travis.yml file at the root of your repository, and you give it some information on which language you are going to write things in. Mostly I'm writing in Python, so I put Python, and which version. You can actually have several versions, and if you have several versions of your dependencies, Travis will have what is called a matrix, and it will test every combination of things. If you have a project for which you want to build the documentation, you can, for example, pip install Sphinx and numpydoc, or Napoleon, which we have seen earlier. And in the script part here, I'm just building the docs, so I'm going to the documentation folder and just making the HTML. If I have mistakes, if an RST file is not correct because there is a missing backtick, the build will fail. And if a contributor makes a PR, they don't have to go through you. You don't have to tell them, oh, your build is broken, you have to add a backtick here. Travis will already say, oh, the build is not passing; if you have a good error message, they will fix things, and at some point the PR passes, which means you know that if you merge, your documentation will still be correct. And now, how do we deploy? Usually what happens is you land a commit on your GitHub repository, it goes on Travis, and Travis builds the docs. If the docs are correctly built and you merge the PR, the commit lands on master and the same thing re-triggers. Now you need to find a way to push this documentation to GitHub Pages, and usually you want to do it only for a couple of branches: master, which is always in development, because you want people to see the latest docs of your project and to be able to bring fixes while you're developing,
and you can also say that you want to deploy the documentation from a couple of other branches; usually those have a version number that ends with an x. The tricky part is to give Travis credentials to push back to GitHub. Well, obviously you don't want to put your password in plain text in a file, because otherwise someone will scrape GitHub, find your repository, and push commits that overwrite your project and do some crazy things. And if you ever commit these things as plain text and push them, even if you overwrite the Git history, please change your password, because there is always a way to find it again. A little-known feature of Travis is travis encrypt. Here, travis is a local program you can install on your computer, made by the people from the Travis CI company, which can encrypt either an environment variable or a file. The thing that we want to do in our case is to encrypt an SSH key (or actually something else, but it doesn't matter) that has credentials. This is meant to be decrypted only when a commit lands on the repository, which means that if someone sends a PR, your credentials won't be decrypted; they will only be decrypted once the PR is merged. Pro tip: if you do that, don't echo your decrypted variable, because otherwise someone can just look at your Travis log, figure out your SSH key, and then push things back to GitHub. So that's kind of hard to do if you are not really familiar with how SSH works, but you can do it manually if you really want. Here I'm going to show how to do it for a Python project. If you have Ruby installed on your machine, you can do that by hand with gem install travis and so on and so forth, but it's quite complicated to wrap your head around. What you want instead is basically a simple project that will, on your machine, create credentials to push only to GitHub Pages once you've built your documentation, and encrypt the right part of the SSH keys. You commit them to your repository, and now
what's happening is that every time something lands on master in your repository, it will be automatically built by Travis. Travis will look at the docs and push them online. For more hands-on practice, if you want to participate, please come tomorrow. We will have an hour and a half, and we will go back through all the things we did this morning: we will create a repo with a Sphinx Python project, build the Sphinx docs, push that to Travis, and have Travis build and push them. And if you don't want to use the Travis command line, there is a relatively recent project called doctr, for documentation Travis, which basically does all of that for you. You just have a few commands to type and it will automatically infer what it has to do. It will set up credentials and tell you what to copy and paste into .travis.yml. That's the hardest part: you really just have to copy and paste. And it will find the documentation folder for you when it's on Travis and push to GitHub. So how do you do that? You need to pip install doctr once on your machine. Then, at the root of your repository, you run doctr configure. It will ask you questions like: which repository do you want to build the docs for, what is your username on GitHub, what is your password, what is your two-factor authentication token if you have one. It will not store them, but it will get a unique key that it will encrypt, and give you instructions: please commit this file, please add that to your .travis.yml. So here is one example, for the conch repository, where we copied and pasted that, and now every time something lands on conch master, it will automatically be deployed and push a commit. And now your documentation will automatically deploy to gh-pages, and you just need to head to your organization or your username, .github.io, slash your project, slash docs (in brackets; by default it puts things in the docs folder, but you can configure that). It's already late, so I might not do a demo. I can just show you what it looks like. So here, for example, if I want to do it for a
Sphinx template, I will have to activate the repository on Travis. This is my fork of Chris's repo that he showed you earlier. And if I were to just run doctr configure on my machine (I might be able to zoom in), I would type my username, my secret password that I won't tell you, and my two-factor authentication code, which is always 1, 2, 3, 4, 5, 6. Then I have to type where I want to deploy that. So that's the wrong one; I did that yesterday and the repo had another name. So: what's the repository, and where do I want to deploy it, since you can deploy to a different location if you have a complex setup. Then it will just print that for you, which is a bit complicated, and you just want to forget what this is. And now it tells you: commit this file, add this to your .travis.yml, and copy and paste that at the end of your .travis.yml. And that's all you have to do. Then you push to GitHub, and your documentation will auto-build and be deployed to GitHub Pages, and you can go there and it works. If you want more detail, you can come tomorrow; we will be happy to dive into that. And it should work with R, with Python, with Julia, with everything. I'll be happy to take questions. So the question was, why do you use Travis and not Jenkins? Because Travis has a deeper integration with GitHub, and because Travis is hosted, which means that we don't have to maintain it, which is really great. You basically just log in with GitHub, activate your repository, and it works. And all the configuration of Travis is in the .travis.yml file, which means that someone who is completely external to your organization can say, hey, you actually have this option that makes Travis twice as fast if you use the containers. They just have to make a PR. And so you don't have to maintain your own Jenkins, where you have to maintain your own instance. There are limitations in Travis, of course, because you can only run a build for 20 minutes, if I remember correctly. But for most open source projects it's super easy to set up, and when someone forks your
repository, they can just check the box to activate Travis on their fork, and then it will run the test suite not only when they make a PR but also when they push to their repository. That means that if they are scared to send a PR that will fail, they can just push to their repository until the build passes, then make a PR, and they are less scared. So, what are the pros and cons of this approach versus Read the Docs? Read the Docs is really nice as long as your project doesn't have compiled dependencies and you don't need a really specific way of building your documentation. As soon as you have a dependency on matplotlib, or you need to compile something, you're already doing all the work of configuring that on Travis in your .travis.yml file, and you already have Qt and things like that. You are already building your docs there, so why not just use it? The other problem we had with Read the Docs is that you have to set up a second set of credentials, and we regularly get, hey, I don't have access to this repository on Read the Docs, can you make me admin just so that I can experiment? And it's becoming annoying. The drawback of this approach of using Travis is that some things that Read the Docs does are not yet completely possible with it. For example, when you build documentation, you want to build it in subfolders for each version of your project. That is something you currently have to do manually; you have to install a Sphinx extension to do that and store your docs in subprojects. The other thing that you cannot do is set some headers for Google and other search engines to say which is the canonical page. You've probably seen that if you Google something in the docs of pandas, you will always get a different version of pandas for each result, because pandas doesn't have this rel canonical at the top. This is something we are working on: which extension in Sphinx you have
to install. So right now there is this balance. The project is still relatively young, and not super easy to set up in some specific cases, but it removes a lot of the pain points of Read the Docs for much of the day-to-day maintenance of a project.
So, as an example: on the Docathon website, for a while we had Travis set up so that we were embedding our GitHub API key in there, encrypted, and we had a bunch of scripts that would basically query GitHub's API and pull all the commit activity for people, make a bunch of plots, deploy those plots to Markdown, and then generate the actual website using Pelican. So that was all completely automated, and one of the nice things about it is that we didn't have to embed a lot of extra files like PNGs in the Git repository itself, so it saved a lot of space. It was also just one less thing that we had to remember to call before pushing to the GitHub repository. It just made it pretty easy. The only problem was that we maxed out our GitHub API quota after about four hours or something. But food is there, so we can just wrap up. We'll stop the YouTube recording, and we'll see you this afternoon?
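The commit-activity scripts mentioned above are not shown in the session. As a rough sketch of the tallying step only, the snippet below counts commits per author and per day. The `SAMPLE_COMMITS` records are hypothetical, hand-written data that imitate the nested `commit` / `author` shape of GitHub's "list commits" REST response, and the function names are made up for illustration. It makes no network calls and is not the actual Docathon code.

```python
from collections import Counter

# Hypothetical records, hand-written to mimic the nested shape of the
# GitHub "list commits" response (commit -> author -> name / date).
SAMPLE_COMMITS = [
    {"commit": {"author": {"name": "alice", "date": "2017-03-06T18:02:11Z"},
                "message": "DOC: fix typo in README"}},
    {"commit": {"author": {"name": "bob", "date": "2017-03-06T19:40:00Z"},
                "message": "Add gallery example"}},
    {"commit": {"author": {"name": "alice", "date": "2017-03-07T09:15:42Z"},
                "message": "documentation: expand install guide"}},
]


def commits_per_author(commits):
    """Return a Counter mapping each author name to a commit count."""
    counts = Counter()
    for record in commits:
        counts[record["commit"]["author"]["name"]] += 1
    return counts


def commits_per_day(commits):
    """Return a Counter mapping each YYYY-MM-DD date to a commit count."""
    counts = Counter()
    for record in commits:
        # ISO 8601 timestamps start with the date, so the first ten
        # characters are enough to bucket commits by day.
        counts[record["commit"]["author"]["date"][:10]] += 1
    return counts


if __name__ == "__main__":
    for author, n in commits_per_author(SAMPLE_COMMITS).most_common():
        print(f"{author}: {n}")
```

In a setup like the one described, counts of this kind would presumably feed the plotting step, with the real records fetched on Travis using the encrypted API key rather than hard-coded.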
Okay, so this is going to be very short, I promise. The food is here and it's time to eat, but a couple of people have just asked, "What should we be doing this week? What's the Docathon all about?" And really I want to sort of belabor the point that we're letting you define this week however you would like. Our goal is to get people excited about documentation and to get them working on it: take the time that they would otherwise have spent developing and use it to improve the documentation in whatever project you would like. So as long as you do that, we're pretty much happy, whether that means you're a developer on a project already or you'd like to contribute something new and get involved. There are three most actionable things that we could ask you to do. One is to make sure that you're signed up, either as a project or as a participant. This is just a little heat map of where we have people signed up already. So again, we do have these physical working groups in Berkeley, Seattle, and New York City, but it's meant to be something that anybody anywhere could contribute to, and as long as you're signed up we can keep track of what you've been up to and what kinds of commits you've made towards documentation on GitHub. The second thing is, if you are signed up, remember to tag your commits with either "doc" or "documentation" or "docathon". We have some really simple scripts that are just scraping the GitHub activity and trying to find examples where you're talking about documentation, and we're going to try to use this to generate, well, I don't like calling them leaderboards, because it's not really a competition. It's more of a celebration of all the awesome work that people are doing. So we're going to try to highlight some of the projects that are being worked on over the course of this week and, again, just give props wherever those are due.
And then other than that, you can get involved on Slack if you'd like. We have a Slack chat room where people are shooting questions and discussion points back and forth. That's a link that will just allow you to automatically sign up for it. And then just tell your friends about it, because again, the point of this is really just to try to build a little bit of community and get people excited about the Docathon. So that's all that we have for you right now. We've got some tasty lunch in the back. We're going to be holding a working group session for the next couple of hours for anyone that wants to stick around and work on their projects here, and we're going to be doing the same thing over the next couple of days, from roughly one to five or six each day. We'll make sure to bring in some coffee and snacks and stuff like that in case you want some energy to keep on working. So thanks for coming to the tutorials, thanks for agreeing to spend some of your time thinking about documentation, and I look forward to seeing what you work on this week. So thanks, now go.
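As a rough illustration of the commit tagging mentioned above, here is a minimal sketch of how a script might pick out commits tagged "doc", "documentation", or "docathon" and tally them per author. It is not the actual Docathon script: the function names, the whole-word matching rule, and the sample commits are all assumptions made for the example.

```python
import re

# The three tags named in the talk, matched as whole words, case-insensitively.
# Whole-word matching is an assumption; it keeps "docstring" from counting as "doc".
DOC_TAG = re.compile(r"\b(?:doc|documentation|docathon)\b", re.IGNORECASE)


def is_doc_commit(message: str) -> bool:
    """Return True if the commit message mentions one of the documentation tags."""
    return DOC_TAG.search(message) is not None


def count_doc_commits(commits):
    """Count documentation commits per author from (author, message) pairs."""
    counts = {}
    for author, message in commits:
        if is_doc_commit(message):
            counts[author] = counts.get(author, 0) + 1
    return counts


# Hypothetical commit log, not real Docathon data.
sample = [
    ("alice", "DOC: fix typo in README"),
    ("bob", "Refactor docstring parser"),
    ("alice", "Add documentation for the io module"),
    ("carol", "[docathon] new tutorial"),
]
print(count_doc_commits(sample))
```

With this rule, a message only counts if the tag appears as its own word, so putting "DOC:" or "[docathon]" at the start of the commit message is enough.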