In this talk, Dr. Maëlle will present tools that are good to know, that is Git, here, renv, targets, orderly and so on. She will also report whether making an R package out of everything is always a good idea. After this presentation, you will have learned things, tips and tricks to apply as you like in your R projects. I hope every one of you will learn at least one new thing. Over to you, Dr. Maëlle.

Thank you for the introduction. My talk is entitled "Please draw me a project", and I'm joining from Nancy. Today is actually a sunny day, but often we have a gray sky like on the picture. I'm delighted to attend satRday Nairobi, so thanks for your invitation and organization. It's not my first satRday conference. In Cape Town in 2018 I gave a talk where I wore a hard hat during the more technical part; today I don't have a costume, so I'm sorry about that. In 2018 I also gave a short version of the same talk in Cardiff. And I attended the satRday Paris conference in 2019. I'm actually wearing the t-shirt from that conference, but you only see a black t-shirt, so you have to believe me.

Today I will share some tools and tips around managing your R project. I had some slides introducing myself, but I don't need to go through them because I was introduced by Jean Matisseau. My slide deck is online. I've put the link in the Zoom chat; it's satrday-nairobi-maelle.netlify.app. The goal of my talk is really that every one of you learns at least one new thing. I have several sections: basic principles for R projects, how to protect your projects from external changes, what structure to use for your project, and how to run your project. And why am I giving advice on R projects? First, a word about the title of my talk.
The title is "Please draw me a project". In the book The Little Prince there is a quote where the little prince asks, "Please draw me a sheep", so a project is not exactly what he asked for; he asked for a sheep.

I think that advice on projects can really improve the life of anyone touching or reading your project, like someone reading the results, or you, when you need to run the project again in a few weeks. So this all helps reproducibility. It's also good because there is always something you can improve in your R projects, in the way you work with R. Sadly, it also means there is always something you can procrastinate on, so that's something we all need to be aware of: sometimes we just need to do the work and not improve the workflow. And I have to say, I'm actually not regularly in charge of analyses, so that's a bad point for choosing this topic. But I try to keep up to date with R news, and this is how I gathered some tools and tips for this presentation.

So, starting with some basics. Basic doesn't mean it's easy; basic means it's fundamental and very important, and these are all things to learn. The source of advice here is Jenny Bryan. I identify with the quote by Sharla Gelfand, who said, "Everything I know is from Jenny Bryan." Just a note that Sharla Gelfand said that after a talk, and Sharla is also a great source of information, so you could go watch that talk, "Don't repeat yourself, talk to yourself! Repeated reporting in the R universe". Coming back to the phrase "Everything I know is from Jenny Bryan": enough people identify with it that there is a logo, and on this logo there is a computer on fire. I will explain later why there is a computer on fire on this logo.

So if you have a new project in R, maybe an analysis, a scientific article or a report, you might think of your project as a garden, where you really try to spend time caring for your flowers.
The bad thing with this analogy would be that you cannot really take a garden somewhere else. So I was really happy to see a tweet about a competition in Japan, where people build beautiful gardens on little trucks, and that's really how I want my projects to be: beautiful but also very portable.

To come back to the idea of a laptop on fire: during a talk, Jenny Bryan said that if the first line of your R script is either setwd() with a path that's only on your computer, or rm(list = ls()), then Jenny Bryan will come into your office and set your computer on fire. So it's quite scary, and in a blog post in particular she explains why, and how you can avoid doing that.

The first idea is that you should have paths that are relative to the root of your project. For instance, if I had a CSV file in my satRday Nairobi folder, I would put it in the subfolder data. I shouldn't read the CSV with the whole path from home, Maëlle, Documents, conferences, satRday Nairobi, because this structure of folders only exists on my computer. If I give you my satRday Nairobi folder and you try to run the code, it will fail.

To illustrate the idea: Allison Horst, who makes beautiful illustrations of R packages and other things related to R, made an illustration of the here package. On the left, you have a very scary path, with spiders and setwd() everywhere. On the right, you have a beautiful path with flowers and the here package. So what is this all about? The here package is a package by Kirill Müller, and to use it you need to have something at the root of your project. If you use RStudio and an .Rproj project file, you don't need to do anything. If you don't use RStudio, you can have a .here file at the project root, and it's an empty file.
Once you have that, you can use the here::here() function, and it gives you paths relative to the root of your project. In my xaringan slides I run here::here() and it gives me the path to my satRday Nairobi folder on my computer. If I run here::here("data"), it gives me the path to the data folder of the satRday Nairobi folder. So if I want to read the CSV file that's in the data folder, I can use here::here("data", "cool-stuff.csv"). This is really cool because it works from anywhere inside my satRday Nairobi folder: I can use this code in, maybe, a README file that's at the root, or if I have a subfolder with reports or a script, it would work from there too. And it would also work on your computer, anywhere inside the folder.

And then, how not to use rm(list = ls())? The idea is that you should always start fresh: you should restart R regularly and without fear. If you somehow have the idea that there is something in your environment, maybe a loaded package or some object, that would disappear if you restart R, and you're afraid of that, then you haven't written everything you should in your script, because the source code should be enough to recreate everything you want. And you should never save and reload the .RData, which, as you might know, is the default setting in RStudio. So the default is a bad option. But there is a usethis function, usethis::use_blank_slate(): if you run it once on your computer, it sets your RStudio preferences to never save or restore .RData on exit or startup, so that's a very good function to know about.

These two habits, using relative paths and restarting regularly, are a project-oriented workflow, which is presented in particular in a blog post by Jenny Bryan, and it's really a way of working that will change your life.
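As a small sketch of the two habits above (paths relative to the project root, and never saving the workspace), the calls could look like this. It assumes the here and usethis packages are installed; the file name follows the one mentioned in the talk, and the object name cool_stuff is only illustrative.

```r
# Paths are built from the project root (found via an .Rproj or .here file),
# not from the current working directory.
library(here)

here()                                        # path to the project root
csv_path <- here("data", "cool-stuff.csv")    # <root>/data/cool-stuff.csv

# Only read the file if it exists, so the sketch runs in any project.
if (file.exists(csv_path)) {
  cool_stuff <- read.csv(csv_path)
}

# Run once per machine, interactively: tells RStudio to never save or
# restore .RData. Left commented out because it changes user preferences.
# usethis::use_blank_slate("user")
```

The same here() call works from a script at the root or from a report in a subfolder, which is the point of building paths this way.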
This workflow pairs well with the usethis package, for instance for creating projects with usethis::create_project().

In Shakespeare, sorry, in Romeo and Juliet by Shakespeare, there is this quote that we often see: "That which we call a rose by any other name would smell as sweet." It's a cool quote, but that's not true when you're writing code. There is a slide deck by Jenny Bryan with tips about naming your files, which is very useful. Your file names should be machine readable. So if your name is Maëlle, with two dots on the e, like me, you shouldn't name your files with your name. Then they should be human readable: the idea is that when you see a file name, you as a human should have a basic idea of what's inside. And they should work well with default ordering: for instance, if you have dates in your file names, it's good to write them as year, month, day, because this way when you order the files alphabetically they are also ordered by time.

I can't see anyone, everyone has their camera off, so if there is any problem please tell me, or if someone wants to turn their camera on then that's a good idea.

Also about naming things, so not only files but also variables for instance: when you want to write readable code, you try to use variable names that are informative. This week Jenny Bryan tweeted, "I didn't expect programming to involve so much time studying a thesaurus."

And just a small public service announcement. Please take time to think: if someone came and set your computer on fire, what would you lose? Hopefully you would only lose the machine and be a bit afraid, but you wouldn't lose all your files because you have a backup somewhere. So it's important to have a backup. But then, even with a backup, your project will evolve over time: you start it one day and then you keep improving it, so how do you keep track of changes, to be able to come back to your project at a given state, for instance?
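To make the file naming tips above concrete, here is a minimal sketch in base R. The helper make_file_name() and the example dates and names are hypothetical; the only idea taken from the talk is that names should be machine readable, human readable, and sort well thanks to year-month-day dates.

```r
# Build a file name that is machine readable (no spaces or accents),
# human readable (says what is inside) and sorts chronologically.
make_file_name <- function(date, slug, ext = "csv") {
  # ISO 8601 date: year-month-day
  date_part <- format(as.Date(date), "%Y-%m-%d")
  # lower case; anything that is not a plain letter or digit becomes a hyphen
  slug_part <- gsub("[^a-z0-9]+", "-", tolower(slug))
  slug_part <- gsub("^-|-$", "", slug_part)
  paste0(date_part, "_", slug_part, ".", ext)
}

files <- c(
  make_file_name("2021-03-02", "Survey results"),
  make_file_name("2020-12-15", "Survey results"),
  make_file_name("2021-01-20", "Survey results")
)

# Alphabetical order is also chronological order.
sort(files)
```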
To control versions, one approach could be to have dates in file names, or something like adding "final", "final final" to the names of your files. Or you could use version control, and the most popular tool for version control these days is Git. Learning it is not easy, but it's worth it. It's also not something specific to R, so it's good to know even when you don't use R, but it's very useful when you are using R. The idea with Git is that it allows you to try things out, to come back to a previous version, and to understand past changes.

There is an xkcd comic by Randall Munroe about Git, where you have three people in front of a computer. One person says, "This is Git. It tracks collaborative work on projects through a beautiful distributed graph theory tree model." Another person asks, "Cool. How do we use it?" And the first person answers, "No idea. Just memorize these shell commands and type them to sync up. If you get errors, save your work elsewhere, delete the project, and download a fresh copy." Learning Git is not always that awful, but deleting the project and downloading a fresh copy does sometimes happen when things go wrong.

More simply, what is Git? Here is an illustration of several states of a sheep: baby sheep, young sheep, adult sheep, and a box, and between them there are arrows. The idea is that you can save a snapshot of your files, of your sheep, any time you want, and you can see the difference between two states. When you take the snapshot, say when you arrive at the young sheep state, you can write an informative commit message such as "now the sheep is standing". So that's the idea of Git. The other idea is that you can go from the young sheep either to the adult sheep or to the box, so you can explore different ways to do your project.
This is called branching in Git, and really it means you can try different things and still have each state saved in the history.

Here are some Git commands, roughly explained. With git add you start tracking a file, and if there are files you never want to track, for instance files that contain some sort of secret, you can list them in the .gitignore file. With git commit you save a change in the history. With git pull and git push you sync up your local version with some remote version, maybe a GitHub repository. And with git checkout -b you create a branch, which is a way for you to explore an idea before deciding whether this approach is indeed the best or not.

Good, but where do we use these Git commands? Here are my preferences. Sometimes I don't even leave R to use Git: with the usethis package you can, for instance, start using Git in a project with the usethis::use_git() function, and you can create a GitHub repository with the usethis::use_github() function. There is also the gert package, which has a lower-level interface to Git: with gert you can push, you can commit, you can pull. Then, if you use RStudio, there is a Git pane where you can click, and you don't have to go too far away from R. Sometimes I use the command line, for commands that I am either copy-pasting from Stack Overflow or the rare commands that I know by heart. I don't use a graphical interface for Git, because maybe I never do anything complicated enough to need one, but you can use for instance GitKraken, or I've been told that the Git interface in VS Code is very good. I've listed some resources about Git, and since my slides are online I'm not going to describe them now.

So until now, what we've learned from Jenny Bryan's wisdom is to isolate our projects, to restart R regularly, to name files well and to use version control. I've also listed even more wisdom by Jenny Bryan and collaborators.
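For the "Git without leaving R" idea above, here is a minimal sketch with the gert package, run in a temporary folder so it does not touch a real project. The folder name, file contents, author name and email are all made up for illustration; it assumes gert is installed.

```r
library(gert)

# Work in a throwaway folder so nothing real is touched.
repo <- tempfile("sheep-project-")
dir.create(repo)
git_init(repo)

# A file we want to track, and one we never want to track.
writeLines("A young sheep.", file.path(repo, "sheep.md"))
writeLines("secret-token", file.path(repo, ".Renviron"))
writeLines(".Renviron", file.path(repo, ".gitignore"))

# Equivalent of git add + git commit, with an informative message.
git_add(c("sheep.md", ".gitignore"), repo = repo)
me <- git_signature("Example Author", "author@example.org")
git_commit("Add young sheep, now standing", author = me, repo = repo)

# Explore another approach on a branch without losing the saved state.
git_branch_create("try-a-box", repo = repo)

# The history so far: one snapshot.
git_log(repo = repo)
```

In an existing project, usethis::use_git() and usethis::use_github() cover the initial setup interactively, so a sketch like this is mostly useful for seeing what the individual steps do.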
Now, in this section, I want to explain how to protect your project from external changes. So, another scary story, one where no one is coming and setting your computer on fire. Imagine you have written beautiful data wrangling with a function called my_favorite_function() from a package called package. You update packages on your computer, you come back to your script, and you realize that in the new version of package there is no longer a my_favorite_function(). This was actually a good idea by the package authors, because they have another function that's better, but for you it's bad news: you will need to update your script because it's now broken. And sometimes you don't have time to do that, you just want your script to keep working as it was.

So the idea is that you can actually store a project's dependencies, encapsulating your project: having one set of package versions for one given project, and other package versions in another project. An important tool for doing that is the renv package by Kevin Ushey. It's the successor of packrat by the same author; if you used packrat before, renv does the same thing but better.

How do you use renv in a project? It's actually very simple. You start using it with the renv::init() function. After that you install and remove packages as you would normally. What's a bit more difficult, maybe, is to remember to regularly run the renv::snapshot() function. What this does is store the metadata about the packages in a file called renv.lock. What is the metadata of dependencies? It means that in this file all the packages that you use in the project are saved, with their version and where they come from, so maybe it's version 1.0 from CRAN or a version from GitHub, this kind of thing. And it means that you can take the folder with your project and give it to your colleague, or to yourself on a new machine.
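The renv workflow described here can be sketched in a few calls. This assumes renv is installed and that the code is run from the root of the project you want to isolate; prompt = FALSE is only there so the sketch does not stop to ask for confirmation.

```r
# Once, at the root of the project: create a project-specific library
# and a first renv.lock file.
renv::init()

# Then install or remove packages as usual, for example:
# install.packages("dplyr")

# Regularly: record the package versions and their sources in renv.lock.
renv::snapshot(prompt = FALSE)

# On another machine, or for a colleague who received the project folder,
# reinstall the versions recorded in renv.lock.
renv::restore(prompt = FALSE)
```

The library itself is not what gets shared; the lock file is, which keeps the project folder small.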
To reinstall the correct versions of all packages, all you need to do is run the renv::restore() function. So that's very handy.

Now, if you want to encapsulate even more things, maybe you want to have a fixed R version, operating system, everything, you might be interested in Docker. There was a good introduction to Docker by Colin Fay, and recently there was a meetup about Docker at R-Ladies Brisbane by [speaker name unclear]. This talk, "Introduction to using Docker for reproducibility in R", has been recorded and the materials are online.

So in short, to protect your project from external changes, I'd recommend listing the dependencies of the project, and the easiest way to get started is the renv package.

Now, what file structure to use for your project, or how to tidy things up inside your project? In your project you probably have either data, or code that gets data from a database or some sort of website. Then you have some code that wrangles the data, maybe fits a model to the data. And you want to have some output, maybe a graph or reports, that kind of thing. Now, how do you tidy all these things? The file structure for your project should be consistent. Hopefully you can start a new project automatically. And maybe you want it to look like a package, or maybe not.

There are many packages for creating projects, and one of them is ProjectTemplate by Kenton White. I've never used it, but I really like the blog post "Love for ProjectTemplate" by Hilary Parker, because the criteria she lists as reasons for liking this package are very good things to have in any tool that helps you with project structure. She wrote: routine is your friend; it's easier to start somewhere and then customize rather than start from the ground up; reproducibility should be as easy as possible; finding things should also be as easy as possible. Very good.
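To illustrate "start a new project automatically" without any extra package, here is a minimal base R sketch. The function create_skeleton() and the folder names are hypothetical, not a recommendation from the talk; the point is only that the same structure is created every time.

```r
# Create the same folder structure for every new project.
# The folder names are only an example; agree on your own with your team.
create_skeleton <- function(path,
                            folders = c("data", "R", "reports", "output")) {
  dir.create(path, recursive = TRUE, showWarnings = FALSE)
  for (folder in folders) {
    dir.create(file.path(path, folder), showWarnings = FALSE)
  }
  # An empty .here file marks the project root for the here package.
  file.create(file.path(path, ".here"))
  # A README stub, to be filled in by a human.
  writeLines(
    c(paste("#", basename(path)), "", "What this project is about."),
    file.path(path, "README.md")
  )
  invisible(path)
}

# Try it in a temporary folder.
project <- create_skeleton(file.path(tempdir(), "2021-analysis-example"))
list.files(project, all.files = TRUE, no.. = TRUE)
```

Starting from something this small and customizing it over time is in the spirit of "it's easier to start somewhere and then customize".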
There is a debate that comes up sometimes: should an analysis be an R package? I will first list scenarios where it's a good idea to make an R package. If you want to store functions and data that you are using not only in one project but in several of them, then making an R package is an excellent idea, and there are many great resources out there to learn how to make an R package, in particular the tutorial that Shel Kariuki gave at R-Ladies Nairobi recently; it has been recorded and the materials are online. Then, maybe you can also make a package to store something like an RStudio project template, or a package that has functions for creating a project structure. So that's a good idea, without a doubt.

Now, storing your project itself as a package: maybe it's a good idea, maybe it's not. What do I mean by analysis project as a package? The idea is that you would list dependencies in DESCRIPTION, put the functions in the R folder, maybe documented in the man folder, you would have data, maybe in the data folder, your analyses could be vignettes, so R Markdown files, and you could have an informative README. That would be the basic structure. The advantage of using a package to store your analysis is that it allows you to either use or refresh your package development skills. And you can reuse tools that are made for package development, like devtools and usethis.

This approach is the idea of a research compendium. It was described in particular in the paper "Packaging data analytical work reproducibly using R (and friends)" by Ben Marwick, Carl Boettiger and Lincoln Mullen, from 2018. If you create your project as a package, there are specific tools that you can use. The rrtools package by Ben Marwick can help you set up a new compendium. You can also use the holepunch package by Karthik Ram. What it does is interface with the Binder project.
If you use that, then you can add a badge to the README of your compendium, and it allows the reader to play with your code online in a new RStudio instance. It actually works without the compendium structure, but I think it's still meant to be used with compendia.

Also, if you have your project as a package and your analyses are vignettes, then you can use the R-universe project by Jeroen Ooms at rOpenSci to publish your analyses. With R-universe you have a universe with all your packages, and it renders the vignettes in one tab, so that would be a way for you to have a URL to point people to when they want to read your analysis.

There are still opponents to the idea of project as a package, in particular the blog post "Project as an R package: An okay idea" by Miles McBain. In this blog post he wrote that his response to advocates of project as a package is that you're wasting precious time making the wrong packages: you're forcing your work into the package development domain, with all the loss of fidelity that entails, so why aren't you packaging tools that create the smooth devtools and usethis style experience for your own domain? That's a bit harsh, but it's good to keep in mind if you are hesitating about what approach to choose for your project.

To conclude on the file structure for your project: I think that what file structure you use is really up to you and your teammates, but it's important to have a basic structure that's consistent over time, and also to have a way to automatically create it. It could be some sort of project skeleton that you have somewhere and just copy-paste when you start a new project; it could be as simple as that. The idea is to make reproducibility easier.

Now, how do you run your project? You have your sources, you have your scripts, so how do you get to the analysis and outputs?
Of course, you could run the code line by line, but that wouldn't necessarily be very handy. If your project is one or a few R Markdown reports, maybe all you need is the Knit button, for instance, or the rmarkdown::render() function, running that regularly. But maybe you need something more complex. So I am going to discuss two cases, with one package for each case: maybe you want to optimize a pipeline, or you want to track versions of an analysis, so both input and output. For the pipeline, I'm going to present the targets package by Will Landau, and to track versions of an analysis over time, I'm going to present the orderly package by Rich FitzJohn.

So targets by Will Landau is a package that deduces the relationships between the pieces of your project. Say in your project you have raw data, you transform it into data, you fit a model to the data, and then you make a figure of the model fit. If you only change the model, you only need to redo the fit and the figure; you don't need to recompute the data. This kind of logic, what needs to be rerun when something changes, is something that targets does for you: targets only performs the necessary computation when you change something, and that can save a lot of resources and time. targets is part of the rOpenSci suite of packages, so it has been reviewed at rOpenSci, and it's the successor of drake by the same author.

At the core of a targets project, of a targets folder, you have a file called _targets.R. In that file, which you write yourself, you can load packages, and you can load functions: if you've written functions in the R folder, for instance, you can source that script. And you define targets. So what are targets? This is key to understand. Here is a target definition in an example from the targets manual. We have a list of targets, and all of them are defined with the tar_target() function. Let's start with the first one.
The first target has the name raw_data_file, then there is a path to the file, relative to the root of the project, and the argument format is set to "file". So this one is a bit special: it tracks the raw data file. The next target is called raw_data, and it's defined by R code, the code read_csv() with the path to the raw data file. Then we have a target called data, and it's defined by R code as well, by filtering the raw data; the filter() function that is used here comes from the dplyr package, and dplyr has been loaded previously in the _targets.R file. The last target is fit. It's also defined by R code, but it uses functions that are specific to the project; these functions have been sourced previously in the _targets.R file. So this is how you would start: when you write a targets pipeline, you define the different pieces and you write them in this list.

So how does it run? To build a targets pipeline you run the tar_make() function. If you need to destroy everything at some point, there is a function for that, the tar_destroy() function. And what's very useful with targets, well, many parts of it are useful, is that you can understand your pipeline with functions such as tar_glimpse(). It shows you a network of dependencies between the pieces of your project. In the case of the example I have shown, you have raw_data_file, which becomes raw_data, which becomes data, and data is used in fit, which is a target too. It's a very simple one, but sometimes you see people sharing pipelines that are very complicated, and it's very cool to be able to visualize them this way.

So how to get started with targets? There is the manual of targets; it's a bookdown book and it's very well done.
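The pipeline walked through above could be written roughly as follows. This is a sketch in the spirit of the manual's example, not a copy of it: the data path, the columns x and y, and the fit_model() helper are assumptions made for illustration, and the helper is defined inline here although in the talk the project functions are sourced from the R folder. It assumes the targets package is installed.

```r
# _targets.R, at the root of the project
library(targets)

# Project-specific functions usually live in R/ and are sourced here,
# e.g. source("R/functions.R"); one is defined inline to keep this short.
fit_model <- function(data) {
  lm(y ~ x, data = data)   # hypothetical columns x and y
}

# Packages that the targets need when they run.
tar_option_set(packages = c("readr", "dplyr"))

# The pipeline: the file must end with the list of targets.
pipeline <- list(
  # Special target: tracks the raw data file itself.
  tar_target(raw_data_file, "data/raw_data.csv", format = "file"),
  tar_target(raw_data, read_csv(raw_data_file)),
  tar_target(data, filter(raw_data, !is.na(y))),
  tar_target(fit, fit_model(data))
)
pipeline

# Then, from the console:
# targets::tar_make()     # build what is outdated, skip the rest
# targets::tar_glimpse()  # draw the dependency graph
# targets::tar_destroy()  # remove everything targets has stored
```

Because each target names the targets it uses (raw_data_file inside raw_data, and so on), targets can work out the order and what is out of date without being told.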
If you prefer learning with a video, there was recently a talk by Will Landau at an R user group about targets. And I would recommend starting with a small project: in my case I read the manual, maybe I watched a video or two, but things became much clearer for me when I wrote a small targets project. And that's really my current level, I'm not an advanced targets user. targets is not a package on its own: there is a whole ecosystem of packages around targets, the R Targetopia, with for instance a package for using targets with Stan and another one for using targets with JAGS.

And how do you keep up with targets, given that it's a package that is developed very actively? You could watch the GitHub repository of targets, or follow Will Landau on Twitter. I would recommend subscribing to the rOpenSci newsletter; I would recommend doing that anyway because I curate it, so it's a very good newsletter. And you should connect with other users to exchange ideas and questions.

Now, the orderly package by Rich FitzJohn. Here the challenge is a bit different. What if you are writing a report about the situation of the pandemic in your region, and this report is used for making decisions? It's very important in that case that you keep track of everything that has gone into your analysis at different points in time. If one month from now someone is doing an audit, or you yourself are doing an audit of what you've done in the past, you can see what you have done, and with orderly you can even make an analysis of the analyses. So that's the idea behind orderly.

How does it work? In orderly vocabulary you have repos, and in repos you have reports, in quotes, because they are not necessarily reports: it could be a script that generates figures, with no report involved. Anyway, I've made an example of a repo that has one report in it.
I was in an existing RStudio project. I ran the orderly_init() function and created the blop repo; blop is not an informative name, but that's the one I used. Then I created a report inside the blop repo by using the orderly_new() function, and I added two files from the orderly documentation to have some sort of minimal example. So what I had was a blop folder with a general configuration file, orderly_config.yml. That one I didn't need to edit to make things work; it can be useful in some cases, but you don't necessarily have to edit it. Then there was a subfolder, the src folder, with the example folder. This is my report, and my report is very simple: it's only an orderly configuration, orderly.yml, and a script.

What is in the orderly configuration? It might remind you of the _targets.R file, because it describes many things. It names the script that has to be run, and it describes the artefacts that my script is creating. Here the artefacts are a static graph and data; each of them is described by words, like "a graph of things", and by a file name. And then there is the script. In this case it's a very simple one: it generates random data in a data frame, it writes this data frame to a file, mydata.csv, and it makes a plot of the data.

So this is the project; how do I go from the script to the result? First, because with orderly the report is in a subfolder of a subfolder, it might be hard to write the code, so you might want to experiment with the development mode, which puts everything inside the current session. Once you are fairly happy with what you've written, you can create a draft version of your report by using the orderly_run() function. What this does is run the analysis and put everything, so a copy of the input and output, in a folder inside the draft folder.
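The report script described above could look roughly like this. It is a base R sketch, not the file from the orderly documentation: the column names, the number of rows and the graph file name mygraph.png are assumptions, and only mydata.csv is a name mentioned in the talk. The file names written here would have to match the artefacts declared in the report's configuration.

```r
# script.R for the "example" report.
set.seed(42)                                # makes the random data repeatable
dat <- data.frame(x = seq_len(20), y = rnorm(20))

# Artefact 1: the data behind the plot.
write.csv(dat, "mydata.csv", row.names = FALSE)

# Artefact 2: a static graph of the data.
png("mygraph.png")
plot(dat$x, dat$y, xlab = "x", ylab = "y")
dev.off()
```

Nothing in the script itself is specific to orderly; the tracking of inputs and outputs comes from running it through the package rather than by hand.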
The draft ends up in draft/example/ followed by an ID that's based on the time at which it was computed and, I think, a hash of the analysis. So this is the draft folder. If you're really happy with what you've done and you want to save it, you use the orderly_commit() function; commit is the same verb as when you use Git. When you use the orderly_commit() function, it puts a version of the analysis inside the archive folder, so in archive/example/ followed by the ID of that version. Something that is important to say here is that when you use orderly, maybe you have a huge analysis, so you are not expected to track archive and draft with Git but with something else; you might need to have some sort of backup server or something like that.

So how to get started with orderly? The orderly documentation website is really great. When you read it, I think you can really see that the package has been used for real: there are real people using the orderly package, and the documentation shows how their experience has been used to improve both the package and the documentation. I would recommend starting small to understand how it works. It's again like with targets, where you see how all the pieces fit together when you run a pipeline; it's the same with orderly. The idea of the archive folder and the draft folder is all much clearer when you run it once. I would also recommend connecting with other users.

And how to keep up? Like targets, orderly is actively developed, so you could watch the orderly GitHub repository. There is a blog by the team that develops orderly, and it's actually a good blog to follow in general because they have really great blog posts about R programming. And you could follow Rich FitzJohn on Twitter.

So to conclude the section on how to run your project: if you have one or a few R Markdown reports, maybe you can use the Knit button or the rmarkdown::render() function.
If you want to optimize a pipeline, I recommend checking out the targets package, and if you want to keep track of all versions of the analysis, for future audits for instance, the orderly package is a great package. There are other tools that can help you with running your project, and you could even create yours. What's also important to say is that in this talk I presented the file structure and how to run your project as separate concerns. The packages I presented here are not opinionated about the file structure. Sometimes you will find R packages that help to do both, create the structure and run the project, and they are quite strict about how you structure your project then.

To conclude the talk in general: first, I want to thank the organization team of satRday Nairobi; it was a pleasure to be invited and to meet some of you. I also want to thank Christophe Dervieux, who reviewed the blog post I wrote with the content of this talk.

So how do you draw a project? It's important to know the good basics, like isolating your project and having backups. It's important to encapsulate the project, for instance with the renv package, so that your script is not broken by package updates. The file structure of your project should be practical, consistent and hopefully automatic; it doesn't need to look like a package. And I would recommend using tools for building outputs that are adapted to your needs, like for instance optimizing a pipeline. I've listed some more resources on how to learn how to draw a project. And to really summarize in even fewer words: you should read everything that Jenny Bryan wrote.
You should choose, or even create with your teammates if you have teammates, the box in which you put and build your project: what structure, what packages do you use? And you should not be afraid to renew your toolset over time, because your taste and your needs will change and there will be new tools out there. But the good news is that if you're attending a conference, you are probably not afraid to learn new things. So that's all from me, thank you.

Somebody asked, what is the use of .RData?

Can you repeat? The .RData file?

Yeah, at some point you said that the .RData file is not very useful.

No, yeah.

But what is its use?

I don't know, because I only know that we shouldn't use it. I've never used the .RData file content myself. Maybe for some workflows it could be a good idea, but I don't use it at all.

Okay. Project management with renv: how does renv handle installation in the case of project sharing, when the library for the packages is not within the project folder?

So, when the library for the packages is not inside the project folder?

Yeah, when your package library is not inside the project folder, how does renv handle the installation?

So with renv you always have one package library per project. I think there is a central cache: for instance, if you're installing a package for a project and it has already been installed for another project on your computer, it should be faster, but you still have one library per project. So I'm not sure I understood the question.

When sharing your projects.

Yes. Something I need to specify is that you have one library per project, but when you share the project with someone else you don't share the library of packages; running the renv::restore() function will install the packages. Otherwise things would be too big to share.

Okay, so.
Sorry, we are losing you, please be closer to the microphone.

Sorry.

Yes. If you share the project with the lock file, renv.lock, does it install the most recent package versions or the ones that are in renv.lock?

Yeah.

And then the last question is, how do you keep up to date with R?

There are several ways to do that. Twitter would be one, but Twitter can be overwhelming. A very good way to get started is subscribing to the R Weekly newsletter. And the rOpenSci newsletter of course. Both of them, I think.

Somebody is asking, please, can I use this to clone a repository from GitHub?

So which package?

usethis, I think, is what they are talking about. The question is not clear.

There are functions in usethis to help you; there is one that's called create_from_github(), and that's the one that would clone a repository.

So thank you very much, Dr. Maëlle, for the presentation.

Thank you for chairing the session.

Thank you. Time to stretch for five minutes or probably less, after which we'll have the last session. I don't mind. Please refer to the mail that was sent to you yesterday for perspective and join your sessions by.