We are now live on YouTube. Hi everyone. Welcome to ESMARConf2022 and this workshop on R functions and packages: an introduction to writing them. This workshop is being live streamed to YouTube and has a group of participants taking part live, so a very warm welcome to you. If you're watching via YouTube and have a question for our presenter, please reply to their tweet in the ES Hackathon Twitter feed and we'll try to reply to those as soon as possible. We'd also like to draw your attention to our code of conduct, available on the ESMARConf website at esmarconf.github.io. For those of us joining on Zoom, if you do have a question, please feel free to raise your hand and I'll come to you, or pop it in the chat and I will ask it on your behalf to Martin. And just a reminder that because we're being live streamed to YouTube, if you do turn your microphone or your video camera on, you will appear on that live stream. With that in mind, I'll hand over to Martin Westgate from the Atlas of Living Australia. Lovely. Thank you, Chris. And hi everyone. It's lovely to see you. I actually can't see any faces on the screen and that's totally fine. We're here for the next hour and a half or two hours to hear about R functions and packages. So hopefully that's a familiar concept to some of you. I'll dive in in just a second. Before I do, I'd like to say that while I think a lot of you are calling in from Europe, I'm actually in a bedroom in Canberra right now, in Australia. And that's on the land of the Ngunnawal people, and I'd like to acknowledge that and pay my respects to the elders of the Ngunnawal people, past and present. A little bit about me. I lead a team called the Science and Decision Support Team at the Atlas of Living Australia, which is a biodiversity informatics facility that stores 100 million observations of plants and animals from around the world. And we do a bit of R programming as well. And that's what I'm here to talk to you about today.
So I'm going to start sharing my screen in just a moment now. Let's have a look. So I'm going to assume in going through this whole live stream, well, for a start, I hope you can see my screen. I'm pretty sure that's the case. But also that you guys are at least a little familiar with R. And also I'm going to assume that you're using RStudio like I am here. It's not actually my day-to-day development environment, so if I have any hesitancy about using this screen, that's probably what's going on. If I'm going too fast, if I'm going too slow, ask questions at any time. You can put them in the chat and I'll try and see them, or Chris might read them out on your behalf if I miss anything. So I've got basically a workflow that we're going to go through here and, like I say, it assumes that you guys have done some work in R before and know roughly what a package is. And I don't think that's necessarily... Oh, we seem to have lost you, Martin. Just bear with us. We're having some minor technical issues and I'll see if I can get hold of Martin. Apologies for this. I will keep you updated as soon as I can. It seems he has disconnected. Just bear with us. I'm sure he will be back momentarily. Hi everyone. You might have seen it went dark in his video; he's just had a power cut. So I think he'll connect by mobile and get back very shortly. So just a couple of seconds. Thanks very much. Can anyone see me now? Yes. Welcome back, Martin. Sorry. We've got a storm in Canberra. Did not see that coming. So my apologies. I'm now sitting slightly more in the dark. If the power stays out, it's going to get slowly darker as I keep talking. We'll see how this goes. My apologies. Yes. So look, I'm going to assume that people have used packages before. And what I'm actually going to try and do is walk through a really basic example of building some functions and a package.
And I'm going to lean really heavily on the automated tools that are built into RStudio. The purpose here isn't to show what a full package will look like when finished. It's to give you guys a quick insight into the sort of steps that I and my team follow in the workflows that we use for building packages at the ALA. So look, just for getting started, if any of you guys have RStudio open right now, all I've done, and I'm hoping this works, is hit the File button, then New Project, New Directory, and R Package. And that comes up with this little panel here. Now this is all new in the last few years. When I started writing packages, you literally had to define your own file structure and things. But it's really quite straightforward. I've actually pre-run this and I've created something that I've called niceplots. Typing in the dark. I'm not going to hit go again, partly because I would create two packages with the same name, and also because for some reason it crashed my RStudio. So I'm hoping that isn't the case for everyone. But once I reloaded RStudio, this had worked. And so what I'm going to show you is what you get once you've run that. Basically this gives you an empty file structure. I'm going to minimize some of the stuff that we're not going to use, with this content on the right. So to give some context here, not all of this is particularly important. We can ignore for now the .Rbuildignore and the .Rhistory. Those aren't relevant to us. The really core parts are two folders. One is called R. This contains, in this case, a script that's already opened on the left. It doesn't actually do much. It's a little function that says hello. And that's more to give a sense of what the structure of a package is rather than because it's something particularly useful. Going back up in the file structure, there's another folder called man. I've never quite worked out why it's called that.
We're going to kind of ignore that because that stores our help files and we're going to build those automatically in a moment. I could actually answer that one, Martin. It's short for manual. Oh, that makes sense. That makes perfect sense. Don't know why it's taken me about 12 years. Perhaps I should try and cover my lack of knowledge more carefully. The other thing you'll notice is that there's a .Rproj file. That is what I double clicked to open RStudio and to set this as my working directory. And that's a useful thing to have. I'm going to assume that once you've each loaded your package, you've reloaded from that .Rproj file. And then there are two things that are useful. This one is the NAMESPACE. We're going to ignore that because, again, that's going to be auto-built. In fact, we're going to delete this file in a minute and rebuild it automatically. The important one is the DESCRIPTION. At the moment, I've just literally typed the word niceplots and that's all it's got; all the rest of this is boilerplate that ships with the package. But in practice, what this tells you is some little pieces of information about the package. Who wrote the package? There has to be a maintainer, and there has to be at least one author, and they can be the same person. And some little bit of text about what the package does. There is some content that becomes important later on. I thought there was a Suggests here, but that's missing for now. It'll get built in later on, along with Depends. Those are fields that tell R, and later on CRAN if we want to submit it there, what other packages your package depends on. So I'm not going to write all this stuff in just now. It's enough to know that it exists, and that content is, I think, fairly self-explanatory. Obviously before doing anything useful with the package, you populate that as a way of telling people what the package does.
And for those of you familiar with CRAN, that information gets put on the web page there for your package too. So the important thing about a package, obviously, is that it contains functions. A package is basically just a way of collecting a set of functions in one place so that you can reuse them. And I'm going to walk through a very simple example of what that might look like today. I'm literally going to just work on one or two functions. They're going to be extremely trivial in terms of what they actually do. But that will give, hopefully, an overview of what a function looks like and how you use it properly, but also where it fits in a package. Now, normally at this sort of point, I'd stop and ask if anyone's got any questions or if anyone's been able to run that, to create a new package themselves. Are there any questions from the group at this point about that? So far, I've not seen any questions in the chat, but if anyone does have one, feel free to either ask it or pop it in the chat. Yeah, I can stop at any time. So I'm happy to go back and explain details or try running something on my machine if you guys have any particular problems. If you have code that doesn't work, drop it in the chat. We can see what we have to do. So in practice, actually, I don't really want this function here. Someone's coming in with torches. That's very helpful. So I'll delete that in a minute. For now, I'm going to start a new file, an R script. And then before I even get started, I'm going to save this into the R folder. And I'm going to call this something like... BackgroundColor. It may not be much of a giveaway to say at this point that we're going to do some example scripts that change visualisations in ggplot. Now, the sort of stuff that you want to put in a file like this is two things: the function itself and the information needed to build the documentation.
Now, for the purpose of this exercise, I'm going to... I'll end up deleting some of this code, but just to show the sort of thing we could do with a function. So let's say we were to load ggplot2 and then draw a plot. I'm going to use the mtcars data set because it's built in and it's easy to use. I don't use this as an example data set very much and I don't know what all these things mean. I just needed something that had two continuous axes on it. So let's have a look. What I'm hoping is that this will render in our plot window. So does that work? No, it has not. There we go. So as you can see, a plot, a scatter plot, nothing too complex, nothing too adventurous. Those of you who have used ggplot2 before will be familiar with the idea that you can build a plot in this way, where you specify some basic parameters like I've done in that first line, assign it to an object, and then later on add other stuff to it just by using a plus sign. So something that I have done before is set themes, for example. When I'm publishing work, I quite like to use a black and white theme because it prints well. You can do all sorts of crazy stuff with theme, though. I'm going to do something hideous, which is to use panel background and set it to some horrible colours. Sorry if this offends your sensibilities. So it's very easy to do something like that, for example, and you get a red background. As may become clear, this is not necessarily a good idea, but it is possible. For those who really want a lot of options, the help file for theme lists everything that you could possibly do in here. Now, I think this is long-winded, but it is really comprehensive and really useful. It's possible that you want your background to be red so often that it's worth building new functionality just to do that, without having to go back to the documentation every time.
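The plotting steps described above might look roughly like the sketch below. The recording does not say which mtcars columns were plotted, so `mpg` and `wt` are assumptions, as are the object names `p`, `p_bw` and `p_red`.

```r
library(ggplot2)

# Base scatter plot; the choice of mpg and wt is an assumption
p <- ggplot(mtcars, aes(x = mpg, y = wt)) +
  geom_point()

# A black-and-white theme, which tends to print well
p_bw <- p + theme_bw()

# A deliberately garish red panel background
p_red <- p + theme(panel.background = element_rect(fill = "red"))
```

Printing `p_red` at the console draws the scatter plot on a red panel.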
So what we're going to do now is build an example function where we do that. If we were to say something like background colour and name that as a function, the way we do that is we literally just define it as an object. We could use the equals sign, but the left arrow is a bit more accepted. And we literally just type the word function. Now, if we always wanted this to have the same effect, if it was always going to be red, we could simply copy this text that we've already written and put it in a new function. So just to show what that does, I'm going to load that into the workspace. And we've still got our plot from earlier on. In theory, now, we can go p plus background colour. Again, it is hideous. I understand that. All we've done is wrap the generic function theme, which has lots and lots of options, into something a bit more specific that we actually want to use. And you could extend this further. You could say, actually, I want the panel background to be red, but I want the lines to be green. A more useful thing to do is to define arguments. At the moment, the things you can tell this function to do are effectively nothing. Just by having opening and closing brackets, you can see that it is a function rather than just a named object in the workspace. So R knows it's a function and knows to call it, but there's nothing you can tell it to do. You can't change its behaviour. You use it or you don't. Obviously, that's not a common or necessarily useful way for a function to behave. Something a bit more common is for you to say, actually, I want to be able to specify the colour for myself. I'm going to use American spelling because there are fewer characters. I hope that's okay with everyone. In which case, you can then literally just pass those around within the function and R will do the thinking for you. So you could literally say, I want the colour to go here.
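As a rough sketch of the two versions just described: first a function with no arguments that always returns a red background, then one that takes the colour as an argument. The name `background_color` is an assumption based on how the function is referred to in the session, and the plot `p` is rebuilt here only so the block runs on its own.

```r
library(ggplot2)

# Version 1: no arguments, so the background is always red
background_color <- function() {
  theme(panel.background = element_rect(fill = "red"))
}

# Version 2 (replaces version 1): the caller supplies the colour
background_color <- function(color) {
  theme(panel.background = element_rect(fill = color))
}

# Example plot; mpg and wt are assumed columns
p <- ggplot(mtcars, aes(x = mpg, y = wt)) + geom_point()

# The wrapper is added to a plot exactly like theme() itself
p_red <- p + background_color("red")
```

With version 2 there is no default, so calling `background_color()` with nothing inside the brackets is expected to stop with an error.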
So if we reload that now, actually, this should fail, when I think about it. Yeah, you get an error because you haven't told it anything, but if you give it some text, it's not nicer, but it is more customisable. I think in theory, you could actually set this to be like, oh, gosh. Let's see. I don't know hex codes off the top of my head. Oh, dark grey. Horrible. But anything that is interpretable by R as a colour could get passed to that. You could also, if you were being very clever, set a default. There are two ways you could do that. You could define something in the function definition itself. So let's load that again. Now if we delete this text. You remember before, this errored out. Now it chooses red by default, but it can still be overwritten. So that's useful. Alternatively, we can tell it what we want it to do if it's given no information. So we start with an if, and then there's a particular function that you won't necessarily have used unless you build functions, because it doesn't really work in an interactive context, but it's called missing. Cool. So I'm using a new ergonomic keyboard, and I'm not very good with it. So that should read fairly clearly. If we're missing colour, that is, if no one has entered a value, then we tell it what our default should be. It should have the same effect. So again, load that into the workspace. There's no value. It's yellow. We tell it there is one. It goes grey. Hopefully that's fairly clear. So feel free while I'm doing this to be testing your own functions. Like I mentioned, theme has many possible options. You could set your grid lines. You could set your fonts. You could set the size of your text, anything you want, with this sort of format. Obviously you wouldn't call it background colour in every case. Some of you may have noticed that this is a little bit repetitive, right?
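The two ways of setting a default mentioned above can be sketched side by side. In the session the same function was edited in place; the second name, `background_color_missing`, is hypothetical and used here only so both variants can exist at once.

```r
library(ggplot2)

# Option 1: put the default in the function signature
background_color <- function(color = "red") {
  theme(panel.background = element_rect(fill = color))
}

# Option 2: check inside the body whether the argument was supplied
background_color_missing <- function(color) {
  if (missing(color)) {
    color <- "yellow"  # fallback used when the caller gives no value
  }
  theme(panel.background = element_rect(fill = color))
}

# Example plot; mpg and wt are assumed columns
p <- ggplot(mtcars, aes(x = mpg, y = wt)) + geom_point()

p_default <- p + background_color()                    # falls back to red
p_missing <- p + background_color_missing()            # falls back to yellow
p_grey    <- p + background_color_missing("darkgrey")  # default overridden
```

Option 1 has the advantage that the default is visible in the function signature and in the generated help file; option 2 is handy when the fallback needs some computation.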
So each time I've run this, we've had to copy this bit of text and put it into the workspace, or into the console, I should say. It should be fairly obvious that if we had five functions or 10 functions or 100 functions, that would get very cumbersome. And that's the point of a package, and it's the point of the set of workflows that we're going to talk through next: how do we start doing this in a more efficient way? This is useful for iterating on getting something right. But how do we make that work more efficiently? Before we get to that, I'm going to assume that this is a sensible example function. There's an extra thing that you can put in these files that makes them much more useful for a package development workflow, and it's roxygen. So roxygen2 is the name of the package that we use for documentation in R packages. There are other ways: you can handwrite your own documentation. It works. It's fine. I've done it before. It's a little slow. The benefit of putting your documentation straight above your function is that they're right next to each other, and it's easier to spot errors. Plus, roxygen is generally a bit easier to write than R's default markup for manual files. So basically the way this works is you give it a title. We should say something about what the function does. Set background color in ggplot2. You'll notice that because we're using RStudio and it's been set to recognise this sort of roxygen format, it's now putting in the comment prefix for me. So this hash followed by an apostrophe is written in, and I don't have to keep typing those two characters followed by a space. The format for roxygen is that you have that title, then you have a skipped row, then you have your description. This makes... Whoop. Is that working? I don't know what happened there. Okay. This makes... can't spell... beautiful plots.
And then I think, for now, the only thing that's really important is to tell R that we want this function to be available to the user. Add the word export with an at sign in front of it and delete any code that isn't part of the function. So that's very minimalist, but it should work. Martin, sorry, Tania from the UK has just asked how do you get the automatic little pound sign with the quote mark after it to appear? Pound sign with the quote mark after it? The starting thing for the roxygen comments: it looked like when you made a new line, a new one appeared automatically. I've literally learned that today. This isn't my usual development environment. I hand-typed that first one. So that's literally a hash and then a single apostrophe. And then after that, I'm just hitting enter and it's doing it for me. Ironically, deleting them is a bit slower. So does that make sense? She said, OK, thanks. Yeah, I learned it today. A useful feature. Apparently those guys at RStudio definitely know what they're doing. So just to show where we're at then, we now have two files in here. I'm going to delete this hello one because that's just for example purposes. Yes, I'm going to delete that. And actually just while we're at it, I'm going to do the same in the man folder because we don't need this anymore. So we actually have less in this package than we did when we started. One last thing I'm going to do: this NAMESPACE doesn't contain anything at the moment, or at least it doesn't really matter what that means; it's useful, but it's not terribly useful. When we use roxygen, we're going to do some stuff that builds the NAMESPACE file for us automatically. So I'm just going to delete that for now and it will be populated in a minute. Right. So this is where we come into a workflow for actually building and testing the package itself. Until now, we've been playing around with ggplot. There's a few base functions that we've looked at.
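Putting the documentation pieces together, the function file at this stage might look something like the sketch below: a title, a blank roxygen line, a description, and the export tag, followed by the function. The function name and the `color = "red"` default are assumptions carried over from earlier in the session, and `theme()` and `element_rect()` are left unqualified, as they were at this point, so the function only works while ggplot2 is attached.

```r
#' Set background color in ggplot2
#'
#' This makes beautiful plots.
#'
#' @export
background_color <- function(color = "red") {
  # theme() and element_rect() come from ggplot2, assumed attached for now
  theme(panel.background = element_rect(fill = color))
}
```

Each roxygen line starts with `#'`, which R itself treats as an ordinary comment; only roxygen2 reads it.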
We've generated some plots. We've used the mtcars data, but it's all just been to get that single plot to work in a single instance. When we're doing operations on the package itself, there's a new package that we'll need, which is called devtools. Obviously if you don't have this already, the function is install.packages and then, in quotation marks, devtools. It's already installed on my system so I'm not going to actually hit go on that. I'm just going to load it. Now, devtools, actually, I'm just going to launch the help file for it for a second. Assuming you've done things like load from a .Rproj file, there's not much in devtools. This is the whole help file. There's not much in it. That's a very misleading phrase. There are not that many functions for something that is this useful. It does have a lot of dependencies, so it builds on a lot of underlying work, but it's actually a very clean package to use. What might not be entirely clear from here is that the things you use all the time are just a single word. Things like check, build, create. I should have used create first. I forgot about that one. In practice, we're only going to use two or three of these very much. The first one we're going to use is called document. I'm just going to hit go. Now, this is giving us an error because basically what it's telling me here is that within the package niceplots, which I've just built, there's a function called background color, and that already exists in the workspace. Of course it does, because we put it there when we were testing it. Just to stop that error coming up, it might make sense to delete this stuff. Actually, a fairly sensible thing to do is to simply cycle your workspace. You can do something like restart R. That should mean the whole workspace is clean now. No, I must have saved it somewhere. Never mind. Let's grab that. We'll remove this for now and we'll try that document again.
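The clean-up and documentation step above can be sketched as follows. The object name `background_color` is an assumption, and the guards around `devtools::document()` are additions for this sketch so that it does nothing when run outside a package directory; in the session the call was simply typed at the console from the package project.

```r
# install.packages("devtools")  # only needed once, if devtools is not installed

# Remove a leftover interactive copy of the function, if there is one,
# so it does not clash with the version defined in the package
if (exists("background_color", envir = globalenv(), inherits = FALSE)) {
  rm(list = "background_color", envir = globalenv())
}

# Rebuild NAMESPACE and the man/ help files from the roxygen comments.
# The file.exists() check is a guard for this sketch, not part of devtools.
if (requireNamespace("devtools", quietly = TRUE) && file.exists("DESCRIPTION")) {
  devtools::document()
}
```

Restarting R has a similar effect to the `rm()` call, provided no saved workspace is restored on startup.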
I've got to load devtools again because I restarted R and so my libraries aren't the same as they were. Document. You notice that's done a few things. It gives us an update. Basically it says, we've checked your documentation and updated it. The main thing it's done is written something called a NAMESPACE file. You remember we deleted that before? We open it now. It has this export background color. You'll recognise that as the name of the function we defined. You'll notice this is the same tag as is written in our file. It's just here. Export means that the function you've written is available for users. That might sound trivial. Of course you want people to be able to use the functions you've written. In practice, I don't know how much it's true for everyone, but in the packages I've written, I've always needed internal functions. That is, things that do useful things, but that you don't want everyone to see. It's not just because they're messy. They can be. They could be perfectly and beautifully written. It's just that you want the user to have as clean a set of functions as possible. Often that has the effect of hiding some of the more complex functionality from the user. It just seems to work seamlessly in the functions that you use. So this export is not a given. You won't use it in every case. Where are we at? We're basically at the point now where the NAMESPACE says that this function should be available to the user, and that function has been defined by this set of code. That's the whole package. That's all it does. Now we want to check that it actually works. Normally, you'd use library, right? Like we did for devtools. I really doubt this will work. It certainly shouldn't. Okay, this should fail.
I don't know what it's actually loaded, but what you should actually be using rather than library, to avoid confusion, is something else. The reason I raise this is because it's possible that there is an actual package called niceplots that someone clever has written, rather than this really small toy example, in which case running library is a mistake. What you want to make sure is that you're loading your local one, and the function for that is in devtools as well. It's called load_all. Again, because it's just choosing your working directory as a package and setting that up, you don't need to give it any arguments. Just go load_all and you'll notice that's come up nicely. Hi, Martin. Would you recommend one function per .R file, or how would you recommend laying out your package? I like one file per exported function. That's another point that comes up here. It's possible that, for example, background color would need functions inside it that are useful to me but no one else. It's also possible that those functions are only used by background color and not by other exported functions. In which case it's possible to write things, oh gosh, I don't know. What would this even be? Say you wanted to do something like choose default color. This is terrible behaviour. Never write something like this for real. But you could do something like Sys.Date. Let's do that. I don't know if people use this; it's one of the base functions. You could say if there is a 3 in Sys.Date, that's a fairly complicated statement but it works, then maybe the default color is yellow. Otherwise, it's red. Again, an insane function to write. Not something anyone would do, but you could put that here. That'll still work. It's just that the default will change depending on where in the month you are and what year it is and a whole bunch of strange things. Let's assume that I'd written something sensible rather than something crazy here. That is a sensible thing to put underneath this function.
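A sketch of the layout just described: the exported function with its roxygen block, and underneath it an internal helper with no export tag. The helper name `choose_default_color`, the way it is wired in through `missing()`, and the use of `grepl()` to look for a 3 in the date are all assumptions; the recording only describes the behaviour.

```r
#' Set background color in ggplot2
#'
#' This makes beautiful plots.
#'
#' @export
background_color <- function(color) {
  if (missing(color)) {
    color <- choose_default_color()
  }
  # theme() and element_rect() come from ggplot2, assumed attached for now
  theme(panel.background = element_rect(fill = color))
}

# Internal helper: no @export tag, so it is not listed in NAMESPACE.
# Deliberately silly, as in the session: the default depends on today's date.
choose_default_color <- function() {
  if (grepl("3", as.character(Sys.Date()))) "yellow" else "red"
}
```

Because the helper sits in the same file as the only function that uses it, anyone reading `background_color()` can see where its default comes from.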
It's not exported, but it only relates to this one. There is another situation, which is where you have functions that are really critical infrastructure to the whole package but are still hidden from the users. They appear in lots of different places and they tend to get their own .R file even if they're not seen by the user. That's my policy. I think other people would have different ones. It depends a little bit. I tend to group my functions by theme. Stuff that's related to setting the background colour would go in one file, but stuff that's related to setting the gridline colours would go in another, for example. The way that you could do things like that is you can put more than one set of roxygen stuff in a single file. It works the same way. How did you choose that only background color is exported and not choose default color, since it's the same file? How is only this function exported and not this one? It's what comes immediately after this export tag. We can do the same again. In this case I'm not going to do anything here, but in that case these two functions would be exported but not this one. You can get really specific and say, actually, you can name it. I haven't tried this and it feels a little risky, but you could in theory just put all your exports on one line. I suspect that would work, but actually the colouring of that tells me that maybe it wouldn't. Not sure. The safest way is to have one set of documentation per function and have it exported at the end. Thanks, Martin. I'm actually going to delete this for a second because I'll probably make a mistake and I don't want to confuse myself. I'm going to save that file and then what I'm going to do... I'm going to change the documentation enough for that to be an error and I'm going to load all again. Now the ultimate check is to say, does this function still work? Let's have a look. We'll type in our same plot code as we did before. Background, and actually no, let's just run that first.
Now I did actually write a note to myself to do this on purpose, but what you'll notice is that it isn't able to find where these functions come from. Right? So I haven't called anything yet. In this case what I'm going to do is call, yeah, in the right place. So I hadn't loaded ggplot2. Let's try that again. That's worked, and then background color worked. Great. And actually we'd expect it to be red because there's no three in the date at the moment. Sorry. Such a dumb example. So that, while aesthetically displeasing, is from a package development perspective the result we had expected. What I think is interesting now, though, is that we're fooling ourselves into thinking this would work, because we actually haven't told the package that ggplot2 exists. You'll notice we've called several of these functions, so we need ggplot2 to make any of it work: theme is within ggplot2 and so is this function element_rect. So in practice any user that tried to use this and hadn't loaded ggplot2 would get a failure, and that's where our DESCRIPTION file comes in.
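Two common ways of recording where `theme()` and `element_rect()` come from are sketched below, as an illustration rather than a record of what was typed in the session. The first uses roxygen's `@importFrom` tag; the second qualifies each call with `ggplot2::`. Either way, ggplot2 also has to be listed in the DESCRIPTION file (for example under Depends or Imports). The name `background_color_qualified` is hypothetical and exists only so both variants fit in one block.

```r
#' Set background color in ggplot2
#'
#' This makes beautiful plots.
#'
#' @importFrom ggplot2 theme element_rect
#' @export
background_color <- function(color = "red") {
  # Inside the built package these names resolve via the NAMESPACE imports
  theme(panel.background = element_rect(fill = color))
}

# Alternative: no @importFrom, every call names its package explicitly
background_color_qualified <- function(color = "red") {
  ggplot2::theme(panel.background = ggplot2::element_rect(fill = color))
}
```

The qualified form makes each dependency visible line by line; the `@importFrom` form keeps the function body shorter and collects the imports in one place.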
So I mentioned before that there's some actually really important stuff that goes in here that's currently not there, and those are Depends and, actually, another one, which is Suggests. I'm going to put Depends in. So basically what we do is we tell our users and the system that we need ggplot2 or none of this stuff will work, and that's self-evidently true because these functions are from ggplot2. To be even more certain that we can get this function to work without the user having to think too much about that, we can tell it one of two things. We can either say import all of ggplot2. That works. It's not particularly tidy, in the sense that there's a lot of stuff in ggplot2 and we don't use all of it; to import a whole extra package to do two things is a little inelegant. You could ignore that completely, just have the Depends, and every time a function comes up you can specify the package that each function comes from, like this. As long as ggplot2 is in Depends, that should work without this import. I'm told in practice the delay that that introduces is like 5 milliseconds. I don't really know how much I should care about that, but that sounds like a delay, so I'm not going to do it that way. So the last option is you say, actually, rather than just import the whole package, I'll tell you specifically what I want from there: I want the function theme, I want the function element_rect, and I want nothing else. You still have to install ggplot2 for this to work, but for your package to work it's only pulling out a few of the functions from it, so it's a little bit neater. So a lot of detail there, but all just to basically let this package know where it's getting its information from. The only reason you ever really need to do more than that... this is useful for the developer as well, for a start, so it tells you where you're getting your information from. If theme was used by six different packages, sorry, the name theme was used by six different packages for different functions, you've
still got a potential ambiguity here, in which case you might add the ggplot2 element back in. That's a very rare instance. I can think of places where the same function name might be used by two packages; it's not common to develop with both of them at once, but it could happen. But it's rare that a function name is used many, many times. So we're now confident that this package will load. The next thing I'm going to talk about is testing, and to me testing and documentation are two sides of the same coin. I used to dislike them. I come from an ecology background. I didn't train as a computer programmer. If any of you have trained as a computer programmer, this will seem outstandingly basic to you. The CRAN package ecosystem doesn't require you to write your own tests to get things on CRAN, and many developers have pointed out that because of that it actively disincentivises testing of your package, because they run tests on your package on their system, and so if those tests fail they can archive your package. If you have no tests, it gets past that problem, but it means your code is probably wrong sometimes and you won't know. The more package development I've done... I came to good documentation first, because I found I can't remember how to use the functions that I've written even though I spend a lot of time in them, and that feels wasteful. And then, as a logical extension to that, tests are bits of software that check that things work the way they should, and so they embody decisions that you've made about the functionality of your package and its behaviour. What I've come to notice, and I'm by far not the first person to notice this, and in fact it's the reason tests exist, is that you forget the decisions you made on your package a long time ago. So, particularly as the number of functions and the number of lines of code increases, it's very easy for you to break some bits of your package without even noticing, and a test is designed to catch that situation where
some great idea you've had that changes a set of functionality mistakenly breaks a whole other set of functions you weren't looking at, and it prevents you inflicting that challenge on your users. Sorry Martin, just going back very quickly to the importFrom stuff. Neil's asked: I find it difficult sometimes to remember which packages my code depends on if I don't use package::function. Is it best to update importFrom every time a new dependency is added, or is there a way to find dependencies in a function you've written without package::function? Yeah, so that is a good point. I have come to use importFrom for that same purpose, that is to say, I use it to remind myself what I'm depending on. At a certain point in a package development process (I'm not sure if everyone does this, I think it's not terrible practice) I have another look back over my dependencies and try to make a decision as to whether they're actually being useful or not, and it's surprising how often you import a package to use one function and then realise there are six other ways to achieve that without relying on that dependency. So my process is that I put it at the top here, and I write this as I'm writing the functions, because I can't remember afterwards. And then when you build, or document, actually, let's run that now. I'll make sure this is saved. Save that. I can't even type. See how it's been writing NAMESPACE? If we click to that now, that's written in there. So the NAMESPACE acts as a lookup to everything that's getting imported. I'm trying to remember, I think that ends up being alphabetical. It might not be, it might be in some obscure order, but I think it's alphabetical. In practice what it means is, because this is automatically generated but DESCRIPTION isn't, it's actually possible, and I've done this myself, to write things into Depends that you later delete, and they get removed from NAMESPACE automatically but not from Depends, and so R
doesn't know what to do with that. It doesn't know that that's wrong, so it continues to think it's depending on things: it imports those packages when you install it but never does anything with them. So actually using NAMESPACE to check back against the DESCRIPTION, and index one against the other, is a useful way to see if your dependencies are right. Neil is flat-out right that being really, really specific, saying this comes from ggplot2 but, I don't know, say this came from somewhere else, is a more visual way of seeing, line by line, what you're relying on. So to some extent it's about what helps you, and to some extent it's style. Like I say, the documentation I've read (I've never benchmarked it myself) is that there's a slight slowdown from adding the package name, but on a scale that most people won't notice. So that's my approach; I think others are just as valid. And actually, just looking in the chat, Neil's right that things like aes() are also functions. That is actually another benefit, specifically for plotting stuff: ggplot2 has more functions than you'd think. If you look at this example we just wrote, that's three functions right there: ggplot(), aes(), geom_point(). Yeah, almost as many functions as there are other things. And that's intentional, right? That's the way that package has been built, to be fairly modular and to have functions like geom_point() where you can specify arguments but you don't have to, and it still does something kind of nice. So you kind of forget that they're functions, and if you're trying to mark each one of them with the package it's from, it just takes time. It's a lot of repeated text as well, which we're often trying to avoid in writing packages. Sorry, I've just had another question come in via email, so someone's watching on YouTube ("a brilliant conference") and they just had a quick question. This person has a Shiny app that uses five or so quite long functions they've written themselves. Will the Shiny app
itself load faster if those functions are removed from the app itself and put into a package that's then loaded from the Shiny app, rather than having the functions written out directly inside the app? Ooh, good question. I'd expect so. You could benchmark it fairly easily, right? You could test both of them just by copying your file and doing each of them. Actually, just while we're on the topic, the microbenchmark package is what I use for this. It is for really fine-scale differences, so it makes sense for testing small functions, and because by default it runs 100 replications, if you're testing loading times on a Shiny app you'll be there a while, so you can change the number of times it runs. The way this basically works (I'm not going to run anything too much now) is that you give it several expressions inside it. You put things within curly brackets and say, well, what's the performance difference? See what that does. Ooh, that's accurate: nanoseconds. I don't know what that other bit is. So yeah, you can just run sets of code within curly brackets, and this is how I tend to use it. In practice it's usually fairly obvious which is going to be the quicker of two solutions, or you can't think of another way to do something anyway, so it doesn't matter how fast it is. So it's not always necessary to microbenchmark things. For that Shiny app example, I'd expect putting it in a package is one solution. There's also another option. It's been a few weeks since I've done a Shiny app, but you can put things in an R folder in there, and I think it uses source() to load that, and I suspect that would be faster than literally putting all your functions within a single script. Oh no, Shiny apps are single files these days, aren't they? I'm remembering them from when they used to be two, when you used to have your UI and your server separately. I would think that putting them in a separate R
folder and loading them would be quicker, but it's largely a guess, and something like microbenchmark would tell you. Thank you very much. Yeah, no worries. And it's funny, actually, I deliberately didn't put anything on Shiny in here, partly because it's hard to keep current because it moves quite fast, but also it's just hard code, right? Maybe in a future session we'll do something on Shiny apps. They're increasingly useful, but also difficult code in some ways. Anyway, getting back to where we are. The last thing I did was document() and then load_all(), so you load that package back in, and then the obvious thing to do (we didn't run a test to make sure that the function works) is to look at whether your help has rendered properly. I didn't spell it right. The benefit of devtools is that, if you're using RStudio, this looks identical to what an actual help file would look like. If you're using the console that ships from CRAN, this will just load in a browser, so it's still useful. Obviously these are very minimalist instructions, but you can see that you've written some help information there, and it's simply been rendered out of this commented code here. And actually it's added things: it's recognised that this is a description without you having to tell it, for example, and it's recognised that this is a title without you having to tell it. So it's done some fairly clever things. It also gives you the name of the function and the name of the package that it's within. If we wanted to go the whole hog and test what this looked like when you actually install it, you could put this on GitHub and install it from GitHub, and then you'd get an index at the bottom here where you could flick between functions from a single package. That can be useful to do when you're a bit more advanced than we are in this case. But we were going to move on to testing next. So yeah, I
was just saying before that documentation, to me, is really helpful to remind myself what arguments something needs. So actually, something that is critical to documentation that I've completely missed here is, oh gosh, what's it called? Sorry, I've got my notes here and I've completely gone blank. There's @name, no, it's not one of those. @param, parameters, I should remember that. That's what the user is actually going to need, and by user I mean you in about an hour's time, when you've forgotten what you've written. Usually the format here is you say the first parameter we've got is colour, and then you tell yourself something about it: this has to be a string that can be interpreted as a colour, e.g. a hex code, whatever. It's surprising how often I forget to do this and come back and go, what is colour supposed to be? Is it a matrix, is it a data frame, is it a string, can it be a factor? It sounds dumb, but it's really easy to do, and having that information in there makes it much more straightforward. It is of course critical for CRAN, so if you ever want to put your packages on CRAN you have to document this stuff as well. I'm going to keep doing the same workflow, just to show that I literally do this every time: document() to build your documentation, load_all() to make sure it's in your workspace, and then you can check your help. Martin, sorry, we've just had a question from YouTube: what's the difference between Depends and Imports in the DESCRIPTION file? Right, yeah, there's three; the others are Depends and Suggests. The two I use most are Depends and Suggests. Depends is things you have to have or the whole thing doesn't work. Suggests is things that, no, I use Suggests generically; what it means is they're useful. Specifically, what it means is that you use them in some of your testing or your documentation, so it comes up a lot when you use vignettes. Imports, I forget the difference between Imports and Depends. There is one; I've looked at it before and gone, that's clever, but I can't remember what it is. So
yeah, I go look at Hadley's book for that, because he's better at this stuff than me. Sorry to not be more help, but I simply cannot remember. Sorry everyone. So yeah, that document(), load_all() workflow is something we'll come back to all the time. In practice you can use the up button to literally just load the previous bit of code you ran. My workflow, particularly with documentation, ends up being that I'll see a typo, or something isn't as informative as it could be, so for example say here that this has to have length one, and then what I'll do in the workspace is literally just use the up button on my keyboard to go back through previous functions: run document() again, run load_all() again, check the documentation, and you notice that's been updated there now. That might sound inefficient, and for something small like that it's trivial, but in practice building effective documentation for something other than a trivial function like this can take a bit of time. If you've got multiple arguments, or, something we should say here, we should tell it what the default behaviour is, for example. For something more complex, building out your documentation so it's as clear as possible takes time, and so actually you don't come back to the console as often as you might think. The other thing is that, even though this text and this text are nominally the same thing, because this is so much easier to read, things that you thought were sensible on the left end up being complex on the right. So this interaction between the unrendered and rendered versions is actually really useful for getting your documentation right. It looks inefficient, but looping through this process of document(), load_all(), check your help is something I end up doing a lot. So yeah, where was I going? It slows you down to write better help, and it slows you down to write more useful tests. So when I was starting writing
packages, and still if I'm just doing something for me and I'm in a hurry, particularly for a proof of concept where I don't know if a function is possible or can work, I won't do this stuff. I'll literally just write some functions, load them, and see if they run. The benefit of documentation and testing is that it makes your future life easier, as well as your users', and so I'm in favour of it these days for that purpose. Particularly for packages you do intend to be more widely used, I think the case is very easily made that that's an important thing to do. So, testing: I'm going to move on to that. Again, feel free to jump in with more questions if you've got any about the stuff we've done so far, and I can go back and show more stuff about functions if people are interested, but in effect the details have been covered there. One thing: if I've got time, I'll come back and talk about how you deprecate functions, but that's not a generic use case. So look, in terms of how to build tests, once again I'm going to rely very strongly on prepackaged solutions to this. There's a package called testthat, which unsurprisingly builds and runs tests, and there's a very simple function; actually, I think the function I'm about to use is within usethis, so let me load that. It got loaded with devtools, but I'll load it again anyway. The function I'm going to call is use_test(). Basically, rather than me saying, oh, you need a file with this name, and within that you need a file with this name, and so on, because that file structure is really important for R to be able to find all the stuff that it needs, usethis simply (something's typed in the wrong place, no, I don't want that) just does that work for you. So in this case I'm literally going to just give the name of the resulting file that I want. The function that we're going to test relates to background colours, so I'm just going to say it should be called background. Best way to see what that
did: I mean, it gave me a description, which is lovely. You'll now notice there's a new folder called tests. You don't need to worry about what this script does at the moment. Within that there's a folder called testthat, and this file here just has some boilerplate text in it to show you roughly what it should look like. There was a question earlier on about naming files and how we group functions together into files. My practice for tests is that if you've got a .R file that contains functions you're using, you should have a corresponding test script, so there should be basically the same number of scripts in your R folder and your testthat folder. My reasoning there is that if you've written a function, you should write a test for it, within reason. Basically this gives an example, but just to walk through what this consists of: it always has the same first nine characters, which are test_that. The first thing that you give it is a bit of text that is a note to yourself. Almost no one else will see this, except CRAN if they run it, but it says what it is that you're trying to achieve, because it's not always clear from looking at code what it is that code is supposed to do. For a function, you write your documentation that tells you. For a test, you just write this single string. So we might say something like, oh, we don't need to write "test that", let's think: background colour changes when we tell it to. Right, so now our job is to write a piece of code that evaluates whether that is true. This requires a little bit of creative thought, and actually working out what are sensible tests requires some thought. Usually it's the set of things that you want your user to be able to do and you want your package to be able to deliver to them, and you write a test that replicates that behaviour. Most of the time when I write functions they're manipulating data frames, or there's
actually a result from it that you can test. With ggplot2, to state the obvious, the functions do return something, but their main purpose is the side effect that they generate a plot. So it's very difficult to just say ggplot() with mtcars again and then run a test on that, because we've produced a plot; we haven't produced any objects to work off. So what I'm actually going to do is take our example from earlier on, putting this in from a separate script. That is, I think, the example we used earlier on. I'm going to delete this example because it's distracting me, and I'm just going to check that it actually works for us here. Okay, that's the example we had before. Like I say, it's difficult to test that, so what we're going to do is output it to an object, which we're calling p. Now if you run p, all you do is get the plot again, but if you look at it, you actually get a really complex object. This is huge. I don't know if anyone's ever tried doing this, but ggplot2 objects are very information rich. They store the data that's plotted, they store all the aesthetics, and there's the data set. In fact, look, that's the whole data set. I guess that makes sense, right, because we have provided a whole data set, but it's not just the aesthetics that we asked it to draw, for example. So in practice, if you've got people's ggplot2 objects, you can extract raw data from them rather than just what was drawn. That's really quite neat. In this case, again, this is fairly cumbersome and normally I wouldn't do it this way, but we can use this. What I've used is str() to find the structure of the data that's in there, and the thing that background colour changes is that the background is red. If we scroll down, we don't need that... scales, no, it's not that... theme. So there's a theme element there, and then panel.background, and then it says the fill is red. So about five layers down in there there's an
object, and we can literally just map this using the dollar sign, because it's a list, right? So we can go down this list, or down the nesting of this list I guess, until we get to the bit that we've changed, which is there, and that should be red if it's worked. So now we need to tell it that red is the correct answer. There's a set of functions all beginning with expect. In this case I'm going to use expect_equal, which we might use quite a lot. First we tell it the thing to look at, which in this case is this element here, and then we tell it what we expect it to be. Just a question, and this is a question from me: wouldn't that break on days, or dates, containing a three? Yeah, it would. And yes, when I wrote the tutorial I didn't think about that nested function inside it. So in practice what you'd have to do, oh gosh, you could do a few things. Let's not keep that; that's what you'd do if it didn't have that unusual behaviour. Now let's think, what would be a better solution? If we rerun this without background colour, what's the default background? We can find that by just going to the same location and seeing what the basic value is: NULL. So I think there might be an expect_null, yeah, right. No, you know what, that wouldn't work, because we want it not to be NULL, right? Sorry, this is my internal model going on here; I'll try and explain a little more clearly what I'm doing. If we don't use background colour, or background colour fails, the value that we test comes back as NULL, so we need to test that it's not NULL. We might use one of the expect functions that evaluates logicals, so expect_true, expect_false. We might say expect_false, is.null, and then the name of that, not that one, this one. At the moment that errors, because it is NULL, but if we put our background colour in, that value is now red, and so you notice now that I ran that bit of code saying expect_false and it didn't do anything, it just
went straight past it, and that's expected behaviour when your tests pass. Does that make sense? Yeah, thanks Martin. Yeah, and as that example might show, there is no right or wrong way to write a test. This is in many ways a simple function, but you're right though. In fact, another way to do it: I guess it's possible you could get an error where background colour generated the wrong colour but still not a NULL, right? There's loads of ways that could happen, maybe, the way I write code. So you could do something like, oh, is there an expect any? No, what you'd say is expect_true, and then you take this thing. I'm trying to think of the safest possible test you could write here, and I always get my %in% things the wrong way around. We know that it could only be yellow or red. Yeah, okay, whatever. Does that work? Yeah, it does. So we could do expect_true that it's one of those. That would work too, and that's a little safer than the NULL one, because NULL is pretty uninformative. So yeah, there's loads of different possible tests that would effectively test the same thing, and this is where you'll get caught out long term: there'll be some really odd edge case where someone manages to make your code break in such a way that it makes your background black, and you didn't test for black because you never thought of it, but it slips past your tests because they're only designed to look for yellow or red or something. So no test can be perfect or complete for any remotely realistic or complex function, but they're a good start, and they encode what you expect to happen, which is a really useful purpose as well. So this is a test, it is checking you haven't made a mistake, but it also is a statement to yourself that this is what your function should do, in the same way that the documentation is a description of how it should behave. As I mentioned earlier on, thinking really hard about how a package should behave, this is part of that
process, and it does slow things down. It does force you to think really hard. It also means that if you change your decisions about how things should behave, it makes it harder to pivot your package to behave in different ways, but once it's there it sort of freezes it a little, so it makes it more robust in that new format. Not sure that freezing metaphor works, but never mind. So look, I won't spend a whole lot more time writing the same tests over and over, but that's the principle there. Basically you could do an endless number of these. You can just use expect functions and call code within them that does manipulation, but my general approach is to generate output that's been manipulated by your function in some way and test the behaviour of that using the expect functions. As long as it's all within these curly brackets, then that's a piece of code that can run, and of course if you have a second test you want to do, you just put it here. I don't even know what that would be. So save that. Now, I guess the obvious point is that we've checked that to make sure it works, so it seems pretty happy with that. Then the obvious thing is how you get it to run, and once again there's one of these lovely short commands from devtools: you run test(). It works, you're fine. I think there's something like 200 or 300 tests in the package I worked on recently, and the more complex the set of transformations those tests have to do to run their checks, and the more of them there are, the longer it takes, so expect that to run for minutes if you're doing something more complex. But what you do see is that it's actually really nicely formatted output. It tells you which package you're working on, and it's got this little table-style format. So: fail, warn, does that one stop?
No, that's fail. I forget. Skip, it says at the bottom, silly me. And pass is represented by an OK. You'll get one row per file, so if we were to write more tests within test-background.R, this number would go up. If we'd written 10 tests, the sum of that row would be 10. Hopefully they'd all be OK, but some might fail or warn, and then for the next file you'd get another row, and so on and so forth. So that's the summary and this is the outcome. If any of these had failed, you would get a detailed description. We can test that by just changing this: we know that this doesn't return black or purple, so if we run that test, you notice now not only do we see that something failed, we see where it failed, not just what file it failed in but which line of it failed, which is enormously useful. And this text that we wrote at the top, which gives a detail about specifically what we're trying to check, gets printed here too, as well as the specific expectation that failed. So it's really detailed, and while it's devastating when you get dozens of these, because you think you're being really clever and you think you've nearly finished a package and then you get errors, this is the level of information that you want to be able to fix that bug. It can be very useful for fixing errors that you've encountered. So that's test. The last thing that I want to raise is a different kind of testing, which is the default testing, called R CMD check. I recently forgot this existed before trying to release a package, and remembered just as I was expecting to submit my package to CRAN, and then went, oh hang on, isn't there a check function? And I had another three days' work fixing all those issues. So it is worth remembering that it does exist. Like I say, we've looked at test: test is the things you write yourself to tell yourself and the user how the function should behave. Check is the stuff that CRAN writes for you, and says no, your package has to behave in this way or it's no use to anyone.
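The background test discussed above might end up looking roughly like the sketch below. This is only an illustration under stated assumptions: background_colour() here is a hypothetical stand-in for the workshop function (its real definition is not shown in this part of the session), and ggplot2 and testthat are assumed to be installed. The path p$theme$panel.background$fill follows the nesting described in the session.

```r
library(ggplot2)
library(testthat)

# Hypothetical stand-in for the workshop function: returns a theme that
# fills the panel background with the requested colour.
background_colour <- function(colour = "red") {
  theme(panel.background = element_rect(fill = colour))
}

test_that("background colour changes when we tell it to", {
  # Store the plot in an object so there is something to inspect
  p <- ggplot(mtcars, aes(x = mpg, y = wt)) +
    geom_point() +
    background_colour("red")

  # Walk down the nested list to the element the function changed
  fill <- p$theme$panel.background$fill

  expect_false(is.null(fill))                # NULL would mean nothing was set
  expect_equal(fill, "red")                  # the exact value we asked for
  expect_true(fill %in% c("red", "yellow"))  # looser check: one of a known set
})
```

Inside a package, the test_that() call would live in tests/testthat/test-background.R and the library() calls would not be needed there; devtools::test() runs the package's tests, and devtools::check() runs R CMD check.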
Again, it's written in devtools, and we'll see what we get. You can see this is a little bit more low level. There's a lot of checks, and some of them are written in language that can be a little bit confusing if you're not familiar with it, and you get a slightly different output at the end, the important point being that these should all be zero if you want to submit to CRAN, for example, or really just to help your users out. The warning we got here is that we haven't given a licence. That's not particularly helpful, but it would be in DESCRIPTION. Oh, there we go, just there. I'm just going to type that; actually, that might even work. You notice that there was something that just came up there saying checking package dependencies, so it'll check that when you've used a function from somewhere else it knows where to find that function. We had a chat earlier on about that. Need a LICENSE file, you see. So this is the sort of thing that you get a lot: you think you can fix something, and then you get a slightly different error. The same is true with test; getting through these takes time. But you notice that these are in decreasing order of severity: errors are worse than warnings, which are worse than notes. Before we had a warning, and now we've got a note, so that is an improvement; it's just that we need to have different information. So I'm seeing someone say, I had an error because I didn't put Depends: ggplot2. Yeah, that would happen. I didn't really focus on that, so I might be leading some of you down the garden path a little bit with this. I think that's, just to go through them all, the major functions that we've used today. Actually, we didn't use build, but we could have done that. Document, for building our documentation. So someone in the chat has just pointed out that they had an error because they didn't put Depends: ggplot2, so I think that's important to highlight. I think you need to make sure your Depends and Suggests are all up to date. Yes,
absolutely. So yeah, Depends is what might not be clear. I guess when we hit document, it writes, and actually we just did it so we can see, it writes to the NAMESPACE. The challenge that I have with package development is there's a lot of moving parts: lots of files, they have to be in specific places, you edit something and it affects something else, and that makes it a little hard to follow. But just to walk through the logic here (that's the test file, we're done with that for now): for a start, we used some ggplot2 functions. We decided that's how our functions work, by calling on ggplot2. Then we told it to importFrom ggplot2 those functions that we used, and then we ran document and that put them in the NAMESPACE. None of that interacted with DESCRIPTION at all, and so it's quite possible to have a really well documented package where, if you haven't added that one piece of text, it doesn't even know ggplot2 exists. I think, look, I didn't develop any of these really useful packages that we're depending on here, but I think it's because so much of the DESCRIPTION is manually written. You have to write your description yourself; you have to write it all. The format for authors is actually fairly complex. For example, often we replace a simple Author tag with some code called Authors@R, which is R code, and we literally write in, using the person class, my name is this, my email is this, your role, and there's loads and loads of possible roles, for example. Anyway, the point being not that that matters particularly, and you can add any authors and stuff. The important point is that there's no markdown version of this; this is the only thing you edit. And so I think the reason that devtools doesn't update it (it would be fairly trivial for it to say, make sure that anything that's getting imported from goes in Depends, but it doesn't) is because it doesn't want to mistakenly overwrite something
you've done yourself. Someone can tell me if I'm wrong about that. What is interesting, though: I didn't write this, and it wasn't here before, so clearly there is something that edits it. And you notice it not only says it needs testthat, but it also says what version of testthat it needs. I remember saying before that Suggests is for things that your package needs to run or to build even if the functions themselves don't use it, and testthat matches that. Because CRAN runs testthat, you have to have it suggested. But that's been added, and I'm not sure at what point. Maybe it came out of testthat when we said use_test. Yeah, I suspect that's what's happened actually. I'm literally just thinking about this for the first time. I think use_test has generated that, so let's have a look at the description. It doesn't actually tell us much, but yeah, I suspect it happened when it set up the file structure we used. You remember that we created this file; I think it's probably added that bit at the same time. But if you use use_test as part of the usethis package, then that does tend to have functions that automatically update your DESCRIPTION file for you. So yeah, right, it's funny, isn't it? You tend to do this stuff once. You build a package, you establish a package once, you set up your tests once, and then after that everything is just adding in new content, and so actually what it does the first time is behaviour you see very rarely. So I didn't actually think about that until now, but maybe that's what's affecting it there. But thanks Chris, yeah, that's really useful, I hadn't thought of it. Right, so look, we're at 8.30 am my time, so I'm glad about that. I might just pause in case people do have some questions. There's a little bit more stuff I can talk about, but that is the core information that I wanted to cover about packages and package development. So does anyone have questions they want to contribute to the chat? Or feel free to speak up, I don't mind
talking if people would rather. We've not had any extras yet, and I've not seen any on Twitter, but I just wanted to give a bit of a shout out to one of the replies on Twitter from Matthew Granger, which answers the question about the difference between Depends, Imports and Suggests. It's quite a helpful link, so I don't know, maybe Neil could add that to the YouTube description as well. Yeah, lovely. Matt knows what he's talking about; people should listen to him on many topics. Lovely, okay, that is helpful, thanks Chris. In which case, we don't have to stay the whole two hours necessarily, but there are a few more things that could potentially be useful. There's two topics in particular, one of which is deprecating functions. This isn't something that will be an issue while you're first developing a package, but it's something we've found particularly with packages where the syntax is experimental. Although there are many possible ways to do something, and you might want to give people different options, it can emerge that you want to retire one way of doing something. So if, for example, we decided that from now on we wanted to create a function that was just called background, and we wanted to have more options. A classic example is that a lot of the ggplot2 things have a colour and a fill, and it's really trivial to amend our earlier function to do that. So let's have a think. Maybe now our default is that the edge border is red and the fill is yellow. Previously we had to literally enter this stuff in the console, but now we know that we can save that file, we can load_all(), and we can rerun our code from earlier on. So now we have fewer characters. It's a little misleading, because it's only the background rectangle that we've changed, and so it's the edges that are red, not all the gridlines. We could change that by going into theme and setting all the gridlines to be red as well, but we don't need to for this illustrative case. But let's
assume that's so useful to us, and we so prefer the language background rather than background colour, that we're going to retire background colour. So maintaining two functions, you've written them both already, so that's one option. But an alternative is to say, actually, we're going to keep background as the function that does all our work from now on, and everything that we're going to retire we're going to make into a shell that just calls our core functions. And so a way that you could do that is: background colour, rather than having its own functionality inside it that you then have to maintain and check and run tests on and all that sort of stuff, keeps the arguments the same, but you just run that through background. Now you notice, though, background and background colour are not synonyms, because previously the aesthetic it changed was fill, and now colour maps to the edge line, not to the fill. And so in practice, to get this to work, you'd have to correct your defaults to match your historic behaviour, not your current behaviour. So previously I think it was fill equals red and, I don't know, colour was black or something like that. And you notice now that there's nothing going on inside this function except it's calling the new one. So if we run our whole cycle again, load_all. So that is, I mean, we didn't have a grey edge, I don't think, I don't really remember, but that's the default that our previous users would be expecting. It's the same syntax they were using, but we've effectively gutted the function, so we don't have to run tests on it, and now we can reassign all our effort to making sure that background works, which is the newer and more flexible function. And in practice what you'd probably do, I'm not going to do this now necessarily, well, why not, is you'd say something like, um, we're not going to use this after version 1.1.1 or whatever, some sort of message like that to suggest that. That stopped it working.
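A minimal sketch of the shell approach described above, in R. The function bodies and default values here are assumptions for illustration only: `background()` simply returns its settings as a list rather than building a ggplot2 theme, and the historic defaults (red fill, black edge) are guesses at the earlier behaviour. Base R's `.Deprecated()` is used to raise the warning that points users to the replacement; it is one option, not necessarily what was typed in the session.

```r
# Core function: all real work (and all tests) live here from now on.
# Stand-in body (assumption): returns the settings as a list instead of
# building a ggplot2 theme, so the sketch runs without extra packages.
background <- function(colour = "red", fill = "yellow") {
  list(colour = colour, fill = fill)
}

# Retired function: keeps its old argument and historic default,
# but is now only a shell around background().
background_colour <- function(colour = "red") {
  # Base R helper: signals a warning naming the replacement function.
  .Deprecated("background")
  # The old `colour` argument used to set the fill, so map it to `fill`
  # and pin the edge to the (assumed) historic value.
  background(colour = "black", fill = colour)
}
```

Calling `background_colour()` still returns what older code expects, but it now warns on every call, and all maintenance and testing effort can go into `background()`.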
Something interesting. Okay, that shows unexpected behaviour, maybe because I put a message in that broke it. I don't know what happened there. Um, alright, I'm not going to look into it in too much detail because that's perhaps quite a rare use case, but it is, but that principle of maintaining previous behaviour while moving on to a new syntax and functionality... Sorry Martin, just, sorry, having deprecated a function before, there is a .Deprecated function that you kind of wrap it in, but it's probably a bit too complex, a bit too much to explain sometimes, but there's manual documentation that explains it. Yeah, lovely. Yeah, um, here's a confession for you: we've done it a bit lately, but someone else in my team handled it. They did an amazing job and the deprecation messages are lovely, but I can't remember how it was done. So yeah, there's some ignorance for you. Um, the last thing that I'm going to raise, because it's come up for me a little bit, I don't know how universal it is, is putting data into a package. So, um, I mean, it's a simple piece of code but it is quite useful to have. So I'm just going to give an example where, let's imagine we wanted to generate some data, oh, and this could be anything. Um, so let's see what we've just created. Nothing very interesting, right, just some information. But let's assume that that's a standard for some critical piece of data that your package needs. Um, obviously the most efficient way to store that in R is with a .rds file, and there's a built-in way to do that, again within usethis. So it's quite straightforward actually to just go use_data, um, a dataset in this case, uh, and then the last thing we'll do is go internal equals TRUE. And the benefit of that, and it tells you what it's done, is that within our R folder now that data exists. It's always called the same thing, sysdata.rda. It doesn't matter what you store in it, it ends up getting called that. And what's amazingly useful about this is that now if we
go remove a and then load_all, we can call a dataset at any time. Um, so someone asked a question earlier on about Shiny apps; this is a case where I've used this before. Um, sysdata can be anything, so I've used it to store, um, like UI information for a Shiny app, for example. It doesn't have to be just data. There are some big packages that are all about data. In fact, I think, um, Wolfgang, who was talking about metadat, I think that has a similar structure built in, um, in which it's very efficient, very small, shares a lot of information very quickly, and it recalls it very fast too. So it's always useful to remember that you can store, um, data as well as functions within this R folder. That said, it does have to be just one .rda file; you can't have sysdata2.rda, for example. So if you have lots and lots of data, then usually you need to put some thought into how that needs to be stored. Um, I find that lists are your friend there: you can have many different kinds of objects within slots of a list and just store it to this .rda. It does mean you need to index things quite carefully, but it does work. Right, that's literally all the stuff I've written code for. Um, I guess the last few things that I want to flag are sort of warnings or points that I've noticed about development, and it's a little bit, um, hard to know how much detail to go into here, but, um, what I might do is share some examples from some stuff that I've done that isn't in RStudio. Hang on. Right, I'm just going to share my browser. So the first thing I want to talk about is naming. So this is the, um, help file for a package that I've been working on a lot lately. It's called galah. It doesn't matter especially what it's for; um, it gets data off the internet, basically, about biodiversity, but that's fine. The important point here is that we put a lot of work into our function naming, and that's not something that's easy to go into in a workshop like this where you're trying to
show someone line by line how you build things, because it gets too big too fast. But basically we made a decision that this package was going to do a lot of things and therefore it needed a careful naming convention for people to be able to remember what to do. So we basically have three types of function now. Um, the whole purpose of the package is to download data. Anything that downloads data has an atlas prefix and then the name of the type of thing that it downloads afterwards: occurrences are a type of data, counts, um, the numbers of them. You can modify what kind of data you get using a function that begins with the galah prefix, which is just chosen because it's the name of the package and people should remember it. In this case the suffixes are the same as for major dplyr functions and they work in the same way, with some exceptions, because, you know, there isn't a geospatial element to dplyr yet, I don't think. And then there's a bunch of things that search or look up information. And so the point of this is not to say that this is the best syntax in the world, because for a start we had a different one a month ago and we changed it because we thought we could do better. So it's certainly not an argument that we've got everything right yet, but it's about helping your user by having names where, for a start, every function is in the same case, it's all lower case with underscores, so they never have to think about how to type it, and that's classic tidyverse of course. Um, there's only a certain number of prefixes, which narrows down the complexity, and the prefix tells you something about how the function works and what it does. You could come up with different rules for different packages. There are different programming paradigms; like, this wouldn't work very well if you were using R6, I suspect. It's a guess, I'm not very good at R6. Um, but, um, helping your user as much as possible with your naming conventions is a good principle, and it's very difficult to do well. So we've done our best there. The
other point that is useful to raise, and one that comes up on Twitter quite a bit, is about dependencies. So this is another package I've written, which incidentally has no tests, so it needs an update; it violates my own rules that I've described to you today. Um, this revtools is, um, you know, I think it was kind of neat when I released it. I think it's still pretty good; it needs a little bit of polish, but it still does what it says on the tin, which is to visualise patterns in text and then let you select points based on their similarity. Um, to do that it has Shiny apps built in, you'll see here shiny and shinydashboard; it has text mining, SnowballC, ngram; it's got the plotly visualisation library; um, and it's got some, um, statistical manipulation tools in ade4, which is an ecological, um, modelling package. It may be clear from that that those packages don't have much to do with one another and therefore have their own dependency trees that are fairly large independently, never mind when you glue them together. In practice, when you actually install this, it's, uh, something like a hundred packages. Um, now that I feel was critical when I built it. There's new packages that have smaller dependency trees that I'm shifting this to, and we'll put that on CRAN as soon as I'm able. Um, some would say that there's two counter arguments. One is that R is pretty stable, people are good at maintaining things, and if you pick packages you trust, dependency trees aren't a problem. The other point is that that's a lot of downloads. Some people run slow internet connections, some people don't have fast processors; this thing does take time and it's annoying and it restricts people's access to your packages. Plus it's possible a package will break and then they can't download it. And this has happened: so I think topicmodels, a package that provides a modelling framework used in here, breaks on some operating systems and then they can't use revtools. It's not entirely common, you know, happens to the best of us, but it does happen. So
look, you notice that this is in Imports, not Depends. That's the difference, isn't it: Depends for R and Imports for everything else. I don't know if that's the correct answer; Matt does, go ask him. Um, so sorry, I got distracted there. Point is, um, you're going to have to think about dependencies once you get into any package that does any large amount of stuff. What conclusion you draw is up to you. Um, we've got a real problem with that at the moment in a specific case with a package called biomod2, which is actually a really useful package that I'd encourage you to check out, and it's actually under active development, they're doing cool work with it at the moment. But we're using it purely for a back-end modelling solution, and you'll notice it includes things like, um, well, for a start it's built on raster, which is going to be out of date soon, but it also uses ggplot2, and it uses the rasterVis visualisation framework. It also until recently, I thought, used shiny, and it seems they've dropped it, so that'll make our life a lot easier. The point being that packages that do everything can be difficult to depend on because they bring their own enormous dependency tree. So if someone were to do some analysis that built on revtools, the better solution would be to say, actually, what is it that I want to replicate, and go to the underlying package that actually does all the work rather than import it through revtools. And there's an open question whether we do the same thing with biomod2. So those are big questions, but ones that you might find. And the last point I'm going to raise before I stop talking, because this is, you know, supposed to be a workshop and I realise that I'm not giving you exercises to code, I'm just speaking now, but nonetheless, is a concept, it's about non-standard evaluation. So this is the idea that you can put in unquoted text like you can in tidyverse functions. If you're using dplyr, for
example, you can give column names without putting them in quotes. I'll stop sharing my screen because it's not important anymore. So that is really intuitive code to write as a user once the package is built; it's very difficult to write as a developer, is my perspective on that, and it requires quite a lot of care in processing objects in some ways that are not entirely straightforward. And so if you're thinking of using NSE, I'd suggest depending very heavily on packages that already do it and just passing things straight to them. So, you know, depend on dplyr and all that sort of stuff; don't try and rewrite it yourself. So that's all the content I had to cover. I realise we've come in at 15 minutes under time, which I hope is acceptable to people, but we do have some time, so if people do have further questions please feel free to jump in. I've had a thank you, it's been extremely useful. Good to know, thanks. It is hard to know exactly, with such an open audience, what to present; it's difficult. I've certainly found it valuable. I fell into developing some packages and I think there are some things that I've definitely picked up from that that are very useful moving forward. It's funny, what I find is everyone I speak to who is even remotely engaged with this sort of process has a slightly different workflow, different tips or tricks. I always learn things from doing this myself, so I'm glad that's been reciprocally useful. I suppose I have to ask, and this is partly from my point of view, but I know you used RStudio for this, but you said you use a different development environment, and I suppose, certainly in this conference, we focus quite heavily on the use of RStudio for a lot of audience members, but it would be, I guess, valuable to hear what other workflows and other tools exist that people use. Oh sure, yeah, look, I mean, I don't know. So RStudio is nice. It's an aesthetic thing; functionally I can't criticise it. It doesn't work very smoothly on my
work computer and that's caused me something like it is not a theme in breaks. What I use is, I just use a text-based, um, a text editor called Atom. Yeah, it's produced by GitHub. It was cool a while ago and isn't anymore, but it integrates well with GitHub, unsurprisingly; it shows you which files you've edited, shows you your file structure very clearly, and otherwise it gets out of your way. So that's my solution, and I put that next to the console that you download from CRAN, and so in effect it ends up looking like RStudio because you've got your script. Yeah, so personally I use Visual Studio Code, which is probably a similar-ish workflow, but yeah, I just think it is valuable to hear what other people use for their development. Yeah, look, there are equally good and better solutions than mine. It's just, you know, I remember a few years ago doing a bit of looking around and settling on that, and it works. There are reasons for and against. I'm definitely the same with VS Code: I used it for developing in other languages and then tried using RStudio and it just was unfamiliar to me, so I didn't want to learn something new. And we have had some more questions that have come in. So one from Claudia, who says, what tutorials or courses would you suggest to get a more comprehensive idea, and a similar question from Constantine, saying what would be your recommendations for going further. Yeah, so look, the obvious answer there is the book by Hadley Frame, Hadley Wickham, sorry, wrong Hadley. Yeah, it's literally just called R Packages; I think you can google "R Packages book" and it will come up. And look, picking up even a digital version of a book is a bit confronting and a bit scary, but I found it easy to read; I can go to the bits that I need information on and it tells me very clearly what I need to know. Yeah, it's free, so I found that really useful. I'll pop the link in the Zoom chat and maybe again Neil can add it to the YouTube description. Yeah, lovely. So, so
recommendation for going further, I guess I'd interpret that in the same way if you're looking at package development. Yeah, other than that, I mean, I guess there's a broader point there, which is that all the tools that I've talked about today weren't around when I started, and actually RStudio didn't exist when I started. And the development team there, I mean, I know that lots of people have said this, but the tools are just outstanding. And there was a bit of a kerfuffle a few months back when it looked like CRAN were going to archive one of their packages, and the response seemed to be, like, of course we were going to fix it, like, we have a bunch of people who know what they're doing and they work on it all the time. And to get that sort of support in an open source software ecosystem is, I mean, kind of unbelievable, right? I was talking recently with a new colleague who works in Python, and of course there's always an argument about R or Python being better, but I said, look, I'd love to learn Python, I find there's a lot that's interesting about it, I've used it before, but it's harder to install and that stops me. And that might sound really pathetic to a lot of people, but the sort of benefits you get from, for a start, having had CRAN be stable and then having the RStudio team developing new content that's really useful all the time, the combination of the two is so powerful, and we're really fortunate to be able to have those resources. And so I'd recommend, like, all their documentation is good, their websites, their ebooks; any time I've got a question I go there. I think from my point of view, you know, get stuck in. I find the easiest way to learn how to get better at developing stuff is perhaps not to start with creating your own package from scratch, because that's quite daunting, but now that you've got the basics of what things look like and what a package looks like, you know, find something you rely on and you use and think, how
would I use this better? Is there something this doesn't quite do that I want it to do? Can I make it do that? And then, you know, that can be a really valuable way of kind of getting that experience and going, yeah, you know what, it's only a small change but actually it's going to make life better for everyone, so I'll do that, and I'll go watch the tutorial on how to use Git and GitHub, and do that workshop, and submit a pull request, and, you know, you'll find yourself in a position to help. I'd completely agree. And actually, look, I've talked a bit about CRAN today and passing tests and checks and all that sort of stuff, and that's useful. You know, start there? I didn't; I suspect you didn't, Chris. Like, for me it was just that I needed to do a lot of the same actions over and over for different people at different times, and putting it in a package just saved me having to copy and paste code, and that's enough, you know. And putting it on your GitHub is a good idea because it's a bit more stable, in that if your hard drive crashes you don't lose it. So you kind of can start there, and you very quickly, like, fall into good practices. These days there's a lot of things that support you doing things well; it's really quite encouraging. Yeah, for me it was: I wanted to draw a PRISMA diagram, and the PRISMA2020 package existed, and I fixed a couple of bugs, and that's what it takes: you just kind of find something you like and go, oh, actually it doesn't quite do this right, and then you fix it. Hmm, someone did that this week for galah, the package I maintain most actively at the moment, and it was amazingly useful, I just wouldn't have found it. Yeah, so, and it might not be clear to beginners, that might appear cheeky or a strange thing to do, but to the developer it's like, you know, someone's used your package, they're engaged enough that they wanted to look at your source code. It's actually really gratifying. So it's the opposite response to what you might expect as a
newcomer. I don't think we've had any more questions. I'll just double check on Twitter just in case, and on YouTube. I don't think we have, because Neil's been pretty good at keeping an eye on it. I don't have the YouTube video on here. Right, yeah, we don't have any. Oh, hang on, we've got just a quick one: can we make if commands in the expect_equal function? That was just a question that came up on YouTube. In the expect_equal function? So yeah, I guess so, I haven't tried it. There's a question from Matt Granger which is really serious and really needs to be addressed as soon as possible: what birds are on your shirt? Flamingos. Flamingos, nice. Originally this shirt was white and it went through the wash with some jeans, so it's a bit blue now. There you go, Matt. I like to sprinkle these details in just for the detailed observers. But yeah, if commands in expect_equal, I would suspect so, because you can pass any function to the expect_equal function, right, so therefore you can. Yeah, you could do, that's a good point. Actually, you could have if, and then nest different expect functions within else. Yeah, it would work, I think. Thanks very much. Right, so I think we are done, if no one else has any more questions. I guess all that remains is to say thank you to Martin for his time today and valuable expertise, and thanks to everyone who joined this live on Zoom, but it's also quite clear that people were engaging through YouTube and Twitter, because the odd question came up through that, so it's really nice to see. Thanks everyone, thanks to the conference organisers, you guys have done a great job.