Hey folks, when I was first learning the tidyverse, one of the things that really caught my eye and was really intuitive to me was the use of pipes. Now, pipes aren't explicitly part of the tidyverse; they come from a package called magrittr. I might be mispronouncing that, that's my Midwestern pronunciation. The pipe character allows us to pipe data from one function to another, and we've seen in past episodes that we can have many, many functions connected with this pipe character. I think, and most people who use it think, that the pipe makes the code far more readable than it would be if we wrote the code in a nested fashion or if we wrote the output of each function to a new variable. That just gets kind of kludgy and really frustrating to use.

Well, with the release of R version 4.1, base R has introduced its own pipe character, and a few episodes ago I got a comment asking me why I don't use the base R pipe. I was familiar with the base R pipe, and I remember reading some of the documentation about it and thinking, I don't feel like that really gains me anything. If you know me well, you know that my brain has a limited capacity, so trying to store information about two different piping systems is just a pain, and unless there's a really good reason to switch to this new piping system, I'm going to stick with magrittr. So what I want to do in today's episode is show you what code looks like without pipes, what it looks like with base R pipes, and then what it looks like with magrittr's pipe.
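As a rough illustration of those three styles (not the code from the episode), here is a minimal sketch. The weather data frame and its values are made up as a stand-in for the real data, and the block assumes dplyr, tidyr, and magrittr are installed.

```r
library(dplyr)    # filter()
library(tidyr)    # drop_na()
library(magrittr) # %>%

# Made-up stand-in for the weather data used later (illustrative values only)
weather <- data.frame(
  prcp = c(0, 5, NA, 12, 3, 8),
  snow = c(0, 10, 4, NA, 0, 25)
)

# 1a. No pipe, nested calls: read from the inside out
nested <- filter(drop_na(weather), snow > 0)

# 1b. No pipe, one intermediate variable per step
no_nas <- drop_na(weather)
stepwise <- filter(no_nas, snow > 0)

# 2. Base R pipe (needs R 4.1 or newer)
base_piped <- weather |>
  drop_na() |>
  filter(snow > 0)

# 3. magrittr pipe
magrittr_piped <- weather %>%
  drop_na() %>%
  filter(snow > 0)

# All four approaches should keep the same rows
identical(nested, stepwise)
identical(nested, base_piped)
identical(nested, magrittr_piped)
```

The two piped versions read top to bottom in the order the steps happen, which is the readability argument made above.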
Now, you may know this if you've dug into magrittr, but the pipe we typically use is kind of the vanilla pipe, that percent greater-than percent sign (%>%). But magrittr actually has a variety of other pipes and really useful alias functions that go along with the pipe and work great in a pipeline to help you create beautiful, readable code.

Over here in RStudio you can see that I definitely have a version newer than 4.1.0. If you don't have 4.1.0, be sure to update your R version; the latest version would probably work great. I'm working with 4.2.1, which was released on June 23rd of 2022. There might be a newer version, I don't know. This one has the great name of Funny-Looking Kid. I've got a few funny-looking kids running around my house.

Anyway, I have an R script going called pipedemo.r, and I'm going to start it by sourcing code/local_weather.R. It doesn't really matter what we use for this episode; I just want a data frame, a tibble, that I can work with to demonstrate using different types of pipelines. If you want to get this R script as well as pipedemo.r in its final form, down below in the description is a link to a blog post that will show you the links to get the code at the beginning of this episode as well as at the end of the episode.

So that script ran through, and looking at the output here of local_weather, I see I've got the date, the tmax, the amount of precipitation, and the amount of snow. prcp and snow are in millimeters, tmax is in centigrade, and the data goes back to October 1st of 1891. I can of course wrap local_weather in tail to get the last six rows of the data frame and see that this data comes to us through August 13th of 2022.

So the driving question that I'm going to use to demonstrate the use of pipes is the goal of calculating the correlation between prcp and snow. I'm going to start with base R without using any pipes. OK, so we'll take local_weather, and I want to get those rows that don't have an NA value and where
snow is greater than zero. To do that I'm going to use the square brace notation that allows you to index into the rows and columns of a data frame. The stuff on the left side of that comma is the rows, the stuff on the right is the columns. I'm going to leave the columns blank, so nothing to the right of that comma, so that I get all of those columns back. But on the left side I'm going to put in some logic so that I can find the rows that I want.

I'll start by removing the NA values, so I say !is.na on local_weather$prcp and !is.na on local_weather$snow. If you're not familiar with this notation, local_weather$prcp returns that prcp column as a vector of values. Then if I do !is.na, anywhere that you see an NA is going to come back as FALSE: is.na on an NA value will be TRUE, and the exclamation point will make that FALSE. So now we see we have TRUEs and FALSEs, and when we combine this all together between prcp and snow with an and, we find those cases where either of these is FALSE, and those rows get removed. If we look at the output of this, we now find a data frame that doesn't have those rows containing any NA values for prcp or snow.

The next thing I want to do, though, is remove the rows where snow is zero. To do that I'm going to clarify the logic here by putting this and statement in parentheses so that R executes that first, and I now want local_weather$snow to be greater than zero. This removes all of the rows where snow was zero.

So we can see that this is getting a little bit complicated. Once you break it down, what's going on here isn't that complicated, but reading the code is a little bit problematic. It's just not as intuitive as what we're used to seeing with tools from dplyr and the rest of the tidyverse. All right, so the next thing I
want to do is take prcp and snow and look at the correlation between them. So I will assign this to a variable that I'll call no_na_no_zero. That's a great name, huh? Let's break this up across different lines so we're not getting all weird with scrolling. Of course, if we look at no_na_no_zero, we get back that data frame. So we'll do cor.test, and I'll give it no_na_no_zero$prcp and no_na_no_zero$snow. This gives us a correlation of 0.64. Cool.

Another way that we could have written the same thing would be to copy this down and use the formula notation. I'm going to take out that no_na_no_zero and do data equals no_na_no_zero. We can put this in the formula notation by doing prcp plus snow, with a tilde before that. This gives us the same result, but what it allows us to do is not have to write this ridiculous variable name over and over again.

So you get the sense, let me clean this up a little bit, that this isn't so bad, but this logic statement gets a little bit messy. We have to create this variable that we're then feeding into cor.test, which you might not really want to do. And again, this is a fairly simple thing that we're doing here in base R.

Before I start using pipes, I want to show you another approach to doing the same thing that I think illustrates some of the problems with this type of approach. This might not be exactly how you would do it, but I've written code like this for more complicated examples, so I think it's worth showing. Again, we'll do local_weather, and we get our data frame. One of the first things I could do to clean up local_weather, using tools from the tidyverse but without using the pipe, would be to do drop_na on local_weather. Of course that removes all of our NA values, and then I could assign this to a variable that I'll call no_nas. And so the
nice thing is that this already cleans up the code considerably. Instead of all this, I say drop_na and I remove those rows. So that's nice. I output that to a variable I'll call no_nas. I can then do filter on no_nas, and I can add this requirement that local_weather$snow be greater than zero. This is giving me an error message, of course: the input must be size 38581 or 1, not size 47505. So this is of length 47505 and this is of length 38501. That's because this is local_weather$snow, not no_nas$snow. If I did no_nas$snow, that works great. And actually it's easier than that: I don't even need to specify the data frame, because filter is going to grab snow directly from no_nas. That's the same output we had before, and we now see that snow doesn't have any zero values. I'll then call this no_nas_no_zero, and we're right back to what we had up here.

The nice thing about this is that the syntax is really intuitive. It's descriptive; there are these verbs that tell you what's happening, which isn't so clear up here in this base R notation where we're using these Boolean terms. Of course, I could go ahead and take cor.test down here to get the same correlation value that I had up above. So that's basically the same idea.

So how would we convert this into a pipe? Well, we could take local_weather, use the base R pipe, which is a vertical line and a greater-than sign (|>), and then say drop_na. This gives us our data frame with no NA values. Cool. We can now add to that pipeline by again using the base R pipe and then doing filter snow greater than zero. This removes those rows that had zero in the snow column. Now I want to extend this to do the cor.test. So I could do cor.test, and we could think about doing tilde prcp plus snow, and then data
equals... well, what should data equal? With the magrittr pipe, what we would normally do is put a period here. That would be our intuition, because the magrittr pipe allows us to use the period to indicate where the data should go. If you've watched my previous episodes where I do something like an inner join, I'll frequently use that period to indicate that the data coming through the pipeline should be on either the left or the right side of that inner join. But of course when I run this, it complains, because it says that this must be a numeric value.

So the downfall of the base R pipe here is that you can't direct the flow of the data. You can't indicate what argument it should go to; it only goes to the first argument of that function. If I look at the help for filter, you'll see that the first argument for filter is .data. That's the data that's being filtered, and we saw that up here, where the first argument was no_nas from the previous step. So the problem with the base R pipe is that I can't put the data into a slot other than that very first slot of the arguments. OK, so this won't work. What I'll have to do then is call this no_nas_no_zero, break the pipeline here, and again put that no_nas_no_zero in the place of the data. Let's go ahead and run these two steps, and again we get the same correlation value.

The next thing that we'll do is use the magrittr pipe. Again we'll take local_weather, let's get a little bit of extra space in here, and pipe that, and again that's the magrittr pipe, the percent greater-than percent (%>%). We can then do drop_na, and we get all those NA rows removed. We can then pipe this to filter snow greater than zero, which removes those rows that don't have snow.

One other thing that's kind of cool about the magrittr pipe is that you actually don't need the parentheses for the drop_na function. If I ran those two
lines, it removes those rows that had NA values. I like to leave the parentheses in to indicate that it's a function. If I do that with the base R pipe, where I leave out those parentheses, it complains: it requires a function call as the right-hand side. So that's a subtle difference that I don't think is so important. I always put in the parentheses for my functions even if there aren't arguments, but if you're using the magrittr pipe you don't need the parentheses if there's no argument.

OK, cool. So we're right back to where we were up here using the base R pipe. What I'd like to do again would be to pipe this to cor.test and then do tilde prcp plus snow, data equals period. And that works; that gives us back the correlation that we've been seeing all along.

I'm going to slightly tweak this. I'm going to copy this down, and if I had instead done cor.test(prcp, snow, data = .), so instead of using the formula notation I use the x and y arguments for cor.test along with data, running that, it complains that object prcp is not found. So the challenge is that at this point the data going into data has a prcp and a snow column, but cor.test, for whatever reason, can't see these column names. What I can do instead is use a slightly different pipe that comes to us from magrittr, which is the percent dollar-sign percent (%$%).

Running this, however, I see that I don't have this function. The pipe is actually a function, and this exposition pipe, as it's called, I'll write that out, exposition pipe, only comes from magrittr. Only the normal pipe, as it's called, comes in when we load the tidyverse. So I'll come back up here and do library(magrittr) to get that loaded, and now when we run this, it works great. Again, what the exposition pipe is doing is allowing cor.test to see the column names in the data. This also often comes up when people are using the lm function to
create linear models, but it works well here for cor.test as well. I think in the previous episode I created a variable that had the filtered data, and then I used that data frame explicitly as the argument to a data argument like this in cor.test. I didn't have to do that; I could have used that exposition pipe, that %$%, after loading magrittr. So that's pretty slick.

For the most part, when we use the normal pipe, the %>%, which is what I consider the normal pipe, it's one direction. The pipe goes from point A to point B, and while it does different operations in between, there are no bifurcations. Well, there actually is a tee pipe that comes to us from magrittr that allows you to bifurcate the pipeline, in a way. I'm going to grab a couple of lines of this to demonstrate the tee pipe, where we take local_weather, drop_na, and filter, and then I'm going to use the tee pipe, so it's percent T greater-than percent (%T>%). I could then do plot, and then pipe that to summarize, and we could say total_prcp and do sum on prcp. What we get out, obviously, is this total precipitation of 17682 millimeters, but we also see the plot out here. Something to keep in mind is that the plot function is a base R plotting tool that doesn't have any output; it doesn't return anything to the screen.

Something that you might want to do is say, well, what if instead of plot I want to do something like ggplot? We could do aes(x = prcp, y = snow) and then plus geom_point. That's something that I would like to do. Unfortunately, that complains, because, I think, the setup of ggplot and how it adds things together doesn't do a good job of working with that tee pipe. In general I don't find a lot of good use for the tee pipe. What I would say is
create this as a variable, what I have highlighted here, and then feed that into ggplot and separately feed that into summarize. One place where this could be useful, though, would be to do something like print. print will output to the screen, and we now see basically what the pipeline looked like after the filter. Perhaps you could use this to help you debug what's going on. I could put another tee pipe in there and output the data frame at these different steps in the pipeline. So here it is with the NAs removed, here it is with those zeros in snow removed, and then here is the summation of all that data. Again, I don't get a whole lot of use out of the tee pipe. Your mileage may vary.

If you remember, a number of episodes back I talked about making distance matrices, and one of the things I was always having to do was break out of the pipeline to add row names or remove row names or things like that. Well, it turns out that there are tools like that built into magrittr that make it easy to use these functions within a pipeline. So for fun, let's make a 96-well plate where we label the columns with letters and the rows with numbers. What I might do would be 1 to 96, and that gives me that vector. I could then pipe that into the matrix function, and of course that gives me a 96-row matrix. So let's do ncol = 12, and now, giving us a little bit more breathing room, we see that we have 12 columns and eight rows. So this is a matrix.

If you go to the aliases page on the magrittr.tidyverse.org reference page, and I'll put a link on the screen here so you can better see what's going on, there's a variety of aliases that you can use to do different manipulations in the pipeline. Let's go ahead and create those labels. We can use set_colnames to set the column names and set_rownames to set the row names. So again we'll come back to our pipeline, and I can do set_colnames, and we'll do letters, or rather all-capitals LETTERS, 1 to 12. Now we see we've got those column names.

Before, what I'd have to do would be to say this is my plate and then set colnames of plate equal to LETTERS 1 to 12, which is just kind of messy. Let me show you real quick. If I assigned that to plate, I would then have to do rownames on plate being 1 to 8. And now if I look at plate, I see I've got my row names there. But I have to break out of the pipeline to do that, and that's just not necessary. Instead, what we could do would be set_rownames 1 to 8, and if I run all this, I get the same output as having run rownames on plate. So now we've created a pipeline to create and label a 96-well plate.

Something else we could do is pipe this to an add function. I could say add 10, and now I've added 10 to all values of that matrix. Or I could do subtract 10 to remove 10. I could do multiply_by 10, it's spelled multiply_by, and then divide_by 10. So there's a variety of these alias functions that allow us to manipulate the attributes and the values of this matrix.

I can convert this from a matrix to a data frame by doing as.data.frame. So now it's a data frame, and then I could say use_series and put in D to get back the D column. That gives me a vector of D values. As the documentation shows you, use_series is a lot like using the dollar sign. Alternatively, I could do extract on D, and for that I need to put D in quotes. That is a lot like select on D; it gives me that column. And if I do extract2, that's equivalent to using two square braces, which basically extracts a vector from a list, and so I go back to having that vector. So again, I just want to expose you to these different alias functions that come to us from the
magrittr package to enable us to make more readable and more attractive pipelines, versus the things that we saw way back up here when we were nesting Boolean logic or creating temporary variables that we were then feeding to downstream functions. It's just a little bit too kludgy, as opposed to when we start using the magrittr pipes: the normal pipe, the exposition pipe, or, as I showed you down here, the tee pipe to bifurcate the pipeline. Again, I don't find that the tee pipe does a whole lot to help me, so your mileage may vary there. I encourage you to experiment with that.

There's another pipe that uses an exclamation point that's called the eager pipe. To be honest, I don't totally understand it, and I've never really had a great use for it. This gets into what's called lazy evaluation. I've filmed a couple hundred of these videos so far and I've had no need to be too concerned about lazy evaluation, so I suspect you might not either. But if that's something you worry about, know that there is that other pipe out there, the eager pipe.

I almost forgot that there's one more version of the pipe that I'd like to share with you. Again I'm going to grab these three lines where we did the drop_na and the filter and bring them down, and again we see that with local_weather we filtered out the NAs and we filtered out the zeros for snow. Not necessarily in this situation, but sometimes we would like to save this back to a variable. So I could save this as clean_data, and now I come down here and I have that data stored as that variable. Well, maybe I don't want to save it as clean_data; maybe I actually want to write it back over local_weather. What would we do then? Well, we could imagine taking this down, and instead of using the right-pointing
pipe, we could use the less-than and greater-than signs within the pipe character (%<>%). Basically what's going to happen is it's going to take local_weather, feed it into drop_na and then into filter, and then write it back over local_weather. So local_weather looks like this now, but when I run the pipeline and come back and run local_weather again, it's now been cleaned up by going through these two steps.

I'm not a big fan or a big user of this assignment pipe, but know that it is available if you're doing a whole lot of cleaning up of your data and you don't want to be creating another variable. Ultimately it makes me a little bit uneasy to use the same variable as both the input and the output, because say I screwed something up in here; well, then I've got to go back and regenerate local_weather and then come all the way through here. That's just an extra step, and it's not that much to store the data at least once as a temporary variable.

So I hope you found this interesting. Again, for my use, I don't know that there's really a great benefit to going to the base R pipe. On the other side of the ledger, thinking about trade-offs, people will say that the magrittr pipe has slightly slower performance and that it requires you to load magrittr. These are not things that I'm really concerned about. I'm not worried about that level of performance enhancement to go and use the base R pipe. My fingers are so well trained that you don't know how hard it is for me to write that base R pipe versus the magrittr pipe. Anyway, let me know if you find a situation where you do prefer using that base R pipe. At the end of the day, I feel like the magrittr pipe is a lot more powerful for the types of things I do, that main difference being the ability to insert data from the pipeline at a specific point that's not that first
argument. All right, practice with this, and tell your friends all about the different piping options. I think if you can explain to them what I've talked about today, then you will definitely be well on your way to understanding the different types of pipes and what you can do with them here in R. Keep practicing, and we'll see you next time for another episode of Code Club.
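
For reference, here is a minimal sketch of the magrittr pipes and aliases covered in this episode. It is not the script from the video: the snowy data frame and its values are made up so the block runs on its own, and base R's subset stands in for dplyr's filter so that only magrittr needs to be loaded.

```r
library(magrittr)

# Made-up stand-in for the cleaned weather data (illustrative values only)
snowy <- data.frame(prcp = c(5, 8, 2, 11), snow = c(10, 25, 3, 40))

# Normal pipe: the period marks where the piped data should go
r_dot <- snowy %>% cor.test(~ prcp + snow, data = .)

# Exposition pipe: exposes the column names to the right-hand side
r_expo <- snowy %$% cor.test(prcp, snow)

# Tee pipe: print() runs for its side effect, then the data keeps flowing
total_prcp <- snowy %T>% print() %>% use_series(prcp) %>% sum()

# Assignment pipe: pipe the data through and write it back over the input
cleaned <- snowy
cleaned %<>% subset(snow > 5)

# Aliases: build and label a 96-well plate without leaving the pipeline
plate <- 1:96 %>%
  matrix(ncol = 12) %>%
  set_colnames(LETTERS[1:12]) %>%
  set_rownames(1:8)

# Arithmetic aliases work on every value of the matrix
plus_ten <- plate %>% add(10)

# use_series() is like $, extract2() is like [[ ]]
d_dollar <- plate %>% as.data.frame() %>% use_series(D)
d_braces <- plate %>% as.data.frame() %>% extract2("D")
```

Both cor.test calls should report the same correlation estimate; the only difference is how the columns reach the function.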