Hey folks, if you've been following along in recent episodes of Code Club, you know that I've been trying to highlight a variety of tools from base R that you might find useful in your own data analysis. We often spend a lot of time on tools from the tidyverse or other R packages, but the reality is that those are built on base R, and base R still has a lot of functionality that you need to know to be successful with R. The application I've been using to demonstrate these base R tools is reading in a lower triangular, or PHYLIP-formatted, distance matrix. It shows the ecological distances, that is, the community dissimilarity, between pairs of mouse fecal pellet samples. If you have no idea what I just said, stick around, because I think you'll still get something out of today's episode. At this point, we have a matrix. A matrix is a vector with something special, called an attribute, and the attribute it has is its dimensions. So we might have a 20 by 20 matrix: those 20 rows and 20 columns are stored as an attribute. We'd also like to add names for the rows and columns of our matrix, and, you know what, those are also going to be attributes. So in today's episode, I'm going to show you how to find the attributes of any variable in R, as well as how to modify those attributes so they're as useful as possible for your application.

To help us get going with attributes and how they relate to different data structures, I'm going to use one of R's built-in data sets, a data frame called mtcars. I talked about it a few episodes back when we were answering a question on the RStudio Community forum. It comes from Motor Trend magazine, hence the name mtcars, and I think it dates back to the 70s, looking at different aspects of a bunch of different cars.
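Here's a minimal sketch of that idea using only base R: a 20 by 20 matrix whose only attribute is dim, with row and column names then stored in a second attribute, dimnames. The values 1:400 and the "sample1" to "sample20" labels are made up purely for illustration.

```r
# A 20 x 20 matrix built from the vector 1:400 (illustrative values only)
m <- matrix(1:400, nrow = 20, ncol = 20)

# The only attribute so far is "dim": the number of rows and columns
attributes(m)    # $dim, holding 20 and 20

# Row and column names are stored in a second attribute, "dimnames"
rownames(m) <- paste0("sample", 1:20)   # hypothetical labels
colnames(m) <- paste0("sample", 1:20)
names(attributes(m))   # returns c("dim", "dimnames")
```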
It doesn't really matter. What's important is that mtcars is a data frame. So how do I know that it's a data frame? There's a really useful function called str(), which you can think of as short for structure. If I run str(mtcars), the output is kind of the transpose of the data frame: instead of columns, the variables are shown as rows. We can see that it's a data frame with 32 observations of 11 variables, and then it lists those 11 variables. If you remember my episode about lists, you'll know that a data frame is a list of vectors where each vector is the same length. If we wanted the values from any of these variables, we could of course use mtcars$wt, which gives us all of the values of wt from mtcars. That's pretty basic, and we've seen it in previous episodes when we talked about how to get at the elements of a list. What we don't see in this str() output, at least not directly, are the attributes. If I run attributes(mtcars), it outputs all of the attributes and their values for mtcars. We see the names, which are the names of those 11 variables, the column names. We also see the row names, and we see the class. I could then run attributes(mtcars)$names, which gives me the names of the columns. Hopefully you can see how I did that: attributes(mtcars) returns a list, and, as we've seen before, I can use the dollar sign to pull out the part of that list I'm interested in. But again, what I'm trying to highlight here is that we can use attributes() to see all of the attributes of mtcars.
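A short sketch of the commands just described, run on the built-in mtcars data set. The name a is just a local variable I'm using to hold the list that attributes() returns.

```r
# mtcars ships with base R in the datasets package, attached by default
str(mtcars)        # a data.frame with 32 observations of 11 variables
mtcars$wt          # the wt column as a numeric vector

# attributes() returns a named list, so $ pulls out one element
a <- attributes(mtcars)
names(a)           # the attribute names: names, row.names and class
a$names            # the 11 column names
a$class            # "data.frame"
```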
The nice thing about the attributes() function is that it shows you all of the attributes connected with a variable. If you don't need to see all of them, another option is the attr() function. We give attr() the variable we're interested in, mtcars, and then, in quotes, the attribute we want to see. So attr(mtcars, "names") returns, a little more compactly, the names of the columns in mtcars. That's one way to get the data out, but we can also modify it. Let's look at element five of the names, which is drat, and rename it d_rat with attr(mtcars, "names")[5] <- "d_rat". Nothing is output and there are no error messages, which is always good, but we're not quite sure what happened. If I now run attr(mtcars, "names") again, I see that I've modified the fifth element of the names. I could also rename all of them with attr(mtcars, "names") <- LETTERS[1:11]. LETTERS is a built-in vector of the capital letters A through Z, and I'm taking the first 11. Now if I look at mtcars, I see that attr() changed the column names to the 11 capital letters A through K. K is the 11th letter of the alphabet, if you ever needed to know that and didn't want to count on your fingers. Anyway, that's how we can use attr() both to get at attributes and to change them. I don't know that anybody really does it this way, though; this is a much more fundamental level of working with attributes. There are other functions that make this easier. To get the column names out of mtcars without the somewhat wonky attr() function, we can run colnames(mtcars), which gives us all of our column names.
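Here's a sketch of the attr() getter and setter steps. One choice on my part: it works on a copy called my_cars (a hypothetical name) so the built-in mtcars stays as shipped. Assigning to mtcars directly at the console would also leave the datasets package alone, because R creates a modified copy named mtcars in your workspace.

```r
my_cars <- mtcars                    # hypothetical copy of the built-in data

attr(my_cars, "names")               # the same 11 column names as names(my_cars)

# Rename one element of the "names" attribute: the 5th column, "drat"
attr(my_cars, "names")[5] <- "d_rat"
fifth <- attr(my_cars, "names")[5]   # "d_rat"

# Replace all 11 names at once with the capital letters A through K
attr(my_cars, "names") <- LETTERS[1:11]
colnames(my_cars)                    # "A" through "K"
```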
We can also use colnames() to assign values. Let's run colnames(mtcars) <- letters[1:11]; letters is the lowercase counterpart of LETTERS, a through z. Now if we look at mtcars, we see, as expected, the lowercase letters a through k as the column names. What I want to highlight here is that you can use attr() to get or change specific attributes of a variable, but there are often also functions like colnames() and rownames() that get or set the names of the columns or rows of a data frame. So we could run rownames(mtcars), which returns the 32 names of the cars in the data set. To repeat this and show it in a different context, let's make a variable x that's a vector from 1 to 20. We've seen in previous episodes how you can create a named vector with the c() function, writing something like c(a = 1, b = 2), and then look up a value by its name. Let me show you before we get too lost here. If I run str(x), I see that it's an integer vector from 1 to 20. If I run attributes(x), I see that x has no attributes: it returns NULL. Likewise, attr(x, "names") comes back as NULL. But I can assign to it with attr(x, "names") <- letters[1:20]. Now attr(x, "names") gives me those 20 letters, and if I print x, each element is shown with its name above its value. That allows me to write x["k"], and what we should get back is the number 11. Sure enough, we do. So we've changed the attributes of x to include names: if I run attributes(x), I see a names vector that goes with x, and if I run str(x), I see a named integer vector holding the values 1 to 20.
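A sketch of the named-vector steps, plus the colnames() and rownames() functions on a data frame. As before, my_cars is a hypothetical copy of mtcars so the built-in data isn't touched.

```r
x <- 1:20
attributes(x)                  # NULL: a plain vector carries no attributes

# Give the vector names through the generic attr() interface
attr(x, "names") <- letters[1:20]
x["k"]                         # the element named "k", whose value is 11
str(x)                         # now reported as a named int vector

# The dedicated accessor reads the same attribute
identical(names(x), attr(x, "names"))   # TRUE

# colnames() and rownames() play the same role for a data frame
my_cars <- mtcars              # hypothetical copy of the built-in data
colnames(my_cars) <- letters[1:11]
length(rownames(my_cars))      # 32 car names
```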
It also has a names attribute holding the letters a through t; t is the 20th letter of the alphabet. So this shows a bit more of what's going on under the hood with attributes and the data connected to our variables in R. Now, I would never actually write attr(x, "names") <- something, because that's too much work. Instead I would write names(x) <-, and let's change these to capital letters with LETTERS[1:20]. Now if I look at x, I've changed the names attribute using the names() function. One of the reasons I like knowing about attr() is that I don't have to keep track of all of these special functions like names(), colnames(), and rownames(); I can never remember whether colnames has a period in it or not. With attr(), I at least have a common interface for working with attributes, although functions like names(), colnames(), and rownames() certainly make things a lot easier. OK, so I've got x. I can get rid of those names again by running names(x) <- NULL. Now if I look at x, it has no names, and str(x) shows that it no longer has that attribute attached. Let's change another attribute of x: its dimensions. As a plain vector, x doesn't have a dim attribute at all. We could run attr(x, "dim") <- c(4, 5), and what we should get back is a four-row by five-column matrix of our values 1 to 20, filled in column by column. Sure enough, if we look at x, that's what we've got. So x has been modified: it's no longer a one-dimensional vector like we had before; it's a four-row by five-column matrix. What if I want to change the dimensions, so that instead of four by five it's 10 by two? After attr(x, "dim") <- c(10, 2), if I look at x, I have 10 rows and two columns.
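A sketch of removing the names and then reshaping x by setting its dim attribute, using only base R. R fills a matrix column by column, which is why the first column of the 4 by 5 version holds 1 to 4.

```r
x <- 1:20
names(x) <- LETTERS[1:20]      # convenient setter for the "names" attribute
names(x) <- NULL               # assigning NULL removes the attribute
attributes(x)                  # NULL again

# A plain vector has no "dim" attribute; adding one makes it a matrix
attr(x, "dim") <- c(4, 5)
is.matrix(x)                   # TRUE
x[, 1]                         # filled column by column: 1 2 3 4

# The same 20 values, reshaped to 10 rows and 2 columns
attr(x, "dim") <- c(10, 2)
dim(x)                         # 10 2
```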
So I can really easily change the structure of x. One catch: if I try c(10, 3), R complains, because 10 times 3 is 30 and I only have 20 values. It can't do the impossible, so to speak. Instead of attr(), we can also use dim(). If I run dim(x), I see I have 10 rows and two columns, and I can also assign a vector to dim(x), so let's go back with dim(x) <- c(4, 5). Now x is a four-row by five-column matrix. This may be a bit confusing, but I want to highlight again that we have the general attr() function, and there are also special accessor and setter functions that make it easier to change the attributes of our data. In this case, we've turned our vector into a matrix, which highlights that a matrix is a vector with special attributes: attributes that give it two dimensions. That's another little tidbit about how these different types of data relate to each other in R. Now let's say I want to give row names and column names to x. I could use attr(x, "dimnames"); again, the name in quotes is the attribute you want to get or modify. Right now it returns NULL, so there are no row names and no column names for my matrix. What I want to give it is a list: the first element holds the row names and the second holds the column names. So I can run attr(x, "dimnames") <- list(letters[1:4], letters[1:5]), which should put a, b, c, d on my rows and a, b, c, d, e on my columns. Now if I look at x, sure enough, I've got my row names and my column names. But why would you do it that way? Instead, we could run rownames(x) <- LETTERS[1:4], using the capital letters this time, and colnames(x) <- LETTERS[1:5].
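A sketch of the dim() setter, the error for an impossible shape, and the two ways of setting dimnames. I wrap the failing assignment in tryCatch() only so the example keeps running; the exact wording of R's error message can vary by version and locale, so I don't reproduce it here.

```r
x <- 1:20
dim(x) <- c(10, 2)

# A shape whose product (10 * 3 = 30) differs from length(x) (20) is rejected
msg <- tryCatch({ dim(x) <- c(10, 3); "no error" },
                error = function(e) conditionMessage(e))
msg                            # R's error message about mismatched dims

dim(x) <- c(4, 5)              # back to 4 rows by 5 columns

# Row and column names share one attribute, "dimnames": list(rows, columns)
attr(x, "dimnames")            # NULL before anything is set
attr(x, "dimnames") <- list(letters[1:4], letters[1:5])

# The dedicated setters each replace one component of dimnames
rownames(x) <- LETTERS[1:4]
colnames(x) <- LETTERS[1:5]
attributes(x)                  # $dim (4 5) and $dimnames (a list of two vectors)
```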
If we look at x now, we see that both the row names and the column names have changed. And if we run attributes(x), we see the dim attribute with the dimensions four and five, and the dimnames attribute, which is, sure enough, a list holding those two sets of names. So far we've been talking about attributes that have specific setter and accessor functions, like rownames(), colnames(), dimnames(), and so forth. But maybe I want to attach some other type of information to my data as an attribute. I could run attr(x, "created_by") <- "Pat Schloss". Now if I look at x, I see that it has a special attribute at the bottom of the output, created_by, with the value Pat Schloss. That's pretty slick: I can attach attributes to my data that aren't what you'd typically think of as part of a generic matrix. And I can get that value back with attr(x, "created_by"), which returns Pat Schloss. The attribute could be a single string like this, a vector, a list, what have you. This is a really convenient way to add information to an existing data object in R.

Now let's look at the R script that we've been building as we've gone along, and run all of it. If you want to know how it all works, I encourage you to go back and watch the earlier episodes in the playlist. If I now look at dist_matrix, I see my 10 by 10 matrix. If I run str(dist_matrix) and attributes(dist_matrix), I see that it has two dimensions, 10 rows and 10 columns. I also have a vector called samples that holds the names of the samples on those rows and columns, and I'd like to replace the index values on the rows and columns with the sample names they correspond to. How would I do that? One approach would be to use attr() on the "dimnames" attribute of dist_matrix.
We could then give it a list whose two elements are samples and samples: attr(dist_matrix, "dimnames") <- list(samples, samples). Now if we look at dist_matrix, we see our rows and columns labeled with the sample names. So by using attr(), we can assign the values in samples to the row and column names of the distance matrix. But that's a bit tedious, so instead of attr() we'll use rownames(dist_matrix) <- samples, and then do the same thing with colnames(). Now we run all of that, look at dist_matrix, and sure enough, our distance matrix has the sample names attached to its rows and columns. We no longer have to worry about samples being one variable and dist_matrix being another: both pieces of information are intimately connected because we made use of attributes, which is pretty slick and really useful. Of course, we could also try another file: remove that _sq from the file name, rerun everything, look at dist_matrix, and see that, sure enough, we have a square distance matrix with our labels and everything is working well. One thing you've perhaps noticed over these episodes is that I keep running this script over and over again when all I'm really changing is the name of the file. So in the next episode, we're going to convert all of this great code into a function. That will keep our analysis DRY (don't repeat yourself), so that we can use the function for any distance matrix file we want, which will make our analysis going forward so much easier and more reproducible. So make sure you're subscribed to the Riffomonas channel so that you get that next episode; I wouldn't want you to miss it. We really are learning a lot of great things here about base R, and I encourage you to subscribe and keep following along.
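Here's a toy version of the labeling step that also attaches a custom attribute, tying together the last two ideas. The samples vector and the distances are made-up stand-ins, not the episode's data (the real dist_matrix and samples come from the script built in earlier episodes), and "created_by" is just an example attribute name.

```r
# Hypothetical stand-ins: three made-up sample names and made-up distances
samples <- c("S1", "S2", "S3")
dist_matrix <- matrix(c(0.00, 0.21, 0.35,
                        0.21, 0.00, 0.18,
                        0.35, 0.18, 0.00), nrow = 3)

# Option 1: the generic attr() interface, a list of (row names, column names)
attr(dist_matrix, "dimnames") <- list(samples, samples)

# Option 2: the dedicated setters, which give the same result
rownames(dist_matrix) <- samples
colnames(dist_matrix) <- samples

# Distances can now be looked up by sample name instead of by position
dist_matrix["S1", "S3"]

# A custom attribute travels with the object
attr(dist_matrix, "created_by") <- "Pat Schloss"
names(attributes(dist_matrix))   # "dim", "dimnames", "created_by"
```

Because the labels live on the matrix itself, any later code can index by sample name without keeping a separate samples vector in sync.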
So keep practicing. Go out into the wild with your own data and see if you can figure out what attributes are attached to the different types of data you work with in your day-to-day analysis.