In recent episodes, I've been trying to walk people through the elements of base R that I think are most important for working in the R programming environment. Sure, we've got lots of great tools in the tidyverse: dplyr, ggplot2, and so forth. But even with those tools, you'll often still need tools from base R. In today's episode, I'm going to tell you everything you need to know about using matrices.

So what is a matrix? Well, as I mentioned before, everything in R is some type of vector, even the number one. One is a vector that has one slot in it with the value of one. In a vector, all of the values are of the same type. They're all numerical, or they're all logical, or they're all character. If you try to put a character into a numerical vector, it makes them all characters, right? We've seen that already. A data frame is a vector of vectors, where each column is a different vector and can hold a different type of data. Data frames are what we're most used to working with in the tidyverse. Well, a matrix is a vector of vectors like a data frame, except that all of the values in the matrix, so all of the columns, are of the same type.

In today's episode, we're going to learn about creating matrices and accessing values from matrices, and then we're going to apply that knowledge to our effort to read in a PHYLIP-formatted distance matrix. This is a non-rectangular data format, and we're going to try to read those distances into a matrix. We've seen elements of how to do this in previous episodes: using the scan function, building vectors, accessing values from vectors, and removing values from vectors. So now we're going to take those values and put them into a matrix. This gets us well on our way towards reading that PHYLIP-formatted distance matrix into a format that we can eventually do all sorts of fun stuff with using dplyr and ggplot2.
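As a quick illustration of the "same type" rule, here is a minimal sketch. The variable names (nums, mixed, df, m) are made up for this example and are not from the episode's script.

```r
# A numeric vector stays numeric until a character value is mixed in.
nums <- c(1, 2, 3)
class(nums)             # "numeric"

mixed <- c(1, 2, "three")
class(mixed)            # "character": every element was coerced

# A data frame can hold a different type in each column...
df <- data.frame(id = 1:3, name = c("a", "b", "c"))
sapply(df, class)

# ...whereas a matrix holds a single type throughout.
m <- matrix(1:6, nrow = 2)
typeof(m)               # "integer"
```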
So let's head over to RStudio and I'll introduce you to the concept of matrices. To create a matrix, we use the matrix function, and a couple of arguments are needed to build it out. First is the data: what goes into the matrix? I'm going to say 1:100. Then we can give dimensions to the matrix, so we can say nrow = 10. This creates a 10 by 10 matrix of the values from 1 to 100. Now we could also add an ncol of 10, which is unnecessary, because if you've got 100 values and 10 rows, then naturally you're going to have 10 columns. But it's useful to be able to parameterize our matrix function to get the dimensions that we want.

Something you're perhaps wondering is how the matrix function knows how to lay out the values. As you can see from this run, it's doing it column-wise. In column one, rows 1 through 10 hold the values 1 through 10, column two holds 11 through 20, and so forth. Well, maybe you want the values to go row-wise, so 1 through 10 across the columns of the first row. How would we do that? Very easy. We could do byrow = TRUE. And here now we have 1 through 10 across that first row, 11 through 20 in the second, and so forth. Again, the default is byrow = FALSE, which lays the data out column-wise.

Let's go ahead and assign the column-wise version to a variable that I'll call x, so we have something to play around with. So how do we get values out of x? How do we get rows? How do we get columns? How do we get individual cells? Well, we saw in the last episode, when we were working with vectors, that we could use square brace notation and insert a number to get a value out of the vector. It's very similar here, except that we have two dimensions: a row and a column. So we could do x[2, 4].
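The calls described so far can be sketched as follows. This assumes x holds the default column-wise layout, which is what the values quoted in the walkthrough imply; by_row is a name made up for this sketch.

```r
# Default layout (byrow = FALSE): values fill down each column.
matrix(1:100, nrow = 10)

# ncol is redundant with 100 values and 10 rows, but it makes the shape explicit.
matrix(1:100, nrow = 10, ncol = 10)

# Row-wise layout: 1 through 10 run across the first row.
by_row <- matrix(1:100, nrow = 10, byrow = TRUE)
by_row[1, ]

# Keep the column-wise version to experiment with.
x <- matrix(1:100, nrow = 10)

# Row first, then column.
x[2, 4]
```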
And so this will be the second row and the fourth column; it goes row, then column. This gives us 32, and as we can see, that's this value here in row two, column four. If I wanted to get out an individual row, I could do x[3, ], with nothing for the column. This gives me all of the values in row three. We can see that the row starts with 3, 13, 23, which of course is row three in the matrix. What if we want a column? Well, we do a comma and then, say, the fifth column: x[, 5]. Then we get 41, 42, and so forth, which is right here. That's the fifth column. So you don't need to give a row value, and you don't need to give a column value. If you want a specific row or a specific column, you give that value and leave the other blank, and you get all of the values for that row or that column.

We can also give the square brace notation a vector. So x[2:3, ] should give me the second and third rows of the matrix. And sure enough, I get those two rows. Of course, we could do the same thing with the columns, but perhaps we only want to give it one value there. So let's get the second and third rows, but only the fifth column: x[2:3, 5]. Now we get 42 and 43. Again, we asked for the second and third rows and the fifth column, which of course hold 42 and 43. So it's really versatile in allowing us to insert vectors or individual numbers to get the cells, rows, or columns that we want out of the matrix.

We can also remove values from our matrix using the approach that we used previously for one-dimensional vectors: a negative sign. So I could do x[-2, ] and we should remove the second row. Sure enough, we see that the rows go 1, 3, 4, so we've skipped the second row. And you can do the same thing to, say, remove the second column. If I wanted to remove multiple columns, I could do x[, c(2, 3, 4)].
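Collected in one place, the indexing calls from this stretch look like the sketch below. It assumes x is the column-wise 10 by 10 matrix of 1 to 100 built earlier.

```r
x <- matrix(1:100, nrow = 10)

x[3, ]      # all of row 3: starts 3, 13, 23
x[, 5]      # all of column 5: 41 through 50
x[2:3, ]    # rows 2 and 3, every column
x[2:3, 5]   # rows 2 and 3 of column 5: 42 43

x[-2, ]     # everything except row 2
x[, -2]     # everything except column 2

# A positive vector of column numbers selects those columns.
x[, c(2, 3, 4)]
```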
But that's going to give me columns 2, 3, and 4. If I wanted to remove those columns, I would do x[, -c(2, 3, 4)]. What you can now see is that we've removed the second, third, and fourth columns from the original x matrix. Again, here is our full-fledged x matrix.

Perhaps we want to modify values in our matrix. How do we do that? Well, we can use this indexing with coordinates to say "I want this cell," and then assign a value to that cell, or to those rows or columns. So let's try this with x[3, 5]: row three, column five. I'm going to assign it -100. Now when I look at x, I see that row three, column five is set to -100. We can also take x and assign complete rows. So let's do rows two and three, all of the columns, and assign those zero. Now when we look at x, we see that both rows two and three have zeros all the way across.

And of course you can mix and match. We could do rows two and three, and we could also give, say, column four. What we see is that column four doesn't become all zeros, because it's taking the intersection of rows two and three with column four. It's saying: give me rows two and three in column four and set those values to zero. Since those cells were already zero, an easier way to see what's going on is to use rows eight and nine in column four instead of two and three. And now we see what we expected: we changed just those two values. If I wanted all the values in column four to be zero, I would do x[, 4] and assign that zero. And so now we see that rows two and three and column four have all been set to zero.

As I mentioned earlier, all the values in a matrix are of the same type. In this case, they are all numerical values. Well, what would happen if I inserted a character? So let's do x[6, 8], and let's assign to that "pat", my name, right?
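Here is a sketch of those assignments in the order described, again assuming x starts as the column-wise 1 to 100 matrix. The final line is the character assignment that was just set up.

```r
x <- matrix(1:100, nrow = 10)

# Dropping columns returns a smaller copy; x itself is unchanged.
x[, -c(2, 3, 4)]

x[3, 5] <- -100     # one cell
x[2:3, ] <- 0       # two complete rows
x[2:3, 4] <- 0      # only the cells where rows 2-3 meet column 4
x[8:9, 4] <- 0      # same idea, on cells that were not already zero
x[, 4] <- 0         # the whole of column 4

# Put a character string into an otherwise numeric matrix.
x[6, 8] <- "pat"
```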
So let's see what happens when we do that. What we see is that everything in this matrix has become a character type. We know that because we have quotes around all of the values. But we also see that we have "pat" inserted in here. So again, the important thing to recognize is that all of the values in a matrix are of the same type.

How might we go about making a blank matrix? What we saw earlier was something like matrix(1:100, nrow = 10). This gives us the matrix that we started with when we created x. But perhaps I don't know the values that I want to have in this matrix yet. How could I make a matrix of, say, all zeros? Well, we could do matrix(rep(0, 100), nrow = 10) to get that matrix of all zeros. Alternatively, we could do matrix(0, nrow = 10, ncol = 10). This gives us an "empty" matrix of all zero values. And this is where we're going to start as we go back and think about reading the data from our PHYLIP-formatted matrix into a matrix representation here in R.

I've got my R script here. I'm going to start by creating a variable that I'll call dist_matrix, and this will be a matrix. I'll again give it zeros, and we'll say nrow = n_samples. If we run all this, we'll see that n_samples is 10. So what we're going to create here is a 10-row matrix. I actually didn't show what happens if we only give nrow, so let's see what that does. That gives you a matrix with 10 rows but only one column. We want 10 columns as well, so we'll add ncol = n_samples. We then see dist_matrix as we'd expect, with 10 rows and 10 columns of all zero values. I'm creating this with all zeros because I'm initializing the matrix. We start with a matrix of all zeros, and then we add values in as we chomp through the vector.
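A short sketch of the initialization options just described. In the episode n_samples comes from the script; here it is simply assumed to be 10, and one_col is a name made up for the sketch.

```r
# Two ways to build a 10 by 10 matrix of zeros.
matrix(rep(0, 100), nrow = 10)
matrix(0, nrow = 10, ncol = 10)

n_samples <- 10   # assumption: set by hand for this sketch

# Giving only nrow recycles the single 0 into one column.
one_col <- matrix(0, nrow = n_samples)
dim(one_col)      # 10 1

# Giving both dimensions yields the square matrix we want to fill in.
dist_matrix <- matrix(0, nrow = n_samples, ncol = n_samples)
dim(dist_matrix)  # 10 10
```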
In the last episode, when we were talking about working with vectors and getting values out, we talked about shift and pop. While R doesn't have shift or pop, we're going to kind of roll our own to chew through that vector. And while we're chewing through the vector, we're going to insert values into this matrix using the approaches that I just showed you with the x matrix. If we look at distances, we see that we have the sample names as well as the distances. So I need a separate vector that holds my sample names. I'll say samples equals, and here I'm going to create a vector of type character. I can do rep("", n_samples). This will give me a character vector of empty strings that's n_samples long.

If we open up the simple Bray-Curtis distance matrix, we can see the structure that's being represented in this one-dimensional vector. The first sample, F3D0, doesn't have any distances, because the value here where my cursor is would be zero: that's the distance between F3D0 and F3D0. And then this value over here, in I guess the third column, would be 0.392162. So it's a lower triangle. We don't need the diagonal, because those are all zeros, and anything above the diagonal would be the same value as what's below the diagonal. So we get a sample name, followed by a sample name, followed by a distance. There's a pattern to this. We could spend some time thinking about what the pattern is, and perhaps design something a little more intelligent for how to chomp or parse through this vector. But I'm going to do it by brute force.

So again, samples[1] is going to be distances[1]. I've got to load samples first. So samples now has F3D0 in the first spot. But as we've seen, distances still has F3D0.
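To make that first step concrete, here is a minimal sketch on a three-sample stand-in for the scanned vector. F3D0, F3D1, and 0.392162 are quoted in the walkthrough; the third sample's name and its two distances are placeholders invented for illustration, and n_samples is set by hand.

```r
# Toy stand-in for the character vector produced by scan().
# The third sample name and its two distances are made-up placeholders.
distances <- c("F3D0",
               "F3D1", "0.392162",
               "F3D11", "0.25", "0.5")
n_samples <- 3

# A character vector of empty strings, one slot per sample.
samples <- rep("", n_samples)

# Copy the first sample name across.
samples[1] <- distances[1]

samples        # first slot filled, the rest still ""
distances[1]   # the name has been copied, not removed
```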
So we need to remove the first slot from distances. I'm going to do distances equals distances[-1]. That will remove the first element, F3D0, from distances. And now if I look at distances, I see that it starts with F3D1. Okay, so we're going to keep going through this. We can then do samples[2] is distances[1], and then distances is distances[-1]. If we go through these two steps, we can see that samples has the first two sample names and distances starts with that 0.392.

We need to put that 0.392 into our matrix. To do that, we take dist_matrix. This is samples[2], the second sample, so the second row and the first column. We'll do dist_matrix[2, 1], and that will be distances[1], because that is the first value in here that we're now mapping into dist_matrix. Now, this is of type character, so what I'd like to do is wrap as.numeric around distances[1]. And now if we look at dist_matrix, we see that we have our first distance in our distance matrix. So exciting, huh? We now need to go ahead and repeat the step where we remove the first value from the distances vector. If we then look at distances, we see we're starting at F3D11.

The issue with all this is that I might goof up what I remove or lose track of where I'm at in the process. But let's keep pressing along. For organization's sake, I'm going to try to group these lines of code as we go through the different samples. And again, we're going to brute force this. I just want to get a feel for how we're working with the data to move values out of the vector and into the matrix. Okay, so we'll do samples[3], and that is going to equal distances[1]. And we'll also want to remove that first value from distances.
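The steps up to this point can be sketched end to end on the same kind of three-sample toy vector. Only F3D0, F3D1, and 0.392162 come from the walkthrough; the third sample's name and distances are placeholders, and n_samples is set by hand.

```r
# Toy stand-in for the scanned vector (third sample's values are made up).
distances <- c("F3D0",
               "F3D1", "0.392162",
               "F3D11", "0.25", "0.5")
n_samples <- 3

samples <- rep("", n_samples)
dist_matrix <- matrix(0, nrow = n_samples, ncol = n_samples)

# Sample 1: a name and no distances.
samples[1] <- distances[1]
distances <- distances[-1]        # "shift" the name off the front

# Sample 2: a name, then one distance for row 2, column 1.
samples[2] <- distances[1]
distances <- distances[-1]
dist_matrix[2, 1] <- as.numeric(distances[1])   # convert from character
distances <- distances[-1]

# Sample 3: take the name; its two distances are still waiting.
samples[3] <- distances[1]
distances <- distances[-1]

samples
distances
dist_matrix
```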
So again, if we look at samples, we now see we have our third sample. And if we look at distances, we see that we've got two distances. We could repeat the step up here at lines 19 and 20 twice to add those values to dist_matrix. But I'm going to show you an easier way. I'm going to copy it down, because we're going to use this as a scaffold for figuring out how to do it. We're going to be in row three, and we're going to be looking at columns one and two, so those first two columns. And we're going to take the first two values out of distances, so values 1 to 2, written with the colon notation. It'll be easier to see why we do that as I go forward. Then we're going to remove elements one and two from distances with a negative index. So now let's run these steps. If I look at dist_matrix, I see that I've got row three and I've inserted those two values. If I look at distances, I see that I'm ready for my next sample.

So again, I'm going to copy this, and we'll be looking at samples[4]. There's always only one sample name at a time, so that part repeats. Now we're looking at row four, and we're going to be doing three distances. So here we'll put in 1:3 for the columns and 1:3 for the distances, and to remove them I'll also use 1:3. Let's go ahead and run these four lines and see where we're at. If we look at samples, we see we have those four sample names. If we look at dist_matrix, we see that we have the fourth row and those three distances. And if we look at distances, we see that we're ready for F3D13.

I'll do one more iteration of this for you all. We'll go ahead and copy this down. We're ready for samples[5], row five, columns one through four. Maybe you're noticing a pattern here. And then we're going to remove the first four elements from distances as well.
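The multi-value version of the step can be sketched like this. It starts from a hypothetical state after the first two samples have been processed; the names S3 and S4 and all of the remaining distances are invented for illustration. Note that the negative range needs parentheses: -(1:2) drops elements 1 and 2, and -c(1, 2) would do the same.

```r
# Hypothetical state after samples 1 and 2 (remaining values are made up).
samples <- c("F3D0", "F3D1", "", "")
dist_matrix <- matrix(0, nrow = 4, ncol = 4)
dist_matrix[2, 1] <- 0.392162
distances <- c("S3", "0.25", "0.5",
               "S4", "0.1", "0.2", "0.3")

# Sample 3: one name, then two distances into row 3, columns 1 and 2.
samples[3] <- distances[1]
distances <- distances[-1]
dist_matrix[3, 1:2] <- as.numeric(distances[1:2])
distances <- distances[-(1:2)]

# Sample 4: one name, then three distances into row 4, columns 1 to 3.
samples[4] <- distances[1]
distances <- distances[-1]
dist_matrix[4, 1:3] <- as.numeric(distances[1:3])
distances <- distances[-(1:3)]

dist_matrix
length(distances)   # nothing left to consume in this toy example
```

The row number, the column range, and the number of elements removed all move together from one sample to the next.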
So now if we look at samples, we have those five sample names. In dist_matrix, we now have everything through the fifth row and fourth column populated. And if we look at distances, we see that we're ready for the next sample. I'm going to go ahead and repeat this for the other five samples, so you don't have to watch me doing the same thing over and over. I'll be back in just a moment.

All right, so I've copied that chunk of code over and over again so we could read in all 10 samples. If I look at samples, I see I have all 10 sample names. If I look at dist_matrix, I see I now have that 10 by 10 matrix. The last column, of course, is all zeros, because the diagonal term there is a zero, and everything above it would mirror what's in the 10th row. So we've now read our lower triangle matrix into a matrix that we can pursue further in R.

Something you're perhaps thinking is, "Well, Pat, you read in the simple mice Bray-Curtis .dist file, which only had 10 samples in it. Our real mice Bray-Curtis .dist file has 348 rows. So do you think I'm going to repeat this chunk of code 348 times?" No, I am not. And I'm not because that wouldn't be DRY. Don't repeat yourself: that's what DRY stands for. In the next episode, which you need to come back for, I will show you how we'll take this code and make it DRY. We'll make it generalizable, so that we can read in not only this matrix that has 10 samples, but also the full matrix that has 348 samples. If you noticed the pattern in how I was updating the code as I looked at each subsequent sample, hold on to that thought, because we will use it to DRY out our code.

Be sure that you've subscribed to this channel so you're notified when that episode is released. Smash that thumbs up button so that you're even more likely to learn about it when it's released in a few days. And I'll see you next time for another episode of Code Club.