Hey folks, I'm Pat Schloss and this is Code Club. We're in the midst of a series of episodes where I use only base R to read in non-rectangular data and make it rectangular. To make this a little less abstract, I'm taking a lower triangular distance matrix, commonly called a PHYLIP-formatted distance matrix, and converting it into a rectangular data frame that will play nicely with tools from the tidyverse. We saw in the last episode how we could use scan to read that file in, which produced a vector. Now we need to convert that vector into a matrix. To do that, we need to understand how vectors work.
A lot of the concepts around vectors are abstracted away from you when you're working in the tidyverse, because it just makes things nice, right? What we need to know, though, is how we can use basic tools for working with vectors to enhance our ability to work with the data. Now, something to appreciate, and I'm sure somebody will correct me, is that basically everything within R is a vector, okay? So a data frame that you play with in the tidyverse with dplyr is a vector of vectors (strictly speaking, a list of vectors of equal length), and each of those vectors could be of a different type. A matrix is built from vectors where all of the values are of the same type.
What do I mean by type? Well, is it numerical? Is it a character? Is it a logical? Is it a date? Right? When you look within a single vector, all of the values in that vector are of the same type. You can't have a vector of numbers and characters and logicals. They have to be all numerical, all character, all logical, all dates, or whatever type you have. The nice thing about data frames, of course, is that we can mix those, right? We can have multiple vectors of different types. But again, we're getting ahead of ourselves. We are looking at vectors. And one of the things that we need to know to work with vectors is how to build vectors, right?
So we read in our data as a vector, but to access different values in that vector, we're going to need to use vectors to get into the vectors. Sounds a little mad, huh? Well, let's head over to RStudio and I'll give you a demonstration of how we can build vectors.
As I mentioned, everything in R is a vector. So if I type 4, that comes back with that silly [1] in square brackets and the number 4. Believe it or not, that is a vector. It is a vector of length one, right? But it is a vector. Again, vectors are such a fundamental building block in R.
The next more sophisticated vector that you might make is with the c() function, which you can think of as combining different things. So I could do c(1, 2) and get back a vector with the elements 1 and 2. I could do c(1, 2, 6) and get back that vector. The c() function allows us to build vectors with any kind of information we want in them, right? So I could do c(T, F, T), for TRUE, FALSE, TRUE. Or I could do c("a", "b", "c") and get back a vector with three character values.
Now, I mentioned that all of the values in a vector, that series, that information, whatever you want to think of it as, have to be of the same type. So if I do c(1, "b", TRUE), that comes back as a vector of type character, because R doesn't know what to do with "b" other than to keep it a string. And if everything has to be of the same type, then it makes the 1 and the TRUE strings as well, right? Let's see something that's perhaps not as nebulous to R. We could do c(1, 2, 3, 4, TRUE, FALSE). Here, what we see is that R has converted TRUE and FALSE into numerical values: it took TRUE and made it 1, and FALSE and made it 0. That's really important to know: the numerical representation of TRUE is 1 and of FALSE is 0.
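To make the c() examples above concrete, here is a minimal sketch in base R. The variable names are made up for illustration; the expected values in the comments follow R's standard coercion rules.

```r
# A single number is already a vector of length one
length(4)                                # 1

# c() combines values into a vector
nums  <- c(1, 2, 6)
flags <- c(TRUE, FALSE, TRUE)
chars <- c("a", "b", "c")

# Mixing in a string forces every element to character
mixed_chr <- c(1, "b", TRUE)
mixed_chr                                # "1" "b" "TRUE"
class(mixed_chr)                         # "character"

# Mixing numbers and logicals forces every element to numeric
mixed_num <- c(1, 2, 3, 4, TRUE, FALSE)
mixed_num                                # 1 2 3 4 1 0
class(mixed_num)                         # "numeric"
```

Calling class() on a vector is a quick way to check which type R settled on.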
Now, I could take that whole vector, c(1, 2, 3, 4, TRUE, FALSE), and tell R that I want it to be a logical. I could do that by wrapping the vector in a function called as.logical. That makes all of the values logicals. When a number is converted to a logical, any number that is not zero becomes TRUE, right? If it's 0, then it's FALSE. But if it's 1, 2, 3, 4, or 100, that is going to evaluate as TRUE. If I do as.numeric and give it the same vector then, like we saw earlier, it will turn all those logical values into numerical values.
If I do as.numeric and give it a vector of, let's say, "a", "b", "c", and then 1, 2, 3, that works, right? It returns a numerical vector. But it's giving us a warning message that it introduced NA values for the "a", "b", and "c", because it didn't know how to convert them into numeric values. So we have as.logical and as.numeric; there's also as.character. If I do as.character on c(1, 2, 3), that's going to turn the numerical values 1, 2, 3 into characters. So as.logical, as.numeric, and as.character are really helpful functions for turning vectors from one type into another.
But again, the take-home message from what I'm describing here is that all of the values within your vector have to be of the same type, or R is going to convert them for you, sometimes with a warning message. One of the things some people don't like about R is that we can do this flipping back and forth between different types, or that I could give this vector TRUE and FALSE logical values along with quantitative values, and it figures it out, right? So people don't totally like that. But some people, like myself, appreciate that flexibility within R. So this c() function is really, I think, a fundamental function in using R.
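The type conversions described above can be sketched like this. The variable names are illustrative only, and the as.numeric call on the character vector is expected to emit R's "NAs introduced by coercion" warning.

```r
x <- c(1, 2, 3, 4, TRUE, FALSE)          # stored as numeric: 1 2 3 4 1 0

# Any nonzero number becomes TRUE; zero becomes FALSE
as_lgl <- as.logical(x)                  # TRUE TRUE TRUE TRUE TRUE FALSE
as.logical(100)                          # TRUE

# Logicals become 1 and 0
as_num <- as.numeric(c(TRUE, FALSE))     # 1 0

# Strings that are not numbers become NA, with a warning
with_na <- as.numeric(c("a", "b", "c", 1, 2, 3))   # NA NA NA 1 2 3

# Numbers become strings
as_chr <- as.character(c(1, 2, 3))       # "1" "2" "3"
```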
I use the c() function all the time with the tidyverse and ggplot2. If I'm trying to specify my own colors, you'll see that I'll do c(), and in the parentheses give, say, three hexadecimal values to get three different colors, or three different shapes to get three different plotting symbols, right? So I use that c() function regularly. And when I'm teaching, I almost always forget to teach people what the c() function is. So here you go. That's what the c() function does.
Another way to create a vector is by using the colon. I could do 1:5, and this will give me a vector of integers from 1 to 5. I could also do 6:2, and that will give me a vector that goes from 6 in the first spot to 2 in the last spot. You can also give this doubles, so floating point numbers. R has pi built in. If I do pi:10, this will give me pi, pi plus one, plus two, plus three, all the way up to the last value that's less than 10. So this gives me out to 9.141593, right? And that's pretty nice. I generally don't use this with floating point numbers; I mainly use it with integer values. If I need to make a vector of, say, the values 1 through 10, this is my go-to. Again, the colon operator increments by one unit with each value in the vector.
Say you wanted to go every two or every three. Well, there's a function for that, and that's the seq function. You can do seq(from = 1, to = 100, by = 10). This should give me values going from 1 to 100 by 10. And sure enough, we get 1, 11, 21, and so on, because it's adding 10 as it goes along. If I wanted the multiples of 10, of course, I would need to go from 0 to 100 by 10. I can also make a descending vector with seq. Let's do seq(from = 10, to = 2, by = -3).
And so this gives me 10, 7, 4. You'll notice it doesn't output 2 or 1: 2 is the smallest value that it would ever report, and because 4 minus 3 is 1 rather than 2, and 1 is less than 2, it stops at 4, right? So again, this is very helpful when you have increments other than one that you're trying to step over. Now, I typically shorthand this and don't write out the argument names from, to, and by. So I'll do something like seq(2, 10, 2), which gives me all the even numbers from 2 through 10, right? When I use the seq function, I tend to drop the argument names.
One place where leaving in an argument name is helpful is if you don't know how far apart you want the values to be, but you do know the final length of the vector you want. So I could use seq(2, 10, length.out = 5). What this is saying is give me a vector that starts at 2, ends at 10, and has five elements in it. And it would help if I spelled that right, so let's fix that length. We now see we've got 2, 4, 6, 8, 10. Let's do something a little bit more adventurous and do length.out = 6. Now we get the decimal values 2, 3.6, 5.2, and so forth, up to 10. Again, the seq function is really handy when you have an increment that you want to step by, or when you know the length of the vector you want out and the starting and ending positions.
Another helpful function for building out a vector is the rep function, which is short for repeat. I could say rep("a", 5). This is going to repeat the character "a" five times, but it's not going to give me a single string that's five a's. It's going to give me a vector with five elements, and each of those elements is an "a", as you see there, right? So we have five individual a's, and that's helpful.
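For reference, the colon and seq calls walked through above can be collected into one short sketch. The variable names are made up for illustration; the values in the comments are what base R returns for these calls.

```r
# Colon operator: steps of one, ascending or descending
one_to_five <- 1:5                        # 1 2 3 4 5
six_to_two  <- 6:2                        # 6 5 4 3 2
from_pi     <- pi:10                      # 3.141593 4.141593 ... 9.141593

# seq() with an explicit increment
by_ten <- seq(from = 1, to = 100, by = 10)   # 1 11 21 ... 91
tens   <- seq(0, 100, 10)                    # 0 10 20 ... 100
down   <- seq(from = 10, to = 2, by = -3)    # 10 7 4
evens  <- seq(2, 10, 2)                      # 2 4 6 8 10

# seq() with a target length instead of an increment
five_vals <- seq(2, 10, length.out = 5)   # 2 4 6 8 10
six_vals  <- seq(2, 10, length.out = 6)   # 2.0 3.6 5.2 6.8 8.4 10.0
```

Naming length.out explicitly matters here, because a bare third argument to seq is interpreted as by.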
But we can now think about combining different ways of building vectors to make something a little bit more sophisticated. What would happen if we give rep another vector, right? If we gave it c("a", "b") and then said 5, what this should do is take "a" and "b" and repeat them five times. Sure enough, we get a, b, a, b, and so on, so that we have a full vector that's 10 units long, repeating that pair five times. The argument that this 5 is going to is really the times argument. Again, it's taking that vector and repeating it five times. Alternatively, we could say each = 5, and that would take each element of that vector and repeat it five times, right? Now we have five a's and then five b's.
Perhaps you'd like to have, say, five a's and three b's. How would you do that? My first guess would be to do each = c(5, 3). But that doesn't work. What we need to do instead is use the times argument. So we could do times = c(5, 3) to then get five a's and three b's.
A final thing that I'll show you with the rep function is that we can also specify the length of the vector that we want to get out, kind of like we did up above with the seq function. So we could do length.out, spelling it right this time, and let's say 11. Now we get a vector that is 11 units long, and it repeats that "a" and "b" until it fills those 11 units. You'll see that the 11th unit is an "a". It's recycling that "a" and "b", but because adding another "b" would give you 12 units and you only want 11, it truncates at 11.
R actually has a couple of vectors built in that are worth knowing about. One is letters, and these are all 26 letters of the English alphabet in lowercase.
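Looking back at the rep variations described above, a minimal sketch might look like the following. The variable names are illustrative, and the lowercase "a" and "b" strings are assumed to match what was typed on screen.

```r
ab <- c("a", "b")

five_a  <- rep("a", 5)                    # "a" "a" "a" "a" "a"
pairs   <- rep(ab, times = 5)             # "a" "b" "a" "b" ... (10 elements)
blocks  <- rep(ab, each = 5)              # five "a" then five "b"
uneven  <- rep(ab, times = c(5, 3))       # five "a" then three "b"
clipped <- rep(ab, length.out = 11)       # "a" "b" ... ending on "a"
```

Passing a vector to times pairs each count with the corresponding element, which is why it can produce unequal runs.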
If you do LETTERS in all caps, this gives you the uppercase alphabet. You could also do month.abb to get the abbreviated months, and month.name to get the full names of those months. Again, these are super useful tools from base R for building vectors: some that are predefined, like the letters and month ones, but also the c() function, the colon, and the seq and rep functions. I use those functions regularly, to the point where I don't even realize that I'm using them, but they are really important functions that I'm sure you'll find yourself using even when you're working in the tidyverse. So practice with these, and see if you can build out a variety of vectors of different composition.
And be sure that you subscribe to the channel, because in the next episode I am going to talk to you about, well, how can we figure out what the 13th letter of the alphabet is, or how could I pull out letters 10 through 20? And of course, we'll take it back to our original problem of trying to figure out how we can parse apart this very long vector that is of type character, where ultimately we'll want to keep some elements as characters and make other elements numeric. So we're really going to build on the content from this episode as we go forward. I've put up the playlist here for the rest of the series, so that you can be sure to follow along with what we do next. Keep practicing, and we'll see you next time for another episode of Code Club.
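As a starting point for that practice, here is a small sketch of the built-in vectors mentioned in this episode. The variable names are made up for illustration; the constants themselves ship with base R.

```r
lower  <- letters        # "a" "b" ... "z"
upper  <- LETTERS        # "A" "B" ... "Z"
abbrev <- month.abb      # "Jan" "Feb" ... "Dec"
full   <- month.name     # "January" "February" ... "December"

length(lower)            # 26
length(full)             # 12
```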