The next step in our introduction is to talk about accessing data, and to get that started we need to say a little bit about data formats. The reason is that sometimes your data is like apples and oranges: you have fundamentally different kinds of things. There are two ways in particular that this can happen. First, you can have data of different types. Second, regardless of the type, you can have your data in different structures. It's important to understand each of these.

We'll start by talking about data types. This is like the level of measurement of a variable. You can have numeric variables, which usually come in integer (whole number), single precision, or double precision. You can have character variables, with text in them. We don't have string variables in R; they're all character. You can have logical, which is TRUE or FALSE, otherwise called Boolean. You can have complex numbers, and you can have a data type called raw.

But regardless of which kind you have, you can arrange them into different data structures. The most common structures are vector, matrix or array, data frame, and list. We'll take a look at each of these.

A vector is one or more values in a one-dimensional array. Imagine them all in a straight line. Now what's interesting here is that in other situations, if it's a single number, it would be called a scalar. But in R it's still a vector; it's just a vector of length one. The important thing about vectors is that the data are all of the same data type: for instance, all character or all integer. You can think of this as R's basic data object, in that most other things are variations on the vector.

Going one step up from this is a matrix. A matrix has rows and columns; it's two-dimensional data. The columns all need to be the same length, and all the data needs to be of the same class.
Interestingly, the columns are not named. They're referred to by index numbers, which can make them a little weird to work with. And then you can step up from that into an array. This is identical to a matrix, but it's for three or more dimensions.

On the other hand, probably the most common form is a data frame. This is a two-dimensional collection that can have vectors of multiple types. You can have character variables in one column, integer variables in another, and logical in a third. The trick is that they all need to be the same length. You can think of this as the closest thing R has to a spreadsheet. In fact, if you import a spreadsheet, it's typically going to go into a data frame. Now the neat thing is that R has special functions for working with data frames, things you can do with those that you can't do with others. We'll see how those work as we go through this course and through others.

And then finally, there's the list. This is R's most flexible data format. You can put basically anything in a list. It's an ordered collection of elements, and you can have any class, any length, any structure. Interestingly, lists can include lists, which include lists, and so on and so forth. So it gets like Russian nesting dolls: you have one inside the other, inside the other. Now, that may sound very flexible and very good, but it's actually kind of hard to work with lists. And so a data frame is really sort of the optimal level of complexity for a data structure.

And then let me talk about something else here: the idea of coercion. Now, in the world of ethics coercion is a bad thing; in the world of data science coercion is good. What it means here is changing a data object from one type to another. It's changing the level of measurement or the nature of the variable that you're dealing with.
So for example, you can change a character to a logical, you can change a matrix to a data frame, you can change double precision to integer; you can do any of these. It's going to be easiest to see how it works if we go to R and give it a whirl. So open up the script and let's see how it works in RStudio.

Now for this demonstration of data types, we don't need to load any packages; we're just going to run through things on their own. We'll start with numeric data. I'm going to create a data object, a variable called n1, my first numeric variable. Then I use the assignment operator, which is the little left arrow, and it's read as "n1 gets 15." Now, R does double precision by default. Let me run this line for n1, and you can see that it showed up here on the top right. If I call the name of that object, it'll show its contents in the console. So I just type n1 and run that. And there you can see, in the console at the bottom left, it brought up a 1 in square brackets. That's an index number for the first object in an array, and this is an array of one number. And we get the value of 15. Also, we can use the R command typeof() to get a confirmation of what type of variable this is, and it's double precision by default. We can also do another one where we use 1.5. We can get its contents, 1.5, and then we see that it also is double precision.

Next we want to come down and do a character. I'm calling that c1, for my first character variable. You see that I write c1, the name of the object I want to create. I put the assignment operator, the less-than and dash, which is read as "gets." And then I have the value in double quotes. In other languages, you would use single quotes for a single character and double quotes for strings. They're the same thing in R. And I put in double quotes the lowercase c. That's just something I chose. So I feed that in, and you can see that it showed up in the global environment there on the right.
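The numeric and character steps just described can be sketched roughly as follows. The transcript names n1 and c1; the name n2 for the 1.5 example is an assumption made here for illustration.

```r
# Numeric data: R stores plain numbers as double precision by default.
n1 <- 15          # read as "n1 gets 15"
n1                # prints [1] 15; the [1] is the index of the first element
typeof(n1)        # "double"

n2 <- 1.5         # n2 is a hypothetical name; the transcript gives only the value
n2
typeof(n2)        # "double"

# Character data: single and double quotes are interchangeable in R.
c1 <- "c"
c1
identical('c', "c")   # TRUE
```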
We can call it forward, and you see it shows up with the double quotes on it. We get the typeof() and it's character. That's good. If we want to do an entire string of text, I can feed that into c2, just by having it all in the double quotes. We pull it out and we see that it also is listed as character, even though in other languages it would be called a string.

We can do logical. This is l1, for logical first, and I'm feeding in TRUE. When you write TRUE or FALSE, they have to be all caps, or you can use just the capital T or the capital F. Then I call that one out and it says TRUE. Notice, by the way, there are no quotes around it. That's one way you can tell that it's a logical and not a character. If we put quotes around it, it would be a character variable. We get the typeof() and there we go: it's logical. I said you can also use an abbreviation. So for my second logical variable, l2, I'll just use F. I feed that in, and you see that when I ask it to tell me what it is, it prints out the whole word FALSE. And then we get the typeof() again: also logical.

Then we can come down to data structures. I'm going to create a vector, which is a one-dimensional collection. I'm doing it by creating v1, for vector one. And then I use c() here, which stands for concatenate. You can also think of it as combine or collect. I'm going to put five numbers in there. You need to use a comma between the values. And then I call out the object, and there are my five numbers. Notice it shows them without the commas, but I had to have the commas going in. And then I ask R whether it's a vector, with is.vector(), and it just says TRUE. Yes, it is. I can also make a vector of characters. I do that right here. I get the characters, and it's also a vector. And I can make a vector of logical values, TRUE and FALSE. Call that, and it's a vector also.

Now a matrix, you may remember, goes in more than one dimension.
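A minimal sketch of the character, logical, and vector steps described above. The names c1, c2, l1, l2, and v1 follow the transcript; v2 and v3, and all the specific values other than "c", TRUE, and F, are placeholders assumed here.

```r
# Character: a single letter and a full string have the same type.
c1 <- "c"
typeof(c1)        # "character"
c2 <- "a string of text"   # placeholder text; the original string is not given
typeof(c2)        # "character"

# Logical: TRUE/FALSE in all caps, or the abbreviations T/F.
l1 <- TRUE
l1                # prints [1] TRUE, with no quotes
typeof(l1)        # "logical"
l2 <- F           # T and F are ordinary variables, so TRUE/FALSE is the safer spelling
l2                # prints [1] FALSE
typeof(l2)        # "logical"

# Vectors: c() combines comma-separated values of one type.
v1 <- c(1, 2, 3, 4, 5)                    # placeholder values
v1                                        # prints [1] 1 2 3 4 5
is.vector(v1)                             # TRUE
v2 <- c("a", "b", "c")                    # hypothetical name and values
is.vector(v2)                             # TRUE
v3 <- c(TRUE, TRUE, FALSE, FALSE, TRUE)   # hypothetical name and values
is.vector(v3)                             # TRUE
```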
In this case, I'm going to call it m1, for matrix one, and I'm using the matrix() function. So I'm saying matrix, and then combine these values: T, T, F, F, T, F. Then I'm saying how many rows I want in it, and it can figure out the number of columns by doing some math. So I'm going to put that into m1. Then I'll ask for it, and see, now it displays it in rows and columns, and it writes out the full TRUE or FALSE.

Now I can do another one, a second matrix, and this is where I explicitly shape it into rows and columns in the script. That's for my convenience. R doesn't care that I broke it up to make the rows and columns, but it's a way of working with it. And if I want to tell it to fill the matrix by rows, I can specify that with the byrow = T (or TRUE) argument. I do that, and now I have the a, b, c, d. You see, by the way, that the index numbers on the left are the row index numbers: that's row one and row two. On the top are the column index numbers, and they come second, which is why it's blank and then 1 for the first column, and blank and then 2 for the second column.

Then we can make an array. What I'm going to do here is create the data using the colon operator, which says give me the numbers 1 through 24. I still use the concatenate function, c(), to combine them. Then I give the dimensions of my array, and it goes rows, columns, and then tables, because I'm using three dimensions here. I'm going to feed that into an object called a1, for array one. And there's my array right there. You can see that I have two tables. In fact, let me zoom in on that one. It starts at the last level, which is the table, and then we have the rows and the columns listed separately for each of them.

A data frame allows me to combine vectors of the same length but of different types. What I'm doing here is creating a vector of numeric values, a vector of character values, and a vector of logical values.
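The matrix and array steps above might look roughly like this. The object names follow the transcript, except that m2 is an assumed name for the second matrix. The row count for m1 and the 4 x 3 x 2 shape of the array are also assumptions; the transcript says only that the array holds 1 through 24 in two tables.

```r
# A logical matrix: give the values and the number of rows;
# R works out the number of columns (6 values / 2 rows = 3 columns).
m1 <- matrix(c(T, T, F, F, T, F), nrow = 2)   # nrow = 2 is an assumption
m1
dim(m1)           # 2 3

# A character matrix laid out visually in the script and filled row by row.
m2 <- matrix(c("a", "b",
               "c", "d"),
             nrow = 2,
             byrow = TRUE)
m2                # row labels print as [1,] [2,]; column labels as [,1] [,2]

# A three-dimensional array: dimensions are rows, columns, tables.
a1 <- array(c(1:24), c(4, 3, 2))   # 4 rows and 3 columns are assumed
a1                # prints table 1 (values 1-12), then table 2 (values 13-24)
```

Without byrow = TRUE, matrix() fills column by column, so m2 would have "a" and "b" in the first column instead of the first row.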
So these are three different vectors. But then I'm going to use the function cbind(), for column bind, to combine them into a single data frame. I'm calling it dfa, for data frame A, or all. Now the trick here is that we had some unintentional coercion. By just using cbind(), it coerced everything to the most general format. I had numeric variables, character variables, and logical, and the most general is character. So it turned everything into a character variable. That's a problem. It's not what I wanted. I have to add another function to this: I have to tell it specifically to make it a data frame, by using as.data.frame(). When I do that, I can combine it, and now you see it's maintained the data types of each of the variables. That's the way I want it.

And then finally, I can do a list. I'm going to create three objects here: object one, which is numeric with three values; object two, which is character with four; and object three, which is logical with five. Then I'm going to combine them into a list using the list() function and put them into list1. Now we can see the contents of list1. You can see it's kind of a funky structure, and it can be hard to read, but all the information is there.

And then we're going to do something that's kind of hard to get your head around logically. I'm going to create a new list that has list1 in it. So I have the same three objects, plus I'm adding list1 on to it. That's list2. I'm going to zoom in on that one. You can see it's a lot longer, and we've got a lot of index numbers there in the brackets. There are the three numbers, the four character values, and the five logical values. And then here they are repeated, but that's because they're all parts of list1, which I included in this list. And so those are some of the different ways that you can structure data of different types.
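A rough sketch of the data frame and list steps above. The vector and object names other than dfa, list1, and list2 are hypothetical, as are all the values. The transcript does not show the exact as.data.frame() call. One caveat worth knowing: applying as.data.frame() to the output of cbind() leaves every column as character, because the coercion has already happened, so this sketch builds the data frame directly from the vectors with data.frame() to keep the three types.

```r
# Three vectors of the same length but different types (names are assumed).
vNumeric   <- c(1, 2, 3)
vCharacter <- c("a", "b", "c")
vLogical   <- c(TRUE, FALSE, TRUE)

# cbind() alone returns a matrix, so everything is coerced to character.
dfa <- cbind(vNumeric, vCharacter, vLogical)
dfa
typeof(dfa)       # "character"

# Building the data frame from the vectors keeps each column's type.
df <- data.frame(vNumeric, vCharacter, vLogical, stringsAsFactors = FALSE)
str(df)           # one num, one chr, and one logi column

# Lists: elements can differ in type and in length.
o1 <- c(1, 2, 3)
o2 <- c("a", "b", "c", "d")
o3 <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
list1 <- list(o1, o2, o3)
list1

# A list can contain another list as one of its elements.
list2 <- list(o1, o2, o3, list1)
list2
length(list2)     # 4: three vectors plus the nested list
```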
But you want to know also that we can coerce them into different types to serve our different purposes. So the next thing we need to talk about is coercing types. Now there's automatic coercion, and we've seen a little bit of that, where the data automatically goes to the least restrictive data type. So for instance, say we have a 1, which is numeric, a "b" in quotes, which is character, and a logical value, and we feed them all into this object, coerce1. By the way, putting parentheses around the whole assignment saves the object and shows us the result at the same time. Now you can see that it's taken all of them and made them all character, because that's the least specific, most general format. So that'll happen, but you've got to watch out, because you don't want things getting coerced when you're not paying attention.

On the other hand, you can coerce things explicitly if you want them to go a particular way. So I can take this variable right here, coerce2, and put a 5 into it. We can get its type, and we see that it's double. Okay, that's fine. What if I want to make it integer? Then I use the command as.integer(). I run that and feed it into coerce3, and it looks the same when we see the output. But now it is an integer. That's how it's represented in memory.

I can also take a character variable. Here I have 1, 2, and 3 in quotes, which makes them characters. I can get those, and you can see that they're all character. But now I can feed them in with as.numeric(). It's able to see that there are numbers in there and coerce them to numeric. Now you see that it's lost the quotes, and it goes to the default double precision.

Probably the one you'll do the most often is taking a matrix and making it a data frame. Let's take a look. I'll make a matrix of nine numbers in three rows and three columns; there they are. And what we're going to do is coerce it to a data frame.
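The automatic and explicit coercion steps above can be sketched as follows. The names coerce1 through coerce3 follow the transcript; coerce4 and coerce5 are assumed names for the character-to-numeric example.

```r
# Automatic coercion: mixed values in c() all become the most general type.
# Wrapping an assignment in parentheses assigns and prints in one step.
(coerce1 <- c(1, "b", TRUE))     # prints "1" "b" "TRUE"
typeof(coerce1)                  # "character"

# Explicit coercion: double to integer.
(coerce2 <- 5)
typeof(coerce2)                  # "double"
(coerce3 <- as.integer(5))       # prints the same value, 5
typeof(coerce3)                  # "integer"

# Explicit coercion: character to numeric.
(coerce4 <- c("1", "2", "3"))    # hypothetical name
typeof(coerce4)                  # "character"
(coerce5 <- as.numeric(c("1", "2", "3")))   # hypothetical name; quotes are gone
typeof(coerce5)                  # "double"
```

If a character value cannot be read as a number, as.numeric() returns NA for it and issues a warning.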
Now that doesn't change the way it looks; it's going to look the same. But there are a lot of functions that you can only use with data frames and can't use with matrices. For this one, by the way, we'll ask: is it a matrix? And the answer is TRUE. But now let's do the same thing and just add on as.data.frame(). Now we tell it to make it a data frame, and you see it basically looks the same. It's listed a little differently. The matrix had its index numbers here for the rows and the columns. The data frame has a row index, and then we have variable names across the top; it's just automatically named them V1, V2, and V3. But the numbers in it look exactly the same. On the other hand, if we come back here and ask, is it a data frame? We get TRUE.

And so that's been a very long discussion. But the point here is that data comes in different types and in different structures, and you're able to manipulate those so you can get them into the format, the type, and the arrangement that you need for doing your analyses in R.
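The matrix-to-data-frame coercion from the end of the demonstration can be sketched like this. The names coerce6 and coerce7 and the values 1 through 9 are assumptions; the transcript says only that the matrix holds nine numbers in three rows and three columns.

```r
# A 3 x 3 matrix of nine numbers.
(coerce6 <- matrix(1:9, nrow = 3))
is.matrix(coerce6)               # TRUE

# The same data coerced to a data frame; columns are named automatically.
(coerce7 <- as.data.frame(matrix(1:9, nrow = 3)))
is.data.frame(coerce7)           # TRUE
names(coerce7)                   # "V1" "V2" "V3"
```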