Okay, so in this video we're going to play around with some data frames. When you're manipulating data you want to work on columns and on rows. You can do that in a visual spreadsheet manager like Excel, but R has a lot of capacity to zero in on specific columns and rows and do a bunch of more advanced stuff, so that's what we're going to talk about in this video.

If you don't know, R actually has some built-in data frames that you can play around with, sample data sets, and the one we're going to use in this video is beaver1. If you type beaver1 into R, you'll see a bunch of data plop out. That's because it's a built-in data frame. I'm pretty sure it's in every distribution of R on all operating systems; it's here on mine, and it's been there on all the other ones I've checked.

So let's play around with this thing. First off, let's say summary(beaver1) and try to figure out what this is. I'll go ahead and tell you that this is a bunch of measurements of body temperature taken from a beaver. There's also beaver2, but that's another beaver. Look at the columns we have here: we have day, we have time, we have temp, the temperature, and we have this column activ (spelled without the e), which I'm pretty sure is whether the beaver was active or not. I'm not a hundred percent sure, but either way it doesn't matter for us now.

If you only want to see the beginning of a data frame, you can just type head and then the data variable, head(beaver1), or you can add n = 15, as in head(beaver1, n = 15), if you want to see the first 15 rows. Okay, so we have day, and this is presumably day of the year, so 346 is the 346th day of the year. I guess that's in December. We have the time that the temperature was taken, and then we have the temperature we actually got.
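For anyone following along without the screen, the commands from this first part probably look something like the sketch below. It assumes nothing beyond base R, since beaver1 comes with the datasets package that is loaded by default.

```r
# beaver1 ships with R's built-in datasets package, so nothing needs loading.

summary(beaver1)        # per-column summary statistics
head(beaver1)           # the first 6 rows (head's default)
head(beaver1, n = 15)   # the first 15 rows
names(beaver1)          # the column names: day, time, temp, activ
```

Note that the activity column is spelled activ in the data set, so that is the name to type.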
So notice the time here. If I do max(beaver1$time), that's going to give me 2350. That's the maximum of all the times we have, so we're on 24-hour time, just to be clear, and we have temperature in Celsius.

First off, I should explain what I did there. That was one of the first little things you do when you're subsetting data. This whole data frame is like a big spreadsheet, but if we want to talk about one column specifically, we can narrow in on it with the dollar sign. So if I say beaver1, dollar sign, time (beaver1$time), it's going to return all of the time values. This is effectively a vector, and we can treat it like a vector. For example, we can say beaver1$time + 200, and 200 is added to every one of them. Now of course we're not actually changing anything in the data frame; R is just doing the calculation on the fly. In the same way, you can say beaver1$temp if you want all the temperature values.

The other thing you can do is this: in the same way that you can call columns, you can create columns. Let's say we want to add a new column, and we're going to call it legs. We're going to say that beaver1$legs is equal to 4. So what is that going to do? If we look at beaver1 again, we're going to see there's now a legs column, and it has 4 in every single row. Here I've just added one number, but you could add a vector of the same length and add all that data in. But now we know that each time the beaver's temperature was measured, he had four legs. That's very convenient.
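Here is a rough sketch of the dollar-sign steps just described. The variable name shifted is only there for illustration and isn't from the video; everything else uses the column names mentioned above.

```r
# Pull one column out of the data frame as a vector.
max(beaver1$time)               # latest reading on the 24-hour clock

# Arithmetic on a column applies to every element and returns a new vector.
shifted <- beaver1$time + 200   # 'shifted' is a made-up name for illustration
head(shifted)
head(beaver1$time)              # the column in the data frame is unchanged

# Assigning to a name that doesn't exist yet creates a new column.
# The single value 4 is recycled into every row.
beaver1$legs <- 4
head(beaver1)
```

Assigning to beaver1 like this only changes the copy in your workspace. The built-in data set itself is untouched, and rm(beaver1) gets you back to the original.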
Now that's not going to mean much, but this is how you add columns, and you can add vectors or single numbers or whatever.

Okay, so let's do some real manipulation of this data. Here we have the temperature column, and the temperature column is in Celsius. Now, I'm not big on Celsius, just because I'm an American. I find Fahrenheit so much more intuitive. If you don't know how Fahrenheit works, here's how it works: zero is really, really cold on a really, really cold day, and a hundred is really, really hot on a really, really hot day. That's what Fahrenheit is. I don't care if Celsius is more logical; it's not for me. So I'm going to change temperature from Celsius to Fahrenheit.

How am I going to do this? We already know that beaver1$temp is a vector. We also know that we can modify vectors by just algebraically adding things to them, or something like that. And we also know that, like we did back with the legs thing, we can create new columns, or actually replace columns, by setting them equal to something else. So let's do something a little magical: let's take beaver1$temp and set it equal to something.

What I'm going to do is make a little equation. So what's the equation for Celsius to Fahrenheit? I'm pretty sure it's the Celsius degrees times nine over five, all of that plus 32. Okay, so that's the equation. Actually, you know what, rather than overwriting beaver1$temp, let's make a new column. We'll call it F temp, for Fahrenheit temperature, or something like that. That's going to be equal to beaver1$temp times nine over five, and I put that all in parentheses so the order of operations works out (I think it would anyway in this situation), plus 32. Now I'll run this; I hope that works. Let's look at beaver1's F temp column.
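The conversion just described would look roughly like this. The exact spelling of the new column isn't visible in the transcript, so ftemp here is an assumption.

```r
# Celsius to Fahrenheit: F = C * 9/5 + 32.
# 'ftemp' is an assumed spelling of the new column's name.
beaver1$ftemp <- (beaver1$temp * 9 / 5) + 32

head(beaver1$ftemp)    # the first few Fahrenheit values
range(beaver1$ftemp)   # lowest and highest Fahrenheit readings
```

The parentheses are optional here, because multiplication and division are evaluated before addition in R, but they make the intent clearer.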
Let's see what that looks like. Those look like good body temperatures. Body temperature in Fahrenheit is just a little under 100; if you're at a hundred, you might be close to having some kind of flu or fever. So here we go, here are all of our body temps in Fahrenheit. Now we can run beaver1 again, and we'll see that we now have a column of Fahrenheit body temperatures.

Keep in mind this is a super easy thing to do. It's the equivalent of those Excel formulas where you type equals and then some cell times whatever, but we can do it on the fly in R. Now, back here I said that beaver1's F temp is equal to this. I could just as well have said temp, and what that would have done is overwrite the temperature column with this, but I think I want to have both Celsius and Fahrenheit.

So what else can we do to this data? Let's do something fancier. As I said before, we have time, and we have it in 24-hour time. Let's say hypothetically (this isn't really a good idea) that you want that time in 12-hour time. How would we go about doing that? In R we can do it pretty quickly, but just think about it programmatically. If you were in a normal programming language that didn't have all this fancy vector arithmetic, you would probably need some kind of loop that goes through and decides which of the time values are 1300 or over, because at that point 12-hour time rolls over and 24-hour time doesn't. We would take all of those values and subtract 1200 from them, and that would give us 12-hour times. So how do we do this in R? R has this really cool function called ifelse.
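Before getting into ifelse, here is roughly what the loop-based approach just described might look like if you wrote it out by hand. It's written in R only so it can be run next to everything else; the names times and hour12_loop are made up for this sketch and don't appear in the video.

```r
# The explicit-loop version of the 12-hour conversion, for comparison only.
times <- beaver1$time
hour12_loop <- numeric(length(times))   # empty result vector of the same length

for (i in seq_along(times)) {
  if (times[i] >= 1300) {
    hour12_loop[i] <- times[i] - 1200   # afternoon and evening roll over
  } else {
    hour12_loop[i] <- times[i]          # earlier times stay as they are
  }
}

head(hour12_loop)
```

This does the job, but it takes several lines to say what the vectorized version says in one.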
So what ifelse does is this: you give it three arguments. One is the test, one is what it returns if the test is true, and one is what it returns if the test is false. Now this isn't just a simple yes-or-no thing: if you have it act on a vector, it's going to return one value for each element in that vector.

So beaver1$time is what we're working with now, and what we can do with ifelse is something like this. We'll actually set this equal to a variable; we'll call it con, for condition. This is going to be equal to ifelse, and first we put in our condition. Our condition is going to be whether beaver1$time is greater than or equal to 1300, so effectively whether it would have rolled over if it were in 12-hour time. If that is true, we're going to have it return the number negative 1200 (you'll see where we're going with this in a second), and if it's not 1300 or over, we're going to have it return zero.

So I'm going to run this, and now the output of the function is saved to the variable con. I'm going to pull this up. What this has done is gone through beaver1$time, as we told it to, and checked each one of these values. If it's 1300 or above, it returns negative 1200; otherwise it returns zero. So we have this very nice vector here, and you may be able to see where we're going with this now.
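The ifelse call described here would be something along these lines. The variable name con is from the video; the table call at the end is just an extra way of checking the result.

```r
# ifelse(test, yes, no) checks the test for every element of the vector
# and returns a vector of the same length.
con <- ifelse(beaver1$time >= 1300, -1200, 0)

head(con)     # one value per observation: -1200 or 0
table(con)    # how many readings fall on each side of 1300
```

Because the test is a whole vector of TRUE and FALSE values, you get one answer per row without writing a loop.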
So now what we can do is take beaver1$time and add con to it. I was about to say subtract, but we have negative numbers here. What this is going to do is take each and every one of those times and add the matching value of con, so all of the times that are 1300 or later are going to have 1200 subtracted from them. Let's run that, and you'll now see that 2 p.m. is not 1400 anymore; it's now 200.

If we want, we can go ahead and put this in a column in our data frame, because right here we're just doing the math. Let's put it in as 12-hour time or something like that. Oh yeah, we shouldn't use the digits one and two, so we'll spell out twelve; you can't have a variable name that starts with a number. If we look at this now, we see that we have the 12-hour time. So now we have a column in our data that has time we've manipulated with some kind of condition. Again, I could just as easily have replaced the actual time variable, and you can do that if you want.

Now, there are other things we could do with ifelse. We could have told it, for all of these, to say that it's afternoon or something like that, or for all of these, to say that it's a.m. All of this kind of stuff you can play with. You don't just have to give ifelse numbers: you can give it strings to put in a vector, or you could give it TRUE or FALSE. It depends on what you're actually looking for.

Okay, so the last thing I'm going to talk about is also pretty cool, and it's a sort of similar idea: the subset command. subset is a very nice thing. Basically, you feed it data and you feed it a condition, and it gives you all the data that meets that condition. It's
doing something relatively similar to what ifelse does, but you're not making a new vector; you're just looking at a subset of the data.

To remind ourselves, let me look at head(beaver1) again. We have temperatures here. We could look at the Celsius, but no, let's look at Fahrenheit, because we made it ourselves. So we have 97, we have 98. Let's actually look at summary(beaver1). Let's say we want to return all the rows where the temperature is below 98 or something like that. What we do with the subset command is the following. First you give subset a data frame, so beaver1, and then you give it the condition, that is, which rows to return. We want the ones where beaver1's F temp is less than 98; I think that's what I said before. If I run that, you'll see it's not just puking out random numbers; it's puking out only those observations where the temperature in Fahrenheit is less than 98 degrees. We can reverse that as well: we can say only those greater than 98, and there are actually a lot more of these. So that's a way of looking at a subset of our data.

Now, we can do the same thing with time, and we can do the same thing with day. Remember, there are two days in our data. Let's say we only want the observations from day 346. We can just say day is equal to 346 (in R that test is written with a double equals sign, day == 346), and we will only get those observations, not the ones after them. Or take activ: it's a binary variable, so we can ask for only those rows where activ is equal to 1, or true.

Okay, so this is, I guess, sort of an entryway to messing around with data frames in R. Again, we played with subsets and we played with making ifelse conditions, so this is a great way of taking
your data and manipulating it. We performed mathematical operations on columns, we made new columns, and we converted an entire column into another way of looking at the data. This should give you an idea of where you can jump off from in actually addressing problems in data sets in R. We'll probably talk about some classes of data in R next, and then maybe move on to some more advanced plotting stuff. So I'll see you guys next time.
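As a recap of the last part of the video, here is a sketch of the 12-hour column and the subset calls. The column names twelvehour, ampm and ftemp, and the result names cool, warm, day346 and active, are assumptions made for this sketch, since the exact names typed on screen aren't in the transcript.

```r
# Rebuild the helper columns so this block runs on its own.
con <- ifelse(beaver1$time >= 1300, -1200, 0)
beaver1$twelvehour <- beaver1$time + con        # a name can't start with a digit
beaver1$ftemp <- beaver1$temp * 9 / 5 + 32

# ifelse can return strings as well as numbers (labels chosen for illustration).
beaver1$ampm <- ifelse(beaver1$time >= 1200, "pm", "am")

# subset(data, condition) keeps only the rows where the condition is TRUE.
# Inside subset, columns can be referred to by their bare names.
cool   <- subset(beaver1, ftemp < 98)    # Fahrenheit readings below 98
warm   <- subset(beaver1, ftemp > 98)    # Fahrenheit readings above 98
day346 <- subset(beaver1, day == 346)    # first day only; note the double equals
active <- subset(beaver1, activ == 1)    # rows where the beaver was active

nrow(cool)
nrow(warm)
head(day346)
```

Writing the condition as beaver1$ftemp < 98 works as well; the bare column name is just shorter.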