 So let's get to plot some data. Now I'm going to import the DataFrames package and from that we have this read table. Now I do have video material on the DataFrames package. You can also watch the first of our mini projects which was on medical statistics just to have a look at the DataFrames package I'm going to just run through it without explaining much There is a CSV file that lives right in the folder that I'm working with. That is ccs.csv, a comma separated value file, and I'm going to import it as a DataFrame with a read table function, and I'm going to import that into this computer variable that I just called df. Let's do that. It'll take a while just to import that. There we go and let's look at the tail of that DataFrame, and that's going to print out to the screen for us the last few rows. So we see we have the following patient. There seems to be up to number 200, Agenda. We have the ages, and then we have variable 1, variable 2, variable 3, and 4, and category 1 which seems to be a's and b's, category 2 which seems to be cx and r's, and category 3 which seems to have p and r values when we're only seeing the last six row entries of this spreadsheet to this comma separated value file, which is now a DataFrame, and I do that using this tail function. I can look at all of these columns and I can just look at the type of data that they contain. I see the patient column, see all the list of columns there, and the patient is integer 64, data type, the gender seems to be a string, the age an 64-bit integer, variable 64-bit float, variable 2 seems to be an integer, variable 3 a float, variable 4 an integer, and then strings, and we see that we have no missing values. Very quickly, I'm not going to explain much, we can just get these descriptive statistics, describe the DataFrame, open and close, square brackets, colon gender that refers to the column name, and we see it has length of 200, again it's all strings, there are no empty or missing values there, and we see it finds two unique entries. And I can check what those are by writing this by function, it takes the arguments DataFrame, the column, and then it inserts this new variable called d such that we create this DataFrame with a new column name n, and it takes the arguments to sign of size of whatever it finds. Let me show you what that means. Remember it said it found two unique types of entries in the gender column, and indeed it found an f and an m, and the n counts the size of that, found 100 instances of f and 100 instances of m. So there we go, we can describe the age column, and the reason why I put it out here is once we start plotting so that you can see if we do box plot you're going to see the mean of 46 on those graphs. So describe just gives the summary stats of mean, median, 25th percentile, 50th percentile, the third quartile, which is the 75th percentile maximum, which is the fourth quartile or 100th percentile. And we can do go through age, we can do through variable one, we can go through variable two, variable three, it's all pretty self-explanatory, and we see variable four there. Now remember we get to category, and it's going to tell us again that it finds 200 entries, no missing values, it's a string data type, sorry no missing values, and it finds two unique entries, and by writing this by function I can create a new small little data frame which will tell me it found a and b, and again 100 instances of each. This is look at category two, it found three unique entries, and I can just look at what those are and how many they are. Now this is important to be able to do because if we're going to do box plots, we just count, it needs this pre-compile information, it needs to know that there is this, so you will construct a bar plot from this new data frame that you create here from the data frame, you won't be able to count the instances in get fly itself. Let's look at category three, there we see it, and again the unique entries and it found p, q and an r, 90 p,s 44 q,s and 66 instances of r. So let's do our first bit of plotting, a box plot. Again we have imported, we have imported get fly, so we can just use the plot function, but it is data aware, so I can just pass it the computer variable that I used for my data frame, I called it df, so I can refer to df. Now for my aesthetics, I can say well look down the gender column for the x-axis, so remember that was categorical data, so it was going to do one box plot of f and one box plot of m, and what do I want these box plot values to be of? Well for my y-axis, look at the age column. The geometry that I want you to do with this aesthetic is the box plot geometry, comma, and I'm going to give it a guide dot title, we've looked at that in a guide dot x label which is gender, so let's run that code and see our very first box plot, and there's beautiful deep sky blue box plots, as I told you it found two unique entries, female and male, and a charts there, we have our median, we saw before what it was, when we just looked at our descriptive statistics, and we have these beautiful little box plots. Now I can play around with them with themes, arguments for the theme function again, remember the theme there, so again plot, it's data frame aware, so I pass it the data frame argument, x is this the agenda this time, y is age, this is we have before, geom dot box plot, box plot geometry, I'm adding the same title and x-axis label, but you see it would also do it automatically, we did not specify the y label but it took the column name age and it attached that to the y label itself. So theme, the first argument I'm going to use is box plot spacing, underscore spacing, and I'm going to put it at 100 pixels, so that's the spacing between these two box plots, my grid color I'm going to do color and white again, and the default color is another argument that it takes, and that's going to change this color, this deep sky blue to something else, and I'm just using hexadecimal code here, so hashtag A, A, A, A, A, A, so you know it's going to be a bit of a green. And last thing I'm going to do is the middle width, that's this line, the median line for the two, I can change that as well, so look how beautiful this new plot of mine is going to be, that looks brilliant, I've changed the color to this gray, I've thickened this line up, I've made them smaller by increasing this box plot spacing in between those two, so let's play a bit more with this box plot, let's pass something else, we're going to pass the data frame, for x we're going to pass category three, for y we're going to do variable two, the geometry on this aesthetic comes from the box plot, box plot geometry, I'm going to add a title box plot of variable two values for category three groups, I just give that a name, my x axis is category three groups, I don't pass y because this is going to say variable two, but I could pass a guide dot y label, let's play around with the theme, I'm going to give it a default color of orange, we know how to do that, the grid color I want white so I don't see those little gray lines, now I'm introducing something grid color focused, I think we had had a look at that before actually, it's just when you mouse over that you don't see that, so I'm changing that to white as well, the box plot spacing is 80 pixels this time and here's something new default default point size, because they are going to be, I'm telling you now I know for a fact because I created that CSV file that there are some statistical outliers and I can give them a default point size, I'm going to make that seven pixel, now look at this beautiful graph, so all constructed there that is that is aesthetically pleasing, shall I use that term, so there's my title, y axis was grabbed from the column name x axis, I actually entered some string value there that I wanted and look the color is nice and orange, I've got this nice spacing in between making my box plots a bit smaller, if I mouse over the grid is gone, it's not gone, this is white and then look at these statistical outliers with quite large little dots, so that looks beautiful, now I can add another dimension to my box plot data and that is going to be done by this y group, so this was all just category three but I can add another group, so let's do exactly the same, so or let's at least say category one for my x axis instead of category three, we had three there, let's use category one, variable two still but I'm going to add this new argument called y group and that's gender so it's going to do these two rows of box plots for me because they're two unique entries for gender and then I've got to use a special, I've got to use a special geometry called jump dot sub plot grid, so I've got to use that and inside of that I pass the argument jump dot box plot because I want those to be box plots, next argument theme and it takes its own set of arguments remember and what are we going to use here is default color, we're going to make it orange again, grid color white, grid color focused, also white, let's just see, so it's just adding this extra dimension to our data, let's have a look at it, wait for two to render and there we go, so it says variable two by gender, it added that for me and it found female and male and category one it had two unique entries a and b and it takes variable two and it makes all these little box plots for you, so just adding that extra dimension by this y group good in the next section we get to start looking at density plots and histograms followed by some of violent plots