If you thought regression analysis was a little bit esoteric, I have news for you: it's getting really esoteric now. We're going to talk about dimension reduction, moving in and out of higher dimensions and reducing dimensions. The foundation of dimension reduction is often a procedure called principal component analysis, and we'll talk about how principal component analysis can be useful for dimension reduction. I hope that by the end of this unit you'll be able to perform PCA on your own data and interpret the results, so not just do it but also understand what you're doing, and be able to use the results to identify interesting aspects of the data. We'll also talk about some alternatives such as projection methods and embedding methods.

So what's a dimension anyway, when we talk about dimensions of data? We're not talking about something that would appear in the X-Files, appearing from higher dimensions. What are we referring to? What does the word dimension mean in the first place? I'm hearing "an attribute". What word is embedded in the etymology of dimension? "Axis"? The idea is there, but etymologically the word has the prefix di- and then a root that has to do with measuring; it's something about measurement. Not necessarily independent: when we're talking about dimensions, the values can be very dependent on each other, and that's something we would like to figure out. Can we make them independent, and why should that be so? But basically, a dimension is something that can be measured. And two dimensions are orthogonal to each other if neither measurement carries information about the other. So orthogonal dimensions describe independent aspects of our data.

Another way to look at the concept of dimensions is to call them features, and that's become a term that is used a lot currently when we talk about machine learning. Machine learning is basically the art of using features of data and then drawing conclusions about the data, in terms of grouping the data or deciding whether the data overall have some non-observed property. So looking at PCA and dimension analysis conceptually has a lot to do with feature engineering for machine learning.

What we're trying to do, often, is work with data that is quite high dimensional. For example, if you look at a gene expression profile across different types of stimuli, then each of these stimuli, say heat shock, acid shock, induction with some cytokine, and so on, could be a different dimension along which this particular gene expression value varies. And they're not necessarily correlated with each other in any way, or perhaps the correlation is something we would like to find out. Or we might have gene expression data collected in a time series at different time points, and that's an odd kind of correlation with special properties that we can discuss. If two measurements are the same kind of feature, we can think of them as being plotted on the same axis, i.e. corresponding to the same kind of measurement. The goal of principal component analysis is to take such possibly correlated variables, such features, such dimensions, and transform the entire data set so that we can describe what's important about the data with fewer dimensions. And if everything that's important about the data is presented with fewer dimensions, it becomes easier to discern structures.
As humans, we're really not very good at finding structures in four-, five-, and six-dimensional space; that's not where most of us are situated when we think. But we're very, very good at finding structures in two-dimensional images, often better than any computer algorithm discovered so far. We can even find structure in two-dimensional images when there's none there; that's how good we are, we just imagine it. If you do a Google search for pareidolia, that's the phenomenon of seeing faces everywhere. Quite funny.

So what we're trying to do is take high-dimensional data and transform it into a smaller number of dimensions, because it's easier to analyze, and to do that without losing anything that's essential about the data. That's the idea. And the way we can do that is: if the data is correlated in some way, we can use that correlation to remove it. If, like in our regression example before, we have a perfect linear correlation between two variables, then we don't need both the x and the y. We just take the y values, subtract the intercept and divide by the slope, and we recover the x values. So either the x or the y values carry exactly the same information; we can just throw the other dimension away. Now, that's not always possible, so we need to look at the data in a different way. The idea is to use this mathematical procedure, principal component analysis, to transform the data into so-called PCs, or principal components, that are orthogonal to each other, i.e. they are now uncorrelated.

So here's the deal about that. Never mind the code, we're going to reproduce it in code. If we plot random values of x and y1, we could get a scatter plot that looks like this one down here. If we instead plot x against y2, where y2 is computed from a linear relationship with x plus some error from y1, it looks quite different: the first plot would have a very low correlation coefficient, the second a rather high correlation coefficient. However, if we only look at the histogram of x and the histogram of y2, we can't actually distinguish the two; they look the same to us. So the idea of principal component analysis is to take data sets such as these and rotate them, or view them in a way, where we maximize the variance along one dimension, and then we may hopefully be able to ignore the variance along the other dimension. You're trying to rotate the data in space, in this case two-dimensional space, so that when you shine a light from the top, the shadow in one direction is maximized, and then that's what you analyze. So you take a data set and you rotate it in space along its dimensions; here that's 2D space, but it could be high-dimensional space. You rotate it so that the variance along one dimension is maximized; once that is done, you rotate around that axis so that the variance along a dimension orthogonal to it is also maximized, and so on. And it turns out that the more you do that, the less information remains in the dimensions that you haven't treated this way.
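To make that rotation idea concrete, here is a small sketch; this is my own illustration, not the course script. It generates correlated two-dimensional data and then searches for the rotation angle that maximizes the variance along the first axis, which is exactly the direction that PCA finds analytically.

```r
set.seed(42)
x   <- rnorm(500)
y   <- 2 * x + rnorm(500)                               # a linear relationship plus noise
dat <- scale(cbind(x, y), center = TRUE, scale = FALSE) # centre on the origin

# Variance of the data projected onto the direction given by angle theta
varAlongDirection <- function(theta, d) {
  var(as.vector(d %*% c(cos(theta), sin(theta))))
}

# The angle with maximal variance is the direction of the first principal component
best <- optimize(varAlongDirection, interval = c(0, pi), d = dat, maximum = TRUE)
best$maximum    # angle (in radians) of the main trend in the point cloud
```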
So in this two-dimensional example, if I rotate the data parallel to the x-axis, there's not very much going on in the y-axis at all, and then I can say, well, there's not much going on in the y-axis, maybe I'll just throw it out and keep analyzing with the x-axis data, which is now one-dimensional data.

Now, the interpretation of these principal components: even though it's a mathematical technique that's not very hard, to properly understand it you need a little bit of linear algebra. But the intuitive interpretation is that the first principal component, shown in red here, corresponds to a line that passes through the mean and minimizes the sum of squares of the distances of the points from the line. That sounds familiar, right? It's just like regression analysis along the first dimension. Then you take that line and treat it as the new x-axis, and the second principal component is calculated in the same way after the correlation with the first principal component has been subtracted out. Applied to our data set, where the histograms were indistinguishable between the original data and the data with the added error term: after principal component analysis, this histogram corresponds to the projection along the red vector, and most of the variance is now explained here; on the same scale, this histogram corresponds to the components along the blue axis, where there's not a lot of variation left. So what we can then often do is ignore this part and continue the analysis only with that part.

In R we have two alternatives, prcomp() and princomp(), and they're very similar. For historical reasons there are two different implementations; they compute the result in different ways but expose it with slightly different terminology, which you might come across when you read tutorials that use either of them. The center is basically the vector that was subtracted to center the data: you calculate the mean of the whole multi-dimensional data set and translate the data set to the origin, so that it is then rotated around the zero point of the coordinate system. The standard deviations for each dimension of the rotated data basically tell you how much variance is explained by each of the principal components. What prcomp calls rotation and what princomp calls loadings are the actual principal components. These principal components are vectors that have the same length as the number of dimensions, and they basically say how much each original dimension contributes to the new axis. And that's kind of interesting, because it means that after we've done this analysis and we look at the principal components, we often cannot interpret what these principal components are to a very significant degree: all of the original dimensions can contribute equally to one of these new dimensions. Imagine that your data were a perfect correlation along a high-dimensional diagonal through the hypercube; then all of the dimensions would contribute perfectly equally to the one dimension that the resulting data would require to be perfectly represented. So that's one of the downsides of principal component analysis: we get a mathematically very robust procedure to work with our data, but we lose a certain amount of interpretability. Not all interpretability; we can still look at the data and draw some conclusions, as we'll see when we look at actual data. So the actual principal components are called rotations or loadings, depending on which function you use, and prcomp uses x for the rotated data, i.e.
the data after principal component analysis, while princomp calls the same values scores. These are the same things, just named differently.

One thing one should mention is that principal component analysis is sensitive to scaling. It makes a difference whether we report, for example, temperatures in Kelvin or in degrees Celsius, or even in that arcane other unit that we sometimes use in North America. Even though temperature is equally well described in all of these systems, the fact that the numbers are different has an influence on the principal components. You can imagine: if I have data of heights and weights, and the weights are given in tons and the heights in micrometres, these would be very, very different scales. All of the variance in the data would be dominated by the numeric values of the micrometres, the principal components would correspondingly just focus on the height scale, and the other information would be lost. So how do we fix that? Well, what we usually do is some kind of variance normalization. We calculate the spread of the data and divide each individual dimension by its standard deviation, and then everything is on an equal scale. That's a good approach if the dimensions are independent; if they are not independent of each other, we can introduce some artifacts this way. But as a first approach it's often reasonable to do a variance normalization of your data: basically just compute sd() for a data column and divide the entire column by that value.

Well, okay, so let's do some code. Let's play with it. It's the same procedure as before: open the RStudio project for this unit, the dimension-reduction script; I'll pull it out again. Let's start with the same synthetic data example: 500 normally distributed samples, uncorrelated. So build vectors x1 and y1 of normally distributed samples, and then we generate y2, which is correlated with, dependent on, x1: y2 is just two times x1 plus y1. When we do that, we notice that it has a noticeably non-zero mean, so we subtract the mean from the data, and now it's virtually zero. We also notice that it has a standard deviation which is not equal to one, so we divide the data by the standard deviation, and now it is equal to one. And then we plot. So basically what we've done here is generate data that has a linear correlation and an error term, and then normalize it so that the mean is the same as for x and the standard deviation is equal to one.

So let's plot this. I've shown you how to plot things and then add elements to the plot with commands like text(), lines(), points(), and other additions. But we can also put more than one plot into the same graphics window, and that uses one of the graphics parameters. There's quite a large set of graphics parameters that you can set globally for your session; for example, you can set the number of rows and columns, which then determines where R is going to place its next plot. So if I have two rows and two columns, the next plot will go into the top left, the next into the top right, the next into the bottom left, and then the bottom right, and then I have four plots in the same window. Of course we need to reset that if we want to plot different things, because the graphics parameters are sticky.
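Here's a minimal sketch of the steps just described; the exact script in the course project may differ slightly, and the object names are mine.

```r
set.seed(1)                   # for reproducibility; the in-class numbers will differ
N  <- 500
x1 <- rnorm(N)                # 500 normally distributed samples
y1 <- rnorm(N)                # a second, independent set

y2 <- 2 * x1 + y1             # linear dependence on x1 plus an error term
y2 <- y2 - mean(y2)           # centre: the mean is now virtually zero
y2 <- y2 / sd(y2)             # scale: the standard deviation is now one

par(mfrow = c(2, 2))          # the next four plots fill a 2-by-2 layout
```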
The graphics state isn't automatically reset when we make a new plot, so we have to reconstruct the state we had beforehand. The par() function has the side effect of setting the graphics state, and it also returns the previous value of the graphics state, which is kind of crafty: it doesn't return what it just did, it returns the way things were previously. In this way we can run par() to set a new graphics state and save the old graphics state in a variable, say opar for old parameters. Then, when we're done with our plots, we can reconstruct the previous graphics state, i.e. get back to the behaviour where one plot fills the entire plot window, by calling par(opar). That restores the graphics state parameters. Now, I usually get this wrong: I either forget to reset the graphics state, or I have a typo, or I forget how to use it properly. One way to reset the graphics state is simply to close your plotting window; once a new plotting window is opened, it comes with default parameters.

So in this case, once we execute this, we can put four plots in the same window. The first plot is a histogram of x1; that's this histogram, and it goes into the top left. The next plot is a histogram of y2, the histogram of our new data. Remember, we've normalized it, so the means are the same and the standard deviations are the same, and simply from looking at the histograms we could not tell in this data set whether x or y would be more important, or what's going on with the data. Now if we plot x1 against y1, we see that this data set is essentially uncorrelated, and if we plot x1 against y2, we see that there's a large linear component in there.

Now, calculating the principal component analysis of x1 and y2, i.e. of this plot, is very easy: simply run prcomp() on these two columns. We want a matrix, but what we have are two separate vectors; we can generate a matrix with the cbind() command. cbind() binds two columns together, or binds a column onto a matrix; cbind is short for column bind, and there's a cognate rbind() to bind rows together, which we can also use. If we do that, we get an object which I've called pcaSample. What is this? Well, that's the output we get here: the standard deviations of pcaSample are 1.3 and 0.3. So you already see that after rotation with principal component analysis, most of the variation in the data is in the first principal component. You have to think of these principal components as the new x and y axes, the axes we get after rotating the data, or that the rotated data correspond to. And the rotation is defined here: minus 0.7 and so on, which basically corresponds to a 45-degree rotation. Now, as I showed you on the slide, there are a number of values available in that pcaSample object: the standard deviations, the rotation, and so on. Let's look at those. The standard deviations are printed by default; we can get at the rotation individually like this; and we can use the summary() command. Remember that summary() gave us minimum values, quantiles, means and medians for vectors; it has a special method for PCA objects, so it behaves differently here and gives us the importance of the components.
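And the save-and-restore pattern for the graphics state, followed by the four plots and the PCA itself; pcaSample is just my name for the result, continuing the sketch above.

```r
opar <- par(mfrow = c(2, 2))  # par() sets the new state and returns the old one
hist(x1)                      # top left
hist(y2)                      # top right: indistinguishable from hist(x1) by eye
plot(x1, y1)                  # bottom left: essentially uncorrelated
plot(x1, y2)                  # bottom right: strong linear component
par(opar)                     # restore: one plot per window again

pcaSample <- prcomp(cbind(x1, y2))  # cbind() builds a two-column matrix
pcaSample$sdev                # most of the variation sits in PC1
pcaSample$rotation            # roughly a 45-degree rotation for this data
summary(pcaSample)            # importance / proportion of variance per component
```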
So the proportion of variance here: about 95% of the variance is now found in PC1 after rotation, and only about 5% is found in PC2. So PC1 is roughly 20 times more important than PC2, and that might lead us to conclude that we can just toss out PC2 and keep working with PC1. pcaSample$x gives us the rotated values, and if we plot them, we get this. Oh, did I forget to run this? I don't remember why I set it explicitly; maybe just for nicer numbers, of course it would work the same way without that. Ah, yes, okay, so that's why. So what's the difference? Right, the data is random, but the scale is very different. What plot() does by default is optimize the axes so that we see the largest amount of data along both axes. But with the plot I made previously, I wanted to emphasize that all of the variance is now in just one dimension. Plotting it this way shows me that, because the y-axis limits are now on the same scale as the x-axis limits. I'm wasting a lot of space; well, in this case I'm not wasting space, I'm demonstrating something through empty space.

Okay, so if we compare the histograms before and after rotation: this is essentially the plot that we've seen before, the two histograms before rotation, and this is after rotation. Most of the variance of the data is in the first principal component and much less of it is in the second principal component. If we plot the sample along the principal components as axes, that's what it looks like: this is a plot of the rotated sample using the new axes. The object pcaSample is a list of five elements: sdev, rotation, center, scale and x. Some of these have attributes, such as dimension names x1, y2, PC1, PC2, and the class is "prcomp". So plot() recognizes that prcomp is a special kind of class and treats it accordingly.

So that's a very simple synthetic example, a very simple data set. Let's look at something that's a bit more biological. I remember when I was a medical student, I was volunteering in the department of physiology and looking at what people were doing there. Some people were interested in salmon: the kidney physiology of salmon is actually quite interesting, because they spend part of their life in salt water and part of their life in fresh water, and the kidneys have to adapt to that. So they were studying that, and as a side effect, every now and then they had a large batch of salmon; they took out the kidneys for study, and then of course the rest of the fish had to be otherwise disposed of. So we had excellent salmon at that time. Now, the researchers here apparently like to go scuba diving, and they're in Western Australia, probably a nice place for scuba diving, and they study crabs. The question they had is: if we do a morphometric study of crabs, can we distinguish the species and the sex of the crabs? God, I can never remember, is it two genders or two sexes? I think with crabs it's sex and not gender, so if I use them interchangeably, it doesn't really mean anything. Okay, so there are blue and orange forms of these crabs, and there are males and females, and we can take a number of measurements. We catch a crab and then we measure its frontal lobe size, that's this measurement here, and its rear width, with calipers, all the time trying very hard not to get pinched because the crab doesn't agree; and the carapace length, the carapace width, and the body depth. So these measurements should somehow characterize the crab by its shape.
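As a small sketch of the plotting detail I just mentioned: forcing both axes onto the same limits makes the concentration of variance in PC1 visible, and str() shows the five elements of the prcomp object. Object names as in the sketch above.

```r
lim <- range(pcaSample$x)                 # common limits taken from all rotated values
plot(pcaSample$x[, 1], pcaSample$x[, 2],
     xlim = lim, ylim = lim,
     xlab = "PC1", ylab = "PC2")          # a wide, flat cloud: the variance lives in PC1

str(pcaSample)    # List of 5: sdev, rotation, center, scale, x; class "prcomp"
```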
They're related species, but maybe they're different enough. The question then is: once we've done these morphometric measurements, can we infer which crab is which? The data set is included as one of the sample data sets that ship with R: you simply load the MASS library and load the data set crabs. Now, that's interesting: R reports that the data set crabs has been loaded, but nothing actually exists yet; it's a promise. One of the ways R is able to do things efficiently is that it defers activities it doesn't actually need yet as promises. So we've loaded the data set only in principle, but since we're not actually using it yet, we're not reserving memory or spending processor cycles on it; it's just there in principle. If we look at the head of the data, we see the species, blue in the first six rows, the sex, the counting index, and then the measurements for frontal lobe, rear width, carapace length, and so on. And now the promise has changed into an actual data set: we've used it, so at that point R said okay, now we actually need to load it and access it.

So let's have a look at that. First of all, I'd like to be able to annotate the data, and in order to annotate the data I want a code that characterizes each crab. We have letters for species, B and O, and letters for sex, M and F. So B-M should be blue males, O-F should be orange females, and so on. To get that, we use the paste() function to take the letters from the first column and the letters from the second column and just paste them together. These are little labels that characterize our crabs, and as you can see, the first set of crabs are all blue males, then we have all blue females, then all orange males, and all orange females, and we have 50 of each. That's very nicely cleaned-up data; usually it's messier, but just keep that in mind: 50 blue males, 50 blue females, and so on.

Now, this is one time we actually want to use factors. If we take these labels and turn them into factors, and assign that to the variable fac, you see the first six entries are now B.M as labels; they're factors now, no longer strings, so they don't have quotation marks around them. And whenever you print some factors, R will always tell you what all of the possible levels are: here B.M, and we also have B.F, O.F, and O.M. If we look at four of the factor values, the 1st, the 51st, the 101st, and the 151st, their underlying codes are two, one, four, three; that's the contents of the factor. We can turn them into numbers, two, one, four, three, and use these numbers to get different colors, or different plot symbols, or different line types, and thus make plots where the types of crabs are distinguished as we plot them, so that they're not all circles but different plot symbols. So that's what we do here: we plot crabs columns 4 to 8, where the plotting character is the factor we've just determined, turned into numbers. So that's what our labels, what our actual data, look like. Quite highly correlated.
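Putting those steps together in one place; this is a sketch, and the label and variable names such as fac are my choices.

```r
library(MASS)                         # the crabs data set ships with MASS
data(crabs)                           # registered lazily, as a promise
head(crabs)                           # sp, sex, index, then FL, RW, CL, CW, BD

# Two-letter group labels, species and sex pasted together, e.g. "B.M"
fac <- factor(paste(crabs$sp, crabs$sex, sep = "."))
head(fac)                             # B.M B.M ... with levels B.F B.M O.F O.M
as.numeric(fac[c(1, 51, 101, 151)])   # 2 1 4 3: one integer code per group

# Pairs plot of the five measurements, one plotting symbol per group
plot(crabs[, 4:8], pch = as.numeric(fac))
```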
I didn't draw a legend here, so I can't tell you offhand which ones are the circles, which are the crosses, which are the pluses and which are the triangles, but one of them is blue males, another is orange females, and so on. So now the challenge: we have a five-dimensional data set, but what we see are two-dimensional projections like this, and the question is how we take this information to actually distinguish between the crabs, or whether we can do that at all. What you see immediately is that there's a lot of overlap. So the question is: can principal component analysis, or some other procedure, help us remove the overlap based on these measurements? Even if you assume, and probably, because we're working on this example, it's true, that some combination of the morphometric measurements is able to distinguish the different crabs, it is not possible to do it with the original measurements and the pairwise correlations of those measurements in two dimensions; there's simply too much overlap. Can we imagine which combination of these dimensions we would need to do that? No. This is where our capability to visualize and imagine data breaks down. We can do this easily in two dimensions, but none of the two-dimensional presentations here actually helps us.

So let's apply principal component analysis to the five dimensions. The dimensions we want to work with are not columns one and two, because those characterize the crabs, and not column three, which is just an index, but columns four, five, six, seven, and eight. So we compute the principal component analysis, pcaCrabs, on columns four to eight, and we plot that. We see that the principal component analysis is quite successful: there's a huge first principal component that holds most of the variance, and the remaining information sits in the other principal components. So what does this mean and what does it do? If we look at the summary, the part of principal interest in these summaries is often the cumulative proportion. Basically, this takes the first principal component, the one with the largest variance, and asks what proportion of the variance we have in that; then we add the next and the next and the next, so we can follow how the 100% of variance is built up. With the first component we have 0.98; adding the second component takes us to 0.99; adding the third takes us to 0.999, and so on. So almost all of the variance is in the first component.

Now, there's a standard plot that we use for such results, the so-called biplot. The biplot plots the rotated data points from pcaCrabs and overlays arrows for the original variables, so we can see how the original measurements are oriented and how the rotated data lies along the new principal components. We can add, okay, so these are now the numbers: one is the blue males, two is the blue females, three is the orange males, and that's the biplot of the first principal component against the second principal component. After principal component analysis, we see that the data actually is separable to a certain degree, but there's very large overlap between groups 1 and 4, we can't distinguish them perfectly, and it's all kind of following the same trend. So let's look at plotting along the other principal components. This was 1 against 2; this is 1 against 3, the first against the third principal component, and it's separated in a different way.
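A sketch of the PCA and the biplots as described; pcaCrabs is my name for the result, and fac is the factor from the sketch above.

```r
pcaCrabs <- prcomp(crabs[, 4:8])      # PCA on the five morphometric columns

plot(pcaCrabs)                        # variances: one huge first component
summary(pcaCrabs)                     # cumulative proportion: ~0.98, ~0.99, ~0.999, ...

# Biplots of the rotated data, with arrows for the original variables
biplot(pcaCrabs, xlabs = as.numeric(fac))                      # PC1 vs PC2
biplot(pcaCrabs, xlabs = as.numeric(fac), choices = c(1, 3))   # PC1 vs PC3
biplot(pcaCrabs, xlabs = as.numeric(fac), choices = c(2, 3))   # PC2 vs PC3
```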
Again, there's a very large amount of overlap here. Number one was the strongest principal component; number three is a relatively small principal component, and that's what we see here. Now here's what we see when we plot 2 against 3, the two smaller principal components. What do we have here? What's going on? We have almost perfect separation of our four categories if we plot principal component 2 against principal component 3. Didn't we just say that all of the information is actually in the first principal component? So we can get separation if we look at the noise, but not if we look at the data, or am I misrepresenting something here? What's going on? Any idea? Why do we have these huge correlations in the first place, and why can we see the categories in a representation that doesn't even include the dimension carrying those huge correlations?

Well, actually, all of the measurements were highly correlated with something, and Brandon is absolutely right: they were all correlated with a confounding factor that we didn't take into account here, something like weight, or volume, or age of the crab. The older they get, the larger they get, but all of the dimensions grow essentially in the same way, and we didn't require that all of these crabs weigh exactly 50 grams or 500; I don't know how large they actually get. They do look tasty. So that's what's happening here: we have this huge correlation, but it's due to a confounding factor, and once we remove it through principal component analysis, the actual information in the data becomes apparent. What we're doing in the second and third principal components is essentially classifying the crabs not by that size correlation but by overall shape, independent of the absolute size. We could probably get a similar effect if we took an average of all the measurements for each crab and simply divided everything by that average; then we would see more of this information. But this principal component analysis nicely pulls out, and is able to remove, the confounding factor.

Now, that's a special example. It's not always the case that you have to throw out the first principal component because it corresponds to a confounding factor; that would certainly be false. But if there is a confounding factor, it is often very strong in all of the data points, because it affects all of the data points equally. So that's one way to detect it. Mohamed? "In machine learning, at least based on my vague understanding from an undergrad course, we often look at the components with the highest variation, keep those, and throw the smaller variations away." It depends. Go on. "Sorry, I was just wondering: this example very nicely shows the counter-example of what is often done in machine learning. In other words, here we should throw away the largest variation and keep the smaller ones." Yeah. So it depends both on the type of machine learning you do and on the computational resources that you have. Some types of machine learning are quite sensitive to features that are not informative, like the confounding factor here. In that case we would do a pre-analysis, find that the first principal component should be removed, and then work only with the second and third principal components. This reduces the number of dimensions and removes information that is misleading.
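To illustrate the alternative I just mentioned, dividing each crab's measurements by its own average to strip out overall size: here's a rough sketch. This is my own illustration, not the course script, and it only approximates what the second and third principal components achieve.

```r
# Rough size normalization: divide each crab's measurements by its own mean,
# so that what remains is shape rather than absolute size.
sizePerCrab <- rowMeans(crabs[, 4:8])
shape       <- crabs[, 4:8] / sizePerCrab

pcaShape <- prcomp(shape)
summary(pcaShape)        # the dominant "size" component is largely gone
plot(pcaShape$x[, 1], pcaShape$x[, 2], pch = as.numeric(fac))
```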
If we have fewer dimensions, we can run more cycles of optimization on the classifier, get better classifiers, and then everything works better. However, that's not really true anymore for modern machine learning approaches like deep neural networks: they don't care. We just throw all the numbers we have at a deep neural network and let the network figure it out. As long as your training data is well conditioned, covers all of the possibilities, and is correctly annotated, you are likely to get good results. Of course, training runs will still be longer if you have uninformative features, but it's not necessarily the case that uninformative features will lead to less accurate results. So, I don't know. There's this modern perspective that we might not really need data science anymore, we just need large clusters and large machines, and we don't need to worry about features as much as we used to. But I don't think data analysis will go out of style very quickly, for a number of reasons. You still have to prepare the features in some way and understand which features are available in the first place, and the most important part is asking intelligent questions. You never treat data purely mechanically; you analyze it in different ways and you interpret what you see.

So, as I said, this is an example that's interesting because we can identify a confounding factor. But to say we just throw small components away and keep large ones, that would not be correct. What I would do is drive it from the performance of the classifiers: try different ways of preparing features, feed them into the machine learning algorithm, and then see which ones give the best classifications in the end, keeping maybe the top ones. "In genetics, for example, if you want to control for population stratification, you actually want to look at the population and see whether it separates out, and that's how you decide which components to keep." Right. Here we're classifying, so we looked at the top components, saw that the top ones don't decouple or decompose the different types, and moved down to lower ones. So I would say it's never one-size-fits-all. You have to look at your data: look at it, play with it, learn about it. Simply looking at the first-versus-second and first-versus-third plots that we had before, it wasn't clear that we could separate the data at all. That might have gone on for all of the dimensions, and then that would have told us: what we're measuring here simply doesn't correspond to our categories.

Okay, I have a small task for you, and that is to plot this last plot, i.e. this kind of plot, not as a biplot but simply as a scatter plot of the rotated values, in a way where your plotting symbols correspond to the sex and the species of the crab: orange and blue circles for females, and orange and blue triangles for males. The reason behind that is that interpreting scatter plots is often tremendously helped by identifying which data points we're actually looking at. If we see outliers on a scatter plot and the outliers are just circles, that doesn't tell us a whole lot. But if the outliers are labeled with gene names, then we can perhaps recognize that all of these are known cell-cycle genes, and that tells us something.
So one of the things I'd like to start doing here is: when we make scatter plots, decorate the values with information we have about them. In this case the information would be encoding the categories of species and sex into the shape and the color of the plotting symbols. And I'm being deliberately vague: you don't yet know how to make circles and triangles in a way that you can color them. We've talked a little bit about specifying colors and so on, but we're talking crabs, so I'm throwing you into the deep end. Try not to get pinched. See if you can solve this, and before you despair, put up a red sticky note so we can help you out and throw you a life buoy.

So I'd like to walk you through one possible way to do it; if you have very different ideas, we'll discuss them as we go along. The problems we need to solve are: first, we need a plotting frame that will accommodate all of the data; then we need a way to plot the points group by group; and then we need a way to assign a particular shape and color to each group of points that we plot. Plotting the values in the first place requires nothing more than plotting columns one and two of pcaCrabs$x. That's essentially the underlying plot that we had, but right now they're all circles. In order to plot some of the values with different symbols, we can plot, for example, only the first 50 values; all of the first 50, I think, are blue males. So that's what that looks like. Now, R automatically adjusts the size of the plotting frame when we do that, so now we would have no more space to plot the orange males and blue females and so on. That's not good: we need to define a plotting window that will accommodate all of our data. We talked about x limits and y limits before, so can we use xlim and ylim? Yes, in principle. If we call range() on column one of pcaCrabs$x, we get the smallest and largest values. So we can plot the same thing, column one against column two, and specify that the x limits should be the range of all the values in column one and the y limits should be the range of all values in column two. That creates a plotting window large enough to accommodate everything. So that's one way to make the plot window large enough.

There's another way, which is probably preferred and much easier, and that is simply to set up the plot window without showing anything, as an empty window, and then add everything we want to show through the appropriate commands: points(), lines(), text(), and so on. If I remember correctly, we simply need to specify type = "n", for none. So now we plot all of the values, but we don't actually show anything; the window is empty. And now we use not plot() but points(). The points() command works exactly like the plotting command, except it doesn't set up a new plot window, writes nothing into the margins, and doesn't do axis ticks or labels or anything; it just plots symbols for whatever data we give it. Oh, yeah, of course, if that limit is wrong, of course there's no difference. So here's my empty plotting frame, and here's plotting the first 50 values as points. I'm doing this in a very pedestrian way, and I hope you understand very well that there are a lot of opportunities for typos here, for doing things wrong. This is quick and dirty, but quick. We can get the categories here. Okay, so that's our second set of points, our third set of points, and our final set of points. What did we gain again? Well, several things.
First of all, this was wrong anyway: we didn't want components one and two, we should have had... what was it? Two and three. Okay, so let's change that: two and three. Oh, if you haven't seen this yet: we can use the Alt key in the RStudio editor to select text in columns, or to enter, delete, and replace text in columns; if things are lined up, this is quite quick and easy. Okay, so let's do this again: plot the empty frame, plot the first data set, the second, the third, and the fourth. Yay, it looks just like before. So what did we gain? Well, what we gained is that we're now plotting each group independently, and we can add further arguments to these points() calls to do different things. For example, we can define different plotting characters. What plotting characters should we use? Triangles for males. How do we specify that? What's the plotting character for color-filled triangles? How do we find out? Do we do question mark pch? I'm not sure that would work. It's there? ?pch works? Awesome, very good. So the triangles would be 17 and the circles would be 16; we need the filled ones because we want to fill them with a color. Okay, so which of these groups are the males? I think it was the first and the third one, right? So those get 17 and 17. Okay: the first group triangles, the second circles, the third triangles, and the fourth circles. Okay, except that they're not colored yet. Should we use hex colors? Yeah, let's use hex. I'm going to remove some spaces here so it doesn't wrap. Okay, so the first one is blue; let's use something of a light blue with a little bit of cyan. And then we have two orange groups. The orange is very light, it's more like a dark yellow; let's make it deeper. There we go. Now we can actually see that the separation is very good: there's only one overlap that we can see here, and for everything else we could potentially draw a convex hull that covers all of the points of one group and none of the others. So that's a very good separation, and that solves the task.

Would you have done it fundamentally differently? Any other ideas? Anything confusing about this? Could you do this for your own data? There are more elegant ways to achieve these things, but I think this is very pedestrian: nothing that we haven't done before, no special new tricks. Yeah, right. Awesome. Exactly. So basically the way you do that is you generate a vector that contains the individual color values, and you use that vector for the colors or for the plotting characters. I'll write up a little sample solution tonight that we can run through tomorrow, and I'll also add a little bit of code, because it's quite informative here, that scales the plotting character by the average size of the individual measurements. The hypothesis is that the young animals would not be easily distinguished: the smaller they are, the more closely they should lie together, and the larger they become, the more characteristic the differences become. That's what I would hypothesize about data like this, and we can easily test it and show it on a plot like that if we scale the plot symbols by average size. So we'll look at that function tomorrow.
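For reference, here is the kind of sample solution I have in mind, written with a color vector and a symbol vector as just discussed. The specific hex colors, object names, and the size scaling are illustrative choices and may differ from the write-up I'll post tomorrow.

```r
library(MASS)
pcaCrabs <- prcomp(crabs[, 4:8])
fac <- factor(paste(crabs$sp, crabs$sex, sep = "."))   # B.F, B.M, O.F, O.M

# One symbol and one color per crab, driven by the factor:
# filled triangles (17) for males, filled circles (16) for females;
# blue-ish for species B, orange-ish for species O (colors are my choice).
symPerLevel <- c(B.F = 16, B.M = 17, O.F = 16, O.M = 17)
colPerLevel <- c(B.F = "#4499EE", B.M = "#4499EE", O.F = "#EE9944", O.M = "#EE9944")
sym <- symPerLevel[as.character(fac)]
col <- colPerLevel[as.character(fac)]

# Scale the symbol size by each crab's average measurement (the size hypothesis)
cexSize <- rowMeans(crabs[, 4:8])
cexSize <- 0.5 + 1.5 * (cexSize - min(cexSize)) / diff(range(cexSize))

plot(pcaCrabs$x[, 2], pcaCrabs$x[, 3],
     pch = sym, col = col, cex = cexSize,
     xlab = "PC2", ylab = "PC3")
```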