Coming back from our coffee break, I made a slight modification to the code we just wrote: I put the digit 9 into the format specification. This says that the string should be padded to a field width of 9 characters. That way I can align my output nicely when writing tables. There are quite a number of options to customize the behavior of the sprintf() command, and I will write a little cheat sheet on the most important ones and include it in one of the next projects so you can look them up. That is probably going to be useful. Are there any questions on what we've done so far?
Then let's do some more. We mentioned quantiles, and I'd like to give you some more examples of how quantiles are useful for looking into data and seeing whether it has particular properties. A quantile is basically a threshold that has a given fraction of the values below it and the rest above it. If we calculate some values and look at their quantiles, I can illustrate better what that means. We start with a sequence of numbers from minus 4 to 4 in steps of 0.1. Then for each of these x values I calculate the expected density of the normal distribution with mean 0 and standard deviation 1.
That's what I'm just getting to. For the built-in probability distributions there are canonically always d, p, q, and r versions. For the normal distribution these are dnorm(), pnorm(), qnorm(), and rnorm(). For the uniform distribution, i.e. the same probability for all values in some interval, they are dunif(), runif(), and so on. dnorm() gives you the density, i.e. the value of the function at a particular point. Using points that I calculate with dnorm() along a sequence of regularly spaced numbers allows me to plot the function: essentially, dnorm() is the y value for an x value that I give it. So my x values are simply the numbers from minus 4 to plus 4.
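As a rough sketch of the two things mentioned so far, the field width in sprintf() and the d/p/q/r family of functions, something like the following could be tried. The format strings and the seed are illustrative assumptions, not the code from the session.

```r
# Field width 9: shorter output is padded on the left (assumed format strings).
s <- sprintf("%9s", "abc")
n <- sprintf("%9.3f", pi)
nchar(s)   # width of the padded string
nchar(n)

# The d/p/q/r family for the normal distribution.
x   <- seq(-4, 4, by = 0.1)          # regularly spaced x values
f   <- dnorm(x, mean = 0, sd = 1)    # density: the y value for each x
p   <- pnorm(0)                      # cumulative probability up to 0
q90 <- qnorm(0.9)                    # 90% quantile
set.seed(1)                          # arbitrary seed, for reproducibility only
r   <- rnorm(5)                      # five random normal deviates
```

The same naming pattern applies to other distributions, for example dunif(), punif(), qunif(), and runif().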
My f values, the function values, are the densities that correspond to them. And q90, which I get from qnorm(), is the quantile at 90% probability. It is a value that I can mark on the x-axis: 90% of all values of a normal distribution with the same parameters will be smaller than it and 10% will be larger. And then I can plot that.
So let's do a plot. Plots in R are very simple. You give some x values and, optionally, some y values. We can specify labels for the x- and y-axis. We can specify whether the type is lines or points or both, and we'll see many more examples later on. In this case I also specify the line width: the default is 1, and I want a very thick black line here. If I execute this, that's the plot I get. This is the bell-shaped normal distribution over the interval from minus 4 to 4, with the x values on this axis and the density values on that one.
After this plot, I'm going to add an abline. ablines are a way to draw single straight lines over a plot to emphasize something. There are other ways to call abline(), but two of the main ones are with v equal to some value or with h equal to some value. v means give me a vertical line at that value, here q90; h means give me a horizontal line at whatever value I request. So: an abline, vertical at q90, the color should be red, and the line width should also be 5. Now 90% of all values lie to the left of that red line, and 10% lie to the right.
We think of significant outliers as corresponding to plus or minus two standard deviations. How can we draw lines that correspond to two standard deviations into this plot? Do we need to calculate the standard deviation? Could we use q90 as the value for v? No. q90 is not equivalent to a standard deviation. So v is the right argument, but it needs something else? Yes, exactly. I need a different value here.
But what value should I use? Can we just use two? Why just two? Because the standard deviation is one. And why is the standard deviation one? Because we assumed a standard normal. Because I've defined it to be one, you're absolutely right. I'm using function values of the normal distribution with a mean of zero and a standard deviation of one, so two standard deviations are the points at minus two and two. Let's draw them: abline with v equals, and since we want two lines, we pass both values, minus two and two. And let's make the color sea green and the line width two. So these are the two standard deviations.
Note that two standard deviations are not the same as the Q90. About 95% of the values lie between minus two and plus two, so if we took the absolute values of the distribution, two standard deviations would sit near their 95% quantile, a little beyond their Q90. If we take the entire set of values, which can also be much, much smaller, the Q90 shifts to the left, to about 1.28.
Okay, so we've seen something interesting. We've seen plots in R, we've seen how to generate plots of functions, and we've seen one way to put more information into a plot by adding lines to it. Yes, lwd is line width. It's one of the plotting parameters: if we plot lines, we can specify different line widths. For plotting colors and lines I actually do have a cheat sheet, which we'll get to in a moment. But it's not wrong to write it down right now; it makes it easier to remember.
Now let's take the same probability distribution and look at empirical quantiles. What we had so far were theoretical quantiles: the value was obtained from qnorm(). Empirical quantiles are quantiles that are determined from the data. In general I don't know how the data is distributed. Here I happen to take it from random values of the normal distribution, but these could lie all over the place.
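The density plot with the theoretical 90% quantile and the two-standard-deviation lines described above might be reconstructed roughly like this. The variable names and axis labels are assumptions; the colors and line widths follow what was said.

```r
x   <- seq(-4, 4, by = 0.1)
f   <- dnorm(x, mean = 0, sd = 1)
q90 <- qnorm(0.9, mean = 0, sd = 1)

# Thick black density curve
plot(x, f, xlab = "x", ylab = "density", type = "l", lwd = 5)

# Vertical line at the theoretical 90% quantile
abline(v = q90, col = "red", lwd = 5)

# Vertical lines at plus and minus two standard deviations
abline(v = c(-2, 2), col = "seagreen", lwd = 2)

# Fraction of the distribution that lies within +/- 2 sd
inside <- pnorm(2) - pnorm(-2)
```

Passing a vector to v draws one line per element, which is why a single abline() call is enough for both standard-deviation lines.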
So when I ask for the quantiles of this sample, it first tells me the range of the values that I got: the 0 and 100 percent quantiles go from minus 2.3 to plus 2.6. The 25 and 75 percent quantiles bracket the middle half of the values, and the value in the middle, the median, is a small number. That's the default for quantile(x). But I can also call quantile() and explicitly request a certain set of probabilities. The quantiles of x for probabilities of 10, 20, and 90 percent are calculated in the same way. So these are now the actual empirical quantiles of the sample I give it. That's the way you would calculate quantiles for a set of data values. Of course, you don't know whether your data values correspond to a normal distribution or to a t-distribution or a binomial or whatever statistical distribution the data is drawn from. In the best case, an underlying model for the distribution can be obtained from your exploratory data analysis. But this is how you determine quantiles empirically from your data: essentially you rank the values and then cut the ranks at these cut points.
So let me recreate my plot. Since my x values were taken from the same kind of normal distribution, with the same mean and the same standard deviation, I can draw these quantile lines over my plot to get the 10, 20, and 90 percent quantiles. Here we go. The 10 went wrong. So this is the 90 percent quantile of my empirical distribution, and the red line is the 90 percent quantile of the theoretical distribution. They're close, but they don't exactly coincide. Why? Because my random numbers conform to the shape of the normal distribution only more or less. So these are empirical quantiles.
A good way to characterize data is box plots. That's another way of producing plots.
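To make the comparison of empirical and theoretical quantiles concrete, here is a minimal sketch. The sample size of 100, the seed, and the line colors are assumptions, so the numbers will differ from the ones quoted in the session.

```r
set.seed(100)          # arbitrary seed, for reproducibility only
x <- rnorm(100)        # assumed sample size

quantile(x)            # default: 0, 25, 50, 75, and 100 percent
qEmp  <- quantile(x, probs = c(0.1, 0.2, 0.9))   # empirical quantiles
qTheo <- qnorm(c(0.1, 0.2, 0.9))                 # theoretical counterparts

# Overlay the empirical quantiles on the theoretical density
xs <- seq(-4, 4, by = 0.1)
plot(xs, dnorm(xs), type = "l", lwd = 5, xlab = "x", ylab = "density")
abline(v = qnorm(0.9), col = "red", lwd = 5)     # theoretical 90% quantile
abline(v = qEmp, col = "steelblue", lwd = 2)     # empirical 10, 20, 90%
```

Printing qEmp and qTheo side by side shows how far a sample of this size strays from the theoretical values.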
Again we'll take more values: a thousand normal deviates with a mean of five and a standard deviation of 2.5, and then boxplot(x). This is the box plot of this distribution. There are many parameters; I thought I had a little sketch of what these lines mean. Anyway, the black line in the middle is the median. The box spans the interquartile range. By default the whiskers extend to the most extreme data point that lies within 1.5 times the interquartile range of the box, and outliers beyond that are shown explicitly as little circles, so you can see whether, and how many, data points lie far outside the bulk of the distribution.
We have to be a little careful with box plots. They're quite useful for getting an idea of a distribution and for comparing distributions of different data, but they can also obscure important structure. Take, for example, a bimodal distribution, i.e. one that is composed of 100 values of a normal distribution centered on minus 2 and 100 values of a normal distribution centered on 2. The second time I run this command, I concatenate the vector x with the new values. Now x is 200 data points long; the first 100 are centered on minus 2 and the second 100 on 2. If I do a box plot of that, it's basically indistinguishable from what I did before. You might say that for a normal distribution the interquartile range seems suspiciously large, but there is nothing to tell you that this is actually a bimodal distribution. If we want the actual shape of the distribution, we can do a histogram, and that clearly shows us that this is not unimodal. There are two peaks in here, and the box plot obscures that.
So there are other ways than box plots to look at this kind of data. A pretty good one is the so-called violin plot.
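The bimodal example can be sketched as follows. The standard deviation of 1 for both modes and the seed are assumptions, since only the centers and the sample sizes were mentioned.

```r
set.seed(2)                                   # arbitrary seed
x <- rnorm(100, mean = -2, sd = 1)            # first mode, centered on -2
x <- c(x, rnorm(100, mean = 2, sd = 1))       # append the second mode

boxplot(x)      # looks much like the box plot of a unimodal sample
h <- hist(x)    # the two peaks become visible
```

Running the two plotting calls one after the other makes the point of the passage: the same 200 values look unremarkable in the box plot and clearly two-peaked in the histogram.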
The violin plot is like a box plot, but instead of drawing a box it draws a density curve on both sides of the center line, like a smoothed histogram turned on its side, which shows you where the values are concentrated. In base R there are no violin plots; we need the ggplot2 package to draw them. Let me see if this works. I have it installed. If you need to install it, it'll probably download lots of stuff. Now ggplot has a syntax that's very different from base R. First I convert my vector x into a data frame, then I define a basic plot, and then I add the violin geometry to the basic plot, and that's what I get. So this is a violin plot of our bimodal distribution with ggplot. Lauren will tell you all about what this syntax is, what the pluses mean, and how that works. You'll see that it's quite a different concept of how to get R to do things.
Now if we plot more than one column with boxplot(), the box plots get drawn side by side. If we do a box plot of two columns of the LPS data, we get these two boxes, and the column names are automatically transferred into the plot window. Here is a way to first draw the box plots of the controls and then the box plots of the LPS values, and we can put a separating line in the middle between them. So we have these two panels. Now if I extend this far to the right, how far do I need to go? This is maybe not a good example. There we go. Now we have all of them. So I can make a nice plot of all of my data, arrange it in different ways, label it in different ways, and so on. As you notice, R doesn't just print labels over each other. It makes an intelligent guess about how much space it needs for its labels and adjusts what it prints into the plot accordingly.
Could you go through the command that produces the graph, the part with the 2 and the 14? Sure. I'm creating a sequence of numbers here: 2, 4, 6, 8, 10, 12, and 14.
These are the column indices of the control values, and I'm creating a second sequence of numbers, 3, 5, and so on, up to 15. If I put these together, I have this vector of column indices, and in this way I order my box plots. The first set of box plots shows the control values for all of the cell types; the second set shows the LPS-stimulated values for all of them. So this expression simply orders the columns in a particular way. And since by default the box plots are centered on the numerical values 1, 2, 3, 4, 5, 6, 7 and so on, putting an abline at 7.5 draws me a nice separating line between the two sets.
Could you instead plot control and LPS side by side for each of the cell subsets, and then add multiple ablines? Sure. Let's say we plot columns 2 to 15, and now I want an abline after every cell type. So I want the ablines positioned starting at 2.5, up to 13.5, in steps of 2. These are the values for the abline. Yeah, that's it. Oh, let's color them red. "color" is not a graphical parameter: that's true, it's called col. "Invalid a=, b= specification": that's also true, I forgot to put the v there. There we go.
So it's very flexible. Once again, in producing R plots you're not at all limited to what any particular package offers. You have all the flexibility to completely customize your plots and to come up with different plots of your own. I always thought maybe I should write an example of how to do a violin plot in base R using a function called density(), which does density estimation: it basically takes a distribution, like a histogram, and draws a smooth line that approximates the curve. But maybe that would be more confusing than illuminating. So just take my word for it: you could do it, and it's not even that hard. Very flexible.
Sorry I'm shifting the window so much. I've been asked to increase the font size even further for the benefit of those in the back row.
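The column ordering and the separator lines can be sketched on made-up data. The data frame dat below is a hypothetical stand-in for the LPS data, assuming an identifier in column 1 and alternating control and LPS columns for seven cell types in columns 2 to 15; all column names are invented.

```r
set.seed(3)   # arbitrary seed
# Hypothetical stand-in: 50 rows, id column plus 14 value columns
dat <- data.frame(id = sprintf("gene%02d", 1:50),
                  matrix(rnorm(50 * 14), nrow = 50))
colnames(dat)[2:15] <- paste0(rep(paste0("cell", 1:7), each = 2),
                              c(".ctrl", ".LPS"))

iCtrl <- seq(2, 14, by = 2)   # indices of the control columns
iLPS  <- seq(3, 15, by = 2)   # indices of the stimulated columns

# All controls first, then all stimulated; separator between box 7 and 8
boxplot(dat[ , c(iCtrl, iLPS)], las = 2)
abline(v = 7.5)

# Original column order, with a red separator after each cell type
boxplot(dat[ , 2:15], las = 2)
seps <- seq(2.5, 13.5, by = 2)
abline(v = seps, col = "red")
```

Because the boxes sit at x positions 1, 2, 3, and so on, any half-integer value of v falls exactly between two boxes.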
I hope that those in the back row feel terribly benefited and that it's now actually readable to everyone. I just have to drag things around a lot more.
Now let's look more at plots. Let's explore some plot types and some lines. There's a plotting reference script here which has an introduction to some types of plots, how to work with colors, what lines there are, how to use coordinates, how to place titles and legends, different types of plot symbols, and so on. There's a lot to say about plots, so the whole file is rather lengthy, but we'll take some highlights here. It's very useful to go through it and explore it some more at home.
We often use the normal distribution for sample data. This is what the normal distribution looks like in a simple plot. Here I create a sequence of values for my x-axis, I create function values using dnorm(), and then plot my function values against my x values. I put a label here and a label there, and I specify a certain line width and a certain type. That's what underlies a lot of the data we have here.
Here's a plot of x and y values. x is just 200 normally distributed values; y is these values cubed, times a scaling factor, plus some other random values. Let's look at what that looks like. Since I now have different values for x and y, I'm getting a scatter plot. For x and y values this is the default way to plot things. By default the scatter plot is drawn with circles, not with lines, and the circles are empty, not filled with white, so points that overlap each other remain visible. That's the standard scatter plot.
There's a fun type of plot which is called a rug plot. The rug plot has little hairs sticking out, so you have a little rug that's added here. This lets you see visually what the density of the points is in one of these projections.
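A scatter plot with rugs could look roughly like this. The scaling factor, the noise level, and the seed are assumptions, since the exact values were not spoken.

```r
set.seed(4)                                 # arbitrary seed
x <- rnorm(200)
y <- x^3 * 0.25 + rnorm(200, sd = 0.75)     # assumed scaling and noise

plot(x, y)                       # default scatter plot: open circles
rug(x)                           # rug along the x-axis (side = 1)
rug(y, side = 2, col = "red")    # rug along the y-axis, on the left
```

The side argument uses the usual R convention for plot margins: 1 is the bottom, 2 the left, 3 the top, and 4 the right.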
So if there are a lot of data points that obscure each other, a rug plot added to the graphics can give you a better indication of what these densities actually look like. By default the rug is drawn for the values that are plotted on the x-axis. If I want the values for the y-axis, I can specify that with side = 2, which puts it on the left-hand side, and I can also specify the color. So this is the rug plot for the y-axis and this is the rug plot for the x-axis.
Bar plots are the kind of plots that we encounter most frequently in our types of seminars. Let's make some bar plots. We have 200 values for y, and if we use the round() function we can round these y values to integers. So these are the rounded y values. The function table() takes a set of elements, which could be characters or numbers or something else, and counts the number of occurrences of each. So I get a table of the kinds of elements that I have in a vector. A table of this looks like this: it tells me I have one value of minus three, five values of minus two, 90 values of zero, one value of four, and so on. That's extremely useful for counting, especially for categorical variables; we use table() a lot for that. This now gives me a number of categories and a count for each, which I can cast into a bar plot, and that's this bar plot.
Let's illustrate this in a different way. Let's make a bar plot of the nucleotides in a gene, of random nucleotides. How do we get random nucleotides? Let's assume the frequencies of our random nucleotides should be uniform. Then all we need to do is use sample() and sample from the vector of the letters A, C, G, and T. Let's give it the length of a gene, say 693 bases, and of course we need to set replace to TRUE. So these are our 693 random A, C, G, Ts. What's the distribution of As, Cs, Gs, and Ts? For that we simply ask for the table of this: 172 A, 177 C, 178 G, and 166 T. And if we bar plot that, we have this distribution here.
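The counting and bar-plotting steps can be sketched like this. The seed is arbitrary, so the individual counts will not match the ones read out above, although they will still add up to 693.

```r
set.seed(5)   # arbitrary seed
# 693 random nucleotides with uniform frequencies
nuc <- sample(c("A", "C", "G", "T"), size = 693, replace = TRUE)
counts <- table(nuc)      # occurrences of each letter
counts
barplot(counts)           # bars are labelled A, C, G, T automatically

# Rounding numeric values gives integer categories that table() can count
y <- rnorm(200)
yTab <- table(round(y))
barplot(yTab)
```

Without replace = TRUE, sample() would fail here, because it cannot draw 693 items from a vector of four without putting them back.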
And notice we don't have 1, 2, 3, 4 but A, C, G, T as labels, because these were the categories. A table already has these, and they're set as an attribute called dimnames, or dimension names. Now if I had my own vector of some other values, say 2, 6, 4, 8, and I bar plot this, I don't get any category names because none have been defined. If I define names and then bar plot, I get the names in my plot. So that's something you might wonder about: how do I get the names as labels into the bar plot? If you have a vector of numbers that you want to plot, you define the category labels as its names. If we produce the numbers with the table() function, the names are already defined. If we do a bar plot of a two-dimensional object, it automatically takes the column names as the labels.
Histograms are very versatile. The simplest way to do a histogram of a set of numbers is just to say hist(x). This is the default histogram of 50 normally distributed values. But often it is very useful not to use the defaults but to specify the boundaries explicitly. R usually makes a very good guess about what the boundaries for the individual bars in the histogram should be, and you can apply different algorithms to determine them. But you can also specify that, for example, you would like five breaks and not more, by saying hist(x, breaks = 5). Then the algorithm will try to use five breaks for the histogram, and it will usually honor this. The number is only a suggestion, though: if it can't do anything reasonable with it for the data, it will instead do something that is more reasonable. So now this is a histogram with five categories. Previously it gave us, I think, 10 or so.
Excuse me, does that refer to the gaps between the bars? Five breaks. One, two, three, four, five. Five breaks in the data.
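Both points, names as bar labels and a suggested number of breaks, can be tried in a few lines. The label strings and the seed are invented for illustration.

```r
v <- c(2, 6, 4, 8)
barplot(v)                            # no category labels
names(v) <- c("A", "B", "C", "D")     # hypothetical labels
barplot(v)                            # labels now appear under the bars

set.seed(6)                           # arbitrary seed
x <- rnorm(50)
hDefault <- hist(x)                   # default break points
h5 <- hist(x, breaks = 5)             # a suggestion; R picks round values near it
length(hDefault$breaks)
length(h5$breaks)
```

Comparing the two lengths shows how many break points R actually settled on in each case.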
Instead of saying five breaks, I can also give it a vector of values. The vector of values should cover the range of the data. For example, I can then split things up into seven quantiles or whatever. So, as with abline, the vector of values is explicit. Let's say this should start at minus three, and the next break should be at zero, and then I want... right? So I'm not confined to equal intervals on the x-axis. I can specify whatever numbers I want as breaks. So here I have a very wide bin and here I have narrower ones.
Like a rug plot on a scatter plot, I can put a strip chart on a histogram, with stripchart(), to show me the actual data values. That's a very good thing because I'm hiding much less of the data. You can see that even though this bin ranges from minus three to minus two, it actually contains only a single value, which is much closer to minus two than it is to minus three. It's just the way the five breaks were distributed over the interval of possible numbers.
Now hist() does two things. One is the side effect of plotting, but it also has an actual return value, and the return value contains the parameters it used in constructing the plot. If we assign the result to a variable called info and calculate the same histogram, info now contains these values. It tells me exactly where the breaks were set; I can use that to modify the breaks if I want. It tells me what the counts were; I can extract them and do some numerical computations with them. It tells me what the densities were: for each bin this is the count divided by the total number of values and by the width of the bin. It tells me where it placed the midpoints on the plot, what the name of my variable was, whether it used equidistant breaks, and so on. Getting at these numbers can be useful, and it may come up in one of the examples where we actually modify a histogram. To access any of them, we write info, a dollar sign, and the name of the component, for example info$counts.
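Here is a small sketch of explicit breaks, a strip chart overlay, and the returned object. Using quantiles as break points is one of the options mentioned above; the choice of five bins, the seed, and the position of the strip are assumptions.

```r
set.seed(7)   # arbitrary seed
x <- rnorm(50)

# Break points at the 0, 20, 40, 60, 80, and 100 percent quantiles,
# which by construction cover the full range of the data
brk <- quantile(x, probs = seq(0, 1, length.out = 6), names = FALSE)

info <- hist(x, breaks = brk)   # unequal bin widths: densities are plotted
stripchart(x, add = TRUE, at = 0, pch = "|", col = "red")   # actual values

info$breaks     # where the breaks were set
info$counts     # number of values per bin
info$density    # count / (total number * bin width)
info$mids       # bin midpoints
info$xname      # name of the plotted variable
info$equidist   # whether the breaks are equally spaced
```

With unequal bin widths the bar areas, not the bar heights, are proportional to the counts, which is why hist() switches to densities in that case.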
Yes, exactly: info and the dollar sign gives me these values. We can color the bars individually in different colors with the col argument of hist(). And then there are the generic plot parameters, like the main title and the x and y labels, that I can apply. So these are the colors that I defined. These are the counts. This is a Greek letter sigma, because the axis here is labeled in terms of standard deviations, since I used the standard normal distribution. How do I get a Greek letter sigma? That can be a bit involved, but you can put explicit formulas on these labels. For more details you'll need to find a tutorial, but the key is to use the expression() function, which constructs formulas in a way that is similar to LaTeX. So we can put not just simple text but formulas and intricate labels into our plots.
And here's a way to actually use the returned information, by adding the individual counts to the plot. We've just done this plot here. When I plotted it, I assigned the return value to the variable h. So h$mids, for example, now holds the midpoints between the breaks. I can use these to place the values, because h$mids gives me the midpoint of each bar on the histogram and h$counts gives me the height of each bar. These are the x and y coordinates at which I can place a little bit of text, i.e. the text of the counts, which I also find in h$counts. And I adjust the position a little and use the same color.
The essential command is text(). Like abline(), or like the command lines() that we'll encounter later, text() puts something onto an existing plot. In this case the command text(h$mids, h$counts, ...) gives me the actual values: x, y, text. So the value 6 is plotted at the height 6 at this x value. These are the x and y values, and the third argument is the content, which has to be text. It could be something else, right?
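Labelling the bars with their counts might look like the following sketch. The colors, the title, and the adj values are assumptions chosen for illustration and not taken from the plotting reference script.

```r
set.seed(8)   # arbitrary seed
x <- rnorm(50)

h <- hist(x,
          col  = "lightblue",                 # assumed bar color
          main = "Histogram with counts",     # assumed title
          xlab = expression(sigma),           # Greek letter via expression()
          ylab = "Counts")

# x = bar midpoints, y = bar heights, labels = the counts themselves;
# adj centers each label horizontally and lifts it just above the bar
text(h$mids, h$counts, labels = h$counts, adj = c(0.5, -0.5),
     col = "firebrick")
```

The numeric counts are converted to character strings automatically when they are passed as labels.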
But I give it these numbers, and the numbers are converted into text. So hist() is very, very flexible, and almost all plot components can be annotated with text in this way. Besides x, y, and the text there is also adj = c(...). This adjusts the text a little bit to the top and to the right; otherwise it would sit right on the top edge of the bar. I think you can easily practice some more with that. If you try things and play with them and you can't get the result that you want, ask me or ask our TAs when you get a chance.
I'd like to spend a little time on QQ plots, quantile-quantile plots, because that's something that I'd like to apply to our data. Remember we calculated the standard deviations for the individual columns, and we saw different values. I would be curious whether the standard deviations of our expression enrichments are normally distributed, that is, the differences between these values. Do they correspond to a normal distribution, or is there something else going on? If it's just noise, I would expect them to correspond to a normal distribution. How can we tell the difference? Using QQ plots is one way to check whether something is distributed similarly to a normal distribution.
So let's again generate some random normal deviates and plot them with the function qqnorm(). If these are random numbers drawn from a normal distribution, then in terms of theoretical quantiles this is where we would expect to find them, and this is where we actually found them in practice. The points lie on a straight line, and that tells me the distribution of the sample quantiles is the same as that of the theoretical quantiles of a normal distribution. In other words, there's nothing that distinguishes my sample from a normal distribution, which is expected, because I drew it from a normal distribution.
Now let's look at what this looks like if we take a t-distribution instead, the t-distribution from t-tests.
As we did before, we create a sequence from minus 4 to 4 in small steps, and we build a first set of function values that follows the normal distribution and a second set of function values that follows the t-distribution. So this is the normal distribution. Now I would like to draw my second distribution into this plot, as an overlay. The way I do this in the most general case is to use the lines() command. We've encountered abline() and text() to put new elements into a plot; there are ways to draw boxes and ways to draw polygons; and this is the way to draw lines. If I want to draw another curve, I just draw the curve point by point with lines(). The command takes the x values and the f2 values; it looks exactly like the plot() call, only I don't have to specify the x and y labels because these are given, and I don't have to specify the density of tick marks or the size of anything. It draws only the line. What's important to remember is that it draws the line into the last used coordinate system: the ranges of the x- and y-axis of the plot window are taken from the last plot that was produced. This is why the line that I draw here can be overlaid perfectly on the last plot. So this is the t-distribution. The t-distribution has fatter tails, so it is higher in the tails than the normal distribution.
Now if I want to label that and add a legend saying which line is which, I can use the legend() command with the labels normal and t2. Things are a little squished here. But now I know which line color corresponds to what.
You might have noticed, if you looked carefully, that I can specify colors explicitly by their color names. At some point we'll be talking more about colors. There are 657 recognized color names hard-coded into R: different grays, and red and black and green, and seagreen, firebrick, dodgerblue, deepskyblue, darkviolet, darkorange, chocolate, coral.
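The overlay of the t density on the normal density, with a legend, can be sketched like this. The two degrees of freedom follow the "t2" label; the legend position, the colors, and the line width are assumptions.

```r
x  <- seq(-4, 4, by = 0.1)
f1 <- dnorm(x)          # normal density
f2 <- dt(x, df = 2)     # t density with 2 degrees of freedom

plot(x, f1, type = "l", xlab = "x", ylab = "density", lwd = 2)
lines(x, f2, col = "firebrick", lwd = 2)   # drawn into the existing axes

legend("topright",
       legend = c("normal", "t (df = 2)"),
       col    = c("black", "firebrick"),
       lwd    = 2,
       bty    = "n")                       # no box around the legend
```

Since lines() reuses the coordinate system of the preceding plot() call, a curve that is taller than the first one would be clipped; in that case the ylim argument of plot() needs to be set in advance.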
Is there mauve? There's maroon. We don't have mauve. How strange. Do we have petrol? No petrol, but peachpuff, papayawhip, pink, and plum. So most of the useful colors are there. There's this nice theory that guys know red, green, and blue and women know a lot more, and may even be able to tell the difference between plum and purple and violet. So you can use the names. Of course that means you need to remember them.
The way I usually specify colors is by so-called hex codes, hex values. Hexadecimal is a numbering system that has 16 values per digit: it goes from 0 to 9 and then continues with A, B, C, D, E, F to make 16. If you have two of these digits, 16 times 16 gives 256 different values, from 0 to 255, and that is how a byte of information is structured internally. So hex digits from A to F often come up when we look at machine code. When colors are specified as hex values, they are usually prefixed with a hash mark. In this case the hash mark does not mean comment. Then come six hexadecimal digits, of which the first two specify the intensity of red, the second two the intensity of green, and the last two the intensity of blue. So: red, green, blue.
This value, with all six digits zero, what color is it? Black: the intensities of red, green, and blue are all exactly zero. And this one is equal to red. If I specify to plot something in red, or I specify the color as hex #FF0000, it's the same thing. Usually when I plot things in red I find this too brilliant and garish, and I tone it down to CC, which makes it look nicer. Classy, I think. More classy. Yeah, I'm not going to do this now. What would this be? Yellow. Full-intensity red and full-intensity green give yellow. Full-intensity green and full-intensity blue we call cyan. Full-intensity red and full-intensity blue is called magenta. So we can get cyan, yellow, and magenta, or red, green, and blue, from the combinations. Now there's a shorthand for that.
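The color names and hex codes discussed above can be checked directly in R. This is only a small sketch using base functions; the variable names are made up.

```r
length(colors())               # number of built-in color names
hasMaroon <- "maroon" %in% colors()
hasMauve  <- "mauve"  %in% colors()

redRGB  <- col2rgb("#FF0000")  # intensities of red, green, blue
darkRed <- col2rgb("#CC0000")  # the toned-down red
yellow  <- rgb(255, 255, 0, maxColorValue = 255)   # hex code for yellow
ff      <- strtoi("FF", base = 16L)                # two hex digits as a number

# A named color and its hex code draw the same thing
plot(1:2, c(1, 1), pch = 16, cex = 5, col = c("red", "#FF0000"),
     xlim = c(0, 3), xlab = "", ylab = "")
```

col2rgb() and rgb() convert in opposite directions, so they are handy for checking what a given hex code actually encodes.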
In R, we can also specify the colors for plots as numbers. One is black, two is red, three, I believe, is green, four is blue, and so on. So you can just iterate through these numbers and generate a certain limited set of colors. That's often very quick and easy to do. For example, if you have a clustering with cluster categories one to five, you can just use the cluster categories to color your cluster elements. It just doesn't look nice: plotting in these primary colors usually looks cheap. So it's better to know more about how to specify colors in a satisfying way. But that's what I did here: col equals one and two. col takes a vector for the legend, first color and second color, and puts these into the legend.
Okay, let's get back. Let's use QQ plots to compare normally distributed samples with t-distributed samples. Remember, for the normal distribution we have rnorm() and dnorm(); in the same way we have rt(), dt(), and so on for the t-distribution, and we can draw random values with rt(). So: 100 t-distributed samples with two degrees of freedom in a qqnorm() plot. It looks like this. In this example you see something that deviates significantly from normality. In the center it correlates quite well, but you can see that the tails are anything but normal. This is what a distribution looks like that cannot be scaled into a normal distribution: there is significant deviation from the line.
We can add a line to this with qqline(); the color is two. Actually, let's plot it like that first. That places a line where the data points would fall if my sample quantiles corresponded to a normal distribution. It makes it very obvious how strongly the sample values disagree with a normal distribution. If I change this to a hex code, slightly less bright... I don't know, I like this better.
Okay, we can also do QQ plots of sample against sample.
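The two normal QQ plots, one for a normal sample and one for a t-distributed sample, can be sketched as follows. The seed is arbitrary and the sample of 100 normal deviates is an assumption; the 100 t-distributed values with two degrees of freedom follow the description.

```r
set.seed(9)                 # arbitrary seed
xn <- rnorm(100)            # normal sample (assumed size)
xt <- rt(100, df = 2)       # t-distributed sample, 2 degrees of freedom

qqnorm(xn)                  # points fall close to a straight line
qqline(xn, col = 2)         # numeric color 2 is red

qq <- qqnorm(xt)            # tails bend away from the line
qqline(xt, col = "#CC0000") # the same line in a toned-down red
```

qqline() draws its line through the first and third quartiles of the sample, so it is anchored in the well-behaved center and makes the heavy tails stand out.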
If you have a set of control values, you can plot your sample quantiles against your control quantiles and determine from that whether there is a significant, or at least an obvious, difference between the distributions. This is a part of data exploration. Your control values might be the values that you would expect for some measurements that you do with a control cell line; then you have sample values, and you can use a QQ plot of sample against sample to compare them. Note that this is now not qqnorm(). qqnorm() is a QQ plot of a sample against the normal distribution, whereas the command qqplot() plots sample against sample. So for that QQ plot let's define x values drawn from a normal distribution and y values drawn from the t-distribution, and then do the QQ plot. It kind of looks like the previous plot, more patchy because the samples fall at less predictable intervals. But you can see that at the tails we have the same big difference.
What does that mean in terms of the data? I assume that by default it uses some number of quantiles; can you change that default? You can specify the type of quantile computation, and it gets a bit involved there. There are nine possible quantile algorithms; they're discussed in this reference paper and can be selected with the type argument. They're listed here, for discontinuous and continuous sample quantiles and so on. So you can customize the quantiles that way. If you only want a certain range, though, only low values or only high values, then you simply use the quantiles to take a subset of the data points that you compare, right? If you only want the values from the mean up to the high points and want to ignore the low points, then you simply subset your data values and use that.
Okay, let's... If we subset the data, how does qqplot() know that it is a subset?
I mean, does it not try to find quantiles on the new subset? In other words, does it not assume that the new subset is a complete data set and then find quantiles on that new data set, which is a subset of the original data? Would it matter? I don't think it would matter. As long as you compare the right quantiles with each other, it doesn't really matter for the plot where in the whole distributions they lie.
So let's try this. Think of columns of the LPS data that you could compare with a QQ plot, and then explore this and interpret the result. Just as one suggestion, and you could think of different things: if you have a cell line and an induced cell line, does the induction look like random noise being added, or is there some trend that you can discern? So try using a QQ plot on some of the LPS data, and think about what kind of question you would like to ask. Essentially the QQ plot allows you to quantitatively compare two distributions: are the two distributions the same, or are they different? This is where your actual knowledge comes in, the biology. What kind of distribution would be of interest in this data? What kind of change in distributions could be of interest? And if you're stuck and you have no idea and you need help, put up a red Post-it.
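As a possible starting point for the exercise, here is a sketch of a sample-against-sample QQ plot on made-up data. The vectors ctrl and stim are hypothetical stand-ins for two columns of the LPS data, and subsetting each sample at its own median is just one way to restrict the comparison to the upper half.

```r
set.seed(10)                 # arbitrary seed
ctrl <- rnorm(100)           # hypothetical control values
stim <- rt(100, df = 2)      # hypothetical stimulated values

qq <- qqplot(ctrl, stim,
             xlab = "control quantiles",
             ylab = "sample quantiles")
abline(0, 1, col = "#CC0000")   # identity line: equal distributions fall here

# Comparing only the upper halves of both samples
ctrlHi <- ctrl[ctrl > median(ctrl)]
stimHi <- stim[stim > median(stim)]
qqplot(ctrlHi, stimHi)
```

With real data, ctrl and stim would be replaced by the two columns to be compared, for example the control and the stimulated values of one cell type.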