So, if you ever want to play with some completely random values: now, it's never really, really random. There's a little algorithm that your computer uses, and it depends on what millisecond it is at the moment, among other things. But if you just want to draw some random values, I'm going to make a variable called random_variables and set it to np.random.random. So numpy has a sub-module called random, it has a bunch of distributions, and from that I want the plain random one. As I said, it's never really random, but it's random enough, and I want 2,000 of those. Let's plot them: plt.figure, figsize, let's make it a bit smaller, let's make it 8 by 6, that's okay. What type of plot do I want? I want a histogram, plt.hist. I want random_variables to be plotted and I want 50 bins, then plt.show, semicolon, there we go.

And now you can see there's no peak to this, it's just completely flat, a uniform distribution. It says that for a value in that little bin I got 43 or 42 of those, and so many of those, and so many of those. So those are just completely random values, and they come out between 0 and 1, as decimals from 0 to 1. So you can play with some data there if you wanted to.

Now back to the Z and the T distributions. I want you to cast your mind back to your thought experiment. You have two groups of patients: group A and group B, group 1 and group 2, your control group and your experimental group. And you take a certain variable, say for instance a white cell count, and you take the mean of the one group and the mean of the other group. You subtract one from the other, and that's the difference in your means. And now you want to know: was that difference significant? What was the probability of finding this value or more, or this value or less? That's the question you have, and you understand now that the difference you found was just one of countless others you could have found.
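Going back for a moment to the random numbers at the start of this part, here is roughly what that histogram code looks like written out. Treat it as a sketch: the variable name, the 2,000 draws, the 8 by 6 figure and the 50 bins come from what was said, while the np.histogram line is an extra added here just so you can see the counts behind the bars.

```python
import numpy as np
import matplotlib.pyplot as plt

# 2,000 pseudo-random values, spread uniformly over [0, 1)
random_variables = np.random.random(2000)

# Assumption (not part of the talk): count the values per bin without plotting,
# so the numbers behind the bars can be inspected directly
counts, bin_edges = np.histogram(random_variables, bins=50, range=(0, 1))

plt.figure(figsize=(8, 6))            # a slightly smaller figure, 8 by 6
plt.hist(random_variables, bins=50)   # histogram with 50 bins
plt.show();
```

With 2,000 values shared over 50 bins you would expect about 40 per bin on average, which is why the bars all hover around that height rather than forming a peak.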
Now you've got to tell the computer: draw that beautiful graph for me, put my value somewhere on it, and tell me what the area under the curve beyond it is. Well, the computer can really draw two types of curves. There are more, but let's stick to the Z distribution and the T distribution.

With a Z distribution, you tell the computer what the mean and standard deviation of the larger population are, if you know them or if you can confidently estimate them. Now that's very rare in healthcare research. Usually we don't know what's out there in the population of 6 or 7 billion people. We don't know what the real parameter in the population is for a variable. We just don't know; we'd have to take a wild guess at it. If you do know, you can use the Z distribution. I promised you I'd never show you equations, but just to tickle your fancy, there's the equation for it. You can see what the computer is going to use: the standard deviation and the mean of the population, not of your sample. What I've asked for here is just writing out the code for that, and I'll show you what it looks like. There we go. No need to worry about the code there, but that is a Z distribution, for when you know what the standard deviation and mean are for a certain variable in the population, all human beings.

Much more common for us is the T distribution: we know nothing about the population standard deviation in this case. So very clever people had to construct a different mathematical equation. Once again, to tickle your fancy, there it is, with the gamma function there. It is horrendous. Can you imagine, that is what is used with the little bit of information you have. All you have is your sample values, their means, their standard deviations and the difference between the means. That's all you have. And remember, when statisticians worked this out there were no computers. This was done by hand; someone sat down and thought this out.
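If you are curious what those two equations do, here is a small sketch that writes them out in code. The slides aren't reproduced here, so this assumes they showed the standard textbook density formulas. The function names z_density and t_density, the point x = 1.5 and the 10 degrees of freedom are made up for illustration. Each hand-written result is compared with what scipy.stats gives.

```python
import math
from scipy import stats


def z_density(x, mu=0.0, sigma=1.0):
    """Normal (Z) curve height at x, from the population mean and standard deviation."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def t_density(t, df):
    """Student's t curve height at t for df degrees of freedom (note the gamma function)."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + t ** 2 / df) ** (-(df + 1) / 2)


x = 1.5   # hypothetical point on the horizontal axis
df = 10   # hypothetical degrees of freedom

z_by_hand = z_density(x)
z_scipy = float(stats.norm.pdf(x, loc=0, scale=1))

t_by_hand = t_density(x, df)
t_scipy = float(stats.t.pdf(x, df))

print("Z curve:", z_by_hand, "scipy:", z_scipy)
print("t curve:", t_by_hand, "scipy:", t_scipy)
```

The point is only that the library is evaluating these formulas for you; in practice you would call stats.norm and stats.t directly.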
Now, the T test. There's a rich history to the T test, a wonderful little history of beer brewing, but we won't get into that now. But look at this wonderful, fantastic mathematical equation to draw that little curve, and that curve will look different every time you take a new sample set, when it actually gets generated. Through the central limit theorem, it generates a nice little plot and it tells you where yours will fall.

Don't worry about the code here. I'm saying again stats.t, not norm this time but t, and give me the random variable. Now this works quite differently: it requires this thing called degrees of freedom. Not to worry about that at all. There we go, we plot it, and there's the t distribution, again bell shaped, much like the normal curve, and it will allow you to do your analysis.

So that's it: one curve if you know the population's parameters and one if you don't. Those are the two distributions that get drawn, and the t distribution especially gets drawn from only the data that you have. It then tells you where your value falls and what the probability was of finding a value as extreme as yours. And that is really the beauty of statistics: with very little data you can infer something about a population. It all comes down to that equation that draws a curve for you. By the central limit theorem that curve is symmetric and bell shaped, your value will fall somewhere on it, you calculate the area under the curve beyond it, and you have a p-value. Absolutely fantastic.
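To make that last step concrete, here is a minimal sketch of going from two small groups to a p-value with stats.t. The white cell counts are made-up numbers, and the equal-variance (Student's) form of the test is an assumption, since the talk does not say which version is used. The result is checked against scipy's built-in stats.ttest_ind.

```python
import numpy as np
from scipy import stats

# Hypothetical white cell counts for two small groups (made-up numbers)
group_a = np.array([7.1, 8.4, 6.9, 9.2, 7.8, 8.1, 6.5, 7.7])
group_b = np.array([8.9, 9.6, 8.2, 10.4, 9.1, 9.9, 8.7, 9.3])

n_a, n_b = len(group_a), len(group_b)
difference = group_a.mean() - group_b.mean()   # the difference in means

# Pooled variance, assuming both groups share the same variance
pooled_var = ((n_a - 1) * group_a.var(ddof=1)
              + (n_b - 1) * group_b.var(ddof=1)) / (n_a + n_b - 2)
standard_error = np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

t_value = difference / standard_error   # where our value falls on the curve
df = n_a + n_b - 2                      # degrees of freedom

t_curve = stats.t(df)                   # the t distribution as a random variable
# Area under the curve in both tails beyond our value: the two-sided p-value
p_value = 2 * t_curve.sf(abs(t_value))

# Cross-check with scipy's ready-made test (equal variances by default)
check = stats.ttest_ind(group_a, group_b)
print("t:", t_value, "p:", p_value)
print("scipy t:", check.statistic, "scipy p:", check.pvalue)
```

Everything in it comes from the sample alone: the two means, the two standard deviations and the group sizes, which is exactly the little bit of information described above.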