All right, welcome back to our series on more traditional statistical inference. In this video, we're going to talk about the normal distribution and how we can fit it to our existing data. For demonstration purposes, we're going to use a version of the card-draw data set that we collected in the previous video on significance. So we are here in Google Colab, using these libraries up here: NumPy, pandas, plotnine, and scipy.stats. We're going to use the scipy.stats library to generate and fit normal distributions. The documentation can be found here and is also linked in the pages. In particular, we're going to use stats.norm.pdf to fit the normal distribution to the data. So I copied down some code from the significance test where we drew six red cards in a row, and I'm just going to multiply it by 10 so that we have a sufficiently large data set of 60 data points. Then we conducted a binomial test, calculated the proportion, and converted that into a data frame. To fit a normal distribution, we need to know two values: the mean and the standard error. These are the two main parameters that control the shape of a normal distribution, so we always need to find them. I'm going to say the mean for red, mean_r, is just p_hat_df["p_hat"].mean(), and likewise the standard error for red, se_r, is p_hat_df["p_hat"].std(). Then to show these, we can print the mean, mean_r, and print the standard error, and go ahead and run that code. We can see that the average p-hat was 0.5 and the standard error is 0.06. These are proportions, so this makes sense: when we run a binomial distribution, we expect it to be centered around whatever our null hypothesis assumption is, which in this case was 0.5. Then, to visualize this normal distribution, we need to create new data that fits the normal distribution to our existing data.
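The setup described above can be sketched as follows. This is a minimal sketch, not the video's exact code: the card-draw simulation from the significance video isn't shown here, so the random simulation, the choice of 10 draws per experiment, and the name p_hat_df are my assumptions standing in for it.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the card-draw data from the significance
# video: simulate 60 experiments of 10 draws each, where each draw is
# red with probability 0.5 under the null hypothesis, and record the
# proportion of reds (p-hat) per experiment.
rng = np.random.default_rng(42)
n_draws = 10        # draws per experiment (assumed)
n_experiments = 60  # "multiply it by 10" -> 60 data points

p_hats = rng.binomial(n=n_draws, p=0.5, size=n_experiments) / n_draws
p_hat_df = pd.DataFrame({"p_hat": p_hats})

# The two parameters that control the shape of a normal distribution:
mean_r = p_hat_df["p_hat"].mean()  # center of the distribution
se_r = p_hat_df["p_hat"].std()     # spread (standard error)

print("The mean is:", mean_r)
print("The standard error is:", se_r)
```

With this simulated data the mean lands near 0.5, as expected under the null; the exact standard error depends on how many draws each experiment contains, so it won't necessarily match the 0.06 from the video's data.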
And so we give p_hat_df a new column called x_pdf; these are the x values of our normal distribution. To create them, they're just evenly spaced values, so we say np.linspace, and you give it the first value, the last value, and how many data points you want. This will give us 1,000 evenly spaced data points between zero and one. Then we also need a y variable, y_pdf, and this goes in p_hat_df as well. This is where we actually fit the normal distribution: we say stats.norm.pdf, and we give it the x values that we already created, x_pdf; we give it the location, which is the mean value; and we give it the scale, which is the standard error value. So if we run that, as expected, we can now look at what p_hat_df looks like if we print the first five rows. We can see that we've got our proportions here, then we've got our x values that run from zero to one with 1,000 points in between, and then we have PDF values that are associated with these x values and follow the shape of a normal distribution fitted to p-hat. Then we can go ahead and visualize this. We'll do ggplot with our p_hat data frame, and the first thing we're going to plot is a dot plot, so we'll say geom_dotplot with aes(x="p_hat") and dotsize=0.25. I'll go ahead and run this to give you a look at what it produces. We've got a fairly normal-ish distribution of our data points. To add the actual normal distribution onto it, we add a line plot, geom_line, with aes where our x value is just that x_pdf and our y value is y_pdf; then we can say color="red", and I'm going to add size=1 to make it show up a little better. So now we can see how this normal distribution looks against our actual data: we've got our actual data in these black dots, and then we've got this normal distribution in red.
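Here's a sketch of the fitting step just described. One assumption on my part: because the p-hat data frame holds 60 proportions while the fitted curve needs 1,000 points, I put the curve in its own small data frame (pdf_df) so the lengths line up; the plotnine call from the video is shown in comments at the end.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for the card-draw proportions (60 experiments
# of 10 draws each at p = 0.5), matching the earlier sketch.
rng = np.random.default_rng(42)
p_hat_df = pd.DataFrame({"p_hat": rng.binomial(10, 0.5, 60) / 10})
mean_r = p_hat_df["p_hat"].mean()
se_r = p_hat_df["p_hat"].std()

# 1,000 evenly spaced x values between 0 and 1, and the normal PDF
# evaluated at each one, centered on the data's mean (loc) with the
# data's standard error as its spread (scale).
pdf_df = pd.DataFrame({"x_pdf": np.linspace(0, 1, 1000)})
pdf_df["y_pdf"] = stats.norm.pdf(pdf_df["x_pdf"], loc=mean_r, scale=se_r)

# Sanity check: the fitted curve should peak at the mean of the data.
peak_x = pdf_df.loc[pdf_df["y_pdf"].idxmax(), "x_pdf"]
print("Curve peaks at:", peak_x)

# To visualize, as in the video (plotnine not imported here):
# from plotnine import ggplot, aes, geom_dotplot, geom_line
# (ggplot(p_hat_df)
#  + geom_dotplot(aes(x="p_hat"), dotsize=0.25)
#  + geom_line(aes(x="x_pdf", y="y_pdf"), data=pdf_df,
#              color="red", size=1))
```

Keeping the curve in a separate data frame passed to geom_line's data argument is just a way to avoid mixing a 60-row column with a 1,000-row column; the result on screen is the same dots-plus-red-curve plot.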
And what we've done with stats.norm.pdf, which created y_pdf, is we specified the center of the normal distribution to be the mean value of our real data, and its spread, the standard error, to be equal to that of our sampling distribution. Therefore we can see what this idealized normal distribution would look like, and we can look at where our data is maybe less normal and where it's more normally distributed.