So now that we've been talking about some of these descriptive statistics (the kurtosis, the skewness, the mean), one of the things you almost always have to deal with when you're working with data is whether or not your data is normal. If we look at this nice old graph here, we want to be on a very simple normal distribution, one that models a traditional bell curve. But if we have something like a high kurtosis or a very negative kurtosis, or we start getting skew, we're no longer normally distributed, and as a result different statistical tests have to be done. So how do I go about checking for normality? Well, there are multiple steps to it, and some of them are very straightforward if you think about it. The first one, obviously, is to just check a histogram of your data. If I simply plot a histogram of our generated data and eyeball it, okay, it looks normally distributed, or somewhat normally distributed; everything seems to be going up and down in the right way. The reason to start here is that eyeballing doesn't need any extra tests. If I generate data where 17 is the midpoint, it's now hanging heavily to the right, and just from that I don't need to run any other statistical tests or learn or do any math to say: oh, well, this is not normally distributed. The same thing can happen on the other side, with the midpoint at 5, as you can see.
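The histogram check described above can be sketched without any plotting library. This is only an illustration: the recording does not show how the data was generated, so `make_data` is a hypothetical stand-in that draws from a triangular distribution on 0 to 20 peaking at the chosen midpoint, and `text_histogram` is a made-up helper name.

```python
import numpy as np

def make_data(midpoint, n=1000, seed=0):
    """Hypothetical stand-in for the generated data: values on [0, 20]
    that peak at `midpoint` (triangular distribution, an assumption)."""
    rng = np.random.default_rng(seed)
    return rng.triangular(0, midpoint, 20, size=n)

def text_histogram(x, bins=10, width=40):
    """Return one text line per bin, with a bar proportional to the count."""
    counts, edges = np.histogram(x, bins=bins, range=(0, 20))
    scale = width / counts.max()
    return [
        f"{lo:5.1f}-{hi:5.1f} | " + "#" * int(round(c * scale))
        for c, lo, hi in zip(counts, edges[:-1], edges[1:])
    ]

# Eyeball three cases: centred, piled up on the right, piled up on the left.
for midpoint in (10, 17, 5):
    print(f"midpoint = {midpoint}")
    print("\n".join(text_histogram(make_data(midpoint))))
```

With the midpoint at 10 the bars should rise and fall roughly symmetrically; at 17 or 5 the tallest bars sit toward one end, which is the lopsided shape you are looking for when you eyeball it.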
Oh, well, it's all hanging out over on the 5. That's not normally distributed; moving on. But maybe you're running into cases where the data could go either way. Say I now put mine at a middle point of 12. Okay, it may be slightly off, but I can't quite tell, and that's where you would start to look at the skew. So that's where we're using SciPy to check the skew of my data, and the further this goes from zero, well, now we're starting to bleed into bad territory. Once again, if it was a 10, the skew tends to be very close to zero. Even though it's slightly off, okay, you're not going to get perfect data, so it's ever so slightly moving but still very close to zero. As you can guess, if I move to something like 17, we see it's very far away. That same kind of approach can be applied to my 12. Now, the big thing with this is that we're only talking about skewness; we're only talking about how the data flows from left to right. The next approach is to use statistical tests that also take in kurtosis: if you think of skew as the horizontal plane of our data, kurtosis is the vertical plane. And I'm going to butcher these names.
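The skewness check described above can be sketched with `scipy.stats.skew`. As before, the generator is not shown in the recording, so `make_data` is a hypothetical stand-in (a triangular distribution on 0 to 20), and the numbers it produces will not match the ones in the demo.

```python
import numpy as np
from scipy import stats

def make_data(midpoint, n=1000, seed=0):
    """Hypothetical stand-in for the generated data: values on [0, 20]
    that peak at `midpoint` (triangular distribution, an assumption)."""
    rng = np.random.default_rng(seed)
    return rng.triangular(0, midpoint, 20, size=n)

# Skew near zero suggests left/right symmetry; the further from zero,
# the more lopsided the data. Kurtosis is printed alongside for reference.
for midpoint in (10, 12, 17, 5):
    x = make_data(midpoint)
    print(
        f"midpoint {midpoint:>2}: skew = {stats.skew(x):+.3f}, "
        f"excess kurtosis = {stats.kurtosis(x):+.3f}"
    )
```

Under these assumptions, data piled up on the right (midpoint 17) has a long left tail and so a negative skew, while data piled up on the left (midpoint 5) has a positive skew.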
So, the Kolmogorov-Smirnov (I know that word) and the Shapiro-Wilk tests: each one of these is going to effectively look at the data with both the kurtosis and the skewness intact and determine whether or not your data is normal. How it does that is by saying: if the test is not significant, meaning its p-value is above some threshold, the data is consistent with a normal distribution; if the p-value is below that threshold, the test is significant and the data is not normal. These are pretty straightforward to run, so let me unpack them, and I'll start with the Shapiro-Wilk. The Shapiro-Wilk has a W and a p-value that are returned when we call the function, so that's stats.shapiro(x). Then we'll print "Shapiro-Wilk: W is" that, "and the p-value equals" that, with .format(W, p). Okay, so just to at least see this, since we always like to test to make sure our data is doing what it should, we run it, and there is my Shapiro-Wilk, with my W value and my p-value. You can notice that at the middle point of 12 we're teetering. That's getting into a whole other argument about whether or not this would be considered significant. That is for other people to deal with, and I'm not going to make a decision on it here, but that's where I would make an assessment about whether or not my data was normally distributed. If, for example, I went with a 10 again, which is the perfect normal distribution, you can see that in this situation it is not significant, and so it is normal. If, however, I went with a higher number, in this case 17, then even though it says a one here, it's actually 1.7 times ten to the negative sixth power, so it's incredibly low.
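The Shapiro-Wilk call described above can be sketched as follows. `make_data` is again a hypothetical stand-in for the generator (which the recording does not show), and `ALPHA = 0.05` is the conventional threshold rather than anything the test itself dictates. Because a triangular sample is not truly normal, even the midpoint-10 case may be flagged here, so treat the printed values as illustrative only.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # conventional significance threshold (an assumption)

def make_data(midpoint, n=1000, seed=0):
    """Hypothetical stand-in for the generated data: values on [0, 20]
    that peak at `midpoint` (triangular distribution, an assumption)."""
    rng = np.random.default_rng(seed)
    return rng.triangular(0, midpoint, 20, size=n)

for midpoint in (10, 12, 17):
    # shapiro returns the W statistic and the p-value.
    W, p = stats.shapiro(make_data(midpoint))
    # A p-value below ALPHA is significant: evidence against normality.
    verdict = "not normal" if p < ALPHA else "no evidence against normality"
    print(
        "Shapiro-Wilk (midpoint {}): W = {:.4f}, p-value = {:.3g} -> {}".format(
            midpoint, W, p, verdict
        )
    )
```

Formatting the p-value with `{:.3g}` keeps the exponent visible, which avoids misreading something like 1.7e-06 as a value near one.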
The same thing happens with the 5: we can see here that it is well below the 0.05 significance threshold that everyone works off of, so, oh well, this data is not normal. Okay, we have another approach as well, and again I will butcher it: the Kolmogorov-Smirnov. I'm going to call it a K-Smir. No, I wouldn't do that. In its case it has a D value, but it's the same approach. Now, I think the SciPy developers were on the right track. They knew what was going on here, and even they were like, I can't pronounce that, or that's too much to type, so we're going to call it kstest. Much easier. Now, the thing about kstest is that you have to pass in the data you're working with, and you also have to pass in a string parameter for what you're testing for; in this case we're testing for normality. So I'll come in and print those out: Kolmogorov-Smirnov, the D value goes in the zero spot and the p goes here. Again, we'll go ahead and run this as normal. Now, the big thing I like to point out is that, as you can see, it's got some data going on. Realistically speaking, though, this is where it depends on what kind of data you're working with, because if we look at our 10 middle point, that is the one everything else has said is normally distributed: the histogram looks fine and the skewness is very close to zero. But if you run the Kolmogorov-Smirnov analysis, you're going to see a very, very low p-value.
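The kstest call described above can be sketched as follows. One possible reason for the very low p-value, offered here as an assumption about the demo rather than something the recording confirms, is that `stats.kstest(x, "norm")` compares the data against the standard normal (mean 0, standard deviation 1) unless other parameters are passed through `args`. The sample below is a hypothetical, genuinely normal sample centred on 10.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=10, scale=2, size=500)  # hypothetical data centred on 10

# Default reference distribution is the standard normal N(0, 1), so data
# centred on 10 looks wildly "non-normal" to this call.
D, p = stats.kstest(x, "norm")
print("Kolmogorov-Smirnov vs N(0, 1): D = {:.4f}, p-value = {:.3g}".format(D, p))

# Comparing against a normal with the sample's own mean and standard
# deviation asks the question we usually mean. Estimating the parameters
# from the same data makes this p-value only approximate.
D_fit, p_fit = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
print("Kolmogorov-Smirnov vs fitted normal: D = {:.4f}, p-value = {:.3g}".format(D_fit, p_fit))
```

Standardizing the data first, as in `(x - x.mean()) / x.std(ddof=1)`, and then calling `stats.kstest` with `"norm"` is an equivalent way to set up the second comparison.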
And so this is one of those times where, again, there are multiple ways to check for normality, and all of those flags do not technically have to be set. I know there's probably some statistician out there who now wants to know where I live so they can hit me. But the big idea here is that this is why you run through multiple tests. Shapiro-Wilk says that we have a normally distributed set of data; Kolmogorov-Smirnov does not. So you've got to play with it, and this is where you really need to make those assessments. It's not a perfect science; it's running an analysis and then determining what you can from that data. The one thing I would say is that if you are hitting p-values this extreme when everything else says the data is normal, most likely that is not the statistical test you should be doing for your data. In our case, we're just dealing with a bunch of generated numbers, so maybe the Kolmogorov-Smirnov test is not what we're looking for. Either way, here's a way you can check for normality.