Good morning. Today we are dealing with normality and normality testing. Before we look at how to do a normality test, we should know what normality is and what a normal curve is. The reason we test for normality is that the choice of test to apply to the data we collected depends on whether or not the data follow a normal distribution. Based on that, we choose a parametric test or a non-parametric test, which we will deal with later. The testing itself is a different matter: statistical software does the testing, and we do not normally do it by hand. But which test to apply, and the concept of normality, we should understand well.

So what does normality mean? Suppose we take the blood pressure of 100 people and plot the values on the x and y axes, so the plot is a compilation of blood pressure points. With 100 people it will come out like this: a few people's values will be on this side, a few will be on that side, and the majority will be in the middle. The middle is the measure of central tendency, and the spread is the dispersion.

There is a theorem known as the central limit theorem. Strictly, it is a statement about averages: as the sample size grows, the distribution of the sample mean approaches a normal distribution. For our purposes the working rule is that when the sample is large, the data will usually be approximately normally distributed. So if we take a sample of 1,000 people and check their blood pressure, height, or weight, we expect the values to follow this curve. The curve is just the individual values joined together, and it has some particular characteristics. Actually it need not be 1,000; if the sample is fairly large, some 200, 300, or 400, it will tend to follow normality.

A little about its history: the normal curve is also known as the bell curve, because it looks like a bell, and it was given by a
scientist named Gauss (G-A-U-S-S), so it is also known as the Gaussian distribution. It is a compilation of the measure of central tendency and the measure of dispersion, which we learned yesterday: this is the measure of central tendency, and this is the dispersion. As we said with the central limit theorem, when the sample is very large we expect the data to follow these characteristics. If the data do not follow these characteristics, we can say that the data are not following normality, and we should go for a different test. That is the basic idea of normality. Now we will see the characteristics.

If the data follow normality, we can calculate the mean and standard deviation, and then the mean plus or minus one standard deviation will cover about 68.3 percent of the population, the mean plus or minus two standard deviations will cover about 95.4 percent, and the mean plus or minus three standard deviations will cover about 99.7 percent.

Let me explain. Suppose we are measuring the weight of people. The mean weight is 60 kilograms and the standard deviation is 10 kilograms. This is just an example, for 1,000 people, so n is equal to 1,000. We want to see whether the data follow normality or not, so we calculate the mean and standard deviation and apply the rule. Mean plus or minus one standard deviation is 60 minus 10 to 60 plus 10, that is 50 to 70. Mean plus or minus two standard deviations gives 40 to 80. Mean plus or minus three standard deviations gives 30 to 90.

So about 68.3 percent, that is about 683 people out of 1,000, should have a weight within the range of 50 to 70 kilograms. Above the mean there will be about thirty-four
point one percent, and below the mean the same. Next is mean plus or minus two standard deviations: about 95.4 percent should be within 40 to 80, that is, about 954 people out of 1,000 should have a weight between 40 and 80 kilograms. And almost all the participants, 99.7 percent, that is 997 out of 1,000, should be within 30 and 90, so about three people will have values greater than 90 or less than 30.

So we need to calculate the mean and standard deviation and apply this. We should see whether about 683 people come within one standard deviation of the mean, and whether about 95.4 percent come within two standard deviations, above and below the mean: 60 plus two standard deviations is 80, and 60 minus two standard deviations is 40. And within three standard deviations there should be about 99.7 percent. If the data follow this arithmetic, these characteristics, we can say that they are following normality.

For the standard normal distribution, the area under the curve will be one, the mean will be zero, and the standard deviation will be one. These are some common characteristics. And the mean, median, and mode will coincide, so this point at the center will be the mean, the median, and the mode.

How can the mean be zero? When we standardize the values, all the values above the mean and all the values below the mean cancel each other. The actual mean will never be zero, only the standardized one. Standardizing means replacing the raw values with standard values measured from the mean in units of the standard deviation. Taking a standard deviation of 10, values that lie 10, 20, 30, 40, 50 above the mean become plus one, plus two, plus three, plus four, plus five, and similarly on this side the values below the mean become minus one, minus two, minus
three, minus four, minus five, and so on. When we add all of these we get zero, and that is how the mean comes to zero. In a normal scenario the mean will not be zero; it is zero only when we standardize. When we should standardize is a different matter, and we do not do it in an ordinary case, but it is a characteristic of the normal curve.

To summarize: the mean, median, and mode will coincide at the center, because they will be almost the same. About 68.3 percent of the values will be within one standard deviation, about 95.4 percent within two standard deviations, and about 99.7 percent within three standard deviations. If the data follow these things, we can say that the data are following normality.
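
As a rough illustration of the check described in this lecture, here is a minimal Python sketch. It is not part of the lecture itself. The data are simulated rather than real measurements (1,000 weights drawn with mean 60 kg and standard deviation 10 kg, matching the example), and the function names `coverage` and `standardize` are made up for this illustration. It counts what share of the values falls within one, two, and three standard deviations of the mean, and then standardizes the values to show that the standardized mean comes out at zero and the standardized standard deviation at one.

```python
import random
import statistics


def coverage(values, k):
    """Fraction of values within k standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    inside = [v for v in values if mean - k * sd <= v <= mean + k * sd]
    return len(inside) / len(values)


def standardize(values):
    """Convert raw values to z-scores: (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Simulated data (an assumption for illustration, not real measurements):
# 1,000 weights drawn from a normal distribution, mean 60 kg, SD 10 kg.
rng = random.Random(0)
weights = [rng.gauss(60, 10) for _ in range(1000)]

# Compare the observed share within 1, 2, 3 SD with the normal-curve values.
for k, expected in [(1, 68.3), (2, 95.4), (3, 99.7)]:
    observed = 100 * coverage(weights, k)
    print(f"within {k} SD: {observed:.1f}% (normal curve: about {expected}%)")

# After standardizing, the mean is zero and the SD is one (up to rounding).
z = standardize(weights)
print(f"mean of z-scores: {statistics.mean(z):.3f}")
print(f"SD of z-scores:   {statistics.stdev(z):.3f}")
```

With real data you would put your own measurements in place of `weights`. If the observed percentages are far from 68.3, 95.4, and 99.7, that is a hint that the data may not be following normality, and a formal normality test in statistical software is the next step.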