Good morning everyone. In today's session we're going to look at the empirical rule. Please note that we will cover this in detail when we get to the normal distribution. I just wanted to introduce it now, while the mean and the standard deviation from the last session are still fresh. By the end of today's session we will have covered how to describe where your data falls relative to the mean: within one, two and three standard deviations. You will notice that the slide on screen uses the population parameters, but later, in the activities, you will see that I use the sample statistics. So be aware that sometimes you will be given population data and you do your analysis with the population formulas, and sometimes you will be given a sample and you use the sample formulas. It's easy: we simply swap one set of symbols for the other. There will be some activities, so feel free to pause the video at any point, do the activity on your own, and then come back and check your answers.

What is the empirical rule? The empirical rule is a statistical rule which states that, for a normal distribution, almost all of the data (approximately 99.7 percent) will fall within three standard deviations of the mean. The empirical rule is often used in statistics to anticipate outcomes: once the standard deviation has been calculated, and before exact data can be collected, the rule gives a rough estimate of how the data will be spread. This is useful when gathering the appropriate data would be time-consuming or even impossible. The empirical rule is also used as a rough test of a distribution's normality: if too many data points fall outside the three-standard-deviation boundaries, this suggests that the distribution is not normal.
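The rough normality check just described can be sketched in Python. This is a minimal sketch, not a formal normality test: the names `share_within`, `empirical_check` and `BENCHMARKS` are hypothetical, and the code only counts the share of observations within one, two and three standard deviations of the mean and places it next to the 68-95-99.7 benchmarks.

```python
from statistics import mean, pstdev, stdev

# Hypothetical table of the approximate shares the empirical rule predicts
BENCHMARKS = {1: 0.68, 2: 0.95, 3: 0.997}

def share_within(data, k, sample=False):
    """Fraction of observations within k standard deviations of the mean (hypothetical helper)."""
    m = mean(data)
    s = stdev(data) if sample else pstdev(data)  # sample (n - 1) or population (n) SD
    inside = sum(1 for x in data if m - k * s <= x <= m + k * s)
    return inside / len(data)

def empirical_check(data, sample=False):
    """Map k -> (observed share, empirical-rule benchmark) for k = 1, 2, 3 (hypothetical helper)."""
    return {k: (round(share_within(data, k, sample), 3), BENCHMARKS[k]) for k in BENCHMARKS}

print(empirical_check(list(range(1, 11))))
# {1: (0.6, 0.68), 2: (1.0, 0.95), 3: (1.0, 0.997)}
```

The values 1 to 10 are evenly spread rather than bell-shaped, so the observed shares do not line up with the benchmarks; with real data you would look for clear gaps, especially noticeably more than about 0.3 percent of points beyond three standard deviations.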
When we dealt with histograms, while describing and summarizing numerical data, we looked at the shape of the data, and we said that when the mean and the median are equal the data is symmetrical. In chapter six, when the data is symmetrical and bell-shaped, we say it is normally distributed, or that it shows the signs of a normal distribution. Later, in study unit six, you will also discuss the impact of the mean and the standard deviation on the normal distribution. I'm not going to go into much detail on that, but at a high level: when you change the mean, the graph shifts horizontally to the left or right, and when you change the standard deviation, the graph becomes narrower or flatter, because the standard deviation describes the spread of your data. The empirical rule can only be used if the data can reasonably be described by a normal curve. If your data is skewed or otherwise not bell-shaped, for example with too much data in the tails or too much piled up in the middle, you cannot draw a reasonable normal curve through it, and then you cannot use the empirical rule. When you do use the empirical rule, always use the word "approximately", because the percentages are not exact. We say that approximately 68 percent of the data falls within one standard deviation of the mean, approximately 95 percent falls within two standard deviations of the mean, and approximately 99.7 percent falls within three standard deviations of the mean. I'm going to show you how we calculate these ranges.
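The 68, 95 and 99.7 percent figures are rounded versions of exact normal probabilities. As a sketch, Python's standard-library `statistics.NormalDist` can compute them; `share_within_k` is a hypothetical helper name. It also illustrates the point about the mean and standard deviation: shifting or stretching the curve does not change these shares.

```python
from statistics import NormalDist

def share_within_k(k, mu=0.0, sigma=1.0):
    """Probability that a normal variable falls within k SDs of its mean (hypothetical helper)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# Exact shares behind the rounded 68-95-99.7 percentages
for k in (1, 2, 3):
    print(f"within {k} SD: {share_within_k(k) * 100:.2f}%")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%

# A different mean and SD move and stretch the curve but give the same share
print(round(share_within_k(1, mu=29.55, sigma=7.53), 4))  # 0.6827
```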
So this graph shows what the 68 percent looks like on a normal distribution. Remember that the normal curve is centred on the mean (for the standard normal distribution the mean is zero), with a negative side to the left and a positive side to the right. Hence the formula is the mean minus one times the standard deviation for the lower limit and the mean plus one times the standard deviation for the upper limit. This slide uses the population parameters, μ and σ; when we do the exercises we simply replace the population parameters with the sample statistics, x̄ and s. The 95 percent graph looks like this and represents two standard deviations from the mean, and the 99.7 percent graph, for three standard deviations from the mean, looks like this.

We are going to continue with the same survey data we used last time, when we calculated the mean and the standard deviation. If you don't know how to calculate them, you can look at the previous videos, especially the one on the analysis of data. The mean we calculated was 29.55 and the standard deviation was 7.53. Let's assume that the ages of the sampled students who completed the survey are symmetrical (normally distributed) with a mean of 29.55 and a standard deviation of 7.53. Note that this is an assumption: I'm not saying that the data I showed you previously is normally distributed. When we did the analysis on age last time it was not normally distributed, so strictly you cannot apply the empirical rule to that data, but for the purpose of this example we're going to use these figures. So our mean is 29.55 and our standard deviation is 7.53. Always identify the facts given in the question, because that makes the problem easier and clearer. Sometimes I draw pictures, because that helps me remember things and unpacks the question for me visually.
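For reference, the limits on the slides and their sample counterparts can be summarised in standard notation, where X denotes a normally distributed variable and k is the number of standard deviations; the last line applies the sample version to the survey ages assumed above.

```latex
\begin{aligned}
\text{population limits: } & \mu \pm k\sigma, \qquad \text{sample limits: } \bar{x} \pm k s, \qquad k = 1, 2, 3 \\
P(\mu - \sigma \le X \le \mu + \sigma) & \approx 0.68 \\
P(\mu - 2\sigma \le X \le \mu + 2\sigma) & \approx 0.95 \\
P(\mu - 3\sigma \le X \le \mu + 3\sigma) & \approx 0.997 \\
\text{survey ages } (\bar{x} = 29.55,\ s = 7.53)\text{: } & \quad 29.55 \pm 7.53k
\end{aligned}
```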
So we have our sample mean and our sample standard deviation, and we can start with one standard deviation away from the mean. Recall the formula we introduced: the mean plus or minus one times the standard deviation, sigma. All we need to do is use the sample statistics instead, the mean plus or minus one times s, and substitute the values into the formula. Remember that the plus or minus gives a negative side and a positive side, so we can split them; always start with the negative side. So 29.55 minus 7.53 and 29.55 plus 7.53 give 22.02 and 37.08, which means approximately 68 percent of the data falls between those ages. For two standard deviations we calculate 29.55 minus 2 times 7.53 and 29.55 plus 2 times 7.53, and we get 14.49 and 44.61, which means approximately 95 percent of the data falls between 14.49 and 44.61. For three standard deviations we calculate 29.55 minus 3 times 7.53 and 29.55 plus 3 times 7.53, and we get 6.96 on the negative side and 52.14 on the positive side, which means approximately 99.7 percent of the data falls within that range. Let's look at another example. The distribution of tuition fees for BCom students is bell-shaped, with a mean of 25,400 and a standard deviation of 1,300. What percentage of BCom students have a tuition fee between 22,800 and 28,000? To answer this we need to calculate the ranges, so that we can identify whether this interval corresponds to one, two or three standard deviations. So we identify the facts given and start with one standard deviation, and we find that the range is 24,100 to 26,700.
Then we calculate two standard deviations and find that the range is 22,800 to 28,000, and three standard deviations, which gives 21,500 to 29,300. The two-standard-deviation range, 22,800 to 28,000, is exactly the interval the question asks about, and we know that one standard deviation corresponds to 68 percent, two to 95 percent and three to 99.7 percent. Therefore approximately 95 percent of the students have a tuition fee between 22,800 and 28,000. When we do the activities, you can pause the video, do each one on your own, and then come back and check whether your answers match.

Question one: the mean of a distribution is 50 and the standard deviation is 6. Using the empirical rule, find the percentage of the data that will fall between 38 and 62. I'm going to give you time to answer this. So what are we given? A mean of 50 and a standard deviation of 6. First we calculate one standard deviation: substituting the values gives 50 minus 6 and 50 plus 6, that is 44 and 56, so approximately 68 percent of the data falls there. Two standard deviations give 38 and 62, which contains approximately 95 percent of the data. Three standard deviations give 32 and 68, which contains approximately 99.7 percent of the data. Answering the question, approximately 95 percent of the data falls between 38 and 62.

Question two: a sample of the wages of employees who work in a restaurant in a large city has a mean of 75.02 and a standard deviation of 2.09. Parts a, b and c all ask the same thing: using the empirical rule, find the range within which approximately 68 percent, 95 percent and 99.7 percent of the data will fall.
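The arithmetic in the survey-age example, the tuition example and question one can be checked with a short Python sketch. The names `interval`, `coverage_for` and `COVERAGE` are hypothetical, and `coverage_for` only recognises intervals that are exactly the mean plus or minus 1, 2 or 3 standard deviations, which is the kind of question asked here.

```python
# Hypothetical table: approximate coverage (%) given by the empirical rule for k SDs
COVERAGE = {1: 68.0, 2: 95.0, 3: 99.7}

def interval(mean, sd, k):
    """Lower and upper limits mean - k*sd and mean + k*sd, rounded to 2 decimals."""
    return round(mean - k * sd, 2), round(mean + k * sd, 2)

def coverage_for(mean, sd, low, high):
    """Empirical-rule percentage if [low, high] equals mean ± k*sd for k = 1, 2 or 3."""
    target = (round(low, 2), round(high, 2))
    for k, pct in COVERAGE.items():
        if interval(mean, sd, k) == target:
            return pct
    return None  # not one of the three symmetric bands

# Survey ages: mean 29.55, SD 7.53
for k in (1, 2, 3):
    print(k, interval(29.55, 7.53, k))
# 1 (22.02, 37.08)
# 2 (14.49, 44.61)
# 3 (6.96, 52.14)

# Tuition example (mean 25,400, SD 1,300) and question one (mean 50, SD 6)
print(coverage_for(25_400, 1_300, 22_800, 28_000))  # 95.0
print(coverage_for(50, 6, 38, 62))                  # 95.0
```

Rounding both sides to two decimals before comparing avoids small floating-point differences such as 14.490000000000002 versus 14.49.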
I'll give you a minute to do part a. Remember, you can pause the video before I give the answer, work it out yourself, and then continue to check. For one standard deviation, the range is 72.93 to 77.11. For two standard deviations, the range is 70.84 to 79.20, and for three standard deviations it is 68.75 to 81.29.

Let's look at another question. With this kind of question, if you find yourself stuck, scratching your head, panicking and getting frustrated about what they want you to do, remember: if you are given numerical data and told that it is sample data, you can calculate the mean. So calculate the mean and check whether yours is the same. Remember, the mean is the sum of the values divided by the number of values, and we get 3.71. Now I'm going to give you time to calculate the standard deviation; you can pause the video and calculate the sample standard deviation, and here is the answer. The sample standard deviation is the square root of the sample variance, s = √(Σ(xᵢ - x̄)² / (n - 1)): the square root of the sum of the squared differences between each observation and the mean, divided by n minus 1. Substitute the values, solve, and you get a standard deviation of 2.81. Then you can find the ranges for 68, 95 and 99.7 percent by calculating one, two and three standard deviations from the mean: one standard deviation represents approximately 68 percent of the data, two standard deviations approximately 95 percent, and three standard deviations approximately 99.7 percent. That concludes what I needed to share with you today.
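The sample calculations in this part (the mean, the sample standard deviation with its n - 1 divisor, and the three ranges) can be sketched with Python's standard-library `statistics` module. The helper names are hypothetical, the wage figures come from question two, and the raw data list is made up for illustration, because the data set used in the video is not reproduced in this transcript.

```python
from statistics import mean, stdev

# Hypothetical table of (k, approximate percentage) pairs from the empirical rule
BANDS = ((1, 68), (2, 95), (3, 99.7))

def ranges_from_summary(x_bar, s, decimals=2):
    """Map percentage -> (x_bar - k*s, x_bar + k*s) for k = 1, 2, 3 (hypothetical helper)."""
    return {pct: (round(x_bar - k * s, decimals), round(x_bar + k * s, decimals))
            for k, pct in BANDS}

def empirical_ranges(data, decimals=2):
    """Sample mean, sample SD (n - 1 divisor) and the 68/95/99.7 ranges for raw data."""
    x_bar = mean(data)
    s = stdev(data)  # sqrt(sum((x - x_bar) ** 2) / (n - 1))
    return round(x_bar, decimals), round(s, decimals), ranges_from_summary(x_bar, s, decimals)

# Question two: wages with mean 75.02 and SD 2.09
print(ranges_from_summary(75.02, 2.09))
# {68: (72.93, 77.11), 95: (70.84, 79.2), 99.7: (68.75, 81.29)}

# Hypothetical raw sample (NOT the data from the video)
print(empirical_ranges([2, 4, 4, 4, 5, 5, 7, 9]))
```

Applied to the rounded summary values from the last example (mean 3.71, standard deviation 2.81), `ranges_from_summary(3.71, 2.81)` gives roughly 0.90 to 6.52, -1.91 to 9.33 and -4.72 to 12.14; working from the unrounded mean and standard deviation could shift the second decimal slightly. For population data, `statistics.pstdev` (divisor n) would replace `stdev`.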
Before we meet next time, please remember to go to Calendly to join the sessions: look at the dates that are available and book those sessions so that you don't miss out. The next time we meet we will be discussing the box plot, so don't forget. Let me also introduce myself to those who don't know me or who are watching this video for the first time: I represent Pambilia Analytics, where we're building a community of individuals who have analytical skills and want to become more data-literate or data-driven leaders within any sector. We're trying to close the gaps when it comes to data, data analytics and numeracy. We offer a range of services, including consulting as well as our flagship skills development in literacy, data literacy and programming. Our consulting services cover data analytics, data science, research and market research. For our flagship skills development we normally offer two types of training. The first is instructor-led, which is what we have here: you can attend in person, in class, or via the Facebook Live or YouTube Live sessions. If you want one-on-one consulting sessions, we charge 150, and we are currently running a special. If you are interested in learning data analytics, research literacy and so on, you can also look at our self-led online training and the guided learning there.
Otherwise, you can find us on YouTube. Please remember to subscribe so that you get a notification when new videos are uploaded; I would really appreciate it if you subscribe, watch the recordings, and like, comment on and share them. If you need the recordings of the free sessions we are currently running, make sure you go to YouTube and join as a member, not just subscribe. We have different perks for different memberships: the first two memberships are for those who want to support me and the content we put out there, but if you need access to the recordings you will have to subscribe to a membership that includes them. These are monthly subscriptions, so check the perks of each membership carefully before you sign up and start paying, because it's important to know which membership you are subscribing to and what perks it includes. If you want to get hold of us, we are reachable via email, WhatsApp and our website, or through our YouTube channel. Thank you very much for joining the session and being part of the discussion. Enjoy the rest of the day.