 Statistics and Excel. Normal Distribution Calories Data Example. Got data? Let's get stuck into it with Statistics and Excel. You're not required to, but if you have access to OneNote, we're in the icon. Left-hand side, OneNote presentation, 1632, Normal Distribution Calories Example tab. We're also uploading transcripts to OneNote so that you can go to the View tab, Immersive Reader Tool, change the language if you so choose being able to then either read or listen to the transcript in multiple different languages tying into the video presentation. First a word from our sponsor. Yeah actually we're sponsoring ourselves on this one because apparently the merchandisers they don't want to be seen with us, but but that's okay whatever because our merchandise is better than their stupid stuff anyways. Like our Accounting Rocks product line. If you're not crunching cords using Excel, you're doing it wrong. A must-have product because the fact as everyone knows of accounting being one of the highest forms of artistic expression means accountants have a requirement the obligation a duty to share the tools necessary to properly channel the creative muse and the muse she rarely speaks more clearly than through the beautiful symmetry of spreadsheets. So get the shirt because the creative muse she could use a new pair of shoes. If you would like a commercial free experience consider subscribing to our website at accounting instruction.com or accounting instruction.thinkific.com with the timestamps. One note desktop version here in prior presentations we've been looking at how we can represent present demonstrate different data sets using both mathematical calculations like the mean or average the quartiles the median the mode as well as with pictorial representations like the box and whiskers like the histogram the histogram being the primary tool we typically envision when thinking about the spread of the data and we can use terms to describe the spread of the data on a histogram like it's skewed to the left the data is skewed to the right. We then thought about lines and curves that can be represented with formulas that can approximate different data sets depending on the circumstances if we can approximate a data set with a line or curve that has a formula related to it that would be great because it gives us more predictive power over whatever the data set is representing. We looked at different lines curves that might have a formula related to it and could represent things in actual nature in real life including the uniform distribution binomial distribution Poisson distribution exponential distribution now continuing on with one of the most famous of course of them all the normal distribution or the bell curve remembering not all data set will conform to any of these distributions it could be a data set that's too chaotic to conform to a simple line or curve that has an equation for it. However we have observed in nature that many times things do roughly conform to these patterns and if we can find one that does then the the formula and the curve can be useful with the bell curve we've thought about many things in nature like heights and weights and so on often conform to a bell curve type shape so what we typically want to do is think about what we're looking at does the thing we're looking at conform to one of these distributions we might test out the data to see if that is indeed the case and then we might plot out the curve to give us more predictive power this time we're going to be looking at calories now if you're looking at calorie counts if we were kind of tracking our calorie counts for example you would expect that it would follow some kind of normal distribution intuitively because you would think that my calorie count would have to be somewhere pretty steady and not be going too much on the high end or low and at any given time given the fact that our weight has to be maintaining somewhat constant so once we have our data we can sort it and put a table around it which we'll do in excel we can sort it by the date or we can sort it by low to high high to low this one currently being sorted from high to low so what's going to be different about this data set than some of the examples that we have had in the past are that because the calories is a pretty small unit of measure then we're going to run into this issue of should we be putting the calorie counts into buckets so that we can better compare our actual calorie count to what we're going to plot when we plot the calorie counts out so this is going to be a little bit different in that way to what we've seen before with the plotting of the the bell curve information for the calories so our data is on the left we're going to start with the normal kind of stuff that we do does this conform to a bell curve well let's do some of our normal calculations let's take the mean or average this would be the formula in excel to do so it's at 2189 that would be summing all of the data up and divided by the number of count let's take the standard deviation then this would be the formula for excel so that helps us with the spread 815 let's take the median that would be the one where we sort all the data we pick the one in the middle and we're picking the 2062 because this number is fairly close to the mean the closer it is to the mean the more likely that it might be conforming to a bell curve so that's an indication to us the bell curve might be useful here's the formula for the median and then the mode the mode is 10776 now this one is a little further off than than the mean but it's still fairly close so we're thinking that possibly a bell curve could still be something that would approximate this data set this would be the mode remember that the mode is the one where it's going to have the number appear multiple times and might be more or less useful depending on the type of data that we're looking at if we're looking at data such as this data which has the unit of measures pretty small so you might you might not have the mode where multiple multiple numbers show up that are exactly the same as you would if you had a smaller unit of measure that you were that you were looking at then it would be more likely that the mode would be representing that kind of middle point now the next thing we might do is plot this information into a histogram to see if it looks like a bell curve so here's a histogram of the data just taking this data set putting it into a histogram in excel excel creating the buckets from 0 to 730 730 to 740 calories and so on the middle point is is here which would be some the mean we recall was 2189 so it looks like it's kind of conforming to a bell curve remember that the last example that we looked at because we had a whole lot of data points we were looking at heights then the data looked a lot more bell shaped but if you don't have as many data points then it's not going to be as bell shaped but we would still expect that it would look like clumped in the middle and then moving out towards the sides here as as the look and shape of something like a histogram which might give us more confidence that this could be conforming to a bell curve scenario so that we can plot a bell curve so let's plot the bell curve we're going to say all right let's take our x's let's take our p of x's that we will then calculate the question is where should we start with our x's so the x's we're talking about calories now so you would think you can't have zero calories because you they'd have to be just positive calories you cannot have negative calories however uh in theory remember that the bell curve goes in if uh indefinitely infinitely on to the left and the right so so let's take our four standard deviations remembering that if i'm trying to plot all the data so i have a pretty nice bell curve that has all the information in it so i could see the tails of the curve four standard deviations would be the vast majority of the data so i can do that by taking the standard deviation 815 times four and then i'm going to subtract that from the middle point or mean two one eight nine to get to the uh 1071 uh it's rounded here so it's not exact let me do that again 815 times four minus the two one eight nine about uh 1069 now that's a negative number so you might say well why don't i just stop it at zero because you're good but sometimes it's nice to to try to plot it all the way down in the negative so you see the whole shape of the bell uh and it can give you another verification by the percentages adding up possibly to 100 percent so we'll keep it for now just to demonstrate that then if i do this the other way 815 times four standard deviations plus two one eight nine we get to the high point of the five four four eight on the calories so my count then over here if i'm going to say all right let's count this thing out we're going uh uh negative x's and then we're going to go all the way down to the positive now i've cut some of it out here we'll have the whole thing in excel but i'm just going to then it goes into the positive here's the positive calories and so on then we can do our p of x calculation this would be the norm dot dist or actually notice that this x here uh we did this with a formula that will demonstrate in excel as well because what we want to do is go from negative 1069 up to positive five four four eight now you could do that by putting negative 1069 negative 1068 highlighting those two and having excel see the sequence as you go down but you'd have to go down 5488 times so it might be faster to use the formula of sequence and what we want is the sum of those two plus one in terms of how many columns do we want we want or rows not columns five four four eight plus 1069 we want six five one seven uh columns here so that would be uh six five one seven plus one columns and then skipping the start that's why we have two commas and then the starting point is going to be that 1069 then it'll plot all of these x's for us without us having to kind of drag it down once we have that we can then do our norm dot dist now it looks funny because calories are negative up top but remember we kept the negatives uh for the for the examples of the curve of a normal distribution so that we can get the full four standard deviations on the low side norm dot dist we're taking the mean and the standard deviation which of course would be this number and this number in our function or formula and then we've got does it should it be cumulative it's going to be not cumulative or zero so then if we do this all the way down you can see that it's plotting these out now if i get into the positive numbers down here so now we've got that the likelihood of our data set being at 126 calories is 0.0020 so note when we're looking at this we're getting pretty small numbers in part due to the fact that that our calories are a pretty small unit of measurement so so that means if i'm looking at just this one calorie point of 166 then the percent is pretty low it's likely that we're going to be asking questions about ranges like what's the likelihood of being 167 or below or something like that which you would be tempted to sum it all up but you'd have to use another formula because we're talking about area under the curve although because this is much more detailed because we're using a pretty fine detailed approach here you get a pretty good approximation if you were just to sum up sum up the whole thing we'll talk more about that later though so now we we want to be out so here's where an issue comes up we want to be able to compare this to the actual count now the ways we've done that in the past as we said okay well i can take my actual count i can count all of the all of the numbers over here using account formula of this how many how many data points do we have with a count function and it comes out to 457 so we have 457 far less data points than the last example we had where we had like 4000 data points so i could say i'm going to take this number times the 457 but you're going to get you're going to get these really small fractions of the number because we have such small units of measurement here or last time what we did is we we grouped all of our all of our actual data over here into bends or buckets based on the calorie counts but that's still not going to work quite as well this time because because there's there's such fine data over here that we're just going to have a bunch of zero zero zero and then every once in a while we'll have one that landed into a bucket and then a bunch of zeros because again we have so many small units of of the calorie count so for example here is us taking the percent times times the count so remember that the count was what was the count 457 so if i go down here even to one of the larger percent it's still a quite a small number if i take that 457 i think it was times and i'll multiply it times this one which is 0.00021 if i put it in decimal format then you get this really small number and this small number is isn't going to match any actual data count because of course the data count is just going to be a one you can't have less than one of the data so when i match that up over to my actual frequency so this is the actual frequency meaning we're looking at these in terms of buckets and this would be counting how many times in our actual data set we had a count that was above 126 but below and including 127 and you get a bunch of zero zero zero zero for all of them and then every once in a while you're going to have a one over here in our frequency so it's going to be difficult to compare those out last time when this when we had different examples in the past when we were talking about heights for example or weight then we then this frequency count kind of lined up fairly nicely because we didn't have such small units of measurement and we were able to then take the percent of the total and give a comparison of the percent of the total and the p of x over here which we're not as able to do in this data set the nice thing on this side of being able to take each calorie count in excel even though it's quite a long set of data is that when you add this up it adds up to basically pretty close to 100 percent which is kind of a nice double check so what we would like to do though is we'd like to say okay well I'd like to kind of group this stuff together so that these p of x's would be for the range of like anything less than a zero and then everything from zero to four hundred for example I would like to kind of sum that up into a group and then when I do my frequency count for my actual counts we can put them into bins or buckets there's a couple different ways we can do that when we're looking at our actual data that we could just use this to kind of sum up into our buckets sum these up and remember that normally we don't like to do that here because we're talking about the area under the curve but because we're using such fine data we get to a pretty close number or we can use a formula which will be the between formula so that we can we can use the norm.dist in the upper range minus the lower range so let me show you what I mean we'll do a couple of these over here we're going to say that we have the x's and the x's are going to go up by 400 so these are going to be basically our buckets so we got it going from zero up to 400 and then 400 to 800 for and then 800 to 1200 so that we have our buckets instead of just one calorie at a time so then the actual frequency if I was to do my frequency calculation then now I can do my frequency and have buckets that are much larger so for example this one would be saying everything that's greater than zero up to and including 400 in our actual count over here of our actual data and we had five of those this one would be saying everything over 400 up to and including 800 of the actual count we got 14 of those we're doing this with the frequency calculation the data array being our table on the left and in the bends being these bends and it gives us our buckets which is nice so now we have actual numbers in here as opposed to if we did it one calorie at a time when we did our frequency we had almost zero numbers with one like a one showing up every once in a while although I don't have a lot of the data in here because I didn't want to copy it all over but we do this in Excel if you want to check it out the sum adds up to 157 which is a double check that these bends are adding up because that's what our actual count was of our actual number of data now we can take the percentage of total and it gives us something that may actually be relevant meaning I could say okay this is going to be five divided by the total divided by the 457 and that gives us 1.09 and this is the same thing 14 divided by the total 14 divided by 457 is 3.06 if I move the decimal two places over so now the question is well if I can if I can do the same thing for my p of x information then I have something that's kind of comparable and we're going to do that with a formula that looks something like this and then we'll do it another way as well this is a sum if formula and it's saying sum if and we're picking up the sum range which in this case we're looking at the p of x is over here so we want the p of x is comma the criteria so we want we're going to pick up the criteria range which is going to be this information and and then we're picking up the criteria it's got to be less than or equal to this number over here it's got to be less than or equal to in this case the zero so that's so I'm summing that up so so it's basically saying pick this up if the x is less than zero sum this up so it's summing everything up down to when the x is zero and then this one is summing everything up if I did a similar formula it would look a little bit different but this one would be summing everything up if x is greater than zero up to and including 400 and we would be picking up this column if this area the x was between zero and 400 now remember usually that's just an approximation but because we have such fine units we come up with a pretty close approximation so that's one method that we can use and now we have comparable numbers I'm like okay there's 800 actual data came out to 14.06 versus the 3% that if we use this technique of summing it up and I can see if this adds up to 100 it adds up pretty close to 100 percent on this data which might give another indication that that's given me a pretty accurate calculation and then I can take a I can also do it this way I can take my p of x count times the count meaning now that I have these numbers I can multiply them times the number of counts that we had so I can take this number of 0.036 0.0036 0.0036 times the count of 457 and then we get around two and now I can compare this two to this five so these are the two ways that we can kind of compare our data sets right I can convert I can convert the norm dot this to to a count by multiplying at times the count and comparing it to our actual frequency count or I can convert our actual frequency count to a percent to compare that to the percent the only difference here now being that we put these items into buckets so that we're not talking about one calorie at a time the other way you can do this buckets thing which is which is more accurate uh would be to say I'm going to have a lower buck a lower bin and an upper bin from 0 up by 400 400 to 800 800 to 1200 and so on and so forth and then use our norm dot dist formula to calculate this one's calculating the first one which is just norm dot dist to for the for for the zero but for the second one you'd have to have norm dot dist of the upper part minus the norm dot dist for the lower part so it would look like this you'd have norm dot dist of the x of the higher x and then the mean the standard deviation it needs to be cumulative that's the one minus the norm dot dist for the lower part the zero if you're talking about the second one and that'll give you that in between this should be more accurate than what we did up top summing but because we we did such a fine uh we have such a fine unit of measure they're pretty close either way that you do it so we get the 0.36 to 1.04 the three the the 6.82 the 0.36 to 1.04 the three the 6.82 so they're pretty close either way this one comes out you can see if I add them up to exactly a hundred percent this one came up to 99.99 okay let's go on to our graphs then so once we have this then we can kind of grab them together we can put both of these on the same graph this is graphing I believe the actual percent column and the p of x column so we're looking at our actual data and the the the the function data on one graph and you can see they line up fairly close giving us further indication that a bell curve might be an appropriate tool to use to approximate this data uh this one this one is a a I believe this is a bar graph of this data over here probably should have labeled it better this data over here well we picked up all of this right now this this was a long column because remember if we did it one calorie at a time that we had a whole lot of we had 106,517 you know calories in total so if I graph that even with a line graph then you get something that looks almost like a smooth curve right it doesn't look like this jagged thing over here or it doesn't look like this this jagged this jagged thing here because we have such fine data that we're using not this isn't the actual data this is the curve that we're plotting but we're plotting it on a calorie by calorie basis and that's one of the reasons again that if you when you when you think about the area under the curve and and you think about the integral calculus of it if because you have such the fine so fine of the lines could be the reason why that if we sum up you know we sum everything up this way that that we come up to something very close to uh if we did it with a calculation of uh the the norm dot dist so there is that one