 Statistics and Excel. Histograms with car-related data. Got data? Let's get stuck into it with statistics and Excel. You're not required to, but if you have access to the OneNote workbook, we're in the icon left-hand side. OneNote presentation 1065. Histogram with car-related data tab. We're also uploading trends. First, a word from our sponsor. Well, actually these are just items that we picked from the YouTube shopping affiliate program, but that's actually good for you because these aren't things that were just given to us from some large corporation which we don't even use in exchange for us selling them to you. These are things that we actually researched, purchased, and used ourselves. Bayer Dynamic? Not sure if I said that right, but this is the DT770 Pro 250 OHM Studio Reference Closed Back Headphones. I wear headphones basically every day for a large part of the day. They are important to me. Therefore, I've gone through many different kinds of headphones. I've had these for some time and they've worked quite well. They fit over my ears, but I'm still able to put my glasses on under the headphones. The headphones not pinching too tight on the glasses to give me a headache, which is nice. The quality of the patting is good and it has lasted for some time. I've had these for some time now and they haven't gotten all torn up on me or anything like that. I also like that I have a cord when I'm doing my recordings as opposed to a USB centered headphone because that frees up a USB port and I find the USB headphones to be less reliable. They come with an audio jack that looks like this, which is useful for me because that plugs into my audio interface. However, if you want to use the headphones for some other purpose, I believe it's fairly easy to get a converter to other types of audio jacks. If you would like a commercial free experience, consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com where we have many different courses. You can purchase one at a time or have a subscription model giving you access to all the courses, courses which are well organized, have other resources like Excel files and PDF files to download and no commercials. Scripts to one note so that you can use the immersive reader tool, change the language if you so choose, and either read or listen to the transcript multiple different languages using the timestamps to tie in to the video presentations. Desktop version of one note here continuing on with our theme of taking data, making pictorial representations from it, the primary tool this time being the histogram. So we have a car related data. If you want to look up data sets you can practice with, we suggest looking up kaggle.com. Our first data set has the name of the vehicles on the left hand side and the miles per gallon. If we were to put this information in Excel for example we might then first want to sort the information by the miles per gallon because oftentimes we will first have the information by the name. But if we have a whole lot of data then still that's not going to give us enough to really extract meaning from the data so oftentimes we might want to look at other tools such as calculating the average and the median if you were to do this in Excel the average has an average function the average or the mean would be equals average and then just simply selecting the data is going to give us the 24 in this case. What is that average doing? Well it's taking this whole column of numbers summing them up and then dividing by the count of those numbers and then the median you'll recall is taking the one in the middle just like Rocky the boxers coach told him to do hit the one in the middle if you see three of them out there. So we took the median and in Excel the function would simply be equals the median or the second quartile but this is the more common function and you've got the 23 so that would be just simply picking the one in the middle so those are some mathematical data pictorial representation you might then have the histogram so in Excel creating the histogram typically as easy as selecting the entire data set and then inserting the histogram Excel then populating the buckets so now we're looking at the miles per gallon falling between 9 and 13 and then 13 to 16 16 to 20 20 to 24 and so on and so forth. Now if we look at this histogram we don't it's not exactly like a bell shaped kind of histogram right it's skewing to the right meaning we have the tail kind of happening over here towards the right so that when we get up to 46 to 50 miles per gallon we you know this one could be you know kind of outside the normal range so what is our goal typically when we're looking at this data we want to kind of have an idea of the center point if you think of this as like a teeter totter where does it kind of center over and then what is the spread that that goes around that center point and is it possible for us to then kind of approximate this with with a curve or a line of some kind that we can predict that we can make mathematically not always will that be the case right this doesn't seem to approximate exactly some kind of curve that we can easily make with a function the reason we would like to make a function out of it with a mathematical equation if it were possible with a data set is that that gives us more predictive power because now we've got you know an equation that we can plug numbers into which would be great but not all data sets will comply with what with what we would like to let's take a look at another one this is other car related data so we've got the name of the car and then the cylinders so the number of cylinders so if I was to then take the average this is the median and I've added the max and the mean these are our average calculations or our normal calculations that would do pretty much every time right we got our data we can sort the data so now I've got the highest ones on the right and then they go down to the number of cylinders down to three so clearly when we're looking at the number of cylinders notice the data set you expect whole numbers clearly right we're talking about how many cylinders are in a car so you would expect you know somewhere between two and eight here right the high so we don't have as much kind of variance of data and this could give us some different ways that it might be like easier to create the grass we'll take a look at but first we can calculate the average so we can sum this up and and take the average uh which is five now note that when we think about five you might not have a whole lot of vehicles that have five right you don't have any vehicle well there's three that have five cylinders but normally you would think going you know to four to six uh generally so remember that that average can sometimes depending on the data set be a little bit misleading so we have to know what we're talking about the the median the one in the middle uh is four this is the calculation for the median if I was to select that data set the max is taking this data set and picking the highest value the formula for that in excel is simply equals the max and then the data set and that's eight and then the minimum is three equals the minimum in excel will give you the minimum these are you know common formulas in excel the most common formula of course the sum function uh but then the other function is quite common average medium and then the less common but often quite useful to know about is the max and the men so then uh notice that uh we can we can have a graph representation fairly easily in excel here because we know when we think about the cylinders we don't have like 1.2 or you know we don't have a lot as much variant we know it's going to be somewhere in this case between one and eight so it might be useful to just look at this in a table format to just see how many populate between one and eight and it's an easy formula to make that in excel because we can use the count if formula to do that so the count if would be count if the range so we're taking this range and then we're we're saying uh the second is the criteria count if this is a one now none of them have a one uh two count if it's a two none of them have a two three there's four of them four there's 204 so clearly that's the the biggest number five although that was the average is is not the one that we're expected to have on outcomes just because knowing a little bit about how how many cylinders the car has and then six uh 84 and eight uh 104 here's another count if just picking this cell up same formula taking this dangerous data range so then if we were to construct a histogram with this if you just select this data and create a histogram then it creates our buckets uh three to four four to five uh five to five five to six now we can adjust these buckets down here because really we would want them to just be one through eight possibly but sometimes when you create a histogram in excel that can be a little bit tricky to try to get this uh this x axis to be exactly what you want it meaning i would like this to just be you know one two three four to eight because it wants to it needs to have a range so so so that could be a little bit tricky and so when you see this kind of data sometimes it might be easier to say yeah i would like a histogram but but instead of of making the histogram i would just need a bar chart so it might be easier actually to make this table and then create your histogram from a bar chart based on this table instead of the data set so we can come down here and say this is another histogram basically created instead of from this data from this little table we put together which sometimes is easier to put together than others in this case because like if we had a range between one uh zero and one of a bunch of decimals and we don't know what it is then the rain it's going to be a little bit more difficult we need the ranges we need the buckets but if you have something like this now i can say hey look i i just need eight eight buckets right that are pretty well defined so i can create a table and then make our our histogram uh in this format so now i just got one two three four five six seven and eight and so and and this might be a little bit more clear format for us to put it in recognize just recognizing what the data looks like and making a slightly different kind of uh histogram or using a different tool we might trim this down even further and say hey look i don't really need the the ones that don't have anything in it because i only have a few factors so we might cut down the data set to something like this and then and then we might get a little fancy we got a little fancy with the with the the colors here so different formats for uh the histogram let's take a look at another one uh in this case we've got the name and the horsepower so in contrast with the horsepower notice that the horsepower might vary a lot more so so i can't so it's going to be more difficult to use the two use the second method we saw last time so the general idea we've got the name and then we've got the horsepower we might try to sort the horsepower from low to high or high to low and that will give us an idea of the horsepower but we would like to see it see other information related to it here's our standard calculations let's do it this time this way the average taking the average which is summing adding all of these up and then divided by one two three four the the count right and then we're going to take the median take the one in the middle just like rocky the boxers coach told them to hit the one in the middle that's the one we're going to do we're going to hit the one in the middle and that's the median function we've got the max that's the highest one the function to do that in excel is simply equals the max meaning take the highest number in this data set and the min is just simply going to be equals the min brackets data set take the lowest number in the data set so when we're looking at horsepower we've got the max 230 the min 46 the average 104 and the one in the middle uh nine 94 now just looking at these numbers note that that like if this maximum is an outlier it's like 230 it's you know this it's not like it's out there all in and of itself but remember that the outliers on the high or the low end could have an impact on the average versus the median so if you see a big difference between the average and the median there's not a huge difference here but if you see a big difference then your thought process would be well maybe there's some outliers that are skewing the average that you need to kind of keep an eye on so if i look at a pictorial representation of a histogram we have something that's uh that's that's skewed to the right because we have the tail end on the right meaning the ones out here at uh 226 to 244 the high horsepower there's only a few of them out here right so we've got kind of the middle that's somewhere around here right the 104 is the average so the middle point if ours to take a teeter totter you know and try to put it like at the middle so this thing balances you know left to right uh kind of thing on the histogram and there is our spread of the data so clearly this is going from the low the buckets 46 to 64 the 64 to 82 the 82 to 100 the 100 to 118 and so on and so forth now note it is possible for us to create a bar chart and recreate this data but it's not as easy as with the last data because i don't simply have just one through eight different buckets that are well defined because now i have numbers you see these numbers i need a bucket i need something going from i can't list every number from 46 up to 244 that would the chart would be useless so so i have to have some kind of range on the buckets so that means if i was to try to make a a bar chart histogram i i would be a little bit more difficult because i would then have to make the ranges myself and then use a little bit more complicated it's still doable but you have to a little bit more complicated uh uh calculation to pick up the numbers that are between a certain range that's still a useful tool to do we'll do it in excel uh in future presentations because sometimes you might want a bar chart even when you're using the ranges because you might for example want to have two bar charts on top of each other and that might be easier to do then uh then with the histograms you might have a little bit more flexibility if you create your own table in other words and then create a bar chart with it then you do with uh the histogram tools so we'll take a look at those in future presentations