Statistics and Excel. Histograms with car-related data. Got data? Let's get stuck into it with statistics and Excel. You're not required to, but if you have access to the OneNote workbook, you'll find it in the icon on the left-hand side: OneNote presentation 1065, Histogram with car-related data tab. We're also uploading transcripts to OneNote so that you can use the Immersive Reader tool, change the language if you so choose, and either read or listen to the transcript in multiple languages, using the timestamps to tie in to the video presentations. We're in the desktop version of OneNote here, continuing on with our theme of taking data and making pictorial representations from it, the primary tool this time being the histogram. So we have car-related data. If you want to look up data sets you can practice with, we suggest looking up kaggle.com. Our first data set has the name of each vehicle on the left-hand side and its miles per gallon. If we were to put this information in Excel, we might first want to sort it by miles per gallon, because oftentimes the information will arrive sorted by name. But if we have a whole lot of data, sorting alone still isn't going to let us extract much meaning from it. So we might want to use other tools, such as calculating the average and the median. Excel has a function for each. For the average, or mean, type =AVERAGE and select the data, which gives us 24 in this case. What is the average doing? It takes the whole column of numbers, sums them up, and then divides by the count of those numbers. The median, you'll recall, is the one in the middle, just like Rocky the boxer's coach told him: if you see three of them out there, hit the one in the middle. In Excel, the function for the median is simply =MEDIAN, or you could ask for the second quartile.
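To make those two definitions concrete, here is a minimal Python sketch. It uses a small, hypothetical list of mpg values (not the Kaggle data set from the lesson), computes the mean and median by hand, and cross-checks them against Python's standard `statistics` module. The quartile line assumes that `statistics.quantiles` with `method="inclusive"` interpolates the same way as Excel's QUARTILE.INC; treat that as an assumption, although for the second quartile it lands on the median either way.

```python
import statistics

# Hypothetical miles-per-gallon values (NOT the lesson's Kaggle data set),
# listed in no particular order, just as raw data often arrives.
mpg = [25, 14, 46, 18, 31, 16, 22, 15, 20]

# What =AVERAGE does: add everything up, then divide by how many values there are.
mean_by_hand = sum(mpg) / len(mpg)

# What =MEDIAN does: sort, then take the middle value
# (with an even count, average the two middle values).
ordered = sorted(mpg)
mid = len(ordered) // 2
if len(ordered) % 2:
    median_by_hand = ordered[mid]
else:
    median_by_hand = (ordered[mid - 1] + ordered[mid]) / 2

# Cross-check against the standard library.
assert mean_by_hand == statistics.mean(mpg)
assert median_by_hand == statistics.median(mpg)

# The median is also the second quartile: the middle of the three cut points
# returned for n=4. "inclusive" is assumed here to mirror Excel's QUARTILE.INC.
q2 = statistics.quantiles(mpg, n=4, method="inclusive")[1]

print(mean_by_hand, median_by_hand, q2)  # 23.0 20 20.0
```

Note that the mean (23) sits above the median (20) here because one large value, 46, pulls the average up while barely affecting the middle value.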
MEDIAN is the more common function, though, and it gives us 23, simply picking the value in the middle. Those are some numerical summaries of the data. For a pictorial representation, we might then use the histogram. In Excel, creating a histogram is typically as easy as selecting the entire data set and inserting a histogram chart; Excel then populates the buckets (bins) for us. So now we're looking at miles per gallon falling between 9 and 13, then 13 to 16, 16 to 20, 20 to 24, and so on. Looking at this histogram, it's not exactly a bell-shaped histogram, right? It's skewed to the right, meaning the tail stretches out toward the right, so that when we get up to 46 to 50 miles per gallon, that bucket could be outside the normal range. So what is our goal? Typically, when we look at this data, we want an idea of the center point. If you think of the histogram as a teeter-totter, where does it balance? And then what is the spread around that center point? Is it possible to approximate the shape with a curve or line of some kind that we can describe mathematically and use to make predictions? That won't always be the case. This histogram doesn't seem to match any curve we could easily build with a function. The reason we would like to describe a data set with a mathematical equation, when possible, is that it gives us more predictive power: we've got an equation we can plug numbers into. But not all data sets will cooperate. Let's take a look at another one. This is other car-related data: the name of the car and the number of cylinders. I've calculated the average and the median, and I've added the max and the min.
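The bucketing Excel does behind the histogram chart can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the mpg values are hypothetical, the bucket width of 8 mpg is picked by hand (Excel chooses a width automatically by default), and the helper `bucket_counts` is a made-up name for this example. Buckets here are half-open, so a value sitting exactly on an edge goes into the higher bucket; Excel's own edge convention may differ.

```python
import statistics

# Hypothetical mpg values (illustrative only, not the lesson's data set).
mpg = [25, 14, 46, 18, 31, 16, 22, 15, 20]

def bucket_counts(values, width):
    """Count values in equal-width buckets starting at the minimum value.

    Each bucket is half-open, [low, low + width), so a value on an edge
    falls into the higher bucket. Empty buckets are kept so gaps show up.
    """
    start = min(values)
    n_buckets = int((max(values) - start) // width) + 1
    counts = [0] * n_buckets
    for v in values:
        counts[int((v - start) // width)] += 1
    return [(start + i * width, start + (i + 1) * width, c) for i, c in enumerate(counts)]

# Draw a sideways text histogram: one row per bucket, one '#' per value.
for low, high, count in bucket_counts(mpg, 8):
    print(f"{low:>3}-{high:<3}| {'#' * count}")

# Rule of thumb only: a long right tail tends to pull the mean above the median.
mean, median = statistics.mean(mpg), statistics.median(mpg)
print("mean", mean, "median", median, "-> right tail?", mean > median)
```

With these values most of the cars pile up in the first bucket, one bucket is empty, and a single car sits out at 46 or more, which is the same right-skewed shape described above. The mean-versus-median comparison is only a rough hint of skew, not a formal test.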
These are the standard calculations we would do pretty much every time, right? We've got our data, and we can sort it. So now the highest values come first, going down to three cylinders. Notice that with the number of cylinders, you clearly expect whole numbers; we're talking about how many cylinders are in a car. So you would expect something between about two and eight here, with eight being the high. We don't have as much variance in this data, which could make some graphs easier to create. We'll take a look at that. But first we can calculate the average: sum the values and divide by the count, which gives about five. Note that, thinking about five cylinders, you might not have a whole lot of vehicles with five. In fact, only three vehicles have five cylinders; normally you would think of four or six. So remember that, depending on the data set, the average can sometimes be a little misleading, and we have to know what we're talking about. The median, the one in the middle, is four; that's the median calculation here. Next, the max takes the data set and picks the highest value. The Excel formula is simply =MAX with the data set, and that's eight. The minimum is three; =MIN in Excel gives you the minimum. These are common Excel formulas. The most common is of course the SUM function, but AVERAGE and MEDIAN are also quite common, and MAX and MIN, though less common, are often quite useful to know about. Notice that we can get a graphical representation fairly easily in Excel here, because with cylinders we don't get values like 1.2; there's much less variation.
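Here is a rough Python counterpart of those four Excel summaries, run on a small hypothetical list of cylinder counts (not the lesson's data). It also checks whether the average is a value any car actually has, and tallies each whole-number value with `collections.Counter` to show why discrete data like this is easy to summarize in a short table.

```python
import statistics
from collections import Counter

# Hypothetical cylinder counts for 12 cars (illustrative only, not the
# lesson's Kaggle data set).
cylinders = [8, 4, 6, 4, 3, 8, 4, 4, 6, 8, 4, 4]

# Python counterparts of the Excel functions discussed above.
summary = {
    "average": statistics.mean(cylinders),   # =AVERAGE(range)
    "median": statistics.median(cylinders),  # =MEDIAN(range)
    "max": max(cylinders),                   # =MAX(range)
    "min": min(cylinders),                   # =MIN(range)
}
print(summary)

# The average of whole-number data need not be a value any car actually has.
print("Any car with exactly the average?", summary["average"] in cylinders)

# Because the values are discrete, a simple tally of each value already
# describes the whole distribution.
tally = Counter(cylinders)
for value in sorted(tally):
    print(value, "cylinders:", tally[value])
```

In this made-up sample the average works out to 5.25 cylinders, which no car has, while the median of 4 matches the most common configuration.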
We know it's going to be somewhere between one and eight in this case. So it might be useful to look at this in a table format, to see how many vehicles fall on each value from one to eight. That's easy to do in Excel with the COUNTIF function: =COUNTIF(range, criteria). The first argument is the range, so we take this range, and the second is the criteria: count if the value is 1. None of them have one cylinder. Count if it's 2: none of them have two. For 3, there are four of them; for 4, there are 204. So clearly that's the biggest number. Five, although that was the average