Statistics and Excel. Probability, distribution, models, and families. Got data? Let's get stuck into it with statistics and Excel. Introduction. In prior sections, we've been thinking about how we can describe different data sets using both mathematical calculations such as the mean or average, the median, the quartiles, and with pictorial representation like the box and whiskers and the histograms. The histogram being the pictorial representation most used when we're thinking about the spread of the data, how the data is dispersed. We can then use different kind of language to describe the histogram and what the histogram looks like such as it's skewed to the left or it's skewed to the right. Now we want to spend more time using mathematical models to describe different data sets. So in other words, when we're looking at a data set, if we can approximate that data set with some type of mathematical model, which will give us a line or a curve approximating the data set, that will often give us more predictive power over whatever the data set is representing in the future. So three pillars to describe distribution. 
Remember, when we're thinking about the distribution of data, we're thinking about three things. First, the shape of the distribution. You're envisioning here a histogram of a data set, which will give us an idea of what the shape looks like: is the data centered in the middle, for example, or dispersed to the sides? Second, the center. Where is the center point, often represented by the mean or some other centering kind of tool like the median? And third, the spread of the data. How is the data spread around that center point? How is it spread around, for example, the mean? Those are the characteristics we typically have in mind when we're thinking about a data set, again, usually envisioning, say, a histogram. Shape of data represents the distribution of data. Any curve can model a data set, but some shapes are more useful than others. In other words, if we had a set of data, we could plot those data points into a curve or a histogram. And when most people envision or imagine a curve or histogram, the first one that comes to mind is a bell shaped type of curve. But it's important to remember that the bell shaped curve is only one family of curve, one possible shape of distribution. If we take any given data set, it's possible that that data set could trace out any kind of curve. In other words, if you just looked out of your window at the horizon and you saw this mountain, for example, you can imagine some kind of data set that would be represented by the outline of this mountain. It's just a jagged type of curve. It doesn't necessarily need to result in a bell shaped type of curve. And if that is the case, if we don't see any pattern in the type of data that we are looking at, it's going to be more difficult for us to approximate that data set with some kind of smooth curve or line, which is what we would like to do. Now, of course, when you look at things in nature and you look at just about anything, there are oftentimes going to be patterns. 
And if there is a pattern, then it might be the case that that data set can be represented by a smooth line. And if it can be represented by a smooth line that can then be written as some type of formula, that can give us predictive power into the future. So oftentimes the way you might want to start thinking about this is looking at the actual data, the thing that you're trying to test, and then plotting those data points. And then once the data points are plotted, you're trying to say, is this information something that could be represented by a smooth curve? Because the smooth curve possibly could be represented by some kind of equation or formula. And oftentimes many things can be, and if they are, then we can use that smooth, idealized representation of the jagged line in order to make future calculations. And I think about this kind of like Plato's allegory of the cave, the idea that you're in the cave and everything that you look at is basically a shadow that represents the actual realness of something. So the horse that you're looking at is kind of like a shadow of the ideal horse, I guess you can think of it as the perfect vision of horseness, what a horse actually is, right? When you're looking at a data set that seems to be following a pattern, you're looking at a small sample of basically the entire pattern. And if you were able to extrapolate out to the entire pattern in a similar way, then you would have that smooth curve that's representing the pattern that you have the small snippet of. Maybe that's one way to think of it. So salaries at a corporation, for example, skewed distribution. So when we're looking at the shapes, we're trying to think about the shape of the actual data. If we were to take a look at the actual data of the salaries of a corporation, we can describe the shape as we saw in prior presentations. So it might not be a smooth curve. 
We're looking at actual data on the histogram. And we could say that the data might be skewed to the right or to the left, for example. So most employees earn an average or below average wage with a few outliers at the top. That's the other thing that we wanted to keep in mind from prior presentations. So you might have the CEO, for example, that makes a lot of money, which means that there's going to be an outlier at the right end. So you would expect the curve to basically be kind of skewed to the right. And those are some of the terms that we use to describe the distribution of the data. So intervals between cars at a toll booth or atom decay. These are going to be some examples of different types of distributions that we haven't spent as much time on in prior presentations. We'll spend more time on them in current presentations. When we look at these line waiting situations like waiting in line at a toll booth, for example, then oftentimes there's a pattern that we can see in the number of arrivals, which is like a Poisson distribution, which we'll talk about shortly. And it has a characteristic shape that we can describe in terms of the characteristics we talked about in the past. In this case, skewed to the right. And then the time between arrivals, or atom decay, tends to look like an exponential distribution, which we will also talk about as well. So types of data shapes. Remember, if we have our data in the histogram, we could, for example, have a single peaked histogram, with the most common values in the center and fewer values as we move away. That's what you might envision more like a bell shaped curve. So we would describe that as having more of the data in the middle with a single peak to it. Symmetric, the data looks the same on both sides of the center. So if it's symmetric, again, you're probably envisioning like a bell shaped curve at the middle point. 
And then you have the data somewhat symmetrically on either side of that middle point. But when it is skewed, that's the term we used, right skewed means the tail is on the right of the center, meaning most of the data is bunched toward the left and you have that tail that's going out towards the right. Left skewed means the tail is on the left of the center, the opposite. And you could have a bimodal shape, which has two peaks in the data (not to be confused with the binomial distribution, which we'll get to shortly). So instead of having just the data in the middle and then spreading to the sides, you could have those two peaks of the data. These are just some terms that we can use to describe the data. And remember, if you're looking at the landscape here and that was representing a particular data set, we can try to look at any particular data set and use those general terms to get an idea of what the data set is doing. So now, once we get an idea of being able to describe the histogram with those general terms, we want to be able to see, is there a mathematical description of the data? If we can describe it mathematically with some type of curve or line, that's what's going to give us more predictive power. That's going to be what our focus is more here. So we're going to take a look at some families of distributions now. These are some common families of distributions. One's going to be the uniform distribution. We'll talk a little bit more about each of them in a little bit here. You've got the Poisson distributions. You've got the exponential distributions and the binomial distributions. So let's take a look at each of those in a little bit more detail and we will do example problems in this section related to some of these families of distributions. So we've got the uniform distribution. This is the easiest one to start thinking about. 
So in other words, if you're thinking about a set of data, we're trying to say, is this set of data, the histogram that's coming from it, something that I can represent with one of these mathematical formulas, and the first one is a straight line. So that would be a flat line distribution. An example would be rolling a fair die. So in other words, if you roll a die, you only have one through six that the die could roll, and you would expect then the distribution to be an even distribution between all the numbers if it was a fair die, which would be an easy function, f(x) = c. And if I was to make a histogram of it, it would look like this, right? I think this is representing rolling the die like a thousand times or something like that. Pulling out the trusty calculator. So if I rolled one die, you would expect it to be one over six. That's the likelihood, about 16.67%, that it's going to be any one of a one, two, three, four, five or six. If I rolled the die a thousand times, then what would you expect to happen? That times a thousand. You would expect to have about 167 of each number rolled. That would be what you would expect. Now notice that this is just an approximation, a model of what might happen in the actual world, and you can clearly see that because it's impossible for me to roll 166.67 twos, because I can't roll 0.67 of a two, right? That's impossible. So the model is not an exact representation of what could actually happen in the world, but you can see how it gives predictive power of what we would basically expect to happen. And you can use that same kind of concept we've thought about in the past, which was we're using kind of like a sample. So again, the idea would be if you have the entire population, if we weren't in the cave, but we were looking at everything and we could see the actual vision of everything, then you would have that even distribution in this kind of representation. 
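To make the die arithmetic above concrete, here is a minimal sketch in Python (our choice for illustration; the course itself works in Excel). It computes what the flat-line model expects for each face and compares that with one simulated set of rolls. The function names, the seed, and the roll counts are assumptions made for this example.

```python
import random
from collections import Counter

P_FACE = 1 / 6  # uniform model for a fair die: f(x) = c, with c = 1/6


def expected_count(rolls):
    """Number of times the model expects any single face to appear."""
    return rolls * P_FACE


def simulate_rolls(rolls, seed=0):
    """Roll a simulated fair die `rolls` times and tally each face.

    The seed is an arbitrary value, used only so the run is repeatable.
    """
    rng = random.Random(seed)
    return Counter(rng.randint(1, 6) for _ in range(rolls))


if __name__ == "__main__":
    for n in (200, 1000):
        print(f"{n} rolls: model expects {expected_count(n):.2f} of each face")

    counts = simulate_rolls(1000)
    for face in range(1, 7):
        # Actual tallies are whole numbers that scatter around the model value.
        print(f"face {face}: rolled {counts[face]} times")
```

The model value for 1,000 rolls is a fraction, while every simulated tally is a whole number near it, which is the gap between the smooth model and the actual data that we're describing here.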
But because we're taking just a snippet, a sample, then we're taking an imperfect representation of the world, right? But in any case, we would have just this line. It would just be a line. So when we actually roll the die, it's not going to come out to exactly this line if I roll the die like 1,000 times, but this will approximate what we think should happen. And therefore, I can use just the graph of a line to predict what's going to happen. And if I rolled it fewer than 1,000 times, I would still be in the same family of uniform distribution lines, each face at 1 over 6 of the rolls, just at a different height. So if I rolled it 200 times, 1 over 6 times 200, then the line would be at about 33. So that would be one form of distribution where I could use that mathematical equation to predict what's going to happen, even though it's not perfect. So then we have the Poisson distribution. Now, this is the formula for the Poisson distribution, which you might say, oh, my, this is going to kill me. It's the poison distribution. But no, it's not that, it's the Poisson distribution. And we're not too worried about the formula because the goal isn't that we need to be able to reproduce the formula. The point is that we have this genius that came up with the idea of the curve. And if we see certain characteristics in the data, it might be represented by a Poisson distribution, and then we can use Excel functions and whatnot to make future predictions using the formula. So don't be intimidated too much by the formula, but we will talk about it a little bit more in future presentations. The general idea is that it represents events in fixed intervals, and examples are cars arriving at an intersection. So this is also often in a line waiting kind of situation where the Poisson distribution happens to work out. And remember, the idea here is, well, if I have these situations, these data sets that I'm looking at, is there a way that I can have a smooth curve that represents approximately the actual data set? 
Because if I can do that with a function, it will give me predictive power into the future. And it has been noticed that in business scenarios, a lot of times when you have these line waiting situations, you're waiting in line at the drive-through or at a roller coaster, or cars are arriving at an intersection, they seem to follow this Poisson distribution. We'll talk more specifically about characteristics that are typically present for data to follow a Poisson distribution. So if we plotted the data of cars arriving at intervals, say for every one-minute interval we count how many cars arrive at an intersection, and we then plot that data, we might observe that it is closely represented by the curve of a Poisson distribution, and if so, then we can use the Poisson distribution to approximate what is actually happening, giving us predictive power. So this is a graph of a Poisson distribution. We'll talk more about it when we get into Excel examples. But the general idea is that if you're talking about cars that are going into an intersection, or if you're talking about a line waiting situation and how many people are showing up to the line in any one-minute interval, then there's no upper limit on the count. Now, in practice, you're not going to have an infinite number of people showing up to a line in any given situation, but in theory, it can go up forever. So this looks like a bell-shaped curve, but it's actually kind of skewed to the right, and that's going to be the general characteristic of a Poisson distribution. It's going to have this somewhat gentle right skewness to it, and we'll talk a little bit, when we get to specific problems, about how the shape changes as you change some of the parameters. The next one is the exponential distribution. So it represents time between events, and this one is often related to a Poisson distribution. 
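Going back to the Poisson counts of cars per minute, here is a rough sketch of how the curve can be computed, using the standard Poisson probability formula. The average of 3.5 cars per minute and the function name are made-up values for illustration, not figures from the presentation. In Excel, the POISSON.DIST function performs this same kind of calculation.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events in one interval.

    `lam` is the average number of events per interval.
    Formula: lam**k * e**(-lam) / k!
    """
    return lam ** k * math.exp(-lam) / math.factorial(k)


LAM = 3.5  # hypothetical average: 3.5 cars per one-minute interval

if __name__ == "__main__":
    # Crude text histogram: one '#' per percentage point of probability.
    for k in range(0, 12):
        p = poisson_pmf(k, LAM)
        print(f"{k:2d} cars: {p:.3f} " + "#" * round(p * 100))
```

Printing the bars shows the shape we described: a single peak near the average, with a tail that stretches further to the right than to the left, since the count can't go below zero but has no upper limit.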
So in other words, if you're looking at a line waiting situation, then the Poisson distribution is asking the question of how many cars are arriving in a certain interval of time, or what is the likelihood that a given number of cars arrive in a certain interval of time, like a minute. The exponential distribution kind of flips that around, and now we're talking about the time between arrivals of individuals or cars. So it's a little bit more difficult, I think, for most people to first wrap their mind around that relationship between the Poisson and the exponential distribution. The examples that we go through, I think, will shed a lot of light onto that relationship. So we'll take a look at those in future presentations, but you also have radioactive decay as another common example of the shape of the distribution, which we'll take a look at in a second here. Relation to Poisson: times between Poisson events follow an exponential distribution. So if you notice a Poisson distribution on the events, then you would expect the time between events to follow an exponential distribution, which often happens in business scenarios with those line-waiting situations, and it looks like this. So that's going to be the exponential look that you'll be envisioning when you're thinking exponential. And the decay of radioactive material is what comes to mind for me oftentimes when I'm thinking about this shape, more than a line-waiting situation, which is a little more difficult to wrap your mind around at first, but the examples I think will help with that. Next, we have the binomial distribution. Now, again, don't be intimidated by the equation. We want to put the equation up top, and we'll talk more about the equation later, but the equation isn't really the important thing. 
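To help with the Poisson and exponential relationship described above, here is a small simulation sketch. It generates exponential gaps between arrivals and then counts how many arrivals land in each one-minute interval. The rate of 3 cars per minute, the seed, and the function name are assumptions made for this example.

```python
import random
from collections import Counter


def arrivals_per_minute(rate, minutes, seed=1):
    """Simulate arrivals with exponential gaps and count them per minute.

    `rate` is the average number of arrivals per minute, so the average
    gap between arrivals is 1 / rate minutes.
    """
    rng = random.Random(seed)           # arbitrary seed, for repeatability
    counts = [0] * minutes
    t = rng.expovariate(rate)           # time of the first arrival
    while t < minutes:
        counts[int(t)] += 1             # minute in which this arrival falls
        t += rng.expovariate(rate)      # exponential gap to the next arrival
    return counts


if __name__ == "__main__":
    counts = arrivals_per_minute(rate=3.0, minutes=10_000)
    print("average arrivals per minute:", sum(counts) / len(counts))
    tally = Counter(counts)
    for k in sorted(tally):
        print(f"{k:2d} arrivals: {tally[k]} minutes")
```

The gaps are the exponential side and the per-minute tallies are the Poisson side of the same process. Plotting the tallies should give the gently right-skewed count shape, while a histogram of the gaps themselves would give the decaying exponential shape.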
The important thing is someone came up with an equation that represents a smooth curve that often represents things that happen in nature, and that allows us then to, if we see something that can be represented by the smooth curve, to not have to do that, you don't have to go through the math so much because we can do it within Excel, but all we have to do is recognize it and then we can basically use the formula to make predictions about something. So this represents the number of successes in a fixed number of trials. So we'll take a look at examples like sales made in a fixed number of sales calls. So usually the characteristics of this kind of distribution will be that there's got to be something that has like a yes or no outcome to it. So if you're saying that you're making a number of sales calls, then you're either going to have a success or a non-success. That's why it's going to be binomial. We can basically say, well, if I get a sale during that call success, a not sale is a not success. If you're thinking about a coin flip situation, then you can do a similar type of test, which we'll look at some examples on where you would label the heads or the tails, but let's say the heads as a success and the tails as a non-success. And then we would need to know the percentages of each of those activities in terms of the likelihood of it being a success or not, a coin flip basically being 50-50, a sales call usually being a lot lower for the success, 10% success or something like that. So we'll get more into the specifics again of when something in actual practice will typically follow a binomial distribution and if it does, then we can use this concept to make predictions about the future. And importance of mathematical models so it allows for quantitative analysis. So obviously if we can get a mathematical model, remember when we're taking a data set, the data set could be any jagged line. 
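Returning to the sales-call example of the binomial distribution, here is a hedged sketch of the calculation using the standard binomial formula. The 10% success chance comes from the example above; the 20 calls and the function name are assumptions chosen for illustration. In Excel, BINOM.DIST covers this same kind of calculation.

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent yes/no trials.

    `p` is the chance of success on any single trial.
    Formula: C(n, k) * p**k * (1 - p)**(n - k)
    """
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


N_CALLS = 20     # hypothetical number of sales calls
P_SALE = 0.10    # 10% chance that any one call ends in a sale

if __name__ == "__main__":
    for k in range(0, 8):
        p = binomial_pmf(k, N_CALLS, P_SALE)
        print(f"{k} sales: {p:.3f} " + "#" * round(p * 100))

    expected = sum(k * binomial_pmf(k, N_CALLS, P_SALE) for k in range(N_CALLS + 1))
    print(f"expected sales over {N_CALLS} calls: {expected:.2f}")
```

Swapping in a probability of 0.5 turns the same function into the coin flip case, with heads labeled as the success.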
Just like we said when we looked out the window, it might not have any curve that can easily represent the data. If that's the case, the data is not useless. We might be able to use calculus or some more complex methodology to take that data and extrapolate it into the future and possibly get some predictive power from it. But if there is a mathematical model that it can be represented by, which is some form of line or curve, then we have a really nice, powerful tool, because we can plug numbers into that equation to give us more insight about what's actually happening. A model helps in making predictions and assists in understanding the underlying phenomenon. So if we know the characteristics of what normally goes into a particular curve and we see that some data is following that curve, then that might give us some better understanding about what is actually happening within the world. So conclusion, understanding the shape of data is fundamental in statistics. So clearly we need to know what the shape of the data is, which we can describe in non-mathematical terms, meaning we can plot the data and use terms such as it's skewed to the left, it's skewed to the right, it's centered, it's got two peaks and whatnot. And we can then use mathematical models to provide a framework to describe, analyze, and predict. Getting into those more technical, actual mathematical models is not something we can do for every data set. We can do it for those data sets where we see a pattern, where the data is approximating something that we know can be represented by a line or curve, which happens a lot because nature seems to follow patterns. So if we can recognize those, then we can use these. So combining shape, center, and spread provides a holistic view of the data. So that's the theme of the course here. 
We cannot represent data with just one number typically because really to represent what is going on, we need to know more and you can summarize that in terms of the shape, the center, and the spread of the data, which we can see pictorially with a histogram and possibly be able to use more mathematical calculations to represent those numbers as well so we can get more specific on the mathematical side. And if we can do that, that would be great.
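As a closing illustration of shape, center, and spread together, here is a minimal sketch using Python's standard statistics module. The salary figures are invented for this example (in thousands, with one large CEO-style outlier), and comparing the mean with the median is only a rough rule of thumb for the direction of skew, not a formal test.

```python
import statistics
from collections import Counter

# Hypothetical salaries in thousands; the last value plays the CEO outlier.
SALARIES = [42, 45, 47, 48, 50, 52, 55, 58, 60, 400]


def summarize(data):
    """Return center, spread, and a rough shape label for a data set."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    spread = statistics.stdev(data)  # sample standard deviation
    if mean > median:
        shape = "right-skewed"       # a high tail pulls the mean above the median
    elif mean < median:
        shape = "left-skewed"
    else:
        shape = "roughly symmetric"
    return {"mean": mean, "median": median, "stdev": spread, "shape": shape}


def text_histogram(data, width=50):
    """Print a crude histogram with bins `width` units wide."""
    bins = Counter((x // width) * width for x in data)
    for start in sorted(bins):
        print(f"{start:4d}-{start + width - 1:<4d} " + "#" * bins[start])


if __name__ == "__main__":
    print(summarize(SALARIES))
    text_histogram(SALARIES)
```

No single number here tells the whole story: the mean is pulled far above the median by the one outlier, and it takes the histogram to show that most of the data sits in the lowest bins.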