Statistics and Excel. Getting a picture: data and distribution. Got data? Let's get stuck into it with statistics and Excel. If you would like a commercial-free experience, consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com, where we have many different courses. You can purchase one at a time or have a subscription model giving you access to all the courses, which are well organized and have other resources like Excel files and PDF files to download.

Understanding and interpreting data, from tables to graphical representation. At the heart of statistics is, of course, data. Various entities like governments, businesses, universities and sports enthusiasts collect a wealth of data on a range of subjects.
So in other words, no matter who we are or what our interests are, we're usually busy collecting a whole bunch of data about those topics, in the hope that the data will give us more information and a better understanding of them. This data is often organized into extensive tables, and the key challenge lies in making sense of this sea of numbers. These days we have more access to data than ever before, so we want to put all the data we're compiling into a format that gives us some understanding of what we're interested in.

The role of visual representation. Here come the pictures again, which we want to have in place whenever we're thinking about our data. The first rule of statistics: draw a picture. We want a picture of the data. Why? Visual representations like graphs can reveal patterns, relationships and other important features within data. We often think that a statistical analysis is just the numbers and the formulas. No, we want to see the pictures, because they give us a more intuitive sense of the data. The picture gives us those thousand-plus words about not only the middle point of the data, but also its spread, its shape and its patterns. For example, graphs can indicate the distribution of a variable, highlight unexpected outliers or describe an association between two variables. These are the things we're often looking for within the data. We want to know the midpoint of the data, its distribution and its general shape. And we would like to know whether that data is correlated with other points of interest, which can help us tell a better story or make predictions.
Graphical representations can also serve as an effective communication tool to share the stories embedded within data. I know I pointed this out in the prior presentation, but I want to point it out again here, because we often have the idea that the people who fully understand this are understanding it in pure abstract mathematical terms, and that the information then has to be dumbed down so it can be given to other people. We imagine someone like Einstein intuiting things in numbers alone, and then that information being put into a pictorial format for us normal people. But that's not generally the case. Einstein is quite famous for visualizing things, like what it would look like to move alongside a beam of light, which helped him steer where he wanted to go with the math. Now, there's a question as to whether the math is driving the visualization or the visualization is driving the math. But I don't believe there's any question that visualization is an important part of the analysis, even for the smartest people out there trying to glean insights from information.

Of course, there is also the skill of taking data and putting it into the pictorial format that best expresses the truth about the data, which matters when presenting to management or in a marketing situation. And I want to emphasize that: we want to best express the truth about the data. We also want to know how people lie with data, not so that we can lie with it ourselves, but so that we can see how data can be manipulated in pictorial form as well.
As we will see when we start to group data into things like histograms, choices such as how wide the boxes in the histogram are, or whether we include or remove the outliers, make a big difference to the shape of the histogram, which can lead to different representations of the same data. People with a marketing campaign or an angle are often using data to argue their point, as opposed to looking at the data objectively to find the truth about a particular thing. That's just the way things usually happen, right? So we have to be able to look at data and spot the places where it might not be used properly.

All right: characterizing distribution. A crucial aspect of understanding data involves characterizing its distribution. Typically we describe a data distribution by, one, identifying the general shape, for example bell-shaped or bimodal. We'll get into more shape types later, but usually we're thinking about a histogram here, which looks kind of like a bar chart. We'll show a lot of examples of histograms in future presentations, but once we look at the data we can ask: does it look like the standard bell curve? We'll talk about bell curves in future presentations. Is it skewed, with a tail of data stretching out to one side? We'll look at examples of that. Is it bimodal? Does it have two humps, like a camel? That's a strange phenomenon, and it would tell us that something is happening within the data that is worth exploring more deeply. Now note that a histogram, or the shape of real data, is not going to match something like a bell-shaped curve exactly, but it might approximate one.
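To make the point about box width concrete, here is a minimal Python sketch. It is not part of the presentation, which works in Excel. The scores are made-up numbers, and `histogram_counts` is a hypothetical helper written just for this illustration. It groups the same data twice, once with narrow boxes and once with one very wide box.

```python
from collections import Counter

def histogram_counts(values, bin_width, start=0):
    """Count how many values fall in each bin [left, left + bin_width)."""
    counts = Counter()
    for v in values:
        k = (v - start) // bin_width          # index of the bin holding v
        counts[start + k * bin_width] += 1    # key each bin by its left edge
    return dict(sorted(counts.items()))

def print_bars(counts, bin_width):
    """Print a rough text histogram, one row of # marks per bin."""
    for left, n in counts.items():
        print(f"{left:>3}-{left + bin_width - 1:<3} {'#' * n}")

# Made-up test scores with two clusters (assumption, for illustration only).
scores = [52, 54, 55, 57, 58, 59, 61, 63, 78, 80, 81, 82, 84, 85, 87, 88]

narrow = histogram_counts(scores, bin_width=10, start=50)
wide = histogram_counts(scores, bin_width=40, start=50)

print_bars(narrow, 10)   # two humps are visible
print_bars(wide, 40)     # everything lands in a single box
```

With boxes of width 10 the two humps show up. With a box of width 40 the same data collapses into one bar and the bimodal shape disappears, which is the kind of choice that can change the story a histogram seems to tell.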
Nothing in real life is usually going to sit exactly on a bell-shaped curve, but as you collect more samples, a lot of things will get closer to a bell curve, or to some other curve that we can represent with a function. We won't always have something we can easily represent with a function, but if we can represent the data with a mathematical curve, that is a great tool, because the curve lets us do more mathematical analysis. The bell shape is the most famous one for a reason: many things tend to fall into that kind of shape. We'll talk more in future presentations about the bell shape and other distributions whose smooth curve can approximate the shape of the histogram.

Two, finding the center of the data. That's one of the key things we want to do. Three, measuring how spread out or concentrated the data is around the center. Once we know what that center point is, that's not all we want to know. One of the big things a pictorial representation does is give us a sense of the distribution around the center point. Are all the data points close to the middle, or are they spread widely away from the center?

Organizing and summarizing data. Statistics aims to effectively organize, describe and summarize data. That's the point of statistics: we want to take this wealth of data and put it into an organized form so that we can glean information from it. This process involves ordering data usefully. If we measured something every hour or every day, the data set might be ordered by time, but that's not usually the most useful ordering of the data.
One thing we might do is sort the data from the lowest result to the highest. That's easy to do in something like Excel, which is why Excel is going to be quite useful here, and we'll do that in a future presentation.

Grouping data efficiently. If we have a whole lot of data, we might want to compile it into groups so that we can handle it in a way that gives us some information. Sometimes people say, well, we have computers now, and computers can handle massive amounts of data. That's true. But just as Einstein had to point his brain in the right direction, we have to tell the computer what we want to know. We have to be able to query the computer, to ask it a question about the data. To do that, we have to understand what the data is saying well enough to tell the computer what to look for. Otherwise the computer doesn't know what we're interested in.

Summarizing data with single numbers like the mean or median. That's not the only thing we want to do, but clearly a key component is using these key numbers, the mean and the median, and then understanding the spread around, say, the mean. Identifying quartiles is another tool we can use. So we can break the number set out into the mean, the median and the quartiles.

Creating graphical representations like histograms and box plots. We'll make some of these in future presentations. Both are things we can do in Excel, Excel being a great tool, so we'll show them in Excel as well as in some examples outside of Excel. The histogram is probably the big one, the one that gives us the best pictorial representation of the data.
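The sorting and single-number summaries just described can be sketched in a few lines of Python, using only the standard `statistics` module. This is an illustration rather than the Excel workflow the presentations use, and `daily_sales` is made-up data. Quartiles are also computed under different conventions by different tools, so the `method="inclusive"` setting here is one choice among several and results elsewhere may differ slightly.

```python
import statistics

# Made-up measurements in the order they were collected (assumption).
daily_sales = [23, 7, 15, 42, 8, 16, 4, 15]

# Order the data usefully: lowest to highest instead of by day.
ordered = sorted(daily_sales)

# Single-number summaries of the center.
mean = statistics.mean(ordered)      # pulled upward by the large value 42
median = statistics.median(ordered)  # middle of the sorted data

# Quartiles: the three cut points that split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")

print("sorted:", ordered)
print("mean:", mean, "median:", median)
print("quartiles:", q1, q2, q3)
```

Notice that the mean comes out above the median here because one large value drags it up, which is a small example of why a single number never tells the whole story about the spread.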
A histogram is created by dividing the data into disjoint groups and counting the frequency of data items within each group. It looks kind of like a bar graph, but each bar is a grouping of the data. In future presentations we'll see a lot of examples of histograms with different shapes, and then we'll make histograms ourselves. The histogram gives us a sense of the shape of the data, revealing whether it's skewed, bimodal or symmetrical, among other characteristics. These are terms we'll use to describe the shape of the data. Once we see the data, we can ask: is it skewed, meaning does it have a tail, with the data trailing off on one side? Is it bimodal, with two humps like a camel? Is it symmetrical around the center, looking more like a bell curve?

Examining relationships. A fundamental part of extracting meaning from data is examining the relationship between two or more variables. For instance, one might look at the correlation between a student's SAT score and their GPA in college. SAT scores are test scores taken before college. Note that we're not talking about the relationship between SAT scores and who gets into college. We're talking about students who are already in college and who had SAT scores before going in. So we can plot the relationship between the SAT scores they had and their GPA, their performance in college, to see if the SAT scores had an impact.
Now, you would think that if SAT scores are a measure of intelligence, for example, and there are arguments and debates on this that you can do stats on too, then students with higher scores would be more likely to do well and have a higher GPA. If you plot that out, you're not necessarily going to find a perfectly straight-line connection. So the question, of course, is whether the assumption is actually true.

Such relationships can be visualized using scatter plots. In a scatter plot, each dot represents one individual's specific combination of variables. So we can plot GPA against SAT score and see if there's a trend. What statistics is really looking for is whether these two things have a correlation. Do they move in unison with each other? In this case we're usually not going to find a perfect correlation, but we're looking for whether they tend to move in the same direction.

Now, notice what the statistics do not tell us. You often hear that correlation does not mean causation. If there's a correlation, there could be causation, so the question is whether one is causing the other. We can also get mixed up about which one caused which: we might say this thing caused the other thing when really the other thing is causing this thing, or it might be a third thing causing both. So we get into questions as to why the relationship seems to be there. The statistics and the scatter plot only show us that it is there, and we can get into technical measures of correlation.
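One common technical measure of correlation is the Pearson correlation coefficient, which runs from -1 to 1. Here is a minimal Python sketch of it, written by hand so that each step is visible. The SAT and GPA figures are invented for illustration and are not real student data, and `pearson_r` is a hypothetical helper name.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: how closely two variables move together."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from each mean (the covariance part).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (the spread of each variable).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented values for six hypothetical students (assumption).
sat = [1000, 1100, 1200, 1300, 1400, 1500]
gpa = [2.8, 2.6, 3.1, 3.0, 3.5, 3.3]

r = pearson_r(sat, gpa)
print("correlation of SAT with GPA:", round(r, 3))
print("correlation of GPA with SAT:", round(pearson_r(gpa, sat), 3))
```

The result for this made-up data is positive but below 1: the two variables tend to move in the same direction without lining up perfectly. The number is also the same whichever variable is listed first, so the measure by itself cannot say which one, if either, is the cause.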
So we can see that correlation, and then we have to resist our human instinct to say that any time there's a correlation there's got to be some kind of causation, and we have to remember that we might get the causation backwards. These, again, are things that people often use to mislead. You'll often hear about studies that are just off the wall: there seems to be some weird correlation and they assign causation to it, or they reverse the cause and the effect, and it seems that sometimes they may do that on purpose. So those are things we have to be careful of. We'll talk more about them in future presentations.

In conclusion, the ultimate goal is to organize, describe and summarize data sets. To understand data, we often look at its distribution: shape, center and spread. Those are the things you want to keep in mind. We're trying to understand the data. What are we trying to do with it in statistics? We want to find the distribution: shape, center and spread. Graphical tools like histograms and box plots (typically we'll lean towards the histogram, but we'll look at box plots as well), along with statistical measures like the mean, median and quartiles, can help summarize data effectively. However, they do not represent all the information, and understanding data often involves analyzing various shapes and visualizing associated data via scatter plots or other graphical representations.

Remember, one of the pitfalls we need to be mindful of is the human tendency to put everything into very tight boxes. We try to categorize people in boxes. We try to categorize books, philosophies and religions in these tight little boxes, right? And obviously the boxes often cannot be big enough to capture all the meaning of a person or a philosophy. So we need to be careful that we're not honing down too much.
We also need to be careful when other people are purposely honing down too much, focusing in on one particular thing and excluding others. When we're analyzing other people's work, remember that statistics is usually used in the real world to try to prove a point or to justify an action that people already want to take. So there is a bias in the statistics, which isn't ideal. We have to look for where the biases are when statistics are not being used properly, and to do that we have to understand how they can be misused. All of these concepts are quantified and explored in depth in statistical studies, and we'll dive into more of them in future presentations.