 Statistics and Excel. Generating practice data in Excel. Get ready, taking a deep breath, holding it in for 10 seconds, looking forward to a smooth, soothing Excel. Here we are in just first a word from our sponsor. Well, actually, these are just items that we picked from the YouTube Shopping Affiliate Program, but that's actually good for you, because these aren't things that were just given to us from some large corporation which we don't even use in exchange for us selling them to you. These are things that we actually researched, purchased, and used ourselves. Focusrite Scarlett Solo Third Gen USB interface with software suite. I've been using a Focusrite for years for my audio needs, before which time I had a USB microphone which plugged directly into the computer. But I think you'll find, as I have found, if you want to increase the quality of your microphone, you will need an interface, and the Focusrite is the go-to interface as far as I'm concerned. I've been using this for years now. It works well. It's easy to use. It seems quite durably built. Because I only do the screen recordings, I only need the one solo interface. However, if you have multiple microphones you need to plug in, or if you have other instruments you need to plug in, you can look at a similar model that has more input ports. If you would like a commercial-free experience, consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com where we have many different courses. You can purchase one at a time or have a subscription model, giving you access to all the courses, courses which are well-organized, have other resources like Excel files and PDF files to download, and no commercials. A blank Excel worksheet, we're going to be putting together a data set just within Excel so we can then use that data to apply and practice our statistical tools. In other words, oftentimes people might be thinking, I would like to practice my statistical analysis tools within Excel, like the average function, the median function, the core tiles. I would like to go into my insert tab up top into the charts group and then insert a histogram or a box in whiskers, but I don't have any data in order to do that. In many times that's actually a stopping point for people. However, it shouldn't be because there's many different resources we can use to get the data. I'm going to go back to the home tab as my default. One method, of course, is to go online and get data that we can apply these analytical tools to. Another method is just to make some data in Excel so we can just practice the tools that we want to be practicing with. And there's another advantage other than it just being easy to make a data set within Excel, and that is that when you download a data set, then what you're trying to do is apply your statistical tools in order to get some information about that data set. However, if you actually make your own data set, then as you make the data set, you'll be able to start to visualize what the statistical tools will give you as a result of that data set. In other words, if I make my own data set, I should be able to kind of start to visualize from how I create the data set, what the histogram and the box and whiskers is going to look like. And so you're kind of reverse engineering in that way, which could give you a better understanding sometimes. So I think it's a good practice to do for that reason as well. So we're in a blank sheet. Down here, we're zoomed in at 175. I'm going to zoom in a bit further by holding down control and up on the scroll wheel on the mouse. So now I'm at 235. I will typically start by formatting the entire sheet. This is what I do every time because otherwise you're going to get confused, right? So I want to format the entire sheet first. I'm going to select the triangle up top or you can select control A for the entire sheet, right click and format all the cells. So this is my underlying formatting. Now I usually go to currency and then I go to negative numbers being bracketed. I'm going to remove the dollar sign and I'm going to add the decimals back in. I took them off in a second ago, but I'm going to add them back in. So we're going to start with decimals. Now whether you start with decimals or not will be dependent upon the data sets you're working with. If you're rounding the numbers, the decimals aren't important, then you might take them off and then only add them when necessary. So I'm going to say, okay, that's my underlying formatting. Now if I need to change the formatting in a particular cell and make it a percent or something like that, then I can always go in and do that for that particular cell. All right, so now we're going to make our data set. So one of the ways we can do this is we can use the random function. So I could say this is a function. I'm going to say equals. I'm in cell A1, of course. Cell A1 equals means we're going to have a function and I'm going to type in RAND. And so it's R-A-N-D, selecting that. And then if I just close it up, notice there's not even an argument down here. It's just going to generate a random decimal. So there's my random decimal. And then I can select that and I can drag that down, putting my cursor on the fill handle. Now the fill handle is that little square. So if I put my cursor on the fill handle, I can drag it down. And if I double click on any of these, it now has put that same random function all the way down, right? It recalculates it. So if you want to lock, after you do that, once you want to lock this function so it doesn't keep changing, you can then copy your random function or select it and right-click and copy. And then put it somewhere else, right-click and paste it just the numbers. So now here you can see if I look at the formula bar, this one's just numbers. This one keeps on generating different random results. And then if I want these random results to be something other than decimals, then I can multiply them, right? I can say, okay, well what if I want it times 100, this times this. And now it's up to, now I can say this equals the one above it and then I can copy that down. And so then I can copy this down, right? So now I have a randomly generated set of numbers between zero and 100 in essence. I think that's basically the general idea of this. But obviously when I, so I can then apply and I can, if I wanted to change this to 200 or something like that, then I can do that or 1000. I can use that random generator tool. So that at least gives me a set of numbers that I can start to play with, right? So I can start to now do my average function. So I could go over here and say equals the average and there's my function. I'm going to double-click on the function. So remember whenever we do these functions, I hit equals, type in the function brackets, and then I'm just going to select this range. And that's going to help me to calculate the average. I could double-check that if I wanted to do it manually, right? I can say, well this equals the sum. So now this is the famous sum function. Have to know that one. Most important function is all of this. And then I can divide it by the number. I can do that with account equals the count function. I want you to count the number. There's my count of these. How many are there? Should be eight, right? I can see it right here. And then I can divide this out equals. I'm going to hit up, up. So now it equals what's in G2, G2 divided by this one, G4. So take the 3111.09 divided by eight, or take what's in G2 divided by what's in G4 will give us that average. So again, we can kind of play with these numbers. Now, because we generated them randomly, we start to think about, we can kind of think about what exactly we're going to get as a result. So let me give you some more examples of this. I'm going to clean this out. I'm going to select, I'm going to put in my cursor on A1, and I'm dragging over to G8. Notice that this cell range, I can name with just two cells. It's A1 through G8. And because it's all one big box, I only need those two cells. I don't need four points to name this range. I only need two points. So just keep that in mind because that's what the cell references will do. So I'm going to delete this. And let's say I want random numbers between a certain set of numbers. Let's say I just want random generated numbers between one and five, right? Just one and five. So I'm going to say this is going to be equal RAND and then BETWEEN. RAND BETWEEN. And now I have my arguments here. So I want between, this is a RAND array. Hold on a second, RAND BETWEEN. This is the one I want. RAND BETWEEN. I've got my argument on the bottom and the top. So the bottom, I want it to go to one. I don't want to go below one. And then if I hit a comma, that takes me to the second argument. So how high do I want it to go? I want it to go to five. And then I close up the brackets, CTRL, SHIFT, 0. That's my argument. So it should give me something between one and five now, right? So one and five, it gives me a three. Or you might do it like one and let's do one and six. I'm going to double click on it, go to the end of it. And that would be like a dice, like we're rolling a die, right? Between one and six. And so now if I copy that down and I put my cursor on the fill handle, drag it down. So now you've got our numbers between one and six. Now these are all equally outcome like a dice rule, right? So you would expect that because they're all equal, that you're going to get... If I copy this all the way down, if I did it a whole bunch of times, that you would have something that's going to be somewhat evenly distributed. What I'm going to do now is enter a histogram to get a visual sense or idea of the spread of the data. We'll talk about histograms more later. But for now, let's select the data so I can put my cursor on A1 and drag all the way down. That's one way I can select the data and then I'd have to let close the click, scroll back up. Another way we can select the data is to put our cursor on A1, hold down control and shift at the same time, and then the down arrow on the keyboard. So that's a great tool. You don't have to do it that way, but if you have a lot of data, that could be useful. Now we're at the bottom, we're down here on row 59. I don't really want to enter the histogram down here because if I do, it'll be down here somewhere and I want it at the top. So then I'll have to drag it back up to the top. So to get back up to the top without unselecting the data, you could do that with a keystroke by holding down control and then the backspace. Not delete, but backspace. So now the data is selected. I'm at the top of the sheet. Now I'm going to go to the insert tab up top. I'm going to go to the charts and then we're going to enter a histogram. So hit the histogram and there we go. Now this gives us kind of like a bucket and summarize the data in buckets. So it gives us a default of between one and 2.4. I want to have a spread of just one since there's only five data sets. So I'm going to double click on this data. We'll talk more about how to do this in the future, but I'm going to change this bin and I'm going to change it to just one. And so there we have it. So now you've got the bins. Now we could get a little bit more specific and we might have been able to enter just like a bar chart since there's only five items here. But the general idea is that you have a spread of the data, which you would expect would be somewhat even because if you ruled a die a bunch of times, then each number you would expect to come up somewhat evenly. That's the general idea here, right? So whenever you use a random generator, you're going to get numbers that you would expect if you looked at the spread of the data would be somewhat even. And usually what we're trying to do when we're trying to do statistics is compare something that's going to have an even spread to something that doesn't have an even spread. So for example, if the die was rigged or something like that, and let's say it was rigged to have a six a lot of times, that you would expect that you're going to get a skewed kind of answer that has a skewed towards, you know, the six. Say, let's keep randomly doing this and so on. So now you can say, hey, there's something wrong with this die, right? Because now it's skewed. I expect it to be somewhat the same, but there's a skew to it. And that seems odd, right? Is this weighted, what is going on? So we'll talk more about that later, but you can get an idea, like if you make your own data sets, what the general output would be in the data. Now let's do this a different way. I'm going to delete this whole row. I'm going to put my cursor on the A, select it, right click, and delete that column. Let's say we're talking about salaries, right? Let's say we're talking about salaries and I say this is going to be equal to random between, between, and let's say that we're going to say the salaries are going to be somewhere between a bottom of like 60,000 and a top of comma top 80,000, right? So now if I, that's 8,000, 80,000. So now if I select this, it's going to give me randomly between there. Now I'm thinking that more likely, you know, that the middle would be around, you know, 70,000 or something, right? This is the top and bottom. So this wouldn't give me a spread that I would think would be kind of normal to a, to, to like a, an institution, but it will at least give me a starting point, right? So I can say, okay, that it gives me 77, 151. Now if I was going to build my data set, I could say, well, let's copy that random generated number down. So there we have it. Okay. Now each of these, I believe is independent of the, of the one above it. So each one is like rolling a new die, right? So, and remember that if you want to preserve this data set, you probably want to copy it and then hard code it. So let's say, let's say that I take this, this data set and I say, I say, okay, let's copy it and put it right here. I'm going to paste it one, two, three, pasting it just with the values because every time I do something that changes on this side, right? So now this is, this is the first one that we did. Let's color coded. So I'm just going to cut, put a color around it and that was just randomly selected. Now if I make a histogram based on that, selecting this data, insert histogram and I'm going to hold down control, scroll down a bit. And let's delete this and let's change the bucket size. So bucket size. And I'm going to say the bucket is like a change of 1000. Let's say. So now it's given me, you know, a bunch more ranges. And you've got this pretty wide spread of bucket ranges and, and any, any numbers equally likely to come up. So you don't have a really a center to the data. What if I wanted to try to create a data set that did have a center to it, right? I could say, okay, well that's, that's the full range. I would think the middle, I can create another, another data set here. Let's go from here and say this is going to be Rand and it's going to be between the middle. I'm thinking is going to be between like 70 or let's say between like 68000 and comma 72000. Right. That's going to be like the middle range. And I can say, okay, let's copy that down. And now I've added some that are kind of in the middle. I'll make that like green and let's copy that. I'm going to copy that data set and I'm going to put paste it 123 just the values so it doesn't keep generating. And now if I selected this entire data set, I might have something that approximates more of what I would kind of expect with like a center centering point, right? So I'm going to go to insert charts histogram. And so now we can say, okay, let's take this data set and say that it's going to be spread of 1000 again 1000 on the spread. And so now you've got something, you know, that's a little bit more towards the middle of it, right? So now you've got something that's starting to approximate something in the middle. Let's say, let's say I drag this down further. So you've got more that's kind of in that middle section. And then I'm going to copy this copy this and paste it 123 make this whole thing green. And then I'm going to double click on this data set. And I'm going to drag this down. We'll talk more about charts later. I'm just trying to get an idea to show an idea. You know, how the how your different data sets see now you got something that looks a little bit more like it's centered, right? So this is just a general concept of how we can generate our own set of numbers within Excel that we can then apply our statistical tools to and how we can in essence reverse engineer something like the creation of a histogram, for example, as we generate our data set. And you might be thinking, well, that's cheating. You're teaching us how to cheat so that we can create a data set that makes the graph look like what we want to look like. But no, it's not cheating. When we're practicing trying to understand something like deconstructing a motor in order to understand how an engine works isn't cheating. What we want to be able to do is say, hey, if I can create a data set that will then populate a histogram like the one I envisioned in my mind, then when I apply those same statistical tools to other data sets and it creates histograms, I'll have a better conceptual understanding of what is actually happening so I can make decisions based on what is going on. And this is often illuminating the people because people often have misconceptions of what the data will look like when you plot it on a histogram. Like, for example, if you plotted this random data up top and I just took a random sample, that's what this is basically doing between a certain set of numbers, or like when we had the one between one and six and we took a random sample between one and six. A lot of people would say, well, if you did that a bunch of times, then you would think that the histogram should look kind of like a bell curve, similar to this less like this, which looks more like a straight line kind of thing. And that shows kind of a disconnect of what is actually happening. So if you think that way, if that's what you thought you say, well, hey, it should look like a bell curve because you took a random sample, then that's not exactly right. We'll talk more about that in future presentations of why that would be the case. But if you can get to the point where you build your data set and you can actually say, this is going to result in a bell curve, a bell shaped distribution, or this is going to be a skewed distribution because I built it that way. I know where the outliers are going to be. Well, then you have a better sense of how to read the histogram because you've reverse engineered it. In any case, we'll dive into more histogram creation in future presentations.