Statistics and Excel. Where to find data to practice with. Got data? Let's get stuck into it with statistics.
First, a word from our sponsor. Yeah, actually, we're sponsoring ourselves on this one, because apparently the merchandisers don't want to be seen with us. But that's okay, whatever, because our merchandise is better than their stupid stuff anyways. Like our "Trust me, I'm an accountant" product line. Yeah, it's paramount that you let people know that you're an accountant, because apparently we're among the only ones equipped with the number-crunching skills to answer society's current deep, complex, and nuanced questions. If you would like a commercial-free experience, consider subscribing to our website at accountinginstruction.com or accountinginstruction.thinkific.com.
We are in OneNote. One of the problems we often run into when practicing statistics is how we can get our hands on some interesting data sets to practice our statistical tools on. Now, obviously, we are in the age of the internet, so there's a whole lot of data out there, but you need to know how to search for it. For example, if you just do a search in your favorite search engine, like a Google search, and say "give me some data sets related to salary" or something like that, the likely result is a summary of some kind of statistical analysis. It'll give you the average, the mean, and that kind of stuff, when what you really want is the actual data, like the survey responses, so that you can apply your own statistical analysis. So again, there's a lot of data out there, but sometimes you have to dig a little bit deeper. And if you know what your field of interest is, then you can go closer to the sources of that data.
So if you're interested in medicine, you know, you can go to some of the sources of the actual testing that's taking place to get the data related to that. If you're interested in finance, go there. If you're interested in agriculture or the environment or that kind of information, then you can dig a little bit deeper to actually find the source data.
But if you just want to practice the statistics, then you have a couple of tools available to you. A lot of the data sets that we will be using come from Kaggle, K-A-G-G-L-E. So that's an interesting resource we'll take a look at. And then, of course, when you're in Excel, you can generate your own data if you just want to practice applying your statistical tools and making graphs from it.
One of the ways you can do that is with a random number generator. Now, obviously, if you just create random numbers, what you're going to get is a data set of just random numbers. But we'll talk about different ways that you can build your data sets using the random numbers, just so you can then practice making a histogram or whatever we're doing from that data. If you want to practice a particular technique, that would be the fastest way to go.
And you also have RANDBETWEEN. RANDBETWEEN will give you a random whole number between any two numbers. Say we're talking about salary and we want salaries between 60,000 and 90,000, right? So in the random generator, we take the bottom as 60,000 and the top as 90,000. And again, if you do that, you're just going to get random numbers between the two, right? But at least you can play with that and put together some numbers so you can practice inserting graphs and whatnot in Excel. But it would be better to get a nice data source that has actual data in it so we can analyze the data. So let's take a look at this source.
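Going back to the random-number idea for a moment, here is a minimal Python sketch of the same thing outside of Excel. The first function mimics filling cells with `=RANDBETWEEN(60000, 90000)`. The second is only one possible way to make the practice data look less flat, by drawing from a bell curve and clamping to the same range; the mean, the standard deviation, the bin width, and all of the function names are assumptions made for illustration, not something from the walkthrough.

```python
import random
from collections import Counter

LOW, HIGH = 60_000, 90_000   # salary bounds from the example
BIN_WIDTH = 5_000            # assumed bin size for the practice histogram


def uniform_salaries(n, seed=0):
    """Roughly what n cells of =RANDBETWEEN(60000, 90000) would give."""
    rng = random.Random(seed)
    return [rng.randint(LOW, HIGH) for _ in range(n)]


def bell_salaries(n, seed=0, mean=75_000, sd=5_000):
    """Assumed alternative: normal draws, rounded and clamped to the range."""
    rng = random.Random(seed)
    return [min(HIGH, max(LOW, round(rng.gauss(mean, sd)))) for _ in range(n)]


def histogram(values):
    """Count values per bin, keyed by each bin's lower edge."""
    counts = Counter()
    for v in values:
        edge = (v - LOW) // BIN_WIDTH * BIN_WIDTH + LOW
        # Fold the top value (90,000) into the last bin instead of its own bin.
        counts[min(edge, HIGH - BIN_WIDTH)] += 1
    return dict(sorted(counts.items()))


def show(title, values):
    """Print a rough text histogram, one '#' per 10 values."""
    print(title)
    for edge, count in histogram(values).items():
        print(f"{edge:>7,}-{edge + BIN_WIDTH:<7,} {'#' * (count // 10)}")


if __name__ == "__main__":
    show("Uniform (RANDBETWEEN-style)", uniform_salaries(1000))
    show("Bell-shaped (assumed parameters)", bell_salaries(1000))
```

With the uniform version the bars should come out roughly level, which is why purely random numbers are fine for practicing the mechanics of a histogram but do not look much like real salary data.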
So I've set up an account over here. It's K-A-G-G-L-E dot com. If you create an account, this is the homepage of the account. And then basically I'm just looking for the data sets, so I'm going to the data sets section. Now you've got different ways that you can sort through the data sets. You've got all data, computer science, education, classification, computer vision, NLP, and so on.
And there are different ways that they present the data sources. They've got the user listed, and they actually rank them as well. They have the usability ranking, so you might want a higher usability ranking. But we're basically looking for CSV files. A CSV file is a comma-delimited file, and usually you can open it in Excel. So that's going to be an easy file for us to take and pull into an Excel document so that we can practice with that data set. That's what we'll be looking for mainly, at least at this point.
So you can filter up top here and say, I want just the ones that have a CSV file. We're not looking for these other file types because we just want to pull the data into Excel. So I'll say, give me a CSV file, and then boom. And then up top you can sort as well. You can say that you want the hottest, the most votes, and so on. You've got the rankings up top here: hottest, most votes, newest, updated, usability. I would think the hottest and the usability would probably be what we would want to sort by.
Then you can go into these data sets: rent prices and so on. If I go into one of these data sets, it's got a usability of 10. That's great. And so I'm going to go down. Sometimes they give you a summary of what the data set looks like down here. And then you can download the data set up top. And it comes in a zipped file.
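Since the download arrives as a zip with a CSV inside, here is a small Python sketch of what "a comma-delimited file in a zipped file" means in practice. It builds a tiny zip in memory so it can run without downloading anything; the file name `rent_prices.csv`, the column names, and the rows are all made up for illustration and are not the contents of the actual data set.

```python
import csv
import io
import zipfile


def build_sample_zip():
    """Build a tiny in-memory zip standing in for a downloaded data set.

    The file name and rows are hypothetical placeholders.
    """
    csv_text = "city,bedrooms,rent\nSpringfield,1,1200\nShelbyville,2,1650\n"
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("rent_prices.csv", csv_text)
    buffer.seek(0)
    return buffer


def read_csv_from_zip(zip_file):
    """Return the rows of the first .csv file in the archive as dicts."""
    with zipfile.ZipFile(zip_file) as archive:
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no CSV file found in the archive")
        with archive.open(csv_names[0]) as raw:
            # The CSV is plain text: decode it and split each line on commas.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


if __name__ == "__main__":
    for row in read_csv_from_zip(build_sample_zip()):
        print(row)
```

For a real download you would pass the path of the zip file to `read_csv_from_zip` instead of the in-memory sample, assuming the archive holds a UTF-8 CSV.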
So if I open that up (I've been trying to get rid of this program so I have the normal zip), we've got the rent prices. Here we have it as just a list of data. Now just note that this file type, again, is a CSV file type. You don't usually want to just start working in this file, because if you say File, Save As, you can see it's a CSV file type. So you want to convert it to an Excel file once you start manipulating the data.
You can do that in a couple of different ways. I can take the CSV file and save it as an Excel workbook. Or oftentimes I'll just open the CSV file like this, and then open another Excel file that I'm working on, because I might be working with multiple tabs in a separate Excel file. And then I'll just copy the data: I click the triangle in the corner to select the entire sheet, then right-click and copy. I'm not going to try to copy just a bit of it. I usually copy the whole sheet if I want the whole sheet over. And now I've pasted it over here in an actual Excel file, right? So now it's in Excel.
If you manipulate the data in a CSV file, by the way, what will happen is that when you close it and then open it again, you'll lose the work you added, like formatting and formulas, because the point of a CSV file is that it's been stripped of all the Excel formatting. That makes it easier to see it as just data, right? It's easier to upload, it takes up less space, and different programs can open it. Whereas once you start manipulating it in Excel and save it as a workbook, it has all this Excel-specific stuff and you generally need Excel to open it, right? So a CSV is just a data file, is how I see it.
So I'm going to close this up. No, I don't want to save that. And so that's the general idea, and I'm going to close this back up. I think this is a pretty neat source, but remember, this is only one source, so if you know what you're interested in,
then you can start looking at the source of the data, like census data or economic data. If you know what you're interested in, housing data, whatever, you can find the data. But if you're just looking for data sets that you can practice with, then searching through these data sets is one place where you can go to find a broad range, and that's why we're going to go here, because that's what we're looking for. We want to try to show how statistics is applicable to different things. So we're looking for data sets that span, you know, different areas. That would be the general idea.
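To make the earlier point about a CSV being "just a data file" a little more concrete, here is a minimal Python sketch. It writes a couple of rows to CSV text and reads them back: the numbers come back as plain text, because the format has nowhere to keep types, formatting, or formulas. The column names and values are made up for illustration.

```python
import csv
import io


def round_trip(rows):
    """Write rows to CSV text, then read that text back like any program would."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    csv_text = buffer.getvalue()
    restored = list(csv.reader(io.StringIO(csv_text)))
    return csv_text, restored


if __name__ == "__main__":
    # Hypothetical rows: a header, then a text value, an integer, and a decimal.
    original = [["city", "rent", "vacancy_rate"],
                ["Springfield", 1200, 0.25]]
    csv_text, restored = round_trip(original)
    print(csv_text)   # the whole "file" is just comma-separated text
    print(restored)   # every value is now a string
```

This is the same reason the walkthrough moves the data into a regular Excel workbook before doing any real work on it.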