Welcome to the first lecture of our basic statistics course. We hope you have as much fun with this as we did. I'm going to start with some key terms. The first term you must know is population. A population is the entire category under consideration. Thus, it could be every lawyer in the United States, if that's what you're studying. Or it could be every single woman, or it could be every adult in the United States. Whatever the entire category under consideration is, that's the population. The way we represent the population size is with a capital N. So when you see a capital N, N for number, we're talking about the population. Now of course it's very expensive to contact everyone in the population. Imagine if you had to contact every single lawyer in the United States. So what do we do? The whole course is based on the idea that we take a sample, which is essentially a portion of the population. A good sample should be representative, or it's worthless. You want a representative sample, and that's why we're going to learn a little bit about probability samples and how they provide assurance that a sample is indeed representative. Now when we're talking about a sample size (remember, a sample is a portion of the population), the sample size is shown with a lowercase n. Capital N, population; lowercase n, sample. So for example, let's say a company manufactures a million laptops. They don't want to test a million laptops. That would cost a fortune, of course. So what they'll probably do is take a representative sample of, say, 500 laptops. So the population size, that's the capital N, is a million. The sample size, that's the lowercase n, is 500. So we're going to test the 500, and that's usually what we do in something called quality control. Now you know about a population and a sample. Any kind of measure that comes out of a population, in other words a characteristic of the population, we call that a parameter. Generally we use a Greek letter.
For the population mean, for example, we use the Greek letter mu. You'll see the mu on the slide; that's a mu. It looks a little bit like a u; it's a Greek letter, and we call it mu. For the population standard deviation, again we use a Greek letter, and that one's called sigma. Those are two very good examples of population parameters. Again, population parameters are measurements that come from a population. You take a census, and that's essentially the only way you're going to get a population parameter. You must take a census, and we know that it costs a lot of money to take a census. You've got to contact every single member of the population, so that's very rarely done. The government does it every 10 years, but corporations are not doing it. Normally we work with statistics; in fact, that's the name of the course. We don't call the course parameters, we call it statistics, because that's what we usually work with. We work with statistics. These are measurements that come from the sample. Any measurement that comes out of a sample, and remember the sample is the lowercase n, small n, not a capital N. So when working with a sample we're not going to look at mu. We actually examine the sample mean, called X bar. X bar is the sample mean, and the sample standard deviation we call S. S is the sample standard deviation, and those are called statistics. Those are estimates. We use them to estimate the parameters. So again, the way it works is we're going to take a small sample, a percentage of the population, and we're going to use lowercase n. From that we're going to get statistics such as X bar, the sample mean, and S, the sample standard deviation. These in turn are used to estimate the parameters, which we don't know. Remember, the parameters are going to be mu and sigma. We're going to estimate those parameters from the sample statistics. Now we're going to talk about statistical inference. What is statistical inference? You know the English word infer, to infer something?
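To make the parameter/statistic distinction concrete, here's a minimal sketch in Python. The salary population here is invented purely for illustration: we can compute mu and sigma only because we made up the whole population, then we estimate them with X bar and S from a simple random sample.

```python
import random
import statistics

# Hypothetical population of 100,000 salaries (in real life we'd never have all of this).
random.seed(1)
population = [random.gauss(60_000, 12_000) for _ in range(100_000)]  # N = 100,000

# Parameters (Greek letters): only a census gives you these.
mu = statistics.mean(population)       # population mean, mu
sigma = statistics.pstdev(population)  # population standard deviation, sigma

# Take a simple random sample of n = 1,000 and compute the statistics.
sample = random.sample(population, 1_000)  # n = 1,000
x_bar = statistics.mean(sample)            # sample mean, X bar
s = statistics.stdev(sample)               # sample standard deviation, S

# X bar and S are the statistics we use to estimate mu and sigma.
print(f"mu ~ {mu:.0f}, X bar ~ {x_bar:.0f}")
print(f"sigma ~ {sigma:.0f}, S ~ {s:.0f}")
```

Run it a few times with different seeds and you'll see the point made next: each sample gives a slightly different X bar, and each one is just a tool for estimating mu.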
This is the process of using sample statistics to draw conclusions about population parameters. So we'll give an example. Suppose you take a sample of a thousand. Now the whole population is 330 million. You took a sample of a thousand people, let's say, and you got the sample mean, X bar. You're going to use that X bar. You're not so interested in X bar on its own merits; X bar on its own is not so important. You'll see in a moment why it's not so important. But it's a good tool. That X bar will be a tool to infer something about the population mean mu. Now you have to also realize your X bar is nothing special. I could take a sample of a thousand out of the 330 million and I might get an X bar that's somewhat different, and I'll use that as a tool. We call X bar an estimator of mu. It's used as an estimate of mu. Using X bar to learn about the population mean mu, that's called inference. But keep in mind that your X bar is not that special. Somebody else may get an X bar that's a bit different. You know that pollsters don't bother contacting everybody. It's too costly; that would be a census, and the pollster doesn't work with a census. The pollster will take a sample of one or two thousand people to determine who people will vote for for president. So rather than contacting everybody, which would cost a fortune, they take a representative sample of maybe a thousand people, sometimes more. They will get a sample statistic from that sample. We call it the sample proportion. That is a statistic. And they're going to try to estimate who's going to win the election. That's the population proportion. So notice how we use the sample proportion as a tool to try to predict the population proportion, which is really the measure of interest. The next example is from quality control. Try to imagine General Electric has a plant on Staten Island and they're manufacturing LED bulbs. They want to have an idea of how many of these bulbs are defective.
Now suppose this plant manufactures one million bulbs a year. So they have a million bulbs and they hire you. Now you're not going to test the whole million bulbs. If you're smart, you might randomly take 500 bulbs to estimate the proportion of defectives. So capital N, the population size, is a million, and you took a random sample of 500; lowercase n is 500. Now suppose you test the 500 and you find 5 out of 500 are defective. 5 out of 500 bulbs are defective. So we call that the sample proportion. Your sample proportion is 1%. So 1% defective, 5 out of 500. That's called your sample proportion, and that's a statistic. You're going to use that statistic, using inference tools, to estimate the true proportion of defective bulbs, and that's called the population proportion. Here we generally don't use the Greek letter, because the Greek letter would be pi. We usually use a capital P for the population proportion. Descriptive statistics, as opposed to inferential statistics, summarize a sample of numerical data, maybe in summary statistics, maybe in the form of pretty pictures, images, graphs, charts. In that case, we're only interested in turning that data into information about that particular sample. With descriptive statistics, we're not interested in any larger population. We're only interested in the data in front of us, in understanding that data and producing some summaries so that it's easier to talk about the data. Sometimes your data set can be quite large. So for example, if I give an exam to a class of, say, 35 students (ha, when was the last time our class had 35 students?), suppose I want to use descriptive statistics so I can assess the performance of the class, which we do. Every time we give an exam, you can look at the summary statistics on Blackboard. We could get the median.
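The defective-bulb arithmetic above is simple enough to check directly; here's a quick sketch using the numbers from the example:

```python
# Defective-bulb example: N = 1,000,000 bulbs, sample of n = 500, 5 found defective.
N = 1_000_000  # population size (capital N)
n = 500        # sample size (lowercase n)
defective = 5

p_hat = defective / n  # sample proportion, a statistic
print(f"sample proportion = {p_hat:.2%}")
# p_hat is then used, via inference tools, to estimate P,
# the population proportion of defectives (which we never observe directly).
```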
We can get the average, the standard deviation, the mode, the max, the min. But one thing we're not going to do is take these measurements and draw inferences about a larger population. What larger population could we possibly be interested in? All the sections of STAT 2000 throughout our college, the whole CUNY system, all of New York State's universities, colleges across the world? What's the population that my class comes from? It really doesn't make much sense to do that, and we don't want to do that. That's not what we're getting descriptive statistics for. We just want to describe the performance of the class on this particular test. That's what descriptive statistics are all about. Where does data come from? It'll come from primary sources or secondary sources. Primary data is data that the researcher collects on his or her own. How do we get primary data? Field experiments, observations, in-depth interviews, focus groups, and for the most part, surveys. There are all kinds of surveys, and you've seen many of them yourself. Mail surveys, actual snail mail sent to you by the USPS: you hardly get any of those anymore. They used to be very, very popular, very low cost, but you're not going to see that much anymore. You're more likely to see email. Your survey can come in email, and the email could link to a web page so that it's actually a web survey. We have personally administered surveys, where you have an actual interviewer, and the interviewer has to be trained. There's a human being there, and it's the most costly type of survey. But there are advantages, because you have a human being: the interview can last longer and can be more in-depth. Based on the answers to an earlier question, the interviewer can probe and try to understand the situation better and ask follow-up questions. It all depends on the purpose and the kind of data that you're trying to collect.
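Returning to the exam example for a moment, a descriptive summary like the one pulled up on Blackboard can be sketched in a few lines of Python. The scores here are made up for illustration:

```python
import statistics

# Hypothetical exam scores for a small class (invented numbers).
scores = [55, 62, 70, 70, 75, 80, 84, 88, 93, 97]

# The usual descriptive summary: describe THIS class, infer nothing beyond it.
print("mean   :", statistics.mean(scores))
print("median :", statistics.median(scores))
print("mode   :", statistics.mode(scores))
print("stdev  :", statistics.stdev(scores))
print("max    :", max(scores))
print("min    :", min(scores))
```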
When you need something fast, telephone is probably the fastest, because you get an immediate response as long as somebody is in and picks up the phone. And we all know people hate these calls, and they're fast becoming very problematic because a lot of people don't pick up. But if you're not that concerned about how representative your sample is, telephone will be the fastest, and email is kind of next; it's not too far behind. So basically you've got different types of surveys, with advantages and disadvantages. And it's like everything else: you want to look at the objective of your study and figure out which way you're going to go. Secondary data is data that already exists, collected by somebody else, not the researcher who's working on the current project. It might be in a library. It might be available for purchase from a corporation or a professional association. It might be census data from the government. It might be hospital data, and there are various ways that hospital data might be available. With secondary data, it can be difficult to find exactly what you need, and that's true, but if you can manage to use what's out there, you can skip the whole process of collecting data, which can be very, very long and involved. It's often cheaper than collecting your own data as well, of course. What are some of the problems? The data can easily be outdated. It was collected for some other purpose, so the variables may not quite be the variables that you're thinking of in your project, and the same goes for the units of measurement and the accuracy. Basically, you're not in control of your data. But there's a lot to be gained; in fact, there are some studies that you can only do using secondary data. Let's take a look at some of the types of research studies that work very well with secondary data. One is fact finding. The secondary data itself is exactly what you're interested in.
Maybe the amount spent by different industries or by different corporations inside an industry on advertising. Maybe related to market share. Maybe things having to do with technology, laptops, iPads, cell phones in different countries, Wi-Fi availability in different locations. A fact finding study is a perfect example of when you need secondary data rather than going out and collecting it on your own. Model building. When you have large data sets, one of the things you can do is look for relationships among variables, quantifying the relationship, creating a model, a linear model, or a non-linear model, but basically a mathematical model of these relationships and learning something about it. That's something that you can do with primary data, but you'd have to collect a lot of data in order to accomplish it. And finally, one thing you can do with secondary data, longitudinal studies. A longitudinal study is one where we're not just looking at data we collected this moment. We're looking at data over the long term, like in a time series. And if you have data that's secondary in the library, in institutions, in associations, in the government, generally this data is collected regularly every year, every quarter, every 10 years. And so you can do a longitudinal study without waiting for data collection to be done over the long term. And you can see the way things have changed over time, which you really can't do when you go out and collect your own data. Can errors creep into your survey data? Well, you better believe it. There are basically two major sources of error in survey data, either from the responses or from not getting responses, and we'll see that in a moment. Response errors come from different types of errors that arise in the responses themselves. Now, in this course, we assume that the data we're working with is accurate. We have no other way of doing it, but in real life, obviously, you can't do that. 
You would like to have your data as error-free as possible. Where would the errors come from in the responses? All kinds of reasons, some malicious, some not so. So for example, you know subjects have been known to lie, right? It happens. People aren't always totally honest, especially in certain situations. That's why sometimes we go to great lengths to make sure that our survey is completely anonymous, and obviously anonymous, to take away the necessity for lying. We want as much as possible to get honest responses. Sometimes the subject isn't really trying to lie, but people make mistakes. They don't remember. They think something happened in a certain way when it didn't happen that way. We're human. We're not angels. People make mistakes. Sometimes people want to give an answer even if they don't know the answer, because everyone wants to be nice and helpful. And they think they know what the researcher is looking for, so they try to give the right answer. People want to give the right answer. They want to give the socially acceptable answer. Sometimes it's the interviewer that makes a mistake. When there's an interviewer doing a telephone interview, let's say, the subject says the answer and the interviewer has to write it down. How easy is it to make a mistake in that case? Very easy. An honest mistake. And now how about the dishonest mistake? Interviewer cheating. Yes, I'm sorry to say that has been known to happen. When an interviewer, let's say, is supposed to go door to door and ask questions of the people who are home, and if they're not home, has to go back and try to get the answers at least two or three times. Well, sometimes you'll find the interviewer at the local coffee shop filling out the surveys, because it's a lot easier. People, as we know, have been known to take the easy way out. Finally, though, the most interesting of these is something called interviewer effects. That's where there's nothing malicious.
Everyone's trying to do a good job. Everyone's trying to do what they're supposed to do. But it's inadvertent. Everybody has inadvertent biases. Don't forget, some biases are good biases. And sometimes, for example, certain characteristics of the interviewer can influence a subject's response. Let's say you're doing a survey about racial issues in a particular neighborhood, and your interviewers are all white, or your interviewers are all black, or you have white interviewers in some neighborhoods and black interviewers in other neighborhoods and you're comparing. There certainly might be bias creeping into your survey. And it may not be so bad as long as you know about it and you take it into account, but you really have to at least know what's happening. The bias creeps in sometimes and we don't know about it, and if we don't know about it, we can't correct for it. Okay, now we're going to talk about a different kind of error that can creep into survey data. It's not that the responses weren't accurate; it's non-response. People have not responded to your survey, and that's not unusual. We often get a low rate of response to a survey. Now what happens if the rate of response is low? See, if non-response were a random thing, and let's say only 30% of the people you survey respond, but at random, it wouldn't be an issue. The problem is, it's generally not random. The people who respond usually are overly interested in the subject of the survey, or they could be more educated, and if only 20-30% respond, you may have a bias. You may have a biased study. That's why you try to do the best you can to achieve what's called a high rate of response. At one time the government wanted you to get at least 60%. It's getting harder and harder to get 60% of people to respond to a survey, but you can if you try: lots of follow-ups, and there are ways it can be done. Look at samples 1 and 2 on the slide. In sample 1, the researcher's sample size, notice, lowercase n, was 2,000.
2,000 people. But the researcher was successful and got 90% of them to respond. Sample 2 is a much bigger one, dealing with a million people. Notice n is a million, but only 20% responded. There's a lot of non-response: 80% didn't respond. Most people think, well, with sample 2 you got 20% of a million. It's a lot of people, right? And they think, oh, it's much better. 20% of a million is 200,000 people; 90% of 2,000 is 1,800. But the reality is that to a researcher, sample 1 is more effective and more reliable. Why? Because you have a high rate of response, so the chances are your sample is still representative. Ideally you'd get 100%; you're not getting 100%, but you got 90%. So this is probably a representative sample and represents the population, and you can make the inference about the population. With sample 2, where only 20% responded, you may not be able to make that inference, because it could be that the 20% who responded do not represent everyone. They could be atypical. So it's possibly of no value to you. That's why there are all kinds of problems when you get a low rate of response. Let's talk about the different types of samples. First, let's talk about non-probability samples, which are based on convenience or judgment. Here are some quick examples of non-probability samples. One is called a convenience sample. You basically just take what's convenient. You go to a mall and just take whatever you need, say 150 people at the mall. Or you just take students in a classroom. You say, well, I'll go to five classes, I'll get 200 students, I'm fine. That's convenience. You don't really know if you have a representative sample. Then there's a judgment sample: use your own judgment. This is more like, I'll look at these 20 stores, because it's my judgment that these 20 stores represent the whole chain of 1,000 stores. They're typical of what I have in my chain. That's a judgment sample. Or a quota sample. Sometimes that's done at malls.
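The arithmetic behind the two samples on the slide is worth checking for yourself; a quick sketch:

```python
# The two samples from the slide.
n1, responded1 = 2_000, 1_800          # sample 1: small n, high response rate
n2, responded2 = 1_000_000, 200_000    # sample 2: huge n, low response rate

rate1 = responded1 / n1
rate2 = responded2 / n2
print(f"sample 1: {responded1:>7} respondents, response rate {rate1:.0%}")
print(f"sample 2: {responded2:>7} respondents, response rate {rate2:.0%}")
# Sample 2 has over 100x the respondents, but its 20% rate makes
# non-response bias likely; sample 1's 90% rate is what makes it
# the more reliable study, not the raw count of respondents.
```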
You tell the interviewer, come back with 100 subjects; 50 must be male and 50 must be female. And of the 50 males, say, 10 should be non-white and 40 white, et cetera. That's called a quota sample. Again, it's kind of a judgment sample. The problem with any kind of non-probability sample is you really don't know how representative your sample is of the population. You're essentially guessing. You think it's okay, but it may not be okay. Let's talk about probability samples. That's where you collect the sample in a way that every element in the population has a known chance of being selected. You know in advance what the probabilities are of selecting certain people. Let's talk about the most common one that we talk about in statistics, the simple random sample. That's where every element in the population has an equal chance of being selected. You're not responsible for collecting one yourself, but you do have to understand what it is. If you want to collect a simple random sample, generally we work with random numbers or a random number generator. All you have to know is that if, let's say, I have 100,000 employees and I take a simple random sample, everyone has an equal chance of being selected. Let's say the sample size is 1,000 out of the 100,000. Everybody has the same chance, which is 1,000 over 100,000, which is basically 1%. Everybody has a 1% chance of being selected. That's called a simple random sample. Let's talk briefly about other kinds of probability samples. There's something called a systematic random sample. Essentially it's very much like a simple random sample, but you just take, say, every 50th element. Basically you figure out the interval: you take capital N, the population size, over the sample size, whatever that happens to be. Again, imagine you have 100,000 employees in your computer system. You tell the computer, just choose every 20th employee.
That'll be called a systematic random sample, and in most cases it's going to be just as effective as a simple random sample. Sometimes you want to stratify. That's when you subdivide the population based on some kind of characteristic, and then you take a simple random sample from each stratum. A stratum is like a group. For example, I want to look at my students. I have freshmen, sophomores, juniors and seniors, so I take a certain number of freshmen, a certain number of sophomores, and so on. Each of those groups would be called a stratum. There are statistical reasons for doing that, again beyond the scope of this course. A third kind of probability sample is called a cluster sample. You might do this by, for example, zip codes. A zip code is a kind of cluster: a certain number of students from this zip code, a certain number from that zip code. Again, there are advanced statistical reasons for doing this. That's called a cluster sample. You may learn this in another course; it's beyond the scope of this course. What you have to know is that these are examples of probability samples. Remember, in a probability sample you know the probability that each element will be selected. Sometimes you purposely weight it, so the probabilities are not necessarily equal; there are statistical reasons for doing that too. But these are, again, probability samples. They're not based on judgment or convenience. Now we finally get a chance to look at the data itself. We want to classify the data. We want to know what kinds of data we have to work with. And one reason we want to do this classification is that the type of data determines what kind of analysis we can do with it. For now, let's break things up into qualitative and quantitative data. Qualitative data is categorical. It's also called nominal. And it's data that results in categories. For example, the example listed there, what's your student status?
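Before we look at data types, the three probability-sampling schemes just described (simple random, systematic, and stratified) can be sketched in a few lines of Python. The employee roster and the stratum sizes here are invented purely for illustration:

```python
import random

random.seed(7)
employees = list(range(100_000))  # hypothetical roster, N = 100,000
n = 1_000                         # desired sample size

# Simple random sample: every employee has the same 1,000/100,000 = 1% chance.
srs = random.sample(employees, n)

# Systematic random sample: take every k-th employee, k = N / n.
k = len(employees) // n           # k = 100
start = random.randrange(k)       # random starting point within the first interval
systematic = employees[start::k]

# Stratified sample: subdivide into strata, then simple random sample within each.
strata = {
    "freshman":  employees[:40_000],
    "sophomore": employees[40_000:70_000],
    "junior":    employees[70_000:90_000],
    "senior":    employees[90_000:],
}
stratified = []
for name, members in strata.items():
    # proportional allocation: 1% of each stratum
    stratified.extend(random.sample(members, len(members) // 100))

print(len(srs), len(systematic), len(stratified))
```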
Are you a grad student or an undergrad? You can't get a mean for that. You can't do numerical analysis because it's not numerical data. One thing you could do is count. You could say, here's the number of grad students in the college, here's the number of undergrad students in the college. And maybe you could even get a proportion. On the other hand, quantitative data you can do a lot more with. And with quantitative data we're going to be talking about whether the data is discrete or continuous, and we will be talking about that a lot in the course. Discrete data arises from a counting process. How many courses have you taken at the college? That's a number. It's an integer. You're not going to split that up and have a decimal component there. There are no fractions. If I ask you, how many siblings do you have? You're not going to say 1.273. You're going to say 1, 2, 3, whatever. If I ask you, how many rooms are in your house? That's a counting process. Continuous data, on the other hand, arises from a measuring process. Instead of asking how many, we're asking how much. How much do you weigh? Now, you may think you weigh 127 pounds. But the reality is that there's no such thing as weighing exactly 127 pounds. 127 could mean 127 with 8 million zeros after it, if you really are exactly 127. It could mean 127.4532 pounds. Basically, the data is continuous, and we're going to have a continuous value even if the decimal portion is zero. You can't do that with children. And I'll get back to the example on the slide now: you can't say, well, I have two children but it's really 2.3217638. No, of course not. So with the question about children, that's discrete data. The question about weight, that's continuous data. We're going to spend a little time now talking about levels of data. Sometimes it's called levels of measurement; same thing. And we're looking at the data in terms of whether it is nominal, ordinal, interval, or ratio.
And you can see how the arrow points up. Nominal is the lowest level of data, ordinal is the next, interval is the next, and ratio is the highest form of data. We're going to look at each one of these in more detail in the next slide. Nominal data is qualitative data. You get classifications or categories. You don't really get something that you can measure, even though we call it a measurement; all we really get is frequencies. We can ask, are you a CIS student or a psychology student? We can ask, what's your ethnicity? Are you white or non-white? We can ask, what's your occupation? But you don't get any numeric data out of this unless you code it, and then the coding is meaningless. You can code undergraduate 0 and graduate 1, and those are still not really numbers. All you have is categories, and all you can do is count. So what can you get out of it? Let's see what we have here. We can get a frequency: how many in each category. We can get a percentage for each category, the percentage of the total. We can get a mode, which is the most frequent category. But what we can't do is anything higher than that. We can't get averages. We can't get anything that requires numerical analysis. We can't even get a sum, and if you can't get a sum, you can't get averages. We can't get a standard deviation. We don't even have a median. We don't have a middle point, because we can't order these. If the categories had some sort of ordering, it would be ordinal data, which we'll see in a moment. Suppose we have in a class, let's say, 30 females and 20 males, and suppose for the moment that those are the only two categories. These are frequencies. What's the percentage? We have 60% female. What's the mode? The mode is female. Even if you code the data, the codes are meaningless. You can't force an average. You can't force numeric responses in nominal data. Let's talk about ordinal data. Ordinal data arises from a ranking.
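The nominal-data summaries just listed, frequency, percentage, and mode, are easy to sketch. Using the class from the example (60% female implies 30 females and 20 males in a class of 50):

```python
from collections import Counter

# Nominal data: a hypothetical class of 50, two categories only.
students = ["female"] * 30 + ["male"] * 20

counts = Counter(students)
total = len(students)

# Frequency and percentage for each category.
for category, freq in counts.items():
    print(f"{category}: frequency {freq}, percentage {freq / total:.0%}")

# Mode: the most frequent category.
mode = counts.most_common(1)[0][0]
print("mode:", mode)
# And that's all nominal data supports: no sum, no mean, no median.
```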
And the thing to remember about ordinal data is that the intervals between the points are not equal. We do not have equal intervals. But what we can say is that if you check the top box, you have more of some kind of characteristic than somebody who checks the next box. You see that, for example, with categories of hurricanes. A Category 5 is a worse hurricane than a Category 4. A Category 4 is worse than a Category 3, etc. Or take class standing. A senior has more credits than a junior. A junior has more credits than a sophomore. A sophomore has more credits than a freshman. Those of you who took geology know this from the hardness of minerals. There's a scale there. The hardest mineral is diamond and talc is the softest. Military rank is the same idea: a general is more than a colonel, which is more than a major. What do we mean by unequal intervals? Look at the example. We have three people who checked these boxes for income. John Smith is earning under 20,000. Jane Doe checked the box for 20,000 to 49,999. Bill Gates checked the third box; he's making more than 50,000. Again, you know that the intervals are not equal. Jane Doe might be making $3,000 more than John Smith. Bill Gates makes $50,000,000,000 more than Jane Doe. The intervals are clearly not equal. As for the appropriate statistics when something is measured on an ordinal scale, when you have ordinal data, you can definitely do the same stuff as with nominal data. You can also use the median, but technically you can't use a mean. You shouldn't be using an average, a mean, when you have unequal interval sizes. More on ordinal data. Ranking scales are obviously ordinal. Take something you've seen so many times: strongly agree, agree, neither agree nor disagree, disagree, strongly disagree. You know the intervals are not equal, but you know that strongly agree is more than agree. Another problem with ordinal scales: look at this example here. We ask you to rank four scenarios.
Being hit in the face with a dead rat, being buried up to your neck in cow manure, failing the course, and having nothing to eat except chopped liver for an entire month. Okay, now we ask you to rank them. And you can rank them one, two, three, four, but that doesn't mean you like any of those choices. Presumably none of those four choices is something you want. So ranking doesn't tell you that just because I ranked something number one, it's something I like. One may simply be better than another; I'd rather fail the course than be buried up to my neck in cow manure, perhaps. Now we have a higher level of measurement: interval data, an interval level of measurement. We have equal intervals; the intervals are now equal. But you don't have to have a true zero. For example, let's talk about temperature, or IQ, or most test scores; there's not a real zero. With temperature, you know the zero is arbitrary. Okay, so let's say we're looking at Fahrenheit. The difference between 40 degrees and 50 degrees, that's 10 degrees, is the same as the difference between 70 and 80, or 90 and 100. Same with a test score: the zero doesn't mean you have a complete absence of all knowledge. If somebody gets a 30 and another person gets a 40, they're beaten by 10 points, just as a 90 is 10 points better than an 80, but the zero does not mean the complete absence of what is being measured. You can't really speak about ratios. That's why, if the temperature in New York is 40 degrees and the temperature in Buffalo is 20 degrees, it would be wrong to say that it's twice as cold in Buffalo. First of all, as you can tell, if you're working in Fahrenheit, once you change to centigrade, you get different numbers and the ratio will be different. Now why can't you do ratios? I told you why: because it's not a true zero. If you want to speak in ratios you need a real zero, a true zero, which is the absence of what's being measured.
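You can check the temperature claim, and the contrast with a true ratio scale like weight, directly:

```python
def f_to_c(f):
    """Convert degrees Fahrenheit to Celsius."""
    return (f - 32) * 5 / 9

# Interval scale (temperature): ratios are NOT meaningful.
ny_f, buffalo_f = 40.0, 20.0
print(ny_f / buffalo_f)                  # 2.0 in Fahrenheit...
print(f_to_c(ny_f) / f_to_c(buffalo_f))  # ...a completely different ratio in Celsius

# Ratio scale (weight): ratios survive a change of units.
LB_PER_KG = 2.20462
rock1_lb, rock2_lb = 400.0, 200.0
print(rock1_lb / rock2_lb)                              # 2.0 in pounds
print((rock1_lb / LB_PER_KG) / (rock2_lb / LB_PER_KG))  # still 2.0 in kilograms
```

The 400-pound rock is double the 200-pound rock in any units, because weight has a true zero; the Fahrenheit ratio falls apart in Celsius because the zero is arbitrary.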
So this is good news: if you get a zero on the stat test, it doesn't mean you have absolutely no knowledge of statistics. That's not what the zero means. Now what are the appropriate statistics for an interval scale? Well, you can do everything you can do with nominal and ordinal data. You can do the mode, you can do the median, but now you can do the mean, and that's important. That's why when something is on an interval scale, you're allowed to compute the mean. Okay, now we're going to talk about a ratio scale. That's the highest form of measurement, in a sense, because you have equal intervals and you have a true zero, a real zero, and you have that with weight, length, height, units sold. That's why you could say, if some rock weighs 400 pounds and another rock weighs 200 pounds, oh, the 400-pound rock weighs double the 200-pound rock, and it's going to be true in any units. Switch to kilograms, guess what? It still weighs twice as much. Or take a line. This line is, let's say, 5 feet long, and you compare that to a line that's 2.5 feet long. You can say, well, this line is twice as long as the other line. 5 feet is double 2.5 feet, and it's going to be the same if you switch to other units, to inches; it doesn't matter what unit you use, it'll still be twice as long, and that's because you have a true zero. Once you have a true zero, you can talk about ratios like double and triple. Again, 100 pounds is double 50 pounds. Money too: if you have $100 in your bank account and somebody else has $200, guess what? They have double the amount of money, and if you both convert to francs, guess what? The person who had $200 now has double the francs of the person who had $100. What kind of data do you want to collect? The goal of a good researcher is to use the highest level of measurement you can. Remember, the lowest level of measurement is nominal; you can't do means, all you can do is the mode. So you try to get the best measurement possible.
For example, if I wanted to know about your smoking behavior, you can see you have a choice here, A or B. A is a nominal scale: do you smoke, yes or no? I can count. I can say I did this at Brooklyn College or Baruch College and I found that out of a thousand students, 300 said yes to smoking and 700 said no. I have a count, a percentage if you want: 30% smoke. But if I do it in the B format, how many cigarettes did you smoke in the last three days? Now that becomes a ratio scale, because the zero is real. If I smoked zero cigarettes in the last three days, that's a real zero, and we know what that means. And somebody who says 50 cigarettes, that's going to be double 25 cigarettes. B gives you a ratio scale, so you want the best measurements. For any question you're going to ask, it's always better to ask it in a way that gets you a more powerful kind of measurement. And again, with a ratio scale you can get the mean, the median, the mode, the frequencies. We're going to talk about what type of data to collect again, A and B, comparing soft drinks. Now which question is better? In A we ask for a ranking. Rank the following soft drinks from one to five; one is the best. If you like Coke the best, that gets a one. If Pepsi is next best, that becomes a two. The one you like least gets the five. So basically you get an ordinal scale; you have rankings. But B gives you a more powerful measurement. Look what you're doing with B. You're asking them to rate each of those soft drinks. Each one has to be rated on a one-to-seven scale; the scale is on the side there. One is excellent, two is very good, three is good, four is fair, five is poor, six is very poor, seven is awful. Now it's very hard to say those are equal intervals; you can't really prove so easily that the intervals are equal. But it's close, darn close to an equal-interval scale, and we assume as such that it's equal interval. But even if it's not, it's certainly better than the ordinal scale.
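To make the smoking example concrete, here is a small sketch with entirely made-up responses: question A (nominal) only supports counts and percentages, while question B (ratio, with a real zero) also supports a meaningful mean and meaningful ratios.

```python
# Same topic, two measurement levels (hypothetical survey data).
from collections import Counter
from statistics import mean

# Question A: "Do you smoke?" -- nominal; counts/percentages only.
answers_a = ["yes", "no", "no", "yes", "no", "no", "no", "yes", "no", "no"]
counts = Counter(answers_a)
pct_smokers = 100 * counts["yes"] / len(answers_a)   # 30.0, like the lecture's 30%

# Question B: "How many cigarettes in the last three days?" -- ratio scale.
answers_b = [0, 0, 12, 0, 25, 0, 0, 50, 0, 0]        # zero is a real zero
avg = mean(answers_b)                                 # a meaningful mean
assert answers_b[7] / answers_b[4] == 2               # 50 really is double 25

print(pct_smokers, avg)
```

Nothing like `avg` could be computed from question A's yes/no answers, which is exactly why the B format is the more powerful measurement.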
It's closer to interval, it's almost interval, and now we can do the mean on each one. We can say the average for Coke was, let's say, 1.6, and now we know what these numbers mean: that's between excellent and very good. Or let's say it's a 3.5; that's between good and fair. It's a better measurement, and that's why researchers will do it that way rather than just ask people to rank five beverages. Let's talk about the objectives of this course. First of all, you'll learn how to summarize data: descriptive statistics. You're also going to learn how to use sample data to make inferences about population parameters. We call that statistical inference. And, basically, and this may be the most important thing, we want you to be an informed user of statistics, learning to think for yourself. You have to realize we live in what's called a post-truth world, and we've seen so many lies about everything. Fake news has become very, very popular. There's a lot of fake news, a lot of junk science. They invented a term called alternative facts. The goal of this course is that you should actually know how to make a distinction between junk science and real science, or fake news and real news. One way to do this is to ask for the statistics. Look at the data; try to look at the data and make judgments. And this course will help you. Many of your professors will be teaching you nonsense. They'll be teaching you what we call indoctrination courses; they want to indoctrinate you. This course will teach you to ask a simple question: let me see the data. Where is the data that says X causes Y? As you all know, there's no relationship between vaccines and autism, absolutely no relationship. So anyone who tells you that vaccines cause autism, where's the data? There's no data. It's all baloney. I have no time to go into it, but if you ever look into it you'll find out why it's nonsense.
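The rating-scale averaging described above can be sketched as follows; the individual ratings here are hypothetical, chosen so the mean comes out to the lecture's example value of 1.6, and the labels are the ones from the one-to-seven scale.

```python
# Mean on a 1-7 rating scale, read back against the verbal labels.
from statistics import mean

labels = {1: "excellent", 2: "very good", 3: "good", 4: "fair",
          5: "poor", 6: "very poor", 7: "awful"}

coke_ratings = [1, 2, 2, 1, 2]      # hypothetical respondents' ratings for Coke
avg = mean(coke_ratings)            # 1.6

# A non-integer mean falls between two labels on the scale.
lower, upper = int(avg), int(avg) + 1
print(f"mean {avg}: between {labels[lower]!r} and {labels[upper]!r}")
```

This is the payoff of treating the scale as (approximately) interval: a mean of 1.6 has an interpretation, "between excellent and very good," that a pile of ordinal rankings could never give you.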
Or some of your professors might even tell you something like how wonderful communism or Marxism is. Well, good news: you can look at the data. Eighteen countries have tried Marxism slash socialism slash communism. Guess what? Eighteen failures. I think you're smart enough to know that if there are 18 failures out of 18, maybe we shouldn't keep testing Marxism; maybe it's not a good idea. I don't care what my professor tells me: if you strike out 18 times in 18 at-bats, maybe it's not such a good idea. This course will teach you about looking at evidence and thinking for yourself. And again, professors of statistics make mistakes too, but look at the data. Look at the numbers, and make sure you're looking at accurate numbers. That's what this course is all about: appreciating facts, real facts, real data, looking at numbers. Don't say things that are not based on evidence; you can ask to see the evidence. There's so much data out there, you just have to know where to go, and nowadays it's easy with Google. Make sure you look at actual data. Good luck with the course. I hope you enjoy the course.