Empirical studies typically need to demonstrate the reliability and validity of their measures. The two most commonly used tools for doing so are factor analysis for validation and coefficient alpha for demonstrating reliability. There are other techniques that can be applied, but these are the most common ones and also the easiest to apply. Let's take a look at an empirical example. Our example comes from Baron and Tan, who measure the social skills of entrepreneurs. Here we can see them stating that they applied alpha, so there is the alpha character for reliability assessment, and they present some numbers, 0.85 and 0.71; they also report that they applied factor analysis. So what do these two techniques do, why are they used here, and what is the logic? Also, what does this table mean? The table shows the factor analysis results, and it also shows the alphas they calculated for these scales.

To understand what this table is about, we first need to understand a bit about measurement and what reliability is. Let's assume we have this bathroom scale here; it's a bit rusty, and we don't know whether it is reliable or not. To determine whether the scale is reliable, that is, whether it always gives the same result when you step on it, is very easy to check: simply step on the scale, get the reading, step off, and let the scale reset, then repeat that two more times. Now we have three readings from the same scale. If all three readings are the same or very similar to one another, we conclude that the scale is reliable: it lacks random error. That of course does not tell us whether it is a valid scale; it might show 10 kilos too much or 10 kilos too little, but that is not the question that reliability addresses. Reliability is about consistency.
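The repeated-weighing idea can be sketched in a few lines of Python. This is a hypothetical illustration, not part of the study: the readings and the tolerance are made up for the example.

```python
from statistics import pstdev

def is_reliable(readings, tolerance=0.5):
    """A measure is reliable if repeated readings barely vary,
    i.e. their spread stays within a small tolerance."""
    return pstdev(readings) <= tolerance

# Three weighings of the same person on the same scale.
consistent = [80.1, 80.2, 80.1]  # tiny random error -> reliable
erratic    = [78.0, 83.5, 80.9]  # large random error -> unreliable

print(is_reliable(consistent))  # True
print(is_reliable(erratic))     # False
```

Note that a scale showing 10 kilos too much on every weighing would still pass this check: the check detects random error, not systematic bias, which is exactly the distinction between reliability and validity made above.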
If we measure the same thing again and again, do we get the same result? So how do we do that when we measure people? If we want to measure a person's social perception, we can't simply ask the same question again and again, because the person will remember what they answered the previous time and just repeat it, so that doesn't work. People don't reset as easily as bathroom scales. In practice we often use multiple different questions, called distinct measures. So we have five different questions that are all supposed to measure the same thing, but they are sufficiently distinct that the person doesn't recognize that these are actually all asking about their social perception capabilities. They are nevertheless sufficiently similar that we can argue they all measure the same thing, and it is a fine balance how different and how similar the items can be. There is also another strategy for assessing reliability, called test-retest reliability, where we actually do ask the same question again with a time delay. This is a bit problematic because the delay needs to be days or weeks; otherwise the person will remember their past answer and simply repeat it without reconsidering the question. So in practice most studies use distinct measures: we ask, for example, three or five different questions that are all supposed to measure the same thing but are different enough that the person doesn't realize they are being asked the same thing. Factor analysis is a tool for validating these multiple-item measures. This is a table of factor analysis results, and what factor analysis tells us is which items go together, which items have something in common, and whether there are underlying dimensions in the data.
The idea of factor analysis here is that if we have five measurement scales, the items should group empirically according to the things they are supposed to measure. These five items are supposed to measure social perception, so conceptually they have in common that they all measure social perception. Then we check whether they also have something in common empirically, and factor analysis does that for us: it identifies that these five items belong to factor number two. When we look at this table, we want to see a pattern like this, where each item belongs to one factor. These numbers are called factor loadings. Ideally the loading on the main factor is more than 0.7; this 0.5 here is a bit weak, as 0.7 is the level typically considered acceptable for an item. We also want to see that the items do not load highly on the other factors. If we assume that factor three, for example, is expressiveness, so that those items correlate strongly because they all measure expressiveness, then we want to see low values on that factor for the social perception items: the social perception items should not depend on expressiveness. This is a very clean factor solution because the social perception items load only on the social perception factor and not on the others. So we want the cross-loadings to be small and the main loadings to be large. What is small? Less than 0.2 or less than 0.3, depending on the source, is typically considered small. Not all the items work ideally. For example, the item "people tell me that I'm a sensitive and understanding person" loads on factor four, factor three, and factor one, so it is not cleanly measuring only factor five, which is social adaptability, but also depends on, for example, expressiveness. We might consider dropping that kind of item, but it was retained in this study because the scale had been validated before. So this is factor analysis.
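To make factor loadings concrete, here is a small numerical sketch. This is not the authors' analysis: the data are simulated, and the two latent traits are hypothetical stand-ins for dimensions like social perception and expressiveness. It extracts unrotated principal-component loadings from the item correlation matrix, a common simple first pass before rotation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500  # simulated respondents

# Two uncorrelated latent traits drive six items: items 1-3 load on
# trait 1, items 4-6 on trait 2 (hypothetical simulated data).
latent = rng.standard_normal((n, 2))
true_loadings = np.array([[0.9, 0.0], [0.9, 0.0], [0.9, 0.0],
                          [0.0, 0.6], [0.0, 0.6], [0.0, 0.6]])
items = latent @ true_loadings.T + 0.4 * rng.standard_normal((n, 6))

# Principal-component loadings: eigenvectors of the item correlation
# matrix scaled by the square root of their eigenvalues.
corr = np.corrcoef(items, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending order
top2 = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, top2] * np.sqrt(eigvals[top2])

# Each row is an item; the sign of a whole column is arbitrary.
print(np.round(loadings, 2))
```

Each item should show one large loading (well above 0.7 in absolute value) on its own factor and a small cross-loading on the other, which is exactly the clean pattern described in the table above.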
You look for a pattern where the items that are supposed to measure the same thing load on the same factor and not on any other factors. Then we would typically label the factors: this one would be labeled social perception, this one social adaptability, and so on. So this is the tool we use for validating items empirically. How do we assess reliability? Once we have established, using factor analysis, that the items measure something in common, we calculate coefficient alpha. It is not essential to know exactly how it is calculated, but it basically gives the reliability of the average, or the sum, of the items that belong to the same scale. So if we take the integration scale here with its four items and take the sum of those four items, the alpha tells us what the reliability of that sum would be. Typically values greater than 0.7 are considered acceptable; often we get higher, sometimes lower, and sometimes a lower reliability can be okay if your question is something that has never been asked before. If you are studying something that has very well established scales, we might require 0.85 or 0.90 reliability, because the baseline is already so high. So let's summarize measurement. The important concepts of measurement are reliability, the lack of random noise in our measures, and validity, whether the variables actually measure what they are supposed to measure. Reliability is conceptually easy to demonstrate empirically: you just take repeated measures using the same instrument, and if they correlate, the measure is reliable. With people and their attitudes and perceptions this is difficult, because a person remembers what they answered in the past, so in practice we have to use multiple questions that are slightly different, and then we calculate coefficient alpha. Validity is something that we can only demonstrate less directly.
In practice, validity is an argument that we have to make on conceptual grounds. For example, if we use a CEO's name as a measure of gender, we have to argue on conceptual grounds that the name is a good measure of gender; for most people it is fairly obvious that it would be a valid measure. Empirically, when we have multiple items, we can apply factor analysis to demonstrate that the indicators that are supposed to measure the same thing also have something in common empirically, and then we assume that what they have in common is that they actually measure the same thing. In practice, applying coefficient alpha and factor analysis is what most articles that use survey data do.