Reliability and validity are of course the most important characteristics of good measurement. Reliability is fairly straightforward: it is about whether you get the same result over and over when you repeat the same measurement, so consistency can be assessed statistically from the data. The issue of validity is much more complicated. Validity refers to whether your indicators measure what they are supposed to measure. The problem is that because we cannot observe the thing being measured directly, we cannot really statistically assess whether the indicators correspond to the attribute or the trait or the construct that we want to measure. So validity and validation are complicated topics, and in this video I will introduce you to some of that complexity. One thing that makes the validity literature difficult for a person who has just started reading it is that there are so many different terms. Measurement validity is whether an indicator measures what it is supposed to measure. That is fairly straightforward to define, even if there are some complications in what exactly it means. But then there is all this terminology: face validity, content validity, convergent validity, discriminant validity, nomological validity and so on. Do you have to understand all of these? Are they facets of validity that all have to apply? Are they different definitions? Are they contradictory? One way to start making sense of this literature is to understand that there is a difference between validity and validation. Validity refers to whether the indicator measures what it is supposed to measure. Validation refers to the different ways that we can argue or assess validity. Most of these terms are about validation.
Denny Borsboom's article in Psychological Review notes that these terms originate from questions such as asking people whether they think that the measurement is valid. That way of validating led to the term face validity. Whether the measure can predict something useful led to predictive validity. So these terms are about validation more than about validity, and those are two different things: how we argue validity and how we define validity are separate questions. If you just look at the definition of validity, then things are much simpler, because you don't have to understand most of this terminology. But there are important terms that you need to understand because they are commonly used, and I will now explain three of them. These originate from psychometric texts from the 1960s; at least, Nunnally's book from the 1960s is commonly cited as a source for these terms, and that book made them popular. So are content validity, predictive validity and construct validity actually about validity or about validation? Are they competing concepts or complementary concepts? Do you have to demonstrate all of them in your study, or do you have to focus on one? Let's take a look at what these concepts actually mean; they are different things. The idea of content validity is that the indicators in your scale measure all the different aspects or dimensions of the phenomenon. A typical example is a math exam: a math exam has to cover all the content of the course. An elementary school math exam might include subtractions, multiplications, divisions and sums that you have to calculate, so four different things. If you only cover subtractions, then you lack content validity. So content validity is about the indicators summarizing the domain that the test or exam is supposed to summarize.
Content validity is mostly relevant to educational measurement, or any setting where you have to summarize people's capabilities or skills in a certain domain with a single score. Predictive validity is about prediction or forecasting: can you, based on your data, say something about the future? That is not measurement; prediction and measurement are two different things. A typical example is college entry exams. They are not designed to measure who is good at school or who is smart. They are designed to predict who is going to do well in college and who is going to graduate, because the college is not as interested in getting people who are smart or hardworking as it is interested in getting people who are going to graduate. Then we have construct validity, and this is about construct measurement. But it is a special kind of validation technique: construct validity is not the definition of measurement validity, it is a validation technique, and why that is the case becomes clear on the next slide. The idea of construct validity is that there is a nomological network: a network of constructs and their theoretical relationships. For example, the example given by Borsboom and colleagues is that we have intelligence as our focal construct, general knowledge as another construct, and criminal behavior as a third. We have a strong hypothesis that intelligence is negatively associated with criminal behavior and positively associated with general knowledge. The idea of construct validation is that we assess our measure of intelligence, let's say an IQ score, and check whether the IQ score correlates positively with a general knowledge examination score and negatively with the length of a criminal record. So on one side we have the theoretical world, the nomological network.
And on the other side we have the empirical world, our measured correlations. We check whether the measured correlations from our data match the theoretical expectations. So whatever our measure is, it is construct valid if the relationships between the measured scores correspond to the relationships that we theorize. This is a somewhat useful way of assessing validity: if your scores don't behave as expected, that is one reason to either doubt the validity of your scores or doubt the correctness of your theory. But it is also very limited. Consider a very green field of study, where you are studying something that hasn't been theorized much before. Where exactly would you get this nomological network? If you are the first person to introduce a new construct to your field, how exactly are you going to argue that the construct has an established relationship with other constructs, when there is no existing research on that construct? But this is basically the idea of construct validity: whether the empirical correlations are good representations or proxies of the theoretical relationships. One important thing that construct validity and the other two commonly used validity terms do not address is the relationship between your data and your theoretical concept. Content validity just addresses whether the data cover the content of the thing that you are studying: does your math exam cover all the things that were taught during the course? Predictive validity asks whether the scores predict something. So those two are not about theoretical concepts at all; there is no theoretical concept in their definitions. Construct validity has the term construct in its name, and it does concern the theoretical concept. But it does not address whether the data corresponds to the theoretical concept.
It only addresses whether the relationships between the variables correspond to the relationships between the theoretical concepts. That is interesting, but it doesn't really address how the theoretical concepts are related to the data. So that is beyond these terms. How, then, do we define validity? One good candidate definition is that a test is valid if the attribute being tested or measured exists, and variation in the attribute causes variation in the observed data. So we assume that the construct exists independently of measurement; that is the realist perspective on measurement. Then we claim that the variation in the observed data is due to variation in the construct. Let's say there is the construct of intelligence: some people are more intelligent than others, and there is variation in IQ scores. We say that the IQ score is a valid measure of intelligence if variation in intelligence causes variation in the score. In other words, some people perform better in IQ tests because they are more intelligent, and some perform worse because they are less intelligent. So variation in the construct causes variation in the observed data, and the observed data is a function of the construct plus some measurement error. That is an easy definition. What is difficult is to argue that your scores are actually valid. So validation is the hard part; defining validity this way is very simple. How exactly do you validate, and what do you have to write into your paper to convince your readers that your measures are valid? To understand that, let's compare this latent variable model of validity with construct validity. The construct validity perspective is more about epistemology: what can we learn from the correlations in our data? Can we use the correlations in our data to learn something about the constructs? That is a useful way of validation, but it doesn't really address whether the test is valid.
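A minimal sketch of this realist definition, with a made-up and deliberately nonlinear scoring function: validity is the claim that variation in the construct causes variation in the score, so intervening on the simulated construct should shift the observed scores.

```python
import random
import statistics

random.seed(3)

def observe(construct):
    # Hypothetical scoring function: the observed score varies *because*
    # the construct varies (plus random error). The relationship need not
    # be linear; any causal dependence qualifies under this definition.
    return 100 + 10 * construct + 2 * construct**2 + random.gauss(0, 3)

# Intervene on the construct: one group with low values, one with high.
low = [observe(random.gauss(-1, 0.2)) for _ in range(300)]
high = [observe(random.gauss(+1, 0.2)) for _ in range(300)]

diff = statistics.mean(high) - statistics.mean(low)
print(f"mean score shift from raising the construct: {diff:.1f}")
```

The point of the sketch is that validity is a causal claim about the data-generating process, not merely a claim about observed correlations.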
The latent variable theory presented on the previous slide is about ontology: does the attribute exist, and do variations in the attribute produce variations in the test score? So the focus is slightly different. In construct validity the conceptual focus is on the correlations: what do the correlations mean, and can we generalize from an observed correlation to a theoretical correlation? In the latent variable model the focus is on reference: do the indicators, the variables, actually refer to any real entity? We have to argue that. The empirical focus also differs. In construct validity we check the correlations in our data, and if those correlations match the theoretical expectations, we conclude that the test is valid. In latent variable theory we have to argue causation. Validation here is not a methodological problem but a substantive problem: we have to really argue why we think that our IQ test or innovation score varies because the construct being measured varies. Ideally we have to explain the mechanism of variation: how exactly does a person's intelligence, for example, influence how they do on an IQ test? This is of course a much more challenging task, and it places more emphasis on the theoretical part of a validation study, whereas construct validation is simply about calculating correlations and seeing whether they match theoretical expectations. Both are useful, because if your measures don't behave as expected, that is a reason to suspect that the measures may not be valid; but ultimately that is not sufficient to claim validity. You have to look at the causal process. We can also take a look at how latent variable theory differs from classical test theory, which gives us the definition of reliability. Classical test theory is a psychometric model.
It's not a measurement theory, so its scope is much narrower: it is a model that describes how people respond to surveys or to different psychological tests. Latent variable theory is a measurement theory, and it takes the realist ontology. Classical test theory doesn't really say anything about ontology, so it doesn't say whether the scores measure anything; it only gives us reliability and the true score. Latent variable theory, in contrast, is focused on validity and construct measurement. The equations for these two models can look similar. Classical test theory is explicitly defined as an equation: the observed score is a deterministic linear combination, the true score plus some random noise. In latent variable theory this is more general: we are just saying that variation in the construct causes variation in the observed scores, so there is some kind of statistical association between the construct and the measure. It is not necessarily linear, so we can model other kinds of relationships. The statistical model is taken as an approximation of the causal relationship by which the true score or construct influences the different indicators. In classical test theory we take it as an assumption that the true score influences all indicators equally. So if we eliminated all random noise in the data, then all indicators would be exactly the same, because they share the same true score. This is called the tau-equivalence assumption; tau is the Greek letter used for the true score. In latent variable theory we just say that the indicators depend on the variation of the construct, but we don't make any explicit claims about how the dependency manifests statistically. Different indicators may depend differently on the construct; some may be more sensitive to certain levels of the construct than others. This allows us to use all kinds of statistical models.
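Under these classical test theory assumptions, reliability can be estimated from data. The sketch below simulates tau-equivalent items (same true score, equal loadings, independent noise; all numbers invented) and computes Cronbach's alpha, a standard reliability coefficient for this model.

```python
import random
import statistics

random.seed(4)
n_people, n_items = 400, 4

# Classical test theory with tau-equivalence: every item is the same
# true score (tau) plus its own independent random noise.
tau = [random.gauss(0, 1) for _ in range(n_people)]
items = [[t + random.gauss(0, 1.0) for t in tau] for _ in range(n_items)]

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of sum).
item_var_sum = sum(statistics.variance(item) for item in items)
totals = [sum(items[j][i] for j in range(n_items)) for i in range(n_people)]
total_var = statistics.variance(totals)
alpha = n_items / (n_items - 1) * (1 - item_var_sum / total_var)
print(f"Cronbach's alpha = {alpha:.2f}")
```

With these invented variances the population alpha works out to 0.8; note that alpha says nothing about what tau refers to, which is exactly the limitation discussed above.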
In particular, IRT or item response theory models are based on this kind of thinking. In classical test theory, measurement error is simply random noise, independent between items. In latent variable theory we can have all kinds of sources of measurement error. But the key thing that we have to argue is that the construct is actually a cause of the indicators, or that the variance of the construct is a cause of the variance of the indicators. And that is much more challenging than simply assessing reliability. Here is one very simple way to use this approach: latent variable models for assessing reliability and validity. If we take the assumption that linear statistical associations are useful approximations of causal relationships, then we can say that the observed score is a function of the construct score (we use t here), plus some systematic measurement error, plus some random noise. So there are different causal influences on the observed score besides the construct score that we are estimating with this kind of model: random noise relates to reliability and systematic error relates to validity. The problem, of course, is that if an indicator has unique random noise and also unique systematic error, it may be difficult to know whether that indicator's measurement error is validity error or reliability error. Oftentimes you cannot really say which one it is. Then a summary of all this. We don't really have any proofs of measurement validity; validation is more a substantive argument than a statistical argument. Nevertheless, we can say that if two or more indicators are highly correlated, then they may be measuring the same thing. We just don't know what that thing is, and we have to argue based on theory that the construct actually causes a certain kind of behavior in people; that is how we argue validity.
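The point about systematic versus random error can be seen in a toy decomposition (the error magnitudes are arbitrary): from a single observed indicator, the validity error and the reliability error are lumped together in one residual.

```python
import random
import statistics

random.seed(5)
n = 1000

# Hypothetical decomposition from the slide:
# observed = construct score (t) + systematic error + random noise.
t = [random.gauss(0, 1) for _ in range(n)]
systematic = [random.gauss(0, 0.5) for _ in range(n)]  # validity error
noise = [random.gauss(0, 0.5) for _ in range(n)]       # reliability error
observed = [a + b + c for a, b, c in zip(t, systematic, noise)]

# From one indicator, both error sources are indistinguishable:
# they simply merge into the residual variance around the construct.
residual = [o - a for o, a in zip(observed, t)]
res_var = statistics.variance(residual)
print(f"residual variance: {res_var:.2f}")  # roughly 0.25 + 0.25 = 0.5
```

Separating the two parts would require additional indicators or design features, not just this one score.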
It is also possible that the indicators correlate for some other reason. And if a measure behaves as expected with respect to other measures, it may be valid; that is the construct validity way of validating things, and it is a useful technique, but you shouldn't rely on it as your only technique. Typically with latent variable theory you work with models where you specify one latent variable as a source of variation for multiple indicators; this is called the common factor model. It is a factor analysis model, and it is commonly used with this kind of validity framework. This is a very complicated topic. If you want to study more about validity, I can recommend two good books. I like the writing of Denny Borsboom. He has written a book called Measuring the Mind, which is an introductory-level book; you can read it after reading, for example, DeVellis's Scale Development, which gives you an overview. Once you have read that, you can look at a more challenging text such as Frontiers of Test Validity Theory by Keith Markus and Denny Borsboom, which summarizes a broad range of the validity literature. It is fairly condensed, so it is probably not the best first book, but it is a really great overview of test validity theory.