Assessing the reliability of a study or measure is a two-step process. First you need to choose a statistic to quantify reliability, and then you need to interpret the specific value of that statistic calculated from your sample. In this video, I will talk about the choice of reliability coefficient, or how to choose the statistic that you apply to estimate reliability. Reliability coefficients come in different kinds. There is test-retest reliability, where the same test is administered twice and the two administrations are compared, and there are the internal consistency measures, where we administer several items that are all supposed to measure the same thing and then check whether those items correlate with one another; if they correlate strongly, we take that as evidence of reliability. Those are the basics. So let's look at an example, and to understand the example we need the concept of a true score. Take a scale with three items that are all supposed to measure how innovative a company is: one item asks quite abstractly whether the company is innovative, and the other two ask about more concrete things such as new products and services. The true score of an item is defined through a thought experiment: we ask a person the question, then brainwash the person so that they do not remember answering, and ask the same question immediately again so that there is very little time delay, and we repeat this brainwashing and asking, brainwashing and asking, many, many times, and then we take an average of those measures and that is our true score. The reason it is a true score is that, because we assume the measured score x is a function of the true score and random noise, in a large number of replications of the measurement process the different instances of random noise cancel each other out, and what is left in the mean is just the true score. The true scores for these three items are T1, T2 and T3, and one of the important things we need to ask ourselves when we choose a reliability coefficient is: are these true scores the same? If the true scores are the same, then the scale is referred to as being tau-equivalent. The tau here refers to the Greek letter tau, which is sometimes used to denote the true score, so it is like a Greek T, and equivalent of course means that two things are the same. So are these the same? There are reasons to believe that they probably would not be the same in this case. For example, if you have a highly innovative company, then a person would probably agree that the company is highly innovative. But if the innovativeness of that company is more about efficiency and processes instead of new products and services, then the person might actually disagree with the new product item. So you may have a company that is highly innovative, but nevertheless the person would disagree with the new product concept item, which would mean that the true scores of these two items are not the same. It is also possible that there is some dimensionality in the scale. Of course, with only three items we cannot estimate any multidimensional model, but it is conceptually possible that, for example, items X2 and X3 would be more correlated with one another than with X1 because they are more concrete.
The general innovativeness item is fairly abstract, more like a subjective opinion, whereas the concrete items are easier to evaluate objectively. So that is another reason why the true scores may not be the same. If the true scores are the same, then we can just use coefficient alpha, but if they are not, then we actually need to pick another reliability coefficient. So we are going to be looking at internal consistency reliability coefficients, and before we go into the details of these coefficients, it is useful to check what exactly a reliability coefficient quantifies. These coefficients are applicable to multiple-item scales: you need multiple items, you calculate the correlations between the items, and the resulting correlation matrix quantifies internal consistency, that is, how much the items correlate with one another. Each measured score consists of a true score and an error score. We then calculate the scale score: I am using capital X to signify the sum of the individual indicators x, and that sum is our scale score. Reliability coefficients quantify the variance of the true score of the scale score across individuals divided by the variance of the scale score across individuals, in other words, how much of the scale score variation is due to true score variation. That is what a reliability coefficient quantifies. They are not about individual item reliability; they are calculated at the scale level, typically for a sum calculated from that scale. There are three equivalent ways to express the coefficient. The first is the variance of the (estimated) true scores divided by the variance of the scale scores. The second is the variance of the true scores divided by the variance of the true scores plus the variance of the error scores, which is just another way of expressing the total variance. The third is to calculate unreliability, the error variance divided by the total variance, and subtract that from one, which again gives you the reliability coefficient. All these coefficients work the same way: there is something divided by something, and we use whichever of the three forms is more convenient for a particular scenario. The coefficients differ in how the true score variation, the error variation, or the scale score variation is calculated. One important assumption that all reliability coefficients make is that the error scores are random noise, so the error variation is random and independent of the true scores. Different coefficients then make different assumptions about the true scores. The tau-equivalence assumption made by coefficient alpha means that the true score is the same for each item, and other coefficients relax this assumption in different ways and therefore produce different kinds of reliability estimates. So coefficients differ in how the variation of the true score and the variation of the scale score are calculated, and that is basically it. The two most important assumptions to consider, and the two key assumptions of coefficient alpha, which is the simplest of the coefficients that I will address, are tau equivalence and unidimensionality. If you have a tau-equivalent model, then a factor analysis that has all the items loading on a single factor produces a solution where the factor loadings are roughly the same.
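As a compact reference, the quantities just described can be written out in standard classical test theory notation; the symbols here (x_i for the items, T_i for the item true scores, X for the sum score) are a generic sketch rather than the exact notation used in the video:

```latex
% Measured item score: true score plus random error
x_i = T_i + e_i, \qquad i = 1, \dots, k
% Scale score: the sum of the items, with T_X = \sum_i T_i and E_X = \sum_i e_i
X = \sum_{i=1}^{k} x_i
% Reliability of the scale score: the share of its variance due to true scores
\rho = \frac{\operatorname{Var}(T_X)}{\operatorname{Var}(X)}
     = \frac{\operatorname{Var}(T_X)}{\operatorname{Var}(T_X) + \operatorname{Var}(E_X)}
     = 1 - \frac{\operatorname{Var}(E_X)}{\operatorname{Var}(X)}
% Tau equivalence: T_1 = T_2 = \dots = T_k, which in a single-factor model
% corresponds to all items having equal factor loadings.
```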
And if the factor loadings are different, for example ranging from 0.5 to 0.8, then we do not have tau equivalence: the items are not equally reliable, or the items have different uniquenesses, however you want to interpret it. So that is the first assumption, tau equivalence. The other important assumption is unidimensionality. How do you assess unidimensionality? If you do an exploratory factor analysis, then a unidimensional scale basically means that one factor is sufficient to explain the correlations among the items in that scale. If you do a confirmatory factor analysis, which is a bit more rigorous, then a single-factor model should fit the scale well; if it does, there is evidence for unidimensionality. Tau equivalence and unidimensionality are the two things that you need to consider when picking a reliability coefficient. There is a great article by Cho on choosing a reliability coefficient, "Making Reliability Reliable", that summarizes all these various coefficients. He categorizes the coefficients by dimensionality, whether a unidimensional or a multidimensional model describes the data, and by whether the items are tau-equivalent or congeneric. Congeneric basically means that the items depend on the same true score, but they do so to different extents. So if you have a unidimensional, congeneric model, then you have a single-factor model that fits well, but the items load on that factor to different degrees. One of the problems in this literature is that the coefficients are named rather inconsistently. For example, there is an omega coefficient, and then there is another coefficient also referred to as omega, one applying to the unidimensional case and the other to the multidimensional case. Also, names such as KR-20, Hoyt's method, or coefficient alpha do not describe what the coefficients actually do, which is confusing. The same goes for composite reliability: all of these coefficients are composite reliabilities in the sense that they quantify the reliability of a sum of indicators, so why should one index be called composite reliability but not the others? Cho's article advocates a more systematic naming scheme for these coefficients, and it would be a very good thing if it were adopted. Unfortunately, we are so used to calling the tau-equivalent reliability KR-20 or coefficient alpha that the more systematic approach he introduces in the article is unlikely to take hold. Split-half reliability coefficients are not common anymore, so I am not going to address them. The idea of a split-half coefficient is that you take a scale of, for example, four items, calculate a scale score from two of the items and another scale score from the other two, and then correlate the two scale scores, and that quantifies reliability. These used to be common in the early days when calculating other coefficients was computationally cumbersome, but nowadays computers calculate any of these coefficients in a fraction of a second, so computation is not an issue. Then there is parallel reliability. I am not going to talk about parallel reliability either, because it is typically too constrained: parallel reliability requires that the items have the same true scores and also the same error variances.
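For reference, the measurement models mentioned here can be summarized in single-factor notation; this summary and its symbols (loadings lambda_i, error variances theta_i, intercepts nu_i) are my own shorthand of the standard definitions, not a quote from the lecture:

```latex
% Single-factor measurement model for item i:
x_i = \nu_i + \lambda_i T + e_i, \qquad \operatorname{Var}(e_i) = \theta_i
% Parallel:                      \lambda_i = \lambda \ \text{and} \ \theta_i = \theta  \quad (equal loadings and equal error variances)
% (Essentially) tau-equivalent:  \lambda_i = \lambda, \ \theta_i \ \text{free}          \quad (equal loadings; intercepts \nu_i may differ)
% Congeneric:                    \lambda_i, \ \theta_i, \ \nu_i \ \text{all free}       \quad (items measure the same T to different degrees)
```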
In the tau-equivalent case we typically focus on the essentially tau-equivalent condition, which means that the true scores are the same but the items can differ in their means. The means do not really make a difference for these coefficients, because they are not included in the correlation matrix. So I am just going to focus on four classes: unidimensional tau-equivalent, unidimensional congeneric, multidimensional tau-equivalent, and multidimensional congeneric. How do you choose a reliability coefficient, and what do the different reliability coefficients actually quantify? There is a nice article by Dan McNeish in Psychological Methods that advocates that people should abandon alpha and use one of the other coefficients instead, which he argues are superior. Some of them are superior in some ways and inferior in others, so it is important to understand what they quantify and under which assumptions. Revelle's omega total and omega hierarchical are applicable to bifactor models, so when you have multidimensional data, you use one of those coefficients. The greatest lower bound, GLB, is a special kind of coefficient that does not really make any assumptions about the dimensionality of the true scores or the scale, but it is a kind of worst-case estimate of reliability: it gives you an estimate that almost certainly underestimates the reliability. For example, if you want to apply an errors-in-variables regression analysis, then using the GLB coefficient would be a very bad idea, but sometimes calculating this kind of worst-case estimate is useful. If a worst-case estimate gives you 80% reliability, then you know that your reliability is probably pretty good. The two important things to consider when choosing a coefficient are tau equivalence and unidimensionality. There is another nice article, by Cho and Kim, which presents a workflow for choosing which reliability coefficient to apply. The first step is: is the scale unidimensional? I will first focus on the scenario where the answer is yes. If there is unidimensionality, then we need to check whether there is tau equivalence: are the factor loadings of our well-fitting single-factor model the same for all items? If yes, we have tau equivalence; if no, we do not. If yes, then we use coefficient alpha, or tau-equivalent reliability, as Cho's other paper advocates calling it. If no, then we use omega total, also known as composite reliability or congeneric reliability; these are just different names for the same coefficient. So this part is fairly simple: either you use alpha or you use congeneric reliability, depending on the factor loadings. Things get a bit more complicated when we go to multiple-factor models, but first let's take a look at what alpha quantifies and what congeneric reliability quantifies. The equation for alpha, the typical way of presenting it, is written out below, and Cho's article gives another, more intuitive way of understanding what the formula means. The formula is again something divided by something: the variance of the true scores divided by the variance of the scale score. We always have the true score variation divided by the scale score variation, or an estimate of it, and that is our reliability statistic; that is the same idea in all reliability coefficients.
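Here is the conventional formula for coefficient alpha together with the more intuitive rearrangement that Cho describes, written in standard textbook notation (k items, item variances sigma_i^2, inter-item covariances sigma_ij, scale score variance sigma_X^2); this is added here as a reference rather than copied from the slides:

```latex
% Conventional form of coefficient alpha
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_{i}^{2}}{\sigma_{X}^{2}}\right)
% Equivalent form: k^2 times the mean inter-item covariance over the scale score variance
\alpha = \frac{k^{2}\,\bar{\sigma}_{ij}}{\sigma_{X}^{2}},
\qquad \bar{\sigma}_{ij} = \frac{1}{k(k-1)} \sum_{i \neq j} \sigma_{ij}
```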
So what is the numerator and what is the denominator? The denominator is the scale score sample variance, which is just the variance of capital X calculated from the sample. Cho advocates always using the scale score sample variance as the total variance, just to be consistent: every coefficient then uses the same denominator, which makes the coefficients easier to compare because only the numerators differ. The numerator is k squared times the mean inter-item covariance. What is the idea behind the mean item covariance? If the items are tau-equivalent, then the covariance between items quantifies the true score variance, because the true score is the only source of covariance, the only source of shared variance. And if the items are tau-equivalent, the covariances between all item pairs should be the same in the population, so our best estimate of the amount of shared variance between items is simply the mean of the covariances. That is the logic of coefficient alpha: it takes the mean covariance and multiplies it by k squared, k being the number of indicators, because that is how you calculate the variance of a sum. There is one important property of coefficient alpha that leads to the other coefficients. If the tau-equivalence assumption does not hold, that is, if the true scores are not perfectly correlated so that each item has some level of uniqueness, or there are some minor factors beyond the true score you are interested in, or the items are congeneric so that they depend on the same true score but to different extents, then alpha has been proven to underestimate reliability. Because alpha underestimates reliability when the tau-equivalence assumption does not hold, it is sometimes referred to as a lower bound estimate. So if you have unidimensional data and a coefficient alpha of 0.8, you know that your reliability is going to be at least 0.8; if tau equivalence does not hold, your actual reliability is higher than what alpha indicates. That is why alpha is referred to as a lower bound, but it is not the best possible lower bound, and I will come back to that toward the end of the video. Then we have the case of unidimensional but not tau-equivalent items. In this scenario a single-factor model fits well, but the factor loadings are not the same. McNeish's article recommends omega total and coefficient H in this scenario. Omega total is sometimes known as composite reliability or congeneric reliability, and coefficient H is referred to as maximal reliability. I will look at omega total first, and then we will see how the maximal reliability coefficient differs from congeneric reliability. The idea of congeneric reliability is that you estimate the factor loadings; the factor loadings quantify the reliability of the individual indicators if the single-factor model holds. We then take the square of the sum of the factor loadings, which quantifies the model-implied true score variance of the scale score, and we divide that by the model-implied scale score variance, which is the sum of that true score variance and the error variances. Another way of calculating the same coefficient is to use the scale score sample variance as the denominator: instead of using the model-implied variance, which assumes the model is correctly specified, we can use the actual sample variance.
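Written out, the two versions of congeneric reliability just described look roughly like this, with lambda_i for the single-factor loadings and theta_i for the error variances; standard notation, added as a sketch for reference:

```latex
% Congeneric / composite reliability (omega) with the model-implied total variance
\omega = \frac{\left(\sum_{i=1}^{k}\lambda_{i}\right)^{2}}
              {\left(\sum_{i=1}^{k}\lambda_{i}\right)^{2} + \sum_{i=1}^{k}\theta_{i}}
% The same numerator with the scale score sample variance as the denominator
\omega = \frac{\left(\sum_{i=1}^{k}\lambda_{i}\right)^{2}}{\sigma_{X}^{2}}
```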
If the model is correctly specified, the scale score sample variance and the model-implied total variance should be about the same; if the model is misspecified, these two quantities will differ. Typically we calculate these estimates based on a confirmatory factor analysis: we run the confirmatory factor analysis, take the lambdas and the error term variances, and then calculate using this formula. There are spreadsheet calculators available online for doing that, and then you present the result. So that is composite reliability, or congeneric reliability. What does coefficient H quantify, then? It is like omega total but for optimally weighted items. Omega total, and every other coefficient in this video, assumes that the scale score is calculated as an unweighted sum of the items. Coefficient H instead uses a weighted sum or weighted mean, and the weights are calculated so as to maximize the reliability in the population. But this coefficient is a bit problematic. The problem is that your estimates of the factor loadings are just estimates, not known, correct population values. If you use an estimated reliability to weight an item, then items whose reliability happens to be overestimated by chance get more weight than items whose reliability is underestimated: you are giving more weight to positive estimation error in individual item reliability than to negative estimation error. This weighting process produces positively biased reliability estimates, so coefficient H, or maximal reliability, is positively biased, and it is not really clear what the advantage of quantifying the reliability of a weighted sum instead of a plain sum would be. So I cannot recommend this coefficient; you can read more about this argument in our paper in Psychological Methods. Okay, so those are the unidimensional coefficients: coefficient alpha and omega total. Then we have the coefficients based on bifactor models, Revelle's omega total and omega hierarchical, and the stratified alpha, which Cho recommends but which is not included in McNeish's paper. These are for the multidimensional scenario. Let's take a look at what these coefficients do. Cho's 2015 paper with Kim presents a workflow: you first assess dimensionality, and in this case we conclude that there is multidimensionality, so a single-factor model does not fit well and we need a bifactor model for the data. Then we choose whether we actually calculate the reliability based on the bifactor model or whether we estimate the reliability without it. If we go without the bifactor model, we have stratified alpha, which is sometimes called multidimensional tau-equivalent reliability; that is the term Cho (2016) recommends. If we choose to go with a multiple-factor model, so we actually estimate the bifactor model, then that gives us various omega coefficients, or various factor reliabilities in Cho's terminology: different omegas that assume a bifactor model, which could also be called factor reliability, a more descriptive term. Let's take a look at the bifactor scenario first. There is no unidimensionality and no tau equivalence, so this is the most general scenario. Here, in McNeish's paper, we have Revelle's omega total and omega hierarchical, and these coefficients both rely on the same factor model.
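For reference, under a bifactor model the standard forms of these two coefficients are roughly as follows, where lambda_g,i are the loadings on the general factor, lambda_s,i the loadings on group (minor) factor s, and sigma_X^2 the scale score variance of the standardized items; this is a generic sketch, not the exact notation used in the video:

```latex
% Omega hierarchical: only the general factor counts as true score variance
\omega_{h} = \frac{\left(\sum_{i}\lambda_{g,i}\right)^{2}}{\sigma_{X}^{2}}
% Revelle's omega total: the general factor and the group (minor) factors both count
\omega_{t} = \frac{\left(\sum_{i}\lambda_{g,i}\right)^{2}
                   + \sum_{s}\left(\sum_{i \in s}\lambda_{s,i}\right)^{2}}{\sigma_{X}^{2}}
```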
The idea here is that if you calculate these, for example using the psych package in R, the calculation procedure estimates a bifactor model internally, takes the general factor and the minor factors, and produces your reliability estimate. How exactly the bifactor model is estimated is not relevant here, but it is useful to understand that this takes care of some of the dimensionality issues in the scale. If you want more control over the estimation, you can always estimate a bifactor model yourself and then calculate the reliability coefficient using the formulas that I will show a few slides from now. So how do Revelle's omega total and omega hierarchical differ? The similarity is that both use the scale score sample variance of the standardized items: we take the standardized items, sum them, and take the variance of those sums, and that is our denominator. But the numerators are different. Both numerators contain the general factor variance, the model-implied variance due to the general factor in the standardized metric, and in Revelle's omega total we additionally have the model-implied variance due to the minor factors. The difference is therefore whether you are interested in the amount of purely random error variation in the data or in the degree to which the main factor, the general factor, explains the indicators. Omega hierarchical takes the view that you are only interested in the general factor; the minor factors are treated as item-specific error, or specific-factor error if you want to use that term. That is nuisance variation, counted as unreliability, and only the model-implied variance due to the general factor counts as true score. In Revelle's omega total, we are interested in all variation that is systematic across the items: we want to assess how much of the scale score is random noise and how much is due to the factors, all of the factors. So it depends on what quantity you are interested in. I would assume that most researchers consider Revelle's omega total to be more in line with the classical test theory assumptions, because classical test theory defines random noise as unreliability. But then again, omega hierarchical is probably more practical, because you are typically interested in understanding how much the one unidimensional latent variable explains the indicators, so perhaps omega hierarchical is the more useful coefficient. Generally in management research I have not seen either of these coefficients in use much, but they are certainly useful. Now let's take a look at the condition that is not unidimensional but is tau-equivalent. This would be applicable if for some reason you cannot factor analyze your data; Cho explains that if a factor analysis does not converge, you can go with this kind of reliability coefficient. That is kind of a bad recommendation, because if a factor analysis does not converge, there is typically a reason for the non-convergence, and when you have a non-converged or otherwise inadmissible solution, you should understand the underlying reason, because it may indicate another problem in your data or your model, and when you fix that problem, it can fix the reliability coefficient as well. But to be systematic, I am going to cover this coefficient as well.
The idea of stratified alpha is that you split the scale into its dimensions. If we have a six-item scale, and we know that the first three items share one minor factor and items four to six share another minor factor, then we calculate coefficient alpha separately for the first half and the second half of the scale. Those subscales are unidimensional, because each is affected by only one minor factor plus the general factor; there are no two different factors that affect different sets of items within a subscale. Then you use those coefficient alphas to calculate how much of each subscale score is error variance, you sum those error variances, divide the sum by the total variance of the full scale, and subtract from one, and that gives you a reliability estimate. So you basically split the scale, calculate the unreliability of each part, sum the unreliabilities, divide by the total variance, and subtract from one; that gives you the stratified alpha (the formula is written out further below). This coefficient is not very commonly used, and the problem here is: how would you know that the items can be divided into subgroups that are each tau-equivalent if you cannot run a factor analysis? If the reason for using this coefficient is that factor analysis cannot be applied, but you need a factor analysis to show that its assumptions hold, then what is the point of the coefficient? In any case, we have the scale score sample variance and the error variance of each subgroup; we take the sum of the error variances and divide, and that is the coefficient. Cho gives a nice table of these different coefficients where he expresses them in a more systematic form, and all of them have the same structure: something divided by the scale score total variance. How they differ is in how you estimate the true score variance of the scale score. Basically, he recommends that for the congeneric and multidimensional cases you apply different kinds of factor analysis models: a bifactor model, a correlated factors model, or a second-order factor model, depending on which model makes sense for your data. Then you calculate the implied variance of those factors in the items, take a sum of the implied variances, apply some covariance algebra, and you get equations of this kind; you pick the one that matches your scenario or your model, and that gives you the reliability estimate. So how do you choose your reliability coefficient? Basically this is the procedure: first assess dimensionality with a factor analysis. If you have unidimensionality, then check whether tau equivalence holds; if yes, you use tau-equivalent reliability, or coefficient alpha, and if no, you use congeneric reliability, also called composite reliability or omega, which are the same thing. If you have multidimensional data, then you should fit a bifactor model, a correlated factors model, or something similar to model the dimensionality, and then calculate your reliability estimate based on those dimensions. Cho also notes that if a factor analysis cannot be estimated, you can use the stratified alpha, but that is a bit of a bad recommendation, because you should always try to understand the reason for the estimation difficulty and fix it rather than work around the problem.
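The stratified alpha described above can be written compactly as follows, for a scale split into m subscales (strata), with alpha_s the coefficient alpha and sigma_Xs^2 the score variance of subscale s; this is the standard form written in my own notation:

```latex
% Stratified alpha: one minus the summed subscale error variances over the total variance
\alpha_{\mathrm{strat}} = 1 - \frac{\sum_{s=1}^{m} \left(1 - \alpha_{s}\right)\,\sigma_{X_{s}}^{2}}{\sigma_{X}^{2}}
```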
Of course, sometimes fixing the underlying problem is not possible, and in that case the stratified alpha could be useful. Okay, so we have now covered these reliability coefficients except the GLB. The GLB is a bit different: the idea of the GLB is that it is a worst-case estimate. Alpha is in a way a worst-case estimate too, but the greatest lower bound has been proven to be a better one. Reliability is always greater than or equal to alpha, which means that alpha is a lower bound, and the greatest lower bound has been proven to always be at least as high as alpha, so in large samples it is a better worst-case estimate than alpha. What does the GLB equation quantify? In the denominator we have the scale score sample variance, as always, and in the numerator we have the indicator error variances. The way the greatest lower bound is calculated is that we find the largest total amount of error variance that the indicators can have, take the sum of those error variances, and divide by the scale score variance, which gives us the share of error variance in the scale score; we subtract that from one, and that gives us the reliability coefficient. How exactly these error variances are found is beyond this video, but the idea is basically this: because the errors are assumed to be uncorrelated, if your items are very highly correlated, then we know that the error variances must be small, and if the items are weakly correlated, the error variances can be large. We numerically search for the largest possible set of error variances that is still compatible with the observed correlation matrix, and that provides the greatest lower bound estimate. In practice this is always at least as large as alpha, but it is difficult to calculate and it is positively biased in small samples. So if your sample size is, say, in the thousands or close to a thousand, then this may be a better coefficient than alpha, but if you are in the 200-400 range, it is probably too positively biased to be useful. I have not seen it actually used in management research pretty much ever, but for completeness it is useful to know that it exists. To conclude, you generally choose among these coefficients based on unidimensionality and tau equivalence. You run a factor analysis model; if the single-factor model fits well and all factor loadings are equal, then you go for tau-equivalent reliability, also known as Cronbach's alpha or coefficient alpha. If your factor model fits well but the loadings are not the same, then you choose congeneric reliability, or composite reliability. If you have multidimensional data, then it is a good idea to estimate a multidimensional factor analysis model and pick one of these formulas based on which kind of factor model you fit to the data, or develop your own formula; it is not that complicated if you know covariance algebra. One final thing about reliability coefficients is that you do not actually always need one, so you should consider whether reporting one is necessary and avoid reporting unnecessary statistics. For example, in a structural equation modeling study with structural regression models, you do not actually form scale scores at all, so one could ask what the relevance of reporting the reliability of a scale score is if you never calculate the scale score in your study.
In that case, reporting the raw factor loadings would be more useful, because they give more direct information on the reliability of the items themselves than any of these reliability coefficients do.
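To tie the unidimensional part of this together, here is a minimal sketch (my own illustration, not from the video) of how coefficient alpha and a congeneric omega could be computed in Python. Scikit-learn's FactorAnalysis is used as a convenient stand-in for a proper confirmatory factor analysis, and the simulated three-item data and loadings are made-up illustration values:

```python
# Minimal sketch: coefficient alpha and a congeneric omega for a multi-item scale.
# Assumes `items` is an (n_respondents x k_items) NumPy array of item scores.
import numpy as np
from sklearn.decomposition import FactorAnalysis

def coefficient_alpha(items: np.ndarray) -> float:
    """Tau-equivalent reliability (coefficient alpha) of the sum score."""
    k = items.shape[1]
    cov = np.cov(items, rowvar=False)      # k x k sample covariance matrix
    # trace = sum of item variances, cov.sum() = variance of the sum score
    return k / (k - 1) * (1 - np.trace(cov) / cov.sum())

def omega_total(items: np.ndarray) -> float:
    """Congeneric (composite) reliability from a single-factor model."""
    fa = FactorAnalysis(n_components=1).fit(items)
    loadings = fa.components_.ravel()      # lambda_i, one loading per item
    uniquenesses = fa.noise_variance_      # theta_i, one error variance per item
    true_var = loadings.sum() ** 2         # (sum of loadings) squared
    return true_var / (true_var + uniquenesses.sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Simulate a congeneric three-item scale: one true score, unequal loadings.
    t = rng.normal(size=1000)
    items = np.column_stack([lam * t + rng.normal(size=1000) for lam in (0.5, 0.7, 0.9)])
    print(f"alpha = {coefficient_alpha(items):.3f}")
    print(f"omega = {omega_total(items):.3f}")
```

With congeneric items like these, omega should come out somewhat higher than alpha, which illustrates the lower-bound behaviour of alpha discussed earlier in the video.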