The literature on measurement and reliability sometimes differentiates between different types, or different facets, of reliability. The same distinction appears in the more applied literature that discusses different reliability coefficients. Typically, when we start discussing reliability, the starting point is the classical test theory equation. The idea in classical test theory is that the variation in the observed scores x is a function of the variation in true scores t plus some error variation; or, at the individual level, an observed score x is the sum of an individual true score and a random noise realization.

But this is not the only way of thinking about reliability. One source of confusion is that when I teach measurement, I quite commonly give students a chapter from Singleton and Straits' research design book, and they will read and memorize that reliability is about stability, consistency, or equivalence, and then write that into an exam without really understanding what it means that reliability is about stability, or what it means that reliability is about equivalence. Are equivalence and stability the same, or are there perhaps different types of reliability and different estimates of reliability, and what are these terms about? Let's take a look.

An article by Schmidt and co-authors in Psychological Methods explains the difference: the random noise can normally be divided into three main components. There is the random response error, which is pure noise; transient error, which relates to time; and specific factor error, which relates to the measurement instrument or survey items. If you have multiple raters, then you also have rater error, and you could, for example, calculate inter-rater reliability estimates, which are different from the reliability estimates based on transient and specific factor error. So what are these different errors?
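Before going through the components one by one, the classical test theory equation and this three-way error split can be summarized as follows (the notation here is mine, a standard formulation rather than something taken from the papers cited above):

```latex
% Classical test theory: observed score = true score + error
x = t + e, \qquad
\sigma^2_X = \sigma^2_T + \sigma^2_E, \qquad
\text{reliability} = \frac{\sigma^2_T}{\sigma^2_X}

% Splitting the error term into its three main components:
e = e_{\text{random}} + e_{\text{transient}} + e_{\text{specific}}
```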
So what are random error, transient error, and specific factor error? These were introduced by the work of Cronbach and come from his work on generalizability theory. Random error is simply noise that relates to the particular response: if you accidentally mark an incorrect scale option in a survey form, then that is a random error. It occurs purely by chance, and it is not stable or consistent between items. This is perhaps the closest to the e in classical test theory.

But this is not the only kind of random error in the data. It is also possible that you have transient error, which relates to the measurement occasion, that specific time. For example, if you are measuring a person's satisfaction with life, that is something that is probably fairly stable from day to day and only changes over longer periods of time. However, if you ask a person how satisfied they are with life at this point, their answer also reflects their mood, which varies from one day to another; for example, if you did not sleep well, you may be in a bad mood and indicate less satisfaction. These kinds of errors, which occur at a specific time and then go away, are what transient error is about. Transient error is therefore about stability: if there is no transient error and the construct that we measure is stable, then measures should correlate highly over time.

Then we have the specific factor error, which relates to the specific measurement procedure. If you have multiple indicators of a construct, then those indicators typically have some variance components that are not due to the construct.
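The effect of transient error on stability can be illustrated with a minimal simulation sketch. All the variance values below are illustrative assumptions, not taken from the sources discussed here; the point is only that a perfectly stable true score still yields a test-retest correlation well below 1 when transient and random error are present.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 100_000  # simulated respondents (illustrative)

# Stable true scores, e.g. trait-like life satisfaction.
t = rng.normal(0, 1.0, n)

# Transient error: specific to each occasion (e.g. mood that day),
# drawn independently for the two measurement occasions.
transient_1 = rng.normal(0, 0.5, n)
transient_2 = rng.normal(0, 0.5, n)

# Pure random response error, independent of everything else.
e1 = rng.normal(0, 0.5, n)
e2 = rng.normal(0, 0.5, n)

x1 = t + transient_1 + e1  # measurement on occasion 1
x2 = t + transient_2 + e2  # measurement on occasion 2

# Test-retest correlation estimates var(T) / var(X):
# here 1.0 / (1.0 + 0.25 + 0.25) = 2/3 in large samples.
r = np.corrcoef(x1, x2)[0, 1]
print(round(r, 2))
```

Note that the test-retest correlation treats the transient components as error (they do not repeat across occasions), which is exactly why it is described as a stability coefficient.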
For example, if you ask a person how innovative they think their company is, that probably measures the innovativeness of the company, but it also measures the general attitude of that person towards the company. That sort of variance would be specific factor error. The idea of specific factor error is that each indicator can have its own specific source of error; in factor analysis this is called uniqueness, which combines the item-specific variation and the random noise into one variance component in the model. We assume that the specific factor errors will cancel out if we have multiple different items, but if we measure the same item on different occasions, as in a test-retest design, then any specific factor error would be incorrectly attributed to reliable variance, even though it is not related to the true score.

An article by Le, Schmidt, and Putka in Organizational Research Methods presents these kinds of variance decompositions. The variance of x, the variance of the measured score, is the true score variance, plus the person-by-occasion variance, which is the transient error, plus the person-by-item-within-scale and person-by-scale variance, which is the specific factor error. The specific factor error is the variation that is stable over time and relates to the measurement instrument; the transient error is the variation that is not stable over time and does not relate to the measurement instrument but rather to the mood of the person and that kind of thing. And then we have the final component, pure random noise.

It is important to understand that when you choose a reliability coefficient or a reliability estimation approach, different estimation approaches and different coefficients can give you different numbers, because they quantify different sources of random variance.
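The decomposition described above can be written along these lines (the notation is mine, with p = person, o = occasion, i = item; the paper itself uses a more fine-grained partition of the specific factor term):

```latex
\sigma^2_X \;=\;
\underbrace{\sigma^2_{T}}_{\text{true score}}
\;+\; \underbrace{\sigma^2_{p \times o}}_{\text{transient error}}
\;+\; \underbrace{\sigma^2_{p \times i}}_{\text{specific factor error}}
\;+\; \underbrace{\sigma^2_{e}}_{\text{random response error}}
```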
For example, if you have a single measurement occasion and a single measurement item, then you of course cannot estimate reliability, because you do not know how much of the variation in that measure is due to the true score and how much is due to error. If you have multiple measurement occasions, then you can evaluate stability: you can check whether the measurement is stable over time, but you cannot know whether there is specific factor error. If there is some variation relating to that specific item that is not due to the true score but should be considered measurement error, a test-retest reliability estimate will count it as reliable variance.

Then we have multiple measurement items and a single measurement occasion, which is the most common scenario where we address reliability, and there we would say that reliability is about equivalence. Test-retest is about stability; multiple parallel tests, or tests that we assume to be parallel, are about equivalence. Coefficient alpha, composite reliability, omega, and other such indices are equivalence indices. Then we also have the coefficient of equivalence and stability, which, for example, the Schmidt and colleagues paper discusses; it is applicable to scenarios where you have multiple measures from multiple measurement occasions. Then you can parcel out the observed variation into true score variation, transient variation, item-specific variation, and random noise, so that allows you to do that kind of variance decomposition.

While we normally work with coefficients of equivalence, it is useful to understand that if your test-retest reliability and your coefficient of equivalence, for example coefficient alpha, give you different values, that does not necessarily indicate that one of those values must be incorrect. They simply quantify different aspects of reliability.
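The final point, that a stability coefficient and an equivalence coefficient can legitimately disagree, can be sketched with a small simulation. The variance components below are illustrative assumptions: alpha (computed on one occasion) counts the transient component as reliable variance, while the test-retest correlation of the sum scores counts the specific factor component as reliable variance, so the two coefficients land on different values even though neither is "wrong".

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 200_000, 4  # respondents and items (illustrative values)

t = rng.normal(0, 1.0, (n, 1))                # stable true scores
s = rng.normal(0, np.sqrt(0.3), (n, k))       # specific factor error: item-specific, stable over time
occ1 = rng.normal(0, np.sqrt(0.3), (n, 1))    # transient error, occasion 1 (shared across items)
occ2 = rng.normal(0, np.sqrt(0.3), (n, 1))    # transient error, occasion 2

x1 = t + s + occ1 + rng.normal(0, np.sqrt(0.4), (n, k))  # occasion 1 item scores
x2 = t + s + occ2 + rng.normal(0, np.sqrt(0.4), (n, k))  # occasion 2 item scores

def alpha(items):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / variance of sum)."""
    kk = items.shape[1]
    return kk / (kk - 1) * (1 - items.var(axis=0, ddof=1).sum()
                            / items.sum(axis=1).var(ddof=1))

retest = np.corrcoef(x1.sum(axis=1), x2.sum(axis=1))[0, 1]

print(f"alpha (equivalence): {alpha(x1):.2f}")   # transient variance counted as reliable
print(f"test-retest (stability): {retest:.2f}")  # specific factor variance counted as reliable
```

With these particular variance components, both coefficients overestimate the true reliability of the sum score, and they do so by different amounts, which is exactly the pattern the transcript warns about.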