This video explains the phenomenon of statistical and methodological myths and urban legends. What are these statistical myths, how do they emerge, and what is the outcome for research practice?

Let's start by asking how you should, or how you do, choose which analysis techniques or methods to apply. One reasonable-sounding strategy is to look at the journals where you want to publish and see what other people are using in those journals, but it turns out that this is a source of methodological myths and urban legends: beliefs that are widely held but are not correct, and that can lead to suboptimal decisions and even incorrect results.

So how do these myths emerge? Typically, the way we get new methods into applied disciplines is that somebody introduces an idea in a research methods journal such as Psychometrika or Econometrica. Then somebody in the applied discipline reads the article in the research methods journal, uses the technique, perhaps slightly misunderstands it, and then cites the methods journal as the source of the technique. What happens when the next person who publishes in that journal wants to apply the same technique? Do they go to the methods journal and try to understand the equations, proofs, and simulation results? Or do they just look at the justification that the empirical paper gave for the technique and its explanation of how and why it works? They go to the empirical paper instead of the methods paper. Chances are that there is another careless citation to the idea, and the idea becomes more misunderstood.

This is similar to the broken telephone game that many people have played as kids. If you have ten kids in a row, the first tells a message to the second, who repeats it to the third, who repeats it to the fourth, and by the time the message reaches the tenth kid it is something completely different from the original. These long chains of citations from one empirical paper to another, instead of going back to the original source, cause confusion and misunderstanding.

What is more problematic is that once we have these two articles that cite the misunderstood idea, people in the discipline think that they have their own body of knowledge about the technique, and most of the other people who want to publish in this journal cite these two papers as evidence of how the technique is supposed to work. Through more careless citations, the increasingly misunderstood idea becomes institutionalized in research practice, so that no one even questions it. Once we have ten papers that apply a technique incorrectly or repeat a claim that is not true, everyone thinks that the claim is true because it has been repeated so many times. The misunderstood idea is then institutionalized in the discipline through the review process and doctoral student teaching. When you take an introductory research methods class, quite often the class will tell you that these are the techniques we apply in our field and then show you how to apply them using statistical software, instead of explaining what the methods literature says about the technique. Past applications of the technique, rather than its proven properties, end up driving its future use in the discipline.
Then, if you have a person who wants to apply the technique correctly, that person runs into problems because of the review process. You have a person who applies the idea correctly and cites the methods literature, but when they submit to this journal the reviewers say no, this is how the technique is applied here, citing five of these earlier articles. So the discipline actually starts to enforce the incorrect application of the technique, and this is very difficult to break.

There are a number of articles and books about this topic. One of the leading authors is Bob Vandenberg, who has edited special issues in Organizational Research Methods as well as edited books. One good idea, if you want to understand the techniques that you apply well, is to search Google Scholar for the term "methodological myth" together with the name of your technique, because this is a widely used term for these misunderstandings. So not only do you need to understand how your techniques are applied and how and why they work, it is also useful to understand what the common misunderstandings or misconceptions about the technique are, and the articles about methodological myths and urban legends are useful in that regard. They are typically written in a way that is fairly readable for applied researchers, instead of being like the original hardcore research methods articles that explain algorithms and provide proofs and equations. They are fairly easy to read, and they are also fairly fun to read, at least for me, so I recommend them.

Let's take a look at a couple of examples of methodological myths, and then we will discuss how you can avoid spreading these myths in your own work and when you review work by others. The first example is an article that I reviewed recently. The article was not accepted as it was; instead we invited a revision and asked the authors to completely redo the analysis. So what was going on? The authors had an endogeneity problem, and this was a nice article in that it actually noted the endogeneity problem and tried to do something about it: the authors decided to apply two-stage least squares. In the first-stage regression they regressed the endogenous explanatory variable x on the instrument, and then in the second-stage regression they took the residual from the first-stage regression and used it in place of x as a predictor of the final dependent variable. What is the problem? The problem is that this is not how two-stage least squares works: you do not take the residual from the first-stage regression, you take the fitted value. Where did these researchers get the idea that this is how two-stage least squares is supposed to be done? They cite two articles that were published the previous year in the same journal. That is fairly common: you cite articles that have applied the same technique before you, and it happens that these two cited articles actually explain the two-stage least squares procedure incorrectly.
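To make the difference concrete, here is a minimal sketch in Python with simulated data (my own illustration, not the analysis from the reviewed manuscript; the variable names and numbers are hypothetical). The second stage of two-stage least squares uses the fitted value from the first-stage regression; plugging in the first-stage residual instead estimates something entirely different.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Simulated setup: x is endogenous because the confounder u enters both x and y;
# z is a valid instrument (related to x, unrelated to u). True effect of x on y is 0.5.
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + u + rng.normal(size=n)
y = 0.5 * x + u + rng.normal(size=n)

def ols_slope(predictor, outcome):
    """OLS slope from a regression of outcome on predictor (with intercept)."""
    X = np.column_stack([np.ones_like(predictor), predictor])
    return np.linalg.lstsq(X, outcome, rcond=None)[0][1]

# First stage: regress the endogenous x on the instrument z.
Z = np.column_stack([np.ones_like(z), z])
first_stage_coefs = np.linalg.lstsq(Z, x, rcond=None)[0]
fitted = Z @ first_stage_coefs      # what 2SLS uses in the second stage
residual = x - fitted               # what the manuscript used instead

print("naive OLS of y on x:          ", round(ols_slope(x, y), 2))        # biased by endogeneity
print("second stage on fitted value: ", round(ols_slope(fitted, y), 2))   # close to the true 0.5
print("second stage on residual:     ", round(ols_slope(residual, y), 2)) # not 2SLS at all
```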
So what would have happened if that article had been published as it was? There would have been three recent articles explaining two-stage least squares, and a person who does not understand the technique and wants to know more about it would read the first article, then look at the explanation in the second article, which looks the same, and if they are still not sure, look at the third article, which says the same. And all of this is a misunderstanding, perhaps by one or two researchers, that is simply repeated in the literature. Instead of looking at original sources or good methods books, people cut corners and follow the guidance provided in the journal where they want to publish. So that is one example, and to avoid this it is a good idea to justify your choices based on the methods literature instead of previous empirical applications. This was an example where the analysis results were clearly incorrect because the technique was misapplied.

The second example is less severe, but it is perhaps the most widely spread methodological myth: the myth that coefficient alpha must be more than 0.7 to be acceptable, and that Nunnally stated so in the 1978 book Psychometric Theory. Here is an example of the myth in action: the 0.7 cut-off is cited without a page number. A citation without a page number is a good indication that the authors have perhaps not actually read the book but are citing it out of habit, because that is what is done in the discipline. The claim is not true; Nunnally says nothing of the sort and does not give a single cut-off. What the book actually says has been written about in many places, and you can also check the book itself. The recommendation is that the required reliability should depend on the context. If you are doing very early-stage research with a new scale that no one has used before, then perhaps 0.7 is a good cut-off. But if you are working in a more mature research area, and you are more interested in getting the magnitudes of the effects right rather than just checking whether an effect exists, then you might need something like 0.9. So Nunnally clearly explains that context matters, but people read this as saying that 0.7 is the ultimate cut-off that applies to every scenario. That is not what Nunnally says, but that is the myth: that 0.7 is always enough. It is not always enough, and Nunnally does not recommend one cut-off for every scenario. A more reasonable strategy is to look at what kind of reliability statistics other people in your discipline have obtained using the same scale, and then compare whether your reliability is better or worse than in previous applications of the scale. That is probably a much more relevant reliability standard than a psychometrics book written more than 40 years ago.
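As a side note, coefficient alpha itself is simple to compute directly from the item scores, which makes it easier to treat as a quantity to be interpreted in context rather than as a pass/fail test against 0.7. Here is a minimal sketch in Python (my own illustration with simulated, hypothetical data):

```python
import numpy as np

def coefficient_alpha(items):
    """Coefficient (Cronbach's) alpha for an n_respondents x k_items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the sum score
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical example: three items measuring the same underlying score with noise.
rng = np.random.default_rng(1)
true_score = rng.normal(size=500)
items = np.column_stack([true_score + rng.normal(scale=0.8, size=500) for _ in range(3)])
print(round(coefficient_alpha(items), 2))
```

Whether the number that comes out is good enough is then a judgment call that depends on the maturity of the research area and the purpose of the study, which is exactly the point about context made above.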
Let's take a look at a third example, and this is a big one again: partial least squares. I have written some papers on this topic. The idea of partial least squares analysis is that we run a regression analysis, but instead of computing our scale scores as plain sums of items, we take a weighted sum before running the regression. So partial least squares is essentially an indicator weighting system for creating composite variables, weighted sums, to be used in regression analysis.

There are many myths around this technique, and I will focus on one of them: the myth that the way the partial least squares algorithm weights the indicators increases reliability. This is stated, for example, in this editorial in MIS Quarterly, which is the leading information systems journal and also an FT50 journal: the optimization of the weights by the partial least squares algorithm aims to reduce measurement error, that is, to improve reliability. The problem with this claim is that there are reasons to believe it cannot be true, and there is no evidence that it is true. If we look at how we typically construct scale scores for regression analysis, we take a sum of the indicators, and there is a known problem that the resulting estimates underestimate the relationship between the variables that the indicators represent. Here you can see a simulation study we did for a paper: we varied the true correlation between the things being measured, simulated different data sets, and the estimates from regression analysis using weighted sums of scale items were systematically too low, underestimating the true relationship. This is true regardless of whether we take an equal-weight sum, that is, just the sum or mean of the items, or whether we use weights that are optimized to maximize reliability. With real data you cannot compute such weights, but in simulated scenarios you can. So even with an ideal set of weights that maximizes reliability, in a simulated scenario where everything is under our control, there is no noticeable advantage over equal weights, weighting each indicator the same. The claim that weighting more reliable indicators more heavily than unreliable indicators sounds reasonable, but it does not noticeably improve reliability, and there is no evidence that it does.

So how can people come to believe that the partial least squares weights in particular would improve reliability in a meaningful way? Let's take a look at the evidence that has been offered in support of this belief. There are book chapters and articles, mostly published outside the mainstream research methods journals, that claim there is evidence for this phenomenon. For example, Chin's 1999 book chapter, cited here, reports that in their simulation study the regression results after applying the partial least squares weights were more accurate than with the equal weights that we normally use. Okay, so people claim there is evidence; what does the evidence actually say? Let's take a look at what the partial least squares weights actually do. The weights create a bias away from zero, and if you do not study the technique thoroughly, as you would in a rigorous research methods study, but instead only simulate correlation values between, say, 0.2 and 0.5, you can fool yourself into thinking that the scale scores from the partial least squares algorithm are more reliable, because the bias away from zero happens to cancel the bias due to measurement error in that particular scenario. That is not evidence of reliability; it is just evidence that in some scenarios one source of bias can cancel another. Of course, as a routine research practice, relying on one bias to cancel another is a really bad idea.
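To see the attenuation point concretely, here is a minimal sketch in Python (my own illustration with hypothetical loadings, not the simulation study from our paper). It compares equal weights with weights chosen to maximize composite reliability when the loadings are known; both sets of composites underestimate the true correlation, and the difference between the two weighting schemes is negligible.

```python
import numpy as np

rng = np.random.default_rng(2)
n, true_r = 100_000, 0.5

# Two latent variables with a known true correlation, each measured by three
# noisy standardized indicators whose loadings differ slightly.
f1, f2 = rng.multivariate_normal([0, 0], [[1, true_r], [true_r, 1]], size=n).T
loadings = np.array([0.70, 0.75, 0.80])

def indicators(factor):
    """Generate indicators: loading * factor + measurement error."""
    noise = rng.normal(size=(n, loadings.size)) * np.sqrt(1 - loadings**2)
    return factor[:, None] * loadings + noise

x_items, y_items = indicators(f1), indicators(f2)

# Equal weights (a plain sum score) versus weights proportional to
# loading / error variance, a standard choice for maximizing composite reliability.
optimal_w = loadings / (1 - loadings**2)

def corr(a, b):
    return round(np.corrcoef(a, b)[0, 1], 3)

print("true correlation:          ", true_r)
print("equal-weight composites:   ", corr(x_items.sum(axis=1), y_items.sum(axis=1)))
print("reliability-max composites:", corr(x_items @ optimal_w, y_items @ optimal_w))
```

Both estimates come out clearly below the true correlation, and the reliability-maximizing weights do essentially no better than the plain sums, which is the pattern described above.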
Additionally, if the objective of your analysis is to check whether an effect exists, that is, whether it is non-zero, then a technique that is biased away from zero, so that it hardly ever indicates that your effect is close to zero, is probably the worst possible choice of indicator weighting. Of course, one reason people like to use the technique is that it provides support for the existence of effects even when there is actually no effect, because normally we want to demonstrate that our hypotheses are not refuted by the data.

So what can we do about these statistical and methodological myths and urban legends? Beyond the articles written about the phenomenon, editors are also trying to do something. For example, in this editorial in the Journal of Operations Management, Guide and Ketokivi use partial least squares specifically as an example and state that you should always have a basic understanding of what your analysis technique does. Unfortunately, many research methods courses focus on how a technique has been applied in the past and how you apply it using SPSS or some other software, instead of explaining the basic principles that the technique is based on. You do not have to be a statistician, but you do have to understand the basics: what is the principle that allows the technique to work.

Another recommendation they give is that you should never provide justification in the form that expert X has recommended that technique Y should be used in a particular scenario. Methodological choices should be justified based on methodological evidence. For example, if you want to justify using technique X, you can say that technique X has been proven to be the ideal technique in this particular scenario. By proven we mean that somebody has written a mathematical proof, for example that regression analysis is unbiased under certain conditions. You do not necessarily have to cite the proof itself, but if a good textbook says that something has been proven, you can cite that textbook. Another way of justifying your choices is to point to simulation evidence, which is another way of supporting methodological claims, showing that technique X works well in conditions that are close to yours. Never use the justification that expert X recommends method Y. Experts, if they really are experts, will always provide the justification for their recommendation; explain that justification instead of saying that someone says so. It is also worth thinking about who counts as an expert if you cite one. If you want to say something about regression analysis, should you cite an econometrics professor who has built their career on studying regression analysis and related techniques, or a marketing professor who has built their career on applying the technique in marketing scenarios? Which one is the better source to cite?

Then, never use empirical precedent as justification. As the two-stage least squares example demonstrated, the fact that someone has done something in a past article in the journal where you want to publish does not mean that it is the correct thing to do, and it is not evidence that it is correct. Cite good books, cite articles in applied research methods journals such as Organizational Research Methods, but the fact that somebody has used something before is not evidence that the technique is useful.
It probably correlates with the technique being useful, but it is not direct evidence. Finally, always read what you cite. When you cite a book about regression analysis, you should read that book, or at least the part that you cite, and when you cite it, provide a page number. It is much more difficult to make a careless citation with a specific page number than to cite a whole book and hope that somewhere in it the claim is supported. One of my favorite things to complain about as a reviewer is citations to econometrics books such as Greene's 2012 book, which runs to more than 1,000 pages. The authors make a claim about their methods and then cite Greene's book without a page number. When I see that, I tell the authors in my review that they need to add a page number to the Greene citation, because they cannot possibly expect me to read the full 1,200 or so pages to check the claim. Typically, in the revised version the citation is removed. That is indirect evidence that the authors never actually read the book in the first place; if they had, they could have provided the page number.