Hi, my name is Michèle Nuijten, and in this video I'll talk about the free tool StatCheck, a spell checker for statistics. I'll talk a little bit about the problem StatCheck is trying to solve, how it works, and how you can use it. So let's start with the problem. The snippet of text you see here is a paragraph from a published psychology article. Most conclusions in psychology and related fields are based on statistical results like the one I highlighted here. If we read it, it says: simple effects analyses within each of the two levels of valence were conducted, revealing a significant main effect of subtype upon the proportion of positive words falsely recalled. And then an F-test is reported. There is something wrong with this result, and we can find it by recalculating the numbers, because statistical results like this have an interesting characteristic: the numbers should be internally consistent. If I have the degrees of freedom and the test statistic, I can recalculate the p-value. And if I do that, I find that the recalculated p-value is actually 0.06 and not 0.05, which would make this result no longer significant. Now, we find that in psychology roughly 50% of published papers that report statistics contain at least one inconsistency. And about one in eight papers, so roughly 12.5% of psychology papers, contain at least one decision inconsistency around p = .05, like the one we see here in this snippet of text. Why do we care so much about this? Three main reasons. First, it can make results uninterpretable. A statistical result is a key ingredient on which we base our conclusions, and if we can't trust these numbers, can we still trust the conclusion? Furthermore, and this is specifically relevant for meta-analysis, we often need these reported statistics to calculate effect sizes.
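The recalculation described here can be sketched in Python (StatCheck itself is written in R; the F-value and degrees of freedom below are hypothetical, since the exact numbers from the article aren't shown in this transcript):

```python
from scipy.stats import f

# Hypothetical numbers: suppose the article reports F(1, 30) = 4.00, p = .05.
df1, df2, f_value = 1, 30, 4.00
reported_p = 0.05

# The p-value is the upper-tail probability of the F distribution
# at the reported test statistic.
computed_p = f.sf(f_value, df1, df2)

print(f"computed p = {computed_p:.3f}")  # slightly above .05, so not significant
```

With these (made-up) numbers the recomputed p-value lands just above the .05 threshold, which is exactly the kind of decision inconsistency the example in the article shows.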
But if we can't trust these numbers, how do we know we are calculating the correct effect size? And finally, an inconsistency could be a symptom of underlying problems: maybe there are additional problems in the underlying data that we cannot even see. This might be the tip of the iceberg. This is why we developed StatCheck. StatCheck is effectively a spell checker for statistics. It's an R package, and it has a Shiny web app as well. So let's first take a look at what StatCheck does. StatCheck searches articles for statistical results and checks whether they are internally consistent. You feed StatCheck an article, it searches through the text, and it looks for statistical results that look like this. It needs a test type (this is a t-test), degrees of freedom if relevant for the test type, a test value (2.20 in this case), and a p-value. StatCheck can recognize t-tests, but it can also recognize F-tests, z-tests, correlations, chi-square tests, and, especially relevant for meta-analysis again, Q-tests for heterogeneity. It takes these ingredients, the degrees of freedom and the test value, and uses them to recalculate the p-value. In this case, you see that if I feed it the degrees of freedom 28 and the test value 2.20, it recalculates a p-value that is consistent with the one reported in the text. In both cases the p-value is 0.036, which means StatCheck will label this result as internally consistent; in other words, these numbers belong together. However, it can also happen that the computed p-value does not match the reported one. That may be because of a simple typo: instead of 0.036 you accidentally typed 0.046. In this case StatCheck will flag the result as inconsistent, meaning the three reported numbers do not add up, so to speak; they don't belong together. StatCheck distinguishes between inconsistencies and decision inconsistencies.
If the reported p-value is larger than the threshold for statistical significance (0.05) but the computed p-value is smaller than 0.05, or the other way around, StatCheck classifies this as a decision inconsistency. There are two main things to note about how well StatCheck performs. First, it can only find statistical results that are completely reported and in APA style; APA is a specific reporting style that is commonly used in psychology. Second, StatCheck can sometimes make mistakes in classifying results as consistent or not. Luckily, the sensitivity and specificity of this classification are very high: 85% to 100%, depending a little on which settings you use. I'll talk about that in a bit. But even though the accuracy is high, it's not always perfect, so take care in interpreting StatCheck's results. So how can you use StatCheck in your work? Well, it depends a little bit on what your goal is. StatCheck is mainly useful for self-checks, for peer review, and for research. For the self-checks and the peer review, we created an easy-to-use point-and-click web app. You can find it at statcheck.io. It's very straightforward: you upload an article, and it gives you back a list of the statistics it could find and whether or not they are internally consistent. If you click on the browse button that you see here on your screen, you can upload the paper, and the result will look something like this. Here I uploaded a Word document, but you can also upload a PDF or an HTML document. This document has four statistical tests in it, and you see the little table below the browse button with the four tests that StatCheck could find. You also see the computed p-value, that is, the p-value that StatCheck computed from the degrees of freedom and the test statistic. In this case it's the same for every statistical test because it's a fictitious document that I just created; usually these are different numbers.
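The distinction between a consistent result, an inconsistency, and a decision inconsistency can be sketched as a small Python function (a simplification of StatCheck's actual logic; the rounding rule and the default alpha of .05 are assumptions):

```python
def check_result(reported_p, computed_p, alpha=0.05, decimals=3):
    """Classify a reported p-value against the recomputed one (sketch)."""
    # Consistent if the two values match after rounding to the reported precision.
    if round(computed_p, decimals) == round(reported_p, decimals):
        return "consistent"
    # Decision inconsistency: the two values fall on opposite sides of alpha,
    # so the significance conclusion itself changes.
    if (reported_p <= alpha) != (computed_p <= alpha):
        return "decision inconsistency"
    # Otherwise the numbers disagree, but the conclusion is unaffected.
    return "inconsistency"

print(check_result(0.036, 0.0364))  # consistent
print(check_result(0.046, 0.0364))  # inconsistency (both sides below .05)
print(check_result(0.05, 0.06))     # decision inconsistency
```

The third call mirrors the example from the opening of the video: a reported p of .05 versus a recomputed p of .06 flips the significance decision.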
You can see that in this case two of my results are consistent, so that's great. One is an inconsistency and another one is a decision inconsistency. Say this is my own manuscript I was checking; I might want to go back to my text and my statistical software to see what went wrong. Now, if I do that in this case, I immediately notice a potential reason for the inconsistency in the third row of the table. If I look at the full text where this result was reported, I see that I wrote: furthermore, we performed two additional t-tests just in case. This test was one-tailed, and then a test is reported, but this one was not. If StatCheck doesn't realize that this highlighted test is one-tailed, it expects a p-value that is about twice as high as the one that is reported. But here I am very explicit in identifying this test as one-tailed, so it seems a bit unfair to count that as an inconsistency. To take this into account, there is a setting in the web app that you can use to correct for this. It says: try to identify and correct for one-tailed tests. If I check this box, StatCheck will rerun on this article, but now it searches for keywords in the text that indicate that tests might be one-tailed, and you can see that in this case the third row in the table is now counted as consistent. For details on how this works, I'll refer you to the frequently asked questions, but this is a quick way to resolve a potential cause of errors. The web app is mainly intended for self-checks and peer review, if you just want to upload one article and quickly see what is wrong in it. But you can also use StatCheck for research, and if you want to do that, the R package might be the best choice for you. StatCheck is an R package published on CRAN, so you can install it in R.
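The one-tailed correction can be sketched like this: if the surrounding text contains a keyword such as "one-tailed" and the reported p-value matches half of the recomputed two-tailed p-value, the result is counted as consistent. This is a simplified Python illustration of the heuristic, not StatCheck's exact keyword list or matching rule:

```python
from scipy.stats import t

def one_tailed_check(text, reported_p, t_value, df, decimals=3):
    """Sketch: accept a halved p-value when the text signals a one-tailed test."""
    two_tailed = 2 * t.sf(abs(t_value), df)
    # First try the ordinary two-tailed comparison.
    if round(two_tailed, decimals) == reported_p:
        return "consistent"
    # Keywords suggesting the reported p-value may be one-tailed (assumed list).
    keywords = ("one-tailed", "one-sided", "directional")
    if any(k in text.lower() for k in keywords):
        if round(two_tailed / 2, decimals) == reported_p:
            return "consistent (one-tailed)"
    return "inconsistent"

sentence = "This test was one-tailed: t(28) = 2.20, p = .018."
print(one_tailed_check(sentence, 0.018, 2.20, 28))
```

Without the keyword check, the reported p of .018 would be flagged as inconsistent against the two-tailed value of roughly .036, which is exactly the "unfair" flag the web app setting is designed to avoid.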
Note that it's slightly different from most other R packages in that it also needs a separate program to convert PDFs to text, but the details of how to install this and how to go through the steps of the installation are explained in the manual. In R you can do additional things compared to the web app, such as checking a string of text directly. You can check separate articles, just as in the web app, but you can also check an entire folder of articles. So if we look at the differences between the R package and the web app, a first difference is that you can scan a large body of literature in one go, for instance if you want to learn something about the general inconsistency rates in an entire field. Another thing the R package offers is more options to specify what you consider an inconsistency. For example, people might differ in their opinion about whether p = 0 is okay to report or not, or maybe you want to assume a different alpha level instead of the 0.05 that is often used in psychology. You can also play around with the latest developments in the R package; all the code is freely available on GitHub. Two things I've been working on that you might want to check out: a new PDF reader that doesn't require installing this additional program when you want to use the R package, and additional code to also look for non-APA reported statistics, so that we have a little bit more flexibility in the types of statistics StatCheck can find. If you want to know more about StatCheck, or if you have any questions, you can find everything you need on the StatCheck website, statcheck.io.
There you have all the resources in one place. At the top of your screen there are different tabs. The home tab looks like the page we saw before, with the browse button where you can upload your document and get the results. The next tab is the frequently asked questions, which explains a little bit about what StatCheck does, how it works, and, if there are results that are unexpected, what the reason for that could be. It also has a list of additional reading materials: the manual, where you can find the R package, the scientific paper that we published about this, the validity study, and the GitHub page with all the code. So thanks for watching this video. I hope it was useful and that StatCheck can be a useful tool in your work as well.