Okay, welcome everybody to this session about pre-registration in the social sciences. I'm the moderator of the session. I'll start out with a short introduction about the concept of pre-registration, I'll also try to coordinate the Q&As after each talk, and I will end with some closing remarks.

But first, what is pre-registration? Pre-registration basically means that you specify your research design, your hypotheses, and your analysis plan before you actually look at the data of your study. And you see in the figure on the right that it's gaining a lot of popularity. In 2012 we had 38 pre-registrations on the Open Science Framework, and Brian Nosek actually expects that at the end of this year we will have 85,000 registrations logged on the Open Science Framework. So it's really gaining popularity, which raises the question: why do so many people pre-register their study? Well, as you're probably aware, we're currently facing a replication crisis, where a lot of results do not replicate. A common belief is that this is partly because researchers tend to desire a specific result. Usually this is a significant result, a p-value below 0.05. The problem is that researchers, during their studies and during their statistical analyses, either consciously or subconsciously steer their results in that direction. For example, in the data analysis they omit certain variables or certain values to get to this p-value lower than 0.05. This is a practice called p-hacking.

How can pre-registration help with this? In a pre-registration, you set out the rules in advance. You basically make all your decisions clear from the beginning, so there's no room to make these data-contingent decisions later on in the process. That way your hands are tied, so to say, and p-hacking is not really an option anymore. So pre-registration can help solve the replication crisis, so to say. That's the theory, at least. So how is pre-registration doing? Is it actually doing what it's supposed to do? We're not sure. There is some evidence from the medical sciences, because they have had mandatory registration of clinical trials for over 20 years. There is ClinicalTrials.gov, which you see on the left side. On the right side there is a systematic review by Robert Thibault and others, and they find that in a lot of clinical trials the registration does not match what is actually done in the studies. So there is selective reporting, both of interventions and of analyses and outcomes.

How is that in the social sciences? Well, that is the topic of this session and our four talks. We will start with Marcel van Assen, who will talk about the quality of hypotheses in psychology pre-registrations. Then I will take the helm and talk about the effectiveness of pre-registration in psychology. Then George Ofosu will talk about the effectiveness of pre-registration in economics and political science, so we capture several social sciences. And finally, Sarahanne Field will close with a talk on pre-registration and trust in science.

So first, I'd like to introduce Marcel van Assen for the first talk. Marcel has a background in mathematical psychology. He is currently a professor of mathematical sociology at Utrecht University, which is in the Netherlands, but he actually spends most of his time at Tilburg University. He has just been appointed as Vice Dean of Education. And most importantly, he is my PhD supervisor.
So I'm really happy to introduce him to you. So Marcel, take it away, please.

Yeah, thank you, Olmo. Starting up here. Yeah, can you see my presentation? Thank you. So my talk is on the quality of hypotheses in pre-registrations. Olmo and I led this research; we are both from the Meta-Research Center. Five first-year psychology students did this research with us, together with Maxime Sidnikov, who is a research master's student. So what do I mean by the quality of hypotheses? What I mean is the structure and formulation of hypotheses, not the content; we do not evaluate the content. And how can we, loosely speaking, define the quality of a hypothesis? Loosely speaking it is: if you see a hypothesis, can you link it to one specific test, given data to test this hypothesis? So that's the loose definition. And here is a hypothesis which does not really satisfy this. It reads: "While some other factors may emerge as predictors, for example job satisfaction, we generally expect these effects to be weaker than the effects of concerns." Just a moment, the presentation has gone; I don't see the presentation anymore. Ah, I see the presentation again. So you may wonder: okay, but this is no hypothesis. Well, I would agree with you, but many hypotheses that we find in pre-registrations are like this.

So what is the context of this project? I'm talking about the quality of hypotheses in pre-registrations. This is embedded in a huge pre-registration project by Olmo, so this is just a tiny piece of it. And as I said before, it is part of a research project with five first-year psychology students. Although this is not the topic of this presentation, I highly recommend doing projects with a few first-year students; it really gives energy, and it's much more fun than giving lectures to 300 psychology students.

So what is the main message? I often start off with the main message, because some people may drift off, and then it's most important that you know what the main message was. I have three main messages. The first one is that pre-registration is really important, and, I think, the most powerful method for safeguarding the quality of confirmatory empirical research, in theory. And I strongly believe that in a number of years — it could be 10 or 50 — pre-registration will be the standard in the social sciences, as it is in medicine. Okay, so pre-registration is very important; remember this, because the remainder of the talk will show some problems with pre-registrations. That is actually the second message: pre-registration in psychology does not yet safeguard the quality of research. It's not yet effective, because if we look at the hypotheses in these pre-registrations, only 40% of these hypotheses were good; 60% were not. And if you look at the pre-registration level, only 18.6% of pre-registrations were good. That is, in almost 82% of pre-registrations there was at least one hypothesis that was not good. And the conclusion I personally draw from this is that most researchers in psychology who pre-register do not take pre-registration seriously and/or do not fully understand pre-registration. This is a very harsh conclusion, but I hope that after you see the data, you will tend to agree. And finally, the third main message is that because pre-registration is so important, we should do it more often, but at the same time we also should do it better.
I have no fancy colors or gadgets in my presentation, so I have to do it like this: these three main messages can be summarized as, first, "Yes, pre-registration, we should do it"; second, "Oh no" — it is not yet working as it should; and third, "Come on, we should and can do it better." Okay, let's continue with the contents.

The value of pre-registration. One of the main messages was that pre-registration is very important in empirical research. I tried to explain this in full, but I realized that this is very hard and takes much more time, so I will very briefly summarize why it is important. It goes like this. First of all, empirical papers contain p-values. You could also tell the story with other statistics — confidence intervals, Bayes factors — it doesn't matter: one piece of statistical evidence, say p-values, and decisions depend on it. What we learned from our statistics classes is how these p-values work: we have a population, we sample, we test, done. Then p-values work perfectly and we can interpret them: with alpha 0.05, a p-value of 0.05 or smaller means rejection, and when it's higher, we don't reject. These p-values then represent the probability of getting these data, or more extreme, when the null hypothesis is true. That's the interpretation of a p-value. The problem is that when we do not have such a simple process as we were taught in our statistics books, but data-driven decisions during the process, then the properties of p-values are destroyed. That is, it's no longer the case that the probability of rejecting a true null hypothesis is 5%; p-values and their properties no longer work. These data-driven decisions are well known — Olmo called it p-hacking, but I prefer researcher degrees of freedom and the garden of forking paths. So with these data-driven decisions, p-values no longer have their properties, and decisions based on them can therefore be questioned in many papers. Now where does pre-registration come in? In a pre-registration we specify one path, how the research will go, and in that way the properties of the p-value can be preserved (an illustrative simulation sketch of this point follows after this talk and its Q&A). In our 2016 paper from the meta-research group, we explained how you could try to avoid these questionable research practices, or researcher degrees of freedom, to get only one path. What I will talk about today is the start of the path, or a phase at the beginning of the path, namely the hypothesis. We can only guarantee that we have one path in confirmatory research when the hypothesis is well specified. If the hypothesis is not well specified, we already run into problems. So that's the significance of the project I am going to talk about.

How do we define a good hypothesis? We defined, or thought of, five criteria that a good hypothesis should satisfy. A good hypothesis is a hypothesis that satisfies all five criteria. The first one is that the hypothesis should contain at least one variable — mostly two or more, but at least one. So if your hypothesis is "we predict an effect", that is not a good hypothesis. And I decided not to give too many examples, because in the end we have more than 450 hypotheses and a few examples would not be representative anyway. So I stick more to the theory, but I will give some examples. The second, and most difficult, criterion is that the hypothesis should be understandable. I will explain in more detail how we operationalized this, I think in a nice way.
You cannot really operationalize or define this in words; we tried many times and it didn't work. I will come back to this, but intuitively it means that the hypothesis has to have one unambiguous interpretation. So, reading this hypothesis here — "we predict that this group will demonstrate all the behavior indicative of altruism" — we, at least as a group, didn't really know how to interpret this unambiguously. The third criterion, maybe not the most difficult one but a very important one as it turns out, is that the hypothesis has to be single. That means it cannot be translated into multiple different hypotheses. A general structure where this criterion is violated is the following: "A and B have an effect on Y." This is not a single hypothesis, because it can be translated into: A and B together should have an effect, or A alone and B alone should both have an effect, or maybe A should have an effect and B not. So it's not clear how this actually translates. There are some examples we found with this structure; I will only read out the first one: "partisans would experience more stress and less well-being during the election campaign." Here too — more stress, less well-being — we have a double hypothesis. The fourth criterion, and I think this one is clear, is that a hypothesis should at least say something about the direction of the effect, and not, as here, leave it unclear what is meant or be true no matter what. And fifth, the hypothesis has to be falsifiable.

That brings us to the data collection. This is part of Olmo's project, which has more data than this, so this research is halfway. We have 140 pre-registrations of seven pre-registration types, mostly the Pre-registration Challenge type and one other type. In total we have more than 400 hypotheses — 454, I believe. For the protocol, we aimed for reproducibility, attempting to eliminate subjectivity. That was really hard; we had to revise the protocol almost 10 times to get a protocol that worked relatively well. It works in two stages. First, we take the pre-registration and identify the potential hypotheses. This is done by two coders simultaneously. They read the pre-registration and search for a hypothesis section; if they find it, they copy the hypotheses, with the restriction that each should be one sentence. If there is no separate hypothesis section, then they copy sentences from the pre-registration that contain these keywords: expect, hypothesis, investigate, predict, and replicate. After the coding there was a reconciliation phase, where first the sentences are eliminated that have nothing to do with hypotheses but that still contain these words, and then the hypotheses are determined. In 80% of cases the coders agreed immediately; otherwise they discussed, and if they still did not agree, a third coder — Olmo or me — also took a look and made the final decision. Second, we score the potential hypotheses: two coders each score the hypotheses on the first four criteria, so contains at least one variable, understandable, single, and direction. In the reconciliation stage we have a similar structure: you see here that in 68% of cases the coders gave the same scores; after discussion it was more, and in 12% a third coder had to make the final decision. About understandable — I said this was the most difficult one.
What we did is this: each coder had to classify the hypothesis — was it a hypothesis about an association, a moderation, etc., categories A to G — and the category at the bottom, F, is "cannot be categorized": you read the hypothesis and you have no clue what it is doing. What we do then is: if the two coders disagree about the categorization of the hypothesis, or if both assign F, then we decide it is not understandable.

We come to the results, and I will move on to the discussion immediately, because there are not so many results in this rather qualitative research project. The most important conclusion is that it is very hard to identify and locate hypotheses in often badly organized pre-registrations. For instance, lists of hypotheses are often not used, or not used effectively. And if we look at scientific papers — so not the pre-registrations but the papers — there too we hardly ever see lists of hypotheses. But if we look at student theses — one of our group examined student theses, Augustine is her name — the majority do contain these lists of hypotheses, which makes it much, much easier to identify the hypotheses. So what we observe here is that students do, on average, better than researchers.

Let's go to the second part. As I said in the main message, only 40% of the hypotheses were good, and 18.6% of pre-registrations were good. The students, who are first-year students, were shocked by the low quality of the hypotheses. If we look at which criteria were violated most, it was single — many hypotheses combined a lot of things — and understandable: sometimes we couldn't figure out what was actually meant by the hypothesis. And here too we asked ourselves: do students do better than researchers? My answer is yes. If we compare how students do their research: they list hypotheses, there's hardly any evidence of publication bias in student theses, they often do power analyses, and on average they also had larger sample sizes. And if we go to PhD theses: there is the well-known research by O'Boyle that compared PhD theses with the articles that came out of them, and PhD theses also fare better than the published research, and I would say also than pre-registrations. Why then do students do better than researchers? We tried to explain for ourselves how it comes that pre-registrations are so bad concerning hypotheses, and one obvious explanation is that only the bad students become researchers and the good students go do something that is really valuable to society. Our explanation is sloppiness; I do not believe in the first explanation. I think it's sloppiness: researchers can formulate hypotheses well, just like our first-year students, but they do not yet take pre-registration seriously and/or do not fully understand the purpose — that you have to specify this path for your research, from hypothesis to final conclusion. The final conclusion of this talk is then that science and scientists still generally have a lot to learn and must do their best when pre-registering hypotheses, and I would say when doing research in general. I teach a lot about meta-science, also to students, and they get a bit depressed when I talk about the state of science. I always tell this story, and I would like to tell the short story to you too. It's a lesson from a rather famous story, at least in the Netherlands, about the human species on Earth.
If we express the time that we have existed as one full day, we are living only in the last 38 seconds of that day: we are very recent on this planet, and so we are not that significant. We can also do this for humans doing science. Here, if we see human life as one day, only in the last 13 seconds do we do science. If we have only just started doing science, we cannot expect to do everything perfectly. We are, so to say, the cavemen of science; scientists have only just started. So, don't be depressed: we have only just started. That was my talk. Thank you for listening, and if there are questions, if there's still time for questions, then...

Thank you, Marcel. The Q&A is open, but there are no questions yet, so let's just wait a couple of minutes. Meanwhile, I was wondering — I was particularly struck by the difference between researchers and students, and you're saying, okay, the most likely explanation is that researchers are maybe a bit more sloppy. So how do you think we can improve this? How can we make researchers less sloppy, so to say?

Yeah, I don't know. I think it's a state of mind: people think, okay, let's just write down in a day or an hour what I plan to do, as if it is just a piece of paper next to the computer, not really taking it as seriously as belonging to the research itself. Pre-registration is the basis of the research. It should be just as important, or maybe even more important, than the paper following from it, because the paper should be the result of what is promised in the pre-registration. So I think people don't understand the value of pre-registration.

Yeah, that's fair enough. There's a question by Bob Reed. He says: it seems to me that this is another argument for the benefit of replication, because the hypothesis is well defined after the research has been done. Do you agree with that?

Yeah, but wouldn't it be good if the original research is already okay — if the research already specifies the hypothesis well, if there is one at least? Maybe you have exploratory research and you don't need to have well-specified hypotheses. But if you have planned research, then I would say: try to figure out what the hypothesis is immediately, write it down well, and also specify how you assess the variables, how you test, etc. So I agree that for replication we can construct the hypothesis precisely, but please, let's do it also for the original research. Thank you.

One final thought I had myself is that it might have to do with consequences. Students have to be strict in listing their hypotheses, otherwise they get a bad grade. Maybe there just is a lack of consequences when researchers do not adhere to these strict rules of science, so to say.

True. And researchers are even punished when they list their hypotheses. If you write out your hypotheses and you submit your paper, then you may get a reviewer who says: don't list your hypotheses, it's childish. I remember when I became a PhD student, with my first research in visual perception, my supervisor said to me — I had listed the hypotheses — we don't do that, that's silly, that's childish. But we require our students to do so, because for us it's very easy to check the work of students if they list their hypotheses. And it's the same of course for researchers: we can also check the work of our fellow colleagues much better if they list their hypotheses. Forget about childish, we just want good science. All right, thank you.
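As an aside to Marcel's point about data-contingent decisions: below is a minimal simulation sketch (not shown in the talk; it assumes Python with NumPy and SciPy, two independent outcomes, and no true effect) illustrating how picking the better-looking of two tests after seeing the data inflates the false-positive rate above the nominal 5%, while the single pre-specified test keeps it at 5%.

```python
# Minimal sketch: how one data-contingent choice inflates the false-positive rate.
# Assumes two independent outcomes measured in two groups, with no true effect.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
n, reps, alpha = 50, 10_000, 0.05

fixed_path = 0    # always test the pre-specified primary outcome
forking_path = 0  # test outcome 1, also try outcome 2, keep the smaller p-value

for _ in range(reps):
    group_a = rng.normal(size=(n, 2))  # columns = outcome 1 and outcome 2
    group_b = rng.normal(size=(n, 2))
    p1 = stats.ttest_ind(group_a[:, 0], group_b[:, 0]).pvalue
    p2 = stats.ttest_ind(group_a[:, 1], group_b[:, 1]).pvalue
    fixed_path += p1 < alpha
    forking_path += min(p1, p2) < alpha

print(f"Pre-specified single test : {fixed_path / reps:.3f}")   # ~0.05
print(f"Pick-the-best-outcome path: {forking_path / reps:.3f}")  # ~0.10
```

Under these assumptions the pre-specified test rejects about 5% of the time, while the data-contingent path rejects roughly 10% of the time (1 − 0.95² ≈ 0.0975): this is the "one path" argument for pre-registration in miniature.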
I think we'll go on to the next speaker, which is me, so I'll share my screen. All right, I'm happy to announce the second speaker, which is myself. My name is Olmo van den Akker and I'm part of the Meta-Research Center at Tilburg University. I'm going to talk to you about the effectiveness of pre-registration in psychology, which is one of the main themes of my PhD. This title, I guess, begs the question: what is pre-registration effectiveness? When is a pre-registration actually effective in achieving its goal? We devised this simple formula: pre-registration effectiveness is pre-registration specificity and pre-registration-study consistency.

The first part, pre-registration specificity, is about how many details are in the pre-registration. Are all important study elements covered? Are they covered extensively enough that you are actually able to do the research? We also call this producibility, which is a bit of a tongue-in-cheek comparison to reproducibility: for a paper, if everything is really detailed, you can reproduce the study, so reproducibility is high. You can make the same reasoning for a pre-registration: if the pre-registration is really extensively outlined and everything is in there, then it's possible to produce the research. So we're using specificity and producibility interchangeably from now on. And there's also pre-registration-study consistency: what you plan in a pre-registration should match what you're actually doing in your study. We need both; they are both necessary for pre-registration to be effective. You can have really high specificity, so the pre-registration is really detailed, but if you then just do other stuff instead of what you planned to do, then the benefit of pre-registration is still lost. Likewise, you can have really high consistency between your pre-registered plan and your study, but if you only pre-register, let's say, one hypothesis and not even anything about how you're going to test that hypothesis, then it's also not really effective to pre-register. So I'm going to zoom in on both these elements in this talk.

These elements have been looked at individually before. Marjan Bakker, who is another of my thesis supervisors, looked at the specificity of pre-registrations, and Aline Claesen and others looked at pre-registration-study consistency. Both studies showed that there's ample room for improvement, but they did look at these things separately. So what we tried to do is look at them together, and we also tried to have a bigger sample. What we basically did was look at all pre-registrations we could find, for example from the Pre-registration Challenge. This was a challenge hosted by the Center for Open Science in which researchers got $1,000 if they published a pre-registered study; it was done to increase the uptake of pre-registration. We had 180 papers that won a Pre-registration Challenge prize, and we included those in the sample. We also used papers with pre-registration badges. That's also an initiative by the Center for Open Science, where papers that have open data, for example, can get a certain badge, but papers that have at least one study that is pre-registered can also get a badge. There were 244 of those. So quite a big sample of pre-registrations. We did exclude some: no studies from fields other than psychology, because we were focused on psychology.
No studies based on secondary data, because pre-registering those studies is a little bit more complex — I actually wrote a paper about that; please look it up if you're interested in this topic. And also no Registered Reports, because that's also a slightly different way of pre-registering, since the review is at the beginning of the process instead of at the end. All of this resulted in a total sample of 484 studies from 281 papers.

We looked at these in two parts. In part one, we looked at hypotheses. Marcel already talked about the specificity, or quality, of these hypotheses, and basically his conclusion was that it was kind of a bad hypothesis contest. I actually looked this up: this is actually a thing, where people do a stand-up comedy evening, come up with ridiculous hypotheses, and then during their routine try to support these hypotheses with all kinds of evidence, like spurious correlations and such. So yeah, this got me thinking that the field of psychology might just be one big bad hypothesis contest, but that might be too negative a take. We also looked at pre-registration-paper consistency: between the pre-registration and the paper, do the hypotheses match up, yes or no? What we found was that about half of the pre-registered hypotheses were omitted: they were there in the pre-registration, but in the accompanying paper they were no longer there. We also found that about half of the studies add non-pre-registered hypotheses: they just add hypotheses that were not in the pre-registration, and they did not specify explicitly that these were exploratory. We call those added hypotheses. But we have to take into account here that the quality of hypotheses, both in pre-registrations and in papers, is rather low, so it could be that our results were skewed a little bit because of that. If you have a double-barreled hypothesis, for example, is the whole thing one hypothesis, or is each individual barrel a hypothesis? Because of that ambiguity, we might have found somewhat too negative results here. That brings me to the other project, the one we're going to talk about now.

In part two, we looked at study elements other than hypotheses. For example, we looked at variables, the statistical model, how statistical assumptions are handled, outliers, those kinds of things. And again, we looked at specificity, or producibility, and at pre-registration-paper consistency. So how effective is pre-registration for these other study elements? Here you can see a list of all the things we looked at, and we categorized some of them as essential and some as non-essential. Essential elements are elements that are directly related to the empirical cycle you see on the right. We think these five elements need to be present in each pre-registration, because they are basically the backbone of a scientific study. The variables, independent and dependent, are part of the design of the study. The data collection procedure is part of the data collection phase. The statistical model is part of the analysis, and the inference criteria are part of the interpretation. We also collected data on the non-essential elements — missing data, inclusion and exclusion criteria. These are all also things where p-hacking is possible, but we found them a little bit less essential to a research study.
So we're not discussing those in this talk, mainly due to time constraints; we're focusing on the top five study elements. We have collected data for only 55 out of the 484 pre-registration-study pairs so far, so data collection is still underway. We actually still need coders for that, so if you're interested, let me know. This data is based on only one hypothesis per pre-registration-study pair. In part one we found hypotheses, right? So for each pre-registration-study pair, we looked at all hypotheses that were present in both the pre-registration and the paper, and then we randomly selected one of those. So we have one hypothesis per pre-registration-study pair for which we actually scored these elements.

So these are the results. First, we're going to look at manipulated independent variables. Here you can see that in the pre-registrations we found a manipulated independent variable 40 times out of 55. And you see two colors, green and red. Green means that the information about this manipulated independent variable was producible: based on the information provided in the pre-registration, it is definitely possible to actually use this variable in a study. Red indicates that that is not possible: the information is too scarce, too limited, maybe something is missing. In those cases it's not really possible to use this variable in a study, because the information is just not there. So green is producible, red is non-producible; that's with regard to the pre-registration. The paper is similar, but there we're talking about reproducibility, because it's about the paper and about redoing the work, since the study has already been done and published. Green means there is enough information about this variable in the paper that we can reproduce it in a new study, and red means there's a lack of information in the paper. Here we see that, at least in the pre-registration, quite some information is missing about the manipulated independent variable. And that is a problem, of course, because if the information is missing, we also cannot compare it to the paper. And that is what we did here: this is a graph in which we look at pre-registration-paper consistency. Of all the pre-registrations and papers that included information about the manipulated independent variable, we were able to compare those, because the information was there. Most of them are consistent: green means there is consistency in the manipulated independent variable between the pre-registration and the paper, red means it's inconsistent, so there is a change in this variable. As you can see, 21 of them had information for both the pre-reg and the paper, and you can see in this graph that, indeed, the bottleneck here is the pre-registration: those were often not specific enough to be able to do this comparison. So the problem here lies, I guess, with the producibility of the pre-registration.

We can also look at independent variables that were not manipulated. These were just measures — they could be scales measuring some kind of construct, but also things like gender, for example. And we split this up into three things: the procedure, the values, and the construction. The procedure is: okay, how is this variable measured? For gender it could be: it was measured using a questionnaire in Qualtrics. For an EEG variable, for example, the procedure of the EEG should be present here.
And you can see here that green again is producibility and red is non-producibility, and the same holds for the paper. You can see that, again, the pre-registration is not really specific about the procedure that is used for these variables. The second picture is about values. This means: which potential values can this variable take? For gender, it would be male, female, non-binary, I guess. And for the EEG, this will be a range of values, but they need to specify this, right, otherwise they can tweak it between pre-registration and paper. Again, you see that it's not always perfect, but I guess the majority of the time there is information about the potential values of these measures. And then third, on the right side, we have construction, and this only refers to variables that are constructed out of other variables — for example, a scale consisting of several items. What we need to know then is how these items are transformed into the scale. Are we just summing all the item scores? Are we taking the average? Are we truncating anything? This just needs to be specified as well in the pre-registration. And you can see here that that's not always the case, and it's definitely not always the case in the paper. So the paper often omits information about how the individual items are synthesized into a general score. That's problematic. And then on to the consistency of all these elements of independent variables: there's a sea of green, so that's good news. Whenever there is sufficient information in both the pre-registration and the paper, then basically for each of these elements the pre-registration and the paper are consistent. So that's good news for pre-registration effectiveness.

Then the dependent variable. Again, we looked at the procedure to measure this variable, the values it could take, and how composite variables are constructed based on individual variables — that's construction. And here you see, I guess, the pattern that pre-registrations are less producible than papers are reproducible. You can see, based on the left two pictures, that the bottleneck is again with the pre-registration: those are often not specific enough, not producible enough, to be able to compare these different study elements, which of course is a problem if you want to assess pre-registration effectiveness. The picture does get more positive again when we're looking at consistency between the pre-registration and the paper. Dependent variables are usually consistent between the pre-registration and the paper, although the potential values that the dependent variable can take sometimes aren't. So it could be that some kind of p-hacking involves tweaking the values of the dependent variable — truncating them, for example.

We also looked at the data collection procedure. How is the data collected? There are two elements here: the sample size, basically how many people are you going to test or did you test, and the sampling frame, which is the procedure of data collection. "We were at a supermarket on that day, and within four hours we approached as many people as possible" — something like that is a sampling frame. So let's start with the left picture: the right side of the left picture is fully green, so in the paper the sample size is always provided. That's good.
But in the pre-registration, that's not always the case. Of course, it's a really important element of a pre-registration to specify the sample size, but you see here, in red, that it's not always there. The right picture, the sampling frame, is mainly red. This means that both in the pre-registration and in the paper, researchers do not provide a lot of information about this. They don't really specify how the data was collected exactly, at least not to an extent that we could easily do it again or do the data collection ourselves. (A few more minutes, Olmo.) Yeah, okay. Here we have sample sizes, on the left in the pre-registration, on the right in the published paper. We're of course interested in the difference between those, so let's look at the right graph here: these are the relative sample size differences. You can see that sometimes the sample sizes are way higher, sometimes a little bit lower. But of course, the most important information here is why it is lower and why it is higher. We also have data on that, but it's not analyzed yet, so unfortunately I can't say anything about that.

I can say more about the statistical model. Here we also see a little bit of gray. What that means is that researchers should say something about it, and gray means they say nothing at all about it. So in some cases they have a pre-registration without referring to anything of a statistical model. I guess we see the same pattern, except for the details of the statistical model. The model itself is there, both in the pre-reg and the paper; the variables are there, both in the pre-reg and the paper; but the details — do they use robust standard errors, for example — that is often missing information. And when the information is there, both in the pre-registration and the paper, it is usually consistent between them. That is one of the elements of pre-registration effectiveness, and in this case, again, it looks good.

Then our final element, which is the inference criteria. Here we see a lot of gray in the pre-registration. Most often in a pre-registration they don't say anything about inference criteria; they don't say, okay, we're going to use an alpha of 0.05 or something like that. I guess they just take it as a given. And in the paper they also often don't mention it explicitly — that's why we see this red bar on the right. Gray means they don't say anything at all, and red means they only implicitly say something about inference criteria, for example by using asterisks in a table with p-values. Green means that they actually say: okay, we use this alpha. We couldn't compare many of them because there was just not enough information, but usually the inference criteria were consistent between pre-registration and paper.

So all of this — what does it amount to? I think this is an important slide of this talk; it is based on a slide by Anne Scheel and colleagues. We can see that in Registered Reports the proportion of positive results goes down significantly, presumably because they avoid p-hacking and publication bias. Now we also have our own results, and they fall roughly in the middle between Registered Reports and original papers. And this makes sense, right? Because with standard pre-registrations we don't avoid publication bias, since it's not part of the review process, but we do avoid some forms of p-hacking, we hope.
So this actually indicates that at least some forms of p-hacking are prevented. Here you can see our result together with the other bars, and you can indeed see that it's somewhere in between. This, in my book, indicates that pre-registration seems to be working at least somewhat.

So finally, to wrap up. One, selective hypothesis reporting seems to be prevalent in psychology. Two, it is hard to compare pre-registrations and papers because they lack producible, respectively reproducible, information. Three, when information is available, most pre-registrations align with their corresponding papers. And four, pre-registration does seem to have the expected effect on the proportion of positive results in the literature, because it's somewhere in between the proportion in Registered Reports and in original studies — I think the actual percentage was 66%. So the main takeaway: to reap the benefits of pre-registration, pre-registration producibility and paper reproducibility should be significantly improved. It seems to be working a little bit, but there's ample room for improvement. These are my co-authors: on the left for the selective hypothesis reporting project, on the right for the pre-registration effectiveness project. We're still collecting data, so we're still looking for coders; if you are interested, please send me an email at this address, or send me a message on Twitter. And of course, all our studies were pre-registered — you can find the pre-registrations here — and when we're done with the analyses, all data will be publicly available. All right, that's it from me and I'm open for questions.

There's one question by Marcel, which he put forward in the chat, but it's only the chat for the hosts and the panelists, so I guess the other people attending can't see it. His question is: do you have a suggestion for how to improve the quality of pre-registrations?

Yes, I do. I think we first need to improve the infrastructure surrounding pre-registration. There are now some templates available, and I think those are really useful, because they are like a guideline people can use to actually write a good pre-registration — because it is really hard, especially if you have to start from scratch. So pre-registration templates can really help, and there are more and more templates on the Open Science Framework as well. Please check that out if you're looking to pre-register your studies.

And there's also a question in the Q&A by Eric Olsen. It says: in terms of clarifying the value of pre-registration, where does the responsibility for increasing this clarity reside? Is it the researcher? Is it educators? Is it funders, maybe? So he's saying there's not one stakeholder, but what can the success of students tell us? I guess this one is also related to the previous talk. This is a good question. How can we improve pre-registration effectiveness — should it come from researchers themselves? I'm actually of the opinion that it should be a top-down effort. I'm a big proponent of pre-registration, even though our results don't definitively show that pre-registration is effective. But I still think that funders should mandate pre-registration, because a little pre-registration is better than no pre-registration at all. And I think it's mainly about money and/or power.
So funders have the money and the power, so they can dictate what's happening in the scientific community, basically. If we all agree that pre-registration is a good thing, they have to say: okay, let's get this done — every time you submit something, it needs to be pre-registered. So I think that's where the main improvement lies. Any other questions? Or should we go to the next speaker? I guess we can continue.

So next up is George Ofosu. He's an assistant professor in comparative politics at the London School of Economics and Political Science. He has published about political accountability and election integrity, but he's also interested in transparency in science, and that's actually why he's here. He's going to talk to us about pre-registration in economics and political science — it's not called pre-registration there, but pre-analysis plans. So please tell us more, George.

Thank you very much for the introduction. I share a lot of the comments by the previous presenters; I think there are similar trends in political science and economics as well. This work was done by myself and Daniel Posner at the University of California, Los Angeles. What we tried to do in political science and economics was essentially to take stock of what has happened in the past few years since the trend of pre-registration kicked off in political science and in economics. We didn't set out to do anything that was causally identified — you know, to find the effect of PAPs (pre-analysis plans) on the issues of fishing and HARKing that have been alluded to in the previous presentations. But we set for ourselves a simple question: the way that PAPs are being written now within political science and economics, do they have the characteristics, the ability, to check fishing and HARKing? That's the simple question we set for ourselves. Our approach was quite simple. It was basically to draw registered pre-analysis plans from the American Economic Association (AEA) and the Evidence in Governance and Politics (EGAP) registries. These are the two main registries that we knew of where, at the time, political scientists and economists were registering their PAPs — political scientists mainly on EGAP, and economists mainly on the AEA registry. We analyzed the content of the PAPs that we drew from these registries to find out whether things were clearly specified and comprehensive enough to limit the scope for fishing and HARKing. Then we also tried to assess whether PAPs indeed tied the hands of researchers by comparing — as Olmo did in his study — the pre-registered analyses to the publicly available papers that came out of these pre-registrations, to see whether they indeed tied scholars' hands with respect to what got reported. But we also conducted surveys with PAP users who were registered on the EGAP website, but also on the Innovations for Poverty Action website. We sent emails and surveys to about 664 scholars and received about 155 responses. The idea was to get the perception of scholars regarding the use of these pre-analysis plans, because inasmuch as we try to advocate for the use of pre-analysis plans, how scholars experience their use matters — and I will mention a couple of things that are on the minds of scholars about how this might impact the way they publish, how they get rewarded in the academy, and so on.
That was important for thinking about the way we use pre-analysis plans in the social sciences. There are limits, obviously, to the approach we adopted. First, these judgments are necessarily subjective — and I was quite impressed by Marcel's and Olmo's presentations on the very precise way they define what a good hypothesis is; I think it's very useful for us to use that as we move forward. But we had a very simple criterion: having an X that has an impact on Y, with a specified direction of effect, counts as a clear hypothesis. Again, it can sometimes be subjective within this realm. Our analysis is not causal, but we believe this stocktaking helps us think about how we adopt this mechanism to increase the credibility of the social sciences.

Why is it important? I think Olmo showed this trend: in political science and economics it kicked off from 2011, when the first pre-analysis plans were registered, and now we are talking about thousands of pre-registrations in these fields. We teach graduate students that this is the new trend in the field, that you definitely have to do it. And so we want to figure out whether there are any benefits behind the reasons why we promote pre-analysis plans. The other thing is that it takes a lot of time, and this came from the survey of scholars in the field. We asked them how long it took them to write a pre-analysis plan, and more than a quarter suggested that it took them more than a month; about 31% or 32% said two to four weeks. So it's very time consuming. If we dedicate all this time, and we teach students to do this, we really want to know whether it's beneficial. The more critical and deeper question surrounding the use of pre-analysis plans in political science and economics has been an argument about the costs to social scientific discovery. Some take the view that, of course, if you tie yourself to a pre-specified hypothesis, then it ties your hands with respect to exploring the data, and that limits potential breakthroughs that come from unexpected, surprising findings in the data. It forces researchers, they say, to undertake analyses that are inappropriate once you have the data in your hands. Others are also concerned that it invites scooping: especially young scholars who pre-specify an idea and don't have the money to run things in time might be scooped by other scholars who are better placed to conduct the research and take the idea from them. So these are some of the concerns in the field, and the motivation for the study. And the other thing is that PAPs are unlikely to enhance research credibility if policing is not rigorous. Olmo and others are taking this seriously, taking PAPs and comparing what people registered to what is eventually published, but it's really hard to see any kind of reward that the discipline offers for people who do work like this. I think this is coming and needs to be encouraged, but it wasn't there. So there are all these benefits in theory that we think we should gain from pre-analysis plans, but they also come with costs, even if alleged, and we obviously need to investigate whether these costs are real. Our early stocktaking was meant to provide such an assessment. So this is the description of the sample that we used.
We had about 200 PAPs; 47 to 48% had some publication attached to them, and that's the sample in which we could compare the PAPs with the published papers. 50% were gated, and so what we did was essentially write to these authors to ask them to share the PAPs with us in confidence; our description is therefore of the whole set of PAPs, not of individual PAPs. Half were from one registry and the other half from the other. Most of the PAPs we analyzed are concentrated in 2015 and 2016, but the trends we discover apply to the earlier periods as well, which we show in the paper. In the field, most pre-analysis plans are for field experiments and survey experiments; there are hardly any for observational studies. It's not very clear yet how to write PAPs for observational studies, given that there are questions about whether people saw the data before they wrote the plan, so it's not very common in the field. We had a common rubric, and at least two coders — either ourselves or our research assistants — coded all the pre-analysis plans and related papers. We wanted to code for clarity of definition, which is critical to reducing fishing. We followed Olken's criterion: if we were to give these PAPs to two programmers, they should be able to come up with the same results for the primary independent variable and the dependent variable in the pre-analysis plan.

So do PAPs, as they are written in political science and economics now, reduce the scope for fishing? Fishing is made possible if there is an imprecise definition or a lack of clarity about the econometric model (the way you will analyze the data), the outcome variables, the coding rules, whether covariates are clearly and sufficiently specified, the sub-samples that will be involved in the analysis, inclusion criteria, and so on. If these things are clear, then there is obviously a hand-tying situation where we can really compare what you specified to what was reported in the paper. If these things are not clear, then it leaves room, inadvertently or nefariously, for fishing or HARKing. So what do we find? We find that about 77% of the time the dependent variables in the pre-analysis plans were judged to be clearly specified, and 73% of independent variables were also judged to be clearly specified. The independent or treatment variables, because they often relate to a field experiment, are usually quite clear, but when it comes to the dependent variables scholars are a bit hand-wavy; still, 77% is pretty high given that this is an early stocktaking exercise. What about the statistical models that people wanted to run? Here too we used the same criterion that has been mentioned by previous presenters. 68% clearly specified the precise statistical model they would run, but only 37% specified how they would estimate standard errors. In 19% of cases the model presented in the paper differed from the one that was pre-specified, and such deviations were mentioned only once. Obviously it's okay to deviate from what was specified, especially when you realize after filing the pre-analysis plan that it is not the right specification, but at least acknowledging in the paper that this differs from what was pre-specified is a measure of transparency, in our view.
25% of PAPs specified how they would deal with missing data, only 8% specified how they would deal with outliers, and 20% dealt with how they would handle covariate imbalance. So this leaves a lot of room to maneuver to generate particular results; there's a lot of room for improvement in political science and economics in this regard.

So can pre-analysis plans reduce HARKing? HARKing is made possible through imprecision about the specific hypotheses that the researcher intends to test. Such imprecision implies that researchers can move the goalposts and interpret ex post what the data have to say, rather than testing what was pre-specified in theory. 90% of the time the PAPs we studied specified a clear hypothesis, so scholars were really good about that. However, the scope for HARKing can also come from specifying too many hypotheses. It turns out that even though scholars were careful about specifying clear hypotheses, they also specified a lot of them, which means there is room to pick and choose which ones get reported at the end of the day in the papers that are published. For example, 34% of PAPs pre-specified one to five hypotheses, which is obviously a good number, but many pre-specified way more: six to ten, eleven to twenty, and about 8% pre-specified 50-plus hypotheses. Obviously this is not a problem if scholars specify that, among all these hypotheses, these are the primary ones and these are just secondary ones to be explored at the end of the day. But even there it falls short: scholars specified too many hypotheses as primary. Among the plans we looked at, 42% specified only one to five as primary, 25% specified six to ten, and about 3% specified 50 or more. So there are still a lot of hypotheses specified as primary, even if we restrict attention to that designation. Another safeguard against so many hypotheses is to commit to multiple-testing adjustments; here, among scholars who pre-specified five or more hypotheses, just 28% pre-committed to doing so. But do scholars take advantage of this latitude they give themselves, in terms of specifying too many hypotheses or not specifying adjustments, and so on? We find that papers faithfully presented the results of all the pre-registered primary hypotheses in 61% of cases. More than one third of cases had at least one pre-registered hypothesis that was never reported; the median share of neglected hypotheses was about 25%. So this is very consistent with what Olmo presented in his work as well. 18% of papers presented new hypotheses which were not pre-specified, and in 82% of these cases the new hypotheses were never identified as new — they don't mention it at all.

So what did we take to be a most complete PAP? There are a lot of dimensions along which we can analyze what a good PAP is. We think there are four criteria for what a good PAP should have: a precise hypothesis, a precise independent variable, a precise dependent variable, and a precise statistical model. We think these four criteria are very important, and I think Olmo's suggestion about the mode of inference is important here too; we should consider that as part of a complete PAP. So here, if we look at the distribution — sorry — more than half of our PAPs met the four criteria, which is really quite good.
We would take that as a glass half full, but there were also many that fell short of these four key criteria.

In regard to the challenges or objections to PAPs, we try to take some lessons from what people said in the surveys that we conducted: obviously it's time consuming; does it stifle discovery; does it hamper publication; and is there enough policing to make sure that PAPs are effective in what we think they should be doing? Composing PAPs, people thought, took a lot of time, as I have mentioned already, but people also saw positive sides to this time-consuming exercise. 65% thought that writing the PAP led to a refinement of the research protocols and data collection plans, which is an improvement in the research process: you get to refine your research protocols before you even implement them, get comments from people, and improve the quality of your research. 65% said it put them in a position to receive useful feedback. 52% said they experienced downstream time savings: once you write a PAP, the downstream work goes pretty well in terms of time. On the negative side, 34% said writing PAPs delayed the implementation of their project. So the trade-off: it appears that there's a shift in workload from the back end to the front end — you do much of the work up front — but it's not very clear that, on net, PAPs generate significantly more work, according to the surveys we did with scholars using them.

Does it limit the scope for discovery? This is obviously hard to tell, but if you ask scholars, they don't seem to think that it limits them much: slightly less than half didn't think it limited them at all, but a sizable proportion also said somewhat or quite a bit. This is obviously quite a concern in the field — that writing a PAP can limit the scope for discovery. Others say that with a PAP you sort of write a report, and you don't get to write a convincing, theoretically interesting paper that will get published in high-end journals. What we tried to do as a follow-up to this study was to look at whether this has some credence in the data. We took a look at the NBER working papers that mentioned using some form of experiment in their work — about 11% or so did — and at whether they mentioned using a PAP: only 8% said they did, compared to 92% that did not. Then we looked at whether these papers were eventually published or not. At this stage, we see that there is a higher probability for those who did not use PAPs to get published, compared to those who did. So on the face of it, this gives some credence to that concern. However, we see that, once published, those that used PAPs had about a 61% chance of being published in a top-five journal in economics, compared to those that didn't use a PAP. So yes, a PAP may limit publication, but it seems like the papers that do come out land well when you write a PAP. We also see a higher rate of citation for the papers based on PAPs. Okay, thank you. So, the balance sheet, just to end my presentation. It appears that PAPs, as they are currently being written, are not doing everything that we thought they would do: they don't tie hands as much as we thought they would. Many PAPs are not sufficiently clear in terms of the hypotheses to prevent HARKing, and the details they provide are insufficient, not precise enough, to prevent fishing.
The papers do not always follow the PAPs that were registered, and deviations from the PAPs are not always faithfully reported. However, the majority of the PAPs we analyzed are sufficiently clear: precise and comprehensive enough to limit the scope for fishing and p-hacking. So we are about halfway there, and as I mentioned from the start, we view this as a glass half full. I agree with Marcel's point that we are just at the beginning of this process; PAPs only started around 2011 in political science and economics. So this is the start of the process, we are all learning, and I think there is room for improvement. We need to clearly specify what should go into a PAP. There are a few recommendations from other scholars, but as a field we haven't clearly specified what should go into a PAP and what constitutes a sufficient one. As for us, we are going to assess whether there has been any progress, especially along the four dimensions I mentioned, potentially adding Olmo's recommendation about inference. For 2019 and 2020 the coding is ongoing, and we hope to show how much progress has been made and, if there has been progress, what might explain it. So thank you very much for your attention. All right, thank you, George. I'll open the floor for Q&A. There is already a question from Marcel van Assen, who asks: are there templates for pre-analysis plans? Do journals have these templates, or are there general templates that economists and political scientists can use? So there are no templates as such, but there are a couple of recently published papers that try to give guidance on exactly what should go into a PAP. EGAP, for example, also has a sort of template for what people should put into their PAPs at registration. So there are key elements that EGAP asks scholars to specify, and the AEA registry also has a couple of things that people should specify: clarity of the hypothesis, sample size, and things like that are asked by these registries. So I guess that kind of template is available. But whether scholars specify these elements precisely enough to aid reproducibility and to tie a scholar's hands is another question. Yeah, Bob Reid also has a question. You mentioned four key elements, and you also alluded that inference criteria might be a good addition to that list of important elements. Bob asks: shouldn't the specification of how you sample, the sampling plan, also be on this list of critical elements that should be in a PAP? Yeah, I think it's an important element of PAPs in general, and in our study we coded multiple such things: sample size, whether there is IRB approval, the direction of hypothesis testing, whether it will be a one-sided versus a two-sided test. We coded all of that, and we think they are important for PAPs. But in terms of judging whether a PAP is complete, theoretically it's not very clear to me whether how you will collect your sample, or its size, goes to the issue of testing a hypothesis and whether it is supported or not supported by the data you use. To be precise, if we understand pre-analysis plans correctly, they say:
"This is what I'm going to test in the world, this is the independent variable I'm going to use in testing it, this is my operationalization of this hypothesis, this is the model specification, and this is how I'll come to a conclusion as to whether my hypothesis is supported or not." I think these four elements, plus the direction of the test, would be sufficient to tell us that. All right, thank you, George. With that I'll close the Q&A and introduce the next presenter, Sarahanne Field. She is a PhD student in metascience at the University of Groningen, and I believe she only has about half a year left before she has to finish her dissertation, so it's going to be an exciting next couple of months. She is also part of the executive committee of the Platform for Young Metascientists; I'm also part of that. It's a community of young metascientists, early-career meta-researchers, and we try to keep each other posted about the latest work in metascience. We also host conferences. So if there are early-career meta-researchers listening, please contact Sarahanne or me; maybe we can link up. With that shameless plug, I'll hand the floor over to Sarahanne. Please go ahead. Thank you, Olmo. Let me just get my slides up. So, as Olmo said, I'm Sarahanne, and thank you for joining us at almost the end of the last day of the conference. As I said, I'm sharing a little bit about an attempt that my colleagues and I made to link up trustworthiness, and perceptions of trustworthiness, with pre-registered findings and registered reports, so I'll basically share that article of ours. Our idea was to test the hypothesis that people trust findings from pre-registrations and registered reports more than findings published using the traditional publishing model, with no registration at all. It sounds pretty simple, and quite intuitive; that's one of the promises of pre-registration especially, that when you go to the effort of setting out all your plans, the findings that result from that process are probably going to be a little more high quality and hopefully a little more trustworthy. So, again, we wanted to test this empirically. We did two studies: one was a pilot study, and the other, which we just call the full study, was a registered report itself. We were basically looking to ask active psychological researchers about trustworthiness, so I mined the Web of Science for email addresses of people who had recently published on psychological topics in peer-reviewed journals. The design was a two-by-three factorial design, with registration status as the key independent variable and trust as the dependent variable. With that little outline, I'll show you basically exactly what we did. We randomised people into one of three conditions for the independent variable of registration status: pre-registration, registered report, or none. Just a quick caveat: I might at times lump registered reports and pre-registration together, as if they're the same thing. They're definitely not, but sometimes for brevity it's easier to say them in the same breath, because they have similar benefits and limitations and they're similar in a lot of senses. So yes, people were put into one of three conditions for the study: pre-registration, registered report, or none. So like I said... actually, let me just go forward a little.
I'll stick with this. Very briefly, we showed people a study vignette: basically a little fake study. We manipulated pre-registration status within that tiny fake study, and after they had read the little mini study we asked them about their trust in that study and its results. So, a step back: people in the none condition were not shown any kind of registration at all in their mini study; they just saw a mini study with some fake results and some simulated data. People in the pre-registration condition were shown a little mini study that included details indicating that the study had been pre-registered, that things had been put up on the OSF and that kind of thing. That was intended to get people reading that material, that mini study, to think it was a pre-registered study. Similarly, the registered report condition contained text in the mini study indicating that it was actually the result of a registered report. And then, after they had read the mini study with the manipulations, we asked them whether they trusted the study or not. It maybe sounds a little more convoluted than it actually was; it was a pretty simple study, I think. So I'll first show you the pilot briefly. This was the mini study that people saw: just a very brief study description, and you can see, highlighted down the bottom there, it says the paper makes no mention of any previously documented sampling plan or study design. This is the material that someone in the none condition would have seen, so this is a little mini summary or study which has just been published in the traditional sense. Then, once they had seen that material, that little mini study, we asked them how much they trust the results of the study. They indicated that trust on a one-to-nine Likert scale. In hindsight that's maybe a little too simple; there are a couple of methodological issues with the study that I can see in hindsight, but I'll talk a little bit about those later. Importantly, we also asked them a little about their opinions on pre-registration and registered reports, which comes in handy later, as I'll show you. When we asked how much they trust the results of the study, we wanted to give people a sense of how to orient their response around the question of trust. Ideally, people would just intuit what trust means when asked how much they trust the results of a study, but participants tend to want to know what you want them to do. So we gave them some adjectives to help them think about trust in the way we wanted them to think about it. These are not synonyms for trust; they are facets of trust, and we thought that things like reflecting a true effect, or being reliable or valid, are ways that we evaluate research articles in real life. So when we asked how much they trust the results of the study, that's what we wanted them to think about. Up until this point I haven't mentioned the other independent variable, which is familiarity.
That's because it's completely outside the scope of this talk and this session, but it is in there, so what you're seeing is just an extra independent variable. This plot basically represents what we expected: we expected the registered report condition to receive the highest trust ratings of all the conditions, higher than the pre-registration condition, and overall we expected trust to be greatest in conditions where there was some kind of registration, whether a pre-registration or a registered report, behind the finding. And that's what we see here: the error bars are relatively big, but we do see a relatively neat, somewhat linear trend. If you look at the results numerically, they certainly echo that; it's a very strong effect. We conducted a Bayesian analysis, a Bayesian ANOVA. Because we had two independent variables we couldn't just run a normal Bayesian ANOVA (well, technically you can, but it can be a little complex to interpret the results), so we calculated an inclusion Bayes factor, which basically allows you to compare all the terms in the model that include one variable against the rest of the terms in the model. So it basically lets you focus on just one main effect; that's the layman's explanation of it. And you can see at the top, where I've highlighted: the pre-registration status effect was really quite compelling, regardless of whose criteria of evidence you use, Jeffreys' or Lee and Wagenmakers' markers, it doesn't matter. That's a very, very extreme inclusion Bayes factor, so including pre-registration status in the model was a good move. In this case, for the pilot, we can certainly see that pre-registration and registered reports really do boost perceptions of trust. So at first blush it's looking good, right? Because, again, people are taking up this initiative because they want to start increasing trust in research again; we've seen so much drama with findings lately, and so much undermining of trust, that we want to know that what we're doing is yielding more trustworthy research, or at least that people think that's what's going on; that matters too. So, to begin with, this is some good stuff from the pilot. Let's move on now to the full study results. There are two differences between the pilot and the full study. The pilot had a simpler study vignette. I did a content analysis on the qualitative results of the pilot study: I asked people what they thought about the materials, whether they gave them enough information to make a trustworthiness rating. Does that little study vignette give you enough information, is it realistic enough for you? And most people said no; most people said they needed more information. So I knocked up this little thing here: I simulated some more data, which looked nicer in a plot, threw it into LaTeX and basically typeset it so that it looks like a mini study, to make it a little more realistic and give people a bit more detail about the study. So that was one thing that was different.
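To make the inclusion Bayes factor described above a bit more concrete: it is the change from prior to posterior odds for the set of models that contain a given effect versus those that do not. Below is a minimal, hypothetical Python sketch; the model space and the prior and posterior probabilities are invented for illustration and are not the study's actual numbers, which came from a Bayesian ANOVA over registration status and familiarity.

# Illustrative sketch only: an "across all models" inclusion Bayes factor
# for one effect, computed from a set of candidate models.

def inclusion_bf(models, effect):
    """models maps model name -> (prior probability, posterior probability).
    Returns posterior odds divided by prior odds for models containing `effect`."""
    prior_with = sum(pr for name, (pr, po) in models.items() if effect in name)
    prior_without = sum(pr for name, (pr, po) in models.items() if effect not in name)
    post_with = sum(po for name, (pr, po) in models.items() if effect in name)
    post_without = sum(po for name, (pr, po) in models.items() if effect not in name)
    return (post_with / post_without) / (prior_with / prior_without)

# Hypothetical two-factor model space (registration status, familiarity):
models = {
    "null": (0.2, 0.02),
    "registration": (0.2, 0.55),
    "familiarity": (0.2, 0.03),
    "registration + familiarity": (0.2, 0.30),
    "registration * familiarity": (0.2, 0.10),
}
print(inclusion_bf(models, "registration"))  # about 12.7: evidence for including the effect

Tools such as JASP report a closely related quantity (sometimes computed over matched models only); the sketch above is just the simplest across-all-models version of the idea.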
The other difference was that, you might recall, in the pilot the manipulation information was relatively subtle: just a little bit of text within the main text of that material, and it doesn't really stick out. We thought it was possible that people actually missed the manipulation, so in the full study we added this text, we call it a profile, for people to read before they actually saw the vignette. We wanted to make sure that the prime was fairly strong, fairly salient. The highlighted part is what we were focusing on, our primary manipulation. This particular text is what someone would have seen if they were in the pre-registration condition. So if we go all the way back to my methodology: this is what someone in the pre-registration, or middle, condition would have seen before they saw the study vignette. Like I said, we basically explained that the fictional study had been pre-registered; we didn't use the words pre-registration or registered report at all. We felt they would be a bit too loaded and that people would very quickly see what we were doing, so we wanted to make it a little subtler than that. So, over to the findings for the full study. Again, we asked the same questions, and we got their opinions on the initiatives, which, well, I'll go into that in a moment; I'll leave it for now. So, this is the kicker: the effect we saw in the pilot study, that nice, expected linear relationship between trust and pre-registration status, completely disappeared in the full study. That is a little frustrating and upsetting, because now we don't really know what's going on: is perceived trustworthiness indeed increased for a registered report finding or a pre-registered finding? If you look at the plot on the left, plot A, this was the plot for the full data set, with no exclusions. You can already see there's some crazy stuff going on: really big error bars, a lot of crossover, and there isn't even a trend that maps neatly onto the pilot findings. At this juncture I'll briefly mention the exclusions. Like I said, the full study was a registered report itself, and in stage one, before we got in-principle acceptance, the reviewers suggested we should include a manipulation check, basically to make sure that people were being primed in the way we wanted, which makes sense. So after they had answered all the questions, we asked people whether they had noticed the prime: was the fictional study you just saw a pre-registration, or was it a registered report? So we explicitly asked them whether they had noticed the manipulation. We ended up excluding two thirds of the data based on this manipulation check, which is pretty crazy. Going to plot B, the post-exclusion plot: it's even crazier than the first one. The bars are even bigger, if that's possible, and there's basically not much going on at all, which in a way makes sense. Like I said, we had to exclude two thirds of the data based on the exclusion criteria, which left a total sample of 200. If you're splitting 200 people over six conditions, that's roughly 33 participants per condition, which is not many, so it's logical that there's a lot of noise in the plot.
But it's also the case that we made those exclusions in an attempt to cut out some noise; that's often what such criteria are meant to do. So it's a little disconcerting that the plot is still as crazy and messy as it is. Note also the y-axis for plot B: the range for the dependent variable of trust, again on a one-to-nine Likert scale, is extremely wide. So it's just all sorts of crazy. It sadly makes me think a little of this cartoon here; the error bars weren't quite greater than eight standard deviations, but they were pretty big and a little upsetting. Why is it upsetting? I'm someone who is a very firm open science advocate. I love open science, I love slow science, and I love celebrating error in science. However, I really firmly believed that people would think: yes, pre-registration and registered reports, that is so much more trustworthy. I really believed that what we would see in real life would be what we saw in the pilot, that decent, strong linear effect, that strong relationship between perceived trustworthiness and registration. So that's why I was upset: because I really believe that pre-registration and registered reports should be enhancing people's perceptions of trust in the findings. So, those are the plots. If we look at the inclusion Bayes factor now, compare it with the one from the pilot, which was 1,400-odd; this one is a Bayes factor somewhere between zero and one. We calculated an inclusion Bayes factor of 0.359 in favor of the alternative hypothesis, which means that if you invert that 0.359 (roughly 1/0.359, so about 2.8) you actually get some weak pro-null evidence. And again, that inclusion Bayes factor basically lets us separate out the main effect of pre-registration from the rest of the model, so there is basically no point in including pre-registration as a term in this model; it does nothing to explain variance. There's nothing going on between trust and pre-registration. Which is why the qualitative questions and results were interesting. We basically asked people: what do you think about these initiatives? One in five people said they were neutral about them; about one in ten people said they're good but not practically useful; half of people said they're good and very useful, so half of people are on my sort of thinking; and a couple of people even went so far as to say that they don't think they're a good initiative at all. And again, one in five people said something else; basically, it's complicated is what it boils down to. People wrote a lot of different things, and like I said, I did a content analysis on these comments because we had a lot of data, and it was good to see what themes would come out: what were people saying the most, what were people's main concerns about registered reports and pre-registration. A lot of people mentioned hindering creativity or exploration in the scientific process, and some people mentioned that it slows science down. Other people in this session have already mentioned those, so I won't go over them again, but they are very common problems people have with pre-registration and registered reports. People also said pre-registration and registered reports are not a panacea.
For one thing, they don't fix QRPs and fraud. Well, you're right, they don't, and there's no system we could possibly have in place that would fix QRPs and fraud, because I firmly believe that if someone is going to systematically and premeditatedly commit QRPs and fraud, if they're deliberately doing the wrong thing, they are doing bad-faith science and they're going to keep doing that; we can't change that. What it does fix is that it helps good-faith scientists do better science: it helps people plan out their work and it helps them avoid bias. Again, for people who really want to do the right thing, it will help; I firmly believe that. People also said, no, it doesn't fix the file drawer problem. They're right, it doesn't, but I think it can help avoid a big file drawer problem, in that it gives people an opportunity, or a place, to put stuff that didn't quite work. You've got a pre-registration document that you've already uploaded to the OSF; you run the study and it didn't work the way you wanted it to, which is a bummer. What you can do is put your data up, put your materials up, put your codebook up online, and someone else can come and pick up where you left off if you don't have the time to chase up the failed study. So no, it doesn't fix the file drawer problem, but it gives people an incentive not to just stuff their results away; at least they can put something up for their work. People also had issues saying that it might replace critical thinking in science. I don't think that's true at all, but it is a concern that a lot of people raised. So, just to finish my main thought here: we still don't know if people see pre-registered or registered report findings as more trustworthy. I think it's complicated. I think trust in science is a tricky thing, especially now. What both George and Marcel mentioned is that we're in the infancy stage of reform in psychological science and in other sciences. These are early days, we're still ironing out the creases, and I think that's a really positive thing to think about. We can work on increasing trust, we can work on this stuff, and I think the work that Olmo, for example, and George have been doing, looking at things like specificity and adherence to pre-registration plans, is part of that. Some of the participants who responded in the qualitative part of my study mentioned these things: they said, well, if people don't adhere to their pre-registrations, they're useless; they said if people don't include enough detail in their pre-registration, it's not going to be helpful. And they're right, so I think it's about finding these little creases and ironing them out so that we can increase trust, and so that it's not just perceptions of trust, but that science is actually becoming more trustworthy. That's my hope, and with that positive note I'll end my presentation. Thank you very much. Thank you, Sarahanne, for a nice and transparent talk. You actually wrote a blog post about your experiences with doing this study as well, which I highly recommend; I think you can find it via Twitter, on the Bayesian Spectacles blog. Yes, highly recommended. Then a question from the audience: he was wondering whether trust was higher among those who were familiar with registration.
So I think that was based on slide 10, where the bars, I think, were a little higher for those familiar with registration, so I guess he was wondering whether that's actually the case. Let me just share my screen again. So, this is not familiarity with pre-registration; this is familiarity with the fictional author of the study. I'll show you. You can see here on this slide, "a researcher with whom you have never collaborated". This refers to the fact that sometimes people judge studies differently when they come from people they know; that's a fairly common thing to expect, and we wanted to account for that in our study. We anticipated that people might have issues with the protocols to some degree, but that if they knew and trusted the person who published those pre-registered findings or registered reports, they might actually find them more trustworthy, or trust the study more. The pilot study is a little neater to show you: when the author of this fake little study was familiar to them, they were more likely to trust the findings overall. But that's not quite what the question was about. Maybe I should have explained familiarity to prevent confusion; I just felt I could gloss over it because it added time to my talk and seemed to be outside the scope, but I'm sorry about that. I think it's clear now. So Daniel Larkins has a question as well. He was wondering: there's pre-registration as an ideal, how pre-registration should ideally happen, but there's also pre-registration as a current practice. Do you think respondents would differentiate between these two? Which do you think respondents had in mind? Possibly; that's actually something I discuss in the paper itself, I think. There was a big time gap, like a year and a half to two years, between the collection of the first data and the collection of the data for the full study: the pilot data was collected in about 2016, I think, and the full study in 2019. And I think that, basically, in the time between those two points people have become potentially more aware of the issues with pre-registration and registered reports, with pre-registration more than anything, actually. I think people have started to explore pre-registration for themselves in that time, and it's quite possible that people are starting to realize, yeah, there are issues with pre-registration that need to be ironed out. It's quite possible that what we had in our minds was the ideal; I was certainly very idealistic about it when I first started this study, back in 2015 or so. I didn't realize myself all these complications: that pre-registration is difficult to do right, and that as a system it's hard to get going and to be consistent across people, across disciplines and whatnot; there are complications. And I think it's quite possible, especially at the later data collection point, that people were starting to touch on that and say, hey, I like the idea, but it's complicated. I think that's completely possible. Yeah, that's a good observation. Yes, thank you. That nicely links to my final remarks. So thanks, Sarahanne. I'll share my screen once more. So, let's see.
So, if there's one thing I think our talks have shown, it is that pre-registration is indeed not a panacea, and there are still a lot of things going wrong, also because pre-registration is still in its infancy. So, to answer the question: does pre-registration actually allow others to transparently evaluate the severity of tests? We're not sure. But we do have some conclusions. First, the quality of pre-registered hypotheses is low. Selective hypothesis reporting is prevalent. Also, important study elements are not adequately described, neither in the pre-registration nor in the paper, which makes it really hard to assess pre-registration's effectiveness. Pre-registration does seem to have the expected effect on the proportion of positive results: that proportion decreases for pre-registered studies compared to standard studies, but is still higher than in registered reports, which show roughly the proportion you would expect beforehand. And it's also unclear whether pre-registration increases trust in science, as Sarahanne outlined. So all in all, I guess it's a mixed bag, a little bit of a sad result; we're definitely not there yet. But I also think there's a lot of room for improvement. For example, registered reports could resolve the problems with standard pre-registrations, because reviewers can then flag unspecific or non-reproducible pre-registrations. I think the registration infrastructure can be improved; the Center for Open Science is taking huge steps there, and there are more and more templates for pre-registration. Also, skills can be improved, so this will be a matter of education, also for us as meta-scientists. In general, all of this leads to more work for meta-scientists, which I guess is a good thing, if only the funding bodies would actually fund our work. So with that I'd like to conclude the session. I'd like to thank all the speakers and all those who were here, even on a Saturday evening European time; it's 9 p.m. now in Amsterdam, so I think I'll just grab a beer and go hang out with my friends. In any case, thanks everybody for being here. I think it was a really nice session. Thanks a lot for organizing. Thanks.