and welcome. This is a webinar about the new secondary data pre-registration template available on the Open Science Framework. Joining us today: Olmo van den Akker, PhD candidate at the Tilburg University School of Social and Behavioral Sciences in the Department of Methodology; Pam Davis-Kean, professor of psychology from the University of Michigan; and Marjan Bakker, who should be joining, just joined us, great, thank you, also from the Tilburg School of Social and Behavioral Sciences in the Department of Methodology. These are leaders who have been developing, for several years now, standards and best practices for working with pre-registration, particularly with existing data sets. Today's webinar will give an overview of the issues surrounding and benefits of pre-registration and the specific implications of doing so with secondary data analysis. It will give a brief overview of how the template was created, the history of that, and where it exists right now; give a brief demo of what it looks like on the OSF; and then we'll follow up with a couple of points about important reminders for when you're writing up the results of pre-registered work and, importantly, ways to provide feedback on how it looks in the registry and the types of disciplines it is most relevant to. We'll wrap up with the Q&A session, where a lot of the great discussion will occur, so please use the Q&A box. I'll be monitoring that throughout the webinar, so if there's a point of clarification, I'll stop us and interrupt, but otherwise most questions we'll leave till the end. With that, I'm going to stop sharing my screen and pass it over to Olmo. So good afternoon, everyone, or good morning or good evening. I'm not sure which time zone everyone is in. For me, it's afternoon, so I'll stick with that. Yes, so today we'll be discussing our template for the pre-registration of secondary data analysis. And I would like to start with basically the basics.
So what is secondary data? Of course, the name secondary data implies that there is also primary data. And primary data is basically the data you get when you go through the standard empirical cycle, the scientific method. So we have a scientist, they have a research question or a hypothesis, and based on that research question or hypothesis, they collect data. So the goal of the data collection is to specifically address that research question or hypothesis. That's the standard empirical cycle, and that's what we call primary data. So secondary data. Then someone else comes in, basically. So we have another scientist, and they have a different research question than the first one I talked about. But they will use the data that was collected by the other scientist. So they will use someone else's data and answer a different research question than what the data was originally collected for. So that's an important distinction between primary and secondary data. With primary data, the data has been collected specifically for that research question. With secondary data, that's not the case, but the data can still answer questions relevant to that researcher. Secondary data also includes revisiting your own data. So in this case, we have the same researcher again, but now they have a different research question or a different hypothesis, and they go back to the data they originally collected for another reason. That would still constitute secondary data. We also call secondary data existing data, so you can use those as synonyms. And finally, it's not always the case that individual researchers collect data to answer their own research questions. It can also be institutions that collect a large sample of data that other researchers can use.
So in that sense, there is no primary data, because the institution didn't really collect the data for a specific hypothesis, but there is secondary data, because other researchers are using the data for their own research questions. So hopefully that sets the stage. Now for some examples of secondary data. A lot of it is large representative surveys. For example, the European Social Survey, which is a wide range of data about all kinds of attitudes of Europeans. There's also the World Bank, which provides open data that other researchers can use. So these are the institutions that I just talked about. And there are of course the individual researchers who share their data on OSF, for example, and other people can use that data to do their own analyses. So that also constitutes secondary data analysis, and now, in this age of open science and open data, we should see that more and more often. So now to some characteristics of secondary data. One thing is that there are often many different variables, respondents, and time points. As I just showed you, those really large European surveys contain a lot of data points and a lot of variables, and that also makes them into a kind of data buffet: there are so many things to choose from. And it also means that there are so many researcher degrees of freedom, because I have this construct I want to measure, but I can do so in this way, this way, or this way, and which one am I going to choose? So again, we have a researcher and a research question, and they can use one variable of the data set, but they can also use another, and another, and maybe the last one finally gets them a significant result. This is a problem, because that's data-contingent decision-making: you first look at the results and then decide what you do. We also call that p-hacking, and one solution to that is pre-registration. So that's the reason why we're all here.
And so that basically blocks these researcher choices and limits you to the ones you pre-defined or pre-registered. So in that sense, pre-registration can prevent this data-contingent decision-making. Also characteristic of secondary data is that it is often really hard to come by. For example, data from hunter-gatherer societies that other people will use because they cannot collect it themselves. So it's often unique data. And it also means that there will be many analyses run on this data. And that means that either you yourself have prior knowledge of the data, because you're revisiting the data over and over again, or, because a lot of other people have used it, you might already know things about the data from them. So this leads to another problem, which we call HARKing. Here the lines represent, basically, theories. So you have prior knowledge of the data, and then one of the things that you measure comes out as significant. And then you say, after you found that: oh yeah, but I knew about it all along after all. So we call that HARKing, which is hypothesizing after the results are known. And again, a solution here is pre-registration, because if you register all three of these studies, then people know: oh, they did these three studies. And you cannot say: oh, I only did this one, and oh yeah, that supported my hypothesis. So again, pre-registration can help here. So pre-registration, how can we define that? For primary data, we can define it as specifying your research design, data collection plan, and analysis plan before data collection starts. The data collection is of course part of that research pipeline. And for secondary data, it's a little bit simpler: it's specifying your analysis plan before doing the data analysis. And one thing you might know, and at least I have experienced, is that this is really hard. And that's why we need to provide researchers with guidance. So that's what we try to do with this template.
And Marjan will now go into the process of developing this template. Yes, thanks, Olmo. So as you showed in the slide before, in traditional pre-registration for primary studies, the way it was developed, you basically pre-register before data collection. And therefore some people say: well, we cannot pre-register secondary data, because the data is already there. And this was kind of a discussion, and it was discussed especially at the SIPS meeting in 2017, because a group of people saw the advantages of pre-registration and wanted to use it for secondary data as well. But how to do that? So at the Society for the Improvement of Psychological Science, they had this discussion. I wasn't there yet, but it also resulted in a paper, and a lot of people were already involved in a discussion about what is important for the pre-registration of secondary data. Another result of that discussion was that at the next SIPS meeting, in 2018, and you can go to the next slide, we organized a hackashop. Sarah, who was involved in the discussion, wanted to do a hackashop and invited me and Olmo to join because we had experience with pre-registrations. And this hackashop was basically a combination of a workshop and a hackathon. In the workshop, we gathered some people who had experience with pre-registration, also with secondary data, and we just had a Q&A session about all the topics related to pre-registering data that is already there. So that was really great, and I think we had a wonderful discussion, and most of the people also stayed, because we continued with a hackathon for the rest of the day. And in this hackathon, the goal was to get a first version of the template for pre-registration with secondary data. And we did really well on that day. I think it was really a good day, because we had a very nice group of people.
We started with some general discussions about what is important in secondary data. So one of the things is, for example, what Olmo also said: you often have many variables in these data sets. So it's really important that you decide beforehand which variables you will use, and that you really describe which ones you are going to use. And another thing that is of course really important with secondary data is that you are really transparent about your knowledge of the data, because the data is already there. While in a pre-registration of primary data people can be sure that the data is only collected afterwards, here the data already exists, and the researcher should just be transparent about their knowledge: have they used the data before, which variables, these kinds of things. So that was an important part: the description of the prior knowledge. And then we basically divided the people into groups, and they went over the OSF pre-registration template and discussed whether its parts should be deleted, kept, changed, or added to. And you see a little bit of a picture that I was able to dig up of our whiteboard, which contains all these different aspects that we went over. We went over it in two teams, and we also tried to test it: we had a kind of example study that we used to see whether it worked. And it was really nice, because by the end of the day we had a first version. Of course, it was not finalized, so we wrapped it up by email, and I just checked: in July, we submitted it to the OSF. So I think this went really fast, which was really nice. You can go to the next slide. But of course, we were still discussing things by email, and we were thinking: maybe we should give researchers even more guidance and do something more with it.
And therefore we proposed to write a tutorial paper, and we asked the SIPS participants to join. Maybe there were even more people at SIPS, but at least these ones stayed involved and helped write the tutorial paper. Again, we went over it in different groups, trying to come up with an example. And this tutorial will be published in Meta-Psychology, and the preprint is already available. So I think that's my part right now, or do I have another slide? No, I'll take over from you. Yes, thanks. Yeah, I'll go a little bit more into this tutorial and into the template itself and how it looks on OSF. So, the paper was using an example, and this example also features in the OSF template. The example was: are religious people more pro-social than less religious people? That was the research question. And we of course used secondary data: a data set which is called the Wisconsin Longitudinal Study. Those are more than 10,000 graduates from Wisconsin high schools; it started in 1957 already, so it's a really long-term data set. So we used that to answer this research question. And the research question was based on the golden rule, which is that you should treat others as you yourself would like to be treated. So once again, the characteristics of secondary data that I discussed earlier: it's a data buffet, which leads to the risk of p-hacking, and there can be prior knowledge, which leads to a risk of HARKing. And for both things, pre-registration can be a solution. Both things are relevant to primary data analysis as well, but they are more salient here because of the features of secondary data. So what did we try to do with our template? We have a couple of guiding heuristics, and one is: specify your variables and statistical analyses in detail.
So this goes back to the data buffet: because there are so many variables to choose from, you should be really specific about which ones you are going to use in your analysis, so that you cannot use A and then B and then C and then only report the one that works. So how does this look in practice for our research question? Well, of course, we have to specify the data. There are several waves of data, and we decided to choose this one. Within that wave are several surveys, by phone or by mail; we chose these two. So we chose some questions about volunteering to represent pro-social behavior, and there were some religiosity questions to measure the degree of religiosity people have. So this is the example we use throughout the template. If you look at the template on OSF, you can see that this example comes up many times. But we should be even more specific, actually. We should also provide the actual variables, and preferably how they are named in the data. So they are called IL-001-RER, for example. Please also specify that, because that really makes things easier. And we used two measures to assess pro-social behavior. As I said, it was about volunteering: we used a binary measure (did you do that in the last 12 months?) and we used a more continuous measure. And also specify how all these items and variables are scored. So there's a lot of information you need to provide, and the OSF template gives some guidance here. So this is a sneak peek. On the left you see basically all the categories of questions you get. We have now already landed on the variables one, because I wanted to highlight that: it's so important for secondary data. And at the top here, you can see the question about variables. There's a little bit of an explanation of what you should provide, and you can also click here on "show example" and it pops open, and you get basically what I just showed you on the slide before.
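Pinning down the exact variables and their codings before touching the data can even be expressed in code. Here is a minimal Python sketch of that idea; the variable names, the yes/no coding, and the `select_and_recode` helper are hypothetical illustrations, not the actual Wisconsin Longitudinal Study codebook names.

```python
# Sketch: restrict analysis to pre-registered variables only, and fix their
# coding up front. All names and codings here are hypothetical examples.

PREREGISTERED_VARS = {
    "volunteered_12mo": "binary: 1 = volunteered in last 12 months, 0 = not",
    "volunteer_hours": "continuous: hours volunteered",
    "religiosity": "continuous: religiosity scale score",
}

def select_and_recode(raw_row):
    """Keep only pre-registered variables and recode yes/no to 1/0."""
    row = {k: raw_row[k] for k in PREREGISTERED_VARS if k in raw_row}
    if isinstance(row.get("volunteered_12mo"), str):
        row["volunteered_12mo"] = 1 if row["volunteered_12mo"].lower() == "yes" else 0
    return row

raw = {"volunteered_12mo": "yes", "volunteer_hours": 4.0,
       "religiosity": 3.2, "unrelated_item": 99}
print(select_and_recode(raw))  # the unregistered item is dropped
```

The point of a fixed variable list like this is exactly the heuristic above: any variable not named in the pre-registration never enters the analysis, so you cannot quietly swap in A, B, or C after seeing results.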
All these variables and how they are measured, et cetera. So this gives you some guidance as well. What you also should do is specify your analysis, because it might be clear which variables you have, but how you are going to use them in the analysis itself is of course also vital. Here we use two different regression analyses, one using the binary dependent variable and one using the continuous one. And also here, it's important not to forget outliers, missing data, and inference criteria. And of course in the template, you're also prompted to specify these. So here's an example of how it looks. Now we've come to the analysis category, and you see at the top "statistical models". So here is a small description of what you need to specify, and again there is an example based on our religiosity research question. And I just thought I would highlight this: we also include our code in our example, which is just a good practice that I wanted to put forward here in this webinar. So that was one thing, the data buffet: basically, be specific about everything you're going to do. Then there's also this prior knowledge issue. You might already know some things about the data. So how do other people know that you're not actually using that prior knowledge to choose which analysis you're going to do? Because if you already know that A leads to B, if you've already seen the data and you know that A is positively associated with B, then it doesn't make sense to test that hypothesis, right? Because you already know the answer. So to assess those things, we also ask in this template: what is your prior knowledge about the data set you're using? So this is an answer. You don't have to read it; I'll just walk you through the most important things. If you already used some variables in another analysis, name those variables. If you found any associations of those variables with other variables, note them. What are the consequences of those associations?
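The pre-specified models themselves can be sketched too. The tutorial's actual analyses are run with standard statistical software on the WLS data; the toy sketch below, with made-up numbers, only illustrates what pre-specifying the simpler of the two models looks like: a linear regression of the continuous volunteering measure on religiosity, fit here with the closed-form least-squares solution (the binary outcome would analogously get a logistic regression, typically fit with a stats library).

```python
# Toy sketch of one pre-specified model: volunteer hours ~ religiosity.
# Data values are invented for illustration; they are not WLS data.

def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y ~ x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

religiosity = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical scale scores
hours = [1.0, 2.0, 2.5, 4.0, 4.5]          # hypothetical volunteer hours
intercept, slope = simple_ols(religiosity, hours)
print(round(intercept, 2), round(slope, 2))  # prints: 0.1 0.9
```

Writing the model down this concretely, ideally together with the outlier rules, missing-data handling, and inference criteria mentioned above, is what makes the analysis plan checkable after the fact.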
So what effect does that have on the hypotheses that you're now trying to test? And are you going to control for it, basically? So in this example, we included a control variable in our analysis, because we had some suspicion, based on our prior knowledge, that it might be relevant to our new analysis. So in this way your prior knowledge can even improve the analysis you're doing. So this is again what it looks like on OSF. You can see here also that this field can be blank; the red asterisks indicate obligatory items. And this is actually the most important one, because it's so unique to secondary data; that's why it has a red asterisk here. And an accompanying item is this one, which asks you to list your prior work using this data set, including any relevant variables you analyzed, and preferably for each author separately. And this is important because a list of these publications or talks can help others assess whether your answer to question 18, the previous question, is plausible. So you can say, okay, I have no prior knowledge of the data, in the previous question. But if you then list eight papers in which you used this data set, it might not be plausible that you indeed have no prior knowledge at all about the data set. So that's why we also included this question. And here's an example of an answer: all three authors, so you do this for each author separately. Show at which conferences you might have presented this data, and also whether you have submitted it somewhere or whether it's published somewhere. This is important information for others to check the plausibility of your prior knowledge. And of course these items and questions are linked together under the header "knowledge of data". So there are some potential difficulties with this. We are aware of that, and we'll probably get questions about this in the Q&A as well. So, what prior knowledge is relevant? That's a hard question, because you can't really be sure.
So that's why one guiding principle, I would say, is: be inclusive. If you're in doubt, just include it. Preferably more information rather than less. And another one is: what if you have unconscious prior knowledge? So that means that you forgot something about the data, for example. How do you get to know this? Here, you can try to be exhaustive. So really dig through your memory and see: oh, I did this paper, and through that I learned something about the data set. So really carefully list all the previous things you've done with the data set and be exhaustive in that, and some things are bound to pop up. So now I'll give the floor back to Marjan, because our template is also actually being used, which is great, and hopefully even more so now that it's on OSF. Marjan is going to walk you through some research examples that were based on our template. Yes. So for this webinar I was checking some examples, and it's just really nice to see that people are really using it; I think that's just great. You also see here I have some titles. So we see different examples: things about personality traits and political preferences, more developmental studies, but also what predicts teachers' ICT-related professional development. So it's used for a lot of different things, and I think that's great. And I just wanted to show some parts. So this is a pre-registration by Caitlin and Pamela. And here we see a little bit of the data description. So they describe which data set they used, the years that it's from, where it is available, when it was downloaded, and also that they haven't looked at it further, and where the code book can be found. So this is more of a general data description. And if we go to the next slide, they also describe very neatly what their knowledge of the data is. So Caitlin hasn't worked with it, but Pam is very familiar with it, though not with the variables that she uses right now; she has used some other variables.
So, like sociodemographic variables, parents' educational attainment, and so on. So I think this really shows how you can describe that. And it's not that something is wrong or correct; it's just about being as transparent as possible about your prior knowledge. So here we see another one, which is about self-reported personality traits and political preferences. And they do multiple studies: some with primary data and also some with secondary data. You can go to the next slide. So here they also give an example of a paper in which they used the same panel data, and a description of that. And they also state that they don't have prior knowledge about the trends that they will be working with, although of course they now also have this publication, which is probably about different parts; I didn't go into that in detail. But here you can see that you can just show what you have already done with the data. And there's also this example, where they combined different data sets. So individual researchers had all collected data about personality traits and also cortisol and testosterone levels, I think, and now they wanted to combine everything. I think this is really great and interesting, to really combine all this data. But it also means that, of course, researchers already know their own data, because they have collected it and maybe published about it already. But they were really transparent about that. So for example, they mentioned that a selection of the researchers may know individual data sets very well, particularly given that some publications have resulted from their use, and then they refer to the end of the document with all the publications. Again, this is to show you how you can just be transparent about this. And of course, if you pre-register, almost always something doesn't work out as you planned.
And I wanted to show these two examples, because, well, deviations are not bad or anything, but you should again be transparent about them too. And we found here two deviations that were presented in notes in the final papers. One is about a convergence issue. If you run complicated models, then you often encounter convergence issues, and then of course you have to make some decisions, and they describe here what they did. And in the second example, they stated that in their pre-registration they intended to have at least 4,000 participants, but in the end they had just a little bit less than 4,000. Also, I think, just a minor issue, but at least they presented it very transparently. So, just to wrap up: for this pre-registration of secondary data, it's really important that you specify your variables. Be very clear about which variables you want to include, because often there are many, many variables, and also about how you will analyze them. And of course, the prior knowledge: do this for every author that is involved in the project. And it's better to be very extensive about it and just add all the information that you have, and be honest. You might have some prior knowledge, but just be open and honest about it. So pre-registration is hard, and I think the pre-registration of secondary data is also hard, because of these additional things like the prior knowledge. But it's also really worthwhile, because in this way you can have others, and yourself as well, have more confidence in the validity of your findings. So I think that's just really important and worthwhile to do. And I hope that this tutorial, this template, and how it's now implemented in the OSF will make it easier for everyone. Thank you, Marjan. I've just included a link to the paper there for everyone to take a look at.
I'm now going to share my screen and show you a little bit of how to access the secondary data pre-registration form, give a couple of final reminders and a homework assignment. If you're attending today, we do ask for a little bit of feedback from you, so please be ready for that in just a moment. And then we'll open up the floor for Q&A. Let me share my screen. So this is the OSF Registries. There are a couple of neat new features. If you've used OSF before, there are a couple of recent improvements that I'm really excited to show off. You can get to the OSF Registries through osf.io/registries. And you don't have to have an existing OSF project or anything like that; you can just select "add new". If you do have analysis code or a project description on an existing OSF project that you want to connect, you may do that, but maybe, at the time, this will be the first step in starting a new project. So: do you have content? No. And then this secondary data pre-registration template is available on the dropdown list. Create draft. Now, this will take just a moment as the gears spin in the background. And I'll give this draft a title, just so I can show something else in a moment. And here's where you start getting into the meat of the study. Olmo and Marjan have already given an explanation and demo of where everything is. One additional thing that I'll point out is the My Registrations tab. So once you have a draft available on OSF, this My Registrations tab shows you all of your submitted pre-registrations, and your drafts will be available here in the drafts tab. All right, let me follow up with a couple of final points. Marjan mentioned this a little bit, but there are some points that we like to emphasize once you've pre-registered and are writing up the results of pre-registered work; there are a couple of reminders to make sure to follow.
This one should be relatively obvious if you've done it before, but just remember that when submitting the results of pre-registered work for publication in any journal, make sure to include a link to the pre-registration. Each registration on OSF has a persistent unique identifier; include that when describing your study design and pre-registration. Report the results of all of your pre-specified analyses. Whether you have 10 analyses or one huge model, whatever it is, make sure to report the results of each of the analyses that were included in the pre-registration. Any unregistered analyses can and should be included, but indicate those with clear descriptions or under a different heading; report them as unregistered work. Generally speaking, that is best described as exploratory analysis that deserves to be confirmed later, but sometimes there will be a little bit of blurring in between, and that's okay too, as long as it is clearly indicated what was registered and what was not part of the pre-registration. And as Marjan mentioned, if there are any changes to the pre-specified plan, please include them; there's a template for how to document that, and otherwise they should be clearly indicated in the text. These transparent changes are a way to give better clarity and better context for the results of all pre-specified work. In one moment, once I've stopped sharing my screen, I'm going to share a link to a feedback form. So take a look at the OSF Registries, start a draft of the secondary data registration form if you'd like to, and we'd love to hear about it. We've been really racking our brains about how individuals, and how different disciplines in particular, interact with these types of issues. So let us know what discipline you're coming from, and then basically let us know what you like or don't like about the form.
If there are questions on there that don't quite make sense (sometimes even very basic words like "study" or "experiment" mean different things in different disciplines), we want to be aware of how the guidance provided by this template makes sense to you, or not. So please go ahead and use this form. We'll share it also in an email afterwards, and we'll be tweeting it out and using it for user feedback. But once you've taken a close look at the registration template, please let us know what you think about it. And with that, I'm going to stop sharing my screen and open the floor for Q&A. I see we have a couple there. First thing I'm going to do is promote the panelists; second thing I'm going to do is share the feedback form; and then third thing I'll do, oh my gosh, is stop sharing. All right, let me take a look at the Q&A here. I think there is also a raised hand, so I'll take a look at that next. Please give me a moment. Patricia asks, is there... I can take Patricia's question. Yeah, please do. Yeah, so Patricia is asking, or stating, that this registration requires a lot of ethical commitment from the authors, since they could do a lot of p-hacking and HARKing before doing the pre-registration. So they say: okay, you can just basically do all the HARKing and p-hacking beforehand and then pre-register your stuff, and then you would still get the benefits of pre-registration, for example that people think your research is more credible. And I think that is correct, and I think there's not really a way to avoid that. So if you really want to cheat the system, if you really want to be a bad scientist, so to say, then you can. Pre-registration is not foolproof, and this can still be a problem. But we tried to circumvent this a little bit by including something in our template. I don't think it's implemented in the OSF template, but it's in our original one, basically.
Like a statement saying: by uploading this pre-registration, we declare that we were truthful about our prior knowledge and that this is the only pre-registration that we are doing for this hypothesis or research question. So having such a formal statement hopefully raises the barrier for people to actually do the kind of cheating that you mentioned, because if you make a formal statement like this and you do other stuff anyway, then it's more like fraud, really official fraud. And hopefully that is a barrier for people to engage in these practices. But still, where there's a will, there's a way; you can cheat if you want to, I think. Next question. Thank you, Olmo. This one is going to Pam; it's from Ilaria. "I was wondering about the main analysis versus robustness tests. Do you recommend or suggest registering both? And if so, should that be in the same or different pre-registrations? Thank you for the answer." Yes. So we pre-register our robustness checks, because we're using correlational data, and so we almost always know we have to do robustness checks on the correlational data. However, having said that, and having done now two secondary data pre-registrations, one of the things I want to point out, because you're often using, in my case, national population studies: even though I'm really familiar with the data, the Child Development Supplement of the PSID, which I actually designed (so in some ways it's my primary data collection of which people are doing secondary data analysis, and I'm doing secondary data analysis), there are things that you think you know about how the data was collected. And then, as you're actually analyzing it, you go in thinking, okay, they asked this at three time points, but they did something crazy, like change the way the question was actually asked in the third wave, even though they're calling it the same thing. And you don't realize that till you're actually analyzing the data.
So what we do when we have robustness checks, or things that show up afterwards, is amendments to our pre-registration, and we state exactly that. We say: on this date, we found out as we were analyzing the data that the question that has exactly the same variable name, but was in year three, was actually asked in a different way. Thus we're not sure whether the correlation we're seeing, or the lack thereof, is related to the change in the question. And so we just put that in. And for robustness checks, if we realize a question has come up, or we're asked a question like, but how do you know it's not this? — and so we've added a robustness check — we do an amendment and we say: on this date, due to a question by a reviewer or a question by a lab member, we have added a robustness check, and we have not touched the data; this is what we think. And I have to tell you, it is very hard. You have to retrain yourself. You're right there in the data set, and you have to not analyze it when a question comes up. You have to stop — which is what we've done with the pre-registration — talk about it, decide how we're going to approach it, write the amendment, and then do it that way. And it's actually somewhat freeing to have those conversations instead of doing analyses on the fly and not remembering why you did stuff, which comes up during review, and to actually have it detailed. So we think of this as also just our lab manual of how we write out everything we've done and the decisions that have been made. So it really helps us. If you know ahead of time that you're going to do robustness checks, you can put them in the pre-reg, or you can use an amendment to your pre-reg, which we have used way more often than I thought we were going to; we use that quite frequently. I'm going to allow Greg Murray, who has his hand raised. Greg, be ready. Oh, I think you should have microphone ability now. Hi folks, thank you so much for this.
This is a really interesting presentation. I've been interested in this for a long time. I am the editor of a journal that would like to push more open science projects. We've made some progress with registered reports, and I would like to see some of these pre-registered analyses of secondary data. I'm wondering more about the writing-it-up-in-the-manuscript side, because sometimes reviewers have different visions of people's research than the researchers do themselves. So there are a couple of issues. Pam sort of addressed this with the robustness check answer. Say the reviewer just wants a completely different analysis of some kind. My understanding would be that you'd still be compelled to do it — well, I don't know, maybe I'm assuming something I shouldn't be. Do you still do your own analysis? And how do you say, well, we were going to do analysis A, but the reviewer didn't like it, so we had to take it out? In a registered report, you would deal with this sort of thing as exploratory analyses or something along those lines. How would you identify it in the actual manuscript — in a section called, I don't know, reviewer-requested analyses? Or, part of me says, just throw it back at the reviewer. So I guess I'm trying to figure out how to manage that. I've actually called that reviewer HARKing, when reviewers want us to go back and change our question. I'll just answer this briefly, because other people probably have opinions too. What we've done is I usually reply to that request and say: this is a pre-registered study, so we followed our pre-registration. This would be new. I can add it, but I have to add it in an exploratory section, just like you said, Greg, and I have to state that this was off the prereg. And if I do that, I can also put a supplemental in, as I just said,
as an amendment to my prereg, saying: based on a reviewer comment, we've added this; this was not the intent of the original study, and so it is exploratory in nature. What I tell my students to do, especially with our correlational data, is that they also have to tighten the p-value threshold. So I don't want to see exploratory analyses at a p-value of 0.05. I want them to be much harsher. We have to, like, charge ourselves for doing exploratory analyses, so we're looking for things at 0.001. And I explain that when I send the paper back in. That's helpful. Thank you. Thank you very much. Next question, I think from Emma Jones, asking about the conflict between disclosing prior knowledge and journals that use the double-blind peer review system — stating, I understand that there's no perfect solution, but I wonder if there have been any discussions with journal editors in this regard. Yes, no, and all the way in between. Olmo, do you want to start, and then Pam? Do you have any experience dealing with editorial processes that have double-blind peer review and registration? I saw just one example when I was going over these examples, where they kind of copy-pasted the pre-registration, I think from the OSF, into a file from which they just removed some of the names. So they tried to make it as blind as possible and used that to share with the reviewers. That might be one solution. Yeah, we just dealt with this last week. This was for Psychological Science, which has double-blind review, and we had to have a conversation about it. What do we do? This is a pre-registered study; we have it stated in the abstract. So what we did is what you would do when it's blinded to author: we said this is pre-registered, and we blinded the OSF pre-registration for review, but noted that it's available. So that's what we've done. You can also create a view-only link for the registration. We ran into this where we could do things with projects but not registrations.
So it was harder to do there. But I think you could do, like I said, the usual not-revealing-the-author approach: you can do the same thing, saying blinded for review, but this is a pre-registered study. And then you have the anonymous option as well. It is a little bit concerning that you want to state that you've pre-registered, but from a pre-registration one can find the people who actually did it. And I will tell you — just to note, this is unethical as well — people often take the title of the paper and just do a Google search on it, and they get to the pre-registration. So again, reviewers aren't supposed to do that under double blind, but I frequently hear people on Twitter say, oh yeah, I just looked it up. I'm like, well, you know, under double blind you're not supposed to know the authors. But anyway, you do the best you can and hope it stays double blind. I'm not big on double-blind review, but if that's the case for your journal. All right, this question coming in from David Disabato: How do you recommend handling the scoring process when you don't know the psychometrics of the data? For example, it would be hard to specify what scoring you will use until you have done psychometric analyses. Do you recommend doing those analyses before or after you've created the pre-registration? I don't want to be the only one answering, but for psychometric stuff, I often do know what I'm going to use, so I guess I don't know the example. In this case, if I know that I have to get internal reliability, then I state that I'm going to get internal reliability, and I state the items that I'm going to be doing that on. Generally that is also dictated by the field, because I'm using existing measures, so I'm going to be using the scale that the measure comes from. Exploratory factor analysis may be what you're referring to.
And then I would register the exploratory factor analysis, saying something like — hey, I'm not sure if this is just on my end, but there's a little bit of breaking up. Shoot. That's okay, you're coming in and out. Sorry about that. So: you can take 10 to 20% of your data, explore on that, and then confirm on the rest. You could do that with the psychometrics as well. You could look at a random 10 to 20% of the data, depending on what sample size you have, and then confirm on the rest. So again, these are all things that you can pre-register, and if you have to change from your pre-registration, you can do an amendment. Let's see. This one from Jim, I'll pass to Olmo. Oh, I'm sorry, I already answered that one. Would a way to somewhat demonstrate you've not seen the data prior to analysis be to randomly sample the data, if the size of the data set makes it feasible, with the seed of the random sample dictated by something determined after the pre-registration, such as the results of a future football match? I don't know if this would make sense or be reasonable, but it sounds cool, and it is doable. It definitely sounds cool, yeah. It's an interesting suggestion, but I think no. If you, for example, know the ins and outs of a data set and then take a random sample of that data set, odds are you still know about the patterns in the data, right? You're just taking a random sample of something you know, so in the random sample you probably also know about the patterns. That's different from knowing something about one part of a data set and then confirming in another part, like I just discussed. Here, you know something, you take a sample of it, but you still know something about that little sample. So I'm not sure this is really a solution, but maybe someone else has other thoughts.
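The split-sample idea Pam describes — set aside a random 10 to 20% of an existing data set for exploration, and keep the rest untouched for the confirmatory, pre-registered analysis — can be sketched in a few lines. This is a minimal illustration, not anything from the template itself; the data frame and column names are made up for the example.

```python
# Sketch of a split-sample workflow for secondary data (illustrative only).
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({                # stand-in for an existing data set
    "score": rng.normal(size=100),
    "age": rng.integers(18, 80, size=100),
})

# A fixed, documented seed makes the split reproducible and auditable.
explore = df.sample(frac=0.2, random_state=2023)
confirm = df.drop(explore.index)   # never touched during exploration

# Explore freely on `explore` (e.g., an exploratory factor analysis),
# then pre-register the resulting model and run it once on `confirm`.
```

The key design point is that the confirmatory subset is fixed before any exploration happens, and the seed is recorded in the pre-registration so anyone can reproduce exactly which rows were held out.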
I don't have other thoughts, because I think you described that well, Olmo, but I also want to note to people, about the real world of doing analyses: I'm a full professor. I've used most large-scale education and developmental data sets, so I have a lot of knowledge of them. I ran a center where we replicated across longitudinal data sets, so I know a lot about them. This doesn't mean I know every question or possible question that comes up. And I think the point of doing a secondary pre-reg, and why it's so important for secondary data, is so that you can restrict what you're looking at in these data sets to the questions of interest, your scientific questions, because otherwise, in correlational data sets, you're going to find lots of things that look like they matter that don't actually matter. The point of the pre-reg isn't to try to stop you from doing science; it's to make sure that the people who are reading what you're doing, and who want to replicate it or to see how theirs compares to yours, know what you did, and that the reviewers know what you did. So again, if you have prior knowledge ahead of time and you're pulling a piece, you just note that in your pre-registration and let the reviewers decide if they think it's too close to what you did, or that you were too knowledgeable and thus it moved your analyses in a certain direction. Just be upfront about it, and then the reviewers can decide whether they think it's going to change the outcome, or whether it led you to a finding that was not a true finding but a spurious one, from the way that you looked at it. I answered this question here. We've got two more questions in the queue right now. I jumped the gun on this one; I accidentally copied it into everyone's chat. That's fine, I'll go ahead and answer it right now: what's the best way to add these time-stamped amendments to an OSF registration? There are a couple of ways to do that.
And I'll, in just a moment, post some links to both of those. When you create a pre-registration, on the back end there is a live OSF project for you to store files, data sets, code, anything you want. Your amendments can and should go either in one of the wikis on that OSF project or in an uploaded document. Every time you upload it, it'll be time-stamped, and you can indicate when that change or amendment occurred. The registrations themselves are frozen. There are a few fields in the registration metadata — I'll post a link to that — where you can update the description if there's anything you'd like to note, and I'll provide some examples of what that looks like as well. But I recommend using the back-end OSF project for this. The last question is right in between — maybe primary, maybe secondary — but we'll see what the panel thinks. What about new research questions that are generated during, and not after, the data collection? Do you call those secondary data? I guess, given that the data collection has started, it could still be called a secondary data registration. What do you think? So Olmo, you kind of went over this during the presentation, right? The primary data collection was based on hypothesis testing — and I do want to note that a lot of these, as Olmo also said, institutional data collections or large-scale surveys, are collected for the community. So, not that they aren't generally related to some larger issues of hypothesis testing, but they're not specific, and so those will be utilized as secondary data. But in the case of primary data, you had specific hypotheses; now another person, another student, comes in and they have a different question.
You can register that — and Olmo, correct me if I'm wrong about this — as a secondary registration, because it wasn't what the data was originally collected for. And I'll tell you, that's an advantage, because when you get to a position where you're like, well, I would have liked to have asked this, but the data doesn't have it, that really is a secondary data problem more than a primary data problem, right? If it was your primary data, you should have collected the right data. So again, in that situation where you have a different question, pre-registering it as secondary data would be, I think, much more to your advantage than trying to do it as primary. Yeah, I fully agree. I think our template is actually particularly useful for these situations, because prior knowledge is bound to be an issue here — you're already working with the data. So the questions in our template about prior knowledge are definitely useful. A couple of answers are coming through; I think, Olmo, you might be providing a few. Last question in our last minute: David Disabato asks, what happens when you realize that the pre-registered analysis is not the right one? Maybe you attended a methods class and then you figure out a better way to do it. Any final thoughts on that? Yeah, that happens frequently. Students are using these pre-registrations a lot, and they're learning things as time goes on. So to the extent that I can catch it ahead of time and say we need to do a different analysis, that's great. Things are changing. I use missing data analysis a lot; sometimes methods are updated, and so we have to go back and change things. This is what we use the amendments for. Right now we have a pre-registration — one of the ones that were highlighted — where we're running into a binary mediation issue. We're trying to figure out what the analysis should be, so we can't formally pre-register until we figure it out.
But a lot of times people will throw in a structural equation model and then find out it's not really the appropriate analysis they thought it was. And so they do an amendment and say: upon additional information and consultation with statistical people, we've decided on this. That's an amendment. And we are at time. I want to respect everyone's time and thank our panelists for their time and for their work to date to get this up and running. We're super appreciative of it. Thank you again, and everybody have a great day. Bye. Thanks.