All right, welcome everybody to this MetaScience 2023 virtual symposium, "Towards Objective Indicators of Trustworthiness of Research Findings: Lessons Learned in the Transparent Psi Project". I'm Tom Hardwicke, a research fellow at the University of Melbourne, and I'm hosting this session on behalf of AIMOS, the Association for Interdisciplinary Meta-Research and Open Science. Before we begin, I'd like to acknowledge and pay respect to the elders and descendants of the Wurundjeri people of the Kulin Nation, who have been and are the custodians of the land I am speaking to you from. I'll also take the opportunity here to advertise the AIMOS annual conference, which is taking place in Brisbane, Australia this year between the 21st and the 23rd of November. For updates on that and all AIMOS-related news, you can follow us on Twitter or Mastodon, and you can join our mailing list on the website, aimos.community.

Today's session will last for 90 minutes. If you'd like to ask a question of the panel, please do so using the Q&A function, and I will then forward those questions to the panel. I'm going to introduce the panelists now, and then I'll hand over to them for the remainder of the session. So we have, let me bring the list up: Renaud Evrard from the University of Lorraine; Marcel van Assen from the Meta-Research Center at Tilburg University; Jacob Jolij from the association of Dutch research universities and the University of Groningen; Patrizio Tressoldi of the University of Padova; Eric-Jan Wagenmakers of the Psychological Methods Unit at the University of Amsterdam; David Vernon, who I'm not sure is here yet but hopefully will be soon, from Canterbury Christ Church University; and finally, Zoltan Kekecs of the ELTE Institute of Psychology. So I'm going to hand over to Zoltan now, who will guide you through the rest of today's session. Over to you, Zoltan.

Thank you so much, Tom. And thank you everyone for joining us today. I would like to share my screen, and you should be able to see my screen now. In this session, we are going to talk about the Transparent Psi Project, which is a large-scale, multi-site replication of Daryl Bem's 2011 Experiment 1, but it is much more than that. I will start with a very brief introduction of the project, and then we will start the discussion with the panel members, at first with some questions that we jointly formulated before the conference, and then I will turn it over to audience questions as well, and we will continue discussing from there.

So let's start with this brief intro of the project. In the old days, we were doing research studies almost completely in the dark. We did research planning, recruitment, study execution, data management and analysis almost in isolation, maybe discussing with a few colleagues, and then revealing these studies and their results only at the publication phase. After the reformist movement and the replication crisis, there were a few solutions proposed to the decreased reproducibility, and pre-registration, registered reports, open materials and open data became best practices, if not mainstream yet, then definitely recommended best practices. With these, we have started to open up the research process: we are able to be very transparent about our research plans and, to some extent, our data analysis.
And also at the results and publication phase, we are more transparent with open data and open materials. But we are still doing most of the research execution in the dark: the recruitment, the experimentation itself, and the data management. We are still not able to demonstrate their integrity. And in this transparency gap, a lot of things could happen. You could just restart your study as many times as you want until you get your desired result. You could exclude outliers, those pesky outliers that stand between you and a significant p-value. You could simply fabricate data, enter data yourself into a spreadsheet, and nobody would know, even if you used the previously proposed best practices. Or you could just do a regular study in the old way: you could have 100 dependent variables, see which ones are significant, and then retrospectively pre-register your study with only those five variables that were significant, or only those analyses that produced the significant results.

If you think that some of these are clear indications of fraud and we shouldn't be too worried about them because they are very rare, and only a very small minority of researchers would resort to them, think again. In medical research, for example, Harriman indicated that 67% of medical research is retrospectively pre-registered, and the retrospectively pre-registered label was only put on studies where the registration was submitted on the day of submission of the paper or after. So this is only the tip of the iceberg: it doesn't include people who actually went to some lengths to conceal this retrospective pre-registration.

You might also think that replication could come to the rescue. Replication is the gold standard of scientific research, so we just need to do more replications and then we will see which studies are trustworthy, which studies are replicable. However, replications are time-consuming, and there's no way you can possibly replicate all of the research that you base your own research on. On the other hand, if you rely on other people's replications, so others doing the replications for you, you are still in the same trust trap that you are in for the original studies. What would make you trust the replication studies more than the original studies, without indicators of trustworthiness? So instead of relying too heavily on replications, there's a need to create objective indicators of trustworthiness, so that we can create trustworthy original studies and trustworthy replications that we can all rely on better.

That was the issue that the Transparent Psi Project tried to tackle. And as a case study, we used Daryl Bem's 2011 Experiment 1. That's what we chose to replicate, and we wanted to create a study the conclusions of which would be acceptable to the main stakeholders, whatever the outcome of the research might be, whether the research supported the psi or ESP theory, or the null hypothesis that there is no indication of ESP. You are probably all familiar with Bem's 2011 Experiment 1, but just as a brief summary, this was a parapsychological experiment where people saw two curtains in front of them on the computer screen and had to guess which of the two curtains hid a picture. They needed to guess either the left or the right curtain. After they made their guess,
the chosen curtain disappeared, and it either revealed a picture, which was positive reinforcement that they chose the right curtain, the correct curtain, or a gray background, which indicated that they didn't choose correctly. Unbeknownst to the participants, the correct side, the target side, was only randomly decided after they made their choice. This makes it a precognition experiment. If there is precognition, or a premonition that tips people off about which side the picture will appear on, then you would expect a higher-than-chance probability of guessing the correct side. On the other hand, if there is no precognition, then you would expect people to guess at a 50% rate, just randomly.

So our goal was to produce research whose conclusions would be acceptable to both sides of the aisle, both people who are proponents of the psi theory and those who are skeptical of it. The first main target was to produce a robust research plan that would be secure against post hoc criticism. It often happens that you produce a research study, but the other side of the theoretical aisle comes after the fact, after your publication, and says: okay, but you didn't do this, you used different stimuli than we did, you did a bad job of replicating the environment, and so on. So we wanted to create a very secure and robust research plan. The other main target was that we wanted to be able to demonstrate the integrity of each research step, so not only claim, but hopefully objectively demonstrate, that we did everything as we specified in our plans and in our final paper.

In order to create a robust research plan, the main intervention that we used was an expert consensus design. In this expert consensus design, we systematically invited every researcher who had cited Bem's paper in a research publication, and Daryl Bem himself, and also others who have contributed significantly to this conversation in the literature, and we co-designed the research plan with them. There were almost 30 expert researchers on the panel who co-designed this experiment with us, about half of them proponents of the psi theory and about half of them skeptics. One of the cool features of this consensus design was that we also pre-designed the conclusions of our final paper. So before we collected any data, we had predefined, word for word, what the final conclusion in the paper would be for each possible outcome of the study, and the panel members had the opportunity to fine-tune these conclusions. We ended up publishing the predefined conclusion that the panelists agreed on. And members of this consensus design panel are here today.

Of course, we also used pre-registration and the registered report format. And to ensure, or to be able to demonstrate, that the research protocol was delivered as we intended, we used experimental manuals and checklists, and we even asked for a video recording of a mock research session from each experimenter, to demonstrate that they had completed the training as we intended. One of the other features I'm excited about is that we used direct data deposition, which means that instead of storing the incoming data on paper or on a laptop, we directly pushed our data, as it was flowing in, to a trusted third-party, version-controlled repository.
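To make the idea of direct data deposition concrete, here is a minimal sketch in Python of what a lab-side deposition script could look like, assuming each completed session is saved as a file and pushed to a public git repository as soon as it finishes. The repository path, folder layout, and commit-message format here are illustrative assumptions, not the project's actual implementation.

```python
import hashlib
import subprocess
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical local clone of the public, version-controlled data repository.
REPO_DIR = Path("/data/tpp-born-open-repo")

def deposit_session(session_csv: Path) -> None:
    """Push one completed session file to the public repository as soon as it exists."""
    # Record a checksum so anyone can later verify the raw file was never altered.
    digest = hashlib.sha256(session_csv.read_bytes()).hexdigest()
    timestamp = datetime.now(timezone.utc).isoformat()

    # Copy the raw file into the repository working tree.
    target = REPO_DIR / "raw_data" / session_csv.name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(session_csv.read_bytes())

    # Commit and push immediately, so the data become public as they flow in.
    subprocess.run(["git", "-C", str(REPO_DIR), "add", str(target)], check=True)
    subprocess.run(
        ["git", "-C", str(REPO_DIR), "commit",
         "-m", f"Deposit {session_csv.name} at {timestamp} sha256={digest}"],
        check=True,
    )
    subprocess.run(["git", "-C", str(REPO_DIR), "push", "origin", "main"], check=True)
```

Committing a checksum alongside each file is what later lets an outside reader verify that the deposited raw data were never changed after collection.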
This way, we are able to demonstrate that the raw data we collected were unchanged throughout the research process. We also used born-open data, which means that this repository was public from the get-go, so people had access to our data as the data were flowing in. If there's a tech-savvy or data-savvy person, they could just use this born-open data to reproduce our findings and follow our study, and how our findings changed, in real time. However, most people probably don't have the expertise for that, so we created something that we call the real-time research report. This real-time research report meant that, through a Shiny app website, people could see our results analyzed in the way that they would be published in the final paper. So as data were flowing in, people could follow live: if we stopped the study now, what would the findings of the study be, or what would our conclusions be?

We also applied tamper-evident software methods. This means that we are able to demonstrate that the experimental program, the software that ran the experiment itself, was unchanged throughout the whole process, so we didn't introduce any bias into our study through the experimental software. There were also external research auditors, and an IT auditor, who evaluated the integrity of our research, our data and our procedures. And we used the whole entirety of open science: open materials, code and data, of course, and our paper is published open access. So basically, with this methodological toolkit, we were able to patch up the transparency gap and open up the planning, study execution, data management, data analysis and results phases of our study much better than with the previous best practices.

Let's see briefly what we found, with the involvement of 2,115 participants. This is more than 20 times the number of participants in the original Experiment 1 in Bem's paper. We found that the probability of successfully guessing the correct side was 49.89%, which is very close to the 50% expected by chance. And indeed, the data support the null hypothesis, or the null model, indicating that there is no higher-than-chance probability of guessing correctly in this paradigm. You can also see the accumulation of the Bayes factor over time: it remains above 50 throughout most of the data collection process, again indicating strong support for the null model. So we can conclude that the findings of this study are not consistent with the predictions of the ESP model in this paradigm, and that the failure to reproduce the previous positive findings with this strict methodology indicates that the overall positive effect in the literature might be the result of recognized methodological biases rather than extrasensory perception or psi. However, we cannot, of course, rule out the existence of psi or ESP with this study alone. Nevertheless, we can say with quite big confidence that this particular paradigm, the one used in Bem's Experiment 1, is very unlikely to yield evidence for the existence of psi or ESP or precognition.

I just want to acknowledge the gracious funding from the BIAL Foundation for this research project, and the donated time of the consensus panel members. I'm not sure if I mentioned it, but Bem himself was also on this consensus design panel. And of course the co-authors and the collaborating lab members who participated in this big multi-site research project. You can find our paper with the results of this study in Royal Society Open Science.
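The Bayes factor accumulation described above can be sketched in a few lines for the data-savvy reader. This is a toy Python example of a sequential Bayes factor for a binomial guess rate, comparing the null model (theta = 0.5) against an alternative with a Beta prior on the hit rate; the Beta(1, 1) prior used here is an illustrative assumption, not the prior agreed by the consensus panel.

```python
from math import exp, log
from scipy.special import betaln  # log of the Beta function

def log_bf01(hits: int, trials: int, a: float = 1.0, b: float = 1.0) -> float:
    """Log Bayes factor for H0 (theta = 0.5) vs H1 (theta ~ Beta(a, b)) in a binomial model."""
    log_m0 = trials * log(0.5)                                    # binomial coefficient cancels out
    log_m1 = betaln(hits + a, trials - hits + b) - betaln(a, b)   # beta-binomial marginal likelihood
    return log_m0 - log_m1

def sequential_bf01(correct_guesses):
    """Recompute BF01 after every trial, mimicking the real-time accumulation plot."""
    hits, curve = 0, []
    for n, correct in enumerate(correct_guesses, start=1):
        hits += int(correct)
        curve.append(exp(log_bf01(hits, n)))
    return curve

# Hypothetical stream of guesses (True = correct guess); BF01 > 1 favours the null model.
demo_curve = sequential_bf01([True, False, False, True, False, True, False, False])
```

BF01 values above 1 favour the null model, and the larger the value, the stronger the support; the real analysis used the panel-agreed prior and stopping rules rather than these defaults.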
So with this brief introduction to the project, I would like to again thank all of you who have joined this event, and the panelists who are giving their time, and let's have a good discussion about this project. Before the conference we had an email discussion, and based on that discussion we formulated a few questions that might be worth discussing here. One of the first questions, which I'm personally very interested in, is related to what you think about the conclusions of the paper. As I indicated, the conclusions that you can read in the paper were pre-formulated in this consensus design process, and we didn't change them after the data collection. So one of the indicators of success of this consensus design process, to me, would be if people say that they still agree with this conclusion. So what do you all think about the conclusion of the paper? Patrizio, what do you say?

There is no doubt that this experimental design, the TPP, is a serious candidate for being a gold standard for a new generation of registered reports. I'm curious about the opinion of the methodologists present on the panel, but this is a unique example of an improvement on the present standard of registered reports, there is no doubt. And consequently, the results obtained tell a clear story: with this paradigm, with this experimental design, there is no sign of precognition, of anticipation of future unpredictable events. This is clear. Of course, if we reflect on the results, some of the skeptics may think, okay, this is another demonstration that psi is an absurd hypothesis. But if we know all the literature about this line of investigation, the only possible conclusion is that this particular experimental design didn't show any. I would remind you that this is one of many different protocols that can be used to test such weird hypotheses, as I like to call them. And in the literature, unfortunately, as most participants may not know, using recruited, non-selected participants and asking them to perform a forced-choice task in a normal state of consciousness, as the supporters of the null hypothesis would put it, usually shows the least reliable results; sometimes there are positive results, but with very tiny, tiny effect sizes. So I recommend to all who are, say, satisfied by these findings about psi: don't forget the constraints on generalizability, sorry for my pronunciation, that are recommended in all experimental designs. So, to end the story, the conclusions are correct. There is no sign of precognition, anticipation, et cetera, with this particular experimental design, but I recommend not generalizing this finding to the whole field. This is not only a theoretical but an empirical question, whether we find null results applying different experimental designs.

Thank you, Patrizio. Marcel?

Yes, thank you, Zoltan. I agree with the conclusions. I think it is a really excellent study; the methodology is superb, with many innovations in research, I would say, and I'm very much in favor of the consensus design that has been applied. And I agree with the conclusion that there is no evidence of ESP in this particular framework. But from a metascience perspective, I'm very much interested in how this affects us as scientists, which is our profession: our belief in whether ESP exists or not. Because when we do research as researchers, the outcomes of that research should affect our beliefs, and that's what I'm interested in.
I'm looking forward to the discussion.

Thank you, Marcel. How about you, EJ?

Well, I think it's an interesting, high-quality methodological demonstration for outsiders, but when it comes to whether it changes our beliefs, I don't think proponents would greatly change their minds. And had the results come out differently, let's say the effect had been 55%, right? We had probably agreed to say something like "this is strong evidence that something's going on". I would totally not have subscribed to that conclusion. I would have said, okay, what is more miraculous: that Zoltan is actually completely fraudulent and fakes these data, or that there's this effect that violates the known laws of nature? Well, sorry, Zoltan, but I would have bet against you. So it would not have changed my mind greatly had the results come out differently, I have to say. So I think it's a very interesting project, but mostly because it kind of demonstrates what is possible; I'm not sure every project needs to go to these lengths, but I think it's a great methodological demonstration. In terms of changing beliefs, I'm not sure; I've encountered very, very few researchers who've ever changed their minds about anything. But yeah, I think maybe some of the proponents or advocates have something to say about that as well.

Okay, so Jacob, I think you were next.

Yeah, thank you so much, Zoltan, for your pretty nice introduction, and again, my congratulations on this project. I think it's a brilliant demonstration of how to do things right, in particular on a controversial topic like this one. Interestingly, in terms of changing beliefs and how you think about science and about the ESP hypothesis, for me this study did change my mind. I was open to the possibility of the existence of ESP, but after this particular study, I think it's quite difficult to maintain a belief in the existence of ESP as something that exists as a signal, as something that can be used as a way to gather information about the environment intentionally. So in that sense, it did change my mind a bit about the ESP hypothesis. I think it's difficult to maintain that ESP exists as something that we can measure and capture in an experimental setting. That's the issue, probably. You know, there are many more ideas about what might cause all these weird effects in the research, ideas that go beyond questionable research practices. So in that sense, psi research in general, experimental psi research, is still very much worthwhile. But the ESP hypothesis, the idea that you can actually measure people intentionally obtaining information from the environment by ways that we don't know about yet, is very unlikely given this particular set of results. So it did change my mind in that respect. Thank you.

Thank you, Jacob. Renaud, was it you? Yeah, I think it was me. Yes. My opinion, like Patrizio's, is that the TPP is an upgrade of scientific practice with all the best controversy-killing tools. This allows us to test some hypotheses about how psi phenomena respond to such experimental conditions. But this research is still imperfect. You hired Jim Kennedy, a parapsychologist and specialist in clinical research, for an external research audit, and he listed good and bad points and showed an asymmetry.
I quote him: the consistency between the preregistration and the published report was very good, except that the discussion of protocol deviations was only adequate. Key aspects of the management of the data collection server and software were rated as not adequate and, when combined, represent a serious deficiency. That is the summary of his report. And he said, and this is the asymmetry, that if the study had found evidence for ESP, concerns about the protocol deviations and the inadequate computer system management would have been much greater. So this is one aspect for the discussion too.

Yeah, thank you. Thank you very much, Renaud. We can get back to these points, and I'm making notes furiously. But let's continue with David.

Good morning. So I think it's worth just trying to address that issue of "would it change my mind?", and I feel like a typical psychologist because I'm going to say yes and no. In the sense that, yes, it changes my mind about this particular paradigm, this protocol if you like, Experiment 1 from Bem's suite of experiments. This tells me, in a very methodologically robust way, and there are a lot of things, a lot of steps that you've gone through here. Somebody said it's the gold standard; Patrizio, I think you said this could be the gold standard of methodological research. It just struck me that perhaps it's even the platinum standard; it's just above and beyond the expectations there. But it does change my mind in the sense that I would now think to myself: I don't think there's really much mileage in trying to rerun that type of paradigm in an effort to find a, whatever you want to call it, retroactive priming effect, precognitive effect, however we want to frame those words. So I think that's a really nice notion, and it might be that that's one potential benefit or use of this, because it is very methodologically intense. And as I've already spoken to you about, one of the things that concerns me a little is that this is quite costly as well. The whole thing is very, very intense and costly. That limits it, that introduces a whole range of restrictions, and for psi research generally, as for all research, I'm sure every researcher is completely aware of how difficult it is to get funds to support their research. And this would be even more challenging if every piece of psi research had to go through this. So in terms of changing my mind: yes, I think this tells me that there's probably not much in that paradigm. Does it change my mind in the sense that I now think psi, or whatever that may be, is nonsense? Not really. I think it's a point on the graph. Hopefully what we could do is accrue more information, and that might be one way forward. We take the, I don't know, the areas where psi is proposed to be robust, I can think of things like ganzfeld et cetera, and then maybe run them through this procedure. So it's not that we use this on everything, but we try to take the best examples of psi, whatever they may be, and use this sort of extremely rigorous methodological approach to see if we can then elicit those effects. That might be one way forward.

Thank you all. So I think this is a nice segue into talking about the costs and benefits of the different methods. Sam unfortunately couldn't make it to this panel discussion, but in our pre-conference discussion he sent an interesting point about this.
So I would like to quote him here to start the next round of questions. He's saying: we can pat each other on the back about our greatness all we like; this doesn't address the elephant in the room. Does anyone here actually think it is feasible to do a study like this regularly? I doubt it, but even if that were the case, it would be unwise and irresponsible. As I said above, there are many more worthwhile hypotheses that could result in societal benefit, or even just considerable advances in understanding the brain and the mind. Running a study like the TPP on every important question would be utterly wasteful and slow science down to a glacial pace. I do agree that slow science has its value, and in general we should slow things down a bit, counteracting the incentives of publish-or-perish, et cetera. But I don't think running a TPP on everything is the answer either. Research needs to find a good compromise, a good trade-off, achieving methodological rigor without making progress grind to a halt. To me, the answer to these challenges is not running super-large-scale studies with data being uploaded centrally, but to improve transparency, regularly check the reproducibility of uploaded data and published results, and facilitate independent replication. Of course, occasionally studies like the TPP can and should be done, but I would expect them to be rare.

So what do you think about this, the cost-benefit of the methodological toolkit that we developed here, and whether, or which, of these methods should be applied in mainstream research? I think Patrizio was first.

I think that the two main improvements of the TPP that I suggested in previous email exchanges could be reframed as transparent pre-registration protocols. In particular, the born-open data and, even more, the application of the pre-registered statistical analyses to the ongoing data, respecting all the processing declared in the pre-registration, is something that can be used in every registered report. This would clean up the gap that you mentioned in the introduction, because even registered reports, which are, in my view, some of the best controls on the intentions of investigators, always involve voluntary or involuntary deviations from the pre-registered declarations, in particular about data processing, and some degrees of freedom in the actual statistical analysis. If we include these two innovations in all registered reports, I think it would be a great leap, a great jump in the quality of experimental research, not only in psychology but in many other disciplines.

Thank you, Patrizio. So, Jacob.

Yeah, thanks. So, one of the interesting things about the TPP and the methodology that it uses is, yes, it's very costly and it's very labor-intensive. In that sense, I completely agree with Sam's point that this might not be the way forward for every single psychological experiment. On the other hand, just yesterday was the presentation of new Dutch projects for research infrastructure. One of them that quite struck me was a very big project about research in ecology. Basically, it was a centralized data store with real-time data, born-open data, and a very nice dashboard to do your own analysis and research. So in that sense, the methodological interventions that you see in the TPP are also finding their way into other research areas.
So I really do think that we are at a point in time where the way we do research can be radically transformed, because we have new ways to gather data, to share data, and to do all of that in an increasingly effective way. The nice thing about this ecology project was that it was actually open to anyone: if you have a camera in your own backyard, you can just plug it in, which is brilliant. So in that sense, I do think that what we are pioneering here with the TPP will find its way into mainstream research. On the other hand, it's also important that we keep in the back of our minds that part of the problem we're trying to address here is not just methodological. It has to do with trust in science, it has to do with trust in hypotheses, and it also has to do with the quality of our theorizing. In the end, the TPP was aimed at the case of one single experiment where we believe that there will be a difference between two conditions. We don't know what's causing the difference, we don't have any kind of prediction about the size of this particular effect, and we don't understand under which specific circumstances the effect occurs, if at all. So in that sense, what we have been trying to do in ESP research, and I've done quite a few of these studies myself, is simply to find a difference. And as we all know, as every single panelist here knows, when you're trying to find a difference in a data set, well, it's very easy to fool yourself. So in that sense, I think that another thing that we should work on, not just as a psi community but as a psychology community in general, is the quality of our hypotheses: we should go beyond a hypothesis that says "we expect a difference between group A and B", because finding a difference is actually too easy. We should be a bit more specific with our hypotheses. So yeah, that's what I'd like to add, two things: we are already working on the infrastructure, but what we also need to work on is the theoretical side of it, alongside the methodological innovation.

Thank you, Jacob. Marcel, please go on. I need to get my charger, I'll be back in a minute.

Yes, so I think this project had many methodological innovations that make this research and its outcomes very trustworthy. But on the other hand, I think you need to be a methods expert to be able to do all this, and so as not to exclude all the scientists who cannot do this yet, I would not be in favor of having many such projects or of requiring scientists to do these projects. However, what I believe is important in research, and I think we should do more of, are three steps. One, the consensus design approach: use it more often. I do not see it a lot yet. It would be great if, before starting your research, you asked particularly the people who do not believe in the theory to co-work on the design. So we could use that more often. Two, use pre-registration more often. This is now happening more in psychology, but unfortunately most people who pre-register their research do not do it well yet, so there's a lot to do there. What I mean by good pre-registration is that, in advance, you also pre-register the code for the main analysis of the hypothesis you want to test. And three, I would also love to have open data, open code and open research materials. With these three pillars, I think we would do a great job in empirical research.

Thank you, Marcel. So, David.

Yes, I think I would echo that, really. There are a lot of steps here in this piece of work.
And I mean, they're all very, very good. But I think it would be laborious in the extreme to expect every psi research project to go through this. That doesn't mean we can't incorporate some of them, and some of them might be a lot easier to incorporate than others. Things like the direct data deposition and born-open data, I think those might be a little bit easier to set up. And even things like the lab logs and all those records, those things aren't so onerous really, and could, I think, be expected much more upfront. So there are certain aspects of this that I think would be easier to implement than others, and I don't see why that would be a problem. I could imagine a situation, as we move forward, where funding councils would expect this of all research that they fund. Certainly in some of the projects that I've been involved in, and some of the research funding issues that I've been associated with, it's becoming much more the norm for funding councils to expect things like pre-registration, open access publishing and so on. So data deposition would be a natural step forward, I think, all of these things. So, you know, I see this much more as a gradual process. You've set a sort of very high bar with a range of methodological approaches that we can adopt, and it won't always be possible to adopt every single one of them, but over time we can, you know, take them on one by one. So I see this as very much a stepped approach.

Thank you, David. And EJ?

Well, it reminds me a little bit of this project I was involved in testing the facial feedback hypothesis, and in particular the idea that if you hold a pen between your teeth, your face is forced into the smile position and you rate cartoons as being more funny. We did this as part of a registered replication report; I think at the time that was for Perspectives on Psychological Science. In the end it involved 17 different labs, and we had a sort of proponent of the effect who vetted the protocol, and I have to say, this was one of the most intense research experiences that I've been involved in. It took forever. I had hired an assistant specifically for this project who did most of the work, but it was still quite an effort. And that was only on our part; then there were the 17 other labs that had to videotape the participants to make sure that they were holding the pen correctly and do ratings afterwards. It was quite something. And this project went beyond what we were doing at the time. So I can only congratulate everybody who worked on this project, because I know how time-intensive this is. And I do think that if people want to apply this procedure, well, maybe another good thing about it is that if you want to use this particular procedure, it forces you to think very carefully about your hypotheses beforehand and to select hypotheses that you think are really important, because if it's something that you don't think is really important, you're not going to go through all this trouble. And in general, and this is something we haven't really discussed, it's a little counterintuitive that replication research can be so exciting.
I remember when I had a student who came in looking for a research project and I proposed some replication, and she was very disappointed because she thought it was extremely boring. But it became immediately clear, as soon as she contacted the original researcher, that it was actually really exciting, because people immediately get a little edgy when you tell them that you're going to replicate their work. And then, of course, there's the question of what will happen: will the effect replicate or will it not replicate? So I think there's more prestige for researchers associated with doing replication research than people assume, but I mean, what are the current rates of people actually doing replication research? Still very low, I think. So that is something that I would like to see changed as well.

Thank you so much, everyone. So before we go on with some pre-formulated questions, I think it would be great to open up for audience questions now, because there was, I think, some confusion about how long this webinar would last. We will go for 90 minutes, but still, some people might have expected one hour. So let's see what the audience has to ask us. Tom, could you relay some questions from the Q&A?

Yeah, we've got a couple of questions. There's one from Chris Roe; I don't know if they want to ask that question themselves. But in the meantime, there's one from Jason Chin, which I think relates to something very early on in your presentation, Zoltan. He says: isn't retrospective registration a far cry from saying data fabrication is widespread?

Yes. I mean, saying or pretending that you had the research plan and the hypothesis before you collected data, while actually you only formulated this registration afterwards, I think is very similar to fabrication. I can very easily imagine a researcher who has, okay, a million-dollar grant, and the p-value is 0.06, and he would just think: oh, I just need this tiny push, you know. I know that the effect exists, everybody knows that; it's just for some dumb reason that one of the participants was drunk when they came in; I mean, just nudge these numbers, you know. It is, I think, a similar kind of fraud. People just rationalize it in a way that makes it seem less severe, but I think it is very similar. And we don't have a clear marker of how often actual data fabrication is going on, while this at least has some clear records that we can rely on. So it could be a proxy for that sort of fabrication.

Okay, Chris, would you like to ask your question?

Sure, can you hear me okay? Yes, thank you, yeah. Great, yeah. So I'll essentially just read what I've written. Thank you for the presentation. This is certainly a gold standard design with respect to precautions against QRPs, in what I've called the meta-experiment. But from my perspective, in reading the paper, this has been at the expense of paying due attention to the experiment itself. Patrizio, I think, mentioned something along these lines earlier. Very little consideration is given in the paper to the participants and their participatory experience. For example, the recruitment of participants is described cursorily, some through course credits and by other means, and that's about the extent of the consideration.
And it's not obvious from the paper, though you mentioned this last night at another talk, that 60% of participants actually completed their trials in a group setting, which is a very different psychological experience. There's some reflection in the paper on the general antipathy to psi of the collaborating researchers, but that too seems to reflect a general lack of interest in how that might translate into participant expectations that could affect an outcome. Now, this isn't something magical that applies only to parapsychology, though parapsychologists have considered it in a bit more detail; psychology generally is actually replete with psychosocial experimental effects. Bob Rosenthal, as many people will know, did much work in that area. But sadly, that kind of approach in the paper tends to reinforce the public's perception that psychologists see participants as unidimensional data generators rather than reflective human beings who respond to the circumstances that they find themselves in. So it's more of a statement, but I'd be very interested in your reflections on that as a statement.

So, is there anyone who'd like to respond to that? Yes, Jacob?

Yeah, thank you. I really think it's a very interesting and worthwhile statement. Also, I'd like to take it together with the next question in the Q&A, which basically asks, well, what if these psi phenomena occur in more naturalistic settings? So basically, this is a kind of alternative view of psi phenomena. Interestingly, there are some alternative ideas to this ESP hypothesis. The ESP hypothesis is that we can actually use some anomalous mode of information transfer to inform ourselves, whereas alternative takes on psi basically see psi as a probabilistic phenomenon, things that occur in relation to, well, some alternative models of the mind-brain relationship. Some of these theories rather explicitly predict psi phenomena to occur, but only if the setting is right; and, ironically, the setting being right means rather unrestricted settings. Basically, what we're doing as psychologists is that we're trying to isolate particular phenomena from the natural environment in which they occur, to really pinpoint and try to study that particular thing. That's what experiments are basically for: trying to reduce all kinds of environmental factors and only focus on one thing that's interesting. It works in many, many different situations, for many different kinds of phenomena, but there are some theorists, and I think with some reason, who are saying that in particular for these kinds of phenomena, reducing your setting, or reducing the degrees of freedom with which these kinds of effects can express themselves, to just the press of a button might not work. And it ties in to Eric's question: does it only pop up when we're not looking for it? Well, very conveniently, that might be the case. But it also ties in to Chris's question with regard to the role of the participant. So if you reduce the role of the participant to a button-pressing machine, it might not be conducive to a lot of the phenomenal richness of that experience. So in that sense, this resonates with me. Unfortunately, it also makes it a bit more difficult to study these issues in an experimental setting, because if you have to pay attention to the phenomenology as well, it kind of clashes with the quantitative way that I think about psychology, sort of mainstream psychology.
And having said that, I would love to stay on a bit longer, but I see that my colleagues are waiting for me for my next management meeting. So again, my congratulations to all of you; I really liked this discussion. Thank you very much for having me here. And Zoltan, again, my congratulations. Brilliant project, really great.

Thank you so much. Thank you, Jacob, for your participation, and thank you for answering this question. Marcel, do you have anything to add? You're muted.

Yes, thank you. So when I was involved in this project, on the panel, I thought about the ESP phenomenon, and I thought, well, I believe it's unlikely to exist, and particularly unlikely that we all have it, but perhaps there are a few people who may have it. And, I don't mean it sarcastically, but Jesus-type persons who do have the ability. Therefore, I really insisted on looking at the extreme performances also and doing statistics on those. And even when looking at the best-performing persons in this environment, they didn't perform better than chance. So if you would argue against this particular setup that maybe some people are disadvantaged by it, I would add that it would then be quite unlikely that all of them are at a disadvantage. So, following up on this, I think this research is best served by looking for people who may have ESP and doing strict tests on them, right? It's cheaper also. And if indeed we find that there are such people, and there's no strong evidence yet that there are, then we have something, and we can continue doing research based on that. But this general population research, I think, may not be worthwhile, because given these types of results, it's rather implausible, I believe, that we all have this ESP ability.

Yes, thank you, Marcel. One thing I would like to add to the question by Chris is that the experimental manual and also the checklist contained some instructions on how to keep the atmosphere that was described in the original Bem paper, and in the training videos we were looking for these aspects as well. So we were not completely disregarding this aspect; we put some effort into keeping the same kind of environment and also the feel of the experiment as in the original Bem study. That being said, we cannot objectively demonstrate that we were able to keep this up each time; we didn't ask participants about their feelings during the experiment and so on. So it might be that we failed in this; we cannot demonstrate it.

Okay, David, do you have anything to add?

Yeah, I was just going to, in a sense, echo that point that you've just made, trying to speak to Chris's question or issue, which is this idea of how we treat the participants and how we interact with them, how we recruit them, all of those things. I do think this is an issue that we deal with really quite poorly in psychology generally, not just in, what do we call this, parapsychology. In a report it's fairly straightforward and sort of quantitative: this is how many we got, and this is their age, and we talk about that. We don't often report on the interactions between the experimenter and the participant and those things, and yet we know from psychological research that those things can have an effect.
I mean, in other areas that I've worked in, trying to measure things like creativity in the lab is always a challenging aspect, and how you interact with the participants can have a big impact on that. And I think it's just something to keep in mind. It's one of those, what is it, Rumsfeld things: there are many more unknowns here. But I do think, and I guess in a sense I feel a bit like a lawyer here, in that I'm just being very specific when I look at this paper and think, okay, it tells me that it didn't work using this particular paradigm, and that paradigm has to include things like how we recruited participants and how we ran them, every aspect of that. And if we change any aspect of that, that may well change the outcome, because these things can have an effect.

Okay, so I see that there's someone with a raised hand among the participants, but Tom, could you give us a hint of how we should progress in terms of the order of questions?

Yeah, Sif, is it? Sorry if I pronounced that incorrectly. If you want to ask your question or make your comment, go ahead. Other than that, we don't have any more questions, so I think maybe if you carry on, then I can sort this out in a few minutes.

Yes, okay, so until we hear back from them or from Tom, we could address some of the other questions. So another thing that I would be interested in: what do you all think about the role of automation in research in increasing credibility? It is becoming more and more relevant with the new progress in AI technology. There have been some research studies that have been carried out autonomously by AI now, and this is within arm's reach in our field of science as well. So do you think that employing AI and automating, or handing over, most or all of the research steps to AI agents would increase the credibility of our research projects? And what do you think about the usefulness of this kind of research in general? Yes, Marcel?

Yeah, so for the last 10 years I've been doing research on metascience, and the more I do, the worse a feeling I get about humans as researchers. Humans have many biases; confirmation bias is one of the most prominent. For instance, a good example is that researchers often look for phenomena that confirm their suspicions, while, following Popper, we should look for events that may falsify our beliefs. But we humans simply don't do that. Researchers are humans, and it's too difficult for us. So in that respect, I think we could be helped a lot by machines, by simple but strictly logical machines that may assist us.

Okay, I see. Just following up on what you said, Marcel, that people have these confirmation biases: unfortunately, as we train our AI systems on human data, we might inadvertently impart these biases to the AI. So just as racial bias is embedded in the AI systems that are used in the US for parole decisions, we might be imparting some of these research biases as well. So I would be interested in the psychology of AI related to these confirmation biases. That would be interesting. Okay, so if there's no more discussion about this... oh, David.

Yeah, I mean, I'm not sure I'd want to go so far as to say that AI should replace researchers. I love my job and I have a lot of fun doing it, and I want to carry on doing it. But I mean, they could play a role.
I mean, in a sense, one would hope that as scientists we come together in that sort of spirit of collaboration, where both proponents and skeptics will work together, and perhaps that's where the open collaboration ideas will flourish. But I could see a role for AI in making you rethink, or at least making you think about, alternative possibilities and ideas; so working with it rather than being replaced by it, that's what I think. Because, as you say, we wouldn't be privy to how that AI has been programmed or what learning algorithms it uses. One of the things that comes to mind is that I remember reading recently about an AI program that was used to identify cancer tumors, and it literally identified them using a ruler, because the slides that had a tumor always included a ruler to show how big the tumor was. And of course, rulers don't lead to cancer at all; in a sense, that's just how the AI learned. So we need to be very careful not to fall into the false assumption of thinking that somehow the AI is going to be completely unbiased. And rightly so; of course humans are biased, we're all biased. The fact that we recognize that, and we try to keep ourselves in check and allow our colleagues from both sides of the aisle to check on us as well, I think is a good thing. So I would see them as perhaps working in collaboration. I think it'd be a fun thing to have an AI as a collaborating author on your paper.

Yeah, thank you, David. So I saw that Isato asked us to read his or her question. Could you help us, Tom? Maybe they posted it in the Q&A.

Yep, so I'll read this out. The increasing role of metascience in science holds great promise and some risk. Already its influence can be seen in the growing proportion of studies that are preregistered, as well as many journals' adoption of badges for preregistration and the sharing of data and materials. In addition, many scientists now understand that the previously common practice of combing through a new dataset to find a good story, and then reframing the results to tell that story, can potentially lead to erroneous conclusions. The growing salience of metascience in the field is in many respects like holding a mirror up to science and the scientists who conduct it. On the one hand, exposure to a mirror is known to enhance conscientiousness, and indeed it seems likely that the emergence of metascience concerns may be encouraging scientists to be more disciplined in the way that they conduct their research. However, mirrors can also make people self-conscious, and it seems plausible that scrutiny of the scientific process could, at least sometimes, stifle scientific creativity and risk taking.

Okay, so there was a lot in there, but I guess the main question is: do you see any risks in increasing scrutiny and pushing down hard on this credibility point in relation to science? For example, could increasing things that could be described as surveillance stifle research creativity? So what do you think about this question? Yes, Marcel?

Yes, sorry for talking again. We are here for that. Yeah, yeah, I understand the sentiment, but I don't agree. Pre-registration in itself doesn't stifle creativity, because you don't have to pre-register all your research; you could also do other research and tell this honestly to people. And even when you pre-register your research, you're also free to analyze your data in other ways, as long as you report how you did this.
So in principle, the creativity is just the same as before, but you also have a part that you devote to clear, rigorous, pre-registered hypothesis testing, where you do this with traditional statistics or Bayesian statistics, pick your thing. And so that would be my answer. I understand the sentiment, but you can still do your research however you like, report it honestly, and you do a great job. Although I like it when you do more pre-registration, by the way.

Yes, and if I may chime in, I completely agree that things like pre-registration need to be used in the right way. For example, in medical research there is, I think, a better understanding of the staged buildup of a research project: you start with case studies or some show of promise, then you do a feasibility study, a pilot study, and only at the very end would you get to the big RCT stage, where pre-registration is something that is crucial. In psychological science, due to the pressures that are involved in the publication process, people usually just skip the first stages, and everyone wants to do a confirmatory study immediately, or rather, they want to do an exploratory study but make it seem like a confirmatory study in the end. So I think the slow science idea could definitely be applied to psychological science, in the sense that we should do the prep work for our studies better. And pre-registration could be applied in these earlier steps as well if you want, but it is not as crucial there as for the confirmatory stage. You just need to be clear about what stage the study is in, right?

Okay, I see that one new question appeared in the Q&A. Is that so, Tom? Or am I not reading it correctly?

I think these are slightly more on the topic of AI. So I think, if you've got additional questions for the panel, and we haven't got much time left, maybe you should move on to those.

Okay. So one of the things that came up as well was that the TPP did a good job of addressing this transparency gap, but we didn't really deal with the recruitment part. So we cannot demonstrate, for example, that it was actually all unique humans who participated in the research, and not a single person at each lab who did the experiment a number of times. There are some indications in the born-open data, since the data were coming in at the same time, so it would have been very hard for a single human to use all those keyboards; but still, someone could have set up a computer system or something to produce these data. So do you think that we should extend the transparency to the recruitment stage, to be able to demonstrate that the data are coming from real and unique people, or is this something that doesn't really matter and we shouldn't focus on? In particular, in the Transparent Psi Project this was not really a concern, because it would have been just as impressive if a computer or a single human were able to do the precognition as if it were 2,000 humans; but I'm talking more about psychological science in general. What do you think? Marcel, do you still have your hand up? I'm not sure; I think you left it up. Okay, so it seems that there's no substantive discussion of that. I personally think it would be great if we could have an easy way to demonstrate this, but it would require some kind of human detector.
And I think that there will be a lot of progress on this, not in psychological science specifically, but in general, because of the improvements in AI: we will soon see some unique-human detectors that can distinguish between AI-generated content and human-generated content, and we just need to wait a little bit and then apply these techniques in psychological science.

Okay, so one other thing that came up during this discussion: for example, EJ, you said that you would have bet against the conclusion if we had found a higher-than-chance guess rate, and that you wouldn't have subscribed to those conclusions. But the conclusion would have been something like: there is strong evidence supporting a correlation between human guesses and the computer-generated random numbers. So the conclusion wouldn't have been that precognition exists, rather that there's a correlation there. And I think a simple correlation does not violate physics, so it doesn't imply retrocausation.

But then you have to come up with an explanation, right? And I think I cannot... well, if you can think of an explanation that doesn't violate the laws of physics, that would be really great. But I'm reminded of this episode where at some point they found, I think it was in Italy, evidence that neutrinos could go faster than the speed of light. And the head of a physics department in the US summarized it by saying, this is the equivalent of finding a flying carpet. And sure enough, it was an equipment malfunction that explained the measurements. So I would go that route if I saw this; I would think, we have a flying carpet here, and I don't believe in flying carpets. Obviously, I wouldn't be a Bayesian if I didn't assign probability epsilon to psi being real, but epsilon is really small for me. So I would need to see evidence for it in many different ways for me to really change my mind.

And I would say, before I forget to bring this up, I did think it was really cool to do the Bayesian hypothesis testing, to carefully think about a prior distribution, to monitor the evidence over time; and that you're able to quantify evidence in favor of the absence of an effect is also really cool. I'm of course a big fan of Bayesian hypothesis testing in general, but especially in situations like these, where you put so much effort into the data collection and safeguarding everything. I sometimes see that people then just unthinkingly use some kind of analysis approach because everybody uses it, or something like that, and I really think that's a missed opportunity. In general, I think more attention should go to methodologists for these projects, right? Because you only need one or two to work on this, and it can lift the level of the entire project; it can lift it to a whole new level. So I really thought that was a great example of the advantages of that approach.

Yeah, thank you for mentioning that. It was a very hard part of the consensus design process that some of the panel members insisted on the Bayesian approach and some insisted on the frequentist approach, and I was racking my brain about how to accommodate these, how to resolve this. I remember that it came to me during a morning run: why don't we do both? And the solution was simply to combine the frequentist and the Bayesian hypothesis tests and only proclaim support for either model if both reached the same conclusion.
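To illustrate that "do both and require agreement" idea, here is a toy Python sketch that runs an exact binomial test alongside the same kind of beta-binomial Bayes factor as in the earlier sketch, and only declares support for a model when the two analyses agree. The thresholds (alpha, the Bayes factor bound of 10, the equivalence margin), the Beta(1, 1) prior, and the example counts are all illustrative assumptions, not the decision criteria the consensus panel actually adopted.

```python
from math import exp, log, sqrt
from scipy.special import betaln
from scipy.stats import binomtest  # exact binomial test, available in SciPy >= 1.7

def bf01(hits: int, trials: int, a: float = 1.0, b: float = 1.0) -> float:
    """Bayes factor for H0 (theta = 0.5) over H1 (theta ~ Beta(a, b)) in a binomial model."""
    return exp(trials * log(0.5) - (betaln(hits + a, trials - hits + b) - betaln(a, b)))

def joint_verdict(hits, trials, alpha=0.005, bf_bound=10.0, margin=0.01):
    """Toy decision rule: proclaim support only when frequentist and Bayesian results agree."""
    p_one_sided = binomtest(hits, trials, p=0.5, alternative="greater").pvalue
    bf = bf01(hits, trials)

    # Normal-approximation confidence interval for the hit rate (kept simple on purpose).
    p_hat = hits / trials
    half_width = 1.96 * sqrt(p_hat * (1 - p_hat) / trials)
    ci_low, ci_high = p_hat - half_width, p_hat + half_width

    if p_one_sided < alpha and bf < 1 / bf_bound:
        return "both analyses favour the ESP model"
    if bf > bf_bound and (0.5 - margin) < ci_low and ci_high < (0.5 + margin):
        return "both analyses favour the null model"
    return "inconclusive: the two analyses do not agree"

# Hypothetical counts close to chance, not the project's actual data.
print(joint_verdict(hits=19956, trials=40000))
```

Requiring agreement makes the verdict conservative by design: whenever the two frameworks disagree, the rule simply reports the result as inconclusive rather than letting either framework settle the question on its own.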
And thinking back, this is kind of obvious now, but at the time we were always using either one.

Okay, Marcel?

Yes, yes. So EJ's epsilon is really, really small, and we have a lot of people in the world, more than 7 billion. So why not try to find those special people, if they exist, who can show ESP? That would be great, I think, if we could find these people, if they exist. And I think this is one of the... I'm not sure whether it was used in this particular setup, I'm sorry that I don't know this, but in general, one approach is to do a test, look at the people who perform exceptionally well, and then test only those people again, right? If they regress to the mean, then we know that it was a fluke; but if they don't regress to the mean, then something special could be going on, and it would be a much more effective use of resources to focus on the people who show some promise in this regard. So in general, I think it's a good idea.

Yes, I see in the Q&A that Felix says that this is a great demonstration of the Duhem-Quine problem, but Felix, you can probably speak to this question yourself.

Hi, sorry, I was not prepared to speak, but yeah. So in the very first question of this session you asked about the change in belief, and also what EJ said in his last comment, I think, is a nice illustration of the Duhem-Quine problem, because everybody can blame his or her favorite auxiliary assumptions. And if we all stay with our previous beliefs, how can we change our beliefs at all, if even such a study is not able to change them? I mean, is that just a function of the strength of your prior belief, which probably is very strong in this case? And related to that, I have the impression that in many adversarial collaborations, most teams stay with their prior beliefs, despite having these consensus protocols and so on. So are you aware of any adversarial collaborations where the losing team actually changed their mind, also in public?

I think I heard of one such adversarial collaboration with Kahneman being involved, where the result kind of proved both sides a bit wrong and there was a mixed outcome. But I'm not really aware of the details; I should read up on this. Yeah, so I see that we have two minutes left. Yeah, go ahead.

Sorry, I just wanted to mention that there was this recent study with many experts who looked at ego depletion, and the results there were quite clear, and I think some people there changed their minds. And there was also a collaborator of Amy Cuddy who changed her mind on power posing, I think. So it does happen, but yeah. We have these debates that have gone on for decades, right? Where one researcher advocates one theory, the other advocates the other theory, and miraculously they always find support for their own theory and against the theory of the adversary. So that's almost funny.

So this is a great area to make progress in: how to change minds, or rather how to make our experiments more effective and more impactful in changing priors, maybe. But with this note, I think we need to close our session. I would like to thank all of you on the panel, and of course everyone among the attendees who participated in this session. And let's make progress in improving science and the trustworthiness of research projects. Thank you all.