Great. Thank you. Good morning, and thank you for joining us. My name is Matt Makel, and I'm joined today by Erin Miller, Jay Carter, and Scott Peters. We're going to be talking about open scholarship, collaboration, and replication. Here's the quick menu of what we'll cover today. I'll set the stage for a few minutes with some background information and definitions so that we're all speaking the same language. Then the bulk of the session will be on five different types of collaboration: assessing overlapping established effects (a really clunky name that I'd love to replace, but it's worth checking things that have already been found), multi-team collaboration, persistent collaboration, collaborative analysis, and the name I like best, pre-registered adversarial collaboration. Then we'll do a quick wrap-up, and hopefully we'll be able to spend quite a bit of time on questions and answers. The latter portion, shown in red here, builds largely on a paper that came out at the end of 2019 that Erin, Scott, and some others wrote with me. Jay Carter joined us for a hackathon just over a year ago, in early 2020, where we were trying to initiate some action on forming large-scale collaborations. I think we had a really promising start, but about a month later the world had a different agenda, and other events took our attention away. That paper is the foundation for a lot of what we'll be talking about. First, to set the stage, I want to talk about the lay of the land and where I think, or at least where we think, things are. We think there are a lot of problems in education research. A more optimistic framing would call them opportunities for improvement rather than problems, but that doesn't make a clean header on a slide, so I'll use the word problem. We think there is a lack of reproducibility and a lack of replicability in our fields. Reproducibility is when you use the same data and analyses to check the results of a study; replicability is whether you get the same results using different data. We'll include some links later in the slides to demonstrate why I think these things are true. Questionable research practices are the gray zone between behavior the field agrees is bad and behavior the field considers best practice. That could be things like p-hacking, where you run 25 different statistical analyses but only report the ones that show the result you like, or HARKing, a term I love, which is hypothesizing after the results are known. That's where you run a bunch of analyses, get a result you like, and then act as though you were predicting it all along, even if you were not. There are also problematic publications, and the question of what to do about the overall rise in retractions in academic research. If you're interested in that, I highly recommend the Retraction Watch blog, which covers all sorts of retractions.
It even has a list, I don't know if they call it a hall of fame or a hall of shame, of individual researchers who've had a great many of their papers retracted, often for making up data. If we can't reliably trust what's published in our fields, that has some pretty big and serious consequences. And if you think that isn't enough problems and consequences, sorry, I have a couple more for you that are a little more general. There's researcher flexibility, where we can choose whichever methods we think will give us the result we want. There's a lack of statistical power, which can yield inflated effect sizes, or simply an inability to detect rather small effects, and small effects can be really important if they're what actually works. Education may have a lot of big data, with national, federal, or state agencies collecting administrative data, which can be really great, but those data are often observational, and as researchers we don't control what information is collected, which may limit what questions we can ask or answer. And then, something that has come up quite a bit in the last couple of sessions: incentives. In research overall there's often a motivation for quantity over quality, so we might know a little bit about a lot of things, but we don't know precise answers about many of them. I also want to let everyone know, and I'm sure you all know this because we're 11 months into online meetings and conferences, that I cannot see the chat while I'm sharing my screen, so hopefully my colleagues can. So where does that leave us? Well, I think this GIF tells you where I think it leaves us: there are quite a few problems in our fields, or, to be more optimistic, plenty of opportunities for improvement. A quick self-plug: if you're interested in questionable research practices, Jonathan Plucker and I will be talking about them in another conference session at 2pm today, taking a slightly different route than what we'll cover now. But I want to pivot: if we want to reduce all these problems and avoid these consequences, what can we do? As Bill Nye is telling us on the screen, I think one of the big answers is that we can work together. Before we pivot to collaboration, though, I also want to talk about the replication solution. The two solutions we'll be talking about are replication and collaboration, as you all know, since that's the name of the session. Broadly, replication is checking existing findings, and we think collaboration is a route to improving future findings by working together. Tukey said, about statistics, that confirmation comes from repetition, and that trying anything other than confirmation by repetition leads to failure and more probably destruction. I think that's a very powerful statement, and I think it applies to more than statistics; it applies to our field at large. If we cannot repeat the results that others have found before us, I don't know if we can consider ourselves a successful field, and I certainly don't think we can consider ourselves a field that should be informing practice.
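To put a number on the p-hacking problem described above, here is a minimal simulation sketch in Python (an editorial illustration, not anything from the talk itself): if a researcher runs 25 independent tests on pure noise and reports only a "significant" one, they will find an effect roughly 72% of the time rather than the nominal 5%.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_runs, n_tests, alpha = 2000, 25, 0.05
hits = 0
for _ in range(n_runs):
    # 25 analyses of pure noise: both groups are drawn from the same
    # distribution, so every "significant" result is a false positive
    pvals = [stats.ttest_ind(rng.normal(size=30), rng.normal(size=30)).pvalue
             for _ in range(n_tests)]
    if min(pvals) < alpha:  # the p-hacker reports only the best-looking test
        hits += 1
print(hits / n_runs)  # about 0.72, i.e. 1 - 0.95**25

The arithmetic is the point: with 25 independent chances at alpha = 0.05, the probability of at least one false positive is 1 - 0.95^25, about 0.72.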
And if one of you can get a result but I can't replicate it, or if none of you can replicate my results, how useful a finding are we actually producing for classrooms and schools to consider? I think not particularly useful. That's where replication becomes very important. There are two commonly used terms for slicing replications into types. One is direct replication, where you purposefully try to repeat the previous methods as closely as possible. The analogy I like to use is following a famous chef's recipe: I want to bake their bread as closely as possible to theirs, so I follow every single step they did, because I want to see if I get the same result. Conceptual replication, on the other hand, is where you purposefully change a component to test an underlying hypothesis. It could be to assess generalizability: you found this result in kindergartners, and I want to know whether it holds in first graders too; or you found this result in the United States, and I want to know whether it applies to students where I live, in some other country. Those are two very different purposes. A paper that really shaped my thinking on replication is Stefan Schmidt's "Shall we really do it again?" He described five different reasons why we might conduct replications. Verification of an underlying hypothesis is what I just described, but replications can also control for fraud, for sampling error, and for several other things. If you want to know more about replication, I'm not going to go through this list, but we will share the slides and the link to them; here are several papers that give very different views on replication, because what I've been describing is a common view, but not one that everyone agrees with. So replication is an important tool, but it is definitely not the only tool we need to avoid the problems and consequences I listed. We're going to need a bigger boat. It also reminds me of the quote, "the sea is so large and my boat is so small." When I think about getting the education research community to where I think we could be, it feels a little daunting: how do we get there, and how can I live up to that? It feels a little scary, so I'm going to need a bigger boat. That's where collaboration could be really helpful, because the weight isn't all on my shoulders; we can all work together to live up to these aspirations. So now we'll pivot to collaboration. We'll be talking about these five different models, and here's the overall typology and how we think each is relevant. And without further ado, I'll turn things over to my colleague Scott Peters.

Hello everyone, thanks for joining us today, and hello from snowy and frigid Wisconsin, where the temperature is currently negative three degrees. I agree with Matt that this first term is kind of clunky, but I'm going to be talking about two different methods of collaboration that are quite similar.
I'll treat them as two distinct things but interject some comparative thoughts across both. The first is the idea of assessing overlapping established effects. When you hear that, you should immediately think: we have some prior research about a theory, an intervention, or an instructional method, and there's some disagreement about the size of the effect. Usually, in published studies, there's going to be some statistically significant effect, which could very well be due to the questionable research practices Matt just mentioned. But usually you'll see some effect, probably positive, with disagreement across studies. Something I want to point out about both of the methods I'll discuss is a quote from McBee, in the lovely book Toward a More Perfect Psychology, about how the purpose of confirmatory research, where we're really testing a hypothesis, is to resolve, to settle, some kind of disagreement. How many times have I been in a conversation with teachers or other faculty where I've said that something works, that there's research behind a practice, and somebody holds up another study and says, well, this study disagrees? As Matt said, we know a little about a lot of things, but we don't know things really, really deeply, and that's what these two methods get at. Go on to the next slide there, Matt. The two best examples of this first method are linked at the bottom; Matt will share these, but you can also Google them, because they're open science projects and everything is openly available. The first is a little more meta, a little more focused on the scope of replicability or the size of effects within a field or a journal, so a larger scope, like an entire body of work, but it can also cover several interventions or several instructional methods at once. In the example I'll show, the Reproducibility Project: Psychology, something like 250 authors replicated about 100 experiments from some of the largest journals in psychology, with two goals: do they replicate, meaning do the replications still show significance, and if so, what is the replicated effect size? That's why the last line on this slide is the way I always think about it: it's like doing 20 years of replication in a single study. Rather than waiting for decades for people to run separate replication studies, and hoping those studies happen at all, because they're not terribly common, you wrap it all into one study, with the idea of settling the disagreement more quickly, providing a much more informative answer much faster. Matt, if you go to the next slide, you'll see the numbers for the journals studied in the Reproducibility Project: Psychology: 35 of 97 studies successfully replicated, and the original mean effect size was about 0.4.
In the replication studies, the mean effect size was about 0.2, and the replications were much less likely to reach significance. Again, these were among the biggest, fanciest journals in psychology, so this wasn't sampling the whole field; these are pretty rigorous outlets. The replications were more strongly powered and conducted by a wider range of teams, and when the effects were averaged, we saw smaller effects that were much less likely to replicate. That makes me think of the dumpster fire Matt was talking about; that's not good. On the next slide you can see another figure that I think is really telling. One panel shows the p values from these studies and the other shows the effect sizes; within each panel, the original studies sit on one side and the replications on the other. What you can see, and I like these density visuals a lot, is how common low p values are in the original studies, because statistically significant studies are far more likely to get published. That's a larger open science problem, and we see it here. So again, what is this doing for us? It lets us get a much more informative study: we get far more information from this one paper, about these different interventions and about the state of the field, than from any single study. That's why I describe this approach as meta; it's as much about the larger state of psychological research as about any one particular study. If you go on to the next slide, Matt: a lot of the teachers I work with are very familiar with John Hattie. He has this book, Visible Learning, which is like an encyclopedia of effect sizes across educational research, and it's extraordinarily popular; for a while there were maybe four names in education that every teacher knew, and he was one of them. He has this big list of effect sizes of different educational inputs, and I feel like this is the perfect context where the method of collaboration we're talking about would make the most sense. We've got all these effect sizes for all these interventions, but each comes from a meta-analysis: tons of different studies across a long period of time, maybe 100 years, in very different classrooms, probably in different countries, a lot of very diverse contexts. So we don't have a really good sense of the nature of the true effect. This is a place where, as Matt said before, we have some a priori established effect, but we need to know more about it. If you go to the next slide, we get to a very similar method that I think makes even more sense in education: multi-team collaboration. It's been called Many Labs, Many Babies, I think Many Primates; there's also an education-specific one called ManyClasses that I'll talk about in a second. This is where a group of people gets together, across many different sites, across the country or across the world, to implement a shared protocol.
We want to test the effect of X, but importantly, we want to see how it works across a very large number of contexts, so this approach is very interested in external validity. I have a foot in each pond, one in the psych world and one in the education world, and sometimes I feel like psych folks are very interested in internal validity, in cause and effect from a carefully designed and controlled study, often done in labs, hence "many labs." Educators immediately come back to that and say, well, yes, but that was in this context; I teach in this kind of school, or does it work with middle school students, or high school students? It seems to me that educators are often more concerned with external validity and generalizability, the "yes, but does it work in my context?" kind of concern. What I really like about the many labs or multi-team collaboration approach is that it gets at exactly that. Matt, if you go to the next slide, I have two figures to share, from Many Labs 1 and Many Labs 2. These focused on psychological theories and prior findings, which you can see listed along the y axis; if you're in education and not as much in psych, they might seem a little foreign, and that's fine. The small X's, sometimes hidden behind the circles, mark the effect size of the original study. The dark shaded circles are the US replications and the open circles are the international replications. It's not as simple as "the original studies were all large effects and the replications were all small," but you do see some pretty wide variability. We're dialing in on the true nature of each effect and seeing its diversity. Compare currency priming near the bottom: not much effect, basically around zero, and not much spread either. Whereas higher up, in the anchoring studies, you see a fairly consistent effect size of around one in the original studies, but pretty wide variability in the replications. That difference in context tells you something might work in one setting but not another, which is pretty important if you're going to implement a finding. The next slide is Many Labs 2; I believe there have been five Many Labs projects, and it's actually hard to keep up because they're very prolific. This time they've overlaid standard normal distributions, and you see that for some interventions the effects are small and consistent, while for others there are sometimes very large effects and sometimes even very negative effects. That's important. Imagine in medical research if we knew that a drug could save your life when you're an inch from death, or kill you tomorrow; that's a pretty wide variability of effect, and we'd probably want more information before using it. We don't deal with stakes quite like that in education, but knowing how consistently something works is almost as important as whether its mean effect size is positive or negative.
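That consistency point is usually quantified with a random-effects summary across sites. Here is a minimal sketch with made-up site-level numbers (not the actual Many Labs estimates), using the classic DerSimonian-Laird estimator to separate the average effect from the between-site variability (tau-squared):

import numpy as np

# made-up per-site effect estimates and sampling variances (not Many Labs data)
effects = np.array([0.62, 0.40, 0.85, 0.15, 0.55, 0.30])
variances = np.array([0.02, 0.03, 0.04, 0.02, 0.05, 0.03])

w = 1.0 / variances                            # inverse-variance weights
mean_fe = np.sum(w * effects) / np.sum(w)      # fixed-effect mean
Q = np.sum(w * (effects - mean_fe) ** 2)       # Cochran's Q statistic
C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - (len(effects) - 1)) / C)  # DerSimonian-Laird tau^2

w_re = 1.0 / (variances + tau2)                # random-effects weights
mean_re = np.sum(w_re * effects) / np.sum(w_re)
print(f"average effect: {mean_re:.2f}, between-site tau^2: {tau2:.3f}")

A tau-squared that is large relative to the mean effect is exactly the "works here, not there" pattern visible in the Many Labs figures.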
On the next slide I'll share an example from something called ManyClasses. I hate putting this much text on a slide, but I wanted to highlight it. ManyClasses is an effort at the higher education level to see whether the timing of feedback in undergraduate or graduate classes affects learning. If I give you feedback immediately after a quiz, versus a week later, versus a month later, does it matter? The single-class approach would answer that in one course, say an introductory psychology class. But the same question applies in a physics class, an environmental science class, a literature class, and the effect probably changes depending on the context. So what ManyClasses did, and I think they're in the middle of publishing their findings now, is implement it in many classes: they got all these collaborators together, shared the protocol, and everybody ran it in their different courses, intro physics, college algebra, that kind of thing, to see if the effect varied by context. That's really what we're getting at here: does the thing still work, and does the particular context in which it was implemented matter? I can think of all kinds of educational implications of this, so go on to the next slide. I'm thinking of the pretty well-known 2018 meta-analysis by Stockard and colleagues on Direct Instruction. You can see pretty substantial effect sizes there on that first line. But if I presented this to teachers, one of the first things they'd ask is, okay, but is it different for kindergartners? What about international audiences, or classrooms that aren't age-based? What are the specific contexts in which it works? The studies included in that meta-analysis involved many different types of research, implemented in many different contexts across a very wide period of time. The last thing I'll point out, on my last two slides, is a study I'm pretty familiar with on the effects of academic acceleration. It covers roughly a 20- to 30-year period of acceleration studies, conducted across a wide range of contexts with many different study designs, and you can see in the abstract that we got a pretty wide confidence interval for the size of the effect. On to the next slide, Matt; this is my last one. In that top row we have a pretty wide bound on the effect size: it's either slightly negative or pretty decently positive. We might want to know more; that doesn't settle much disagreement for me. If I'm a principal looking at this, should I go do this?
But in a many labs or ManyClasses kind of context, we might take a shared protocol for deciding whether or not to accelerate students and implement it in something like 100 districts across the country, to get a broad-scale sense of replicability and generalizability. That would provide a lot more information and hopefully settle some of the disagreements, because now we won't have some people citing the slightly negative study and others citing the strongly positive one; we'd have a much better sense of the consistency of the effect. Okay, and with that I think I'm about a minute over, so I'll move on to, I think, my colleague Erin Miller.

Nope, sorry everybody, you're stuck with me again. If listening to Scott's discussion of all these ways you could form collaborations makes you think, "that sounds exhausting, I didn't come into this profession to organize all these people, it's like herding cats," you're not alone. Many others have thought that sounds really hard: yes, I see the upside, and it could provide the field a lot of value, but how do I do it? That seems like a lot of work, and I already have a full-time job. And that brings us to this other form of collaboration: the infrastructure of persistent collaboration. You may think of other fields: in physics there's CERN, whose Large Hadron Collider particle accelerator, pictured on the left, is so big it runs through both France and Switzerland; or the Human Genome Project, one project with many, many contributors, producing papers and answering questions over many years and decades. Both have existing infrastructure that helps researchers ask their individual questions and run their projects, because they're working on things bigger than their own papers, perhaps even bigger than their own careers and research agendas. Several years ago a group of researchers in psychology saw this and said, hey, we could benefit from this in psychology, and they formed the Psychological Science Accelerator. I'm going to talk a little about that and then make some connections to where education research could benefit, because psychology is a little closer to home for education than physics or the Human Genome Project. The Accelerator is a globally distributed network of labs, right now over 500 research labs in 70 countries on all six permanently inhabited continents (I think people are always in Antarctica now, too). Their goal is to coordinate data collection on a set of democratically selected projects: the group gets together and decides what to collect data on, and then a large group of these labs collects it, yielding a really diverse data set from many different places. Why does all this matter? Because they want to accumulate reliable data and answer some of the questions Scott was just talking about: when does this happen, for whom, and in what contexts? Now, in education we usually have a federal department of education and more local departments of education, and internationally we have things like PISA and TIMSS that compare countries on test performance.
So some parallels already exist, but many of these are institutional or run by governments, not by us as researchers, so we have less control over what data are collected and how, whether we have access to them, and whether we can answer the questions we want to answer. Education has one advantage in that a lot of information is already being collected, but the research community doesn't control it. I also want to briefly share something: some of the folks from the Psychological Science Accelerator wrote a paper, I think in the last couple of weeks, about the promises and challenges of big-team psychology, which I think translates well to education. The first challenge they discuss is navigating institutional barriers. Back in April, like many education researchers, I wanted to collect data on what the pandemic was doing to learning. I wanted to start a mini collaboration with folks from three different institutions, and it turned out that getting a rapid response from the IRBs involved was anything but a quick turnaround. We gave up in July, because we realized that by that point the school year had ended; navigating the institutional barriers was really hard. So even if you have an existing research infrastructure, if the other institutions within education don't also keep pace, it can be really, really challenging. In this article the psychology researchers describe what it's like to navigate over 100 different IRBs when trying to field a survey: if one IRB mandates a change to an item, that creates a cascading waterfall where 119 other IRBs have to sign off on the rewording too, and boy, that takes a long time. The second challenge is incentivizing skilled but less visible contributions. In a lot of our work, the only carrot we have to offer is authorship on a manuscript. In a large-scale collaboration of any sort, you might be one name among dozens or hundreds of authors; how meaningful is that when you apply for jobs or for promotion? How else can we incentivize contribution and participation, especially when many of the contributions are in running the logistics rather than writing the manuscript or analyzing the data? And the third challenge they discuss is that we as researchers don't get much training in herding cats or navigating these large institutional barriers. How can we gain that experience, and how can it be incentivized, so that it's worth our while and we get the professional credit for putting in the time to actually do all of those things? With that, I'm going to turn things over to Jay to talk about collaborative analysis.

Thanks. I feel like the odd duck because I'm not a psychologist, but I've done some work with collaborative analysis. This is a way to deal with some of these replication problems that focuses specifically on the data analysis step. Another place where practice can go wrong is in actually analyzing the data, because there is a huge number of ways an analysis could reasonably be performed. Some of the things we're trying to root out are malicious research practices, but some of them are just: I had to make a choice, and as a single researcher I can only make one.
A lot of people have great ideas, and lots of people learn great methods, and they might not line up as well as you'd hope. You might have a research question that would be perfect for synthetic control, but you, perhaps very reasonably, steered clear of all the economists in your life, so you never learned that method; you may be the better for it, but your research question isn't. And if you don't have the correct tool in your toolbox, you might not even know you're missing it; you might say, look, a structural equation model is going to work just fine, and that's what I'm going to run with. But these choices can have massive effects on the results you actually find. Even if we both run regular old least squares regression: what if you have missing data? Do you do listwise deletion? If you're not careful, some programs will do that for you; if you have missing covariates, they'll just drop those cases. Do you replace with the mean? Do you use multiple imputation? All of these decisions are, on their face, reasonable, but they can easily change your results. I suppose you, the one analyst, could just do the analysis 17 times or whatever: make every decision in the cascading tree of decisions, go back to each branch point, make a different decision, and rerun the analysis. But it's better to find a friend, or at least some colleagues, or at least some people who aren't actively trying to sabotage you, when you're doing science. And it becomes an even bigger deal as we move toward more sophisticated, and therefore more opaque, methods. Once we go beyond plain y = Xβ + ε, there's feature engineering; which penalty did you choose for your ridge regression? Where did you start your random forest? The random starting point matters, and some of these things you can't take care of via replication: if I rerun from the same random seed, I'll get the same answer, because that's how the algorithm works. But what if I chose the wrong starting point? Collaborative analysis is one way to deal with all of that. Sorry, next slide; I can't change it on my end. Thanks, Matt. One example of this is the "many analysts, one data set" paper that Silberzahn and about 60 other people did. The research question was: are soccer referees more likely to give red cards to darker-skin-toned players than to lighter-skin-toned players? There were 29 teams, 61 analysts in all, given the exact same data set and the same research question. And you get exactly the variation you'd expect: some teams chose some methods, some teams handled missing data one way or another, some teams binned the data and some left it linear, and so on. If you go to the next slide, I have the results. The effect sizes ranged from pretty big, odds ratios over three (the two estimates at the bottom of the figure are truncated so you can read them; they're so large they'd be off the scale), all the way down to no result: point estimates that are negative, with confidence intervals crossing zero.
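As a miniature of what those 29 teams did, here is a sketch with simulated data standing in for the actual referee data set (the variable names and the three specifications are our illustration, not taken from the paper): it fits the "same" question under three defensible analytic choices and reports a distribution of odds ratios rather than a single number.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
skin_tone = rng.uniform(0, 1, n)                # toy predictor
position = rng.integers(0, 4, n).astype(float)  # toy covariate
p = 1 / (1 + np.exp(-(-3 + 0.5 * skin_tone + 0.1 * position)))
red_card = rng.binomial(1, p)                   # toy binary outcome

specs = {                                       # three defensible analyst choices
    "no covariates":    np.column_stack([skin_tone]),
    "adjust position":  np.column_stack([skin_tone, position]),
    "dichotomize tone": np.column_stack([(skin_tone > 0.5).astype(float)]),
}
for name, X in specs.items():
    fit = sm.Logit(red_card, sm.add_constant(X)).fit(disp=0)
    print(f"{name:16s} odds ratio = {np.exp(fit.params[1]):.2f}")

Even with only three branches the point estimates differ; with dozens of analysts and dozens of branch points, you get the spread shown in the figure.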
So this is an excellent demonstration that completely reasonable researcher decisions can lead you to different conclusions. Let me break "many analysts, one data set" into its pieces. The "many" part could be two or three people, or it could be 61, as in that paper. What we're trying to do is wash out the individual effects of idiosyncratic researcher decisions, so that I have a distribution of the effect size rather than one number produced by me, the person who decided to use OLS and throw out all the outliers. I'm not being quite that naive with my methods in real life, but if I decided I needed to get something out quick and dirty, I could do it, and I could defend every one of those decisions in print. Another researcher might say, no, maybe you should add some fixed effects, or maybe you should use a structural model. So what we want to do is dilute the effect of small researcher decisions on our understanding of the effect size. A big part of this is that we want to stop thinking about effects as a single number and start thinking about them as a distribution of things that might be the truth. I don't think the odds ratio is two; I think it's somewhere between one and a half and three. Similarly, the "one data set" part could be the exact same data set, as in this case, where everyone was given the same data, or it could be similar data from the same context. I have a project going on with my research team and another research team studying policy change in Wake County. We have the same data access, but we've chosen to focus on different things: I may very much care about test scores and want to control for them, while they may say, look, I don't think test scores are really what's going on here; we need to focus on distance to school. So we have the same data access, but we're choosing different covariates and coding things differently, each trying to get at what happened in this particular process. Again, it's all about taking researcher degrees of freedom, the good kind, not the malicious kind, and dealing with them at the analysis step. So with that, I'll hand things over to Erin.

I had to unmute myself. So hi, I'm Erin Miller. I think I was given this topic because everyone thinks I like to start a fight, but I actually don't; I just really like a good argument. And so I was given pre-registered adversarial collaboration. In our work, most of us have some big questions that have camps: one group of people in this camp and another group in that camp, with some fundamental disagreements. Pre-registered adversarial collaboration starts with a discussion of the big disagreements held between researchers, and then what we can do about them. It starts with a disagreement about some important theoretical or empirical question, and the idea is to bring the adversaries together to resolve the issue. So we're bringing together two teams of researchers with very different perspectives on a certain question. A lot of the time it begins simply with a discussion of what the big disagreement actually is.
And it's important to begin there, with a discussion, because it might turn out that the difference is overstated, that it isn't actually a big deal after all. So it starts with a conversation between the two perspectives. If there really is a true disagreement, then the two groups of people agree to collaborate on a research project together. The goal is to arrive at falsifiable hypotheses. Next slide, please. So the goal is testable hypotheses and transparent research. There have been several attempts at pre-registered adversarial collaboration; you can see some of the references at the bottom, along with some articles about what makes this work. From several people's experiences, what seems to make it work is, first, the two groups coming together with a genuine desire to address the differences. It won't work if people don't enter in good faith. You have to set aside some personal motivations, and that can be very difficult, so it has to start with all the collaborators agreeing that this is worth doing. Once you have that, you can get to the actual work: creating a protocol that will actually answer the question and actually resolve the difference. I'm sure you can think of a lot of ways that can go really wrong. So again, people have to commit to being objective and being open, and some questions carry enough emotion that the collaboration has to involve a third party, an impartial arbiter trusted by both sides. I think this is a really fascinating way to do research: bringing together people who usually are not communicating, to collaborate on solving a real problem. What we've done in the past is things like a rejoinder to an article, or a target article with responses, or several different perspectives published together in a special issue of a journal, but that doesn't really solve the problem. A pre-registered adversarial collaboration is meant to solve the problem. So the menu for this is: a problem that can actually be addressed, a real disagreement, with people going into it holding two different perspectives on the issue; two groups of collaborators with a desire to address the difference, who trust each other or at least have an impartial arbiter who can settle any disputes. If all of that can happen, then a protocol can be created. The key thing about the protocol, in addition to it being pre-registered so that no one can go back later and say, "no, let's look at it this way, because that proves my perspective," is that both sides have to agree that it would at least address the difference. Sometimes we have groups of people at odds where no protocol that could be created would falsify anyone's belief. That's not actually a scientific disagreement. Collaborating on something where, no matter what the results are, no one would be convinced, is a philosophical disagreement, not a scientific one.
But if it is an actual scientific disagreement, then the two groups get together and are able to create a protocol, which is then pre-registered, along with a plan for publishing the result, which is also agreed ahead of time. The goal is to actually answer some of the big questions we have thought about, and thought about, and thought about in education for the last 100 years. This is probably the type of collaboration with the fewest examples, because it is the hardest to do; there have been only a couple of pre-registered adversarial collaborations in psychology. But I think it has a lot of potential for education, because we have so much emotion behind our ideas and because there's so much at stake. So this is another option that uses the best practices in collaboration the other presenters have already talked about, but with that additional fun flavor of adversaries.

Thank you, Erin and Jay and Scott. As a brief wrap-up before we go to Q&A: we hope the last 45 minutes or so have introduced some terms you may have heard before and may want to know more about. We hope these terms are relevant not just for the next two days at this conference, but for your work going forward. We believe these two solutions are not the only tools the field needs to reach our aspirations, but we think they are two important tools that could help get us there. Again, the link to the paper is at the bottom. I'll stop sharing my screen now, and if you're watching this session live and trying to access the link to the slides, we haven't posted them yet, because, to be honest, the finished slide show did not exist until shortly before this presentation. We'll make sure it's up soon. So now we'll go to the Q&A. Here are our Twitter handles, the link to the slides (which will work soon), and the link to the paper, and we hope folks have good questions; maybe you've already shared some in the chat. There was a great question in the chat from Michelle that I partly answered there but said I would read out as well: a quick question about direct versus conceptual replication, and how they fit with older terms in education like efficacy or effectiveness. Anyone want to jump on that one?

I've never tried to make that connection before. I guess you could say a direct replication asks, can I get this result under the same conditions as the original finding, because that really is the goal, whereas a conceptual replication asks, does the same result happen in a different context or using a different measure? For example, if you follow Carol Dweck's exact research protocol, you will get the mindset effect. If you induce a belief that someone's abilities are fixed and cannot be changed and then immediately assess them on those abilities, versus suggesting the idea that ability is malleable and you can always get better and then immediately testing them, you will get an effect, or at least you'll probably get an effect somewhere in the distribution of the truth. That would be a direct replication.
But then when you try to do a conceptual replication: there was a study, and I can't remember the exact authors, I apologize, but I can look it up, where researchers tried to induce a growth mindset over time in a classroom, suggesting in the classroom that abilities are malleable, and it actually didn't improve achievement overall. So that would be a conceptual replication: telling kids that their abilities are malleable and they can always get better did not seem to affect their performance overall. And when it was examined in actual classroom contexts, it had little to no effect on students already achieving at or above grade level, but it did have an effect on struggling students. So a conceptual replication tells us, okay, now we know where this actually applies. That was my example.

Something I've thought about a lot as I've journeyed into this open science world is the degree of confidence in a particular effect or intervention. Prior to my involvement in open science, I always thought in terms of basically a single study: what's the one best study on this topic? That was my prior belief or understanding. But once you go into the open science world, you start thinking, at least as I have, much more about a body of evidence: 30 different studies, every one imperfect in some way, and you digest all of that before making a final judgment. I think efficacy or effectiveness takes on a new sense then. It reminds me of Cook and Campbell on causal inference: they always talk about how causal inference is a qualitative decision. There is no single study that, if you do it, lets you say one thing caused the other; it's always a subjective, nuanced decision. And I think that, in the context of collaboration, many labs, many analysts, pre-registered adversarial collaborations, and all the rest, most of these methods can give you greater confidence in a finding, because it was tested by more people across different sites and different contexts. So it gives you greater confidence that something actually is true. But to the specific question about conceptual versus direct: I tend to think that direct replication is getting at something like internal validity. If we do a direct replication over and over and over and it holds every single time, we have more confidence. Conceptual replication is almost more about external validity, and I'm curious whether the others agree with this characterization: we take an idea that maybe works with kindergartners and try it with high school students, or take something that works in the intro psych classroom and bring it to real-world K-12 classrooms; we take the general concept, say engaged learning, and see if it works in a different setting. I don't think it always has to be that way, but it can. That's my thinking, and again I'm curious whether other people agree, disagree, or think of it differently, or we can move on.

There's another question here: are there any quality indicators or standards in the literature on replications?
So I'm assuming she's asking about things to look at when judging the quality of a replication. Matt, I know you've thought about this. Yeah, I don't know that there's anything standardized, but one general rule that many people who write about replication suggest is important: if you're not doing a direct replication, so you're doing a conceptual replication and changing something intentionally, then the more things you change, the harder it is to draw clear conclusions about how your results connect to the original finding. If you change the population you're studying, and the measure you're using, and the analysis, and other things, and then you get a different result, what do you conclude? You don't really know what, quote unquote, broke the original finding. So many people strongly recommend, if you're really trying to be systematic, making one intentional change at a time. What I would add is that you should be careful and considered about what you change. If you're going to replicate and say, well, you found this in this study, but let's see if it works for left-handed individuals, why are you asking about left-handed individuals? Do you have a theoretical reason to believe that left-handed individuals would yield different results than right-handed individuals or the general population? If you're going to purposely manipulate or change something, have a good reason for changing it.

I like to say that a replication should have a clear research question. A conceptual replication might ask: I think this works, but does it work in context X? That's a research question for a study. A direct replication asks: do we want to build the evidence base, so do I think this works exactly as written, again? If you can't articulate a research question that's backed up by theory, or at least a good hunch, that's a bad sign. That's Matt's point about what you conclude: you should be able to articulate a research question that your study answers, even if your study is simply "what they did."

Education sometimes gets tricky because we don't always have theory driving our research; it can be practical knowledge. I know that well-funded schools work differently than poorly funded schools, so if we found an effect in well-funded schools, will the intervention work in a poorly funded school too? We don't necessarily know. I don't know that there's a grand theory driving that, other than maybe the theory that resources really matter, but that often is what drives our research questions for conceptual replication, and I think it can be a good enough reason. It also fits Jay's criterion: your research question is, will what worked over there work in this particular context?

One other quick thing to look for in a replication is how much work the authors did to make sure they were implementing the intervention exactly as the original authors did, assuming it's a direct replication. A lot of the things we test in education are very vague, like "direct instruction."
Oh great, I'm going to go implement that in a classroom. Okay, but you're going to need a lot of information to actually replicate that study. I forget whether it was Many Labs or another project, but they talk at length about reading the original papers and going back to the original authors many times to clarify exactly how things were done. If the authors just say, oh yes, we replicated it, but there's no information about how they made sure they were replicating the same thing, your confidence in the findings as a test of replication is going to be pretty weak.

Yeah, and to answer another topic that came up in the chat recently: how do institutions view or judge a publication where you might be one of 60 authors? Do institutions value this type of work? I think that's another area of our field where we need to see changes and improvement, and it's one where we as individual researchers maybe can't get too far ahead of other parts of the research community. If we as individuals participate only in large-scale collaborations, but no institution wants to hire or promote people who do that as their primary research, we will be selected out of our fields, because we will not be hired or promoted to stay in them. So that's somewhere we have to move the different pieces forward together, or at least more closely in line, if we want this type of work to be rewarded. And I see that it is now 12 o'clock. I want to thank folks for joining us; I hope this was helpful. Our Twitter handles were on the last slide if you want to reach out, and we hope to see you for the rest of today and beyond, and that this helped you get a little more comfortable with joining us in open science.