Yeah, thanks for joining us today, everybody. I'm going to present the MetaPipeX framework today, which is a framework for data analysis and documentation for multi-lab replications of experimental designs. This will be more of a brief introduction to how we arrived at the framework, what it is, and how we'd like to see it used. For more detail on how to actually use it, please feel free to attend our tutorial on Friday at this year's ESMARConf as well.

So where did it start? We set out to re-analyze data from large-scale direct replication projects like the Many Labs projects or the multi-lab Registered Replication Reports. We intended to use participant- or item-level data, and we needed that data to be cleaned in order to compare our re-analyses with the already-published results. That turned out to be much more difficult than we anticipated.

But before I talk about the difficulties, I just want to get some vocabulary out of the way to make sure that we're talking about the same things. This graphic here represents the basic structure of multi-labs. When I say multi-labs, as I just said, I mean the multi-lab Registered Replication Reports and the Many Labs projects, and all of these have a very similar structure on some level. The first interpretable data is usually item-level data, where each participant is a row and each item is represented as a column. In individual participant data, the rows are still participants, but the dependent variable has been aggregated to a single numerical value per participant. These, in turn, are aggregated to replication-level statistics, like between-group effect sizes or group variances. Those replication statistics are then used to run meta-analyses that aggregate the data for each replication project. And finally, a multi-lab may span several of these replication projects. I'll sketch these aggregation levels in a short code example below.

Despite this very similar structure, the solutions these projects found to document and analyze their data were rather unique. Just a few examples: the software is usually R, but sometimes SPSS or other software, and the code structure itself was unique to each multi-lab. It also differs when, or by whom, the data were aggregated: sometimes these steps were taken by the replication sites, so we don't find the actual raw data, and sometimes aggregation was done at the replication-project level. The data files themselves are very different: sometimes CSV files, SPSS files, Excel files, RData files, et cetera. And if you include the files that document the data transformations, so the information that the analysis or data-transformation code relies on, you will also find solutions like text files or Google documents, and probably some more.

So the variation across the multi-labs is quite large. There is little consistency in naming conventions; sometimes the verbal descriptions and other provided details on the data transformations are sparse or even inconsistent; some of the code solutions are really complex, though interesting; and sometimes we do not find cleaned data sets at the different levels of aggregation. This results in a lot of detective work for anyone who wants to reuse that data, and it makes computational reproducibility much harder to achieve and to actually check.

I want to elaborate a bit more on computational reproducibility. It means that you can run the original code on the original data and obtain the same results as reported by the projects.
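Coming back to the vocabulary from a moment ago: to make the aggregation levels concrete, here is a hypothetical sketch, not MetaPipeX code, of how item-level data might be reduced step by step. All column names, such as replication_site, group, and the item_ columns, are invented for illustration.

```r
library(dplyr)

# item-level data: one row per participant, one column per item (hypothetical file)
item_level <- read.csv("item_level_data.csv")

# individual participant data: aggregate the DV across items
# to a single numerical value per participant
ipd <- item_level %>%
  mutate(dv = rowMeans(across(starts_with("item_"))))

# replication-level statistics: e.g., group means, SDs, and sample sizes per site
replication_stats <- ipd %>%
  group_by(replication_site, group) %>%
  summarise(mean_dv = mean(dv), sd_dv = sd(dv), n = n(), .groups = "drop")

# these statistics would then feed a meta-analysis per replication project,
# e.g. with metafor::rma() on standardized mean differences
```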
There have been a few great analyses of computational reproducibility for registered reports and for publications with open data badges over the last couple of years. But the results are somewhat devastating, to be honest, which maybe just means it's a lot harder to get there than we think. Some of those results: only about 30% of the articles these projects looked at were actually reproducible. And computational reproducibility depends on the skill level of the analyst, which in turn raises the question of how much skill can be required before something no longer counts as computationally reproducible.

Most of these analyses agreed that using non-proprietary formats helps a lot, as does providing version control. That could mean using solutions like renv, but also using containers such as Docker, and also simple things like using relative file paths, so you have to change fewer paths when you try to rerun the code. Some of these best practices were already implemented in some of the multi-labs. For example, Many Labs 2 built a package solution for their data transformations; I think the package may have been pulled at some point, so I'm not sure whether it's available right now. There were container solutions, a lot of non-proprietary software was used, and also bug trackers, which are great for making the community's interaction with these projects visible, so we know where people have found errors or deviations. But all of this is very inconsistent across these projects.

Our solution to these complications is MetaPipeX. To break down the name quickly: it's a pipeline for meta-analyses of experimental data, and such designs make up more than 50% of the currently published direct replication multi-labs. The framework consists of three components. On the left, we see the standardized analysis pipeline, which provides guidance to make the analytical structure more explicit and reduces documentation effort. We also created an R package that matches this pipeline: it analyzes the data and creates standardized documentation for the data at the different levels of aggregation. The third component of the framework is the Shiny app, which we can use to explore MetaPipeX data, that is, the combination of replication results and meta-analytical results. Basically, it's a GUI to select and visualize multi-lab analyses. You can also run the analyses within the Shiny app, and it combines different data formats: it takes SPSS data, RData, and CSV files. So you don't have to be able to work with R to apply the pipeline; you can just use the GUI.

The most convenient way to apply the pipeline is the full pipeline function, which simply takes individual participant data as its input, as depicted on the left, then runs the analysis and creates the full documentation of the standardized pipeline. On the right, you see the full folder structure that is exported by the function; in R, this is also provided to you as a nested list. In each folder, you always have the data file and the according codebooks, so you can make sense of the columns in the data set.
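As a minimal sketch of that step, assuming the package is installed from its GitHub repository and exports the full pipeline step as full_pipeline(): the argument names and the input file here are illustrative assumptions, not the package's documented API.

```r
# install.packages("remotes")
# remotes::install_github("JensFuenderich/MetaPipeX")  # assumed repository location

library(MetaPipeX)

# individual participant data: one row per participant, with columns identifying
# the multi-lab, replication project, replication site, group, and the DV
ipd <- read.csv("example_ipd.csv")  # hypothetical input file

# run the standardized analysis; the exported folder structure contains the data
# files and codebooks at each aggregation level, and the same output is also
# returned as a nested list (argument names are assumptions for illustration)
output <- full_pipeline(data = ipd, output_path = "MetaPipeX_Output/")
```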
Finally, what we hope MetaPipeX may do: we hope it reduces the effort of analysis and documentation. We hope it makes your multi-lab data more explorable, no matter whether it's your own data, combined data, or simulated data. And we hope this fosters interaction between the research community and these types of projects.

In the best-case scenario, it helps make the multi-lab format more accessible to primary researchers and students alike, and it helps us develop and ask more meta-scientific questions, shedding more light on questions around heterogeneity. So it just adds to the meta-scientific toolkit. That's it from us, or from me, today. Thank you for your attention, and please feel free to join our ESMARConf tutorial on Friday.