Hello, and welcome back to ESMARConf 2023. This is the first presentation session on planning, collaboration, and review management. As always, you can ask questions via Twitter by following @ESHackathon, or by using the Slack channel if you registered for the conference. Presenters will be answering those questions after this session as well, so keep an eye out. First up today are Gerit Wagner and Julian Prester, and they're going to be introducing CoLRev, a pipeline for collaborative and Git-based literature reviews.

Welcome to CoLRev, a pipeline for collaborative and Git-based literature reviews. I'm Gerit Wagner. And I'm Julian Prester. Today, we're excited to present CoLRev. CoLRev is a standard for data and collaboration management that supports the entire literature review process. In literature reviews, we still waste a lot of time on data wrangling, operating complex interfaces, dealing with unwieldy reference managers, and converting proprietary data formats back and forth. We're not well prepared to complete the process efficiently, with updates and multiple search and review iterations.

Our inspiration for addressing this challenge is the tidyverse and similar statistical data analysis packages. There, we've seen the benefits of shared data structures, which allow users to select and combine different packages with ease. Our work also builds on Git, with its transparency, reproducibility, and scalability built in. With Git as a basis, we can really scale collaboration within review teams and beyond, by involving the broader research community. Git also makes it much easier to undo changes or test different extensions. We believe that the design of data structures really matters in order to create an ecosystem of review tools, and for Git we need to think beyond tabular data structures. These data structures differ from conventional ones in reproducible research. For literature reviews, we need to regularly feed updates and new records into the process, and the process itself combines manual and automated steps. That's very different from the typical reproducible-research analysis, where we rarely manipulate the raw data and all changes are purely computational; simply dumping everything into Git version control would create a huge mess. If we combine shared data structures, standardized operations, and a Git workflow, everything becomes much more efficient and we can cycle through the review process with ease. We believe that just proposing a data structure and management system would not be enough. To show the benefits and to get this idea to fly, CoLRev is now available as a first prototype, and Gerit will now walk you through a live demo that covers the entire process end to end.

So welcome to the demo. This is the three-step workflow of CoLRev, and all you need to know is colrev status. This command shows you the current state of the project and also tells you what the next operations are. So you run colrev status, then run the next operation, and then you can also validate the changes. You basically repeat these cycles. This will take you through the whole review project, from initializing the project to retrieving the metadata and PDFs, the screening activities, as well as different forms of data analysis and synthesis. So, colrev status now tells us that we should initialize the CoLRev repository. We can pick any review type from that list; let's just go for a scoping review as an example, and you can read up on the details in the documentation.
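As a concrete picture of that cycle, here is a minimal sketch of how the same steps could be driven from an R session via system2(). This assumes the colrev command-line tool (a Python-based CLI) is installed and on the PATH; the command names follow the demo and may differ between CoLRev versions.

```r
# Minimal sketch: the CoLRev status -> operation -> validate cycle, called from R.
# Assumes the colrev CLI is installed and on PATH; command names follow the demo.
system2("colrev", "init")      # initialize the repository (choose a review type)
system2("colrev", "status")    # show the current state and the suggested next operation
system2("colrev", "retrieve")  # high-level operation covering search, load, prep, dedupe
system2("colrev", "validate")  # inspect the changes made by the preceding operation
# ...repeat status -> next operation -> validate until the review is complete
```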
So now everything's set up; you see the different directories and files that were created. Upon colrev init, the status now tells us that there are zero records in the process and that the next operation is colrev retrieve, and it explains how to add search results: we could simply copy BIB, RIS, or XLSX files to the data/search directory, we could provide PDF documents, or we could run an API-based search. So let's try that. Here we are in the documentation where this is described; you see the different options for API searches, and we'll just go with the example. So I run it, and the example uses the Crossref database plus two additional databases from our discipline, and we search for "microsourcing" as the keyword. We run these commands, it fetches all the results, and then we run colrev status, where we see 33 records in the process. We should run colrev retrieve, which tells us that retrieve is a high-level operation consisting of search, load, prep, and dedupe. So let's run that, and I'll explain what happens in the background.

The search results are stored in the data/search directory. The settings are in settings.json; that's where the search parameters and the API calls are stored. The load operation brings everything into the same BibTeX format and adds a few fields. For example, the colrev_status field keeps track of the state of each record throughout the process: it could be md_imported, then it changes to md_prepared, and later perhaps to prescreen_excluded or synthesized at the end. The colrev_origin field points to the original record in the search source, so it's important for keeping track of that. The prep operation also uses provenance data describing where each field comes from and whether there are any quality defects; for example, the title comes from this source, and it has a quality defect because everything is in capital letters. The prep operation then resolves those quality defects based on high-quality data sets such as CoLRev curations or Crossref. In this case, you see that everything is fixed, there are no remaining quality defects, and the record is even connected to a CoLRev curation.

Coming back to our example, we see that everything has completed. The search was repeated with no additional records retrieved, the load was completed for the different search sources, in the preparation several records were quality-curated and two records were excluded automatically, and in the dedupe step a couple of duplicates were identified and merged. The next step, according to colrev status, is the prescreen, where we provide a short explanation at the beginning and then check each paper for whether it's relevant to our objectives. In this case, it's about microsourcing, so it's relevant; same here. At the end, we get a short overview of our coding, and the next step is to run the pdfs operation. That retrieves PDFs from our local hard drive and other projects, as well as online from the open-access PDF collection of Unpaywall. Now there are a couple of PDFs that would need to be retrieved manually, but I'd like to continue with the process, so let me skip ahead to the screen, which is pretty similar to the prescreen; we include everything in that step. Now colrev status suggests the immediate next operation, the manual PDF retrieval, but we can also use the verbose mode, which gives us more options. Here we see more operations and additional information on versioning and collaboration.
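For concreteness, here is what one of the records mentioned above might look like in the shared BibTeX-based file after preparation. The colrev_status and colrev_origin field names follow the demo; the entry itself and the exact value formats are made up for illustration.

```bibtex
@article{Smith2021,
  colrev_status = {md_prepared},
  colrev_origin = {crossref.bib/000123},
  title         = {Microsourcing and the Future of Work},
  author        = {Smith, Jane},
  journal       = {Journal of Information Systems Examples},
  year          = {2021},
  doi           = {10.1000/example}
}
```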
We see that in the data operation section there's a PRISMA flowchart, an Obsidian vault for data analysis, as well as a manuscript that we can find here. There's information on how to build the manuscript and how to create versions. So let's just have a look at the output. You can see the Word document with the PRISMA flowchart, all the details, and the figures added based on the pipeline. There's a to-do list of the relevant records that still need to be synthesized, and the full reference section with all the details. So basically we've completed a whole literature review in just five minutes. We can also go online to see the example repository. You can use Git to collaborate in small teams and private projects, or in larger teams and public projects; there's no limitation here. You also see that there's full transparency of the changes: there is one commit for each operation that we completed, and you can go into the individual commits, see a short report at the beginning, and also see the detailed changes that were applied throughout the project.

So now I'm sure you're all wondering: how can I try this and how can I get involved? There's a variety of ways in which you can get involved, and they also reflect our vision for how CoLRev will be developed going forward. By the way, all the links on this slide will be available in the video description. First of all, CoLRev is live on GitHub, and you can go today and try it out: use it for your literature review projects and let us know what you think. The more people use CoLRev, the more we will be able to refine the best practices and, for example, tailor them to the nuances of the particular review types you're working with. As you've seen in the demo, CoLRev is built with extensibility in mind from the start, so we are looking forward to many contributions and extensions that integrate CoLRev with your favorite review tools. We have only briefly touched on the possibility of curated repositories in the demo, but we envision community-level curations playing an important role in the reuse of review projects; you can read more about what we mean by that by following the link to our CoLRev curations. Ultimately, we hope that CoLRev will become the data management standard for literature reviews. But to make that happen, we need you. So please download CoLRev today and start using it. We are looking forward to your feedback.

Thanks very much, Gerit and Julian. Up next, we have Thodoris Diaconidis, who's going to be introducing ScreenMedR, a package for automating the screening of publications for meta-analyses or systematic reviews using the PubMed database.

Hi there. My name is Thodoris Diaconidis, and I would like to present my package, ScreenMedR, a package for automating the screening of publications for systematic reviews and meta-analyses. To start, I would like to thank the organizing committee for giving me this opportunity to present my program. Let me start with the idea behind the ScreenMedR package. Its task is to find all the relevant publications that are needed for a meta-analysis study as quickly and as accurately as possible by using the program, instead of someone reading all the abstracts or publications one by one to end up with the most relevant ones. So this is the task of the program.
To accomplish this task, the program runs an unsupervised machine learning algorithm and, in conjunction with cosine similarity, it provides the user with the most relevant publications, abstracts in our case, for his or her study. The program works with PubMed: the output of a PubMed database search is the input for the program. At the end of the day, the user ends up with a small number of publications, about 30% of the initial set, for manual inspection. The user can reduce this number further by using some extra functions that are already included in the ScreenMedR package, which I'll show you shortly.

So how does the program work? The input of the program is, as I said before, the output of a PubMed database search, usually a CSV or a TXT file, which includes the PMID numbers. The PMID numbers are what the program needs as input. I will also provide a video vignette in which I work through a specific case, so that you can become more acquainted with the program. So that is the first input. The other input is four or five publications, or rather the PMID numbers of these publications, that the user is quite sure belong to his or her study. These two, together with the number of groups into which the user would like to divide the total set of publications at the beginning, form the input for the program.

The program divides the abstracts of all publications into the user-defined number of groups in terms of text similarity. The measure for this text similarity is cosine similarity. Cosine similarity is a number between 0 and 1: if the number is very close to 1, it means, let's say, that the two texts are essentially identical; if it is 0, it means that there is no connection between one text and the other. So you end up with a number between 0 and 1 for each of these groups. The output, as I said, is a cosine similarity per group: if you have, for example, three groups, you have three numbers between 0 and 1, and the group with the biggest cosine similarity is the winner, i.e. the most relevant group of publications. Is it safe to discard all the other groups and keep the publications of the one with the biggest cosine similarity? Well, it is safe when the difference in cosine similarity between the first, winning group and the second one is greater than 0.2. In that case, you can discard the second, the third, or all the rest, and keep only the first one. That is one round of the screening function. You can apply this function again in a second round, this time taking the output of the first run as input, and keep going until you can no longer find a cosine similarity difference greater than 0.2. This is the idea. It only takes a couple of minutes: on a modern computer, for, let's say, 1,000 abstracts, you can run the whole thing in less than a minute and end up with a smaller group of publications; you keep running it, and you can end up with, say, 30% or 20% of the initial ones. After that, it becomes quite difficult to obtain such a cosine similarity difference, and it is no longer safe to keep discarding groups.
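To make the grouping-and-selection step concrete, here is a rough sketch in R. It is an illustration only, not the ScreenMedR source: a simple term-frequency matrix is built with the tm package, k-means stands in for whatever unsupervised grouping the package actually uses, and the group whose centroid is most cosine-similar to the known-relevant abstracts is kept, together with the 0.2 rule from the talk. All names here are hypothetical.

```r
# Illustrative sketch (not ScreenMedR code): one screening round based on
# cosine similarity between abstract groups and a set of known-relevant abstracts.
library(tm)  # assumed available for building a document-term matrix

screen_round <- function(abstracts, reference_abstracts, k = 3) {
  corpus <- VCorpus(VectorSource(c(reference_abstracts, abstracts)))
  dtm <- as.matrix(DocumentTermMatrix(
    corpus, control = list(removePunctuation = TRUE, stopwords = TRUE)))
  n_ref   <- length(reference_abstracts)
  ref_vec <- colMeans(dtm[seq_len(n_ref), , drop = FALSE])   # centroid of reference abstracts
  abs_mat <- dtm[-seq_len(n_ref), , drop = FALSE]

  groups <- kmeans(abs_mat, centers = k)$cluster              # unsupervised grouping (stand-in)

  cosine <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))
  sims <- sapply(seq_len(k), function(g)
    cosine(colMeans(abs_mat[groups == g, , drop = FALSE]), ref_vec))

  ord <- order(sims, decreasing = TRUE)
  list(keep = which(groups == ord[1]),                        # publications in the winning group
       safe = (sims[ord[1]] - sims[ord[2]]) > 0.2,            # 0.2 rule: safe to discard the rest?
       sims = sims)
}

## Hypothetical usage: repeat rounds while the 0.2 rule holds.
## res <- screen_round(all_abstracts, known_relevant)
## if (res$safe) all_abstracts <- all_abstracts[res$keep]     # input for the next round
```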
There are also extra functions included in the package. One of them works with MeSH descriptors and qualifiers: let's say you have a group of publications and you want them to have more than D descriptors and more than Q qualifiers in common with the publications of a comparison group. Say the comparison group is the four or five publications we started with, and the input is another, bigger group; you want to see which publications of the bigger group have that many MeSH terms in common. In this way, you filter your publications even further against the highly relevant comparison group. Another function filters by MeSH term name. This is more specific; it has to do with the name of the MeSH term. For those familiar with MeSH terms, each one has two parts, a descriptor and a qualifier; I'll show all of this in the video vignette as well. You define the exact names of the descriptor and the qualifier that you want your publications to include, and the program filters the publications you enter down to the ones that have these specific descriptors and qualifiers.
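A rough sketch of that MeSH-overlap idea might look like this. It is a hypothetical illustration (the actual ScreenMedR function names and arguments may differ), with each publication represented by character vectors of MeSH descriptors and qualifiers.

```r
# Hypothetical sketch (not the ScreenMedR implementation): keep publications that
# share at least min_d descriptors and min_q qualifiers with a comparison group.
filter_by_mesh_overlap <- function(pubs, comparison, min_d, min_q) {
  ref_desc <- unique(unlist(lapply(comparison, `[[`, "descriptors")))
  ref_qual <- unique(unlist(lapply(comparison, `[[`, "qualifiers")))
  keep <- vapply(pubs, function(p) {
    sum(unique(p$descriptors) %in% ref_desc) >= min_d &&
      sum(unique(p$qualifiers) %in% ref_qual) >= min_q
  }, logical(1))
  pubs[keep]
}

## Hypothetical usage: pubs and comparison are lists of the form
## list(list(descriptors = c("Neoplasms"), qualifiers = c("drug therapy")), ...)
## relevant <- filter_by_mesh_overlap(pubs, comparison, min_d = 2, min_q = 1)
```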
So this is the whole idea of the program, and these are the functions that are included. You can find everything on this web page; it's on GitHub, and you can download and install the package from there. There is also a PDF vignette that is more detailed, and you can find a case study there. The program was applied in this meta-analysis, and you can find more information about it in the appendix of that publication. That is more or less what I wanted to say. If you want to learn more about the program, I would be very glad to help you. Thank you very much for your time. Goodbye.

Thanks very much, Thodoris. Our final speaker in this session this morning is Clarice Neville, who's going to speak about MetaImpact: designing future studies whilst considering the totality of current evidence.

Hello, everybody. I'm Clarice Neville, and I'm based at the University of Leicester here in the UK. The Complex Reviews Support Unit, the CRSU, has a suite of web apps for evidence synthesis, and this short presentation will introduce one that is currently being developed by myself and colleagues Terry, Nicola, and Alex. This is MetaImpact, and it aims to enable researchers to design future studies whilst considering the totality of current evidence. As with the other CRSU apps, many people work on these projects, and their contributions are always appreciated. Here, I'd like to give extra thanks to Alex for providing some of the content for this presentation.

In research, when designing new studies, a balancing act goes on when considering how many people to recruit. If there are too few people, the study may be unable to detect an effect, even if there is one, due to a lack of power. If too many people are recruited, in other words if the study could have detected an effect with fewer people, then some participants will have undergone treatment unnecessarily; obviously, this is even worse when the treatment assigned was inferior. Both scenarios are wasteful and unethical and should be avoided where possible. In the UK, before a treatment is offered to the public, governing bodies must approve it.

To aid decision-making parties, it's common practice for systematic reviews to be presented, in which all the relevant evidence has been found and combined to give an overall picture. When there is a quantitative outcome of interest, a meta-analysis is also often conducted to summarize the evidence. With this in mind, there exists an ideology that instead of powering new trials in isolation, researchers should anticipate their new study being added to the current body of evidence and therefore power it to influence an updated meta-analysis with said study included.

Let's go through what this looks like statistically. Firstly, we conduct a standard meta-analysis of the current evidence in question; in this example, we have a meta-analysis of six studies with a pooled odds ratio of 0.8. Then, using parameters from the meta-analysis, a new study is simulated. This involves sampling a new study effect from a distribution defined by the meta-analysis; here, this is -0.15 on the log odds ratio scale. Then, by setting the probability of an event in the control arm at an estimated value, the probability of an event in the treatment arm can be derived; here, this is 0.18. Finally, using the binomial distribution (as we're working with binary data) combined with the probability parameters we've derived, we can simulate the number of events in each arm for a set sample size. Here, the sample size is 200 in each arm, and we get 38 events in the control arm and 35 events in the treatment arm. The next step is simply to redo the meta-analysis with the newly simulated study included and note down whether the resulting meta-analysis gave the desired result or not. To then estimate the power of the set sample size, 400 here, one simply repeats steps two and three, i.e. simulating a new study and updating the meta-analysis, a large number of times. The power is then calculated as the proportion of meta-analyses that gave the desired result. In this example, we were looking for updated meta-analyses that gave a traditionally significant p-value, of which there were 304 out of the 1,000 iterations, giving a power of 30.4%. The final step is then simply to adjust the sample size to achieve the desired level of power.

Right, so as you can tell, the method I've just gone through can be quite complex to understand and follow, and furthermore it can be computationally demanding. It was therefore our aim to build a user-friendly web app to allow researchers to easily use these methods. As with our other web apps, we worked with the statistical software R and the Shiny package, and as a result MetaImpact was born. As some elements of the app are hard to comprehend, particularly the sample size calculator, additional features were added. Small features include information pop-ups and help wizards to take the user through each part of the calculator, but bigger, more educational features are set to be incorporated to teach the user how the method works and produces the results being presented. One such feature will be to incorporate the Langan plot. The setup of the Langan plot is similar to a funnel plot: along the x-axis we have the outcome, and along the y-axis is the standard error. The diamond represents the pooled effect from the meta-analysis, with lines extending to represent the 95% predictive interval, that is, the interval in which we'd expect new study effects to lie.
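The simulation loop just described can be sketched in R roughly as follows, using the metafor package. This is a minimal illustration under stated assumptions, not the MetaImpact code: yi and vi stand for the log odds ratios and variances of the existing studies, p_ctrl is the assumed control-arm event probability, n_arm is the per-arm sample size, and the predictive standard deviation used to sample the new study effect is a simplification.

```r
# Minimal sketch of the simulation-based power calculation (not MetaImpact itself).
library(metafor)

power_new_study <- function(yi, vi, n_arm = 200, p_ctrl = 0.20, nsim = 1000) {
  fit <- rma(yi, vi, method = "REML")          # step 1: current random-effects meta-analysis
  pred_mean <- as.numeric(fit$b)               # pooled log odds ratio
  pred_sd   <- sqrt(fit$tau2 + fit$se^2)       # rough predictive SD for a new study (assumption)

  hits <- replicate(nsim, {
    new_lor <- rnorm(1, pred_mean, pred_sd)          # step 2: sample a new study effect
    p_trt   <- plogis(qlogis(p_ctrl) + new_lor)      # derive treatment-arm event probability
    e_ctrl  <- rbinom(1, n_arm, p_ctrl)              # simulate events in each arm
    e_trt   <- rbinom(1, n_arm, p_trt)
    new_es  <- escalc(measure = "OR",
                      ai = e_trt,  bi = n_arm - e_trt,
                      ci = e_ctrl, di = n_arm - e_ctrl)
    upd <- rma(c(yi, new_es$yi), c(vi, new_es$vi), method = "REML")  # step 3: update the MA
    upd$pval < 0.05                            # did the updated meta-analysis reach significance?
  })
  mean(hits)                                   # power = proportion of "successful" updates
}
```

Wrapping power_new_study in a loop over candidate values of n_arm and taking the smallest one that reaches the target power corresponds to the final step described above.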
Returning to the plot: every possible position on it, i.e. any new study with a certain odds ratio and standard error, is shaded according to how that new study would affect the updated meta-analysis. So here, the darker shaded area shows where new studies would cause the meta-analysis to give a significant pooled effect, and the white areas show where they would not. Now let's consider all the tiny individual points that you can see on this plot. They each represent a simulated study of the set sample size being tested, according to the method we've been describing. Therefore, one can visually see how the power has been calculated: it's simply the proportion of those dots that fall in the shaded area.

At the time of recording, this app is still in development. Currently, it can run a frequentist meta-analysis and calculate the power of a new study of a certain sample size from that meta-analysis. Furthermore, the app can produce power plots for multiple sample sizes and for fixed- or random-effects meta-analyses. Further features that we are aiming to complete before releasing the first version of MetaImpact include adding functionality to run the meta-analysis within a Bayesian framework and incorporating an interactive version of the Langan plot described previously. Looking further towards the future, a later version may consider how these methods can be extended to network meta-analysis.

Once a first version is ready to roll, we also plan to assess the potential benefit of MetaImpact by utilizing past reviews. Specifically, this will involve taking all the studies in a past review but excluding the most recent study added, and then plugging that into MetaImpact. As described, MetaImpact will then simulate new studies based on the edited review, redo the meta-analyses with the simulated studies included, and obtain new results. An optimal sample size for a new study can then be estimated, such that it influences the meta-analysis. This new sample size, along with the new pooled effect, will then be compared with the original review and the study that was removed, to assess whether using MetaImpact would have been beneficial. Once released, we believe that MetaImpact has the potential to benefit patients and research by encouraging ethical sample sizes and reducing wasteful trials. So please do keep an eye out for when it's released; you should be able to find out by following our Twitter or GitHub. Thank you.

And thanks very much, Clarice. That's it for this session. We hope that you've enjoyed it as much as we have. Our presenters should be online for the next couple of hours and days to answer your questions, so do please keep them coming in via Twitter. Thanks very much. Bye.