Hello and welcome to ESMARConf 2022 and the Review Processes from A to Z, Part Two session. If you missed Part One, you can head over to our YouTube page and watch the recordings. This session will be live-streamed to YouTube, and the individual presentations have been pre-recorded and published there as well. Subtitles have been verified and can be auto-translated for those individual talks; automatic subtitles will be available shortly for this live stream. If you have questions for our presenters, you can ask them via the presenter's individual tweet from the @ESHackathon Twitter account, which is on the slide you can see in front of you. Presenters may have time after their talks to answer some of your questions, or at the end of the session if we have time, and we will endeavour to answer all questions soon after the event. We would also like to draw your attention to our code of conduct, available on the ESMARConf website at esmarconf.github.io. So now I would like to welcome our first speaker, Steph Zimsen from the University of Washington. Steph, you can take it away.

Hello, I'm Steph Zimsen. I'm a systematic reviewer at the Institute for Health Metrics and Evaluation at the University of Washington in Seattle. I'm here to talk to you today about automating data cleaning and documentation of systematic-review-extracted data using an interactive R Markdown notebook. My institute takes in data from many different kinds of sources to inform our modeling. We have a very large set of coherent, detailed models covering many years, locations, diseases, and conditions, the Global Burden of Disease project, and we have other global-scale health projects as well. We do about 40 systematic reviews a year. These take in peer-reviewed, primary scientific literature and fit it into our research databases so it can inform those models. A quirk of our institute is that the extracted databases are usually handed off from the reviewer to a different person for the analysis and modeling. That lets data-set cleanup fall through the cracks and get left off at the handoff, so we've got a problem with that. Data extracted by hand are inherently messy, patchy, and inconsistent, and uploading messy data wastes computer infrastructure, analyst time, and reviewer effort on rows that never make it in. And if the data set ends up biased or has incorrect numbers in it, that can lead to unrealistic models. A systematic approach to cleaning starts once we have decided the extractor should clean the data before handoff. The next systematic step is to use checklists, but those can be very tedious to work through in Excel. Making the computer do the checks saves some of that tedium, and the extra usefulness is that the code is reusable. One more step makes it even easier: an R Markdown notebook allows even easier code-based checks. An R Markdown notebook acts like a lab notebook where you can take notes, paste in your experiment (actually run your code), put in figures, all those kinds of things, and print it out as a record of what you did, so you get the documentation out of it as well. I have a repo holding all of the code that I'm presenting here today, and I will post that again at the end. I want to talk about the code and about automating the cleanup. In the previous iteration, we had a script that launched functions with a configurable set of inputs, just in R, and that works great.
But it's a little fiddly to get it configured. What I'm working on now is this interactive notebook, and it doesn't run yet. So for the demo today, I'm going to show you the example data set I came up with, give you a live demo of the working code, and then a narrative tour of the notebook. My test data set: I won't go into the details, but it's got a lot of different kinds of fields covering all kinds of different areas. It's not a huge extraction, but it would be tedious enough to check by hand. However, we have an internal policy not to share unpublished data, so I needed to find an example data set to share with you. The Infectious Diseases Data Observatory came to the rescue there; it's an institute at the University of Oxford in the UK. They have several published data sets from systematic reviews. One of them is on soil-transmitted helminths, such as hookworms. That is a lot like our data; it's one of the causes we extract in our neglected tropical diseases team. And from what I can see, the data-use policy says I can use the data this way. So there's a script to download their data and prep a data set similar to mine with it, so that the tests will work, and I've got that script in the repo. Here is a glimpse of this example data set. I'm selecting some fields that I will talk about later; these ones are going into a unique key in a minute, but it gives you an idea. The next step is to show you the working code base using this example file. This code came out of a final project for a class I took last term in Health Metrics Sciences here at UW, with fabulous collaborators Rose Bender and Ally Eastus, stellar graduate students in our department and institute. The three files I'm going to talk about next are here. Here we are in my RStudio session: I've got a README file, we've got the report parent file, and we've got a config file as well. I'm showing a little bit of the check functions too; I won't get into those, but they feed into this. The README shows you what it's about and how it works, and gives you broad instructions on setting up the config script, which collects all of your inputs in one place and saves them as a set of arguments accessible to the parent script. The parent script runs them and calls those child scripts over here, and it saves everything out as a report of which rows failed which tests in which columns. We'll talk about those in a moment. Here is the parent script, where users update the two file paths; everything else comes either from the function scripts, from the details in the config file, or from the input data, and the rest of it just runs. Here's the config file. I've set it up already to match today's demo, so I'm just going to run it. You start out defining the source directory and the path to the input data. Then you pass the output root directory, and it creates a whole new directory out of that, with an internal folder dated with today, so that you don't get confused with things stacking up on top of each other. Here are the checks themselves. These are the input arguments to the different checks: a check for missingness, a check for duplicates, and a check for valid values within a column. The missingness check takes individual columns and makes sure there's no hole in the data where you don't want one.
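To make these concrete, including the duplicate and validation checks described next, here is a minimal sketch of what such check functions might look like in R. The function names, arguments, and example column names (such as age_min) are illustrative assumptions, not the code from the repo.

```r
# Minimal sketches of the three kinds of checks (illustrative only).

# Missingness: flag rows with empty values in columns that should never be empty.
check_missingness <- function(data, cols) {
  flagged <- lapply(cols, function(col) {
    bad <- which(is.na(data[[col]]) | data[[col]] == "")
    if (length(bad) > 0) data.frame(row_id = bad, column = col) else NULL
  })
  do.call(rbind, flagged)
}

# Duplicates: concatenate the key columns into a unique key and flag repeats.
check_duplicates <- function(data, key_cols) {
  key <- apply(data[key_cols], 1, paste, collapse = " | ")
  dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
  data.frame(row_id = which(dup), unique_key = key[dup])
}

# Valid values: flag rows where a column fails a logical condition.
check_valid_values <- function(data, col, condition) {
  ok <- condition(data[[col]])
  ok[is.na(ok)] <- FALSE          # treat NA as a failure so it gets flagged
  data.frame(row_id = which(!ok), value = data[[col]][!ok])
}

# Example: flag any row where the minimum age is not below 19.
# check_valid_values(extraction, "age_min", function(x) x < 19)
```

Each check returns a small table of flagged rows, which is the kind of thing that ends up in the saved report.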
The duplicate check concatenates your input columns together to make a unique key, to ask: are there any rows that have the exact same combination of these values? And the validation check is more complex. It has a column name, a logical operator, and logical conditions to be met. So if we want the minimum age to be less than 19, as in this example, and it's not less than 19, the check will flag it for us. So how far have we got? We've got the output directory, and we record and enter the criteria for the validation checks. Then this saves all of these inputs as an RDS file elsewhere, which will be called again by the parent file. Again, the parent file has directions; it says just update these two items, so there they are, and then we can run all the other functions. So here we are, collecting the list of functions to run and sourcing them; this is the call that gets them to actually be things that can be called upon. We load some packages. Here's where we import the arguments from that RDS file that we saved out. Here's the input data, loading that up. There we are: we've got observation rows and field columns. And that is everything; the rest of it is just running what we've already asked for. So here's the missingness check: one observation with missing values, in this column, in this row. Here's the duplicate list; this is handy because once you've fixed everything, you're going to get answers that say no duplicates found, or no other errors found. Finally, the last one: ten observations that don't meet this last criterion. And then we can save all this. The output repeats it all over again, but we get a printout, and it tells us that our output saved. So we have documentation of which rows need which fixes, which is super, super useful. It's a little crude, as you see from the output there, but it is still going to be useful. So that worked.
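Putting those pieces together, the config-then-parent pattern just demonstrated (inputs collected once, saved out as an RDS file, then read back by a parent script that sources the check functions, runs them, and writes a report) might look roughly like this. The file paths, argument names, and column names are placeholders, and the check functions are the illustrative ones sketched above.

```r
## config.R -- collect every input in one place and save it for the parent script
config <- list(
  input_data   = "data/extraction.csv",
  output_root  = "output",
  missing_cols = c("location", "year_start"),           # must never be empty
  dup_keys     = c("nid", "location", "year_start"),     # columns forming the unique key
  valid_rules  = list(age_min = function(x) x < 19)      # named condition per column
)
# Date-stamped folder so successive runs don't stack on top of each other
config$output_dir <- file.path(config$output_root, format(Sys.Date(), "%Y_%m_%d"))
dir.create(config$output_dir, recursive = TRUE, showWarnings = FALSE)
saveRDS(config, file.path(config$output_dir, "run_config.rds"))

## parent.R -- the only two things a user edits are these paths
function_dir <- "R"                                      # folder holding the check functions
config_path  <- "output/2022_02_22/run_config.rds"       # hypothetical dated path

invisible(lapply(list.files(function_dir, full.names = TRUE), source))
cfg <- readRDS(config_path)
dat <- read.csv(cfg$input_data)

report <- list(
  missing    = check_missingness(dat, cfg$missing_cols),
  duplicates = check_duplicates(dat, cfg$dup_keys),
  valid      = check_valid_values(dat, "age_min", cfg$valid_rules$age_min)
)
saveRDS(report, file.path(cfg$output_dir, "cleaning_report.rds"))
print(report)   # which rows failed which tests, in which columns
```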
Let's look at how far we've got with turning that into an R Markdown notebook to make it even easier. I've got the file name here for you. I do have to give credit to the people who have helped me get this far. Not quite everything I know is from Jenny Bryan, but her book on setting up your projects, RStudio, and Git so that everything works nicely together is super useful, as are the code-chunk options from Yihui Xie and all of his R Markdown help. Here is my R Markdown notebook in RStudio, with the usual YAML; the output is set to notebook, and it has a standard setup code chunk that loads some packages at the top. And then, instead of a README, I've got that here in the notebook: a statement of what it does. This is very similar to what we just watched, just reformatted. I'm calling this section a form to fill in instead. For each of these sections I've got narrative instructions and a syntax-formatted example, including the one I just showed you in the other setup. I've got some places for feedback, commented out right now so I don't get error messages, with instructions. Some of this has instructions saying there's nothing for you to do here, but again, more feedback. Then I've got the three checks, for missingness, duplicates, and valid values. Each of these has an explanation, an input section, the output to expect, and then directions. There are two directions for this one, because you want to define what counts as missing; you can add to the previous settings just by adding another entry, in case you have a numerical null. And this is still part of looking at missingness: which columns to check. Again, formatted blanks and formatted examples. Here are the duplicates again; this one has more explanation, inputs, results, directions, a formatted blank, and formatted examples. And the more complex one for checking for valid values, with explanation, inputs, results, directions, and formatted examples. We've got the end of the interactive part clearly marked, but also here's some information for you: find your report here. This part says: still run the code chunks, but here's what the cleaning steps are, so the cleaning steps are automatically done. This is still formatted just like the other one; I will do some refactoring, but it's still reminding people to run the code checks, source the functions, run the functions, and write the report. It's the same thing, just easier for a lot of people to read and manipulate.

Let me go over again what I showed you. If you have a problem with no cleanup at all, you've got fewer usable rows of data and a waste of time, among other things; we don't need that. A checklist is a systematic first approach. The configurable script takes away a lot of tedium and the potential for human error from just missing things. That high-attention, low-intellectual-reward work is really draining; computers aren't haters of data, so make them do it. The R Markdown notebook is easier to use for a lot of our systematic reviewers, including me to some extent, who are less code-proficient. Again, we've only got so much attention to spare; let's save the cognitive effort for doing systematic reviews. Because it's easier, it may be more likely to be adopted widely across the institute, which would lead to a bigger return on investment: more and faster systematic reviews, fewer dropped rows, better data. And of course, you can automate those reports about what the heck happened and how you cleaned this. Thanks for listening. It should be about 4:15 my time in Seattle when you're watching this live, so I will try to be around to answer text questions, but I may not be, and I'll get to them when I can. And again, the code repo is there for you. Thank you for your time.

Thank you, Steph, for a great presentation, and I'm delighted to say she is around for questions at the end, so we'll get to tap you for resources then. We're going to move on to the next presentation by Charis Wong, who's at the University of Edinburgh. So Charis, you can take it away.

Hello, my name is Charis Wong. I'm a PhD student, neurology registrar, and clinical trial fellow based at the University of Edinburgh. Thank you very much for asking me to speak today about my project, which is developing a data-driven framework to identify, evaluate, and prioritize candidate drugs for motor neuron disease clinical trials. As a bit of background, motor neuron disease, or MND, is an incurable and fatal neurodegenerative disease. We only have one approved drug in the UK, called riluzole, which was actually approved back in 1995, and it prolongs life by an average of two to three months only. This is despite many promising preclinical studies and clinical trials in the meantime. So essentially, after 26 years, we do not have any new drugs in the UK. There are several other drugs, like edaravone, approved elsewhere, but most of these have very little or, at best, modest survival benefit. We did a systematic review to try to understand the challenges in MND trials and to learn what we can do better. We broke down the challenges into two broad categories.
The first is the difficulty of designing and delivering trials in this population, and the second is the limitations in our understanding of disease biology, which also contribute to challenges in how we select drugs to take forward to clinical trial. However, there are promising advances in both of these areas. In trial design, we have the MND-SMART trial up and running in Edinburgh and across other sites in the UK. SMART stands for Systematic Multi-arm Adaptive Randomized Trial. This is one of several adaptive platform trials in MND, including the HEALEY ALS platform trial in the States and the TRICALS platform trial planned in Europe. All of these trials are multi-arm trials, meaning they compare multiple drug arms against a placebo, or dummy drug, arm at any point in time. A shared placebo arm means we need a smaller sample size overall to get a definitive answer on whether drugs work. They are also adaptive, meaning that at set analysis stages, drugs which are not effective are identified early and dropped, and drugs which are effective are taken through from stage to stage, including from phase two to phase three seamlessly. Using this design, we're also able to add new drugs to the pipeline to be tested, rather than starting standalone trials from scratch. We therefore now have a way of testing more drugs, quicker, in a much more efficient way. So the question now really is: how do we best select drugs to take forward to clinical trial? There are two ways of thinking about this. One is push trials, where we say the evidence for a drug, say from preclinical mechanistic studies, is so good that we must test it in a clinical trial. Historically speaking, though, MND trials have relied on small and often not reproducible animal studies to inform drug selection, and this has not been very successful to date. That is also likely owing to the complex disease biology, which we do not fully understand. The other way to think about drug selection for trials is pull trials, where we say we have a horrible disease and we need to try something, so what's the best drug we have available to try? To decide on this, we could look at the entirety of the evidence base, including studying different types of data rather than relying solely on small animal studies. So this is the framework that we developed for this purpose. We use different domains of data, shown here in the boxes on the left, to identify, evaluate, and prioritize drugs to take forward to clinical trial. This is a modular framework, which means that in future, if other domains become relevant and available, we are able to add them on. Currently, these are the domains that we are using. First, we've got the published literature, which is informed by ReLiSyR, our Repurposing Living Systematic Review. This is a three-stage systematic review taking into account the clinical literature of MND and other neurodegenerative diseases which may share similar pathways, as well as the animal and cell study literature. We also use data from experimental drug screening, which comes from my colleagues at the University of Edinburgh, who are working on different screening methods and models, including using stem cells derived from people with MND. We use data from these two domains, as well as other data, for example the Target ALS RNA-seq data, to do a pathway and network analysis, thereby identifying pathways and networks of interest, which we can then map to drugs of interest.
We also mine drug and trial databases for data on safety, feasibility, and pharmacology, and we harness expert opinion. We incorporate all of these data from the different streams to generate an integrated candidate drug list. We use an interactive Shiny app, the iCan MND Shiny app, to visualize this and filter drugs according to overlapping categories of interest. This can help trialists prioritize which candidate drugs they should evaluate in more detail. For drugs which they have prioritized for further evidence generation, synthesis, and reporting, we then produce living evidence summaries. These are summaries of the data across the different domains, which we keep continually updated, so that at time points in keeping with trial adaptation, the trialists will have access to current curated content for each prioritized drug. We report this using a Shiny app called the MND-SOLES-CT Shiny app. For the ReLiSyR component, we do a three-part machine-learning-assisted systematic review. First we have a systematic review of the clinical studies in MND and other neurodegenerative diseases which may share similar pathways, then of animal in vivo studies in MND, as well as human stem cell studies in MND. Each of these takes as a starting point an automated living search of PubMed, updated using an API, and SyRF, the Systematic Review Facility platform, to retrieve new publications. SyRF is a free-to-use, bespoke, web-based application for systematic reviews developed by the CAMARADES group. We then use a machine learning algorithm, hosted at the EPPI-Centre, to screen citations for inclusion. Next, using R, we run regular expressions on the included publications to identify the drug and disease studied. We then generate a table listing all the drugs and the number of publications for each disease. We then run this against a second algorithm in R to filter drugs by logic: taking forward drugs which have been described in at least one clinical publication in MND, or which have been described in clinical publications in two or more other diseases of interest. The trialists filter this list further based on biological plausibility, safety, and feasibility. We then annotate and extract data using SyRF, for the three reviews, for all the included papers for prioritized drugs. For the clinical review, we score each drug based on drug efficacy, safety, study size, and quality of studies. To give you an idea of the scale of this review, the top half of this slide shows the systematic review component completed in 2017, which informed the first two arms of MND-SMART. As you can see, we have a large corpus, more than 40,000 publications across the reviews. From there, we identified 146 drugs, which we eventually narrowed down to 22 drugs with favorable clinical and preclinical data. So the reviews form a robust evidence base to inform expert panel discussions on drug selection. However, it is by no means a small undertaking, given its scale. To make it more feasible for the current iteration, we now incorporate automation techniques into our workflow, shown on the bottom half of the slide. The components incorporating automation are color-coded here in pink. As mentioned earlier, we use SyRF for annotation and data extraction, which enables efficient crowdsourcing; we currently have a group of more than 60 reviewers. We use workflows in R for data analysis and scoring, and R Shiny for visualization.
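A rough sketch of the filtering logic described here: count, per drug, how many included clinical publications mention each disease, then keep drugs with at least one clinical MND publication, or with clinical publications in two or more other diseases of interest. The data frame, column names, and drug and disease labels are invented for illustration, and the regular-expression step that identifies drugs and diseases is assumed to have already run upstream.

```r
library(dplyr)

# `pubs` is assumed to hold one row per included clinical publication, with the
# drug and disease already identified by regular expressions run over the text.
pubs <- data.frame(
  drug    = c("drugA", "drugA", "drugA", "drugB", "drugC", "drugC"),
  disease = c("MND",   "AD",    "PD",    "HD",    "AD",    "PD")
)

drug_disease_counts <- pubs %>%
  count(drug, disease, name = "n_publications")

shortlist <- drug_disease_counts %>%
  group_by(drug) %>%
  summarise(
    mnd_pubs       = sum(n_publications[disease == "MND"]),
    other_diseases = n_distinct(disease[disease != "MND"])
  ) %>%
  # Logic: >= 1 clinical MND publication OR clinical publications in >= 2 other diseases
  filter(mnd_pubs >= 1 | other_diseases >= 2)

shortlist
# drugA and drugC pass; drugB (only one other disease) does not.
```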
Next, I'll show you a demo of our Shiny app to identify candidate drugs. Here we can choose categories of interest, for example drugs listed in the drug screening library, drugs which cross the blood-brain barrier, drugs which show a signal in any of our domains of interest, and drugs listed in the British National Formulary. Depending on which categories you choose, the app will plot an Euler plot to show you how many drugs lie in each category, and drugs meeting all of the categories you've selected are tabulated in the table below. So this is one way the expert panel can prioritize which drugs, or which groups of drugs, should be evaluated in more detail. For prioritized drugs, we present a living evidence summary in this Shiny app, called the MND Systematic Online Living Evidence Summary for Clinical Trials, or MND-SOLES-CT for short. First, we have our drug table to summarize the systematic review component; I apologize for redacting the drug names here. As I mentioned earlier, we score each drug according to efficacy, safety, study size, and quality of studies, as well as number of publications, and we rank each drug based on its scores. We're also able to summarize the animal in vivo and in vitro survival and cell death data, respectively, where available. We also use interactive heat maps to visualize all of these data. To provide an idea of what the current data are based on, we have a living PRISMA diagram for each of the reviews, showing how many publications are identified, included, and annotated, and to what degree. We also provide an overview of the quality of studies included for the animal in vivo review, as shown here. For the clinical review, we're able to provide an interactive sunburst plot to give an overview of the studies included; this shows the drugs and diseases studied, the type of study design, and, for interventional studies, what phase the studies are in. For each drug, we are able to generate a drug CV, with data across the different domains summarized across the tabs shown here. For example, this is the clinical summary for pioglitazone. For the animal in vivo review, we're able to select an outcome of interest and an animal model of interest, and the Shiny app will plot a forest plot for that outcome and model for the drug of interest in this area below. We're also able to tabulate and summarize the data from the publications that meet the criteria in the table below. We also link up to clinical trials: we retrieve data from the ClinicalTrials.gov API to list the MND clinical trials for the selected drug, including planned trials, previous trials, and ongoing trials. For pathway analysis, we have some visualizations here, and a STRING enrichment analysis as shown. We're also able to generate all of this information in a timestamped PDF format using R Markdown, which is very useful for record keeping and for anticipating future discussions with trial sponsors and drug licensing authorities. Apologies for not sharing the full version at present, as per the request of our trialists, as we are coming to a trial adaptation epoch. We do, however, have a demo version, which doesn't have all the bells and whistles but should give you a rough idea, and I'll be very interested to hear any feedback and suggestions.
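As a small illustration of the category-overlap view at the start of this demo (an Euler plot of selected drug categories plus a table of drugs meeting all of them), here is one possible sketch using the eulerr package. Whether the app actually uses eulerr is an assumption, and the drug list and category flags are invented.

```r
library(eulerr)
library(dplyr)

# One row per candidate drug, with a logical flag for each category of interest
drugs <- data.frame(
  drug          = c("drug1", "drug2", "drug3", "drug4", "drug5"),
  screening_lib = c(TRUE,  TRUE,  FALSE, TRUE,  FALSE),  # in the drug screening library
  crosses_bbb   = c(TRUE,  FALSE, TRUE,  TRUE,  TRUE),   # crosses the blood-brain barrier
  domain_signal = c(TRUE,  TRUE,  TRUE,  FALSE, TRUE),   # signal in any evidence domain
  listed_in_bnf = c(TRUE,  TRUE,  FALSE, TRUE,  FALSE)   # listed in the BNF
)

selected <- c("screening_lib", "crosses_bbb", "domain_signal")

# Euler diagram of how many drugs fall in each selected category and their overlaps
fit <- euler(drugs[selected])
plot(fit, quantities = TRUE)

# Table of drugs meeting *all* selected categories, as in the app's lower panel
drugs %>% filter(if_all(all_of(selected), identity))
```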
That's all from me today. Thank you very much for listening, and thanks to my supervisory team, Professor Malcolm Macleod, Professor Siddharthan Chandran, and Professor Neil Carragher, to the CAMARADES group, the MND-SMART drug screening group, and the ReLiSyR-MND consortium. Thank you very much for listening.

Thank you, Charis, and we're looking forward to seeing the full version live when it's available, obviously. So now we're going to turn over to our next and final presenter before we head off for questions: Vicente Ramirez, you can take it away. And, sorry, Vicente is from the University of California, Merced.

Hi, my name is Vicente Ramirez. I am a PhD candidate at the University of California, Merced, in the Department of Public Health. I'm presenting a talk titled Sniffing Through the Evidence: Leveraging Shiny to Conduct Meta-Analysis on COVID-19 and Smell Loss. A quick recap: the SARS coronavirus 2 appeared in late 2019 and began to spread globally in early 2020. With the start of the global pandemic, countries worldwide declared shutdowns and mandatory self-isolation. Reports around this time began to emerge of strange symptoms occurring alongside this virus. These symptoms often occurred with other respiratory viruses, but the magnitude to which they occurred in COVID-19 patients was cause for alarm. A lot of light was shined on smell loss during this time. Smell loss occurred with other respiratory viruses, but it was not as prevalent as it was in COVID-19 patients, and it was occurring without nasal congestion, which was unusual. We could talk about the chemical senses, but let's not get too deep into them. I think the most important thing to realize here is that the way we describe our senses is a little bit different from the biological definition. For instance, we know that flavor is composed of olfaction, gustation, and chemesthesis: that is, our sense of smell, our ability to taste, and the general chemical sense that makes up things like spiciness, or menthol and cooling, or numbing. We'll say something like, that tastes really spicy, or we'll have a dessert that has a little too much orange zest in it and say, oh, that's too citrusy for me, it tastes too citrusy. When I say it tastes too citrusy or it tastes too spicy, I'm using both of those terms incorrectly. But it is common in the English language to describe both of them as taste, because what we experience as flavor we describe as taste. And so this does not match the biological definition or the medical definition. It's important to consider this when a patient is self-reporting their symptoms and saying, I lost my sense of taste. It could be that they've lost their sense of smell, a major component of flavor, and it is mistaken for taste loss. What we do know is that COVID-19 affects all three of these senses. When we started our analysis, we aimed to examine two main points: what is the burden of COVID-19 on the senses, specifically on smell? And is there a difference in how researchers are measuring sensory loss? We collected papers from research databases using a search strategy consisting of keywords like anosmia, smell loss, SARS-CoV-2, and COVID-19. We set out specific criteria used to eliminate papers from our analysis: was there data available? Was a PCR test used to diagnose COVID-19? And how did they recruit participants? Two of our co-authors then qualitatively rated each study based on criteria which measure the level of bias for each study.
This risk-of-bias assessment was done to make sure that studies were representative of the populations we wished to measure and that there wasn't any inherent bias in the methodology. We found that 6 of 34 studies used direct, or objective, measures to measure smell loss. These objective measures can include things like smelling a series of chemicals and identifying them; they're commonplace in diagnosing olfactory disorders and are used in both clinical and research settings. 28 out of 34 studies used more subjective measures, which include self-report, questionnaires, phone interviews, and surveys. Our random-effects meta-analysis gave a pooled estimate of 77%; that is, three in four patients with COVID-19 will lose their sense of smell. This is for the direct measures. For the more subjective measures, we have 44%. This is a huge discrepancy between the two, but what is clear is that COVID-19 is causing smell loss at magnitudes several times higher than seen with other respiratory viruses. We can take the direct measure at face value, since it's used to diagnose olfactory disorders in other settings. We assume that 77% of people are losing their sense of smell when infected with COVID-19, and we can also assume that only about two thirds of those suffering from COVID-19 recognize their symptoms; that is where this 44% comes into play. Smell loss is rather common in COVID-19. This can be seen in our meta-analysis and throughout the literature. It's less common in other respiratory viruses; it exists in influenza, but nowhere near this level. In fact, there's a paper here with the title Recent smell loss is the best predictor of COVID-19 among individuals with recent respiratory symptoms. This alludes to the fact that smell screens, which can be fast, easy, cheap, and accurate, could quickly be deployed at airports, at work, and at concerts. They might be able to catch more COVID-19 than current preventative measures, like looking at temperature. It's also worth noting that the level of symptoms has changed between variants. Here we see a report made in December 2021 comparing the Delta and the Omicron variants. We see that there's a significant difference in reporting of loss of taste and loss of smell, with the Omicron variant reported at much lower levels. As we were conducting our meta-analysis, we realized that papers were being released almost daily. The literature was moving fast, and so we needed to move with it. We decided to make a web dashboard, which can constantly be updated and whose findings can be disseminated to our research community. It needed to be cost-efficient and not require too many technical skills, as my technical skills were not as great as they are now, so that ruled out using AWS services or a personal server; at the time, the skills needed to do that were going to be a hurdle. In fact, this was my first dive into Shiny applications. We anticipated hundreds of visitors per day in the beginning, which didn't in fact happen, so it had to be able to scale up. We quickly found a solution. Flexdashboard provided an easy-to-use solution for making web dashboards in R, and they can be implemented in Shiny so that they're interactive. This is done without much of a learning curve; in fact, it's like writing any other R Markdown file, which I was already familiar with. Google Sheets, and the API provided by Google, provide an easy-to-manage data solution, which we can call on to constantly update our data.
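A minimal skeleton of the setup being described: a flexdashboard R Markdown file whose data chunk re-reads a shared Google Sheet, so edits to the spreadsheet flow into the dashboard. The sheet URL and column names are placeholders, and googlesheets4 is an assumption about which package is used to call the Sheets API.

```r
# ---- excerpt from a flexdashboard .Rmd (YAML header shown here as comments) ----
# ---
# title: "COVID-19 and Smell Loss"
# output:
#   flexdashboard::flex_dashboard:
#     orientation: rows
# runtime: shiny
# ---

library(googlesheets4)   # Google Sheets client (assumed; other packages would also work)
library(dplyr)

gs4_deauth()             # assumes the tracking sheet is shared as view-by-link

sheet_url <- "https://docs.google.com/spreadsheets/d/PLACEHOLDER"   # placeholder URL
studies <- read_sheet(sheet_url) %>%
  filter(!is.na(cases), !is.na(subjects))   # keep rows with usable counts (assumed columns)

# `studies` then feeds the meta-analysis, tables, and maps in the tabs below.
```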
This is a nice solution because we can easily edit our data through a spreadsheet, and we were editing our data almost daily. Shinyapps.io also provided an easy-to-use solution to host the application; it was both cost-efficient and required almost no technical skills to get running. We were also super grateful to RStudio, as they provided us with over a year of hosting for free because we were hosting a coronavirus-related web application. This was really helpful, considering that I'm a graduate student. Before we jump into the Google Sheet and the COVID-19 dashboard, I want to acknowledge our co-authors. Danielle Reed is the PI of the study and the associate director of the Monell Chemical Senses Center. Mackenzie Hannum is a postdoc in Danielle Reed's lab and co-first author of this paper. Sarah Lipson, our artist scholar, Riley Kerriman, Sarah Marks, and, on the next page, Riley Koch helped to review papers, maintain our database, and really carried a lot of the groundwork that was done; the study would not have been done without the constant, daily work of these members. Paule Joseph is a researcher and clinician who has helped us understand the clinical implications here. And Cailu Lin is a co-author who helped review and validate the analysis prior to publication. We collected relevant information on the number of cases, the number of subjects, the subgroup, whether or not the study was using objective or subjective measures, and then we took down notes on each study. When a new study was added, it was added as a new row. If a study was to be included, it needed to be checked by two of our co-authors; here, "checked final" denotes this. If a study was excluded, it needed to be checked by two of our co-authors as well. For example, here we see a study was excluded because the methods accrued patients with previous olfactory disorders, and here a study was excluded because there was no data available. We can take a look at our dashboard. The homepage gives a quick summary of what can be found on the dashboard; each of these icons is described and corresponds to the tabs up here. We use the meta package to run and visualize our meta-analysis. We can see that there's a difference between our 34-study analysis and this current analysis: the random-effects estimate for objective measures has decreased to 67%, and for subjective measures it has decreased to 40%. We can also visualize where studies are coming from. We've used ggplot2 and ggplotly, which uses the plotly library, to create an interactive map. We can zoom in, we can hover over each of these, and we see that an abundance of studies comes from Italy, the US, Turkey, India, France, Spain, Germany, and China. We also used htmlwidgets to include interactive tables. Here we have a table of each of our included studies, which we can sort by the number of subjects included or by whether the study was objective or subjective. Here we have a list of our objective studies; we include DOIs so that users of our dashboard can quickly access these studies. Excluded studies are also important. Just because they were excluded, whether because they didn't have data or because there was not a COVID test, they are still important for other questions at hand, so we opted to include them. We let readers see why each study was excluded, along with the DOI, so viewers can access them.
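Since the dashboard uses the meta package, here is a small sketch of how a pooled prevalence like the ones quoted (a random-effects estimate of the proportion with smell loss, split by measurement type) can be computed and drawn as a forest plot. The numbers are invented, the subgroup handling is simplified, and the argument names assume a recent version of meta.

```r
library(meta)

# Toy data: patients with smell loss (events) out of n assessed, per study
dat <- data.frame(
  study   = c("Study A", "Study B", "Study C", "Study D"),
  events  = c(45, 120, 30, 60),
  n       = c(60, 150, 50, 100),
  measure = c("objective", "subjective", "objective", "subjective")
)

m <- metaprop(
  event    = events,
  n        = n,
  studlab  = study,
  data     = dat,
  sm       = "PLOGIT",    # pool logit-transformed proportions
  common   = FALSE,
  random   = TRUE,        # random-effects pooled estimate
  subgroup = measure      # objective vs subjective measures
)

summary(m)   # pooled proportions overall and by subgroup
forest(m)    # forest plot like the one on the dashboard
```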
We have the source code for the original analysis on GitHub, and we include our contact information as well as a little background on each of us. And of course we give the user of the dashboard the ability to share to social media platforms; here we have Twitter, Facebook, Google+, LinkedIn, and Pinterest.

Thanks, Vicente, for sharing that really interesting project and the ways your team addressed it. I know we'll have some questions from Twitter around your integration of those different platforms, so thank you. We do have time for some questions now, so I'm going to encourage everyone to submit questions as you think of them, and of course if you think of them later we can answer them later. I did want to start with a question for Steph from Matt Grainger on Twitter. This is actually a question I was really thinking about, because I think the work that you presented really addresses a key issue in this field, which is: what do you do with such messy data, especially if you're taking over at the evidence synthesis stage, working with someone else's data, and trying to figure out how to make it usable without too many headaches? And so Matt asked: what would help, if anything, at an earlier stage to sidestep the need for so much data cleaning, so that there isn't that headache at the end?

There are things that we do, such as putting rules on which columns can have a null value and which can't, and there's data validation that you can build into the Excel file we use, to allow certain values or not, but you can override those accidentally or on purpose. One of the things that makes the data messy is that as you're doing the extraction, you learn about your data set, the data available, and what people are reporting, and so you might come up with new columns that you need to fill in, or you may figure out, oh, we need to change how we're doing this. The important thing is to go back and fill those in, and it's hard to go backwards. It's much easier to just keep going forward, to keep reporting to whoever you're reporting to that I got this many more done. But you have to go back, whether you do it at the end in a massive cleaning push when you get a list of the rows that need help, or whether you do it when, oh, I just had this meeting with the person who's going to be using the data and we changed our mind about how we're handling this one inclusion or exclusion criterion. Go back and fix it; take the time to go back and fix it. But the real answer is that it takes time.

Yeah, and that definitely pulls in the theme of what Terry talked about, about having a good team and really having good leadership of that team and good oversight, so that folks feel like they can have those conversations as a review is going on, and the data can be cleaned in an iterative process.

And good communication within the team, so that it's perfectly fine to say, oh wait, I've got to take some hours and go back; we just made this decision and I've got to apply it retroactively to the previous rows.

Right. And I think, as a leader of a project like this, it means emphasizing to your team members that you're learning as well in this process, and that it's okay to make mistakes; you go back and fix them, we talk about them, and we move forward.

We have a procedure that says after you've extracted 10 papers, go have a meeting with the modeler, talk about it, and learn from it, and then do it again after 30 papers.
So you're still learning, by 30 papers, what the data landscape is like.

Yeah, definitely an iterative process for sure. Thank you, Steph. Charis, we have a question for you, actually, about your experience and what it's like working on a topic where transparency may not always be feasible or may not always be encouraged. And I should also mention that you all as panelists can jump in with questions and further discussion; I don't have to take over.

Yeah, I agree that's tricky, with intellectual property being an issue with clinical trials, so we are still working on it. Certainly from my point of view, and from my primary supervisor's point of view, we are hoping to build a policy where we say the work is funded from the funding pot for the trial, so I think it is reasonable that they have a stake in the data; but at some point there will be too much data for us to be using as a group, and we need to be releasing it to maximize the impact of what we have accumulated. So we are hoping in future to say we would release the data fully, perhaps with a bit of a lag, perhaps six months to a year behind what we have accumulated. So far that hasn't been set in stone yet, but that's certainly what we're working towards. But I think, from a more general perspective, it is a challenging area, especially when we are working across different groups and there's certain data from other groups which is given to us for internal purposes rather than external. So it's about making sure there are good agreements between the different groups working together on what is shareable and what is not, and at what stage we'll be planning on sharing the data. We are striving to make it more open and transparent, as I said, to maximize the impact of what we have.

Yeah, that makes a lot of sense, and it sounds like negotiating and navigating that ahead of time, before you've collected the data, can be a key part of making sure that's possible. Steph, I noticed you nodding along; did you have something to add to that, this question about data sharing? No? Okay, great. Well, I have another question. This one is for Vicente, really around your experience with learning these tools and using them to bridge the learning barrier from R to Shiny specifically, and what your experience has been like with that process.

Yeah, so I'll first point out that my experience probably is not unique. There are a million tools out there, and a growing list of tools, and they all have their unique capabilities and unique strengths. I mean, I jumped in. I thought that the R community in itself does a wonderful job at documenting these tools and providing these tools. I'm repeating myself, but really, there's just a myriad of tools out there, and you can do a lot of creative things with them. In the grander scheme of things, this was something rather small that we did, but it shows how these different sets of tools can be put together to create something new. In our experience, we were faced with a problem where data was coming out almost daily on a problem that was really, really new; this was four months into lockdown. But I think the same approach can be applied to a problem where data is not coming out very often, where reports come out once or twice a year, a very slow-moving problem.
And so an update to the broader community can be useful as well. And knowing that there are R tools out there to facilitate this very easily, even for people with very little technical skill and very little R background, is just wonderful.

I think Emily has had a temporary glitch with her broadband. So, while she unfreezes... oh, she's back. I'll hand back over to you, Emily.

Thanks, Neil. Sorry, I panicked a bit there. So I missed, unfortunately, Vicente, the end of your response there. But I think one of the things you were highlighting is something that Terry also mentioned in her opening, and that's really this need to be creative and think outside the box with these tools. And I find in all of these projects that you have talked about, and in the earlier session, it's identifying a gap: how can we use what we already have to address the gap and push the field further, and then do the work and share it with folks? That's just so incredible; you learn a lot just by doing that and by sharing that hard work and effort. So, I don't know if other questions came through when I dropped my internet. Let's see. Neil had a question about pipelines, which relates to that point: what can we do as a community, as an evidence synthesis community, to make it easier to fit together all these evidence synthesis tools, to start to connect the dots between these different programs, and to break down silos between disciplines and between packages and programs? Anybody can answer that.

Well, in a sense, isn't that what we're doing here? I mean, I come from public health and I'm a geneticist; I think there are ecologists here and sociologists and everything, and we're facilitating that right now. I think this type of discourse is really where we get at it. I was grateful to have been contacted by one of the organizers to present on this topic. It was a little intimidating to think that everyone here had a math background, for instance, or a background in evidence synthesis, and I just come from outside that. But I think this type of discourse and these types of events are really what is going to address that.

From my point of view, I agree with you in terms of my background: I'm a clinician, and I hadn't been into evidence synthesis, or even using R, before I started my PhD. And actually now there are so many tools available, but, as you say, they're kind of in separate areas, and learning R is a great way of harnessing the power of all of these different tools. I'm also working with the CAMARADES group, which is based in Edinburgh but is a global network, and they are working at developing systematic review tools and evidence synthesis tools. The Systematic Review Facility (SyRF) platform I mentioned earlier is one you can use; anybody can use it, and you don't need to be in a specific field to use it. It's got quite a lot of functionality already embedded in it, but the longer-term ambition is to try to incorporate some of the tools that we mentioned earlier.
Some of that we are doing from the back end, but in the future we're hoping that things like machine learning algorithms for citation screening might be built in, or going from a systematic review to a visualization coming out of a Shiny app. Perhaps in the future that will be a workflow where one does not need to know all the underlying programming in the back end; perhaps that would all be embedded in SyRF. I know that's a longer-term ambition, and it is ambitious, so we shall see.

That would be incredible. Yeah, thanks for sharing that idea; that sounds amazing. Well, I'm looking at the time, and it looks like we're just about nearing the end of our session. I don't see any other questions, so I think what we may do is just direct people to ask questions on Twitter and interact with the videos; they'll be uploaded shortly. Thank you all for presenting, for being willing to discuss your work, and just for doing this work and sharing it with all of us. We look forward to seeing everyone.