Hello and welcome to ESMARConf 2022 and the "Review Processes from A to Z, Part One" session. We're delighted to have you here today. As always, the session is being live streamed to YouTube, and the individual presentations have been pre-recorded and published there as well. Subtitles have been verified and can be auto-translated for those individual talks, and automatic subtitles will be available shortly for the live stream. If you have questions for our presenters, you can ask them via the presenter's individual tweet from the @ESHackathon Twitter account; you can see that in our feed and below on the slide. Presenters may have time after their talks to answer some of the questions, or at the end of the session if time allows, and we will endeavour to answer all questions soon after the event. We would also like to draw your attention to our code of conduct, available on the ESMARConf website at esmarconf.github.io. So I'm really delighted to welcome our first speaker, Bronwyn Hunter from the University of Sussex. Bronwyn, over to you.

Hello and welcome to my talk for ESMARConf 2022. I'm Bronwyn Hunter, a PhD student at the University of Sussex, and today I'm going to be presenting and demonstrating how we can use transfer learning to facilitate rapid text classification in R. My PhD work is on the synthesis of different data sources, such as social media and academic literature, to understand large-scale patterns of wildlife exploitation.

So why use automated text classification? Identifying relevant data sources to include in evidence synthesis, particularly when looking at different sources such as academic literature and grey literature, can be one of the most time-consuming stages, and given increasing publication rates this can create barriers to evidence synthesis. As a result, researchers are increasingly using automated text classification methods, often based on machine learning, to conduct the article screening stage. In addition, some of the more state-of-the-art machine learning algorithms can achieve performance comparable to manual labelling. Although machine learning is increasing in uptake, some of the more advanced techniques often require specific programming skills, particularly in Python. They also often require large amounts of training data, which have to be labelled manually, to achieve high performance. So in this presentation I'm going to talk about how we can adapt some of these approaches that are generally used in Python for use in R, and how we can use something called transfer learning to reduce the amount of manually labelled data we need to build these text classifiers.

There are many different machine learning models that we can use for text classification: random forest, logistic regression, naive Bayes, k-nearest neighbours, support vector machines, and then some of the deep-learning-based methods like neural networks and transformers. Given the number of different models available, it can be difficult to know which one to use. Today I'm going to be talking particularly about transformers, and hopefully I can convince you that even though these models are quite complex, how we use them is actually a conceptually appealing approach. Why use transformers? To illustrate why we might want to use these more complex transformer-based models, I think it's first useful to think about some of the pitfalls of the other models I've mentioned.
Some of the simplest text classification models, such as naive Bayes, look only at word frequency to make a decision about, for example, the topic or the sentiment of a text. Whilst this can work quite well for longer texts, and it has been successfully applied in some evidence syntheses, because word order isn't taken into account these methods often fall down for shorter pieces of text, and in evidence synthesis we often only have the title or the abstract of a paper. In contrast, more recent text classifiers use what we call recurrent neural networks. These models are based on the logic that a word's meaning is a function of its context. In these deep learning models, words are fed in sequentially, and what we call the hidden state, represented here by these circles, is a function of the hidden state of the previous word, such that the hidden state of the final word contains information from all of the words in the sequence. This sequence representation can then be fed into a classification decision. As you can see from this illustration, the representation of the question mark, the final word in the sequence, contains rather little information from the first word, "what". In essence, these models struggle to represent relationships between words that are far apart in a sequence, and in practice, because words need to be fed in one by one, they can be quite slow to train.

So now that we've thought about some of the other models we could use, what is the transformer model approach? Whilst I don't have time to go through the full architecture, which is represented by the diagram on this slide, I will highlight some of its key features. Firstly, in contrast to recurrent neural networks, texts are fed in as a whole rather than sequentially, meaning that these models are much quicker to train. The key feature, though, is what we call self-attention, and this self-attention mechanism is part of each of the encoder blocks in the model. Essentially, self-attention looks at an input sequence and decides at each step which other parts of the sequence are important. So in this example, "the boy is holding a blue ball", we know that "holding", "blue" and "ball" are all related to each other, but the word "blue" is not actually related to the word "boy". The self-attention mechanism is able to learn these associations between the words in a text and hopefully build a more realistic representation of natural language.

Since the introduction of transformers back in 2017 there has been a proliferation of models that make use of these self-attention mechanisms to build models of natural language. One such model is called BERT, and this is the one I'll demonstrate the use of today. It was first introduced by Google, and it stands for Bidirectional Encoder Representations from Transformers. What this model does is take the encoder blocks from the original transformer model and stack them on top of each other. The commonality between all of these models that make use of transformers is that they take a pre-training and fine-tuning approach when applied. Essentially, these models are pre-trained on a large body of text, which helps to build a representation of natural language. BERT, for example, is trained via masked language modelling, where words in a sequence are masked and the model has to predict which words fit in the sequence, as well as next sentence prediction.
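To make that pre-training objective concrete, here is a minimal sketch of masked language modelling in action, run from R via reticulate and the Hugging Face transformers library (the setup is covered later in the talk); the sentence and the choice of "bert-base-uncased" are just illustrative assumptions.

```r
library(reticulate)
transformers <- import("transformers")

# Ask a pre-trained BERT model to fill in a masked word; this is the same
# objective the model saw during pre-training.
fill_mask <- transformers$pipeline("fill-mask", model = "bert-base-uncased")
fill_mask("The boy is holding a blue [MASK].")  # returns the most likely tokens, e.g. "ball"
```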
So once that model has been trained on a large body of text, we can then adapt it for use on a range of different tasks in natural language processing, one of which is text classification. In essence, the learning from the pre-trained model is transferred to the application, and that's why we call this transfer learning. Hopefully the previous slides have given you a little bit of understanding of transformer-based language models and their key features. Now onto the important part, which is how we use them in practice, and whilst the models are quite complex in themselves, how we use them has been made really simple by Hugging Face, via a Python library called transformers. Hugging Face hosts a repository of these pre-trained language models. We can take one of these models off the shelf and adapt it for whatever task we want to do. In the case of text classification, we can take BERT, for example, obtain a representation of our text via the [CLS] token from BERT, and then add a classification layer, which is what makes the decision as to whether that text is relevant or irrelevant. So say we have a collection of abstracts that we've downloaded from Web of Science: we can take a subset and label them as relevant or irrelevant based on our inclusion criteria. The training data are then used to fine-tune this whole architecture, and the testing data are used to assess how well the classifier is performing. If it performs well, we can then feed in the rest of our data and use the classifier to obtain our final set of relevant abstracts. One thing I like about this approach is that there are loads of different pre-trained models to choose from, some of which have been trained on domain-specific corpora. For example, SciBERT is a BERT-based model that has been pre-trained on a body of scientific literature, so that learning from the pre-training can then be transferred to the classification.

How well does this approach actually perform? Here I've shown some results from my own PhD work, where I was comparing the performance of different machine learning approaches in classifying academic abstracts for relevance. I looked at naive Bayes, which I mentioned earlier, a simple feed-forward neural network, and finally our fine-tuned BERT model. The graph on the left shows F1 score, which is a function of both the precision of the model and its recall, that is, how many of the actually relevant abstracts the model retained in the final dataset. As you can see, at a relatively small training size BERT achieves higher performance than the other models, and particularly when we look at recall, BERT is by far outperforming the other models and is actually retaining 97 percent of the relevant abstracts in the final dataset. So I hope this is a fairly convincing example of where BERT can achieve high performance without the need to label thousands and thousands of data points.

Whilst Hugging Face has made it really easy to use these transformer models in Python, there aren't equivalent tools for using transformer models in R. This is where the reticulate package comes in. Reticulate allows us to use Python libraries in R, so if we're doing the other stages of evidence synthesis in R we can streamline our analyses. Before you get started with reticulate, you need to make sure that you have Anaconda installed on your computer, and this will allow us to make use of Python.
You'll also need to set up a virtual environment that has either TensorFlow or PyTorch installed, and the fine-tuning of some of these models does require you to have a GPU available. Now I'm going to run through some code examples. These are just snippets; the full code is available on my GitHub repository, but here I'll illustrate some of the key parts. Once we've loaded reticulate, we can install our desired Python libraries, in this case the transformers library, using py_install(). Then, using the import() function, we can import that library into our R environment. Once we have the transformers library imported, we use the dollar-sign operator to access the models and methods within it. To load our tokenizer, which splits the text up into tokens (words or word pieces), we call BertTokenizer$from_pretrained() with the name of the model we want to use. Then we can also load the model to be fine-tuned, and here we're using BertForSequenceClassification; so the architecture I illustrated earlier, the BERT model with a classification head, is available as a whole model that we can download and fine-tune. The transformers library has what it calls a Trainer, and this is the method we're going to use to fine-tune our model, but first we need to set our training arguments. Most of these I'm leaving at their default values, but the key ones to set ourselves are the output directory, where the final model will be saved, the number of training epochs, the training batch size (the number of training examples loaded into the model at once) and the evaluation batch size. Again, we're calling methods from within the transformers library using the dollar-sign operator, so transformers$TrainingArguments. Next we initialise our trainer by calling transformers$Trainer, passing the model we loaded earlier, the training arguments we set on the previous slide, the training and testing datasets, and the method we're going to use to evaluate model performance; this is defined as a separate function, which you can see in the full code on my GitHub. We then create the trained model by calling trainer$train(), and once the model is trained we can use it to generate predictions: using the predict function we get a score for each class, and then using argmax we get the class with the highest probability. Those are the most important functions, and hopefully you can see that with just a few lines of code we can train a really high-performing model for text classification.

Thank you for watching the presentation. I hope it's given you a little bit of an introduction to transformers and how we can use them for text classification in an evidence synthesis context. If you want to know more about any of the models or concepts I've talked about, I've linked here a few blog posts and videos that provide nice introductions. I've also included a QR code with a link to my GitHub repository, where you'll find the full code and also a couple of datasets that I used to generate these examples. Thanks very much.
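For reference, here is a condensed sketch of the workflow Bronwyn walks through above. It is an approximation rather than her actual script (that is linked from her GitHub repository): the environment name, output directory, and the pre-tokenised train_dataset, test_dataset and unlabelled_dataset objects are assumptions, and the evaluation-metrics function she mentions is omitted for brevity.

```r
library(reticulate)
# use_condaenv("transformers-env")  # hypothetical conda environment with PyTorch installed
py_install("transformers", pip = TRUE)
transformers <- import("transformers")

# Tokenizer, and a BERT model with a classification head (2 classes: relevant / irrelevant)
tokenizer <- transformers$BertTokenizer$from_pretrained("bert-base-uncased")
model <- transformers$BertForSequenceClassification$from_pretrained("bert-base-uncased",
                                                                    num_labels = 2L)

# Key training arguments; everything else is left at its default value
training_args <- transformers$TrainingArguments(
  output_dir = "bert_screening_model",
  num_train_epochs = 3,
  per_device_train_batch_size = 8L,
  per_device_eval_batch_size = 8L
)

# train_dataset and test_dataset are assumed to be tokenised, labelled datasets
trainer <- transformers$Trainer(
  model = model,
  args = training_args,
  train_dataset = train_dataset,
  eval_dataset = test_dataset
)
trainer$train()

# Predict on the remaining, unscreened abstracts and keep the highest-scoring class
preds <- trainer$predict(unlabelled_dataset)
predicted_class <- apply(preds$predictions, 1, which.max) - 1  # 0 = irrelevant, 1 = relevant
```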
Thank you, Bronwyn, for a really excellent talk. I'm delighted with the level of documentation that you shared with us, and really excited to learn more about the approach. We're actually going to move on to our next speaker and hold the questions until the end. Elina Takola from Friedrich Schiller University Jena will be giving our next presentation.

Hi everyone, welcome to my talk. My name is Elina Takola and I'm a PhD student at Friedrich Schiller University Jena in Germany. Today I'll present our attempt to create a synthesis of the ecological niche concept using the research weaving framework. The ecological niche is one of the most fundamental concepts in ecology. It is based on the very old idea that each organism occupies a distinct place in the environment. However, the study of the ecological niche in practice has proven to be complicated. How do we identify a species' place? Is it the species' habitat? Is it its functional role in the community? Or is it some mathematical construct, like a multi-dimensional hypervolume? Since its introduction into the scientific literature at the beginning of the 20th century, the concept has been studied through many different prisms. In a recent review we wrote, we included a non-exhaustive list of more than 30 definitions of the ecological niche that we found in the literature. It is clear that the ecological niche is a very diverse concept which spreads over multiple fields of ecology, and there is clearly an ongoing scientific debate regarding its components. For this reason we decided to use the research weaving framework, because it allows for both evidence and influence synthesis, in order to explore and map the diverse literature on the ecological niche. Thankfully, the authors of research weaving provided a list of recommended tools for each part of the analysis, which came in handy when planning our project. So, in short, our aim was to quantitatively analyse, in a reproducible way, the ecological niche literature and identify temporal trends.

Here are the components of the research weaving framework. This is a list, and each analysis does not depend on the previous or next one; each step is independent. In the next slides I will go through this list and explain what we did in every step of the analysis. First of all, we compiled a dataset of more than 30,000 publications along with their metadata. Then we had to identify the study species. Doing this manually for 30,000 publications is a very painful process, so we decided to use an algorithm called GNfinder. The mechanism of GNfinder is based on the assumption that all taxonomic names, classes, orders, families and so on, start with a capital letter. The algorithm has two steps: first it identifies all words beginning with a capital letter in a text, and then it compares the extracted words against two online taxonomic databases, Encyclopedia of Life and NCBI. If the word exists in one of those databases, then it's a taxonomic name; if not, it's considered a normal word. Here you can see an example with an abstract from our dataset. GNfinder would first extract the words in bold, and then, after comparing the results with the databases, it would keep the green words and remove the yellow words. The performance of the algorithm was very good: only five percent of the words in the final output were not taxonomic names. Then we classified the studies according to their type; in particular, we had to identify whether they were experimental, observational, theoretical or meta-analysis studies.
I have to admit that this is the only part of the research weaving framework that we haven't fully automated. We couldn't find an algorithm that can make an inference about the methodology of a study, so we resorted to an artificial intelligence online platform called Riot, which we used to label our abstracts one by one; so if you have any idea of how we can avoid this, please drop me an email. The good thing about this, though, is that we can use more detailed labels, and more importantly we can classify the studies according to the source of their data, be it databases, fieldwork, simulated data and so on.

Next up is the part on temporal trends. This step requires a plot of the number of publications per year. As you can see, ecological niche studies show an exponential increase over time. Since our dataset extends over a period of almost 100 years, we decided to break it down into smaller subsets. Every analysis you see in this presentation has been run once for the complete dataset and once for each 10-year subset. This allows us to identify temporal trends in all the components of the research weaving framework.

Moving on, we have the spatial patterns. For this step we used the bibliometrix R package to extract country names from the affiliations of the authors. With this information we can construct networks of collaborations between countries and identify clusters. However, due to phenomena such as helicopter science, the affiliations of the authors do not always correspond to the places where the studies actually took place, so we decided to take this one step further and create a process to identify study areas. The identification of country names in the abstracts can be easily done with the help of the stringr and maps R packages, but that was not enough. Quite often the authors, instead of mentioning specific countries, mention biogeographical regions; for example, a study might have taken place in the Amazon rainforest, the Alps or the Mediterranean. For this reason we decided to use text mining again to identify location names in the abstracts. This is still work in progress; the goal is to build a global heat map in order to identify underrepresented study areas in ecological niche research.

The next component of the research weaving framework is content analysis. Here we used topic modelling, which is a text mining method, to identify conceptual topics in the abstracts of our publications. Essentially these topics represent research communities within the ecological niche literature. We identified 10 topics, and here you can see their evolution over time. Next up is the analysis of terms. Terms refer to the keywords of each publication, and luckily the keywords are included in the metadata downloaded from online databases such as Web of Science or Scopus. Using their co-occurrence frequencies we can build networks like these ones, where each node is a keyword and the edges indicate the co-occurrence of two keywords. On this slide you can see such networks created with the bibliometrix and igraph R packages. Furthermore, in order to quantify temporal trends, we calculated network indices for each 10-year subset, which I won't explain in detail today due to limited time. We created similar networks and calculated their indices for the next two components as well. So, similarly to the term analysis, we created networks based on co-authorship patterns; here each node is an author and each edge indicates shared authorship.
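As a rough illustration of how networks like these can be assembled in R, here is a minimal sketch assuming a Web of Science export saved as "niche_wos.bib" (a hypothetical file name); it builds a keyword co-occurrence network with bibliometrix and igraph and computes a few simple network indices of the kind Elina describes comparing across subsets.

```r
library(bibliometrix)
library(igraph)

# Read a Web of Science BibTeX export into a bibliometrix data frame
M <- convert2df("niche_wos.bib", dbsource = "wos", format = "bibtex")

# Keyword co-occurrence matrix, then an undirected weighted graph
mat <- biblioNetwork(M, analysis = "co-occurrences", network = "keywords", sep = ";")
g <- graph_from_adjacency_matrix(mat, mode = "undirected", weighted = TRUE, diag = FALSE)

# Example network indices that could be compared across 10-year subsets
edge_density(g)
mean(degree(g))
transitivity(g)
```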
Again, we calculated the network indices for each 10-year subset to show the temporal trends of co-authorship patterns. Here you can see the publication networks, which were created based on co-citation patterns. Co-citation is defined as the frequency with which two documents are cited together by other documents, so here each node is a paper and each edge connects two papers that tend to be frequently cited together. As with the previous analyses, we calculated the network indices for each 10-year subset. To sum up, we tried to apply the research weaving framework to the extensive literature on the ecological niche concept. We used many of the tools recommended by the authors, as well as additional text mining algorithms, and we conducted eight different analyses in order to construct a conceptual map of the ecological niche concept. The ultimate goal of this project is to bring everything together into a nice automated workflow. Stay tuned for the preprint, and thank you very much for your attention.

Thank you, Elina, for that wonderful presentation. We're again just going to go straight on to the next presentation and hold the questions for the end. So I'd like to introduce our next speaker, who is our fearless leader for this conference, Neal Haddaway. It's great to hear you talking about your own work, and you'll be talking from the Stockholm Environment Institute, so we're delighted to have you here. Thanks, Neal.

Hi, and thanks for coming along to my talk. Today I'm going to talk to you about citationchaser, which is an R package and Shiny app for forward and backward citation chasing. My name is Neal Haddaway, and if you want to follow me on Twitter or ask me a question you can find me at @nealhaddaway, and you can also find me under the same name on GitHub. Before I get started I just wanted to introduce the team. I've been joined by Matt Grainger and Charles Gray, who both helped with coding on the package, and Matt Grainger in particular helped to design the network visualisation functionality in the Shiny app. I also wanted to give a special mention to lens.org, who provided a token for long-term access to the lens.org database through its API. They've been incredibly supportive, so thanks to them.

So what is citation chasing? Well, when we're thinking about how to retrieve articles for our evidence synthesis or meta-analysis, as well as searching bibliographic databases and grey literature for information, we might want to make use of the reference lists of a set of relevant articles, which might hold other relevant information for our review, and also of the articles that cite that set of relevant records; that's what citation chasing is. Backward citation chasing is looking through the reference lists of our articles, and forward citation chasing is looking at which articles cite our relevant articles, to see if there's potentially more information that we might have missed to bring into our synthesis. Other names for citation chasing include citation searching, citation tracking, snowballing, pearl growing, footnote chasing, reference scanning, reference checking, bibliographic checking, citation mining, reference harvesting and many more. Some of these names refer specifically to a particular direction of citation chasing, some are more general, but from now on we'll stick with citation chasing.
When we're going to perform citation chasing, we can start from a number of different points. For example, we could wait until we've screened our articles and use the final set of included studies to perform forwards and backwards citation chasing. We might also, along the way, find a list of relevant reviews that we won't include in our synthesis, but whose references we might want to scan, and see who has cited them, to find other relevant records. And we could also start out with the list of articles that we used to test our search, what's known as a benchmark list, and use that to perform citation chasing.

Something else worth mentioning is how citation chasing is performed at the moment in some of the best, gold-standard systematic reviews. Unfortunately, it's not great. We had a look at reviews published by the Collaboration for Environmental Evidence, and out of 16 published in the last couple of years we found that 63 percent had performed backward citation chasing but none had performed forward citation chasing, and in 31 percent of cases it wasn't clear which articles were used as a starting point. For reviews published by the Campbell Collaboration, a recent study showed that 88 percent did perform backward citation chasing, but it didn't look at whether studies had performed forward citation chasing or at which lists were reported as the starting point for the backward citation chasing. In Cochrane reviews, a similar number, 87 percent, had performed backward citation chasing, and only nine percent had performed forward citation chasing, a little bit better than the others, and in only 1.5 percent of cases was the list of starting records unclear. So you can see that a lot more could be done, and at the moment more isn't being done because citation chasing is very challenging. Those challenges relate to the fact that there aren't clear standards or best practices for how to do citation chasing; it's also often done by hand, for example not using digitised lists of references and citations, and when it is done digitally it's very time consuming, because you need to go one study at a time to extract its references and citations. Added to that, the individual tools that we have, like Scopus or Web of Science, don't have particularly high comprehensiveness when it comes to the total studies in the reference lists and which studies have cited them.

So we wanted to produce a tool, called citationchaser, to do this. Our objectives were that the tool should be easy to use, open source and free, should accept a variety of starting identifiers, and should allow people to be referred into the tool from other tools, like review management software. We wanted to allow forward and backward citation chasing all in one place, to make things easy and efficient, and we wanted to produce interoperable outputs that could be pumped back into the deduplication and screening process, in our case RIS files. We used the lens.org database as the data source for our tool. Lens.org is a meta-database consisting of more than 245 million records (as of January this year), and it's an aggregator across different sources of bibliographic data: Microsoft Academic Graph (which was in there until it was recently retired), Crossref, PubMed and PubMed Central, and CORE, and lens.org are actively looking into solutions for replacing Microsoft Academic Graph, like OpenAlex.
But we made use of the Lens scholarly API, which allows us to query the database automatically. For the rest of this presentation I'm going to focus on the Shiny app, which is the primary way that we see people engaging with citationchaser. Underlying it there is the R package, which is available on GitHub and CRAN, and the Shiny app makes use of the powerful functions within that package; users are encouraged to look at the R package if they want more detail. But we see most people, particularly people without a high degree of coding experience, interacting with citationchaser through the Shiny app, and this is what you see when you arrive: detailed instructions on how to use the tool. The tool is available at estech.shinyapps.io/citationchaser.

We see two use cases for citationchaser that I'll run through. Firstly, the user can put the identifiers into the tool directly themselves. They can either do this by manually entering lists of comma-separated identifiers into the relevant box (you can see the six types of identifiers allowed there, and multiple identifiers can be added at the same time), or they can upload a CSV file that contains different identifiers in a two-column format, with IDs in one column and the type of identifier in the other; you can click on "help" to see how that CSV file needs to be formatted and to find an example that you can download and edit. In the beta version of citationchaser, which is available at the citationchaser test URL shown here, you can also upload an RIS file, and the tool will automatically strip out the DOIs present within that RIS and take those as starting points.

The other use case is through referral, and by referral we mean that a developer can build a URL, based on a set of identifiers, that the user can click to be taken directly to citationchaser. You can see here in this URL that there is an indication that a query follows the URL (the question mark), then the specification of which type of identifiers follow, before the equals sign, then a list of identifiers in a comma-separated list; different types of identifiers are separated with an ampersand followed by the new identifier type and a second list. Clicking this takes the user to a pre-populated list of articles showing which identifiers they've put in. Whichever way the user inputs their articles, this is the table that you'd see in citationchaser, listing your input articles, which you can then download as an RIS file if you want; it also shows you how many references and citations are available for each record that you searched for. The next thing the user can do is backward citation chasing, searching for the references within that set, and we can see, once we've clicked the blue button, that we have a total of 136 references across those four articles, and once they've been deduplicated there are 132 unique articles across that set; you can download that RIS file using the white button there. Similarly, for forward citation chasing you can click the blue button, and you see that there were 582 citations of those articles, 580 of which were unique, and you can download those as an RIS file there as well.
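Going back to the referral use case Neal describes above, here is a minimal sketch of how a referring tool might assemble such a link in R. The query parameter names ("doi", "pmid") and the placeholder identifiers are purely illustrative assumptions, so check the citationchaser documentation for the exact format the app expects.

```r
# Hypothetical identifier lists supplied by a referring tool (e.g. review management software)
dois  <- c("10.1000/example.001", "10.1000/example.002")
pmids <- c("12345678", "23456789")

base_url <- "https://estech.shinyapps.io/citationchaser/"
referral_url <- paste0(
  base_url, "?",
  "doi=", paste(dois, collapse = ","),    # first identifier type and its comma-separated list
  "&",
  "pmid=", paste(pmids, collapse = ",")   # ampersand, then the next identifier type and its list
)
referral_url
```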
If you want to, you can also visualise the network. This can be quite time consuming if you start with a large number of articles, so it's worth downloading your reference and citation chasing results before moving on, but if you click "visualise" you'll see your input articles as black dots, surrounded by their references in red and their citations in blue, and you can see the connections between your starting articles; it's quite interesting, and you can move around. In future we want to develop this further to allow people to download the visualisation. What you can already do is interact with the visualisation: click on any of those circles and it will take you to that record in the lens.org database.

So we see people using citationchaser in a number of ways. They can integrate the tool into their systematic review or evidence synthesis workflow by starting with a set of included articles, relevant reviews or a benchmark list; they can deduplicate their reference and citation chasing results against their initial search results, from searching grey literature sources and bibliographic databases for example; and any unique results that are left can be screened to find additional useful articles that are only found by citation chasing. There's additional functionality within the R package that might be useful for some people as well: the lens.org API outputs a really rich data frame that you can access within an additional output object called DF, and that holds a lot of information about authors, for example, but a whole suite of other information too. Then we have some future developments that we hope will make it even more user friendly. We're hoping to develop co-citation analysis, or weighting or filtering, so that people can dig into their forwards and backwards citation chasing results to see which records were most frequent within their network, and already, in the citationchaser test beta version, you can have a basic look at that frequency analysis, which we'll be developing further in the future. We also want to build a tool to allow people to deduplicate their citationchaser results against a larger set of, for example, bibliographic search results, to show which are unique, so that they don't need to screen again articles they've already screened within their normal evidence synthesis workflow. And we are in the process of designing functions to allow people to search on titles, so that when you upload an RIS file, if a record doesn't have a DOI you'll also be able to search for its title, although searching on titles is not particularly efficient because very minor changes can prevent a match. But that's it. Thanks very much for your time; we hope you enjoy citationchaser. You can check out the Shiny app at estech.shinyapps.io/citationchaser, you can see citationchaser on GitHub as well, and you can find citationchaser newly added to CRAN if you want to use it there.

Thanks, Neal, wonderful to hear about your own work with citationchaser; that's kind of a tongue twister, sorry. We're going to move on to our next speaker, Joshua Polanin, who is from the American Institutes for Research.

Hello and welcome to the presentation. My name is Josh Polanin, I'm a principal researcher at the American Institutes for Research, and I'm coming to you today from downtown Washington, DC. Thanks for joining me. Today we're going to discuss an evidence gap map Shiny application for effect size and summary level data. Like I said, this is a
collaboration with numerous individuals here at AIR: Chi Zhang, Joe Taylor, Ryan Williams, Megha Joshi and Lauren Burr. I'd like to give a big shout-out to Megha in particular; she was our lead developer on the Shiny application, and she really helped us rethink how to process this information in the R environment, so a big thank you to Megha. I know she's watching, so a huge congratulations to her as well. I'd also be remiss if I didn't thank our sponsors. The project was originally sponsored by the U.S. Department of Education's Institute of Education Sciences; that grant ended a little while ago, and so the Methods of Synthesis and Integration Center, or MOSAIC as we call it, at AIR has picked up some of the slack and helped us get this Shiny application to the finish line. So I'd like to thank both of those sponsors before we go further.

So let's turn to creating one of these gap maps, with meta-analytic data in particular, with our Shiny application. First things first: where can you find this Shiny application? AIR has a Shiny application server where we host various Shiny applications, and MOSAIC actually has a couple of different Shiny applications that you can take a look at while you're there, but if you're just interested in taking a look at the EGM Shiny app, you go to airshinyapps.shinyapps.io/mosaic_egm. It's a mouthful, but a relatively simple URL to get to, and you know you're in the right spot if you find a landing page that looks something like this. Of course we're going to keep updating this page, but this landing page will probably look pretty familiar once you click on it, regardless of when you come to it.

Before we dive in, I just want to mention a bit about the philosophy and the features of the Shiny app. The first thing is that we wanted it to be compatible with a typical meta-analytic dataset. We wanted you to be able to say: okay, I've already done my meta-analysis, probably in R, and I have row-level, effect-size-level data, with effect sizes on the rows and several columns that indicate the effect size, the variance and different characteristics of the programs. We wanted you to be able to take that dataset, upload it into the Shiny app, and have it work, and that's what it does. We also just wanted it to be easy. So, some basic features: uploading effect-size or summary-level datasets, which means each row can be an effect size or each row can be a cell within your EGM, and I'll show you what that means in a minute. A traditional two-level EGM is just an x-axis and a y-axis with multiple categories within those two axes, and you can produce that very easily with this Shiny app, or you can produce a three-level EGM, which I'll also show you in a minute. It's going to give you some really nice summary outputs, and then, like I said, easy-to-copy-and-paste R code in case you want to do some more fiddling with the ggplots or anything else within the plot itself.

Okay, so, the live demonstration, finally. Two examples: one using effect-size-level data that comes from a review I completed about a year ago on the effects of programs on cyberbullying outcomes, so that's the effect-size-level dataset; and then the summary-level dataset comes from the motivational interventions review that I mentioned, which had
14 different motivational interventions across three different outcome domains. And I do want to mention, right off the bat, that when it comes to loading the Shiny app in Chrome there are several dependencies, so if it takes a minute the first time you go to the site, don't fear: give it a minute, let it load, and I assure you it will eventually initialise.

Okay, first things first: go to "load data", then "upload my own data". There is an example already built in here; I'm not going to show you that right now, and you're free to play with it if you'd like, but it has all the same features, so we're going to use our own data today. To start off, we're going to do effect-size-level data, and then you can upload a CSV or an Excel file. We're going to start with a CSV file: browse, we're going to use this example effect-size-level CSV, open, it just takes a second, upload complete. Now we've got several drop-downs here. The first one says "specify the first factor for the EGM"; this first factor is your x-axis, so what do you want your x-axis to be? It really is your preference: you could put the outcome down there on the x-axis or put the outcome on the y-axis. We're going to go with design on the x-axis; design is the RCT versus QED, individual-level versus school-level assignment, that sort of thing, so the research design. Then on the y-axis we're going to go with the outcome type, and we've got two different outcome types here: cyberbullying, and aggression or traditional in-person bullying. If we wanted to specify a third factor we could do that here; I'm not going to for this round, but I'll come back to it and show you what the app adds. Next is the effect size: which variable in your dataset is the effect size variable? That's an easy one, it's "es". Then we've made this easy for you: if you happen to have a column of standard errors you can click on that here and we'll transform the standard errors for you, or if you've got variances for your effects, go ahead and click on variance and tell it which column the variances are in. And finally, which column is your study-level identifier? This is important if you have nested effect size data, so if you have correlated effects or hierarchical effects, with multiple effects per study; but even if you don't, you probably have a study identifier, so just go ahead and click on that. And that's it: one, two, three, four, five clicks.

Then you click over to "create summary data", and the last thing you're going to set has to do with the assumed correlation among the effects when there are multiple effects per study; 0.8 is generally the correlation that we say is about right, so we're going to use that for now. Then click "summary data", and I'm going to expand this to show 25 rows. What we see here are our basic summary data, so this is each one of the cells that's estimated. Let's walk through this for a minute. The first column here is factor one, the x-axis; this is our design column, so we've got non-randomised class-level assignment, non-randomised individual-level assignment, non-randomised school-level assignment, randomised class, randomised individual, randomised school, and so on. Then we've got our y-axis, which is our outcome, and we've got two outcomes here: aggression or traditional bullying, and cyberbullying. Next is the estimation method, which I'm going to sort by
, just so we can see this easily. There are actually three different types of estimation methods that the EGM app will select for you. The most basic one, which we actually don't have here, occurs when you have one study and one effect; when that happens there's no meta-analysis happening within that cell, it just gives you the basic effect size. If you have two or fewer studies and four or fewer effect sizes, you're going to get the univariate random effects model; this is just a conservative model that doesn't assume any correlated effects, and even though there are multiple effects within a study, it takes simple averages within the studies to create the average effect per study. And then the correlated effects estimation model is the robust variance estimation method (that's why we need this little scroll over here), which accounts for the correlated errors within study. Okay, so that's the estimation method. Then the average effect size is exactly what you might imagine it is: the average effect for that cell. In this particular example, negative effects are a good thing, because they mean there's a decrease in bullying or cyberbullying for the intervention group compared to the comparison group. And we've got the number of studies and the number of effects; the most studies within a cell is for the randomised individual design looking at cyberbullying, where we've got 12 studies and 27 effects, and the most effects is 61, for the design randomised at the school level.

Okay, enough of that; how do we get to an EGM? Click on "EGM". There are a couple more drop-downs here. The first one asks whether you want to map the average effect size onto a continuous colour scale, and whether you want to overlay anything on the dots; this allows you to show the number of studies or the average effect, and I'm going to click average effect so we can see what that looks like. Then the x-axis label is the design and the y-axis label is the outcome; this allows you to put any label you want in there. And when you click "create plots", finally, we get an EGM. So if we look here one more time, we can see our typical EGM, the one we usually think of: the x-axis is the design, with six different categories for design, two different categories for outcome on the y-axis, and crossed in there is our average effect for each one of these cells. The average effect corresponds to this colour code down here, with negative 0.4 being the largest, in a dark purple; I believe this palette is set up for people who have colour blindness or issues with processing colours, so it should be good for those folks. And we've overlaid the average effects within each of the cells. If you want to download the plot, click on this "download plot" button; you can give it a different name, we'll just type a quick name here, and you can do some things with the width of it, but if we download the plot and then click on it, you can see it. Last but not least is the R syntax: you can copy and paste this, right there like that, and it will give you everything you need to reproduce that plot using your own data.

Really, that's it. I'd like to thank you all for joining us today; I appreciate any input that you have. I'd like to thank again my collaborators and co-authors, Chi, Joe, Ryan, Megha and Lauren. If you have any questions, feel free to reach out to me; there's my email, and I'm also on Twitter, and you can find all my contact information at joshuarpolanin.com. We look forward to having you all use the program. Have a great rest of the conference, thanks for having me, and have a
great rest of your day, wherever you're calling from.

Thanks, Josh, for a really interesting talk on a Shiny app for evidence gap maps. I'm actually excited to use it, because I've tried to create one of those manually and it did not work out, so it's great to see this work moving forward. So now we have a little bit of time for questions, about five minutes. Of course, if we don't get to all of your questions we'll be answering them on Twitter as they come in, so just be on the lookout there.

Our first question is for Elina, and it's from Neal. The question is really around how you might see research weaving being useful for folks without coding experience: do you see it as potentially being developed into a web-based app for those folks who don't have extensive coding experience?

Thanks for the question. Yes, actually, that's the ultimate goal of the project, but it's definitely going to take a lot more work to arrive there. For now we're just trying to make each step automated so that people can cope with big corpora, big bodies of literature, and then of course we can extend it to a web-based application. Thank you.

Yeah, maybe a future hackathon project, right? Thank you. So our next question is for Neal. This came through on YouTube from the Leeds mixed-methods systematic review course, and it's a question about how citationchaser may compare to Google Scholar, Scopus and Web of Science for its comprehensiveness in forward citation searching.

Yeah, thanks very much, that's a really good question. As you could tell from the question, this only affects forward citation chasing, because there's a fixed number of references in a paper, but the number of papers that reference it is a bit nebulous. My experience with lens.org is that it's close to Google Scholar; Google Scholar is sometimes a little bit more, sometimes a little bit less, but they're on a par. Google Scholar crawls the internet, whereas lens.org is an aggregator of databases, so they have different ways of working, but lens.org is consistently better than Scopus and Web of Science. I haven't published the data yet, but I'm planning to: Web of Science is the worst, then Scopus, and then Lens and Google are sort of fighting for top position, but that shifts over time as well, so the absorption of OpenAlex, which I believe is planned to happen soon by Lens, will change that.

Great, thanks, Neal, so we may see a published paper soon that we can cite on which one outweighs the others. We have another question from Twitter, from Phil Martin, who says he's definitely planning to use the tool in his own work. He has a question about citation chasing in general: are there any existing standards for the set of studies to chase citations of, and what might be the advantages or disadvantages of different approaches?

Yeah, that's a really good question. It's something that we see differs slightly between fields. It tends to be in Cochrane reviews that they would use the list of included studies, which I think is a really good approach, but whatever you use, whether it's a set of articles that you think are relevant, or that might be relevant, or that were linked by semantic relationships in search results for example, or that you know are relevant because you've read them all, or reviews whose reference lists are likely to be even more relevant than a normal paper's, it
depends where you sit on a spectrum of sensitivity versus specificity. So I think if you used something like bibliographic checking of reviews that you knew were relevant, you're likely to get fewer additional papers, because you're probably aware of that body of work already, but the relevance of the results you get is likely to be higher; so it's a bit less work, but you're probably not being as sensitive. If you used a much larger set of records, then you obviously get a much larger set of citation results, and if you take those records from search results rather than from manual inclusion screening, you're likely to get a lower proportion of relevant records. The problem is, if you use search results, if you use a very large set of studies to go broader and you haven't manually screened those for relevance, you're going to get a very large set of records that you have to screen manually. So I think there isn't any guidance or any standards, and I don't think there's really any evidence of which approach is most useful, but I think there needs to be. And I think the next step for something like citationchaser would be building in some kind of screening algorithm, using machine learning, that can look at co-citation and semantic relationships to help narrow down a large body of citation chasing results. And it's really difficult to say "citationchaser", I'm really sorry.

Yeah, no, it is. Thank you. I agree, so it sounds like in addition to these tools the field needs additional guidance and structure around what's expected and the best ways to do that. So thank you. I want to thank all our presenters. We're out of time, so we can't take any more questions live, but we will be doing that over Twitter, so please continue to join us for the rest of the sessions today and later this week. Thanks, everybody.