Today I'm going to talk about ASSIST, the Automated Systematic Search Deduplicator, a tool I developed to identify duplicate publications in systematic review searches. As systematic reviewers, we want to capture as much of the literature as possible, so we usually search across multiple databases. However, this means duplicate records make their way into our systematic review. If they're not identified at an early stage, we can end up extracting data from the same publication multiple times, which is a waste of time in itself, but it causes real problems if duplicates aren't removed before the analysis, because duplicated data can easily skew the results one way or the other. So duplicate publications are a real problem.

Over on the right-hand side here, I've tried to illustrate what I mean by duplicate publications. These records are very clearly the same, although there are small differences in the title, pages, volume and so on. It's easy for us as humans to spot the differences, but if you have thousands of records in your systematic search, there's no way you can go through all of them manually, so we need automated tools to help us.

In my research, I was interested in developing a tool that could identify duplicates in really large systematic review data sets. In our lab specifically, we're interested in preclinical data, that is, data from animal models of disease, and when we run systematic searches across databases we often retrieve thousands of results. We've typically used EndNote for deduplication, but as I went through my systematic review I just kept finding more and more duplicates, and I realised that EndNote wasn't good enough.

So, using previous data sets from our research group, I imported these into RStudio and tried to format them so that the authors would match, the DOIs would match, and some of the other fields, like the pages, would match better. In short, I cleaned the data as much as possible to help identify the duplicates. I then used a record linkage package I discovered in R, which was actually developed years ago, back in 2010, and which links records based on criteria that you specify. I specified a list of blocking criteria, which are criteria that have to match 100%: for example, papers where the title and the pages match exactly, or the title and the author match exactly. The package also has string-matching algorithms that score each field of a candidate pair between 0 and 1, so across every field it tells you how similar the text is.

Using all of this information, we get a list of linked records. A lot of them won't be duplicates; if two records only match on, for example, year and pages, most of them won't be. So I developed additional filtering criteria: if a pair matches on a set of fields above certain thresholds, it's a duplicate, and if it doesn't, it's not. Both the blocking criteria and the additional filtering criteria were developed heuristically, through continuous trial and error on several in-house data sets, trying to figure out what worked best.
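To make this concrete, here's a minimal sketch of the cleaning and matching steps, assuming the record linkage package in question is RecordLinkage on CRAN (whose compare.dedup function generates candidate pairs within a single data set) and a data frame refs with columns title, author, year, pages, doi and abstract. The column names and cleaning rules are illustrative, not necessarily ASSIST's actual ones.

    library(RecordLinkage)

    # Light cleaning so that exact blocking can fire despite formatting noise:
    # lower-case, trim, and collapse punctuation and whitespace in text fields.
    clean <- function(x) gsub("[[:punct:][:space:]]+", " ", tolower(trimws(x)))
    refs$title  <- clean(refs$title)
    refs$author <- clean(refs$author)
    refs$doi    <- tolower(trimws(refs$doi))

    # Generate candidate pairs. Blocking keeps only pairs that agree exactly
    # on at least one criterion; strcmp = TRUE also scores every compared
    # field between 0 and 1 (Jaro-Winkler by default).
    rpairs <- compare.dedup(
      refs,
      blockfld = list(c("title", "pages"),   # title AND pages identical, or
                      c("title", "author"),  # title AND author identical, or
                      "doi"),                # the DOI alone
      strcmp   = TRUE,
      exclude  = "abstract"                  # too long and noisy to compare
    )
    head(rpairs$pairs)  # id1, id2, then one similarity column per field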
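The additional filtering then reduces these candidate pairs to the ones we call true duplicates. The thresholds below are purely illustrative placeholders; as I said, ASSIST's actual cut-offs were tuned by trial and error on in-house data sets.

    # Reduce candidate pairs to duplicates with a stricter decision rule.
    pairs <- rpairs$pairs
    pairs[is.na(pairs)] <- 0  # a missing field contributes no evidence

    is_dup <- with(pairs,
                   doi == 1 |                               # identical DOI, or
                   (title > 0.95 & author > 0.80) |         # near-identical title and author, or
                   (title > 0.90 & year == 1 & pages == 1)) # close title, exact year and pages

    dup_pairs <- pairs[is_dup, c("id1", "id2")]  # row indices into refs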
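And, jumping ahead slightly to the record-selection step I'll describe next, here's a sketch of keeping the most informative record of each flagged pair, scored here by the number of non-empty fields with a heavy bonus for having an abstract. Again, the weighting is my own illustration, not ASSIST's exact rule.

    # Score how much information a record carries: count its non-empty
    # fields, weighting the presence of an abstract heavily.
    info_score <- function(i) {
      vals <- as.character(unlist(refs[i, ]))
      filled <- sum(!is.na(vals) & nzchar(vals))
      has_abstract <- !is.na(refs$abstract[i]) && nzchar(refs$abstract[i])
      filled + 10 * has_abstract
    }

    # For each duplicate pair, drop the less informative record. (A full
    # tool would also resolve chains of duplicates, e.g. A-B and B-C.)
    remove_ids <- unique(apply(dup_pairs, 1, function(p) {
      if (info_score(p[["id1"]]) >= info_score(p[["id2"]])) p[["id2"]] else p[["id1"]]
    }))
    deduped <- if (length(remove_ids) > 0) refs[-remove_ids, ] else refs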
Once pairs get through this filter and are flagged as true duplicates, I select a record within each pair to remove, and I keep the record with the most information, which is usually the record that has an abstract (roughly as in the last sketch above). This workflow was then made into a Shiny application so that it's user-friendly and can be used by anyone, with no coding experience needed.

I wanted to see how ASSIST performed on data sets that hadn't been used in its development, so I collected five systematic review data sets from collaborators, ranging from quite small sets of only 2,000 records up to over 80,000 records. I tested ASSIST's performance against EndNote's default automated deduplication function, and also against a tool developed at Bond University as part of their Systematic Review Accelerator. I thought the Bond University tool was a good comparison because it's also available via an online web platform, simple to use and fully automated. Overall, ASSIST came out on top in terms of sensitivity and specificity: it correctly identified over 95% of duplicates, and it had a false positive rate comparable to human reviewers, so it wasn't removing too many papers. It was also really fast, especially compared to the Bond tool, which, I think due to server issues, took quite a long time to deduplicate some of the larger data sets. So overall, ASSIST is a good option for deduplicating large data sets.

The next steps for ASSIST are really about dissemination. I need to write a publication and get the tool out there for people to use, because right now it's really only our immediate collaborators who know about it and use it. I also want to support more import formats, so that people can import libraries from Mendeley, Zotero and other reference managers, and I'd like to integrate with other systematic review platforms. For example, we have an in-house platform called SyRF, the Systematic Review Facility, and it would be good to integrate with platforms like that. I also need to scale up capacity to deal with larger data sets: the Shiny application is currently limited to around 50,000 records, so I do need to work on that and think about other solutions.

The code underlying the application is openly available on GitHub; there's the link there. And please try the Shiny app: if you're interested, you can use the QR code on the right-hand side to visit the website. Thank you.