Hi everyone, today I'm going to give a quick introduction to the Automated Systematic Search De-Duplicator, ASSIST. This is an R package I created to remove duplicate citations obtained from different literature databases. Researchers performing a systematic review typically search across multiple databases for relevant research. Often there is substantial overlap across databases, meaning that the same citation may be present several times in your combined search results. As you can see in this example, citations indexed in different databases have slightly different formats; they aren't exactly the same. These subtle differences are easy for a human to verify: it's obvious that these two citations refer to the same bibliographic record, and we can easily mark them as duplicates. However, in reality we can be faced with thousands of these decisions, even if a tool suggests which pairs might be duplicates. It's a lot of manual work to go through each pair and check, so we need automated solutions. There is also a growing focus on integrating new evidence as it emerges, via more frequent systematic search updates and living systematic reviews. When researchers perform an updated search, there will be duplicates not only within that new search, but also between that new search and the citations which already exist in the review. This can be problematic for record keeping. For example, if you use a tool to de-duplicate everything once you've added in the newly identified citations, and there is duplication between the searches, you may delete the old citation and retain the new one. If you use identifiers to keep track of each study, for example EndNote record IDs, this can cause all sorts of issues: if you've already screened some of the records for inclusion or extracted data, it can be difficult to link that back up with the original record. De-duplication tools need to be equipped to deal with these issues.
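To make the idea concrete, here is a minimal sketch, in Python, of how two near-identical citation records can be scored for similarity. This is purely illustrative and is not ASSIST's actual algorithm; the field names (`title`, `authors`, `year`) and the threshold are hypothetical.

```python
# Illustrative sketch of fuzzy duplicate detection (not ASSIST's real code).
import re
from difflib import SequenceMatcher

def normalise(text):
    """Lower-case, strip punctuation, and collapse whitespace so that
    minor formatting differences between databases don't matter."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())

def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalised strings."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

def is_probable_duplicate(cit1, cit2, threshold=0.9):
    """Flag a pair as a probable duplicate when title and authors are
    highly similar and the year matches. Thresholds are hypothetical."""
    return (similarity(cit1["title"], cit2["title"]) >= threshold
            and similarity(cit1["authors"], cit2["authors"]) >= threshold
            and cit1["year"] == cit2["year"])

# The same paper, indexed slightly differently in two databases.
a = {"title": "A Systematic Review of Sleep Interventions.",
     "authors": "Smith J; Jones A", "year": "2020"}
b = {"title": "A systematic review of sleep interventions",
     "authors": "Smith, J.; Jones, A.", "year": "2020"}
print(is_probable_duplicate(a, b))  # True
```

Normalising before comparing is what lets the trivial punctuation and capitalisation differences between databases fall away, so the comparison focuses on the actual content of each field.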
ASSIST is an R package available for you to install via GitHub. It also has a Shiny application, meaning that you don't need to know how to code to use it in your systematic review workflow. ASSIST allows users to upload citations in multiple formats, to automatically detect and remove duplicates, to specify which citation to retain in a group of duplicates, and to download the resulting citations in multiple formats. Let's go through an example in the Shiny app. So here we have the ASSIST web application. As you can see, on the left-hand side here, there is a section for uploading your citation files. There are a few different formats to pick from: EndNote XML, CSV, or tab-delimited. And there are some formatting requirements here in terms of what columns you need and how to get your data into the right format. Today, we're going to use EndNote XML for simplicity. So if I now open up EndNote, you can see this is an example systematic search I've got here. I'm just going to select everything and export so we can use this dataset. Once we have that file, we can go back to ASSIST and upload it. Here it is. As you can see from EndNote, there are 1,845 citations here, so we want to check in this preview that that is the correct number that's been uploaded. And yes, that's uploaded fine. This is just a preview of the first 10 citations, to check that all of the fields have loaded properly, and that looks fine to me. One thing to point out is this record ID column. This comes directly from the record number field in EndNote, so I'll just filter to show you that. I know that some people use this to keep track of their citations throughout a systematic review; it can be really useful to have an identifier that's unique to each study. So if we're happy with our data, we can move on to the de-duplicate tab.
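As a rough picture of what the upload step has to do, here is a sketch of parsing record numbers and titles out of a small EndNote-style XML file with Python's standard library. The XML layout below is a simplified approximation for illustration, not EndNote's exact export schema.

```python
# Illustrative sketch: extracting record IDs and titles from a simplified
# EndNote-style XML export. The schema here is an assumption, kept minimal.
import xml.etree.ElementTree as ET

sample = """<xml><records>
  <record><rec-number>1</rec-number>
    <titles><title>Sleep interventions: a review</title></titles></record>
  <record><rec-number>2</rec-number>
    <titles><title>Exercise and cognition</title></titles></record>
</records></xml>"""

root = ET.fromstring(sample)
citations = [
    {"record_id": rec.findtext("rec-number"),
     "title": rec.findtext("titles/title")}
    for rec in root.iter("record")
]
print(len(citations))  # 2
```

This is also why the preview step in the app matters: checking the parsed count against the count in EndNote catches any records that failed to parse on upload.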
And here, first of all, we've got automated de-duplication, and you can configure some of the options in this box to determine how it will take place. First of all, it's asking about the unique ID: you select a column which contains a unique ID for each citation. As I mentioned, a lot of people use the record ID or the record number directly from EndNote, and that's the case here, but you might have your identifier in a different column. It just depends on how you've been managing your project. Today we're just going to stick with record ID. For this next section, I want to explain it in a little more detail. It's asking which citation identifier, or record ID, you want to keep from each duplicate set. Imagine these four citations refer to the same bibliographic record; in other words, they're duplicates. What ASSIST is going to do is identify that all four citations are part of the same duplicate group. Three of them will then be removed and one will remain. It's important to note that metadata from the three citations that were removed may still be retained in the final citation. For example, if record 1111 had a DOI and an abstract but the final citation did not, ASSIST would try to merge those together to retain as much metadata as possible. Therefore this setting only refers to which identifier you would like to retain at the very end. This has implications for record keeping throughout your project. For simplicity, we are just going to stick with the default option, which is to retain the identifier of the citation with an abstract. We can then click the remove duplicates button to automatically identify and remove duplicate citations. This may take some time depending on the number of records included; here there are only around 1,800, so it shouldn't take very long. And here are the results: from a total of 1,845 citations, ASSIST removed 1,259 duplicates.
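The merge behaviour described above can be sketched like this. This mirrors the behaviour as I've described it, not ASSIST's actual implementation; the field names and the "keep the citation with an abstract" rule are taken from the explanation above.

```python
# Illustrative sketch of merging a duplicate group: one citation is kept,
# and its empty fields are back-filled from the records being removed,
# so as little metadata as possible is lost.
def merge_duplicate_group(group):
    """Keep the first citation that has an abstract (the default rule),
    then fill any of its missing fields from the other group members."""
    kept = next((c for c in group if c.get("abstract")), group[0])
    merged = dict(kept)
    for other in group:
        for field, value in other.items():
            if value and not merged.get(field):
                merged[field] = value
    return merged

group = [
    {"record_id": "1111", "title": "Sleep review", "doi": "10.1000/x",
     "abstract": ""},
    {"record_id": "2222", "title": "Sleep review", "doi": "",
     "abstract": "Background: ..."},
]
result = merge_duplicate_group(group)
print(result["record_id"], result["doi"])  # 2222 10.1000/x
```

Note how the surviving citation carries record ID 2222 (it had the abstract) but gains the DOI from record 1111, which is exactly why the "which identifier to keep" setting only affects the ID, not the merged metadata.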
There are 586 citations remaining, and 141 possible duplicate pairs have been flagged. If we now move over to the manual de-duplication tab, we can have a look at this in more detail. In the automated de-duplication step, ASSIST removes duplicate pairs which it's very certain about; the manual de-duplication phase covers pairs where there was some uncertainty. As you can see here, 141 pairs have been flagged for manual de-duplication. ASSIST places records in this tab when it's not sure that they're definitely duplicates. In this table the records are presented side by side: author 1 is from citation 1, author 2 is from citation 2, and so on for every field. Highlighted in green is when the fields match either exactly or nearly exactly. We can go through the pairs below and review each one individually. If you select the pairs that are definitely duplicates, you can remove them. Because there are so many here, I'm going to choose to flag all of the selected pairs instead. This means that they'll appear in the output for me to review later. On the summary page, we have a simple Sankey diagram of our workflow so far: our original citations, the number that were automatically removed, and the number which remain. On the downloads page, there are a number of different options to go through. First of all, we need to choose an export type. In ASSIST, you can choose to export just the unique citations or all citations; there are some notes on what this means over here on the right. For now, let's just export the unique citations. There are a number of different export formats as well: an EndNote tab-delimited format, RIS format, and a CSV format. I'm going to choose the RIS format for simplicity here. You might want to only download citations with a specific label or a specific source.
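The two-tier decision described here, where confident pairs are removed automatically and borderline pairs go to manual review, can be sketched as a simple triage function. The threshold values are hypothetical and just illustrate the idea.

```python
# Illustrative sketch of two-tier pair triage (thresholds are made up):
# very confident pairs are removed automatically, borderline pairs are
# flagged for the human reviewer, and clearly distinct pairs are kept.
def triage_pair(score, auto_threshold=0.95, review_threshold=0.80):
    """Map a pairwise similarity score to an action."""
    if score >= auto_threshold:
        return "auto-remove"
    if score >= review_threshold:
        return "flag-for-review"
    return "keep-both"

print(triage_pair(0.99))  # auto-remove
print(triage_pair(0.85))  # flag-for-review
print(triage_pair(0.40))  # keep-both
```

The middle band is what fills the manual de-duplication tab: pairs similar enough to be suspicious, but not similar enough to remove without a human decision.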
For example, you might have labelled the new citations obtained in a recent systematic search in a certain way, and you only want to retain the studies which have that specific label. That's what these options are for. For now, we're going to ignore them and leave them at their defaults to ensure that we can download everything. And we will download the unique citations now. If I click this, it should open in EndNote; we just need to check that the right import filter is selected. So this is an RIS import. And that's our unique references now in EndNote. If we go back to the app for a second, you can see from the export format that there are additional notes here on how the data will be exported. The notes field will contain flags for potential duplicates, and the database name field will contain the duplicate identifier. Remember, the duplicate identifier will refer back to the record ID of the citation that you keep. So if we go here, you can see in the notes column that quite a few papers have been flagged with this potential duplicates note. Probably the easiest way to review these later is to add them to another group. Obviously, if you export to a CSV file or to another platform, this could be slightly different, but this is just showing you how to do it in one of the popular platforms like EndNote. You can go through these in your own time. You could sort by title, for example, and check for any remaining duplicates there. For now, though, this is your unique dataset, unless you find any more duplicates being flagged. I wanted to quickly go through another example of where ASSIST can be useful. Here, I'm performing a systematic review update. I've found my new citations, 2,626 of them, and I've got them in EndNote at the moment. I've labelled them with yesterday's date.
I've labelled them with the date here, and in the name of database column, I've put the source, i.e. which database the record came from. I've also put my existing citations into EndNote, and I've set the name of the database to "in database already", which essentially means that they're in my review already. Again, the label field represents the date on which I obtained these citations. I've put all of these together into one group and exported them as an XML file to upload to ASSIST. As you can see, I've already uploaded the file and proceeded with de-duplication. In the manual de-duplication section, I've been through this and I know that these are all genuine duplicate pairs, so I'm going to select them all and remove all the duplicates. This can take a couple of minutes as it runs through the de-duplication to make sure that these extra pairs have been removed. Now in our summary, we can see that there's an additional segment that shows the manually removed citations, as well as the ones that were automatically removed. To get only the new citations, which haven't been retrieved in any previous searches, I'm going to filter by the label. Having this label here means that any citation with this label will be retained. Going back to our earlier example with four duplicate citations, you can see that all of these citations come from different sources. The final unique citation retained by ASSIST keeps all of this information in the source column, and this behaviour is the same for the label column. We want records which have only this label and no other label; in other words, we don't want citations which were retrieved in both this search and the previous search. To ensure that that's what we get in the output, we can turn on this option: only retain citations which are unique to this label. We can download the citations again and look at the output. Here we have imported the citations that are only present in our most recent search.
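The "only retain citations unique to this label" option can be sketched as follows. After merging, each surviving citation carries every label from its duplicate group, so a citation retrieved by both the old and the new search ends up with two labels and is dropped. The field names and label values here are hypothetical.

```python
# Illustrative sketch of label-based filtering after de-duplication.
# A citation found in both the old and new search carries both labels,
# so requiring the label set to equal exactly {label} excludes it.
def unique_to_label(citations, label):
    """Keep citations whose merged label set is exactly {label}."""
    return [c for c in citations if set(c["labels"]) == {label}]

citations = [
    {"title": "New only",          "labels": ["270223"]},
    {"title": "Already in review", "labels": ["270223", "010122"]},
]
fresh = unique_to_label(citations, "270223")
print([c["title"] for c in fresh])  # ['New only']
```

This is why merging labels during de-duplication matters for search updates: the merged label set is what tells you whether a record is genuinely new or was already retrieved before.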
As you can see, all of the labels are 270223, and the name of the database reflects the source. When you use an automated tool, you want to be sure that you can trust the results. In the ASSIST Shiny app, you can export all of the citations and check the duplicate groups that ASSIST has generated. All citations classed as duplicates will have the same duplicate ID, as shown in this screenshot. If you want to know more about the performance, have a look at our preprint. An important point to consider is that ASSIST was initially created for use in biomedical systematic reviews. On test data, the automated features of ASSIST have been shown to have very high sensitivity: over 95% of duplicates were detected. It's also been shown to have a very low false positive rate, comparable to that of human reviewers. The performance of the tool in other research areas may not be optimal and needs to be validated in future. Today, I have focused on the Shiny app because it's accessible to everyone. However, ASSIST is also a fully functional R package with a website and vignettes. Please refer to these to use the package and integrate it into your systematic review workflows in R. Based on feedback from the wider community, there are a few areas I'm looking to work on in the coming months. Firstly, pulling in additional metadata when users upload citations: pulling in additional data from Crossref and OpenAlex, such as DOI information, could facilitate the de-duplication process, since missing metadata is often the reason duplicates aren't detected. I'm also looking to integrate with other R packages. In recent months, I've been working with the CiteSource team, who also have a tutorial at this conference that I really recommend you have a look at. I'm also interested in making improvements to the manual de-duplication features to make them less labour-intensive. I also want to provide better support for non-standard databases.
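Auditing the "all citations" export, as described above, boils down to grouping rows by their duplicate ID and inspecting each group. Here is a small sketch of that check; the column names are hypothetical.

```python
# Illustrative sketch of auditing an "all citations" export: every citation
# classed as a duplicate shares a duplicate ID, so grouping by that ID
# reconstructs the duplicate groups for inspection.
from collections import defaultdict

def group_by_duplicate_id(rows):
    """Collect citation titles under their shared duplicate ID."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["duplicate_id"]].append(row["title"])
    return dict(groups)

rows = [
    {"duplicate_id": "d1", "title": "Sleep review"},
    {"duplicate_id": "d1", "title": "Sleep review."},
    {"duplicate_id": "d2", "title": "Exercise and cognition"},
]
groups = group_by_duplicate_id(rows)
print({k: len(v) for k, v in groups.items()})  # {'d1': 2, 'd2': 1}
```

Spot-checking a handful of groups this way is a quick manual sanity check that the automated decisions can be trusted before you commit to the de-duplicated dataset.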
ASSIST was created based on the formatting of data from PubMed, Embase, Web of Science, and Scopus, the common databases used for biomedical reviews. Therefore, some of the formatting requirements and differences across other databases might not be as well covered in ASSIST. I also want to validate ASSIST across different research domains and across different use cases. ASSIST is open source and open for collaboration. If you'd like to get involved, please visit our GitHub page. Feel free to raise an issue, fix a bug, or even start working on a new feature. If you'd like to discuss further, you can also reach out to me by email. I'd like to say thanks to everyone who's already contributed to the GitHub page, whether that's by raising an issue or by fixing something within the package. And thank you all for listening to my tutorial today. I hope that you find ASSIST useful.