Hey there, welcome to this presentation on the new SiteSource R package. My name is Trevor Riley, and in this presentation I'm going to provide an overview of SiteSource's general functionality and its primary use cases. The package got its start at last year's hackathon, and it's really fantastic to be able to share SiteSource for the first time here at ESMARConf. I'd like to thank Neil Hadaway and the ESMARConf planning committee; your support was critical in bringing this team together and getting the project off the ground. I also want to note that SiteSource is in its final stages of development and testing. We're looking forward to launching version 1.0 soon, which will include a fully functional Shiny app as well. Before I jump into the package, I want to take a moment to highlight the members of the SiteSource development team whose hard work and dedication made this possible. Along with myself, we had Katelyn Hare, Lucas Walbrook, Matthew Granger, Sarah Young, Chris Prichard, and Neil Hadaway.

Okay, so at its core SiteSource has two unique functions. First, the package allows users to add customizable metadata in three fields, which we've named SiteSource, SiteString, and SiteLabel. In this presentation I'll refer to this process as tagging. Tagging citations is the first step users take after uploading them. By tagging a citation record, users can maintain information on a number of variables: the resource in which the record was found, the method used to find it, an iteration or variation of a strategy (such as the progression of a search string), a research phase, and even particular groups of citations such as benchmarking articles. The second function relates to how SiteSource deduplicates records. Instead of working like a traditional process, where a user selects a single record to be retained, SiteSource merges duplicate records to create a single primary record. That primary record maintains the tags provided by the user in the three fields. It's with these two functions that we can start to produce some really great visualizations and analyze our citation data. If you watched any of the ESMARConf sessions from previous years, or even this year, you might recognize the ASIS package developed by Katelyn Hare. Currently, a modified version of ASIS is the backbone of the record merging process.

Alright, so let's take a look at the first example: analyzing resources and methodologies. We'll start with the vignette on comparing topic coverage. In this example, we're using SiteSource to review overlap between databases for literature on the harmful effects of gambling addiction. To do this, we ran a search for the term "gambling harm" in the title and abstract fields in Lens, Scopus, Criminal Justice Abstracts, PsycINFO, and Medline. After loading the package, we can tag our citation files with the source names, and once we've run through the deduplication process, we can start to create some plots of our data.
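For anyone who wants to follow along in R, here is a minimal sketch of the tagging, deduplication, and heat map steps just described. The function and argument names below are illustrative placeholders rather than the confirmed SiteSource API, so check the package vignettes for the exact calls.

```r
# Minimal sketch of the comparing-topic-coverage workflow described above.
# NOTE: function, argument, and file names are placeholders assumed for
# illustration; the released SiteSource API may differ.
library(SiteSource)

# Exported citation files, one per database, tagged with their source names
citation_files <- c("lens.ris", "scopus.ris", "cja.ris", "psycinfo.ris", "medline.ris")
source_names   <- c("Lens", "Scopus", "Criminal Justice Abstracts", "PsycINFO", "Medline")

citations <- read_citations(files = citation_files,
                            site_sources = source_names)  # hypothetical arguments

# Merge duplicate records into single primary records that keep every tag
unique_citations <- dedup_citations(citations)

# Overlap heat maps: raw record counts, then pairwise percentages
plot_source_overlap_heatmap(unique_citations)
plot_source_overlap_heatmap(unique_citations, plot_type = "percentages")
```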
This first heat map shows the number of records retrieved from each database, as well as the count of overlapping records found between databases. In this example, we can see that Scopus had the highest number of records on gambling harm and Criminal Justice Abstracts the least. We can also see that, out of the 176 articles found in PubMed, 171 of them were also found in Scopus. The next heat map uses the same data but provides a breakdown of overlap percentages between each pair of sources. As you can see, 97% of PubMed's 176 articles were also found in Scopus.

The next plot is the upset plot, which is one of my personal favorites. It's just another way of looking at that overlap, but in a bit more detail. You can see here we have our five databases, with Scopus at the bottom with the largest number of citations, followed by PubMed. You can then look at how these databases overlapped and in what combinations. You can see that Scopus had the highest number of unique citations, followed by Lens and PsycINFO. You can also see that there are only six articles that were found in every single database.

All right, for this next example, let's say you want to understand not only how each database performed and look at the unique records each source contributed, but you'd also like to know how searching traditional databases and platforms performed compared with two other literature-gathering methods: say, forward citation chasing on your benchmarking articles, and a new co-citation machine learning or AI (fill in the blank) discovery tool. For this case, we can take a look at the source analysis across screening phases vignette. Instead of just using one of the customizable fields, we're going to use two this time: SiteSource and SiteLabel.

So you first install and load the package, point to the directory where the data you want to evaluate is located, and then start tagging. You can see these are the file names, and each one aligns with a source name. The label here is "search" because those were search results. When you get down to this next one, "final", it does not have an actual source, and it is labeled as final because these were the final included studies. "TIAB" stands for title and abstract; those were the records that made it past title and abstract screening. Above those, you can see that we have our two methods, which are also labeled as "search". We then go through the deduplication process and start taking a look at some of these plots.

So again, we have that first heat map showing overlap between the databases, the same one as a percentage, and then the upset plot. I want to get down to this next one, which is, again, one of my favorites, because this is where that SiteLabel really comes into play. At the bottom here, it might be a little difficult to see, but this is broken up into three bar plots: searched, screened, and final. So not only are we looking at the databases across the top (AGRIS, CAB, GreenFILE, and then Web of Science here at the end) and the two methods, we're also seeing how many unique records each method or source brought in and how many duplicate records were found in at least one other resource or method. You can then see how that progresses through title and abstract screening and into the final included papers.

Just below that plot, we have the citation summary table. We're still working on this; it's one of the things we want to make sure is polished. I think this is going to be a really fantastic addition to the package, and it's something I'm really looking forward to seeing. Thank you to Allison Patel for the inspiration on this and also for helping us in this regard.
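Here is a similar sketch for the screening-phases example, this time tagging with two fields. Again, the file, function, and argument names are assumptions made for illustration rather than the package's confirmed interface.

```r
# Sketch of two-field tagging (source plus label) across screening phases.
# NOTE: all names below are hypothetical placeholders for illustration.
library(SiteSource)

citations <- read_citations(
  files        = c("agris.ris", "cab.ris", "greenfile.ris", "wos.ris",
                   "citation_chasing.ris", "discovery_tool.ris",
                   "tiab_screened.ris", "final_included.ris"),
  site_sources = c("AGRIS", "CAB", "GreenFILE", "Web of Science",
                   "Citation chasing", "Discovery tool", NA, NA),
  site_labels  = c(rep("search", 6), "TIAB", "final")
)

unique_citations <- dedup_citations(citations)

# Overlap plots across sources
plot_source_overlap_heatmap(unique_citations)
plot_source_overlap_upset(unique_citations)

# Unique vs. duplicate contributions per source, faceted by screening phase
plot_contributions(unique_citations, facets = "site_label")
```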
All right, one last quick example. This is actually a use case I hadn't even considered at the beginning, but it made complete sense when I was working on another project, so I'd like to show it off: benchmark testing. We had a number of different searches, and we wanted to be able to look at how well we were finding our benchmarking articles. Using SiteSource, we were able to test multiple iterations of our search string so much faster. In this case, you can see that search string four and search string six (I know these are hard to see) are not contributing any of the benchmarking articles on their own, unlike search string five, which has found three, and search string two, which has also found three by itself. Search strings four and six do find benchmarking articles, but other strings are finding those same articles. So again, a really interesting way to use SiteSource to help develop your search strings and strategy (a rough sketch of this tagging setup is included at the end). This was also just a whole lot of fun; I think we went through four or five different iterations of this, and this is just one of them, maybe our second or third.

Heading down here and taking a look at this record level table: it's just fantastic. Lucas put this together, and it still blows my mind. I love it. It's very helpful in benchmark testing; you can play around with it, and you can also download it as a CSV. So if you're interested, come jump into this vignette and take a look.

A couple more things before I wrap up. I just want to point out that SiteSource is available for download on GitHub, and we'll be working to get it up on CRAN soon. We've also put together some discussion boards, and we're really hoping that folks engage with us there. We've got the different use cases outlined in vignettes. And finally, if you're interested in working with us to further develop SiteSource, or if you have some interesting use cases in mind, please drop us a line and reach out to us on the discussion board. We're really looking forward to working with you. That's all for now. Thank you so much, and happy SiteSourcing.
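To close, here is a rough sketch of the benchmark-testing setup from the last example, using the SiteString field to track search string iterations. As with the earlier sketches, the function, argument, and file names are illustrative assumptions, not the confirmed API.

```r
# Sketch of benchmark testing: tag each search string iteration and the
# benchmark set, then inspect which strings retrieve which benchmark articles.
# NOTE: function, argument, and file names are hypothetical placeholders.
library(SiteSource)

search_files <- paste0("search_string_", 1:6, ".ris")

citations <- read_citations(
  files        = c(search_files, "benchmark_articles.ris"),
  site_strings = c(paste("search string", 1:6), NA),
  site_labels  = c(rep("search", 6), "benchmark")
)

unique_citations <- dedup_citations(citations)

# Record-level table showing, for each benchmark article, which strings found it;
# it can be explored interactively and exported as a CSV
record_level_table(unique_citations)
```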