All right, next up we have Sean Rife, who's going to talk about scite: identifying highly replicated research through citation analysis.

Yeah, good morning. Real quick: my name is Sean, I'm from scite.ai and Murray State University in Kentucky, where I'm an associate professor of psychology. Before I get started, I just wanted to note a couple of things that I should have put on the first slide and forgot. First, this work has been supported in part by the NSF and NIH. Second, I'm one of the co-founders of scite.ai, so I have shares in scite.ai and obviously a financial interest in the company. So, full disclosure.

scite was founded in 2018. We started it to address two key challenges. One is simply the volume and velocity of published research: the millions of papers published every year, as well as the extent to which the state of the art, or the accepted stance on any given topic, changes quickly. The other is the credibility or replication crisis. What we really wanted to do was enable researchers to evaluate research in terms of quality, and also to discover high-quality and, ideally, replicable research.

The thing we leaned on to do this is citation statements. That is the key component. A citation statement is basically what you see in the gray box up there: the text that surrounds a citation to a scientific paper. If you want to actually analyze the content of scientific citations, you have to have the full text of the citing article, so we have to get those articles and actually analyze them.

Real quick, I'm just going to talk about our methodology and how that works. If you want more information and some of the technical details, I have a QR code and a link to an open-access paper where we describe it in a good bit more detail.

We retrieve papers from open-access postings; we have a partnership with OurResearch, the folks behind Unpaywall. We also have indexing agreements with many major publishers, so they send us both their back catalogs and an ongoing deposit of new publications. We then extract the text from the PDF or XML files that we are sent or retrieve through one of those methods, and we have methods for identifying the text of citations, such as the examples I gave a second ago. We store those in a database, and then we have a deep learning classifier that classifies each citation statement into one of three categories: supporting (that is to say, "we successfully replicated Fizbee et al."), contrasting ("we did not find evidence consistent with Fizbee et al."), or simply mentioning, which is where you cite something but not in an evaluative or particularly relevant way.

As of today, we've indexed a total of 34.1 million scientific papers and extracted 1.9 billion citation statements from them, and our database currently covers 179 million works. Just to clarify, if that looks like a discrepancy in the numbers: our database contains metadata on basically everything with a DOI. We add to that by analyzing the scientific papers themselves and extracting the citation data, as well as enriching it with other sources like OpenAlex, which I'm sure a lot of you are familiar with.

Okay, so we combine all of this, and it allows us to produce what we call report pages for a given scientific work. This is an example paper on amygdala hyperactivity, Phan et al. from 2006. With these report pages, which you can pull up for any published work with a DOI, you get some metadata and some information about the publication. But what makes scite a good bit different is what's in the right-hand panel there, under "cited by 449 publications."
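The classification step described above can be sketched in code. scite's actual system is a deep learning classifier trained on labeled citation statements; the keyword heuristic below is only a toy stand-in to show the input and output shape, and every cue list and function name here is hypothetical.

```python
# Toy illustration of three-way citation-statement classification.
# The real system is a trained deep learning model; this keyword
# heuristic only demonstrates the task's interface.

SUPPORTING_CUES = ("successfully replicated", "consistent with",
                   "in line with", "confirming")
CONTRASTING_CUES = ("failed to replicate", "did not find evidence",
                    "inconsistent with", "contrary to")

def classify_citation_statement(text: str) -> str:
    """Label a citation statement as supporting, contrasting, or mentioning."""
    lowered = text.lower()
    # Check contrasting cues first: "did not find evidence consistent with..."
    # contains a supporting cue as a substring, so order matters here.
    if any(cue in lowered for cue in CONTRASTING_CUES):
        return "contrasting"
    if any(cue in lowered for cue in SUPPORTING_CUES):
        return "supporting"
    return "mentioning"  # cited, but not in an evaluative way

print(classify_citation_statement("We successfully replicated Fizbee et al."))
print(classify_citation_statement("We did not find evidence consistent with Fizbee et al."))
```

Note the ordering choice: contrasting cues are matched before supporting ones, because negated phrasings often embed a supporting phrase as a substring.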
That panel contains text that is not from this paper, not from Phan et al.; it's from other papers that cite this paper. If we zoom in, you can see one example from Davies et al. from 2017. It has a total of four of what we call Smart Citations; the first two are listed up there, showing the full text of the citation statement.

Because we have the classifier and we've been able to classify these citations, we can also let users filter that list of citation statements by category. For any of these reports you can say, "only show me the supporting, or mentioning, or contrasting citations." That ability to filter or sort by the types of citations a work has received opens up some interesting possibilities.

For one thing, we have a pretty robust search infrastructure. You can search for papers based on keywords, as you would on any other platform, but because we have classified the citations, you can also filter the search results by the types of citations a work has received. For example, if you open the filter panel on the search page, you can display only papers that have received 26 or more supporting citations; ideally, this is work that has been at least to some extent replicated later on.

So there are a number of use cases. Orienting yourself in, or evaluating, a new field is one of them: forward and backward chaining of citations, if you're familiar with that process, becomes fairly easy. And I should say, as a professor, every once in a while I teach intro psychology, which means I'm a social psychologist spending a week talking about neuroscience, which is mildly terrifying even after doing it for years. So one thing I do when I go back to refresh my materials is ask: what are some of the key studies being cited in the latest edition of the text, and what citations have they received since then? I do that with a lot of classes, and it's a useful exercise.

You can also use this to identify what we call heavily active areas of research. If you're interested in a paper or topic where there's a lot of discussion going on, what we recommend is looking for papers or topics that have a relatively high number of both supporting and contrasting citations, particularly if those have co-occurred within, say, the past two or three years. That's a pretty good indication that there's a lot of movement in that area. You can also see how this could have implications for things like grant writing, in terms of identifying gaps in the literature or unresolved questions.

And then, hopefully this is of interest to some of y'all: I think there are applications for metascientific research. We're very happy to collaborate with metascience researchers like yourselves, and provide data, usually for free. So one thing you can do with this is use supporting citations as a kind of proxy for quality. Not a perfect proxy.
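The two filters just described, the supporting-citation search filter and the "heavily active area" heuristic, can be sketched over hypothetical per-paper citation tallies. The field and function names below are made up for illustration and are not scite's actual data model.

```python
# Sketch of two filters over per-paper citation tallies (hypothetical schema).

from dataclasses import dataclass

@dataclass
class PaperTally:
    doi: str
    supporting: int          # total supporting citations received
    contrasting: int         # total contrasting citations received
    recent_supporting: int   # supporting citations in, say, the last 3 years
    recent_contrasting: int  # contrasting citations in the same window

def well_supported(papers, min_supporting=26):
    """Search-style filter: papers with at least min_supporting supporting citations."""
    return [p for p in papers if p.supporting >= min_supporting]

def heavily_active(papers, threshold=5):
    """Flag papers where supporting AND contrasting citations co-occur recently,
    the talk's heuristic for topics with a lot of ongoing movement."""
    return [p for p in papers
            if p.recent_supporting >= threshold and p.recent_contrasting >= threshold]

papers = [
    PaperTally("10.1000/a", supporting=40, contrasting=2,
               recent_supporting=6, recent_contrasting=1),
    PaperTally("10.1000/b", supporting=12, contrasting=9,
               recent_supporting=7, recent_contrasting=6),
]
print([p.doi for p in well_supported(papers)])   # only 10.1000/a passes
print([p.doi for p in heavily_active(papers)])   # only 10.1000/b passes
```

The point of the second function is the conjunction: a high supporting count alone suggests settled work, while recent supporting and contrasting citations together suggest active contention.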
It's a proxy, but it gives some indication of the quality of scientific research. In fact, a graduate student of mine, Hillary Copeland, just finished her analyses doing exactly this: she's looking at the racial, ethnic, and gender diversity of authors and its impact on the quality of scientific output, using the supporting-citation count as one indicator, one outcome variable. You can also do things like identify various citation patterns.

There are a few implications here, and this is where I get to talk about metrics, which I know is a favorite. Everyone here loves metrics, right? Metrics are amazing and we should have more of them. I'm not a fan of metrics, but after a number of years I have resigned myself to the fact that they will continue to exist, much as I dislike them. The thing I like to point out about the way scite does metrics is what we call the scite Index, which is basically just the ratio of the supporting citations a given journal or organization has received to its total number of supporting plus contrasting citations. I think this is a step up. It's a better metric, so it's a better bad thing, if you want to think about it that way, in that it's based on the content of a citation, not simply the fact that a citation occurred. I think that is at least a step in the right direction. I should also point out that while we apply this to journals, organizations, and funders, we don't apply it to authors or labs. The technical reason, I believe, is that we feel that would be icky. We have to draw the line there and make sure we're applying it with good intentions.

I was just at the large language models discussion earlier this morning; there was a lot of great discussion in there. You can't give a talk like this without touching on AI, and it is definitely something we are heavily invested in. If you were in that session, one thing people were talking about was the problem of OpenAI's models hallucinating. You're familiar with that: they make up references, they say things that aren't true. Using OpenAI's technology on the back end, we've deployed what we call scite Assistant. Because we have access to the full text of scientific articles, we're able to parse questions posed in natural language and then compose a synthetic answer, one that's based on the scientific research. If you go there and type in a question, it will reply with an answer, but it will also show its work: it will show you which papers it's referencing. That's in active development, and we're excited about being able to work on it in the future.

This is the link to the paper where we get into more technical detail about what we do and how we do it. And like I said, we're really excited about collaborating with metascience researchers, so if you're interested in any of that, by all means shoot me an email. I look forward to questions.

I think we have time for just one question.

So I'll try to make it good. Thanks, Sean, for a great talk. Jordan, working at the Federation of American Scientists. In the spirit of some of the discussion that has been happening around other contributions to scientific progress, like datasets and tools and methodologies, I'm curious how scite thinks about a fourth citation category that's more along the lines of "using," and whether that would fall into "supporting" already, or whether there's a different category.

Yeah, so this is a great question. We spent a lot of time developing those three categories, and properly conceived, the way I understand your question, I think that would actually be what we'd call a mentioning citation, so it wouldn't be valenced.
We've had plenty of debate over that. But yeah, that is something we're constantly exploring, so come find me afterward and we'll have a chat about it.

Sounds good. Thank you.