Next up, we have Vicente Amado Olivo, who's going to talk about the development of a global registry for peer review in astronomy.

Hello, everyone. My name is Vicente Amado Olivo, and I'm a second-year graduate student at Michigan State University, and I'm very excited to share this work, which was done in collaboration with the Space Telescope Science Institute. The team working on this was myself and my advisor, Wolfgang Kerzendorf, at Michigan State University. To start off, we are part of the Deep Thought Initiative; here is a QR code for our website, and also the URL you can use to follow our work. We're an interdisciplinary group of astrophysicists and data scientists developing open-source and open-science methodology and tools, working on computational meta-research in astrophysics. This is our team, along with our collaborator at Space Telescope, Lou Schrolder. Today I will first discuss the allocation of community resources in astrophysics and how that is unique, as these resources must be shared amongst the community. Then I will present our open-source reviewer registry, the DTI Registry. An important element of this is the problem of ambiguous author names, and I will finish by discussing how we validated a few disambiguation techniques.

Early on in astrophysics, the majority of telescopes were built by individual research groups for their specific research questions, such as the Hopkins Observatory. But as these questions became more complex, the community came together through large organizations, such as NASA, to build the Hubble Space Telescope, and then more recently the James Webb Space Telescope, observing further than we ever have before. These are community resources that must be shared amongst all researchers in this field to perform their experiments, and they are also limited resources.
They can only observe one astronomer's targets at a time, and they need specific conditions to be used ideally. So a peer review process was created, requiring a proposal before an astrophysicist can perform their experiments. As an example, the European Southern Observatory has been overwhelmed with reviews: they have tracked how many proposals they have received over the years, and they now get five times more proposals than they did in 1977. A study in the biomedical field found that 20% of researchers perform over 70% of the reviews. And if a minority of researchers review a majority of the proposals, this can cause non-expert reviews, as proposals are highly specialized. So we propose, and will present, an expansion of the review pool to the global astronomical community, to diversify the review pool geographically and across specializations.

One of the biggest problems for observatories in finding reviewers is understanding who is a researcher in the field. The DTI Registry will have a unique profile for every active astrophysicist worldwide, containing information relevant for peer review. Here's an example profile of my advisor. There will also be extended metadata, which can allow a facility to perform finer-grained searches, for example if they want to involve more early-career researchers. And this will be transparent and open to the community.

To build this, you must first gather reviewer information. The first place you might think of is the Open Researcher and Contributor ID (ORCID), which assigns a unique identifier to a researcher. However, only roughly half of researchers in each field use ORCID iDs in their work. This is because a researcher must actively sign up, and it's unclear how up to date these records are.
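A registry profile with extended metadata, as described above, could be represented by a small data structure supporting finer-grained searches such as the early-career example. This is a minimal sketch; the field names (`career_stage`, `expertise_keywords`, etc.) and values are hypothetical illustrations, not the actual DTI Registry schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ReviewerProfile:
    """Hypothetical reviewer profile; field names are illustrative only."""
    name: str
    orcid: Optional[str]        # ORCID iD, if the researcher has signed up
    affiliation: str
    career_stage: str           # e.g. "graduate", "postdoc", "faculty"
    expertise_keywords: List[str] = field(default_factory=list)

def find_early_career(profiles):
    """Finer-grained search: select early-career researchers only."""
    return [p for p in profiles if p.career_stage in ("graduate", "postdoc")]

# Dummy entries with a placeholder ORCID iD.
registry = [
    ReviewerProfile("A. Researcher", "0000-0000-0000-0000", "Univ. X",
                    "faculty", ["supernovae"]),
    ReviewerProfile("B. Student", None, "Univ. Y",
                    "graduate", ["exoplanets"]),
]
early = find_early_career(registry)
```

A facility could filter on any combination of such fields, which is the "finer-grained search" capability the talk describes.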
The next source of information is the NASA Astrophysics Data System (ADS), an online database of over 16 million astronomy and physics papers. The worldwide experts are contained within the author lists of these publications, but their identities are ambiguous, as a researcher's name is scattered across their publications. Together, these two sources cover the worldwide astronomy community, and we will use them to include more researchers and more reviewers than ever before.

Our pipeline has the goal of creating unique researcher profiles. There are many sources for this information, as I said, such as affiliation websites, but we chose astrophysics publications, as a majority of them are open access. We extract the author names and the publication metadata, such as titles and abstracts, and the publication metadata is input into the unique researcher profiles. You can extract expertise from this, as you can see in Kerzendorf et al. 2020. The publication metadata can also be cross-matched to external researcher metadata sources, such as affiliation websites or ORCID iDs, and this can be leveraged to create a more robust unique researcher profile. Today I'm going to talk specifically about the disambiguation, as there are some challenges here.

There are two problems in finding unique authors. First, names can be incorrectly merged, because different authors share the same name: for example, the last name Smith appears over 4,000 times and the last name Lee appears over 23,000 times. Second, names can be incorrectly split, because an author's name appears differently across their publications, for instance with and without middle initials, among other variations. This can disproportionately affect names from different regions and, similarly to what Pierre talked about yesterday, can cause whole groups of people to be discounted.
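The two failure modes just described, incorrect merging of distinct people and incorrect splitting of one person's name variants, can be illustrated with a toy sketch. The names and the simple last-name-plus-first-initial key below are illustrative assumptions of mine, not the registry's actual method.

```python
def name_key(author: str):
    """Collapse an author string to (last name, first initial) --
    a deliberately coarse key that exposes both failure modes."""
    last, first = [p.strip() for p in author.split(",", 1)]
    return (last.lower(), first[0].lower())

# Incorrect merge: two different people collapse to the same key.
assert name_key("Smith, John A.") == name_key("Smith, Jane")

# Incorrect split: one person's variants are treated as two people
# if we instead key on the exact name string.
variants = ["Kerzendorf, Wolfgang E.", "Kerzendorf, W."]
full_name_keys = {v for v in variants}          # naive exact-string identity
initial_keys = {name_key(v) for v in variants}  # coarse merged identity
print(len(full_name_keys), len(initial_keys))   # prints "2 1"
```

Any real disambiguation method has to navigate between these two extremes: exact-string matching over-splits, while coarse keys over-merge.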
Our goal is to disambiguate these names so that no group is discounted. We wanted to first validate name-based techniques: our database is of roughly 1.8 million publications, the most prevalent data in it is the author names themselves, and there are existing techniques that use only author names, so we wanted to start there. The first thing we did was create a validation set by cross-matching astrophysics publications with ORCID identifiers, which gave us a set of 16,000 identified astrophysicists with true labels. We then validated the simulated methods, as the majority of author-name disambiguation algorithms are either evaluated with simulated identities or are unsupervised machine learning tasks. Using our validation set on these name-based methods, we found 2.5 times more splitting and merging of names than was reported.

However, as a preliminary result, we used these methods to disambiguate the publications from the NASA Astrophysics Data System, restricted to articles and PhD theses in astrophysics, which is roughly 1.8 million publications. In our preliminary results, the yellow line is the membership of the International Astronomical Union, one of the largest astronomy societies in the world, and the blue curve is the disambiguation using the first-initial method, which Newman 2001 refers to as the lower bound, as it strictly merges names together. What we see from this is that by relying only on these societies and similar groups, we are missing a large chunk of the expertise that could be leveraged for peer review of proposals. We will continue to work on the disambiguation, but this preliminary result already shows something like the lower bound.
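The first-initial method behind the blue curve, Newman's lower bound, merges every authorship record sharing a surname and first initial into one identity, so it can only merge, never split. The toy records, the hypothetical truth labels standing in for the ORCID validation set, and the pairwise error counting below are my own illustrative sketch, not the talk's exact implementation.

```python
from collections import defaultdict
from itertools import combinations

def first_initial_disambiguation(authorships):
    """Group authorship records by (last name, first initial).

    Each record is (record_id, author_string). Because this only ever
    merges names, the number of groups is a lower bound on the number
    of unique authors (Newman 2001).
    """
    groups = defaultdict(list)
    for rec_id, author in authorships:
        last, first = [p.strip() for p in author.split(",", 1)]
        groups[(last.lower(), first[0].lower())].append(rec_id)
    return dict(groups)

# Toy records with hypothetical ORCID-style true identity labels.
records = [
    ("p1", "Lee, Alice"), ("p2", "Lee, A."),   # same person, truth T1
    ("p3", "Lee, Aaron"),                      # different person, truth T2
]
truth = {"p1": "T1", "p2": "T1", "p3": "T2"}

groups = first_initial_disambiguation(records)
merged_ids = groups[("lee", "a")]   # all three records merged together

# Validate against the truth labels: count merged pairs that really are
# the same person. Here only 1 of 3 pairs is correct (p1, p2).
pairs = list(combinations(merged_ids, 2))
correct = sum(truth[a] == truth[b] for a, b in pairs)
precision = correct / len(pairs)
```

This pairwise counting against true labels is one standard way a cross-matched validation set can quantify over-merging, the failure mode the talk found to be 2.5 times more common than reported.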
For our future work, we are continuing to leverage the publication metadata and the true labels from the ORCID validation set to validate more modern, computational author-name disambiguation algorithms, such as those using natural language processing on the titles or abstracts. To summarize: in astrophysics, peer review has been overwhelmed, specifically in the allocation of community resources, and we present a way to broaden the reviewer pool. In developing the registry, we found that current author information sources are incomplete, which leaves author identities ambiguous. The current, simulated methods perform poorly on our validation set, and there are many new algorithms available that have never been validated. Thank you very much; this is the QR code for our website.

Two minutes for questions. Please, Brian.

Thank you for that. I think maybe you can help solve a problem that we've been trying to think about in this space, which is that proposals for telescope time are functionally a form of pre-registration, if they were made more accessible and visible. What it seems like you're enabling is a partnership with journals, because your service may help them if we conceive of this as a registered-report process: the review for telescope time is stage one, the journal uses the peer reviewers you've helped them discover to commit to publishing the work regardless of outcome, and then all the journal has to do is a follow-up review to see if the authors did what they said they were going to do. Have you been thinking about that? Does that fit with the model that you're working on?

I think we would definitely be very interested in doing that and partnering with journals. As I'm not an astrophysicist, I'm not 100% sure if they do registered reports and whether this is something that's in their workflow.
But I think the proposals could definitely act as something like that. We would have to discuss with journals and set up a collaboration.

Perhaps you'd be willing to talk with us more about registered reports, so that we can share that with our community and learn more.

Hi, I'm really interested in this extraction of an expertise database from your work. I'm curious whether this is applicable to fields where there's a lot of interdisciplinary collaboration. Do you have a way of identifying, basically through patterns of authorship, which authors have expertise where? My other question: I'm not sure how author order works in astrophysics, but I'm curious if there's any way of addressing the level of contribution in your data set.

Yes, for expertise extraction, Kerzendorf et al. 2020 gives more specifics on how that's done, but we use natural language processing techniques on all of the papers someone has authored to represent their expertise. So if you're interdisciplinary, your expertise will show up across different areas in that way. And then I forgot your other question. Author order, yes. In astrophysics, at least, the primary person is the first author; the first and second authors are the most important, and the rest can be interchangeable. I know at least in our group we try to use the CRediT taxonomy, but we're one of the few groups who do that. If this were expanded beyond astrophysics, there would have to be some fine-tuning, as in other fields author order is alphabetical, et cetera. There are definitely things you can do to capture contribution; you just have to make it specific to each field, as there are differences.

Thank you. We've got to move on to the next talk now. Thanks so much. Thank you.
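The expertise-extraction idea discussed in the answer above (natural language processing over all of a researcher's papers; see Kerzendorf et al. 2020 for the actual method) might be sketched with a plain TF-IDF over abstracts. The corpus, the whitespace tokenization, and the weighting below are simplified assumptions of mine, not the published pipeline.

```python
import math
from collections import Counter

def tfidf_expertise(docs_by_author):
    """Crude expertise profiles via TF-IDF.

    docs_by_author: {author: [abstract, ...]}. Returns, per author,
    their vocabulary sorted by descending TF-IDF, so distinctive
    terms (high TF, low corpus-wide DF) rank first.
    """
    # Document frequency of each term over the whole corpus.
    all_docs = [d for docs in docs_by_author.values() for d in docs]
    df = Counter()
    for doc in all_docs:
        df.update(set(doc.lower().split()))
    n = len(all_docs)

    profiles = {}
    for author, docs in docs_by_author.items():
        tf = Counter(w for d in docs for w in d.lower().split())
        scores = {w: c * math.log(n / df[w]) for w, c in tf.items()}
        profiles[author] = sorted(scores, key=scores.get, reverse=True)
    return profiles

# Tiny illustrative corpus: two authors with distinct specializations.
corpus = {
    "A": ["supernova spectra modeling", "supernova radiative transfer"],
    "B": ["exoplanet transit photometry", "exoplanet atmosphere spectra"],
}
profiles = tfidf_expertise(corpus)
```

An interdisciplinary author would simply have high-scoring terms from several areas at once, which matches the answer's point that expertise "shows up across different areas."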