All right, it's the top of the hour, and I guess we'll start. Thank you for coming to our session on Integrating Research Identifiers into University and Library Systems. I'm Karen Smith-Yoshimura from OCLC Research, and we will be switching places. This is Micah Altman from MIT.

First, I want to make a plug. This presentation is based on work that our Registering Researchers in Authority Files task group has been doing for over a year, and the draft report is just out at the URL given here. So those of you coming to this session, I'm hoping that you will go back, review it, and provide feedback on our draft report. The co-conspirators for this report represent a group from three countries, the US, the UK, and the Netherlands, and different perspectives. We have people who are represented on the ORCID board, including Dr. Altman here; people involved in VIAF, the Virtual International Authority File (Thom Hickey, the chief scientist at OCLC Research, is responsible for its creation); a number of librarians responsible for LC/NACO authority work; people from the board of ISNI, the International Standard Name Identifier; a UK project; and, with Laura Dawson, a representative of the publisher perspective. So that's the cross-perspective we try to bring to the whole issue of registering researchers.

Now, our disclaimer: the opinions are our own, but they're also the ones that are in the draft report. And yes, it's tough to make predictions, especially about the future. If you want to look up who really said that, be my guest.

All right, a little bit about OCLC Research itself. We have a number of different projects. If you go to the OCLC Research page, you can see a whole overview of our various activities. Basically, our mission is to act as a community resource, your resource, for shared research and development.
And we also work closely with institutions affiliated with the OCLC Research Library Partnership, which I know a number of you are affiliated with, to work on common issues. We also provide technical support and develop various kinds of prototypes. If you go to our page, you'll see this is one of many, many OCLC Research activities. But we're focusing today on our Registering Researchers in Authority Files task group: basically, how to make it easier for researchers and institutions to more accurately measure their scholarly output. That includes looking at the challenges of integrating author identification, approaches to reconciling data from multiple sources, and models and workflows for registering and maintaining integrated researcher information. We do have an activity page on the OCLC Research site, and there's a link there to our works in progress, which includes this draft report, which I hope you will all review.

As a preview, we're going to cover our motivations; that's what Micah is going to cover. I'll come back for a landscape scan, and then Micah will conclude with some of our observations and recommendations. We plan to have enough time for questions and answers at the end. So with that, Micah, you're up.

So Karen promised that we'd talk about our motivations, which are, of course, glory and fortune. But the bigger university and ecosystem motivation for this is, as is often the case these days, "more." A little contrast: there was a review of co-authorship in science publishing in 1964, and you can see it changed dramatically over several decades, from roughly two co-authors to 2.4. But the change has been rapid: in some fields it was four and a half by 1980, and 6.9 by 2000. And now, this is a pretty picture of a galaxy. It's also an author list, for the Sloan Galaxy Zoo, and that shows only some tens of thousands of the hundreds of thousands of named contributors to that project.
So as authorship expands in that way, it raises a number of issues, trends, and questions. What do these author lists mean? How do we figure out who these authors are? How do we integrate systems for contributors to data, journal publications, books? And many other questions. But we're focusing on the integration and disambiguation questions here.

This is important because, as you know, scholarly output impacts the reputation and ranking of the institution. I was trained as a researcher, and there's an unkind thing we sometimes say about deans: that they can't evaluate, but they can count, or at least they can weigh things. So citations are often the things that get counted. But citations rely on names: to associate the thing being counted with a person, and to associate that person with an institution. So it all comes down to connecting a name to a work. And that's ambiguous, because scholars may publish under lots of different names. In libraries we know about this, and we have elaborate, formal, controlled, systematic, robust systems for handling name authority, variant names, et cetera, for books. But in the world of research articles and data sets, beyond the things that are formally cataloged in national libraries, the same name can connect to different people.

Here is an example of three different people with the same name. We know these are different people because Mike Conlon is on our task group, and he disclaims the other two; they're not even related to him. And the point I also want to make is that Mike Conlon is a relatively uncommon name, although even here you have an example of two people in the same field with the same name. But think of Chinese names, and how often ethnic Chinese names appear in the scientific literature; China is in the top 40 of countries responsible for articles in all scientific journals. In China itself there are 250 million people with the surname Li, Zhang, or Wang. And that's only in China.
Just think of all the overseas Chinese as well. Given how common these Chinese surnames are, the odds that a "C. Lee" represents more than one person are very high, regardless of institutional affiliation or discipline.

Oh, I'm up for the landscape. How convenient. Yes, to be continued.

The other thing we're finding is that there are researchers out there who have discovered that they already have an identifier, in fact more than one. A lot of researchers don't realize that some librarian somewhere has already created an LC/NACO authority record for them. And if they have one, they have not only an LCCN but, since LC contributes to the Virtual International Authority File, they'll already have a VIAF ID as well. Not everyone realizes that, but this is an actual signature block from a researcher. He listed, and he knows about, 13 different identifiers or profiles representing him in these different systems.

So what we've done is start with use-case scenarios for seven different stakeholders: researcher, funder, university administrator, journalist, librarian, identity management system, and aggregators of various types, which includes publishers. Then, for each of those stakeholders: what are their needs, and what are their functional requirements? For example, a researcher wants to disseminate their research, to have a compilation that accurately reflects their work and not somebody else's, to find collaborators who are interested in the same field, and basically to ensure that their network presence is correct wherever it may be. Funders have a stake in this as well: they want to track the research they funded and make sure it is correctly associated with the people they gave the money to.
University administrators, as Micah pointed out, are really concerned about counting: making sure that all the scholarly output of their scholars is accurately reflected, so that their reputation and ranking will also be accurate. And librarians, of course, also have a stake in uniquely identifying each author across the various systems.

I'll skip ahead and just talk a little about the functional requirements. This is just a partial list. As stakeholders, librarians want to create consistent and robust metadata, to associate the metadata for a researcher's output with the correct identifiers, and to disambiguate similar names. As we've seen, a name is not enough. You need other kinds of attributes, whether institutional affiliation, some titles of their works, disciplines, or professional associations: something besides just a name, and certainly more than just a string. Librarians also need to merge entities that represent the same researcher and split entities that represent different ones.

For the researcher and the university administrator as stakeholders, we have to link the multiple identifiers a researcher might have, so that all their output can be collated, like that person with the 13 different identifiers. You want to associate the metadata with the researcher identifier so that it resolves and gives a complete picture of their intellectual output. You want to verify that a researcher and their work, or related work, is represented in whatever system they're using. And you want to register researchers who do not yet have a persistent identifier. For funders, the big thing is to link the metadata for the researcher's output to the funder's grant data.

So, all in all, we did a profile.
We started off with a large list of over 100 different research information systems that Dr. Conlon had compiled, and we used two basic criteria to select the systems to profile: one, the system had to have significant mindshare or uptake by researchers; and two, researchers had to be represented by a persistent, unique, and publicly accessible URI. We wanted to end up with a representative sampling of different categories.

So we have authority hubs, which include the Digital Author Identifier (DAI) used by the Dutch, the Lattes platform in Brazil, the LC/NACO authority file, the Names project from the UK, ORCID (Open Researcher and Contributor ID), ResearcherID, and the Virtual International Authority File. Then we have current research information systems; Symplectic was represented on our task group, so we have that perspective as well. Identifier hubs like ISNI, national research portals, online encyclopedias, reference management systems, researcher and collaboration hubs. Researcher profile systems, where we include Community of Scholars, Google Scholar, LinkedIn, SciENcv, and VIVO. And then subject author identifier systems like AuthorClaim, and subject repositories.

We created a great big spreadsheet of all of this, and the profile consists of a lot of different characteristics; I'm only showing a very small partial view here, of the authority and identifier hubs. The key point is that even though each of these is a hub, the share of researchers represented varies. Some, like the Lattes platform, focus solely on researchers, so its two million people are two million researchers: all of the researchers in Brazil and all those who collaborate with them. For others, like ISNI or ORCID, researchers are only a portion of those represented. And it's really impossible to tell, for something like the Virtual International Authority File or the LC/NACO file, how many of the names represent researchers.
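A concrete aside on one of these hubs: an ORCID iD is a sixteen-character identifier, displayed as four groups of four, whose last character is a check digit computed with the ISO 7064 MOD 11-2 algorithm. That much is documented ORCID behavior; the function names below are our own. A minimal sketch of validating one:

```python
def orcid_checksum(base15):
    """Compute the ISO 7064 MOD 11-2 check character for the
    first 15 digits of an ORCID iD (hyphens removed)."""
    total = 0
    for ch in base15:
        total = (total + int(ch)) * 2
    remainder = total % 11
    result = (12 - remainder) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid):
    """Check the format and check digit of an iD like 0000-0002-1825-0097."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    return orcid_checksum(digits[:15]) == digits[15]
```

A check like this catches most transposition and typing errors before a bad identifier ever lands in a local system; for example, `is_valid_orcid("0000-0002-1825-0097")` is true, while changing the final digit makes it false.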
They're very large; in the case of VIAF, we're now up to 30 million personal names represented in the aggregation of all the national authority files that contribute to it, but we have no idea how many of those represent researchers.

But there are lots of overlaps. For example, the Digital Author Identifier system is loading its records into ISNI, and ISNI is loading into VIAF, so we're getting some examples of overlaps. For instance, Cliff Lynch, who I think you all know, is represented in the Virtual International Authority File. There's a link to ISNI (ISNI contributes to VIAF), so he has an ISNI identifier. He's also represented in the LC/NACO file and the Dutch national file, and there's a Wikipedia article on him. So he's represented in multiple places, all shown within the VIAF context. And this is the LC/NACO authority record of one of my colleagues at OCLC Research, Eric Childress. Authority records are starting to carry more 024 fields for other identifiers, so his ORCID, VIAF, and ISNI identifiers are all represented in the same LC/NACO authority record. So we're getting these overlaps.

Now for some observations. Before we get to the observations, I'll make two statements. One is that these observations are not quantitative. We don't have a formula for going from the data we collected to the observations, but the data we collected is available and transparent: the first round is posted on the OCLC site now, and all of the intermediate coding products will be posted along with the final copy of the report. The second point: see the disclaimer at the beginning, as these observations are future-focused.

So first, a descriptive observation: where is everyone? Well, if you're talking about everyone who matters, then maybe it's LinkedIn, just looking at the total number of professionals indexed in any of these systems. But we're focused on the people who really, really matter. So where are those?
Well, using the numbers that we have, plus some back-of-the-envelope calculations with fudge factors for estimating the percentages of researchers in other places, it's clear that mostly they're not anywhere. There are a lot in ORCID and ISNI now, relatively speaking, but mostly we haven't captured them yet, unless there are far more researchers lurking in LinkedIn than we think. If even 5% of LinkedIn were researchers, that would be off the charts, but I doubt it.

The scholarly landscape around identification is clearly changing. One change has to do with the difference between books and journals, and the relative increase in publication around the latter and around other sorts of research objects like data sets, software, and digital creations; they're neither journals nor books, but in some of these characteristics they behave in more journal-like ways. Books are read by everyone, or at least everybody claims to read them, and journals mostly are not. But tenure and promotion is largely based on articles in many fields: much of the social sciences, the life sciences, the physical sciences. Books have a robust system of name and subject authority but no robust system of citation tracking, and for journals that's flipped around, along with that sort of full-text indexing.

So researcher identifier systems are not the same thing as name authorities. For traditional name authorities, the primary stakeholders and users were research libraries. The stakeholder group for researcher identifiers is much larger and much more diverse: research libraries care about this, other parts of universities care about it, funders care, and researchers interact with these things directly. If you asked a researcher what their name authority record was, in most cases they would not even know what you were talking about, but they know what their ORCID iD is.
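An aside on what having that iD makes possible: ORCID exposes a public REST API that speaks plain HTTPS and JSON. The sketch below builds such a request and pulls a display name out of a response shaped like the public v3.0 record format; the base URL and the field paths reflect our reading of ORCID's API documentation and should be verified against the current version rather than taken as gospel. No network call is made here; the parsing runs against a trimmed sample payload.

```python
import json
import urllib.request

# Base URL of ORCID's public API (v3.0 at time of writing; verify against docs).
PUBLIC_API = "https://pub.orcid.org/v3.0"

def record_request(orcid_id):
    """Build a request for a researcher's public record, asking for JSON."""
    return urllib.request.Request(
        f"{PUBLIC_API}/{orcid_id}/record",
        headers={"Accept": "application/json"},
    )

def display_name(record):
    """Pull 'Given Family' out of a v3.0-style record payload."""
    name = record["person"]["name"]
    return f"{name['given-names']['value']} {name['family-name']['value']}"

# Offline demonstration on a payload trimmed to just the fields used above:
sample = json.loads(
    '{"person": {"name": {"given-names": {"value": "Josiah"},'
    ' "family-name": {"value": "Carberry"}}}}'
)
```

Here `display_name(sample)` yields "Josiah Carberry" (the fictitious researcher ORCID itself uses in its documentation). Contrast this with the pre-JSON library protocols mentioned later: the barrier to entry is a single HTTP GET.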
Because they were built for one community, the name authority systems are standardized and well integrated within that set of library systems, whereas the researcher identifier systems tend to be fragmented, though some are well integrated with particular communities of practice. The organization of the traditional name authorities was mostly top down, flowing from a national authority file, with carefully controlled, vetted inputs from the member organizations contributing to that top-down structure. In the researcher ID space there are many different models: top down, bottom up, middle out, individual contribution.

For external integration, it's hard to integrate with traditional name authorities if you're outside a library system. There's a high barrier to entry: the protocols are strange, they're generally not even using JSON or XML, and there are not a lot of open APIs exposed. For researcher ID systems it varies, but there are some services with open data and open APIs, and they even integrate with Web 2.0 protocols: you could consider using the same ID to sign into LinkedIn and Facebook and ICPSR, who knows, and that actually works.

The works covered are different. Traditional name authorities focus on book-like things, the things libraries catalog (originally, things that could at least go on shelves and be individually labeled there, for the most part), whereas the scope of researcher IDs is pretty much everything else. And we're producing a lot more of everything else these days; there's that word "more" again. The people involved differ too: authors of works published in or about a large country, versus everyone else making everything else. And the criteria for getting records into the system are different. In a traditional name authority system, we aim for a complete record of the work.
That is, a particular work and a complete description of it. That may involve unambiguous identification of authors, and assignment of topics, along the way. But the orientation of most identifier systems is different: it's to create a persistent and unambiguous identifier that you can attach to an individual contributor and use in a lot of places. Whether complete descriptive information is associated with it is a different question. The value is in the persistence and the link to the individual, not necessarily in some guarantee about the completeness of the information associated with it; that comes from other services, other layers, et cetera.

And the environment is much more complex. Karen talked about our attempt to categorize all of these interoperating services and entities into different classifications, and we came out with only ten, but in reality these overlap. Google Scholar is a floor wax and a dessert topping: it's a reference manager system, and a profile system, and it assigns URIs that don't change and attach to a person, which you could call an ID if you like. Mendeley is, again, a profile system, a reference manager system, a metrics gatherer, a secret intelligence information system for who knows whom. Not only are these systems gray and overlapping, but what they do today might not be the same thing they're doing in one, two, or three years. It's a complex picture.

And systems can have both producer and consumer relationships with each other. CRIS systems consume identities from identity providers, from NACO records; they also create records that go up to other systems. But in general it's unclear how information is equilibrated, and errors corrected, across the whole system. If you see an error here, it's not necessarily clear where that information came from or how to correct it all the way up the chain. And the institutional members and maintainers of these various systems overlap.
There are researchers at different institutions participating in many of these systems, and publishers participating in different collections of them. So we attempted to draw a more complete picture of the system, which you can study in the report; if you stare at the middle of it for long enough, you'll see a 3D picture emerge and you will be enlightened. What we got out of it is that, again, there are a lot of institutional contributors with inputs into all the different systems at different stages, and it's not clear, within those institutions, how those things are being coordinated. Information flows roughly to the right, but if you find an error, there should be some error correction that flows back the other way.

Before we leave this, I also, for this audience, want to point out the libraries, in the upper left quadrant, which contribute to the authority records and also get information from the publishers. They may also be controlling the institutional repository system, which sits in quite a separate place, down in the lower right. That's where your Harvard profiles or your Stanford CAP profiles for researchers live. And you'll notice that there's no line connecting the top left quadrant with the lower right, even within the same institution. That's right: the same institution may be involved with systems all the way through, but that doesn't mean the information is consistent or managed across them. That's one of the challenges that identifier systems might help with.

So what are some emerging trends? Here we get into the problem of predicting the future. That was predicting the present, and I feel reasonably safe about that, though there's still room for lots of error. But now we're in dangerous territory. One thing that is reasonably clear is that there's widespread recognition that persistent identifiers for researchers are needed.
No stakeholder that we talked to, or that we heard from, was saying this is not an issue, that we should ignore it and let it go away. Registration services, rather than authority files, are going to be the solution for researcher identification. It's not yet entirely clear which ones, but this is not a problem that's being solved by traditional library systems. And this is a hopeful emerging trend, and we cross our fingers that it continues: interoperability between some of the major systems is increasing.

In the adoption space, publishers have been early adopters. One thing we've seen is that most major scholarly journal publishers with a US presence now integrate, to some degree or another, ORCID iDs into their submission process and into their systems, generally implemented through integration with manuscript submission systems and with Crossref. So ideally, when an author comes to a system and submits a manuscript, that identifier gets created if it doesn't exist, or submitted at that point. It travels along through the review and revision process, and when the paper is published, it gets indexed. And now you can just do a DOI lookup and, with the standard set of DOI services, figure out at least the first author unambiguously. By the way, Crossref is doing projects on indexing funding identifiers too. That means you could connect the set of articles that came out of a grant with particular people, which is an interesting proposition for some of our stakeholders. We're seeing integration of ISNIs in publishing platforms and with funders as well.

Funder adoption has lagged publisher adoption, but we're starting to see national funders make use of identifiers in systematic ways. The Portuguese national funder recently required ORCID iDs for their national evaluation system.
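To make that DOI-lookup step concrete: Crossref's REST API returns work metadata as JSON, and where a publisher has deposited iDs, each author entry can carry an ORCID URL. The sketch below reflects our reading of Crossref's works format (the "message", "author", "ORCID", and "authenticated-orcid" field names); it runs against a trimmed illustrative payload rather than a live call, so treat the shapes as assumptions to check against Crossref's documentation.

```python
import json

def author_orcids(work):
    """From a Crossref /works/{doi}-style response, pair each author's
    family name with the deposited ORCID URL, or None if absent."""
    authors = work["message"].get("author", [])
    return [(a.get("family", ""), a.get("ORCID")) for a in authors]

# Trimmed, illustrative payload for a work with one identified author:
sample = json.loads("""
{"message": {"author": [
  {"given": "Josiah", "family": "Carberry",
   "ORCID": "http://orcid.org/0000-0002-1825-0097",
   "authenticated-orcid": true},
  {"given": "A.", "family": "Coauthor"}
]}}
""")
```

Here `author_orcids(sample)` pairs "Carberry" with his iD and "Coauthor" with `None`; that unidentified co-author is exactly the ambiguity this whole session is about.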
And one of the interesting things about that, watching it happen, was that because some of these systems are not top down, and ORCID allows individual researchers to register, the Portuguese government didn't sign up and create identifiers for everybody. It said: hey, we're going to base our evaluations on this stuff, so you had better register. The Netherlands has a national funder, and they already have a local researcher identifier system; it's top down, and they fed it right into ISNI. Voilà: all the Dutch researchers have ISNIs. The Swiss are taking a similar approach. And there are a number of funders in the US and UK: the Wellcome Trust integrated ORCID iDs as an early adopter there; NIH has integrated ORCID into its interagency biosketch system, SciENcv; and the US DOE has done the same. But again, these are early days.

In universities, we're seeing different sorts of identifiers assigned in different ways. Some are doing it systematically, some are bulk registering, some are tying it to their HR systems, and some of it is opt-in. Since the time we wrote this draft, we got a list of even more activity around adoption and integration into the infrastructure that universities use to manage information: electronic thesis and dissertation systems, institutional repository systems, profile systems, et cetera.

So what is the set of recommendations? Well, the report has a long set of recommendations at a micro, detailed level for each set of stakeholders, so what we've tried to do is summarize them into a set targeted at universities. The first is: prepare to engage. The adoption of researcher identifiers has been especially rapid within scholarly publishing, funders see clear benefits and are beginning to engage, and forward-thinking universities should transition from watchful waiting to some sort of engagement and analysis.
That may simply be developing some outreach materials for researchers and stakeholders, explaining what these things are, where to get them, and how they might be used. That's something my library system was working on last week, so you can steal ours if you like.

Next, future-proof your systems. When you look at your library systems: authors are not a string anymore, and identifiers can be multi-valued and come from multiple different authorities. So instrument that into your systems. Also, and this is maybe a little subversive, especially for my publisher colleagues: publishers have a lot of information, and they're instrumenting more and more. They're capturing author identifiers and funding information; in some cases they have statements about who made what contributions; and some information, like acknowledgments and grant information, even where it's not coded, is at least separated out into distinct sections. What we generally get back is PDFs. That's not good enough. There is more information there; we should be getting it into our systems, and putting that sort of thing into our contracts, so that when we subscribe to digital publications we get more of the information that will let us do better evaluation and analysis. And we should certainly prepare for more measurement and reporting.

Which identifier to choose? Well, unless you're going to make a really innovative and unusual move and tell everybody to get a LinkedIn ID and use that, which we don't see any university doing, there are broadly two bets being made in this space: ORCID and ISNI. They're not incompatible bets. Which one you start with first may have to do with what your closest stakeholders are doing; if you're in the Netherlands versus Portugal, for example, the choice is maybe clearer.
The systems have some different capabilities, as we talked about, and the usage patterns are somewhat different. This doesn't mean you'll throw out name authority systems. They're going to be here, and increasingly things like VIAF will be integrating with these other systems. But while VIAF integrates with these systems, it will not be minting these identifiers or doing the quality control. So if you want to mint them, and have the ability to evaluate, do quality control, et cetera, engaging with these upstream systems is necessary. And you should be aware of community identifiers. These will continue to be important in local communities and in particular spaces. They're not being proposed as a solution for this general area, but there's an investment in these community identifiers that can be built upon and mapped to these other systems.

Finally, this is a risky environment. The environment is evolving. Funder mandates and policies are very incomplete. There's no clear dominant business model. Most researchers are not in any of these researcher identifier systems, as we saw. And there's no clear integration between the classic, traditional systems and the newly emerging researcher ID systems, so there can be a lot of impedance mismatches, though systems like VIAF on one side, and CRIS systems, DSpace, and ETD systems on the other, are starting to have this integration. But you're probably not going to go to your ILS anytime in the next year and find this integrated there.

And researchers are not going to drive this change alone. Some have signed up for researcher identifiers on their own, and all are sensitive to who controls their profile and how that information gets corrected. But that doesn't mean most of them will take the time to go and fill out a complete profile anywhere.
So incentive mechanisms are very useful: well-timed nudges, setting norms with junior scholars, and establishing information feedback loops so that people know what their profiles say, where that information comes from, and how it can be corrected at the moments when they're using it, like evaluation reports and submissions. And here's our reading list. The first thing you should read is this report; then you should scribble lots of red pen on it, and then you should type in the red-pen markings and email them to us. It's not long. It's short.