I'm Karen Smith-Yoshimura, Program Officer for OCLC Research, and today I'm talking about the work we've been doing that highlights some of the challenges presented by organizational identifiers. I'd like to first start with a short definition of what we mean by organizational identifiers, which are unique, persistent, and public URIs pointing to organized bodies. I give four examples here. They're all hot links: id.loc.gov for the LC/NACO Authority File; the International Standard Name Identifier, or ISNI; the Virtual International Authority File, or VIAF; and Wikidata, which also has identifiers. And in this case, all four of them point to none other than the University of Washington. So when we look at identifiers, one of the key uses of them is trying to track the scholarly output of an institution. And in fact, three of the most widely used international ranking systems for universities and academic institutions all rely on citations as a factor in how an institution is ranked. Therefore, it's really important that the researcher is affiliated and associated with the correct institution, because it's going to affect what the ranking is. I'll note here that when national authority files have identifiers for organizations, it's usually for organizations as either authors or subjects, and they don't really care about organizations that are there solely as the affiliation of a researcher. So there might be some gaps. This all started out from the work of another OCLC Research Library Partners task group, on researcher identifiers, which finished its work and published its report last year. And it turned out, in discussing the report with the partners, that what was key was not only the researcher identifiers but also the identifiers for the organizations with which they were affiliated.
And many of the functions and requirements that we had identified for researcher identifiers equally could not be met, could not be fulfilled, unless you also dealt with organizational identifiers. So for example, the institution wants to track all of the scholarly output from their institution. So you need identifiers not just for the researchers, but also for each of the schools or departments that you want to track within the institution. Then you want to track research groups, which may comprise staff from multiple different institutions. This can get really complicated when you have research groups that get funding externally, and yet each of the staff is represented by and affiliated with different institutions and different departments. Then, whoops, I have to click this way. National reporting is a little bit different. I keep on doing that. Not so much in this country, but especially in countries like the United Kingdom and Australia, they have reporting exercises where they have to report everything up to the national reporting agencies. And that's important because it determines how much funding they get from their national governments. So again, it's really important that they track the scholarly output of each of the institutions if they're going to get funding. And of course, the funders themselves want to track how much money they're giving to any institution or department and the success of the funding that they've given. One of the key ways of disambiguating researchers who share the same name is to also look at where they're affiliated. So you want to have the correct affiliation of those researchers in order to say whether this K. Wong is from Johns Hopkins or from the University of Washington, for example. And then you also want to be able to correctly identify the researchers' affiliations on publications.
So no matter how diligent you might be about publicly having URIs for your researchers, it's not going to mean much if the publishers themselves don't use the correct identifiers for that same faculty. And of course, it gets more complicated because, guess what? Faculty tend not to stay in one place. So the affiliation for that researcher might well have changed by the time something is actually published. So all of those are needs that also require these persistent, unique, public identifiers for organizations. The other aspect of this is that many organizations don't even realize they have identifiers already. And we'll go into that soon. So probably not for this audience, but I thought I'd repeat why we're dealing with identifiers to identify things rather than relying on strings, that is, text strings. As an example, when you read an English text, you can often see references to, say, the Bibliothèque nationale de France in it, because that's what it's called in its own language. It'll be referred to as BnF, or it'll be translated as the National Library of France, all in English text. But if you go to other-language texts, these are all referring to the Bibliothèque nationale de France. And if you know Chinese, then you can easily say, ah, yes, that's the Bibliothèque nationale de France. But no one's going to be able to immediately associate that unless they have the same identifier. Now, this is really the same kind of issue we have with personal names. People often have nicknames, use middle names, have initials. So it's similar to having Bibliothèque nationale de France or BnF. But organizations are more complex and have more challenges. So for example, organizations merge and they split. So one identifier might become two or three or multiple identifiers. They have a tendency to acquire other organizations or get acquired. They can have multiple departments, multiple schools. They have hierarchies. And those hierarchies can change over time.
And in fact, they can have multiple hierarchies, depending on where they are or depending on departments. Branches can exist in multiple locations or in multiple countries or different jurisdictions. And it can be even more complex because those countries and jurisdictions can also change over time, adding to that kind of complexity. Also, it can be unclear when a name change represents simply a name change or actually represents a different type of organization. And we have found that there are different practices. For example, in the LC/NACO Authority File, there are differences of opinion or approach about whether a name change means one record or two records, that is, at what point you have a different identifier for what is simply a name change. And most important, I think, we have to keep in mind that there are different perspectives. And to me, the poster child of this is in the ISNI database. There were people who were really surprised when they looked at their institution's entries in the ISNI database and couldn't understand why these departments were listed and not other departments. Well, if you looked at the sources, the vendors, the publishers who sell subscriptions don't care about any of your units that don't buy their stuff. If units or schools buy their stuff, then they have an identifier. Don't buy, you're dead to us. They don't have an identifier. And some publishers go even further. You stop the subscription, they delete your identifier from their system because you're dead to them. Well, that school still exists. But it's a different perspective. So when you're dealing with unique, persistent, public identifiers, you're dealing with people who have different needs and, therefore, different approaches to the circumstances under which you assign an identifier.
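As a rough illustration of the earlier point about identifiers versus text strings, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `org:0001` is a made-up placeholder rather than a real ISNI, VIAF, or Wikidata value, the `VARIANTS` table is hand-built, and the Chinese form of the name is added here as an example and is not taken from the talk's slide.

```python
import unicodedata
from typing import Optional

# Hypothetical placeholder identifier; not a real ISNI, VIAF, or Wikidata value.
BNF_ID = "org:0001"

# Name variants as they might appear in texts, all mapped to one identifier.
VARIANTS = {
    "Bibliothèque nationale de France": BNF_ID,
    "BnF": BNF_ID,
    "National Library of France": BNF_ID,
    "法国国家图书馆": BNF_ID,  # Chinese form, added for illustration
}


def normalize(name: str) -> str:
    """Fold case and Unicode form so trivial spelling differences still match."""
    return unicodedata.normalize("NFC", name).casefold().strip()


# Lookup table keyed by the normalized form of each variant.
INDEX = {normalize(name): org_id for name, org_id in VARIANTS.items()}


def resolve(name: str) -> Optional[str]:
    """Return the identifier for a known name variant, or None if unknown."""
    return INDEX.get(normalize(name))


def same_organization(a: str, b: str) -> bool:
    """Two strings refer to the same body only if both resolve to the same identifier."""
    id_a, id_b = resolve(a), resolve(b)
    return id_a is not None and id_a == id_b
```

With this table, `same_organization("BNF", "National Library of France")` is true even though the two strings share no words, while an unknown string simply fails to resolve. A hand-built table like this does not address merges, splits, or differing perspectives; those are exactly the harder cases described above.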
La Trobe University is currently undergoing a massive restructuring that illustrates the kind of challenge we face when dealing with identifiers for organizations. They currently have five different faculties. Most of those faculties also have dozens of different schools or centers associated with them. And they're collapsing them all into two different colleges. And so this is a case where the faculty are actually staying where they are, but their affiliations are changing around them. At the same time that we've been looking at organizational identifiers, there's been another group in the United Kingdom, the Jisc-CASRAI Organisational Identifiers Working Group, also looking at organizational identifiers that can be used throughout higher education in the United Kingdom. CASRAI is the Consortia Advancing Standards in Research Administration Information. They did their own landscape review of what kinds of institutional identifiers were out there. They did member stories of use cases. And they are just now publishing the summary of their reports and conclusions. One of those, which I thought was really interesting and happen to agree with, is trying to separate out the use of identifiers from the services that you build on top of them. The other part of it was that they came to the conclusion that the way forward was to use ISNI as a sort of bridging ID or hub that could be employed to link together different identifiers that might be used elsewhere. And finally, they encouraged institutions to integrate the organizational IDs in a way that they could, in fact, at some point be mapped and curated in the ISNI database.
So they came to the conclusion that you're never going to have just one identifier, you've got all of these, but at least use what has already been done for the International Standard Name Identifier, build services on top of that, and also encourage people, whatever identifiers they tend to be using, to go back and contribute them and link them up in the ISNI database. So I gave the link to the report there. So when we talk about the ISNI organizational identifiers, one reason why our task group has also focused on ISNI is, well, for one thing, they already have almost a half million organizational identifiers, public identifiers, assigned. So that's already pretty good progress. They're already being used by ORCID for organizations. ORCIDs are for individuals, but for any organizations identified in ORCID, they're using the ISNI as the organizational identifier. They already have links to multiple national authority files because they have links between VIAF and ISNI. And those are bidirectional links. There are links from VIAF to ISNI and from ISNI to VIAF. And they're already present in Wikidata. I'll go into that in a little bit more detail now. But first, when you think about ISNI, one other aspect that's very appealing for organizational identifiers is that it's not just in the library domain. Yes, they have identifiers from the libraries, archives, and museums domain, but they also have identifiers from trade sources and publishers, text rights, music rights, encyclopedias, and various research and professional organizations like article databases and theses databases. Those are also important to libraries. Here's the list of the various types of sources for each of those categories. So, for example, you have the trade represented, like Bowker and Ringgold. You also have rights agencies like the Authors Guild represented.
You also have MusicBrainz and the British Library Sound Archive, and then the Modern Language Association, and for theses, ProQuest is represented. So you have different sources, often with different perspectives, including rights agencies that are really, really stringent about assigning the correct identifiers because that affects who gets the royalty check. So you want to make sure that those are all correct. We have statistics, lots of statistics, that I just wanted to point out. We do have these links between ISNI and VIAF. These are specifically the number of links from non-VIAF sources. VIAF sources tend to be national libraries, or libraries like the Getty or regional libraries. When we look at just the ones that are not represented in VIAF, we still have 7.4 million links from ISNI to VIAF, and going the other way, VIAF source links back to ISNI, over 27 million links. So those are a lot of links between the two that can be used to make sure that you're dealing with the correct organization. VIAF does aggregate identifiers because each of those sources, of course, also has an identifier. So when we have the LC/NACO logo there, it's the LCCN, which is what you get at id.loc.gov. But it also aggregates the identifiers from each of those national libraries, including in this case ISNI and Wikipedia. Wikidata not only aggregates identifiers, but disseminates them. So when you look at the Wikidata entry for the Bibliothèque nationale de France, you see that it has an ISNI assigned. And then when you go to the English Wikipedia, that same number is in the English Wikipedia, but it's also in the Korean-language Wikipedia, or however you would say that in Korean. So no matter what language you're looking at, you can use the identifier so that you're basically free of the text string and referring specifically to the entity. So ISNI currently has over 8.8 million, almost 8.9 million, ISNIs that are publicly assigned. That represents about 40% of the total.
There were only one million just two years ago. And as I said, it has over 27 million links. The IDs are persistent. We now have a process so that the ORCID registration process also accesses the ISNI database to see whether something has already been assigned. And the Linked Content Coalition has named ISNI as its number one strategy, as a bridging identifier for all kinds of identifiers. So underneath this is the breakdown across those different groups, the research groups, text rights, music rights, libraries, trade, and organizations, of how many ISNIs have been assigned. And as I said, almost a half million are just for organizations. Now one thing that's different about ISNI compared with some other organizational IDs is that they have a quality team comprised of staff from not only the ISNI team in Leiden, but also the British Library and the Bibliothèque nationale de France. And it's not that every time there's a new organization, it automatically gets an ISNI. There are provisional ISNIs, but each has to go through a matching exercise so that there's a level of confidence that these different sources actually refer to the same entity. So for example, they require three or more VIAF sources to agree that this entity is the same thing. If it's not VIAF, then it's at least two. And then there are single sources from the publishers, because the publishers need to identify things nevertheless, which can be part of the link. So when we say public, it's 8.8 million, but as you see, there are the provisional, unassigned identifiers that have come in from other sources and have not yet gotten enough corroboration, which are now at more than 9.5 million. So roughly 40-some percent of all the identifiers that come in have actually been publicly assigned so that you can access them. So as I was saying, we have this new task group called the Representing Organizations in ISNI Task Group.
Even though most of our recommendations are targeted at ISNI, our goal is to also use the kind of data modeling from our use cases so that anyone can use the same kind of data modeling, or leverage the data modeling we're doing, for their own purposes. We do have representatives from the Jisc-CASRAI UK working group on our group. So there is some cross-fertilization. The former co-chair of the NISO Institutional Identifier working group, Grace Agnew, is also part of the group, as well as people who have been very active in the Program for Cooperative Cataloging. So we also have that perspective as well. And the ISNI team in Leiden. So in ISNI, there is a very long set of data elements you can use to assign attributes to organizations. Very few of them are actually used, but they're there if somebody wants to take advantage of them. This list, which would deal with some of those issues we were discussing before, includes: is a member of, is a unit of, is superseded by, is affiliated with, former name, later name. So if you imagine the La Trobe example, you could use some of those data elements to say, ah, this faculty is now superseded by this college. These came from the NISO Institutional Identifier working group. But some of our use cases didn't quite fit. So some of the recommendations are just additional attributes, like is hosted by and hosts, for examples like the HathiTrust. There are members of the HathiTrust, but it's hosted by a specific institution. Acquired and acquired by, especially for commercial institutions. There are multi-institutional statewide repositories. One might be hosted by some institution, but a number of them have quote unquote governing boards, so to the degree that that might be important, we have governs and is governed by as other possible attributes.
And then there are cases where it's more than just membership, but actual partnerships where people co-invest both resources and money. So partnering is another attribute. So many of the recommendations, as I said, are for now for ISNI. We'll see examples. I'm just curious, how many in the audience have actually looked inside the ISNI database? Could you raise your hands? Oh, most of you have, okay. Many times you'll see a long list of different alternate names because they're coming from multiple sources, which can be really confusing if you ask, well, which is the one that's actually used by the organization? So one recommendation is to at least have a flag that says, for this institution, this is the form of name that they want to use, regardless of what any of the other sources say. So that's one of the recommendations. As I said, we have all these relationship types, but right now they don't display. So to the degree that people are going to be using these other data elements, display them. The contribution forms right now are basically focused on individuals. We think that there are enough differences between individuals and organizations that organizations deserve their own contribution form. One of the next things, which we haven't really started yet, is an outreach document to explain to the institutions why it's important for them to take some responsibility for their own public identity, and why ISNI may impact how their organization is seen by other organizations or ranking systems. There's the question of how to engage those organizations to maintain their public identity, and since we have a quality assurance team set up, it would be even greater if some of the organizations actually stepped up to collaborate with them to make sure that the corrected names get disseminated throughout the public arena. So this is, and you probably can't see it too well, a partial list from a very, very long list of various departments at the University of Minnesota.
And it goes on for pages and pages and pages. So one question is, well, there are some departments listed there, as I said, and not others. Some of those names have changed over time. The University of Minnesota did not submit these. They came from different sources. So if it were the University of Minnesota taking responsibility, which ones of these would they want to not only continue to have, but commit to maintaining? Because obviously it's not just a list, but a commitment to maintain this over time. What level of granularity is really needed from an institutional perspective? Do you really need to have every single department of geology and geophysics and its various varieties? Do you really need them separately? Is it better at the school level? These are probably questions only the institution itself can answer. The fact of the matter is we're going to have these kinds of entries in ISNI and other databases from other sources. So it behooves the institution to also have a stand on how it wants to be represented in databases like ISNI. Here's another example, a specific record example. This is from UCLA. And they were kind of surprised. This is the Ralph J. Bunche Center for African American Studies. It's part of a rather long list of alternate names. It's actually under a "see from" list. This is actually the name that this center likes to be called. If you go to the website, that's what it calls itself at UCLA. And yet that form of the name doesn't appear in the top part. So this is a case where, if you're an institution and this is how you want to be called, this is why we said, identify or flag how that institution wants to call itself. And it might be different names in different languages in those countries which have more than one official language. So as I said earlier, I agree with the concept that you free up the identifier from the services you might want to build on top of it.
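The preferred-name recommendation could be modeled along the following lines. This is only a sketch under stated assumptions, not the ISNI data model: the class names, the `preferred_by_org` field, the source labels, the placeholder identifier, and the alternate name forms are all hypothetical and invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NameForm:
    """One form of an organization's name, as contributed by one source."""
    text: str
    source: str                     # hypothetical source label
    preferred_by_org: bool = False  # hypothetical flag set by the organization


@dataclass
class OrgRecord:
    """An organization record that collects name forms from several sources."""
    org_id: str                     # placeholder, not a real ISNI
    names: List[NameForm] = field(default_factory=list)

    def display_name(self) -> Optional[str]:
        """Prefer the form flagged by the organization; otherwise fall back
        to the first contributed form, or None if no names are recorded."""
        for name in self.names:
            if name.preferred_by_org:
                return name.text
        return self.names[0].text if self.names else None


# Illustrative record; the two alternate forms are invented for the example.
record = OrgRecord(
    org_id="org:0002",
    names=[
        NameForm("Center for African American Studies (Los Angeles)", "source-a"),
        NameForm("UCLA Bunche Center", "source-b"),
        NameForm(
            "Ralph J. Bunche Center for African American Studies",
            "institution",
            preferred_by_org=True,
        ),
    ],
)
```

Here `record.display_name()` returns the form flagged by the institution even though other sources contributed their forms first. Keeping the flag on the name form, rather than choosing a single "winning" name, leaves every source's variant available for matching.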
This particular diagram, and anyone who knows me knows I like to have at least one diagram nobody can read per presentation, is a use case that was developed by Jing Wang at Johns Hopkins University. And it was meant to illustrate the use case of needing an institutional identifier for the affiliation of researchers who are publishing in a publication. So you have different actors: an administrator who's responsible for aggregating the data within their institution, and the curator who's responsible for the affiliation data and their organization data. And you have the software, the publisher who has publishing software that the researcher uses. It presumes the software is using the get-ISNI API, which already exists. But then you want to be able to verify the affiliation. And that's not really the purpose of an identifier system. So this is the yellow box, verify affiliation. Now, who's going to create that software? Well, we don't really know, but we do think that there is a need for a verify affiliation service. And maybe that's part of the publisher's software, maybe it's part of a research information management system. Maybe it's just an organization, or part of a VIVO extension. I mean, there are various kinds of systems out there, but they do need somewhere a module for verifying affiliation, to make sure that when you're tracking your researchers, they're being affiliated with not only the right institution, but the right school or the right department, depending on what the grant criteria were. So I think it illustrates not just the use case of, yes, you want publishers involved in correctly assigning the affiliation for a particular researcher, but also a need for this verify affiliation service. So the issues I thought we could talk about together today are the kinds of things we've run across that, quite frankly, we don't have a stand on.
I think it's something that the organization, the institution, might have a stand on, which is what level of granularity is really needed. And I'm thinking in terms of ArchiveGrid bringing together all kinds of archival collections from different institutional archives. What level is it? Is it the school? Is it the institution? Who has the responsibility to indicate that preferred name for an institution? It's fine to say it's the institution's responsibility. And then there's this whole to-do: well, is it the library? Is it the dean's office? Is it the provost? Is it marketing? I mean, oh my God, we're going to have to have a committee. I don't care, you decide. But apparently who decides is not a simple answer for institutions. So that's an issue. How to encourage the use of organizational ISNIs by the publishers. I think we have good cases to make within the academic environment. We have standing in the academic environment. After all, OCLC is a library cooperative. So we have a feeling that we have a fairly good chance of success in encouraging academic institutions to adopt organizational identifiers. We don't have the same kind of standing with publishers. Actually, you guys probably have better standing with publishers because you buy their stuff. So you could, in fact, go back to them as part of your negotiations and say, oh, by the way, we really need to have institutional identifiers associated the way we want them as part of what we buy from you. Money is involved, so you have more clout. How to better reconcile name variants and related identities from these different perspectives. As I said, you have these rights agencies that have their needs, publishers with their needs. There are lots of name variants. How do you reconcile them and make it clear how and under what circumstances you use each name? I think this is, again, part of a service rather than the identifier itself.
And how to engage services to build on top of these organizational ISNIs or crosswalk to them. So those are the issues I've highlighted. That's where we have the questions and discussion. I'll leave the issues up. And now I open it up to you, for questions, comments, or answers to the outstanding questions. Here with your copybook. Thank you, everyone. Thank you.