I'm here to talk about the Australian Research Data Commons Party Infrastructure, one of our core infrastructure projects at ANDS. The purpose of this particular infrastructure project, and I'll briefly mention some others at the end of this presentation, is to improve the discovery of data sets in Research Data Australia. The way it attempts to do that is by linking research data sets via the researchers that are involved with them, and also by expanding the amount of information that might be available in Research Data Australia about that researcher or that research organisation, which helps people understand the usefulness or the authority of the data.
The first question probably is, well, why do we need an infrastructure? The people who are providing the collection, activity and service records to ANDS will also provide party records, which say who this person is. The problem there is that there's a fairly common scenario where Dr John Smith might be a researcher at Monash University, and he's responsible for a number of data sets that are published by Monash University. He's also working on a collaborative research project, so he's also responsible for data sets that are published by Queensland University of Technology. He also works as an advisor to a program at CSIRO, and there are data sets published by CSIRO with which he's also involved. So you can see there that we won't be able to link those data sets together unless there is some sort of common identifier, and this is where the party infrastructure comes in.
It attempts to enable this linking by persistently identifying researchers and research organisations using a public identifier, and we heard Nick talking earlier about what an identifier is. It means that somebody is taking responsibility for managing that identifier.
And having a public identifier for researchers and research organisations just improves the overall management of researchers in the research sector, and has a lot of spin-offs. To a certain extent, the major beneficiaries of having this infrastructure are not necessarily Research Data Australia. The beneficiaries will be the academic publishers and the research grant authorities like the ARC and the NHMRC. There are a lot of people who are trying to move towards this nirvana of being able to say who an individual is, no matter in what context they are producing outputs, and no matter who they're affiliated with.
So we needed to choose a public identifier scheme, and since ANDS is not an ongoing organisation, we're not going to be here in the future, we needed to use a scheme that was going to be continually supported and managed by an ongoing organisation. There was already the NLA People Australia service providing a public persistent identifier for Australians and Australian organisations. They already had 880,000 records, based originally on the Australian Name Authority File. So basically the people in there are people who publish or people who are published about. But they also then included contributors such as the Encyclopedia of Australian Science, the Australian Parliamentary Library for politicians, the Australian Women's Register and the Australian Dictionary of Biography Online, and they now have Queensland University of Technology researchers in there.
So we have contributors in that space that will give us more information. Dr John Smith at the Queensland University of Technology, who also works at Monash University, may also have an entry in the Encyclopedia of Australian Science because he is an Australian scientist, and that gives quite a lot of biographical information about him that's not going to come in through the party record that you're supplying in RIF-CS feeds.
So they have already established an aggregation service. They have got software and hardware and processes around the managing of aggregation from multiple contributors around Australia. Because of Libraries Australia, most Australian universities are already contributing records to the National Library of Australia about publications, and most institutional repositories are already submitting records to the National Library of Australia for Australian Research Online. So it was good to have all of these things connected in the one space.
They're also committed to integrating with international identifier services such as the Virtual International Authority File (VIAF); the International Standard Name Identifier (ISNI), which I think is another initiative; and ORCID, which I think is an initiative of the academic publishers to try and establish an international identifier. Because they're integrating with those, it means the Australian public identifier we're using for researchers will map to these other identifiers and will help connect everything together, the research data sets as well as publications from the same person or organisation.
How ANDS established this project was to separate it into two stages, because this was a difficult one. It was difficult in the sense that the infrastructure was already in place, and it was uncertain how the research sector, in particular the universities, who are the predominant research organisations, and people like CSIRO and Geoscience Australia, for example, would actually feel about this idea of assigning a public persistent identifier. And I stress public: the only information it will link to is public information. It's not an Australia Card type identifier.
So it was important that stage one of the project involved a lot of consultation, gathering requirements and establishing, with representatives from that sector, an infrastructure and processes that they actually felt would work and that would be supported by at least the majority of institutions. We're not anticipating that we'll get complete coverage here. There will always be, for whatever reason, some organisations that may opt out. But unless the majority was going to eventually participate in this, then it wasn't worth doing. And this is why we had to have stage one and stage two.
Now, the advisory group met for the final time, I think, last week or the week before. They have endorsed the architecture that's been described in the first stage and recommended to ANDS that we proceed with this project. So we are now moving into stage two, which is the implementation phase. There's a project wiki which you may go to which has more information. The advisory group is now terminated, but a lot of the people in the advisory group are now in our early implementers group. They will help ANDS and the NLA define the requirements for the sorts of tools and the exchange methods for information, and some of the matching rules may need to be varied in order to accommodate the needs of the universities.
So the implementation stage, the second stage of the contracted project with the NLA, is adapting their existing infrastructure, which is designed for librarians at the NLA to match identities amongst the contributors that we spoke of. It's not really designed for the efficient matching of large numbers of researchers. So that infrastructure needs to be adapted so that institutions can use the tools, the identity matching service needs to be modified, and the tool that's used for identity matching needs to be enhanced.
And so in order for this to be implemented and the tool improved, we need to work with a number of early implementers. This is stage two, which will probably finish in the first part of next year. They will produce technical documentation and guidelines, and after that the project as funded has ended. But of course, there is ongoing support needed for this infrastructure. For ANDS, that will be providing training and guidelines to contributors about how to use it, liaison, and some tools, for example making sure that the metadata store projects integrate with the party infrastructure project. For the NLA, they will continue because they have to for all sorts of other reasons. It's their mandate to maintain the infrastructure. They will register contributors and the harvest arrangements with the universities and the government agencies, and continue to support and improve the data administration tool. So that's just an ongoing commitment.
So just a little bit of a pictorial view. On the left, we have an ARDC contributor. It could be an institution, it could be part of an institution, somebody like yourselves who's contributing collection, service and activity records to the ARDC. Now, at the moment, you also may need to include party records. But the purpose of the infrastructure is that the party records can be submitted independently to the party infrastructure. So information about your researchers, about your research groups, your departments can be submitted directly to the party infrastructure. In return, a persistent ID, otherwise known as an NLA party identifier, is assigned. And that means that we don't need your party records anymore. What we need in your collection, service and activity records is a related object which says this NLA party identifier is the manager of, or is the chief investigator of, or whatever the relationship is to, your collection or your service or your activity.
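As a rough illustration of that related object, here is a minimal Python sketch that builds one with the standard library. The namespace URI, the `isManagedBy` relation type and the placeholder party identifier are assumptions made for the example, not details given in the talk; check them against the RIF-CS schema and vocabulary your registry actually uses. The helper name `related_party` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed RIF-CS namespace; verify against the schema version you publish.
RIFCS_NS = "http://ands.org.au/standards/rif-cs/registryObjects"


def related_party(party_id: str, relation_type: str) -> ET.Element:
    """Build a relatedObject element pointing at a party by its identifier."""
    related = ET.Element(f"{{{RIFCS_NS}}}relatedObject")
    key = ET.SubElement(related, f"{{{RIFCS_NS}}}key")
    key.text = party_id
    # The relation type says how the party relates to the collection,
    # service or activity that contains this element.
    ET.SubElement(related, f"{{{RIFCS_NS}}}relation", {"type": relation_type})
    return related


if __name__ == "__main__":
    ET.register_namespace("", RIFCS_NS)
    # Placeholder identifier, not a real NLA party record.
    element = related_party("http://nla.gov.au/nla.party-0000000", "isManagedBy")
    print(ET.tostring(element, encoding="unicode"))
```

The point of the design is that the collection record carries only the identifier and the relationship; the descriptive information about the person stays in the party infrastructure.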
Once you have that relationship, you know that there is already a record in the ANDS registry for that identity. So there is no need then to supply party records with your collection records. But of course, someone does need to supply party records to the party infrastructure. It may not be the individual project or data capture project. It may be that your library or your research office is already providing information to the NLA about all of the researchers that are involved with your projects, and so you really just need to be able to link to those records. But I'll talk about those implications later. And then from the party infrastructure, the ANDS collection registry collects party records, which are linked to the collection records that we also receive from the contributors. So it's a little bit of a split from the sort of model that you've seen so far.
Just a little bit about People Australia, which is now in Trove; if you're familiar with Trove, it's their people and organisations view. They of course don't use RIF-CS as the underlying schema for records about people and organisations, because RIF-CS was designed for a data registry, a collections and services registry, and it only captures enough information about people to provide a little bit of context. Whereas what they're looking at is getting a lot richer information about people from the sorts of data sources that they're talking to: the Australian Dictionary of Biography, the Encyclopedia of Australian Science, et cetera. So they use a well-known standard in the library and archives sector for describing corporate bodies, persons and families called EAC-CPF, which is the new version that they're just now moving to from the older version, which is just called EAC. As well as supporting names, it supports rich descriptive information, it supports related parties, it supports related resources, and it supports the description of functions and activities.
And that's a link to that standard, if you're interested. So again, another diagram of the model: there are the data providers providing party information. The NLA would obviously hope that most people have an OAI repository. We know that's not true just yet, but most of you are building towards one, we hope, if you don't have one already. But they will also accept party information that's in a static XML file accessible over the internet.
They will negotiate with you when you set up a relationship. You, or whichever body within the institution is providing the party information, will contact them and say, I've got party records to give to you to get identifiers for, and they'll say, well, do you want to send us your records in EAC format? Have you already got them in RIF-CS format? We'll take that if that's what you've got. But they can also work with you on whatever you use internally. You might use another directory standard for describing people and organisations within your research management software or your staff directory system, and they can also take that format and map it to EAC. So that's just the negotiation phase.
They take that information in via their harvester, and it goes into the People Australia database if it's automatically matched. Now, at the moment, the rules for automatic matching are very, very tight, and most of your records would not get automatically matched because an identical first name and surname is not sufficient to say that that's the same person. They use things like birth and death dates and other ways to do automatic matching, but most would not match unless they have a common identifier.
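To make the shape of those tight rules concrete, here is a small Python sketch of a match decision. It is only an illustration under stated assumptions: the `PartyRecord` fields, the `match_decision` function and the exact rule (a shared identifier, or the same name plus agreeing birth and death years) are hypothetical and are not the NLA's actual matching rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartyRecord:
    """Hypothetical, simplified party record for illustration only."""
    contributor: str
    local_id: str
    surname: str
    given_name: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None
    shared_ids: frozenset = frozenset()  # identifiers known to both sides


def same_name(a: PartyRecord, b: PartyRecord) -> bool:
    return (a.surname.lower(), a.given_name.lower()) == (
        b.surname.lower(),
        b.given_name.lower(),
    )


def match_decision(incoming: PartyRecord, existing: PartyRecord) -> str:
    """Return 'auto', 'manual' or 'none' for a candidate pair."""
    # A common identifier is the strongest evidence, so it matches outright.
    if incoming.shared_ids & existing.shared_ids:
        return "auto"
    if same_name(incoming, existing):
        # A name alone is not enough; require agreeing dates as well.
        dates_known = (
            incoming.birth_year is not None and existing.birth_year is not None
        )
        if (
            dates_known
            and incoming.birth_year == existing.birth_year
            and incoming.death_year == existing.death_year
        ):
            return "auto"
        return "manual"  # a person has to confirm or reject the match
    return "none"
```

With rules like these, two records for "John Smith" that carry no dates and no shared identifier come back as `manual`, which is the case that ends up in front of a person.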
So without some sort of common identifier to go with the name, it will probably end up not being matched and will go into the party data administration tool, where someone will have to manually say, yes, that's the same person, or no, that John Smith is not the same person; there's no one of that name already in People Australia, so it's a new person and it's a new identity.
When it's in the People Australia database, it's not only available via Trove, where you can search and browse people and organisations, it's also available via machine query services using the SRU standard API, and it's also available for harvest via their OAI repository. So you can harvest those records back again, and you can harvest them with the NLA identifier, so your own systems can automatically populate the NLA party identifier against that person's entry. So from there, you can harvest it back again or you can send a request.
The change now, because we're moving into using this service in the research data context, is that the party administration tool will be available to the data providers as well as to the National Library, so that they can do their own matching for their own records. And this is an example of what the party administration tool currently looks like. It'll be released in mid-October or at the end of October to a small number of early implementers who are going to provide records about their researchers to the National Library, for them to use and basically feed back improvements. As the tool is improved, we'll release it more widely next year.
You can see that the entry there is the entry for Thomas Huxley. And you can see on the side that there were actually three contributors, so there were three sources of information about Thomas Huxley.
One was the National Library of Australia, in other words the Australian Name Authority File, and the others were the Encyclopedia of Australian Science and the Australian Dictionary of Biography Online, who all provided records about Thomas Huxley. You can see there are some biographies there, related people and organisations, and resources, things that are either about Thomas Huxley or by Thomas Huxley.
Now if we were to look at a researcher, for example someone from QUT, you can see here that the contributors are the name authority file and the Queensland University of Technology. When the Queensland University of Technology supplied their records, it was out of their institutional repository. So the National Library of Australia was also able to harvest from them the publications in their repository that were connected to Greg Hearn. So you now see the selected resources from their repository, as well as other publications by or about Greg Hearn that were already in People Australia.
And then back in the People Australia database, this is what happens. They don't change anyone's records; whatever the person or the organisation contributes stays in there. So there's one identity for Douglas Mawson, one identity authority. That's the party ID. But within the database, there will be three quite separate records, one from each contributor, each with the local ID that was submitted as the identifier by the local institution when they submitted that data. That can always be matched against; they will keep that match point. So the Encyclopedia of Australian Science could always send an SRU request to Trove with their local ID, and it will return the NLA party identifier. Also, if they then send an updated feed in which they have expanded the information about Douglas Mawson, it knows that it's the same record because it's got the same local ID, and so it will replace that record rather than create a new one.
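The record model just described can be sketched in a few lines of Python: one party ID, several contributor records kept separately, and the contributor's local ID used as the match point for lookups and updates. This is a toy in-memory model under stated assumptions; the `PartyStore` class, its method names and the sample identifiers are all hypothetical and say nothing about how People Australia is actually implemented.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (contributor, local_id)


class PartyStore:
    """Toy model: contributor records grouped under one party ID."""

    def __init__(self) -> None:
        self._records: Dict[Key, dict] = {}
        self._party_of: Dict[Key, str] = {}

    def submit(
        self,
        contributor: str,
        local_id: str,
        record: dict,
        party_id: Optional[str] = None,
    ) -> bool:
        """Store a record. Returns True if an existing record was replaced."""
        key = (contributor, local_id)
        if key in self._records:
            # Same contributor and local ID: replace in place, keep the party ID.
            self._records[key] = record
            return True
        if party_id is None:
            raise ValueError("a new record needs a party ID from matching")
        self._records[key] = record
        self._party_of[key] = party_id
        return False

    def party_id_for(self, contributor: str, local_id: str) -> Optional[str]:
        """Look up the party ID from a contributor's own local ID."""
        return self._party_of.get((contributor, local_id))

    def records_for(self, party_id: str) -> List[dict]:
        """All contributor records that sit under one identity."""
        return [self._records[k] for k, p in self._party_of.items() if p == party_id]
```

A second feed from the same contributor with the same local ID goes down the replace branch, so the identity keeps one record per contributor rather than accumulating duplicates.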
So an updated record with the same local ID is an automatic match. What this means for data providers to the ARDC is that they're encouraged to provide party information to the NLA party infrastructure service and to obtain NLA party identifiers. In this way, we will build a common set of parties for the Australian research sector rather than a slab of identities from this institution, this institution and this institution, many of whom will overlap. Over time, affiliation information can also be captured in EAC, so that people will be able to see that this person was at Queensland University of Technology until, say, 2008 and then they moved to Monash University. This sort of information is a benefit in the research and innovation context.
What we also then want to see is that the collection, activity and service records that are provided to ANDS contain links to NLA party records. So inside those collection records or dataset records, for example, there would just be this related object element. And you can be confident that when you see your dataset record in RDA, there will be a lot of information available there about the person who is managing that dataset without you having to provide that information in your dataset records.
Data capture projects are a little bit different from the Seeding the Commons projects, because you're often working in quite a specific subset or area of the institution. So you really need to work with other groups within your institution who have also got contracts to provide data records to ANDS, so that you actually come to a common agreement on what key structure you'll use for party records and who is going to provide those party records. Are you each going to provide party records, or is one area like the research management office going to do it on behalf of all of the areas of the institution? It doesn't really matter.
It's quite valid for the party infrastructure to get a set of party records from a particular research centre which has very rich information in their system about those researchers, much richer than perhaps their staff personnel system or their research office system has. So if you provide records about that person and the research office provides records about that person, all that information gets connected into the one identity. Even though you're two different contributors, you can be referring to the same person, and you get a sort of amalgam of information about the person: what you provide and what somebody else has provided.
But to make that happen easily, so you don't have to do a lot of manual matching, you want to agree on what key you will use, some unique key within your institution, so that the record that you provide about the researcher in your research program can be matched with the record that might be provided by another area of the university about that person, because you've chosen to use the same key structure for identifying them. If you don't, that's okay too, but somewhere along the line someone may have to manually match, because there's not enough information to do an automatic match; the name being the same is not enough on its own.
As part of this project, though, the NLA and ANDS will be looking at other information we can use to do probabilistic matching, such as publications and field of research codes. There are other things that might be able to be used for automatic matching so that people don't have to do manual matching, and we'll look into that as we go along.
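The talk leaves probabilistic matching as future work and does not describe a method. Purely as an illustration of what scoring on that kind of evidence could look like, here is a Python sketch that combines the overlap of publication identifiers and field of research codes using Jaccard similarity. The choice of Jaccard, the weights and the function names are all assumptions made for the example, not anything ANDS or the NLA has specified.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of items in common: |A and B| / |A or B|, or 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_score(
    pubs_a: Set[str],
    pubs_b: Set[str],
    for_codes_a: Set[str],
    for_codes_b: Set[str],
    w_pubs: float = 0.7,   # assumed weight: shared publications are strong evidence
    w_for: float = 0.3,    # assumed weight: shared research fields are weaker evidence
) -> float:
    """Weighted similarity in the range 0 to 1 for two same-named candidates."""
    return w_pubs * jaccard(pubs_a, pubs_b) + w_for * jaccard(for_codes_a, for_codes_b)
```

In practice a score like this would only rank candidates that already share a name; anything below an agreed threshold would still go to a person for manual matching.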