So, before release 15 of Research Data Australia, which went live last April, the default search would return research activities, services and parties as well as research data collections. There were already more than 40,000 ARC and NHMRC grants in the RDA system, so you would often find grants appearing in your search results. But now that the default search in RDA returns only data collections, we needed a separate discovery service for exploring research grants and projects. This is a pilot service, because we didn't have enough resources at the time to do a thorough user requirements analysis and design. So we have a working service, and we hope you'll give us a lot of feedback, which we'll gather for an improved design next year. The ANDS Registry aggregates research grant descriptions provided by funders, and it also aggregates project descriptions provided by research institutions and agencies. Currently we have 45,000 grant descriptions from the two major funders alone, the ARC and the NHMRC, and we also have 2,000 research project descriptions. The research project descriptions have been supplied as activity records in RIF-CS format by our data contributors. They may have been entered manually, or harvested from contributor feeds in the same way we harvest other objects such as collections, parties and services. There's a good reason to have both a grant description and a project description for the same study: the information provided by the institution can be more current and more detailed than the grant description, which only contains the information supplied during the submission process for the award. Also, there may be many research projects that are either internally funded or funded by bodies that don't supply grant information to ANDS. That's why we collect grant descriptions from funders and project descriptions from institutions.
So now I'm just going to give you a quick look at what the service looks like. From the Research Data Australia homepage, there's a Grants and Projects explore service, and there your search is restricted to grants and projects only. You can also browse by the same subject groupings that you see on the home page. There's a little bit of information about what the service is, and at the bottom there's a link to all of the funders who supply grant information to us. If you follow those links, you'll get more information about what they have supplied, the terms of use for the data, how current it is, and so on. Here's a simple search for, for example, "protein interactions". Notice also that you can search within specific fields instead of all fields, the same as you would on the home page. You can also sort the search results: the default is relevance, and there's alphabetical, but you can also sort by the date the project commenced or completed, or by the funding amount for the grant. Here we can filter the results to projects rather than grants, so these are the ones supplied by institutions. You can see, for example, that the first one has been supplied by the University of Adelaide, and it's linked to a grant because they used the grant identifier in their project description. If I go into it, you can see the information that they supply. You can see the related data, that is, the data that was output from that study. You can also switch to see the actual grant record, which is what was supplied by the Australian Research Council. And because they are linked, if we go to the ARC grant record, we can also see that the ARC's grant funded the creation of this data set. Returning to the search, you can see that you can restrict to grants or projects.
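The sort and filter options shown in the demo map naturally onto search-engine query parameters. The sketch below expresses them as Solr-style parameters (`q`, `fq`, `sort`, `wt` are standard Solr parameters), but the field names (`type`, `funding_amount`, `date_from`, `date_to`, `title_sort`) are hypothetical placeholders, not RDA's actual index schema.

```python
from urllib.parse import urlencode

# Hypothetical field names: an illustration, not RDA's actual index schema.
SORT_CLAUSES = {
    "relevance": "score desc",
    "title": "title_sort asc",
    "commenced": "date_from desc",
    "completed": "date_to desc",
    "funding": "funding_amount desc",
}


def build_search_query(text, record_type=None, min_funding=None, sort="relevance"):
    """Encode a grants/projects search as Solr-style query parameters."""
    params = [("q", text), ("wt", "json")]  # wt=json asks Solr for a JSON response
    if record_type is not None:
        # "grant" = funder-supplied record, "project" = institution-supplied record
        params.append(("fq", f"type:{record_type}"))
    if min_funding is not None:
        # Solr range syntax: inclusive lower bound, unbounded upper bound
        params.append(("fq", f"funding_amount:[{min_funding} TO *]"))
    params.append(("sort", SORT_CLAUSES[sort]))
    return urlencode(params)


print(build_search_query("protein interactions", record_type="project", sort="funding"))
# q=protein+interactions&wt=json&fq=type%3Aproject&sort=funding_amount+desc
```

Keeping each filter as a separate `fq` parameter mirrors the way the facets in the interface can be switched on and off independently of the free-text query.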
You can also filter by status (whether the work is finished or still active), by Field of Research subject categories, by the institution managing the project or grant, and by funder. Within those funders there are various funding schemes; you may want to restrict, for example, to schemes like Discovery Projects and exclude things like equipment grants. You can also filter by funding amount range, commencement date range and completion date range. And that's enough; I won't go any further because I have little time. So, going back to my presentation, I'm going to anticipate the question: why do we bother? RDA is a service for discovering research data; that's what we're about. So why do we bother with these descriptions of research activity? Because project descriptions provide extra context for the published data sets. The metadata about a data set may not be very complete, and you may get more information when you read about the study. Also, a vast number of the research projects conducted over the past X number of years don't have published data sets, although many of them have produced data that could be useful and reused in other contexts. So this is perhaps one way of discovering research that may have produced data you could use, even though the data itself has not been published. The grant identifier can also pull together related data sets and publications: by publishing grants and assigning each one a persistent, unique identifier, that identifier can be used across the entire research sector to connect publications, data sets and other outputs. Also, research funders would like a way to see the research outputs that have been funded through their funding programs.
A side benefit is that, before this, there was no one-stop shop for Australian research. This could be useful nationally and also internationally, because it could be added to global research discovery portals, and people would become more aware of what research is going on in Australia. In this climate of increasing collaboration, publications and data sets that result from the same grant may be deposited in different systems. Bringing these together in a national or global service requires that they all use the same grant identifier. Funders like the ARC and the NHMRC have always had their own identification systems and grant IDs, and those are the IDs traditionally used, for example in the acknowledgement sections of journal articles, to identify the grant; but they don't resolve to any information about the grant. So the research sector needs a globally unique identifier that is persistent over time, resolves to a description of the grant, and supports linked data queries. We chose the PURL (Persistent URL) system for this. Currently the identifiers resolve to a view page in Research Data Australia, but as funders develop their own online systems, the research grant identifier could instead resolve to a view page in the funder's own system. The grant identifier is always formed with the funder acronym before the grant ID, so, for example, we can redirect ARC grant identifiers to view pages in the ARC's online systems rather than our own. We also have an API, and why is that important? There are many systems where access to research grant information is useful. For example, an institutional research portal may want to display all the research grants a researcher has been associated with during their career, or all the research grants their institution has participated in, even where it is not the administering organization.
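Because the identifier always puts the funder acronym before the funder's own grant ID, both forming it and redirecting it are simple string operations. This is a minimal sketch assuming the prefix `http://purl.org/au-research/grants/` (the form used in the CAUL grant-tagging guidance; check it against the current ANDS documentation). The view-page URLs and the redirect table are hypothetical placeholders.

```python
# Assumed prefix; verify against the CAUL/ANDS grant-tagging guidance before relying on it.
PURL_PREFIX = "http://purl.org/au-research/grants/"

# Hypothetical resolver targets: these view-page URLs are placeholders.
RDA_VIEW = "https://rda.example.org/grants/{funder}/{grant_id}"
FUNDER_VIEWS = {"arc": "https://arc.example.org/grants/{grant_id}"}


def grant_purl(funder, grant_id):
    """Standard prefix, then the funder acronym, then the funder's own grant ID."""
    return f"{PURL_PREFIX}{funder.strip().lower()}/{grant_id.strip()}"


def parse_grant_purl(purl):
    """Split a grant PURL back into (funder, grant_id)."""
    if not purl.startswith(PURL_PREFIX):
        raise ValueError(f"not a grant PURL: {purl!r}")
    funder, sep, grant_id = purl[len(PURL_PREFIX):].partition("/")
    if not (sep and funder and grant_id):
        raise ValueError(f"malformed grant PURL: {purl!r}")
    return funder, grant_id


def resolve(purl):
    """Redirect target: the funder's own page if it has one, otherwise RDA's."""
    funder, grant_id = parse_grant_purl(purl)
    template = FUNDER_VIEWS.get(funder, RDA_VIEW)
    return template.format(funder=funder, grant_id=grant_id)


purl = grant_purl("ARC", "LX0881890")
print(purl)           # http://purl.org/au-research/grants/arc/LX0881890
print(resolve(purl))  # https://arc.example.org/grants/LX0881890
```

Since the funder segment is always present, switching a funder from RDA's view page to its own system is a single entry in the redirect table; the identifiers already published in repositories do not change.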
An API allows these systems to query RDA and display this information within their own system, so users don't have to come to RDA. Systems that support the submission and description of research data and publications could also use the API to provide lookup and validation of the grant, rather than just a free-text box where mistakes can be made. Additionally, analysis and reporting systems that analyze research funding patterns can use this API as a source of information. There are two options for connecting data collections to research grants. If your institution supplies project descriptions to RDA with connections to research data outputs, and the project description contains the associated grant identifier, then that's all that's required and the connection will be made. The other, simpler option, if you don't supply project descriptions, is just to add the grant identifier to the metadata about the data collection, as one does for a publication. Of course, if the funder has not supplied grant information to RDA, there will be no PURL grant identifier. However, it would still be very beneficial to record the grant: select the funder from a drop-down list of Australian funders, add the grant ID as free text, and possibly a title and description. This information is useful, and later, when the funder does supply us with grant information, we'll be able to match the grant ID and connect it to a PURL grant identifier. So, in a system such as a research management system, which manages all of the research projects in an institution, data collections may be connected to a project; but if the project has the grant identifier, we can also make a link between the grant and the data collection, which is very important for funders who want to see the outputs from their research grants.
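The two connection options, and the later matching of free-text grant IDs, can be sketched as simple set operations. This is an illustration under stated assumptions, not how the ANDS Registry is implemented: the collection and project keys are hypothetical, and the grant prefix `http://purl.org/au-research/grants/` is assumed.

```python
from collections import defaultdict

PURL_PREFIX = "http://purl.org/au-research/grants/"  # assumed prefix


def grant_to_collections(collection_projects, project_grants, collection_grants):
    """Map each grant identifier to the data collections linked to it.

    Option 1: collection -> project -> grant (the project description carries the grant ID).
    Option 2: the grant ID is recorded directly in the collection's metadata.
    """
    links = defaultdict(set)
    for coll, projects in collection_projects.items():
        for proj in projects:
            for grant in project_grants.get(proj, ()):
                links[grant].add(coll)
    for coll, grants in collection_grants.items():
        for grant in grants:
            links[grant].add(coll)
    return {grant: sorted(colls) for grant, colls in links.items()}


def match_free_text(pending, known_purls):
    """Try to upgrade free-text (funder, grant ID) entries to grant identifiers.

    pending: list of (collection, funder_acronym, grant_id) from a drop-down plus text box.
    known_purls: set of grant identifiers the registry currently holds.
    Returns (matched, still_pending).
    """
    matched, still_pending = [], []
    for coll, funder, grant_id in pending:
        purl = f"{PURL_PREFIX}{funder.strip().lower()}/{grant_id.strip()}"
        if purl in known_purls:
            matched.append((coll, purl))
        else:
            # Keep it: the funder may supply grant data later.
            still_pending.append((coll, funder, grant_id))
    return matched, still_pending
```

Keeping unmatched entries rather than discarding them is what makes the drop-down plus free-text approach worthwhile: they can be retried each time a funder's grant list is loaded.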
As an example of what happens in institutional systems, the following screenshot is from the University of South Australia's data management planning system. You can see they have a section for adding the funding source, with a drop-down to select the funder, the funding scheme and the identifier. If they used our API in this form, the scheme could be a drop-down, and the identifier could be a lookup where you type either text or the ID, which is then validated and returns the actual PURL grant identifier. But even now, because they record the grant ID here (in this case an NHMRC grant ID), when they create RIF-CS for us to harvest into our system, they can turn it into a PURL grant identifier, because they know both the funder and the grant ID. Connecting a publication to a grant is already happening in most institutions. The Council of Australian University Librarians (CAUL) has developed guidelines, in conjunction with the NHMRC and ARC, for tagging open access versions of publications resulting from grants with the PURL grant identifier. As these research publications are harvested by Trove, ANDS is able to harvest that connection from Trove, so we can include publications as well as data collections when viewing a grant in RDA; there's a Trove guide that explains how to do this. Here's an example from the QUT institutional repository: when submitting or cataloging a publication, they can enter the funding body, which again comes from a drop-down list of funders, and then type the grant ID. They also plan to use our API to provide a lookup for the grant ID rather than free text, which is prone to error.
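A lookup field of the kind described (choose the funder, then type either the grant ID or some title text, and get back the validated identifier) could behave roughly as follows. This is a local sketch: the `GrantRecord` type, the sample records and their titles are hypothetical, and a real widget would query the grants API rather than an in-memory list. The grant prefix is assumed.

```python
from dataclasses import dataclass

PREFIX = "http://purl.org/au-research/grants/"  # assumed grant identifier prefix


@dataclass(frozen=True)
class GrantRecord:
    funder: str    # funder acronym chosen from the drop-down, e.g. "nhmrc"
    grant_id: str  # the funder's own grant number
    title: str
    purl: str      # the persistent grant identifier to store


def lookup_grant(records, funder, query, limit=5):
    """Suggest grants for a lookup field: exact ID match first, else title search.

    Returns up to `limit` records; an empty list means the entry could not be validated.
    """
    q = query.strip().lower()
    if not q:
        return []
    candidates = [r for r in records if r.funder == funder]
    exact = [r for r in candidates if r.grant_id.lower() == q]
    if exact:
        return exact[:limit]
    return [r for r in candidates if q in r.title.lower()][:limit]


# Hypothetical sample data (invented IDs and titles).
SAMPLE = [
    GrantRecord("nhmrc", "0000001", "Example study of protein interactions", PREFIX + "nhmrc/0000001"),
    GrantRecord("nhmrc", "0000002", "Example cohort study", PREFIX + "nhmrc/0000002"),
    GrantRecord("arc", "DP0000003", "Example protein modelling project", PREFIX + "arc/DP0000003"),
]

for record in lookup_grant(SAMPLE, "nhmrc", "protein"):
    print(record.grant_id, record.purl)
```

Restricting candidates to the selected funder first matters because grant IDs are not guaranteed to be unique across funders.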
Just looking at some example grant views in Research Data Australia: here's one from the NHMRC, with a researcher linked to a party record, and you can see the PURL. There are also examples where you'll see related data collections, and ones where you'll see related publications, but unfortunately I've lost the link and I'm running out of time, so I won't demonstrate them here. If you follow the links in the slides, you'll see those records. We're only at the beginning stages of building this service in terms of content. Currently the grant metadata provided by the ARC and NHMRC includes investigator names but no identifiers. If an identifier were provided, for example an ORCID or an NLA party identifier, the grant could be linked to information about the researcher provided by their institution, and to all of their grants and outputs, no matter who the funder was. Also, the grant data they provide doesn't include institutional partners in the grant, only the single administering institution, so we are continuing our engagement with them in the hope of getting more information in the future. We're also at the start of a process to expand the registry to include grants from other funders. For example, we're currently working with the Department of Environment for them to supply information about the research projects they fund, for example in the National Environmental Science Program. They will require the data resulting from those research grants to be published, and to be linked to their grants by the institutions that deposit the data; in that way they will be able to see in RDA the outputs from the work they fund. Most data collection descriptions do not contain the related grant identifiers at the moment, and most research project descriptions in our system do not contain the PURL grant identifier. This is where we want to make some progress over the coming years.
There's limited coverage in RDA of research grants from funders other than the ARC and NHMRC at present, and again we hope this changes. There's also limited coverage in RDA of past and present research projects undertaken at Australian research institutions: we have only a very small number of them at the moment, and usually they're ones where the data has been published. So our message from this presentation, and our continuing message, is this: research institutions can now provide us with descriptions of all of their research projects, and we would like them to supply as many as possible; more funders will be supplying grant information to RDA; institutional systems can have a lookup widget for quick selection and verification of grants and inclusion of the PURL grant identifier, rather than just free-text input; and we would like as many people as possible to give us feedback on how we can improve this service. There's a list of references that I've added to the slides, which will be made available. Thank you very much.

Thank you.

Well, hello everyone from Canberra, where it's just started snowing. I'm going to talk about two identifiers that you can include in your metadata. One is a unique identifier for a person or organization who worked on the research, and the other is the grant identifier, which nominates the funding agency and the specific grant number in a consistent way. I'll start with the party identifier. Many organizations have some type of record set that describes people associated with them: the research staff at a university, authors of collection items in a library, members of a parliament, actors in a theater company, and so on. A few years ago the National Library and ANDS worked together to set up the People and Organisations zone in Trove. Our goal was to bring together those records from different organizations and systems and to make them discoverable in a single place. We especially wanted to bring together records where the same
person had different records at different institutions, and we wanted a consistent way to identify that person across those systems. So how does it work? The first time we get a record describing a new person, like this one for Marcia Langton, we set up a record for them in Trove, which acts like a container, a big bucket. The next time we get a record describing Marcia Langton from a different source, we put it in the same Trove Marcia Langton container. We work with about 40 sources now, and each source usually has a snapshot of the person's life and work as it relates to their own organization. For this Marcia Langton example, we've got four sources that have a record describing her, and each has a different piece of information to bring together. Libraries Australia has an authority record that knows the different forms of her name and the titles she's published under. The Australian Women's Register has keywords describing her fields of activity as an academic and an activist, and they've also written a biography about her achievements as a feminist, a land rights advocate and an actor. On her ORCID profile, she has named the universities where she has studied and been employed, so those are the organizations she's related to. And I access, the Australian Women's Register and ORCID all list publications that she's authored. They're not the same ones, but all those pieces of information come together to form a more complete picture of her life and work than any single source does. That's a great benefit to a user who comes along and finds this record in Trove, and there are also some clever infrastructure benefits that come with this service. This Trove container record is the amalgam of records from four different organizations: I access, the Australian Women's Register, Libraries Australia and ORCID. In the background we keep all of their local system numbers, so you can ask us for I access record a 22 464 and Trove will return this big
container record for Marcia Langton. They all resolve to this overarching persistent identifier, which we call the NLA party identifier. It always has the form nla.gov.au/nla.party- followed by a unique number, so Marcia Langton's is this one, with 615464 on the end. All records in the People zone group identifiers for a person into a single container. So if a university has records for its researchers, it can add them to Trove; Trove will put each one into the bucket for that person, and the university can then ask for its own record back and get all the other related identifiers that Trove knows are associated with that person. The party identifier is persistent, so the URL will always resolve even if the system changes, and it can be used by anyone to refer to Marcia Langton. That's the background on how Trove aggregates records for a person, and this goes on to play an important role in how RDA can identify researchers. Here's a record for a researcher named Kim Anderson. It was established by the University of Adelaide; later Kim set up an ORCID for herself, and a Libraries Australia authority record was also created. Trove brings all three of those records together into one container, so Kim now has a persistent party identifier in Trove. Over at RDA, she's added a number of data sets like this one, and she's also associated with a research project. The research project description in RDA, the data set and the Trove party record all include that same NLA party identifier in their metadata. Because they all include it, systems like RDA can automatically bring together the different bits of data from different contributors and local systems and build a picture of Kim's current research, which is just what RDA does. Now, if Kim were to move to a new institution and get a new local identifier in its system, it doesn't matter: as long as she uses her NLA party identifier, services like RDA and Trove are
able to tell that it's the same Kim Anderson who previously worked at the University of Adelaide. When the NLA party identifier is used, we can collate all the research someone has done, no matter where they've done it or what system it's identified in. A similar situation exists with funding, and Monica already touched on this a little. Hopefully you know about the guide that CAUL released last year on how to tag repository records with ARC and NHMRC grant numbers, so that we all add them in the same format to repository records for publications and for research. The format is a persistent URL, which currently resolves to RDA. Hopefully everyone's seen this before, but if not, it's a standard prefix, then the funder, then the grant number. In my example here I've got an ARC-funded project, LX0881890. If a user goes to that URL, they're taken to the page in RDA where they get an overview of the project. In Trove, if you search for the same grant number in the same PURL format, you'll find 18 publications associated with that grant. Those publications aren't all held by the same institution; they're held by different repositories, but they've all put the grant number in the same format. The first one is from the UQ institutional repository and the second from the University of Wollongong, and because they both use the same format for the grant number, they both came back in my search. For a quick look, here's what the record from the University of Wollongong looks like in Trove: you can see the ARC grant number, in the right format, on the record. We're also now getting grant numbers in the same format added to records in the People zone, and researchers are actually doing this themselves. When a person records that they've worked on a particular project with a particular grant number, we use that information to broaden the picture of their research without them needing to do any more. So the first author on this
paper is someone called Sara Dolnicar. She has one of those container records in Trove, and on the record we got from ORCID she included that she'd worked on a number of ARC-funded research projects, including this one that I've circled. She included the grant number in the correct format: it's the same grant number we saw in RDA, and the same grant number that's been tagged in those 18 publications in Trove. Sara doesn't have to list the 18 publications in her record here that describes her as a person; Trove can simply use that standard-format identifier to link through to all the publications tagged with the grant number, so clicking it returns those 18 articles to the user. Trove didn't actually do anything special; it just relied on the same grant number being given to us in Sara's record from ORCID and in the repository records. The most important thing is that users don't have to know how all this metadata works. They don't even necessarily have to understand what a grant is: from her biography, they can simply click through and discover more information that they didn't know they wanted. So, wrapping up, what are the benefits of including these identifiers? Including a persistent identifier for a person allows systems like RDA and Trove to identify different pieces of research, such as data sets and publications, bring them together and relate them to their creator. It also allows the link to go the other way, linking creators to the research they've participated in. Including a research grant number lets us do similar things for funding: identify the project and bring together the people who worked on it and the data sets and publications that were its outputs. When researchers move institutions, it becomes easier to discover and import their previous work. And end users discovering this information can expand their search without having to understand how the system works. All of that relies on consistently using the same format of identifier.
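The aggregation described in this talk can be sketched as two small pieces: containers that map every contributor's local record number to one persistent party identifier, and grouping of output records by a shared grant identifier. This is a toy model, not Trove's implementation: deciding that two records describe the same person is left to the caller (the `same_as` argument), and the sequential numbering, source names and record IDs are hypothetical. Only the `nla.gov.au/nla.party-` form comes from the talk.

```python
from collections import defaultdict


class PartyContainers:
    """Toy model of Trove-style containers grouping one person's records.

    Matching records to people is out of scope: pass `same_as` when known.
    """

    def __init__(self, prefix="nla.gov.au/nla.party-"):
        self._prefix = prefix
        self._next_number = 1               # hypothetical numbering scheme
        self._party_of = {}                 # (source, local_id) -> party identifier
        self._members = defaultdict(set)    # party identifier -> {(source, local_id)}

    def add(self, source, local_id, same_as=None):
        """File a source record into a new container, or into an existing one."""
        if same_as is None:
            party = f"{self._prefix}{self._next_number}"
            self._next_number += 1
        elif same_as in self._members:
            party = same_as
        else:
            raise KeyError(f"unknown party identifier: {same_as}")
        self._party_of[(source, local_id)] = party
        self._members[party].add((source, local_id))
        return party

    def resolve(self, source, local_id):
        """Any contributor's local number resolves to the one persistent identifier."""
        return self._party_of[(source, local_id)]

    def related_ids(self, source, local_id):
        """The other local identifiers known for the same person."""
        party = self.resolve(source, local_id)
        return sorted(self._members[party] - {(source, local_id)})


def outputs_by_grant(records):
    """Group output records (e.g. publications) by the grant identifiers they carry."""
    grouped = defaultdict(list)
    for record in records:
        for grant in record.get("grants", ()):
            grouped[grant].append(record["id"])
    return dict(grouped)
```

Neither function interprets the identifiers; grouping works only because every contributor supplies them in the same format, which is the point the talk makes.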
Across institutions, when we all use the same format, these automated systems can do a great job of bringing together research outputs from a single project and across a researcher's career. I might leave it there and hand it back to you, Susanna.

So, yeah, I work at Griffith University in eResearch Services, and a while ago we used the ANDS Research Grant API to improve the data we present in the Griffith University Research Hub. The Research Hub is our public-facing researcher profile system, and we built it for two main purposes: one, to make Griffith research more discoverable and show what we're doing; and two, to give researchers a profile they can use and share for their own purposes, showing their individual work. To give a bit of background, the Research Hub is built using VIVO, a semantic web application that's becoming quite popular; a large number of universities worldwide have built their researcher profile systems on it. It came originally from Cornell University, so there's a huge uptake in the US in particular. As a semantic web application, it has a couple of very nice benefits for this sort of purpose. One is that it provides a very rich ontology to model information about researchers, research-related activities, and organizations such as institutes, schools and groups; in terms of activities, we can model publications, grants and other research outputs. Of course, it's also easy to add third-party or your own ontologies to model even more data. When we developed the Research Hub, one of our main aims was that people would not have to maintain their profiles themselves. In that spirit, we try to get as much data as possible from various enterprise systems, and from external systems where available, so at the moment researchers really only have to add their photo, if they want one, a short bio statement and maybe a research
statement. Everything else, including academic degrees, employment history, publications, grants, supervision and so on, gets drawn from enterprise systems, and we get information about institutes, groups and schools the same way. However, one problem we came across was that enterprise systems were built for a specific purpose, and that purpose usually wasn't displaying the data publicly. For a lot of the data that's not a huge issue; publication records are fairly standardized, so we didn't have any problems there. But grant information in particular was not very well covered in our systems. Sometimes that was just because we weren't the managing organization, so if titles, amounts and so on changed later, that wasn't necessarily reflected in our systems. The other reason is that we didn't necessarily need descriptions and the like for the reporting purposes the systems were built for. So for the Research Hub we identified two business cases where we could use external grant data to add real value. One was to improve data on existing grants: get better descriptions, and get full funding amounts, such as the total grant amount rather than just the share Griffith University received. The other was that, while we knew about grants that had some affiliation with Griffith, we didn't know anything about grants researchers had held while they were not at Griffith University. Adding that information became quite important because, although it doesn't showcase any Griffith research, it is an important part of our researchers' biographies and gives a much more complete picture, especially since we do have historic information about publications and so on. Not having the grants left a gap that many people were eager to close. Again, we didn't want people to enter this information manually, so getting as much of it done automatically as possible was the end goal, and this is where the
ANDS Research Grant API came in. As was said in the previous talks, it draws from the same data sources as the Research Data Australia portal, so it has very comprehensive information, especially about ARC and NHMRC grants. It also provides a nicely cleaned-up version of this grant information: information that may not be well captured in a standardized vocabulary in the source data has been cleaned up and is now provided in a very nice form. The API is based on Solr, a very simple to use and very well documented enterprise search engine, so using this data was actually quite easy for us. For the first business case, we didn't have to do very much. We could basically look up grants by their grant ID and funding body (grant IDs aren't necessarily unique across funding bodies), and doing this lookup was quite easy. We would get back the record as JSON, and all we really had to do was map those fields to our RDF vocabulary and do a few related lookups for people in our database to link it up properly. All in all it was a very easy process. We did this work quite a while ago, about a year and a half ago, maybe a bit longer, and initially a lot of the text fields still contained much of the actual information, such as funding amounts, so we did a fair bit of text processing to extract it. Nowadays ANDS has done a lot of work on improving this, and we're now getting a much cleaner version of the data, so whoever wants to get into this area now is in a really good position to get nice, clean data. The second business case was a lot more difficult. We just heard about researcher identifiers, but it's still very difficult to get that information for our researchers at the moment: ORCID is not very common yet, and we don't get ORCID identifiers from the API or from the funding bodies. So what we
had to do, to get historic grants for researchers who had nothing to do with Griffith, was come up with a way of matching researchers by name. For that we built a two-stage scoring function. The first stage simply looked at name similarity and gave us some idea of whether two names could be referring to the same person. We put a lot of empirical work into that, because sometimes people go by a preferred name and sometimes by their actual first name, and some people always include their middle name while others don't, so there's a lot of work to do there. Then we still had the problem that names are not unique, so we added a second score based on the fields of research people had published in. We have very good information about that in the Research Hub, so we could build a portfolio of FoR (Field of Research) codes that each person had previously published in, and we went by the assumption that if someone had a grant in the past with a certain FoR code, they would have at least one publication with that FoR code as well. Then we had to implement some additional handling for edge cases, where grants were managed by Griffith, so we had information about them, but the people attached to them were at different institutions, and link all of that up. That was all relatively easy once we had the matching up and running. I can't actually give any numbers on how well we're doing, but empirically it has worked quite well in practice: over the last year and a half, I think we've had about two or three false positives where people informed us that the data was incorrect. We built in functionality to manually add and remove grants while still ingesting the data automatically. So both of these cases were very successful, and that was largely thanks to how easy the ANDS API was to access and use. To wrap up, I've quickly put up some links to the systems involved. The first one is our Research Hub; the second one, for
those who are interested and may not know about it already, is the VIVO project, which is definitely worth a look for anyone interested in getting into the space of researcher profile systems; and the last one is the documentation for the ANDS API. Since it's based on Solr, there are a lot of additional resources on the web. And yeah, that's all from me.
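The two-stage matching described in the Griffith talk (name similarity first, then a check that the grant's FoR codes appear in the person's publication portfolio) can be sketched as below. This is not Griffith's actual scoring function: the weights, the 0.85 threshold, the small preferred-name table and the "given [middle] family" name-order assumption are all hypothetical choices made for illustration.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical preferred-name table; a real one would be much larger.
PREFERRED_NAMES = {"bob": "robert", "bill": "william", "kate": "katherine"}


def _tokens(name):
    """Lower-case, strip accents and punctuation, split into name tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch)).lower()
    cleaned = "".join(ch if ch.isalpha() else " " for ch in plain)
    return cleaned.split()


def name_score(a, b):
    """Stage 1: rough similarity of two names, in [0, 1].

    Assumes "given [middle...] family" order. Middle names are ignored,
    so "Jane A. Smith" and "Jane Smith" score as equal.
    """
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return 0.0
    family = SequenceMatcher(None, ta[-1], tb[-1]).ratio()
    if len(ta) < 2 or len(tb) < 2:
        given = 0.5  # no given name to compare
    else:
        ga = PREFERRED_NAMES.get(ta[0], ta[0])
        gb = PREFERRED_NAMES.get(tb[0], tb[0])
        if ga == gb:
            given = 1.0
        elif ga[0] == gb[0]:
            given = 0.7  # an initial, or some other compatible form
        else:
            given = 0.0
    return 0.7 * family + 0.3 * given


def is_likely_match(grant_name, grant_for_codes, person_name, person_for_codes, threshold=0.85):
    """Stage 2: a name match must also share at least one FoR code with the
    person's publication portfolio (the assumption described in the talk)."""
    if name_score(grant_name, person_name) < threshold:
        return False
    return any(code in person_for_codes for code in grant_for_codes)


print(is_likely_match("Jane Smith", ["0601"], "Jane A. Smith", {"0601", "0604"}))  # True
print(is_likely_match("Jane Smith", ["0601"], "Jane A. Smith", {"2103"}))          # False
```

Treating the FoR check as a gate rather than adding it to the name score reflects the stated assumption that a past grant in some field implies at least one publication in that field; a weighted combination would be an equally reasonable design.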