Hi everyone, thanks for joining the Griffith Gold Standard webinar. I'm the project manager for the Griffith Research Hub and for two ANDS projects, the ANDS Gold Standard project and the ANDS Metadata Store project, and I'm also a data librarian. Hi, my name is Arvin Solland and I'm a senior developer in eResearch Services, and I've been working on the Gold Standard project plus some other ANDS projects as well. Just some technical problems with the slides there. Before we start, some acknowledgements for this project. The Gold Standard project at Griffith had a steering committee comprising John Morris, the manager of eResearch Services here; Malcolm Wolski, the associate director of scholarly information and research at Griffith; Andrew White, the ANDS client liaison officer; Julie McCulloch, the ANDS metadata librarian; Stuart Humford, the ANDS senior research analyst; Chris McCunes, a web infrastructure developer in Enterprise Information Systems at Griffith; and Gerhard Weiss, a senior programmer at Griffith. Now a glossary of terms so you can follow some of the things we'll be talking about in this webinar. First, the Griffith Research Hub, which is a showcase of Griffith research with profile pages for researchers and their associated projects, groups, publications and data collections. The hub was seeded with funding from ANDS to develop a metadata store, and selected records in the hub are provided to Research Data Australia via OAI-PMH. We also have a data repository, which makes research data collections from Griffith accessible and searchable via a web interface; selected records in the repository are ingested by the hub and then provided to Research Data Australia. If you're not familiar with the term metadata store: the ANDS Metadata Stores program supports the development of institution-wide solutions for the discovery and reuse of research data collections.
We use VIVO software, developed by Cornell University, and that's the software that underpins the Griffith Research Hub. MOAI is an open access platform for aggregating content from different sources and publishing it through OAI-PMH, and it's used at Griffith to provide the automated feed of records to Research Data Australia. There's also RAID, which is our research administration database; it's maintained by the Griffith Office of Research and contains information about researchers, grants and publications. The ANDS Cite My Data service allows research organisations to assign DOIs to research datasets or collections, working through the global DOI consortium with DataCite. The NLA infrastructure is a completed ANDS-funded project at the National Library of Australia to develop a national party infrastructure providing persistent identifiers for people and groups, or parties. And Research Data Australia, which hopefully you're all familiar with, is the discovery service for Australian research data collections and their related parties, activities and services; the records are provided by contributing institutions using the RIF-CS schema. We adopted an open approach for this project, in accordance with ANDS's wish to share the project experience and lessons with the ANDS partner community. The code we have developed for minting DOIs and for providing records via MOAI will be made available on GitHub at the conclusion of the project, and we also maintained a project blog from around February onwards. Of the 11 posts there have been almost 1400 views in total, and while there are a few comments, some people have contacted us directly after reading the blog. We found the blog very useful in summarising our thoughts and in engaging with the ANDS partner community.
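Returning to the OAI-PMH feed mentioned above: a harvester such as Research Data Australia simply issues HTTP requests against the MOAI endpoint. As a minimal sketch, this builds the kind of ListRecords request URL involved; the endpoint address and set name here are invented for illustration, not Griffith's actual values.

```python
from urllib.parse import urlencode

def build_oai_request(base_url, verb, **params):
    """Build an OAI-PMH request URL of the kind a harvester like RDA issues."""
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"

# Hypothetical endpoint and set name, not Griffith's real values.
url = build_oai_request(
    "https://example.edu/oai",
    "ListRecords",
    metadataPrefix="rif",   # request records in the RIF-CS metadata format
    set="gold-standard",    # only the subset of records selected for RDA
)
print(url)
```

The harvester pages through the results using the resumption tokens the server returns, so the provider side (MOAI, in our case) handles all the state.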
So in this webinar today we're going to cover: a project overview; the strategy we utilized to develop the gold records; the manual creation of gold standard records using ANDS online services; the creation of records via an automated feed to RDA; the information architecture and the technical challenges we've faced in this project; a list of the gold standard records; the enriched record connections we've made in these records and the enriched record components; the lessons we've learned along the way; the return on investment; and then we'll do a quick summary, with time for questions at the end. This project was initiated to address a problem: there are now over 40,000 records that describe data collections in Research Data Australia, plus their related party, activity and service records, and these records vary considerably in metadata quality and in their interconnectivity. So the purpose of the project was to create high quality records that are richly described and connected, and to publish these gold standard records to RDA where they can be seen as exemplars of best practice. The premise is that high quality records that are richly described and interconnected will enhance discoverability and reuse of the resources they represent. This is part of the ANDS data connections strategy, which is to link data through shared entities and concepts and to exploit these linkages in the Research Data Australia discovery portal to create a rich mesh of interlinked information about research data collections. QUT was the only other institution to undertake the gold project, and unlike other ANDS projects we were encouraged to have minimal communication with QUT about this project, because an outcome of the projects was to find different pathways to similar goals. QUT have completed their project, but we are still to complete the final records review process.
Creating high quality information about research data collections requires the creation of quality metadata records. The words on this slide are taken from the US Geological Survey website. Under the topic heading of why scientists should embrace data documentation, they explain the benefits of data sharing and metadata creation. They summarise the value of metadata for scientists as follows. One, allowing others to understand your data. The metadata record will contain valuable information about your dataset, such as why it was created; how, when and where the data was gathered; whether there are any gaps in the data; what quality checks the data went through; other sources that were used to create the dataset; and how it should be cited. This record allows the data to be reused for purposes that may not have been foreseen when it was collected, and this allows the advancement of science to occur. Two, avoiding data duplication. Development of a dataset is a time consuming and costly endeavour. By merely making metadata available and discoverable, these records allow scientists to determine what data already exists and avoid duplication of effort. Three, sharing and accessing reliable information. Metadata records allow scientists to share reliable information with ease and find out how to access it. Four, evaluating data. A metadata record allows a scientist to quickly determine whether a dataset is appropriate for use in a project. Five, reducing workload. Creating a metadata record requires some work up front; however, when a data call comes in for data that a scientist created years before, the metadata record will provide the details about that data that may otherwise have been long forgotten. Six, making data transcend people and time. A metadata record allows the data to remain usable once the data developer has moved on to other projects. It ensures investment in the data by providing information that allows it to be used indefinitely. And lastly, institutional memory.
Metadata creates institutional memory for organisations, allowing an organisation to have accessible knowledge of all the work it has produced. The Gold Standard project was all about creating quality metadata and making connections; it follows that the better the quality of the metadata, the more useful the records are to data discovery and reuse. The project aims and objectives were: to investigate the metadata quality and design of records that Griffith contributes to RDA; to investigate methods for improving connectivity within the records using ANDS tools and services; to identify the characteristics of a gold standard record; to identify deficiencies in our records that could be addressed through the gold project; to identify ANDS tools and services that could be used to create gold standard records; to develop and implement a strategy for creating gold standard records; to share the process and findings with the ANDS partner community; and to document the project, including providing feedback to ANDS on their tools and services. In scope were: a project plan; a sample of richly enhanced RIF-CS collection records with related party and activity records, manually added to Research Data Australia; a document describing the process of enriching standard RDA metadata records to gold standard; five to twelve richly enhanced collection descriptions with related party and activity descriptions provided by automated methods to RDA; a presentation of the findings of the project at a public event or forum; and progress reports and a final report. As for the time frame, this was a nine month project that began in late February, and the current status is that we're still in progress. We have a formal extension to the 30th of November (yes, that's two days' time), but we expect the records review process will continue beyond this date, and we'll publish on the blog when the process has been completed and the records are in the RDA production service.
So this project was entirely funded by ANDS at $125,000, but the project built on existing infrastructure and staff knowledge which are not factored into this funding, as we'll cover in detail during the course of this webinar. As with any project there were a number of associated risks, and these included loss of staff, the capacity of Griffith systems to store gold standard metadata, the ability to integrate with ANDS tools and services, and the ability to automate record updates from Griffith to RDA for the gold records. Project stakeholders include Griffith researchers, who are research data collection owners and contributors of record content; Griffith eResearch Services, who provided the project staff and the overall project management and implementation; Griffith ICTS staff, who provided assistance with the feed of records from Griffith to RDA; ANDS, who funded the project; and ANDS partners, who are current and potential implementers of processes and solutions, some of whom are providers of ANDS tools and services, such as DataCite and the National Library of Australia. Right, so a little bit about the strategy we utilized in the Gold Standard project. Firstly, it was to interpret the meaning of gold standard records; then build on existing records and their related party, activity and service records, expanding the records to use as many RIF-CS elements as possible; then make use of ANDS tools and make connections in these records using the Cite My Data DOI minting service and the National Library of Australia party infrastructure; then initial records were to be provided via the ANDS online manual interface, and later via automated feeds; ANDS staff would then assess the record quality and provide feedback prior to approval for publication in RDA; and finally, assess and share our gold standard experience with ANDS and the ANDS partner community. So, interpreting the gold standard: our first task was to interpret what was meant by gold standard records.
We used a variety of documentation and resources to find out, and these included the ANDS Content Providers Guide, the CPG. This guide is a reference tool for metadata providers who need to use the RIF-CS schema; it describes the information collected for display in Research Data Australia and explains how to use the RIF-CS schema to share that information with ANDS. We also used the RIF-CS schema guidelines. These documents describe the use of the RIF-CS schema for the purposes of exposing collections metadata via an OAI-PMH data provider to the RDA collections registry. And we also used our friendly ANDS liaison officer, Andrew White, who provided us with some briefing notes on what was expected regarding the creation of gold records. Next we identified some ANDS tools and services that we could use to create connections in the gold records. The first was the National Library of Australia's party infrastructure. This is a service that allows institutions like Griffith to obtain a single unique persistent identifier for each of its researchers; all other identifiers, such as Scopus and Thomson IDs, can be grouped under the one NLA party ID. We will go into further detail about our experience with this infrastructure later in the presentation today. We also used the ANDS Cite My Data service. Through this service an institution can mint digital object identifiers for their collections. The service is offered free of charge, using the global organisation DataCite as the registration agency for the DOIs. Again, we will go into some further detail about our experience with this infrastructure in the presentation today. Also, the Australian Gazetteer service is about the establishment of a robust national infrastructure that allows place names to be validated efficiently by both individuals and software systems; however, it was unfortunately not yet available for use during our creation of the gold standard records.
So we created a spreadsheet to manage the record review process: to note the deficiencies in the collection records and the associated party, service and activity records, to identify improvements to make the records gold standard, and to reference the ANDS Content Providers Guide and the RIF-CS schema. The findings of this process were mostly gaps in the records rather than poor descriptive metadata, and we will talk about the specific gaps and how we filled them shortly, in the slides about record enrichment and connections. So, we manually created records in RDA. This was required to meet deliverable E2: we were required to enhance one collection record to gold standard, plus its associated party, activity and service records, submit this via ANDS online services, then go through the review process with ANDS staff, and after review it could be published to RDA. Our assessment of the value of the manual submission is that it's useful if you don't already have an automated feed of records to RDA, and it's useful if you can't store the type of information required by RDA in your own systems. It gets you familiar with RIF-CS, and you can see errors and shortcomings immediately. Some drawbacks are that it doesn't require you to store the rich metadata about a collection in your own systems, and if you do have an automated feed you risk having mismatched metadata between the institution and RDA. Now, the automated provision of records to RDA: this was required to meet deliverable D4. We were to enhance five to twelve collection records to gold standard level, plus their associated party, activity and service records, and then submit these via an automated feed to RDA, then go through the review process with ANDS staff. This is the stage we're currently at, and when the review is finished these will be published to RDA.
Our assessment of these steps is that our approach to the automated feed allowed us to enhance the metadata capacity for connections within our own systems, to maintain richer metadata than RIF-CS requires within our data repository (an example is item level records), and to keep the records we provide to RDA in sync with our records in our internal systems. The drawbacks are that it's definitely more time consuming to do this than the manual submission; it relies on the ability to enhance the capacity, both technical and metadata, of the source systems; it requires mapping from RDF to RIF-CS; it requires liaison with our ICTS staff, so that's an extra step in the process for us; and it's a little bit more cumbersome to review within the ANDS online services. To understand how we crafted the automated feed of our records to RDA, it's helpful to look at the information architecture. This diagram shows the flow of information in and out of the research hub. Basically, rich RDF data is stored in the hub, and that RDF data is then queried using MOAI to produce the RIF-CS feed, which is harvested into RDA. Whenever possible we make changes to the records in the source systems, for example the data repository or the research hub itself, and then update the mapping within MOAI to reflect those changes in the RIF-CS feed. Some technical challenges: the RDF to RIF-CS mapping, and there were also some additions to the RDF ontology to accommodate gold standard metadata levels. For us it was also about coordinating the record feed to RDA with ICTS: we were relying on assistance from ICTS whenever releasing a new MOAI version, and we had no direct access to the test and production systems which were used to upload our records via the automated feed into the ANDS testing environment for review.
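As a minimal illustration of the RDF to RIF-CS mapping step just described: in practice MOAI queries the VIVO triple store, but the idea is to pull selected predicates out of the graph and flatten them into the fields RIF-CS expects. The predicate names and values below are invented stand-ins, not our actual ontology.

```python
# Triples as (subject, predicate, object): a tiny stand-in for the hub's store.
triples = [
    ("coll:1", "rdf:type",   "vivo:Dataset"),
    ("coll:1", "rdfs:label", "Coastal Erosion Survey"),
    ("coll:1", "bibo:doi",   "10.4225/01/EXAMPLE"),
    ("coll:2", "rdf:type",   "foaf:Person"),
]

def objects(s, p):
    """All objects for a given subject and predicate."""
    return [o for (s2, p2, o) in triples if s2 == s and p2 == p]

def to_rifcs_collection(subject):
    """Map one RDF subject into a flat dict of RIF-CS collection fields;
    non-collection subjects (e.g. people) are skipped."""
    if "vivo:Dataset" not in objects(subject, "rdf:type"):
        return None
    return {
        "key": subject,
        "name": objects(subject, "rdfs:label")[0],
        "identifier_doi": objects(subject, "bibo:doi")[0],
    }

record = to_rifcs_collection("coll:1")
```

The real mapping is larger, of course, but it is essentially this shape: one query per RIF-CS element, with the serialisation to XML handled afterwards.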
We also had problems when we tried to load the Library of Congress Subject Headings as an ontology into VIVO, because we wanted to use this as an easy way to describe the research areas for the records. We tried to load the whole ontology into VIVO, but the size of the ontology XML file, which was about 900 megabytes, actually caused our entire VIVO dev instance to crash without a chance to restore, so the database was completely wiped. So any VIVO users beware: please handle large ontology files with care, because it really doesn't like them. Also, being early implementers of the DOI service, we encountered a few problems: the documentation was not up to date, and in the beginning the test environment was simply not working. We had to use the production environment for testing, and that was not a good solution, because when we wanted to conclude the testing and get a proper production prefix for DOIs we had to get ANDS service staff to manually flip the switch, and then we were basically left without a testing environment, as the production environment would only produce production DOIs. We were also early implementers of the citation element in RIF-CS; we'll address this a little bit later in the presentation. So, please note that we have not yet concluded the project and the records are still in the review process; we will put up a blog post when the review process has been completed and the records are available in the RDA production service. But on this slide you can see the list of collection records being updated to gold standard, and all the associated party, activity and service records. Okay, so we used the ANDS Cite My Data service to mint DOIs for our collection records. Arvin developed a PHP script to do this, which we will make available open source by the end of the project; in the meantime it's available on request. We saw the value of the DOI minting service as something beyond the gold project, and on this slide is a list of what we think the value of DOIs is.
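To show roughly what a mint call involves, here's a sketch that assembles a request URL and a DataCite-style XML payload without actually sending anything. The endpoint path, parameter names and XML fields here are illustrative assumptions only; the current Cite My Data documentation describes the real API, and our PHP script does the equivalent job.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape

def build_mint_request(service_base, app_id, landing_url, title, creator, year):
    """Assemble a DOI mint request: a service URL plus a DataCite-style
    XML payload. Endpoint path and parameter names are illustrative only."""
    query = urlencode({"app_id": app_id, "url": landing_url})
    url = f"{service_base}/mint.xml/?{query}"
    payload = (
        "<resource>"
        f"<creators><creator><creatorName>{escape(creator)}</creatorName></creator></creators>"
        f"<titles><title>{escape(title)}</title></titles>"
        f"<publicationYear>{year}</publicationYear>"
        "</resource>"
    )
    return url, payload

# Hypothetical service base, app id and landing page.
url, payload = build_mint_request(
    "https://services.example.org/doi",
    "my-app-id",
    "https://hub.example.edu/collection/1",
    "Example Data Collection",
    "Griffith University",
    2012,
)
```

The important design point, whatever the exact API, is that the landing URL points at a page you control, so the DOI keeps resolving even if your internal identifiers change.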
The benefits of using the ANDS Cite My Data DOI minting service are that it facilitates all of the values of DOIs listed on that slide; it's free and unlimited (within reason, of course); it mints through DataCite, the global DOI registration agency for research datasets; you get technical support from ANDS staff; and it's technically straightforward now. I say "now" because Arvin has described some of the problems that we had initially, but they have mostly been resolved as far as we are aware. So, Griffith University was an early implementer of the party infrastructure project at the National Library of Australia, and as such we contributed a feed of party records directly to the NLA to obtain NLA party identifiers prior to the completion of that project. This means that we had assistance from NLA project staff to match the 20 records we needed party identifiers for that had failed automatic matching. Additionally, our unmatched records remained accessible for us to hand match as required. For the gold project we simply logged into the Trove Identities Manager, also known as TIM, which is displayed on the slide. We selected the Griffith unmatched records and either matched them or created new party records in the NLA party infrastructure. It takes less than 10 seconds to physically match the records in TIM; however, it takes more time to determine whether a match should be made in the first place. The process for this is to take your unmatched record and search for possible matches in the infrastructure, that is, searching for matches among records that already have an NLA party identifier assigned to them. This can be quick, for example if there is clearly no existing record with the same surname and first name, or it can be more time consuming, for example if there is an existing record with the same surname and an initial that matches the first initial of your unmatched record.
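The matching decision just described (same surname, plus either an exact first name or only a matching first initial) can be sketched as a simple shortlisting function. The record format here is our own invention for illustration; the point is that only the exact hits are mechanical, while initial-only hits still need the human judgement we talk about next.

```python
def candidate_matches(unmatched, registered):
    """Shortlist existing NLA party records that could be the same person:
    same surname, plus either an exact first name or a matching first
    initial. Initial-only hits need a human decision before matching in TIM."""
    surname, first = unmatched
    hits = []
    for rec_surname, rec_first in registered:
        if rec_surname.lower() != surname.lower():
            continue
        if rec_first.lower() == first.lower():
            hits.append((rec_surname, rec_first, "exact"))
        elif rec_first[:1].lower() == first[:1].lower():
            hits.append((rec_surname, rec_first, "initial-only"))
    return hits

# Invented sample data for illustration.
registered = [("Smith", "Jane"), ("Smith", "J."), ("Jones", "Alan")]
hits = candidate_matches(("Smith", "Janet"), registered)
```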
Some expertise is required for this task to ensure that the right match is made, and librarianship skills are most useful in this regard. For us the process was quick: it took approximately one to two minutes per record, times approximately 10 records, so about 10 to 20 minutes. As mentioned, we already had NLA party identifiers for a number of our records. We didn't require training in using TIM, because this was done by me, and I worked on the party infrastructure project prior to coming to Griffith. We took these identifiers and added them to the relevant party records in the Griffith Research Hub; the hub supports multiple identifiers for each researcher, and the NLA identifier is one of these. The NLA identifiers were mapped to RIF-CS and provided in the automated feed of party records to RDA. So, we made a number of enhancements to our gold collection records. This included reviewing the collection title: we created a concise, specific title so that researchers can identify the topic in a nutshell and distinguish it from similar titles. We also reviewed the description so that it requires no prior specialist knowledge: plain language, minimal but detailed. We also mapped a format element, which refers to the file format of the data, like PDF or CSV or similar; this is stored in our data repository records and mapped into a repeated description element in RIF-CS. We reviewed the rights and access rights: these are the specific statements referring to the legal framework and access conditions regarding the data. We added temporal coverage, including a date range for the collection, and we'll cover this in more detail in the next slide, along with spatial coverage, which is the geographical coordinates and text applicable to where the data was collected.
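As a sketch of what the added coverage looks like in RIF-CS, here's a small builder for a coverage element with a temporal date range plus spatial text and DCMI Point coordinates. The element and type names follow the RIF-CS schema as we understand it; the place name and dates are made up for illustration.

```python
import xml.etree.ElementTree as ET

def coverage_element(date_from, date_to, place_text, lat, lon):
    """Build a RIF-CS coverage element with temporal and spatial parts."""
    coverage = ET.Element("coverage")
    temporal = ET.SubElement(coverage, "temporal")
    ET.SubElement(temporal, "date", type="dateFrom",
                  dateFormat="W3CDTF").text = date_from
    if date_to:  # open-ended collections get only a start date
        ET.SubElement(temporal, "date", type="dateTo",
                      dateFormat="W3CDTF").text = date_to
    ET.SubElement(coverage, "spatial", type="text").text = place_text
    # DCMI Point notation: east is the longitude, north the latitude
    ET.SubElement(coverage, "spatial",
                  type="dcmiPoint").text = f"east={lon}; north={lat}"
    return coverage

coverage_xml = ET.tostring(
    coverage_element("2005-01-01", "2010-12-31",
                     "Moreton Bay, Queensland", -27.25, 153.25),
    encoding="unicode")
print(coverage_xml)
```

Carrying both the human readable text and the machine readable point is what lets RDA show users a map location as well as a place name.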
We minted a digital object identifier, a DOI, for each collection record using the ANDS Cite My Data service, and we included citation information that includes the DOI so that the collection can be cited in publications. We included related information that relates to the collection, such as a related website, and we included subject types in addition to Fields of Research codes, for example Library of Congress Subject Headings; local subject types were included to provide additional context, sometimes more precise than the FoR codes, and to assist in the discoverability and interpretation of the research. So, we added temporal and spatial information, and in some cases this required consultation with the data custodian, which in many cases is the researcher; in other cases this information was made clear in the collection description. For some collections the temporal metadata included a start and end date; for others, new material was still being added to the collection, so only a start date was added. Adding spatial information required the use of Google Earth, so that we could supply both the human readable text and the actual coordinates. Google Earth includes a coordinate system that gives you the latitude and longitude of where your mouse is pointing. The coordinates are used in RDA to give users both the actual coordinates and a visual location. So, enriched collection record components. Once the DOIs were issued for each collection, we created a citation element in the manual interface; creating a citation element is easy once we had identified what information we wanted to include in it. However, the automated feed was more challenging, as RIF-CS supports the provision of the citation info element either as a single block or as separate parts.
Because we use VIVO software, which is based on an RDF triple store, it was more logical for us to map from the individual RDF elements to separate parts in RIF-CS. However, we found that some of the mandatory parts seemed to be carried over directly from citation information for publications and not really adjusted to suit collection metadata. The absence of a guide or examples made it difficult to determine how the information required should be constructed and how the resulting citations would be rendered based on the individual citation parts. A number of ANDS partners in addition to Griffith expressed similar concerns about using this very new RIF-CS element, and ANDS has since updated the citation info element based on the community feedback. So, we enhanced the party records by including an NLA party identifier and by adding a biographical statement: researchers are able to log into the research hub and edit their profile page to add a biographical statement, and this is then fed to RDA in a party record. We added ANZSRC Fields of Research codes, and these are also recorded in the hub and mapped to the RDA feed. We added related publications: for related publications we decided to script an insertion into each gold party record that contained a link to the publications listed in the researcher's profile page in the hub, and a text note explaining this link. We felt this was preferable to adding each publication as an individual listing, as in some cases there were a hundred publications, and the most current publications list is in the hub record. The script is specific to VIVO users and references Griffith systems, but can be made available to other institutions on request.
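To illustrate the separate-parts form we mapped to, here's a rough sketch of a citation info element assembled from individual fields, which is how the pieces come out of an RDF store. The structure approximates the RIF-CS citationMetadata element as we used it; check the current CPG for the exact mandatory parts, and note that the example values are invented.

```python
import xml.etree.ElementTree as ET

def citation_metadata(doi, family, given, title, year, publisher):
    """Build a RIF-CS citationInfo element from separate parts, the way
    individual RDF fields map across. Approximation of citationMetadata."""
    info = ET.Element("citationInfo")
    meta = ET.SubElement(info, "citationMetadata")
    ET.SubElement(meta, "identifier", type="doi").text = doi
    contrib = ET.SubElement(meta, "contributor")
    ET.SubElement(contrib, "namePart", type="family").text = family
    ET.SubElement(contrib, "namePart", type="given").text = given
    ET.SubElement(meta, "title").text = title
    ET.SubElement(meta, "publisher").text = publisher
    ET.SubElement(meta, "date", type="publicationDate").text = str(year)
    return info

# Invented example values.
citation_xml = ET.tostring(citation_metadata(
    "10.4225/01/EXAMPLE", "Smith", "Jane",
    "Example Data Collection", 2012, "Griffith University"),
    encoding="unicode")
```

The alternative is a single preformatted citation string; the separate-parts form leaves the rendering to RDA, which is exactly why unclear part definitions mattered so much to us.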
We were expected to include existence dates, which are birth or death dates; these would have provided additional context for a party and also an additional match point for records provided to the NLA party infrastructure, but unfortunately we couldn't include them in our party records because it's against Griffith privacy policy. We also reviewed contact details. The activity records come from the research administration database, which is managed by the Office for Research, so we weren't able to change those records directly in RAID; however, we did enhance them within the hub itself, by adding temporal and spatial coverage wherever possible and by adding existence dates. So, lessons learned. The ANDS online services tool for creating manual records makes it possible to provide gold standard quality records to Research Data Australia even when some of this information cannot be captured in your local systems; however, it's preferable to capture rich metadata in your own systems and then provide the records in an automated feed to RDA. Your choice of strategy for enhancing records to gold standard is dependent on your method of submitting these records to RDA. So we recommend that you build in processes for record enhancement that are sustainable wherever possible, that is, capture the metadata in your own systems so that it can be applied to as many records as possible and have benefit to your institution beyond the contribution of the records to RDA. To give an idea of the return on investment, the outgoing costs for this project: the project funding was $125,000, and we had two staff members working on the project, but not all of our time was spent creating records; there was also time spent on administration, reporting, communication and so on.
The costs that are not factored into this number are that we built on existing systems, such as the research hub; we built on existing staff knowledge, so training was not required; and we built on existing collections, such as those in the research data repository. The benefit is that we have rich metadata in our systems which can be discovered locally and by Google and so on. We've created rich metadata records in Research Data Australia, which is again another platform for discovery of the collections. Also, more and more widgets and visualisation tools are being developed for specific areas; these require the metadata to present things like 3D flyovers using spatial data, or visual and interactive timelines that use temporal data. The citation element and DOIs are an encouragement to researchers to cite data and to provide data so that it can be cited, and the NLA party identifiers are beneficial as a way of managing multiple identifiers for each researcher and improve research exposure through the creation of a Trove record. So overall we've created better metadata for long-term access, preservation, discovery and reuse. And if you wanted to do the same thing, you could save some time by using the open source scripts that we've created and that QUT have created, and we recommend that you create rich descriptive metadata at the time a researcher hands over the data, rather than going back retrospectively. In summary, we suggest that the approach for enhancing records will differ widely between institutions, as it's dependent on a number of factors, including the existing systems in place at the institution for managing research data; what metadata is captured in these systems; how the institution will feed their records to RDA; and, if they're using the automated feed, whether changes need to be made in a single place in one system or in many places in many systems. It also depends on staff and researcher availability and on available funds.
So that brings us to the end of the presentation. I hope you've got something out of it, and now there's time for questions.