It's my pleasure to introduce our presenter today, Natasha Simons. Natasha is probably known to many of you already, but some of you may not be aware that Natasha joined ANDS earlier this year, coming to us from Griffith University, where as a senior project specialist, amongst many other things, she rolled out data citation at Griffith University. And that's going to be the topic of her presentation today. So thank you, and we'll hand over to Natasha.
Okay, so at Griffith University we've been supporting data citation for quite some time now. We were the first institution to mint DOIs for our research datasets using the ANDS Cite My Data service, and we went on to put a Cite My Data element into the Research Hub, which is our premier discovery service for all things Griffith research. We also ran a project to engage with our researchers and our academic liaison librarians about data citation practices. So in this talk today I'm going to share our journey with you, warts and all, and I'm going to talk about what a data citation looks like, the minting of DOIs using the ANDS Cite My Data service, developing guidelines for minting and managing DOIs, some engagement strategies around data citation and the lessons that we've learnt on our journey, and there'll be time for questions at the end.
So a little bit about Griffith to start with, so that you can contextualise it. Griffith is a medium-sized university that's situated in South East Queensland. We have five campuses that are spread out from Brisbane to the Gold Coast, and we've got around 43,000 students and 4,300 staff. You can see from the list of schools and departments that we have basically four academic groups and they're quite diverse: from arts through to business, health, and then science, environment, engineering and technology. For our research we also have 32 research centres and a whole lot of priority areas that are quite diverse.
So you can see in there we've got sustainable tourism, physical sciences, nursing, Asian politics. So we're very cross-disciplinary at our institution. We have in place some good infrastructure for research data management. We are very fortunate to have a strong commitment from our university leaders to improving data management, and as a result of that we have also had some staff resources allocated to important things to do with research data management, at an operational level but also at a contract project level. Griffith has been quite successful in seeking funds from ANDS and also from NeCTAR and other external sources to build our own infrastructure and contribute to national infrastructure around data management. And we have had a strong emphasis on also seeking internal funds. So our Research Hub, for example, was built initially with a spark of funding from ANDS, and we then contributed more than ANDS had contributed to build the Research Hub and really significantly invested in it as a resource, not just for managing data but for profiling our researchers as well. And we have policy frameworks and service models for data management under discussion as well.
So what does a data citation look like? Well, this is our Research Hub, which probably a lot of you have already seen because we have talked a lot about it. The Research Hub is built on VIVO, which is open source software, and it was built in-house at Griffith. It's our researcher profile system that gives a profile page for each researcher but also connects everything. So it connects the researcher to their publications, their data collections, their grants and so forth, and has landing pages for those things within it. So this is a profile page for Associate Professor Rodney Stewart and you'll see a number of tabs underneath his photograph. There's overview, publications, projects, and I've got it defaulted in the screen on the collections tab.
So you can see that he is the owner of a number of datasets to do with South East Queensland domestic water usage. If you click on one of those, you open up the collection in the Research Hub, and because we've worked on a Gold Standard project that was funded by ANDS at Griffith, we have some nice metadata for this particular record to describe this data collection. If you then scroll further down this record, right at the bottom there you can see "Cite this collection". So that's what the data citation looks like. It's got the creator names, the date that the dataset was published, the dataset title and the producer of the dataset. And then there's also a DOI link, which is a persistent identifier that links through to the dataset. So if a researcher uses this collection and they want to cite that data in their research, they can copy and paste that "Cite this collection" text into the reference list for their article.
Okay, how is data cited? So that's one example of how data could be cited. I'm going to speak from personal experience here using my own example, mostly because it enables me to talk about it in detail. A couple of years ago, I conducted a research project with my colleague Joanna Richardson on the training needs of repository staff, and it involved data collection through a national survey. We then had the article published in the Journal of Librarianship and Scholarly Communication. While the peer review process was going on for the article, we put the dataset in our institutional repository and we used the Cite My Data service to mint a DOI for it. We then provided the citation reference to the Journal of Librarianship and Scholarly Communication and they put it in the supplemental content.
So that's just scrolling down from the top of that previous record at their website, and you'll see supplemental content, and it's got "Identifying skill sets for repository staff", data file and survey questions. If you click on that DOI link, you have to do a few extra clicks but you will get through to the actual dataset and the survey questions. And right at the bottom there is the recommended citation for the article, so that's not to be confused: the supplemental content is the dataset and the recommended citation is actually for the article. So that gives you an example of what a really nice workflow is and how the Dryad data repository works in terms of the workflow for people publishing their datasets, getting the citation and including a link to the dataset in the article when it's published. So it's an example.
So now I'm going to talk about our journey and how we were able to get to this point. So what do you need to pack if you're going on a data citation journey? Well, there are really only two things that you need before you start your journey. First of all, you need some research data collections at your institution that have open, embargoed or mediated access. So you obviously don't apply DOIs or data citation to datasets which are private or closed, because no one's going to cite them. Secondly, you need a publicly available metadata record that describes each of these collections and provides access to them, or advice about how you can access them. So you can see that data citation actually comes further down the track, after you've done some management of the research data resource. So at Griffith we have a research data repository that's built on Equella software and it hosts our data collections, the files as well as the metadata.
And we also have the Research Hub, which I've explained, and the Research Hub draws metadata records automatically from our research data repository along with a number of other sources. So the data repository is feeding the Research Hub, and the Research Hub is the main discovery service for our data collections and for our researchers and research.
So on your journey you might also need support, and I've given you some pictures there so you can see what we've had at Griffith. You might need management support to do things like approve the commencement of your data citation journey, to assign resources to carry it out, to review and approve related policy, workflows and guidelines, to sign the forms that you need signed to use the Cite My Data service, and to be an advocate of what you're trying to do. You will probably also need technical support, because you need to develop a technical script to mint DOIs, but don't be scared off by that, because it's not very complex and we've made ours available for reuse if anyone wants to use it. And you need technical support to mint and manage the DOIs over time as well, and to create the citation element display in your discovery service.
So just to give you a bit of a picture of the when and why of DOIs: it didn't actually happen overnight for us, we've actually been doing this for a while, as I mentioned. In August 2011 we started thinking about persistent identifiers and I wrote a "PIDs for data" options paper (PIDs being persistent identifiers), which recommended that we adopt DOIs. Then in August, at the same time, ANDS launched the Cite My Data service pilot, and we were able to get agreement from our management to actually go ahead with joining the pilot and becoming guinea pigs. And then from September to December we started developing our machine-to-machine scripts and started minting DOIs for our collections.
I can't remember exactly what month it was, but around May of 2012 we put a "Cite this collection" feature into the Griffith Research Hub, and in October we commenced a data citation project. By that I mean this was our outreach project to go out and talk about data citation to our researchers and to trial some data citation products like the Thomson Reuters Data Citation Index. Then in 2013 that project concluded in May, and in September we produced our own DOI guidelines, which we've also made available and I'm going to talk about them a bit later, and we developed a roadmap for where we want to go with our research data management and also with our DOIs and data citation.
So when we first looked at DOIs we weren't actually thinking data citation right at the start. We were thinking we needed a persistent identifier, and we needed that to fill gaps in persistent identifiers for scholarly works. We had long and incomprehensible URLs for metadata in our data repository, and they're still there if you want to take a look, they're pretty ugly. We needed a persistent identifier that would be short and easy to read and that would signal long-term management for our research data collections. We also needed a persistent identifier that would contribute to the semantic vision for data in the Research Hub. And by filling in gaps in persistent identifiers for scholarly works, I mean we had persistent identifiers for publications through our institutional publications repository, but we didn't have persistent identifiers for our data collections, for our theses and for our grey literature, like unpublished reports and conference papers and any policy documents that don't go to ERA or HERDC but that we want to make available for people to discover, and also other digital objects like video lectures with links to research projects and so forth. So later on we needed it as a foundation for data citation as well.
So we chose DOIs to meet our needs for a number of reasons. First of all, because they're a global persistent identifier that is already used in many scholarly publications, so Crossref issues DOIs for publications and they're used in many journal articles. Also because DOIs can be assigned to research data, theses, grey literature and even software code now, using DataCite as the registration agency, and that is the registration agency that you'll use when you use the ANDS Cite My Data service. We also needed DOIs to improve the visibility of and access to research data, and it gave us a responsibility for managing persistent access to our data collections. So when you mint DOIs you are also making a commitment to maintain access to at least the metadata page of your data collections, and if that page changes, like you get new repository software and your data gets a new web page, well, you need to update the DOI with that URL. The signal is that you're going to maintain these data collections over time, not just have a transitory thing. Also, DOIs won't break when institutional repository software is re-indexed, whereas handles sometimes do if you re-index the software in your repository. And later we needed DOIs because they facilitate data citation: they greatly assist in tracking the impact of datasets through the collection of metrics and altmetrics, which use the DOI.
So the ANDS Cite My Data service provided a partnership with an international DOI registration agency, that is DataCite. It provided minting of DOIs for metadata records about open, mediated or embargoed research data, theses, grey literature and software. We haven't done the software part yet, but I know the CSIRO have just begun to do that, which is great. It's a machine-to-machine workflow and there's an easily achieved small amount of metadata that is required to mint a DOI.
You can also trial it in a safe test environment, there's a lot of high-level documentation provided, there's a lot of information about data citation on the ANDS website, and also it's free. So these are some of the reasons why we decided to become the first guinea pigs for the Cite My Data service.
How does it work? This is to give you an idea of the workflow. You sign an agreement to use the Cite My Data service and they give you an institutional ID. You prepare your machine-to-machine script, and in your script you're going to include the required metadata for minting each DOI, which you will already have because you've got metadata for your dataset. So it's very easy: the metadata is dataset title, creator, publisher, publication year and identifier. And the identifier is actually the landing page for the metadata of your dataset, so a URL or URI. You execute the script against the Cite My Data service and the service returns the DOIs. You can store the DOIs in your own system and use them to create a citation element, which you can display in your discovery service, and you can also feed that citation element in your RIF-CS feed back to ANDS for display in Research Data Australia. As an example of our scripts, we've got a Python script that you are most welcome to use. It's available on GitHub and the link is there if you'd like to read it.
So we'd minted DOIs, but it actually raised a whole lot of questions. The actual machine-to-machine technical part was quite easy, but there were a number of questions that arose, like: what are the criteria for assigning a DOI to a research collection? At what level of granularity should a DOI be applied? Is it just for a collection of all items, or is it for items within a collection, like a collection of films where you then have a DOI for each film within the collection? Also, should the DOI link to the landing page for the actual data or should it link to a metadata page?
Which landing page for us? Was it the Research Hub? Was it the research data repository? And what if the data is changed or updated, like a new wave of data is added or a spelling mistake is made and then corrected? Should a new DOI be issued? Should researchers be able to mint the DOI or should we mint it for them? And how are DOIs assigned if the research data is the result of a collaboration between various institutions who can all do DOI minting? If a Griffith University researcher collaborates with someone from the University of Sydney, who gets the dataset? Who mints the DOI? Because theoretically there should be one DOI for the data collection. And what happens to the DOIs we've minted if ANDS closes shop? And finally, can you cite data without a DOI? I wrote these questions up in an article called "Implementing DOIs for Research Data" in D-Lib, and that's freely available if anyone wants to read it.
We've actually learnt a lot since that time and we've answered most of those questions satisfactorily, although the one about collaboration I think is still quite difficult. I think TERN, the Terrestrial Ecosystem Research Network, has actually got the best experience in that particular area, and we haven't had that actual question arise just yet. But to answer a couple of those questions: we decided the DOI does need to link to a metadata page, and for us that metadata page is in our discovery service, the Research Hub. If the data is changed with a new wave of data, yes, you should actually create a new DOI. Can you cite data without a DOI? Yes you can, but it's certainly beneficial to have one, and I'll talk about why later; I've already mentioned a few reasons. And what happens to the DOIs we have minted if ANDS closes shop? Well, of course they are still maintained, because DataCite is the registration agency used and that's international.
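As a rough illustration of the workflow and decisions described above, the following sketch checks that a record carries the five required metadata fields (title, creator, publisher, publication year and identifier) and that the identifier is a landing-page URL, before any mint request would be sent. It is not the Griffith script on GitHub and it does not call the Cite My Data service; the function name, field names and example URL are assumptions made purely for illustration.

```python
from urllib.parse import urlparse

# The five fields the talk lists as required for minting a DOI.
REQUIRED_FIELDS = ("title", "creator", "publisher", "publication_year", "identifier")


def _text(record, field):
    """Return the field as stripped text, or an empty string if it is absent."""
    value = record.get(field)
    return "" if value is None else str(value).strip()


def check_mint_metadata(record):
    """Return a list of problems; an empty list means the record looks ready to mint."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not _text(record, field)]

    year = _text(record, "publication_year")
    if year and not (year.isdigit() and len(year) == 4):
        problems.append("publication_year should be a four-digit year")

    identifier = _text(record, "identifier")
    if identifier:
        parsed = urlparse(identifier)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("identifier should be the URL of the metadata landing page")
    return problems


# Hypothetical record; the URL is a placeholder, not a real Research Hub address.
example = {
    "title": "Example water usage dataset",
    "creator": "Researcher, A.",
    "publisher": "Griffith University",
    "publication_year": "2012",
    "identifier": "https://example.org/collection/123",
}
print(check_mint_metadata(example))
```

A check like this could run before the machine-to-machine call, so that incomplete records are caught locally rather than rejected by the service.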
So the questions and answers that we found we wrote up in our Digital Object Identifiers and Management Guide, and I've put a link there if you'd like to read them or adapt them yourself. Also, ANDS helped us answer some of those questions, because there was a very good community of practice around DOIs and data citation that ANDS facilitated. It wasn't just Griffith asking questions; there were other institutions coming up with the same sort of sticky questions, and we had quite a lot of events, workshops and online discussions about this, and the ANDS DOI FAQs have answered a lot of those questions. We also documented our experiences of minting DOIs in the Gold Standard blog, which is now closed because that project is closed, but the blog's still there if anyone wants to read some of the painful experiences, but the good ones that we had too, along that particular journey.
So once we'd done the DOIs, we put the "Cite this collection" feature in the Research Hub and we fed it through to Research Data Australia. In this particular record, it's the same collection as on the previous page, and you can see how to cite this collection there in the Research Data Australia service. So just some discovery services that are useful to know about on your data citation journey. Thomson Reuters has launched a Data Citation Index, and ANDS is negotiating for Research Data Australia datasets to go into the Data Citation Index for discovery. Also, from ANDS you can actually log in with your ORCID iD and link your Research Data Australia dataset in your ORCID profile, and on the top right there you'll see there's a video about how you can do that; it's very easy to do. On the bottom right there's a little snapshot of one of the Griffith records in the DataCite content service that's there.
So once you mint the DOI, because it's done through DataCite, you can actually search for it through the DataCite search service, and that has driven a little bit of traffic to the Research Hub too, which was unexpected and which we're quite pleased about. So it's just more ways to expose your data, I suppose. And I've also put the altmetric badge there, because DOIs are really important in being able to collect metrics and citation metrics for your research datasets.
So to talk briefly about our engagement experiences: after we'd done that, we went on to run this project to talk to our researchers about data citation, or a select number of researchers really, as it was just a trial. We established a blog for this particular project, which again has now closed because the project is closed, but the blog's open if you'd like to read some of our experiences there. So we did things like speak with our academic liaison librarians about citation practices in different disciplines, and we included data citation as part of some standard consultations with a group in health and an individual in environmental economics, so it was a relatively small portion of Griffith liaison librarians that were involved in this, and also of Griffith researchers. We looked at what kind of workflows we could set up, and we're looking at how to incorporate those workflows into the new research data repository that we're building to replace the current one, so that you get an email when you've lodged a collection saying here's a citation you can use in your article, or wherever, to cite this data now that you've registered your collection in the research data repository.
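To make the deposit notification idea concrete, here is a minimal sketch that builds a citation string from the elements named earlier in the talk (creators, publication year, title, publisher and DOI link) and drops it into a plain-text message. The function names, the wording of the message and the DOI shown are hypothetical; the actual Griffith emails and display format may well differ.

```python
def format_citation(creators, year, title, publisher, doi):
    """Build a citation in the order: creators (year): title. publisher. DOI link."""
    names = "; ".join(creators)
    return f"{names} ({year}): {title}. {publisher}. https://doi.org/{doi}"


def build_notification(researcher_name, citation):
    """Return the body of a plain-text message sent after a collection is lodged."""
    return (
        f"Dear {researcher_name},\n\n"
        "Your data collection has been registered in the research data repository.\n"
        "You can cite it in your article using:\n\n"
        f"    {citation}\n"
    )


# Placeholder values only; 10.1234/example is not a real DOI.
citation = format_citation(
    ["Researcher, A.", "Colleague, B."],
    2012,
    "Example survey dataset",
    "Griffith University",
    "10.1234/example",
)
print(build_notification("Dr Researcher", citation))
```

Generating the citation and the message from the same stored metadata keeps the text shown in the discovery service and the text sent to the researcher consistent.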
And we reviewed our existing information and workflows around Griffith policies and procedures, academic style guides, training materials and so forth, and we included data citation in the best practice guidelines for researchers managing research data and primary materials. I need to acknowledge that the work of my colleague Sam Searle was really critical in this; she led this particular project on the engagement strategies.
So the first lesson we learnt through this project is that one size won't fit all, and it's important to be aware that there are major differences across the disciplines that you would be likely to encounter in a university like Griffith. We've talked a little bit about style guides, that is, citation style guides, which don't generally mention data. But in discussing citation practices with some of our subject librarians, it became obvious that there are many other factors that would make a researcher more or less open to a discussion about how data might contribute to the impact of their research. These could include the types of publishing outlets, their target audiences and the processes by which their work is currently assessed. We also observed some differences, in a fairly unscientific way really, that seem to do with the age and career stage of the person that you're talking to. More than ever, younger early career researchers need to build a profile and seem to be more prepared to investigate non-traditional ways of getting their research out there. That was also a lesson we learnt through launching our Research Hub project: where those early career researchers had a profile, they were the ones who were most enthusiastic about keeping it up to date, about embellishing it, adding things to it, and making sure that they looked very good in their profile.
So in working with the group of health researchers, we observed that the people who maintained the most interest in the possibilities of data collections being cited were the postdocs, not the more senior staff members.
Another lesson we learnt was to choose your time and to find hooks with researchers. Unfortunately, at Griffith we don't have any way of knowing when a researcher is going to publish their data. We still get our data collections incidentally, when someone comes to us with data or where there's a particular project in which we're collecting those datasets. That makes it very hard, because it means we could get a dataset after an article that the dataset has informed has already been published, and then how does the person put the citation to their data in that article? So an ideal kind of workflow is the one that I described earlier, where while the peer review process goes on somebody submits their dataset, gets the data citation and then puts it in the article, and that is the nice workflow that Dryad have. So we've done our best to find other hooks, and the first is data deposit and the notifications that can be automated around that. We tried this: during the course of the project we minted DOIs for 14 newly deposited collections and sent a notification email to the researchers, similar to the one in the Dryad workflow. These data collections were being identified after the end of the projects, and the final reports had mostly already been published, but it was still slightly disappointing not to get a response from the researchers to this communication.
So we also thought it was better to have a need-to-know basis around DOIs. We were interested to hear that other organisations are taking an approach in which the minting of DOIs is by request. We'd interpret this to mean that it is necessary for a researcher to know what a DOI is and to have at least a basic understanding of why they might want one. But our view is that the assignment of a persistent identifier to publicly accessible collections has benefits above and beyond those that might accrue to the data collection owners through citation, and making the minting of DOIs a rules-driven rather than demand-driven exercise should remove the need to communicate about DOIs and citation before a DOI is actually generated. So we would still want to include DOIs in the context of full citation information on display pages and in notifications and various kinds of training and information resources, but researchers shouldn't have to understand the ins and outs of DOIs in order to make a decision about whether they should have one or not.
We also think it's important to be honest and realistic with researchers and to be careful not to oversell the benefits of DOIs and the benefits of data citation. There's still a relatively small body of literature around the benefits, and because they're researchers they're going to ask about it, and we really need more research to be done in this area, not just in one discipline but in other disciplines as well, to make our case stronger. And it's good that that is happening. So here is a quote from an article that was in PeerJ recently from Heather Piwowar, and she says that previous studies have found that papers with publicly available datasets receive a higher number of citations than similar studies without available data. We find a robust citation benefit from open data, although a smaller one than previously reported.
We conclude that there's a direct effect of third-party data reuse that persists for years beyond the time when researchers have published most of the papers reusing their own data. So this is really about data reuse and publications, but it's really selling the advantage of having open data and allowing it to be citable. And I'll just mention here that Heather Piwowar is actually going to be one of the keynotes at the eResearch Australasia conference this year in Melbourne, if you can make it along to that.
And our fifth lesson was that not everything can be solved now or by you alone. It feels messy, because we all like things to end neatly and all of our questions to be answered, but it's really not possible to do that. So you'll see in this diagram that the white circles are areas where collective action is needed for change. That's not just us at Griffith, it's actually everybody really. So things like a change in funder mandates around data, and publisher policies around data, which are starting to change. We know that PLOS and Nature are requiring data now as part of their publications. When these change, then you've got more of a case for researchers, more of a motivation for researchers to do proper data management and to be able to get the citation and cite their dataset in their publication. We also need changes to style guides for citations to include data, and tools like EndNote and Zotero to include the ability to add datasets, and I think Zotero recently made that adjustment. And research quality exercises too, around the reportability of data. In the pink boxes we've got things that we're investigating now and in the near future, which is things like information and training for our librarians and also for our researchers, bibliometrics and altmetrics. And in the slightly red box we mostly know what we're doing with these. So we mostly know about identifier registration agencies and DOIs.
We've got a data repository and institutional procedures around data. So we wrote some of our experiences up in a D-Lib article, which is also openly available, and we did a more extensive webinar on this when we first finished the data citation outreach project, and there's a link there if you'd like to watch that, because it has more on our experiences with engagement strategies.
I'm sorry to say that we did well but we didn't conquer the world. On the to-do list are things like embedding DOIs in automated data collection workflows, so someone submits a dataset and they automatically get a DOI as part of a data citation email that comes out automatically from our data repository. We'd also like to mint DOIs for grey literature, for things like our theses, reports, discussion papers and so forth. We know we can do it but we need to be able to do that. We need to improve the link between research publications and underlying data. We have the ability to capture that in the Research Hub, but again, we can't really do that if data publishing is not yet routine at the institution. We also need to review our DOI guidelines, rules and workflows at future points in time, and embed types of metadata such as COinS into landing pages so that people can import that into citation tools.
So finally, just some reflections on our experience. I think it's important to do what you can with what you have available to you. The technical minting and maintaining of DOIs is relatively easy, and if you've got those metadata records and you've got your open, mediated or embargoed data collections, then you can mint DOIs and you can start looking at putting the Cite My Data element into your discovery service and Research Data Australia.
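The to-do item above about embedding COinS in landing pages can be sketched as follows. COinS places an OpenURL ContextObject, written as URL-encoded key/value pairs, in the `title` attribute of an empty `span` with class `Z3988`, which citation tools can then read. This sketch assumes the Dublin Core key/value format with `rft.type` set to `dataset`; the function name and all example values are hypothetical, and whether a given citation tool imports the result as a dataset is not guaranteed.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(title, creators, publisher, year, doi):
    """Return an HTML span carrying a COinS description of a dataset."""
    pairs = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:dc"),  # Dublin Core key/value format
        ("rft.type", "dataset"),
        ("rft.title", title),
        ("rft.publisher", publisher),
        ("rft.date", str(year)),
        ("rft.identifier", f"https://doi.org/{doi}"),
    ]
    # One rft.creator pair per creator, in the order given.
    pairs.extend(("rft.creator", name) for name in creators)
    # URL-encode the pairs, then escape "&" so the value is safe inside an attribute.
    query = urlencode(pairs)
    return f'<span class="Z3988" title="{escape(query)}"></span>'


# Placeholder values only; 10.1234/example is not a real DOI.
span = coins_span(
    "Example dataset",
    ["Researcher, A."],
    "Griffith University",
    2012,
    "10.1234/example",
)
print(span)
```

The span would sit anywhere in the landing page's HTML next to the visible citation; it renders as nothing for human readers.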
So getting the citation element is as easy as getting the Cite My Data service up, and there's also a lot of material available now on DOIs, that is, on setting up DOI minting and maintenance, and there's a lot of material on data citation which is aimed at researchers and research data managers. So you don't need to reinvent the wheel. I think as guinea pigs we did actually go through a lot of this, and we would be really pleased if people can reuse any of the things that we did so that you don't actually have to start right at the beginning. So the ANDS Cite My Data service is machine to machine, but you could decide to set up an administrative interface for minting and maintaining DOIs. That would be if you want your librarian, for example, to be able to mint a DOI without liaising with technical support to do that. TERN have actually done this, and I believe that they've made the code open source, but you'd have to make an investment to be able to do that, because that front end would run over the top of the machine-to-machine scripts in the background, basically a human-readable interface that the librarian or research data manager could use to mint and maintain the DOIs.
Also, establishing workflows for DOIs and data citation is not easy if you don't know when your researchers are going to publish their data and if data publication is not routine. I've talked about that in a bit more detail. And data citation is not yet common practice, but there is a large international community supporting data citation, both as a principle and to encourage practice in this area, and there's a growing body of evidence on the positive link between open data and citation counts that you can use in your arguments for promoting data citation with your researchers. So I'm going to leave it there.
Thank you so much Natasha that was absolutely fantastic and a lot of content there so I'd like to take this opportunity to thank Natasha once again for a fantastic presentation.