Okay, so today I'm going to go over data citation, DOIs, and the ANDS services that support data citation. I'm also going to cover the questions that ANDS is most often asked about this area, but I'm taking a fairly high-level approach, so I'm going to assume that I don't need to convince you of the benefits of or the need for data citation. I'm also going to assume that you know what a DOI is and that you do, or perhaps will in the future, access the ANDS machine-to-machine services, particularly for data citation. Now, if those assumptions aren't true, please figuratively put your hand up and pop something in the question box. Conversely, I am going over some older material as well, so if what I'm saying is painfully obvious to you, please also put your hand up and we can fast forward a bit. Okay, so let's take the 30,000-foot view of the data citation landscape. DataCite is an international nonprofit consortium for data citation that has come out with the goal of effectively trying to do for data what Crossref has done for publications for some time. It's built on the very widely deployed and familiar DOI infrastructure, again a sort of nonprofit international infrastructure usable in a whole bunch of ways. Currently ANDS is the only Australian registration agent for DataCite, which means we can create DOIs, but this is not exclusive in any way. Anyone else could rock up to DataCite, pay the membership fee and start minting DOIs also. Now ANDS provides a service, and you're probably going to hear this term quite a bit today, Cite My Data, and I'm going to focus in on that service very soon. The other thing we need to start talking about is citation metrics, and I'm particularly going to focus on three areas: the journal companies, CrossCite and some of the domain portals.
So this is an area that's in rapid flux and is far from settled, so we're taking a snapshot, and the best I can do is show you what current practice is. So, the Cite My Data service. Let's suppose you're convinced of the benefits of data citation, and you're ready to work with ANDS to attach a DOI to your data and enter the brave new world of data citation. That reference I've got on there is a good place to start. I can go into more detail on these six steps later in the webinar. For now, I'm going to stay at a fairly high level, so you get the big picture. So step one, step two, and step three are effectively administrative steps: some back and forth with usage agreements, and you providing us with some technical details so we can give you access to the service. Step four is effectively your data citation P-plates. I know many people will have functional and unit tests on their research data software that they will want to run, and you don't want to do that with production DOIs or production services. So we do run a test prefix service. Once you've gone through that and you're successfully through your P-plates and haven't been booked, it's on to step five, which is a final administrative agreement. And finally, you've got your open license in step six and you're ready to start minting full production DOIs with the assurance that everything went well on your P-plates and your tests are all passing. Now, this is an extremely high-level view that abstracts away most of the details of what's going on. In this case, the Cite My Data service that we provide is effectively a front end or a proxy for the international DataCite infrastructure in the blue box there. And we're also doing it in such a way that future interactions you have with Research Data Australia take your DOIs into account.
It's a simple request-and-response type machine-to-machine service only at the moment. And once you've got your DOI, you're expected to maintain the link to that DOI so you can use it in future harvests. Now, as I say, this is very high level and I'm skipping over a lot of the gnarly, error-condition type details. But I love to talk about that stuff. So if anybody wants to get in touch or focus on the more technical side, I'm happy to do that directly with you or after this webinar. So one of the things I'm often asked is, is this workflow set in stone, or could you go about this in another way? And the short answer is yes, you could. If you let us know your feedback or your requirements, then we will certainly look at other ways of providing access to data citation services. A good example might be people who already have significant data collections in RDA and want to retroactively mint DOIs, provided the metadata meets the minimum standard, which I'll talk about later. We've also had requests for a sort of per-institution home page where you can look at your data citation status, how many DOIs you've minted and how complete your minting is across RDA. So the sky's really the limit there, and ANDS, as always, is driven by the needs of the data owners. So please let us know if you have ideas or suggestions. Okay, there's something slightly different about the ANDS Cite My Data service compared to the other services. With the other services that you're maybe used to using or considering using, including harvest into RDA and RIF-CS, it's usually a relationship, at least initially, directly between ANDS and yourselves, and we then provide your metadata to the world. But initially it's a sort of one-to-one relationship.
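As an illustration only, the request-and-response pattern and the expectation that you keep the DOI against your own record might be sketched like this in Python. The endpoint URL, the parameter names, the `MintResponse` shape and the `10.9999` prefix are all hypothetical placeholders, not the actual Cite My Data interface, and the transport is a stand-in so that nothing touches a real service.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical endpoint: the real service URL and parameters are not given in the talk.
MINT_URL = "https://example.org/doi/mint"


@dataclass
class MintResponse:
    ok: bool
    doi: str = ""
    message: str = ""


def mint_doi(landing_url: str, metadata_xml: str,
             send: Callable[[str, Dict[str, str]], MintResponse]) -> MintResponse:
    """Send one mint request through `send` and return the service's response."""
    if not landing_url.startswith(("http://", "https://")):
        # Reject obviously bad input locally before calling the service.
        return MintResponse(ok=False, message="landing URL must be absolute")
    return send(MINT_URL, {"url": landing_url, "xml": metadata_xml})


def fake_send(url: str, params: Dict[str, str]) -> MintResponse:
    """Stand-in transport so the sketch runs without a network (placeholder DOI)."""
    return MintResponse(ok=True, doi="10.9999/example.1", message="minted")


# Local catalogue: the data owner stores the DOI alongside the record
# so that later harvests can include it.
catalogue = {"collection-42": {"url": "https://data.example.org/collection-42"}}

response = mint_doi(catalogue["collection-42"]["url"], "<resource/>", fake_send)
if response.ok:
    catalogue["collection-42"]["doi"] = response.doi
```

Passing the transport in as a function is a design choice for the sketch: the same calling code could be pointed at a test-prefix service first and a production service later without changing anything else.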
We have never imposed extremely strict metadata requirements on those harvests, and different organisations have different levels of metadata that suit their workflows. And with our services, we're funded essentially to target the Australian research and government data sectors. The Cite My Data service is slightly different because we're effectively a front end to an international DOI and DataCite infrastructure. The DOIs you mint and the transactions you do with our services appear very quickly in that international infrastructure. And once they appear, we can't arbitrarily take them back without a formal change or redaction process. So it instantly involves the rest of the world. This is ANDS and DataCite, so we have no choice but to impose the minimum quality and metadata standards that DataCite imposes. And that includes a compulsory set of metadata. That's not particularly onerous: from memory, it's five metadata fields that look a lot like a subset of Dublin Core. There are optional metadata fields as well, which we encourage organisations to complete but don't enforce. OK, I guess the next most common thing I'm asked about ANDS services is, this is all very good, but what happens when you go away? And the short answer, for the DOIs you've minted and their metadata, is that nothing will happen. We've taken the approach that when you approach us to mint DOIs, we register your URL and your name directly with DataCite. And because the international DOI infrastructure is persistent anyway, ANDS could disappear tomorrow and the status, discoverability and availability of your DOIs and citations would be as they are today. If you want to mint more DOIs and ANDS is no longer doing this, you will need to find another registration agent, but I'm confident either another Australian or an international one would come along in time.
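For anyone who wants to check records before attempting to mint, a minimal sketch of a compulsory-metadata check follows. The five field names are an assumption on my part, taken from the mandatory properties of the DataCite metadata kernel in the schema versions current around this time; the talk itself only says "five metadata fields", so check the schema version you are actually targeting.

```python
from typing import Dict, List

# Assumed mandatory properties (not named in the talk); verify against the
# DataCite schema version in use.
MANDATORY_FIELDS = ("identifier", "creator", "title", "publisher", "publicationYear")


def missing_mandatory(record: Dict[str, str]) -> List[str]:
    """Return the mandatory fields that are absent or blank in a record."""
    return [field for field in MANDATORY_FIELDS
            if not str(record.get(field, "")).strip()]


# Example record with placeholder values; the publisher is left blank on purpose.
draft = {
    "identifier": "10.9999/example.1",
    "creator": "Example, Researcher",
    "title": "Example survey data",
    "publisher": "  ",
    "publicationYear": "2012",
}
problems = missing_mandatory(draft)
```

Here `problems` lists only `publisher`, so the record would need fixing before it could be sent on to the international infrastructure.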
I can also say, and I notice one of the ANDS directors, Andrew Treloar, is with us today, so I'm sure he'll jump in to correct me if I'm wrong, but I believe the federal government has guaranteed to keep the ANDS services running, not saying where or how, but there is that guarantee. But in particular with DOIs, you're going straight into an international infrastructure and that will persist. OK, the next big thing: so that was time, and now money. In the past, it has cost a small amount to mint DOIs. The International DOI Foundation has abandoned that model. They have a membership model now. ANDS is funded to provide infrastructure to the research and government data sector in this country, and therefore we provide this service for free. The block of DOI address space that we get from DataCite with our own prefix is effectively infinite, so it's not going to run out anytime soon. I would say that if you feel you need three million of these things in a hurry, you should probably contact us so that we can come up with a better bulk workflow. But for you, cost is not an issue. OK, so this is a machine-to-machine service, and anything that involves programmers and programming can, incredibly, actually go wrong. So, a couple of ways to get help. If things do go wrong, particularly when you're on your P-plates, please let services at ANDS know; everything that goes to that email address is given a ticket and is formally tracked. If you want to talk to one of your friendly neighbourhood ANDS client liaison officers, and I noticed Andy White is with us today, or if you have a broader need to discuss how this works or how you think it could work better, please contact me directly or, again, services at ANDS, and that will make sure it gets to the right person. If you have questions that are more of a design, usage or high-level thing, or you would benefit from an ANDS person sitting down with you and looking at your workflows and your data issues, again, please talk to a CLO.
Karen's group here has a wide range of options for helping you, or contact me, particularly on the more technical or workflow side of things. And we're always happy to hear how people are using the ANDS services or would like to use them. On the face of it, this is a very simple question, but unfortunately we live in a complex world and the answer to a lot of these questions is "it depends". So one of the things we're often asked is, at what granularity should I cite my data? And I normally do that annoying thing of answering a question with a question and say, well, at what level would your data users expect to see it? If it's a biological data set, would they expect to see one sequence or one gene? If it's an astronomy thing, would they expect to see a whole chunk of the sky or one light curve? If it's a social science thing, would they expect to see a whole survey with a million respondents or just one postcode? There are no universal best practices I can point to on this, and it does vary wildly from discipline to discipline. Having said that, if you have a naturally hierarchical data set, you can consider having DOIs that cite at multiple levels of that data set. Some people may be interested in the data set as a whole and want to cite that as a whole. Some people might want to zoom right in and be interested in very low-level elements of that data set. So again, if you know you have those two kinds of data users, I would consider multiple citations to different parts of the data set. The DataCite optional metadata is quite rich and allows you to build up what is in effect the data equivalent of the social graph, where you can build linkages within a data set and to other data sets and allow people to use the DOI resolution infrastructure to navigate that graph. I know that sounds quite abstract, but I can expand on it once I know more about people's particular data issues.
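To make the graph idea a little less abstract, here is a small Python sketch of a hierarchical data set cited at several levels and a walk over the links between them. The DOIs are placeholders with a made-up prefix, and the in-memory dictionary is purely illustrative; `HasPart` and `IsPartOf` are relation types defined for related identifiers in the DataCite optional metadata, but how a given repository records them is an assumption here.

```python
from collections import deque
from typing import Dict, List, Tuple

# Each DOI maps to (relationType, related DOI) pairs. All DOIs are placeholders.
RELATIONS: Dict[str, List[Tuple[str, str]]] = {
    "10.9999/survey": [("HasPart", "10.9999/survey.nsw"),
                       ("HasPart", "10.9999/survey.vic")],
    "10.9999/survey.nsw": [("IsPartOf", "10.9999/survey"),
                           ("HasPart", "10.9999/survey.nsw.2000")],
    "10.9999/survey.vic": [("IsPartOf", "10.9999/survey")],
    "10.9999/survey.nsw.2000": [("IsPartOf", "10.9999/survey.nsw")],
}


def descendants(doi: str) -> List[str]:
    """Breadth-first walk over HasPart links, returning every part below `doi`."""
    found: List[str] = []
    seen = {doi}
    queue = deque([doi])
    while queue:
        current = queue.popleft()
        for relation, target in RELATIONS.get(current, []):
            if relation == "HasPart" and target not in seen:
                seen.add(target)
                found.append(target)
                queue.append(target)
    return found


parts_of_survey = descendants("10.9999/survey")
```

Someone citing the whole survey would use the top-level DOI, while someone interested in one postcode-level slice would cite the lowest-level one; the links let either reader find the other level.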
Now, this is the big one. It's such a deceptively simple question, and I've spoken to a number of DBAs and data archivists who have struggled very long and hard with this issue: what do I do when my data changes? And the reasons for changing are as numerous as there are data sets and researchers, I think. Again, I can't give you a universal answer that will fit all situations. So here are three approaches that may or may not be suitable in your domain. You could consider taking time-based snapshots. For example, if you were doing economic statistics, where the date of the data set is significant, then you might have DOIs snapped in time. If you have an instrument that's taking a photograph of Mount Everest every five minutes, then time is a factor and you might want a different DOI every five minutes. The other thing you can do, instead of going through time with the whole data set, is what software people call a delta, where you issue a DOI to only the changed elements of a data set. So if it's a gigantic data set and you've only changed 1% of it, you might issue the 1% as a delta and give that a DOI, with instructions that people need to apply it to the giant data set. The optional metadata I referred to is also good for building up not just hierarchical structures, but structures in time. So a data set can refer to itself or related versions of itself going back in time. And again, people can sort of walk that graph. If you let us know specifically what your issues are, chances are we can point you to someone who's had a similar issue, or we might be able to suggest a good way to use the service. Now, we've had maybe 300 or 400 years of learning how to do journal and publication citations, and although even that's changing, it is very well understood and it's become sort of socialized in the research sector. We've only had a few years of doing this with data citations.
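The delta approach can be sketched in a few lines of Python. This is only an illustration under simple assumptions: a data set is modelled as a dictionary of record keys to values, a deleted record is marked with `None`, and the idea that each version and each delta would get its own DOI is left outside the code.

```python
from typing import Dict, Optional

Dataset = Dict[str, float]
Delta = Dict[str, Optional[float]]  # None marks a record deleted in the new version


def make_delta(old: Dataset, new: Dataset) -> Delta:
    """Collect only the records that were added, changed or deleted."""
    delta: Delta = {key: value for key, value in new.items()
                    if key not in old or old[key] != value}
    for key in old:
        if key not in new:
            delta[key] = None
    return delta


def apply_delta(base: Dataset, delta: Delta) -> Dataset:
    """Rebuild the newer version from the older one plus the delta."""
    result = dict(base)
    for key, value in delta.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = value
    return result


version_1 = {"a": 1.0, "b": 2.0, "c": 3.0}
version_2 = {"a": 1.0, "b": 2.5, "d": 4.0}

delta = make_delta(version_1, version_2)
rebuilt = apply_delta(version_1, delta)
```

The delta here carries three entries (one change, one addition, one deletion) rather than the whole data set, which is the point when the full set is very large; the trade-off is that a user needs both the base version and the instructions for applying the delta.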
So to say it's early days is probably the understatement of the decade. However, there are some emerging developments that are worth looking at, and I'm going to quickly talk about four of those. The first is these domain-specific data portals. There are two examples there: PANGAEA, the Earth and Environmental Sciences one, and Dryad, the Applied Biosciences one. These portals are domain-specific, usually nonprofit and international, and each of them hopes to become the first port of call for data in a particular domain or subject area. If they're built around DOIs and citation from the outset, that can make your life a bit easier, particularly if the portal is truly open and provides you with citation metrics. That might be a way to go if there happens to be one in your subject area. These portals, and we're going to see this come up again, are completely blurring what a publication means, and I'll return to that in a moment. So we're starting to see new online-only journals that are even further blurring what a publication means. Some of them will require, and insist, that you deposit data and software with a publication, and all those elements are up for peer review, including re-running results to see how you got them. GigaScience is a new one, and Acta Cryst E is a very specific one in crystallography, but GigaScience is an interesting one. They're assuming that this is for people with very large data sets in the biosciences, but they're completely blurring what it means to publish something. Very exciting stuff. How are people doing it? Oh, actually, excuse me for one moment. I might just quickly also focus on the journal companies. I don't want to say too much about this, because Karen has a lot of material and has run other webinars on it, but the journal companies are waking up to this as a business opportunity because they're already doing it for publications.
And there is a collaboration between DataCite and Crossref called CrossCite, where the journal companies are going to try and track both data and publication citations. There may be other initiatives out there I'm not aware of, and please let us know if you know of any, but that's probably a good start for how to get going in this world of data citation. Here are three upcoming webinars that Karen's group here will be running. Heather, there, is probably one of the top people in the world on the impact of data sharing, and everything she writes on her blog is worth reading. In fact, everyone there is worth hearing; Karen has been very astute in picking very good people for this from all over the world. So as my lecturers used to say to me, I commend those to you. Okay, so I've zoomed through that. Here are a few references for what we've just been talking about. That last one is for any developers who want to access the DataCite metadata service directly; there's a schema for it. Now, I'm happy to take questions. I can also go into more detail about the six steps of accessing the service if that's helpful. Otherwise, we'll throw it open to questions, I think.