Good afternoon everyone, and welcome to our webinar today. My name is Gerry Ryder, I work with the skilled workforce team in the ARDC, and I'm based at the Waite Campus in Adelaide. Now, I'll introduce our guest presenters shortly. I just wanted to start, though, with a little bit of background, because I know some of you will be aware of the COUNTER Code of Practice for usage data associated with library resources such as databases, electronic journals and ebooks. Today, however, we're going to hear about the COUNTER Code of Practice for Research Data and the five steps needed to implement that code in your repository. As you know, the ARDC has a strong interest in citation and usage metrics for research data and has been involved in a number of global initiatives aimed at implementing and improving processes around this. We're currently in discussions with the Make Data Count project team about how the ARDC can support those Australian institutions wishing to implement the Make Data Count protocol for their data repositories. So please watch this space, and please do let us know if you have an interest in pursuing this further; we'd love to hear from you. In the meantime, let's get on with our presentation today, where we're fortunate to have two fabulous guest presenters. First up we have Patricia Cruse, who is currently Executive Director of DataCite, where her role is to advance DataCite's mission, build partnerships and work with stakeholders. Prior to joining DataCite, Patricia was Director of the University of California Curation Center at the California Digital Library. Patricia is a strong advocate for data sharing, and we're delighted that she could join us today. After Patricia we will hear from Daniella Lowenberg, whose current role is Research Data Specialist and Data Publishing Product Manager at the California Digital Library, where she has played a lead role in the Make Data Count project.
Prior to joining the California Digital Library, Daniella spent three years as a Publications Manager at PLOS ONE, where she implemented and oversaw the PLOS data policy as well as running some journal operations. We are also delighted to have Daniella with us today.

Okay, so thank you everybody for joining this webinar; hopefully we can get some good questions out of it, and thank you to both Gerry and Susana for setting this up. I'm going to talk to you about the Make Data Count project: some of the background and logistics associated with it, what the initiative is really about, and what some of the milestones were in 2018 in the first release. This will give you an idea of what our vision for Make Data Count is and how you can use it in your repository.

So first, what is Make Data Count? Cool people call it MDC, or you can call it Make Data Count. When we started with Make Data Count, we really wanted to imagine a world where data are considered a first-class research output and are valued as such. You know, birds singing, bunnies hopping around, little squirrels, and everybody thinking that data is just as valuable as journal articles. Between 2014 and 2015, when I was at the California Digital Library and Daniella was at PLOS, we did a project funded by the National Science Foundation. That project was really about working out what we needed to do in order to collect usage metrics around data: how do you count usage metrics, what do researchers want when you count usage metrics, and what's important to them? I'm showing an article here from John Kratz and Carly Strasser, published in Scientific Data, that came out of that award. Part of that work was an extensive survey where we asked researchers what was important to them, and we found that both researchers and data managers measure scholarly prestige in citations; 61 percent of data managers ranked citations as the most interesting metric. That is where we really focused our attention in the Making Data Count project.

Moving forward a little: DataCite, DataONE, the California Digital Library and COUNTER were then funded by the Alfred P. Sloan Foundation to take the work from that original project and implement it. You've done all the thinking about it, you've talked to a lot of people, you understand what's important; okay, how do we implement it? That is what the Make Data Count project is all about: making real a lot of the prior work that we did. Here are the basic pieces of the Make Data Count project; we're 18 months into it, actually a little bit longer. The first was a formal recommendation for measuring usage data, and that's where the COUNTER Code of Practice comes in (I'll talk a little more about that in a minute). The second was to develop a hub for data-level metrics, so people can collect and reuse metrics around their repositories. The overall goal is to make tracking that usage much easier for people to do, thereby driving adoption and showing how easily it can be done, which is one of the things we're doing with you today, and engaging across all the research communities. It's not specific to any one research community; any research community can use this.

This is just a nice little image that shows all the different pieces of the Make Data Count project. We're leveraging existing initiatives, such as data citation, and developing new recommendations, such as the COUNTER Code of Practice. Those feed into a data-level metrics hub hosted by DataCite; if you participate in this, you push your server logs in to be processed in the data metrics hub. Then we drive engagement across all the communities, and using the data-level metrics hub you're able to display data metrics. When Daniella goes into her presentation I think this will become a little clearer, but this is really a schematic of how all the pieces fit together.

So far, talking about the COUNTER Code of Practice, we have created a data usage metrics standard, and this is on the COUNTER website. The Code of Practice talks you through how data usage should be measured. If you think about a journal article, a PDF file, it's pretty easy: one person downloaded that journal article, and the metadata was viewed x number of times. With research data it gets more and more complicated, as you can imagine. What do you count? Do you count when people view a piece of metadata? Do you count when a dataset is downloaded? Do you also count a related dataset that was downloaded alongside it? What about versions of datasets? That is what the COUNTER Code of Practice really digs into. We then socialized the Code of Practice and developed a data usage metrics working group, which is run at the Research Data Alliance. We have also reached out at many, many conferences and talked to many people about the Code of Practice, what it means, and how to put it into practice. We gave MDC a narrative; you saw the little unicorn earlier, and we're bringing it back for a second time just because we love it so much. This is Daniella, Martin Fenner and myself presenting at PIDapalooza, which took place in Girona last year, and as the slide says, we tried to give the project a narrative and make sure that people understand why it's so important to engage in this initiative. At DataCite we built an open hub for usage metrics, and this is available to anybody through our API, so you can grab those usage metrics via the API.
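As a rough sketch of what consuming the hub through that API could look like, here is a small Python example. The endpoint, query parameters and response fields below are assumptions modelled on DataCite's public Events API, not details given in the webinar; treat the names as hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumed DataCite Events API endpoint; verify against DataCite's docs.
API = "https://api.datacite.org/events"

def usage_events(doi):
    """Fetch usage events recorded in the hub for one dataset DOI (first page only)."""
    query = urllib.parse.urlencode({
        "doi": doi,                     # the dataset's DOI
        "source-id": "datacite-usage",  # restrict to COUNTER-processed usage events
        "page[size]": 100,
    })
    with urllib.request.urlopen(f"{API}?{query}") as resp:
        return json.load(resp)["data"]

def summarize(events):
    """Sum views and downloads using COUNTER terms:
    'investigations' are views, 'requests' are downloads."""
    totals = {"views": 0, "downloads": 0}
    for event in events:
        attrs = event["attributes"]
        if "investigations" in attrs["relation-type-id"]:
            totals["views"] += attrs["total"]
        elif "requests" in attrs["relation-type-id"]:
            totals["downloads"] += attrs["total"]
    return totals
```

A repository page could then call `summarize(usage_events(doi))` to display its counts; a production client would also page through results and handle errors.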
The other thing that we did is implement this at our own repositories: DataONE, and Daniella using Dash, and soon Dryad, where we gave this a test run. We found that when we talked about this project, about usage metrics for data and what they need to look like, we really needed to put a picture with it, so we felt it was important to show how it's done and what the value of displaying usage metrics is. And then the really important thing, something I'm particularly excited about, is adding data citations to the mix. If a dataset is used, that's great, but if the data is cited we can also track that and include it; Daniella will talk a little more about that as she goes forward. It's really about trying to get the publishing community to cite data in research articles and so on, so that we can also track that as an output, all of this together making data a first-class citizen. So, Daniella, I think this is over to you now.

So what does it all look like? Thanks, Trisha, for talking about what we've been working on. As she mentioned, we implemented this at our repositories. I myself am at the California Digital Library, where we have our own data repository called Dash, which you may have also heard is becoming Dryad, and the DataONE repositories implemented it as well. You can see here on the right that we have the metrics: the views, the downloads and the citations. Of course the standardization of views and downloads is happening in the back end, and I'm going to walk through that shortly, but the big part is that you can click on the citations and actually see the real-time citations related to this dataset. We can see it in DataONE as well: here we can see the views, the downloads and the citations, and when you click on the citations you can see the full list.

So that's the high level of what this actually looks like at the repository level, but we're going to go back a few steps and walk through the implementation. First, though, I wanted to address something I think a lot of people are probably wondering: how this relates to other initiatives, and one that folks at the ARDC have been involved in for a long time is Scholix. "Scholix is not a thing" is what we like to say. We work very closely with Scholix, and much of DataCite is involved in Scholix, but it's an exchange initiative. The Make Data Count and Scholix teams work really closely to advocate for best data citation practices, but the way we really work together is that Scholix is an information framework: it explains how people can submit their data citations properly to Crossref, while Make Data Count, on the other side, provides the infrastructure for displaying those data citations back at the repository level. If you have questions about that we can talk about it at the end, but I wanted to move on to implementation at your repository and what this all actually looks like.

We'll get into the details here, but first, why is it so important that we do this? We need a way to be able to measure data. We know a lot of people right now are talking about how data needs to be valued, how we need to be able to credit data, and how important that is, but we can't do any of that until people have actually implemented a framework for standardized data metrics and for showing data citations. What we really care about is building this out so that repositories and publishers support Make Data Count practices, so that researchers, repositories, funders, publishers, and all of us who support researchers actually have the means to evaluate the impact of data. To do that we first need these standards implemented, and right now that's not something we have. So we put together
these five simple steps for how you can make your data count. Of course these steps differ if you're at a repository that's part of a larger repository organization versus a homegrown one; they all vary a bit, but we're going to walk through, at a really high level, what they look like. If you are at a repository and would like to talk about this further, we'd love to set up time between our DataCite developers and your team to walk through the process, and we can go over documentation from repositories such as Zenodo and Dataverse that have implemented, or are implementing, this right now.

First, we built a getting started guide, which is on our GitHub; all of these links are available from our MakeDataCount.org website, and that guide is the starting point that walks you through what I'm going to go through now. The first step is the Code of Practice for Research Data. Trisha mentioned this earlier, and I know many of you are familiar with COUNTER because you're at institutions using it for other scholarly outputs. We wrote a COUNTER Code of Practice for Research Data with COUNTER, which they recently formally endorsed, so it's now the first code of practice related not to articles but to data, and it's on their website. You can see in the screenshot there's a link to the preprint, but on our website and elsewhere we now point to COUNTER's website, as they'd like to collect all of their feedback there. We knew that the starting point was that we actually need a standard for usage metrics, so that was the first piece of this. We strongly urge anyone interested to read the Code of Practice first, or at least the executive summary.

So what does it actually look like at the repository level? What it comes down to is processing the logs. A repository has access logs of people viewing, downloading and using the datasets there, and we needed to process those against the Code of Practice. Specifically, we're looking at the views and the downloads; in COUNTER language, "investigations" are views and "requests" are downloads, and we wanted to standardize what those things are. Something different that we also had to look at with data is that the access methods are different, so we had to include API access, and also consider automated agents like Python clients and other tools that are not crawlers we would want to exclude, but are actually robotic methods of getting the data that are common practice for researchers. We track users at the country level so that we don't have any privacy issues, and we look at session times as well: if you click to download a dataset 100 times in 30 seconds, is that 100 downloads or one? These are the kinds of things we defined in the Code of Practice and then had to standardize within our own logs.

What we see here is a screenshot from Dash of the stats being calculated. We calculate the stats every day by running a Python processor, which is open source and available for anyone else to use. It processes our logs against the Code of Practice, looking for any new usage events since the last run, that is, any views or downloads that happened that day, and again excluding bots, for any DOI that was accessed that day. The output is put into a SUSHI format, which you may be familiar with from other COUNTER work. On the left you can see the body of what we're sending: the country someone is accessing from, and whether each event is a request (meaning a download) or an investigation (a view), along with the counts. On the right you can see the report: we put in "dash", the name of the repository, and the time period we're filtering by, and this is the information being sent out. That's our way of standardizing it. All of this, the report format and the processor we used (or how we processed the logs, in case you don't want to use a Python processor), is in our getting started guide, and these are all things we would love to walk through with you and your developers if your institution or repository would like to get involved.

We've mentioned the usage metrics hub a couple of times. It's hosted by DataCite, and it's really important, because if we actually want to be able to do aggregations of this data it needs to be in a central place. DataCite built this out, and what happens is that we send the report I showed in the screenshot over to the usage metrics hub. It's sent in through an API right now, but if you wanted to do it manually through a CSV, that's also something we could work with. The benefit of doing this is, first, that other people can then access these usage metrics, and second, that DataCite can run aggregation services. Right now DataCite folks are working on aggregations by user, using an ORCID iD, and when we have an organizational identifier we will be able to look at usage by institution, by funder, by anything with a persistent identifier. So it would not just be by repository; you could really get a better sense of the usage happening with your research data. These are broken down by dataset, aggregated over time, and combined with the data citation metadata.

Round-tripping this information: we've standardized it and we're sending it to the hub, and what's important then for the repository is pulling that information back. A big piece of this is that we make API calls to that hub for each DOI from the repository. What
that means is that DataCite is actually working with Crossref Event Data to get all the citations of a dataset (that's where Scholix and all of our working together comes in), and what we are able to pull back is every event related to that DOI, such as data citations from Event Data, plus the usage metrics, which we can also take from our own database, or we can pull back the aggregation. That's the round trip of sending and receiving metrics, and the result is the slide we showed earlier, where you can see standardized views, downloads and citations: not only the count of citations, but what those citations actually are. It's a little less exciting to see the numbers for views and downloads, because we know repositories are already showing these, but the big thing is that having standardized metrics allows for comparison. Right now, when we look at the views and downloads of datasets, repositories are really comparing apples and oranges, because we just don't know the different ways we're each counting. So the big push for us is to get as many repositories as possible to start standardizing their usage metrics, along with our big push for publishers to submit data citations, so that we can have this cycle really running.

So what's next? As we've said a couple of times now, the biggest priority for us is outreach and adoption. It's really important to us that as many repositories as possible are able to implement this, and we want to devote resources to helping repositories do so. The other side of it is mass outreach to publishers, and we've been doing a lot of work with Scholix on this: getting publishers to understand the need to submit their data citations properly to Crossref, so that we can actually pull them back and show researchers what's happening with their datasets. Right now we are having a struggle with publishers doing that correctly, so if you go to our website you may see a lot of blog posts we've been writing about that. We also want to iterate on our implementation. We released all of this in June of this year, but we want to continue to build out what those metrics are. DataCite, as I mentioned, is working on aggregation, and we also want to show different views of the data: downloads by volume, downloads by region, and of course any other metrics people are interested in that we could pull back. Really, anything that goes into Crossref Event Data, such as Wikipedia or Twitter mentions, is something we would be able to show, so we're building out the functionality for that. In the future we also need to think beyond the DOI, because, in case it was not clear, this is all for DataCite DOIs that we are submitting and pulling information back for, and we know a lot of research data don't have DOIs; they have accession numbers and handles instead. So this is on our radar and something we care about as well. With that, I think we would love to take questions. Here's the URL for our website; all the information is linked from there, including recordings and presentations, our roadmap, and our Twitter as well.

Well, thank you Daniella, thank you Trisha. I have a question that might, I guess, seed a further discussion. I was going to ask about DOIs being a prerequisite, and then you mentioned that that's something you're looking at as a next step, Daniella. So for those repositories that are not yet routinely assigning DataCite DOIs to their datasets, what can they be doing now, and what's the timeline for their involvement? Trisha, do you have any DataCite-specific response to that?

Well, from DataCite's perspective we're really focusing on DOIs, simply because that's our business at this point, and we're also really focusing on the data citation piece, which is largely a matter of DOI compliance. That's one of the reasons we're focusing there: we know people really want to know which journal articles are citing a particular dataset. But as we go forward, I think towards the end of this project we're going to step back and ask what other identifier schemes are out there that we need to work with, where there is some low-hanging fruit we can act on, and then maybe develop a roadmap based on that. I just wanted to add one other thing. As a person who used to run a repository, where we were always struggling for funding, this is a really good way, if your data are being used, to say: look how valuable this repository is, here are the benefits of investing in it. If you can show those usage metrics and the journal citations associated with the data in your repository, that can be a really powerful message to people.

Absolutely, thank you for that. We do have a question that's come in from someone interested in how to distinguish legitimate API calls from spiders and malicious stat spikes. Can you help us with that one, Daniella?
Sure, that is a great question, and that is a lot of what we spent time working on in the Code of Practice. You can see in the Code of Practice, or in our GitHub, that we have a list of known spiders and crawlers that we're excluding. A lot of them are similar to the ones used for the COUNTER Code of Practice Release 5, but we have added other ones that we found as well, and we're continuing to add them. There's a lot that's known, and then a lot of it comes down to the repository level, when you start finding things and adding them to the exclusion list. For instance, we know that Googlebot is something we would want to exclude, but we did put in specific parameters there: if it's a Python client, for example, that would be okay. It's all very well laid out in the Code of Practice, but I'm happy to talk about that another time too.

Excellent, thanks for that Daniella. We don't have any other questions at the moment, but as I mentioned earlier, we at the ARDC are very keen to connect with people in the repository community here in Australia who may have an interest in pursuing this a bit further. We can certainly facilitate a connection with Daniella and her team, and we would be really interested to, I guess, track how that works and see some pilots get up and running. So if you are interested, please contact us at the ARDC and we can facilitate the next steps. And now we do have another question, while we've got time for it: Janet is looking at the schema and wondering how Scholix hooks in. Is that something you can help with, Daniella?
Sure. Scholix, of course, is about showing how publishers and repositories can submit their data citations properly over to Crossref Event Data, and also through the Scholix hub for OpenAIRE. What we're utilizing is just ensuring that as many data citations as possible are properly indexed and sent over to Crossref, so that we can pull them back. We're not defining any new schema and we're not building anything new for data citations; we're just leveraging what's already pushing data citations into the Event Data hub, and then we can pull those back, show them at the repository level, and aggregate them in the DataCite hub.

So, Janet, hopefully that answered your question, and Janet has indicated that it has, so thank you, Daniella. Thank you again to Patricia, thank you again to Daniella, and thank you all for attending. Please let us know if you would like to pursue this further; we'd love to hear from you. In the meantime, enjoy the rest of your day and we'll say farewell for now. Thank you. Goodbye from California. Bye bye.
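As a supplement to Daniella's earlier answer about distinguishing legitimate API calls from spiders and stat spikes, the two filters she described, a known-spider exclusion list that still allows programmatic clients, plus session-level de-duplication (the "100 downloads in 30 seconds" case), might look roughly like this in Python. This is a hypothetical sketch: the agent names and the 30-second window are illustrative placeholders, and the real rules live in the Code of Practice and the shared exclusion lists on GitHub.

```python
# Illustrative, not the official COUNTER lists.
KNOWN_CRAWLERS = ("googlebot", "bingbot", "ahrefsbot")   # always excluded
ALLOWED_CLIENTS = ("python-requests", "curl", "wget")    # robotic but still countable

def is_countable(user_agent):
    """True if a hit from this user agent should be counted at all.

    Crawlers are excluded; programmatic clients such as those in
    ALLOWED_CLIENTS pass through, since researchers commonly fetch
    data by script or API.
    """
    ua = user_agent.lower()
    return not any(bot in ua for bot in KNOWN_CRAWLERS)

def dedupe(events, window=30):
    """Collapse repeat events: keep one event per (session, doi) pair
    unless more than `window` seconds have passed since the previous one.

    `events` is a list of (timestamp, session_id, doi) tuples sorted by
    time, so 100 clicks on one dataset within 30 seconds count once.
    """
    last_seen = {}
    kept = []
    for ts, session, doi in events:
        key = (session, doi)
        if key not in last_seen or ts - last_seen[key] > window:
            kept.append((ts, session, doi))
        last_seen[key] = ts
    return kept
```

A real processor would apply `is_countable` first, then `dedupe`, before writing counts into the SUSHI report.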