Hello everyone, and welcome to the webinar Enabling FAIR Data in the Earth and Space Sciences. This webinar is presented by the Australian Research Data Commons. My name is Natasha Simons and I work for the ARDC, which is a transformational investment in research data tools and infrastructure from the Australian Government. I'd like to introduce our three presenters for today: Shelley Stall, Lesley Wyborn and Jens Klump. Shelley, Lesley and Jens, would you like to introduce yourselves and say a few words about yourselves?

Hello, I'm Shelley Stall. Lesley is with me here at the American Geophysical Union building today, lucky me. I am Director for Data Leadership, and I'm very delighted to be here.

I'm Lesley Wyborn. I work with Shelley on this FAIR data project. I also work for a few groups: ARDC, NCI, AuScope, et cetera. My interest is in what FAIR data is going to enable for us all next year.

Hi, I'm Jens Klump. I'm with CSIRO Mineral Resources. I'm the geographic antipode to Shelley and Lesley today, sitting in Perth. I've also worked with both of them on FAIR data and precursor projects, and I'm going to talk a little bit about that today.

Excellent. Thank you all. I'm really excited to have you here today. I think this is a wonderful project. It's a pathfinder project and an exemplar of how to do FAIR data in a particular discipline, and I know there's a lot of ongoing work that this project has created, and a lot of interest and excitement around making data FAIR. So without further ado, I will hand over to Shelley and Lesley.

Great. Okay.
So let me just get moving quickly. Here at the AGU we have a data position statement that guides the work that we do, all the way from the board of directors to any decision that's made, and it's really entrenched in this concept that our earth and space science data are a world heritage. For all of the unique observations that are made, it's just not possible to reproduce them: the tsunami, the earthquake, all of the climate change observations that we're doing. It's just incredibly important to our science, to our researchers and to the work that we have at AGU. So it's a guiding document. I want to say that up front, because that's why we were so energized and spent so much time focusing on the data challenges within the community.

One challenge that came out: if you don't know the Belmont Forum, they are an international environmental funder with 29 different national funding agencies, and I really like to use this survey. Researchers came back providing information about their digital skill challenges, and this particular question really demonstrates the challenge that we have, which was sharing data and sharing software and models. Self-disclosed, these researchers really felt that that was an important thing that they needed better skills at, so here we have 87% of the respondents of this survey.

Where that went was that the work here at AGU ramped up on challenges with data in 2015, and shortly thereafter there was the publication in Nature on the FAIR Guiding Principles, which I highly recommend everyone take a look at, where FAIR stands for findable, accessible, interoperable and reusable, in the sense that it applies to data or any digital object that's out there. So we asked permission to use the acronym, and if you ever see me at a meeting, please ask for one of these stickers; I carry them with me. And we launched a variety of activities, and the largest one we launched was a project called
Enabling FAIR Data. One really important thing to highlight is the difference between open and FAIR. In the project we talk about both open and FAIR, but not all data can be open, and it's certainly true that FAIR data isn't automatically open. FAIR, meaning findable, accessible and well-documented data, does not mean, for instance, that protected information is expected to be open. That's not what it means and it's not what it says. It's possible for data that's FAIR to actually be closed, and if that's appropriate for that data then that makes sense. But we do want the data to be as open as possible, as closed as necessary, and it's important to identify this continuum. There's a recent publication that I've noted down here at the bottom that talks about this in depth, coming from Sarah Jones, who was part of our project for Enabling FAIR Data. So I recommend that to learn more.

One of the things that we're looking at is the fact that much of our community has met the needs of their researchers for a long, long time: repository communities, infrastructure that supports the community. When we're talking about sharing data, we're really talking about machine-readable international discovery, and the evolution of moving from specific communities that may be geographically co-located all the way through to international teams that are working multi-country, and the ability for cross-domain, transdisciplinary efforts to have more. And I don't know if you're seeing my Washington Post articles, but apparently it's busy at the Washington Post this afternoon, so sorry about that. But this means international discovery, persistent identifiers and good documentation. So this is the evolution from things that we're doing for a community all the way through to international work.

And our ecosystem within research is incredibly complex.
If you want anything to happen that affects everyone, all of these entities have to be engaged, and it's very difficult to do that. Even for our big project, Enabling FAIR Data, we really focused on publishing and repositories, especially the publication time, which is at the end of the life cycle of research. So there's so much more work to do.

So let me tell you more about the Enabling FAIR Data project. First of all, you're clearly on a digital apparatus, so please go to https://copdess.org and navigate to the Enabling FAIR Data project. I really want you to have this as something that you have access to all of the time. Specifically, the commitment statement and all of the project information are here, along with the author guidelines and the additional resources. The commitment statement is targeted at a variety of stakeholder communities, and the two I'm going to highlight here, the ones that we focused most on to help us make a change in culture, are data repositories and our scientific publishers.
So the bottom line is that the data really needs to be in repositories, not in the supplementary information of a paper, where it's very difficult to find, not indexed, and usually doesn't have good documentation. So we're asking repositories to help researchers deposit their data, assign it a persistent identifier that's globally resolvable, support landing pages with robust metadata about that data, and then help them make sure that it's clear what the citation is, along with any other curation guidance. Not all repositories have the same level of services, but we're asking folks to work towards that as a main goal.

Then for the scientific publishers, there's quite a heavy lift here. We're asking them to require that the data that supports the paper be identified in a data citation as well as within a data availability statement, and that they correctly code that reference as a data citation, which is something that is possible through the schemas that are available within the infrastructure, but hadn't been fully implemented across the journals and the publishers, and at this point in time is actually being implemented. So that's fantastic.

Here are the signatories for Enabling FAIR Data from the publisher community specifically. You'll find that Science and Science Advances, Nature and Scientific Data are all signatories, moving and transitioning into these guidelines. Elsevier has been an incredible support, and I'm very grateful for that. PLOS has been with us, and a real guide with all of their existing policies in supporting the work that we do. And of course there are AGU, and Copernicus with EGU: EGU's publisher is Copernicus, and Wiley supports AGU. Taylor and Francis has also been incredibly helpful. So you're going to see that just about all of the earth and space science journals are signatories or are on their way to being signatories. We are also looking at the Proceedings of the National Academy.
We're expecting them to sign soon, fingers crossed.

Within the commitment statement, like I mentioned, there are several stakeholder communities identified. The most important is our researchers. These are the words from the commitment statement that we're asking researchers to embrace when it comes to open and FAIR data. The reason that top part is in a different color is because that requires them to select the right repository, a repository that can help them. So it really matters that we make these connections between researchers and the repositories that can support them in their research. On top of that, we're asking for data citations, availability statements, and really treating your data management plan as a living document.

Then there are the author guidelines: all of the publishers who are signatories have agreed to implement these guidelines. This is a very quick, high-level version of them: choosing a repository that supports the FAIR Guiding Principles (we say FAIR-aligned because the project did not hit all of the letters of FAIR equally; it mostly focused on F and A and a little bit of R, so it's not full-blown compliance, but it does get you pretty far); citations and links to the data with persistent identifiers, such as a digital object identifier; data availability statements; and then being as unrestricted as possible. This is where open comes in: be as open as possible, as closed as necessary. This is what we're asking for, but of course there is some data that must be protected at various levels.

We also have a frequently asked questions page, and anyone who's a signatory for the project is being asked, and this is a request that will go out real soon, to support updates to the text. Even within the last six months.
We know we've got to update our FAQs, because we've learned, gotten feedback and refined how we say things in our guidance. So that will be coming up soon. And for your edification, there is a Bitly link out there, so go to "fair one page" (upper and lower case matters) and pull down the document with this one-page guidance, which gives you access to all the information within this talk and to how we're talking to editors, reviewers and authors to get ready for the new author guidelines that are being implemented across the entire journal community.

And then lastly, the earth and space sciences are just a beginning. Many of the other scientific domains are working on data sharing at various levels. The steering committee for the project, which includes Lesley Wyborn, has a Nature commentary that was published early in June of this year, reaching out to the broad scientific community to encourage data sharing and to encourage embracing the FAIR guidelines. It talks about credit and incentives for researchers for doing this; getting it into our rhetoric to ask researchers to actually share their data and make that part of the process; and then, more difficult (not that the first two were easy, but even more difficult), the international infrastructure and the costs and funding necessary to make sure that we have a persistent and sustainable environment for the data and digital objects associated with our research. It's going to take all of us to figure this out, and we hope you'll join us.
So please become a signatory. I know these slides will be shared with you, but I've included things that you can read about. You can be a part of the email communication for Enabling FAIR Data by sending me a note to be added to the list. You can be a signatory. You can learn more about what the publishers are up to and what their challenges are. And then there are some of the references for the materials that we have within the publication. So thank you, and Jens, I'll hand over to you.

Well, thank you, Shelley. I want to step back a little bit from what Shelley just said and look at where we came from. Why was FAIR necessary? In 2003 a whole group of funders, research organizations and others got together and drafted and published the Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities. This was when the internet as we know it was still fairly young. They recognized that the internet had fundamentally changed the practical and economic realities of distributing scientific knowledge and cultural heritage. For the first time ever, the internet offered the chance to constitute a global and interactive representation of human knowledge, including cultural heritage, and the guarantee of worldwide access. And it didn't escape them that this might have some ramifications for the publishing industry and for how we deal with communicating our knowledge and with quality assurance.

For me the key point was the definition of what an open access contribution is. The definition included scientific research results, what they called raw data, metadata, source materials, digital representations of pictorial and graphical materials, and scholarly multimedia material. In a nutshell, this constituted what we later called open science, and the real aim behind this was scientific integrity. This is really nicely summarized in a report published by the Royal Society in 2012, Science as an Open
Enterprise. Linking back to what Shelley just said about openness, the Royal Society came up with a really nice way of putting this, saying it's about intelligent openness. It's not about being open at any price; you have to be intelligent about it. But it's about scientific integrity, and that's why you need to think about how open you can be.

So the value of open access to publicly funded research data and open source software is well accepted now. I think the most prominent call for open access came with the Berlin Declaration in 2003. It has been signed by 644 signatories. They are all research organizations, funders and universities, not individuals. That's why I think it is really significant, because it has this very strong institutional backing. The Berlin Declaration was then followed by plenty of other policies calling for open access to publicly funded research data. Of those, the most significant was the OECD Principles and Guidelines for Access to Research Data from Public Funding, published in 2006. What makes them significant is that when principles and guidelines are published by the OECD, it is normally expected that they are then transferred into national legislation by the OECD member states. So this was a very strong mandate from the governments to the funding bodies and other organizations involved in research that they have to work towards making data from publicly funded research openly accessible.

Many more papers followed, and I think it all looked pretty good for open access. And then this happened: nothing happened. This was summarized very well in an article by Nelson in Nature in 2009, where Nelson complained about empty archives.
This wasn't working. Amir Aryani and co-authors looked at this over time in 2018, and we see that there's a lag of data publications behind the total number of publications. Only one tenth of publications are matched by any kind of data. This is not a one-to-one match, and not every paper comes with data, but I think a 90% gap is really significant. And it hasn't gotten better; it has actually become worse. Even though the number of data publications is rising exponentially, it's lagging behind the overall rise in publications. So something is not working.

What happened? I think we have to look at two factors. One is that sharing research data and code is still too difficult in practice, and authors don't know what to do. They're willing to help and they see the point, but when it comes to taking this into action, it's too difficult. The second factor is that I think there's still little prestige in making data and code available. It's still the journal publication that gives you tenure. So there's an elephant in the room: it's the journal publication that carries the academic prestige, because it showcases the brilliant idea that adds to the researcher's reputation. Unfortunately, I don't have time today to go into more of that argument, but it's something to consider.

What is significant about FAIR, then, is that it takes the openness of Science as an Open Enterprise into action and gives us guidelines and principles for how we can act on making research data open in an intelligent way, by formulating these principles of findable, accessible, interoperable and reusable. That is a fantastic start, and it then needs to be further detailed into discipline-specific guidance. This is where COPDESS comes in, which Shelley just talked about. It started with the Coalition on Publishing Data in the Earth and Space Sciences statement of commitment in 2014, and then the project that Shelley just talked about, which provides
recommendations and guidelines on how this can be implemented fully in the research data ecosystem.

When we put all these pieces together, there's an open aspect and there's a FAIR aspect, and it has a technical and a social component. What has developed a lot in the last years is the technical component of making publications, data and software available on the internet, identifying them with persistent identifiers and describing them with rich metadata. This makes them machine-readable and discoverable, and that lays the foundation for sharing with other institutions, with industry and worldwide. Then there's the open access aspect, which is separate from FAIR, and which allows us to share those ideas and reuse them.

From my perspective, a lot of technical development has happened to enable FAIR that now lets the machines do the heavy lifting. Persistent identifiers provide the anchors to people, publications, data, code, samples, instruments and other things we want to know about and need to identify. The metadata becomes machine-readable and can be harvested by standard web technologies. And then there's also the linked data aspect that's finally coming of age, if it's done at scale and not done manually. So are we there yet?
We're getting there. But the whole transition into this new paradigm asks us to change our practices around data and software, and we have to make them integral parts of the research ecosystem. That means a technical integration, which I just touched upon, and a social integration, which unfortunately I don't have the time to talk about today. That's all from me for now, and I'll hand over to Lesley.

Okay, so in my section what I wanted to focus on is what is available in Australia to help researchers comply with the new FAIR publication requirements, particularly those in the earth and environmental sciences, where I know these requirements are already starting to bite. I do get several calls a week from people saying, "Oh, my publication cannot be accepted until I do X, Y and Z," and I guess they feel I'm responsible for being on this project, so I'm the one they should be venting their anger at.

So, following on from what Shelley and Jens have said: where does this leave Australian researchers? Do they really understand how to consistently share, document and reference the data they use and collect? Community domain-focused repositories offer the best situation, but the important point is that although they're very prominent in the US and in parts of Europe, we do not have many domain-specific repositories in Australia. So how does the concerned Australian researcher make the best decision? Because this is about long-term transparency and availability of the inputs of their science for the duration of their career. So what's the best way forward?

I'd just like to point out that these were the groups that were involved in the original project. Shelley contacted me and said, "Oh, would NCI come in on this?" and I said, "Yeah, and I can get a couple of others, I'm sure." So AuScope and the Australian Research Data Commons also supported this, and I think we're probably the biggest country identified in this project.
Yeah, we were. Australia was quite prominent in this project. And I guess why I wanted Australia to be in there was because I knew there could be problems if we went in certain directions. Again, Shelley has shown you where the statement is; I just thought I'd show you another view of it. On the left you've got all the groups. Shelley presented the publishers; these are societies and communities. And on the right we've got the Australian signatories to it. Federation University has just joined. If you look at the individuals, there are two Australians who have signed. It'd be good if we could get some more Australians in these groups, but anyway, this is it.

One of the things we did to help with training was that we partnered with ESIP. The Earth Science Information Partners in the US have this Data Management Training Clearinghouse, so they joined in, and we had a crowdsourcing of FAIR training materials, which a team then vetted. So here's the data management training tool. The long and short of it all is that when I showed it to the people at ARDC, like Natasha and a couple of other people, they ended up joining in, and Natasha is actually now on the advisory board of this group. So again, we're leveraging globally what's available. If you go to this address, you can see there's a wealth of online training materials that have been vetted to be on this list. It's not as if you can just put anything on it. So that's one way we're helping Australians: this is a resource that's available.

ARDC also then set up this single page on the new website on citation and identifiers. There's the link, and you can see how it links through to other pages on the website, if you want to cite data, cite software or get an IGSN sample number. Again, this is helping you meet the requirements of the publishers as they come through. So here is another resource for you.

Another issue is that, as I said, we don't
have domain repositories. But for quite a few years now ARDC has been working really hard to establish contacts with every institution in Australia, helping a lot of them set up a data commons and, more importantly, an ability to record metadata. As some of you will know, the metadata is stored at the commons or clearinghouses at the institution and is then harvested into Research Data Australia. So what we've set up here on this page is the list of contacts, and we keep this up to date as it changes. For every state in Australia, that's the person you can contact, and again, it's all on the web page. If you're from an institution in Australia, they will be able to tell you who to get in contact with at that institution who can help you store data at the local institutional repository. So again, this is another service that's on the ARDC website.

As I said, we've got trouble: as Shelley said, there is the issue of the trustworthy repositories. I've known about this for some time, and it's clear when you look at this map.
You can see what has become known as the North-South divide, and it's embarrassing how many trusted repositories there are in the northern hemisphere compared to what we have in the southern hemisphere. There's only five. So as part of this project we set up a clearinghouse for earth and environmental science repositories to try and help those repositories get this trustworthy certification. But before long ARDC came along, and as you can see on this page, on the left-hand side, this was the call for expressions of interest if you wanted to be part of it. The team has already got a community of practice going, so the link there is to the document on what the community of practice is doing, and Richard Ferris is the contact there at ARDC. So there's quite a cohort now starting to develop, and hopefully, with the combination of the one we set off with the AGU FAIR project and this one in Australia, we'll start to do something about that embarrassing North-South divide.

I think that's my final slide, which was more just following through on what Jens and Shelley said, trying to help you, the researcher, know how to meet these new requirements and what resources are available. As I said, there's plenty of help from the ARDC on how to meet them. Okay, so I think, Susanna, you can take it back from us now.

I think it's over to me too. Jens, would you like to turn on your camera too? Thank you very much. That was a really excellent presentation. There's so much there to unpack. I'll start with the first question in the pod. It's from Andres Rubichek, and I hope I pronounce his name properly. He says, "Jens's talk reminded me of a recent, nice, related episode of Big Ideas," and he gives the link.
Big Ideas is an ABC program here in Australia, and that particular episode is with Brian Nosek, who's the director of the Center for Open Science, and it's called Sharing Science for the Good of All. So I think he touched on some similar points to the ones you did, Jens. That was just a comment.

A question for Lesley: when in the process will researchers find out that their data needs to be made FAIR?

It's now part of the publishing requirements of the list of journals that Shelley showed. There's been a bit of controversy at the moment because of how the journals have tended to do it. But I would argue that if you are submitting a paper, it's there. So if you really want my opinion, the day you go out to start collecting your data, start fixing it up, because this is where you're going to end up.

Yeah, and if you look on AGU's web pages, you'll find we've got new guidance that helps navigate exactly what's being asked for. If you're submitting to an AGU journal, you absolutely must have your data in a repository. That will be your requirement. Also, if your research is significantly based on software, we will be asking for a software citation for what supports your research. Please take a look at the web pages. If you have any questions, you can come back to us and ask; you can send me a note. We are there to help you navigate the many kinds of domains, the many kinds of data, all of the different challenges. Sometimes it's not clear, and you are welcome to ask those questions. The editors of the journals have been trying very hard to support the authors and are helping to navigate the process.
It's been a really great support system.

Just as an aside again, if you're in Australia, in the earth sciences, Taylor and Francis are signed up to this, so they're starting to apply it as well. And that's why I think it's important, if you have any problems, that I've given you the state contacts in ARDC, who can then put you in touch with your institutional contacts. Honestly, before you're even starting to write the paper, just get this sorted out. I'm not tired of it, but I do get requests from someone who says, "I've revised my paper and they won't accept it unless I take the data out of a supplement and put it somewhere, and I'm going away this afternoon, so how can I fix this up? Give me one 'how to' to try and do it now." It doesn't work. This takes time. You've also got to get the right, appropriate metadata on this. If you haven't done it before, it's not something you can do in half an hour.

So do you think there's a role for data management plans in this process? I gather that the ARC have now changed their requirements, so that they are actually asking researchers to write a data management plan. It doesn't necessarily have to be submitted, but they have to have one. Do you think that is a step in the right direction, and how do you see that fitting in with FAIR?

I think it's very important.
I haven't seen the ARC one, but I know some of the work that's going on in the US with NSF, and in some of those you have to specify what repository your data is going to go in, and the repositories are notified. So you've got that connection, to see, "Oh my god, this person's collecting 10 petabytes of data and our repository can only take one terabyte." That kind of balance question is being asked. I think they're getting deadly serious about asking people what volume of data they're going to collect, because not every repository can take everybody's data. And just another aside, again from NCI's point of view: "Oh, I see you've got 20 petabytes of data. Will you take my 10 megabyte Excel file in free-text format, all jumbled up?" No, we don't take that. So know what your data is, and plan at the time you're collecting it. That's really it. And I agree with you: if you've got data management plans at day one, you can build all this into it.

Yes, fantastic. Thank you. Jens, you mentioned that the challenge for FAIR is both technical and social. Which do you think is harder, and why?

Well, you know that I always say, "May all of your problems be technical." You can fund somebody to build a technical solution, but you can't fund somebody to change their mind or their behavior. This is why I said there is an elephant in the room, because we don't like talking about that split: we have the technical solutions, and they're coming into place now, and at the same time it doesn't change our behaviors as much as some hoped it would. That's why I also advocate looking at how research works from a social perspective.
If we want to change how things are done, we first need to understand how it's done and where we can intervene. We've heard that the publishing workflow is certainly a very persuasive argument to change behaviors, but it comes at the end of the data life cycle, when things become permanent, archived and published. As Lesley said, it actually starts when you start thinking about your project: What am I going to do? What is this going to produce, and how am I going to deal with this? Many people who deal with data and materials on the so-called long tail of research initially aren't challenged by this, because you can always keep things on a thumb drive in a drawer. It's only with the big data volume, big-whatever-volume projects that people have had to think about how to deal with the logistics of this. So there are other points where we can start thinking about intervening in this process of doing research, and trying to understand how that then influences the introduction of better data practices.

So what role do you think skills development plays in this? You did mention that one of the challenges is that authors actually don't know how to do this. Who really should be providing these skills for researchers, and at what point, do you think?

That is a very good question, because yes, skills are essential. If I don't know what to do and how to do it, how can I act? In the researcher's life cycle that should come early. Students should learn about data management practices, but all of these developments are fairly recent.
We all need to upskill, and there are various programs. The ARDC offers programs together with the universities, and CSIRO has the Digital Academy, which is aiming to upskill the entire workforce on data and digital skills. So this has to be an effort that touches more or less everyone in research. The world is changing, and we have to do something; we have to learn.
Right. Yes. Very good. We might finish with Shelly. Shelly, you've put an enormous amount of energy, skill, and enthusiasm into this project. What's next for you, and what's next for this project?
Oh, thank you. Adoption continues, and we're in transition. Something that happened during the project is that we left a lot of problems still to be resolved, and it's not necessarily the publishers that are the ones to take some of these forward, so we have to work with some of the other stakeholders within the community. One of the areas we headed into just recently is to figure out credit; the Belmont Forum has funded us to do this, and Leslie is actually on this project with me. When you are talking about the social challenge of the researcher wanting to actually do a good job with their data management, there are many incentives to share with them on why to do this: they benefit when other researchers do this for them. So here you have brilliant researchers trying to develop the next most exciting project, to discover or learn more about a particular cycle within our environment. If only they could know what data exists for a particular geological area or a particular phenomenon.
And they know that they have to. Right now, it's the people you know, the lab you trust, the organization you've partnered with before. But you don't really get access to the data that might be stored in a repository in a different country, data that you may never have had access to before. What happens if, all of a sudden, every repository out there makes their data easy to locate, so that you can use a simple search to identify a location and then go to that repository and use their local tools to investigate what's there? You discover that there's more information about the area you're interested in than you ever realized, and it changes how you actually look at your research, such that you can build on what's already out there in ways that you never could before, because you just weren't aware. This is what we're trying to get at: we're trying to provide researchers with more information about what's available, so that they can use their incredible imaginations to look forward and make a leap that they can't make today because they don't know what's available. So this is the goal.
Thank you so much. Thank you. Thank you, all three of you. I think it's been a really brilliant webinar, and it's terrific work in this area. I really hope that people have gotten a lot out of this content, and that you will continue strong to do all those amazing things that you have in your head there, Shelly, and that we all want for open science. Fantastic. So, thank you everybody for attending. A last reminder that we do have some events coming up. The Australian Research Data Commons has two summits, one on storage and compute and one on data and services programs. They are happening in a couple of weeks' time, on one of the first days of the eResearch Australasia conference. If you want to find out about them and some of our other upcoming events, just go to ardc.edu.au. You can also subscribe to our newsletter to find out.
So thanks once again, everybody, and have a good day or evening.