Okay, welcome everyone to this webinar, which has been jointly organised by AGU, NCI, AuScope, TERN and ARDC. The topic of our webinar is meeting publisher requirements for data, samples, software and notebooks. I'd like to thank ARDC for hosting us and providing the backbone, particularly because we have some overseas people registered for this. The Australian Research Data Commons' purpose is to provide Australian researchers with competitive advantage through data, and their mission is to accelerate research and innovation by driving excellence in the creation, analysis and retention of high-quality datasets. I'd like to do an acknowledgement of country: those of us presenting acknowledge and celebrate the First Australians on whose traditional lands we meet, and pay our respects to their elders past, present and emerging. For me, that is the Ngunnawal people of the country around Canberra. So, just to give you an idea of the structure: we'll have a short introduction, then I'll hand over to Shelley and Chris from AGU, who have a lot of experience in FAIR publication requirements, and not just in the Earth sciences. We're cognizant that a lot of people from health, the bio areas and all sorts of other areas are registered for this webinar, and that we have researchers here as well as those who offer research support, so we want to speak to both: how you, as a researcher or in research support, can meet those requirements, and how, if you do it carefully, you can also increase your digital presence. For the half hour at the end, Natasha Simons at ARDC has agreed to step in and facilitate a forum where you can tell us what you feel needs to be done to help you meet these requirements. What are your pain points, once Shelley and Chris have helped you understand what the requirements are? What are the things that are currently stopping you, and what can we do to help?
So please keep your questions about the publishing process for Shelley and Chris; but if you're having problems in Australia, you've got a half hour set aside for that, and then we'll do a quick wrap-up. I'm doing the introduction, and we thought we'd introduce the speakers one after the other at the start so the presentations can flow smoothly. So: I'm Lesley Wyborn. I'm with NCI and the Australian Research Data Commons, and I'm also chair of the Australian Academy of Science National Committee for Data in Science. As a traditionally trained scientist, my interest is that there's your data and there's your conclusions, and you have to make your data available; and in this day and age, as data gets bigger and bigger, that gets harder and harder. So now I'll hand over to Shelley. Could you do a quick introduction, please? Yes, thank you, Lesley, and thank you for the opportunity to be here. Chris and I are incredibly delighted. I am the Senior Director for Data Leadership at the American Geophysical Union, and very excited to speak with you today. Hi, I'm Chris Erdmann. I'm the Assistant Director for Data Leadership. Natasha? Thanks, Lesley. I'm Natasha Simons. I'm the Associate Director for Data and Services at ARDC, and I'm coming to you from Turrbal and Yagara country in Brisbane. Thanks. OK, so let's ask the obvious question: why are we having this webinar? Recently there was a project funded by ARDC and NCI called, in short, the 2030 Geophysics Collections project. It aims to build a high-resolution geophysics reference collection for computation, and wants to make high-resolution datasets available for programmatic access in 2030, including for AI and ML, potentially at exascale. But as I said, this is not a geophysics seminar, because we need to know what's going on in the physical sciences and the social sciences with their data, so we can meet the transdisciplinary research challenges of the next decade, e.g.
the UN Sustainable Development Goals. We also want to work through this issue of: I'm making my data FAIR today, but will it still be FAIR in 2030 when we want to use it as this system ramps up? There's more on the website page shown here. So here's the 'but': we also know that researchers on this project need to survive in the here and now, and they have to publish. So what are today's requirements for publishing data, samples, software, etc.? Are we really making these FAIR in our current publications? Will the data in our publications of today be interoperable and reusable in 2030? Can we integrate with other disciplines? And above all, as I said, the last part of this webinar is about how we can make it easier for you, the researcher, and all those supporting researchers, to meet the challenges of FAIR publication today. I'll just highlight that Shelley is an official adviser on the 2030 project, so those of you on that project will still be able to tap into Shelley's expertise over the coming couple of years. So now I'd like to hand over to Shelley and Chris. Can you take it away? I'll stop sharing my screen. Thank you so much, Lesley. Let me pull my slides up. Chris and I have quite a lot to share with you today, and we will be going back and forth in presenting, so hopefully that will give you some variety. The American Geophysical Union has had a position statement on data since the mid-1990s, with the most recent version updated in 2019, and very soon we'll go through another revision. It guides AGU's decision-makers and leadership in everything they do, including our brand-new strategic plan, which is really focused on inclusiveness and collaboration, as well as open science and open data. Our Earth and space science data are a world heritage.
This data needs to be cared for and stewarded into the future in order to solve the very complex problems we're dealing with today. What we do supports those goals, and we're very pleased that our leaders and our membership put together these position statements for us to follow and to use in supporting the community. This graphic is really interesting, and it's also kind of fun; when you get access to these slides, I have the link to it in the notes. In 2019, Nature celebrated its 150th anniversary, and what they did was take all of their publications and map all of the references, in order to show how science builds on previous science. One of the things I want to highlight here: look at the yellow. Yellow is the geosciences, and what you notice is that yellow intertwines with just about every other colour, meaning every other discipline. What I want you to take from this is not only how connected we are, but how much we depend on the other domains and disciplines to further our work. It's a fun, interactive graphic that you can explore, and it has a really neat narration with some fun stories to go with the content. Also in the work they did in 2019, they mapped the papers by how many authors there were and where the authors were from. What I want you to note here is that we have nearly no more single-author papers; compared to the body of science and research being published, that share is very small and continuing to get smaller. It really speaks to how complex our world is, and how we are also international and growing. The multinational teams, your international teams, are the orange, growing and growing; and the single-country teams, the domestic teams, are the blue, shrinking and shrinking.
So what you can take from this is, first of all, that the research you'll do, especially if you're an early-career researcher or student, is not going to be by yourself: you're going to be in a team, and likely an international team. That means you need to know your community, not only who's sitting across the aisle from you in your lab or your offices, but folks worldwide, across different domains. You're also going to need to be able to discover research worldwide, and you'll need tools that allow you to discover it in a way that's meaningful to you. You're going to need really good documentation to understand that research, the data and the software. You're going to need data that's easily interoperable, such that it doesn't really matter which team created it or where in the world you are: it will actually work together, and you don't have to spend an enormous amount of time fighting with your data to get it to work with other datasets. And you need all of this to be accessible. I single out software as needing to be accessible here, but truly all your research, all of your data, also needs to be accessible, and in tools that are easy and current, things that are valuable to you today. I give one example here; I could have given many, but Jupyter notebooks are very popular, as are other tools, so just substitute your favourite tool that's popular and relevant for your community. And don't forget, we have to make sure the licensing is clear. If we don't have licensing, we don't know how to reuse the content before us. We need licensing that tells us whether, and under what conditions, we can use and reuse what's been published previously. So we need your help. We represent a society, but also a publisher: there are 23 journals.
And thinking about these things at the time of publication is way too late. We really need to make sure that at every step of the research life cycle, all the way from funding, your institution, the teams you work on, the communities you talk to, AGU and other societies you work with, no matter what the society, we are all thinking about these things in a way that makes sense and is collaborative. So in our talk, we will walk you through what publications are looking for. It'll have an AGU bent to it, but it will definitely be something that is common across most publishers, or a direction they're moving in, so it may be a preview for some publishers. What to do with models and simulations is a little bit harder, because those are usually more complex and larger. We'll talk about the resources we have available openly on the AGU data site and the blog, which we try to produce in community to make them easy to access. We'll talk about tools you can use to help optimize discovery, and about citations, which will get into persistent identifiers, linking, and some selection of repositories. Then, machine learning and artificial intelligence are a really big deal right now, and the way you prepare your data and the way you think about it is really relevant, so we'll talk about data readiness for AI. And then, what's happening around the upcoming Data FAIR and the tools that you'll have access to in the future. So, Chris, I think you were going to take over. Yeah, so next slide; speaking of working together. One of the things that we did recently at AGU, and I think a variety of places are in the same situation, where this is an evolving space and you might have content in different places throughout your organization: we recently worked with our community, our staff, and just a wide range of people to really consolidate our guidance.
So we tried to create a streamlined resource, a single location we could point to, to help guide our authors through particular aspects of sharing their data and software. Typical questions include: what data needs to be available? Repository selection is one we often handle as well, along with availability statements, data and software citation, and sample numbers, which we'll talk about later. And then there's the help desk, which allows us to field more questions and refine our guidance. So this is a resource that our authors are increasingly using to help format these different sections of their paper. Next slide. Part of this is that we really want to guide our researchers in doing more when describing the data and software behind their papers. A common scenario, one that you've seen and maybe still see but that we're working to improve, is dropping a reader onto just a home page URL, or not even providing a link of any sort, and just saying 'we used data from this organization' or 'we used data from this website'. What we would like to do is guide readers as precisely as possible, and often that means a direct link; a DOI, a persistent identifier, is often the best solution. But sometimes you can't do this, because repositories and other services out there provide different functionality. That can mean, for example, a database query where you have to select certain features, or it can even involve translation; we increasingly have a number of Chinese repositories being used. So you need to guide readers through your availability statement to the data, saving them time, saving everyone time. Next slide. So what are we talking about here? There are really two aspects. First, the availability statement.
I know it's commonly referred to as the data availability statement, but we feel there's much more in there: software, and you'll hear more about notebooks as well, plus other objects that we can include and cite. So the first aspect is providing as much information, including citation information, in your availability statement; the second is actually citing these objects in your references section. This is a place where I think we'll be working more and more with our authors, because there are two separate things going on here: culture, whether authors feel it is fine to cite their data or software, and whether it is technically possible. Next slide. So which data and what software? Data used from others, so they can get credit; your own data, so you can get credit; the data that supports your research findings, so usually the processed or aggregated data, the analyzed data; and then, of course, software, which is something we're stressing even more. Next slide: what is included in the availability statement? You can see the different things we're asking for when people mention data or software: the repository name; the context, a description of what someone will find when they go to that link and what they should expect; licensing information, which is not so common, but we're asking for that as well; and then an in-text citation, which encourages the full citation in your references. For software we also ask for versioning and a link to your platform: besides the preserved version of your software, which you can give a DOI, also the development environment where you can go and see where the software is being actively developed. Next slide. We have an example here; this one's from one of our journals, Water Resources Research. So, next slide.
You can see a sample availability statement. This is one of our older availability statements; as I mentioned, these statements, and the Open Research section where they live in AGU journals, are evolving and getting better and better as we work with our authors. This one is actually pretty good. You can see that they describe the data and where you can find it in a repository, in this case Dryad, and they also have an in-text citation; you can see they're using a DOI, a direct link that's citable. So you have a DOI and you have a citation. Next slide. And you can see the citation in the references section, so in this case the data can be properly indexed and you can get credit for it. Next slide. Then on the Dryad page, you can see all the descriptive information around that data: the author information, the title, the citation, the abstract. You can also see that Dryad provides some additional metrics information, which is something authors are interested in, including who cited it. So this is an example. Next slide. So, 'what do you mean, credit for my data and software?' You may find this a surprising question, but it's not, actually. Again, we're working with authors to move people along to understanding that this is actually a thing: you can get credit for your data and software, that the culture of citing them is possible, and that it's technically possible. So, research data and software: why would you want to do this? They're important scientific contributions.
You want to get credit, of course, for the data and the software that you've developed. It's also becoming more and more a part of the promotion and tenure process, and of honors and awards, so the value can be recognized and you can get credit in that sense. And then funders, of course, are asking more and more for you to do this. Next slide. So what does it do for me? It's easier to evaluate: your peer reviewers can actually review the data, as long as it's preserved and curated well. It can be discovered in different ways, not just solely through the paper; it becomes part of the larger ecosystem, the scholarly graph, and people can find it in other ways, which ultimately, again, can lead to credit or, as Shelley was saying, to new collaborations. Your data will be preserved as part of the scientific record and linked; this is another benefit, so that your data and your software don't get lost. Next slide. This is just one publication, but I'm sure there are many more out there, on the benefits of citing your data and software: it can actually make your paper more likely to be cited by others, and you can see the percentage impact. So that's another reason you would want to link to your data and your software: to improve your impact. Next slide. So how do you do this? Well, we often say: use a trusted, community-accepted repository to preserve your data and software, one that provides citation capabilities, and often that means a DOI. Then you can include that citation in the references section of your paper. Next slide. It's funny, there are sort of two things about this slide.
The first: if you don't know about crosscite.org, the DOI citation formatter, it's a really helpful resource for formatting your citations; we've heard a lot of great things from authors about it. The other thing is the example citation itself, which is an opportunity for us to highlight something that's new with AGU and that will actually help with indexing citations: this bracketed description. It's included in APA citation style, and it's something that helps the indexers as well, as far as identifying whether this is data, software, a computational notebook, a collection, or another sort of research object. So that's a nice example, and something we just recently rolled out. Next slide. So again, here's our updated guidance, if you want to go to the data and software for authors pages at AGU. The other thing to highlight here is the data help email at agu.org. It's not just for AGU authors: if anyone on this call has any questions or any challenges, they can also reach out there. We're also hoping that at some point we can drive this towards more of a community option, and we will speak about that later in the talk. Next slide. Another thing to mention here, and it's already an old slide actually, but it's a good one to show: initially we started off providing guidance journal by journal, and you can see some of the earlier journals where we provided specific guidance. One thing is that we'll probably end up coming back to this, because there are standards and norms for particular communities.
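Returning to the bracketed descriptor mentioned a moment ago: here is a hypothetical example of how it might look in APA-style references for a dataset and a computational notebook. The author, titles, versions and DOI suffixes are all invented for illustration, not real records:

```text
Carberry, J. (2021). Example coastal temperature profiles, 2010–2020
    (Version 2.1) [Dataset]. Dryad. https://doi.org/10.5061/dryad.xxxxx

Carberry, J. (2021). Analysis notebook for coastal temperature profiles
    (Version 1.0) [Computational Notebook]. Zenodo. https://doi.org/10.5281/zenodo.xxxxx
```

The bracketed label, [Dataset] or [Computational Notebook], is the piece that tells indexers what kind of research object is being cited.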
We'll be working on those community standards at AGU, really delving into what it means when you hear 'we use community standards': what are the community standards, and how can we use our sections to speak to that? But this is how we started off, with journal-specific guidance. Then, next slide, we actually moved on, and this was a really great moment, where some of the journals listed here started coming together. One of the people who led this was Peter Fox, who was the editor-in-chief of Earth and Space Science. He unfortunately passed away right when we were starting to put this together, but he launched this initiative to create shared guidance between our journals. It made sense, right? It made sense to have a common reference, a common guidance document. It's now used by nearly all of our journals, except maybe one or two, as we're still talking through some challenges around proprietary software and data. But for the most part, AGU has moved many of its journals to a common shared guidance, and it's available at data.agu.org. And here's the citation for it as well. Next slide. The other thing to mention here, one of the areas where we have a good deal of conversation, is models and simulations. This is an area that's becoming increasingly complicated: the data can be not just on one research computing cluster, it can be in many places, and the software can be in many places as well. The complexity is growing with these larger projects.
So this is one project out of NCAR that's looking at best practices for preserving and replicating some of this model data, and they have a lot of information there to help work through this. But we also have the practicality, the here and now, to deal with at AGU. We actually get these questions, and we've had to work through them in our data and software sharing guidance for authors: here's the good, better and best type of approach to preserving your data and your software, trying to provide the best guidance we can as our authors grapple with how to share this information. So I recommend you have a look at that resource; it was very informative for our work as well. Next slide, and that's Shelley. Thank you. So this resource, which is at the very top, data.agu.org/resources: Chris designed this, and what I want to highlight for you here is, first of all, that it's on GitHub, which means you can do a pull request or provide a comment and actually contribute to the thinking and the updates that happen as we continue to evolve as a big community, not only in the Earth and space sciences. For those of you participating from sister and brother domains around the geosciences, you'll see that many of these resources make a lot of sense for you too; it doesn't really matter what your domain is. So you're welcome to take a look, reference it, use it, pull it into your own resources. Everything is open here: other than the blog posts, there's no content that you can't cite, build upon or give attribution for. So there's a lot of material here, and again, we have the email at the bottom.
This email goes to Chris and me, primarily Chris; I'll give him credit, because he does the bulk of the answers for that email. There's always something to learn, always something to dig into, but the things that we've been working on with the community and have vetted broadly are in the resources section. Also on this resource is the blog. The blog covers things that we're getting a lot of questions on but don't yet have a common answer for. What we do is try to give you the best information we have at the moment, knowing that it will likely change, or that we know things are changing but they're not settled yet. This is helping us navigate a number of topics, and it's been nice to have as a resource to share with others. OK, so let me talk to you about these resources; some of them I'm going to highlight here. The first one is about digital presence. This is about you, the researcher: when I go to look for you online, how do I see you? This is you curating your presence. It starts with your ORCID as your hub, then uses the persistent identifiers that link to your ORCID, and talks about how to optimize those. I've included a few slides from this presentation today, which we'll walk through very briefly, but we encourage you to go look at the full resource and get yourself optimized, so that collaboration and discovery of your research and your research products are as good as possible. Software citation: this one is so popular. How do you do it? Where do you look? What's the information you need? Chris did this one, and it's really fantastic: five tips for citing your research software. You get a checklist and you get a recording; you get those for the digital presence resource as well.
We're trying to give you a very short checklist, and then something a little more substantial, about 15 minutes of video, to walk you through all the concepts if you're interested. But if you're not, you still get the one-page checklist. The guidance for AGU authors is truly for any author. So how do you cite a Jupyter notebook within your paper? This is current practice; we are continuing to work on other tools for using Jupyter notebooks in a more integrated way, but this is what we have available right now within AGU publications, and I believe it's doable within any journal and any publication. It is pretty straightforward and should be supported by any publisher. And for those of you who like R, we did not forget you: here is your guidance, something very similar, but tuned to your community and to your platform. We're delighted that AGU hosts both folks who love Python and Jupyter notebooks and folks with R scripts and R Markdown. Samples: yes, I am definitely on the support team for the International Geo Sample Number (IGSN) and the iSamples group. What I can tell you is that about a year ago, we had a meeting where I asked for help on what it would look like to cite a sample within a paper, such that automated attribution could take place and citation would be valuable to the community, and the team is working on that. I've seen some early drafts, and hopefully we'll have something in the new year that we can put here on the resource page, and hopefully you'll see it in other places as well. There's a lot of really great work happening; I'm really excited about it, and I can't wait to share something with you once those teams work through their vetting process. And one last thing: we're going to look to you for any additional adjustments that you'd like to make to that statement. So here is the data.agu.org page.
This is where you'll land if you type in that address. It's a simple page: you've got the blog posts, you've got the resources, and we have a link to the Data FAIR as well. So it's very simple and very easy to use. You can see that very large data is a challenge; we have a blog post there. You can see the digital presence post is there, and there are a couple of others. If you would like to see a resource or a blog post, please ask. We have a number in the works, and if somebody asks, we will likely change the order of what we're doing to accommodate that, because if you want to know, you likely have peers who also want to know. So please do reach out and tell us about the challenges you have. So let's talk about digital presence. I told you what that is, but I will repeat it because some of you may have been drifting, so come back, come back. Digital presence: this is about you, your research, and optimizing how you are discovered. We feel that's really important for you to have the best opportunity to be recognized for your work, to be discovered by others, and to gain a network and collaboration opportunities, meeting other folks working in the same area, worldwide, internationally. This is how things are connected across all of those types of digital objects: your publications, your datasets, your software, and so on. So why do you care? Chris said this, and I'm going to say it again: when you publish your data and software and then cite them in your paper, you can actually increase the citation of your paper. We do have research on this. It's not a guarantee, but it does increase the possibility. You also open yourself up to additional collaborators and other domains, and these practices also play into open science. So, first: you. If you don't have an ORCID, oh my goodness, get one, get one right away.
This is critical: it's the hub for your entire digital presence, and we have information on our website about how to get one, so I won't get into it here, but get yourself an ORCID. Here's mine. What it is, is not your CV; think of it as a linking hub where you put some basic information. I keep my bio here, and I can assure you my bio is used for most digital occasions: when I give a talk, people come here to get my bio. This is where they find it, primarily. Then make sure your email is set up; you have control of what information is public. Especially if you're an early-career researcher or student, I'm going to give you a tip right now: associate two emails with your ORCID. One is your personal email, which is likely not to change, and one is your university email, which might change as you progress in your studies and career. You want them both because you always want access to your ORCID. That makes sense, right? You can make just one public; you don't have to make both public, but you can access your ORCID through either. This is very important. OK, that was my tip for those early in their career. Great. So here is a paper, and look at this: it's one that I published with Lesley Wyborn. I don't think it's our most recent publication, but it is one that I'm very proud of, and it combines two communities, the geosciences and chemistry. We do have a lovely relationship with IUPAC, and Lesley works within geochemistry, which was one of the reasons it was important to have her as a co-author. You can see that in this case the journal, Data Science, only wanted ORCIDs for the corresponding authors. What I've learned since is that we could have forced the ORCIDs in for Lesley, Nancy and Ian, and we should have. So that's my tip to you: make sure ORCIDs are associated with every single author. You can see that here with the little green iD icon, which indicates that an ORCID is associated with that author.
The other thing you need to know — it might be a new word for many of you, so I really want you to pay attention — is Crossref. Crossref is the agency that assigns DOIs to publications, primarily English-language publications. There are other agencies for Japanese and Chinese and Korean and a few other languages, but Crossref is your primary DOI registration agency for English publications. Why does this matter? Because they are the ones that make the references to your data connected to your ORCID and connected to your publication. And you'll have an opportunity to let them help you populate your ORCID. You want to do that. Even if you didn't recognize the term Crossref before, these are folks that you can trust populating your ORCID profile. This is a good thing. It will save you a lot of typing, and you'll thank me later and tell all your friends. All right. So to get all of that information, to create those links with Crossref, to make sure you have all of your links as robust as possible, you want to grab this checklist. You'll find it on the data.agu.org page. These are the direct DOIs to the checklist and tutorial, but you can get them from data.agu.org. And then please walk through it and share it with everyone you know. It does not matter what domain you're in. It absolutely does not matter — let me say it again, it does not matter what domain you're in. It will help everybody, every researcher you know. OK, tell everyone they should just come get this and do this. OK, Chris, this is the big handover. Thank you. And there was a question in the chat, actually, about what about DataCite? I think that's actually something I speak to in the digital presence material, but we can get to that. Yes, we do get to that as well. So go do the checklist and we'll get to DataCite as well. Yes.
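As another aside for the curious: those Crossref links are queryable through Crossref's public REST API, which can filter works by the ORCID iDs registered on them. A hedged sketch that only builds the query URL (in practice you would fetch it with an HTTP GET; the iD shown is ORCID's test researcher):

```python
from urllib.parse import urlencode

CROSSREF_API = "https://api.crossref.org/works"

def works_by_orcid_url(orcid: str, rows: int = 5) -> str:
    """Build a Crossref REST API query for works linked to an ORCID iD."""
    params = {"filter": f"orcid:{orcid}", "rows": rows}
    return f"{CROSSREF_API}?{urlencode(params)}"

url = works_by_orcid_url("0000-0002-1825-0097")
print(url)
```

Seeing your papers come back from a query like this is a quick way to check that your ORCID really is attached to your publication record.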
This is something that, also through Lesley's encouragement, we're working on as part of always improving our guidance: AI data readiness is something that's actually come up. We recently heard about this at the Research Data Alliance — the National Institutes of Health in the US and the USGS are two examples of agencies that said they're working towards AI readiness. You may have heard this in other places. There's actually a cluster, a group, working on a checklist related to this topic at ESIP — the Earth Science Information Partners, a group that meets regularly — working through how they can improve the checklist. They're surveying the community at the moment, and hopefully the checklist will be available soon. Next slide. I couldn't necessarily point you to it — it's in draft form right now — but I pulled out all the elements that they're looking at in the checklist, and you can probably see a lot of familiar topics here about some of the things you can do in terms of documentation and access. I think this checklist is going to be really invaluable to authors, to researchers, who are working through the challenges of how they can make their data AI-ready. In many ways this continues a lot of the work that we provided in our guidance, but it will obviously update it as this progresses. We thought we'd share an early preview of some of this information — just another thing that we're tracking. If anyone's interested, they can email us at datahelp@agu.org and we can provide a link to the draft copy of the checklist as well. So, next slide. I mentioned this earlier too: we have our own help desk.
But we also have the grander, greater help desk, which is a community resource that we partner on with ESIP and EarthCube. It also travels — it travels around to different venues. The next venue will be the AGU Fall Meeting, which is coming up in two weeks; it feels like it's coming very soon. One thing we do with this as a community is provide resources from the various services that are out there in the Earth and space sciences. We have volunteers that help and answer questions at various times, or provide resources to the community. There's the physical location — oh, what happened there? — there's the physical location at the AGU Fall Meeting, where we'll be running various workshops and trainings, and community members will also be giving talks. You can also see all this material at the Data Fair link that you see here. It's also on Twitter, so you'll see the back and forth of questions and answers through the hashtags #DataHelpDesk and #AGU21. One of the things to mention here, too, is that we really want to start expanding this further and make the help desk a community resource. We were talking earlier — maybe this was before we even got on the webinar — about how data stewardship is a shared effort: it's the libraries, it's the societies, the publishers, the repositories, the organizations. We are all working together. And Shelley advanced the slide at the right time, because at the end of the day it's this Beatles line again: with a little help from our friends. We are all in this boat together, working together.
And so I think it's very much on our roadmap to start expanding our help desk further and start working with the community on all these challenges. So I guess I'll leave it there. I don't know if we had any other slides — I think the last slide is our contact slide. Thank you, everyone. Yeah, we can leave it on the Beatles slide. Thank you, Chris and Shelley. I guess the idea was to point you to all these emerging resources and also to show you that, if you've got any issues, you can join in and guide Shelley and Chris as to what the next set of resources should be to support what you're doing in Australia. We thought that was a better way to present, rather than sitting there going through "this is how you cite a model and this is how you cite software". We've got all those resources, and we'll make these slides available; I've been putting some of the links in the chat. So between now and when we hand over to Natasha, to help us understand what issues Australians may be having, it's open to the audience to ask any questions of Shelley and Chris directly. Does anyone have any questions they'd like to put in the chat, or just unmute yourself — or put your hand up first so we can sequence the questions? Does anyone have any questions about any of the material that Shelley and Chris presented? I'll ask one. One of the things I get asked a lot about is citing models. What do you think is critical to being able to cite a model in a publication? Looks like Chris is choking. While you're recovering — OK, I guess I'll start. A model is really a very complex piece of software, with configuration, with forcing factors, with input data, with output data. What I would do is direct you to the resources Chris provided for the RCN. They were incredibly thoughtful.
They cover simple models all the way through to complex IPCC models, walking through what the contribution to science is. I think they even include proprietary models and how to handle those. We work quite a lot with AMS, the American Meteorological Society, and they have a number of models that are proprietary. So I think that is probably the best resource — it gives you all the different variations — and then, when it's time to actually publish, based on the type of science you're doing, there's a rubric they use and they give you a recommendation for what to cite. That is the best resource. Second best, of course, you can take a look at what we wrote, which walks you through good, better, best, depending on where your model is stored. And if you have control over it, of course, we want you to put it somewhere where you can get a DOI. Don't forget, if you love using GitHub or GitLab: GitHub has a really great connection to Zenodo, where you can create a citation. We highly recommend that, we provide you with those resources, and that is how you preserve the version used for your research. How did I do, Chris? It was good — I think I'm better now. Louie, do you have a supplementary question you'd like to ask? Louie, you've got your hand up. He's unmuted. OK, thank you. Sorry, I was trying to click the button and nothing was happening. I just wanted to bring up a follow-up question about peer review. I'm thinking of it in terms of models, but it applies to anything that we want to count as an output that gets considered for, as you pointed out, the wider use of our stuff — not just for more science, but also for our professional development and so on. Anything that's peer reviewed has a much higher value.
And I wonder what you think about assigning DOIs, but also assigning a peer-review status, to data sets and models and software, in the same style as we do for something like JOSS — one of those venues where we actually go through the code and review it. Yeah, JOSS is a really valuable resource; I am very familiar with that — Dan Katz and that team. So I'll give you the best answer I can, which is that there isn't common agreement or resources or funding yet. But I would agree: if we're able to get to the point where data and software can both be peer reviewed, that would be incredibly valuable. There are a lot of challenges to doing that in today's infrastructure, though. The majority of researchers use a generalist repository, which provides a researcher with no support on curation or peer review in any way, shape or form. It makes it very easy for you to get a DOI, which is beneficial, but it doesn't help you when it comes to reuse of your data and software. So many folks are stuck with just a generalist repository. But you as researchers should actually want a repository that can help you with curation. What that looks like is going to be different depending on the domain, the country, the resources, what's possible. And even when you get a repository that can help you with curation, there are only certain skill sets available in that repository's team that can give you some level of review — even with a discipline-specific repository. I think about one of the ones I work with a lot, BCO-DMO — the Biological and Chemical Oceanography Data Management Office at Woods Hole in Massachusetts. They have specialists for different types of data, but they are only comfortable checking things like bounds — like salinity, for instance, ocean salinity. They'll look to make sure the numbers make sense within some bounds.
But they don't know the science that went into the capture of those numbers. So they can only ask the researcher: well, this bit right here looks like maybe your columns are off, because these numbers are way out, or they don't look quite right — it looks like the calibration of the instrument was off; could you take a look? They're not going to know the science that went into it. So how do we deal with that? How do we actually review the science along with the data? Within the US, the USGS and NOAA and NASA do have peer review, because that data is used to make life-and-death decisions. They fund the peer review for that content before it goes out — before it goes into weather reports, before it goes into other recommendations. But that's a service being done by that mission, and most data doesn't have access to that. So you depend, then, on the scientific process. What we're stuck with right now, for good or bad depending on how you look at it, is giving researchers access to the data and software to consider and replicate or reproduce — whichever word makes sense for your research — in order to determine whether there's an issue or not. So it's OK, but I would agree it would be nicer if we had more funding for peer review. It's just not realistic right now. And the other thing that you and I have often talked about is: can we please have some domain repositories where we can actually do this? Because, as you know, in Australia there are many domains that just do not have domain-specific repositories, and they rely on generalist institutional repositories, or Zenodo, or — heaven forbid, as someone said this morning — ResearchGate. Look, we'd better move on. Natasha, do you have a quick one — or how about we just address a question in the chat that went up early from Alex Prent: would you explain DataCite for samples a little more?
Would one just place IGSN URLs in the reference list? I think that gets at the problem that you may have hundreds of them, and you can't do that. Shelley, I feel like you should take that — did you want to talk about the work we're doing on collection DOIs, or do you want to come at it differently? Oh, I was going to talk about the problem, people, that in a publication you can only cite — is it 20 research objects? It's even fewer for some journals. Yeah, there's a limit. So there is a group starting in the Research Data Alliance, which Shelley and I are associated with, where we're looking at what we call an aggregated DOI: you take all your individual samples — 200, 2,000, whatever you've got, each with an IGSN — and then you create a DOI for the aggregate that goes with the publication. The important thing about doing it that way is that the organisation, the funder and the researcher still get credit for the individual IGSNs in that aggregation, but the aggregation is what's cited in the paper for transparent research. And you could also imagine that the same sample could be used in other papers and in different aggregations, and so, through those links, perhaps you can track how many times your sample or your bit of data or your bit of software is used. Look, in the interest of time, Natasha, I thought we could hand over to you — you ask your question and I'll share my screen so we can get back to, hopefully, your presentation. Thank you. But I have a question, which is — well, first of all, great set of guidelines that you've got there; there are some good comments about that in the chat. Sorry, can you just pause that for a second, Lesley? Yeah. Sorry, I can't see anybody now, so it's a bit harder.
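As an illustrative aside on that aggregated-DOI idea: in DataCite metadata the link between the collection DOI and its member samples can be expressed with `relatedIdentifiers` of type `IGSN` and relation `HasPart` (both are in DataCite's controlled lists). A minimal Python sketch — the title and IGSN values here are hypothetical:

```python
def aggregate_record(title: str, igsns: list) -> dict:
    """Sketch a DataCite-style record for a collection DOI whose parts are IGSNs."""
    return {
        "titles": [{"title": title}],
        "types": {"resourceTypeGeneral": "Collection"},
        "relatedIdentifiers": [
            {
                "relatedIdentifier": igsn,
                "relatedIdentifierType": "IGSN",
                "relationType": "HasPart",   # the collection HasPart each sample
            }
            for igsn in igsns
        ],
    }

record = aggregate_record(
    "Samples analysed for this paper",        # hypothetical title
    ["IEXYZ0001", "IEXYZ0002", "IEXYZ0003"],  # hypothetical IGSNs
)
print(len(record["relatedIdentifiers"]))  # 3
```

The paper then cites the one collection DOI, while each sample's IGSN remains resolvable and creditable on its own.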
What I was asking was: you've got some guidance around data availability statements, and obviously different journals have different requirements for what they want you to put in one, so I just wonder how you're able to match these up, in the first instance. And the second thing is that "data available on request" is still the most common response in a data availability statement, and the second most common is that the data is in the supplementary section of the journal. Obviously, in the FAIR landscape, we want to move to more data in more repositories, with the availability statement linking there — like the Dryad example that Chris showed. So I'm just wondering whether there's an interest from AGU in monitoring a shift there. I can — yeah, Chris, if you don't mind, just for a second. I can tell you AGU does not allow either of those, and we have not allowed them for years. Now, that doesn't mean there isn't the occasional author with some sort of circumstances where an exception needs to be allowed. But across our journals you cannot say "data available on request" — it's been at least five years since we allowed that. And "data available in the supplement" only — we've not allowed that for three years. So we know that's not OK, and that's where we are. Yeah, I can add to that. I work very closely with our staff on training them and developing these checklists, and I can confidently say that we are weeding that out, down to almost zero percent now. Any of those kinds of things come to me at the data help desk, and we actually work through how the author can share, when those kinds of instances come up.
It's specifically stated in our guidance, and what the authors get now is a direct statement saying you will not do this anymore. So I haven't seen it much — I maybe saw a few cases when I first started, but I haven't seen any since — and we're actually advancing a great deal in how we're improving our availability statements and citations too. So it's really started turning around. Yeah, and I'd just add that AGU — how many journals have you got, 23? — is not the only publisher that has stopped accepting this. I know Nature, Science, just about every journal has stopped taking supplements or "contact the author". And that's what this webinar was about, because I have been repeatedly asked by researchers, "What do I do now? They won't accept my paper unless I do something." So that's what we were trying to do here: show you, yes, this is the barrier, but here are the resources. So I think it's a good place to pause. Sorry — Natasha, I'll hand over to you because you're going to run this part, aren't you? Yeah, just don't share any slides; I'll put a link in the chat. So there's a chance now for everybody to have a bit of a say and for us to have a discussion for, I don't know, the next 15 or 20 minutes or so. We don't have to do the whole thing by Mentimeter, but I thought we would start with it. Let me just share my screen — why do the pop-ups go right over the present button? There we go. So the question: which areas are the most challenging for you, or for the researchers you support, in meeting publisher requirements? You can pick more than one — if they're all challenging, that's fine. I just want to get an idea of whether there's one area that people are struggling with more than another, or whether they're all evenly spread. Oh, software is winning. Yes, software is winning.
Data and everything else is almost even there. So we have a resource on software — whoever those 10 people are, make sure you get your fingers on it, because it might help. Yes, and ARDC is leading a conversation around a national software strategy here, so that will have some related guidance too. That is on the ARDC website if anybody wants it, and Paula Martinez, who is here and involved in that, can help answer questions. Oh, it's even now — it feels like a horse race of some kind. But anyway. OK, so that's interesting, and of course we can't tell why people are having trouble here. We might go to the next question now, but that gives us a pretty good idea that data and software are pretty equally challenging, and then we have notebooks, followed by models and samples. So: what are your pain points in meeting publisher requirements for data, samples, software, et cetera? We don't mean published articles here, but all the other things that we've been talking about — Mentimeter just doesn't let you list them all. This is an open-ended question, so you can just enter your comments, and they should scroll across the screen, I hope, as people start entering them. Yep: citing and sharing notebooks. Is that one that comes up quite a lot for you, Chris and Shelley? Yeah, that's why we have the guidance. That's why you have guidance. Yes. Chris, do we have the DOI for the paper we're using as an example? That might be helpful to share here. Oh, yeah. Actually, we might have two papers, right? Don't we have the one with data and software? This was really fun: we have an amazing set of authors who had a paper with three data sets cited and three pieces of software cited, so we were using that as our example, and we're all taking bets that, just by using it as the example,
it's going to be the most cited paper we have. It's kind of a funny, silly behind-the-scenes bet we've got going. But then we also have one on a notebook. Chris worked on both of these and made sure they were really great, so the credit's all his. OK, there's a question there: if a results paper is published, can I then publish another paper for just the software? Obviously you can have data papers, so is there an equivalent for software? Yes — JOSS is already in the chat. JOSS is a fantastic software journal, and they do a peer review. So I think there's a theme in here that citing data and software is really hard; that seems to be coming up a lot. Having data peer reviewed before making it publicly available — Shelley can comment on that, or it's something we're working towards. It's a problem in Australia, because with most of the big repositories, you deposit your data and then it's published, and researchers actually want that data embargoed until they're sure the paper has been accepted with the data in it. So there is a group working with repositories to facilitate access to data for the peer-review period. Shelley, do you want to add something? That was just brilliant — actually, that's exactly what we're doing. It recognizes the fact that, especially when you pick a repository with a curation process, which does take some time, you want to time the paper with the data. And even for generalist repositories that take the data right away, you want the data to be available for the peer-review process, but not open more broadly. There's been a lot of work in this area; RDA has done a lot of work.
Others have done a lot of work too, but we're digging in even further, and we actually have a recommendation coming out for vetting that will then go to the journals, to help them realize how to help authors think this through — because that's one resource you have, among others. Yeah, it was pretty amazing when you think about it: we actually had to start a conversation between the publishers and the repositories. Up until now, most conversations have been between the publisher and the researcher, and the poor repositories were getting stuck with some pretty unreasonable demands from either researchers or publishers. So we started this three-way conversation so that the repositories could actually have a say — because they're no longer libraries with books on a shelf; with data, they're dynamic, they're living, and they've got to make things work. Gosh, you're getting a lot of comments here, Natasha. Oh, well — I think we've got the same ones just scrolling past now. But the themes I'm seeing here: citation, guidance about which repositories — someone put storage, but it's a similar kind of thing — also peer review, and just best-practice guidance that's missing. So I might go on to the next question, which is: where do you, or your researchers, go for help in meeting publisher requirements? I've just had the privilege of writing the foreword for this year's State of Open Data report, and in that report researchers say there's a pretty even split between institutional libraries, repositories and publishers as the places people go to. But honestly, one of the solutions is scaling up. OK, yeah, we need to scale. Lesley, is "email Lesley" one of the answers? That's who I go to. Oh, Lesley, you will become the help desk. There's a reason Lesley's co-authored a lot of my papers.
I'm not stupid. The library — great, yay. As I was saying, the State of Open Data results show it's evenly split between institutional libraries, repositories and publishers. So there's this shared responsibility to provide help, and yet it's a really disconnected space: publishers provide what they provide, institutional libraries do their thing, and repositories do something else. I don't see a lot of coordination between those groups in providing help for researchers, so I think that's an area that needs attention. I will be very honest with you: having the publisher tell you what to do with your data is something that doesn't make sense to me, because they are the least knowledgeable about your scientific data. Now, that being said, that is my full-time job at a publishing house — because they knew they didn't know. So what Chris and I are doing, and he talked about it earlier today, is trying to shift the timing, from when you're thinking about publishing your paper to being more connected to your community much earlier. You start thinking about your data when you're actually designing your research: what data will I need, comma, followed by where will I store it, comma, followed by, oh, I'm going to use that same metadata I used last time because that worked really well, comma, oh, I wonder if that vocabulary I used last time is going to be the right one for me this time — I'll have to check on that. So it sits in front of your work, as opposed to at the publication end. What I'd really like to see in this list — and don't type it just because I said it — is societies and unions weighing in on what you need for your resources. OK, that's great. Thank you. That is actually all the Mentimeter poll questions I had.
We have a few minutes left, though, if people would like to talk about what might be helpful in this space now. We've heard what the pain points are, and we've looked at where people currently go for help — but are there suggestions for what's needed? Think about the participants and the organizations that have contributed to this webinar, too, and whether there are things that can be done by us. So please suggest. Jeff, you've got your hand up. Could you unmute yourself? Are you able to unmute yourself? OK — Paula, can you unmute him? I think I am — yeah, can you hear me? Yes, Jeff, definitely. Yeah, I just wanted to follow up on my question in the chat, which David Sinski partly answered, but it's really about agencies. I'm from Geoscience Australia, so we have in-house databases which we provide to the public — a lot of geochronology data, for example, that I'm involved in is delivered that way. My question, I guess, is: when we publish that in various journals, how do we ensure, or find out, whether that repository and that way of delivering the underlying data is going to be acceptable to a particular journal? Or are there ways of essentially getting our databases and delivery mechanisms accredited in some way, so that they become an acceptable way of delivering the data sets? I can answer that, because I've obviously thought about it a lot. The problem, Jeff, is that if you give a reference to the whole database, you're not actually referencing the data that goes with that publication — for transparency, et cetera, the reviewer needs to know which part of that big database you used to do that paper. So, considering these are such small volumes, what some groups are doing, as we discussed, is creating aggregated data sets: this is the data set that goes with this publication.
And then you give it a DOI. Given the big volumes we're talking about, maybe what GA could do is have, in a catalogue, "this is the data set that goes with this paper", with a DOI — because that then means that anyone who picks up your paper in the next five to ten years can get the exact data set that was used. The other way is that, if you have generated the data from your database, you can actually save the query: I queried the database on this day, and this is the query I used — so that someone external (because, again, we're about reproducibility of science) can run that query on your database and have the exact data set replicated. There's just been a paper published in the Data Science Journal on how to cite data from dynamic data sets. It's not easy to do, I can assure you. Could I answer that too, following on from that? Jeff, whenever a database or a data set from a database is released at GA, it is supposed to be released and published, so it should be in eCat — that's essentially a subset of the database that exists in our Oracle systems. If you want to create one just for an individual paper, as Lesley suggested, that's fine: you publish it in eCat, you give us the data set, it goes into our repository, and then it can be provided with a DOI to the publisher. And we're in the process of getting our CoreTrustSeal certification, so we should be recognized by all the publishing houses. Thanks, David, that's really good to hear. There's an American — hi, Alan. Alan Pope from the US has asked a question. How's it going? Yeah — I'll let the Australians ask questions first if they have any, but it's such an interesting topic that I thought I'd join and ask. Well, just quickly, and then we'd better head to the wrap-up. OK, so Shelley, he's one of yours, you can have him. So I'll just say the question out loud: this is great disciplinary information.
But I'm coming from a polar research background — we do work with a lot of Australians as well — and I'm curious about regional questions around best practice for data management, data discovery and things like that: things that cross disciplines but might be focused by location. Do you have any thoughts on that? So for me, what I would do is point you to the work we're going to start next year, which will go discipline by discipline — and polar, of course, has its own cross-domain challenges but is unified by location. You have done so much already; it would be interesting to see what's in place and what's missing. But yeah, that would be great. So we're going to start with hydrology. It is fairly progressed, so there are a lot of resources, and we're going to test out our methodology on them — it's more of a method check. We'll do that early in 2022, and then, if you'd like, you could nominate polar to go next. The idea is that we would talk across the international community about what makes sense for the things that go in your data management plan: what data, what metadata, what vocabulary, where do you want to put it, under what conditions, based on the different types of repositories that are available. How would you make your selections such that a researcher doesn't have to guess? Their community says, "here are the things that we recommend", and it's coming from the community. Then that can easily be shared with institutions, it can be shared with funders, and it gives a community more of a cohesive plan. OK, well, I'm getting text messages from Natasha that it's time to wrap up. Sorry, Natasha, I had to dob you in — we really were flying blind on how to run this webinar. We were hearing a need: people didn't quite know what to do.
And so we kind of put this together because, as some of you know, I work closely with AGU, and I knew a lot of the work that Shelley and Chris were doing and wanted to bring it to Australia, knowing that it would also suit environment, health, et cetera. This was just starting the conversation — I'm not sure whether we met your needs; we did have up to 99 people at one stage. If you've got any follow-up, or any other questions or suggestions as to how we could continue this conversation, there is the email address on this slide: contact@ardc.edu.au. I'd very much like to thank Paula Martinez and Joe Saville, who behind the scenes helped get this off the ground and organised. The support from NCI, AuScope and TERN is appreciated as well. I'd also like to acknowledge and thank Shelley and Chris for taking the time to do this. I can't remember what time it is over there at the moment, but I think it's nine p.m., so they've actually stayed up late to do this for us, and I'd like to thank them sincerely for introducing all the work that they're doing. So thank you, everyone. And Paula, the recording and notes from this will be publicly available on the ARDC website in the not-too-distant future. So thank you, everyone.