improving access to geoanalytical research data and getting to know where it's from. How do we do that? Please ask questions through the question and answer box or in the chat, and we'll try to address them during the discussion. We'll follow with the general introduction of the debate and the introduction of the panelists and moderator, and then we'll start off with the first part of the debate, in which the panelists each give a short statement. Then we go to the discussion, with a closing summary. Together with Marthe Klöcking, Geertje ter Maat and Lucia Profeta, we organized this Great Debate because we thought it's quite important to have a discussion about improving access to geoanalytical data, its heritage and its provenance, as the data is the backbone of our research and we use it every day to solve the questions we have. It's important to be able to discover data, but it's also important to be able to evaluate its value once you've found it and to assess its quality. So some information should be available on how the data was collected, modified and then published. From take one at the EGU there are a couple of insights. We've got a very diverse community with different methods, different samples and different applications. The barriers to adopting more standardization in geoanalytical data are mostly cultural: we can be scared of being scooped, of the data being scooped before we get to publish it, and there may be a lack of credit when you publish data, so you don't always get a reference to a data publication. Then there is data management at the finish line: it's difficult to say when a data set is finished, right? Do you need to publish it when it comes off the machine, or when you're publishing your paper? And the last point is that the sticks we have, the data policies in labs or institutes, are not so strongly enforced.
Today we've got four speakers invited to shed their light on the topic. The first speaker is Katy Chamberlain from Teesside University in the UK. Katy is an igneous petrologist and field volcanologist specializing in in-situ microanalytical techniques, and she's also passionate about changing the data culture in geochemistry and making geochemical data FAIR. The second speaker is Steve Goldstein. He's Higgins Professor of Earth and Environmental Sciences at the Lamont-Doherty Earth Observatory of Columbia University. Steve is a geochemist who uses natural radioactive decay in geochronology, geochronometry and process tracing. He has promoted best practices in geochemical data and its publication for a long while, for example through the Editors Roundtable that he helped to establish. Then Lesley Wyborn, who was invited at the very last minute: she is honorary professor at the Australian National University in Canberra, a long-term geochemist, and involved in many data science projects concerning geoscientific data within Australia and around the globe. Last up is Olivier Pourret. He's associate professor at UniLaSalle in France and a hydrogeochemist with a particular interest in trace metal fractionation in low-temperature aqueous systems. He's also a strong advocate for open and inclusive science, so it will be really interesting to hear his point of view on the matter. Each of the speakers will introduce themselves and give a short view on the topic, and then Kerstin Lehnert, who is senior research scientist at the Lamont-Doherty Earth Observatory, also of Columbia University, will lead the discussion. In her usual job she leads the EarthChem data facility, the Astromaterials Data System and the System for Earth Sample Registration, or SESAR.
Then some brief announcements. As I said, please use the Zoom question and answer feature to ask your questions; you can also ask questions in the chat. There are also participants joining via the live stream on the EGU website. They won't be able to ask questions directly, but we'll monitor the chat there and try to transfer those questions to Zoom. That's all. Let's get to the core of the debate, and I'll hand over to Kerstin, who might have a few words to say as well.
Yeah, welcome everybody. I've already been actively chatting here with people in the audience. It's always sad that there is just a screen in front of me; it was fantastic at the EGU meeting in Vienna to see at least part of the audience in person. That's why I'm trying to establish a bit of a dialogue here in the chat, so please continue to introduce yourselves and also to really participate in this debate. We're going to start off with the comments of the panelists, who have put together a few slides to give you a first view of their involvement in, and their opinion about, geochemical data standards and geochemical data sharing. Let me just add, and thanks, Alex, for introducing me, that was really nice: I'm running a data facility, EarthChem, as a place where geochemical data are shared, and my interest has for many years been to get the community engaged in defining what is needed to make data reusable. We're making progress, but not enough, and this is a great opportunity for us to take another large leap forward. We're going to hear a little bit later about efforts such as OneGeochemistry, a new initiative to facilitate global collaboration on geochemical data standards.
I just wanted to say at this point already, for those of you who may be in Hawaii 10 days from now for the Goldschmidt conference, that we're going to have a booth in the exhibit hall for the OneGeochemistry initiative. It's a great place to stop by and talk a little more about standards and data sharing in geochemistry. With that I will leave it for now and get started with our presentations, and it is a great pleasure to start off with Katy, who has already been a very vocal and enthusiastic supporter of a cultural change in geochemistry. So Katy, please take it away.
Thanks, Kerstin, and hopefully I can move the slides. Thanks, wonderful. So I've been given a very kind introduction. I am a petrologist; I specialize in collecting textural and in-situ data, so for the volcanic rocks that I work on it's about combining the field locations with textural observations, perhaps qualitative rather than quantitative, and then quantitative in-situ compositional data, and connecting it all the way through. So I collect a range of different geochemical data, and I think this debate is an excellent opportunity for us to talk about how we can do better as a community and why we should be making geoanalytical data FAIR and accessible. Given the amount of money that we receive, if we think about the cost per analysis, the more we can make this data reusable, the better value for money we're going to get, and the more we can seek to answer these larger questions within the geosciences. My introduction to the challenges of sharing geoanalytical data, and how we archive it and make it available, came fairly late in my career, and I'm quite sad that I wasn't introduced to it during my PhD, at the earliest stages when I was collecting geoanalytical data. But working with geophysicists,
working across disciplines and between labs, and trying to communicate the different uncertainties that we have with geoanalytical data, the different methods that we have and the different things that we can resolve, was really challenging. Having a uniform approach to how we report our geoanalytical data and the associated metadata will allow us to be more interdisciplinary and more effective in communicating the limitations, but also the applications, of our geoanalytical data. That's what I was trying to represent with this cartoon here: how we can be more interdisciplinary, how we can use geoanalytical data across different disciplines to couple with other areas of geoscience. Another reason why we should be making our data FAIR is to have a more inclusive scientific community. I really like this diagram here, from a really interesting paper looking at how to build integrated, coordinated, open, networked science in volcanology, geochemistry and petrology. A key part of this is having FAIR data and FAIR tools, and thinking about how our data are findable, accessible, interoperable and reusable. I think part of our responsibility is to make geochemistry more inclusive by sharing our data in a FAIR way. And finally, my last point: often we might think about the different repositories and the different tools that are available, and obviously those are the building blocks we need to be able to share our geoanalytical data in a FAIR way. But the more I talk to people, the more I talk to students and ask, "Have you thought about how you're going to share your data? What metadata do you need so that people can use your data moving forwards?", the more I think there's more discussion to be had about how we bring about this data culture change. For me, I think that
this should be included within all graduate programs where we're collecting geoanalytical data. At the moment it's very variable, depending on the experience you have. What would be great is if we could have this kind of change where it's not "Oh, I have to do this because a journal or a funding body says we have to archive our data"; instead, thinking about how we're going to store and share our data as we're collecting it is incredibly important. I recognize I'm probably speaking to the converted here as part of this debate, but it's something that I personally want to work on: having that discussion with everyone I work with, whenever we're collecting data and as we are collecting it. How are we going to store this? How are we going to make it reusable and accessible for other people moving forward? So I think this bottom-up approach, as well as the top-down one of enforcing the requirements from funding agencies and journals to make sure our data is findable, accessible, interoperable and reusable, is really important. That's probably all from me.
OK, so thanks very much for that kind introduction, Alex. I'm Steve Goldstein and, as Alex said, I'm a geologist, actually, at Lamont-Doherty Earth Observatory who practices geochemistry. I use isotopes, along with major and trace elements, to try to figure out how old things are and to trace geological processes. I've also spent a lot of years as an editor and, as you can see from my old face, I've been doing it for a while. I just want to give the perspective that, having done it for a while, I've really seen a lot of changes over time in how we deal with data. I want to tell you what it was like when I was in graduate school. At that time data was published only in data tables in journal papers, and if you wanted to get that data you had to go to the
library, you had to go to the journal, and you had to get a paper copy of the paper somehow. There might be a data table; if there was too much data there wouldn't even be a data table, and you might have to try to read it off the figures. So we've come a long way. It had to be done that way because of page limits: there was a limit to how long a published paper could be at the time, and there still is. But we're in a digital world now, and that was the first fundamental change. One of the questions we were asked to consider here is why analytical data should be made public, and why not. Frankly, I should be surprised that we are still having this debate in 2022. This is an example, my first slide. Go back, please. Right. This is 19 years ago: in 2003 the editors of Chemical Geology published a guide to publishing data in that journal, where we said that if you want to publish data in Chemical Geology, this is the analytical information, i.e.
the metadata that you need to include. Even at that time we were really surprised that we had to do that, because geochemistry, since at least the 1960s, has been the major driver in improving analytical chemistry and in improving the quality of analytical data. It hasn't been chemistry; it is geochemistry. In 2003 this had already been going on for 40 years, and yet the quality of the analytical sections of manuscripts submitted to Chemical Geology by geochemists was very spotty. This editorial was one of the first attempts to impose standards on the publication of geochemical data; in fact, I think it is the first time a journal published a set of standards for reporting geochemical data. I'll read the abstract, because it's so short, and this is what it's all about: in an international journal such as Chemical Geology it is vitally important to include a reasonable assessment of precision and accuracy, and appropriate information regarding the analytical methodology, so that the data can be compared between different laboratories, and the policy of this journal to ensure adequate documentation is outlined below. It's been almost 20 years, 19 years, since we published this, and we're still discussing it. So thanks, Alex, can you go to the next slide? OK. So the answer to the question "Should scientific data be public and FAIR?" is yes, it should be. What are the arguments for? Firstly, science is driven by ideas, and data are the foundation of the ideas. Second, in order to be able to evaluate the ideas we need to have access to the data. I have to say that there is a recent paper, published in 2022, for which I have not been able to get the data that was used to make the arguments in that paper, so this is still a problem. Moreover, ideas
are often ephemeral; good data is long lasting. We publish papers for the ideas, that's the science, but we change our ideas, and even when we change our ideas we can still use the data. Another very important aspect to consider is that we use mainly public funds to pay for science, and therefore we have an obligation to the public to make that data public. So the question is not whether scientific data should be public and accessible, but rather how to make it so. I would say that there are no arguments against making data accessible, but there are a lot of challenges. So Alex, if you can go to the next slide. OK, so what are some of the challenges? Here's a list of some that we might want to talk about. Where should data be stored? I want to say that it should not be by journals or publishers, because we need a reliable long-term source of data. To give an example: I was looking for data on isotopes in manganese crusts that were published in the 1990s. I went to the journal, I clicked on the link, and the link didn't work anymore for several papers. Luckily the authors are still active, even though it was the 1990s. I went to them and I was able to get the data. They were shocked that their data was no longer accessible, and one of them actually couldn't find it and ended up getting the data off an old floppy disk somehow. I don't know how they did it, but they did. So, not by journals or publishers. How are we going to pay for long-term data storage, and who should pay? A major business model is to try to make authors pay for it. I think that's a problem in places like the United States, where grants do not even pay for the science that we're trying to do. Another important question is when the data should be made public. That is, should it be made
public on publication, or otherwise what's a reasonable moratorium time frame for researchers to keep their data withheld? That's an important question. Next bullet point: how do we enforce data sharing? I think people will do whatever funding agencies tell them to do, and they'll do whatever journals tell them they need to do in order to get their papers published. Those are possibilities. This is an important question, and I just wanted to note that peer review does not seem to be a good way to enforce data sharing; my experience is that peer review is really hit or miss. One thing we don't want is a sentence that says, "If you want the data, contact the author." That never works. Next bullet: how do we determine appropriate data and metadata standards, and how do we enforce them? That's a problem. The next one is a really important one for the data producer: how do we ensure allocation of credit for data? For example, when we do data synthesis, often the people who generated the individual data in that synthesis do not get cited. And the last point is very important for data producers: how do we ensure recognition of the importance of their contributions by promotion and evaluation committees? So that's a list of some of the challenges. Thanks very much; that's what I have to say for now. Next is Lesley.
Hi. I apologize for the fact that Simon and Shawna had to pull out at the last minute, and I'm the last-minute replacement, or should I say last-second. So I'm just going to give you my experience with geochemistry, and probably go back a bit further in history than Steve did. I did my PhD in geochemistry with Professor Bruce Chappell. He was a brilliant geochemist and he invented the XRF. His favorite saying was "Quantify your data and no one will argue with you", so that was my introduction to databases. But the important thing was that I was at that age
when XRF came in: the number of analyses you could do in three years of a PhD went from 15 in wet chemistry to 300. My career started out with a problem, because in the 1970s the number of analyses you could publish in a typeset table per publication was 15, and it stayed at 15 for the next 20 years. So data supplements became the norm, and the dark ages of geochemical data began. It went from sharing our data, which is all we could do over three years of fighting with wet chemical analyses and publishing them, to me getting my first paper rejected because I put in 20 analyses, and no PhD student could possibly do that many, so my analyses must be garbage. So we went into the dark ages, where what we call data "mining" took over: you know, it's mine, it's mine, you can't have it, only through a data supplement. And then the questions emerged: where is my data, where did it come from, how was it obtained? So really we've been trying to improve access to geoanalytical research data since those dark ages of the 1980s began, when the dreaded data supplements came in. I joined the government agency in 1977 and was a Proterozoic granite specialist. I must have collected over a thousand Proterozoic granite samples, and you wonder why my knees are shot: Bruce always encouraged students to take representative samples, which for coarse granites meant 30 kilos. In the late 1990s I was funded by 20 minerals exploration companies to do a compilation and metallogenic evaluation of 10,000 Proterozoic granite analyses, mainly by trying to munge together data sets from eight geological survey databases, and I don't think the word "standard" had been invented. But once I had put this data set together, within a few weeks a pattern dropped out: we could see nine major granite types, depending on the depth and pressure of their source regions. Ninety percent of them formed in geothermal gradients of 30 degrees or higher, and as you can see in the figure on the right, this is modern-day crustal heat flow
in the Proterozoic, and it is exceptionally high, and everybody ignores it because it doesn't fit the data. Next slide, please. This is the power of being able to put data into great data sets. On the right you have the Australian Proterozoic, and here I'm plotting thorium; on the left you have the modern arcs. You can see straight away, if you look at Tuttle and Bowen, that these granites are formed at low pressure, because they're all granites, they're high silica, and andesites are extremely rare in the Australian Proterozoic. So here was, I thought, a wonderful story, and I did as Bruce taught me and quantified my data, but nobody really believed it. Anyway, I'm just saying that it was where I got hooked on multidisciplinary science and data. Next slide. At the end of this project I thought: hmm, this is what you can get out of Australia; what could happen if we had a global geochemistry database? Unfortunately for me, GA valued my data science skills more than my geochemistry skills and sentenced me to the next 15 years of building data systems and interoperability experiments and all sorts of other things that most of you on this call know me for. But after I left GA in 2014 I met two fantastic groups. One was a group of enthusiastic geochemists, who were on that first slide, who all shared this vision of OneGeochemistry as a global geochemical data network that facilitates and promotes discovery of and access to geochemical data through coordination and collaboration amongst international geochemical data providers. At that time I also met up with a group of disgruntled AGU members who were just sick of data being hidden in supplements, and the making data FAIR project was born. That's a link to a paper in Nature on it, and now data has to be in databases linked to the papers. Fortunately, after an effort of only 40 years, we've killed off those horrid data supplements that led to so much data "mining" in those last 50 years. So now what we need to do is to
agree how we can coordinate these databases through community-agreed standards that enable global interchange of data between multiple systems. This is OneGeochemistry. Thank you.
Thanks, Lesley, and with that we move to our last panelist, Olivier.
Thank you, everybody. This was a great range of input and opinions that we've heard, and there is already one question being discussed in the chat. But let me try to summarize some of the concerns and insights that have been brought up. Again, we already had, on an earlier slide that Alex showed, the summary of the ideas that came up in the debate at the meeting, and it becomes clear again that the major challenges that we're... yeah, thank you... the major challenges that we're facing are really in the culture, and they pertain a lot to this lack of credit and to the implementation. I saw that Dominique commented in the chat on the problem that giving credit doesn't help if that credit isn't recognized at the universities, in the promotion procedures and so on. I think there is change, and I would love to hear from the panelists, but also from the audience, what your experiences are; please put it in the chat or let us know if you would like to speak, and we can give you speaking permission. Is there a change in any way? In my experience this has been a topic discussed so much in workshops and at conferences, in sessions and so on, and people have come up with ideas, and it seemed to me that there is a little bit of movement. But (a) do you see that, and (b) what can we do to go from this slow and limited progress to something that really makes a difference? So please raise your hand if you want to comment, because otherwise I'm not sure who's going to be talking. You have to raise your hand. Yes, Steve.
OK, I'll bite. This is a huge problem on many levels and, as I said, where I see
it a lot is in data synthesis, where data could be coming from a lot of different papers. I've been on many promotion and career committees and I've been involved in a lot of promotions, and I'm really looking forward to hearing from the people who are participating about what they think can be done about this issue. One of the things we can't do is go into a promotion and say, "Well, this person's actually done a lot more than it appears they've done; it's just that they're not getting credit for what they've done." So we need to figure out different ways for people to get credit. People talk a lot about how there's too much focus put on citations and things like that, and I'm not going to argue with that; it's true, I think. But the problem here is partly that people are not getting cited for their contributions, so at least one of the things we can do is try to figure out a way to make sure that people get cited for their data contributions. I just want to put that in there as one of the things that we can do.
There is clearly a technical problem. Just thinking of Lesley's presentation about the tens of thousands of analyses that were just Australia: if you have a global compilation of granites and you do data science on that, it is still a big hurdle to cite 2,000 papers and give credit. But maybe I should let Lesley talk, because she might be addressing this.
Not quite. I was just wanting to make the point, Steve, that I agree with you: we've got to work out how to get that citation attached to the data, almost at the analysis level, so it travels with the data. There's another group that's working on being able to cite a bucket of 10,000 analyses so that each analysis can be recognized and
traced. It's called a reliquary, and there's a group working on that. So, number one, we have to get to the point where, in a promotion, they can actually state that. The second thing, though, is valuing the people who do this work. Even if they can say, "Look, I did 10,000 analyses; go look at the GA repository and you can see how many rocks are from me," do people value that, or do they only value the science? That's where a real cultural change has to come in. The other thing: look at my career. When I gave a paper on geochemistry, I think a lot of people here didn't know that I was a serious geochemist, but I was the one who could do data science. Instead of being valued as a person who could do both, I was siphoned off to do all sorts of geophysics and paleontology and all sorts of other things, because that was what they valued, and it cut me off from doing geochemistry. We call these people the translators, the fringe dwellers, the people who have been working in both camps. They're the people we really need to find and value as well, apart from valuing researchers who look after their data. In my era a true scientist was one whose data and conclusions could be independently validated. We now no longer value the scientist who comes along with evidence-based research.
Let me just quickly comment on that in between. From my perspective it's been really interesting, as a theme chair at the Goldschmidt conference with a new theme on big data, to see the explosion of abstracts now coming in doing data science, artificial intelligence and machine learning on geochemical data. So I hope that this explosion, the interest in creating new science through the use of large volumes of data, will drive a bit more this question of how we deal with data access and how we give credit for the data. So please also put your opinions into the chat; we really would like to hear from everybody in the audience. So Katy,
please go ahead.
Thanks, Kerstin. I think there's a lot we can learn. Assessment for tenure, for promotion or for hiring has changed quite a lot in the last five to ten years with respect to things outside of pure publication. We're now recognizing more of the EDI work that people are doing, and I think this is an opportunity for us to learn from those adaptations: the recognition that people are getting not just for the number of publications but also for their work as a whole and what they're doing as part of the community. Hopefully we can start to raise this as a really important thing: we need to recognize that collecting and curating data is valued just as much as the publications. But I saw a comment about the challenge being that we need funding to be able to do this, and often it's not a priority for universities or research institutes to supply the funding to allow data to be made FAIR and to spend time doing that. That's something that also really needs to improve, and I don't think I have an answer for how we go about doing that. But we need to recognize that the generation of new science is not just the person writing the paper. It's how we collect the data, it's the analytical developments, it's how we curate that data, it's the database managers. It's all of these things working together that allows us to have these new insights and interpret our data, rather than just the final end product, and I think that's where we can maybe do better.
Thanks. I'm kind of distracted by the discussion that's already going on in the chat about how we could potentially move forward with citations. What I just said in my last comment there is that I think the technology is probably there: just including all the DOIs of the original papers that have contributed data to a given data set in the metadata of the data set DOI will allow us to generate statistics about how often data have been reused. From my experience the big challenge is the implementation. We've seen that with the recommendations from the Editors Roundtable that Steve was part of, where everybody agreed, "Yeah, that should happen," but the editors and the reviewers at the journals just didn't take the necessary steps, didn't feel that they had the power to enforce it, and also didn't have the resources and the time to look at everything pertaining to the data. So there's a lot, I think, in the process and in the resources that are available to make it happen. Yes, Olivier.
Yeah, it is an important issue to get the data from the paper, but many authors say "data will be available upon request", and that is not acceptable anymore. I did write that 15 years ago in a postdoc paper, and I think it is one of the wrong sentences I wrote in that paper. I say that because the data set was not mine: I worked in a big project, the data set was available on the project website, and that website is not available anymore, so we no longer have the data set. It is not the only example of that. A few weeks ago, or last week, a paper in medicine said that in their domain about 90 percent of the people who state that data are available upon request do not answer requests. So we need to work, as reviewers and as editors, to require authors to put their data in a repository before being published. I think it should
be mandatory in the future.
Thanks, Olivier. I see Steve, and there's also communication going on in the questions and answers part, the Q&A, with interesting questions. I will come back to that in a moment, but Steve now.
Yeah, actually I want to address the discussion in the chat about limits on the number of citations. I can see the value of limiting citations in the paper version of a paper, but there's no reason to limit the number of citations in a digital version. In fact, one of the things that I learned at a FAIR meeting that I went to a couple of years ago is that Nature, although it limits the number of citations that it publishes in the paper version of an article, now treats the web version as the official version of the article. I just don't see any reason why there should be a limit to the number of citations in the web version of an article, and if that actually becomes the standard, the official version of a paper, then all of the indices are going to be able to take those citations. So I think we have ways to help make sure that credit is given where it's due, even now, in ways that we're not doing now, and I want to advocate that that be the policy of all journals.
Yeah, thanks. Kirsten?
Yeah, I just want to add something to what you said, Steve, because I know that we have discussions about exactly this issue. I'm chief editor for Earth System Science Data, we have many data compilations, and we have this request very often. I know that even though it is only online, the typesetting has to be done, and at this point there is a difference whether you have 50 pages of citations to be typeset and checked for all the links or whether you have just one page of references. I don't think that we have to drop it; I think it is essential, and I always try to make sure
that this is happening, that everything is cited. But I can also understand a publisher who says we have to think about the limit of work, and if we really force everything to be cited, I guess this will be reflected in an increase of APCs, for example. What we can also do, and this is what we do at GFZ Data Services: I make sure that even for compilations I include all the references in the metadata. I think the highest number is 1400, for a huge compilation on global heat flow. At the moment the credit for data citation is not fully acknowledged, but this could be an option: to outsource the necessity to cite everything in the paper, but to make sure you have everything cited in the machine-actionable metadata on the data publication side.
Thanks. Yeah, I think it's an important point that the repositories can and should move forward, and maybe make these citation statistics available to the researchers in a very prominent way, to demonstrate how cool it is to have a lot of data citations. That might contribute to a cultural change. Yes, Leslie.
I'll just put a link in the chat. At RDA last week there was a group that involved somebody from the German climate center, Deb Agarwal from the Department of Environment, and Shelley Stall from AGU, and the publishers are involved as well. That group is trying to work out how you cite something when you've got two hundred, even a thousand, within a reliquary or bucket. And it would be fantastic that... oh, sorry, I've just got to... right, Alex has now put it to everyone. Sorry, I sent it for people to work with, and if this is so important for geochemistry, sign up. You can get all the links through that page I've just put in.
Yeah, Alex reposted it. So, for those of you who are on the line, RDA is the Research Data Alliance. It's an
international organization that promotes standards and developments in the research data ecosystem. Very important for our... I mean, this is a really interesting discussion and we really need to stay with that. I also think that it is important to get more of the publishers into this discussion, so we definitely need to reach out. Maybe Goldschmidt is a good opportunity, in 10 days: if you stop by at any of the booths that may be there (I'm not sure how many of the publishers actually participate in these exhibits now), it's good to bring the topic up as a concern of the community. I think that's really important.
I've seen Melanie has posted a question here that I would like to bring up to the panelists and to the audience. This is an issue that I, with the EarthChem Library as a domain-specific repository, am clearly struggling with, or dealing with, as well. Melanie posted this short story: during a recent consultation on data publications, a PhD student had wanted to publish his data on Zenodo. He was at first easily convinced that it would be better to publish it in a domain repository, but then there was so much work involved in getting it into the right shape, adding all the metadata that were needed and so on, that due to the lack of time and the pressure of publication (the research paper needs to go out, we need a DOI for that data set) it ended up in Zenodo. And this is an experience at the EarthChem Library too: if we really try to get the highest quality of metadata, sometimes people say, okay, come on, I'm done with this, I'm going to put it in Figshare and get my DOI anyway. So the question here is: is any kind of data publication better than none? When and where should we start to train researchers on good data documentation and the importance of sharing data? And this clearly also pertains to how we create the tools that
help us make it easy to keep all the information on data quality with the data, starting in the lab all the way down to publication. Yes, Steve.
Okay, I have a very strong opinion about this, and the answer is no. Anything is not better than none in this case, because if the data doesn't have the appropriate metadata that people can use to evaluate its quality, then it might be the best data in the world, but we can't count on it. So, in my view: we've been talking about credit and how we need to give people who generate the data credit for that data, even if it's being published in a synthesis of many other data papers. One of the ways that we can do it is to accredit specific data repositories or data sites. You can't publish anything in just any publication and get a citation that's recognized, right? The citation indices decide who they're going to recognize and who they don't recognize. We can do the same thing with data. If we do that, then if you publish your data in an accredited repository, you can get credit for that data. That would be a way to encourage or incentivize people to actually do it right. So I'd like to suggest that that's a possible solution to this problem. And once again, the answer is no, it's not okay to just stick it anywhere.
Anyone else from the panelists want to respond to this? Yeah, Katie.
I guess really just to agree with Steve. I think it's really important, and I think that's what's really useful about the domain repositories: you have the requirement for metadata. Yes, it takes more time, and I think that's part of why we need to change this. We're in such a competitive culture that it's all about getting the publications and then moving on to the next thing and the next, and there are always so many things to do that often, and you know, speaking personally,
when I'm like, I've got to try and arrange all these things and make sure that it's storable in EarthChem, then it falls off the list. But we shouldn't be thinking that it's too much work to do it. It's actually a way to make our data reusable. Yes, it's more work, but it means that our data is more valuable moving forward. So I think that's where we need to start thinking about it: as soon as we collect the data. As you're collecting the data, download the templates that are available on EarthChem and start to populate them from there. There are stages that we can do to minimize this work, rather than having all of your data ready to submit the publication and then you're like, oh, I need to deposit it in a repository. I think that's where we can change the process, and that's what we can all be working on individually.
Yeah, great, that's really important. Steve, then Olivier, and then Dominik.
Yeah, I just want to say it's already so much work to get a project from beginning to publication, and it's so much work to generate the data. It's just got to become part of the process, right? The reason why people don't want to do that is because it's not part of the process. The objective is to get your publication, and once your publication is accepted, the incentive is gone for a lot of people. So we have to basically make it part of the process of publication. That's it.
Thanks. Olivier.
Yeah, in France we have a new law for open science, I guess it is four years old, and it is now mandatory for PhD students to consider data sharing in their project. They receive a kind of passport of open science, where part of the passport is on data sharing. I can share the document, and it is very good. But the main issue is that principal investigators do not know anything about data sharing. We request from new researchers to do what we
cannot do, so we have to improve all together. It is a small start, but it is a start.
Excellent. I have Dominik and Leslie on, but I would like to come back to that topic of education, of introducing training and awareness of data management very early in the educational process. So let's keep that and come back to it. But Dominik now.
It actually fits into this, also into training.
Oh, excellent.
So maybe just a very brief introduction. I'm from Frankfurt, and in Germany we just started the National Research Data Infrastructure, funded by the government, and there was a lot of money given for the next 10 years to do this in various consortia. I'm also representing, a little bit, one of these consortia, the National Research Data Infrastructure for the Earth sciences. So, to come back to this: one of the issues with giving credit, or how do we get people to input data into repositories or databases, why don't they want to do it, and why is it complicated? I think one thing behind this is that we should maybe recognize it, and this should be part of this cultural change, as a separate method, so that it is not part of something but is something of its own, and therefore deserves to be treated as something of its own. This is why it requires this lot of work. If we look back, for example, 20 years ago, something like isotopes came up and isotope measurements became big. They're a bit older, but it became bigger and was recognized: this is a new method, this is something we should focus on, and this gives us new information. I think it is something similar with data, because there's the pure knowledge that data provide, but there's also an understanding that the entirety of the data provides, so something really new comes out of looking at all the data and getting some new understanding. This is sort of like a new field,
and if we recognize it as a method, then it needs all that methods have. It needs the training, the education for this method: how do we treat data, how do we assign metadata, how do we store it, where do we store it, and how do we retrieve the data; what tools, what programming languages, what interactions and so on do we need to work with these data. So this is not just something we attach to what we are doing. This is standing on its own feet, and I think this is what we need to recognize. To me this is the cultural change, which is actually also the buzzword within the NFDI. I think this is quite important and quite comprehensive. It is not just a single thing, that we need credit or that we need to get people to do this; we should recognize it as something like, let's call it data science or whatever, which is currently a new word that's coming up. I think this is really what we need.
Thanks, Dominik. Leslie, you're following on this?
Yeah, I was just going to say the other throttle point is actually research grants. I know of some systems where you only get 80 percent of the money you're allocated, and you get the remaining 20 percent when the data that was part of that project is actually in a public repository and what I'd call accessible. I think BCO-DMO does that. And that, as part of grants, particularly early career grants, is starting to train people that there is an overhead to managing and looking after data, and that yes, you budget for it. But at the same time the grant funders have got to start putting in these requirements and recognizing that it is a valid cost. I know that in Australia for something like 20 years you could not get any funding in a research grant for compiling data, and then they wondered why they didn't have any data accessible from the grants that they'd funded. So anyway, I'm just sort of saying
we're talking about publications, but let's not forget there's a bit of a stick there in the grants. But let's have lots of carrots, because if you have too many sticks, everyone goes and disappears and does something else.
Well, it's interesting that you say that about the grants. Grants often also have data that never sees a publication because things were not what was expected and so on, but these data are as valuable for others as the published data in many ways. It brings me to the question that Denise Hills put into the Q&A here: how should we handle credit for data that is generated but not explicitly tied to a formal publication? I think the answer here from Dominik is important, that the way data sets are now published in repositories with a DOI, they have a real citation and can be cited independently of a publication. The problem quite often is still that publishers don't accept those data citations in the references. So again we're coming back to the need of having a dialogue with the publishers and bringing this up again. Does anybody else want to say something to that particular aspect? Yes, yes, yes, and I didn't see who was first, sorry.
Okay, I can answer to Antony. I understand his question, because I used some geochemical databases from others and used to improve the database by compiling some other constants from the literature, and I used to cite the original database and then cite the other papers that I compiled into the original database. But it is a huge work, and my work is not published, in a way, because my new database is not available. I used to share it with some colleagues who ask, and I give them my database, but I do not have credit for that. I do not need to have credit for that, but maybe we can work on some interoperable databases that are shared. I don't know how, but maybe we can do
something like that.
Thanks, Kirsten. I think the problem of getting credit for data cannot be restricted to the publishers or the researchers; I think it's the full system, and we have addressed many facets of that. Not only the publishers: the funding agencies could really make sure that data are published. I was even talking to the German Research Foundation once and said, oh, I'm really happy that you have an open data policy now, but how do you make sure that what people are promising in their proposals actually happens? Because it would be so easy to check, for the next proposal, whether the data promised to be uploaded to a repository is available. But they said they don't have the manpower to do this. I think this could be one way. The other are definitely the publishers, but I think since COPDESS this has changed a lot, and most journals accept data citations. But still, and I have many of these conversations with researchers I help publish data at GFZ Data Services, they say, no, I don't want the data to be cited, because I don't get any points on my h-index for citing data; I need the paper to be cited. And I say, well, then why don't you cite both? You cite the data when you use the data. But this is still so deeply in the heads of this full research system, that the only credit you get is via high-ranked publications, that we have to really think broadly and bring the researchers and the publishers and the funders onto the same page, I think.
Yeah, and that brings up, in some way, the question: how do you get the journals to change? Many of the journals are actually associated with professional societies, so we do have AGU, we have the Geochemical Society. What can the role of the societies actually be? Is there a process that we can potentially initiate for the
societies to take a more active role in the world of data, providing recommendations or endorsing recommendations? I remember that at my very beginning in interacting with the Geochemical Society, way back in 2005 or 2006, I forget, there was a Goldschmidt Conference in Melbourne where we actually had a president of the society who was very supportive of data efforts, and the Geochemical Society created a policy statement on data that has never been updated. It's been there since 2006 or 2007, and it would need a real focus. So how do we get the societies to be more active? I know AGU is very focused on data activities these days, but they are at a higher, Earth-science-wide or even science-wide level, not specifically focusing on geochemistry. Anyone? Yeah, Dominik.
I'm not sure the societies can necessarily endorse something, but I think having societies as part of OneGeochemistry would be fantastic, just as supporters, to show that they are there and that they are supporting this, and that this is where they can direct people to. So they can say: we support OneGeochemistry, please endorse what they come up with, in whatever way, hopefully in a more bottom-up process, of course, so that everybody is part of this. I think this is how they can contribute significantly, and I think this is how we could reach them most easily. So I would be hopeful, and I'm optimistic that they would support something like OneGeochemistry. I think this would be a good way, and then they can just point to it: please, this is a good way to go.
Yeah, I think we have actually had, and you were part of that I think, a discussion in OneGeochemistry on writing a short paper for Elements or for one of the communication channels to those communities and to the societies, to create awareness and attract attention. I think it's really relevant that we are
in a different world than 10, 15 years ago, and data needs to be on a society's agenda. This is my opinion.
I actually see that we went long. I had expected we would finish early, but it was a great discussion, and I'm really happy to see the ongoing discussions in the Q&A and in the chat, and we've really had fantastic participation by the panelists here today. We have 10 minutes left, and I thought I would use that time to get just a one- or two-sentence takeaway from the panelists, and also ask members of the audience to put into the chat their takeaway from this session, and maybe an action item: what next, what would you like to do? Let's join in, let's create more discussion on an ongoing basis through OneGeochemistry. We also have these conferences where we can get together in person again now, with some virtual participation, and keep the dialogue going and have that communication to the relevant implementers, to the funders, to the publishers, to the societies, really pushed from the community, because that's very often what they want to hear. So I'm going through the list of the panelists and asking for your feedback the opposite way around. Olivier, can I ask you for your takeaway message?
Thank you. My takeaway message is that we still need to improve, and I will continue to advocate for more open science, including open and FAIR data. And my last word: just do science right. Thank you.
Very good point. Thank you, Olivier. Leslie?
I'd agree with Olivier, in that once I was asked by a CEO of my former organization, how can we fix data in the organization? I said, fix the science; get the science back to being data and conclusions. But also, being a senior citizen, alias an old fart, I'm actually feeling we're getting the movement. I think of the people you see here, people like Katie being so passionate, and all you younger people: you care about it and you know about it. I think we're going to get
there now, whereas for the last 20 or 30 years there just have not been voices advocating for this. They're now here, and more importantly they're in the younger generation. So keep going, because I haven't got much longer to go, I can assure you.
Okay, well, we're very grateful that you managed to stay up that long. It's a real challenge to be online until, what is it, getting towards midnight for you, right?
Yes, don't worry about it, I'm getting used to it.
So, next one. Steve, you have some thoughts at the end of the session?
Yeah, I guess a couple of takeaways. One is that we really need to ensure that the data producers get appropriate credit. One of the new ideas that came out of this meeting, that I hadn't thought about before: I think it's a big issue that there are places where you can just dump data. Thank you for bringing that up, Denise Hills. I think that this is something that we have to deal with, and one possible way to make sure that people contribute their data to the right places is to figure out how to accredit appropriate data repositories, whereby if the data go into those repositories they will be cited by the major citation indices. Then people can get credit if their data are cited. So I think that is a way, or one of the steps forward, to ensure that data producers do get credit. The other really important thing, I think, is that we need to make submission of the data part of the publication process. If we do that, then just like all of the other things that we do going from start to finish, which is a long, arduous process, this will be part of getting published, and therefore people will be okay with doing it. Those are my final thoughts.
Excellent. And Katie, you have the last word.
Oh, no pressure then. I guess I've just found this debate really informative, and I think it really highlighted how it's kind of a
multifaceted challenge, and no single solution is going to fix everything. But the discussions that we've had, and I really liked what Leslie said, that it can't be all sticks, we've got to have carrots there too: that kind of multifaceted approach is something that we need. I think the more we have these discussions, the more we shout about these kinds of issues and the challenges outside of data management sessions, the more they're integrated into normal science, even though they obviously are; this is data, you can't have science without the data. So I think that's a really positive step forward. And I think you said, Kirsten, that there are sessions at Goldschmidt where we're reaching out to everyone who collects geoanalytical data, not just people who are interested in its curation. I think that's something that we need to change: the more people talk about it, the better informed people are going to be, and the easier we are going to be able to integrate it into our standard work plan. Just like Steve said, it just should become routine, and it's about changing that attitude so that it's not extra work, it's part of the work. I think that's really important moving forward. But I just wanted to thank everyone for the really interesting points that they've raised and the potential solutions that everyone has discussed.
Thank you, Katie. That was a great end to this debate, I think. I wanted to thank the panelists and obviously the conveners, because getting this great debate into the program at EGU was a major step. I think the discussion here has shown what a large range of topics, challenges and concerns are there, and I think we potentially started a movement here with this debate. I definitely hope so. In that context, I wanted to remind everybody that we had put the Slack channel of OneGeochemistry in the chat. It's a way to keep the discussion going, post
questions, and reach out to those who may have some answers in research data management and so on. Again, there are sessions at the Goldschmidt conference, and there are workshops that you can still sign up to; if you want, you can participate virtually. I know with the meeting being on Hawaiian time it's a challenge for some people. But we're going to continue, be it at the next AGU meeting or GSA and EGU in '23 and so on, to keep the discussion going. Again, thank you to all the panelists for this fantastic debate, and thank you all in the audience for contributing to the discussions and for listening in. I hope to see you all again. Thank you so much, and take care.