Welcome, everyone, to the "Planet Research Data Commons and the Impact of Being Involved in the Research Data Alliance" webinar. We've got a great series of talks here today from projects and partners that have been involved with the Research Data Alliance and also with the Planet Research Data Commons work. This webinar is part of a series of themed webinars that extend all the way through 2023, celebrating the 10th year of the Research Data Alliance operating. This month, June, has a focus on agriculture and environment, and there are five or six other webinars being hosted worldwide this month on those topics. This webinar is being live-streamed, obviously, but also recorded, so people can access it and look at it in the future.

I'd like to start this talk by acknowledging the Ngunnawal and Ngambri people, the traditional owners of the land on which I am situated today. I'm based in Canberra in Australia. I also wish to pay my respects to elders past, present and emerging across the lands and countries of the world. I want to mention that we have just completed a national consultation looking at the needs of researchers within environment and earth sciences, and one of the key data challenge areas that came through, loud and consistent, was appropriate recognition of and appropriate work with Indigenous data governance and knowledge. So that was a loud and consistent finding of the Planet Research Data Commons consultation.

Today, as I mentioned, we've got a great panel of speakers. We have Dr Andrew Treloar, the ARDC's Director of Platforms and Software for two more days; Dr Elizabeth Wenk from AusTraits; Dr Lesley Wyborn, Honorary Professor of Geoinformatics and Geosciences; and Dr Chris Bahlo from the Centre for eResearch and Digital Innovation, talking about the AgReFed FAIR data tool. These talks are really focused on FAIR data and some of the outputs and impacts that the RDA has enabled in research within Australia, and this is really a celebration of the work that has happened within the Research Data Alliance. As Benny can probably attest, I've been a member of the Research Data Alliance for seven or eight years, and getting the honour to present this webinar has been really heartfelt. It's also made me reflect on the work that the Research Data Alliance has enabled and the impact it has provided in terms of research infrastructure development and solving real-world challenges, solved through the working groups and interest groups that the Research Data Alliance hosts and maintains, providing the culture and the function to enable that to happen. But whilst that is very important, and has led to outcomes and endorsed standards being developed, I was reflecting that some of the other impacts of the Research Data Alliance are the great networks, the knowledge transfer through those international structures, and how that gets translated into research.
For me personally, my work with the RDA has led to a great network of like-minded colleagues and peers that I am able to talk with, share ideas and problems with, and get solutions from much more rapidly than if the Research Data Alliance wasn't in place. I think that's actually a really good summary of some of the impact the Research Data Alliance provides, and it's across multiple domains and multiple disciplines, not just one particular domain or discipline, all working towards the same thing.

One program in particular that the Atlas of Living Australia and the ARDC have recently released is a national framework for access to restricted-access species data. This is data held on species which is sensitive in nature: either a conservation species that's endangered, or data sitting on private landholdings. Through the RDA network we were able to determine which other countries had completed similar work involving research, government and industry, and to utilise some of their structure and thinking, bringing it back into Australia to make this project work and to supercharge how the program was delivered. I think that's a really good example of the power of the Research Data Alliance.

I also wanted to reflect that the Research Data Alliance has helped us build up our capability, our digital literacy and our understanding. Two weeks ago the Planet Research Data Commons hosted a workshop in Perth where we had 100 participants from research, government and industry looking at the topic of trusted data and information supply chains: thinking through how we actually manage a supply chain of data and information all the way from collection through curation, integration and analytics to the use cases of state-of-the-environment reporting, environmental impact assessment and research outcomes. The fact that we were able to have a fruitful, dedicated conversation for two days on a topic called data and information supply chains is, I think, a testament to some of the work the Research Data Alliance has done and the general understanding that has been developed throughout the community, in areas other than just research, so government and industry as well.

So, the Research Data Alliance. This slide shows that it's a global member organisation; any individual can join and register. We have a link there to enable that if you're interested, and I do encourage you to follow that link. The Research Data Alliance has also had 64 flagship outputs, including eight ICT technical specifications. Those have come from the working groups looking at particular issues within research data and the ability to make it FAIR and to share it, and that's led to those 64 flagship outputs with 200-plus adoption cases across multiple disciplines. Within Australia we have a vibrant community of participants in the Research Data Alliance: 785 members across Australasia, with 683 in Australia. We have 32 group chairs, so 32 people who are actually leading and driving some of those working groups and interest groups and leading that impact back into Australia, but also worldwide. We also have a community development manager, Dr Catherine Barker, who's on the call, and we have two Technical Advisory Board members and one Council co-chair.
We have great representation in the Research Data Alliance from Australia, and it is really a worldwide phenomenon as well, where we interact with our international colleagues, both sharing and bringing back all that knowledge and information into Australia. So at 10 years, I really look forward to what the next 10 years hold.

Alongside the Research Data Alliance, we are talking here about the Planet Research Data Commons, and this is a new program of activity that the ARDC, the Australian Research Data Commons, is embarking on to integrate data across multiple domains and multiple disciplines within the planet data space. If I can just move on, Kerry, if that's okay. Yeah, thank you. The new Planet Research Data Commons concept builds upon the work that's happened within the Research Data Alliance and some of the projects that we'll be presenting through this webinar, and it's really looking at those research and national priorities and challenges. What we've found, and what we're wanting to achieve, is a connected and integrated data system across multiple earth and environmental systems, allowing us to address some of the national challenges posed across Australia. We've seen the pace of change accelerating due to climate change, urban population growth and the transition to renewables, and we've seen that within certain use cases: bushfires, floods, adaptation, geohazard events, and also new methods of capturing, monitoring and valuing biodiversity and environmental credentials through market-based approaches. The Planet Research Data Commons will be addressing those issues, but also bringing in research, government and industry to tackle these problems.

We see that the integrated, dynamic systems needed to look at cumulative impact over a region, and to predict how the environment will change based upon a variety of factors, need FAIR data; they need trusted data and information supply chains; they need networked modelling, analytics and decision-support infrastructure; and they need integrated FAIR and CARE datasets and services. And that needs to be built upon a backbone of knowledge and understanding: people understanding what the data methods are, how we integrate across different disciplines, what the needs are, and how you network these models together. The Planet Research Data Commons is just about to launch, really looking at that knowledge infrastructure, and these are the aspects across multiple spheres that need to come together to answer some of the questions posed by a changing climate.

We'll be working on four integrated program activities. First, trusted data and information supply chains: what accreditation is needed to establish trust and provide procedural mechanisms of trust between our different data providers, across different disciplines and different sectors, enabling data to flow much more freely than it has in the past, and looking at mechanisms such as the International Data Spaces concepts to make that work at a data level, not just at a system level. We'll also be funding programs within integrated FAIR datasets and services, knowing that domains, and work across domains, need effort to make their data FAIR and interoperable, but also looking at the standards and interoperability standards required to enable that to happen at scale.
We'll also be focusing on modelling, analytics and decision-support infrastructure: looking at the areas that take that data, combine it into a coherent stream and make sense and use of it, providing national infrastructure to support those functions within research, government and industry. And last but not least, as I opened this conversation, Indigenous knowledge management and governance is an important factor, and providing CARE implementation across all our research infrastructure is a priority of the Planet Research Data Commons. We see these programs operating in an integrated fashion, addressing national priorities and challenges with a regional focus to begin with, and increasing interoperability and integration across all our domains. So I'll leave that there and say thank you very much. We're going to hand over to our next panellist, Dr Andrew Treloar. Andrew is going to be talking about the Global Open Research Commons Interest Group, or GORC, I believe. Thank you, Andrew.

Thank you for that introduction. I'm also delighted to celebrate 10 years of the Research Data Alliance, in particular because I was one of the group of Australians who helped form it over a decade ago, through many, many late-night video conferences. So it's just wonderful to see the way it's grown and developed. However, before I go through a relatively short set of slides, I have to say I feel a little bit like an imposter: although I was briefly responsible for the early stages of the Planet Research Data Commons, I do not have a background in anything related to any of the discipline groups. So just take what I'm saying with a grain of salt. What I want to talk about is an instance where Australia is both contributing to and learning from international best practice. Next slide, please, Kerry.

So the Global Open Research Commons, with the attractive acronym GORC, we didn't think that one through, I think, is trying to think about what a commons means. This particular interest group formed before Australia started its Research Data Commons activities, I think even before we had the Australian Research Data Commons. There are a number of these sorts of initiatives around the world. What we were trying to do when we first started was say, well, what are the things that are, pun probably intended, common across the different commons? We also wanted to see how to network better between what was happening inside the Research Data Alliance and some other international activities, in particular those run through CODATA, but also through the World Data System. And so we established this interest group to try and really map out the space. Next slide, please.

We started off at Plenary 14 in Helsinki, where we had an initial interest group session after a series of birds-of-a-feather sessions at previous meetings, and we've presented at every plenary since then. In fact, as part of our work we've spun off from the interest group an international model working group, which I'll talk a little bit about at the end. And for those of you that are interested, we have a web page that describes our work. Next slide, please. So we tried early on to define what it was that we were going to focus on. And we said, look, what we're really trying to build, or at least what we're trying to build with others, is this notion of a global trusted ecosystem.
So this trust idea that Hamish has talked about already: an ecosystem of services and outputs and seamless access. This is all very high level and aspirational, but that was what we were shooting for, drawing together things like the Planet Research Data Commons in Australia with like-minded or similar activities overseas to build towards this global trusted ecosystem. And the strapline, if we ever print bumper stickers, would be something like "digital research resources for the common good". Obviously, we weren't doing this from scratch. When we began, we were informed by a number of things already underway: work that was being undertaken by the National Institutes of Health in the US; the European Open Science Cloud, that's the best known; some work that had been done by the Canadians on the National Data Services Framework; the World Data System; and so on. Next slide, please. So we tried to pull this together. We tried to work out, across these existing commons, what are the common elements you could extract from international best practice, and how could you come up with something that would structure collaborative activity between these commons. Next slide, please.

And so the result of all of this is what we call the essential elements of a research commons, or an open research commons. The idea is that these would guide people who are trying to build commons themselves from scratch, or trying to consider how their commons might interoperate across discipline boundaries or across geographical boundaries. In the centre of this is the notion of interoperability and standards: unless you are focusing on interoperability and using standards, trying to build a commons is going to be extremely difficult. Across the bottom, the bottom three hexagons, are the more, I guess, technical elements of the commons: the compute that you need to process the data, the storage in order to store the data, the networks to connect them, and the access and authentication infrastructure to enable you to get to it. Then the research objects, and you'll note that we don't talk about research data, we talk about research objects. There is a range of different things that you can regard as outputs of research. Data are some of those, but so are workflows, so is software, and obviously so are publications. For us, these are all research objects that you need to manage in the commons. And then the services and tools that you need in order to produce the research outputs and work with the compute and the storage and the networks.

Often descriptions of commons stop there. But what we said was, well, no, there's actually this entire overarching set of, I'm trying to avoid using the wrong word here, I need to come up with a better one, non-technical aspects of the commons. So human capacity: you need people with the skills to use the commons, but also the skills to build the commons. You need, and this is an idea that we borrowed from the EOSC, rules of participation and access: how do you decide who is able to contribute? How do you decide who's able to use the resources of the commons? How do you decide who gets access to the sensitive data that Hamish was talking about? Governance structures to enable you to structure the commons and make sure it's going to work; engagement with the researchers to make sure they're aware of what the commons provides; and of course, if you want to get people to use your commons, you need to make sure that it's sustainable.
Next slide, please. So that's the Global Open Research Commons Interest Group, and that diagram and the definitions behind it have just gone through a community comment process as part of the path towards becoming a supporting output. And, sorry, can we go back one, please? We'll be taking the comments into account and refactoring it. At the same time, yeah, okay, next slide, please. At the same time, we've also spun up an international model working group, which is essentially looking at the attributes of lots of different commons across the world, and we're trying to abstract from that the common features of the things that people are currently building. Next slide, please. And so these two groups are going to inform one another: the interest group has tried to do a synthesis task, the working group is doing an analysis task, and those two things, I hope, are informing each other. Next slide. So this is a list of some of the kinds of commons that have been involved so far; some of those are disciplinary, some of those are nationally based, a reasonably diverse range. Next slide. Actually, that might be the last slide. So the interest group will be finalising its outputs, that diagram and the definitions, and the working group will be presenting its results at IDW 2023 in October this year. Thank you very much.

Thank you very much, Andrew, and a very topical discussion in regard to this talk here today. I think I will join that working group after this session. Please note that if you have any questions for our panellists here today, please put them into the question and answer section, and we'll endeavour to answer them at the end of the presentation. This will be made publicly available as well. Andrew has to leave, but thank you, Andrew, for that presentation. I'm going to move on now to Elizabeth Wenk. She is going to provide a presentation on AusTraits and vocabularies. Thank you, please.

Hi. So I'm Lizzie Wenk, the project manager for AusTraits, and what I'm going to present on today is the AusTraits Plant Dictionary, one of the spin-offs of the AusTraits database project. I'm presenting on behalf of the entire AusTraits team, scattered across Sydney universities and beyond, and I'm thankful to the ARDC for providing investment. Oh, and it's not moving forward. There we go. I would like to begin by acknowledging the traditional custodians of the lands throughout Australia on which the AusTraits data have been collected. I'm speaking today from the lands of the Darramuragal and Dharug people and pay my respects to their elders both past and present.

Now, most of my presentation today will be about the AusTraits trait dictionary, but I want to give a brief overview of the database first. If you think about any organism, this eucalyptus tree here, there are three key pieces of information about it to be documented: it has a name, it occurs in a location, and it has characteristics, traits. For Australia, the resource for plant taxonomy, the names, is the Australian Plant Census. Location occurrence data is documented within the ALA, the Atlas of Living Australia, and AusTraits compiles the trait data. AusTraits, the trait database, was first publicly released a little under two years ago, with a concurrent release of the database on Zenodo and a data descriptor in Scientific Data. The database has continued to grow since then, and we now have more than 380 datasets from more than 250 contributors.
We're approaching 2 million individual trait records within the database, with some information for more than 500 traits and nearly every one of Australia's more than 30,000 plant taxa. The graphic in the middle shows that trait coverage is patchy across these traits. For some traits, plant growth form, fruit type, leaf shape, we have nearly complete coverage of the Australian flora, while for other traits we might have representation for far under 100 taxa. Overall, AusTraits works by merging these individual datasets together. Here I have the data for leaf phosphorus per dry mass. In sum, AusTraits includes data for over 1,100 taxa, but this exists only because people have contributed their datasets: we have 48 individual datasets, most of which have data for fewer than 20 taxa, but they have all been merged together, and we are very thankful for the contributors across Australia's research community.

Moving on, the AusTraits workflow takes two key components for each of our datasets: a data file in CSV format that includes the actual trait data in tabular format, and a structured metadata file. These are merged together, for all 380 datasets, into our single database. Our workflow is open source and available for others to reuse. The metadata file is a core component of this. It encapsulates the information needed to align taxonomy, add location and context information, add methods information and exclude bad data. And here is where our trait dictionary comes in. The trait dictionary includes the following information for each of those 500 traits: a trait concept, so a name and a description; the best-practice units; allowable ranges for numeric traits; and allowable values for categorical traits. This information is then woven together with the data file and the structured metadata file, ensuring that each column in the data file is mapped to the correct trait concept, trait data are aligned to the correct units, data that fall outside the allowable range are excluded, and substitutions can be added to match a trait value within the data file to its aligned term within the trait dictionary.

When we first started building the AusTraits database more than five years ago, we searched around for an adequate trait dictionary somewhere globally that we could repurpose and reuse within AusTraits, one that had those four core components at a minimum, and no such dictionary existed. Nothing had the breadth of coverage; nothing had definitions that were explicit enough. So it's not just trait concepts, units, values and ranges that are required. To go a step further, it's semantically clear trait concepts, transparent best-practice units, verified allowable ranges and carefully curated lists of allowable trait values. So what do I mean by this? What is a trait concept? You're probably mostly familiar with a taxon concept: a group of organisms that share a common evolutionary history, share morphological characteristics, have been designated by the taxonomy community to be a single unit, and have a name which is widely, hopefully universally, reused. Trait concepts should be the same. They should delimit a collection of trait values pertaining to a distinct characteristic of a specific part of an organism, and researchers worldwide should use the same label. Unfortunately, this doesn't yet exist, which might be part of the reason such a trait dictionary has not existed.
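Stepping back to the merging step just described, here is a minimal sketch, in Python, of the kind of check a trait dictionary enables when a contributed CSV is merged: mapping a value to a trait concept, converting to best-practice units, and excluding values outside the allowable range. The trait names, units, conversion factor and ranges below are invented for illustration; they are not the actual AusTraits dictionary entries, and the real workflow is considerably more involved.

```python
# A minimal sketch of trait-dictionary-driven validation, with hypothetical
# dictionary entries: units and an allowable numeric range, or a set of
# allowable categorical values.
TRAIT_DICTIONARY = {
    "leaf_mass_per_area": {"units": "g/m2", "allowed_range": (1.0, 3000.0)},
    "plant_growth_form": {"allowed_values": {"herb", "shrub", "tree", "climber", "fern"}},
}

# Hypothetical conversions into the dictionary's best-practice units.
UNIT_CONVERSIONS = {("mg/mm2", "g/m2"): 1000.0}

def align_record(trait, value, units=None):
    """Map one raw observation onto the trait dictionary, or reject it."""
    entry = TRAIT_DICTIONARY.get(trait)
    if entry is None:
        return None, "unknown trait concept"
    if "allowed_values" in entry:  # categorical trait
        if value not in entry["allowed_values"]:
            return None, f"value {value!r} not in allowable values"
        return value, "ok"
    # Numeric trait: convert units first, then check the allowable range.
    number = float(value)
    if units and units != entry["units"]:
        factor = UNIT_CONVERSIONS.get((units, entry["units"]))
        if factor is None:
            return None, f"no conversion from {units} to {entry['units']}"
        number *= factor
    low, high = entry["allowed_range"]
    if not (low <= number <= high):
        return None, "outside allowable range, excluded"
    return number, "ok"

print(align_record("leaf_mass_per_area", "0.09", units="mg/mm2"))  # (90.0, 'ok')
print(align_record("plant_growth_form", "brb"))  # rejected: not an allowable value
```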
Seed mass might include or exclude the mass of a dispersal appendage. Leaf shape might simply record the basic width-to-length dimensions, or perhaps also curvature. When somebody talks about vessel diameter, might they be referring to the diameter of a single vessel, or should this trait only refer to the average of a larger sample? So, jumping forward, we required semantically pure, explicit trait concepts, trait definitions and trait values, and we went about this in three ways for all 500 traits within AusTraits. We held workshops where we brought together experts who used the same trait concept in quite diverse research agendas. We tapped experts on the shoulder, asking them to review other traits. And the AusTraits team partakes in pretty much continuous review of our trait definitions.

As an example, when we started, here are three traits we had. One documented the parasitism status of a plant. One was the plant's growth form: is it a herb, shrub or tree? But in fact we had about 50 terms that had been submitted to us. And stem growth habit: how does a plant stem explore three-dimensional space? We weren't very happy with how terms were mapped to these, and through discussion we cleaned them up. We went from that brown, muddy plant growth form to five really semantically clear definitions. Plant growth form had included where the plant grows: is it aquatic, or an epiphyte? That is now its own trait, growth substrate, as is succulence, and there are much tidier, cleaner, shorter lists of allowable plant growth forms.

So jumping back to this: that was one part of it, documenting clearly the information that was required for the AusTraits workflow. But there's more to really building a best-practice vocabulary. We wanted to link, for each of those traits, keywords and hierarchical trait groupings. What structure is being measured: does that trait refer to a leaf, to a flower, or to bark? What characteristic is measured: are you documenting mass, force, or colour? Adding references whenever possible: is there a trait handbook that describes this trait? Is there a paper that champions it? Who has reviewed the trait definition? And then, as the others have talked about, this idea of being interoperable far beyond just supporting AusTraits, building a dictionary that helps integrate research worldwide. So we included links for each trait, where relevant, to other trait databases worldwide.

We have now finished this, and just two weeks ago released the AusTraits Plant Dictionary in machine-readable serialisations in three different locations. We have a registered namespace with w3id.org, and each of those 500 traits has a unique resolvable identifier that leads firstly back to our GitHub repository. Here's a web page where all 500 traits and their metadata can be explored in a nice human-readable format, but which also provides links to the machine-readable serialisations, Turtle files and triples. These files are also now in a Zenodo repository and at Research Vocabularies Australia, so there are multiple portals where they can be explored. So why is this important? Actually, before I go on, I want to jump back and say that this last step would never have occurred without the ARDC's investment.
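For readers who want to explore those machine-readable serialisations, here is a minimal sketch using Python's rdflib to load a Turtle file and list concepts. The file name is a placeholder for a local copy of one of the released serialisations, and the SKOS predicates are an assumption about how labels and definitions might be encoded; check the released files for the actual predicates used.

```python
# A minimal sketch of consuming a Turtle serialisation with rdflib,
# assuming SKOS-style labels and definitions (an assumption to verify).
from rdflib import Graph
from rdflib.namespace import SKOS

g = Graph()
g.parse("APD.ttl", format="turtle")  # placeholder path to a downloaded Turtle file

# Print each concept's preferred label and definition, where present.
for concept, label in g.subject_objects(SKOS.prefLabel):
    for definition in g.objects(concept, SKOS.definition):
        print(concept, "|", label, "|", definition)
```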
So we're plant ecologists; we could go and write best-practice definitions, but without the constant encouragement and help from those at the ARDC, especially Rowan Brownlee, we would never have gone the next step of figuring out how to convert our spreadsheets into these machine-readable representations. So why is this important? This is the first plant trait vocabulary that ticks all the boxes below. As I said, every trait has a permanent resolvable identifier. The entire vocabulary is machine readable. It is focused on ecological traits. It offers links across trait databases. It includes best-practice units and allowable ranges or values for all traits. It includes the vast majority of commonly reported traits. And importantly, it is easy to expand upon. The data that is compiled into the RDF serialisations is a series of spreadsheets. You have to populate the metadata for a new trait, but beyond that it is as simple as adding another row to the traits spreadsheet.

But it is also important well beyond that. This is now a trait dictionary that can be reused by others. We are hoping that trait databases worldwide pick this up and use it to underpin other resources. We also now have a workflow, out on our GitHub repository, that others can use to build additional trait dictionaries. So if somebody wants to build a trait dictionary for a different collection of traits, for a different organism, they can use our workflow to go from the spreadsheets that ecologists are familiar and comfortable with to the RDF serialisations. I want to say very much that this is a group project. I'm the project manager, but I wouldn't have gotten where I have without our broader team. Thank you.

Thank you, Liz, and a great outcome there, with a problem set which is very complex and very varied as well. An excellent application of the outcomes of the RDA to the trait database. Thank you. So next up is Lesley Wyborn, and she'll be talking about Geophysics 2030. Over to you, Lesley.

Okay. Just slideshow and here we go. Can everyone see it? Yes, we can. And where's the laser pointer? Right. Okay. So this is the project, and these are the people participating in it. It's funded by AuScope, NCI, TERN and the ARDC. I'd first of all like to acknowledge the traditional owners of the lands on which we meet, and pay my respects to their elders past, present and future. So it's a collaboration, as I said, between AuScope, TERN and NCI as part of the cross-NCRIS national data assets program, where we were trying to integrate datasets across all of us. But I guess we raised the bar, because we called it 2030: we want to make national-scale, high-resolution geophysics datasets suitable for programmatic access in these environments, and lay the foundation for more rapid processing and scalable data-intensive computation, including AI, ML, et cetera. And the important thing is that the project was about positioning geophysics data collections on internationally competitive research infrastructure, to enable Australian researchers to be internationally competitive. Now that's a bit of a mouthful, but this is really what a focus of NCRIS is: ensuring researchers have access to cutting-edge national research infrastructure. So what do we know about 2030? We know that it will be at exascale, and the key point is our data volumes will be in zettabytes, which is ten times more than we've got today. So if you can't handle what you've got today, you'd better start thinking about the future.
But what we do know is that it'll be mandatory for data to be fully machine-to-machine accessible, as envisaged by the FAIR principles in 2015. So one of the first datasets we went for was magnetotellurics, MT for short, which measures the conductivity of the crust. It was a collaborative project between the research community, the surveys and some research organisations, and it aims to measure conductivity at approximately 3,000 sites across Australia. This is a massive dataset, because each station, where you leave the instruments out for, say, six months or more, collects around three gigabytes of data. And the critical factor, the amazing thing, is that you can use this data for groundwater, mineral and energy resource assessments, but at the same time the same dataset is critical for predicting Australia's vulnerability to solar storms. If you don't know what that means, that's related to the Carrington event, where solar storms can actually wipe out your power stations and your electricity lines, et cetera. So there's increasing awareness, given our dependency in the modern world on these infrastructures, of how vulnerable they are. And that's what this dataset is about.

So we start out and we collect our raw data, and this is in gigabytes to terabytes, as I said. We go through the processing stages, through what we call level 0 and level 1, and then at level 2 we get to where the data is actually more accessible; these are usually called analysis-ready datasets, and later on come the models. Now please notice that above the red line you're in gigabytes to terabytes; below that you're in megabytes. And the important thing is that most people don't have the infrastructure to deal with this data, nor do they have the wherewithal. So what you actually see in a lot of decision-support systems or online GISs are highly derivative products and models, downsampled such that you can actually handle them. And this is starting to become important in the resources industry, because what you're dealing with is analysis-ready data that's been prepared by someone else to their lowest common denominator. What they're actually arguing is that, for say seismic imaging, the tools are not universal, and you should be able to get at the less-processed forms of the data and fine-tune them to whatever your particular use case is.

Now, we started to bring this data onto NCI as part of an earlier project, and you can see how we've got these massive datasets. And notice how, once we put them onto HPC and parallelised them, we were able to do the processing in a matter of minutes. So whereas before you just had a few groups putting out these massive data products, now researchers, if you give them the infrastructure, can get at that data and reprocess it for their actual use cases. And that, to me, is about more targeted, innovative research than having to take products generated by others. So where is the AusLAMP time series data? We thought, well, we'd better try and get that, now we've got the infrastructure, and put it on. And this is what we found: we went on a massive rescue effort to try and find the data, the time series data. The products were available, but the raw forms of the data were not. So we actually turned to the RDA Data Rescue Interest Group and this paper here. Getting access to historical data is very difficult. Oops, sorry.
And in this paper, it actually says "track down the author and ask nicely", which is what we had to do. The other important thing, though, is that once you start to generate your data, and you can do it so easily, you can see the number of products start to proliferate. It's really hard to know which dataset available online can actually be trusted and used, and which one goes back to the ridgy-didge versions. So here we come to the RDA Data Versioning Working Group, and I've got a conflict of interest because I'm a co-chair of it. We took the way they use FRBR, the Functional Requirements for Bibliographic Records, an old library thing, where you look at the work; from the work you create an expression; you manifest that product in multiple formats; and you make it available on multiple sites. So here is another one of our 2030 datasets, the ASTER dataset, and you can see level 0, level 1, and so on. Here are, if you like, all the products, and they're available as BSQ, GeoTIFF and NetCDF. This is very important for geophysics on HPC, because we need to get the data into self-describing modern formats. And then you've got the multitude of products. But by preparing this map using the data versioning principles, we can start to differentiate and map out all the products and pick the ones we wanted.

The next thing we found is that geophysics tends to be siloed, with each community working amongst itself to measure its variables. Once we started to get the data together on HPC, you could see how people could do joint inversions between, say, gravity and magnetotellurics, and we know that in the not-too-distant future people will be doing multi-physics analyses. So this is another product we started to look into: I-ADOPT, the InteroperAble Descriptions of Observable Property Terminology framework. It has a strong focus on variables in environmental research, encoding what is measured, observed or derived. As I said, geophysics data is rather systematic, regardless of whether it's gravity, mag, MT, et cetera; it's based around this survey-station-run concept. And from the ontology that was developed, you can see how we can start to put those all together around variable sets, variables, properties, et cetera. Now, unfortunately, the money ran out before we could do that. But were we able to do something on HPC around machine readability and the interoperability of data between all those different geophysical data types, I think this would be a critical approach to adopt.

The other thing we found is that not many people are doing this data-intensive work in HPC environments. But I was thrilled in June 2022 when a BoF, led by Tamaya Bureau from CSC, the Finnish supercomputing centre, and Christine Kirkpatrick from the San Diego Supercomputer Center, came to say, well, hang on, when you go to a supercomputing centre, nobody deals with data. So they came to the RDA, but the problem is there are not many people in the RDA who deal with HPC. Now, unfortunately, they're going back to HPC conferences, but it just shows there's a bit of a gap in this thinking in modern, competitive research data infrastructures.
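To make the FRBR-style layering Lesley described a moment ago a little more concrete, here is a minimal sketch in Python of how a work (a survey), its expressions (processing levels) and their manifestations (formats available from particular sites) might be modelled. All the names and values are hypothetical illustrations, not the actual ASTER or MT product trees.

```python
# A minimal sketch of FRBR-style layering for geophysics product families:
# work -> expression (processing level) -> manifestation (format + sites).
from dataclasses import dataclass, field

@dataclass
class Manifestation:
    fmt: str                                    # e.g. "NetCDF", "GeoTIFF"
    locations: list[str] = field(default_factory=list)

@dataclass
class Expression:
    level: str                                  # e.g. "level 0 (raw)"
    manifestations: list[Manifestation] = field(default_factory=list)

@dataclass
class Work:
    title: str
    expressions: list[Expression] = field(default_factory=list)

# Hypothetical survey with two processing levels in several formats.
survey = Work(
    title="Hypothetical MT survey",
    expressions=[
        Expression("level 0 (raw time series)",
                   [Manifestation("ASCII", ["archive tape"])]),
        Expression("level 2 (analysis ready)",
                   [Manifestation("NetCDF", ["NCI"]),
                    Manifestation("GeoTIFF", ["state survey portal"])]),
    ],
)

# Walk the tree to see every product variant and where it lives.
for expr in survey.expressions:
    for m in expr.manifestations:
        print(survey.title, "->", expr.level, "->", m.fmt, "@", m.locations)
```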
Another group that's worth mentioning, again with a conflict of interest because I'm a chair of it: we have this group, and the focus of this earth and environmental sciences group is just to get people together to talk about what data infrastructures and what capabilities they're developing, and hopefully we'll be starting a repository catalogue soon. So again, join us if you want to; we meet every plenary and try to reach out to new projects and highlight what they're doing.

So now, just to finish on geophysics: Australia is well endowed with geophysical data, but the problem is that it's largely the data products that are available. So I guess the conclusion of our project is that we need to make up our minds today whether we want a competitive earth science HPC infrastructure, because if we do, we need to start now to find the rawer forms of critical datasets, and above all their collectors, and ensure they are FAIR and machine actionable, and make them more accessible to build national, seamless, high-resolution datasets, which is something that is in the national roadmap for research infrastructure. The RDA is a good place to find some of the components you need but, and I'll just have a dig here, HPC data issues are missing. And finally, I'd like to thank the ARDC, NCI and AuScope for funding this project, which ends in two days' time. Thank you.

You're muted. Thank you, Kerry. Thank you, Lesley. That was an excellent presentation, bringing together geophysics, HPC and the future need for data to be absolutely machine-to-machine and FAIR, and how the RDA has helped in that process. Thank you very much. Last but not least is Chris, providing a presentation on the AgReFed FAIR data tool. Thank you, Chris.

Hello, everyone. Just sharing my screen, make sure that works. Okay. Can everybody see that? Yes, we can. Thank you. All right. Okay. So my name is Chris Bahlo. I'm from the Centre for eResearch and Digital Innovation, here in not-so-sunny Ballarat. I was tasked with creating a FAIR assessment tool for the AgReFed project, so this talk is about that. Okay, so just a little bit of background. Where this came from is that AgReFed started with creating a data policy which sets the acceptable levels of FAIRness for data sharing throughout AgReFed. Based on that was the AgReFed Technical and Information Policy Suite, which describes a set of 14 specific FAIR questions with corresponding minimum, ideal and stretch-goal requirements. Based on that, some of the AgReFed members created a spreadsheet, initially used by data stewards in the early stages of the AgReFed project, which specifies the minimum FAIR thresholds and the types of evidence needed to support whether the standards were met or not. And then, based on all this, I wrote the AgReFed FAIR assessment tool, which is a public web-based application.

Okay. There are several reasons we didn't use existing FAIR tools. Firstly, through the interviews and data assessments done in the early stages of AgReFed, it was shown that in agriculture many datasets rated very low for FAIRness, especially for findability. There was a lot of data that wasn't even on a website, it was still on thumb drives and the like, or there was hardly any metadata. AgReFed therefore needs a very strong emphasis on supporting users throughout their FAIR journey, because we're starting from that very low point. That wasn't the case across the board, but it was a common occurrence. Okay. And then we looked at existing FAIR tools to see if we could reuse any of those.
Now, the first problem was that all the FAIR tools we came across, or could find, use numeric scores, and that doesn't really work with AgReFed's minimum thresholds. Secondly, the fully automated tools, of which there are only a couple, are really good, but they don't work unless you meet certain minimum standards: you need to at least have a URL and machine-readable metadata, otherwise they just don't show you anything. Most of the tools have fairly limited in-application help, which didn't gel with the idea that users need to be supported throughout. Some tools don't allow saving of assessment results. And none of them allows reassessment of data. Say you assess your dataset, your initial assessment, and find you need to do some work to get to the minimum standards; then you go and do the work and you want to reassess it. You can't do that unless you go and re-enter all the data. So that was a limitation. And then the other thing is that none of the tools we found was actually domain specific, which we felt was important.

So what is the FAIR assessment tool? It's a full-stack PHP application written using the Laravel framework. It's got a PostgreSQL database, and on the front end it uses JavaScript with the Vue framework, which was really helpful for creating the sort of complex questions and answers and handling all the data client-side. The FAIR assessment questions, all the selectable answers, the scores, and ancillary information like help text in pop-ups are loaded dynamically from a database, and they load as a JSON object. The reason we did that is that it then becomes possible down the track to have multiple different versions of the FAIR assessment. Say the standards are updated, which the literature suggests might happen, or we want to run assessments for other digital resources. Currently it's set up to handle datasets, and many of the questions are specific to datasets, but with a versioning system you can have a version for, say, software or vocabs or services: other digital resources. And of course it would also make it possible to use an entirely different set of questions, which would then just display in the same form using the same infrastructure; a minimal sketch of this threshold-based, JSON-driven design follows below.

All user responses and submitted FAIR assessments are saved in the database, so people can refer back to them or print them out. Within AgReFed we're actually requiring people to sign up so that we can collect usage metrics, and this data collection had to be approved through the ethics committee. We already had an existing ethics approval in place for the interviews conducted in the early stages, but we then extended that to also allow collection of information from the tool. Actually, what I'll do is just open the tool itself. Okay, so the FAIR assessment tool is available online at assessment.agrefed.org.au. It comes up with a screen that explains roughly what it is, with a link to the ethics approval and everything. If people aren't registered, they have to register; otherwise they log in, so I'll just log in. Then the same screen appears again, but you can now enter a new assessment or view assessments that you've already done. There's also a link to the AgReFed help page. Anyway, we'll look at these.
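As a rough illustration of the threshold-based approach Chris describes, as opposed to numeric scoring, here is a minimal sketch of how a question served as a JSON object might map answers to ordered levels and test them against an AgReFed-style minimum. The question text, answers and level names are invented for illustration; they are not the tool's actual database content.

```python
# A minimal sketch of threshold-based assessment: each answer maps to an
# ordered level, and a question passes when the level reaches the minimum.
import json

LEVELS = ["not considered", "below minimum", "minimum", "ideal", "stretch"]

# Hypothetical question object of the kind a database might serve as JSON.
question_json = json.loads("""
{
  "id": "F1",
  "text": "Does the dataset have a globally unique, persistent identifier?",
  "minimum": "minimum",
  "answers": {
    "no identifier": "not considered",
    "internal catalogue number": "below minimum",
    "resolvable URL": "minimum",
    "DOI or other PID": "ideal"
  }
}
""")

def assess(question, answer):
    """Return the level reached and whether the minimum threshold is met."""
    level = question["answers"][answer]
    met = LEVELS.index(level) >= LEVELS.index(question["minimum"])
    return level, met

print(assess(question_json, "internal catalogue number"))  # ('below minimum', False)
print(assess(question_json, "DOI or other PID"))           # ('ideal', True)
```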
So if I want to run a new assessment, it opens the assessment form, where you have to enter the name of the digital resource, which can be the name of the dataset or some other meaningful name; a description, if you wish; and the reason for the assessment, which would probably be whether it's an initial assessment or a reassessment or follow-up. We put in a lot of pop-up help here. And I have to say thank you to the ARDC, because we took a lot of the resources from the ARDC web pages and embedded them in there, so we have help for all the main headings: findable, accessible, et cetera. And each question, each of the 14 FAIR questions, also has a link for help, with external links and explanations about what the question relates to.

So each question has a number of possible answers to be selected from, and what happens is that if we select one, it will then say, well, can you please provide evidence for this? And in this field you would put in a URL. You can indicate a status: in the early stages you might say, okay, we haven't even thought about it, or we're considering it, or it's already being implemented, or yes, it's already fully implemented. And you can enter notes. If I select one that meets the minimum standard for AgReFed, then immediately we get visual feedback about meeting the standard: you get this green bar saying you've met the acceptable standard. Also, as I go to a different selection here, the request for the evidence I need to supply is updated. And for every question there's a different selection of answers, and it is then indicated whether the AgReFed minimum standard has been met or not.

Okay, so instead of filling in the entire form, which takes way too long, we'll go and look at an assessment that we've already run. When we've done an assessment but haven't finished it, the user can leave at any time and come back; all the entries in the form are automatically saved. So then you come back to the assessment. In the assessment result, you can see where the acceptable standard has been met, what evidence was supplied, what status you entered, and you may also enter assessment notes. This one wasn't finished, so that's why a lot of it isn't filled in yet. So that's the bottom part. In the top part, we've also added in supporting scores. While the tool is specifically about the acceptable levels, we also added scores to make sure that if people are starting off with a dataset that's on a memory stick, and they then put it in an internal catalogue with a number, that's an improvement on not having it indexed at all. If they then put it behind a URL, and you can actually find it on the internet, that's an improvement. And even though that doesn't yet meet the minimum acceptable standard under AgReFed, it will show a little bit of a score, so that when people improve on the previous state, they see it's an improvement. It's just to encourage them to keep going; that's the main reason for it. There's hover help here that explains what these bars mean. And then we have supplementary F-UJI assessment scores. The F-UJI assessment tool is a fully automated API that measures the degree of machine readability.
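For those curious how such a supplementary score might be fetched, here is a minimal sketch of calling a self-hosted F-UJI service from Python. The endpoint path, port and example credentials follow the F-UJI project's published defaults, but treat them, the payload fields and the shape of the response as assumptions to verify against your own deployment; the DOI is a placeholder.

```python
# A minimal sketch of requesting an automated FAIR evaluation from a
# locally running F-UJI instance; adjust URL, auth and fields as needed.
import requests

FUJI_URL = "http://localhost:1071/fuji/api/v1/evaluate"  # assumed default port

payload = {"object_identifier": "https://doi.org/10.1234/example"}  # placeholder DOI

response = requests.post(
    FUJI_URL,
    json=payload,
    auth=("marvel", "wonderwoman"),  # F-UJI's example basic-auth credentials
    timeout=300,                     # evaluations can take a while
)
response.raise_for_status()
result = response.json()

# The response typically includes per-metric results and an aggregate summary.
print(result.get("summary", {}))
```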
So while the metrics are a bit different to the metrics we use within AgReFed, and they don't reflect our acceptable thresholds, it is still a useful indication of whether a dataset actually has machine-readable metadata. The help about that is in this little pop-up. So that's automatically done in the background. Of course, if you don't have a URL for a dataset, nothing will go in there. While an assessment hasn't been submitted, say you've answered some questions and had to go away to find out more information, the assessment remains open. You can come back and edit it, which takes you back to the assessment form with everything you've entered so far, and you can continue until you've entered everything and then submit it. I'll just go back and show another example. Once it's been submitted, it shows in the list like so, and you can go back in and view the assessment.

Here's one that basically complied with everything. There are different levels, the ideal level or just the barely acceptable level, but this one is fully compliant. So it would be compliant under AgReFed, and it wouldn't need any reassessment. But say we had assessed a dataset and found that it didn't comply with AgReFed. We might then decide, okay, we're going to undertake some significant work: we'll mint a DOI, we'll upgrade the metadata, do all that, and then we can reassess the same resource. We open the assessment again, change the answers where we've improved things, and it will show as a second assessment here, as a separate tab. And there are stacked bar graphs, one for each assessment and subsequent reassessment, until you get to the point where everything is hopefully green.

Yes, well, that's basically it; that's all I wanted to show. The FAIR assessment tool is available online at this address. The source code is open source and on GitHub, and I've minted a DOI for it, with a bit of extra information linked from the AgReFed website. If you try it out and find anything, bugs or suggestions, please contact me. And I'd like to acknowledge that the ARDC made investment into this, to acknowledge the developers of the F-UJI FAIR tool, and to thank Richard and Scott for helping me with the source code and the deployment. Okay, thank you very much.

Thank you very much, Chris. And thank you also to all our speakers today for helping us celebrate the 10th anniversary of the Research Data Alliance, with fantastic presentations looking at FAIR data, the implementation tools to assess it, and some of the practical applications that can be derived from this work. On that note, we will end this webinar. I wish to thank all our participants for joining. We will be making this webinar recording available on the ARDC website as well as the Research Data Alliance website, and it will have links to our speakers if you wish to follow up further. And if this has sparked some interest, I encourage you to go and have a look at the Research Data Alliance website and register for any of those working groups and interest groups. Thank you also to Catherine Barker and Kerry Levett for helping organise today's event. I wish you a very happy Wednesday afternoon. So thank you very much, and we'll talk soon. Bye-bye.