I'll give you a bit of background. About two weeks ago I gave a presentation internally at the ARDC on what it actually means to be machine-readable FAIR. At the same time, Ming and I have been working on a paper with a group in the US who had published on quality information about datasets, and one of the proposed measures of the quality of a dataset is how compliant with FAIR it is. That's the gist of what I'm going to present today. But first I'd like to acknowledge and celebrate the first Australians on whose traditional lands we meet, and pay our respect to their elders past, present and emerging.

The work I'm presenting today was done with NCI as part of the ARDC, AuScope and NCI funded project looking at what it actually means to be able to access datasets at exascale, because exascale will be here by 2030. I'm also involved in the WorldFAIR project, which is looking at machine-to-machine interoperability, firstly of geochemical data as part of that case study. There are 11 other case studies, where we're trying to say: hang on, if we can do it within our domain, how do we then do machine-to-machine across other groups, including social sciences, oceans, biodiversity, chemistry, et cetera? Both of these projects were investigating what it means to have machine-actionable metadata and data at scale. And as I said, I was working with Ming over Christmas (God only knows why; we'll find out one day) on compliance with machine-actionable quality information. If we can attach that information to datasets, it also helps a user understand the quality of the data. This is becoming particularly important as AI and ML take off, because they really are demanding machine-actionable data.

I don't think I need to explain to this audience what FAIR is, but I'll remind you that it all started with a paper published in Nature in 2016, and I think Ross Wilkinson, formerly head of the Australian National Data Service, was part of the international group that put the principles together. So what was the purpose? This is why I left this slide in from the internal presentation: the ARDC's purpose is to provide researchers with competitive advantage and accelerate innovation through the creation, analysis and retention of high-quality data assets. And as I've said, what defines a high-quality data asset? Compliance with FAIR is now being considered as a measure of the quality of the data.

Going back to the original abstract is kind of funny, because I've been promulgating FAIR for quite a few years since it came out in 2016, but rereading it with new eyes, it's right there: the FAIR principles put specific emphasis on enhancing the ability of machines to automatically find and use data. That emphasis was driven partly by the GO FAIR group in Europe, who at that stage were already into AI and ML. They said that unless you enhance machine readability you're not going to get very far, and I would agree with them. And yet of the 15 FAIR principles, most of them, and most of the FAIR maturity indicators built on them, only really give you human-readable compliance.
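To make that distinction concrete, here is a minimal sketch of what machine-actionable dataset metadata can look like, using schema.org JSON-LD as one common carrier. Every identifier, name and vocabulary URL in it is a hypothetical placeholder, not a real record.

```python
import json

# A minimal, hypothetical machine-actionable metadata record (schema.org JSON-LD).
record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "@id": "https://doi.org/10.xxxx/example-dataset",  # placeholder PID
    "name": "Example geochemical analyses",
    "description": "Whole-rock major element analyses (illustrative only).",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "keywords": [
        {
            # A keyword drawn from a published vocabulary, so a machine can
            # resolve it instead of guessing at a free-text string.
            "@type": "DefinedTerm",
            "name": "basalt",
            "inDefinedTermSet": "https://example.org/vocab/lithology",
        }
    ],
    "distribution": {
        "@type": "DataDownload",
        "contentUrl": "https://example.org/data/example-dataset.csv",
        "encodingFormat": "text/csv",
    },
}

print(json.dumps(record, indent=2))
```

The difference from human-readable FAIR is that everything a machine needs (the identifier, the licence, the vocabulary a keyword comes from) is a resolvable URI rather than free text on a landing page.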
Barend Mons, one of the authors of the original paper, which has an incredible number of citations, argues that 95% of those citations are wrong, because people are saying "oh, my data is FAIR because you can find it, you can access it", all by humans. There's a new paper just out from Erik Schultes, and it's really interesting how he has colour-coded the principles. I draw your attention to the yellow, because that is how often metadata appears in the principles, something I probably hadn't realised until this project. Then the content standards are in pink, and the technical components, your protocols et cetera, are in red. It's a different perspective, and when we were working on this what-does-machine-actionable-FAIR-mean paper, we were pulling each one of those apart so that a machine could actually get to it and find it.

The other important thing I had never really noticed is F2: the authors keep bringing up this term "rich metadata", and I thought, oh God, what does that mean? You can see what I mean: metadata is part and parcel of almost every one of the FAIR principles. And I keep coming back to this: that metadata must be machine-actionable.

As I said, the authors were influenced by the GO FAIR group; I've often heard from that group that FAIR really stands for "fully AI ready". The principles were developed as guiding principles expressing what researchers should expect from data resources, and they were deliberately made very high level, and domain and technology agnostic. That was a noble aim, but it has caused havoc, as I'll come back to. Another thing the paper by Wittenburg et al. noted is that the Research Data Alliance and many of the other guidelines just talk about procedures and components of interfaces, whereas the GO FAIR group, who are now driving machine-actionable FAIR at the data object level, were actually operational and trying to make it work. I've got the references in the paper.

The end result of being generic and trying to please everyone was that no one knew who or what was really FAIR. Funders were requiring FAIR compliance, but what they got back was so diverse that they started to ask "well, how FAIR are you?", and grants had different specifications. That produced a proliferation of FAIR maturity models and assessment tools, and when you look at them in detail, they tell you how to quantify how FAIR a dataset is, but not exactly what resources were used to comply with each of the FAIR principles. It's that specificity we now really need to get under control.

Here's an example from the most popular FAIR assessment tool, F-UJI, FAIR dash UJI ("uji" in Malay, where Anu comes from, means "test"). You can see the initial assessment in the top row and, in the bottom row, whoopee-do, they've increased their FAIRness for accessibility, but I don't know what tools they're using for accessibility. And for interoperability they must have vocabularies online to get such a high score, but I don't know what they are. So the GO FAIR group introduced their FAIR Implementation Profiles, and as I mentioned, we tried to apply them as part of both the WorldFAIR project and the 2030 project.
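Before getting to the implementation profiles, one quick aside on that point about vocabularies being online: a concrete, minimal test of whether a vocabulary is machine-actionable at all is whether a term URI resolves, through content negotiation, to a machine-readable serialisation rather than just a web page. A sketch, with a hypothetical term URI:

```python
import requests

# Ask for RDF (Turtle) rather than HTML; the term URI is a placeholder.
term = "https://example.org/vocab/lithology/basalt"
resp = requests.get(term, headers={"Accept": "text/turtle"}, timeout=10)

print(resp.status_code)                   # 200 means the term resolves at all
print(resp.headers.get("Content-Type"))   # text/turtle is machine-readable; text/html is not
```

If the best you can get back is HTML, the vocabulary is human-readable FAIR at best, which is exactly the I2 failure I'll come back to.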
So, a FAIR Implementation Profile is based on two concepts. The first is the community: who you are trying to interoperate with and make your data FAIR for. The second is being able to declare, specifically, what datasets, what metadata, what protocols, what identifier systems and what standards you are using. Each FAIR-Enabling Resource, or FER, is itself a digital object and itself requires compliance with the FAIR principles, including having a globally unique persistent identifier. You can imagine: if every resource you use to make a dataset FAIR has a DOI or a PID, that's a lot. The GO FAIR group designed a questionnaire, and note that you can have multiple FERs for each principle. But a lot of people then thought, whoopee-do, all I've got to do is fill out the spreadsheet. So these profiles are proliferating like rabbits, but nobody is actually specifying and using the FERs, which is the second part, so that when you pick up a dataset you can really say: yes, these are exactly the resources that were used (there's a sketch below of what such a declaration can look like).

In the WorldFAIR project, and this is the framing from the project leader, Simon Hodson: isn't it better to know which FAIR-enabling resources are being used than to have a quantitative score that is essentially meaningless? The second point is that everyone was trying to improve their FAIR scores, and in some cases you may be making the data, or the FERs, more mature than they need to be for you to interoperate your data machine to machine.

We found that to be effective, the FIPs need to be specific, as you can see on the right. In geochemistry we did them at the level of the repository, the data collection and the dataset, getting much, much more precise and specific at the dataset level. We also found it was impossible to get agreement across the geochemistry community on international standards, but we did have a few local standards. So we said: who cares if only three research groups are using a standard? I know it's not ideal, but at least it means that when those three independent research groups get together, they can interchange their data machine to machine. And what tends to happen is that somebody gets brave and publishes a decent vocabulary, it starts to get picked up regionally, then internationally, and ultimately you achieve the goal of having your vocabulary or your resource endorsed by, say, the international science unions.

The important point, and I come back to the quality of the dataset, is that your FIP, with links to all the resources used, can be published with a DOI and then linked to your metadata as a related resource. This is what we have done with geophysics. And even if you don't get 100% agreement, with a vocabulary you'll never get more than 90%, you can develop machine-to-machine crosswalks and publish those too. So you can see how everything is moving from human-readable to machine-readable.

I'll digress a little, because this is work I'm seeing happen particularly in the US with the CARE principles, where they're now taking the FAIR principles and implementing them at a finer granularity. The Indigenous Metadata Bundle communiqué says that within your rich metadata they want to see governance, provenance, what lands and waters the data refers to, and Local Contexts labels and notices.
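Coming back to the FIPs for a moment, here is a minimal sketch of what such a declaration can look like when it's machine-readable rather than a spreadsheet. In practice FIPs are typically published as nanopublications with their own PIDs; the plain structure below, and every identifier in it, is a hypothetical simplification for illustration.

```python
# Sketch of a FIP: for each FAIR principle, the community lists the
# FAIR-Enabling Resources (FERs) it uses, each identified by its own PID.
# A principle can have multiple FERs. All identifiers are hypothetical.
fip = {
    "community": "https://example.org/community/geochemistry",
    "declarations": {
        "F1":   ["https://doi.org/10.xxxx/identifier-scheme"],
        "F2":   ["https://doi.org/10.xxxx/geochem-metadata-profile"],
        "I2":   ["https://example.org/vocab/lithology"],   # a published vocabulary
        "R1.1": ["https://spdx.org/licenses/CC-BY-4.0"],   # licence as a resolvable URI
    },
}

# The FIP is itself a digital object: publish it with its own DOI, then
# reference that DOI from each dataset's metadata as a related resource
# (shown here in DataCite-style fields), so a machine picking up the
# dataset can resolve exactly which resources make it FAIR.
dataset_related_identifier = {
    "relatedIdentifier": "https://doi.org/10.xxxx/geochemistry-fip",
    "relatedIdentifierType": "DOI",
    "relationType": "IsDescribedBy",  # one plausible DataCite relationType
}

# And where two communities' vocabularies differ, the mapping itself can be
# published as a machine-readable crosswalk, e.g. SKOS-style mappings:
crosswalk = [
    ("https://example.org/vocab-a/basalt",
     "skos:exactMatch",
     "https://example.org/vocab-b/basalt"),
]
```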
The provenance work coming out of those indigenous-data groups made me think, wow, because they're developing an IEEE standard. And the provenance record goes with R1.2, I believe, the provenance principle among the reusable FAIR components. So rather than getting a text-rich block of blurb that says "oh, this dataset came from Fred Smith and Mary Jones enhanced it or something", you've got it all in a machine-actionable form (there's a small sketch of this below). More importantly, DataCite have now published guidelines for including Local Contexts notices within the DataCite Metadata Schema 4.5. Again, it's the precision that's coming in.

So what specifically has changed with this machine-to-machine approach? As Hodson and Gregory say in the paper I've linked there, we're going from a bibliographic data-stewardship practice to a data-engineering practice, where we engineer what's going on in the data through DOIs. They specifically note that data are described with rich metadata, and I fundamentally don't know all that many datasets that have the level of richness they're now prescribing. But rich metadata allow a computer to automatically accomplish the routine and tedious sorting and prioritising tasks that currently demand a lot of attention from researchers. I'll also say, and this causes shock horror amongst a lot of the traditionalists, particularly the bibliographic group, that in practice they're now demanding that your data and your metadata have separate identifiers.

Now I'm going up to the next level I've been working at. On the right you can see all the groups involved; this is an international project. I'm involved in the geochemistry case study, which is being led by AuScope in Australia. Australia is also leading the social surveys case study, again about machine-to-machine interoperability of social science surveys; that's led by Steve McEachern at the ANU. And as I understand it, the HASS and Planet ARDC Thematic Research Data Commons are wanting to join in on this project.

Now I want to go back to what I call barely FAIR. I think the best you'll get from institutions or researchers using the generic repositories, like Zenodo or Figshare, is that they make data findable, and if you're lucky you get a licence and access; you'll never get quality rich metadata out of those systems, they're too generic. I've also used the word DOI rather than PID, because one of the interesting things we're now finding is that the publishers will only use DOIs, and they have to be placed in the correct fields. To me that is the bare minimum: I can find it, but it's not a quality metadata profile.

Human-readable FAIR is the most common level. And sorry, ARDC, but the FAIR assessment tool it currently promotes will only check compliance with human-readable FAIR; it doesn't go into enough detail to enable machine readability. The machine-readability requirement everyone fails on is the I2 interoperability principle, which demands that the vocabularies themselves are FAIR, that is, findable, accessible, interoperable and reusable, and most vocabularies are just internal and not published.

Then there's machine-actionable FAIR, where the demand for identifiers will be huge. Do we have the capability? There's very limited capability today to make semantic resources fully FAIR compliant. And I'll reiterate, because it upsets a lot of people: that goes for the metadata and the data separately. We can't continue the way we are with human-readable FAIR.
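Going back to the provenance example for a moment: here is a minimal sketch of what that Fred Smith / Mary Jones lineage can look like as machine-actionable provenance. I've used the W3C PROV vocabulary via rdflib as one widely used option (the IEEE effort mentioned above is a separate, emerging standard), and all the identifiers, including the ORCIDs, are hypothetical placeholders.

```python
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF

PROV = Namespace("http://www.w3.org/ns/prov#")  # W3C PROV-O vocabulary

g = Graph()
g.bind("prov", PROV)

# Hypothetical PIDs; note the dataset and its derived version each get their own.
raw      = URIRef("https://doi.org/10.xxxx/raw-dataset")
enhanced = URIRef("https://doi.org/10.xxxx/enhanced-dataset")
smith    = URIRef("https://orcid.org/0000-0000-0000-0001")  # "Fred Smith"
jones    = URIRef("https://orcid.org/0000-0000-0000-0002")  # "Mary Jones"

g.add((raw, RDF.type, PROV.Entity))
g.add((enhanced, RDF.type, PROV.Entity))
g.add((raw, PROV.wasAttributedTo, smith))       # who produced the original
g.add((enhanced, PROV.wasDerivedFrom, raw))     # lineage between the versions
g.add((enhanced, PROV.wasAttributedTo, jones))  # who enhanced it

print(g.serialize(format="turtle"))
```

A machine reading this can answer "where did this dataset come from and who touched it" without parsing prose, which is the whole point of putting provenance under R1.2.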
To go machine-readable, we've got to start automating. And I don't care whether you're working with the long tail or with HPC: we have to work out ways to automate the generation of quality metadata (there's a sketch of what that might look like at the very end). At the top of the scale is what we're trying to do in WorldFAIR, but if you can't get machine-to-machine working within your own domain, you're not going to get it across domains; it's just too hard.

So, to sum up. FAIR as an indicator of quality is growing, and the demand for machine actionability is being driven by AI and ML; I'm starting to see all these papers coming out on how to make your dataset AI-compliant. Reread what the FAIR principles said in 2016, because I think they nailed it. And although I'm saying this is where you need to go, do what's appropriate to what you're doing. A measure of the level of compliance with machine-actionable FAIR will equate to the quality of a dataset. I'll just throw out the question of whether there's a group wanting to form to try and nut out some of these issues, because as I said there are quite a few groups in the US and Europe working on this, and I don't really know of many groups in Australia that are. And so with that, I will stop sharing.
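As a coda on the automation point above: a minimal sketch of what automatically harvesting quality metadata could look like, assuming self-describing files such as netCDF, which are common in HPC settings like NCI's. The library choice (xarray), the attribute names and the file name are illustrative assumptions, not a prescription; real mappings would follow the community's FIP.

```python
import json

import xarray as xr  # assumes self-describing netCDF input (illustrative choice)

def harvest(path: str) -> dict:
    """Lift embedded attributes out of a self-describing file into a
    schema.org JSON-LD skeleton. 'title', 'summary' and 'units' follow
    common netCDF attribute conventions but are assumptions here."""
    ds = xr.open_dataset(path)
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": ds.attrs.get("title", path),
        "description": ds.attrs.get("summary", ""),
        "variableMeasured": [
            {
                "@type": "PropertyValue",
                "name": name,
                "unitText": var.attrs.get("units", ""),
            }
            for name, var in ds.data_vars.items()
        ],
    }

print(json.dumps(harvest("example.nc"), indent=2))  # hypothetical file name
```

The same pattern scales: anything already embedded in the file (variables, units, extents) never needs to be retyped by a human, which is exactly the routine, tedious work the rich-metadata argument says machines should absorb.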