Thanks, everybody, for the opportunity today to talk about our experience at CeRDI, Federation University, around grains trial research. I'd like to acknowledge that quite a few people have contributed to this work, and I'll get to a few other pieces as I go. Just a preface to today's presentation: as stated, I work at the Centre for eResearch and Digital Innovation at Federation University. The presentation is largely based on a data quality framework we proposed in 2017-18. Elements of that have been implemented since through what we've been doing with Online Farm Trials (OFT), and I'll explain a little more about that. Alongside those data quality improvements there's been a lot of internal investigation to understand the data, but we also have an expert advisory group with representatives of the Australian grains industry, including the Grains Research and Development Corporation (GRDC). They've been really important in helping us, and it's a true collaboration and partnership between GRDC and CeRDI. OFT started in 2013-14, so we've been working at it for a little while now. In terms of Online Farm Trials' purpose, and thanks to John Rivers for helping provide some guidance here, OFT is really about providing enduring profitability for the Australian grains industry, especially growers; that's a key sentiment of GRDC and their reason for being. So it's about accelerating grower adoption of RD&E trial findings by providing access to the supporting data and information; acting as a resource for RD&E investors and providers, looking at how the system can be used as a tool for gap analysis, through things like metadata assessments or meta-analysis on a particular agricultural issue or theme; helping to connect growers with grower groups and other members of the agricultural industry; and finally, avoiding duplication of work.
There's a lot of grains trial research that has been done in Australia. Just a little caveat: the views I'm going to express today are more from an Online Farm Trials perspective and what we've found, rather than the whole grains research space. It's an enormous space, and I'll probably allude to that a few times during the presentation. In terms of grains research in Australia, there's a rich history: for over 150 years we've been putting in trials, but a lot of different organisations have undertaken that trial research. A lot of it stemmed originally from growers and government, who played a strong role there. Over the last three to four decades we've also seen grower groups, which represent growers of like mind in particular regions or areas of Australia. Universities have played a really strong role, especially around the research aspects of grains. We're also mindful that a lot of corporate and commercial entities and private partners are doing research trials and experiments. And lastly, we can't forget the growers themselves: growers are always doing experiments in their own paddocks, strip trials, all of these different things to improve their yield. However, it's probably fair to say that a lot of trials remain in the dark, undiscovered and lost to current and future generations. One leading agronomist in New South Wales said a couple of years ago that we're effectively going through the third wave of repeating the same research trials, and that was within his career. So it just goes to show that duplicating or repeating research is something to be mindful of. In terms of the grains research trials out there, this was from an external audit back in 2017-18: literally, there are millions of grain trials out there.
But it would probably be fair to say that while some of the research findings are out there and reasonably accessible, a lot of the underpinning data and information aren't necessarily there and easily usable for other purposes. Quite a variety of different organisations have been undertaking research trials in Australia. In terms of what Online Farm Trials is: it had its beginnings in 2013-14, as I alluded to at the start, with three initial contributors of trial research data into the system. The focus of the system was really to make trial data and information discoverable: bringing that research data and information out of the dark and into the light, so it's findable and accessible to the grains community. Currently, there are over 80 contributing organisations to Online Farm Trials, with thousands of trials publicly accessible and available in the one location. The key drivers in Online Farm Trials are accessing, sharing, viewing, searching, referring to other data and comparing results; those are some of the key functions of the system, which I won't delve into. There are a lot of different grain and crop types in the system: cereals, forages, oilseeds, pastures and pulses. And there are a lot of different grain production systems and crop types across Australia, from Western Australia, across the southern areas of Australia, South Australia, Victoria and Tasmania, and then up through the north in New South Wales and Queensland. The numbers here are a little outdated, but they give you an idea of the number of trials that we have in OFT. Coming back to the legacy data, having said at the very start that we've been doing trials for 150 years, there's the ability to bring some of that data forward and look at how things have evolved through time. This is an example published in the Victorian Agricultural Report for 1884, a research trial at Dookie.
There they were able to test 24 different wheat varieties and see how they performed in terms of yield. What we can do is take some of that information, often captured in hard-copy reports or PDF documents, bring it into the light, so to speak, and use that data and information to look at trends in how varieties, breeding programs and so on have changed through time. So this is just an example of taking the data from that particular report, bringing it into OFT, and enabling people to use it in a free and accessible manner. In terms of data quality advances in the platform over the last three and a half years, I'll just quickly summarise these. What we've tried to focus on is increasing the minimum mandatory fields required for trial datasets to be included in the system. What's a little trickier is that it's a balancing act, because we've got such a diversity of contributing organisations: from universities and highly resourced organisations through to smaller grower groups, or potentially even individuals, who don't necessarily have that level of administrative support and data management systems to help them share this data and information. Other improvements are around trial statistical parameters and the design of particular trials; there's been a lot of evolution in that over the last 80 to 90 years. So we added features to let people search and filter on things like whether the trial is replicated, randomised or blocked. There's always a balancing act between dealing with legacy trials, trials already in the system where the data entry might be completely populated for all these fields or have bits and pieces missing, and new trials where you're trying to maintain a base level of metadata and data quality.
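To give a feel for what a minimum-mandatory-fields gate might look like in practice, here is a minimal sketch; the field names are illustrative assumptions, not OFT's actual schema.

```python
# Minimal sketch of a mandatory-field gate for incoming trial records.
# Field names are hypothetical, not OFT's real schema.
MANDATORY_FIELDS = ("trial_title", "trial_year", "crop_type", "location", "contributor")

def missing_mandatory(record):
    """Return the mandatory fields that are absent or empty in a record."""
    return [f for f in MANDATORY_FIELDS if not record.get(f)]

# A legacy record with patchy metadata fails the gate on two fields.
legacy = {"trial_title": "Wheat variety trial, Dookie",
          "trial_year": 1884, "crop_type": "wheat"}
print(missing_mandatory(legacy))  # → ['location', 'contributor']
```

In a real ingest pipeline a non-empty result would trigger a follow-up with the contributor rather than a hard rejection, given the diversity of organisations involved.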
On crop types, which we briefly showed before, there's quite a diversity of trials. Sometimes trials were reported against one crop type when, in actual fact, numerous different crop types might have been grown in that particular trial, so we've had to update the trial type. There are what are called demonstration trials, which are sometimes strip trials, or might be whole paddocks where growers or researchers look at yield or other crop behaviour. There's a lot more work around precision agriculture, which over the last three decades has been increasingly taken up by growers. And the mainstay is experimental trials. A recent upgrade looks at authorship and researcher IDs: we're working on standardising researcher names and linking them to ORCID iDs and other researcher identifiers. Another part is trial warnings. In a lot of agronomic trials there are potential risks, whether climate, pests, or applying the wrong chemicals, all sorts of things that can go wrong. So we had to include trial warnings in the system for when things didn't go quite right, so that people using the trial and the data behind it know there are other factors to consider. We've also been adding published trials to the system, working with contributing organisations to make as much of their trial data and information accessible as possible. And we've been doing a lot of work on standardisation of measurement types: there are a lot of different observations and measurements in grains trial research, so we've been working to consolidate them and align them with standard vocabularies; that's a piece we're looking at going forward. With contributors, sometimes a particular report was published in a grower group magazine or some other source, but the researchers were actually affiliated with a different organisation.
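The measurement-type consolidation described above can be sketched as a synonym table that maps contributed measurement labels onto a standard vocabulary; both the labels and the vocabulary terms here are invented for illustration.

```python
# Hypothetical synonym table mapping contributed measurement labels
# onto standard vocabulary terms (all names invented for illustration).
SYNONYMS = {
    "grain yield (t/ha)": "grain_yield_t_ha",
    "yield t/ha": "grain_yield_t_ha",
    "protein (%)": "grain_protein_pct",
    "grain protein %": "grain_protein_pct",
}

def standardise(label):
    """Normalise a contributed label, then map it to the standard term if known."""
    key = " ".join(label.strip().lower().split())
    return SYNONYMS.get(key, key)  # unknown labels fall through unchanged

print(standardise("  Yield t/ha "))  # → grain_yield_t_ha
print(standardise("Protein (%)"))    # → grain_protein_pct
```

Letting unknown labels fall through unchanged preserves contributed data as-is until a curator adds a mapping, which matches the principle of preserving the integrity of contributed records.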
So we've had to look at ways to make sure that people have permission to share and publish that information online. I'd also emphasise that there's been quite an evolution in trials. This example shows a lot more rigour in the statistical design and implementation of trials over the last three decades in particular. What we were finding was that a lot of the legacy trials didn't quite have that level of detail, but the requirement is evolving, so we've implemented changes in the tagging of trials with those key things: trial replication, randomisation and blocking, standard experimental design features. And just to touch on it, here's an example where a trial has been tagged with an unusual adverse event, so that people are aware the data in the system could be impacted in some capacity. In terms of the trial data and information in the system, the main things we capture are project details, what we're calling project metadata; information about the trial methods and how it was implemented, which might be the types of machinery, how big the trial was in terms of its dimensions, the number of replicates, et cetera; and trial results, the summary results of the particular trial, which might include some key statistical parameters, but mainly the measurements I discussed earlier. For soil data, we also access other soil data sources such as the Soil and Landscape Grid of Australia, so we're able to have that as accessory information accompanying trials. Likewise with climate, we access SILO weather station data, so we're able to provide climate data from the nearest weather station, depending on where the trial is located.
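Pulling those record components together, a trial in a system like this might be modelled roughly as follows; the structure and field names are my assumptions for illustration, not OFT's internal data model.

```python
from dataclasses import dataclass, field

@dataclass
class TrialRecord:
    """Illustrative grouping of the record components described above."""
    project_metadata: dict           # title, contributor, aims, key messages
    methods: dict                    # machinery, plot dimensions, replicates
    results: dict                    # summary measurements, statistics
    replicated: bool = False
    randomised: bool = False
    blocked: bool = False
    warnings: list = field(default_factory=list)  # e.g. adverse seasonal events

def design_tags(trial):
    """Tags users could filter on: replicated / randomised / blocked."""
    flags = {"replicated": trial.replicated,
             "randomised": trial.randomised,
             "blocked": trial.blocked}
    return [name for name, on in flags.items() if on]

t = TrialRecord({}, {}, {}, replicated=True, randomised=True,
                warnings=["frost event at flowering"])
print(design_tags(t))  # → ['replicated', 'randomised']
```

Keeping the design flags as explicit fields, rather than burying them in free text, is what makes the search-and-filter features over legacy trials feasible.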
That information can be accessed to help provide context to the trial for that particular year. The other key thing is the trial report and attachments: a lot of PDF documents are attached to each trial. We also provide linkages from a particular trial, crop type, measurements and so on to GRDC final reports through their system, which enables people using OFT to link to those other resources. Another key part we've been focusing on, with GRDC support, is the terms of contribution. It's really important that trial metadata, the details about the trial, is open; GRDC has been a really strong supporter of this freedom to operate, trying to make better use of these trials and the data for future purposes. So as part of this we've asked organisations to contribute their trials under FAIR-aligned terms, and those new terms and conditions are CC BY 4.0. That's been a big change in the last 12 months. In terms of what was initially proposed in 2018 for OFT, we set out five ideals as a workflow. It began with the data quality framework at phase one, to tease that out; then developing a quality rating scheme; then applying that scheme to all the trials in OFT. The fourth part was about supporting trial contributors to improve their data quality management, and the last was about conveying to users how to use those ratings in their decisions and how the data can be provisioned. A lot of this stems from some impact research led by Angela Murphy in 2016, which really flagged that, without some sort of quality rating, people would often assume that all the trials were done according to the latest standardised scientific approaches and techniques, and we know that's not the case.
So there were incorrect assumptions as a consequence, that everything was equal across the system regardless of who did the trial and when it was done. We needed an approach to provide more transparency about trial data quality in the system. So the thinking was around a data quality framework, and GRDC really pushed this back in 2017, making the sharing of information and data a focus of grains-funded research across Australia. The focus from that was: how do we increase research outcomes and discoveries from improved data, and how could we apply standards and protocols? We also wanted to look at how we could support organisations to adapt to the latest applications and technology developments, with a longer-term interoperability goal, and to think about how some of these data quality principles might be adapted and made relevant to other parts of the grains industry. And finally, it's really about improved grower support: quality data from trials provides advisors and users of this data and information with strategically better decisions and options. In looking at data quality aspects, we went through the literature and looked at all the different terms, phrases and definitions of data quality. Quite a few publications covered different elements of data quality, and we looked at what fitted best with what we were trying to achieve with Online Farm Trials. As a proposed approach, we decided to run with the ABS Data Quality Framework, mainly because it has strong linkages with standards for assessing and reporting on statistics. That was the key part that drove it in that direction, and it aligns with a lot of research trials and experiments. But we were also thinking about how you present that to users.
I really like the ALA (Atlas of Living Australia) approach to data quality, which makes it as simple as possible for users to understand and interpret. We were also very mindful that the data and information has been contributed to the system by organisations, so how do we manage that in a way that preserves it, while making sure we provide data in a harmonised and consistent manner? The integrity of that contributed data was at the forefront of our thinking. In terms of quality assessment and reporting, we considered different techniques: a questionnaire or tool, perhaps some form of self-assessment, which happens in a lot of different parts of our industries, possibly supported by a range of experts or tools; metrics, including filters around data quality within and between records, tables, et cetera; or system optimisation and reporting. Different potential approaches. The approach we came up with looked at those seven data quality dimensions and essentially worked out five potential questions for each dimension. The dimensions were institutional environment, relevance, timeliness, accuracy, coherence, interpretability and accessibility. In a nutshell, we came up with a total scoring scheme out of 35: each question is a yes or no, yes equals one, no equals zero, so the maximum score is five for each dimension, five by seven. As an example, institutional environment is one of the really tricky ones because of the diversity of contributing organisations. Thinking about the yeses on the left, with the contributor as an example: is the contributor publishing this data to OFT the recognised custodian (yes), versus the contributor to OFT not being the data custodian (no). So we set out questions to help with that appraisal of quality, and this is just the example for institutional environment.
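The scoring arithmetic just described, seven dimensions with five yes/no questions each for a maximum of 35, can be sketched as:

```python
# Sketch of the proposed scoring: 7 ABS dimensions x 5 yes/no questions,
# yes = 1, no = 0, giving a total out of 35.
DIMENSIONS = ("institutional environment", "relevance", "timeliness",
              "accuracy", "coherence", "interpretability", "accessibility")

def quality_score(answers):
    """answers maps each dimension to a list of five booleans (yes/no)."""
    assert set(answers) == set(DIMENSIONS)
    assert all(len(qs) == 5 for qs in answers.values())
    return sum(sum(qs) for qs in answers.values())

all_yes = {d: [True] * 5 for d in DIMENSIONS}
print(quality_score(all_yes))  # → 35
```

How the 0-35 total collapses down to a simple presentation (such as the overall rating of four shown in the next example) is a separate design choice about banding, which the framework left open.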
I won't show the other components, but that gives you a feel for the yes/no questions we were thinking about in the framework. Then there's the question of how you might present it. Here's an example of trial data quality statements. People want to know things like the trial title, the project, who contributed it, the aims and key messages; they're the key fields that people accessing OFT look for. Then there's a data quality rating, and in this particular example it came out with an overall rating of four; it's just an example, nothing definitive. We also thought about a range of different data quality tests. Again, this links to some of the rationale set out by the ALA about things passing or not passing, and alerting potential users to issues with trials, and about how you present that in a simplified form. This is a proposed trial project record with a number of data quality tests, based largely on the ALA example, where two tests have failed and the rest have a tick, et cetera. We also looked at other things people are interested in, especially in grains trial research, like trial classification and plot design, which are common data interests in trials in OFT. Behind that sits the range of data quality tests: while we have a rating system, there's a battery of data quality tests sitting behind it; in total there were over 70 data quality tests that could be run. But, as I said, this was proposed, thinking about how those data quality tests could be implemented within the system, and how some of this information might be presented in a clean and curated form to users of trials. Some finishing points on next steps: the idea was to take the theoretical to a technical implementation.
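In the spirit of the ALA-style pass/fail approach, a battery of record-level checks might be sketched like this; the individual tests are invented examples, not the 70-plus tests actually proposed.

```python
# Invented examples of per-record data quality tests; the real proposal
# had over 70 such tests sitting behind the rating.
TESTS = {
    "title present":   lambda t: bool(t.get("title")),
    "year plausible":  lambda t: 1850 <= t.get("year", 0) <= 2100,
    "design recorded": lambda t: t.get("replicated") is not None,
    "report attached": lambda t: bool(t.get("attachments")),
}

def run_tests(trial):
    """Return {test name: passed?} so results can be shown as ticks/crosses."""
    return {name: test(trial) for name, test in TESTS.items()}

trial = {"title": "Barley time-of-sowing", "year": 2015, "replicated": True}
results = run_tests(trial)
print(sum(results.values()), "of", len(results), "tests passed")  # → 3 of 4
```

Keeping each test as a small named predicate is what makes it practical to show users exactly which checks a trial failed, rather than only an aggregate rating.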
While there are pieces we have done with GRDC to improve OFT, it perhaps hasn't been the entire framework in its own right. One area we were looking at was active trial management. By active trial management, I mean helping organisations conducting trials to have a lot of those data quality principles embedded at the very start, before something is in the ground growing or they're looking for the effect of a particular treatment. So the idea was, rather than looking for a cure, to get in on the ground floor and work with organisations before we have to do a lot of data wrangling. A key part is also how we guide and support organisations, given there's such a diversity of data management skills in this space; we've found this right across Australia through other projects, the Soil CRC and other programs. That's a really big issue. Conveying to users the benefits of a data quality framework is also something we've still got to work on. In terms of experience and learning: there's a plethora of open-access data relevant to agriculture and grains, and we're accessing some of it, but there's still a lot more that can be linked in. Understanding the terms and conditions of contributing data, and how it's used, is critical. Unfortunately, not a lot of people think about data quality; it tends to be an afterthought. Supporting domain experts with data wrangling and sharing metadata, and the services that come out of that, is something we're still working on closely with the Soil CRC. All those different components of data management (ownership, custodianship, sharing, access, governance, metadata, licensing, quality) are complex issues. There's no straight highway through them; it's always different for every organisation and individual operating in ag research data.
And with GRDC, I think it's fair to say that they recognise this isn't a straightforward solution, but they're really trying to support the industry through initiatives in statistics, data management and access, so it's great that they're here today. Just to finish with a question: is a data quality solution the carrot or the stick? I still don't know the answer to that. But from an OFT and CeRDI point of view, and I think from a GRDC point of view, we're really focused on making the data FAIR; that's a real driver for us at the moment. So I'll finish up there.