Okay, so welcome everyone to the second round table for the Planet Research Data Commons public consultations. Let's start with the acknowledgement of country, and acknowledge and celebrate the First Australians on whose traditional lands we meet. For me, that's the Kaurna people of the Adelaide Plains, and I pay my respects to their elders past, present and emerging, and acknowledge the very important role they have played in caring for country, with their traditional knowledge, for thousands of years.

So just quickly about the ARDC for those who aren't as familiar with us. Our vision is to provide Australian researchers with a competitive advantage through data, and we do that by accelerating research and innovation through driving excellence in the creation, analysis and retention of high-quality data assets. To fulfil our mission, the ARDC runs programs and facilitates partnerships with a wide range of stakeholders from universities, government and industry.

So today's round table is about defining the value proposition for the Planet Research Data Commons, which we'll also refer to as the Planet RDC. First I'm going to take you through ARDC's plans for the thematic research data commons and explain what an RDC is. Then we'll look at what the Planet RDC needs to deliver and why it will be transformative. With that, I'll present a series of data challenges that we have identified, and then I will be asking you all for your thoughts on these challenges and also what other critical data infrastructure gaps there are for your areas of research. Finally, I'd like to know how addressing these challenges will impact your research in your specific domain or discipline. The question of how we'll deliver the additional infrastructure to address these challenges will be addressed as a co-design process with stakeholders following these public consultations.

We have lots of data infrastructure providers here, eResearch support staff, data users and data collectors, mainly from universities, along with state government and NCRIS facilities. Great, welcome everybody. Okay, so we'll move on.

So what is a research data commons? Well, a research data commons brings together people, skills, data and related resources, such as storage, compute, software and models, in a sustained research infrastructure. This supports open research and specifically the FAIR principles, meaning that data is findable, accessible, interoperable and reusable.

So, to the thematic research data commons. From the work that we did on open calls, we know that we are currently unable to meet the demand of the research community for digital research infrastructure. So ARDC's future strategy is based around the concept of thematic research data commons, which enable us to support the maximum number of researchers through a small number of strategic priority areas. If you like, a fabric of national research infrastructure capabilities selected strategically rather than competitively and co-designed with the research community. The fabric is both nationally focused platform capabilities that strengthen and support the broader system, the horizontals, and a deep focus on identified national challenges and opportunities, the verticals, to provide an ideal balance for the national system.

So, you can see here we have three research data commons. The first one is the Humanities and Social Sciences (HASS) RDC and Indigenous Research Capability.
The second, which has already gone through consultations, is the People RDC, which is focusing on health and biomedical research, and the third is the Planet RDC, which is what we're talking about today.

So as a hub of expertise, the ARDC is well positioned to drive best practice in the creation, analysis and retention of high-quality data assets and to share this expertise across domains. The thematic RDCs are ARDC's programmatic approach to delivering our core competencies. It's a coordinated, system-wide view of our existing activities and a way to work strategically with our partners. The ARDC is uniquely positioned to work with our NCRIS partners, government and research institutions across the broad range of disciplines required to address the complex, multi-system challenges our planet is facing, from the geosciences to the ecology of our marine and terrestrial ecosystems. Some of the most valuable research contributions come from synthesizing or integrating data from different disciplines, for example to understand the dynamics of Earth's interrelated systems and how these are impacted by climate change. These integrated data sets may or may not be big, but they are wickedly complex from the perspective of semantics, standardization, governance and policy.

The Planet Research Data Commons will deliver sustainable, shared, accessible data and digital research tools so researchers can tackle the big challenges for our environment, which include adapting to climate change, saving threatened species and reversing ecosystem deterioration. Today, we'd like to know what challenges you see as the priorities.

So as I said, the thematic RDCs are ARDC's programmatic approach to delivering our core competencies, which are driving excellence in the creation, analysis and retention of high-quality data assets. The key components of a thematic RDC are these four layers. At the bottom, the expertise layer represents the deep knowledge of the ARDC team, from metadata standards to data governance to machine learning. The infrastructure layer is the enduring, underpinning capability that ARDC has committed to in the long term. Projects are activities that we undertake with national stakeholders to develop new digital research infrastructure, such as tools or data assets, or implementing standards. The overarching program governance layer is essential to make sure the RDC aligns with national priorities and meets the needs of research, industry and government. Now, we all know projects receive time-bound funding and traditionally sustainability has been a challenge, and the RDC model looks at addressing this. So when the outcomes of a project are of national significance, we can look at transitioning them into services, which are enduring, reliable, stable and trusted. This delivery model for the ARDC will have a five to ten year roadmap. This isn't a single project, but ARDC's long-term strategy.

So what I'm going to do now is present a series of data challenges that we've identified, and just to tell you about how we chose the challenges: we analysed national research infrastructure priorities, research priorities, discipline decadal plans and NCRIS strategies. These included the 2021 National Research Infrastructure Roadmap, the Australian Council of Learned Academies' Australia's Data-Enabled Research Future reports, the Independent Review of the EPBC Act, State of the Environment reports, various decadal plans from Australian Academy of Science committees, and NCRIS facilities' strategies.
And we also drew information from consultations with hub leaders of the National Environmental Science Program and others.

So, to get into the challenges. First of all, there are a number of recurring statements in the documents we reviewed that aren't specific challenges, but they will need to be embedded throughout the activities and outcomes of the Planet RDC. We've termed them essential attributes. These are: that the FAIR principles and the CARE Principles for Indigenous Data Governance must be implemented; that continuity of data storage, access and management is essential; and that Indigenous knowledge and data are essential for caring for country.

So these are the data challenges; I will go through them one by one, but just so you can see them all: the first seven challenges can be mapped to the research lifecycle, while the last, around data governance, is an overarching challenge. Okay, so this is not quite a traditional research lifecycle, but rather how data is used in that lifecycle. So we have discovering data relevant to your research, collecting your own data, processing that data, integrating your data with other data, analyzing the data, publishing it, and then translation of the research for real-world impact. Now, we've only got a small group today, so as I go through each challenge, feel free to turn your mic on and ask a question or put a question in chat. My colleagues are monitoring chat and they will interrupt me as needed, because I can't see it. But yes, please feel free to ask questions, because we've got quite a bit of time and there's only a small group.

Okay. So the first part of the lifecycle is discovery, and our first challenge is federated discovery and access. There are institutional, discipline, government, national and generalist repositories where data can be stored and made accessible for reuse, and finding relevant data, or even knowing where to look, is a challenge for researchers. As a potential solution, the Planet RDC could augment the existing Research Data Australia to federate a wider range of repositories, enabling multiple repositories to be searched in a single step (there's a small sketch of this fan-out style of search below). We could work with the research community on making the metadata fit for purpose. There are also other federating systems, such as CSIRO's Knowledge Network.

Okay, second challenge: collection of FAIR-ready data. By FAIR-ready, I mean a digital object that has some of the attributes of FAIR, so those that can be fulfilled by the data set itself rather than the infrastructure around the data set. So, is the data in an open, documented, structured format? Does it use published vocabularies to describe the data elements? An issue is that data that's not collected according to an agreed metadata or data standard is harder to make FAIR. So a potential solution is digital tools or apps for data collection that can automate and enable standardization (there's also a small sketch of FAIR-ready capture below). For example, the FAIMS project has created a field data capture app that can be configured for different disciplines or monitoring campaigns. The FAIMS team have used it to create the pilot Soil Monitoring Incentive Program data app for farmers to collect soils data on their farms. So the data is standardized from the time of capture, which enables submission to the Australian National Soil Information System database. TERN also has a field sampling app which incorporates the Australian Biodiversity Information Standard.
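As a small illustration of the fan-out search mentioned above, here is a minimal sketch in Python. The two repository adapters are entirely hypothetical stand-ins for real repository search APIs (such as an institutional repository or Research Data Australia); the point is only that one query goes out to many sources and the results come back as a single, de-duplicated list.

```python
# Sketch of single-step federated discovery across multiple repositories.
# Both adapters are hypothetical placeholders that return metadata records
# in a shared shape; real adapters would call each repository's search API.

def search_repo_a(query: str) -> list[dict]:
    # Placeholder for repository A's search endpoint.
    return [{"doi": "10.0000/example-1", "title": "Koala occurrence records",
             "source": "repo_a"}]

def search_repo_b(query: str) -> list[dict]:
    # Placeholder for repository B's search endpoint.
    return [{"doi": "10.0000/example-1", "title": "Koala occurrence records",
             "source": "repo_b"},
            {"doi": "10.0000/example-2", "title": "Eucalypt canopy cover",
             "source": "repo_b"}]

def federated_search(query: str, adapters) -> list[dict]:
    """Fan one query out to every repository adapter and merge the results,
    de-duplicating on DOI so the researcher sees a single result list."""
    merged: dict[str, dict] = {}
    for adapter in adapters:
        for record in adapter(query):
            merged.setdefault(record["doi"], record)
    return list(merged.values())

if __name__ == "__main__":
    for rec in federated_search("koala", [search_repo_a, search_repo_b]):
        print(rec["doi"], "-", rec["title"])
```

In a real federation layer, each adapter would translate the repository's native metadata into the common record shape, so the researcher never needs to know which repositories sat behind the query.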
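And to make "FAIR-ready at the point of capture" concrete, here is a minimal sketch of a field observation being checked against a controlled vocabulary and exported in an open, structured format. The field names and vocabulary terms are invented for illustration only; a real app would draw them from a published standard such as those the FAIMS and TERN apps use.

```python
import json
from datetime import date

# Illustrative controlled vocabulary; a real app would load terms from a
# published standard or vocabulary service rather than hard-code them.
LANDFORM_TERMS = {"dune", "floodplain", "hillslope", "plain"}

def make_observation(site_id: str, landform: str, ph: float, observer: str) -> dict:
    """Build a structured, vocabulary-checked soil observation record."""
    if landform not in LANDFORM_TERMS:
        raise ValueError(f"'{landform}' is not a term in the landform vocabulary")
    if not 0 <= ph <= 14:
        raise ValueError("pH must be between 0 and 14")
    return {
        "site_id": site_id,
        "date": date.today().isoformat(),   # ISO 8601 from the moment of capture
        "landform": landform,               # controlled vocabulary term
        "soil_ph": ph,
        "observer": observer,
    }

# Export in an open, documented, structured format, ready for submission.
record = make_observation("SITE-042", "floodplain", 6.8, "A. Researcher")
print(json.dumps(record, indent=2))
```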
So the Planet RDC could enable the creation of discipline-specific apps configured with a particular sampling protocol, so that data can be collected using the same standard terms by everyone undertaking the fieldwork, and the FAIR-ready data is easily exported ready for publication. This saves researchers time in having to apply standards at the end of the project and greatly increases the likelihood that the data will be published at all.

Okay, so on to processing. One of the issues that's come up in conversations with the NESP hubs, IMOS, TERN and AuScope is that processing large streams of data is difficult. These can be large data sets streamed in, for instance automated monitoring data, which can be photo, video and audio capture of species and ecosystems, or sensors capturing environmental variables. It could be remote sensing: photography, infrared, radar and LiDAR. It could be a range of geosciences data, like seismology. The data are often challenging to process and manage because there's lots of it, it can be constantly flowing in, it could even be real time, and it's often unstructured. Machine learning is often seen as the answer to annotating these data, but it's not as simple as running an off-the-shelf algorithm over your data and getting valid answers. Researchers often don't have the capability to perform this themselves, or the resources to access commercial options. And additionally, there's still the storage issue to address. So as a potential solution, or as an example of a solution, the ARDC platforms project Open Ecoacoustics is providing data submission tools, data storage, a metadata standard and algorithms, which they call recognisers, to analyse ecoacoustic data. There's training on how to use the recognisers and a portal to share the data and results. This could be extended, or used as a model, to deal with a wider range of automated monitoring and remote sensing data, and this frees researchers to focus on the research questions. And that's just one example of a solution for this type of data.

The next challenge we've identified is curated, integrated data sets. There's a great deal of data collected on Australia's ecosystems, but differences in sampling methods, variables studied, units used, how the objects of study are identified, and temporal and spatial scales can make it hard and extremely time consuming for researchers to integrate the data, and that's exacerbated if the data sets come from different domains (a small sketch of that kind of harmonization follows below). But integrated data sets facilitate and encourage interdisciplinary approaches to problems. An example of a really successful curated data set is the ARDC-supported cross-NCRIS project EcoAssets, which provides open, integrated data delivered by the ALA, IMOS and TERN. These EcoAssets data sets are publicly available for biodiversity reporting and assessment, covering Australian terrestrial and marine systems, and they've been specifically created to feed into State of the Environment reporting. Other curated data assets could be created for the Planet Research Data Commons; for example, there are emerging needs for fine-grained biodiversity data coming from environmental-economic accounting.

The next challenge is modeling infrastructure, for the sharing of models and access to compute. Environmental modeling allows researchers to understand how ecosystems will respond to change, and future risks and uncertainties, which is essential for environmental research, management and policy development (a minimal sketch of this kind of modeling workflow also follows below).
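Coming back to the integration challenge for a moment, here is a minimal sketch of the kind of harmonization a curated data asset does on researchers' behalf. The two survey tables are made up; they differ in species naming, units (counts per hectare versus per square kilometre) and date granularity, and the sketch maps them onto one shared schema.

```python
import pandas as pd

# Two made-up biodiversity surveys that differ in naming, units and dates.
survey_a = pd.DataFrame({
    "species": ["Koala", "Emu"],
    "count_per_ha": [3, 1],
    "date": ["2022-03-14", "2022-03-14"],
})
survey_b = pd.DataFrame({
    "taxon": ["Phascolarctos cinereus", "Dromaius novaehollandiae"],
    "count_per_km2": [180, 40],
    "year": [2022, 2022],
})

# Harmonize to a shared schema: scientific names, counts per km2, year.
COMMON_TO_SCIENTIFIC = {
    "Koala": "Phascolarctos cinereus",
    "Emu": "Dromaius novaehollandiae",
}
a = pd.DataFrame({
    "taxon": survey_a["species"].map(COMMON_TO_SCIENTIFIC),
    "count_per_km2": survey_a["count_per_ha"] * 100,   # 1 km2 = 100 ha
    "year": pd.to_datetime(survey_a["date"]).dt.year,
})
b = survey_b[["taxon", "count_per_km2", "year"]]

integrated = pd.concat([a, b], ignore_index=True)
print(integrated)
```

Even this toy case needs an agreed taxonomy, an agreed unit and an agreed temporal resolution; across real data sets from different domains that curation work is substantial, which is why shared, curated assets are so valuable.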
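And as a minimal sketch of the kind of modeling workflow meant here, the example below fits a very simple habitat-suitability style model on synthetic site data and predicts under a changed scenario. The data and model are purely illustrative; a platform like EcoCommons packages curated versions of such workflows together with real data and compute.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Synthetic environmental covariates at 200 sites: mean temperature (deg C)
# and annual rainfall (mm), plus a made-up presence/absence response that
# favours cooler, wetter sites.
temperature = rng.uniform(10, 35, 200)
rainfall = rng.uniform(200, 1500, 200)
X = np.column_stack([temperature, rainfall])
presence = ((temperature < 25) & (rainfall > 600)).astype(int)

# Fit a simple habitat-suitability style model.
model = LogisticRegression(max_iter=1000).fit(X, presence)

# Predict suitability under a warmer, drier scenario at two example sites.
scenario = np.array([[27.0, 550.0], [18.0, 900.0]])
print(model.predict_proba(scenario)[:, 1])  # probability of presence
```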
While environmental researchers have growing access to powerful software tools in R and Python, they're not all competent mathematicians and modellers. It's hard for them to find full examples of workflows, it can be hard to find data, and for many there is a lack of computational resources to run big problems. And for early career researchers the modeling learning curve can be exceptionally steep. Modeling infrastructure that allows models to be shared and reused across domains, and that also allows access to compute infrastructure, will create world-class infrastructure to support decision making, adaptation planning and intervention strategies. An example is the ARDC platforms project EcoCommons, which provides a user-friendly environment for analyzing and modeling ecological and environmental challenges. It brings together curated models, relevant data and compute. This is already being extended to biosecurity and provides a general foundation for application to a wide variety of problem domains.

No questions so far? So, moving on to publishing. It's often hard to decide what the right repository is in which to deposit a particular data set, or even to know what repositories are available, and this can prevent researchers from publishing their data. Researchers need clear guidance about what to use and the pros and cons of the available options. The ARDC could develop a national system that guides users to the right repository in which to deposit their results, based on the FoR codes for their research, the policies of the institution where they're based, and the requirements imposed by the source of their research funding (a small sketch of that kind of guidance follows below). This would save researchers a lot of time and, again, encourage publishing of the data.

Okay, nearly finished: trusted decision support tools. Environmental managers and decision makers don't always have the skills or compute resources required to use complex models and analytics to inform their decision making. And for the government and public to trust model outputs, and thus the decisions themselves, the workflows must be repeatable and transparent (a small sketch of recording that kind of run provenance also follows below). An example solution is the ARDC project Biosecurity Commons, which is developing a solution that allows researchers and decision makers to produce consistent and transparent models and analytics without coding experience or high-end IT equipment. The Planet RDC could support the development of other decision support tools in other areas of need.

Now finally, this is an overarching challenge. One of the key actions to enable interdisciplinary research is to improve data access and sharing, to facilitate and maximize the use of all forms of data, including between governments, academia and industry. Data sharing arrangements need to be simplified and streamlined, and good, consistent data governance is needed. Data governance refers to the roles, responsibilities, processes and policies around data sharing. As an example, the ARDC has the HeSANDA program, the Health Studies Australian National Data Asset, which is around clinical trials data, and that program is producing a data governance framework and a data sharing agreement guide. This model could be ported over to the Planet RDC and adapted for the domains that we're working with.

So this brings us back to the Planet RDC value proposition. We propose to develop cross-sector, multidisciplinary data collaborations on a national scale.
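Returning briefly to the publishing challenge, here is a minimal sketch of how rule-based repository guidance could work. The FoR divisions are Fields of Research divisions, but the repository suggestions, institutional policy and funder rules below are invented purely for illustration.

```python
# Invented mapping for illustration only: FoR division -> suggested repository.
SUGGESTED_REPOSITORY = {
    "31": "a national biodiversity repository",
    "37": "a national geoscience repository",
    "41": "a national environmental data repository",
}

# Invented funder rules for illustration only.
FUNDER_REQUIRES_OPEN_LICENCE = {"Funder A": True, "Funder B": False}

def recommend_repository(for_code: str, institution_repo: str | None, funder: str) -> str:
    """Suggest where to deposit a data set, based on its FoR division, the
    depositor's institutional repository (if any) and funder requirements."""
    division = for_code[:2]
    suggestion = SUGGESTED_REPOSITORY.get(division, "a generalist repository")
    if FUNDER_REQUIRES_OPEN_LICENCE.get(funder, False):
        suggestion += " (with an open licence, per funder policy)"
    if institution_repo:
        suggestion += f", with a metadata record in {institution_repo}"
    return suggestion

print(recommend_repository("410401", "the university repository", "Funder A"))
```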
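And on the requirement that decision-support workflows be repeatable and transparent, a minimal sketch of recording the provenance of a model run (the exact input data, parameters and code version) so the same result can be reproduced and audited later. The file name, parameters and fields are illustrative only, not any particular standard.

```python
import hashlib
import json
from datetime import datetime, timezone

# Create a tiny example input file so the sketch is self-contained.
with open("occurrences.csv", "w") as f:
    f.write("taxon,lat,lon\nPhascolarctos cinereus,-27.5,152.9\n")

def run_provenance(input_path: str, parameters: dict, code_version: str) -> dict:
    """Capture enough detail about a model run to repeat it exactly:
    a hash of the input data, the parameters used and the code version."""
    with open(input_path, "rb") as f:
        input_hash = hashlib.sha256(f.read()).hexdigest()
    return {
        "run_at": datetime.now(timezone.utc).isoformat(),
        "input_file": input_path,
        "input_sha256": input_hash,
        "parameters": parameters,
        "code_version": code_version,
    }

# Log the provenance record alongside the model outputs.
record = run_provenance("occurrences.csv", {"threshold": 0.7, "buffer_km": 5}, "v1.2.0")
print(json.dumps(record, indent=2))
```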
The ARDC can connect and coordinate across agencies, government and the universities to build coherent infrastructure across the relevant disciplines, through federated models that deliver interoperable compute and storage infrastructure and services, with analysis platforms and tools that are supported by expertise, standards and best practices.

So now, this is where we get your input, and I'm going to ask three questions. Have we got the right challenges? Just remember, the ARDC is focused on data research infrastructure and its use, so our challenges are data challenges. We're also going to ask what the priority is for you. And, and this is really important, how would addressing these challenges impact research in your domain? When we get to this, I'd like you to be really specific. This is your chance to tell us about your area of research and the gaps in data infrastructure that you or your research community face.