I would like to begin by acknowledging and celebrating the First Nations peoples, the traditional custodians of the land — for us today, the Ngunnawal and Ngambri people. And of course for you online, joining us wherever you are in the country, acknowledging and celebrating as well, and paying respect to elders past, present and emerging. I'm the CEO of the Australian Research Data Commons, which is enabled by NCRIS. And welcome to the Shine Dome, the home of the Australian Academy of Science. This is a building that absolutely thrills me; I love coming here. I had the great fortune this afternoon to bump into our president, Jagadish, the president of the Academy of Science, on the way in. And Jagadish reminded me that this year we're celebrating 65 years of the Shine Dome, a historic landmark for us, and indeed 70 years of the Academy. So wonderful for us all to be here this afternoon. And of course Jagadish is a huge fan of NCRIS, which is particularly appropriate.

So welcome to the ARDC's first leadership forum of 2024. We've been running these events for a couple of years now, and usually we take the opportunity to drill down on a particular topic, whether it's trusted research environments and sensitive data, for example, or connecting with industry, or skills. But this afternoon we're doing something a little bit different. This afternoon we're taking the time to highlight and celebrate the successful conclusion of two of our programs: the National Data Assets program and the Platforms program. The National Data Assets program has created a portfolio of national-scale data assets supporting leading-edge research. The ARDC has partnered with over 200 organizations across the research sector to leverage existing research infrastructure and investment and ensure ongoing sustainability and stewardship, rather helped by a $20 million co-investment from the ARDC, but there you go. Similarly, the Platforms program has focused on increasing the number of researchers with access to platforms — both the absolute number and the diversity — with $21.7 million of investment there as well. And you're going to hear more about these significant programs this afternoon: Dr Adrian Burton will cover the National Data Assets program, and Dr Andrew Treloar will provide an update on the Platforms program.

The projects are already yielding significant results. The National Transfusion Dataset has joined up information on the use of blood products with patient outcomes for the first time — who would have thought we didn't have it before? TLCMap is revealing new perspectives on hidden histories and cultures through digital mapping tools. EcoAssets has aggregated national environmental research infrastructure data to inform State of the Environment reporting. This and many more examples — it seems obvious to us now that we have a huge need for this research infrastructure, but it was the coming together and the building of community to create these valuable assets that we're celebrating today. As we went through them, the National Data Assets and Platforms programs showed us the challenges of meeting researchers' ever-increasing demands for integrated and harmonized datasets and research platforms. Through this process — the expressions of interest and the 50 projects — we really saw what our new strategy should be. And we call that the thematic research data commons. We have three at the moment, three thematic research data commons. The first is called People, looking at health and medical data.
The second is called Planet, for earth and environmental science. And the third is the Humanities, Arts, Social Sciences and Indigenous Research Data Commons. Co-designed with the research community through extensive consultations and broad partnerships, they will enable us to achieve our goals of supporting the maximum number of researchers in strategic priority areas of research, with a new approach to participation and organization. And of course, the thematic RDCs are underpinned by the ARDC's Nectar Research Cloud and the tools, skills and services that form what we call a knowledge infrastructure that enables our Australian researchers to transform lives. This new direction would not be possible without the projects that we're featuring here this afternoon. A number of these platforms and data assets will be enhanced through the thematic research data commons program. Today you'll be hearing from the Australian Digital Observatory, Ecoacoustics Australia and Erica.

Before we begin the discussion, I'd like to congratulate, obviously, all of the project leaders and participants for their incredible investment in building Australia's research data infrastructure. But I'd also like to take this opportunity to sincerely thank the ARDC team members for their tireless efforts in supporting, guiding and advising the projects through to their successful completion. I can't name everyone today, which is a bit of a shame on a day like today. Look, many, many of our staff are with us: Natasha, who'll be joining us on the panel a little bit later, Siobhan, Catherine, Julia, Nicola — I know Kerry isn't with us today — and there are other ARDC staff as well. We wouldn't be here without you. Thank you.

I'm now going to hand over to another friend of NCRIS, Ryan Winn, CEO of the Australian Council of Learned Academies, who will be facilitating today's event. Ryan has nearly two decades of experience running strategy, policy, research and program areas — do you really want to say two decades? — across various Australian government departments, focusing on higher education, research, research infrastructure, family violence, early childhood, and supporting Aboriginal and Torres Strait Islander peoples' programs. He's led and managed interdisciplinary project teams, most recently through work on the National Research Infrastructure Roadmap and Research Infrastructure Investment Plan. Actually, I think that's a teeny bit old — yes, you've moved on from that bit. But nevertheless, you have been a friend and a partner to us right the way through this journey, and I am delighted to leave us in Ryan's safe hands as we progress through this afternoon's proceedings. So thank you for joining us, thank you for joining us online, and I hope you enjoy the afternoon. Over to you, Ryan.

Thanks, Rosie. And we really should update that bio, because it's getting a bit old now. It should be said that we're also here because of Rosie's vision and her team's commitment to where we've gone. I remember when I was in the department, and the idea of bringing the different facilities together at that stage was a big task. And I think, Rosie, really, thank you so much for everything you've done for the research community. It's been amazing, and what the ARDC has achieved has really set Australian research on such a great trajectory. So thank you. Yuma, everyone — hello in the Ngunnawal language. I just want to welcome you here.
I want to acknowledge, myself, the traditional owners of the land that we're meeting on, the Ngunnawal and the Ngambri people, and acknowledge the elders past, present and those that are on the journey. I'm going to do the least amount of talking, because you're not here to listen to me speak — there's such a great cast of people today. But I also want to acknowledge the NCRIS team, particularly Tony Rothney, who's in the crowd here, and your long commitment to NCRIS and the data projects and everything along the way. Like Rosie said, we wouldn't be here without your and the team's commitment to those processes. Thanks to everyone joining us, and particularly those online. We've got a few speakers first, as Rosie spelled out, and we'll then work into a panel discussion with some really great questions. I hope we'll explore a range of things and work out where we're going next. Can I pass to Adrian, so you can give the overview of the National Data Assets program, please.

Hello everyone. I'm Adrian Burton, deputy CEO at the ARDC, and I head our national health research infrastructure program, the People Research Data Commons. I was involved with the National Data Assets program, our previous five-year program, which I'm going to give you a bit of background on. I want you to imagine that you're a researcher and you analyze coroners' reports so that you can find the trends. But there's no national collection of coroners' reports; there's no place where any of that data is held together to enable you to do that kind of analysis. So what would you do? How would you bring that data together? Because you need to create insights so that you can allow society to learn from these terrible accidents. What about if you were a fish ecologist and you're monitoring fish and sharks? You have the kernel of an idea: wow, we could be tracking fish around Australia's coastline, and I've got a platform and a data collection and data — we could really make this a national-scale thing. How would you go about going from your community lab to a national view? Where would you get the resources from? How would you even do that as part of your professional research career?

So the Department of Education's NCRIS program is designed exactly for that: to create national-scale services and assets that support leading-edge research, so that these small institutional, community and regional initiatives can be given a national scale. And so the ARDC inherits that mission, and part of our mission is to help create these quality collections of data to support research at the national scale. That last thing is the kicker: how do you bring things together at that national scale? Lots of researchers and research groups have the seed, but that seed — or an egg, I'm just making this up as I go along here — very quickly becomes a cuckoo egg, in that it outgrows that nest very, very quickly; or a little seed that's growing, and it outgrows the faculty or institutional or even regional resources that would be required to sustain that little egg or seed. So that's where our program comes in, to say, okay, how would we bring this up and establish and operate these things as data that is part of the national infrastructure? And that's one of the assumptions of the National Data Assets program: that data actually is infrastructure, because if it's there, multiple research projects can be run off that data for many years to come.
That's the very definition of an infrastructure. So when we said we wanted to have a national data assets program, lots of communities came through the door: ecologists, social scientists, quantum mechanical engineers. They brought with them the idea, the data, and their desire to really establish this at the national level, and the ARDC provided all sorts of services — computation, storage, expertise, informatics services, all the way through to community building, outreach and policy. The ARDC has a very holistic support model for these national infrastructures. On top of those two contributions to one of these national data assets, bringing in your investment and our 50/50 co-investment in new resources for that national asset really provided the glue that's required for that extra national communication, national governance and national strategy. And that's the thing that really starts to outgrow the local resources.

Within the program, a good deal of research depends on public sector data, and so some of our projects were just building a bridge from the public sector custodians back to the researchers — with the Commonwealth Government, let's say with the AIHW, the Australian Institute of Health and Welfare. Some of our projects were crossing over three tiers of government, with the people who are collecting air quality data or endangered species data or hospital information. Some of the projects were really just trying to bring data together across our federated system, creating assets that were fit and appropriate for deep analytics and research. Some of our projects had a really virtuous circle. One of them was a partnership between the ARDC, the ALA, TERN and IMOS — the Atlas of Living Australia, the Terrestrial Ecosystem Research Network and the Integrated Marine Observing System. We came together to provide coherent data back to the Department of Environment, and that data figured very prominently in the latest national State of the Environment report.

The projects from this past five-year portfolio — you're just getting a bit of a taste of the field here — have really been the foundation for our new program, the new five-year program that Rosie talked about: the thematic data commons in the areas of health; environment; and humanities, arts, social sciences and Indigenous research — the People, Planet, and HASS and Indigenous Research Data Commons. And we've been able to build on those. So, for example, in the environment, we're building a whole new national capability in environmental modeling, built on the experience and the capability from previous projects. The linguistics and humanities projects have pushed into our new humanities, arts and social science infrastructure. And in the health area, we're building on secure platforms, on national clinical trials and cohorts data, and on AI in health research, to be part of a new coordinated health infrastructure based on some of the frontier projects from the previous initiative. So before I go — even though the time is up — I'm going to reverse the question that I asked at the beginning. I'd like you to now imagine that you, or someone in your family, is a patient that requires a blood transfusion.
I'm hoping that you will be thankful that some visionary researchers — the clinician researchers in the hospital systems — partnered with the ARDC to create a national data asset that improves transfusion practice and helps the outcome of your visit to the hospital.

So I'm doing really badly at holding everyone to time at the moment, so this is not a good start, but it just shows how exciting this environment really is, and the potential coming out of the projects — the data assets, these things that weren't possible before. I think we've really seen the art of the possible now becoming the art of the real. So thank you for that. I understand that Andrew Treloar's got a recording that we're going to be playing now, as he's not able to join us. Can we play that recording now, please.

Good afternoon. I'm sorry I can't be with you for this celebration. My name is Dr Andrew Treloar, and I had the privilege of leading the Platforms program for the ARDC. So what is a platform anyway? We defined a research platform as a set of online services that enable researchers to collect or generate data, analyze those data, and produce outputs that can be made findable, accessible, interoperable and reusable, or FAIR. They often include associated integration functions and connections to specific data resources. Platforms are sometimes also called virtual research environments (VREs), science gateways or virtual laboratories. Now, our organization is called the Australian Research Data Commons, not the Australian Research Platforms Commons, but it was obvious by the late 2010s that software environments were becoming increasingly critical to support data aggregation and analysis. Nectar, one of the ARDC's antecedent organizations, had already undertaken a program of investment in VREs and research tools. This had been successful, but the applications for investment were wildly oversubscribed. The ARDC investment sought to build on this momentum and expand the number and range of platforms, as well as the disciplines covered. Specifically, we wanted to bring about a transformation in the kinds of research that could be undertaken and in the speed with which it could be completed.

Now, it was clear from the history of Australian and international platforms investments that sustainability was going to be an ongoing challenge, so the program was designed to maximize the likelihood of sustainable outcomes in a number of ways. Firstly, we made it clear that we were prioritizing adopting existing solutions over adapting them, and adapting them over building from scratch. Why? Because aggregating effort around existing activity improves outcomes. We ran a two-stage EOI/RFP process. The EOIs had a deliberately low barrier to entry and were made public once submitted. Following the close of the EOI stage, the ARDC sought to broker collaborations between groups that were working with the same communities or were seeking to use the same technology stacks. As a result of this collaboration brokering, the RFP stage received roughly 50% of the EOI applications, and many of these were now larger collaborative groups. We wanted this because larger groups are likely to be more resilient to change. We emphasized the need for significant co-investment, at least one-to-one, because greater non-ARDC buy-in should result in ongoing commitment after the ARDC investment was concluded.
And we ran a series of sustainability workshops once the platforms had started their work, to give the project leads a range of techniques to improve the sustainability of their projects. Across two rounds of EOI and RFP, the ARDC invested just under $22 million and our partners invested just over $38 million. This was an outstanding result — nearly one-to-two co-investment. A total of 26 projects received ARDC investment. In terms of the dollars invested, roughly 25% went on generic e-research infrastructures applicable across a range of domains, nearly 30% on the biological sciences sector, another 30% on physical sciences, and around 17% on HASS. Here are just three of them. EcoCommons, which built on existing Nectar investments in the BCCVL, the Biodiversity and Climate Change Virtual Laboratory, and developed an innovative technology stack that is now being used to underpin the Biosecurity Commons with the Queensland Government — a particular thanks here to the team at Griffith and to Elisa Bayraktarov, who did an outstanding job as project manager. Erica, a secure medical research environment developed at UNSW, with particular thanks to Professor Louisa Jorm as project lead. And the Australian Text Analytics Platform, creating tools to complement an Australian language data commons, with particular thanks to Professor Michael Haugh at UQ as project lead. All of these projects, and more, are now being integrated into our three thematic research data commons.

As well as the projects I've singled out, I would like to thank all of the project leads and project staff across the 26 projects. They had to deal with starting and running large, complex projects when they couldn't meet their staff or their key users due to COVID-19 travel restrictions, and they faced severe difficulties in staff recruitment and retention, on top of the normal challenges inherent in technology-based projects. Many thanks also to Siobhan McCafferty, the Platforms project manager, and all of the ARDC back-office staff, who kept things on track and dealt carefully with all of the administrative tasks attendant on a complex multi-year investment program. Finally, my deepest thanks to Kerry Levett, who was the Platforms program manager and is now the Solutions Architect for the Planet Research Data Commons. She did an outstanding job shaping, shepherding and successfully concluding the program, and then working with many of the projects to assist their transition into building blocks for the thematic RDCs that are now the strategic focus of the Australian Research Data Commons.

Fantastic. So thank you, Adrian and Andrew, for those overviews of the platforms and the assets. That gives us a really good starting point for where we're going next: diving into three of the projects in particular. So we're going to move now to the panel conversation, and I'll invite the panelists to come up to the stage. We have Dr Mat Bettinson, Professor Louisa Jorm, Professor Paul Roe, and Natasha Simons from the ARDC. The first three presentations are going to focus particularly on explaining each data asset or platform: who is using that asset, what have those data assets and platforms enabled, and, most importantly, what are the research questions that we can now better understand, or even progress at all, as a result of these pieces? What I'd like to do is introduce Mat Bettinson first to speak. Mat has had a varied career as a technology researcher and developer focusing on human data.
He worked on the ARDC-supported Australian Digital Observatory platform, which supports researchers, especially in the humanities, to access and analyze dynamic digital data, including existing collections of national interest across Twitter (now X), Flickr, YouTube, Reddit, Instagram, and gaming platforms such as Steam and Discord. Mat, can I get you to provide your overview first? Each speaker will speak for five to seven minutes on their platform; we'll work through each of the speakers, have questions and answers if we have time before the break, and then continue after the break.

Sure. Okay. So the Australian Digital Observatory was set up to respond to a need across humanities and social science researchers to observe what Australians, predominantly, were doing in their digital lifestyles and on the web. We call it a focus on human data on the web. The largest product of the ADO was the Australian Twittersphere, which now stands at something like two billion tweets across multiple years and three different amalgamated collections. We've also shepherded a range of open-source tools to access those collections, and to process and filter them, and we also provide consultation services — research software engineers and data scientists — because perhaps the unique thing about, I don't know if it's unique, but it's a property of humanities and social science researchers that they can be said to exist along a cline of familiarity with technology and methods relating to digital data. We see people all the way from those who are comfortable with technological tools, like the digital humanities, through to people that really don't have any grasp of the underlying technologies, and possibly even of quantitative methods in general. So we respond to those needs, and the ADO's output so far is something like 17 direct publications, many more ancillary publications, other tools, other collections of data and so on.

An interesting property is that as the ADO drew to a close, so too there was a large change in the digital landscape — something that's vexing humanities and social science researchers interested in data on the web — and that's what we're increasingly calling the post-API era. In other words, the era where all of the platforms close up shop, and the ways that we used to be able to access them and harvest their data are no longer available. As a consequence, there is a great reduction in platforms, a great reduction in the visibility of what Australians, and indeed people globally, are doing in public in their digital lives, because the data that they produce as part of doing that has now become intensely valuable. I don't think anyone's missed the fact that we've got an AI revolution going on, and it's partially related to that: data now has great value, and so it's being locked up. So we find ourselves now having to engineer new ways to harvest the activities of people on a much broader set of smaller platforms, which is a great challenge. But one of the lovely things about setting up the ADO was that when it came to a close, it was a fairly easy sell to QUT, the institution, that we should continue as a professional body to support humanities and social sciences. We became the HASS people within research infrastructure in our university.
And that then meant that we were able to conceive of the new tools and the new platforms and so on, imagining right from the start that they will be reusable resources, which is not often the case. It's been touched on before: you have a research project, and then — what was the analogy? It came out of a nest, it became some kind of egg; I forgot what that analogy was. A cuckoo, that's right. And so then there's this question of: well, who's going to take it over? Who's going to think about sustainability and all the professional engineering practice that has to go into making these platforms long-term within research infrastructure? That's how we think. So now, when we interact with researchers, we have a different conversation: okay, you want to research this — what data do you need? But beyond that, how about you partner with us and shape the next kind of platform, the next kind of thing, that can be more broadly usable. And we're delighted to be working with the ARDC to bring some of those platforms to life; it's an opportunity that we're grateful to have. We think that's part of a broader conversation about how we can cater for the range of technological skills and the range of different audiences, and what future platforms look like. And the ARDC is taking good leadership in that area — in, in a generic sense, what infrastructure should look like — so that Australia's cohort of research software engineers have the guidance to be able to develop those platforms into the future. I think that's it for me.

Good. Thank you, Mat. That's really, really interesting, and I'd like to now pass to Professor Jorm. Louisa applies advanced analytics to health and medical big data, including routinely collected hospital, Medicare and pharmaceutical records, to generate real-world evidence to improve health care and patient outcomes. She leads the ARDC-supported data asset for Australian health research — it's always good to have an acronym when you have a name that long — Linda, and the E-Research Institutional Cloud Architecture, the Erica platform, for sensitive data. I'm really interested in this one as well — actually, I should say all of them; this is such a great crop of projects to talk about. But Louisa, can I pass to you to give your overview?

As Ryan said, I've been privileged to be involved in two ARDC-funded projects — they both have female-name acronyms, for whatever reason. One is Erica, which is a Platforms project. And the other is Linda, where, I have to say, I was a team member; I wasn't the leader of that project. Linda is a data asset project that was specifically about partnership with public sector data providers. For my one slide that I was allowed, I've put up a representation of the Five Safes framework, which some of you will be familiar with. It's a risk-management approach for the use of routinely collected and potentially sensitive data for research, the idea being that there's a range of different controls you can put in place to protect those obviously highly private and highly sensitive, but also highly valuable, data.
It began in the UK — someone called Felix Ritchie developed it there — but it has basically spread worldwide as a framework that underpins the use of these data. You can see what's there: we have safe people, safe projects, safe data, safe settings and safe outputs. I put it up here partly so people become familiar with it, but also because it's really important, when you're thinking about the Five Safes, to know that it was never intended to be a prescription. It was always intended to be adjustable — often it's actually depicted as a series of dials, something like a music studio mixer, where in different circumstances you might dial up one level and dial down another. So, for example, if you are using a secure research environment, you may be able to dial down a little bit the level of controls that you put on the data in terms of the de-identification techniques. I think it's important to realize that, and I do have a slight fear that in Australia we just have this idea that we've got the Five Safes and we dial them all up to the top level, and that may in fact be something of a barrier to research going forward.

But I'll start with Erica. So Erica was, as I said, a Platforms project. There was a big team of collaborators involved, led by UNSW, which developed Erica — so it was an existing platform, and this project was about trying to grow and replicate the community of Erica users through the Platforms program. Partners included the New South Wales Government and its Data Analytics Centre, the WA Government, the University of Western Australia, the Australian Institute of Health and Welfare, the University of Melbourne, SA NT DataLink and others. So quite a big consortium came together. It was a rocky time, because it was COVID, and government agencies in particular really didn't have a lot of resources at that time to do many things. But first I should just say what Erica is. It's the first of its type internationally, actually, that sets up a secure research environment, sometimes called a trusted research environment, within public cloud computing infrastructure. Erica uses Amazon Web Services, the AWS cloud. Most such SREs — secure research environments — internationally use dedicated hardware. There are starting to be more now that use public cloud, which obviously has a lot of advantages in terms of its scalability and its flexibility, in terms of the different configurations you can have, but it comes with a big cost impost, which is something I might get to at the end. During the Erica project — there are now five instances. At the start there were only UNSW and the New South Wales government; now there are three others that are actually operational. Erica instances are run by organizations: because it's completely virtualized, an organization can set up its own Erica instance and run it. Interestingly, we thought that universities were going to be the ones who wanted Erica instances, but as it turned out, it's government departments who have been the major end-user case. And a lesson that's come out is that it depends whether you're an AWS or an Azure shop, basically, as to whether or not Erica is a good fit for you. That's something to think about going forward — we probably need containerized solutions that can use multiple cloud providers.
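To make the "mixer desk" reading of the Five Safes above concrete, here is a minimal, purely illustrative Python sketch. The 0-to-5 scale and the idea of comparing combined control levels are assumptions made for illustration only; they are not part of the framework itself:

```python
from dataclasses import dataclass

@dataclass
class FiveSafes:
    """Illustrative 'dials', each on an assumed 0 (open) to 5 (strictest) scale."""
    safe_people: int    # researcher accreditation and training
    safe_projects: int  # approval of the research purpose
    safe_data: int      # degree of de-identification
    safe_settings: int  # security of the access environment
    safe_outputs: int   # checking of results before release

    def total_control(self) -> int:
        # The mixer-desk idea: it's the combined level of control that matters,
        # so a stronger safe setting can compensate for lighter de-identification.
        return (self.safe_people + self.safe_projects + self.safe_data
                + self.safe_settings + self.safe_outputs)

# A secure research environment (safe_settings dialed up) can dial safe_data down a little...
sre = FiveSafes(safe_people=4, safe_projects=4, safe_data=2, safe_settings=5, safe_outputs=3)

# ...whereas an open release has no people or setting controls, so the
# de-identification and output dials carry all of the load instead.
open_release = FiveSafes(safe_people=0, safe_projects=0, safe_data=5, safe_settings=0, safe_outputs=5)

print(sre.total_control(), open_release.total_control())
```

The point of the sketch is simply that the dials trade off against one another; dialing everything to 5 regardless of context is the prescription Louisa warns against.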
But I know I don't have much time. There are 344 research projects that have used the various Erica instances during the period of the grant. Examples of the types of things they have done range widely. One project looked at sex disparities in cardiovascular care and identified that women who've had a stroke or a myocardial infarction, when they're transported by ambulance, are much less likely to be identified as someone who's had a stroke or an MI compared to male patients. There are many reasons for that, but it was a new finding, and it has driven work with New South Wales Ambulance to increase the degree of suspicion for cardiovascular disease, in particular in female patients — cardiovascular disease is often thought of as a male issue. Another example: there are increasing numbers of projects with very, very large datasets — electronic medical record datasets, including things like images and a lot of clinical text data. There's a project called Sava in South Western Sydney which uses cancer clinical data and is doing really advanced AI work to extract clinical concepts from what are basically patient notes, which will then enable those notes to be used for cancer outcomes research.

Linda I'll talk about a little more quickly. As I said, my colleague Professor Claire Vajdic at UNSW was the academic lead on it, but it was really an equal partnership between the Australian Institute of Health and Welfare and the University of New South Wales. There were actually 35 partners in it, of which 12 were universities and nine were medical research institutes — so you can see there was massive interest in this particular issue. It really addresses the safe-projects element of the Five Safes, and it was about trying to get researcher access to national routinely collected linked data. There was a national data asset formerly called NIHSI, which is now transitioning to something called the Health Data Hub, that was only available to analysts from government. There was no pathway for researchers to use it at all, which was really very frustrating for researchers. Through the work of Linda — and this took a long time — we now basically have established policies and procedures for researchers to access that data asset and, really excitingly for researchers, a very streamlined process. In the past, I've been involved in projects that took four to five years to actually receive data, and by that time your grant's run out. And you usually have to go to every single state and territory to get ethics approval and data custodian approval. Now, as a result of Linda, the new Health Data Hub requires only one data custodian approval and one ethics approval. It's early days yet — researchers are all now clamoring and clambering to get access to it — but I think it's going to be a real step change in the national linked-data research enterprise, which has been very much state-based in the past. For example, someone like me does research in New South Wales; we submit to a US-based journal and they go, "What's New South Wales?" But Australia they get. So I think it's actually going to increase the impact.
I'd just like to finish by talking about two things that I think are future-related. One relates to Erica, and that is this need for scalability. Increasingly at the UNSW Erica we're finding that researchers want GPUs, or multiple GPUs, within their project spaces, and the cost of running some of these projects within the AWS public cloud is becoming a real issue. Another issue is that this type of research, with the increasing availability of EMRs and things like the Health Data Hub, is just going to overburden some of the mechanisms we have in place. For example, at the moment there's a person who checks every output that goes out — it's just not scalable. I think we do need to be thinking about all of the mechanisms, including safe people, and having much more trust: how do we establish that trust, how do we accredit people, rather than this bottleneck of every single output having to be checked at the end? The other thing I'd like to mention is what has come out of Linda. As I said, there were 35 different partners in that. There was a steering committee that included people from government, a lot of researchers, but also consumer and community members, and it worked really, really well. It was actually very rare for people other than government to have a say or a stake in the governance and how things moved forward. And a strong recommendation that came out of Linda was that such national data initiatives involving government data — which basically belongs to all of us — do need governance mechanisms that are probably more skills-based and do involve all of the stakeholders, not just the government data custodians. So the recommendation came that there should definitely be consumer representation on such governance mechanisms, as well as a researcher voice, because that was something we had during Linda, and we really hope to see that going forward with the new People Research Data Commons.

Wow, so much in that. The health space is such a great example of seeing translation from data, and I think your example around gender bias in the data, and getting those rapid insights, is such a good illustration of the opportunities — but there are so many more complexities too, particularly around consent and access; I'm sure those aspects will come back in question time, because they're so critical. I want to pass to Paul. It was great having the pre-conversation with you, because yours was one that I had no real insight into personally, and I thought it was really fascinating — just the potential, and just such an interesting, different sort of facility. So I'd like to introduce Professor Paul Roe. Paul has published over 200 papers and received over $10 million in competitive research grants. He founded and led the Microsoft QUT eResearch Centre, in collaboration with the Queensland government, to investigate smart tools for research. Paul undertakes novel interdisciplinary research, including through the ARDC-supported Open Ecoacoustics platform for ecological monitoring. He has also worked on a computer system supporting communications and collaborations in remote Aboriginal communities. Can I ask you to speak about your platform?

Sure. Thanks, Ryan. I'd also like to acknowledge the traditional custodians of the land and pay my respects. So, I'm a computer scientist at QUT, and for the past 12 years I've been working in ecoacoustics.
I suppose the heart of ecoacoustics is trying to understand nature through sound. The idea is that we can put out recorders across the environment and continuously record birds, frogs, potentially other species as well, and from that understand what's going on. And this is important because we don't really have baseline data. When I talk to ecologists, particularly from the northern hemisphere, they have baseline data. It might surprise you: we don't even know in this country where koalas are. We don't know how koalas are distributed, despite the great amount of investment that goes into koala research, and the public and political interest. So we really need a tool to measure biodiversity, and that's what I've been involved in, in partnership with the ARDC. That's what the Open Ecoacoustics project is: a tool to measure vocal biodiversity across Australia. What Adrian said greatly resonated with me, because we started off with our own project that we'd put together by pulling bits of grant funding together. Really, through the ARDC, we were able to lift our platform, and now we have a platform that supports hundreds of projects and thousands of users dealing with ecoacoustic data. Our databases, which are supported by the platform, store over a petabyte of data. And the reason that we need a platform is because ecoacoustic data is big — a petabyte of data — and it's unstructured as well. It's not like music data: there are no tracks, there are no staves or anything like that. You just get a bag of SD cards — and there are still bags of SD cards being passed around; in fact, last week at the ecoacoustics symposium in Melbourne, we were passing bags of SD cards around. So we have this large data, and probably a year or 18 months ago I would have said that the other problem was analyzing the data — the automatic analysis of the data was difficult. But actually, I think we've got that nailed now. I think we can analyze the data pretty well through some of the new deep learning systems that are around. So we've built this platform. The platform is an instrument. We have lots of people using it, it's web-enabled, and we use high-performance computing, GPUs, all of those kinds of things. And it supports projects across Australia. It supports NESP projects — the Resilient Landscapes Hub, one of the projects there, is looking at some threatened species on Christmas Island, and all of that data is going into a platform supported by Open Ecoacoustics. We've got projects on the Great Barrier Reef looking at seabirds and the breeding success of seabirds. Probably one of the most interesting ones was a project we ran a couple of years ago with the ABC for National Science Week: a citizen science project for community members — particularly schools and students — to identify owls in some of the data served by our platform. And once those owls had been identified, that was fed into the EcoCommons project, which is another ARDC project, and we were able to produce new models of the distributions of these species.
And we actually found that the owls are distributed in different ways from those previously imagined. So that was really exciting, and a great way to bring these different platforms together. So, yeah, we've really produced a tool — perhaps one of the few tools, and we are a world leader in this space — to measure biodiversity. And we're very happy with the support that we've received from the ARDC, which has made that happen, to address this really important problem for Australia.

That's very fascinating — just such a great example of a platform coming together. We were meant to have Professor Grenier Warren join us today, but unfortunately she's unwell. It has meant, though, that we've got the amazing Natasha joining us. Natasha Simons — really, thank you for joining us at short notice, but I think your perspective in this conversation is really valuable, and hopefully it will help us have an interesting strategic conversation about what it means for platforms going forward. For those that don't know Natasha, she has a background delivering award-winning research projects. Natasha led the ARDC's national coordination team of program managers, product managers and subject matter experts. She's passionate about bringing out the best in people and collaborating nationally and internationally to solve data challenges. I'd love for you to share your views from sitting across it all.

Thank you. So I'd also like you to take a step back in time. If you remember, April 2020 was right at the start of the COVID lockdowns, and it was a really uncertain time in our history, filled with a lot of anxiety and a lot of horror unfolding on our TV screens. That was the time when I stepped into the role of associate director for data and services, and under Adrian's guidance, my job was to implement the National Data Assets program. It seemed like a very challenging time to start that program. But at the same time, this was a time in history when data was probably the most visible it has ever been. We had a real focus on health data on our screens every night — on death rates, on illness rates, on the race to find a cure or a treatment or a COVID vaccine; that was all on our screens. There was heightened interest in data, particularly health data. So, on the other hand, it was a very good time to be starting that program. When I came into the role, we had 25 projects under the three parts of this National Data Assets program, and I found myself on a number of project boards with really eminent people in their fields: outstanding researchers, outstanding government staff who knew their data, outstanding clinicians — people who really knew the data that they collected and what they wanted to do with it. And I suppose that left me thinking: well, what is the ARDC adding to this conversation? How can we add value to the projects? I think I found that the value we added was actually in reimagining the data in ways that weren't originally intended. Someone collects the data for a particular purpose, and often they don't think that, in the future, this data could be useful to somebody else, that it could be combined with other data and made really useful on a national scale. And I think we became a hub of expertise for FAIR data — what Andrew Treloar mentioned: findable, accessible, interoperable, reusable — that part about how do I share my data, how do I combine my data?
All those types of questions were the things that our team had expertise in. And I'm really proud of the team, particularly Catherine Brady and Julia Martin on the National Data Assets side and Kerry Levett and Siobhan McCafferty on the Platforms side, for helping to solve those common challenges around implementing FAIR in the data and platforms programs. So, just to go through some of those challenges on the FAIR side: one of the things required for FAIR is that you implement persistent identifiers, which is something very close to my heart. And we actually found that a lot of the projects had trouble issuing just a digital object identifier for their data. That wasn't very straightforward for a lot of the project partners, even though we have an ARDC infrastructure service that offers DOIs free of charge. And part of it was even going a step back from there: they didn't actually have a repository in which to store the data, to then get the DOI and expose it. So that was one of the main challenges, and we brought our expertise to bear in that area to help get that happening (there's a minimal sketch of that minting step below). And I'm just going to take this chance to advertise that we now have a national persistent identifier strategy, and we're working through the roadmap. Rosie spoke to the deputy vice-chancellors (research) group for Universities Australia this morning about this, and we had a lot of great engagement, coming just off the back of the release of the Accord as well. So if you want to read it, it's out, and we hope you'll engage in that process too.

Some of the other challenges: I think our team also brought a bringing-together of community to solve common problems. Some of those were around cross-jurisdictional data sharing. How do you get data from the public sector and combine it with research data? How does Queensland government data get shared with South Australian government data? The technical stuff, even though it's challenging, is often more straightforward than the governance aspects: who has ownership, who has permission to do this, is the data sensitive? So Julia coordinated a cross-jurisdictional data sharing working group, which brought the partners together to problem-solve and learn from each other. That was our role — bringing those projects together, and bringing some international experts in to look at that. Another area that was really challenging was sensitive data. Nicola, who's here, runs the sensitive data community of practice, where we could bring people together to try to solve those problems; they're obviously slightly different in health to what they are in environmental data, but nevertheless very similar challenges. Another challenge was around demonstrating impact: how do you say that someone has used that data asset or that platform, and then created this wonderful research that's changed government policy and had these outcomes for Australia? That was also a very challenging area, and we had an impact reporting working group, which Julia led as well, to take the programs through that. I think that's a work in progress — I don't think it's an easily solved thing, but it's something we'd all like to do, having invested so much time and energy into creating these.
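To make the DOI-minting step Natasha mentions concrete, here is a minimal sketch of a request against the DataCite REST API, which commonly backs services of this kind. The repository ID, password, prefix and URLs are all placeholders, and the ARDC service's own interface may differ:

```python
import requests

# Placeholder credentials and prefix -- in reality these are issued by the
# DOI service provider; nothing here is an actual ARDC endpoint or account.
REPO_ID, PASSWORD, PREFIX = "EXAMPLE.REPO", "********", "10.00000"

payload = {
    "data": {
        "type": "dois",
        "attributes": {
            "prefix": PREFIX,          # the service assigns a suffix under this prefix
            "event": "publish",        # mint and make findable in one step
            "creators": [{"name": "Example Project Team"}],
            "titles": [{"title": "Example national data asset"}],
            "publisher": "Example University",
            "publicationYear": 2024,
            "types": {"resourceTypeGeneral": "Dataset"},
            # The DOI has to resolve to a landing page -- which is exactly why
            # not having a repository was the sticking point for some projects.
            "url": "https://repository.example.edu/dataset/123",
        },
    }
}

resp = requests.post(
    "https://api.datacite.org/dois",
    json=payload,
    auth=(REPO_ID, PASSWORD),
    headers={"Content-Type": "application/vnd.api+json"},
)
resp.raise_for_status()
print("Minted DOI:", resp.json()["data"]["id"])
```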
With the Platforms community of practice, one of the focuses that Siobhan brought was sustainability. They talked it through, and it changed the way people thought about this: it's not just a project — how do we sustain this beyond the life of the project? That resulted in some partnerships with industry to get future funding for the platforms, and also some tiered access models that enabled the platforms to sustain themselves beyond the life of the project. The other thing we brought: COVID made it very difficult for the projects to get the engagement that they had planned with the end users of these data assets, with the researchers. So we had a program that came towards the end, called Community Connect, and that enabled all the projects who wanted to go through it to develop sustainable training materials that would help researchers, first, to know about the assets, and secondly, to be able to use them. Those training assets were put on DReSA, which is a discovery portal for training materials in Australia. So the end result of the program is that we have these wonderful data assets and these wonderful platforms, and wouldn't it be great to be able to sustain those into the future. And, from the outside, seeing those go into the thematic data commons will help to do that, and will help to elevate them to even different levels. So if we go back to the start, that's taking the data that you thought was "just for me, just for my research project" into "wow, these are now national-scale data assets". It's unbelievable. One of the things that struck me at the start was exactly what Rosie said: why don't we have these things already? Why don't we have a national air quality database? Why aren't emergency medical records at hospitals on a common data model? Why don't we have a reference dataset for invasive species? These programs enabled us to tackle those big challenges, and I think that will benefit Australians, our environment and our economy.

Wow. That was the perfect closing-out of that piece, with less than 24 hours to think it through. Thank you so much. That has left some really interesting questions that I think we can explore. I'm always conscious, as a facilitator, to never stand between someone and coffee at morning and afternoon tea, so we'll break briefly now — I think we're breaking for 20 minutes, for those that are online — and come back for questions. I'm sure there'll be many in the room, and I understand there's a process for online as well. Have a great break, think about your questions, and if there are any particular ones, unpick them with the panellists in the conversation. I look forward to coming back and exploring a number of those issues with everyone. Thank you.

So where I wanted to start first with the panel, as people ponder their questions: are there any particular unintended consequences, or unintended uses, of your data commons and assets that you hadn't expected, but that have been a really exciting new addition? Do you want to take that one, Louisa?

One I already mentioned: it's government agencies, rather than universities, who today have decided they want to run their own secure research environment.
But another would be that some data that I never thought about as being sensitive is regarded as sensitive. There are a number of projects that really relate to business inventions and those sorts of things that have decided they want to use the Erica secure environment. So it has certainly spread beyond health. There are projects relating to justice and housing, which obviously are sensitive, but then this commercial-in-confidence material appears to be another use that some people are making of it. That would be my answer.

Yeah, many. Because we collect a fire hose of human activity on the web, we're always surprised by what people come to us with. But I think there's an emerging theme, which is quite interesting for our era, and that's an interest in misinformation and the influence of actors — political actors or otherwise — on public discourse, election interference, foreign interference, things like that. As it turns out, we've been preparing our datasets well for this: because all of our data is human language, it's very well suited to sticking a generative AI in front of it and being able to ask questions of the massive trove of data in natural language. But the systems that you need to put in place behind it are actually not very different from the systems that Paul uses for archiving petabytes, or finding things in petabytes, of audio data. The same way that you could find a frog, you can find a fingerprint for a kind of misinformation. Recently we've had a storm of scholars interested in the Indigenous Voice to Parliament and the failure of the referendum. The AEC has a list of official misinformation points, and it turns out you can calculate a fingerprint over any of those, look across a massive human dataset, and go back and find out where these sprang up over time — which "dark communication", as they call it, started a rumor — and find it building over time across other social platforms. So that's pretty amazing. It's not the use of the technological systems that we had in mind, but they turned out to be much more useful in that second use.

Paul, what about for your asset? Yeah, we've had a lot of creative uses of our data, actually. People like Leah Barclay from the University of the Sunshine Coast have created installations using our acoustic data, served up from the Open Ecoacoustics project. We've had musicians using the data and composing things. And even myself — I've been up on stage at Woodford presenting the Open Ecoacoustics project and our Acoustic Observatory, and then I had a couple of my musical colleagues jamming along with some of the sounds, which was funny. You're really making data cool. Yes. Sorry. Yeah. I mean, I guess more seriously, things like noise pollution — we've got people looking at noise pollution in our data. Overseas, people have looked at the influence of what happened during COVID, when there were fewer planes and things like that. That wasn't with our data, but you could do it with our data. And I suppose more generally, with things like the Acoustic Observatory — which is the data enabled through Open Ecoacoustics — we've now shown that you can use that for finding threatened species and invasive species, which was previously sort of unknown unless you had a targeted monitoring regime.
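As a rough illustration of the "fingerprint" idea Mat describes — and of why the same machinery can find a frog call or a misinformation claim — here is a minimal sketch using sentence embeddings and cosine similarity. The model choice, threshold and example data are all assumptions, not the ADO's actual pipeline:

```python
from sentence_transformers import SentenceTransformer

# Any sentence-embedding model would do; this is a common lightweight choice.
model = SentenceTransformer("all-MiniLM-L6-v2")

# The "fingerprint": an embedding of a known claim, e.g. one item
# from an official list of misinformation points.
claim = "Example misinformation claim being tracked."
fingerprint = model.encode(claim, normalize_embeddings=True)

# A tiny stand-in for the massive trove of timestamped posts.
posts = [
    ("2023-08-01", "Totally unrelated chatter about the weather."),
    ("2023-08-03", "A close paraphrase of the example misinformation claim."),
    ("2023-09-10", "Another unrelated post about football."),
]
post_vecs = model.encode([text for _, text in posts], normalize_embeddings=True)

# Cosine similarity reduces to a dot product on normalized vectors.
scores = post_vecs @ fingerprint

THRESHOLD = 0.6  # assumed; tuned against labeled examples in practice
for (timestamp, text), score in zip(posts, scores):
    if score >= THRESHOLD:
        # Ordering the matches by timestamp is what lets you trace where a
        # rumor started and how it built across platforms over time.
        print(f"{timestamp} possible match ({score:.2f}): {text}")
```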
I always think about using DNA in water and those sorts of things to find species, but I never thought about it in the acoustic space. It's such a great use. There are some similarities between DNA and the kinds of things that we do. Natasha, thinking with your perspective, looking across all the different platforms in your role: are there one or two really stand-out examples of "wow, we just didn't expect that use"? Actually, rather than calling out a single example, I think I'll go for what I think was an unintended consequence or outcome, which was a lot of different collaborations and partnerships. If you think about research, a lot of research is collaborative, but when it comes to applying for funding, it can be quite competitive. And during the expression-of-interest period for the Platforms program and for National Data Assets, that process included the ARDC brokering to pair partners together who perhaps had never thought that they'd be paired together to develop a national data asset. So I think that was an unintended consequence, and I think it's actually stood us in good stead moving forward. People have been surprised at the outcomes of those collaborations.

That's really great. From my experience watching the ARDC over the last, gosh, nearly 10 years, it's been an interesting journey to see it really expand into those different areas and not be a traditional IT platform. It's really enabled a digital transformation of research by not being an IT provider, but actually being a business and research transformation partner. I think that's a really great unintended consequence. The projects have been great, but this is also part of the legacy of what the ARDC has provided in this space. You said it, so I have to go there now: it's so hard to have a conversation anywhere that doesn't touch on AI. We all know this — it just happens regardless of whether you try or not, and even when you intentionally don't, it comes up all the time, particularly now with generative AI. And the data platforms and assets that the ARDC has supported provide such a great foundation for generative AI, whether it's sound, text, structured data or otherwise. I'm really interested for you to unpick that a little bit more, and whether you've pushed into that space much — where you're seeing the next tranche with your work at QUT.

Yes — without trying to breathlessly exaggerate the impact of it, the impact has been profound. It's been profound for me as a software engineer: I'm over twice as productive as I was two years ago because of generative AI. Our ambitions for building platforms and software have increased dramatically. And particularly for humanities and social sciences, as I said, our data is natural language, and so large language models are just an amazing way of querying natural-language datasets. It's brought challenges aplenty, because it brings us into a new realm of needing compute resources that we didn't have before — we touched before on some of the difficulty that comes with that, and the expense of cloud providers and so on. But I would say now, one of the things AI has done is enable us to conceive of making ever easier-to-use interfaces for researchers.
Previously, for example, a researcher who wanted to engage with your data set would probably need to know some SQL, the means of interacting with the database, and what they could do was limited by the schema chosen by whoever collected the data, which can be a limitation if future uses weren't anticipated. Now, even the act of making graphs has changed. We run a little hacky hour every couple of weeks where people join us and get advice on the typical computational methods we use, and participation has dropped off because people can self-serve. These days, instead of showing them some code and teaching them, say, matplotlib or some other graphing package, I walk them through the higher-order problem. If you can understand the problem and explain it to a generative AI, then something like OpenAI's data explorer, or code interpreter as it's called, is amazing for that. It's been profound, it continues to be profound, and it makes us wonder what the next data platforms are going to look like.

It's pretty exciting, isn't it? I remember early last year the Academy's Nicola worked on a piece around generative AI, just after ChatGPT exploded, and there was a lot of fear. It took real effort to think about the opportunities and where things were going. It's been exciting to watch that mature over the last 12 to 18 months, and these platforms provide such a great opportunity to think about the art of the possible and what's next.

For health data, the impacts are obviously profound as well. But one perhaps useful side effect of health data being so locked up is that Australian health data are not out there, incorporated into the current large language models. There are several publicly available US data sets, and a few months after GPT-3 came out there were notices on their websites saying you're not allowed to use generative AI with this data, but believe me, it's already in there. So we actually have a bit of an opportunity with our big resources of Australian health data to think about what we could do with them to create assets that help improve health care. One thing our team is working on is the generation of synthetic data based on real data. You can actually enhance the real data you have to increase the representation of what they call edge cases, unusual events or unusual population members, and thereby build a more robust AI model for prediction. But there is a downside as well: no one completely understands the privacy implications of releasing trained AI models, or whether they can be used to recreate the data they were trained on. That goes far beyond the basic output checking that currently happens. How do you actually assess the privacy risks associated with releasing a trained model?
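As one concrete illustration of increasing the representation of rare cases before training, here is a minimal sketch using SMOTE from the imbalanced-learn library. This is a classic synthetic-oversampling technique; the specific methods the team uses are not detailed in the discussion, and the data here is a toy stand-in.

```python
# Sketch: synthetically oversample rare "edge case" records so a prediction
# model sees them more often. SMOTE interpolates new synthetic rows between
# real minority-class neighbours.
import numpy as np
from imblearn.over_sampling import SMOTE

rng = np.random.default_rng(0)

# Toy stand-in for de-identified patient features: 1000 common cases, 30 rare.
X = np.vstack([rng.normal(0, 1, (1000, 5)), rng.normal(3, 1, (30, 5))])
y = np.array([0] * 1000 + [1] * 30)  # 1 = rare outcome of interest

# Resample so rare cases make up roughly 30% of the training set.
X_res, y_res = SMOTE(sampling_strategy=0.3, random_state=0).fit_resample(X, y)
print(f"before: {np.bincount(y)}, after: {np.bincount(y_res)}")

# Note: a model trained on such data still needs the privacy assessment
# discussed above, since the synthetic rows derive from real records.
```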
Paul, I saw you scribbling notes there, and I feel like you've got a really deep thought. I'm curious, from your perspective, where are you seeing the new use cases, AI-related or otherwise?

Yeah. Look, with Open Ecoacoustics we're firmly in that big data space, which is sometimes called the fourth paradigm of science: the first is empirical, the second theory, the third simulation. If you've got a petabyte of audio data, a thousand years' worth, you can't listen to it, so you absolutely need AI to process it, and that's what we do. We've had a partnership with Google where we use some of their new models for processing the data, and that's been truly amazing; it's really unlocked the final piece of the puzzle in making ecoacoustics work.

Getting a bit technical, the thing that was mentioned time and time again at the ecoacoustics symposium last week was the use of embeddings, essentially the last layer of these deep networks. If you've got petabytes of data, the issue is that you can't download it, or can only download a little bit of it, so how do you analyse it? We can analyse it on our Open Ecoacoustics system, but if you want to do some custom analysis, how do you do that? With the new AI systems we can pre-compute these embeddings, effectively pre-analysing the data, so we end up with a much smaller data set that people can analyse locally. It means that with large data sets you can process them in the cloud, but if you need to do something more bespoke you can do that locally too, and there are a lot of interesting things we can do with that, which has been really exciting. It matters because there are a lot of bird species whose calls we don't know. From a data analysis point of view, ecoacoustics is very poorly constrained: we're often not quite sure what we're going to look for, so we need things like the search tool we recently built into the Open Ecoacoustics system, which uses these new deep networks.

The other thing I'm thinking about a little further down the track is what people are now calling the fifth paradigm of science, where you use AI to actually drive the investigation. You're not just applying AI to analyse the data; you're using AI to do the science, as a kind of RA. I think that's really interesting, and particularly once you've got a lot of this metadata, and the system identifiers and descriptions people are talking about, I think we'll be in a position for AI to help with a lot of that. So I think the ARDC, with a lot of its initiatives, is really positioning itself well for the future of science.
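A minimal sketch of the pre-computed embeddings pattern Paul describes above: embed each clip once on the server, then run bespoke similarity searches locally over the much smaller vectors. The embed_clip function is a purely hypothetical stand-in for whichever acoustic model actually produces the embeddings.

```python
# Sketch: precompute embeddings for huge audio archives once, server-side,
# then do bespoke analysis locally against the (much smaller) vectors.
import numpy as np

def embed_clip(waveform: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for an acoustic deep network's last layer."""
    # A real system would run a trained model here; we fake a unit 128-d vector.
    rng = np.random.default_rng(abs(hash(waveform.tobytes())) % 2**32)
    v = rng.normal(size=128)
    return v / np.linalg.norm(v)

# Server side: one 128-float vector per minute of audio instead of raw PCM.
clips = {f"site3/2024-05-01T06:{m:02d}": np.sin(np.arange(22050) * (m + 1))
         for m in range(60)}
index = {name: embed_clip(w) for name, w in clips.items()}

# Local side: rank the minutes most similar to a reference call of interest.
query = embed_clip(clips["site3/2024-05-01T06:07"])
scores = {name: float(vec @ query) for name, vec in index.items()}
for name, s in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
    print(f"{s:.3f}  {name}")
```

The design point is that only the embedding table, not the raw audio, needs to leave the archive.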
It's wonderful to hear AI talked about in such a positive light. But the other factor involved is trust, when AI is in the picture; in fact we were asked this very question this morning at the BCR, and Rosie had a much more eloquent answer than me. But really, what you were talking about before, the persistent identifiers, adds an element of trust into the system: yes, this person is who they say they are, and they are associated with these works, through ORCID iDs for example. So I think that program contributes to trust, which is really essential, and the identifiers are machine readable, so they feed into that AI environment as well.

Oh, a question down the back here.

Thanks. I was wondering if we could take a minute to talk about curation. Should we be doing it? Who should be doing it? Who should be paying for it? I really liked that the conversation went towards AI, so maybe that's part of the answer, but at the end of the day we need people, starting with collecting the data. Australia is probably leading the world in how we manage data and create infrastructures; it's pretty amazing when I hear about Nectar and all of that, and we should be very proud of it. But now I think we need to go back to the people element, though I may very well be wrong.

Great question. Anyone like to take the first step?

It's a hard one to answer. In my case it's mainly government data, and government agencies don't regard curation of data for research purposes as a priority, which is quite understandable. So it's a problem. The ARDC has stepped in and helped support some of those things, but how we do it for multiple different data assets going forward definitely needs resourcing. There are probably AI tools that will assist with things like automating the creation of metadata, which may reduce the demands, but we definitely still need people.

I think data sharing has to be taken into account when you first curate data, because if you try to share data at the end it's actually quite difficult to do. At the moment there's no allocation in the grant you receive to incorporate data sharing, so the money has to be found elsewhere to fund that activity. It's definitely an underfunded activity, and challenging.

So we're not going to have all the answers. I wonder if anyone in the audience would like to add to it. There we go. Is it a question? I'll stop shouting. To some extent, the policy work we've done through programs like the data and platforms programs is going to lead the way to at least automating or streamlining those processes, so we can do it with fewer people. We shouldn't have fewer people, but we do. So hopefully some of the structures we've put in place will allow us to build the case, with a very strong evidence base, for how many people should be doing that work going forward.

It's going to be a perennial problem, and it will continue as the amount and complexity of data increase and people think about bringing things together in new ways. It's a journey I think we're all going to be on for a while.
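A minimal sketch of the kind of curation tooling mentioned above: automating a first draft of descriptive metadata so people can focus on judgement rather than transcription. The function and field names are illustrative, and the ORCID iD shown is the example identifier from ORCID's own documentation.

```python
# Sketch: auto-draft descriptive metadata for a tabular data set, the kind of
# curation chore that tooling can streamline; a human still reviews the draft.
import pandas as pd

def draft_metadata(df: pd.DataFrame, title: str, creator_orcid: str) -> dict:
    """Profile a data frame into a first-draft metadata record for review."""
    return {
        "title": title,
        "creator": creator_orcid,  # machine-readable PID, e.g. an ORCID iD
        "rows": len(df),
        "variables": [
            {"name": col,
             "dtype": str(df[col].dtype),
             "missing": int(df[col].isna().sum())}
            for col in df.columns
        ],
    }

# Toy example; a curator corrects and enriches the draft before publication.
df = pd.DataFrame({"site": ["A", "B"], "count": [12, None]})
print(draft_metadata(df, "Example survey", "0000-0002-1825-0097"))
```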
Hi, I'm Helen Belfanti, from the Department of Defence. A question mostly for Louisa, but maybe for the others too: is the data in the database aggregated, or identifiable, understanding that you release de-identified data out into the world? And if it is identifiable, what sorts of things have you done to keep it secure?

So the data I'm talking about is unit record data about individual people, usually longitudinal, and often linked across different sectors: general practice, hospital and so on. It's always considered potentially re-identifiable, even if the direct identifiers, name, address, date of birth and so on, have been stripped off, because once you have a certain amount of detail there is the possibility of accidentally recognising someone you know, or of a bad actor who has information about an individual working out who someone in a data set is. That's basically why we have the Five Safes. You'll probably be aware that some years back there was a release of a ten per cent sample of MBS and PBS data. There was no ill intention; it was about trying to open up data. But clearly it's never going to happen again, and it shouldn't have happened that unit record data about all those people went out there. So unit record health data are increasingly only ever going to be available in these trusted research environments, or secure research environments. People vary on how many treatments should be applied to the data within those environments, but if you've got the other safes in place and robust, then keeping as much detail as you can in the data very much increases its utility. Nonetheless, there are varying views on that, and there is a pervasive idea of data minimisation, which I tend not to agree with, because particularly when you start using machine learning and AI techniques you want as much data as you can get: everything can potentially add to the model's predictions, including things you didn't expect. If you have to pre-specify everything, that may actually blunt your research. But really, that's why we have all of those controls, the Five Safes, for unit record person-level data.

A question online. I'm going to go back to the generative side of things. A question came through from Donald Holburn from the Australian Plant Phenomics Facility. He's asking how we can protect ourselves from simulated data generated to pollute our knowledge space. What are the cybersecurity implications of this for the platforms you've been producing?

I'm glad you're answering the questions and not me. "I don't know" is the simple answer. As we were discussing earlier, these models are really good at generating synthetic data, so by extension it's possible for them to generate fake data. The only thing I can say is that there is an active call from the Department of Defence right now to figure out mechanisms to answer that question. I'm not aware of who's researching it; I just know there's a lot of interest in Australia from a security perspective, and obviously from a research safety perspective as well.

And that's a really good example, when you think about your question, and when you were talking earlier about fingerprints in the data. The idea of Defence coming to the community to understand ways of finding synthetic data is a really good example of translation, a new way of thinking through how we work closer together and how we show impact. To me, that's a great example of the unintended consequences and uses of these sorts of platforms.
I don't think there would have been a way of asking that question earlier.

Yes, absolutely, that remains the gold standard. While I can sometimes sound like a cheerleader for AI, I actually believe that AI in partnership with people is the most powerful tool we have. The creation of gold data sets can be, for example, an AI pulling data into some kind of viewable record, then a mark-one eyeball looking at it and applying human judgment, marking whether it's good or not. That's the gold standard in data. But it remains a solution only so far as the human can recognise fakes, and only so far as you're in control of the data they're judging, because humans can't always pick fake data created by synthetic systems. Not sure that answers your question.

Interestingly, we've run into a couple of different problems with fake data. One is that we get ecologists doing call playback in the field, actually playing the calls of rare species, which then get recorded, and suddenly an alert comes up: hey, rare species!

What about a lyrebird? Is that fake data? Is that a chainsaw, or is that a very clever bird?

Yeah, that was my second example. Interestingly, people can tell the difference between live birds and birds mimicking chainsaws, and similarly with mimicked bird calls, and we've found the AI can as well if you train it well. And even with call playback there will be contextual information. So there are some things we can do.

Just going back to the conversation around AI and defence data. The project I'm working on is designing a health study for the end of the year, and obviously there are a lot of sensitivities around defence health data. Based on what I know, defence health data won't be able to be stored in a database with the general population health data, because there are sensitivities around the specific numbers in the defence force. But where I think AI could come into it is taking the civilian health data and the de-identified defence health data, comparing them, and providing some sort of analysis that pulls it all together. I think that could be very powerful. And then taking it a step further, doing that on an international basis with Five Eyes partners or something like that.

There's a lot of interest in distributed data analysis in health, because almost all countries have a data sovereignty principle, so you can't bring the data together. But if you can distribute your analysis and then bring the results together through some form of meta-analysis, there are actually quite a lot of cool technologies developing there.

Yeah. One of the things we're looking at with the survey we're developing is finding common data points between the Five Eyes partners, so that we end up with an international commonality that would become very powerful in analysing health data internationally. So it's interesting.
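A minimal sketch of the distribute-then-combine pattern just described: each country fits the same model locally and shares only an effect estimate and its standard error, which are then pooled with an inverse-variance-weighted fixed-effect meta-analysis. Site names and numbers are invented for illustration.

```python
# Sketch: fixed-effect meta-analysis over per-country summary statistics.
# No unit record data crosses a border; only an effect estimate and its
# standard error leave each site.
import math

# Hypothetical per-site results: (effect estimate, standard error), e.g. a
# log odds ratio from the same regression run locally in each country.
site_results = {
    "AUS": (0.42, 0.11),
    "CAN": (0.35, 0.14),
    "GBR": (0.48, 0.09),
}

# Inverse-variance weights: w_i = 1 / se_i^2.
weights = {s: 1.0 / se**2 for s, (_, se) in site_results.items()}
total_w = sum(weights.values())

pooled = sum(weights[s] * b for s, (b, _) in site_results.items()) / total_w
pooled_se = math.sqrt(1.0 / total_w)

print(f"pooled effect = {pooled:.3f} +/- {1.96 * pooled_se:.3f} (95% CI)")
```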
One more question. Okay.

Following on from what Matt said, to give you a sense of the scale: GPT has a context length of 180,000, which is like 180,000 words; that's how much new information it can think about at once. The Google model that came out a week or two ago has a context length of one to ten million words. As an example, if there's a language it has never seen before, you can upload a dictionary and 500 pages of examples comparing English with a language that only 200 people speak, and in a couple of minutes it can learn that language and translate as well as a human can. And it's doing that in RAM, essentially: you can take a language it has never seen before and it learns it on the fly. So we've had a roughly seventy-fold increase just last week, and we're only a year into this, right? When you're thinking about what's possible, that's the scale we're suddenly getting: it can read all the work you've ever done and be thinking about that live when you talk to it, on top of its baseline.

Some work we're doing is an example of the improvements in context length, which technically for LLMs is now 10 million tokens with Gemini 1.5. That's a lot; it's enough for a novel. What's really interesting is that you can fit an entire methodological paper into context and then ask the AI to apply that method to some unseen data. And it actually works, for this kind of text analytics. The way I mostly explain it to people is: if you can explain it to an RA, "I want you to look at this text", and increasingly not just text but images and even video, then you can task an AI to do it. It's quite remarkable, and even though I spend every morning checking the news to see what's new in AI, I still don't know exactly where this is going. But to address what you specifically said: the fact that we can put quite large chunks of data to an AI without pre-training it, without doing the ridiculously computationally expensive task of training the model, just taking an off-the-shelf, very capable model like GPT-4, feeding it a load of research data and asking something quite sophisticated over it, that is quite remarkable. We need to be thinking about what that means for data. All of us need to be more ambitious, doing less small-data plumbing and thinking about the bigger questions. That's a conversation I think everyone needs to be taking part in right now.
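A minimal sketch of the long-context pattern described here: feed an entire methods paper plus unseen data to an off-the-shelf model in a single prompt, with no pre-training or fine-tuning. The OpenAI client, model name and file names are illustrative assumptions about one possible stack.

```python
# Sketch: in-context "apply this method to my data", no training involved.
# Assumes the OpenAI Python client; model and file names are illustrative.
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

methods_paper = Path("methods_paper.txt").read_text()  # full text of the paper
unseen_data = Path("unseen_posts.txt").read_text()     # data the model never saw

response = client.chat.completions.create(
    model="gpt-4o",  # any long-context model; the name is an example
    messages=[
        {"role": "system",
         "content": "Apply the analytical method described in the paper "
                    "to the supplied data. Show your working."},
        {"role": "user",
         "content": f"PAPER:\n{methods_paper}\n\nDATA:\n{unseen_data}"},
    ],
)
print(response.choices[0].message.content)
```

The whole paper and data set sit in the prompt, which is exactly what the jump in context length makes feasible.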
I'm conscious of time, and I never like keeping people late, but we've talked through a range of challenges and opportunities and some really interesting pathways, and it's been amazing to hear about these projects, where they've got to, and, Natasha, the strategic overall picture. This is a leadership forum, and I know this wasn't on the script, sorry Rosie, but what would each of you offer as your provocation to the sector, to everyone on the call and in the room? We've had these platforms, we've got these assets, we've got a future ahead. What would be your challenge or provocation from your leadership position? I think all four of you bring amazing leadership for the platforms and areas you look after. In maybe 10 or 15 words, given I've got two minutes before people start waving signs at me up the back. What's the one thing you want to leave with this audience? Louisa.

For the People Research Data Commons to actually consider electronic medical record research and all of the AI possibilities there, because that is where the research cutting edge is now, not the structured data. That's nearly 15 words.

Because I've said a lot about AI already, I'm not going to talk about AI. I think we still need to come up with solutions for the sustainability of increasingly sophisticated data sets. We've talked about AI, so I can tell you now that we're working on data sets that have AI front ends, and that means there's a bunch of technical complexity built into the platform. The more technical complexity you have in a platform, the bigger the questions over who is going to maintain it and keep it running next year, and the year after, and the year after that. That's institutionally a bit of a problem in research, and it's going to become more of a problem as this technology grows as part of the data sets. Definitely.

Cool. Yeah. We're talking about platforms and data, but really it's about people. The most important thing, I think, is building communities, and that's something we've done well in the ecoacoustics space. I'm just echoing what Hugh Possingham said at the ARDC machine observation workshop, which was run a few days ago: in the end, we're a small country and we absolutely need to collaborate. We've got to collaborate.

Yeah, I'd like to see us build on the successes we've had in this program and on the partnerships and collaborations that have been built. The things that worked really well in the program were particularly around coming together to problem-solve; we've just shared a little around the AI challenges, and it would be wonderful to see that continue into the future, applied in the ARDC and our thematic research data commons, but also beyond that, into the broader sector. The benefits of this program can ripple across the whole sector.

Great. Well, thank you so much. Thank you everyone for joining us, and thank you, Rosie, and the ARDC team. Rosie, do you have some closing words you'd like to share?

I do, and I have my prompts, so I might stick to the script. Is that good? Obviously I'm going to say lots of thanks. However, I'm going to answer Ryan's question first, so I too can go off script. Two words: ambition and focus. Nicola is laughing at me; they're the first two of the ARDC values. All right, go on, I'll do the rest. Thanks, Nicola. The next three values: collaboration, flexibility and transparency. Bingo.

Thanks everyone for your contributions to this afternoon's discussion. A particular thanks to the 120 of you who joined us live online this afternoon. Often we register optimistically for things online and promise ourselves we'll watch them later; there's another 120 people in that bucket, but thanks for joining us. It's truly exciting to hear the impact of these projects; the scope of what's been achieved through the National Data Assets and Platforms programs is absolutely incredible. So a huge thank you to all of our speakers today.
A special thank you to Ryan for his expert facilitation. For all our wonderful panel members, I believe we have a small gift for you here today, but most importantly, we extend a hand to look forward and continue these partnerships into the future. A big thank you to all of the ARDC staff for making this afternoon possible: Asher for the event logistics, thank you again for a marvellous afternoon. Thank you, Julia, where's Julia? Thank you, Catherine, Myra. We don't have everyone here: Myra, Shannon, Jo, Sean. Shannon, you're in here twice, that's two lots of thanks. Nicola, Siobhan, Mary. Not everyone's in the room, but it's truly a team that brings this amazing work to you.

I won't say goodbye to our online participants just yet, because as well as bringing together this afternoon's program, we want to say a special thank you to a couple of people, not just for the sprint this afternoon and the celebration, but for the hard yards over many years in making these projects so successful. Natasha, you're going to say this bit, aren't you?

Yes, and someone has run off to get goodies, so I will start. I really want to thank Catherine Brady and Julia Martin, who have been the program managers in National Data Assets. Many of you in the projects who did the really heavy lifting to make these projects possible had support from Catherine and Julia, and they really extended themselves in every way to help you and your partners as much as they could to complete and deliver those national data assets. So I've got some flowers for you both. And also to Siobhan McCafferty, who was the platforms community of practice manager and brought the platforms community together so beautifully; it's so much appreciated, and thank you very much for the work you did there, building on all the hard work done by the platforms project partners as well as the platforms community.

Flowers have been delivered into the right hands, I see, and chocolates for Siobhan, because like me she's from Turrbal and Yagara country in Brisbane and can't take flowers on the plane. So there we go. Please join me in a round of applause for all of you.

It was really important to acknowledge our staff while we still had the online audience with us. You are now free for the rest of your afternoon; we've done that bit, so we wish you well, and thanks for joining us. And for those in the room, please do join us and continue the networking over perhaps a drink or two. Thanks very much, everyone.