Hello everyone. I'd like to begin this presentation about the Language Data Commons of Australia by first acknowledging the traditional owners of the lands on which I'm making this recording and working, the Yuggera and Turrbal peoples, and to acknowledge their custodianship of the land and to pay my respects to their ancestors and to their descendants. I'm making this presentation on behalf of a large number of researchers, stakeholders and organisations, and I'd like to thank all of them for their contribution to the plans for the Language Data Commons of Australia. Australia is sometimes thought of as an English-speaking country, and while it's true that English is an important lingua franca in society, Australia is actually a massively multilingual society, and it's situated in one of the most linguistically diverse regions in the world. There are more than 300 different immigrant languages spoken in Australia, including English, in addition to hundreds of Indigenous languages. And in Australia's region, the Southwest Pacific, more than a quarter of the world's languages are spoken. Australia is also home to some of the world's oldest Indigenous cultures, which stretch back, as we know, for 50,000 years, which is much longer than what we often think of as ancient civilisations, like Egypt and so on. Work on languages in Australia has resulted in quite large collections of language data being amassed. But the current situation is that a lot of those collections are underutilised, and some of them are even at risk. So the aim of the Language Data Commons of Australia is to federate discovery of, and access to, these language data collections, particularly ones of high strategic importance for Australian researchers and communities. To start working on that, we need to develop a national governance framework in partnership with Indigenous Australians.
We need to develop a comprehensive language data access policy framework, which involves cultural, legal, moral and ethical considerations. We also need to develop a shared technical infrastructure and shared standards across the different institutions that are custodians of language data. That involves building a portal for discovery of, and culturally, ethically and legally appropriate access to, this language data. Finally, it's about engaging researchers and stakeholder communities in more systematic use of our national cultural heritage. The aim of the Language Data Commons of Australia HASS Research Data Commons project is to work towards this larger vision of a national language data commons for Australia. We're working to do this by capitalising on existing infrastructure. There have been a number of past investments from the ARC, from the Australian National Data Service (ANDS), from the Australian Research Data Commons, and also from Nectar. A particularly important piece of infrastructure that we're working with is PARADISEC, which has long established itself as a really vital part of Australia's research infrastructure for languages. We're also doing our best to work with current investments from the ARC and the ARDC. This includes investments in the ARDC Platforms Program for the Australian Text Analytics Platform, and also through the ARDC Data Partnerships Program, through which we have some initial investment in LDaCA. Finally, we're also working with the ARC LIEF-funded Nyingarn platform, which Nick Thieberger is leading from Melbourne. In capitalising on existing infrastructure, our aim is to secure vulnerable and dispersed collections. What we mean by securing is that collections are preserved as digital objects, so they're durable. We're also trying to secure access to that language data.
Thinking about the latter, of course, there's an increased focus on community rights and access to data, so not just for a narrow group of linguists, but for researchers and communities more broadly. Finally, our aim in bringing this language data together is to improve analysis environments, which allow for new research outcomes. It's probably fair to say that the potential of a lot of language data collections still remains untapped because of limited access to the tools, skill sets and infrastructure needed to carry out this work. There are also new language data sources, including the World Wide Web and social media, which are providing vast tracts of data. We need infrastructure to enable large-scale analytics of that kind of data. The proposed work packages in the LDaCA HASS Research Data Commons project are divided into four main streams of activity. In the first stream of activity, the focus is on securing language data collections. What we mean by securing is both securing the data and associated metadata itself as preservable digital objects, and also securing access to that data in culturally, legally and ethically appropriate ways. The second set of work packages comes under stream two, which is about aggregating language data collections. This is about enabling researchers and communities to draw from different sources of data, because we know language data can be held across different institutions, and sometimes that data can be very difficult to find unless you know where to look. The third stream of activity involves improving our text data analysis environments. Sometimes when people think about text analytics, they think about written text, but of course, for people interested in language, text is not just written text. It also includes spoken text, video recordings, signed text, and so on.
So we're looking to provide ways for researchers to make best use of computational and NLP methods and tools in undertaking language research. Finally, stream four is a range of work packages focused on strategic partnerships and also on engagement and training of the researchers and communities who have an interest in language data in Australia. We're delighted to be working on the Language Data Commons of Australia within the broader HASS Research Data Commons and Indigenous Research Capability Program, because we see an enormous number of synergies between what we're working on in this particular initiative and this broader program of work. One point of synergy that we can see is around authentication and authorisation, not only for individual researchers, but for research groups and also, importantly, for communities. A second really important synergy that we see is our commitment to a community-driven approach to access and governance of language collections in Australia. Finally, another point of synergy is data-intensive humanities, which we think complements a focus on data-intensive social sciences, and that's where we see the shared need for working with other partners within the HASS RDC and Indigenous Research Capability Program on APIs, on large-scale text analytics tools and so on. We see a lot of potential there as well. There are a number of anticipated impacts of the Language Data Commons of Australia project. These include establishing sustainable workflows for securing language data collections of national significance. We're aiming to democratise access to Australia's rich linguistic heritage, so it's not just the preserve of a small group of researchers. We're working to demonstrate how we can work with Indigenous partners on balancing research needs with preserving community rights for language collections.
We're aiming to uplift the digital skills of researchers and communities working on languages and also develop the technical infrastructure that's required to analyse language data collections at scale. I would like to highlight the contributions that language research can make to STEM and health disciplines, and also open up the social and economic possibilities of Australia's language data for translational research. Finally, we see LDaCA as positioning Australia as a major international contributor of language data and digital research infrastructure. There is no national language data commons without a community. So in our project, collaboration and communities are built into the centre of what we're doing. In terms of how we want to go about encouraging collaboration and engagement with our stakeholder communities, we want to model the benefits of a computational approach to research disciplines that use text data, which are not restricted, of course, to the humanities and social sciences, but spread across other disciplines in the natural sciences and the health sciences as well. We'd like to guide the development of reference research applications across multiple disciplines by providing access to language data sets and training in new analytical tools. And we're committed to enshrining the right of communities to control access to, contribute to and govern the collections in which they have a stake. We're also working towards sustainability. One of our work packages is focused on exploring how we can work with stewards of language data collections in universities and also in GLAM institutions to ensure long-term preservation of those collections.
Sustainability is also about trying to reduce the total cost of operating, maintaining and upgrading infrastructure through maximal reuse of existing tools and platforms, and also trying to achieve cost efficiency by demonstrating the utility of our services across a range of domains and applications. The project itself has a nominal amount from the ARDC, and we're matching that with cash and in-kind co-investment from our collaborating organisations. On behalf of the LDaCA team, I'd like to thank you for taking the time to listen to our presentation. We look forward to your questions and your feedback on the project plan, both at the round table and through the ARDC website. And for those of you who come, I look forward to seeing you on the 21st of September.