We might kick off proceedings if that's all right. Apologies for the delay. My name's Grant Sarra. I come from Gooreng Gooreng country. I'm just here to moderate, facilitate, stir the pot, give people cheek and progress the day. We were going to have an official welcome to country, but that's paused at this point, but if Thane does come along later, we'll add that in. But first things first, we need to acknowledge our presence on the land of the Wurundjeri people within the Kulin nation. This is not about today and tomorrow. It's not about talking about Wurundjeri people's law, culture, custom, pain, suffering, trauma. This is about data. And what I would like to do is acknowledge our presence on their land and embrace the values that sit within our society that are to be in this room in the next two days. Those values are about caring, sharing and respect for the land, the people and the environment. They're consistent values held right across First Nations communities across Australia. And they're the values that I always like to see come into rooms that we have like this, where you value and respect each other for what you've got to do. You're all doing your own little bits to try and improve outcomes in this space. There's no experts in this room. I see that word written. No one in this room has the right answer and that's an impossible feat, but you're all here to collaborate, communicate, share and value the knowledge that you're going to share with each other. I want you all to think about who you are as individuals and what your role is in this conversation over the next two days. But start by recognising that you individually are all unique, special, but no one's perfect. You look in the mirror, you look at yourself, you look to the windows of your own soul, you've got to say to yourself this, harbour, harbour, what are you going to say? Harbour. That's right. What are you going to say?
You've got nothing to do with Aboriginal values, you've got to do with Curly Sue, the movie. You've got to feel good about yourself and I want you to feel good about yourself today. I've had a quick scan of the audience and I can see a Harrison Ford lookalike over here. And you'll also hear the lovely voice of David Attenborough in Simon as he gets up to present. But little things, special things that we've got to value about each other. Just embrace each other for who you are, feel good about yourself, enjoy yourself. Ants is here going to take photos but he's informed me that he's only going to take photos of good-looking, smart people. So if he doesn't take your photo, there'll be counselling out the back. Now, if you don't want a photo taken, just put your hand up and say, and I'm sure that'll be okay. Yeah? Yeah, maybe not. Yeah, cool. Alrighty, so we'll make up for the acknowledgement. If Zayn comes, Jenny, we will bring him up respectfully to do what he has to do. But we're on Wurundjeri land today and I've met some different people from different parts of country. Steve, where's your country from? Eastern Ireland. Are you in the middle? Yeah. Yeah, I know. And Waka Waka, brother. And Sister Ghana. Yep. That's actually something interesting. We talk about Indigenous research. Which one of these people is gonna be the one sole voice for Indigenous research? Reality is none. We're an extremely diverse people and as one Aboriginal person, you really need to recognise that I up here don't have the right, authority or responsibility to speak for all Indigenous people. You need to value and respect the diversity of who we are, including our voice. So first things first, toilets, you know where they are out there. Durry smokers downstairs. You know what durries are? Other than that, we'll kick off with Jenny. You're gonna come up and do a little bit. Do you want me to do your bio? Is it Jenny or Jennifer?
Jenny's the director of the Humanities, Arts, Social Sciences and Indigenous Research Data Commons at ARDC. You can work out all the acronyms as we progress through the day. If people want to know what the acronyms are, I'm sure people can just ask. She has a wealth of experience in collection management, dissemination of research, cultural heritage data and resources through digital humanities platforms. Most recently she was executive officer of AusStage, and under her stewardship since 2003, AusStage has become the most extensive national cultural data set on live performance. Jenny is acutely aware of the diverse infrastructure needs in digital humanities, with expertise in database design, metadata schemas, interoperability, there's more, interoperability practice, resource discovery protocols, content management systems, data visualization techniques and digital literacy. That's my gamma pool, dots and can find. Okay. Okay. Welcome everybody, thank you so much for coming along. It's great to see all of you. Before I start and in case I don't get another chance, I really wanna thank everybody who's had any hand in helping this day, these two days happen. So thanks to my colleagues, thanks to all of my project partners and anyone else who's helped out. Okay. Let's get started. I too would like to acknowledge the country that we're on today, pass through, okay. So I wanna talk to you a little bit about the ARDC and what it is that we do. And then I'm going to touch on the projects that we are showcasing in the symposium over these next two days and also talk a little bit about the future. So where to next? So the ARDC's purpose is to provide Australian researchers with competitive advantage through data. We accelerate research and innovation by driving excellence in the creation, analysis and retention of high quality data assets. So this is the current ARDC strategy. We operate a range of programs and services over four portfolios.
And the two main portfolios are platforms and software, and data and services. Our storage and compute and people and policy portfolios support the work that the two main portfolios undertake. So for the last two years, we've been running a range of open calls that have given us a unique insight into the digital research infrastructure needs of the Australian community, which has influenced our priorities. But first of all, what is a research data commons? So we're working to build national research data commons for Australia, but what is a research data commons? So a research data commons brings together people, skills, data and related resources such as storage, compute, software and models to enable researchers to conduct world-class data intensive research. We support open research and specifically the FAIR principles, which ensure that data is findable, accessible, interoperable and reusable for all researchers. And I'm sure we'll be talking more about those FAIR principles over the next two days. So our audiences range from government to infrastructure providers, research institutions, data managers and then most importantly, researchers from all disciplines. Not only academic, but government and industry as well. Okay, so there are those FAIR principles. I'm not gonna talk about them much right now because as I said, I think they will come up time and again over the next two days. You'll know them off by heart if you don't already. Okay, so from the work that we've done on open calls, we know that we're currently unable to meet the demand of the research community for digital research infrastructure. So the ARDC's future strategy is based around the concept of thematic research data commons that will enable us to support the maximum number of researchers through a small number of strategic priority areas. So, if you like, a fabric of NRI capabilities, selected strategically rather than competitively, and co-designed with the research community.
So the fabric has both nationally focused platform capabilities that strengthen and support the broader system, the horizontals, and a deep focus on identified national challenges and opportunities, which are the verticals or strategic pillars, to provide an ideal balance for the national system. So the HASS and Indigenous research data commons was actually the first of these research data commons, and the People and Planet research data commons are in their infancy compared with us. So go HASS and Indigenous. Okay, so I'll give you an update on the HASS and Indigenous research data commons. I wanna talk about the projects and the integration activities, let you know about our proposed activities heading to June 2023 and talk about what planning we've undertaken for the five years from July 2023. So many of you know that the HASS research data commons differs from our current approach to thematic data commons in that it was born from a series of scoping studies undertaken by the Department of Education, who then identified four investment ready activities which were subsequently funded through the October 2020 research infrastructure investment plan or RIP. So you'll hear me talk about RIP and that's what I'm talking about, the research infrastructure investment plan. Don't we all love an acronym. So while we did hold consultations, workshops and roundtables to gauge researcher requirements, we were already committed to pre-designated projects. However, the consultations that we did undertake were really valuable exercises which have enabled us to identify capability gaps for HASS and Indigenous researchers, which in turn has guided our planning for beyond June 2023. It's been important for the partners comprising the HASS and Indigenous RDC to think beyond the boundaries of their individual projects to imagine what activities could start to shape the RDC as a real proper research data commons. In the way that I talked about previously.
And that is one with potential for shared elements such as governance, tools, access and authentication, data models to enable researchers to conduct world-class data intensive research. So at the outset, we set aside some money to enable us to support what we called integration activities to achieve that goal and I'll come back to that shortly. Okay, so I'm just gonna really briefly touch on these projects because you will hear much more from the project leads and their partners over the next couple of days. So the first one is the Improving Indigenous Research Capability Project led by Professor Marcia Langton, with three streams of activity: social architecture, which looks at Indigenous data governance and sovereignty; technical architecture, building the foundations for an Aboriginal and Torres Strait Islander research data commons; and core national Indigenous data assets, so building an Aboriginal and Torres Strait Islander spatio-temporal framework. And work is well underway, but I'm gonna leave that to the partners to talk about. We'll go on to the next one. The Language Data Commons, which is being led by Professor Michael Haugh at UQ, and here there are four streams of work: securing language data collections, aggregating language data collections, improving text analysis environments, and strategic partnerships, engagement and training. Integrated Research Infrastructure for Social Sciences, led by Professor Steve McEachern at ANU. Steve here, he's coming, he's coming, okay. Six work packages. IRISS Project Coordination. Now this is like so many acronyms, right? Vassal, Vocabulary Access Service for Social Sciences in Australia. Steve's the master of the acronym, I think. So Geosocial Data Integration Service, IRISS Demonstrator Projects, SPIRE, Survey Project Integrated Research Environment, and Cards, Curation of Australian Research Data in the Social Sciences. Again, you will hear more from the leads and the project partners over the next couple of days.
And then we've got our fourth stream of work, which was to work with Trove at the National Library. So we've now separated the Trove researcher platform into two separate but connected pieces of work. The first one is being led by the ARDC and that is the creation of a community data lab, and we've got a session on that just before lunch today. And the second component is some work being undertaken by the National Library, which is enhancing the Trove APIs and improving communications with researchers. So I won't dwell on that because you'll hear more later. Okay, there are other integration activities that are underway and these are projects that are for the benefit of all of the partners in the HASS and Indigenous Research Data Commons, but indeed for researchers more broadly. So we're looking at access, authentication and authorization using CILogon and REMS, and this may well come up in Peter Sefton's talk this morning. We're also working on the Gazetteer of Historic Australian Places, which is an offshoot of the Time-Layered Cultural Map, which is led by Hugh Craig at the University of Newcastle. We have a project which is porting the online heritage resource management tool into a more modern standards-based approach, which is being led by Nick Thieberger at University of Melbourne. But we also have other overarching integration activities. Indigenous data governance and governance of Indigenous data is front of mind in all that we do. We're looking very closely at CARE, the CARE principles, and we're also looking at the Traditional Knowledge and Biocultural Labels and Notices and how we may apply them across the RDC. Okay, so to date, we have the three projects well underway as well as three integration activities contracted. We have a fourth integration activity at project plan development stage.
Additionally, we consider, as I said, the Indigenous data governance framework and the exploration of the adoption of the Traditional Knowledge and Biocultural Notices as program-wide activity. In February, here we are, we're conducting two outreach activities, the symposium this week. Next week, we've got the computational skills summer school in Sydney, which we're really pleased about. We've got lots of registrations from HDRs and ECRs for that event. In March, the thematic research data commons leads, so that's me and the People and Planet RDC leads, expect to have finalised implementation plans, and we're also looking at drafting a reference architecture document, which will span the RDCs so that we're working in a consistent way. We are hoping that the submission date for the RIP phase two, so that's the research infrastructure investment plan, will be announced soon. So we're working towards the submission, so that the RIP phase two will provide us with activity investment to make sure that we can continue the activities within the HASS and Indigenous RDC. Okay, let's go on to the next one. How am I going for time? All right, so phase two. So as I said, at the moment, we've got those three projects underway. In phase two, we would like to continue and consolidate the work that's happening in the Language Data Commons, the Improving Indigenous Research Capability Project and the Integrated Research Infrastructure. We also want to continue with our integration activities. So the Community Data Lab will become one of our key integration activities, providing tools and services for people to work initially with data from Trove, the National Library, but also with the view of expanding that to enable people to work with other types of data and indeed other cultural institution APIs.
Of course, we want to continue to pursue shared standards and infrastructure across the research data commons, but what we really need, and we know this from our consultation process, is we need a distributed or a federated repository for HASS and Indigenous data. This has come up time and again. So we will be pursuing that. We want to continue to partner with the GLAM sector. And last year, we had a wonderful event up in Queensland, which Grant facilitated, called Bringing Data to Life, which really examined Indigenous collections in the GLAM sector. And it was a really respectful conversation between the custodians of that material in the GLAM sector and Indigenous people. So we want to continue that conversation. And of course, skills development, because why build it if no one can use it? So you can't say build it and they will come. What we have to do is we actually have to build it and then we have to train people how to use it so that they will come and they will stay. And then we want to expand our work as well. So we want to address those capability gaps that we identified in those consultation programs, roundtables. And what did come up there was the need for a federated repository for HASS and Indigenous. The work with the collections came up and also applying Indigenous knowledges. So these are all things that we want to pursue in phase two. We want to try and broaden our partnership and the disciplinary spread of the HASS and Indigenous RDC. So at the moment, we've got languages, we've got social sciences, we've got Indigenous, but we haven't got the A in HASS. So we're missing the arts. We're also missing mediated data. So web-based data, social media data, app-based data. And that was again, something that came up in those consultations, that there was a need for researchers to be able to access and use that kind of data. And I think that's me. Subscribe to the newsletter, everyone. Are there any questions that people would like to put to Jen?
Don't be shy. Hope I didn't confuse you all. No, there's no one. There was no time. Oh, Isabel's got one. Is this a difficult question? No. No. Is there has been a visitor column? I believe not. And that's all I will say about that. You could have just said pass. I'll remember that next time, Grant. Thank you. Thanks, Jen. What I got out of that, Jen, that sort of interesting overview of everything, was how cool would it be for you to have a data commons trivia night determined by acronyms tonight? Yeah, that could be part of it too. Work out all the acronyms. So now the next presentation will be an update from the Australian Academy of the Humanities and Academy of the Social Sciences in Australia. And that will be facilitated by Dr. Kylie Brass and Isabel Ceron. Now I'm going to read you their profiles. Kylie, where are you, Kylie? You can proceed to come up and I'll talk you up as you walk through if you like. Director of Policy Research, Australian Academy of the Humanities. Dr. Kylie Brass is the Director of Policy Research in the Australian Academy of the Humanities, where she leads a research and policy agenda focused on future humanities workforce and national research infrastructure. She's the co-author with Professor Graeme Turner, FAHA, of Mapping the Humanities, Arts and Social Sciences in Australia. I'll read Isabel. Isabel, to my left, there's a pseudo hippie cum alternate gypsy soul, just like myself, but she's a senior policy analyst at the Academy of the Social Sciences in Australia. She considers herself a generalist with a passion for systems thinking, policy reform and the application of research to foster positive social impact. Energetic and outcome focused, Isabel's educational background includes a BA in social communication. Masters, is there Masters in Urban and Regional Planning? Masters, is there? Masters in Regional Development and a PhD in Science Technology Studies.
Isabel's professional career spans 20 years across media, government, consultancy and the not-for-profit sector. Splat me slowly. That's a lot. Thank you. So I'm up first. Thank you, Grant. Thank you. You've set the tone so well today. Already I feel much more relaxed than I did when I walked in the room, which is what I needed to feel. I also need to upgrade my biography. That's one of those kind of battle-do that doesn't say much about me. I've been at the Academy of the Humanities for 14 years, essentially. So I'm in it for the long haul, the policy agenda for the humanities, and I've been policy and research director there for the last 10 years. So, Grant, thank you very much. I also want to acknowledge and recognise First Nations people as the traditional custodians of this land that we're on at the moment and their continuous connection to country, community and culture. The Academy of the Humanities home base is on ANU campus. We're not affiliated with ANU. We tenant there and I'm based in Sydney. So I acknowledge the various places in which we lodge across the country. So, you know, we have been engaged in this work for a long time, the advocacy agenda, and I was trying to work out, you know, what should I say kind of coming into this discussion? I've had involvement in the program of work and insight into what Jenny's been doing over the last year and a half and more, because I'm on the advisory group that's chaired by Jill Benn, and Chris Hatherly, the executive director of the Social Sciences Academy, is also on that, and we've got some other terrific people. And so I've had the joy really of seeing it come to life and participating in other sessions, the various consultations and roundtables and the wonderful event up in Brisbane last year of the Language Data Commons. It's just brilliant to see it come to life.
We have a new executive director starting at the Academy and I said to her yesterday, her name is Inga Davis, and we're saying, you know, a very fond and sort of bittersweet farewell to our very long-serving executive director Tina Parolin, and I said to Inga yesterday, I'll tell you about research infrastructure but it'll take a while. And it does, you know, to get your head around it. So what I thought I'd just quickly do is say, you know, where we fit, I guess in some ways, the points of entry for us and what our role is. And so really as a national body for the humanities in Australia, we, you know, and in my role particularly, I use the policy chops, the convening power and work through our partnerships and networks to advocate for the humanities and to further the agenda around research infrastructure and more broadly research in Australia. So we have a policy and research agenda I'll just work through briefly. We're constituted by a membership of nearly 700 fellows, but a lot of our work, principally our work, is focused in on emerging and early career and mid-career researchers. So all of our grant programs and a lot of our policy work is in that space. And then we also, I guess, bring to the table a range of sort of organizational connections. I'm the Academy's representative on GLAM Peak. And so the kind of agenda of activity around bringing together and partnering with GLAM at a strategic and an operational level into this program of work and otherwise is something that's really close to my heart. So that's a peak of peaks and it meets three or four times a year and that's coming up. The next meeting is coming up on the 14th, 15th, 15th, 16th of February in Canberra. And so, Jenny, you've addressed that meeting before, but it's a really good opportunity to bring the conversations together.
Obviously, we work with our other compatriots, the other learned academies, closely with the social sciences, with the health and medical sciences, technological and engineering sciences, the social sciences, I've said, the sciences, who have I missed? Oh, yeah, them. That's the Academy of Science. And then also the umbrella organization which kind of convenes us all, the Council of Learned Academies. So there's a lot of ways in which coming together and joining forces on stuff is really powerful and useful and we've been able to do that. I wanted to flag also a joint initiative that we've auspiced in our academies and it's a network of early and mid-career researchers and it's called the SHAPE Network. There's an acronym for you, but it's a good one, and maybe we tear down the HASS acronym. This is what I'm provoking us to think about because it doesn't mean anything, really. And SHAPE is an acronym that the SHAPE EMCR, Early and Mid-career Research Network have taken on and it stands for social sciences, humanities, arts for people and the environment. So it's about kind of changing the conversation more than just a kind of cosmetic makeover. It's really about driving a different agenda and a conversation to sort of, I guess, place SHAPE on the same standing as STEM as an acronym in Australia. And then of course we've been partnering with the ARDC through various ways and means, informally and formally. So that's our role. Policy agenda in 2023. Obviously at the forefront is, you know, of substance here is the National Research Infrastructure. A new advisory group has been established. People have seen hopefully the announcement last year and I know Mark Western is here. I know Liz Sonenberg was meant to be here as well. I think brilliant. And so it's wonderful to have that group established as an advisory entity into the National Research Infrastructure Task Force and within the Department of Education. I think that's wonderful.
And so, you know, at some point we'll get, I guess, more detail on how they'll come into play. But it's a terrific initiative. And obviously, as Jenny mentioned, the investment planning process is, you know, front of mind at the moment. But there's other things going on within the Department of Education that we're obviously across and engaged with and I think kind of are the policy context for a lot of what's going on here. And I'll just mention them. The Universities Accord process. There's a research infrastructure agenda that needs to feed into that. We're obviously engaging big time. We've got a meeting, all of the academies, with Tony Cook, who's the acting secretary of the Department of Education, on Friday next week to sort of run through and get a bit of a briefing on some of these sorts of issues. But we make submissions into that process and that's one of the ways in which I think we can engage and align on some of the important issues in higher education reform. The review of the Australian Research Council is another one. Obviously, that was a big ticket piece of work that happened just before Christmas. There's the submissions to that process. But again, we're expecting to see some shake-up in that space as well. And obviously, you know, a lot of the Australian Research Council funding has underwritten so much infrastructure building in our disciplines. And so it's really important to kind of see what flexibility and strategy and innovation can happen within the ARC over the next stretch as well. And the other couple of things I'll just mention is the National Science and Research Priorities review, which is a bit opaque at the moment. You know, we know that it's happening. We know that it's going to be helmed by the Office of the Chief Scientist, Cathy Foley, and run out of the Department of Industry. We've had a briefing about that.
But at this stage, we're hopeful that the scope of that is not, you know, very minzy science and tech, that it's a much broader kind of agenda of activity. And certainly, Indigenous priorities have been at the fore of that in the announcement that's come on down the line from the minister at this point. So that offers up opportunities again to sort of expand and consolidate the work that's happening here as priorities to government. And then the National Cultural Policy, which has been announced in the last couple of days, you know, people can read that at will, but First Nations First is a pillar of that cultural policy. There's money behind that. The one thing I'll just pick out is that we're still sitting on what happens to the National Cultural Institutions. They're desperately underfunded, and it won't be until the May budget that we find out really what's going on there in terms of, I guess, the funding that's flowing through. The National Library is in the spotlight at the moment, but it's, you know, across the NCIs. So again, that's something where we're actively engaged in those discussions and advocating in that space. I'll just flick across also then, we've popped our policy priorities up there. We've got some big reports coming up. And I would say as well that last year our president announced a new Indigenous Studies section within our Academy. So it's designed around fulfilling the promise and ambition of our strategic plan. And it's about diversifying our fellowship, bringing in a lot more Indigenous knowledge, expertise, and really shaking up our governance and policy and program agenda as well. So it's a massive agenda for us and a really important sign of our commitment. And then I wanted to flag a few things. One that Jenny mentioned, those scoping pieces of work that fed into the sort of Department of Education's thinking on the initial spend on the HASS RDC and Indigenous Research Capability Program.
And that was a piece of work that we did on mapping international infrastructure models for HASS. It's on your website, Jenny. And I think it's on ARDC's and it's on ours. And then of course your foundational piece of work that was led by Alexis Tindall and Ian Duncan at the ARDC, which was a clincher as well, I guess, in getting things over the line. We've also partnered last year with the other academies and with ARDC on pieces of work where we each did an environmental scan in our disciplines. And that's an agenda around kind of taking forward a joint sort of policy and research commitment to data-enabled research. And all of those environmental scans are sort of the first port of call. So the academies are kind of arming up on that work. And then three, four, and five are pieces of work that, well, three and four are, I guess, you know, tangential to what's going on here at the moment. But we're publishing this year our Future Humanities Workforce Study, which has been in abeyance a little bit, but we've been looking at updating some of the data from the 2021 census. But it's really about, there's a data and digital literacy agenda there. There's a gender and workforce diversity agenda there as well. There's a sort of writ large humanities, you know, across the workforce, but also some spotlight on academic workforce as well and the professional workforce. And then the fifth thing I'll say, and I don't know whether you've got permission to say anything more, Isabel, but really at this point I just want to flag that there is another piece of work underway and it's not ours to announce or anything at the moment, but it's certainly to sort of just flag.
The Academy of Science has been contracted by the Department of Education to do a piece of work, a pre-scoping piece is what it's been called, on national research collections, and they've been talking to both our Academy and to the social sciences around what that could look like, what its scope is going to be, and more on that will be coming down the line soon. But that's come also out of the roadmap process. And so it's about, you know, connecting the dots on some of that work as well. Thank you. Any questions? Hello. Thanks. I just wanted to reiterate what Kylie said and that is that the scoping study that the Academy undertook as well as the one that the ARDC undertook, which was the precursor to the HASS and Indigenous RDC, both of those are available on the ARDC website. So, you know, dive in and have a read if that's of interest. Sorry, Kylie. Yeah, very interesting. I had a bit of a newbie question. I see myself as a stem test in social sciences. Who is covered by the humanities? Who makes up your 700 fellows? Whoever self-identifies. No, I mean, you know, theoretically, yeah, the Academies have got some dividing lines around stuff, but our disciplines are history, Asian studies, European languages and cultures, philosophy, religion, linguistics, cultural studies, cultural media, creative arts. Who have I missed? So broadly, I mean, there is some sharing across with the social sciences. So broadly speaking, that's who we elect into the Academy. So the creation of a new section, an Indigenous Studies section, is a way of actually electing an entire cohort. So we've got Indigenous fellows and we've got fellows who, honorary fellows as well, who've been elected into our Academy based on their practice. So we've already got some really brilliant people in that space, but this is supercharging that effort and really bringing it, you know, to the fore. So, yeah, in our work, it's much more blurry, I suppose, in terms of the policy sort of advocacy work.
I mean, you can divide the world up into HASS and STEM, but it's not productive. Thank you. Jen, just want to clarify, there are people that are online viewing today. So I should at least pay them the respect of acknowledging. Yeah, and the other question I have is, if those people have a question to ask, how are you going to orchestrate that? Okay. Thank you. So I apologize for not acknowledging them from the outset, but feel free. Here's me talking to a camera where I can't see anybody, but if they have a question, please feel free to send it through and we'll put it up. Isabel. Relax. Yeah. And now that I know there's also an online audience, I'm even more stressed out. So, thank you. Oh, no, the fonts have gone? Well, doesn't matter. Hello. Thank you, Jenny, for making this space for us here today. I'm really happy to be talking to you all. My name's Isabel, and I'm a senior policy analyst with the Academy of Social Sciences in Australia. And here really I want to hijack my time today to talk to you all about, and invite you to join, a project that we're just starting out, which is a decadal plan, or a 10-year plan, for social science research infrastructure, which I think comes, you know, right up your alley. Quickly about us. Well, Kylie's already, you know, everything she said her Academy does, we try to do for our guys. Probably less. Kylie's such a champion. But what, yeah, in the sector, people usually think academies and think fellows, and our fellows are indeed our strength and the reason for our existence. But we are also, between their fees and the government yearly funds, what that sort of money goes to is to pay this group of people, my family, a bunch of professionals really passionate about research and education. They pay us to do more than just manage those fellows. 
Really, those fellows are there to actually encourage and empower us to represent the sector more broadly and to hopefully do good policy advocacy work for the entire sector, not just those fellows. So that's us. As part of that mission, we worry about infrastructure, which is a worry that we share with you, right. So we're here to talk about that shared worry. The work of the Academy in the research infrastructure space goes back a long way, a lot of that before my time, so do not ask me about that work done before. But I do know that we've been really pumping up the work we put into research infrastructure over the last couple of years, like since 2021, when the ARDC approached our Academy and the other academies to do that piece of work that Kylie was also mentioning before. And this piece was, as Kylie said, an effort to get us academies to, I remember the word was something like activators, right, activators, to think about the problems and the challenges and the needs for the social sciences sector, in our case in terms of infrastructure. So we did that in 2021 and that was then published as a report called that. And then what we did next, last year, was again with the ARDC and a bunch of other partners. We convened a round table in Canberra in April with, like, I think it was 23 or 26 research infrastructure related organizations across the country. And we met then to talk about those issues: so now we know that these issues exist, all these problems, what are we going to do about it? And if I try and channel, summarize the sentiment on that day, what transpired was something like, there's lots of great work, lots of like massive work happening by the infrastructure leads in the social science space, and lots of value to look forward to build together. But also lots of fragmentation. The infrastructure area is highly fragmented and that hurts the sector in several ways. There's lots of work duplication happening. 
There's, you know, groups pulling in different directions instead of pulling together. And one of the most felt, probably, consequences of that fragmentation is that we find it really hard to get our hands on big chunks of increased funding, because it's like lots of little voices asking for different things instead of a unified voice asking for a big chunk of money. So the idea of a decadal plan came out from that meeting as a way forward, as an antidote, so to speak, to counter that fragmentation; a decadal plan as a way of rallying the troops and finding a way forward together. So that's what we're at now, that's what we intend to start now. I'm so late to get this project together. We're starting out now and meant to be finished by September, so this is sort of our chance to do that. Decadal plans have a long tradition in science, particularly as an advocacy tool for academy-type organizations. I've pictured there the two consecutive ones the Australian Academy of Sciences put forward for space science. And then on the other side there is the counterpart for that from the US. And so as advocacy tools, decadal plans are, I'd say, very simple but powerful instruments, advocacy instruments. How simple? Well, basically, a decadal plan, once it's printed and shiny, all it does is it's like a signal that the sector beams out to the outside world saying, these are the things that we want to achieve, and this is why they matter. This is why you should care about those things. That simple. That's that. Now how is that powerful in a context like infrastructure and funding and things? Put yourself for a minute in the shoes of an infrastructure funding agency. And then a team comes up and says, I want funding for this. Now that's a conundrum for the policy investment decision maker. And the other thing is that if they give the money to this group. 
That's a fairness problem. The rest of the teams are going to be mad that we gave the funding to these guys. And we're just government people. How do we know that this is even a good idea, you know, at all? So it's a bit of a, yeah, it's a risk. Now how the decadal plan works out is: whatever the funding agency is, it might be government, might be a bunch of institutions rallying together. And now they say, we want a bigger telescope. But this time they say, and look, there's this printed thing with a sign-off of these X many stakeholders saying that this is actually something the sector needs, this is something the sector is aspiring to. And now, for the decision maker there, the fairness and the effectiveness issues are no longer that much of a thing. Actually it becomes an opportunity, a political opportunity for those funding agencies to say then, oh, actually, if I help these guys, I'm going to be fulfilling the wishes of all the big sector. So, basically, it is you guys doing the hard work out there, building the infrastructures. And I don't think we're going to be able to do that. But what we can do as a community-based policy advocacy organization, or we would like to do in this space, is to place those ambitions, those projects you're thinking about, in that broader story, the broader picture of the needs of a whole sector. And of a, fingers crossed, hopefully real things that we would like to do. I've listed there, like right after we thought out the decadal plan, then our CEO went out looking for funding partners, and we found five very generous organizations willing to fund this piece of work. It is basically funding to have people do the job of, like, you know, going through it. So we now have a team made up of four people. Amanda sitting over there. Yeah. 
And a couple of interns from the University of Queensland, and there's still room for another couple of interns. So if you know anyone, we're looking for PhD interns to get involved in this piece of work. Let me know. And at the bottom there is our steering group. I'm usually not too hyped about steering groups, if I can be candid, but this particular team, you probably know all of them. They've been in this space for a long while. And together, amongst them, they really bring in the expertise and then the passion and the care, you know, for all the different aspects that could make a plan like this work. So that's that. How do we see this plan coming together? So obviously, it would have to address some of those basic components of the infrastructure, like, you know, the challenges or the issues that have been defined previously, which I have listed, oh, sorry, the fonts were not. It should have looked really nice, but anyway. But you know the issues, so we're good. And look, if you see yourselves as leading one of these working groups, please get in touch. Otherwise I will haunt you. Because I will. Yeah, so there's first this sort of setting up of compartments of things that need to be decided on, thought about. And then, again, the fonts, and then how the plan comes together. The way we envisage it is a series, basically, of dialogues where different stakeholder types come together. One is the part that we bring in as the academy: the researchers, the garden-variety type researcher, who should tell us what it is that they want to be able to do by 2033. Just basically in terms of what kind of research questions they want to be able to answer, what capabilities they see themselves having, like, I want to be able to do this. And then the second component would be your crew, the technical, the tech-minded people. 
You guys who have been through the work of doing these infrastructures and know what's involved and all the issues and all the challenges. Then you tell the researcher, the normal researcher, the average researcher, you tell them, sorry, but you tell them, hey, for those things that you want to achieve, the most viable or the best in town now is this. This is where you bring in that expertise. And then the third element in that discussion are those funding agencies or regulators and anyone who's sort of run a national facility before. The people who could tell us, those infrastructures you want to build, how do we deliver them nationally? What are some of the models to deliver these for everyone at a national scale? And all that leading to, yeah, that plan, that signal of consensus out to the sector, beaming out to whoever listens, stating what are those capabilities that we want to have? What are the models and types of infrastructures that the sector supports? And finally, why does it matter for Australia? Why should they invest in it? Now the academy, this is a place where we have stuff to contribute. In 2021, we ran a massive consultation exercise called the State of the Social Sciences 2021. And basically as part of that, the researchers in the sector identified these seven grand societal challenges. Not problems for research, problems for Australians, where social science disciplines would have a critical role, you know, helping tackle those problems. So these are already priorities that the sector has identified. We could use this as a basis to tell that story of relevance for your kind of projects, where the infrastructures that you are intending to build will help social sciences get these outcomes out for Australians. So that's one thing that's sort of already there to build upon. The decadal plan is also the sector being proactive. We've identified some issues, like we wrote a report two years ago. Well, let's do something about it. 
Let's respond to that. Let's make a plan. And it's so interesting looking at the kind of challenges that sort of bear down on the infrastructure space, the infrastructure sub-sector. It's uncanny how very much they resemble, they mirror, the broader challenges in the whole of social science research and education as well. So we're so bad at getting reconciliation up. We're fragmented. One of the big things was we need a more connected sector. Same in infrastructure. Demonstrating value. Why investing in social sciences and all those things. By doing this, we will at the same time be contributing actually to improving the infrastructure. I'm going to skip this one because it's not that relevant, I think, anymore. But probably the one thing to highlight here is the decadal plan should not be seen as a tool to promote particular projects or stakeholders or organizations. This is what the sector needs, at a higher level. It could also be seen as a competitive move from our academy to outmaneuver other disciplines. But not really. And this has been one of the big messages of our steering group looking at how we put this together: the social sciences need to stop seeing themselves as the sad group, taken advantage of and not funded or whatnot, and instead be demonstrating our value, and not competing but collaborating and coming together with the other discipline groups, which is, I guess, our best way to get integration, is the place of social science really. And to conclude, and true to decadal plan fashion, I've told you what we'd like to do and this is why you should care about it. I'd really like all of you guys to participate in this project because, first, we need you, we need you as technical experts. Second, this is a chance for you guys to help shape whatever vision comes out for the future of infrastructure. 
And lastly, and hopefully, if things turn out okay, if we do our jobs well, then you will have that decadal plan in your hands as a tool to place your projects within that bigger picture of, hey, this makes sense, this is worth the funding, because this is actually responding to a real sector need and a real societal need. So that's me. The thing there is a QR code. Please, you know, I've been keen on this, but we've just opened a LinkedIn group to manage our stakeholders for the project. It's currently got two members because we only opened it Monday to be ready for this thing, but please be amongst the founding members of our LinkedIn group, and yeah, honestly, whatever, just email me, text me, whatever. Really looking forward to working with you on the decadal plan. And that was me, thank you. Any questions to Isabel? No, watch out. Any other questions? What was the story about social sciences, are they like being the poor cousins or something in this? Yeah, poor cousins are the same. Yeah, and then the reference to not competing and coming together, and then the genuine reconciliation conversation, what's that all about? What's it all about? That's the question. This might be boring with no question. I'm listening to what you're saying. It's an interesting question. What do you mean, what was the story behind it? I saw it on that slide there, in reconciliation, so you talked about reconciliation of all of the parties here working together, is that what you're saying? No, it was specifically indigenous. Oh, okay. So what can I say about that? There's no possible infrastructure journey that does not involve indigenous peoples? Yeah. So we have, oh, Mark, do you want to jump in? 
Yes, yes please. And what there was was a very strong view that the commitment of the social sciences to truth-seeking and truth-telling has to also acknowledge sort of the way in which the social sciences in various ways have been complicit in the process of colonization, dispossession of Aboriginal peoples, Aboriginal and Torres Strait Islander people in this country, but also the role of the social sciences with First Nations peoples generally. And so part of the kind of truth-telling exercise, I think, is the social sciences coming to terms with our own histories, but also recognizing then what the contribution of indigenous knowledge, indigenous perspectives, has to enrich the social sciences and also provide a model for an exemplary form of truth-seeking and truth-telling. And that has to be, I think, a priority for disciplines in our sectors, and in the future as we go forward it's a way really to reinvigorate the social sciences as well as, yeah, think through where we are implicated in processes of colonialism. And I think it's a critical task. In the research infrastructure space, where it plays in, I think, is around issues of, and again I think it seems to me that these provide a model for thinking not just about data for indigenous peoples but data for all peoples, that the model around indigenous data sovereignty is really a model more broadly, I think, around data sovereignty when we are talking about data for peoples and about peoples. So I think that's the promise and the aspiration. That could almost be a presentation in its own right. Yeah, so thank you for that clarification, because I saw it and I thought, I was trying to work out what it was to do with this group, these groups come together and reconcile, why? But there's a definition for reconciliation I've been searching for long and hard, and it's a biblical one of all things, given the history of, I mentioned lots of other things that went on, but it is probably a good definition. It says that reconciliation is bringing back 
into harmony, unity or agreement what has been alienated. That makes sense. So the quick question is what has been alienated, and what needs to be brought back into unity, harmony, agreement, and what that means for the people. Another thing to do with that is, to engage in this space you need to be trauma informed, because colonial history has caused intergenerational trauma. So everything you try to do with the best intentions, you've got to be able to develop a capability to understand trauma and work through factional politics and issues that are a consequence of colonisation. Your governance framework should also recognise the importance of kinship, leadership and decision-making versus top-down. It should also be focused on healing and bringing people together to restore balance and harmony. I'll probably cut into Robbie's presentation shortly, but he'll reaffirm that. But there's a lot of things that need to be considered that haven't been considered in the reconciliation and the cultural governance conversation, and it's time we get to that point. As of today, this year, we're only 235 years into our existence. Truth telling: from 1788 to 1901, massive genocide. Accept that you didn't cause the problem, you're part of the solution. We've got to get to that point of healing. 1901 to 1967, we finally got counted as citizens for the census, but ongoing genocide. Today, highest incarcerated people in the world, 53 to 60% of kids in care Aboriginal, something's amiss. Is there an orchestrated process that continues? How do we heal through that sort of stuff? That's the reality. Part of that journey as you go forward is to start to sort out who you are as Australian people. You need to heal yourselves. And part of that, as non-Aboriginal people who are born in this country, can you put your hands up for me? You belong here. That's part of what you're going to understand. Through your birth, you belong in this country. 
You need to start to understand what that means in an ancient historical, cultural, environmental context. You have responsibilities, obligations to take care of this land, as we've always taken care of it. How many people are not born in this country? You're very, very welcome in this country. Do you realise that? You're very, very welcome in this country. You have been the only person to ever explicitly tell me that I'm welcome. You're very welcome here. And we'll talk more about that too. But that's what we've got to understand. Australian society as of this year is 235 years into its existence. By skin, by blood, by ancestry, over there, my Waka Waka brother, you've been here for thousands upon thousands of generations. Not years, generations. That's what it means to acknowledge and pay respect to one of the world's oldest continuous cultures. That's you, brother. But as a young man, we need to support and nurture, build you up. I don't speak for you and your people. On your country, I'm physically older than you, but the engagement process has got to be underpinned by cultural integrity, honour, dignity and humility. So on your country, when I go to your country, I've got to see myself as a little boy. I don't know your language, your hurt, your suffering, your pain. I don't know your people. So I'm a little boy. I need to sit, listen, and learn in the first instance. And that will come by and respect. Makes sense? It's not rocket science. It's just basic common sense and decency. So when people, part of the challenge as you progress forward, you're all trying to do the right thing. And I can add a little bit here because it's little lunch shortly. And Isabel and Kylie spoke very well within their time. 
The challenge I put to you as you progress forward, individually and collectively: when we get up and we say, thank you very much, I'd like to start proceedings by acknowledging the traditional custodians of the land on which we're here to gather, pay our respects to their elders past, present, and emerging, I want you to remember me, because I could be sitting in that audience and I'll say, thank you very much for that acknowledgement of country. Can you now tell me and everybody else in this room how you are authentically doing what you just said? Yeah. And I don't want you to think, oh, who's this smart arse? I want you to say, thank you, Grant. That's a great question. What we are doing is one, two, three, four. And we're doing that authentically. That would be genuine reconciliation. So we're about to proceed to little lunch. Isabel, we can add a little bit. Ancestry, where are your ancestors from? Colombia? And we're just plain mixed race, undetermined mix of Hispanic blood. And so we say, Simon, buenos días. We learn to be polite and courteous in as many languages as we can. And we'll talk tomorrow about your Sri Lankan language. That's all part of the process. If we truly value humanity and diversity, social sciences and all that, we'll all get to sit down and reflect on who we are. We're all unique. We're all special. None of us are perfect. And the basic values that have always been in this country, that are global values to humanity: caring, sharing and respect for the land, the people and the environment. What you do is brilliant. But you've got to come together and build kinship, leadership, support. And that's all part of a conversation of bettering governance. Thank you very much. Little lunch is served. We've got Vegemite sandwiches and, I don't know what else. You've got half an hour for little lunch. So what we do, talk to each other and get ready for your next random tasting. Thank you. 
Well, a couple of things just before we progress to the next presenters. Those three people sitting out there, can we get you to come in? Because that's a bit disconcerting that you're sitting out there. I feel like you're alienated. We want you to all be part of the family, if that's okay. Bring your chair, whatever. Now, another important announcement today, Dr. Sandra Phillips. Happy birthday to you. Happy birthday to you. Happy birthday, dear Sandra. Happy birthday to you. Now, how did I obtain that little piece of data? You look like this. Facebook, sister. Facebook. Now, there's another little challenge. Got a little challenge for you. I want you to all get your pens. Here's going to be a couple of emergency nurse acronyms. And you've got to write down AMF, AMF, Y-O-Y-O. That's the first one. And the second one is TF Bundy, B-U-N-D-Y. Whoever gets what those two acronyms are by the end of the day, we've got a prize, haven't we? No, you've got to work out what those acronyms are. Now that everybody's relaxed and ready to roll, you can't work the acronyms out yet. You can leave it for later. That's for big lunch. You do that. I'm going to be paperwork sorted out. If you haven't worked out so far that I've got a bit of a distorted sense of humour, it's important, I think, for people to laugh. You've got to laugh with each other, not at each other. And that's part of the journey. A hundred years from now, no one in this room is going to be here. So enjoy today and the next day and learn new things from each other and learn new things about you. Our next presentation is about the Language Data Commons of Australia, another bloody acronym, and the Australian Text Analytics Platform, ATAP: overall vision and progress. And that's going to be delivered by Michael Haugh, as in Waugh with the W, I heard. And then statistically, we go to Steve Waugh's cricketing stats. That's all data too. That's a separate one. And you'll be co-facilitating with Peter Sefton. 
Now Michael is a professor of linguistics and applied linguistics in the School of Languages and Cultures at the University of Queensland. His research lies at the intersection of linguistics and communication, with a particular focus on the role of language in social interaction. He's leading the Language Data Commons of Australia and the Australian Text Analytics Platform projects and is also co-director of the Language Technology and Data Analytics Lab, LADAL. Lardil are the traditional custodians of Mornington Island. Did you know that? No, I guess. Peter Sefton is an eResearch expert, specializing in software development, research data management and metadata, currently leading the technology and infrastructure team for the Language Research Data Commons project for the University of Queensland. Dr Sefton has been responsible for numerous eResearch initiatives at the University of Technology Sydney and Western Sydney, and ran a software research and development laboratory at the Australian Digital Futures Institute at the University of Southern Queensland. While at USQ, Peter was involved in the development of RUBRIC, CARE, whatever, C-A-I-R-S-S, which I'm sure you'll elaborate on, and is responsible for the widely used open source research data catalogue application, ReDBox. Peter's deeply passionate and involved in the eResearch community, is a contributing author and continues as an editor of the increasingly influential Research Object Crate, RO-Crate, standard, and is a member of the program committee of the eResearch Australasia Conference. Yeah, so Peter, and Michael first, and then Peter. Where is Peter? Oh, that's you, Peter. That's the second place. Let's look in the room, I worked out. Hi, so good morning, everyone. 
I'd like to also acknowledge the traditional owners of the lands on which we're meeting today, and to give thanks to all the contributions that our friends are making as part of this project, the projects that we're working on. So I deliberately didn't put my name and Peter's name on the slide because it's really a massive group effort that's involved here, and it's just my privilege here to be talking on behalf of a really large group, and the others are really talented. I'm just riding on their coattails. So I wanted to introduce two of the things that we're working on here, but first, it's always good to start with a purpose, like why, this is the preaching from Robert's score sheet here, why are we doing this anyway, right? We could be doing a lot of things, especially if you're an academic in the audience. We have to teach, we have to do research, we have to do administration, which we hate. So we have to do a lot of things. Why would we get involved in this stuff here? So we're passionate, and it brings together a lot of people who are passionate about language, and it's trying to get the message out that Australia is not an English-speaking country, right? English is spoken here, but hundreds and hundreds of languages have been spoken here for tens of thousands of years, more than 50,000 years. It's a really long time. It's one of the most linguistically diverse regions in the world, Australia and the Southwest Pacific. So it's a really special place that we're in, and I think sometimes we don't appreciate that enough about this country and where we are. And another reason that we're doing this is actually there's been quite a lot of work, so we're not starting from scratch. There's already been large collections of language data that have been pulled together. There's a lot of work, particularly, I think, a shout out to PARADISEC, which is a real world leader in the space. So working in partnership with PARADISEC. 
There's also growing demand for text analytics capabilities. So if text analysis is not on your mind now, you haven't been reading the news, right? ChatGPT. Apparently, I'm just going to put in my research question and it's going to give me my answer and we're done, right? So it's an interesting time, right? So it's getting to the point where we really do all have to understand these things. It's not just something for computer scientists and people working in NLP to understand. And we've been working together since 2018, I mean, it's much longer than that, but this rendition is from 2018, and we keep on welcoming new people along the way, which is really exciting. What are we doing? So there's a number of projects. So the ones I'm talking about today are the Language Data Commons of Australia project, LDaCA, and the Australian Text Analytics Platform, ATAP, and also just a mention of the Language Technology and Data Analytics Lab, LADAL, right? So three acronyms there to learn. So what's our mission? Actually, this is in some ways the hardest to explain, because when it's those words on the page, it seems a bit dry and not so exciting, but, you know, access, FAIR, CARE, all these things, right? So essentially what we're trying to do is lead in a space and work collaboratively with all sorts of institutions, right? Because we don't hold all the data ourselves, but work with institutions around democratising and decolonising access to data, language data, and that's actually not a simple task. Technology-wise, there's fabulous work being done and PT is going to talk about that, but in terms of the people involved and the histories involved and what Uncle Grant was talking about around a trauma-informed, healing focus, this is all part of it, and trying to figure out the right governance of this, when traditional university governance structures are ill-fitted for what we're trying to do here, actually. 
So, but we're trying to get people to access, the right people accessing the data they should have rights to access. So it's a lot around rights in our area. There's a lot of talk in other areas around sensitive data and keeping the data safe. That's obviously really important, but we can't forget that people have rights, and so part of this is about, you know, addressing those rights to that data. And I guess one of the biggest challenges is trying to find sustainable hosting for this data. So our mission with the National Languages Collection Strategy is to say we need a repository in HASS, right? We really need one that the government commits to, we say 50 years at least, right? That's our demand. So for analytics and training, this is about, there's lots of tools out there. Whenever you talk to a computer scientist, they say you can do that easily. Unfortunately, we don't know how to do it easily. And I'm saying me as part of the group who don't know how to do this, even if it is really simple if you're a computer scientist. So it's about training up people in the area, wherever they want to be on the scale, from just dipping their toe in with sort of really simple off-the-shelf type of things, through to really understanding these processes. So ATAP's primarily an education and training enterprise, because a lot of the tools are really quite well developed, but it's packaging them in a way that's accessible and having training there for people to better understand. And again, who are we addressing here? It's from sort of researchers working in universities and those kind of places through to people working in communities trying to do the research as well, because they can benefit from these tools too. A very simple one would be going from speech to text. It should be really easy. It is easy, but it's not easy for everyone. And once you start getting into particular languages, English is a lot easier than other languages. 
There are tools you can use, but, you know, it's a work in progress. Our collaborators: I said there's a wonderful team. I'm not going to be able to mention everyone, but some of the key ones are our CIs on the project. So we've got Clint Bracknell, Monika Bednarek, Ryan Fraser, myself, Martin Schweinberger, Jane Simpson, Nick Thieberger, Catherine Travis, Bo Williams and Louisa Willoughby. Some of the CIs are in the room. I won't start trying to point you out, but you're here, so thanks for being here and supporting. We've also got a fabulous team of people working across different universities. I can't list them all. The font gets too small on a page, actually, to do that. But just to mention, because this is relevant to the presentations coming up, we have Ben Foley, who's working in the applications and tools area; Robert McLellan, who's our program manager; Simon Musgrave, who's leading in engagement and working with Ben on the training; Sue Plunkett-Cole, who's recently joined us as the project coordinator; and Peter Sefton, who's leading in the data and tech area. And we've come up with a kind of triangle of how we see things working together. Initially, we had four teams that were working separately, but we found that actually wasn't particularly effective, so we're creating this kind of model where we're coming together. We also have a lot of national and international partnerships with various groups. We find when we're trying to do these kinds of things, it's better to work with more specific groups. So we're trying to engage with CLARIN, but CLARIN is a massive thing. It's essentially the research infrastructure for languages across Europe. So we're trying to find specific points into CLARIN, real collaborative points rather than just performative stuff. I did want to mention here that the Language Data Commons is not just about linguistics.
It might seem that way, as a number of the CIs are linguists, but not all of them are linguists, and some of them would say, desperately, you know, don't call me a linguist. So it's actually involving quite a few different disciplines: applied linguistics, which is looking at real world problems that are related to languages, computer science, cultural and oral history, data science, research, indigenous studies, language technology, library and archives studies, and linguistics at the bottom. The point here is we actually have people in all of those areas on our team, whether they're CIs or project team members working with us. So that's a fabulous experience, working with all those people. So, challenges. You've seen the disciplines. It's quite hard communicating with people across disciplines. Sometimes you don't know what the words they're using mean, and even worse, sometimes you use the same word to mean different things. That's really, really challenging, because you get to the end of it and it's, oh, right. I think working across institutions, and I won't say that. Anyway, there's challenges. I was going to make some kind of comment. Communicating across disciplinary boundaries is hard, but that's exciting, because you start to see new ways of thinking. We see this as an innovation project. And I think one of the things we're working with is the complexities of Australia's colonial legacy as well. That is a challenge we're working through in what we're doing, and that's a whole talk in itself. The benefits, which I think is a good point to end on before I hand across to PT, who is going to talk about what we've actually been building: I think building a stronger, more resilient language research ecosystem, trying to overcome the sort of fragmentation that Isabel was talking about.
We're wanting to create real world impact for communities. Connecting communities with language and assisting them where we can on those journeys is a really important thing to be doing. Communicating the importance of languages and cultures at the national scale is really important in a country like Australia, where it's so central to its identity, even if it doesn't realize that yet. And of course, there's also opportunities for scholarly and personal learning and growth, which I think is not something you'd necessarily associate with research infrastructure, right? Because when you talk to people outside of this room, I don't know your experience, but I get the glaze-over quite often when I bring out the term research infrastructure. All right, and then that's it. But it's a really exciting and fulfilling area to be working in. So it's a pleasure to be talking with you about it. And thanks to ARDC and NCRIS for their wonderful support for these projects. So can I pass to PT?
Okay. Hi, everyone. I'm Peter. You can call me PT. So we had a steering committee for this program, LDaCA and ATAP, or kind of a steering committee, yesterday. And Michael, the title of the presentation we gave was updates on technical and social architecture, so I thought that was actually a really nice way of putting this, and I've recycled yesterday's presentation for today. This is a screenshot of the LDaCA homepage. I'm going to come back to this. If you can read it quickly, you can look at that, or you could just go and look it up. So the technical and social dimension is a really interesting way of looking at this. I threw together this table here to talk about our technical architecture. I'm sorry, I'm going to show you some mildly technical diagrams in a second.
So what are the technical and social dimensions of the systems that we're building to do what Michael has been saying, look after language collections? Technically, we've been working on storing and packaging data so we can describe data and keep it in ways that improve the chances of being able to preserve it for a long time, reduce risk and reduce cost. A project that runs for a few years can't preserve things for decades, but we can try to keep things neat and tidy and well described so that they are preservable. That's the technical thing we're doing. And the social dimension of that is actually standardization. So there are a couple of standards I'll mention that are the technical things. I'll describe them when we come to them. And then the next layer up from having captured, that's a kind of science metaphor, like being Gerald Durrell and going out and catching language, but it might be colonial, I think, that one. I'll just pause briefly to acknowledge that some of the language we use turns out to be problematic and you walk into things without realizing it. So I just did that. Indexing and serving data is the next thing. Once we've described things, we want to make them findable so that people can find where things are, and we can use them with machines and humans, and process them and do speech recognition, turning all sorts of different languages into text. And again, there's standardization efforts happening there. We've been working a lot with Nick Thieberger, who's going to talk later on. Wave, Nick. He's not listening. He is. And many others, to bring together a standard for how we describe languages in a way that builds very much on the work that's gone on with PARADISEC and the open language efforts that have gone before, but modernizing those and using the latest technology. So that's the social dimension of that.
And then another thing I'll talk about a bit today is how do we give access to data to the right people? We've heard about FAIR, and I'll talk a little bit more about FAIR later, but the A in FAIR is accessibility and providing access, and FAIR specifically mentions making sure that the access is to the right people. It also works with the CARE principles, and underlying that is an assumption that data sovereignty for any group who have a stake in the data needs to be respected. So we'll talk about that and the work that we've done to take language materials which are not open, in the sense of Creative Commons open, and make sure that they're accessible to the right people, and reduce the friction for people to actually get access to things. So that's the technical thing. The social part of that is we have to work out who's looking after data. Who's the steward or custodian of the data? Who actually owns the rights? And we have to look at the law that we're working with here. Typically the rights we're interested in are copyright. If you can work out who holds the copyright, then that person or persons can grant rights to other people in the form of a license, which is a kind of contract. That's complicated and no one really wants to think about it, but that's the way things operate and that's the framework in which we're working with this stuff. So that's the social dimension of that. Key to all of this is standards, and we're working with a few of these. I'm going to go through the stuff on the right. PIDs in the middle, that's PIDs as in persistent identifiers.
So one of the things we have to think about is our policy for how we identify something, so that if we have to move it, if the institution that's holding it doesn't want to hold it anymore, we can make sure that when we issue an identifier for something, we can find one or more copies of it in the future. There's one on there that you might not have heard of, which is ARCP. It's an archiving and packaging identifier. For people who are interested in this sort of stuff, you can go and look that up. DOIs, digital object identifiers, are widely used, and they come to us from the publishing industry. Okay, so I'm going to quickly run through the standards. RO-Crate, so Research Object Crate. As it said in my bio, I'm an editor on this standard, and RO-Crate is a cross-disciplinary international effort to come up with ways of describing data, any kind of data. So it's omnivorous. That's what this slide is meant to represent. You can use this for language data or science data or cultural data or whatever. And it's about how you bring the data together and describe it as a set, an object of some kind. An RO-Crate has a human-readable and machine-readable document in it that has the basic who, what, where metadata that we're all used to from things like Dublin Core and that you see in library systems and so on: who created it, what parts does it have, when was it made, what's it about, how can it be reused. That last one is the bit about licensing, which I will come back to. It's really important that for every piece of data we work out who owns the rights, and that we then ask them, what are your wishes for people to be able to reuse this data? We capture that in a license document that goes along with the data that gets stored. All RO-Crates must have a license to be compliant. And then there's stuff about funding and so on, which is important in the academic context.
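As an illustration of the who, what, when and license metadata described here, the following is a minimal sketch of building an RO-Crate 1.1 metadata document in Python using only the standard library. It is not the project's own tooling. The function name, the `#collector` identifier, the collection details and the `https://example.org/licences/research-only` license URL are all hypothetical placeholders; only the `@context` value, the `ro-crate-metadata.json` descriptor and the root dataset properties follow the published RO-Crate 1.1 conventions.

```python
import json


def build_crate_metadata(name, description, date_published,
                         creator_name, license_id, license_name):
    """Return a minimal RO-Crate 1.1 metadata document as a plain dict."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            # Descriptor entity: says this file describes the root dataset "./".
            {
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            # Root dataset: the basic who / what / when / how-can-it-be-reused.
            {
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "datePublished": date_published,
                "creator": {"@id": "#collector"},
                "license": {"@id": license_id},
            },
            # Hypothetical local identifier for the person who made the collection.
            {"@id": "#collector", "@type": "Person", "name": creator_name},
            # The license travels with the data as its own described entity.
            {"@id": license_id, "@type": "CreativeWork", "name": license_name},
        ],
    }


if __name__ == "__main__":
    # All values below are made up for illustration.
    crate = build_crate_metadata(
        name="Example conversation recordings",
        description="A placeholder collection used to illustrate the metadata.",
        date_published="2023-01-01",
        creator_name="A. Researcher",
        license_id="https://example.org/licences/research-only",
        license_name="Example research-only licence",
    )
    print(json.dumps(crate, indent=2))
```

The license is modelled as a separate entity that the dataset points to, rather than as a free-text field, so that a repository could look it up by identifier when deciding whether to grant access.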
One thing that RO-Crate does, and the reason we developed this standard, is that previous metadata standards were okay at the who, what, where stuff, but they didn't have a way of actually describing the files. If you had a bunch of files on disk, there was no accepted standard way for you to describe individual things and say, this recording here was made with X video camera in some context by a person. RO-Crate lets you go right down to the file level, or even inside a file, and describe what the different columns inside a CSV file mean, for example. Okay, so the standards we're using on the language data projects are not specific to language data. The first one, we picked up because we were inspired by the PARADISEC work. PARADISEC is 20 years old, and one of the reasons they've been able to sustain the archive of Pacific materials is that they stored it in a simple way. Everything in PARADISEC is stored on disk in a folder with the metadata beside it. That's a really sensible way to do things, but most repository systems historically didn't work that way. They would have a big complicated thing with a database, and once you put your data in there it's difficult to get it back out. PARADISEC is architected in such a way that you can take pieces of it to a particular community in a fairly straightforward way, just by copying things. Layered on top of that we have RO-Crate, which is a way of describing individual objects, and that's work which is ongoing. The RO-Crate standard is really lively. RO-Crate is used a lot by bioinformaticians, so my co-editor at Manchester University, Stian Soiland-Reyes, is heavily involved with ELIXIR and major projects in the European scene. The good thing about that is that they have a lot more money than us, and we get a lot of resources by using a common standard with the scientists.
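The idea of a folder on disk with the metadata beside it, described down to the individual files, can be sketched roughly as follows. This is an assumption-laden illustration, not PARADISEC's or the project's actual layout or code: the function `describe_folder`, the file names, the license URL and the `file_notes` argument are hypothetical, and the sketch assumes simple file names that need no URL encoding. It writes an RO-Crate 1.1 style `ro-crate-metadata.json` next to the data files and lists each file as a part of the dataset.

```python
import json
import tempfile
from pathlib import Path


def describe_folder(folder, name, description, date_published,
                    license_id, license_name, file_notes):
    """Write ro-crate-metadata.json beside the data files already in `folder`.

    `file_notes` maps a file name to extra properties for that file,
    for example {"session1.wav": {"encodingFormat": "audio/x-wav"}}.
    """
    folder = Path(folder)
    file_entities = []
    for path in sorted(folder.iterdir()):
        # Skip subfolders and the metadata file itself.
        if not path.is_file() or path.name == "ro-crate-metadata.json":
            continue
        entity = {"@id": path.name, "@type": "File"}
        entity.update(file_notes.get(path.name, {}))
        file_entities.append(entity)

    graph = [
        {
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            "@id": "./",
            "@type": "Dataset",
            "name": name,
            "description": description,
            "datePublished": date_published,
            "license": {"@id": license_id},
            # Each file in the folder is listed as a part of the dataset.
            "hasPart": [{"@id": e["@id"]} for e in file_entities],
        },
        {"@id": license_id, "@type": "CreativeWork", "name": license_name},
    ] + file_entities

    document = {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": graph}
    metadata_path = folder / "ro-crate-metadata.json"
    metadata_path.write_text(json.dumps(document, indent=2), encoding="utf-8")
    return metadata_path


if __name__ == "__main__":
    # Demonstration in a throwaway directory with made-up files.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "session1.wav").write_bytes(b"")
        (Path(tmp) / "session1.csv").write_text("speaker,utterance\n", encoding="utf-8")
        written = describe_folder(
            tmp,
            name="Example session",
            description="Placeholder recording and transcript.",
            date_published="2023-01-01",
            license_id="https://example.org/licences/research-only",
            license_name="Example research-only licence",
            file_notes={"session1.wav": {"encodingFormat": "audio/x-wav"}},
        )
        print(written.read_text(encoding="utf-8"))
```

Because the description is an ordinary file in the same folder, copying the folder copies the data and its metadata together, which is the property the talk attributes to storing things simply on disk.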
What we've been doing in this framework in LDaCA over the last 18 months is a lot of housekeeping. As Michael said, there have been existing projects to collect data. PARADISEC is in good shape, so we've been talking to them about standardising for interoperability. But there are other previous projects, like Alveo, which was one I was involved in and was very similar to LDaCA. It collected a lot of data and has run out of funding. It's been looked after by volunteer labour and it had reasonable metadata, but we've spent some time improving that and bringing in some of the components, like the Australian National Corpus. So what we've been doing, and I hope this diagram makes sense, is we've had programmers, more than one, writing scripts, and we take the data, which is in various stages of disrepair, and we marshal it all into this standard and store it in the OCFL repository, which means we put it on disk with a good description beside it, with good metadata. Okay, so now we're coming to something which I think should be of interest to lots of people in this sector, and it's really key. We're going to hear, I think, from our colleagues in social sciences about doing this in a completely different way. But one of the things about language data is that it's created by people, typically, although machines can do it too, as Michael noted, and we're going to see a lot of that. When we have people involved in creating things, those people have rights, and for various reasons we can't always make language data open. Where we can, that's great, and if the communities who have been involved in collecting data want it to be available, then we've got things like Creative Commons licenses and we can make stuff available. But in the typical university-based study where you bring people into a room and you sit them down and you video record them, or if you go out into communities, you'll be operating under ethics agreements, which vary a lot between institutions and are
confidential, so we can't actually survey them particularly well. Those ethics agreements will have data management things in them that say how this data is allowed to be reused. Participants sign up and say, I will allow my data to be used by other researchers, or, I only allow my data to be used for linguistic study and that's it. The problem with that is we have no way of knowing who's a researcher, really, or who's a linguist or whatever. So once you've collected this data, we need to try and make it accessible so it can be reused, but only by the right people. The model we've come up with here, as I mentioned before, is that when we're storing things, we're storing everything with a license, and the license will set out the terms for reusing data: this data is allowed to be used in research contexts, or in non-commercial contexts, or whatever. But we need a way to grant people access to data. So the model that we have implemented, and this is running, is that we have authorization and authentication systems. We need to identify people, and we can do that using a system that we're getting via the Australian Access Federation called CILogon. What that lets people do is log on with their university credentials. Probably a lot of people in this room will have an AAF, Australian Access Federation, account, so you can log in through your university. But if you don't have one of those accounts, you can log in using Google, Facebook, GitHub and a variety of others, so there's a bunch of different social media things that you can use as well, and ORCID, which is the identifier for researchers, so someone can log in with an ORCID. And we have a license management system, which is a piece of Finnish open source software, not finished, it's from Finland, that again comes from the bioinformatics community, I think, and that lets us manage licenses. So we can publish a license and then
we can have this interaction where somebody comes to a repository. I won't try and go through this line by line. These things can all be published, so you can come back and have a look. Someone comes to a repository, the repository checks the license, and from there we can go off and check whether the person who is trying to access the data, or maybe it's a machine with an API key, has access to this data. If not, we can send them off to the license server. In some cases they can just click and agree to a license, and in other cases they will have to apply, so they can fill out a form, identify themselves, provide some evidence of why they should be allowed to have access to something, and then they may or may not be granted the license. Once that's done, the repository can give them access or not give them access. This may sound really silly and obvious, and why am I standing here telling you this stuff? The important part is that we don't really have standards for this, and most systems don't work in this way. People typically just code something into the application. But by doing it this way with our architecture, we can actually have a distributed system where data is sitting in multiple different systems, people can come into any one of them, and we can centralize, or at least concentrate, the license management. And if we need to move to a new license management system, we can take all the license agreements and move them some way. So this is a screenshot of some existing work in progress, where this Griffith corpus of Australian English requires authentication. You'll see something on the screen that says something like, you don't have permission, so go and ask for a license, and then you agree to a terms of use. It's really simple. And then the repository can actually serve
the data. So this little diagram here is showing the repository saying, oh look, I've got this person, yes, they do have this license, so I'm going to actually give them the data. I'm not going to talk about that one. So I'm just going to summarize the work that we've done on LDaCA, and then, we started a bit late, so I think we should have time for some questions. This is what we said on the homepage for LDaCA, and this is sort of what Michael was introducing. So where are we up to after 18 months? We were going to develop a comprehensive language data access policy framework. We've done a lot in that space. We have the distributed license access control stuff I just mentioned working, and that's working pretty well for research contexts. So we know that where we have university-based studies, the investigators on those programs can go and find the ethics agreements, they can craft licenses, and we can give access. What we haven't done is try this in community contexts. We haven't tested this with indigenous language groups collecting stuff. I would like to do that, and with my colleagues we want to reach out and find some projects where we can test this, because this may not work in that context. We don't know. We do know that it works for the stuff that we've done it with. Developing shared technical infrastructure and standards: I mean, standards are never finished. We have an ongoing collaboration on language metadata that Nick Thieberger and I chair, and that will keep going, and our shared technical infrastructure is all open access and available. We're aiming to have a sustainable long-term repository for existing language data, and we've built the tools for that, but at the moment, as Michael said, the government doesn't give us a place to put stuff. There is no kind of national library for this. We can't just expect to show up at our local library and give it to someone. But we do have partners, so the various partners in LDaCA are going to look after certain kinds of data that they have an interest in, and not just stuff created there. But there is still a gap there in the long-term sustainability of data, and some data doesn't have a good home, so where would that go? And we're building a portal, or multiple portals, for accessing data. We have a toolkit which is based on these standards, so it is actually very fast now: if you've described the data well, you can fire up a website to access it. The website can do the access control we talked about and can almost be considered disposable, because the important thing is having well-described data. So we have that. What we haven't done, because we've been focusing on doing a lot of data migration work from existing datasets, is get into the harvesting. Harvesting from Trove at the National Library has been difficult with the level of engagement that we've had, and there's resources elsewhere, like AIATSIS, so that's on the agenda. We'll be looking at that in the next six months and in subsequent phases. I think that's our summary of where we're up to. And I'm sorry I've described myself as an expert in my bio. I will go and change that. So do you want to share questions, or challenge anyone to try and question anything? Are there any questions? Here we go.
It's a curly question
about ethics, I suppose. Historical data collections of all kinds were collected for different purposes, and under our current ethics regime we would say, as ethicists, that they can only be used for those purposes. So a lot of the material that you're drawing in would be automatically closed unless the original givers of the information were able to say, yeah, you can use it for that purpose. How do you deal with that?
Yeah, so that's a really good question. We just have to do what we can. A couple of points to make there. For contemporary people, for early career researchers and people doing PhDs and so on, we need to give them good advice on how to structure their ethics applications so that data doesn't get locked up. I've talked to some linguists who say it's better to exclude people who don't want their stuff more broadly broadcast, because otherwise you collect it and nobody can ever use it. If you have stuff that's been collected historically, under previous ethics or before ethics were invented in the 1980s, then you can go back and revisit. We have one collection where the community is small enough that people are actually going back to the original participants in studies and asking them again, is it okay if we use this in a new context? Another thing, which Simon, who is about to talk, has been doing, is taking things at his university back through the ethics process to say, this was the previous agreement, is it okay? So you can go back and redo things. But if you can't do that, or you can't establish that it's okay to share, we don't want to lose the stuff, so you have to put it up with a license that says we'll deal with this on an ad hoc basis. So it is a problem, and we need to structure things in the
future so that it's very clear to people participating in things that they have some control over it. Another thing with that is, even if people are willing to share, we have things like the GDPR, which I can't unpack that acronym because I don't know what it is, but under the GDPR, if we have a European citizen who has data held in an Australian institution and they want it withdrawn, we have to withdraw it. So even if they've agreed to participate in your study, and they are or subsequently become a European citizen, they can ask for it to be taken out again. So there are good reasons not to just share human-created data, as much as we might like to, under Creative Commons licenses, because once you've done that you can't get it back, and you may actually have legal requirements to do that. So it's complicated.
Yeah, I have a question from online from Wendy Faden. It's a question for Michael, actually, and she asks if there are examples of the complexities of Australia's colonial legacy that you mentioned.
Yeah, I'm not the right person to speak to that straight away. I do think Clint's really good at it. Robert, do you want to have a go?
The example I would give, and I'm just channelling an example from Clint Bracknell, okay, is that in the past researchers, whether linguists or anthropologists, have gone out and collected data from communities, and they're the legal owner of that data. That drives communities crazy, because it's them, but it's this person who's passed away and their descendants who are deciding what happens. So that's a really good example of complexity, right? Because the law is that whoever made the recording, whoever collected the data, is actually the copyright owner, which doesn't make sense to most people. It doesn't pass the common sense test. So that would be one example. I don't know, Robert, you've got that.
As I was just going to say, in the context of language revitalization, where you're working with vulnerable communities and their languages are in a vulnerable state, that legal process in turn disenfranchises those communities and those mob further, particularly if it's a speaker who holds valuable information and that community then doesn't get to make decisions about it, or even access it, for that information to be passed on. But depending on what information was shared, there are further complexities around who in that community should be hearing it, and there's a sort of proofing process that needs to occur first before it goes on to the broader community. So there's quite a complex cultural process on that side of things, but on the western side we've still got that as quite a big prohibitor, because that's the idea.
And we'll talk a little bit more about that in our next one too, so please feel free to keep that coming. Hang on, Kylie, you've got to wait for... yeah.
It was just a follow-up on GDPR, and it was really just, I guess, a point of clarification. Do you have to demonstrate that you're GDPR compliant to be able to
hold the data in the first place, I was wondering? And also, is there any particular GDPR training that you need to do, or other people need to do, in order to interact with data that's covered under GDPR? Because certainly in stuff that I've done with European colleagues, both of those things have applied.
UQ makes us do some stuff about data privacy, but it doesn't really apply to this. So we probably should have that, but we don't currently have processes in place for teaching people about this stuff. Partly, I used to work more in an IT department, so I'm aware of it, and the awareness in universities is going to be mostly in IT departments, who mostly worry about student records. But it does apply in research and we probably don't deal with it well enough. It's probably a university-level issue, Mark, right? It's not project specific.
Thanks. Do we have time? It's all right, you said I could. A question around, what's the dream with that harvesting of data? So the AIATSIS catalogue and Trove you mentioned. If there's some stuff there that you could expand upon, it would be good to hear.
Michael's the one with the dream and I'm the one who tells him how hard it is.
I'm after what the computer scientists tell me is a pink unicorn. It's really hard to find these things and do them. So the idea is that, say, in AIATSIS, which has led this and been the trailblazer, there are AUSTLANG codes, which are used to identify language materials across all sorts of libraries, archives, museums and so on. That process has started at the National Library, it's been a bit crowdsourced, and then other institutions are sort of independently moving on it. But what we'd like to see happen, and the sector come together on, is to focus on, let's start identifying all the treasures that are in the collections, because there are
and they're in places which you wouldn't expect. A good example would be the Queensland State Archives. They've started this process of looking for actual language materials. You wouldn't think that's the place to look for language materials: state archives, government communications and so on. Why are you going to find language stuff there? But actually there's quite a bit. Des Crump, an industry fellow with us, has been doing marvelous work identifying these treasures. And the idea with identifying is that, for access, the first step is you have to actually know where it is, right? You can't go through a process of culturally appropriate and legally compliant access without actually identifying the stuff. So the long-term project is to start identifying things. We have the standards, the AUSTLANG codes, and we're working together with partners to start doing that work. You need expertise, though. You have to know which language you're talking about to apply the code, so that's not simple. And then you've got the chance to have an overall portal where you can go, I want to find this language material, I want to know where it is, right? One of the challenges is, say it's a Queensland language, and there's stuff down in the Friar Library in Sydney. I mean, who would think to look there, right? You can't sit there looking through every single catalogue in the country, and if they're not AUSTLANG coded for actual language content, not material about it or about peoples or country, but actual language samples, how can you find the stuff? So that's the idea, and that's really important to the revitalization project, because if you don't have materials to work with, it's really hard to do that really valuable work.
Hi, I've got another question from online, from Jelina Haynes. Would ethics policies apply the same ethics
protocols to cultural stories and communal practices?
So it depends where things were collected. If you're doing anything in a university that involves humans or animals, you have to go through an ethics process. But outside of that, you won't have that documentation in place. Obviously ethics weren't invented until 1980 or 1990, or whenever universities started worrying about it, so you don't have the formal framework in place for the collections that come from other places. You have very little to work with in working out what to do.
Yes. Well, I congratulate you two for generating the most questions so far. In terms of MVP, you're it so far. Just on that, a couple of things in response to that first question from the floor and from the online audience. We need to sort of precede things like psychology and sociology with the word cultural, because it's a different story psychologically. You've also got to be able to somehow astral travel, and astral travel back to beyond 1960 and beyond 1700, and put yourself there and visualize this conversation around data, because it changes every 10 years. So you've got to be able to adjust to the sociological change via technology and accessibility and all that sort of stuff. By way of example, we as an elder generation, and I'm not Uncle, I'm brother to you, Michael, by the way, but at my age, I was born in 1960, so when I was born, for seven years of my life I was not counted as a citizen and was not allowed to speak language, practise culture, conduct ceremony. The referendum came in, and effectively at 11 years, in '71, I became a citizen counted in the census. The access to knowledge that these young people have today, including my nephew here, Robert, we never had access to that. So when you go back to Tindale and the early researchers, people like my generation sort of think, well, hang on, how come you young
fellas, they're talking about this? We don't know about that. So there's always going to be that problem where the younger generation have got to be nurtured and built up, and we've got to support them to do that. That doesn't happen all over the place, as you'd appreciate, because some of our older people are a lot stubborn and they want to look at the cultural veracity of things, and the communication of their stories was delivered in song, ceremony, art, dance. So there's a big gap sociologically, psychologically, in our own circle. So they're parts of a conversation you're going to have as well. But you need to astral travel back in time, go back to the industrial revolution, put yourself there and ask yourself the same question you're asking today and try and work it out. Good way to think. Should a hippie come or turn a gypsy start. Yeah, okay. So you've taken up more time than you needed, but that's okay, happy to. We'll add time in around big lunch. So the next two presenters are going to talk to the LDACA integration and use cases working with indigenous communities. Robert McClellan is a Gooreng Gooreng man, program manager of indigenous language and industry fellow at the University of Queensland. In addition to working with ARDC on the language data commons, Robert has taken a keen interest in actively engaging in the community, upholding indigenous interest and ensuring Aboriginal voices are heard. Robert's mother is one of the most beautiful, generous, caring, kind, compassionate people that I know. Her nickname is Scrubby, Athena Tina Scrubby, but I know very well, she's my sister and this is my nephew. So the age thing is about people like me nurturing and supporting the knowledge and keeping that humility and dignity and integrity and honor at the forefront of the conversations. He's going to co-present with another handsome gentleman by the name of Simon Musgrave, who is my David Attenborough man. Simon's research is mainly focused on the intersection of linguistics
and digital humanities. His PhD thesis looked at aspects of the syntax of Indonesian, Bahasa Indonesia, from the viewpoint of Lexical Functional Grammar. His recent projects include an investigation of knowledge of endangered languages among the Sudanese community in Melbourne, and projects looking at issues in medical communication in intercultural settings. So, Robert and Simon. In terms of technical stuff, this is the first time I have seen a gooseneck lectern mic plugged into a belt pack transmitter and used in this way, so very creative. Perhaps it's not as innovative as I thought. All right. So greetings to you all. My name is Robert. I'm a Gooreng Gooreng man. I'm also acknowledging my matrilineal uncle in the room, and the significance that holds for me culturally. I have also been able to identify around the room, and I'd like to acknowledge, my countrymen who form part of that central Queensland and southeast Queensland area. So to my countrymen I extend a warm one young Ngara and Yura. I think too my country, just in respect to positionality as we do, my country is encompassing those two indomitable western impositions that you might know of as Bundaberg and Gladstone. So we start south of Bundaberg in the Burrum, north to just above Gladstone, reaching out west to almost as far as about Three Moon or Rawbelle, before cutting in before Childers in the southwest. But myself and countrymen are a long way from home here today, so in doing so I must acknowledge Nginaigangalan Balbarm, acknowledging that we are all together standing on country. We come in good will, and yesterday, today and tomorrow this land belongs to those traditional owners representing the many clans and groups that represent the Kulin nation. And a message there to stay strong on your land, you mob, to those TOs of this country. I do want to hop up too because I think this is most appropriate that Simon introduce himself, and we'll do a bit of
bouncing back and forward. I'm standing next to Robert, I feel old, stale and pale. My heritage is, I was born in the United Kingdom. My heritage is a mixture of English, Yorkshire, and my mother was born in Scotland but came from a Dutch Jewish family ultimately. So like most people, or many people, in Australia, I'm the mongrel, and I feel welcome here too. All good. All right, I'm mindful of time, so we're just going to, I guess, in having this discussion today, it's good to have, and you know this is what I hate, passing on, giving our data to other platforms to display our data, and they bugger up our formatting and all the rest of it. I should have just given PDFs. But anyway, there's many, many concepts that we have been going through in terms of the LDACA process. I'm limited to about three or four things that I'm going to run you through. That's notwithstanding the valuable information that we have in terms of the cultural conversations taking place, with having Uncle Grant facilitate these discussions today. But I do only have just a few things I want to brief you on from an LDACA perspective of more recent, but please don't let that stop you from asking those critical questions, which we will open up to, and hopefully we get more questions than before. So we're just looking at, in terms of framing and reframing that environment that we're doing this in, as a preface to our conversation today: let's not assume that large white institutions such as universities, who in a conventional way of thinking purport to hold a monopoly of knowledge, that's certainly not the case in the area in which we're working. And let's not assume that, just because of their size, they hold great capacity to do effective engagement, and particularly indigenous engagement. Certainly not the case. And even as black fellas working within these large
institutions, large white institutions, we cannot purport, and I think we know better than to purport, to be experts when it comes to this. So that's part of that journey, and very clearly that we put those processes in place. And the second last point in terms of this preface: I want to challenge us to become more comfortable with being vulnerable enough to put ourselves in a position to say things like, we don't exactly know what the solution is, however, we're confident in knowing what the steps are to achieve that. So that's a pretty big thing that I have tried to shape in terms of LDACA, positioning us in that spot, because I think then we are in a comfortable position to learn. All right. And too, and I just did say before, but in terms of our community sense of agency, and that's one of those words, you know, that multi-disciplinary use of language. We came in and we're using the term communities. Well, to me, communities meant one thing and one thing only, and then in the LDACA sense it meant quite a few different things. So there's an example of that. And in terms of our community sense of agency, it's not simple. No simple solution. It's quite complex, and no, there will not be one solution in terms of ethics and those other areas that we're focusing on that will satisfy the 250 different language groups and up to 700 dialects and variations of those languages. Black fella can't be what black fella can't see. And so the disparity between indigenous and non-indigenous representational leadership, it emphasizes over time. Succeeding with cultural honor, integrity and humility in a conflicted world remains a significant challenge for First Nations peoples living in post-colonial Australia, and the under-representation of First Nations leadership positions across multiple sectors impedes equitability for those First Nations communities. I share this slide with you, and
I do want to share too, as I'm sure you are all very aware: historically, early explorers, linguists, anthropologists and other researchers have developed a reputation for collecting materials and data in ways that are widely considered unethical in a modern day context. By indigenous communities there are increasing levels of mistrust within large institutions, both indigenous and non-indigenous unfortunately, to handle our data in a considerate and culturally responsive manner. The materials that we are working with, particularly with indigenous collections, are sensitive, and we must not neglect our responsibilities to see these materials cared for in a culturally responsible manner. We also must not discount the intergenerational and transgenerational impacts of colonization upon Australian First Nations peoples, and we must see that we are in fact working towards empowering first language holders, practitioners and communities, and that our project, most importantly, does not unconsciously cause further disempowerment; that we're not working in a way where we think we're doing good but we are in actual fact causing damage. So being mindful of that. And therefore we must aim to disconcert the existing research governance models to enable community voices to be included, and to embed First Nations perspectives within the representational leadership structures of the LDACA project. And I enjoy calling myself the disconcerter, or the provoker, within LDACA. So I'm just sharing the first one with you, which was the reciprocal governance method that we were aware of at the, well, not the very beginning, but for as long as I've been brought into the project, which centres around good governance practices in perpetuity. We have transparency, accountability and, oh look, I've got my notes here, that's good, accountability and participation. Those are standard governance practices, western way. Also this model looked at centering, you see it centered around
purpose. Now, if you're familiar with, if not go and look at it further, because I won't do it justice in the two seconds here, but the Simon Sinek model, the why, talks about the three circles. Ultimately it says, as I say I'm going to butcher it, but off the top of my head it looks at the three circles. It says that most organisations know what it is that they're doing. Other organisations, slightly better, even know how they do it. But very few organisations actually know why, and that remains a problem. So why, or purpose, that is why we centre those around that. And I know Michael did talk to that a little bit before. So there is a model. It didn't really say in those bios, they're a bit out of date, sorry, but my experience, and what I guess I bring to that LDACA project, is predominantly around governance, and that's my background, as well as indigenous language revitalization. So again, the revitalization aspect is quite prominent, and on its own in terms of language more broadly. When we look at linguistics and those other things, revitalization has very specific needs, because as I say you're working with vulnerable languages and vulnerable language communities. Completely different aspect there. So that's one of those models. I will go to the next one. The next step for us was to look at the standard levels of engagement. This is a small paradigm that looks at standard levels, so you have inform and consult, later you have involve and collaborate. For a very long time all levels of government were quite happy in the inform and consult space. What this one looks at is taking that leap over to a more involved way of working with people, if it's participatory action research or other, or even in project design. There is another aspect in the literature that's been added to this model since then. I don't use it yet, because baby steps, baby steps. And we also know, I was yet to say the LDACA team, I wouldn't say the word team yet, but we
are getting there and growing collectively. So that's the gist of that. When we take it and we bend it into this matrix, you'll see we've got our higher levels of organizational impact, but that also comes with high levels of complexity. So my challenge to the collective of people under the LDACA banner is that we are lifting our gaze and making sure that what we are doing is satisfying that top quadrant of engagement. And again, that's notwithstanding the fact that engagement is quite complex on its own. I want to introduce this other part for you. Simon will touch on some other things around trust and community. We have talked about veracity and cultural veracity of data in a way already as a collective, so I'm going to move to this slide now. I am quite confident in sharing this regularly at the places that I go, because it's something that we actually do embody and try and involve in the work that we're doing. However, I'm not so confident today, because it was done by Sara, G. and Williams, I believe, is the quote that I use, but we've got Sara, G. with us today. So rather than me give the brief spread over it, are you happy to come up and just give a rundown of this model? You want me to give? Okay, all right, well, I'll just use your own words, that's the easiest thing, isn't it? When it comes to indigenous governance, the western governance lens often fails to accept and/or appreciate that indigenous peoples of Australia managed to govern their day to day lives effectively for millennia while exhibiting the complete array of human intellect and also respecting the rights of their neighbours across the continent. That particular bit touches on another body of work called autonomous regard, which Professor Mary Graham has done, which is a whole other area of indigenous governance which is fascinating. I'd recommend you go and look at that also. So, the rights of their neighbours across the continent to manage their societies
with their own specific style of indigenous governance based on their lore, law and customs. So that was Sara and Williams, 2021. Williams, of course, one of my old grandfathers, cultural ways, and has his apologies for today, couldn't actually come down for this event but was intending to. Sara and Williams both argued that the western governance lens does not adequately understand the full impact that colonisation has had on the indigenous human, physical and sacred worlds. They also articulated the critical need for trauma informed policy responses to be developed, noting that western governance processes fail to acknowledge this. So to give this a go: we have three worlds at play at any given time in a cultural context. You've got the human world, the sacred world and the physical world, and you see the three things. So to give you a bit of an understanding of it, imagine what would happen if, say for example, we prioritised our human world over our physical world. What would happen in terms of the context of maintaining balance and harmony? Or perhaps are we already doing that? And we look around us today, I think that's the big melting elephant in the room. The same can be said for the third world, the other one there, in terms of sacred world too, you know, looking at caring for country, our law, our ways of being, all of those lessons that we learn. What happens if one of those gets prioritised over the other? We start to fall into upset and disharmony as well. So that concept, which can be spoken to at length for at least a whole day, is one of the concepts that has shaped our thinking in terms of how do we prioritise and how do we see that we're not neglecting areas. Now, is there something else you'd like to add to that before I move on? We'll come back to it in the questions. So from there we've taken this LDACA structural trilemma, as we put it. You would have seen in Michael's presentation there was a triangle represented, effectively said data, tools and
training. Also, the relationship between the two, we argue, is a trilemma, because we cannot prioritise one over the other. There simply is no, when we talk about trilemmas we talk about the need for trade-offs. That's not the case in this one. If we do trade off in one area, I think that really does impact the integrity of the whole project. So that's shaping our decision making process. But then one of the things that was pretty important, it's centered around something, and Simon raised this: all of this affects people. So that was an important consideration, that these are decisions about people that we're making, that not only are we making but ultimately those communities will need to make also. With that I'm going to hand over. As I said, that was just maybe four quick models, that's not the extent of it, but I'll hand over to Simon and you can take it from here. So I'm going to be speaking now about some fairly specific aspects of engagement and the stage of the project that we're in. I'm going to be picking up an idea that Jenny introduced in her introduction: build it and people will come. Yeah, we want them to come, we want them to stay. How are we going to work at getting that to happen? And that's to do with effective communication. But I'm also going to loop back to some of the ideas that Rob's just introduced as we get to the end of this presentation. So all of us working in these projects, we know that one thing that is something to grapple with is, we start out, we don't have anything to sell. We don't have much that we can take to people and say, look how fantastic we are. So in that first stage of our work, what we've been focusing on in terms of communication is to try and make people aware of us, make sure that they know we exist, make sure that the kind of aims that Michael has spoken about as our broad aims, that they know that that's what we're trying to do. And then occasionally we've been able to give them some examples of where we've started
particularly in the tools area, we've been able to do workshops and stuff. We're now getting to the point where we have more usable outputs. So in Petey's presentation he flashed up a view of a screen from the portal. We've got various instantiations of that portal which access various kinds of data. We've had people working with it. We can start telling people about this. So now we've got to communicate with people. We want to not just make them aware that we exist, we want to make them aware that we have tools, and we want to help them to use those tools. So this is the point we're at in our communicative strategy. So what I'm going to talk about, at least briefly, here is three aspects of how we devise the communicative strategy: what audiences are we trying to reach, what information are we trying to give to them and what information is appropriate to which audiences, and what are good means of communication which will link audiences to that information. Most of what I'm going to talk about here is to do with LDACA. I'm briefly now going to mention ATAP, the text analytics platform. It's much less complex, I think, so I'll just quickly tell you why I think it's not so complex, and that's because we're essentially looking at one target audience, which is people who are interested in using tools for text analysis. Yeah, there's internal differentiation, some people may have more skills, some people may have more confidence, but they share this factor: they want to use the tools. And we're pretty confident that we know a way of communicating with those people that works, and that's basically using workshops. We'll put information online as well, but we go out, we interact with people, we show them how to use tools. And I couldn't resist putting a bit of a brag in here. When we assembled our reporting at the end of the year, we found that in the second half of last year we had over 400 people participate in workshops. So this is a model that I think we can say is working okay. We hope to move on into online
training materials, but that's an extension of what we've been doing already. So that's why I think ATAP is relatively easy. Why then is LDACA more difficult? Well, we've got at least two key audiences. We've got the people who are looking for data, who I'll talk about as data users, and we've got people who are responsible for data, who I'll talk about as data stewards. Of course one person can be both of those things at different times, but we're talking about them here as different audiences. I'm not going to worry about some other more peripheral cases. For both of these groups I think we have to start from the assumption that trust is fundamental to communicating and getting them to engage with us effectively. In terms of people who are seeking data, they need to trust that the data we provide will be FAIR and it will be good quality. In terms of the people who have data that they're responsible for, they need to trust that we will manage the data responsibly and in line with CARE where that's appropriate. I think, aside from anything else, the issues that Rob's already raised about governance play into this. Maybe these data users and data stewards are not going to really care that much about the internal minutiae of how the project is governed, but they need to have the sense that there are sound principles underlying what's going on, that this is an organization that knows what it's doing and makes decisions about what it's doing in ethical and responsible ways. Just quickly, the kind of information that these different groups might be looking for. Data users: you've seen the kind of portals we're building. They say, well, how do I get started? I open this thing up, where do I begin? They start looking at stuff, they might come up against a metadata term on a screen, they go, well, never seen that before, what does it mean? They want to access some data, how do they download it? If they can't access the data, how do they get access to it? Pete's told you about the systems we're
developing for doing that, but we have to really communicate to people that they can go through those processes. So these are the kind of issues that the data users might have. Data stewards: quite a basic question, which we've been asked by quite a lot of people now, is why should I give you my data, or place my data with you, in the first place? What benefits am I going to get from that? Again, access control is something people are concerned about. How's that going to be managed? And then what is actually the process if I think, yep, I'd like to have some data that's placed with the data commons? How's that going to happen? How do we work through that? So these are questions that they might have. So that's audiences, that's the kind of information we probably need to transmit. What about the means, the medium that we use? Most of what we're going to do is going to be in writing, okay, that's more or less inevitable, but there are different possibilities even with written communication. Of course we can go for different delivery methods: web pages, downloadable documents. We're developing a GitBook for some of our information at the moment, which personally I'm a big fan of. I say go for GitBooks if you want somewhere to put lots of information in a nice form. We also have to make assumptions about how much background knowledge, how much shared knowledge, have we got with the people we're communicating with. Are they absolute beginners? Are they experts? How does that influence how we transmit information and what medium we want to use and how we write? We're also looking at some more interactive possibilities. I already mentioned online tutorials. Screencasts are really great for doing a lot of stuff. And there's also possibilities like kind of virtual tool tours. There's a reference here to the virtual language laboratory which is run by CLARIN, which Michael mentioned earlier. It's actually really cool, have a look at it. You go into their main portal running this tool and little
boxes pop up and guide you through the process and give you information about what's going on at each stage. So these are the kinds of possibilities we have. And even trying to move beyond just text based stuff, we're thinking about using infographics for some of our communication. We've got one that we're working on at the moment, and this was actually one of our ways of responding to this question that I already mentioned people have put to us: why should I place my data with you, what are the benefits? So we're trying to develop an infographic which has graphics of course, maybe some animation, some text, but which gives some graspable information responding to that question that people can get out very easily. Just to think a little bit more about how we might divide up labor, I wanted to talk a little bit more about using a portal. As I said, one of the kinds of information we may need to transmit is, how do you use this? And already there are different possibilities within a single medium, the web interface. So there are things like tool tips, little buttons you can click on and an information box pops up, tells you something. It can give you a link to further information or it can just give you a few words. We'll have help pages. At the moment we're thinking of these fairly minimally, that they're going to be almost like a quick start guide that comes in the box with your new tech toy: this is enough to get you going, and also tells you where you can go for more information. We'll have some FAQs. And then, as I said, lots of these things will involve links to much more detailed information. So I mentioned we're developing a GitBook that covers our metadata vocabulary, which is quite a lot of information. You don't want all that coming up on a single web page, but you can be referred to it and get the detail. I mentioned also interactive guide possibilities such as CLARIN's, I don't know whether we'll get that far, and online tutorials.
So there's lots of different possibilities for giving the same kind of information in different ways, possibly to different audiences. So for effective communication we need to match these things well. We need to match the audience and the information and the medium. And I hope it's clear from what I've already talked through that this is going to involve redundancy. We're going to be putting the same information in different packages for different audiences, but we need to do that and we need to be aware of how we're doing that. To some extent this is going to mean things like choosing a tone and a style of writing, and I'm sure we're all familiar with this. You know, we write differently if we're writing a journal paper, if we're writing a blog post, we do things differently. But sometimes it's going to mean choosing different forms of delivery, to see that they match well with an audience. Now all of this, I would say, to some extent depends on our imaginations. When I sit down to write for a particular audience, I have to imagine, if they're not the same person as me, and obviously they're not, I'm writing for someone else, I have to have some idea of who I'm addressing. And this process can fail. As you already heard, we had a steering committee meeting yesterday, and I went through some of this material, and some of it was in reference to particular documents that we're preparing. And there was one document which I thought we'd done quite a reasonable job of making suitable for a general audience, and the steering committee told me, no, your imagination has failed in this case. Which was really great to know, because now we can try and make it better. But if my imagination fails at that stage, then there's this huge question that comes up: how is it going to cope with the challenge of communicating with indigenous communities, which is going to be part of our mission? As I said, I'm old and stale and pale. I've had some fantastic experiences over the last year
working with colleagues like Rob, with partner organizations like First Languages Australia and Bo and his team, and dare I say it, with my brother Grant. But I'm not there, I can't do it, and this is going to be a real challenge for us. We've got people who can assist us. As Rob said, we at least know the steps we want to take, but we're aware that it's a very, very big challenge. I'll stop there. Stale and pale, don't even say that yourself, just say readily in the daytime and in the nighttime, that's it. Any questions? On that last comment you made, Simon, about you're not there: I think I talked about it the last time we caught up. If you want to go and get your motor car fixed, who are you going to go and see? The mechanic, with certain qualifications, so you can ensure quality standards and all of that sort of stuff. If you want to get your house fixed, or you want to get a house done, you're going to go and see the architect, the builder, with accredited experience. It's all about quality. You've got to have quality assurances, which is a governance conversation. You don't go to the carpenter who tinkers with his motor car on the weekend to fix your car, and you don't go to the mechanic to fix your house. So in the context of indigenous engagement, research, data, who are you going to go and talk to? Indigenous people on the ground in their own country, and respect the diversity. So why are you there, your purpose, how, and what is it going to look like, and what governance framework do you have to guide you in that process, is important. Now I'm mindful that we're talking about indigenous engagement, First Nations engagement, Aboriginal, Torres Strait Islander, all those introduced terms to define 400 different tribal groups as one. We're not one. So brother, what's your thoughts on engagement with indigenous community? Speak up loud. I guess my thoughts on indigenous communities is that, you know, I'm not the expert in their lives, that I don't
know. They walk their journey. So walking into that, you know, their family, I don't know nothing, so their story is shared with me through their eyes and through their shoes. So yeah. Steve Corporal, which way? I have known one black hole in the room. I was thinking about that, you know, mate. Sometimes it's how you deliver it, because it all depends. If I'm sitting at my local shops, just sitting there with a whole bunch of dollars in our area case rich and low socioeconomic area, you know, it's getting a bit flash these days, but I love people come up and they're, hey, hey gang, brother. And then I go to university, where I work at uni and finished the PhD, and suddenly you have to talk in this other language. So it's quite interesting where you go sometimes and how you relate to people. And I'm just the same Aboriginal person, I'm not different, you know, but how people talk to me in different settings is quite interesting. And so I don't tell them that I've got five degrees. I think five. Hey, I didn't ask for Billy Big Note, I just wanted to know. Okay, but it's an interesting point, because sometimes when people ask me things I have to really go down, sort of thinking, oh man, I've got to talk real tribal on this, you know. And researchers come to see us in community and they'll dress down, they'll this, that and everything else, and it's such a stereotype, it's nearly racist sometimes, you know, like they think that. And the other thing is, I sit down with a lot of community people there, we talk about a lot of big things in community that no one else is going to talk to them about, because they come in looking really flash, or they dress down and come to talk to them, and they're going, oh, this mob here again from the university or from the department or something, and so they'll talk crap to them. Well, anyhow, I'm just telling you. So when I sit there with mob, I just sit in there yarning up and
we'll hear all sorts of stuff. And sometimes when I do research, people know you from working 30 years, probably 30 years plus, in the community, and they know you, so they don't tell you crap. And anyhow, that's what I'm getting at. Yeah, so as an Aboriginal person I laugh about it sometimes when I see something written about mob, but I'm thinking, man, that mob just told these people crap. And I don't know if you remember Margaret Mead, but anyhow, that sort of stuff. Can you see young sister there beside you? Yeah, you've gone a long way around not answering my question, but what I did pick up was that: just be you, just naturally engage with people. What do you reckon, sister? Thank you. Yeah, I guess in a shorter way, I spend a lot of time with Steve here at work, and that is about being authentic in yourself and what you're doing, and really taking time to get to know the people that you're wanting to engage with. You need to have that as your first priority. Yep. Brother Fox over there from Waka Waka, which way? What are you thinking? And again, this is just an example that it's not one Aboriginal person speaking for all Aboriginal people. It's important to be respectful and hear what other people are going to say. Yeah, I guess my approach is to always just try to be present, and realise that whatever moment I have in time, I'm never going to be able to repeat that again. So just actually taking it in for what it is, actually being present and absorbing that in itself, and actually being led with, like, humility. So yep. And sister behind you. Like, with the community work that I've done in the past, I always just go in there just trying to get to know the community and being respectful and getting on their level. Yeah, just be. As I said, the word, when I talk about act with honour, cultural honour and integrity, and behave with cultural dignity and humility: act with cultural honour
and integrity, and behave with cultural dignity and humility. Integrity and honour. You've got to have the right head and the right heart. Don't go in there talking gammon bullshit, trying to really big-note yourself. Don't stand up in front of an audience and say, 'I'm Grant Sara, I'm a very important person, and I've come here to help you Aborigine peoples because I'm the most intelligent Aborigine in the whole of Australia.' Straight away, I don't have to say anything. If I can read the audience, which you should do in an Aboriginal setting, I know they're thinking, this one's gammon. Straight away. If people start thinking you're gammon, you've got a problem. So just be you. You're there for a good cause. You've got to basically know why, how and what it is you're there for, with honour, integrity, humility and dignity underpinning your approach. And you've got to set about establishing, building, nurturing and sustaining a relationship, so that when you've done it, you can walk away and value the lessons that you learned from people on the ground. Catch them on pitch, brah. See, I might be all about all that modern lingo, brah. Okay, now, any questions outside of the conversation we had?
Thank you. Kind of expanding, I guess, into thinking about, say, the policy and government sectors and industry, and sort of connecting a bit back to the why: sometimes with these tools that we create, it's not even clear how you might service somebody's end goal, whether that's to deliver a policy or work in an industry or community sector or whatever. So I'm just wondering how you're thinking about pitching and doing workshops or training to show how these LDaCA tools might be able to be used by government, private sector and community sector.
In terms of just jumping on that first bit of the question, and if I don't answer it fully, just pull me up: that first bit around defining that purpose and why. I think in terms of what we're doing, or even if
you are doing a research project and you're using LDaCA infrastructure or other infrastructure, building on that conversation we just had around the room, it's incumbent upon you as that project driver to be an effective translator of what the process is. And if you don't know how to translate that, build a mechanism, find the one who can do that for you to get the job done. If we're looking at ethics applications and those sorts of agreements, if you are going and sitting with community, sitting listening as small boy, small girl, taking in what those things are, there's a translation process that's going to have to occur to articulate their needs over onto the other side of doing. So you've got this both-ways thing at play, right? And that's your responsibility, not necessarily to do it exactly, because I don't expect Western researchers to have expertise in translating, but find the group, or find the one, who can. Some communities have been working with academics for a long time and they're quite skilled: they can communicate the purpose and they'll tell you exactly what it is they want. Other communities who haven't had those experiences aren't so well developed at communicating what they want. But that is why, I guess, that reciprocal governance model was quite an imperative cornerstone, and I guess also a centrepiece, for our governance structure. How do we do that translation process if we don't actually know what it is that we want to do? Similarly, how do we then go and be familiar with the process from community, for them to articulate their why and what they want to get out of it, in terms of shared, mutual agreement, benefit sharing, that kind of business there too? Does that touch on that a little bit?
I just wanted to say too, in light of that conversation before, talking about community: you can see on those old tapes where they get those old fellows, and they're getting sick of that anthropologist, you know. And if you've
ever listened to those, if anyone has done work with Indigenous language data and you've listened to some of those tapes, it'll be like, because you know all these Australian people, they all talked the same back in the 40s and 50s: 'I'm going to tell you one word and I want you to repeat that three times in your language.' And then they'll try, and they'll be doing that process, and then, I guess as a Blackfella listening to that recording, you can pinpoint the moment in time on that tape where there has been a misunderstanding between the two parties. He has pushed and pushed and pushed, and the informant is talking about something else. In one of them, he said 'koala, koala bear', and the woman was saying, 'Oh, koala bear, my daddy said about this', and then she started giving all of this information about that and where that comes from. And as I said, it may not be appropriate that we have Indigenous language dictionaries, but rather Indigenous language encyclopedias, because every single language word that we reclaim and start to work with doesn't have one simple definition. It's got a whole story. And that word she said, for example, gula, that's not just common to us, that's all up that area. That's very significant. There are so many significant words that all encompass that area and connect us together. But that didn't get to be talked about in that recording, because that's not what he was after. He said, 'No, no', and he cut them off: 'No, just say it again.' It was very much like that. So perhaps we had an opportunity, and in some cases we've lost it, to get more valuable information if we had just sat and listened. But we know that as Blackfella researchers. I guess you get my point in that one.
Thank you for that. That LDaCA project looks really, really interesting. I think both groups have discussed ethics and responsibility, and I think Peter was
talking about the potential for going back to those pre-ethics data collection stages and asking to open up data that has clauses around it. I wonder if part of that responsibility also comes back to the opposite of that. We know that a lot of data has been captured without good ethics, and we know that there's been malpractice in the collection of data historically. Is there a responsibility to also look at what open data needs to be protected in ways that it's not currently?
Just briefly on that: right this moment we have a cohort of 16 mob from around Queensland who've come together at UQ over today and tomorrow to go through the archives and the Fryer Library and start to unpack a lot of that data material. We have our steering committee, Indigenous led, who raised this concern and stressed it very much: what do you do in terms of what data they do access, and what happens when they find that it has been collected in ways that are unethical, and they look at their great-grandmother described as 'black gin' or whatever, sitting on the river beating the possum skins, in those very colonial terms, in that sort of context? What do we do as facilitators of that experience in a university? What do we do to see that it doesn't bring up more trauma when they leave? For me, I've been looking at those documents for a long time, and it was very shocking the first time to see those. But now, as Black researchers, we know the material we're working with. For ones coming into that space, they don't know. So we need to see that we do a bit of due diligence around supporting that: having those conversations with them, creating a culturally safe environment to sit and say, 'Hey look, these are some of the materials that you will come across', talking with other Indigenous researchers who can share that
experience with us so they know, taking the time out when they need to, and also dealing with that in a trauma-informed way, to go and get what they need out of that data. And hopefully then that can be flipped in a way that that experience was good. The second part of that question: we talk about FAIR a lot, and I hope there's a point in time that we're going to move to CARE principles, because quite simply not all data can be accessible, or should be accessible. I've got stuff on my computer there that's probably not appropriate. It's very clearly men's business. So the thought of that being open access is something that really concerns me, because it talks about that process for around Central Queensland and all that, and it should concern all of our mobs around that region as well, not only the men and the women of that place. Rightly so, there's women's business papers around. So, and I talked a bit before about that vetting process that we need, we have a responsibility, if we access that, to find the right people to vet that first and work out if community are going to work through this and make decisions. We don't want other community to see what they shouldn't see. Yeah, in terms of that too. But we've got to get to there.
Go sit down, you two. I know, it's no show of favouritism to my nephew, but I apologise for the fact that we were over time there. Jenny and I can save a little bit of time because I don't have to reread Jenny and Peter's bios, and we can go a little bit into big lunch time as well. But as you two come up, you can come up together. We'll do something while they're coming up. I've put a little purple dot under one of your chairs, so whoever's got the purple dot, have a look under your chair. Purple dot under your chair. Get up and have a look underneath, and whoever's got it, I might have something for you. It's a purple one. Have a look. Come on over, Peter
and Jenny. Find it? Not there? Okay. Where's Jen? All right, it's not over there. And that group over there, you have a look. All right. Now Jenny and Peter are going to take us through to the big lunch and talk through the HASS Community Data Lab. The purple dot didn't appear because there wasn't a purple dot under the chair at all, but you've all had a lovely little stretch, which is all that's important. Yeah, I'll hand it to Jenny.
Okay, so Peter and I are going to do a bit of a double act here on the HASS Community Data Lab project. You remember earlier I spoke about the Trove researcher platform, which was one of the four identified activities that the Department of Education chose to invest in back when the HASS and Indigenous Research Data Commons started. We've been in negotiation with the National Library since the inception of the program, and I'll talk through the process that we've been through. Back in March 22, the ARDC board, on advice from the HASS and Indigenous advisory panel, recommended that the ARDC undertake consultations in relation to researcher requirements for the Trove research platform. We appointed an independent advisor to that, Emeritus Professor Joanne Tompkins from UQ, who led that process. We also put out an EOI for membership of a researcher panel to advise the process and to really represent a broader research community, so that we could be sure that we were capturing the needs of a broad range of disciplines in that process. And Joanne and I set about designing a consultation and reporting process with approval from the National Library.
Then in May we actually appointed that researcher panel, and that did have broad representation from historians and Indigenous languages, and the projects within the HASS RDC were represented. The community consultation roundtable events got underway. We held two of those events and they were open to anybody. They were well attended; I think we had about 120 people at each of those events. In between times, Joanne held meetings with the researcher panel to make sure that she was correctly capturing the needs of the research community. We released a draft report and once again socialised that and gathered feedback from the broader community. Then in June that report was released, and again that's another document that's easily accessible from the ARDC website. It's a public document, so you're free to go and have a read of that. And we continued our discussions with the National Library. We made a decision that there would be two separate but connected projects, one led by the National Library of Australia and the other led by the ARDC. At the moment we are in the project planning phase, and Shubra, my colleague, is here, who's been giving me a lot of help with that, and we have appointed Peter Sefton to help us with the systems architecture for that as well.
There is a contract currently with the National Library. A meeting to progress that is happening tomorrow, so we're hoping that that will all go through. And yes, the Community Data Lab will be led by the ARDC. So the Trove consultation report, I have to put my glasses on there, identified a number of desirable developments, and they fall broadly into the following categories. There was a desire for improved searching; improved OCR; more availability of information regarding the content and the functionality of Trove; integration with external platforms and resources, for example, and I'm sure many of you know of this and have used it, the GLAM Workbench; improvement to Trove APIs; and a community data lab that could pull in existing research infrastructure elements and tools for use with the Trove APIs. So with our two separate projects, the National Library, provided we can get the contract finalised, will do some work on improving their APIs, but they'll also do some work on improving the availability of information regarding content and functionality specific to researchers. The other element of work, which is what Pete is going to talk about, is the Community Data Lab, and that is the project that's being led by the ARDC. Obviously we've got a fairly short time frame to deliver something. We're aiming for June 2023, and Petey will talk about how and what we're going to deliver in that very short time frame. But we're also looking at a phased approach, so we'll try and complete phase one by June 2023, but we've got plans to extend the Community Data Lab beyond that. And my vision for the Community Data Lab is that in the longer term we'll be able to allow it to access data using other cultural institution APIs and other project APIs, and start to explore some tools that we can't make available just yet. So I might hand over to Petey.
Me again, and I know it's lunchtime, so I'll go quickly through some thinking behind this Community Data Lab. First of all, this
is a diagram that I put together with my colleague Marco La Rosa, just to try and frame the conversation about research data management. People often talk about research data management life cycles, which is a terrible metaphor, and they often talk about a very long cycle for research data, where you have an idea and then you get a grant and then you do some work and then right at the end you archive stuff, and then you go around and do it again, which is not how things really happen and not how they should happen. So we tried to put together this thing where we split the world into the place where you do the work, workspaces, where you have computers and you do analytics and so on. But the important thing is to have this continual cycle: whenever you produce results, whenever you get data, whether you observe it or you elicit it from a person, you put it somewhere safe so that it can be managed, so that you can get to it again. And we'll talk about the FAIR principles, but they talk about findability and accessibility, and that actually implies the longer term. So I got to do this work over Christmas and New Year while everybody else was on holidays. This project from the ARDC has necessarily had to work in quite a pragmatic way. I didn't make some of these decisions; we're just trying to tie it together. We have had to work with what's available, with people who are known to the ARDC who can get stuff done quickly, and luckily there's a good community of people we can work with. So we're going to do some things which are examples and ways of doing things that we'll scale out later. This is the ARDC skills matrix, and I've built off this in terms of looking at where we place what we're doing. It's an infrastructure project, but skills is a really important part of that. That triangle that Robert was talking about exists everywhere in this work. So this is my
this is sort of my characterisation of where we fit for this project within the ARDC skills matrix. Researchers, the people we're targeting with a community data lab, are going to operate with data, and they're going to work in a HASS Community Data Lab environment, and they'll be using data infrastructure and so on. But there's a couple of other really important roles. If we're going to have data and we're going to have FAIR, which I'm getting to, we have to have somebody to look after it. So there are roles for data librarians and metadata people and so on, who are data custodians. In many institutions that's in a library, and sometimes it's in an eResearch group or an IT group who have to do this stuff. And then the other really important part: the research software engineers, the people who build this stuff, are charged with implementing FAIR. The FAIR principles, which do actually include the A part, mean giving people access to data, and that's the appropriate people. So just to pick up on something Robert said earlier, FAIR does encompass saying this is only for men, or this is only for women, or this is for nobody because it's too sensitive. The A part implies that you have repositories, you have all this infrastructure where we have described who can see something, we know how long we're meant to keep it, sometimes we should dispose of things, and we have services for finding things. We can't just go to people and say, 'Go and be FAIR'. You can't take a random bunch of researchers or anyone and say, 'Oh, can you just be FAIR, please?', because you actually need infrastructure to do that, right? You need to have this idea of having places where we can keep things and manage them, and we don't have that infrastructure in all the disciplines across universities and research and the GLAM sector. So this is just a really quick snapshot; you can just look at the bold bits. We took a principled
approach to this architecture, pulling this together, making sure that we are looking for sustainability. We don't want projects, I won't name them, but we've all heard of lots of projects which either don't get sustained beyond the end of the project, they just disappear, or they persist but are completely useless. That's why I'm not going to name them. But we want to make sure that we are developing code that is reusable in lots of different contexts. If we develop a tool, it wants to be available to data scientists and programmers, and for people who are less adept at that sort of thing in a notebook, and maybe later on, if it's really successful, in a point-and-click environment. So we worked with some principles. And here's some more of my architecture diagrams. These are all colour coded to match that thing I showed you at the beginning. The key part of this for me, the thing that ties all this together, is that we know the approach taken by something like the GLAM Workbench, which is looked after by Tim Sherratt, is popular, persists and is really useful. That's a set of tools that people can use, presented partly as a set of notebooks, which are kind of a mixture of narrative and code, where people can go in and do various processes, often talking to Trove using the APIs, the programming interfaces, on Trove. That model has worked really well. In fact, that's how ATAP is structured, the Australian Text Analytics Platform. So we're going to take that model, which we know works, and this is quite simple and pedestrian. It's just using standard public infrastructure. These things are often hosted in places like GitHub, where you can put together a book. Simon was talking about GitBook; there's other ways of doing it, where you put together a book that tells people how to do things, and they can click buttons and go off and do stuff, and that talks to the services that run the notebooks and the rest of it. And it's talking to
Trove as a repository, which is managed by the NLA, but we will grow that to talk to other repositories in the future. So this is a set of screenshots from ATAP, which are basically just building on what Tim built with the GLAM Workbench. The part over on the top right of the screen shows one of our portals. The portal, which is a website that you can go to, has data in it. In this case we're looking at the COOEE corpus, which is a historical corpus of Australian English going back to the 1800s, and that is associated with a notebook, which is some code that knows how to work with that data. The two things are stored together so you can discover them. If you're looking at COOEE, there's something that goes alongside it. There's one now, but there'll be more later. So you can click a button to look at that notebook, and there's a bit of code here showing where you get a recipe for how you can take this COOEE notebook and actually work with it and do stuff with it, which is the essence of the approach that's in the GLAM Workbench, and of the thing that Tim will be developing for this data lab, which is the Trove Data Guide. And just to emphasise this point about the access control I was talking about before: if the collection you want to work with is sensitive in some way, it might be only for people who are bona fide researchers or members of a cultural group who are allowed to see that data, then we have access control, which means that can work in the same environment. You have to do that licence dance thing that I showed last time, where there's a lot of backwards and forwards, where you can apply for a licence and then someone can grant it to you. So if we want to find out whether someone is okay to look at a resource of any kind, you find an authority who can approve them, and then they can
apply to that authority to say, 'Can I look at this data?' And that potentially could work in lots of different contexts. This slide is just to show that Trove is kind of complicated, in that there's a repository part, the green bit, all the repository stuff in my diagrams is green, which is what the National Library holds, but there's also stuff that they've harvested from other places. Part of the job of this Community Data Lab is helping people to understand that complexity. This is the most complex diagram I'm going to show you, and this one is to show you the other part. So we've got the Trove Data Guide, but there's a couple of other parts that have been chosen kind of opportunistically, but they're representative of research applications, which are being worked on by Systemik Solutions, and Ian from Systemik is here. Now these two are different; they're actually quite good examples. One of them is an application called the Intelligent Archive, which is not a great name for a piece of software because it doesn't tell you what it does. It does stylometric analysis. So if you see popular press articles about working out whether Shakespeare wrote Shakespeare by profiling the content of various plays and things, it does that stuff. It's been around for a long time, and is one of those applications which has a demonstrated research purpose, but it may be difficult to maintain over time. It's migrated from being a desktop application to being a server application, and so that's a good example of a piece of legacy software, we might say. And the other one is the Image Annotation Workbench, which is a brand new application, which is standards-based, and which is going to open up people being able to annotate images. And this is really valuable in lots of contexts. People want to do this in historical
contexts, or if you're studying art. It's also used heavily in science, medical images and so on, and the same protocols are used right across all research sectors. But we've got a particular piece here about how you sustain something like that. We have a company who've been engaged to do this work for the duration of the project, which is why I've put in here 'hosted fixed-term services', because the contract will be for X time. Now, there's no suggestion that we make these things go away after that, but we really need to focus, in all of our work with these kinds of systems, on actually recognising that if you set something up, you have to keep worrying about how you're going to host it, who's going to pay for it, and what's the model for making sure that that continues. And this will be tied into more stuff about repositories at the bottom. We are going to, as part of this Community Data Lab, look at making these things work with both Trove and ATAP resources, so we can start to treat this as a big integration project that ties together Trove, ATAP and various tools across the whole sector. So you start to get what the promise of a data commons is: you should be able to have tools and data sets that interchange. This will be a great place for us to start proving that. We intend to at least have some proof-of-concept stuff of that by June. Just one comment about labs. 'Lab' is obviously a metaphor in this context. You might think of a lab as a place where you go and there's various machinery, and you can do things to samples or something. Obviously that's not the case in this place, but labs are more than just the machinery. Labs have protocols and processes and so on that people have to follow. And so I think an important part of the lab metaphor that we have to think about is that traditional lab science would involve
doing things to animals or making chemicals or something, but part of the lab is also the notebooks and writing things down and the processes for publishing and so on. So I think we need to remember that when we're talking about something like a community data lab, a lot of it's about protocols. And it's nice to think about some of these applications as kind of virtual instruments. If we're observing things, we could think about some of our code as being like telescopes and microscopes; they're just working in a different way. I tried to be quick, so I'll stop now.
Any questions that you have for Peter or for Jenny? Okay, so then we'll break for lunch. Big lunch. Still come back at two, if that's okay. Thank you very much. Thank you, Peter and Jenny. If you're here earlier, you're welcome to come back in as and when you wish.
Now, the next conversation is about the Gazetteer of Historical Australian Places, which will be co-facilitated by Dr Ian McCrabb and Bill Pascoe. Let me go through Ian's. Dr Ian McCrabb is the founder and managing director of both Systemik Solutions and the, oh my God, Prakaś Foundation. Yep. Ian has an MA in, you've tested me out here today, Sanskrit and Buddhist Studies from USyd, and his PhD dissertation continued his focus on digital methodologies for the analysis of donative inscriptions and the characterisation of ritual practices and the religious significance of relic establishments in Gandhara. Ian is an analyst, designer and project manager on the READ project, designer of the READ Workbench and the text-based methodology, and a project manager and design consultant on a range of collaborative corpus development projects. Bill Pascoe. A bill, on the other hand, is what turns up in your letterbox. No.
Bill Pascoe, there's no bio for Bill, but Bill's the systems architect for TLCMap, among other things. So, over to you.
So good afternoon. Thanks for the opportunity to present. Others have been far more eloquent already, but I would like to acknowledge the traditional custodians of the land on which this event is taking place, and pay my respects to elders past and present. And just as an addition, I am actually quite humbled by Grant's contributions this morning. I really am. So first of all, full disclosure: as Grant mentioned, my research background is as a philologist working with Sanskritic inscriptions. I'm neither a historian nor an archaeologist, nor a geographer or a mapping specialist. My alter ego in this context is as a digital project manager. Bill, who will speak in a few minutes, is the system architect, and he will have forgotten more about mapping than I even know. So my job really is just to walk you through the project and a brief overview of the platform, and then defer to Bill, and you can ask him all your gnarly questions. I just want to shout out to Hugh Craig, who's probably on Zoom. Hugh Craig from the University of Newcastle is the principal of the platform. Okay, so just a very brief overview of GHAP. That's the infrastructure project we've been engaged in, and I'll walk through some of the main features. GHAP makes available aggregated data on all place names in Australia, based on data from the Australian National Placenames Survey. The coordinates of more than two-thirds of the 330,000 or so ANPS place names have been cleaned up, and GHAP provides a user-friendly search and filter interface and an API. So GHAP really has two main data sets. As I said, the first one is the ANPS data set, which is aggregated from state and federal records and from some other sources. If you like, that's the official record of place names. The other important aspect of the
data is user contributions, so information about places that has been contributed by researchers and by communities. As far as research affordances are concerned, the objectives of GHAP are: to enhance understanding and appreciation of the meaning of place in Australia, or indeed places important to Australians even if they're overseas; to crowdsource historical, Indigenous and other place names not already in the ANPS gazetteer; to crowdsource attestations, or historical instances and mentions of place names; to associate places with their many meanings; to provide links to source information and other data sets; to provide access to this information with search and filter user interfaces, web services and visualisations; and to make sure we're compatible with other spatiotemporal platforms and systems. So just to ground that a little, the sort of things you can do include research on historical events, such as a history of activism or a history of resistance. You could do research on the meaning of places and place names, or traditional and historical journey routes. You can look at celebrations of sporting achievements, and music tours and events. And you can do comparison of layers of information, for example comparing the timing of missions with massacres and with railways. So the project we're engaged in, the GHAP infrastructure project, has been supported by the ARDC HASS RDC and Indigenous capability program. It was previously supported by ARC LIEF funding as a component of the TLCMap platform, which I'm sure most of you have heard of or are familiar with. The funding was through the University of Newcastle, and they also provided in-kind support, and they subcontracted our company, Systemik, to actually do the project. So the ARDC were interested in bringing the gazetteer element of TLCMap to the fore, and hence the idea of a standalone, directly accessible GHAP. The idea was that this could then serve as a curated
list of place names with geolocations, with an API and website access tailored to the needs of the different strands of the HASS and Indigenous RDC. So ARDC wanted GHAP to be long-lived, robust and regularly saved in portable and sustainable formats, hence infrastructure in the full sense. GHAP also provides a pathway for HASS and Indigenous RDC users and other users to the other affordances of TLCMap. So GHAP is an on-ramp to using some of the other research affordances of TLCMap: there's a set of public user-contributed layers, which can be combined in multilayers, and the resulting maps and visualisations, as well as some spatio-temporal metrics. So you move from GHAP into TLCMap. So we were engaged to undertake the development as a managed project. My role was as a program manager, so while I've got a good overview of the project, my command of the detail can be very shallow. We've been around for 30 years. Our focus is on building big content platforms for the public sector and open source DH platforms with academia. You can have a look at our website. Our specialist expertise is in building research workbenches and in sophisticated CMS implementations for research publications. So the project itself: we took a rather conventional approach, which is we've undertaken a requirements consultation with research and infrastructure stakeholders, we went through a process of refactoring the existing TLCMap codebase, and then the establishment of GHAP as a standalone infrastructure. So the platform's in beta release and we do encourage you all to try it out. The phases to be delivered over the next few months include augmentation of the platform with an additional foundational data set, so that's going to be the national composite gazetteer, and also the integration with the proposed Community Data Lab ATAP infrastructure, just to support reproducibility and sustainability of the data sets. So just harking back to the
last presentation before lunch, the development's been undertaken to a pattern articulated by Peter in the technical architecture of the Community Data Lab, which provides for domain-specific research workbenches, perhaps accessible as a SaaS platform with a supporting methods and standards site, and also accessible via API or via a virtual desktop. This pattern is one we've implemented across a range of platforms, the DH ones in particular: READ Workbench, which is my Sanskrit area; we've worked on ia but workbench, I think, which was alluded to; intelligent arkbott; kybe workbench; image annotation workbench, which was also mentioned; a thing called nitronode; and now also GHAP. So just before I hand over to Bill, who'll give you a walkthrough of the platform and some case studies, just to familiarise yourself with it: the URL is up there, please have at it, just try it yourselves. Just as the briefest introduction, there's a main menu, which you can see, which is how you get around GHAP and how you access the workflows and standards, so there's a companion website with all the methodology, workflows and standards there. There's a user menu, which is the green bar immediately below the main menu, and once you're logged in you'll get access to things that you need for your account; for example, you can build your own layers and they are private to you. The main thing is the search section: simply type in your search there. You can see there's an advanced search, and Bill will go through that shortly. The advanced search lets you upload files with lists of names to search for; it also lets you drag and identify areas on the map to search, and all of those at the same time. So what I might do is cut to the chase and let Bill take over.
Right, thanks Ian, thanks Grant, and thanks everyone for having me here today. Yeah, so it's not much
time, but I'll try and give a very quick rundown of how the gazetteer works. I'm just going to read that, and then talk a bit about the Indigenous involvement and aspects of the project. So, okay, we're right in the middle of the upgrade, so that's what it will look like in maybe a couple of weeks. At the moment, though, sorry, the gazetteer itself will look like this. So basically it is a way both to contribute information about cultural places in Australia and to search and find information about cultural places in Australia. So any person, a member of the public, might just search for something and get a long list of results. For example, I just search for places... wait, no, that's not showing up right. Just close it down. All right, there we are. Okay, that mustn't have made any sense at all. Okay, this is what it looks like now. So I could search for places starting with "coup" and get a long list of 3983 places starting with the word "coup", with information about where they came from. So most of these are from the Australian National Placenames Survey, which is an aggregate data set of state gazetteers. And you can view this in various ways: there's a basic 3D view, you can view it in clusters, or for some types of information that might be about a journey, you can see it as a journey, and other ones. So just quickly, it looks like that. That's quite a lot, obviously, but you can zoom in and you can click on any one of those dots and get the details. So that is just the simple search. One of the things I like most about it is that, let's say I was just interested in, well, what information is there about this place, this area? I can see what there is both in the official place names but also in what anybody else has added, so you can also see that on the map. Okay, there's quite a lot of stuff there. It turns out there's a waterhole. Okay, so you could also limit your search to the official
gazetteer or not, and if you just select layers you can see only what people have added. So I'll just do that again. Or I could draw a circle around an area like that. So let's see, there's a railway station. I'm not sure where all these are coming from, but you can link back to the layer that that particular point is from. So you can search any number of layers that have been put in, in this direction, and then link back to the layer in that direction. You can add all of these options together and just combine them in any way you want, so you could search for any place name starting with "coup" within that circle and that sort of thing. Now, the other important thing, of course, is that anybody can add information. It's maybe aimed at researchers, but community could add information, like if people wanted to put the right place names there or something like that. It is very much public facing, though, so you wouldn't put any secret information up there; it's only for things you want to broadcast to the world. I won't go through the motions of how you add layers at the moment because it would be quite a bit of work. So that's the Gazetteer of Historical Australian Places, which is part of TLCMap, which means Time Layered Cultural Map, so time is quite an important part of this, being able to see change over time. So I guess now I'll talk about the Indigenous involvement in this. So the projects to develop this software have employed Ayn Usher, who's a Wiradjuri software developer, and Dan Price is Derek Gamilaroi as a research assistant. And we've always tried to make... if we're mapping culture in Australia, of course that's got to start with Indigenous, Aboriginal and Torres Strait Islander, First Nations information, from the beginning and ongoing. And our sort of motto from the beginning, as Ian was just saying to me, was: no infrastructure without a project, no project without
infrastructure. So we made sure that the projects which were driving the software that we developed, and its functionality and all that, were either led by Indigenous people or were collaborations. And I always, just personally as system architect, try to learn as much as I can about traditional mapping technology and make that sort of drive design decisions. And the next grant, that we're just about to work on in the coming years, has got dedicated funding for consultation and that sort of stuff. So it's meant to make mapping quick and easy and free. It's not a substitute for other GIS systems so much. Obviously you would have to put a little bit of time into figuring out how to use it, but if you can create a spreadsheet, or if you know how to use a computer, you don't have to be a GIS expert to use it. You don't have to get a six-month budget and hire a GIS person. It's meant to be that if you can use a computer, you can follow the instructions and figure it out. At the same time, it is compatible with all of those other systems because it uses open standards for import and export, so if you do have a big GIS system you can integrate with that, and vice versa, you could create data with this and export to your GIS system. Okay, I guess I can just show a few of the layers that are in there already that come from those projects I was talking about. Well, actually, I guess I'll just say firstly there's layers, such as a single layer of information. One of the projects we had early on was the Aboriginal Protection Welfare Board in New South Wales project, led by John Maynard and Vicky Haskins and others. So this is also in... sorry, it's not just on this site. Part of the thing is the portability of information: you can get this information and put it in other systems, or put it from other systems to here. So that's information about missions and reserves in New South Wales, and it's related
to a bigger project that they're working on that combines these maps with audiovisual recordings of people who were in those places. So that's an individual layer, but you can also add layers together to compare different types of information. This is a fairly new feature, so there's not many in there at the moment. I'm on the test site, that's no good. I'll close it down so it doesn't happen again. No wonder I was confused. Okay, well, there's another layer that we produced which just gives a sort of high-school-level version of Aboriginal and Torres Strait Islander history, whoops, and that's a multilayer. So we've created those different sorts of information, such as about art and artists, deep time, film and music, as individual layers, but you can see them also as a multilayer, and you can do things like show and hide different layers and stuff like that, just to interact. And again, you can still click on them and get information about them. So, I don't know, how much time... do we have time? Wrap it up? Yeah, okay. All right, there's a whole bunch of other layers. There's layers about Indigenous languages from AUSTLANG, Western Australian Aboriginal journey ways, which has both traditional and historical knowledge in it, the legacies of slavery, and other ones. In particular, the money from the ARDC has helped pay for spatio-temporal metrics, so one of the additional features that are going to be part of this is being able to analyse layers and identify groups, like clusters in time and space, so where activity was quite intense. And I'm going to use that in the current project on historical frontier violence that we're working on now, which should help us identify, in a quantitatively demonstrable way, the pattern of frontier wars, where they were most intense and where they occurred across time, across the whole of Australia. So that's a pretty interesting thing that can come out of this sort of stuff. Thanks. Any quick
questions for Bill, one or two?
Yeah, g'day. Very impressive. If state gazetteers start supporting polygons instead of points, would you be able to handle that?
So we always get questioned about polygons. At the moment we just work with point data, and in some cases we convert polygons into points. But because everyone asks about it, and we really want to give people what they want, I have been thinking about some ways we can handle polygons. It would be difficult in the search, but at least being able to put them in and visualise them...
Because Queensland will ship place name polygons in the next six or twelve months.
Yeah. The main thing we want from the states, though, is the date that it first appeared in the gazetteer, if anyone... think it was right.
It's interesting you talk about the date that it went into the gazetteer. Does the metadata also include the date that that occurred? I guess it must, if you're doing a historical kind of temporal mapping of those massacres.
Yep, yeah. So, yes, I could have highlighted this, I probably should have, but time is a crucial factor if you're looking at history, or anything really, in the humanities. So we do have... I'll just see if I can get the APWB one up again. So the state gazetteers unfortunately don't really contain the date the town was founded and that sort of thing. At least, they might have them hidden back in their data, but it's not accessible to us. But if you're contributing a layer, you can contribute it with start and end dates so that you can see it on a timeline. I'm just trying to find the APWB one again. I'm pretty sure that APWB one works with the timeline. Yeah, so you can show and hide on a timeline like that too, if you have dates. You don't have to have dates, but if you do, you can do that. Yep, from the front
page. Yeah.
Well, I've got a question. The microphone and Mary. How do you handle, if you find out after something's been posted on there or uploaded there, how do you handle cultural sensitivity, if something shouldn't be up there?
Yeah, so if you want to put something up that does have those issues, there's a special field there called the warning field, in which you can put any kind of warning you want. It might be, you know, "may contain pictures or names of people", or it might just be "this data was harvested by computers and might not be accurate", any kind of warning. And when anyone clicks on that dot, that comes up in yellow. So there's a feature there for any user to make use of. And I expect one day we will have a troll or trolls; we haven't got there yet. But with many things, we don't want to make a problem until it's there. We do have wording in the terms and conditions that says we can take it down if we don't like it. It's pretty blanket. We want to allow people to do what they want, but if someone's being abusive or something, we can take it down.
So what about, say, in time, when you get songlines and stories? How do you... if an Aboriginal group wants to share a story, a songline, and that's information that they have to have for their family but no one else can have access to, having to interact?
Yeah, basically the answer is this system is not for that. There are other mapping systems that go into great detail about cultural knowledge and who can access it and so on. So the answer is you've got to use one of those systems. This is for that information you want everyone to know.
Yeah. Put my card right on the cistern I've been looking for boy you do one and side by and so right up toward the beginning now.
Yeah, so if I just want to see what's up there... Yeah, so, well, first thing for the day, we'll just play with this for a while. I
can see it on the map. There you go, there's loads of things there. Is there anything in particular, you know, where... we'll do I hear that's what it would. Yeah, hang on, that'd be boy gear, this will be side by here, I think it must be. Can we go number, click it? We'll just click a few dots and see what it is. Up to the left. That'd be boy gear. Sorry, boy hey, that'll right over this way. All right, yeah. Anyway, if it's not there, the point is you can put it there.
Yeah, yeah, well that's interesting, I think. And then you're pleased to see porous right off this back I'll be left out of the story, but they're part of the story, and then there's a whole series of stories that come out into the specific nation, so there's any connection to the world. That's pretty deadly. All right, cool, thanks. That's a deadly presentation so far, you're the best. Thanks, bro.
Thank you. And you coming up to show off? No, no, I'm just coming to do the thump out. Yeah, so Nick, where's Nick? You can start heading your way up. The big, big guy, Thieberger. Ordinary in your mind heritage research management mr.
problem another actual male HRM tool and is and it's called for assets you can take it home in an all-in-one direction. Associate Professor Nick Thieberger works with the Australian language Warnman and with Nafsan, a language from Vanuatu. He helped establish the Pacific and Regional Archive for Digital Sources in Endangered Cultures in 2003, a digital archive of mainly audio language records, of which he is now the director. He leads the ARC... you forget it, it's enough.
Thanks. I don't know where these come from. Now, okay, so thanks very much. I also of course want to acknowledge and pay respect to traditional owners of all the lands that we all come from, and I work here in Melbourne, so the Wurundjeri people. And I want to talk to you about a particular project. So it's an integration project of the ARDC, so the idea with these is that they bring together other projects, and in this case they rescue data that was orphaned in a particular project. And this is on behalf of a number of co-authors that you can see there. I think we're all aware that research projects often end and all the research data is lost, and this seems to be just a periodic thing that we do. Nobody seems to care. The Australian Research Council doesn't seem to care: they tell you that you have to look after this stuff, but they don't provide any way to do that. ARDC and all the predecessors, none of them has provided us with ongoing research infrastructure to store this stuff that we produce. So we're constantly wasting huge amounts of money in producing materials that are lost. There's a great project out of Canada called the Endings Project which looked into this, and there'll be an issue of the digital studies quarterly coming out this year with papers about project endings. But, you know, there's a big risk of a digital dark age, basically, in that we're producing so much research data, and for some projects, who cares, but for a lot of the cultural projects
and language projects that we're talking about, it's really important that this stuff is not lost. So one example is... well, sorry, the thing that I'll talk about: the OHRM helped people describe collections, and it was used in lots of projects. There are many agencies, cultural centres, language centres who want to describe objects in their collections. They have files, they have physical objects, and they want to describe them in some way, and the OHRM was software produced here at Melbourne Uni that let people do that description. So it was valued by many people. But how you choose this software is usually, you know, you do it on the basis that somebody you know uses it, or you've seen it and you like it, or you've employed somebody who knows how to use particular software, and then when they leave your organisation you don't know how to use it anymore. And so there's a risk that all your stuff gets caught inside some software and it gets lost. So that was the problem. So the eScholarship Research Centre was a unit here at Melbourne University that was internationally recognised for the work that it did, and it produced several tools, one of which was this one, the Online Heritage Resource Manager, or the OHRM. And what this let you do, basically, is identify entities, that is, people, places, things, and relationships between those people, places, things, and with that you can do things like describe collections of photographs or collections of files that you have or whatever. So it was a wonderful tool. It was produced in a Microsoft Access database, and so it was sort of time limited, because, you know, that's an old technology. But then a couple of years ago, for some reason which I don't understand, the University of Melbourne closed the eScholarship Research Centre down, and so the OHRM and all of its products were suddenly orphaned, and all of the people who were using that software were upset. So there are about a thousand projects that used the OHRM, a thousand projects, and
a lot of them were still active; some of them were finished. So there was then an effort in the six months after the centre was closed to recover all those files and to put them into storage so that they were at least captured and not lost. But they weren't active. So the way the OHRM worked was it would produce static HTML websites. So you'd be working in a Microsoft Access database doing all the relationships, and it would then spit out static HTML websites, which at least could be captured by the Internet Archive or something, and they're relatively easy to store, but they're fixed, you can't keep adding more to them. So it was a nice model, but the problem was that it was no longer active. So why are we talking about it here? A number of projects working with Indigenous data used the OHRM. So the Return, Reconcile, Renew project used the OHRM; the Living Archive of Aboriginal Art did; then the big one was the Encyclopedia of Australian Science and Innovation, which was really the first project that the OHRM was set up for. So, you know, really big projects, and you can see, I mean, some of the interfaces are a bit clunky. But Find and Connect, which is funded by the Department of, whatever it's called now, Social Services, which is about putting child migrants and families back together again, is run on the OHRM. So at least this one continues to be funded, and I think there's an instance of the OHRM that they've managed to cobble together to keep working on this. A project that I worked on is this: so this is a collection of records produced by Arthur Capell, who was a professor of linguistics at Sydney. He had 15,000 pages of manuscripts of various kinds in a whole lot of different languages, mainly from the Pacific but also from Australia, and these manuscripts were in his executor's house. So Arthur Capell died, and his executor, who lived in Balmain, had one room of his house completely filled with boxes of
Capell's papers. So our teams from PARADISEC went into that house and spent weeks there imaging all of those pages, and then produced them using the OHRM and the associated software to put them online and make them accessible. Otherwise you had to go to the house in Balmain to look at them. So the OHRM was a wonderful way of linking up these objects and navigating them and making them accessible. So the ARDC funded the team that you saw at the beginning to do some work on capturing the data out of the OHRM and putting it into a format that you've already heard about today, so Research Object Crate, RO-Crate. So exporting from the OHRM into a standard format in JSON-LD, so it's linked data using just straight textual material. And the idea is that we would eventually build a similar tool in something called Describo, or Describo Collections, so it's a tool that will do what the OHRM did but do it in a modern way, so using linked data. So ultimately the goal is to ensure longevity of the existing collections but also allow people to continue to build new things with Describo. So you've had a lot of technical stuff today, but Describo is a tool for doing metadata entry, so putting in that information about people, relationships between people, places, things. It's being used in the Nyingarn project, which I lead, which is a LIEF project to put up manuscripts of Australian languages, and also PARADISEC is using it for the data within the PARADISEC collection. It's also being used in a number of European projects now. So the OHRM has quite a complex data model, so the Online Heritage Resource Manager has, you know, entities, objects, archival sources and all that kind of thing, and this then is exported out of the OHRM and is captured in this JSON-LD format. I can see that we're late in the day here and I don't want to go through all this sort of stuff, but Peter's here and he can answer questions about this if we have any later on. But basically, you know, there is good
solid tech behind the export of this stuff. So what we've done is we've developed a library to transform from the XML that comes out of the OHRM into Describo's internal structure, which is RO-Crate, and we've tested that with several of the datasets, so EOAS, which is the Encyclopedia of Science; CTAC, which is Childhood, Tradition and Change; DHRA, which is Reason in Revolt. So these are research projects led by particular researchers, and we've tested it against them. We've been able to read that data out into a PostgreSQL database and then write it out as JSON for Research Object Crates. So there are bits of the OHRM model that we still have to tweak to get to the full OHRM model, but it's looking good so far. And a reminder that we only got funded for this in December, and there's been Christmas in between, and we don't work over Christmas the way Peter does. So what we have at the moment is a Describo Collections tool. You have an authentication to get into it, as you would expect. Now, this is using very similar technology to what we're using in the Nyingarn platform and in PARADISEC, so we've already got an economy of scale, if you like, by using exactly the same technologies and being able to build on the expertise that we have in these other projects to make this work. And here's the main dashboard. We've been able to load in those entities, and there's the time that it took, so, you know, 10 seconds for 83,000 properties and 8,000 entities. So this is a once-only thing: you're loading the data in, creating the RO-Crates takes a short amount of time, and then you've got this robust data that will endure. Even if you have to put it into storage, at least it can be rehydrated later on, as opposed to the stuff in Microsoft Access, which is at risk. Within Describo Collections, which is the tool we're looking at now, you have entity lookup, so because it's linked data it will look up relevant tables. So we can look up institutions, in this case; we'll have ORCIDs there for
people, that kind of thing. And then underneath it, if you care to look at the RO-Crate, that's what it looks like underneath there, but on top you can see the way that it's presented to you in Describo. So Describo is a bit forbidding, it's a bit engineering, but it's a really good tool, and the thing is that it's a generic tool, so you can use Describo in a number of different applications and it will create the RO-Crates. So it's a matter of sort of educating people about using Describo, but I think it's got a lot of potential. This is then a Describo entry. You can see at the top you've got an ID, you've got the name, you can put in free text and then various other notes and so on. So if you click on the plus you can keep adding new fields into that, and there are drop-downs that will do lookups and all that kind of thing. There's a database underneath Describo as well, but of course all the time it's writing out RO-Crates. So if we want to build a data commons and sustain existing research data, we need to be able to work with data independently of tools. So we have to separate the tools and the data, and this model that we're looking at here, I think, is a good example of that. And the whole sort of infrastructure that you're talking about with RO-Crate and OCFL that Peter's been showing us, and that underpins a number of projects in the HASS ARDC, I think is a really fruitful way forward. So I run PARADISEC. PARADISEC has been going for 20 years; it's got 200 terabytes of data, represents nearly 1400 different languages, audio material, video material, text, all kinds of stuff, and it's all, with the LIEF grant that we have now, going to be put into this format. We've got a proof of concept that shows that it works very well. So having made all of the material that's already in the OHRM safe, we want Describo then to be a tool that can be used for creation of new projects, and when that happens, and we're hoping that'll happen this year, that could be a
really promising tool for people who have collections in language centres or whatever to use to describe their collections without getting stuck inside, you know, FileMaker Pro or Access or whatever software, others that we won't mention, that could be locking their data up. So the problem we have at the moment is we need more coders. We actually have the funding now, but we need more people who can do the coding, so we're open to hearing from people that are interested in getting involved with this. Thanks.
Any questions online?
G'day Nick, I'm also Nick. Not really a question but a comment, to support this work. So I work with the Indigenous Data Network, and one of our objectives is to map the kind of metadata we collect to the formats that you guys understand and know. And so we've got a tool already that can do the JSON-LD kind of Describo-style documentation from our metadata. We've also decided that going forward, if and when, and we think we do, need to store data, we'll use RO-Crates at the back end, if it's that kind of blob-of-files sort of stuff we're talking about. So it's not the main focus, but it's good for us: we don't need to have a main focus, we just say if we're going to do this, we'll do it the way you're doing it.
Great, yeah, well that's good to hear. It doesn't have to have files, right? I mean, you can just have metadata in an RO-Crate; it doesn't have to have a file attached to it, right?
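To make that question concrete, here is a minimal sketch of what a metadata-only RO-Crate could look like, written in Python with only the standard library. It is an illustration and was not shown in the session. It assumes the RO-Crate 1.1 layout, where a crate is described by a `ro-crate-metadata.json` file holding a JSON-LD `@graph`; the collection name, the person and the `data_entities` helper are hypothetical. The root dataset has no `hasPart` and the graph has no `File` entities, so the crate describes a collection without any file attached.

```python
import json

# All names and values below are hypothetical, invented for illustration.
crate = {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {
            # Metadata descriptor: identifies this document and points
            # at the root dataset it describes.
            "@id": "ro-crate-metadata.json",
            "@type": "CreativeWork",
            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
            "about": {"@id": "./"},
        },
        {
            # Root dataset: no "hasPart" here, so no files are attached.
            "@id": "./",
            "@type": "Dataset",
            "name": "Example language centre collection",
            "description": "Descriptive metadata only; no files included.",
            "datePublished": "2024",
            "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
            "author": {"@id": "#example-person"},
        },
        {
            # A contextual entity: a person linked from the root dataset.
            "@id": "#example-person",
            "@type": "Person",
            "name": "Example Person",
        },
    ],
}


def data_entities(crate_doc):
    """Return the @id of every entity typed as a File in the crate."""
    found = []
    for entity in crate_doc["@graph"]:
        types = entity.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if "File" in types:
            found.append(entity["@id"])
    return found


# Serialise as it would be written to ro-crate-metadata.json.
metadata_json = json.dumps(crate, indent=2)
print(metadata_json)
print("File entities:", data_entities(crate))
```

If files were added later, each would get its own entity typed `File` and be listed under `hasPart` on the root dataset; the descriptive entities above would stay as they are.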
Yes, RO-Crates can be used just for metadata. And just for context, I just wanted to mention another project that's happening in a very similar system called Heurist, which people might have heard of, which comes out of Sydney University, out of an archaeological computing lab, and that's been used in a wide range of projects, just like the OHRM. And that one's not at risk, it's still actively maintained, but there's a project in development to do exactly the same thing, so you can export data out into a common format. And one of the projects under investigation actually has Trove integration and linguistic content, which means we'll be able to reuse data in LDaCA and ATAP.
The program, could you... oh sorry. I was thinking back to when Windows first came in. Some of the old, you know, we call them USBs now, but you'd have some of the old data things that you put in there. Would you have any programs that sort of can... you know, like, I remember we did this one on HIV in Queensland and it was on a flat disc, you know them flat...
Yeah.
Well, are you able to put all them sort of things in there? I think I might have thrown them out now, but just those sort of things, like historical data. So how do you get in the old VHS and so forth?
Look, there are ways of getting... if you've got the files, if you haven't thrown them out, then, yeah, if you've got the USB stick or the zip drive or, you know, there's all kinds of formats that people have used, there are ways of reading those still. Actually, Melbourne University has a forensic lab which has all kinds of playback machines. The problem is finding a machine that will read that stuff and still be able to read it onto your computer. But there are services around that do that. But it sort of depends how valuable it is, right? I mean, if it's not that valuable... you're going to have to spend a bit of time and effort and probably
money on retrieving stuff off those old formats. Yeah, just the way I think is, like we talked about timelines there before, to show what we did a few years ago, what people in community were doing years ago. It was pretty relevant at the time. HIV at the time was a really big issue, and some of the talks on that were very relevant. And then to see where we may have come to now, you know. That's all I was thinking about. Yeah, well, I mean, it's not just that stuff, but you've got to think of all the things that you've got, especially language recordings. These sort of things really have to be preserved, right? So if they're analog, they have to be digitized, but even then they still have to be described and they have to be put somewhere that's going to survive, so AIATSIS or PARADISEC or one of these archives. But just keeping it on your computer is very risky. You're talking about condom and... yeah, yeah. Hi Nick, Sandra Silcot here, also Indigenous Data Network. Thanks for the presentation, and thanks also to the ARDC for having the foresight to fund such a valuable, long-running and just fantastic collection. I wanted to ask: last time I looked at the OHRM, the underlying semantic data model was built around EAC-CPF, which is a very long-running, well-established international standard that a lot of archival organizations use. I'm just wondering, are there possibilities of collaborations with other archival-style national organizations that could also be interested in adopting this revitalized set, where there's a common set of semantics across all these collections that could possibly be leveraged as well? Yeah, it's an interesting question. We put an application to the ARDC about three years ago in which we had a number of partners, including agencies like the ones you're talking about, but we didn't get the funding, so we didn't pursue that. And now all
we've got the funding for really is to do this rescue work. So yes, there's a great possibility to do much more, but this is the first step. If ARDC gets funded, and if we do a good job and they think we've done a good job and they re-fund us, then potentially that's the kind of thing that I think we should look at. Yes, Nick, before you go, we can get everyone to just say this... Oh my god. There, that's calming. Thank you. Jenny Fewster, you're going to have to proceed up here, because you've got this open discussion which I'm going to have to facilitate, but I don't know what we're discussing yet, so you're going to co-facilitate a conversation with me. Okay, come on up, Carly. While I was thinking about them too. So, open discussion. Join me in another big round of applause for all those great presentations this morning and this afternoon. Okay, so we now want to have an open discussion, really in relation to requirements for humanities research infrastructure as we move forwards, and of course, as in all of our discussions, being mindful of the Indigenous work that we do and the co-design and the respectful moving forwards. So, taking away from this morning, I think we've made amazing progress given that most of our projects really didn't start until January 2022. We've got an opportunity, as I mentioned this morning, to put a submission in for more investment from the NCRIS strategy, and we've got ideas, as I mentioned this morning, about how we want to shape that. But I also want to hear from you again as to what your idea is about the future for the HASS and Indigenous Research Data Commons, in particular in relation to humanities. So does anybody have anything that they would like to say? There might be things that sit within an ARDC envelope, and then there might be things outside that envelope as well, but I think we've got to track that in a sense and understand where the levers are and what
might be possible with ARDC funding. People have started to talk, and I've been talking to people around the edges, around repository infrastructure. Now, that's not entirely... it could be something you pitch into the ARDC, something you pitch into NCRIS. So, understanding what that might look like and who might be in a place to do that. And when I was overviewing things earlier on, that review of the ARC: if that has actual teeth and there's really meaningful change at the ARC, program design for infrastructure funding in that way. The fact that the ARDC is not doing the open call model is at issue, I guess: who picks up some of that sort of stuff, some of the nationally significant projects and seed funding for infrastructure, without that funding stream? Anyway, they're just a few things I've been thinking about as well. Yeah, I think the challenge we're facing is a kind of political resources one, with funding streams all divided up. If we were thinking of a long-term research data repository, which we desperately need for languages, like Nick says, you don't chuck it away ever, right? You just don't. In that case you need long-term storage, but the question is who do you get to do it? It's not really possible for ARDC, the way it's structured, to take it on. You need an institution which is in perpetuity, and the only ones that I can think of are libraries, archives and also, as it happens, the ARC, the Australian Research Council. We can be sure there will be an Australian Research Council. Maybe it'll change its name, but it's in perpetuity as well. So you have to find a way of linking project or short-term based funding to an institution which has longevity. And the political problem that I see with libraries and archives and museums is that there's a lot of willingness, but their remits are around serving the public, and researchers are not
really the public. Not really, right? We're sort of public when we're not being researchers, but researchers go way beyond that; researchers are like a greedy public, because we want to grab as much as we can. A person goes to a library and gets a book out; a researcher wants to get a hundred books out. You can't do this. So there's an issue there of how you sort that out. In the Australian context it's separated, and in some of the European contexts they're not separated, so it's actually much easier for them to do a long-term repository. So that's a political policy issue, which I bounce back to Kylie, and whether it could be solved. I just thought maybe I would add on that, from a conversation we had last week. And please run the mic around and get your thoughts, because that stirs up our thoughts as well. There was a conversation last week, and it became very clear to me, and you mentioned it now, about the political decision making: it's evident that there actually is not an aligned approach or strategy in relation to this. But yes, good points on how those things may well be sustained if you are aligning to it. We need that definite strategy in place, and I think that's where that conversation comes in with everyone in the room and further, to shape that. I was thinking, when we're talking about the HASS and Indigenous RDC, our best hope was as a collective, but a smaller group once again, to sort our own backyard out, so we can say: well, if you don't know what is happening from above, we kind of know what's happening, and this is what works, and we can assert that and push for that as a strategy to be adopted. It's kind of like we're making our own direction in that regard. So one of the things that occurs to me, after the question about USB sticks and things that were created in the past, is that the HASS and Indigenous RDC could run a rescue project, which is: tell us about things that are
out there and that need to be preserved, right? I think you'll find huge oral history collections in local museums, regional museums, all kinds of places that have digital data which is really at risk and needs to be preserved in some way. We've got all this high-level stuff, all these high-level tech things, but the rubber hits the road where people have stuff in their own possession and it needs to be preserved. Often they don't even realize they've got it until somebody starts showing them other things and they think, well, I've got that too. So that could be a dedicated project. I mean, let's forget about the ARC trying to do it for all the projects it's funded, which is another project that should be done, right? Yeah, and that reminds me of Deadline 2025, Nick. We all know that magnetic media is about to become of no use to us, so if we don't mount some kind of rescue mission around that soon, that's going to be a problem. I don't know if that's something that... I mean, we may be able to look at small projects to do that, but it's a mammoth undertaking to do it properly. This might be a kind of cat-among-the-pigeons sort of question, but I worked a lot with Ndara senior men for about 10 years on knowledge systems, particularly looking at the way knowledge was preserved across groups and divided into sections, and each geographical unit or family had a responsibility for the survival of different parts of knowledge. Listening to them, I'm thinking that it is a kind of colonialist project to want to centralize stuff, and maybe we need to be looking back at what Indigenous people did for thousands of years to preserve knowledge that was vitally important to survival or vitally important to the preservation of country. The other thing I'm hearing is we need to preserve all of the data, and I think back on some of the rubbish research projects that I've
designed and worked on, and all I want is for that data to be buried for all time. One of the other things you learn from working with Indigenous people is that the knowledge that was central to survival is the knowledge that was preserved first, and the knowledge that was essential for the survival of the soul was preserved after that, and then on in lowering orders of importance. Some things were just performative and didn't survive a person's lifetime, but other things lasted for thousands and thousands of years. So I think we need some kind of filtering going on, so that my rubbish research doesn't actually get metadata and then get preserved forever, but the good stuff that I did does. So, two points really: one is about filtering, and the other is, is centralizing the way to go, or do we need to look at some distributed, networked model of knowledge preservation? I might let Michael and Robert tackle this actually, but I think that the Language Data Commons is definitely not a centralized anything. It is a very distributed model, and I think we're all aware that there is data that needs to remain firmly with the custodians and not be brought into any kind of centralized system. Do you want to... Yeah. I didn't really get to talk about this today, but in previous talks I spoke a little bit about institutional willingness, and in terms of that very point that you've raised, I think that is important. In terms of institutional willingness, certainly it's there with LDaCA, because we're taking key Indigenous governance functions and embedding them and building upon them. They are central to our project, but I can't say that all others would. One of the things I've held in mind is, wherever you go, wherever you walk on this continent, you're standing on Blackfella country, and this data that we're talking about was collected
during a time of Australia's ignorant infancy, which we are still in today. We're pushing these processes on, but, very rightly so, as you've said, we neglect to understand that data has been continued successfully for thousands upon thousands upon thousands of years. So I'm a really big advocate for Blackfella ways moving forward, because we do stand on that soil; they are the key mechanisms that I believe we should be implementing. You find that it's not a Blackfella-whitefella thing, it's Blackfella ways, and you find a place for everyone in those structures, because that's the way it is. They're not set up to oppress people. I think the other key concepts to maybe consider on this front, and I won't talk to them, but just in light of that institutional willingness: yes, autonomous regard, which I've already mentioned with Mary Graham, but also the talk about Indigenous data sovereignty in terms of how that's managed. So I think there are probably three key points. Is there any other... or even Sandra, are there any other things that you would suggest to consider in light of that nice provocation? Thanks John, that was a really interesting moment for us all, I think. Certainly a few different thoughts ricocheted for me listening to you, and thanks Robert for your response there, I think that was really strong. A couple of my ricocheted thoughts, not in any particular order and not particularly thought through or voiced before. I thought about local censuses that prescribed bodies corporate have been running in Australia. Fascinating, given we mark 1967 as the referendum and the year that led to us collectively being enumerated in the federal census, and 55 years later, which is my birth amount, 55 years later, we see a new tradition of local censuses collecting disaggregated data that's considered really important and vital to land-estate-based communities, which I think is really
interesting. Another thought that ricocheted was, if we all had to do a secret ballot on what we think the most urgent, pressing issue is right now for us as an Australian community, we could come up with a few things. There may be recency bias, and I think there is with me, but seeing young children in Alice Springs comes vividly to my mind. So if we took a grand challenge approach to what we do, we'd come up with some fairly clear priorities and challenges that we should make some considered approaches towards. The other ricochet thought was around Revive, the new cultural policy. I did a quick search for "research infrastructure" in that: zero results. Then I did a quick search on "research": 19 results, about half of which are citing research about Australian attitudes towards arts, culture and creativity, and one in relation to the proposed Music Australia, the need for basically data collection about past performances and things like that. So that's something. And then, as part of Revive, of course, there are the intentions to create new law around Indigenous cultural and intellectual property rights. Another ricochet thought, if I may, while I'm still rolling: if the next referendum does get up on a Voice to Parliament, and we have a range of truth-telling regimes and treaty-making processes rolling out across several Australian jurisdictions, where is the evidence base that those bodies are going to need for the conduct of their business? Which took me back, John, to your contribution there: any research that's been done that's been Indigenous-led across our community organisations for the last three to four decades, where is that research? How is that research vulnerable? I imagined Alexis Wright sitting and weeping over the latest debacle in the Territory, when in fact Alexis Wright and the late Tracker Tilmouth and others from the Central Land Council have done an enormous amount of
work on revisioning the Northern Territory. Where is that research, and how can this kind of thinking perhaps orient to servicing that well, rather than, John, as you very humbly put it, servicing that research, and perhaps hiding away the rubbish research that has done us no good? I'll stop now. Thank you, Sandra, for your considered thoughts as always. Rubbish research, and aggregation of data not being beneficial to Indigenous people: those were both very important points that Sandra and you've made too. I'd like to move on to what we've identified as capability gaps. As I mentioned in my presentation this morning, during a consultation process, which was a number of fairly large round table events that were open to anybody, we identified a number of capability gaps. We've talked about one quite extensively, and that is the need for a data repository for HASS and Indigenous research data. Another that was identified was access to mediated data, that is, data from the web, data from social media, app-based data. We identified a gap in the HASS RDC in the form of the creative arts, which obviously is pivotal to the Indigenous population as well. What else was there? There were several things, but those are the ones that immediately spring to mind. But I'd actually like to open it up to this group to see if you can articulate any other capability gaps for your particular area of research. What do you need research infrastructure to provide you with? What data do you need that you can't get hold of, and what tools do you need to analyse that data that are inaccessible to you at the moment? Someone's got to have something. Well, I'm not going to start with the next table, let's start with Nick. Let's start here. Do you want me to go while they get ready? Yeah. I'm just putting you on the spot, Nick. So it's not everything that we're doing in the Indigenous Data Network,
but one of the things we're doing is taking pretty well-known standards from somewhere and then extending them for Indigenous purposes. In our particular case, we take cataloging data standards and we extend them very slightly to cater for special Indigenous concerns. We saw that perhaps with the presentations before, where there are RO-Crates or something that are used quite widely, and then we might think about how they need to be extended for our cases. What we generally need is a better understanding, across all of these projects, of what sorts of standards and so on are in play out there in the big world, and then which of those have been extended, handled, made suitable for Indigenous and other purposes. Now, we all know this to some extent, but we don't, as far as I can tell, have an overarching picture of that. The reason this is important is precisely what Nick and others said before about a separation between data and tools. The better handle we've got on the models and the standards involved in our data, the more separated from our tools it will be and the better preserved things will be. So that overview is important. We're, again, focusing on data cataloging in our thing, but where we encounter things that are not data cataloging, we look elsewhere within this community to find things, and we're building up knowledge then of our total space. I mentioned it before, but if we want to store files, we don't really know about that; they know about that, so we ask them. But we all need a better handle on the kinds of standards and so on that we're using. Is it on? What I was thinking of: I did my PhD, and a lot of it was on workforce, and one thing I think of when we're doing research in this area, and we work with the health service in Brisbane, is about building up the data research workforce. And as well as building up the workforce, we also need to build up the
understanding of why we're collecting that data. When we talk about data sovereignty, we need Indigenous people within our communities to actually know what data sovereignty means, because that's one thing we asked a lot of the mob, with interesting responses. And also the organizations within the community, including the health service (when I say health service I could give you the acronyms, but I'll just say health service): it's about getting members of each of our community organizations to understand that they are the sovereign people and that data is their sovereign data. And also in regards to governance: because it is their organizations, they need to understand that they actually own that, that they have a role in the governance of it. I don't think that's out there enough. When we talk about gathering data from Indigenous communities, I don't think our mob know why people are gathering it, or actually who owns it in the sense of sovereignty. So if we train up more of our people (I don't like to put ownership when I say our people), more Indigenous people, to be involved in the process, to create that workforce around data science. That's it. Yeah, thank you, I think that's right. I think we need skills development not only for researchers; we also need a workforce coming up into coding to help out Nick Thieberger and Peter Sefton. So we need to look at generating pipelines to bring these people up to speed, and definitely there needs to be education around Indigenous data sovereignty. I mean, what does that mean? I can understand why many people would be scratching their heads over that, and I think that's something the Indigenous Data Network are grappling with as well. Did you want to add to this too, Michael? I think, if we're talking about getting a workforce in there and creating a pipeline, why hasn't it been done yet? In terms of the language that we're using
around data, it doesn't leave Blackfellas with the thought that we can take ownership over that. When you look at data and you look at things like restricted access and all of that terminology that we put with Blackfella data, it's not an inviting place. It doesn't say, here's an open opportunity for you to stand up and take ownership and make decisions. It seems like a big complex fight with a lot of jargon in there, and that's challenging for bringing communities along. In terms of that, Professor Clint Bracknell... do you want to talk to that a bit more, with LDaCA? He'd made that point about this language, and we talk about flipping the coin in terms of restricted access, flipping it to something where we're talking more about respectful sharing. Was that it? Yeah, just those kinds of things, a different mindset, or, as you've put it, that cultural way of managing data. We know there's data that can't be shared for various reasons, but it's not restricted access, it's respectful sharing. I know I've got data that I need to be responsible for as a custodian, as to who I pass that on to and when I do that. Likewise, I know there are old people who were withholding data and information from me until I had done certain things to prove to them that I can receive that data. You know what I mean? That in itself encompasses that idea of respectful sharing. And, as I said before, in terms of this space, Blackfellas need to make that space for themselves, just as that space needs to be made for them. It's both ways. I'll add that to it. I might add to that another layer. Back in June last year, the Indigenous Data Network held an Indigenous Data Governance Roundtable discussion in this very room, and one of the presentations that really struck me was by Michael Aird, who's from the Anthropology Museum at the University of
Queensland. He was talking about restricted access, and about white people imposing restricted access on Indigenous material in collections. I think some of those people feel like they're doing the right thing, but actually, by restricting that access, they're making it impossible for Indigenous traditional custodians to identify material that belongs to them. So that's another layer again. Do you have anything you want to... Yeah, so I think, just picking up on what you said, Rob, about respectful sharing: part of the idea is, whether it's data that people want to keep to themselves for their own purposes or something they want to respectfully share, the idea with LDaCA is to provide infrastructure to future-proof it, right? Because if it's in files or tied up in software, you're going to lose it. If you want to keep it to yourself, you can't really use public money to fund that future-proofing, but we can give you resources to do it yourself. If you want to share it with others, on whatever the spectrum is, then that's something where public money seems to fit in. Does that make sense, or am I confusing things? You'd have the tools, but you couldn't have LDaCA staff working on stuff that you want to just keep to yourself, I suppose. Right, initial part staff forgot to average more people's sacred sites that are restricted access all the time. There's no... okay, all right, there's no logic. Yeah, sure. So I guess the key thing is the respectful sharing. But just on the workforce thing, I think there's an issue around humanities more generally. I don't think it's just the issue that you're talking about, getting people into data science; it's actually a more general humanities issue to find someone who can be in tech and really understand humanities at the same time. It's quite a rare person that combines
those fields, and so we're quite often dependent on humanities-friendly tech people. We have someone like Peter, who actually went through to a PhD in linguistics, so he understands humanities, right? But we've worked with various people who don't have much understanding, and then you just end up with a product that doesn't work at all. So I think workforce development is probably about starting to transform humanities and saying, we have to get with the program here, and we have to start embracing some of these changes in the world. We know AI is making it a little bit easier. I'm not suggesting that humanities people start trying to become really bad coders and so on. What I'm suggesting is that you can create a class of graduates who can more effectively communicate with people that have real high-level expertise in computer science and coding and that kind of development. At the moment there's a real gap between those two things. So I think it's about encouraging all of our humanities graduates, or at least the ones who are interested, who see something there, to find programs. At the moment there isn't really a program. You keep looking around, and it's very hard to do. If you want to do a master of data science, you actually have to be a graduate of a bachelor's in computer science. So data science degrees sound really good, but you can't actually do one if you're a humanities graduate. So how do you get the skills? You just have to do a double degree or that kind of thing, which takes time and all that. So we've got to create new programs, I think, in humanities, which start to address that issue, being mindful of prioritizing Indigenous students coming through those programs and encouraging that as much as possible, because if you don't get people on the ground, you can't really transform what you're doing. I guess, at the risk of sounding like one of those
tech people who don't get humanities, I'm a data scientist. I just want to reinforce some of the conversation on the data pipeline. We talk a lot about data collections, we talk a lot about sovereignty, but as a data scientist I can say that showing the data once you've done the analysis, showcasing the data back to the community, is also important, something we've got to think about. That's something Bill and I talk about at length on our historical frontier violence project. You can take all the data you want, you can write a fancy paper about it, and the community is going to be like, so what? So we've got to think about how we're taking the data and visualizing it in a way that lets you engage back with the community. So once again it goes back to the workforce, getting people who know how to showcase their data back to the community. Thanks. Any comments on that? I have a couple of questions from online. Yeah, I'm just going to let Robert respond. Oh yeah, sorry, I just wanted to quickly throw in, because I was having this yarn with a few people today already. If we're looking at pipeline (I hate that word, eh), and I'll just contextualize this in terms of LDaCA, what I said was, we're talking about longevity and making something that will last, and I think from a community perspective, in every sense of the word, we need to see that we have something that people can use. So training, yes, is a big aspect of what we need to do, but separated from the project, almost like a third-party thing. If we're going to make it stick, we're effectively influencing culture, a new culture, a new way of doing. I was just saying in a yarn out front, when I was a kid, Microsoft didn't teach me how to use a computer, right? Somebody else did. Someone in the community did, and others did, and it was
something that we taught ourselves. So yes, LDaCA is going to have to be effective in how we're doing that training if we want to influence the culture, but if it's going to stick, community people need to know it, they need to be passionate about it, they need to teach others. So I'm thinking of people sitting in the language centers and all of that. If we can convince them that this is a good thing, and you want to do this, and this is how you want to do it for the best interest of your Malbenya data, those people will pick it up and they will champion it. So we're talking about community champions, if we're looking at that pipelining but starting in that regard too. That was that point. And there was one other thing... gone. Lost it. Gone. Dementia. Can I just jump in quickly? I think, Jenny, before, you mentioned, you know, what is Indigenous data sovereignty, I'm grappling with that. I just want to be clear that it is defined and has been defined for years. It's very clear. And coming back to your question about what you do when you've done the research and it's not really relevant to communities, I think the solution to that is making sure that you only do research that is relevant to communities in the first place. That's really the basis of Indigenous data sovereignty: that we do this work because it's relevant in the first place. Sorry, Sam, I should clarify. I did not mean that Indigenous data sovereignty was not defined. I meant that it's a term that I'm sure to some people can seem mysterious until they have the definition explained to them. I was just going to say too, quickly, there are some really good community-based ones that I've worked with, and they are examples of this, where they're collecting data but community are involved. It's also about presenting that back to them at the end, so they know it's relevant, but as it's being
developed, community are participating in projects and seeing it as it's going. Getting halfway done, or a quarter of the way through, they're having things like exhibitions and community barbecues and different things where this data is coming back out, and even if it's not fully built yet, they're starting to celebrate in that journey. It's not so isolated from community. And the other thing I did just remember, I just wanted to share from at home: we've got this sketch grammar that needed to be digitized, and Nyingarn provided a way to do that. I've got my login, but I'll be honest, I've been very slack and I don't know how to use it, and I'm still working that out. I was going to hand that in to them and see what they could do, because it needed to be digitized. It's a Gooreng Gooreng sketch grammar done in the 70s. It's typewritten and it's also handwritten as well, so it's a bit of a mixed-up way, but it's important information for us in terms of the next steps forward for our community in having those language conversations and actually revitalizing that language properly. But then I had concerns: well, what happens when I hand that over? Who actually deals with this? Because I want our community to have this conversation first. But it was pretty useless in the form that it was in. It needed to be digital; we needed to be able to use it a bit more. But then another program had come, and now we've got a recently graduated Gooreng Gooreng linguist, with quite a few connections in that central Queensland region, who has now been able to take advantage of a program that has been put out in terms of digitizing manuscripts. What's it called? The problem is that... Yeah, yeah. And so this linguist has been able to do that, and that's a perfect solution for my problem in that instance, because now we've got someone... not only is it going to get digitized,
but we've got one of our own taking this process forward, and that person's going to be better for it in the end too. Yep. So we've got a few here. One from Janet McDougall at the Australian Data Archive, who asks: will there be career paths for computer data scientists in the humanities? And another one from Alyssa Arbuckle, who asks: would leaning in to the existing digital humanities strengths in Australia be helpful? Two very separate questions, I think. First one, will there be positions, career paths? I think it's up to all of the partner institutions to make sure that there are career paths, ARDC included. And we have got the wonderful Lisa who's here, who's our Indigenous intern. Although not in data science, we see the importance of bringing Indigenous people into the data commons. And I'm sure that Peter's got people that he's nurturing. So, you know, yes, the projects do need to nurture people in data sciences, but we also need to think about the employment of Indigenous people in real roles, not in advisory capacities. The second one, would leaning into existing digital humanities strengths in Australia be helpful? Yes, yes and yes. And some of those digital humanities strengths will be on show next week at the summer school, so get in there. And there's another one here: what sort of skills are needed, and what support mechanisms are there to train people in the data management skills that are needed? Different approaches might be needed for HASS and Indigenous data, especially when there is a need to incorporate the CARE principles. That's from Robin Burgess at ARDC. Thanks, Robin. Nothing like your colleagues dropping you in it. Yes, it's not one size fits all. I don't believe that it is one size fits all. And again, I think that that's going to be addressed next week at the summer school, where we've actually got three concurrent streams: one very much targeted at humanities, one targeted at social sciences
and then we've got a third stream especially for people who work in Aboriginal community controlled organisations who may just be at the beginning of their journey with looking after data. So no, it's not always one size fits all. However, we have got plenary sessions too, where, you know, there are things in common, but there are also things that are done differently. Social scientists work a lot with tabulated data; humanities researchers, maybe not. So no, not always one size fits all. Robin, I'll see you next week in Sydney. Hey Dee. Oh, hi. So I had a couple of things that have come up. I was going to respond when Michael mentioned the training, but it's come up again. One, I think the biologists are really well organised, and I can't remember his name, but I've seen great presentations identifying, you know, how many there are. There's thousands, like there are of the HASS people. And so they've really clearly identified the skills gaps, which helps if you're doing things like decadal plans and so on, to be able to talk about how many people there are and how many programs you need, of all kinds. But that's not just a problem here. Like, I'm trying to help members of my family who are scientists, younger ones, and there's the same problem, people, you know, building up skills. I wanted to pick up on something that Nick said about standards. I think I learned a lesson about this over the last 10 years. Going back a while, when we first got the Australian Access Federation, that was called MAMS. I can't remember what that stood for either. But James Dalziel went around the country, he was at Macquarie University, and they told stories about this, that you would be able to log in, which you can now, you can log into the Australian Access Federation, and you would be able to go to someone else's university. And he said words to the effect of something like, you would be able to have
resources which were available to female anthropologists over the age of 30, for example, right, that have been tagged as being, you know, this is women's stuff. And he'd mention things like, I remember, anthropologists. And I always thought there was something a bit strange about that, but it really struck me, talking to people from the National Library at the time: we don't have a list of anthropologists, right? And they probably don't agree who's an anthropologist, because you've got those ethnographers and other, you know, deviants. People can't agree on who's in the discipline, and we can't identify those people, so we can't have standards for that. We don't have standards for identifying who's in or out of an academic community or any other community. So I think there's a level of standardization which I don't think we're able to do, and I'm not sure that we should do, of getting into instrumenting that kind of thing. But I think that in those kinds of situations, and this goes to stuff that Robert was talking about, about people holding onto knowledge, what we should be doing is identifying the custodians or the stewards of the knowledge and having trusted systems that they can use if they want to, where you can put stuff, but the person who makes the decision, or the persons who make the decision, it's people, right? This is something I don't think we can standardize. So I think there are limits to this. And if you look at what's happening with things like the crypto trading and NFTs, for people who are following that stuff, there's a bunch of geeks who believe they can instrument things like law and contracts, and that you can have financial transactions that are kind of programs, and it's an absolute mess. It's full of corruption and people stealing money from each other, and it doesn't work, right? These are social
institutions that you can't instrument at this point in time, and I think we should stop thinking about that and talk about how communities can take control of things. And sorry, while I've got the mic, one final thing that's happening in the tech world, as people are moving away from Twitter. I don't know how many, who's gone to Mastodon? No one? Three. Three, okay. So for those of you who aren't following all of that, Twitter was a highly centralized service, like a message board for the whole world, and it may or may not be falling apart. So a bunch of people have gone to Mastodon, which is a federated, distributed system where there's a whole, yeah, it uses a standard. But the interesting thing, this is a social thing here, right, is the whole Mastodon community. These are geeks, mostly, but there are people from communities who might be at risk for various reasons. And in Mastodon you don't build central indexes. So this is the sort of tech culture of going from everything's in one place, like Twitter, or everything's in Google. And if you go into Mastodon and you start trying to build a centralized index of all the stuff that's happening in that social media realm, you get told to go away. And so we may be seeing a shift to an online culture where people go back to community and building trust relationships before they share things with each other. So we might have been misguided in thinking that we can have Google, you know, managing all the information in one place, and it may actually become that finding things is a bit more difficult and people start hoarding and only sharing with people they know. I think that kind of goes against what I thought we were doing on the web, and it's something we should think about. Thanks, Pete. Len? Yeah, Jenny, you started off asking us for our views on the future of humanities research data, and it just seems to me that if we'd sort of
take a step back, it's going to have to be a sort of networked system, but it also needs a focus, some sort of centre, well, to me, some sort of centre of excellence, which, when I think of it, would have been the eScholarship Research Centre at Melbourne, except the University of Melbourne scored a spectacular own goal in abolishing it. But I wonder if there's some process by which we could move to establishing a sort of focus for humanities research data, or whether the Community Data Lab is going to be that. He's not here anymore, is he, Jay? No. Okay. So recently I've been having discussions within the ARDC to generate what we are calling a portal. So it's, you know, pointing out to all of the work that we're doing, so an externally facing portal, but branded as the HASS and Indigenous Research Data Commons. Obviously we don't want to centralize everything that everybody's doing, because that goes against what we're about, really. But we do want to have a central point of access where people can reach out to the projects that are involved in the HASS RDC and Indigenous RDC, but also reach out to other projects that are pivotal. I mean, we don't think we're it in terms of humanities, arts and social sciences research infrastructure. There are some really long-term projects out there. So that portal will provide links out, will provide links into skills development, and it'll provide access to the Community Data Lab, which is not a static thing, that's going to keep morphing and growing. So I'm hoping that that portal will serve the purpose that you're talking about, at least to some extent, Len. So keep an eye on the website for that. I don't know when it's going to come, but it should be soon. What do we call afternoon tea? Oh, smoko, everyone. I'll let Grant take over. Thank you very much, and I'm sure there's other questions that people can ask outside. As you were talking, Jenny, I had a few thoughts. Ancient ways and modern
days is a good way to think, and it is also about survival. So you can bring ancient ways of thinking into modern days. And as you were talking about all of these things, I was thinking, how does this whole conversation, process, skilling up, building capability, building trust, in a place like armada, how's that going to happen? Because that's a whole different level of culture, knowledge and story. And then I think of you, sister, up in TI. We're talking about Indigenous, but we always have to have this conversation that is inclusive of the Torres Strait Islands, east and west and central. Yeah. So that's a conversation that we've got to be very mindful of, and just not keep it all mainland focused, because there's some serious implications around remoteness, language, story, ability to build people up to a point where they even understand what all this is, because everything will work there, and that's when you're going to see people just shut down. And if we go to TI, whatever you do here, add 30% on your actual cost, because that's what you're going to do to get into those places. Same way you are. Okay, ready to go? Okay. Shall we all take up our seats again? I'm just conscious that we need to afford the last two speakers of the day the time for them to do their presentations. Jenny's just informed me that there's water bottles and books and other things out there which you're entitled to take. Feel free to do that. And I didn't tell anyone, after this there's drinks and nibblies or something. Where is it? Where's the drinks and nibblies? Here? So what sort of drinks? So we can hook it on the ground. That sounds like a good party. Now, the next speaker I'm going to introduce is, bear with me, Wojtek Tomaszewski. Polish: dzień dobry. So I said to you, Italiano, vieni qua. I mean, come here. Yeah. Wojtek is going to speak about the conceptual, methodological and practical issues in spatial
data integration in the Integrated Research Infrastructure for Social Sciences geosocial project. Associate Professor Tomaszewski is the Deputy Director (Research) and a research group leader at the Institute for Social Science Research, and is also a chief investigator in the Australian Research Council Centre of Excellence for Children and Families over the Life Course, the Life Course Centre. He holds a Bachelor of Science and a Masters in Science and Mathematics, as well as a Masters in Sociology, am I right in that? Right. From the University of Warsaw, Warsaw, Poland, and a PhD in social science from the European University Institute in Florence, Firenze, in Italia. Capisci? Wojtek joined UQ from the National Centre for Social Research in London and has specialist expertise in quantitative research methods and advanced statistical analysis. I'm talking to you in Italian. Yes, yeah. Sì, capisce. Now, I think 60 people here are going to speak more Italian than I can speak Gooreng Gooreng, because historically we weren't let do that. So that's a couple of days. Doesn't matter. You're going to work all those things out. Thank you so much, and thanks, Grant, for this introduction. I'm really happy to be here and so privileged to, I suppose, be kicking off what could be considered the social sciences stream of today. So look, I want to just say also that I'm presenting today, but there's a whole group of people who put work into this and who are behind a lot of the things that I'm going to be talking about: the team at the University of Queensland, but we also collaborate very closely with colleagues at the ANU, AURIN, the Melbourne Institute, also asprey's is involved. So just to flag up that we're going to do a bit of a deep dive into a specific component, or work package, of the IRISS project, the Integrated Research Infrastructure for Social
Sciences. So there are six work packages in total, and today Michael and I are going to be talking about one of them, called Geosocial. And I believe tomorrow there are a couple of presentations about other components, so you can get a bit of a sense of what we do across the project. And I think Michael will do a little bit of, you know, showing the different work packages and how they interconnect beyond the Geosocial. I do want to also start by acknowledging the traditional custodians of the land here, as well as other places where people are joining us online. And just to add a different kind of introduction to myself: yes, I come from Poland. I was born there. This is where my parents still live, and both of my brothers. I came to Australia in January 2011, just after the floods in Brisbane, and I've lived there ever since and called Brisbane home since 2011. That's where I live with my little family, and I've always felt very welcome. Look, so I just want to set the scene for what we're talking about. With Geosocial, what we're trying to do is connect data, integrate data, bring data together: data about people, individuals, and data about places. And there's some big opportunities around this, and there's obviously been quite a bit of research that's utilized this sort of data. A lot of this comes from the US; people have looked at things like the effects of neighborhood characteristics on all sorts of social outcomes. As we heard in the discussions today, this is actually really relevant in the context of research with Indigenous communities, the importance of place, where we live. There's also growing interest in place-based approaches among policy makers, and in my own research
area, which is equity in higher education, just to give you an example, there are policy targets and monitoring around things like students from low socioeconomic status backgrounds, and low socioeconomic status is actually defined as an area-based measure, just because it's easier to monitor that rather than collecting all this detailed information about people's incomes and so on. So it is increasingly used. However, there are some barriers, particularly in Australia, for research to be using this data. That kind of data is not readily available. And just to be more specific in terms of the kinds of data I'm going to be talking about today: one of the staples in terms of the data sources or types that are used in social science, as people would know, I think, is surveys collected from representative samples, you know, nationally representative samples, often over time. So this is the kind of data that asks about all sorts of things about people. It very rarely contains detailed information, or any information, about the places. So if we want to have that information, it needs to be appended to that sort of data, and this is what we're trying to do. But the data is kind of fragmented. It's lacking good documentation, particularly around some of the more technical issues around the data, and a lot of this detail is actually very important to understand, and I'm going to be coming back to this point throughout the presentation today. So the spatial integration does really require quite deep technical and methodological knowledge around some of those issues, because, you know, if you don't, yeah, if
you, if you don't do it right, you might be just misrepresenting the data, basically. So as a consequence of that, there's quite a lot of duplication, people working individually or in small groups, doing the same thing time and time again. This creates lack of consistency, it's difficult to reproduce or do at scale, and it's very time consuming. So these are the sort of things that we want to tackle here, to make the data research ready for people, so that they can spend more time analyzing that data rather than doing all of those things that need to be done in the background. So what we're trying to do with this Geosocial work package is to design data retrieval and integration services that can generate data that integrate people, place, time and space. So as I mentioned before, specifically what we're looking to do is to link geospatial statistical data, for example data from the census, to person-centered data from nationally representative longitudinal surveys such as HILDA, which is the Household, Income and Labour Dynamics in Australia survey, or the LSAY survey, which is the Longitudinal Surveys of Australian Youth. But we also aim to accomplish some broader objectives down the track, to really inform policy and build a community of practice and provide training, potentially, and we talked about this one, down the track, to create those opportunities for people to be using this type of data. So when thinking about designing this kind of service, we really need to think about the user and what people need, and design solutions with that user in mind. There are different types of users, though, and it's difficult to build a
single solution that's going to cater to everybody's needs, so we had to prioritize a little bit, particularly for the first phase of this project. So on the one hand we've got more advanced users, who are trained in quantitative social science methods or statistics, so they typically are able to perform data transformations, merge data sets, derive variables and so on, and then analyze the data. But as I mentioned before, these users would still benefit from this sort of easy to use, trustworthy, transparent and reproducible solution, because it saves them time and effort. With that sort of user in mind, you want to design a solution that will allow some flexibility but also provide some information or advice or warning flags and things like that around the more technical things that they need to be aware of. Now, if you were to design a service for less advanced users, so people who might be sitting in, say, policy departments, where I think there's potentially a really broad user base, the service needs to be different for them, because it would require many more features: providing really ready-to-use data, features to prevent mishandling or misinterpretation, and some functionalities around data analysis and so on. So this is a really crucial thing to consider. So when we think about this sort of geosocial data integration, there are really two main scenarios. As I said before, what we're talking about here is integrating some sort of longitudinal survey where you have data on individuals. It could be, say, LSAY, about 11,000 young people at the start, who get tracked over time, over 10 or 11 years. So it starts with the cohort of 15 year olds and tracks people until they're about 25. And so one
scenario is, say, appending some spatially structured data to the LSAY data, so the resulting data set is still like LSAY, it's individual-level data, at person level over time, but it has some characteristics of the areas where people live appended to that. And in LSAY we've got postcodes, and this is the thing that's used for the data integration in this case, not the individual address, because we don't have that information, but we've got people's postcode. The other way to think about it, the other possibility, is aggregating the information from the survey up to an area level, whether it's postcode or SA2 or whatever, and then trying to do some area-level analysis with that kind of data. And there are challenges with pursuing both of those options, but particularly the second one, and I'm going to be talking about these in the next slides. So the first sort of challenge comes from the fact that we're trying to integrate survey data rather than, say, administrative data, like Centrelink data or something that's for the whole population. So as I mentioned before, surveys are typically designed by running some sort of randomized sampling process, and typically that's also done with geographical clustering. So observations might be geographically clustered, and they might be unrepresentative of different geographical units. And yeah, I should have warned you that my presentation is actually going to be rather boring, because it's text heavy and, well, no pictures. I mean, Michael's going to be much better in this regard. But use your imaginations and picture in your mind a map of Australia with over 38,000 collection districts, for example for HILDA, and, you know, try
to imagine that the observations in HILDA only come from 488 of those collection districts. So if you thought about such a map, you'd see that there's just little dots here and there, right? So the coverage can be really, really patchy. And what follows is the sampling and weighting as well. Surveys often come with some sort of weights, but they aren't really designed to allow building reliable estimates from the survey for smaller levels of geography such as SA2s, SA3s or LGAs. They're actually typically designed to allow an estimate at a state level, sometimes broken down by metro and rural, that sort of thing. And the other additional thing with surveys is the sample size, particularly the longitudinal surveys with attrition. I mean, even if you start with something like 11,000 observations in LSAY, by wave 10 it might be about 4,000, and when you start looking at smaller and smaller areas, obviously you run out of observations very quickly, which can cause problems. The other type of challenge is around the concordance between survey and spatial data. So it starts with the availability of spatial units and their identifiers in both data sets. So if we think about even two data sets, ideally you would want to have a situation where the same spatial unit is available in both of those data sets, something like SA2, and you've got an SA2 either number or name, something that enables you to link the data directly. It's not always possible, so sometimes you might need to aggregate the data, say survey data, to create higher-level geographies, so for example postcode to SA3. This is what we're going to be doing with LSAY. However, this is not always
straightforward, and here you can see in this table just an example of a Brisbane postcode where you have the same postcode split across five different SA3s, with percentages ranging from nearly 40, but not even 40 percent, to something that's very, very small. So you need to have ways to handle that sort of situation. There's the issue, which is typical, you know, common across sources of data, of data quality. So we do have, for example, invalid postcodes in LSAY, even though it's a very well done survey. Things like that just happen. Not many of them, but about 50 that don't exist, and you still have those codes in the data, so what do you do with that? The other thing is, obviously, there's additional challenges associated with the fact that we're trying to link data over time as well. The data is also longitudinal data, so things can change over time. This includes changes to classifications, including geographical classifications. So in 2011 we had a change from the ASGC to the ASGS standard, and this is then regularly updated, so we need to be conscious of that. Then there are changes to boundaries of spatial units, particularly with something like postcodes, which seem to change very often. And I think the decision on those changes is actually made by Australia Post, and it isn't very clear, it's not very well documented, how they, for example, communicate with the ABS, who then have their own standardized postal areas that should correspond to those postcodes. But how they do it and how they communicate is not very well documented. So that's another thing. And then decisions need to be made around temporal concordance. So
here in this example you can see there's multiple possibilities, right? So a 2011 or, you know, 2018 postcode could be matched to all sorts of SA3s, SA4s and so on, so ABS units at different points in time. We might need to consider things like, do you want to interpolate between time points if the data is, for example, only available from 2011 and 2016, and you've got LSAY waves. So we are, for example, going to be working in the first instance with 2009 LSAY data, which covers years from 2009 up to 2020. So data from the 2011, 2016 and 2021 censuses are going to be relevant for this time, but it's not even going to capture all of this. So you might need to think about the ways to handle the fact that you've got things that change between censuses as well. The other thing is the set of challenges around data access and governance. So you need to consider access requirements, data sharing and governance issues. You often need to sign a confidentiality agreement or something like that. You need to think about how you download the data, all of those sort of things. And then those requirements depend on the dataset you use, but also on the level of spatial unit for integration. So non-survey area information, for example, might not be publicly available at the smaller levels of geography. Survey data often has additional requirements or restrictions when you want to access that sort of data, so it comes as part of a restricted version, for example. And then there are restrictions associated with reporting outputs from those kinds of analysis. So HILDA's an example: without explicit permission, you can't publish estimates at SA2 level or lower. And, you know, some of this, I mean, comes from kind of
you know, fairly valid concerns around the risk of re-identification and those sort of things, but it can create problems for researchers. And we've actually had a bit of a hiccup with HILDA, because DSS, the Department of Social Services, who is the data custodian, weren't really ready to engage with this. They wouldn't give us permission to use HILDA for the purpose of this project, which is a little bit surprising, and it took us some time to talk to them and then receive that sort of answer. That's why we had to switch to LSAY now. But that's just an example of the kind of things that you need to navigate. So look, Michael, as I said, is going to talk about the technical design for the pilot, but just to flag up some of those things and wrap up the presentation. What we've decided in the first place, given the time, and now the time is even shorter because of the thing with HILDA that I just mentioned, and with the feasibility in mind, is that we want to start building and piloting the service by targeting fairly advanced users. So people who can actually manipulate some data but might need something in the form of editable scripts, something that they will be able to just use and maybe tweak as they need, but it's going to do the job for them. In the first instance, we'll design the service to be run on a local user's environment, so the user actually has to take care of all the issues around access and confidentiality and security. So that's the user's responsibility at the moment. And Michael's going to talk about the connections to other IRISS work packages. We need to build in some, I
mentioned, you know, maybe a system of warning flags or something, so if, for example, the classification changes, people are aware of that, and providing links to other resources. And it needs to be really well documented in that regard. And we're thinking again about developing some user training around this service once it's operational, and then a set of resources for that. But having said all of that, we are thinking already about what comes next. So taking baby steps, but trying to keep the service as flexible as possible to enable adding more data, different types of data, down the track. So yeah, thank you for the invitation again, and thanks to the ARDC, and that's it. Any questions? Okay, well, you gave Michael a plug. But yeah, Michael. I didn't need to come up. When I'm thinking Eleanor Rigby. Yeah, yeah, you've heard that one before. The lonely people, where do they all belong? Got to spark them up for the end of the day. Michael's the last presenter for the day. Michael's a spatial scientist with a background in design and engineering, and data science support team manager at the Australian Urban Research Infrastructure Network, based at the University of Melbourne. Current research interests include multidisciplinary research, understanding users, and geographic information science topics of representation, analysis, visualization and decision making, with a keen eye on cybernetics and the social implications of technology. Michael is passionate about spatio-temporal data analytics and research platform development. His teaching experiences range from spatial visualization and analytics to applications of GIS. And Michael also gets pissed off when people have a go at him about Eleanor Rigby. Sorry about that. You know, I don't know. Thanks very
much. Yes, I have the opportunity and the challenge of being the last presentation of the day, and it's been great coming right after Wojtek, so we'll hold the questions till, I guess, later on. But certainly the design of the whole project is actually quite excellent, so hats off to Steve for the way the IRIS project draws in the different components that are necessary, and I'll get into that in the presentation today. So first off, I too would like to acknowledge the traditional owners of the land on which this event is taking place, which for us here is the Wurundjeri Woi Wurrung, and also pay my respects to the elders past, present and emerging.
So Wojtek's given a great research perspective, looking at the data and the challenges of actually bringing things together, which has been fantastic. We've had a look at, I guess, the current landscape and what's the end result of trying to work with longitudinal social science data and integrate a range of different products together. But really this project is looking at how the challenges of bringing these things together are not just on the data side of things; they're also due to, I guess, the knowledge in terms of understanding what these things are. So we're traversing the whole spectrum from data to information to knowledge, and from a research infrastructure perspective that provides us with a number of different challenges. In my presentation I've got lots of pictures, because I'm the last one, and I'm also going to be glossing over a bit of what Wojtek's done, but also trying to dive deep into sections where needed. So certainly we do have a researcher who's out there, and they have a research question and their method, which they want to use to investigate a phenomenon or whatever it may be. That's up to
them, and we don't really know what that is. And certainly when they look out there at the landscape, the amount of data which is out there right now is vast, and we've obviously got spatial, temporal and a whole bunch of other dimensions which are characterized in these data sets. This presentation is going to center around that particular problem: the researcher, their relationship to their method and the application, really drawing on those threads which Wojtek presented. So I'll initially be taking a step back and looking at, well, what is a research infrastructure, because we are looking at one for the social sciences. We're looking at how we're trying to apply a design thinking approach to the development of IRIS, and the process we went through to engage, document data flows, and ultimately design and develop a solution at the end of the day. But we are very much in the first phase of this project, so we're very much doing a pathfinder project, looking through this space to see where the opportunities are and also where the blockers are.
I won't spend much time on this, but certainly looking at what is a national research infrastructure: obviously, with ARDC here today, and with myself here from the Australian Urban Research Infrastructure Network, which are funded by NCRIS, the National Collaborative Research Infrastructure Strategy, we're out here to provide nationally significant assets, facilities, services and associated expertise to really support the social sciences area and this particular project. And for us, wearing my design thinking hat, it's all about the users that are out there. Here I guess we're talking about human users, but you could also look at machine users if you want to look at that as well: who they are, what they're after, where they are and what's their context. And I really
like this slide from the ARDC, which has recently come out, which is looking at their research data commons and the flows of data, information and knowledge across NRI, and particularly for the research data commons space. For us on IRIS, we're looking at that center box, this integrated bit. So we have data which is out there, it's gone through some sort of process, and this integration which occurs in the middle, that's really the output we're working with. Sometimes it's the input, but really we're in that center space of the whole flow. You'll see FAIR data is core there, and we're really looking at how we can use that core offering to enable things and really accelerate impact in the research space.
So the IRIS project has five main work packages, aside from project management, and I'm just going to touch on these briefly today to contextualize what Geosocial is, this thing which Wojtek introduced. As you see up there, we've got a bunch of different acronyms; there's actually quite a few. The first one there is Vassal, and Vassal is the vocabulary service, which is actually quite another one in the social science space, because there are a number of vocabularies out there, but not enough. For us to actually connect these things up, to understand what these concepts actually are in the social sciences, we need to do a lot of work, and Vassal is trying to stand up a vocabulary service for Australia.
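As a rough illustration of what connecting things up through a vocabulary could look like, here is a minimal sketch. It assumes that each variable in a data set can be mapped to a shared concept identifier; the vocabulary entries, data set names, variable names and helper functions below are all hypothetical and are not taken from the project's actual service.

```python
from typing import Optional, Tuple

# Hypothetical vocabulary: concept identifier -> human-readable label.
VOCABULARY = {
    "concept:employment-status": "Employment status",
    "concept:highest-education": "Highest level of education",
}

# Hypothetical mapping from (data set, variable name) to a concept identifier.
VARIABLE_CONCEPTS = {
    ("survey_a", "emp_stat"): "concept:employment-status",
    ("survey_b", "labour_force"): "concept:employment-status",
    ("survey_b", "educ_lvl"): "concept:highest-education",
}


def concept_for(dataset: str, variable: str) -> Optional[str]:
    """Return the concept identifier for a variable, or None if it is unmapped."""
    return VARIABLE_CONCEPTS.get((dataset, variable))


def same_concept(a: Tuple[str, str], b: Tuple[str, str]) -> bool:
    """True only when both variables are mapped and share one concept."""
    concept_a = concept_for(*a)
    concept_b = concept_for(*b)
    return concept_a is not None and concept_a == concept_b


if __name__ == "__main__":
    pair = (("survey_a", "emp_stat"), ("survey_b", "labour_force"))
    concept = concept_for(*pair[0])
    print(same_concept(*pair), VOCABULARY.get(concept))
```

The point of the sketch is only that two differently named variables can be recognised as the same thing once both point at one shared definition, and that unmapped variables are treated as not comparable rather than guessed at.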
The second one is Geosocial, which Wojtek introduced, and I'll touch on that in a moment. We also have a range of demonstrator projects, which are key to this pathfinding piece of work, because we need to engage with the researchers on the ground to figure out how they use longitudinal social science data and how they perform integration in their work, and how we then support them through the design of things on this project to help them create a pathway towards impact. The last two here, Spy and Cards, relate to two different aspects. One is the survey design: how do we actually go out and design surveys for communities to capture information? And the last one there, Cards, is quite close to IRIS in this particular project, which is looking at the way in which we curate these data sets and make them machine readable.
So it's a quick contextualization slide. Geosocial sits in the middle of all these things. We draw on the vocabulary service, the survey design and also the metadata to build a data integration service, or a capability, to allow researchers to bring data together on people, place, time and space. And as Wojtek mentioned before, we're currently targeting a pilot capability, focusing more at the moment on the demonstrators, but through the interaction with the researchers on this project and outside, trying to build this up to a national capability.
So, design thinking. I've got all that other stuff out of the way. I've got a background in creative design, which, as an engineer, caused me some issues at first in my life: how do I bring these two things together? But I've actually been able to hit this point now in my life of asking how we use creative design thinking approaches, or at least design thinking approaches, to design an infrastructure, and how we think about innovation within the space. And design thinking is a process which
has emerged out of some of the early interaction design groups, asking how we interact, in this particular case, with digital systems to actually do things. In the case of national research infrastructure, and trying to meet the goals of NRI, we want to support leading-edge research and innovation, and getting to that point does require a lot of us to make things usable and useful. This starts off with understanding users and their context, which is one of the first software design 101 principles in terms of how we get things going. We know, obviously, that social science encompasses a really diverse and broad range of domains and disciplines, which is a great thing for us as the IRIS project. On this project we had to get out there and use a range of different techniques to try and understand the research needs, and working alongside the group from ISSR, the Melbourne Institute and ANU, we were able to get a bit of an insight into their needs and what they're actually after in this area: what are their challenges and blockers, which we heard from Wojtek before. We can also draw on a range of different materials, from what code and scripts they have developed back in their offices to do this task in an ad hoc, bespoke way, all the way through to having a look at their research papers, how they've done this before, and what were some of the assumptions they made in their work. Here we're looking at the data capture process of contextual inquiry and contextual analysis, which are very traditional approaches for the design of things. And really what we came to find out was a number of different things. Firstly, all the squiggly lines: data is out there, but it may not be known, it may not be easy to get, and in some cases you have to ping-pong around to actually find what you're after,
and that data availability, the FAIRness we're talking about here in the research infrastructure space, really maps back to the method, what method you are going to choose, and maps back to the question that the researcher is asking. But obviously we're not the supervisor of the researcher who is out there, or the boss, so we have to try and work in the middle here and help them across this space. There's also the knowledge of the researcher, where they're coming from, and the great things they're bringing to this space for what they're wanting to do, because what they have in their mind may not be what we have as an engineer or a social scientist. So we have to try and get our heads around that.
One of the very first things we did on this project was to try and map the data flows, starting with data and also moving up to information and knowledge. This diagram is very simple; it's actually a simplified version. We have a researcher down there in the bottom left-hand corner who has an environment which they want to do some research within, and they need access to these data sets, which are the ones from organizations in the three boxes in the center there. Some of them are fully open: they can go and download the data straight away. For others they have to register with an API or something like that to obtain the data. And on the other hand, for some of them they have to fill out a confidentiality deed poll, sign their life away, get ethics approval and all the rest of it to even start getting their hands on the data. If they want to start integrating these things all across the spectrum, it's actually quite difficult. There are blockers along the way in what they can and can't do. Wojtek mentioned before some challenges around working on spatial aggregation units, in terms of how we're going to, you know,
represent things, and whether we should be doing that as well.
So in terms of data integration, what are we talking about here? We're talking about bringing two or more sets of data together, just bringing them together so they can be used. Whether you're talking about fusing, merging, joining, whatever word you'd like to use, we're just bringing two things together in a way that you can then use them together. In a lot of the ways that's done, whether spatially or temporally or in other dimensions, this joining process may have a number of requirements, which may depend on the research method they've chosen. So if you're trying to join things together and you're using a particular method from data science or from social science, there are things you don't do and there are things that are more encouraged, and part of our user knowledge is trying to get our heads around what that is, so that in the design of our infrastructure we can try to make something which is usable and ultimately useful.
As an example, for people who are familiar with a bit more informatics-type stuff, we're talking about what are called geospatial lifelines, a more sort of crazy concept, basically looking at how people move across space and time. In this particular example we've got two individuals, or two different scenarios. First, on the left, we have an individual living in a particular suburb, and over time, going back to the example from Wojtek, 2011, 2016, 2021, they've stayed in the same location, but the postcode or the ABS unit around them has changed shape. That changing shape has been a result of population change, demographic changes or something like that, so the actual unit we use to describe that location has changed. Further, the tabular representation of the data set may evolve over time as well, so the schemas of these things may change from a table of
say, 100 columns to maybe 120 to 150, and the meaning of things may have changed. We know that in the ABS census, for example, questions have morphed over time, sex going from male and female to something else, and other questions have also morphed and changed. On the right-hand side we have a person who's moving around Australia, so we've got a person who is maybe changing jurisdictions, and the data collected about them also changes. So we're moving around in terms of spatial units, once again postcodes or spatial units, and the schemas which describe things about this person are changing too. Then we've got heterogeneous data emerging here, and all the other challenges Wojtek was looking at. These different scenarios are looking particularly at, maybe, migration patterns, people moving across different parts of the country at different points in time, or at changes in employment and changes in education, and here we can link all this back to some of the longitudinal data sets which are out there.
So how do we actually get into designing an infrastructure, now that we're starting to think about all this stuff? The very first thing we always look at is user needs. We're looking at, okay, coming from a social science perspective, or coming from outside social science, what were the actual blockers or opportunities in that space? Then comes a really important part: how do we take those user needs and start to map them to high-level requirements? Requirements aren't user needs in terms of a system, and particularly when we're looking at things like user experience and usability goals which we're trying to meet for infrastructure, how do we come up with those high-level requirements and then distill them down into requirements which we can prioritize for the design of something? So back in August we ran a workshop with the researchers up at the University
of Queensland and started to think about what would be the requirements for a research infrastructure for the social sciences. We started to look at a number of different things in terms of what would be a minimum viable product: what would be the simplest form of infrastructure we could create for them? Going back to that diagram I had before, it's like, well, we want to provide support to them, we want to support their leading, innovative research, which would require a number of different things. We want to help them around their solution or their method, not design it for them, but help them get to the point of using tools and other assets in the community, and then make connections to these data sets through services to help them do things. Wojtek's already provided some examples there. Some of the bits and pieces we've been looking at is, say, the use of the LSA data set, where, he didn't actually say, there are about 700 variables in an LSA data set every year or wave, and in total there are 7,000 variables throughout the whole of one LSA cohort. Those variables obviously may have differences in concepts and all the rest of it over time, and in how they map to different ABS data sets. So we've got a really large challenge as an infrastructure around how we handle this variability, and then how we help researchers navigate this space.
So our MVP was quite challenging. One of the very first tasks we looked at was: all right, we've got all these different data sets out there, but how do we start to look at this data integration task? You might not be able to see it on the screen, but effectively we're looking at one particular data set and how we break it down into things we call aspects. A particular data set may have various aspects, from what were the things observed out there in the environment, obviously we have a survey and we're observing things,
then we're looking all the way through to the provenance information and other details, moving down to what is the licensing of this data set, what we can actually do with it. So we can look at this initial view around what is this particular data set which a researcher wants to use, and how we then explore through this data set to find opportunities for where they may integrate it with something else. When we started to flesh out these different aspects, we were able to identify those areas which would need to have particular machine-readable vocabularies in them, to help us understand what some of these variables are. So what is this particular thing which is represented in a data set? We can see this as plain text, or we can see it in machine-readable form. And if we define what this is somehow and make it linkable, coming back to some of the stuff that Nick was talking about before, we can start to connect things up with a vocabulary service, we can start to draw in some of the really rich metadata curation work which is needed to embolden or strengthen some of these data sets, and then look at how we bring other data sets together to perform this tricky task of integration.
So Geosocial's focusing on that integration bit at the top. In order to integrate two concepts, we need to know, firstly, what these things are, whether it's defined in metadata or in a vocabulary, and do some degree of what we call reasoning to make sure we can actually connect these things together. That's the tricky bit, and the persisting challenge in some ways. So for us, looking at integration at the start, we came up with a very simple, naive integration script. We wrote a very simple package which we could share with researchers and say, hey, you could join two tables together. You probably do this all the time, but we're going to try and find a standard way for you to do this, with a nice shiny app on the front
with a GUI. Let's give that a go. The integration will maybe just be a simple inner join, using things like correspondence or concordance tables from the ABS, and then if things didn't look like they were going to match, we could have an integration flag to say that you probably shouldn't join these two things together: they're not the same concept, A doesn't equal B and that doesn't equal C. So this integration flag is quite an interesting concept in terms of understanding what this data set is, and going back to the metadata and all the rest of it.
So from a user perspective we had to really explore this space quite a bit, and this is very much the current focus of work right now, to support this whole space of data integration: obviously to provide a lot of documentation and reference material to help researchers perform this task, not just to find the data sets which are out there in the social sciences and what's actually compatible with LSA or HILDA to do certain things, but also links to other data sets which are out there, and training material as well. And finally, obviously, the repository: the actual tool itself is currently sitting in a GitHub repository. As I said, it's a simple, naive version, but we're looking to enhance this, socialize it with the community and find gaps, as well as build a roadmap out for our research infrastructure to make sure that all the pieces of the puzzle are there. So as we put things out there and have discussions with the community, it's all about iterating over what is out there, getting the feedback from the community, and making sure that we're building something which is usable and useful. Without that feedback, what is innovation? Innovation's really tested against a market, and without that market and getting the feedback in, we're just creating something which we think is great in isolation. But if we're trying to create a national research infrastructure, we obviously have to
go through this process, and this is obviously the whole purpose of what we're doing.
So, future work for Geosocial. There are so many different things, and this is something I was trying to pick my brain about with Nick over lunch, thinking about, well, we know in the space of reasoning there are so many different things we should or shouldn't be doing. Firstly, we need to do a lot of engagement with the researchers to figure out the whole space in terms of data and integration, and how that benefit can be passed on to the research community and innovation can be made. How can we strengthen the data curation work, that Cards work package from IRIS? How do we really strengthen and expand the metadata of things so we can do stuff with it at a machine level? How can we infuse more information into it, or at least link out to services which can provide that information? And then how can we expand IRIS's capabilities, not just from a simple couple of scripts in a GitHub repository and a web page? How can we grow this out to encompass things like data enriching, ontological design and a linked data store, as well as facilitating data queries on the cloud? So ultimately we can refine Geosocial, we can make reasoning better and do proper reasoning using an ontological framework to say that these concepts are related in this way, rather than doing things with a much more simplistic approach. And the final thing there is our overriding principle in everything that we do: to make sure things are findable, accessible, interoperable and reusable at every level, not just the data but the software and the virtual environment as well.
So IRIS Geosocial is very much a work in progress. We're making some great gains and working with the other work packages, which you'll see over the next 24 hours, I guess; whether it's at dinner tonight over a chat or in the presentations tomorrow, you'll get to know more about us. So thanks very much for
today, and any questions?
Going once, going twice. Thank you very much, Michael, for your presentation. And you did well being the last one for the day. I don't know about you, but I'm feeling a bit boring. You know what that means. Everyone put your right arm up in the air like this, as far up as you can. Now you're going to reach it back down toward your back and just go one, two, three, give yourself a pat on the back, because you've done an exceptional job today. That was a long day, but a very productive and informative day. To you, no doubt, a lot of it was good, but to me, to be quite honest, and I say that respectfully, you're communicating to a group of people who are going to still think the same thing. So think about how you simplify and communicate, build trust, build understanding, mutual respect and so forth