Hello and welcome to our presentation on the Integrated Research Infrastructure for Social Science in Australia, the proposed IRIS project, part of the ARDC's Humanities, Arts and Social Sciences and Indigenous Research Data Commons. My name is Dr Steven McEachern. I'm the Director of the Australian Data Archive, part of the Centre for Social Research and Methods at the Australian National University, and I'm presenting on behalf of the project partners involved in IRIS. My aim today is to give you an overview of the IRIS project, the foundational issues that we're looking to address with IRIS, how we intend to go about them, and a sense of how you might be involved. We're looking forward to receiving your feedback about the program itself.
So, a short summary for reference. We'll outline the basic problems and drivers behind the IRIS project, the key solutions or enablers that we're looking to use, the approach we intend to take, the work packages that we're looking to undertake as part of this project over the next two years, through until June 2023, and the overall impact we expect to see on both research and policy once IRIS is fully implemented.
So, a quick introduction to the set of problems we're looking to solve. Australia has a number of institutional-level social science facilities that support parts of the overall demand for social science data and infrastructure amongst Australian social science researchers, but that infrastructure is for the most part bespoke, small scale and fractured in its approach. So what we're looking to do is bring together these sets of institutional infrastructure into a larger national framework.
We need to be better integrated and interoperable in order to keep up with the sorts of global research and infrastructure developments that are currently operating internationally, particularly in Europe and the United States, to allow Australian researchers access to the sorts of facilities that their international colleagues are able to take advantage of. Australian social science research represents a significant proportion of the overall research community in Australia, upwards of 20% of the research workforce. So we're looking at large scale in terms of users and, in aggregate, large volumes of output. Australian social science research has generally been well regarded internationally in terms of its research activity, but now we need to continue to develop the research infrastructure alongside that to support the emerging needs of that community.
Australian researchers also rely heavily upon government data sources. The most prominent of these is the Australian Bureau of Statistics, but there are also policy and program departments such as the Department of Social Services, through their longitudinal studies program with data such as HILDA and the Longitudinal Study of Australian Children; the Department of Education, Skills and Employment, which provides major administrative and longitudinal data sets as well; and the Department of Health, through their administrative data sources. These departments are also major investors in data collection programs, both directly and through government research facilities such as the Australian Institute of Health and Welfare. Being able to access, analyse and use these resources is a key activity within the Australian social science community as well, and we need the infrastructure that allows us both to access that content and to integrate it with other sources of data and other analytical tools and facilities to address the research outcomes that we're interested in.
Alongside that, a significant proportion of the available data that we require for the evidence base for social science in Australia has significant ethical considerations associated with it. It often needs to be kept confidential and has various privacy considerations for those who have participated in the research or in the collection of data about them. How do we support those confidentiality and privacy considerations while at the same time enabling data that is useful for the purposes of furthering both research and policy, to better understand the social problems facing Australia? So in order to address some of these concerns, we need to take account of the privacy and confidentiality challenges while at the same time ensuring that suitable research outputs and policy outputs can result from the analysis of these confidential sources in a privacy-preserving manner.
This is the focus, then, of the Integrated Research Infrastructure for Social Science, or IRIS, as we'll call it from here on. IRIS is intended to be developed in two phases, and here, for the purposes of the HASS Research Data Commons, we're focusing on phase one. Phase one covers 2021 to 2023 and is supported by the HASS and Indigenous Research Data Commons program. Under that program, we're really focused on three objectives. Firstly, establishing a coordinated governance and access model for access to data and facilities for the analysis of social science by Australian researchers. Secondly, enhancing research capacity through a stable and long-term environment for the creation, dissemination and use of data, allowing researchers to get data, under suitable access controls, to the places they need it and to work alongside other data sources that allow them to understand the problems they're studying and achieve suitable research outcomes. And finally, enabling a cost-effective and accessible data integration environment through which they can actually bring multiple data sources together.
In phase two, which we expect to occur from June 2023 onwards, as part of the National Research Infrastructure Roadmap currently being undertaken by the Department of Education, Skills and Employment, we'll be looking to expand upon the outcomes of the initial IRIS investment. Firstly, to support a broader range of social science data sources, such as qualitative data and social media data, both of which have had some initial investment in recent research infrastructure programs. Secondly, to support systems and tools for the capture and analysis of real-time or near real-time data sources, such as social media and Internet of Things sources. And thirdly, to align the data collection, integration and analysis requirements with secure data facilities, both physical and virtual, for enabling access to highly sensitive data, and to address those situations where current practices may not be sufficient to address the privacy concerns associated with certain forms of data.
What are we trying to achieve? What are our expected outputs from the IRIS program? There are really four key outputs we're looking to achieve. Firstly, a governance model for future social science infrastructure investments, to allow us to continually bring together established social science infrastructure and services as well as new facilities and content that may come online. Secondly, expanded access to a broader and higher quality range of research and public sector data, progressively moving into related sectors such as commercial data and content from non-government and third sector agencies. Thirdly, supporting enhanced data collections for those high-volume, high-value data collections of national significance, such as the Australian census.
And fourthly, enabling better integration of data from multiple sources across spatial and temporal dimensions, as well as different units of analysis, to be able to move between individual and aggregate data as well as across geography and historical time periods. So the project is intended to enable streamlined management of research data across collections and institutions; improved data collection and processing for researchers, allowing them to focus on analysis and reporting, to move us from the 90% of data munging to the 10% of analysis and output; and then a foundation for enabling sensitive data access and integration into the future, bringing together the integration capabilities in phase one with the sensitive data environments moving forward into phase two.
So how do we intend to go about this? Well, part of our intent with IRIS is certainly to leverage facilities and services that might be shared with the other projects inside the HASS and Indigenous Research Data Commons. The sorts of things we expect should be shared there, and I'll come back to this later in the presentation, are things like vocabulary services, working with the ARDC's Research Vocabularies Australia; data repository services across the multiple projects and collections; providing additional support for Indigenous data collections to support the Indigenous Data Network; leveraging the shared access and governance arrangements for Indigenous collections being established through the Indigenous Data Network; and looking at access and authentication models across HASS collections and related domains, such as health and environmental science. This leverages both a shared program of work within the HASS commons and related activities within the CADRE project (Coordinated Access for Data, Researchers and Environments) and a number of other ARDC and NCRIS investments.
The sorts of impacts we expect to see resulting from this work are, firstly, expanding the ability of social science research to contribute to public policy development, and being able to contribute to the competitiveness of Australia's companies and other sectors, recalling that the humanities, arts and social sciences encompass the business and economics disciplines, which have a significant contribution to make and have strong connections into the private sector in Australia and overseas. Then, expanding and supporting improvements in the quality and quantity of social science research, and the competitiveness of Australia's academic institutions relative to our international collaborators and competitors in the research sector. Providing a more productive research environment for social science researchers: less time wrangling, more time analysing, more time writing, reducing the time to publication for research. And lastly, enabling the coordination of data investments across the social sciences in academia and government, and progressively into the private and third sector communities.
In terms of our partners, contributors to the IRIS project include the University of Melbourne, the University of Queensland, the Australian National University, where the Australian Data Archive is based, and the Australian Urban Research Infrastructure Network, along with the major partner, the Australian Research Data Commons, and we do expect more partners to come as we develop this work program in the weeks and months to come.
We'll very much emphasise reuse in the development of our project. Key to the development of IRIS will be the reuse and extension of existing infrastructure, both in the social sciences and in the research environment in Australia and internationally. This includes support for things like standards-based systems throughout the research life cycle. There's heavy investment in both data and metadata standards in the social sciences in Australia and internationally.
We intend to make optimal use of those to support both the curation and integration processes that are core to the IRIS facilities. There'll be a focus on reusable FAIR data outputs, so archival storage of and access to any outputs that result from the IRIS system form part and parcel of our implementation. We'll certainly be working towards integration with both national and international infrastructure in social science and e-research, with examples of some of the potential collaborations listed here. Making use of standard tools at the levels of infrastructure, data collection and curation allows us to extend both our tools and our facilities to leverage new developments within our partner organisations as well. These include certain analysis tools common in the social sciences, standardised data collection tools that are often embedded in or subscribed to by institutions across Australia, standard software development languages and environments, and standardised systems for deployment of research infrastructure, such as Docker, Rocky and the like.
So that's an overview of what we're trying to achieve using IRIS. What about how we implement it? I'm going to turn now to an overview of some of the work that we intend to do within IRIS, but I want to start with a framing model for thinking about how IRIS supports research activity in the social sciences. To facilitate that, we're leveraging what is fundamentally a research process model to understand where each of the work packages of IRIS fits into research activity in the social sciences.
This leverages some standard research process models from social science researchers such as Alan Bryman, and common infrastructure frameworks such as that of the Western Australian Biodiversity Science Institute, a model developed for biodiversity in Australia. It also ties that to an understanding of the technical considerations aligned with different stages in the research process, picking up on recent work from colleagues at the University of Queensland implementing a technical solution for research problems at that university. So here we have fundamentally an overview of a fairly standardised research process. Not every project will go through every one of these activities, but most projects will deal with one or more of them. And given the emphasis in our work on integration, our key here is to be looking to integrate across multiple activities, both within each of these stages and potentially across different stages of the life cycle where possible.
So what are the work packages we intend to undertake? I'll come back to work package one in a moment, but let's start with the work packages that are outlined in the project plan. Two of our major investments in the IRIS project are aligned with access to standards and practices and with data integration activities. So work package two, which we've termed VASIL, the vocabularies access service for social sciences in Australia, is focused on a core service for the creation, dissemination and reuse of classifications and vocabularies in Australian social science.
We're looking to support the creation, storage and reuse of data and metadata standards that are in common use in a lot of Australian social science research: things like statistical classifications such as the ANZSCO occupational classification or the SNOMED library, standard sets of question responses and survey response formats, and even simple items like Likert scales. These allow us both to streamline the process of data creation, by using standard descriptions of content, and to facilitate the integration of data once it's created, by recognising the common content within the new data collections that are created. So where we might use the ANZSCO occupational classification in data set one, if we can recognise that that standard has been used in that collection and in another, we have a potential point of integration for the two data sources.
So this becomes a foundation, then, for work package three, the GeoSocial work package. Under GeoSocial, what we're looking to do is establish a search, retrieval and integration environment to create new data products that integrate data across people, place, time and space, and to collaborate with both research and government data custodians and prospective end users to integrate significant national data holdings. The examples we're thinking of are leveraging the longitudinal studies from the Department of Social Services and the census data and other spatially enabled data from the Australian Bureau of Statistics, working with the University of Queensland and the Australian Urban Research Infrastructure Network to bring together spatial and personal content into a common integrated data package.
Work package five, which we've titled SPIRE, the Survey Production Integrated Research Environment, is intended as a means of facilitating the process of data creation, particularly for the workhorse data collection activity of the social sciences, social surveys.
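
As a rough illustration of the integration point described for work package two, where two collections are recognised as using the same classification, the following Python sketch links two small collections on a shared occupation code. The records, field names, code values and both helper functions are hypothetical and are not part of IRIS or any of its work packages; this is a minimal sketch under those assumptions, not the project's implementation.

```python
from collections import defaultdict

# Hypothetical collection metadata declaring which standard classifications are used.
meta_a = {"name": "survey_a", "classifications": ["ANZSCO", "LIKERT_5"]}
meta_b = {"name": "survey_b", "classifications": ["ANZSCO"]}

# Hypothetical records; the occupation codes are used purely as join keys.
survey_a = [
    {"respondent": "a1", "anzsco": "2544", "income": 82000},
    {"respondent": "a2", "anzsco": "2411", "income": 70000},
]
survey_b = [
    {"person": "b1", "anzsco": "2544", "hours_worked": 38},
    {"person": "b2", "anzsco": "3513", "hours_worked": 45},
]


def shared_classifications(left_meta, right_meta):
    """Return the classification names declared by both collections."""
    return sorted(set(left_meta["classifications"]) & set(right_meta["classifications"]))


def link_on_code(left, right, key):
    """Pair every left record with every right record that has the same code."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    linked = []
    for row in left:
        for match in index.get(row[key], []):
            linked.append({**row, **match})
    return linked


common = shared_classifications(meta_a, meta_b)
linked = link_on_code(survey_a, survey_b, "anzsco") if "ANZSCO" in common else []
```

The design point is that the join is only attempted once the metadata shows both collections declare the same standard, which is the role a shared vocabulary service would play.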
So SPIRE is intended to support end-to-end processing and metadata support for survey data collection and archiving. It will be able to draw upon standard web data collection facilities such as Qualtrics and LimeSurvey, and potentially other environments that are already web enabled but often are not aligned with the storage and archival preservation environments that we need in order to maximise the reuse of such data. So we're looking here at supporting survey data collection using vocabulary and question bank services based upon the VASIL work package described earlier; being able to harmonise archived content with existing classifications and vocabularies, picking up on both VASIL and the GeoSocial package; and then enabling throughput of newly generated survey data into data processing environments, both on the researcher's desktop and in cloud-based facilities or national data processing facilities such as CloudStor.
Sitting alongside this work package is CARDS, the curation and research development environment for the social sciences, where what we're really looking to develop is a standardised program library and training packages for the management of social science research data. So what CARDS is looking to achieve is establishing standardised processing tools, initially based in R but also in other common statistical tools such as Stata and SPSS, to streamline the process of data management, data cleaning and data quality control, and to facilitate data integration and data processing. These would be accessible through shared statistical software libraries, such as R packages, for use in day-to-day data management. So it's a practical set of tools that researchers can use on their desktop or in a remote desktop environment, and that allows you to connect your content with external facilities where possible.
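
One kind of data quality check that a shared processing library could standardise is validating survey responses against a controlled vocabulary. The talk describes these tools as initially R-based; the sketch below uses Python purely for illustration. The five-point scale labels, the variable name `q1` and the `check_responses` helper are all hypothetical assumptions, not part of CARDS or VASIL.

```python
# Hypothetical five-point Likert vocabulary, as a vocabulary service might supply it.
LIKERT_5 = {
    1: "Strongly disagree",
    2: "Disagree",
    3: "Neither agree nor disagree",
    4: "Agree",
    5: "Strongly agree",
}


def check_responses(rows, variable, vocabulary):
    """Split rows into those whose value is in the vocabulary and those to review."""
    valid, flagged = [], []
    for row in rows:
        if row.get(variable) in vocabulary:
            valid.append(row)
        else:
            flagged.append(row)
    return valid, flagged


# Hypothetical responses: one valid code, one out-of-range code, one missing value.
responses = [
    {"id": "r1", "q1": 3},
    {"id": "r2", "q1": 7},
    {"id": "r3", "q1": None},
]

valid, flagged = check_responses(responses, "q1", LIKERT_5)
flagged_ids = [row["id"] for row in flagged]
```

Flagged rows are returned rather than dropped, so that a researcher can decide whether an out-of-range code is an entry error or an undocumented category.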
So SPIRE and CARDS will work together to provide the tools for processing and an environment for moving content around effectively between the different stages of the collection process.
Having described the core implementation, software development and infrastructure development activities within IRIS, we then have two work packages to support the external use of the data and services and to coordinate activities across the program. Work package four is a series of demonstrator projects intended to establish and illustrate the integrated capabilities of the different services we're establishing in IRIS. These have been aligned to project use cases, which are detailed in the project plan, and include a spatial data analysis demonstrator run by the University of Queensland, a sensitive data analysis demonstrator through the University of Melbourne, and an Australian census digital collection demonstrator being run through the Australian National University.
Alongside this we have a work package on project coordination across the different work packages, to facilitate their effective integration into a cohesive whole aligned with the integrative intent of IRIS. This includes project management and coordination activities, such as strategic directions, operational project management and external communications, and technical management and integration, covering project-wide technical architecture design and implementation, front-end interface development and the development of a web presence for IRIS going forward.
So, to highlight where each of the work packages fits into the overall program, here we can see where each of them is intended to fit within our overall IRIS research process.
So a number of packages have applications in multiple stages of the work program, and others are really focused on collection activities or curation activities and the like, but each of them is intended to move between at least one or more activities within the overall research program. In addition to these, we can think about some of our existing infrastructure and how it might align. Projects like the ANZ Lead political science national data collection and the CADRE platforms project, being supported by the Australian Research Data Commons, provide related facilities for data support and data creation as well as coordinated data access.
Across the program we're also considering potential integration activities with the other HASS and Indigenous projects, the other three that are part of this overall program. These include questions around access, authentication and governance; possible shared consultation activities; expectations around computation, such as for high-performance computing, graphics processing units and artificial intelligence; and whether we might have shared skills and training infrastructure, picking up on some of the work that's been done within the skills group at the Australian Research Data Commons. And then, broadly, looking at longer-term roadmap requirements: firstly, a HASS research data commons roadmap, looking forward, to feed into the short-term National Research Infrastructure Roadmap needs currently underway but also to set a framework for making further investments into the future, and alongside this, legal requirements for things like data management and infrastructure going forward.
We're also proposing a governance or steering committee drawn from project partners, the social science research community, and government and non-government organisations, to recognise and support the sorts of communities that we expect IRIS to be able to contribute to. An overview of the proposed breakdown of the committee is here, and we'll have an independent chair drawn from the non-partner organisations to lead the steering committee. We bring together a large number of collaborators, and we're continuing to expand the set of groups we expect to be involved across the research sector, government, the discipline representatives and the university partners. And we provide a sustainable program for continuing this infrastructure into the future, both through long-term facilities such as the ADA, which is now 40 years old and based at ANU, and the NCI, the National Computational Infrastructure, which we operate, and through the investments of our project partners, by leveraging reusable open-source infrastructure components and really aiming to align, as outlined in our phase one and phase two program proposal, with longer-term national roadmap and government data agendas such as that of the Office of the National Data Commissioner, coming from the Department of the Prime Minister and Cabinet.
So that's an overview of the program overall. We hope that has been informative for you and we look forward to your feedback. I've provided my contact details here if you're interested in knowing more, along with those of the Australian Data Archive, and we're coordinating any queries you might have on behalf of the IRIS project partners. So we look forward to hearing your responses and working with you into the future. Thank you.