Okay, great. Perhaps we'll start here. Thanks, everyone, for joining us today at this meeting of the Australian Sensitive Data Interest Group. This is an interest group that's co-facilitated by the ARDC, the Australian Data Archive, PHRN and MDAP. So let's move on. I'd like to begin by acknowledging and celebrating the First Australians on whose traditional lands we're meeting today. As I was saying, I'm in Canberra, so that's the Ngunnawal and Ngambri people, and I'd like to pay my respects to their elders past, present and emerging. This is a very cold land at the moment, but it's beautiful land to be on. The sky is blue, the light is bright and clear, and it's a great privilege to be joining you from this land. As you may have noticed, the meeting is currently being recorded so that people who can't join us today can find it on our YouTube channel. So for that reason, if you don't want to be captured by the recording, I would recommend that you turn off your camera and microphone, and that way you won't show up in it at all. I would ask that everyone here turn off their microphone anyway during the presentation, just so that we don't get any audio interruptions. But if you don't mind accidentally showing up in the recording occasionally, it would be great to leave your camera on, so that our presenters can see a couple of friendly faces. A couple of bits of admin before we get started. For those of you who aren't already on it, we have a mailing list for this interest group. That mailing list is where we advertise meetings that are coming up and share news, but you're also able to post to the mailing list, so you can ask questions. We had a bit of discussion a little while ago about the way that data relating to children is classified in different universities, which was quite interesting, and I would love for us to have more discussions like that.
So if you want to sign up to the mailing list, I will post a link in the chat once I've stopped sharing my screen so that you can go there and sign up. I will also be posting a link to our collaborative notes document. That document will be open to edit during this meeting; we put a space in there to take notes, and I encourage everyone to take notes. Useful links that have been posted in the chat, and things like that, will go in there too. We also have the recordings and slides for all of our previous meetings, and the recording and slides for this meeting will be there as well as soon as they're available. So it's a useful document. That's enough from me. I would like to introduce our speakers today from the ARDC-supported Monash SeRP project. I'm really looking forward to hearing a bit more about secure platforms for sensitive data. In particular, I'm quite interested to hear about the range of different kinds of projects that can benefit from using that kind of platform, and, while I understand it in theory, a little more about what in practice is involved in getting a project up on one of these platforms. So I'm really grateful to Vibeke, Matt and Juliana for presenting to us today, and I will stop sharing my screen so that they can present to you. Thank you, Nicola, and good afternoon, everyone, and thank you for joining us today. We hope you enjoy our presentation while you have your lunch. We are here to talk about the Scalable Governance, Control and Management of FAIR Sensitive Research Data project, better known as the Secure eResearch Platform, SeRP. Matt, if you move to the next slide, thank you. A quick agenda of the main items we're going to take you through. Thanks, Matt. So we're going to take you through the need for the project, the project itself, and the lessons that we learned throughout.
The SeRP project is co-funded by the ARDC and is a national collaboration across multiple institutions. We deliver a secure, trusted and scalable environment providing data governance, control and management services for data custodians, but we also provide a secure remote data analysis environment for researchers. The primary objective of the project is to lower the barriers to making sensitive data FAIR: findable, accessible, interoperable and reusable. We do this by bringing together technologies, processes and controls to build trust between data custodians, researchers and collaborators, of course. We aim to deploy and run trusted, proven technology, the SeRP software stack, managed nationally as a consistent service. We will achieve this by deploying multi-tenancy arrangements to enable research collaborations across jurisdictional boundaries. There was a slight change of scope last year to expand to a national platform, and that's KeyPoint, which was developed to service the University of Queensland. We will talk a little more about KeyPoint in a minute. Our second aim is to onboard research exemplar projects onto the SeRP service and integrate SeRP across specific research applications. This will help us to demonstrate the capabilities and the benefits of using the platform. And our third aim is to establish three communities of practice, for data custodians, users and infrastructure, and we want these communities to enable training, knowledge sharing, development, and the dissemination of best practices and principles related to the use of the SeRP service. Matt, thank you. Now, the need for the project, for the service. Data custodians are responsible for managing sensitive data. They require a robust and secure environment to ensure proper data governance and control, and we've identified several drivers behind these needs for data custodians. First, there is a lack of control.
Data custodians must have mechanisms in place to remain in control of their data and prevent unauthorised release or loss. Ensuring that data is used only for authorised purposes is also a significant concern for data custodians. We also identified the need to enforce strict access controls to prevent inappropriate use or sharing of sensitive data. There are also several factors that influence the need to improve sensitive data collaboration. As data protection regulations become more strict, data custodians and researchers need to adapt to ensure compliance and privacy protection. There is also the concern of organisational reputational risk. Nowadays, organisations really recognise the potential reputational damage if sensitive data is mishandled or misused, leading to the reinforcement of secure collaboration practices. So this is very important. With the advancement of data-driven projects, there is a growing demand for data to be FAIR, and collaboration between data custodians and researchers is crucial for achieving these goals. Thank you, Matt. By understanding the drivers behind the needs of both data custodians and researchers, we can work towards bridging the gap, which is crucial for advancing knowledge while also ensuring data security and compliance. To meet these needs, Monash established a research collaboration with Swansea University in the UK to adapt and deploy an instance of the SeRP service on the ARDC's Nectar Research Cloud. SeRP has the potential to be scaled nationally to enable Australian institutions to manage their own data while also supporting effective cross-institutional collaborations. Thank you, Matt. We offer a user-friendly interface with a single, easy-to-understand and navigable user experience, where users can efficiently manage their projects without any hassle. SeRP allows users to manage and provide data quality information right at the point of upload.
Users can also easily assess the quality of their datasets, enabling them to trust and utilise them with confidence. We offer a full suite of analytical tools, allowing users to work with the products they're most familiar with, whether statistical software, programming languages or visualisation tools. Our platform will support seamless integration for a smooth analytical workflow. And we know and understand that data comes in various forms and formats, so our platform also accommodates structured and unstructured data, providing the flexibility to store and analyse diverse datasets. SeRP provides secure remote access, enabling users to work from anywhere in the world, regardless of location, which is very important these days as well. Thank you, Matt. I briefly mentioned that KeyPoint is an outcome of the project that was added to attend to the needs of the University of Queensland. It is a data analysis platform developed by QCIF, and it has been co-designed to meet the diverse requirements of researchers at UQ, providing them with all the necessary infrastructure, software systems and analytical tools to conduct their data analysis and address their research questions. KeyPoint is still in version one; it was released at the end of last month but is still in testing, just to ensure optimal performance and reliability, so it will be ready very soon. Thank you. I think that's all from me. Yes, thank you, Matt. I will now hand over to Vibeke from UNSW, partners in the project, and Vibeke will take you through their onboarded exemplar projects and the challenges they've faced. And that's it. Thank you. Thank you. Thanks, Juliana. So yes, I'm Vibeke Catts. I'm the research manager at the Centre for Healthy Brain Ageing at UNSW. And by way of background, our centre collects data from a number of longitudinal cohort studies of older adults. And we have a long history of sharing this data with... Sorry, I got distracted.
We have a history of sharing this data both with people within our team, but also with researchers from around the world. One of the things that CHeBA does as well is lead a consortium called COSMIC, which has data partners on all six populated continents, and the sharing of that data is quite complex. So based on all this, the previous data manager at CHeBA, Kristan Kang, established a partnership with Dementias Platform UK, which uses the Swansea SeRP for its data sharing. And DPUK has been extremely generous in sharing all their know-how and templates with us, which has been very helpful in establishing DPAU. At this point, Dementias Platform Australia is servicing only studies that are part of COSMIC, because that's what we've got NIH funding for. It's also been very, very helpful that we've got the ARDC funding as part of the SeRP project at Monash. It's taken a little while to get the tenancy agreement in place. That's definitely one of the learnings and challenges: getting legal sign-off on agreements is a time-consuming process. But in any case, we've gotten there. So the platform aims to enable lots of dementia research, which of course is a big global health challenge and a leading cause of death in Australia. And of course it's going to increase as our population ages, because age is the biggest risk factor for dementia. And I guess we know from our own experience with our own dataset how important it is to have a secure way of exchanging data. I feel like when you send your data to a researcher somewhere, you can never be quite certain that they're going to do the right things with it. Will they store it appropriately? Will they not share it with other people, et cetera? And the other beautiful thing about SeRP for us is that multiple institutions can contribute datasets, and multiple institutions and researchers can access them from anywhere around the globe. Matt, could you advance the slide, please?
Okay. So DPAU is a data platform to enable scientists to discover and access data from what we call contributing research studies, which are the data custodians, and in that way to allow secondary data analysis to aid in the prevention, diagnosis and treatment of dementia and age-related diseases. So maybe I'll just talk a little bit about the types of data that people collect. There are questionnaire-style data about medical history, physical and mental health status, medication use, family history, lifestyle and prior trauma. Some of this data is definitely quite sensitive, and we don't want people to be re-identified. At CHeBA at least, and in many other studies of this kind, we do medical exams and neuropsychological exams to examine people's cognition, memory and thinking, and IQ, as well as mental health assessments; again, highly sensitive. We collect imaging data, so MRI scans and PET scans of brains, and these scans may include identifying facial features. There are ways to remove those features, but then you're no longer dealing with the raw data, and you may be removing some opportunities for analysis in the future. We collect genomics data, including whole genome sequencing, and I'm sure this audience knows that these days that can actually be quite identifying, with the advent of 23andMe and Ancestry.com, and crimes being solved by people's DNA sequence because it can identify close relatives, et cetera. In this way, de-identified data is actually potentially quite re-identifiable. We also collect digital data, such as recordings of voice, and there's a move within dementia research to collect more digital data, such as geospatial data on a participant's residence and their movement in the local area, and we also know that this sort of data, now with the use of AI and other technologies, can be potentially re-identifiable.
So these types of data are sensitive and may allow a person to be re-identified now or in the future, and I think this is where SeRP really comes into its own, because we have good control over where the data sits and over the potential export of any data or identifying information, and it just really means that all the data custodians can sleep a little better at night. Next slide please, Matt. So the DPAU platform consists of more than just the data warehouse on the SeRP. We have a public-facing information website, and on that website you can see a study directory with some big-picture metadata on the studies contributing to DPAU. We have a matrix that shows users the studies and their sample sizes, but also what types of data each study has collected. Do they have a medical exam? Do they have MRI scans? Do they have genetics data? So it's a big picture, but it can give you an idea that there may be four or five studies in the matrix that could actually help you address your research question. Then we have a data explorer, which is behind a login, but anyone with an educational email address can easily get a login, and the explorer provides access to de-identified but individual-level data for a subset of about 30 variables: things like age, smoking, and whether the person has a dementia diagnosis or a particular genotype called APOE that confers a high predisposition to dementia. And then we also have an online data application form. So we actually provide some level of governance, or assist the contributing research studies with their governance. Any study that deposits data with us has to sign a data deposit agreement, and any project that results from a successful data application has to sign a data access agreement.
Then the contributing research study data are stored on Monash SeRP, and provided in a separate folder to each approved DPAU project, in accordance with what the applicant has ticked on their data application form. DPAU projects can also import additional data onto the SeRP. Say we don't have all the CHeBA studies on SeRP at the moment: someone might want to combine something that is on SeRP with a dataset they already hold elsewhere. They can import that, so they can do a mega-analysis and actually run the analysis across the datasets. And any results or graphs or anything people want to use in papers or manuscripts, the export of those results is monitored by DPAU staff, so we have some oversight to ensure people are not just exporting the whole dataset or anything inappropriate like that. Next slide please, Matt. With this slide I want to try and demonstrate how DPAU adheres both to FAIR and to the Five Safes. So FAIR, findable, accessible, interoperable and reusable, is in the salmon-orange coloured text below, and the Five Safes are in teal: safe projects, people, data, settings and outputs. In terms of making the data findable, we have the study directory and matrix, which are public-facing, and then we have the data explorer. As it shows on the left, when you go into the data explorer you can select what type of data you are interested in. Say I'm interested in alcohol: I only want people who have data on alcohol, whether they never used it, currently use it, or used it in the past, and I don't want anyone who has only missing data on alcohol. I could then do a similar thing for smoking, or for genotypes if I'm only interested in people who have the APOE ε4 risk genotype.
I can then filter my data on these types of variables, and also at a study level I can say I'm only interested in studies that actually collected MRI data, and then you'll get some high-level output of how many people may fit your criteria and what sort of sample size you could potentially garner by collating data across multiple studies. You can also filter on, say, studies that were conducted in Southeast Asia or in low-income countries. So there are quite a few different things, both at a study level and at an individual participant level, that you can filter your data on. The next panel across, with the theme, domain, family and variable, refers to a data ontology that our collaborators at DPUK have been developing, called the C-Surv data ontology, and this helps to make the data interoperable. If you want to combine data across multiple datasets, you will find that people code sex in many different ways, and if you then want to combine it and do the analysis, it's tedious. So C-Surv has a standardised way of categorising data into themes, and then you go down into it. A theme could be medical exam data, or medical history data, and then at the domain level you're breaking it down into further categories until you get to the variable, and the naming of the variable follows a structured convention. This of course helps make the data interoperable, but it also helps us to keep the data safe while allowing the data explorer to actually do its work. If we standardise a small number of variables across studies, it allows people to do a little bit of exploration before they submit their data request, to see that their analysis is going to be feasible, without revealing the individual data points to them.
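The pattern being described, recoding study-specific values onto a shared standard so the explorer can answer feasibility queries with counts only, can be sketched roughly as follows. The variable names, codings and toy studies here are illustrative assumptions, not the actual C-Surv scheme:

```python
# Sketch: harmonise differently-coded 'sex' variables from two studies,
# then answer an explorer-style query with aggregate counts only.
# Codings and variable names are illustrative, not the real C-Surv ontology.

def harmonise_sex(value, study):
    """Map a study-specific sex code onto a shared standard label."""
    mappings = {
        "study_a": {1: "male", 2: "female"},          # numeric 1/2 coding
        "study_b": {"M": "male", "F": "female"},      # letter coding
    }
    return mappings[study].get(value, "missing")

def feasibility_count(records, **criteria):
    """Return only a participant count, never the underlying rows."""
    return sum(
        all(rec.get(k) == v for k, v in criteria.items())
        for rec in records
    )

study_a = [{"sex": harmonise_sex(s, "study_a"), "mri": m}
           for s, m in [(1, True), (2, True), (2, False)]]
study_b = [{"sex": harmonise_sex(s, "study_b"), "mri": m}
           for s, m in [("F", True), ("M", True)]]

pooled = study_a + study_b
print(feasibility_count(pooled, sex="female", mri=True))  # 2
```

The point of the sketch is that once the codings agree, counting across studies is trivial, and the explorer can report the count while the row-level data never leaves the platform.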
The next panel, I guess, is around the application form and the governance that the website and DPAU support. It makes the data accessible and reusable, but in a safe way. In terms of keeping the project safe, people have to apply and actually describe what they're going to do, and it also means you can do some vetting of the people who are going to access the data. And with access via the SeRP, you're ensuring that only the people who are approved to access the data are actually accessing it. Of course, sharing your SeRP password is a big no-no, and that's guarded against by multi-factor authentication. And then we get to the SeRP itself: it makes the data accessible and reusable by anyone with the right credentials, anywhere across the globe. It keeps the data safe; you don't have to worry about someone doing reverse engineering of the data to identify participants, for whatever weird reason people might have to do that. It's a very safe setting. And of course, with the monitoring of the output, the output is safe: we have an idea of what people are exporting from the SeRP, and we can deny output requests if we don't feel they're appropriate. So the SeRP provides the data storage, multi-factor authentication for access, and monitored export of results.
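The output monitoring just described is essentially statistical disclosure control. A minimal sketch of one common rule, suppressing any frequency cell below a minimum count, might look like this; the threshold of 5 and the table format are assumptions for illustration, not DPAU's actual review policy:

```python
# Sketch of a minimum-cell-size disclosure check for export requests.
# The threshold of 5 is a common convention, assumed here, not DPAU policy.
MIN_CELL_SIZE = 5

def review_export(table):
    """Flag frequency cells small enough to risk re-identification.

    `table` maps a category label to a participant count. Returns the
    labels that block release (empty list = nothing flagged). Empty
    cells (count 0) are allowed in this simplified sketch.
    """
    return [label for label, n in table.items() if 0 < n < MIN_CELL_SIZE]

requested = {"dementia, APOE e4 carrier": 3, "dementia, non-carrier": 41}
flagged = review_export(requested)
print("deny" if flagged else "release", flagged)
```

Here the cell of 3 carriers would be flagged and the export request denied or sent back for aggregation; real reviews also consider dominance, differencing against earlier exports, and other risks beyond raw cell size.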
Next slide please. So, some of the challenges and lessons: oh, it needs a lot of money, and it needs a lot of time, and it's slow. Our team received NIH funding to establish DPAU for sharing COSMIC member studies' data, which, as I said, covers six continents. COSMIC has 58 member studies, and we approached them all from 2022 onwards, and the pie chart on your right shows the engagement we've had. A small percentage, starting at 12 o'clock, have declined. A number of studies have said they're unable to engage; the main reason is that the data custodians have limited resources to curate the data, even to the basic level that we require, and to dig out their metadata and datasets and actually provide them to us in an interpretable way. We are offering to assist them with the application of the C-Surv ontology, but even just getting everything together, and having the energy and momentum to actually get the data delivered to DPAU, is a barrier for some studies. We have a handful of studies where the principal investigator is reviewing the process and making up their mind about whether they can engage with it. Quite a number of studies are sitting with the contracts and legal officers of their institution awaiting review, and as I said, this is quite time-consuming; it can take months before it gets pushed through. And then we have a small handful of studies that are signed up. We've also offered, because we realise it can be a barrier, that some studies provide only their metadata. At this point in time they won't be providing the data to sit on the SeRP, but they are providing metadata, and in some instances also sufficient data to populate the explorer, and at least in that way it makes their data findable. It may be that these studies already have a different application process in place, and they don't want to complicate it by adding DPAU on top of that. So people can find the data, and they can apply directly
to the data custodian, and then if they're approved and they want to combine it with other data on the SeRP, they can import it. I guess one of the things that we are really grateful for is that the ARDC funding really provided the necessary focus to establish a relationship with Monash SeRP and to facilitate the local learning here at CHeBA. I'm definitely not a techie person, I think everybody in the project knows that now, and we don't really have the resources within our team to establish anything like this by ourselves. I think our relationship with Monash has actually made the SeRP onboarding the smallest challenge for DPAU, so we're super grateful for that. Next slide please, Matt. At this point in time we've focused on what I call flat-file data, but there is also a need at some point to expand that to onboard imaging and genetics data. I know our colleagues at DPUK have already managed this in some way, and we are hopeful that once we get more grant funding, we will have the resources to expand that to DPAU as well. The other challenge, I guess, is that we know harmonisation methods need more effort and thought to optimise. Even if you do a neuropsychological assessment, people don't always use the same instruments; they may have different ways of measuring what we call processing speed, or executive function, or different types of cognition, and so there are different ways of harmonising that, and this definitely needs more effort. There is great enthusiasm for federated analyses, and of course DPAU complements Dementias Platform UK, who have onboarded 51 studies, very closely, and also the Alzheimer's Disease Data Initiative. They have a slightly different setup called the AD Workbench, but they are also applying the C-Surv ontology to the 48 studies that they have onboarded. So the ontology really opens up the door to doing federated analyses in a more efficient way.
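The federated pattern in question, where the analysis script travels but individual-level data stays put, can be illustrated with a toy pooled mean in which each site shares only its sample size and sum. This is a deliberate simplification, real federated pipelines handle far more than means, and the site data below are made up:

```python
# Toy federated analysis: each site computes local aggregates on its own
# data; only those aggregates cross the network, never individual records.
# The site values below are invented for illustration.

def local_aggregate(values):
    """What a site would return: count and sum, no raw data."""
    return len(values), sum(values)

def pooled_mean(aggregates):
    """Combine per-site (count, sum) pairs into one overall mean."""
    total_n = sum(n for n, _ in aggregates)
    total_sum = sum(s for _, s in aggregates)
    return total_sum / total_n

site_uk = [70.1, 68.4, 72.0]      # e.g. ages held at one DPUK site
site_au = [74.2, 69.3]            # e.g. ages held at DPAU

result = pooled_mean([local_aggregate(site_uk), local_aggregate(site_au)])
print(round(result, 2))  # 70.8
```

The design point is that each platform runs the same vetted script against its own holdings and only the summary statistics are exchanged, which is why a shared ontology matters: the script can only travel if the variables mean the same thing at every site.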
We are talking both to ADDI and to DPUK about developing interoperability across platform architectures to allow federated analyses across datasets located in multiple regions. Our DPAU manager is currently involved in a federated analysis with the people at DPUK, where they've got the variables and data dictionary, they're writing the script in the UK, and the script is run on our data here. But even just doing that is actually quite complicated and time-consuming. And that's it for me. I'm going to hand back, I think, to Matt. Thank you, Vibeke and Juliana, and good afternoon, everyone. I'm Matt, Senior Project Officer at Monash University, and I manage many of the deployments to Monash SeRP. I will talk to some of the interesting technical challenges we have encountered in this project. Firstly, XNAT. XNAT is fundamental to one of our exemplar projects and is an open-source application used to receive and store medical images from hospitals. A review and testing process was completed to ensure the application can be controlled in a sufficient manner and can be accessed by approved users only. Once this was completed, a script was developed and made available to projects to allow the download of images from XNAT directly to SeRP storage. Risks were identified, and in consultation with project stakeholders, governance restrictions were put in place. For example, when a project is configured to access XNAT from SeRP, all members who have access to both the SeRP project and the XNAT project have the ability to make downloads. So as part of the risk mitigation for this, access to the XNAT repository was restricted to only a very limited number of approved users, and the xnatutils application used to download data to SeRP was only permitted for a very specific and small window while the data custodians set up the project.
We also had issues using the most recent version of the xnatutils tools. We had to revert to an older 2020 version of the application to get this to work, due to a bug that had been introduced in the most recent version of the software. Issues like this add to the ongoing maintenance of projects required to ensure the service is maintained to a satisfactory level. Next, Active Directory. At its core, SeRP integrates with the Monash University Active Directory for provisioning projects and managing memberships and their corresponding roles and permissions. SeRP requires full delegated access to the Active Directory systems to manage these aspects, and as you can imagine, we aren't able to do that fully, due to restrictions within Monash University on allowing access to Active Directory. As a result, we have developed a custom integration tool for Monash SeRP in consultation with Swansea. This allows us to communicate with the Active Directory without having direct control. The restriction this currently puts on us is that all users must have a Monash ID created to utilise the platform. This means every user from an external institution must have a Monash email address for access, along with matching multi-factor authentication. This is still a work in progress; we are always looking to improve it, and we always look at new opportunities and new technologies that could help with this situation. Hardware has been a real challenge, especially since the COVID pandemic amplified the situation. This has led to increased lead times and scarcity of some components, especially with the current demand for GPU-capable hardware. Costs are also on the rise in this area and need to be considered when forward planning. We've mitigated this by carefully reviewing our usage, ensuring we're submitting procurements only for what is required, and carefully managing this on a regular basis.
We're doing quarterly procurement submissions to ensure we have enough, but we're only really ordering what we need, when we need it. On software and various project requirements: from a general perspective, each project has unique requirements in its own way, and projects usually require specific software. For example, we have received increased requests for specific geolocation or mapping software that requires integration with online mapping tools and specific package repositories. The due diligence on this software, to ensure it meets our security standards, and the configuration needed to download maps in a safe and secure manner, require extensive effort. For resourcing purposes, it is important to assess each project on its merits and understand the software requirements and the data being used. This ensures the appropriate hardware and software are available for each project. Some other lessons learned, and Vibeke touched on some of these as well. On project onboarding: it's very important to get early buy-in from projects. It was very helpful to have stakeholders immediately available to consult with, to understand the requirements and be able to build the platform to those requirements; without that buy-in and constant consultation, it is very difficult. As Vibeke mentioned as well, legal agreements not being established early did create some significant delays for the project. So do encourage these to be started as soon as possible, even drafting them before projects begin if possible, so that you have an understanding of what you're dealing with; there are usually delays just in dealing with legal teams. This relates to ethics approvals as well: as Vibeke mentioned, onboarding can be easy, but actually waiting for ethics protocols or other agreements can take a long time.
We did have very successful collaboration efforts with our communities of practice, where the community collaborated to create training guides, minimum requirements for trusted research environments, and a shared knowledge base, which is being actively used by the platforms in the community. Compute power for one project was not sufficient, and we did have to find another solution. This was due to hardware limitations on what was available for us to purchase: we required a very high-powered machine that we could not get in a satisfactory time, so we had to shift one project to different infrastructure to allow the use of high-capacity computing. I've also listed some operational needs and agreements that are important to have for these kinds of projects. These slides will be shared afterwards, so don't worry about writing these down or anything; I will make them available. Finally, here's an example of some of the interesting work we're helping to facilitate. The RISE project is evaluating the impact of health interventions within specific at-risk communities; a high volume of unstructured data is being passed through the platform. The second project, PRAISE, uses AI and machine learning to predict fracture outcomes for Victorian trauma patients. This project requires high GPU compute capacity and high-volume data transfers of clinical images. ACEMID is establishing a unique imaging repository comprising 3D body images. This involves the integration of XNAT, mentioned earlier, with SeRP and KeyPoint, with the ability to establish the use of dermoscopic mapping software available on the platforms. SeRP and KeyPoint are also supporting the ATLAS project, which routinely acquires electronic medical records related to sexually transmissible infections and is using data linkage technology to facilitate the merging of additional national data collections. Thank you for your time today. I'll stop sharing now. Fantastic, thanks all.
It's interesting to see the range; it's not just medical, because I know that this kind of platform has emerged from medical research, or been heavily used by medical research, but to see it starting to be utilised by other kinds of projects is really interesting. I'll open the floor to any questions we might have; people can either put them in the chat or just throw a hand up. Yeah, Kristan? I'll get the ball rolling. Matt, something that I saw on that last slide caught my attention, because I don't know much about it and I feel like it's something I'm going to have to learn about doing: a PIA, that privacy impact assessment. Was that for the SeRP itself? Was that at the project level? I don't know much about PIAs; I've started to hear about them in different discussions. I know that the ARDC Data Place has had one, or is doing one currently. Could you give us a bit of information about that? Absolutely. Great question, and thank you, Kristan, for kicking us off. Appreciate it. We can always rely on you. The privacy impact assessment is fundamentally going to need to be done on a project-by-project basis. One thing we've learned from this is that, as you said, you're not aware of how it works and what needs to be done, and we found that's pretty common. So what we've tried to do is collect the information about what is required, so there is some foreknowledge of what's ahead and what is required. So yes, we did do one. How it works here at Monash is that as part of the OGC, which is the legal office at Monash University, we have a sub-office called the Data Protection and Privacy Office. They have a template they follow for a privacy impact assessment; we answer the questions with them, and they provide us a recommendation. From this project we have developed a template of the questions that can be asked, so an understanding of what is required is available.
But usually the advice from legal is that this is done on a project-by-project basis, not at the platform level. Obviously a lot of the questions are going to be the same from project to project, because they relate to platform-specific governance, so it would be very similar questions and answers. Excellent, thank you. Oh, and we had a question before about whether we'll be able to share the slides; as Matt said, we'll make those available. I'll both email them around and link them from the collaborative doc. So thanks to our presenters for sharing those. Oh, Steve, your hand is up. Yeah, I was interested in the harmonisation question with the dementia platforms, because, as you say, this is about how you facilitate interoperability between systems. It sounds like that was a significant part of the result: the non-response, or declined response, from your possible partners was fundamentally "we don't have the time to do the harmonisation". As I say, we're working on a couple of projects around this as well with ADA and the various projects we're in. I'm guessing this is essentially a manual process you've got to go through, where you've just got to map and reproduce, so could you describe a little about what the resource intensity is? Because often this is where the heavy lifting comes from. Yeah, so on those projects, they don't actually have to do any harmonisation themselves necessarily; they just have to deliver the data. But if people don't share data regularly, it's not necessarily going to have a good data dictionary or anything like that, and if you've got to make it available, you've got to make it usable as well.
The ontology: I know the DPUK group are working with people to apply some type of machine learning to at least try to predict what a variable might be, with some human oversight to say, yes, that "gender" is "sex"; some of those easy ones hopefully you would pick up, and they're up to about 80% accuracy with that, so that definitely is a help. That's just to get the variables labelled appropriately, but then there's the second level of harmonisation: are males always coded 1 and females 2, or whatever? I know the ADDI group have developed a tool to help with that, but my colleague, who's much more into coding, says it's easier just to do it yourself. If it's a non-expert like me, or a student, I might actually choose to use the online tool. I guess the idea for DPUK is that as people work with the dataset and this recoding of data is done, we try, at least within the COSMIC consortium, to get it consistent across the datasets being used. Then there's harmonisation when people don't collect data in the same way. Sex is relatively easy: you could go by the chromosomes, or by what people identify as. But with imaging data, if you use a different scanner for your MRI scans, which people by nature are going to do, since they're not all in the same city or the same institution, it gets harder. One of the things COSMIC likes to look at is differences in ethnicities, but if the Asian study is done in Hong Kong and the Caucasian study is done in Sydney, is it the ethnicity or is it the scanner that's causing the difference in how people's brain volumes differ? So there's a lot of complexity around that. We've recently been looking at some imaging projects with some people from Swansea, and I think sometimes people just put their head in the sand and say, we're going to go ahead and do the analysis, but I don't know that there's enough rigour in making sure the measures are comparable across studies. That's probably a particularly challenging one. Other times people just use z-scores and centre the means to try to make the datasets comparable in that way, but again, you may be washing out differences between groups by statistically trying to make them comparable. Thanks, Vivca. I think we've got a couple more quick questions. One is: is there a technical architecture document for the DPAU site that could be shared? No. I know that for the data explorer, Nashini has been building that, and she is looking to publish it and make the code available. I think she found that she built it and then went back to actually document it so that other people could use and understand her script, and that in itself is challenging, so she's working on that. For the website, we utilised the template that DPUK had, but we rebuilt it at a point, so there are probably procedures and manuals, but there's not a neat little package with a bow on it where I can say, here you go. Thanks. And our next question has already been answered, which was: could you elaborate on the reasons or requirements that drove the development of the new Keypoint platform for QCIF? Matt has responded: we could not realistically deploy to QCIF in the time and at the cost required. For the timelines of the project it just wasn't feasible to get that done with legal agreements and whatnot; that really needed to start a lot earlier to be realistic. Those agreements just take a lot of time, which was really underestimated.
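The two levels of harmonisation described in that answer, first mapping variable labels (e.g. "gender" vs "sex") and then reconciling value codings (males coded 1 in one study, 0 or 2 in another), can be sketched as a small mapping step. Everything here is hypothetical: the study names, column names, and codes are invented, and this is not the DPUK machine-learning labeller or the ADDI tool, just the shape of the manual recoding they support:

```python
# Hypothetical per-study harmonisation maps: rename variables to a common
# label, then recode values to a common scheme (1 = male, 2 = female).
NAME_MAP = {"study_a": {"gender": "sex"}, "study_b": {"sex_at_birth": "sex"}}
VALUE_MAP = {
    "study_a": {"sex": {"M": 1, "F": 2}},
    "study_b": {"sex": {1: 1, 0: 2}},  # study B coded female as 0
}

def harmonise(study, record):
    """Relabel and recode one record from a study into the common scheme."""
    renamed = {NAME_MAP[study].get(k, k): v for k, v in record.items()}
    # Variables without a value map (e.g. age) pass through unchanged.
    return {k: VALUE_MAP[study].get(k, {}).get(v, v) for k, v in renamed.items()}

print(harmonise("study_a", {"gender": "F", "age": 71}))      # {'sex': 2, 'age': 71}
print(harmonise("study_b", {"sex_at_birth": 0, "age": 68}))  # {'sex': 2, 'age': 68}
```

The hard part discussed above, deciding what each variable means and whether measures are truly comparable across scanners or cohorts, still requires human judgement; tooling like this only mechanises the recoding once those decisions are made.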
Well, I had a question. Sorry, Nicola, can I just interrupt? I wanted to make a point on that, also on the cost. It's nice to want to share data, but if you don't have the resources to do it, it's really hard. CHeBA is lucky: we've got a lot of philanthropic funding, and we spent some of that on our data sharing efforts. But you need people, and you need to pay for the infrastructure. As you know from eResearch last year, it costs money to have your digital data sitting somewhere, and having a tenancy on SERP doesn't come for nothing. So if we want to do more data sharing, we have to build in those costs somehow when we seek our funding. That's an excellent point, and I was going to ask an almost related question, I think, because you mentioned, Matt, that people who access SERP need to have Monash credentials. I don't know about the process for acquiring that at Monash, but is that still a fairly manual process? Is there a time delay? Are there limits on the number of people who can do that? Yeah, I have questions. No, there are no real limits. There is a lot there, so there are more steps. We have a relationship with the identity team at the university; they understand our intentions for why we're using these accounts. They're very stripped-down accounts, so they don't have access to all Monash systems; it just gives you access to an email and an entry in the active directory, so we can have a listing for them, basically. We then link that to SERP, so it's a technical hurdle we've had to put into our operations. As I said, we have a relationship with the identity team, so when we onboard our external partners we are able to quickly collect the user information and submit it to the identity team. We usually get the accounts back within a couple of hours, honestly, but the
agreement is that within a couple of days they'll have them ready for us. Typically it's within 24 hours, and usually very quickly, so we can still turn them around fast; it doesn't affect us except for a couple of additional steps in the workflow. It is an inconvenience for external users in the sense that they'll need an additional MFA method for their second account, which they'll have to manage as well to connect to SERP. That basically means they'll need Okta or Google Authenticator; if you've got one of those apps already it's usually pretty simple, but if not you will need to install one. Most people these days do have MFA on their phones, so that isn't so much the hurdle; the fact that you need to rely on a second account to access it is, and it's something we're constantly looking for ways to improve. There's actually been some news just in the last few weeks that we need to investigate, and we're waiting for some updates from our eSolutions team to understand the ramifications. That's fantastic. That relationship you have with the identity team, I could see being an incredibly important thing in terms of managing how that works. We just have five minutes, and we do have a little bit of closing admin, so I think we might just take this last question: do you store data on SERP, and how does retention and disposal work for that data? Yeah, so SERP is really an application layer, a management layer, to manage access to virtual machines, manage file-in requests, file-out requests, and users. Attached to that is our secure storage, which we manage here at Monash. That secure storage is accessible only in SERP, within a secure private network, on dedicated hardware and infrastructure within our existing data centre. For retention, we basically set review periods on a 12-month basis to check in
with projects, make sure nothing's changed, and that they're happy to keep storing the data. For disposal, first there's a backup: whenever the data changes, a most recent backup is stored. Then we have an off-boarding process where we need to gather some key information. First off, is it to be disposed of completely, or to be archived? Depending on those answers, the workflow branches into how we'll act on it. We have an archival process, whether it be five years or seven years, which are mostly the standard, or indefinite, and we have a disposal method where we destroy the data, as we label it. Excellent, so the intention is not that the SERP secure storage is the long-term home for the data, but rather that if it needs to be retained long-term it is moved into archive? Correct, yeah. Sorry, one thing I missed saying there is that SERP is not a storage location; it's an analysis platform, so we shift data in and out as required. Some data can be there for a couple of years or more, and that's fine if it's being actively used, but we do not use it as active storage; we have longer-term storage infrastructure we'll shift it to. Great, thank you very much. I'd like to thank you again for the presentation, and everyone for the discussion afterwards. That was really informative for me, definitely, as this is not my area of expertise, but it's an area of great interest. Before we go, I will just throw to my co-chair Kristen, who would like to share an event that may be of interest to you all. Thank you. Yes, before we go, it's cross-promotion time. The program that I manage is called the HeSANDA program. I know that a number of people on this call are involved with the program in different ways. It is ARDC's co-design program, currently with the clinical trials community, but it is going to be expanded in the next few months, and then over the next few years, to address health research beyond clinical trials. The program's
been running for three years. We're launching the platform, which is one of the major outputs of the program: a data discovery platform. So if you're a researcher looking for clinical trials data in Australia, there is now a national platform where you'll be able to search for it, and also request access to the data in one spot, rather than going to a whole lot of different people at their different email addresses. In a few weeks' time, on July 18th, we're going to be doing our formal showcase and launch for HeSANDA and the platform. Nicola already put the Eventbrite link in the chat, and I've just done it as well, so I encourage you all to come along. Thank you. Fantastic, that sounds like an event that could be pretty relevant to most of the people here. Then all I have to say again is thank you to everyone for attending today. We will send you the link to the recording and the slides, and I look forward to seeing you at the next meeting. Thank you. Thank you, everyone.