Welcome, everyone, to this meeting of the Australian Sensitive Data Interest Group. This is an interest group that meets approximately monthly to talk about all things sensitive data. The group is co-facilitated by the ARDC, the Australian Data Archive, PHRN and the Melbourne Data Analytics Platform. I'd like to begin by acknowledging and celebrating the First Australians on whose traditional lands we're meeting today. For me, that's the Whadjuk Noongar people here in Perth, and I pay my respects to Elders past, present and emerging. Just a note: as you would have heard, this meeting will be recorded, and we like to share these meetings on an ARDC YouTube channel. This means that if your camera or your mic is on, you may become part of that recording. If you want to avoid that, just switch those things off. If you have questions you'd like to ask at the end but don't want to be recorded, you can put them into the chat and I can present them to our speakers. In fact, as we go through the presentation, if you could enter questions in the chat or save them for the end, that would be great.

Okay, so a few things to do today. We have some new co-chairs for this interest group coming into 2023: Felicity Flack from PHRN and Alex Mihailovic from MDAP, who I didn't put on the slide, but brilliant. Thank you very much to both of you for joining and opening up our network of interesting presenters whom we can bring along to these sessions. I'd like to remind you that we have a mailing list for this interest group; in a second I'll pop a link into the chat with how to sign up. We use that mailing list to let you know about upcoming meetings, but you can also use it yourself.
If you have some sensitive-data-related news you'd like to share, or a question for the community, you're able to post to that mailing list, so any of you who haven't used that feature before are very welcome to. The other thing is that we have a collaborative notes document, and we can take notes in it today as we listen to the presentation. It's especially useful because people often share useful resources in the chat, and we'll pop them in there. I've also added a section at the top for suggestions for future meeting topics. So if anything occurs to you, either something you'd like to hear about, or you hear about something really interesting and think "yes, they should be presenting at ASDIG", you can put that there. I'll pop a link to that into the chat as well in just a moment.

So that's all of my nonsense out of the way. I'm really happy to introduce today's presenters. Governance within one jurisdiction can be tricky enough; this project has been doing the very important and challenging work of sorting out governance across many different jurisdictions, and for that reason alone I think it's going to be a really interesting talk. But I also love hearing about sensitive data from ecology; I always find it really interesting. So without any further ado, I'd love to introduce Cam and Tanya to talk about the Restricted Access Species Data project.

I'd like to start by acknowledging the traditional owners of the land we're speaking from, the Ngunnawal people, and acknowledge their Elders past, present and emerging. I'll give some context for this project, talking about what comprises sensitive species data and the background of the project, before Tanya talks about some of the solutions we're in the process of developing. So, by way of kicking off: sensitive data has been a long-term issue in biodiversity data.
In fact, it probably appeared at just about the same time as everybody started compiling species data sets. Now, can I just pause: can everybody see the slides now?

Yep, we can see it now, thank you. Yeah, it's good, thank you.

So, as I was saying, sensitive species data has been an issue in biodiversity data since the first biodiversity data sets emerged. It's not only a national problem, it's an international problem, and on an ongoing basis it's probably one of the single biggest impediments to undertaking landscape-level analysis of biodiversity data. This project was proposed as an ARDC-funded project with in-kind contributions from 16 partners, including all the government jurisdictions in Australia as well as the Western Australian Biodiversity Science Institute, the Atlas of Living Australia and EcoCommons. The project has evolved a bit over its life. Originally it aimed to develop a national framework, a data repository and a demonstrator of a secure data haven. Over time, while we've persevered with the framework, the data repository has merged more with the concept of a sensitive species data service, which we'll talk about a bit later, and the demonstrator of a secure haven for data analysis has also continued to evolve. Our conversation today will pretty much focus on the framework and the data service. In our space there's a key dependency, insofar as the Department of Climate Change, Energy, the Environment and Water is currently developing the Biodiversity Data Repository, which will be a shared view of the biodiversity data held by all jurisdictions in Australia that's used for decision-making in environmental assessments.
That project is interdependent with ours, because sorting out issues around sensitive species data services is an important element of building a sensible biodiversity data repository. It's also key to our workplace, the Atlas of Living Australia, which is the largest public display of biodiversity data in Australia. So, just to recap what restricted access species data is, and to kick off with terminology: conventionally, people in the biodiversity data space talk about "sensitive species data". That becomes very problematic, and it was discussed at length at an early stage in the project, because "sensitive" tends to get very rapidly conflated with official government security classifications. So one of the elements of the framework has been to try to disentangle truly classified data from data that is sensitive or restricted for a range of departmental or operational reasons. The project was also originally focused on species-related data categories, which I'll talk about more in a moment. It was very clear at an early stage that, while the core was sensitive species data per se, there was a range of behaviours around types of data, and what was actually impeding data access was a larger group of issues. These include personally identifiable information; Indigenous data; usage-restricted categories, such as third-party data licences, data embargoes and the like; and lastly, what would more traditionally be called sensitive species data: data where, on a conservation or similar basis, you wish to restrict access to information about the species, whether that's the identification of the species, where the species occurs, or some attributes associated with those records. Most people tend to conflate the concepts of threatened species and sensitive species, so what's the difference?
Threatened species are those species listed by legislation or recognised by programs such as the IUCN Red List. Most species, even if not seen, are widely distributed enough, populous enough or simply so hard to find that, whether or not they are formally listed as threatened, they are more at risk from major threatening processes (climate change, land clearing, other human perturbations, pest species) than from people knowing where they are. This includes rare species that have a wide distribution range but very inconsistent occurrence within it, or whose presence in the landscape is so difficult to predict or find that there's no real risk in knowing where they occur. Sensitive species, however, are rare or even common species whose appearance, value, habits or rarity make their location information high risk if shared in the public domain. The classic example would be parrots that are valuable to the wildlife trade, but it can also include things like frogs: species whose occurrences can be affected by threatening processes, such as the spread of fungus in the case of one of the frogs pictured there, or simply species that have a particular stage in their life cycle that makes them highly sensitive if their locations are known. Most jurisdictions, and some third-party data holders like BirdLife Australia, maintain sensitive species lists. So the purpose of the framework was to deal with sensitive species, or as we now intend to call them, restricted access species, but also to come up with a framework for managing the other data issues I talked about on the previous slides. Next slide, please.

So just to clarify: some threatened species are sensitive species and some are not, and some sensitive species aren't listed as threatened. There's overlap, but they're not necessarily the same.

Good point. That's all right.
So a key part of the project has been the development of the Restricted Access Species Data Framework, which is intended to be a non-legally binding document. It will outline consistent best-practice guidance for sharing restricted access species data between trusted parties and try to provide a nationally consistent method for modifying restricted access data for public release. To flesh that out a bit: one of the recurrent issues with this type of data is that a very common means of withholding it is to obfuscate it by randomising or generalising the latitude and longitude. Because aggregated data sets are increasingly prevalent in the biodiversity space, one of the commonest issues is that you can get the same record from four different sources into an aggregated data set and legitimately end up with four different points, because it's been obfuscated in four different ways and the metadata hasn't been adequate, for one reason or another, to track that change. So developing a nationally consistent approach is partly about making sure that, where data is made public, it is consistent enough that people can recognise commonly obfuscated points. The framework comprises principles and guidance on managing and sharing different types of RASD: principles about how the data should be shared, obfuscated or modified, to provide consistency in how it is transformed. It proposes a common approach to handling requests for restricted access data and also suggests clauses for legal agreements. Another big focus of the framework is that negotiated legal agreements are generally preferable to standard-clause data licence agreements, because they add more legal protection for both the data provider and the data requester. So where's the framework up to? We're up to draft 15.
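The consistency problem described above can be illustrated with a short Python sketch. This is not the framework's method: the talk doesn't name an algorithm, so the grid-cell generalisation, the 0.1-degree cell size, the sample coordinates and the function names are all assumptions. It contrasts independent random offsets (four sources, four different points) with deterministic generalisation to a grid-cell centre, which places every copy of the same record at the same published point.

```python
import math
import random


def randomise(lat: float, lon: float, max_offset_deg: float,
              rng: random.Random) -> tuple[float, float]:
    """Shift a point by an independent random offset (illustrative only)."""
    return (lat + rng.uniform(-max_offset_deg, max_offset_deg),
            lon + rng.uniform(-max_offset_deg, max_offset_deg))


def generalise(lat: float, lon: float, cell_deg: float) -> tuple[float, float]:
    """Snap a point to the centre of the grid cell that contains it.

    The result depends only on the input point and the cell size, so every
    source applying the same rule publishes the same obfuscated point.
    """
    lat_c = (math.floor(lat / cell_deg) + 0.5) * cell_deg
    lon_c = (math.floor(lon / cell_deg) + 0.5) * cell_deg
    return round(lat_c, 6), round(lon_c, 6)


# One hypothetical record, held by four different sources.
record = (-35.2809, 149.1300)

# Each source randomises independently: four different points.
randomised = {randomise(*record, 0.1, random.Random(seed)) for seed in range(4)}
# Each source applies the same generalisation rule: one point.
generalised = {generalise(*record, 0.1) for _ in range(4)}

print(len(randomised))
print(generalised)  # {(-35.25, 149.15)}
```

The trade-off is that a shared, deterministic rule makes the obfuscation recognisable across aggregated data sets, but only if the rule and cell size also travel with the record; Darwin Core's `dataGeneralizations` term is one existing place to record that.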
In fact, we're hoping to have a final version ready by the end of this week; the next project working group meeting is tomorrow. The framework has undergone a FAIR audit, and it's also been broadened to include the CARE principles for Indigenous data sovereignty. The drafts have been reviewed by all jurisdictions and have been put out for third-party comment twice now. DCCEEW, the Commonwealth department, has been briefed monthly, as have the state and territory jurisdictions. As I've already mentioned, we have undertaken broader public consultation, and while we started with a signatory-type document, we moved towards a statement of principles, which most parties were happier with. So we would anticipate publicly releasing the framework towards the end of this month, all other things being equal. I'm now going to hand over to Tanya to talk a bit about the data service.

The data service is currently in development. It will be a simple request management system that basically funnels data requests to data custodians. Not everybody who adopts the principles in the framework has to use the data service; it's voluntary and opt-in. What it does is track requests for restricted access species data, so that both the requester and the data custodians know where a request is up to at any point in time. It includes an approval process and tracks the release of data. It doesn't transfer, aggregate or hold data; it's a simple request management system. We are going to enable DOIs to be minted for each data set that's released to a data requester, so that data custodians can track those DOIs and use them for reporting on usage and other elements of data sharing. It will also include functionality so that if a requester puts in an inappropriate request or breaches the legal agreement conditions, other custodians can be alerted and take that into consideration when they get further requests from the same data requester.

I'd probably also add the reason for taking this approach. Early in the process, as I mentioned, we were looking at an aggregated data set, or using the BDR as the aggregated data source, to deal with some of these issues. It became very clear as we evolved the framework that the concept of an aggregated data set was problematic for a lot of data custodians. We did an international review of how these things work elsewhere, and the answer is pretty much as ineffectively as we do it in Australia. However, the Finnish Biodiversity Information Facility has developed what, as far as we can tell, is the only functioning data service of this kind in the world. It's a very simple approach, à la what Tanya has just outlined, where the participants in the data service all retain data sovereignty and the ability to transform and release data themselves. What is effectively being developed is a front end that makes it easier for data requesters to reach most data custodians within a national context. That was certainly the feedback we've had from third parties: the data service is a much-needed mechanism to make it easier for organisations and individuals to identify where data sets exist nationally. For some of these species we're talking about 14 or 15 primary data sets, so it can be very difficult to disentangle and identify where those data sets sit in Australia.

So part of the service will also include a metadata catalogue: a searchable catalogue, in one place, listing the data sets that include restricted access species data, so that a data requester can identify which data sets might include information they're interested in for one purpose or another.
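The request management described above (one request funnelled to each relevant custodian, progress tracked per custodian, breaches flagged for other custodians, and no data held by the service) might be modelled roughly as follows. This is a hypothetical sketch only: the class names, status values, the `catalogue` mapping from data set IDs to custodians and the breach register are illustrative assumptions, not the vendor's design.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    NEGOTIATING = "negotiating agreement"
    APPROVED = "approved"
    RELEASED = "released"
    DECLINED = "declined"


@dataclass
class CustodianRequest:
    """The part of a request that one custodian decides on."""
    custodian: str
    dataset_ids: list[str]
    status: Status = Status.SUBMITTED


@dataclass
class DataRequest:
    requester: str
    purpose: str
    parts: dict[str, CustodianRequest] = field(default_factory=dict)

    def progress(self) -> dict[str, str]:
        """Where the request is up to with each custodian."""
        return {c: p.status.value for c, p in self.parts.items()}


def route(requester: str, purpose: str, dataset_ids: list[str],
          catalogue: dict[str, str]) -> DataRequest:
    """Split one request into one part per custodian.

    Only identifiers are recorded; the service never holds the data itself.
    """
    request = DataRequest(requester, purpose)
    for ds in dataset_ids:
        custodian = catalogue[ds]
        part = request.parts.setdefault(custodian, CustodianRequest(custodian, []))
        part.dataset_ids.append(ds)
    return request


# Requesters flagged after a breach, visible to all participating custodians.
flagged_requesters: set[str] = set()


def record_breach(requester: str) -> None:
    flagged_requesters.add(requester)


# Hypothetical catalogue: data set ID -> custodian.
catalogue = {"ds-1": "Custodian A", "ds-2": "Custodian A", "ds-3": "Custodian B"}

req = route("researcher@example.org", "habitat modelling",
            ["ds-1", "ds-2", "ds-3"], catalogue)
req.parts["Custodian A"].status = Status.NEGOTIATING
print(req.progress())
# {'Custodian A': 'negotiating agreement', 'Custodian B': 'submitted'}
```

Keeping one part per custodian mirrors the point that each participant retains sovereignty over its own approval and release; the service only records where each part is up to.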
Requesters can then register and put in a request. If the request covers multiple data sets from multiple custodians, the service will route it to each of the individual data custodians. It will also track the negotiation and development of the data licence through the legal agreements between the data requester and the data custodian. As Cam said, we did a review of international precedent. The only similar model for restricted access species data, or sensitive species data, was the Finnish Biodiversity Information Facility's. We reviewed similar projects in the health and biology space and also had some detailed conversations with the team developing Dataplace at the Office of the National Data Commissioner. There is overlap, but it didn't quite meet our needs or the time frames of our project. So we went through a procurement process through what used to be called the Digital Marketplace, now BuyICT, and received six applications to develop the service. We've now onboarded our preferred vendor and development is underway. They've been on board for three weeks, and it's a pretty ambitious timeline: we are scheduled to deliver the data service by the end of April.

Another aspect of the project has evolved over time. We've spoken a bit in the presentation about sensitive species lists, or restricted access species lists. Because each jurisdiction in Australia maintains a list, it's going to be relatively important nationally for users to be able to discover which species are on which lists and what level of obfuscation is being applied to those species. So over the life of the project we've also been trying to develop a single national list of restricted access species that third-party data custodians can apply to their data, again for national consistency in the way things are dealt with. That's raised a number of issues.
There are FAIR issues, because not all lists are publicly available. There are vocabulary issues; in particular, the taxonomy of species names varies quite markedly between jurisdictions. And there are methodological issues, because the processes for identifying species and applying obfuscation aren't really consistent. I can't remember if there's another slide. So really, that's involved a lot of meetings. What we've been trying to do is reduce the level of conflict in existing lists, particularly where species are listed in more than one jurisdiction, and land some commonality in how species should be dealt with, which is frankly still ongoing. But probably the biggest take-home message is to engineer both the framework and the amalgamated species list, the national restricted access species list, in a way that avoids interfering with existing regulations and procedures. Effectively, the approach we've taken with the RASL has been to enable jurisdictions to continue to use their existing processes and species lists, try to define taxonomy nationally for third-party users, make the combined list publicly available, and get agreement that jurisdictions will have an agreed process for new species. So basically we're trying to avoid any sort of conflict with what are effectively eight quite different processes. And so that's it. We'd like to thank our colleagues in the project and technical working groups, because this has involved a lot of input from many people around Australia, and at least a few of them are on the call today. Thanks also to the ARDC and the ALA, as well as the Department of Education, for funding the project. Thanks for listening, everyone.

Thanks, both. We do have a couple of questions in the chat already, and I'd encourage the audience to add some more. Quite a few people were interested in your processes around DOIs.
So the first question, sorry, is: is the DOI on the data before the request, or on a specific request for the data? I think they're asking whether all of the data receives a DOI that's then shared or used when a request is made, or whether a DOI gets applied once a request is made.

DOIs will be assigned to individual data sets, and often those already have identifiers assigned, so we'll just reuse them. When someone puts in a request, a DOI will be minted for that request, and it will include information about the parameters of the request. So it might be for a particular species in a particular location, and it may cover several data sets. It will report which data sets the request was derived from, and it'll include information on who made the request and when.

Okay, thank you. I think that answers the next question, which is how the data service would integrate with existing data repositories, like those at academic institutions, that already mint DOIs or have options to create them. Then a final question on that point: do you have anything to follow up?

Yeah, thanks. Hi Tanya, hi Cam. That's an interesting approach; I'm just trying to understand a little about why you do it that way. You said that if a data set already has a DOI, that would just be associated with the request and used. But you said in some cases a new DOI might be minted for the released data set. I assume that's not duplication, but that the data set being released is a subset of a larger data set, or an amalgamation of multiple data sets.

Yeah.

Okay, all right, that's helpful.
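To make the request-level DOI concrete, here is a hypothetical sketch of the metadata such a DOI might carry: the request parameters plus the DOIs of the source data sets it was derived from. It borrows field names from the DataCite metadata schema (`relatedIdentifiers` with `relationType` `IsDerivedFrom`, `descriptions` with `descriptionType` `Methods`), but the talk doesn't specify the exact schema or fields, so the structure, the function name, the requester ID and the `10.5072` placeholder DOIs are all assumptions.

```python
import json
from datetime import date


def request_doi_metadata(request_id: str, species: str, region: str,
                         requester_id: str, released_on: date,
                         source_dois: list[str]) -> dict:
    """Build DataCite-style attributes for a request-level DOI (sketch).

    The released data stays restricted, so the record carries enough
    information (request parameters plus source DOIs) to recreate the
    request later rather than the data itself.
    """
    return {
        "titles": [{"title": f"Restricted access species data release {request_id}"}],
        "types": {"resourceTypeGeneral": "Dataset"},
        "dates": [{"date": released_on.isoformat(), "dateType": "Issued"}],
        "descriptions": [{
            "description": (f"Species: {species}; region: {region}; "
                            f"requester: {requester_id}"),
            "descriptionType": "Methods",
        }],
        # One entry per source data set, linking the release back to its origins.
        "relatedIdentifiers": [
            {"relatedIdentifier": doi,
             "relatedIdentifierType": "DOI",
             "relationType": "IsDerivedFrom"}
            for doi in source_dois
        ],
    }


meta = request_doi_metadata(
    "REQ-0001", "Exampleus rarus", "hypothetical region",
    "requester-42", date(2023, 3, 1),
    ["10.5072/source-a", "10.5072/source-b"])
print(json.dumps(meta["relatedIdentifiers"], indent=2))
```

Because each source data set keeps its own DOI, provenance tracking reduces to following `IsDerivedFrom` links from the release back to the originals.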
So then a related question. It's not my area, but I've got a colleague who's very much into data provenance and tracking. If a data set is assigned a DOI around the collection point, and is then amalgamated with other data sets, or subsets are used exactly as you described, how do you track that the DOI being used for this particular project relates back to the original DOI?

The DOI that's attached to the requested data set will include the source data sets in its DOI metadata, including their DOIs. So all the source data sets' DOIs will be included in that. That enables someone who comes later to recreate the request if need be, because we can't necessarily attach the data to the DOI, since it's always restricted access.

Okay, thank you. So do those DOIs resolve to a record within your service?

Yeah, we'll be using DataCite.

Awesome. So, another question: RASD is obviously species-based, so how does it handle endangered communities?

Good question. Noting the temporal and funding limitations of the project, we had to identify what was in and out of scope, and two areas were identified as things that would have to be a next step rather than dealt with within the project. I'll drift off sideways and then come back to actually answer the question. It became very clear during the project that quite a substantial proportion of restricted access species data was effectively restricted for cultural reasons. So what we've done is put holding statements in the framework for dealing with those issues, and recognise that what actually needs to happen is a separate process that looks at the cultural issues around this type of restricted data.
The other big issue that was identified quite quickly was non-observational data, that is, ecosystems and communities, and we identified that as out of scope simply because it was too complicated to deal with within the confines of the project. We needed to sort out what was happening on a species-record-by-species-record basis before any consideration could be given to ecological communities. Some of the issues with ecological communities actually overlap with complex survey data, where you've effectively got the same basic issue: if a sensitive or restricted access species forms part of the vegetation community identification, or is one of the species observed at a survey plot site, then in both cases you're going to have problems withholding the record, because that affects the functionality of the data. That will be the next problem to solve once we finish this project, unfortunately.

And the other difference is that the framework predominantly deals with point data, whereas threatened ecological communities are usually polygon data. As far as I know, at least from the Commonwealth perspective (I managed a section that handles that data), we may release that data publicly anyway, because there's less risk. Some of the risk factors, such as people targeting specific species, are lower, and those risks are much lower than the risks of not releasing the data publicly: for example, land clearing can wipe out a whole community if no one knows it's there. So under a risk-based approach we release the data publicly. I can only speak from the Commonwealth's perspective, if someone wants to talk from a state perspective, but as far as I'm aware most jurisdictions release that community data publicly anyway.
Yeah, I think you're asking me. In New South Wales we don't have an issue with releasing that sort of data. But I think I raised this before: you can potentially treat polygons using the same sort of algorithms we use for points, so that you end up with a grid representation of a polygon, which would all be obfuscated in a similar way. So it's not impossible.

Will the list of all restricted access species be available through the service or the ALA in machine-readable form?

Each of the jurisdictions has different maturity levels with respect to how they store and share species data, so we're not interfering with any of those processes. The restricted access species data service is going to be hosted, at least initially, by the ALA, but it's separate from the ALA in terms of ownership; it's a collaboration between all the jurisdictions. Like I said, we're not transferring or storing data; we're relying on the existing processes each of the jurisdictions has in place to do that. This is merely a request tracking service.

And I suppose, with particular respect to the amalgamated list, potentially. But I don't think we're mature enough in our conversations with all the project working group partners to confidently say that it would be available externally. That's the intention, though: to move in that direction.

Will the licence agreements and legal clauses be in the release information that's due to come out in the next few months, so that others can see these documents without a data request, or is that specific to a data request?

The framework does include suggested legal clauses. The specific licence agreements for each request will be treated as sensitive information and not released publicly.
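The gridding idea raised above for polygon data (treating polygons with the same sort of algorithm used for points) could look something like the minimal sketch below. It is an assumption about one possible approach, not the New South Wales algorithm: it keeps every grid cell whose centre falls inside the polygon, using a standard ray-casting point-in-polygon test, and publishes the cells instead of the exact boundary. Coordinates are treated as planar, and the function names, cell size and example polygons are hypothetical.

```python
import math

Point = tuple[float, float]


def point_in_polygon(x: float, y: float, polygon: list[Point]) -> bool:
    """Ray-casting test: count edge crossings of a ray running towards +x."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_cells(polygon: list[Point], cell: float) -> set[tuple[int, int]]:
    """Return (column, row) indices of grid cells whose centres fall inside."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    cells = set()
    for i in range(math.floor(min(xs) / cell), math.floor(max(xs) / cell) + 1):
        for j in range(math.floor(min(ys) / cell), math.floor(max(ys) / cell) + 1):
            cx, cy = (i + 0.5) * cell, (j + 0.5) * cell
            if point_in_polygon(cx, cy, polygon):
                cells.add((i, j))
    return cells


# Hypothetical community boundary: a 1 x 1 square, gridded at 0.5 units.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(sorted(polygon_to_cells(square, 0.5)))
# [(0, 0), (0, 1), (1, 0), (1, 1)]
```

Testing only cell centres means a polygon smaller than one cell can vanish from the output entirely; a real implementation would more likely keep any cell that intersects the polygon.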
So yes, there are suggested clauses, and again, each jurisdiction has its own legal department, so they might be slightly tweaked. We've only put suggestions in there, and we've had feedback from some NGO data providers such as BirdLife Australia that, because they're only a small operation with a small legal department or none at all, they're really happy to have something like that to use to create their own licence agreements.

Thanks, Tanya. A related question: you did mention it in your talk about the data service, I just missed the detail. You said the data service is a simple request management system, but you said something about whether it would be storing the completed agreements or not. Can you remind me what that was?

Yeah. At the moment it will track the negotiation stages of the agreement, then record that it's finalised, and it will store a document ID only. We're not going to store the physical agreements, because we're using a different, external service for the signature process, and the agreements will be stored there. There'll be links to it so that the data custodian and the requester can download the final documents.

So if I can ask a supplementary question to that: does that mean your group is one of the parties involved in the execution of the agreement, or named in the agreement, or has some assigned responsibility for managing or overseeing those documents? Or are you just getting a tracking number that allows people using your service to keep everything recorded?

Okay, thank you. It's only going to be a slightly manual process: they'll send me the agreement and I'll send it to the signing service. I'm not helping with drafting or anything like that, or even tracking.
And you're not responsible for managing the document; you just offer a technology service there. Thank you.

Let's see. There's a question about the data sets themselves: what will the data sets be? Would they be just a current snapshot, or monitoring data, etc.? And what is the process for identifying them? Apologies if I've mangled that.

It'll be a mix of systematic survey data, incidental observation data and, eventually, museum collection data. It's really the data sets that each of the data custodians who sign up to the service have available. So it would be the same as approaching the New South Wales or WA environment departments and asking for access to BioNet or any of the other state data sets, or approaching the museums and herbaria via the AVH or OZCAM. They're just the same sorts of data sets that are amalgamated in the ALA. It's just that, because you're going directly to the data custodians, some of the limitations that exist in ALA data, such as complex data being simplified to fit Darwin Core, may not necessarily be an issue.

And one last question: if I'm managing a specimen collection, how easy will it be for me to confirm whether there are restricted access conditions for a particular species that I'm working with or considering loaning out?

There's a two-part answer to that question. You can actually do that now, albeit with a slight error margin attached: you can load a species list into the ALA, and the ALA has all of the current sensitive species lists in Australia loaded into it. I do have some reservations about saying that, because of that error margin. If the process we've currently embarked on is completed, the idea would be that there would be a publicly available amalgamated version of the list, with the obfuscations required in each state and territory.
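The amalgamated list described here keeps each jurisdiction's own listing and obfuscation while reconciling species names to a national taxonomy. A hypothetical sketch of that lookup, from a collection manager's point of view, is below. The species names, synonyms, jurisdiction labels and generalisation distances are invented placeholders, and the data structures are assumptions rather than the project's format. Conflicting treatments are reported side by side rather than resolved, since agreement on common treatment is described as still ongoing.

```python
from collections import defaultdict


def amalgamate(jurisdiction_lists: dict[str, dict[str, int]],
               synonyms: dict[str, str]) -> dict[str, dict[str, int]]:
    """Combine per-jurisdiction lists under nationally accepted names.

    jurisdiction_lists maps jurisdiction -> {name as listed: generalisation in km}.
    synonyms maps a listed name to the accepted national name.
    Each jurisdiction's own value is kept; nothing is overridden.
    """
    combined: dict[str, dict[str, int]] = defaultdict(dict)
    for jurisdiction, entries in jurisdiction_lists.items():
        for listed_name, km in entries.items():
            accepted = synonyms.get(listed_name, listed_name)
            combined[accepted][jurisdiction] = km
    return dict(combined)


def restrictions_for(name: str, combined: dict[str, dict[str, int]],
                     synonyms: dict[str, str]) -> dict[str, int]:
    """Which jurisdictions restrict this species, and how (empty if none)."""
    return combined.get(synonyms.get(name, name), {})


# Invented placeholder data: two jurisdictions list the same species
# under different names.
lists = {
    "Jurisdiction 1": {"Exampleus rarus": 10},
    "Jurisdiction 2": {"Oldgenus rarus": 1, "Placeholdia minor": 5},
}
synonyms = {"Oldgenus rarus": "Exampleus rarus"}

combined = amalgamate(lists, synonyms)
print(restrictions_for("Oldgenus rarus", combined, synonyms))
# {'Jurisdiction 1': 10, 'Jurisdiction 2': 1}
```

Looking a species up under either name returns every jurisdiction's treatment, which is the kind of check a collection manager might run before loaning out specimens.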
We've also been in conversations with a few of the third-party data set providers that have their own sensitivity lists, such as BirdLife Australia, about how they might contribute to the process as well.

Thanks. I'm not seeing any more questions in the chat at the moment, but to be honest I'm really curious about the process of reaching cross-jurisdictional agreement at this scale across Australia, and whether there were any particular challenges in trying to create that level of alignment.

Look, I think it was a very collegiate process. It was made slightly easier in our space because it's universally recognised that there's an issue. When we started the project, most project working group partners were keen to have something akin to a signed document, either a non-binding legal agreement or a legal agreement. It became abundantly clear as the process went on that that was never going to be achieved in any way, shape or form: there are just too many variations in approaches and processes to make it achievable within a short timeline. I think the single biggest learning is that recognising from the start that the best approach was a principles-based approach, which organisations can voluntarily indicate compliance with, rather than trying to get signatories to a document, would probably have cut the time required by about two-thirds. And frankly, I don't think a binding legal agreement would ever have been achievable had we continued down that track.

Nor do we have a governance body to oversee that.
Yeah, actually that's a really good point, Tanya, because under the terms of our funding in particular we need to develop sustainable management of the framework and process. That would not have been achievable with a legal agreement, because we would have had to establish a recognised entity that could manage it to the satisfaction of all jurisdictions in Australia, which would similarly have been almost impossible now. If you went back 20 years, the Environmental Resources Information Network within the Commonwealth often coordinated those kinds of activities, but these days that kind of national coordination is very difficult to establish or get approval for. So again, going down the principles-based route has proved to be the key to success, I think.

Thank you, I think that's a really valuable observation. So, one last call-out to the audience or to my co-chairs if anyone has any last burning questions. That might be everything, then. Well, I'd just like to thank you both so much for sharing your experiences. I'm really looking forward to seeing the framework; you said it'll be coming out fairly shortly. How can people in the audience who are interested keep track of what's going on and be informed when it comes out?

We can probably just provide a copy that can be circulated once we've finally pulled teeth and got everybody to reluctantly confirm that they're all relatively happy, or at least not unhappy, with it.

Well then, when that's available, if you let me know, I can share it on our interest group mailing list. I think that might be a good way, and I'm sure we'll also be promoting it through our ARDC channels.

I hope to have a website or web page up in the coming weeks, which will include some FAQs and a bit more about the project as well.
Maybe when that goes live we can link to it from today's meeting notes, so that anyone who's interested from today can find out as well. I'll be making this recording available as soon as I can, too, since I know a few people said they couldn't be here today but were really excited to hear about it. Brilliant. Well then, thank you both for coming, and thank you everyone for attending today and for all your questions. It's been a great meeting.