First up, what I'll do today is provide some context around the sensitive data guide and what the goals of the guide were. I'll introduce its content and the key points that might pique your interest, because obviously I won't have time to run through the detail of everything that's in the guide, and then talk about where to from here. Having put this guide on the web and having had some interesting responses to it already, what are we going to do from now on, and how are we going to add to it? And then, as I said, there'll be some time for discussion and any new ideas that you as the community have following the presentation today.
So what is the guide to publishing and sharing sensitive data? We released the guide on the 23rd of September, so just a couple of weeks ago, alongside one of our ANDS newsletters. If you want to look at that newsletter, because it's got a summary of the guide as well and some other bits in it, there's the link there. The guide's written for, well, anyone who's managing sensitive data. So that includes data managers, researchers, members of the research office, data librarians, anybody really. It is introductory-level information; however, the guidance through the steps and decisions within the guide, in terms of directing you as to what to do and in what sequence, is relevant to everybody regardless of your level of understanding and expertise.
Why did we create the guide in the first place? Well, most of you are probably here because you use or deal with sensitive data, so you'll know that it can be a little bit trickier than other forms of data. It involves extra steps to publication and sharing that need consideration, namely around the legal and ethical side of things. Unfortunately, there was little publicly available guidance to help navigate the process and to guide decisions about publishing sensitive data.
What we found out there was quite disparate in terms of the level it was aimed at, its level of detail and its inclusiveness as well. By that, I mean that often some of the steps involved in publishing sensitive data were included or spoken about, but not the full process from go to whoa, which is what we've tried to cover in the guide. Consistently, there was an expressed need amongst the data community for navigation and some consensus around publishing sensitive data. That isn't to say that there's ever going to be a one-size-fits-all for sensitive data, but some consensus on the process: how this is done, and what steps everybody needs to consider to get from "here's my data" to "here's me and my published data". And probably most importantly, all of the points above have prevented researchers and data managers from reaping the potential benefits of publishing their sensitive data.
Those are benefits for you as the researcher or the data manager: the data becoming discoverable, which can lead to citations, collaborations, reputation and profile, which in turn can lead to future funding; tracking the reach, output and release of your data; and the ability to publish in leading journals. A number of leading journals, and PLOS is always a good example of that, now mandate that data, regardless of what kind of data it is, be published alongside the paper. There's also data security in terms of storage of your sensitive data, as well as meeting funding and, as I said, paper publishing obligations. Major funders in Australia at this point are not mandating the publication of data, including sensitive data, although there is encouragement in that direction. But what you may find is that if you have collaborations with institutions overseas, particularly in the US and the UK, major funders there, like the Wellcome Trust or NIH, do mandate publication of data, including some forms of sensitive data.
And of course, there are benefits for wider science in terms of scientific rigor. Any forms of data that can be published can, of course, be checked by others, and there are the values of open access. One benefit that I think applies particularly to sensitive data, and might not apply to other forms of data, is that the kinds of data that we include in this category are often those that are most expensive and time-consuming to collect and most taxing on participants. If that data can be published and potentially reused, then there are big gains for the efficiency of research along those lines.
So how did the guide take shape? What did we do? Well, you'll notice that there's an absence of sensitive data records in repositories, including our own Research Data Australia. And this is for the reasons that I mentioned on the earlier slide. So we had some discussion and community feedback around this, a review of the literature that was out there, and much consultation and editing with experts in legal and ethical fields, as well as experts in particular forms of sensitive data, such as ecological data. What we opted for was to focus on user-friendliness: a user-focused guide that included the major decisions and the steps to publication in a clear, easy-to-follow way. This is based around a flow diagram or decision tree, which I'll get to very shortly.
The key feature of the sensitive data guide is, firstly, that it clearly outlines the sequential steps involved in publishing and sharing sensitive data specifically, although the steps are relevant to publishing and sharing any form of data. So if you're not dealing with sensitive data, you might still find the steps outlined in the guide quite useful as a decision framework. So I've got this kind of data. What do I do next? What's the appropriate sequence through each of those steps?
You can tick them off as you go along. Secondly, there are encompassing definitions and some methodology for each step. It's quite hard to write an all-encompassing and inclusive definition of sensitive data because many kinds of data fit into that category, and I'll go through some of those shortly. So keeping that in mind, the aim in writing definitions, for example in describing what sensitive data is, was to keep them relatively inclusive, so that the focus is on encouraging the reader to think about what it is that might make their data sensitive from a legal and ethical point of view. And lastly, legal and ethical experts provided advice throughout and before the release, as did sensitive data managers in some fields, which I mentioned.
So in terms of thinking about data management, ANDS is obviously involved in, and we've given many webinars on, various aspects of it. So just to provide a bit of context, the guide is focused largely on the publication and sharing part of that cycle. It is, of course, predicated on good data management: your data has to be in good form before you can consider publishing it. And then the point of doing it is obviously to reap the benefits of that: data citation, collaboration, potentially future funding.
The guide's not intended to replace or override any institutional policies that you've come across. So for example, you might have within your institution quite specific policies regarding data treatment or the confidentialising of data, how it's stored or what repositories to use, and intellectual property policies can vary a bit between institutions as well. It's intended to be a guide rather than a technical manual. So the focus, as I mentioned before, is really on the overview or the process of publishing data rather than detailed instructions on each of the processes along the way.
There is some information there, but there will be, and there should be, scenarios where you'll need greater detail or quite specific methods for some kinds of sensitive data. And we aim to keep updating a bibliography of where more specific instructions can be found for each of those steps, which will sit within the guide. And of course, there's more to come. The sensitive data guide is iterative in its content. We'd like to keep updating it and adding more following feedback from the community, following feedback from you. We're looking in 2015 to provide perhaps some more comprehensive modules on specific aspects of sensitive data, and some that have come up already in feedback from releasing the guide are along the lines of ecological data specifically, linked data and cultural data. So I look forward to more discussions about these with yourselves, and there will be plenty of opportunities for input along the way.
Topics covered in the guide. Now, I'm not going to go through all of these in detail because we simply won't have time, but just to pique your interest in case you're thinking, well, what's in there? Should I actually go and read further? These are the main topics that are in there. Defining what sensitive data is, confidentialising sensitive data, ethical considerations and legal matters, and licensing the data. I'll talk a little bit more about those today. Making data discoverable via a data repository, and what to publish and share, or under what conditions to publish and share the data. The guide includes some indications of where institutional policies may come in and might differ, to point you to go and look for those within your own institution. There's extra information that might help you in making your decisions, and it includes guidance not only for new data but also for when you're managing or you have existing data, and whether the data is owned by you or owned by others.
Key messages that have come out of the guide, and you'll probably be hit over the head with this a couple more times before the end of the presentation: you can publish a description of your data, that is the metadata, without making the sensitive data itself openly accessible. You might have heard of this before in terms of the public-private contrast: making the metadata public but the data itself private, or accessible only under some conditions. You can place conditions around access to the data. Publishing your data, or just a description of your data, means that others can discover and cite it. So that should be, or probably will be, the thing that you're most interested in when getting it out there. Sensitive data that has been confidentialised, that is, modified in a way that it is no longer sensitive, may be shared in many circumstances. And lastly, be a scout: plan ahead. There are things that you can do before you collect your data, or when you're collecting your data, to make the process of publishing and sharing the sensitive data a lot easier in the long run.
So the guide is based around, as I mentioned, this idea: how do we go from having the data to publishing the data? What are the steps involved, and how do I make the decision about what those steps are and what sequence to do them in? So for example, I work in an area of epidemiology, so I'm quite often using other people's data. So I would say, yes, I have sensitive data. And then I would follow along: am I collecting new data? No, I've already got the data. Is the data mine or is it somebody else's? Was it collected by you? So as you can see, the idea is to be able to quickly tick off boxes and work through each of those steps until you get to your desired end point. So to begin with, I thought we'd start with the first step in that box, and probably the reason why many of you are here: why is my data sensitive, or are my data sensitive?
And this is the definition that we've got in the guide, so it might look familiar if you've already read that. Sensitive data are data that can be used to identify an individual, species, object or location that introduces a risk of discrimination, harm or unwanted attention. Under law and the research ethics governance of most institutions, sensitive data cannot typically be shared in this form, with few exceptions. It's actually very difficult to define sensitive data inclusively, which many of you in the field would know, even though this is the very starting point of our guide and of the process. And this is because the definitions that are out there at the moment can be quite disparate, and they're typically discipline-specific or sometimes institution-specific. But what we wanted to do was to define sensitive data in a way that went back to the core principles of what it is. The first is that it includes data which identify a person or a thing, or perhaps sometimes even an event or activity, and the second is that this identification may introduce a potential risk of discrimination or harm. So in being inclusive and somewhat broad, it encourages the reader of the guide to think about what makes their data sensitive and whether they are.
Sensitive data, as you would know, crosses many disciplines of research. Generally it's separated into two main categories. The first is human data. This is probably what most people think of when they think of sensitive data. This includes, and this is not an exhaustive list, health data, so medical records, data from clinical trials and epidemiological records, and data from areas of social science. Social science is obviously research or data looking at the relationships between individuals and any aspect of society really. So common fields are political science, sociology, psychology, and also some fields of the humanities.
And a common example of social science data which may be sensitive is data from surveys such as the census, and also cultural data. So for example, research projects which collect information on sacred practices, events, locations, and other information such as that. The other main category is ecological data. A common example of that might be data about the location of, or practices surrounding, vulnerable animal and plant species. Geospatial data, which is now collected alongside human and ecological data quite routinely, can lead to the data being sensitive because it can pinpoint who or where somebody is. Sensitive data also includes data that's quantitative, such as spreadsheets and numbers, and qualitative, as well as, of course, geospatial data. So pretty much any form of data, if it includes identifying information and information which can potentially put a person or an object at risk of harm and discrimination, fits into this category.
How can sensitive data be published? Now, as I said, I won't go through the entire guide, but I thought the two areas that you're probably most interested in, and that tend to be most controversial, are the legal point of view and the ethical point of view. So legally, when we talk about sensitive data, the legislation that is triggered is foremost the Privacy Act. The Privacy Act applies to data that contain identifying information about people and personal information, and you can have a look for more detail in Table 1 of the guide. So for example, personal information might be around particular practices, it might be around criminal records and things like that. This kind of data triggers the Act, so these data cannot generally be shared in their original form. If people are no longer identifiable, that is, the identifying information, and relevant personal information if it can lead to identification, is removed, then technically the Act is no longer triggered.
But of course, this must meet the definitions of identifiability and confidentiality in that Act. And we go through that in quite a lot of detail in the guide because it is a point that people are concerned about, quite rightly, and we're very pleased to have had an expert in this field review that section, and he'll be speaking in the next webinar. The other relevant section is our chapter on confidentialising data, which you might be interested to read too. Also from a legal point of view, something to think about is licensing data, any kind of data but including sensitive or confidentialised data, before it's published. In Australia all data should have a licence. It explains how the data can be used and attributed. Without a licence it will be unclear to the re-user how the data can be reused, and this might discourage re-use as well. Some repositories do have their own licences, but anybody is also able to use the suite of endorsed licences at AusGOAL. There are some links there if you'd like to go and have a look at those, and I strongly encourage you to because it's very user friendly.
How can sensitive data be published from an ethical point of view? Again, this is what is written in the guide and I think it's a really important point to start from. So I'll just read out the introduction to that section. In addition to meeting legal standards, researchers have ethical obligations towards participants and research subjects. These include preserving privacy and avoiding any possible harm arising from participation in research and its subsequent publication. The ethical management of data must be the primary concern of researchers to maintain participants' trust and research integrity.
So of course one of our primary concerns in writing the guide was how to look at publishing sensitive data from an ethical point of view, and that of course includes how a researcher or a data manager operates within the ethics applications and committees of their institution. The key message for publishing sensitive data in an ethical manner is to plan ahead. So include in your ethics applications plans to publish confidentialized sensitive data, that is, sensitive data from which identifying information, and information which would place an individual at risk of identification and potential harm, has been removed. And again, please have a look at the more detailed information about confidentializing within the guide. So do this before the data is even collected, if you can. And also include it in any information to participants and in consent forms for human participants as well. We've got some great examples from other places around the world of how to include information about the publication of human data when asking permission in consent forms from participants in the research study. That's really handy. The story is of course a bit more complex for existing data where specific consent for publication wasn't asked of participants. So we're largely talking about human participants here. But there is something that can still often be done in some situations, and we've got some very clear steps as to when and how that can be done in the guide. So check out section 4.2 for that.
So what are your options in terms of publishing sensitive data? How do you get it out there in a legal and an ethical way so that you can reap the benefits or perhaps meet the funding obligations involved in your research? You can place conditions around access to the confidentialized sensitive data. And this would be the recommended action for the vast majority of cases of sensitive data publication.
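As a rough sketch of what placing conditions around access can look like in practice, the Python below keeps a public description of a data set separate from the data itself and checks a re-user's request against a few conditions. The class names, field names and the particular conditions checked (contact details, a described use, agreement to security terms) are assumptions made for illustration; they are not taken from the guide or from any repository's actual system.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AccessRequest:
    """Hypothetical application from a potential re-user."""
    contact_email: str
    intended_use: str
    agrees_to_security_terms: bool


@dataclass
class DatasetRecord:
    """Public description of a data set plus a private pointer to the data."""
    title: str
    description: str
    access_instructions: str
    data_location: str  # never exposed through public_metadata()

    def public_metadata(self) -> dict:
        # Only the descriptive fields are published; the data stay private.
        return {
            "title": self.title,
            "description": self.description,
            "access": self.access_instructions,
        }


def unmet_conditions(request: AccessRequest) -> List[str]:
    """Return the conditions a request has not yet met (empty means all met)."""
    problems = []
    if not request.contact_email.strip():
        problems.append("contact details missing")
    if not request.intended_use.strip():
        problems.append("intended use not described")
    if not request.agrees_to_security_terms:
        problems.append("data security conditions not accepted")
    return problems
```

In this sketch the metadata can be published straight away, while access to the data depends on `unmet_conditions` returning an empty list. A real process would normally add human review of each request on top of checks like these.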
I keep saying sensitive data publication, but as you've sort of picked up from now, picked up earlier, what I'm talking about when I say that is sensitive data that has been confidentialized already. Obviously data that's got people's names, addresses, or specifically identifying or other forms of identifying information is not data that you can legally put out there unless it's been modified. And ethically, you shouldn't be put out unless it's been modified. So we're talking about data that's been treated in some way so that it no longer places the participants at risk. So if you place conditions around access to the data, this is what we would call conditional access. So this is where the metadata, so a description of your project and of your data set is available to the public, but access to the data itself only occurs after predetermined conditions are met. So common conditions, this is already happening in quite a number of fields and we'll go through a great example of that shortly with the Australian Longitudinal Study of Women's Health. But conditions that can be placed around access to sensitive data, common ones are providing information about who and how the re-user wants to use, store, or manage the data. They usually have to agree to conditions of data security and register or provide contact details. And some cases also agree that they may be contacted by the original data owners for the purpose of collaboration or for other reasons as well. And you can set the conditions around access. The majority of repositories will allow you to do that. This isn't, you won't be able to read the detail, but this is just a screenshot from the Australian Data Archive, which deals largely in social science data. And this is actually the record for the Australian Longitudinal Study of Women's Health. But it shows metadata description of the data set, but to actually receive access to get the data itself. 
And you can just see where the one and the two little highlighted sections are there. These direct the potential re-user or the reader of this metadata record as to how they would do that. So this is the best way to do it. Click on that and it will tell you under what conditions you can gain access to the data itself, what conditions you need to meet to do that and how to do that. I'm going to hand over now to Associate Professor Lee Tooth, who we're very lucky to have with us today. Lee is the Deputy Director of the Australian Longitudinal Study of Women's Health. And she's also the chair of the Publications, Substudies and Analysis Committee, which is the committee that deals with applications to reuse this particular data set. Lee's gonna talk about how their sensitive data is confidentialised so that it can be reused, how they have public metadata or descriptions of the data set but with conditions around access to the data, and what the benefits of publishing and sharing this study are.
Okay, well thank you very much Sarah. And hello to everybody. I'd like to thank Sarah for inviting me to give an overview of our study and exactly how we have been able to go about sharing some of the very sensitive and personal data that our women have provided us. So just to start with, there's this quote from our study director, Professor Gita Mishra, illustrating how fundamental data sharing is to the study, and that it very much is a public resource funded by the government and available to all people who want to use it, providing they follow certain conditions. So that's just a really nice sort of overview to start. For those of you who are not aware, the Longitudinal Study is a collaborative project of the University of Newcastle and the University of Queensland, and it's been going since 1995.
So it's one of the longest running Longitudinal Studies in Australia and we were lucky to be able to recruit over 40,000 Australian women aged between 18 and 75 back in 1996. And we've recently added another 17,000 women, young women to our study last year. So just to give you an overview of exactly who our women are, we've got three aged cohorts in the original sample who were recruited in 1995 and six. So we have women born between 1921 and 26 who when they joined the study were aged 70 to 75. Then we have a cohort of women born between 1946 and 51 who were then aged 45 to 50. And we had a group of women born between 1973 and 78 who were then aged 18 to 23. And you can see now that these women have aged. Our oldest women are now entering their late 80s and early 90s. Our mids are now in their mid 60s and our then young women are now in their late 30s and early 40s. So after the 2010 National Women's Health Policy and discussions we had with the government, the government agreed to fund us to recruit a new group of young women because we argued we were no longer able to represent or provide data about young women in Australia today. It was our young cohort was aging. So we were very fortunate enough to be able to recruit another 17,000 last year to become our new young cohort of women. So that's who's in the study. Basically what we have done with the original cohorts is we have surveyed them approximately three yearly since 1996. So we have a wealth of information about them and we are now up to our seventh survey of our young women which is currently underway as I speak and we've recently completed our seventh survey of our mid-aged women. With the new young cohort however, because of the online technologies that are available today, we actually survey them annually now and we collect a whole range of data on all aspects of women's life including mental, physical, reproductive and social aspects of their health. 
We ask questions about life transitions, about life events, issues such as employment or caring, looking at health service use and so on. And the other advantage that we have recently been able to implement is data linkage with national and state-based administrative data sets. These include information about health service use through the MBS, pharmaceutical use through the PBS, data on incidence of cancer, on hospitalisations and also perinatal information as well. And just to give you a bit of a feel, we have over 600 people who have used our data, both nationally and internationally. So it's been a very big data source with a lot of people who have used it.
So what impact has the study had? Basically our data have been reported in over 500 papers across a whole range of journals. And we have had a significant impact in informing national health policies in all sorts of areas, including chronic health conditions, physical activity, violence, nutrition, caring and so on. Possibly our most significant output was towards the 2010 National Women's Health Policy, where our data were cited extensively throughout the policy. Another way that we contribute is by adding value to other data sources, for example through the data linkage projects that I mentioned earlier. An example of that is a paper published by one of our chief investigators in 2011 in the Medical Journal of Australia, which was looking at women's use of mental health services and looked at the numbers of women across the country who were accessing mental health services by whether they actually reported having a mental health condition or not. These data were able to help the government in terms of its mental health policy provision. Another example is the way that we can support sub-studies and also large data pooling research. An example of that is the DYNOPTA project. Some of you may have heard of this.
This was called the Dynamic Analysis to Optimise Aging Project. And we were part of a pool data set from nine other longitudinal studies in Australia that had included some aspect of aging. So turning back to now the topic of today's seminar and this is how do we actually manage the sensitive information that we collect? Because we ask very personal information including questions about reproductive events, about sexual identity, about violence. And these are questions that are incredibly personal to people and we can't just go sharing that information. And our study is considered to be a public resource funded by the government and open to anybody who really wants to use our data. So how do we actually go about sharing these data legally and ethically? So the next couple of slides, I'll just talk about our processes. So basically when the women join the study and at every survey that they complete, they are informed and asked to consent that their survey data will be linked with their previous survey responses so that we can follow women longitudinally. And women have the choice to say, no, I do not want this to happen, in which case we can only use their data in a cross-sectional way or women can withdraw from the study altogether. We also get women to sign a consent form and to read an information sheet where they agree. And that openly says to them that your data will be used but you will not be able to be personally identifiable from your data but we will de-identify your data. So we've set that condition up front with anybody who's part of the study. And then in terms of how we manage the data practically, all the surveys that we receive are de-identified and confidential. So when a woman first joined the study, her personal identified information was removed from the survey and in its place she was awarded what's called an ID alias, which is just an alias number. For example, 206069 is the ID alias of one of the women in our mid-aged cohort. 
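To make the ID alias idea concrete, here is a minimal Python sketch of that kind of step. It is not the study's actual procedure: the field names (`name`, `address`), the six-digit random alias and the in-memory key table are all assumptions chosen for illustration. The point it shows is that the survey rows passed on for analysis carry only the alias, while the table linking aliases to people is a separate object that would be held securely elsewhere.

```python
import secrets
from typing import Dict, List, Optional, Tuple

# Assumed direct identifiers; a real survey would have its own list.
IDENTIFYING_FIELDS = ("name", "address")

Person = Tuple[str, str]


def assign_aliases(
    responses: List[dict],
    existing_key: Optional[Dict[Person, int]] = None,
) -> Tuple[List[dict], Dict[Person, int]]:
    """Swap direct identifiers for an ID alias.

    Returns the de-identified rows and the key table (person -> alias).
    The key table is what would be stored separately and securely.
    """
    key = dict(existing_key or {})
    used = set(key.values())
    deidentified = []
    for row in responses:
        person = (row["name"], row["address"])
        if person not in key:
            # Random six-digit alias, so it reveals nothing about the person.
            alias = secrets.randbelow(900_000) + 100_000
            while alias in used:
                alias = secrets.randbelow(900_000) + 100_000
            key[person] = alias
            used.add(alias)
        answers = {k: v for k, v in row.items() if k not in IDENTIFYING_FIELDS}
        # The alias goes first so later surveys can be linked on it.
        deidentified.append({"id_alias": key[person], **answers})
    return deidentified, key
```

Passing the returned key table back in as `existing_key` for a later survey gives the same person the same alias, which is what allows responses to be followed longitudinally. Removing direct identifiers like this is only one part of confidentialising data; combinations of the remaining variables may still need checking.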
So the unique identifiers linking the ID alias to the personal identifying information of a particular woman are held securely at the University of Newcastle, and only the data manager at the University of Newcastle has access to these. So the rest of the staff, me included, at the University of Newcastle and the University of Queensland have no idea who the women in our study are. And in any data set that we send out for people to use for analysis, the first column is the ID alias, so that they can then link that ID alias to future surveys with future data. But again, they have no idea who that woman is. Another level of protection that we offer relates to location. Because we have women from all over Australia, including rural and remote areas, we have the geocoded data of those women's addresses. Some researchers do research using geocoding, looking at, for example, women's response to drought, and may want to know the postcode that a woman lives in. We have strict criteria that we do not release data for anything smaller than a certain geographical area, so that there is no way that anyone could work out that the ID alias that comes from that postcode could be a particular woman.
We have metadata available about the ALSWH in several national repositories, including Research Data Australia, the Australian Data Archive and Trove. And this basically is just a description of the sort of data that we have. But if you want to access our data, you must come and ask us and get permission to do that. We have our own public website that is very complex and has an awful lot of information in it, but we have a specific section about how to access the data. So if you click on that link, then you have to come and put an application in. And so basically you have to complete an expression of interest form.
You have to provide information about yourself, about what you want to do with the data, your research question, the variables you want to use, your analysis plan, and all the details about exactly what you want to do with the data. If you want to use the linked data, you have to provide the justification for why you need the linked data. And you have to provide information about what you're intending to do with the data: what publications you're planning, what conference presentations and so on. This application is reviewed by the Publications, Substudies and Analysis Committee, which consists of all of the steering committee members plus a couple of other experts. And this process takes a couple of months. Each application is reviewed on merit to make sure that it's an appropriate use of the women's data and that it's feasible. And then if that is approved, as a data re-user you need to sign a statement of data use and also confidentiality statements. Anybody who uses our data must sign these statements. These basically cover aspects of what you are going to do with the data, that you're not going to send the data on to somebody else who hasn't signed these documents, and that you're going to treat the data properly. And once you have signed these documents and received the data and you can begin your analyses, we ask you then to provide a six-monthly progress report and update, so that we can keep a rough idea that you're doing what you said you were going to do and not breaching any agreements that you may have had with us.
Great, thank you very much, Lee.
Following on from Lee, just in summary: we discussed earlier what's included in the guide, and then we heard a great story about sensitive data, which we often think of as something that's too difficult to share, but where there already are, and for a long time have been, some great examples of data publication and sharing on a very large scale and in very influential and successful ways. So sensitive data publication and sharing can be done. What you might like to go away today and think about is, well, what can I do to start, or what could I do right now? You can familiarise yourself with local policies for your institution around intellectual property and licensing of data, whether you have any recommended policies around the confidentialising of data, and what repositories to use. And if you're at the stage where you haven't already collected your data, take advantage of that and plan ahead in terms of ethics applications and consent forms, including information about potential data sharing for your participants and for your ethics committees before the data is collected. That will save you a lot of grief and make this process of publication and sharing, and the potential benefits of that for you, much, much easier down the track.
You can start by publishing your metadata if you've already got your sensitive data set there and you're wondering, well, what can I do now? How can I start taking advantage of this data set now? You can publish a description of your data without making the data itself openly accessible. As we've said a number of times, this is the recommended way to go about publishing confidentialised sensitive data, because no one will know about your data if you don't at least publish a description of the data to begin with. And it's very rare to be unable to at least put a good description of your data out there, with further information about when it will be available or how access to the data can be gained.
If you put the description of your data out there, there's nothing to stop you doing that and then continuing to work on access. Maybe you need to treat your data in some way in terms of confidentialisation, and you can obviously include information about that in the description of your project. You can keep working on access conditions if you require some further expertise, advice or time to do that, but getting a description of your data out there is something that can be done almost immediately. We really greatly value your feedback on this topic, particularly as, like I mentioned, it is an iterative process, so we're gonna keep adding more detail and more references to the sensitive data section of the ANDS website. And if you've got any content to share, for example if your institution has some guides or information around this topic, please send them either directly to me or via our main contact page. We'd love to hear from you.