I'm going to be talking about various issues, some practical, and also some initiatives, from the perspective of a publisher, because that's my background. But it's the perspective of a publisher that does think that we as publishers have a number of roles we can play in improving the accessibility of clinical research data. Importantly, that is part of a broader drive to publish more reliable and reproducible research, and access to data is a particularly important, and particularly challenging, part of that. I don't want to ignore that sharing of software code and sharing of research protocols are also really important in opening up more of the research process.

Given that I'm going to assume that a lot of the audience understand that sharing of research data is a good thing, I'm going to keep the background and the introduction as to why we're working on these initiatives to a minimum. But I did just want to show this one slide, because I think that when we're talking about clinical data access and improving the reliability of clinical evidence, the context is really important and also quite tangible: we're talking about research which can have a direct effect on human health. In the reporting of medical evidence there's a worrying and increasing amount of evidence, and high-profile cases, where there's been a lack of access to the results or data underlying medical research. A number of those cases are shown here on this slide: vast amounts of money spent on drugs which are only marginally effective or don't turn out to be effective, or huge numbers of prescriptions for drugs which turn out to be ineffective or potentially harmful. That could potentially have been avoided if all of the evidence, results and data had been published and were available to those who are assessing the effectiveness of medicines.
So I think that's worth bearing in mind: these are some of the real implications of not enabling access to all of the evidence underlying medications.

So, given I'm going to assume that we'll all agree that it's a good idea to be increasing the accessibility of medical research data, I just want to think a little bit about what is driving some of these changes, some of these increases in transparency. There are several drivers. Research funders and research institutions continue to be very important in driving change. There's a great list that was posted on figshare earlier this year that now has at least 27 research funders that require data archiving as a condition of grants being given out. In the medical field in particular I've called out a few on this slide, with a recent and fairly high-profile addition being the Bill and Melinda Gates Foundation, which has an open access and open data policy for its grant holders.

In Europe, the European Medicines Agency, a regulatory agency, currently has a policy of providing access to documents underlying new medical drug applications, and they're also looking to extend that policy to individual patient data as well. In the US, since 2007, the Food and Drug Administration Amendments Act has required results reporting. There are other initiatives, some of which I'll talk a bit more about shortly, from academic or non-governmental groups that are increasing access to clinical data, one in particular being the Yale Open Data Access Project, or YODA, which I'll talk more about later.
Also, from the pharmaceutical industry there are initiatives to increase data access, such as Clinical Study Data Request (CSDR), which I will also talk about shortly. Pleasingly, there are also a number of journals and groups of journals that are working towards data transparency, and I was particularly pleased a few months ago to see the ICMJE, the International Committee of Medical Journal Editors, begin to form a consensus on data sharing, because that's a particularly important and influential group of medical journals.

As well as a number of different organizations and types of organizations that are looking to increase data sharing, individuals actually support data sharing too. I have a couple of points here from a survey, on which I was a co-author, in the BMJ in 2012, where we surveyed several hundred clinical trialists specifically about their views on data sharing. Individuals for the most part do support the idea of sharing de-identified data through repositories and sharing those data with others on request. The challenge is that how and where that happens is often not quite so clear, so the practical issues get in the way despite there generally being a will to share clinical research data.

I mentioned that my perspective is very much that of a publisher, and I do see that there are several roles that publishers can play in increasing accessibility to clinical research data. You may be wondering why it is that publishers are interested in this. Well, we don't just want to publish more papers; we also want to publish better papers and more reliable evidence. Having access to the data underlying publications is an important part of increasing accountability and transparency and ultimately, we hope and assume, of publishing more reliable papers as well. That also ties into the case of many medical journals and some medical publishers that have stated goals, stated missions, of increasing the reliability of evidence, which can ultimately impact on
patient care, so having access to data is an important part of helping achieve that. Also, various aspects of linking to data, citing data, integrating data or visualizing data within the context of research articles are an important driver of content innovation in online publishing. Many publishers are of course interested in continuing to add further value to the research literature, so data is certainly an important area of content innovation too.

I see a really important role for publishers, in terms of impact and of practical information for researchers, in how journal policies are implemented and how those are communicated to authors and researchers. There are a number of different approaches, a number of different types of journal data sharing policy, which researchers will be subject to depending on their journal of choice. With this slide I've tried to categorize, or at least order, the different types of journal data sharing policy that I'm aware of, generally speaking with the stronger, more stringent policies towards the bottom of the slide. There are several different types, and I'll not go over all of them in great detail for purposes of time.

Researchers submitting their paper to a medical journal could be asked to state in every paper just what data are available with their paper. That could be that no data are available at all, but they still have to state what's available, so there's at least transparency about what is available. Probably the most common type of policy which you might see in journals, and it's the minimum requirement for all the BioMed Central journals, for example, and a number of the Nature journals, is that it's implied as a condition of submission or publication that authors have to share their data on request with other scientists after publication, and that they also have to share their data on request with editors and reviewers as well. So that's implied, and many people will be
effectively subject to that policy after submitting a paper to a journal. How effective those policies are is an area of debate, and that's something I'll come back to later.

There are stronger policies that have emerged. There are journals now which are requiring active data sharing for every submission or every publication, and more journals are introducing standard sections, or data availability sections, in every paper. PLOS is one example of that, and the BMJ has introduced a data sharing requirement for all clinical trials, for example. Then, outside of the medical publishing area, there are a number of journals that have mandated sharing, with links to underlying data sets in every paper, for a number of years, in areas such as ecology and animal genomic studies. There's also a new type of publication that's very data focused, that publishes data papers, and that has particularly strong requirements for sharing data as a condition of submitting a paper. These are journals such as Scientific Data, which I represent; GigaScience and F1000Research are also examples.

I'm just going to highlight a couple of examples of journal data sharing policies, and of data sharing by journals in action, on the next couple of slides. The example that I'm showing here is from BioMed Central's journal Trials, and this is actually data sharing via supplementary information files. Generally speaking, sharing data through dedicated, subject-specific repositories is the preferred approach; that's certainly the editorial policy of the Nature journals. But often, and particularly in medicine, there isn't an obvious repository for data, so publishing data via supplementary information files is actually quite valuable when there's no other option and there's no available repository. This example here is a clinical study where the authors have published, as a supplementary information file, the anonymized data for 19,000 individual patients from one of the largest
stroke trials ever conducted, and you can download the file as a CSV, which opens in Excel, straight from the article. Interestingly, this file of individual patient data for 19,000 patients actually came to less than five megabytes. So this is an example of a journal that encourages the publication of raw data wherever possible, and an example of at least one group of authors doing that.

Another example is where journals have dedicated data availability statements and then link those to data sets hosted in an external repository. Again, it's a medical example, to keep this relevant. This is from the open access journal BMJ Open, where they have a statement towards the end of the paper and a partnership with the Dryad Digital Repository, and they link through to the data supporting this study in the Dryad repository.

I want to talk now about a new type of journal, and a new type of publication, that has a particular focus on data. These are data journals, which publish data articles or data papers, or as we call them, data descriptors. There are perhaps 20 or more of these data journals, covering broad ranges of disciplines or more specific disciplines. I'm going to talk about Scientific Data because that's the one that I represent, but there are of course others available as well.

These journals, generally speaking, don't publish traditional research papers. They publish articles which are designed to fully describe research data sets, so they won't generally include discussion, analysis or conclusions. They are designed to make data more visible by providing a formal, peer-reviewed publication for a data set which might otherwise not be published, and also to act as a means to give more credit and reward to researchers for sharing their data, in the form of a formal publication. Data journals also generally have some additional features or processes which are not necessarily standard in regular journals, for example ensuring that
data are much more visible to peer reviewers, which sadly often isn't the case for traditional journals. Data journals also fit very nicely with this concept that I've quoted at the bottom of the slide, of intelligently open data. This is a phrase that was coined in the UK Royal Society's Science as an Open Enterprise report, which really emphasizes the point that data sharing is obviously a good thing, but we can't really derive much value from shared data unless we can understand, reuse and build upon those data. So data journals tend to have a focus on ensuring data are actually understandable and reusable, and that's often the key part of the articles they publish and the peer review processes that they have.

So this is Scientific Data on this slide. We launched in May 2014 and have published around a hundred articles so far, making us, I think, about the second biggest data journal that's available, at least by volume published. A couple of things are different about Scientific Data. It's very broad in scope: it covers all of the sciences, including social sciences. We also have a dedicated data curation editor, Varsha, whose role, amongst other things, involves the creation of standardized metadata for every article and every data set that we publish, which enables advanced users to do sophisticated comparisons between the data sets that we've published.

Another defining part of Scientific Data is a bespoke peer review process that's been defined for the journal. I mentioned that data journals do have a particular focus on reviewing data, and this is how we do it at Scientific Data. Importantly, peer review of data and of the descriptions of data at Scientific Data doesn't focus on the impact and importance of the data. We do welcome high-impact and high-interest data sets, but also data which are from single experiments, or which might be perceived to be boring, I suppose, in some respects. As long as the data are complete, understandable and reusable, then
they should be published. So our peer review focuses on whether there is enough information available to understand the data: are they complete, can others reproduce them, are the data prepared in line with community standards if they exist, are the data in the best possible repository, and were the scientific methods used to produce those data rigorous and sound? So that's the summary of our peer review process.

So, more than halfway into a presentation about sharing medical data, I feel I should probably speak to the elephant in the room, which is that patient privacy is a really important issue in conducting and publishing medical research, which means anonymization of data. This is certainly possible, and there are some examples of anonymized medical research data published in peer-reviewed journals, although I'm not actually aware of that many having been published.

Nevertheless, I'm highlighting here a table from a set of guidelines which I worked to produce back in 2009 and 2010, which was intended to give authors, peer reviewers and editors a minimum standard, a way in which they could de-identify data sets so that they could be published in open access journals. In putting these guidelines together we came up with a list of 26 direct and indirect identifiers. The idea is that when reviewers are assessing data sets, or researchers are preparing data sets, if a data set contains any direct identifiers then it shouldn't be published openly, and if the data set contains three or more indirect identifiers, such as age or sex, then the data set should be reviewed by an independent research or ethics committee to assess the risk of identification before it is submitted for publication. These are designed to be quite easy-to-use, non-technical guidance. They were derived through a non-systematic review of the literature and through expert consensus building. They have been adopted by the BMJ and some BioMed Central
journals, but as I mentioned, to my knowledge there are relatively few examples of open clinical trial data sets published in the literature. That example from the stroke trial earlier is one exception. I also want to highlight that there are other guidelines available for anonymizing data. There's a group in Ottawa led by Khaled El Emam, for example, who have a method of assessing the risk of re-identification in data sets.

So this lack, really, of open clinical trial data sets in the literature has led us at Scientific Data to think about whether or not we can take a more pragmatic approach to improving the availability of data underlying the research literature. I mentioned that research data being available on request, typically from the corresponding author, tends to be the most common policy for papers published in journals. However, this slide that I'm showing now gives three examples of that policy really not being terribly effective when it's tested post-publication. There are a number of examples where secondary researchers have tried to get access to data reported in journals that have a policy of requiring data on request, and they haven't been able to get the data, usually in well over half of the cases. So this process clearly isn't as robust as it could be.

So if we assume that most data aren't sufficiently anonymized to be published openly, how can we actually improve the robustness of this policy of data on request? Some of the reasons why researchers perhaps aren't willing to share data, even on request, are highlighted in this survey that I mentioned earlier. In this graph in particular, two major concerns that we hear are concerns about inappropriate reuse of data by secondary researchers, on the left-hand side, and also about maintaining patient confidentiality. Now, there are some projects and initiatives that have emerged which actually enable access to data
while addressing these two concerns quite effectively. Examples of these are the Yale Open Data Access Project, YODA, which I mentioned earlier, and Clinical Study Data Request, CSDR. YODA is run by a group of academics and CSDR is an industry-led initiative. These are websites, portals, where secondary researchers can see a listing of clinical research studies and clinical trial data sets and can request access to them. The YODA project has well over a hundred studies from at least two pharmaceutical companies, and Clinical Study Data Request was initially started by GlaxoSmithKline but now has, I believe, well over two thousand studies listed from multiple sponsors, multiple pharmaceutical companies. The links are there if you wish to check them out in more detail.

In general, these services have a number of features that enable data to be shared while not compromising privacy and while ensuring appropriateness of secondary analysis. Firstly, they have a non-public way of sharing data: a controlled access environment in which secondary researchers access the data, and that is not allowed to happen until their request to access the data has been assessed and approved. Also, the requests are approved by what is effectively an independent governance body. The requests to access data are managed not by the original investigators or the study sponsors but by an independent group, which is one of the important features for ensuring balance in those requests. The conditions on what one can do with the data, and the ways in which participants in the research study will be protected, are set out in documents called DUAs, or data use agreements, which secondary researchers have to agree to when gaining access to the data. These services also have anonymization checks before the data are made available, even in a controlled access way. These are certainly enabling greater access to clinical research data, even if not open access.
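As a rough illustration of what an anonymization check can look like, the two-step rule from the de-identification guidelines described earlier (any direct identifier means the data set should not be published openly; three or more indirect identifiers means referral to an independent research or ethics committee) could be sketched as below. This is only a sketch under stated assumptions: the identifier names are hypothetical placeholders rather than the guidelines' actual list of 26 identifiers, the function name `triage_dataset` is invented for this example, and nothing here describes how YODA or CSDR actually perform their own checks.

```python
# Illustrative only: these sets are hypothetical stand-ins, not the
# guidelines' actual list of 26 direct and indirect identifiers.
# Only "age" and "sex" are named in the talk as indirect identifiers.
DIRECT_IDENTIFIERS = {"name", "address", "telephone_number"}
INDIRECT_IDENTIFIERS = {"age", "sex", "occupation", "place_of_treatment"}


def triage_dataset(column_names):
    """Apply the two-step rule to a data set's column names."""
    # Normalize names so " Age " and "age" are treated the same.
    columns = {name.strip().lower() for name in column_names}
    direct = sorted(columns & DIRECT_IDENTIFIERS)
    indirect = sorted(columns & INDIRECT_IDENTIFIERS)

    if direct:
        # Any direct identifier blocks open publication.
        decision = "do not publish openly"
    elif len(indirect) >= 3:
        # Three or more indirect identifiers trigger independent review.
        decision = "refer to independent research or ethics committee"
    else:
        decision = "no flag raised by this rule"
    return {"decision": decision, "direct": direct, "indirect": indirect}


if __name__ == "__main__":
    print(triage_dataset(["age", "sex", "occupation", "blood_pressure"]))
```

A check like this only looks at column names, so it could at most support, not replace, the judgment of reviewers or of an ethics committee about the real risk of re-identification.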
However, what these data-on-request services, YODA and CSDR, don't have is the kind of permanence, discoverability and visibility that comes with the peer-reviewed literature. Usually, when researchers are designing a new study, they tend to look for what evidence is already out there, and that tends to be in journals, and it tends to be discovered in bibliographic databases like PubMed. So what we've been asking at Scientific Data, by consulting with stakeholders in the data repository field, with pharmaceutical companies, with funders, and with researchers themselves and editors, is this: how can we connect up the good work of these new data-on-request services with the fundamental features of data repositories for enabling data sharing, and connect those things up with peer-reviewed articles, where researchers tend to find reliable information?

These are the aims of a working group that we put together in December 2014. We've produced draft guidelines so far on how journals can potentially publish peer-reviewed articles about data which are only available on request, but have really robust and persistent links between those articles, data-on-request services and data repositories. We hope that these will produce a number of benefits, in particular increasing the visibility of clinical research data which can be requested, even if those clinical research data themselves aren't publicly available. The guidelines are in the preprint archive bioRxiv and are referenced on the previous slide.

Some of the key recommendations from the guidelines are on this slide, and they apply to the different stakeholders that are impacted by the guidelines. For clinical researchers, we recommend that they need to be prepared to share data on request, with short embargoes, because they're probably going to be subject to such a policy from most of the journals they choose to submit their work to. Editors and publishers certainly have work to do here. I think we
can all work harder to actively check policy compliance for every submission, that is, to check, when researchers are submitting papers, whether or not they can actually comply with a request to share their data. Sponsors and funders of clinical research also have an important role in ensuring that they have, potentially, partnerships with data repositories to enable permanent and persistent archiving of their data. And repositories, I think, are really key. They are enabling access to data across a wide variety of fields. At Scientific Data we work with 70 or more different repositories for the specific areas that we cover, generally for open access data. For repositories to support controlled access to data in a reliable and consistent way, they do need to introduce new features, which I'll talk about on this slide.

At Scientific Data we have a list of 70 recommended repositories and we assess them by several criteria. Those are included at the end of the presentation, but I'll not go into them for purposes of time. Through the process of developing these guidelines we've come up with a number of additional features for data repositories, to enable them to provide access in a controlled fashion to clinical research data, and then for us as a journal, as a publisher, to have reliable and robust links with those repositories in our peer-reviewed articles, so that those links are persistent. Ultimately we don't want dead links in peer-reviewed articles. We want readers of articles to connect up easily the different products and different outputs of scientific research.

So repositories for non-public data, amongst other things, need to provide landing pages for clinical data sets, or metadata records, which can be publicly available and permanently linked to peer-reviewed articles. They need to have systems for enabling researchers to agree to data use agreements, and ideally they should be independent of study sponsors, as are those data-on-request services, YODA and CSDR, that I
mentioned. Ideally we also want to see transparent systems for requesting access to data and for reviewing those requests. And equally for open access data as for closed access data, those data need to be available and preserved for the long term, and by long term we mean at least 10 years.

We put out an editorial in July calling for comments on these guidelines. I'm not showing that here, but what I am showing is a data descriptor we published in Scientific Data at the same time as that editorial, which is an example of our first data descriptor that is open access but actually links to data which aren't publicly available; they're available on request. So I just want to show this one example in finishing the presentation. I'll be the first to say that this example doesn't fully comply with the best practice guidelines that we have proposed, but it is certainly, we believe, an improvement on simply providing data on request.

This project is the Brain Genomics Superstruct Project. It has a number of functional magnetic resonance imaging data sets from healthy subjects. Some of those data are available publicly, but they have to be requested, and then there are some even richer clinical data where you have to go to the study sponsors directly to request access. In this article there is step-by-step information on how one can obtain access to the data sets. For the very rich, detailed clinical information, you have to request it from the LONI Image Data Archive project website, and they have instructions on how you can request access to the data. There's also a component which is deposited in the Dataverse repository, which is one of our partner repositories, and I'm showing that on this slide. This is linked out from the article, and there are a number of data sets listed there. One has to request access to the data, and before you're given access you have to log in and then agree to a data use agreement, one of these
features that I've mentioned. So what we're hoping is that this is a useful example, a useful addition to the debate about how we can have these more robust systems of providing access to data on request.

So I just want to sum up with the five P's of clinical data disclosure, which were the title of this presentation and which I therefore promised to you at the start of it. From my perspective, the five P's of clinical data disclosure are as follows. Firstly, publishing: ensuring we can publish as much of this information as possible or, if not publishing it, then linking it reliably to the peer-reviewed literature. The second P is policy. Journal policy, I think, is really important, and how we as publishers and editors actually check and enforce that policy is really crucial and can have a really important impact on data accessibility. Another P is peer review. I think it's really important to make data visible to peer reviewers; even if we don't expect them to reanalyze data, we should at least give them the ability to access the data if they wish to dig into it. Privacy is undoubtedly key in sharing clinical research data, and various mechanisms exist to try and address that challenge. And finally, pragmatism, I think, is important with clinical research data. It's certainly something that I have come to learn over the last few years: while open access to research data is a great thing, we often can't let the best be the enemy of the good, and if we can experiment with more pragmatic approaches, such as this current initiative with Scientific Data, then hopefully that will still lead to greater visibility and accessibility of clinical research data.