Yeah, thanks. I'm thrilled to be here. We have four speakers from the Tilburg Meta Science Group. Each of us is going to focus on a different aspect of the research process and also suggest some ways we might improve it. So let's jump right to it. First up we have Hilda, who will discuss peer review. Go ahead and take it away when you're ready.

Yes, thank you, Rick. Let me share the screen. I think this should be working, so hopefully you can all see me now. Welcome everyone. I will be talking about the evaluation of scientific manuscripts, and more specifically about peer review as well as thesis grading.

As you probably know, in the published literature, and especially in psychology and psychiatry, the research fields I work in, there is an overabundance of statistically significant results: over 90% of studies support the tested hypothesis. Maybe we are just really great at doing research and have very plausible hypotheses, but that doesn't seem to be the case, since we have extremely small sample sizes, with a median sample size of 62. That is not sufficient for the small to medium effect sizes we generally investigate, so it results in low power, estimated at 0.5 or even 0.35 for psychology. Furthermore, we know there is a lot of publication bias, there are questionable research practices going on, and there are many reporting errors in published studies. All of this has resulted, as you are probably aware, in a so-called replication crisis, where prominent studies from all kinds of research fields have turned out to be very difficult to replicate. That makes us wonder: how do these apparently non-replicable studies end up in the literature, and what kind of quality control do we have?

For the published literature, we have peer review as quality control. Peer review should act as a gatekeeper to make sure that the published literature is of high quality. Peer review may be the best option we have, but it is far from perfect. We know that peer reviewers often disagree; you may have experienced this yourself when you get feedback. So what do reviewers actually pay attention to? What are their quality criteria? What do they think is important when they look at a manuscript? Peer reviewers often have a second job as academic staff, where they also teach students. That makes me wonder what these academics, the people who educate the new generation of researchers, teach their students. Do they preach what they don't practice? Do they preach what they actually should practice? And what is it that students implicitly or explicitly learn from our teaching?

So for the published literature we have peer review, and in education we have grading as the system to check the quality of students' work. Specifically, we looked at the grading of the master's thesis, which is usually the final project a student completes before graduating. What do teachers pay attention to when they grade these theses, and what do students believe is important to their supervisor? Our main research question for this project was: which characteristics of a manuscript, be it an article or a thesis, do students and researchers believe to be of importance when assessing the quality of a scientific manuscript?
So which aspects influence assessed quality? Is there a difference between assessing the quality of a submitted article and assessing the quality of a thesis? And do students and researchers differ in their quality assessments of theses?

We set up a vignette and survey study, which we pre-registered on OSF. Our participants were both students and researchers. For the researchers, we emailed a large number of authors and editors who had published within psychology in the past few years, in three rounds of sampling, and we ended up with 687 usable and complete responses after two months. For the students, this turned out to be more difficult. We used social media as well as our own network, but it didn't go very smoothly; Facebook kept deleting our messages, and so on. After 11 months of data collection we decided to close it, with only 113 usable and complete responses. We therefore decided that all hypotheses and analyses that include the student data should be considered exploratory, because we did not reach the power we intended.

First, the vignette part of the study. There were 32 conditions. Participants were either a student or a researcher, and all were asked to read an abstract of a scientific manuscript. Students were told it was the abstract of a thesis; for researchers there were two conditions, either a thesis or a submitted article. Within the abstract, which was about test anxiety, we manipulated three factors: the sample size was either small or large, a reporting error was either present or not, and the results for the main hypothesis were either statistically significant or not. We asked all participants to read the abstract carefully, rate the quality of the manuscript on a seven-point scale, and name three aspects that were relevant for their quality assessment.

For the vignette study, we hypothesized that sample size would matter (large sample sizes would yield higher quality ratings), that statistical significance would matter, and that the reporting error would not matter. In the results we saw only a small but statistically significant effect of the sample size being small or large; we found no effect of significant versus non-significant results or of an error being present or not. So for the first two sub-hypotheses: sample size mattered, statistical significance didn't. For the third we hypothesized that there would be no effect of the reporting error, and we cannot test that with frequentist statistics, so we used Bayesian posterior model probabilities. There we saw a lot of evidence for the null effect, so we can conclude that the reporting error did indeed not matter.

We also looked at the open question where we asked participants to name three aspects relevant for their quality rating. There they mentioned sample size or power 160 times, in 23% of the responses; the significance or non-significance of the results was mentioned only three times; and the error in the p-value was never mentioned. So it seems that nobody noticed there was an error, or at least they didn't think it was important for their quality rating.
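This is not the authors' exact analysis, but as a minimal sketch of how evidence for a null effect can be quantified with posterior model probabilities, the snippet below uses the common BIC approximation to the Bayes factor (Wagenmakers, 2007) on simulated rating data; the variable names and the simple linear model are assumptions for illustration only.

```python
# Minimal sketch (not the authors' exact analysis): quantifying evidence for a
# null effect of the reporting error with a BIC-based Bayes factor and the
# corresponding posterior model probability, assuming equal prior model odds.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "quality":     rng.integers(1, 8, size=300),   # 1-7 quality rating
    "large_n":     rng.integers(0, 2, size=300),   # sample-size factor
    "error":       rng.integers(0, 2, size=300),   # reporting-error factor
    "significant": rng.integers(0, 2, size=300),   # significance factor
})

m0 = smf.ols("quality ~ large_n + significant", data=df).fit()           # null: no error effect
m1 = smf.ols("quality ~ large_n + significant + error", data=df).fit()   # alternative

bf01 = np.exp((m1.bic - m0.bic) / 2)   # Bayes factor in favour of the null
p_null = bf01 / (1 + bf01)             # posterior probability of the null model
print(f"BF01 = {bf01:.2f}, P(null | data) = {p_null:.2f}")
```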
For the second hypothesis, we hypothesized that there would be an interaction effect between the article versus thesis condition and the three factors we manipulated. We saw no interaction effects. A thesis is generally rated as higher quality than when participants think it is the abstract of a submitted article, but there are no interactions with the three manipulated factors. For students versus researchers it was the same: students rate the abstract as higher quality than the researchers do, but there were no interaction effects there either.

After the vignette study, all participants completed a survey. They were asked to rate 29 different items on their importance when assessing the quality of a manuscript. These items were all related to the theory, the design, the research conduct, or the analysis and presentation of the results. There were six conditions: participants were still either a student or an academic, academics received either a thesis or an article condition, and they were asked what was important to them when assessing the quality of a manuscript, when grading a thesis, or when assessing the publishability of a manuscript, so that we could see whether there are differences between assessing quality and assessing whether something is fit for publication.

Here we have the results for the 29 items across the different conditions. In general, all groups were quite similar: they either think an item is important or not that important, but they show the same patterns across the different items. The sample size items, which we manipulated in the vignette part, were rated as only moderately important; these are having a large sample size and achieving high statistical power. For statistical significance, participants reported that it was not very important to them; these items are observing large effect sizes, observing the main effect in the hypothesized direction, and reaching statistical significance for the main hypothesis. Students think this is slightly more important than researchers do, or believe it is more important to their supervisor than their supervisor, or a peer reviewer, actually reports. For error-free reporting, we see that they rate it as quite important: they state that reporting the statistical results without errors matters to them when they assess quality.

We also looked at responsible research practices, which we did not pre-register but were interested in anyway. Doing a power analysis was rated as only moderately important, pre-registering part of the study was also only moderately important, and the same goes for sharing data, materials, or analysis code. Please note that for both pre-registration and sharing, the students, the diamond-shaped markers in the figure, think this is more important than their supervisors or the peer reviewers do. So the new generation apparently finds this more important than the older generation does.
Furthermore, we also asked about distinguishing confirmatory from exploratory analyses, and that was rated as of higher importance, especially by the researchers.

To conclude: surprising to us, reviewers did not seem to be a great source of publication bias. Statistical significance had no impact on the quality rating of the abstract, and it was also rated as one of the least important characteristics. Sample size did seem to be of importance: it had a significant impact on the quality rating of the abstract, but it was rated as only moderately important in the survey part. Statistical reporting errors are said to be of importance, they are rated as very important, but they are really hard to spot. We also know from earlier research that peer reviewers have a hard time spotting errors; the error had no impact on the quality rating, and nobody mentioned it in the open-ended question. Finally, responsible research practice indicators are not rated as super important. That was it, thank you for your attention.

Great, thanks so much. I see we have one question already: which field of study were the participants from? Yes, they had all published within a topic of psychology, and we asked them what fields they were in. I actually have a slide on that, let me go there quickly. So the participants were from all kinds of continents, both the students and the researchers. The study phase of the students was mainly bachelor's and master's; for the academics it covered all kinds of career phases. And these are the research fields they were from, so it is quite diverse; social psychology and applied psychology are the largest ones, and the other category is a mix of all kinds of descriptions people gave, but they all published within psychology. Okay, great, so mostly psychology.

I'm not sure if there are any other questions. If not, I'll ask one. I find this result interesting, that there is no difference in how significant versus non-significant vignettes are evaluated. I wonder if you could opine on how we square that with the excess of significant results we see in the literature. Yeah, I think that is very interesting. It surprised us. We expected that peer reviewers would maybe not explicitly state in the survey part that significance was important to them, but we definitely expected an effect in the vignette study. So it could be that peer reviewers are actually not a big part of this; maybe it is really about the authors not sending non-significant results in. I don't know, but it is definitely interesting to see.

Great. Then we'll do one quick last question. Gabriel asks: why do you think that statistical mistakes are hard to detect? Yeah, they are really hard to spot just by eye. There is special software such as statcheck, which is used more often now, also at the journal level, and I think that is great because we do find error-free reporting important. This stresses the importance of that kind of software: if we do not want errors in the literature, we should help reviewers spot them, because it is really difficult to see them by eye, and software can definitely help with that. In our vignette it was also a small error: the result was statistically significant and it remained statistically significant even without the error.
That is because we did not want the error to influence whether the result was statistically significant or not, since that would also impact other aspects of the peer review. Yeah. Okay, great, thanks. Well, we're at time there, so let's move on to the next talk. Next up we have Marjan, who will be discussing whether sample sizes have increased in response to the replication crisis. Go ahead and take it away.

Yes, thank you. As said, I want to talk about sample sizes in psychology. Hilda also talked about sample sizes and how they are evaluated by researchers when they peer review or grade a thesis, and she noted that in psychology statistical power is typically quite low because sample sizes are low. The easiest way to increase statistical power is, of course, to increase the sample sizes. We have quite a few studies that investigate statistical power, already starting with Cohen in 1962, but sample sizes themselves are not often investigated directly. There is at least one study that did this, published in 2011 by Marszalek, Barber, Kohlhart, and Holmes. They looked at studies published in 1955, 1977, 1995, and 2006. The first two years had already been used by Holmes in some older publications, and Marszalek et al. added the latter two years because they wanted to know whether sample sizes had increased in response to Wilkinson and the Task Force on Statistical Inference (1999). I show here the median sample size for the four journals they included, four psychological journals from different fields, and you can see that it did not really increase, maybe there was even a decrease; there is no real evidence of an increase in sample size over time.

Of course, a lot has happened since 2011. We had the replication crisis, followed by different initiatives to improve psychological science: preregistration, registered reports, awarding open science badges, an increased focus on direct replications and null results, and increased attention to the problems of small sample sizes. So we were wondering whether we could update the Marszalek et al. study to see whether sample sizes in psychology have increased after the replication crisis. We wanted to add a new year, 2019, because by then researchers would have had some time to incorporate these changes, and we could see whether that had an effect on the sample sizes.

These were our research questions. Our main question was whether sample sizes have increased over time. We also wanted to know whether this was a response to the replication crisis, so whether sample sizes from before and after differ. We further wanted to know whether journal-level policies might have an influence on sample sizes, so we compared journals that really focus on open science practices with journals that do not, and we also looked at whether differences at the paper level might have an effect: whether studies that practice more open science, for example studies that have open science badges, have larger sample sizes than studies that don't.

To do that, we used the original data from Marszalek et al. We had their full data from the years 1995 and 2006; the older data were not available anymore.
These were the sample sizes from four journals: the Journal of Abnormal Psychology, the Journal of Applied Psychology, the Journal of Experimental Psychology, and Developmental Psychology. We added the 2019 volumes of the same journals, and we added two journals that focus more on open science practices and, for example, award badges: Psychological Science and the Journal of Experimental Social Psychology. We also hoped that would cover a broader range of topics in psychology. From all the articles published in these volumes we collected the sample sizes, and here you can see how many we collected: a total of more than 3,000 sample sizes.

I do have to give a warning, because these are preliminary results. We did a pre-registration, and one of the things in it is that we still need to do an additional check on part of the collected sample sizes. I didn't have time to do that before this talk, but I still wanted to present some of the results already; just keep in mind that they are preliminary. You can see that we have a lot of sample sizes, but there are also differences in the number of sample sizes across journals, with quite a lot from the Journal of Experimental Psychology, for example, and relatively few from the Journal of Abnormal Psychology and the Journal of Applied Psychology.

Here you can see the box plots for the different journals in the different years. The colors are the different journals, and for every journal there are three box plots: the first is 1995, the second 2006, and the last 2019. There are of course also some very high sample sizes, especially from people using data from large public surveys, so I only show up to a thousand participants, because otherwise it would not fit in the figure. If we look at the median over time, there might indeed be some kind of increase for the different journals, and you can see that as well when looking at the median sample sizes per journal. There also seems to be a difference between the journals: for example, the Journal of Experimental Psychology has a relatively low median sample size, but it also contains a lot of studies with within-subjects designs, and we see the highest numbers for the Journal of Applied Psychology.

Here we can also see it in the table, for the different journals and for the total median sample size. In 1995 the overall median sample size was 40, in 2006 it was 57, and in 2019 it is 120. So there seems to be an increase, and we also tested that with a multilevel model with a negative binomial distribution, to account for the distribution of the data, and indeed found a significant effect of sample size over time.

Our second and third questions were more about whether this was a reaction to the replication crisis, so we compared the 2019 papers with the earlier ones, and also looked at whether there was a difference between journals, comparing the two new journals that focus more on open science with the original journals. We did not find an interaction effect, so we do not see a stronger increase in journals that promote open science practices, and we find no confirmation of the third hypothesis. However, the main effect of the replication crisis was significant.
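As a rough illustration of the kind of trend test just described (not the exact pre-registered model, which is multilevel), the sketch below fits a negative binomial regression of sample size on publication year, with journal as a simple fixed effect; the file and column names are hypothetical.

```python
# Simplified sketch: negative binomial regression of sample size on year.
# The actual analysis is a multilevel model; here journal is a fixed effect
# and the dispersion parameter is left at its statsmodels default for brevity.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("sample_sizes.csv")   # hypothetical columns: n, year, journal

model = smf.glm(
    "n ~ year + C(journal)",
    data=df,
    family=sm.families.NegativeBinomial(),
).fit()
print(model.summary())   # a positive, significant year coefficient = increase over time
```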
So we do indeed see a difference between the sample sizes of studies from before the replication crisis and after.

We also wanted to look at the difference between papers with and without open science badges. For that we can only use papers from 2019, and only from the Journal of Experimental Social Psychology and Psychological Science, because those journals award badges. If we combine the two journals, the median sample size of the papers without a badge is 160, and for the ones with a badge it is 190. So it is a little higher, but if we test it with a Wilcoxon rank-sum test, the difference is not significant. We also checked the two journals separately: we did not find a significant effect for the Journal of Experimental Social Psychology, but we did find an effect for Psychological Science.

So, to conclude: we see some increase in sample size over time, which might be a reaction to the replication crisis and the reforms that followed it. This increase seems to be general and not dependent on open science practices at the journal level. And we see mixed results regarding the difference in sample size between papers with and without open science badges. Of course, other explanations could also be involved; it might be that papers with larger sample sizes investigate topics for which it is easier to get those larger sample sizes, and also easier to get the badges. Those were my conclusions. I just want to say thanks to you, and also to Jake, who helped with setting up the study and gave more information about the original data collection, and to Yvonne, who helped me with collecting all the sample sizes from the papers.

Great, thanks again. Okay, so we have a few questions coming in. One: you mentioned the problem that within-subjects designs generally have smaller sample sizes; could the sample size per study condition be a fair metric to compare studies regardless of their experimental design? Yeah, that's a good point. I only presented the results for the total sample sizes here, but when studies compared different conditions, we also collected the individual group sizes, so we are going to look into that as well. As you say, it is important to take the research design into account, especially if you want to say something about the power of a study. But that is still something we need to do.

Great, there are a few more questions. I think what we'll do is move on for now and revisit these at the end of the session, just to make sure we get through in time. Unfortunately our third speaker can't be here today, but we got the next best thing: he sent along a video. So next we'll hear from Robbie about a project assessing errors in COVID preprints. We can go ahead and play that video. Okay, here we go; do shout if this isn't working for any reason.

Welcome everyone. Unfortunately I couldn't present in person due to other obligations, which is why I pre-recorded this presentation. It is about a registered report that we are currently working on, about statistical inconsistencies in COVID-19 preprints.
We are still working on this project, which means we are currently collecting data, so I cannot share any results with you yet, but I can tell you something about the setup of the study and what we are planning to do. This is work mainly together with Michèle Nuijten, but also with other members of our research group at Tilburg University.

The study is about the quality of COVID-19 research, because we know there is currently some sort of information explosion with respect to COVID-19: a lot of research on COVID-19 gets published now. We also know that there are special fast-track review procedures to be able to publish studies on COVID-19 very quickly, and that these studies are more often shared prior to publication, for instance as a preprint. The question we are interested in is: does this high-speed science negatively influence the quality of the research? Based on the seminal work of John Ioannidis in 2005, we know that if there are financial or other interests, this lowers the likelihood of a finding being true, and in the case of COVID-19 research there are definitely financial and other interests that may play a role. Another factor is how hot a research field is, and in COVID-19 research there are many different teams involved, so this factor is also playing a role here. We also know, from studies that have already been conducted on the methodological quality of COVID-19 research, that the methodological quality could be evaluated as high in only 41 percent of the COVID-19 studies; this was determined by applying standard quality checklists. If you compare that 41 percent to the control group, 73 percent of the studies in the control group were of high methodological quality. So this already indicates, together with some other research that I do not have time to discuss right now, that the quality of studies seems to be lower in COVID-19 research than in non-COVID-19 research.

This motivated us to look at another indicator of research quality, namely the statistical reporting, because incorrect reporting of a statistical result might lower the confidence in a study: if you see a study with a statistical inconsistency in it, you might put less trust in the study in general. Examples of statistical inconsistencies are, for instance, a percentage that does not match the reported number of events and the total sample size.
For instance, if it is stated that seven out of 100 participants were infected by the coronavirus, but it is also reported that this is 5 percent, then that is a statistical inconsistency. Another type of inconsistency is when an odds ratio is reported together with a two-by-two frequency table and the two are not in line with each other: if the odds ratio recomputed from the two-by-two table does not match the reported odds ratio, that is also a statistical inconsistency. The hypothesis we want to test is whether the prevalence of statistical reporting inconsistencies differs between COVID-19 preprints and matched non-COVID-19 preprints.

The good thing is that we have a population of studies, because we are looking at preprints posted between January 2020 and the end of January 2021 on the preprint servers medRxiv and bioRxiv. The reason we focus on these preprints is, first of all, that they play a central role in disseminating research on COVID-19, and they can also be easily located since they are posted on these preprint servers. So we really have a population, and that is important because we can draw a random sample from it. We draw a stratified random sample from this population of preprints, using as a stratum the number of authors, because you can imagine that if multiple authors are involved, people might check whether the statistical results are reported consistently and correctly, whereas in a single-author paper there is no co-author who can check these statistics. In the sampling procedure we also use the subject category, which is assigned to each preprint by the preprint servers, and we take into account the date a preprint was posted. Once we have a stratified random sample of the COVID-19 preprints, we randomly select a matching non-COVID-19 preprint for each, which will serve as a control group. By doing this we are essentially conducting a natural experiment, with an experimental group and a control group, both based on existing groups.
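As a minimal sketch of the two example checks just described (not the project's actual extraction protocol), the snippet below recomputes a percentage from a count and a total, and an odds ratio from a two-by-two table, and flags mismatches with the reported values; the numbers and tolerances are illustrative assumptions.

```python
# Minimal sketch of two consistency checks: reported percentage vs. count/total,
# and reported odds ratio vs. a 2x2 frequency table. All values are made up.

def check_percentage(events: int, total: int, reported_pct: float, tol: float = 0.5) -> bool:
    """True if the reported percentage matches the recomputed one within `tol` points."""
    return abs(100 * events / total - reported_pct) <= tol

def check_odds_ratio(a: int, b: int, c: int, d: int, reported_or: float, tol: float = 0.05) -> bool:
    """True if the reported odds ratio matches the one recomputed from the
    2x2 table [[a, b], [c, d]] within a relative tolerance."""
    recomputed = (a * d) / (b * c)
    return abs(recomputed - reported_or) / recomputed <= tol

# The talk's example: 7 out of 100 reported as 5% is flagged as inconsistent.
print(check_percentage(7, 100, 5.0))            # False -> inconsistent
print(check_odds_ratio(20, 80, 10, 90, 2.25))   # True  -> consistent (OR = 2.25)
```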
We will extract a number of statistics from these preprints using a protocol. We will look at percentages versus the number of events or cases; at test properties, for instance the accuracy, sensitivity, and specificity of a test; we will check whether the total sample size is in line with the subgroup sample sizes, so if subgroup sample sizes are reported, whether they sum to the total sample size; we will check whether the marginal values in a frequency table match the values in the cells; we will compare p-values with test statistics and degrees of freedom to see whether these are in line; and finally we will also compute effect sizes based on dichotomous data when a frequency table is available, so based on a frequency table we recompute, for instance, the odds ratio and compare it with the reported effect size.

We are very thankful that we got some funding for this project from Tilburg University. We used this funding to hire two research assistants, and the idea is that they extract data from 2,400 preprints in total: 1,200 preprints on COVID-19 research and 1,200 preprints not on COVID-19 research. They are currently collecting these data and are about halfway, so we need a bit more time for the data collection. Prior to starting the data collection we also did a power analysis, in the sense that we tried to figure out what effect size we could detect with 80 percent power. Because the funding is limited, we knew we could include approximately 2,400 preprints in our sample, and with that we can detect an odds ratio of 1.38 with 80 percent power.

Once the data are collected, the next step is to apply the automated scripts we have written for checking statistical inconsistencies to all the data, and if inconsistencies are detected, we will double-check them by hand: we go to these preprints and check whether the flagged inconsistencies are indeed inconsistencies. In the next step we will fit a logistic multilevel model with, as dependent variable, whether a statistical result is consistent or not, and as independent variable whether a preprint is about COVID-19 or not. We will then run a frequentist hypothesis test at an alpha level of 0.05, and we will also compute a Bayes factor that is comparable to the frequentist hypothesis test. We will repeat this analysis including some control variables: the number of authors, the number of extracted statistics per preprint, and the date a preprint was posted.
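A rough sketch of the planned confirmatory test, simplified to a single-level logistic regression (the registered report specifies a multilevel model with results nested in preprints, plus a Bayes factor); the file and column names are hypothetical.

```python
# Simplified sketch of the confirmatory analysis: does the consistency of an
# extracted statistical result depend on whether the preprint is about COVID-19?
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("extracted_results.csv")   # hypothetical columns: consistent (0/1), covid (0/1)

model = smf.logit("consistent ~ covid", data=df).fit()
print(model.summary())   # test the 'covid' coefficient against alpha = 0.05
```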
We set up the study as a registered report, which is also the reason why it took a bit longer to start the data collection: we first had to pass the first stage of the registered report, where the introduction and method section are reviewed. This was done, we got reviews from six different reviewers, and in the end the proposal was accepted as a stage one registered report at Royal Society Open Science. Since we already have all the materials and the scripts, we hope that stage two can be finished quite easily after all the data are collected; we basically only need to run the scripts and write up the results, so we hope this will be a quite straightforward process.

We also worked on a closely related side project, in which we want to enrich the preprints by posting short reports about the consistency or inconsistency of the statistical results in a preprint. Such a report can, for instance, be added to the preprint via the preprint server, because we can post it there as a comment. A report like this adds value to a preprint because it notifies the authors and the readers whether there are any statistical inconsistencies, and what I think is very good about this is that if inconsistencies are observed in a preprint, they can hopefully still be fixed before the preprint turns into a publication. These reports were developed together with a research master's student at our university, Hong-Wai South, who spent a lot of time developing and improving them. Thank you for your attention. Unfortunately I cannot answer questions in person now, but if you have any questions or remarks, feel free to send me an email via this email address over here.

Yep, so that's it, and I pasted Robbie's email into the chat, so if you have questions you can either put them in the Q&A and I'll get them to him, or you can just email him. Then for the final talk, this is kind of like that search for a Jeopardy host: I have in fact selected myself, and I'll be talking about some privacy risks associated with open data. Let me go ahead and share my screen here. Okay, hopefully that's all set up. First, because I'm still working from home at the moment, I do need to apologize for my co-worker: he has gotten very spoiled and does not like to be shut out of the office, so you may end up hearing some grumbles in the background. Hopefully that will all be fine.

We'll be talking about data privacy in openly available data sets, and I think this is an important topic at the moment because we are seeing a rapid uptick in the sharing of open data. Other sessions, I suspect, will address this more closely, but multiple metrics are pointing this way, from OSF usage to signatories to the TOP Guidelines. This is of course a great development; it is undoubtedly good for science, and the pros greatly outweigh the cons. I say that even after doing this project, but I do think it presents some new avenues for risk that need to be managed. For example, there is a dramatic demonstration that it is surprisingly easy to identify individuals using metadata: one study found that the vast majority of American voters could be identified with only three pieces of metadata, namely zip code, gender, and date of birth. So if you are not mindful of this, you can end up sharing data that is far more identifying than you intended.

So we ended up doing this project. The sampling frame is a little disjointed because we are merging separate projects, and really we started this because we wanted to use these open data sets ourselves and were concerned that if we sampled a whole bunch, we would end up re-sharing personally identifying information; we didn't want to be responsible for privacy violations ourselves. We ended up with a sample of three journals and a few different years for each, and then we took almost all of the articles with open data from that sampling frame. We had two coders go through each data set and assess it for re-identification risk and for the sensitivity of the data.
Re-identification risk is whether you could identify an individual, and sensitivity is whether the data touch on sensitive topics that someone would feasibly not want to be shared. When we did detect problems, we informed the authors, and in many cases these have already been corrected. This was actually quite a bright point of the project: authors were usually not happy to hear that they had shared potentially identifying information, but they were very fast to correct it and very thankful that we had alerted them.

So how did we go about assessing re-identification risk? We looked to the HIPAA Safe Harbor guidelines as a starting point and tried to classify each data set into three categories. HIPAA has specific variables that need to be removed for a data set to be considered anonymous; there we are looking at concrete things like name, email, IP address, initials, birth date, and zip code. There is a larger list, but for us these are the most relevant, and we categorized data sets containing them as high risk. We categorized an additional set as some risk: these were data sets with such rich demographics that you could feasibly combine them and identify at least some of the individuals in the data, especially if there were outliers, for instance an outlier on age within a certain university major.

We also coded for sensitive data. For this we looked to the GDPR, because it also lays out specific topics: things like racial or ethnic origin (I'll have more to say about this in a minute), political, religious, or philosophical opinions, health-related data, where we include mental health since these are largely psychological data sets, and data about sex life or sexual orientation. Again we coded these into three categories: if the GDPR says it is sensitive, we categorized it as sensitive; if it did not fall under the GDPR categories but we thought a common person might still consider it sensitive, we categorized it as possibly sensitive; and if there was nothing sensitive at all, that is of course its own category.

Each of these gray squares represents a data set, and we'll jump right into the results for data sensitivity. Overall, about a quarter of the data sets contained at least potentially sensitive data, and most of this was definitely sensitive according to the GDPR. That is not that surprising; a lot of us research sensitive topics, and it is not necessarily problematic as long as the data are appropriately anonymized before they are shared. Breaking down the reasons, the very first one is race or ethnic origin, which is frequently collected and shared in the U.S.
and in other areas, so I'll just go ahead and say I understand there will probably be some disagreement on that. If we simply ignore race, the share drops from 25 percent sensitive to 18 percent sensitive. After that, in order, we have political views; health items, where we would often see things like measures of depression and anxiety; then religious beliefs; and, to a lesser extent, sexual preferences or behaviors. Again, this is not necessarily problematic until we look at re-identification risk.

Here, ideally, all of these squares would remain gray, but unfortunately we see that about five percent pose a definite risk and an additional roughly five percent pose a potential risk that you could re-identify individual participants in these open data sets using publicly available information. Getting into the reasons: the light red bars here are all categorized as potential risk, and those are all cases where we deemed the demographics so rich that it was likely or possible that you could identify individuals by triangulating these variables. In terms of concrete HIPAA identifiers, IP addresses were quite common; they were the number one violation in HIPAA terms. We also saw date of birth fairly frequently, and then the flat-out identifying information, names, initials, full names; we did also see some of that. I'll note that for some of these, like IP address and date of birth, we can debate exactly how identifying they are on their own, but in almost all of the cases we observed, they were accompanied by rich demographics that would allow you to triangulate further down to an individual.

Now you may be thinking: well, Rick, you haven't demonstrated that there's a problem here, because maybe when someone has a sensitive data set they are particularly careful to de-identify it. Unfortunately, we see that that is not the case. If we narrow down to the 208 data sets we observed with at least some risk of re-identification, about half of those also contain sensitive data. So it is actually the opposite: the data sets with identifying information are much more likely than those without it to also contain sensitive data. This was about five percent of data sets overall, and I think these are the ones that are truly problematic.

So then the big question: should you care? I'm not going to tell you how to live your life. I'll tell you that the exact numbers here are debatable; they depend on your definitions, and there was not always agreement between our coders. And if you're a cybercriminal in the audience, are you licking your chops, just waiting to get in there and start exploiting these data? Probably not; there are probably softer targets to consider. On the other hand, if I were a participant, I would be pretty upset if I gave honest answers on a depression inventory and that was shared alongside my IP address. So I think there is some room for debate here. I would moreover say that we didn't identify any cases where the really identifying information, like IPs or names, was actually used in the analysis. In other cases, for instance, date of birth was sometimes used to derive an age, which becomes much less of a problem if you just convert it to age and share the age instead of the exact date of birth. So I would say we are not sacrificing any scientific utility by addressing this, either by removing some of these variables or by transforming them into less identifying versions.
This does get tricky sometimes, so I've put some resources here that you might refer to if you are in one of those trickier situations, but for the vast majority of cases we saw, simple changes to the data sets would have resulted in no loss of scientific utility and a great gain in making individuals much harder to identify.

Then on to solutions. I think the good news is that most of the cases we caught were fairly easy: if someone had just been aware when submitting and done an extra check of their data, they would have spotted the IP address column and said, we don't need that, we shouldn't share that. I think that's borne out by the fact that when we contacted people with these concerns, very often they just said, whoops, I'll correct that. So a little education and awareness can go a long way. This could simply take the form of a check box on a submission portal. A higher-reach goal would be having editorial staff who could check for these sorts of issues and, while they're at it, maybe also data documentation and code reproducibility; I realize that's a larger investment. A simple tip for your own research is to consider whether you are collecting unnecessary data that might be identifying. One of the major violators we saw, I think the majority of the IP addresses, came from Qualtrics data collections, which by default will collect the IP address and insert it into your data set whether you want it or not. You can turn that off, and if you're not going to use those data at all, just don't collect them in the first place.

The one I'm most excited about is that a lot of these violations follow semi-standard syntax that could be detected through regular expressions and the like. We actually submitted a grant about this, we're waiting to hear back, and we might be developing an open source tool that could automatically flag these before a researcher uploads their data: hey, we caught a few columns here, do you actually want to upload this column that appears to contain IP addresses? Of course, this wouldn't catch everything, especially the cases with triangulating demographics, where it's really a judgment call that would be more difficult to detect.
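A hedged sketch of that automated flagging idea, not the proposed tool itself: it scans a data frame for columns whose values look like IP addresses, emails, or dates using simple regular expressions; the patterns, threshold, and example column names are illustrative assumptions.

```python
# Illustrative sketch: flag columns that look like direct identifiers
# (IP addresses, emails, date-like values) before a data set is shared.
import re
import pandas as pd

PATTERNS = {
    "ip_address": re.compile(r"^\d{1,3}(\.\d{1,3}){3}$"),
    "email":      re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "date":       re.compile(r"^\d{1,4}[-/]\d{1,2}[-/]\d{1,4}$"),
}

def flag_identifying_columns(df: pd.DataFrame, threshold: float = 0.8) -> dict:
    """Return {column_name: pattern_name} for columns where most non-missing
    values match one of the identifier patterns."""
    flags = {}
    for col in df.columns:
        values = df[col].dropna().astype(str)
        if values.empty:
            continue
        for name, pattern in PATTERNS.items():
            share = values.map(lambda v: bool(pattern.match(v))).mean()
            if share >= threshold:
                flags[col] = name
                break
    return flags

# Example: catches the 'IPAddress' column that Qualtrics adds by default.
example = pd.DataFrame({"IPAddress": ["192.168.1.10", "10.0.0.3"], "score": [4, 5]})
print(flag_identifying_columns(example))   # {'IPAddress': 'ip_address'}
```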
So the takeaways for me: I think open data is absolutely worth it, and this shouldn't lessen our enthusiasm for it, but it is worth doing responsibly, and there are some simple, low-cost interventions that, if we implement them now, could really mitigate the risk going forward as we see increasingly more data being shared. So thanks, and at this point I'll take questions on my talk, and I think we can open it up to all of the presenters if anyone has new questions for them. Let me just get situated here.

Okay, maybe no questions as of yet. Here's one going back a bit: could you share the name of the standard you used to code the data? You mentioned it on an early slide but I missed it. Oh yeah, so this is HIPAA, I'll paste it in the chat, specifically the HIPAA Safe Harbor guidelines, which is a regulation in the United States that we used. And then I think there's a question for Marjan as well. Oh no, we already answered that one, okay. And I answered some of them in the text.

Okay, great. We just have a few minutes, so we can see if any questions come to light here, and if not, maybe we can give the west coasters a little time for a power nap before the next session and call it there. So thanks everyone for presenting. Again, if anyone has questions for Robbie, feel free to send those directly to him via email. And yeah, thanks for listening.