So many of you probably know me from doing things around IT security, but I'm going to surprise you by almost not talking about IT security today. Instead I want to ask the question: can we trust the scientific method? And I want to start with a quite simple example. When we do science, we start with a theory and then we try to test whether it's true, right? Now, I said I'm not going to talk about IT security, but I chose an example from, or kind of from, IT security. A while ago there was a post on Reddit with a picture from some book which claimed that a malachite crystal can protect you from computer viruses. To me that doesn't sound very plausible: these are crystals, and this book claims that if you put them on your computer, they protect you from malware. But of course, if we really want to know, we could do a study on this. And if you say people don't do studies on crazy things, that's wrong; people do studies on homeopathy and all kinds of things that are completely implausible. So we can do a study on this, and what we will do is a randomized controlled trial, which is the gold standard for testing these kinds of questions. This is our question: do malachite crystals prevent malware infections? And this is our study design: we take a group of maybe 20 computer users and split them randomly into two groups. One group gets one of these crystals and is told to put it on their desk or on their computer. The other group is our control group. That's very important, because if we want to know whether the crystals help, we need another group to compare against, and to rule out any kind of placebo effect.
We give the control group a fake malachite crystal, so we can compare the two groups against each other. Then we wait for maybe six months and check how many malware infections each group had. Now, I didn't actually do that study, but I simulated it with a Python script. Given that I don't believe this theory is true, I simulated it with random data. I'm not going to go through the whole script; I'm simply assuming each person can have between zero and three malware infections, totally at random. Then I compare the two groups and calculate something called a p-value, which is a very common thing in science whenever you do statistics. A p-value is a bit technical, but it's the probability that, if there is no effect, you would still get a result at least this extreme. Put another way: in an idealized world, if you run 20 such tests, one of them will be a false positive, meaning it says something happens although it doesn't. In many fields of science a p-value of 0.05 is considered significant, which corresponds to that one error in 20 studies, but, as I said, only under idealized conditions. And since it's a script that runs in less than a second, I just ran it 20 times instead of once. So here are my 20 simulated studies. Most of them look not very interesting: a few random variations, but nothing significant. Except this one study. It says the people with the malachite crystal had on average 1.8 malware infections and the people with the fake crystal had 0.8, so actually the crystal made things worse. But this result is significant, because it has a p-value of 0.03. So of course we can publish that, assuming I had really done these studies. And the other studies? We just forget about them. They were not interesting, right?
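The original script wasn't shown, so here is a minimal sketch of what such a simulation could look like. The permutation test is my own choice for computing the p-value; the talk doesn't say which test the real script used:

```python
import random
from statistics import mean

def simulate_study(n_per_group=10, seed=None):
    """One simulated trial: infections are pure noise (0-3 per user),
    regardless of whether the user got a real or a fake crystal."""
    rng = random.Random(seed)
    crystal = [rng.randint(0, 3) for _ in range(n_per_group)]
    control = [rng.randint(0, 3) for _ in range(n_per_group)]
    return crystal, control

def permutation_p_value(a, b, rounds=10_000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        if abs(mean(pooled[:len(a)]) - mean(pooled[len(a):])) >= observed:
            hits += 1
    return hits / rounds

# Run the "study" 20 times; at alpha = 0.05 we expect
# roughly one significant-looking result by chance alone.
for i in range(20):
    crystal, control = simulate_study(seed=i)
    p = permutation_p_value(crystal, control)
    flag = "  <-- 'significant'!" if p < 0.05 else ""
    print(f"study {i:2d}: crystal {mean(crystal):.1f}, "
          f"control {mean(control):.1f}, p = {p:.2f}{flag}")
```

The exact output depends on the seeds, but the arithmetic behind the expectation is simple: the chance of at least one false positive in 20 independent null studies is 1 − 0.95²⁰ ≈ 64%, so running the script repeatedly will usually produce at least one "publishable" fluke.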
I mean, who cares about non-significant results? Okay. So you have just seen that I created a significant result out of random data, and that's concerning, because in science you can really do that. This phenomenon is called publication bias. What's happening here is that you do studies, and if they get a positive result, meaning you see an effect, you publish them, and if there's no effect, you just forget about them. We learned earlier that a p-value threshold of 0.05 means one in 20 null studies is a false positive, but you usually don't see the studies that are not significant, because they don't get published. And you may wonder: what's stopping a scientist from doing exactly this? What's stopping a scientist from running experiments until one of them looks like a real result, although it's just a random fluke? The concerning answer is: usually nothing. And this is not just a made-up example. I want to give you an example that had quite some impact and that was researched very well, and that is research on antidepressants, so-called SSRIs. In 2008 there was a study about this.
The interesting situation here was that the US Food and Drug Administration, the authority that decides whether a medical drug can be put on the market, had knowledge of all the studies that had been done to register these medications. Some researchers looked at that and compared it with what had been published. They found there were 38 studies showing that these medications had a real effect, real improvements for patients, and of those 38 studies, 37 got published. But there were also 36 studies saying these medications don't really have an effect, that they are not really better than a placebo. Of those, only 14 got published, and even of those 14, there were 11 where the researchers found the result had been spun in a way that makes it sound like the medications do something. So there was a whole bunch of studies that simply were not published because they had a negative result. And it's clear that if you look only at the published studies and ignore the unpublished studies with negative results, these medications look much better than they really are. It's not like the earlier example:
There is a real effect from antidepressants, but they are not as good as people believed in the past. So we have learned that, in theory, with publication bias you can create a result out of nothing. But if you're a researcher with a theory that isn't true, and you really want to publish something about it, that's not very efficient: you'd have to do 20 studies on average to get one of these random results that looks real. There are more efficient ways to get a result from nothing. When you're doing a study, there are a lot of micro-decisions you have to make. For example, you may have dropouts, people who move away or whom you can no longer reach, so they're no longer part of your study, and there are different ways to handle that. You may have corner-case results where you're not entirely sure whether something counts as an effect, and you have to decide exactly how to measure. You may also be looking at different outcomes: maybe there are several tests you can run on people. And you may control for certain variables: do you analyze men and women separately, or split by age? So there are many decisions you can make while doing a study, and of course each of them has a small effect on the result. Very often, just by trying all the combinations, you will get a p-value that looks statistically significant, although there is no real effect. There's a term for this: p-hacking.
You're just adjusting your methods long enough that you get a significant result. I'd like to point out that this is usually not a scientist saying "okay, today I'm going to p-hack my result, because I know my theory is wrong but I want to show it's true." It's a subconscious process. Usually scientists honestly believe in their theories; they honestly think their theory is true and that their research will show it. So they may subconsciously say "okay, if I analyze my data like this, it looks a bit better, so I'll do that," and thereby p-hack themselves into a result that isn't really there. Again we can ask: what is stopping scientists from p-hacking? And the concerning answer is the same: usually nothing. So I come to the conclusion that the scientific method, as practiced, is a way to create evidence for whatever theory you like, no matter whether it's true or not. You might say that's a pretty bold thing to say, and I'm saying it even though I'm not a scientist, just some hacker. But I'm not alone in this. There's a paper by a famous researcher, John Ioannidis, titled "Why Most Published Research Findings Are False." He published it in 2005, and if you look at the title, he doesn't really question that most research findings are false; he only wants to give reasons why this is the case. He makes some very plausible assumptions, considering that many negative results don't get published and that there will be some bias, and comes to a very plausible conclusion that this is indeed the case. And this is not even very controversial if you ask people who do what you could call science on science, or meta-science, people who study scientific methodology.
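To make the p-hacking mechanism concrete, here is a toy sketch. The z-test and the "analysis variants" are my own illustrative choices, not from the talk: the same pure-noise dataset is analyzed several defensible-looking ways, and whichever variant gives the smallest p-value gets reported.

```python
import random
from math import erfc, sqrt
from statistics import mean, stdev

def p_value(a, b):
    """Crude two-sided z-test on the difference of group means."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return erfc(abs(mean(a) - mean(b)) / se / sqrt(2))

rng = random.Random(0)
treat = [rng.gauss(0, 1) for _ in range(20)]  # no real effect anywhere:
ctrl = [rng.gauss(0, 1) for _ in range(20)]   # both groups are pure noise

# "Researcher degrees of freedom": several analyses that each look
# defensible on their own. Report only the best-looking one.
variants = {
    "all data": (treat, ctrl),
    "exclude 'outliers' |x| > 2": ([x for x in treat if abs(x) <= 2],
                                   [x for x in ctrl if abs(x) <= 2]),
    "exclude 'outliers' |x| > 1": ([x for x in treat if abs(x) <= 1],
                                   [x for x in ctrl if abs(x) <= 1]),
    "ignore late 'dropouts'": (treat[:12], ctrl[:12]),
}
for name, (a, b) in variants.items():
    print(f"{name:28s} p = {p_value(a, b):.3f}")
print("reported:", min(variants, key=lambda k: p_value(*variants[k])))
```

Each individual choice here is one a reviewer might accept, which is exactly why the selection among them is so hard to spot after the fact.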
They will tell you: yeah, of course that's the case. Some will even say: that's just how science works, that's what we expect. But I find it concerning, and if you take it seriously, it means that when you read about a study in a newspaper, the default assumption should be that it's not true, while we usually tend to assume the opposite. And if science is a method to create evidence for whatever you like, think about something really crazy: can people see into the future? Does our mind have some extrasensory perception that lets us sense things that will happen in an hour? There was a psychologist called Daryl Bem who thought this is the case, and he published a study on it titled "Feeling the Future." He did a lot of experiments where he did something and then something else happened later, and he thought he had statistical evidence that what happened later influenced what happened earlier. I don't think that's very plausible, based on what we know about the universe. But it was published in a real psychology journal, and a lot of things were wrong with this study; basically, it's a very nice example of p-hacking. There is even a book chapter by Daryl Bem where he describes something that basically looks like p-hacking and presents it as how you do psychology. And yet the study was absolutely in line with the existing standards in experimental psychology, and a lot of people found that concerning. Because if you can show that precognition is real, that you can see into the future, then what else can you show, and how can we trust any results? Psychology has debated this a lot in the past couple of years.
So there's a lot of talk about a replication crisis in psychology. Many effects that psychologists thought were true turned out not to hold: when they tried to repeat the experiments, they couldn't get the same results, even though entire subfields were built on them. I want to show you an example, one that is not discussed so much. There's a theory called moral licensing, and the idea is that if you do something good, or something you think is good, you later behave like an asshole, because you think: I already did something good, now I don't have to be so nice anymore. There were some famous studies with the theory that people who consume organic food later become more judgmental and less nice to their peers. But just last week, someone tried to replicate the original experiments. They tried three times, with more subjects and better research methodology, and they totally couldn't find the effect. What you've seen here is lots of media articles about the original finding; I have not found a single article reporting that it could not be replicated. Maybe those will come, but this is just a very recent example. Now I want to give you a small warning, because you may think: these psychologists, it all sounds very fishy, they even believe in precognition. But maybe your field is not much better; maybe you just don't know it yet, because nobody has started replicating studies in your field. There are other fields with replication problems, some much worse. For example, the pharma company Amgen published something in 2012 where they said they had tried to replicate cancer research, preclinical research, that is, experiments in a petri dish or on animals, not drugs on humans, but what happens before you develop a drug. They were unable to replicate 47 out of 53 studies, and these were, they said, landmark studies, studies
that had been published in the best journals. Now, there are a few problems with this publication, because they did not publish their replications and did not tell us which studies they could not replicate. In the meantime, I think they have published three of these replications, but most of it remains in the dark. That points to another problem: they said they could only do this work by collaborating with the original researchers, and the original researchers only agreed on the condition that the results would not be published. It still sounds very concerning. But some fields don't have a replication problem simply because nobody is trying to replicate previous results, and then you will never know whether your results hold up. So what can be done about all this? Fundamentally, I think the core issue is that the scientific process is tied to its results. We do a study, and only afterwards do we decide whether it gets published; we do a study, and only once we have the data do we decide how to analyze it. Essentially, we need to decouple the scientific process from its results. One way of doing that is pre-registration. Before you start a study, you register it in a public register and say: I'm going to do a study on this medication, or on this psychological effect, and this is how I'm going to do it. Later on, people can check whether you really did that. This is more or less standard practice in medical drug trials. The summary is: it does not work very well, but it's better than nothing. The problem is mostly enforcement: people register a study and then don't publish it, and nothing happens to them, even though they are legally required to publish it. There are two campaigns I'd like to point out.
There's the AllTrials campaign, which was started by Ben Goldacre, a doctor from the UK, and they demand that every trial done on a medication should be published. There's also a project by the same person, the COMPare project, which checks: if a medical trial was registered and later published, did they do what they said, or did they change something in their protocol? And was there a reason for the change, or did they just change it to get a result they otherwise wouldn't get? Then again, these issues in medicine get a lot of attention, and for good reason, because if you have bad science in medicine, people die, and that's pretty immediate and pretty massive. But when you read about this, you always have to remember that drug trials at least have pre-registration; most scientific fields don't bother doing anything like that. So whenever you hear about publication bias in medicine, you should always think: the same thing happens in many fields of science, and usually nobody is doing anything about it. Particularly to this audience, I'd like to say that there's currently a big trend of people from computer science wanting to revolutionize medicine with big data and machine learning, which in principle is okay. But I know a lot of people in medicine are very worried about this, and the reason is that these computer science people don't have the scientific standards that people in medicine expect. They might say: we don't really need to do a study on this, it's obvious that it helps. That is worrying, and I come from computer science myself, so I very well understand why people in medicine are worried about it. There's an idea that goes even further than pre-registration, called registered reports.
A couple of years ago, some scientists wrote an open letter to the Guardian, which was published there. The idea is that you turn the scientific publication process upside down. If you want to do a study, the first thing you do with a registered report is submit your study design, your protocol, to the journal, and the journal decides whether to publish it before they see any result. That way you prevent publication bias: you prevent journals from only publishing the nice findings and ignoring the negative ones. Then you do the study, and it gets published, but independent of what the result was. There are of course other things you can do to improve science. There's a lot of talk about sharing data, sharing code, sharing methods, because if you want to replicate a study, it's much easier if you have access to all the details of how the original study was done. You could also do large-scale collaborations, because many studies are just too small: with 20 participants you just don't get a very reliable outcome. In many situations it would be better to get ten teams of scientists together, let them do one big study, and then you can answer a question reliably. Also, some people have proposed stricter statistical thresholds, since a p-value of 0.05 means practically nothing. There was recently a paper arguing we should just move the decimal point one place to the left, to 0.005, and that would already solve a lot of problems. In physics, for example, they have something called five sigma, which corresponds to a p-value of roughly 0.0000003, so physics has much stricter statistical thresholds. Now, whatever scientific field you work in, you may ask yourself: if we have statistical results, are they pre-registered in any way?
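As an aside, those thresholds translate into one another through the tail of the normal distribution. Assuming the one-sided convention commonly used in particle physics, five sigma corresponds to a p-value of about 3 × 10⁻⁷:

```python
from math import erfc, sqrt

def sigma_to_p(sigma, one_sided=True):
    """Normal-tail probability beyond `sigma` standard deviations."""
    tail = 0.5 * erfc(sigma / sqrt(2))
    return tail if one_sided else 2 * tail

for s in (2, 3, 5):
    print(f"{s} sigma -> p ~ {sigma_to_p(s):.1e}")
# five sigma is roughly p = 2.9e-07: about one false
# positive in 3.5 million tries, rather than one in 20
```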
Do we publish negative results, meaning we tested an effect and got nothing? And are there replications of our relevant results? I would say, if you answer all these questions with no, which I think many people will, then you're not really doing science. What you're doing is the alchemy of our time. Thank you very much. ... No, sorry, I have three more slides; that was not the finishing line. Another big issue is that there are bad incentives in science. A very standard way to evaluate the impact of science is citation counts: if your study is cited a lot, that's considered a good thing, and if your journal is cited a lot, that's a good thing; that's, for example, the impact factor. There are other measurements too, and universities also like publicity: if your study gets a lot of media reports, your press department likes you. These incentives tend to favor interesting results, but they don't favor correct results. And that is bad, because if we are realistic, most results are not that interesting. Most results will be: we had this interesting, counter-intuitive theory, and it's totally wrong. Then there's this idea that science is self-correcting. So if you confront scientists with these issues, with publication bias and p-hacking, surely they will immediately change; that's what scientists do, right? I want to quote something here; sorry, it's a bit long. "There is some evidence that in fields where statistical tests of significance are commonly used, research which yields non-significant results is not published." That sounds like publication bias. It also says: "Significant results published in these fields are seldom verified by independent replication." So it seems there's a replication problem. These wise words were written in 1959 by a statistician called Theodore Sterling. And because science is so self-correcting, in 1995 he complained: "This article presents evidence that published results of
scientific investigations are not a representative sample of all scientific studies. These results also indicate that practices leading to publication bias have not changed over a period of 30 years." And here we are in 2018, and publication bias is still a problem. So if science is self-correcting, it's pretty damn slow in correcting itself, right? Finally, I would like to ask you whether you are prepared for boring science. Ultimately, I think we have a choice between what I'd like to call TED-talk science and boring science. With TED-talk science we get mostly positive, surprising, interesting results: large effects, many citations, lots of media attention, and maybe a TED talk about it. Unfortunately, it's usually not true. I would like to propose boring science as the alternative: mostly negative results, pretty boring, small effects, but it may be closer to the truth. I would like to have boring science, but I know it's a pretty tough sell. Thanks for listening.

Moderator: Thank you. We don't have that much time for questions, three minutes. Question one, shoot.

Audience: I just wanted to comment, Hanno, you missed out a very critical topic here, which is the use of Bayesian probability. You conflated p-values with the scientific method, which I felt gave the rest of your talk a slightly unnecessary anti-science slant. P-values aren't the be-all and end-all of the scientific method. A p-value is, roughly, the probability that your data would occur given that the null hypothesis is true, whereas Bayesian probability calculates the probability that your hypothesis is true given the data. More and more scientists are slowly starting to realize that this is probably a better way of doing science than p-values. So this is perhaps a third alternative to your proposal of boring science: doing
Bayesian probability.

Moderator: Sorry, a brief reaction? We only have...

Speaker: I agree with you; unfortunately I only had half an hour here.

Moderator: Where are you going after this lecture? Can they find you somewhere, in a bar? You know, science is broken, but then, scientists... it's a little bit like the next lecture that's waiting there: you scratch my back and I scratch yours, for publications. Maybe two more minutes.

Audience: Yeah, hi, thank you for your talk. I'm curious: you've raised ways we can address this assuming good actors, assuming people who want to do better science and that this happens out of ignorance or willful ignorance. What do we do about bad actors? For example in the medical community: drug companies might really like the idea of being profitably incentivized to make an essentially placebo drug appear to do something in these randomized controlled trials. How do we begin to address people trying to maliciously p-hack, or to maliciously abuse the pre-registration system?

Speaker: I mean, it's a big question, right? But I think if the standards confine you so much that there's not much room to cheat, that's the way out. And also, I don't think deliberate cheating is that much of a problem; I really think the bigger problem is people who honestly believe that what they do is true.

Moderator: Okay, one last question. You, sir, please.

Audience: The value in science is often the count of publications, the count of citations, and so on. So is it true that to improve the situation you described, journals should impose higher standards? Are the journals the ones who must raise the bar, who should enforce publication of protocols before accepting papers, and so on? Is it the journals who should work on that, or can we regular scientists also do something?
Speaker: I mean, you can publish in the journals that have better standards, right? There are journals that have these registered reports. But of course, as a single scientist it's always difficult, because you're playing in a system that has all these wrong incentives.

Moderator: Okay guys, that's it, we have to shut down. There is a reference, better science org, go there. And one last request: give a really warm applause.