People like stories. When things happen to us, we like to construct an explanation, some narrative. We're very unwilling to admit it might just have been chance, just coincidence. Maybe we evolved to do this, but to me it suggests we really need mathematical help to distinguish signal from noise.

Here's a map of London. Let's pretend that each grain of rice is the occurrence of some new disease. We've plotted them on the map, but we know in fact that I've scattered these grains at random. Well, I've tried to. And yet we've got clusters appearing. We could look at these disease occurrences and say, well, Willesden, Willesden, a cluster in Willesden, there must be something going on there. And that would be deeply misleading. Events that occur completely at random do tend to cluster, and that's why we need rigorous statistical analysis to detect when a cluster is real and when it's just due to chance.

Some great statistical papers have appeared in the Philosophical Transactions of the Royal Society. A real classic was Sir Ronald Fisher's 1922 paper, in which he essentially laid the foundations of modern statistical inference. One of Fisher's great contributions was to identify a measure of the compatibility between a set of data and a pre-formed hypothesis. This is the p-value: the probability of observing data as extreme as this if the original hypothesis were true. If that p-value is small, we can say that either something very surprising has occurred or the original hypothesis can be rejected.

Take the Higgs boson. The scientists required very high levels of confidence before they concluded it was there: the p-value was less than 1 in 3 million. That's partly because it was a very important claim, but also because they had searched so hard for it. We need to make sure that claimed discoveries are not the result of cherry-picking fortuitously extreme results.

The published scientific literature might not tell the whole story. We need to know the trail by which the conclusions were reached. So I'm off to talk to Nicole Yance, who studies the reproducibility of research.

So Nicole, what do you get your students to do?

My students take a published paper and try to reproduce the results. They download the data, redo all the analysis, and try to get exactly the same tables and exactly the same figures. Most of the students get very frustrated; they can hardly ever reproduce the same figures.

So can we trust what we read in the scientific literature?

Unfortunately, not all of it. At one extreme there is fraud, where authors have actually manipulated data. There are honest mistakes as well, and cases where you just feel the data haven't been handled very well, that the authors could do better. And then in the middle there are all kinds of things. For example, you get all kinds of results and say, oh, I'm just going to cherry-pick all the significant ones. If you test multiple hypotheses, you should report all the results. If you report only selectively, I think it's definitely misconduct.

And one of the problems is spotting this, of course, because how do you know what you're not being told? So how easy do you find it to get the full data from researchers?

Well, that is a bit of a problem. A lot of people are slightly afraid to give out their data, because then someone will come and check.

I think I'd be worried that somebody would find a mistake in my working as well.

I would be worried too. Yes, of course.
But I think, you know, if you work transparently, then at least I can always say I was open, and if you do find an error, yes, I'm human. That can happen.

I wouldn't want to trust a study where I can't see the data.

If you want to get published, you'd usually better have significant results backed up by all your figures and all your data. I had a paper rejected because it was a bunch of null findings, and I thought, yes, exactly, that's my result. It's still something that we learned from.

Of course we learn a lot from that, but it means that the story might not be newsworthy.

Most often, research might not be newsworthy in itself, but it helps science anyway. Journals should really take a stronger role in asking authors to upload their data and in telling their reviewers that a null finding is not a reason to reject a paper. So we should work against publication bias. But I also think that at the moment there aren't many courses in the UK that ask students to replicate published work. It's such a good way for students to learn statistics, to do it better, and to see when research really is reproducible and when it really isn't.

It's great that the analysis of individual studies is being checked, and even better if independent teams repeat the entire study. But that's not the whole problem. We know that lots of studies are done and never get published, and that's why we need open data. We need access to the totality of evidence, all the studies that are done, not just the ones that receive prominent publication.
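The cherry-picking problem raised in the interview can be made concrete with a small simulation. The sketch below is not from the programme; the number of tests, the group sizes, and the choice of a two-sample t-test are illustrative assumptions. It shows that even when every null hypothesis is true, testing at p < 0.05 reliably throws up a few "significant" results by chance alone.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulate 20 independent "studies" in which the null hypothesis is true:
# both groups are drawn from the same normal distribution, so any
# "significant" difference is pure chance.
n_tests, n_per_group = 20, 30
p_values = []
for _ in range(n_tests):
    a = rng.normal(0, 1, n_per_group)
    b = rng.normal(0, 1, n_per_group)
    _, p = stats.ttest_ind(a, b)
    p_values.append(p)

significant = [p for p in p_values if p < 0.05]
print(f"{len(significant)} of {n_tests} true-null tests are 'significant' at p < 0.05")
# On average about 1 in 20 true-null tests crosses the 0.05 threshold,
# so reporting only the significant ones paints a misleading picture.
```

Reporting only the handful of small p-values from such a run would look like a discovery, which is exactly the selective reporting described above.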
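Publication bias can be sketched in the same spirit. Under the assumption, mine for illustration only, that studies are published only when they reach p < 0.05, the published record systematically overstates a small true effect, which is why access to the totality of evidence matters.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Many small studies of the same modest true effect (difference of 0.2).
true_effect, n_per_group, n_studies = 0.2, 25, 1000
all_estimates, published = [], []
for _ in range(n_studies):
    treated = rng.normal(true_effect, 1, n_per_group)
    control = rng.normal(0.0, 1, n_per_group)
    estimate = treated.mean() - control.mean()
    _, p = stats.ttest_ind(treated, control)
    all_estimates.append(estimate)
    if p < 0.05:  # only "significant" studies get published in this sketch
        published.append(estimate)

print(f"mean estimated effect, all studies:     {np.mean(all_estimates):.2f}")
print(f"mean estimated effect, published only:  {np.mean(published):.2f}")
# The published-only average sits well above the true effect of 0.2,
# because only the studies with fortuitously large estimates cleared
# the significance threshold.
```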