So, welcome everyone. It's a great pleasure to introduce Professor Shinichi Nakagawa, who is visiting us in the framework of the Theoretical Sciences Visiting Program. Professor Nakagawa is a professor at the University of New South Wales in Sydney, where he leads the Interdisciplinary Ecology and Evolution Lab. His main expertise is in meta-analysis, the science of answering questions by combining a large number of different experimental studies. One core strand of his activity has been to use this technique in behavioral ecology, but his productivity is rather impressive: he runs projects in a vast number of directions, and he will tell us a bit about this work today. So, Shinichi, it's a pleasure to have you. Thank you very much, Simone. What a privilege to be here, and what a wonderful place you have. I'm Japanese originally, born in Nagano, and this is my image of paradise: you can see the coral reef and you can work as much as you want. Wonderful. So I'm going to tell you about meta-research and meta-analysis in ecology and evolution, a vast topic, but let's see how I go. First, I'd like to acknowledge my group members in Sydney, who are probably missing me by now, and our funding source, the ARC. There are two parts to this talk. First: are we in a reproducibility or replication crisis in ecology and evolution? So what is a reproducibility or replication crisis? I use these terms interchangeably in this talk. The first time I heard the term "reproducibility crisis" was in a paper in Nature, back in 2012, about ten years ago. Researchers tried to replicate 53 landmark cancer studies, and about 90% of them they couldn't replicate. We're not talking about behavioral ecology studies here; we're talking about landmark cancer studies. That's pretty shocking, isn't it?
A few years later, psychologists tried to replicate 100 key experimental studies in their field, and about two-thirds of them didn't replicate. That paper in Science is, I guess, what made the reproducibility or replication crisis famous. If everything replicated, then in this plot of original versus replicated effect sizes, all 100 studies should lie on the one-to-one line. But as you can see, many replication effect sizes were much smaller, and two-thirds didn't replicate. So psychology is certainly in a replication crisis. A few years after that, there was a survey in Nature; they asked about 1,600 scientists across different disciplines: are we in a crisis? And here is the answer: 52% said yes, we are in a significant crisis, we need to do something about it, and another 38% said we are in a slight crisis, whatever that means. Altogether it's quite serious: about 90% of us think we are in a crisis and have to do something about it. Personally, I'm interested in this question: are we in a replication crisis in ecology and evolution? We saw this large multi-lab replication effort in psychology; the same thing was done in economics and in computer science, and they produced similar results. Some fields fared better than others, but lots of results didn't replicate. We are yet to do such a large replication effort in ecology and evolution, and we have a very good reason we haven't done so. In ecology we deal with something like nine million species in the world, with all of that biodiversity. For example, my colleague studies a long-term system in the Seychelles; how are we going to replicate that? That's not going to be easy. However, multiple lines of evidence suggest we are probably in a replication crisis in ecology and evolution.
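To make that one-to-one-line picture concrete, here is a minimal simulation sketch, with made-up numbers rather than the real psychology data, of how publication bias alone produces this shrinkage: if only significant positive originals get published, the published effects overestimate the truth, while unbiased replications don't.

```python
import random
import statistics

random.seed(42)

TRUE_EFFECT = 0.2   # modest true effect (hypothetical units)
SE = 0.15           # standard error of each study's estimate
N_STUDIES = 20_000

published_originals = []
replications = []

for _ in range(N_STUDIES):
    original = random.gauss(TRUE_EFFECT, SE)
    # Journals tend to accept only "significant" positive results:
    if original > 1.96 * SE:
        published_originals.append(original)
        # A replication is run regardless of outcome, so it is unbiased:
        replications.append(random.gauss(TRUE_EFFECT, SE))

mean_orig = statistics.mean(published_originals)
mean_rep = statistics.mean(replications)
print(f"mean published original effect: {mean_orig:.3f}")
print(f"mean replication effect:        {mean_rep:.3f}")
```

The published originals cluster well above the true effect, while replications sit around it, so the points fall below the one-to-one line even though nothing was done wrong in the replications.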
So, the first line of evidence: I did a survey with my colleagues several years ago. How many of you have heard of questionable research practices? Maybe none. Questionable research practices are what they sound like. There is selective reporting: not reporting everything, just reporting the significant results. There is p-hacking, which many of you have probably heard of: throwing in interactions and all sorts of things until you get p less than 0.05. There is hypothesizing after the results are known, HARKing; probably many of you have done this: you had an original hypothesis, but you saw the result and swapped the hypothesis around to fit it. And, on a more serious note, there is fraud: making up data. We surveyed nearly 800 ecologists and evolutionary biologists, and 64% of them said they had engaged in at least one of those questionable research practices. That's a majority of us. I was even told selective reporting was okay when I was a student; it's very, very common. And what would you guess for this one: what percentage of people have made up data? Any guesses? It's 3%. That's 24 people who actually admitted it; those are, I guess, what I'd call the honest ones, so the real percentage could be triple that. That's quite scary: maybe one in ten of us has made up data. And maybe there's a mathematician or physicist in this crowd thinking this has nothing to do with them. Actually, fraud happens in every single discipline. In mathematics, for example, you can make a proof so difficult that nobody can understand it, and reviewers too proud to admit they don't understand it let it get published. You can cheat anywhere. So there's a systemic problem in ecology and evolution and all other fields.
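Here is a small, hypothetical sketch of why p-hacking is so damaging: if you run many analyses of pure noise and report whichever one clears p < 0.05, the false-positive rate inflates far beyond the nominal 5%.

```python
import math
import random

random.seed(1)

def z_test_p(sample, sigma=1.0):
    """Two-sided p-value for H0: mean == 0, known sigma (simple z-test)."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

N_EXPERIMENTS = 2000
N_PER_OUTCOME = 30
N_OUTCOMES = 10  # the many analyses a p-hacker can try

honest_hits = 0
hacked_hits = 0
for _ in range(N_EXPERIMENTS):
    # All outcomes are pure noise: the null hypothesis is true everywhere.
    outcomes = [[random.gauss(0, 1) for _ in range(N_PER_OUTCOME)]
                for _ in range(N_OUTCOMES)]
    pvals = [z_test_p(o) for o in outcomes]
    honest_hits += pvals[0] < 0.05      # report the one pre-planned test
    hacked_hits += min(pvals) < 0.05    # report whichever test "worked"

print(f"honest false-positive rate: {honest_hits / N_EXPERIMENTS:.3f}")
print(f"hacked false-positive rate: {hacked_hits / N_EXPERIMENTS:.3f}")
```

With ten tries per experiment, roughly 1 - 0.95^10 ≈ 40% of pure-noise experiments yield a "significant" finding.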
Another line of evidence from ecology and evolution: my colleague Antica Culina's group from Croatia did a sort of meta-meta-analysis, putting together 10,000 studies to quantify what proportion of research in ecology is being wasted. They estimated that about half of studies are never published, 67% are poorly planned, and about 40% involve selective reporting. Their overall estimate is shocking: the optimistic estimate is that we are wasting 82% of our research effort, and the worst estimate is 89%. Now, I'm not here to say science is so bad that it's hopeless; science is good, and it's the best we have. But we can do much better. Interestingly, this came out in Nature Ecology and Evolution last year, and about ten years ago exactly the same exercise was done in medicine, where the estimate was 85%. Very similar, and I think it's common across different fields. Anyway, those are lines of evidence that maybe we're in a replication crisis, and this is where our study comes in, providing more evidence, I guess. Our question was: what is the impact of publication bias? I assume everybody knows publication bias: people trying to publish only positive results. What is the consequence of that for our knowledge of effect sizes, statistical power, and Type M and Type S errors? I probably need to explain what Type M and Type S errors are. You already know Type I and Type II errors, the false positive and the false negative. Type M is this: once you get a statistically significant result, how far away is your estimate from the true effect in magnitude? That's the magnitude error. Type S is a bit easier: once you get a significant result, what's the probability that you got the sign wrong? That's quite serious, isn't it?
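A minimal simulation sketch, with invented values for the true effect and standard error, shows how Type M and Type S errors behave in an underpowered study: conditioning on significance exaggerates the magnitude and occasionally flips the sign.

```python
import random

random.seed(7)

TRUE_EFFECT = 0.1   # small true (positive) effect
SE = 0.2            # large standard error -> badly underpowered study
N_SIM = 100_000

# Keep only the estimates that come out statistically significant:
significant = [est
               for est in (random.gauss(TRUE_EFFECT, SE) for _ in range(N_SIM))
               if abs(est) > 1.96 * SE]

power = len(significant) / N_SIM
type_m = sum(abs(e) for e in significant) / len(significant) / TRUE_EFFECT
type_s = sum(e < 0 for e in significant) / len(significant)

print(f"power:  {power:.2f}")    # chance of getting significance at all
print(f"Type M: {type_m:.1f}x")  # magnitude exaggeration, given significance
print(f"Type S: {type_s:.2f}")   # chance the sign is wrong, given significance
```

With these made-up settings, the rare significant results overstate the true magnitude several-fold, and a non-trivial fraction of them have the wrong sign, which is exactly the pattern described in the talk.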
You say you found a positive effect, but the true effect is actually negative. Maybe you're not familiar with Type M and Type S, and that's okay: a magnitude error and a sign error. We wanted to quantify those, but to do the power analysis, and to estimate the impact of publication bias, we need to know the true effect behind each phenomenon. How do we get that? The best way we could think of, and what people in other fields have done, is to take the meta-analytic mean, the overall mean of a meta-analysis, as a surrogate for the true effect. To do that, we need a lot of meta-analyses, more than 100, if we want to do this kind of meta-meta-analysis, so to say. We were lucky, because a couple of years ago we created PRISMA-EcoEvo, the Preferred Reporting Items for Systematic reviews and Meta-Analyses for ecology and evolution. It's a reporting guideline: if you do a meta-analysis in ecology and evolution, this is how you should report it. For that paper, we reviewed about 100 meta-analyses in ecology and evolution to assess reporting quality. Have you heard of PRISMA? The original PRISMA is a reporting guideline published in medicine in 2009, and I just checked: Google Scholar says it's been cited around 150,000 times. It's one of the most cited meta-analysis guidelines, because meta-analysis in medicine is serious: when you go to a GP, what they are referring to is the meta-analyses that medical researchers conduct, the latest evidence. So through that paper we had nearly 100 meta-analyses reviewed, and we used their meta-analytic means as our true effects. Of those roughly 100 papers, 87 meta-analyses were usable, and they used different kinds of effect sizes. Maybe some of you are not familiar with standardized mean differences.
Those compare a control group and an experimental group, using a standardized version of the mean difference. Sometimes people instead use the ratio of two means, the log response ratio, or Fisher's z-transformation of the correlation, for the relationship between two variables. We use standardized effect sizes in ecology and evolution because different studies use different units, so you need to standardize to put them all together; that's what meta-analysis does. So, those 87 meta-analyses all came with their data sets, which means we had the estimated effect sizes and standard errors from thousands of studies. The question we were asking is: what is the impact of publication bias on effect sizes and on all the statistical parameters I mentioned? For that we needed a method to quantify publication bias. A couple of years ago we reviewed how to detect and correct for publication bias; it's a review article, but it also contains a new method to quantify how much publication bias a particular meta-analysis has. We used that method, and here is the result: blue is the original effect size and yellow is the bias-corrected one. First, the impact of publication bias on effect size. Remember, we started with 87 meta-analyses; once we correct for publication bias, there is a 23% reduction in overall effect size. The original overall effect sizes from the 87 studies all go down after correcting for publication bias. On a more serious note: of the 87 meta-analyses, 37 were originally not significant, not different from zero, but of the 50 meta-analyses that had a significant result, 33 became non-significant after correction. We have this vast knowledge from meta-analyses, usually taken as the last word, the overall evidence, and a lot of them don't actually correct for publication bias.
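For the mechanics, here is a toy sketch, on invented numbers rather than our actual 87 data sets, of those standardized effect sizes and of one common way (DerSimonian-Laird random effects) to pool them into a meta-analytic mean with an I-squared heterogeneity estimate:

```python
import math

def smd(m1, m2, sd_pooled):
    """Standardized mean difference (Cohen's d): experimental vs control."""
    return (m1 - m2) / sd_pooled

def lnrr(m1, m2):
    """Log response ratio of two group means."""
    return math.log(m1 / m2)

def fisher_z(r):
    """Fisher's z-transformation of a correlation (variance ~ 1/(n-3))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects mean, with tau^2 and I^2."""
    w = [1 / v for v in variances]
    fixed_mean = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed_mean) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                    # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0    # share of true variability
    w_star = [1 / (v + tau2) for v in variances]
    mean = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    return mean, tau2, i2

# Toy pooling: five correlations with their sample sizes.
rs, ns = [0.2, 0.62, 0.35, 0.7, 0.5], [40, 55, 30, 80, 60]
zs = [fisher_z(r) for r in rs]
vs = [1 / (n - 3) for n in ns]
mean_z, tau2, i2 = random_effects_pool(zs, vs)
print(f"pooled Zr = {mean_z:.3f}, tau^2 = {tau2:.3f}, I^2 = {i2:.2f}")
```

Real analyses in the field use dedicated packages (e.g. metafor in R) with multilevel models; this sketch only shows the inverse-variance logic behind the pooled mean.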
And once you do correct for publication bias, we lose confidence in more than two-thirds of them. So what is the effect of publication bias on statistical power? You probably know that statistical power is about whether you have enough sample size to detect the true effect: if an effect exists, what's the probability you detect it? The average power is low: if you take a random ecological paper, on average it has only 23% power. That's pretty bad, because if there is an effect, you're detecting it less than half the time; a coin flip would do better. And it gets even worse after bias correction: only 15% average power. For Type M error: papers with significant results overestimate the true effect by nearly three times, and after publication-bias correction it's 4.4 times; it gets even worse. The same story with Type S, the probability of getting the sign wrong. It's about 5%, which is not a lot, but after bias correction, when a published result claims a significant positive effect, maybe one in ten times the true effect is negative. All of that is quite serious, and it's based on quite a lot of studies. So what is the solution? I've told you all the bad news, but there is some good news about how to remedy this situation, how to reduce publication bias. People think I'm crazy, but if you publish everything, that clearly fixes publication bias. And I'm not saying just randomly publish stuff. My guesstimate is that only about 50% of thesis chapters are ever published. So if everybody put their thesis chapters on a preprint server, like bioRxiv or arXiv... and my colleagues and I created EcoEvoRxiv, and some people say, we already have bioRxiv, why do we need EcoEvoRxiv?
Because with EcoEvoRxiv we wanted to raise awareness of preprints in our community, so that everybody uses preprints. Then, even if a paper is never accepted by a journal, it stays part of our scientific knowledge, preventing publication bias. We started EcoEvoRxiv three years ago; now we have over a thousand preprints and about 25,000 monthly accesses. That's not bad; that's great for the community, increasing knowledge. Another solution I want to talk about is Registered Reports. How many of you have heard of Registered Reports? Okay, one, two, a few. This is a cool format; just last week or so, even Nature started accepting Registered Reports. Normally you get reviewed once you've finished the study and written it up. With a Registered Report, you write the introduction and methods first, and that is what gets reviewed; you get comments, you improve the design, then you actually do the study, and the stage-two review is quite light, because reviewers are not assessing whether the results are exciting or statistically significant; they're assessing whether you actually did what you promised to do and whether the statistics are sound. A Registered Report is not to be confused with pre-registration or registration, though they are related. The interesting thing is this: in normal publications, and this is data from psychology, about 90% of papers support their hypothesis. So 90% of papers report positive results; you'd almost think you don't need to do the study at all, since your hypothesis is confirmed 90% of the time, so what's the point? But with Registered Reports that drops to about 40%. This really changes the way we do science, and negative results get published. The problem with current science is that we don't know what we don't know, because negative results are not published, so
we encourage everyone to do this, and in fact the study I just showed you, that meta-meta-analysis, is itself a Registered Report at BMC Biology, soon to come out in that journal. Another way is to change the culture itself. This is an idea we came up with at a workshop a couple of years ago, about how we do literature synthesis, like meta-analysis. What happens now is this: many of us, including myself, are empiricists, some are theorists, and we collect data; some of that effort translates into publications, but some of it never gets published. Then the synthesists, who might be empiricists too but could be different people, put all this published work together. So the synthesis is clearly biased, because the non-significant results, some of that effort, are not captured. In the future, that should change. What we propose is an open synthesis community: if you are working on a certain topic, say sexual selection, the behavioral ecology topic I have worked on, it was my PhD topic as well, we should set up a group, a network, and all the people working on it, regardless of whether they publish their primary research, which they aim to do anyway, would also be contributors to the synthesis. Even if they don't publish, their results go into the synthesis, which gets published with everyone together. That's what we call an open synthesis community. I'm waiting for a grant to support this idea, so watch this space, I guess. And here's an interesting thing from psychology, where I started; I'm nearly finished with the first part. I started off with this reproducibility crisis; in psychology, they try not to call it that anymore, because "crisis" sounds too serious, and we have too many crises anyway. They want to call it the
credibility revolution. Because once you identify the problem, you can fix it; that's the opportunity, isn't it? So we need to make the credibility revolution happen in ecology and evolution. To that end, a couple of years ago my colleagues and I created a new society, and I especially want young ecologists and evolutionary biologists to join. It's free or just $10, and I think $20 for senior people. They run a conference and a lot of workshops to support people, and there are lots of resources available. So I guess the conclusion of the first part of my talk is: join the credibility revolution, hard as that is to say, and change the way we do science in ecology and evolution. Leaving that first part, the second one is a little more uplifting, hopefully: the future of meta-analysis in ecology and evolution. I'd like to acknowledge my colleague at UNSW, Will Cornwell, a plant biologist actually, and Corey Callaghan, who was a PhD student at UNSW and has now moved to the University of Florida. It's really interesting working with Corey, because he has basically created this new field of using citizen science to answer many different questions, and I was very lucky to work with him. So I want to share a little of the story of working with Corey, and some other work as well. Citizen science is changing the way we collect biodiversity data, and many of you have probably used it if you are an ecologist or biodiversity person. The Global Biodiversity Information Facility, often called GBIF, is a meta-database with many different nodes; in Australia we have the Atlas of Living Australia, which we've been working with as well, and there are multiple regional nodes. Every day they're getting millions of records. So this is a huge opportunity to
utilize this data as a biologist or ecologist. One of those nodes across the world is eBird. With eBird, you go out bird watching; even our children can contribute. That's not my child, by the way. It's a mobile app you can get on your phone, and you go out bird watching, and this is what a checklist looks like. This is the important part; we'll come back to it, so remember this checklist. You record how many people were there, the effort time, how much ground you covered, the mileage, and then the species, say seven different species. The important thing is that you record how many individuals of each species you have seen. That's a checklist, and millions of people are filling them in. As an example, we used eBird to ask: how many birds are there in the world? The answer is about 50 billion birds, roughly six or seven birds per person. How did we do this? We used eBird data from across the world, combined with structured censuses, which are more reliable; we treated the census estimates as the truth, and used an extrapolation technique based on how easy each species is to detect: colour, flock size, body size, IUCN status. The 50 billion estimate comes with a large confidence interval, but what we achieved is an abundance estimate for each of 9,700 species individually. That's kind of amazing, what you can do with citizen science data; it was not possible even five years ago. But you may be thinking: that's not meta-analysis, that's just big data analysis. So now I'm going to tell you about the biggest meta-analysis I have ever done, or the world has ever done, using eBird data. Before that I need to set up some theory; this was supposed to be a theoretical talk, and I'm talking about meta-science stuff, but
you may or may not have heard of this. It's a bit like the second law of thermodynamics, which everybody knows, except it's the second law of macroecology, which probably nobody knows apart from ecologists. It's called the abundance-occupancy relationship. It relates how widely a species is spread to its local abundance; these are data from a paper on fish species, and what it says is that widely distributed species are also more abundant per unit area. Intuitive or counter-intuitive, I don't know, but the relationship is so robust that it gets called the second law, of macroecology, not thermodynamics. It's used in conservation: if a species is widespread, you don't need to worry so much about that particular species. And in fisheries: if a species of fish is widespread, you can harvest a lot more. So it's taken very seriously. It's also very theoretical: Hubbell's famous unified neutral theory of biodiversity predicts this relationship, which is another reason it's taken seriously. Of course, there has been a meta-analysis of this, already quite a while ago; they put nearly 300 data points together. This is called a forest plot, and what it shows is the z-transformed correlation; each point is one study's correlation, and they pooled all the available correlations and took the mean. The overall meta-analytic mean of the abundance-occupancy relationship is about 0.6. That is by far the strongest relationship I have ever seen in ecology and evolutionary biology; I mean, this is probably why it's called the second law, it's so strong, and supported by empirical evidence. However, there is a disregarded hypothesis. The easiest explanation of this second law of macroecology is sampling bias: widespread species are the easiest to see. So if you are
surveying a unit of space, you're going to find the widespread species first, while rarer ones may be densely populated in just one area. Under this hypothesis, the relationship would disappear if you surveyed each unit of space really exhaustively. But it has been disregarded, because the meta-analysis, lots of published work, supports the second law. Well, we just talked about publication bias. What if the whole ecological literature is tainted by publication bias? This is where the eBird data come in: when people go out bird watching, they are not worrying about publication bias; they just want to go out and see birds. So I'm going to use eBird data to test whether this second law of macroecology holds. How did we do that? It's based on nearly 8,000 species. Remember, I asked you to remember the checklist: you go out bird watching, you fill in your checklist, recording which species you saw and how many. So those are all the checklists we collected from eBird: some in the US, some in Europe, and here in Australia; it looks like Melbourne, but let's say Sydney, where maybe Corey is out collecting. From each checklist we get local abundance: this species was 10 individuals, another was 20, and the list has about 20 different species. And range size we know from GBIF data for each of those nearly 8,000 species. So we can calculate a correlation for each checklist: for each eBird checklist, we calculate the correlation between local abundance and range size, and transform it to Zr, basically Fisher's transformation, to make the statistics easy. And n is the number of species on the checklist: this one has 20, that one about 10; some people see about 200 species in a single trip, a good day for a bird watcher. And we can put all this
together with meta-analysis, pooling all the effect sizes. If you plot them, it should look like this: higher precision means the checklist is based on lots of bird species, while at lower precision, with few species, there's a lot of error. But as the list grows, the estimates should converge to the true effect size, and what we were expecting is a correlation of about 0.6, as estimated by the earlier meta-analysis. But what did we get? So, our meta-analysis using eBird: so far this makes sense; from each checklist we calculate a correlation, because we know local abundance from the checklist and range size from GBIF. And here is the result. You can see it's bang on zero. Over here at high precision are lists with around 200 species; down here are lists with only 10 species, where sampling error can produce any kind of correlation; but as sample size increases, it converges. We were expecting a correlation, a Zr, of about 0.6, but it's bang on zero, and that's based on 17 million effect sizes, which is why I claim this is the biggest meta-analysis, built on 3 billion observations of individual birds. The overall effect size is 0.015, and since we have so much data, it comes out statistically significantly positive. Are we supporting the second law? Not at all; I'll get to this. As you know, with a sample size this large, anything can be significant; it's meaningless. But the most important and interesting observation, the one that lets us nullify this second law of macroecology, is this number, I-squared. I-squared is the proportion of variation that is actual true variability; you would expect that, since this is from all across the world, the relationship might differ between Japan, Europe and Australia. But almost 86.5% of the variation you see in
this figure is actually sampling variance, differences in sample size, and this is the smallest I-squared I have ever observed in a meta-analysis. This tells me the abundance-occupancy relationship is a statistical artifact, even though it's in the textbooks. And you might remember the original meta-analysis that found the correlation of 0.6: they reported a fail-safe number, a kind of publication-bias test, indicating that more than half a million unpublished null results would be required to nullify their result. Well, that's okay, because we have 17 million, not just half a million. That's quite striking. There's one more thing: I said the 0.015 is significant. It's a bit hard to see, because there are 17 million data points, but this axis is log effort time: a log effort time of one is about three minutes, and five is about three hours. You can't really see it, so you'll have to believe me, but when observers put in less effort, the relationship is positive, which supports the sampling-bias hypothesis: with little effort you get the positive relationship, the second law of macroecology. But if you watch for about three hours, it completely disappears. So that 0.015 significant result becomes bang on zero, not significant, once you correct for effort time. Okay, time-wise I'm okay. So, future meta-analysis, the promised future of meta-analysis. The meta-analyses you usually see, whether you've done one or just read one, are literature-based. But we can go beyond literature-based meta-analysis. You can use archived raw data; this is actually becoming standard in medicine, where they call it individual participant data meta-analysis. And citizen science data, as I've shown you, can be used within the meta-analytic framework, and
also you can incorporate climatic data into meta-analysis. The example I want to share is quite suitable here, because right around this village you have coral. We've done a meta-analysis of coral disease prevalence and temperature, and this is an example of the data integration I'm advocating. You hear about coral bleaching being a really serious problem, but what's neglected is that corals are also infested with all sorts of diseases. We did a meta-analysis based on about 300 papers, and you can see that over the years the prevalence, the proportion of diseased corals, is increasing, in all three oceans: the Atlantic, the Indian, and the Pacific. That part was collected from the original data of 300 different studies. But you can also use the climatic data available online: you can get the average summer sea surface temperature for each particular study location and put it straight into the meta-analytic model. And you can see that not only does year predict prevalence; higher temperature also predicts disease. We made some predictions too, and the bad news is that by 2100 something like half of corals could be diseased. But we can still change things; 2100 is a while away. This is what I call data integration: not just literature-based data, but putting all sorts of data together to do meta-analysis. That's the future of meta-analysis. Ah, and what about big data? We've done some big data meta-analysis, and when I talk to my computer science colleagues, they say, oh Shinichi, meta-analysis is going to be obsolete, because we'll just do big data analysis, and that will be the mainstream. I actually disagree with that opinion. So, Michael Chang and Dana Jack, they are two
meta-analysts, I guess, and they proposed this kind of idea, which I'd also been thinking about: the split-analyse, meta-analyse approach. Basically, big data is too big, and it has structure, so you can split it into different chunks, maybe by year, by place, by trait; it's so heterogeneous. Then for each chunk of data you calculate an effect size, and you use meta-analytic methods to put the effect sizes together. By doing so you can use all the cool statistical tools people have been developing for meta-analysis. Meta-analysis is a big field: there are lots of medical statisticians working on it, social science statisticians working on it, so there's a very rich toolbox we can use. And suddenly big data doesn't feel like big data anymore, because we've chunked it into effect sizes, and we can do analyses that people, that ecologists, can understand. I'm nearly finishing my talk, and I want to share a couple of examples of this split-analyse, meta-analyse approach from our lab. This is the second-to-last slide, I think. Amazingly, there is data everywhere these days. There's this International Mouse Phenotyping Consortium, all online; you may never have heard of it, but it covers over 500 traits, crazy, and over 100,000 mice and counting, across a dozen or so institutions; I think Japan's included, plus the US and Europe. You can just download all that data and ask any question you like. We are evolutionary biologists, so we want to ask evolutionary questions. The original paper I saw using these data was about sexual dimorphism: male and female mice are different, but in which traits, and in how many traits? That was a Nature Communications paper, and I thought, oh, this is cool, I
can do something with this too. But what I wanted to know was not the mean difference; I wanted to know about sexual dimorphism in trait variability. The data are so vast that you can pull many traits, 500 of them; imagine, for each one, a male distribution and a female distribution. As evolutionary biologists we're interested in the variability of traits, so we compared male and female distributions within each trait, not the means but the spread: is there sexual dimorphism there? That was a paper in eLife. Then, using exactly the same data set, we recently published another one. It's interesting: the distributions are quite different, but how about the allometric relationships in males versus females? It turns out very few people have looked at sex differences in allometry. You may be wondering what allometry is and why it matters; actually, it becomes quite important for drug dosage. How do I explain this? Most allometric work on drug dosage is based on male allometry, because the studies only used male mice or male human subjects. So if the male and female allometric relationships between body weight and, say, metabolic capacity differ, and you use the male allometric slope to adjust your drug dosage, you get it wrong: you may be overdosing or underdosing females, and there are certainly examples of this. Our contribution, again using the split-analyse, meta-analyse approach, was to split by trait: for 500 different traits we calculated male and female effect sizes, compared them, and analysed them in a meta-analytic framework, so we could understand the male-female differences. It's probably the most systematic and the largest study comparing male-female allometric differences.
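The split-analyse, meta-analyse recipe behind these mouse examples can be sketched in a few lines. This is a toy illustration, not the published analysis: the trait names, summary statistics, and sample sizes are made up; lnCVR (the log ratio of male-to-female coefficients of variation, one standard effect size for comparing variability) is computed per trait; and the per-trait effects are then pooled with simple fixed-effect inverse-variance weights, whereas the real analyses use multilevel random-effects models.

```python
import numpy as np

# Made-up per-trait summary statistics -- (male mean, male SD, female mean,
# female SD) -- standing in for IMPC-style trait data, with n mice per sex.
traits = {
    "body_mass":  (30.2, 4.1, 24.8, 2.9),
    "heart_rate": (580.0, 62.0, 585.0, 55.0),
    "glucose":    (9.1, 1.8, 8.3, 1.4),
}
n_m = n_f = 400

effects, variances = [], []
for name, (mean_m, sd_m, mean_f, sd_f) in traits.items():
    # Split step: one effect size per trait.  lnCVR is the log ratio of
    # coefficients of variation; > 0 means males are proportionally more
    # variable for that trait.
    ln_cvr = np.log((sd_m / mean_m) / (sd_f / mean_f))
    # Approximate sampling variance (ignoring mean-SD correlation terms).
    v = (sd_m**2 / (n_m * mean_m**2) + 1 / (2 * (n_m - 1))
         + sd_f**2 / (n_f * mean_f**2) + 1 / (2 * (n_f - 1)))
    effects.append(ln_cvr)
    variances.append(v)

# Meta-analyse step: fixed-effect, inverse-variance pooling across traits.
w = 1.0 / np.array(variances)
pooled = float(np.sum(w * np.array(effects)) / np.sum(w))
se = float(np.sqrt(1.0 / np.sum(w)))
print(f"pooled lnCVR = {pooled:.3f} +/- {1.96 * se:.3f}")
```

The point of the design is that the "big" dataset never has to be modelled whole: each chunk reduces to an effect size and a sampling variance, and everything downstream is ordinary meta-analysis.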
Okay, the last take-home-message slide. Ecological and evolutionary studies are vastly underpowered, so we need to change something: we need to collaborate more, we need to do more meta-analysis, we need to increase sample sizes; there are a lot of ideas there. But we also need a credibility revolution. We managed to nullify, or, I don't know, it may get challenged, it's just one study, but we nullified or disproved a second law of macroecology, and I think that was largely due to publication bias. Maybe much of our knowledge is shaped by publication bias, and we need more studies looking into its effects. I think there's a great future in combining different types of data, data integration, with meta-analysis being one approach, and meta-analysis also has a critical role to play in the era of big data. With so much data, we're getting into a theory-poor, data-rich era, so this theory-focused visiting program, and OIST, are hugely welcome; meta-analysis can actually generate lots of new theories as well, which I'm going to work on here at OIST, I think. And the moral of this talk is that everybody should be doing a meta-analysis, so I'd like to thank you, future meta-analysts, for listening to my talk. Thank you.

Yeah, thanks, lots of food for thought. We have time for questions; please use the microphones on the tables so people on Zoom can listen. Any questions? Dave?

Hi, I'm really interested in this; I use eBird a lot, I'm an avid user, but I'm also a terrible user of eBird, in that I'll go out and just randomly add the warblers that I see and ignore everything else. That's kind of like a form of publication bias, right, people going out and ignoring a lot of birds? How do you deal with that in your study?
So, Dave, the short answer is we didn't; we assumed, I guess, that eBird represents presence-absence data. With most citizen science data, as you've pointed out, if it's just presence data, it's not as useful as presence-absence data, so this does have to be addressed seriously. I think Corey has worked on this, and I can't remember the details off the top of my head, but there's certainly work addressing the point. Just as publications have bias, citizen science has bias too, because lots of people will write down "kingfisher, this is so cool, I'm putting that in," so rare species may be overrepresented; we certainly agree with that. But presence-absence and, how do you call it, the ease-of-observation factor can be controlled for, and that's what we tried to do when we estimated, for all bird species, how many individuals there are; we tried to control for all those factors. Certainly, if you look at the data in our paper, the confidence intervals are huge, and there was a lot of criticism in the comments, like, "this is so useless, the confidence intervals are massive, you're not estimating anything." But we say this is a start, isn't it? Those data will get more robust, those issues will be sorted out in the future; maybe machine learning could correct some of the human behavioural biases. So I think it's a first step. I said there are 50 billion birds in the world; the point estimate is probably wrong, but the confidence interval is so big that I think the true number is somewhere in that region. Sorry, I haven't really answered your question, but I dodged it quite well, yeah.
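The "ease of observation" correction gestured at in this answer can be sketched with a toy simulation. Everything here is a made-up simplification: a single known per-individual detection probability, whereas real citizen-science analyses must estimate detectability itself. The idea is Horvitz-Thompson-style: a naive count is biased low by imperfect detection, and dividing by the detection probability recovers the truth.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical setup: 1000 individuals truly present, but a hard-to-observe
# species is detected with probability 0.3 per individual per survey.
true_n = 1000
p_detect = 0.3

counts = rng.binomial(true_n, p_detect, size=200)   # 200 simulated surveys

naive = counts.mean()                 # biased low: ~ true_n * p_detect
corrected = counts.mean() / p_detect  # detectability-corrected estimate

print(f"naive {naive:.0f}, corrected {corrected:.0f} (truth {true_n})")
```

The correction is only as good as the detectability estimate, which is exactly why biased recording behaviour (logging only the exciting species) inflates the uncertainty in studies like the one discussed here.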
Other questions? I can ask one. I'm curious about what's under the hood: for the first part, on publication bias, how do you estimate the impact of publication bias?

Okay, I actually had a slide on this, but I deleted it. To truly estimate publication bias you need a class of models called selection models, which are quite difficult to implement, but there's a very clever, simple regression-based method. Maybe I can use the chalk; let me try to explain. Here's the effect size on one axis, here's zero, and on the other axis the standard error, the sampling error. If publication bias is happening, the studies with larger standard errors show larger effect sizes, so the cloud of data slopes like this; if there's no publication bias, the cloud is flat. And here's the clever, simple part: a sampling error of zero corresponds to an infinite sample size, so we assume the regression line at zero gives the true effect. If you conduct a meta-analysis on a sloping cloud, you are overestimating the effect size, but if you fit this regression of effect size on sampling error, the intercept where the sampling error is zero is your estimate of the true effect size. I'm simplifying the method a little, but that's the idea: the original effect size is overestimated, and you correct for it. When I said "bias-corrected" earlier, I used this method. Of course, with any method, we don't know how much publication bias there really is, so we have to make lots of assumptions, but this method is really simple.
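The chalkboard explanation resembles what the literature calls Egger-style regression, or the PET estimator of Stanley and Doucouliagos: regress effect sizes on their standard errors with inverse-variance weights, and read off the intercept as the effect predicted for a study with zero sampling error. Here is a minimal simulated sketch; the numbers and the crude significance-filter model of publication are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate publication bias: the true effect is 0.2, but a study only gets
# "published" if it is statistically significant, so small studies (large
# standard errors) need inflated estimates to make it into the literature.
true_effect = 0.2
se_all = rng.uniform(0.05, 0.5, 2000)       # candidate studies' standard errors
est_all = rng.normal(true_effect, se_all)   # their observed effect sizes
published = est_all / se_all > 1.96         # crude significance filter
est, se = est_all[published], se_all[published]

# A naive inverse-variance-weighted meta-analytic mean is inflated...
w = 1.0 / se**2
naive = float(np.sum(w * est) / np.sum(w))

# ...so regress effect size on standard error (weighted least squares) and
# read off the intercept: the predicted effect for a study with SE = 0,
# i.e. an infinite sample size, as in the chalkboard explanation.
X = np.column_stack([np.ones_like(se), se])
beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * est))
corrected = float(beta[0])

print(f"naive: {naive:.2f}, regression-corrected: {corrected:.2f} "
      f"(truth {true_effect})")
```

Because the significance filter censors small effects more severely in noisy studies, the published cloud slopes upward, the naive pooled mean overshoots, and extrapolating the regression back to zero sampling error pulls the estimate toward the truth.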
Empirically, this method is quite well supported. There was a Nature Human Behaviour paper which looked at the difference between meta-analyses and replication efforts: for something like 30 different topics in psychology there was a huge replication effort, plus meta-analyses of the corresponding topics, and sure enough, the meta-analyses almost always overestimated the effect sizes, since the replication efforts almost always had smaller effects. Using this regression method, in about 90% of cases they recovered effect sizes very similar to the replication efforts, so the empirical evidence for the method is quite good. Other questions?

You're promoting registered reports, where, as far as I understand, you lay out your ideas and ask other people to peer-review them. Is there any mechanism that prevents reviewers from stealing your experimental design?

This is a great question, and when I was a PhD student, early in my career, I would have been paranoid about this, but for no reason: I think the probability of it happening is very small. I've been in this business for 17 years and it's only happened once. I'm not saying it won't happen, but scooping or idea theft is very rare. And actually, once your registered report is reviewed, or even before it's reviewed, you can put it on EcoEvoRxiv. People used to think that posting a preprint meant they'd be scooped, but it's completely the opposite: you're putting your precedence into the public arena, so it's less likely. If people are already doing similar work, they should continue, but when they see the preprint of your registration, they know you were first. You can preprint the stage-one report, just the introduction and methods; bioRxiv doesn't accept that, but EcoEvoRxiv does. And once it's there, what actually happens is
that people look at it and think, "oh, I had a similar idea, but I won't bother doing it now," or "this is so cool, I want to collaborate with you," and you get contacted by collaborators; that has happened nearly ten times in my career. When I'm doing a meta-analysis, I don't just post the preprint; I email lots of people saying I'm doing this meta-analysis and I need unpublished data, and then there's opportunity: "I've been collecting similar data, can we do it together?" and we do it together. So I think it's more positive. I've shown that the power of each individual study is so low that we have to change the way we do science, I think.

The problem is, who gets credit? That sort of collaboration isn't all roses; it's nice and cuddly, but some problems remain, and I don't have an answer. Who gets credit? Is first author, last author even meaningful? Actually, very recently, as food for thought: you know CRediT, the contributor-roles system for recording who did what, which people are starting to use? We are proposing a new method called MERIT, which will be published in Nature Communications soon, and that may answer some of this credit and acknowledgement question, the who-gets-credit issue. What does MERIT stand for? I'll leave it there. Any other questions? Tom?

Thank you for your exciting talk. You just mentioned small effect sizes, and I was wondering if you have any intuition for why effect sizes are so small. Are we asking the wrong questions, or is the field so advanced that all the big effect sizes are already done? What's going on there?
Excellent question, and the short answer is I believe the latter is what's happening: we are not asking the wrong questions, we are asking more sophisticated questions. A very good example is GWAS: the genes of large effect have all been identified, and we all agree on them; what GWAS studies now identify are small effects, but effects that are nonetheless very important. That's probably happening across all the sciences. Indeed, there's a very interesting paper in GigaScience, which I reviewed, published a few years ago: scraping publicly available data from the web, they looked at whether effect sizes are going down over the years, and indeed the effect sizes of main findings are declining. I wouldn't say we're asking the wrong questions; what we're interested in now is more sophisticated and detailed, yet still important. Let's think about it rationally: if an effect is really large, we all know it already; folk knowledge would contain it. If things were that obvious, remember some of the psychology examples, power poses and all those effects; if they were real and large, people would have figured them out long ago. So most of the effects we're after are actually small, but that doesn't necessarily mean they're not important.

That feels consistent with the recent news that breakthroughs are happening less and less frequently in science, right?
Yeah, but I think that makes it more exciting in a way; it's getting a bit harder, but there's certainly still opportunity. With publication bias, you find large apparent effects that aren't really true scientific effects, and there's opportunity in exposing that; I wouldn't rule out something really exciting waiting to happen. Maybe overturning a second law of macroecology, that's pretty big, isn't it? Yeah.

Do you want to ask one quick question, because the time is up?

Okay, maybe back to registered reports. Can registered reports work only for hypothesis-testing work, or also when you just want something more general, like observing the behaviour of a species to build up a body of data? Does it work in that case?

Yeah, this is a fairly common question I get, and my quick answer is yes, it would work for confirmatory and observational studies alike. Maybe in the narrow meaning of a registered report you have to state a clear hypothesis, but I think a registered report, or pre-registration, can work with just stating your aims: what you want to do, and uploading that. Still not many people do this, but in our lab almost everybody, regardless of whether the work is experimental or not, does a registration stating the aims. I'm also aware this is a lot of work, and you need to know a lot beforehand, because you have to write the introduction and methods, so you need to know quite a lot about the analysis in advance. So there's another project, which I haven't done yet, but which I will if I get the grant: modular registration. You could split the registration into three modules: a hypothesis-or-aims registration, a study-design registration,
and a statistics registration. At the moment the bar is so high for doing this kind of thing, so we want to create modular registration to really encourage it, including for observational work, because you always have an aim, yeah?

I like the idea. A related question: if everybody used this type of method, do you think the number of publications would increase or decrease?

Ah, this is a good question. I mean, the total number of publications always increases, so say per person, per researcher. I think an increase would be a welcome thing, because, as I said, my guess is that half of PhD theses are never published, not even on a preprint server, and that's a huge waste. As I've shown, that's maybe the most shocking case of publication bias today, and it's not the only such example in ecology and evolution; there's more to be found. So I think it's welcome if everybody publishes more; that's good, they're contributing, they're publishing their negative results. It is a lot of work for little incentive, so we need to change the incentive structure. That's another talk, I guess, but yes, I think an increase would be good; CVs would look good too, yes.

So our time is up; let's thank Shinichi again. He'll be around for a while, so if you have other questions, just knock on his door.