Hello, my name is Shinichi Nakagawa. I'd like to thank the organizers for inviting me and for organizing this awesome conference online. Today I'd like to tell you about the future of meta-analysis from my perspective as an evolutionary ecologist, or evolutionary biologist. First, I'd like to acknowledge my lab members at the University of New South Wales (UNSW) in Sydney, and my important co-authors and colleagues, Will Cornwell and Corey Callaghan. Corey used to be at UNSW but has now moved on to the University of Florida as a new assistant professor. Okay, first I'd like to tell you about meta-analysis beyond literature-based data and give you a really good example. You may be thinking of individual participant data (IPD) meta-analysis. It's close to that, but the idea goes beyond it. Citizen science is changing the way biologists collect biodiversity data. Many biologists, including evolutionary biologists, have heard of GBIF, the Global Biodiversity Information Facility. It is a meta-database, a database of databases, with many nodes across the world, and it collects millions of species observations every day. One of those databases is eBird. My kids contribute to eBird, and probably many of you listening do too. You can get the eBird mobile app and enter all the species you have seen on a birding trip. Importantly, you can record the time, how far you walked, and how many people participated. And most importantly, and I want you to remember this because it will be important later on, you record which species you have seen and how many individuals of each species you have seen. Based on this citizen science dataset, we estimated not the number of species but the number of individual birds in the world.
It turns out to be around 50 billion, roughly six birds for every person on Earth. This was done by Corey, myself, and Will. We used not only eBird data but also survey data to estimate not just the total number of birds but the number of birds per species, using a data imputation method based on traits such as how detectable a species is (for example, its colour), flock size, body size, and conservation status. This is pretty amazing to me; it would not have been possible without citizen science data. But this is not quite meta-analysis as you know it; it is a global data analysis. So let me tell you about a meta-analysis example from my lab. Before I get to that, I need to tell you about the second law of macroecology: the abundance-occupancy relationship. This figure is from an earlier published paper. On one axis is abundance, how many individuals you see, and on the other is how widespread a species is; each data point is a species. The idea of the abundance-occupancy relationship is that widely distributed species are more abundant per unit area. This is slightly counterintuitive, or intuitive, depending on the person, but it has applied implications for conservation and fisheries. It is sometimes used to argue that a widespread species is abundant per unit area, so you don't need to worry about it, or that you can expect to catch more fish of that species. So it's an important relationship. The reason it is called the second law of macroecology is that there has been a meta-analysis, a traditional literature-based one. It included nearly 300 effect sizes, and the mean correlation was nearly 0.6; the effect size on this axis is the Fisher's z transformation of the correlation coefficient. This is a funnel plot, although you might be used to seeing it rotated 90 degrees.
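The literature-based meta-analysis summarized correlations on Fisher's z scale. As a rough illustration (standard textbook formulas written in Python, not code from that study), the transformation, its usual large-sample sampling variance 1/(n - 3) for a correlation based on n pairs, and the back-transformation look like this:

```python
import math


def fisher_z(r: float) -> float:
    """Fisher's z transformation of a correlation r, valid for -1 < r < 1."""
    # atanh(r) is identical to 0.5 * ln((1 + r) / (1 - r))
    return math.atanh(r)


def fisher_z_variance(n: int) -> float:
    """Approximate sampling variance of z for a correlation based on n pairs."""
    if n <= 3:
        raise ValueError("n must be greater than 3")
    return 1.0 / (n - 3)


def z_to_r(z: float) -> float:
    """Back-transform a Fisher's z value to the correlation scale."""
    return math.tanh(z)


if __name__ == "__main__":
    z = fisher_z(0.6)  # a correlation of about 0.6, as in the literature-based meta-analysis
    print(f"z = {z:.3f}")  # z = 0.693
    print(f"variance for n = 50: {fisher_z_variance(50):.4f}")
    print(f"back-transformed r = {z_to_r(z):.2f}")  # back-transformed r = 0.60
```

Working on the z scale makes the sampling variance depend only on sample size, which is what allows correlations from studies of very different sizes to be weighted and averaged.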
What you can see is that this line is zero, and the densest cluster of data points is around 0.6. This was done quite a few years ago by Blackburn et al. It is by far the strongest relationship I've ever seen in ecology, which is why it's called the second law. However, if you read the related literature, there is a disregarded hypothesis. One explanation of the abundance-occupancy relationship is sampling bias, but the sampling bias hypothesis has been disregarded, or rejected, in the current literature. The idea is that widespread species are easy to observe precisely because they are widespread, so if you survey an area you see them first; but if you exhaustively observed one area, the relationship would disappear. Also, the published literature might suffer from publication bias: people survey an area and perhaps only publish strong correlations, so the overall effect in the meta-analysis might be biased, and 0.6 seems too high. So here comes citizen science data. How are we going to use it? There should be no publication bias if you use all of it, because citizens are not trying to publish: millions of people are collecting data, and they are not concerned about whether anything is significant. So this is what we did. Each of these data points is a checklist; there are millions of checklists here, covering nearly 8,000 species. These are example checklists from the US, Europe, and Australia. You can imagine Corey going out birding: on a good day he might record about 30 different species, and how many individuals of each.
From these checklists we can calculate the correlation between local abundance (not global abundance) and range size, because range size can be estimated from GBIF. For each checklist we have the abundance of each species, and we already know the range sizes of these nearly 8,000 species, so for each checklist we can calculate the correlation underlying the abundance-occupancy relationship. Then we can aggregate these correlations using meta-analysis. This is also a funnel plot, which I'll explain more in the next slide. The y-axis is precision, here the number of species on a checklist: the more species you record and the more effort you put in, the closer you get to the global mean. We were expecting this mean to be around 0.6, or somewhat smaller if publication bias had inflated the literature-based estimate. That's the meta-analysis we conducted, and here is the result. It's bang on zero. We were not expecting this at all, because this is the second law of macroecology. This meta-analysis was based on nearly 17 million correlations, which in turn were based on observations of three billion individual birds. The overall effect is about 0.015. Because we have nearly 17 million effect sizes, this turns out to be statistically significant, but, as I'll come back to, that significance is almost meaningless; the effect is very close to zero. What's most surprising, for those of you familiar with I², is that I² is extremely small; this is probably the smallest I² I've seen in an ecological meta-analysis, even though the data mix many different places and species. It indicates that almost all of the variation we see is sampling error.
This variation reflects differences in sample size. As you can see, as precision increases, the sample size, the number of species per checklist, increases; some checklists have a couple of hundred species. We excluded the smallest checklists; I think a checklist had to have at least about 12 species. If you observe only 12 species, you expect this correlation to vary a lot because of sampling error. That's why meta-analysis explicitly includes the sampling error variance to account for it. What is surprising is that almost all of the variation we see is due to sample size, which strongly indicates that this relationship must be close to zero. Remember, this is the second law. Blackburn et al.'s original meta-analysis conducted a fail-safe number test and claimed that more than half a million unpublished null results would be required to nullify an effect of this magnitude, a correlation of 0.6. That's okay: we have that covered, because we have 17 million effect sizes, not just half a million. The interesting thing is the sampling bias hypothesis I mentioned, which was disregarded in the literature. This plot shows survey effort time: a log effort of 1 is about three minutes and a log effort of 5 is about three hours. It's very hard to see with 17 million data points, but when the observation time is very short, the effect appears a little, and it goes completely to zero when you observe for three hours. So this is strong support for the sampling bias hypothesis, which nullifies the second law of macroecology.

I'm wrapping up this part of the talk: the future of meta-analysis is data integration, putting different types of data together. We already know literature-based meta-analysis. Now there is lots of archived raw data, and using raw data is IPD (individual participant data) meta-analysis. But now we can also use citizen science data.
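The talk does not say which fail-safe method Blackburn et al. used. Rosenthal's fail-safe N is the classic version, so the Python sketch below uses it as an assumption: each study's correlation is converted to a standard normal deviate Z = atanh(r) * sqrt(n - 3), and the function returns how many additional studies averaging Z = 0 would pull the combined (Stouffer) Z down to the one-tailed 5% threshold of 1.645. The function names and example inputs are hypothetical.

```python
import math


def correlation_to_z_score(r, n):
    """Standard normal deviate for a correlation r from n pairs, via Fisher's z."""
    return math.atanh(r) * math.sqrt(n - 3)


def rosenthal_fail_safe_n(z_scores, z_alpha=1.645):
    """Rosenthal's fail-safe N.

    Number of extra null studies (mean Z = 0) needed so that the Stouffer
    combined Z, sum(Z) / sqrt(k + N), falls to z_alpha or below.
    """
    k = len(z_scores)
    n_fs = sum(z_scores) ** 2 / z_alpha ** 2 - k
    return max(0, math.ceil(n_fs))


if __name__ == "__main__":
    # Hypothetical inputs: 300 studies, each with r = 0.6 estimated from 30 species
    z_scores = [correlation_to_z_score(0.6, 30) for _ in range(300)]
    print(rosenthal_fail_safe_n(z_scores))
```

With these made-up inputs the result is in the hundreds of thousands. Note that a fail-safe N only asks how many unpublished null studies would overturn the result; it says nothing about a bias shared by all studies, such as the sampling bias discussed in the talk.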
You can also combine different types of data, such as climate data. Let me quickly tell you about one example from our lab, a study by Sam Burke. You have probably heard that corals are affected by bleaching events. Besides bleaching, corals are also affected by various diseases. She collected about 200 papers on the frequency of coral disease over the last several decades, and they show that disease frequency is increasing. Not only that: she was also able to collect temperature data for these studies, which come from oceans all around the world, and she showed that temperature significantly predicts disease prevalence, the percentage of diseased corals. So this is what I mean by data integration.

The second part is big data and meta-analysis. There are a couple of parts to it, but they are shorter than the first part. When I talk to my computer science colleagues, they think meta-analysis will become obsolete: with big data, we will just visualize and analyze the data directly. I personally disagree, and so do Mike Cheung and Suzanne Jak, both meta-analysts. What they propose is this: rather than analyzing big data in one go, you divide it into chunks. Big data are often heterogeneous, so you can split them by place, by state, or by all sorts of other things. Then you calculate an effect size for each chunk and meta-analyze them. This is called the split/analyze/meta-analyze approach. We have used this approach, and it allowed us to analyze big data. The example big data we used is from the International Mouse Phenotyping Consortium.
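The split/analyze/meta-analyze idea described above can be sketched in a few functions. This is a minimal Python illustration under assumptions, not Cheung and Jak's implementation or our lab's code: the records, the "institution" chunking key, and the male-minus-female mean difference are hypothetical stand-ins, and the meta-analysis step uses DerSimonian-Laird random-effects pooling, one common choice among several.

```python
import math
import random
import statistics
from collections import defaultdict


def split(records, key):
    """Split: group records into chunks by a variable such as institution."""
    chunks = defaultdict(list)
    for rec in records:
        chunks[rec[key]].append(rec)
    return dict(chunks)


def analyze(chunk):
    """Analyze: male-minus-female mean difference and its sampling variance within one chunk."""
    males = [r["value"] for r in chunk if r["sex"] == "male"]
    females = [r["value"] for r in chunk if r["sex"] == "female"]
    diff = statistics.fmean(males) - statistics.fmean(females)
    var = statistics.variance(males) / len(males) + statistics.variance(females) / len(females)
    return diff, var


def meta_analyze(effects):
    """Meta-analyze: DerSimonian-Laird random-effects pooling of (estimate, variance) pairs."""
    w = [1.0 / v for _, v in effects]
    sw = sum(w)
    fixed = sum(wi * y for wi, (y, _) in zip(w, effects)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, (y, _) in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0  # between-chunk variance
    w_re = [1.0 / (v + tau2) for _, v in effects]
    pooled = sum(wi * y for wi, (y, _) in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2


if __name__ == "__main__":
    rng = random.Random(7)
    records = []
    for site in range(12):  # 12 hypothetical institutions
        site_effect = rng.gauss(1.0, 0.3)  # site-specific true sex difference
        for sex, shift in (("male", site_effect), ("female", 0.0)):
            records += [{"institution": site, "sex": sex, "value": rng.gauss(10.0 + shift, 2.0)}
                        for _ in range(500)]
    chunk_effects = [analyze(chunk) for chunk in split(records, "institution").values()]
    pooled, se, tau2 = meta_analyze(chunk_effects)
    print(f"pooled difference = {pooled:.2f} (SE {se:.2f}), tau^2 = {tau2:.3f}")
```

Each chunk is analyzed independently, so in practice the analyze step could run in parallel or on separate machines; only the small list of (estimate, variance) pairs needs to be brought together for the final meta-analysis.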
You may not be familiar with the International Mouse Phenotyping Consortium, but all of its data are available online: more than 500 traits from about 100,000 mice, both males and females, across 12 institutions around the globe. Using this dataset and the split/analyze/meta-analyze method, we looked at sexual dimorphism not in trait means but in trait variability, that is, whether the spread of a trait's distribution differs between the sexes, and the answer is certainly yes. In another study using the split/analyze/meta-analyze method, we looked at sex differences in allometry, and this is the study I want to tell you about. What is allometry? Say these are females and males. In the first scenario, they have exactly the same allometric relationship, a linear relationship on the log scale: as body size increases, a given trait increases, or, as body size increases, you move less, here measured as wheel running. In another scenario, the allometric relationship is the same but the overall means differ. In a third scenario, males and females have different allometry: the means differ and the slopes differ. So we can measure three things: the difference in means, the difference in slopes, which is allometry, and the difference in residual variability. Sexual dimorphism in means and in variability was covered in other studies, including the variability paper I just mentioned, so here we were most interested in differences in slopes. We used nearly 400 phenotypic traits and obtained an effect size for each, so here the split is by phenotypic trait, and we meta-analyzed them by functional group: we conducted nine meta-analyses using nearly 2 million data points from many mice. This is what it looks like. Most importantly, this panel is for slopes: it is a meta-analysis of the absolute differences between males and females, and what you need to look at is whether it is around zero.
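One way to turn "the slopes differ between the sexes" into an effect size per trait is sketched below. It is an assumption-laden Python illustration, not the published analysis: it fits separate ordinary least-squares regressions of log trait on log body mass for each sex and takes the male-minus-female slope difference, with variance equal to the sum of the two squared standard errors. The talk's meta-analysis used absolute differences, which need extra care because taking absolute values of noisy estimates biases them upward; that step is omitted here. The simulated masses and exponents are hypothetical.

```python
import math
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x, with its standard error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(sse / (n - 2) / sxx)


def allometric_slope_difference(male_mass, male_trait, female_mass, female_trait):
    """Male-minus-female difference in log-log allometric slopes and its sampling variance."""
    b_m, se_m = ols_slope([math.log(v) for v in male_mass], [math.log(v) for v in male_trait])
    b_f, se_f = ols_slope([math.log(v) for v in female_mass], [math.log(v) for v in female_trait])
    return b_m - b_f, se_m ** 2 + se_f ** 2


def simulate_sex(rng, n, exponent):
    """Hypothetical mice: trait = 0.1 * mass**exponent with multiplicative noise."""
    mass = [math.exp(rng.gauss(math.log(25.0), 0.15)) for _ in range(n)]
    trait = [0.1 * m ** exponent * math.exp(rng.gauss(0.0, 0.05)) for m in mass]
    return mass, trait


if __name__ == "__main__":
    rng = random.Random(3)
    male_mass, male_trait = simulate_sex(rng, 400, exponent=0.75)
    female_mass, female_trait = simulate_sex(rng, 400, exponent=0.90)
    diff, var = allometric_slope_difference(male_mass, male_trait, female_mass, female_trait)
    print(f"slope difference = {diff:.3f} (SE {math.sqrt(var):.3f})")
```

One (difference, variance) pair per trait is exactly the kind of input the meta-analysis step needs, which is how the split by phenotypic trait feeds into the nine meta-analyses by functional group.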
If it is around zero, males and females have similar slopes, but if it deviates significantly from zero, the slopes are quite different, and you can see that some trait groups, including behaviour, differ a lot between males and females. Let me explain the implications of this pattern. In many cases, for many traits though not all, the slopes differ. So what does it mean? Here is a scenario, illustrated with beautiful drawings by a member of my lab whom you saw on the acknowledgement slide. Suppose males and females were exactly the same in three of the many traits we looked at: fat tissue, renal clearance rate, and metabolic rate. If they were also the same size on average, you could give them the same dose of a drug, no problem. But that's not true: female mice are usually smaller, and people often assume perfect scaling, that these three traits scale with body size at the same rate in both sexes, as if females were just small males. In that case you could simply give two pills rather than three. But our study of allometric differences between the sexes indicates that you can't do that. Body size may scale proportionally, but traits that may relate to drug metabolism scale differently in males and females; their slopes differ. In such a case, if you just use the male scaling slope, you might give two pills, which would be an overdose for females, whereas if you understand the female-specific slope, you will give the right number of pills. This overdosing or underdosing of females happens in mice and in humans, because when drugs are tested, often only male mice or male human subjects are used, and we need to change this. So that's big data meta-analysis; hopefully I've convinced you it's a really useful approach.
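The pill example is an allometric scaling argument, and a few lines of arithmetic make it concrete. This Python sketch is purely illustrative: it assumes the required dose scales as body mass raised to an exponent, uses made-up body masses and exponents (not estimates from our study), and shows that borrowing the male "perfect scaling" exponent can give a different female dose than a female-specific exponent would.

```python
def scaled_dose(reference_dose, reference_mass, target_mass, exponent):
    """Allometric scaling: dose = reference_dose * (target_mass / reference_mass) ** exponent."""
    return reference_dose * (target_mass / reference_mass) ** exponent


if __name__ == "__main__":
    male_dose, male_mass, female_mass = 3.0, 30.0, 20.0  # hypothetical: 3 pills for a 30 g male
    # "Perfect scaling" borrowed from males (exponent 1): 2 pills
    print(f"male-based dose:      {scaled_dose(male_dose, male_mass, female_mass, 1.0):.1f} pills")
    # Hypothetical female-specific exponent of 1.7: about 1.5 pills, so 2 pills would overdose
    print(f"female-specific dose: {scaled_dose(male_dose, male_mass, female_mass, 1.7):.1f} pills")
```

The point is not the particular numbers but that the dose depends on the exponent, so a sex-specific slope changes the answer even when the body-mass ratio is the same.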
Now for the last section. The previous two sections were about using all sorts of data, not confined to literature-based data, but how we do meta-analysis is changing as well; here are just a couple of examples. This paper came out of the Evidence Synthesis Hackathon, a predecessor of this conference, and I was involved in it. In it we talked about a new ecosystem for evidence synthesis. Currently, in meta-analysis or evidence synthesis, there are empiricists and synthesists; they may be different people or the same people. What happens is that some empiricists don't publish, so their work is not translated into the primary literature, and we are only able to synthesize published primary research. This biases evidence synthesis, whether it is a systematic map, a meta-analysis, or a qualitative synthesis, because we are synthesizing a biased evidence base. What we propose for the future is that we should not just involve empiricists and synthesists; we should build a community of all the people working on a topic. Because they are all part of the community, they can contribute to the synthesis regardless of whether they publish primary literature, so we can synthesize all of the effort, and this will lead to an unbiased evidence base. We should also make it all open, with open data and open code, and we should use preprints, so that everything is open to the public and stakeholders. Finally, I'll quickly touch upon a hot topic that I wrote about on our lab's blog. I used ChatGPT to see whether I could use it for title and abstract screening, not full-text screening, and for one topic I got very, very good results. I was really impressed, so I wrote this blog post. I think that over the next five years AI will increase its presence in evidence synthesis.
In the near future, it may be able to do all of the screening and also the extraction of moderators and effect sizes. That might be difficult, but we could be surprised. So all of these things are changing, and a really exciting future is coming, I think. The take-home messages from my talk: the future is really bright, by combining different types of data, which I call data integration. Meta-analysis has a critical role to play in the era of big data. This is a data-rich era with little theory, and you can use meta-analysis to generate theory, so this will keep us meta-analysts very busy. Toward the end, we talked about community-based synthesis, and AI will change the way we summarize the evidence base. That's pretty exciting. Finally, I'd like to thank the meta-analysts in the audience and the future meta-analysts. I think everybody should do meta-analysis; that's the conclusion of this talk. Thank you very much for listening.