[Opening remarks unintelligible in the recording; the recoverable fragments mention James Wilson and Metascience 2023.]
We've got a good line-up of speakers for the next session, and first up, I'm particularly delighted that we've got this slightly longer opening talk from Pierre Azoulay, Professor of Management at the MIT Sloan School of Management and a research associate at the NBER. Pierre is going to talk to us on the fascinating title of "Who Stands on the Shoulders of Chinese Scientific Giants?" Pierre.
OK. All right. Good afternoon, everyone. It's siesta time. It's my job to wake people up. I'm going to do my best. This is joint work with Shumin Qiu, who's at the East China University of Science and Technology and once visited my group at MIT, and Claudia Steinwender, who's a former colleague at MIT and is now in Munich.
OK. So the broad motivation for this talk is the rise of Chinese science. And we can see it in many ways. We can certainly measure it in publications. We can measure it in the number of researchers. We can measure it in citations. We can measure it in the number of highly cited researchers. In some sense, we don't need a microscope to see that something has fundamentally changed in the global scientific landscape. But the follow-up question is to think about the extent to which researchers, in particular researchers outside China, actually end up building on the knowledge that originates within China. We want to ask how fertile China's research is, and to what extent Chinese scholars actually contribute to pushing the research frontier outward. So we're going to do a sort of thought experiment in this paper, which is the following.
We're going to think about two papers that were published at the same time in the same journal, and somehow, hocus pocus, we've made them of the same underlying quality. They have the same fertility potential. They're not going to be in the same narrow subfield, because we don't want them to compete for citations. And the only thing, in some sense, that's going to differ between those two papers is that one is going to come out of a lab in China, and the other is going to come out of a lab in a different country, say Germany or Korea or Canada, specifically not the US. And then we're going to think about the extent to which Chinese papers attract more or fewer citations from a frontier country, which is going to be the US. That's going to be our neutral ground. Now, why do we pick the US? Because the US is large, so it's going to mechanically generate a lot of citations, but also because it is a prominent country in chemistry research, which is the domain we're going to study.
Okay, in the spirit of "this is not a mystery novel", what are we going to find? Hopefully I'm going to be able to fill in some of the details. I'm going to show you that there's a very obdurate citation discount that Chinese papers receive. It's about 30% in magnitude, so it's not small. It's quite stable over time, and it's really unique to China, and I'll have any number of ways to demonstrate that. And then what we're after is trying to understand why this is happening. We run through a bunch of explanations. You could think that maybe there are still differences in underlying quality for those papers; we don't think that's the case. You might think it's not about quality as God would see it, but about perceived quality. We also don't think that's the case. You might think it's a story about animus against Chinese researchers.
We won't find evidence of that, and I think most of our evidence is going to point to a particular mechanism, which is reduced awareness and shallower networks that limit the access of Chinese researchers to the pool of potential citers in the US, okay? So this is a very, very brief overview, and now I'm hoping to have a little bit of time at least to give you the details. And I guess I can see how much time I have.
Okay, so we're focusing on chemistry. Why are we focusing on chemistry? Because this is a domain where China has multiple centuries, or maybe even millennia, of research tradition. And even if you're thinking about science today, chemistry is one domain where China is relatively more preeminent. Roughly, the way to think about this is that China has the second largest share of top researchers and publications in chemistry. So to the extent that there is catch-up to the frontier, we would think this is one domain where that has been the case.
Okay, so here's what we're going to do. We're going to look at very heavy publishers: the top 1% of publishers in a set of 31 elite chemistry journals. So it's the top 1% of researchers according to that metric, just the number of publications. And we're going to eliminate all the US PIs from that list. That leaves us about 750 researchers from all over the world except the US. And we're going to get up close and personal with all those scientists. We're going to collect their CVs. We're going to stalk them online, you know, like Google. We want to know everything about them: where they've trained, exactly what they've published, all their employment spells, et cetera. We're going to get all their publications and all the citation data in particular. We're going to parse the citations to understand where those citations come from, okay?
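
[Editorial illustration, not part of the talk.] The sample construction just described (rank PIs by publication count in the journal set, keep the top 1%, then drop US-based PIs) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: the function name `select_heavy_publishers`, the `(pi_name, pi_country)` input shape, the alphabetical tie-breaking, and the toy data are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def select_heavy_publishers(papers, top_share=0.01, excluded_country="US"):
    """Return the most prolific PIs, minus those based in `excluded_country`.

    `papers` is an iterable of (pi_name, pi_country) tuples, one tuple per
    publication in the journal set. The input shape and the alphabetical
    tie-breaking rule are assumptions made for this sketch.
    """
    papers = list(papers)
    counts = Counter(name for name, _ in papers)
    country = {name: c for name, c in papers}  # last country seen per PI
    # Rank PIs by publication count, most prolific first.
    ranked = sorted(counts, key=lambda name: (-counts[name], name))
    # Keep the top share first, then remove the excluded country.
    n_keep = max(1, round(len(ranked) * top_share))
    return [name for name in ranked[:n_keep] if country[name] != excluded_country]


# Made-up toy data: 200 PIs, two of them far more prolific than the rest.
toy_papers = (
    [("pi_cn", "CN")] * 50
    + [("pi_us", "US")] * 40
    + [(f"pi_{i:03d}", "DE") for i in range(198)]
)
heavy = select_heavy_publishers(toy_papers)
```

Note that the order of operations follows the talk: the 1% cut is taken over all PIs, and the US PIs are removed afterwards, so the final list is smaller than 1% of the population.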
Okay, and we are going to create a data set of paired papers. So every paper that originates in China is going to be matched, on a very long list of observable covariates, with a paper that originates somewhere else in the world. So once again: same journal, roughly the same number of authors, same age for the PI. And importantly, the "quality", but notice the scare quotes around the word quality. Meaning we'd like those two papers to have the same underlying potential. And the way we're going to do this is by using citations from the world excluding the US.
There's a little bit of an issue that I probably don't have the time to get into, which is that citations suffer from home bias. China is a very, very large country with lots and lots of researchers, so it generates a ton of citations just by virtue of its size. And in some sense, if we compare a German paper with 100 citations and a Chinese paper with 100 citations, not counting the US, there's a sense in which maybe we think differently about the citations that come to the research that originates in the smaller country. So anyway, what we're doing is that we home-de-bias the citations. And this measure of home-de-biased citations outside of the US is going to be our underlying potential covariate, and we're going to match the papers on that as well, okay?
I'm going to skip that because it gets into the nitty-gritty, but what we want to model is the probability that a particular citing paper cites a particular paper that's in our sample. And so what we also need are counterfactual citations: papers that were not cited but could have been cited. Where do we get those from?
We get those from the PubMed Related Citations algorithm, which is something that people use to search the literature and that's only based on keywords and text. Okay. And then we run a pretty simple linear probability model, which we saturate with fixed effects. The coefficients of interest are going to be the indicator variable for China and the interactions of the China variable with stuff. And these interactions are going to help us explore some of the mechanisms that might explain the results, okay? Hopefully, I haven't lost you yet.
So this is like a baseline model for our research in this matched dataset. This effect roughly corresponds to a discount for the Chinese papers relative to the papers from elsewhere of minus 25%. That strikes us as large in magnitude. There are missing citations, if you will. And so we want to explore channels. Could we make this thing go away by controlling for stuff? You can think about communication: there might be language barriers, for example, that penalize Chinese researchers more than others. There could be networks of varying depths. Maybe there's clustering in intellectual space across countries. Maybe Chinese researchers somehow specialize in some subfields of chemistry that are not very popular in the US, and that's why we find what we find. It's possible. Maybe it's an issue of reputation. Maybe somehow, for reasons we don't fully understand, research that originates in China is viewed as less reputable, and we can think of reasons why that might be. So our approach is to develop and measure variables that can capture some of those potential mechanisms. I don't want to go through everything, but the thing that I want to double-click on is this issue of networks. So we can measure a bunch of things.
For example, we can measure the extent to which the investigators in our sample have US training. And that turns out to matter quite a bit. And then the extent of your network in the US matters quite a bit as well. So, for example, if the citing author is the investigator's past collaborator, that can undo, in fact much more than undo, the discount that we found.
Now, there is one thing that I think is especially interesting, which we were not necessarily expecting. In this design, we make China very special, and then we compare China with the rest of the world outside the US. But if you think about it, this focus on China is entirely arbitrary. What if we made Germany pivotal instead? Or Japan? And the point is we can do that, and we have done that. What this allows us to see is that this discount is really China-specific. There is no Japan discount. There is no German discount; in fact, there is a German premium. And we can do that at least for the countries that are large enough.
Now, we went one step further and we created a fake China. And what is fake China? It's a country that's composed of all the investigators with Chinese names who are outside mainland China. We find them all over the world. We find them in Sweden and in Canada and in Singapore and in Taiwan and in Hong Kong. And we agglomerate them into one country, in some sense. If this were a country in our sample, it would be as big as Switzerland or Canada, so it's a pretty big country. And then we can make them pivotal and ask: is their research discounted relative to research in other parts of the world? And the answer is no. It is not discounted. The effect is approximately zero. So I think that narrows the range of explanations that are possible.
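
[Editorial illustration, not part of the talk.] The "make a different country pivotal" exercise can be sketched in a stripped-down form. With a single binary regressor and no controls, the slope of a linear probability model equals the difference in mean citation rates between the pivotal country and everyone else, so the sketch below computes that difference directly. It deliberately omits the fixed effects, the matching, and the interaction terms the speaker describes, and the function names and toy numbers are invented purely for illustration; they are not results from the paper.

```python
def citation_gap(pairs, pivot):
    """Difference in citation probability: pivotal country minus all others.

    `pairs` is an iterable of (origin_country, was_cited) tuples, one per
    candidate citing/cited pair (actual and counterfactual citations alike).
    Returns (gap, relative_gap), where relative_gap is the gap divided by
    the citation rate of the non-pivotal countries.
    """
    pairs = list(pairs)
    pivot_hits = [int(cited) for origin, cited in pairs if origin == pivot]
    other_hits = [int(cited) for origin, cited in pairs if origin != pivot]
    if not pivot_hits or not other_hits:
        raise ValueError("need observations for both the pivot and the others")
    p_pivot = sum(pivot_hits) / len(pivot_hits)
    p_other = sum(other_hits) / len(other_hits)
    gap = p_pivot - p_other
    relative_gap = gap / p_other if p_other else float("nan")
    return gap, relative_gap


def placebo_gaps(pairs, countries):
    """Repeat the comparison with each country in turn treated as pivotal."""
    pairs = list(pairs)
    return {country: citation_gap(pairs, country) for country in countries}


# Made-up toy data: (origin_country, was_cited) for candidate pairs.
toy_pairs = (
    [("CN", True)] * 3 + [("CN", False)] * 7
    + [("DE", True)] * 5 + [("DE", False)] * 5
    + [("JP", True)] * 4 + [("JP", False)] * 6
)
gaps = placebo_gaps(toy_pairs, ["CN", "DE", "JP"])
```

A "fake China" placebo would fit the same shape: relabel the origin of the relevant investigators' papers with a synthetic country code before calling `placebo_gaps`.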
In particular, I think it's really hard to think about this discount as, for example, reflecting animus that citers might have towards Chinese people as Chinese people. There's got to be something else going on. The other thing I might say is that this effect has been relatively stable over time. You might think that this has gotten worse recently, or that it has gotten better as China continues to make strides in research. And the truth is, as far as we can tell, it's been pretty stable since the early 21st century.
Let's see, how long do I have? This is not optimal. Oh, I'm out of time. I'm out of time. I should stop talking. What? I have seven. So that means I have two more minutes. I have seven more minutes to talk. This is like manna from heaven that I was not counting on. I was speeding up. This thing is totally unreliable, just to be clear. OK. All right.
So the question is this discount. Can we make it bigger? Can we make it smaller? Is it larger for some researchers and smaller for others? Once again, we can get at this by interacting our China variable with a bunch of covariates that can help us investigate the mechanisms responsible for the discount. So, for example, education matters a lot. We think that access to co-authorship networks matters a lot. We don't literally have measures for conferences, but one thing that might give you access to co-authorship networks and to conferences and to lots of things like that, and so is a proxy for them, is having done some of your training in the US. Here we're talking about either grad school or a postdoc.
But it's also important to talk about the stuff that we found not to matter. One thing, for example, that this community has been concerned with is replicability and scientific misconduct and things of that sort. And anecdotally, for sure, it's possible to find instances of scientific misconduct in China.
And there is this perception, maybe, that it is relatively more frequent in China than elsewhere. So one thing that we have done is to ask: is it the case that Chinese researchers operate and publish in fields that are relatively more affected by retractions, and that might therefore be perceived as being less reliable? I'm not talking about the ground truth behind those perceptions. We can measure this thanks to the PubMed Related Citations algorithm, which gives us a narrow subfield for each paper. So we ask: are there lots of retractions in this subfield, or not? And does that matter differentially for Chinese researchers relative to researchers from other countries? And the answer is no, it doesn't seem to matter differentially.
We also find that co-ethnic US researchers, so researchers with a Chinese name in the US or researchers with a German name in the US, seem to discount articles by Chinese investigators relatively less. Which I think also speaks to this question of access to the networks that allow research to diffuse. I think it's probably not super productive to go through regression tables, but they are available in the paper. There are lots of regression tables and lots of robustness analyses, and you should avail yourself of all of it if you are so inclined.
The other thing that I should say is that we also look at patents. Chemistry is an area where very often scientists can not only publish but patent as well, and we can look at citations to papers contained in patents. So we do that as well, and we find evidence of a very strong discount of about the same magnitude. So it's true for both citations in papers and citations in patents. And the last thing that you might think about is: do we care? Those missing citations, are they important in some sense?
Or is scientific progress going to unfold in exactly the same way whether those citations are present or not? One very limited window into this is to look at whether that discount applies to research that is published in high-impact journals versus journals that are less so. I know that in this community it's really hard, it's countercultural, to talk about a hierarchy of journals, but we know it exists. We know that's how people think. So we're trying to look at journals in different impact tiers, in some sense. What we find is that it's basically the same. It seems to cut across. So the missing citations come from super prominent and elite journals as well as less prominent and less elite journals. Which makes you think that maybe the pace, the direction of science in some sense, could be meaningfully affected by this discount, by those missing citations.
OK, so just to conclude. We know that there's been a rapid rise of China in research over the last two decades. That's potentially a big pool of scientists that has been added to the global scientific community, but just because you produce a lot of knowledge doesn't mean that others are necessarily going to be able to build on it. So we ask if research in China is as fertile as that of other countries, and it looks like the answer is no: there's a relative under-citing of Chinese science by US scientists, holding quality constant as best as we can. There are lots of caveats, and if you want to explore more of them, they are in the paper. There are lots of potential explanations. I think the one that best survives the battery of tests that we have performed is one where reduced awareness and shallower networks are the key story here.
In particular because we find that the discount is reduced for focal researchers who are returnees from the US, and even eliminated when you think about citing by co-ethnic PIs. Let me stop here.
Brilliant, Pierre. Thank you. Spot on time, and a fascinating presentation. We've got five minutes for questions. Yes.
Thank you for the presentation. Very interesting. Speaking to the network or awareness hypothesis, I don't know if you can observe undergraduate education. Maybe Chinese researchers who do their undergrad in the US are more connected, and that might allow you to extrapolate even further and say: okay, it doesn't matter whether researchers are Chinese or not, but foreigners who did not do their undergrad here may be as disconnected as your Chinese researchers.
No, I think that's totally... so we actually looked at this. For this generation of researchers, so for researchers who have grown prominent enough to be included in our sample, basically almost no one did their undergrad degree outside of China, or frankly outside of their home country. Going abroad means going to grad school. But I think for the next generation that's definitely something that could be relevant, and that someone else will have to look into.
We'll take one on this side.
Hi, super fascinating presentation. Dario Taraborelli, Chan Zuckerberg Initiative, but also wearing a separate hat as one of the co-founders of the Initiative for Open Citations. I'm curious if you've taken into account potential biases or issues related to the indexing of the literature coming from China in databases, and how the discoverability of these papers and the coverage of citations affect citation practices. We know that there are vast asymmetries in how citation data is covered for researchers of different geographies. So I'm curious if that's an element that you've taken into account.
Right.
So it's possible that there are things we have not taken into account, but I think our first-order answer to this concern is that we're focusing on citations in, I don't want to say US journals, those are international journals, but they're all published in English, and they've all been indexed for a very, very long time. So I guess what you might say is that if we were to include, for example, Chinese journals as well, one might think that the discount would be even heavier. I think that this focus on relatively elite journals, 31 journals plus the multidisciplinary journals, implies that we should have pretty consistent and universal coverage of the citation data.
Great. We've got time for these last two. Take you next.
Thank you. Neil Thakur from the ALS Association. I'm wondering if you have thoughts or advice for an organisation that wanted to make a large investment in capacity to conduct research, because that's what China did. They made a large, sustained investment. Then apparently they didn't get full value for that investment, according to this discount. Is that true? Is that an appropriate takeaway from your findings? And then what would be the advice for someone who wanted to grow a field or discipline quickly?
That's a great question. My thoughts have been more... I've been thinking more about the global social planner than the Chinese social planner in the case of this paper. My sense is that it's possible for Chinese policymakers to look at those results and to say, we don't care, because what we are optimizing is... China is large enough, in some sense, that what we want is to have enough of a vibrant community on the scale of this particular country, ignoring what the rest of the world thinks. I don't know how they think, but I can imagine that some of the policies that they've implemented are consistent with this slightly more parochial view.
The question that I'd like to answer is: what are we missing as a result of this under-citing? What are the roads not taken, in some sense? What are the discoveries not made as a result of this discount? I think that's still left open by the results I presented.
Great, and we'll just take one last question here.
Thanks very much. Wolfgang Gerson of Michigan State University. A comment first: there's a great study by Cassidy Sugimoto, who is an information scientist and has looked at citation biases by country, who likes to cite whom and who hates whom and so on. This might be interesting for you. The other thing that, for me, glaringly stands out, and maybe I'm wrong, is that this seems like a typical geopolitics thing. We hate China because they're bigger than us and things like this. For me, the next question would be: have you looked at India? Because they are also above us in population growth.
Right. I just want to say two things. My fear is that this whole story has been made more salient, and is probably a bigger deal, as a result of the pandemic and the latest geopolitical tensions. At the same time, most of the period covered in this data is actually not so much subject to those tensions and this kind of political undercurrent. Now, Indian researchers are very much in this data set. It turns out that they don't contribute. The reason why India was not singled out in this study is basically that it's too small in this particular data set. Despite the size of the country, in terms of the highly published researchers that we start with, it has not yet reached a scale where it would make sense for us to single out India and make it pivotal. But presumably in the future that's going to change.
Thanks very much.
Great. Thank you, Pierre, for a great talk to start us off. Thank you.