So this is a new dataset for Buddhist transcriptional Chinese. It comes from a project called Han Phonology: When Chinese Became Chinese, run at SOAS University of London and funded by the Arts and Humanities Research Council, and the PI is Dr. Urs Caldwell. The project is about how Chinese was pronounced during the Han dynasty. It has various components, and we will only be talking about the use of Buddhist transcriptions and making some methodological remarks.

I think we have two kinds of historical phonology. One is comparative, where we reconstruct unattested languages by comparing related languages. The other one is what I'm calling philological historical phonology: the phonological interpretation of ancient documents. And then I make this a little bit more precise, which I think there's no point in reading out, but the linguist in me wanted to make this a little bit more precise. When we do comparative linguistics, we have two tools. One is exceptionless sound change and the other one is analogy. For those of you who aren't so interested in historical linguistics, never mind; it's just how I think, because it's mostly what I do.

But then for philological historical phonology, I think we also have two methods. There's less discussion of this explicitly in methodological terms, so I'm a little bit punting here, but this is how I am thinking of it for our project. We want to figure out how Chinese was pronounced, and that's what my capital C is. And I think we have two methods. One is imposition, where we know how something is pronounced and we're imposing that phonetic interpretation on the Chinese characters from outside the system. And then we have transposition, where we have an existing interpretation of some part of Chinese and we are moving that interpretation to some other part of Chinese.
So if you like, we get some interpretation on part of the data and then we can move that interpretation around inside the object we're studying.

So just to give some examples, as this was quite abstract. For imposition, we can use loans in and out of Chinese. For instance, the word for chariot: it doesn't come directly from Sanskrit, but maybe Indo-Aryan. We can identify the word for chariot that we know the pronunciation of with the Chinese word for chariot, and refine the pronunciation in this way. And similarly the word for tiger, which comes from Southeast Asia. So I'm doing one Western borrowing and one Southeastern borrowing. And then the thing that this talk is about is the transcription of Indic terms. So we have dharanasi and a kind of dish. These words allow us to identify how these Chinese characters were pronounced.

And then, just to fill in the methodology, for examples of transposition we have things like paronomastic glosses. So this commentator says, read this character as this character. In this way, we have a reason to think these two characters were pronounced alike. And then there's, of course, explicit linguistic annotation. We have these chains of fanqie spellings, where we look up Lu and it says, this is the initial, this is the final. We look up this one and it says, this is the initial. So then we can identify that this whole series of characters has the same initial, right?

So the whole point of everything I've said so far is just these two moves: imposition of phonetic information from the outside, and then transposition of phonetic information among different Chinese characters within the system. And if you like, our project is just trying to make these two moves with all of the available data, in the Han dynasty in particular. Now on to the dataset, and maybe this is where I switch to let Julian take over. Thank you.
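The fanqie chaining described above, where characters linked through their initial spellers end up in one class, can be sketched with a small union-find. This is a minimal illustration, not the project's code: the names `FANQIE`, `initial_classes`, and `propagate` are hypothetical, and the handful of spellings is toy data of the kind found in later rhyme dictionaries, not Han-period evidence and not checked against an edition.

```python
from collections import defaultdict

# Toy fanqie entries (assumed for illustration):
# character -> (initial speller, final speller).
FANQIE = {
    "東": ("德", "紅"),
    "德": ("多", "則"),
    "多": ("得", "何"),
    "得": ("多", "則"),
    "紅": ("戶", "公"),
    "公": ("古", "紅"),
    "古": ("公", "戶"),
}


def find(parent, x):
    """Return the representative of x's class, registering x if unseen."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the classes of a and b."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[rb] = ra


def initial_classes(fanqie):
    """Group characters that are chained together by their initial spellers."""
    parent = {}
    for char, (upper, _lower) in fanqie.items():
        union(parent, char, upper)  # a character shares its initial with its upper speller
    groups = defaultdict(set)
    for char in parent:
        groups[find(parent, char)].add(char)
    return [set(group) for group in groups.values()]


def propagate(classes, known):
    """Transposition: copy a known initial to every character in the same class."""
    result = {}
    for group in classes:
        values = {known[c] for c in group if c in known}
        if len(values) == 1:  # only propagate when the class is unambiguous
            value = values.pop()
            for c in group:
                result[c] = value
    return result
```

Given one externally motivated value, for example `propagate(initial_classes(FANQIE), {"多": "t"})`, every character chained to 多 receives the same initial. That is the transposition move in miniature: the value itself comes from elsewhere, and the chains only move it around.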
Right, so as we are collecting data from Buddhist texts, there's this classic book from 1983 by South Coblin that collects all the sound glosses from the Eastern Han, not just the Buddhist ones, but today we only focus on the Buddhist ones. It contains glosses and transcriptional data from three people. So An Shigao, a famous translator of texts into Chinese, from Central Asia. Lokakṣema, whose name is reconstructed (we only know his Chinese name), who was Yuezhi. And Kang Mengxiang: we don't know very much about him, but we think he was Sogdian.

Right, so this is what is contained in Coblin, and we are trying to update it with more data. We've added new texts for An Shigao: the literature has shown that more texts in the Taishō corpus that were not listed should be ascribed to him, or to his translation team, let's say. And there are also texts that come from a discovery of manuscripts at the Japanese temple Kongōji that have also been shown to belong to An Shigao's translations. For Lokakṣema, we added more words from the texts that were covered by Coblin but that Coblin had missed. There are no new texts that we have added yet, but we know that there are other manuscripts that we need to go through and try to retrieve transcriptions from. And for Kang Mengxiang, we didn't add anything yet, and I'm not aware that there are other texts that have been ascribed to him, so that might be just the state of things for Kang Mengxiang.

Since the literature tells us that it is possible that the source language of the translations by An Shigao and Lokakṣema is Gāndhārī, we're adding the Gāndhārī equivalents to the words that we have in Chinese. These words have usually been identified as matching some Sanskrit or Pali words, and so we are doing the work of adding either attested or unattested, so posited, Gāndhārī words.
And finally, we are adding transcriptions into different stages of Chinese, so that we're able to compare the Gāndhārī, Sanskrit and Pali to what the Chinese characters would have been pronounced like according to Schuessler's reconstruction.

Right, so we added some texts. This is the list. These are the texts I have been talking about that we added for An Shigao. The first ones are all in the Taishō Tripiṭaka. They were just not ascribed beforehand to An Shigao, and now stylistic and linguistic evidence has shown that they are likely to be from either An Shigao or his team. And we've removed one text that was ascribed to him but that we now think is a commentary: the Da anban shouyi jing. On the previous slide, somewhere around here, is the Anban shouyi jing, I think. Yes, so at Kongōji, the text that the Da anban shouyi jing is a commentary on was found, and so it has now been concluded that this is a later text than An Shigao, so it has been removed from our data set.

This is our data set so far, but it's still evolving. The biggest change is that by adding the Kongōji material, mainly taken from the dictionary of Dr. Vetter, we are more than doubling the number of transcription words that we get from An Shigao. And we're using Dr. Hill, Nattier and others' Lokakṣema data, which contains those transcription words that were not covered by Coblin. And for Kang Mengxiang, as I said, nothing new. All right, so maybe some observations on what we can see at this stage, where we haven't really finished the data set but we can already see some things.
So if you look at the An Shigao texts, we see that, for instance, sibilants match Gāndhārī better than Sanskrit and Pali. Every time we have a palatal ś, this is written in Chinese with a ś, so we have, for example, Śākyamuni, and Shaliput for Śāriputra, so in Gāndhārī the match is there. And for the retroflex, we have characters that contain a retroflex pronunciation rather than the palatal. So in three of the examples, Gāndhārī and Sanskrit would be equivalent matches as against Pali, and then in śramaṇa we have a retroflex in the Gāndhārī and in the Chinese. So that goes with what has been said about An Shigao, that maybe he was using a Gāndhārī text as a source.

For Lokakṣema, we get a slightly more complicated picture. Some words match better in Gāndhārī and some in Pali. So in the first table, we have the word Achanda in Chinese, which is more likely to match Pali than Gāndhārī, which seems to have lost the nasal before the d. Although in some dialects the spelling does not actually show the n, so it might be that the Achada that is written in Gāndhārī is actually Achanda. We need to investigate further. And in some other cases, it's clearer that the source might have been Gāndhārī. So we have this word kṣānti, khanti and kṣanti, which is written in Chinese as chanti. Here the spelling is actually slightly confusing, but the Gāndhārī kṣ actually transcribes a sound more like an affricate, and so it's a better match for the Chinese. And then we have these words at the end, Lodawa, that preserve the r, let's say, of either Sanskrit or Gāndhārī.

So these are the initial observations that we have, and in some cases, for specific phonemes, it seems there's variation. Some words seem to match Pali better, some words match Gāndhārī better. And we need to analyze more closely which text each of them comes from.
Because it is supposed that Lokakṣema's translation team changed over time, and it's possible that, as he worked with a different team over time, this had an influence on the choice of transcription into Chinese.

So in the first table, we have some j that in Gāndhārī became y. I have a typo there, in the form Avarajjida in the Gāndhārī column. We see that the first two lines really are better matched by Pali. So we have Kwakid, which is probably closer to vajira, and Apalakidai, which better matches the Pali form than the Gāndhārī one. On the last line, however, we have Laya, which better matches raya than the raja of the other ones.

And in the second table, we have this p and b that became a labial approximant in Gāndhārī. The first word is again the same example as before, Apalakidai, which preserves the Pali and Sanskrit p. But for the word Dīpaṃkara, which became Diwangara, here we have a character that clearly does not contain a p in Chinese but something like hua, so it might be taken to be the labial approximant.

Oh, and also note that here we're using Schuessler's Later Han reconstruction, which has all of these ai everywhere, but it seems that at least the choice of character here suggests that this yod had fallen by the time of the translations by Lokakṣema and An Shigao, so the ai should really be read as a: not Apalakidai but something like Apalakida.

And that is kind of where we are at the moment. There are more texts that we would like to include in our data set. There's a text called the Shiba nili jing, the Classic of the Eighteen Hells, that has been shown by Antonello Palumbo to be pre-An Shigao. I think he shows that it's clearly Western Han, so that would be even earlier and much closer to Old Chinese phonology. That is something we need to integrate, and then we need to do the computational linguistics on top. And that is it. Thank you.

Where can I get the data?
We're about to publish an article with the data, but as you might have heard from some of the comments, he's still working on it. But yeah, we hope to publish this sometime in May. It's worth noting that an earlier version, if you like, of the data, from 2020, is already up on Zenodo. Also, you digitized all of the Vetter, and that's on Zenodo as well.

How do you handle the phonetic uncertainty on the Indian side?

I'm surprised you're worried about the uncertainty on the Indic side. We know that Sanskrit cakra, okay, it was somehow cakra, right? Yeah, the phonetic details maybe are more insecure, but knowing how the Chinese is pronounced is much more of a problem, right? So for Gāndhārī we're reading things by Stefan Baums and doing the best we can.

Let's take the Old Chinese in the first example and the Later Han Chinese in the second example. This represents the current state of the field. I mean, I think of science in general as a kind of spiral process, right? Coblin looked at some things based on the understanding in 1983, and that had a certain influence, mostly on Chinese. Then there's a lot of other work on Chinese in the intervening 30 years, and every once in a while these fields that have been developing more or less separately, like Chinese historical phonology and Central Asian Buddhist philology, need to get back in touch with each other. There is an element of circularity in this, but I think it's a kind of good circularity, because it's not purely circular. Yeah, like in those two equations, they feed each other. You're using unknowns to solve for other unknowns. That's clear. But it's a question of the overall coherence of the system, I think.

Are changes in transliteration practice coming from changes in the source language or from changes in Chinese?

I mean, I really think we have both. So let's take the example that Julian mentioned at the very end.
So using the Chinese terminology, you have what's called the yubu and the gebu. Originally, in Old Chinese, these were pronounced a and ai. But then at some point in the Han dynasty, maybe around the change from Western to Eastern Han, a changes to u and ai changes to a. So there's a date where, when they hear Indic a, they want to write one set of characters, and then there's a later date where, when they hear Indic a, they turn to a whole different set of characters. So that's a totally Chinese-internal change that's influencing the practice of transcribing.

But the source language matters too, especially if we want to use Indic evidence for some of these finer details in Chinese. I'll just give you an example that I'm very interested in: there's a hypothesis that Old Chinese had a final r separate from final n and final j. It's controversial; the data is mixed. Well, here, if you look at dharma and you think it has an r there, then you will look for how this r is written in Chinese. But of course dharma didn't have an r there on the Middle Indic side. So this is how we're trying to really bring both into detailed dialogue.
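The yubu and gebu point above suggests a simple diagnostic: for a given text, tally which rhyme group supplies the characters that render Indic syllables ending in a. The sketch below is only a toy illustration under stated assumptions. The names `RHYME_GROUP`, `groups_used_for_a`, and `dominant_practice` are hypothetical, the rhyme-group assignments cover just a few characters, and the aligned pairs are invented for the example, not taken from the dataset.

```python
from collections import Counter

# Assumed toy lookup: character -> traditional Old Chinese rhyme group.
RHYME_GROUP = {
    "烏": "yubu", "屠": "yubu", "圖": "yubu",
    "阿": "gebu", "陀": "gebu", "羅": "gebu",
}


def groups_used_for_a(aligned_pairs, rhyme_group=RHYME_GROUP):
    """Count the rhyme groups of characters aligned with Indic syllables in -a."""
    counts = Counter()
    for char, indic_syllable in aligned_pairs:
        if indic_syllable.endswith("a") and char in rhyme_group:
            counts[rhyme_group[char]] += 1
    return counts


def dominant_practice(counts):
    """Report which rhyme group dominates, or 'mixed' when they tie."""
    if counts["yubu"] > counts["gebu"]:
        return "yubu"
    if counts["gebu"] > counts["yubu"]:
        return "gebu"
    return "mixed"


# Hypothetical (character, Indic syllable) alignments for two imaginary texts.
older_style = [("屠", "dha"), ("圖", "dha")]
later_style = [("陀", "dha"), ("阿", "a"), ("羅", "ra")]
```

A text whose tally comes out as `"yubu"` would pattern with the earlier practice and one that comes out as `"gebu"` with the later one. As the dharma example shows, such a tally is only meaningful once the Indic side of each alignment reflects the actual source language.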