For today's talk, our speaker is Professor Jonathan Smith. Professor Smith is associate professor of Chinese and director of the Chinese Studies program at Christopher Newport University. His academic focus is early Chinese language and writing. His newest publications turn to the historical phonology of the southeastern Min languages. So today he will talk about new digital resources for the study of historical and comparative Min. And I think this is part of his recent project, right? So, Professor Smith, please.
Yeah, thank you for the very nice introduction. Thank you for the invitation as well. It's a tremendous honor. I hope I can say some things that are of interest. I had thought for a while about trying to do a more technical problem, but maybe it's actually interesting to think about just the resources that are available. Largely, it's stuff that I've gotten from various sources. It's not original with me. But the goal is to trawl around for material and make it accessible to a broader audience. So you guys can help me think about the progress that I've made and how it should best be utilized, leveraged, and shared. I'm not an authority on Min dialectology or on historical phonology, really. I've come to this subject area a bit randomly. Originally, I was working mostly with early writing, but more recently I've gotten interested in historical phonology. Trying to collect and organize data is something I've been spending a lot of time on, but off and on: I start on a project, see how productive it is for me, and then often move on to something else. And as far as my work inside historical Min, I suppose it's been sort of chaos agent or disruptive, seeing some things that maybe don't make sense and pointing them out. Whether in the long term that's going to be a productive enterprise, I'm not very sure. But the idea is to point to the problems and try to spur us forward a little bit.
When we talk about Proto-Min or Common Min, to be real, we haven't made maybe a whole ton of progress since Norman's work in the 70s. That's not to say there isn't a lot more data and a lot more ideas; there are a lot more ideas. But ultimately, the structure of Proto-Min is what it was in 1973. So is there a way through or a way forward to more progress in this area? I'll show you guys some material that I've collected, and we can discuss it together. These are some Min points. We'll see the specifics momentarily. But these are the points that are relevant to, for example, Professor Akitani's book from 2008, which talks about Northern Min; the points that are relevant to his very similar, parallel book about Ningde, part of Eastern Min; and then the Southern Min points that are the emphasis in Kwok Bit-chee's recent book, from 2018, about Southern Min. There's a lot more data out there, but the stuff that I have prepared initially in my presentation is relevant to these points. Generally, if we zoom out, this is the spread of Min as part of the spread of Sinitic languages and cultures to the south over the course of 2,000 years or more. The nature of this process is still a bit blurry. It's not entirely clear. Often the natural assumption, the more traditional assumption, is that there's a sort of Chinese colonization, to use a problematic term, of this area, and that indigenous peoples and languages are largely displaced or acculturated. Maybe the picture is in fact richer than that. At that kind of time depth, it's possible that the process is more bidirectional, that there are relatively complex southern cultures present in this area when the Chinese language expands south, and that we have a kind of long-term cultural mixture and bilingualism in this area. So in my opinion, we have to be a bit cautious approaching Min data.
I suppose it should be unproblematic to say that Min is Sinitic, but when we look at the colloquial layers there's a lot of material there which is not necessarily Sinitic in origin. So looking at any individual item, in my opinion it's good to remain agnostic initially rather than instantly connecting it to some apparent cognate in Middle Chinese or in other Chinese languages. Some words that appear at first blush to be clear cognates turn out not to be on closer examination. So to some degree it's good to treat Min data as its own thing, and then maybe think at some later stage about the nature of the relationship to other Chinese or to other languages of old southern China and Southeast Asia. I guess I'll show you quickly what Norman's Proto-Min looks like, but we do not have to get into detail about this. Maybe everyone is familiar. I don't have a sense for the degree to which that's true; everyone is more expert than I am in these areas. It's possible everyone is hugely familiar with this system, but this is actually not exactly like the papers Norman published in the early 70s. This is a manuscript. Professor South Coblin gave me this manuscript, from 1974. It's very similar, but it's not identical in all respects to the Proto-Min of Norman's 1973-1974 papers. One of the projects that I was trying to work on was digitizing this manuscript. It's all typewritten pages. It's slightly more extensive in terms of the cognate sets when you compare it to some of the published work. So it would be very nice to digitize it and make it more accessible. I was thinking I could do it very fast, but these projects turn out to take a lot longer than you anticipate. So at some point I'll hopefully be able to create a more accessible digital version of this particular manuscript, which was never published.
At any rate, I'm sure everyone knows there are weird typological features of this Proto-Min system. For example, the voiced aspirates. We're used to these Chinese systems where devoicing and register genesis appear to happen in a more or less principled way, where you have early voiced onsets that are devoiced into aspirates or non-aspirates, but in a principled way, where we can talk about a complementary distribution: the Mandarin situation, for example, is aspirates in the old level tone versus non-aspirates elsewhere. But the Min situation is different, in that we have these modern aspirates across the tones. The usual generalization about Min is that most of these old voiced onsets are going to be non-aspirates across the tones. But there is this relatively small set of aspirates across the tones, which compelled Norman in 1973 to reconstruct so-called voiced aspirates. For our purposes, this is just a placeholding convenience. People find this to be typologically impossible, but it doesn't particularly matter. At any rate, it seems to be a valid comparative category. Of course, later on, Norman is going to change his approach. The reasons why he ultimately switched to a more Common Min type of framework maybe aren't essential to consider. But this is the initial idea. And my sense from people who work closely on comparative Min now is that the preference, when we compare this to Norman's later adjustments in the Common Min system, is to retain these special categories. Most important is probably the voiced aspirates, because this class unites Min; it's diagnostic of Min. We can create cognate sets of modern voiceless aspirates across all the lower-register tones. And then, of course, the softened onsets. The details here are interesting, but maybe less important for our purposes. Modern Northern Min languages still have voiced onsets that often seem to correspond to simply regular voiceless onsets in Coastal Min.
So there seems to have been some other kind of onset. Exactly what it was, Norman was a bit noncommittal and ambiguous about. But the way he represented these special sounds, which are going to account for the anomalous voicing in modern Northern Min languages, is with a hyphen. He says on several different occasions that probably this was some kind of segmental deletion on the left-hand side. So there was some kind of complex onset, but he prefers not to speculate about what these particular segments were. Actually, he says the same thing about the voiced aspirates, but prefers to hew close to the modern values when he goes to choose a particular representation for the Proto-Min system. Actually, part of what Common Min later does is hew even closer to the modern values by just going back to simply voiceless aspirated and voiceless unaspirated sounds and forgetting about register genesis entirely. But probably the best reference point, if we're thinking about what Proto-Min or Common Min was like, is this earlier Proto-Min system with the voiced aspirates and the so-called softened onsets, which are going to be both voiced and voiceless. There are other things which are of some interest, but these are the main long-lasting, persistent controversies in Common Min. Sorry, I'm trying to keep a little bit of track of my time. I don't want to miss the chance to show you guys the historical material. This is not that important, but it's illustrative of one kind of problem. For example, on the left-hand side, in Coastal Min, we have these doublets with a non-aspirated and an aspirated member, and there's a temptation to see them as a Proto-Min phenomenon or a very early Min phenomenon. But it's not clear exactly to what extent that's the case. When we talk about these Coastal Min examples, there aren't good Inland Min parallels, or there are some that are not cognates, for the aspirated cases.
So if we look on the far right-hand side, on occasion Norman can be found to speculate that this thing goes back to some kind of Proto-Min form, which would have to be a voiced aspirate. But probably there's no need to regard these items as going back to Proto-Min. The middle case, however, the Inland Min case, is more controversial; it's harder to tell whether or not these reconstructions to the Proto-Min softened and plain values are valid. Inside Northern Min, we also get this kind of doublet, where we have, in this case, what looks like a verb and what looks like just a postposition. Is it relatively recent inside Northern Min, or is it something that's historically deeper? I mention this just to give one little illustration of some of the problems that you run into when you're trying to figure out how big these categories are. In some of my papers, my inclination is to consider these categories to be smaller than others do. So I think, for example, the Proto-Min softened category exists but is much smaller than is presented even in Norman's papers of the 70s. And this voiced aspirate category also seems to be comparatively valid, but there are many trick items that appear, if you just look at Coastal Min, to belong to this category, but perhaps don't go back to Proto-Min and instead are more recent phenomena inside the Coastal Min languages. Anyway, we get some sense for these complexities when we try to figure out which of these cognate sets, or apparent cognate sets, are really traceable to the Proto-Min level. That's all I put inside this presentation. Instead, I'll share with you some material that I have, where the correct answers to these questions aren't so important. It's just a matter of thinking about the data that's available and maybe being able to leverage it in new ways.
Compared to 50 years ago, when Norman was first working on Proto-Min, there is dozens of times more data available than there was then, if not more. So it should be possible to discover new things, or at least to render Norman's hypotheses more precise in many ways. But it's not necessarily easy to access, manipulate, and leverage the data in the ways that we want to. Actually, I'm by no means super skilled at doing stuff like this. I'm sure there are many people out there, perhaps in the audience, who will instantly be able to make more of the material than I am able to. But hopefully in the very near future everyone will have access, and you can make of this material what you will and, for example, apply computational methods to it. I'll show you Professor Akitani's book from 2008. He has many similar works. The amount of work he's done in documenting Min and other modern Sinitic languages is mind-boggling, completely staggering. Even if we just look at the sources for this section, his Northern Min lexicon, which compares key items across various Northern Min languages and in some cases makes comparisons that are more distant, it seems as if he's produced as much Min dialect material as anyone else, or as much as everyone else combined, it sometimes feels. Anyway, there's a ton to be learned from books like this. This one is about Northern Min in particular, and the focus is on three particular points. They are here. Wait, I want to zoom in, but I have to move my windows around. So in northern Fujian province, these are the points that he's focused on in this particular book. Shibei, Shibei in Google, Shibei in Norman's books, sometimes romanized differently in other works. Zhenqian, all villages in northern Fujian province. And this one, Dikou. These three points are the focus of the particular book that we're looking at right now.
To move on a little more quickly, one kind of data is this, the homophone character collection. A massive amount of material has been collected in this form, and it's very familiar to Chinese dialectologists. But if you want to make use of it and do automated comparison across this kind of data, you want to present it in a different form. So I'll show you. In general, I'll try to keep the original sources on the left-hand side, and I'll show you on the right what I've been able to digitize. Each individual dialect point in this book, Akitani's 2008 book about Northern Min languages, has a homophone collection where the individual syllables, or characters, are organized into homophone groups. The rime is the top-level organization, then all the onsets are presented beneath it, and then inside each onset category it's broken down by tone, and so on. So, for example, on my right-hand side over here, this is more or less the tongyin zihui from this 2008 book. Initially I tried to generate it from the copy of the book that I made. Subsequently, Professor Akitani sent me clear digital files, which are still in that same basic format but are easier to manipulate. Let's try to show these in parallel if I can. It's the same material that's here on the left, but presented in spreadsheet form, where the syllables are all here and broken down. I split each into onset, final, and tone, and then the characters. And now we have a clear sense of exactly how much material is here, which is a ton. We're talking about nearly 4,000 individual syllables, or morphemes, or characters, inside this particular collection. This particular file is just one zihui, and the other two are also here.
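The reshaping described here, from a rime, onset, tone nesting into one spreadsheet row per character, can be sketched roughly as follows. This is only an illustration under stated assumptions: the nested-dictionary input, the `Row` fields, and the `flatten` helper are hypothetical, and the sample characters are made-up placeholders, not data from the 2008 book.

```python
from typing import Iterator, NamedTuple


class Row(NamedTuple):
    """One spreadsheet row: a single character with its syllable parts."""
    onset: str
    final: str
    tone: str
    char: str


def flatten(zihui: dict[str, dict[str, dict[str, str]]]) -> Iterator[Row]:
    """Yield one Row per character from a rime -> onset -> tone -> characters nesting."""
    for final, onsets in zihui.items():
        for onset, tones in onsets.items():
            for tone, chars in tones.items():
                for char in chars:
                    yield Row(onset, final, tone, char)


# Placeholder entries for illustration only (hypothetical, not from the source).
sample = {"a": {"p": {"1": "甲乙", "3": "丙"}, "t": {"1": "丁"}}}
rows = list(flatten(sample))
```

Once every character is its own row, counting the collection or sorting it by onset, final, or tone becomes an ordinary spreadsheet or list operation.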
One thing that I've been thinking about as I do this, and you guys of course will have opinions about this, is what kind of final product is most useful. Initially I wanted to move more immediately and, for example, combine all of these zihui into one single file and start aligning cognates or aligning similar characters. You can of course do that relatively easily when the data is in this form on the right-hand side. However, that begins to get further from the original shape of the data, and I thought maybe it's better to leave it in forms that resemble the original document. So this particular Excel file is a digital version of certain pieces of Akitani's book from 2008, and the three different dialect points are strictly separated rather than being combined into a single file. Individual people, if you get access to this file and start using it, can proceed according to your own preferences in how to organize or think about this material. I guess a question is what the status of some of these items is. It's 4,000-something items, but it's not necessarily obvious that these are, for example, words of the colloquial dialect of this region. So one of the pseudo-philosophical questions you have to think about in dealing with this material is: what am I going to take into account? What is going to be valuable in terms of doing comparison and reconstructing protoforms, and what is potentially going to be less valuable? I don't know the answer to the question. I certainly know I've run into problems where such and such a form turned out not to be as meaningful as I expected it to be. But you really have to have a very close understanding of one of these particular varieties in order to know for sure: is such and such a word proper to this particular variety of Northern Min, or is it part of a word?
Is it at least a morpheme that exists inside certain colloquial vocabulary items, or is it none of the above? Without some expert guidance from Professor Akitani or others, it's going to be very hard to make these kinds of distinctions. I had just toyed with this idea, but one thing you could do, for example, is find all the items that don't have written characters associated with them. These are necessarily material from the local colloquial, and you can easily sort on that particular parameter if you're using this Excel file. Then suddenly you have a collection of potentially more interesting items that you can go and compare to other Northern Min or to other Min. You can also sort out which items are provided with special glosses and examples. There's obviously a lot of overlap between the words that don't have an associated written character and the words which require some additional explanation, gloss, or context. So I started marking the items that have no character or are glossed in some particular way. And it seems to me that if you want to do more detailed comparison across these three Northern Min varieties, or other Northern Min, or potentially more broadly across Min, these are going to be the items of most value and most use. However, for now this material is just here. Precisely how to use or leverage it is a matter for us to discuss or for individual people to decide for themselves. At any rate, this is here as a parallel version of this particular feature of this book and of other books of Professor Akitani's. These zihui sections are attractive because there's so much material here. However, it's not necessarily easy to pick through. So another valuable section is the real lexicons.
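The sorting idea above, pulling out the items with no written character or with a special gloss, can be sketched like this. It is a minimal illustration, not the layout of the actual Excel file: the column names, the `□` placeholder for "no character", and the sample rows are all assumptions made up for the example.

```python
import csv
import io

PLACEHOLDER = "□"  # assumed marker for "no suitable character" (hypothetical)


def is_colloquial_candidate(row: dict[str, str]) -> bool:
    """True if the row has no written character or carries a gloss."""
    has_no_char = row["char"].strip() in ("", PLACEHOLDER)
    has_gloss = bool(row["gloss"].strip())
    return has_no_char or has_gloss


# Made-up rows standing in for an exported spreadsheet (not real data).
sample_csv = """onset,final,tone,char,gloss
p,a,1,甲,
t,a,1,□,
k,o,3,乙,example gloss
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
candidates = [r for r in rows if is_colloquial_candidate(r)]
```

A filter like this only produces a shortlist; deciding which of the shortlisted items are genuine colloquial vocabulary still needs the kind of expert judgment described above.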
This is always an issue in Chinese dialectology: are we going to have lists of character readings, where we're not totally clear about the status of individual items, or are we going to collect lexicons proper, where we have real words of colloquial languages? This particular section is a lexicon. The downside, and of course everything is relative, since this is in itself a mind-boggling project, is that if you're talking about sheer quantity, it's just 600 items, as opposed to what seems to be 4,000 different items that we could use automated procedures to compare across Min languages and potentially see more deeply. Here we have a smaller set of more substantive, shall we say, items. The first thing that I really wanted to do was translate the 600-item lexicon into English. So I'll show you where I did that. This particular list has 600 items, and he's used it in multiple works. Initially, he was working with a somewhat smaller list; some earlier books have something like a 480-item lexicon. Did this open or no? Yeah, maybe. Let me close this. Now you can see it clearly. This is the 600-item list in English. I'm sure everyone will instantly see things that I made mistakes with, but this is an attempt at translating the 600-item list so that people who don't necessarily work with Chinese languages can have access to this material. It's in traditional characters; the simplified characters are hidden here, but both are there, along with an attempt at a gloss. There are different ways to approach doing this kind of translation. Now, of course, there are translation tools that can do this stuff very fast. I've tried it that way with, for example, ChatGPT and so on. You can translate that way and it seems to save a lot of time. You feel as if you're doing less typing that way.
However, unavoidably, you still have to check and rewrite every single item. There's probably not one single item that you can skip. Well, maybe there are a few items where "sun" is correct or whatever, but in general you're still going through everything and confirming that it's accurate. So ultimately, do automated translation tools save time? I think they do, especially when we get to bigger files. If you're dealing with a dictionary type of file where you have tens of thousands of entries, then probably a procedure like that will ultimately save time. However, you'll inevitably be introducing errors, and you'll have to look at every single entry and confirm that it is actually correct. I'm sure I haven't done a perfect job with this, but you get an idea of what it looks like after it's translated. So this is the initial step: put this lexicon into English. Done. And then I put it into the Northern Min lexicon; we'll see what that looks like. The same applies to other books, but this is the digital equivalent, the Excel file equivalent, of what we're looking at. Left and right are, I hope, well, perfect is probably an exaggeration, but the file tries to match the material on the left-hand side as closely as possible. So again, these are of course the Mandarin equivalents, and then the three Northern Min points, romanized. For phonetic detail and details about the tonal systems, you have to go back to the book, but at least the material that's presented in the context of the 600-item lexicon is all here. And I kept all this material, the dialect forms represented in Chinese characters. However, there are special things about it. It's far from always clear what character is most appropriate to represent some particular syllable. Often it doesn't really matter.
So there are lots of cases where he's used homophone characters and has marked them in various ways, or has simply left boxes to indicate that no homophone character or etymologically appropriate character exists. So this is here for our reference. However, in some ways it's safer and clearer to just focus on the romanized forms. Anyway, you get the idea, I think, of what this looks like. It's not completely done. I want to extract the notes. There are also little notes associated with various of these items. Here they're in red text. It's not going to take just a minute, but I need to separate these columns again so that the notes are retained. Perhaps the notes are even translated, and then you have a file that captures all the information in this 600-item lexicon and adds English. That was the idea. My aim, and my sense, is that this is fine if you don't read Chinese at all. You can look at the English definitions. Hopefully they're largely accurate. You can look at romanized forms of Northern Min and learn a lot about these Northern Min languages without having access to the original book. This was the idea. Of course, it goes without saying that the forms in the three columns are not necessarily going to be cognates; they are the local ways to say these various things. Whether or not they're cognate is a matter of examining correspondences. But at any rate, the idea was to capture this material in a way that's more accessible. So what I want to do is share this back with Professor Akitani, and I'm sure he'll be totally fine with sharing it more widely with people who are interested in looking at it, using it, or potentially correcting it. Hopefully there are not too many errors. But the idea is, in the near term, to make this accessible to people who are interested in looking at it and who might not necessarily be reading the original work. Let's speed on. Close. We don't really need to look at that.
This is a more recent book, from 2018, on a subset of Eastern Min languages, and it's structured in an extremely similar way. So I did the exact same thing with this material. I'll just show you. Did I put it here? Yeah, I did. So, for example, this is the same 600-item lexicon, but in the Ningde languages, the three Ningde points instead of the Northern Min points. Same idea. Again, it's tempting to just combine all this material; you kind of want to have all the words together in one spot. But for now I prefer to maintain these discrete files that reflect particular published sources as precisely as possible. The English list is the same, and it's just pasted in here. The Mandarin and English lists are going to be the same, the same 600 lexical items, and then come the local ways of saying these various things. We can see that in many cases there's more than one possibility, and so all of these are listed here. I hope they are aligned correctly and accurately reflect the original work. Again, there are notes that need to be extracted. They will be of variable value and meaning, I guess, if you're working only in English, but nonetheless we could extract them and translate them and create a database that reflects the original as completely as possible. That's the idea. Okay, so the Southern Min points. This is Professor Kwok's book about Southern Min. There's a very similar section of this book. Let me see if I can find it here. Of course you want to read and understand a lot of the particulars more deeply, but if you're just thinking about data, this would be nice to have in digital form. Perhaps Professor Kwok has it in an equivalent form, but I put it into an equivalent form, and I'll check back with him about whether this is redundant with the material he has, or whether it can be shared more widely, or what. But it's the same idea. He has English here. And in general, these are words.
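Because the separate files share the same numbered item list, they can stay as discrete per-source files and still be joined on demand. The sketch below shows one way that could look, assuming each source is loaded as a mapping from item number to forms per dialect point; the `merge_by_item` and `attested_at` helpers, the source and point labels, and the forms are all hypothetical placeholders.

```python
def merge_by_item(
    sources: dict[str, dict[int, dict[str, str]]]
) -> dict[int, dict[str, str]]:
    """Join per-source lexicons on the shared item number.

    Keys in the result look like "source:point" so the origin stays visible.
    """
    merged: dict[int, dict[str, str]] = {}
    for source_name, lexicon in sources.items():
        for item_id, forms in lexicon.items():
            for point, form in forms.items():
                merged.setdefault(item_id, {})[f"{source_name}:{point}"] = form
    return merged


def attested_at(merged: dict[int, dict[str, str]], minimum: int) -> list[int]:
    """Item numbers with a non-empty form at no fewer than `minimum` points."""
    return [
        item_id
        for item_id, forms in sorted(merged.items())
        if sum(1 for form in forms.values() if form) >= minimum
    ]


# Placeholder data: labels and forms are invented for illustration.
sources = {
    "northern": {1: {"A": "form1a", "B": "form1b"}, 2: {"A": "form2a", "B": ""}},
    "eastern": {1: {"C": "form1c"}, 2: {"C": ""}},
}
merged = merge_by_item(sources)
widely_attested = attested_at(merged, 2)
```

Note that a shared item number only means "the local way of saying the same thing"; whether the joined forms are cognate still has to be established from correspondences.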
This is like a lexicon. So there are a few two-syllable items in here. There are also Chinese character renditions, which you always want to take with a grain of salt. However, the final product is comparable to the Northern Min product and the Ningde product that we were just looking at, where you have an English gloss, you have romanized forms, and you have a character representation. And then he's given the Proto-Southern Min reconstructions, his new work from just recently, 2018. So again, using all these resources together, my sense is there's more to be learned. For example, one thing that you would want, ideally, is a bigger group of items that seem to be shared across Min, or at least seem to be shared at some minimum number of points. Combining all of them, it seems like there will be more than Norman was working with. So there is potentially enough to enrich some of his ideas or, for example, in the case of the voiced aspirates, to study particular words in a more detailed way and reach perhaps more substantive or more thorough conclusions about the forms of some of these words at early periods. I don't want to go much beyond an hour, so let me just move forward to other material. To me, this is the thing that most needs to be digitized and studied, but it is so hard. This is Carstairs Douglas's Amoy dictionary. I think it's become harder to even grab digital files like this, even though they've been out of copyright. This is from 1873. I grabbed a couple of different versions of this, and they are not all the same. Some are missing pages, so I've been careful to combine pieces of different ones to come up with a version which I think is fully representative of the original. A supplement was also added at a later date by a separate author, and it is included in this particular version. At any rate, it's a massive piece of work.
One thing that comes up in association with this particular dictionary is, well, such and such a word doesn't exist or doesn't seem to exist in modern Southern Min languages. So can we be sure that such and such was accurate? How confident can we be that the stuff in this book faithfully represents the language or languages of the time? It's hard to say, but he does say in the introduction that there were many points at which he was told that something was in fact wrong and removed it from the book, but subsequently was told by some other informant, "No, that is perfectly valid; please re-add it to your collection." And he went through multiple iterations of this book, adding and removing, adding and removing, such that we can have some degree of confidence that the items here reflect the language of the time. And actually, as maybe you know if you're familiar with this book, he was working primarily in Amoy, in Xiamen, in southern Fujian province, but the book reflects not only that variety but also other nearby varieties. The most prominent are in the title of the book: the title reflects Quanzhou and Zhangzhou forms. And it's not a small amount when you start going through the dictionary. It's not as if there are a few comparable forms from Zhangzhou and Quanzhou. There are thousands and thousands. So to be able to digitize this and use it in a more convenient way would be invaluable for Southern Min in particular, but also for Min more generally. I can show you what I've done to this point. I thought I could finish it by now, but I couldn't. Someone had better not have done this before, or I'll feel like it was a lot of wasted effort, but it's an utter nightmare to digitize. I'll show you just some of the entries.
If we go down into the entries, this is in some random order that it was in when I was manipulating it in one way or another, largely alphabetical, but we can see there are tens of thousands of individual entries. The file is not done, but you get a sense for the vision of what this thing should be. The formatting of the book is rather complicated. Sometimes there are various indications of, for example, the corresponding so-called literary pronunciation of a particular word, or oftentimes the same word in neighboring varieties, most often Quanzhou and Zhangzhou, but also several others. But the entries that are here are the ones that are straightforwardly formatted and allow me to simply bring them over without having to mess too much with the order of the information. And maybe we get the idea, but using OCR on this text was a true nightmare. You can do a lot now, even with Adobe Acrobat and things like that. You can do a lot of very impressive optical character recognition, even on Chinese. On Chinese, if the file is clear, it works remarkably well. It does not work well, needless to say, for diacritical marks. So training systems to do this in a way that was more or less reliable took a long time. And you can see there are still typographical errors. However, they are largely systematic errors, such that once I get this into a more organized form, I can take care of a particular kind of error at one stroke. So anyway, my feeling is that when this is done, it will be a terrifically useful resource. And I don't think it's that far off. Probably 95% of it is in a form that's usable now. Other pieces of it require more work. These are complicated lines where you have different kinds of information presented together, and I want to organize it. You can see that the goal is to align corresponding pieces of data. So the R means literary pronunciation.
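Fixing a systematic OCR misreading "at one stroke" might look roughly like the sketch below: a single substitution table applied to every entry after Unicode normalization. The specific confusions in `FIXES` (a tilde read where a macron was printed) are hypothetical examples chosen for illustration, not errors reported for this dictionary, and `fix_entry` and `fix_all` are made-up helper names.

```python
import unicodedata

# Hypothetical systematic misreadings: OCR output -> intended character.
FIXES = str.maketrans({"ã": "ā", "õ": "ō"})


def fix_entry(text: str) -> str:
    """Normalize to NFC so composed and decomposed input match, then substitute."""
    return unicodedata.normalize("NFC", text).translate(FIXES)


def fix_all(entries: list[str]) -> tuple[list[str], int]:
    """Apply the fixes to every entry and count how many entries changed."""
    fixed = [fix_entry(entry) for entry in entries]
    changed = sum(
        1
        for old, new in zip(entries, fixed)
        if unicodedata.normalize("NFC", old) != new
    )
    return fixed, changed


# Made-up entries for illustration only.
fixed_entries, changed_count = fix_all(["kã", "to", "bõ"])
```

Reporting the number of changed entries gives a quick sanity check before a substitution is committed across tens of thousands of lines.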
That information actually is not necessarily all that useful if you're just studying comparative Min. However, it's there in the book and it's potentially useful for other sorts of research ends. So there seems no reason not to preserve it in a digital version. So the idea is to preserve the R forms and align them, and then gradually to preserve and align all the other dialect points. So for example, have Quanzhou all aligned, which will be many thousands of Quanzhou forms according to this particular dictionary, and so on. So yeah, we'll see how long this takes to get done, but at least you get a sense for where it's heading, which is a digital version that reflects this dictionary of 1873 fairly faithfully. It's another situation where you could simplify it in various ways in order to create a simpler, more coherent digital file. But my thought for now is to preserve as much of this information as I can. Keep it in the same order. Of course, number each individual entry so that you can flip it around and then flip it back to its original order. That's the idea for Douglas's dictionary. Let's see. Yeah, so I can just show you these last two things very quickly. This dictionary of Shantou, I mean, there are many early dictionaries, resources, lexicons of Min languages. This one is nice because it's very colloquial. The author says as much, that he's focused on spoken language. And someone started doing this in a Wikibook. It's not done thoroughly. So it's possible I can use that and complete it. But at any rate, this is also a Southern Min language, an early 19th century reflection that ought to be digitized. I've started it, but haven't gotten very far. Probably there's not much point in looking at a digital version. But at some point that would be something that would be nice to have done. I'm missing something crucial. So for example, if we go to, oh, this is what I was thinking of opening. I'll show you that shortly.
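The plan described above for Douglas's dictionary (number every entry, keep the literary "R" reading and the other dialect forms in aligned fields, and repair a systematic OCR error in one pass) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the speaker's actual file format or workflow: the `Entry` fields, the `OCR_FIXES` table, and the sample strings are all invented placeholders.

```python
from dataclasses import dataclass, field, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entry:
    """One dictionary entry; the field names are hypothetical."""
    seq: int                          # position in the printed book
    headword: str
    literary: Optional[str] = None    # the "R" reading, when the book gives one
    variants: Dict[str, str] = field(default_factory=dict)  # dialect point -> form


# Invented example of a systematic OCR confusion: wrong string -> right string.
OCR_FIXES = {"a^": "â"}


def fix_text(text: str, fixes: Dict[str, str]) -> str:
    """Apply every substitution in the table to one string."""
    for wrong, right in fixes.items():
        text = text.replace(wrong, right)
    return text


def apply_fixes(entries: List[Entry], fixes: Dict[str, str]) -> List[Entry]:
    """Repair the same kind of error in every field of every entry."""
    return [
        replace(
            e,
            headword=fix_text(e.headword, fixes),
            literary=None if e.literary is None else fix_text(e.literary, fixes),
            variants={k: fix_text(v, fixes) for k, v in e.variants.items()},
        )
        for e in entries
    ]


def restore_order(entries: List[Entry]) -> List[Entry]:
    """Return the entries in their original printed order."""
    return sorted(entries, key=lambda e: e.seq)


# Placeholder entries, not real dictionary data.
entries = [
    Entry(1, "ka^"),
    Entry(2, "a^b", literary="a^", variants={"point-A": "a^c"}),
]
fixed = apply_fixes(entries, OCR_FIXES)
by_headword = sorted(fixed, key=lambda e: e.headword)  # "flip it around"
original = restore_order(by_headword)                  # "flip it back"
```

Because each entry carries its own sequence number, any re-sorting for analysis stays reversible, and keeping one named field per dialect point means a given variety's forms can later be pulled out as a single aligned column.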
So published in Taiwan relatively recently, we have early 19th century missionary articles, where they were publishing newspapers in various forms and disseminating them, originally written in Pe̍h-ōe-jī, in Romanized script. So stuff like that. This is a book that I photocopied and OCR'd. Of course, as we just mentioned, the OCR is useful in that you can search for individual words, and you can find stuff in this text that you couldn't find in the hard copy. You can find individual words that you're looking for, but not with 100% reliability. And also, diacritics are not gonna work. So again, it's a question of how to move forward with this text. This series is like four books that are full of 100-year-old articles written in Romanized script, some by European missionary authors, many by local Chinese Christians. Some have Christian themes, but some are just more general social, cultural themes. So it's tremendously interesting to read through. But it's not very easy to make it really accessible and searchable. Now, again, this is a copyrighted book. So it's not the same as the earlier dictionaries, where you just wanna share it around. But ultimately, it would be nice to have access to resources like this in digital form. But digitizing this, maybe you guys know better than I do. Is there some magical way to create digital forms of texts like this? As far as I know, the state of the art does not allow you to do it very well. But definitely something for the future is digital versions of texts like this. Yeah, same for texts like this. So this is all Romanized Taiwanese textbooks from the late 70s, early 80s. It's almost just a collection of language. I mean, it's so massive. There are tens of thousands of sentences, just hundreds and hundreds of paragraphs and stories. And when we start looking through it, we find that the language has changed over the course of only 50 years or so.
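On the search problem mentioned above, where OCR'd Romanized text can be searched but the diacritics cannot be trusted, one common workaround is to compare text with all combining marks removed. The following is a minimal sketch using Python's standard `unicodedata` module; the function names `strip_marks` and `search` are made up for this illustration, and nothing here reflects how the speaker actually searches these files.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Drop combining marks (tone marks and similar) and ignore case."""
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return bare.casefold()


def search(lines, query):
    """Return (line number, line) pairs whose mark-free text contains the query."""
    key = strip_marks(query)
    return [
        (n, line)
        for n, line in enumerate(lines, start=1)
        if key in strip_marks(line)
    ]


# Toy input; the second line is unrelated filler.
sample = ["Written in Pe̍h-ōe-jī script", "an unrelated line"]
hits = search(sample, "peh-oe-ji")
```

The trade-off is that words differing only in tone marks are conflated, so the hits are candidates to check against the page image rather than confirmed matches.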
I was gonna look at some specific examples. It's really not very important. But what tends to happen, actually, this is not only Taiwanese affected by contemporary Mandarin. It's any regional language impacted by a regional standard or a national standard. So it's the kind of process that has been going on throughout China since forever. So especially if you're studying Min, you have to always be thinking about these issues. There seems to be the possibility for influence at any level, where individual words get replaced. Disyllabic words sort of get glossed from the standard language in a way that's a bit weird from the point of view of just borrowing, right? So yeah, if you listen to older Taiwanese speakers, of course, they'll simply code-switch a Mandarin word. But over time it seems that instead of having a sound borrowing or phonetic borrowing, as we might expect, you have a sort of glossing of individual syllables into the local language. So you create local calques of standard words, and gradually the local calques tend to displace earlier native vocabulary. The thing about Min is that these processes have of course been going on for a long time, but relative to other southern, other regional Chinese languages, these processes have had less impact. So more of the earlier colloquial material survives. So in the case of, for example, Taiwanese, you can watch this in progress from year to year. From the 80s to now, this kind of thing has happened. So, well, yeah, I wanted to show you one example. This is under this word. Well, you can see my cursor. That doesn't matter. I'll show you a modern dictionary. And look at this for just a second. So this little grammatical particle that introduces a patient means towards the benefit of the thing, towards the thing, this kind of thing.
This seems to be part of Taiwanese and Southern Min more generally, not related in an obvious way to Mandarin. Okay, maybe we can make arguments about some Mandarin cognate word, but it seems to just be a piece of Taiwanese grammar or Hokkien grammar. However, now we can find sentences like this in modern Taiwanese, and that's become very common. It says, you know, pick up the room, clean up the room, and the room is introduced by this particle, where the idea is do it to the room. But this kind of sentence is modeled on Mandarin syntax, where it's like a ba sentence. So actually in this Mandarin translation, you can use ba: ba fangjian plus the verb. Now the Taiwanese sentence says kā. I probably didn't share myself, but the idea is you can see that it's closely parallel. Take the room, act on the room, clean it up. Same thing in Mandarin: ba fangjian, verb. Take the room and clean it up. This kind of sentence does not exist in the early materials. So if you go to Douglas's dictionary, he will only have pronoun or person objects with this thing, where we're doing something to a person or affecting a person in some way. And even this, this is only from 50 years ago. You will not find sentences like that in this book. And there are thousands and thousands of them. And anyway, I was just searching through, thinking about this particular word. This is typical, with a person. He's always twisting my ear. So kā góa, to me, towards me, twists ear. This is a typical, sort of classic Taiwanese sentence where he affects me, does to me. And this is the kind of sentence that begins to shade towards the modern situation, where the light bulb is broken and I want to twist it off. So I'm going to kā i, I'm going to take it and twist it off, or am I going to do it for someone or something? It's the sort of ambiguous use of "for it" or "to it" where we can't be totally confident that this means the action is directed at the light bulb itself.
And anyway, this is just one little illustration. If you go through Douglas's dictionary sentences from 1873, the Maryknoll sentences from 50 years ago, and then the sentences that are presented inside our modern dictionary, you'll find this process of Mandarinization of syntax over time in many respects. So it's useful and interesting in its own right, and also an illustration of the degree to which, and the ways in which, the standard language is impacting these Min varieties. I feel like I've taken up enough of your time. And I went through most of the material that I have. I didn't look at it in as much detail as I was thinking I could, but it's all good. We spent more than an hour. Let me see. I'll close this stuff up. Yeah, let's close this. I mean, I can make like one minute of concluding remarks or general thoughts. One is, can we build a list of core Min lexical items that's bigger than what Norman et cetera were working with 50 years ago, using material like this? Maybe there are possibilities to expand this set of important words to some degree, maybe even to a considerable degree. And then can we study some really, really specific changes? I'll show you one word. I was thinking of discussing this in more detail. Well, let's just look it up and I'll stick to that. Like nose, it's not supposed to be nasalized. We don't need to look it up in another book, but it seems to be mixed. In other Min languages, nose does not have a nasal vowel, but in Southern Min languages, for some reason, there's a nasal vowel, and why? We can read, for example, Bit-Chee's book, which talks about this phenomenon a little bit, but largely in literary layers, where for whatever reason a non-nasal vowel has been borrowed into Min languages with a nasal vowel. But this stuff is not borrowed. This is deeply Min.
In fact, this belongs to the important group of so-called voiced aspirated words, where on a system like Norman's this will have a voiced aspirated onset in Common Min or Proto-Min. But why a nasalized vowel? Anyway, one idea is that it's a very, very narrowly conditioned change that happened in particular places in Southern Min. What are the conditions? It's not clear, but this looks a lot like the word ear and some other words. Ear is written as hī in some sources, hīⁿ with a nasalized vowel in other sources. Same for this word, nose, written as phī sometimes, phīⁿ sometimes. So you suspect that this relates to voicing neutralization, relates to vowel quality, relates possibly to the onset, the onset category. So any of these things are possible. So looking at more data in more detail might allow us to answer this kind of question. This is just one random question, but it is important for the nature of this particular item, nose, in Common Min and Proto-Min. It could tell us something about the onset, which is exactly the question Norman had about this category. Is it voiced aspirates, or is it something else, and "voiced aspirates" are just a placeholder? Anyway, this is the level of granularity that we might have to get to in order to understand some of these issues more closely. Yeah, maybe I'll wrap it up. Thank you so much, I appreciate it. Hopefully I covered these sources in a way that everyone could understand. Maybe you'll have better ideas about how to utilize some of these resources.