Okay, well, thank you for joining us for this talk, on machine-assisted etymology: achieving faster progress and more secure results. We start from the assumption that etymological dictionaries are nice to have, and that having more of them would be a good thing. We then observe that there is an inequality in the distribution of etymological dictionaries. A Brazilian who is curious about the origin of a Portuguese word can turn to one of several etymological dictionaries of Portuguese, but a speaker of any of Brazil's 217 native languages struck by a similar curiosity has no recourse. Less than 1% of the world's 7,000 languages have etymological dictionaries, and at a time when 40% of the world's languages are endangered, addressing this inequality is urgent and would yield very broad benefits.

So why aren't there more etymological dictionaries of more languages? Well, etymological dictionaries are slow and expensive to compile; they are not commercially viable. Just to talk through some examples: the Französisches Etymologisches Wörterbuch required 80 years, between 1922 and 2002, to be published. Looking a little closer to our own field, the Sino-Tibetan Etymological Dictionary and Thesaurus was planned to have 80 fascicles; of those, only one was published, despite an investment of 28 years and over three million dollars. And like this example, many etymological dictionaries are never finished.

In contrast, bilingual and monolingual lexicography has really entered the digital age. In particular, the availability of large corpora has revolutionized the ability to compile such dictionaries. We'll just look at one project that is at the vanguard of this development, ELEXIS. They have three interlocking products that help speed up dictionary compilation: Sketch Engine, which organizes raw or annotated corpora of language use to best showcase the distinctive behavior of different words; then something called OneClick Dictionary, which organizes this data into an
automatically drafted dictionary; and then they have a third product, Lexonomy, which provides an environment for editing that automatically compiled dictionary for publication, and also a publication environment.

So what about etymological dictionaries? Well, basically, the working methods of etymological lexicography remain unaffected by the digital revolution. Lexicographers type out data from printed books, which they arrange and manage in general database and word-processing software, with no use of specialized tools. So how does progress happen at all in etymology? It happens via a large investment of labor and a thriving collective environment of researchers who gather and scrutinize hypotheses. This crowdsourcing, if you like, has served European languages relatively well, with as many as 20 Italian dialects having etymological dictionaries. But without the generous support of national funding bodies, dictionaries of less well-resourced languages are not compiled. So we think that for new etymological dictionaries of understudied languages to be compiled at all, and for the dictionaries of well-studied languages to continue improving, etymological research demands automation.

We see basically two tasks to automate in the first instance: one is the identification of related words, and the other is the identification of changes in pronunciation. To take an example: the fact that English foot and German Fuß descend from *fōt- in Proto-Germanic is divisible into two facets. One is the mere association of foot and Fuß as probably somehow related; the other is the proposal that German changed a t into an s in this environment. Automated methods exist for both tasks.
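The two facets can even be teased apart computationally. As a toy illustration — the hand-aligned, equal-length simplified spellings below are my own assumption; a real system aligns forms automatically — recurrent correspondences like English t ~ German s fall out of simple position-wise counting over pairs already assumed related:

```python
from collections import Counter

def correspondences(aligned_pairs):
    """Count segment correspondences from hand-aligned cognate pairs.
    Identical segments are skipped so that only changes are highlighted."""
    counts = Counter()
    for a, b in aligned_pairs:
        for x, y in zip(a, b):
            if x != y:
                counts[(x, y)] += 1
    return counts

# toy English ~ German pairs, hand-aligned, spellings simplified
pairs = [("fut", "fus"),    # foot ~ Fuß
         ("wat", "was"),    # what ~ was
         ("it", "es")]      # it ~ es
counts = correspondences(pairs)
```

Here the t ~ s correspondence recurs across all three pairs, which is exactly the kind of regularity that makes it a candidate sound law rather than a coincidence. In practice, automated methods do the pairing and the counting at once, at scale.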
These are automatic cognate detection and mechanized historical phonology, and our workflow aims to combine the two.

We are looking at the Burmish family, where there isn't a lot of previous scholarship on historical phonology. This gives you a sense of where the Burmish languages are spoken, and something like the family tree of the languages. In terms of materials, we rely almost exclusively on this book from 1992, edited by Huang Bufang, where a list of about 2,000 concepts is given in some 40 Tibeto-Burman languages, but we augment that in the case of Burmese with the relevant literature on Old Burmese. And this is just to give you a sense: one page from Huang Bufang's book, here with the meaning 'one' listed in a number of Tibeto-Burman languages.

Given a set of words in different languages, the algorithm tells you which words in the set are likely to be related; we're going to see an example later. There are older methods based on surface phonetic similarity, which are robust, and newer methods that are already a little bit of historical linguistics themselves, in that they calculate recurrent phonetic correspondences. In our methodology we use the LexStat algorithm developed by Johann-Mattis List. Here is an example of the kind of thing the algorithm produces. We feed into it a lot of data from different Germanic languages, indexed by semantic identity. So here is a row of different Germanic words for 'woman', and the algorithm gives a similarity score. For example, Danish kvinde is very similar to Swedish kvinna, but German Frau has nothing to do with English woman.
They're almost a hundred percent dissimilar. And if we plot these scores, there are algorithms that can cluster the results: the algorithm notices that, okay, this Frau and this other form look similar enough, so they form one class; this is another class; and this is a third class.

Turning to mechanized historical phonology: it relies on the fact that sound change is regular. I'm going to talk through one example very swiftly, which is the Indo-European word for 'eight', something like *h₃eḱtṓw, which gives us eight in English, octō in Latin, aṣṭā́u in Sanskrit, all via regular sound change. Let's just look at how this works in the case of English. We have to get first to Proto-Germanic and then to Old English. Looking at the consonants first: the laryngeal disappears before vowels; the palatal ḱ merges with plain k and then becomes h according to Grimm's law. Turning to the vowels: o changes to a in Germanic, and long ō remains long but then merges with long ā. With these changes, in this order, we have arrived at the Proto-Germanic form. Now, moving from Proto-Germanic to Old English: a changes to æ, and then front vowels, including æ, break into diphthongs before certain consonants, for our purposes changing aht into eaht. Unstressed diphthongs are monophthongized, and unstressed ō becomes a, and that's how we get from *h₃eḱtṓw to eahta. The details of this don't need to concern you.
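Stepping back to the cognate-detection side for a moment: the following toy sketch is not LexStat — it clusters by raw surface similarity (Python's difflib) with single-link chaining, and the simplified spellings and the 0.5 threshold are my own assumptions — but it reproduces the three classes from the 'woman' example.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Surface similarity between two word forms, 0 (unrelated) to 1 (identical)."""
    return SequenceMatcher(None, a, b).ratio()

def cluster(words, threshold=0.5):
    """Single-link clustering: two words share a class if some chain of
    pairwise similarities above the threshold connects them."""
    classes = []
    for w in words:
        linked = [c for c in classes if any(similarity(w, v) >= threshold for v in c)]
        merged = {w}
        for c in linked:
            merged |= c
            classes.remove(c)
        classes.append(merged)
    return classes

# toy Germanic data for the concept WOMAN
words = ["kvinde", "kvinna", "frau", "woman"]
classes = cluster(words)
```

kvinde and kvinna end up in one class, while frau and woman each stand alone. Real cognate detection replaces the surface-similarity score with recurrent sound correspondences, which is what lets it group forms that no longer look alike.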
It's just a basic methodological point that sound change is regular: each of these changes happened at a specific moment in time, across all of the words in the language that it could apply to. And one of the major goals of historical linguistics is to figure out historical phonology — the relative chronology of the different sound changes — and to reconstruct ancestral forms.

We can teach a computer ordered changes like this, and then run the changes backwards on the attested forms to find possible reconstructions, and we can run them forward on reconstructions to check that everything is working according to plan, or, you know, to refine our understanding of historical phonology, as the case may be. It's an important principle never to adjust the proto-forms in an ad hoc way, but instead to have the proto-forms generated from the attested forms. So here, using an online platform for finite-state transducers developed by Tiago Tresoldi in Jena, I have formalized the changes from Indo-European to Germanic affecting the word for 'eight'. You see down at the bottom it says "apply down" with the Proto-Indo-European form: that's where I say, okay, apply these rules to this form, and I get the Germanic form on the right. Then if I apply the same changes backwards, by saying "apply up" to the Germanic form, I get this whole list of options. This is characteristic: information is lost over time, so an attested form can lead to multiple ancestral forms through backward reconstruction. Of course, many of these aren't possible words in Indo-European, but that's extraneous information that has not yet been modeled. We see this as actually one of the points of using a methodology like this: it forces you to explicitly model more and more information, so that knowledge that is tacit becomes knowledge that is explicit.

So we are going to look at one of our hypotheses of phonological history, okay?
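Before moving on, here is a minimal Python analogue of that apply-down/apply-up behavior. The two rewrites are grossly simplified stand-ins of my own, not the real Indo-European rules or the foma engine, but they show how backward application multiplies candidates:

```python
def apply_down(form, rules):
    """Run ordered changes forward (proto -> attested), like foma's `apply down`."""
    for old, new in rules:
        form = form.replace(old, new)
    return form

def preimages(form, old, new):
    """All strings that the single rewrite old -> new could have mapped onto `form`."""
    assert new, "deletion rules are not supported in this sketch"
    results = set()
    def walk(prefix, rest):
        if not rest:
            # keep only candidates the forward rule really maps onto `form`
            if prefix.replace(old, new) == form:
                results.add(prefix)
            return
        if rest.startswith(new):
            walk(prefix + old, rest[len(new):])  # this chunk may descend from `old`
        walk(prefix + rest[0], rest[1:])         # or it was always like this
    walk("", form)
    return results

def apply_up(form, rules):
    """Run the changes backward (`apply up`): one attested form yields many
    candidate ancestors, because information is lost over time."""
    candidates = {form}
    for old, new in reversed(rules):
        candidates = set().union(*(preimages(c, old, new) for c in candidates))
    return candidates

# two invented toy changes loosely echoing the 'eight' story
rules = [("kt", "ht"),   # k weakens to h before t (Grimm-flavored)
         ("o", "a")]     # short o becomes a
```

Running forward, "okto" deterministically yields "ahta"; running backward, "ahta" yields eight candidate ancestors, only one of which is the true source. This is exactly the asymmetry the talk describes: the phonotactics of the proto-language, once modeled, is what prunes the impossible candidates.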
So here is a toy version of Burmese — not Old Burmese, but Proto-Burmish — where a Proto-Burmish form like *ba gives corresponding forms in the daughter languages. How do we actually encode something like this in transducers? Well, it needs to be divided into two parts. First there is the definition of the phonotactics of the proto-language, which is shared by all daughter languages, because every daughter language projects back to the same proto-language; otherwise they wouldn't be related. A Proto-Burmish syllable, in our toy example, is made of an initial, a rhyme, and a tone. And it's quite readable: notice that we just put definitions together to say that they are linearly concatenated. An initial is one of the following: b, glottalized b, m, or glottalized m; similarly for the rhymes, where we have just two choices, a and e, and for the tones. So we see that *ba is a legitimate Proto-Burmish syllable according to our toy definition. Now, how does it change in a daughter language? Well, there is the first sound change, Burmish devoicing, whereby b is devoiced into p. This is the syntax of the foma language, which fortunately stays quite similar to the SPE kind of notation most linguists are familiar with. The actual daughter language can then be defined as a kind of application relationship: you take the proto-language syllable, you apply to it the first sound change and then the second sound change, and that gives the actual predicted form in the daughter language.

The use of finite-state transducers for this sort of backward reconstruction has some history in linguistics. In particular, Hewson produced a computer-generated dictionary of Proto-Algonquian. However, his method relies on having figured out a lot of historical phonology beforehand, which is also the case in the example of 'eight' that I gave earlier: I knew what I needed to formalize. More recently, Pyysalo has been using finite-state transducers in the modeling of the historical phonology of Indo-European, and was similarly relying on a huge body of previous scholarship. His efforts are also not terribly successful, simply because the Indo-European languages in question are quite distantly related.
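The toy syllable definition described above can be mimicked in a few lines of Python. The inventories here are my own placeholder sets, not the actual Proto-Burmish ones, and '?' stands in for glottalization:

```python
from itertools import product

# Toy phonotactics: a proto syllable is initial + rhyme + tone, concatenated.
INITIALS = {"b", "?b", "m", "?m"}   # assumed inventory, '?' marks glottalization
RHYMES = {"a", "e"}
TONES = {"1", "2"}

# every legal proto syllable, by linear concatenation of the three slots
PROTO_SYLLABLES = {i + r + t for i, r, t in product(INITIALS, RHYMES, TONES)}

def is_proto_syllable(form):
    """Does this form project back to a legal proto-language syllable?"""
    return form in PROTO_SYLLABLES

def devoice(form):
    """First toy sound change of a daughter language: b > p."""
    return form.replace("b", "p")
```

So "ba1" is a legal toy proto syllable, "ka1" is not, and the daughter language devoices "ba1" to "pa1". In the real workflow this same phonotactic definition, composed with the ordered sound changes, is what filters the candidate ancestors that apply-up generates.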
Analogical change also needs to be modeled, and that's not something we're going to get into; it's just a remark that the use of finite-state transducers in historical linguistics works better with lower-level subgroups.

So there is a complete methodology that can lead to etymological dictionaries, which is the methodology of back-projecting dictionaries: mechanized historical phonology can back-project whole dictionaries, but only in conditions of near-perfect knowledge. When you first approach a language group of which you don't have a very profound knowledge — and nobody has, because it hasn't been studied that thoroughly — it's very difficult to create the initial hypotheses from raw lexical data; you need something to work from. The basic problem is what I call the exploratory deficit: there is no way to go from a state of very little knowledge towards a state of much knowledge. If we decompose this exploratory deficit, we get two basic components, which we try to solve in our project. The first one is bootstrapping.
That is to say, we need to create the initial hypotheses from raw lexical data, so that we have something to improve on; and we need a computerized assistance mechanism for gradual improvement, so that the human linguists are truly supported in gradually improving the hypotheses. The way to make this practical as a methodology is to have a way to get started, which is bootstrapping, and a way to make gradual improvements. In our methodology, the algorithms and the transducers are combined as follows: the algorithms are used to produce a preliminary version of the cognate assignment, and then human linguists gradually correct that preliminary cognate assignment using a transducer-enabled user interface. This gives us what I call the CAPR workflow — computer-assisted proto-language reconstruction — because basically, in the ideal case, which we have come quite close to in our Burmish work, we start from a huge blob of words from a certain language group, we work with that, and we actually end up with a reconstruction of the proto-language.

The actual data the human linguist works on in this workflow is the bipartite hypothesis, so called because it's made of two parts. One part is the hypothesis of phonological history: how the sounds of every individual language changed from the proto-language of the group into the individual daughter languages. The other part is the lexical cognacy judgments. These are quite simple: since we're dealing with monosyllabic Southeast Asian languages here, every polysyllabic word is divided into syllables, and every syllable belongs to one of the cognate sets. So, for example, the first syllable of the word for 'brain' belongs to the same cognate set as a syllable in another language's word for 'hair'.
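As a concrete illustration of this bipartite structure — all names and forms below are invented placeholders, not CAPR's actual data model — the two parts, plus the proto-form prediction they jointly enable, might look like this:

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class History:
    """Ordered sound changes for one daughter language (stand-in for a transducer)."""
    changes: list                      # (old, new) rewrites, proto -> daughter

    def down(self, proto):
        for old, new in self.changes:
            proto = proto.replace(old, new)
        return proto

@dataclass
class CognateSet:
    members: dict = field(default_factory=dict)   # language -> attested syllable

def best_reconstruction(cognate_set, histories, proto_candidates):
    """Pick the candidate proto-form that predicts the most attested members.
    Mergers mean several candidates may fit one language; the candidate
    compatible with the most daughters wins."""
    votes = Counter()
    for lang, attested in cognate_set.members.items():
        for proto in proto_candidates:
            if histories[lang].down(proto) == attested:
                votes[proto] += 1
    return votes.most_common(1)[0][0]

# toy set: language A merges the tones (drops the tone mark), B keeps them
histories = {"A": History([("1", ""), ("2", "")]), "B": History([])}
cset = CognateSet({"A": "pi", "B": "pi1"})
best = best_reconstruction(cset, histories, ["pi1", "pi2"])
```

Language A is compatible with both candidates because its tone merger destroyed the evidence, but language B breaks the tie, so "pi1" wins. This is the same logic as the tone-merger example discussed next, just stripped to its skeleton.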
Indeed, the first syllable of the 'hair' word in the Longchuan Achang language belongs to that same set.

Given the cognate assignment, and given the hypothesis of historical phonology, it is quite simple to predict the proto-form of a given cognate set. For example, we have a Maru form, a Bola form, and an Achang form which all mean 'tear'. And we can see that Maru underwent a sound change which confounded all three of the possible tones: the Maru form could reflect the Proto-Burmish form in any of the three tone categories. Similarly, the Bola form can be reconstructed back to two of those proto-forms, and the Longchuan Achang form to yet another subset. In this case it's quite easy to predict that, given all these forms, the most probable proto-form is the one compatible with all of them.

And that's what actually happens in the dictionary view. This is a preview of the compiled dictionary. Some of the languages show only the one reflex, so it turns out that the computer considers that form the most probable reconstruction, though the alternatives in the other tone categories are not entirely ruled out either. And here we can see that different attested forms can be projected back to different proto-forms; in the dictionary view, however, only the proto-forms judged probable by the system are displayed.

So let's talk about the organization of the entire CAPR workflow. We begin with the stage of pre-processing: the source word lists need to be pre-processed by the linguists. Then, at the bootstrapping stage, the algorithm produces one part of the hypothesis, the cognate sets, and the human linguists work with that to produce the first version of the hypothesis of phonological history. From this first crude bipartite hypothesis, we have a number of different user interfaces which help
the human linguists to gradually refine the bipartite hypothesis to account for the linguistic data, and this is an iterative process: when you make one part better, the other part automatically gets better too, or at least you get more material to work with. And once the linguists judge that the material is good enough, fit for print, the stage of finalization happens, where the prepared hypothesis is made ready for publication in the form of an etymological dictionary.

Here is how the iterative improvement works. We have the bipartite hypothesis, which consists of the cognate judgments and the phonological history. With the first version of the bipartite hypothesis, the human linguists use the cognate reassignment interface to improve the cognate judgments. So we get human-corrected cognate judgments, and those are quite good, so they can be fed into the correspondence-pattern algorithm to produce the correspondence-pattern view. Using that view, the debugging interface allows the human linguists to make the phonological history better. So better cognate judgments result in better correspondence patterns, and better correspondence patterns give the human linguists a better view of the phonological history, so they can encode their improved understanding of the phonological history into transducers. Then the transducers are used in re-fishing, and with re-fishing the computer produces cognate judgments that are in some cases better and in some cases worse, but always easy to change, and which can be corrected by the human linguists again. And maybe the point is that re-fishing always includes more data than the last iteration, and the better your transducers, the better your back-projections from the dictionary, so that basically we're gradually transitioning from mostly relying on the algorithmic approach to mostly relying on
the transducer approach.

Conceptually we have, I'll say, three different parts: the transducers and the stored cognate judgments — that's just two parts — and the third part is the actual source word forms and glosses, which are stored in the application.

Okay, so it looks like you've integrated everything together. So maybe just talk me through it: what should we do first? Should we paste in a transducer?

Exactly, because in general these transducers are debugged in the other app, so they get changed quickly. So every time that we modify the cognacy judgments, we put the newest transducer in here.

So I'm just going to paste the transducer that you sent me in here.

Yeah. And then click, on the left side, "load transducer".

Okay, so this looks like the boards. Just let me see if I understand: the computer has already sort of fished out, from the saved attested forms, various words that reconstruct to the same or similar things. So here we have *ba, and it looks like we have three piles of cognates.

Yes, because those are the cognates that you have already corrected at least once.

I see. Yeah, so they're quite good, and there is just one very tiny thing that needs to be changed.

Well, it looks to me like these two should be combined, right?

Yes, because the Maru word is glossed 'empty', but there is this morpheme here; etymologically it means 'not full' or 'not having', or something like that.

Yeah, exactly. So shall I just drag this in here?

Yeah, you can drag it entirely inside.

Okay. And then the reason there is no reconstruction here is because some forms reconstruct to one proto-form and some to another.
Is that right? And then do I need to save? How do I do that?

Yeah, just "save boards" — the third button.

"Save boards". So they're saved to the cloud. Okay, so let's just do another one, just to get the sense of it. There is another one I noticed. Oh yeah, we have all this 'blossom', 'blossom of flowers', and then we have here 'classifier for flowers'. Do you think those are the same?

Yes, that's a specificity of Burmese: you use the same noun as the classifier.

So maybe I should combine those two. What do you think?

I think so, because basically there are no words here that don't belong.

Although, I'm just noticing: do we really think that this pan is related to this?

No, I think that one is Rangoon Burmese, so in fact we need to create two different sets. And you can always create a new column by clicking here.

So then, for instance, I'll move the ones with the nasal here. Yeah, and oh, this one too. And it's very nice seeing the languages that are expected to be closer to each other patterning together. And then these ones I will just treat as one category.

Okay, you mean the whole thing, like this? Just make sure we don't have any nasals. Okay, so then I can say "save boards". And then one activity that I guess I would go through next is just clicking through these boards one by one. Let's just zoom along to a further one, maybe this 'rock' or something. Well, that's quite a handful.

Yeah, this one is maybe too complicated to go through now, but 'riddle' — 'riddle' should be the same thing as this 'false, deceiving, cheat'.

Let's see, where's 'riddle'? Oh, there's 'riddle'. Yeah, and the third from the left is 'false, deceiving, cheat'.
Yeah.

Okay, so I'll move the 'false, deceive, cheat' one into the 'riddle' one, and save boards. Now, just tell us what this fish means.

This fish means that these words are the ones that weren't considered reliable enough for this kind of inspection the last time, but this time, with better transducers, they have been fished out, and now we can see whether they really are reliable cognates.

So, basically, putting it another way: if something doesn't have a fish, it means it's been in our system for a while; but if it has a fish, it means that since the last time we uploaded a transducer, it has, because of the new transducer, been newly fished out of the overall lexical database.

And before we begin, we should paste the current version of our transducers into both the old and the new tabs. Now, should I paste the transducer you sent me in the email into the new tab, for instance?

You should paste it into both old and new, because "old" will be the relatively old one; this is used to make further improvements on the transducers.

Okay, so I'm just going to do that: there it is in old, here it is in new. And now we can check the diff, and it should show nothing. Yeah, okay. And then, maybe just to prove the point, I will write "blah blah" here at the beginning, and then if I say diff, it says "blah blah" there. Now I will delete that, because we don't actually want it to say "blah blah" there.

We just need to click on "load cognate assignment" to fetch the cognate assignment that you have just saved.

So now I'll load the cognate assignment, and this is loading from the assignments we just saved, is that right?
Yes. So here you can compare the correspondences between two or three languages, but of course you can do more, and the actual language under study — the one that you plan to do something more about — is put at the end.

So: Rangoon Burmese, Old Burmese, and Bola. Let's see if we can do something with Bola. Okay. And then it takes a little while; it says "compiling correspondence patterns". Okay, now it's done, so we go to the report. So, a bunch of not very interesting things that we have already solved: for example, that Old Burmese l corresponds to Bola l. Yeah, Bola l corresponding to Old Burmese l in the word for 'warm' doesn't surprise us very much. But what are these "raw" and "default" columns? What do they mean?

So "default" means the number of these items that are currently in the boards. Because this fishing mechanism has a structural flaw, which is that there may be things that are super common but are never boarded, so they are not recognized and they will never be recognized. So one of the reasons it takes such a long time is that the patterns are actually compiled on the raw data as well, so that we are not going to lose anything. So in this case, f corresponds with f: there are four examples in the raw data, but only one has been boarded, to use your term. And the system considers that a potential problem — there is some heuristic that infers there might be a problem, so there is an exclamation mark here.

And the sense that there's a problem comes from the large difference: if, proportionally, there are too many in raw that aren't boarded.

I see. Whereas here:
It's many more examples, but we have 30 in the raw and 25 in the board, so we've gotten most of them; it doesn't really bother us. Okay, and here there are five in the raw and five that have been boarded, so pretty good.

And there is a problem here, which is that the 'cross the river' word isn't properly accounted for in Bola.

Well, you're jumping a little fast. Let's just say: we have the gloss — this is the shared semantics across all three languages, which actually comes from our original data source, a comparative word list — and then these are the reconstructions; there's a question mark because not all languages point to it exactly. And then the first line is the old transducer, and the second line is the new transducer, which are the same, because we haven't yet changed the transducer. And as you said, there's a problem insofar as — let me see if I've got this right — if the reconstruction is correct, it should have led to this form in Bola, rather than this form.

Exactly.

So can we try and fix this now?

Yeah, let's search for the rhyme, let's search for the vowel. There it is. The problem in Bola is that the predicted form has one vowel, but the actual form has another.

And we'll see here that this problem is actually very recurrent.

Yeah, it's a real problem, and a problem that apparently can be solved by just adding a new sound law changing the one vowel into the other. So let's try to do that, and we do that first by finding Bola. So here is the transducer for the current Proto-Burmish-to-Bola sound laws; let's find some of the vowel sound laws.

We just need to have this after the o-to-au law, so that there is no feeding or bleeding.

So shall I put it after o-to-au?
Yeah, because otherwise every o will become au.

Oh, I understand; but I was tempted to put it somewhere right near the end.

I have no idea; we'll give it a try and see whether it messes anything up — that's the idea. Or, just for fun, we can have it in the wrong sequence first.

Oh yeah, okay. So I'm just going to name the rule after the change. And we can remove the conditioning, because I think it's probably unconditioned; I mean, I think this vowel only occurs in open syllables anyhow. Okay, so I've just defined the sound change, and then I go down here and put it in.

And it would be a very good idea to check the diff, so that we can see that nothing superfluous has changed.

Okay, everything looks about right. Now let's go back to Bola. And then, like you suggested, what I'm going to do first is actually put it in the wrong place, in terms of the sequence of changes, and then we will see how that is reported to us. Does that sound good?

Sounds good.

So I go back to the action tab and say "get correspondences". Okay, and then we go to the report and look for 'cross'. Well, so all of these are still wrong, which means we haven't fixed the problem, but we do see that it's wrong differently.

It's wrong differently, yeah. Whereas the old transducer was expecting the one vowel, now it's expecting the other. And then if we just scroll around, we can see whether there are other effects. No, I don't think anything will go wrong.
You don't think anything else will go wrong, because it just isn't a change that affects very many words. Okay, but now let's put it in the right order. So we put it in the right order, go back to "get correspondences", then the report, then 'cross' — and now it has fixed the problem, and this smiley face has come up, which means, kind of, congratulations, you've fixed the problem. I'll just scroll down to the other place it comes up, in the rhyme examples. Oh, it fixed some of them, but not all of them.

It fixed some of them, but not all of them, yeah. Hmm, the third problem is quite funny. I don't think it can be solved just like this. For the 'egg' word, the thing is that the initial glottal stop shouldn't be introducing tenseness, neither in Maru nor in Bola, but in fact it introduces it in both. So that's a deeper problem that needs to be looked into in a deeper way.

Well, but I mean, this shows how well things actually work: we found a problem, we fixed the problem, and it has led to the discovery of a new sound change in the history of Bola. Also, we have had to put it in the right chronological order, so we're developing an increasingly sophisticated model of Bola historical phonology, both in terms of what sound changes happened and in terms of what order they happened in. And then, when we implement the change, we see that it has fixed some of the cases we wanted to fix, but not all of them. And the reason it hasn't fixed all of them is that there are yet more subtle problems. Even if we look at them very specifically — how can I say — we have fixed the vowel, right? In the old transducer here, it predicted the wrong vowel;
now it predicts the right vowel. The remaining problem has something to do with the initial, which we weren't thinking about. And similarly here: I think that 'egg' correctly reconstructs with a glottal stop initial, and somehow we have an issue of the interaction between the glottal stop initial and the creaky voicing that hasn't been modeled — and it sounds like you think it maybe shouldn't be modeled in Bola and Maru, but we can look at that on another occasion. The point is that the vowel correspondence, which is what we were targeting with this intervention, has been fixed. So even though we didn't get smiley faces here, because the reconstruction still doesn't predict the attested form, we have still gotten closer, right?

Yes; it isolates the solved part of the problem and highlights the remaining problems.

Exactly. And that's the point of, you know, progress in science, right? We've articulated a hypothesis, we've gotten feedback that the hypothesis is indeed correct, and by articulating the hypothesis we can move on to yet more thorny and subtle problems.

And then I just think it's worth saying that had we made a proposal that had broken all sorts of things, there would have been some frowny faces. But I'm a little bit reluctant to go breaking things intentionally — although, actually, here's what we can do. We can say, okay, that's the new one; go and paste it into old.

Exactly, and then you'll need to paste the original thing into the new tab.

Well, I'll just go to Bola and delete the new sound change, which is this one. Actually, I'll just comment it out; I think that's a nicer way of doing it. Okay, and then we go to action.
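The order sensitivity exploited in that experiment can be shown in miniature. The two changes below are invented stand-ins (capital O standing in for an open vowel), not the actual Bola sound laws:

```python
def apply_changes(form, rules):
    """Apply ordered rewrite rules, earliest first."""
    for old, new in rules:
        form = form.replace(old, new)
    return form

# an established change o > a, and a newly discovered change O > o
right_order = [("o", "a"), ("O", "o")]   # new law ordered after: no feeding
wrong_order = [("O", "o"), ("o", "a")]   # new law first: its output feeds o > a
```

With the right order, a toy proto-form "pO" comes out as "po"; with the wrong order, the new law's output o is swept up by the older law and "pO" wrongly comes out as "pa", while existing o-words shift to a in either case. This is why the placement of the new sound law in the rule sequence is itself a substantive historical claim about relative chronology.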
So, just to be clear to all you kids at home, what we're doing here is intentionally breaking it, in order to show you what the report looks like when you break something. We go to the report, then to the cross view, and here we have the frowny face, because the old transducer correctly predicted the attested form but the new transducer does not. That's why we've gotten the frowny face.

So this view is basically very similar to the board view, but what we have is the data printed the way it would appear in a finished dictionary. And let's just look at the very beginning, because we saw with the boards that this word exists and this other word exists as separate entries that are not combined. This printout was generated before I merged them just now, so they're still presented next to each other.

What I think is particularly nice about this system, and I'll zoom in some more, is that if we look at this entry, we have the different meanings that occur, and then a check mark means that this attested form is predicted by our current transducers, whereas an x mark means that it is not. And when we get the x, it also tells us which reconstruction would have predicted that attested form. So this is a huge amount of information about what's regular, what's irregular, and what areas can be looked at for improvement, all of it generated more or less automatically.
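The check-mark/x-mark report can be sketched as follows. Everything here is illustrative: the sound changes, language names, and forms are toy data, and the search over a candidate list is a brute-force stand-in for actually inverting the transducer to find which reconstruction would have predicted a mismatched attested form.

```python
import re

# Toy per-language sound changes (illustrative only, not the real Burmish rules).
CHANGES = {
    "Bola": [(r"t(?=i)", "s")],  # hypothetical: *ti > si in Bola
    "Maru": [],                  # hypothetical: no relevant change in Maru
}

def transduce(recon: str, lang: str) -> str:
    """Run a reconstruction forward through a language's ordered changes."""
    for pattern, repl in CHANGES[lang]:
        recon = re.sub(pattern, repl, recon)
    return recon

def report_entry(recon: str, attested: dict, candidates: list) -> list:
    """One dictionary-entry report: a check mark if the working reconstruction
    predicts the attested form, otherwise an x mark plus whichever candidate
    reconstruction *would* have predicted it."""
    lines = []
    for lang, form in attested.items():
        predicted = transduce(recon, lang)
        if predicted == form:
            lines.append(f"{lang} {form} \u2713")
        else:
            fix = next((c for c in candidates if transduce(c, lang) == form), "?")
            lines.append(f"{lang} {form} \u2717 (predicted {predicted}, "
                         f"would follow from {fix})")
    return lines

for line in report_entry("tik", {"Bola": "sik", "Maru": "dik"},
                         candidates=["dik", "tuk"]):
    print(line)
```

Here the Bola form gets a check mark, while the Maru form gets an x mark along with the candidate reconstruction that would have produced it, which is the "huge amount of information" the report surfaces automatically.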
So I think that's a very powerful component of this system, one that shows its potential both for making etymological dictionaries faster and for making them more rigorous and explicit. We have just seen the interfaces for cognate reassignment and for debugging the transducers that model the phonological history. Some things work and some don't, but in general I think it's fairly easy to see that we have something that really assists the linguist in the creation and elaboration of historical-linguistic hypotheses. So I hope that our approach could lead to a more human-centered, computer-assisted historical linguistics, where linguists, by being pushed by the computer to be a little more rigorous and a little more explicit, can reap actual benefits: having the dirty details taken care of by the computer, and seeing exactly the things that need to change and the things that don't.

Now, we developed this methodology for the Burmish languages, which are languages with a relatively reduced syllable canon and with very little morphology apart from syllable compounding, almost no morphophonology, and no paradigms. Everything is agglutinative, so there are not that many analogical effects either.
So if we are going to port this approach outside of, let's say, China and mainland Southeast Asia, we'll need to build better transducer engines that can be tailored to deal with the alignment problem, which is quite tricky in non-monosyllabic languages, and we'll also need to find ways of encoding and working with paradigmatic morphology and analogical historical changes. I hope that this is a good start, and that with further improvements down the road it could lead to a future world of verified reconstructions, where historical linguists can really show that what they propose at least has an internal consistency: the reconstructed forms, with the reconstructed changes, do lead to the attested forms. Thank you very much.