Okay, well, thank you for joining us for this talk on machine-assisted etymology: achieving faster progress and more secure results. We start from the assumption that etymological dictionaries are nice to have, and that having more of them would be a good thing. We then observe that there is an inequality in the distribution of etymological dictionaries. A Brazilian who is curious about the origin of a Portuguese word can turn to one of several etymological dictionaries of Portuguese, but a speaker of any of Brazil's 217 native languages, struck by a similar curiosity, has nowhere to turn. Less than 1% of the world's 7,000 languages have etymological dictionaries, and at a time when 40% of the world's languages are endangered, addressing this inequality is urgent and would yield very broad benefits.

So why aren't there more etymological dictionaries of more languages? Well, etymological dictionaries are slow and expensive to compile; they are not commercially viable. Just to talk through some examples: the Französisches Etymologisches Wörterbuch required 80 years, between 1922 and 2002, to be published. Looking a little closer to our own field, the Sino-Tibetan Etymological Dictionary and Thesaurus was planned to have 80 fascicles; of those, only one was published, despite an investment of 28 years and over three million dollars. Like this example, many etymological dictionaries are never finished.

In contrast, bilingual and monolingual lexicography has really entered the digital age. In particular, the availability of large corpora has revolutionized the ability to compile such dictionaries. We'll just look at one project that is at the vanguard of this development, ELEXIS. They have three interlocking products that help speed up dictionary compilation: Sketch Engine, which organizes raw or annotated corpora of language use to best showcase the distinctive behavior of different words; something called One-Click Dictionary, which organizes this data into an automatically drafted dictionary; and a third product, Lexonomy, which provides an environment for editing that automatically compiled dictionary for publication, and a publication environment as well.

So what about etymological dictionaries? Well, basically, the working methods of etymological lexicography remain unaffected by the digital revolution. Lexicographers type out data from printed books, which they arrange and manage in general database and word-processing software, with no use of specialized tools. So how does progress happen at all in etymology? It happens via a large investment of labor and a thriving collective environment of researchers who gather and scrutinize hypotheses. This crowdsourcing, if you like, has served European languages relatively well, with as many as 20 Italian dialects having etymological dictionaries; but without the generous support of national funding bodies, dictionaries of less well-resourced languages are not compiled. So we think that for new etymological dictionaries of understudied languages to be compiled at all, and for the dictionaries of well-studied languages to continue improving, etymological research demands automation.

We see there as being basically two tasks to automate in the first instance: one is the identification of related words, and the other is the identification of changes in pronunciation. To take an example: the fact that English foot and German Fuß descend from *fōt- in Proto-Germanic is divisible into two facets. One is the mere association of foot and Fuß as probably somehow related, and the other is the proposal that German changed a t into an s in this environment. Automated methods exist for both tasks.
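To make the second facet concrete, here is a toy sketch of how a recurrent correspondence such as t : s can fall out of simple counting over cognate pairs. The spellings below are simplified, hand-aligned stand-ins, not real transcriptions, and this counting is not itself any published method; it only illustrates the idea of recurrent correspondences.

```python
from collections import Counter

# Toy, pre-aligned English : German cognate pairs (simplified spellings,
# equal lengths so that position-by-position comparison works).
pairs = [
    ("water", "waser"),  # cf. German Wasser
    ("beter", "beser"),  # cf. German besser
    ("foot", "fuos"),    # toy alignment, cf. German Fuss
]

# Count every position where the two languages disagree.
corr = Counter()
for eng, ger in pairs:
    for e, g in zip(eng, ger):
        if e != g:
            corr[(e, g)] += 1

# t : s recurs three times; o : u only once.
print(corr.most_common(2))
```

A correspondence that recurs across many pairs (here t : s) is evidence of a regular sound change rather than chance resemblance.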
The two methods are automatic cognate detection and mechanized historical phonology, and our workflow aims to combine the two. We are looking at the Burmish family, where there isn't a lot of previous scholarship on historical phonology. This gives you a sense of where the Burmish languages are spoken, and something like the family tree of the languages. In terms of materials, we rely almost exclusively on this book from 1992, edited by Huáng Bùfán: a comparative word list in which about 2,000 concepts are given for some fifty Sino-Tibetan languages. We augment that, in the case of Burmese, with the relevant literature on Old Burmese. And this is just to give you a sense: one page from Huáng Bùfán's book, here with the meaning 'one' listed in a number of Sino-Tibetan languages.

Given a set of words in different languages, a cognate detection algorithm tells you which words in the set are likely to be related; we're going to see an example later. There are older methods, which are based on raw phonetic similarity and are less robust, and newer methods that are already a little bit of historical linguistics themselves, in that they calculate the recurrent phonetic correspondences. In our methodology we use the LexStat algorithm developed by Johann-Mattis List. Here is an example of the kind of thing the algorithm produces. We feed it a lot of data in different Germanic languages, indexed by semantic identity. So here is a row of different Germanic words for 'woman', and the algorithm gives a similarity score: for example, Danish kvinde is very similar to Swedish kvinna, but German Frau has nothing to do with English woman, so they are almost a hundred percent dissimilar. We can plot this score on the map, and there are algorithms that can cluster the results: the algorithm notices that, okay, this Frau and this vrouw look similar enough, so they are one class.
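A toy version of such similarity-based clustering can be sketched as follows. The similarity measure here is crude surface string similarity standing in for LexStat's sound-class-based scoring, the threshold is arbitrary, and the word forms are rough citation spellings chosen for illustration.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Crude surface similarity in [0, 1]; LexStat instead scores
    sound-class alignments against a permutation-based null model."""
    return SequenceMatcher(None, a, b).ratio()

def cluster(words, threshold=0.5):
    """Single-linkage clustering via union-find: any pair scoring above
    the threshold ends up in the same class."""
    parent = list(range(len(words)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            if similarity(words[i], words[j]) >= threshold:
                parent[find(i)] = find(j)

    classes = {}
    for i, w in enumerate(words):
        classes.setdefault(find(i), []).append(w)
    return list(classes.values())

# Germanic words for 'woman' (illustrative spellings)
words = ["kvinde", "kvinna", "frau", "woman", "frouwe"]
print(cluster(words))  # three classes: kvinde/kvinna, frau/frouwe, woman
```

The point is only that related-looking forms group together while unrelated ones stay apart; the real algorithm gets its robustness from recurrent correspondences, not raw string overlap.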
This is another class, and this is the third class.

So, turning to mechanized historical phonology: it relies on the fact that sound change is regular. I'm going to talk through one example very swiftly, which is the Indo-European word for 'eight', something like *oḱtōw, which gives us eight in English, octō in Latin, aṣṭāu in Sanskrit, all via regular sound change. Let's just look at how this works in the case of English. We have to get first to Proto-Germanic and then to Old English. Looking at the consonants first: the laryngeal disappears before vowels, and the palatal *ḱ merges with plain *k, which then becomes *h according to Grimm's law. Turning to the vowels: *o changes to *a in Germanic, and long *ō remains long but merges with long *ā. With these changes, in this order, we have arrived at the Proto-Germanic form. Now, moving from Proto-Germanic to Old English: *a changes to *æ, and then front vowels, including *æ, break into diphthongs before certain consonants, for our purposes changing *æht- into *eaht-. Unstressed diphthongs are monophthongized, and then the unstressed vowel becomes a, and that's how we get from *oḱtōw to Old English eahta.

The details of this don't need to concern you; the basic methodological point is that sound change is regular. Each of these changes happened at a specific moment in time, across all of the words in the language that it could apply to, and one of the major goals of historical linguistics is to figure out the historical phonology, the relative chronology of the different sound changes, and to reconstruct ancestral forms.

So we can teach a computer ordered changes like this, and then run the changes backwards on the attested forms to find possible reconstructions; and we can run them forwards on reconstructions to check that everything is working according to plan, or, as the case may be, to refine our understanding of the historical phonology. It's an important principle never to adjust the proto-forms in an ad hoc way, but instead to have the proto-forms generated from the attested forms. So here, using an online platform for finite-state transducers developed by Tiago Tresoldi in Jena, I have formalized the changes from Indo-European to Germanic affecting the word for 'eight'. You see it down at the bottom: it says apply down on *oḱtōw, that's where I say, okay, apply these rules to this Proto-Indo-European form, and I get the Germanic form on the right. And then if I apply the same changes backwards, by just saying apply up on the Germanic form, I get this whole list of options. This is characteristic: information is lost over time, so one attested form can lead to multiple ancestral forms through this backward reconstruction. Now, of course, many of these aren't possible words in Indo-European, but that is extraneous information that has not yet been modeled. And we see this as actually one of the points of using a methodology like this: it forces you to explicitly model more and more information, so that knowledge that is tacit becomes knowledge that is explicit.

So, we are going to look at one of the hypotheses of phonological history. Here is a toy version of Burmish, so not Old Burmese but Proto-Burmish, where a Proto-Burmish form like *bar gives corresponding forms in the daughter languages. Now, how do we actually encode something like this in transducers? Well, basically, it needs to be divided into two parts.
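The apply down / apply up behavior just described can be sketched in miniature. These are plain unconditioned string rewrites standing in for real foma rules, and the two toy laws (devoicing, loss of preglottalization) are illustrative, not the project's actual rules.

```python
import itertools

# Toy ordered sound changes, applied in this order going "down".
RULES = [
    ("b", "p"),    # devoicing: *b > p
    ("ʔm", "m"),   # loss of preglottalization: *ʔm > m
]

def apply_down(proto: str) -> str:
    """Run the changes forward, in order (foma's `apply down`)."""
    for old, new in RULES:
        proto = proto.replace(old, new)
    return proto

def apply_up(attested: str) -> set:
    """Run the changes backward (foma's `apply up`): enumerate every
    preimage, since mergers make the inverse one-to-many."""
    forms = {attested}
    for old, new in reversed(RULES):
        next_forms = set()
        for f in forms:
            # each occurrence of `new` may or may not descend from `old`
            parts = f.split(new)
            n = len(parts) - 1
            for choice in itertools.product([new, old], repeat=n):
                next_forms.add("".join(p + c for p, c in zip(parts, choice)) + parts[-1])
        forms = next_forms
    return forms

print(apply_down("bar"))        # -> par
print(sorted(apply_up("par")))  # both bar and par are possible sources
```

Going forward the answer is unique; going backward we get a set of candidates, which is exactly the information loss the talk describes.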
First, there is the definition of the phonotactics of the proto-language, which is shared by all daughter languages, because every daughter language projects back to the same proto-language; otherwise they wouldn't be related. A Proto-Burmish syllable, in our toy example, is made of an initial, a rhyme, and a tone, and it's quite readable: notice that we just put definitions together to say that they are linearly concatenated. An initial is one of a small set, something like b, ʔb, m, or ʔm, and similarly for the rhymes, where we have just two choices, and for the tones. So we see that *bar is a legitimate Proto-Burmish syllable according to our toy definition. Now, how does it change in the daughter language? Well, there is a first sound change, Burmish devoicing, by which b is devoiced into p. This is the syntax of the foma language, which fortunately stays quite similar to the SPE kind of notation most linguists are familiar with. The actual daughter language can then be defined as a kind of application relationship: you take the proto-language syllable, you apply to it the first sound change and then the second sound change, and that gives the actual predicted form in the daughter language.

The use of finite-state transducers for this sort of backward reconstruction has some history in linguistics. In particular, Hewson produced a computer-generated dictionary of Proto-Algonquian. However, his method relies on having figured out a lot of the historical phonology beforehand, which is also the case in the example of 'eight' that I gave earlier: I knew what I needed to formalize. More recently, Pyysalo has been using finite-state transducers in modeling the historical phonology of Indo-European, and similarly was relying on a huge body of scholarship, of previous knowledge. Also, his efforts are not terribly successful, simply because the Indo-European languages in question are quite distantly related. Analogical change also needs to be modeled, and that's not something we're going to get into here; it's just a remark that the use of finite-state transducers in historical linguistics works better with lower-level subgroups.

Then there is a complete methodology that can lead to etymological dictionaries, which is the methodology of back-projecting dictionaries: mechanized historical phonology can back-project whole dictionaries, but only in conditions of near-perfect knowledge. When you have just approached a language group of which you don't have a very profound knowledge, and nobody has, because it hasn't been studied in that much depth, it's very difficult to create the initial hypotheses from raw lexical data; you need something to work from. The basic problem is what I call the exploratory deficit: that is to say, there is no way to go from a state of very little knowledge towards a state of much knowledge. If we decompose this exploratory deficit, we get two basic components, which we try to solve in our project. The first one is bootstrapping, that is to say, we need to create the initial hypotheses from raw lexical data, so that we have something to improve on; and the second is a computerized assistance mechanism for gradual improvement, so that the human linguists are truly supported in their gradual improvement of the hypotheses. So the way you make this practical as a methodology is by having a way to get it started, which is bootstrapping, and a way to make gradual improvements. In our methodology, the algorithms and the transducers are combined in the following way: the algorithms are used to produce a preliminary version of the cognate assignment, and then human linguists gradually correct the preliminary cognate assignment using a transducer-enabled user interface.

This gives us what I call the CAPR workflow, for computer-assisted proto-language reconstruction, because basically, in the ideal case, which we have pretty much approached in our Burmish case, we start from a huge blob of words from a certain language group, we work with that, and we actually end up with a reconstruction of the proto-language. The actual data the human linguist works on in this workflow is the bipartite hypothesis, so called because it's made of two parts. One part is the hypothesis of phonological history: how the sounds of every individual language changed from the proto-language of the group down to the individual daughter languages. The other part of the bipartite hypothesis is the lexical cognacy judgments. These are quite simple: since we're dealing with the monosyllabic kind of languages of Southeast Asia, basically every polysyllabic word is divided into syllables, and every syllable belongs to one of the cognate sets. So, for example, the first syllable of the word for 'brain' belongs to the same set as, among others, the first syllable of the word for 'hair' in the Longchuan Achang language. Given the cognate assignment, and given the hypothesis of historical phonology, it is quite simple to predict the proto-form of a given cognate set. For example, we have these Maru, Bola, and Achang forms, which all mean 'tear', and we can see here:
So, for example, Maru has undergone a sound change which confounded *b in all three possible tones, so if you see the Maru form, it could reflect Proto-Burmish *b in the H tone or *b in the X tone. Similarly, the Bola form can be reconstructed back to two of those proto-forms, and finally, in Longchuan Achang, the form can be reconstructed to all of those different Proto-Burmish forms. In this case it's quite easy to predict that, given all these forms, the most probable proto-form is the one compatible with all of them.

That's what actually happens in the dictionary view. This is a preview of the compiled dictionary. Some languages attest only one of the candidate forms, so it ends up that the computer considers that candidate the most probable reconstruction, while the other candidates are also possibly, but not entirely, ruled out. And different attested forms can be projected back to different proto-forms; however, in the dictionary view, all the proto-forms displayed are the proto-forms judged as probable by the system.

So let's talk about the organization of the entire CAPR workflow. We begin with the stage of pre-processing: the source word lists need to be pre-processed by the linguists. Then, at the bootstrapping stage, the algorithm produces one part of the hypothesis, the cognate-set part, and the human linguists work with that to produce the first version of the hypothesis of phonological history. From this first, crude bipartite set of hypotheses, we have a number of user interfaces which help the human linguists gradually refine the bipartite set of hypotheses to account for the linguistic data. And this is an iterative process: when you make one part better, the other part automatically gets better too.
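The proto-form prediction described above for the 'tear' set, intersecting the backward reconstructions available from each daughter language, can be sketched like this. The language names are real, but the candidate sets are invented stand-ins, not the actual reconstructions.

```python
# Each daughter form, run backward through that language's transducer,
# yields a set of possible proto-forms; the prediction is whatever
# survives in every set.
candidates = {
    "Maru":   {"pi¹", "pi²", "bi¹"},   # tonal merger: several tones possible
    "Bola":   {"pi¹", "bi¹"},
    "Achang": {"bi¹", "bi²"},
}

def predict(candidates: dict) -> set:
    """Proto-forms compatible with every daughter language."""
    sets = iter(candidates.values())
    result = set(next(sets))
    for s in sets:
        result &= s
    return result

print(predict(candidates))  # -> {'bi¹'}
```

When the intersection is empty, either the cognate judgment or the phonological history is wrong, which is exactly the feedback the workflow exploits.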
At least you get more material to work with. And once the linguists judge that the material is good enough, fit for print, then the stage of finalization happens, where the prepared hypothesis is made ready for publication in the form of an etymological dictionary.

So here is how the iterative improvement works. We have the bipartite hypothesis, which consists of the cognate judgments and the phonological history. With the first version of the bipartite hypothesis, the human linguists use the cognate judgment reassignment interface to improve the cognate judgments. So we get human-corrected cognate judgments, and those cognate judgments are quite good, so they can be fed into the correspondence-pattern algorithm to produce the correspondence pattern view, which, used in the phonological-history debugging interface, allows the human linguist to make the phonological history better. So better cognate judgments result in better correspondence patterns, and better correspondence patterns give the human linguist a better view of the phonological history. The human linguists can then encode their better understanding of the phonological history into transducers, and the transducers are used in refishing. With refishing, the computer produces cognate judgments that are in some cases better and in some cases worse, but always easy to change, and which can be corrected by the human linguist again. And maybe the point is that refishing always includes more data than the last iteration, and the better your transducers, the better your back-predictions from the dictionary; so basically we're gradually transitioning from mostly relying on the algorithmic approach to mostly relying on the transducer approach. Conceptually we have, let's say, three different parts: the transducers and the stored cognate judgments, that's just two parts, and the third part is the actual source word forms and glosses, which are stored in the application.

Okay, so it looks like you've integrated everything together. So maybe just talk me through here: what should we do first? Should we paste in a transducer? Exactly, because in general these transducers are debugged in the other app, so they get changed quickly, and every time we modify the cognate judgments, we put the newest transducer in here. So I'm just going to paste the transducer that you sent me in here. Yeah. It's on the left side: click on 'load transducer', and see that the loading was successful. Okay, so this shows the boards. Just let me see if I understand: the computer has already fished out, from the saved attested forms, various words that reconstruct to the same or similar things. So here we have *bar. Exactly. And it looks like we have three piles of cognates. Yes, because those are the cognates that you have already corrected once. I see, yeah, so they're quite good, and there is just one very tiny thing that needs to be changed. Well, it looks to me like, first, these two should be combined, right? Yes, because the Maru one says 'empty', but there is this form here, which etymologically means 'not full' or 'not having', or something like that. So shall I just drag this in here? Yeah, you can drag it entirely inside. Okay. And then the reason there is no reconstruction here is because some forms reconstruct to *bar and some reconstruct to *bak, is that right? And then do I need to save? How do I do that?
Yeah, just 'save boards', the third button, 'save boards'. So now they're saved to the cloud. Okay, so let's just do another one, just to get the sense of it. Okay, so there is another one I notice: we have all this 'blossom', 'blossom of flowers', and then we have here 'classifier for flowers'. Do you think those are the same? Yes, that's a specificity of Burmese: you use the same noun to classify. So maybe I should combine those two, what do you think? I think so, because basically there are no words here that don't belong. Although I'm just noticing: do we really think that this form is related to this one? Yeah, I think the nasal-final ones are different, so in fact we need to create two different sets. And you can always create a new column by clicking on the button. So then, for instance, I'll move the ones with the nasal. Yeah, this one and this one. Here. Yeah. But then, oh, and this one too. Yeah, and it's very nice that these are Burmese, Longchuan Achang, and their close relatives, which are expected to be closer to each other. And then these ones I'll just give a cursory check. Okay, you mean the whole thing, like this? Yeah, just make sure we don't have any nasals. Okay, so then I can say 'save boards' again. And then, you know, one activity that I would then go through is just, one by one, to click through these boards. Yeah. Let's just zoom along to a further one, maybe this block or something. Well, that's quite a handful. Yeah, this one is maybe too complicated to go through now. Yeah, 'riddle' should be the same thing as the 'false, deceiving, cheat' one. Let's see, where's 'riddle'? Oh, there's 'riddle'. Yeah. And the third from the left is the 'false, deceiving, cheat' one. Yeah. Okay, so I'll move the 'false, deceive, cheat' one into the 'riddle' one, huh?
Yeah, and then 'save boards'. Now, just tell us what this fish means. This fish means that these words are the ones that weren't considered reliable enough for this kind of inspection the last time, but this time, with better transducers, they have been fished out, and now we can see if they really are reliable cognates. So, putting it another way: if something doesn't have a fish, it means it's been in our system for a while, but if it has a fish, it means that since the last time we uploaded a transducer, it has, because of the new transducer, been newly fished out of the overall lexical database. So before we begin, we should paste the current version of the transducers into both the 'old' and 'new' tabs. Now, should I paste the transducer you sent me in the email into the 'new' tab, for instance? No, you should paste it into both 'old' and 'new', because 'old' will be the relatively old one, used to make further improvements on the transducers. Okay, so I'm just going to do that: there it is in 'old', and here it is in 'new', and now we can check the diff, which is supposed to show nothing. Nothing, yeah, okay. And then, maybe just to prove the point, I will write 'blah blah' right here at the beginning, and then if I say 'diff', it says 'blah blah' there. Yeah. Now I will delete that, because we don't actually want it to say 'blah blah' there. Then we just need to click on 'load cognate assignment' to fetch the cognate assignment that you have just saved. So now I'll load the cognate assignment, and this is loading from the assignments we just saved, is that right? So here you can compare: the suggested number of languages is two or three, but of course you can do more, and the language under study, the one that you plan to do something more about, will be put at the end.
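As an aside, the old-versus-new diff check in the demo can be mimicked with Python's standard difflib; the one-rule transducer text here is invented for illustration.

```python
import difflib

# Two copies of the "same" transducer text, as pasted into the old
# and new tabs (a made-up one-rule example).
old = "define Devoicing b -> p ;\n"
new = "define Devoicing b -> p ;\n"

def show_diff(a: str, b: str) -> list:
    """Line-level unified diff; identical inputs yield an empty list."""
    return list(difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm=""))

print(show_diff(old, new))   # identical: prints []

new = "blah blah\n" + new    # the demo's deliberate edit
print(show_diff(old, new))   # now the diff reports the added line
```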
So, yeah, we'll compare Maru, Old Burmese, and Bola, and let's see if we can do something with it. Okay, let's do so, and then it takes a little while; it says 'compiling correspondence patterns'. Okay, and now it's done, so we go to the report. Yeah, okay. So, a bunch of not very interesting things that we have already solved: for example, that Old Burmese l corresponds to Bola l. Well, yeah, Bola l corresponding to Old Burmese l in the word for 'warm' doesn't surprise us very much. But what are these 'raw' and 'default' columns? What does that mean? So, 'default' means the number of these things that are currently in the boards. Because this fishing mechanism has a structural flaw, which is that there may be things that are super common but are never boarded, and so they are not recognized, and they will never be recognized. I see. So one of the reasons it takes such a long time is that it's actually compiled on the raw data as well, so that we are not going to lose anything. So in this case, an f corresponds with an f: there are four examples in the raw data, but only one has been boarded, to use your term. Yeah, and the system considers that a potential problem; there is some heuristic there that says there might be a problem, so there is an exclamation mark here. Yeah, and the sense that there's a problem comes from the large difference: if too many in the raw data are unaccounted for, proportionally. I see. Whereas here there are many more examples, but we have 30 raw and 25 in the boards, so we've gotten most of them, and it doesn't bother us.
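The raw-versus-boarded heuristic just discussed can be sketched as follows; the 0.3 threshold is an illustrative guess, not the system's actual value.

```python
# Flag a correspondence that is common in the raw data but rarely
# boarded: it may be real but systematically missed by the fishing.
def flag(raw_count: int, boarded_count: int, min_ratio: float = 0.3) -> str:
    if raw_count and boarded_count / raw_count < min_ratio:
        return "!"   # suspiciously under-boarded correspondence
    return ""

print(flag(4, 1))    # 1 of 4 boarded -> "!"
print(flag(30, 25))  # mostly boarded -> no flag
```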
Yeah. Okay, so here there are five in the raw data and five that have been boarded, so pretty good. And there is a problem here, which is that the 'cross the river' word isn't properly accounted for in Bola. Well, you're jumping a little fast. Let's just walk through it: we have the gloss, the shared semantics across all three languages, which actually comes from our original data source, a comparative word list. And then these are the reconstructions; this one has a question mark because not all languages point to it. And then Maru. And the first line is from the old transducer and the second line from the new transducer, which are the same, because we haven't yet changed the transducer. And, as you said, there's a problem insofar as, let me see if I've got this right: if the reconstruction is correct, it should have led to this form in Bola rather than this form. Exactly. So can we try and fix this now? Yeah, let's search for the rhyme, let's search for the vowel. There it is. The problem for Bola is that the predicted form has o but the actual form has u. Yeah, and we'll see here that this problem is actually very recurrent; it's a real problem. Yeah, and a problem that apparently can be solved by just adding a new sound law changing o into u. Yeah, so let's try and do that. We do that first by finding Bola; here's the transducer for the current Proto-Burmish-to-Bola sound laws. Let's find some of the vowel sound laws. Okay. So we just need to order this new law after the u-to-au law, so that there is no feeding or bleeding. Yeah, so shall I put it after u-to-au? Yeah, because otherwise every o would end up as au.
Oh no, no, yeah, I understand, but maybe I was tempted to put it sort of right near the end. I have no idea, but yeah, we'll give it a try and see whether it messes anything up. Huh, that's the idea. Just for fun, we can have it in the wrong sequence first. Yeah, okay. So I'm going to just call it 'o goes to u'. Yeah. And we can remove the conditioning, because I think it's probably unconditioned. Yeah, I mean, I think this vowel only occurs in open syllables anyhow. Oops. Okay. So I've just defined the sound change, and then I go down here and put it into the right place, which is here. Whoops. Okay, and it would be a very good idea to check the diff, so that we see that nothing superfluous came in. Everything looks about right. Yeah. Okay, we can leave it, but let's go back to Bola. Yeah. Okay. Well, then, like you suggested, what I'm going to do first is actually put it in the wrong place in terms of the sequence of changes, and then we will see how that is reported to us. Does that sound good? Sounds good. So I go back to 'action' and say 'get correspondences'. Okay, and then we go to the report and look at 'cross' again; we'll just look for 'cross'. Well, so all of these are still wrong, which means we haven't fixed the problem, but we do see that it's wrong differently. It's wrong differently, yeah: whereas the old transducer was expecting o, now it's expecting au. And then if we just scroll around, we see whether anything else has changed. No, nothing, I don't think anything will go wrong.
You don't think anything else will go wrong, because it just isn't a change that affects very many words. Okay, but now let's put it in the right order. Okay, so now we put it in the right order, go back to 'get correspondences' and the report, and now it has fixed the problem, and this smiley face has come up, which means, kind of, congratulations, you fixed the problem. I'll just scroll down to the other place it comes up, in the rhyme examples. Oh, it fixed some of them, but not all of them. It fixed some of them, but not all of them, yeah. Hmm. This problem is quite funny; I don't think it can be fixed just like this. For the 'egg' word, the thing is that the initial glottal stop shouldn't be introducing tenseness in either Maru or Bola, but in fact it introduces it in both. So yeah, that's a deeper problem that needs to be looked into in a deeper way. Well, but I mean, this shows how well things actually work: we found a problem, we fixed the problem, and it has led to the discovery of a new sound change in the history of Bola. Also, we have had to put it in the right chronological order, so we're developing an increasingly sophisticated model of Bola historical phonology, both in terms of what sound changes happened and in terms of what order they happened in. And then, when we implement the change, we see that it has fixed some of the cases we wanted to fix, but it hasn't fixed all of them, and the reason it hasn't fixed all of them is that there are yet more subtle problems. And even if we look at them very specifically, how can I say it: we have fixed the vowel, right?
Like, in the old transducer here it predicted the wrong vowel; now it predicts the right vowel. The remaining problem has something to do with the initial, which we weren't thinking about. And similarly here: I think that 'egg' correctly reconstructs to have a glottal stop initial, and somehow there is an issue with the interaction between the glottal stop initial and the creaky voicing that hasn't been modeled, and it sounds like you think it maybe shouldn't be modeled in Bola and Maru, but we can look at that on another occasion. The point is that the vowel correspondence, which is what we were targeting with this intervention, has been fixed. So even though we didn't get smiley faces here, because the reconstruction still doesn't predict the attested form, we have still gotten closer, right? It actually isolates one part of the problem and highlights the remaining problems. Yeah, exactly, and that's the point of progress in science, right? We've articulated a hypothesis, we've gotten feedback that the hypothesis is indeed correct, and by articulating the hypothesis we can move on to yet more thorny and subtle problems. And then I just think it's worth saying that had we made a proposal that had broken all sorts of things, there would have been some frowny faces. But I'm a little bit reluctant to go breaking things intentionally in order to... Although, actually, here's what we can do: we can say, okay, that's the new one. Exactly, go and paste it into 'old'. And then you'll need to paste the original thing into 'new'. Well, I'll just go to Bola and delete the new sound change, which is this one. Actually, I'll just comment it out; I think that's a nicer way of doing it. Okay, and then we go to 'action'.
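The rule-ordering point from this exchange, that a newly added vowel law must be ordered after an existing law it would otherwise feed, can be sketched abstractly. The vowels and rules are generic placeholders, not the actual Bola sound laws.

```python
# Apply a list of (old, new) rewrites in the given order.
def apply_rules(form: str, rules: list) -> str:
    for old, new in rules:
        form = form.replace(old, new)
    return form

existing = ("u", "au")   # an older law already in the transducer
new_law  = ("o", "u")    # the law we just added

# Wrong order: the existing law feeds on the new law's output.
wrong = apply_rules("po", [new_law, existing])   # o > u, then u > au
# Right order: the new law applies last and its output survives.
right = apply_rules("po", [existing, new_law])

print(wrong, right)  # pau pu
```

This is exactly why the demo first places the law in the wrong position on purpose: the report then shows the prediction being wrong in a new way, which confirms the ordering matters.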
So, just to be clear to all you kids at home, what we're doing here is intentionally breaking it, in order to show you what the report looks like when you break something. So we go to the report, and go to the cross, and here we have this frowny face, because the old transducer correctly predicted the attested form but the new transducer does not, and that's why we've gotten that frowny face.

So this is basically very similar to the board view, right? But what we have here is printed the way you would have it in a finished dictionary. And actually, let's just look at the very beginning, because we saw with the boards that this word exists and this other word exists and they are not combined; this printout came from before I just did that, but they're still presented next to each other.

What I think is particularly nice about this system, and I'll even zoom in some more, is that if we look at this entry, we have the different meanings that occur, and this check mark means that this attested form is predicted by our current transducers, whereas this X mark means that this attested form is not predicted by our transducers. And when we get the X, it also tells us here what reconstruction would have predicted that attested form. So this is a huge amount of information about what's regular, what's irregular, and what areas can be looked at for improvement, all of it generated more or less automatically, right?
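The check/X marks, and the "which reconstruction would have worked" hint shown on failure, can be sketched in the same toy style: derive each attested form from its proposed reconstruction, and when derivation fails, search a candidate space of reconstructions for ones that do derive it. All rules, forms, and candidates below are invented for illustration:

```python
import re

def derive(proto, changes):
    """Run a proto-form through an ordered list of regex sound changes."""
    for pattern, replacement in changes:
        proto = re.sub(pattern, replacement, proto)
    return proto

def entry_report(proto, attested, changes, candidate_protos):
    """Return '✓' if the reconstruction predicts the attested form;
    otherwise '✗' plus any candidate reconstructions that would predict it."""
    if derive(proto, changes) == attested:
        return "✓", None
    fixes = [p for p in candidate_protos if derive(p, changes) == attested]
    return "✗", fixes

changes = [(r"e", "i"), (r"t(?=i)", "s")]  # raising, then assibilation
mark, suggested = entry_report("ta", "si", changes, ["te", "ti", "sa"])
ok_mark, _ = entry_report("te", "si", changes, [])
```

Here the mismarked entry gets an ✗ together with the reconstructions ("te" and "ti") that would regularly yield the attested form, which is precisely the hint the printed dictionary view supplies.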
So I think that's a very powerful component of this system, one that shows its potential both for making etymological dictionaries faster and for making them more rigorous and explicit. So we have seen the two interfaces, one for cognate reassignment and one for debugging the transducers that encode the phonological history. Some things work and some don't, but in general I think it's fairly easy to see that we have something that really assists the linguist in the creation and elaboration of historical linguistic hypotheses. So I hope that our approach could lead to a more human-centered, computer-assisted historical linguistics, where linguists, by being pushed by the computer to be a little more rigorous and a little more explicit, can reap real benefits: having the dirty details taken care of by the computer, and seeing exactly which things need to change and which don't.

Now, we developed this methodology for the Burmish languages, which are languages with a relatively reduced syllable canon and very little morphology apart from syllable composition, with almost no morphophonology. There are also no paradigms; everything is agglutinative, so there are not that many analogical effects either. So if we are going to port this approach outside of, let's say, China and mainland Southeast Asia, we will need to build better transducer engines that can be tailored to deal with the alignment problem, which is quite tricky in non-monosyllabic languages, and we will also need to find ways of encoding and working with paradigmatic morphology and analogical historical change. So I hope that this is a good start, and that with further improvements down the road
it could lead to a future world of verified reconstructions, where historical linguists can really show that what they propose at least has internal consistency: the reconstructed forms, put through the reconstructed changes, do lead to the attested forms. Thank you very much.