Our brains can't hold all of the rules and all of the vocabulary and also think, "oh, if I swap these two rules, what happens?", so we want computers to assist us with that. There are very nice articles by Patrick Sims-Williams that document the attempts that have been made, starting from people just doing Ctrl+F, Ctrl+R to search and replace, and that type of thing. The basic idea is that a phonological change is the same as a rewrite rule on a string. So what if we took all the material we have, wrote it in IPA, and then had a list of rewrite rules?

So in our work today we look at this thing called finite-state transducers, which is, in my opinion, a very complex name for a very simple thing. With a finite-state transducer, more or less, you start from a point, here these circles that say zero, and as you go through your string of IPA symbols you decide either to keep each symbol as such and emit it, or to replace it by something else. So if I had a rule that said h disappears in all contexts, then I get this. I don't have a pointer, no? Okay, so just follow the tip of my arm. The circle has the @ symbol (oh, that is very kind), and let's come back to that, but it also says: every time you see an h, output zero, so output nothing, and then continue. Go through the string like this, and every time you see an h, just remove it. That's simple. The library we use is foma, but it is more or less the same across FST libraries.

A very similar example: if I say that some other sound is always changed to n, then I have a circle that says: go through my string, and every time you see that sound, replace it by n; when you see an n, just go with an n; and when you see anything else, just output it as it is. So far nothing really impressive, I hope.

Now let's look at a rule like: h disappears any time there is something before it, so
any time it is not in initial position. Then we would have this: we say, start your word, consume the first letter and output it. Nothing else can happen here; we always take the first letter (IPA symbol, sorry). And once we've consumed at least one item, we are back to the rule we had on the previous slide, which says: if you see an h, delete it; anything else, consume it. Does it make rough sense so far?

Okay, and then we get into rules that are a bit more complex. This is the standard linguistic notation, and this is the foma-specific notation, and I hope you see that they are fairly close, so you can translate between them fairly easily; I've written a bit of software to do that. So now we have a rule that says: when you have an e, delete it if it's between a w and a k. We have the beginning of the word, and we say: if it's anything except a w, just consume it, go with it and do nothing about it. If it's a w, go to state one. So we're happy with the w, and several things can happen. Either we get an e, in which case we delete it and go to the third state, which is the point of our rule, on the condition that we get a k after. Or it's anything else: for instance, if we get the e but no k after it, then we just output the e and whatever sound comes after. So this very simple rule translates into this fairly complex diagram. We can come back to it afterwards and you can tell me, "hey, I don't understand why it's doing this". And if you had another w, you would just loop on that state, and if you had anything that was not a w, you would go back to zero until you finally found another w.

So we see that we already get fairly complex output for rules that are trivial for a linguist, but that is the case. There is
support for feature classes, so if you want to say that n disappears after something that is [+syllabic, +long] and before an s, you can write it pretty much the same way as it is written, as long as you define what it means to be +syllabic and +long. (Oh sorry, I wrote "syll +", but it's "long +" here.) So you can define feature classes, and you can do nice things like this.

Now that you've defined rules, there's this concept of going down and up, which I guess means going down in time and up in time. I took an example similar to the w and k rule we had earlier, but a bit easier for me to pronounce. Let's imagine we have a rule that says a goes to e when it's between p and t. If you pass a word through this diagram, it goes from apato to apeto. Nothing really crazy. And a nice thing here is that you can also go up. So if you said, "I have some French words and I would like to reconstruct the possible Latin ancestors", you'd say: I've seen apeto in French (I don't know that it's a word, but why not); that could have been apato or apeto. But if you said, "hey, I have the word apato", it would say: that's not a possible string in French, based on the rule you've just given me. So as it reconstructs upwards, at some point it will start cutting branches. Here, for instance, we start creating two branches: maybe it's apato, maybe it's apeto. When you apply all of the rules backwards you will have a really large number of candidates, and it will start cutting branches when it finds rules that can't have applied.

So this kind of system can be used for phonology, as we've done; it's usually used for morphology. And you can compose rules. So these are the rules: rule one is n goes to ŋ before k, and we can analyze this
if you want later, and rule two is k goes to zero between the nasal and the t (well, I guess it would be a coronal in general, but let's just go with t). So these are the two rules, and if you compose them you get this massive diagram, which I promise behaves the same way as saying "first do this, and then do that". So it makes these very large graphs. Here I have two rules. I'll talk about it later, but for Latin to French I have a system of 600 rules, and once I had something like 15 rules the entire screen was covered; I can't even compute the entire thing. So two rules and it's already a nightmare, but only a nightmare to look at: the computer does the thing and doesn't really care. And so, for instance, we take a word like pinktoram, and it would first go to piŋktoram, and then to piŋtoram, without the k. These are rules that go from Latin to French; I didn't do the full derivation up to French.

So with this, what do we have so far? We have this interface called CAPR, developed by a previous person on the project, Xun Gong. He worked on Proto-Burmish, and in it he defines all of these rules; you don't have to call them "rule one", you can call them whatever you want. Then he has the modern pronunciation in many Burmish dialects, and he tries to reconstruct not just along one line: taking all of these dialects, let's try to reconstruct ancestors. Sometimes you get a single nice reconstruction, and sometimes you get all of these possible forms according to your rules. With this tool he published an etymological dictionary of Proto-Burmish, or of the Burmish family. So we have this tool.

What else do we have? Since coming here, I made a proof of concept. Some reviewers said that this works for Burmish,
but it can't work for languages that have more complex syllable structures. And so we did Latin to French, borrowing from Marr and Mortensen, the first half of which is in this room and is going to present tomorrow. They digitized all the rules of the entire classic account from Latin to modern French, and so we now have a data set where we can build these FSTs that we've shown and go from Latin to French. And the nice thing is that, in addition to the forward reconstruction that you get in Marr and Mortensen's paper, we can also do backward reconstruction. So we could use the same thing for Latin to Spanish, Latin to Portuguese, etc., if we had it; or, when you have attestations, if you find a manuscript in Middle French or Old French, you could start plugging in these data so that you shave off some of the possibilities in the reconstruction.

Well, Latin to French, of course, we already know quite a lot about, and I think now it's time to go back to Chinese; or rather, it's the end of what I have to say, really. Now that we've shown that the system more or less works for any sort of language, not just Burmish, we want to resubmit that paper. Further than that, we want to start formalizing the steps from Old Chinese to Middle Chinese as FSTs, so that we can publish them and other researchers can come and contribute, or argue about the ordering, so that we can have a collective conversation. We are starting from the base list we get in Baxter and the revisions made in 2014; there are some books specific to the Han that we've collected data from; and there is material that we've started collecting and want to integrate into that system. It's not super easy to integrate, because how do you integrate rhyming data into a tree like this? So there are rhyming bronzes; there are transcriptions: translations of Buddhist texts into Chinese contain a lot of
Indic words for which we know the phonology, and so that tells us something about the Chinese phonology; and there are recent excavations, etc. So that's where we're at: it's very much work in progress at this point in time.
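
For readers who want to try the ideas from the talk (sound changes as rewrite rules, applying them "down", composing them, and going back "up" with pruning), here is a minimal sketch in plain Python. It is not foma and not the speaker's software: it uses ordinary regular expressions as a stand-in for a finite-state transducer, and the rule tuple layout and the names `apply_down`, `apply_up` and `derive` are assumptions made for this illustration. The example words and rules are the ones mentioned in the talk.

```python
import re
from itertools import product

# A rule is (target, replacement, left_context, right_context); "" means
# "nothing" (deletion, or no context). This layout is an assumption of the sketch.
H_DELETION = ("h", "", "", "")      # h -> 0 in all contexts
N_TO_ENG = ("n", "ŋ", "", "k")      # n -> ŋ / _ k
K_DELETION = ("k", "", "ŋ", "t")    # k -> 0 / ŋ _ t
A_TO_E = ("a", "e", "p", "t")       # a -> e / p _ t


def apply_down(word, rule):
    """Apply one rule forward in time, at every matching position."""
    target, repl, left, right = rule
    pattern = re.escape(target)
    if left:
        pattern = f"(?<={re.escape(left)})" + pattern   # left context (lookbehind)
    if right:
        pattern += f"(?={re.escape(right)})"            # right context (lookahead)
    return re.sub(pattern, lambda _match: repl, word)


def derive(word, rules):
    """Apply rules in order ("first do this, then do that"), keeping each step."""
    steps = [word]
    for rule in rules:
        steps.append(apply_down(steps[-1], rule))
    return steps


def apply_up(word, rule):
    """Return every ancestor that this rule would turn into `word`.

    Only one-symbol substitutions are inverted here (no deletions).
    An empty result means the branch is cut: `word` cannot be an output.
    """
    target, repl, _left, _right = rule
    if len(target) != 1 or len(repl) != 1:
        raise ValueError("this sketch only inverts one-symbol substitutions")
    # Each symbol equal to the replacement may have been the target originally.
    options = [(ch, target) if ch == repl else (ch,) for ch in word]
    candidates = {"".join(choice) for choice in product(*options)}
    # Keep only candidates that really lead back to `word`.
    return sorted(c for c in candidates if apply_down(c, rule) == word)


print(apply_down("haha", H_DELETION))               # aa
print(derive("pinktoram", [N_TO_ENG, K_DELETION]))  # ['pinktoram', 'piŋktoram', 'piŋtoram']
print(apply_up("apeto", A_TO_E))                    # ['apato', 'apeto']
print(apply_up("apato", A_TO_E))                    # []
```

The backward step enumerates candidates and then checks them by running the rule forward, which is far less efficient than inverting a transducer, but it shows the same behaviour described above: apeto has two possible ancestors, while apato is rejected because the rule could never have produced it.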