There have been a lot of developments in artificial intelligence that we could be using to automate some of the tasks that are most difficult in language documentation. For me, the real bottleneck is transcription, because everything depends on it and, as you all know, it takes forever. So I'm very interested in that. I'm very interested in parsing as a way to annotate corpora. And I'm very interested in making tools that can serve the purposes of language revitalization and reclamation: things that for computer scientists might seem trite and boring, like predictive keyboards, but that have incredible consequences for communities. So I'm going to tell you a little first about NLP for language documentation, a little about the Bribri people and their language in Costa Rica, and then about the algorithms we can use.

Transcription is so time consuming, and it is incredibly cognitively demanding; it needs high levels of expertise. You can't just put your headphones on and listen to music while you do it; you have to be completely focused. And yet the frequent words are going to be the frequent words again and again and again, so surely a computer should be able to help with this. We need the transcription for linguistics, but of course we also need it to create children's books, for example, to create materials that schools can use, to train more teachers and more leaders in the language. Then there's the translation of our corpora, which we also need whether we're doing research or producing educational materials. We also want to annotate corpora, for example for phonetics studies: we want time alignment so that we can study the phonetics of these languages, and we want corpora annotated semantically and syntactically. Ultimately we want to turn all of this into things that are useful for the community, for teachers, and for the future speakers.

Transcription, of course, is the main character of this story. People have estimated that you need maybe 50 hours of work to transcribe one hour of audio. It is time consuming, and the amount of expert knowledge needed to produce a reliable version, with a stable orthography, is enormous. And it stops everything else. The technology is not perfect; all of you have probably watched YouTube videos with auto-generated subtitles, and they're not perfect, but they do exist for many languages, so it's a technology we could use for the languages we work with as well. Transcription is one part, but we also want corpora that are properly tagged so we can study phonetics, syntax and semantics, and we want to translate them. Doing time alignment by hand destroys your soul: it takes hours to work out where each sound is in a single recording. Translating is also a very cognitively heavy task that only very few people can do. If you think about who can annotate our corpora, in my experience in Latin America it is usually school teachers. They are the only people who know enough of the languages to write them in a stable manner and who also care about this work. And they are the busiest people in Latin America: money cannot make them work with you, because they're already so busy preparing classes and caring for the children, doing impossible things to educate children in the most dire circumstances.
So this is probably not a problem you can solve by just throwing money at it. Again, we also want these corpora to make things like books and tools so that children can learn the languages, and tools that make the languages relevant in the daily lives of speakers: tools they can use to message each other, to make jokes, to tell people that they love them, so that they can feel the presence of the languages in their daily life, which is of course one of the things that is lost with language displacement.

I'm from Costa Rica and I mostly work on Latin America; I also work with Mexico, Bolivia and the southern U.S., but mostly I'm just from Costa Rica. I worked in New Zealand for about three years, and that's how I started working with Cook Islands Māori, which is spoken in central Polynesia. The Bribri language is spoken in Costa Rica, in these regions here; there's a mountain range in between, which keeps these groups quite separate. By the way, to situate you, Costa Rica is here in Central America. The language belongs to a family called the Chibchan languages, which are spoken from Costa Rica into Panama and parts of Colombia. It is a very rural community. There might be as many as 7,000 speakers and as few as 3,000; a new census is coming out soon, so hopefully we'll have better information. The language is vulnerable, in that some children still speak it, but not all children do.

Since I have linguists here, some interesting features of Bribri. It is an ergative language and it is SOV, so you have the subject marked with the ergative, then the object, then the verb. It has inflectional morphology, so verbs have distinct conjugations, for example for the past perfect. And it has things that would be difficult for a machine to translate. For example, it has a very complex demonstrative system: dù e would be 'that bird', dù ai is 'that bird up there, up high and nearby', dù dia is 'that bird down there, very far away', and dù se is 'a bird that you can hear but cannot see'. So you can see how a machine translation system would produce very clunky translations going in one direction, and if all you have in English is 'that bird', it will be very underdetermined which form to generate when you go in the other direction. It also has numeral classifiers, like Japanese or Mandarin Chinese: the word for 'two' when you count two birds is different from the word for 'two' when you count two women.

The data we have comes from two sources. There is an oral corpus, and we're in the process of continuing its transcription; right now there are 68 minutes of transcribed audio. These are mostly personal stories and songs, plus instructions, for example on how to prepare cacao, which we'll look at in a couple of minutes. We also have publications: university readers, dictionaries, books for learning the language, so we have examples from those as well. Altogether we have managed to gather about 90,000 words of monolingual text. And again, if you think of the NLP applications we're used to, those are usually trained on over a million words, a very different scale, whether it's word embeddings for semantics or contemporary transformer systems. A system can learn a lot less about semantics from this little data.

Let's cross the ocean. I worked for a couple of years here, in Wellington. Somewhere in the middle of the Pacific are the Cook Islands.
This is a separate country, but in free association with New Zealand, so Cook Islanders hold New Zealand passports. The main island is Rarotonga, which has a large airport and receives many tourists, and there are smaller islands with very small airports and almost no tourists. Ma'uke, for example, is the island that Sally Akevai Nicholas is from. The language has about 13,000 speakers in the Cook Islands and at least 8,000 more in New Zealand. It is highly endangered on Rarotonga, because there is so much contact with English through tourism and because the capital is there, so that's where most of the contact with New Zealand is. It is vulnerable on many of the other, smaller islands: on Ma'uke, for example, children at school still speak the language with each other, while almost no children on Rarotonga speak Cook Islands Māori.

Something that helps us with the speech recognition I mentioned: it has relatively few phonemes, five vowels (plus their long counterparts) and only nine consonants, whereas Bribri has more vowels, including nasal vowels, plus tones and many more consonants. So Cook Islands Māori has a relatively small phonemic inventory. It also has isolating morphology, so it doesn't pack all the inflections into the word: the perfect tense and the plural number, for example, are expressed as separate words, which helps because the system doesn't have to learn so many distinct word forms.

This whole thing started because of my colleague Sally Akevai Nicholas. She is from Ma'uke, and she did her PhD writing a grammar of Cook Islands Māori. Within the PhD she recorded dozens of hours of elders, stories, genealogies, but she faced a challenge. When we met, she said that she had so much data that she was going to die before it was all transcribed. So, is there something that can be done? Everything you're going to see here is an attempt to answer that question: to get these recordings transcribed in a way that is quicker and useful for the documentation of the language. The corpus is linguistically rich, with good conversations and stories, but it is very sparsely annotated, which is what we're trying to fix, and transcription supports that main mission. Over the last three to four years we've managed to transcribe about four hours, and we've used these systems to accelerate that work.

One of the things we tried was something you could call untrained forced alignment: taking an acoustic model for English and bootstrapping from that. So, tō rātou would be 'theirs'. Say you want an alignment like this, where the t is matched to its stretch of sound, the ō to its stretch, and so on. We used an English model and simply told it that there were some 'English' words that happened to sound like tō rātou, and asked it to find them. It was a crude fix, but it actually worked: the error was only about 8% when locating the centres of the words, and about 25% for the centres of the vowels. It was still time consuming to correct afterwards, but not that difficult, and it helped us get the research started.
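To give a concrete flavour of that trick, here is a minimal Python sketch, assuming an English forced aligner that reads a CMUdict-style pronunciation dictionary (the Montreal Forced Aligner works this way); the grapheme-to-phone mapping and the word list below are made up for illustration and are not the project's actual table.

```python
# Sketch of the "untrained" cross-language alignment trick: pretend the target-
# language words are English words by spelling out their pronunciations with
# English (ARPAbet) phones, then hand that dictionary to an English aligner.
# The phone mapping is an illustrative assumption, not the real conversion table.

CIM_TO_ARPABET = {
    "a": "AA", "e": "EH", "i": "IY", "o": "OW", "u": "UW",
    "ā": "AA", "ē": "EH", "ī": "IY", "ō": "OW", "ū": "UW",   # long vowels folded in
    "p": "P", "t": "T", "k": "K", "m": "M", "n": "N",
    "ng": "NG", "r": "R", "v": "V", "'": "",                  # glottal stop: no ARPAbet phone
}

def word_to_arpabet(word: str) -> str:
    """Greedily map a word to a fake 'English' pronunciation."""
    phones, i = [], 0
    w = word.lower()
    while i < len(w):
        # try the two-character symbol 'ng' first, then single characters
        if w[i:i + 2] in CIM_TO_ARPABET:
            seg, i = w[i:i + 2], i + 2
        else:
            seg, i = w[i], i + 1
        mapped = CIM_TO_ARPABET.get(seg, "")
        if mapped:
            phones.append(mapped)
    return " ".join(phones)

def build_dictionary(words, path="cim_as_english.dict"):
    """Write a CMUdict-style lexicon that an English aligner can swallow."""
    with open(path, "w", encoding="utf-8") as out:
        for w in sorted(set(words)):
            out.write(f"{w}\t{word_to_arpabet(w)}\n")

# Hypothetical usage with a tiny word list pulled from a transcript:
build_dictionary(["tō", "rātou", "'are", "tangata"])
```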
We got about 4,000 vowel tokens out of this, and with them we made vowel charts. You can notice, for example, that these vowels sit in the more canonical position, while those vowels are more centralised. These two islands have large airports that receive a lot of New Zealand tourists, and there the vowels are drifting towards more centralised, New Zealand English-like realisations. These islands have really small airports and get almost no New Zealanders, and there the vowels are in the more canonical Polynesian position. To be very honest, we don't have a full explanation for this yet, but we do have a hypothesis that contact with English is changing the phonemic inventory, and we got this from several thousand vowels obtained by jamming the words through an English model. So even weird and creative solutions can really help you get started.

We presented this to the school teachers we were training, and the reaction was very positive: they said they were proud and excited to have such a complex and sophisticated language, because we were of course telling them about these patterns, the vowels, the contact effects, the glottal stops. The glottal stops are becoming laryngealisation, like creaky voice, in some circumstances, and we managed to get a thousand examples of those too. This goes against the narrative that these languages are somehow lesser or simpler than English. So not only is it a very desirable result in itself, it also made people want to keep working with us to gather more information.

Let me tell you a little about the things that have changed in the last couple of years. CS people are mostly not used to working with such scarce resources: they get, you know, a huge zip file of English and they can do whatever they want with it. Obviously we cannot do that for all languages. The data is not just sparse, it's difficult to generate. It's expensive precisely because so few people have the expertise, so few school teachers can work with us, so it takes much longer and costs much more to find the experts who can keep this work going. There's also a lot of orthographic divergence: different scholars choose different systems, and members of the community might say 'I'm just going to write it the way I hear it', which is beautiful, and please do so; the important thing is to keep using the languages. It should be up to us, the computer scientists, to figure out how to teach the computer to deal with this. But it is a statistically difficult problem, handling so much variation in such a sparse data set. These languages are also found in complex sociolinguistic environments; there's going to be code-switching, for example, with speakers who also speak English or Spanish. And by a perverse coincidence, languages like Mandarin and English, the ones the usual tools are made for, are not morphologically rich. Languages with rich morphology are going to have corpora that are even more sparse: in a million words of English you will find 'run' and 'running' far more often than you will find any single conjugation in a language with many verbal conjugations.

So I'm going to give you examples from four tasks in particular: speech recognition, machine translation, parsing, and predictive text. Transcription is very, very difficult, but as I told you, there has been a lot of progress in this area in the last five years, mostly in helping the computer understand context, helping it remember things it has seen before. In previous systems, say you had a sound recording; then every few milliseconds you would try to figure out: what was the sound here? And here? And here?
So the system would try to produce a guess for each little window, and then try to see whether that sequence of guesses added up to an orthographic English word. It didn't have a way to read beyond its local window; it had very little information about context. Neural networks started being invented in the 80s, but they weren't really deployed at scale until the 2000s, mostly because of hardware constraints. In the 2000s people started using types of neural networks that have some memory for what they saw before: RNNs and LSTMs, long short-term memory networks. If you're reading this window here, for example, you can transmit information about what you saw later back into the prediction of what came before, and you can transmit the current prediction forward into the future. You can see how this helps linguistically, with coarticulation, cases where one phone influences another; this is highly useful. A particular model of this kind called Deep Speech, from about ten years ago, was very good at this.

Around 2017 an architecture called the transformer was invented. Transformers take an input string, turn it into an intermediate representation, and then transform that intermediate representation into another output string, and these can be anything. It can be a question, encoded and then decoded into an answer. It can be English, encoded and then decoded into German. Or, in our case, it can be bits of sound that you encode and then decode into a potential orthographic representation. This is very useful; however, it consumes frightful amounts of data to train properly.

The really big leap is that we now have enough data from large languages (English, Mandarin, Hungarian, Swedish, Spanish) that those can aid our transcription. The Cook Islands Māori vowel a, for example, is going to sound a lot like vowels that exist in Spanish or Hungarian or Japanese. So the newer mechanisms go through the windows of sound, try to match them against things they have learned from other languages, and incorporate that information into how they produce their output. There's a particular model called wav2vec 2.0, from Meta (Facebook), that works this way. Because it is supported by multilingual information, it can produce better guesses with less data: if it doesn't have enough target-language information to support a prediction, it draws on its other languages to make a guess at what's happening. Notice, interestingly, that this works better for sounds than for words, because what it gets from the other languages is mostly the sounds; the particular words of the target language it has to learn from the target data set. So you get very good character accuracy, characters that sound very much like the recording, but for it to learn the actual words takes more data.

In the case of Cook Islands Māori we used approximately 237 minutes (hopefully more after this winter), which is about 36,000 words from 10 speakers, aged roughly 30 to 75, from four different islands. WER is word error rate and CER is character error rate.
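Since WER and CER come up throughout the rest of this talk, here is a minimal sketch of how they can be computed with plain edit distance; real evaluations typically use a library, but the idea is just Levenshtein distance over words or characters. The example sentences are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edits needed, divided by reference length in words."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / max(len(ref), 1)

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: same idea, counted in characters."""
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)

# Hypothetical example: half the words are wrong but most characters survive,
# which is the typical profile of a multilingual model with little target data.
print(wer("kua tae mai te manu", "kua taemai te mano"))   # 0.6
print(cer("kua tae mai te manu", "kua taemai te mano"))   # ~0.11
```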
These are the three systems we compared: Deep Speech is the deep-learning neural network without a multilingual component, wav2vec 2.0 is the transformer with a multilingual component, and Kaldi is the older system. These older systems break the problem up: first figure out what the sounds are, then model the probability of those sounds forming a word, and then the probability of those words forming an actual sentence of the language. Kaldi's approach is more than 20 years old. It's good with small amounts of data, but it has limits on how much it can learn from large amounts of data, which is why it's the older option now; we prefer end-to-end solutions at this point. As you can see, within the deep-learning family, the model that lacks the multilingual component performs much worse, both in word error rate and in character error rate. And we're now at the point where deep learning can improve on the character error rate of the older systems. Those use statistical methods, which is why they're relatively good with little data, but they do not scale.

Here are the three systems and the transcriptions they produce; I'm going to play you the sound clips. As you can see, the wav2vec 2.0 transcription is not bad; it gets really close. (By the way, you can hear our usual companion in these recordings, a rooster, in the background.) The other two don't really work; the computer was quite confused. Again, the newer methods favour accuracy in the sounds over accuracy in the words, and it's only once they have a lot of data that they can begin truly learning the words. So you can see there are still problems with the words and the word boundaries, but this transcription is decent enough that it takes less time to correct it than to start over by hand.

So how many hours of data would you need for the word error rate to be ideal? Obviously it depends on the definition of ideal. Ours is, admittedly, a low bar: will it take less time to correct the output than to transcribe from scratch? If you want it reliable enough to subtitle a YouTube video, you probably need not 100 hours, but somewhere between 10 and 15, something like that. It doesn't decrease in a linear fashion, though: the more data you have, the smaller the gains. Compared with the improvements you get early on, it takes a lot more data to push the error rate from 6% down to 3%. For example, there's work in Mexico transcribing Yoloxóchitl Mixtec, which has about 80 hours of data, and that has made the systems competitive with English: word error rates between 5% and 10%, which is highly usable; a system like that can give you a good preliminary transcription. We are hoping to get to at least 10 hours.

This February and March, for the first time, we did fieldwork with the programme and incorporated the system into the workflow. Students made recordings with native speakers, the computer provided a first pass, our non-expert students corrected the computer's pass, and then in a third pass the expert corrected that. We managed to get about an hour of transcribed data in a month, which would have taken us a year with our previous workflow. So, as you can see, it's no longer a toy; it really is a tool.
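As a concrete sketch of what that first pass can look like in code, assuming a wav2vec 2.0 model that has already been fine-tuned with CTC on the target language and saved locally, and using the Hugging Face transformers and torchaudio libraries; the checkpoint path and file name below are placeholders, not real project artefacts.

```python
# Sketch: use a fine-tuned wav2vec 2.0 CTC model to produce a first-pass
# transcription that a human then corrects. Assumes 16 kHz mono audio and a
# CTC vocabulary built during fine-tuning; paths are placeholders.
import torch
import torchaudio
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

CHECKPOINT = "checkpoints/wav2vec2-xlsr-cim"   # hypothetical local fine-tuned model

processor = Wav2Vec2Processor.from_pretrained(CHECKPOINT)
model = Wav2Vec2ForCTC.from_pretrained(CHECKPOINT)
model.eval()

def first_pass(path: str) -> str:
    waveform, sample_rate = torchaudio.load(path)
    waveform = waveform.mean(dim=0)                          # collapse to mono
    if sample_rate != 16_000:
        waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)
    inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    ids = torch.argmax(logits, dim=-1)                       # greedy CTC decoding
    return processor.batch_decode(ids)[0]

print(first_pass("recordings/elder_interview_001.wav"))      # placeholder file name
```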
This one is for Bribri, again the language from Costa Rica, which has a much more complex phonology: it has nasal vowels and it has tone. Here is an example with good results, an average one, and one where it did badly; as you can see, in that last one the words are completely wrong. The median character error rate was 23% and the median word error rate was 65%. In the bad example the speaker is far from the microphone, so the computer essentially couldn't hear it. This other case is a valid variant pronunciation, and I was very happy to see that the system wrote it the way it sounded rather than the way it's usually spelled; given enough data, it will probably become aware of these kinds of things.

This is a language called Cabécar, a sister language of Bribri. Here we have 12 speakers, a median character error rate of 22% and a word error rate of 53%. That's a decent median result, not a great one. One thing that might be happening here is what's normally called overfitting: the model may be focusing too much on the speakers it knows from the data set, and I'm not confident it would generalise effectively. And I do need to warn you that the results I showed you were for speakers the system had heard before. If the system has not heard a speaker before, the error rates roughly double. We ran an experiment where we held speakers out of the training data and measured the errors: on average, if the computer doesn't know you, we get about a 15% character error rate and a 46% word error rate for Cook Islands Māori. You can see it starts missing vowels, has more problems with the word boundaries, and so on.

So, as I said, we do have working prototypes for the speech recognition. We are still improving them; frankly, we need to train them on more data before they can routinely give us good first-pass transcriptions, but we are already seeing dramatic improvements in our transcription times. And I'm happy to take questions on this later.

Now let me show you a little of what we're doing with machine translation. The same transformer architecture that can turn a question into an answer can turn English into French, for example. So again, these are transformers. We have about 90,000 words of Bribri, and we have Spanish translations for many of them; we can get roughly 10,000 Bribri-Spanish bitext pairs, mostly from the learning textbooks. We do have the issue, which you're probably very familiar with, of a lot of variation. Different scholars chose different representations: this is the nasal vowel, for example, and each scholar writes it differently. The line under the vowel can be encoded in many different ways in Unicode, and we have found all of them, so we need to standardise that. Then the language itself has phonological phenomena that make the actual pronunciation vary. There is nasal assimilation: for example, this vowel is nasal and sits next to a nasal consonant, so because both segments are nasal, writers can choose whether or not to mark the nasality explicitly.
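That underline diacritic really can be typed in several different Unicode ways, so a normalisation pass like the following minimal sketch is one of the first things these pipelines need; the particular marks folded together and the "internal" target mark are illustrative assumptions, not the project's actual convention.

```python
import unicodedata

# Different sources encode "the line under the vowel" with different Unicode
# marks. The internal target chosen here is only an illustrative convention.
LINE_BELOW_VARIANTS = {"\u0331", "\u0332", "\u0320"}   # macron below, low line, minus sign below
INTERNAL_MARK = "\u0320"

def normalize(text: str) -> str:
    # NFD splits precomposed letters (e.g. U+1E3B "l with line below") into a
    # base letter plus combining marks, so every variant becomes comparable.
    text = unicodedata.normalize("NFD", text)
    text = "".join(INTERNAL_MARK if ch in LINE_BELOW_VARIANTS else ch for ch in text)
    # Store everything in one canonical composed form.
    return unicodedata.normalize("NFC", text)

# Hypothetical check: three ways of typing the "same" vowel collapse to one form.
variants = ["e\u0331", "e\u0332", "e\u0320"]
print({normalize(v) for v in variants})   # a set with a single element
```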
Unstressed vowels can be deleted, so the same word can show up in longer and shorter forms. There is dialectal variation, so different dialects have, for example, different forms of the same word. And, most relevant for you all, published materials and transcriptions from native speakers, which of course we love and we need, may be written in forms that are very different from a standardised system. One word we found written in 16 different forms, and they're unpredictable: you never know exactly what you're going to get; sometimes there's an n, sometimes a line under the vowel, sometimes nothing. So there's a lot of work to be done to make these data sets more uniform. We've chosen an internal representation, but we have to convert from the various representations on the way in and back out to them on the way out, so the text can still be read by the people who use each convention.

These are some examples of the results we've obtained. If you want numbers, the BLEU scores range between 14 and 16. For comparison, if you have English-German pairs in a system with large training data you get around 40, so this is less than half of that performance. But you can see that it's starting to understand a little. For example, 'the bird is sitting on the branch': this is what it should produce, and this is what the system offered as a prediction. Sometimes it hallucinates things, as these models are prone to do. Here it should have said that it was sitting next to the river, and it said it was sitting next to something else, probably the closest it could get. This one should say 'my shirt is hanging over there', and the translation just says 'there shirt is'. This one describes someone standing at the house; the computer couldn't handle the positionals and only figured out that someone was standing up there. It still needs work, but you can see that with the materials linguists already have, textbooks and grammars, you can get machine translation systems going. We haven't gotten far with this yet, but we are testing two things: unsupervised systems, where you use the monolingual text to learn internal structure first, and something called transfer learning, where we try to have the computer learn both languages at the same time and thereby improve its results.

Parsing, as our students will know, is the process of deciphering the structure of a given string: figuring out that this is a verb, that is a pronoun, this is the subject, that is an auxiliary, and so on. That's the Bribri sentence that means 'how are you?'. We did it a bit like what you may be imagining, though not with CCG: we started from the constituency structure of the sentence. We built constituency trees, a sentence made of an NP and a VP and so on, and then transformed them into dependency structures. We've done that for about 1,500 words. This is the longest sentence we have managed to parse manually as we train the system; it's about how you prepare cacao. And as you can see, it comes from the oral corpus, so it's not a made-up textbook example; it's real spoken language, which is exactly what we want in the data set.
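Here is a toy sketch of the kind of conversion just described: take a small constituency bracketing (NP, VP) and percolate heads to get dependency arcs. The head rules, labels and example sentence are simplified assumptions for illustration, not the project's actual conversion rules.

```python
# Toy constituency-to-dependency conversion, in the spirit described above.
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)   # child Nodes
    word: str = None                                # set on leaves
    index: int = None                               # token position, set on leaves

HEAD_RULES = {"S": ["VP", "VERB"], "VP": ["VERB"], "NP": ["NOUN", "PRON"]}

def parse(tokens):
    """Parse a bracketed tree like (S (NP (PRON she)) (VP (VERB eats) (NP (NOUN rice))))."""
    assert tokens.pop(0) == "("
    node = Node(label=tokens.pop(0))
    if tokens[0] != "(":                 # leaf: "(POS word)"
        node.word = tokens.pop(0)
    while tokens[0] == "(":
        node.children.append(parse(tokens))
    assert tokens.pop(0) == ")"
    return node

def number_leaves(node, start=1):
    """Assign 1-based token positions to the leaves, left to right."""
    if node.word is not None:
        node.index = start
        return start + 1
    for child in node.children:
        start = number_leaves(child, start)
    return start

def head_of(node):
    """Return the lexical head (a leaf Node) of a constituent, via head rules."""
    if node.word is not None:
        return node
    for wanted in HEAD_RULES.get(node.label, []):
        for child in node.children:
            if child.label == wanted:
                return head_of(child)
    return head_of(node.children[0])     # fallback: first child

def dependencies(node, arcs=None):
    """Attach the head of every non-head child to the head of its parent."""
    if arcs is None:
        arcs = []
    h = head_of(node)
    for child in node.children:
        ch = head_of(child)
        if ch is not h:
            arcs.append((ch.index, ch.word, h.index, h.word))
        dependencies(child, arcs)
    return arcs

text = "( S ( NP ( PRON she ) ) ( VP ( VERB eats ) ( NP ( NOUN rice ) ) ) )"
tree = parse(text.split())
number_leaves(tree)
for dep_i, dep, head_i, head in dependencies(tree):
    print(f"{dep_i}:{dep} -> {head_i}:{head}")   # she -> eats, rice -> eats
```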
With that treebank, we've trained the system to learn parsing automatically, so let me show you a little of how it behaves. The evaluations are based on two metrics: unlabeled attachment and labeled attachment. Unlabeled attachment asks whether the arrow goes in the right direction; never mind the label, is the arrow okay? Labeled attachment asks whether both the arrow and the label are okay: did I find the subject, and did I correctly identify that it was a subject? This one, 'she is eating rice', gets the structure right: this is the verb, this is the subject, this is a complement of the main verb, and 'rice' is the direct object, so it scores 100% on the attachments; on the parts of speech, though, it missed the verb. This other one is 'this here is yellow': it should be a noun and an adjective, with the arrow going in this direction, and you can see that it messed it up. It thinks there's a verb, it thinks there's a subject, and none of the arrows go in the right direction, so it gets zero for unlabeled attachment (it didn't get any arrows right) and zero for labeled attachment; it got the punctuation wrong but the noun right, so the part-of-speech score is 66%.

For Bribri, these are the results. When we give it the manual data, about 81% of the arrows go in the right direction with the right label, and around 90% of the parts of speech are correct. Even this is already very useful for starting to annotate a corpus and looking for syntactic patterns. We have also begun work on Cook Islands Māori parsing: the tagger is about 92% correct on parts of speech (I tested this last week), and we get roughly the same picture otherwise, about 81% labeled attachment and about 83% unlabeled attachment. We have students who have studied a little of the language working on expanding the data set. This is one of the sentences we have parsed: 'public servants are permitted to travel overseas.' We hope to release these soon. We're working hard, for example, on the morphological feature annotations, determining whether pronouns are exclusive or inclusive, first person or second person, so that the annotations are finished properly for both languages.

Again, these arrows are very nerdy and very useful for us linguists, but most of the community is not going to care about them, and they're not really going to care about how things are written. What they want is to tell each other where to get a burger; what they want is to tell each other to go out, hang out and have fun. All of these texts we compile can be turned into genuinely useful tools, like predictive keyboard systems. We've made them for Cabécar and for Cook Islands Māori, and we've had success getting people to install them in both environments. They're starting to give us feedback about what's wrong and which things don't work. The data has very strange patterns, which CS people usually ignore: for example, we used parts of the Bible because that was a lot of the text available, so for a while the keyboard couldn't suggest 'hey', but it could suggest the names of ancient prophets, because that's what it had been trained on.
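A minimal sketch of the idea behind those predictive keyboards is just a bigram model over whatever text is available, which is also why a Bible-heavy corpus happily suggests prophets' names. The tiny corpus and words below are placeholders, not project data.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count which word tends to follow which, from whatever text we have."""
    following = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            following[prev][nxt] += 1
    return following

def suggest(following, prev_word, k=3):
    """Top-k continuations for the word just typed; empty if we've never seen it."""
    return [w for w, _ in following[prev_word.lower()].most_common(k)]

# Placeholder corpus: if most of the text were scripture, the suggestions would
# be scripture-flavoured, which is exactly the quirk described above.
corpus = [
    "kua tae mai te manu",
    "kua tae mai te tangata",
    "kua kite au i te manu",
]
model = train_bigrams(corpus)
print(suggest(model, "te"))    # e.g. ['manu', 'tangata']
print(suggest(model, "kua"))   # e.g. ['tae', 'kite']
```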
Those are the kinds of things you don't think about when you imagine big text files. But we're slowly deploying them, and we actually have them in testing right now, not only with the teachers but, in the Cook Islands Māori case, with some younger high schoolers. Fingers crossed, they'll give us more feedback.

So again, what are we doing this for? I don't like the term, but these are the so-called low-resource languages. This chart is from Joshi et al. 2020; they call the lowest category 'the left-behinds', and I disagree with that framing. This axis is the number of unlabeled data sets that exist for a language (one data set, two, three, ten, a hundred, a thousand and so on), and this axis is the number of labeled data sets (none, one, ten, a hundred, a thousand). Most languages in the world sit in category zero: there are maybe zero or one data sets for them. Up here are the languages with the most information, for which many resources are collected. Most of the world's languages, as you can see, end up with that unfortunate label, but we reject it and hope to keep working on this.

People have talked about 'digital death': if a language isn't on the internet, it may be digitally dead. That is relevant, of course, but we have to remember that what makes languages alive is people who speak them, not computers that remember them. How many CD-ROMs do you have in your offices by now? Cassettes, wax cylinders, MiniDiscs? We do have cassettes; we do have CD-ROMs for Bribri, for example, and they just sit there going mouldy. In 20 years these wav2vec models will be digitally mouldy too, and training all of these things will have been useless for the language if they just sit on some USB drive. When we create these tools, we need to be thinking about what we can do so that they make an impact in the community. The fact that a computer remembers all these words is not going to make a language more widely spoken. The ideal is to use them to help create communities where people can use the languages. There's a beautiful project, still going on, in which Zapotec speakers, both in Mexico and in the US, have created communities to share content in Zapotec on Twitter and social media. This is from someone in Mexico who translated the interface of his phone into Zapotec, because, he said, I have the right to have my language in every part of my life, including the interface of my phone, the thing I use the most every day. Whatever we can do to further that process, even something that seems simple, like a keyboard, is a direction we should be taking. Imagine if we were training high schoolers to talk to robots and tell them 'go left, go right', something you could do with the text-to-speech and the ASR working properly.

We face many challenges; I'll summarise them and we can discuss them later. For example, data sovereignty. We're very used to the model where the linguist owns the data and the community has little say in how it's used. In the Cook Islands Māori case, we make sure that ownership and control of the data remain with the Cook Islands linguist, with Dr.
Nicholas, and with the community, so that they can make the decisions about how to proceed. We're fortunate, ethically, in that we're doing this because the community wants us to make these tools to further the documentation. With Bribri we still have the challenge that we work within the more traditional model of a linguist collaborating with communities, but we need to focus on training people from the communities themselves in CS and in linguistics, so that they can be the ones who run these projects next, and so that they can make the decisions about what to do with all the data.

With Māori in New Zealand there's a whole story that has been unfolding: a group of Māori activists collected a beautiful ASR data set for Māori, and every tech company in the world has been trying to get its hands on it. They guard it very jealously; it took them enormous effort to compile, and, most importantly, it belongs to the community. If the community is not going to see a cent from it, or any tools coming out of it, then the large computer companies get turned away. This is a discussion we need to have too. The parse trees, for example, we're going to have to release in some way where the community continues to give permission for people to use them in their projects, so that it isn't just a huge model that gets trained, packaged away, and never returns a cent to the community. This is one of the main challenges we have, and it's something that needs to be worked out in these projects.

Obviously, it takes a village, so I do want to acknowledge the Cook Islands team. Sally Akevai Nicholas is one of the main characters of this story. She teaches linguistics and Māori at the University of Auckland; she is from Ma'uke, and her documentation work is what we're trying to support with these tools. Jean Tekura Mason is the voice of the text-to-speech; she has been incredibly generous with her time and her knowledge of Cook Islands Māori culture and language. The teachers, people training to be high school and primary school teachers throughout the Cook Islands, have helped us, as have people from the school on Ma'uke. Tyler Peterson, Peter B. Wells, Lian Kokawa, Emma Acura-Val-Powell, Samina Dara, Victoria Quinn, Jessica Chance, Zaya Tenger, Sarah Harts, Wendu Dekka, Lynn Conway and Revella Benton have helped with aspects of this work, and more people are coming in; thanks to them as well. On the Chibchan team, for Bribri and Cabécar, we have Sofía Flores, Father Corpus, Isaac, Dee, Annie, Catherine, Nien, Sharid, General, Taiwan, Freddy, Franklin, Alex and more people joining in. Thanks to them, these tools are working. Thank you so much for your time.

So wav2vec has this kind of cross-linguistic background knowledge. But then, in order to know how to spell things and how the language works at all, it has to know something about the language going in. If you gave it nothing, that would be called zero-shot transcription; if you give it a little bit, which is what you're doing, it would be called few-shot transcription. So I guess, based on that, my question is: how much is the sort of gold standard that you need in the first place?

There's so little research on this that there is no gold standard.
And it's interesting because, even though these two terms are very common in the computing literature, zero-shot and few-shot, no one really agrees on what 'few' means; everyone uses their own definition. Few-shot can mean 10 examples or it can mean 10,000. Probably not a thousand, although a thousand could still be considered few if you're dealing with something very large. So there's no gold standard. Obviously, if you do zero-shot, the model knows nothing about the actual language; it will probably imagine that the vowels sound a little like Spanish and the consonants a little like something else it already knows.

A system like that might handle Japanese, though. I think a lot of people are trying zero-shot transcription; maybe I should try it. Because, for example, in romanisation of Japanese the long vowels are usually annotated.

That's a great point, because the model probably learns Japanese in characters, not in romanisation; it learns the writing systems along with the transcription. So it's a good question. My impression is that the Japanese data was only in characters, but there will be other languages in there that mark long vowels with macrons. I'll get back to you on that; I don't know. And maybe it will just write two u's. We should check that.

Yes. I have a question related to that: you also tested this on languages where the spelling is quite far from the sound, right? And you obtained results that were kind of impressive.

Yes, I was impressed too; I was not confident it would work. It's also learning the words. For English it has to solve the challenge of 'knight', for example: K, N, I, G, H, T. There are letters in there that don't correspond to any sound but that it still has to output. So as it learns the sound-to-text conversion, it must also be learning the orthographic forms it should be aiming for; it must have managed to learn that there are only so many word forms in the language. It was impressive.

And to be perfectly honest, it will be really interesting to see exactly which ones it messes up. You'll have to tell me later if you notice specific letters being dropped or anything like that; I would be very interested in knowing. You can see that problem in English with foreign names: if you turn on auto-generated subtitles for a YouTube video with a lot of foreign words, you'll see the system struggling to figure out what to write.

I'm just curious about the UD treebank project, and whether you needed new labels. Sorry, could you say that again? New labels, beyond the existing inventory: how many new labels did you need compared to the ones already there?

We have not come up with new labels, but we will have to, yes. We haven't proposed any yet. We will have to propose a new feature at some point, because Bribri has a tense that languages in Europe mostly don't have: something called a hodiernal tense, which Old French had, for example, for things that happened earlier today. Something from this morning and something from right now take the same tense. We're thinking of proposing that feature, but other than that, we've largely managed with the existing inventory.
Because, yes, obviously, within the existing guidelines we're making rich use of the XPOS column, the language-specific part of speech. And we will have to do the same with the language-specific extensions of the relations in the later columns, because in Cook Islands Māori, for example, the auxiliaries can be tense-aspect markers or they can be directionals; there are a bunch of different things in the Polynesian verbal cluster, and we want to distinguish them, whereas in the universal system they're all just 'aux'. So we will make rich use of the language-specific side of the annotation.

But how much a priori knowledge of the language's syntax do you need in order to build the treebanks?

Do you mean how much a human needs to know in order to start this project?

Sort of, yes, because a lot of these low-resource languages might have understudied syntax, or something that doesn't really fit the mold. Maybe it didn't happen in this particular case, but there are long-running debates about how heads actually work in a lot of syntax; what happens if you have a language like that and you want to make treebanks, and it turns out it's not easy?

That is a fine question, and I'm going to try to answer it in a fairly complete way; we'll see if I manage. For the Cook Islands Māori parsing, the treebank was started as a student's thesis. She had knowledge of Hawaiian, and she studied the grammar of this language as she went, so she had some background in Polynesian languages. The people helping us continue the treebank now have studied, for example, both te reo Māori and Cook Islands Māori, at least up to a certain level, so they have some knowledge of the grammar. They're not expert speakers, but they have had to study the grammar. With Bribri the learning curve is probably a bit steeper, and we also have the odd problem that the Cook Islands Māori grammar is written in English while the Bribri grammar is in Spanish, so we're having to translate materials for students to learn from, and we've had less success hiring outside people to help with that. As for hiring members of the community, we have not had success, because in the Cook Islands Māori case the only people with this kind of knowledge are my colleague and the teachers we train to teach high school, and their training covers identifying a verb and a noun and so forth, so we would have to train them a great deal to reach the required level. For Bribri, likewise, we still need to improve. It would be ideal to recruit community members, also from a very pragmatic point of view, because it would send the money home to them, but we have had a lot of difficulty hiring them. So the students who do this work have all come in with at least some knowledge; someone arriving with zero is obviously not going to be able to do it. Does that answer your question?

Yes, I think that answers the question. It's a lot of work.

And obviously there are many cases where the students cannot figure it out, and I know a little but cannot figure it out either, so we have to go to the reference grammar, for example, and ask: what are we supposed to do here?
And there are a couple of phenomena in the language that we're still trying to work out exactly how to handle.

How do you teach, I don't know, wav2vec 2.0 to deal with a language that has tone?

In the older systems, you had to give the system the explicit phonemic inventory for it to build its models, and so you did face the question of whether a high-toned a is a different unit from a low-toned a or not. You might have to separate out the diacritics, writing it as the letter a plus a separate tone symbol, to delimit the phone and the tone. The modern systems are black boxes: it's end to end, with big black boxes in the middle. It learns it, or we assume it learns it, and there's very little research on exactly how, for example on whether the internal representations of vowels with high tones are similar to one another. That's something I need to do next: are the embeddings for words with high tones similar to each other, and likewise for words with low tones? If I had a minimal pair distinguished only by tone, would there be some part of the embedding that clearly encodes the tone, the way it was explicit in the old systems? I haven't seen much research on this, because on the CS side people are satisfied that it works well, and the analysis just hasn't gotten there yet. So there is more of this kind of work on the older systems than on the newer ones. And in the older systems, what happened a lot of the time is that you did have to separate the diacritics: the better results came from explicitly telling the system that there is a character for the a and a character for the high tone, letting it find the pattern, and then re-linking them correctly in the output.
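As a minimal sketch of that diacritic-splitting trick, here is one way it could look in Python, assuming a toy convention where tone accents become separate symbols before training and are re-attached afterwards; the tone symbols and the accent-to-tone mapping are placeholders for illustration, not the project's actual labels.

```python
import unicodedata

# Toy convention: pull tone accents off the vowels so the model sees the vowel
# quality and the tone as two separate characters, then re-attach them later.
# The '<H>' / '<F>' symbols and the accent-to-tone mapping are assumptions.
TONE_MARKS = {"\u0301": "<H>",   # combining acute -> high tone symbol (assumed)
              "\u0300": "<F>"}   # combining grave -> falling tone symbol (assumed)
MARKS_BACK = {v: k for k, v in TONE_MARKS.items()}

def split_tones(text: str) -> str:
    """a-with-acute -> 'a<H>': separate the phone from the tone before training."""
    out = []
    for ch in unicodedata.normalize("NFD", text):
        out.append(TONE_MARKS.get(ch, ch))
    return "".join(out)

def join_tones(text: str) -> str:
    """Inverse step: turn the tone symbols back into accents on the vowel."""
    for symbol, mark in MARKS_BACK.items():
        text = text.replace(symbol, mark)
    return unicodedata.normalize("NFC", text)

word = "kàlá"                                  # made-up example word with two tones
encoded = split_tones(word)
print(encoded)                                 # ka<F>la<H>
print(join_tones(encoded) == unicodedata.normalize("NFC", word))   # True
```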