 My name is Hilaria Cruz. I'm a linguist and a native speaker of Chatino, a language spoken in Oaxaca, Mexico. Today I wish to extend my gratitude to the organizers of this workshop, but I also want to extend my gratitude to the ELDP members here and to SOAS, because since 2015 you have funded the documentation of the Chatino language. For the same reason, here at SOAS we have a large collection of Chatino materials. For example, just today I did a quick search for the Chatino language, and look at all of these holdings: a large collection right here at SOAS of spoken language, but also of sign language. One of my dreams is to acquire technology so that we can transcribe all of this data, so that native speakers can use it to make books in Chatino for schools, and so that it can serve as material for linguists, artists, and anyone else who wants to use this language. I'm very, very grateful for that. I feel that there is a part of Chatino here, and I'm grateful to be here. This is the second time I have come back to SOAS; it's an awesome institution. So today, one of the things I would like to do is talk about the quest I have gone through to acquire ASR for the Chatino languages. We know that ASR is becoming very prevalent everywhere you go, right? You have Alexa, you have Google, and you talk to them. But we know that this technology is mostly available for major languages such as German, Spanish, and Japanese; there are not a lot of resources for endangered or minority languages. When I began this work, I was also learning how to transcribe the language myself. I was learning how to read and write the Chatino languages as I was writing my dissertation at the University of Texas.
So what happened was that nation states, in their efforts to bring everyone together, did not provide any funding for studying indigenous languages. In the 1940s, members of the Summer Institute of Linguistics (SIL), including two people from here, from England, were working in the Chatino region. Some of the few things we know about indigenous languages in Mexico come from work done by people from the SIL, and the members of the SIL were not doing this work for altruistic purposes. I mean, some of them, who were linguists, wanted to know about the language, but their main goal was to be able to translate the Bible. The Mexican government did not allocate any funding for research on these languages. So people like me: I did not know how to speak Spanish until I was eight years old. Then, when I went through school, I acquired literacy through complete immersion in these languages. When I got older, I questioned: why is it that I know how to write Spanish, I know how to write English, but I don't know how to represent the sounds that I have in my mind for the Chatino language? Chatino, like the Na languages, has very distinctive vowel and consonant systems that you cannot capture if you try to use the Spanish alphabet to represent it. For example, we have consonants with laminal sounds, as in the words for 'corn', 'paper', and 'frog'. On this slide we also see that we have glottal stops, which we represent with the letter q. We were trying to use the alphabet of the Spanish language, but Spanish, as we know, does not have laminal sounds and does not have glottal stops. We also have lots of nasal sounds, just like French. And, as I said, there are the glottal stops.
As you can see, we have a language with a lot of glottal stops, and we were trying to use the Spanish alphabet to represent sounds that Spanish does not have. Also, like the Na language that Alexis was talking about, the Chatino language is very monosyllabic, and we have a large system of tones: twelve tonal contrasts. I'll just give you a little example. Take the syllable ke: with one tone it means 'his head', and with another tone, which carries a slight lengthening, it means 'deal with it'. The distinction is small, but it is there. Anyhow, this is just to give you a little example of some of the things we were trying to represent in the Chatino language. So if there is no research on your language and you are trying to represent it, what do you do? We were done a great disservice by the Mexican nation state, and in many ways other settler colonial states have done the same thing with indigenous languages. So then what do you do? Luckily, I began to hear about Native American communities in the United States who were working with linguists to recover their languages. These linguists were working with native speakers, going back into the archival files, and in some cases even reviving languages that had ceased to be spoken for as much as a hundred years. So I thought: wow, linguists could help us represent our language. There had been some proposals for how to write these languages, but without a good orthography, one that represents all the phonemes of the language, it is very difficult to read the language back. What we wanted was an alphabet that could represent the sound system of the language, something that was easy for us to read back, and hopefully something we could use for teaching others.
So we needed a system like that. This is how I began to do linguistics: all I knew about linguistics at the time was that I wanted a writing system for the Chatino language. As we continued our work, we became acquainted with the literature on endangered languages, which was very invigorating for us. We have now done many years of language documentation in all different contexts, for example the corpus that we have here at SOAS, as I was saying before. We have formal conversations, we have everyday conversations, like these kitchen conversations. So we have a really large and very rich collection of data: we have women, men, and I believe even some children's conversations as well. We have that richness. Then, when Lynn Hou began to document the sign language in San Juan Quiahije, my community, she asked me to help her transcribe the recordings of the conversations she was having with families. So I began doing transcription for her; I had done my own as well. But when I began to work on her transcriptions in earnest, I found myself getting backaches, getting carpal tunnel, getting tired of writing the same thing over and over again, for example the word cha in Chatino. And I was thinking: what the heck? Right now we have this advanced technology for speech recognition; why can't I have something that could just write cha for me, that could just write a few things? I just needed some help. So I began to ask around, on Facebook of all places: hey, what can I do to get ASR for Chatino?
People began to say: well, you need a really large collection, you need a lot of data. Then there was this effort, right? Were you part of AARDVARC? Yes? Okay. So there was this group of phoneticians and computational linguists, and they were lamenting that they could not use the large collections of indigenous and endangered languages in the archives. They would have liked to use some of this data for their models, but the majority of the languages in these archives did not have transcriptions, which is right. So they began to meet, and I was lucky: when they were meeting, I just went and raised my hand. Okay, so what do I need to do? How many hours would I need to transcribe so that you can begin building a system? At the time, nobody could answer that. And it is still the same today; for example, Madonna is asking the same questions: okay, so tell me in detail, how many hours would I need? What are some easy steps I can take to begin this effort? It seemed like nobody knew at the time. Then I met some people from the LINGUIST List, and they said: well, I think you can do it with ten hours. So I went to the basement of the LINGUIST List, and what I did was take some texts and respeak them. We offered these texts as open source; they are out there. I think we produced 3.8 hours of these texts, and these were the texts that Oliver and Alexis used to compare with the Na language. Actually, I do continue to ask the same question, because I don't have all of these answers; I have some of them. And then I began a fellowship at Dartmouth College.
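As a small practical aside: before anyone can answer "how many hours do I need?", you first have to know how many hours you have. A minimal sketch in Python, assuming a hypothetical corpus directory of uncompressed WAV recordings (the directory name and layout here are illustrative, not the actual Chatino archive), could tally the total audio duration like this:

```python
import wave
from pathlib import Path

def corpus_hours(corpus_dir):
    """Sum the duration, in hours, of every .wav file under corpus_dir.

    Assumes uncompressed PCM WAV files readable by the stdlib wave module.
    """
    total_seconds = 0.0
    for wav_path in Path(corpus_dir).rglob("*.wav"):
        with wave.open(str(wav_path), "rb") as w:
            # duration of one file = frame count / sampling rate
            total_seconds += w.getnframes() / w.getframerate()
    return total_seconds / 3600.0
```

A tally like this only answers the easy half of the question; how many of those hours must be transcribed before an ASR system becomes useful is the part that, as the talk notes, nobody could answer at the time.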
There I wanted to find people at Dartmouth College who were working in computer science; I knew they have a huge computer science department. But to my surprise, there was no one doing NLP at Dartmouth College, so I could not find anyone there. The linguistics department had been trying to hire a computational linguist as well, but they could not hire anyone at the time I was there. So then the Neukom Institute, which was sponsoring my fellowship, said to me: why don't you have a conversation about this? How about a retreat? Bring people together so they can have this conversation. So I got the funding for the retreat, and Oliver and I began to have all of these conversations. A lot of people came to this retreat, and we had a wonderful weekend. Did you enjoy it, Oliver? Yes. It was a very informal gathering: we went out sightseeing, we ate good food, and we just sat down in a very informal setting, in a house, and had these conversations. Basically, we were getting to know the different cultures of all these different people working in these fields. And this is where I got some of my answers. One of the things we found out there, where we had linguists, language activists, and computer scientists together, was that we just did not know much about each other. A lot. What else did we get from that? Well, I got a better sense of how different people do linguistic work, up close, because prior to that my conversations had been largely with a small set of people, such as Alexis. I got a better sense of the variability that is out there and of the different types of data people had available.
Yes, and another thing is that I feel the literature now emerging from this work, on technologies for endangered languages, is not taking into account native speakers like myself who are doing language documentation. A lot of native speakers are beginning to get trained in linguistics, and many of us are ready: we want these tools so they can help us do the transcription. But the problem I see here is that there is not a lot of continuity. For example, we met Oliver at the retreat, but then the communication did not continue. I'm very happy to see you all today, and I hope we can continue this. I also see a lot of hiring in different linguistics departments and universities; everyone is trying to hire a computational linguist. But in this quest I have realized that the majority of computational linguists are not interested in the NLP problems of endangered languages. You might be the exception, and hopefully there will be more people who are interested. I've been talking to chairs of different linguistics departments, and I ask them: why don't you hire an engineer who does NLP? And they say: no, no, no, we won't do that, because we don't want to hire anyone who does not have a background in linguistics; they have to teach all of these really basic linguistics courses. In the same way, I see that programs that do a lot of NLP, like the engineering programs at Carnegie Mellon, don't have linguists either. So I think we need to have more conversations. For example, CMU, Carnegie Mellon, has a wonderful lab there, but I don't see a lot of continuity there either. It would be wonderful if there were a scholarship to support a native speaker to go there for one or two years and work there, so that you would see more continuity.
And if I were the chair of a linguistics department, I would hire a person with an NLP background even if they didn't know linguistics, but I'm not there yet. There are many things we still need to do, even just to hash out some of the cultural issues of how to do these things. I think that in computational linguistics, as in documentation, we have done some of this work. For example, when we were documenting the Chatino language, we had a very synergistic group: there were linguists like Anthony Woodbury, and there were native speakers like us, and we did fieldwork together, and we were all very committed to it. I see more and more documentation programs now with these collaborations, very wonderful collaborations actually. More and more people are beginning to see that if you are a linguist and you are going to do fieldwork with communities, you do it in conjunction with community members. I think this is something we need to start pushing people in computer science to do, if we want to move this conversation forward on how to acquire technology for endangered languages. What do you think?