Hi, I'm from the Centre for Asian Studies, Trinity College Dublin. The focus of my research is a minority language in central South China that belongs to the Tibeto-Burman language family. I should mention that the language has no writing system, and you may wonder how transcribers are supposed to transcribe a document in a language that has no writing system. The tradition in China for documenting such languages is to use both Chinese and International Phonetic Alphabet (IPA) symbols.

This picture shows a typical type of word list used in China to document a minority language that has no writing system. All of the information is structured as a table. In the leftmost column we have the lexical meanings of the words, recorded in Chinese, followed by columns giving the pronunciations of the corresponding words in each of the dialects recorded.

Speaking of the training, the training for this model includes roughly 14,000 words across 345 pages, and we reached a character error rate of 5.9.

A little bit more about the training, first about the layout. Initially we were trying to develop a table model, because everything is structured as a table, but then we decided it would be easier to do it like this. On this page we have only one text region, and each row of the table is covered with one baseline: the lexical meaning in Chinese on the very left, followed by the corresponding pronunciations. The first row is just the title of the page, the lexical word list. In the second row we have three columns, each of them a tag for one of the dialects we have on this page. But we still ran into some issues, as you can see here in the fourth row.
Here we have the word recorded for the lexical meaning "son". In the first dialect, two different pronunciations of the same word are recorded, and they take up two lines. Usually in this case the model cannot recognize this correctly. Instead it recognizes two different baselines, which also ruins the reading order of the page. And somehow this cannot be solved by training new models, so even with the latest version of the layout model we can only correct it manually.

Then the transcription of Chinese. The model can now recognize up to 2,400 characters. But we still ran into some issues, which can be seen in two ways. The first is creating: the model tends to create new words. I'm not sure whether this should be called an error, as we can see from the first two examples. In each pair, the one on the left-hand side is the original version we gave to train the model, and the one on the right-hand side is what we got from the model. You can see there are always several extra strokes added by the model. But in the first pair, the result is actually not an error, because it is an alternative form of the original word, while in the second example the one on the right-hand side is the traditional form of the Chinese character. Beyond that, we also have cases where the model combines a part of one Chinese character with a part of another, which makes a completely new Chinese character that does not exist in the language.

The second kind of error is replacing: whenever the model meets a difficult character that it cannot handle, it replaces it with something it has been trained on. You can see this in these two pairs.
Again, the one on the left is the original version we gave.

Speaking of the IPA transcription, the model can now recognize about 100 IPA symbols, most likely all that we have in the system. The problem here is again the model mixing up different symbols. Here we have the number 2 mixed up with the glottal stop consonant. We have numbers because the language is a tonal language, and in China scholars normally record tones with numbers that represent different tone levels. The model also mixes up two different vowels. And that's basically all. Thank you.
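
[Editor's sketch] The 5.9 figure quoted for the model is presumably a character error rate (CER), usually reported as a percentage. As a rough illustration of how such a figure is commonly computed, the sketch below divides the Levenshtein edit distance between a reference transcription and the model output by the length of the reference. The function names and the sample strings are hypothetical and are not taken from the word list or from the tool used in the talk; the sample only mimics the kind of confusion described above, where a tone digit is produced in place of a glottal stop.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn ref into hyp (classic dynamic programming, row by row)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r
                            curr[j - 1] + 1,      # insert h
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


# Hypothetical example (invented strings): the glottal stop is read as "2".
reference = "a\u029421"   # a, glottal stop, tone digits 2 and 1
hypothesis = "a221"
print(f"CER = {cer(reference, hypothesis):.2%}")
```

Counting at the level of Unicode code points, as done here, is only one possible choice; IPA symbols that carry combining diacritics would need to be normalized or grouped first if each symbol is meant to count as a single character.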