Well, let's get started with this round table on Transkribus for Asian languages. We have till 11, and then we have a coffee break. It's my pleasure to introduce the speakers and organizers of this round table, who are Rachel Griffith from the Austrian Academy of Sciences, Franz Xaver Erhard from Leipzig University, James Henry Morris from Waseda University as well as the University of London, Alexander O'Neill, also University of London, Lee Shi-Hua from Trinity College Dublin, and Nicole Merkel-Hilf from Heidelberg University Library. The floor is all yours, please take it away. We have a mic over here, and this one also works.

Okay, thank you very much for this introduction, so there's not much left for me to say, I guess. Thank you for joining us here. The idea for this round table is a rather selfish one, because Rachel and I both work on Tibetan, and when I started with Transkribus only one year ago, I had a strong need and desire to get more information about it and some exchange with people who already had more experience with the specific problems that Asian languages bring to Transkribus. That's how we had the idea to get some people together to exchange on issues with Asian languages, and maybe also find other people who are working with them, because this is a very, very difficult thing: with these languages you always start to reinvent the wheel from the beginning. Yes, so what is next?

Hi, good morning. Just to give a brief overview of how the round table will work: we're joined by two speakers online, Alexander and James, and obviously the speakers here in person. We'll take it in turns to introduce ourselves, tell you a bit about the scripts that we're working with and introduce some of the challenges that we have had working with Transkribus so far, and then we'll break out into a discussion.
We've got some points on the screen, some questions that we're not going to anchor ourselves to but might refer back to in order to keep the conversation moving. It's also open to questions from the audience, so feel free to chip in with any tips, advice, questions or thoughts throughout.

OK, that's me again. Maybe you've seen my short presentation yesterday. I'm from Leipzig and I'm working on a corpus of Tibetan newspapers. What I want to do is a qualitative study of it, but before being able to do that, we actually need to get to the text. My problem with transcribing is that in Tibetan there are no real transcription conventions. Every project working with Tibetan follows its own ideas, which results in training data that is available but useless for other projects. So I think that, at least in this area of Tibetan, people should try to talk to each other a little bit more. Yeah. And then I don't know who's next.

I see. Feel like we're hogging this. Hi. I've actually been working with Tibetan scripts on two projects. The first was at the Austrian Academy of Sciences; the project is called TibSchol. The project wanted to use Transkribus to automatically transcribe around 500 scholastic texts written in a variety of different scripts, all Tibetan, but by different hands. I was working on this project from 2021 to last year, and during that time we developed two HTR models, one called Drutsa and one called Betsug, and I'm hoping you can see these on the screen. To the eye, there's not much variation between these scripts. Betsug, on the lower two folios, is slightly shorter, squarer and can be slightly more ornate. But just those differences were enough to make the first model that we trained unusable on those scripts.
So we decided to develop a separate HTR model just for those. These were made publicly available in December. We're quite happy with the character error rates, and the project is now able to use these models to start automatically transcribing these texts. One other thing to add is that, as Xaver mentioned, people working with Tibetan tend to use different conventions for transcribing Tibetan. For this project, we used a transliteration system instead of Unicode, simply because some of the transcripts already available to us used this system, so we could save some time by reusing those.

The current project that I'm working on is based in Paris at the École Pratique des Hautes Études, again working with Tibetan manuscripts, this time with slightly different problems, but again touching on what Xaver said about conventions. These texts have abbreviations that are rendered in all different forms, without there being a Unicode representation or any convention for standardising how we write these transcriptions. There are also some interesting punctuation marks and codicological features, like the two lines on the top folio that I've put a box around. These texts also use their own type of numbering system that differs completely from the Tibetan numerical system; often the two are used interchangeably. But again, there's no system for accurately transcribing or transliterating these. So these are things that we're currently working on, trying to come up with conventions and standardisations for them.

So it's now my turn. I'm Nicole Merkel-Hilf from Heidelberg University. I've been using Transkribus since 2018. We started to use it for the Devanagari script, which is a North Indian script used for various languages, and we mainly use it for the printed texts in our collection. These are texts from the 19th and early 20th century. We also had some trials with the Urdu script used in Pakistan.
And recently we started to work with Tamil, which is a South Indian script and different from the Devanagari script. Its characters are more rounded and it has no ligatures. Ligatures are what proved difficult for Devanagari: it has a lot of consonant compounds, sometimes with up to four consonants clustered together into one grapheme. We also encountered difficulties with the vowels, which are sometimes written above, sometimes below the character. We also work with lithographically printed books. That means these texts have been handwritten on a stone and then printed. Here we often encounter writing errors, or words that have been omitted and are then placed above the line. We really don't know how to deal with that so that it is clear where the added word has to be placed in the text. We haven't found a solution for that.

And that is what our material looks like. As you can see, it's quite simple regarding the layout. We don't have much apart from page numbers, headings and then the text. A bit more difficult is our journal Saraswati, which we are currently working on. It has two columns, but we found a solution for that: we trained a layout recognition model with P2PaLA, which works quite well. Yes. And what else is there to mention? Ah, yes, our Devanagari model. One of our Devanagari models for the printed texts comes out with a character error rate of just 2 percent, so that works quite well. We published it for reuse on the Transkribus platform, and we also published the ground truth in our own ground truth archive at Heidelberg University Library. And now I think it's Alexander's turn.

Hello. Yes. I developed a handwritten text recognition model for Nepalese manuscripts written in the Pracalit script, like those that you can see on the folio that I screenshotted here.
I developed this model as part of an AHRC-funded project at SOAS University of London. Essentially, it is made to recognize Pracalit and transcribe it into Pracalit Unicode. One of the main difficulties I had in creating this model was baselines, or layout recognition, as you can see in this picture. These are the results that I got using, I think, the Tibetan pecha layout recognition model. I got pretty good results with Universal Lines, but it also requires a lot of manual correction.

There are a lot of other issues. One of them is that, for manuscripts in Sanskrit or Newar, the two languages of the manuscripts I was working with, there were very few fully diplomatic editions to work with. So I essentially bootstrapped from Sanskrit editions of some Sanskrit texts that also existed in manuscript form, and that sped up the process considerably. Using about four different manuscripts, two of them in Newar and two of them in Sanskrit, I was able, in training the model in 2022, to get a character error rate of 2.6 percent. Then, by correcting new transcriptions and feeding them back into the model, I was able to get a character error rate of 2.0 percent. I also deposited my data in the Heidelberg University system, the heiDATA system.

This is really part of a bigger project that I'm involved with, whose goal is essentially to create a corpus of Newar. Since there are so few existing transcriptions of Newar manuscripts or texts out there at all, whether diplomatic or edited, this is really one of the first steps in our pipeline. So it was really essential, and it's something that I'm still working on. I think that is about all that I'll say for now. Thank you.

Thank you. Shihua? Thank you. Next. Yeah.
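For readers following the figures quoted in this session: a character error rate (CER) is usually understood as the edit distance between the recognized text and the ground truth, divided by the length of the ground truth. The following is a minimal Python sketch of that definition, not the platform's own implementation. The function names are hypothetical, and the NFC normalization step is an assumption added here because the same Indic or Tibetan syllable can be encoded with different code point sequences.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER as a fraction: edit distance divided by reference length.

    Both strings are NFC-normalized first (an assumption of this sketch),
    so that visually identical text in different encodings is not
    counted as an error.
    """
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return levenshtein(ref, hyp) / len(ref)
```

Note that this counts Unicode code points, not visual graphemes, so a single wrong consonant cluster in Devanagari or Tibetan may count as several errors. Multiplying the result by 100 gives the percentage form used by the speakers.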
So, hi, I'm Shihua Li from the Trinity Centre for Asian Studies, Trinity College Dublin. The focus of my research is the Tujia language, which is a minority language in central south China that belongs to the Tibeto-Burman language family. Just to mention, the Tujia language has no writing system. You may wonder how you are supposed to have Transkribus transcribe a document in a language that has no writing system. Actually, the tradition, at least in China, for documenting languages that have no writing system is to use both Chinese and International Phonetic Alphabet (IPA) symbols.

Here in this picture is a typical type of word list used in China to document a minority language that has no writing system. Basically, we have all of the information structured as a table. In the leftmost column we have the lexical meanings of the words recorded in Chinese, followed by columns presenting the pronunciations of the corresponding words in each of the dialects recorded. The training data for this model includes roughly 14,000 words on 345 pages, and we reached a character error rate of 5.9 percent.

Let's talk a little bit more about the training, first about the layout. Initially, we were trying to develop a table model because everything is structured as a table, but then we decided it would be easier for us to do it as on this page. Here we have only one text region identified, and each row in the table is covered with one baseline: the lexical meaning in Chinese on the very left, followed by the corresponding pronunciations. The first row is just the title of the page, lexical word list. In the second row we have three columns, each of them a label for one of the dialects we have on this page.
But still, we ran into some issues, as you can see here in the fourth row. Here we have the word recorded for the lexical meaning "son". In the first dialect, two different pronunciations of the same word are recorded, and they span two lines. Usually the model cannot recognize this correctly. Instead, it will recognize, say, two different baselines, and that will also ruin the reading order of the page. Somehow this cannot be solved by training new models. So with the latest version of the layout model, we can only correct it manually.

As for the transcription of Chinese, the model can now recognize 2,400 characters. But still, we ran into some issues, which can be seen in two ways. The first is creating: the model tends to create new characters. I'm not sure whether this should be called an error, as we can see in the first two examples for creating. The one on the left-hand side is always the original version we gave to train the model, and the one on the right-hand side is what we got from the model. As you can see, there are always several extra strokes added by the model. But for the first pair, it is actually not an error, because it's an alternative form of the original character. In the second example, the one on the right-hand side is the traditional form of the Chinese character. Beyond that, we have cases where the model combines a part of one Chinese character with a part of another Chinese character, which makes a completely new Chinese character that does not exist in the language. The second kind of error we can see is replacing.
Whenever it meets some, say, difficult characters that it cannot handle, it will replace them with something it has been trained on, as you can see from these two pairs. Again, the one on the left is the original version we gave.

Speaking of the IPA transcription, I think the model can now recognize around 100 IPA symbols, more or less; that is all we have in the system. The problem here is also mixing: the model mixes up some of the different symbols. So here we have the number two mixed up with the stop consonant. We have numbers because the language is a tonal language, and normally in China, scholars record the tones with numbers to represent the different tone levels. We also have the model mixing up two different vowels. And that's basically all about it. Thank you.

Thank you. I think James is left. James, do you have slides that you'd like to share?

Yes, if it's possible.

Hold on. I'll just need to set you up for a second.

Thank you very much. Apologies for not sending things on time. It seems like I can share it. Wonderful. Right. So my name is James Morris. I'm working at Waseda University, and these are the project details. I'm working with a sort of text called Kirishitan-ban, which is a Japanese type of text that was printed by Jesuit missionaries in the 16th and 17th centuries. These texts pose unique problems. They tend to be written in up to four different languages in combination, predominantly in Latin or Japanese. I believe, if I'm not mistaken, that every text contains some Latin script and Latin language, and almost all of the texts contain Japanese script as well, but some are predominantly one or the other. So if you can see my screen, here on the left we have a text which is predominantly Japanese. And these languages can then be written in two scripts as well. Right. So.
Japanese, that is, can be written in Japanese script or in Latin script. On the right-hand side, we have a text written in Latin script, but the language is Japanese. So this great mixture of languages and scripts is one potential problem we have. But there are also some opportunities with these texts. Unlike other late medieval and early modern Japanese texts, these are printed using movable type, which means there's a much greater amount of consistency between the characters in comparison to woodblock-printed texts. It also means there's actually a limited number of characters and variations, which is not the case in other Japanese-language texts.

Myself and others have independently been working on Kirishitan-ban, and we're tending to get similar results. I did two experimental models using texts written in the Japanese language in Latin script, which, as you are likely aware, is going to give good results because it's in Latin script, right? The first model had a 1.29 percent character error rate on the validation set, and the second experimental model had a 2.68 percent character error rate. Other scholars, So Miyagawa and Sophie Nutsula, have made a similar model with a 2.09 percent character error rate. The problem we tend to have with the Latin-script texts is accents, which tend to come out poorly. Working on Japanese-script texts, which I'm experimenting with, the problems tend to be at the early stage of layout, right? It's vertical, and Transkribus does not deal with vertical text well. Maybe someone has a solution for this, but I haven't found it. So we tend to have to do the layout manually. Yes, wonderful. I'm hoping we can talk about some of these shared and unique issues later in our discussions. Thank you very much.

And please keep hearing. Oh, Kate, you want. Yeah, maybe. I would directly want to ask a question to James.
As I speak, can you hear me?

Yes.

OK, great. So I guess the Japanese text you're showing here is top to bottom, right?

Yes. Top to bottom and right to left.

OK, so could you share your experience of how that works with Transkribus? Does it work at all?

Yeah, so the experiments I'm doing with Japanese script are still in quite early stages, but doing any automated layout analysis is basically impossible. It doesn't recognize lines, although these are in quite clear lines. Some other Japanese texts perhaps don't have lines at all, but these are all in very clear vertical lines. It tends to just pick random characters, so you'll have a single character as a baseline there, and the order is often in reverse, because Transkribus wants to work like this, right? So here will be our first sentence rather than here, which is our actual first sentence. But So Miyagawa, who is also working on these sorts of texts (I don't want to share too much of his work because I don't have permission), has told me that a potential workaround for this is actually to flip the images horizontally so that the text would no longer be vertical. Then you can recognize the lines fairly easily. But I've been trying to keep the text vertical, yes.

Yeah. OK, thank you. Thank you. Well, I'm just like a step forward. But maybe then we start the discussion. Well, you can moderate. OK. Yeah, would you like to share your most pressing challenges?

I don't have challenges anymore with the Devanagari script, because that works quite well for our material. The challenge we now face is with the Tamil script, because here we're working together with a project at Heidelberg University at the Centre for Asian and Transcultural Studies, CATS for short. And they're working with printed Tamil texts.
And they have two scripts in them, the modern Tamil and the classical Tamil called, I think, Manipravalam. As I understood it, not all of the Manipravalam characters have a Unicode code point, so that's a bit problematic. They transcribe the Manipravalam in Devanagari. So we're currently training a mixed model with two scripts: Devanagari script for the Manipravalam text and modern Tamil for the modern Tamil text. What's a bit tricky is that you also have the Manipravalam script mixed into the modern Tamil. It's not separated word by word; the two scripts are sometimes mixed within one word. The first model, which we trained on only 20 pages, came out with a character error rate of about 2 percent, which I think is okay given that we only had 20 pages. Currently the colleagues from the project are transcribing more texts, and then we will start improving the model and see how it works. But I'm not sure if this approach, transcribing a script in another Indian or South Asian script, is useful for the whole community. I think it's tailored to this particular project, and I don't know if it is a good approach if you have the community at large in mind.

Well, I think it's very interesting, because even though I'm only working with Tibetan, in the newspapers I'm working with there's also some Chinese. And these Chinese texts are from the 50s, when the simplification of Chinese characters was just beginning. Characters were not simplified all at once, from the traditional character straight to the now standard Unicode-encoded character; there were steps in between. So in our newspapers we find a lot of characters that are not in Unicode, and we have no real way to transcribe them in Transkribus. So it seems a bit related: we're lacking the Unicode characters. And well, at the moment, we're just leaving it out.
So we're not transcribing the Chinese text for that reason. And a similar problem, I think, exists with the scripts Rachel is working with.

Is this on? Yeah. Yeah. So I was actually going to ask you: I think quite a few of you, just from your presentations, are working with scripts where there's not always a Unicode equivalent. What approaches have you taken? Xaver just mentioned that he's omitting some of those that are tricky, which is I guess a short-term solution, but maybe not a permanent one. So I was wondering if anyone has developed an approach to this, because we haven't yet; we're right at the beginning of our project in Paris. I was hoping to see what other people are doing and have some things to think about when I go back.

I know from another project in Germany, OCR-D, that they worked with German Fraktur scripts, which are also tricky, and they developed best practice guides for transcribing. If I remember correctly, they transcribe characters that don't have a Unicode code point in ASCII code. But then, of course, you cannot train the models with that. It might work when you correct the text that comes out of Transkribus: you might then replace the omitted characters with the ASCII code, if that works for your scripts. I don't know, but that might be a solution.

Yeah, something I could look into. Shihua, you're the same, right? Do you have other elements of your scripts where there's no Unicode equivalent?

Actually, that's very rare, maybe only on one out of a hundred pages, so it's literally just manually corrected. So I have no useful approach for now.

My texts have some characters which don't exist in Unicode. If I show the slide again, down here we have a character which consists of a D, a long S and a tilde combined. So it's a sort of ligature, but it acts like a Japanese kanji: it has a reading, like a Japanese character would have a reading. It's not a phonetic sound, it's a word.
And there's no way to type these into a computer. So as part of this wider project, we developed a font which would allow these special characters. There are about 30 of them, and they appear fairly regularly; on this page, just quickly looking, I can see three or four. But there's no possible way to put this new font into Transkribus at the moment. There are some competing standards in the field, but usually people put these special characters into some sort of Latin equivalent. So since this is a D, a long S and a tilde, people tend to put it as a D and an S in transcriptions, and that's what I've been doing. Sorry, I'll stop sharing again.

Yeah, thank you. Just to go back to Unicode: I've been working with Chinese HTR, not in Transkribus but with eScriptorium, and with the decision that was made there about characters. There are a lot of character variants; some of them are available in Unicode, some of them are not. The decision we made for consistency was to transcribe with the character that is closest in meaning, not in visual similarity. Then, as a second stage in the future, when someone develops a font or gets those characters into Unicode, you would just replace those places where you transcribed the character not in the right way, but in the best way that was decided at that point in time.

And then just a small comment on vertical segmentation in Transkribus. I know that Emma Stanford, from one of the libraries in Stanford, created a training for vertical text for Chinese, but the next stage is the transcription. To my knowledge, that doesn't exist yet for Chinese in Transkribus, but I think the idea there was to segment each character as a text region and then create a baseline for it.
So it's a bit painful, but that's a possible approach for a language which has so many characters and can be written vertically.

Thank you very much for that.

Can I add a small suggestion, just for consideration? Maybe with the new field models you can try to create a sort of column, so to say, and that might also help.

Yeah, in my newspapers I tried to train this to recognize vertical Chinese, and it sometimes kind of works, so it finds two or three lines. In other cases it just totally fails. So again, adding more data could possibly help, even though I don't think it's very promising.

You only need to combine potentially different segmentation models because it's just about a third of the things that potentially in terms of collaborating which is really other pages that other people submitted could be not even in the language that they want on that, but potentially be helpful with more other cases. I'm not sure if it's something that you should just want to really be able to name out the check that they had. So I don't know how much they can play out.

Have you heard that online? The suggestion was that, in order to train a model that is capable of detecting the layout of vertical Chinese or vertical Japanese or maybe even Mongolian, one should try to train a layout model that works for all sorts of vertical scripts: get training data from other projects and then combine that.

Yes, maybe. I guess that would apply to you as well, Alexander. Obviously it's a different script because it's horizontal, but your layout, with that gap in the middle, is quite a standard layout for lots of other languages. So working with other projects and sharing data might be a step in the right direction. Just a thought.
Recently it seems to be a bit better, but what I noticed when I was training the models a couple of years ago and last year was that it detected a layout very similar to what you would expect in a European newspaper with multiple columns, because there are holes in the middle. There are lines which are divided by gaps where, for example, there would be a string hole to keep the folios together, or where they would put an illustration in more ornate manuscripts. But the lines continue across those gaps. So yes, perhaps training a layout model which is able to deal with just that, and then sharing it, would be the best way forward.

What I realized from the discussion yesterday and also today is that there are two major issues when it comes to Asian languages in general. The first one is, as we already discussed, the vertical lines. How to solve this has been a discussion for years, I think. It should be more of a priority to finally find a solution, because we face it so often that texts are written from top to bottom, and still there is no good solution for the layout, which is where everything starts. And then it comes to the transcription. Maybe it starts from making small baselines for each character, and somehow down in the transcription it is all represented as one line. This can be realized somehow, but I'm not a programmer, so I don't know how it works. I think the Transkribus team might take a note that a layout analysis from top to bottom, or a layout model if you want, should be a top priority. It should really be more of a concern, because it affects so many Asian languages. The second issue, which I came across in your presentation, Xaver, and also today from Nicole on Devanagari, is that vowels are not recognized properly.
If there's a small addition above or below the character, the vowel is not recognized. I think this is an issue across the Asian languages that should be worked on; these small things should be given more consideration. I don't know how it's technically possible to be more precise in this case. But to sum up, for me personally these two problems are the ones we can suggest to Transkribus as top priorities, because they affect most of the projects on Asian languages, I think.

Yeah, absolutely. Top priority, I think yes. There's actually one issue I am encountering. In the slides Rachel shared with the Tibetan, you can see there are gaps in between; they are not separating sentences, they are just a sort of pause or a writing convention. You also have white space in Tibetan between text and numbers, for instance. And it seems that when you run the automatic transcription, Transkribus replaces any white space with a single space. So in my transcription, I end up losing the information that is in this longer gap. And since white space is white space, there's also no simple way to replace it. I don't know why this occurs, because I checked my transcriptions and also the XML output, and everywhere in the ground truth the tabs are still there. They disappear in the automatic transcription. So this is creating many, many problems downstream now.

To respond to that, I guess it's possible to create yourself some sort of convention for how to handle the white spaces that you want to preserve. For example, you decide that a certain character is representative of the space, and then later on you can do whatever you want with that.

Well, that means that we now have to go back and do that in the ground truth, of course.
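The placeholder convention suggested here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a feature of the platform: the marker character (the open box, U+2423) and both function names are hypothetical choices, and the sketch assumes that the marker never occurs in the source texts and that long gaps were keyed in the ground truth as tabs or as runs of two or more spaces.

```python
import re

# Hypothetical marker for a meaningful gap. Any character that never
# occurs in the corpus would do; U+2423 OPEN BOX is only an example.
GAP_MARK = "\u2423"

# A run of whitespace containing at least one tab, or two or more spaces.
_GAP = re.compile(r"[ \t]*\t[ \t]*| {2,}")


def encode_gaps(text: str) -> str:
    """Replace long gaps with a visible marker before training, so that
    whitespace collapsing can no longer erase them."""
    if GAP_MARK in text:
        raise ValueError("marker already occurs in the text; pick another")
    return _GAP.sub(GAP_MARK, text)


def decode_gaps(text: str) -> str:
    """Turn the marker back into a tab after automatic transcription."""
    return text.replace(GAP_MARK, "\t")
```

Run once over the existing ground truth, `encode_gaps` would avoid re-keying every page by hand; `decode_gaps` is then applied to the recognized output. Single spaces are left untouched, so ordinary word spacing is not affected.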
Yeah, I know.

And I heard rumors that there are other characters that are simplified, sort of different dashes into one single dash. So it would maybe also be a good thing for Transkribus to let us know which characters can actually be transcribed now.

I do have a question. It follows up on what Anemiko was asking before, and it's an open question: has anyone tried to use field models for recognition and segmentation? They are quite a bit more advanced than P2PaLA, and they might deal well with these issues of holes and spaces, where we still want to express that they are there. We don't want to lose that information. So I'm just interested whether anyone has tried to use field models for this type of material or this type of text, stressing that it's still in beta, so it might be off the radar for some people.

More questions or suggestions?

I have a question for other people, if possible. One of the things we tend to face with Japanese texts, not only with Transkribus models but with all OCR platforms, is the existence of interlinear glosses. They can be pretty heavy in Japanese texts, either telling us how to read characters in terms of their pronunciation, or telling us the order of reading characters, or things like this. The general solution so far has been that we just ignore anything that is in between lines. I don't have a solution, but I wondered if other people have similar features in their texts. Is there a lot of this extra textual material? And what are you doing with it? This is my question.

Maybe I'll go first. Yeah, so in Tibetan texts, you often get interlinear glosses, and with TibSchol we decided to actually ignore them. The layout recognition was the first step. It often ignored them, because they're so closely packed between the two lines of text that it just assumed they were part of either the line above or the line beneath.
So it would kind of warp the shape of the baseline. That would have needed a lot of manual correction and retraining to try and get a baseline model that would recognize those as separate. And then when we did manually draw them out and tested the HTR, they were often too small, or there was too much noise going on, for them to be accurately transcribed. So for now, we just decided to ignore those and only mark up the glosses that were in the margins of the text. So not much help, but yeah, that was our experience. It shows that our fields are doing the same thing, basically. Thank you. Did anyone else have interlinear glosses? As I mentioned, we have this phenomenon in the lithographically printed texts when, for example, a word has been omitted and is written between the lines. What we do with this is just draw another line and then have this single word recognized. But the output, of course, is unsatisfactory, because when someone uses this recognized text, the person won't know where to place this single word. But it's still there and it's searchable. And in our presentation, users can compare the image with the OCR text, and then they might be able to identify where the word from this one single line appears in the document. But it's not an ideal solution. Sorry, I just want to say, I remember that there used to be a page on the Transkribus website on transcription conventions. And right now when I click on it, it just goes to the help center. So maybe it's just not working. OK, it's in there. OK, I tried to search the help center; I couldn't find it. Anyway, this could be a good place to record decisions that are made by people like us to deal with Asian languages. And then that could be the go-to place to look for, perhaps, best practice: things that we together decided are how we would like to do it. And that could also help with the issue that was raised here about the ground truth.
If people use different schemas or different conventions, then it becomes useless for other projects. If they follow the same guidelines, then one could use the same ground truth to improve and create models. Just out of interest, is this something that anyone could submit to the conventions? So is it open? I see Miriam nodding. Yeah, please do, just submit it to us, and we can add it to the help center pages or something. Great, even if it's really big? So if it's a project's set of conventions, I mean, we can add links and stuff like that. Okay, perfect. Great. Thank you. And there's also Zenodo, where there is a community that you can link to, which can link to Transkribus. So there are solutions. So, final words. Well, given the time, I would very much like to thank the speakers for the discussion on Asian languages, and I can just say there were quite a few people from the team here, so we did hear you. So thank you very, very much, and warm applause for you.