Yeah, as I said, welcome to panel two of this afternoon's program. Today we are going to hear some scholarship presentations, and we are really proud and excited about this program. This is the program through which we, as a cooperative, try to support early career researchers, teachers, and everybody who helps people learn about handwritten text recognition technology, and Transkribus in particular. So let's see what some of these wonderful people around the world have been up to. The title of this session is "From non-Latin characters to layout analysis to research starts as impact: learning by experience and sharing solutions". As you know, the co-op is all about sharing, so let's see what the speakers of this session have to share. First up is Constanța Burlacu from the University of Oxford, with her talk "Old Romanian: transcription, transliteration or interpretation?"

Yeah, I'm not sure how much of a solution I'll offer, but I would definitely like to offer some questions. Let me just get going so that everything is up and running. Can you hear me all right? Yeah? OK, wonderful. I shall start by saying that Old Romanian is dated from the very beginning of the 16th century, which is when we have our first texts, to roughly the end of the 18th century. (Like this. Yeah, the disadvantage of being a tall person.) Some letters, such as the t with a comma (ț) that you can see in my name, which stands for [ts], are, for example, present in the Cyrillic alphabet.
For the Latin alphabet, instead, letters had to be adapted to the linguistic needs of Romanian. Now, another introductory word, on transcription, transliteration and interpretive phonetic transcription. Transcription we all know: that's what we do with Transkribus. Transliteration, instead, is the act of changing the same content of a text from one script to another; in my case, that means taking the original Cyrillic script and adapting it to the Latin one. Interpretive phonetic transcription is something we find in Romanian scholarship. I am only half quoting here, because I translated and interpreted the definition myself: it is the practice of transliterating a piece of text written in Cyrillic letters into Latin letters by applying orthographic rules derived from the history of Romanian phonology and/or current spelling rules. This is a little bit problematic, and I'll tell you in more detail why. So I wonder whether "interpretive phonetic transcription" is even an adequate label for it.

Again, my personal needs were philological. I wanted to work on two early translated biblical books: the Psalter and the Apostolos, that is, the Acts of the Apostles and the Epistles. At the very beginning, I tried to understand what the original source was, and I realized that it was a Church Slavonic source. Then I tried to draw the stemma codicum, to understand how the textual witnesses that we have inherited are linked, if at all, to one another. Because the work was mainly lexical, I decided not to transliterate the sources, which sets me somewhat apart from Romanian scholarship, and to keep the original Cyrillic script. Just a couple of names, because they will come up in the models that I then trained. An important point: my sources were manuscripts and printed books, and everything happens in the 16th century.
They were monolingual and bilingual: Romanian in Cyrillic script, but also Church Slavonic interlinear with Romanian. So not one column next to another, but literally a piece of Slavonic text followed by its Romanian translation. Again, some names, so that your ear gets used to them. The Hurmuzaki Psalter is the most important one, and it will also be my case study. There are the Voroneț Codex and the Șcheia Psalter. The Bratul Codex is a bilingual one. And Coresi is the big name for typography in the Romanian history of the book.

OK, let's get started with the models, and how I got into all of this. First of all, I've been quite lucky, because for Church Slavonic there are some very good publicly available models, provided by Achim Rabus from the University of Freiburg, who will actually be speaking tomorrow. So that was a good starting point; at least I was not completely scared. But because Transkribus also learns linguistic features, those models would only work on a fraction of my data, namely the Church Slavonic part of the bilingual manuscripts. Although the characters and the script are similar, the linguistic data per se is different, because we are dealing with another language. So I needed to train my own models.

This is, schematically, what I've been up to. We already know this: it's a cyclical process. You start transcribing, you create a small model, then you keep on transcribing, then another small model, and so on. For me it has been like an accordion: I would have a small model based on a single source, and then I would combine models, and so on and so forth. And again, because some sources are monolingual and others bilingual, some printed and others handwritten, the outcome has varied. The best model is the first one, which is based on printed sources, and I must admit that I didn't even need that many word tokens for it.
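The error rates quoted in this talk are presumably character error rates (CER), the measure Transkribus reports for its text recognition models. CER is the number of character insertions, deletions and substitutions needed to turn the model's output into the reference transcription, divided by the length of the reference. The Python sketch below only illustrates that definition; the function names are illustrative and not part of any Transkribus API.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # prev[j] holds the edit distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# One misread letter (и recognised as н) in an 8-character word.
print(f"{cer('псалтирь', 'псалтнрь'):.3f}")  # 0.125
```

Read this way, a CER below 5% means fewer than one wrong character in every 20.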
With fewer than 4,000 words, that printed-source model is already below a 5% error rate. So that was good. (Oh no, sorry, that slide was not supposed to be there.) Second half of the presentation: the case study. I then wanted to address this question of transcribing, transliterating, interpreting. What do we do? How can I use Transkribus for this? My case study has been the Hurmuzaki Psalter. It's the earliest source of all: linguistically a chaos, and very interesting. So the first step was to get as good a model as possible for recognizing the Cyrillic characters. I transcribed the whole manuscript, some 200 pages, checked it all, and then created a model. That model reaches a 5.5% error rate, so that's basically as good as it gets: I don't have many more folios for this manuscript.

I then devised a table of correspondences between the Cyrillic and the Latin characters which, I must say, is not perfect; you can see it here. Then I transliterated the data automatically, fed it into Transkribus and created another model. This model has an error rate of 7.3%, so not great, but a good starting point. There is something called Protea where you basically put in the correspondence chart that you see on the left, bring in your file or text, and it transliterates it automatically. If you are into programming, you can do that as well; I'm not, so I decided to do it this way.

Now, the problem is that some letters are difficult to transliterate because of the way they are interpreted phonetically in Romanian. For example, the first one, к, is usually transliterated as c, and phonetically it is [k]. Things get difficult (and a little difficult to explain in English) when к is followed by the front vowels e or i.
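For readers who do want to script the automatic, table-driven step described above, here is a minimal Python sketch. It is not the tool the speaker used, and the table is a hypothetical, lowercase-only subset (her own table is not reproduced in this transcript); the placeholder symbols chosen for the yers are likewise an assumption. Characters missing from the table pass through unchanged. Note that a strictly one-to-one table maps к to c in every context.

```python
# Hypothetical subset of a one-to-one Cyrillic-to-Latin correspondence
# table (lowercase only); the speaker's actual table is not reproduced.
TABLE = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "j", "з": "z", "и": "i", "к": "c", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t",
    "ф": "f", "х": "h", "ц": "ț", "ш": "ș",
    # Yers get placeholder symbols (assumed here) so the mapping stays
    # one-to-one; their phonetic value is deliberately left undecided.
    "ъ": "ŭ", "ь": "ĭ",
}

# str.maketrans accepts a dict mapping single characters to strings.
_TRANSLATION = str.maketrans(TABLE)


def transliterate(text: str) -> str:
    """Diplomatic transliteration: replace each character with its table
    entry, leaving characters that are not in the table unchanged."""
    return text.translate(_TRANSLATION)


# Illustrative inputs, not quotations from the Hurmuzaki Psalter.
print(transliterate("оки"))    # oci  (к becomes c regardless of context)
print(transliterate("домнъ"))  # domnŭ
```

The output "oci" is exactly the kind of result a modern Romanian reader would pronounce with [tʃ] rather than [k].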
In modern Romanian, [k] before e or i is written as c followed by h, to keep the same sound as in English "key". If you write c followed by i, or c followed by e, it reads as in modern Italian ci and ce, [tʃi] and [tʃe]: phonetically a completely different phenomenon, which needs interpreting. When it comes to the vowels, it's even more difficult. For example, you can see these letters here, which in Slavic studies are called yers. They even have three different possible interpretations, for instance the schwa you have in English "bottle", or no phonetic value at all. So what do we do with that? In my earlier chart, I simply gave them an arbitrary symbol from Slavic studies, so that I have a one-to-one correspondence, but those phenomena need interpreting.

Yes, two minutes. Here on the right is the outcome of simply doing a diplomatic transliteration. I must confess that to a modern Romanian this would look a little bit incomprehensible. And then there is the interpreted transliteration, or what Romanians would call interpretive phonetic transcription. Now, I have not yet found a way to make Transkribus turn the one into the other, and I wonder whether that is possible at all. Open floor for discussion.

I will skip this and go on to my conclusions. Based on my PhD experience, I think that the current state of affairs when it comes to textual editions of Old Romanian is very chaotic. Do these approaches apply to making critical editions, especially this practice of interpreting certain letters? I don't think that we should completely disregard what has been done so far. It is important for us to interpret certain linguistic phenomena, but I do think that before getting to that step, we need the raw data, and we need to give scholars access to that data.
And that's what I would probably like to do in the future: make digital editions, if possible. Here's a little bit of a bibliography, with an exclamation mark that was not supposed to be there, but let's say it shows my enthusiasm for you listening to me. And I'm open to whatever questions you have.
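The interpretive step left open for discussion above could, at least in part, be expressed as context-sensitive rules layered on top of the Cyrillic data. The Python sketch below is one conceivable approach under stated assumptions. It is not the speaker's method and not a Transkribus feature. The mapping is a hypothetical subset, and the only rule implemented is the one from the talk, к before е or и rendered as ch. Yers are flagged for a human editor rather than resolved, since their reading depends on interpretation.

```python
import re

# Hypothetical subset of a Cyrillic-to-Latin table, for illustration only.
BASE = {
    "а": "a", "д": "d", "е": "e", "и": "i", "к": "c", "м": "m",
    "н": "n", "о": "o", "р": "r", "т": "t", "ц": "ț",
}

# Rule from the talk: к before a front vowel is written "ch" in modern
# Romanian spelling; plain "c" there would be read as in Italian ce/ci.
K_BEFORE_FRONT_VOWEL = re.compile(r"к(?=[еи])")

# Yers (ъ, ь) have more than one possible reading (e.g. ă, or silent),
# so this sketch marks them instead of choosing a value.
YERS = re.compile(r"[ъь]")


def interpret(cyrillic: str) -> tuple[str, list[int]]:
    """Return a tentative interpreted Latin rendering of `cyrillic`, plus
    the indices (in the Cyrillic input) of yers left for manual review.
    The input string itself is not modified."""
    flagged = [m.start() for m in YERS.finditer(cyrillic)]
    text = K_BEFORE_FRONT_VOWEL.sub("ch", cyrillic)  # inserts Latin "ch"
    text = YERS.sub("[?]", text)  # visible marker for the editor
    # Map the remaining Cyrillic letters; Latin letters and markers pass through.
    return "".join(BASE.get(ch, ch) for ch in text), flagged


# Illustrative inputs, not quotations from the Hurmuzaki Psalter.
print(interpret("оки"))    # ('ochi', [])
print(interpret("домнъ"))  # ('domn[?]', [4])
```

Because the function never alters its input and reports what it could not decide, the raw diplomatic data stays available alongside any interpretation, which is in line with the talk's conclusion that scholars need access to the raw data first.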