Okay, I think we're all technically set now as well. So let's move on. We have quite a few presentations coming up within the next one and a half hours. We will hear about how to transcribe Greek manuscripts with a single HTR model, about the challenges of recognizing indexed script, and about how smart transcribers can get. And we start directly with Elpida Perdiki from the Democritus University of Thrace. The stage is yours. You can use either this microphone or that one, if you want to move. And this is the presenter; just click on that.

Thank you. Well, good morning. As a friend told me last night, I'm the Greek girl, so I'm here to give you some insight into how Transkribus behaves with Greek manuscripts. Well, I don't know how familiar you are with Greek manuscripts, or manuscripts at all, so let me first tell you that to this day there is a plethora of unedited or under-edited works of Greek literature, ancient or Byzantine, especially the latter, due to the complexity of producing critical editions. To critically edit a literary text, scholars need to pinpoint textual variations across several manuscripts, which presupposes entirely, or at least partially, transcribed manuscripts. If such is the case for a significant manuscript tradition, for example a large number of manuscripts transmitting the same work, that process can be a painstaking and time-consuming project. To that end, HTR algorithms that train AI models can be of great assistance, even when they do not result in fully accurate transcriptions. However, deep learning models require a quantity of data to be effective, and this in turn intensifies the same problem: big transcribed data requires heavy loads of manual transcription as training sets. In the absence of such transcriptions, this study experiments with training data of diverse sizes to determine the minimum amount of manual transcription needed to produce functioning results.

Like any research question, this study was formed out of a problem observation and shaped into a solution methodology. It was basically a three-part question. Number one, transcription and collation of manuscripts, the main aspects of manuscript tradition research, have proved to be time-consuming and, as such, demand great amounts of economic and human resources. But apart from that, let's consider the hundreds of manuscripts transmitting some famous texts, for example the roughly 800 manuscripts of Homer, the almost five and a half thousand of the New Testament, or the around 23,000 of John Chrysostom's opera. The factors mentioned above could increase exponentially. Secondly, the massive digitization process going on in libraries and archives all around the world right now, which has brought many hidden treasures of the past into the research spotlight, seems equally problematic. Some digitizations, if not most of them, are merely digital reflections of the manuscript, or digitized manuscripts; in other words, we can view the documents, but we cannot edit them. Last but not least, should we exploit relevant technology to solve the problem of the multitude of manuscripts, we must first determine the optimal balance between the minimum manually generated data input and the maximum accuracy of the output transcription. To make a long story short: how can we transcribe big manuscript data rapidly, massively, and with minimal human resources?
Well, our potential solution could be converting the raw data, the manuscript image, into digitally manageable data, the text digitization, which can best be done with the HTR method. Since not every scholar is tech-savvy, the aim was to make the best use of an already existing HTR software, one that includes all the necessary, state-of-the-art technologies: image processing and segmentation, document management, text search, text markup under the TEI protocol maybe, perhaps users' real-time collaboration, and of course all of that in a friendly environment. Would you like to guess what that software is? Well, the bibliography suggested the AI-driven Transkribus platform, built here at the University of Innsbruck, for which we are deeply grateful. So, Transkribus was put to the test.

In order to meet the experiment's criteria and test Transkribus's limits, a group of manuscripts was selected to serve as a case study. The test sample was created from manuscripts transmitting John Chrysostom's homilies, for two main reasons. Number one, John Chrysostom is one of the most famous writers of late antiquity and Byzantium, and his opera runs up to around 23,000 manuscripts, including liturgical texts and indirect tradition. That is a portion of text equal to almost half a million words, and that is indeed a big-data research question. Second of all, about 3,000 of those manuscripts are known for the double-recension phenomenon, simply meaning that we have two manuscript groups transmitting two versions of the same text, the original one and a curated one. The double recension is part of the second phase of my research, which is the classification and auto-collation of those thousands of manuscripts.

Now, with a representative sample of 11 disparate manuscripts, dated from the 10th to the 14th century and transmitting Chrysostom's first homily on St. Paul's epistle to Titus, a transcription of 29,228 words, experiments on Transkribus began with a complete diplomatic transcription of those manuscripts and the training of HTR models evaluated on CER, the character error rate. Well, Transkribus's documentation clearly states that a transcription of at least 15,000 words is needed for successful model training. Yet early testing of that sort proved that most erroneous HTR results concerned accents or the splitting of word tokens, both due to the scriptio continua form of the manuscripts; that means that the words are written too close to one another. As a result, a maximum threshold of 20% CER was decided on for a model to be deemed adequately accurate. That would result in an 80% accurate transcription, which could then easily be corrected or normalized via string-matching or n-gram-matching algorithms. Thus, to minimize manual effort and heavy data demands, experiments were conducted with an additional data augmentation process, leading to the manipulation of the AI models and improved outcomes.

Experiments were conducted using four methods. The first and most straightforward was to train one model per manuscript. 24 models were trained from eight manuscripts with a decreasing number of words: transcription inputs of around 3,000, 2,000 and finally 1,000 words, rounded up to the next page, with only 50 epochs of training. Apart from a few low-image-quality manuscripts, most models performed below the 20% threshold. The 3,000- and 2,000-word models produced an optimum of 10 to 15% CER, as you can see, but even the 1,000-word models were not that bad. That proved quite positive.
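(To make the CER threshold above concrete for readers of this transcript: character error rate is conventionally computed as the edit distance between the HTR output and the ground-truth transcription, divided by the ground-truth length. A minimal Python sketch of that computation follows; it is illustrative only, since Transkribus reports CER internally during training and validation.)

```python
# Minimal character-error-rate (CER) sketch: edit distance over ground-truth length.
# Illustrative only; not the evaluation code used in the experiments described here.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(prediction: str, ground_truth: str) -> float:
    """CER = edit distance / ground-truth length (0.20 was the ceiling above)."""
    return levenshtein(prediction, ground_truth) / max(len(ground_truth), 1)

# Hypothetical example: an HTR output that only misses the Greek diacritics.
print(cer("ο λογος του", "ὁ λόγος τοῦ"))  # ~0.27, accents dominate the errors
```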
So the second method was applying every single model cyclically to all 11 manuscripts, hoping to read multiple palaeographically similar hands with only one model. For these experiments, the 3,000-word inputs were used, and that process was an early attempt at clustering manuscripts, only without a special algorithm. That testing resulted in 90 text recognitions and two main conclusions. Number one, only nine out of the 90 combinations resulted in a CER lower than the 20% threshold; in most cases, the models were highly unsuccessful. Number two, those nine perfect matches were not apparent in advance as palaeographically similar writings. As a result, clustering algorithms that could predict which combinations might produce satisfactory results seem necessary.

That being said, the few successful results inspired the third set of experiments: combinations of mixed models trained on more than one manuscript. That data augmentation method was chosen to enlarge the data set without demanding more manual input. The process led to the two following experiments. First, mixed models were trained out of the nine best matches from the second method's training, and those models recognized the manuscripts' text with a CER lower than 20%, within the limits, that is. In addition, an HTR model was trained from all ten manuscripts, to achieve an optimal model capable of transcribing any of our Greek manuscripts. Almost 9,000 words produced a model that, applied to all manuscripts, resulted in top-end CER results, down to 4.48% CER.

The fourth method was all about validating those results under the same methodology, with the same manuscripts, but on a different data set, this time a transcription of the fifth homily of John Chrysostom. The experiments were run once again as a validation process. Now, eight manuscript transcriptions of almost 3,000 words each, and up to 250 training epochs, produced even better HTR models. Moreover, the general uniformity of the scripts, despite their peculiar differences, once again showed the promise of building a master model, one to rule them all. By taking advantage of the most accurate model, the 3,000-word one in that case, as a base model, a new master model was trained from the eight manuscripts. The final CER on that last training set was only 0.6%, the minimum CER of all conducted experiments.

Well, in light of the above, from the aforementioned four methods I can say that numbers one, three and four led to fruitful text transcription results, while the second one seems more challenging to implement successfully, being more complicated without extensive testing of possible script matches. However, even that method highlighted palaeographical questions, thus paving the way for further research. To offer you just a glimpse, these are some instances of manuscripts that, A, sorry, yeah, here it is, were matched as similar scripts by the HTR, which is not apparent to the naked eye. I don't know if you find them similar; it wasn't so for me. B, despite a particular resemblance in writing style, the model's performance was poor, and that was the case for the majority of the mixed models. And C, a peculiarity of Greek manuscripts: some of them are written not on a baseline but from a hanging line. Although the automatic layout process, as you can see, accurately recognized the hanging-line traces, there was an attempt to adjust the unusual manuscript to a normal baseline.
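(Again for readers: the cyclic cross-application in the second method amounts to filling a matrix of CER scores, one per model-manuscript pair, and keeping the cells under the 20% ceiling as candidate "script matches" that a clustering algorithm would ideally predict in advance. A hypothetical sketch under that assumption; the pair names and scores below are placeholders, not the experiment's real numbers.)

```python
# Hypothetical sketch of method two: score every (model, manuscript) pair and
# keep the cross-manuscript pairs under the 20% CER ceiling. In practice the
# scores would come from running each single-manuscript model on each test set.

CER_THRESHOLD = 0.20

cer_scores = {            # (model_ms, target_ms) -> CER, illustrative values
    ("M1", "M1"): 0.11, ("M1", "M2"): 0.45,
    ("M2", "M1"): 0.52, ("M2", "M2"): 0.13,
    ("M2", "M3"): 0.18,  # a non-obvious match, invisible to the naked eye
}

matches = sorted(
    pair for pair, score in cer_scores.items()
    if score <= CER_THRESHOLD and pair[0] != pair[1]  # skip self-recognition
)
print(matches)  # the pairs a clustering step should have predicted beforehand
```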
And as a result, the manuscript was turned upside down, because the HTR was not guided to expect such an occasion. There were two main factors that affected HTR performance. The first one was the deliberately low amount of transcription data, and the second one seemed to be the image quality. The latter is unfortunately beyond our control, unless standard and strict digitization protocols are applied in every digitization project. On the contrary, data augmentation, as a means of manipulating neural networks, might overcome the lack of data. Nevertheless, as shown in the last method's experiments, the multiplication of training data alone does not suffice. Data augmentation is best used as a process of model fine-tuning and enrichment of preexisting knowledge, as in any form of training, after all.

Now, to sum up, this case study proposed a solution for researching or editing authors and works that were popular enough to survive in hundreds, if not thousands, of manuscripts, and are therefore unfeasible to evaluate by hand. Through innovations like the Transkribus platform, successful HTR applications are no longer wishful thinking. Using the appropriate methodology, automatic transcription can enhance and accelerate philological research and bring to light archival treasures of the past. Big data are indeed crucial to machine learning projects, but when properly handled, less can be more. Thank you.

Yeah, thank you very much for these insights into your research. Questions? Yes, no, maybe, I don't know. No questions? Yeah, we've got one at the front. Please just use the microphone, because as we're a hybrid conference, the people on the screens also want to hear.

Yeah, it's a very practical question. When it comes to accents, which it seems was quite a hard part to teach to your models, like the segmentation: when you do the baseline, do you usually include that with the... because sometimes, in my case, the accents are outside the picture, basically, so I need to expand the baseline, which looks only at the...

I had quite a similar problem with ligatures that were superscripted. I can't give you a straight answer. I've experimented with both: I tried to direct the baseline to follow the script, and I've tried to just automate the layout process, and both worked fine. So I can't give you a straight answer on that.

But it's not that you built, like, two different lines; it was a continuation. Sorry? It's not that you built two different lines; it was a continuation of the same line. I only built two different lines when I had marginalia. Otherwise, I thought it would be practical to continue the baseline, because it's the same line. Maybe the model could be trained to understand that sometimes the script is not on the proper baseline but could be travelling all around.

Any more questions? We have one question online from, excuse my pronunciation, Fardzin Shakebaneyat. The question is: has there been an approach to do handwriting comparative analysis using these models you explain? Sorry, I couldn't hear you well. Has there been an approach to do handwriting comparative analysis using the models you explain? I'm deeply sorry, there's an echo and I cannot understand your words. Okay, may I just say it without the microphone once? Maybe that helps. No. Simple answer to that. Right, any more questions? Another one online? Perfect. Could you please comment on the way you calculate the learning breakpoint of the fine-tuned models?
And please do it with the microphone once more, so they can hear online. Sure. The learning progress, you said? The breakpoint of the fine-tuned models during the process. I'm sorry, the breakpoint of the fine-tuned models, is that what you're saying? Could you please comment on the way you calculate the learning breakpoint of the fine-tuned models? Okay. Well, if I understand correctly, I combined all manuscripts, all my data set, I trained the model, and I used pages from all manuscripts as validation. If that's what... That's the question. That's not mine. Okay.

Right, we've got another question here. If you could hand over the microphone, that would be cool. Thank you, Elpida, for a really, really good talk. I thought it was a kind of curious result that the machine thought that certain scripts were similar and others were dissimilar. I wonder what Greek palaeographers would make of the machine's different classification of what counts as similar scripts and different scripts. It's not really a question; I'm just curious, because I'm not a Greek manuscripts person myself.

Well, one thing I can say about this is that to me this is a huge debate, because when it's up to palaeographers, it's more like a personal opinion, grounded in experience and knowledge of course, but nowadays there are algorithms that can do stylometry, I think it's called, and can predict to the point every detail of the script. So I'm not sure; maybe humans could evaluate the algorithms, but AI could be better at that process. Or it's proposing some kind of alternative classification. Yeah, that's better: it shows things that we maybe couldn't imagine, or are biased not to see. Indeed. I think that's really interesting. Me too. Thank you.

I've got one more question, I think, and then another one online. Thank you. I wanted to ask you a question, a little bit to reflect on the idea of image quality, the digitization quality of manuscripts, and to ask if you would agree, possibly, with the idea that what we think is bad digitization quality is not what the computer thinks is bad. That's true; that's a smart observation. You know, I believe that Transkribus already does a preprocessing of the images. Correct me if I'm wrong, but I think there is a process of sort of augmenting the images before the recognition process starts. Now, I don't have a straight answer, but from the experiments I have done, I have seen that manuscripts with low image quality were not producing good results. That's the only thing I can say for the moment, but I do intend to experiment a little more on the matter with a bigger dataset.

Okay, let's try one more time. One question from Achim Rabus; he will join us later virtually. Could you please tell us more about your master model? How many GT tokens, ground truth tokens, did you use? Do you intend to make the model publicly available? Well, the dataset was greater than the previous ones, so I believe there were around 20,000 words of transcription with all manuscripts combined. And yes, I intend to make it public as soon as possible, for the benefit of the community.

Right, good. Then, if there are no more questions, we end exactly on time, actually, which is nice. So, one more thing. Some of you have already seen them: we've got these Transkribus cups here. There it is, a beautiful one. Yesterday we gave them to the speakers privately; this time we do it on stage. Thank you very much. Better than an Oscar. Thank you. That's a statement.