 Is it working now? Yes, now it is working. OK, sorry, everyone, you haven't missed much. I'm here with Miriam and Sarah, and we are going to do this workshop. We were just talking about the contents. We'll start with a quick introduction. Then we'll talk a bit more about text training and also tagging, because this is quite important for a lot of you. Then we will talk about how to deal with layout in Transkribus, how you can train your own baseline models, and what to know there. Then we will talk about the field and table models feature, which is still in the beta version of Transkribus, but you can already try it and use it there. And then we also quickly want to talk about Transkribus Sites. You can't really see it here now, but we will have a questions and hands-on session at the end as well. So let's see if this works now. Yes, OK. Just a very quick introduction, because, as I said, I think you are all quite advanced users of Transkribus anyway. What is Transkribus in general? We try to give you an AI-powered ally for the very time-consuming and laborious tasks that you will probably all know when you're working with historical documents, and with digitized historical documents as well. We always have this slide as an introduction to what Transkribus can do, and today we are mostly talking about the training of the different AI models, so text training, fields and tables, and baseline models, and also about tagging, that is, tagging of the content or the structure of your documents. And just very quickly in the beginning as well, because we are always talking about AI and machine learning and so on.
So just to quickly talk about it: machine learning, which the training in Transkribus is based on, enables machines and computers to learn from either labeled or unlabeled data, identify patterns in this data, and make predictions on new data with as little human intervention as possible. The AI models that we're always talking about are then basically the result: they are algorithms created during the training process, the output of the training, and they represent the knowledge acquired during that process. And so we already come to one of the most important slides, I would say, in this workshop, or in general when you're working with Transkribus, because we're always talking about ground truth, or ground truth material or data, and we want to quickly show what we mean by that. The ground truth data consists of labeled data for the training, which enables the model to identify these patterns and make predictions for the labels on new data. You could say that the ground truth material is all the pages that have been transcribed manually. This ground truth consists of the training set and the validation set. The training set is the set of examples that is used to adjust the parameters of the model, so basically the data on which the knowledge in the neural network is built during the training process. And then you also need a validation set, a set of examples to assess the performance of the model, which is done during the training process. So the validation set is the set of pages on which the model tests its accuracy. And I don't know if we can do this any better. Maybe. No, I don't think it really helps, because here we can't really see the whole slide.
Anyway, we have some information here on what we would recommend a good validation set consists of: at least 10% of the training data. And it's very important, if you're building a bigger model which should work with more hands or more script types, that all of these hands and types, all of these examples, are in the validation set as well, because only then can the training make a good assessment of the performance. Next, we're going to talk about the training of the models. We have four different training features available already. What you probably all know is that you can train a text recognition model, which you would use to recognize the text. You can also train a baseline model to recognize the lines in your documents. Thank you, Flo, that was what I was looking for; now we can all see the slides better. And then there are two newer training features that, as I already said, you can test in the beta version, to train a model for field or table recognition, which is used to recognize text regions. So you can train models to recognize text, individual lines, and text regions or fields and tables. We will start with the training of text recognition models, and here we also have some recommendations to begin with. Before the training of a model, as I just said, you need the ground truth material: about 25 to 75 pages of already transcribed material to start training a model. We would recommend having at least this amount of pages; you can start earlier, but the result will probably not be very good. Okay, that's fine. Let's just see what we are hearing here. I'm not sure. Sorry. Yeah, okay. So you can already start with fewer pages, and this also depends on your document type.
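The 10% validation recommendation above can be made concrete with a small sketch. This is not part of Transkribus itself (there you pick the split in the training configuration); `split_ground_truth` is a hypothetical helper showing one way to hold out roughly 10% of the transcribed pages, shuffled so that different hands and script types have a chance to land in the validation set:

```python
import random

def split_ground_truth(pages, val_fraction=0.10, seed=42):
    """Split transcribed pages into a training and a validation set.

    Shuffling before splitting makes it more likely that every hand or
    script type in the material also ends up in the validation set.
    """
    pages = list(pages)
    random.Random(seed).shuffle(pages)
    n_val = max(1, round(len(pages) * val_fraction))
    return pages[n_val:], pages[:n_val]

# 50 transcribed pages -> 45 for training, 5 (10%) for validation
train, val = split_ground_truth([f"page_{i:03d}" for i in range(1, 51)])
print(len(train), len(val))  # 45 5
```

In practice you would also check by eye that each hand is represented on both sides of the split, which a blind random shuffle cannot guarantee for very small sets.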
If you're working with printed material, you can start with fewer pages and will probably not need as much training data, but for handwritten material we would recommend starting with about this amount of pages. To start training the model, you have two options. You can either start completely from scratch and transcribe all the pages that you want to use in the training manually, and then you have the ground truth ready to go. Or you can apply a public model that has been trained on a similar script already; if you get suitable results, this might help you increase the training data in less time. If you have applied a model, you then just correct the transcription manually. For the first option, we have the basic information here and will quickly go through it. You select the pages that you want to include in the ground truth set. Then we would recommend running the layout recognition separately, and of course you have to check whether the result is already good for your documents or whether you need to adjust it. In the first option you then transcribe all the text from scratch. If you can't read some words, or they are illegible for some reason, you can tag them with the tag unclear or gap; we'll come to that in a minute. What is also quite good to know is that any lines that are left blank, so where you don't put in any text but still have the baseline, aren't considered in the training. And for abbreviations, we will talk about this in more detail, but you basically have three options: you can maintain the abbreviations as they are in the text, you can resolve them, that is, write out the expansion, or you can tag them with an abbreviation tag. What you decide depends a bit on your material and what you want to do with the data.
So, what you expect as your final output; but we will talk about abbreviations in a minute. We also always recommend saving all the pages that you want to use for training with the status Ground Truth, so that you know these are done and can be used for training. It doesn't actually do anything; it's just for your own orientation. In the second option, when you are applying a model, you can maybe also make use of a supermodel, for example the Text Titan I: you apply it and then correct the automatic transcription. So, the same thing: you select the pages that you want in the ground truth set. You don't have to run the layout recognition separately; you can just run the text recognition, then correct the automatic transcriptions, and again save all the pages as ground truth. When you are done with this ground truth data set, you can start with the text training in the Transkribus interface. We will be focusing on the web interface and only talk about the expert client if a feature is not available in the web app at the moment. So you go to the models section in the web app, click on train a new model, and select text recognition model. You also have to select the collection with your ground truth transcriptions and can choose the pages. In the training configuration, you select the pages for the training, so all the pages on which the model is actually trained. Then you have to select the validation set, and as I already mentioned, the validation set should be about 10 percent of the training set and contain all the different examples that you then want to use the model to recognize. And then there are some advanced options.
You can have an overview of the advanced options here, and we'll just go through them now. You can use a base model, so any public model that has been trained on similar material, if you want to try to increase your training data in that way. If you use a base model, the training doesn't have to start completely from scratch, but will also take into account what was already learned during the training of the base model. This can help, but we always recommend testing it, so maybe start one training with a base model and one without, because it is very hard to predict whether it's really going to help much or not; it always depends a lot on what the base model was actually trained on. Also a disclaimer here: you can't use the text supermodels as base models. The Text Titan I, for example, you cannot use; it's just too big. The same goes for the German Giant, and I think the Dutch one as well; the Transformer-based models you definitely cannot use, because they are too big. But it says so in the description of the model anyway, just as a heads-up. Then we have some other advanced options in the training configuration. You can adjust the number of epochs, or training cycles. We recommend, for the first training, or if you don't want to worry about this at all, to just leave the default of 100 training cycles, and only adjust this if you have read a bit more about it in the Help Center, for example, or already know what epochs are. Basically, the number of epochs is the maximum number of times that the model goes through the entire training set. You can adjust this number, but we would recommend leaving it as it is; only if you know that you have to adjust it, or the training doesn't work at all, can this be an option.
Then there's also the early stopping option. The early stopping value is the minimum number of training cycles, and the default value in the training configuration is 20. So if after 20 epochs the CER of the validation set doesn't go down anymore, the training will be stopped. You can adjust this for your training if you feel you need to, but here too we would recommend keeping the value of 20 for now. Then we have some more advanced options that we will quickly go through, because some of them we will also discuss a bit later. You can select the option to reverse the text for right-to-left scripts. At the moment you have to do this if the text was written from right to left, so if you have a right-to-left script, maybe an Arabic or Hebrew text: in the image or scan it is right to left, but in the text editor it was transcribed left to right. Then you select this option so that the output text will be shown from right to left. You can also select the option to use existing line polygons, but we only recommend this if you have adjusted the polygons manually in the Transkribus expert client; we will talk about these polygons a bit later. I already said that you can tag abbreviations in your text, and you can also train them: in the advanced options of the training configuration, you select the option to train the abbreviations with expansions. That trains the model so that it automatically tags the abbreviations and also adds the expansions that you put in as an attribute or property. We will talk about abbreviations in more detail. And then you also have the option, I already mentioned this, to omit lines by the tags unclear or gap.
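The interplay of the two numbers just described, 100 maximum epochs and an early-stopping value of 20, can be sketched as a training loop. This illustrates the general early-stopping idea, not Transkribus's actual implementation; `run_epoch` is a hypothetical stand-in for one training pass that returns the validation-set CER:

```python
def train_with_early_stopping(run_epoch, max_epochs=100, patience=20):
    """Stop training once the validation CER has not improved for
    `patience` consecutive epochs, or after `max_epochs` at the latest."""
    best_cer, best_epoch = float("inf"), 0
    for epoch in range(1, max_epochs + 1):
        cer = run_epoch(epoch)
        if cer < best_cer:
            best_cer, best_epoch = cer, epoch  # new best model so far
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop early
    return best_epoch, best_cer

# Simulated curve: CER falls until epoch 9, then stays flat,
# so training stops 20 epochs later instead of running all 100.
best_epoch, best_cer = train_with_early_stopping(
    lambda e: max(0.05, 0.5 - 0.05 * e))
print(best_epoch, best_cer)  # 9 0.05
```

The design choice behind the default of 20 is simply to avoid wasting compute once the validation curve has plateaued, while tolerating short stretches of no improvement.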
So if you have tagged any words or phrases that you can't read, or that you don't want included in the training for some reason, you can select this option and those lines will be omitted during the training. But keep in mind: the whole line will be omitted, so not only the word but the whole baseline. This is also good to know. Then we can have a look at some examples. When the training is finished, this is what it looks like in the Transkribus web app. You can check out the model's details, the character error rate, and the learning curve. Here, for example, the character error rate of this model is 1.5 percent, so it's quite a good model, and the learning curve also looks OK. Then here we have a small table with some recommendations to keep in mind. As I already mentioned in the beginning, if you have printed text, you can start the training with a lot fewer pages: from 25 pages in the training set you can already achieve a character error rate between 2 and 0.5 percent. That is quite good and quite easy with printed material. Then, of course, it gets more complicated the more complicated your data is. If you have handwriting in one single hand that isn't that hard to read, then with about 50 or more pages you can expect a character error rate between 4 and 2 percent. And if you have several hands, but they are all seen during the training, so they are all in the training data and the validation data, then with about 150 or more pages you can get to a character error rate of 6 to 4 percent.
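For reference, the character error rate behind all these numbers is simply the character-level edit distance between the model's output and the ground truth, divided by the length of the ground truth. A minimal sketch (not Transkribus code):

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    return levenshtein(reference, hypothesis) / len(reference)

# One wrong character out of twelve -> CER of about 8.3%
print(round(cer("ground truth", "ground trvth"), 3))  # 0.083
```

So a CER of 1.5 percent means that, on average, between one and two characters per hundred need correcting.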
And then, of course, if you have many hands in the training data, maybe from the same period and region, but not all of them were seen during the training, the character error rate will be a lot higher, and you will have to put a lot more pages into the training data. For hands or documents that were not seen in the training at all, or notes that are very scribbled and hard to read, you will of course get much worse results in the text output. But you can always add more training data to improve a model, and if you can, for example, double the amount of training data, you can expect roughly a 20 to 25 percent decrease in the error rate. So if you can manage that, the model should improve quite a lot. And we have already heard that existing models can be used as base models, as starting points, to reduce the amount of new data that you have to put in; but as I said, always test this, because it's not always as good as it sounds. Then we wanted to show you some examples of public text recognition models, so you can get a better idea. We have this public Dutch model for handwritten data, trained by the Utrecht Archives, and you can see that with a training set of 178 pages and a validation set of 20 pages they already achieved a character error rate of 3.10 percent. Quite impressive, and the learning curve looks good too. Really not that many pages. Of course, this is a model for just one hand, this Margarita Turner who wrote letters, but still, 178 pages is really not that much ground truth data to prepare. And then we have a print model, which is also public: an Irish and Irish Gaelic model.
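The doubling rule of thumb mentioned above is easy to work out. Assuming, purely for illustration, that each doubling of the training data cuts the CER by the midpoint value of 22.5 percent:

```python
def projected_cer(current_cer, doublings, rel_improvement=0.225):
    """Project the CER after repeatedly doubling the training data,
    assuming each doubling cuts the error rate by 20-25 percent
    (here the midpoint, 22.5%), per the rule of thumb above."""
    for _ in range(doublings):
        current_cer *= (1 - rel_improvement)
    return current_cer

# e.g. a model at 8% CER with 75 pages -> roughly 6.2% with 150 pages
print(round(projected_cer(8.0, 1), 2))  # 6.2
```

This is only a back-of-the-envelope projection; real gains flatten out as the model approaches the limits of the material's legibility.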
It can recognize Gaelic and Roman type, which I'm not going to pronounce because it will be wrong anyway. It was trained by Gerard Farrell; they put 243 pages into the training set and only three pages into the validation set, but still achieved quite a good result. Of course, these were printed documents, but it's very impressive, because they have this Irish Gaelic print in there but also a Roman type, so two different types of print, scripts or fonts, and still a very nice result. So that's it for the part on the training of text recognition models; let's move on to tagging. You can use two types of tags in Transkribus. You can use structural tagging to mark up the structure of your documents, and you can see some examples here: at the top of the page the heading was tagged, and then also a paragraph, the page number, and a marginalia tag was added. You do this in the document editor, in the layout section: directly on the page, you select the shapes, right-click, and add the structural tag. In the configuration part of the document editor, if you go to layout, you can manage your structure types, which are the tags for structural tagging: you can add more, delete them, change the visibility of the tags you want to use, and so on. This is quite straightforward. We will also hear a bit more about structural tagging in the fields and tables part of this workshop. What we want to talk about now in a bit more detail is textual tags. You can use textual tags to mark up your transcription and add attributes; for example, here the word Austria was tagged with the textual tag place.
And then you could also put in a Wikidata ID, or add a country and place name attribute. Textual tagging is done in the text editor: just select the word or words, click on the relevant tag, and then add a property if you want. To manage the textual tags, you again go to the configuration section, where you can edit these in the collection settings: add tags, modify the attributes, and so on. Now let's look at an example, see if this works here, and maybe also try it. We have an example letter here; the text is already recognized. And I know that there is, let's see if we can increase this a little bit, yes, a section down here where there is an abbreviation. This would be William here, and I can just select it. Oh yes, this is good to show: you first have to enable this tag setting here if you want to use the textual tags. If I now select it, I'm shown the textual tags that I have chosen here in the text settings. You can add more if you want to use them regularly, or delete these. For me this is fine; I can just use the abbreviation tag, and now I can write the expansion of the abbreviation. And then, we cannot see this again now, but we should have the abbreviation tag. Can we see it now? No. OK, let's see, I will just do this again here, even though this is not an abbreviation, just so we can see what it looks like. Now we have the abbreviation tag, and that's basically as easy as it is. Then let's continue. OK, so I wanted to show how to add abbreviations and also the expansions, because now I want to talk a little more about how to deal with abbreviations in Transkribus, or in your training data. It really depends on what you will then do with the output text, but you have three options.
You can just keep the abbreviated form in the transcription and transcribe the abbreviations as they appear in the documents; then this is also what you get in the output text. Or you can transcribe the expanded form: like we just did, if there is Wm, we could just transcribe William anyway. The neural networks are quite often able to learn these expansions during the training process and also output them in the text, especially if they appear frequently. So if you have some abbreviations that appear very often and you just want the expansions to be recognized, this would be the option to go for: you simply write the expansion of the abbreviation in the transcript. Of course, it is very important that you then consistently resolve them in this way and do not use other expansions, because that would just confuse the model. And then there is the third option that I already mentioned: you can tag and also train abbreviations, including their expansions. You tag the abbreviation in the text and add the corresponding expansion, like I just did, in the expansion property. Then, when you're training the model, you select in the advanced settings of the configuration the option to train the abbreviation tag, including the expansions. Just to summarize what you then get: with the first option, you will receive the abbreviations as they are, as you put them into the ground truth data, so the abbreviated forms. For the second option, you will probably still have some abbreviations. Okay, that was not good, that was the wrong button. Here we go. Right. So for the second option, you will receive in the text some abbreviations and some of the expansions that you trained the model to recognize. And for the third option, now again we'll have to select this, for the third option, in the output text,
you basically have two options again. When you export, you can select this in the export options: you can have only the abbreviations as you tagged them, you can get the abbreviations followed by their expansions, or you can select to substitute them with the expansions in the export settings. So you have a lot of options with abbreviations, and it makes sense to really decide what you want to have in the output data before you start the ground truth production. Then let's continue. Here again we just have the overview of the training configuration, showing where to find the train abbreviations with expansions option in the advanced settings. And then I also wanted to show some models where we can see what this looks like. We have this public model from the University of Toronto; they trained the model to resolve the abbreviations in medieval manuscripts, with 330 pages in their training set and 30 in their validation set. And now let's look at the example. Seems like this is not working as expected; let me check if I can show it like that. Currently not; it doesn't want to show me the page, which is not ideal, but I don't really know what the issue is. Here we go, just reloading. Reloading helps sometimes. So let's increase the image here. I don't know how good you are with medieval manuscripts, and I don't know a lot about the abbreviations used there, but you can see, for example, here this omu with its abbreviation mark, which was automatically resolved to hominum. This is just the page recognized with the model. And also here, for example, sua, which became suam, and the following word again. So they have a very nice model for medieval manuscripts, which also resolves the abbreviations automatically. Then let's go to the next one.
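To make the three export choices concrete: in the PAGE XML that Transkribus exports, textual tags are stored in a custom attribute on each line. The attribute syntax below is reproduced from memory and should be treated as a sketch, and the helper functions and the bracketed expansion formatting are hypothetical illustrations, not the expert client's actual output:

```python
import re

# Example "custom" attribute in the Transkribus PAGE XML style (sketch):
custom = "readingOrder {index:4;} abbrev {offset:12; length:2; expansion:William;}"
line_text = "Yours truly Wm Smith"

def parse_abbrevs(custom_attr):
    """Extract abbreviation tags (offset, length, expansion) from a custom attribute."""
    tags = []
    for m in re.finditer(r"abbrev\s*\{([^}]*)\}", custom_attr):
        props = {}
        for part in m.group(1).split(";"):
            if ":" in part:
                key, value = part.split(":", 1)
                props[key.strip()] = value.strip()
        tags.append(props)
    return tags

def export_line(text, abbrevs, mode):
    """Apply one of the three export choices: 'keep' the abbreviation,
    show the abbreviation followed by its 'expansion', or 'substitute' it."""
    # Work right to left so earlier offsets stay valid after replacements.
    for tag in sorted(abbrevs, key=lambda t: int(t["offset"]), reverse=True):
        start = int(tag["offset"])
        end = start + int(tag["length"])
        abbr, exp = text[start:end], tag.get("expansion", "")
        if mode == "substitute":
            text = text[:start] + exp + text[end:]
        elif mode == "expansion":
            text = text[:start] + abbr + " (" + exp + ")" + text[end:]
        # mode == "keep": leave the abbreviated form as transcribed
    return text

abbrevs = parse_abbrevs(custom)
print(export_line(line_text, abbrevs, "keep"))        # Yours truly Wm Smith
print(export_line(line_text, abbrevs, "substitute"))  # Yours truly William Smith
```

The point is that the tag carries both forms, so the choice between abbreviation and expansion can be deferred to export time.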
And then there is another model trained on medieval Latin documents from 1520. This one is not public, unfortunately, so I cannot show more; I will only show the one page. They chose to train the model to recognize the abbreviation tag including the expansion property. They also had not that many pages in the training set, 177, and quite a big validation set for that. You can see here that the character error rate is quite high, but this can also be due to the trained tags not always being recognized correctly. So the model works very well on their documents, even though the character error rate doesn't show it as expected. Now let's also look at this example. Hopefully, yes, here we can see it. This is, again, the recognized page, and here we can see all the abbreviations; I probably need to select this again. Yes, and here we have the expansion automatically recognized. I will not go through all the examples now, but this is what you can get if you train the model with the abbreviation tag. This is of course not 100% right, but it's just an example of a model. And in the export options, currently still only in the expert client, you can then select whether you want to keep the tags, so only the abbreviations, whether you want the tags followed by the expansion, or whether you want to substitute the tags with the expansion. So you have some options there as well. And then I'm more or less done with my first part, and I just wanted to talk about training text recognition models for right-to-left scripts in Transkribus. We can show this slide, which we are quite proud of and which is very good news: we already have five public models for different right-to-left scripts in Transkribus that you can test. Although I have to say that there are now two versions of the Ottoman-Turkish print model, so it's really only four different models.
But then there is another model for Yiddish, a Yiddish typeface, and also the Debug model for different Yiddish handwritten documents. And then there's another mix of historic Hebrew scripts and languages. So we can see that it definitely is possible to train text recognition models for right-to-left scripts in Transkribus, but you have to use some workarounds, or you have to know how to do it, basically, and this is what I'm going to talk about now. We know that the workflow you have to follow at the moment is far from perfect, but this is what it looks like right now. It would be good to run the layout recognition separately, or you can of course also mark up the layout manually yourself, so that you have text regions and baselines. Then, as I already mentioned with this advanced option, you should transcribe the text from left to right in the text editor. You have to get used to the fact that you cannot transcribe the text in its correct orientation, but have to do it the other way around for the ground truth data. Then, in the training configuration, you have to select in the advanced settings the option to reverse the text for right-to-left scripts, so that the output text is then written in the right-to-left direction. We also have an example to show in a minute, but I just wanted to mention an outlook: as I said, we know that working with right-to-left scripts, especially in the web app, is really not very good yet, and we are planning to have right-to-left support in the web app hopefully soon. Unfortunately, I cannot really give any dates for that. We would then also, of course, adapt the training configuration for right-to-left scripts, so that you don't have to use these workarounds. And just very quickly, let's take a look at the example page, if it loads the image.
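The workaround just described, transcribing in the wrong direction and then ticking the reverse option, conceptually amounts to reversing each output line. A toy illustration only; real right-to-left handling also involves Unicode bidirectional rules, which are ignored here:

```python
def reverse_lines_for_rtl(lines):
    """Reverse each line that was transcribed left to right so the
    output reads right to left (a sketch of the training option's effect)."""
    return [line[::-1] for line in lines]

print(reverse_lines_for_rtl(["abc 123"]))  # ['321 cba']
```

This is also why the transcription direction must be consistent across the whole ground truth set: a mix of directions cannot be fixed by a single global reversal.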
Yeah, so just a very small example, but I think it's quite nice to see here: you can see the numbers, and you can see that the text is written from right to left and recognized from right to left correctly with this very nice model. And so, yes, I think that's it from me now. Shall we answer some questions? Yeah, let's answer some questions for sure. I can read them. So, thank you, Miriam. First, just one thing to mention: when you talked about the Gaelic model, one user wrote that another Gaelic model will be released soon, an Irish type model. We trained the Gaelic model again and also a bilingual Irish-English model. Yeah, very nice. And then the other question is from Alexander: when running text recognition, can one provide an external dictionary, not the one from the trained language model? If yes, in which format, and how does one upload it? He can see a list of dictionary files under custom dictionary in the expert client, but doesn't know where they come from and what they should look like. Yeah, actually, I think I saw that question at the support mail already, but I didn't have time to answer it yet. I think, yes, in the expert client this is possible, but in the web app it's definitely not possible; I think it was possible once, but it's not possible anymore at the moment. It would probably be quite hard to explain it now, because I would also need to ask my colleague Schorsch how it is done in detail, because I've never done it myself, but it is possible in the expert client, and I know that there are some questions on that already in the support inbox; if you ask us directly, we can share the workflow with you. I did it for another project, and it's possible only for P2PaLA models, but right now only the developers can do it; it was an option in the expert client, but it's no longer available. So you can just add new names or words to the model.
Yeah, I think if there are more requests, maybe we can take it up as a feature request and talk with the product management. Exactly, if there is the possibility: to add external dictionaries, so not only the language models that are created during the training process; if you have some dictionaries, for example place names or other things that the model usually doesn't recognize very well, it can help to add such external dictionaries. Okay, thank you. Right. Yes, let's move to layout recognition, which is not a funny topic, but it's important. Probably you already know it, but just to repeat: when you click the recognize button, there are in reality two different processes in Transkribus. The first is the layout recognition, which is the identification of the baselines and the text regions, and the second one is the text recognition. Transkribus first needs to know where the lines of text are in the image, because for the computer the image is just a series of pixels. Then, based on the baselines, where the text is, we have the text recognition. So even if you process everything with just one click and just run your text recognition model, in reality there are two processes behind it. The layout recognition is the markup of the document image's layout: as I said, the images need to be divided into text regions and baselines. This is a prerequisite for the recognition, but even if you want to manually transcribe your document to create the ground truth and train a model, you first need to run the layout recognition. There are three pillars of the layout in Transkribus. The first is the baselines. A baseline is a polyline running along the bottom of the written text line. So the blue line that you see here with all the points, this is the baseline; even if you see this shadow around the text, the baseline is just the blue line that runs along the bottom of all the letters.
And it's really important that it stays there, at the bottom of all the characters, and that it starts where the line starts and ends where the line ends. Then we have text regions, which are rectangular shapes that encase the text. On a page, we can have one text region or multiple text regions; it really depends on the layout of the page. When you run the default layout recognition or the default text recognition, the baselines are clustered into text regions based on their coordinates. So often we expect that first the text regions are created and then the baselines are recognized. No, with the default approach in Transkribus it is the other way around: first the baselines are recognized, and after that they are clustered together into text regions based on their coordinates. This is the bottom-up approach: first the baselines, and on top of that, the text regions. We will see that it's also possible to create first the text regions and then the baselines, but we need to use a fields model for that. And then the other elements are the line polygons. If you are a user of Transkribus Expert, you know them very well; in the web app, we introduced them only recently. So with the new editor, it's possible to see the line polygons as well, but until a month ago, it was only possible to see the baselines, so they may be new to you. The line polygons are polygons encasing all of the written text in the line. So while the baseline runs at the bottom of the line, where the characters sit, the polygon comprises the body of the letters and the ascenders and descenders of each letter. So roughly, they should comprise this region. It's important to know that the line polygons are computed by an algorithm starting from the baseline. So first you have your baseline, and then an algorithm automatically computes the polygons. There is also an option to train the polygons.
We will see it later, but usually they are just computed automatically by Transkribus. And it's important to note that the text training and the text recognition happen at the line level. So even if we always say that the most important element to correct is the baseline (when we talk about Transkribus, we always say that you need to correct the baselines and then you can start the transcription and train your model), in the back end, when you train the model, what Transkribus looks at during the training or during the recognition is this light blue region, the line polygon. Usually the algorithm that computes the line polygons is good and works well for most handwriting, so it's enough. But there are some cases where the text recognition isn't good because of the line polygons, and we will look at some examples later. Yeah. So these are the three pillars of the layout: baselines, text regions and polygons. And each of these pillars can affect the text recognition. So often we don't get a good text recognition because the model isn't very good, but it can also be that the model is good and we have a good character error rate, and the problem is the layout recognition. And there are different problems. So now we are looking at the problems, and then we will see how to solve them. The first problem regards the baselines: we can have inaccurate baselines. You see, when I had this newspaper here, I started the text recognition with the print model, and this was the result. So obviously the text recognition isn't good, because the lines aren't accurate, aren't correct. It could happen that you have too few or too many baselines, and these affect your text recognition. The second problem could be inaccurate text regions: you have too few text regions or too many text regions. Or, because you don't have the right text regions, this could affect the reading order of the lines.
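The three layout pillars just described (baselines, text regions, line polygons) all live in the PAGE XML that Transkribus exports. As a minimal sketch, assuming the standard PAGE 2013 schema and a hand-written example fragment, the standard library is enough to pull each line's baseline and polygon out of the file:

```python
import xml.etree.ElementTree as ET

# Tiny hand-written PAGE-style fragment: one text region containing one
# text line with its polygon (Coords) and its baseline (Baseline).
PAGE_XML = """<?xml version="1.0"?>
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="scan_001.jpg" imageWidth="1200" imageHeight="1800">
    <TextRegion id="r1">
      <Coords points="100,100 1100,100 1100,300 100,300"/>
      <TextLine id="r1l1">
        <Coords points="100,150 1100,150 1100,220 100,220"/>
        <Baseline points="100,210 1100,210"/>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""

NS = {"p": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

def parse_points(points_attr):
    """Turn a PAGE 'points' string like '100,210 1100,210' into (x, y) tuples."""
    return [tuple(map(int, p.split(","))) for p in points_attr.split()]

root = ET.fromstring(PAGE_XML)
for line in root.iterfind(".//p:TextLine", NS):
    baseline = parse_points(line.find("p:Baseline", NS).get("points"))
    polygon = parse_points(line.find("p:Coords", NS).get("points"))
    print(line.get("id"), "baseline:", baseline, "polygon points:", len(polygon))
```

Real exports contain many regions and lines, plus reading-order and structure-tag information in `custom` attributes, but the nesting (region, then line, then Coords and Baseline) is the same.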
In this case, you see there is a page with two columns of text and a header at the top. For us, as humans, it's quite simple to understand how the lines are grouped. But because the baselines are clustered by their coordinates, for Transkribus these baselines are very close to each other, so it creates just one big text region. And this affects the reading order, because probably I will have this first line as the first line of my text, then this one, and after that the reading order will jump to the first line of the second column. And then we can have inaccurate polygons. This is another problem: even if the baselines are correct, the model isn't able to transcribe the text properly. If you are in this case, so you have a good text recognition model and the baselines are perfect, but still you don't get a good transcription, the problem could be the polygons. Now let's look at the individual problems and how we can fix them. Inaccurate baselines: let's look at some examples. Here we have the back of a letter with some information, and the address written vertically instead of horizontally. And here on the right you see the default recognition with Text Titan. Text Titan should be able to read this text; I mean, it's English, it's not so difficult. The problem is that it doesn't recognize where the lines are. The first thing that you can do is to use a different baseline model. It's possible to select a different baseline model; there are 12 public models trained on baselines. So they're not trained on text, they're trained on the baselines. And you can also train your own baseline model, and we will see later how to do it. We usually recommend trying one of these three models: universal lines, mixed line orientation, and horizontal line orientation. It's really a trial-and-error process.
So our advice is to create a sample, test the models and the parameters on the sample, and then find the best solution for your documents. Also at Transkribus we often need to do tests to see which is the best baseline model for each type of document. So in this case I know that horizontal line orientation, despite the name (you can read the description), is trained to recognize horizontal and vertical text, while the mixed line orientation is trained to recognize lines in all directions. In this case, we have just vertical lines, so I will go for the horizontal line orientation. And then you can open this section, and here we have a lot of advanced parameters that we can use. I will briefly explain all the parameters. The first is whether you want to generate new text regions or keep the existing text regions. In our case, we want to generate new text regions; you use the other option if you are working with fields or tables. Then the text region method: here you can decide if you want to use just the general algorithm to cluster baselines into text regions, or if you want to customize it and create just one big text region, or few, medium, or many. If you know that you have just one text region on the page, I would go for this option here, for instance. And then we have the text baseline orientation. This helps Transkribus to know how to cluster the baselines into text regions. In our case, the orientation is mixed, because we have vertical lines. So I would try with these settings, and then we can look at the others with another example. So I just selected a different model, the horizontal line orientation, and I told Transkribus that we have mixed baselines. And now we can start the recognition and look at the results. By default, when you start a text recognition or a default layout recognition, you are using the mixed line orientation.
When you start a text recognition, Transkribus uses the mixed line orientation model with the default parameters. Let's see if we are lucky and we can see the result. I already did it yesterday, so trust me, this is the result with the parameters that I showed you. And when I ran Text Titan on this document, obviously the result is much better, because now the lines are properly recognized. Let's see another example; in this case, let's look at the newspaper. This is just to show you how the different parameters can give us different results. Yes, this is the result with the newspaper when I ran the print model. And let's see: I go to layout, I want to change the parameters to analyze this newspaper. I select mixed line orientation, and under advanced settings, I want to keep the text regions; the lines are horizontal. And now we arrive at this setting, image scaling. If the image has poor quality, or if you have many lines on an image and very thin lines, like in the case of a newspaper, it helps to upscale the image. And then we have all these baseline options. The minimal baseline length indicates the minimum length of the baselines, and it's measured in pixels. So if it's set to medium, it means that Transkribus will disregard all the baselines shorter than 25 pixels. So if you notice that Transkribus isn't recognizing all your baselines because they are too short, it would make sense to set it to low. Especially in tables this is really helpful, because we have numbers or dates and the baselines are really short, so it helps to reduce the minimal baseline length. In our case, with the newspaper, I'm sure that we have longer baselines, so I will set it to high to avoid too-short baselines that could annoy me during the recognition. So let's set it to high. Then we go to the baseline accuracy threshold, which indicates the threshold set by Transkribus to recognize the baselines.
We have seen that when it's set to low or medium, you get better results. So if you notice that Transkribus isn't recognizing all the lines, try to set it to low. Based on my experience with newspapers, it's usually helpful to set it to medium or high, but that is just for newspapers. Then, use trained separators: this parameter and the next one help to merge or distinguish baselines. So if you notice that two baselines that should be separated are merged together because they are quite close to each other, try setting use trained separators to sometimes and decreasing the maximum distance for merging baselines. In our case, because we have a newspaper and the columns are really close to each other, I expect that the baselines would tend to be merged together even if they belong to different columns. So using these parameters, you can prevent this from happening. Conversely, if you have very distant baselines that should be merged together, you can set the maximum distance for merging baselines to high. And now we can start our recognition. It could take a bit, because when you upscale the image, it takes longer, but I think I did it yesterday. Oh no, this is another one. No, let's wait. Let's wait for the result, and it should give us all the lines, or most of the lines, in the right way. We can go back to it later. So we have already seen how to use the parameters. And sometimes it could be that using a different public baseline model or working with the parameters isn't enough. When you have tried to work with all the parameters but you don't get a good result, you need to train a custom baseline model. I will just show you some examples. This is one: here we have vertical lines mixed with horizontal lines, and because they are very close to each other, even if you try the horizontal or the mixed public model, it doesn't work.
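The minimal baseline length and the maximum distance for merging baselines described above boil down to simple geometric filters over polylines. This is a sketch of that idea, not Transkribus code; only the 25 px "medium" value comes from the explanation above, the other thresholds and the same-row check are illustrative simplifications.

```python
import math

def length(baseline):
    """Length of a polyline of (x, y) points, in pixels."""
    return sum(math.dist(a, b) for a, b in zip(baseline, baseline[1:]))

def drop_short(baselines, min_len=25):
    """The 'minimal baseline length' filter: at the 'medium' setting,
    baselines shorter than 25 px are disregarded."""
    return [b for b in baselines if length(b) >= min_len]

def merge_close(baselines, max_gap=40):
    """Merge horizontally adjacent baselines when the gap between them is
    below max_gap, mimicking the 'maximum distance for merging baselines'.
    max_gap=40 is an illustrative value; the same-row test here is an
    exact y-equality check, which is a simplification."""
    merged = []
    for b in sorted(baselines, key=lambda b: (b[0][1], b[0][0])):
        if merged and merged[-1][-1][1] == b[0][1] \
                and b[0][0] - merged[-1][-1][0] < max_gap:
            merged[-1] = merged[-1] + b      # extend the previous baseline
        else:
            merged.append(list(b))
    return merged

row = [[(0, 100), (200, 100)], [(225, 100), (400, 100)],   # 25 px gap: merged
       [(600, 100), (800, 100)]]                           # 200 px gap: kept apart
print(len(merge_close(row)))                        # → 2
print(len(drop_short([[(0, 0), (18, 0)]] + row)))   # → 3 (the 18 px mark is dropped)
```

Setting the merging distance low keeps newspaper columns apart (small gaps stay separate lines would be wrong to merge); setting it high joins fragments of one long line, exactly the trade-off described in the talk.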
So in this case, if you have multiple pages with this layout, I would recommend training a baseline model and training it to recognize both orientations. Another example (let's go back to the collection) could be when we have bleed-through, like in this case, or when the document is damaged: you can train a baseline model to recognize the lines as you want. In this case, to recognize only the lines on this page and not the ones from the back of the page, and also to merge them as you want. So here you would want to have only one longer baseline. Another example could be if you have marginalia with different orientations. I uploaded this manuscript, for instance; no public model we have, even if you work with the parameters, will be able to recognize this properly. So you need to train a baseline model for that. So if you have a very unusual layout, you need to train a baseline model. Another option could also be if you want to avoid certain information. So let me see here. The layout here, for instance, is quite simple, but maybe you don't want to have this note here in red, or you don't want this name here recognized, or this page number. So you can train a baseline model not to recognize them: just draw the baselines that you want to recognize, and the model will learn to avoid all the others. To train a baseline model, what you have to do is create at least 50 pages of ground truth. So let's take this example. You can start the layout recognition and then correct it: you delete the lines you are not interested in, you modify the others, and you create your ground truth. You don't need to have the text; the corrected baselines are enough. And after that, you can train your model. It's very similar to the text recognition models: you go there and click on train a new model.
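Before any training, the ground truth is divided into a training set and a validation set; the recommendation in this workshop is to dedicate about 10 percent of the pages to validation. A minimal sketch of such a reproducible split (page names and the seed are illustrative):

```python
import random

def split_ground_truth(pages, validation_fraction=0.1, seed=42):
    """Shuffle pages reproducibly and set ~10% aside for validation,
    making sure at least one page always lands in the validation set."""
    pages = list(pages)
    random.Random(seed).shuffle(pages)
    n_val = max(1, round(len(pages) * validation_fraction))
    return pages[n_val:], pages[:n_val]   # (training set, validation set)

pages = [f"page_{i:03d}" for i in range(1, 51)]   # 50 ground-truth pages
train, val = split_ground_truth(pages)
print(len(train), len(val))  # → 45 5
```

The shuffle matters: taking the last pages of a document as validation can make the model look better or worse than it is, because consecutive pages often share a hand and a layout.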
You select the training data, then the validation data, and we always recommend dedicating about 10 percent of your training data to the validation set. Then there is the model setup with the advanced settings; we recommend sticking with the defaults. Baseline models are set to run for 100 training cycles, and usually that is enough. And then you can start your training. When you then go to the layout recognition, you can select your own baseline model: you will see all the public models and also your trained models. Now let's go to inaccurate text regions. So what to do when the text regions aren't recognized as we expect? Here we have two examples: the one with the two columns, and this other one here, where we have an index card and three text regions that don't make much sense. I already told you about the bottom-up approach, which is the default one: first we have the baselines, then they are clustered together into text regions. You can modify this approach with the advanced parameters: you can specify if you want to have just one big text region, and also the text baseline orientation, as I showed you before. And then there is the other approach, the top-down approach: first we recognize the text regions, and then we recognize the baselines inside the text regions. And we do this with the fields models. So let's move to field models. They are available on beta, so you can go to beta.transkribus.eu and access it with the same credentials you use on Transkribus. The goal of field models is to extract just the information that we want. So in this case we have this card here, and I don't want to transcribe all the text there, because I don't need all the text that is on this document. I just want to extract the name, the place of birth, and the year of birth. And because the information is structured, we can use a field model to extract it. Yeah.
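Once a field model and the text recognition have produced tagged regions like the name, place of birth and year of birth above, the downstream payoff is tabular data. A sketch of that last step, one CSV row per page and one column per tag, in the spirit of the structure-tag export to a spreadsheet mentioned later; the tag names and the sample data here are hypothetical:

```python
import csv, io

# Hypothetical output of a field model plus text recognition:
# each page yields (structure_tag, recognised_text) pairs.
pages = {
    "card_001.jpg": [("name", "Anna Maier"), ("place_of_birth", "Innsbruck"),
                     ("year_of_birth", "1874")],
    "card_002.jpg": [("name", "Josef Huber"), ("year_of_birth", "1881")],
}

FIELDS = ["name", "place_of_birth", "year_of_birth"]

def to_csv(pages, fields):
    """One row per page, one column per tag; fields the model did not
    find on a page are left empty."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["file"] + fields)
    writer.writeheader()
    for filename, regions in pages.items():
        writer.writerow({"file": filename, **dict(regions)})
    return out.getvalue()

print(to_csv(pages, FIELDS))
```

Keeping the column order fixed in `FIELDS`, rather than taking whatever order the tags appear in on each page, is what makes the spreadsheet consistent across pages.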
And we can also assign structure tags to these regions and train the assignment of tags. Yeah. We have already seen an application of a field model during the plenary session this morning. They are very helpful for text regions, but also in the case of newspapers, forms, and columns. To start with a field model, you first need to prepare at least 50 pages of training data, and preparing the training data is very easy. So let's go there. Let's see if it's recognized. Yeah, this is the result, which is better. Probably not always perfect, like here at the top there are too many lines, but I mean, the main text is there. Yes. No, in this case the reading order wouldn't be correct, because it's only one big text region. For this I would first use a field model to recognize the text regions and then apply a baseline model. I mean, it always depends on your project: if you're looking for names and you just want to process thousands of newspaper pages and find a specific name or place, the reading order isn't so important, you just want to find the information. But if the reading order is relevant for you, you need to first use a field model, or manually draw the text regions in advance, which is also an option but more time-consuming. So you see here it jumps all over the page. Let's take an example. Here I have this index card, also after the application of the process with the regions. I don't have a handy example of the problem here, but you just select the regions, and then it will fix the reading order. So what you need to do in the case of fields models is just draw the regions that you are interested in. Oh no, sorry. Yeah, let's go back. Yeah, okay. I'm sorry, it's not my laptop. And then you can tag them with the tags you want. If you're not interested in some information, you can just avoid creating a text region around it.
And then you can assign the tags here, the ones that are relevant to you, and down there you can always manage your structure tags, just like the textual tags. And I just want to show you an example. Once you have created the ground truth for around 50 pages (it really depends on the complexity of your documents and how many tags you want to use), you can train. Let's say in this case, I think I trained this model on 30 pages, so less than the recommended amount, and we just have five tags, and still the results are quite good. So this is a page recognized with the field model, sorry, and it works quite well. I mean, here, okay, it misses the date here; probably increasing the ground truth would give me better results. But yes, still, I'm quite happy with what it can do. And I think there is also a poster this afternoon from Adi from the British Library, where she compared the results of P2PaLA and the field models. P2PaLA was the old segmentation tool we had in the Expert Client. So if you want to look at this, there is a poster on that. Yeah, so this is what you can do, and yes, then you run the baselines. So let's see: you create the ground truth, then you train the field recognition model, you apply the model as I showed you, then you need to run the layout recognition to get the baselines. In this case, remember to untick the option find new text regions, because you want to keep the existing text regions. Then you can run the text recognition, and in the end, you can correct or export it. In the Expert Client, there is the option to export the structure tags as an Excel file, and I think we will soon introduce it in the web app as well. So you see here, the end result is this spreadsheet, where you have the file name and then a column for each tag. During the export, it assigns the different tags. So the question is about the order of the tags or the regions.
No, so for now it's not possible to teach the model to have the right order. So the order... no, I don't understand. So in this case, even if the reading order is different, so let's go there. I'm not sure if I understood your problem correctly. It could be that the reading order of the regions is different from one page to the other, but when you export it, the order of the columns in the spreadsheet will always be the same, at least within the document. Yeah, in the Expert Client; I can show you later on my laptop, because I don't have the Expert Client on this one. Sorry. Here we have just two examples of fields models: this is one, and this is another one for newspapers. Unfortunately, we don't have a public field model for newspapers at the moment, so you need to train your own, or if there are people interested in newspapers, maybe you can work together on one big newspaper model. Then the other problem could be inaccurate polygons. You see here, in this case, the baselines are fine. So here the line is good, it corresponds to the text, but the transcription is not good; it doesn't match. And if I go and look at the polygons, you see that the problem here is the polygons: because of the music, the Transkribus algorithm is confused. It isn't recognizing the lines; it's trying to recognize the music above the line of text. And you can have the same problem where there are bigger letters or bigger characters. What you can do here is train a field model on polygons. So instead of training a field model on text regions, you can train it on polygons. First, you need to create your ground truth: manually adjust the line polygons to comprise all the text in the line. And then, in the training of the field model, there is an option where you can select train on line polygons. So instead of training on the text regions, you train it on the line polygons.
And this is the result. So you see at the top we have the default line polygon, and at the bottom the trained one. And the recognition is much better with the correct line polygon. I'm trying to speed up, sorry. The last thing that I wanted to show you is tables. The principle is the same as for field models, but here we have columns and rows. You can train a model to recognize columns and rows, so to recognize a table. It's important to note that there are no general table models, so you really need to train a custom model for the tables that you have in your collection or in your document. It's also possible to train a model to recognize different types of tables, like two or three different types, but the technology doesn't allow for a general model covering all the existing tables in historical documents. We have the rows and the columns, and you already saw something similar this morning. And for the ground truth creation here, instead of creating the text regions, you need to draw the table and then split it into rows and columns. This process of creating the ground truth is also quite easy: in two hours, you can create all the ground truth for a table model, at least if you have simple tables. With easy tables like the one in the image here, 20 pages of ground truth are enough. If you have difficult tables, where the columns and the rows are not so regular, you need to increase the pages of ground truth. And if you have a mix of different tables, I would recommend using between 50 and 100 pages of ground truth. The training is then similar to the field model; it's always the same. And this is the result with a table model trained on just 20 pages of ground truth. And here we have the processed pages; it was correct on most of the pages I ran it on. No, you can include it, but it was just my decision not to include it, because the header is always the same, so it's easier to add a row at the top when I export it.
If you recognize it, there is a higher chance that the text won't be correct on all the pages. Yeah, I'm sorry. So the question was whether you can skip this process by using a base model. It's not possible to add a base model; what you can do is just combine the ground truth in one document. Base models can be used for baseline models, but only in the Expert Client, not in the web app. I think it's one of the features we are still missing. So you can train your baseline model on top of the mixed line orientation model, but for fields and tables it's not possible. What you can do is create a bigger document with all your ground truth and train the model on it. But for fields and tables models, it's really important that the ground truth is consistent, so there should be some similarities within your ground truth. You're welcome. So here is how to process all the documents, and here you have just a summary on fields and tables. Compute accuracy, yeah. In Transkribus Expert, it's possible to compute the accuracy for the text recognition and also for the baseline recognition. So you can compare your ground truth, a corrected page, with the HTR page or with the layout analysis. And here is the result. You can compare, for instance, the accuracy of a model using the language model and without the language model, and you can see if the language model improves the accuracy or not. You can measure the accuracy for one page or for a set of pages. And it's also possible to compute the accuracy for the baselines, not just for the text: if you're not sure which parameters are the best ones for your documents, you can create the baseline ground truth and then compute the accuracy on it. Yeah, only in the Expert Client, but as Kerstin told us this morning, this quality control tool will also become available in the web app soon. But for now, yes, it's only in the Expert Client.
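The text-accuracy measure being discussed here is essentially the character error rate: the edit distance between the reference (ground truth) and the hypothesis (HTR output), divided by the length of the reference. A plain-Python sketch, with made-up example strings:

```python
def levenshtein(ref, hyp):
    """Minimum number of character insertions, deletions and
    substitutions needed to turn hyp into ref (dynamic programming,
    keeping only the previous row of the DP table)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: edit distance over reference length."""
    return levenshtein(reference, hypothesis) / len(reference)

# Two substitutions in a 20-character reference: CER = 2/20.
print(cer("historical documents", "histonical documends"))  # → 0.1
```

Word error rate works the same way with the strings split into word lists first; both numbers only mean something when the reference page really is fully correct, which is why a clean test page matters.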
Just so I understand what you're saying: if you want to compute the accuracy, you take one of your ground truth pages, which you have already established is completely correct, and you run it through one model and then through the other model. And then you can select here: there is the option to select the reference text, so your correct text, and the hypothesis, so the HTR text. And you can click on compare text versions, and you end up with this view. Or, if you click on compare, you get the measures. One thing to know is that it would be best to use a ground truth page that you haven't used in the training. If you want a really accurate evaluation, it's better to create a separate test set, because the training set and the validation set have already been seen by the model during the training. So if you really want to be accurate, it should be a completely new page from the document. Yeah, you can also use public models: if you're not sure which public model is better, like Text Titan or the German model, you can compare them here. But with this tool, you always need to have the ground truth, so it takes a bit of time to create the ground truth first. Sorry. Yeah, let's quickly finish this up, because we actually wanted to have a hands-on and question session as well, but we also wanted to show as much as we can. So just some very quick information, because I'm not sure everybody is aware of this. We already heard in the chat today that there will be a new Irish Gaelic model coming soon, so other users are publishing their models for everyone to use in Transkribus. And here we have just listed some reasons why users would publish their models. I mean, it's quite straightforward: they just want to share it, they're proud of it, or they know some people who would like to work with it.
Of course, you can also share models in Transkribus with specific collections, but why not publish them if you can. And you can also publish models without publishing the training data. Of course, a lot of people have to be careful with that and cannot just publish the training data, but that is also possible. And now just very quickly: if you want to publish a model for all Transkribus users, you can just contact us via our support email or via the contact form in the Help Center. There is also a section that says "I want to make my model public", so that we know this is what you want to do. And there are some requirements that we have in place for now, because it's difficult to just publish every single very small model. So we have requirements, but I just want to say: if you have models that are trained on text that we cannot offer yet, or very specific models that you know some people can benefit from, and they are smaller than the training set size of about 50,000 words, or the CER is still a bit higher, we will still look at it and we might still publish it. So this is not set in stone; it's just something for us to work with. And then there is some information that is necessary for us to publish it. We will publish it in Transkribus itself, but we can also publish it on our website if that's what you want. We would need a description of the model, so that others can understand what the model was trained on and what it can be used for. Also, some representative images never hurt, because then people can see it. And of course, we also need to know who should be credited as the creator of the model. This can be one person, several persons, or we can also credit the whole research project, for example. And as I just said, the training data can also be kept private, so that nobody can access it. Or you can share it, and then the training data will be public as well, which might be beneficial for other users too.
And then, just as a bit of a heads-up again: we want to make the sharing or publishing of models a bit easier as well, because right now you have to contact us and then we have to do it for you. We are planning to add Transkribus Connect to the web interface, and within this feature it should also become easier to publish models, without all these requirements. I think I will just go through this very quickly. The Transkribus Sites feature: I will probably not show it in the interface, but we have prepared some slides. Yeah, so you can see here what it looks like in the interface: you have the Desk and Models sections, and then Sites, and you can just access Sites there. Of course, it's important to mention that you cannot use Transkribus Sites with the individual subscription; you can start using it with the Scholar subscription or higher. There are different tiers for how many pages will be available for Transkribus Sites, or for publication, but this is all listed very well on our website. And so, yeah, there are different features that you can use with Transkribus Sites. Because of course you want to show what you worked on so hard in Transkribus: all the texts that you recognized, tagged, edited and so on. Transkribus Sites is the perfect way to easily share your material and let others access it. You have this material in a side-by-side view, where you have the original scan and the recognized text, where you can also see tagged instances and so on. And, most importantly, you have some enhanced search capabilities, because that is probably what other people will want to do: search your texts or your transcripts. And it is quite easy, or we wanted to make it as easy as possible, to create such a site, such a publication interface.
So if you are in the Sites section of the web app, just click on create a new site. There you can assign a project title for your Sites interface, you can have a custom URL as well, and you can add all the connected collections, all the collections that you want to show within the Sites interface. And then you basically always have three pages that you can edit: the home page, the about page, and the explore page. You can always edit these pages and see them simultaneously: while you're editing, you can already see what it will look like in the site interface. So this is what the home page would look like. You can see the title, so here this is the Marjorie Fleming Transkribus Site. You can add a brief description, or it can also be longer, and you can put in a background image that you want to share. Then you have this about section, where you can give longer explanations about the project, the content, the team that has collaborated on it, and so on. Here you can have as many sections as you want, basically, and for every section you can edit the heading and the text, and you can also put in an image if you want. And then, for the explore part of the Sites interface, you can configure what users or others can see on the search page: if you want to show tags, if you want to enable browsing the tags, or which tags are enabled in the first place. You can set filters that you want to show, but this is of course based on the metadata that you have added in the Transkribus document: these filters need to come from somewhere, and you can edit them in the metadata options of the documents. And of course there are also some other settings that you can edit. You can add more languages, and there is also the possibility to edit translations, because others might help you with that. There are some privacy settings.
Of course you can customize the theme of your site, so the logo, the colours and so on, and you can add other users with different roles, such as owner or editor, just how you want it. So I just ran through that a little bit. We will have some more detailed explanations and workflows on Transkribus Sites in the Help Center, which we have linked here. We already have a Sites overview there, and there will be some more information coming, also as a video. So hopefully that will clear things up.

Do we have some questions here? Yes, please. Maybe I can just repeat the question so that they can also hear it online. [Question: can existing baselines interfere with field model training?] No, it shouldn't happen. The baselines shouldn't be an issue for the field training. But if you have experienced that, you can send us an email, because it could be a specific bug or a problem with that collection or document, so it's not your fault. It might also be a problem with the images or something like that, so it would be good for us to check this in more detail. We understood that a field is just a region, so that behaviour would just be a mistake. So if you want to train a field model and there are already some baselines in there, this shouldn't be a problem.

Are there any other questions? Yes, I know the feature. It is there in the expert client, and all of us in the team who are working on documents or projects have also said many times that we need this option in the web app as well. So this will definitely come to the web app too; R&D has added it as a pointer, but I'm not sure when it will become possible. I think named entity recognition and large language models are the priorities for R&D this year. We can try to push for this year. And there was another question, yes.
So basically it is up to you whether you publish it or not. I'm not aware of any issues there, and as far as I know you can of course restrict the publication, if you want only specific users to be able to access it; but you can also make it publicly available if that is what you want to do. I think on every Transkribus site there is also a report button, so if any user notices content that shouldn't be there, they can report it to us. Of course there are different restrictions in different countries. We can also ask the people on our team who deal with legal matters to make sure, but I don't think we would offer the option if we were not allowed to do it in Austria. But I know that in Germany there is this problem. Thanks anyway for this.

Yes, please. [Question: will the API for all the new features be covered?] We were thinking of including the API in this workshop, but then we thought it would just be too much, because it is a whole other topic, basically. And for the workshops this year we don't have anything specifically on the API, but we are planning to do a separate webinar just on the API: introduction, usage, questions. If you are interested, you can just send us an email, and we will make sure to send you the registration link when we organize such a webinar.

Yes, some more questions from the chat. [Question about publishing ground truth.] Yes, thanks for reminding me, I just rushed through it. There is the option to publish your ground truth data in the Zenodo community that we have specifically for the READ-COOP. I think it is hosted by Gunther.
So you can publish your ground truth there; just ask for access and he will grant it. But I'm not sure about directly exporting models. No, it is not possible to export the model itself. You can just export the ground truth with all the layout information, the coordinates and the text, so the PAGE XML, which you can then import into another system and train on there. There are also some projects about sharing ground truth; one is HTR-United. It is run on GitHub, I think there is a website, and you can contribute to this data set of ground truth. You need to make sure that you have the right to publish the images too. Yes, exactly.

Any other questions from the chat? Just one more thing: unfortunately we didn't have time to do the hands-on session now, but I hope that you still had some good input and some questions answered. And I will still be around, so if there are any more specific questions, feel free to ask us. I have put up the short link and the QR code to the page; you can download the slides of this presentation there, as well as the link to the website, and our email addresses are also there.

Any more questions now that we can answer? Oh, yes. Yes, for sure. So with the compare feature that we showed, there is the option to also compare samples. Unfortunately it is still only in the expert client, but it will come to the web app hopefully soon. And this is a great option to check what kind of model you want to use on specific documents or specific types of documents, because then you get the CER for each individual page, not only for the whole model. I'm not sure if we can show it here, because unfortunately we don't have the expert client installed on this machine.
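As a side note on the PAGE XML export mentioned above: PAGE XML has a well-defined structure (regions containing text lines, each with coordinates and a transcription), so it can be read with standard XML tooling before importing it into another system. The sketch below is a hedged illustration, not Transkribus code; in particular, the namespace date is an assumption and may differ between export versions, so check your own files.

```python
import xml.etree.ElementTree as ET

# Namespace used by many PAGE XML exports; the schema date is an assumption --
# exports produced with other schema versions will use a different URI.
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

def extract_lines(page_xml: str):
    """Return (coords, text) pairs for every TextLine in a PAGE XML string."""
    root = ET.fromstring(page_xml)
    lines = []
    for line in root.iter(f"{{{NS['pc']}}}TextLine"):
        coords = line.find("pc:Coords", NS)
        unicode_el = line.find("pc:TextEquiv/pc:Unicode", NS)
        points = coords.get("points", "") if coords is not None else ""
        text = (unicode_el.text or "") if unicode_el is not None else ""
        lines.append((points, text))
    return lines

# Tiny hand-written example in the same shape as a real export.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="p001.jpg" imageWidth="800" imageHeight="600">
    <TextRegion id="r1">
      <TextLine id="l1">
        <Coords points="10,10 200,10 200,40 10,40"/>
        <TextEquiv><Unicode>Hello world</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

print(extract_lines(SAMPLE))
```

From pairs like these you can rebuild both the transcription and the line geometry, which is exactly what makes the exported ground truth reusable for training elsewhere.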
But, okay, yeah, we don't have the specific compare samples view, I think. But it is in the computing accuracy tool in the expert client. Yes. The whole, or the whole page. Yeah, thank you. Thanks. I just wanted to share one thing that they mentioned in the chat. Yeah, tool to compare OCR. So Katarina, I think she recommended this one to, not just to measure the accuracy, the statistical accuracy, but also to see which are the characters that create problems. And there is another one called Cerberus HDR. It's on GitHub. Is this one? Yeah, there is another page. But the name is this. And it's with this tool, you can also calculate the accuracy without considering the punctuation or avoiding to, in not including some characters in the calculation, because maybe you just want to look at the text, but the punctuation is not important for you. In transcripts, it is measured. It is considered when the accuracy is computed. And there is this other tool that enabled to avoid it. And there was one more question. Yeah. So this is the Cenodo community, where you can also check what other data was shared there already. Or was that the question? Yeah, okay. Yeah, so it's linked in the slides anyway, so you should find it. Okay, so yeah, I think then probably we should end the session as not to, again, have too much time from the break. But thank you. Thank you very much, Bert. Joining us.