Okay, next up is Albert Ludwig Oroine from the National Archives of Estonia. The talk is entitled "On Orthodox Structures: P2PaLA on Orthodox Birth Registers". The floor is yours.

Yes, hello, my name is Albert and I come from the National Archives of Estonia. I'm here to talk about our project with Orthodox birth registers, and the focus is on our work with the layout analysis tool P2PaLA. But as we are on the second day of this conference, and on the first day we already had some fantastic presentations about the technology behind P2PaLA and about workflows in Transkribus, I will try to repeat as little as possible and instead bring my experience with Transkribus as a historian. This presentation could have had an alternative title, something like "A historian's first dabble with Transkribus". And as a historian, I cannot go without a historical background. It may be interesting to know that, although we are working with Orthodox materials, Estonia is not a historically Orthodox area. Estonia and Latvia were historically Lutheran areas all the way into the 19th century. In the early 1840s there were a few bad harvests, and when there is a bad harvest, people start to think about social and political reforms. Rumors began to spread from Riga through the countryside of Latvia and Estonia that by joining the Orthodox Church, by joining the Tsar's faith, one could perhaps get some land. And what is more attractive to landless peasants than land? So in these Orthodox church records we find the poorer part of the peasantry. During the two larger mass movements, in the 1840s and the 1880s, up to one third of the population changed their religion. Nothing practical came out of it land-wise; as you can see here in the picture, the only actor that got any land was the Orthodox Church.
So nowadays in Estonia, in the beautiful countryside, there are also many beautiful Orthodox churches. Ten-plus years ago, the great volunteers we have a strong partnership with at the archive started to build a database of Lutheran parishioners: name indexes of the names found in the Lutheran church records. This wonderful project has had great results, in the sense that if I am searching for a 19th-century Estonian peasant, in most cases I can find that peasant's family tree on one of the more popular family history websites. So the work of the family historians with the Lutheran parishioners has been really great. But if I am searching for Orthodox people, more often than not they do not exist there. The records of the Orthodox parishioners are still stuck in the archives; family historians have not embraced the Orthodox records as well as the Lutheran ones. The reason is that the Orthodox records are a bit more difficult: the language is pre-revolutionary Russian, and the names are hidden in a larger text block. In our project we are dealing with the Orthodox records from 1838 until 1926, when the registers had a fairly standard structure. Here on the right side you can see the date of birth, the date of baptism, the child's name, and the information on the parents. Before 1838 the records have a different structure, so focusing on this later type of structure has been beneficial, also because, as I said earlier, Estonians moved to the Orthodox Church in the 1840s; before that, these Orthodox materials are mostly about the Russian merchants and soldiers in Estonia. And the source name "birth registers" is not entirely accurate: these birth registers also contain lists of baptized converts, and in them we can indeed see the mass movement. In the smaller parishes in the countryside, the sources begin with these kinds of large lists of newly joined parishioners.
Again, that makes for another kind of structure: although the structure is the same, the data inside it is different. When I first started to work with Transkribus, my idea was that you train the HTR model and you are done; life is easy after that. And indeed, working with the HTR was the most fun part of this project. At first we made our own model, with a CER of about 2%. Last year, people from Freiburg University contacted us, as they were working on a generic Russian handwriting model, and our model is now part of that general model. So while in most cases people use general models as base models to build on top of, we use the general model because one of its base models was built from our materials. And it works really well, so no problems there. But, as we know, when the text recognition is done, or the HTR is at an acceptable level, the work does not end there. We found out that we would have real challenges with the layout of these documents. As we started this project in early 2020, the best tool in Transkribus to battle these kinds of horrors was a layout analysis tool called P2PaLA. So we started to work with that, using two specific text regions: in blue we have the child's name, and in red the parents' information; just very basic text regions with baselines. If with HTR you need 10 to 15 pages of transcribed text to get the machine going, with P2PaLA it was known from the start that it would not be so easy; it would take maybe 100 or 200 pages to really get good results. But here in this picture, for example, you can see that at about 300 pages the results were still far from ideal. The problem was specifically in the baseline detection. Our final, semi-working and stable P2PaLA model in the end required 30,000 pages.
Of our materials, these Orthodox birth registers, we have 200,000 pages. So we had to prepare about 7% of the materials to get some sort of a working and almost trustworthy P2PaLA model. And after that last model, life seemed great, because, as we see, these were the first pages with the new model, and everything worked. Amazing. After 30,000 pages you are perhaps starting to lose hope, so this was a really great surprise for us. As we went through the pages, everything seemed fine, but still about 5 to 10% of the pages had some really weird errors, like the one shown here, where the baseline detection completely fails for no real reason that I can explain or think of. So again we had to come together and think about what to do, because even though 90 to 95% of the pages may be correct, having 5 to 10% with these kinds of layouts is unacceptable. We have family names, for example, in the middle of the text block, and if specifically the middle lines do not have a correct layout, then those pages are just a waste of space, nothing else. So we really had to think about how to find these errors, and this is the part where the historian pressed the XML button for the first time to see what is in there. The XML is very clean: we can see the baseline points, and we know that our problem is with the baselines. There are some numbers, coordinates; maybe we should do something with them. So we started to theorize: with these coordinates we can calculate some kind of default length for a baseline, and with that we can recognize too-short baselines and thus know on which pages there are some really awful layouts. But this is also the limit of a historian's abilities. So we contacted our digital archive, explained what we wanted, and really got more than I could ever have expected. So here is one part of the code, with the goal of recognizing the too-short baselines.
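The archive's actual script is not shown in full here, but the idea of flagging too-short baselines from the coordinates in the PAGE XML export can be sketched roughly like this. This is a minimal illustration, not the archive's code: the function names and the "shorter than half the page's median baseline" threshold are my own illustrative choices, and only the PAGE XML element names (`TextLine`, `Baseline`, the `points` attribute) come from the schema Transkribus exports.

```python
import statistics
import xml.etree.ElementTree as ET

# PAGE XML namespace used by Transkribus exports (2013-07-15 schema).
NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

def baseline_length(points: str) -> float:
    """Polyline length of a Baseline 'points' attribute ("x1,y1 x2,y2 ...")."""
    coords = [tuple(map(int, p.split(","))) for p in points.split()]
    return sum(
        ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
        for (x1, y1), (x2, y2) in zip(coords, coords[1:])
    )

def short_baselines(page_xml: str, ratio: float = 0.5) -> list[str]:
    """Return the ids of text lines whose baseline is shorter than `ratio`
    times the median baseline length on the page, a rough proxy for a
    broken layout (the exact threshold is an illustrative assumption)."""
    root = ET.fromstring(page_xml)
    lines = []
    for tl in root.iter("{%s}TextLine" % NS["pc"]):
        bl = tl.find("pc:Baseline", NS)
        if bl is not None:
            lines.append((tl.get("id"), baseline_length(bl.get("points"))))
    if not lines:
        return []
    median = statistics.median(length for _, length in lines)
    return [line_id for line_id, length in lines if length < ratio * median]

# Toy page: two full-width baselines and one suspiciously short middle line.
sample = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page><TextRegion>
    <TextLine id="l1"><Baseline points="0,50 1000,50"/></TextLine>
    <TextLine id="l2"><Baseline points="0,100 980,100"/></TextLine>
    <TextLine id="l3"><Baseline points="0,150 120,150"/></TextLine>
  </TextRegion></Page>
</PcGts>"""
print(short_baselines(sample))  # → ['l3']
```

Pages where this returns a non-empty list would then be queued for manual correction, while clean pages can be skipped.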
And this code helped our workflow very well, because with it we could actually skip the pages that we knew for sure had no errors inside. Unfortunately, the images of those registers come from old microfilms, and their position is sometimes different, the resolution is different, so this method did not apply to all of those materials. In the end, we unfortunately still had to do too much manual curation of these structures. The most workable solution inside Transkribus was P2PaLA for the text regions. It was amazing to see that the text regions were spot on, but the real problem with P2PaLA was the baselines. For those we used CITlab Advanced, also in Transkribus. So the working solution was a combination of two different parts of the same software. And it gave us optimism for the future, because over the last two years we have seen Transkribus progressing, and we are really seeing new tools that we wish had existed two years ago. And of course, listening to yesterday's presentations, the optimism is great. And yeah, that's about it. In October we are publishing our read-and-search platform; it will be our first time showcasing our materials to the larger public. Thank you very much.

Yes, let's see if there are any questions from the audience. Anything you would like to know or ask the speaker?

I can say that I did not cry during this process.

I mean, there is one question from me. Have you already noticed the new trainable baseline feature?

Yes.

Okay, have you tried it out too?

Yes, we have done some testing, and it still requires some work.

Okay, yeah, sure. It won't be perfect, but did it improve your situation? Do you think it could?

It wasn't better than the combination that we were using before.

Okay, good, because that can solve some problems that have to do with baselines. And for structure recognition, there will be trainable layout recognition too.
But this is probably going to take another couple of months, I think, so towards the end of the year, and then you can give that a try too. Maybe that will help with the structural part. But yeah, really impressive what you've been doing, and let's hope that it will find a lot of users too. Cool. And of course, you get a Transkribus mug too.