We will now see a presentation from Hervé Déjean. This was part of the READ project, developing something for the archive of the Diocese of Passau, the diocesan archive, and, yeah, let yourself be surprised.

Thanks, Günther. So, as Günther said, this work was started within the READ project a couple of years ago. In the READ project we were in charge of what we call document understanding, and a specific part of that was table processing, and we wanted to explore a use case with tables. We had a close collaboration with another partner, the Archive of the Diocese of Passau, which provided us with what was at that time a nice collection: 20,000 pages of death records for a specific period which was of interest for them. The collection was mostly composed of tables and was fairly challenging, with 700 different hands and sometimes very difficult writing.

The purpose of our work was to see how far we can go with current technology in order to extract and visualize information from those tables. So what we wanted to do is to start from a set of images and, applying current technology, see whether we can simulate things like the spread of scarlet fever in 1870-72 in the Diocese of Passau. This animation has been produced automatically thanks to the workflow we have developed for this specific collection.

We have built an online demo that you can use. It is a very basic one, but the purpose is to query the database we have built automatically and to view the result of a specific query spatially, thanks to a map of the diocese, and also over time, using a temporal animation.

We have a couple of examples to illustrate the database. What you see here is the number of deaths where the death reason contains the string "Scharlach", which means scarlet fever; the database is in German, so you have to know a bit of German if you want to use it. Thanks to the information extraction tool we have developed, you can also restrict the query to a certain year, so you can do this for death causes. You can also query the other fields of the database; most of the time you need to know a bit about what is in the database before you query it. Here, for instance, the occupation of the deceased person contains the word "Glas", and it highlights a specific parish, Regen, which is well known for a glass-making company that still exists today. You can also query by first name and last name, but that is not that meaningful currently. We have other use cases: for example "Inwohner", which is a kind of peasant or farm lodger, and you see that, for some reason, this term was mostly used in one part of the diocese. Currently I don't know whether that is because of a linguistic usage in that part of the diocese, or because in this region the agricultural social world was organized around this kind of occupation. What you can also look at is the activity of the doctor who recorded the death: you see that this doctor Kufner was mostly working in a specific parish, Altenmarkt, and you also see the other regions where he worked. And of course you can go back to the original page in order to assess whether the information was correctly extracted or not.
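The queries behind these views are essentially string-containment filters over the extracted fields. As a minimal sketch of the same kind of query on a tabular export of the records, assuming hypothetical column names such as death_reason, occupation, parish and year (this is not the schema or interface of the actual demo):

```python
import pandas as pd

# Hypothetical CSV export of the automatically extracted death records;
# the file name and column names are assumptions, not the real schema.
records = pd.read_csv("passau_death_records.csv")

# Deaths whose recorded cause contains the string "Scharlach" (scarlet fever),
# restricted to 1870-1872 and grouped by parish and year, ready for mapping.
scarlet = records[
    records["death_reason"].str.contains("Scharlach", case=False, na=False)
    & records["year"].between(1870, 1872)
]
per_parish = scarlet.groupby(["parish", "year"]).size().reset_index(name="deaths")

# The same idea works for any field, e.g. occupations containing "Glas".
glass = records[records["occupation"].str.contains("Glas", case=False, na=False)]
print(per_parish.head())
print(glass["parish"].value_counts().head())
```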
Here I am using the web interface provided by the Transkribus platform. At the top you see the original image, and at the bottom you see the outcome of our table processing tool, which structures this specific image. The first row here is the header row of the table, and you can verify that the doctor Kufner was really associated with this specific record. Maybe I can also show you the temporal animation over five years for the scarlet fever death reason in this region; here again you see a specific parish where the disease was fairly active, with even two cases on this specific page. I will show you the URL later on so that you can play with it if you want.

The workflow we use is basically this one. During the READ project we generated a fairly large ground-truth data set, with more than 1,000 transcribed pages, with which we were able to train a model which is not that exceptional: we have a CER of about 10%, so the output can be somewhat noisy. We also had annotations for the tables, so we have a ground truth of 1,000 tables for this data set, and we trained our model to perform table understanding. In the workflow, the first steps are very basic: we first detect the text lines using the tool in the Transkribus platform, then we apply our HTR model, and then we perform the table understanding in order to structure the tables into records. Then there is a step which is mentioned here as information extraction, but which is in fact a sequence of small steps to carefully extract the information we need to perform the temporal visualization.

A couple of words about the technology we use for table processing. We use a neural network approach in order to structure and organize the set of text lines you have on a page into a meaningful structure. If you want paragraphs, you annotate paragraphs and the system will learn how to segment the page, so how to organize the text lines into paragraphs. In our case we wanted table rows, so we annotated the table rows and the system learned to organize the text lines into individual table rows. It is based on a graph convolutional network. For this specific data set you see an outcome here: we are basically able to extract 8 rows out of 10 in a very reliable manner over the full collection, and we are still working on the technology, so we should be able to improve the results.

Once we have this set of table rows, the next step is to extract the information from each record. Here is the list of items we would like to extract from each record: the name of the person, their location, their profession, their religion, the reason of death, of course the date of the death and of the burial, the age, and also the family status. The outcome of this information extraction is basically a kind of XML file where, for each record, you have a specific value for each of the fields you want to extract.

The task is not that easy. Initially you have an image, you apply HTR on it, and then you have a string, which is still not data: you have to analyse the string in order to either normalize it or extract meaningful information from it. In this first example the text corresponds to a date with some errors, and eventually what you want is a normalized version of the date, so that you can process it automatically.
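One way to turn such a noisy date string into something machine-processable is fuzzy matching against a list of month names (the specifics of the problem are described just below). The following is only a minimal sketch with an arbitrary cutoff, not the normalization code actually used in the workflow:

```python
import difflib

GERMAN_MONTHS = {
    "januar": 1, "februar": 2, "maerz": 3, "april": 4, "mai": 5, "juni": 6,
    "juli": 7, "august": 8, "september": 9, "oktober": 10,
    "november": 11, "dezember": 12,
}

def normalise_month(token: str):
    """Map a possibly abbreviated or misrecognised month token to 1-12, else None."""
    token = token.lower().rstrip(".").replace("ä", "ae")
    # Accept abbreviations such as "nov." or "dez." by prefix matching.
    for name, number in GERMAN_MONTHS.items():
        if len(token) >= 3 and name.startswith(token):
            return number
    # Fall back to fuzzy matching to absorb HTR character errors.
    match = difflib.get_close_matches(token, list(GERMAN_MONTHS), n=1, cutoff=0.7)
    return GERMAN_MONTHS[match[0]] if match else None

print(normalise_month("Dez."), normalise_month("Novembr"), normalise_month("Juni"))
```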
You basically want the month identified as a number: not just the string "Juni", but the month number correctly associated with this string, and also the day of the month. The issue is that, while it is not a problem for "Juni", for the longer month names like November or December you usually have abbreviations, so you need to recognize the specific month even in the presence of errors or abbreviations. Likewise, if you have a first name and a last name, you want to correctly identify the first name or names and the last name, so that you can query each field specifically. If you want to work on a field like the age, you have to extract the value, but you also have to extract the unit: it could be ten years, six months, two days. So you have to extract all this kind of information.

To do this, we automatically generated some ground truth. In this kind of table you have chunks of information which are not running text: in a given column you will have dates, or you will have names. So if you have lexica covering these kinds of entities, you can automatically generate ground truth, modifying this initial text representation a bit by introducing some noise, so that the extraction tool becomes more robust to character errors. Then you train named entity recognition using this data. And since this is what we call synthetic data, you can generate a large amount of training material, which is enough to cover your collection with fairly good precision.

And then you discover nasty details. What is important is to associate a timestamp with each record, so a specific date or at least a specific year. Sometimes the year does not appear at all on the page, and you have to look at the previous page or the next page in order to identify the year which corresponds to the record. So here you need a specific tool that works at the document level, looks at the chronological order of the dates, and infers for each page the specific year if this information is not present on the page itself. And you have to be robust to character errors as well.

Once you have this, you can easily compute, for example, the evolution of some professions over the period. The dotted lines are the interpolation of the values: the weaver profession is decreasing, while the shoemaker profession, in red, is increasing, and the miller profession is quite stable. This is the kind of information you can now automatically extract from the database; here you see some temporal evolution.

If you want to map this information onto a geospatial reference, you first need to design this geospatial reference. This represents the Diocese of Passau, and we were lucky because the diocese already had a GIS representation: each polygon represents a specific parish of the diocese, and we are able to map the extracted information onto this map. What we have done is only a localization at the parish level: we have this information in the metadata, so we know that a given book comes from a given parish, and all the records extracted from this book are located in that specific parish. We also had to adapt the map, because what they had was the representation of the parishes as they are now, and we had to transform this representation a bit to match the parishes as they were in the 19th century.
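To give an idea of the synthetic ground-truth generation described above, here is a minimal sketch: tiny stand-in lexica, a crude character-noise model and a simplified label format, none of which correspond to the generator actually used in the project.

```python
import random

# Tiny stand-in lexica; the real generator relied on much larger lexica of
# names, dates, occupations and so on, covering the collection.
FIRST_NAMES = ["Johann", "Maria", "Anna", "Josef"]
LAST_NAMES = ["Huber", "Kufner", "Maier"]

def add_noise(text: str, p: float = 0.1) -> str:
    """Randomly drop or replace characters to imitate HTR errors."""
    out = []
    for ch in text:
        r = random.random()
        if r < p / 2:
            continue                        # simulate a deleted character
        if r < p:
            ch = random.choice("abcdefghijklmnopqrstuvwxyz")  # substitution
        out.append(ch)
    return "".join(out)

def make_example():
    """One synthetic training pair: a noisy surface string plus clean labels."""
    first, last = random.choice(FIRST_NAMES), random.choice(LAST_NAMES)
    surface = add_noise(f"{first} {last}")
    labels = {"firstname": first, "lastname": last}
    return surface, labels

for _ in range(3):
    print(make_example())
```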
Coming back to the map: we had to manually merge some current parishes to recreate the parishes as they were in the 19th century. What we could do, but have not done, is to locate each record more precisely on the map, since you have the address of the person. The main issue is that you have a lot of ambiguity here: for example, the place name Oberndorf is very frequent in Bavaria, with more than 40 locations carrying this name and more than a dozen in the Diocese of Passau. We could use some additional information in order to disambiguate this, but it has not been done.

What we are now going to do is to integrate not only the death records but also the birth and wedding records. We have already done the table processing; the information extraction part is mostly done, but there is still some adaptation to do. The next question is how to link the different types of records together, because the quality of the extracted text is sometimes good and sometimes really noisy, so matching names between records is very difficult, especially when you know that about 50 names cover 19% of the population. So we will see how to do this. We are also going to improve the online demo; a colleague of ours is working on this so that you can play with it more easily. And I will also try to describe this work in more detail, in order to provide more information about the specific workflow we designed for this use case. Here are some references which describe the tools we use for the table understanding step. Most of the code is on GitHub, so if you want more information you can contact us, and you can also play with the tools. That's all for me, thank you.

It's not really a question, it's more a remark, about how to link birth, wedding and death records. I do it with a simple network. A person who is born has a father and a mother, so you have father, mother and person; at some point that person must die; and then for a marriage you have the father, the mother and a child, and that child gets married to a person who also has a mother and a father. So that's what I do.

Yeah, but the issue is to automate this at a large scale, and when you have errors in the names you have to apply some fuzzy matching in order to get the right information.

I know the feeling, especially when there are like 40 people with the same name; that's the problem of dealing with primary sources. I know your feeling, I understand.

Yeah, well, thank you very much, Hervé, for your lecture. Although I work in the Amsterdam City Archive, I am paid by a research group called Golden Agents, and they also try to match burial, baptism and marriage records with each other, and they also have all kinds of problems with that, as already mentioned. It might be interesting to contact them in order to see how far they got with certain things. I'm not sure how far you are, but it might be an idea to be in contact with each other about this.
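As an illustration of the fuzzy name matching mentioned in this discussion, here is a minimal sketch; the similarity measure and the threshold are arbitrary assumptions, and real record linkage would also bring in dates, places and other fields.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Crude similarity between two noisy name strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def link_candidates(death_name, birth_names, threshold=0.85):
    """Return the birth-record names similar enough to a death-record name."""
    return [n for n in birth_names if name_similarity(death_name, n) >= threshold]

# With noisy HTR output, "Kufner" and "Kufnar" should still be linked,
# while a different family name is rejected.
print(link_candidates("Johann Kufner", ["Johann Kufnar", "Johann Maier"]))
```

A crude string ratio like this will over-link very common names, which is exactly the problem raised in the discussion, so in practice it would only be used to generate candidate pairs for further filtering with other fields.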