This is about a project which we run here in Innsbruck, and it's about the cadastre maps. The Austro-Hungarian Empire, as you can see here, completed its first cadastre in the second half of the 19th century. The cadastre was created between 1817 and 1861, and it covers a huge area: 670,000 square kilometers. For comparison, Germany today has about 357,000 square kilometers. So it was a really large undertaking, and as you can see here, it took about 50 years.

This cadastre actually consists of two main sources. On the one hand there are the detailed maps which were drawn for each piece of land; these had mainly a military purpose, since the military wanted to have an exact map of the whole empire. On the other hand there are the accompanying papers, descriptions of how the land was used, whose main purpose was to standardize taxation in Austria. The idea was that tax assessment should be independent of local circumstances, and that there should be a standardized way to assess and classify the whole area of the empire.

The quality of the data is extremely high. The work was done by the military, the surveyors were instructed very carefully, there were actual laws, and they did not change the way they gathered the data. So the data are consistent; of course there are always problems in the details, but in principle the data are consistent over a long period of time. It is also interesting to know that the surveyors had to buy their equipment themselves and were personally liable if errors appeared. Some of them spent half of their lives traveling through the empire, drawing the cadastre maps and writing the descriptions.

The sheets of the cadastre map were drawn at a scale of 1:2,880, which is rather detailed, and there were people who checked and verified the data within these cadastre maps; the usual accuracy is about 80 centimeters. They recorded all plots of land, including those here in the Alps which are not used and are just deserted ground, and a lot of information went into these maps. This information is not only visualized in the maps but also contained in the accompanying descriptions. As you see here, different colors are used for grazing land, for meadows, for fields, for vineyards, and for houses, with wooden houses in yellow and public houses in red; even the type of the woods is recorded within the maps. About 300,000 such colorful map sheets exist for the whole empire, and for the Austrian part, as well as some other parts, they have already been scanned completely by the Austrian Bundesamt für Eich- und Vermessungswesen, the official administration for these maps. This cadastre is actually still legally valid, and it really is the starting point for the modern cadastre. As you can remember from the borders I showed you before, you will find these cadastre maps not only here in Austria but of course also in Hungary, Romania, Bulgaria, Ukraine, the Czech Republic, Slovakia, Slovenia, Croatia, and Italy, where it even forms a cadastre regime of its own. This is a detail from these maps: you can see the different colors and the parcel numbers, and also the different kinds of bushes and trees and so on, indicating directly how the land is planted.
So it's a really extremely valuable source. Digitization of the maps has been going on for a while; I think there is a Hungarian company called Mapire which acquired these maps and is now selling them to family historians. But the accompanying documents are in some ways even more interesting, because they contain the actual information. They are not colorful, and of course a bit boring, but you can find in them all the information which is then drawn in the maps. You find the numbers of the parcels, the name of the owner, the location of the owner, the size and the kind of usage, as we have seen before in the maps, and then, very importantly, the description of the borders, which is a running text. When they did the survey, they went along the borders and described each border and each point which you can see in the map.

For this they of course needed the field names, which they had to record: they talked with the people, and the people told them, yes, this wood is called this, and that field is called that. That is also a very interesting source, because the cadastre documents contain a lot of names which you will not find in the official maps of that time, because this is very detailed, and the farmer who owns the land will of course tell you what he calls this and that part. There are some other documents included as well, such as index cards and so on.

Here, so that you get an idea of how the main table looks: it's the land register with all this information, and here is a detail from the column which gives the name of the owner. You can also see a specific thing here: this owner is called Plunser, then you see a funny sign, and then Klozen and Alois. So the name of this man was Alois Plunser, but the house he was living in was called Klozen, and therefore people never called him Alois Plunser; they said Klozen Alois. People are called according to their house names, which is a typical thing here in the Alpine area, but also in other areas of Austria, and these names are still in use. So again you have information which you usually do not get from official documents, where people are just recorded according to their name. We were talking before about disambiguation of names, and here you have an interesting way to disambiguate people, because they are called according to their house names. Then there are the border descriptions, which are running text, a list of land owners which was used to get an overview at that time, and the field names.
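To make the structure of such a register entry concrete, here is a minimal sketch in Python. The field names are hypothetical, chosen only to mirror the columns just described; this is not the project's actual data model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RegisterEntry:
    """One row of the land register, mirroring the columns described above.
    All field names are illustrative; the original column headers are German."""
    parcel_number: str         # links the row to the parcel drawn on the map sheet
    owner_name: str            # civil name of the owner, e.g. "Alois Plunser"
    house_name: Optional[str]  # house name the owner is known by, e.g. "Klozen"
    location: str              # place of residence of the owner
    size: str                  # parcel size, kept as written (historical units)
    usage: str                 # kind of usage: meadow, field, vineyard, wood, ...
    border_description: str    # running text describing the parcel borders

entry = RegisterEntry("451", "Alois Plunser", "Klozen", "Innsbruck",
                      "310", "meadow", "from the northern corner along the stream ...")
print(entry.house_name)  # house names can disambiguate owners with common names
```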
Now, there is a professor at the university, Kocha, who has worked on these cadastre documents for many years and is also editing some of them, and together with him we set up a project for a call placed by the Tyrolean government. It is called a lighthouse project for digitization, one of the few such projects which deals with historical documents, and we got a grant of about 200,000 euros, 50 percent of which will go to the Transkribus team. The project has now been running for a bit more than a year and should be finished at the end of this year. The proposal was of course written much earlier; it took some time to evaluate, and so on. We were rather careful not to promise too much, and therefore we said at the beginning that we just want to make the documents searchable for users, so that they can simply search in the documents and see which properties were owned by their grandfather or grandmother.

But now we have become a bit more ambitious and are thinking about extracting the data and building up a separate database with the support of volunteers, so that these really valuable data are recorded in an exact way and can be used by researchers. Actually, I think the talk from Avi was really great, because it was based on automatically extracted data. Of course there is the problem of noise, and you have to get an idea of how much the noise which automated methods inevitably produce influences the visualization and the data extraction you have seen before. On the other hand, if you have a really large amount of data, automation will be the only way to do it. So I think what we saw before is really the future, and I believe that data extraction from historical documents can be the next big thing in our community.

Here it is a bit different, because, as I said before, my colleague is very much interested in exact data and wants more or less one hundred percent accurate data, so the approach is a bit different. It is similar to HTR correction: if HTR processing comes with a character error rate of 12 to 15 percent, there will be no improvement compared to writing the transcription from scratch. But if you do not insist on automatically transforming the text into a fully correct text, then a 12 percent character error rate is of course high and not perfect, but it already provides you with something, and as long as you know that you are dealing with noisy data, it will be extremely useful. I think that is a question which can be applied to more or less every project: is the final goal really to have one hundred percent correct data, or is it okay to have the best model which can be achieved?
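Stepping back to the error-rate argument: character error rate (CER) is usually computed as the Levenshtein (edit) distance between the recognized text and the reference, divided by the length of the reference. A minimal sketch in Python, not the tooling used in the project:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions
    needed to turn string a into string b (classic dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1]

def cer(recognized: str, reference: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    return levenshtein(recognized, reference) / max(len(reference), 1)

# Two wrong characters in a 13-character name already give ~15% CER, the
# regime where correcting costs about as much as retyping; below ~2% CER
# plain full-text search already finds nearly all words.
print(cer("Aloi5 Plunzer", "Alois Plunser"))  # ~0.154
```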
The objective of searchability was rather easy to achieve. With the structure recognition tool we got within Transkribus, the new HTR+ model, and so on, it went rather quickly to create good training data, both for the structure recognition and for the HTR model. Here is a very simple search interface. Since the recognition is below a two percent character error rate, there is actually no need to use the keyword spotting application to find more or less all words in the documents, and we restricted the search to family names or person names, so that people can search for their ancestors, as we promised in the proposal. What we use here is a standard Lucene search index, which you saw yesterday in the talk from Felix. We display the results faceted by document type, so there are different types of documents you can choose from, and by commune, the usual way to build facets, and then users come directly to the document. Here we are playing around with an overlay; it might be helpful or it might not, that is still subject to experimentation. In some cases it is a nice feature for understanding what is written, but in this case the writing is rather neat, and people will likely know what it means anyway. As I already said, we sent the text and the images to our service provider in Vietnam and generated two models, a structure model and an HTR model, and for both we got really good results. As you can see, we worked with 1,000 regions for the columns and 41,000 words as training material for the handwriting. So far we have not been able to test the models against the whole data set, because the scanning process is still going on, but we expect that we will not need to add much data to complete the HTR results.

The new or extended objective is to also do something with crowdsourcing for this kind of document, with the final goal of involving volunteers and getting a perfect record, a perfect database of the data included in the descriptions, and this will likely be an application of its own. Besides searching, the crowdsourcing tasks are rather clear: we need to create the layout, we need to transcribe the documents, and we need to review the results. I already mentioned the consideration that if correction is needed anyway, we really have to be careful whether the gain in efficiency from automated processing is high enough to justify the overall effort. We had a discussion yesterday in the workshop, and it was interesting to hear that several people were also of the opinion that it is more motivating for volunteers to do something fresh than to correct the stupid errors of a machine. Therefore our current approach goes in the direction of providing a very simple tool to mark the rows of a table, a transcription interface which exactly mirrors the structure of the tables, automated transcriptions which help users who nevertheless transcribe manually, and some Excel-like features in the editing interface, so that the input of data is as simple as possible. That could look like this: below you see the table structure, you mark a row and put in the data. As you can see here, in many cases the data are simply repeated: the owner of this land had several parcels, and they even had the same usage and the same data, except that in this case just the size of the parcel differs. So a simple copy function, sketched below, might be fully sufficient to support users in entering the data.
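As a rough illustration of that copy function, here is a minimal sketch in Python. The row format and the convention that an empty cell repeats the row above are assumptions about how such an entry form could behave, not the project's actual implementation:

```python
# Hypothetical Excel-like entry form: an empty cell means "same as the
# previous row", like the ditto marks in the historical register.
DITTO = ""

def fill_down(rows: list) -> list:
    """Replace empty cells with the value from the same column one row up."""
    filled = []
    for row in rows:
        merged = dict(filled[-1]) if filled else {}
        merged.update({k: v for k, v in row.items() if v != DITTO})
        filled.append(merged)
    return filled

rows = [
    {"parcel": "451", "owner": "Alois Plunser", "usage": "meadow", "size": "310"},
    {"parcel": "452", "owner": DITTO,           "usage": DITTO,    "size": "280"},
    {"parcel": "453", "owner": DITTO,           "usage": DITTO,    "size": "95"},
]
for entry in fill_down(rows):
    print(entry)  # owner and usage carried down; only parcel and size change
```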
The third task is to review the data, and here are some considerations. We also had an interesting discussion yesterday about the fact that reviewing is often a bottleneck in crowdsourcing projects: the crowd works fast, and then there is the problem of who takes care of the crowd's input. We got very interesting feedback here from the Vele Handen project team, who said they can see that too long a time span between the input of data by the crowd and the final review demotivates the crowd, and they see projects declining if the gap between data input and review grows too large. So it is likely a good idea to allow users to have both roles, transcriber and reviewer, so that they feel responsible for both tasks being fulfilled. For this, I think it is also a good idea to make the reviewing process transparent to users. One idea is that users can somehow see the history of what they did, or get notified of changes made by other users or reviewers to their documents or their input, and that these data are visualized for them.

I think this could also be very useful for transcription projects in general, because I know from my own experience that if you are not an expert in transcribing and you are a bit unsure, you cannot always use the unclear tag, because the writing itself is not unclear; it is your own capabilities that are not so high. But you still want to contribute something, because you can read 80 or 90 percent of the text, even if in many places you are not that sure. If I got the chance to be notified about this, and to see visualized where the words were that gave me problems and where I made mistakes, I would not see this as something that offends me, but as something that helps me learn. So I think that is a clever idea. And then of course the large crowdsourcing platforms also have the option of several rounds of reviewing, which is another way to guarantee the required quality; there are projects like Zooniverse which have even 30 rounds of review for data coming from the crowd.

Visualizing the data is obvious, and of course one of the nice things would be to link the maps with the accompanying documents. For this we have to deal with the numbers on the maps: they contain the parcel number, and the same number is included in the register. We tried out some recognition of the parcel numbers in this case, but the error rates are rather high; this can often be seen with numbers, that they do not give such good results. So the same consideration applies: it is likely easier to type them in than to do all the automated processing and end up with error rates that are too high for a crowdsourcing project. The same is true for number detection, because you would first need to find the numbers on the map, which is a task of its own, and you can imagine that you will get a lot of false positives. It is easy to imagine a tool like the one in Transkribus where you just mark the baseline of a number and get the rectangle automatically; you would do this for the map and finally get a list where you can input the data. Here it could also be an option to have plus and minus buttons, so that you just say, okay, the next one is one higher or one lower. I could imagine that this is something which could be done nicely on a mobile device as well. If one of the two documents is already transcribed, or the map is already marked up, it is then a trivial job to match the two numbers and even notify the user that a description is available here, or that something is already available in the map. Finally, this would of course be linked via the database and allow the user to jump from the map to the description or from the description to the map. And since this blue part symbolizes the database: once you have the correct data in the database, you can also do a lot of interesting further calculations and processing, because then you really have clean data and can do all kinds of data analysis.
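A minimal sketch of the matching step just described, assuming the parcel numbers have already been entered for both the map annotations and the register rows (all names and values hypothetical):

```python
# Parcel numbers typed in by volunteers, from the two sides to be linked.
map_parcels = {"451": (1203, 877), "452": (1250, 901)}  # parcel -> (x, y) on the sheet
register_parcels = {"451": "row 17", "453": "row 19"}   # parcel -> place in the register

def match_parcels(map_side: dict, register_side: dict):
    """Link the two sources on the parcel number and report what is missing,
    so users can be notified when a description or a map mark already exists."""
    linked = {p: (map_side[p], register_side[p])
              for p in map_side.keys() & register_side.keys()}
    only_map = map_side.keys() - register_side.keys()
    only_register = register_side.keys() - map_side.keys()
    return linked, only_map, only_register

linked, only_map, only_register = match_parcels(map_parcels, register_parcels)
print(linked)         # {'451': ...} -> jump between map and description
print(only_map)       # {'452'}: marked on the map, no register entry yet
print(only_register)  # {'453'}: register entry exists, not yet marked on the map
```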
As a final goal, there is an interesting application from the Tyrolean government called Historical Maps of Tyrol, and they did a really great job: they are working with ArcGIS and are georeferencing all historical maps of Tyrol. They now have, I think, at least 100 or more maps, from the earliest maps of the 16th and 17th centuries up to modern maps, well, still historical ones, I think up to about 1930 or so. The idea is to include the whole of the cadastre maps in this application; a test object is already included. The nice thing for us is of course that the maps are georeferenced, so the borders of the cadastral communes are known, and these are exactly the units described in the cadastre documents. It would therefore be simple to link from a commune's description in the database to exactly these maps, not at a detailed level, but at the level of the commune borders. So we can imagine an application where users either start in this historical map application and come to the database or the original documents, or start in our application and come to the maps. And of course, if the rights allow it, the images could be retrieved within our application as well; unfortunately we have rights problems here, so it is not yet clarified how this will be possible. Okay, thank you a lot.

Question: Thank you. In your slide on the current status you have written that the concept is finished, and then "many discussions". Why are there so many discussions, and can you give some examples?

Answer: Good question. I think the discussions are mainly about this topic: shall we do it automatically, or shall we do it in a more manual way? Some members of our team think we should do it with automated processing, so table recognition and that kind of thing, and I am a bit more conservative and think, as I described before, that a simple approach could be enough. So these are mainly discussions between developer and project owner.

Question: Is it possible to add a feature to Transkribus to support GeoTIFF, or one of the other georeferenced image file formats? Because then your XML file would give you the location of the tagged object in spatial coordinates.

Answer: I'm not sure. I mean, adding a georeference is not a big issue, but I don't think that we are really prepared to deal with this. So maybe; I think that's the answer.
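On the GeoTIFF question: a minimal sketch of the pixel-to-coordinate step such a feature would rely on, using the rasterio library. The file name is a placeholder, and this is not an existing Transkribus feature:

```python
import rasterio

# Hypothetical georeferenced scan of a cadastre map sheet.
with rasterio.open("map_sheet.tif") as sheet:
    # A pixel position as it would appear in a layout (PAGE XML) file,
    # e.g. a corner of a tagged parcel-number region.
    row, col = 877, 1203
    # rasterio applies the GeoTIFF's affine transform to map
    # (row, col) to (x, y) in the sheet's coordinate reference system.
    x, y = sheet.xy(row, col)
    print(sheet.crs, x, y)
```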
Question: Have you made tests with recognizing names on the maps, not just numbers?

Answer: No, we didn't make tests for the names or the place names on the maps. In contrast to the numbers, I am rather optimistic that you would get a good model with the P2PaLA tool, which we also showed in the workshop, because in terms of pixel structure the names are very distinct from the other parts of the map. With the little numbers I am a bit more skeptical. We got the advice from Hervé yesterday in the workshop that we should try it out, so we will try it out and see. And of course, if we needed to process a much larger amount of data, there would be no way around doing it automatically, but here we are not talking about that much data, it's just Tyrol, and, as I already mentioned, I think it's fine to do it in a manual way.

Question: Thank you very much, Günter. First reaction: wow. And also a remark: one type of Vele Handen project in the Netherlands deals with population indexes, which actually look the same as this kind of source, also with tables. It is very recognizable that you have the same problems with entering the data and with controlling the data. There is also another option, and I'm curious whether you talked about it: you let all the records be entered twice by volunteers, and then the controllers can very easily see where the two entries differ and just decide, this one is correct, this one is not, and go on and go on; that way it goes much quicker. (A sketch of this double-keying comparison follows below.)

Answer: Yes, sure. Several rounds of review have been mentioned, and visualizing the data, so that is exactly what you say. But then of course it really depends on how difficult it is to interpret the data; in this case it is rather simple, I believe.

Question: I am still confused about the difference between structures and tables. Are you using structure recognition here, or are you using the table tool?

Answer: I use the structure tool, just to extract the columns. The table has a rather complicated logical structure, and it is not easy to find the rows in such tables; you have seen the figures for more complicated tables, something like 80 or 85 percent. That is good for automated extraction, but if I just need to click, it is not a big issue to make some clicks or to correct something. So, as I always say, it is a question of whether I gain that much for a simple task like this. From the technology side it is different, or maybe at the base it is similar, but for us it looks different, of course.

Moderator: Here we are, and thank you very much. We're going to finish up with a bit of a question and answer session now, so you have all three of the board here on the stage, with all of the answers that we could possibly have. We just want to wrap up with a discussion, if there's anything more you want to ask about our plans over the next six months to a year as we're figuring out things behind the scenes, or if there are any requests you have or anything you would like to highlight. Within your pack there is the user conference survey, so if you'd rather put things down on paper, drop it into the box downstairs at the front desk. Also, if you think of things later and you have some concerns (we've had some really great talks over the coffee break about some concerns, particularly about the pricing model), do drop us an email as well, with some bullet points if you've got issues you want to raise. It's so important that we get your feedback on the plans we've shown today. But what we'd like to do now is just have a Q&A discussion, so if you've got any comments or questions you want to ask in person, we'll all hear them. Does anyone want to start? The next UC, you say?

Question: Yes, any suggestions for the next user conference?

Answer: I didn't think much about this, but it could be an idea not to have it always at the same place. I mean, we changed from Vienna to Innsbruck, so if there are volunteers who would be happy to host another user conference...

Question: Thanks. I have a question on the bug reports and feature requests that we can make by clicking on the bug symbol. It's sort of a big black box what happens with them. My colleague from Oslo says that he makes screen dumps to illustrate them, and it would be really nice if we knew what other feature or bug requests had already been made, so that we don't go through the lengthy process of reporting something that someone else has already taken the effort to report; that would save a lot of time. So can there be some sort of way for us to know what bug or feature requests are being made?

Answer: Well, they are read, but not answered.
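A minimal sketch of the double-keying comparison suggested in the remark above: two independent entries of the same record are compared field by field, and only the disagreements go to a controller (field names and values hypothetical):

```python
# Two volunteers independently key the same register row.
entry_a = {"parcel": "451", "owner": "Alois Plunser", "usage": "meadow"}
entry_b = {"parcel": "451", "owner": "Alois Plunzer", "usage": "meadow"}

def disagreements(a: dict, b: dict) -> dict:
    """Return only the fields where the two keyings differ; matching fields
    are accepted automatically, so the controller reviews just the conflicts."""
    return {k: (a.get(k), b.get(k))
            for k in a.keys() | b.keys() if a.get(k) != b.get(k)}

conflicts = disagreements(entry_a, entry_b)
print(conflicts)  # {'owner': ('Alois Plunser', 'Alois Plunzer')} -> pick one
```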
The main reason is that we already answer a lot of questions, and if the developers really had to think about each idea for a feature request, it would be a lot of work. Actually, we are currently thinking a lot about how we are going to manage the community and how we are going to provide information, and there will probably be some sort of forum where you can use the search function and look up what has already been requested, so that you don't have to send your own request. So we are working on that as well.

Moderator: Any other things that anyone wants to raise? No? Everyone's tired and hungry now, right? We've worn you out in a day and a half. In which case I would just like to say thank you: to the organizers of the venue, particularly our folks helping out with the tech, and the caterers, and everyone who has made this a success, and also to Günter and Andy and their teams for drawing us together.

I really have to say thank you to Tamara and to Johanna. They did 90 percent of the work; I was just disturbing them, that was my 10 percent contribution. So really, thank you a lot. And thanks again; I hope you can take something home. We had really a lot of input, and, as always, it feels like you would like to realize everything you got as input within just a few days, but then next Monday will be an ugly Monday again. So, what to do: we have heard a lot, and we will try our best. Thank you.