Good afternoon everyone, and welcome. As the title of this slide suggests, the aim of this talk is to illustrate part of my PhD project, but first I would like to introduce myself and share a few things about me. Here you can see me in the Boboli Gardens in Florence, where I actually live, but I was born in Rome. I graduated in history of art, I am a first-year PhD student, a Renaissance art fan, half British, and a cat lover, owner of two cats.

Going back to my PhD project: its purpose is the identification of the methodologies to be adopted for the creation of a prototype of the Ligorio digital platform, a platform that will host the digital publication of the 18 manuscript volumes, 23 books organized according to the letters of the alphabet, that form Pirro Ligorio's Encyclopedia of the Ancient World, an impressive work that aims to embrace every aspect of the ancient world. It was written between 1568-69 and 1583, when Ligorio was antiquarian at the Este court in Ferrara, and these volumes are now held in the State Archives in Turin.

During the first year of my PhD I focused on experimenting with the use of HTR to provide the most accurate transcription of the text, and following some evaluation Transkribus was selected for this purpose. The first step of this phase was creating the ground-truth materials on which to train the model. After choosing a volume significant for its content, the volume containing all the words starting with the letter R, the decision was to start working on a small set of pages that constitute the entry on Ravenna, a city in Italy: 25 pages in total. This helped define and test the transcription criteria and the correct workflow, which is characterized by four steps: automatic layout analysis, correction of the segmentation, transcription, and correction of the transcription.

I would like to focus on the corrections made after the automatic layout analysis, the segmentation part. Ligorio's text, in fact, is characterized by a complicated structure that differs on most of the pages: vocabulary entries, architecture, drawings, and inscriptions. Here you can see some examples of how I adjusted the segmentation: I divided the titles of the entries from the page numbers and the main text, and I created a region for every Latin and Greek inscription and a region for every element that describes an image. Although it is a time-consuming process, it greatly simplifies the transcription phase and it anticipates work that will be useful for the creation of the digital edition.

Moving on to the next phase of the workflow: the transcription phase required the definition of the transcription criteria. The main goal was to remain as faithful as possible to Ligorio's writing. Some basic criteria were created to serve this goal, such as the normalization of v to u, except in Latin inscriptions; the preservation of the diacritical marks and punctuation as used by the author; the normalization of small caps to capital letters; the tagging of uncertain words with the unclear tag, and of illegible words with three dots and the unclear tag; and the use of the angled dash instead of the hyphen to divide words into syllables at the end of a line.

Some adjustments were necessary, though. As you can see, for example, we had to change the third point: where the use of capitals and small caps could not be distinguished, especially for some letters like the S or the P, the word is transcribed according to the grammatical rules of the Italian language. We also noticed that the diacritical marks were too hard to transcribe for the parts in Greek, so we did not use them there, and we created a Greek tag to mark Greek words and inscriptions, to make them easy to find in the text. Another modification concerned the transcription of some ancient symbols used in the Latin inscriptions.

In the slide you can see, in the upper corner, some of the symbols that were added, using their Unicode code points, to the virtual keyboard in Transkribus by clicking on the add button. The Roman sestertius and the Roman denarius, although visible in the keyboard, were not correctly displayed during transcription, as you can see in the image at lines 33, 35, and 36. Usually these types of symbols can be read by changing the font in a PowerPoint or Word document, using for example the Junicode or the New Athena Unicode font, but in this case it did not work. It was then decided to replace those symbols with others from a Unicode group that Ligorio would never have used in this text, and to train the model on those. In this case, I don't know why, the font on this slide is not right, so you can't really see the symbols, but they were chosen from the astronomical group already present in the Transkribus virtual keyboard. Since it is planned to work on a document in XML format at a later time, these symbols can then easily be replaced with those actually used in the text. This issue has been reported to the Transkribus team, and we look forward to having the opportunity to handle the situation directly in Transkribus.

Having corrected the transcription according to the new criteria, we continued with the transcription of other pages of the volume, pages 1 to 19 and 45 to 77. With 77 pages transcribed, I proceeded to create the first HTR model. Here you can see the settings we used. It was decided to test both the PyLaia HTR and the CITlab HTR+ engine, and as you can see the results were better for the CITlab engine: a CER of 3.17% compared to 4.90%. The first model was then used to automatically transcribe pages 78 to 120, adopting the same workflow I showed before. These 120 pages were then used to create the second model, Ligorio 0.2. Testing both engines, the CER was 3.31% for PyLaia HTR and 2.07% for CITlab HTR+.

This model showed major errors in transcribing the Greek inscriptions and some lines of the Latin inscriptions, which is why a third and final model was created after transcribing pages 121 to 200 using Ligorio 0.2. The slide shows a summary of all the models, including Ligorio 0.3: this final model, although it had a slightly higher CER of 2.22%, solved the inscription problem.

Each methodological step, I don't know how to play this video, it was a video, anyway you can go to the website through the QR code, each methodological step is described on a website specifically created for this project. This website is an operation of methodological transparency, and it allows issues and results to be shared with whoever will work on a similar project, but mainly with the other members of the Ligorio Digital project, of which this project is a part, as you can see on the credits page. Oh, it plays now, sorry. It's in Italian, I'm sorry for this; hopefully we will translate it into other languages.

Going on to the second phase of the project: it foresees the XML encoding of the text, to highlight the complex structure of the volumes, and the creation of what we decided to call intelligent links, capable of activating internal and external connections to enhance the complex stratigraphy of antiquarian sources that characterizes Ligorio's research. In conclusion, Transkribus has proven to be an effective tool to speed up the transcription phase of Ligorio's encyclopedia, but it will be a valuable tool also for the phases to come. Thank you for your attention.

Thank you very much as well, and we also have a little time for questions. Anyone?

I have a question: why did you decide to use the angled dash at the end of the line instead of the hyphen? Because with the hyphen you can search the word that is split between two lines.

No, I didn't know that. For all of this project I was followed by Elisa Bassanello, the digital publication manager, and she helped me a lot in dealing with that; together we decided to use the angled dash. But no, I didn't know that. Maybe it was a critical-edition reason.

The problem is that we have in-word dashes, so if we don't use the angled dash, the one presented in the expert editor, in the end we are bound not to know which words are split at the line end and which words are simply hyphenated, and we would need to run some system to detect what is at the end of the line and separate it; there are some dashes at the end of a line that are not word breaks, I think. So this is a long discussion we are having, and I don't think there is a simple solution, because in post-processing, if they are all plain dashes, then a word with a genuine dash in the middle is not easily recognizable: I cannot simply treat them all as line-break dashes, remove them, and join the words. Unfortunately it doesn't work that way, at least in my experience. But we can discuss this.

Right, I think there was a second question here. So, you said that you had to do a lot of manual work for the layout, in the segmentation. Yes. How are you planning to do that for all the books, for the whole collection? I assume you are not going to do that manually.

Yes, it actually took more time to segment than to transcribe and correct the transcription in Transkribus. I was also correcting the lines, so it took me a lot of time, but now there is also the possibility to divide the text regions and then run the layout analysis on them, and that helps a lot with the correction of the lines within the text region. I hope to train the layout model to recognize all the parts on its own. I tried, but it hasn't worked yet, so maybe with the Transkribus team and the new developments I will manage to do that more quickly.

Perfect, then thank you very much. Thank you. I'm afraid this is all we have time for, because we need to move on to the next category now.
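A brief note on the evaluation figures quoted in the talk (CER values such as 3.17% versus 4.90%): the character error rate is the Levenshtein edit distance between the HTR output and the ground-truth transcription, divided by the length of the ground truth. A minimal sketch in Python, not the exact implementation Transkribus uses:

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance between the HTR output and
    the ground-truth transcription, divided by the reference length."""
    m, n = len(reference), len(hypothesis)
    prev = list(range(n + 1))  # edit distances against an empty reference
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / m if m else 0.0

print(f"{cer('ravenna', 'ravemna'):.2%}")  # one substitution over 7 chars -> 14.29%
```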
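On the sestertius/denarius rendering issue described in the talk: the Roman currency signs live in Unicode's Ancient Symbols block (ROMAN DENARIUS SIGN, U+10196; ROMAN SESTERTIUS SIGN, U+10198), and the workaround was to train with placeholders from another group and swap them back once the text is exported. A sketch of that substitution step over the exported text; the placeholder characters below are hypothetical examples, not the actual astronomical symbols chosen in the project:

```python
# Restore the Roman currency signs (Ancient Symbols block) that could not be
# displayed during transcription and were replaced by placeholder characters.
# The placeholder choices here are hypothetical, not the project's actual ones.
PLACEHOLDER_TO_SYMBOL = {
    "\u2609": "\U00010196",  # SUN                -> ROMAN DENARIUS SIGN
    "\u263D": "\U00010198",  # FIRST QUARTER MOON -> ROMAN SESTERTIUS SIGN
}

def restore_symbols(text: str) -> str:
    """Swap every placeholder back to the symbol actually used in the text."""
    for placeholder, symbol in PLACEHOLDER_TO_SYMBOL.items():
        text = text.replace(placeholder, symbol)
    return text
```

The same mapping can be run over the exported XML files before the encoding phase, which is why the temporary placeholders are harmless.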
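On the angled-dash discussion in the questions: the benefit of reserving a dedicated character for end-of-line word breaks (assumed here to be "¬", U+00AC, a sign commonly used for this in Transkribus projects) is that post-processing can rejoin split words mechanically while leaving genuine in-word hyphens alone. A minimal sketch under that assumption:

```python
def rejoin_lines(lines: list[str]) -> str:
    """Join transcription lines, merging words split at line end with the
    angled dash (U+00AC) while leaving genuine hyphens (-) untouched."""
    text = "\n".join(lines)
    # angled dash followed by a line break: drop both, rejoining the word
    text = text.replace("\u00AC\n", "")
    # remaining line breaks simply become spaces
    return text.replace("\n", " ")

print(rejoin_lines(["anti\u00AC", "quario", "Greco-Romano"]))
# -> antiquario Greco-Romano
```

If plain hyphens were used for both purposes, this step would be ambiguous, which is exactly the problem raised in the discussion.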