The last talk of this session will be given by Stefano Bazzaco from the University of Verona in Italy. It is about the Mambrino project and its experience with Transkribus, and Stefano will be talking about the objectives, workflow, deliverables and future insights of this project. Thank you.
Many thanks to the organizing and scientific committee for having me here. In the following minutes I will talk briefly about the experience we had with the Mambrino project using Transkribus, and I will focus on the way in which we engaged scholars to use Transkribus for their own research projects. I am talking mainly about the southwestern European communities, that is, the Italian and Spanish communities that are now starting to use Transkribus, and I will also talk about some reluctance that I have encountered among philologists towards using it.
This is the structure of the presentation. At the beginning I will talk briefly about the Mambrino project: the objectives and aims that we set for it and the achievements and results that we obtained. Then I will focus on the automatic transcription of Italian and Spanish historical printed documents, and in that way also enter the core concept of the entire session, namely how Transkribus can meet academia. So I will talk briefly about our link to Transkribus; about what we have done in terms of dissemination, that is, how we engaged research groups and how we managed to run transcription workshops with other scholars; then about the deliverables that came out of those workshops and that work; and finally about future insights for the whole project.
Let's start with the presentation of the Mambrino project. It is a project that started in 2008 and is directed by Anna Bognolo and Stefano Neri, as you can see. The team is composed of different scholars, mainly philologists but also DH specialists.
The partner institutions are also varied: we have contacts with libraries, with DH centers and, of course, with scholars and research groups.
So, what are our objectives? The Mambrino project deals with Spanish and Italian chivalric fiction, the so-called libros de caballerías in Spanish, that is, books of chivalry. They spread across Europe in printed editions, and they are interesting because, even though they have been excluded from the canon, they tell us something about the experimental moment of the novel in the modern period. So they are not well studied, but they are really interesting for the definition of the novel during that period. Our objectives started with the census of the preserved copies of Italian romances of chivalry, which are books derived from the Spanish ones, either as translations or as original spin-offs. Then we started some digitization programs with European and worldwide libraries. And then I will talk a little bit about the creation of a digital library of Italian romances of chivalry, which is in production and which we expect to publish, I guess, in June 2023.
Our object of study, as you can see, is composed of a lot of books. We are talking about more than 50 books, among translations of the Spanish books and original spin-offs or adaptations of those books in Italian. So our object of study is very vast, and for now we are concentrating on the two main cycles, the Amadís cycle and the Palmerín cycle, which are the most famous in Spain.
Our scientific research has produced more than 50 scientific articles on the subject.
We also maintain an annual journal called Historias Fingidas, and we have published two repertories, one on the Amadís de Gaula cycle and one on the Palmerín cycle, which recount the contents of those books with summaries and name indexes, and also a bibliographic description of the whole corpus. In addition, together with other institutions we carried out a census of the editions preserved in libraries and archives. We counted 358 editions, a number that has since increased to something like 400 editions, and we registered a lot of copies of those editions. So we are talking about a literature that spread across Europe and to other countries, also in America, during the Renaissance.
Starting from the census, we also thought about launching some digitization programs. At the beginning we experimented with local libraries, like the Civic Library of Verona, and that collaboration produced, as you can see, a box of DVDs, which in 2010 was a good technical solution but now seems a thing of the past. Then, starting from 2018-2019, we also established collaborations with, for example, the Biblioteca Nacional de España, other Spanish libraries and the Marciana Library in Venice, and we are hoping to establish further digitization programs. With the Biblioteca Nacional de España and the Marciana Library, for example, we did not do the digitization ourselves, but we encouraged them to rediscover that literature, those books. So, in a way, we are enhancing knowledge of the whole corpus that we are studying.
The final result will be the publication of the Mambrino project digital library. As you can see, it is based on a TEI Publisher visualization, and we have integrated in a synoptic view everything that we have studied and done during those years.
On the left you can see the IIIF image, which can be zoomed inside the viewer. In the central part of the TEI Publisher viewer you can see the edited transcription: we started by using Transkribus and then corrected the resulting transcriptions. And on the right side you can see the chapter-by-chapter summary of each book, with the named entities inside it. We also have to add some motif-index analysis to that part of the edition. So that is what the final result should be.
But we started with the automatic transcription of Italian historical printed documents. Our workflow involved digitization as the first step, followed by the automatic transcription. Then we started to model our transcriptions in XML-TEI and to publish them.
We faced some problems in the transcription of those books because of the characteristics of the corpus. We are talking about printed books, mainly published in Venice between 1530 and 1580. The script we dealt with was italic, derived from Manuzio's italics, so we had special characters, ligatures and abbreviations inside the text. And we had to face two other problems. The first is the format: they are in octavo, so they are pocket books, which implies that most of the copies we have are in bad condition. They were read in places like church steps, where people read them aloud to an audience, so they are stained and damaged in some way. The second is the length of those books, which meant that manual transcription was not viable: there were something like 1000 pages in each book. So it was a problem for us.
So we started using Transkribus and, as you can see, the results were very exciting for us. From the beginning, with italics, we obtained a character error rate of less than 1% for each book.
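For reference, the character error rate mentioned here is usually computed as the edit distance between the automatic transcription and a manually corrected reference, divided by the length of the reference. The following is a minimal Python sketch under that assumption; the function names and the sample strings are illustrative only and are not taken from the project or from Transkribus.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions and substitutions
    needed to turn `hypothesis` into `reference`."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance divided by the number of characters in the reference."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical sample: one wrong character in a 15-character reference.
    ground_truth = "amadis de gaula"
    automatic = "amadis de gavla"
    print(f"CER: {character_error_rate(ground_truth, automatic):.2%}")
```

On this reading, a rate below 1% means fewer than one wrong character in every hundred characters of the corrected reference.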
Where you see that the rate goes above 1%, it is because the digitization was very, very bad: they were something like Google Books digitizations, in black and white and without enhancing the contrast in any way. So for printed books Transkribus worked very well. Those initial results also led us to experiment with the Spanish Gothic script and, as you can see, the results are quite similar, even if the Gothic script is in some cases tougher to recognize. And also with the round script, the Roman script, in the Spanish language.
Encouraged by those results, we started to strengthen our relationship with READ-COOP and Transkribus, and we started to imagine in what way scholars could be engaged in the use of Transkribus. We faced a small problem at the beginning, because of the text recognition software that was produced in the 90s: that software and its bad results produced something like a bias in the humanist community against OCR and HTR software. The prejudices were of different kinds, but they told us that such software was not sufficiently reliable. This sharpened the distinction between clean manual transcription and OCR. There was also the fact that OCR and HTR software at the very beginning required a high level of expertise. We wanted to correct this wrong perception, because our experience with Transkribus was completely different.
So in 2018 (sorry, there is an error here) we signed a memorandum of understanding with Günter Mühlberger when he came to Verona. Then, in September 2020, I became a Transkribus trainer, a Transkribus ambassador. In these recent years, from 2018, I have given something like 17 workshops, with three more coming this year, 15 seminar presentations and free consultancies. And the very good thing is that I was invited to collaborate with four or more international projects.
That also gave me the possibility to publish some public extended models for the scripts that we dealt with.
How does the dissemination and the engagement of scholars work? Mainly in two ways. On the one hand, I started to present my research on Transkribus, the results that I obtained, at seminars and various congresses. By talking about Transkribus in those settings I not only increased scholars' knowledge of automated transcription and the uses of HTR, but it also worked well for resolving, or reassuring people about, the technical biases that are still quite often a matter of fact in the humanities. I suggested future improvements, mainly by going deep into the specific requests of the audience I had in front of me. And it enhanced collaboration: people who listened to those presentations asked me something, and we established a cooperation between research groups.
On the other hand, workshops were the best way to engage scholars and researchers, because there they could see the results concretely. I wanted to get scholars more familiar with the Transkribus pipeline. But I also organized some crowdsourcing projects, mediated crowdsourcing, because they involved specialists and well-trained scholars, and I also showed some related technologies, as you can see here for example.
Starting from that, I will propose two collaborative models to you, although I am willing to listen to other fruitful ones. The first is a research workflow model, which begins with giving seminars and short presentations. At the beginning I organized a seminar to teach people how to use Transkribus and how it can be applied to research. Then the people who participated in the course, who were scholars and specialists, took part five months later in a large congress that we held in Verona.
All of them worked during those five months, coordinated by me, on some projects, and during that congress they presented results on, for example, the limits of layout analysis, how to transcribe and how to choose better editorial models, and also something about the post-processing of the results we obtain. It was fruitful research, because the proceedings of that congress were published in our journal, Historias Fingidas, in its first special issue, which was published in June 2022. It is this one, Humanidades digitales y estudios literarios hispánicos, which I edited with other collaborators.
Another workflow model that we can discuss is the way in which I engage scholars by giving a workshop, for example the one I gave over Zoom to the UNAM, the National University of Mexico. In that case I gave something like 20 hours of theoretical and practical presentation of Transkribus. We transcribed a manuscript in that case, a book of chivalry that had never been transcribed before. The students and PhD scholars who took part in that special training transcribed a lot of pages on their own; they worked on their own inside Transkribus. So at the end I engaged them further with the publication of a model for that specific manuscript. In a way, the publication is a reward for the people who took part in the course and produced their own transcriptions. It works really well, and I guess that in other cases I will go further with that.
So, just to go a bit faster. Yeah. The deliverables that I produced during those years. I produced two extended models for 15th- and 17th-century Spanish scripts. We have the Gothic script on the one hand. As you can see, there are a lot of authors who contributed to it.
The model can be seen on the Transkribus website as a public model with a brief description, then on my GitHub page with a more detailed description, and the data set is stored in Zenodo with restricted access. People can request it, and we can also use that to engage and collaborate with other partners.
Then there is the Spanish redonda, the round script, from the 16th and 17th centuries. As you can see, here too we produced a public model. The CER is very low, as these are printed books, but that model also involved short texts, something like the pliegos sueltos of Spanish culture. They were something like four, five or six pages each, so not much data, but data well suited to interpreting other kinds of material, because they were very damaged or digitized in bad condition. So it works really well for those scripts. Here there are some drop caps and some dashes at the end of the line, so it is really working well. The data set is also in Zenodo, as I told you.
Then we are producing the italics model for the books printed in Venice, mainly in the 16th and 17th centuries. For that we involved different partners, as you can see, the Universities of Verona, Padua and Rome La Sapienza, and we are working together to produce that model and publish it. And there is also our two-column P2PaLA model, with heading and two-column detection. It worked really well for our books, which in Spain were printed in folio in two columns. We had to solve that problem, because otherwise humanists would not agree to use Transkribus.
So, let's go on and conclude with future insights. Our extended models will be updated, hopefully every six months or every year. The model updating works like this.
We involve new transcribers and new scholars who want to transcribe, we include new projects and institutions, and through their work I will update those models constantly. I also want to combine those extended models in the end, so that multi-font documents can be transcribed as well. I also want to increase scholars' interaction by setting up a cross-cutting research field for them, to coordinate some crowdsourcing groups, and to set guidelines to simplify the transcription workflow.
What I am also aiming to do is to correct, in a semi-automated or automated way, the transcripts derived from our models. So we are working on batch transformation with some software, for example Python, but also on the systematic correction of words and expressions. For example, in this image you can see the use of a software by José Manuel Fradejas, who is a partner of ours and is experimenting with an NLP tool, integrating historical dictionaries in order to correct the transcription or to get an insight into which words in it may be wrong.
Then the idea is to develop a sustainable and open-access publishing pipeline. We start with the digitization; then we transcribe the books within the process and go on with automatic correction. Then we can model them and use some TEI editorial models, for example diplomatic, normalized or others, to make it as simple as possible for scholars to use the resulting transcriptions. Then we customize the visualization in TEI Publisher, and then integrate linked open data into the final edition, the final result.
Our last goal is to create something like a working space for that. So we are working on the creation of a project that we call the Type Case project, referring to the boxes in which types were kept during the Renaissance.
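The dictionary-based checking described in this part of the talk can be illustrated with a short Python sketch. It is not the tool by José Manuel Fradejas, whose internals are not described here. It only assumes a small word list standing in for a historical dictionary, and it uses the standard library's `difflib` to flag transcribed words that are missing from that list and to suggest close matches. The function name, the word list and the similarity cutoff are all hypothetical.

```python
import difflib
import re


def flag_suspect_words(text, lexicon, cutoff=0.8):
    """Return (word, suggestions) pairs for words that are not in the lexicon.

    `lexicon` is a set of lower-case word forms standing in for a historical
    dictionary (an assumption of this sketch). `cutoff` is the minimum
    similarity ratio, between 0 and 1, for a lexicon entry to be suggested.
    """
    candidates = sorted(lexicon)  # fixed order keeps the output deterministic
    suspects = []
    for word in re.findall(r"\w+", text.lower()):
        if word in lexicon:
            continue
        suggestions = difflib.get_close_matches(word, candidates, n=3, cutoff=cutoff)
        suspects.append((word, suggestions))
    return suspects


if __name__ == "__main__":
    # Hypothetical lexicon and a transcription with one misread character.
    lexicon = {"el", "cavallero", "amadis", "de", "gaula"}
    transcription = "El cavallcro Amadis de Gaula"
    for word, suggestions in flag_suspect_words(transcription, lexicon):
        print(word, "->", suggestions or "no suggestion")
```

A sketch like this only flags candidates. With historical spelling variation, many flagged words will be legitimate forms, so the output is better treated as a list for a human corrector than as automatic replacements.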
We are focused mainly on the text recognition of printed documents of the modern age. The objective is for this site to work as an environment for interaction between scholars, but also as a blog to present new releases of data sets and models, and some information about future events and workshops. So I guess that in June 2023 we will be able to publish this site and also our final digital library project. That's all for now. Thank you.
Thank you very much for your talk, Stefano. Very interesting. You can see it's not just about handwriting, but also prints, especially prints that are difficult for classic OCR to read. This is one of the areas where Transkribus comes in very handy too. So, are there any questions?
Thank you for your talk, very, very interesting. The script of your rotunda is very similar to some scripts printed in French at the same time. So I was wondering if we can merge our French model with your Italian or Spanish one, because I realized that it was the same, just like what you showed. Can we merge the fonts together?
Yeah, I guess that there were many similar fonts during that period, and I guess you're right. At the beginning I was wondering whether separating by language, and maybe training the models to create a specific model for each printer, was the best solution. But in my experience, creating extended models gives much more confident and reliable results than using separate individual models. So yes, I guess that once the model is well refined... Now I am publishing version two of the two models for Gothic script and round script. So yes, we can merge them quickly, I guess.
So, for example, in French. You know, it can also serve us, in French or Italian or Romanian.
Yeah, I guess that it works well. I was quite afraid of the language issue: at the beginning I thought that a model for Italian script surely works better with Italian printed copies. Maybe yes, but the extended model turned out to be the solution for most of us. So yes, thank you for your suggestion, we will.
I think this is especially the case with printed material, because you can create large amounts of very good ground truth very quickly there: the CER that you start with is already very low, so you can correct a lot of material quite quickly. The language model is quite robust when it comes to mixing various languages, so for the language model it is just like one bigger language, so to speak.
I guess that the drop caps, or litterae notabiliores as we call them now, spread across the whole of Europe. So in some cases maybe some French script was really the type that we needed in order to recognize them inside a Spanish text. That's the thing, because type circulated across Europe.
So, one more question.
And this might also serve people who want to reconstruct the circulation of...
I think it's just not loud enough. Up a little bit. I think it was just not loud enough. Okay.
The kind of data that you're producing can also serve to look at the circulation of type in Europe, eventually.
I was thinking about the keyword spotting function, which for now seems an unusual choice, but it could reveal something like that. That is to say, the representations of capital letters could be exported in some way with the keyword spotting function, and then we can see the different representations across all the works at once. Yes, I guess that it can also improve our knowledge of those. But knowing how that material moved is quite tough: I don't know, some Spanish printer goes to.
So maybe they combine or mix it with others, and it's quite tough, but I guess that we can use it.
Yeah, computer scientists will do it.
Yeah, we have to start from something, I guess.
Okay, so I think we need to wrap up and get the lunch break started. See you all again soon. Thanks for attending this panel, and see you later.