So let's not lose any more time. There will be more people joining, but we will start. Good afternoon, everyone, or maybe good morning, depending on where you are in the world. We're very happy to welcome you to today's webinar about Transcribuz. My name is Flo, and I mainly work in operations here with the Transcribuz team. Together with my colleagues Sarah and Helena, we will go through the Transcribuz web interface with you today and explain everything you need to know to use Transcribuz on the web. So let's have a quick look at what we will talk about today. First, we will look at what Transcribuz is and who is behind it, so the cooperative behind Transcribuz. Then we will look at the new Transcribuz interface, go through the workflow and how to use Transcribuz in the web app, and also how to train your own custom AI models. Finally, there's a small section about the new upcoming subscription plans. And of course, there's also room for questions. Questions is already a good keyword: if any questions come up during the webinar, just enter them in the chat. We will do our best to address them all, either directly in the chat during the webinar or at the end, depending on how many there are. And if questions remain unresolved, we can of course also answer them by email afterwards. So, a quick introduction to Transcribuz and who is behind it. First, what is Transcribuz? Since you are here, you probably already know at least the name, and we also have a little definition to keep in mind what Transcribuz is.
We call it an AI-powered platform designed to simplify the time-consuming work with historical documents, to give you the tools that make working with those documents fun. Then a quick look at who is behind Transcribuz. Transcribuz is owned by a cooperative, a European cooperative based in Innsbruck, Austria. We are more than 100 members who own this cooperative. Everybody can take part, including private members; we have a lot of private members who hold a small share in Transcribuz. So it's open to everyone: you can become part of Transcribuz, and a co-owner of Transcribuz, by joining the cooperative. And obviously there are also a lot of institutions, universities, archives, national archives and others, that work with Transcribuz as member institutions. But there's more to it: there's a whole community behind Transcribuz, now about 160,000 users worldwide who are already using Transcribuz. Here you can see a nice image of the Transcribuz user conference last year. Last September we had the user conference, and there will be another one in February, again in Innsbruck. It's a really nice event where we come together with all our users. We will have it in person in Innsbruck but also online, so it's a hybrid conference, and it's a really nice opportunity to connect and talk about Transcribuz. Then a quick look at the ecosystem. So what is Transcribuz in general? We understand Transcribuz as more than a simple platform; it has already become an ecosystem with basically three major components. If you're working with historical material, there's a full chain of things you need to do in order to make those valuable sources available. In terms of the Transcribuz ecosystem, we have something for the very beginning.
When you start with historical documents, you obviously need to digitize them first. What we have come up with is a portable scanning solution called the ScanTent, a little tent that you can set up to scan with your smartphone. It gives you a controlled lighting situation: you put your material in and scan with your smartphone. There's also an app that comes with it, the DocScan app, which automatically recognizes when you turn the page and then takes the next page image automatically. Then obviously there's the biggest part of the Transcribuz ecosystem, the Transcribuz platform, which has a lot of different functions. On the one hand, you can manually transcribe, collect data and annotate, but most importantly, you can train your own custom AI models with Transcribuz and then use those AI models for AI-based recognition. There are now four different types of AI models that you can train with Transcribuz. It used to be text recognition models only, but we have just recently introduced two more model types, which I will explain in more detail later on, called field and table models. And there's also the baseline model that you can train for recognizing the text lines. And then eventually there's also Transcribuz Sites, which we are launching in December and which is already available in beta. It's a solution that gives you the opportunity to share your Transcribuz collection with the world: you set up your own website where you can make your historical material searchable and available to everyone. But what does Transcribuz make possible? First, as you know, the most important component of Transcribuz is the manual, but also automatic, transcription of handwriting and printed text.
That's the key functionality that most of you might already know: you can use already available AI models, but also your own custom-made AI models, to transcribe text. Besides recognition and model training, there are also collaboration features as another key component: you can share your collections and work together in them, which is a great opportunity for crowdsourcing projects, for instance. And obviously, since you can recognize the full text of your material, you can also search in your documents with Transcribuz. Then there's also tagging, so annotation: you can enrich your material with Transcribuz, annotate it with metadata, and also annotate the structure of your documents. And eventually there's a whole range of export formats, so you can get your material out of Transcribuz again and use it for whatever you like. And now I would like to hand over to my colleague Helena, who will walk you through the next generation, the new Transcribuz Web App, which has now been live for about one and a half months. Thank you, Flo. So as Flo mentioned, Transcribuz switched to the Transcribuz Web App a few months ago. The new app combines the many different features of the expert client with a new easy-to-use interface. Sorry, we just had a short freeze; let's go ahead. The expert client on the left has many different features, but is difficult to use, especially for beginners, because of the complex interface. The new web-based Transcribuz platform has a clear design that is easy to use and also easier to learn, so we now have a platform that is user-friendly and easy to navigate. For the web app, no download is required.
You can simply log in from anywhere with your personal account data to work on your documents. The important functions of the expert client are still being added step by step to the web app, but new functions will only be available in the web app. You can find it at the link we have here on the slide, app.transcribuz.eu. It is available in 12 languages, and data protection is very important to us: all data is stored on our own servers in Innsbruck, Austria. In the new web interface, we have four main workspaces, which you can also see here, and the approach, as with the Transcribuz app in general, is quite holistic. The first workspace is the Transcribuz Desk. This is where the work is done, where documents are uploaded and sorted into collections, and where the pages are transcribed and searched. From the Desk, you have access to the documents and of course also to the other workspaces. The second workspace is the model workspace, and this is where the Transcribuz magic takes place: this is where the models are trained and managed. We can take a closer look later, but we have really redesigned this workspace. This is the area with an overview of the public models and also your personal custom models. Then, what's new, which Flo also already mentioned, is Transcribuz Sites, the third main workspace. You can think of it as a content management system where you can publish your material simply and easily as a website. And of course you can also personalize this website with your colors and your images, and in this way share your work and your transcribed documents with the public. But we will talk about that more later. We also have something exciting planned for next year, Transcribuz Connect, which we've already teased.
We recently cracked the 150,000 user mark, and we want to give our users a better opportunity to network with each other; Transcribuz Connect will be a better way to connect with like-minded people. So now that we have looked at why the switch to the Transcribuz app was made and how we structured the platform, we will take a closer look at the basics, and I will explain the basic keywords: how Transcribuz manages to go from image to machine-readable text. This makes it easier to work with historical documents and their content. So how does this work? If we move to the next slide, we can see that Transcribuz uses HTR, which means handwritten text recognition. HTR technology can transform images of handwriting or printed text into text that is readable and searchable, and this text we can then also export in various formats. In the field of automatic text recognition, however, there is not only HTR but also OCR. So we have handwritten text recognition and optical character recognition. In principle, both of these methods transform images of text into machine-readable text, but there are some important differences to keep in mind. OCR can only transcribe printed or typed text, while HTR can transcribe typewritten and printed as well as handwritten text. There is also a difference in the way these technologies work: while OCR focuses on single characters, HTR can process the entire line. And one major benefit of HTR is that it is possible to train a custom AI model that works specifically with the handwriting style or the printed text that you have, so you can really customize it. So what happens when a page is recognized, when we use this HTR technology of Transcribuz? When we start a text recognition, there are actually two steps, even if we click on just one button. The first is the layout recognition, which recognizes the lines and the text regions, and the second step is the actual text recognition.
So that an image can be transcribed correctly, first the layout is analyzed and the document is segmented into text regions and into lines, as we can see on the next slide. You can see on the left side that the text region is a box that encloses all the handwritten text contained in the image. And the baseline on the right is the most important reference point for text recognition: a polyline that runs along the bottom of the handwritten line of text. This is really important so that Transcribuz, or the text recognition model, knows where the text is and where the lines are, and can recognize the text correctly. If you run the documents through the standard text recognition, then both layout and text recognition happen. However, as we can see on the next slide, you can also select text recognition and layout recognition as two separate recognition steps, which can be very useful and helpful, especially when you have a complex layout, for example. So now that we have seen how the text recognition process works, I would like to explain how documents are managed in the Transcribuz platform. The first level in our content management system is collections. These are folders in which documents are stored; usually one collection is one project, and each collection has a name and an ID. In the collection, you find the documents, and you can upload as many documents as you would like. At this point it is also important to say that the Transcribuz content management system has no subfolders, only collections and documents. Within these documents are the images. These images are pictures of the pages of the respective documents. So one document can contain one page, for example if it's a letter, but if it's a book, it can also contain hundreds of pages.
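The regions and baselines described here end up in the PAGE XML files that Transcribuz exports (the export section later mentions PAGE XML). As a rough, hedged illustration of what that data looks like, here is a sketch that reads text regions, baselines and line text from a simplified, hand-written PAGE-XML-like snippet using only Python's standard library; the sample XML and field names are illustrative, not an exact Transcribuz export.

```python
import xml.etree.ElementTree as ET

# Simplified PAGE-XML-like sample: one text region containing one line
# with a baseline polyline and its transcription (illustrative only).
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_001.jpg" imageWidth="1200" imageHeight="1800">
    <TextRegion id="r1">
      <Coords points="100,100 1100,100 1100,400 100,400"/>
      <TextLine id="r1l1">
        <Baseline points="120,380 1080,385"/>
        <TextEquiv><Unicode>Dear Sir, I write to inform you</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""

NS = {"p": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15"}

def parse_points(points: str) -> list:
    """Turn a PAGE 'x,y x,y ...' points string into (x, y) integer pairs."""
    return [tuple(map(int, pair.split(","))) for pair in points.split()]

regions = []
root = ET.fromstring(SAMPLE)
for region in root.iterfind(".//p:TextRegion", NS):
    lines = []
    for line in region.iterfind("p:TextLine", NS):
        lines.append({
            "baseline": parse_points(line.find("p:Baseline", NS).get("points")),
            "text": line.find("p:TextEquiv/p:Unicode", NS).text,
        })
    regions.append({
        "id": region.get("id"),
        "coords": parse_points(region.find("p:Coords", NS).get("points")),
        "lines": lines,
    })
```

The region box corresponds to the enclosing box described above, and the baseline polyline is the reference the recognition model follows along the bottom of each line of text.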
So it is important to understand this structure, how pages, documents, and collections or projects are organized, so that you can plan and structure your own work most efficiently. We also have new options for managing documents. It is now possible to create shortcuts to other collections. When you create a shortcut, the document stays in its main collection, but it also appears linked in other collections and can be accessed from there. The documents can also be moved, copied, and deleted. If you delete a document by mistake, you can go into the recycle bin and either restore the document, if it was an accident, or delete it permanently from there. So now I think we'll take a closer look at how managing documents looks in the Transcribuz app itself. You can take a look alongside us: go again to app.transcribuz.eu, as you can also see here, and log in with your account details. So in the Transcribuz web app, this is the landing page. This is Home in the Transcribuz Desk workspace; as you can see in the top right, we're now in the Desk workspace. In Home, you can see your recent collections and your recent documents. If you then move to Collections, which is right beside Home, you can see your personal collections or the collections other Transcribuz users have shared with you. You can also switch the view from thumbnails to a table view, which we can see now. You can edit the information of your collection by clicking on the three dots in the right corner. Of course, in Collections you can also add a new collection, as you can see here. When you open a collection, you will see the documents inside. Here you can see the thumbnails of the documents, and this is also where you can manage your documents.
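The collection, document, page hierarchy and the shortcut behavior just described (the document keeps living in its main collection while appearing in others) can be sketched as a small data model. This is my own minimal illustration of the idea, not Transcribuz's internal implementation; all names are made up.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    doc_id: int
    title: str
    pages: list = field(default_factory=list)  # page image filenames

@dataclass
class Collection:
    coll_id: int
    name: str
    documents: list = field(default_factory=list)   # documents stored here
    shortcuts: list = field(default_factory=list)   # links to documents elsewhere

    def visible_documents(self):
        """Everything you see when opening the collection: own docs plus shortcuts."""
        return self.documents + self.shortcuts

letters = Collection(1, "Letters")
project_x = Collection(2, "Project X")

diary = Document(101, "Letter book 1870", pages=["p001.jpg", "p002.jpg"])
letters.documents.append(diary)      # stored in its main collection
project_x.shortcuts.append(diary)    # shortcut: same object, no copy made
```

The key point mirrored here is that a shortcut references the same document object, so edits made via either collection affect the one and only stored document.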
You can sort your documents alphabetically or by date, newest first or oldest first, depending on how you prefer to have your documents sorted and shown. Again, you can also switch between table view and thumbnail view. Here you can also edit the metadata of your documents: click on Edit and you can add the author, writer, genre, or languages. You don't have to add the metadata, but it can be quite useful to have that information on hand if you'd like. In documents, you can also create shortcuts, which I mentioned briefly before. They're basically a soft link to another collection: you have a shortcut to the document from another collection, but the document stays in its main collection. When you open a document, you can see the images of the pages. Here we see them again in a thumbnail view, which of course you can also change. You can also adjust the size of the thumbnails, making them larger or smaller. And one nice thing here is that you can also filter them by status, so you can see the different statuses, like in progress or done, which are also indicated by the colors of the documents. The status basically shows you at what point the transcription process is: gray indicates, for example, newly uploaded documents, orange means that transcription is in progress, done means that the transcription is finished, and ground truth, which is a dark green, means that the transcription is correct and you can also use these pages to train a model; my colleagues will explain more about that later. We can also have a quick look at what it looks like when we open a page or an image. This is the Transcribuz editor. On the left side, we have the layout editor with the image itself, which you can also edit, and here, exactly here on the right side, you can find the guide with the shortcuts for editing.
And on the right side of the Transcribuz editor, we have the text editor. Here is the recognized text; you can also write the transcriptions yourself and edit the transcription. And there is also the option, exactly, we can see my colleague showing how to add a transcription, and we will have a closer look at that later on. So we have looked at the structure of Transcribuz, where to find the collections and where to find the pages. Now let's take a closer look at how you can upload and work with your own documents. The process of uploading and recognizing takes place again in the Desk workspace; this is where the work happens. What we'll explain here is how we get from the uploaded image itself to machine-readable text. The first step in this workflow is to upload the document. After that, we check whether a suitable public model already exists for the script or the language that we're working with. If there is no suitable model, or if the public model doesn't work well, you can train your own custom model. If there is a suitable model, you can simply use it; depending on the extent of the material, the actual transcription process might take a while. We will now take a closer look at how that looks. So assuming that we have a suitable model, what happens next? Again, we have to upload a document. To do this, you can go into a collection, or create a new collection, and then you see directly that you can upload documents: either browse or drag and drop them into the file upload. You can give them a name if you want and then click on Submit to submit the pages. So the upload is in progress. We can see this takes a bit; we have a few pages, but it is quite fast. And when we have uploaded the documents, we are able to start the text recognition process with a public model. We'll take a closer look at how this looks now.
We have an overview of the job in the job queue. So we see that the document is created, that the job is running, and it's already finished. So what we can do now is select the pages that you want recognized and click on Recognize. This opens a new window. We already have a total of over 150, 160 public models available, ranging from German Kurrent to Carolingian minuscule or even Ottoman Turkish. You can then filter these public models by language, for example. The language of our document here is English, so we can select an English model, or filter by English models. You can also select whether the material is handwritten or printed, and the timeframe, on the left side. Then you can see the public models that are available, and if you click on a model, you also see further information on the right side. So let's take a look at the English Eagle, for example. You can see more information about what language it covers, the training set size, and also the CER, the character error rate. The character error rate shows the percentage of characters that were transcribed incorrectly by the text recognition model. So for example, an error rate of 5% means that, compared to the manual transcription, so the correctly transcribed pages, five out of a hundred characters were transcribed incorrectly. The performance of a model depends on a few factors, for example the training material that the model was trained on. If you want to make the model more reliable or increase its performance, you can improve the results by clicking on, for example, the language model option. Language model means that not only the visual text is taken into account, but also the language in which the text is written. Then you can click on Start Text Recognition to start the recognition process.
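The character error rate described here is commonly computed as the edit distance between the model's output and the correct reference transcription, divided by the reference length. Here is a minimal sketch of that calculation (a plain Levenshtein distance; real evaluation pipelines may normalize slightly differently):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edits needed, relative to the reference length."""
    return levenshtein(reference, hypothesis) / len(reference)
```

With this definition, a CER of 0.05 matches the example above: on average five out of a hundred characters need correcting against the ground-truth transcription.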
Again, you can check the status in the jobs queue and see how long it might take. Recognizing the pages might take a bit of time depending on the size of the material. Once it's done, you can just open or reload the page and then you should be able to see the transcription. We have looked at the editor before: we have the left side of the editor with the image, and on the right side the transcription should appear. It's already running. Very nice. Depending on the amount of pages, you can also be faster: smaller jobs below five pages, for instance, go through a fast lane, so you don't need to wait too long. Maybe in the meantime we can take a look at... Ah, perfect. Should we take a look at the supermodels in the meantime? Yeah. Perfect. So one new technology that we have is supermodels, and what is new and impressive about them is that they are very large and generally very versatile models. This means that you can use one model for different scripts and different languages, and also for mixed material. Usually you have to really look at what language and what material a model was made for, and then find a public model that is as close to your material as possible. But the great thing about our supermodels is that they are very versatile in use. The first supermodel is already available: the Text Titan. It has been trained on six languages, and as you saw, you can already select it. Sorry, I just started the recognition, so we can maybe compare the results from the Titan afterwards. That's a good idea. Yeah, thank you. So it's running in the meantime. Perfect. So I think we've covered the basics now of how to upload documents and how to use a public text recognition model. Maybe we can see the results of the recognition. So again, here you see on the left side the image in the layout editor, and on the right side you have the transcription.
Some things need to be corrected and edited, but the basic workflow has been shown, and in the results you can see quite well how it should work. Up here you can always see which model the page was recognized with, but the other recognition is probably still running, so maybe we can come back to it later. With the Text Titan there's a longer queue, so it will take a little longer until that page is processed. All right, so I think now we've gone through a lot of information about how the new web app is structured, where you can find your documents and your collections, and how to use public text recognition models. I will now hand over to my colleague Sarah, who will explain how you can work with the Transcribuz editor. In the meantime, just a little comment: we have more slides prepared and we will share them with you, so everything that we looked at now is also in the slides. I will just briefly go through them, and then you can also have the slides afterwards. And now our colleague Sarah will continue. Thank you, Flo. Thank you, Helena. Okay, now we will have a look at how to enhance our transcription, whether automatic or manual, within Transcribuz, and how to work with the editor and add both structural and textual tags. Next slide, please. Okay, we have done our text recognition, so now we want to move to the tagging part. You don't have to tag the documents if you don't need to, but this is a feature you can use if you want to enrich your transcriptions and then export the tags in various formats. For instance, you can export an Excel sheet with all the tags, you can add properties to the tags, and you can also export a TEI file with all the tags in it. We can have a look inside Transcribuz so I can show you directly within the platform. So there are two different types of tags. I will just choose another collection; there, it works.
Yes, so the first type is the structural tag, and we use that mainly to highlight a part of the image and add the tags there. So you see here, this is a page. You can delete this text region and start from scratch. Here at the top, we have this declaration; I can add a text region there. I can do the same for the title, the capital letter here, and then the different paragraphs. After that, we can assign structural tags, like here: we have the tag paragraph, the tag initial here, and here we can add the tag heading or title, and so on. We can also customize the tags here. I don't see it on the screen, probably. Yes, thank you. Here under Tags, you can decide which tags you want to see, and you can also customize them, so you can add tags based on the type of documents that you are using. So now Flo enabled more tags, and when you refresh the page, you will see them up here among the options. Oh no, they're already there. So you can have another tag here. And how are these tags useful? The first option is that in this way you can apply different recognition models to different regions of your page. So let's imagine that you have a printed text with some handwritten marginalia. You can tag the handwritten marginalia as manuscript and tag the main text as printed paragraphs, and then you can apply different models to the different sections of the image. Or you can also use it to export information about your document, or for information extraction, as we will see later with Flo. May I just quickly show something, because there's a question, and it's not perfectly intuitive how to use the region tool. If you click and then try to draw the region, the page moves. We will change that with the new editor that will be released soon. But to draw regions, just click once and then you can pull the region open.
So that's maybe not perfectly intuitive, but you need to click once first to set the first corner point, and then you open the region; that question was in the chat. Sorry, Sarah, you can continue. No problem. With a new export feature that we will introduce, you will also be able to export this information in a structured way: you will have the tag in an Excel sheet, the tag and the text contained in that tag. Especially for information extraction, this is very helpful. And now we also have a feature to train a field model. Flo will explain that later, but just to show you an example: here in this index card, for instance, you can tag the different pieces of information with different structural tags and train a model to do that for you, as Flo will show us later. The second type of tag is the textual tag. Here you tag the text, not the structure. So we are here; this is a war diary, and you see I have the automatic transcription and I can tag the text here. There is a predefined set of tags: if I click here, you see the abbreviation, date and person tags appear, but we can also configure the tags and enable the place tag, and we have to click Save. You can also customize the tags to your needs and add properties to the tags. In this case, we have the Yorika station. We select the word, we click Place, and you see I can write the place, Yorika. Or there's also this nice feature where you can directly select the Wikidata ID. In this case, Transcribuz automatically proposes the Yorika railway station, which is correct, so I can select the Wikidata ID linked to this place; or you can just type Yorika railway station and see the other options. And as you can see here, the Wikidata ID is stored, and you can do the same for dates or persons that have a Wikidata ID. For example, here with March 16, we can tag the date and write it here below.
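Textual tags like place, person, or date are essentially labeled spans over the transcribed text, optionally carrying properties such as a linked ID. Here is a rough sketch of that idea; the class, field names and the Wikidata ID are my own illustration, not Transcribuz's internal format.

```python
from dataclasses import dataclass, field

@dataclass
class TextTag:
    kind: str        # e.g. "place", "person", "date"
    offset: int      # start position within the line's text
    length: int      # number of characters covered
    properties: dict = field(default_factory=dict)  # e.g. a linked ID

line_text = "We reached Yorika station on March 16"
tags = [
    # "Yorika station" starts at character 11 and is 14 characters long
    TextTag("place", 11, 14, {"wikidata": "Q123"}),   # Q123 is a made-up ID
    # "March 16" starts at character 29 and is 8 characters long
    TextTag("date", 29, 8, {"when": "03-16"}),
]

def tagged_text(text: str, tag: TextTag) -> str:
    """Return the substring of the line that a tag covers."""
    return text[tag.offset:tag.offset + tag.length]
```

Storing tags as offset/length spans rather than copies of the text is what lets them survive later edits to neighboring words and be exported as structured lists.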
And in this way the tags are saved, and we can search tags and at the same time export them in various formats, like as a list of tags or as an XML. Good, with that I would go to search, so how to search within Transcribuz. The search bar is here at the top. You can search tags across all the collections, or you can search within the current document. I want to search the word "piece", and here I have the results. And if I click on one of them, I'm directed to the page. I think it's probably not showing correctly; the Zoom screen sharing is acting up. We should be directed not to the old page but to the page with the result. So let's do it again. You can then filter your results by author, uploader, title and collection ID. In this case, the author and the title are data stored in the metadata, while the uploader is whoever uploaded the document, and the collection ID is there if you want to limit the search to a specific collection. And also here you can limit the search to certain documents, like: this is the document I want to look at. And you see the result here; even if the text shown is different, that's probably due to Zoom. You see the result here, and you should also see the correct transcription here on the right side. There is an option you can also enable, the fuzzy search, to search for similar words, words that differ by one or two characters. And it's also possible to enable an advanced search tool called smart search. To work with smart search, you need to enable it during the text recognition: when Helena showed us how to start the text recognition, there was a small box called smart search. Smart search is an advanced tool because it stores all the possible alternatives that Transcribuz guesses.
So what you see in the text recognition is just the word that Transcribuz thinks is most probable to be correct, but in reality, Transcribuz makes a lot of guesses, and with smart search, you can also go through all those guesses. Sometimes, especially if the character error rate is high, you can still find the correct word, the word that you are searching for, even if the transcription is wrong or not 100% correct. Do you want to add something? Oh, I think that's fine, yeah. Quite a handy tool. And then the export features. First, you can export an entire document: you go here and select the entire document, or you can do it inside the document and select just a few pages. When you're here, you click the three dots and you click Export. There are different types of export, and we will improve them in the next months. For now we have the standard export, the PDF export and the Word document export. With the standard export, you get the images and the DOC files, the Transcribuz PDF and the PAGE XML. It's also possible to export ALTO XML. In this way, both with PAGE and ALTO, you get an XML file for each page, and you get not only the text but also all the coordinates of the points, so the coordinates of the text regions and lines as well as the text. So this is what you get from there. Then with the PDF export, you can select more options. You can have the image plus the text layer; with this option, you get a sort of searchable PDF. You see the image, but behind the image there is the transcribed text, and in this way the PDF becomes searchable. Or you can also export the images with the text as an extra text page, so one page is the image and the second page is the transcription. With the image plus text layer, it's also possible to highlight tags, so you will have the image, but the tagged words will be highlighted in the PDF.
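The fuzzy search mentioned in the search section, matching words that differ by one or two characters, boils down to filtering candidates by edit distance. A small sketch of that idea (my own illustration of the concept, not Transcribuz's search implementation):

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance between two words."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def fuzzy_matches(query: str, words: list, max_edits: int = 2) -> list:
    """Return the words within max_edits single-character edits of the query."""
    return [w for w in words if edit_distance(query.lower(), w.lower()) <= max_edits]
```

For the "piece" example from the demo, a fuzzy search would also surface near-misses like "peace" (two substitutions) or "pieces" (one insertion), which is exactly what helps when the recognition got a character or two wrong.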
And finally, the Word export. This is a simple Word document, and you can decide if you want to have it as a continuous text or if you want to preserve the line breaks, force the page breaks, and include unclear words. If you have tagged words as unclear, you can export that as well, and you can write the image name before the text. Then there are the advanced settings. These are really handy if you are working with abbreviations, especially in early modern texts, because in Transkribus you can add a tag for the abbreviations and write the expansion as a property of the tag. So as I showed you before, in the case of the date tag, the property was when the event happened, but there is also the property "expansion" for the abbreviation tag. And with the Word export, you can decide if you want to keep the abbreviations, expand the abbreviations in brackets, or substitute the abbreviations. Because we know that with abbreviations the searchability of the text is limited, but if you substitute them in your Word document, you end up with a completely searchable text. When you have selected your options, you have to start the export; the button is here below, and you will receive an email with a link from which you can download the export, but you will also see the job in the job table right here. So here you will see that the export job starts, and you can also download your exported documents from there. Yeah, and about the upcoming export features that we will introduce in the next months: we will have the possibility to export tables into Excel. Flo will show us how to train tables. It's possible to do it all within Transkribus: you can train the recognition of tables, then do the text recognition for the text inside the table, and then export the table into an Excel sheet, because it's not just the text that is important, but also the position of the text in a table. And then we will have the TEI format.
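The three abbreviation-handling options described for the Word export (keep, expand in brackets, substitute) can be sketched like this. The tag representation here (a plain dict mapping the abbreviated form to its expansion) is a made-up illustration, not the real export internals:

```python
# Sketch of the three abbreviation options in the Word export:
# keep, expand in brackets, or substitute. The dict-based tag format
# is an illustration, not the real Transkribus export internals.

def render(text: str, abbrevs: dict[str, str], mode: str = "keep") -> str:
    """abbrevs maps the abbreviated form (without punctuation) to its expansion."""
    words = []
    for w in text.split():
        exp = abbrevs.get(w.strip(".,"))
        if exp is None or mode == "keep":
            words.append(w)
        elif mode == "brackets":
            words.append(f"{w} [{exp}]")
        elif mode == "substitute":
            words.append(exp)
    return " ".join(words)

line = "Dec. 29th, 1861"
abbrevs = {"Dec": "December"}
print(render(line, abbrevs, "keep"))        # Dec. 29th, 1861
print(render(line, abbrevs, "brackets"))    # Dec. [December] 29th, 1861
print(render(line, abbrevs, "substitute"))  # December 29th, 1861
```

The "substitute" output shows why this option yields a fully searchable text: a search for "December" finds the line even though the source only contains "Dec.".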
So the Text Encoding Initiative format. This is more important for digital humanities and digital editions. And we will also improve the option to export tags as an Excel sheet, so you can export all the places that you have tagged inside a collection and work on them. Good, I will go back to the presentation and talk about training. Here again, the slides link to all these topics that were just covered regarding export and searching. And now let's come to the fun part. Yes, now we move over to the models tab, where the magic happens, where you can train your own models. You don't need to know how to code; there is a friendly interface that guides you through training AI models. But before showing you how to train models, let's have a quick introduction to what artificial intelligence and machine learning are. So, next slide. Yeah, here we have four types of models available. We will first do an introduction about training models, and then we will show you the different types of trainable models: text recognition models, baseline models, field models, and table models. So when we talk about artificial intelligence, it is a set of technologies and techniques that allow machines to perform tasks that require human intelligence. In the case of Transkribus, we are working with machine learning, which is a subfield of artificial intelligence. It provides machines with the ability to automatically learn from data while identifying patterns to make predictions with minimal human intervention. Machine learning works with both labeled and unlabeled data; in our case, we are talking about labeled data. So we need to provide the machine, Transkribus, with data that is labeled by humans. In our case, the data is the images and the transcriptions, or the images and the layout elements in the case of field and table models. And only when we have this data can we train the machine to learn from it, and then we can apply it on new pages.
You have heard Helene talking about models. AI models are algorithms created during the training process of a machine learning system. They are the output of the training process, embodying the knowledge acquired. So after the training, we end up with a model. The process is the same for the public models and for the models that you are going to train. What the machine has learned during the training ends up in the model, and then you can apply the model on your pages. It's also important to understand how the pages are used during the training. The next slide, okay. There is this term that we use in Transkribus that comes from machine learning: ground truth. It seems a very strange term, but in a few words, the ground truth is the labeled data on which the model is trained. So in our case, when we say we have to create the ground truth, it means that we have to transcribe some pages. It depends on the material, but at least 50 pages: we have to manually transcribe them, or copy and paste the transcription from outside Transkribus into the platform. And when we have this correct transcription, this ground truth, because we save it with the term ground truth, we can start the training. And when we have our pages of ground truth, around 90% of those pages are assigned to the training set and 10% to the validation set. The training set contains the actual pages on which the model is trained, so the model learns on those pages, but 10% of the pages is set aside, called the validation set, and the model uses those pages to test its accuracy. So don't spare effort on the validation set, because it's quite important to have a good validation set to assess the accuracy of the model. Otherwise you could end up with a very good or very bad character error rate which is not realistic, because it's not based on a good validation set.
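The 90/10 split described here can be sketched in a few lines. This is only an illustration of the principle; Transkribus does the split server-side when you choose automatic validation:

```python
# Sketch of the 90/10 ground-truth split described above: shuffle the
# pages, set ~10% aside as validation, train on the rest. Illustration
# only; Transkribus performs this automatically when you train a model.
import random

def split_ground_truth(pages: list[str], val_fraction: float = 0.10, seed: int = 0):
    """Return (training set, validation set) as disjoint page lists."""
    rng = random.Random(seed)
    shuffled = pages[:]
    rng.shuffle(shuffled)            # shuffling gives a representative sample
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

pages = [f"page_{i:03d}" for i in range(1, 51)]   # 50 ground-truth pages
train, val = split_ground_truth(pages)
print(len(train), len(val))   # 45 5
```

The shuffle is the important part: it is what makes the automatic validation set representative of the whole variety of the material, which is exactly why a biased, hand-picked validation set can produce an unrealistic character error rate.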
So the model doesn't have the chance to assess its real accuracy if the validation set has some bias. So now we will have a look inside Transkribus at how to train models. Yes, we are now in the second part. So we want to train our own model because there is no public model available, or we are not happy with the results of the public model; that could also be a reason. We have first to start with the layout recognition, then with the creation of the ground truth pages, and after that we can do the text recognition. First, okay, thank you Flo. First, we have to select our ground truth. As I said, for handwritten documents, 50 pages is the minimum to start. It's also important to have a representative sample of the materials that you want to process. So if you want to train a model on three different hands, a model that is capable of reading three different hands, it's important to include all three hands in your ground truth. The same if you want a model that can read different languages: you have to include examples of the different languages and the different scripts in the ground truth, because the model will learn from what you show it. So we have our ground truth, we have our images, and now you can proceed in two ways. The first is to do all the work by hand. So we are here. This is a new page. As you see, it's not possible to type the transcription here; there are no lines. If I try to type something on my keyboard, nothing happens. This is because we first need to start the layout recognition. As Helene told us before, Transkribus works with regions and lines. The recognition happens at the line level, and the training also happens at the line level. So we need first to run the layout recognition. Now I don't see it on my screen. Can you click it? Is it working? Yeah, sure.
Here below we have the recognition button, and then you can choose between text and layout. Yes. Now we want to run the layout recognition with the public models, where we can select this model. Usually it is done automatically when you start the text recognition, but if you want to create ground truth you have to do it manually, and you don't need to do it for every page separately; you can start the layout recognition for the entire collection or the whole document. So now we have to wait a few seconds, and after that we will see here the number of the lines, as you see here. And here you can decide, maybe. If I may just quickly interrupt, there was one question: if you want to reorder those lines, if the order is not correct, so you want to say that this is the first line, that's the second one, you can reorder the lines here on the left. Select the lines, and once you have clicked on a line it's also highlighted. For instance, you can say that's actually line one, and now you have changed the order of the lines. Sorry. Yes, no problem. So here you can decide what you want to transcribe. For instance, for me these two notes and this page number are not important, so I can just leave this line blank or I can delete the line. So you select the line and then you press Delete on your keyboard, and the same here. And when we're done, we start the transcription. As for the punctuation, you can decide how you want to treat it. Usually we recommend following what is written in the document, but if there is enough ground truth, the model also learns to normalize punctuation or uppercase and lowercase letters. So we go on with the transcription, and if some words are not clear, what you can do is tag the word as unclear, or if there is damage in the document, you can use the gap tag. So it's here, the gap tag.
And why is it important to mark the unclear words? Because during the training we can decide to exclude them from the training. But what you have to keep in mind is that if you use the unclear or gap tag, it's not just the single word that will be excluded from the training, but the entire line. So in "Boston, December 29th, 1861", the entire line won't be considered in the training, because the training happens at the line level. So this is one approach. Let's stay here. Sorry, I forgot. When you have finished the whole transcription and you're sure about it, you can save the transcription as ground truth. If you're working with collaborators, colleagues, volunteers or students, and you ask them to create ground truth for you, it could be useful to ask them to save the page as "done". After that you can double-check the transcription, make sure that all the guidelines are followed, and then save it as ground truth. Because it's very important to be consistent when you create ground truth. For example, here you should decide how you want to render this: do you want to underline it or not? Or when there is this dash at the end of the word, do you want to keep it or not? And things like that. To get a good result, it's better to be consistent in those choices, especially if you're working with medieval or early modern documents. The second option. Here, for instance, you can also set transcription guidelines for yourself, to stay consistent in your work. Yes. And about abbreviations: as I showed you, you can write the abbreviated form in the text, tag it, and add the expansion as a property. And you can train Transkribus to tag abbreviations for you afterwards. Or another option is to resolve the abbreviations, and to a certain extent the model learns to resolve the abbreviations, especially if they are very frequent in the ground truth. So here it mostly depends on the goals of your project.
So if you want a diplomatic transcription, you can keep the abbreviations and tag them. If you want a searchable text, in this case you probably want to have "December" as a searchable word, you can just write "December", and if you do it consistently in your ground truth, the model will learn to do it itself. The second approach is not to do all the transcription manually, but to start the recognition with a public model and correct the recognition. So we can start the recognition on this page. I think the Text Titan or the English Eagle should be fine. Thank you. And when the recognition is finished, you can correct the recognition and, based on that, train your model. Usually this approach saves you time if there is a very good model already available. If not, it's probably better to do it all from scratch. Or you can also work in an iterative way: you can train the first model on 50 pages, then you get a certain level of accuracy, apply it on another 50 pages, create new ground truth from that, and go on. Like here, the first lines are quite good. So we just have to make some corrections here: it's a 4, and here it's 23, I guess. Then you just have to check the transcription. We can also underline this word, and the same here, and you can check it all. Here, for instance, you see there is an error, and when you finish, you can save this as ground truth as well. Okay, when you have enough pages of ground truth, we move to the model training. So we are here and we want to train a new model, in this case a text recognition model. You are first asked to select the collection. So the ground truth should all be in one collection, but this is not a problem, because you can always create a shortcut. So if you have ground truth scattered among different collections, you can create a shortcut and have only one main collection with all your ground truth.
We are here, and here you can select whether you want to use as ground truth the pages with the latest transcription, because you have saved the transcription as done, final or in progress, or whether you want to be able to choose only your ground truth pages. At this stage, if I go there, for instance, here I don't have any ground truth pages, so there's no possibility to select this document, and here I can select all the documents in my collection, or I can select just one document, or go inside the document and select only some pages. You see, this document starts from page five, because the pages from one to four are not saved as ground truth, so I cannot select them, but if I change this option, they become selectable. As you see, now I have all 41 pages of my document. Here we have selected the training data, and then the next step, and you also see here the amount of words, 4,000 words. In this case, before starting the training, I should increase the number of words, because we recommend, for a model trained on a single hand, having at least 10,000 words per hand if it is handwritten; if it is a printed document, you can decrease the number to 5,000 words. The next option is to select the validation data. As I told you, it's possible to select it automatically, so 10% of your pages is automatically assigned to the validation set. I recommend using the automatic validation set, to be sure that all the different kinds of documents, the whole variety of the documents, is included, but you can also do it manually. And then we go to the model setup. Here you write the name and the description, and here you can add the details of your model. Unless you make your model public, the model is private, so this is just information for yourself, to have a reference when you go back, especially if you want to train different versions. So if you're training a different version of the same model to improve the character error rate, you can note that it was trained on those and those pages.
Here you can add an image URL to show a snippet of the writing here on the side. You can select the language here and, sorry, the century, and here below there are the advanced settings. You can read about all of them in our help center; there is the link in the presentation, so I won't go into more detail here. I just want to show you that it's possible to include a base model. What is a base model? You can select an existing model, a public model or a model that you have trained, and use it as a base for your own model. Usually a base model increases the performance of your model, especially if you use a very general model or one trained on similar hands. In our case, we have already seen that the English Eagle works very well on my documents, so it's quite perfect. So I can select this model, and I'm just going to refine it on the specific script of my documents. The great advantage of a base model is also that it can reduce the number of pages and words that you need for the training. So even with a base model, 4,000 words is probably not enough to get a good model, but you can go down to 8,000 words and you don't need the 10,000 words. And you go to next, you see an overview, and you can start the training. After starting the training, you can have a look here. You can see that the training has been created, and after that you just need to wait for the training. It could take from a couple of hours to some days, and it depends on the material you are working on. So on how many pages, now I'm just going to cancel this job, on how many ground truth pages you are training, but it also depends on the traffic on the server, so on how many trainings Transkribus has to deal with at that specific moment. But you will receive an email afterwards when the training is complete. So you don't need to keep Transkribus open; you can close Transkribus and close your laptop, because the work is all done on the servers in Innsbruck.
And afterwards, after the training is completed, you can go to private models, see your model, take a look at the character error rate here, and apply it. The character error rate is important. Both the character error rate and the learning curve that you see here are important to understand how the training went and the performance of your model. But don't look only at that figure, because it's also important to see the real application, the real result on a page. Sometimes it's better to test a model on a real page rather than just looking at the character error rate. And I think we can go back to the slides and have a look at how many pages and words you need. Here you have an overview. For printed text, 5,000 words of ground truth are enough and you can expect a character error rate between 0.5 and 2%. For a simple hand, a simple writing, you need 10,000 words to train the model and you will get a character error rate between 2 and 4%. And then the number of words increases, especially if you are including more hands or you want a very general model that can deal with multiple hands not all seen during the training. In that case you would need more than 100,000 words, and the character error rate increases to 6% to 8%, because the model will be very good on some hands, but on others not seen as much during the training the performance could decrease. But especially with the super models we are trying to tackle this problem, because super models are really good even with new hands not seen during the training. And the next slide is about the base model, so we can go on. And now we will talk about the other types of models. Baseline models, field models and table models refer to the layout. You don't need to train them unless the results for the layout are not as good as expected.
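The character error rate quoted throughout (e.g. 0.5-2% for print, 2-4% for a simple hand) is a simple metric: the character-level edit distance between the model output and the ground truth, divided by the length of the ground truth. A small sketch (an illustration of the metric, not the Transkribus code):

```python
# Sketch of how a character error rate (CER) is computed: character-level
# Levenshtein distance between model output and ground truth, divided by
# the ground-truth length. Illustration, not the Transkribus internals.

def cer(reference: str, hypothesis: str) -> float:
    """CER = edit_distance(reference, hypothesis) / len(reference)."""
    prev = list(range(len(hypothesis) + 1))
    for i, rc in enumerate(reference, 1):
        curr = [i]
        for j, hc in enumerate(hypothesis, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (rc != hc)))   # substitution
        prev = curr
    return prev[-1] / len(reference)

truth = "Boston, December 29th, 1861"
guess = "Boston, Decembor 29th, 1361"   # two wrong characters out of 27
print(f"CER: {cer(truth, guess):.1%}")
```

Two character errors in a 27-character line already give a CER above 7%, which is a useful reminder of why the presenter suggests looking at the model output on a real page as well: a single systematically confused letter can dominate the number.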
So in the case of a baseline model, we are working with the recognition of the lines. You see there is the text region in green and the baseline in blue, and the baseline needs to run at the base of the letters. Sometimes the automatic baseline detection, so the automatic layout recognition, isn't good, especially if you are working with very complex layouts. For instance, it sometimes happens with newspapers, or letters with not only horizontal but also vertical text; text with mixed orientation is sometimes a struggle for Transkribus, or when you are working with postcards or other unusual layouts. So what to do in this case? Next slide. If you're working with a complex layout, the first thing to do is not to train a baseline model; first we recommend trying a different public baseline model. By default the mixed line orientation model is selected, but sometimes other public models could be very good too. So our suggestion is to use a different baseline model, and you can do that also in combination with a different layout settings configuration. So, the next slide. There are some advanced settings that you can change. We are not going into detail on these, but just to let you know that often the problem is not with the text recognition per se, but with the layout recognition. So you can work on that if the result of the text recognition is not good: the problem could be not the model but the layout recognition, meaning Transkribus isn't correctly detecting your lines, and you can tackle this by using a different baseline model or working with the advanced settings. Often with printed text, and especially with newspapers, just finding the right configuration for those types of lines helps a lot and solves a lot of problems. If you have tried all the settings and all the possible combinations and nothing works, you can always train a baseline model specific to your documents. So, the next slide.
So here you see an example. This is a sort of index card, and the problem here is that some parts of the text are not detected correctly, like the date or the signature on the right. And at the same time it's also detecting some text that I don't want to have in my transcription, like the small line at the bottom with the name of the printer of this index card; in my workflow this would probably just become noise in my transcription. So what I can do in this case, and we can go to the next slide, is to create the correct baseline layout recognition myself. So I can manually fix it for at least 50 pages and create the layout recognition that I want. So in the case of the header I can create just one line, or avoid creating baselines for the details that I don't want to transcribe. This is also useful for tables. If there is some information you don't want to have in your tables, the dates or signatures or some numbers, you can train a baseline model to avoid those elements. And when you have prepared at least 50 pages of ground truth with the correct baselines, you can train your model on that. And just remember that you don't need to have the text to train a baseline model; what is important is just the layout. The training works in the same way as for the text recognition, and you can always find more information in the help center. And now I will leave the floor to Flo for the last part. Thanks a lot. Yeah, as you have probably seen, Sarah is very enthusiastic about this and is also the expert in our team. So Sarah is the one who basically knows the most about baselines. No, in general about Transkribus and text models, you know these things the most and know how to use them best. So it's really good to have you here and share those insights with us. Yeah, let's have a quick look at field and table models.
I'll quickly go through them and then also address the questions in the Q&A at the end, because there are some open questions and we will definitely talk about them. We've now heard a lot about text and lines. But what is really still a challenge is the structure of historical documents, as you can see in this example. Historical documents used to be quite complex, and the writing in those documents also needs to be located to be extracted. What we now have on our public beta version are field models. As you've seen before, you can also tag text regions. And what we have with the field models is a trainable model that you can train on your own data: you train a model to recognize text regions with the labels as you have labeled them in your training data, and then label more similar data with those field models. You can, as said, automatically mark regions with labels and assign structure tags to them. About 50 pages of training data are needed, which is usually quicker to produce than text training data, because transcribing a page takes more time than drawing the text regions and assigning labels. So here you're rather quick in producing training data, and can then train your own model to always recognize, for instance in this example, the shelf mark, the name of the newspaper and the other details, those fields, in more material, and afterwards you can extract that information and, for instance, move it into a database or migrate it into other tools. Yeah, as said, you can find those field models on our beta version. There's a guided process, as Sarah has shown you with the text recognition, that you can click through, where you can add your training material, set up your model training, and eventually use your model. There are a number of different applications.
There are historical documents where text regions are quite scattered and you need to recognize them all. So instead of using a bottom-up approach, which is the standard layout recognition approach, where text lines are recognized first and then clustered into regions, you can also use a top-down approach, where you first recognize the regions and label them as such, and then add lines to those regions. You can also segment newspapers with the field models, so newspaper segmentation can be done with that too. You can segment forms, which are very common in historical material, and extract those labels, for instance the place name, the first name, the last name, date, religion, you name it. Those fields can be marked in the structure of the document and then labeled with the field model, also across different formats. So depending on the training data size, you can train a model to handle a number of those formats, not just one single layout but also different layouts, as layouts changed over time. You can train a field model to recognize more than one format as well. Then there are also multiple layouts which you may want to work with. From a technical point of view this is instance segmentation, so there are basically unlimited possibilities. You name the fields, you mark the fields in your material, and then you train a model to recognize those fields. In theory, this also works for illustrations. For now there are only text regions, but we will introduce more layout region types soon. So for now it works with text regions, but as this is basically not bound to text, you could also mark an illustration, an image, or a page number in your material and then train a model to recognize those illustrations, in illustrated manuscripts for instance. You could then, for example, extract all the illustrations in those illustrated manuscripts, or extract initials.
You're working with the materials, so you probably know the material better than I do. The second model type that we are introducing on our beta version is table models. It's a very similar technology that is used here, but it's tailored to work with tables. So you can train your own models in Transkribus to recognize tables, and then eventually, as Sarah has already mentioned, with the Excel export or the spreadsheet export you can export those tables to an Excel sheet and manipulate and work with that data as you like. We have some more information here on the slides, just briefly to show you how it works. Basically those models recognize rows and columns and then match those rows and columns to come up with a table. So in this example, first the rows are recognized, then the columns are recognized, and eventually they are matched and you end up with a table and can extract that data in a nicely structured way. Yeah, here you can see another example of what those tables will look like. The tables can also be skewed, so they don't need to be perfectly straight; the table models are quite robust. And you can also train multi-line tables, where multiple lines are in a cell, and then extract that information, like in the example shown here. In terms of the training data that is needed, you can even start with 20 pages. So for a simple table, about 20 pages, as shown in this example. I've really tried it out with this particular document: there were about 20 pages in the training data, and that was sufficient. So for tables very similar to this, you can easily train a table model with about 20 pages and then theoretically recognize an indefinite or infinite amount of pages with that model. For more difficult tables, about 30 to 50 pages are required, and if there is a mixture of tables, then you might need more than 50 pages of training data. Again, that's also a guided process.
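The row/column matching described here can be sketched as follows: rows and columns are recognized separately as bands, and their intersections define the cells into which recognized text is sorted. The coordinates and data structures below are a toy illustration; the real models work on pixel geometry:

```python
# Sketch of the row/column matching idea behind the table models:
# rows and columns are recognized separately, and their intersections
# give the cells. Toy coordinates; the real models use pixel geometry.
import csv
import io

def build_table(rows, cols, words):
    """rows/cols are (start, end) bands; words are (x, y, text) tuples."""
    grid = [["" for _ in cols] for _ in rows]
    for x, y, text in words:
        r = next(i for i, (y0, y1) in enumerate(rows) if y0 <= y < y1)
        c = next(j for j, (x0, x1) in enumerate(cols) if x0 <= x < x1)
        grid[r][c] = (grid[r][c] + " " + text).strip()  # multi-line cells concatenate
    return grid

rows = [(0, 50), (50, 100)]      # two recognized row bands (y ranges)
cols = [(0, 200), (200, 400)]    # two recognized column bands (x ranges)
words = [(10, 10, "Name"), (210, 10, "Year"),
         (10, 60, "Smith"), (210, 60, "1861")]

table = build_table(rows, cols, words)
buf = io.StringIO()
csv.writer(buf).writerows(table)  # ready for Excel / spreadsheet import
print(buf.getvalue())
```

This also shows why the position of the text matters as much as the text itself: without the row and column bands, "Smith" and "1861" are just words on a page; with them, they become a structured record you can load into a spreadsheet.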
It works similarly to the text model training, but you will find it on our beta instance and can check it out on your own. Here's a little summary that you can then check out in the slides, and, being mindful of the time, I will try to cover the remaining items. We are currently also moving towards a new subscription system. As you know, basically everything that you have seen here except the text recognition is free to use, so you can use Transkribus for whatever you like; at the moment you only pay for the text recognition, and only for larger amounts, and with every account you get 500 credits for free. We will adapt that a little bit. The core principle, that Transkribus remains free, will stay; we will even increase the amount of free credits that you receive. We will change that from a one-time package of 500 to a monthly package of 100, so every month you will get 100 free credits, which equals basically 100 pages of text recognition. So if you're a loyal user of Transkribus, you will get more out of it for free. All the features that you've seen will also be available, except for some advanced tools like the field and table models; they will then also be charged as credits, so once you use a field or table model, that will also count towards credits, and the new Transkribus Sites functionality, which I will show in a second, will be paid as well. But in theory there's not too much that is changing. With this change we want to make Transkribus more sustainable. As we addressed at the beginning, we are a cooperative and need to make sure that Transkribus can be further developed, continued and maintained, and for this we are moving towards the subscription system, which will give Transkribus more security and a more long-term commitment from the user base as well. There are three different plans that will be available, I will show you them in a second, and eventually you can see them here. The whole community will benefit, as we will
introduce those plans to sustain and further develop Transkribus. And the nice thing about being a cooperative is that every single dollar or euro that Transkribus makes will be reinvested; there's basically no dividend payout, so we are acting like a non-profit. All the revenue coming from the platform will be reinvested and just helps Transkribus grow and makes the software better for everyone. As said, there will be the individual plan, where you get 100 free credits every month. Then there will be more advanced plans, like the Scholar plan, starting from 14.90 a month, where you will have the advanced AI tools such as smart search, field models, table models and also Transkribus Sites, as well as some collaboration tools, which we will introduce at the beginning of 2024. And then there are also organization plans, which are tailored towards users who are using Transkribus on a larger scale, and their prices will also be tailored towards those use cases. We also try not to forget where we come from. We have our roots in the academic scene and try to give back as much as possible. That's why we have the scholarship program, where we try to support students, so aspiring researchers basically, and also teachers, in using Transkribus. We have supported about 300 projects as of now; very often these are thesis projects or other research projects by students who need to extract some historical information and use Transkribus for that specific purpose. So yeah, if you're a student or a teacher and eligible for that program, we're happy to get your request as well. And now just a few sentences about Transkribus Sites, a new tool that we're introducing soon. It's already available to try out on beta, as are the table and field models, and it is basically a small content management system where you can share the material that you have in Transkribus with the world. You can see
you can easily share material and set up a side-by-side view, as in the editor, for everyone. So you can share a Transcribuz Site, which is basically a website where you can share your material and also make it searchable. The search capabilities that you have in Transcribuz will then also be available to the users of your Transcribuz Site, making the very valuable work that you're doing with the historical material visible. You can, for instance, set up a digital edition: work on the material, pair it with Transcribuz Sites quite easily, and give everybody access to the full content of your Transcribuz Site. There are already a lot of Transcribuz Sites online, and we also have some very nice examples where that material really led to new discoveries, like a long-unknown Rembrandt painting that was found based on the content of a Transcribuz Site about two years ago. The big change here is that this will now be available for everybody: everybody can set up their own Transcribuz Site. You can see how that might look, and once you search something, you will see the search hits in the Transcribuz Site. Summing up, and then coming to the questions, because there are a number of questions and we will try to address them all: we are very happy to get your requests and to be here for you. We also have an extensive help center where you can try to find answers yourself. We will try to update it as frequently as possible with all the new tools that we're launching. Not all the information is always perfectly up to date; we're doing our best, but since we are still a small team, it's not always easy to keep everything current. Sarah and Helene are doing a great job here communicating everything that we're introducing in terms of product management and development. And we're also always happy to be tagged on X, or Twitter as you might know it, and on other social platforms; you can
probably find us on the most common platforms. Once more, a short reminder before we come to the questions: as said at the beginning, we will have our user conference in mid-February 2024. It will be hybrid, so in person and online; if you're interested, check it out, and we're happy to welcome you in Innsbruck or online. There were a lot of participants during the first hybrid conference that we held in September 2022, so we're happy to share the insights. The topic of this user conference is the future of information extraction; we will really have a lot of very interesting talks about how you can extract data from historical documents and what the future of extracting information from those sources will bring. Now let's come to the questions from the Q&A. I already had some of them beforehand, so I can quickly go through them. There was a question about palm leaves and whether they will also work with Transcribuz. In theory, Transcribuz is customizable to your material, so you can use Transcribuz for whatever material you would like. We have a very nice example with the Wikimedia Foundation, where they were working with palm leaves, so the writing was basically on palm leaves, and that seems to work out quite nicely; a nice project to know about. Regarding the DOCX export, the page break and line break options are, as far as I know, already available. I'm not 100% sure about the question, but we can maybe address that again later. Then there was a question regarding the material; that was just a clarification, so I think that's also addressed. Then regarding the top-N guesses: as I said, that is only available if you enable Smart Search. For now you cannot see the top-N guesses yourself; we will introduce that. For now only the search uses the top-N guesses. It depends, it's between 10 and 100 per token, so it will not be a fixed amount of different
variants per word that is stored with Smart Search. We will also extend that functionality so that you can get the top-N guesses for a token and then decide yourself which is the best guess. Then, regarding exporting: basically everything in Transcribuz is exportable except the model. You have access to all the training data, and of course you can theoretically export the training data and train a model on another platform. So in theory you can export everything, the training set can be exported, but the model itself is not exportable at the moment. I think the question regarding split lines has already been answered in the chat, as far as I've seen: you can merge lines if they are split. You can just select both lines, hit the M key on your keyboard, and then merge them. Then there's a question regarding Kurrent documents of the late 19th and 20th century. I'm pretty sure that you can work with them as well; that's probably the biggest set of public models available for the German language, so you can work with those models. Also, the Text Titan handles mixed scripts very well, so you can handle printed and handwritten documents at the same time, which standard models are only capable of doing to some extent. You can mix some material in the models, but mainly the Text Titan is the go-to if you have mixed material. Then regarding the free account, I think I've addressed that with the new subscription model: everything, or almost everything, that you have seen will remain free, but the advanced AI tools and Transcribuz Sites will be part of the higher-tier subscription plans, and as I said, with 100 free credits you can process about 100 pages a month. Then, reordering text regions should also be possible: similar to text lines, that should be available through the text editor in the edit pane where I've shown how you can reorder text lines; you can reorder
text regions as well, and you can also move lines from one region to another region, because I think that question was in the chat as well. Then there's a question regarding node code. I don't have the link at hand, but there was a very interesting session about it at the last user conference, which is on our YouTube channel, so you can just check that out; that's basically the most elaborate tutorial we have at the moment on that topic. Then, regarding whether a model can be trained on abbreviations, I think that's a question you can very nicely answer, Sarah, if you want to address it. Yes, it's possible to train such a model; there are three different options. The first is to transcribe the abbreviations as they are, and you just get a diplomatic transcription. The second one is to transcribe the abbreviations, tag them, and write the expansion as a property; then, when you train the model, under advanced settings you can toggle the option to train abbreviations with their expansion, and the model will learn to find the abbreviations, tag them, and add the appropriate expansion as a property of the tag. You just need to remember that this works when the abbreviations are very frequent: if an abbreviated word appears only once in your ground truth, it could be difficult for Transcribuz to learn it, but if you have very frequent abbreviations, it can do that. So remember to have a good sample of the abbreviations in your ground truth. The last option is to resolve the abbreviations and teach Transcribuz to resolve them too. There is, for instance, a model trained by the University of Toronto for medieval Latin, and this model is capable of resolving very complex abbreviations because the ground truth was prepared that way: all the abbreviations were resolved in the ground truth, and Transcribuz learned that. Thank you very much. Then there's a question regarding the
vocabulary of a model. At the moment it's not possible to retrieve the vocabulary. This is rather a user interface matter, so we could theoretically work on something like that; we will definitely take note of this and maybe come up with a solution for that question. For the table analysis, there's a question whether the tables need to be 100 percent homogeneous, and that's basically the nice thing about the table recognition and the table models: they don't need to be perfectly homogeneous. It depends on the training data, of course. The models that are trained on the tables should reflect the tables that you are trying to recognize, but they don't need to be 100 percent homogeneous, so there can be quite some amount of variation in the training data, depending of course on how many tables there are. As I said, you can theoretically also train different types of tables in the same model, but then obviously you need to add more training data. Then there's a question regarding private models: how can users reuse other people's models? A model is always linked to a collection at the moment, so once you train a model, that model is linked to that collection. If you want to give someone else access to that model (we might change this soon, but at the moment it is like that), you just need to give them access to that collection, and once they have access to that collection, they will also be able to access the models in it. So if you want to share a model, the easiest way is to train it in a collection that you can clearly share with other people, and then just share that collection with other users. Then there's a question regarding the PC software, so I guess you're talking about the Expert Client, the desktop client that is currently available, which is basically a version of Transcribuz that has even more features at the moment. That was the version of Transcribuz that was developed back during the project times. It's a very
extensive tool. Helene has briefly addressed that issue: it was built during a research project without focusing on user experience or on how a product should be designed to be useful for a broader user base. The learning curve is really steep, so if you want to learn how to use the Expert Client, it takes you quite a lot of time; but once you're used to it, it's a really powerful tool, and it will therefore not be gone very quickly. The eventual goal, however, is to build one software that can, on the one hand, be very intuitive and usable by a broad user base, but on the other hand also handle all those expert tasks the current desktop software is capable of handling. There are a lot of very tiny, nitty-gritty but very helpful features in there, and we will step by step introduce most of them in the web-based software as well; then the web-based software will also be downloadable as a desktop app again. Regarding Transcribuz Sites and open access: that's also why it's still in beta, as open access is still a topic that needs to be addressed. At the moment, once a Transcribuz Site is published, it is basically open for access; it's not open access in the licensing sense, but it's open to access by the general public. We will introduce a system where Transcribuz Site owners can define the license that the data shared on a Transcribuz Site holds. And I think you're already going to answer the last one, right, Sarah? As I've seen that you're typing: yes, it's possible to customize a tag. I would suggest adding a tag "spelling" or "modern spelling"; you can create a property "modernized form" and add it to the tag. Okay, perfect. There was one last question, and that's a very good one. The character error rate is the current metric to show you how a model performs. It is measured, as the name says, based on characters. There's also the word error rate, which is used in the background; you can also see that
in the Expert Client. But yes, the character error rate is not a perfect measure. As Sarah already explained, it's advisable to really test a model on your material and see how the performance is, and not just rely on the character error rate, because that is very dependent on the material it was tested on, so on the validation set. Depending on what pages you add to the validation set, the character error rate will vary largely. So here it's really advised to consider how the character error rate is computed; trying out the model and seeing how it performs is therefore always the best approach. We will introduce more advanced quality control tools soon; we already have some in internal testing, and those will add more functionality to evaluate models and to understand how well a model performs on your material. Having said that, I think we have addressed a lot of questions. Maybe some are unresolved; there were so many questions that I could not follow all of them in the chat, but I think Helene did a great job answering most of them there already. If there are any unresolved questions or issues, just write us an email; as I said, we're always happy to help. It was really great to have such a huge turnout, we're still more than a hundred people, and we're already running almost 20 minutes late, so sorry that we took a little bit more of your very long afternoon. But we hope it was informative and gave you a good overview of what Transcribuz is capable of. There are a lot more features to be discovered; we have tried to give you an overview of the most important ones. We will be, as I said, very happy to see you join us on our journey. As we always say, we are unlocking the past, and we're doing that together. That's our vision, and everybody that joins that vision is always welcome together with Transcribuz. So thanks a lot. I hope you have a nice evening if you're in Europe; if
not, then I hope you have a nice day, and if you're in the Pacific or Oceania region, thank you for sharing your night with us. There are always users around the globe joining, which is really great and really shows how passionate everybody is about this. Thanks a lot, and see you soon!
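[Editor's note] The character error rate and word error rate discussed in the Q&A are conventionally computed from the Levenshtein edit distance between a reference transcription and a model's output, normalized by the reference length. The following minimal Python sketch illustrates that convention only; the function names are illustrative and this is not Transcribuz's actual implementation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution (free if equal)
        prev = curr
    return prev[-1]

def cer(reference, hypothesis):
    """Character error rate: character edits normalized by reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)

def wer(reference, hypothesis):
    """Word error rate: the same distance computed over word tokens."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)
```

This also makes the speaker's caveat concrete: the rate is relative to whatever reference pages are in the validation set, so swapping validation pages changes the denominator and the error profile, which is why testing on your own material matters more than the headline number.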