Hello everyone, and welcome to our webinar about Transkribus. Today we're giving you an update, because there is a lot of news in and around Transkribus, and we are very happy that so many of you joined us today to discover what's new. Here's the agenda of today's webinar. First I will give you a short introduction to the webinar and who is going to present it, then we will have a look at the Transkribus workflow in general, and then come the exciting things. We will talk about the new editor, which is currently in beta but will be released next week, as well as field and table models, which will also be released next week, and then Transkribus Sites, which you can already find on app.transkribus.eu. We will also have a short look at the new subscription plans and a short outlook on what's coming to Transkribus soon.
So first let's have a look at who we have here today. Your guides for today's journey are Miriam from the user success team; Elena, who is mainly working in pumps; Sarah, who also works mainly in user success together with Miriam; and Matias from the plums team. My name is Flo, and I am mainly working on product here at Transkribus, so building Transkribus is the main thing that I'm doing here.
But now let's have a quick look at this webinar's content and what Transkribus is after all. Transkribus is your AI powered LA, as we like to say, which we have designed to simplify your time-consuming work and also make it fun to work with historical documents. Our vision is to unlock historical documents together with the community, and we are here to build the tools that enable you, the community, to unlock those historical sources.
And we're very happy that we can rely on a very broad base. We have a cooperative behind Transkribus, called READ-COOP, with more than 180 members worldwide in more than 30, or currently 35, I think, countries all over the globe, which is really great. You can see many of the institutions that are members, but we also have many private members that support and maintain Transkribus with their share in it. Then there is the user community: more than 170,000 users all around the globe are currently using Transkribus, which is a really great number, considering that we come from a research project. Back in 2013 the Transkribus story started, and six years later Transkribus evolved into a real company, a real thing. And here we are today. We're really happy about the success that Transkribus has had and really proud of what Transkribus is able to do as a cooperative, with a slightly different approach to doing things. We're not a corporation that is purely profit-driven. We need to turn a small profit in order to maintain Transkribus and develop it further, but in the end Transkribus is a community-driven project, which is really great. Here you can see an image of last year's user conference, and we will also have a little sneak peek of this year's user conference at the very end of this webinar.
But now let's have a look at what Transkribus is and how we understand it. We understand Transkribus as an ecosystem to unlock historical sources. What you can see here is the menu bar of the Transkribus web app that you can currently access. We have three major parts in Transkribus, and they are called Transkribus Desk, Transkribus Models, and Transkribus Sites.
Before I forget: if you have any questions, please enter them in the chat.
My colleagues and I will do our best to answer those questions right away. And of course, at the end of this webinar, we will also have space to address those questions and discuss them in the plenum.
Coming back to the ecosystem: we have those three workspaces, called Transkribus Desk, Transkribus Models, and Transkribus Sites. This is how we have structured Transkribus to help with unlocking and then also publishing historical sources online. But let's have a closer look. What is Transkribus Desk? Transkribus Desk is the core of Transkribus. Here you can do text recognition, but also manual editing of historical sources, as well as tagging, so enriching with metadata, and searching of historical documents. That is the space where the work with your sources happens.
Then we have the Models workspace, where you can train your own custom AI models. That's one of the biggest advantages of Transkribus, and it paved the way for its success back when it was still a research project. When it became available back in 2016, being able to train your own custom AI models with just a few clicks was very innovative, and it is of course still one of the major parts of Transkribus. We have evolved further and can now offer four different types of models that you can train, as you can see here on the slide: text recognition models, which Transkribus is mostly known for, but also baseline models. Those are AI models that recognize the lines in the images, because knowing where the text is on the image in the first place is really important. And then there are also field and table models.
But we will hear more about field and table models later.
And then we have the newest addition to the suite of Transkribus workspaces, which is Transkribus Sites. This is a publishing tool, so you can imagine it like a content management system, as you might call it. With this tool you can publish your historical sources online and make the full text searchable. You uncover those texts with Transkribus in the first place, and making them available is then only a few clicks away with Transkribus Sites. As you can see here, we already have a number of public sites available, which you can check out in Transkribus via the public sites section, where you can see other projects that have already published their very own Transkribus Sites. Those are very interesting projects, and I would encourage you to browse those sites and explore what others are doing with Transkribus.
Now I'm handing over to my colleague Miriam, who will guide us through the basics of Transkribus and show us the workflow. So over to you, Miriam.
Yes, thank you, and hi to everyone also from my side. Before we talk about all the new and exciting things, we of course also wanted to give you a brief overview of the basics in Transkribus and the general workflow. So, let's see if I can actually... yes, okay. The Transkribus workflow helps you to go from the image you can see here to a machine-readable text that you can then access, enhance, download, etc. The technology that lies behind Transkribus, behind this kind of magic, is called handwritten text recognition. This technology transforms the text of the scanned images into machine-encoded text that is then also searchable and can be exported into various formats.
Now let's take a look at what actually happens when we use this handwritten text recognition. Even though we only have to click on one button to recognize the text, two steps are performed in the background. The first step is a layout recognition that recognizes the lines that you can see in this image and the text regions, marked in green in the example. Then, as a second step, the actual text recognition is performed: the machine recognizes the text and you get the output in your text editor. This means that before recognizing any text, the technology has to perform a layout recognition in the background as well. But usually you don't have to do anything more than click on the recognize button.
Then let's quickly look at another important topic in Transkribus, the content management system, so how your content is stored in Transkribus. You will usually start with the collections, which are mostly the single projects that you're working on. You can see that here we have three different collections. They all have a name and an ID number. Within these collections, the single documents are stored; in this collection called beginners webinars, for example, there are 17 documents. And within these documents, the actual images are stored. This is the management system that Transkribus uses to store your documents.
Now let's quickly take a look at this inside the platform, in the interface. If we go to the home page of the Transkribus interface, we are in the home area of the Desk section, as Flo has already mentioned. Here we can see our recently opened documents and our recently opened collections. And if we click on the collection that is called English webinar January 2024, we should see, yes exactly, the different documents.
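To make the storage hierarchy described above concrete (collections contain documents, documents contain page images), here is a minimal Python sketch. It is only an illustration of the nesting, not the platform's actual data model or API: the class names, field names, file names and ID value are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    # One uploaded image (hypothetical file name).
    image_name: str


@dataclass
class Document:
    # A document groups the page images that belong together.
    title: str
    pages: list = field(default_factory=list)


@dataclass
class Collection:
    # A collection has a name and an ID number and holds documents.
    collection_id: int
    name: str
    documents: list = field(default_factory=list)

    def page_count(self) -> int:
        # Total number of images stored across all documents.
        return sum(len(doc.pages) for doc in self.documents)


def build_example() -> Collection:
    # Illustrative content only: a one-page letter and a 26-page register.
    letter = Document("example letter", [Page("letter_001.jpg")])
    register = Document(
        "register", [Page(f"register_{i:03d}.jpg") for i in range(1, 27)]
    )
    return Collection(collection_id=1, name="webinar collection",
                      documents=[letter, register])
```

Calling `build_example().page_count()` walks the same three levels that the collection overview shows in the interface: collection, then documents, then pages.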
So here you can see we have a few different documents, each with a number of pages inside. This one has 26 pages, this one has 40, and here are 732 pages. So within the documents you can store a lot of images. And, if we just quickly look at this as well, this is what the document page looks like, where you can see all the individual images.
Then let's go back to the slides to check out the basic Transkribus workflow. We start in the Transkribus Desk area, as we have just seen, and this is an overview of the basic workflow, in which there are two options that you can choose from. In the beginning you have to upload your documents, or one document. Then the question is whether there is already a suitable handwritten text recognition model that you can use to recognize your document. If that model is already available, you can just go ahead and start the text recognition. You will receive the text output, you can maybe enhance it with some textual or structural tags, you can of course search it, and then you can download it.
If there is no suitable model available for your documents, you have to do a bit of extra work. You start a separate layout recognition process, or you draw the layout, so the lines and the text regions, manually. Then you have to create ground truth pages. That just means that you recognize the text and correct it, or type it in from scratch, so that you get the layout and the correct text side by side. Using this ground truth data, you can then start your own text recognition model training.
And once you have that, you can go back to this step, start the text recognition using this model, and again receive your recognized text, which you can further edit, search or download. So this is the basic Transkribus workflow. For this webinar today, we will only look at the steps that are highlighted right now. We will check out how to upload a document, and since we already have a text recognition model available, we will then just run the text recognition, and I will also show you how to download the result.
These are the steps that you have to follow to upload a document. First, it is good to create a new collection, or you can use the default collection that is created for you when you register for an account. You go to the Collections tab from the home page and click on new collection. Then you open this newly created collection and upload your files. You can upload images, which have to be in JPEG or PNG format, or you can upload PDF files.
And we have already seen a preview of the next slide. As a general rule, handwritten text recognition can of course recognize handwritten documents, but you can also use it or train it to recognize historical prints, so books or newspapers. But it's important to note that there is no general model available for all scripts, languages and epochs. So you always have two options. Either you use a public model that has already been trained on similar documents or similar scripts by the Transkribus community, or, if there is nothing that suits your script, you can, as I already mentioned, train a custom model to recognize the specific handwriting or the specific form that you need for your documents. For that you need a number of images and their transcriptions, which is the ground truth data.
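The two workflow branches and the upload formats described above can be summarised in a short Python sketch. This is only a schematic of the decision as presented in the webinar, not code that talks to the platform: the function names, the step wording and the exact file extensions (`.jpg`, `.jpeg`, `.png`, `.pdf`) are assumptions chosen for illustration.

```python
from pathlib import PurePath

# Assumed extensions for the formats named in the talk (JPEG, PNG, PDF).
SUPPORTED_UPLOADS = {".jpg", ".jpeg", ".png", ".pdf"}


def is_supported_upload(filename: str) -> bool:
    # Compare the file extension case-insensitively.
    return PurePath(filename).suffix.lower() in SUPPORTED_UPLOADS


def workflow_steps(suitable_model_available: bool) -> list:
    # Every workflow starts with an upload.
    steps = ["upload documents"]
    if not suitable_model_available:
        # Extra work when no public model fits the material.
        steps += [
            "run layout recognition or draw lines and text regions manually",
            "create ground truth pages",
            "train a custom text recognition model",
        ]
    # Both branches end the same way.
    steps += [
        "run text recognition",
        "enhance with tags (optional)",
        "search",
        "download",
    ]
    return steps
```

For the live demo in this webinar, the short branch applies, that is `workflow_steps(True)`, because a public model is already available for the example letter.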
As I mentioned, there are some public models that have been trained by us or by the Transkribus user community. At the moment we have 186 public models available for you to try. Here you see a screenshot with an overview of what this looks like. One thing to point out is that we also have something called super models. These are transformer-based models, very big and very general, that you can use to recognize a variety of different materials, writing styles and languages, and they are also quite convenient for documents that have both handwritten and printed text. So these are very useful. We published the first super model back in April: the Text Titan I, which was trained on English, Dutch, French, Finnish, German and Swedish. But as you can see here, it is now limited to the Scholar subscription plan or higher. So if you are on the Individual subscription plan, you cannot use the Text Titan I at the moment. We will talk about this more at the end of the webinar.
So let's take a look at the interface and at what this uploading and recognizing actually looks like. I'm going back to the collection overview. As I said, here you could create a new collection: just type in the name and click on create. But I will now go to this webinar collection that we already have and upload a new file. So now I need the prepared image to upload, and we have the example here. Not sure if I can drag and drop. Thank you. So now we have the example image. We can give it a different title, example letter, and then click on submit. Okay, this should not happen, of course, but that is probably a result of us sharing the presentation and recording. So let's see. Something that is also good to mention is that you can switch between image upload and PDF upload here. If you wanted to upload a PDF file, you would have to switch the selection here.
But because I want to upload a JPEG file, I have to keep the image selection. Let's see why this is not working. Yes. Okay. Somehow I cannot upload the file here now, but this is what it looks like: if I click on submit, the file should usually be uploaded. I think this is really because we are sharing the presentation.
I think it's because of the rights. We haven't granted enough rights to that user, because it's a collection of another user. We can try it in a different collection, and there it should work. If we go to our very own collection here and try it again, then it works. The user was probably only added as a transcriber, as user management is also included in Transkribus, and so probably didn't have enough rights to upload.
Yes. Okay, thanks. Great that we figured that one out. We can now check in the jobs table. When we clicked on submit, an upload job was started, and we can see here that this user created this document and that the state is already finished. So the job is done, and we click on open. We are now in this collection, the document is example letter, and this is the image inside the document. If I click into it, the document editor opens, and we can already see that this is a new document editor, because we are in the beta version right now. My colleague Elena will tell you more about this in a minute, but basically it's not very different. There is also a recognition button here, so we can start the automatic transcription from here. Now we have to select a public model from the 168 available. The first one, the default model, is always the Text Titan I, and you can see that I cannot really select it. If I choose another one, I can start the recognition; for the Text Titan I cannot do so, and it's asking me to upgrade to the Transkribus Scholar plan.
But we don't need the Text Titan for this specific letter. We can just search for another English recognition model and choose, for example, The English Eagle, which is a very big and general recognition model for English documents. Then I can start the recognition for this page, and we can see that another job has started. We can go back to the jobs overview, maybe open the full jobs table, and see that the job has been created. But we are in the queue now, so it will take a bit until the transcription is finished. That is fine, because we have prepared what this would look like: we have the example letter here as well. Once the recognition is done, it looks like this, with the image here and the recognized text here. And that is the workflow from uploading to recognizing a page with a text recognition model.
Let's quickly go back to the slides so that I can show you how to export or download the recognized documents. This is also very easy. You select either the entire document or the pages that you want to export. Then you click on the three dots next to the recognize and train model buttons and select export. At the moment there is a standard export available with which you can download the images themselves, and you can also download the recognized text as a PDF, a Word document or a text file. If we quickly go back to the interface again, we can see what this looks like in Transkribus itself. We have this recognized page here. I just select it, as we only have this one, click on the three dots and then on export. You can see that we have the standard export, and within the other subscription plans there are also different export options available.
Within the standard export we can, for example, download this as a PDF file and start the export. Then again we get the message that the job was started, and we will get a download link via mail. Or, when the job is done, you can click on the three dots here and download the PDF directly from here. So this was the basic Transkribus workflow, from uploading to recognizing to downloading. Now my colleague Elena is taking over to show you what you can do if the standard recognition didn't work as well as expected.
Thank you so much, Miriam. I think we can switch directly back into the interface to work with the editor. Miriam already showed us a little sneak peek of the editor, and some of you might have noticed that we have made some changes and improvements. Just to review: the editor is where you transcribe and edit the text or modify the layout structure. On the left side we can see the layout editor and on the right side the text editor. We have put the toolbar, or menu bar, on the side. If you remember, in the old, or current, editor the menu bars are even inside the image, so we tried to clean that up a little bit and put them on the side. You can again choose the drag mode or the selection mode. Here you can also add lines or add regions.
So what are lines and regions? The text region is the field, let me zoom out a little bit, that encloses all the handwritten text. The baselines are the reference points for text recognition: a baseline is a polyline that runs along the bottom of a handwritten line of text, right here. In the layout editor you can also edit, for example, the text region. If you click on the text region and press Shift on your keyboard, you can change its shape, for example if you only want this part as a text region, or maybe a bigger one.
If you want to split the region horizontally, press H on your keyboard and a horizontal line will appear with which you can split it, for example, in half. If you want a vertical split, just press V on your keyboard. In this case it doesn't really make sense to split the letter vertically, but it shows you what you can do. In the layout editor you can also add structural tags. Right-click with your mouse and choose a structure type, for example paragraph, heading or initial. Let's go for paragraph right here. Also quite useful when you work with the layout are keyboard shortcuts, such as pressing H, V or Shift as I mentioned before. Click on the three dots on the right side and you get a list of keyboard shortcuts that can help you when you want to edit. We also have a list of those in the help center, so if there are any unanswered questions for now, you can always go to the help center and look for answers there.
Then let's look at the text editor. Let's maybe increase the font size a little bit. In the text editor you can, as the name says, edit the text. For example, if you used a public model and the text recognition was not perfect because the model was not trained on the specific type of handwriting that you uploaded, there can be some mistakes. This is the place where you can correct them; you can delete something, for example. Here you can see, another little hiccup, that you can also underline words. You can also add textual tags here, for example the name of a person. Let's see if it works now. I think it's a little bit shy; it's the demonstration effect. What you can also do here is edit the layer. Now it works perfectly. So you can choose a tag such as abbreviation, date or person.
You can also add more tags if you want, under tag settings. You can also change the text to bold or italic, so you have some options there when you work with the text. Exactly, here we can see the date tag. Thank you so much. Another thing you can change here is...
It didn't work because you can now enable or disable the tagging feature, which you couldn't before. That's one of the new things, and that's why it didn't work. I needed to fight for the mouse until it was enabled. The struggles of sharing the screen.
Thank you so much, Flo, for clarifying that. Here you can also edit the layout tree. If, for example, you want to switch lines around, you can click on the layout tree here, open the region, and then, just for demonstration's sake, move this line, and you can immediately see how the line switches. This way you can again correct or edit your transcriptions.
Another really nice tool is the settings button. Here you can play around and configure your editor. For example, you can choose whether to show the line numbers in the text editor, let's see if it works. In the image you can choose whether to show the regions and the baselines. Now they're gone, and here you can see that the blue line appears again. You can also choose not to show the polygons, or change the label size: you saw that the numbers grew drastically, in case you want a more prominent numbering. You can also change the line color. If it's a bit hard to see against the background, click on line color and switch it, for example, to this bright pink for more contrast. In this way you can optimize the layout editor so that you can best work with it. When you're done editing and satisfied with the transcription, you can save your document or your page.
If you still want to keep working on it, you can also simply change the status. Right now it's ground truth, but you can change it to in progress if you're still working on it, or to done if the transcription is done but you still want to check it. If you want to go back, for example because you're not satisfied with the transcription, you can click on version history and return to a different version. We've also seen that by clicking on the three dots you can see the keyboard shortcuts. This is also where you can directly export your pages, share them, or click on help if you have any questions. So now we've reviewed the basics and seen the layout of the new editor, and we will move on to the new features. My colleague Sarah will introduce the field and table models. Thank you so much.
Thank you, Elena. Let's go on to the next slide. Now we will talk about the field and table models. Next slide. Okay. Field models are only available in beta for now. If you want to access or test them, go to beta.transkribus.eu, where you can use the same credentials that you use on the normal web app. Field models are needed to extract information from a page that is defined by its place in the layout. In this case we have a register card, and we don't want to transcribe the entire content of the document, because we are not interested in it; we just want to extract the name, the birthplace and the birth year. For this we need to train a field model. We can teach the machine to automatically detect the text regions that we are interested in, and then we recognize only those. So instead of transcribing all the printed and written words on this document, we teach the machine to look only at the information that we want to extract. For this type of document we need field models.
Field models can be trained to automatically recognize and mark certain layout components of the documents, so they don't work only for forms. First we have to draw the text regions that we are interested in; these text regions are called fields. We can also decide whether we want to assign structural tags to those text regions. In the image you see here, we have the structural tags name, place and year, and we can teach the machine to assign those structural tags automatically. How important they are really depends on the project, but if you then export all this information into an Excel spreadsheet, you will end up with a column called name, a column called place and another column called year, so you have the information from all these forms in a structured way and can work on the data, run your searches, or whatever you need to do.
One application of field models is working with complex text regions. In Transkribus, as Miriam has shown, the layout is recognized automatically when you start a text recognition, but sometimes there are complex layouts, printed or handwritten, where you want to recognize certain regions as separate ones, like the signature here at the bottom. You can train a field model to do that for you. Another example, it's a bit slow, is this newspaper. You can segment a newspaper page and divide it into headers and paragraphs for the articles. Each article will have its own text region, you can tag the regions as paragraph or header, and you can then decide to work on them separately. Another use of field models is form segmentation, as in the first example that I showed you.
Here, instead of transcribing the entire form, you can just select the information that you are interested in, tag it and then export it. Field models are also very useful for printed documents with complex layouts, like in this case two columns with a header at the top, where you can easily train a field model to recognize the specific layout of your document. To do that, you need about 50 pages of training data, so 50 pages of ground truth. It's important to note that field models are not general models. You need to train a field model for your very specific use case, for your collection, for your documents. It's very difficult to apply other people's field models to your documents, unless we are talking about newspapers, which have quite a traditional, recurring structure.
The first thing to do is to draw a text region around the relevant information that you want to extract, as Elena has shown us before. Then you can assign a structural tag. The structural tag is optional, so you don't need to assign one if you're not interested in that. Here we have an example. We have those text regions, and I have customized the structural tags and called them Shelfmark, Name, Newspaper and Details. And because I'm not interested in this part, sorry, let's go back, in this stamp here, I just omitted it from my training data, and the model will learn to do the same. When you have done this on about 50 pages, you can train your field model. Creating the ground truth is quite fast: unlike the ground truth for text recognition, where you need to type all the transcriptions, for field models it's quite easy. When you have your 50 pages, you go to the Models tab, the one that Flo showed us at the beginning. There you select the option to train a model, and the type of model that you want to train is a field model.
At this point you are asked to select the training data and the validation data. This is common in machine learning, and it also happens with all the other models. When you have your 50 pages of ground truth, you need to assign around 90% of them to the training data and 10% to the validation data. The training data is the set of pages from which the model learns. The validation data is a set of pages put aside during the training and used by the model to refine the parameters and test its accuracy, so it serves to show how the model is performing and how well it is learning.
So the first step is to select the training data, your ground truth pages: you choose the collection, the documents and the pages. Then you have to select the tags that you want to train. In this tag selection section you can decide whether you want to train just the text regions without tags, which is an option, or specific tags like paragraph, image, header and so on. Then you have to select the validation data. There is the option to automatically assign 10% of your ground truth to the validation data, which we always recommend, but you can also assign it manually. Then there is the model setup, where you need to add the metadata of the model, like the title, the description and other details. There you can also change the advanced settings, like the training cycles and the learning rate, but for the first training we really recommend sticking with the suggested values. At the end, you can start your training. In the jobs table you will see the status of your training. After a couple of hours or a day, depending on the size of your training data, your model will be ready and you can use it to recognize new pages.
Now we will see how table models work.
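As a rough illustration of the 90/10 split between training and validation data mentioned above, here is a minimal Python sketch. It is not how the platform assigns pages internally; the function name, the random shuffling and the fixed seed are assumptions made for this example.

```python
import random


def split_ground_truth(pages, validation_share=0.10, seed=0):
    """Split ground truth pages into (training, validation) lists.

    Hypothetical helper: shuffles the pages reproducibly and sets
    aside roughly `validation_share` of them for validation.
    """
    shuffled = list(pages)
    if len(shuffled) < 2:
        raise ValueError("need at least two pages to split")
    random.Random(seed).shuffle(shuffled)
    # Keep at least one page for validation, even for small sets.
    n_validation = max(1, round(len(shuffled) * validation_share))
    validation = shuffled[:n_validation]
    training = shuffled[n_validation:]
    return training, validation
```

With the 50 ground truth pages used as a rule of thumb for field models, this yields 45 training pages and 5 validation pages, and no page appears in both sets.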
They are similar to field models. Table models allow you to automatically recognize rows and columns, which improves the extraction and analysis of tabular data, because you can then export the text in a structured way, for example to a spreadsheet, and work on it. Unlike field models, in this case the model learns to recognize rows, columns or both. So you can also create tables if you have just rows or just columns, as well as when you have a traditional table with both rows and columns. As in the case of fields, this is not a general model. So you won't find a public model that works for every type of existing table; you need to train a table model for a specific type of table inside your collection or documents. You can also train a table model if there aren't visible separators for columns and rows. We know that there is a great variety of historical tables, so you can also train models when there are no printed or drawn separators for the columns, but just white space to indicate them. And with enough training data, a model can handle multiple types of tables, but you need to include all the table types that you want the model to handle in your training material. Here is an example of a table. Here we don't have vertical and horizontal separators, just the white space between the columns and the rows. This is the recognition of the rows, and this is the recognition of the columns. And what you see in the end in Transcribers is the intersection of both columns and rows. Another example is this one. So this is the result. And as you can see here, we created this column and trained on it, but in reality, in the original document, there was no column here. So the model can also learn to add columns, or to omit columns if you're not interested in the first or the last one. This is an example of how the model performs with skewed tables.
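As an editorial aside, the idea that the final table is the intersection of the recognized rows and columns can be sketched as follows. This is a simplified illustration with axis-aligned boundaries only; the function name `table_cells` and the pixel coordinates are hypothetical and do not describe the platform's internal representation.

```python
def table_cells(x_bounds, y_bounds):
    """Build table cells from column and row boundaries.

    `x_bounds` are the horizontal positions of the column boundaries and
    `y_bounds` the vertical positions of the row boundaries (hypothetical
    pixel values). Each cell is the intersection of one row with one
    column, returned as (row, col, left, top, right, bottom).
    """
    xs = sorted(x_bounds)
    ys = sorted(y_bounds)
    cells = []
    for row, (top, bottom) in enumerate(zip(ys, ys[1:])):
        for col, (left, right) in enumerate(zip(xs, xs[1:])):
            cells.append((row, col, left, top, right, bottom))
    return cells


# Three columns and three rows; the rows have different heights.
cells = table_cells(x_bounds=[0, 200, 450, 600], y_bounds=[0, 40, 120, 160])
for cell in cells:
    print(cell)
```

The sketch also shows why rows alone or columns alone are enough to build a table: with only two boundaries on one axis, every row (or column) simply becomes a single cell.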
And the last one is how it works with multi-line cells. So even if the height of the rows varies, you see that the model, with enough training material, is capable of recognizing that this row has just three lines and this one is much larger. To prepare the training data: if we are talking about easy tables, you just need 20 manually annotated pages to train your model. In the case of difficult tables, you need at least 50 pages. And if you want to train a model on different types of tables, you need between 50 and 100 pages of ground truth, depending on the number of table types and their complexity. And now I will just show you how to create the ground truth, so how to annotate a table. Let's go back to the platform. Let's take this one. The first thing to do is to draw the table. So you select this button, then you click to start the table, and then you click again to... And now we have drawn our table. It's up to you whether you want to include this part or not. Usually we don't include it, because we can just add the information that is always the same on each page during post-processing. So when you have your Excel sheet, you can just add the title of each column. And then, as Elena has shown us, we can press V on the keyboard to create the columns and H to create the rows. And this is enough. So when you have segmented the entire table, you can save it as ground truth. And this is enough to train our table model. You don't need to add the lines or the transcription, because the training only happens on the image and on this structure here. So, can we go back to the presentation? And the last step is how to train the table model. Thank you. It's similar to the field model. So also here you need to select the training data. We don't have structural tags here, so you jump right away to the validation data, the model setup, and the advanced settings.
But also here we don't recommend modifying them at the beginning. Here is a summary about field and table models. You can start with about 40 to 60 pages of ground truth, depending on your material. First you have to prepare the training data with the editor. In the case of field models, you need to draw and tag regions. In the case of table models, you need to draw the tables and then the columns and the rows. There is no need to add the transcription. At the end, you can train a model. Recognizing your pages with tables and fields is a bit more complex than with the other documents we have seen, because there are three steps here. First you need to apply the field or table model to recognize the regions or the tables. After that, you need to run the baseline recognition, and only after that, when you have detected the text regions and the lines, can you start the text recognition. So it's three separate steps that you need to do in order to get the text in the end. And after that, you can work on the pages within Transcribers or export your data in the format that you prefer. And now I will hand over the floor to Matias, who will talk about Transcribers Sites.
Yeah, thank you very much, Sarah, and hi to everyone also from my side. Yeah, it's showtime. We now talk about Transcribers Sites, because what you've heard so far is quite a lot of work that you have to go through beforehand: working on your documents, recognizing them and then also enriching them, putting in tables, et cetera. But the nice thing is that you're also able to publish later on, and with Transcribers Sites, which I'm now talking about, this is exactly what's possible. So Transcribers Sites is where the show happens. All the work that you've put in, you can now publish and show to other people outside of Transcribers with your own Transcribers Sites instance.
So as you can see here, you'll find Transcribers Sites directly next to the Desk and the Models part. So the navigation should be quite clear, as we've heard about the Desk and the Models part before. Simply click on this button called Sites, and then you are in the right place. And what we can do with Transcribers Sites is show our documents in the way that is displayed right here. So it is a very easy way to share your material. You have this nice side-by-side view: on the left side the original document, and on the right side the transcript that you have produced before with automatic text recognition. But if you have produced some manual transcripts, you can also show them through Transcribers Sites. And you have enhanced searching capabilities. This is one of the main reasons why we actually do this text recognition: to make everything searchable. When you have a Transcribers Sites instance, you can simply type in a search term and you'll find it in your documents, mainly in the transcript, but you always have the connection to the original on the left. So the line that is highlighted in the transcript is always also highlighted in the original, and vice versa. And this way, of course, we make everything way more accessible than it would be if these documents were just in an archive, for example, or somewhere else where you have to go physically in order to look up what you want to know. So these are basically the features that you have in Transcribers Sites. And the question, of course, is: how do I get there? How do I get my own Transcribers Sites instance? This is what we have tried to make as easy as possible within the last months. On the left side, what you can see is part of the content management system that is available with Transcribers Sites. When clicking on this button, Sites, you basically end up here.
And there you can edit your whole Transcribers Sites instance. You can give it a title. You can give it a background image. You can control all the texts that are there, on the home page but also on the About page, so that you can also talk a little bit about your project. And what you can also include is tags, et cetera. Right, so what do you have to do in order to get to your first Transcribers Sites website? As said, first of all, click on the Sites button in the top navigation bar, and then you will find, oh, here we are. Thank you very much for changing. So first of all, what you do is click on this button up here, the Sites button in the top navigation bar, as I just said, and then you will land right here. What we can see here is the Sites overview. We have already got one site, about Marjorie Fleming's diary, which we can show you later. But of course, what you can also do is create your own new site by clicking on this button here in the top right corner. Then, first of all, you have to give it a title, for example, "Diary". Then you can select a custom URL. So just to give it a little bit more flair and to make it shareable, you can have this unique URL, for example, "diary of a kid". We could just write it like this. And then you simply have to choose which collection you want to publish. We've heard from Miriam at the beginning that we have collections in Transcribers, and these collections are also used for publishing. So you can do the work in one collection, but have another collection into which you copy or link the documents that you want to publish. This way, you can of course keep the documents that you're still working on unpublished, and have the documents that you want to publish in a separate collection.
And then you simply have to click on the collection that you want to share and then click on "create site" down here. Of course, we already have something to show you, so I'm going to click right into the Marjorie Fleming's diary example, which we have here. And now we're already directly in the CMS. This is also where you would end up if you clicked on "create site". And now you already see that on the right side you have the final product, the final Transcribers Sites website. And on the left, you have the editing possibilities. So I can simply click into the title, for example, and I can delete it or add something else, for example, "exhibition". And the title on the right has already changed as well. So you always see live what changes you make to your Transcribers Sites website. You can also change the description by simply clicking here, and you can change how it is formatted. So if you don't want it to be bold, you can remove that right here and rather put in some italics. And you can upload an image right here, which we've already done, in the background. And this is where it's being added, right here on the home page. Then you can navigate through your Transcribers Sites instance in two different ways. You have the possibility to change here: if you want to go to the About section, you can simply use this dropdown. But the easier way, I think, is to simply use this navigation bar up here. So you can click on Home, on Explore, on Search and also on About. And these are also the pages that you will be working on. Right. So we've done the home page, and we now want to change the view on the right side to the About section, because this is the next thing that we want to work on.
So simply either use the dropdown or use the top navigation there, go to About, and then we have a different CMS editing part on the left side and also, of course, a different page on the right side. What is always possible here is, again, to put in a title, and then you can upload images. This image up here is the background, which is behind "About this project" in this instance. But of course, you can also work on all the separate parts of the text that you have here. What you can do is simply click on Edit, and then you can change the text right here. So again, what we will do to showcase it is just add some bold text, for example. And then, of course, you see that the text is being changed here too. This way you can put in some text. As you can see, in the first part we just have text. In the second part, we don't just have text; we also have an image, which is uploaded right here, simply by a click on this button. And the lowest part is text again. You can always add sections if you want, and you can also delete them by clicking on this button right here. This way you can put together your very customized About section. Then we can have a look at the Explore page right here. What we see here on the left side is that we have enabled the tags. This is simply done by clicking on this toggle right here. If you have tags in your documents, as shown before, then you can enable them. You don't have to. And then, of course, you can also search for the tags you want to add, simply click on a tag, and it is added to your tag directory. The tags can then be found right here. So first of all, you have the documents which are shown, but also the tags. And it is simply this little button that you have to click on in order to toggle between the two.
And then if you click on a certain tag, you find all its occurrences showcased right here in a list of results. Then, of course, the Explore page gives you some more possibilities in terms of the documents themselves. Here is a list of all the documents that you have in your collection. If you click on one of those, you get all the pages that are inside this document. And again, there are some things that you can adjust on the left, for example, the title. You can also click on one of these pages to see what the part where you can really read the documents looks like. And as you can see here, we always have this highlighting: if I click on any single line in the original, I always get the highlighting in the transcript on the right as well, and vice versa. This is a very handy way to go through the documents and read what is written there. Then we just have one page left, the Search page. Let's put in a search term to see how everything works. If I now put in "Fleming", I get a list of results. This list of results shows us the transcript at the top and then always a snippet from the original, so that you can directly check whether the transcript is good enough. I also have some filters here on the left. Of course, there are not that many in this case, as there is just one document. So if I click on this, I get the same results in this case, but normally you would have more documents, so you can filter by document here on the left. And then there are also some advanced filters which you can use. For example, you can adjust the fuzzy search. The fuzzy search does nothing else than give you some freedom in terms of how much the search term that you put in may differ from what is found in the transcript.
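As an editorial aside, one common way to express this kind of freedom is the edit distance between two words, that is, the number of single-character insertions, deletions or substitutions needed to turn one into the other. The sketch below assumes that notion of fuzziness; the webinar does not say which measure the search actually uses, and the function names and the word list are made up for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_hits(term, words, fuzziness):
    """Return the words that differ from `term` by at most `fuzziness` edits."""
    term = term.lower()
    return [w for w in words if edit_distance(term, w.lower()) <= fuzziness]


# Hypothetical word forms as they might appear in a transcript.
words = ["Fleming", "Flemming", "Flaming", "Fielding"]
for fuzziness in (0, 1, 3):
    print(fuzziness, fuzzy_hits("Fleming", words, fuzziness))
```

A fuzziness of 0 only finds the exact spelling, while higher values also find variant spellings and recognition errors, at the price of more false hits.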
So this could be one character, two characters, or three, depending on how high the fuzziness is set. Right, and then you can also filter by author, for example. Of course, here again, we just have one, but if you had more authors in your documents and you enabled this filter, then these would be available as well. Right, this is basically all there is to it. We don't have the possibility to look into the settings right here, I think, again, due to the rights that we have in this role, but this is how you set up your own Transcribers Sites website. And if you're then happy with all the editing that you've done, you have a nice home page, a nice Explore page, a nice Search page, and also a nice About page, then you can simply click on "Want to publish" and give it a go, make it available to the public and let people have a look at the work you have done. So that's it from my side, and now back to you, Florian, and let's talk about some more things, the subscriptions, et cetera.
Yeah, thank you very much. Yeah, a really great tool, Transcribers Sites. Here, everybody can really showcase what they're up to in Transcribers and in general with their work. But now let's have a look at another new and exciting addition to the Transcribers platform, at least from the perspective of developing Transcribers further. As you've heard at the beginning, Transcribers has its roots in a research project, or actually two research projects, that were led by many leading universities within Europe, and which transitioned into a cooperative four years back. And now we're taking the next steps in the journey of Transcribers and have introduced, exactly 13 days ago, the new Transcribers subscription plans. You might have seen some hints already in the software itself, which I will also have a look at with you later.
But now I'm quickly explaining why we did this and why we are introducing those subscription plans. First and foremost, as you've heard before, we are very community driven and try to cater to a lot of different use cases and a lot of different users. Everybody uses Transcribers a little bit differently, and we try to cater to this very broad base of users. And also, to maintain Transcribers further: as you've heard before, we are a cooperative and need to make money in order to sustain Transcribers. That's the main purpose of the cooperative. As we have it in our statutes, for instance, there's no shareholder or dividend payout; that's forbidden. We're not paying out any money, so every euro or dollar that is made with Transcribers is reinvested in the platform. And I think that's also one of the main reasons why Transcribers is, hopefully, as you might have seen, becoming better and greater over time. To make it a little bit more sustainable, we've now transitioned to this model of subscriptions, but we still try to make Transcribers as user friendly and as open as possible. That's why we also made a very substantial shift in terms of how much everybody can use for free. Before, every user that signed up for a free Transcribers account received 500 credits as a one-time package, and then you needed to get your own credits after that. Now we try to reward those users that are loyal. So if you stick with Transcribers for the longer term, already after five or six months you have more credits than before. You now get 100 credits every single month. They expire: those credits are valid for a month and do not accumulate. But if you're using Transcribers on a constant basis, that's a lot more credits than before. So you can use about 1,200 credits a year now, and in the long run it's even more.
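As an editorial aside, the comparison between the old one-time package and the new monthly allowance can be worked through in a few lines. The sketch assumes a monthly allowance of 100 credits, which matches the figure shown later in the usage dashboard and the stated total of about 1,200 credits a year; the constant and function names are made up for the example.

```python
ONE_TIME_PACKAGE = 500    # credits a new free account used to receive once
MONTHLY_ALLOWANCE = 100   # assumed free credits per month under the new plan


def credits_received(months):
    """Total free credits handed out over `months` under the new plan.

    The credits expire each month, so this total is only reached by
    someone who uses the full allowance every month.
    """
    return MONTHLY_ALLOWANCE * months


def first_month_ahead():
    """First month in which the new plan has handed out more credits
    than the old one-time package."""
    month = 1
    while credits_received(month) <= ONE_TIME_PACKAGE:
        month += 1
    return month


print("Credits per year:", credits_received(12))
print("Ahead of the old package from month:", first_month_ahead())
```

Under these assumptions a regular user draws level with the old package after five months and is ahead from the sixth month on, which fits the "five or six months" mentioned above.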
So with the individual plan that we are introducing, we try to be as fair as possible. On the other hand, let's come to the next slide, we're also introducing some more plans. The main plan is called the Scholar plan, and that is the plan where you get advanced features. On the individual plan, most of the features that you've seen will be available: the entire editor and text recognition will be available, and training your own custom AI models will also be available on the individual plan. But some advanced tools, as you have seen with the field models and the table models, which are also the new tools that we're adding to the Transcribers suite, will only be available in the Scholar plan. The reasoning was that we don't want to take anything away from anybody who has known Transcribers before. So we tried to put everything that was available in Transcribers before into the individual plan, and to limit only new additions, such as field models, table models, and the super models that we have introduced, at least in beta stage, over the last couple of months, to the Scholar plan. And then there are also organizational plans. They come with advanced capabilities such as user management, because we are aware that many organizations are also using Transcribers, and for that you need a certain user management in order to add your colleagues to Transcribers and manage the availability of features. Having a dedicated success team is also very important to us. We really are interested in a partnership with everybody that is using Transcribers, and that's why we try to provide as good a success management team as possible to the organizational plan users. And there is the API, the processing API, which is the interface that is available with the organizational plans to process large amounts of pages through Transcribers. This is mainly used to build Transcribers into other apps.
So there is already a number of apps and software out there that are using the Transcribers API to enable text recognition within their software. Let's have a closer look at what the differences are. As said before, our reasoning was to not take anything away from users that have known and loved Transcribers before, but only to reserve new features, such as those we're introducing with Transcribers Sites, for instance, or with field and table models, for the Transcribers Scholar plan. With the organizational plan, as said, there is more: credits are shareable. So in order to distribute credits that you might buy on an organizational scale, you can transfer credits within Transcribers to other users or collections. Then there are also some limitations in terms of user seats. As said, with the individual and Scholar plans you have your own plan. On the organizational plan there are different tiers, so 10 or 30 users, but also a custom level. We really try to tailor the offering to every institution depending on their use case, and for that we also offer custom user seat amounts. Then the export formats will be a little bit limited. That's one of the few things that will be more limited. The issue here was that there are so many export formats at the moment that this led to a lot of support that we needed to provide. We try to provide free support to everybody, so at the info address that you can reach we really try to reply to every single email, but as the usage of Transcribers grows, we simply cannot cope with that anymore. So we need to restrict at least some features that are leading to a lot of support requests, and for that reason the export formats will be a little bit more limited on the individual plan, while in the Scholar plan you can really enjoy everything. Another item on that list is document storage. Currently document storage is not yet limited within Transcribers. We'll introduce some limitations, but really try to
have a reasonable amount of storage here, and you can always extend that storage once we introduce the storage limitation. The same goes for training runs. Currently, on the free individual plan, you can train five models for free every single month, while in the Scholar plan it's 30 models. And if you really train 30 models a month, then you're an excellent user of Transcribers. At least the data has shown that basically no one is really training 30 models a month. There are some iterations that you might do when training a model, so you train your first model and then you do another iteration, but with a model a day you should really have a good number of runs. One important thing here is also to raise some awareness: training models is a really resource-intensive task. Model trainings can run up to two or three weeks, and this obviously takes a lot of compute power, which uses a lot of energy. We try to make it a little bit more obvious that consuming that much energy for training a model, very often only for testing out some things, could be considered a little bit more carefully, to reduce energy usage at least a little bit. Then customer support will also be tiered into basic, priority, and a dedicated success team. Basic I would still call priority: we try to maintain the current level, where everybody is receiving a reply from our support team, and that will remain the basic one. Of course, we might take some time to reply. That time will be reduced on the Scholar plan, of course. So if you're on a Scholar plan, the reply might be a little bit faster, as we try to really offer a service for what you're offering in exchange. And the processing speed will also be different. If you're on a Scholar plan or organizational plan, you can jump ahead of the people in front of you. If you're on a free plan, you will need to be in the queue and wait until it's your turn. With the advanced processing speed you
can basically jump ahead of everyone in front of you in the processing queue once there's a lot of traffic on the servers, which happens to be the case. We really try to have a powerful infrastructure, but with the sheer massive usage of Transcribers there are some queues that sometimes pile up, and for that you have advanced speed, so your pages are processed faster. Then, as you've heard, there are the advanced AI tools, under which field models, table models and also super models fall, and also smart search, which we have not spoken about today. This is a rather small tool with which you can enhance the search results, so more search hits are found once you type something into the search bar. This is also only included in the Scholar or organizational plans. And finally, let's have a look at what that looks like. The best part of it is starting a free trial, and that's something that we will have a look at now. As you see here, with that golden banner, and you might have seen that already, there's a hint that we are upgrading Transcribers to the new subscription model and that your features might be limited. Features that are not available on your plan will also appear as such. For instance, if you try to recognize something with the Text Titan, this will not be available, so you need to upgrade. But the cool thing is that we really try to make those features available to everybody, at least to try them out. So you can start your own free trial: once you click on "Start trial", you can start it. And once you are in the free trial, as you see up here, the free trial has been started, you see there are 30 days left in the current trial phase. And once we go to our usage dashboard, which is also new to Transcribers, up here you can see that I am on a free trial for another 30 days, in which you can experience the Scholar plan and all its features. Below here you can see there are 100 free credits that you receive every single month. These 100 free credits will be
added, in this case for instance, always on the 23rd of the month, so you can use this allowance to enjoy Transcribers and recognize your valuable sources. Let's now quickly jump back to the presentation. As I said, you will see these kinds of pop-ups in the software if you are not on a Scholar plan, but then you can start your free trial and experience those features for 30 days. One other thing: as I've mentioned before, we are coming from academia and we are definitely not forgetting our roots. We try to support as many users as possible, and for that we established a scholarship program, by now already three and a half years ago, where we try to support students that are working with Transcribers. Obviously many students will not have the budget for a subscription or to buy credits for their work, and for that we have launched the scholarship program, as well as for teachers that are working together with students in their classes. Students and teachers can apply for a free subscription or for a free credit package that they can use for their work. There are so many exciting projects. I don't have the exact numbers in mind, but I think it's more than 300 scholarships that we've granted in three and a half years, to users from, I think, more than 50 or 60 countries. I'm not 100% sure, I would need to put the data in the slides next time, but off the top of my head I can just say that we have more than 300 scholarship beneficiaries, and I don't even know how many credits we've granted. But we really try to support as much as possible, so for students this might be a good opportunity. All we ask for is a short abstract on what they are working on, as we're interested in what everybody is up to. And we really have everything, from law to economic history to classical studies, which you might expect to work with Transcribers, to also very exotic studies that we would not have expected. There's a very huge variety of
different disciplines and fields where Transcribers is applied and used, and that scholarship program is a little step with which we are trying to give back to the community. As I said, here we're trying to support as many students as possible. Now, in terms of support: as you've heard, there are about 170,000 users, which leads to a lot of questions and issues. Obviously, as historical material was not designed like contemporary papers, you really have a huge variety of materials and a huge variety of questions and problems that come up, and everybody has a different approach to extracting that information and unlocking those sources. For that we have set up a very extensive help center, which you can check out on help.transcribers.org. But obviously we're always happy to reply to your requests at the info at transcribers.org email address, which you can also find in the help center. And yeah, please check it out if you have any questions. Very often that's the quickest way to get your questions answered, but if the help center doesn't help, you can always consult our support team, which I think does a really great job in providing that support to basically everyone. And now a little outlook. As you might have seen already, or maybe not, and then I'm telling you in this very moment, we're having a user conference in about a month. On the 15th and 16th of February we will have a hybrid conference, in Innsbruck and online, with the title "The Future of Information Extraction". Here we will focus mainly on extracting information from historical sources. We have more than 60 speakers confirmed. And as you can see on the photo of the last time, which shows about a third of the attendees of that conference, it will be between 150 and 200 attendees, so it's not an extensive conference, at least in terms of in-person participation. As COVID has obviously also shown us, we don't always need to travel far to enjoy such events, and that's why we also try to make it hybrid. So online there's obviously a lot
more participation, as you can join from everywhere. But still, with more than 60 people being on the stage and presenting their work, we have at least the feeling that this conference gives so many exciting projects the opportunity to show what everybody's up to, sharing their experiences with Transcribers and maybe also inspiring others on how problems could be solved and how historical sources can be unlocked, to come back to our vision. And if you're interested in what's on the agenda for this year in terms of new features, I would encourage you to take part in the user conference. There are a lot of exciting things. As you've heard before, maybe, I've read it in the chat, I'm not entirely sure, we are working on named entity recognition. That is another AI-based approach to extract information: tagging text, so basically entities in the text, will be a new addition to the models that you can train in Transcribers, and it will be available in beta soon. But we're also on other journeys, such as large language models. With the introduction of ChatGPT, they are on all of our minds and smartphones and at work, so many of us have experienced what large language models are capable of, and combining this with Transcribers is obviously something very exciting. We are very cautious, of course. We don't want to introduce large language models just by integrating some existing service providers, so we are working on them on our own at the moment. As we've had that approach since the beginning, we try to have everything computed on our own servers and on premises, so we try to make that available only through our own infrastructure. For that we need to work a lot. It's hard work, but also very exciting work, and we're very happy to hopefully present something in the upcoming months. And obviously we're also continuing the work on the web app. We will work on a new desktop app. Many of you might know the old desktop app. We're
already working on that, so that's also something that is on the agenda. We will talk about those exciting things in more detail during the user conference, especially on the first morning, Thursday the 15th. You might tune in, or even better, if you're here in person in Innsbruck, then come by, have a chat with us and experience what we're up to.
That having been said, it's time for questions. We're pretty tight on time, so I'm not sure; maybe someone who is or was reviewing the chat has some open questions that we could discuss now. I've seen that many questions were answered directly in the chat, which is really great, but if there are any questions left, maybe we can talk about them now. Maybe Sarah or Miriam, do you have anything on top of your lists that we should address?
Yes, I noted down one question, which was about tags and how they can get exported to use them outside of Transcribers. So maybe we can talk a bit more about exporting tags.
Let's maybe jump back to the software. Tags are a very exciting thing, as we also just briefly mentioned with named entity recognition. I'm not sure in which document we have some tags; let me quickly jump into some, otherwise we just create some. Here we don't even have text, so that's a bad example, but we have some in the example there. So tagging, let's move that a little bit to the side, is basically a very nice approach to enriching your material. If you talk about Boston, we all know there's a big city in the US that is called Boston, but there might be some other places that are called Boston, and you want to tag that in a certain way. Maybe "place" is not enabled here; you can enable tags, by the way. Once that is enabled, you can tag Boston as such and then enrich it with more metadata. For instance, if you click on Wikidata ID, it already provides you with this entry in the Wikidata database, and you can connect it with that. So
like that, you can enrich your material with metadata. The underlying format within Transcribers currently is XML. There are different versions and different standards of XML, but basically you can export that text as XML, and that's a very common format, which you can then use in any other app or transform to other formats. Let's now quickly save it, just as an example. Once you select that page or the entire document, you can go to export and then select what you want to export. As said, you can export a standard document, where you can select what you want to export. There are different XML formats, such as PAGE, which is basically the standard format within Transcribers, but also ALTO XML that you can export. But you can also export spreadsheets, which is available here, and we will add some more options where you can also add those tags into spreadsheets and basically export the data that you have enriched into a database or spreadsheet, as you might have known. Does that answer the question? I hope so. If there's anything still open, just type it in the chat, then we're happy to answer that.
Anything else we could have a look at?
Maybe, just because it has come up in the chat, a reminder again, to make it clear: if you work with other people in Transcribers, you might need to be in a scholarship plan, but others who are just correcting text or layout or something like that don't need to be in a higher subscription plan. So they can stay in the individual plan and just use Transcribers for free.
Exactly, that's one of the common misconceptions. So yes, if you are in a collection, for instance, you have user management here, and here you can see there are three users. You can also add other users whose email you know to your collection and then provide them with access. And mainly they can do anything: they can use the editor, they can alter text, they can tag, they can work on
the layout, basically anything you can imagine. What will be limited is the usage of some advanced tools, such as super models. Now I'm in the free trial, so I can use it, but if they are not on a scholarship plan, they will not be able to use super models, so they cannot run a recognition job with that model, or train field models or table models. So those advanced features will be limited, but other than that, the features will be available to anybody.
Yeah, there was a question regarding Gobi and Dintanda. I'm not perfectly up to date, but I think that's also one of the cases where the API is being used, and I would need to look that up. So yeah, I cannot answer that part yet.
Okay, I'll shrink current so re-answer it. How can I finish my subscription? I think that means cancelling the subscription. Now I'm only on the free plan, but if you click here, then you can click on "Manage subscription", and you will end up on the user management, where you can manage your subscription and cancel it. We're always sad, of course, if we see someone leaving, because the major motivation that we have is really to provide a tool that is of help, that can help you achieve your things. If not, obviously we don't want to force anybody to stay with us, so you can cancel any time, basically.
Yeah, natural language processing. Yeah, Sarah?
Sorry, maybe you can better explain the distinction between training runs and credits, because there was a bit of confusion about what you can do with the individual plan.
So it's quite straightforward: you get 100 free credits every month on the individual plan. You will see your available credits here. This account is a little bit older, so it has more, but you will see your available credits here. A credit equals a page: you can process, so recognize, one page per credit. With those 100 free credits, you can basically recognize 100 pages for free every month, whereas the
training runs are not limited yet. That's still in beta stage, but we will move that to production as well, so to app.transcribers, soon, and then basically you can train five models a month, which I think is still a pretty large amount where you can train your own AI model. And the amount of pages you throw into such a model is not limited, so you can train as big a model as you like.
Yeah, now the question on API use: there will be a webinar. We're currently in talks, so we will have a webinar on API use soon. Anything else?
I think there was a question regarding natural language processing. That's also something we will address at the user conference, so you might tune in there. But obviously natural language processing is something very important, because if there are more complex tasks where you want to dig deeper into the text, then natural language processing is something very exciting.
Okay, since we're running a little bit over, I would like to round it off, unless there's anything else. I'm really happy that so many joined and so many are still here, even if we were running a little bit over; it took almost two hours now. It's really great that so many of you are passionate about what we're doing here, and we're also passionate about what you're doing, which is true. So we really try to listen to you and try to incorporate that. Thank you to everybody who was also presenting today; you did a great job, I guess, so a really big thank you to everybody.
We will have some more webinars soon, and those webinars will be a little bit more focused. I read that it was not perfectly aligned that we first talked about the basics, so this was a webinar where we tried to cover things as extensively as possible. But yes, I would say it was a nice evening or morning, wherever in the world you are, or night. If you're really very passionate about Transcribers, it's really great if you joined us during the night. If you're far east of Europe, then it's really great to
have you here. And yeah, thank you for using Transcribers, and as we always say: keep unlocking the past. Thank you very much.