Hello, also from my side. Earlier, we heard the number of 80 million pages. That's a pretty large number, I would say. What we have done is stack them on top of each other, and that's the actual footage that we've taken; it's not AI-generated. So yeah, we're almost at the tip of Mount Everest with our pages: if you stack them on top of each other, it would be a stack of about eight kilometers.

So today we will be talking about evolution. How have we become so high in terms of pages? And how can we get higher than Mount Everest with those pages? As you see, our beloved Volpi: how did Volpi start flying? Because at first, in Transkribus, it was not flying, and in our new logo that we developed in the last year, it is flying. How come? I'm really happy that my colleague Kerstin is also with me. She joined us back in October as a product lead, and she will also talk about the roadmap today.

But first, I would just like to take a step back and think about what we have done. The last time I was standing here, and that's an actual image from back then (at least one of us got a new hoodie), was 17 months ago. Now I want to have a look at what we have achieved in those 17 months.

First of all, the move to the web. All of you have probably experienced that we have a new web app. The web app is basically our answer to consolidating Transkribus into one platform, into one tool. With the Expert Client and the previously known Transkribus Lite, we had two tools, and it was really a lot of effort to maintain both at the same time while still providing features in both. We obviously needed to take a decision: we see the future in the web and want to build Transkribus as a web platform. For that, we also launched Transkribus Beta about 15 months ago. The approach here is to be as fast as possible in terms of feature development and to provide those features to you, so you get access to them as quickly as possible. Because what is really valuable to us is feedback: your feedback, the feedback we are getting from Beta, is crucial for us. For the new editor that you might have seen already, and which my colleague Kerstin will talk about later, we got a lot of feedback on Beta and really tried to incorporate it into our web app.

One tiny little tool that some of you might have used is Transkribus AI. As you can see, the idea here was to drag and drop an image, like with Google Translate, and get a result really quickly. I mean, it still takes about 15 seconds until you get the result, but still, I would say this is a really great success, and we're really happy because this tool is used a lot. With our public models, about 40,000 users use that tool every single month. With no login needed, you can really quickly understand what's written there. Many users say: hey, I have a little diary entry from my grandma, but I can't read it. Now they can just drag and drop it in. Often, of course, as you can also see here, the text is not perfect, but at least you understand what it's all about. Like this, we really try to make Transkribus an accessible tool where you can quickly get at the information in the image.

Then back in April (as you can see, April 5 was the day we uploaded it to the platform), we introduced a new type of model, which we call supermodels. They're a transformer-based approach to handwritten text recognition.
The Text Titan 1 is basically our first and most powerful model at the moment. You will hear a little bit more about that later on as well. And it went pretty well; many of you have probably already used it. It does really well out of the box, and that's where this new type of model is the most powerful: when you want to get good results quickly, these are the go-to models. When you want the best results, very often you still need to train a custom model. Combining those two approaches, we have found, is a really powerful addition to Transkribus.

Then, and I will not talk too much about that because you've seen it before in the presentation of my colleagues, we've also integrated field and table models into the platform. For now, they're only available on Beta, but we will release them soon as well.

Then something else. Very often with data, it is the case that it's sensitive: it may not leave the premises of your institution. For that, we've come up with a solution called Transkribus on-prem, where you can install Transkribus within your institution. Data will not leave your institution, and you can really enjoy how Transkribus is changing the way we extract information, but on your premises, without leaving your infrastructure.

Another thing that has existed for a little while longer, but which we have now rolled out to more and more institutions and organizations: there is already, and I'm happy to say that, a number of software solutions with a button that says "transcribe" or "get a recognition" or whatever. What they're using is the Transkribus Metagrapho API, through which they get a transcription of the material back directly in their own system.

And then our most substantial change of the last 17 months was the move to the new subscription system. Many of you might have seen that button where you can start a free trial. The idea here is to make Transkribus more sustainable. As you've heard, we are a cooperative: we invest every single euro that we make into the platform. But for that, we also need to make that euro. And how do we do that? We've come up with the subscription model to generate a more sustainable revenue stream that will allow us to develop all those nice tools and features. We came up with these three plans, as you see, starting with the Individual plan. We try to give as much as possible to the community for free. Basically, the idea was: everything that was free before will stay free. All tools that were already in Transkribus before will still be free; there are some minor exemptions, but the free plan is basically Transkribus as it was before. What will be on the paid plans are the more professional tools, such as the field and table models we've heard about, which you might use on a more professional level. So if you want to work on your grandma's diary, as I said before, you probably don't need a field and table model, but you can still get the text recognition for free.

For that, we've also changed the model for the free credits. Now you get 100 free credits every single month. They expire, so you need to be loyal and use Transkribus on a constant basis. But on the other hand, you get a lot more: before, you got 500 credits once, and that was it. Now, if you use Transkribus on a constant basis, you really can process a decent amount, I would say.
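As an aside, to make that integration pattern concrete: a third-party system sending an image and getting text back could look roughly like the sketch below. The endpoint URL, authentication scheme, and response shape here are illustrative assumptions, not the documented Metagrapho interface; check the official API documentation for the real details.

```python
import requests

# Hypothetical endpoint and credentials, for illustration only.
API_URL = "https://example.org/metagrapho/recognize"  # placeholder, not the real URL
API_KEY = "your-api-key"

def transcribe(image_path: str) -> str:
    """Send a page image for recognition and return the transcribed text."""
    with open(image_path, "rb") as f:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            files={"image": f},
        )
    response.raise_for_status()
    # Assumed response shape: {"text": "..."}
    return response.json()["text"]

print(transcribe("diary_page.jpg"))
```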
And for everyday usage, for your grandma's diary again, you can really use Transkribus. And now I will hand over to my colleague Kerstin. Let's have a look at what we will develop in the next 17 months.

Yes, this is on. Check. Nice. So thank you so much for the introduction, and hello to everyone joining here today and online. A lot has happened in the past year, and we have taken quite big steps towards moving fully to the web. So let's have a look at some highlights and more details. Sorry.

Yeah, first off, the editor. As Laura already mentioned, it's the main area where your manual work actually happens, and it has improved quite a lot, especially in the last months. We actually released the newest version just two weeks ago; it already feels much longer. You have the possibility to just try it out whenever it is convenient for you, to give everyone time to get familiar with the new features and with how the new editor works. You can still go back to the old editor, at least for the meantime, so if something is missing or not working for you, you have the option to do that. But please let us know if that's the case, because we want to improve the editor so it works better for you. I also would like to take the opportunity to thank you all for your continuous feedback, especially on the editor, because it helps us tremendously to adapt one of our core functions to your needs; it has to work for you.

Yeah, we heard your feedback regarding working on smaller displays, especially on smaller notebooks, and started to optimize Transkribus exactly for that. It is still an ongoing process, so don't expect everything to work perfectly yet, but we will pursue this further this year, so it can only get better. Especially with the editor, we made sure that it works on smaller screens, especially laptops, as I mentioned before, and you can even use the web app on mobile devices to, let's say, browse your collections and documents. You could try to use the editor on mobile, at least on tablets. On smartphones, you could browse collections to quickly check something, and you could even open up the editor to look at the image or the transcription, but you will not be able to use all features. Tagging is not possible, for example, and please don't attempt to draw a layout with your fingers on a smartphone. I have to say we are currently not planning to make the editor fully available on a smartphone, because it is not really an adequate working environment, at least in our opinion. But we will look into what users would actually want or need to do on a smartphone, so we can at least adapt for that where possible. If you want to just quickly show something to a colleague, that might be a use case we can support. What we do consider, though, is enabling work on a tablet, if this turns out to be a relevant use case, because then you would probably need touch input and would actually draw a layout with your finger.

Yeah, one upcoming need from you as a community is publishing the great work you are actually doing. Up until a few months ago, we were offering read&search to do so, which got harder and harder to maintain on an individual basis: each single project meant work for us, continued maintenance, and fixing the issues that arose from moving forward with Transkribus.
Instead, as a new addition to our product line, it is now possible for you to publish your work completely on your own, without our help or, actually, any dependency on us. You have full control over which documents you want to publish and when you want to publish them. You only need a subscription for a certain amount of document pages.

Yeah, the Sites editor. Yes, that's the correct one. The Sites editor lets you create individual content for the homepage as well as an about page, and you even have the option to translate your content if you would like to offer your site in multiple languages. With the six predefined themes that currently exist, you can add a personal touch to the appearance of your site. The first version of Sites was released at the beginning of this year, so also not too long ago, and some improvements, such as using a hierarchy to structure your documents and the option to have multiple content pages to present and contextualize your project, are still to come. Besides, it is essential in particular to make it easy to use on mobile devices, so that others can browse your material and content on tablet and smartphone alike. Sites is intended not only for your colleagues but for the wider public, and interested people around the world can access your content; even people without any knowledge of Transkribus are able to engage with this material. We are very excited to see how Sites will be used, what you plan to use it for, what you would like to do with Sites in the future, and in which direction we can develop it to support your publishing needs.

Yeah, so we have spent enough time going down memory lane, and you can try out everything we just saw yourself at any time. So let's rather take a look at what 2024 has in store for us. A short disclaimer before we begin: a whole year is a really long time in software development, and what I'm about to present is our current plan with the knowledge we have. Circumstances and priorities can of course change, and so can this plan. But let's start.

Yeah, it should be a delight to use Transkribus. No, it should be! No laughing here. Even though we have come a long way already, we are determined to improve your experience of working with Transkribus even more. One focus area of the first half of this year is to ensure the reliability and stability of the web platform. This might sound rather vague and boring, so let me explain what's in it for you. Generally speaking, we want to bring down the rate of errors and failures in the user interface, and especially for recognition and training jobs. We are aware of the frustration these sometimes cause you, and we benefit ourselves as well, because our hardware resources are best utilized on successful jobs, not failed ones. Some of you who have already worked with the table and field models we heard about today might wonder when these features will come out of the beta stage, and the lack of reliability and stability is currently the main reason why we are holding them back for the moment. But as Flo already mentioned before, we are working hard to bring table and field recognition to all of you soon. Our own quality standards have probably risen: even though new functionality will be made available on Beta at an early stage, as we do now and will continue to do, we want to ensure reliability and stability before everyone is using a feature.
Yeah, reliability also means that the application actually behaves in the way you expect and need. A challenge we plan to tackle more consciously is to support occasional or new users as well as our long-time expert users in the best way we can. I think we have already taken a big step forward, especially with the editor, because it's quite intuitive, but other parts still need some love and taking care of, let's say, because first and foremost Transkribus has to be useful for you. We want to continue, or rather increase, engaging with you in the further development of Transkribus, to be able to provide you with the best experience for working with historical documents and a web app that behaves in a way that supports you and your workflow. So we have to smooth out a few rough edges (you probably know what I'm talking about if you have experienced it already), such as the process of training a model, finding a good model for your material, or managing tags, just to name a few. It is not as bad as in the infamous comic on the right side, which at least some of you probably know, so you don't have a single point of failure. But to provide a delightful experience, we need to work on our technical basis as well, to ensure a high quality of new features to come and to build on a more stable foundation. I mention that because oftentimes this is the sort of work that is not immediately visible to users; they just experience it as a slight undertone, so to say, and it might even go unnoticed. But it still takes up quite a lot of time on our end and is very important. Reliability and stability will always be an important topic, of course, but we will set a particular focus on it in the first half of the year. As depicted at the top of the slide, this will always be the indicator of whether we aim to do something in the first or the second half of the year.

Yeah, there are still some important features missing in the web version of Transkribus, one of them being the different options to import your material. Today you can upload a PDF or multiple images to create a document, but larger institutions also have their data available in other forms that they might want to utilize in the web, too. So in the first half of this year, we want to extend the import options to support IIIF manifests, at least as well as we can, because they can be complicated, too. In the desktop version, it is also possible to bulk upload large quantities of files via an FTP server, and we do want to provide a solution for bulk uploads in the web as well, so you don't have to upload each and every document on its own. As the last option: it is currently possible to export a PAGE XML for each page in your document, and to more or less come full circle, an upload of those PAGE XMLs is also planned. It is hard to support every PAGE XML variant out there for importing your documents, because of the, as you may know, very open and customizable format. But we will try to provide some guidance on what will work and what we cannot, or explicitly don't, support, in order to at least make the capabilities of the import option transparent. First and foremost, we want to make sure with this feature that exported PAGE XMLs can, so to say, be re-imported again.

Yeah, oftentimes transcriptions exist in a separate Word or text file and are not connected to the image of the material.
A regularly requested feature is text-to-image. It actually existed before, but got removed because it was based on an earlier handwritten text recognition engine that we had to retire and that is no longer supported. We plan to re-introduce this feature in the first half of this year, probably as one of the import options. Just to quickly explain what this even is: the text-to-image tool attempts to match existing transcriptions of a page with the image. For the image, a layout analysis and a text recognition have to be done to find all the baselines and words in the image, which can then be matched with the already existing transcription. So with this tool you can basically feed additional text into the process, to match your existing transcription with the results of the recognition.

Yeah, a topic already mentioned before and actually asked for, which is quite nice. Another important topic for us, still largely missing in the web as well, is the evaluation of results, to find the best models to work with on your material. There are multiple ways of doing that already available in the Expert Client, and we want to move them to the web as well. When you try out different models for text recognition on your material, you need an easy way to compare the resulting text with an already corrected version, your ground truth, to be able to decide which model yields the best results. We are therefore planning to introduce the text comparison tool for single pages, to easily see the differences between the two versions of the text. Yeah, you may not always want to (you cannot see that yet, but I'm coming to the screenshot), you may not always want to compare the text in a visual way, page by page, but rather have some precise numbers on accuracy for an automatically recognized page versus your ground truth, such as the character and word error rate, or precision, recall, and F1 score. These statistics are certainly only used by experts who can interpret the figures and are by no means useful for all users, but we understand the need to be able to accurately communicate results in your projects, or even for funding. A third approach to evaluating the results, and the last one for now, is to compute the accuracy not just on single pages, but rather on a randomly drawn sample of lines from multiple pages. Evaluating on a sample has, of course, the benefit of reduced bias and gives a representative, statistically valid result. So we'll also bring that into the quality control feature. We will most likely provide the quality control for text recognition already in the first half of the year, because a lot of it is already there and we only need to implement it in the web. As far as I know, we do not currently have a way to automatically calculate accuracy for layout recognition, but someone can correct me on that later. Still, we want to provide a tool for evaluating layout recognition this year as well, to further improve the quality of your results by helping you find a suitable model more easily. And I think in the design you can also see the tag evaluation and attribute evaluation; we have not forgotten about these. This will be a big topic when we introduce named entity recognition.
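As a side note, the character and word error rates mentioned above are conventionally computed as the edit (Levenshtein) distance between the recognized text and the ground truth, divided by the length of the ground truth. A minimal sketch of that computation, for illustration rather than Transkribus's actual code:

```python
def levenshtein(a: list, b: list) -> int:
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def error_rate(recognized: str, ground_truth: str, words: bool = False) -> float:
    """Character error rate (CER) by default; word error rate (WER) when words=True."""
    r = recognized.split() if words else list(recognized)
    g = ground_truth.split() if words else list(ground_truth)
    return levenshtein(r, g) / len(g)

print(error_rate("Gnandma's diarry", "Grandma's diary"))        # CER: 2 edits / 15 chars
print(error_rate("Gnandma's diarry", "Grandma's diary", True))  # WER: 2 edits / 2 words
```

Evaluating on a randomly drawn sample of lines, as described above, would apply the same computation to the sampled lines instead of whole pages.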
Speaking of which, we already heard a lot about information extraction and named entity recognition today, which are basically based on tags. Tags can already be set in the editor, linked with Wikidata, and enriched through additional custom attributes. Because information extraction, and therefore tags, attributes, and linked data, should take on a much more prominent role in Transkribus, some improvements and additional features are necessary. First of all, tags need to be searchable in the future, when you have thousands of them. For Sites (you can see that on the right side), we already have a basic option to show all tags in a collection with the number of occurrences, and you can click through to the page and see the tag in context. In the app itself, the search is not yet capable of doing so, but this will get more important as more tags are being set. We are also thinking about moving the creation and management of tags away from a single collection and instead enabling tag sets that can be used in multiple collections. Especially when working with many custom attributes, like in the example we saw before with 15 or even more, it's easier to create a tag set that you can use multiple times, in multiple collections, and that you don't have to set up again every time. You also may want to use services other than Wikidata, which is the only one currently implemented, to link to open data. We are very curious to find out what services are needed and whether they can be integrated, so your input is always very welcome. However, and we also heard a little bit about that, tags and named entity recognition have even more potential, especially in connection with Sites. We don't have any concrete plans yet, because they will depend on how Sites develops, but there are already ideas to use tags to visualize locations on a map, for example, or to show an index of persons or of other entities of interest. We are interested in the projects you are planning and the needs emerging around tags and Sites.

Now to named entity recognition itself. Named entity recognition must of course also be made available in Transkribus, and currently it is not. We will probably take the same approach as for the already available trainings, where you click through a few steps, select your training and validation data, and set the few options that are needed. And we will use this extension of our training options, so to say, to actually improve the process of starting a training, because there are some issues you might have already run into that lead to confusion, or even to trainings with pages that you never meant to add. Some parts are generally speaking not working very well, or even missing, such as selecting individual pages for your validation set, but it's a very important feature. Yeah, and I say this with a grain of salt: we are aiming to bring a first version to Beta before the end of the first half of the year, but this highly depends on the progress we make in general. For a stable version, we will definitely take longer than the first half of this year.

You already kind of played the ball here. We are working on two new transformer models, or supermodels, that will be available on Scholar and Organization plans, as well as for members. And I'm happy to share with you that Dutch Demeter 1 will arrive next week. It is (I see photos being taken of this amazing image; you will see the image for the model there as well and can download it there), it is, as the name suggests, for Dutch, and about 46% more accurate than our current flagship model, the Text Titan 1 that we already heard about before.
And it was trained on 70,000 pages and 80 million words. Yeah, but this is not the only one we are currently doing. We are starting to train an improved version of our Text Titan 1 supermodel, based on 400,000 pages and roughly 85 million words. So that's quite a lot. And the Text Titan 2 will, of course, stay an all-round model supporting about 17 languages. I want to encourage you to stay for the closing session on Friday afternoon for a little surprise.

We currently have a very strict hierarchy for organizing your documents: there are collections, documents that have pages in them, and that's basically it. What we want to establish is a more flexible way of organizing your files by enabling a deeper folder structure. Each document holding pages or images can simply be seen as a folder with pages in it, and following that thought, we want to enable you to create as many subfolders in a collection as you need for working on your project. Collections and pages will still exist, but you will have the option to create multiple levels of folders in between. While we are already working on the organization of files, we will also take a closer look at our permission system. There might be potential to improve it and make it easier to understand which role allows a user to do what, because sometimes it's confusing and not that obvious. And we might also enable sharing not only on the collection level, but on the folder level as well. Organizing documents and pages is currently not always intuitive, easy, or even possible, which is why we will seize this opportunity to improve on moving pages and documents around freely. Finally, selecting many pages is quite cumbersome at the moment: if you want to select, for example, the first 20 pages in your document, you actually have to click 20 times. Enabling some sort of batch selection of pages for further processing, like with holding Shift, is a highly requested feature and will actually relieve your index finger in the future. In the past, we mainly focused on the card view to show the documents and pages in your collection, but the list or table view is useful as well and shouldn't be neglected. The harmonization of the feature range between card and list view is therefore long overdue and will be tackled, so you can actually use the list view properly.

Yeah, metadata, generally speaking, is an area of improvement in the app today. You can set some metadata for documents, but not very detailed ones, and certainly no custom metadata. There is also some metadata available on page level, showing basic information about the file, but there is no manual input or customization available at all. A somewhat standardized and extended set of metadata will also be the foundation for an extensive search, though, which is why we need to improve in this area. And metadata does not only concern documents and pages, to be precise: when training or publishing a new model, for example, only a minimal set of information is available to others who might want to use that model and would actually benefit from more detailed information. We currently don't always surface information that is available and could be useful. Strictly put, this is not related to metadata that much, but speaking of models: we currently don't show the training data at an appropriate size, so that you could actually tell whether the writing is similar to what you have.
So we could also improve that, and additionally show the validation data set as well. But back to metadata. We have been discussing internally for some time how we can make better use of metadata, so that it has added value for collection owners and other users alike, as well as for future functionality like search or assistance in finding a suitable model. The discussions are still going on; I cannot say anything specific at the moment, but we will be addressing metadata this year to make better use of it. I think you might have already heard something about that two years ago. I'm not sure; I wasn't here then, and I'm not sure exactly what was communicated.

Yeah, oftentimes it's a team effort to make sense of historical documents, and many of you might already be working on projects together in Transkribus. There are discussions happening, questions being asked about how to read words, and deciphering being done together. We want to encourage and promote collaboration within the platform beyond sharing collections, so you are able to work together efficiently and effectively. I think collaborating within web applications has almost become a sort of must-have, and we cannot and do not want to miss out on this opportunity anymore. We are very curious to find out how collaboration around Transkribus works today and where we can best offer support within the platform. We will be conducting interviews with users this year to discover more details, so stay tuned. On a side note, collaborating with all of you to develop Transkribus is very important to us too; this is our project together, so to say, and we plan to include your specific topics in a more focused way and to test ideas before we actually implement them, to best support your needs.

Yes, we will start with a specific collaboration feature right away that we are, at least at the moment, very sure will be useful to you, so we don't need to interview first: we think this will be useful. For working together on your material, we will introduce comments on page level. Comments on a page allow you to ask your peers questions or answer them, and to clarify whatever is needed to finalize a transcription. Sometimes a single answer is not enough and a slightly longer discussion is needed to settle on a way forward, which is why each comment becomes a thread as soon as someone replies. A thread holds a discussion together and ensures that several discussions can take place simultaneously without mixing them up. To keep track of settled discussions, you can tick off threads as resolved and keep an overview of the open ones on the page. Open discussions will also be indicated when you are viewing all pages in your document, so you see at a glance where there are still discussions going on that need to be cleared up. Later on, we will iterate on this feature, because just commenting is not enough. We will let you set an avatar to show up with your comments; that's probably not the most important feature, but everyone likes to set an avatar for themselves. Probably more important: you will be able to mention users you shared your collection with and whom you would like to include in the discussion, and they will be notified about the mention. Comments serve an additional purpose, at least for us: we hope that your engagement with comments leads to insights on how you want or need to collaborate within Transkribus, and it will help us shape collaboration functionality in the future.
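Purely to picture the commenting model described here (threads per page, replies, a resolved flag, and mentions), a small data sketch follows. It is an illustrative simplification, not the actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Comment:
    author: str
    text: str
    mentions: list[str] = field(default_factory=list)  # users notified about the mention

@dataclass
class Thread:
    page_id: int
    comments: list[Comment] = field(default_factory=list)
    resolved: bool = False

    def reply(self, comment: Comment) -> None:
        """A comment becomes a thread as soon as someone replies."""
        self.comments.append(comment)

    def resolve(self) -> None:
        """Tick off a settled discussion; open threads stay indicated on the page."""
        self.resolved = True

# Example: a question on page 42, a reply mentioning the asker, then resolution.
thread = Thread(page_id=42, comments=[Comment("anna", "Is this word 'Wien'?")])
thread.reply(Comment("ben", "Yes, compare line 3.", mentions=["anna"]))
thread.resolve()
```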
Of course, that will be in combination with talking to you. So, your avatar will be set in the user settings, which currently do not exist but which we will introduce, and the avatar is certainly not the biggest advantage of introducing user settings. We will eliminate some issues with this feature, such as saving your preferred view on documents and pages, card view or list view, or your settings for the editor. Currently, these are saved in a cookie and get lost as soon as you clear your cookies or switch to another device, which can be annoying, especially if you have set up some special things in the editor, for example.

Virtual keyboards haven't had an update yet and are one of the last topics we need to tackle in the editor. We are thinking about saving certain configurations of a virtual keyboard in your user settings, so that you can simply select them and use them in a collection instead of setting them up again and again for each collection. Especially if they are more complicated, this should save you some time. Yeah, we are also thinking about applying this concept to tags, as I think was already mentioned before, so that you can have multiple tag sets available to choose from in your collection and don't have to set up the same tags again and again. Especially with named entity recognition and the increased use of tags in general, it might come in handy to just store certain configurations.

Yeah, I already briefly mentioned notifications before. We are planning to send notifications not only by email, like we do now, but also in-app, and to let you choose for which event you would like to receive a notification: via email, in-app, or both. At the moment we only send emails for finished training jobs, export jobs, and when you get added to or removed from a collection; there might be some more I'm not aware of. But we would like to use notifications, of course, for new comments and when you are mentioned in a comment, and we are also exploring additional options, such as informing you about a finished recognition job that took a certain amount of time. And to not spam you with notifications, each type of notification can of course be turned on and off. I think that one was mentioned last year or two years ago. Also by me, yes. I'm just putting the blame on you. I'm taking the bullets.

Yeah, one final goal we want to achieve this year is to leverage the potential of you, the community, more actively. Countless times (I think just a moment ago in the break) we got questions like: is there maybe someone else you know of who is working with a particular language, or a language in a certain time period, or a certain type of material? More often than not we can actually help with that, but we certainly don't know the full array of people working on something. We want to provide a platform where you can actually engage with each other's projects and work, find people working on similar projects, and in general be able to share your knowledge, experience, and the good practices you have established, so you don't have to go the extra mile via our support team or other people in the company. Yeah, for that we plan to introduce, first of all, a forum based on Discourse, if you know this software, to serve as a platform for exchanging ideas and information, and we're very curious what will happen there.

Yeah, to summarize: in 2024, we want to complete the move to the web and ensure reliability and stability. Two new supermodels are already on the way, and we will focus on supporting collaboration and the community.
We will introduce named entity recognition and improve tagging, and we will optimize the organization of files for more flexibility. I want to encourage you to try out the web version and the new editor as soon as you get back home, and to let us know how it went. I will be around, so don't hesitate to approach me if you have any questions, feedback, or other ideas. I think an exciting future of working with historical documents lies ahead of us. And with that, back to you. Thank you very much.

Yeah, you've seen a lot of really exciting things coming up. Now let's come to the things that we will maybe not be able to present next time, but that we're talking about anyway; we're really excited about them. These are more general topics, so we want to enlarge our view a little and think about some additional topics we will be working on.

One thing: if you're working with Transkribus, you might have experienced that workflows are basically based on page level. So if your paragraph starts on one page and continues on the next, you might in some occasions have a hard time working on it. We're therefore thinking about moving to a more document-based workflow, where you can really work on documents and on entire collections of documents.

Something that we've heard before, and that came up as a question earlier as well: what happens when you're working with a lot of tools? We're adding more tools, more recognition steps, and more trainings, so you might need something to manage those tools, and here we are thinking about the workflow builder. As you add all those tools, complexity arises, and as we've heard before, we need an interface to manage that. With the workflow builder, you could configure your workflow and your tools in a way that lets you plug them together and really get a great result out by pouring in your material.

Another thing that is really important to us is making everything available via API. We have the Metagrapho API, as we've heard before, which is basically an API where you can put in an image and get text back. But you might want to do more, so we're really trying to focus on making tools like the field models, the table models, and also the recognition available via API, and then eventually also on-prem.

Then, in relation to Sites, something very exciting that we're thinking of is basically building a global platform that connects Sites, as there are already a number of Sites available. So, a platform to search history: one search bar (as you see, the design here is really simple) to search the past, where you can type in your keyword, find the records, and really inspect the sources. The availability of those resources is really important, and making those sources available online is basically what Sites is here for.

And then we're also trying to expand a little. We are retiring the Expert Client on one hand, but on the other hand you might still need a tool to work offline, because you don't always have internet available. So we're also thinking about taking the web app and building on it to have a downloadable desktop client that might also work offline. On the other hand, we might also need something for mobile: as we've heard before, the editor will not work perfectly on mobile, so please don't go drawing layouts on a smartphone, but we might need an app.
There is already the DocScan app, available in connection with the ScanTent, but we're also thinking about providing a more elaborate app, a Transkribus app, with our beloved Volpi as the app icon.

And now let's come to the Q&A. As Kerstin said before, I'm also around in the breaks, afterwards, and at the dinner. I hope everybody's coming to the dinner this evening; then we can discuss these things and really get into discussions, because you will not agree on everything every time, and that's also good, because those discussions encourage us to think about things, to reiterate, and then to make it better. Yeah, so if you have any questions, just let us know and we can discuss them. My colleague is going around with the microphone, and then we can handle those questions.

Yes, thank you so much. Jeremy from Canada. I just wanted to know: you spoke about the app, but what about the Expert Client platform? Are you planning to... when is it?

That is exactly what I was talking about, okay. So we really think that the future of Transkribus is in the web, and we really want to consolidate in the web. Basically, what we're trying to do is: first, as we heard, focus on reliability and stability, and move those features from the Expert Client that are really used, because there are so many features. There's a saying that nobody in the team knows every feature in the Expert Client, and I think that's true. So there are so many features that we are also digging into the data to see: if there's a feature that nobody uses, why should we bring it to the web app? But we really want to bring every feature that is used, and that is of use and value to users, to the web app. Once that move is completed, we will obviously also think about removing the Expert Client, or at least, as the Expert Client is open source, leaving it as it is. It might be that the APIs it uses will not be supported anymore, and then some things might break in the long run. But for now we're really focusing on moving everything into one platform, because maintaining two is, as you might imagine, quite an effort. Do you want to add something, Kerstin? Okay.

Thank you. I was wondering whether you are also restructuring or tailoring the export options for users who are not using the API functionality, simply because there are many in the Expert Client, and some are satisfactory, some are not. And sometimes you get a lot of emails and sometimes you just get one email, and it's sort of unpredictable. I was wondering whether this can be part of this year's roadmap too.

Definitely. Maybe you want to add something, because that's actually something we're really working on at the moment. We did move at least the most used exports to the web app, I think last week; only the TEI export is still missing. We did not change anything about the behavior or what it actually does; we just moved the current functionality, also with a focus on seeing what people actually do with the exports. We do have some data around that, but we're curious how it will be used in the web and what feedback we will get on how to improve the export, because you want to get your data out of the platform, of course. And yeah, the emails are annoying sometimes, that's true.

Hello, I'm Magali Alegre from Peru.
I was wondering if you could comment, in relation to collaboration, whether you have experience with, or have improved, the use of this functionality for historical archives and for involving volunteers in the supervision or transcription of documents. Thank you.

Yeah. Do you want to begin? I can take it, that's fine. First, it's really nice that you're coming here from Peru; I can imagine it's a really long journey, I think one of the longest, so really kudos. Coming back to your question: collaboration, as I said, will be a focus topic. With the comments, we're taking the first step, but we know there are a lot of crowdsourcing projects going on. What we are currently thinking of (though these are the things that, thank you, I think we promised last time and didn't deliver) is project management features: features you can use to organize your crowdsourcing projects, for instance, so that you can set up tasks and say, hey, this person could work on those pages and correct them, and design those tasks. That is something we're actively thinking about and really discussing. But as you've seen, the list is rather long; we need to prioritize and work on features one at a time, and we're really trying to do so. It is something that comes up on the roadmap a little more regularly than the other things we've presented today, so those features will hopefully also come in the near future.

Thank you very much for your insights. I have one short question and one slightly longer question. One would be: do you have a schedule for when the workflow builder will be online, will be live? Because I think one of my colleagues is really interested in a combined table analysis and named entity recognition.

Here the answer is rather easy, and that's also why we have not put it on the roadmap: we don't know yet. We know that we want to work on it, hopefully this year, but we're always ambitious, and as we've seen, there's always a prioritization that you need to make. We will try, but of course.

Okay, I'll keep my fingers crossed. Time will tell. Okay, thank you. Another one. I'm coming from a digitization project; we have our own databases, our own data models. I was wondering, when you're talking about an API for named entity recognition: will it be possible for us to use our own knowledge from our own databases, from material that isn't published yet, and use an API to train a named entity recognition model? How can I imagine the API usage?

Perfect, I understand what you want to ask. Maybe, Michael, you can correct me then. That is a really valuable point, to input your own data, for instance, to retrain those models. I think we discussed that just last week, so it will definitely be something that we will also work on. How it will be implemented, I can't tell you; I'm not involved deeply enough to say. But maybe you want to say something, Michael or Andy? We can just pass the microphone around, but that's definitely something that we could work on. Also for simpler applications, like language models: inputting your data into the language model of a text recognition model, at least, could be something that is rather easy, since there is already a language model that is trained during training.
Once you train a PyLaia model, altering that language model and adding in more information, especially for names, for instance, might be something that is also on the plan. But correct me if I'm wrong, or add something if you want. And we also have a mic here, so we don't need to run around too much.

Cool, thank you very much. Yeah, actually, training named entity recognition will involve inputting user-generated data; that's one of the core features, I would say, that is required for this. The technical aspects we haven't really talked about yet, so what this will look like in the user interface, or what formats will be supported, et cetera, is open. But you will definitely be able to ingest lists, basically, that you can train your named entity recognition models on. That's also what we have done in the project work that we presented during the first keynote. So yeah, a definite yes, I would say, but the details are still a bit up in the air; that's where it gets tricky. I don't know, Michel, do you want to add anything? Okay, great.

More questions? Here in the front we have two. If there are any questions we cannot get to, you need to come to the dinner and we can handle them there. So thank you.

For those of us working with a lot of documents, it would be so helpful to have some way of tagging at the document level, especially in terms of workflow, the way that you have, at least in the Expert Client, the ability to tag pages in terms of what level they've been recognized and proofed and final-proofed, et cetera. The same seems like it would be really helpful on the document level, and being able to filter your collections based on various kinds of tagging would of course be super helpful too.

Totally agree. You might see it here on the table: I already tried to put that into the model. As we've heard from my colleague Kerstin, metadata is really important, for documents and for pages; we definitely need that. Then you can use that metadata for your workflow, because you know how you want to structure it, and that's why we really want to give it into your hands, so you can set it up as you like and then use it, for instance, to say: only documents that are done go into my system, not the documents that are still in process, or a draft, or whatever; you can define it. Currently there is the page status, but it's predefined and you cannot modify it. Making it modifiable might be something that is also very useful; we've been discussing that for a long time as well, and as we work on metadata, that's also something we'll work on. Next one.

One question here, and then everybody's hungry, I guess. Thank you. I think that me and some other people are holding on to the Expert Client for reasons like the speed of the work, and I'm not speaking about the bugs in the online client, but the speed, which is maybe attributable to the keyboard shortcuts. You know how you can use the keyboard to be much quicker? Are you going to make the web client also better on big screens, not only on the small screens, by implementing more shortcuts, by really making it suitable for keyboard work?

I think you, Kerstin, worked a little bit on the keyboard shortcuts concept already. Yes, you can look at, I think, the menu where you can see all shortcuts that we currently support, in the editor at least. We do support, I think, most of the actions; there might be some missing that you can point out to us.
And it definitely makes sense to use even more shortcuts to work with, I don't know, moving documents around; there are certainly actions that would benefit from keyboard input. So yeah, definitely something we can improve and think about. Yeah, as said, the keyboard shortcuts are something on the list as well. As you see, the list gets longer and longer. And I've seen Kerstin is also taking down notes, which is really great. Or if she's not taking down notes, then don't give any feedback, please. Yeah, so we're writing that down as well. And keyboard shortcuts will also be needed outside of the editor, so that's a concept in general we will be working on. Yeah, maybe one more.

I had a couple of questions about the supermodels, the Text Titan 2 in particular. First of all, what network architecture are you going to use? Is it TrOCR, or...?

Yeah, the same as Titan 1, for now.

One option would be to publish 17 language-specific supermodels instead of one multilingual model. Have you done a systematic evaluation of those two options on single-language evaluation sets?

Should I answer that question? Yeah, then we can go to the next one. There has been some work, and that's also why we are launching the Dutch Demeter first: to evaluate whether those models actually perform better. For language-specific models like the Dutch Demeter, up to now there is no clear result from that evaluation, so we cannot really say whether taking TrOCR to improve recognition for one single language beats doing it for multiple languages at once. At least what we have seen is that, basically, the more data you add, the better it gets. That's why we are also training this larger model; to give a rough estimate, it's about two to three times the size of the current Titan, so it's a lot bigger. And we have seen with the Dutch Titan, with the Dutch Demeter, sorry, that for a specific language, if you add ground truth just for that language, it really improves. But whether there is the same effect if you add exactly the same amount along with other languages to a bigger model, that is an evaluation we have not made. That's why we are currently training it, and then we can evaluate.

Yes, I'm interested, because TrOCR decodes on a subword token level, and my experience is that it tends to mix languages, mix language tokens, sort of Germanify the Swedish language, for instance. So I'm interested in whether the multilingual setting actually improves the accuracy or diminishes it.

Yeah, totally agree. We will also be interested in that, and then we can probably evaluate. But currently there is no concrete evaluation of whether the language-specific model (still underway, launching next Thursday) will really be much better. One thing that we can say is that the tokenization during pre-training plays a big role, and what we are doing with the Titan models is, technically speaking, fine-tuning. So what can be observed is that languages that haven't been seen much in pre-training won't get that good a performance in a mixed multi-language model. The tendency we're seeing is that the very large models probably work well, but they may not work equally well for all the languages you put into the model. And the other thing is also keeping our resources in mind: doing specific models will enable us to cover some languages or historical regions with dedicated models, without needing to experiment too much with even larger amounts of data.
So we're also trying to serve separate communities separately; it's part of the strategy of improving things in a more sustainable way, basically. Those are the considerations that we can add to this.

Just a quick addition to that: I was wondering if you have landed on some kind of editorial rule for contributing ground truth. Every model is tailored to its project, and if you are just combining the ground truth that is available, which might have expanded abbreviations, non-expanded abbreviations, normalization or no normalization, in one big Titan model, I am facing the problem that the effects are kind of random. Which is exactly the same problem we have with large language models: we don't know what was thrown into the cauldron, and we won't know what we can expect to get out of it.

I'll answer that, then; I think we have a question online as well, and then we need to watch our timing. Funny story: if I go back into our task management tool for development, there is actually a task that has been there for about nine or ten months, exactly for that. That task then developed into the task that we need to work on document metadata, because this is basically just some metadata that you need to add to the documents; then we know, hey, that is actually tagged as ground truth for that model, and we can just set it. So, long story short, metadata is relevant here again: then we can simply filter out that ground truth and put it into the model. Once we have that, you can start labeling your material to be integrated into such a model.

And then let's have the question from online as well. From the chat arose the question whether the comments are only for the editor, or whether they will maybe be published on Sites too. Maybe Kerstin can say a few words about this.

Currently, we're planning to use them in the editor, for actually working together. I mean, it would be possible to use them on Sites, but that brings up a whole new issue, because then we would have to actually moderate those comments, since they would be publicly available, and we don't have a solution for that yet. A little can of worms. With the Digital Services Act, you as a platform need to moderate those comments, and with our current resources it would be a huge stretch to really do that. So, technically it might make sense, and in the long run it probably also makes sense to have an open platform where you can comment on those sources, but for now that's something we will probably shift a little towards the end of the list.