All right, ladies and gentlemen, good morning and welcome to TUC 2022. As you know, we have prepared an intense two days for you, with about 50 speakers and 10 panels, plus some social events, because we also want you to talk to each other and exchange your experiences. I'm very happy to say that the Rector of the University of Innsbruck is here today to welcome you to TUC 2022. I don't want to waste too many words right now, so let's start with Tilmann Märk.
Yes, thank you very much for this kind introduction. Dear friends, Transkribus users, fellow READ-COOP members, and in particular dear colleagues, welcome to this 2022 edition of the Transkribus User Conference. It is a great pleasure to be here today, to have the opportunity to speak to you, and to speak for the University of Innsbruck. It is also a great pleasure for us to co-host this wonderful, long-awaited event after a break forced by the coronavirus pandemic. Transkribus was developed within the READ research project, funded by the European Commission under Horizon 2020. To our great pleasure, it was coordinated by the University of Innsbruck, in particular by our colleague Günter Mühlberger. READ stands for Recognition and Enrichment of Archival Documents, and leading research groups from all over Europe took part in it from the beginning. Due to the great international interest in the platform, the project was eventually, and quite successfully, converted into a European cooperative society. This was a first for us as well, even though we have quite a lot of experience in founding companies.
In the summer of 2019, the founding meeting of the cooperative, with its highly innovative legal form of a European cooperative society, took place right here in this building, and I think this is one of the reasons why your conference has now come to Innsbruck. It was the starting point for a sustainable and, importantly, community-driven economic model that has been the basis for the stable existence of the Transkribus platform for more than three years now. Today, READ-COOP has 125 members, and the Transkribus platform, READ-COOP's main success, has almost 100,000 users, which is very impressive. I'm proud to say that Transkribus has had a tremendous impact since its start. Speaking as a scientist, I believe it has the potential to change the way we think about historical documents; in fact, it has already changed it, both in research and beyond. The great thing about Transkribus is that it is not only a tool for scholars and researchers but is also open to the public. Anyone who is interested can use it to decipher historical documents, and everyone can help to retrieve historical sources and make them accessible for the first time. I believe that Transkribus, and especially READ-COOP, has the potential to democratize access to knowledge about the past, and I'm confident that it will continue to have a positive impact on the world for years to come. As far as your conference is concerned, I was told that this is the event of the Transkribus year, where users of all kinds, from all fields and backgrounds and from all over the world, come together, and I've just been told that more than 300 people are also joining us via the Internet.
For us in academia, it is also important to have a forum from time to time where users can share their experiences, learn from each other, and discuss the latest technological and project-related developments. So I would like to take this opportunity to thank all Transkribus users, and in particular its makers, for their passion and commitment to Transkribus and its community. In particular, I would also like to thank Dr. Günter Mühlberger, in a way the initiator, one of the founders and the spiritus rector of this endeavor, and to express my admiration for him. They have made... Bravo! Very, very impressive; this is truly a great achievement. As you all know, we are also the university where the quantum computer was conceived and developed, and it will now be produced by a company, so READ-COOP is in good company. You, the Transkribus community, have made something a reality that not many universities can showcase. You have put into practice a perfect example of what academia fulfilling its third mission can look like, and, unlike the quantum computer example I just mentioned, you have done so in the humanities, not in the natural and technical sciences that are traditionally associated with cutting-edge technology. The traditional missions of universities are, of course, research and teaching; these are their main goals. However, we live in a time of growing social and economic challenges, and universities are increasingly called upon to contribute to society and the economy not only indirectly, through well-educated students, but also directly. That is what we nowadays call the third mission, and I think it consists of two goals: first, applying scholarly findings to a wide range of societal challenges, and second, transferring technologies and innovations through collaborations with industry.
In a sense, Transkribus has been doing both of these things in the humanities, at a time when they seem to have lost much of their social significance. So much the better. In closing, I am confident that the future of Transkribus is bright, and I look forward to seeing great things from the community in the years to come. Thank you, and all the best for your conference.
Right, and now we move on to Günter Mühlberger, Chairman of the Board of READ-COOP and, as we just heard, the father of Transkribus.
Dear friends, dear members, dear co-op, dear colleagues, dear Rector Märk. It was almost 10 years ago, on 1 January 2013, that the journey began that has brought us together here in Innsbruck. Today, nearly 150 people, and many more online, from more than 20 countries are gathered here: from Germany, Italy and Finland, and even from Argentina, Canada, the US and the Emirates. And all of you are dedicated to the research of historical documents. Wow. If someone had told me back then that I would be speaking here today as the Chairman of a European cooperative whose most important product, Transkribus, is used all over the world, I would certainly not have believed it. So it was on 1 January 2013 that the EU project tranScriptorium started, led by Joan Andreu Sánchez and Enrique Vidal from the Technical University of Valencia. There, we developed the idea and the first prototype of Transkribus. Three years later, on 1 January 2016, the READ project started, now led by Innsbruck, by my group. It was dedicated to improving the software, but also to making the technology directly accessible to research and academia, to libraries and archives, and, last but not least, to family researchers. We received 8 million euros, of which about one third went to research, one third to building the Transkribus platform, and one third to so-called network activities.
In retrospect, that was an extraordinary stroke of luck and an opportunity you don't get often in life. A lot has happened in these 10 years. In the first few years, our work certainly attracted some interest, but skepticism and doubt clearly dominated. In fact, I often had the impression that our audience was more impressed by our confidence that we would overcome the technical challenges than actually convinced by the results. And just as we always saw the glass as half full, our audience often saw only the flaws and judged the effort to be disproportionate to the benefits. But several factors came to our rescue. First, we can credit ourselves with having put our cards on the table. From the very beginning, Transkribus allowed users to try it out on their own computers, to upload their own documents, and to test the latest tools, mainly baseline detection and handwritten text recognition. Everyone who made the effort could thus check for themselves what the state of the art was, and still is. Second, we are proud that very early on we gave users the opportunity to train their own models, and that we made the computing time this requires, which can be considerable, as we all know, available free of charge, even after Transkribus had been commercialized. Third and foremost, there were, and still are, users who believed in the possibilities of the technology as we do and worked on their models carefully and systematically.
To mention just a few names: Maria Kallio from the National Archives of Finland, Dirk Alvermann from the University Archive Greifswald, Tobias Hodel from the University of Bern, Achim Rabus from the University of Freiburg, Annemieke Romein from the Royal Academy of Sciences in the Netherlands, Pauline van den Heuvel and Jirsi Reinders from the City Archives of Amsterdam, Ingrid Bayer from the National Library of Norway, Robert Klugseider from the Austrian Academy of Sciences, Dominique Deslandres and Maxime Gohier from the universities of Montréal and Québec, and many, many others; I apologize to all those I didn't mention. You spent thousands of hours of your working time on careful transcription and helped improve the software with your feedback and your work. You were and are our most important ambassadors, and we are very, very grateful for your trust and confidence. Starting in 2017 and 2018, the technology underlying Transkribus reached a level of maturity that made its practical use possible for everyone. In the years that followed, it became quite widespread and is now part of the standard offering of tech companies such as Google, Microsoft and Amazon. So how will Transkribus continue to evolve? I would like to focus on three aspects. In terms of handwriting recognition technology itself, we are clearly at the end of an innovation cycle. Of course, further optimizations and improvements will enable us to train models faster with less training data, or simply to fine-tune large networks, and the integration of so-called transformers will also bring some progress. But a real revolution, at least from my point of view, would need to come from a very different direction, one that is still unclear to me. The situation is somewhat different for layout recognition.
We humans take it for granted not only that we can read a text, but also that the layout of a document gives us very important clues about its content and function. It goes without saying that we recognize a heading as a heading and know that it structures a text and contains a kind of summary of the next section, and so on. Or that a table cell is defined by both its vertical and its horizontal position, so that, for example, one column contains family names and the cells next to it are associated with those family names. Teaching the computer this knowledge, and enabling it to map complex document structures, will be a significant step forward and will allow us to structure collections and documents in a much more fine-grained way than has been possible until today. As you can imagine, our technical colleagues on the Transkribus team are working hard on these improvements, and we hope that we will all benefit from them soon. Finally, I would like to talk about another aspect that is particularly close to our hearts: the searchability and findability of digitized documents. Many archives have been busy in recent years and have digitized large quantities, 5, 10, 50 million documents. In addition, there are still tens of thousands of microfilms that can be digitized at relatively low cost. The large models in Transkribus often show very good results right from the start, without any training. So everything is in place, and we hope that in the coming years we will indeed be able to process a significant portion of these many millions of digitized files. A full-text search can then be implemented quite quickly, and our Read&Search product is also an attempt to shorten the path from the digitized and recognized document to the audience as much as possible. But as great as a full-text search over many millions of pages in seconds can be, we are only scratching the surface.
In my opinion, a search engine that truly specializes in historical documents does not yet exist. Such a search engine must be able to do more than build a full-text index. It must know how historical documents are structured; what the essential features of minutes, reports, letters, church records, cadastral documents and other document types are; how places, people and political entities are to be understood; and how these documents were filed and recorded in the archives. Or, to put it another way, the technology we use today for full-text search in archives is essentially at the level of the late 1990s, when Google started making the Internet searchable. But if Google had stayed at that level, no one would be using that search engine today. I don't have a concept for such a Google of the past, but it is clear that one has to think beyond the boundaries of the individual archive. A collaborative approach will have a much better chance of success in the long run. In this sense, Transkribus and READ-COOP are, I hope, very first steps that will be followed by many more. I wish you all an interesting and fruitful conference, and I especially thank the co-op team for their commitment and creativity in preparing this conference. My rector had to leave because the next opening is taking place up in the mountains and he had to rush there, so he apologizes for that. I also want to thank my university for the support we have received, which made it possible to try out these things. Thank you.
Thank you very much, Günter, for these words and for describing how it all came together. As you all know, we have a very international audience today, so let's take the first step across the Austrian border to Italy and give the floor to Andy Stauder, our managing director.
Yes, a very warm welcome from me as well to everybody who has come here today and who is following us online.
I would like to get a bit more concrete and look back with you at what we have been up to as a cooperative over the past three years, what Transkribus has meant to us, and what it has been able to achieve with all of you. First, a bit of history. As Günter already said, we came out of two EU-funded projects. Work on this technology started in 2013 in the tranScriptorium project. Then came the READ project, which is also the namesake of the cooperative, and that is why it's called READ-COOP. In 2019, in exactly this building, we founded this wonderful cooperative, which has members all around the world: we're more than 125 now, spread across 27 countries. This is a photo from that memorable day, I think it was 1 July 2019, and here you can see the founding members and some representatives of the University of Innsbruck, which is also a founding member. Here you can see some logos and figures about the cooperative: we are 126 today, 77 of our members are institutional members, so we have a lot of archives, universities, libraries and even museums all around the world, and 49 of us are private members.
Seeing this on a map, it is really impressive how spread out across the globe we already are. I always like to say we are spread between the poles: our northernmost member is from Iceland and our southernmost member is from New Zealand, so you basically can't get more spread out than this. This slide shows our team, the A team, or the AI team, wink wink. There are already a lot of us: we are more than 30 people today from four countries, so we're also quite international, and people keep joining us from different countries. Many of us are still missing from this slide; we can't even keep up with adding the photos to our website, which is really beautiful. I think it's wonderful to see that the same cooperative spirit predominates in the team as in the cooperative as a whole. Let's talk a little about Transkribus, which, as Günter already alluded to, is more than just text recognition. What have we been doing for the last three years? We have been establishing an ecosystem, not just one single tool. As you may know, Transkribus has a lot of what you could call sibling or sister tools that enable many different things in working with historical documents. We are trying to facilitate the whole process: getting an image, recognizing the text it contains, annotating it, training models, improving the text recognition, and then publishing the whole thing on the web, on your intranet, or wherever you need to publish it and share it with others; the cooperative is, after all, about sharing too. These are the most important tasks you can accomplish with Transkribus and its related tools: you can transcribe text, obviously, it's in the name; you can train AI models; you can recognize text; you can recognize layout increasingly well, and we'll talk about this in more detail later; you can enrich the text, with tagging for example; you can collaborate, since collaboration and sharing go hand in hand; you can export your documents; and you can search them. As Günter already said, search is very important. I think we live in the age of search; it is one of the most important technologies of perhaps the last 100 years, so finding things is really important.
A couple of numbers: there are 94,000 registered users; we have recognized more than 20 million pages with our technology; more than 40 million pages have been uploaded to the platform; and 16,500 individual models have been trained by you, the community. This is just a staggering number: you have literally created more than 10,000 models. We have active users in more than 150 countries, so we're basically as big as the UN; we're the UN of text recognition, you could say. We have 40,000 monthly website visitors and 30 Read&Search instances online already, the search websites you can set up to seamlessly publish what you've been working on in Transkribus, and more than a thousand ScanTents are in use all over the world.
Let's get even more concrete: what have we been doing within the company? We have been improving the infrastructure. We started this process almost two years ago, invested heavily, and tripled our computing power. Our storage is now almost up to a petabyte: we have 800 terabytes. It is obviously not full, but we have to provide for the growth of incoming data. We are also thinking a lot about the architecture of the tools and how they work at the technical level, and we'll talk about this in more detail later. And we have been improving redundancy and fault tolerance, because with so many active users and so much data, managing all of this is obviously a technological challenge, so we are very much on that as well. We have further improved the ScanTent and tried to bring it to as many people around the world as we possibly could, because this is the very first step: first you need an image. For many institutions, especially smaller ones, getting an image in the first place is a big problem, because smart technology that can read the text is of no use to you if you don't have an image. At the other end of the chain, or of the workflow, there is our search and publication solution, where you can publish your digital editions of manuscripts, for example, or make your notarial acts available to the general public, and so on; we have 30 of these already. We have also developed a processing API to facilitate large-scale processing of material, because our users include not just smaller institutions and individual members but also larger institutions and commercial providers: everybody who is helping to decipher the documents hidden in all the archives, newspaper collections and libraries around the world. And we have been working a little more quietly on Transkribus on-prem, which we will present in some more detail, though not too much because it's still a bit early. We would like to point it out here because, as an institution, you often don't have the option of publishing your documents, or even of moving the images to another country or outside your premises. That's why we would like to enable these institutions to process the millions and millions of pages of very important historical documents they hold on-premise, which is something that even Google and Amazon aren't yet doing properly, truly on-premise. So that's been the first three years in a nutshell, but what are we up to next?
As I said, we are overhauling the technological basis and working on the foundation of Transkribus, so there may be considerable changes coming in the next months and years. We want to introduce more social elements, connecting the community even more strongly, also at a technological level, and we want to strengthen our focus on content and search even further. You will hear about this at 10:30 in our next-gen presentation, so don't miss it if you want to know more about these things and where Transkribus is heading in the coming years. And what's really next here in this room? First we have the keynote by Professor Glauner, whom Matthias will announce in a minute. That is followed by a panel discussion on the question "Is AI the key to history?", where you will also get a chance to put your questions to experts in their fields. After that, as I said, comes the presentation of the next generation of Transkribus. And this is what we chose as the motto of this conference, and of the cooperative as a whole: "Unlocking our written past together". This is what we would like to do with you. Thank you.