Good morning, everybody. Thanks a lot for coming; I'm happy to see you, and I'm very happy to welcome our Vice Rector for Research here at the University of Innsbruck, Professor Ulrike Tanzer, who will open this conference.

Ladies and gentlemen, on behalf of the rectorate of the University of Innsbruck, it is my great pleasure to welcome you here in Tyrol. I hope everyone had a pleasant journey and that some of you have already had the opportunity to spend some time in our city or its spectacular surroundings, and, thanks to the organizers, today brings the first brilliant weather. Today's beautiful venue is the so-called Kaiser Leopold Saal, named after one of the founders of our university, Emperor Leopold I of Habsburg. We find ourselves in the former Jesuit College and therefore in one of the oldest buildings of our university, which was founded 351 years ago. I don't always know our exact age, but we celebrated our 350th anniversary last year, so it's still quite simple to tell you how old we are. We are proud of more than 350 years of inspiring research and teaching, 350 years of visions, ideas and continuous advancement in the heart of the Alps and in the heart of Europe. However, we also used the occasion of our anniversary to look into the future, to rethink and reshape the role of research and universities. Like our university, today's conference looks into the past and into the future. Fourteen partners from all over Europe carried out research within the consortium of the READ project, which was funded by the European Commission with 8 million euros from 2016 to 2019. The very ambitious goal of the project was nothing less than revolutionizing access to handwritten documents. One of the most notable characteristics of this project was not only the outstanding quality of its basic research but also the fact that a service platform called Transkribus was developed. The success of this effort exceeded all expectations.
Recognition rates are much better than hoped for. The platform is now fully functional and is actually used by thousands of users, and some of them are here today. As I said, past, present and future are linked by this project and its outcomes. By modern means of digitization, automated recognition, transcription and search routines, users engaged in the transcription of printed or handwritten documents are now able to carry out research on historical texts on a completely different level. The University of Innsbruck was the coordinator of the READ project. Due to its tremendous success, the idea for a spin-off emerged. Together with the project team, the university engaged in this experiment and founded a so-called European Cooperative Society. It is great to see that more than 30 institutions have already joined, and we are very optimistic that this success story will go on. Today's conference, with 150 attendees from 26 countries, is impressive proof of the quality and relevance of Transkribus. There are people from archives, libraries, universities and companies attending this conference; it is obvious how important Transkribus is for many different user groups. I thank you all for coming to Innsbruck and for sharing your thoughts, scientific insights and findings on this very important topic. In particular, my thanks go to the organizers headed by Dr. Günter Mühlberger from our university, and my special thanks also go to Johanna Walcher and Tamara Terbol; I would say we give them a special round of applause for all the work they have done to organize this conference. Thank you and all your project partners for your enormous efforts in the past years within the READ project and for making this conference in Innsbruck possible. I hope you all enjoy your stay in Innsbruck, and I wish you an inspiring and successful meeting.

Yes, thank you, Ulrike. This was very heartwarming. It's also a great pleasure for me to say thank you.
First of all, I have to say thank you to you for coming to Innsbruck. Some of you really made a long journey: from Canada, I think someone even came from Japan, from the US, Great Britain, Finland. So really, thanks for coming. Thanks also to all who are representing one of the co-op members; I think about 30 people in the room are somehow connected to member institutions or are personally members of the co-op, so that's really great. I would also like to say thank you to all the partners of the READ project. The READ project ended more than half a year ago, but it was a great pleasure, and without the support and trust of these partners neither the technology nor this legal entity would have come into being, so thanks again for this. And of course I also have to say thank you to the team. Tamara and Johanna have already been mentioned, but there are many people behind Transkribus, as you may have seen from the emails, so thank you to the team members Philip, Sebastian, Bertolt, Florian, Andy of course, and others; it's great to work together.

What does this nice guy have to do with Transkribus? There is a funny story from Spain: people were looking for treasures, and some colleagues from Belgium made a nice video. Some of you might know it, but not all of you, so I would like to show it to you.

Historical archives contain a treasure trove of information. In filing cabinets, boxes and folders you can find vast amounts of old maps, letters, monuments and other, often handwritten, archival documents. To still be able to see the forest for the trees, archivists create global descriptions of the contents of those cabinets, boxes and folders. Those descriptions of the inventory can easily be searched online at home from your comfy chair.
But sometimes a description is not enough; instead, you'd like to search the text word by word. That's only possible after someone has typed out all of these documents by hand, and that really happens a lot: it's called transcribing. Imagine the extraordinary amount of work! But luckily, computers can lend a hand. That's interesting. Because of major breakthroughs in the field of artificial intelligence, computers can now train themselves to become very good at something in particular. But how does that work for transcriptions? Well, first you transcribe a couple of pages by hand and feed them to the computer, which then starts practicing on them. It will check, adjust and improve itself over and over again until the model it creates is good enough. Using this model, the computer can now read all the documents in this handwriting, even ones it has never seen before, whether it's 100 or 100,000 pages. Finally, you put them online, and from then on you can search the full text word by word. Handy!

So thanks a lot; I think that's really great. Rick, are you here in the room? Yes, so here he is, the guy who made the video, just for fun. Okay.

When I see your faces, I know that you really represent these four groups we had in mind from the very beginning as the ones we want to provide services to. This slide is really old, but it's still what we would like to do: provide services for archives, for libraries, for humanities scholars and for the public, and also for computer scientists and technology providers. That's how we understand ourselves, and of course we want to support all these groups. They also have different needs and need different tools, but that's what we really like to do. At the last conference in Vienna, a bit more than a year ago, we asked you to answer some questions, and we got very interesting answers from those of you who attended that conference.
How has working with Transkribus impacted your professional and scholarly activity? "It has convinced me that it is workable and not vaporware." So yes, you can really work with it, and we see that every day: it is possible to do something useful with it. "Greatly: my most interesting job and my first funded project, which I'm now starting, would not have been possible without it." So I hope that the project will be a success, but this is indeed the case: Transkribus is used in many project applications, and the project design can change if this technology is used. "I could do things that were not possible some years ago." Yes, definitely. "I expect it will affect my profession dramatically in the future." Yes, I mean, we have to admit that at universities many people are of course still working in a traditional way, and sometimes the promises made in digital humanities papers or by digital humanities representatives are maybe a bit too much. But in the longer term, it is for sure that this will change the way humanities research is done at universities and research institutions. "It gave our department the ability to start a whole new project." "I wouldn't have started my digital edition without Transkribus." And "groundbreaking." And again I have to emphasize that this technology comes from several research groups in Europe. It has mainly been developed by research groups from Valencia and Rostock, and there are others in Europe, so it's really great to see this input coming from universities into a service platform.

I had a quick look at what happened a bit more than a week ago, just a usual day in Transkribus: 11 models were trained by users within Transkribus, or at least their training finished on that day. The languages used for these models were Arabic, Danish, French, German, Latin, Syriac and Tibetan. All in all, more than two million words of ground truth were the basis for these models.
There was one very big model that finished that day, which is of course not representative; we don't get a model with 1.7 million words every day. It came from a large national archive. But the rest still add up to more than 300,000 words, so if we count 150 or 200 words per page, or whatever, you come to something between 1,500 and 2,000 pages that served as ground truth on just this one day. As you know, creating ground truth is expensive, and you always hear from computer scientists, from people working in data science, artificial intelligence and all this nice stuff, that ground truth is the real value. We are surprised every day that you produce so much data, so much reliable ground truth for training the models, and that's really one of the outstanding features of the Transkribus platform. So I assume that within Transkribus you will find the largest ground truth collection for historical documents worldwide. If you go to industry, Google, Microsoft and other providers of course have much larger collections for modern handwriting, but for old historical handwritten documents I'm rather sure that this is the largest collection.

Yes, the next step is of course going from an established platform to an economically viable entity, and this is definitely something new for us. I have some experience with projects at the academic level, but founding a startup is something new, and it will be interesting to see where this journey leads us. Five years ago I wrote, together with others of course, the proposal for the READ project, and we had to explain a business model. There I wrote about the governance model, and since the concept of gathering the data centrally was already in our heads and was already our plan, I wrote that it would be an interesting idea to set up a cooperative society, and there are millions of people
in Europe who are part of a cooperative. Raiffeisen is very well known, and there are many cooperatives in the agricultural sector, in the banking sector, or nowadays in energy, so it's a well-known legal governance model. But when we tried to convince our university, it was actually not that easy. Finally we succeeded, and it was possible to set up this legal entity on the first day after the end of the project, the 1st of July. Then we struggled, and are still struggling, with bureaucracy and administrative issues, so the real operational start is more or less now; we got our tax number, which is of course very important, just a week before Christmas. The main idea of a cooperative is that in many cases it is more efficient, and especially in the long term brings better results, if institutions and people collaborate. Cooperatives are on the one hand profit-oriented, but the profit is for the members, and the members have to benefit directly from a cooperative. I believe that it is a really strong model, a really strong tool, which will hopefully bring good results in the mid and long term, so we are really excited to start this. On the 1st of February the first two team members joined this legal entity, so part of the team is still working at the university and part of the team is now working in the co-op. Andy, who is the managing director, will talk more about it. I'm also very happy that Melissa Terras is the third member of our board of directors, and we three hope that we can run this co-op well. So I hope that you will join the ride. The next speaker will be Andy Stauder; he will give you more details on the future of Transkribus. Thank you.

Hello everybody, ladies and gentlemen, dear friends of Transkribus. I'm here to tell you a story about Transkribus today, and it's called "After the Grant", because, as Günter already said, the EU project during which Transkribus was largely developed has ended, and yes, we need
to maintain the team and the technology and so on, so we needed to find a model for doing that even after the end of the project. What to expect from this story: first, I'm going to talk a little about where Transkribus comes from, when and where it was developed. Then I'm going to talk a little about the technology, so what Transkribus can actually do. Probably many of you here today are already familiar with most of the things Transkribus can do, but as I heard there are also many newcomers today, or people who aren't that familiar with Transkribus yet, so I'll talk a little about what Transkribus is about anyway. The third part is about the future of Transkribus, or rather the state it is in right now, economically and in terms of the user base, and what it's hopefully going to be in the future.

So let's dive right in. The development of Transkribus began during the tranScriptorium project, which ran from 2013 to 2015, so it was a three-year project, and continued in the READ project, which probably more of you are familiar with; this was also a three-year project. I've listed the three main goals of these projects at the bottom here. The first was enhancing HTR technology, obviously, because that's mainly what Transkribus does, and this was done by conducting research into pattern recognition, machine learning and computer vision, so all the things you need to recognize handwriting. Another important goal was networking; this was done by hosting scientific competitions and workshops, providing support to users and so on, to build up the network and also the user base. This brings me to the third and probably most important item here, which is bringing HTR technology to the users: giving ordinary people who aren't technicians of any sort the opportunity to train neural networks themselves, to train models and use the powerful new AI we have nowadays for recognizing handwriting. And eventually then
we built, or at least tried to build, a service platform through which this can be done, and that is essentially Transkribus.

So let's talk about the technology: what does Transkribus do? The base technology is line-based neural text recognition, so it's not only handwritten text recognition, it's text recognition in general; you can also use it for printed material. But why line-based? I brought this example along: you can see four letter-like things, I'd call them, and you're not really sure what they are if you don't have any context. It's the same for a machine: if it doesn't have any context, which here is the line, it's hard to decide which letter it's dealing with at a particular moment. If you see the whole word, it all becomes clear, at least if you speak German, because this is a German word which means "event", like the Transkribus user conference, and in context the letters are really easy to recognize as long as you speak the language. It works basically the same way for the machine: Transkribus takes whole lines and deciphers them using context, and is thus able to recognize the text without just looking at individual letters, which aren't much to go on. A prerequisite for that is that you find the lines first, because you need to figure out where the line you're trying to decipher actually is. This is done by AI, or machine learning: we trained a very large model for line detection which is able to do this. You can give it a page that doesn't necessarily look very nice from a scanning point of view, and Transkribus still finds almost all the lines pretty well. Then comes the recognition itself. Here I brought a few examples of handwritten text recognition. This one is 18th-century German, and it's just one hand, so the results are pretty good. You can see the words that contained errors here, and the errors were mostly minor, so what you get is almost clean text. This page, for example, has
an error rate of about 2%. So if you've got one hand and a couple of hundred pages of ground truth, you can expect to land somewhere between two and five percent character error rate, depending on the script and the hand, of course. But as I said, it works for printed text as well. Here we've got a newspaper scan; the material is from the early 18th century as well, and it's German Gothic script, which is hard to read for modern OCR but not for Transkribus. Even with scans of this quality it yields very good recognition rates; here, for example, the error rate is below 1%, almost half a percent. Another very interesting feature is trainable structure detection, because once you've got the lines figured out and the text contained therein, you may be interested in the structure of the page. A page is a very complex thing, or it can be at least: it's got a heading, a page number, the running text, marginalia and so on and so forth, and this structure is a type of information in itself. This can also be trained in Transkribus. For example, here we have a filing card, and for a series of filing cards you might be interested in finding the family name field, as in this case, the thing that's circled in red. Transkribus can do this: you can train it to find a specific structure element in a whole series of images, or in this case filing cards. And one of the most magical things about Transkribus is a technology called keyword spotting, which enables you to find words even if they haven't been recognized correctly, using all the information Transkribus outputs in the background. You don't only get the full text; in the background Transkribus produces lots and lots of information that you as a normal user don't even see, and this enables us to find words even if they basically look like this: this marked word here, which reads something like "more of a shake", or whatever Transkribus was thinking, is actually the name Mitalena
as you can see in the handwriting in the image. This was the German word the user was looking for here, and with keyword spotting they were able to find it, even though they would never have found it in the plain full text, not even using fuzzy search, because it maybe shares two or three letters with the actual word. Let's take a brief detour from Transkribus, because before you start recognizing handwriting, you need images: in the beginning was the image. For that you can use the ScanTent; you can find out more about it here. It's basically a tent that you put on your desk, and you can use your smartphone to scan archival documents, for example when you're in an archive, upload the images directly to Transkribus and recognize them there. So this is also a very important component.

And now let's move on to the third part of the story of Transkribus. I titled it "Sharing is caring" and "Together we are strong"; I couldn't make up my mind which one was better, because they're both true, and this is exactly what READ-COOP is about: sharing and doing things together in order to do them better, or to be able to achieve them at all. This means that one of the most important things for us is the user base, because there are many, many power users who spend hundreds and thousands of hours using Transkribus to produce ground truth or just to work with the material, and this in turn means user power. So this is one very important cornerstone of the whole Transkribus platform, so to speak. Let me just give you a few numbers and figures. As of now we have around 32,000 users; 200 to 400 of them work with Transkribus daily, and about a thousand unique users per week. By now they have trained about 4,000 models, using almost 9 million images, which is really a lot. And the most staggering part, which Günter already talked about, is the sheer
amount of ground truth that's in the platform right now. This is a total of 400,000 pages, so almost half a million pages that have been either manually typed or at least corrected after a first run. If you value them at 20 euros per page, which isn't even that much, this comes to about 8 million euros' worth of ground truth in the platform, and this is really staggering. Or, if you convert it to working time, it would be about 200 years of one person working on this: 200 person-years. This just shows how huge the interest in the platform and the technology really is, and we get daily requests from universities, archives and family researchers from all over the world. To make this even easier and more collaborative, we are coming to the web in the future, but unfortunately I can't tell you much about that today because it's still all work in progress; it's something to look forward to. This graph shows the development: here you can see the user numbers, and we've been able to almost double them every year basically since the beginning, so at the end we have the 32,000 users here, and at the beginning it was something like a thousand. The huge transition we're in the middle of right now is the transition from a project organization, you could call it, to, yes, a company basically, and I wrote "a kind of" business, because business is not what we are mainly doing, as I'll explain in a moment. So what is this European cooperative called READ-COOP about? It basically follows the motto of Friedrich Wilhelm Raiffeisen. He was a social reformer, and one of his ideas was that what one cannot do alone, many can do together: working towards a common goal and thus making it possible to achieve this goal. READ-COOP is such a cooperative, a European one, which is supposed to make bureaucratic matters within the EU easier. It's supposed to; Günter is laughing already, because there's still a lot of red tape and a lot of bureaucratic
hurdles, but I'll talk about that a little more in a moment. One of the advantages of a co-op is that you can take on members very easily: you don't need to change the basic contract of the company, as you would for example with an LLC, so you don't need to go to a lawyer and a notary whenever you want to take on a new member. Members can join very easily, and we also set the price relatively low, so almost everybody should be able to join if they want: you can start by buying one share, which is 250 euros, and there's a four-share minimum for organizations, to balance things out a little between private people and organizations, because we don't want anyone to be too powerful on their own within the organization. One share as a private person gets you one vote, and four shares get you one vote as an organization. 25% of the value of the shares you bought is your annual membership fee, so if you bought one share it's about 60 euros, so nothing really. It's also very democratic: we've got the board of directors, which, as Günter already mentioned, is Günter, Melissa and me, and the general assembly, which meets annually and consists of all the members of the co-op, where you can elect the directors, vote on the decisions we're taking and so on. The main goal of a cooperative is really the direct benefit to the members, and with READ-COOP these benefits are mainly discounts on the pages you buy: if you're a member, it's a lot cheaper. You can also buy pages if you're not a member, or if you can't be a member for administrative reasons, which is sometimes the case, but if you're a member it gets a lot more affordable. Transparency: you're part of the company, so you have a right to know basically anything that goes on within the company. You've got a right to vote, as I said, which means you can shape the future of the company as well, because you can elect the directors, vote on the decisions and so on, so you can really be part of it
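The membership arithmetic above is simple enough to sketch. The following is a minimal illustration only, assuming the figures stated in the talk (250 euros per share, a four-share minimum for organizations, an annual fee of 25% of the share value, and the 20-share cap the speaker mentions); the function name `membership_costs` is hypothetical and not part of any READ-COOP system.

```python
from decimal import Decimal

# Figures as stated in the talk; membership_costs() is a hypothetical illustration.
SHARE_PRICE_EUR = Decimal("250")
ANNUAL_FEE_RATE = Decimal("0.25")  # 25% of the value of the shares bought
MIN_SHARES = {"private": 1, "organization": 4}
MAX_SHARES = 20  # share cap mentioned in the talk


def membership_costs(member_type: str, shares: int) -> tuple[Decimal, Decimal]:
    """Return (one-off share capital, annual membership fee) in euros."""
    minimum = MIN_SHARES[member_type]  # raises KeyError for unknown member types
    if not minimum <= shares <= MAX_SHARES:
        raise ValueError(
            f"{member_type} members must hold between {minimum} and {MAX_SHARES} shares"
        )
    capital = SHARE_PRICE_EUR * shares
    annual_fee = capital * ANNUAL_FEE_RATE
    return capital, annual_fee


print(membership_costs("private", 1))       # (Decimal('250'), Decimal('62.50'))
print(membership_costs("organization", 4))  # (Decimal('1000'), Decimal('250.00'))
```

With one share the fee works out to 62.50 euros, which the speaker rounds to "about 60 euros". Using `Decimal` rather than floats keeps the euro amounts exact.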
and one very interesting thing, especially for organizations, is that you can do funding rounds, so you can co-fund new features in Transkribus, for example. Imagine you have a feature that costs about 100,000 euros to program: if there are 20 members who need this feature, it's a really minor amount for each of them (5,000 euros), and you can get around the whole song and dance of a public tender, which you would have to do if you were paying the 100,000 euros alone; even if you had access to the money, you would probably be obligated to put it out to competitive tender. And as I said, we don't want any single member to be too powerful within the organization, and this is ensured by a share cap: you can't buy more than 20 shares in the co-op. This distributed ownership is also ideal for data sharing, because you're not sharing your data with some company in another country that you have no control over; you're basically sharing it with your own company, the company you're part of, because in essence, with a co-op, customers are owners and owners are customers, and I think this is a really beautiful thing. This is a picture of the founding meeting of the four founding members. The meeting took place, as Günter said, on the first of July last year, and it took us about half a year until we were fully operational. We had expected it to take a couple of weeks, but it ended up being half a year. It was a really long and winding road, courtesy of Austrian and European bureaucracy, so it wasn't any one person's fault or anything. Another thing is that we did it in the middle of summer, so if you want to start a company, take my advice: don't do it in the middle of summer, because everybody's going to be on holiday and everything's going to take twice as long as it normally would. But we had a lot of wind beneath our wings, because there were already lots of institutions and also private people that wanted to apply
for membership, and this really gave us the strength to power through this bureaucratic marathon. So, nothing comes from nothing, as all of you are aware, and we need to talk about pricing a little. I guess this is very interesting to many of you, because you're going to want to know what Transkribus is going to cost you in the future and how you're going to find that money. It fits on one sheet of paper, basically; we tried to build a relatively straightforward pricing model. The prices will be somewhere in the range between 13 and 22 euro cents per page, depending on how many pages you buy and whether you're a member or not. These are the standard prices, and then there will of course be custom pricing for very large orders or for additional services: if you need lots and lots of support for a project, if you need help generating ground truth, if you have a couple of million pages and so on, then of course we need to tailor the specific contract. For regular payment there are two models. One is a prepaid model with five price tiers, or package sizes: the price depends on the size of the package you buy, so the larger the package, the lower the price, and members get a 10% discount on this price. The other option is an annual subscription, which is basically the same; the only difference is that it's recurring, which means you commit to buying the same number of pages the following year as well. That might be very interesting to institutions, because they can often conclude such ongoing or continuous contracts, and you don't have to apply for the money again every year. If you are a member and you take out a subscription, you get an additional discount: you don't just add 10 plus 10%, you get 25% off. So it really pays off for members; being a member makes the whole thing a lot more affordable. And of course you can buy additional non-subscription packages at any time, so you're not locked in by
this subscription; you can buy more pages any time you want as a regular package. The prices you can see here on the right are without the discounts, so these are the base prices depending on how many pages you buy. Everybody gets 250 pages for free when they sign up; everybody likes freebies, and we like to keep Transkribus free at least to some extent. We are really thrilled about the community, and of course we also want to give something back. What's really going to remain free is storage and training, although all of this means costs for us in terms of infrastructure, and for you as well if you are a member, because we need servers to run the whole thing on. This is going to remain free, but there will be a deferrable auto-delete feature. That means you get a warning: you haven't touched this collection in three months, so we're going to move it to the archive. There it stays for another nine months, you'll get a warning before the end of that period as well, and then it's deleted. You can postpone this as many times as you want, but we don't want material to just lie around with nobody even interested in it, blocking space and resources for other users. We are hoping to start with the payment models in June; it might be a bit later or a bit earlier, but that's the very rough time frame. All of this is of course subject to change: this is the first draft that we're going to use for the payment model of Transkribus, and we're happy about any feedback you can give us as users. We'll take a close look at the numbers and try to make it work for everybody. These funds of course mean that we can do a lot of things and keep Transkribus alive, and here are the things we've been able to do with the larger projects we've already started working on in the background with a couple of larger partners and clients, which have kept us
afloat in the few months since the end of the project. I'm happy to say that we could keep the whole Transkribus team and even extend it, because the work is growing as well: more users means more work, of course. We have been able to keep Transkribus alive, which is the most important thing: keep the technical infrastructure running, keep improving Transkribus and adding new features, and keep HTR available to everyone. That's what I said earlier: we want to make it available to almost anyone who has a computer, basically. We also keep Transkribus free, sort of: at least there are the 250 free pages and the free storage. We've also been able to keep organizing a Transkribus user conference, and I think the family photo is maybe going to be a bit bigger this year; I hope it comes out as nicely as last time, and I hope everybody is going to be in it. I don't know yet how we'll organize this, but anyway, you'll get details on that later, and you'll also know when to run away if you don't want to be in the photo. So that's it from me. If you want to be part of it all, you can find out more at these addresses, and if you want to join, I don't know if you can read it, but I tried to really make you see that I want you to join, because that's basically the most important thing for us right now: to get as many members as possible and to keep the whole thing going. I hope it's been an interesting story and that you now know a bit more about Transkribus, what it's all about and what you can be part of. Thank you.
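The discount rules from the pricing part of the talk can be expressed as a small function. This is a sketch under stated assumptions, not the actual Transkribus billing logic: the function name `package_price_eur` is hypothetical, and the base per-page price is a parameter because the talk does not give the tier boundaries. The 10% member discount on prepaid packages and the 25% for members with a subscription are as stated; the 10% for non-members with a subscription is only an inference from the remark that the two discounts are "not just 10 plus 10%".

```python
from decimal import Decimal

# Discount in percent, keyed by (is_member, is_subscription).
DISCOUNT_PERCENT = {
    (False, False): 0,  # non-member, prepaid package
    (True, False): 10,  # member, prepaid package (stated in the talk)
    (False, True): 10,  # non-member subscription: ASSUMPTION, inferred, not stated
    (True, True): 25,   # member with subscription (stated in the talk)
}


def package_price_eur(
    pages: int, base_cents_per_page: int, is_member: bool, is_subscription: bool
) -> Decimal:
    """Price in euros for a package, given a hypothetical base price in cents per page."""
    if pages <= 0 or base_cents_per_page <= 0:
        raise ValueError("pages and base price must be positive")
    percent = DISCOUNT_PERCENT[(is_member, is_subscription)]
    # pages * cents * (100 - percent) / 100 gives discounted cents; a further / 100 gives euros
    return Decimal(pages * base_cents_per_page * (100 - percent)) / Decimal(10_000)


# Hypothetical package: 1,000 pages at a base price of 20 cents per page.
for member, subscription in [(False, False), (True, False), (True, True)]:
    print(member, subscription, package_price_eur(1000, 20, member, subscription))
```

For that hypothetical package this gives 200, 180 and 150 euros respectively. The result is left unrounded; how amounts would be rounded to whole cents is not covered in the talk.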