Yeah, welcome everyone to this round table. This is a somewhat open and interactive format, so we will be talking to each other. I will mostly be moderating, but the panelists here will talk to each other about various issues surrounding large language models, everybody's favorite these days. And yeah, we will also involve you, the audience. During this session, you're actually allowed to have your phones out. So yeah, that was too quick, let me go back. Please, everyone, use this QR code to enter a little survey that we'll be conducting. Should be working. At first we're going to ask you a little question about who you are, or what you do basically, because we would like to see a little bit later on if that has any bearing on your responses. And yeah, the main purpose of this interactive element is that we would like to involve you in the future of how LLMs are going to be used or made available inside Transkribus. So we would like you, the community, to inform the future development of Transkribus in terms of large language models. Everyone ready with the QR code and the web link? We'll show it another time later on when the actual questions come.

So now, yeah, this round table is going to be about large language models, as I've already said: what's possible with them, how they're already being used in the field by the experts on the material, and what problems there are around them, implementation-wise, ethically, economically, ecologically. So there's a lot of stuff to discuss around large language models. And yeah, let's take a look at the panelists who are with us today. The handsome gentleman in the middle is David Brown, who is joining us from Trinity College Dublin. He has worked on delivering a series of high-profile digital humanities projects, including the Down Survey of Ireland in 2013 and the Virtual Record Treasury of Ireland in 2022.
He is the author of Empire and Enterprise, published in 2020, a study of early modern merchants in the Atlantic world, along with numerous book chapters and journal articles. He is a strong believer in the beneficial use of generative AI as a practical research tool and has been demonstrating these capabilities to librarians, archivists, and academics over the past year. So he's been very active in promoting what can be done with large language models in the humanities and memory institutions community.

Then, joining us from the Amsterdam City Archives, we have Nick Vauhoef, second on the left, stage left, who serves as a project manager for digital innovation at the Amsterdam City Archives, focusing on automating workflows and integrating systems to streamline archival processes. Nick spearheads initiatives aimed at enhancing the efficiency and accessibility of archival resources. His work involves leveraging cutting-edge technologies to ensure that the vast collections of historical documents are preserved, digitized and made available to the public in more interactive and user-friendly formats, together with his colleague Pauline, who's also joining us today. Their dedication to innovation positions the Amsterdam City Archives at the forefront of digital archiving and the preservation of cultural heritage for future generations. So a very important institution in this regard, and we're happy to have two representatives of this pioneering institution.

Yeah, and as I said, also joining us is Nick's colleague, Pauline von den Hövel, who is an innovative archivist at the Amsterdam City Archives, leading the use of handwritten text recognition and large language models to digitize historical documents. Since 2010 she has spearheaded crowdsourcing projects, and from 2017 she has worked with volunteers producing transcripts as ground truth data, focusing on 17th- and 18th-century archives.
Her aim is to make the Amsterdam City Archives fully searchable and globally accessible, so very much in line with what Transkribus is doing, or trying to achieve, as well. And she wants to ensure that AI aids rather than replaces human effort. Pauline is dedicated to improving HTR quality and exploring AI's role in archival work, striving to make historical insights more accessible.

Then also among our guests is Helene Wilbrink, joining us from the Utrecht Archives, where she's a program manager. Helene focuses on enhancing the digital accessibility of archival materials. Collaborating closely with her colleagues, Helene works on the application of AI, specifically HTR and LLMs, to unlock historical texts. One of her core interests is implementing linked open data to interconnect and share information seamlessly. If you remember, the generation of knowledge is all about connecting things. Having contributed to several Dutch Transkribus projects, Helene combines her technical expertise with a background in Egyptology to innovate and preserve cultural heritage, demonstrating a unique blend of historical passion and digital innovation.

Then we have two members of the Transkribus team, one of whom you have already had the pleasure of listening to during yesterday's R&D keynote: Michael Lustoszewski, our head of research and development and a data scientist with a background in translation studies and computational linguistics. His main responsibility is the management of scouting, testing and further developing technologies that at some point end up in the Transkribus user interface. This includes handwritten text recognition, natural language processing pipelines and visual analysis techniques. And last but not least, another team member, Gregor Lanzinger, at the very stage right. He is our chief technology officer at READ-COOP and an open source enthusiast.
He is a passionate advocate for open source principles at the company as well. His interests extend into the realm of large language models, reflecting his fascination with artificial intelligence and its potential to revolutionize our interaction with information, enhance problem solving and shape a more informed and innovative future, especially when it comes to historical documents. So those are our panelists.

And now let's take a quick look at what we are going to do here, or what the structure of this round table is going to be. Here again you see the panelists and institutions that are represented at this round table. The principle here will be that we're going to do short kickoff talks, which will be followed by discussions where the panelists will discuss among themselves, and we will also involve you, the audience. The third part then is your chance to shape the future role of LLMs in Transkribus, because we are going to do a survey in the future about how we are going to incorporate LLMs into Transkribus, and we want your help to build this survey. So it's a meta-survey, if you will. Good. Yeah, so then, David, if you would do the honors, let's have our first kickoff talk. David will tell you a little bit about how he's been using LLMs in his work.

Okay, thanks for the introduction. We've been using Transkribus for many years, and like many people, we've ended up with a large amount of text. In our case it's about 60 or 70 million words, which has, certainly since 2021, basically been hanging around the place like a teenager, not doing very much at all from what we can see. And we were beginning to actually wonder about how much more text we were going to generate, on the basis that we weren't using a lot of the text that we already had.
And then in March of last year, our Transkribus text finally got a job, and we could make it go out and do something. Transkribus and GPT-4 is what we use. We don't use the free or cheap ones; there's a reason why the ones you pay for are actually better. Gemini, which has also just been released, probably has similar capabilities, but everything you're going to see has been generated with GPT-4. PyLaia was introduced on GitHub in 2017, and the original algorithm developed at Google, which introduced the world to transformers, was also introduced in 2017. The GPTs are based on this original transformer research, and the transformer research is a language translation tool. So the GPTs basically also work as language translation tools. So the closer you bring your work to language translation, the better and more consistent the results you're going to get. When you start picking stuff out of the internet and using generative AI, that's where you're going to have problems. So if you start off with a translation-based approach, and that's either translating one language to another, from one version of a language to another, or from a human language to a machine language, like asking it to produce code, you tend to get very effective results.

So we'll start off with something pretty easy. In deference to my colleagues here, we have a large collection of, I think, 20,000 Dutch pamphlets in Trinity. They're in the process of being digitized and in the process of being catalogued. Nobody in Trinity, bar one or two scholars, speaks any Dutch at all. We just ran it through in the normal way using Transkribus and produced a perfectly reasonable HTR result. Still of no use to us. Maybe it's of use to my colleagues in the Netherlands, but none of our students can read this. So we put it into ChatGPT and asked it to produce a translation in modern English, and to chuck out all the flowery early modern stuff like, you know, "herein I trust you", et cetera, et cetera.
It gave us a sort of summarized English-language version of the original pamphlet. It was checked by a couple of scholars who can read both; I can't, which is the first little ethical pitfall that we have, because I'm now working with a text whose original I don't understand. And this translation step, normally people don't even notice it when I do these presentations; they go "that's good", because they're used to translation. But you're already moving into this world where the machine has done something, you don't know what it's done, and you do not have the capability to question what it's done, and you simply move forward now, which we will do. As our American friends say, we're going to put a pin in this. We're going to come back to this text in a couple of minutes, but we're going to go and look at some other examples. These are real-world examples of what we've been doing with GPT-4. Everything you're about to see has been replicated, so it's not a kind of prompt lucky dip, which is what these things can often degenerate into.

This is a poor Latin calendar of a document that was destroyed; we don't have the original. The Latin is badly written and there's a lot of abbreviation in it. It often contains mixes of old English words and old Irish words, sometimes a bit of Norman French when they're quoting from a legal document. The handwriting is poor. The model it was run through wasn't made for Latin: there are no ligatures in it, no special characters. It produces a straight bad transcription, which makes you wonder whether there's any point in trying to fix it up. There isn't, actually. But we put this into the translation program with a few rules that we always introduce: keep the place names the same, expand the abbreviated personal names, expand the abbreviated words, and make sure you put all the office holders and positions in full. It produces an excellent modern English version of that terrible Latin text.
So we've jumped a few steps in one go. We're not even trying to translate a good Latin text; we're trying to translate a terrible text, and it's produced a very good English-language version, which we can then do entity extraction on, and so on and so forth.

This is an example of a 17th-century English parliamentary diary; it's the appointment of a parliamentary committee. And again, we've used one of our early modern English transcriptions and we haven't bothered correcting it. It works quite well as it is. We've told ChatGPT that this is an English House of Commons document from 1642 naming a committee, so we've given it a bit of background. What I've also told it is that I want it to extract the names (you can take it as read that it is really good at extracting names; you don't need to demonstrate it), but I also want it as a slide for PowerPoint, and I'd like to have a few words of biographical background for each of the people that it extracts. And I want to have this sort of bibliographical data, because we do a lot of work with archives and archivists, and one of the things that we like to do is to produce these 30-word summaries and 12-word titles, extract data and so on. It's really, really good at this. We've done thousands of these now from different types of documents, and you can basically just run your way through them. It tends to work best on chunks of text of about 400 words. We don't know why; they won't tell us. You tend to do about 30 at a time with GPT-4, then you have to wait. And this means you're doing about 9,000 words in the morning and 9,000 words in the afternoon that have to be checked, which is fine. You can just about read through it, and that's basically your workflow, because there's no point in cranking the stuff out automatically and throwing it into a catalog. It has to be checked. So, automating it further, we found there's no point to it. So we produce our data.
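The chunked workflow David describes, roughly 400-word blocks sent to the model one batch at a time, could be sketched like this. This is a minimal illustration, not his actual script; the chunk size parameter is the only thing taken from the talk:

```python
def chunk_words(text: str, max_words: int = 400) -> list[str]:
    """Split a long transcription into blocks of at most `max_words`
    words, preserving word order, so each block can be sent to the
    LLM separately and checked afterwards."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

# A 2,000-word document becomes five 400-word blocks:
doc = " ".join(f"word{i}" for i in range(2000))
blocks = chunk_words(doc)
print(len(blocks))  # 5
```

The human check-and-correct step stays outside the code on purpose: as David says, there is no point in cranking results out automatically if they all have to be read anyway.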
I reckon it would have taken me 20 minutes to produce that from what was originally a three-page document. Then it does an extraction of subjects, and it produces MARC and Dewey Decimal codes, if you're a library or an archive. We checked the codes for hallucinations and the codes are fine. We've done a few where they've been wrong; we've been getting an accuracy rate of about 95%, which is probably okay for a not terribly well trained library assistant anyway, whose stuff would also have to be checked before it goes into the catalog. So as an assistant, again, this is a very useful thing that it does.

This is my slide, with the biographical information. It's basically extracted all the names in that original document. It's given their full titles. It's given you their full first names, where Robert might have been "Rob" with a full stop. And the little biographies it's produced are actually very good. We don't think they're from Wikipedia; we think they're from the History of Parliament, which is a bit more authoritative, but there's no way of knowing. But for a presentation slide, very good. It's good for students as well, to just be able to quickly see and visualize their own work. A lot of time is spent making slides, and actually we find that any of us in the history department could have produced the biographical data very quickly, but it takes ages to make nicely laid-out slides. So just having it produce the slide itself is a useful thing that it does.

This is from the 1641 Depositions, a collection of verbatim transcripts of normal, ordinary people. So this is one of the very few examples we have of how English was spoken by regular folk in the 17th century, as opposed to prepared speeches or trained text or things that have been cleaned up later on. And again, we've used this for entity extraction before, in a previous project, from the original text, using spaCy-based NLP tools and so on, a few years ago.
And again, we wanted to see how well this might work. So again we've done the translation stuff: this is the original text, this is the translated text, and we've asked it to do a close word-for-word translation, not a summary like before, of the original document. So all of the words have been translated into modern English, the grammar has been corrected into contemporary English, and it's correcting the grammar that enables the NLP to work so well. And we asked it to produce a table to do the entity extraction: we wanted first name, we wanted surname, we wanted ranks and occupations, we wanted gender, and we wanted the relationships between people mentioned in the text. And it has done all this perfectly. The total document has about 42 people mentioned, and we ran it through in blocks of 400 words at a time, with a 2,000-word document. We then later on tried to assign CIDOC CRM codes to the relationships. It wasn't great at that, but it got better when we put in a training table of the relationship codes that we wanted, because it has generated this text, so it knows what this text is. Again, as a time saver it's excellent. We could find no reason not to use this in a much more general way, which we will do.

This is, I think, our last big text example. This is to extract place names from a deed. It's quite challenging, because the county, which is an upper administrative level in Ireland, is mentioned at the beginning, a lower administrative level is mentioned at the end, and the small parcels of land are mentioned in the middle of the block of text. And we wanted it to produce a hierarchical table of the text, so that the lowest denomination would be over to the right, which it did. So it managed to interpret the text and produce a hierarchical table. We then matched the place names against the table from the Ordnance Survey, so it's georeferenced, and we can just bang it into a map or whatever.
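A table like the one just described (first name, surname, occupation, gender, relationship) is easy to check programmatically once the model is asked to return JSON rows. A minimal sketch, assuming hypothetical column names and an invented sample row rather than the real deposition data:

```python
import json

# Columns we explicitly prompted for (assumed names, for illustration)
REQUIRED_COLUMNS = {"first_name", "surname", "occupation", "gender", "relationship"}

def parse_entity_table(llm_output: str) -> list[dict]:
    """Parse the model's JSON answer and reject any row that is
    missing one of the columns the prompt asked for."""
    rows = json.loads(llm_output)
    for row in rows:
        missing = REQUIRED_COLUMNS - row.keys()
        if missing:
            raise ValueError(f"row missing columns: {sorted(missing)}")
    return rows

# Invented example row, not taken from the 1641 Depositions:
sample = ('[{"first_name": "Robert", "surname": "Example", '
          '"occupation": "yeoman", "gender": "male", '
          '"relationship": "brother of John"}]')
print(parse_entity_table(sample)[0]["surname"])  # Example
```

A structural check like this catches malformed output early, before a human reviewer ever sees the table; it does not, of course, verify that the extracted content is correct.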
Now we go back to our original document that we were looking at, and remember we had that summary in English. Joe Clark is the head of the history department in Trinity College Dublin. We uploaded one of his articles and asked ChatGPT to do a linguistic analysis of his writing. What we're trying to do is show that you cannot detect when students plagiarize using ChatGPT. So it did a linguistic analysis of his writing. We uploaded the translation that ChatGPT had done and asked it to write a basically normal student-type presentation essay. And it did a really good job. This is Joe's text, and Joe uses a lot of proverbs and examples of what's happening on the ground locally. And this is what it produced. It's put a Dutch proverb in the middle, and if a student gave it to you, you'd go, that's fine, read that out in the tutorial and we'll discuss it later. There's nothing wrong with it. It is historically accurate. It quotes from the original document, which is what you wanted it to do. It's put in a proverb. It does everything you want. And then finally, for our last slide, because why not, we asked it to produce a painting of the essay. Admittedly Charles II looks a little bit like Jesus. It based the painting on the Dutch proverb, "the best helmsman stands on shore", and we asked it to produce it in the style of the Dutch masters. And that is where we will finish, with some of the capabilities that we've been using. Thanks a lot.

Yeah, as you can see, there are lots of fun things that you can do in a post-processing workflow after you have transcribed the text with Transkribus. In one sentence, David, what would you say are the main strengths that you see in the LLMs in your workflows?

Well, we've only used one seriously. We experimented with the other ones and found they weren't as good, so why bother? We find that it's basically helpful as a research assistant.
If you ask it the kind of questions you'd ask a research assistant, in exactly the same way, you can coax it into doing what you want and into replicating those tasks quite quickly. So really it's a productivity tool, and that's how we look at it. It doesn't replace people. It doesn't do things that we can't do. It doesn't do things as well as most of us would do them. But it does speed things up, and there's no doubt about it, these are useful tools once you get the hang of them. And the best way to get the hang of them is to play with them.

Okay, thank you. So are there any comments from the audience?

Thank you, David, for your very interesting talk. I was wondering, given the pretty good results that you had using ChatGPT or GPT-4 to translate, let's say, poorly transcribed text into pretty legible, good-quality text: did you play around with the settings in the playground? Did you play around with temperature, top-p, top-k, et cetera, to get these results and keep hallucination down to a limited level?

No, we only gave it conventions. We just told it how we wanted various things like people and places to be represented. There are a few terms that are specific to Ireland, particularly with places, and we had to tell it that this particular word means whatever it meant. But other than that, no, very little; probably a 200-word setting. And then we just let it go. It seemed to learn as we went along, because when we started doing that in July with those depositions, which are a very important source for us, and then there was an update in, I think, September, it didn't need that pre-prompt anymore, even when we started a new job, because it ingests what users put into it, which we're perfectly happy for it to do because we're publicly funded, et cetera, et cetera.
So, yeah, we did find that it improved over the course of six months as they did upgrades, and they were putting in, obviously, what we, and of course many others around the world, were asking it to do. Thank you.

Yeah, thank you for this very interesting question. Parameterization is actually also a very hot topic when it comes to product development in Transkribus, because we don't want to overwhelm users with too many parameters. On the other hand, we also don't want to hide them too much from users. So combining those more advanced use cases with an easy-to-learn and easy-to-work-with user interface is actually one of our main challenges. And yeah, keep that one in mind for the questions later on, so you can put it into the survey then. Good. Yeah, I think we have to pick up the pace a little bit. So I would like the colleagues from the Amsterdam City Archives to come up, or Nick, I think, you will be talking, to tell us a little bit about what you've been up to and thinking about when it comes to large language models.

Yeah, thank you very much. After seeing examples similar to the ones David gave us over a year ago, our main focus has been: how can we bring this into production? We took one of our workflows within the city archive, which is making indices of sources with a lot of person observations. In the past 25 years, we made over 25 million person observations, first through outsourcing and after that through crowdsourcing. But it's a lot of work. A lot of people work on it, hundreds of volunteers. And you have to make choices about what to index. Do we only take the names? Or do we also take the occupations or other data that is in the sources? And because it's that much work, there's one big hiatus in our data set, and that's the civil registry, one of the most important genealogical sources we have. Why is it a hiatus? Because it's so much.
It's 280 meters of shelving, over 8,000 registers, over 3 million deeds of birth, marriage and death, and over 18 million estimated persons mentioned in these deeds. Over the past year, we've been working on making a pipeline using Transkribus, OpenAI and our own collection management system to index these 18 million persons at just the click of a button. And that's what I'm going to show you next. This is the pipeline. There's a small video that demonstrates it; hope it starts. At the moment, we're working on transcribing all the registers with Transkribus. And then, with this pipeline, we get the transcription from the Transkribus API. And with a good prompt, and I've been tweaking this prompt for weeks and weeks, you can get a really nice, clearly structured data set from this piece of text. You have to give it a lot of parameters. I gave it the instruction not to change the text data, et cetera, and told it which labels I want it to use. After that, this string of text is made into a JSON object, and we can reuse that to make records in our collection management system. And here you go, there we have a record for you: born on the 24th of May, 1916 at the Pieter Langendijkstraat, number 20. This shows that our collection management system is capable of linking people to other knowledge graphs, such as Wikidata. And now I'm planning to do something with graph algorithms to also automatically link all these persons to other knowledge graphs. But that's for the future. So that was my small presentation of what is possible with this technology.

Okay, thank you. Thanks a lot. Wow. Looks like LLMs really are becoming part of the staff. Would you say that's an accurate statement?

Yeah, well, I think LLMs are way better at extracting information than all the other natural language processing tools we've seen. And they are also more user-friendly, because you only have to tweak your prompts to get a good result.
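The step Nick describes, turning the model's string of text into a JSON object that can become a record in the collection management system, could look roughly like this. The label names (`event_type`, `date`, `place`, `persons`) are illustrative assumptions, not the actual Amsterdam schema:

```python
import json

# Labels the prompt asked the model to use (assumed, for illustration)
ALLOWED_LABELS = {"event_type", "date", "place", "persons"}

def to_record(llm_response: str) -> dict:
    """Turn the LLM's JSON string into a dict suitable for a
    collection management system, dropping any label we did not
    ask for in the prompt."""
    data = json.loads(llm_response)
    return {k: v for k, v in data.items() if k in ALLOWED_LABELS}

response = ('{"event_type": "birth", "date": "1916-05-24", '
            '"place": "Pieter Langendijkstraat 20", '
            '"persons": ["..."], "notes": "unrequested label"}')
record = to_record(response)
print(sorted(record))  # ['date', 'event_type', 'persons', 'place']
```

Filtering to a fixed label set is one simple guard against the model inventing fields; Nick's instruction "don't change the text data" would need a separate check comparing field values back to the source transcription.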
Yeah, well, I've also been working with named entity recognition as part of this and never got a better result than 92%, but this is getting almost 100.

Would you say you get the feeling that it's becoming part of the team, or do you rather see it as a tool? No, it's a tool. Okay. Any other statements from the archivists on the panel? Helene, Pauline, what would you say?

Well, I just love his pipeline. I would like to have it too in Utrecht. And we were discussing yesterday, also over dinner, that asking the right prompts is so important. I know Nick and I update each other, but yeah, I would also love to see his prompts, and also those from other people in the audience, to maybe get something worth sharing and to learn from each other. Thank you.

Yeah, because this was actually a question that we asked one and a half years ago on a panel that was on AI in general. There we were talking a little bit about: is AI becoming part of the gang, basically, or do we rather see it as a tool? I don't know what the general feeling is when it comes to that. Are there any comments from the audience? What would you say? Where is this going? Do you feel it's already like working with someone instead of something?

I'll go. Hi. So when I've used it, I think it works best when using it to do extremely structured things. So not treating it as a person, but treating it as an information processing thing that takes one pot of information and puts it into different formats, not actually asking it to infer too much or change the data, but just restructuring it. And I echo Helene's point about understanding the prompts that people are using, to make sure that they can trace back. Because I think the danger is when, excuse me, when we ask it to infer or change too much and we can't understand the pipeline.
But if we have clear processing instructions, where we can understand how it's taking one bit of data and turning it into a different structure, then it's incredibly useful for the type of data that we need to build these online systems.

Okay, thank you. So it looks like the general consensus is that it's more a tool than a team member. David, you wanted to comment?

I found, interacting with it, that when I felt I was in charge, that I was the domain expert, I treated it as an assistant or a piece of software. When I was asking it to do things I didn't know how to do, like write a Python script to automate something, I found I was much more talking to it like another person or a teacher, and it was talking back to me. It did get a bit snippy, where I would have to say that we were not friends, but we were colleagues, for this particular Python writing exercise.

Okay, yeah, very nice. And yeah, you don't need to be friends with all of your colleagues, right? I don't know. What do the tech guys, the developers, say?

I'm interested in the question of whether it changes your workflow, or the profiles needed from your employees. Because, for me, do you have to look for other kinds of mistakes you're making if you're working together with these AIs? And I see it more or less as a tool, but often like somebody you talk to; so regarding that question, yes.

Okay, so it's not that clear cut after all. Okay, yeah, let's keep moving on. And I'm asking you, Helene, to tell us a little bit about your work.

I also decided to improvise, because I'm really curious about the people here. So, who has already used LLMs like ChatGPT on Transkribus material, done some experimenting? Well, that's quite a few already. And who has used them in general, not necessarily on Transkribus material? A lot more. Great to know. Yes, I'll keep it very short, because we already saw great examples, and I completely agree with Andy.
He said yesterday, with that quote, that if you want to innovate, you do it together. So that's also what we did at the archives where I work. When ChatGPT came out, already a couple of months later, in February, we had a session with about 40 heritage professionals to share experiences and prompts, and I think it's also great that we're doing that here today. Next slide, please. Yes.

So we do similar things to what we saw from the others, a little bit less sophisticated; it's not a complete pipeline. For example, here you have Dutch notarial deeds, where my colleague that you saw in the previous slide, Rick, created a transcription with Transkribus using the Dutchess I model. Then we had it improved by a human specialist. So that's also an important thing: what do you do about quality control, and where in your pipeline are you doing that? And then the improved version, next slide please. He asked ChatGPT, and it was the 3.5 version, to answer questions and give them back in JSON format: the things that experts have been doing for many years to make our notarial deeds accessible. That's quite similar to what we already saw. So, who are the key players? What are their roles? When and where was it written? Give an abstract. And I think, also seeing David's presentation, it's interesting whether you first translate and then do the extraction, or whether you do it on the original source because you want the original spelling. So that kept me thinking about how to build your pipeline and in what order to do what. And then I give the floor to Nick.

Yes, because these things are great, but they also come with some, or maybe a lot of, risks. These are just some headlines I found on the internet. Besides all the misleading information and the fear of AI taking over the world, they also have an impact on our environment. They are trained on a lot of copyrighted data. And we don't know what all these big tech companies are doing with the data we put through them.
So these are some things we are going to ask you some questions about further on. I don't know the right answer as to how and whether we should use these technologies. But there are some studies already available which compare different models and different technologies. Can you move this? Yes, this one was done by Stanford, and they checked whether these models comply with the draft EU AI Act that passed last year. And you see that none of them complies wholly with the values we as a European Union have stated that AI technology should comply with.

Moreover, our AI team at the municipality of Amsterdam did their own comparative research on models. They left out the parts about copyrighted data and energy use, et cetera, but they looked at the ethics and the outcomes of the models. Can you move on? These are the criteria on which they scored, because we at the municipality of Amsterdam are thinking about how we can make a choice if we want to use this technology: which one should we use for what purpose? So you can say that if there is a high risk of processing personal data and making decisions about people, the use of these technologies will not be allowed. But if there is not such a risk and you are working with open data, maybe it can be very helpful. So this was the result of the Amsterdam study, in which you see that for Dutch, because they worked with Dutch prompts and Dutch data, there are only two, maybe three, models that really work, and the most difficult problem is that only one, and that's GPT-3.5 Turbo, is usable on a large scale. So, having seen this, we have some questions for you, and I think I'm going to give the mic to Pauline to do these questions.

Yes, I don't have a screen. Sorry, Helene is the next speaker. Thank you. Back to Helene first.

Yes, and I really recognize what Nick is saying. What I see is that in the Netherlands, as here, most people are using OpenAI to experiment, or are already implementing it.
But I often heard that there's a need for an alternative, and I completely agree with that. And we are very happy that in the Netherlands there's now a more open initiative that's trying to be very transparent and fair. It's called GPT-NL, and a couple of weeks ago we again put together the people from GPT-NL and heritage experts to see how we can work together, also to train it on historic data, and for which use cases. And I think this is something that you see worldwide, but especially in several European countries. For me it would also be really interesting to not only look at OpenAI options to integrate into Transkribus, but also to look at these fairer models, and hopefully also at somewhat smaller and more efficient ones, because I think there's a need for that. Thank you.

Yeah, this is the next thing. Maybe let's go back one slide; keep this one up for now. So I think what we've just seen is that one overarching topic here is tool chains and workflows: stringing things together in the right way. This is something that we also discussed during the R&D keynote. And I think it's not as easy anymore as it was a couple of years ago, when it was more about the technology. Now it's more about putting technologies together, so this collaborative element between code components is gaining more and more importance. I don't know, what would our R&D department say? I'll just show the next slide, because there we have some answers. Is the mic on? I would have suggested just showing the next slide, because we are dealing with those issues there. Yeah, we're a bit short on time anyway, so let's go to the next slide and take a quick look at what we've been up to at the cooperative in terms of preparing, researching and testing the use of LLMs. Sorry. Exactly. As we mentioned yesterday in the keynote, we made experiments with unsupervised entity extraction and entity tagging.
And here our major concern was really to standardize prompts that work well across multiple datasets and scenarios. So we wanted to get away from copying transcripts back and forth between Transkribus and ChatGPT and tinkering around with prompts. We wanted to validate those prompts, and for this task it is crucial to have benchmarking datasets. So before we can really start experimenting, we need robust datasets with solid ground truth data; then we can really go ahead and find out which prompt combinations work best. In this first task that we explored, the unsupervised entity tagging, we devised a series of experiments that tested different scenarios: for example, single-shot versus multi-shot, that is, feeding few-shot examples into the prompt. We experimented with single prompts versus chain-of-thought experiments. Chain of thought refers to a technique where the LLM is prompted to provide a hypothesis, a temporary output, and then to revise that output step by step in a systematic, analytic manner to correct itself. And this yielded relatively good results according to the two benchmarking datasets that we used, but only with GPT-4; GPT-3.5 and all open-source solutions were not capable of dealing with those complex prompt structures. And the last bullet point on this slide highlights the main challenge that we see: in order to reasonably work with unsupervised LLM extraction as an alternative to straightforward supervised approaches, we really need to focus on ground truth quality. We need benchmarking datasets. And some further applications that we would like to explore, apart from unsupervised information extraction, are language modernization, which has come up quite a few times in the discussions already; language translation, so from one variant to another; band extraction, which would be really interesting; and document understanding, with a wide variety of sub-tasks, which would be cool and which we have put on our to-do list.
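The experiments described here, feeding few-shot examples into the prompt and scoring prompt variants against ground truth, follow a pattern that can be sketched as below. The chat-message format matches the common OpenAI-style API, but the tagging instruction and the strict set-match F1 are generic illustrations, not the cooperative's actual evaluation code:

```python
def few_shot_messages(examples, query):
    """Assemble a few-shot chat prompt.

    examples: list of (text, tagged_output) pairs shown to the model
    as worked examples before the actual query.
    """
    msgs = [{
        "role": "system",
        "content": ("Tag person and place entities in the text. "
                    "Answer with one entity per line as TYPE\tSURFACE."),
    }]
    for text, tagged in examples:
        msgs.append({"role": "user", "content": text})
        msgs.append({"role": "assistant", "content": tagged})
    msgs.append({"role": "user", "content": query})
    return msgs

def entity_f1(gold: set, predicted: set) -> float:
    """Strict set-match F1 of predicted (type, surface) pairs
    against a ground-truth benchmark."""
    if not gold and not predicted:
        return 1.0
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With a fixed benchmark and a score like this, competing prompt variants (zero-shot, few-shot, chain-of-thought) can be compared on equal footing instead of by eyeballing individual outputs.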
And HTR post-processing is also something that we want to explore. Yeah, we have a whole series of challenges, and probably one of the most viable ways to approach them would be to work on self-hosted open-source models that we can tune and optimize with the historic data that is available in our ground truth collections. And regarding the challenges, Gregor is going to say a few words.

Yeah, so while Gregor is moving up to the microphone: I think what we can see here is right in line with this tool chain idea, which means that things have become a little bit more complicated. As opposed to HTR, where you do more of the same until it gets better, so you just produce ground truth until you see, okay, I'm reaching the CER level that I want, here it's, besides stringing together the right tools, one of which is HTR, also about the usage of those tools: using them in the right way and tinkering with them. And I think this is a phenomenon that is typical of any new technology. At first there is no standardized way of doing things, and this poses many challenges. Gregor, please tell us a little bit about those challenges.

I want to speak about the challenges we are facing when it comes to hosting LLMs on our own hardware. As Michael already mentioned, we need to find a solution for resource-friendly implementation. One way would be, for example, to have models which are trained on very good ground truth. This would mean we could reduce the number of parameters, in the end by up to a factor of 100, which would make it easier to run the models. Another possibility in this context would be, for example, to use techniques like quantization and so on; I don't want to dive too deep into that. Then obviously we need much more in-house resources in terms of GPU power, because already now we are running at the higher end of what our hardware is able to process.
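Quantization, which Gregor mentions only in passing, trades numerical precision for memory: storing each weight in 8 bits instead of 32 shrinks a model roughly fourfold. A toy sketch of the symmetric variant, purely illustrative and far simpler than the block-wise schemes real inference engines use:

```python
def quantize_int8(weights):
    """Symmetric 8-bit quantization of a non-empty list of floats.

    Each weight becomes an integer in [-127, 127] plus one shared
    float scale, i.e. roughly 1 byte instead of 4 per weight.
    """
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # avoid scale 0 for all-zero input
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Reconstruct approximate float weights from the 8-bit form."""
    return [v * scale for v in quantized]
```

The reconstruction error per weight is bounded by half the scale, which is why models can often tolerate it with little quality loss; production schemes quantize per block of weights, sometimes down to 4 bits, to keep that scale small.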
And expertise is also needed, because this is a relatively new topic and our resources are limited. But I think this is also a possibility for us to grow, and it's very plausible that this technique will play an important role in the future, so I think this would be something to invest in. Then, when it comes to data, this is something we already do now, but it's important to keep doing it in the future: transparency when it comes to the training and fine-tuning of the models, and obviously control over the training data, which is different from how it is now. And Michael already mentioned benchmarking; there we have a lot of expert knowledge for use case validation, for prompt validation and for standardization. And then for us it's also interesting whether we can combine our existing techniques, or different kinds of expert models, with other LLMs, so that we have a workflow like Google is doing with that. And obviously one of the biggest problems is the alignment problem, because as we heard today in the first talk, in the past words had a different meaning. So this maybe also has to be considered in training and in the process, depending on what you do. And obviously the hallucinations, because we don't want to rewrite our past. Thank you.

Right, thank you. Quick question right off the bat: could you explain a little more what alignment means? Because I'm not sure everyone is familiar with the term.

Alignment means a lot of things nowadays. On the one hand there are things which are not politically correct; on the other hand there is also alignment with the intentions you have. And when it comes to language, that makes a difference. I mean, you as a linguist know that different phrasing can change a lot of the meaning. And this is a very big concept, and I think it's very difficult to train, because in the end, if I had to find an alignment on certain topics with Michael, it would perhaps be pretty much impossible.
So I have to find a model which is in alignment with every one of us, with society. I think this is one of the biggest challenges, but this is way bigger. Okay.

Yeah, so we're right back on the topic of culture. So there will probably be culture-specific LLMs, which will be used in different contexts. But let's move to the last segment, which is again a bit more interactive: let's build a survey together. Here's the QR code from the beginning again. I hope you all still have the survey, the meta-survey, open. If not, here's the QR code again, because we have a couple of questions about questions, basically, for you. Everybody ready? Yeah, I think we're going to move on. And here's the first question: areas of interest. What are the areas related to the integration of LLMs into Transkribus that you're most interested in or concerned about? These could range from technical challenges to ethical considerations. Please just type in your answers; they will come up here on screen, and you can also upvote the responses from others. And in the meantime, while we're on the topic of ethical considerations: since I was asking whether LLMs are becoming part of the team, what are the archivists saying? Do you get the feeling that they are replacing human labor? I don't know, Pauline?

No, they're complementing. They're doing the boring parts, and I think that leaves us humans the juicy bits.

Okay, yeah. I think that's... And if I may add: if there's one thing I learned, and I think I also saw it this morning and yesterday, it's that algorithms are generic. So if they find something unusual, they will label it as a mistake, but humans will know that they struck gold. And that's still a difference. And I'm really curious whether that will have changed by the next Transkribus conference.

Okay, yeah. That's a very cool detail to think about.
Because, I mean, one main issue here is who is better at what, and who's better suited to which tasks? And we are seeing the answers coming in. Very nice. Correction of HTR text, for example. Info extraction, the main topic of our conference. Structuring of data. Prompt engineering; we are going to ask you about that as well. Yeah, a lot of good stuff, and we are going to use and look at all of that later. Then a second question: specific challenges or opportunities. Are there specific challenges or opportunities you believe should be explored through the survey regarding LLM integration in Transkribus? For example, what types of problems could be addressed with LLMs? What could have a positive impact on LLM integration? So are there any things that you think may be problematic or beneficial for the integration of LLMs? I think we can overrun by a couple of minutes. The official time of this session is up, but there are three more questions, and then let's hope we can collect a lot of good feedback. I don't know, what does the audience say? Are there any challenges that you would like to say a couple of words about, or any opportunities? Where do you think, wow, this is really something that's going to make LLM integration in Transkribus a good thing? Any thoughts from the panelists? Or anything where you think this is going to be hard? The corpus which LLMs are based on, which goes in the direction of explainability, and accountability as well, transparency; I think this is a huge topic. Mentioning of sources goes in the same direction. Historical language differences: will it be able to overcome them? We've heard about that before during this conference. Data protection, which is an ethical issue as well. But let's move on. Desired outcomes or improvements: how would LLM integration in Transkribus make your life better? Which topics should we talk about in the survey when it comes to desired outcomes?
So what are you wishing for? Named entity recognition, everybody's favorite. It's something that everybody understands and that provides huge value. Efficiency, yeah, saving time. I don't know, any comments from the audience or the panelists? Feel free to raise your hand at any point if you want to comment on any of the questions. Yes, Heleen?

I think it's also really interesting, following what we saw yesterday in Gerson's presentation about how the site is developing, because we're doing this to open up our archives. And if you have the tagging, and maybe also the identification with Wikidata, you can enrich your sources and make them available and searchable online as well. So I think that's really interesting: how to make it available.

Yeah, thanks. Availability, that's one of the core topics of Transkribus, obviously, because bringing the technology to the users has been our main goal right from the start. So then, practical applications and use cases. What practical applications do you see in Transkribus, and which do you think are crucial to investigate throughout the survey? Named entity recognition; someone wants to make it really clear that we need named entity recognition. I mean, we're working on it anyway. Part-of-speech tagging, cool; this may be very interesting to our linguist crowd. Linked open data. Improving the search options; that's a very nice practical thing, because we already have an AI-based search feature, called Smart Search, which basically extends the search space to less well ranked variants of how the AI reads the words. So not just the best-ranked candidate, but all the words that it thinks could be the word at this place. But with LLMs you could obviously do a lot in terms of search as well. Language translation, that's a frequent one, I think. Scribe recognition, very interesting.
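That idea, searching over all ranked readings of a word rather than only the top candidate, can be illustrated in a few lines. The data layout here, a list of candidate readings per word position, is an assumption made for illustration, not how Smart Search is actually implemented:

```python
def smart_search(query, pages):
    """Search HTR output where each word position keeps several ranked
    readings (best candidate first), not just the single best one.

    pages maps a page id to a list of word positions, each of which is
    a list of candidate readings. A hit counts when the query matches
    ANY of a position's readings; the top reading is returned for display.
    """
    q = query.lower()
    hits = []
    for page_id, words in pages.items():
        for pos, candidates in enumerate(words):
            if any(q == c.lower() for c in candidates):
                hits.append((page_id, pos, candidates[0]))
    return hits
```

So a query for "john" would still find a word whose best reading is "Iohn", as long as "John" appears among the lower-ranked variants.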
Not sure whether LLMs will be best suited, but at least on a more abstract level, with stylometry: identifying authorship by looking at how a text is written, not necessarily at what the handwriting looks like. Very cool. So there's tons of stuff in there. And then we have ethical and societal implications. What do you think we need to ask you about when it comes to ethical matters? What broad topics are there? For example, one we've been discussing in the preparations for this roundtable: data protection. Data protection, I don't know, data protection, anyone? Yeah, that's obviously a very, very important consideration. But also, I think, job security maybe. We discussed this topic a little before the roundtable, and we've already alluded to it here today as well. I don't know, David, what do you think?

I was actually having a vague forward-looking thought about entity extraction, and at what point that will become essentially something in the background that we're not actually very interested in. Because one of the features of large language models is to produce text that lets you communicate with people: there are blog posts, there are articles, there are poems or sonnets or whatever it is. And my thought was, maybe we are focusing very heavily on entity extraction, it being the theme of the conference, but maybe at the next conference we might go: yeah, we've kind of done that now, and this is what we're actually using our historical text for. It's for explaining and helping the public understand the past, in simple, or complex but easy to understand, targeted language.

Yeah, so accessibility is an important topic, I think. Just because you can read a text through HTR doesn't mean that you can understand it. And even if it's translated into modern-day language, that doesn't mean you will understand it either. But returning to the ethical and societal implications.
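Stylometry in this abstract sense usually works on the frequencies of common function words rather than on letterforms: two texts by the same hand tend to use "the", "of", "and" and so on in similar proportions. A minimal sketch, with an arbitrary English word list standing in for whatever features a real study would choose:

```python
import math
from collections import Counter

# Illustrative feature set; real stylometric studies pick dozens to
# hundreds of function words suited to the language and period.
FUNCTION_WORDS = ["the", "of", "and", "to", "in", "that", "it", "is"]

def style_vector(text):
    """Relative frequency of each function word in the text."""
    counts = Counter(text.lower().split())
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(a, b):
    """Cosine similarity between two style vectors (1.0 = identical profile)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0
```

Candidate authorship is then a matter of comparing a disputed text's vector against vectors of texts with known authors and picking the closest profile.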
I mean, this is a societal topic as well, because making history explainable, I think that's a huge topic, especially nowadays with everything that's going on with social media and the false information that's being spread through it, increasingly well, we have to say, because those messages look ever more convincing and real. There it's very important to be able to interpret how things were before, and what led up to the situation or situations that we find ourselves in: what led to a certain political situation, and why we are at each other's throats on social media at the slightest hint of a political topic. Okay, so I think there's a lot of good stuff there.

Can I? There's another one, one more point. Because my personal biggest concern with these technologies is the second one, environmental concerns. And I haven't thought this through very well, but what I'm thinking of is: if we should see having access to technology as a fundamental right, just as having access to food, to healthcare, to housing, is it possible to collaborate, to get together, to make these technologies as efficient and as harmless as they can be? And what I wanted to say today, here at this conference, is that I think Transkribus, as a cooperative tech platform, could be an example for establishing this on a much greater scale than only within the heritage sector. But I don't know how. I would really like to think about how we can work together, as a cooperative platform, to make tech more sustainable.

Cool. Yeah, I think this is a very nice thought to round out this session. I mean, Transkribus and the Read Cooperative are all about collaboration, and our community is so diverse, it's amazing. I didn't expect to learn of so many new parts of the community, or specialties of people. And every other month or so someone new comes along: I'm in education, or I do art, for example.
That's also what I learned at this conference: that people are using Transkribus to do art, which just blew my mind. And I think this is also the future of Transkribus, so this will continue to increase even more: there are going to be more and more people from more and more diverse backgrounds. And documents are where we have stored our information for centuries, for millennia, and also until a couple of decades ago, so I think more recent information is also going to become more and more relevant for Transkribus. So, yeah, thanks a lot. Here you can sign up for the survey once it comes out. So if you want to take the survey and see how many of your thoughts have made it into the catalog of questions, then please use this QR code. Just take a photo of it; you can use it later. Yeah.

Can I ask one more thing? Sign up right away. Yeah. One minute. If we're all sharing: we, I think, are more the data users and data suppliers, and I'm happy to hear that you're experimenting with all kinds of things and have a different view. So would you be willing to share that, in blog posts for example? You have a completely different view, I think, on those things.

This is definitely something that we can do. And as you know, we like writing blog posts, so I think this is one of the next topics that you're going to see online. Thanks for this amazing input.