So, next up is Felix Velgert from the University of Bonn in Germany, and the title of the talk is Building a Topic Model for 19th Century Handwritten Sources: the Immediat-Zeitungsberichte of the Prussian Administration, a very nice long German word. So, the floor is yours. Okay, yes, thank you very much. This is joint work with Alexander Emmerkopf, who is also at Bonn University but is not here today, so all questions are on me. This talk is about building a topic model from handwritten sources, and there are some things to untangle. As you said: why the Rhine province, what is a topic model, and what are the Immediat-Zeitungsberichte?

Let's start with Prussia, which you are seeing on the right-hand side of the slide. We are concentrating on the Rhine province of the Prussian monarchy because this was a very dynamic region in the early 19th century in terms of social and economic modernization. Also in terms of state building: this region was one of the regions in Germany that adopted the French institutional rule set very early. These are all things that economic historians, and I am an economic historian, are interested in.

The Zeitungsberichte are governmental reports to the king; that is the "Immediat" part, they went directly to the king. The regional administrations sent regular reports to the king and his cabinet, and these reports contain information on weather conditions, agriculture, trade, security, police and various administrative reforms. So they are a very rich source for constitutional, social and economic as well as environmental history. They also have a time and a space dimension, so we can compare regions that are more rural or more urban, or more dynamic in terms of social and economic modernization. It is a very interesting source, but it is hardly used by historians because it is a quite large source.
For the Rhine province alone, between 1816 and 1822 there are 44 volumes in the Geheimes Staatsarchiv in Berlin with over 5,000 handwritten pages. So there are two problems here: transcribing these pages, and making sense of this large amount of textual data. This is where Transkribus comes in for transcribing the data, and where topic modeling comes in for making sense of the information. What a topic model basically does: it is an unsupervised machine learning tool that allows you to assign topics, meaningful labels, to a large collection of documents. So it is a form of distant reading based on mathematical analytical methods.

What does our workflow look like? First we have the Transkribus workflow, the classic workflow I would say: we digitized our documents via the DocScan app and the ScanTent, uploaded them to the platform, did a layout analysis and a transcription, also trained our own model, and then tagged sections and subsections. The thing is, for a topic model you need documents below the logical structure of each report, so we did this tagging to extract the subsections and produce our documents. We used these tags to export TXT files, and for this use case we commissioned a new export function that lets you split TXT files via tags, with a start and an end tag. It is a very useful tool, I think; maybe you want to try it out as well.

Then we have the topic model workflow, which is still work in progress. Here we import our TXT files into a pandas DataFrame or some other Python data structure, pre-process, lemmatize and clean the data, and then train the topic model. The key here is that we are trying out different models, such as classic LDA algorithms but also HDP algorithms that allow us to develop correlated topic models. You may also develop a dynamic topic model.
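The tag-based export step described above can be sketched in plain Python. This is a minimal sketch, not the actual commissioned export function, and the tag strings are hypothetical stand-ins for whatever markers the real export emits:

```python
import re

# Hypothetical start/end markers; the actual tag strings used in the
# Transkribus TXT export are assumptions for illustration.
START, END = "<section>", "</section>"

def split_by_tags(text: str, start: str = START, end: str = END) -> list[str]:
    """Split an exported TXT file into one document per tagged subsection."""
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    return [m.strip() for m in re.findall(pattern, text, flags=re.DOTALL)]

report = "<section>Witterung und Ernte ...</section>\n<section>Handel und Gewerbe ...</section>"
docs = split_by_tags(report)
# one document per tagged subsection, ready for the topic-model pipeline
```

Each resulting string then becomes one "document" for the topic model, matching the subsection level of the reports.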
What topic models do is spit out lists of words, and that is where the historian comes in: we have to interpret these word lists and give them a meaningful interpretation. Then we can assign these topics back to the documents in our collection, or to new documents, and make sense of the meaning in those documents as well. We are basically working with Python and several Python libraries, Gensim, NLTK and Tomotopy, and we are implementing this in Jupyter notebooks; it is also planned to bring this onto a JupyterHub platform.

This is a picture of how the results can look. On the left side of the screen you see the topics in a two-dimensional space, which basically gives you some information on how the topics are correlated. On the right-hand side is a word list; the red bars give you the frequencies of the most common terms within the topic, and the blue bars are the frequencies of these words in the overall corpus. In this case the top words are rheumatic, fever, gastric and so on, so this is an epidemics topic. And it is indeed assigned to a section called Krankheiten und Viehseuchen (diseases and cattle epidemics) and so on. So at this stage we can say our model works, and that is a good thing. We can do this for all the categories the administration had decided on, and when we calculate our topics we mostly assign the correct topic to these subsections.

So what are the takeaways, what is the power of this model, and where do we want to go from here? First of all, this model may facilitate the heuristics of the sources: it helps historians to deal with large amounts of textual data. We can do diachronic and synchronic comparisons, even beyond the Rhine province and beyond the period 1816-1822. We can also address some methodological questions: for which sources are topic models needed, or are they needed at all in historical research, which topic models work best in these cases, and so on.
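The last step, assigning topics back to documents, can be illustrated with a toy sketch. The real pipeline infers topic distributions with Gensim or Tomotopy; here each topic is reduced to its interpreted top-word list, and a document simply gets the topic whose words overlap with it most. The topic names and word lists below are invented for illustration:

```python
from collections import Counter

# Toy stand-in for the model's inference step. In the actual workflow the
# trained model produces topic probabilities; here a topic is just its
# top-word list, as interpreted and labeled by the historian.
topics = {
    "epidemics":   ["fieber", "gastrisch", "rheumatisch", "krankheit"],
    "agriculture": ["ernte", "getreide", "witterung", "feld"],
}

def assign_topic(document: str, topics: dict[str, list[str]]) -> str:
    """Assign the topic whose top words occur most often in the document."""
    tokens = Counter(document.lower().split())
    scores = {name: sum(tokens[w] for w in words) for name, words in topics.items()}
    return max(scores, key=scores.get)

assign_topic("rheumatisch und gastrisch fieber im kreis", topics)
# -> "epidemics"
```

This keyword-overlap scoring is of course far cruder than LDA/HDP inference, but it shows the shape of the step: interpreted word lists on one side, documents on the other, and an assignment between them.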
What we are also planning, and what I want to stress in this last bullet point: we want to build a platform that students and other researchers can use, where the underlying model sits some levels below the surface, so that we facilitate the use of these models within the community. Thank you very much.

Thank you very much for this interesting insight into an NLP topic, in which many in our community are very interested, because this is the actual information that you want in the end; HTR is a sort of step in between digitizing the document and getting interesting information out at the other end. So, are there any questions? Oh yes, please, Milan, just let me give you the microphone since you are close.

Thank you for the presentation. I was wondering how this workflow that you presented relates to your analytical process: what kind of questions to ask, how does this in practice relate to your interpretation of your results and finding new answers and conclusions?

You are alluding to the topic model workflow? Yes. So basically, first, it is a heuristic tool to better understand your source material. But it is an iterative process and we are within the hermeneutic cycle, so we have to read the documents as well; the topic model can hint at which parts of this huge mass of documents we should read first.

So do the topic models inform the formulation of your research questions, or do you first think about research questions and then try to answer them using the topic models?

We can do both. In this case we are coming from the topic models; as I said, these methodological aspects are also a priority on the research agenda, because topic models in historical research are mainly applied to newspapers. So one of the questions here is: can we transfer these types of models to other sources?
And how can we profit from them? So that is the priority research question in this particular case, but from a more general view, I think the power of these models lies first in this initial step of heuristics, as a tool to formulate research questions, but it can also go the other way around.

Okay, then we have another question. Why topic modeling, and not a restricted vocabulary, so that you are sure the computer does not pop up words that are not comparable to other words? Your list actually resembles quite a lot my research vocabulary on police ordinances for the 17th and 18th century. You can train a model, for instance with Annif, and then apply it to your sources, but then you are sure the computer will not pop up words that you would not be able to compare to other sources.

So it is always the question about the stop words you are putting into these models, basically, isn't it? With a restricted vocabulary the model can only choose from the list that you provide, so that is the list of topics; no stop words are in there. In this case, the advantage of a topic model is that you are not influencing the learning process from the start. I do not have to say "these are the keywords I am searching for"; I can let the computer do the learning and then react and say, okay, this is that topic, these are those issues, and so on. You may be surprised by new results, by things you did not think about while making the list. Yeah, it depends on what you want to do with your research design, and you have to tailor it to the purpose of your research.

Okay, we have another question here. You showed us these categories.
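On the stop-word point raised in this exchange: before training, function words are filtered out so they do not dominate the learned topics. A minimal sketch, assuming a tiny illustrative German stop-word list (the actual pipeline would use a full list, for instance from NLTK):

```python
# Tiny illustrative German stop-word sample, not the full list used in
# the actual preprocessing step.
STOPWORDS = {"der", "die", "das", "und", "in", "im", "zu", "von"}

def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize on whitespace, drop stop words and non-alphabetic tokens."""
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOPWORDS and t.isalpha()]

preprocess("Die Ernte in der Rheinprovinz und der Handel")
# -> ["ernte", "rheinprovinz", "handel"]
```

Unlike a restricted vocabulary, this only removes known noise; every remaining word stays available for the model to surface, which is exactly where the unexpected results can come from.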
And when did you link these categories with your data, within Transkribus or afterwards, and if afterwards, how?

These sections, it was on one of the last slides, yes, this is the topic model. On the right-hand side, these categories of the Zeitungsberichte are the structured marginalia in the reports. We use them to cut snippets of text, and these then become our documents. Then we throw these documents into the model, and the model does not know what the label is. What we get back is a list of words; I then assign these lists of words to topics, and I can put the whole thing back together.

But these categories, you tagged them within Transkribus? Yes, these are Transkribus tags. That is what I wanted to know.

Did you want to ask another question? Okay, any other questions? If there aren't any, then I think we have made really very good time. Thanks a lot, everybody. Thank you to you as well as our last speaker today, and obviously you are getting a mug too. Thank you, everybody. This is the last session for today here at the university building; the conference dinner will be at eight at the Bierstindl, which is close to here in Innsbruck. You can find all the information on the website; there is a QR code and a link on the back of your name tag, if you have not seen it yet, and on the cheat sheet you can find where you need to go. And now you have some time to go back to your hotel, relax a bit and freshen up for the dinner, and I hope to see many of you there later on. Thanks a lot. Thank you.