Hello everyone, my name is Patricia Martín Rodilla. I am now a postdoctoral researcher at CiTIUS, a research centre in information technologies at the University of Santiago de Compostela, but this work was done together with my colleague David Barreiro from Incipit, my previous organization. The main idea of this work is to ask whether current technologies can help us support the kind of critical reading studies that are very common right now, especially of heritage-legal discourse, of heritage-legal texts. So the idea was to use natural language processing and discourse analysis techniques to see what happens with these critical reading ideas.

First I am going to talk a little about the legal context of heritage here in Spain, which is the main domain we analyzed, and about the ideas that emerged from the critical reading that my colleagues at Incipit carried out on these heritage-legal texts, which gave us the working hypotheses that were the starting point for this analysis. Then I will talk about what natural language processing techniques are and how we can use them, and finally I will present our methodology, some first results, and conclusions.

The legal context of heritage in Spain is a bit of a patchwork. We have laws at the national level, of course, but most of the competencies have been transferred to the regional governments, so we also have regional laws in each region. In this case we analyzed Galicia, because we live and work there and because its law changed recently: we have the old law from 1995 and the current law from 2016, plus laws from other regional governments that we compare with. The competencies and the scope are quite similar between the two laws, and also with respect to the other regional ones, but the length of the text, which matters a lot for automatic analysis, is quite different. Not so much in the number of legal articles or in the matters being legislated, but clearly in overall length and discourse.

My colleagues from Incipit published an article last year, 'The new cultural heritage law of Galicia: a critical reading' (Barreiro and Varela-Pousa). They compared the new law with the previous cultural heritage law and with other regional laws in Spain from a critical perspective, trying to identify hypotheses about the traits of change taking place in the administration and, more importantly, to contextualize them as part of the neoliberal paradigm, making an effort to criticize some processes of legislation in Galicia. So we said: this is great, but is it possible to extract these hypotheses and see what really happens in the discourse, in the text, with technology? That is the main idea of the work.

We extracted four working hypotheses from our colleagues' reading. The first one is the ontological expansion of the heritage concept: the ideas in the previous law about what is heritage and what is not are very different from the new law, and in the new law there are more things that could potentially be heritage for us. The second hypothesis is that the new law is more space-based.
The new law presents a lot of this space-based heritage management, so possible evidence in the discourse could be many space-related terms: land uses, GIS, landscape references, and that kind of thing. The third hypothesis is about the amplification of the symbolic and identity dimensions: motivations about heritage, about what it is and what it is not, related to our identity, to nationalisms, and so on. And finally, the amplification of the economic dimension is the fourth working hypothesis: maybe heritage is more and more seen as a resource, as an economic resource, in the new law compared with the previous one.

So we said, let's do this. Can we detect these trends in the discourse? In what way are they reflected in the real legal text? And can current discourse analysis technologies help to support this kind of critical reading, so that it is not only the opinion of our experts but is also backed by technology? We tried to use natural language processing and discourse analysis techniques to do it.

Natural language processing is the application of computational techniques to the analysis and synthesis of natural language. It has been applied in many domains, but the legal domain is a classic one because it is very narrative-based and legal texts are written with a formal, controlled vocabulary, so they are well suited to this kind of application. We can carry out tasks automatically or semi-automatically, such as semantic and syntactic parsing, lemmatization, text segmentation, classification, summarization, and language identification, and these tasks are at a good technological moment to be applied (I will show a small sketch of two of them shortly). And then we have discourse analysis, which works at a more abstract level than the semantic and syntactic ones: the idea is to obtain from the text, for instance, causal relations, contrast relations, generalizations, and so on, which could be very useful for analyzing these critical reading ideas in the discourse, in the real text.

We defined our methodology in three phases. In phase one we prepare the texts, which is very important in this kind of work. We have the legal textual sources in Spanish and Galician, which matters because they are not in English; we will come back to that later. We asked our experts in the critical reading to annotate the texts manually with all the terminology, expressions, and so on that they think could be interesting for each working hypothesis that I described before. We also selected the algorithms that we were going to use and develop for the analysis.

Phase two is the automatic part. We planned to apply three models. One is the natural language processing and corpus study: the idea is to study the relevance, in the real text, of the terminology that our experts found interesting for each hypothesis. In the discourse analysis model, the idea is to extract causal, contrast, generalization, and exemplification relations, at a more structural level of the discourse. And finally the agent extraction model, a model to extract automatically who is involved in the text: persons, organizations, and other categories of entities that could be present in the laws. And in phase three we come back to our experts and ask: is this hypothesis important in the discourse or not?
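As a minimal illustration of the preprocessing tasks mentioned above, here is a sketch of tokenization and lemmatization. It uses spaCy's Spanish model purely as an accessible stand-in, an assumption on my part; our actual pipeline, described next, used a different tool:

```python
# A minimal sketch of two of the NLP tasks mentioned above: tokenization
# and lemmatization. spaCy's Spanish model is a hypothetical stand-in here.
# Requires: pip install spacy && python -m spacy download es_core_news_sm
import spacy

nlp = spacy.load("es_core_news_sm")
doc = nlp("Los bienes culturales serán protegidos por la administración.")

for token in doc:
    # surface form vs. lemma (dictionary form) vs. part of speech
    print(f"{token.text:15} {token.lemma_:15} {token.pos_}")
```

Lemmatization is what later lets us count not just an annotated term but its whole word family.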
So, the real work we have done. This is one of the texts, in this case in Spanish, but we have annotations for both Spanish and Galician. The experts annotated all the texts, the real texts, with all the terms they thought could be interesting for each hypothesis; across the entire methodology we work driven by these initial working hypotheses. We also selected the algorithms for analyzing these texts. In this case we used Linguakit, a natural language processing suite from the University of Santiago de Compostela. There are others, even more standard ones, for instance NLTK, or Stanford CoreNLP, which is very good and well tested. But the reason to use Linguakit is, first, that the linguists behind it work with us, so we can go and ask them about linguistic issues and get feedback immediately. Also, Linguakit is completely open source: you can go to GitHub, download all the code, contribute a new module if you want, and so on. And Linguakit has multilingual support: Spanish, Galician, Portuguese, and English, with all the tasks (tokenization, lemmatization, et cetera) available in the four languages. So it is quite a good fit. And this is one XML source: we have the text and we have the annotations of our experts, driven by the hypotheses.

In the second phase, the first module was the natural language processing one. We have our hypotheses and all the annotated terms, and for each term we first did a frequency study: we counted the occurrences of the term in both laws, at token level, meaning only the exact term our experts annotated, and also at lemma level, meaning not only the term but all the terms of the same family that appear in the text. But we realized that this approach could be a little naive, because we have two texts of very different lengths, so if we just count occurrences, the comparison is not meaningful at all. Our idea was to find a metric based on the relevance of the term rather than on the number of occurrences in the text. Finally we found TF-IDF (term frequency-inverse document frequency), a metric from information retrieval and text mining: a numerical statistic that aims to reflect how important a word is in a document. This is interesting for us: for each term annotated by the experts, we can see how relevant it is in the 1995 law and in the new law (I sketch this computation below).

Then we have the discourse analysis model. The initial idea was to apply Rhetorical Structure Theory (RST), a theory for analyzing the text in order to extract, for instance, causal relations, exemplification, attribution, and contrast, and to see the reflection of our critical reading hypotheses in the discourse. But we have a problem with this model: automatic RST parsers are at the moment only available for English, while our textual resources are in Spanish and Galician and are very specific legal heritage texts that are difficult to translate with good quality. So we are now in contact with some linguistics groups that are working on automatic RST parsers for Spanish.
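Here is a rough sketch of that TF-IDF relevance comparison, assuming plain-text versions of the two laws and a hypothetical list of expert-annotated terms:

```python
# Sketch: score expert-annotated terms in the 1995 and 2016 laws with TF-IDF,
# so relevance does not depend on the very different text lengths.
# File names and the term list are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [open("lei_1995.txt").read(), open("lei_2016.txt").read()]
terms = ["paisaxe", "memoria", "urbanismo", "riqueza"]  # from expert annotation

vectorizer = TfidfVectorizer(vocabulary=terms, lowercase=True)
scores = vectorizer.fit_transform(docs).toarray()

for i, term in enumerate(vectorizer.get_feature_names_out()):
    print(f"{term:12} 1995={scores[0][i]:.3f}  2016={scores[1][i]:.3f}")
```

With only two documents the inverse-document-frequency part contributes little; the main effect is that scores are normalized within each document, which makes the two laws comparable despite their different lengths, exactly what raw counts were missing.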
We hope to work together with them so that we can extract these causal and exemplification relations in Spanish for our legislative texts. And finally we have the agent extraction module. Here we ask who the decision makers are, and who holds heritage-legal competencies, use rights, or exploitation rights in our texts. We developed an ad hoc algorithm based on the named entity recognition of Linguakit. The idea was to extract not only the persons and organizations that appear in the text, but also the roles of people and their job positions: for instance, 'the president of the Xunta de Galicia', structures of that kind, which could be interesting for answering who the decision makers are (a rough sketch of this idea appears after the results below).

So, first results. We organized the results by hypothesis. Our first hypothesis, remember, is the ontological expansion of the heritage concept. In the first column we have terms that are especially relevant in the first law; in the second column, terms especially relevant in the second law; and in the third column, terms relevant in both laws, though of course with a higher score in one than in the other. For this hypothesis we can see that many related terms are relevant in the second law. We also observe that the level of abstraction of the terms in the second law, for instance 'memory', 'nature', 'relevance', or 'use', is higher than the level of abstraction of the terminology in the previous law, such as 'museum', 'monument', and so on. So maybe this ontological expansion of the heritage concept is supported by this more abstract terminology.

Regarding agency detection, the decision makers, we are not sure about the changes. We extracted many decision makers that are relevant in both documents, but we observed that local institutions are more relevant in the second law. So maybe there are not more agents in the second law, but there are more local agents: local institutions seem more involved in the current law than in the previous one. These allusions to Galicia and to territorial institutions could be interesting to analyze further.

The second hypothesis relates to space-based heritage management, and for us it is quite clear that all these terms, 'urbanism', 'urban planning', 'topography', 'geographic', et cetera, are more relevant in the second law than in the previous one. So this is a hypothesis whose support in the discourse is quite clear.

The third one is not so clear for us, because the allusions to Galicia, Galician, authenticity, identity, and other identity terms are not more relevant in the second law than in the previous one. For us, this is the weakest hypothesis of the critical reading in this sense.

And finally, the amplification of the economic dimension is also quite clear: terms like 'prosperity', 'wealth', 'employment', 'sponsorship', 'patronage', and 'sustainability' are present with a high relevance score in the current law and not in the previous one. So this is more or less the final picture: three hypotheses more or less supported, and one not so clear.
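As promised, here is a rough sketch of the agent extraction idea. Again spaCy's Spanish NER stands in for Linguakit's, and the role pattern is a deliberately naive placeholder for our ad hoc rules:

```python
# Sketch: extract candidate "agents" (persons, organizations, role phrases)
# from a law text. spaCy's NER is a stand-in for Linguakit's; the role
# pattern and file name are hypothetical simplifications.
import re
import spacy

nlp = spacy.load("es_core_news_sm")
text = open("lei_2016.txt").read()
doc = nlp(text)

persons = {ent.text for ent in doc.ents if ent.label_ == "PER"}
organizations = {ent.text for ent in doc.ents if ent.label_ == "ORG"}

# Very naive role pattern: "el/la <office> de (la/el) <Institution>"
role_pattern = r"(?:el|la)\s+\w+\s+de(?:\s+la|\s+el)?\s+[A-ZÁÉÍÓÚ]\w+"
roles = set(re.findall(role_pattern, text))

print("Persons:", sorted(persons))
print("Organizations:", sorted(organizations))
print("Role phrases:", sorted(roles))
```

In the real module the role patterns would need to cover Galician contractions ('o presidente da Xunta de Galicia') and many more office names; the point is only to show how entity labels and surface patterns combine to answer "who decides".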
As a conclusion, the critical reading identifies, in our opinion, the trends of change in cultural heritage legislation in an accurate way, and these changes are clearly reflected in the discourse: with technology we can go and see them in the text. These results show the value of this semi-automatic analysis as a suitable basis for critical reading. We envision this methodology always as a supervised process, in direct contact with the authors of the texts, not as a fully automatic analysis. And of course it is completely source-dependent in terms of language: we could do it because we have tools for Spanish, and where we do not have them, we cannot. Also, the legal domain is a very good fit; for other kinds of domains it may not be so clear.

As future work, we plan to apply the discourse analysis (RST) in Spanish, to compare against a general-usage language corpus in order to see whether a trend of change is specific to these laws or related to changes in general language use, and to apply semantic polarity analysis. The idea is to move towards a well-defined methodology for using this discourse analysis in critical reading support. Thank you very much.