I have a PhD in linguistics, and I am co-founder of Mujeres Tech, an initiative to change the ratio of female presence in the digital sector. In fact, seven years ago I was here at Big Data Spain to see how I could monetize my PhD in linguistics, in applied linguistics. And it was striking to me, for two reasons. First, there were very few women here at Big Data Spain; now there are more women in IIC, a little more diversity of identity and gender. And second, there was no diversity of knowledge either, because I didn't see a single talk or keynote about unstructured data focused on linguistics. That was pretty shocking to me. So now it's great to be here, because I'm going to show you that we linguists are now a very, very key profile in a lot of artificial intelligence companies.

I suppose you work with data, so you know that, according to Gartner, 80% of companies' data is unstructured. You know perfectly well what unstructured means. And the nature of most of that unstructured data is linguistic. Now, think about the types of artificial intelligence. As Russell and Norvig describe it, there are systems that think like humans and systems that act like humans, and computational linguists code for the human-to-machine side; the machine-to-machine side I am not going to focus on in this keynote. So the development teams on the human side need humanistic profiles: anthropologists, psychologists, linguists.

When we talk about linguistics, this is not a binary system; it is a super, super complex system. At the core of linguistics are the sounds: everything starts with physics, with sound. Then structures and sense: words. And I use all of these things for a reason, for a communicative end. And if you look, there are a lot of areas of knowledge where linguistics has to be used.
A lot of fields are about linguistics, and in my keynote I am going to focus on these two. But as you can see, we linguists work in forensics too; we can work with the police, for example. Now, when we work in artificial intelligence... I don't know if you know Nuria Oliver. I recommend you read her speech, "Inteligencia Artificial: Ficción, Realidad y Sueños" ("Artificial Intelligence: Fiction, Reality and Dreams"). It is an amazing speech, because she invites us to bring together scientific knowledge and humanistic knowledge, and to share all of that knowledge. She talks about what artificial intelligence is versus human intelligence, and if you read all the bullets, every one of them is linked to linguistics and semantics: natural language processing, how to express doubt, how to express inference. There lies the challenge. And she talks about semantics. This is the core of it, as anyone who works with artificial intelligence knows, and anyone who works with people knows too, because it's hard. We can get semantics, we can get insights, from sound. If someone calls me at a call center and says something, that expression carries meaning at every linguistic level: the sound, the word, the sentence, the paragraph, and also context and knowledge of the world, cultural knowledge. I can link an utterance to a cultural understanding. And what about the principal challenge, figurative language? What about metaphor, metonymy, onomatopoeia, personification, horrible jokes, all those things that are not meant literally? It's hard even for some people to get figurative language. How can we teach a machine to understand it? For that, we have to understand how language works, how we think, how we create, how we grasp the world, and how we analyze the world through concepts. The first linguist who talked about this was Ferdinand de Saussure, not Chomsky.
Chomsky builds on a lot of theses from Ferdinand de Saussure. Saussure started to talk about the concept, the signified, and the sound image, the signifier: the representation. So there is the image, the sound, and the meaning, and sound is everywhere; everything starts with sound. Sound is what people working in call centers deal with now: there are very, very many insights in the voice. And making a little stop here, I want to talk about bias in voice. If you look at this research, you can see there is bias in voice: a preference for leaders with masculine voices. In this study, several people were asked about women's voices, and they said that leadership voices are linked to masculine tones. So, let's think about it. And context: how can a machine build context, see context, understand context? And nonverbal expression: how do we use GIFs? GIFs substitute for our nonverbal expression, and it's very interesting that we can understand GIFs and give meaning to them. And pragmatics: we live in a society and we work with rules. In pragmatics, in the use of language, several principles and maxims are at work, for example politeness and cooperation. In the GIF behind me there is no cooperation at all. But in conversation there are several maxims: the maxim of quantity, the maxim of quality, the maxim of manner, different maxims that I invite you to go and read about. Pragmatics is how I interact with the world, how I interact with people. And we know this. From the beginning until now, I have tried to show you how complex our code is, our human code, which is the key difference between us and a lot of other species.
In computational linguistics and natural language processing, we can take any speech or any text: using software such as OCR or ASR, different kinds of software, I can take any kind of data, image, sound, whatever, and get the text out of it. I use the text to understand it at different linguistic levels, and I can also generate another text. That is the big challenge. And there are companies already doing it: for example, Narrativa is producing 50,000 news articles per day using artificial intelligence, for outlets like 20 minutos. So now the challenge is to get a machine to talk, speak, understand, and make me feel like I am with a human. As in "Her": if you haven't seen this movie, I recommend it, because he falls in love with a machine, and when he realizes it is a machine, he feels so bad, so sad. But that is the principal challenge. And how do we work with this complex system? In companies like Taiger, focused on semantic technologies, we use what we call symbolic artificial intelligence. We try to define and produce models focused on this symbolic approach, on the symbols behind language at every linguistic level, focused on semantics. This is knowledge engineering: how we manage knowledge, knowledge that comes from every kind of data, from call centers, emails, or legal documents, wherever. And our acquisition of reality, like our acquisition of language, happens through association; in fact, our memory works by linking concepts. That is why, working with semantics, it is easier for a machine to understand all this unstructured data. So, as I told you, we work with different kinds of data sources, and we don't care about the nature of the data, because we use other software to convert that data into text. And once we have the text, we start to build ontologies.
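The pipeline just described (any input converted to text, then analyzed at different linguistic levels) can be sketched in a few lines. This is a minimal illustration, not Taiger's actual software; all function names and the placeholder converters are assumptions.

```python
# Minimal sketch of the pipeline: any input is first converted to text
# (OCR for images, ASR for audio), then analyzed at two linguistic levels.
# All names here are illustrative, not a real product API.

def convert_to_text(source: bytes, kind: str) -> str:
    """Dispatch to a converter by input kind; real OCR/ASR engines would go here."""
    converters = {
        "image": lambda b: "(text recovered by OCR)",   # placeholder for an OCR engine
        "audio": lambda b: "(text recovered by ASR)",   # placeholder for an ASR engine
        "text":  lambda b: b.decode("utf-8"),
    }
    return converters[kind](source)

def analyze(text: str) -> dict:
    """Toy analysis at two levels: lexical (tokens) and sentential (sentences)."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return {"tokens": text.split(), "sentences": sentences}

analysis = analyze(convert_to_text(b"The bank approved the loan. The client signed.", "text"))
print(len(analysis["sentences"]))  # 2
```

A real system would swap the placeholder converters for actual OCR/ASR engines and the toy `analyze` for morphological, syntactic, and semantic analysis, but the shape of the flow is the same.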
I know that you know ontologies, but I'm going to stop here for a moment. With semantic technologies, I can use open data, a lot of kinds of data. We humans are living in the moment of human history in which we produce the most data; any speaker with a smartphone is producing a lot of data. And there is a lot of open data out there that is linked, from which I can build a knowledge graph. These knowledge graphs are very useful, and the data is free, but we have to produce the software to connect all of it. As I told you, there are different steps, different ways to analyze reality and analyze semantics. First, vocabularies: word lists. OK, a list is useful; in fact, artificial intelligence started its development in the machine translation area. The following step is tags: I start to put these words into groups, for lexical-semantic reasons, by semantic field, or for any other reason. Another step up is taxonomies: groups that mean something and are linked to some piece of reality. Then thesauri: these groups are connected, linked like a network of words, based on semantics or on morphology, on the form or the base of the word, for whatever reason they are linked. And ontologies are smart thesauri. For example, if we are working on a banking solution and we talk about the concept "mortgage", the machine will suddenly propose a lot of concepts linked to "mortgage": credit, loan, and so on. That is where we work. Take "bird", for example: they fly, they lay eggs, they have beaks. It is not only a network of related words, as we saw with thesauri; the concept is linked to a reality, and that reality is linked to a lot of other concepts. And it's very useful for marketing; in marketing they have been using this for a while.
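The difference between a thesaurus and an ontology described above can be made concrete: a thesaurus only links words, while an ontology types the links. Here is a toy sketch; the relation names and the tiny banking vocabulary are illustrative assumptions, not any real ontology.

```python
# Toy ontology as a set of (subject, relation, object) triples.
# Unlike a plain thesaurus, each link carries a typed relation.

ontology = {
    ("mortgage", "is_a", "loan"),
    ("mortgage", "secured_by", "real_estate"),
    ("loan", "is_a", "credit"),
    ("credit", "offered_by", "bank"),
}

def related(concept: str) -> dict:
    """Return every typed relation in which a concept appears as subject."""
    out = {}
    for subj, rel, obj in ontology:
        if subj == concept:
            out.setdefault(rel, []).append(obj)
    return out

print(sorted(related("mortgage")))  # ['is_a', 'secured_by']
```

So asking about "mortgage" does not just return neighboring words; it returns *why* each neighbor is linked, which is what lets a machine reason over the concept.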
In fact, this slide is from my PhD. It shows the conceptual brand map of McDonald's. If you look at McDonald's: families, social involvement... it is not only linked to products, as in "McDonald's, hamburgers". No, the brand is linked to a lot of concepts, emotional ones, aspirational ones; that is what McDonald's is linked to. So how do we get there? At Taiger we work by combining the two approaches, including the symbolic one, because we have to understand the data: we extract it using neural networks, categorize and extract the entities, and see the grammar inside the entities, inside the text we are extracting or working with. And we start to add value here, because we have our own ontologies, our own knowledge graphs. In fact, we are focused on banking, finance, and legal, because we have software with global, general ontologies, and for each business case we develop a specific ontology, knowledge graph, and rules; we have our own algorithms, our own artificial intelligence focused on that. So we work with ontologies and natural language understanding, because we have natural-language profiles, such as computational linguists and natural language engineers, focused on understanding grammars, plus machine learning for what I told you. And we don't care about the raw data, because we use software for the data transformation; our value is in the symbolic natural language rules. We combine these with coreference detection, lexical parsers, different kinds of software that help us categorize and see the grammar behind the data. We also extract the data and make sure of data quality, because we created our own cleanser for OCR.
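The hybrid idea described here (a statistical extractor finds entities, then a hand-built symbolic layer enriches them) can be sketched minimally. The pattern matcher below is a naive stand-in for a neural NER model, and the ontology content is a made-up assumption.

```python
# Hybrid sketch: a pattern-based extractor (standing in for a neural NER
# model) detects entities, then a symbolic ontology enriches each one.
import re

ONTOLOGY = {
    "mortgage": ["loan", "credit"],
    "power of attorney": ["apoderado", "notary"],
}

def extract_entities(text: str) -> list:
    """Naive extractor: flag any ontology concept mentioned in the text."""
    return [term for term in ONTOLOGY if re.search(term, text, re.IGNORECASE)]

def enrich(entities: list) -> dict:
    """Symbolic layer: attach ontology neighbours to each detected entity."""
    return {e: ONTOLOGY[e] for e in entities}

doc = "The client signed a Power of Attorney to request a mortgage."
print(enrich(extract_entities(doc)))
```

In a production system the extractor would be a trained model and the ontology a curated domain graph, but the division of labor (statistical detection, symbolic enrichment) is the point being illustrated.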
We are developing a software cleanser for OCR. And we consolidate the data. In fact, I'm going to show you an example from Banco Santander of how we match data, for example against the Registro Nacional or the Registro Mercantil: we make the match and consolidate the data. Also, if the user uploads an ID and then another legal document, we match the different data points across them. And we normalize the data. What does that mean? There are many ways to express the same data. Take an invoice: sometimes the amount is written out in words, sometimes in numbers. So the client asks us, "OK, I want this information extracted only in this format." The machine understands that it is an amount and extracts it as a number. Here is the Banco Santander example: digital onboarding. The user uploads the deed of incorporation, the power of attorney, and the company ID, and we process that unstructured data. Take the power of attorney: it is 10 pages long, and in two minutes and a half we read it, understand it, and extract the information. We can extract the "apoderado", the person who has the power to open the account, and say OK or not OK. Then, through an API, we send this information to the client's systems. In two minutes and 30 seconds, the machine reads, understands, and extracts the 150 data points the client asked for in this process. So when a client tells me, "OK, I have money to start working on artificial intelligence and automation" (well, there are not too many such cases), I say: OK, let's first think about whether you need it, whether it makes sense.
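The invoice-amount normalization just mentioned can be illustrated with a toy converter: whether the amount arrives in words or in digits, the output is always a number. The tiny word lexicon and the function names are assumptions for the sketch, not the real system.

```python
# Toy amount normalizer: amounts written in words or digits both become a float.
# The word lexicon is deliberately tiny; a real system would be far larger.
import re

WORDS = {"one": 1, "two": 2, "three": 3, "hundred": 100, "thousand": 1000}

def words_to_number(phrase: str) -> int:
    """Handle simple patterns like 'two thousand three hundred'."""
    total, current = 0, 0
    for w in phrase.lower().split():
        if w not in WORDS:
            continue  # skip words outside the lexicon, e.g. 'euros'
        v = WORDS[w]
        if v >= 100:                      # multiplier word: scale what we have
            total += max(current, 1) * v
            current = 0
        else:
            current += v
    return total + current

def normalize_amount(field: str) -> float:
    m = re.search(r"\d[\d.,]*", field)
    if m:  # digits present: strip thousands separators and parse
        return float(m.group().replace(",", ""))
    return float(words_to_number(field))

print(normalize_amount("two thousand three hundred euros"))  # 2300.0
print(normalize_amount("Amount: 2,300.00"))                  # 2300.0
```

Both spellings of the same amount collapse to one canonical value, which is exactly what lets downstream systems consume the extracted data point uniformly.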
Does it make sense? You have to ask yourself: do you have a team working on the same repetitive tasks, boring tasks, reading the same documents, piles and piles of documents, above all legal or administrative documents? Do you want to save costs, avoid human mistakes, make the process scalable, and digitalize it? Then you can automate that process using natural language processing. For us it is easy to train the machine: we need only three months, because we already have ontologies, knowledge graphs, and teams working on multilingual solutions, and if we need to develop our solutions in a new language, for example Korean, we put together a team of natural language engineers and computational linguists. So we work on a lot of these issues, and that is the reason we are now focused on automation: if a machine can do it, why spend so much of your time on it? And this is the moment when a lot of people ask me, "OK, you are going to automate a lot of work; what happens to the workers?" And I say: the real talent of those workers can be focused on other things that bring more value to the company, like customer service, strategy, creativity, managing teams, all the things we don't have time for if we have to read piles of papers. So now it's time to use your people skills, and let artificial intelligence, and the people who work with natural language processing, help you automate, help you read faster and more easily, and make your life easier. So thank you very much, and have a nice day.

Thank you, Christina. So now we have time for questions. Are there any questions in the audience? Yeah, there's one over here.
We'll bring a microphone around to you.

Hi. I would like to ask, how far did you get with idioms, and how pure does your input have to be, given that people make mistakes? There are dyslexic people, and also people who have speech problems. So do you have some mechanism to clean the data before it even gets into your system? I'm sorry, I'm not a native English speaker.

OK, no worries. Thank you for asking that, because yes, of course, we work with linguistic profiles, and we linguists know there are things like what you say, dyslexia, or what we call diatopic varieties of language: Spanish from Spain is not the same as Spanish from Mexico, and we treat those as different projects. But in the case of your question: we work with the communicative end and with the grammar. So we link these items, these different expressions, right and wrong variants, and the machine, through the cleanser and the other algorithms we use, proposes the right one, the normative use of the language. So yes, we work with that.

So you have different models for every idiom? Did I get it right?

Yes, we have different teams for every language. As I told you, for example, we closed a Series B last July, and one of the investors is from Korea. He asked us to develop our solutions in Korean, and we developed them in one month, because we put together a team with deep knowledge of that language. And our work is very, very supervised: there are linguists, computational linguists, training the machine very closely and supervising the whole process. That is the reason we get 80% accuracy on unstructured data.
Now with Santander we are getting 97% on unstructured data, because in the training phase we have a team working very, very closely, supervising the whole process and all the entities, training the machine very closely.

Yes, the numbers are very impressive; I'm a linguist myself, so I know the mathematics. But the most important part of my question: do you recycle 80% of the model from one language to another, or are the models totally independent? Do you have, say, a network for the Spanish language?

No, we... I'm sorry, excuse me, go ahead.

So you have a standard Spanish, one model for that, and then some dialect, dialect is the better word. Do you expand that base model for each dialect, or do you make separate models for each dialect?

Yes, this is what I told you before: we treat every dialect as a different project, because training the machine on legal documents from Mexico is not the same as training it on legal documents from Spain. We have to focus above all on the lexical level, because the vocabulary differs a lot, and also on the syntactic level; sometimes we use the generative school of grammar, which helps the machine understand better when objects and verbs move around. So we treat each language, each dialect, as a different project. We have developed our solutions in Chinese, French, Spanish from Spain, Mexican Spanish, Korean, and Japanese; we have our solutions in a lot of languages. And as I told you, if you propose Danish, for example, we can develop our solution in Danish in a month, a month and a half or so.

Great, any more questions? There don't seem to be any more. So then another round of applause for Christina. Thank you so much.